Your comment is correct, but it was not the subject my post brought up, nor its point. Instead, this was my point:
But you said it was unfair for people to compare Piledriver CPUs to Skylake or Haswell. It isn't unfair. AMD stayed frozen in time, while Intel did not (not that Intel has been making great leaps in performance either, but still).
Continually comparing Bulldozer or Piledriver CPUs to Sandy Bridge on newer and newer generations of software only holds so much interest. Eventually, we are going to want to see how they stack up against today's hardware.
I.e. maybe the FX CPUs such as the FX-8350 did not get a fair chance, since the software available at the time did not make use of their full multi-core potential.
AMD knew the software environment of the day just as well as Intel did. If anything was "unfair", it was the FMA4/FMA3 switcheroo that Intel pulled (though arguably AMD should have been able to account for that). As for the split AVX/AVX2 instruction issue: if AMD had the willingness and ability to produce their own free compiler alternative to ICC that performed just as well as ICC (if not better), then I'm sure we'd see much better support for split AVX and/or XOP in executables compiled in that environment. Fact is, AMD can't even provide that software, much less convince anyone to use it when compiling their code.
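Just to illustrate the mess: below is a minimal sketch of the kind of runtime dispatch code had to do to cover both FMA variants, assuming GCC or Clang on x86. The mul_add function names are hypothetical, not from any real library.

```c
#include <x86intrin.h>  /* pulls in AVX, FMA3, and FMA4 intrinsics on GCC/Clang */

/* FMA3 path: 3-operand fused multiply-add (Intel Haswell+, AMD Piledriver+). */
__attribute__((target("avx,fma")))
static __m256 mul_add_fma3(__m256 a, __m256 b, __m256 c) {
    return _mm256_fmadd_ps(a, b, c);
}

/* FMA4 path: 4-operand form that only AMD's Bulldozer family implemented. */
__attribute__((target("avx,fma4")))
static __m256 mul_add_fma4(__m256 a, __m256 b, __m256 c) {
    return _mm256_macc_ps(a, b, c);
}

/* Plain AVX fallback: separate multiply and add, no fusion. */
__attribute__((target("avx")))
static __m256 mul_add_avx(__m256 a, __m256 b, __m256 c) {
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}

/* Dispatcher; marked AVX so passing __m256 by value is ABI-valid. */
__attribute__((target("avx")))
__m256 mul_add(__m256 a, __m256 b, __m256 c) {
    if (__builtin_cpu_supports("fma"))  return mul_add_fma3(a, b, c);
    if (__builtin_cpu_supports("fma4")) return mul_add_fma4(a, b, c);
    return mul_add_avx(a, b, c);
}
```

Bulldozer shipped with FMA4 only, Intel standardized on FMA3 with Haswell, and Piledriver ended up supporting both, which is why anything shipping prebuilt binaries had to probe at runtime like this.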
Now AMD has not released any 4-module/8-thread follow-up to those CPUs in 3 years. So if we compare them to the latest Intel CPUs, then obviously they will not keep up.
But they have released 2m/4t CPUs, and those don't keep up with Intel's 4c/4t CPUs. Sometimes they don't even keep up with Intel's 2c/4t CPUs.
But that does not say much about how competitive they could have been at the time of release, if different software had been available back then. To figure that out, the FX-8350 would have to be compared to the Intel CPUs available back then, but with the latest software.
Sorry to have to say it this way, but most people don't care about hardware from 2011/2012 anymore, unless they're holding on to 2500k or 2600k CPUs and bragging about how they're "still relevant" or what have you.
Finally, the good that has come out of this is that by the time AMD Zen is released, Bulldozer will at least have paved the way for better multi-core support in software. So software should be able to make good use of Zen's 8 cores right from the start.
I don't really agree with that statement. If we're still talking about games like AotS, which is presumably FP-heavy without many SSE/SIMD optimizations, we have a situation where BD/PD act more like quad-core than eight-core CPUs performance-wise. Chips like the 3770K and 5820K have done more to put pressure on PC publishers/developers to start supporting more than 4 threads in software.
AotS is compiled with ICC AFAIK. Of course it will run like a dog on AMD CPUs. I expect DX12 games built with CPU-agnostic compilers to work nicely on every CPU out there that is not more than 5 years old.
ICC generally does a better job than the ubiquitous MSVC when it comes to compiling and autovectorizing, even for AMD CPUs. There have been circumstances where ICC deliberately dispatched to older/slower code paths on CPUs that don't report as "GenuineIntel", and much has been done to expose that problem. But if you have code that is (for example) aimed at SSE2 or SSE3 at best, then ICC is going to produce executables that outperform those compiled by GCC by a hair and MSVC by quite a bit, even on AMD CPUs.
Today, the problem comes from how ICC handles things like AVX/AVX2 (the AMD CPUs that support those instructions execute 256-bit operations as two 128-bit halves in hardware, so the wide code paths gain little) or XOP (ICC doesn't support XOP at all).
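For what it's worth, the "GenuineIntel" check at the center of that dispatching controversy is trivially simple to reproduce. Here's a minimal sketch, assuming GCC or Clang on x86; the printed messages are purely illustrative of what a dispatcher might decide, not ICC's actual logic.

```c
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned eax, ebx, ecx, edx;
    char vendor[13] = {0};

    /* CPUID leaf 0 returns the vendor string in EBX:EDX:ECX order. */
    __get_cpuid(0, &eax, &ebx, &ecx, &edx);
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);

    if (strcmp(vendor, "GenuineIntel") == 0)
        puts("Intel: fast vectorized path selected");
    else
        puts("Non-Intel (e.g. AuthenticAMD): conservative baseline path selected");
    return 0;
}
```

The controversy was never that the check is hard to find; it's that the fallback path ignored the feature bits the same CPUID instruction reports.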
AMD should have invested more in compiler development years ago before they launched the first Bulldozer CPUs. Then they should have continued with compiler development and launched a full suite of software to support HSA prior to the release of Kaveri. They did neither, and they're still reaping the whirlwind.
Pointless arguing is pointless.
It is all about optimization.
To see how all CPUs should stack up, just look at some synthetic CPU benchmarks like Cinebench:
Cinebench is far from perfectly optimized. At best, it shows us FP performance when a) using dated ISA extensions and b) running code loose enough that not everything stays in L2. Cinebench R11.5 in particular puts the knife to AMD, though I can't tell if that's an ICC problem or what. R10 is (in my opinion) the most "fair", since it uses SSE2 at most. It shows you legacy-code performance, which is often the most relevant software anyway.
Perfectly optimized games should show the same CPU hierarchy as the Cinebench multi-threaded benchmark.
No, perfectly - and by that I mean ABSOLUTELY perfectly - optimized games should show a CPU hierarchy similar to something like y-cruncher's. But how many games support AVX2 or XOP? Probably none.
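To make "ABSOLUTELY perfectly optimized" concrete: y-cruncher-style code leans on kernels where nearly every instruction is a 256-bit fused multiply-add. A sketch of such a kernel, assuming an FMA-capable CPU, n divisible by 4, and GCC or Clang with -mfma; the function is mine, not y-cruncher's.

```c
#include <immintrin.h>
#include <stddef.h>

/* Sums a[i]*b[i] four doubles at a time using fused multiply-adds. */
double dot(const double *a, const double *b, size_t n) {
    __m256d acc = _mm256_setzero_pd();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm256_fmadd_pd(_mm256_loadu_pd(a + i),
                              _mm256_loadu_pd(b + i), acc);

    /* Horizontal sum of the four accumulator lanes. */
    __m128d lo = _mm256_castpd256_pd128(acc);
    __m128d hi = _mm256_extractf128_pd(acc, 1);
    lo = _mm_add_pd(lo, hi);
    return _mm_cvtsd_f64(_mm_add_pd(lo, _mm_unpackhi_pd(lo, lo)));
}
```

Code like this gains little on Bulldozer/Piledriver (which split 256-bit ops in hardware) and won't run at all on pre-FMA chips, which is exactly why games ship baseline-SSE2 builds instead.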