I don't understand your reasoning. Why would the synthetic benchmark not affect various CPUs in a similar manner? Thus the comparison between them should be valid.
Only if the benchmark was optimized for a particular design could you say the delta would change.
You are not using that argument.
Yes, that is not easy to understand, and it obviously needs some explanation.
Let me give an example:
If you get 20% more performance in a synthetic benchmark, that may shrink to, say, 14% in a real world benchmark. The numbers are just examples, not a fixed rule.
The reason for this - as I stated already - is that synthetic benchmarks are coded better than real applications, so all applications will perform worse than synthetic ones. In theory that alone would not change the relative performance, but with the worse programming other issues come into play.
The worse code of real world applications reduces their ability to exploit the capabilities of a faster processor. Most importantly, you have more scaling issues in real world applications. The next most important factor is IO and memory limitations: most synthetic benchmarks do not stress the memory subsystem at all.
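To illustrate (a minimal sketch of my own, with made-up buffer sizes): a synthetic kernel typically loops over a working set that fits in cache, so the memory subsystem never shows up in the score, while real application code walks far more data:

```c
#include <stddef.h>

#define CACHE_SIZE (32 * 1024)        /* fits in L1, stays cache-resident */
#define APP_SIZE   (64 * 1024 * 1024) /* far larger than any cache level  */

/* Synthetic-style kernel: many passes over a tiny buffer.
 * After the first pass everything is served from cache, so the
 * score reflects the core only, not the memory subsystem. */
long synthetic_kernel(const int *buf, int passes) {
    long sum = 0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < CACHE_SIZE / sizeof(int); i++)
            sum += buf[i];
    return sum;
}

/* Application-style kernel: one pass over a huge buffer.
 * Almost every access misses cache, so the result is dominated
 * by the memory subsystem rather than raw CPU capability. */
long app_kernel(const int *buf) {
    long sum = 0;
    for (size_t i = 0; i < APP_SIZE / sizeof(int); i++)
        sum += buf[i];
    return sum;
}
```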
Even memory benchmarks are not suited. Memory benchmarks are meant to test memory and the memory subsystem, and for that they use special code that delivers the utmost memory performance (to exclude the CPU's impact). A real world application, on the other hand, uses much worse routines for its memory operations, so it never reaches that maximum. Yet again, this obviously only reduces the absolute numbers, not the deltas between them, but yet again the bad code comes with the additional effect of worse scaling and worse usage of the CPU's capabilities.
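As a sketch of that difference (again my own illustration, with hypothetical routines): a memory benchmark copies the way the hardware likes it, in one call to a tuned routine, while typical application code moves data element by element with bookkeeping in between:

```c
#include <string.h>
#include <stddef.h>

/* Benchmark-style copy: one call into a hand-tuned memcpy that
 * moves data in wide, aligned blocks and comes close to the bus
 * limit. This is the kind of routine a memory benchmark measures. */
void bench_copy(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
}

/* Application-style copy: a byte-wise loop with per-element
 * bookkeeping. Same memory subsystem, far lower bandwidth, and
 * how badly it falls short can differ between CPU designs. */
void app_copy(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] == '\0')      /* typical real-world early exit */
            break;
        dst[i] = src[i];
    }
}
```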
The PassMark benchmark is not as synthetic as e.g. memtests, Dhrystone, etc., but it is much more synthetic than the SPEC benchmarks. SPEC is the most honest benchmark out there because it uses real world applications available as source code and allows for maximum compiler optimization. So the SPEC benchmark gives you the real ability of a processor, achievable with real compilers out there. It just excludes the shortcomings of software vendors.
Application benchmark sets, e.g. the one here at Anandtech, differ from SPEC results because they use a certain mix of applications, and those applications are likely to favor one or the other CPU, simply because the same binary (same compiler and compiler settings) is run on both CPUs.
Anandtech does it like this because their audience is consumers using consumer applications. Consumers want to know: what performance do I get for the applications I use? The audience for SPEC, on the other hand, is professionals using professional applications, and they want to know: how much performance can I get out of a certain CPU?
They just answer different questions:
Anandtech benchmark set -> How fast do these applications run?
SPEC benchmark -> How fast is the CPU?
Synthetic benchmarks -> How fast is the CPU, excluding some issues (e.g. code quality, memory impact)?
And therefore you should not expect the relative values to be the same. To say it clearly: if that benchmark is genuine and Bulldozer is ~70% faster in it than e.g. the 980X, then you will see less of a difference in a real world application, maybe 40%, just to give a sample number for a sample real world benchmark.
Just to tell you a story of my own: I wrote a consumer application (very integer, FPU and SSE intense, with very low memory impact but high cache impact). I compiled it with high optimizations with the Visual Studio compiler.
The result was 100 on AMD and 100 on Intel. Then I compiled with the Intel compiler, and the result was around 72 on AMD and 110 on Intel. That really puzzled me, because I expected something like 110 on AMD and 120 on Intel, assuming it favors Intel specifics. But it was a lot slower on AMD than with the default compiler. So there is really a lot wrong with Intel compilers.

I have evidence that the Intel compiler forfeits performance improvements in order to hurt AMD CPUs more, for example with the use of lea instructions. The Intel compiler still uses lea where an add instruction would be sufficient, and using lea hurts AMD more than Intel. That is only one of many examples you see only in the Intel compiler (others are the mul replacement rules (AMD is very fast at muls), the usage of push/pop, etc.). Those icc problems started around 2007 and got worse, especially since icc 10 (the old Intel compilers (versions 5-7) had been quite good for both architectures). That makes the situation especially bad, because due to the historically good results on both CPUs, many companies used the icc compiler.
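To make the lea point concrete, here is a minimal sketch of my own (illustrative C, not actual icc output): the same source computation can be lowered either through the address-generation unit or through plain add/imul instructions, and the relative cost of the two choices differs between AMD and Intel cores:

```c
/* Two places where a compiler chooses between lea and the "plain"
 * instruction. Both lowerings are correct; which one is faster
 * depends on the microarchitecture it was tuned for. */

int sum(int a, int b) {
    return a + b;
    /* Codegen A:  lea eax, [rdi + rsi]   ; add done by the AGU
     * Codegen B:  mov eax, edi
     *             add eax, esi           ; plain ALU add        */
}

int times5(int x) {
    return x * 5;
    /* Codegen A:  lea eax, [rdi + rdi*4] ; x + 4*x via the AGU
     * Codegen B:  imul eax, edi, 5       ; plain multiply
     * A compiler tuned for a CPU with a fast multiplier would
     * pick B; a lea-heavy tuning picks A.                       */
}
```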
To make an application look good on an Intel CPU and bad on an AMD CPU, you just take icc (version 10 or above) and that's it; there is no need to - as is often written - "optimize software" for a certain CPU. On the other hand, you take PathScale or PGI if you want to favor AMD (though those do not produce specially bad code for Intel). It is especially funny that with Bulldozer, many of those Intel icc tricks will then work in favor of (or, let's say, not as a disadvantage for) Bulldozer. You could see some surprises with Bulldozer on benchmarks you recently thought would especially favor Intel CPUs.
In addition, with the application I mentioned above I found a very strong weak point of all AMD CPUs so far: they have really high theoretical throughput, but it is almost impossible to make use of it. I even tried hand-optimized assembler code (taken if the CPU is detected as AMD), but it is just not possible to fully utilize their capabilities with real code that does something useful. Bulldozer is a great attempt to overcome this issue. It offers less throughput (which does not hurt, because it was unused anyway) but increased efficiency (I mean both performance efficiency and die-area efficiency). Together with the high-frequency, longer-pipeline design, this also completely changes how to optimize software, and that is why the current icc tricks will no longer work on Bulldozer.
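For anyone wondering how such a per-vendor code path is selected (a hedged sketch, not the actual code of my application): you read the vendor string with the cpuid instruction and branch on it. The vendor strings themselves ("AuthenticAMD", "GenuineIntel") are architecturally defined:

```c
#include <cpuid.h>   /* GCC/Clang; MSVC has __cpuid in <intrin.h> */
#include <stdio.h>
#include <string.h>

/* Returns 1 on an AMD CPU, 0 otherwise. Leaf 0 of cpuid returns
 * the 12-byte vendor string split across EBX, EDX, ECX. */
static int cpu_is_amd(void) {
    unsigned eax, ebx, ecx, edx;
    char vendor[13];
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return strcmp(vendor, "AuthenticAMD") == 0;
}

int main(void) {
    /* A real application would branch here into the hand-tuned
     * AMD assembler routine vs. the generic code path. */
    printf("AMD CPU detected: %s\n", cpu_is_amd() ? "yes" : "no");
    return 0;
}
```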
To summarize: in the past, AMD optimized their CPUs for utmost theoretical performance. They could never reach it in practice, because of practical programming issues and/or the occasional kick from icc's tricks.
The Bulldozer architecture, on the other hand, is fully optimized for utmost practical performance, even though on paper they dropped a lot of theoretical features (e.g. the 3-way AGU and the 3-way float unit). I mean, they got two cores into almost the die area of one previous core. In my opinion this is the largest CPU design change in the x86 business since the Pentium, bigger than the P4 was and bigger than Core was.