An alternative explanation could be that the IPC itself is there, but truly competitive performance is not. For example, if the maximum frequencies are at the moment still in the <= 3.2GHz region, AMD might be hoping for process maturation to bring additional improvements, and holding back further statements / releases until they truly know what kind of clocks they can achieve. At least some Zeppelin SKUs will be facing Kaby Lake, which will operate at up to 4.5GHz at stock. So even if they had perfectly competitive IPC with BDW, SKL, KBL but their design for any reason could not hit above 3.2GHz or so... It would explain a lot. Anyway, if we stick with the official statements AMD has made so far (i.e. 40% IPC improvement over Excavator), Zen won't be threatening the IPC of any Intel designs newer than Ivy Bridge.
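To put a rough number on the clock argument, here is a minimal sketch that assumes single-thread performance scales as IPC times clock (a simplification that ignores memory, turbo behaviour and so on). The function name is made up for illustration, and the equal-IPC case is just the hypothetical from the post, not a measurement.

```python
def relative_performance(ipc_ratio, clock_a_ghz, clock_b_ghz):
    """Performance of chip A relative to chip B, assuming perf ~ IPC * clock.

    ipc_ratio is A's IPC divided by B's IPC (1.0 means identical IPC).
    """
    return ipc_ratio * clock_a_ghz / clock_b_ghz


# Hypothetical case: identical IPC, A tops out at 3.2 GHz, B runs 4.5 GHz.
ratio = relative_performance(1.0, 3.2, 4.5)
print(f"A delivers {ratio:.2f}x the single-thread performance of B")
```

Under that assumption, a part with identical IPC but capped at 3.2GHz would land at roughly 71% of a 4.5GHz part.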
I don't buy the <=3.2GHz thing.
Zen has a 19-stage integer pipeline. Bulldozer's wasn't disclosed, but it's anything between 15 and 20. So Zen's FO4 should be in the same ballpark as Bulldozer's.
Polaris is more or less a shrink of the old architecture, and gains 10-20% clock going from 28nm bulk to 14nm FF.
Bristol Ridge has 3.8-4.2GHz quad core + GPU in 65W...
So...
Because this single test relies on specific AES instructions, it only shows how well the CPU performs on... guess what... AES en/decryption tasks.
Which is clearly NOT representative of tons of other scenarios / kinds of code.
Well, if you consider that Zen has four 128-bit FPU units whereas Intel CPUs have only two of them for this kind of code, I can say that no: it isn't what was expected. In theory Zen has TWICE the FPU processing power with 128-bit code, and with Blender it showed only a 2% advantage...
That's not even counting the other resources (4 complex decoders, bigger uop-cache, double L1-I cache, 4 integer units, etc.) that Zen puts on the table compared to the competitors, which should put it in even better conditions. You can make a nice table and compare them.
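A back-of-the-envelope version of that comparison, using only the unit counts claimed above (4 x 128-bit vs 2 x 128-bit) and the 2% Blender advantage quoted above. This is a sketch: the helper name is hypothetical, and it treats peak FPU width as the only factor, which real workloads never do.

```python
def peak_bits_per_cycle(units, width_bits):
    """Theoretical FPU datapath width per cycle: number of units * unit width."""
    return units * width_bits


zen_peak = peak_bits_per_cycle(4, 128)    # unit count as claimed in the post
intel_peak = peak_bits_per_cycle(2, 128)  # unit count as claimed in the post

theoretical_ratio = zen_peak / intel_peak
observed_ratio = 1.02  # the 2% Blender advantage quoted in the post

print(f"theoretical advantage: {theoretical_ratio:.2f}x")
print(f"observed advantage:    {observed_ratio:.2f}x")
```

The gap between the two numbers is the whole point of the complaint: on paper the width doubles, in the one public benchmark it barely registers.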
The latest Intel architecture has 4x256 memory units, versus Zen's 3x128. In complex FPU calculations Zen should win (but only on 128-bit or non-FMAC code). In simple FPU calculations, limited by RAM B/W, Intel should win.
I regularly use Matlab, which has an auto-parallelization feature. It uses it for complex calculations, like evaluating a transcendental function for every item of a matrix, but it doesn't use it for simple calculations like matrix summation, because that is limited by RAM B/W and not by FPU resources...
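The compute-bound vs bandwidth-bound distinction can be sketched with a simple roofline-style estimate: attainable throughput is the smaller of peak FPU throughput and memory bandwidth times FLOPs per byte moved. Every number below (100 GFLOP/s peak, 30 GB/s bandwidth, ~100 FLOPs per transcendental evaluation) is a made-up illustration, not a figure for Zen, any Intel part, or Matlab.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline estimate: capped by either the FPU or the memory system."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def is_memory_bound(peak_gflops, bandwidth_gbs, flops_per_byte):
    """True when memory bandwidth, not the FPU, sets the ceiling."""
    return bandwidth_gbs * flops_per_byte < peak_gflops


PEAK_GFLOPS = 100.0    # hypothetical peak FPU throughput
BANDWIDTH_GBS = 30.0   # hypothetical RAM bandwidth

# Matrix sum C = A + B on doubles: 1 FLOP per element,
# 24 bytes moved (two 8-byte reads plus one 8-byte write).
matrix_sum = 1 / 24

# Element-wise transcendental function: assumed ~100 FLOPs per element,
# 16 bytes moved (one 8-byte read plus one 8-byte write).
transcendental = 100 / 16

for name, intensity in [("matrix sum", matrix_sum),
                        ("transcendental", transcendental)]:
    bound = "memory" if is_memory_bound(PEAK_GFLOPS, BANDWIDTH_GBS, intensity) else "compute"
    rate = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    print(f"{name}: {rate:.2f} GFLOP/s attainable, {bound}-bound")
```

With these assumed numbers the matrix sum is stuck far below the FPU peak, so extra FPU units or threads would not help it, while the transcendental case hits the FPU ceiling and is the one worth parallelizing.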