New Zen microarchitecture details

Abwx · May 20, 2016

mikk said:
Assuming this is compared to an FX-8150 (CB R15 ~550), the score for Zen is roughly 1100. This is comparable to a Haswell-E based six-core i7-5820K. But considering Zen has a 2-core advantage, it would possibly mean that there is still quite a big gap in single-thread performance to Intel.

At the same TDP Zen would be 21% faster than the 5820K; since it has a 33% core-count advantage, this amounts to a roughly 9% per-core disadvantage for Zen.
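As a rough check of that arithmetic, here is a minimal sketch. It takes the 21% figure from the post as given and assumes throughput scales linearly with core count (no MT scaling penalty, comparable SMT gain); the function name `per_core_ratio` is made up for illustration.

```python
def per_core_ratio(throughput_ratio: float, cores_a: int, cores_b: int) -> float:
    """Per-core throughput of chip A relative to chip B.

    Assumes throughput scales linearly with core count, which is a
    simplification: real multithreaded scaling is usually a bit worse.
    """
    return throughput_ratio / (cores_a / cores_b)


# Figures from the post: Zen 21% faster overall, 8 cores vs the 5820K's 6.
zen_per_core = per_core_ratio(1.21, cores_a=8, cores_b=6)
deficit_pct = (1.0 - zen_per_core) * 100.0

print(f"Zen per-core throughput vs 5820K: {zen_per_core:.4f}x")
print(f"Per-core deficit: about {deficit_pct:.0f}%")
```

Under those assumptions the per-core ratio is 1.21 / 1.333, i.e. a little over 9% behind per core.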

Of course that's without accounting for any MT scaling penalty in CB going from 6 cores to 8 cores, which should be quite low, though I don't know how CB scales at this level.

Also, that's assuming that SMT scaling is comparable to Hyper-Threading.

Anyway, ST-wise, any hope that there will be a big gap is a pipe dream; besides, we don't know the frequencies. What if it is clocked much lower than HW?

Snafuh · May 20, 2016

Again 40% IPC improvement and 2016 is still up

http://phx.corporate-ir.net/Externa...WxkSUQ9LTF8VHlwZT0z&t=1&cb=635992964179233292

Burpo · May 20, 2016

That's one hell of a cautionary statement..

The Stilt · May 20, 2016

I find it interesting that they don't mention Excavator any more in the IPC prediction. Piledriver and Steamroller parts are still shipping (making them "current core") and it is easier to achieve the "over 40%" improvement over them than it is over Excavator.

As I've said before, when comparing directly to Excavator I have heard and seen AMD using the term "up to" as a prefix. And that started to happen quite recently.

Arachnotronic · May 20, 2016

The Stilt said:
I find it interesting that they don't mention Excavator any more in the IPC prediction. Piledriver and Steamroller parts are still shipping (making them "current core") and it is easier to achieve the "over 40%" improvement over them than it is over Excavator.

As I've said before, when comparing directly to Excavator I have heard and seen AMD using the term "up to" as a prefix. And that started to happen quite recently.

The Stilt,

Where do you expect Zen to land performance wise relative to the current crop of Intel cores? Are you hearing anything about frequencies for Summit Ridge?

Thanks.

krumme · May 20, 2016

Snafuh said:
Again 40% IPC improvement and 2016 is still up

http://phx.corporate-ir.net/Externa...WxkSUQ9LTF8VHlwZT0z&t=1&cb=635992964179233292

For as long as I can remember, cache latency has been very bad on AMD CPUs. It's been like 30 years and always bad. I really hope they deliver on this bragging, because whatever tricks they have, if latency is still lagging it's going to be very tough against Intel's portfolio. Intel is cruising in second gear. So it needs to perform, and do it cheap and lean.

yuri69 · May 20, 2016

The Stilt said:
I find it interesting that they don't mention Excavator any more in the IPC prediction. Piledriver and Steamroller parts are still shipping (making them "current core") and it is easier to achieve the "over 40%" improvement over them than it is over Excavator.

Sadly, it seems you are right. The linked presentation really compares Zen to Orochi on page 11.

krumme · May 20, 2016

Burpo said:
That's one hell of a cautionary statement..

Yeah. But even with the CPU in hand, what reference workload would you use?

The Stilt · May 20, 2016

Arachnotronic said:
The Stilt,

Where do you expect Zen to land performance wise relative to the current crop of Intel cores? Are you hearing anything about frequencies for Summit Ridge?

Thanks.

I expect Zen cores to have similar IPC to Ivy Bridge, on average. Faster in certain workloads, but the average should be quite well matched.

Seeing how much additional effort AMD has put into Zen power and clock management, and how strict the VRM requirements (based on the existing boards) are, I don't feel that Zen will be able to scale too well in any aspect.

Because of that I have actually lowered my expectations for Zen's shipping frequencies from my original estimates (for the "halo" 8C/16T desktop FX), which originally were 3000MHz (±200MHz) base and 3600MHz (±200MHz) maximum boost. At the moment I'd expect 2600MHz (±200MHz) base and 3200MHz (±200MHz) maximum boost. However, I have no idea what the clocks will actually be, so as always anyone else's guess is just as good as mine.
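To put those revised guesses side by side, a small sketch follows. It only restates the numbers in the post (midpoints with a ±200MHz band); the helper names `clock_range` and `relative_drop` are hypothetical, and none of this says anything about actual shipping clocks.

```python
from typing import Tuple


def clock_range(center_mhz: int, tolerance_mhz: int = 200) -> Tuple[int, int]:
    """Return the (low, high) band around an estimated clock in MHz."""
    return (center_mhz - tolerance_mhz, center_mhz + tolerance_mhz)


def relative_drop(old_mhz: float, new_mhz: float) -> float:
    """Fractional reduction going from the old midpoint to the new one."""
    return 1.0 - new_mhz / old_mhz


# (original estimate, revised estimate) midpoints quoted in the post, in MHz.
estimates = {"base": (3000, 2600), "max boost": (3600, 3200)}

for name, (old, new) in estimates.items():
    low, high = clock_range(new)
    drop = relative_drop(old, new) * 100.0
    print(f"{name}: {low}-{high} MHz, {drop:.1f}% below the earlier midpoint")
```

The revised midpoints sit a bit over 10% below the earlier ones, and the two ±200MHz bands for each clock only touch at their edges.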

Arachnotronic · May 20, 2016

krumme said:
Yeah. But even with the CPU in hand, what reference workload would you use?

They could run a workload and publish the results for that workload. Like they did with Bristol Ridge.

Abwx · May 20, 2016

Arachnotronic said:
They could run a workload and publish the results for that workload. Like they did with Bristol Ridge.

They did run a ton of workloads and the average is twice an FX-8150's throughput; the only unknown is the frequency. From Kaveri's reviewer guide we can be almost 100% sure that they used, among others, POV-Ray, Cinebench R15, x264 HD 5.0.1 or higher, TrueCrypt, Blender and 7-Zip.

Arachnotronic · May 20, 2016

Abwx said:
They did run a ton of workloads and the average is twice an FX-8150's throughput; the only unknown is the frequency. From Kaveri's reviewer guide we can be almost 100% sure that they used, among others, POV-Ray, Cinebench R15, x264 HD 5.0.1 or higher, TrueCrypt, Blender and 7-Zip.

Oh cool, did you confirm this with AMD? Thanks!

Abwx · May 20, 2016

Arachnotronic said:
Oh cool, did you confirm this with AMD? Thanks!

I have rarely seen someone acting in such bad faith. As I said, they used that software in the reviewer guide they sent to sites for Kaveri to measure its IPC relative to Richland; why would they discard it when comparing Zen with Excavator, which is the successor to SR?

Anyway, if your intention is to continuously derail this thread, I guess I have no other choice than to put you on my ignore list. You're the second one, a rarity, but at least I spent some time being constructive for those who are interested in actual info rather than in continual straw men and thread crapping.

The Stilt · May 20, 2016

I don't think AMD needs to cherry-pick the review "compatible" benchmarks with Zen. Sure, they wanted to use those workloads Abwx listed in reviews, but they had no issues in accepting Cinebench R15 or 3DMark physics (Bullet) results internally, even on Bulldozer-based products. The benchmarks which are considered malicious and purely Intel optimized by some people... Not to mention that their own ACML library and GPU drivers are compiled by the same compiler, using the same optimizations as Cinebench for example :sneaky:

Sweepr · May 20, 2016

Abwx said:
I have rarely seen someone acting in such bad faith. As I said, they used that software in the reviewer guide they sent to sites for Kaveri to measure its IPC relative to Richland; why would they discard it when comparing Zen with Excavator, which is the successor to SR?

Just because they used the software listed for Kaveri testing it doesn't mean the same applies here, and so far you provided zero evidence showing otherwise.

jhu · May 20, 2016

The Stilt said:
I don't think AMD needs to cherry-pick the review "compatible" benchmarks with Zen. Sure, they wanted to use those workloads Abwx listed in reviews, but they had no issues in accepting Cinebench R15 or 3DMark physics (Bullet) results internally, even on Bulldozer-based products. The benchmarks which are considered malicious and purely Intel optimized by some people... Not to mention that their own ACML library and GPU drivers are compiled by the same compiler, using the same optimizations as Cinebench for example :sneaky:

Yeah, I don't get it either.

Abwx · May 20, 2016

The Stilt said:
I don't think AMD needs to cherry-pick the review "compatible" benchmarks with Zen. Sure, they wanted to use those workloads Abwx listed in reviews, but they had no issues in accepting Cinebench R15 or 3DMark physics (Bullet) results internally, even on Bulldozer-based products. The benchmarks which are considered malicious and purely Intel optimized by some people... Not to mention that their own ACML library and GPU drivers are compiled by the same compiler, using the same optimizations as Cinebench for example :sneaky:

Comparing products of the same brand with whatever Cinebench version is valid, since they will run the same code path anyway, unlike a cross-brand comparison.

For a cross-brand comparison one should first compare the results with Cinebench R10, 11.5 and R15 and see the pattern. From what I remember, here are a few observations I made, which can be checked:

- In CB R10 the Core 2 Duo had about a 10% IPC advantage over the Athlon 64 X2, according to the AT review.

- In CB 11.5 the difference between those two CPUs increased to 28.5%.

- With CB R15, AMD scores relative to CB 11.5 were degraded by 10% in comparison with Intel. The FX more or less manages to stay on par, but both Kaveri and Excavator show minuscule IPC gains over Piledriver compared with what they show in CB 11.5.

In the end it's too easy to compile and recompile software at will, as well as change the scene, to grab percentages that are currently impossible to get without new instructions and serious uarch updates. We are talking about 18% and 10% penalties here, while elsewhere people consider single-digit improvements between CPU generations normal. Whoever is minimally concerned with engineering-level accuracy should be aware of enormities like this one.
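For anyone wanting to check where a figure like "18%" comes from, here is a hedged sketch using only the two numbers quoted above (10% in CB R10, 28.5% in CB 11.5). It shows two possible readings, the plain difference in percentage points and the multiplicative shift; which one was intended is an assumption, and the function name `relative_shift` is made up.

```python
def relative_shift(adv_before_pct: float, adv_after_pct: float) -> float:
    """Multiplicative change in one chip's lead between two benchmark versions."""
    return (1.0 + adv_after_pct / 100.0) / (1.0 + adv_before_pct / 100.0)


# Core 2 Duo lead over Athlon 64 X2, as quoted in the post.
r10_adv = 10.0    # Cinebench R10
r115_adv = 28.5   # Cinebench 11.5

point_gap = r115_adv - r10_adv              # difference in percentage points
ratio = relative_shift(r10_adv, r115_adv)   # multiplicative shift

print(f"Gap in percentage points: {point_gap:.1f}")
print(f"Multiplicative shift: {(ratio - 1.0) * 100.0:.1f}%")
```

The percentage-point reading lands near the quoted 18%, while the multiplicative reading comes out somewhat lower, so the size of the "penalty" depends on which definition is used.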

MajinCry · May 20, 2016

"intel optimized"

The "optimizations" have nothing to do with Intel CPU features. It's to do with non-Intel CPUID vendor strings being assigned needlessly slow SSE code paths. Agner explained all this in simple English.

As I said earlier in the thread, even if Zen is worst case ~Sandy Bridge, if AMD finally gets rid of whatever is causing their draw call deficit, they'll finally be competent.

Lower energy usage (compared to Sandy Bridge), more cores and more threads? Seeing as how AMD will naturally be competing on price, that lil' proccy sounds pretty damn sexy.

But that's only if the draw call deficit will be rectified. It'll be a still-birth if it isn't.

Abwx · May 20, 2016

Sweepr said:
Just because they used the software listed for Kaveri testing it doesn't mean the same applies here, and so far you provided zero evidence showing otherwise.

For those who don't know what engineering is, it certainly isn't evidence. Personally, when I compare several generations of any product I use the same metrics; otherwise how could one have an accurate idea of the improvements? Do you mean that making apples-to-potatoes comparisons is a better methodology?

The Stilt · May 20, 2016

Abwx said:
Comparing products of the same brand with whatever Cinebench version is valid, since they will run the same code path anyway, unlike a cross-brand comparison.

For a cross-brand comparison one should first compare the results with Cinebench R10, 11.5 and R15 and see the pattern. From what I remember, here are a few observations I made, which can be checked:

- In CB R10 the Core 2 Duo had about a 10% IPC advantage over the Athlon 64 X2, according to the AT review.

- In CB 11.5 the difference between those two CPUs increased to 28.5%.

- With CB R15, AMD scores relative to CB 11.5 were degraded by 10% in comparison with Intel. The FX more or less manages to stay on par, but both Kaveri and Excavator show minuscule IPC gains over Piledriver compared with what they show in CB 11.5.

In the end it's too easy to compile and recompile software at will, as well as change the scene, to grab percentages that are currently impossible to get without new instructions and serious uarch updates. We are talking about 18% and 10% penalties here, while elsewhere people consider single-digit improvements between CPU generations normal. Whoever is minimally concerned with engineering-level accuracy should be aware of enormities like this one.

As you very well know, the IPC improvements from Steamroller or Excavator over Piledriver vary heavily between workloads. In some workloads Excavator can still be slower than Piledriver, even though the average IPC improvement over PD is 12% or so.

AMD doing so poorly in Cinebench R15 compared to R11.5 is just because R15 utilizes the available resources more effectively and uses newer instructions. 11.5 goes up to SSE2 while R15 uses SSE3. SSE2 vs. SSE3 shouldn't hurt AMD at all, though.

Cinebench R15 has been a standard workload for AMD PPO for all of their µarchs since its release.

Also, patching the CPU detection code ("dispatcher") in any Cinebench version, or spoofing all of the CPUID values under a VM, does not change the results either way. So no foul play from Maxon's side either. Also, with modern versions of ICL it is not possible to optimize the compiled binary or library for a certain µarch without forbidding its use on any other µarch. If you use an optimization in ICL which produces the optimal code for Skylake, the compiled binary won't run on any AMD CPUs or Intel CPUs which are lacking the instructions supported by Skylake. The only difference to GCC (with -march used) is that the binary compiled with ICL won't segfault, since the supported instructions are checked prior to executing the code :sneaky:
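To illustrate why spoofing the CPUID vendor string is a useful test, here is a toy model of the two dispatcher styles under discussion. It is not Cinebench's or ICL's actual code: the path names, the `CpuInfo` type and both dispatch functions are hypothetical, and only the vendor strings "GenuineIntel" and "AuthenticAMD" are real CPUID values.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical code paths, fastest first, each with the ISA features it needs.
CODE_PATHS: List[Tuple[str, FrozenSet[str]]] = [
    ("avx2_path", frozenset({"sse2", "sse3", "avx2"})),
    ("sse3_path", frozenset({"sse2", "sse3"})),
    ("baseline_path", frozenset()),
]


@dataclass(frozen=True)
class CpuInfo:
    vendor: str                # CPUID vendor string
    features: FrozenSet[str]   # ISA feature flags reported by the CPU


def dispatch_by_features(cpu: CpuInfo) -> str:
    """Pick the fastest path whose required features the CPU reports."""
    for name, required in CODE_PATHS:
        if required <= cpu.features:
            return name
    return "baseline_path"


def dispatch_by_vendor(cpu: CpuInfo, trusted_vendor: str = "GenuineIntel") -> str:
    """Vendor-gated variant: other vendors get the slow path regardless of features."""
    if cpu.vendor != trusted_vendor:
        return "baseline_path"
    return dispatch_by_features(cpu)


amd = CpuInfo("AuthenticAMD", frozenset({"sse2", "sse3", "avx2"}))
spoofed = CpuInfo("GenuineIntel", amd.features)  # same chip, vendor string spoofed

for label, cpu in (("real vendor", amd), ("spoofed vendor", spoofed)):
    print(label, dispatch_by_features(cpu), dispatch_by_vendor(cpu))
```

In this model a feature-based dispatcher picks the same path before and after spoofing, while a vendor-gated one changes its choice. Results that do not move when the vendor string is spoofed are therefore consistent with feature-based dispatch.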

Burpo · May 20, 2016

Thanks Stilt for that explanation.. :thumbsup:

TheRyuu · May 20, 2016

MajinCry said:
"intel optimized"

The "optimizations" have nothing to do with Intel CPU features. It's to do with non-Intel CPUID vendor strings being assigned needlessly slow SSE code paths. Agner explained all this in simple English.

This is also very easy to override, as Agner has also pointed out. Furthermore, it's only the case if the code is being compiled with the /Qx or /Qax compiler options. ICL still offers the more generic /arch option, which is CPU agnostic. I don't know what's being used in this case, but don't immediately assume foul play since there are other options that could have been used. I'm not saying they didn't use an option which penalizes AMD CPUs, but without citing evidence to show they did I have to give them the benefit of the doubt.

The Stilt said:
If you use an optimization in ICL which produces the optimal code for Skylake, the compiled binary won't run on any AMD CPUs or Intel CPUs which are lacking the instructions supported by Skylake. The only difference to GCC (with -march used) is that the binary compiled with ICL won't segfault, since the supported instructions are checked prior to executing the code :sneaky:

You can compile code with /arch and /Qax which will selectively use Intel specific code paths for certain code via a switch(?) statement (I don't know if this has been changed to using function pointers in newer versions or not) so this may not necessarily be true.

That being said, you did say earlier in that same post that fiddling with the CPUID values has no effect on the speed of the program, so I guess we can safely assume that they are not using options which would specifically penalize non-Intel processors. ICL may still produce code which slightly favors Intel processors (think GCC's -mtune), but generally for computationally intensive code it can still wind up being faster than code produced by MSVC, even for AMD CPUs.

The Stilt · May 20, 2016

TheRyuu said:
This is also very easy to override, as Agner has also pointed out. Furthermore, it's only the case if the code is being compiled with the /Qx or /Qax compiler options. ICL still offers the more generic /arch option, which is CPU agnostic. I don't know what's being used in this case, but don't immediately assume foul play since there are other options that could have been used. I'm not saying they didn't use an option which penalizes AMD CPUs, but without citing evidence to show they did I have to give them the benefit of the doubt.

You can compile code with /arch and /Qax which will selectively use Intel specific code paths for certain code via a switch(?) statement (I don't know if this has been changed to using function pointers in newer versions or not) so this may not necessarily be true.

That being said, you did say earlier in that same post that fiddling with the CPUID values has no effect on the speed of the program, so I guess we can safely assume that they are not using options which would specifically penalize non-Intel processors. ICL may still produce code which slightly favors Intel processors (think GCC's -mtune), but generally for computationally intensive code it can still wind up being faster than code produced by MSVC, even for AMD CPUs.

Yeah, Qax is usable as long as the instructions are available on the CPU. However, based on the tests I did on the most recent versions of the compiler, "/Arch:CORE-AVX2" provided the best results on all CPUs (both AMD and Intel). That said, in many cases the more recent iterations of GCC (e.g. 5.3.0) have surpassed the performance produced by ICL. In a generic FP CFD benchmark, code compiled by GCC 5.3.0 for x86-64 was nearly 20% faster than the fastest binary produced by ICL.

DrMrLordX · May 21, 2016

GCC > ICL? Wow, that's interesting.

jhu · May 21, 2016

DrMrLordX said:
GCC > ICL? Wow, that's interesting.

They've been about on par for x86/x86-64 for a few years. ICC has been otherwise ahead with Intel's MKL and Itanium code.
