New Zen microarchitecture details

AMD Polaris · Jul 17, 2016

Arachnotronic said:
Or the 8C/16T Intel parts. Or the 10C/20T Intel parts.

Price wise I wouldn't be that sure...

coffeemonster · Jul 17, 2016

AMD Polaris said:
Zen ES is at the moment in revision A0 - it might not be a surprise.

Core counts are: 4c/8t, 8c/16t, 16c/32t, 32c/64t. As it seems now there won't be a 6c/12t at the launch

4 variants of ES Zen are available at the moment:
AM4 8 cores with 95W TDP
AM4 4 cores with 65W TDP
SP3 24 cores with 150W TDP
SP3 32 cores with 180W TDP

The most exciting part is core clock. The 8c/95W variant's base clock is 2.8GHz, all core boost is 3.05GHz and maximum boost is 3.2GHz.
The 4c/65W part's clock is the same. (I would expect 3.5GHz base clock for a retail 4c/95W variant.)

Interesting, I remember reading rumors that there wouldn't be a 4-core variant initially, only 6/8.

3.5 base 4.0 turbo is all I was hoping for on initial release versions.

FieryUP · Jul 17, 2016

mikk said:
To me it sounds like wishful thinking written by some AMD fanboy. And given how many fakes we got about AMD stuff in recent years, I somehow doubt this is true. It's not even clear whether these predictions are based on official or tested data or are just made up. That's very dangerous. I think this is mixed with wishful thinking.

One thing I can tell you for sure: what he said is not fake, at all. Wishful thinking would have been a 4c/8t Summit Ridge already running at over 4 GHz clocks on a flawless AM4 motherboard and executing a bunch of benchmark software with zero issues.

mikk · Jul 17, 2016

FieryUP said:
One thing I can tell you for sure: what he said is not fake, at all.

Why should we trust you? Another low-post-count user coming from nowhere; anyone could post the same.

FieryUP said:
Wishful thinking would have been a 4c/8t Summit Ridge already running at over 4 GHz clocks on a flawless AM4 motherboard and executing a bunch of benchmark software with zero issues.

I didn't call the clock speeds wishful thinking because they are not impressive.

mrmt · Jul 17, 2016

AMD Polaris said:
AFAIK Intel has no answer for the 32c/64t Zen variant, so it could be a great win for AMD on the server market.

You do know that 32 cores without FMA/AVX on the server market is the old, failed AMD strategy of providing more cores to reach the same performance level, don't you? And it doesn't work, because the moment you add more cores you add to leakage and end up with worse performance/watt.

Plus AMD doesn't have anything close to the Purley platform today and they didn't announce anything for the future either, so I would not have high expectations on that market.

Btw, if Zen is another Polaris AMD is headed for a fiasco.

HiroThreading · Jul 17, 2016

AMD Polaris said:
Price wise I wouldn't be that sure...

Then they wouldn't require multithreading performance in the first place.

FieryUP · Jul 17, 2016

mikk said:
Why should we trust you? Another low-post-count user coming from nowhere; anyone could post the same.

You don't have to trust me. I've been in the PC hw/sw business for over 20 years, and I have some degree of credibility in certain circles. Most likely I will also receive a Summit Ridge ES CPU months before review sites like Anandtech, but that's a whole different issue 🙂

FieryUP · Jul 17, 2016

mrmt said:
You do know that 32 cores without FMA/AVX on the server market is the old, failed AMD strategy of providing more cores to reach the same performance level, don't you?

Who said anything about lack of AVX or FMA support? Zen as a microarchitecture introduces a few compromises, and one of those is the inferior (to Skylake) AVX or FMA performance. It's still supported, just considerably slower than with e.g. SKL or even HSW.

And it doesn't work, because the moment you add more cores you add to leakage and end up with worse performance/watt.

Are you saying that Intel wouldn't be able to roll out a 32-core BDW-EP with a ca. 2 GHz clock rate? Why would that be impossible to do? And keep in mind that a Zeppelin (Zen) core is simpler (leaner) than a BDW core, so piling them up might be a bit easier.

NTMBK · Jul 17, 2016

mrmt said:
You do know that 32 cores without FMA/AVX on the server market is the old, failed AMD strategy of providing more cores to reach the same performance level, don't you?

Depends on whether you have a workload that cares about vector FP throughput. Plenty of workloads focus on integer throughput.

And if you have a workload that scales well with 8-wide FP vectors over 32 cores, you should probably be looking at putting it on a GPU anyway.

stuff_me_good · Jul 17, 2016

mrmt said:
You do know that 32 cores without FMA/AVX on the server market is the old, failed AMD strategy of providing more cores to reach the same performance level, don't you? And it doesn't work, because the moment you add more cores you add to leakage and end up with worse performance/watt.

Aren't those the features that will be remedied with Zen+ when it arrives a year or two later?

mrmt · Jul 17, 2016

FieryUP said:
Are you saying that Intel wouldn't be able to roll out a 32-core BDW-EP with a ca. 2 GHz clock rate? Why would that be impossible to do? And keep in mind that a Zeppelin (Zen) core is simpler (leaner) than a BDW core, so piling them up might be a bit easier.

Think harder, Intel had this choice and instead took the other route. Why?

mrmt · Jul 17, 2016

NTMBK said:
Depends on whether you have a workload that cares about vector FP throughput. Plenty of workloads focus on integer throughput.

And if you have a workload that scales well with 8-wide FP vectors over 32 cores, you should probably be looking at putting it on a GPU anyway.

It doesn't matter whether the workload can scale over 32 cores with decent throughput, but whether the workload can be processed in an efficient manner. This is a glass jaw in AMD's architecture, a trade-off they are making. Whether they are making it due to 1) lack of resources, 2) being delusional or 3) a disruptive business strategy is open to question, but given that Intel has much better access to server market customers than AMD can dream of, I wouldn't put my bets on 3). I would go for 1) but wouldn't discard 2).

mrmt · Jul 17, 2016

stuff_me_good said:
Aren't those the features that will be remedied with Zen+ when it arrives a year or two later?

You mean two years later when Intel will have deployed something much better than what they have by the time Zen arrives?

FieryUP · Jul 17, 2016

mrmt said:
Think harder, Intel had this choice and instead took the other route. Why?

They also have a 72-core CPU, so it's more about knowing their market IMHO and working in incremental steps. Evolution vs. revolution. Meanwhile AMD is kinda forced into taking huge leaps now to get back in the game. Since AMD uses 8-core dies, they can make both 24-core and 32-core server parts. A 24-core server part wouldn't make a huge buzz, we've already seen that with BDW-EX. But making both a 24-core and a 32-core server CPU could work as breaking news and help AMD capture the attention of server folks.

Also, I don't think a 32-core BDW-EP/EX could work well with their current 2-ring architecture. It would most likely require a more complex solution to scale well. While AMD chose a different path and that one could work well with up to 32 cores. Otherwise, if that would be hugely inferior to a 24-core layout, AMD simply wouldn't bother making a 32-core part...

mrmt · Jul 17, 2016

FieryUP said:
A 24-core server part wouldn't make a huge buzz, we've already seen that with BDW-EX. But making both a 24-core and a 32-core server CPU could work as breaking news and help AMD capture the attention of server folks.

Usually in the server market it doesn't matter how much buzz you make, it's more about delivering the promised performance within the efficiency targets, something that Intel has been doing in the last 10 years.

FieryUP said:
Also, I don't think a 32-core BDW-EP/EX could work well with their current 2-ring architecture. It would most likely require a more complex solution to scale well. While AMD chose a different path and that one could work well with up to 32 cores. Otherwise, if that would be hugely inferior to a 24-core layout, AMD simply wouldn't bother making a 32-core part...

Would you care to explain how Intel's dual-ring solution for 32 cores would be worse than the MCM solution AMD is going to field with Zen? Unless AMD discovered a magical pixie dust that will magically solve the problem, AMD's solution will be even worse than Intel's.

FieryUP · Jul 17, 2016

mrmt said:
Usually in the server market it doesn't matter how much buzz you make, it's more about delivering the promised performance within the efficiency targets, something that Intel has been doing in the last 10 years.

Yes, and no server guys would read any reviews, and no one in upper management has ever been influenced by the power of marketing. That's exactly why no one spends a dollar on advertising server hardware 🙂

Would you care to explain how Intel's dual-ring solution for 32 cores would be worse than the MCM solution AMD is going to field with Zen? Unless AMD discovered a magical pixie dust that will magically solve the problem, AMD's solution will be even worse than Intel's.

GMI and SDF will do the magic. I'm sure you know what they stand for and how they work 😉

As for 32 cores in BDW-EP/EX: there are certain hops that you can quite easily pile up and make things worse. The data packets have to travel around in those 2 rings, and the size of the rings (the number of hops) is quite an important factor. Yes, you could add 4 more cores to each ring, but I'm sure in many workloads it would make things worse, even at core clocks similar to those of the current top-of-the-line 24-core BDW-EX... AMD's solution doesn't use any rings BTW.

The Stilt · Jul 17, 2016

If those clocks stated by AMD Polaris are even remotely true, I cannot say I'm too surprised. They align perfectly with the latest of my estimations I made in post #1259 (2600MHz (±200MHz) base and 3200MHz (±200MHz) maximum boost).

As explained in post #2044 I assumed that AMD used 2:1 scale in their famous "Orochi vs. Summit" and "Excavator vs. Zen" charts.

Also, if the chart was in fact about Cinebench R15 (which is highly likely), then the "Orochi" CPU they used in the comparison is likely to be the FX-8350 / FX-8370 (and not the -E model I previously speculated). These two CPUs have the same base clock and TDP, meaning they will perform identically in multithreaded workloads. The FX-8350 / FX-8370 scores 643 points in Cinebench R15 MT. Multiply that by 1.5x and you'll end up with 965 points. Based on the expected IPC improvement and the clock estimation, Zeppelin should match this target at the 3050MHz "all core boost" if the SMT yield is ~11.7%. A pretty plausible SMT yield for a first implementation, IMO.

I ran the claimed figures through (with the expectation that the IPC is exactly 40% over Excavator).

Used Cinebench R15 only, since it is a properly threaded workload which doesn't use modern instructions.

If the sustained base clock during CB R15 MT is 2.8GHz, 8C/16T Zeppelin should score: 833 / 872 / 912 / 952 / 991 (5 - 25% SMT yield)

If the sustained base clock during CB R15 MT is 3.05GHz, 8C/16T Zeppelin should score: 907 / 950 / 994 / 1037 / 1080 (5 - 25% SMT yield)

The single threaded score should be around 113 points at 3.2GHz.
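The two score rows above appear consistent with a simple model: a no-SMT 8-core baseline that scales linearly with clock, multiplied by (1 + SMT yield). A minimal sketch of that reading follows. The model itself and the baseline (back-solved from the posted 5% figure at 2.8GHz) are assumptions made for illustration; the post does not state how the numbers were produced.

```python
# Back-of-envelope reconstruction of the Cinebench R15 MT estimates above.
# Assumption: score = no-SMT baseline * clock ratio * (1 + SMT yield).
# The baseline is inferred from the posted 5% figure, not from any AMD data.

POSTED = {
    2.80: [833, 872, 912, 952, 991],
    3.05: [907, 950, 994, 1037, 1080],
}
SMT_YIELDS = [0.05, 0.10, 0.15, 0.20, 0.25]

# No-SMT 8C score at 2.8 GHz implied by the posted 5% entry.
BASELINE_2_8 = POSTED[2.80][0] / (1 + SMT_YIELDS[0])


def estimate(clock_ghz: float, smt_yield: float) -> float:
    """Scale the inferred baseline linearly with clock, then apply SMT yield."""
    return BASELINE_2_8 * (clock_ghz / 2.80) * (1 + smt_yield)


def required_yield(target: float, clock_ghz: float) -> float:
    """SMT yield needed to reach a target score at the given clock."""
    return target / (BASELINE_2_8 * clock_ghz / 2.80) - 1


if __name__ == "__main__":
    for clock, posted in POSTED.items():
        modelled = [round(estimate(clock, y)) for y in SMT_YIELDS]
        print(f"{clock:.2f} GHz  posted={posted}  modelled={modelled}")
    target = 965  # 643 * 1.5 = 964.5, rounded up as in the post
    print(f"required SMT yield at 3.05 GHz: {required_yield(target, 3.05):.1%}")
```

Under these assumptions the modelled values land within one point of every posted figure, and the yield needed to reach 965 points at 3.05GHz comes out close to the ~11.7% quoted earlier in the post.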

Meanwhile Intel 8C/16T parts would score as follows, at the same clocks:

Sandy Bridge-E = 971 / 1058 (2.8GHz / 3.05GHz)
Ivy Bridge-EP = 1002 / 1091 (2.8GHz / 3.05GHz)
Haswell-E = 1091 / 1189 (2.8GHz / 3.05GHz)
Broadwell-E = 1117 / 1217 (2.8GHz / 3.05GHz)

Intel ST at 3.2GHz:

Sandy Bridge = 114
Ivy Bridge = 117
Haswell = 128
Broadwell = 136
Skylake = 139

Based on these figures 8C/16T Zeppelin clocked to 2.8GHz / 3.05GHz would compete in multithreaded performance with i7-6700K (except in AVX2, most likely).
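To put the gap in relative terms, the sketch below computes how far each Intel estimate sits ahead of the Zeppelin estimate at the same clock. It uses only the numbers posted above, all of which are the poster's estimates rather than measurements; the helper name `lead` is made up for this example.

```python
# Relative gaps derived only from the estimates posted above.
# Every input is the poster's projection, not a measured result.

ZEN_MT_2_8 = (833, 991)  # 8C/16T at 2.8 GHz: (5% SMT yield, 25% SMT yield)
INTEL_MT_2_8 = {
    "Sandy Bridge-E": 971,
    "Ivy Bridge-EP": 1002,
    "Haswell-E": 1091,
    "Broadwell-E": 1117,
}
ZEN_ST = 113  # single-threaded estimate at 3.2 GHz
INTEL_ST = {
    "Sandy Bridge": 114,
    "Ivy Bridge": 117,
    "Haswell": 128,
    "Broadwell": 136,
    "Skylake": 139,
}


def lead(intel: float, zen: float) -> float:
    """Fractional lead of the Intel estimate over the Zen estimate."""
    return intel / zen - 1


if __name__ == "__main__":
    low, high = ZEN_MT_2_8
    for name, score in INTEL_MT_2_8.items():
        print(f"{name:15s} MT lead at 2.8 GHz: "
              f"{lead(score, high):+.1%} to {lead(score, low):+.1%}")
    for name, score in INTEL_ST.items():
        print(f"{name:15s} ST lead at 3.2 GHz: {lead(score, ZEN_ST):+.1%}")
```

On these inputs the single-threaded gap to Skylake is about 23% per clock, while the multithreaded gap to Broadwell-E depends heavily on which SMT yield is assumed.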

Sweepr · Jul 17, 2016

mrmt said:
Would you care to explain how Intel's dual-ring solution for 32 cores would be worse than the MCM solution AMD is going to field with Zen? Unless AMD discovered a magical pixie dust that will magically solve the problem, AMD's solution will be even worse than Intel's.

Personally can't wait to see how this killer 4-way-MCM 32-core chip fares against the 28-core Skylake-EP/EX that will be available by the time of its release. 🙂

AMD Polaris said:
AM4 8 cores with 95W TDP
AM4 4 cores with 65W TDP

The most exciting part is core clock. The 8c/95W variant's base clock is 2.8GHz, all core boost is 3.05GHz and maximum boost is 3.2GHz.
The 4c/65W part's clock is the same.

Trouble clocking above 3GHz?

mrmt · Jul 17, 2016

FieryUP said:
GMI and SDF will do the magic. I'm sure you know what they stand for and how they work 😉

I'm not interested in magic, I'm interested in how they are technically better than Intel's ring solution, and it seems you do not have an answer; you are expecting a miracle. I'm expecting either huge latencies or very poor efficiency from AMD's solution. It is a cheapskate solution for a market that has very little tolerance for it.

FieryUP said:
As for 32 cores in BDW-EP/EX: there are certain hops that you can quite easily pile up and make things worse. The data packets have to travel around in those 2 rings, and the size of the rings (the number of hops) is quite an important factor. Yes, you could add 4 more cores to each ring, but I'm sure in many workloads it would make things worse, even at core clocks similar to those of the current top-of-the-line 24-core BDW-EX... AMD's solution doesn't use any rings BTW.

I'm not sure you are familiar with the current Xeon layout. They have four blocks of 6 cores each, each pair of blocks is connected by two rings and there is a bridge between the two rings of each pair, so you don't add four cores to each ring to reach 32 cores, you add two, and FYI the top Skylake-EP, the one that will compete with Zen, will have 28 cores.

Doom2pro · Jul 17, 2016

About the speculated clocks from Mr. Polaris, the real question is: do early Engineering Samples clock near final product clocks?

😀

bigboxes · Jul 17, 2016

mrmt said:
It doesn't matter whether the workload can scale over 32 cores with decent throughput, but whether the workload can be processed in an efficient manner. This is a glass jaw in AMD's architecture, a trade-off they are making. Whether they are making it due to 1) lack of resources, 2) being delusional or 3) a disruptive business strategy is open to question, but given that Intel has much better access to server market customers than AMD can dream of, I wouldn't put my bets on 3). I would go for 1) but wouldn't discard 2).

The way you talk it's like you are rooting for Intel. Why does it matter to you who makes the best product? Let's just see what happens and then buy what works best for the given application. Can we even agree on that? Maybe you work for/have stock in Intel.

FieryUP · Jul 17, 2016

mrmt said:
I'm not sure you are familiar with the current Xeon layout. They have four blocks of 6 cores each, each pair of blocks is connected by two rings and there is a bridge between the two rings of each pair, so you don't add four cores to each ring to reach 32 cores, you add two, and FYI the top Skylake-EP, the one that will compete with Zen, will have 28 cores.

I'm familiar, but the 2 rings I've mentioned are still just two 2-way rings, and 12 cores are connected to each other by a double ring:

You would have to add 33% more cores to each ring to squeeze 32 cores into the BDW-EP/EX die. That's not a small number in an architecture that's already sub-optimal (IMHO). No surprise that in SKL-EP/EX Intel will switch to a 3-ring (3 double-ring, if you will) solution.
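The 33% figure and the hop-count concern can be illustrated with a toy model. The sketch assumes one stop per core on a bidirectional ring, 12 cores per ring today versus 16 in a hypothetical 32-core die, and messages that always take the shorter direction. Real Broadwell rings also carry non-core stops (memory controllers, I/O, the bridges between rings), which this model ignores, so treat the numbers as an illustration of the trend only.

```python
# Toy model: N stops on a bidirectional ring; each message takes the shorter way.
# Assumption: one stop per core and nothing else on the ring.


def mean_hops(stops: int) -> float:
    """Average shortest-path hop count between two distinct ring stops."""
    total = sum(min(d, stops - d) for d in range(1, stops))
    return total / (stops - 1)


def worst_hops(stops: int) -> int:
    """Longest shortest-path distance on a bidirectional ring."""
    return stops // 2


if __name__ == "__main__":
    current, hypothetical = 12, 16  # cores per ring: 24-core die vs. 32-core die
    print(f"core increase per ring: {hypothetical / current - 1:.0%}")
    for n in (current, hypothetical):
        print(f"{n} stops: mean {mean_hops(n):.2f} hops, worst {worst_hops(n)} hops")
```

In this simplified picture, going from 12 to 16 stops raises the mean distance from roughly 3.3 to 4.3 hops and the worst case from 6 to 8.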

krumme · Jul 17, 2016

The Stilt said:
If those clocks stated by AMD Polaris are even remotely true, I cannot say I'm too surprised. They align perfectly with the latest of my estimations I made in post #1259 (2600MHz (±200MHz) base and 3200MHz (±200MHz) maximum boost).

As explained in post #2044 I assumed that AMD used 2:1 scale in their famous "Orochi vs. Summit" and "Excavator vs. Zen" charts.

Also, if the chart was in fact about Cinebench R15 (which is highly likely), then the "Orochi" CPU they used in the comparison is likely to be the FX-8350 / FX-8370 (and not the -E model I previously speculated). These two CPUs have the same base clock and TDP, meaning they will perform identically in multithreaded workloads. The FX-8350 / FX-8370 scores 643 points in Cinebench R15 MT. Multiply that by 1.5x and you'll end up with 965 points. Based on the expected IPC improvement and the clock estimation, Zeppelin should match this target at the 3050MHz "all core boost" if the SMT yield is ~11.7%. A pretty plausible SMT yield for a first implementation, IMO.

I ran the claimed figures through (with the expectation that the IPC is exactly 40% over Excavator).

Used Cinebench R15 only, since it is a properly threaded workload which doesn't use modern instructions.

If the sustained base clock during CB R15 MT is 2.8GHz, 8C/16T Zeppelin should score: 833 / 872 / 912 / 952 / 991 (5 - 25% SMT yield)

If the sustained base clock during CB R15 MT is 3.05GHz, 8C/16T Zeppelin should score: 907 / 950 / 994 / 1037 / 1080 (5 - 25% SMT yield)

The single threaded score should be around 113 points at 3.2GHz.

Meanwhile Intel 8C/16T parts would score as follows, at the same clocks:

Sandy Bridge-E = 971 / 1058 (2.8GHz / 3.05GHz)
Ivy Bridge-EP = 1002 / 1091 (2.8GHz / 3.05GHz)
Haswell-E = 1091 / 1189 (2.8GHz / 3.05GHz)
Broadwell-E = 1117 / 1217 (2.8GHz / 3.05GHz)

Intel ST at 3.2GHz:

Sandy Bridge = 114
Ivy Bridge = 117
Haswell = 128
Broadwell = 136
Skylake = 139

Based on these figures 8C/16T Zeppelin clocked to 2.8GHz / 3.05GHz would compete in multithreaded performance with i7-6700K (except in AVX2, most likely).

My one and only guess was 2.5 / 3.2 🙂
http://forums.anandtech.com/showthread.php?p=38267634
No surprises here IMO.
And I expect relatively weak SMT too.
Where I do disagree is the ST IPC assessment you build your performance estimates on.
Is it based on some PPT and weak PR talk?
I think when they are gunning for Broadwell IPC on integer (and why shouldn't they, especially that wide), that's more or less what they will get. HSW integer ST IPC in CB. FP more like Ivy.
Efficiency will be the key. Then total system cost.

mrmt · Jul 17, 2016

FieryUP said:
I'm familiar, but the 2 rings I've mentioned are still just two 2-way rings, and 12 cores are connected to each other by a double ring:

It is not a ring, it is a bridge. But since you said that Intel's solution is suboptimal, what would be the optimal solution for you? And how is AMD's solution better than Intel's suboptimal solution?

The Stilt · Jul 17, 2016

Doom2pro said:
About the speculated clocks from Mr. Polaris, the real question is: do early Engineering Samples clock near final product clocks?

😀

On all designs since Trinity, EVT parts have existed at final clocks. Orochi B (Bulldozer) reached its final clocks with its second stepping from the launch stepping (B0 vs. B2 release). A1 stepping already reached 3.6GHz.

If AMD plans to launch Zeppelin in 2016, they must have the final silicon available by now. Therefore most likely A0 (which was displayed by Lisa Su) is the final stepping for Zeppelin.
