Design changes in Zen 2 (CPU/core/chiplet only)

TheELF · Jan 16, 2019

ub4ty said:
Single-threaded performance walls hit... Industry transitions to Multi-core. Multi-core architectures have negative impacts on single threaded performance thus the divergence in performance going forward on single thread as reflected in both slide captions.

Just because manufacturing hits a wall and focuses on multi-core (because they don't have a choice) doesn't mean that single thread has lost its importance; it just means that the industry can't improve it anymore.

beginner99 · Jan 16, 2019

TheELF said:
It's not, this is just as backwards thinking as what uby does. Even if we had a 20 GHz core, two 20 GHz cores would still double the performance of a lot of workloads, and three or four or eight of them would still multiply the performance by that much.

So would a 40, 60 or 80 GHz core.

I mean such a core would be huge and putting 2 on the same die would not be possible due to die size limits.

TheELF · Jan 16, 2019

beginner99 said:
So would a 40, 60 or 80 GHz core.

I mean such a core would be huge and putting 2 on the same die would not be possible due to die size limits.

And x times one of those would still give you x times the performance. Do we know how big a 20 GHz core would be, and whether even one of those would fit on a die?

CatMerc · Jan 17, 2019

You need single threaded performance to keep increasing together with multi core, there's always going to be inherently serial work that needs to be done, I don't even know why this is debated. No sane microprocessor engineer will tell you that you don't need to keep pushing single threaded wherever possible within the power and transistor budget.
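The "inherently serial work" argument is usually formalised as Amdahl's law. A minimal sketch follows, assuming a workload that splits cleanly into a serial part and a perfectly parallel part; the function name and the 10% serial fraction are illustrative choices, not figures from this thread.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup over one core when only the parallel part scales with core count."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part always takes the same time; the parallel part is split across cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed example: 10% of the work cannot be parallelised.
    for cores in (2, 8, 64):
        print(cores, "cores ->", round(amdahl_speedup(0.10, cores), 2), "x")
```

With a fixed serial fraction the speedup can never exceed 1 / serial_fraction however many cores are added, which is why faster single-thread execution still matters in this model.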

Topweasel · Jan 17, 2019

CatMerc said:
You need single threaded performance to keep increasing together with multi core, there's always going to be inherently serial work that needs to be done, I don't even know why this is debated. No sane microprocessor engineer will tell you that you don't need to keep pushing single threaded wherever possible within the power and transistor budget.

The point is an area of diminishing returns. You can only do so much to prevent problems with IO and other problems that can stall a thread. Otherwise we would have 4GHz Pentiums. We started seeing parallel computing with OoO with the Pentium Pro because of these various issues. Multi-socket and multi-core computing is just the continuation of this: multitasking was needed, and as Intel and AMD hit walls in terms of properly keeping a core fed, more cores were needed. There is always a compromise, and power usage is only one of them; total and effective compute performance are in the end all that matters. While Intel, for example, happens to be the leader in ST performance, they still have to make the same trade-off. It's also why they changed the cache and inter-core communication on the die to a mesh for SL-X. They had the ability and resources to dedicate to driving up heavy parallel communication at the cost of ST performance while leaving the general user config alone. Trying to keep ST performance a priority died a long time ago, even if Intel can showcase it in a CPU each gen. In the end it's about the effort to increase compute power, and driving up ST performance isn't always the answer, and quickly stopped being it. Otherwise we wouldn't have had nearly a decade of minimal increases in IPC on top of the clock increases.

So while you aren't wrong you kind of are. Engineers talk compute power. Not single core performance. Because single core performance doesn't matter nearly as much. Multi-core computing has taken the edge off of ST developments, and soon, when AMD has massaged as much performance out of their architecture as Intel has, maximizing core count and clock speed within a production and power limitation will be their only worry. Just because both realize they can get better ST performance doesn't mean it's a primary, or even secondary, focus.

Atari2600 · Jan 17, 2019

Topweasel said:
We started seeing parallel computing with OoO with the Pentium Pro because of these various issues.

I would avoid bringing instruction level parallelism[1] into this argument, which is really centred on un-compiled software level parallelism[2].

It adds another layer of complexity that isn't necessary to the discussion here!

[1] Where the software coder has no awareness of, or concern for, how the OoOE deals with his/her code.

[2] Where the software coder is specifying and controlling the multi-threading aspects of the code (and each thread is still subject to the OoOE performing instruction level parallelism anyway).

OoOE improves ST performance. It does not superscale MT performance beyond the individual contributions of each single thread.

CatMerc · Jan 17, 2019

Topweasel said:
The point is an area of diminishing returns. You can only do so much to prevent problems with IO and other problems that can stall a thread. Otherwise we would have 4GHz Pentiums. We started seeing parallel computing with OoO with the Pentium Pro because of these various issues. Multi-socket and multi-core computing is just the continuation of this: multitasking was needed, and as Intel and AMD hit walls in terms of properly keeping a core fed, more cores were needed. There is always a compromise, and power usage is only one of them; total and effective compute performance are in the end all that matters. While Intel, for example, happens to be the leader in ST performance, they still have to make the same trade-off. It's also why they changed the cache and inter-core communication on the die to a mesh for SL-X. They had the ability and resources to dedicate to driving up heavy parallel communication at the cost of ST performance while leaving the general user config alone. Trying to keep ST performance a priority died a long time ago, even if Intel can showcase it in a CPU each gen. In the end it's about the effort to increase compute power, and driving up ST performance isn't always the answer, and quickly stopped being it. Otherwise we wouldn't have had nearly a decade of minimal increases in IPC on top of the clock increases.

So while you aren't wrong you kind of are. Engineers talk compute power. Not single core performance. Because single core performance doesn't matter nearly as much. Multi-core computing has taken the edge off of ST developments, and soon, when AMD has massaged as much performance out of their architecture as Intel has, maximizing core count and clock speed within a production and power limitation will be their only worry. Just because both realize they can get better ST performance doesn't mean it's a primary, or even secondary, focus.

I never said the rest of the system isn't pushed too. There's a balance to strike that AMD and Intel work very hard on figuring out, modeling workloads years in advance, and they bet billions on it being the right one. There isn't one correct answer, but there is a close enough answer for the majority of workloads at a particular point in time.

moinmoin · Jan 18, 2019

So what do you guys think all of that tells us about the design of Zen 2 (this thread's topic) and future Zen design iterations? Zen 2 is already going wider and increasing the cache sizes, which seem to be the most obvious areas for improvement, aside from latency, which AMD hasn't talked about yet.

TheELF · Jan 18, 2019

moinmoin said:
So what do you guys think all of that tells us about the design of Zen 2 (this thread's topic) and future Zen design iterations? Zen 2 is already going wider and increasing the cache sizes, which seem to be the most obvious areas for improvement, aside from latency, which AMD hasn't talked about yet.

Zen+ is already so wide that it gets another (what was it, 40%?) more performance from SMT in Cinebench...
Going wider and wider when there is no single piece of software that can support it is pointless.
Kaby is only 4 wide while Zen/Zen+ is 5 wide, and Kaby doesn't have all the issues Zen has with thread migration, CCX interconnect and so on.

Atari2600 · Jan 18, 2019

moinmoin said:
So what do you guys think all of that tells us about the design of Zen 2 (this thread's topic) and future Zen design iterations? Zen 2 is already going wider and increasing the cache sizes, which seem to be the most obvious areas for improvement, aside from latency, which AMD hasn't talked about yet.

The last couple of pages tell us nothing except that we are all stubborn!

The first post in the thread is probably the most informative for what makes sense for AMD to have changed.

Olikan · Jan 18, 2019

Bandwidth increase in branch prediction unit and level 1 instruction cache

https://patents.justia.com/patent/10127044

This patent was granted on the same day as the New Horizon event... and it "surprisingly fits" most of the front-end changes in Zen 2.

Topweasel · Jan 18, 2019

CatMerc said:
I never said the rest of the system isn't pushed too. There's a balance to strike that AMD and Intel work very hard on figuring out, modeling workloads years in advance, and they bet billions on it being the right one. There isn't one correct answer, but there is a close enough answer for the majority of workloads at a particular point in time.

Okay, but then we get away from the original conversation. They did find a balance, and I am not saying they will never increase ST performance, but only in the sense that increased per-core performance also increases total compute power for the package. But let's say Intel decides they want 400mm², 600mm² and 800mm² dies for server. If they could make the CPUs run 5% better but the core size increased to the point that they would lose 2 cores on the 400mm² die, 4 on the 600mm² die and 6 on the 800mm² die, they would shelve those upgrades. There is a balance there, like the huge footprint of AVX-512 on SL-X. But that's because the companies buying 10k versions of the chip want it, because it can drastically increase performance in their workloads. But the balance is always on compute power; ST performance is only pushed in the sense of how it can affect overall compute power.

CatMerc · Jan 19, 2019

Topweasel said:
Okay, but then we get away from the original conversation. They did find a balance, and I am not saying they will never increase ST performance, but only in the sense that increased per-core performance also increases total compute power for the package. But let's say Intel decides they want 400mm², 600mm² and 800mm² dies for server. If they could make the CPUs run 5% better but the core size increased to the point that they would lose 2 cores on the 400mm² die, 4 on the 600mm² die and 6 on the 800mm² die, they would shelve those upgrades. There is a balance there, like the huge footprint of AVX-512 on SL-X. But that's because the companies buying 10k versions of the chip want it, because it can drastically increase performance in their workloads. But the balance is always on compute power; ST performance is only pushed in the sense of how it can affect overall compute power.

They model things depending on simulations of workloads. If their suite of workloads that they predict will be commonplace performs better with the 5% faster single thread than two extra cores, then they'll make that choice. The 28 core number on Skylake-X was very carefully chosen.
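The "5% faster single thread versus two extra cores" choice can be put into a toy model. This is only a sketch under an Amdahl-style assumption (a serial part plus a perfectly parallel part); the 28-core baseline is borrowed from the Skylake-X figure mentioned above, the 5% and two-core numbers come from the hypothetical in the quoted post, and all function and parameter names are made up for illustration. Real design teams use workload simulation, not a formula like this.

```python
def relative_throughput(serial_fraction: float, cores: int, per_core_speed: float) -> float:
    """Throughput relative to one baseline core, Amdahl-style."""
    parallel_fraction = 1.0 - serial_fraction
    # Both the serial and the parallel part run faster on a faster core.
    run_time = (serial_fraction + parallel_fraction / cores) / per_core_speed
    return 1.0 / run_time


def better_design(serial_fraction: float,
                  base_cores: int = 28,   # assumed baseline core count
                  lost_cores: int = 2,    # cores given up for the bigger core
                  st_gain: float = 1.05   # assumed single-thread uplift
                  ) -> str:
    """Compare 'fewer, faster cores' against 'more, baseline cores'."""
    fast = relative_throughput(serial_fraction, base_cores - lost_cores, st_gain)
    wide = relative_throughput(serial_fraction, base_cores, 1.0)
    return "faster cores" if fast > wide else "more cores"


if __name__ == "__main__":
    for serial in (0.0, 0.01, 0.05, 0.20):
        print(f"serial fraction {serial:.2f}: {better_design(serial)}")
```

In this model the answer flips with the workload mix: perfectly parallel work favours the extra cores, while even a modest serial share favours the faster core. That is one way to read both sides of the exchange above.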

amd6502 · Jan 19, 2019

I think SMT4 and a wider int core might be a possibility for Zen3. Furthermore, I think Zen3 might happen on FDSOI to lower design cost and chase a lower transistor count product line. Zen3 would complement and not replace Zen2/Zen2+.

Zen2 seems like a specialist release meant for very heavy FPU load.

They may go the opposite direction in Zen3 with an int heavy core.

SMT4 would have the nice effect of bringing back single core CPUs.

DrMrLordX · Jan 19, 2019

amd6502 said:
FDSOI

Not likely. Besides, if we start talking about SOI, then you-know-who will show up and . . .

amd6502 · Jan 19, 2019

DrMrLordX said:
Not likely. Besides, if we start talking about SOI, then you-know-who will show up and . . .

If not FDSOI then I think it will stay FinFET with 12LP. I think they will have an 8-wide (5+3) int core. It will have really high IPC.

DrMrLordX · Jan 19, 2019

Zen3 not replacing Zen2 would be weird. But that's pretty far out, and will be on a different socket running DDR5.

somethingclever · Jan 20, 2019

amd6502 said:
I think SMT4 and a wider int core might be a possibility for Zen3. Furthermore, I think Zen3 might happen on FDSOI to lower design cost and chase a lower transistor count product line. Zen3 would complement and not replace Zen2/Zen2+.

Zen2 seems like a specialist release meant for very heavy FPU load.

They may go the opposite direction in Zen3 with an int heavy core.

SMT4 would have the nice effect of bringing back single core CPUs.

Greater SMT for x86 would just move the bottleneck to instruction decode - isn't that the bottleneck already?

itsmydamnation · Jan 20, 2019

somethingclever said:
Greater SMT for x86 would just move the bottleneck to instruction decode - isn't that the bottleneck already?

Yes

The only workloads SMT4 makes sense for are I/O, DB and high cache miss.
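One rough way to see why stall-heavy workloads are the ones that benefit is the classic multiprogramming utilisation model: if each hardware thread is stalled (on I/O or a cache miss) for some fraction of the time, the core has work to do whenever at least one thread is not stalled. The sketch below assumes threads stall independently and ignores the shared decode and cache contention raised above, so it is optimistic; the names are hypothetical.

```python
def core_utilization(stall_fraction: float, threads: int) -> float:
    """Probability that at least one hardware thread is ready to run.

    Assumes every thread is stalled independently for `stall_fraction`
    of the time, which real workloads only approximate.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 - stall_fraction ** threads


def smt_gain(stall_fraction: float, threads_from: int = 2, threads_to: int = 4) -> float:
    """Utilisation ratio when moving from one SMT width to another."""
    return (core_utilization(stall_fraction, threads_to)
            / core_utilization(stall_fraction, threads_from))


if __name__ == "__main__":
    for stall in (0.2, 0.5, 0.8):
        print(f"stall {stall:.0%}: SMT2 -> SMT4 gain {smt_gain(stall):.2f}x")
```

Under these assumptions, going from two to four threads adds only a few percent when threads rarely stall, and far more when they stall most of the time.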

Topweasel · Jan 20, 2019

CatMerc said:
They model things depending on simulations of workloads. If their suite of workloads that they predict will be commonplace performs better with the 5% faster single thread than two extra cores, then they'll make that choice. The 28 core number on Skylake-X was very carefully chosen.

It's based on the workloads of the customers that will be purchasing it. 28c was decided based on several factors: manageable clock rates being one, die size, yield, AVX-512 implementation, power usage, mesh complexity and so on.

What you are describing is why Intel offers multiple dies for SL-X: so that people who have to worry about single-thread performance or overall clock speed (like someone needing multiple high-speed single-function VMs) are covered. They decide that balance. But Intel didn't build the SL-X architecture for the LCC die for fast ST. They designed it for the 28c chip, and the design trickles down to the other dies. Realistically, once you have broken away from the consumer dies you have left the world of ST performance, and in fact even at similar clocks you can see it in the benchmarks against its i7 cousins. They take a slight IPC hit to handle parallel computing better, so that at the upper end of core usage the performance is manageable and predictable. Much like the Zen arch was built to manage. So again, total compute power is king.

Kedas · Jan 20, 2019

DrMrLordX said:
Zen3 not replacing Zen2 would be weird. But that's pretty far out, and will be on a different socket running DDR5.

From what I see and hear from AMD I think Zen3 will exist on AM4 and AM5.
Due to their I/O die chiplet design they can just swap the I/O die for AM4 or AM5.
And since it's not on a 7nm+ process they can already build, test and fine-tune DDR5 I/O now without delay.
That is one of the big advantages of using chiplets: an early and smooth transition to DDR5.
You want to use your current DDR4 with the latest CPU design? No problem, no need to buy new DDR5.

Also, to avoid confusion, there is no Zen2+; it's called Zen3 although it's 7nm+.

amd6502 · Jan 21, 2019

Kedas said:
From what I see and hear from AMD I think Zen3 will exist on AM4 and AM5.
Due to their I/O die chiplet design they can just swap the I/O die for AM4 or AM5.
And since it's not on a 7nm+ process they can already build, test and fine-tune DDR5 I/O now without delay.
That is one of the big advantages of using chiplets: an early and smooth transition to DDR5.
You want to use your current DDR4 with the latest CPU design? No problem, no need to buy new DDR5.

Also, to avoid confusion, there is no Zen2+; it's called Zen3 although it's 7nm+.

I hadn't remembered the old road map; just found it.

Zen3 is 7nm+ and on track. Zen4 design is surprisingly near design completion. Maybe Zen4 will be SMT4?

The key takeaway from AMD’s event is their roadmap. A predictable roadmap helps improve customers confidence in the platform. AMD wanted to show that they are capable of laying out a roadmap and execute on it. To that end, AMD expects Zen 2 to launch in 2019. Zen 3 is on track and Zen 4 is at the design completion phase.

https://fuse.wikichip.org/news/1815/amd-discloses-initial-zen-2-details/

I think TR4 and Epyc need DDR5 and a redesign more badly than AM4.

I don't think the next TR socket needs all that much wattage, given the efficiency gains. They should work on TR affordability and graphics output.

Xpage · Jan 22, 2019

I think the chiplet design is a great move and it's about time it finally happened. I think they could eventually go FD SOI for that I/O uncore portion or use FD SOI for a DRAM chip, phase memory or other large cache on chip or even on the I/O die. This would work especially well if GF ever gets a 1x nm FD SOI process working and if it holds true that fewer masks overall from FD SOI will save AMD money if they make a portion of the chip on FD SOI.

All highly speculative though and mainly dependent on GF, and since AMD has been burned by GF before, I think AMD will stick with 14nm or 12nm FinFET for quite some time.

DrMrLordX said:
Not likely. Besides, if we start talking about SOI, then you-know-who will show up and . . .

I laughed at this. But you need to say his name 3 times though. Wonder if it will work... Nostajuice

DrMrLordX · Jan 22, 2019

Xpage said:
I laughed at this. But you need to say his name 3 times though. Wonder if it will work... Nostajuice

Stahp!!! That's almost as bad as saying ~~Juangra~~ Internet Strong Man. Oops sorry.

amd6502 said:
Maybe Zen4 will be SMT4?

Honest question: what makes you think AMD is interested in SMT4? Consider that currently AMD recycles the same core through their entire product lineup, all the way from mobile APUs to server chips. SMT4 would require a very wide core to be useful. Do we want/need that kind of core width in a mobile product? Isn't it easier to just go with a smaller, narrower core - if you can call anything Zen narrow, which it isn't really - and then use more cores to achieve greater thread parallelism?

amd6502 · Jan 22, 2019

DrMrLordX said:
Stahp!!! That's almost as bad as saying ~~Juangra~~ Internet Strong Man. Oops sorry.

Honest question: what makes you think AMD is interested in SMT4? Consider that currently AMD recycles the same core through their entire product lineup, all the way from mobile APUs to server chips. SMT4 would require a very wide core to be useful. Do we want/need that kind of core width in a mobile product? Isn't it easier to just go with a smaller, narrower core - if you can call anything Zen narrow, which it isn't really - and then use more cores to achieve greater thread parallelism?

I think SMT4 would be great for mobile. You only need one core with SMT4. It would be a counterbalance to the overly FPU heavy Zen2 (and Zen3, if Zen3 retains the FPU and isn't architecturally very different). Higher IPC would also allow it to clock lower, possibly getting good perf/watt.

With 8-wide integer pipe they can double up decoders (2x 4 instructions/cycle, just like from PD to Steamroller). Many components already exist on SOI. Doing it on FDX would give them sort of a semi-clean slate. Redo things (better) while knowing exactly what to do.

And it's not just 1c/4t monolithic mobile; one could do desktop and server too with compatible chiplets. One could fit maybe 3 or 4 cores on 22FDX (it really depends on the cache used), so that's a whole lot of threads. So a single-chiplet DT part could have 12 or more threads. A four-chiplet server part could have 48+ threads.
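The thread counts in that last paragraph are simple multiplication. A tiny sketch to make the arithmetic explicit; the function name is invented, and the core counts per chiplet are the poster's guesses, not known figures.

```python
def hardware_threads(chiplets: int, cores_per_chiplet: int, smt_ways: int = 4) -> int:
    """Total hardware threads for a hypothetical SMT4 chiplet layout."""
    return chiplets * cores_per_chiplet * smt_ways


if __name__ == "__main__":
    # Speculated 3 or 4 cores per chiplet, one chiplet (desktop) or four (server).
    for chiplets in (1, 4):
        for cores in (3, 4):
            print(chiplets, "chiplet(s) x", cores, "cores ->",
                  hardware_threads(chiplets, cores), "threads")
```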
