Design changes in Zen 2 (CPU/core/chiplet only)


ub4ty

Senior member
Jun 21, 2017
Overall, modern processor architecture is not a cakewalk.
You would need a full plate of undergraduate and graduate coursework in computer engineering, with a focus on computer architecture, coupled with a wealth of industry experience, to even begin accurately dissecting these designs, making suggestions, or comparing Intel to AMD. From a K.I.S.S. perspective, single-threaded and multi-threaded performance are often in opposition, because architectural changes that improve multi-threaded throughput tend to cost latency. You can widen the pipes, but the highway still terminates and hits local roads. SIMD and floating-point tasks are best suited to GPUs and dedicated accelerators; it's absolutely moronic to load down a CPU with such tasks. For every "AMD should do X, Y, Z for single-threaded performance" there's a trade-off that impacts power utilization, throughput, etc.

The future is multi-threaded, and has been for some time. It is the job of software engineers to catch up, not AMD's job to revert to yesteryear's battles, especially for a vocal minority of never-ending take-my-money gamers who likely know nothing about the software or computer engineering disciplines.

AMD has made a processor centered around multi-threaded workflows. The dies are hand-me-downs from EPYC. This should excite anyone with a formal understanding of computing: you are getting a server-grade architecture. This is what attracted me to AMD's platform. It is highly scalable, and there is uniformity from desktop to server, now even more so. Single-threaded performance has been sufficient since the first generation of Zen. Software authored for modern hardware should not be heavily reliant on single-threaded performance, and that goes for game engines (yes, I'm looking at you, severely antiquated code). When you do code analysis and a single process/thread is mucking up performance, you have crappy code. Period.

The future is massively multi-threaded. Single-threaded performance is for the birds and for soon-to-be-defunct software.

https://www.eteknix.com/world-of-tanks-engine-get-massive-multi-thread-overhaul/

[Screenshots from the linked article: World of Tanks benchmarks, old engine vs. new multi-threaded engine]


Get with the times, people. Stop yapping about Intel's yesteryear processors, which were tuned for single-threaded performance and can't scale. And stop pretending you can outwit people with PhDs in multiprocessor architecture and decades of industry experience. Single-threaded performance is sufficient, and when it is sufficient, you can focus on the far more important and complex work of multi-threading and scaling core counts. AMD is pursuing a beautiful roadmap. Single-threaded performance is going to be what it's going to be: sufficient. Core scaling is the focus, and has been from the start. The attention should be on the revolutionary new chiplet design and the architecture therein. Meanwhile, people are fixated on undergraduate Comp Arch 101 single-threaded performance and e-celeb IPC values: a vocal minority of screechers who won't be the volume buyers and aren't focused on the future. Learn how to write proper software!
 

ub4ty

Senior member
Jun 21, 2017
AMD took a page out of the networking industry's book on how to scale, and continues to do so.
You essentially have a data center on a chip.
[Diagram: X-Fabric architecture overview]



^ Show me a "muh single-threaded performance" chip with this kind of throughput...

People are screeching about single-threaded performance while AMD has just reinvented CPUs for years to come. Someone who plays video games thinks their workflow is more significant than a processor architecture targeted at million-dollar installs that push the boundaries of computing. AMD has pushed the central I/O processor all the way down to the desktop; you have a server-grade architecture at your fingertips. If you're still talking about single-threaded performance and not this revolutionary mystery box, you're missing the point entirely. Single-threaded performance only needs to be sufficient. What matters now is multi-threaded tasks and core scaling.

I'm off until more official details and developments come from AMD. This intermediary noise is exhausting and fruitless. AMD will do a Hot Chips talk and give the detailed run-down. What they deliver will be what they deliver. Performance, clocks, and power utilization will improve. A 16-core is now possible on AM4, and EPYC's core count has been doubled from 32 to 64. That's what matters. They did what they needed to do architecturally to preserve performance while doubling core count. Grad courses will probably be taught from this for years to come. A new standard has been set, and the madmen and madwomen at AMD made it available to desktop users.
 

coercitiv

Diamond Member
Jan 24, 2014
Eagerly awaiting the inevitable big+little CPU designs, now that I/O has been centralized and inter-die communication has evolved.
Oh god, that's exactly what we need: a big-LITTLE chiplet strategy with I/O chiplet level coordination.

We are this close to a class-action suit; the absurd amount of energy AMD made us waste by speculating on these forums must be illegal. I miss monolithic already. /s
 

moinmoin

Diamond Member
Jun 1, 2017
Regarding the discussion about the balance of ST and SMT, it's pretty funny that wikichip's page for the future Zen 3 currently suggests it's getting 4-way SMT (no source mentioned, and I couldn't find a second source myself either, so take it with a big grain of salt, since the page is user-editable, etc.).
 

ub4ty

Senior member
Jun 21, 2017
Eagerly awaiting the inevitable big+little CPU designs, now that I/O has been centralized and inter-die communication has evolved.
Indeed, the game changer is the interconnect; that I/O chip is where all the action is now. AMD's long-term goal has been HSA, and a universal interconnect with widespread industry buy-in is what makes it possible. They've been laying the groundwork for a progression to a universal fabric that any compute unit or accelerator can hook into since the launch of Ryzen, and a lot of network hardware has uncanny similarities. When Ryzen launched, I knew immediately where they were headed and why it was such a game changer. Being able to scale, and to scale cost-effectively, is what changed everything. Being able to buy an 8-core for half the price that quad-cores sold for was only possible because of the methodology AMD pursued, and with every release they keep driving this home. The one thing I want to learn far more about is what in the world is going on in that I/O chip. Summer can't come soon enough!

Again, it's all about the interconnects.

Single-threaded performance is a dead, yesteryear story. The future is multi-threaded performance, interconnects, and scaling. Whatever you have to do architecturally to mitigate the negative impacts on per-thread performance, you do, which is essentially what all the bells and whistles in modern chip architectures are for anyway: a giant soup of hacks and shortcuts.
 

beginner99

Diamond Member
Jun 2, 2009
Get with the times, people. Stop yapping about Intel's yesteryear processors, which were tuned for single-threaded performance and can't scale. And stop pretending you can outwit people with PhDs in multiprocessor architecture and decades of industry experience. Single-threaded performance is sufficient.

The only reason multi-core processors exist is that single-threaded performance is not enough (that's obvious, isn't it?). If Intel's NetBurst had actually worked and we could run very wide 20 GHz processors, we would not have multi-core CPUs and would not have to deal with the complexity of programming for them. Multi-core is really just a crutch imposed by the limits of current technology (silicon).

Amdahl's law shows why it's a crutch: scaling isn't linear, and you quickly run into limits. The only place where it works really well is servers, like a web server, where each user request is completely independent and can be processed by one thread. That's why we see many low-clocked cores on servers and fewer, higher-clocked ones on desktops.
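To put rough numbers on that, here's Amdahl's law in a few lines of Python (the 95% parallel fraction is just an assumed value for illustration, not anyone's measured workload):

```python
# Amdahl's law: S(n) = 1 / ((1 - p) + p / n),
# where p is the fraction of the work that parallelizes.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Even code that is 95% parallel tops out fast:
for n in (2, 4, 8, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 1))
# -> 1.9, 3.5, 5.9, 9.1, 15.4, 19.6 (the hard ceiling is 1/0.05 = 20x)
```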

Beyond that, there are algorithms (problems) that can't be parallelized at all, or where you gain nothing by trying because the critical part is single-threaded.

There's also the classic latency-versus-bandwidth issue: a fast single core gives low latency; many slow cores give high bandwidth. For a desktop with a UI, latency is key.
 

zrav

Junior Member
Nov 11, 2017
Amdahl's law shows why it's a crutch: scaling isn't linear, and you quickly run into limits.
Amdahl's law is actually fairly optimistic about achievable scaling: performance merely plateaus with more cores. In reality, at some point performance starts to decrease again due to resource contention, synchronization overhead, etc., even on well-parallelized code.
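For anyone who wants a model for that decline: Gunther's Universal Scalability Law adds a coherency term to Amdahl. A quick sketch, with made-up coefficients purely for illustration:

```python
# Universal Scalability Law: C(n) = n / (1 + a*(n - 1) + b*n*(n - 1))
# a models contention (queueing on shared resources),
# b models coherency (synchronization / cache-line traffic).
# With b > 0, throughput peaks and then *declines* as cores are added.
def usl(n: int, a: float = 0.05, b: float = 0.001) -> float:
    return n / (1 + a * (n - 1) + b * n * (n - 1))

peak = max(range(1, 129), key=usl)
print(peak, round(usl(peak), 1))  # -> 31 9.0: past ~31 cores it gets slower
```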
 

DrMrLordX

Lifer
Apr 27, 2000
Oh god, that's exactly what we need: a big-LITTLE chiplet strategy with I/O chiplet level coordination.

Intel is already on its way with Lakefield. They even used stacked interposers on that thing; it's funky. A lot of new tech is going into that chip.

We are this close to a class-action suit; the absurd amount of energy AMD made us waste by speculating on these forums must be illegal. I miss monolithic already. /s

You gotta admit, A64 and Athlon were a lot easier to understand than this mess.
 

Atari2600

Golden Member
Nov 22, 2016
Single-threaded performance is sufficient.

ST performance, or more correctly the performance of any given thread in isolation, is never sufficient.

No matter how carefully architected the program is and how well-constructed the code is, there will always be dependencies somewhere that force bottlenecks. At those bottlenecks, faster performance for the one (or more) threads causing them would be very much desired.
 
May 11, 2008
It is funny to see that some people still think that a single core is the best option for everything.
Divide and rule is the answer, but that only works when using dedicated accelerators and hardware.
Only a single-core CPU with an infinitely high clock speed and zero latency could run highly parallel tasks as efficiently and as fast as a dedicated accelerator with parallel cores can.

DSPs were invented for a reason. GPUs were invented for a reason. DMA was invented for a reason: to offload the CPU and free it up for latency-sensitive tasks.
The Amiga design long ago was a great example of what good parallel hardware can do.

Multi-core works as long as the synchronization overhead between the threads on different cores is far less than the execution time of the actual application code. When code does not benefit from multiple cores, it is usually a long string of, for example, if-then-else statements that depend on each other.
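A quick way to feel that overhead rule on any machine is Python's multiprocessing: with one dispatch round trip per tiny task, the overhead swamps the work (illustrative only; the numbers vary wildly by machine):

```python
import time
from multiprocessing import Pool

def tiny(x):       # almost no work per call: IPC overhead dominates
    return x * x

def chunky(xs):    # batched work: the same overhead, amortized
    return [x * x for x in xs]

if __name__ == "__main__":
    data = list(range(100_000))
    with Pool(8) as pool:
        t0 = time.perf_counter()
        pool.map(tiny, data, chunksize=1)                  # one dispatch per item
        t1 = time.perf_counter()
        pool.map(chunky, [data[i::8] for i in range(8)])   # 8 big chunks
        t2 = time.perf_counter()
    print(f"per-item: {t1 - t0:.2f}s  chunked: {t2 - t1:.2f}s")
```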
 

zrav

Junior Member
Nov 11, 2017
It is funny to see that some people still think that a single core is the best option for everything.
No one said that. It is a fact that multi-threading comes with its own set of problems, so high ST performance will always be desirable. Engineering software for multi-threading is simply one more layer of complexity developers have to deal with for the sake of performance, and it has its limits. Apart from scaling issues once you go to higher thread counts, not everything can be parallelized efficiently. Nine women can deliver nine babies in nine months, but nine women can't deliver one baby in one month.
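The baby analogy in code: a loop-carried dependence, where each iteration needs the previous one's result, leaves nothing to hand to other cores (a toy example, not anyone's production workload):

```python
# Inherently serial: every step consumes the previous step's output,
# so a second core is pure dead weight for this chain.
def hash_chain(seed: int, rounds: int) -> int:
    x = seed
    for _ in range(rounds):
        x = (x * 6364136223846793005 + 1442695040888963407) % 2**64  # LCG step
    return x

# Contrast: many *independent* chains are embarrassingly parallel;
# this outer loop could be split across as many cores as you like.
results = [hash_chain(s, 100_000) for s in range(64)]
```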
 

CatMerc

Golden Member
Jul 16, 2016
Amdahl's law is actually fairly optimistic about achievable scaling: performance merely plateaus with more cores. In reality, at some point performance starts to decrease again due to resource contention, synchronization overhead, etc., even on well-parallelized code.
Amdahl's law is far too often misused in these discussions. Yes, adding cores with all else equal does plateau. But that's never the case: you always increase other resources in conjunction with those cores.
 

NTMBK

Lifer
Nov 14, 2011
Amdahl's law is far too often misused in these discussions. Yes, adding cores with all else equal does plateau. But that's never the case: you always increase other resources in conjunction with those cores.

Amdahl's law isn't about the distribution of resources over cores; it's about the ratio between the parallel and non-parallel portions of the work to execute.
 

Spartak

Senior member
Jul 4, 2015
Nine women can deliver nine babies in nine months, but nine women can't deliver one baby in one month.

That must be one of the best analogies for the limitations of multi-threading I've ever read :D
 

ub4ty

Senior member
Jun 21, 2017
The only reason multi-core processors exist is that single-threaded performance is not enough (that's obvious, isn't it?). If Intel's NetBurst had actually worked and we could run very wide 20 GHz processors, we would not have multi-core CPUs and would not have to deal with the complexity of programming for them. Multi-core is really just a crutch imposed by the limits of current technology (silicon).

Amdahl's law shows why it's a crutch: scaling isn't linear, and you quickly run into limits. The only place where it works really well is servers, like a web server, where each user request is completely independent and can be processed by one thread. That's why we see many low-clocked cores on servers and fewer, higher-clocked ones on desktops.

Beyond that, there are algorithms (problems) that can't be parallelized at all, or where you gain nothing by trying because the critical part is single-threaded.

There's also the classic latency-versus-bandwidth issue: a fast single core gives low latency; many slow cores give high bandwidth. For a desktop with a UI, latency is key.
When you get a chance, go to your task manager and copy/paste how many processes and threads are running purely at idle. Multi-core is the standard. UIs were sped up by multi-core processors, not by single-threaded clocks. All modern OS kernels are multi-threaded; the boost comes from this. I suggest you read up on how long a context switch takes if you're suggesting a highly clocked single core can best a multi-core processor.
[Chart: process-launch and context-switch latencies]


Also, electrical interference/crosstalk is a thing when you boost clocks at small gate sizes; clock scaling has limits and is extremely inefficient and expensive for a number of basic reasons. Most things can be parallelized. Nothing you do in a UI is going to cause issues for a multi-core processor at 3.5 GHz, or even half that. Multi-core is here to stay and is the future. Single-threaded performance is sufficient at current clock speeds. Multi-process and multi-threaded code is harder to develop, which is why there's a lot of crappy code that could be parallelized but isn't. I stand by my commentary; feel free to post links to write-ups that suggest otherwise. We're not in the days of 333 MHz processors anymore (I remember them). Current clock rates are more than sufficient. Human beings aren't cyborgs; your response time is somewhere around 200 ms.
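If anyone wants a rough feel for switch costs without taking my word for it, here's a crude thread ping-pong in Python; it measures event overhead plus the switch itself, so treat the number as a ballpark upper bound:

```python
import threading, time

# Crude ping-pong: each round trip forces two thread wake-ups, so the
# per-round time approximates scheduler + synchronization overhead.
ping, pong = threading.Event(), threading.Event()
ROUNDS = 10_000

def responder():
    for _ in range(ROUNDS):
        ping.wait()
        ping.clear()
        pong.set()

t = threading.Thread(target=responder)
t.start()
t0 = time.perf_counter()
for _ in range(ROUNDS):
    ping.set()
    pong.wait()
    pong.clear()
t.join()
elapsed = time.perf_counter() - t0
print(f"~{elapsed / ROUNDS * 1e6:.1f} us per round trip")  # typically a few us
```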
 

ub4ty

Senior member
Jun 21, 2017
Amdahl's law isn't about the distribution of resources over cores; it's about the ratio between the parallel and non-parallel portions of the work to execute.
Yes, I remember covering this sophomore year in Computer Engineering 101.
Modern processors, operating systems, memory technology, and algorithms, all mixed together, go far beyond Amdahl's law. What about someone with 100 browser tabs open while gaming and streaming? What does Amdahl's law say about neural networks? Even a desktop at a basic baseline runs some 215 processes and over 1,340 threads for a modern user with no major processing tasks running. A lot has changed since 1967, and a number of users here have formal degrees in Comp Sci/Comp Eng.

ST performance, or more correctly the performance of any given thread in isolation, is never sufficient.

Programs don't run in isolation. They run on modern operating systems with a slew of processes, threads, and other applications running. ST performance is sufficient; multi-core is the standard and the future. An NVMe drive alone has a quad-core processor in it. I doubt you could even get an internship at AMD or Intel without having taken multi-core architecture.

No matter how carefully architected the program is and how well-constructed the code is, there will always be dependencies somewhere that force bottlenecks. At those bottlenecks, faster performance for the one (or more) threads causing them would be very much desired.

No matter how amazing the hardware is, it's rare that any commercial software exploits it 100%. A lot of modern software runs like crap and is full of bottlenecks simply because the hardware is capable of masking them, and that saves companies the cost of more properly developed, more complex software. If you think a modern 8-core processor running at 4.0 GHz is the problem, and not crappy software full of bottlenecks that no one budgets money to fix, I have a highway to nowhere to sell you (or a 9900K to mask the problem, for double the cost).

A modern processor at 4.0 GHz, and "single-threaded is insufficient"? Desktop users crack me up.
They're the only group of people doing the least significant tasks, with the least significant requirements, using the least efficient software, who complain the most that performance isn't adequate. Whether they will ever show a detailed run-time breakdown proving it's single-threaded performance, and not RAM latency, disk access, PCIe communication latency, horrible synchronization, crappy software that sees a 30% boost after a game patch, crappy drivers, crappy algorithms, or an insane combination of them all, remains to be seen. Meanwhile, in professional environments, where things are processed at 500 fps and other eye-watering metrics, hilariously more is achieved on lower-clocked hardware, namely because more money and focus are put into efficiency, and profits correlate with it. Desktop users will buy horribly inefficient software no matter what and gladly pay insane prices for hardware that masks the problem. Professionals won't, and they have to scale: compounded inefficiencies cost real money, which is why their software is far more efficient and processors running at half desktop clocks are adequate.

If you can't manage to write a game engine whose easily achievable FPS isn't CPU-bound on a 4.0 GHz 8-core processor, you're the problem, not the hardware. When a game developer says to management, "But I can fix that...", they're confronted with:
> We're not taking the risk of anyone touching that code.
> Is that a feature that will bring in sales?

Meanwhile, consumers who care nothing about how things actually work simply buy hardware that costs twice as much and declare themselves kings of performance; the game developer doesn't have to spend money, the consumer eats all the costs and drives hardware sales, and everybody is happy. General software has become more inefficient as hardware has improved, and the consumer less discerning and knowledgeable.
 

TheELF

Diamond Member
Dec 22, 2012
I fail to understand these comments... Are people without degrees in this subject area or did they acquire them before multi-core processors? This is not even a debatable stance that requires a degree. It's common knowledge.
Yeah, because you are a fanatic... if multi-core is good, a faster multi-core is even better; what is there to not understand?
Statements like "single-threaded performance only needs to be sufficient" are in the same league as "640 KB will be enough for anybody."
 

Lovec1990

Member
Feb 6, 2017
The problem with multi-core is that not all games/programs are built to use all cores; some still depend on single-core performance. Only now are they getting better optimized for more cores, thanks to Ryzen, which offered 8-core CPUs at nice prices and forced Intel into releasing 8-core parts as well.
 

dnavas

Senior member
Feb 25, 2017
Are people without degrees in this subject area or did they acquire them before multi-core processors?

Well, in my judgement, people are still building API sets as if CPUs worked like they did in the '90s. I was sort of hopeful that we'd get higher-level constructs on top of stream processors back around when the G80 shipped, but I haven't seen as much as I expected, and I don't see it filtering down to the CPU world yet. So long as most engineering is done in the world of single-threaded synchronous behavior, this is the attitude I'd expect.
That said, I also see a lot of code written that clearly doesn't give much thought to cache use. Given the non-existent performance advancements in CPUs for a decade(ish), I would have thought most of us would have gotten the memo, but we seem completely happy to keep laying abstraction layers on top of each other. That just seems to be the way of things at the moment.
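Case in point on cache use: the same arithmetic over the same data, with only the traversal order changed (assumes numpy is installed; timings are machine-dependent):

```python
import time
import numpy as np

a = np.random.rand(4096, 4096)   # C order: each row is contiguous in memory

t0 = time.perf_counter()
by_rows = sum(a[i, :].sum() for i in range(4096))   # walks memory sequentially
t1 = time.perf_counter()
by_cols = sum(a[:, j].sum() for j in range(4096))   # strides 32 KB per element
t2 = time.perf_counter()

print(f"rows: {t1 - t0:.2f}s  cols: {t2 - t1:.2f}s")  # column pass is typically several times slower
```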
 

NTMBK

Lifer
Nov 14, 2011
Yes, I remember covering this sophomore year in Computer Engineering 101.
Modern processors, operating systems, memory technology, and algorithms, all mixed together, go far beyond Amdahl's law. What about someone with 100 browser tabs open while gaming and streaming? What does Amdahl's law say about neural networks? Even a desktop at a basic baseline runs some 215 processes and over 1,340 threads for a modern user with no major processing tasks running. A lot has changed since 1967, and a number of users here have formal degrees in Comp Sci/Comp Eng.

Thanks for the wonderfully patronizing reply. :confused_old: Yes, evidently Amdahl's law is not the be-all and end-all. That does not change the fact that it was being misused.
 

BigDaveX

Senior member
Jun 12, 2014
Multi-thread performance is only the be-all and end-all when you're working with code that can be parallelized equally (or roughly equally) across large numbers of threads. If you've only got a small handful of primary worker threads plus a bunch of less intensive threads handling ancillary tasks, then single-thread performance is still going to be a critical factor.
 

beginner99

Diamond Member
Jun 2, 2009
When you get a chance, go to your task manager and copy/paste how many processes and threads are running purely at idle. Multi-core is the standard. UIs were sped up by multi-core processors, not by single-threaded clocks.

And now go read my post again. Multi-core is only the standard because we essentially hit a clock wall that prevented single-core, single-threaded speed from increasing. Again, read my post: if a very wide 20 GHz core were cheaply available and physically feasible, that is what we would use, and it would make software, including the OS, much, much simpler.

And about the context switch: the OS itself is constantly shuffling threads from core to core, i.e., context switching them. If that were so terrible, Windows UI responsiveness would suck.
 

Topweasel

Diamond Member
Oct 19, 2000
And now go read my post again. Multi-core is only the standard because we essentially hit a clock wall that prevented single-core, single-threaded speed from increasing. Again, read my post: if a very wide 20 GHz core were cheaply available and physically feasible, that is what we would use, and it would make software, including the OS, much, much simpler.

And about the context switch: the OS itself is constantly shuffling threads from core to core, i.e., context switching them. If that were so terrible, Windows UI responsiveness would suck.

You must not remember those days. Multi-core and multi-threading were always the future, at least for x86. I remember using a 1c CPU when we had 1c-focused apps. I remember using 2c CPUs when everything was still coded for 1c. I still remember using a 4c CPU when everything was coded for 1c. A single-core 20 GHz chip would be a billion times more sensitive to a billion different I/O issues: major applications would constantly be stalling out, and applications would still lock up the system when I/O hung.

On top of all of that, what ub4ty said is correct. Desktop users are the only people who get locked into a stupid clock-speed-matters mindset. A 20 GHz CPU doing 8 different tasks at different times in its chain isn't going to work in an HPC, neural network, or simulation setting as well as ten 2 GHz cores that can each keep working continuously on parallel work; that kind of workload falls apart when each task takes a break instead of constantly working. Servers had dual and quad CPUs for a long time, well before the idea of a single-CPU multi-core system ever existed. The Pentium D, Conroe, Nehalem, Sandy Bridge, Athlon 64 X2, Phenom, and even Bulldozer were all designed as server-first solutions. Certain dies, like the 4c Sandy Bridge or Llano, may have been mobile- or desktop-oriented, with extra attention to reducing power or kicking up clocks, but unlike Skylake vs. Skylake-X, everything except Atom and Jaguar had server uses in mind first, and even then nothing is hugely different between Skylake and Skylake-X, just enough for the market split to be very visible. All of this circles back to the main point: disappointment in max clocks aside, NetBurst was always a dead end. It was the last major architecture before Intel moved to an extremely parallel architecture in Itanium for the whole market, and AMD forced a change by pushing performance and multi-core, making Intel push forward in x86. The results would have been the same.