Design changes in Zen 2 (CPU/core/chiplet only)

ub4ty · Jan 13, 2019

Overall, modern processor architecture is not a cake walk.
You would have to have a full plate of undergraduate course and graduate coursework centered on computer engineering with a focus on computer architecture coupled with a wealth of industry experience to even begin accurately dissecting, making suggestions, and/or comparing Intel to AMD. From a K.I.S.S perspective, single threaded vs multi-threaded performance are often in opposition due to architectural changes for multi-threaded that improve throughput at the cost of latency. You can widen pipes but the high-way still terminates and hits local roads. SIMD and Floating-points tasks are best suited for the GPU and dedicated accelerators. It's absolutely moronic to load down a CPU with such tasks. For every : AMD should do X,Y,Z for single threaded performance there's a trade-off that impacts power utilization, throughput, etc.

The future is multi-threaded. Has been for some time. It is the job of software engineers to catch up not AMD revert back to yester-year's battle especially for a vocal minority known as the never-ending screecher take my money vidya-gamer who likely knows nothing about software engineering or computer engineering disciplines.

AMD has made a processor centered around multi-threaded work flows. The dies are hand-downs from EPYC. This should excite anyone with a formal understanding of computing in that you are getting a server grade architecture. This is what attracted me to AMD's platform. It is highly scalable and there is uniformity from desktop to server. Now even moreso. The single threaded performance has been sufficient since the 1st generation of Zen. When software is authored towards modern hardware, it should not be heavily reliant on single threaded performance... This goes for game engines (Yes, I'm looking at you severely antiquated code). When you do code analysis and a singular process/thread is mucking up performance, you have crappy code.. Period.

The future is massive multi-threaded. Single threaded performance is for the birds and soon to be defunct software.

https://www.eteknix.com/world-of-tanks-engine-get-massive-multi-thread-overhaul/

Get with the times people. Stop yapping about Intel's yester-year processors that have been tuned for single threaded performance which can't scale. And stop pretending you can out-wit people with PhDs in multi-processor architecture with decades of industry experience. Single threaded performance is sufficient. When it is sufficient, you can focus on far more important and complex tasks centered on multi-threading and scaling core count. AMD is pursuing a beautiful roadmap. The single threaded performance is going to be what its going to be : sufficient. The core scaling is the focus... Has been from the start. The focus should be an a revolutionary new chiplet design and the architecture therein. Meanwhile, people are focusing on Comp Arch (101) undergraduate level single-threaded performance.... Muh e-celeb IPC values.. That vocal minority of screechers who wont be the volume buyers nor those focused on the future. Learn how to write proper software !

ub4ty · Jan 13, 2019

AMD took a page out of the Network Industry on how to scale and continues to do so.
You essentially have a data center on a chip.

^ show me a (muh single threaded performance) with this kind of throughput ...

People are screeching about single-threaded performance and AMD has just reinvented CPUs for years to come. Someone who plays video games thinks their workflow is more significant than a processor architecture targeted towards million dollar installs that push the boundaries of computing. AMD has pushed the Central I/O processor all the way down to the desktop. You have a server grade architecture at your finger tips. If you're still talking about single threaded performance and not this revolutionary mystery box, you're missing the point entirely. Single threaded performance only needs to be sufficient. What matters now is multi-threaded tasks and core scaling.

I'm off until more official details and developments come from AMD. This intermediary noise is exhausting and fruitless. AMD will do a hot chips talk and give the detailed run-down. What they deliver will be what they deliver. Performance, clocks, and power utilization will improve. A 16 core is now possible on AM4. Epyc's core count has been doubled from 32 to 64. That's what matters. They did what they needed to do architectural to preserve performance while doubling core count. Grad courses will probably be taught based on this for years to come. A new standard has been set and the madmen and women at AMD made it available to desktop users.

RoarTiger · Jan 13, 2019

ub4ty said:
A AMD has just reinvented CPUs for years to come.

Eagerly awaiting the inevitable big+little cpu designs now that IO has been centralized and inter die communication has evolved.

coercitiv · Jan 13, 2019

RoarTiger said:
Eagerly awaiting the inevitable big+little cpu designs now that IO has been centralized and inter die communication has evolved.

Oh god, that's exactly what we need: a big-LITTLE chiplet strategy with I/O chiplet level coordination.

We are this close to a class suit, the absurd amount of energy AMD made us waste by speculating on these forums must be illegal. I miss monolithic already. /s

moinmoin · Jan 13, 2019

Regarding the discussion about the balance of ST and SMT, it's pretty funny that wikichip's page for the future Zen 3 currently suggests it's getting 4-way SMT (no source mentioned, couldn't find any second source myself either, so take only with big grain of salt since the page is user editable etc.).

ub4ty · Jan 13, 2019

RoarTiger said:
Eagerly awaiting the inevitable big+little cpu designs now that IO has been centralized and inter die communication has evolved.

Indeed, the game changer is the inter-connect. That I/O chip is where all the action is now. The long term goal of AMD has been HSA. The universal interconnect with widespread industry buy-in is what makes this possible. They've been laying the ground work for a progression to a universal fabric that any compute/accelerator can hook into since the launch of Ryzen. A lot of network hardware has uncanny similarities. When Ryzen was launched, I knew immediately where they were headed and why it was such a game changer. Being able to scale and scale in a cost effective manner is what changed everything. Being able to buy an 8 core for 1/2 the price that quad cores sold for was only possible due to methodology AMD pursued and with every release they keep driving this home. The only thing I want to learn far more about is what in the world is going on in that I/O chip. Summer can't come soon enough !

Again, it's all about the interconnects :

Single threaded performance is a dead yester-year story. The future is multi-threaded performance, interconnects, and scaling... What you have to do architecturally to mitigate negative impacts on performance, you do.. Which is essentially what all the bells and whistles are for in modern chip architectures anyway... A giant soup of hacks and shortcuts.

beginner99 · Jan 14, 2019

ub4ty said:
Get with the times people. Stop yapping about Intel's yester-year processors that have been tuned for single threaded performance which can't scale. And stop pretending you can out-wit people with PhDs in multi-processor architecture with decades of industry experience. Single threaded performance is sufficient.

The only reason multi-core processors exist is because single-threaded performance is not enough (I mean that is obvious isn't it?) If intel netburst would have actually worked and we could run very wide 20 ghz processors we would not have multi-core CPUs and would not have to deal with the complexity of programming for multi-cores. In fact multi-core is just a crutch due to limits of current technology (silicon).

Amdahls law shows why it's a crutch. It doesn't scale linearly as well and you quickly run into limits. The only place were it works well is with servers like a web server were each user/request is completely independent and can be process by 1 thread. And that is why we see "many low clocked cores" on server and fewer higher clocked ones on desktop.

Besides that there are algorithms (problems) that can't be parallelized at all (or you gain nothing by doing so because critical part is single-threaded).

Besides that there is also the classic latency vs bandwidth issue. fast single-core = low latency, slow multi-core = high bandwidth. For a desktop with an UI, latency is key.

zrav · Jan 14, 2019

beginner99 said:
Amdahls law shows why it's a crutch. It doesn't scale linearly as well and you quickly run into limits.

Amdahls law is actually fairly optimistic regarding the achievable scaling, where performance eventually plateaus with more cores. In reality at some point performance starts to decrease again due to resource contention, synchronization overhead, etc, even on well parallelized code.

DrMrLordX · Jan 14, 2019

coercitiv said:
Oh god, that's exactly what we need: a big-LITTLE chiplet strategy with I/O chiplet level coordination.

Intel is already on its way with Lakefield. They even used stacked interposers on that thing, it's funky. A lot of new tech going into that chip.

We are this close to a class suit, the absurd amount of energy AMD made us waste by speculating on these forums must be illegal. I miss monolithic already. /s

You gotta admit, A64 and Athlon were a lot easier to understand than this mess.

Atari2600 · Jan 14, 2019

ub4ty said:
Single threaded performance is sufficient.

ST performance - or more correctly - the performance of any given isolated thread - is never sufficient.

No matter how carefully architected the program is and how well constructed the code is, there is always going to be dependencies somewhere that force bottlenecks. At these bottlenecks, having faster performance for the 1 (or more) threads that are causing the bottlenecks would be very much desired.

William Gaatjes · Jan 14, 2019

It is funny to see that some people still think that a single core is the best option for everything.
Divide and rule is the answer but that rule works when using dedicated accelerators and hardware.
Only a single core cpu with an infinitely high clock speed and zero latency can run highly parallel tasks as efficient and fast as an dedicated accelerator with parallel cores can.

DSP are invented for a reason. GPU are invented for a reason. DMA is invented for a reason.
To offload a cpu to free it up for latency intensive tasks.
The AMIGA design long ago was a great example what good parallel working hardware can do.

Multicore works as long as the synchronization overhead execution time between the threads on the different cores is way less than the execution time available for the actual application code. When code does not benefit from multicore code, it is usually a large string of for example if then else statements that depend on each other.

zrav · Jan 14, 2019

William Gaatjes said:
It is funny to see that some people still think that a single core is the best option for everything.

No-one said that. It is a fact that multi-threading comes with it's own set of problems, so having high ST performance will always be desirable. Engineering software for multithreading is simply one more complexity layer developers have to deal with for the sake of performance. And it has it's limits. Apart from scaling issues once you go into higher thread counts, not everything can be parallelized efficiently. Nine women can deliver nine babies in nine months, but nine women can't deliver one baby in one month.

CatMerc · Jan 14, 2019

zrav said:
Amdahls law is actually fairly optimistic regarding the achievable scaling, where performance eventually plateaus with more cores. In reality at some point performance starts to decrease again due to resource contention, synchronization overhead, etc, even on well parallelized code.

Amdahls law is far too misused in these discussions. Yes more cores all else being equal does pleatu. But that's never the case. You always increase resources in conjunction with those cores.

NTMBK · Jan 14, 2019

CatMerc said:
Amdahls law is far too misused in these discussions. Yes more cores all else being equal does pleatu. But that's never the case. You always increase resources in conjunction with those cores.

Amdahl's law isn't about the distribution of resources over cores- it's about the ratio between parallel and non-parallel portions of the work to execute.

Spartak · Jan 14, 2019

zrav said:
Nine women can deliver nine babies in nine months, but nine women can't deliver one baby in one month.

That must be one of the best analogies for limitations of multithreading I've ever read

ub4ty · Jan 14, 2019

beginner99 said:
The only reason multi-core processors exist is because single-threaded performance is not enough (I mean that is obvious isn't it?) If intel netburst would have actually worked and we could run very wide 20 ghz processors we would not have multi-core CPUs and would not have to deal with the complexity of programming for multi-cores. In fact multi-core is just a crutch due to limits of current technology (silicon).

Amdahls law shows why it's a crutch. It doesn't scale linearly as well and you quickly run into limits. The only place were it works well is with servers like a web server were each user/request is completely independent and can be process by 1 thread. And that is why we see "many low clocked cores" on server and fewer higher clocked ones on desktop.

Besides that there are algorithms (problems) that can't be parallelized at all (or you gain nothing by doing so because critical part is single-threaded).

Besides that there is also the classic latency vs bandwidth issue. fast single-core = low latency, slow multi-core = high bandwidth. For a desktop with an UI, latency is key.

When you get a chance, go to your task manager and copy/paste how many processes and threads are running purely at idle. Multi-core is a standard. UI was sped up due to multi-core processors not single-threaded clocks. All modern OS kernels are multi-threaded. The boost comes from this. I suggest you re-read or read how long a context switch takes if you suggest a highly clocked single core can best a multi-core processor.

Also, Electronic interference/cross-talk is a thing when you boost clocks infinitely at small gate sizes. Clock scaling has limits and is extremely inefficient and expensive for a number of basic reasons. Most things can be parallelized. Nothing you can do in a UI is going to cause issues for a multi-core processor at 3.5Ghz of even half of that. Multi-core is here to stay and the future . Single-threaded performance is sufficient at current clock speeds. Multi-process and multi-threaded code is more difficult to develop which is why you have a lot of crappy code that can be parallelized but isn't. I stand by my commentary. Feel free to post links to write-ups that suggest otherwise. Were not in the days of 333Mhz processors anymore. I remember them. Current clock rates are more than sufficient. Human beings aren't cyborgs. Your response time is somewhere is around 200ms.

ub4ty · Jan 14, 2019

NTMBK said:
Amdahl's law isn't about the distribution of resources over cores- it's about the ratio between parallel and non-parallel portions of the work to execute.

Yes, I remember covering this sophomore year in computer engineering 101.
Modern processors, operating systems, memory technology, and algorithms when all mixed together goes far and beyond Amdahl's laws. What about someone with 100 browser tabs open while gaming while streaming? What does Amdahl's law say about Neural Networks? Or a desktop at a basic baseline, there's : 215 processes and over 1340 threads running for a modern user with no major processing tasks running. A lot has changed since 1967 and a number of users here have formal degrees in Comp Sci./Comp Eng.

Atari2600 said:
ST performance - or more correctly - the performance of any given isolated thread - is never sufficient.

Programs don't run in isolation. They run on modern operating systems with a slew of processes, threads, and other applications running. ST performance is sufficient. Multi-core is the standard and future. An NVME drive alone has a quad core processor. I doubt you can even get an internship at AMD/Intel unless you've taken multi-core architecture.

Atari2600 said:
No matter how carefully architected the program is and how well constructed the code is, there is always going to be dependencies somewhere that force bottlenecks. At these bottlenecks, having faster performance for the 1 (or more) threads that are causing the bottlenecks would be very much desired.

No matter how amazing the hardware is, it's rare that any commercial software is capable of exploiting it 100%. A lot of modern software runs like crap and is full of bottlenecks simply because the hardware is capable of masking it and it saves companies money in terms of more properly developed and complex software. If you think a modern 8-core processor running at 4.0Ghz is the problem and not crappy software filled with bottle-necks that no one budgets money to fix, I have a highway to nowhere to sell you (or a 9900k to mask the problem) for double the cost that is....

4.0Ghz Modern processor...
Single threaded is insufficient. Desktop users crack me up.
They're the only group of people doing the least significant tasks with the least significant requirements utilizing the least efficient software that complain the most that performance isn't adequate. When if ever they will show a detailed breakdown of run-time analysis that proves its single threaded performance and not RAM latency, disk access, PCIE communication latency, horrible synchronization, crappy software that sees a 30% boost in performance after a game is patched, crappy drivers, crappy algorithms, or an insane combination of them all is yet to be seen. Meanwhile, in the professional environment where things are processed at 500fps or other eye-watering metrics, hilariously more is achieved on lower clocked hardware.. Namely because more money and focus is given to efficiency and profits correlate to it. Desktop users will buy horribly inefficient software no matter what and gladly pay insane prices for hardware that masks the problem. Professionals won't and have to scale. Inefficiencies compounded cost real money which is why the software is far more sufficient and processors that run at half the clocks as desktops adequate.

If you can't manage to write a game engine whose messily and easily achievable FPS isn't CPU bound on a 4.0Ghz 8 core processor, you're the problem not the hardware. When a game developer says to management : But I can fix that... They're confronted with :
> Were not taking the risk of anyone touching that code
> Is that a feature that will bring in sales?

Meanwhile, consumers who care nothing about how things actually work in detail simply buy hardware that costs twice as much, declare they are kings of performance, the game developer doesn't have to spend money, the consumer eats all the costs and drives hardware sales and everybody is happy. General software has become more inefficient with modern hardware. The consumer less discerning and knowledgeable.

TheELF · Jan 14, 2019

ub4ty said:
I fail to understand these comments... Are people without degrees in this subject area or did they acquire them before multi-core processors? This is not even a debatable stance that requires a degree. It's common knowledge.

Yeah because you are a fanatic...if multicore is good a faster multicore is even better what is there to not understand?
Statements like "Single threaded performance only needs to be sufficient" are in the same league as 640kb will be enough for anybody,

Lovec1990 · Jan 14, 2019

problem with multicore is not all games/programs are build too use all cores some still prefere single core performance and only now they are getting more optimised for more cores thanks too Ryzen who is offering CPUs with 8 cores at nice prices and forced intel into releasing 8 cores as well

dnavas · Jan 14, 2019

ub4ty said:
Are people without degrees in this subject area or did they acquire them before multi-core processors?

Well, in my judgement people are still building API sets as if CPUs worked like they did in the 90s. I was sort of hopeful that we'd get higher level constructs on top of stream processors back around when the G80 shipped, but I haven't seen as much as I expected, and I don't see it filtering down yet to the CPU world. So long as most engineering is done in the world of single-threaded synchronous behavior, this is the attitude I'd expect.
That said, I also see a lot of code written that clearly doesn't give a lot of thought about cache use. Given non-existent performance advancements in CPUs for a decade(ish), I would have thought most of us would have gotten the memo, but we seem completely happy to continue laying abstraction layers on top of each other. This just seems to be the way of things at the moment.

NTMBK · Jan 14, 2019

ub4ty said:
Yes, I remember covering this sophomore year in computer engineering 101.
Modern processors, operating systems, memory technology, and algorithms when all mixed together goes far and beyond Amdahl's laws. What about someone with 100 browser tabs open while gaming while streaming? What does Amdahl's law say about Neural Networks? Or a desktop at a basic baseline, there's : 215 processes and over 1340 threads running for a modern user with no major processing tasks running. A lot has changed since 1967 and a number of users here have formal degrees in Comp Sci./Comp Eng.

Thanks for the wonderfully patronizing reply.

Yes, evidently Amdahl's law is not the be all and end all. Does not change the fact that it was being misused.

BigDaveX · Jan 14, 2019

Multi-thread performance is only the be-all and end-all of things when you're working with code that you can equally (or roughly equally) parallelize across large numbers of threads. If you've only got a small handful of primary worker threads plus a bunch of less intensive threads to handle ancillary tasks, then single-thread performance is still going to be a critical factor.

Spartak · Jan 14, 2019

Jesus Christ. Let's send all Apple, Intel and AMD engineers to ub4ty's microarchitecture masterclass. Who knew they urgently need reschooling??! /s

beginner99 · Jan 14, 2019

ub4ty said:
When you get a chance, go to your task manager and copy/paste how many processes and threads are running purely at idle. Multi-core is a standard. UI was sped up due to multi-core processors not single-threaded clocks.

And now go read my post again. It's only standard because we essential hit a clock-wall that prevented single-core / single-threaded speed to increase. Again read my post. IF a 20 ghz very wide core would be cheaply available and physically feasible, that is what we would use and it would make software including OS much, much simpler.

And about the context switch, the OS itself is constantly shuffling threads from core to core eg, context switching them. So if that is so terrible, windows UI responsiveness would suck.

Topweasel · Jan 14, 2019

beginner99 said:
And now go read my post again. It's only standard because we essential hit a clock-wall that prevented single-core / single-threaded speed to increase. Again read my post. IF a 20 ghz very wide core would be cheaply available and physically feasible, that is what we would use and it would make software including OS much, much simpler.

And about the context switch, the OS itself is constantly shuffling threads from core to core eg, context switching them. So if that is so terrible, windows UI responsiveness would suck.

You must not remember the days back then. Multi-core and multi-thread was always the future. At least for x86. I remember using a 1c CPU when we had 1c focused apps. I remember using 2c CPU's when everything was still coded for 1c. I still remember using a 4c CPU when everything was coded for 1c. A single core 20GHz would be a billion times more sensitive to billion different IO issues. Major applications would be constantly be stalling out. Applications would still thread lock out the system when IO hanged.

On top of all of that what Ubty said is correct. Desktop users are the only people that get locked in a stupid clock speed matters mindset. A 20GHz CPU doing 8 different tasks at different times in its chain isn't going to work an HPC, Nueral network, or simulation setting as well as 10 2GHz cores that can each be continuously working on parallel work stuff that doesn't work when you have each task taking a break instead of constantly working. Servers had been dual and quad CPU's for a long time well be the idea of a single CPU multi core system ever existed. The Pentium D, Conroe, Nahelem, Sandybridge, A64X2, Phenom, and even BD are all designed for server first solutions. Certain dies like 4c SB, or Llama, may have been mobile or desktop oriented with extra attention to reducing power or kicking up clocks in mind. But unlike SL vs. SL-X everything but Atom and Jaguar had Server uses in mind first and even then nothing is hugely different between SL and SL-X but enough for the market to be very visible. All this is circling back to the main point disapointment in max clocks aside Netburst was always a dead end. It was the last major Arch before Intel moved to an extremely parallel arch in Itanium for the whole market and AMD forced a change by pushing performance and multi core making Intel to push forward in x86. The results would have been the same.

Design changes in Zen 2 (CPU/core/chiplet only)

Senior member

Senior member

Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Junior Member

Lifer

Golden Member

Lifer

Junior Member

Golden Member

Lifer

Senior member

Senior member

Senior member

Diamond Member

Member

Senior member

Lifer

Senior member

Senior member

Diamond Member

Diamond Member