Design changes in Zen 2 (CPU/core/chiplet only)


Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Programs don't run in isolation. They run on modern operating systems with a slew of processes, threads, and other applications running. ST performance is sufficient.

Sure, tell you what - you go call AMD and Intel and tell them no further per-thread IPC improvements are required.

See how far you get into your explanation before they wet themselves laughing.


No matter how amazing the hardware is, it's rare that any commercial software is capable of exploiting it 100%.

It's not rare - it's unheard of. The software that exploits 100% of the hardware, with absolutely no sub-optimal lines, has not yet been released - it's still being streamlined and tested - or the company making it has gone bust.

All code involves some degree of cost:benefit judgement on how far it needs to be optimised.

If it is cheaper to buy in faster hardware to provide faster performance (to a level deemed fast enough) than it is to spend resources to (further) optimise the code - then that is what you do.

I've written stuff that runs pretty well on 32 threads. Is it perfect? No, it's incredibly far from perfect. Is it fast enough? Not really, but it'll do. Can I speed it up further? Not without spending hundreds of man-hours learning further optimisation techniques and making sure they don't break things. So how will I speed it up in future? Faster hardware.


That is the real world, ub4ty. People and organisations simply don't have the time or money to optimise to the nth degree.
 
May 11, 2008
21,984
1,355
126
No-one said that. It is a fact that multi-threading comes with its own set of problems, so having high ST performance will always be desirable. Engineering software for multithreading is simply one more complexity layer developers have to deal with for the sake of performance. And it has its limits. Apart from scaling issues once you go into higher thread counts, not everything can be parallelized efficiently. Nine women can deliver nine babies in nine months, but nine women can't deliver one baby in one month.

If you had read my post instead of thinking up amusing but inaccurate analogies, you would have seen that I wrote exactly what you have written in your post.
 

DrMrLordX

Lifer
Apr 27, 2000
22,594
12,480
136
@ub4ty we still need better "ST performance" for VMs. Right now it's hard to get a server CPU with clocks higher than 3 GHz. For example, EPYC 7601 is the highest-clocked EPYC out there, and it hits a max all-core turbo of 2.7 GHz. VMs need good core response times, which often leads to a lot of server room admins disabling features like SMT that allow full use of the core. More clocks and/or better IPC through core/cache optimization would help alleviate the issue and might allow more server admins to enable SMT for VMs, which in turn allows for greater parallelism. Barring the possibility that SMT might amount to a security issue . . .
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
You must not remember the days back then. Multi-core and multi-thread was always the future, at least for x86. I remember using a 1c CPU when we had 1c-focused apps. I remember using 2c CPUs when everything was still coded for 1c. I still remember using a 4c CPU when everything was coded for 1c. A single-core 20GHz chip would be a billion times more sensitive to a billion different IO issues. Major applications would constantly be stalling out. Applications would still thread-lock the system when IO hung.

On top of all of that, what ub4ty said is correct. Desktop users are the only people that get locked into a 'clock speed is all that matters' mindset. A 20GHz CPU doing 8 different tasks at different points in its chain isn't going to work in an HPC, neural network, or simulation setting as well as 10 2GHz cores that can each keep working on parallel work continuously - that model breaks down when each task has to take a break instead of running constantly. Servers had been dual- and quad-CPU for a long time before the idea of a single-CPU multi-core system ever existed. The Pentium D, Conroe, Nehalem, Sandy Bridge, A64 X2, Phenom, and even BD were all designed as server-first solutions. Certain dies, like 4c SB or Llano, may have been mobile or desktop oriented, with extra attention paid to reducing power or kicking up clocks. But unlike SL vs. SL-X, everything but Atom and Jaguar had server uses in mind first - and even then nothing is hugely different between SL and SL-X, just enough for the market split to be very visible. All of this circles back to the main point: disappointment in max clocks aside, NetBurst was always a dead end. It was the last major architecture before Intel tried to move the whole market to an extremely parallel architecture with Itanium, and AMD forced a change by pushing performance and multi-core, making Intel push forward on x86. The results would have been the same.
^Thank you! No need to reply to anyone else. Exactly correct! Someone gets it, although they're a small minority. It's like they've never heard of the interrupt routines that handle I/O, and carefully ignored how many processes/threads they can see running in Task Manager on a modern OS. UIs are responsive exactly because of multi-core. Single core was a horrid experience. Dual core changed everything (C2D).

Jesus Christ. Let's send all Apple, Intel and AMD engineers to ub4ty's microarchitecture masterclass. Who knew they urgently need reschooling??! /s
There's no need, but it's clear that a large number of the users ranting and raving never took Comp Arch 101 or an OS course, while pretending they have PhDs. Feel free to confirm this for everyone.
 

beginner99

Diamond Member
Jun 2, 2009
5,314
1,756
136
A 20GHz CPU doing 8 different tasks at different points in its chain isn't going to work in an HPC, neural network, or simulation setting as well as 10 2GHz cores that can each keep working on parallel work continuously - that model breaks down when each task has to take a break instead of running constantly. Servers had been dual- and quad-CPU for a long time before the idea of a single-CPU multi-core system ever existed.

All you are saying is that multi-core works for "embarrassingly parallel" workloads and many server workloads. Different requests on a web server can all be treated independently, so scaling is obviously very good there. As a side note, you still want each single request to be very fast if it is heavy in calculation. Not everything is just a database query. You aren't going to run a web server on 10,000 286-class cores. Scaling is also very good for VMs; you can actually assign more cores than you have, and the VMs are completely independent.
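A minimal sketch of that kind of independent-request scaling (the handle() loop below is a made-up, CPU-bound stand-in, not real server code): because the requests share nothing, a process pool spreads them across cores almost linearly.

```python
from multiprocessing import Pool
import time

def handle(request_id):
    # Stand-in for an independent, CPU-bound "request" (hypothetical work).
    total = 0
    for i in range(1_000_000):
        total += (i * request_id) % 7
    return total

if __name__ == "__main__":
    requests = list(range(64))

    t0 = time.perf_counter()
    for r in requests:
        handle(r)                      # one core, one request at a time
    serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool() as pool:               # one worker process per core by default
        pool.map(handle, requests)     # requests share nothing, so they spread cleanly
    parallel = time.perf_counter() - t0

    print(f"serial: {serial:.2f}s  parallel: {parallel:.2f}s  "
          f"speedup: {serial / parallel:.1f}x")
```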

And then ub4ty rants about high-refresh-rate gaming, saying that tenths-of-a-millisecond differences don't matter and aren't noticeable, but here context switches at the single-digit-microsecond level are suddenly a big deal? You can only pick one of the two standpoints.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
ub4ty: You keep equating single-thread performance to mean a single-core processor - and then are acting as if everyone else needs to do a PhD in microprocessor design before daring to confront your first-semester-undergrad-level argument.

One is not the other. Literally no-one else is saying we should go back to a single-core, single-thread processor - although some are saying it'd make things a lot easier to program (TRUE). Everyone is saying that single-thread performance matters - and it's something you seem unwilling to accept.

If ST performance didn't matter - then why was Bulldozer such a failure? If ST performance didn't matter - then why was the Magny-Cours Opteron a flop? If ST performance didn't matter - then why is Intel at ~90-95% market share in the server/workstation environment when AMD could easily have been competitive on core counts long before Zen?
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
@ub4ty we still need better "ST performance" for VMs. Right now it's hard to get a server CPU with clocks higher than 3 GHz. For example, EPYC 7601 is the highest-clocked EPYC out there, and it hits a max all-core turbo of 2.7 GHz. VMs need good core response times, which often leads to a lot of server room admins disabling features like SMT that allow full use of the core. More clocks and/or better IPC through core/cache optimization would help alleviate the issue and might allow more server admins to enable SMT for VMs, which in turn allows for greater parallelism. Barring the possibility that SMT might amount to a security issue . . .
Sure, you can use more clocks, but at what cost? My point was about desktop users (namely vidya gamers) ranting and raving about ST performance as if it's the end-all, be-all of computing. Who else complains about 'ST performance' on this forum besides this group? My point about the pro market was that they run on even lower clocks, and always have, because that's what's power efficient. No one is running absurdly clocked CPUs in the enterprise, because at scale the power/cooling bill would eat significantly into profits for no big benefit. Meanwhile, some dude playing video games whose engines are as old as dirt and inefficient as tar (proven by updates/patches that boost performance by double digits) is ranting about ST performance... pretending to have a grad degree in multi-core chip architecture at that, enough to pretend they'd design a better chip than AMD. ST performance is going to be what it is going to be at this point. Speedups that can be done between chip releases will occur. However, it's not the focus; multi-core chip architecture and the complexities therein are. The higher the clocks, the less power efficient and hotter things are. At the absurd clocks others mention, especially at 7nm and smaller, you'd have cross-talk, signal-integrity, and interference issues. Physics 101.
 

beginner99

Diamond Member
Jun 2, 2009
5,314
1,756
136
Single core was a horrid experience. Dual core changed everything (C2D).

You are comparing going from 1 processor to 2 of the same kind. Of course that makes a difference. I'm comparing going from several - say 4 (or 64; it doesn't really matter for general Windows tasks like web browsing or office work) - "slow cores" to one very, very fast core, which is a completely different scenario. And a context switch on such a core would be a lot faster. Even on current ones it's in single-digit microseconds, and to say that matters when browsing AnandTech or the like is ridiculous.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
All you are saying is that multi-core works for "embarrassingly parallel" workloads and many server workloads. Different requests on a web server can all be treated independently, so scaling is obviously very good there. As a side note, you still want each single request to be very fast if it is heavy in calculation. Not everything is just a database query. You aren't going to run a web server on 10,000 286-class cores. Scaling is also very good for VMs; you can actually assign more cores than you have, and the VMs are completely independent.

And then ub4ty rants about high-refresh-rate gaming, saying that tenths-of-a-millisecond differences don't matter and aren't noticeable, but here context switches at the single-digit-microsecond level are suddenly a big deal? You can only pick one of the two standpoints.

Context... context... context switch. My informative post was in reply to a user who ranted about a 20GHz single-core processor while ignoring (OS 101) the context switches, hardware interrupts, cache flushes, and slew of other things that would bring it to its knees. Your 20GHz processor will be worth little when it constantly has to hit much slower main memory and sit through context switches that take microseconds to complete, whereas individual operations take nanoseconds. Just in case you need a reference:
A microsecond is equal to 1,000 nanoseconds. When people make absurd comments about a 20GHz single-core versus a 4GHz 8-core, I do question whether they know a single thing about computer architecture, operating systems, the order-of-magnitude difference between microsecond and nanosecond operations, or how costly a context switch and a cache flush are. Basic stuff you learn in school... not the kind of thing you grasp while consuming tidbits of information from e-celebs about modern CPU performance.
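To put rough numbers on that gap, here is a minimal sketch (POSIX-only, since it uses os.fork; the pipe ping-pong forces the scheduler to switch between two processes on every round trip):

```python
import os, time, timeit

def pingpong_us(n=20_000):
    """Bounce one byte between parent and child over two pipes.
    Each round trip forces at least two context switches."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    pid = os.fork()
    if pid == 0:                        # child: echo every byte back
        for _ in range(n):
            os.read(r1, 1)
            os.write(w2, b"x")
        os._exit(0)
    t0 = time.perf_counter()
    for _ in range(n):
        os.write(w1, b"x")
        os.read(r2, 1)
    dt = time.perf_counter() - t0
    os.waitpid(pid, 0)
    return dt / n * 1e6                 # microseconds per round trip

adds = 10_000_000
ns_per_add = timeit.timeit("x + 1", setup="x = 1", number=adds) / adds * 1e9

print(f"~{ns_per_add:.0f} ns per interpreted add (the raw hardware add is well under 1 ns)")
print(f"~{pingpong_us():.1f} us per pipe round trip, i.e. thousands of adds' worth of time")
```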
 

DrMrLordX

Lifer
Apr 27, 2000
22,594
12,480
136
You're outta date on that one - they released a 16C hot rod a while back I think. Gimme a minute...


yep: EPYC 7371; 16C32T, 3.1 GHz base, 3.6 GHz all core Turbo, 3.8 GHz turbo (on 8C).

https://www.servethehome.com/amd-epyc-7371-review-now-the-fastest-16-core-cpu/

Okay, I didn't notice that one. Must be a competitor to some of the Xeon Gold Skylake-SP models that boost up to 4.2 GHz.

Sure, you can use more clocks, but at what cost?

Power, basically. You give up die space and cores for higher clocks. Still there are some server guys who are willing to pay that price and shed some cores to meet minimum core response times. See Xeon Gold/EPYC 7371 above.

Reminds me of when one of our forum heavies (forget exactly who; mighta been aigo) got an early QX6700 and overclocked the hell out of it to run four instances of single-threaded Forex software. He needed rapid response times out of that software. Being able to run multiple instances in parallel was good, but a multi-socket server board with chips running at lower clocks would not fit his workload. The QX6700 fit his needs since he could overclock it to improve response. It isn't all about UI responsiveness or fps. There are people in professional applications that need more speed per thread. Audio editing is another area where you want ultra-low latency everywhere - CPU core speed, I/O latency, RAM, etc.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
ub4ty: You keep equating single-thread performance to mean a single-core processor - and then are acting as if everyone else needs to do a PhD in microprocessor design before daring to confront your first-semester-undergrad-level argument.
The user stated quite clearly that's what they're referring to, and several users beyond myself replied as such. I suggest you re-read what kicked this off and stop for a minute and consider the absurdity of even mentioning a 20GHz processor. Furthermore, when a user makes comments that reflect they clearly haven't taken 1st semester undergrad courses, I'm going to remind them of it. As for higher-level arguments, anyone can feel free to post information backing their assertions. So far, I'm the only person who's posted outside links that confirm their statements.

One is not the other. Literally no-one else is saying we should go back to a single-core, single-thread processor - although some are saying it'd make things a lot easier to program (TRUE). Everyone is saying that single-thread performance matters - and it's something you seem unwilling to accept.
Where have you been throughout the history of computing? It's been all about single threaded performance and clocks for as long as I can remember. We finally cracked past quad core just a couple of years ago with a doubling, and suddenly it's all about ST performance again? 8 cores at 4GHz isn't enough? So then I ask: who isn't it enough for? The only people who clearly rant about this are people playing video games on antiquated game engines that don't properly utilize 8 cores and are filled with bottlenecks and inefficiencies - reflected in double-digit performance gains after a patch. That's what the major portion of this ST-performance rant rests on. Zero technical analysis. Zero understanding of the OS. Literally some e-celeb's benchmark on YouTube comparing two unequal processors. Ignoring the fact that 14nm had limits it was inefficient and stupid to push past, which 7nm now frees up, as happens iteratively with every gate shrink. So, no, I don't accept absurd and unsupported arguments. When you know a good amount about how various things function, you don't accept a peanut-gallery statement about a complicated system's performance issue. 4GHz is 4,000,000,000 clock cycles per second versus 0.005 s of processing time per frame (200fps). You have to have a very limited amount of brain function to actually suggest that the processor is the limiting factor here. It's the inefficient game engine and the large number of other complexities no one wants to pay money to properly develop for - and why should they, when their consumers will argue until they're blue in the face about single-threaded performance on AnandTech forums.

If ST performance didn't matter - then why was Bulldozer such a failure? If ST performance didn't matter - then why was the Magny-Cours Opteron a flop? If ST performance didn't matter - then why is Intel at ~90-95% market share in the server/workstation environment when AMD could easily have been competitive on core counts long before Zen?

Modern-day processors - as in, since the release of Ryzen, where the core count was doubled from quad (thanks, Intel) to octa - have sufficient ST performance. Game engines were magically updated to take advantage of 8-core processors within a year (the very software the ST-performance ragers won't stop ranting about)? Do cores matter? Is anyone even using them? Anything can always be better. When gates shrink, things naturally get better, because there's a new window of physical efficiency that can be pursued. The present-day absurdity regarding performance rants is established by a loud, vocal group of PC users who play vidya games. The absurdity rests in the fact that right now you have a 64-core processor about to fall out of the back of a truck. You can also buy a much higher-clocked quad-core, six-core, eight-core, 16-core, or 32-core. The more cores, the lower the clocks and ST performance will be... So what's all the bickering about? Buy what suits your needs. Hell, my quad-core Intel from 2014 is still great for gaming, and that's because most games are still built for quad core.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
You are comparing going from 1 processor to 2 of the same kind. Of course that makes a difference. I'm comparing going from several - say 4 (or 64; it doesn't really matter for general Windows tasks like web browsing or office work) - "slow cores" to one very, very fast core, which is a completely different scenario. And a context switch on such a core would be a lot faster. Even on current ones it's in single-digit microseconds, and to say that matters when browsing AnandTech or the like is ridiculous.
part101_infographics_v08.png

A lot can be achieved in the clock cycles it takes for a context switch. Relative to clock cycles, you face the same fundamental issue no matter the clock speed: some operations take far more clock cycles than others. Responsiveness in a multi-process environment, as I have already detailed, is actually achieved by multiple cores. How efficiently you write your software for those cores is the next big issue. Least of all is ST performance... because it's been solved, with only iterative increases left.
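A back-of-envelope sketch of that point (the latencies are rough, commonly cited ballparks, not measurements): a fixed-latency event costs more cycles the higher the clock, so a hypothetical 20GHz core simply burns more cycles waiting.

```python
# Rough, illustrative latencies in nanoseconds (ballpark figures, not measurements).
EVENTS_NS = {
    "L1 cache hit": 1,
    "main-memory access": 100,
    "context switch": 5_000,        # ~5 us is a commonly quoted order of magnitude
}

for ghz in (2, 4, 20):              # 20 GHz = the hypothetical core from this thread
    print(f"{ghz} GHz core:")
    for name, ns in EVENTS_NS.items():
        cycles = int(ns * ghz)      # cycles = latency (ns) x clock (GHz)
        print(f"  {name:>20}: ~{cycles:,} cycles spent waiting")
```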
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
Power, basically. You give up die space and cores for higher clocks.
Exactly. It's essentially a well-served market with a product for everyone - thus there's no need for rants about performance in 2019. Given that it's a market and there's no one-size-fits-all, a company has to decide which segments it will target. From this come the distinctions - something that won't change. At scale, power and efficiency are important, which is why server chips are binned far above desktop chips and clocks are capped.

Still there are some server guys who are willing to pay that price and shed some cores to meet minimum core response times. See Xeon Gold/EPYC 7371 above.
And they are a tiny minority of the market... and the chips specifically targeted at them carry a serious premium, as they should. The same goes for the desktop market, with even less significance and money involved.

Reminds me of when one of our forum heavies (forget exactly who; mighta been aigo) got an early QX6700 and overclocked the hell out of it to run four instances of single-threaded Forex software. He needed rapid response times out of that software. Being able to run multiple instances in parallel was good, but a multi-socket server board with chips running at lower clocks would not fit his workload. The QX6700 fit his needs since he could overclock it to improve response.
Reminds me of a person with a similar issue who ran Linux with a custom scheduler patch and the efficiencies therein. I myself have designed low-level software coupled with hardware, targeted at the requirements of that specific industry. The hardware couldn't be changed, so the speedups had to come from software. That resulted in me modifying code that was over a decade old, which required multiple levels of management approval and a dedicated test team to ensure there weren't collateral bugs or regressions. It ended up being a couple of lines of elegant code, as opposed to the thousands it would have taken with the crazy approach a dev manager suggested. A lot of issues center on the knowledge gap between hardware-focused efforts and software-focused efforts. Whenever you combine the two, beautiful things happen. A lot has changed between those years and now. Complexity has increased significantly, and there aren't many people who maintain a broad knowledge of how everything ties together. On the desktop, there's an insignificant portion of users with real ST-performance issues, which is why chips aren't designed with their complaints in mind. There's a trade-off between multi-core performance and single-core efficiency. This is fundamental at the architectural level, so in an age of high core counts you're fundamentally losing single-core performance every time you increase the core count. This is the major point. What a company can do to cost-effectively improve ST performance, it will do, but the stance is sufficiency/adequacy, with the focus on scaling cores. It's literally the difference between Comp Arch 101 and multi-core chip architecture: you learn the basics of a single pipeline flow and then progress to the advanced and much more difficult problems of multi-core. Everybody making chips has stamped out the ST-performance iterations over the years. The focus is on multi-core scaling now. What bones are thrown to ST will be thrown, but there is no big expectation of improvement, nor often a requirement.

It isn't all about UI responsiveness or fps. There are people in professional applications that need more speed per thread. Audio editing is another area where you want ultra-low latency everywhere - CPU core speed, I/O latency, RAM, etc.
A large number of professional applications' core code is ancient. Google any number of them and you'll see their respective forums littered with complaints from users about large portions of the code being single-threaded just because it was written when the max core count was dual core - forget about core scaling. I recall a ridiculous exchange I had with a user regarding AutoCAD and why he had to go with Intel. Meanwhile, I did some quick research and found an in-depth review and analysis comparing Ryzen and Intel, and sure enough it found that the software is just ancient and crappy and does not need to be single-threaded. Furthermore, most workloads are multi-threaded, and an insignificant portion relies on a single thread. Audio editing:
https://www.gearslutz.com/board/mus...ssors-faster-cores-v-s-higher-core-count.html
Again, you've got to stay with the times. The issue was that a large number of plugins weren't multi-threaded and didn't scale with cores. That's been resolved, as one would expect, and modern processors meet the clock requirements, as one would expect. The last piece of the puzzle is the software being updated to modern hardware specs - something that is progressing, not regressing. So, do you design your processor for yesteryear's software or tomorrow's? Hardware is driving the changes now, not software. It wasn't always this way, which is why debates about historical precedent are moot now. It's cyclical, and hardware is currently leading software, not the other way around.

If you have requirements, there is hardware to serve you. It simply comes down to how much money you have to resolve the issue. If you want to pay double AMD's CPU price for a dead socket and chip design with better ST performance for your vidya, you can do that. AMD is focused on the future, not the past, and the Ryzen line and core complexes are server-grade designs. Two different design philosophies and focus areas. What's the future? What's the past? Who's losing by catering to yesteryear's hot-rod ST performance, and who's winning by focusing on tomorrow's core scaling? Not enough ST performance? Go pay double for the competitor's ST-focused chips. The end.
 
Last edited:

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
The user stated quite clearly that's what they're referring to, and several users beyond myself replied as such.

I'm reading your posts - and I keep seeing you state single-thread in the context of single-core.

If you're really as educated on the matter as you claim, then you should already know that conflating the two is very inaccurate and misleading.

Either take the time to get it right, or stop with the "I know what I'm talking about and no one else does" attitude.


I suggest you re-read what kicked this off and stop for a minute and consider the absurdity of even mentioning a 20GHz processor.

If you'd bothered to actually read that post from your high horse, you'd see it was clearly an example to illustrate a point.

It was actually you who started this whole load of crap with #26 - which in itself was merely a means for you to try to appear clever, but which came across as pretty much the opposite.

Lisa Su herself said:

Lisa Su said:
“Our first priority is overall system performance, but we know how important single-thread performance is,” Su said. “So you will see us push single-threaded performance.”

very far removed from:

You said:
Single threaded performance is for the birds and soon to be defunct software.


https://www.pcworld.com/article/3332205/amd/amd-ceo-lisa-su-interview-ryzen-raytracing-radeon.html

Furthermore, when a user makes comments that reflect they clearly haven't taken 1st semester undergrad courses, I'm going to remind them of it.

The construction of your own argument certainly isn't reflecting much beyond first semester undergrad.


Where have you been throughout the history of computing? It's been all about single threaded performance and clocks for as long as I can remember. <SNIP>

What does this entire paragraph even mean?

When did I remark on the history of CPUs?!? What on earth are you bringing in that for?!? I said no one is asking for us to go back to single core CPUs - I don't see how that means I'm saying they've never existed.

Take a minute and read what is actually being put to you, not what you think is being put to you.





Modern-day processors - as in, since the release of Ryzen, where the core count was doubled from quad (thanks, Intel) to octa - have sufficient ST performance.

Sufficient? For what? Maybe for what you are doing.

But I could point to many problems where computing power is still the limiting factor, and within those problems, many of the bottlenecks in the workflow reduce to areas largely defined by ST performance.

For instance, CFD - an area most would consider very parallel - yet when it comes to pre and post processing - not so much - so you are back to being bottlenecked by the performance of one (or a small number) of threads. In which case ST performance really matters. Not the be-all and end-all, but very important nonetheless.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
I'm reading your posts - and I keep seeing you state single-thread in the context of single-core.

If you're really as educated on the matter as you claim, then you should already know that conflating the two is very inaccurate and misleading.

Either take the time to get it right, or stop with the "I know what I'm talking about and no one else does" attitude.
My stance and reply won't change because the post I was responding to hasn't.
The assertion of a 20GHz single-core monster was presented and I pointed out the absurdity. If you want to change the topic of discussion, feel free. However, I already addressed that too with my statements about market coverage. If you claim single-threaded performance isn't adequate on one company's multi-core chip, go and buy the competitor's for double the price. Niche products have niche pricing. There's nothing I have to debate about the performance differences between a 9900K and a 2800X, just the absurdity of continuing to rail against a mainstream processor's adequacy in terms of single-threaded performance without considering how inefficient the software with the questionable performance is. If you're offended by this well-formed assertion, so be it. It's a position you are welcome to refute. I couldn't care less about my personal association with it. As for my critique of others, I have supported it with technical details. I have yet to see anyone else do the same. Again, you're more than welcome to. If you know what you're talking about, you know where to find the facts to support your assertions, as opposed to continuing to try to misrepresent others.

If you'd bothered to actually read that post from your high horse, you'd see it was clearly an example to illustrate a point.
I suggest you take your own advice, remove horses and riders from the picture, and re-read the thread without emotion, because my posts are on-topic, supported by technical information, and in direct response to prior posts.


It was actually you who started this whole load of crap with #26 - which in itself was merely a means for you to try to appear clever, but which came across as pretty much the opposite.
https://forums.anandtech.com/thread...2-cpu-core-chiplet-only.2556539/post-39703364 (Post #18)
https://forums.anandtech.com/thread...2-cpu-core-chiplet-only.2556539/post-39703383 (Post #19)
Post #20,#21,#22, #23
https://forums.anandtech.com/thread...2-cpu-core-chiplet-only.2556539/post-39703627 (#Post #24)
Post #25
That is where the ST-performance absurdity began, and then came my well-informed post that put the ST nonsense debate to rest, with technical support from a game developer and the modifications they made to their game engine - a post which received a good number of thumbs-up replies and which you claim was a load of crap. Then you come in, having not participated in the whole thread, with personal slants. The glaring issue you're having is that you're coming after me personally, as opposed to the technical information and analysis I put forth. I'm not sure I'm concerned with what some random person thinks about me. I'm more for putting well-formed views forward with supporting arguments. If you disagree, post your position with supporting evidence. If not, get off the soapbox. You're contributing nothing.


Lisa Su herself said:
very far removed from:
https://www.pcworld.com/article/3332205/amd/amd-ceo-lisa-su-interview-ryzen-raytracing-radeon.html
The construction of your own argument certainly isn't reflecting much beyond first semester undergrad.
Yes, it essentially shows how trivial the ST ranting is. As I already stated, you learn this in first-semester undergrad. If you care to get personal, I have a grad degree and work in this space for a living. Not that it matters, or is necessary, to point out the absurdity of a 20GHz single-core processor or the idea that UIs are responsive because of it. UIs are responsive due to multi-threading and multi-core processors. If a person actually knew this and wasn't full of hot air, they'd never assert the foolishness I dismantled with first-semester undergrad coursework. This makes me believe a good number of the typical characters with these malformed arguments have no formal education or professional experience in this subject matter. So, I have no clue who such people think they're fooling, but people with actual education and experience know who's who and who's not. It's clear when you put forth absurd arguments that are easily dismissed with outside fact-checking.

What does this entire paragraph even mean?

When did I remark on the history of CPUs?!? What on earth are you bringing in that for?!? I said no one is asking for us to go back to single core CPUs - I don't see how that means I'm saying they've never existed.

Take a minute and read what is actually being put to you, not what you think is being put to you.
You're 0/20. I'm not giving you any more consideration.


Sufficient? For what? Maybe for what you are doing.

But I could point to many problems where computing power is still the limiting factor, and within those problems, many of the bottlenecks in the workflow reduce to areas largely defined by ST performance.

For instance, CFD - an area most would consider very parallel - yet when it comes to pre and post processing - not so much - so you are back to being bottlenecked by the performance of one (or a small number) of threads. In which case ST performance really matters. Not the be-all and end-all, but very important nonetheless.
Say what?
https://engineering.stanford.edu/ma...hers-break-million-core-supercomputer-barrier

“Computational fluid dynamics (CFD) simulations, like the one Nichols solved, are incredibly complex. Only recently, with the advent of massive supercomputers boasting hundreds of thousands of computing cores, have engineers been able to model jet engines and the noise they produce with accuracy and speed,” said Parviz Moin, the Franklin M. and Caroline P. Johnson Professor in the School of Engineering and Director of CTR.

CFD simulations test all aspects of a supercomputer. The waves propagating throughout the simulation require a carefully orchestrated balance between computation, memory and communication. Supercomputers like Sequoia divvy up the complex math into smaller parts so they can be computed simultaneously. The more cores you have, the faster and more complex the calculations can be.

CFD is all about core scaling, I/O, and high-speed interconnects - exactly what AMD has been targeting since Ryzen launched.

https://www.hpcwire.com/2018/12/13/contract-signed-for-new-finnish-supercomputer/
^totally inadequate ST performance

I'm done with this foolishness. I'm obviously in the wrong place.
 
Last edited:
Dec 10, 2018
63
84
51
The future is massive multi-threaded. Single threaded performance is for the birds and soon to be defunct software.

This is the quote that started off the entire argument....

And the entire conflict up to now has been the issue that, yes, you are right: the future is massively multi-threaded. HOWEVER, that does not mean that single-threaded performance no longer matters (whether ST performance now is sufficient or not is a semantic debate on what sufficient means). For example, I'm sure on many tasks there is still ST overhead to divide the work (I only claim undergrad knowledge of this subject, so feel free to correct me). If you can't divide the work up fast enough, you're still going to be bottlenecked in the end by ST performance.
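That reasoning step has a standard formalisation, Amdahl's law; here is a minimal sketch with arbitrary example serial fractions (not figures from this thread):

```python
def amdahl_speedup(serial_fraction, cores):
    """Ideal speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for serial in (0.02, 0.10, 0.30):             # example serial fractions
    ceiling = 1.0 / serial                    # limit as core count goes to infinity
    print(f"serial part {serial:.0%}: "
          f"8 cores -> {amdahl_speedup(serial, 8):.1f}x, "
          f"64 cores -> {amdahl_speedup(serial, 64):.1f}x, "
          f"ceiling {ceiling:.1f}x")
```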

I don't think most competent/knowledgeable people in this thread would argue against you regarding the importance of MT performance. But please don't mistake others' qualification of your statement as a personal attack. I'm here trying to keep my interest/passion in computer architecture up in the face of my boring classes, and it makes me sad if people who claim to work in the field I want to work in can't have a sane discussion.
 

Gideon

Platinum Member
Nov 27, 2007
2,012
4,986
136
The whole argument of "sufficient" single-threaded performance is stupid. It can at best only be adequate for the time being. If any competitor were to release a new "Conroe" with a 50% IPC jump, you bet your a** that would be the new standard for ST apps before long.

Since we are process limited on clock speed, we will almost assuredly see IPC growth in the coming years (since it's the only avenue left)

Also, the notion that every single piece of programming from now on will be massively parallel ... just shows that you've never had to ship a single line of code for a fixed date or budget.

Code, especially perf-critical code, will get more parallelized for sure. There will still be a boatload of software where the added effort makes no fiscal sense.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Say what?

What did I say about reading what was actually being put to you rather than what you think is being put to you?

For instance, CFD - an area most would consider very parallel - yet when it comes to pre and post processing - not so much - so you are back to being bottlenecked by the performance of one (or a small number) of threads. In which case ST performance really matters. Not the be-all and end-all, but very important nonetheless.

I am pointing out that even in very parallel workflows, there are parts which bottleneck and are sensitive to ST performance.
Running the simulation is one thing; meshing it up and then interpreting the results are quite another.



Anyway, you are too entrenched in this ridiculous position you've taken to admit you're talking crap, so I'll leave you to it.
 

DrMrLordX

Lifer
Apr 27, 2000
22,594
12,480
136
So, do you design your processor for yesteryear's software or tomorrow's?

Both, really. There's still a lot of yesteryear's software out there, and increasing core count won't automatically get rid of it. Hell, x265 is fairly modern, but it has a thread limit beyond which your only avenues for improvement are SIMD, improving non-SIMD IPC, and increasing clocks.
 

Hitman928

Diamond Member
Apr 15, 2012
6,633
12,203
136
What did I say about reading what was actually being put to you rather than what you think is being put to you?



I am pointing out that even in very parallel workflows, there are parts which bottleneck and are sensitive to ST performance.
Running the simulation is one thing; meshing it up and then interpreting the results are quite another.



Anyway, you are too entrenched in this ridiculous position you've taken to admit you're talking crap, so I'll leave you to it.

I don't do CFD sims, but I do 3D EM sims a lot, which are probably very similar calculations. I experience the same thing: the actual matrix reductions and solving algorithm are usually ridiculously parallel work and scale very well across many threads, but the meshing work and matrix setup (pre-calc) are almost completely single-threaded by nature and can't really be multi-threaded.

Depending on the structure and desired resolution, this part of the sim can be anywhere from 5% to 50% of the sim time (though it tends toward the lower end of that range). One annoying thing about it is that while the matrix solver is running you can start to see results at various points and verify that everything is set up correctly and the results pass the sanity test, but the pre-calc work has to be done first, which can mean a lot of waiting. Post-processing is also pretty much single-threaded, but for these calculations it is rarely a significant portion of the sim time.
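Plugging those pre-calc fractions into the usual serial-bottleneck arithmetic (a sketch that assumes the solver part scales perfectly, which it never quite does): as core count grows, the single-threaded pre-calc becomes most of the wall-clock time.

```python
def precalc_share(f_serial, cores):
    """Fraction of total wall time spent in the single-threaded pre-calc,
    assuming the solver part scales perfectly across `cores`."""
    wall = f_serial + (1.0 - f_serial) / cores
    return f_serial / wall

for f in (0.05, 0.50):                       # the 5% and 50% cases quoted above
    for cores in (4, 16, 64):
        print(f"pre-calc {f:.0%}, {cores} cores: "
              f"{precalc_share(f, cores):.0%} of wall time is the serial wait")
```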

Mark Papermaster just did an interview where he again addresses single-thread (specifically gaming) performance. Still no details given, but he says that 3rd-gen Ryzen will offer "very exciting gains" in this area. I guess we'll see in a few months whether it's really an exciting uplift.

https://forums.anandtech.com/threads/amd-previews-ryzen-3rd-generation-at-ces.2559783/post-39705673
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
The difference between heckling and accurate assertions backed by industry-standard information:
https://www.lanl.gov/conferences/salishan/salishan2011/3moore.pdf
Screen Shot 2019-01-15 at 2.45.19 PM.png
Frequency and single-threaded performance... the focus of processors for decades hits a fundamental wall, and the focus shifts to multi-core/multi-threaded performance. Single-threaded performance now primarily achieves gains with gate shrinks rather than fundamental architectural changes, as one would expect from a mature, well-engineered execution pipeline refined over decades.


Screen Shot 2019-01-15 at 2.43.35 PM.png
Single-threaded performance hits its wall... the industry transitions to multi-core. Multi-core architectures have negative impacts on single-threaded performance, hence the divergence in single-thread performance going forward, as reflected in both slide captions. Later comes HSA, which uses custom accelerators for various workflows that cannot be accelerated by CPUs/GPUs, both of which have fundamental limitations. Take, for instance, the dedicated hardware encode/decode ASICs found on GPUs for H.264 and H.265.

http://www.gotw.ca/publications/concurrency-ddj.htm

CFD
A cute deflection from the core reality that can't be refuted: single-threaded performance has fundamentally plateaued. The hacks and optimizations have largely been figured out. Present-day improvements are not, and will not be, in line with the past. This is proven by the graph I posted and by numerous other individuals who've put forth their own graphs. Out-of-order execution only works if the current instruction has no dependency on the prior instruction's data output. Random data access leads to cache misses and tanks performance. Modern pipelines have been tweaked to reasonable limits, with room for small iterative increases. With gate shrinks comes the ability to efficiently target elements that may yield single-threaded performance boosts. However, the hard work and the big gains have been done over decades in this area - known to anyone who knows computing history or has taken an intro computing course - which is why single-threaded performance can be accurately stated as sufficient. Given that memory is the true bottleneck, with memory stalls during which the CPU can't compute anything for a thread, the speedup for single-threaded performance will actually come from memory advancements, as it has for some time.

The solver portion of CFD is embarrassingly parallel, which is why compute clusters and supercomputers are used. This is where the embarrassing speedup comes from... not from a 300MHz bump in processor speed that executes instructions faster.
Single-threaded performance is going to be what it's going to be: sufficient.
IF/ID/EX/MEM/WB... Comp Arch 101. Take even the most complex microarchitecture and it all comes down to the basics: IF/ID/EX/MEM/WB. The bottleneck is memory access, which is why so much work goes into memory hacks and optimizations: caching, prefetch, and things like out-of-order execution.

So, one can rant all one wants about a specific use case to deflect from the core discussion. The debate can become as complicated as one would like; however, that doesn't change the physical reality and the basics on which all chip architectures are built. As it relates to CFD: workstations with lower core counts, higher clocks, and better single-threaded performance for pre/post; compute clusters for solvers. The future has been multi-core and multi-threaded for some time. A workstation isn't a single core - the current recommendation is 8 cores, and AMD is recommended as of Ryzen (because it's sufficient). Absolutely everyone who legitimately works in the industry will laugh at you if you suggest otherwise, for basic reasons that are taught freshman year in college or are standard industry knowledge.
For execution:
A->B->C->D with chain dependencies can only execute as fast as you can clock the processor, get data into the CPU, and simplify the pipeline for that flow. This goes directly against multi-core design requirements, which is why the graph shows single-threaded performance diverging, and in some cases decreasing, going forward.
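A minimal sketch of that A->B->C->D point (step() below is made-up work, not from the thread): when each step consumes the previous step's result, extra cores can't shorten the chain - only a faster core or less work per step can - while the same steps finish much sooner when they're independent.

```python
from concurrent.futures import ProcessPoolExecutor
import time

def step(x):
    # Made-up unit of work: a few hundred thousand dependent integer ops.
    for _ in range(300_000):
        x = (x * 1664525 + 1013904223) % (1 << 32)
    return x

if __name__ == "__main__":
    N = 16

    # Dependent chain A -> B -> C -> ...: each step needs the previous result,
    # so the critical path is N steps long regardless of core count.
    t0 = time.perf_counter()
    x = 1
    for _ in range(N):
        x = step(x)
    chain = time.perf_counter() - t0

    # Same total work, but the N steps are independent, so they spread across cores.
    t0 = time.perf_counter()
    with ProcessPoolExecutor() as ex:
        list(ex.map(step, range(1, N + 1)))
    independent = time.perf_counter() - t0

    print(f"dependent chain: {chain:.2f}s   independent steps: {independent:.2f}s")
```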

ansys-mechanical-recommendation.jpg


The bottleneck for single-threaded performance is memory access.
When your CPU doesn't have to stall for 10-100 cycles for the data it needs to push a thread through the pipeline, your thread will execute faster. Comp Arch 101.

This is the quote that started off the entire argument....
It was an accurate one, as backed by data. The assumption that I was an idiot was directly stated by one user, which is why this went off the rails. I find this stance common in someone who actually doesn't know what they're talking about (projection), and I eventually walk away from such venues. A huge number of threads are filled with the usual detractors and the same template nonsensical arguments.

I don't think most competent/knowledgeable people in this thread would argue against you regarding the importance of MT performance. But please don't mistake others' qualification of your statement as a personal attack. I'm here trying to keep my interest/passion in computer architecture up in the face of my boring classes, and it makes me sad if people who claim to work in the field I want to work in can't have a sane discussion.
And yet that's exactly what certain users did, without competency/knowledge, which is why this went off the rails. But I'm the bad guy for calling them out with supporting data?

Sad? You are in for a rollercoaster of emotions in this industry if you think you can make inaccurate assertions, feign knowledge you don't have, and talk down someone's data-backed points with nonsense... I've directly witnessed cases where people were fired on the spot for doing so. If you don't know what you're talking about, you'd better ask questions and stay silent. Trying to talk over someone with unbacked foolishness is a quick way to get yourself put out of a job, especially in a no-fault employment state.

My time here has expired.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
My time here has expired.

I've debated whether it is worthwhile writing this, as you're never going to see the light, but anywayz.

Kaby Lake would be around 30% quicker than Sandy Bridge in ST operations [normalised] - which is quite the contrast to your little graph, which would postulate a regression in ST performance since 2010.


Regarding the CFD gump you've spewed - yet again - you are not reading what is being posted. You have yet to acknowledge the problems caused in pre- and post-processing by aspects of the problem that are sequential in nature.

Furthermore, since you are presenting yourself as some kind of subject-matter expert, you should know CFD is not "embarrassingly" parallel*, as the communication burden between threads quickly grows - you only benefit from increasing cores as your job size scales up.

Funny that you should actually state this - which is, of course, quite (but not entirely) wrong with regard to CFD:

The bottleneck for single-threaded performance is memory access.

*hence why I was careful not to use the term "embarrassingly parallel" but instead "very parallel".


Anyway, I give up. You can remain happy in your little bubble where apparently continued improvement in ST performance doesn't matter to AMD or Intel.
 

beginner99

Diamond Member
Jun 2, 2009
5,314
1,756
136
I've debated whether it is worthwhile writing this, as you're never going to see the light, but anywayz.

Probably not worthwhile, because even if he saw it, he failed to get it. The graphs don't matter. Nobody said multi-core isn't happening. We said single-threaded performance is still important, and on top of that I said that multi-core is simply the result of hitting the clock wall. He(?) failed to think abstractly, arguing from the current state while failing to see that the current state would be different if there weren't a clock wall. If a 20 GHz CPU were easier than four 4 GHz ones, that is where we would be, because it would also make the software easier. Path of least resistance and most profit. Even in the current situation there is a lot of expensive commercial software that would greatly benefit from multi-threading yet isn't multi-threaded and relies 100% on single-core performance. Why? Because multi-threading is hard to get right and hence costly.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
Probably not worthwhile, because even if he saw it, he failed to get it. The graphs don't matter. Nobody said multi-core isn't happening. We said single-threaded performance is still important, and on top of that I said that multi-core is simply the result of hitting the clock wall. He(?) failed to think abstractly, arguing from the current state while failing to see that the current state would be different if there weren't a clock wall. If a 20 GHz CPU were easier than four 4 GHz ones, that is where we would be, because it would also make the software easier. Path of least resistance and most profit. Even in the current situation there is a lot of expensive commercial software that would greatly benefit from multi-threading yet isn't multi-threaded and relies 100% on single-core performance. Why? Because multi-threading is hard to get right and hence costly.
It's not; this is just as backwards a way of thinking as what ub4ty does. Even if we had a 20GHz core, two 20GHz cores would still double the performance of a lot of workloads, and three or four or eight of them would still multiply the performance by that much.