
Speculation: Ryzen 4000 series/Zen 3


gdansk

Senior member
Feb 8, 2011
524
211
116
Speculation aside, what the heck is the fascination with SMT4? Because it really hasn't been done before (or at all in x86-land)? If AMD saw a reason for it, I'm sure they'd be working on it. With the massive amount of real cores available these days, I really question how much doubling threads would help outside of very specific use cases.
I think it's mainly that more than 2 threads per core has been done by other CPUs, in the SPARC and Power families. Those were also focused on high IO bandwidth, much like AMD's current EPYC offerings. But AMD has to design for both consumers and servers and SMT2 is a pretty good compromise.

Some example benchmarks from 7zip on a SPARC T5, 1 core 8 threads:

Threads   Compression (MIPS)   Decompression (MIPS)
1         2240                 2100
2         3600                 3230
4         4320                 4570
8         4600                 5460

As you can see the benefit diminishes.
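To put numbers on the diminishing returns, here is a minimal sketch that turns the figures above into speedups relative to one thread. The `results` dictionary is just the table from this post; the function name `scaling` is made up for illustration.

```python
# 7-Zip results quoted above for one SPARC T5 core, in MIPS:
# threads -> (compression, decompression)
results = {
    1: (2240, 2100),
    2: (3600, 3230),
    4: (4320, 4570),
    8: (4600, 5460),
}


def scaling(results):
    """Return {threads: (compress_speedup, decompress_speedup, compress_per_thread)}.

    Speedups are relative to the 1-thread row; the per-thread value is the
    compression speedup divided by the number of threads.
    """
    base_c, base_d = results[1]
    out = {}
    for threads, (comp, decomp) in sorted(results.items()):
        c_speedup = comp / base_c
        d_speedup = decomp / base_d
        out[threads] = (c_speedup, d_speedup, c_speedup / threads)
    return out


if __name__ == "__main__":
    for threads, (c, d, per_thread) in scaling(results).items():
        print(f"{threads} threads: compression x{c:.2f}, "
              f"decompression x{d:.2f}, "
              f"per-thread compression {per_thread:.0%} of one thread")
```

Going from 1 to 2 threads adds roughly 61% compression throughput, while going from 4 to 8 adds only about 6%.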
 

jamescox

Member
Nov 11, 2009
149
264
136
Speculation aside, what the heck is the fascination with SMT4? Because it really hasn't been done before (or at all in x86-land)? If AMD saw a reason for it, I'm sure they'd be working on it. With the massive amount of real cores available these days, I really question how much doubling threads would help outside of very specific use cases.
As others noted, it has been done before. I am not sure what the first implementation was. There was the UltraSPARC T1 all the way back in 2005. That one had 279 million transistors on a 90 nm process with an area of 378 mm² (from Wikipedia) for up to eight 4-thread cores. There were some weirder CPU designs even earlier, like the Cray MTA processors (128 hardware threads per core). The IBM POWER9 does SMT4 and SMT8 now.

The early multithreaded processors really didn’t have the resources to run both single-threaded and multithreaded code well. The T1 was up to 8 cores with only 279 million transistors. Modern processors have billions of transistors to play with. Most people I know turned HT off on Intel processors up until Intel made a bunch of improvements in Haswell or Broadwell; I forget which. Even after Broadwell, it often doesn’t increase performance at all for HPC and workstation applications.

I don’t really expect AMD to do SMT4 in Zen 3. It does make some sense though. They already have a good SMT2 implementation, so extending that to 4 threads could be plausible design-wise. Also, there doesn’t seem to be a core count increase with Zen 3, so more threads per core could make up for that. It does make sense for some server applications, and AMD is server-first these days.

I don’t think tackling SMT4 with Zen 3 would have been their best course of action. It isn’t useful outside of niche (but very profitable) applications. We pretty much know that Zen 3 is a massive rework of the cache architecture. The CCX is going to 8 cores and 32 MB of L3, which is a massive change by itself. I think they are also going to increase cache bandwidth significantly to support increased floating point throughput. We pretty much already know that too; we just don’t know how much improvement there will be. There is a lot of discussion about SMT4 because it is still a bit of an unknown. With all of the other somewhat known changes, it just seems like too much to also add SMT4.

We don’t really need to go up to more threads. Perhaps when they switch to DDR5 and PCIe 5.0, they will also increase the thread count. That may also be the time to switch the IO die to an interposer with a massive L4 cache. They are still at 8-channel memory for up to 128 threads now. They are also in the strange situation where they have more IO bandwidth than memory bandwidth. That will be great for supercomputer designs; massive overkill for just about anything else. I think they will want more memory bandwidth before they increase the thread count. A super large L3 cache variant will help with that. I think the applications where Intel wins will be almost non-existent with Zen 3 unless Intel really pulls a rabbit out of a hat soon.
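A rough back-of-the-envelope sketch of the IO-versus-memory bandwidth point. The platform figures are assumptions that the post does not state: 8 channels of DDR4-3200 and 128 lanes of PCIe 4.0 (16 GT/s with 128b/130b encoding), which roughly matches a single-socket Rome system. Both helper names are invented for illustration, and the results are theoretical peaks, not measured numbers.

```python
def ddr_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Peak DRAM bandwidth in GB/s: channels * MT/s * bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


def pcie_bandwidth_gbs(lanes, giga_transfers_per_s=16.0, encoding=128 / 130):
    """Peak PCIe bandwidth in GB/s for ONE direction.

    Each transfer carries one bit per lane, so divide by 8 for bytes and
    multiply by the line-code efficiency (128b/130b for PCIe 3.0 and later).
    """
    return lanes * giga_transfers_per_s * encoding / 8


# Assumed platform: 8 x DDR4-3200 channels, 128 PCIe 4.0 lanes, 128 threads.
mem = ddr_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
io_one_way = pcie_bandwidth_gbs(lanes=128)
threads = 128

if __name__ == "__main__":
    print(f"memory: {mem:.1f} GB/s peak ({mem / threads:.2f} GB/s per thread)")
    print(f"PCIe:   {io_one_way:.1f} GB/s per direction, "
          f"{2 * io_one_way:.1f} GB/s both directions")
```

Under those assumptions the memory peak is about 205 GB/s (1.6 GB/s per thread), while PCIe alone offers about 252 GB/s in each direction, which is the imbalance described above.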
 

Ajay

Diamond Member
Jan 8, 2001
7,593
2,753
136
Yep. 17% IPC overall and no SMT4 for Zen 3. Also no real clock bumps for Ryzen 4000 series.

Hence the broken clock comment. The guy finally gets to get something that's not the Radeon VII right.

Good on him.
We don’t know if he got anything right because Zen3 hasn’t shipped yet. It looked like he had a source from RTG or an AIB who gave him some info on Vega20 - that doesn’t automatically mean that he has any real contacts in the CPU teams or supply chain. If you throw enough spaghetti up against the wall, some of it will stick.
 

exquisitechar

Senior member
Apr 18, 2017
443
502
106
We don’t know if he got anything right because Zen3 hasn’t shipped yet. It looked like he had a source from RTG or an AIB who gave him some info on Vega20 - that doesn’t automatically mean that he has any real contacts in the CPU teams or supply chain. If you throw enough spaghetti up against the wall, some of it will stick.
Yeah, the Radeon VII is just about the only leak of his that’s been right. All the CPU-related stuff he’s put out has just been him parroting the current rumors. He is probably mostly right in this case, though.
 

Thunder 57

Golden Member
Aug 19, 2007
1,573
1,553
136
As others noted, it has been done before. I am not sure what the first implementation was. There was the UltraSPARC T1 all of the way back in 2005. That one was 279 million transistors on a 90 nm process with an area of 378 mm2 (from Wikipedia) for up to eIght 4 thread cores. There was some weirder cpu designs even earlier like the Cray MTA processors (128 hardware threads per core). The IBM power 9 does SMT4 and SMT8 now.

The early multithreaded processors really didn’t have the resources to run both single thread and multithreaded code well. The T1 was up to 8 cores with only 279 million transistors. Modern processors have billions of transistors to play with. Most people I know turned HT off on intel processors up until intel made a bunch of improvements in Haswell or Broadwell; I forget which. Even after Broadwell, it often doesn’t increase performance at all for HPC and workstation applications.

I don’t really expect AMD to do SMT4 in Zen 3. It does make some sense though. They already have a good SMT2 implementation, so extending that to 4 threads could be plausible design-wise. Also, there doesn’t seem to be a core count increase with Zen 3, so that could make up for that. It does make sense for some server applications and AMD is server first these days.

I don’t think tackling SMT4 with Zen 3 would have been their best course of action. It isn’t useful outside of niche (but very profitable) applications. We pretty much know that Zen 3 is a massive rework of the cache architecture. The CCX is going 8 core and 32 MB of L3, which is a massive change by itself. I think they are also going to increase cache bandwidth significantly to support increased floating point throughout. We pretty much already know that also, we Just don’t know how much improvement there will be. There is a lot of discussion about SMT4 because it is still a bit of an unknown. With all of the other somewhat known changes, it just seems like too much to also add SMT4. We don’t really need to go up to more threads. Perhaps when they switch to DDR5 and PCi-e 5.0, they will also increase the thread count. That may also be the time to switch the IO die to an interposer with massive L4 cache. They are still at 8 channel memory for up to 128 threads now. They are also in the strange situation where they have more IO bandwidth than memory bandwidth. That will be great for supercomputer designs; massive overkill for just about anything else. I think they will want more memory bandwidth before they increase the thread count. A super large L3 cache variant will help with that. I think the applications where Intel wins will be almost non-existent with Zen 3 unless intel really pulls a hat out of a rabbit soon.
Poor wording on my part. I should have just said it hasn't been done on x86, which isn't even entirely true. I was aware of POWER, but not the others. I still think it really only helps in certain areas. I'm not saying we won't see it, just not with Zen 3. It may be useful for some server applications, so we may see it at some point.
 

jamescox

Member
Nov 11, 2009
149
264
136
Poor wording on my part. I should have just said hasn't been done on x86, which isn't even entirely true. I was aware of POWER, not the others though. I still think it really only helps in certain areas though. I'm not saying we won't see it, just not with Zen 3. It may be useful for some server applications so we may see it at some point.
I essentially agree with that. I don’t think we will see it with Zen 3. It isn’t out of the realm of possibility though, especially since they already have a seemingly very good SMT2 implementation. The applications where it helps are niche, but it is a very profitable niche. They seem to be making some large cache variants that are also niche, but very profitable. Those will be for database servers and HPC machines. With all of the factors in favor and against, I don't think I can rule it out, but it does seem an unlikely feature for Zen 3. I think people are looking at the somewhat known features and are wondering what the “surprise” is going to be, hence this fascination with the possibility of SMT4.
 

Thunder 57

Golden Member
Aug 19, 2007
1,573
1,553
136
I essentially agree with that. I don’t think we will see it with Zen 3. It isn’t out of the realm of possibility though, especially since they already have a seemingly very good SMT2 implementation. It’ll the applications where it helps are niche, but it is a very profitable niche. They seem to be making some large cache variants that are also niche, but very profitable. Those will be for database servers and HPC machines. With all of the factors in favor and against, i don't think I can rule it out, but it does seem an unlikely feature for Zen 3. I think people are looking at the the somewhat known features and are wondering what the “surprise” is going to be, hence this fascination with the possibility of SMT4.
Maybe people shouldn't be looking for a "surprise" feature. What was the one for Zen 2? The only thing that really surprised me was the additional AGU. That was only a matter of time, but I don't recall anyone talking about it before launch. I guess I'd have to go over the launch again and see what may have been the "surprise" feature.
 

Richie Rich

Senior member
Jul 28, 2019
470
227
76
I should have just said hasn't been done on x86, which isn't even entirely true.
Intel Xeon Phi Knights Corner had 4-way SMT in 2011. Intel used SMT4 for a slow in-order 2xALU Atom core. They tried to eliminate stall states (typical for an in-order core) to keep throughput at its maximum. Where an OoO uarch compensates for stalls with speculative execution, SMT4 will give an energy efficiency advantage (less speculative exec -> fewer mispredictions -> higher IPC/core). It would be especially useful for 64-core Epyc systems (clocks heavily limited by TDP). Every system which can load more than 256 threads would probably benefit. SMT4 is purely a large-scale server feature, and AMD is aiming for this segment.

Regarding the +17% INT IPC and +40-50% FP IPC from the rumors: I remember that AMD stated the Zen uarch was aiming for +40% IPC over BD. However, it was +52% in reality. So Zen 3 could be +20-25% INT IPC in reality. It's pretty certain Zen 3 will be a significantly wider new uarch... at least in the FPU (doubling the FP units and keeping the 256-bit width). And that +17% INT IPC would be +35% IPC over Zen 1 (that's a lot). I wouldn't be surprised if Zen 3 goes wider with ALUs too. I still believe Zen 3 is Keller's Alpha EV8 resurrection at AMD (super wide + SMT4) :)
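The IPC arithmetic above can be sketched as follows. Two things here are assumptions rather than statements from the post: a Zen 2 uplift of roughly +15% over Zen 1 (the figure that makes the +35% work out), and reading the "+20-25%" guess as the rumored gain scaled by the same target-versus-actual ratio as the original Zen. Both function names are hypothetical.

```python
def compound(*gains):
    """Compound a chain of fractional gains, e.g. 0.15 followed by 0.17."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


def scale_by_overshoot(rumored_gain, past_target, past_actual):
    """Scale a rumored gain by how much a past target was exceeded."""
    return rumored_gain * past_actual / past_target


ZEN2_OVER_ZEN1 = 0.15   # assumption, not stated in the post
ZEN3_RUMOR = 0.17       # rumored INT IPC gain quoted in the thread

if __name__ == "__main__":
    total = compound(ZEN2_OVER_ZEN1, ZEN3_RUMOR)
    print(f"Zen 3 over Zen 1: +{total:.1%}")

    # Zen was announced as +40% over Bulldozer and landed at +52%.
    optimistic = scale_by_overshoot(ZEN3_RUMOR, past_target=0.40, past_actual=0.52)
    print(f"Rumor scaled by the Zen overshoot: +{optimistic:.1%}")
```

With those inputs the compounded gain comes to about +34.6%, and the scaled rumor to about +22%. Whether AMD would beat its target again by the same margin is pure speculation.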
 

Thunder 57

Golden Member
Aug 19, 2007
1,573
1,553
136
Intel Xeon Phi Knights Corner had 4-way SMT in 2011. Intel used SMT4 for slow In-Order 2xALU Atom core. They tried to eliminate stall states (typical for in-order core) to keep max throughput. When OoO uarch compensates stalls by speculative execution, the SMT4 will give energetic efficiency advantage (less speculative exec -> less missprediction -> higher IPC/core). Especially for 64-core Epyc systems (heavy TDP limited clocks) it would be useful. Every system which can load more than 256 threads would probably benefit. SMT4 is purely large scale server feature and AMD is aiming for this segment.

Regarding the +17% INT IPC and +40-50% FP IPC from rumors. I remember that AMD stated Zen uarch is aiming for +40% IPC over BD. However it was +52% in reality. So Zen3 could be +20-25% INT IPC in reality. It's pretty sure Zen3 will be significant wider new uarch... at least in FPU (doubling FP units and keeping 256-bit width). And those +17% INT IPC would be +35% IPC over Zen1 (that's a lot). I wouldn't be surprised when Zen 3 is going wider with ALUs too. I still believe Zen3 is Keller's Alpha EV8 resurrection in AMD (super wide +SMT4) :)
Ehh, there were no SMT4 Atoms. Knights Corner was a co-processor. Knights Landing may have offered an alternative but, as pointed out, the two are not comparable.

However, what I really want to know, is the answer to my question below:


You called someone naive for thinking AMD wouldn't create bogus slides in a presentation because it could be leaked. I asked you this:

Will you admit you were naive when Zen 3 comes out and there is no SMT4? Honest question.

I will admit I was wrong if so. Would you?
 

Richie Rich

Senior member
Jul 28, 2019
470
227
76
Ehh, there were no SMT4 Atoms. Knight Corner was a co-processor. Knights Landing may have offered an alternative but as pointed out, the two are not comparable.
The back-end was based on the Atom core. So it's very comparable.

However, what I really want to know, is the answer to my question below:
You called someone naive for thinking AMD wouldn't create bogus slides in a presentation because it could be leaked. I asked you this:
I will admit I was wrong if so. Would you?
I'm afraid you didn't get the point. Let me explain it again, please. I called "naive" anybody who expected SMT4 to be UNVEILED as a Zen 3 feature during the Zen 2 presentation. Neither AMD nor Intel has ever unveiled such a major feature at the presentation of the predecessor. Did Intel unveil Nehalem's SMT2 at the Core 2 Duo presentation? NO. Did AMD unveil any major features of Zen 2 (256-bit FPU, 3rd AGU, chiplet design) during the Zen 1 presentation? NO. Is it clear now that AMD didn't want to unveil any major features of Zen 3 during that presentation, so the slide had to say SMT2 (even if Zen 3 will support SMT4)? Not to mention that the presentation came at a time when the first engineering samples of Zen 3 were expected. Imagine dealing with possible HW problems/bugs in a first-generation SMT4 core: AMD can just disable the feature and ship Zen 3 without SMT4, fixing it later with a new revision or in Zen 4. Not to mention possible stock manipulation if SMT4 were unveiled but not delivered. Early unveiling is just a bad, lose-lose scenario. "Naive" is the right expression, no offense to anyone.


Regarding SMT4, it's just a matter of time before it becomes a standard server feature, as SMT2 is nowadays. I'd call myself an optimist regarding any inevitable CPU feature (SMT4, super wide core, sharing resources among cores, 6xALUs, 8xFPU pipes, chiplet design adopted by Intel, etc.). I'm holding some AMD stock, so Zen 3 is a very important technology demonstrator for my future investment (I prefer to invest in disruptive technology). So for me SMT4 would be something like AMD saying "We are not afraid to aim to be a leader and stay there". So it's not about being a naive dreamer, it's about being realistic and making money while bringing useful stuff to the world. However, I understand that from a conservative point of view (thanks to Intel's stagnation) SMT4 may look like a risky feature for a product as early as Zen 3.
 

Thunder 57

Golden Member
Aug 19, 2007
1,573
1,553
136
The back-end was based on Atom core. So it's very comparable.

I'm afraid you didn't get the point. Let me explain it again please. I called anybody "naive" regarding UNVEILING SMT4 as Zen 3 feature during Zen2 presentation. AMD neither Intel never unveiled such a major feature at presentation of predecessor. Did Intel unveiled Nehalem's SMT2 at Core2Duo presentation? NO. Did AMD unveiled during presentation of Zen 1 some major features of Zen 2 (256-bit FPU, 3rd AGU, chiplet design)? NO. Is it clear now that AMD didn't want to unveil any major features of Zen 3 during that presentation, so there had to be written SMT2 (even Zen3 will support SMT4)? Not mentioning the presentation was at time when first engineering samples of Zen 3 was expected. Imagine dealing with possible HW problems/bugs with first generation of SMT4 core, AMD can just disable the feature and ship Zen3 without SMT4, repairing it later on with new revision or in Zen4. Not mentioning possible stock manipulation if SMT4 would be unveiled but not delivered. Early unveiling is just so bad, lose-lose scenario. "Naive" is right expression, no offense to anyone.


Regarding SMT4 it's just the matter of time when it becomes a standard server feature as SMT2 is nowadays. I'd call myself optimist regarding any inevitable CPU feature (SMT4, super wide core, sharing resources among cores, 6xALUs, 8xFPU pipes, chiplet design adopted by Intel, etc.). I'm holding some AMD stocks so Zen 3 is very important technology demonstrator for my future investment (I invest preferably into disruptive technology). So for me SMT4 would be something like AMD saying "We are not afraid to aim to be a leader and stay there". So it's not about being naive dreamer, it's about being realistic and make money while bringing useful stuff to the world. However, I understand from a conservative point of view (thanks to Intel stagnation) SMT4 may looks a risky feature at so early product like Zen 3.
What you are writing is basically fanfiction, as others have said. You are saying what you want and assuming it will be true. 6 ALU this, SMT4 that, it gets old really fast. The people building the actual thing are way smarter than any of us plebs, get over it.

The chiplet design was known way before Zen 2 came to be. It wasn't some "state secret". So AMD will ship a maybe broken SMT4 in Zen 3 but fix it later? OMG. TLB all over again? There will be no SMT4 in Zen 3. Are you willing to make an agreement that when it comes out, you or I will have to admit they were wrong? I doubt it.

You are naive. You will be proven to be naive. Other than you holding AMD stock. There will be no SMT4.
 

IntelUser2000

Elite Member
Oct 14, 2003
7,249
1,839
136
Ehh, there were no SMT4 Atoms. Knight Corner was a co-processor.
That's true. The Knights Corner core is based on extending the P54C, or the original Pentium core.

Knights Landing is Atom-based because the core is built from the 2-issue out-of-order Silvermont (actually, heavily modified). However, Knights Landing as a product is not Atom.

And Atom Silvermont doesn't have any form of SMT, never mind SMT4. That continues 3 generations later with Tremont, which will release sometime next year.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,049
627
136
Regarding SMT4 it's just the matter of time when it becomes a standard server feature
I just don't see why. SMT4 increases performance per core, at the cost of performance per thread. As AMD keeps pushing core counts up, the demand for more threads per core will go down, not up. I don't see moving to SMT4 as a matter of time at all.
 

Richie Rich

Senior member
Jul 28, 2019
470
227
76
I just don't see why. SMT4 increases performance per core, at the cost of performance per thread. As AMD keeps pushing core counts up, the demand for more threads per core will go down, not up. I don't see moving to SMT4 as a matter of time at all.
SMT4 pros:
  1. Context switching with lower penalty - Imagine a server system with 256 threads running on it. A 64c/128t EPYC will need the OS scheduler to switch between two threads -> there is a penalty (flushing caches etc.). A 64c/256t CPU will run without the context switching penalty.
  2. SMT4 lowers your latency penalty by half - that's a huge benefit, especially when AMD uses a chiplet design (with higher latency).
  3. Higher performance from better core utilization.
  4. Energy efficiency - when SMT4 doubles the threads per core (half the IPC per thread) -> less speculative OoO -> lower misprediction penalty -> more performance per watt -> higher clocks within the same TDP (not much, maybe a single-digit percentage).
With all of the above combined, the SMT4 performance gain could be something like 15-20% at the cost of a 5-10% transistor increase. Even if it were only a 5-10% performance gain, it's a great deal due to linear scaling of performance per transistor. There is not much low-hanging fruit like this left. IMHO SMT4 is a great advantage for highly threaded applications. And AMD is targeting this server sector. How does SMT4 look now?
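For reference, the performance-per-transistor claim can be checked with a few lines. This is only a sketch over the percentage ranges given in the post; it treats "performance per transistor" as a simple ratio, which is a simplifying assumption, and the function name is made up.

```python
def perf_per_transistor(perf_gain, transistor_cost):
    """Relative performance per transistor after adding a feature.

    1.0 means break-even, above 1.0 means the feature pays for its area.
    """
    return (1.0 + perf_gain) / (1.0 + transistor_cost)


if __name__ == "__main__":
    # Corners of the ranges quoted above: 5-20% gain, 5-10% transistor cost.
    for gain in (0.05, 0.10, 0.15, 0.20):
        for cost in (0.05, 0.10):
            ratio = perf_per_transistor(gain, cost)
            print(f"gain +{gain:.0%}, cost +{cost:.0%}: "
                  f"perf/transistor x{ratio:.3f}")
```

On this simple ratio, the best case (+20% for +5%) gives about x1.14, while a +5% gain for +10% transistors falls below break-even.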
 

Adonisds

Member
Oct 27, 2019
97
33
51
What you are writing is basically fanfiction, as others have said. You are saying what you want and assuming it will be true. 6 ALU this, SMT4 that, it gets old really fast. The people building the actual thing are way smarter than any of us plebs, get over it.

The chiplet design was known way before Zen 2 came to be. It wasn't some "state secret". So AMD will ship a maybe broken SMT4 in Zen 2 but fix it later? OMG. TLB all over again? There will be no SMT4 in Zen 3. Are you willing to make an agreement that when it comes out, you or I will have to admit they were wrong? I doubt it.

You are naive. You will proven to be naive. Other than you holding AMD stock. There will be no SMT4.
That's funny, he just can't say that empirical data might show in the future that he was wrong this time, no matter how many times you repeat the question. If he is wrong, he will probably find a way to avoid admitting that he was wrong about this small thing and continue the wishful thinking.
 

jamescox

Member
Nov 11, 2009
149
264
136
Nah, it def power2
SMT2 -> SMT4 -> SMT16 -> SMT256 -> SMT65536 -> SMT4294967296 -> so on so forth.
128 threads per core has actually been done. It wasn’t SMT though, it was a barrel processor. Although, after reading about the Cray MTA processors, I have wondered if such a thing would actually be able to do ray tracing well.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,049
627
136
Context switching with lower penatly - Imagine server system with 256 threads running on it. 64c/128t EPYC will need OS scheduler switching between two threads -> there is a penalty (flushing caches etc.). 64c/256t CPU will run without context switching penalty.
On modern x86, no caches are flushed on context switches. (All caches are physically tagged, TLB entries have ASIDs). In this respect, more SMT threads is not in any way superior to just using OS context switching. The primary problem, having a limited amount of cache competitively shared between increasing demand for it, remains.

SMT4 lowers your latency penalty by half - that's huge benefit especially whey AMD use chiplet design (with higher latency).
It doesn't reduce the penalty by half, it increases the size of the group of threads where there is no penalty. This is a different thing, and not nearly as valuable.
Higher performance from better core utilization.
Higher overall performance per core at the cost of lower performance per thread.
Energy efficient - when SMT4 double threads per core (half IPC per thread) -> less speculative OoO -> less penalty miss-prediction -> more performance per watt -> higher clocks within same TDP (not much, maybe single digit percentage).
Energy efficient at a lower performance level. If the core is widened to support more SMT, energy efficiency is going to go down, because many structures become substantially less energy-efficient as they are grown. For example, the energy used by the forwarding network grows at >n^2 as the number of units is increased.

SMT4 performance gain could be all above combined together something like 15-20% at cost of 5-10% transistor increase. Even if it would be only 5-10% performance gain, it's great deal due to linear scaling performance per transistor. There is not many such a low hanging fruits left. IMHO for high threaded applications is SMT4 a great advantage. And AMD is targeting this server sector. How does SMT4 looks now?
Bad.

Because while you are purchasing 20% of speed at the cost of 5% of transistors, you are doing so at a performance level at around 60% of SMT2 or ~35% of the performance of a single thread. You know what else would be amazingly energy-efficient at that performance level, totally beating Zen3? A chip with 128 ARM cellphone cores (or jaguar cores, or whatever). Or even shrunken Bulldozer. The demand for such CPUs is zilch. Because we are already hitting Amdahl's law pretty hard on most modern systems, increasing total performance by reducing performance/thread is not really interesting for anyone. The server world wants more cores because it allows us to consolidate more systems into fewer servers. This works well when the performance/thread available from the system goes up or at least stays flat. Which is precisely why Rome is such a great product. And why there is very little actual demand or interest for SMT4.
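The per-thread figures above can be reproduced with a small sketch. It assumes a +20% throughput gain for SMT2 over one thread and a further +20% for SMT4 over SMT2; those inputs are illustrative values taken from the numbers thrown around in this thread, not measurements, and `smt_levels` is a hypothetical helper.

```python
def smt_levels(smt2_gain=0.20, smt4_gain=0.20):
    """Return {threads_per_core: (core_throughput, per_thread_throughput)}.

    Throughput is relative to one thread running alone on the core.
    The default gains are illustrative assumptions.
    """
    core = {1: 1.0}
    core[2] = core[1] * (1.0 + smt2_gain)
    core[4] = core[2] * (1.0 + smt4_gain)
    return {ways: (tput, tput / ways) for ways, tput in core.items()}


if __name__ == "__main__":
    levels = smt_levels()
    for ways, (core_tput, thread_tput) in levels.items():
        print(f"SMT{ways}: core x{core_tput:.2f}, "
              f"each thread at {thread_tput:.0%} of a lone thread")
    ratio = levels[4][1] / levels[2][1]
    print(f"SMT4 thread vs SMT2 thread: {ratio:.0%}")
```

With these inputs each SMT4 thread runs at about 36% of a lone thread and 60% of an SMT2 thread, which lines up with the "~35%" and "around 60%" figures above.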
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,209
724
136
A chip with 128 ARM cellphone cores (or jaguar cores, or whatever). Or even shrunken Bulldozer. The demand for such CPUs is zilch.
The demand is exceedingly high, if the price is less than EPYC.

If AMD came out with an Opteron using an architecture derived from Family 15h designs, let's say shifted to low-power embedded with some cost-sensitivity:
64 cores, same as EPYC 2, but it performed less, consumed less, and cost less.

Then, there would be huge demand for it.

$1899(max. observation) => 12FDX w/ 64c (2 GHz(stock), less than 96 MB L2+L3 cache)
Same IOD with EPYC is possible. However, a lower costing, lower power 8-die capability 12FDX IOD is preferred.

FDSOI does chiplets better at lower cost.
FDSOI does i/o chips better at lower cost.

There is potential for a 64CPP High Mobility option being included in 12FDX, which would allow 12FDX to be competitive with 7FF HPC's 57/64 pitch std cells at lower cost (~60 masks (lower defect) vs ~90 masks (higher defect)). It hasn't been done in a while; 22nm Intel = (90 CPP/80 Mx) and Samsung 20LPE/GloFo's 20LPM 80p option = (90 CPP/80 Mx), so it will be interesting to see a 12FD option for 64CPP/56Mx.
 

Richie Rich

Senior member
Jul 28, 2019
470
227
76
On modern x86, no caches are flushed on context switches. (All caches are physically tagged, TLB entries have ASIDs). In this respect, more SMT threads is not in any way superior to just using OS context switching. The primary problem, having a limited amount of cache competitively shared between increasing demand for it, remains.
You are wrong. Of course there is L1 and L2 cache content replacement after every context switch. It has some performance pros and some cons. However, in the end there is a 25% SMT2 gain and approximately a 15% SMT4 gain.


It doesn't reduce the penalty by half, it increases the size of the group of threads where there is no penalty. This is a different thing, and not nearly as valuable.
My bad wording. SMT4 reduces relative latency by half. Performance is code-specific and variable. Anyway, it more than compensates for the chiplet design's higher latency penalty.


Because while you are purchasing 20% of speed at the cost of 5% of transistors, you are doing so at a performance level at around 60% of SMT2 or ~35% of the performance of a single thread. You know what else would be amazingly energy-efficient at that performance level, totally beating Zen3? A chip with 128 ARM cellphone cores (or jaguar cores, or whatever). Or even shrunken Bulldozer. The demand for such CPUs is zilch. Because we are already hitting Amdahl's law pretty hard on most modern systems, increasing total performance by reducing performance/thread is not really interesting for anyone. The server world wants more cores because it allows us to consolidate more systems into fewer servers. This works well when the performance/thread available from the system goes up or at least stays flat. Which is precisely why Rome is such a great product. And why there is very little actual demand or interest for SMT4.
You probably don't understand what Amdahl's law means. It is related to CFD, FEM and other math models of physics where parallelization is limited. But I was talking specifically about servers with a HIGH number of threads (web servers, SQL, etc.) where SMT4 gives a performance advantage. AMD knows how many customers might benefit from SMT4, and if their analysis shows good profit I'm sure they will bring it in Zen 3 (as Sparc and IBM did).
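For readers following the Amdahl's law argument, here is the textbook formula as a minimal sketch. The parallel fractions used below are illustrative assumptions, not measurements of any real server workload.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    p is the fraction of the work that can run in parallel,
    n is the number of workers (threads or cores).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


if __name__ == "__main__":
    for p in (0.95, 0.99, 1.0):          # illustrative parallel fractions
        for n in (128, 256):
            print(f"p={p:.2f}, n={n}: speedup x{amdahl_speedup(p, n):.1f}")
```

The law applies to any workload with a serial portion. With p = 0.95, going from 128 to 256 workers moves the speedup from about 17.4 to about 18.6, whereas fully independent requests (p close to 1) keep scaling with the thread count, and that case is the one the disagreement here turns on.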

And BTW those 128 Cortex A77 cores (106% of Skylake's IPC according to SPECint2006) will be much faster than a 64c/128t x86 CPU. Skylake with SMT2 gains +20% (120% total), so 60% per thread. And the ARM A77 has 106% already, with no SMT, so it's 60% vs. 106%... that's almost double the performance per thread. ARM is becoming a more and more serious threat to the x86 world (Amazon Graviton2, Nuvia, Cavium etc.). AMD has to evolve fast and bring innovations such as SMT4 as soon as possible, because this is a server-specific path ARM's Cortex cores cannot follow right now. The next new uarch that could bring SMT4 will be Zen 5 in 2022, and this might be too late. In 2022 ARM will have 25% of laptops and 5-10% of servers.
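The per-thread comparison above works out as follows. This sketch only reproduces the post's own arithmetic; it assumes equal clock speeds for both chips, since IPC alone says nothing about frequency, and it takes the 106% and +20% figures from the post at face value. The helper names are invented.

```python
def per_thread(ipc, smt_gain, threads_per_core):
    """Throughput available to each thread when the core is fully loaded."""
    return ipc * (1.0 + smt_gain) / threads_per_core


def chip_throughput(cores, ipc, smt_gain=0.0):
    """Total throughput of a chip at equal clocks (an assumption)."""
    return cores * ipc * (1.0 + smt_gain)


# Figures quoted in the post, relative to one Skylake thread = 1.00.
skylake_thread = per_thread(ipc=1.00, smt_gain=0.20, threads_per_core=2)
a77_thread = per_thread(ipc=1.06, smt_gain=0.0, threads_per_core=1)

if __name__ == "__main__":
    print(f"per thread: Skylake SMT2 {skylake_thread:.2f}, A77 {a77_thread:.2f}, "
          f"ratio x{a77_thread / skylake_thread:.2f}")
    x86 = chip_throughput(cores=64, ipc=1.00, smt_gain=0.20)
    arm = chip_throughput(cores=128, ipc=1.06)
    print(f"whole chip: 64c/128t x86 {x86:.1f}, 128c A77 {arm:.1f}")
```

At equal clocks that is about a 1.77x advantage both per thread and per chip. Real server parts would differ in frequency, memory system and power budget, so this is an upper-level estimate at best.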
 

soresu

Golden Member
Dec 19, 2014
1,534
762
136
And BTW those 128 cores of Cortex A77 (106% IPC of Skylake according SPECint2006) will be much faster than 64c/128t x86 CPU.
Pointless speculation as neither a 128C Neoverse design, nor N2 has been announced yet - let alone benched.

Nx is based on A7x, but there are changes made beyond the base core that make predicting performance difficult until we have a working N1-based chip being benched by Phoronix or some such.
 

amrnuke

Golden Member
Apr 24, 2019
1,002
1,515
96
You are wrong. Of'course there is L1 and L2 cache content replacement after every context switch. It has some performance pros and some cons. However at the end there are 25% SMT2 gain. And approximately 15% SMT4 gain.
You're going to need to provide a source for your assertion.

You probably don't understand what Amdahl's law mean. This is related to CFD, FEM and other math models of physics where parallelization is limited. But I was talking specifically about servers with HIGH number of threads (web servers, SQL, etc.) where SMT4 gives performance advantage. AMD knows how many customers might benefit from SMT4 and if their analysis shows good profit I'm sure they will bring it in Zen 3 (as Sparc and IBM did).
SMT4 may be beneficial for servers. But since Zen3 chiplets will be used in server, HEDT, and mainstream, and SMT4 will require added die space from a redesigned and expanded front-end, one of two things would need to be true:

1) AMD are splitting Zen3 chiplets into server design and mainstream/HEDT design
or
2) AMD are willing to decrease yields on chiplets for a feature that will be disabled (SMT2 only) on mainstream/HEDT systems, in order to see a minor server gain at the expense of increased power expenditure and heat production

I don't see either of those being true. Hence I will stick with the reality that SMT4 is a pipedream for Zen3.

And BTW those 128 cores of Cortex A77 (106% IPC of Skylake according to SPECint2006) will be much faster than a 64c/128t x86 CPU. Skylake with SMT2 gains +20% (120% total), so 60% per thread. The ARM A77 already has 106% with no SMT, so it's 60% vs. 106%... that's almost double the performance per thread. ARM is becoming a more and more serious threat to the x86 world (Amazon Graviton2, Nuvia, Cavium etc.). AMD has to evolve fast and bring innovations such as SMT4 as soon as possible, because this server-specific path is one that ARM's Cortex cores cannot follow right now. The next new uarch for SMT4 will be Zen5 in 2022, and this might be too late. In 2022 ARM will have 25% of laptops and 5-10% of servers.
This is just speculation. ARM may have 25% marketshare in 2022 --- but only on the back of Apple if they do end up switching to ARM-based processors (which are not Cortex-based, mind you).

I doubt AMD is worried about Apple switching from Intel to ARM, since Apple's Mac laptop shipments are growing at a whopping 1% per year. AMD cares more about digging into Intel's marketshare among Dell, HP, and Lenovo laptops, who combined ship 65% of all laptops. ARM isn't going to make major inroads into the corporate laptop market (50% of laptop sales last year) since Windows ARM sucks and it's not going to be ready for corporate deployment in 2022, or realistically even 2025.

Dell - does not offer ANY ARM-based laptops on their online store, even their Chromebook is Intel-based
HP - does not offer ANY ARM-based laptops on their online store, even their Chromebook is Intel-based
Lenovo - has some ARM offerings, not many, certainly nothing enterprise-worthy

Even IF Apple moves to ARM, it's not Cortex, it's just ARMv8-based, and Apple's software may not even be compatible with other ARM-based processors. Even then, the only market ARM proper (Cortex-based) is going to keep making significant inroads into by 2022 is the low-margin cheap laptop area, and it may have difficulty hitting 25% even WITH Apple's conversion.
 

soresu

Golden Member
Dec 19, 2014
1,534
762
136
Dell - does not offer ANY ARM-based laptops on their online store, even their Chromebook is Intel-based
HP - does not offer ANY ARM-based laptops on their online store, even their Chromebook is Intel-based
Lenovo - has some ARM offerings, not many, certainly nothing enterprise-worthy
Snapdragon chromebooks are coming, and will likely be seen in the stables of several manufacturers.

Google's ongoing work on the open Adreno GPU drivers (Freedreno/OpenGL and TURNIP/Vulkan) supports their commitment to this aim.
Even IF Apple moves to ARM, it's not Cortex, it's just ARMv8-based, and Apple's software may not even be compatible with other ARM-based processors. Even then, the only market ARM proper (Cortex-based) is going to keep making significant inroads into by 2022 is the low-margin cheap laptop area, and it may have difficulty hitting 25% even WITH Apple's conversion.
I'm still looking forward to BYOD systems that plug your phone into a USB C hub that handles all your monitor, network, keyboard, mouse and USB connections.

Android 10 has more extensive multi window support, and Chrome OS seems to be leaning towards variable/virtual desktop support too these days - could be good for cheap chromebooks plugging in at work desks.

I wouldn't be surprised to see iPad OS gain such features too.
 

Richie Rich

Senior member
Jul 28, 2019
470
227
76
You're going to need to provide a source for your assertion.


SMT4 may be beneficial for servers. But since Zen3 chiplets will be used in server, HEDT, and mainstream, and SMT4 will require added die space from a redesigned and expanded front-end, one of two things would need to be true:

1) AMD are splitting Zen3 chiplets into server design and mainstream/HEDT design
or
2) AMD are willing to decrease yields on chiplets for a feature that will be disabled (SMT2 only) on mainstream/HEDT systems, in order to see a minor server gain at the expense of increased power expenditure and heat production
Did you notice the Zen 1 CPU chip had a ton of server features (ECC, interchip links etc.) on board which were disabled on desktop? The same goes for K10 Agena, Barcelona, Thuban and K8 too. Wake up, AMD has been designing server CPUs as a priority for a long time. Desktop was, is and will be just a derivative.


This is just speculation. ARM may have 25% marketshare in 2022 --- but only on the back of Apple if they do end up switching to ARM-based processors (which are not Cortex-based, mind you).
Yes, that's speculation, however people like you underestimate ARM too much. AMD engineers don't. What will be the first 5nm CPUs at TSMC? Apple A14 and Kirin 1000 (based on Cortex), both ARM, both with a larger die size than AMD's chiplets. 55% of gaming revenue is from ARM platforms and still increasing. ARM has 6 ALUs in Apple's A12 SoC with the highest INT IPC, and ARM will have the SVE2 instruction set with 128-2048 bit variable vector width. That's scary stuff already. AMD needs to push in servers at maximum. SMT4 is low-hanging fruit in servers: higher performance gains than the transistor increase. That's why I expect SMT4 for Zen 3 already.


The chiplet design was known way before Zen 2 came to be. It wasn't some "state secret". So AMD will ship a maybe broken SMT4 in Zen 2 but fix it later? OMG. TLB all over again? There will be no SMT4 in Zen 3. Are you willing to make an agreement that when it comes out, you or I will have to admit they were wrong? I doubt it.

You are naive. You will be proven to be naive. Other than you holding AMD stock. There will be no SMT4.
The TLB bug was a reality for K10, and it can happen again to any manufacturer. There are a ton of bugs in every CPU: some of them limit features, some of them limit clocks. Just look in the errata, that's nothing new.

You are trying to avoid my question about "naivety". That question was: What would you put into that presentation if you knew that Zen 3 is SMT4? Please just answer the question. My thinking is: an empty space would lead to speculation, SMT4 can't be used if it is to be kept hidden, so IMHO the only option is to use SMT2 because it is supported as well. I'm looking forward to seeing what your choice would be.
 

soresu

Golden Member
Dec 19, 2014
1,534
762
136
SMT4 is low-hanging fruit in servers: higher performance gains than the transistor increase.
I've said it before and I'll say it again - SMT4 is nothing special.

POWER7 had it in 2010 and SPARC T3 had SMT8 in the same year.

If it was such an obvious 'low hanging fruit' gain, then everybody would be doing it rather than the now more obscure POWER and SPARC options.

If it was obvious, then you can bet Intel, who introduced x86 SMT, would have done it by now - they've had more than enough opportunity while they were maintaining a more competitive cadence before the 10nm woes set in.

You fail to acknowledge that engineering a core for maximum thread count could compromise its single thread performance - a direction AMD has tested before and barely lived to regret with Bulldozer before Zen.

At this point their concentration is on ST performance and core counts - those 2 things alone provide a steady improvement to MT performance per generation.
 
