Speculation: Ryzen 4000 series/Zen 3

DrMrLordX

Lifer
Apr 27, 2000
21,627
10,841
136
You have exactly the same issue with big.LITTLE. If a scheduler is theoretically capable of detecting high utilization threads and moving them from little to big cores it's also theoretically capable of moving them from SMT shared to dedicated physical cores. It's all a software problem.

I haven't seen a scheduler move low-utilization threads to a logical core over a physical core, ever. At least not under Linux. Which scheduler actually does this?

Did you see AMD going with SMT2 before they announced it? Did anybody see that first implementation beating Intel's HT with the very first implementation?

The only thing that was clear was that Zen would not be a CMT design. So the next logical conclusion was an SMT2 implementation, at least to copy Intel.

That's completely beside the point. Does the majority of desktop users need AVX2? Most very likely do not.

Depends on which users we're talking about here. Anyone who does encoding or rendering will like it. There are use cases. SMT4 though? Maaaaaybe, maybe not. I'd have to see some benchmarks to really understand how AMD's implementation of SMT4 would work before I was sold on it. If you asked me, do I want 8c/16t Zen3 or 4c/16t Zen3 on my desktop, obviously I'd prefer the former. 8c/32t probably moves me into a different price bracket/power envelope, making it maybe not an option for me anymore.

That's actually wrong unless you are talking about HPC specifically. Servers in general are all about over-provisioning all kinds of resources, being prepared for the worst case resource usage scenarios.

HPC is one of the server applications where you'd want SMT4, so I was sort of erring on that side. Might be useful in high-utilization databases as well.

Patently wrong. The more cores a chip contains in one shared envelope the more the cores' activity will affect each other. The more cores can be put into deep sleep state the more headroom other cores can make use of. And as we know AMD developed Zen's microcode in PB in a way to dynamically make use of more headroom so it profits from that now already.

Now you're arguing thermals though, which is missing the point I'm making, since I'm assuming high CPU utilization overall for servers in scenarios where SMT4 might make sense. If all your cores are routinely sitting at 75% or higher utilization, then no, you do not worry about how the scheduler wakes up particular cores, since they aren't sleeping anyway.

But in the last two years AMD did the opposite of "selling more of the same since it works". Zen to Zen 2 completely changed the MCM topology. SMT is still very new to AMD, having been introduced only two years ago. Software support didn't prevent AMD from launching any of the Ryzen nor the Threadripper chips. Windows scheduler had serious issues with TR 1's NUMA, then again with TR 2 WX's unbalanced NUMA.

They sold SMT2 between 2017 and 2019. WRT SMT (or alternate strategies), that's "more of the same". They improved the individual cores and rejiggered IF links, but they didn't change their SMT strategy at all. They didn't go asynchronous core, they didn't go SMT4, they didn't kill SMT altogether, they didn't resurrect CMT (thank goodness), etc.

resource allocation can be changed even after the creation of a VM,

In a matter of seconds? Milliseconds?

And disabling SMT/HT for cloud providers is due to them specifically offering resources per single vCPU, and you don't want this vCPU resource being a variable that depends on how many concurrent threads are on it. But that doesn't prevent server providers offering computing resources per CCX (or comparable big.LITTLE blocks) instead where SMT could be left enabled.

Okay, fair point. Some cloud providers might like SMT4. Others might not.

You yourself were arguing for the cat cores before.

Ah, but you have missed the larger point. Yes, I mentioned that a cut-down core or updated cat core might make more sense than moving to a wider Zen3 + SMT4. I'm still willing to acknowledge that it's 99.999999% unlikely that AMD would ever do such a thing. Making it even less likely that AMD will adopt SMT4.

...which is part of the uncore and offers intra chip connectivity that one always needs on any chip...

You may notice that not everyone has these problems in their design.

And Zen cores can power gate everything except the shared L3$. (I think I remember the APUs can even power gate the L3$ itself since it's not shared due to its single CCX nature, not sure.)

If the L3 is inclusive then I don't think they can. ARM manages to gate off parts of the L3 by using one that's exclusive or . . . pseudo-exclusive or something.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Hyper threatting is way too dangerous. I don't think they would do it, and if they did, people probably would be hesitant to use it.


As for SMT-n it's been floating around CS academia for decades, so I doubt it's patentable.


Maybe a trademark search (eg threadripping) might some day give us more to speculate on.
Too dangerous as in "Intel will sue the life out of you", or perhaps its execution? Hyperthreading and SMT are basically the same. The HT name belongs to Intel, and AMD has never used the term, just SMT (a term coined by Dean Tullsen).
 

moinmoin

Diamond Member
Jun 1, 2017
4,946
7,656
136
I haven't seen a scheduler move low-utilization threads to a logical core over a physical core, ever. At least not under Linux. Which scheduler actually does this?
Under Linux, if the governor is set to powersave, the scheduler may already opt to run two threads on one SMT2 core instead of on two physical cores. Afaik there is no generic logic yet to differentiate between low and high utilization threads per se, but last year developers first introduced group_misfit_task to move computationally demanding processes from little to big cores, and later ARM contributed Energy Aware Scheduling. And in the big picture the difference between logical and physical cores, as well as the availability and sharing of resources inside a core, are just details of the topology the scheduler can and should respect.
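
To make the "details of the topology" point concrete, here is a rough Python sketch of the two placement policies being argued about. It is not the kernel's logic, just a toy: `parse_cpu_list`, `group_by_core` and `place` are made-up names, and the 4c/8t layout is a hypothetical example of what Linux reports per logical CPU in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
from typing import Dict, List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a sysfs-style CPU list such as "0,4" or "0-3" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_core(siblings: Dict[int, str]) -> List[List[int]]:
    """Group logical CPUs into physical cores using each CPU's sibling list."""
    cores = {tuple(parse_cpu_list(s)) for s in siblings.values()}
    return [list(core) for core in sorted(cores)]


def place(n_threads: int, cores: List[List[int]], pack: bool) -> List[int]:
    """Pick logical CPUs for n_threads runnable threads (toy policy, not the kernel's).

    pack=True  fills every SMT sibling of a core before waking the next core.
    pack=False uses one logical CPU per physical core before doubling up.
    """
    if pack:
        order = [cpu for core in cores for cpu in core]
    else:
        width = max(len(core) for core in cores)
        order = [core[i] for i in range(width) for core in cores if i < len(core)]
    return order[:n_threads]


# Hypothetical 4c/8t layout: logical CPU -> contents of its thread_siblings_list
example = {0: "0,4", 1: "1,5", 2: "2,6", 3: "3,7",
           4: "0,4", 5: "1,5", 6: "2,6", 7: "3,7"}
cores = group_by_core(example)
print(place(2, cores, pack=True))   # [0, 4] -> both threads share one core
print(place(2, cores, pack=False))  # [0, 1] -> one thread per physical core
```

The same `place` call works unchanged for an SMT4 layout (four siblings per core), which is the sense in which it is "all a software problem".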

The only thing that was clear was that Zen would not be a CMT design. So the next logical conclusion was an SMT2 implementation, at least to copy Intel.
Exactly. And with Zen 2 already being as wide as Power 7, support for SMT4 in Zen 3 can be a valid logical conclusion as well.

Depends on which users we're talking about here. Anyone who does encoding or rendering will like it. There are use cases. SMT4 though? Maaaaaybe, maybe not. I'd have to see some benchmarks to really understand how AMD's implementation of SMT4 would work before I was sold on it. If you asked me, do I want 8c/16t Zen3 or 4c/16t Zen3 on my desktop, obviously I'd prefer the former. 8c/32t probably moves me into a different price bracket/power envelope, making it maybe not an option for me anymore.
My primary point was supply begets demand, not the other way around. Now that AVX2 is more readily available it will be supported more. At the beginning new hardware features are picked up by highly specialized software (like video encoders are for SIMD extensions) before developers figure out general uses.

Your point about 8c/32t moving you into a different price bracket/power envelope confuses me a little. In all the years HT never really made a difference in the power envelope, it was simply a stagnant premium feature for Intel. AMD on the other hand only disables SMT on some lower end chips, again without affecting the power envelope in any way. Thanks to the generally increasing number of cores in the last couple of years, software is more and more being prepared and optimized for multi-threading, and SMT/HT profit from that as well. For non-optimized software the advantage of SMT to the process itself is ambivalent, and the same would be true for SMT4.

Now you're arguing thermals though, which is missing the point I'm making, since I'm assuming high CPU utilization overall for servers in scenarios where SMT4 might make sense. If all your cores are routinely sitting at 75% or higher utilization, then no, you do not worry about how the scheduler wakes up particular cores, since they aren't sleeping anyway.
Here you are sticking to the HPC scenario whereas our initial discussion up to now was focused on big.LITTLE versus SMT4. All cores being at high utilization all the time is generally a niche exception, not the rule. Yes, in that scenario sleeping cores doesn't apply as a power efficiency measure, but that doesn't invalidate or affect the other scenarios.

They sold SMT2 between 2017 and 2019. WRT SMT (or alternate strategies), that's "more of the same". They improved the individual cores and rejiggered IF links, but they didn't change their SMT strategy at all. They didn't go asynchronous core, they didn't go SMT4, they didn't kill SMT altogether, they didn't resurrect CMT (thank goodness), etc.
That's a very limited time window to discuss: 2 years and 3 gens, of which one is a refresh. My definition of "more of the same" would be HT, which has been virtually unchanged for over a decade. I don't expect that to happen with AMD.

In a matter of seconds? Milliseconds?
Sure, why not? This is purely a matter of software and (VM) OS support. E.g. VMware's vSphere supports hot addition and removal of cores.

You may notice that not everyone has these problems in their design.
You may have noticed that not everyone offers the same memory and connectivity in their designs. One big reason why ARM doesn't matter in datacenters so far, aside from software support, is that once one adds all the memory and connectivity required, ARM's big power efficiency advantage vastly diminishes. And the one area where they could make a difference, a massive number of cores (which, again, are not the issue here for power efficiency), is the one AMD is currently targeting.
 

NTMBK

Lifer
Nov 14, 2011
10,236
5,017
136
-> Up until recently, both Xbox One and PlayStation 4 have reserved two entire CPU cores (out of eight available) in order to run the background operating system in parallel with games. <-

With the option of custom thread occupation. All cores can be used by the OS(1-thread) and all cores can be used by the game(3-thread). The OS sees one thread, the developer sees up to three threads, the hardware sees four potential threads, * x cores.

In the case of Windows, CMT/SMT optimization cases;
Two threads or more = reduced performance => push it to another core
Two threads or more = increased performance => keep it on the same core.

The whole point of the dedicated OS cores is to provide consistent performance. Having OS threads unpredictably stealing cache, branch slots and execution resources from your game threads is no help whatsoever.

They would be better off just putting a quad Jaguar cluster on die to handle the OS independently, or allocating a single Zen 2 core to the OS.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
The whole point of the dedicated OS cores is to provide consistent performance.
Tell that to the Xbox 360. Core 0 is pure game, Core 1 and Core 2 are hybrid game/OS. There is no need for dedicated OS cores, if there is enough frequency, IPC, etc.
5% to 50% random OS usage, with the rest being game well within the lines of SMT.
They would be better off just putting a quad Jaguar cluster on die to handle the OS independently, or allocating a single Zen 2 core to the OS.
No, that would be worse.
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
Hyper threatting is way too dangerous. I don't think they would do it, and if they did, people probably would be hesitant to use it.


As for SMT-n it's been floating around CS academia for decades, so I doubt it's patentable.


Maybe a trademark search (eg threadripping) might some day give us more to speculate on.

HyperThreading is just Intel's branding for SMT, it isn't something different/special. There isn't any special sauce to anyone's SMT implementation. The special sauce is in the core design itself and how many resources it has + how well it handles contention.
 

nicalandia

HyperThreading is just Intel's branding for SMT, it isn't something different/special. There isn't any special sauce to anyone's SMT implementation. The special sauce is in the core design itself and how many resources it has + how well it handles contention.
At one time it was known as Project Jackson. Intel received assistance from Dean Tullsen.

Jackson gets a name: Hyper-Threading Technology
https://www.anandtech.com/show/819/4.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
For anyone reading or searching for SMT development history in Intel/AMD

Alpha EV8 (Part 1): Simultaneous Multi-Threat


Very interesting links. Amazing to see that transistor growth:
EV6 15.2M
EV7 130M
EV8 250M

Compare to Piledriver's 106M per core (half module) and Presler's dual-threading, near-4 GHz speed demon, which I guesstimate at ~130M per core ( https://www.anandtech.com/show/1910/3 , http://www.cpu-world.com/CPUs/Pentium_Extreme_Edition/Intel-Pentium Extreme Edition 965 HH80553PH1094M (BX80553965).html ) .

I'm guesstimating an efficient modern big core like Zen1 ~ 210M, plus or minus 20M depending on if you include L3 or not.



If you asked me, do I want 8c/16t Zen3 or 4c/16t Zen3 on my desktop, obviously I'd prefer the former.

Not a terribly good question. A better question: would you rather have 4c/8t or 4c/16t in your future 2021/22 mobile 10W TDP APU?


And putting other threads to same core that already runs high-priority thread instead of idle cores is just stupid as it will slow down that high-priority thread.

Under Linux, if the governor is set to powersave, the scheduler may already opt to run two threads on one SMT2 core instead of on two physical cores.

Exactly, the scheduler will need to work with both user preference and frequency governor to switch between performance versus power-efficiency prioritized scheduling.
 

amd6502

Andrei has repeatedly said that SMT is not precisely very power efficient and that explains why ARM hasn't bothered implementing it on its Cortex A series.


That may or may not be true. Nosta had a similar view and criticism of SMT efficiency, with CMT winning. CMT typically is monothreading on the integer side, but if you take it to the max and share the front end it has even further efficiency advantages over monothreading.

But I think there are problems with these views, even if it's technically true.

How would the consumer market like it if we went big.little acorn style by taking Raven or Picasso, or the 7nm equivalent quadcore SMT2, and tagging along 4 or 8 Puma+++ cores, or hyper efficient in-order Atom cores? Even if it is more efficient at churning through huge loads (that imho aren't typically seen by consumers) I think it would be unpopular with the consumer, as well as not the most balanced product.
 

jpiniero

Lifer
Oct 1, 2010
14,590
5,213
136
How would the consumer market like it if we went big.little acorn style by taking Raven or Picasso, or the 7nm equivalent quadcore SMT2, and tagging along 4 or 8 Puma+++ cores, or hyper efficient in-order Atom cores? Even if it is more efficient at churning through huge loads (that imho aren't typically seen by consumers) I think it would be unpopular with the consumer, as well as not the most balanced product.

One nice thing about big.LITTLE is that the big core doesn't have to worry as much about power saving, since it'll just be shut off the moment it's not needed. So on the big cores you might be able to go wider than you might otherwise, and keep the frequency high without needing to ramp up.
 

NostaSeronx

Nosta had a similar view and criticism of SMT efficiency, with CMT winning. CMT typically is monothreading on the integer side, but if you take it to the max and share the front end it has even further efficiency advantages over monothreading.
Single core/dual core, Single SMT2 core(high ipc), single CMT2 module(replicated core+smt2 front-end) basically has the CMT2 module winning power efficiency. This power efficiency comes with better frequency scaling if the process has good Fmax merit; 22FDX/12FDX have more stable Fmax than 14LPP/7LP, etc. (There is also an issue of merit with Vmin/Vmax as well for FinFETs.)

It scales with CMP versions as well, with CMP8-CMT8 usually running on-par with 64-core, while running significantly less power than CMP8-SMT8. CMP8-CMT8 can be a fast ring, however CMP64 has to run a slow mesh. There is no scaling guarantee for the feasibility of a SMT8 doing well against an eight core. In a perfect world, it will run faster single threads and have less power, in reality it slogs hard.

However, there is Clustered Simultaneous Multithreading (CSMT), which fuses CMT and SMT and quite literally fixes everything with SMT. A CMP8-CSMT8 will have faster single-threaded performance and the same power efficiency as a CMP8-CMT8. It can scale in reverse: CSMT2 has the IPC of an SMT2 core and the power efficiency of a CMT2 module. It has several names, starting with DT-CMT (2005), CSMT (2007), IBM's variable-partitionable core (2009), etc. One of the early models actually implements power-gating and frequency/voltage-boost capability in the module as well, allowing for a big brainiac when needed down to some speed-demons when needed.
 

DrMrLordX

Afaik there is no generic logic yet to differentiate between low and high utilization threads per se

Supposedly EAS can do this already:

https://developer.arm.com/tools-and...software/linux-kernel/energy-aware-scheduling

Exactly. And with Zen 2 already being as wide as Power 7, support for SMT4 in Zen 3 can be a valid logical conclusion as well.

Um, not really? I think the chance of AMD copying anything meaningful from the POWER family is next to zero. They'll probably continue to mimic Intel's best design decisions while tossing aside their worst ones.

My primary point was supply begets demand, not the other way around. Now that AVX2 is more readily available it will be supported more. At the beginning new hardware features are picked up by highly specialized software (like video encoders are for SIMD extensions) before developers figure out general uses.

Yeah, but look at AVX512. How many developers are going to pick up that SIMD ISA? I would say, far fewer than have embraced AVX2. See @NTMBK's post below. How many developers are really going to want to embrace SMT4? Remember, if AMD does go SMT4, today's 8c buyers are going to be sold 4c or 6c products tomorrow. Most likely.

Your point about 8c/32t moving you into a different price bracket/power envelope confuses me a little. In all the years HT never really made a difference in the power envelope, it was simply a stagnant premium feature for Intel.

Of course it has made a difference. It's a price differentiator between the 9700k and 9900k today, just as it's been a feature/price differentiator at the low-end of Intel's stack as well (2c/4t i3s versus 2c/2t Pentiums, for example). Also see i5-6600k vs i7-6700k and others. Sure, there's binning and cache differences as well, but HT was a premium far far back in Intel's product stack. Even AMD differentiated between products as SMT on or SMT off with Summit Ridge. Look at the R3 product lineup. AMD is cost-conscious, and if they can sell a 4c/16t CPU in the same price slot as their 8c/16t chips from today, then they will. Not sure where the TDPs would fall, mind you.

Here you are sticking to the HPC scenario

It's a best-case for SMT4, so why not?

whereas our initial discussion up to now was focused on big.LITTLE versus SMT4.

If DynamIQ wants to be taken seriously in the server room, it has to be useful in any possible scenario. And again, I think high-utilization database applications also fall outside of the HPC world. There are some times when a database is going to be running all cores well above 50%. Also CPU render clusters and other things.

That's a very limited time window to discuss: 2 years and 3 gens, of which one is a refresh. My definition of "more of the same" would be HT, which has been virtually unchanged for over a decade. I don't expect that to happen with AMD.

That's all we've got to go off of, though. Unless you think they'll go back to CMT or to a hybrid CMT/SMT solution. I think Sun/Oracle went that route. On second thought, let's hope AMD doesn't do that. In any case, AMD has no history of SMT4/8. Neither does Intel for that matter. I don't expect either one of them to try it considering how well IBM has done with advanced SMT implementation (which is to say, not very well at all; look at their market share).

Sure, why not? This is purely a matter of software and (VM) OS support. E.g. VMware's vSphere supports hot addition and removal of cores.

That's actually interesting, and makes the use case for DynamIQ SoCs in cloud environments somewhat compelling. But I'm veering off-topic a bit.

The whole point of the dedicated OS cores is to provide consistent performance. Having OS threads unpredictably stealing cache, branch slots and execution resources from your game threads is no help whatsoever.

They would be better off just putting a quad Jaguar cluster on die to handle the OS independently, or allocating a single Zen 2 core to the OS.

While I agree with you wholeheartedly in theory, in practice utilizing "little" cores might increase design costs. big.LITTLE/DynamIQ is also mostly unproven in the server room . . . for now. But what you're saying rings true.
 

amd6502

No I disagree that monothreading is more efficient when you compare similarly powerful cores. Well done little cores have an efficiency advantage over wide cores with a big reorder window. You've got to compare a like core to a like core. Can you sketch a diagram? I have a hard time figuring out how CMT could be merged with SMT or what you're saying.


How many developers are really going to want to embrace SMT4? Remember, if AMD does go SMT4, today's 8c buyers are going to be sold 4c or 6c products tomorrow. Most likely.

You wouldn't really need developers to embrace it. Not many at least; it would be mostly transparent to the applications. The only part might be the OS scheduler, and it would not be a big addition. Linus could write it over one weekend.

I completely disagree that there would be a regression in core counts. Quite likely SMT4 won't even be enabled by default for consumer desktop, where SMT2 likely would be the default. The use is much more for mobile where core counts are lower and where factors like battery life are very very important. There I see it enabled by default (assuming SMT4 arrives at all in Zen4 or Zen3).

If it arrives it will be a win-win for any PC user. Laptops will make leaps and bounds, and desktop users will just enjoy the wider core.

But anyways, we're all speculating a few years ahead. First we need to wait till 2020 for Zen2 APU. (Or, if it remains MCM we might see it by end of this year?)
 

NostaSeronx

Can you sketch a diagram? I have a hard time figuring out how CMT could be merged with SMT or what you're saying.
Well there are different ways of implementing SMT on CMT and vice versa.

For example; ALQ0 running T0, ALQ1 running T1, ALQ2 running T2, ALQ3 running T3 // ALQx-a r Tx-a is a form of CMT on SMT. (Each ALQx has 2 ALUs)
While; Micro-core(core) C0-C3 all capable of running T0 through T3 is a form of SMT on CMT. (Each core(micro-core) has 2 ALUs+2 AGUs)
Any fine-grain thread control that can be achieved via OS, software, hardware that clusters hardware is CMT, even if it is only implemented for a SMT core. Predictability over the standard method of SMT is what CMT is striving for and reduced optimization times, reduced hardware bloat, etc.

First example has capability of disabling pairs(as AMD's SMT) and the second example can disable dispatch to lowest data cache(as AMD's CMT), for lower power. Which is more effective than multi-core of completely replicated processors of the same capability.

SMT4 -> Super-Wide OoO via 6x ALUs == bad for high frequency design <== barebone SMT design
SMT4 -> Clustered 2x, Wide OoO via 4x ALUs // Clustered 4x, Narrow OoO via 2x ALUs == good for high frequency design(least to most) <== CSMT design
 

DrMrLordX

No I disagree that monothreading is more efficient when you compare similarly powerful cores. Well done little cores have an efficiency advantage over wide cores with a big reorder window.

Not sure what you're saying here.

You've got to compare a like core to a like core.

Why? That's not what's on the table. Realistically, we all know (or should reasonably expect) that AMD will continue with SMT2 no matter how wide they make Zen3. But if we're looking at alternatives to synchronous core arrangements (CCXs) with SMT2, then we're inevitably going to be comparing fundamentally different cores, like 4-wide SMT2 vs. 6-wide SMT4 vs 4-wide SMT2 + something else entirely.

Can you sketch a diagram? I have a hard time figuring out how CMT could be merged with SMT or what you're saying.

https://en.wikipedia.org/wiki/SPARC

UltraSPARC T1 has at least one OpenSPARC implementation with 4-way SMT (known as S1). The UltraSPARC Architecture 2005 includes CMT extensions. Theoretically, one could have SMT and CMT implemented in the same hardware. Not sure if any of the existing SPARC implementations actually feature that odd combination. Nevertheless, it didn't help out Oracle much so it's all academic.

You wouldn't really need developers to embrace it. Not many at least; it would be mostly transparent to the applications. The only part might be the OS scheduler, and it would not be a big addition. Linus could write it over one weekend.

Unicorns and rainbows. Moving to SMT4 would be mostly transparent to applications that are throughput-bound. Anyone that actually cares about thread response time will have fits if their entire userbase switches to SMT4. I don't think it would be as bad as CMT implementations like Piledriver where cache thrashing could drive performance into the basement (see 3DPM v1 on Piledriver; it's horrendous). It would still cause a lot of headaches for developers. Plus don't pipeline flushes get worse for performance as you expand the number of threads that can be handled by SMT? Unless people are using something like this now:

https://patents.google.com/patent/US20040215938A1/en

I completely disagree that there would be a regression in core counts.

Why? A wider core requires more transistors. Expanding SMT capabilities requires more registers (again, more transistors). AMD isn't going to go for a major density change until 5nm (Zen4). If they give us wider cores, then we either get physically larger chiplets or we get fewer cores per chiplet. Either or. Not to speak of the changes they'd make in marketing their SMT4 products.

Quite likely SMT4 won't even be enabled by default for consumer desktop, where SMT2 likely would be the default.

. . . really?

The use is much more for mobile where core counts are lower and where factors like battery life are very very important. There I see it enabled by default (assuming SMT4 arrives at all in Zen4 or Zen3).

Why would you want SMT4 in mobile products? Asynchronous core arrangements currently dominate there. Intel is getting on board with Lakefield. Pushing SMT4 in the successor to Renoir would be bizarre.

But anyways, we're all speculating a few years ahead. First we need to wait till 2020 for Zen2 APU. (Or, if it remains MCM we might see it by end of this year?)

Renoir will probably not turn many heads in terms of funky design decisions. It will probably be monolithic to save on IF power overhead.
 

moinmoin

Andrei has repeatedly said that SMT is not precisely very power efficient and that explains why ARM hasn't bothered implementing it on its Cortex A series.
Eh, I think some people are misled by the higher power consumption of SMT. That higher power consumption is a logical result of SMT making better use of the resources than a single thread would. But since resources are shared, the overall power efficiency is actually better with SMT on than off. Just compare power consumption per thread/performance for e.g. Cinebench MT with SMT on then off.
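
For anyone who wants to run that comparison, a minimal sketch of the arithmetic in Python. The scores and wattages below are placeholders I made up to show the calculation, not measurements; plug in your own Cinebench MT results and package power readings. The function names are just for illustration.

```python
def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark points delivered per watt of package power."""
    return score / watts


def efficiency_gain(on: dict, off: dict) -> float:
    """Relative change in perf/W going from SMT off to SMT on (0.10 == +10%)."""
    return perf_per_watt(**on) / perf_per_watt(**off) - 1.0


# PLACEHOLDER numbers, not real measurements: SMT on draws more power
# but also finishes more work in the same time.
smt_off = {"score": 1000.0, "watts": 100.0}
smt_on = {"score": 1300.0, "watts": 110.0}

gain = efficiency_gain(smt_on, smt_off)
print(f"perf/W change with SMT on: {gain:+.1%}")  # +18.2% for these placeholders
```

With these made-up inputs power goes up 10% while throughput goes up 30%, so efficiency improves. If your own numbers show power rising faster than the score, the gain comes out negative and Andrei's point holds for that workload.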

----

Um, not really? I think the chance of AMD copying anything meaningful from the POWER family is next to zero. They'll probably continue to mimic Intel's best design decisions while tossing aside their worst ones.
WTH, who is talking about copying Power or Intel? Are you even following this discussion?

Look, the whole point of SMT is making better use of chip resources. As instructions differ in complexity, resulting in different latency and throughput, optimizing for heavier instructions doesn't benefit lighter instructions. So additional resources for speeding up heavy instructions will be laying dormant for most lighter instructions. Here SMT comes in to allow such dormant resources to be used for executing additional threads concurrently.

Yeah, but look at AVX512. How many developers are going to pick up that SIMD ISA? I would say, far fewer than have embraced AVX2.
Eww, personally I hope AVX512 disappears from the face of the earth again and is wholesale replaced by some/any SVE implementation. AVX512 is a very poorly designed ISA extension that just facilitates the future addition of even more custom extensions for the purpose of artificial market segmentation instead of offering a flexible, future-proof framework for increasingly wider SIMD calculations.

See @NTMBK's post below. How many developers are really going to want to embrace SMT4? Remember, if AMD does go SMT4, today's 8c buyers are going to be sold 4c or 6c products tomorrow. Most likely.
At this point I honestly don't think you have a grasp on who works with SMT and what needs to happen to make good use of it. And I don't think you correctly interpret the market if you really think people are happy to suddenly decrease the number of cores again just because SMT2 were to be replaced by SMT4. The number of threads is a very technical detail most normal consumers don't know what it is about and usually just ignore.

Of course it has made a difference. It's a price differentiator between the 9700k and 9900k today, just as it's been a feature/price differentiator at the low-end of Intel's stack as well (2c/4t i3s versus 2c/2t Pentiums, for example). Also see i5-6600k vs i7-6700k and others. Sure, there's binning and cache differences as well, but HT was a premium far far back in Intel's product stack.
Really now? Do you really not read my posts?
In all the years HT (...) was simply a stagnant premium feature for Intel.
You are just repeating me.

Even AMD differentiated between products as SMT on or SMT off with Summit Ridge. Look at the R3 product lineup. AMD is cost-conscious, and if they can sell a 4c/16t CPU in the same price slot as their 8c/16t chips from today, then they will.
There is zero indication AMD would ever sell a 4c/16t CPU as their new 8c/16t chips. Are you making this up just for the sake of discussion, or where does that come from?

It's a best-case for SMT4, so why not?
Because you are not staying on topic and we are increasingly running in circles as a result.

That's all we've got to go off of, though. Unless you think they'll go back to CMT or to a hybrid CMT/SMT solution. I think Sun/Oracle went that route. On second thought, let's hope AMD doesn't do that. In any case, AMD has no history of SMT4/8. Neither does Intel for that matter. I don't expect either one of them to try it considering how well IBM has done with advanced SMT implementation (which is to say, not very well at all; look at their market share).
Sorry, but your approach is pretty bad for predictions and makes for boring discussion. "Nobody did it that way before so it won't happen."



Let's agree to disagree. I think this wall-o-text discussion has run its course and you obviously didn't pick up several of the points raised. Nevertheless thanks for the nice discussion.
 

soresu

Platinum Member
Dec 19, 2014
2,660
1,860
136
Renoir will probably not turn many heads in terms of funky design decisions. It will probably be monolithic to save on IF power overhead.
Possibly, but there is still Dali which was meant to be the value APU option, which would imply monolithic/small to me.

I think Renoir will be chiplet still, it's more a question of how that will be configured.

Of course they could both be monolithic but that seems doubtful to me.
 

moinmoin

Possibly, but there is still Dali which was meant to be the value APU option, which would imply monolithic/small to me.

I think Renoir will be chiplet still, it's more a question of how that will be configured.

Of course they could both be monolithic but that seems doubtful to me.
I'd think Dali would be the 7nm follow up to Banded Kestrel, so 2c4t monolith. It may come sooner this time as 7nm die space is significantly more expensive, so an actually smaller die for 2c4t chips offers some savings. Renoir I expect to be a monolith as well. AMD may want to wait another round of significant density improvements (so 5nm) before moving parts of mobile to MCM, then use standard CCDs there as well.
 

soresu

I'd think Dali would be the 7nm follow up to Banded Kestrel, so 2c4t monolith. It may come sooner this time as 7nm die space is significantly more expensive, so an actually smaller die for 2c4t chips offers some savings. Renoir I expect to be a monolith as well. AMD may want to wait another round of significant density improvements (so 5nm) before moving parts of mobile to MCM, then use standard CCDs there as well.
Renoir was expected to be Vega, any thoughts on Dali?
 

moinmoin

Renoir was expected to be Vega, any thoughts on Dali?
I'd expect Dali to be just further cut down from Renoir just like Banded Kestrel was from Raven Ridge. So both based on Vega 20 plus VCN 2.0.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,348
1,533
136
I'm guesstimating an efficient modern big core like Zen1 ~ 210M, plus or minus 20M depending on if you include L3 or not.

Cache requires 8 transistors for a single bitcell, plus 1 for the access path. (more for tags, but let's just ignore them for now.) So you need at least 9*8*4*1024*1024 ~= 300M transistors for 4MB of cache.
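
The same back-of-the-envelope estimate as a small Python sketch, so the cell size can be swapped out. `sram_transistors` is just a helper name for this post; like the estimate above it ignores tags, ECC, decoders and sense amps, so treat the result as a lower bound.

```python
def sram_transistors(size_bytes: int,
                     transistors_per_cell: int = 8,
                     access_transistors: int = 1) -> int:
    """Lower-bound transistor count for an SRAM data array of size_bytes."""
    per_bit = transistors_per_cell + access_transistors
    return per_bit * 8 * size_bytes  # 8 bits per byte


four_mib = 4 * 1024 * 1024
print(sram_transistors(four_mib))  # 301989888, i.e. ~300M for 4MB

# Same array with a 6-transistor cell and no extra access transistor counted
print(sram_transistors(four_mib, transistors_per_cell=6, access_transistors=0))  # 201326592
```

Either way the L3 alone is well past a "plus or minus 20M" adjustment.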
 

Ajay

Lifer
Jan 8, 2001
15,448
7,857
136
Cache requires 8 transistors for a single bitcell, plus 1 for the access path. (more for tags, but let's just ignore them for now.) So you need at least 9*8*4*1024*1024 ~= 300M transistors for 4MB of cache.
Huh, I thought Intel used to use 6T cells (back, oh, 20+ years ago). Are 8T cells more stable at higher frequencies (or lower drive currents)?