Speculation: Ryzen 4000 series/Zen 3

nicalandia · Aug 25, 2019

amd6502 said:
Hyper threatting is way too dangerous. I don't think they would do it, and if they did, people probably would be hesitant to use it.

As for SMT-n it's been floating around CS academia for decades, so I doubt it's patentable.

Maybe a trademark search (eg threadripping) might some day give us more to speculate on.

Too dangerous as in "Intel will sue the life out of you", or perhaps its execution? Hyperthreading and SMT are basically the same. The HT name belongs to Intel, and AMD has never used the term, just SMT (a term coined by Dean Tullsen).

moinmoin · Aug 25, 2019

DrMrLordX said:
I haven't seen a scheduler move low-utilization threads to a logical core over a physical core, ever. At least not under Linux. Which scheduler actually does this?

Under Linux, if the governor is set to powersave, the scheduler may already opt to run two threads on one SMT2 core instead of on two physical cores. Afaik there is no generic logic yet to differentiate between low and high utilization threads per se, but last year developers first introduced group_misfit_task to move computationally demanding processes from little to big cores, and later ARM contributed Energy Aware Scheduling. And in the big picture, the difference between logical and physical cores, as well as the availability and sharing of resources inside a core, are just details of the topology the scheduler can and should respect.
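To make the "details of the topology" point concrete, here is a minimal sketch of how userspace can see which logical CPUs share a physical core on Linux. It reads the sysfs file `thread_siblings_list` under `/sys/devices/system/cpu/cpuN/topology/`. The function names are made up for illustration, and this only inspects the topology; it says nothing about how the scheduler actually places threads.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,8" or "0-3,8-11" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return one tuple of logical CPU ids per physical core (hypothetical helper)."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in smt_sibling_groups():
        print(f"core with {len(group)} hardware thread(s): {group}")
```

On an SMT2 part each group should contain two logical CPUs; a hypothetical SMT4 part would presumably show four per group.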

DrMrLordX said:
The only thing that was clear was that Zen would not be a CMT design. So the next logical conclusion was an SMT2 implementation, at least to copy Intel.

Exactly. And with Zen 2 already being as wide as Power 7, support for SMT4 in Zen 3 can be a valid logical conclusion as well.

DrMrLordX said:
Depends on which users we're talking about here. Anyone who does encoding or rendering will like it. There are use cases. SMT4 though? Maaaaaybe, maybe not. I'd have to see some benchmarks to really understand how AMD's implementation of SMT4 would work before I was sold on it. If you asked me, do I want 8c/16t Zen3 or 4c/16t Zen3 on my desktop, obviously I'd prefer the former. 8c/32t probably moves me into a different price bracket/power envelope, making it maybe not an option for me anymore.

My primary point was supply begets demand, not the other way around. Now that AVX2 is more readily available it will be supported more. At the beginning, new hardware features are picked up by highly specialized software (like video encoders for SIMD extensions) before developers figure out general uses.

Your point about 8c/32t moving you into a different price bracket/power envelope confuses me a little. In all the years HT never really made a difference in the power envelope, it was simply a stagnant premium feature for Intel. AMD on the other hand only disables SMT on some lower end chips, again without affecting the power envelope in any way. Thanks to the generally increasing number of cores in the last couple of years, software is more and more being prepared and optimized for multi-threading, and SMT/HT profit from that as well. For non-optimized software the advantage of SMT to the process itself is ambivalent, and the same would be true for SMT4.

DrMrLordX said:
Now you're arguing thermals though, which is missing the point I'm making, since I'm assuming high CPU utilization overall for servers in scenarios where SMT4 might make sense. If all your cores are routinely sitting at 75% or higher utilization, then no, you do not worry about how the scheduler wakes up particular cores, since they aren't sleeping anyway.

Here you are sticking to the HPC scenario whereas our initial discussion up to now was focused on big.LITTLE versus SMT4. All cores being at high utilization all the time is generally a niche exception, not the rule. Yes, in that scenario putting cores to sleep doesn't apply as a power efficiency measure, but that doesn't invalidate or affect the other scenarios.

DrMrLordX said:
They sold SMT2 between 2017 and 2019. WRT SMT (or alternate strategies), that's "more of the same". They improved the individual cores and rejiggered IF links, but they didn't change their SMT strategy at all. They didn't go asynchronous core, they didn't go SMT4, they didn't kill SMT altogether, they didn't resurrect CMT (thank goodness), etc.

That's a very limited time window to discuss. 2 years. 3 gens, of which one is a refresh. My definition of "more of the same" would be HT and it being virtually unchanged for over a decade. I don't expect that to happen with AMD.

DrMrLordX said:
In a matter of seconds? Milliseconds?

Sure, why not? This is purely a matter of software and (VM) OS support. E.g. VMware's vSphere supports hot addition and removal of cores.

DrMrLordX said:
You may notice that not everyone has these problems in their design.

You may have noticed that not everyone offers the same memory and connectivity in their designs. One big reason why ARM doesn't matter in datacenters so far, aside from software support, is that once one adds all the memory and connectivity required, ARM's big power efficiency advantage vastly diminishes. And the one area where they could make a difference, massive numbers of cores (which, again, are not the issue here for power efficiency), is the one AMD is currently targeting.

NTMBK · Aug 25, 2019

NostaSeronx said:
-> Up until recently, both Xbox One and PlayStation 4 have reserved two entire CPU cores (out of eight available) in order to run the background operating system in parallel with games. <-

With the option of custom thread occupation. All cores can be used by the OS(1-thread) and all cores can be used by the game(3-thread). The OS sees one thread, the developer sees up to three threads, the hardware sees four potential threads, * x cores.

In the case of Windows, CMT/SMT optimization cases;
Two threads or more = reduced performance => push it to another core
Two threads or more = increased performance => keep it on the same core.

The whole point of the dedicated OS cores is to provide consistent performance. Having OS threads unpredictably stealing cache, branch slots and execution resources from your game threads is no help whatsoever.

They would be better off just putting a quad Jaguar cluster on die to handle the OS independently, or allocating a single Zen 2 core to the OS.

NostaSeronx · Aug 25, 2019

NTMBK said:
The whole point of the dedicated OS cores is to provide consistent performance.

Tell that to the Xbox 360. Core 0 is pure game, Core 1 and Core 2 are hybrid game/OS. There is no need for dedicated OS cores, if there is enough frequency, IPC, etc.
5% to 50% random OS usage, with the rest being game, well within the lines of SMT.

NTMBK said:
They would be better off just putting a quad Jaguar cluster on die to handle the OS independently, or allocating a single Zen 2 core to the OS.

No, that would be worse.

extide · Aug 25, 2019

amd6502 said:
Hyper threatting is way too dangerous. I don't think they would do it, and if they did, people probably would be hesitant to use it.

As for SMT-n it's been floating around CS academia for decades, so I doubt it's patentable.

Maybe a trademark search (eg threadripping) might some day give us more to speculate on.

HyperThreading is just Intel's branding for SMT, it isn't something different/special. There isn't any special sauce to anyone's SMT implementation. The special sauce is in the core design itself and how many resources it has + how well it handles contention.

nicalandia · Aug 25, 2019

For anyone reading or searching for SMT development history in Intel/AMD

Alpha EV8 (Part 1): Simultaneous Multi-Threat
https://www.realworldtech.com/alpha-ev8-wider/

Intel's Jackson will offer 2 chips for 1
https://www.theregister.co.uk/2001/02/09/intels_jackson_will_offer/

Jackson Technology And SMT
http://www.slcentral.com/articles/01/6/multithreading/page10.php

Jackson gets a name: Hyper-Threading Technology
https://www.anandtech.com/show/819/4

nicalandia · Aug 25, 2019

extide said:
HyperThreading is just Intel's branding for SMT, it isn't something different/special. There isn't any special sauce to anyone's SMT implementation. The special sauce is in the core design itself and how many resources it has + how well it handles contention.

At one time it was known as Project Jackson, and Intel received assistance from Dean Tullsen.

Jackson gets a name: Hyper-Threading Technology
https://www.anandtech.com/show/819/4

amd6502 · Aug 25, 2019

nicalandia said:
For anyone reading or searching for SMT development history in Intel/AMD

Alpha EV8 (Part 1): Simultaneous Multi-Threat

Very interesting links. Amazing to see that transistor growth:
EV6 15.2M
EV7 130M
EV8 250M

Compare to Piledriver at 106M per core (half module) and Presler's dual-threading, near-4GHz speed demon, which I guesstimate at ~130M per core ( https://www.anandtech.com/show/1910/3 , http://www.cpu-world.com/CPUs/Pentium_Extreme_Edition/Intel-Pentium Extreme Edition 965 HH80553PH1094M (BX80553965).html ).

I'm guesstimating an efficient modern big core like Zen1 ~ 210M, plus or minus 20M depending on if you include L3 or not.

DrMrLordX said:
If you asked me, do I want 8c/16t Zen3 or 4c/16t Zen3 on my desktop, obviously I'd prefer the former.

Not a terribly good question. Better question: would you rather have a 4c/8t or 4c/16t in your future 2021/22 mobile 10W TDP APU?

naukkis said:
And putting other threads to same core that already runs high-priority thread instead of idle cores is just stupid as it will slow down that high-priority thread.

moinmoin said:
Under Linux, if the governor is set to powersave, the scheduler may already opt to run two threads on one SMT2 core instead of on two physical cores.

Exactly, the scheduler will need to work with both user preference and frequency governor to switch between performance versus power-efficiency prioritized scheduling.

nicalandia · Aug 25, 2019

amd6502 said:
Not a terribly good question. Better question: would you rather have a 4c/8t or 4c/16t in your future 2021/22 mobile 10W TDP APU?

I agree, that is a much better question

Lodix · Aug 25, 2019

nicalandia said:
I agree, that is a much better question

Andrei has repeatedly said that SMT is not precisely very power efficient and that explains why ARM hasn't bothered implementing it on its Cortex A series.

amd6502 · Aug 25, 2019

Lodix said:
Andrei has repeatedly said that SMT is not precisely very power efficient and that explains why ARM hasn't bothered implementing it on its Cortex A series.

That may or may not be true. Nosta had a similar view and criticism of SMT efficiency, with CMT winning. CMT typically is monothreading on the integer side, but if you take it to the max and share the front end it has even further efficiency advantages over monothreading.

But I think there are problems with these views, even if it's technically true.

How would the consumer market like it if we went big.little acorn style by taking Raven or Picasso, or the 7nm equivalent quadcore SMT2, and tagging along 4 or 8 Puma+++ cores, or hyper efficient in-order Atom cores? Even if it is more efficient at churning through huge loads (that imho aren't typically seen by consumers) I think it would be unpopular with the consumer, as well as not the most balanced product.

jpiniero · Aug 25, 2019

amd6502 said:
How would the consumer market like it if we went big.little acorn style by taking Raven or Picasso, or the 7nm equivalent quadcore SMT2, and tagging along 4 or 8 Puma+++ cores, or hyper efficient in-order Atom cores? Even if it is more efficient at churning through huge loads (that imho aren't typically seen by consumers) I think it would be unpopular with the consumer, as well as not the most balanced product.

One nice thing about big.LITTLE is that the big core doesn't have to worry as much about power saving, since it'll just be shut off the moment it's not needed. So on the big cores you might be able to go wider than you might otherwise, and keep the frequency high without needing to ramp up.

NostaSeronx · Aug 25, 2019

amd6502 said:
Nosta had a similar view and criticism of SMT efficiency, with CMT winning. CMT typically is monothreading on the integer side, but if you take it to the max and share the front end it has even further efficiency advantages over monothreading.

Single core/dual core, Single SMT2 core(high ipc), single CMT2 module(replicated core+smt2 front-end) basically has the CMT2 module winning power efficiency. This power efficiency comes with better frequency scaling if the process has good Fmax merit; 22FDX/12FDX have more stable Fmax than 14LPP/7LP, etc. (There is also an issue of merit with Vmin/Vmax as well for FinFETs.)

It scales with CMP versions as well, with CMP8-CMT8 usually running on par with 64-core, while using significantly less power than CMP8-SMT8. CMP8-CMT8 can be a fast ring, however CMP64 has to run a slow mesh. There is no scaling guarantee for the feasibility of an SMT8 doing well against an eight core. In a perfect world, it will run faster single threads and use less power; in reality it slogs hard.

However, there is Clustered Simultaneous Multithreading (CSMT), which fuses CMT and SMT and quite literally fixes everything with SMT. A CMP8-CSMT8 will have faster single-threaded performance and the same power efficiency as a CMP8-CMT8. It can scale in reverse: CSMT2 has the IPC of an SMT2 core and the power efficiency of a CMT2 module. It has several names, starting with DT-CMT (2005), CSMT (2007), IBM's variable-partitionable core (2009), etc. One of the early models actually implements power-gating and frequency/voltage-boost capability in the module as well, allowing for a big brainiac when needed down to some speed-demons when needed.

DrMrLordX · Aug 26, 2019

moinmoin said:
Afaik there is no generic logic yet to differentiate between low and high utilization threads per se

Supposedly EAS can do this already:

https://developer.arm.com/tools-and...software/linux-kernel/energy-aware-scheduling

Exactly. And with Zen 2 already being as wide as Power 7, support for SMT4 in Zen 3 can be a valid logical conclusion as well.

Um, not really? I think the chance of AMD copying anything meaningful from the POWER family is next to zero. They'll probably continue to mimic Intel's best design decisions while tossing aside their worst ones.

My primary point was supply begets demand, not the other way around. Now that AVX2 is more readily available it will be supported more. At the beginning, new hardware features are picked up by highly specialized software (like video encoders for SIMD extensions) before developers figure out general uses.

Yeah, but look at AVX512. How many developers are going to pick up that SIMD ISA? I would say, far fewer than have embraced AVX2. See @NTMBK's post below. How many developers are really going to want to embrace SMT4? Remember, if AMD does go SMT4, today's 8c buyers are going to be sold 4c or 6c products tomorrow. Most likely.

Your point about 8c/32t moving you into a different price bracket/power envelope confuses me a little. In all the years HT never really made a difference in the power envelope, it was simply a stagnant premium feature for Intel.

Of course it has made a difference. It's a price differentiator between the 9700k and 9900k today, just as it's been a feature/price differentiator at the low end of Intel's stack as well (2c/4t i3s versus 2c/2t Pentiums, for example). Also see i5-6600k vs i7-6700k and others. Sure, there's binning and cache differences as well, but HT was a premium far, far back in Intel's product stack. Even AMD differentiated between products as SMT on or SMT off with Summit Ridge. Look at the R3 product lineup. AMD is cost-conscious, and if they can sell a 4c/16t CPU in the same price slot as their 8c/16t chips from today, then they will. Not sure where the TDPs would fall, mind you.

Here you are sticking to the HPC scenario

It's a best-case for SMT4, so why not?

whereas our initial discussion up to now was focused on big.LITTLE versus SMT4.

If DynamIQ wants to be taken seriously in the server room, it has to be useful in any possible scenario. And again, I think high-utilization database applications also fall outside of the HPC world. There are some times when a database is going to be running all cores well above 50%. Also CPU render clusters and other things.

That's a very limited time window to discuss. 2 years. 3 gens, of which one is a refresh. My definition of "more of the same" would be HT and it being virtually unchanged for over a decade. I don't expect that to happen with AMD.

That's all we've got to go off of, though. Unless you think they'll go back to CMT or to a hybrid CMT/SMT solution. I think Sun/Oracle went that route. On second thought, let's hope AMD doesn't do that. In any case, AMD has no history of SMT4/8. Neither does Intel for that matter. I don't expect either one of them to try it considering how well IBM has done with advanced SMT implementation (which is to say, not very well at all; look at their market share).

Sure, why not? This is purely a matter of software and (VM) OS support. E.g. VMware's vSphere supports hot addition and removal of cores.

That's actually interesting, and makes the use case for DynamIQ SoCs in cloud environments somewhat compelling. But I'm veering off-topic a bit.

NTMBK said:
The whole point of the dedicated OS cores is to provide consistent performance. Having OS threads unpredictably stealing cache, branch slots and execution resources from your game threads is no help whatsoever.

They would be better off just putting a quad Jaguar cluster on die to handle the OS independently, or allocating a single Zen 2 core to the OS.

While I agree with you wholeheartedly in theory, in practice utilizing "little" cores might increase design costs. big.LITTLE/DynamIQ is also mostly unproven in the server room . . . for now. But what you're saying rings true.

amd6502 · Aug 26, 2019

No I disagree that monothreading is more efficient when you compare similarly powerful cores. Well done little cores have an efficiency advantage over wide cores with a big reorder window. You've got to compare a like core to a like core. Can you sketch a diagram? I have a hard time figuring out how CMT could be merged with SMT or what you're saying.

DrMrLordX said:
How many developers are really going to want to embrace SMT4? Remember, if AMD does go SMT4, today's 8c buyers are going to be sold 4c or 6c products tomorrow. Most likely.

You wouldn't really need developers to embrace it. Not many at least; it would be mostly transparent to the applications. The only part might be the OS scheduler, and it would not be a big addition. Linus could write it over one weekend.

I completely disagree that there would be a regression in core counts. Quite likely SMT4 won't even be enabled by default for consumer desktop, where SMT2 likely would be the default. The use is much more for mobile where core counts are lower and where factors like battery life are very very important. There I see it enabled by default (assuming SMT4 arrives at all in Zen4 or Zen3).

If it arrives it will be a win-win for any PC user. Laptops will make leaps and bounds, and desktop users will just enjoy the wider core.

But anyways, we're all speculating a few years ahead. First we need to wait till 2020 for Zen2 APU. (Or, if it remains MCM we might see it by end of this year?)

NostaSeronx · Aug 26, 2019

amd6502 said:
Can you sketch a diagram? I have a hard time figuring out how CMT could be merged with SMT or what you're saying.

Well there are different ways of implementing SMT on CMT and vice versa.

For example; ALQ0 running T0, ALQ1 running T1, ALQ2 running T2, ALQ3 running T3 // ALQx-a r Tx-a is a form of CMT on SMT. (Each ALQx has 2 ALUs)
While; Micro-core(core) C0-C3 all capable of running T0 through T3 is a form of SMT on CMT. (Each core(micro-core) has 2 ALUs+2 AGUs)
Any fine-grain thread control that can be achieved via OS, software, hardware that clusters hardware is CMT, even if it is only implemented for a SMT core. Predictability over the standard method of SMT is what CMT is striving for and reduced optimization times, reduced hardware bloat, etc.

First example has capability of disabling pairs(as AMD's SMT) and the second example can disable dispatch to lowest data cache(as AMD's CMT), for lower power. Which is more effective than multi-core of completely replicated processors of the same capability.

SMT4 -> Super-Wide OoO via 6x ALUs == bad for high frequency design <== barebone SMT design
SMT4 -> Clustered 2x, Wide OoO via 4x ALUs // Clustered 4x, Narrow OoO via 2x ALUs == good for high frequency design(least to most) <== CSMT design

DrMrLordX · Aug 26, 2019

amd6502 said:
No I disagree that monothreading is more efficient when you compare similarly powerful cores. Well done little cores have an efficiency advantage over wide cores with a big reorder window.

Not sure what you're saying here.

You've got to compare a like core to a like core.

Why? That's not what's on the table. Realistically, we all know (or should reasonably expect) that AMD will continue with SMT2 no matter how wide they make Zen3. But if we're looking at alternatives to synchronous core arrangements (CCXs) with SMT2, then we're inevitably going to be comparing fundamentally different cores, like 4-wide SMT2 vs. 6-wide SMT4 vs 4-wide SMT2 + something else entirely.

Can you sketch a diagram? I have a hard time figuring out how CMT could be merged with SMT or what you're saying.

https://en.wikipedia.org/wiki/SPARC

UltraSPARC T1 has at least one OpenSPARC implementation with 4-way SMT (known as S1). The UltraSPARC Architecture 2005 includes CMT extensions. Theoretically, one could have SMT and CMT implemented in the same hardware. Not sure if any of the existing SPARC implementations actually feature that odd combination. Nevertheless, it didn't help out Oracle much so it's all academic.

You wouldn't really need developers to embrace it. Not many at least; it would be mostly transparent to the applications. The only part might be the OS scheduler, and it would not be a big addition. Linus could write it over one weekend.

Unicorns and rainbows. Moving to SMT4 would be mostly transparent to applications that are throughput-bound. Anyone that actually cares about thread response time will have fits if their entire userbase switches to SMT4. I don't think it would be as bad as CMT implementations like Piledriver where cache thrashing could drive performance into the basement (see 3DPM v1 on Piledriver; it's horrendous). It would still cause a lot of headaches for developers. Plus don't pipeline flushes get worse for performance as you expand the number of threads that can be handled by SMT? Unless people are using something like this now:

https://patents.google.com/patent/US20040215938A1/en

I completely disagree that there would be a regression in core counts.

Why? A wider core requires more transistors. Expanding SMT capabilities requires more registers (again, more transistors). AMD isn't going to go for a major density change until 5nm (Zen4). If they give us wider cores, then we either get physically larger chiplets or we get fewer cores per chiplet. Either or. Not to speak of the changes they'd make in marketing their SMT4 products.

Quite likely SMT4 won't even be enabled by default for consumer desktop, where SMT2 likely would be the default.

. . . really?

The use is much more for mobile where core counts are lower and where factors like battery life are very very important. There I see it enabled by default (assuming SMT4 arrives at all in Zen4 or Zen3).

Why would you want SMT4 in mobile products? Asynchronous core arrangements currently dominate there. Intel is getting on board with Lakefield. Pushing SMT4 in the successor to Renoir would be bizarre.

But anyways, we're all speculating a few years ahead. First we need to wait till 2020 for Zen2 APU. (Or, if it remains MCM we might see it by end of this year?)

Renoir will probably not turn many heads in terms of funky design decisions. It will probably be monolithic to save on IF power overhead.

moinmoin · Aug 26, 2019

Lodix said:
Andrei has repeatedly said that SMT is not precisely very power efficient and that explains why ARM hasn't bothered implementing it on its Cortex A series.

Eh, I think some people are misled by the higher power consumption of SMT. That higher power consumption is a logical result of SMT making better use of the resources than a single thread would. But since resources are shared, the overall power efficiency is actually better with SMT on than off. Just compare power consumption per thread/performance for e.g. Cinebench MT with SMT on then off.
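The comparison suggested here boils down to a performance-per-watt ratio between two runs of the same benchmark. A rough sketch follows, assuming you have a multi-threaded score and an average package power reading for each run. The function names are hypothetical and the example inputs are placeholders, not measurements.

```python
def perf_per_watt(score, package_watts):
    """Benchmark points delivered per watt of package power."""
    return score / package_watts


def smt_efficiency_gain(score_on, watts_on, score_off, watts_off):
    """Ratio above 1.0 means the SMT-on run was the more power-efficient one."""
    return perf_per_watt(score_on, watts_on) / perf_per_watt(score_off, watts_off)


# Placeholder inputs only, NOT measurements: substitute your own results.
example = smt_efficiency_gain(score_on=1300, watts_on=110,
                              score_off=1000, watts_off=100)
print(f"SMT on/off efficiency ratio: {example:.2f}")
```

With real numbers, a ratio above 1.0 would support the claim that SMT draws more power but does more work per watt; a ratio below 1.0 would contradict it for that workload.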

----

DrMrLordX said:
Um, not really? I think the chance of AMD copying anything meaningful from the POWER family is next to zero. They'll probably continue to mimic Intel's best design decisions while tossing aside their worst ones.

WTH, who is talking about copying Power or Intel? Are you even following this discussion?

Look, the whole point of SMT is making better use of chip resources. As instructions differ in complexity, resulting in different latency and throughput, optimizing for heavier instructions doesn't benefit lighter instructions. So additional resources for speeding up heavy instructions will be lying dormant for most lighter instructions. Here SMT comes in to allow such dormant resources to be used for executing additional threads concurrently.

DrMrLordX said:
Yeah, but look at AVX512. How many developers are going to pick up that SIMD ISA? I would say, far fewer than have embraced AVX2.

Eww, personally I hope AVX512 disappears from the face of the earth again and is wholesale replaced by some/any SVE implementation. AVX512 is a very poorly designed ISA extension that just facilitates the future addition of even more custom extensions for the purpose of artificial market segmentation instead of offering a flexible, future-proof framework for increasingly wider SIMD calculations.

DrMrLordX said:
See @NTMBK's post below. How many developers are really going to want to embrace SMT4? Remember, if AMD does go SMT4, today's 8c buyers are going to be sold 4c or 6c products tomorrow. Most likely.

At this point I honestly don't think you have a grasp on who works with SMT and what needs to happen to make good use of it. And I don't think you correctly interpret the market if you really think people are happy to suddenly decrease the number of cores again just because SMT2 were to be replaced by SMT4. The number of threads is a very technical detail that most normal consumers don't understand and usually just ignore.

DrMrLordX said:
Of course it has made a difference. It's a price differentiator between the 9700k and 9900k today, just as it's been a feature/price differentiator at the low-end of Intel's stack as well (2c/4t i3s versus 2c/2t Pentiums, for example). Also see i5-6600k vs i7-6700k and others. Sure, there's binning and cache differences as well, but HT was a premium far far back in Intel's product stack.

Really now? Do you really not read my posts?

moinmoin said:
In all the years HT (...) was simply a stagnant premium feature for Intel.

You are just repeating me.

DrMrLordX said:
Even AMD differentiated between products as SMT on or SMT off with Summit Ridge. Look at the R3 product lineup. AMD is cost-conscious, and if they can sell a 4c/16t CPU in the same price slot as their 8c/16t chips from today, then they will.

There is zero indication AMD would ever sell a 4c/16t CPU as their new 8c/16t chips. Are you making this up just for the sake of discussion, or where does that come from?

DrMrLordX said:
It's a best-case for SMT4, so why not?

Because you are not staying on topic and we are increasingly running in circles as a result.

DrMrLordX said:
That's all we've got to go off of, though. Unless you think they'll go back to CMT or to a hybrid CMT/SMT solution. I think Sun/Oracle went that route. On second thought, let's hope AMD doesn't do that. In any case, AMD has no history of SMT4/8. Neither does Intel for that matter. I don't expect either one of them to try it considering how well IBM has done with advanced SMT implementation (which is to say, not very well at all; look at their market share).

Sorry, but your approach is pretty bad for predictions and makes for boring discussion. "Nobody did it that way before so it won't happen."

Let's agree to disagree. I think this wall-o-text discussion has run its course and you obviously didn't pick up several of the points raised. Nevertheless thanks for the nice discussion.

soresu · Aug 26, 2019

DrMrLordX said:
Renoir will probably not turn many heads in terms of funky design decisions. It will probably be monolithic to save on IF power overhead.

Possibly, but there is still Dali which was meant to be the value APU option, which would imply monolithic/small to me.

I think Renoir will be chiplet still, it's more a question of how that will be configured.

Of course they could both be monolithic but that seems doubtful to me.

moinmoin · Aug 26, 2019

soresu said:
Possibly, but there is still Dali which was meant to be the value APU option, which would imply monolithic/small to me.

I think Renoir will be chiplet still, it's more a question of how that will be configured.

Of course they could both be monolithic but that seems doubtful to me.

I'd think Dali would be the 7nm follow up to Banded Kestrel, so 2c4t monolith. It may come sooner this time as 7nm die space is significantly more expensive, so an actually smaller die for 2c4t chips offers some savings. Renoir I expect to be a monolith as well. AMD may want to wait another round of significant density improvements (so 5nm) before moving parts of mobile to MCM, then use standard CCDs there as well.

soresu · Aug 26, 2019

moinmoin said:
I'd think Dali would be the 7nm follow up to Banded Kestrel, so 2c4t monolith. It may come sooner this time as 7nm die space is significantly more expensive, so an actually smaller die for 2c4t chips offers some savings. Renoir I expect to be a monolith as well. AMD may want to wait another round of significant density improvements (so 5nm) before moving parts of mobile to MCM, then use standard CCDs there as well.

Renoir was expected to be Vega, any thoughts on Dali?

moinmoin · Aug 26, 2019

soresu said:
Renoir was expected to be Vega, any thoughts on Dali?

I'd expect Dali to be just further cut down from Renoir just like Banded Kestrel was from Raven Ridge. So both based on Vega 20 plus VCN 2.0.

Tuna-Fish · Aug 26, 2019

amd6502 said:
I'm guesstimating an efficient modern big core like Zen1 ~ 210M, plus or minus 20M depending on if you include L3 or not.

Cache requires 8 transistors for a single bitcell, plus 1 for the access path. (more for tags, but let's just ignore them for now.) So you need at least 9*8*4*1024*1024 ~= 300M transistors for 4MB of cache.
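The arithmetic above can be checked with a few lines. This is only a sketch of the estimate as stated in the post (9 transistors per bit, 8 bits per byte, tags ignored); the function name and the `transistors_per_bit` parameter are made up for illustration.

```python
def cache_transistors(size_bytes, transistors_per_bit=9):
    """Lower-bound transistor count for a cache data array, ignoring tags."""
    bits = size_bytes * 8
    return bits * transistors_per_bit


l3_bytes = 4 * 1024 * 1024  # 4MB, as in the estimate above
total = cache_transistors(l3_bytes)
print(f"{total:,} transistors (~{total / 1e6:.0f}M)")  # 301,989,888 (~302M)
```

Changing `transistors_per_bit` shows how sensitive the total is to the assumed cell design.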

Ajay · Aug 26, 2019

Tuna-Fish said:
Cache requires 8 transistors for a single bitcell, plus 1 for the access path. (more for tags, but let's just ignore them for now.) So you need at least 9*8*4*1024*1024 ~= 300M transistors for 4MB of cache.

Huh, I thought Intel used to use 6T cells (back, oh, 20+ years ago). Are 8T cells more stable at higher frequencies (or lower drive currents)?

amd6502 · Aug 26, 2019

Tuna-Fish said:
Cache requires 8 transistors for a single bitcell, plus 1 for the access path. (more for tags, but let's just ignore them for now.) So you need at least 9*8*4*1024*1024 ~= 300M transistors for 4MB of cache.

That would be too much I think; was doing a ballpark area guess late at night.

So 9*4*1M= 36M

Compares better than order of magnitude close, as 40M equals difference between (X ± 20M).

On another semi-related topic, right now I'm looking at the RWT history of the EV8 tragedy linked by nicalandia.

nicalandia said:
Alpha EV8 (Part 1): Simultaneous Multi-Threat
https://www.realworldtech.com/alpha-ev8-wider/

The IPC gained from going out-of-order is actually quite small; they kept the transistor growth relatively low, from EV5's 9M to EV6's 15M. Adjusting for the doubled clock speed, we see the IPC increase is not that big:

At 300MHz, int/FP for EV5 is: 8/13
For EV6 it would be: 15/24

EV7's transistor budget ballooned, and it clocked to 1.5GHz. At 300MHz the projected IPC (neglecting gains from lower cycle RAM benefits) would be:

EV7: 18/32
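For anyone who wants to redo the comparison, here is a small sketch that turns the 300MHz-normalized int/FP figures quoted in this post into generation-over-generation ratios. The `normalize` helper assumes scores scale linearly with clock speed, which is a simplification, and all names are made up for illustration.

```python
# int/FP scores normalized to 300MHz, copied from the figures quoted in this post.
SCORES_AT_300MHZ = {"EV5": (8, 13), "EV6": (15, 24), "EV7": (18, 32)}


def normalize(score, clock_mhz, reference_mhz=300):
    """Scale a score to a reference clock, assuming linear scaling with frequency."""
    return score * reference_mhz / clock_mhz


def gain(newer, older, scores=SCORES_AT_300MHZ):
    """Return (int ratio, FP ratio) of the newer core over the older one."""
    new_int, new_fp = scores[newer]
    old_int, old_fp = scores[older]
    return new_int / old_int, new_fp / old_fp


for newer, older in (("EV6", "EV5"), ("EV7", "EV6")):
    int_ratio, fp_ratio = gain(newer, older)
    print(f"{newer} vs {older}: int x{int_ratio:.2f}, FP x{fp_ratio:.2f}")
```

These are per-clock ratios only; they ignore memory latency and process differences between the generations.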

So it looks like they turned this into a long core speed demon; something that Bulldozer likely inherited from.

More reading for anyone interested: 1. http://alasir.com/articles/alpha_history/alpha_21364_21464.html 2. http://alasir.com/articles/alpha_history/alpha_21264.html

It would also be interesting to know how many transistors were added during the POWER SMT4 jump.
