Speculation: Ryzen 4000 series/Zen 3

RetroZombie · Jun 30, 2020

Gideon said:
Matisse reaaaally struggles to go below 60ns, even with similar crazy ram-kits and custom timings.

Going from 16MB of L3 to just 4MB gives 15ns less latency.

Adonisds · Jun 30, 2020

Thunder 57 said:
You are being foolish. There is no SMT4. I have given you proof. Yet you still run your mouth. Shut up and wait for it to come out. Keeping it a secret? What a joke! 6 ALUs, shut up. You don't think Intel or AMD could come up with a 6 ALU design? Yes, we will soon see if Zen 3 has SMT4 or not. When you are proven wrong, will you take a slice of humble pie?

He shows signs of being incapable of admitting he was wrong if he turns out to be wrong about SMT4. I have no problem with people who are wrong, but people who can't admit their mistakes I can't stand. We'll see what happens

Thunder 57 · Jun 30, 2020

Adonisds said:
He shows signs of being incapable of admitting he was wrong if he turns out to be wrong about SMT4. I have no problem with people who are wrong, but people who can't admit their mistakes I can't stand. We'll see what happens

If Milan has SMT4 I will admit I was wrong.

thigobr · Jun 30, 2020

RetroZombie said:
Going from 16MB of L3 to just 4MB gives 15ns less latency.

It's not just the L3 difference... The biggest improvement comes from having an on-die memory controller again.

RetroZombie · Jun 30, 2020

thigobr said:
It's not just the L3 difference... The biggest improvement comes from having an on-die memory controller again.

Yes, that accounts for 5ns, which ends up as a total improvement of 20ns, and that bodes well for the consoles.

Makaveli · Jun 30, 2020

RetroZombie said:
Yes, that accounts for 5ns, which ends up as a total improvement of 20ns, and that bodes well for the consoles.

The consoles are using Zen 2. Do we know if that has been tweaked to be more like the APU core than the standard Zen 2 core?

RetroZombie · Jun 30, 2020

Makaveli said:
The consoles are using Zen 2.

Renoir is Zen 2, so?

amd6502 · Jun 30, 2020

NostaSeronx said:
imho, AMD will be killing off SMT rather than increasing it.

=> Kill SMT
=> Switch to double FE + 1C/1T L0i.

Zen3 will launch with "SMT2" but I have been hearing it is actually a VMT2 implementation w/ ST mode being best overall perf/watt.
Zen4 will then drop multithreading on a single core and push for double piped front-end and an improved singlethreaded pure-L0i(no switching between op-cache and L0i).

What is VMT?

I have many doubts that Zen4 would go monothreading again. It's possible they might have a mode where there is 1 thread plus 3 background threads.

Assuming the focus on one main thread, some low hanging fruit might be clustering to enable sharing of the two FPUs (1 FPU/1 core). So, for FPU code, SMT2 would be happening within core pairs (aka modules in CMT lingo), which would double the maximum FPU should the neighboring core not utilize its FPU.

Another idea (this is sort of far out considering 'fusion' vision hasn't materialized and the majority of products are still without iGPU): to utilize the iGPU for far ahead speculative FPU calculations. Potentially useful calculations would be stored in the L1 and L2.

Saylick · Jun 30, 2020

amd6502 said:
What is VMT?

I'm sure someone could explain better than I, but if I'm not mistaken, VMT = virtual multi-threading, or "reverse" SMT as it's colloquially known. Basically, where SMT exposes physical cores to the OS as many virtual cores (from few to many), VMT virtualizes multiple physical cores into one virtual core (from many to few).

SMT feeds wide cores with many threads so that you get higher utilization of resources when you have many light workloads, while VMT takes many narrow cores and gangs them up into a virtual large core to tackle workloads requiring more single threaded performance.

NostaSeronx · Jun 30, 2020

amd6502 said:
What is VMT?

In my post, it is vertical multithreading. However, it might not be historical vertical multithreading in implementation.

17h is single path instruction flow and it switches between thread A or thread B.
19h could be aimed towards going for two path instruction flow and it algorithmically prioritizes for thread A on path A and thread B on path B. With future models dropping thread B and implementing dual task/process execution without two logical cores. Expanding OoO efficiency w/o duct taping SMT on the core.

Bucket A + Bucket B
Bucket A + Bucket A minus N or Bucket A + Bucket A plus N

Hardware OoO + Software SMT is also an option after they kill SMT.

Exist50 · Jun 30, 2020

amd6502 said:
What is VMT?

It's absolute hogwash is what it is. As a general rule, if Nosta likes to talk about it, then it's probably just a nonsensical combination of technobabble. Most on this forum have learned to ignore him by now.

amd6502 · Jun 30, 2020

NostaSeronx said:
In my post, it is vertical multithreading. However, it might not be historical vertical multithreading in implementation.

Ok, I had to do some research on vertical MT. Supposedly this is a crude predecessor to modern SMT, where each stage can only work on one thread at a time, but where a stalled thread allows another thread to wake up and resume. So kind of a coarse-grained MT. Supposedly this dates to the early days of P4 hyperthreading as well as Larrabee/Atom HT.

I much doubt this would happen (maybe with exception for background threads). I also disagree with duct tape analogy. Duct tape SMT method was already done for the FPU side in BD/Piledriver family. Zen seems very much designed for SMT2 from the start. I disagree with Richie that Zen1 would have been A7 Apple related; these apple acorn cores are monothreaders. It seems funny to think that they took an A7, duct taped some SMT to it, also duct taped an x86 decoder to it, and also made it run on AMD mu ops rather than armv8.

I agree with Richie that cores are getting wider. I kind of doubt it'd be as wide as EV8. But regardless, whether it is an 8+4 wide or 5+3 wide core, vertical MT would do almost nothing to keep these pipes from being underutilized. I guess the Apple strategy was to not mind all the underutilization, and to probably put most of them to sleep when idle-like conditions were detected. (I don't think that's a good approach.)

But for a thread that aims to maximize IPC in a modern very wide core, the amount of branch prediction, look ahead, and spec execution doesn't seem to be agreeable if maximizing perf/watt is one of the main goals. So for that I could see that be limited to one main priority thread, while the other threads aim for much lower OoO execution and modest IPC's like Piledriver's.

Or... straight up SMT4, but the OS would taskset running processes to the fewest number of cores, and so maximize the number of idling cores that can then be put into low power mode. And then hope that SMT4 quarters the amount of lookahead and spec execution per thread, so that it goes into reasonable and energy efficient territory.

Richie Rich · Jul 1, 2020

amd6502 said:
Ok, I had to do some research on vertical MT. Supposedly this is a crude predecessor to modern SMT, where each stage can only work on one thread at a time, but where a stalled thread allows another thread to wake up and resume. So kind of a coarse-grained MT. Supposedly this dates to the early days of P4 hyperthreading as well as Larrabee/Atom HT.

I much doubt this would happen (maybe with exception for background threads). I also disagree with duct tape analogy. Duct tape SMT method was already done for the FPU side in BD/Piledriver family. Zen seems very much designed for SMT2 from the start. I disagree with Richie that Zen1 would have been A7 Apple related; these apple acorn cores are monothreaders. It seems funny to think that they took an A7, duct taped some SMT to it, also duct taped an x86 decoder to it, and also made it run on AMD mu ops rather than armv8.

I agree with Richie that cores are getting wider. I kind of doubt it'd be as wide as EV8. But regardless, whether it is an 8+4 wide or 5+3 wide core, vertical MT would do almost nothing to keep these pipes from being underutilized. I guess the Apple strategy was to not mind all the underutilization, and to probably put most of them to sleep when idle-like conditions were detected. (I don't think that's a good approach.)

But for a thread that aims to maximize IPC in a modern very wide core, the amount of branch prediction, look ahead, and spec execution doesn't seem to be agreeable if maximizing perf/watt is one of the main goals. So for that I could see that be limited to one main priority thread, while the other threads aim for much lower OoO execution and modest IPC's like Piledriver's.

VMT is working in modern cores already. They divide 1 single thread into several sub-threads for each back-end port(ALU, LSU, FPU). OoO machine can speculatively execute both branch ways if needed. I would say there is no need for VMT at macro-level because it's already built in OoO mechanism.

I never said that Keller brought the A7 design over and said "build this and duct-tape SMT onto it". But there are some surprising similarities:

A7 ..... 4xALU .... 2xBranch shared ..... 192-entry ROB ... 64kB+64kB L1 cache ... 2xAGU
Zen .... 4xALU .... 2xBranch shared ..... 192-entry ROB ... 64kB+64kB L1 cache ... 2xAGU

There might be more identical things in the INT core, but A7 info is very limited. Nosta mentioned this similarity a long time ago and he was right. I'm not a HW engineer so I don't understand all of Nosta's ideas, however sometimes he makes a good catch (like with the early N5 Zen4).

I'm afraid that AMD did reduce Keller's EV8 resurrection AKA Zen3 into something smaller though. Probably 8xALU -> 6xALU and keeping SMT4 of course.

mopardude87 · Jul 1, 2020

Someone is loving those Apple processors; I am almost convinced to toss my 3900X in the trash. I admire your passion, sir.

Tuna-Fish · Jul 1, 2020

Saylick said:
I'm sure someone could explain better than I, but if I'm not mistaken, VMT = virtual multi-threading, or "reverse" SMT as it's colloquially known. Basically, where SMT exposes physical cores to the OS as many virtual cores (from few to many), VMT virtualizes multiple physical cores into one virtual core (from many to few).

Doing that is not possible. Something in this vein is frequently suggested as something someone should figure out how to do by people who have absolutely no understanding of how any of this works. The latency on die between two cores is simply too high for "two cores working on single thread" to ever produce any performance benefit.

Anyone who suggests any kind of "reverse SMT" should instantly be discredited, as they clearly do not understand even the very basics of what limitations matter when doing things in silicon.

Richie Rich said:
When Zen3 will not have SMT4 I will say: "OK. I was wrong, you we right. But it's a missed opportunity to be more advanced over Intel."

Wider SMT is not more advanced. The server market is not currently asking for more SMT. I know for a fact that it has lately become more common for server customers to go the opposite way, and completely disable SMT on machines they purchase. There are two reasons for this:

Firstly, some of the recent security issues hit machines with SMT worse than ones without, and it's disabled for perceived security reasons.

Secondly, it's disabled because disabling it improves the performance/cost of the server. Renewed competition in the CPU area has drastically lowered the part of a server's cost that goes to the CPU. This has left RAM as the largest cost, by far, typically near half the cost of the whole server. Every process you run requires the same amount of RAM to do its thing, regardless of how fast it is running. If you double the number of threads, but only increase the system throughput by 25%, you have just doubled the amount of RAM you need to pay for, for just a quarter extra speed. In a world where RAM is more than 50% of the cost of a new server, this has just worsened your performance/cost by ~16%.
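
For anyone who wants to check the ~16% figure, here is a rough sketch of the arithmetic. It assumes RAM is exactly half of the baseline server cost, that turning SMT on doubles the RAM you need, and that everything else costs the same; the function name and parameters are made up for illustration.

```python
def perf_per_cost_change(ram_share, throughput_gain, ram_multiplier=2.0):
    """Relative change in performance/cost when SMT is turned on.

    ram_share:       fraction of the baseline (SMT off) server cost that is RAM
    throughput_gain: fractional throughput increase from SMT (0.25 means +25%)
    ram_multiplier:  how much the RAM has to grow to feed the extra threads
    """
    # Baseline server: cost 1.0, performance 1.0.
    new_cost = (1.0 - ram_share) + ram_share * ram_multiplier
    new_perf = 1.0 + throughput_gain
    return new_perf / new_cost - 1.0


change = perf_per_cost_change(ram_share=0.5, throughput_gain=0.25)
print(f"{change:+.1%}")  # prints -16.7%
```

With a RAM share above 50% the result gets worse than that, so treat ~16% as the optimistic end under these assumptions.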

Thibsie · Jul 1, 2020

Richie Rich said:
I'm not HW engineer so I don't understand all Nosta's ideas however sometimes he has good catch (like with early N5 Zen4).

We don't either and I don't think it is the reason you mention.

moinmoin · Jul 1, 2020

Tuna-Fish said:
Doing that is not possible. Something in this vein is frequently suggested as something someone should figure out how to do by people who have absolutely no understanding of how any of this works. The latency on die between two cores is simply too high for "two cores working on single thread" to ever produce any performance benefit.

Anyone who suggests any kind of "reverse SMT" should instantly be discredited, as they clearly do not understand even the very basics of what limitations matter when doing things in silicon.

Aww come on, let me dream that dream, even if it's completely unrealistic currently.

LightningZ71 · Jul 1, 2020

Tuna-Fish said:
Doing that is not possible. Something in this vein is frequently suggested as something someone should figure out how to do by people who have absolutely no understanding of how any of this works. The latency on die between two cores is simply too high for "two cores working on single thread" to ever produce any performance benefit.

Anyone who suggests any kind of "reverse SMT" should instantly be discredited, as they clearly do not understand even the very basics of what limitations matter when doing things in silicon.

Wider SMT is not more advanced. The server market is not currently asking for more SMT. I know for a fact that it has lately become more common for server customers to go the opposite way, and completely disable SMT on machines they purchase. There are two reasons for this:

Firstly, some of the recent security issues hit machines with SMT worse than ones without, and it's disabled for perceived security reasons.

Secondly, it's disabled because disabling it improves the performance/cost of the server. Renewed competition in the CPU area has drastically lowered the part of a server's cost that goes to the CPU. This has left RAM as the largest cost, by far, typically near half the cost of the whole server. Every process you run requires the same amount of RAM to do its thing, regardless of how fast it is running. If you double the number of threads, but only increase the system throughput by 25%, you have just doubled the amount of RAM you need to pay for, for just a quarter extra speed. In a world where RAM is more than 50% of the cost of a new server, this has just worsened your performance/cost by ~16%.

You also missed another item in the decision tree of not having SMT enabled on servers: Licensing costs. Big software vendors are moving from a "per-socket" licensing model to a "per-thread" model, by way of "per-core". If those extra threads are costing you as much as the first threads on each core cost, but they only bring an additional 25% of performance to the table, while also increasing operating temperatures and slowing down the performance of individual threads (there is some overhead to running the extra threads), it can make more sense to just deploy a few extra servers with SMT off and come out ahead in the long run. To enhance what you were saying about RAM, disabling SMT can also let you run lower density memory modules in each server as there is less memory demand from fewer active threads, cutting RAM costs markedly on a per GB basis.
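
To put a rough number on the licensing point, here is a small sketch. It assumes every hardware thread needs its own license at the same price and reuses the 25% SMT gain from the post above; the function name and the cost model are hypothetical, not any vendor's actual terms.

```python
def throughput_per_license(threads_per_core, smt_gain):
    """Throughput of one core divided by the per-thread licenses it needs.

    threads_per_core: 1 with SMT off, 2 with SMT2 on
    smt_gain:         fractional throughput gain from SMT (0.25 means +25%)
    """
    # A core with SMT off delivers 1.0 units; with SMT on it delivers 1 + gain.
    core_throughput = 1.0 if threads_per_core == 1 else 1.0 + smt_gain
    return core_throughput / threads_per_core


smt_off = throughput_per_license(threads_per_core=1, smt_gain=0.25)
smt_on = throughput_per_license(threads_per_core=2, smt_gain=0.25)
print(smt_off, smt_on)  # prints 1.0 0.625
```

Under those assumptions each license buys well under two thirds of the throughput with SMT on, which is why a few extra SMT-off boxes can come out ahead when the licenses dominate the bill.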

maddie · Jul 1, 2020

Tuna-Fish said:
Doing that is not possible. Something in this vein is frequently suggested as something someone should figure out how to do by people who have absolutely no understanding of how any of this works. The latency on die between two cores is simply too high for "two cores working on single thread" to ever produce any performance benefit.

Anyone who suggests any kind of "reverse SMT" should instantly be discredited, as they clearly do not understand even the very basics of what limitations matter when doing things in silicon.

Wider SMT is not more advanced. The server market is not currently asking for more SMT. I know for a fact that it has lately become more common for server customers to go the opposite way, and completely disable SMT on machines they purchase. There are two reasons for this:

Firstly, some of the recent security issues hit machines with SMT worse than ones without, and it's disabled for perceived security reasons.

Secondly, it's disabled because disabling it improves the performance/cost of the server. Renewed competition in the CPU area has drastically lowered the part of a server's cost that goes to the CPU. This has left RAM as the largest cost, by far, typically near half the cost of the whole server. Every process you run requires the same amount of RAM to do its thing, regardless of how fast it is running. If you double the number of threads, but only increase the system throughput by 25%, you have just doubled the amount of RAM you need to pay for, for just a quarter extra speed. In a world where RAM is more than 50% of the cost of a new server, this has just worsened your performance/cost by ~16%.

Your last paragraph is one hell of an example of 2nd and 3rd order effects. Who would have clearly seen that?

Richie Rich · Jul 1, 2020

Tuna-Fish said:
Secondly, it's disabled because disabling it improves the performance/cost of the server. Renewed competition in the CPU area has drastically lowered the part of a server's cost that goes to the CPU. This has left RAM as the largest cost, by far, typically near half the cost of the whole server. Every process you run requires the same amount of RAM to do its thing, regardless of how fast it is running. If you double the number of threads, but only increase the system throughput by 25%, you have just doubled the amount of RAM you need to pay for, for just a quarter extra speed. In a world where RAM is more than 50% of the cost of a new server, this has just worsened your performance/cost by ~16%.

It sounds reasonable in theory. But why is the AWS comparison of Graviton2 vs. Rome done with SMT2 enabled?

Amazon's cloud can disable SMT, but that is requested mainly by customers running HPC loads. https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/

As long as most server systems have SMT on, you are wrong. However, feel free to provide data showing that most servers run with SMT off. Not to mention that some low-ILP tasks like SQL benefit from SMT a lot.

Zen3 with 8xALU and SMT4 would need to reduce SMT4 down to SMT2. Disabling SMT completely would leave the core underutilized.

moinmoin · Jul 1, 2020

maddie said:
Your last paragraph is one hell of an example of 2nd and 3rd order effects. Who would have clearly seen that?

Everybody who's building servers optimized for specific purposes?

LightningZ71 · Jul 1, 2020

Richie Rich said:
It sounds reasonable in theory. But why is the AWS comparison of Graviton2 vs. Rome done with SMT2 enabled?

Amazon's cloud can disable SMT, but that is requested mainly by customers running HPC loads. https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/

As long as most server systems have SMT on, you are wrong. However, feel free to provide data showing that most servers run with SMT off. Not to mention that some low-ILP tasks like SQL benefit from SMT a lot.

Zen3 with 8xALU and SMT4 would need to reduce SMT4 down to SMT2. Disabling SMT completely would leave the core underutilized.

Why would Amazon care if SMT is Off or On with respect to software licensing costs? All they have to provide is the VM instance. Depending on what software foundation they're using, they aren't seeing any higher licensing costs per thread over per core. It's the user of the iron that's got to figure out what's best for them. If I'm throwing a software package on the cloud that I have to pay "per-thread" licensing for, then all I care about is what cloud has the best performance per thread and per dollar of CPU time that works with my licensing model. If I'm hosting my own server, and I'm paying for a DB package that is licensed by the thread, then I'm going to be looking for the solution that gives me the lowest cost of performance that can fit in my existing footprint. That's a complicated calculus, as rack space is finite, cooling costs money, and per-thread licensing can be quite expensive. It's not impossible that it makes more sense to get more physical cores in more physical systems because my per-thread licensing is crazy high, and I'd better maximize the performance of each clock cycle that I have to license.

If I'm just hosting iron for other people, then all I have to worry about is my hypervisor, uptime management, and load balancing among systems while providing the highest value per vCPU minute that I offer. SMT makes sense for me.

Richie Rich · Jul 1, 2020

LightningZ71 said:
Why would Amazon care if SMT is Off or On with respect to software licensing costs? All they have to provide is the VM instance. Depending on what software foundation they're using, they aren't seeing any higher licensing costs per thread over per core. It's the user of the iron that's got to figure out what's best for them. If I'm throwing a software package on the cloud that I have to pay "per-thread" licensing for, then all I care about is what cloud has the best performance per thread and per dollar of CPU time that works with my licensing model. If I'm hosting my own server, and I'm paying for a DB package that is licensed by the thread, then I'm going to be looking for the solution that gives me the lowest cost of performance that can fit in my existing footprint. That's a complicated calculus, as rack space is finite, cooling costs money, and per-thread licensing can be quite expensive. It's not impossible that it makes more sense to get more physical cores in more physical systems because my per-thread licensing is crazy high, and I'd better maximize the performance of each clock cycle that I have to license.

If I'm just hosting iron for other people, then all I have to worry about is my hypervisor, uptime management, and load balancing among systems while providing the highest value per vCPU minute that I offer. SMT makes sense for me.

SMT has pros and cons like any other tech. Please don't just talk; give me proof that more than 50% of servers run with SMT off today. Give numbers, links...

maddie · Jul 1, 2020

moinmoin said:
Everybody who's building servers optimized for specific purposes?

What I meant is that AMD obviously expended a lot of effort into increasing the effectiveness of SMT and now because of their pricing structure relative to memory, it's more cost effective for some server clients to disable the feature.

In other words, choosing to sell at such reduced prices negated the SMT work. Was this predicted?

moinmoin · Jul 1, 2020

maddie said:
What I meant is that AMD obviously expended a lot of effort into increasing the effectiveness of SMT and now because of their pricing structure relative to memory, it's more cost effective for some server clients to disable the feature.

In other words, choosing to sell at such reduced prices negated the SMT work. Was this predicted?

In a way it was, as it's the natural fate of any all-rounder that not all of its capability is being used to the same degree at the same time (Pareto principle). That flexibility is why all-rounders are usually chosen, but flexibility can also mean disabling parts of the capability if that improves other parts of the whole, as is possible in this case.
