***OFFICIAL*** Ryzen 5000 / Zen 3 Launch Thread REVIEWS BEGIN PAGE 39

Page 26

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Possibly, but a sub-$300 6-core is certainly going to happen, no? I mean, Rocket Lake of all things being a price hike with worse pricing than Zen 3 would be embarrassing.

Well . . .

Well, Intel needs to accept they can't keep having the same margins; rebranding when they no longer have ANY lead is a bad idea. The only chance they have is in the mainstream area where AMD is going to keep using Zen 2, Renoir (maybe) and Cezanne, unless AMD launches a Zen 3 below the 5600.

The problem is: which chip do they aim a 6c Rocket Lake-S at, exactly? A well-binned 6c Rocket Lake-S would compete with the 10700K while having fewer cores, and would probably use a little less power, so you would think Intel would price it against that part. However, the 5600 may eat their lunch, or at least come close enough in performance to be a problem. It really depends on how arrogant Intel is, and on how well they can move chips when the product may not post up well against what could be a scarce (for a few months) part from their primary competitor. And as @jpiniero seems to indicate, there may only be the 8c Rocket Lake die anyway, so Intel may not allocate many Rocket Lake dice to anything other than top-tier products. A 6c Rocket Lake-S might not exist at all, and if it does, it's probably not going to be a discount part.

The half memory write bandwidth was per 4c CCX in Zen 2. Ryzen 5000 appears to use the same IOD, and every CCD has one 8c CCX instead of two 4c CCXs, so the 5800X should have (two halves = one) full memory write bandwidth.

I'm not sure I follow the reasoning.

3900X has full memory write bandwidth with two CCDs. 3800X has half memory write bandwidth with one CCD. Assuming write bandwidth is set up on a per-CCD basis (rather than per CCX), you would think the 5800X would also have half memory write bandwidth.
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
Why are you assuming the bandwidth is "halved" in any way? The I/O die is connected to each CCD with an IF link, so a CCD always sees the same bandwidth toward the I/O die. Yes, the 3900X/5900X and 3950X/5950X have two CCDs and therefore two IF links (one per CCD), but each core "sees" the same bandwidth when accessing memory, as there is no way to share the "double link" across two CCDs.
 

CP5670

Diamond Member
Jun 24, 2004
5,508
586
126
Even if that is the case, AMD will still have to compete with themselves. No one really likes the $50 price hikes with Zen 3, but the performance-per-dollar uplift is still there. It's just not as good as it could have been.

Even when a company has a dominant position it can still keep pushing ahead; Apple's SoC team is a pretty good example of this. Companies that don't make significant improvements won't get me to replace what I already have from them. There's a reason I never bothered to upgrade my Ivy Bridge CPU to a newer Intel offering: there just wasn't enough uplift for what it cost.

I'm definitely getting a Zen 3, maybe even one with a little more hardware than I honestly need. But once I have it, I have it, and past performance doesn't give me a compelling reason to upgrade again. If AMD wants me to open my wallet for them again, they need to give me a reason to do so.

This is kind of an issue with desktop CPUs in general. Zen 3 looks great but it's hard to justify getting one if you have pretty much anything midrange/high end from AMD or Intel from the last 2 or 3 years. I tend to only upgrade hardware if the improvement is clearly noticeable, or if there is a specific feature I want. A CPU easily lasts 5+ years these days before it starts feeling too slow.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
3900X has full memory write bandwidth with two CCDs. 3800X has half memory write bandwidth with one CCD. Assuming write bandwidth is set up on a per-CCD basis (rather than per CCX), you would think the 5800X would also have half memory write bandwidth.
With Zen 2, every single CCX was connected separately to the IOD, each of them getting a 32B/cycle read and 16B/cycle write connection. With Zen 3 the CCX is double the size and should get double the connection width, so 64B/cycle read and 32B/cycle write.

Why are you assuming the bandwidth is "halved" in any way? The I/O die is connected to each CCD with an IF link, so a CCD always sees the same bandwidth toward the I/O die. Yes, the 3900X/5900X and 3950X/5950X have two CCDs and therefore two IF links (one per CCD), but each core "sees" the same bandwidth when accessing memory, as there is no way to share the "double link" across two CCDs.
With Zen 2 the connections were per CCX, not per CCD.

Write bandwidth being half of the read bandwidth is widely known; that's how both AMD's and Intel's cores are set up. With Zen 2's hierarchy AMD opted to apply the same balance to the IF links as well. We had a thread on that: https://forums.anandtech.com/threads/ryzen-3700x-low-ram-write-speed-conundrum.2567215/
Igor's Lab got a response from AMD last year confirming that it's a design choice, though oddly enough that quote apparently didn't spread outside the German press:
"Concerning AIDA64, we didn't talk about it in the call, but one of the optimizations brought by Zen 2 is the reduction of the write bandwidth from a CCD to the IOD from 32B/cycle to 16B/cycle, while the read bandwidth remains fully provisioned at 32B/cycle. Since workloads do comparatively little writing, the link does not need to be 32B wide. This design choice makes it possible to optimize the die area used as well as power consumption, and to concentrate innovation efforts on other parts of the architecture. In other words, with the 3700X and only one chiplet the observed behavior is normal (and on a 3900X with two chiplets you will logically observe higher theoretical write results)."
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
With Zen 2 the connections were per CCX, not per CCD.

Write bandwidth being half of the read bandwidth is widely known; that's how both AMD's and Intel's cores are set up. With Zen 2's hierarchy AMD opted to apply the same balance to the IF links as well. We had a thread on that: https://forums.anandtech.com/threads/ryzen-3700x-low-ram-write-speed-conundrum.2567215/
Igor's Lab got a response from AMD last year confirming that it's a design choice, though oddly enough that quote apparently didn't spread outside the German press:
"Concerning AIDA64, we didn't talk about it in the call, but one of the optimizations brought by Zen 2 is the reduction of the write bandwidth from a CCD to the IOD from 32B/cycle to 16B/cycle, while the read bandwidth remains fully provisioned at 32B/cycle. Since workloads do comparatively little writing, the link does not need to be 32B wide. This design choice makes it possible to optimize the die area used as well as power consumption, and to concentrate innovation efforts on other parts of the architecture. In other words, with the 3700X and only one chiplet the observed behavior is normal (and on a 3900X with two chiplets you will logically observe higher theoretical write results)."

I understand that, but even if the bandwidth per CCX is doubled, the effective bandwidth per core is the same, as we now have 8 cores per CCX instead of 4. What I mean is that this setup certainly has advantages compared to the past, but from a theoretical point of view, if you need concurrent accesses from multiple cores at the same time, we could still have conflicts reducing the effective bandwidth, like in the past. Of course for writes this is unlikely, but for reads it may happen more often. I think the effectiveness of this improvement is better seen in single/lightly threaded workloads.
 

biostud

Lifer
Feb 27, 2003
18,193
4,674
136
AMD had to gain market share and build a reputation for delivering solid CPUs and platforms after the Bulldozer generation. They have proven that with the first three generations of Ryzen. Now it seems they actually have a better CPU on all fronts, and they have shown they can deliver, so obviously they want to increase margins. They are no longer the cheap alternative to Intel; they are the market leaders. Once Rocket Lake is released, a price war will begin, maybe with AMD launching non-X variants, or adjusting prices on the X versions. But as long as they sell the best processors they will want to earn as much money as possible.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Write bandwidth being half of the read bandwidth is widely known; that's how both AMD's and Intel's cores are set up.

I think on Intel the memory bandwidth is symmetrical and does not distinguish between writes and reads.
On AMD there is a write bandwidth limit for single-CCD chips: the 8 cores are limited to half of the read BW.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I understand that, but even if the bandwidth per CCX is doubled, the effective bandwidth per core is the same, as we now have 8 cores per CCX instead of 4. What I mean is that this setup certainly has advantages compared to the past, but from a theoretical point of view, if you need concurrent accesses from multiple cores at the same time, we could still have conflicts reducing the effective bandwidth, like in the past. Of course for writes this is unlikely, but for reads it may happen more often. I think the effectiveness of this improvement is better seen in single/lightly threaded workloads.
Yeah, it's pure over-provisioning, like AMD does in many areas of its Zen design. While effective bandwidth per core is the same, the improvement boils down to 8 cores having a bigger chance of getting their bandwidth needs satisfied by 64B/cycle read and 32B/cycle write than 4 cores by 32B/cycle read and 16B/cycle write. Lightly threaded workloads profit in both cases, but with one group of 8 cores sharing the wider link instead of two groups of 4 cores each sharing a narrower one, those cases will come up more often.
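To make that concrete, here's a tiny illustrative sketch (a hypothetical even-split model using the read-side widths discussed above; the Zen 3 width is still speculation):

```python
# Illustrative even-split model: one busy core can use the whole CCX link,
# and with every core busy the link is divided evenly.
configs = {
    "Zen 2 CCX": {"cores": 4, "read_link": 32},  # bytes/cycle
    "Zen 3 CCX": {"cores": 8, "read_link": 64},  # speculative doubled width
}

for name, cfg in configs.items():
    single = cfg["read_link"]                 # one busy core gets the whole link
    loaded = cfg["read_link"] / cfg["cores"]  # every core busy, even split
    print(f"{name}: 1 busy core -> {single} B/cycle, "
          f"{cfg['cores']} busy cores -> {loaded:.0f} B/cycle each")
```

Same per-core floor when everything is loaded, but the lightly threaded case gets twice the headroom, which is the over-provisioning argument in a nutshell.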

I think on Intel the memory bandwidth is symmetrical and does not distinguish between writes and reads.
On AMD there is a write bandwidth limit for single-CCD chips: the 8 cores are limited to half of the read BW.
For Intel the asymmetrical approach is used only in select areas instead of throughout the design like AMD does, e.g. Skylake uses it for the L1 data cache with 2x loads and 1x store (32B/cycle each on client, 64B/cycle each on server).
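As a quick worked example of what that 2x load / 1x store split means per core (the 4.0 GHz clock is just an assumed example frequency for the conversion):

```python
# Skylake-client L1D per core: two 32B load ports and one 32B store port per cycle.
clk_ghz = 4.0                    # example core clock
load_bytes_per_cycle = 2 * 32    # 2x 32B loads
store_bytes_per_cycle = 1 * 32   # 1x 32B store

print("L1D load bandwidth :", load_bytes_per_cycle * clk_ghz, "GB/s per core")   # 256.0
print("L1D store bandwidth:", store_bytes_per_cycle * clk_ghz, "GB/s per core")  # 128.0
```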

(Btw. the bandwidth increase for load/store is very likely one of the major performance improvements Zen 3 received.)
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
For Intel the asymmetrical approach is used only in select areas instead of throughout the design like AMD does, e.g. Skylake uses it for the L1 data cache with 2x loads and 1x store (32B/cycle each on client, 64B/cycle each on server).

(Btw. the bandwidth increase for load/store is very likely one of the major performance improvements Zen 3 received.)

I think asymmetry inside the core has little to do with AMD's choice to save on write bandwidth in the path to memory. A few Zen 2 cores can already saturate the ~25 GB/s of write bandwidth available on single-CCD chips with a DDR4-3200 setup. And dual-CCD chips, even if they show higher write numbers, do nothing about that per-CCD limit; the test is simply run on cores from both CCDs and the speeds are summed.
I am certain it is not a problem in most workloads, but on Intel there is no such asymmetry between the write and read paths to memory.

If Zen 3 retains this configuration it will be disappointing, as with more powerful cores it is easier to hit the limit in workloads where write bandwidth matters.
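If anyone wants to roughly sanity-check that write ceiling on their own box, here's a crude Python/numpy sketch (this is not how AIDA64 measures it, and regular stores also pull each cache line in first via read-for-ownership unless the fill happens to use non-temporal stores, so treat the result as ballpark only):

```python
# Crude sustained-write estimate: time filling a buffer much larger than L3.
import time
import numpy as np

buf = np.zeros(512 * 1024 * 1024 // 8)   # 512 MiB of float64

best = float("inf")
for _ in range(5):                        # first pass also pays page faults, keep the best
    t0 = time.perf_counter()
    buf[:] = 1.0                          # stream writes across the whole buffer
    best = min(best, time.perf_counter() - t0)

print(f"~{buf.nbytes / best / 1e9:.1f} GB/s write (rough)")
```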
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
Mark Papermaster interview by Ian Cutress:

https://www.anandtech.com/show/1617...ermaster?utm_source=twitter&utm_medium=social

This is a nice point that Ian also touched upon on his YouTube channel (TechTechPotato):

IC: Zen 3 is now the third major microarchitectural iteration of the Zen family, and we have seen roadmaps that talk about Zen 4, and potentially even Zen 5. Jim Keller has famously said that iterating on a design is key to getting that low-hanging fruit, but at some point you have to start from scratch on the base design. Given the timeline from Bulldozer to Zen, and now we are 3-4 years into Zen and the third generation. Can you discuss how AMD approaches these next iterations of Zen while also thinking about the next big ground-up redesign?


MP:
Zen 3 is in fact that redesign. It is part of the Zen family, so we didn’t change, I’ll call it, the implementation approach at 100000 feet. If you were flying over the landscape you can say we’re still in the same territory, but as you drop down as you look at the implementation and literally across all of our execution units, Zen 3 is not a derivative design. Zen 3 is redesigned to deliver maximum performance gain while staying in the same semiconductor node as its predecessor.

So Zen 3 is the ground-up redesign Jim Keller mentioned. That should mean there are again low-hanging fruit to pick for Zen 4.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Mark Papermaster interview by Ian Cutress:

https://www.anandtech.com/show/1617...ermaster?utm_source=twitter&utm_medium=social

This is a nice point that Ian also touched upon on his YouTube channel (TechTechPotato):
So Zen 3 is the ground-up redesign Jim Keller mentioned. That should mean there are again low-hanging fruit to pick for Zen 4.
To be honest, the interview did not reveal much that we can't glean from the announcement video itself.
I think we are expecting WikiChip level of analysis, complete with a programming manual. :)
I have been checking the PPR for Family 19h every day, but it's still not there yet. It seems like they only added the changes necessary for them to upstream stuff, because their PRs usually link back to the sections of the manuals which are consulted by the maintainers (Borislav Petkov, Roedel, et al.) for approving the changes.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Mark Papermaster interview by Ian Cutress:

https://www.anandtech.com/show/1617...ermaster?utm_source=twitter&utm_medium=social

This is a nice point that Ian also touched upon on his YouTube channel (TechTechPotato):



So Zen 3 is the ground-up redesign Jim Keller mentioned. That should mean there are again low-hanging fruit to pick for Zen 4.
My favorite points:

1) "It is in fact the core is in the same 7nm node, meaning that the process design kit [the PDK] is the same. So if you look at the transistors, they have the same design guidelines from the fab."
2) "as you look at the implementation and literally across all of our execution units, Zen 3 is not a derivative design"
3) "It was tremendous engineering on the reorganization on the Zen 3 core that truly delivers the benefit in reduced latency"
4) "when you add the amount of logic changes that we did to achieve that 19% IPC, normally of course the power envelope would go up. [...] So I think your readers would have naturally assumed therefore we went up significantly in power but the team did a phenomenal job of managing not just the new core complex but across every aspect of implementation and kept Zen 3 in the power envelope that we had been in Zen 2."
5) "The load/store enhancements were extensive, and it is highly impactful in its role it plays in delivering the 19% IPC. It’s really about the throughput that we can bring into our execution units. So when we widen our execution units and we widen the issue rate into our execution units it is one of the key levers that we can bring to bear. So what you’ll see as we roll out details that we have increased our throughput on both loads per cycle and stores per cycle, and again we’ll be having more details coming shortly."

- They achieved a 19% IPC gain and a 24% performance-per-watt gain without any significant process change (a quick back-of-envelope on how those two numbers fit together is below).
- He states that it was logic changes that achieved the 19% IPC uplift, and he goes on to detail wider execution units, load/store changes, and so on. When he describes the 19% IPC uplift and where it came from, he does not credit the larger cache. I think the clear message is that this is a major core redesign that happens to also benefit from a larger L3$.
- This is also very interesting to me, especially looking forward to AM5 and 5nm: work out the kinks on a new design now rather than taking that on PLUS a new socket and process.
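A quick back-of-envelope on those two headline numbers (assuming the comparison is at roughly the same package power, which is what the flat power envelope comment implies; purely illustrative arithmetic):

```python
# If the power envelope is flat, the perf/W gain equals the perf gain, and
# perf factors into IPC * frequency, so the two quoted numbers imply only a
# small effective frequency bump at iso-power.
ipc_gain = 1.19
perf_per_watt_gain = 1.24

implied_freq_gain = perf_per_watt_gain / ipc_gain
print(f"Implied frequency uplift at iso-power: ~{(implied_freq_gain - 1) * 100:.1f}%")  # ~4.2%
```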


Future-looking:

1) "if you look to the future we drive improvements in every generation. So you will see AMD transition to PCIe Gen 5 and that whole ecosystem. You should expect to hear from us in our next round of generational improvements across both the next-gen core that is in design as well as that next-gen IO and memory controller complex."

- I assume that by "next round of generational improvements" he means Zen 4, meaning it will also feature a next-gen IO and memory controller complex as well as a next-gen core. While he mentions PCIe 5.0, this sounds more like "we'll do it eventually" than a timeframe for it.


Slightly off-topic from Zen3 in particular, but still interesting:

1) "It’s not about ISA (instruction set architecture) - in any ISA once you set your sight on high performance you’re going to be adding transistors to be able to achieve that performance. There are some differences between one ISA and another, but that’s not fundamental - we chose x86 for our designs because of the vast software install base, the vast toolchain out there, and so it is x86 that we chose to optimize for performance. That gives us the fastest path to adoption in the industry."

- There is an extremely high level of pragmatism at play in AMD's decision-making here; it's refreshing to see some realism in addition to the massive advancements they're making.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
IC: With the (massive) raw performance increases in Zen 3, there hasn’t been much talk on how AMD is approaching CPU-based AI acceleration. Is it a case of simply having all these cores and the strong floating point performance, or is there scope for on-die acceleration or optimized instructions?

MP:
Our focus on Zen 3 has been raw performance - Zen 2 had a number of areas of leadership performance and our goal in transitioning to Zen 3 was to have absolute performance leadership. That's where we focused on this design - that does include floating point, and so with the improvements that we made to the FP and our multiply-accumulate units, it's going to help vector workloads and AI workloads such as inferencing (which often run on the CPU). So we're going to address a broad swath of workloads. Also we've increased frequency, and with our max boost frequency it's a tide that raises all boats. We're not announcing a new math format at this time.

Does anyone think that more or less confirms the rumor from nearly a year ago that AMD added a third AVX2 and FMA unit to each core in Zen 3?
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Does anyone think that more or less confirms the rumor from nearly a year ago that AMD added a third AVX2 and FMA unit to each core in Zen 3?
Discussed here.
Papermaster was not willing to share anything though, so we have to wait till next month to confirm it.

+3.3% Execution Engine
  • "most likely an additional int unit and fp unit, taking it to 5x INT, 3x AGU, 3x FP" via #3


Zen2 Execution backend/Engine, 4x INT, 3xAGU, 2xFP(Mul + Add each)
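To put the rumor in concrete terms, here's what it would mean for theoretical per-core throughput (a sketch only: the 2x FMA figure is Zen 2's known configuration, the 3x FMA one is just the rumor above):

```python
# Theoretical per-core FLOPs/cycle from 256-bit (AVX2) FMA pipes; one FMA = 2 FLOPs per lane.
def peak_flops_per_cycle(fma_units, lanes_per_unit):
    return fma_units * lanes_per_unit * 2

for label, fma in (("Zen 2 (2x FMA pipes, known)", 2), ("Zen 3 (3x FMA pipes, rumor)", 3)):
    fp32 = peak_flops_per_cycle(fma, 8)   # 8 FP32 lanes per 256-bit unit
    fp64 = peak_flops_per_cycle(fma, 4)   # 4 FP64 lanes per 256-bit unit
    print(f"{label}: {fp32} FP32 / {fp64} FP64 FLOPs per cycle")
```

If the third pipe is real that's a 50% jump in peak throughput, though sustained gains would depend on the front end and load/store being able to feed it.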
 

Panino Manino

Senior member
Jan 28, 2017
813
1,010
136
To be honest, the interview did not reveal much that we can't glean from the announcement video itself.
I think we are expecting WikiChip level of analysis, complete with a programming manual. :)
I have been checking the PPR for Family 19h every day, but it's still not there yet. It seems like they only added the changes necessary for them to upstream stuff, because their PRs usually link back to the sections of the manuals which are consulted by the maintainers (Borislav Petkov, Roedel, et al.) for approving the changes.

You know what? This signals a huge and important change for AMD.
They used to give a lot of details about the architecture, and only closer to launch would they give real information about performance. Now the reverse has happened: they just stated the performance and gave almost no detailed information about the architecture. Yes, they talked a bit about the changes, to convey that Zen 3 changed a lot, but what I want to say is that they can now just talk about the performance and no one questions anymore whether it's true. How they achieved such performance becomes an academic curiosity; they have established Zen's reputation so thoroughly that they no longer need to prove it.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,075
136
So architectural width is judged by how many INT units there are, right? So Zen 2 is 4-issue wide, and if Zen 3 gets another INT unit it will be 5-issue, just like Willow Cove?


Zen 2 can issue 6 uops a cycle to a combination of the ALU/AGU/FPU schedulers

Zen2 has
4 ALU ports
3 AGU ports
- 1 data store
- 2 data load
4 FPU ports


Zen2 can retire 8 uops a cycle

So if the ALU and FPU counts increase I would expect 8 uops a cycle to be issued; not sure whether retirement stays at 8 or goes up to 10.
AGU is more interesting: will there be another AGU, or just additional load and store paths?
If the number of AGUs increases, is it still a unified scheduler?

I wonder how big the PRF/ROB will be.


edit:

for reference,

Willow Cove:
issues 6 uops from the frontend
4 shared ALU/FPU ports
4 AGU ports
- 2 data store
- 2 data load
Sunny Cove added a very big ROB/PRF; they are much bigger than Zen 2's

So Zen 2 is already just as wide overall as Sunny Cove. Sunny Cove's OoO window will be quite a bit bigger than Zen 2's. Sunny Cove's 2nd store port is limited in when it can be used by a single thread (it probably helps a lot in SMT), so it will be interesting to see what Zen 3 can do.

The other thing to remember is not to forget Palm Cove; Intel didn't really tell us anything about it, so it's hard to know just how much was new for Sunny Cove vs Palm Cove. The changes look quite big coming from Skylake.
 

inf64

Diamond Member
Mar 11, 2011
3,685
3,957
136
AMD might have used the combined ALU/AGU port known previously as AGLU. We'll know soon, so many interesting details are yet to be uncovered. AMD didn't talk much about FP units except that they are also "wider" (3rd FMAC?).
It's great to see AMD on top of the IPC hill again; hopefully we will see another major (~15%) uplift with Zen 4, and it sure seems like that will happen.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
So architectural width is judged by how many INT units there are, right? So Zen 2 is 4-issue wide, and if Zen 3 gets another INT unit it will be 5-issue, just like Willow Cove?

Zen 2 is 6-issue wide; decode is 4 wide.
In the execution engine there are many more ports: 7 (4x INT / 3x AGU) + 4 (2x MUL / 2x ADD, across 2 FP units). Judging from Papermaster's presentation, this will go up.

For the front end, there are some unmentioned improvements; I don't know what they are.

But we can expect issue width to go up and probably decode as well.
In their patents they keep mentioning that decode and dispatch burn a lot of power, so without a node improvement they might need to look for other knobs.
Micro-op cache compaction and virtualization are some of the ideas they are investigating in their patents.

AGU is more interesting: will there be another AGU, or just additional load and store paths?
I think only two of the AGUs are identical.

So Zen 2 is already just as wide overall as Sunny Cove.
Sunny Cove has a wider decode too.