Zen+: Facts, My Theories, and Other Thoughts

eek2121 · Apr 22, 2018

I'm writing this thread to invite objective discussion on the Ryzen 2xxx (Zen+) series. I want to spread a few facts around, give my theory on things, and then invite anyone else (especially those in the field) to share their thoughts. If you are an Intel fanboy coming here to bash Ryzen, take the bashing elsewhere. This is an objective discussion on what I believe the 2xxx series actually is (though, disclaimer, it sheds a bit of a negative light on AMD; more on that soon).

Last year when AMD launched the Ryzen 1xxx series, it was an incredible leap forward. We got fast, affordable, powerful octa-core chips, and even a 16-core monster for under a grand. IPC was slightly weaker than Intel's best offerings, but in exchange we got more cores and better multithreading (SMT) performance. However, the chips had one major issue: Core Performance Boost (Precision Boost) was based on a fixed function table. As the load on the CPU went up, the frequency would step down in increments. This meant, for example, that if 5 cores were being utilized 100%, the entire CPU would throttle down towards the base clock. This was mentioned briefly by AnandTech, and you can actually verify this behavior with various benchmarking software. It had nothing to do with thermals. Your CPU could be at 40C and still see the same drop. This caused performance issues in games. In addition, for some as yet unknown reason, cache and memory latencies were often higher than on the Threadripper/EPYC counterparts.

For Zen+, I have a general theory that has held water thus far, so I figured I'd share it with others. Pinnacle Ridge is Summit Ridge with a software update. Whatever the differences between Threadripper chips and Ryzen chips were for Zen 1 (whether it be microcode or some other mechanism at play), the cache issues have been corrected. I expect it was a software/microcode issue. The fact that Threadripper is running the exact same die (just 'higher quality' bins) tells me that it was something minor to begin with. If you look at the technical documentation, you'll note that it even states that the latencies were supposed to be at around Threadripper levels (someone posted it somewhere, maybe thestilt?).

Okay, so what about this 12nm nonsense? AMD opted not to shrink the dies on 12nm. It is true that Zen+ uses the 12nm process, but there are simply empty gaps taking up the freed space. This spread things out, which caused less interference (source: AnandTech). Less interference means higher clocks. So why is the 2700X so much faster than last gen? Why is it even competitive with Coffee Lake? We know this as well: they rewrote the boosting algorithm. The stepping table is gone, replaced with an algorithm that looks at thermal headroom and boosts as high as possible within thermal limits. The 2700X also gets a higher TDP.
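To make the difference between the two boost schemes concrete, here is a minimal Python sketch. Every number in it (the clocks, the step table, the temperature and power limits) is made up for illustration and is not AMD's actual table or algorithm; the function names are hypothetical too. It only shows the shape of the idea: the old scheme looks at nothing but the count of loaded cores, while the new one scales with whatever headroom is left.

```python
# All values below are hypothetical placeholders, NOT AMD's real tables or limits.
BASE_MHZ = 3600
MAX_BOOST_MHZ = 4000

# Zen 1 style as described in this post: clock depends only on loaded core count.
STEP_TABLE = {1: 4000, 2: 4000, 3: 3700, 4: 3700}  # more than 4 cores -> base clock


def step_table_boost(active_cores: int) -> int:
    """Fixed-function lookup: temperature and power are ignored entirely."""
    return STEP_TABLE.get(active_cores, BASE_MHZ)


def headroom_boost(temp_c: float, power_w: float,
                   temp_limit_c: float = 85.0,
                   power_limit_w: float = 105.0) -> int:
    """Scale the boost by the tighter of the remaining thermal and power headroom."""
    thermal = max(0.0, min(1.0, (temp_limit_c - temp_c) / temp_limit_c))
    power = max(0.0, min(1.0, (power_limit_w - power_w) / power_limit_w))
    headroom = min(thermal, power)
    return round(BASE_MHZ + (MAX_BOOST_MHZ - BASE_MHZ) * headroom)


if __name__ == "__main__":
    # Five loaded cores on a cool chip: the table drops to base regardless of temperature.
    print("step table :", step_table_boost(5), "MHz")
    # The headroom model still boosts because the chip is cool and under its power limit.
    print("headroom   :", headroom_boost(temp_c=40, power_w=60), "MHz")
```

With the table, a 40C chip with five busy cores sits at base clock; with the headroom model, the same chip still gets some boost because it is nowhere near its limits.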

I also expect the boosting algorithm is entirely microcode and/or UEFI based. AMD could bring this technology to Zen 1 if they wanted to; however, I suspect they won't. We know it's not a hardware function because the boost technology is also present in Raven Ridge, which is NOT Pinnacle Ridge.

So what about the improved memory compatibility? What about improved XFR? X470 brought better boards to market: more PCB layers, better VRMs (some with heatsinks!), etc. This allows for better memory compatibility. If you look back at Threadripper vs Ryzen 1xxx, you will notice that Threadripper had better memory compatibility on many boards. This wasn't a maturity issue. It was because the boards had more layers/traces to accommodate, and due to the way quad channel is implemented on Threadripper, there wasn't such a strain on the IMC.

There is also the SMALL possibility that the IMC may have been tweaked, but I'm betting they didn't touch the chip at all. This was all a minor port to 12nm and a software upgrade. This gives them the time they need for a significantly faster next gen chip.

So where does this leave Threadripper? I'm at a bit of a loss here. Threadripper would also get the same upgrades, but it won't benefit nearly as much. They may just leave it at that. The 2950X would still be faster. However, it wouldn't get nearly as much out of the scenario as the 2700X (since the dies have less TDP to work with). It is possible they could raise TDP; the other possibility is that they could raise the core count at the top end for bragging rights. I guess we'll find out.

Note that most of the information above is theory, derived from what the media has reported and also from the fact that I can match 2700X benchmarks by setting downcore control to 4+0, removing 2 DIMMs, and making sure I'm in local mode.

One final note and a bit of a rant: only one benchmark was off in AnandTech's review that I am aware of: Rocket League. Nvidia has made some fundamental changes to their drivers (don't even make me pull out my tinfoil hat and tell you my theory on their 'optimizations'...if only AMD could get back in the graphics game), and Rocket League's publisher has released a bunch of patches. That's why I call for open source benchmarking. Review sites should also keep all current chips and retest with the latest firmware, drivers, etc. for every new CPU review. AT (to my knowledge) did not retest Ryzen 1xxx or CL when it did this review. I suspect that AT doesn't get to keep review samples, doesn't purchase their own retail copies, and therefore can fall into this trap from time to time. It took me maybe 2 hours to run nearly every benchmark in that suite that they were transparent about. If they wrote a script to run the benchmarks in sequence, they could have retested as many chips as they wanted in 2 hours each and had the full review ready before launch. Instead we got placeholders, shills attacking AT's numbers, etc. However, every review site/channel out there has its faults.
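As a rough idea of what "a script to run the benchmarks in sequence" could look like, here is a minimal Python sketch. The suite entries are placeholders that just run tiny Python one-liners so the script works anywhere; the names `run_suite` and `SUITE` are made up for this example, and a real setup would point at the actual benchmark executables and parse their scores rather than only timing them.

```python
import subprocess
import sys
import time


def run_suite(benchmarks):
    """Run each (name, argv) pair in order; return a list of (name, seconds, exit_code)."""
    results = []
    for name, argv in benchmarks:
        start = time.perf_counter()
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, time.perf_counter() - start, proc.returncode))
    return results


# Placeholder commands only; swap in the real benchmark binaries and arguments.
SUITE = [
    ("placeholder-a", [sys.executable, "-c", "sum(range(10**6))"]),
    ("placeholder-b", [sys.executable, "-c", "sorted(range(10**5), reverse=True)"]),
]

if __name__ == "__main__":
    for name, seconds, code in run_suite(SUITE):
        print(f"{name}: {seconds:.2f}s (exit {code})")
```

Logging the exit code next to the time makes it obvious when a run crashed instead of finishing, which matters if the same script is left running unattended on several chips.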

Thunder 57 · Apr 22, 2018

You say that PR is SR with a software update, then admit that the die has changed. If it was just software, one should technically be able to "update" an 1800X to run like a 2700X. No update is going to change SR's cache latency. RR is also its own die. It came out much later than SR and therefore improvements were possible.

Also, I would suspect that review sites simply don't have the time/resources to redo every benchmark whenever new hardware/software is released. Sometimes you will see retesting, like ExtremeTech did with the 1800X just recently. Even that is just a few notable benchmarks, though.

wahdangun · Apr 22, 2018

SR has pathetic latency because it was a rush job to meet a deadline, and RR incorporates the features as they were actually supposed to be.

So no, PR is not a "software update", although you can argue that it has the exact same transistor count.

And actually, if you can get the RAM to run at 3200 MT/s, the latency difference (between CCXs) is not that far off.

The biggest gain (in games) came from the cache, because it was reworked. For throughput, though, there is almost no improvement at all.

bsp2020 · Apr 22, 2018

I'm pretty sure PR is just SR on refined 14nm. They used to do this for most of their chips. (RX480 to RX580, Kaveri to Carrizo, etc.) I suspect that they had some boosting algorithm which they could not quite make work in time for first gen Ryzen launch. So, they are releasing the refined microcode/firmware along with faster binned dies as 2nd gen Ryzen.

SR was designed to have 12 clock cycle L2 cache latency (https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/13). But it seems like that requires high-speed dies, so they could only sell a small number of those dies, as Threadripper/EPYC. With improved 14nm+ transistors, they now get more dies that can run the L2 cache at higher speed. This also explains why AMD does not plan on updating EPYC: since EPYC came out a bit later than 1st gen Ryzen and uses the high-speed bin, it already has most of the improvements in Ryzen Gen2.

I agree with OP that this leaves little room for TR improvement, so I'm curious to see what AMD has planned for the TR2 refresh later this year. I also find it interesting that AMD's highest 2nd gen Ryzen is the 2700X. What does AMD plan to release as the 2800X when they seem to have pushed the clock speed so high that there is no room to OC 2700X chips to higher clocks? I'm hoping that AMD will bring a dual-die chip as the 2800X. Their embedded EPYC dual-die package is not any larger than the single-die version (https://overclock3d.net/gfx/articles/2018/02/21102434663l.jpg). Since the only TR that sells well is the 1950X, releasing a 12-core Ryzen on the AM4 platform won't hurt TR sales too much and will allow AMD to regain core count superiority. The only issue is the TDP: AMD will need to release boards that support 140W. Maybe that's why they are planning Z490 boards (https://videocardz.com/75949/rumor-roundup-amd-z490-asus-arez-intel-8-core-coffeelake)?

Topweasel · Apr 23, 2018

I think people don't understand what the 1800X was for. It existed for two major reasons. First, the 1700 actually got the better dies on the process, so the 1700X and 1800X existed as an outlet for high-leakage dies, especially while they awaited the launch of the TR platform. The second, and most important, reason was to add extra value to the 1700 and 1700X. The nearest competitor was $1000 at launch. Offering the 1800X, a $500 CPU, showed they were serious, but more importantly it made the 1700X and 1700 look like steals at their prices. That move tripled the ASP of their CPUs overnight. If it had been just the 1700X and 1700, they might not have seemed like steals, being the highest-priced CPUs, and the 1600X and 1600 would have looked more like small steps down and the better buy. Getting nearly a $500 CPU for $350 would sway a lot of users, though.

They don't need that now. They are already fighting Intel in 8-core CPUs. Prices don't skyrocket until you get over that on Intel's side, so they need to be more price competitive. They also now have a halo chip better serving that market in TR, so they can still offer these at ~$300 as deals. It also means less competition for the best dies when they launch TR2. AMD probably won't make a 2800X because the market doesn't really need one.

tamz_msc · Apr 23, 2018

The consensus last time around, with Ryzen 1000, was that the non-X versions were the better value. Now, considering how PB2 and XFR2 work, that value proposition is reversed, since there's only a paltry $30 between X and non-X. That should drive the ASP higher, which will help in closing the gap to Intel.

.vodka · Apr 23, 2018

Thunder 57 said:
You say that PR is SR with a software update, then admit that the die has changed. If it was just software, one should technically be able to "update" an 1800X to run like a 2700X. No update is going to change SR's cache latency. RR is also its own die. It came out much later than SR and therefore improvements were possible.

Also, I would suspect that review sites simply don't have the time/resources to redo every benchmark whenever new hardware/software is released. Sometimes you will see retesting, like ExtremeTech did with the 1800X just recently. Even that is just a few notable benchmarks, though.

Actually, you can change SR's cache latency on the C6H (and now C7H), with a measurable performance increase almost everywhere. There's a somewhat mysterious setting called "Performance bias" that can be configured for different software:

elmor said:
www.overclock.net/t/1624603/rog-crosshair-vi-overclocking-thread/8130#post_26001100

The performance bias options rely on setting non-default AMD settings which are disabled by AMD due to instability issues affecting some CPUs. Unfortunately it seems you have such a chip. I believe you can confirm by disabling SMT when using this option to make it work.

elmor said:
www.overclock.net/forum/11-amd-motherboards/1624603-rog-crosshair-vi-overclocking-thread-3391.html#post26772969

It's messing with a couple of hidden registers on the CPU. It might not be 100% stable on all systems, which is why we keep it as an opt-in feature.

elmor said:
www.overclock.net/forum/11-amd-motherboards/1624603-rog-crosshair-vi-overclocking-thread-3391.html#post26773089

Some systems are perfectly stable with it, while others are not. It's a simple option giving free performance, definitely worth testing.

I also remember Elmor stating that although the settings are called CBR15, CBR11.5 and AIDA+GB, these are just placeholder names that don't represent anything in particular. What the tweaks actually do isn't really clear, as they're undocumented by AMD, or so it would seem. We can see the end results, though.

This option has existed in the C6H since day one for Ryzen/AM4.

http://www.overclock.net/forum/11-a...nce-asus-prime-x370-pro-806.html#post27215417

Nope - it wasn't created for that purpose - It was someone from ASUS that told me. They were meant to be memory/cache performance options, but because they couldn't guarantee they would work reliably with all hardware setups, they decided to leave them in but name them the way they did. If they work on your machine, they can bring real benefits to whatever you do. CB15 for example improves cache performance and can really help with workloads that use the cache a lot, not just benchmarks.

testing:

http://www.overclock.net/forum/11-a...nce-asus-prime-x370-pro-807.html#post27216417

Wow... what a discovery! Thank you very much.
I've just tested each Performance Bias options, with a x265 encoding and with Cryptonight cryptomining and the results are stunning.

x265: the lower is better
Cryptonight: the higher is better

That's a healthy performance increase. Does it ring any bells? (AMD's 3% IPC increase and cache latency decrease on Zen+.) My guess is AMD finally validated this tweak on Threadripper, where it is on by default (as it uses the same B1 dies as AM4 Ryzen), and it is naturally included in later releases like RR and PR.

SR does not have RR's and PR's improved memory controller and the ~10ns latency decrease it brings, but it can still benefit from this. Buried in many pages in the C6H thread @ OCN, I remember someone saying this tweak needs more vSOC to be stable if it produces crashes.

I have a R7 1700, batch 1708SUT (segfault suspect). Some testing of my own:

CB R15 bias selected:

That's pretty strong performance for only 3.8GHz on SR

No bias selected:

That's what you'd expect for a 3.8GHz SR

Analysis:

% ST increase: 2.58%
% MT increase: 3.51%
% L1 Latency decrease: none
% L2 Latency decrease: 40.6%
% L3 Latency decrease: 12.26%
% Memory Latency decrease: 2.66%

SR's L2 latency on paper in cycles: 17
PR's L2 latency on paper in cycles: 12
17 to 12 cycles: 29.41% decrease (put the other way, 17 is 41.67% higher than 12)
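A quick sanity check on the on-paper L2 figures, since the percentage depends on which number you divide by. This is just arithmetic in Python with made-up helper names: measured against the old value (17), the drop to 12 cycles is about 29.4%; measured against the new value (12), 17 is about 41.7% higher.

```python
def pct_decrease(old: float, new: float) -> float:
    """Percentage drop going from old to new, relative to the old value."""
    return (old - new) / old * 100


def pct_increase(low: float, high: float) -> float:
    """Percentage rise going from low to high, relative to the low value."""
    return (high - low) / low * 100


print(f"17 -> 12 cycles: {pct_decrease(17, 12):.2f}% decrease")  # 29.41%
print(f"12 -> 17 cycles: {pct_increase(12, 17):.2f}% increase")  # 41.67%
```

When comparing against AMD's or a reviewer's percentages, it is worth checking which of the two conventions they used before concluding the numbers match.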

AMD's data:

Yeah, that looks somewhat similar to PR's improvements on the caches, at least on the L2 that is now running at its "supposed" latency. Someone with a 2700X and a C6H/C7H should test this option to see if there's any effect.

I could do some more testing, just tell me what you'd like me to run and I'll get to it when I can.

moinmoin · Apr 23, 2018

.vodka said:
My guess is AMD finally validated this tweak on Threadripper and is on by default (as it uses the same B1 dies as AM4 Ryzen) and is naturally included in later releases like RR and PR.

This. Imo AMD's release roadmap is best thought of as more of a rolling release, where they keep tightening their validation and microcode (these are the lowest-hanging fruit). New releases introduce new features that again require tightening. Earlier releases don't profit automatically, as some bad dies that slipped through (due to previously insufficient validation) prevent settings from becoming default across the board. Due to this, I'm sure AMD won't have any issue making Threadripper 2 a clear upgrade over the previous gen.

Note that AMD never really put much focus on Zen+, and the 12LP moniker was only introduced closer to PR's launch. Unlike Zen+, Zen 2 will be a new die design and as such be the real evolution of all the lessons learned through the different launches up to now.

CatMerc · Apr 23, 2018

AnandTech's review contains information about the latency. Basically, they needed to get the product out but didn't have time to tune the layout and firmware and validate the 12-clock latency. This was corrected with EPYC, Threadripper, and Raven Ridge. Pinnacle Ridge contains hardware tweaks that allowed them to drop the latency to 11 cycles on L2, and drop L3 latency from 39 cycles to 30 cycles.

The current boost algorithm is the one AMD intended to have from the beginning, but once again validation time reared its ugly head. Pinnacle Ridge and Raven Ridge contain the boosting algorithm that was intended from the beginning.

The memory controller is updated and is physically completely different from the one found in Summit Ridge. Raven Ridge has a very similar one.

AMD took the least-effort route with 12nm, manufacturing a design very, very similar to their 14nm chip.

PeterScott · Apr 23, 2018

Thunder 57 said:
You say that PR is SR with a software update, then admit that the die has changed. If it was just software, one should technically be able to "update" an 1800X to run like a 2700X. No update is going to change SR's cache latency. RR is also its own die. It came out much later than SR and therefore improvements were possible.

I don't think the die changed. They didn't create new masks. This is really a process/microcode update.

There was an interview with someone from GF that pretty much said this was going to happen. There are new design rules for creating smaller dies with 12LP, but only if you do a new tapeout. They pretty much said that, OTOH, you could skip the new tapeout and keep the same size features, benefiting only from the performance improvements in 12LP, and they even said AMD was doing this, IIRC.

2700x is just a process tweak that improves the performance of transistors. This let AMD update microcode for tighter cache timings and run at slightly higher clock speed.

It was nice gain while saving all the costs/testing of a new tapeout.

raghu78 · Apr 23, 2018

PeterScott said:
I don't think the die changed. They didn't create new masks. This is really a process/microcode update.

There was an interview with someone from GF that pretty much said this was going to happen. There are new design rules for creating smaller dies with 12LP, but only if you do a new tapeout. They pretty much said that, OTOH, you could skip the new tapeout and keep the same size features, benefiting only from the performance improvements in 12LP, and they even said AMD was doing this, IIRC.

2700x is just a process tweak that improves the performance of transistors. This let AMD update microcode for tighter cache timings and run at slightly higher clock speed.

It was nice gain while saving all the costs/testing of a new tapeout.

Correct. AMD used the transistor improvements from 12nm but kept the same 14LPP cell libraries, so the die size and transistor count are the same. This is primarily about reducing risk and work (validation). AMD is focusing their efforts on Zen 2.

https://www.forbes.com/sites/patric...ation-desktop-ryzen-processor/2/#313be1337427

"The company could have rearchitected 2nd Gen Ryzen with 12nm cell libraries but opted instead to use the 12nm transistors instead. Had AMD used the 12nm cell libraries, they would have opened themselves up to a lot more work and risk. I think this was a good call from a resource-payback standpoint, so AMD can focus its big resources on 7nm and Zen 2."

whm1974 · Apr 23, 2018

raghu78 said:
Correct. AMD used the transistor improvements from 12nm but kept the same 14LPP cell libraries, so the die size and transistor count are the same. This is primarily about reducing risk and work (validation). AMD is focusing their efforts on Zen 2.

https://www.forbes.com/sites/patric...ation-desktop-ryzen-processor/2/#313be1337427

"The company could have rearchitected 2nd Gen Ryzen with 12nm cell libraries but opted instead to use the 12nm transistors instead. Had AMD used the 12nm cell libraries, they would have opened themselves up to a lot more work and risk. I think this was a good call from a resource-payback standpoint, so AMD can focus its big resources on 7nm and Zen 2."

Well to be honest, AMD doesn't have the resources or staff that they did back during the Athlon/Athlon64 days.

trparky · Apr 23, 2018

whm1974 said:
Well to be honest, AMD doesn't have the resources or staff that they did back during the Athlon/Athlon64 days.

I certainly hope that this changes with Ryzen 2 being the seller that it is since it will bring more money AMD's way.

Thunder 57 · Apr 23, 2018

.vodka said:
Actually, you can change SR's cache latency on the C6H (and now C7H), with a measurable performance increase almost everywhere. There's a somewhat mysterious setting called "Performance bias" that can be configured for different software.

Well that is interesting. I learned something new today.

PeterScott said:
I don't think the die changed. They didn't create new masks. This is really a process/microcode update.

There was an interview with someone from GF that pretty much said this was going to happen. There are new design rules for creating smaller dies with 12LP, but only if you do a new tapeout. They pretty much said that, OTOH, you could skip the new tapeout and keep the same size features, benefiting only from the performance improvements in 12LP, and they even said AMD was doing this, IIRC.

2700x is just a process tweak that improves the performance of transistors. This let AMD update microcode for tighter cache timings and run at slightly higher clock speed.

It was nice gain while saving all the costs/testing of a new tapeout.

Right, I think what I meant was that the process had changed, not the die. I was just trying to say that there was more to PR than software/microcode updates.
