Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Page 407

Vattila

Senior member
Oct 22, 2004
820
1,456
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

maddie

Diamond Member
Jul 18, 2010
5,151
5,537
136
Well... as of
(1) I was talking about high temperatures in general, not a specific temperature limit value, which I assume is the result of the same trade-offs as when choosing TDP/PPT for a particular SKU.
(2) sorry, I don't quite understand what you mean by "worse than before", but assuming I may have misunderstood the point of your question, which apparently was about a specific value of 95°C, this also applies to high temperatures in general.
2)
Why is 95 C high? It's just a number. Are we biased by basing it on the past normal? Does that still apply?
 

Timmah!

Golden Member
Jul 24, 2010
1,567
920
136
We might be stuck in time regarding temps. Assuming total ignorance, why is 95 C so terrible? Is it that our past experience is preventing us from seeing a new model? 95 C in a vacuum is meaningless. Give me context, why bad?

More noise from the cooler trying to cool 95C rather than 65C. Simple as that.
 

Timmah!

Golden Member
Jul 24, 2010
1,567
920
136
You missed an important fact about Zen 4. No matter what solution you use, it will boost as high as possible and try to stay at or below 95 degrees. Less than that, you would need a Coolermaster Subzero AIO maybe.

Well then, I am a bit disappointed that a 420 AIO gives you only 5.1 GHz all-core. I would have hoped for 100~200 MHz more.
 

PJVol

Senior member
May 25, 2020
853
837
136
Why is 95 C high? It's just a number. Are we biased by basing it on the past normal? Does that still apply?
If we talk about the temperature limit, it's the same 95°C as for Zen 3 desktop parts. But I think people mostly complain about the higher working temperatures of Zen 4, which are quite evident from the raised power limit needed to meet the performance target.
So yes, it's neither high nor low, but just characteristic of the target frequency and current process node, regardless of what it was in a previous gen.
Nope, actually the opposite
I believe he meant not to maintain the said temps )
 

Saylick

Diamond Member
Sep 10, 2012
3,943
9,195
136
More noise from the cooler trying to cool 95C rather than 65C. Simple as that.
FWIW, the fan curve can be adjusted so that the fan isn't screaming at 95C. The CPU will get up to 95C one way or another, and the processor self-regulates, so even with a slower fan it won't go above ~95C.
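
As a rough illustration of what adjusting the fan curve means, here is a minimal sketch of a piecewise-linear curve that stays quiet even at the 95C limit. The curve points and the name fan_duty are made up for this example; they are not taken from any motherboard BIOS or AMD default.

```python
from bisect import bisect_right

# Hypothetical "quiet" fan curve: (CPU temperature in C, fan duty in %).
# These points are illustrative assumptions, not real firmware defaults.
QUIET_CURVE = [(40, 25), (70, 40), (90, 55), (95, 60)]


def fan_duty(temp_c, curve=QUIET_CURVE):
    """Linearly interpolate the fan duty (%) for a temperature on the curve."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])   # clamp below the first point
    if temp_c >= temps[-1]:
        return float(curve[-1][1])  # clamp above the last point
    i = bisect_right(temps, temp_c)  # index of the first point above temp_c
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


for t in (35, 65, 95):
    print(f"{t} C -> {fan_duty(t):.1f} % duty")
```

With a curve like this the fan tops out at 60% duty; whether the CPU then holds its clocks depends on the cooler, which the sketch does not model.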
 

biostud

Lifer
Feb 27, 2003
19,775
6,862
136
So if temperature often is the limit, then would lowering the voltage increase the possibility of a higher boost clock?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,114
16,027
136
OK, so I have two 7950X systems, one on a 420 AIO and one on a 250-watt air cooler (Dark Rock Pro 4). They are both running PrimeGrid with Lasso for balancing affinity, running 16 threads, which due to the affinity most likely land on physical cores rather than SMT siblings. So it's not full load, but close. They both run at 95C (stock everything), they both run at ~5 GHz and both draw about 168 watts PPT (including the SoC power at ~13 watts). I would try using a smaller air cooler, but the CPU can take 230 watts, so I would rather stay in that area to be in spec. When I get time to play more with them (we are in a competition right now), I will try a newer BIOS, and I will have 6000 CL30 memory. Right now one is running 4800 and the other 5600, CL40 or so.

The point is, using a cooler within spec of the 230-watt PPT, I see no difference in performance at the moment. And of course you remember I was running WAY under cooling capacity with the tape over the AIO, and it would only run at 3100 MHz. So AMD's goal of maintaining 95C seems fine, and the firmware protected the CPU when I had the tape over it, simulating a less than proper cooling solution. I will probably set 85C later, when I can play and see how it acts. So if you cheap out on the cooler, you lose performance, but it does not kill the CPU.

And one last update. This is only ONE application, but with AVX-512 utilized, nothing can touch these in performance, not even a 12700F with E-cores disabled using AVX-512 (the task times are for 8 threads, so a direct comparison is valid).

See this post in the DC arena https://forums.anandtech.com/threads/primegrid-challenges-2022.2600827/post-40861820

Zen 4 is absolutely a beast, and unbeatable, even by Alder Lake WITH AVX-512 still working (it's in the log files that they are using it). And I can provide links to log files for those who think I am lying. But they won't be around forever.

See this link (if it lets you) and click on the tasks for each host and then on the task number for the log file. http://www.primegrid.com/hosts_user.php?userid=849529

Here is one such log (this one took 1:18 to run):

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<stderr_txt>
BOINC PrimeGrid wrapper 2.02 (Nov 23 2020 23:35:38)
running ../../projects/www.primegrid.com/llr2_1.3.0_win64_220821.exe -v
LLR2 Program - Version 1.3.0, using Gwnum Library Version 30.9
running ../../projects/www.primegrid.com/llr2_1.3.0_win64_220821.exe -oGerbicz=1 -oProofName=proof -oProofCount=128 -oProductName=prod -oPietrzak=1 -oCachePoints=0 -pSavePoints -q93839*2^13036880-1 -d -t8 -oDiskWriteTime=1
Gerbicz check is requested, switching to PRP.
Starting probable prime test of 93839*2^13036880-1
Using AVX-512 FFT length 1200K, Pass1=640, Pass2=1920, clm=2, 8 threads, a = 3, L2 = 350*291, M = 101850
Compressed 128 points to 7 products. Time : 14.390 sec.
Testing complete.
08:05:11 (7188): called boinc_finish(0)

</stderr_txt>
]]>

And lastly, for those Alder Lake fans, here is the fastest run I could find, at 2 hours and 3 minutes running AVX-512:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
BOINC PrimeGrid wrapper 2.02 (Nov 17 2020 23:46:30)
running ../../projects/www.primegrid.com/sllr2_1.3.0_linux64_220821 -v
LLR2 Program - Version 1.3.0, using Gwnum Library Version 30.9
running ../../projects/www.primegrid.com/sllr2_1.3.0_linux64_220821 -oGerbicz=1 -oProofName=proof -oProofCount=128 -oProductName=prod -oPietrzak=1 -oCachePoints=0 -pSavePoints -q107347*2^13497741-1 -d -t8 -oDiskWriteTime=1
Gerbicz check is requested, switching to PRP.
Starting probable prime test of 107347*2^13497741-1
Using AVX-512 FFT length 1280K, Pass1=128, Pass2=10K, clm=2, 8 threads, a = 3, L2 = 370*285, M = 105450
Compressed 128 points to 7 products. Time : 21.484 sec.
Testing complete.
19:20:37 (218226): called boinc_finish(0)

</stderr_txt>
]]>
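
For anyone who wants to check the two logs themselves, here is a small sketch that pulls the instruction set, FFT length and thread count out of the "Using ... FFT length" line and compares the two quoted run times. The helper name parse_fft_line is made up, and reading "1:18" as 1 hour 18 minutes is an assumption. Note that the two tasks test different candidates with different FFT lengths (1200K vs 1280K), so the ratio is only a rough comparison.

```python
import re

# FFT banner lines copied from the two task logs quoted above.
ZEN4_LINE = ("Using AVX-512 FFT length 1200K, Pass1=640, Pass2=1920, clm=2, "
             "8 threads, a = 3, L2 = 350*291, M = 101850")
ALDER_LINE = ("Using AVX-512 FFT length 1280K, Pass1=128, Pass2=10K, clm=2, "
              "8 threads, a = 3, L2 = 370*285, M = 105450")

FFT_RE = re.compile(
    r"Using (?P<isa>\S+) FFT length (?P<fft>\d+)K,.*?(?P<threads>\d+) threads"
)


def parse_fft_line(line):
    """Extract instruction set, FFT length (in K) and thread count."""
    m = FFT_RE.search(line)
    if m is None:
        raise ValueError("not an FFT banner line")
    return {"isa": m["isa"], "fft_k": int(m["fft"]), "threads": int(m["threads"])}


# Run times as quoted in the post; "1:18" is ASSUMED to mean 1 h 18 min.
zen4_minutes = 1 * 60 + 18
alder_minutes = 2 * 60 + 3

print(parse_fft_line(ZEN4_LINE))
print(parse_fft_line(ALDER_LINE))
print(f"Alder Lake / Zen 4 run time ratio: {alder_minutes / zen4_minutes:.2f}")
```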

@Exist50

NOTE: I did say this is one application. For a host of others see @Det0x's posts
 

PJVol

Senior member
May 25, 2020
853
837
136
So if temperature often is the limit, then would lowering the voltage increase the possibility of a higher boost clock?
First, there's no way to directly control voltage in Zen when CPB (aka Precision Boost) is enabled other than manually adjusting PSM count margins (aka Curve Optimizer) used by AVFS modules.
And there are other limiters that can affect the final frequency, but regardless of which one is reached, lowering the voltage in this way almost always results in a frequency bump, unless the global frequency ceiling is hit.
 

Kocicak

Golden Member
Jan 17, 2019
1,177
1,232
136
Well, 95°C is actually 368 K and 65°C is 338 K. 95°C is thus a 9 percent higher temperature than 65°C. All unwanted processes happening in the silicon, which are accelerated at higher temperatures, will happen 9% quicker at that higher temperature. So the 10-year life of the CPU may shorten to 9 years, for example.

The difference is there, it is real, it is bad, but the question is whether it matters in practice.
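
The 9 percent figure can be checked with a couple of lines. This sketch only verifies the ratio of the two absolute temperatures; it says nothing about whether wear actually scales linearly with it. The helper name c_to_k is made up for the example.

```python
def c_to_k(temp_c):
    """Convert degrees Celsius to kelvin."""
    return temp_c + 273.15


ratio = c_to_k(95) / c_to_k(65)
print(f"95 C is {100 * (ratio - 1):.1f} % higher than 65 C on the absolute scale")
```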
 

eek2121

Diamond Member
Aug 2, 2005
3,387
5,014
136
Well, 95°C is actually 368 K and 65°C is 338 K. 95°C is thus a 9 percent higher temperature than 65°C. All unwanted processes happening in the silicon, which are accelerated at higher temperatures, will happen 9% quicker at that higher temperature. So the 10-year life of the CPU may shorten to 9 years, for example.

The difference is there, it is real, it is bad, but the question is whether it matters in practice.

Sure, if you are running it with a full load 24/7/365, but realistically it isn’t going to matter for most users. I think the complaints and ridicule regarding temps are a bit ridiculous, especially when you can change the target temp if you want.
 

maddie

Diamond Member
Jul 18, 2010
5,151
5,537
136
Well, 95°C is actually 368 K and 65°C is 338 K. 95°C is thus a 9 percent higher temperature than 65°C. All unwanted processes happening in the silicon, which are accelerated at higher temperatures, will happen 9% quicker at that higher temperature. So the 10-year life of the CPU may shorten to 9 years, for example.

The difference is there, it is real, it is bad, but the question is whether it matters in practice.
You forgot to add, "for the identical CPU". Plus, I don't think it's a 1:1 linear correlation but a polynomial function. There are also nodes that allow for this. Some high-heat (auto and aviation) engine circuits, plus space and military ones, are temperature-hardened.
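
To put a number on "not 1:1", here is a sketch using the Arrhenius equation, a model commonly used in reliability work for temperature-accelerated failure. Nobody in the thread cites it, so treat the model choice as an assumption; the activation energy of 0.7 eV is also only an illustrative value, since real values depend on the failure mechanism and process.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_low_c, t_high_c, activation_energy_ev):
    """Factor by which a thermally activated process speeds up between
    t_low_c and t_high_c (both in Celsius) under the Arrhenius model."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_low_k - 1.0 / t_high_k
    )
    return math.exp(exponent)


# ASSUMPTION: 0.7 eV is an illustrative activation energy, not a measured one.
factor = arrhenius_acceleration(65, 95, activation_energy_ev=0.7)
print(f"Acceleration factor 65 C -> 95 C: {factor:.1f}x")
```

Under these assumptions the speed-up comes out at several times rather than 9 percent, which is why the choice of model matters more than the raw kelvin ratio. It still says nothing about whether the resulting lifetime is shorter than the time anyone keeps the CPU.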
 

Timmah!

Golden Member
Jul 24, 2010
1,567
920
136
Nope, actually the opposite. For a given heat load (joules), it's easier or quieter for a given fan/cooler if the delta T is greater. I think there's a largely general misunderstanding of thermodynamics.

Uh, really? Then consider me one of those generally misunderstanding how this works and in need of an explanation.
Because from my layman's view and empirical observation, fan speed generally tends to increase as the temperature of the chip it is supposed to cool rises.
 

Abwx

Lifer
Apr 2, 2011
11,855
4,832
136
You forgot to add, "for the identical CPU". Plus, I don't think it's a 1:1 linear correlation but a polynomial function. There are also nodes that allow for this. Some high-heat (auto and aviation) engine circuits, plus space and military ones, are temperature-hardened.

That's behaviour similar to mechanical constraints: as long as you're within a given region you stay within elastic deformation, meaning that once the exerted forces stop the object returns to its initial shape, but above a given force the deformation becomes highly non-linear and the item will break. So it goes with a CPU: below a given temperature it will last forever, but above a critical temperature it will break in the short term.

Now, for silicon the temperature limit is in the 150-200°C region depending on the device. Macro transistors can endure up to 200°C; complex circuitry will endure the same locally, but often part of the area will reach the critical temperature while the whole device still measures 150°C, which implies lower maximum temperatures.
 

maddie

Diamond Member
Jul 18, 2010
5,151
5,537
136
That's behaviour similar to mechanical constraints: as long as you're within a given region you stay within elastic deformation, meaning that once the exerted forces stop the object returns to its initial shape, but above a given force the deformation becomes highly non-linear and the item will break. So it goes with a CPU: below a given temperature it will last forever, but above a critical temperature it will break in the short term.

Now, for silicon the temperature limit is in the 150-200°C region depending on the device. Macro transistors can endure up to 200°C; complex circuitry will endure the same locally, but often part of the area will reach the critical temperature while the whole device still measures 150°C, which implies lower maximum temperatures.
Fair enough, but.
Linear does not have to mean 1:1, which is what was said. A 9% increase in temp can = 9/X reduction in life, still linear but not 1:1.
 

maddie

Diamond Member
Jul 18, 2010
5,151
5,537
136
Uh, really? Then consider me one of those generally misunderstanding how this works and in need of an explanation.
Because from my layman's view and empirical observation, fan speed generally tends to increase as the temperature of the chip it is supposed to cool rises.
All other things being equal, the higher the temp difference, the easier it is to dissipate energy. The entire cooling system is ruled by W/(m·K), or watts per meter per kelvin. So, for a given wattage dissipated, you can decrease the thickness of the CPU cover, increase the conductivity of the interface (solder), or increase the delta temp.

What you're thinking, probably, is that if I set a temp goal, then to maintain it I need a bigger cooler or have to run my existing fans faster, which is true. What I meant is that if you increase your temps, you can dissipate the same amount of power with a smaller or quieter cooler if you want. Of course one can always go too small and end up loud and noisy, but a given cooler will run slower and quieter if you raise the operating temperature of the device needing cooling. That is one reason data centers run their CPUs at fairly high temps 24/7: the cooling costs are lower.
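
A quick way to see this is a lumped thermal-resistance model, where power = delta T / R. This is a simplification swapped in for the W/(m·K) conduction picture above, not a replacement for it. The 170 W figure is loosely based on the ~168 W PPT reported earlier in the thread, and the 25 C ambient is an assumption.

```python
def required_thermal_resistance(power_w, die_temp_c, ambient_c=25.0):
    """Maximum die-to-ambient thermal resistance (K/W) that still moves
    power_w watts while holding the die at die_temp_c.
    Lumped model: power = (die_temp - ambient) / resistance."""
    delta_t = die_temp_c - ambient_c
    if delta_t <= 0:
        raise ValueError("die temperature must be above ambient")
    return delta_t / power_w


# ASSUMPTIONS: 170 W heat load and 25 C ambient, chosen for illustration only.
r_at_95 = required_thermal_resistance(170, 95)
r_at_65 = required_thermal_resistance(170, 65)
print(f"Allowed resistance at 95 C: {r_at_95:.3f} K/W")
print(f"Allowed resistance at 65 C: {r_at_65:.3f} K/W")
```

The higher allowed resistance at 95 C is what lets a smaller cooler or a slower fan move the same wattage; holding 65 C at the same power needs a cooling path with much lower resistance.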
 

Timmah!

Golden Member
Jul 24, 2010
1,567
920
136
All other things being equal, the higher the temp difference, the easier it is to dissipate energy. The entire cooling system is ruled by W/(m·K), or watts per meter per kelvin. So, for a given wattage dissipated, you can decrease the thickness of the CPU cover, increase the conductivity of the interface (solder), or increase the delta temp.

What you're thinking, probably, is that if I set a temp goal, then to maintain it I need a bigger cooler or have to run my existing fans faster, which is true. What I meant is that if you increase your temps, you can dissipate the same amount of power with a smaller or quieter cooler if you want. Of course one can always go too small and end up loud and noisy, but a given cooler will run slower and quieter if you raise the operating temperature of the device needing cooling. That is one reason data centers run their CPUs at fairly high temps 24/7: the cooling costs are lower.

I see, thank you.

Anyway, if the thickness of the IHS is compounding the high-temp "issue" (or non-issue), as reported, then I think AMD shot themselves in the foot with their decision to keep compatibility with AM4 coolers. I mean, one already has to buy a new CPU, an expensive new mobo and expensive new RAM, so why keeping your old cooler, the cheapest thing of the lot, was so important, I don't understand. Additionally, aren't coolers compatible across various Intel and AMD sockets (outside of the big TR/Intel 4189/4677 socket ones)? AFAIK my Skylake-X CPU was bigger than AM4 CPUs, but my AIO was compatible with both. So what kept them from increasing the size from AM4 to, say, LGA2066 size?
 

Abwx

Lifer
Apr 2, 2011
11,855
4,832
136
Fair enough, but.
Linear does not have to mean 1:1, which is what was said. A 9% increase in temp can = 9/X reduction in life, still linear but not 1:1.

I'm aware that linear doesn't mean 1:1 but rather distributivity and homogeneity of the considered function...

Here that's more akin to an exponential with a variable exponent, such that its rate of change is negligible below a given temperature and abruptly increases once that level is reached.

For instance, the VRMs on past Nvidia graphics cards could sit at 127°C permanently and the card would still work for years without any trouble. They could have worked at, say, 145°C the same way, but if they ever got close to 175°C the lifetime would have been reduced to a few months at best.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,114
16,027
136
I see, thank you.

Anyway, if the thickness of the IHS is compounding the high-temp "issue" (or non-issue), as reported, then I think AMD shot themselves in the foot with their decision to keep compatibility with AM4 coolers. I mean, one already has to buy a new CPU, an expensive new mobo and expensive new RAM, so why keeping your old cooler, the cheapest thing of the lot, was so important, I don't understand. Additionally, aren't coolers compatible across various Intel and AMD sockets (outside of the big TR/Intel 4189/4677 socket ones)? AFAIK my Skylake-X CPU was bigger than AM4 CPUs, but my AIO was compatible with both. So what kept them from increasing the size from AM4 to, say, LGA2066 size?
Keeping your old cooler was not the reason. It was so all the cooler companies could just say all of their AM4 coolers are AM5 compatible! The amount of work (and cost) it saves all of those companies will end up saving the user a lot of grief (and possibly money)!
 

Doug S

Diamond Member
Feb 8, 2020
3,370
5,923
136
Maybe that's the issue. We conflate heat and temperature.

You can have an extremely hot object emitting small amounts of heat, as in joules.
You can also have a slightly hot object emitting huge amounts of heat.
And all possible combinations in between.

Temperature, by itself, cannot indicate heat output.


While that's true, you can conflate power draw and heat output. If your CPU is drawing 100 watts, you know it is emitting 100 watts of heat into the ambient environment around your PC.

I agree with those who say 95C isn't a problem. If the CPU is rated for continuous operation at that temperature without compromising its life (at least not beyond the time you plan to use it) then who cares? What matters is the PC's total power draw, because that's what impacts your power bill and may heat up the room in which the PC resides to an uncomfortable level.
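
To make the power-bill point concrete, here is a small sketch of the arithmetic. The 168 W figure is the CPU package power reported earlier in the thread, not a whole-PC measurement, and the price of 0.30 per kWh and the 4 hours per day are made-up assumptions.

```python
def energy_kwh(power_w, hours):
    """Energy used, in kWh, at a constant power draw."""
    return power_w * hours / 1000.0


def energy_cost(power_w, hours, price_per_kwh):
    """Cost of running at power_w watts for the given number of hours."""
    return energy_kwh(power_w, hours) * price_per_kwh


# ASSUMPTIONS: 168 W package power, 4 h of load per day for a year,
# and a hypothetical price of 0.30 per kWh.
hours_per_year = 4 * 365
print(f"{energy_kwh(168, hours_per_year):.0f} kWh per year")
print(f"{energy_cost(168, hours_per_year, 0.30):.2f} per year at 0.30/kWh")
```

Nothing in this calculation depends on die temperature, only on watts and hours, which is the point being made above.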

What if CPUs were made with a different material that allowed them to operate at 300C, would people complain about that? If it had the same lifetime, and the same overall power draw, why should you care if a small spot on the die was 300C or 95C or only 35C?
 