Solved! How should we test the stability of CPUs which are able to boost past their all-core frequency? (With particular focus on AMD's 3950X)

Pro-competition

Junior Member
Dec 13, 2019
Warning: This is a very involved read that will probably take around 15 minutes.

1. Question/issue: How should we test the stability of CPUs which are able to boost past their all-core frequency? In other words, how should we test the stability of CPUs which are able to operate at a higher frequency than their “base frequency” when only a few of their cores are loaded?

Why this is more complicated than it might first appear
2. Take my AMD 3950X as an example. It can run at up to ~4.2GHz when all cores are loaded, but is supposedly able to run 1 core at up to 4.7GHz. The implication is that if I were just to run 32 threads of Prime95, I would only really be testing whether my 3950X is Prime95 stable at or below ~4.2GHz (and not whether it is stable running 1 core at 4.7GHz, or at anything above ~4.2GHz for that matter). The corollary, leaving aside the complication discussed in paragraph 5 of this post, is that if I wanted to check whether the 3950X is Prime95 stable, I would have to run 1 thread of Prime95, then 2 threads of Prime95, and so on up to 32 threads of Prime95, so that I can be sure my 3950X is Prime95 stable regardless of how many cores are active.

3. A further complication is that maximum heat output does not necessarily occur when all cores are loaded. For example, the 3950X that Anandtech tested consumed the most power when 10 of its 16 cores were loaded. (See Anandtech’s 3950X review, page 2.) The implication is that a 3950X may very well be stable when all 16 cores are loaded but unstable when 10 cores are loaded, because the temperature of the 3950X is higher when 10 cores are loaded.

4. Tentative conclusion #1 in light of the issues I raised in paragraphs 2 and 3 of this post: One characteristic of an ideal stress test is that it is able to dynamically adjust the number of active threads as the stress test progresses.
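As a rough illustration of paragraphs 2 and 4, here is a minimal Python sketch of a stress test that sweeps the number of active workers from 1 up to the number of logical CPUs and back down again. Everything in it is hypothetical: the function names, the 60-second step length, and the sum-of-squares "self-checking" workload, which is far lighter than Prime95's FFTs. Also note that the OS scheduler, not the script, decides which cores the workers land on.

```python
import multiprocessing as mp
import time


def checked_work(n: int = 200_000) -> bool:
    # Hypothetical self-checking workload: sum i*i the slow way and compare it
    # with the closed form (n-1)n(2n-1)/6. A mismatch means a wrong answer.
    total = 0
    for i in range(n):
        total += i * i
    return total == (n - 1) * n * (2 * n - 1) // 6


def sweep_schedule(max_workers: int, repeats: int = 1) -> list[int]:
    # 1, 2, ..., max_workers, then back down to 1, so that light loads (highest
    # boost clocks), heavy loads (power/current limits binding) and every count
    # in between all get exercised.
    up = list(range(1, max_workers + 1))
    down = list(range(max_workers - 1, 0, -1))
    return (up + down) * repeats


def _worker(seconds: float, errors) -> None:
    # Run checked work until this worker's time slice ends; count any failures.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if not checked_work():
            with errors.get_lock():
                errors.value += 1


def run_step(active: int, seconds: float) -> int:
    # Load `active` logical CPUs at once and return the number of wrong answers.
    errors = mp.Value("i", 0)
    procs = [mp.Process(target=_worker, args=(seconds, errors)) for _ in range(active)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return errors.value


def main(seconds_per_step: float = 60.0) -> None:
    for active in sweep_schedule(mp.cpu_count()):
        print(f"{active:2d} workers: {run_step(active, seconds_per_step)} errors")
```

To use it, call `main()` from an `if __name__ == "__main__":` guard (required on Windows, where `multiprocessing` starts fresh interpreters). Separate processes are used rather than threads because Python threads would not keep several cores busy at once under the GIL.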

5. In addition, Ryzen 3rd generation CPUs (i.e. Zen 2 architecture using TSMC’s 7nm manufacturing process) are only able to reach (close to) their advertised single-core max boost speeds for extremely brief periods of time. (See Anandtech’s 3950X review (at page 2), where it was stated that peak single core frequency of 4650 MHz on the Ryzen High Performance (RHP) power plan was “very instantaneous, as when we put a consistent single thread load on the core, the [frequency] very quickly came down”. Also see this Anandtech article (at page 7) where it was stated that “Ultimately, by opting for a more aggressive binning strategy so close to silicon limits, AMD has reached a point where, depending on the workload and the environment, a desktop CPU might only sustain a top Turbo bins momentarily”.)

This behaviour is unlike modern Intel CPUs which, given sufficient cooling and a sufficiently high Power Limit 2 value, are able to boost to their maximum single-core boost frequencies until the Power Limit 2 (PL2) duration – aka Turbo Time Parameter (Tau or τ) – is reached. (See Anandtech’s 2019 interview with Guy Therien, this 2019 article, and this 2018 article.)

The implication is that since there’s no way of sustaining the maximum frequencies achieved by the Ryzen 3rd generation CPUs for any meaningful duration, there is no way of testing whether such a CPU is stable at the highest frequencies which it is able to achieve for only brief periods of time.

6. Tentative conclusion #2 in light of the issue I raised in paragraph 5 of this post: Another characteristic of an ideal stress test is that it is able to generate bursts of intense workloads interspersed with zero loads, in order to coax the CPU into operating at its highest frequencies.
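A minimal sketch of the bursty load described in paragraph 6, again in Python and again hypothetical: a single thread alternates short bursts of self-checking work with idle gaps. The burst and gap lengths are parameters to experiment with, not known-good values, and whether the firmware actually reaches its top boost bins during the bursts is not something the script can guarantee.

```python
import time
from typing import Callable, Tuple


def checked_work(n: int = 50_000) -> bool:
    # Hypothetical self-check: a sum of squares compared against its closed form.
    return sum(i * i for i in range(n)) == (n - 1) * n * (2 * n - 1) // 6


def burst_load(work: Callable[[], bool], busy_s: float, idle_s: float,
               cycles: int) -> Tuple[int, int]:
    """Alternate `busy_s` seconds of back-to-back work with `idle_s` seconds of sleep.

    Returns (work units completed, work units that failed their self-check).
    """
    done = failed = 0
    for _ in range(cycles):
        end = time.perf_counter() + busy_s
        while time.perf_counter() < end:   # burst: keep one thread fully busy
            done += 1
            if not work():
                failed += 1
        time.sleep(idle_s)                 # gap: let the core go idle before the next burst
    return done, failed
```

For example, `burst_load(checked_work, busy_s=0.05, idle_s=0.5, cycles=1000)`. On Linux, `os.sched_setaffinity(0, {core_id})` can pin the process to a single core (for example, one of the cores Ryzen Master flags as fastest) so that every burst lands on the same core. Sweeping `busy_s` from a few milliseconds up to several seconds would cover both the "instantaneous" peaks and the lower sustained single-core clocks.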

Different instruction sets

7. An ideal stress test would also test every possible type of instruction a CPU supports (and every combination thereof).

8. Prime95 presumably doesn’t do this, which is why I chose my words very carefully and said “Prime95 stable” in my posts rather than merely “stable”. Ex hypothesi, this also means the Prime95 algorithm shouldn’t be placed on a pedestal as the gold standard of stability tests, but treated as merely one of several stability tests to perform.

Prime95 oddity

9. Prime95-specific observation: I noticed that Prime95 version 28.9 causes my 3950X to produce varying amounts of heat, at least when 32 threads are running. To elaborate:

(a) The 3950X would hum along at ~70C most of the time, then occasionally hit ~90C before going back to ~70C. The cycle then repeats.
(b) Also, the 3950X would operate anywhere between 3.3GHz and 4.2GHz when all cores are loaded, but mostly between 3.8GHz and 4GHz. This is probably because (1) the intensity of the Prime95 workload varies over time and (2) the 3950X is being forced to operate within its specified power and current limits, viz.:
(i) Package Power Tracking (PPT), the maximum power that may be delivered to the socket;
(ii) Thermal Design Current (TDC), the maximum current the motherboard’s voltage regulators may deliver under thermally constrained scenarios; and
(iii) Electrical Design Current (EDC), the maximum current the motherboard’s voltage regulators may deliver over any short, instantaneous period.
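The three limits above can be compared directly against live readings. A hypothetical Python sketch: given a package power reading and the two current readings, it reports how much of each limit is in use and which one is closest to binding. The default limit values are an assumption (the stock values commonly reported for 105 W TDP Ryzen 3000 parts), and AMD's actual boost algorithm also weighs temperature and other inputs that this ignores.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Limits:
    # ASSUMED defaults: stock values commonly reported for 105 W TDP Ryzen 3000
    # parts such as the 3950X. Check what your own BIOS actually sets.
    ppt_w: float = 142.0   # Package Power Tracking, watts
    tdc_a: float = 95.0    # Thermal Design Current, amps
    edc_a: float = 140.0   # Electrical Design Current, amps


def limit_utilization(power_w: float, tdc_current_a: float, edc_current_a: float,
                      limits: Optional[Limits] = None) -> Dict[str, float]:
    """Percent of each limit in use; ~100% means that limit is capping clocks."""
    lim = limits if limits is not None else Limits()
    return {
        "PPT": 100.0 * power_w / lim.ppt_w,
        "TDC": 100.0 * tdc_current_a / lim.tdc_a,
        "EDC": 100.0 * edc_current_a / lim.edc_a,
    }


def binding_limit(util: Dict[str, float]) -> str:
    # The limit closest to (or over) 100% is the most likely reason clocks stop rising.
    return max(util, key=util.get)
```

If one of the three sits near 100% whenever clocks drop toward 3.3GHz, that limit (rather than the cooler) is a likely cause of the swings described in (b).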

Those of you with a 3950X or any other Ryzen 3rd gen CPU: do you notice similar behaviour when running Prime95?

[Screenshot: HWInfo and Ryzen Master after running Prime95 Blend (32 threads), 24 Dec 2019, showing ~4GHz, 90C and 97% power]
10. I’ll include the relevant specifications/configuration of my system for reference:
  • Motherboard: MSI x570 Unify
  • Motherboard BIOS: 7C35vA2 (released 2019-11-07), and most likely includes the AMD ComboPI1.0.0.4 Patch B (SMU v46.54)
  • AMD Chipset Driver version: 1.11.22.454 (released 11/25/2019), which inter alia includes AMD Ryzen Power Plan v5.0.0.0
  • Windows Power Plan: AMD Ryzen High Performance plan (which, unlike the Ryzen Balanced plan, retains the fast frequency ramp-up times; see Anandtech’s article on Collaborative Processor Performance Control 2 (CPPC2), but see this Anandtech article (at page 7) for a better explanation of CPPC2)
  • Windows build: 10.0.18363 (version 1909)
Air cooling is adequate for 3950X at default settings

11. As an aside, I am of the opinion that my Noctua NH-U14S is adequate for running the 3950X at stock, since it is, broadly speaking, able to keep the 3950X at around 70C when running Prime95 even when the ambient temperature is a fairly warm ~28.5C. The occasional spikes to 90C when running Prime95 will probably still occur even with the best ambient water-cooling system, since the bottleneck in heat dissipation seems to be at the interface between the die and the heat spreader, or even within the die itself. Moreover, the heatsink only feels warm to the touch (as distinct from being so hot that it is uncomfortable to touch for long), further suggesting that the heat dissipation capability of the NH-U14S is adequate for a 3950X running at stock.

12. The high temperatures observed with Ryzen 3rd Generation on ambient cooling are likely due to the 7nm node: a lot of heat is being generated by a relatively small die.

13. If you want to do significantly better than air cooling, you would have to look at cooling solutions that can bring the temperature of the heat spreader below ambient, such as phase-change systems or Peltier coolers (aka thermoelectric cooling). This is because only by lowering the temperature of the heat spreader would the rate of heat dissipation from the die to the heat spreader improve. (For the proposition that the rate of heat dissipation is proportional to the temperature difference, see Fourier’s Law.) A high-end air cooler already seems able to keep the heat spreader close to ambient temperature, so even the best ambient water-cooling is unlikely to yield any significant benefit.
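The argument in paragraph 13 can be written out with a steady-state, lumped thermal-resistance model. This is a simplification that ignores the transient spikes described in paragraph 9, and R(die→IHS) and R(IHS→amb) are assumed lumped resistances, not measured values:

```latex
% One-dimensional Fourier's law across a layer of thickness L, area A, conductivity k:
\dot{Q} = \frac{kA}{L}\,\Delta T = \frac{\Delta T}{R_{\mathrm{th}}}, \qquad R_{\mathrm{th}} = \frac{L}{kA}

% Die -> IHS -> cooler -> air in series, all carrying the package power P:
T_{\mathrm{die}} = T_{\mathrm{IHS}} + P\,R_{\mathrm{die}\to\mathrm{IHS}}, \qquad
T_{\mathrm{IHS}} = T_{\mathrm{amb}} + P\,R_{\mathrm{IHS}\to\mathrm{amb}}

% Any ambient cooler has R_{IHS->amb} >= 0, therefore:
T_{\mathrm{die}} \ge T_{\mathrm{amb}} + P\,R_{\mathrm{die}\to\mathrm{IHS}}
```

A better ambient cooler only shrinks R(IHS→amb), pushing T_IHS down toward T_amb; the P·R(die→IHS) term is untouched. Getting below that floor requires T_IHS below T_amb, i.e. sub-ambient cooling.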

Rant

14. Rant #1: HWInfo v6.20 does not report correct voltages, clock speeds, etc. when “Memory Integrity” in Windows Security is enabled.

15. Rant #2: Ryzen Master v2.1.0.1424 will not open/start if Virtualization Based Security is enabled.

Conclusion

16. To reiterate, and putting my original question differently: is there any CPU stability test which incorporates the principles I mentioned in paragraphs 4, 6 and 7 of this post? If not, I would like to invite developers or programmers out there to design one.

17. The reasons I believe it’s particularly important to test Ryzen 3rd Generation CPUs even at stock, and to independently verify their stability, are as follows:

(a) First, because of how close these CPUs operate to their maximum potential. As noted in this Anandtech write-up on AMD’s boost behaviour (at page 3), “….the CPU out of the box is already near its peak limits, and AMD’s metrics from manufacturing state that the CPU has a lifespan that AMD is happy with despite being near silicon limits…”.
(b) Second, AMD apparently increases the voltage gradually over time to compensate for the effects of electromigration (ibid). But can we trust this algorithm to compensate accurately for the effects of electromigration?
 
Solution
I understand the original post, and the questions make sense in a way, ultimately boiling down to “how do you know a processor is stable if you can’t test it in a manner that stresses it similarly to normal use, due to how the boost behavior works?”

I would answer by saying “just use it”. If the system crashes a lot, it’s not stable. If it doesn’t crash, it’s good.

-AG

Pro-competition

Junior Member
Dec 13, 2019
Reserved

Edit: I would be grateful if you could cite/quote the specific paragraph of my original post if you would like to discuss any particular point raised in my post.

Edit 2: I beseech you to read the entire post before posting. There's a lot of nuance in there.
 

thigobr

Senior member
Sep 4, 2016
The idea of testing varying thread loads might be a good one for checking how much heat the CPU generates. But under all conditions the CPU will be stable, simply because it can throttle. Unless you have a really faulty CPU (or other part), the computer should be stable. And if something is faulty, that would show up anyway, regardless of workload.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
I have 3 3900X's. All are overclocked slightly and undervolted to 4.1 GHz all-core load @ 1.1 Vcore. They all run about 80C under a 240mm AIO @ 100% utilization 24/7/365. This is running all cores @ 100% using either WCG or Rosetta@home.

I can't follow what you are trying to get at here.
 

Pro-competition

Junior Member
Dec 13, 2019
I have 3 3900X's. All are overclocked slightly and undervolted to 4.1 GHz all-core load @ 1.1 Vcore. They all run about 80C under a 240mm AIO @ 100% utilization 24/7/365. This is running all cores @ 100% using either WCG or Rosetta@home.

I can't follow what you are trying to get at here.
Please see paragraphs 16 and 17.
 

UsandThem

Elite Member
May 4, 2000
At least he's not asking for "the BEST" stress-test...
Please see paragraphs 1-17. :p

These "bossy" type posts (especially from brand new users) never go the way they envisioned them. Most people are told what we have to do in "real life", so most users don't want to be bossed around when talking tech for fun/entertainment, or to feel like they are writing their senior thesis. Just my .02.
 

chrisjames61

Senior member
Dec 31, 2013
Please see paragraphs 1-17. :p

These "bossy" type posts (especially from brand new users) never go the way they envisioned them. Most people are told what we have to do in "real life", so most users don't want to be bossed around when talking tech for fun/entertainment, or to feel like they are writing their senior thesis. Just my .02.


I hope the OP gets the replies he is looking for. That being said, I think most of us come to forums to read interesting things, interact and be entertained. Reading a post like that is actually difficult and almost becomes work-like and tedious, which defeats the purpose. It is just too much imho.
 

lobz

Platinum Member
Feb 10, 2017
Please see paragraphs 1-17. :p

These "bossy" type posts (especially from brand new users) never go the way they envisioned them. Most people are told what we have to do in "real life", so most users don't want to be bossed around when talking tech for fun/entertainment, or to feel like they are writing their senior thesis. Just my .02.
Do you realize that you still didn't cite the paragraph _you_ discussed?
🤣
 

Charlie22911

Senior member
Mar 19, 2005
7. An ideal stress test would also test every possible type of instruction a CPU supports (and every combination thereof).

This just isn't feasible: testing every possible combination of available CPU instructions would probably take longer than the useful life of the CPU under test. I think a better test would be one that causes the CPU to draw the maximum amount of power while doing work that is verifiable (e.g. Prime95 or LinX/Linpack).
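To make "verifiable work" concrete, a small Python sketch: the Lucas-Lehmer test run on Mersenne exponents whose answers are already known, so any disagreement flags a computation error. This is only loosely in the spirit of Prime95, which uses much heavier FFT-based arithmetic; the exponent lists are standard known results, and the function names are hypothetical.

```python
# Odd exponents p for which 2**p - 1 is a known Mersenne prime.
KNOWN_PRIME_EXPONENTS = [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279]
# Odd prime exponents for which 2**p - 1 is known to be composite.
KNOWN_COMPOSITE_EXPONENTS = [11, 23, 29, 37, 41, 43, 47, 53, 59]


def lucas_lehmer(p: int) -> bool:
    """For an odd prime p, 2**p - 1 is prime iff the final s is 0."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def self_check() -> list[int]:
    """Return every exponent whose computed answer disagrees with the known one."""
    wrong = [p for p in KNOWN_PRIME_EXPONENTS if not lucas_lehmer(p)]
    wrong += [p for p in KNOWN_COMPOSITE_EXPONENTS if lucas_lehmer(p)]
    return wrong
```

Running `self_check()` in a loop on every worker, and treating a non-empty result as a failure, gives a pass/fail signal without needing a reference machine.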

9. Prime95-specific observation: I noticed that Prime95 version 28.9 causes my 3950X to produce varying amounts of heat, at least when 32 threads are running.

This could be due to variable FFT lengths, you will probably get a more consistent load by manually setting FFT length.

Air cooling is adequate for 3950X at default settings

There are some VERY good air coolers out there, to say any modern CPU requires water cooling is unreasonable. This is of course ignoring the HEDT stuff Intel has out, because those parts are also unreasonable and almost no one should buy them.

12. The high temperatures observed with Ryzen 3rd Generation on ambient cooling are likely due to the 7nm node: a lot of heat is being generated by a relatively small die.

It's not really node-specific so much as it is a symptom of having very dense, high-power logic. There is a limit to how fast heat can "travel" through a material, set by its thermal resistance. As logic has gotten denser, the heat generated has become more concentrated, so much so that it "builds up" due to the thermal resistance of the silicon itself. So you see temperatures shoot up to 70C-80C or more seemingly instantly, despite the fact that the IHS side of the die is likely still at ambient. I first noticed this was a big problem with Broadwell-E (14nm).
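A back-of-the-envelope Python sketch of the concentration effect, using one-dimensional conduction through a uniform slab (ΔT = P·t / (k·A)). The default conductivity (~148 W/(m·K), roughly bulk silicon near room temperature) is the only physical constant; any power, area and thickness passed in are illustrative assumptions, and real hotspots spread heat in three dimensions, so treat the result as a scaling argument rather than a prediction.

```python
def slab_delta_t(power_w: float, area_mm2: float, thickness_mm: float,
                 k_w_per_mk: float = 148.0) -> float:
    """Temperature drop (K) across a uniform slab carrying `power_w` watts.

    One-dimensional Fourier's law: dT = P * t / (k * A).
    The default k is roughly that of bulk silicon near room temperature.
    """
    area_m2 = area_mm2 * 1e-6        # mm^2 -> m^2
    thickness_m = thickness_mm * 1e-3  # mm -> m
    return power_w * thickness_m / (k_w_per_mk * area_m2)
```

Because ΔT scales as 1/A, pushing the same power through one-eighth of the area gives eight times the temperature drop across the silicon, e.g. `slab_delta_t(20, 10, 0.7)` versus `slab_delta_t(20, 80, 0.7)` for a hypothetical 20 W through 0.7 mm of silicon.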
 

Muadib

Lifer
May 30, 2000
9. Prime95-specific observation: I noticed that Prime95 version 28.9 causes my 3950X to produce varying amounts of heat, at least when 32 threads are running.

That's what Prime does. I would set Prime to small FFTs, run it, and watch the temps. My 3900X hits the 70s during the test. If yours is hitting the 90s, then your cooling is suspect.
 

ehume

Golden Member
Nov 6, 2009
1,511
73
91
. . .

9. Prime95-specific observation: I noticed that Prime95 version 28.9 causes my 3950X to produce varying amounts of heat, at least when 32 threads are running. . . .

This is why I did not use Prime95 when I was reviewing. Since my review systems were Intel, I had no problem using LinX 0.6.5, which was a frontend for Linpack with AVX2 libraries. YMMV, but for my systems, I had a consistent heating effect. It took about 10 minutes to equilibrate, then the heat stayed stable for the duration of my runs, no matter how long they were.
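Deciding when a run has "equilibrated" can be automated from logged temperatures. A minimal Python sketch, assuming evenly spaced samples (for example, exported from a monitoring tool) and a hypothetical rule: the temperature counts as settled once the last `window` samples stay within `tolerance_c` of each other.

```python
from typing import Optional, Sequence


def equilibrium_index(temps_c: Sequence[float], window: int,
                      tolerance_c: float) -> Optional[int]:
    """Index of the first sample at which the trailing `window` samples all lie
    within `tolerance_c` of each other (max - min), or None if that never happens."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for end in range(window, len(temps_c) + 1):
        recent = temps_c[end - window:end]
        if max(recent) - min(recent) <= tolerance_c:
            return end - 1
    return None
```

A load like the one in paragraph 9 of the original post, swinging between ~70C and ~90C, would never satisfy a tight tolerance, which is one way to tell a steady load from an oscillating one.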
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Pretty much this.
Actually, after re-reading the post, given the 70s and 90s temps he got, I am not sure he trusts the CPU to survive, and he wants to test it now to see if it's really OK. While my reply did not address specific concerns, my bottom-line opinion is the same as another poster's: if it gets too hot, it will throttle. It won't die early.
 

Thunder 57

Platinum Member
Aug 19, 2007
My 3900x has no temp variations running Prime95 SmallFFTs. Not sure why OP has these problems.

Also, the tone of the post reminds me of a different poster around here who also seemed unimpressed with AMD's boost algorithms on Matisse.

That's why I got out of it. A roundabout, convoluted way to complain about Zen 2 boosting behavior. The tone is quite suspect.

I will say this point is valid though.

15. Rant #2: Ryzen Master v2.1.0.1424 will not open/start if Virtualization Based Security is enabled.
 

lobz

Platinum Member
Feb 10, 2017
The saying is "Even a broken clock is right twice a day".

BTW, a quote nazi is not anywhere near as bad as a grammar nazi. :p

Edit:

Dangit, I forgot to cite a paragraph again. I'm gonna be so perma-banned, I just know it! ;)
I knew that completely well, but I thought (silly me) I didn't have to quote the whole thing for people to get my point. Of course, what else should I expect from someone who can't even cite a single paragraph...
 

dlerious

Golden Member
Mar 4, 2004
1,772
719
136
The saying is "Even a broken clock is right twice a day".

BTW, a quote nazi is not anywhere near as bad as a grammar nazi. :p

Edit:

Dangit, I forgot to cite a paragraph again. I'm gonna be so perma-banned, I just know it! ;)
My clocks are always right even when broken, except for digital which doesn't display time. You don't believe me? I bet it's 8:30 somewhere whenever you check this message.
 

Pro-competition

Junior Member
Dec 13, 2019
Please see paragraphs 1-17. :p

These "bossy" type posts (especially from brand new users) never go the way they envisioned them. Most people are told what we have to do in "real life", so most users don't want to be bossed around when talking tech for fun/entertainment, or to feel like they are writing their senior thesis. Just my .02.

I hope the OP gets the replies he is looking for. That being said, I think most of us come to forums to read interesting things, interact and be entertained. Reading a post like that is actually difficult and almost becomes work-like and tedious, which defeats the purpose. It is just too much imho.

My belief that the quality of the articles Anandtech produces would be commensurate with the intellect of the readers of Anandtech has been grossly misplaced.

I came to the Anandtech forum hoping to have some constructive discussion. But even the replies by some of the moderators of the forum leave much to be desired, which makes me question how they became moderators in the first place.

Why there is such a discrepancy between the quality of the articles on Anandtech and the quality of the replies on this forum is a mystery to me.


Unless a moderator is posting in bold text (such as this) and signing their post with a moderator title (such as you'll find below), a poster is not posting as a moderator and is to be treated as any other member here. If you have an issue with moderation, then you should respectfully post a thread in the Moderator Discussion sub forum. Otherwise, we do not allow discussion of moderation or moderators on this board. What you have done in this post is subject to warning and sanction under our rules, but because you're new I'm only informing you of the rules rather than formally warning you.

AT Moderator ElFenix
 

VirtualLarry

No Lifer
Aug 25, 2001
My belief that the quality of the articles Anandtech produces would be commensurate with the intellect of the readers of Anandtech has been grossly misplaced.

I came to the Anandtech forum hoping to have some constructive discussion. But even the replies by some of the moderators of the forum leave much to be desired, which makes me question how they became moderators in the first place.

Why there is such a discrepancy between the quality of the articles on Anandtech and the quality of the replies on this forum is a mystery to me.

Quoted for posterity, and reported, for a Mod call-out, along with a member insult.
 