Solved! How should we test the stability of CPUs which are able to boost past their all-core frequency? (With particular focus on AMD's 3950X)


Pro-competition

Junior Member
Dec 13, 2019
11
0
6
Warning: This is a very involved read that probably would take around 15 minutes.

1. Question/issue: How should we test the stability of CPUs which are able to boost past their all-core frequency? In other words, how should we test the stability of CPUs which are able to operate at a higher frequency than their “base frequency” when only a few of their cores are loaded?

Why this is more complicated than it might first appear
2. Take for example my AMD 3950X. It is able to operate at up to ~4.2GHz when all cores are loaded, but is supposedly able to run 1 core at up to 4.7GHz. The implication is that if I were just to run 32 threads of Prime95, I would only really be testing whether my 3950X is Prime95 stable at or below ~4.2GHz (and not whether it is stable running 1 core at 4.7GHz, or anything above ~4.2GHz for that matter). The corollary, leaving aside the complication discussed in paragraph 5 of this post, is that if I wanted to check whether the 3950X is Prime95 stable, I would have to run 1 thread of Prime95, then 2 threads of Prime95, and so on up to 32 threads of Prime95, so that I can be sure my 3950X is Prime95 stable regardless of how many cores are active.
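
To make the sweep in paragraph 2 concrete, here is a minimal sketch in Python. It is not Prime95 and not a real stability test: the workload (a SHA-256 hash chain) is an assumption chosen only because its result is easy to verify, and the names `verified_work` and `sweep` are hypothetical. The operating system decides which cores the workers land on, so the sketch does not guarantee that exactly n cores are loaded.

```python
import hashlib
from multiprocessing import Pool


def verified_work(seed: int, rounds: int = 20000) -> str:
    """Deterministic CPU-bound work: a chain of SHA-256 hashes started from `seed`."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def sweep(max_workers: int, rounds: int = 20000) -> dict:
    """Run the same work on 1..max_workers processes and compare with a reference run.

    Returns {worker_count: True if the parallel results matched the reference}.
    The reference is computed on the same CPU, so a mismatch only says that
    *one* of the two runs went wrong, not which one.
    """
    results = {}
    for n in range(1, max_workers + 1):
        seeds = list(range(n))
        expected = [verified_work(s, rounds) for s in seeds]  # single-process reference
        with Pool(processes=n) as pool:
            got = pool.starmap(verified_work, [(s, rounds) for s in seeds])
        results[n] = (got == expected)
    return results
```

Calling `sweep(os.cpu_count())` from under an `if __name__ == "__main__":` guard would walk through every worker count from 1 upwards. A real tool would need a far heavier workload and much longer run times per step than this sketch uses.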

3. A further complication is that the maximum heat output does not necessarily occur when all cores are loaded. For example, the 3950X that Anandtech tested consumed the most power when 10 out of 16 cores were loaded. (See Anandtech’s 3950X review, page 2.) The implication is that a 3950X may very well be stable when all 16 cores are loaded, but unstable when 10 cores are loaded, because the temperature of the 3950X is higher when 10 cores are loaded.

4. Tentative conclusion #1 in light of the issues I raised in paragraphs 2 and 3 of this post: One characteristic of an ideal stress test is that it is able to dynamically adjust the number of active threads as the stress test progresses.

5. In addition, Ryzen 3rd generation CPUs (i.e. Zen 2 architecture using TSMC’s 7nm manufacturing process) are only able to reach (close to) their advertised single-core max boost speeds for extremely brief periods of time. (See Anandtech’s 3950X review (at page 2), where it was stated that peak single core frequency of 4650 MHz on the Ryzen High Performance (RHP) power plan was “very instantaneous, as when we put a consistent single thread load on the core, the [frequency] very quickly came down”. Also see this Anandtech article (at page 7) where it was stated that “Ultimately, by opting for a more aggressive binning strategy so close to silicon limits, AMD has reached a point where, depending on the workload and the environment, a desktop CPU might only sustain a top Turbo bins momentarily”.)

This behaviour is unlike modern Intel CPUs which, given sufficient cooling and a sufficiently high Power Limit 2 value, are able to boost to their maximum single-core boost frequencies until the Power Limit 2 (PL2) duration – aka Turbo Time Parameter (Tau or τ) – is reached. (See Anandtech’s 2019 interview with Guy Therien, this 2019 article, and this 2018 article.)

The implication is that since there’s no way of sustaining the maximum frequencies achieved by the Ryzen 3rd generation CPUs for any meaningful duration, there is no way of testing whether such a CPU is stable at the highest frequencies which it is able to achieve for only brief periods of time.

6. Tentative conclusion #2 in light of the issue I raised in paragraph 5 of this post: Another characteristic of an ideal stress test is that it is able to generate bursts of intense workloads interspersed with zero loads, in order to coax the CPU into operating at its highest frequencies.
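
The two tentative conclusions (a varying number of active threads, and bursts of load separated by idle gaps) can be sketched together as a randomised schedule. This is an illustration only: `Phase`, `make_schedule`, `spin` and `run_schedule` are hypothetical names, the burst and idle ranges are arbitrary assumptions, and whether such bursts actually coax a given CPU into its top boost bins is exactly the open question of this post.

```python
import random
import time
from multiprocessing import Pool
from typing import List, NamedTuple, Tuple


class Phase(NamedTuple):
    workers: int   # how many workers are busy during the burst
    busy_s: float  # length of the load burst, in seconds
    idle_s: float  # idle gap after the burst, in seconds


def make_schedule(max_workers: int, phases: int, seed: int = 0,
                  busy_range: Tuple[float, float] = (0.05, 2.0),
                  idle_range: Tuple[float, float] = (0.05, 1.0)) -> List[Phase]:
    """Build a reproducible list of bursts with random width and duration."""
    rng = random.Random(seed)
    return [
        Phase(
            workers=rng.randint(1, max_workers),
            busy_s=rng.uniform(*busy_range),
            idle_s=rng.uniform(*idle_range),
        )
        for _ in range(phases)
    ]


def spin(seconds: float) -> int:
    """Busy-loop for roughly `seconds`; returns the number of loop iterations."""
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n


def run_schedule(schedule: List[Phase], max_workers: int) -> None:
    """Play the schedule: load `workers` processes, then leave everything idle."""
    with Pool(processes=max_workers) as pool:
        for phase in schedule:
            pool.map(spin, [phase.busy_s] * phase.workers, chunksize=1)
            time.sleep(phase.idle_s)
```

The busy loop in `spin` produces load but checks nothing; an actual stability test would replace it with work whose result can be verified, so that a silent computation error is caught rather than only a crash.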

Different instruction sets

7. An ideal stress test would also test every possible type of instruction a CPU supports (and every combination thereof).

8. Prime95 presumably doesn’t do this, hence I chose my words very carefully and said “Prime95 stable” in my posts and not merely “stable”. Ex hypothesi, this also means Prime95 shouldn’t be placed on a pedestal as the gold standard of stability tests; it is merely one of several stability tests to perform.

Prime95 oddity

9. Prime95-specific observation: I noticed that Prime95 version 28.9 causes my 3950X to produce varying amounts of heat, at least when 32 threads are running. To elaborate:

(a) The 3950X would hum along at ~70C most of the time, then occasionally hit ~90C before going back to ~70C. The cycle then repeats.
(b) Also, the 3950X would operate anywhere between 3.3GHz and 4.2GHz when all cores are loaded, but mostly between 3.8GHz and 4GHz. This is probably because (1) the intensity of the Prime95 workload varies over time and (2) the 3950X is being forced to operate within the specified power or current limits, viz.:
(i) Package Power Tracking (PPT), the power threshold that is allowed to be delivered to the socket;
(ii) Thermal Design Current (TDC), the maximum amount of current delivered by the motherboard’s voltage regulators in thermally constrained scenarios; and
(iii) Electrical Design Current (EDC), the maximum amount of current that can be delivered by the motherboard’s voltage regulators over any short, instantaneous period of time.
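
To see which of the three limits is actually holding the clocks back, the monitoring readings can be expressed as a fraction of each limit. The sketch below does only that arithmetic. The limit values (142 W PPT, 95 A TDC, 140 A EDC) are the figures commonly quoted for 105 W TDP AM4 parts and are an assumption here; check what your own board reports. The readings in the usage note are made up, and `Limits`, `utilisation` and `binding_limit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    ppt_w: float  # Package Power Tracking limit, in watts
    tdc_a: float  # Thermal Design Current limit, in amperes
    edc_a: float  # Electrical Design Current limit, in amperes


def utilisation(limits: Limits, ppt_w: float, tdc_a: float, edc_a: float) -> dict:
    """Return each reading as a fraction of its limit (1.0 means at the limit)."""
    return {
        "PPT": ppt_w / limits.ppt_w,
        "TDC": tdc_a / limits.tdc_a,
        "EDC": edc_a / limits.edc_a,
    }


def binding_limit(util: dict) -> str:
    """Name the limit the CPU is closest to, i.e. the most likely constraint."""
    return max(util, key=util.get)


# Assumed stock limits for a 105 W TDP part; not read from any hardware.
ASSUMED_STOCK = Limits(ppt_w=142.0, tdc_a=95.0, edc_a=140.0)
```

For example, `binding_limit(utilisation(ASSUMED_STOCK, ppt_w=120.0, tdc_a=80.0, edc_a=139.0))` picks EDC for those made-up readings, because 139 A out of 140 A is closer to its ceiling than the other two.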

Those with 3950X or any other Ryzen 3rd gen CPU, do you notice a similar behaviour when running Prime95?

[Attachment: 4GHz 90C 97% power - 24 Dec 2019.jpg. Screenshot of HWInfo and Ryzen Master after running Prime95 Blend (32 threads).]

10. I’ll include the relevant specifications/configuration of my system for reference:
  • Motherboard: MSI x570 Unify
  • Motherboard BIOS: 7C35vA2 (released 2019-11-07), and most likely includes the AMD ComboPI1.0.0.4 Patch B (SMU v46.54)
  • AMD Chipset Driver version: 1.11.22.454 (released 11/25/2019), which inter alia includes AMD Ryzen Power Plan v5.0.0.0
  • Windows Power Plan: AMD Ryzen High Performance plan (which, unlike the Ryzen Balanced plan, retains the fast Frequency Ramp-Up times - see Anandtech’s article on Collaborative Processor Performance Control 2 (CPPC2), but see this Anandtech article (at page 7) for a better explanation of CPPC2)
  • Windows build: 10.0.18363 (version 1909)
Air cooling is adequate for 3950X at default settings

11. As an aside, I am of the opinion that my Noctua NH-U14S is adequate for running the 3950X at stock, since it is, broadly speaking, able to keep the 3950X at around 70C when running Prime95 even when the ambient temperature is a fairly warm ~28.5C. The occasional spikes to 90C when running Prime95 will probably still occur even on the best ambient water-cooling system, since the bottleneck in heat dissipation seems to occur at the interface between the die and heat spreader, or even within the die itself. Moreover, the heatsink only feels warm to the touch (as distinct from being so warm that it is uncomfortable to touch for long periods of time), further suggesting that the heat dissipation capability of the NH-U14S is adequate for a 3950X running at stock.

12. The high temperatures observed with Ryzen 3rd Generation on ambient cooling are likely due to the 7nm node: a lot of heat is being generated by a relatively small die.

13. If you want to do significantly better than air cooling, then you would have to look at cooling solutions which are able to bring the temperature of the heat spreader below ambient temperature, such as phase-change systems or Peltier coolers (aka thermoelectric cooling), because it is only by decreasing the temperature of the heat spreader that the rate of heat dissipation from the die to the heat spreader would improve. (For the proposition that the rate of heat dissipation is proportional to the temperature difference, see Fourier’s Law.) High-end air cooling already seems to be able to maintain the heat spreader at close to ambient temperature, so the best ambient water-cooling is unlikely to yield any significant benefit.
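
The argument in paragraph 13 can be put into numbers with a one-resistance steady-state model, where the die sits above the heat spreader by the power times the thermal resistance between them. This is a rough sketch: a real package is not a single resistance, the 0.3 K/W figure and the 140 W load are invented for illustration and are not measurements of a 3950X, and `die_temp` is a hypothetical helper.

```python
def die_temp(ihs_temp_c: float, power_w: float, r_die_to_ihs_k_per_w: float) -> float:
    """Steady-state die temperature for a single thermal resistance.

    From Fourier's law in lumped form, power = (T_die - T_ihs) / R,
    so T_die = T_ihs + power * R.
    """
    return ihs_temp_c + power_w * r_die_to_ihs_k_per_w


# Assumed, illustrative values (not measured on any real CPU).
POWER_W = 140.0
R_DIE_TO_IHS = 0.3  # kelvin per watt

scenarios = {
    "good air cooler, heat spreader at 35C": 35.0,
    "ideal ambient cooling, heat spreader at 28.5C": 28.5,
    "sub-ambient cooling, heat spreader at 0C": 0.0,
}

for label, ihs_temp in scenarios.items():
    print(f"{label}: die at about {die_temp(ihs_temp, POWER_W, R_DIE_TO_IHS):.1f}C")
```

Under these assumed numbers, the gap between the die and the heat spreader stays the same in every scenario, so a cooler can only lower the die temperature by as much as it lowers the heat spreader temperature. An ambient cooler cannot take the heat spreader below room temperature, which is why only sub-ambient cooling changes the picture much.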

Rant

14. Rant #1: HWInfo v6.20 does not report correct voltages, clock speeds, etc. when “Memory Integrity” in Windows Security is enabled.

15. Rant #2: Ryzen Master v2.1.0.1424 will not open/start if Virtualization Based Security is enabled.

Conclusion

16. To reiterate, and putting my original question differently: is there any CPU stability test which incorporates the principles I mentioned in paragraphs 4, 6 and 7 of this post? If not, I would like to invite developers or programmers out there to design one.

17. The reasons why I believe it’s particularly important to test Ryzen 3rd Generation CPUs even at stock, and to independently verify the stability of the CPU, are as follows:

(a) First, because of how close these CPUs are operating to their maximum potential. As noted in this Anandtech write-up on AMD’s boost behaviour (at page 3), “…the CPU out of the box is already near its peak limits, and AMD’s metrics from manufacturing state that the CPU has a lifespan that AMD is happy with despite being near silicon limits…”.
(b) Second, AMD apparently increases the voltage gradually over time to compensate for the effects of electromigration (ibid). But can we trust this algorithm to accurately compensate for the effects of electromigration?
 
A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Do people still use P95? I thought most had moved onto OCCT/Linpack.
 

Pro-competition

Junior Member
Dec 13, 2019
11
0
6
Actually, after re-reading the post, after he got the 70's and 90's temps, I am not sure he trusts the CPU to survive, and wants to test it now to see if it's really OK. While my reply did not address specific concerns, I think my bottom line opinion is the same as another poster's. If it gets too hot, it will throttle. It won't die early.

You're right on the money. I don't quite trust the CPU to be stable over time.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,581
10,220
126
You're right on the money. I don't quite trust the CPU to be stable over time.
Bottom line: You don't trust AMD's Schmoo plots for each core of their chiplets, and TSMC's long-term characterization of their process technology. Well, if it's provable in court, then I expect that you would be able to get a substantial amount of money.

... that's if you even know what a Schmoo plot is, without resorting to WikiPedia.
 

JasonLD

Senior member
Aug 22, 2017
488
447
136
Bottom line: You don't trust AMD's Schmoo plots for each core of their chiplets, and TSMC's long-term characterization of their process technology. Well, if it's provable in court, then I expect that you would be able to get a substantial amount of money.

... that's if you even know what a Schmoo plot is, without resorting to WikiPedia.

Schmoo plot......:grinning:. Happy new year btw. :grin:
 
  • Like
Reactions: VirtualLarry

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,600
6,084
136
You're right on the money. I don't quite trust the CPU to be stable over time.

You're asking questions to which none of us on the forum can provide definitive answers, either due to lacking the knowledge, or due to being bound by non-disclosure/IP agreements.

The reason why you are getting a negative reception is because your quest(ion) is quixotic.
 
  • Like
Reactions: Makaveli

VirtualLarry

No Lifer
Aug 25, 2001
56,581
10,220
126
Schmoo plot......:grinning:. Happy new year btw. :grin:
Happy New Year. I may have... spelled that wrong, and gotten it confused with a cartoon character of my youth.

Edit: Ok, chagrinned appropriately, I had to look up Schmoo on WikiPedia myself.


Yup, cartoon character. LOL. :p

Edit: Ok, not so chagrinned.


They even mention that it is based on the cartoon character. So I wasn't completely wrong. :p
 

DrMrLordX

Lifer
Apr 27, 2000
22,881
12,939
136
Why there is such a discrepancy between the quality of the articles on Anandtech and the quality of the replies on this forum is a mystery to me.

The only low-quality posting here is yours. Brevity is the soul of wit; furthermore, you've apparently got an axe to grind with AMD, and hiding it behind a bunch of flowery/over-wrought balderdash isn't helping matters.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,600
6,084
136
If the core concern re: stability is accuracy of processing work, then neither Intel nor AMD consumer-level parts should be used, as both major vendors have moved to an opportunistic turbo+ boosting algorithm (Thermal Velocity Boost in Intel's case) in response to market demands for more performance. The complaints listed in the OP would also apply to processors using TVB.

If stability and accuracy are true concerns, then Epyc or Xeon platforms with ECC should be used instead.
 
  • Like
Reactions: lightmanek

DrMrLordX

Lifer
Apr 27, 2000
22,881
12,939
136
If stability and accuracy are true concerns, then Epyc or Xeon platforms with ECC should be used instead.

He seems more concerned about whether AMD's Matisse CPUs will kill themselves via their boost algo. Or he's just raising concerns to get other people worried. One or the other. Hard to say which.

AMD had some suicidal 3600s early in the launch sequence. Seems like early board UEFIs may have been the culprit but it's hard to say for sure. You didn't hear much about the other SKUs killing themselves, and those stories (mostly on reddit) seem to have dried up.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
My belief that the quality of the articles Anandtech produces would be commensurate with intellect of the readers of Anandtech has been grossly misplaced.

I came to Anandtech forum hoping to have some constructive discussion. But even the replies by some of the moderators of the forum leave much to be desired, which make me question how they became moderators in the first place.

Why there is such a discrepancy between the quality of the articles on Anandtech and the quality of the replies on this forum is a mystery to me.
It's OK. I like to be beseeched.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
Since the operations inside the CPU are fine-tuned by AMD and have become pretty complex by now, you have to wonder if it's really up to us to try to figure out in advance how long your CPU will last.

You can try your best to keep your CPU (with a big margin) inside its temp limits, trying to extend its lifetime.
Yes, lower temps are always better, but the question is whether it matters.
Would you prefer a faster CPU that lasts 10 years or one that is slower but lasts 20 years? After 10 years the value of your CPU is very low, so do the extra 10 years matter?
Obviously I am not talking about overclocking; then you basically have no idea how long it will last. Some may last longer than others with the same settings.

It is the same as if you buy a car and you just avoid driving very fast in the desert so your car would not break too soon.
We do not test cars to see how long they will survive, we look back and say that one was a very good car I drove xx years with it.

Or in short make sure your cooling is OK, use stock settings and just hope that if it breaks that it breaks within your warranty.
 
  • Like
Reactions: lightmanek

chrisjames61

Senior member
Dec 31, 2013
721
446
136
My belief that the quality of the articles Anandtech produces would be commensurate with intellect of the readers of Anandtech has been grossly misplaced.

I came to Anandtech forum hoping to have some constructive discussion. But even the replies by some of the moderators of the forum leave much to be desired, which make me question how they became moderators in the first place.

Why there is such a discrepancy between the quality of the articles on Anandtech and the quality of the replies on this forum is a mystery to me.

I would say you proved your point well with your own post but I don't want to get in trouble lol!
 
  • Haha
Reactions: lobz

AnnoyedGrunt

Senior member
Jan 31, 2004
596
25
81
I understand the original post, and the questions make sense in a way, ultimately boiling down to “how do you know a processor is stable if you can’t test it in a manner that stresses it similarly to normal use due to how the boost behavior works”

I would answer by saying “just use it”. If the system crashes a lot, it’s not stable. If it doesn’t crash, it’s good.

-AG
 
  • Like
Reactions: Charlie22911
Solution

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
I understand the original post, and the questions make sense in a way, ultimately boiling down to “how do you know a processor is stable if you can’t test it in a manner that stresses it similarly to normal use due to how the boost behavior works”

I would answer by saying “just use it”. If the system crashes a lot, it’s not stable. If it doesn’t crash, it’s good.

-AG
It was more like: "How can I finally explain somewhere, that despite everything being stable and doing no overclock, my AMD 3900X will kill himself? Well, AT looks like a site operated by professionals, maybe I can beseech the forum users to appreciate my distinguished language skills."
 
  • Haha
Reactions: lightmanek

UsandThem

Elite Member
May 4, 2000
16,068
7,383
146
The only low-quality posting here is yours. Brevity is the soul of wit; furthermore, you've apparently got an axe to grind with AMD, and hiding it behind a bunch of flowery/over-wrought balderdash isn't helping matters.
Exactly. We don't want a way too long thesis to read through, or the brand-new poster rejecting posts like they are a professor admonishing the class for not following the syllabus. "How can you have any pudding if you don't eat your meat?". :cool:

As for being concerned about AMD CPUs having short lives, there's zero evidence of that. They offer a 3-year warranty on them, and it would be absolutely idiotic for them to ship chips that die early, since they have had all the momentum over the last 3 years.

That being said, Intel dropped the warranty down to one year on the 9900KS, so I'm kind of surprised you didn't mention that in your thesis original post. Are you not concerned about its boost rates and expected lifetime, or is this just an AMD hit piece?
 

DrMrLordX

Lifer
Apr 27, 2000
22,881
12,939
136
@UsandThem

If OP had wanted to make a good post, he could have asked, "Why is my 3950X fluctuating in temperature during stress tests" and "Is it going to die prematurely"? Then we could have tried to help him get down to the business of why his chip was having power/temp fluctuations. And he DID ask that question, buried in the middle of his extensive post. Too bad most readers fell asleep before getting there.

@moinmoin

May as well.
 
  • Like
Reactions: lobz and UsandThem

Pro-competition

Junior Member
Dec 13, 2019
11
0
6
@UsandThem

If OP had wanted to make a good post, he could have asked, "Why is my 3950X fluctuating in temperature during stress tests" and "Is it going to die prematurely"? Then we could have tried to help him get down to the business of why his chip was having power/temp fluctuations. And he DID ask that question, buried in the middle of his extensive post. Too bad most readers fell asleep before getting there.

Then you've completely missed my point...

Also, Anandtech shouldn't bother writing articles more than ~500 words then
 

Pro-competition

Junior Member
Dec 13, 2019
11
0
6
The only low-quality posting here is yours. Brevity is the soul of wit; furthermore, you've apparently got an axe to grind with AMD, and hiding it behind a bunch of flowery/over-wrought balderdash isn't helping matters.
Could you summarise my post in 100 words then? I'd be grateful.