Question DEGRADING Raptor Lake CPUs

Page 23 of the thread on the AnandTech forums.

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
I noticed some reports about degrading i9 13900K and KF processors.

I experienced this problem myself when I ran it at 6 GHz under a light load (3 threads of Cinebench), at an acceptable temperature and a non-extreme voltage. After only a few minutes it crashed, and afterwards it could not run even at stock settings without a slight voltage bump.

I was thinking about the cause of this, and I believe the problem is that people do not appreciate how high these frequencies are, and that the truly comfortable frequency limit of these CPUs is probably around 5500 or 5600 MHz. These CPUs are made on the same process (possibly improved somehow) as the Alder Lake CPUs. Look at the frequencies the 12900KS runs at. The frequency improvement from the process tweak may not be as large as some people presume.

The 13900K CPUs are probably highly binned, selecting the chips that contain some cores which can reliably run at 5800 MHz. Some 13900Ks probably have little or no OC headroom left, and pushing them will cause them to degrade or break.

The conclusion for me is that the best thing you can do for your 13900K or 13900KF is to disable the 5800 MHz peak, which lets you apply a lower voltage offset, and then set the all-core maximum frequency to some comfortable level; I guess the maximum could be 5600 MHz. With the lowered voltage, this frequency should be gentler on the processor than running at the original 5500 MHz at a higher voltage. You can also run it at lower frequencies, allowing an even larger voltage drop, but then the CPU slowly loses its point (unless you want a highly efficient CPU intended for heavy multithreaded loads).
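The "gentler at lower voltage" part of this argument can be sanity-checked with the usual CMOS dynamic-power approximation P ≈ C·V²·f, in which capacitance and switching activity cancel out when two operating points of the same chip are compared. A minimal Python sketch; both voltages are made-up placeholders rather than measured 13900K values, and leakage power (which also rises with voltage and temperature) is ignored.

```python
def relative_dynamic_power(v_volts: float, f_ghz: float,
                           v_ref: float, f_ref: float) -> float:
    """Dynamic power of (v, f) relative to (v_ref, f_ref), using P ~ C * V^2 * f.

    C and the activity factor cancel in the ratio, so only voltage and
    frequency are needed. Leakage power is NOT modelled.
    """
    return (v_volts / v_ref) ** 2 * (f_ghz / f_ref)


# Hypothetical operating points, NOT measured values:
stock = (1.30, 5.5)   # 5.5 GHz all-core at an assumed 1.30 V
tuned = (1.20, 5.6)   # 5.6 GHz all-core at an assumed 1.20 V after a negative offset

ratio = relative_dynamic_power(*tuned, *stock)
print(f"tuned / stock dynamic power: {ratio:.3f}")
```

With these placeholder numbers the voltage term dominates: the roughly 2 % frequency increase is outweighed by the squared voltage reduction.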

Running it with a power limit matched to your cooling solution, to keep the CPU at a sensible temperature, will certainly help too.
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
This post is hugely incorrect.

Frequency is not really accelerating (or even causing) the wear and degradation; it's the voltage that the frequency requires. Bleeding-edge boost frequency like 6 GHz does not happen in isolation; it requires voltages that are close to being able to quickly damage the chip. It's the voltage that is doing the harm.
It is the electric current density that does the harm along with the temperature.

I thought this causal chain:

high required performance - high frequency - high voltage - high current density and temperature - quick degradation

is already commonly known and understood?!
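The "current density and temperature" part of this chain can be made concrete with Black's equation, the classic empirical model for electromigration lifetime, MTTF = A · J^(-n) · exp(Ea / (k·T)). A hedged Python sketch: the exponent n = 2 and activation energy Ea = 0.7 eV are typical textbook assumptions, not Raptor Lake data, and nothing in this thread establishes that electromigration is the dominant wear mechanism in these chips. The sketch compares two operating points instead of predicting a lifetime, because the constant A cancels out.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_acceleration(j_ratio: float, t_ref_c: float, t_stress_c: float,
                       n: float = 2.0, ea_ev: float = 0.7) -> float:
    """How many times faster wear-out proceeds at the stress condition than at
    the reference condition, per Black's equation
    MTTF = A * J**-n * exp(Ea / (k * T)).

    j_ratio: current density at stress / current density at reference.
    n and ea_ev are ASSUMED typical values, not vendor data.
    """
    t_ref = t_ref_c + 273.15        # convert to kelvin
    t_stress = t_stress_c + 273.15
    current_term = j_ratio ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t_ref - 1 / t_stress))
    return current_term * thermal_term


# Hypothetical comparison: 20 % higher current density, 70 C -> 100 C.
af = black_acceleration(1.2, 70.0, 100.0)
print(f"acceleration factor: {af:.1f}x")
```

Both terms multiply, which is why the chain above pairs current density with temperature: raising either one shortens the modelled lifetime, and raising both compounds the effect.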
 

coercitiv

Diamond Member
Jan 24, 2014
6,795
14,851
136
It is the electric current density that does the harm along with the temperature.

I thought this causal chain:

high required performance - high frequency - high voltage - high current density and temperature - quick degradation

is already commonly known?!
You're having trouble differentiating between "will" and "may", between direct and indirect causality:
  • high voltage/currents AND high temps WILL lead to degradation, these factors directly influence degradation
  • high frequency MAY lead to degradation if the parameters above are not properly controlled, this factor indirectly influences degradation
High frequency is safe as long as engineers set the proper boundaries for temps & voltage & current. The best recent example is the fmax/temperature plot for Zen 5, where everybody can see how much influence thermals have on max voltage settings, which in turn dictate the max frequency that can be achieved at a given temperature:

[Image: Zen 5 fmax vs. temperature plot]

Max allowed frequency can go as high as 5945 MHz @ 35C and then drops under 5400 MHz @ 80C+. And that's just an attempt to isolate frequency (voltage) vs. temps; introducing max current limits adds even more complexity.
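The behaviour described here amounts to a temperature-indexed frequency cap. A rough Python sketch of such a lookup, using only the two points quoted in the post (5945 MHz at 35 °C and, as an upper bound, 5400 MHz at 80 °C); the straight-line interpolation between them is an assumption, since the real firmware curve is neither linear nor limited to two points.

```python
def fmax_cap_mhz(temp_c: float,
                 table=((35.0, 5945.0), (80.0, 5400.0))) -> float:
    """Linearly interpolate a max-frequency cap from (temperature, fmax) points.

    Below the first point the first value applies; above the last point the
    last value applies. The default points come from the post; the linear
    shape between them is an ASSUMPTION for illustration only.
    """
    points = sorted(table)
    if temp_c <= points[0][0]:
        return points[0][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
    return points[-1][1]


for t in (30, 50, 65, 90):
    print(t, round(fmax_cap_mhz(t)))
```

Adding more (temperature, fmax) pairs to `table` refines the curve without changing the lookup logic.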
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
High frequency is safe as long as engineers set the proper boundaries for temps & voltage & current.

We are not talking about whether any particular frequency is safe or not, we are talking about basic relation between variables. You need higher voltage for higher frequency and vice versa.

Intel wanted highest possible performance from the CPUs and sacrificed EVERYTHING for it. And the decision was made by top management, not the engineers.
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
...
However, that doesn't mean the undervolt was harmful. It just makes you notice the aging sooner (and if you readjust your underclock, stuff will continue to work). Since aging gets faster with voltage, this actually means that if you underclocked at the start, you likely slowed the aging down. If it was a significant reduction, you could even slow it down so much that you won't realistically see the effects of aging before you stop using the PC.
...
Generally, by lowering the voltage, you can only make the CPU age/degrade less, not more.
That undervolting slows down degradation is obvious.

The "wear-out voltage margin" not only ensures long-term stability; it also reduces the chance that the CPU crashes due to a hiccup in the motherboard's power circuitry. The CPU's voltage requirements are extremely volatile when it is allowed to change frequency.

So that is another reason not to decrease the stability voltage margin: leave it alone and adjust only the frequency.
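The margin argument boils down to simple arithmetic: the CPU stays stable through a transient only if the set voltage, minus any undervolt offset, minus the momentary droop, still clears the minimum voltage the chip needs at that moment. A toy Python check with made-up numbers; a single droop value stands in for real VRM and load-line behaviour, which this ignores.

```python
def survives_transient(v_set_mv: float, v_min_mv: float, droop_mv: float,
                       offset_mv: float = 0.0) -> bool:
    """True if the rail stays at or above the minimum stable voltage during a
    transient droop. offset_mv is a user undervolt offset (negative values).
    All inputs are hypothetical illustration values in millivolts."""
    v_during_droop = v_set_mv + offset_mv - droop_mv
    return v_during_droop >= v_min_mv


# Assumed: firmware requests 1300 mV for a state whose true minimum is
# 1220 mV, and a load step briefly droops the rail by 60 mV.
print(survives_transient(1300, 1220, 60))                  # True  (stock margin)
print(survives_transient(1300, 1220, 60, offset_mv=-50))   # False (after -50 mV)
```

The same 60 mV event that the stock setting absorbs becomes a crash once the offset eats into the margin, which is the point being made about undervolting for everyday use.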

For anybody USING the computer AS INTENDED, not just as a testing or benchmarking vehicle, stability of the system is a primary objective. Normal users do not want to deal with system crashes, losing data, reinstalling OS, etc etc.

UNDERVOLTING IS NOT A GOOD PRACTICE.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
Intel wanted highest possible performance from the CPUs and sacrificed EVERYTHING for it. And the decision was made by top management, not the engineers.
That's nothing new, though; that's what Intel has been doing for many, many years now. This is not the thing that changed.

What changed is that the CPUs got too good: they boot and work at voltages and clocks at which previous generations would just black-screen and reset their BIOS options, but now they seem to work normally until they degrade enough for people to notice.
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
According to this, Intel is done with the repair job.


I would like to remind everyone that I (and I am probably not alone) could not install the August BIOS and microcode update, because my system kept crashing afterwards and I had to revert to the previous BIOS and microcode. I also cannot update to the 12B microcode, because Gigabyte has not released the BIOS for my motherboard yet.

So I would like to know how to run the CPUs safely even without the microcode fixes.

From the Verge article:
Intel ... does not have an update on a tool to let you easily test your chip to see if it’s aged prematurely.
It should not be hard to make a utility (possibly a tool you boot from a flash drive before the OS loads) that runs a simple stability test and, after each passing run, lowers the voltage slightly to find out how much of the "wear-out voltage margin" is left. It could then adjust the voltage curve accordingly, or determine whether the CPU's operating frequencies need to be lowered as well. After that adjustment you should have a reliable CPU, and you could decide whether you are fine with a CPU that may need to run slower than specified, or whether you would like to have it exchanged for a new one.
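The proposed utility is essentially a linear search for the largest undervolt that still passes a stability test. A minimal Python sketch of just the search logic, under stated assumptions: applying voltage offsets and running the stress test are hardware/firmware steps stubbed out here as a hypothetical `is_stable` callback, the "chip" is simulated, and the 5 mV step and 200 mV limit are arbitrary.

```python
from typing import Callable, Optional


def find_margin_mv(is_stable: Callable[[int], bool],
                   step_mv: int = 5, max_offset_mv: int = 200) -> Optional[int]:
    """Step a negative voltage offset down until the stability test fails.

    Returns the largest undervolt (mV) that still passed, i.e. the remaining
    margin; 0 means even the first step failed; None means stock itself fails,
    which would point to lowering frequencies instead.
    is_stable(offset_mv) is a HYPOTHETICAL hook: apply offset, run stress test.
    """
    if not is_stable(0):
        return None
    margin = 0
    for offset in range(step_mv, max_offset_mv + step_mv, step_mv):
        if not is_stable(-offset):
            break
        margin = offset
    return margin


def simulated_chip(true_margin_mv: int) -> Callable[[int], bool]:
    # Stand-in for real hardware: stable while the undervolt stays within margin.
    return lambda offset_mv: -offset_mv <= true_margin_mv


print(find_margin_mv(simulated_chip(80)))   # 80   (healthy chip)
print(find_margin_mv(simulated_chip(12)))   # 10   (degraded chip)
print(find_margin_mv(simulated_chip(-20)))  # None (unstable even at stock)
```

Stopping at the first failure keeps the number of risky test runs low; a real tool would also need a crash-recovery path, since a failed step may mean a hard reset rather than a clean `False`.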
 

alcoholbob

Diamond Member
May 24, 2005
6,340
410
126
According to this, Intel is done with the repair job.


I would like to remind everyone that I (and I am probably not alone) could not install the August BIOS and microcode update, because my system kept crashing afterwards and I had to revert to the previous BIOS and microcode. I also cannot update to the 12B microcode, because Gigabyte has not released the BIOS for my motherboard yet.

So I would like to know how to run the CPUs safely even without the microcode fixes.

From the Verge article:

It should not be hard to make a utility (possibly a tool you boot from a flash drive before the OS loads) that runs a simple stability test and, after each passing run, lowers the voltage slightly to find out how much of the "wear-out voltage margin" is left. It could then adjust the voltage curve accordingly, or determine whether the CPU's operating frequencies need to be lowered as well. After that adjustment you should have a reliable CPU, and you could decide whether you are fine with a CPU that may need to run slower than specified, or whether you would like to have it exchanged for a new one.

Most likely it has nothing to do with Intel's updates themselves and is instead caused by vendors screwing up the AC/DC load-line settings in newer BIOS revisions. Try setting AC/DC load-line to 0.5 mΩ, or, if you have an MSI board, set AC/DC LL to Mode 3, and see if this helps.
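Why the load-line value matters comes down to Ohm's law: a load-line term is current times resistance, and amps times milliohms gives millivolts directly. In a simplified view, the AC load-line value tells the CPU how much droop to pre-compensate for, so it raises its requested voltage by roughly I × AC_LL under load. The Python sketch below only sizes that term; the 200 A load is hypothetical, and 1.1 mΩ is used purely as a higher comparison value, not as a claim about any board's default.

```python
def loadline_mv(current_a: float, loadline_mohm: float) -> float:
    """Voltage term contributed by a load-line: V = I * R.
    Amps multiplied by milliohms yields millivolts."""
    return current_a * loadline_mohm


# Hypothetical 200 A load, two load-line values for comparison.
for ll in (1.1, 0.5):
    print(f"{ll} mOhm @ 200 A -> {loadline_mv(200, ll):.0f} mV")
```

Whether 0.5 mΩ suits a given board depends on how much that board's VRM actually droops; the sketch only shows how large the difference between the two settings can be under heavy load.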
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136

Is this an example?
Example of what?

There are like 8 separate things in play:
  1. Power limits that allow the CPU to produce so much heat that the thermal interfaces simply cannot move it away from the chip while staying at or below 100°C.
  2. High operating frequencies that require high voltage, which causes high power draw, high temperatures and high electric current density.
  3. Degradation of the silicon, a normal process that can be significantly sped up by high temperature and high electric current density.
  4. CPU microcode, which operates the CPU and may protect it, to varying degrees, against unfavourable operating conditions (unfavourable in the sense of speeding up degradation too much).
  5. Settings made by the motherboard manufacturer.
  6. Settings made by the user, such as overclocking and undervolting.
  7. Cooling setup: CPU cooler and case cooling.
  8. Use/stress intensity: which workloads the CPU has to endure, and how often.
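Point 1 can be illustrated with the simplest lumped thermal model, T ≈ T_ambient + P · θ, where θ is the combined die-to-air thermal resistance. A hedged Python sketch: θ = 0.28 °C/W and 25 °C ambient are invented values, the model ignores per-core hotspots, and in practice the CPU throttles at its thermal limit, so excess power shows up as lost clocks rather than as temperatures above 100 °C. The power values are 200 W, the 13900K's 253 W maximum turbo power, and 320 W.

```python
def die_temp_c(power_w: float, ambient_c: float, theta_c_per_w: float) -> float:
    """Steady-state estimate T = T_ambient + P * theta, where theta is the
    combined die-to-air thermal resistance (die, TIM, IHS, cooler) in C/W."""
    return ambient_c + power_w * theta_c_per_w


def max_power_w(t_limit_c: float, ambient_c: float, theta_c_per_w: float) -> float:
    """Largest sustained power that keeps the estimate at or below t_limit_c."""
    return (t_limit_c - ambient_c) / theta_c_per_w


THETA = 0.28    # hypothetical combined thermal resistance, C/W
AMBIENT = 25.0  # hypothetical intake air temperature, C

for p in (200, 253, 320):
    t = die_temp_c(p, AMBIENT, THETA)
    note = " (above 100 C: the CPU would throttle instead)" if t > 100 else ""
    print(f"{p} W -> about {t:.0f} C{note}")
print(f"power ceiling at 100 C: about {max_power_w(100, AMBIENT, THETA):.0f} W")
```

Inverting the model gives a power limit matched to a given cooler, which is the practical use of point 1.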
What I see in the video is a degraded chip that is momentarily stabilised.
Long term stabilisation would involve increasing the operating voltage (possibly by as much as 100mV), and limiting frequencies and power draw.
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
did we ever get any final numbers on performance after the fix?
What Intel did so far is not a real fix. Updating the microcode is just one part of the puzzle; see the points in the post above.

A REAL FIX will always require limiting frequency (and power draw most likely as well) and decreasing the performance of the CPU somewhat.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,198
126
What Intel did so far is not a real fix. Updating the microcode is just one part of the puzzle; see the points in the post above.

A REAL FIX will always require limiting frequency (and power draw most likely as well) and decreasing the performance of the CPU somewhat.
This reminds me of my days being a system builder, and selling "speed-margined" Q6600 rigs. Sure, they had a powerful heatpipe cooler, but they still ran hot. (AIO WC kits weren't really a thing back then.)

Unfortunately, the number of them that came back (OK, I only sold one of them, but it came back six months later with crashes) was too high for my tastes. I learned my lesson about selling overclocked rigs to consumers. I had to dial the overclock back from balls-to-the-wall to something more "sane" and more environmentally forgiving.

Too bad Intel didn't learn that sooner, too.
 

Majcric

Golden Member
May 3, 2011
1,392
55
91
What Intel did so far is not a real fix. Updating the microcode is just one part of the puzzle; see the points in the post above.

A REAL FIX will always require limiting frequency (and power draw most likely as well) and decreasing the performance of the CPU somewhat.
Limiting frequency, or limiting voltage and voltage spikes?
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
Limiting frequency so that the voltages the CPU needs are notably lower than the default Intel settings even after the microcode "fix", leading to much lower strain the CPU is subjected to.
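The "lower frequency cap, lower required voltage" step can be illustrated with a V/F curve lookup. A minimal Python sketch with an invented five-point curve (real curves are fused per chip and handled by firmware, not hard-coded like this); it linearly interpolates the voltage needed at a given frequency cap, such as 5.2 GHz.

```python
from bisect import bisect_left

# HYPOTHETICAL V/F curve: (frequency in GHz, voltage in V). Not real chip data.
VF_CURVE = [(4.0, 0.95), (5.0, 1.15), (5.5, 1.28), (5.8, 1.38), (6.0, 1.47)]


def voltage_for(freq_ghz: float, curve=VF_CURVE) -> float:
    """Linearly interpolate the voltage needed for freq_ghz from the curve."""
    freqs = [f for f, _ in curve]
    if not freqs[0] <= freq_ghz <= freqs[-1]:
        raise ValueError("frequency outside the curve")
    i = bisect_left(freqs, freq_ghz)
    if freqs[i] == freq_ghz:          # exact curve point
        return curve[i][1]
    (f0, v0), (f1, v1) = curve[i - 1], curve[i]
    return v0 + (v1 - v0) * (freq_ghz - f0) / (f1 - f0)


for cap in (6.0, 5.5, 5.2):
    print(f"cap {cap} GHz -> ~{voltage_for(cap):.3f} V")
```

Because real curves steepen near the top, the last few hundred MHz cost disproportionately more voltage, which is what a frequency cap trades away.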

BTW, a few moments ago I noticed that Gigabyte finally published the BIOS for my motherboard with the 12B microcode. I updated the BIOS and fired up HWiNFO, only to find that 320 W power limits are in place. Then I found out that after the BIOS update Gigabyte runs the Intel Extreme profile by default, not Performance. Is it a mistake, or is it because I have a 13900KS installed?
 

Hitman928

Diamond Member
Apr 15, 2012
6,412
11,467
136
Limiting frequency so that the voltages the CPU needs are notably lower than the default Intel settings even after the microcode "fix", leading to much lower strain the CPU is subjected to.

BTW, a few moments ago I noticed that Gigabyte finally published the BIOS for my motherboard with the 12B microcode. I updated the BIOS and fired up HWiNFO, only to find that 320 W power limits are in place. Then I found out that after the BIOS update Gigabyte runs the Intel Extreme profile by default, not Performance. Is it a mistake, or is it because I have a 13900KS installed?

Pretty sure the KS models use the extreme profile as the Intel default.
 

Hulk

Diamond Member
Oct 9, 1999
4,775
3,051
136
It is the electric current density that does the harm along with the temperature.

I thought this causal chain:

high required performance - high frequency - high voltage - high current density and temperature - quick degradation

is already commonly known and understood?!
This is a chicken-and-egg situation. Yeah, it's the voltage. But the VID won't ask for high voltage unless you are running high frequency!
 

Hulk

Diamond Member
Oct 9, 1999
4,775
3,051
136
BTW. Intel takes forever to reply. Last time they contacted me it was to tell me to send a PDF of my receipt so they could get my reimbursement going. That was November 15 and I haven't heard from them since.

Running fine on auto setting with frequency capped at 5.2GHz and power at 200W.
 

Kocicak

Golden Member
Jan 17, 2019
1,158
1,218
136
This is a chicken-and-egg situation.
I don't see any chickens or eggs in this situation, unless you want to compare it to a breed of chicken that can lay 5 eggs a week being forced to lay 6, then getting liver disease from all the hormones and vitamins you fed it and dying early.
 

Hulk

Diamond Member
Oct 9, 1999
4,775
3,051
136
I don't see any chickens or eggs in this situation, unless you want to compare it to a breed of chicken that can lay 5 eggs a week being forced to lay 6, then getting liver disease from all the hormones and vitamins you fed it and dying early.
I was defending your post where you stated frequency kills and Jan said you are hugely incorrect.

Voltage and frequency go hand-in-hand.
 