Question 5800X fails single core cycler test

Nov 26, 2005
15,093
312
126
Bought the chip in Nov. 2020 for my main/daily rig and it's been running relatively smoothly since then. I had one system crash, and the following reboot said there was no drive. I don't know what fixed it, but on the 3rd or 4th boot it came back to life. I also had a screen flicker about a week or so ago, but nothing came of it. My GPU is really outdated: it's a passive Radeon 6450 with a 140mm fan blowing over it, so it stays cool. I'm not experiencing reboots or WHEA errors, but I decided to run a P95 core cycler test called CoreCycler-v0.7.9.2 (https://github.com/sp00n/corecycler) that I found on the Overclock forums. The 5th or 6th core consistently gives an error (only 2 tests). The BIOS is a full release by Asus, 3302 for the Dark Hero. However, it can pass OCCT for an hour without WHEA errors or any other errors. Everything is at default, even the memory.

5800X error at stock bios - edit.jpg


It seems ok but I'm developing a little suspicion. I've already visited the AMD Warranty page. Maybe it's the BIOS. I'm waiting for the next release, which should be fairly soon; there's a beta with the mouse fix (AGESA 1.2.0.2) listed on the support page. I'm hesitant about going back to an older BIOS but I might look into it if I have to.

What do you think?
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
You could try adding a small positive offset to the CPU voltage to see if the problems go away. If they do, and the offset is small and does not cause any trouble (overheating etc.), you can keep it that way, or RMA it.

If any reasonable positive offset does not help, the cause is elsewhere.
 
Nov 26, 2005
15,093
312
126
@Kocicak I'm not even sure how to do that. In the old days we just applied more CPU vcore. I guess that would help diagnose it, but my future concern is resale down the road. If it turns out to be nothing else but the chip, I can't resell it without disclosure, and that's a big hit on its value.

@Kenmitch I didn't apply any changes to the memory, that's what the board sets it to so the board's memory default. It's a 4133 cl19 1.35v kit that was intended for another board.
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
Well, different CPUs need different voltages to run, and it just may be possible that by chance your CPU was assigned a slightly lower voltage than it really needs. Applying a positive offset just increases Vcore by a defined value but lets everything else run automatically. I cannot tell you exactly where this setting is in your BIOS. I would try increasing the voltage by 0.0125 or 0.0250V and see if it helps.

As I said, if this is the problem, you can always RMA the CPU if you do not want a CPU with this "blemish".
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
I'm assuming you're running it at stock settings and hadn't made adjustments to the chip, in which case I'd RMA it even if you could theoretically fix it with a voltage tweak. If it's not even performing within the specifications that AMD sold it under, then I'd ask for a new one.

It isn't as though the 5800X is in any kind of short supply where you'd possibly be without a replacement for weeks or months, so I'd suggest returning it.
 
Nov 26, 2005
15,093
312
126
I'm wondering about the test and how it only uses SSE instructions, which core 5 or 6 seems to fail consistently when running at its max potential. This seems to be an edge case scenario as described in the readme.txt. I'm not familiar at all with instruction sets or how they play a part, but I'm curious how often this error would come up. Like I said, I've had some odd things happen since I bought the chip that I never had happen before on a different platform, and my main concern with this setup is stability. Am I concerning myself with an edge case scenario too much? Possibly.
 

blckgrffn

Diamond Member
May 1, 2003
9,123
3,057
136
www.teamjuchems.com
I'm wondering about the test and how it only uses SSE instructions, which core 5 or 6 seems to fail consistently when running at its max potential. This seems to be an edge case scenario as described in the readme.txt. I'm not familiar at all with instruction sets or how they play a part, but I'm curious how often this error would come up. Like I said, I've had some odd things happen since I bought the chip that I never had happen before on a different platform, and my main concern with this setup is stability. Am I concerning myself with an edge case scenario too much? Possibly.

I don't blame you at all. It's like a sore tooth or something: once you are aware of it, it needs to be poked and prodded until you're satisfied.

Personally, I always blame motherboard ghosts (and power supplies) for this type of phenomenon but I've seen one CPU carry issues like this from board to board in the past out of all the times it has been a bad motherboard, so it can happen.

You could start the RMA process with AMD just to see what they advise? I've never done that through them so I have no idea what it entails, but perhaps they could provide tools or tests to get at the issue in a more definitive way.
 
Nov 26, 2005
15,093
312
126
I think I might start with the voltage offset, then roll back the BIOS. If that doesn't resolve the issue, I have a local Micro Center where I could pick up another 5800X to troubleshoot. I just hate doing something like that. Returning a chip for troubleshooting purposes is unfair to the store and the next buyer. I do have another X570 + 5800X rig but I'm not willing to break it down just to troubleshoot.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,327
10,035
126
@Kenmitch I didn't apply any changes to the memory, that's what the board sets it to so the board's memory default. It's a 4133 cl19 1.35v kit that was intended for another board.
I would honestly try setting the DRAM Voltage to 1.35V, even if you only run them at 2133. Some of this "overclocker RAM" is only really qualified at XMP settings, and JEDEC can be hit or miss IF the DRAM chips onboard actually NEED more voltage to function properly. Technically, ALL DDR4 should be able to run at JEDEC speed/voltage specs, but I find that sometimes, "overclocker RAM" needs the VDIMM boost, even at JEDEC clocks, to function properly.

That said, one could argue that if the above is true, that the RAM is faulty in some way, and it may be.

Just be aware of that factor, and consider it in the same manner as tweaking the CPU's Core Voltage up a little bit.
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
Well, it should be kept in mind that CPUs today are overclocked from the factory to nearly 100% of their capability. In this situation, and with limited testing time in manufacturing, it can simply happen that some cores do not get as much voltage as they need under every imaginable type of load.

I do not think that a little positive offset Vcore support is a tragedy. As long as it does not worsen thermals (and consequently drop frequency) too much.
 
Nov 26, 2005
15,093
312
126
Well, it should be kept in mind that CPUs today are overclocked from the factory to nearly 100% of their capability. In this situation, and with limited testing time in manufacturing, it can simply happen that some cores do not get as much voltage as they need under every imaginable type of load.

I do not think that a little positive offset Vcore support is a tragedy. As long as it does not worsen thermals (and consequently drop frequency) too much.

But if this is more common than we know it'll create problems with resale. I thought my chip was fine until I ran this. I don't even know if my other 5800X in my gaming rig is 100% stable with this test.
 

Justinus

Diamond Member
Oct 10, 2005
3,173
1,515
136
Don't forget, this is edge case scenario. A core at full speed doing SSE instructions. My question is when does that usually come up?

I'm from the old school camp of if there is any situation it's not stable, it's not stable.

I use the system for work. If it can't pass stress testing at bone stock, the system is worthless.

It just happens that, because boosting is so heavily load-dependent, using SSE in Prime95 allows the core to boost higher than AVX does, so it runs at a higher clock tier and stresses those higher clock tiers. That doesn't mean it can only error using SSE; any number of other lighter loads that reach those high boost tiers may experience errors.

It's also not surprising my two best cores are failing - they are clearly driven to much higher clockspeeds at the same boost voltages as other cores.

I haven't been able to complete a whole pass since the script keeps having an error when switching cores - I enabled it to restart prime95 between cores, increased the wait time between switching to 30 seconds, and increased the test time to 10 minutes.

Looking forward to seeing a full pass so I can try bumping vcore offset a little.
 

Iron Woode

Elite Member
Super Moderator
Oct 10, 1999
30,876
12,383
136
ENGLISH
-------
This little script will run Prime95 with only one worker thread and sets the affinity of the Prime95 process
alternating to each physical core, cycling through all of them. This way you can test the stability of your Curve
Optimizer setting for each core individually, much more thoroughly than e.g. with Cinebench or the Windows Repair, and
much easier than manually setting the affinity of the process via the Task Manager.
It will still need a lot of time though. If for example you're after a 12h "prime-stable" setup which is common for
regular overclocks, you'd need to run this script for 12x12 = 144 hours on a 5900X with 12 physical cores, because
each core is tested individually, and so each core also needs to complete this 12 hour test individually. Respectively,
on a 5600X with its 6 physical cores this would be "only" 6x12 = 72 hours.
Unfortunately such an all-core stress test with Prime95 is not effective for testing Curve Optimizer settings, because
the cores cannot boost as high if all of them are stress tested, and therefore you won't be able to detect instabilities
that occur at a higher clock speed. For example, with my CPU I was able to run a Prime95 all-core stress test for
24 hours with an additional Boost Override of +75 MHz and a Curve Optimizer setting of -30 on all cores. However, when
using this script, and with +0 MHz Boost Override, I needed to go down to -9 on one core to have it run stable (on the
other hand, another core was still happy with a -30 setting even in this case).
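
The runtime arithmetic above (cores x hours per core) can be sketched in a few lines of Python. This is only an illustration and not part of the script; the function name total_runtime_hours is made up for this example, and the 8-core figure for the 5800X is added here for comparison.

```python
def total_runtime_hours(physical_cores: int, hours_per_core: float) -> float:
    """Total wall-clock hours when every core is tested one at a time.

    Hypothetical helper for illustration only; it is not part of CoreCycler.
    """
    if physical_cores < 1 or hours_per_core < 0:
        raise ValueError("need at least one core and a non-negative runtime")
    # Cores are tested sequentially, so the per-core runtime simply adds up.
    return physical_cores * hours_per_core


if __name__ == "__main__":
    # Core counts: 5600X = 6, 5800X = 8, 5900X = 12 physical cores.
    for name, cores in (("5600X", 6), ("5800X", 8), ("5900X", 12)):
        hours = total_runtime_hours(cores, 12)
        print(f"{name}: {cores} cores x 12 h = {hours} h")
```

The same function also gives the length of a single quick pass: with a per-core runtime of 0.1 h (6 minutes), an 8-core chip needs a bit under an hour to cycle through every core once.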

When you start the script for the first time, it will copy the included config.default.ini to config.ini, in which you
then can change various settings, e.g. which mode Prime95 should run in (SSE, AVX, AVX2, CUSTOM, where SSE causes the
highest boost clock, because it's the lightest load on the processor of all the settings), how long an individual core
should be stressed for before it cycles to the next one, if certain cores should be ignored, etc. For each setting
there's also a description in the config.ini file.

As a starting point you could set the Curve Optimizer to e.g. -15 or -20 for each core and then wait and see which core
runs through fine and which throws an error. Then you could increase the setting for those that have thrown an error by
e.g. 2 or 3 points (e.g. from -15 to -13) and decrease those that were fine by 2 or 3 further into the negative (-15 to
-17). Once you've crossed a certain point however there is no way around modifying the value by a single point up/down
and letting the script run for a long time to find the very last instabilities.
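
One round of the coarse adjustment described above could be sketched like this in Python. It is a rough illustration under stated assumptions, not something the script does for you: the function name next_offsets, the dictionary layout, and the lower limit of -30 (taken from the -30 setting mentioned earlier) are all assumptions made for this example.

```python
def next_offsets(offsets, failed_cores, step=2, floor=-30):
    """Return new Curve Optimizer offsets after one test round.

    offsets      -- dict mapping core number to its current offset (e.g. -15)
    failed_cores -- set of core numbers that threw an error in this round
    step         -- how many points to move per round (2 or 3 in the text)
    floor        -- assumed lowest allowed offset (hypothetical limit of -30)
    """
    updated = {}
    for core, value in offsets.items():
        if core in failed_cores:
            # Unstable core: back off toward zero, e.g. -15 -> -13.
            updated[core] = value + step
        else:
            # Stable core: push further into the negative, e.g. -15 -> -17.
            updated[core] = max(floor, value - step)
    return updated


if __name__ == "__main__":
    current = {0: -15, 1: -15, 2: -15}
    print(next_offsets(current, failed_cores={1}))
```

Once the failures get rare, the step would be dropped to 1 and each round would run much longer, as the paragraph above says.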

By the way, it is intended that only one thread is stressed for each core if Hyperthreading / SMT is enabled, as the
boost clock is higher this way, compared to if both (virtual) threads would be stressed. However, there is a setting
in the config.ini to enable two threads as well.
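
To make the "one thread per physical core" idea concrete, here is a small Python sketch of how an affinity bitmask for a single core could be built. It assumes the common enumeration where physical core n maps to logical processors 2n and 2n+1 when SMT is on; that mapping and the function name affinity_mask are assumptions for this example, and the script itself may compute its masks differently.

```python
def affinity_mask(core: int, smt_enabled: bool = True, both_threads: bool = False) -> int:
    """Bitmask selecting the logical processor(s) of one physical core.

    Assumes logical processors 2*core and 2*core + 1 belong to physical
    core `core` when SMT is enabled (an assumption, not a guarantee).
    """
    if core < 0:
        raise ValueError("core number must be non-negative")
    if not smt_enabled:
        # Without SMT, logical processor n is simply physical core n.
        return 1 << core
    first_logical = 2 * core
    if both_threads:
        # Select both sibling threads of the core.
        return 0b11 << first_logical
    # Default: only the first thread, so the core can boost higher.
    return 1 << first_logical


if __name__ == "__main__":
    # Zero-based core 5 with SMT on: only logical processor 10 is selected.
    print(bin(affinity_mask(5)))
```

A mask like this is what gets handed to the operating system when pinning a process to a core; stressing only the first thread leaves the sibling thread idle, which matches the higher-boost behaviour described above.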
 

thigobr

Senior member
Sep 4, 2016
231
165
116
I sent my 5950X to RMA last week because of this issue plus WHEAs and few random crashes. The computer would be stable for days and then boom, some crash or non fatal WHEA logged. My old 3700X was fully stable on the same machine as is my temporary 1700.

In my case core #0 would fail the single core load test... It's definitely an issue as the CPU should be stable at stock settings anyways no matter their programmed burst behavior.

Positive offset just delayed PRIME95 rounding errors but after some time it would still happen. This kind of instability is unacceptable for a productivity machine...
 

Justinus

Diamond Member
Oct 10, 2005
3,173
1,515
136
I sent my 5950X to RMA last week because of this issue plus WHEAs and few random crashes. The computer would be stable for days and then boom, some crash or non fatal WHEA logged. My old 3700X was fully stable on the same machine as is my temporary 1700.

In my case core #0 would fail the single core load test... It's definitely an issue as the CPU should be stable at stock settings anyways no matter their programmed burst behavior.

Positive offset just delayed PRIME95 rounding errors but after some time it would still happen. This kind of instability is unacceptable for a productivity machine...

Do you have an ETA for receiving a replacement?
 

Justinus

Diamond Member
Oct 10, 2005
3,173
1,515
136
I stopped this after 4 I had no failures.

A quick sanity check seems reasonable to me. I don't know about the OP but my two failing cores consistently fail within about 30 seconds of the test starting on them.

They would never make it through the 6 minute test time, much less multiple passes.