Question Need help figuring out whether MB/RAM/CPU is faulty

clpalmer

Junior Member
Mar 11, 2021
6
0
6
Hi!

I recently purchased parts to build a new system, including:

i7-10700KF
ASUS PRIME Z590M-PLUS MB
16G (2 x 8G) DDR4-3600 G-Skill Ripjaws V
Rosewill Glacier 600W PSU

I'm running Linux on it, with no BIOS changes other than enabling XMP for the 3600 RAM, and I started noticing frequent crashes in the Chrome browser and in games. I ran memtest86 and it threw some RAM errors pretty quickly, so I called the store and started an RMA. Before shipping the sticks back, though, I wanted to take a closer look, as there were some other oddities.
I have another system, also running Linux, that's an:
i5-7600K
GIGABYTE GA-H270N-WIFI MB
16G (2 x 8G) DDR4-2400 Ballistix

I pulled a RAM stick from that system, put it in the new one, and got some errors. I tried all sorts of combinations of one or two sticks of new RAM in various slots and eventually concluded that the new RAM was faulty, since the old RAM ran memtest in the proper single-stick slot for a few hours with no errors. (I attributed the one fault I did see with the old RAM to wrong BIOS settings from when I was messing with the new RAM, but I can't rule out an actual fault that just shows up more rarely at the lower RAM speed.)

The other odd things I found are:

- The kernel reports an MCE consistently on boot, but I can't seem to find info on what exact problem that is:
Code:
Mar 10 12:30:45 server kernel: [    0.227974] mce: [Hardware Error]: Machine check events logged
Mar 10 12:30:45 server kernel: [    0.227979] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee0000000040110a
Mar 10 12:30:45 server kernel: [    0.227983] mce: [Hardware Error]: TSC 0 ADDR fef20200 MISC 43880000086
Mar 10 12:30:45 server kernel: [    0.227991] mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1615408240 SOCKET 0 APIC 0 microcode e2

- memtest86 indicates horrible cache and RAM speeds on the new i7 compared to the old i5, but I don't know what other factors are involved or whether that's reliable info

Old system, with 1 stick of the old RAM
Code:
PassMark MemTest86 V9.0 Free
Intel Core i5-7600K @ 3.80GHz
Clk/Temp: 3793 MHz / 76C
L1 Cache : 64K 247.4 GB/s
L2 Cache : 256K 111.0 GB/s
L3 Cache : 6144K 61.6 GB/s
Memory: 8085M 14.9 GB/s
RAM Info : PC4-19200 DDR4 XMP 2400MHZ / 16-16-16-39 / Cruc
CPUS Found: 4
CPUS Started: 4
CPUS Active: 4

New system with 1 stick of the old RAM
Code:
PassMark MemTest86 V9.0 Free
Intel Core i7-10700KF @ 3.80GHz
Clk/Temp: 3832 MHz / 24C
L1 Cache : 64K 42.8 GB/s
L2 Cache : 256K 18.2 GB/s
L3 Cache : 16384K 9.8 GB/s
Memory: 8073M 6288 MB/s
RAM Info : PC4-19200 DDR4 XMP 2400MHZ / 16-16-16-39 / Cruc
CPUS Found: 16
CPUS Started: 8
CPUS Active: 8

Then I took the new RAM and put it in the old system and it ran a bit better, but did find errors in test 6 after a half-hour or so:

Old system with 2 sticks of new RAM:
Code:
PassMark MemTest86 V9.0 Free
Intel Core i5-7600K @ 3.80GHz
Clk/Temp: 3793 MHz / 65C
L1 Cache : 64K 245.7 GB/s
L2 Cache : 256K 111.2 GB/s
L3 Cache : 6144K 63.5 GB/s
Memory: 15.8G 20.2 GB/s
RAM Info : PC4-28800 DDR4 XMP 3602 MHZ / 16-19-19-39 /G Sk RAM Temp
CPUS Found: 4
CPUS Started: 4
CPUS Active: 4

New system with 2 sticks of new RAM:
Code:
PassMark MemTest86 V9.0 Free
Intel Core i7-10700KF @ 3.80GHz
Clk/Temp: 3832 MHz / 28C
L1 Cache : 64K 43.1 GB/s
L2 Cache : 256K 18.2 GB/s
L3 Cache : 16384K 9.8 GB/s
Memory: 15.8G 6403 MB/s
RAM Info : PC4-28800 DDR4 XMP 3602 MHZ / 16-19-19-39 / G Sk
CPUS Found: 16
CPUS Started: 8
CPUS Active: 8

So my conclusion now is that at least one stick of the new RAM is faulty, as it's reporting errors on both the new and old system. However, what I don't get is:
  • Why there's an MCE error on boot
  • What that error means and if it's a concern at all
  • Why my cache and RAM speeds are so horrible on the new i7 compared to the old i5
  • Are there any bios settings that would affect cache/RAM speeds? I tried both Auto and XMP I and both seem to perform the same and configure it for 3600MHZ, 16-19-19-19 1.35V which is exactly as spec'd on the sticks.
  • Are there any other tests I can do (from linux or from a boot stick) that could help pinpoint MB or CPU issues if it's not already evident from the above?
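To put the gap between the two MemTest86 runs with the new RAM into numbers, here is a minimal sketch that divides the i5 figures by the i7 figures. The values are copied from the readouts above; treating 6403 MB/s as 6.403 GB/s (1 GB = 1000 MB) is an assumption about how MemTest86 scales its units.

```python
# MemTest86 throughput figures quoted above, in GB/s, both with the new G.Skill kit.
old_i5 = {"L1": 245.7, "L2": 111.2, "L3": 63.5, "Memory": 20.2}
# 6403 MB/s converted with 1 GB = 1000 MB (an assumption about the tool's units).
new_i7 = {"L1": 43.1, "L2": 18.2, "L3": 9.8, "Memory": 6403 / 1000}


def slowdown(reference, measured):
    """Return how many times slower `measured` is than `reference`, per level."""
    return {level: reference[level] / measured[level] for level in reference}


ratios = slowdown(old_i5, new_i7)
for level, ratio in ratios.items():
    print(f"{level:>6}: i7 is {ratio:.1f}x slower than the i5")
```

All three cache levels come out slower by a similar factor, and caches do not depend on the DIMMs, so the slowdown does not look like a RAM-only problem. That is an observation from these four numbers, not a diagnosis.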
To add to my concerns, here's a geekbench score for it:

It scores 874 single and 3833 multi, which is far lower than the typical 1200-1300 single and 6k-10k multi scores that other systems get with the same CPU...

Can anyone with more experience provide some insight? Thanks in advance!
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
28,446
20,452
146
The MCE is hardware, and you are doing the right thing by starting with the RAM. If your RAM is the F4-3600C16D-32GTZNC kit, then it is on the QVL. If the new kit throws errors too, the board is the next thing to replace. You are obviously an experienced user, so as you know, sometimes it comes down to the process of elimination. I can also understand your indecision about the culprit, since the old RAM errored too. That points more toward the board, but the RAM is smaller and easier to RMA, so you may as well start there.

Please let us know how it turns out, and I hope it gets resolved quickly so you can get to using your new system. :beercheers:
 

Steltek

Diamond Member
Mar 29, 2001
3,042
753
136
Probably a stupid question, but have you assembled and tested the board outside the case to see if it makes a difference? A grounding issue between the board and the case can also cause these types of problems.
 

clpalmer

Not a stupid question at all. That kind of info is exactly why I'm here. Gonna RMA the RAM as it fails in the other machine as well and the other machine's ram works fine here, but I will also do a bench test and see how it goes. Thanks!
 

clpalmer

@Steltek - Tried a bench test out of the case and saw similar results to in-case.

After that, however, I went digging through the BIOS settings, flipping a bunch of things around, and saw a huge performance increase. After a process of elimination, it seems the Intel Speed Shift setting is causing the issue. I reloaded the BIOS optimized defaults and then disabled only Speed Shift (left SpeedStep on Auto and Turbo Mode enabled), and now the Geekbench score is far closer to being in line with other results for the same CPU:


and the memtest cache and RAM speeds have gone way up as well:
Code:
L1 Cache: 64K 200.3 GB/s
L2 Cache: 256K 89.2 GB/s
L3 Cache: 16384K 45.0 GB/s
Memory: 15.8G 17.0 GB/s

Still not exactly where I'd expect them to be, considering my much older i5-7600K system gets a fair bit better results, but light years better than with Speed Shift enabled.

Does anyone know whether this is indicative of a bad CPU or if there's anything further I should do to debug it?
 

VirtualLarry

No Lifer
Aug 25, 2001
56,326
10,034
126
Does anyone know whether this is indicative of a bad CPU
My gut feeling is, yes, with the caveat that you're using a new enough version of Win10 to support SpeedShift.

I recently built a 10th-gen Pentium Gold G6400 rig, with an ASRock B560M-HDV mobo, and it had SpeedShift (and SpeedStep) enabled by default. Not sure if that was the ticket, or the 3200 RAM was the star, but it was running really fast and smooth at the desktop. No games tested on it.

IMHO, you should get a spare SSD and test with a fresh install. Moving an install from one platform to another can wreak havoc on subtle power-management settings deep in the OS. (Only applies to Windows.)
 

DAPUNISHER

That's something I skimmed past, he is on a Linux distro.
 

VirtualLarry

Do you need to "manually config" Linux distros to take advantage of "SpeedShift"? Like somehow indicate the preferred "CPU governor" in one of those /proc/sys devices or something?
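As far as I know, the kernel's intel_pstate driver uses Speed Shift (HWP) on its own when the CPU advertises it, so no manual setup should normally be needed. The relevant knobs live under /sys/devices/system/cpu rather than /proc/sys. Below is a minimal sketch for checking what the kernel is actually doing. The function names and the `root` parameter are made up for illustration, and any file missing on a given kernel simply shows up as None.

```python
from pathlib import Path


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def cpu_scaling_summary(root="/"):
    """Collect frequency-scaling facts for CPU 0 from procfs/sysfs under `root`."""
    root = Path(root)
    cpu = root / "sys/devices/system/cpu"
    cpuinfo = read_text(root / "proc/cpuinfo") or ""
    flags = set()
    for line in cpuinfo.splitlines():
        # The first "flags" line lists the CPU feature flags; "hwp" means Speed Shift.
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            break
    return {
        "hwp_flag": "hwp" in flags,
        "pstate_status": read_text(cpu / "intel_pstate/status"),
        "driver": read_text(cpu / "cpu0/cpufreq/scaling_driver"),
        "governor": read_text(cpu / "cpu0/cpufreq/scaling_governor"),
        "epp": read_text(cpu / "cpu0/cpufreq/energy_performance_preference"),
        "cur_khz": read_text(cpu / "cpu0/cpufreq/scaling_cur_freq"),
        "max_khz": read_text(cpu / "cpu0/cpufreq/cpuinfo_max_freq"),
    }


if __name__ == "__main__":
    for key, value in cpu_scaling_summary().items():
        print(f"{key}: {value}")
```

If `cur_khz` stays far below `max_khz` while a benchmark is running, that would point at frequency scaling rather than at the RAM.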
 

clpalmer

Yep, latest bios on it.

To add to this: after disabling Speed Shift, memtest ran happily for a couple of hours on the new sticks with no errors.

Still can't get myself past both the MCE error that always pops up (and that I can't figure out how to decode to see what it means...):
Code:
Mar 10 12:30:45 server kernel: [    0.227974] mce: [Hardware Error]: Machine check events logged
Mar 10 12:30:45 server kernel: [    0.227979] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee0000000040110a
Mar 10 12:30:45 server kernel: [    0.227983] mce: [Hardware Error]: TSC 0 ADDR fef20200 MISC 43880000086
Mar 10 12:30:45 server kernel: [    0.227991] mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1615408240 SOCKET 0 APIC 0 microcode e2

and the fact that enabling or disabling speed shift, which 'should' be fine on auto, has such a dramatic and consistent effect on memtest cache/ram speeds and geekbench CPU benchmarks.
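
On decoding the MCE: the 64-bit value after "Bank 6:" is the raw IA32_MCi_STATUS register, and its top bits are architectural flags documented in the machine-check chapter of the Intel SDM. Here is a minimal sketch that splits the quoted value into those flags and the two 16-bit code fields. The function and dictionary names are made up for illustration, and it deliberately does not try to interpret the model-specific part.

```python
# Bit positions in IA32_MCi_STATUS, per the Intel SDM machine-check chapter.
FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status):
    """Split a raw MCi_STATUS value into its flag bits and error-code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_error_code"] = status & 0xFFFF
    return fields


# Status value from the kernel log quoted above.
decoded = decode_mci_status(0xEE0000000040110A)
for name, value in decoded.items():
    print(name, value if isinstance(value, bool) else hex(value))
```

For this value the sketch reports VAL, OVER, UC, MISCV, ADDRV and PCC set, EN clear, and an MCA error code of 0x110a. What that code means for bank 6 on this particular CPU is model-specific, so a tool such as mcelog or rasdaemon is still the better way to get a full interpretation.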
 

Steltek

This one might actually be worth a contact with ASUS support, if you can get past the first line support drones. Speed Shift has been around long enough (since 2015 with the Skylake CPUs, if I remember right) that there should be at least some level of default support baked into the Ubuntu 20.04.2 Linux kernel.

The board itself is new enough with not many BIOS releases that their engineers might need to take a look at it for a UEFI BIOS fix.

For comparison purposes, it might also be worth downloading Win10 and doing a throw-away temp install just to see if the same issues apply in Windows.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,400
2,436
146
If the RAM is erroring in either system in MT86, it sounds like a RAM issue, most likely regardless of BIOS settings. That said, what is Speed Shift exactly?
 

Steltek

That said, what is speedshift exactly?

It is the Skylake-era evolution of Intel's CPU dynamic frequency scaling technology (the older mechanism is SpeedStep). The major difference between the two is that with Speed Shift the CPU itself selects its frequency and voltage based on hints from the OS, rather than the OS requesting each P-state, so the two work in closer coordination and frequency changes happen faster.
 

tompetrillo

Junior Member
Apr 30, 2021
3
0
6
Did you resolve your issues? I am experiencing strangely similar behavior although I do not have faulty memtest results. My ram also reports super slow speeds in memtest. I get the same error message in fedora. Any advice?
 

clpalmer

@tompetrillo - Sadly no, didn't resolve it. Ended up RMA'ing the RAM (as it showed faulty on memtest on both the problematic PC and another one) as well as the motherboard and CPU. They didn't have stock on the CPU or motherboard any more, so got a refund and repurchased a 10700K instead of KF, and an ASRock (similar class) motherboard. With that setup, I no longer have any issues with memtest speeds, 0 errors on a long memtest run, great benchmark scores, no hanging early in boot when using nvidia 460 drivers, no MCE error, and no other issues.

No clue if it was the combination of that motherboard and CPU, or if either was defective, so can't offer much advice there, other than if it's not working and you can return the MB/CPU, maybe look into that.
 

tompetrillo

I think you are right. There's not enough time left in my life to figure out which piece is defective. I've already sunk a lot of time into diagnosing this, and it's not easy if you don't have a duplicate set of functioning components. I'm going to RMA the mobo/CPU and buy replacements.