Question Faulty SSD methinks, any further suggestions please (solved)

mikeymikec

Lifer
May 19, 2011
19,994
14,326
136
PC spec:

Core i3-4330
4GB DDR3
ASUS H97M-PLUS
previous SSD: Samsung 850 PRO 256GB SATA
Integrated graphics
previous OS: Windows 8.1

It was running perfectly stably on Win81. It needed a drive capacity upgrade and Win10 installing, so I went with a Samsung 970 Evo Plus M.2 1TB drive. I updated the BIOS, checked that the M.2 drive was detected, and started Win10 setup from USB. Setup has crashed many times at various points, never once getting as far as the second interactive stage (i.e. creating a user, privacy settings). Asus Anti-Surge has been tripped a number of times, but even with that and fast boot disabled (I found fast boot to be problematic during install on my own Haswell-era Asus board), I still got nowhere.

I reconnected the previous SSD and booted into Win81 (with the M.2 drive also connected). I've run a full filesystem check on the SSD, SMART data looks fine, I ran an ATTO benchmark to see whether the system would fail with a lot of writing to the SSD, I ran prime95 for ten minutes (I figured that an OS install does not cause 10 minutes of CPU saturation), and I tested the RAM for 10 hours overnight.

I then tried out a spare M.2 NVMe drive and that has got through two out of two installs of Win10 just fine whereas the 970 hasn't managed to get through even one out of many attempts.
 

mikeymikec

I second RMA'ing the drive. It looks like you did all the right troubleshooting steps to narrow it to just the 970. One last thing to try is forcing the M.2 slot to PCIe 2. Just for giggles?

I can't find a setting explicitly for M.2; one setting seemingly controls the entire PCIe system and another controls the PCIEX16_1 slot. I'm giving the former a try.

- edit - hum, it rebooted even quicker than usual :)
 

mikeymikec

Sssssssugar. The replacement 970 Evo Plus just did exactly the same thing.

- edit - The only lead I had left was Asus Anti-Surge being tripped, as unlikely as it seemed. I connected a spare PSU with the first 970 Evo Plus: no problem whatsoever. I'm kicking myself for not trying this sooner, but to be fair, it seemed a lot more likely to be the SSD.
 

mikeymikec

The fallout from this is interesting though. When the customer wanted their machine upgrading, M.2 seemed the logical choice, partly because it's the better-performing option to carry forward when this computer eventually needs replacing (the customer chose the capacity of this drive with the long term in mind too). But I would be really surprised if I had hit the same issue with a 1TB SATA SSD. Maybe the PSU's days were numbered anyway and for some reason this situation brought it on sooner?

I wouldn't have guessed that there would be a noteworthy difference between say an M.2 drive and a SATA SSD in terms of power draw, let alone between two different M.2 drives. The 980 PRO apparently uses less than 10W under load, and this 970 Evo Plus is never going to be pushed to its limits (I would have thought) with a Haswell board surely? AT's review of the original 970 Evo Plus reckons a max of 6W draw. Maybe lots of SSD IO causes a greater power demand than raw throughput? I wouldn't have thought so.

Maybe there's an element of the M.2 spec with regard to power requirements which isn't relevant for the spare M.2 drive I tried but is for the 970 Evo Plus? I'm thinking for example how processor C6 support which was rolled in with Haswell was problematic for older PSUs.
 
Jul 27, 2020
24,268
16,926
146
An NVMe SSD would keep the CPU less idle, so the CPU power draw would go up. The memory subsystem would also be more active because of the extra data flowing in from the faster transfers. It would be interesting to see how total system power draw compares between an HDD, a SATA SSD and an NVMe SSD.
 

mikeymikec

Interesting idea, but surely if it was about overall power consumption then the memtest I ran overnight or the prime95 job should have exposed such a weakness. It also mildly surprises me that the ATTO benchmark apparently wasn't enough either; it causes high CPU usage every time during the smaller transfer-size tests, so that would have been CPU + M.2 stress.

The other thing that surprises me about power having any role in this situation was the randomness of the crashes; I had seen it at least once that immediately after committing the partitioning (which was me just deleting the partitions) and before the stage with ticks on the screen, the system rebooted. Surely at that point Setup was just committing the new partition layout and probably doing a quick format of the file systems; how much system power demand is that going to take? On other occasions it was crashing during the 'ticks' stage, on other occasions it fell over after the first normal reboot and during 'getting ready / starting services'.

Over the last ten years I've become increasingly dubious about just how much stress a Windows install puts on modern hardware. If it falls over during setup then you're almost guaranteed to have a hardware issue, but I've seen enough problems surface post-setup to make me believe that the more tests I perform post-setup, the better. Again, I'm surprised that this issue was only exposed during setup. Logically I'd expect some kind of test that thrashes the SSD to have exposed the issue, but what? During Win10 setup (at least with the 1803 installer I use), only about 10GB of data is written. I purposely use an old installer because a lot more writes occur during big feature updates (going from 1803 to 22H2, for example, results in about another 60GB of data written).
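
For what it's worth, one kind of "thrash the SSD" test could be sketched like this in Python. It is only a rough illustration, not something anyone in the thread actually ran: the function name write_and_verify, the file name and the sizes are all made up for the example. It writes random data, forces it out to the drive, reads it back and compares checksums. Bear in mind that the symptom here was a spontaneous reboot rather than corrupted data, so a marginal PSU would more likely show up as the script never finishing than as a mismatch.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, total_bytes, block_size=1024 * 1024):
    """Write random data to `path`, flush it to disk, read it back,
    and return True if the SHA-256 digests of both passes match.

    Hypothetical helper for illustration only.
    """
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(block_size, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the drive

    # Note: this read may be served from the OS cache rather than the drive,
    # so it is a weaker check than a read after a reboot or cache flush.
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            read_back.update(chunk)

    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Tiny demo in a temp directory; for a real test, point the path at the
    # drive under suspicion and use a much larger size (assumption: tens of GB).
    with tempfile.TemporaryDirectory() as tmp:
        ok = write_and_verify(os.path.join(tmp, "stress.bin"), 8 * 1024 * 1024)
        print("match" if ok else "MISMATCH")
```

Looping this with a size comparable to the ~10GB that setup writes would at least mimic the write volume, though not the mix of CPU, memory and drive activity that setup produces.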
 
Jul 27, 2020
24,268
16,926
146
When I was tuning the memory timings for my EXPO RAM kit on Z790 mobo, I was really surprised to see the system boot at really low timings. It got me very excited when the final timings were something like 26-24-24-50. Then I started the Windows setup and it would crash during the copying files phase. Had to relax the timings to 28-30-30-60 and it's stable enough to work most of the time. Sometimes it will reboot. Sometimes I will be forced to reboot because Windows will start acting weird or become unresponsive. But I'm not using the system for anything critical and it's stable enough for my needs.
 

mikeymikec

Maybe it is just the old board being flaky?

I'm considering doing another clean install on this drive once I've stuck a new PSU in as yet another test (the current install has also since done a 1803->22H2 feature update successfully and I copied nearly 200GB of customer data to it). That'll be a 100% failure rate for about 8 install attempts followed by the PSU swap and then two (presumably) successful attempts.
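
As a rough sanity check on how convincing that tally would be, here is a back-of-envelope sketch. It assumes (my assumption, not something measured) that the PSU swap made no difference and that the ten outcomes could have landed in any order, then asks how often all eight failures would come before both successes by pure chance. The function name is made up for the example, and the second post-swap success is still hypothetical at this point.

```python
from math import comb


def chance_all_failures_first(failures, successes):
    """Probability that, if the outcomes were shuffled at random,
    every failure lands before every success.

    There are C(failures + successes, successes) equally likely ways to
    place the successes, and only one of them puts them all at the end.
    """
    return 1 / comb(failures + successes, successes)


# 8 failed installs on the old PSU, then 2 successes on the replacement
print(f"{chance_all_failures_first(8, 2):.3f}")  # 1/45, about 0.022
```

So under those assumptions the pattern would turn up by luck only about 2% of the time, which points fairly firmly at the PSU, though ten attempts is still a small sample.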

I'm kinda glad I didn't manage to convince the customer to upgrade the processor to an i5 at the same time; chances are I would have thrown the whole lot in at once and I would have been tearing my hair out at so many possibilities :) As it was I convinced them to upgrade memory to 8GB, but the moment I started having problems I pulled the memory from the testing procedure.
 

In2Photos

Platinum Member
Mar 21, 2007
2,449
2,691
136
When I was tuning the memory timings for my EXPO RAM kit on Z790 mobo, I was really surprised to see the system boot at really low timings. It got me very excited when the final timings were something like 26-24-24-50. Then I started the Windows setup and it would crash during the copying files phase. Had to relax the timings to 28-30-30-60 and it's stable enough to work most of the time. Sometimes it will reboot. Sometimes I will be forced to reboot because Windows will start acting weird or become unresponsive. But I'm not using the system for anything critical and it's stable enough for my needs.
You should wait until after Windows is installed before you start messing with ram timings. I don't even enable XMP. I leave everything default until Windows is installed and updated.
 
Jul 27, 2020
24,268
16,926
146
My crazy timings would have trashed the Windows installation, needing to install again. But one thing is clear, if you are bored and want to kill time, start tuning RAM. You won't believe how fast time will fly then.
 

tcsenter

Lifer
Sep 7, 2001
18,818
484
126
This board implements some sort of M.2 socket that supports both SATA and PCIe interface devices. I see there are a few BIOS updates listed as 'improve support for NVMe' or 'SSD' drives. The ASUS device compatibility test list only includes two NVMe drives. This is probably a glitch/bug between the BIOS and that particular drive.
 

BoomerD

No Lifer
Feb 26, 2006
65,671
14,057
146
I browsed the manual. Is there anything connected to the SATA 5 and 6 ports? Those share bandwidth with the M.2 slot. COULD be a conflict there. Also, there's a "switch" in the BIOS that lets you set the priority between M.2 and SATA (section 2.6.3).

I attempted to copy the info...the pdf is protected. If that's not the issue...then I agree. RMA the drive.
 

mikeymikec

I've updated the thread title because (as I already indicated), the problem has been fixed already. Since putting in an actually new PSU I also made the drive re-copy all the customer's data as an additional test and everything has been fine.
 

mikeymikec

So what did you conclude or what 'fixed' it?

The PSU. One of the symptoms was that ASUS Anti-Surge was being triggered, so after I tried another 970EP which had the same problem, I swapped back to the first 970EP and connected up a spare power supply, no more problem.

A bit of googling suggests that M.2 uses the 3.3V rail. Maybe the old PSU was ailing somewhat on that rail, and the tests I performed weren't enough to push it? I would have thought the ATTO benchmark would have tested that rail while writing, though.
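
To put a rough number on that, here is the arithmetic as a small Python sketch. It assumes the drive draws all of its power from the 3.3V supply at the M.2 connector and uses the wattage figures quoted earlier in the thread (about 6W max for the 970 Evo Plus, under 10W for the 980 PRO). Whether this board feeds the slot straight from the PSU's 3.3V rail is an assumption too, and the helper name is made up for the example.

```python
def rail_current_amps(power_watts, rail_volts=3.3):
    """Current drawn from a supply rail for a given power: I = P / V."""
    return power_watts / rail_volts


# Figures quoted earlier in the thread, not measurements from this machine
for label, watts in [("970 Evo Plus, ~6 W max", 6.0),
                     ("980 PRO upper bound, 10 W", 10.0)]:
    print(f"{label}: {rail_current_amps(watts):.2f} A")
# 970 Evo Plus, ~6 W max: 1.82 A
# 980 PRO upper bound, 10 W: 3.03 A
```

Even at the 10W upper bound that is only about 3A, so the steady load itself looks modest. If that rail was the problem, the PSU was presumably marginal rather than simply overloaded.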