
NVME drives do go bad or....

Markfw

Moderator Emeritus, Elite Member
My 7452 EPYC box went to la-la land... I could boot the Windows install I did, but that was it. Then I powered down and rebooted, and Ubuntu showed up, but I got nothing but I/O errors. Not a big heatsink, just a tiny one, on an ADATA NVMe. So I am in the process of re-installing Linux, and BOINC, etc... I will never put a small heatsink on one again, as I am sure that was the problem.
 
Time was I could tell if a solid state component was running out of spec by laying a finger on it; try that now and you might get a blister.
 
I am using an old B350 motherboard. I can tell you once in a while my PC will hang and not boot with an ADATA SX8100. It's rare but a reboot fixes the problem. I can't guarantee that it's the NVMe drive.
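One way to tell whether intermittent hangs are really the drive is to check the NVMe SMART health log, e.g. with smartmontools (`smartctl -j -a /dev/nvme0`). A minimal sketch, assuming smartmontools' JSON schema with the `nvme_smart_health_information_log` section; field names may vary by version:

```python
import json

def nvme_health_summary(smartctl_json: str) -> dict:
    """Flag worrying fields in `smartctl -j -a /dev/nvme0` output.

    Assumes smartmontools' JSON field names; adjust if your version differs.
    """
    data = json.loads(smartctl_json)
    log = data.get("nvme_smart_health_information_log", {})
    return {
        "critical_warning": log.get("critical_warning", 0) != 0,
        "media_errors": log.get("media_errors", 0),
        "percentage_used": log.get("percentage_used", 0),
        "temperature_c": log.get("temperature"),
    }

# Fabricated sample report for a healthy drive:
sample = json.dumps({
    "nvme_smart_health_information_log": {
        "critical_warning": 0,
        "media_errors": 0,
        "percentage_used": 3,
        "temperature": 41,
    }
})
print(nvme_health_summary(sample))
```

A nonzero `critical_warning` or a climbing `media_errors` count would point at the drive rather than the board or BIOS.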
 
I can tell you once in a while my PC will hang and not boot with an ADATA SX8100. [...]
But this is a server motherboard...... a $500 motherboard.
 
I don't even know if my main rig's NVMe drive has a heatsink, can't remember what it is and I'm hoping my sig will show it, lol. [edit] Yep 🙂
 
I'm not sure of the exact failure rate, but out of probably 800-1000 computers at work that have NVMe drives, we've had to replace at least 4 or 5 dozen of the drives due to partial or complete failure...

So yes, they can and do fail sometimes.
 
The numbers are just rough estimates off the top of my head, but yes, the failure rate is annoyingly high. Most of the failed drives have been from Lite-On and Adata, but a surprising number have also been from Intel and Samsung. Not "pro" drives, but still more than expected from those companies.
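For a rough sense of scale, those off-the-cuff numbers work out to a mid-single-digit failure rate. A quick bound, using only the estimates from the post (800-1000 machines, "4 or 5 dozen" failed drives):

```python
# Rough bounds from the estimates above: 800-1000 machines,
# "4 or 5 dozen" (48-60) failed drives.
fleet_low, fleet_high = 800, 1000
failures_low, failures_high = 4 * 12, 5 * 12

rate_low = failures_low / fleet_high    # best case: fewest failures, biggest fleet
rate_high = failures_high / fleet_low   # worst case: most failures, smallest fleet

print(f"estimated failure rate: {rate_low:.1%} to {rate_high:.1%}")
# prints "estimated failure rate: 4.8% to 7.5%"
```

Either end of that range is well above the sub-1% annualized rates typically quoted for client SSDs.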
 
Wow. That seems to be a rather high failure rate. 🙁 Only Seagate Spinners are worse.

Here is the most recent Backblaze reliability report. Seagate doesn’t look so bad at the moment.

 
I don't remember the exact model, but I bought around 10 1TB Seagate drives back around 2005/2006 (give or take a couple of years, it's been a while) and every single one of those drives failed within 6 months. It was my first time buying Seagate drives and my last.

Sadly, they sent me replacements, exact models, which also failed in a very short time. I didn't even bother trying to RMA the drives after the first batch.
 
I had the same experience with 500GB EDGE-branded SATA SSDs. We bought about 20 of them for our site and used them to replace spinners in ordinary, well-cooled desktop PCs for generic office work. Out of the 20, 5 failed within one year, 14 had failed by the second year, and an additional two failed by the third year. Never again, Edge, never again...
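Tallied cumulatively, those counts make an ugly curve (numbers taken from the post: 5 failed in year one, 14 total by year two, 16 total by year three, out of 20):

```python
total = 20
cumulative_failed = {1: 5, 2: 14, 3: 16}  # failures by end of each year, per the post

for year, failed in cumulative_failed.items():
    print(f"year {year}: {failed}/{total} failed ({failed / total:.0%})")
# prints:
# year 1: 5/20 failed (25%)
# year 2: 14/20 failed (70%)
# year 3: 16/20 failed (80%)
```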
 
Heh, and yeah, that's outrageous.
You should send them a really snotty email and post them all your dead drives with a note saying 'you can bin all this crap!', or similar 😉
 
Oh, there were words exchanged. Both the vendor and tech support at Edge got strongly worded letters of dissatisfaction from us. It wasn't worth involving counsel, given the dollar amount at play, but both were removed from our approved-sources lists for our MUCH larger national-level buys.
 
Update!!!! I think the NVMe slot on the motherboard went bad. I replaced the drive after it would not POST today (I accidentally turned the breaker off while working on the panel for my new AC), and it would not even go into the BIOS. Code 92, PCIe devices. I removed the NVMe drive and put a SATA drive in, and BAM, up and running, but I still get the code 92 upon boot.
 
Sounds worrisome; all of the PCIe lanes are directly driven by the 7452 after all.

What board is it?

Half a year ago, I had a failure on a Supermicro H11DSi, after the board was in service for half a year. A small chip which is part of the power delivery to one out of four RDIMM sections on that board, I believe it was a PWM controller, had gone up in smoke (literally, not just figuratively). When I populated the replacement board with the components of the failed computer, I learned that the two 7452 CPUs luckily came out of the disaster unscathed, but the RDIMMs within the respective section had been killed in the event.

I wonder how the power delivery to the m.2 slot is implemented. I don't see a typical VRM section in the PCIe/m.2 area, nor is there any VRM section besides CPU VRMs, SoC VRMs, and RAM VRMs listed in the IPMI sensors. But since you get the error even if the slot is not populated, it might not be an issue with power delivery, at least not with peripheral power delivery.
 
I don't remember the exact model, but I bought around 10 1TB Seagate drives back around 2005/2006, and every single one of those drives failed within 6 months. [...]

1TB in 2005/2006?


I was still using 80GB drives in customers' builds in Aug 2006 🙂

The flood in Thailand that affected hard drive manufacturers was in 2011. I was using 500GB drives as standard then, and 1TB didn't cost a great deal more. IIRC a lot of drives had reliability issues around that time. IIRC in 2010, Seagate produced some high capacity drives (over 1TB IIRC) that were notorious for high failure rates too.
 
Sounds worrisome; all of the PCIe lanes are directly driven by the 7452 after all.

What board is it? [...]
It's the EPYCD8-2T; almost all my EPYC setups use this board.

Edit: What I don't get is that the 2080 Ti is PCIe, and it works fine (F@H). So why would the NVMe slot error out? And why will it not even go into the BIOS when the NVMe is populated, but when it's removed, it boots fine to SATA?
 
1TB in 2005/2006?

I was still using 80GB drives in customers' builds in Aug 2006 🙂 [...]

Well, I did say give or take a couple of years. :relaxed: Perhaps it was 2007/2008. I bought them because they were real cheap (retail) and I wanted to fool around with RAID arrays. Then again, they might not have been 1TB drives; possibly 160GB drives, come to think of it. I know the drives were super cheap, like ~$50 per drive. Brand new, retail. I got them all and played around with RAID on them; luckily I only experimented with the drives, and they weren't being used in anything where data loss would have mattered.

It was the first time I had bought a Seagate drive, as I had only ever bought WD and Hitachi drives, a practice I still follow today. With the exception of SSDs, I will only buy WD or Hitachi hard drives.
 
Well, I did say give or take a couple of years. Perhaps it was 2007/2008. [...] With the exception of SSDs, I will only buy WD or Hitachi hard drives.
Hitachi bought the IBM Deskstar line… perhaps the highest-failure-rate line of HDDs ever.
 
Hitachi bought the IBM Deskstar line… perhaps the highest-failure-rate line of HDDs ever.
That problem was unique to the Deskstar 75GXP, which was made by IBM, not by HGST. The Deskstar 120GXP and 180GXP, the models which crossed the acquisition, were not affected. The 75GXP and 120GXP had glass platters; the 180GXP was back to aluminum platters. The 75GXP's problem of the coating coming loose was evidently fixed in the 120GXP. AFAIK.
 