Bad sectors on SSDs

AdamK47

Lifer
Oct 9, 1999
15,652
3,517
136
I found a great utility called HD Sentinel that can read the SMART information for each drive in the array on my LSI controller. One of the surprising things I found is that one of my drives has 8 bad sectors and another has 1 bad sector. I wish I had found this utility earlier; that way I could have seen whether these bad sectors were present when the drive was made or developed over time. Should I be concerned about any bad sectors on an SSD? I know that over-provisioning is supposed to take care of this. Searching through Google shows that most people with problems have dozens or hundreds of bad sectors.

[Attached screenshot: HDSentinel.png]


The start/stop count is high due to tweaking tests with the ASRock Extreme11 motherboard. :) It power cycles 3 to 4 times if overclocking or memory settings fail POST. I'm done with that sort of tweaking... for now.
 

ryderOCZ

Senior member
Feb 2, 2005
482
0
76
The only thing to be concerned about is if the numbers grow in a short amount of time. If you check on a weekly basis and the numbers don't change, you are probably fine. If, after a week, you see changes, there is probably an issue.
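If you want to automate the weekly check, here's a rough sketch of the idea using smartmontools' smartctl (this assumes smartctl is installed and the drive shows up as /dev/sda; the attribute name varies by vendor, e.g. some drives report Retired_Block_Count instead of Reallocated_Sector_Ct):

Code:
import csv
import datetime
import re
import subprocess

DEVICE = "/dev/sda"        # adjust to your drive
LOGFILE = "smart_log.csv"

# Ask smartctl for the SMART attribute table.
out = subprocess.run(["smartctl", "-A", DEVICE],
                     capture_output=True, text=True).stdout

realloc = None
for line in out.splitlines():
    # Attribute 5 is the reallocated sector count; its raw value
    # is the last column of the table row.
    if re.match(r"\s*5\s+Reallocated_Sector_Ct", line):
        realloc = int(line.split()[-1])

# Append today's reading; run this weekly (cron / Task Scheduler)
# and watch whether the number grows.
with open(LOGFILE, "a", newline="") as f:
    csv.writer(f).writerow([datetime.date.today().isoformat(), realloc])

print(f"{DEVICE}: reallocated sectors = {realloc}")

A flat line in that log is what you want to see; a steady climb means back it up and plan a replacement.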
 

Coup27

Platinum Member
Jul 17, 2010
2,140
3
81
How common is it for SSDs to get bad sectors?
 

IndyColtsFan

Lifer
Sep 22, 2007
33,655
687
126
On another note, thanks for the tip about HD Sentinel. I just installed it on my home server so I could review all the drives in my array and it found one that is close to failing.
 

AdamK47

Lifer
Oct 9, 1999
15,652
3,517
136
The only thing to be concerned about is if the numbers grow in a short amount of time. If you check on a weekly basis and the numbers don't change, you are probably fine. If, after a week, you see changes, there is probably an issue.

Thanks, I'll keep an eye on it and see if the count goes up.

On another note, thanks for the tip about HD Sentinel. I just installed it on my home server so I could review all the drives in my array and it found one that is close to failing.

No problem. Are you using the trial? I found a 40% off coupon code and bought the standard edition for under $14. It's well worth it.
 

rsutoratosu

Platinum Member
Feb 18, 2011
2,716
4
81
I had to RMA 3 LG SSDs (Dell OEM). They behave just like a regular HDD when they fail: they can't read certain sectors and just hang and time out. You just don't hear the cranking sound of a regular motorized HDD, so at first I was like "wtf, the PC is locked up," and then I got the "can't read from disk" errors.
 

ryderOCZ

Senior member
Feb 2, 2005
482
0
76
How common is it for SSDs to get bad sectors?

It is not common for an SSD to develop bad sectors.
It is common for new NAND to have some, which are found and noted during the initial manufacturing and bad block scan. That is most likely what the OP is seeing.
 

Makaveli

Diamond Member
Feb 8, 2002
4,915
1,503
136
This is a pretty cool app.

Just installed it on my Vaio Z, which has Toshiba SSDs in RAID 0.

So far no bad sectors.

I use the Intel Toolbox at home, but I'll install this later and see if there are any bad sectors there.

 

IndyColtsFan

Lifer
Sep 22, 2007
33,655
687
126
No problem. Are you using the trial? I found a 40% off coupon code and bought the standard edition for under $14. It's well worth it.

Yeah, I just connected to my server at home and installed the trial. Where can I find the coupon?
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
How common is it for SSDs to get bad sectors?

It's pretty common.

I've had a number of SSDs. Every single one gets about 1 or 2 new bad sectors every couple of months.

Generally, during manufacturing neither the flash chips nor the assembled drives get rigorously tested. Typically the flash chips are "batch tested". A batch of maybe 10,000 chips are made, and 100 chips are selected at random and thoroughly tested. If all meet specifications, then it's assumed that the rest of the batch will also be OK, so they get minimal testing, just to make sure that there are no DOAs.

Testing costs money, so the manufacturers do just enough testing to prevent DOAs. You should expect the drive to find a few bad sectors in the first few weeks or months of use.
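Just to put a rough number on how much a sample like that can miss (treating the 100-of-10,000 sample as independent draws; the exact figures are illustrative anyway):

Code:
# Chance that a random 100-chip sample contains zero defective chips
# when a fraction p of the batch is actually out of spec.
for p in (0.001, 0.005, 0.01, 0.02):
    pass_prob = (1 - p) ** 100
    print(f"defect rate {p:.1%}: batch passes the sample test {pass_prob:.0%} of the time")

So a batch where 1% of chips are marginal still passes the sample test roughly a third of the time, which fits with a few bad sectors turning up in the field.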
 

Makaveli

Diamond Member
Feb 8, 2002
4,915
1,503
136
Another thing I like about this app is that it shows how many writes per day.



This is a very good application you found, Adam.
 

AdamK47

Lifer
Oct 9, 1999
15,652
3,517
136
It's pretty common.

I've had a number of SSDs. Every single one gets about 1 or 2 new bad sectors every couple of months.

Generally, during manufacturing neither the flash chips nor the assembled drives get rigorously tested. Typically the flash chips are "batch tested". A batch of maybe 10,000 chips are made, and 100 chips are selected at random and thoroughly tested. If all meet specifications, then it's assumed that the rest of the batch will also be OK, so they get minimal testing, just to make sure that there are no DOAs.

Testing costs money, so the manufacturers do just enough testing to prevent DOAs. You should expect the drive to find a few bad sectors in the first few weeks or months of use.

I can see that being the case with the NAND manufacturer, but what about the SSD manufacturer? RyderOCZ mentioned a "bad block scan". What type of test is performed on the SSD once it's completely assembled?

I also have another question that's been on my mind. From what I know, when a sector or any part of the NAND fails, the location of the bad sectors is noted by the drive and blocked off. Part of the over-provisioning is then allocated to take its place. What happens to the data that was in this location? If the sector is bad, how does it know what data was there to be moved over to the over-provisioned location?
 

Makaveli

Diamond Member
Feb 8, 2002
4,915
1,503
136
It was really the only app I found that could read SMART information from individual drives in a RAID array using an LSI controller. :)

It's great because it provides most of the information I get with the Intel Toolbox, but it works for everyone who doesn't have an Intel SSD.

This thread is subbed; I want to see if you get any more bad sectors over the course of the next few weeks.

I would also like to see some posts from users who have had SSDs in service for 2+ years, to see if anyone else has seen a lot of bad sectors and whether it's tied more to the brand or something else.
 

Coup27

Platinum Member
Jul 17, 2010
2,140
3
81
My 13 month old Samsung 830 128GB:

[Attached screenshot: Untitled.jpg]


I'm surprised my writes average out to ~1GB per day. I didn't think they would be that high. Samsung Magician v3.2 is also reporting 99% life remaining.
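For fun, the back-of-the-envelope on why the life figure barely moves at 1GB per day (the 3,000 P/E cycle rating and the write amplification factor below are my assumptions, not official Samsung numbers):

Code:
capacity_gb = 128
pe_cycles = 3000             # assumed rating for this class of MLC NAND
host_writes_per_day_gb = 1.0
write_amp = 2.0              # assumed; depends heavily on workload

nand_endurance_gb = capacity_gb * pe_cycles           # total NAND writes
host_endurance_gb = nand_endurance_gb / write_amp     # usable host writes
years = host_endurance_gb / host_writes_per_day_gb / 365
print(f"~{years:.0f} years at 1 GB/day")              # on the order of centuries

No wonder Magician still says 99%.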
 

AdamK47

Lifer
Oct 9, 1999
15,652
3,517
136
Is that a registered copy of the Pro edition? How much did the Pro edition cost you?
 

Makaveli

Diamond Member
Feb 8, 2002
4,915
1,503
136
My 13 month old Samsung 830 128GB:

[Attached screenshot: Untitled.jpg]


I'm surprised my writes average out to ~1GB per day. I didn't think they would be that high. Samsung Magician v3.2 is also reporting 99% life remaining.

Yours shows one extra piece of info I don't see in my own shots or Adam's:

Lifetime writes, which I would assume is the same as the host writes I see in the Intel Toolbox.

Since Adam and I are both using RAID arrays, I think that may be the reason we don't see it.

The Intel Toolbox allows me to see host writes for each drive in the array!
 

Coup27

Platinum Member
Jul 17, 2010
2,140
3
81
Is that a registered copy of the Pro edition? How much did the Pro edition cost you?
£29.83. I stuck it on the work credit card. As I'm moving every machine (except our SBS server) over to Samsung SSDs, I'll say it's an "essential monitoring tool" :p
 

AdamK47

Lifer
Oct 9, 1999
15,652
3,517
136
£29.83. I stuck it on the work credit card. As I'm moving every machine (except our SBS server) over to Samsung SSDs, I'll say it's an "essential monitoring tool" :p

Excellent!

Although my intent in this thread was to ask about bad sectors on SSDs, I feel inclined as someone who works in software development to kindly post the following:

If you find use in the trial version of this software, and want to continue using it, please pay for it. The standard edition with the 40% coupon code is only $13.
 

IndyColtsFan

Lifer
Sep 22, 2007
33,655
687
126
It's great because it provides most of the information I get with the Intel Toolbox, but it works for everyone who doesn't have an Intel SSD.

This thread is subbed; I want to see if you get any more bad sectors over the course of the next few weeks.

I would also like to see some posts from users who have had SSDs in service for 2+ years, to see if anyone else has seen a lot of bad sectors and whether it's tied more to the brand or something else.

I'll install it on my PC when I get home tonight and check my SSD numbers. I've had one of my SSDs in service for about 1.5 years.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
I can see that being the case with the NAND manufacturer, but what about the SSD manufacturer? RyderOCZ mentioned a "bad block scan". What type of test is performed on the SSD once it's completely assembled?

I also have another question that's been on my mind. From what I know, when a sector or any part of the NAND fails, the location of the bad sectors is noted by the drive and blocked off. Part of the over-provisioning is then allocated to take its place. What happens to the data that was in this location? If the sector is bad, how does it know what data was there to be moved over to the over-provisioned location?

A bad block scan is only relatively minimal testing. Sure, you check each block once. But the problem with flash is that dead cells are just one particular type of fault; you also get weak or leaky cells. These may only corrupt data with a certain probability (e.g. 10% of writes to that cell get corrupted), they may only corrupt certain data patterns, or the cell might be weak and fade prematurely if there are a lot of "partial-page writes" (which weaken data already in the page).

Testing flash in a really robust way would need many read/write passes. A single write/read pass will pick out a lot of failures so they can be remapped, but it may leave the weaker cells undetected. The problem is that multi-hour burn-in testing is simply too expensive for most consumer-level drives, although I can imagine it being done for enterprise drives.

In terms of the reallocation issue: flash is, by design, an unreliable storage medium. There is an expected bit-error rate for reading flash, in the region of 0.001-0.1% (normally towards the bottom end, though a well-worn sector on a bad day might be towards the top end).

In other words, in a typical read of an 8K sector you might expect 1-50 corrupted bits. To get around this, each sector is significantly oversized: although the flash is specified as having "8K" sectors, meaning 8192 bytes per sector, there are typically 8640 bytes physically available in each. In the extra 448 bytes, the SSD controller stores ECC/parity data (and usually various internal control data, such as write counts, so it can keep track of write amplification, wear levelling, etc.).
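To make those numbers concrete (same BER range and sector sizes as above):

Code:
data_bits = 8192 * 8                 # bits in an "8K" sector
spare_bytes = 8640 - 8192            # 448 bytes for ECC + metadata

for ber in (1e-5, 1e-3):             # the 0.001% - 0.1% range
    print(f"BER {ber:.0e}: ~{data_bits * ber:.0f} corrupted bits expected per read")

print(f"spare area overhead: {spare_bytes / 8640:.1%}")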

The ECC is calculated from the original data by the controller when it writes the data to the sector. In the event of data corruption, the ECC can be used to detect and repair the corruption before the controller sends the data to the host PC.

The SSD controller monitors the error rate every time it reads a sector. If the error rate in a particular sector is high, and the data would risk exceeding the ECC's repair capability if allowed to deteriorate, the controller may, after recovering the data with ECC, copy it to a spare area and retire the failing sector.
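As a toy model of that retirement policy (purely conceptual; the names and thresholds are mine, not any real controller's firmware):

Code:
ECC_LIMIT = 40     # bits the ECC can correct per sector (illustrative)
RETIRE_AT = 30     # retire well before the ECC limit is reached

class Sector:
    def __init__(self, lba, bit_errors):
        self.lba = lba
        self.bit_errors = bit_errors   # as reported by the ECC decoder

def handle_read(sector, spare_lbas, remap_table):
    if sector.bit_errors > ECC_LIMIT:
        return "unrecoverable"         # report a bad sector to the host
    if sector.bit_errors > RETIRE_AT and spare_lbas:
        # Remap to a spare now (a real controller would also copy the
        # corrected data over), before it decays past the ECC's reach.
        remap_table[sector.lba] = spare_lbas.pop()
        return "remapped"
    return "ok"

remap_table, spares = {}, [9001, 9002]
for lba, errs in [(1, 3), (2, 35), (3, 50)]:
    print(lba, handle_read(Sector(lba, errs), spares, remap_table))
# -> 1 ok / 2 remapped / 3 unrecoverable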

In the unlikely event that a sector has suffered catastrophic corruption and the ECC cannot recover it, the drive's only option is to send a "bad sector" error to the OS; the drive has to 'fess up that it has lost the data (which is what "bad sector" really means). If a drive testing tool then zeros out the "bad" sector or writes new data to it, that will usually trigger a reallocation event. (There's no point reallocating a sector while the data is lost; might as well wait for fresh data to come along.)
 

IndyColtsFan

Lifer
Sep 22, 2007
33,655
687
126
I'll install it on my PC when I get home tonight and check my SSD numbers. I've had one of my SSDs in service for about 1.5 years.

Ok, I installed it and found no bad blocks on my two SSDs: an 80 GB Intel 320, which serves as the OS drive (1.05 TB of lifetime writes; approximately 1 year old), and a 120 GB Intel X25-M (819 GB of lifetime writes and almost 1.5 years in use; former OS drive, now used for frequently played games).
 

AdamK47

Lifer
Oct 9, 1999
15,652
3,517
136
A bad block scan is only relatively minimal testing. Sure, you check each block once. But the problem with flash is that dead cells are just one particular type of fault; you also get weak or leaky cells. These may only corrupt data with a certain probability (e.g. 10% of writes to that cell get corrupted), they may only corrupt certain data patterns, or the cell might be weak and fade prematurely if there are a lot of "partial-page writes" (which weaken data already in the page).

Testing flash in a really robust way would need many read/write passes. A single write/read pass will pick out a lot of failures so they can be remapped, but it may leave the weaker cells undetected. The problem is that multi-hour burn-in testing is simply too expensive for most consumer-level drives, although I can imagine it being done for enterprise drives.

In terms of the reallocation issue: flash is, by design, an unreliable storage medium. There is an expected bit-error rate for reading flash, in the region of 0.001-0.1% (normally towards the bottom end, though a well-worn sector on a bad day might be towards the top end).

In other words, in a typical read of an 8K sector you might expect 1-50 corrupted bits. To get around this, each sector is significantly oversized: although the flash is specified as having "8K" sectors, meaning 8192 bytes per sector, there are typically 8640 bytes physically available in each. In the extra 448 bytes, the SSD controller stores ECC/parity data (and usually various internal control data, such as write counts, so it can keep track of write amplification, wear levelling, etc.).

The ECC is calculated from the original data by the controller when it writes the data to the sector. In the event of data corruption, the ECC can be used to detect and repair the corruption before the controller sends the data to the host PC.

The SSD controller monitors the error rate every time it reads a sector. If the error rate in a particular sector is high, and the data would risk exceeding the ECC's repair capability if allowed to deteriorate, the controller may, after recovering the data with ECC, copy it to a spare area and retire the failing sector.

In the unlikely event that a sector has suffered catastrophic corruption and the ECC cannot recover it, the drive's only option is to send a "bad sector" error to the OS; the drive has to 'fess up that it has lost the data (which is what "bad sector" really means). If a drive testing tool then zeros out the "bad" sector or writes new data to it, that will usually trigger a reallocation event. (There's no point reallocating a sector while the data is lost; might as well wait for fresh data to come along.)

Thanks Mark. That was very well explained.