Question: What's up with my Crucial SSDs?


Charlie98

Diamond Member
Nov 6, 2011
6,292
62
91
I've got two Crucial 500GB MX500 SSDs in two different PCs... one is the OS drive, and the other, after serving as the OS drive in another rig, has been relegated to scratch-drive duty in my main desktop. What concerns me is the Health Status... both of these drives have fairly low hours and writes, but are already showing reduced life. I don't get it.

I've got a crappy old OCZ 60GB drive that has been thrashed... it's showing 99% life even after 8 years of use (20TB of writes over 10K hours, on about 80% less NAND?)

I have a Crucial M550 525GB drive (formerly the OS drive in my desktop before the update to M.2) and it shows nothing for wear, with far more hours and writes.

Is this the result of needless TRIM? (These are/were on W7 systems.) Were the MX500s so 'budget driven' that they fail sooner?

What am I missing?
 

Attachments

  • HTPC MX500.jpg (191 KB)
  • IPS5 MX500.jpg (151.4 KB)

Lucretia19

Member
Feb 9, 2020
25
5
41
I found an intriguing claim that may explain why a low rate of host writes can cause a very high WAF:
"Wear Leveling makes sure that all blocks receive approximately the same number of P/E cycles. The best wear leveling is of the static type, whereby data blocks circulate, even if rarely written to (static data is moved)."

That's an excerpt from a blog post at: https://www.delkin.com/blog/managing-errors-nand-flash/

If true, the implication is that it's not straightforward to increase an ssd's years of life by reducing the host write rate (by moving frequently written temporary files to a hard drive). "Life Consumed as a function of Host Writes" isn't necessarily monotonic; it may have a sweet spot.

It's another reason to test the theory that WAF will be helped by running an app that reads frequently from the ssd (such as a virus scanner). It's plausible that reading reduces the rate at which the ssd can circulate blocks.

It seems to me the truly "best" wear leveling algorithm should depend on the host write rate... the algorithm when write rate is low shouldn't be the same as the algorithm when write rate is high or moderate. I see no value in "over-circulating" static data blocks when write rate is low... it seems counter-productive to heavily waste scarce NAND writes in order to maximize equality of P/E cycles.
 

Charlie98

Diamond Member
Nov 6, 2011
6,292
62
91
it seems counter-productive to heavily waste scarce NAND writes in order to maximize equality of P/E cycles.

It does. I also wonder about something like image backup... a program like Acronis that makes a full (or even incremental) backup of data. I don't know if it's the same usage cycle as say an antivirus program, but it's worth considering.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
It does. I also wonder about something like image backup... a program like Acronis that makes a full (or even incremental) backup of data. I don't know if it's the same usage cycle as say an antivirus program, but it's worth considering.

A backup process would presumably spend less time reading the ssd, because it would also spend time writing. Perhaps it could be designed to write to a bit bucket (in other words, just pretend to write) but I doubt any common software has been designed to do that. Similarly, if the software is testing the integrity of a backup by comparing it to the ssd, it wouldn't write, but it would still spend only some of its time reading the ssd, because it would spend some time reading the backup.

A search process might work as well as a virus scan: searching unindexed files on the ssd for some arbitrary pattern.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
Does anyone know of SMART monitoring software that's able to periodically append the SMART data to a log file? I've recently started recording some SMART values multiple times per day to see whether WAF variations have a pattern that once-a-day logging won't reveal, and this has required me to manually type values displayed by HWiNFO & CrystalDiskInfo into a spreadsheet, multiple times per day. I'd prefer to automate as much of that recording as possible. It would become practical to record the values MANY times per day.

I googled to try to find SMART software capable of saving data to a file, and couldn't find even one winner.

I'd settle for a DOS-like cli program that can write to Standard Output, since I could set Windows Scheduler to periodically launch it and use ">> filename" at the end of the command line to redirect its text output to (append to) a text file.
 

Lucretia19

Member
Feb 9, 2020
25
5
41

smartd will log changes in SMART attributes. smartctl prints out that information, either in formatted text for human consumption, or JSON for easy processing. I haven't personally used smartd on Windows, but I use smartctl on Windows all the time.

Thanks! Smartmontools looks promising.

It didn't occur to me to include the word 'print' in my google search. An archaic, misleading term, since I don't want to send output to a printer.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
Last night I began a new experiment, and so far its WAF results are very encouraging. I've been running a batch file that commands the ssd to run an extended self-test every 26 minutes. I chose 26 because it appears the self-test takes about 25 minutes to complete and I want the ssd to spend most of its time running a self-test. My theory is that by spending a lot of time internally reading itself for the self-tests, the ssd won't be able to spend as much time running the overly aggressive Static Wear Leveling process.

The self-test idea is similar to the ideas we discussed earlier: keep the ssd busy reading by running virus scans, searches, or backups. I think self-tests have huge advantages: it's a background task within the ssd so I think its low priority doesn't interfere much with host read/write performance, and it doesn't require the host cpu to do any extra work so it doesn't heat the cpu or waste as much power as the other ideas would. However, it should be noted that the self-test raises the temperature of the ssd by about 5 degrees C. And it prevents the ssd from going to a low power state, which can be seen by its effect on Power On Hours.

One of the questions to be determined empirically is whether the self-test is a lower priority process than the Static Wear Leveling process. If it's lower priority, it might not reduce the wear leveling runtime, and thus might be useless. It's been less than a day since I began the experiment -- too early to reach a solid conclusion -- but the SMART snapshots so far are encouraging:
Columns: F7, F8 = SMART F7/F8 raw values; Hours = Power On Hours; WAF = 1 + F8/F7 (cumulative); ΔF7, ΔF8 = change since the previous row; Δ2F7, Δ2F8 = change over the last 2 rows; WAF(n) = Recent WAF over the last n rows = 1 + ΔnF8/ΔnF7.

Date       | Time  | F7          | F8            | Hours | WAF  | ΔF7     | ΔF8        | WAF(1) | Δ2F7    | Δ2F8       | WAF(2) | WAF(3) | WAF(4)
-----------|-------|-------------|---------------|-------|------|---------|------------|--------|---------|------------|--------|--------|-------
02/20/2020 | 13:50 | 223,801,408 | 1,383,179,818 | 1,038 | 7.18 |         |            |        |         |            |        |        |
02/21/2020 | 05:14 | 223,966,265 | 1,388,591,917 | 1,040 | 7.20 | 164,857 | 5,412,099  | 33.83  |         |            |        |        |
           | 09:50 | 224,017,483 | 1,389,029,824 | 1,041 | 7.20 | 51,218  | 437,907    | 9.55   | 216,075 | 5,850,006  | 28.07  |        |
           | 14:50 | 224,088,982 | 1,389,879,812 | 1,042 | 7.20 | 71,499  | 849,988    | 12.89  | 122,717 | 1,287,895  | 11.49  | 24.30  |
           | 18:31 | 224,115,581 | 1,391,279,760 | 1,043 | 7.21 | 26,599  | 1,399,948  | 53.63  | 98,098  | 2,249,936  | 23.94  | 19.00  | 26.78
02/22/2020 | 12:23 | 224,290,793 | 1,399,713,786 | 1,045 | 7.24 | 175,212 | 8,434,026  | 49.14  | 201,811 | 9,833,974  | 49.73  | 40.09  | 35.27
           | 19:30 | 224,382,402 | 1,404,294,903 | 1,046 | 7.26 | 91,609  | 4,581,117  | 51.01  | 266,821 | 13,015,143 | 49.78  | 50.13  | 42.83
           | 21:10 | 224,421,785 | 1,404,677,501 | 1,047 | 7.26 | 39,383  | 382,598    | 10.71  | 130,992 | 4,963,715  | 38.89  | 44.75  | 45.46
(selftesting begins)
           | 22:13 | 224,430,155 | 1,405,023,474 | 1,047 | 7.26 | 8,370   | 345,973    | 42.33  | 47,753  | 728,571    | 16.26  | 39.10  | 44.69
02/23/2020 | 09:02 | 224,518,597 | 1,405,960,122 | 1,057 | 7.26 | 88,442  | 936,648    | 11.59  | 96,812  | 1,282,621  | 14.25  | 13.23  | 28.42
           | 10:42 | 224,541,107 | 1,406,315,952 | 1,059 | 7.26 | 22,510  | 355,830    | 16.81  | 110,952 | 1,292,478  | 12.65  | 14.73  | 13.73
           | 12:28 | 224,566,093 | 1,406,411,752 | 1,061 | 7.26 | 24,986  | 95,800     | 4.83   | 47,496  | 451,630    | 10.51  | 11.21  | 13.02
           | 12:55 | 224,590,612 | 1,406,455,265 | 1,061 | 7.26 | 24,519  | 43,513     | 2.77   | 49,505  | 139,313    | 3.81   | 7.88   | 9.92
           | 13:48 | 224,604,985 | 1,406,652,719 | 1,062 | 7.26 | 14,373  | 197,454    | 14.74  | 38,892  | 240,967    | 7.20   | 6.27   | 9.02
           | 14:42 | 224,617,092 | 1,406,662,503 | 1,063 | 7.26 | 12,107  | 9,784      | 1.81   | 26,480  | 207,238    | 8.83   | 5.92   | 5.56
           | 15:35 | 224,624,894 | 1,406,673,091 | 1,064 | 7.26 | 7,802   | 10,588     | 2.36   | 19,909  | 20,372     | 2.02   | 7.35   | 5.44

Four of the columns show Recent WAF over various-duration short periods of time... rolling averages over 1, 2, 3 or 4 rows. But those are VERY short periods of time, and might not be representative of the longer periods to come.

The .bat file uses smartctl.exe -- a utility included with Smartmontools -- to launch the self-tests. It also uses smartctl to append SMART data to a logfile every 26 minutes. (I haven't pasted every SMART snapshot into the spreadsheet.)

Here's the .bat file:
@echo off

:loop

echo ________________________________________________________
echo ________________________________________________________ >> ssdSMART.log
echo Appending SMART data to file ssdSMART.log...
echo %date% %time%
echo %date% %time% >> ssdSMART.log
smartctl -A c: >> ssdSMART.log

echo Executing 'smartctl -t long c:' to run 30 minutes extended self-test of ssd...
echo Executing 'smartctl -t long c:' >> ssdSMART.log
smartctl -t long c:

echo Waiting 1600 seconds before re-launching self-test...
timeout 1600

goto loop

The .bat file executes in a CMD.exe window that's run as Administrator, since smartctl apparently needs admin privilege to execute the self-test command.

While composing this post, five more SMART snapshots were recorded at 26 minute intervals. WAF continues to be much better than before the self-testing started. The five latest "26 minute WAFs" were 2.45, 17.00, 2.29, 2.33, and 2.70. WAF has averaged 9.18 since the self-testing began last night, and 5.44 since 10:42am this morning. It's possible my tinkering with the .bat file last night and this morning makes some of the experiment's early values unreliable and misleading; I can't say for sure that the self-tests were running near non-stop before I finished tinkering this morning. In principle I could analyze the logfile to figure out details of my tinkering, but it makes more sense to just keep collecting more data and discount the early values.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
The periodic SMART snapshots continue to show that self-tests are effective at reducing WAF. With self-tests running, about four in five of the 26 minute WAFs are less than 3, and about one in five is close to 20. Average WAF is approximately 6.

I stopped the self-tests for about 2 hours this morning. The 2 hour WAF skyrocketed to about 50.

I don't know why WAF occasionally jumps to 20-ish. Perhaps there's a limit to how long the ssd will postpone running the static wear leveling. The priority of the static wear leveling process might get bumped up if it doesn't run for a few hours. There may be a pattern buried in the SMART logfile, but I don't know if I can find the time to hunt for it. (It would help if I could find a ready-made utility to parse the relevant data from the logfile into my spreadsheet, to eliminate the manual copy&paste operations.)
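Lacking a ready-made utility, a small script could do that parsing. Below is a minimal sketch, assuming the logfile is the one the .bat file appends (timestamp lines followed by `smartctl -A` output) and that the MX500 reports F7/F8 as attribute IDs 247/248; the attribute names in the sample are illustrative, so check them against your own log:

```python
import re

# Matches a "smartctl -A" attribute row: leading attribute ID, raw value last.
ATTR = re.compile(r"^\s*(\d+)\s+\S+.*\s(\d+)\s*$")

def parse_log(text):
    """Return a list of (F7, F8) raw-value pairs, one per SMART snapshot."""
    rows, current = [], {}
    for line in text.splitlines():
        m = ATTR.match(line)
        if not m:
            continue
        attr_id, raw = int(m.group(1)), int(m.group(2))
        if attr_id in (247, 248):        # 247 = F7, 248 = F8 (assumed IDs)
            current[attr_id] = raw
        if 247 in current and 248 in current:
            rows.append((current[247], current[248]))
            current = {}
    return rows

def recent_wafs(rows):
    """WAF for each interval between consecutive snapshots: 1 + dF8/dF7."""
    return [1 + (f8b - f8a) / (f7b - f7a)
            for (f7a, f8a), (f7b, f8b) in zip(rows, rows[1:])]

# Illustrative fragment in the format the .bat file appends:
sample = """02/20/2020 13:50
247 Host_Program_Page_Count  0x0032 100 100 000 Old_age Always - 223801408
248 FTL_Program_Page_Count   0x0032 100 100 000 Old_age Always - 1383179818
02/21/2020 05:14
247 Host_Program_Page_Count  0x0032 100 100 000 Old_age Always - 223966265
248 FTL_Program_Page_Count   0x0032 100 100 000 Old_age Always - 1388591917
"""
print([round(w, 2) for w in recent_wafs(parse_log(sample))])  # [33.83]
```

The per-interval values could then be written out as CSV and opened directly in the spreadsheet, eliminating the copy&paste.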

Last night I discovered the extended self-test occasionally lasts longer than 26 minutes. I also discovered smartctl.exe has a command option to force the ssd to start a new self-test even if the previous self-test hasn't completed, so I've added that option to the .bat file, and reduced the .bat file pause to 20 minutes. This should cause the self-tests to run non-stop. I'm curious to see whether this helps reduce WAF further.

The edited line in the .bat file that forces the ssd to start running a new extended self-test (aborting a test that's currently running) is:
smartctl -t long -t force c:
 

Lucretia19

Member
Feb 9, 2020
25
5
41
The non-stop self-tests have reduced WAF to 2.04 averaged over the last 280 minutes. Here are the fourteen most recent 20 minute WAFs:
1.32, 2.07, 1.96, 2.11, 2.22, 2.17, 2.30, 1.91, 2.37, 1.95, 1.68, 2.15, 2.25, 1.91

I call that a success. At this rate, it will take about 30 years before 180 TB have been written to NAND.

I guess the questions now are about negative side effects:

1. An increase of power consumption is obvious, but the 5 degree C rise in ssd temperature suggests the increase is small. I want to check that out, to estimate whether the increased cost of electricity over 5 years will be much less than the cost of a new ssd.

2. Eventually I'll benchmark read & write speed to see if the self-test reduces performance. There's reason to believe the ssd pauses a self-test whenever the host tries to read or write, so that performance won't be hurt at all. It's even possible performance increases, since the ssd will never need to transition from low power state.

3. The consequence of preventing the Static Wear Leveling process from running is unclear. I don't know whether there's a way to see how much wear inequality accumulates over time; the Average Block Erase Count doesn't give a clue about inequality. (Also, it's possible that ABEC and Remaining Life will be inaccurate if the SWL process doesn't run as much as the firmware designers assumed it would run.) It might be prudent to revert back to what I was doing before -- starting a new self-test every 26 minutes or so -- in case SWL needs a little time to run and isn't getting enough time with non-stop self-tests. Or maybe it would make sense to wait 10 years before reverting back to 26 minutes. I plan to look at ABEC every few days for awhile to make sure ABEC grows reasonably slowly, rather than not at all, since zero growth would be a hint that there's a problem.

I modified the .bat file so it will log SMART data every two hours instead of every 20 minutes. Copy/pasting from the log into my spreadsheet every 20 minutes has been tedious.

Still to be worked out is how to get Windows to automatically start the .bat file each time the pc is restarted.
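One candidate (untested sketch) is a scheduled task created with schtasks; the task name and .bat path below are placeholders, and running it under the SYSTEM account means the loop would run elevated in the background, without a visible console window:

```shell
:: Launch the selftest loop at every boot, elevated.
:: "SSD Selftest Loop" and the .bat path are placeholders -- substitute your own.
schtasks /Create /TN "SSD Selftest Loop" /TR "N:\fix_ssd_WAF\selftests.bat" /SC ONSTART /RU SYSTEM
```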
 

Charlie98

Diamond Member
Nov 6, 2011
6,292
62
91
Dumb questions time...

1) Do you think this is only with Crucial SSDs? Or one specific Crucial product line or controller? My M550, for example, does not suffer these problems. It just so happens I have 2 MX500s...

2) Are you running Crucial Storage Executive? I uninstalled it on the PC that has the MX500 as the OS drive... but I also uninstalled a few other things, and I quit running excessive spyware scans, too.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
@Charlie98:
1) I have no idea whether other brands or other Crucial models have such a high WAF. The specs for the MX500 say it has Dynamic Write Acceleration and Static Wear Leveling, and both of those features execute during ssd idle time. My guess is that at least one of those features causes the high WAF. I assume both features use proprietary algorithms coded in the firmware, and could have significantly different algorithms in other brands, other models, and other versions of the MX500 firmware. By googling I learned that Static Wear Leveling algorithms are a hot topic in academic research, which implies there's no consensus about the best way to do it. I doubt Micron publishes their algorithms; they probably view them as a trade secret.

Here's the info about Dynamic Write Acceleration in Crucial's MX500 brochure:
Dynamic write acceleration optimizes SSD performance for typical client-computing environments, where WRITE operations tend to occur in bursts of commands with idle time between these bursts.

Capacity for accelerated performance is derived from the adaptive usage of the SSD's native NAND array, without sacrificing user-addressable storage. Recent advances in Micron NAND technology enable the SSD firmware to achieve acceleration through on the fly mode switching between SLC and TLC modes to create a high-speed SLC pool that changes in size and location with usage conditions.

During periods of idle time between write bursts, the drive may free additional capacity for accelerated write performance. The amount of accelerated capacity recovered during idle time depends on the portion of logical addresses that contain user data and other runtime parameters. In applications that do not provide sufficient idle time, the device may need to perform SLC-to-TLC data migration during host activity.

Under accelerated operation, write performance may be significantly higher than nonaccelerated operations. Power consumption per-byte written is lower during accelerated operation, which may reduce overall power consumption and heat production.

I can't tell from that vague description whether Dynamic Write Acceleration implies extra writes to NAND. Maybe it's just Crucial's name for the common practice of using TLC NAND in fast SLC mode (writing one bit per cell) as a write cache when the host is trying to write at a high rate, and later (during idle times) copying the SLC bits more densely into other blocks as TLC. Where the description mentions freeing of additional capacity during idle times, I think it's talking about garbage collection. My hunch is that the WAF culprit is the Static Wear Leveling running much more than it needs to when the host pc doesn't write much, and that Dynamic Write Acceleration isn't a culprit, or at least isn't as significant. I think both you and I have low rates of host writing to the ssd, which probably means Dynamic Write Acceleration isn't needed by our ssds as often as by other people's ssds.

2) I uninstalled Storage Executive about a week ago, in case it might have been causing the WAF problem, and I haven't reinstalled it. I installed it in January hoping to enable its Momentum Cache, but couldn't get the cache to "activate" even with help from Crucial tech support, who ended up suggesting I try 3rd party SMART software (which seems like absurd advice when the goal is a write cache that minimizes random writes).

Have you been collecting snapshots of SMART data? To do so, you could download Smartmontools, extract the smartctl.exe file from it, open a CMD prompt, and run "smartctl -A c: > name_of_log_file" to output the SMART data to a file. The way to measure WAF over some period of time is to look at the changes of the SMART F7 & F8 values over that time. In other words, save a snapshot at the beginning of the period and another snapshot at the end, so you can subtract the beginning values from the ending values. WAF during the period from time Start to time End is:
1 + ((F8end - F8start) / (F7end - F7start))
It's the only way to see what your WAF has been doing recently, so you can quickly see the effects of experiments. I'd very much like not to be the only person testing WAF, so I hope you'll post a short-term WAF, perhaps using two snapshots made a day apart.
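That formula is trivial to wrap in code. A hypothetical helper (Python), fed the raw F7/F8 values from the two snapshots:

```python
def waf(f7_start, f8_start, f7_end, f8_end):
    """Write Amplification Factor over an interval, per the formula above:
    1 + (F8end - F8start) / (F7end - F7start)."""
    d_f7 = f7_end - f7_start
    if d_f7 <= 0:
        raise ValueError("F7 must increase over the interval")
    return 1 + (f8_end - f8_start) / d_f7

# Example using the first two snapshots posted earlier in this thread:
print(round(waf(223_801_408, 1_383_179_818,
                223_966_265, 1_388_591_917), 2))  # 33.83
```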
 

Lucretia19

Member
Feb 9, 2020
25
5
41
I've updated my .bat file that runs ssd selftests in an infinite loop. The main change is to periodically abort the selftest so the selftest runs only 19 minutes of every 20 minutes. The idea is to give the ssd a little idle time, in case it needs some idle time to stay healthy. In other words, a little Static Wear Leveling, instead of none.

I also added some ANSI color to its onscreen output, to make it easier to distinguish different kinds of information.

To use it, edit the drive and foldername in lines 5 & 6 to match the drive and folder in your system where you store smartctl.exe, and move the .bat file to that folder. (On my computer, I store them in the folder N:\fix_ssd_WAF. N: is my hard drive, but I see no reason why you can't store them on your ssd.) You'll need to launch it with Administrator privilege because smartctl won't run the selftest (will give an error message) if it doesn't have the elevated privilege.

Two numbers in the .bat file control the amount of time the selftest is allowed to run and the amount of idle time between selftests: Each selftest runs for 1140 seconds (19 minutes), followed by 60 seconds of idle time. You can edit those numbers too, if my choice of 19 minutes and 1 minute isn't optimal for your system. I don't know yet if it's optimal for mine.

Here's the content of the .bat file (except this forum doesn't display indentations, and displays multiple spaces as a single space):
@echo off
call :resetcolor
echo SSD Selftest Infinite Loop (to reduce Static Wear Leveling write amplification)

N:
cd \fix_ssd_WAF


rem Infinite loop
:loop
call :green
echo _______________
call :cyan
echo %date% %time%

call :green
echo Appending SMART data to file ssdSMART.log...
call :resetcolor
echo __________________________>> ssdSMART.log
echo %date% %time%>> ssdSMART.log
smartctl -A c: >> ssdSMART.log

rem Run ssd selftest 6 times, for 19 minutes of every 20 minutes
rem (Note that total time to execute the following line is 2 hours.)
for /L %%j in (1,1,6) do call :selftest 60 1140

goto loop

rem Subroutine (Note: 1st param is #seconds to idle, 2nd param is #seconds to selftest)
:selftest
call :green
echo Executing 'smartctl -X' to ABORT ssd selftest (for 60 seconds or whatever)
call :resetcolor
echo %date% %time% ABORTING SELFTEST with 'smartctl -X'>> ssdSMART.log
call :yellow
smartctl -X c:
call :cyan
echo Waiting %1 seconds...
call :resetcolor
timeout %1 >nul
call :green
echo Executing 'smartctl -t long c:' to RUN ssd selftest...
call :resetcolor
echo %date% %time% LAUNCHING SELFTEST with 'smartctl -t long c:'>> ssdSMART.log
call :yellow
smartctl -t long c:
call :cyan
echo %date% %time%
echo Waiting %2 seconds...
call :resetcolor
timeout %2 >nul
rem Return from Subroutine
exit /B

:green
rem NOTE: each quoted "[92m" etc. must begin with a literal ESC character
rem (0x1B); the forum strips it, so re-insert it in your editor.
echo | set /p="[92m"
exit /B

:cyan
echo | set /p="[96m"
exit /B

:yellow
echo | set /p="[93m"
exit /B

:resetcolor
echo | set /p="[0m"
exit /B
If anyone tests it, I hope they'll post here about their experience, including some SMART snapshots and calculated WAF values with and without the .bat file running.
 

Charlie98

Diamond Member
Nov 6, 2011
6,292
62
91
Maybe it's just Crucial's name for the common practice of using TLC NAND in fast SLC mode (writing one bit per cell) as a write cache when the host is trying to write at a high rate, and later (during idle times) copying the SLC bits more densely into other blocks as TLC.

That's an interesting thought... as well as Page File.

The PC that the 93% drive is in was recently turned into a working PC (as well as its former primary role as HTPC.) My wife does some work on it... but also uses it to work up vinyl cutouts and layouts. The layouts suck up a huge amount of RAM, and I imagine it's writing to the page file (I have 4,096 MB set as page file), although that doesn't explain the low amount of writes to disk, assuming page-file writes are included in that figure.

Have you been collecting snapshots of SMART data?

I was, and then I quit. When I get back next week, I'll start to compile data just to see. That particular PC is fixing to get changed to W10 in a few weeks, and I've decided to put the M550 in it as its OS drive, even though it has 30K+ hours on it vs the very low 1,600-hour MX500, so I'll compile as much data as I can.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
@Charlie98: I don't think the pagefile is responsible for your high WAF. People may rightly blame the pagefile for excess host writes to an ssd, but I don't see why it would cause unusually high amplified writes.

Since your ssd has a low Total Host Writes, I don't think the pagefile can be blamed for hurting your ssd in any way... neither excess host writes nor excess amplified writes.

I moved my pagefile to a hard drive many months ago, to reduce host writes to the ssd. I don't know if this made any difference, because with 16 GB of ram in my pc it should be rare that the system needs to overflow to a pagefile. Task Manager shows about half my ram is in use (mostly by Firefox, which has dozens of tabs open).
 

Lucretia19

Member
Feb 9, 2020
25
5
41
I wrote another .bat file that monitors the ssd Write Amplification Factor over whatever interval of time I prefer. I've been running it set to 5-minute intervals, while the .bat file that controls the ssd selftests runs in a separate window.

Each loop of the selftest controller is currently set to last 20 minutes: 19.5 minutes of ssd extended selftest plus 30 seconds of ssd idle time.

By comparing the timestamps in the two programs' display streams, I observed the following:
  1. All of the 5-minute intervals that had very high WAF included the 30 seconds of idle time.
  2. In some of the 5-minute intervals that included the 30 seconds of idle time, WAF was low.
  3. WAF was about 2 during all 5-minute intervals that had no idle time, except once when WAF was about 6.
Observations 1 & 3, taken together, are strong evidence that selftests successfully limit the ssd background processes that amplify writes.

Observation 2 is (weak) evidence that the ssd block wear inequality isn't growing terribly with the "19.5 minutes of each 20" selftest duty cycle. In other words, my intuition is that it's safe for the ssd's health and data integrity if the selftest duty cycle is such that WAF is low during some of the idle intervals. If the intuition is true, then I would want to add a feature to the selftests controller program, to automatically dynamically adjust the duty cycle so that idle intervals occasionally have low WAF. This might be safer than my earlier idea, to automatically adjust the duty cycle to try to keep average WAF within a reasonable range.
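As a sketch of that auto-adjustment idea (the function and all thresholds are hypothetical, not part of the current .bat file): grow the idle window when no recent idle interval shows low WAF, shrink it when most do.

```python
def adjust_idle_seconds(current_idle, recent_idle_wafs,
                        low_waf=3.0, min_idle=15, max_idle=120):
    """Pick the next idle interval between selftests.

    recent_idle_wafs: WAF measured during the last few idle windows.
    If none were low, the ssd's background writes are backlogged, so grant
    more idle time; if most were low, wear leveling has caught up and the
    idle window can shrink. All thresholds here are guesses.
    """
    low_count = sum(1 for w in recent_idle_wafs if w < low_waf)
    if low_count == 0:
        return min(max_idle, current_idle * 2)   # starved: double the idle time
    if low_count > len(recent_idle_wafs) // 2:
        return max(min_idle, current_idle // 2)  # caught up: halve the idle time
    return current_idle                          # mixed: leave it alone
```

The controller loop would call this after each idle window, using the WAF computed from the SMART snapshots bracketing that window.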

I'm also thinking about shortening each selftest controller loop to make them much shorter than 20 minutes. One of the two ssd background processes that I'm aware of is the SLC-to-TLC delayed writes of the ssd's Dynamic Write Acceleration feature. (DWA uses TLC NAND operating in fast SLC mode as a write cache, and later copies the data to regular TLC NAND blocks.) I presume DWA writes to SLC occur only if the ssd's dram cache overflows. Perhaps reducing the length of each controller loop will reduce the number of times that the dram cache overflows? At some point I'll start experimenting with selftest loops of about 2 minutes, with a few seconds of idle time in each loop.

I'd appreciate people's thoughts about the above ideas.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
I've continued to run my ssd Selftests Controller app (.bat file), which runs an infinite loop of selftests. It's been running for more than 4 days with a duty cycle of 19.5 minutes of selftests out of each 20 minutes. I've also been running my ssd WAF & SMART monitor app (.bat file) mentioned in my previous post, but with shorter loop times than mentioned in my previous post; it's currently appending the data to a logfile every 5 seconds.

The 19.5m/20 selftest duty cycle seems to be very effective at reducing WAF to an acceptable amount. WAF has averaged 3.67 over the 4+ days of the test. For some reason WAF has been getting even better as time goes on; WAF averaged 2.68 over the last 46 hours and 2.09 over the last 24 hours. Perhaps the early days of this 19.5m/20 test were affected by the previous test (which had a duty cycle of 1195 seconds of each 1200) or by the several minutes between the 1195s/1200 test and the 19.5m/20 tests. I plan to continue to monitor 19.5m/20 closely for many more days and will see whether WAF stays so low with 19.5m/20.

The monitor's logfile continues to show high bursts of background NAND writing by the ssd's FTL controller during some of the 30 second intervals of idle time between selftests, and no bursts while a selftest is active. The 5-second logging interval clearly shows the short durations of the bursts, typically much shorter than the 30-second idle interval. I still think it's reassuring that the FTL controller writes for less than 30 seconds when it has opportunities to write, and sometimes doesn't write at all when it has an opportunity... I interpret this as a sign that the ssd background processes are getting enough runtime to keep the ssd healthy. So the 19.5m/20 selftest duty cycle seems acceptable (assuming WAF continues to be good).

If I change my usage so that the pc's average writes to the ssd increases significantly, I'll need to monitor the FTL bursts closely again to make sure the ssd background processes still have enough runtime.

Unless the results go bad, I don't plan to post more updates here. I'll post updates only at the forum thread I started:
 

Lucretia19

Member
Feb 9, 2020
25
5
41
@Charlie98: Crucial's tech support has agreed to replace my ssd. I think what finally convinced them there's a problem they can't handle was an email that contained F7 & F8 data from two SMART snapshots taken weeks apart, along with the calculation of the very high WAF during those weeks, using the WAF formula in a Micron article about WAF calculation.

Over the course of many emails, I kept asking tech support to forward my emails to their ssd firmware engineers, to let THEM judge whether my data indicates there's a firmware bug. Tech support never even acknowledged those requests.

I haven't yet proceeded with the replacement. I asked tech support again to forward to the firmware engineers, because it seems likely that the replacement will have the same firmware bug.

Since I'm cynical, I'm wondering how I will be able to verify that the replacement drive will be brand new. I assume they could take a used ssd and rewrite its counters to zeros, and program it with a new serial number, so that it would appear to be new even though it may actually have a lot of block wear and/or high temperature abuse.
 

Charlie98

Diamond Member
Nov 6, 2011
6,292
62
91
It's very likely you will get a refurb... that's just the way warranty works. If you don't trust it, when you get the new one, just sell it and move on...

I bought an 840Pro some years ago, with a 5 year warranty. It up and died on me in less than a year.... and I got a refurb in return. I was not happy, and haven't bought a Samsung since.

Truthfully, the engineers don't want to see a bunch of anecdotal evidence, if they can't reproduce it in the lab, I doubt they really care. Unless there have been massive returns for such a condition, I doubt it's a blip on their radar.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
@Charlie98: Assuming they will ship the replacement ssd to me before I send them my ssd, I'll be able to decide which one to keep and which one to return to them. It would be silly to keep the replacement if its Remaining Life is less. But my question was about whether they might reset the counters in a refurbished ssd to make it appear new.
 

Lucretia19

Member
Feb 9, 2020
25
5
41
Crucial's tech support says they will replace the ssd with a new one, not a refurbished one. This means my question -- whether they might reset the counters and serial number in a refurbished one to make it appear new -- is not just academic.