What SSDs are coming with 3D QLC NAND?


cbn

Lifer
Mar 27, 2009
12,968
221
106
So has that made you think that even sub-20nm 2D MLC NAND can be fairly resistant to voltage drift?

Ok, then here's the surprise I had in store for you: the drive using sub-20nm 2D MLC NAND does indeed use sub-20nm 2D MLC NAND, but it is three-bit MLC NAND, or as it is more commonly known, TLC NAND.

Unexpected?

Wow, that is amazing.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I wonder what capacity and number of layers the Intel/Micron 3D QLC die will have?

For 32L (Gen1) they had a 256Gb 3D MLC die that was also configurable as 384Gb 3D TLC.

Then for 64L (Gen2), so far they have a 256Gb 3D TLC die (i.e., a small die) and a 512Gb 3D TLC die.

Could it be that Intel/Micron will also release a 64L (Gen2) 768Gb 3D TLC die that can be configured as either 512Gb 3D MLC or 1024Gb 3D QLC?

P.S. A 64L (Gen2) 768Gb 3D TLC die should have the same die size as the 32L (Gen1) 384Gb TLC die.
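Just to make the arithmetic explicit, here is a quick sketch of the bits-per-cell math behind those die configurations (my own back-of-the-envelope illustration in Python, not anything published by Intel/Micron):

```python
# The same physical cell array yields different die capacities depending on
# how many bits are stored per cell (2 = MLC, 3 = TLC, 4 = QLC).

def capacity_gbit(cells_in_billions: float, bits_per_cell: int) -> float:
    """Die capacity in Gbit for a given cell count and bits per cell."""
    return cells_in_billions * bits_per_cell

# 32L (Gen1): the 256Gb MLC configuration implies roughly 128 billion cells.
gen1_cells = 256 / 2
print(capacity_gbit(gen1_cells, 2))   # 256.0 Gb as MLC
print(capacity_gbit(gen1_cells, 3))   # 384.0 Gb as TLC

# Hypothetical 64L (Gen2) die with twice the cells of the Gen1 die:
gen2_cells = 2 * gen1_cells
print(capacity_gbit(gen2_cells, 2))   # 512.0 Gb as MLC
print(capacity_gbit(gen2_cells, 3))   # 768.0 Gb as TLC
print(capacity_gbit(gen2_cells, 4))   # 1024.0 Gb as QLC
```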
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Interestingly, Phison is incorporating machine learning into their SSD controllers:

[Image: http://media.bestofmicro.com/5/M/707818/original/QLC-Plan.jpg]


Some more info on why they are doing that:

http://www.taipeitimes.com/News/biz/archives/2017/09/25/2003679068

Using AI and machine learning, the company aims to create more “smart” controllers that enable faster data reading and writing speeds than regular SSD controllers, in addition to fixing errors and lowering the error rate, Pua said.

To fill the talent gap and enhance its technological capabilities, Phison is partnering with National Chiao Tung University to launch an AI lab that will develop cutting-edge technologies such as deep reinforcement learning for digital signal processors, he said.

The technology will help develop a new SSD controller that can enable self-repair, accelerate the drive’s reading and writing speeds, and help extend the drive’s lifetime, the company said.

And here is some info on another company using machine learning with SSDs:

http://www.infostor.com/disk-arrays/ssds-to-benefit-from-machine-learning-in-data-storage.html

By changing the default read and write voltage of the NAND and various other flash register settings – either once or several times during the life of an SSD – it may be possible to increase the endurance of the storage device significantly. And in fact endurance is not the only factor that could be enhanced. Other settings could increase the performance of the SSD, and still others its data retention capabilities.

Ultimately, choosing these settings is an optimization problem: endurance may be improved by altering settings without affecting anything else, for example, but in most cases it can be improved only at the expense of performance or retention, or perhaps both. In practice the easiest trade-off is endurance versus retention. That's because many manufacturers choose settings that offer retention that can be measured in months, but in a data center environment an SSD may only be required to retain data for a day or two when the power is off. By altering the settings to lower the retention time big gains can often be made in endurance.

But here's the problem. Conventional 2D NAND may have 30 – 50 settings, and there is a highly complex interaction between them. That means that changing one can have a large and unexpected effect on another, making it very hard to optimize the settings manually to achieve a particular desired outcome. And when it comes to 3D NAND – the vertically stacked arrays of cells that most NAND makers are switching to – there can be thousands of settings. That makes the optimization fiendishly complex and practically impossible - for humans, at least.

And that's where machine learning in storage systems comes into the equation: humans may not be able to optimize the thousands of NAND settings in 3D NAND, but it's the type of exercise that machine learning systems excel at.

Using machine learning, Coyle says the company's technology is able to optimize the NAND's register settings for endurance or performance or data retention, or even produce a dynamic set of settings for a two-phase life: the first configuration optimized for performance, and then, when performance starts to drop as the NAND ages, it can be optimized for long-term storage.
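Purely as an illustration of the kind of search problem that article describes, here is a toy sketch (everything below is hypothetical: the settings vector, the scoring function and the hill-climbing search are mine, not the actual method of Phison or the company quoted above, which would characterize real NAND and likely use far more sophisticated learning):

```python
# Toy sketch: treat the NAND register settings as a vector of knobs and search
# for a configuration that favors endurance over retention. The evaluation
# function is a stand-in for actually characterizing flash with those settings.
import random

NUM_SETTINGS = 50  # 2D NAND reportedly has on the order of 30-50 settings

random.seed(0)
ENDURANCE_WEIGHTS = [random.uniform(-1, 1) for _ in range(NUM_SETTINGS)]
RETENTION_WEIGHTS = [random.uniform(-1, 1) for _ in range(NUM_SETTINGS)]

def evaluate(settings):
    """Hypothetical score weighting endurance heavily over retention."""
    endurance = sum(s * w for s, w in zip(settings, ENDURANCE_WEIGHTS))
    retention = sum(s * w for s, w in zip(settings, RETENTION_WEIGHTS))
    return 0.8 * endurance + 0.2 * retention

# Simple hill climbing: perturb one setting at a time, keep improvements.
best = [random.uniform(0, 1) for _ in range(NUM_SETTINGS)]
best_score = evaluate(best)
for _ in range(5000):
    candidate = best[:]
    i = random.randrange(NUM_SETTINGS)
    candidate[i] = min(1.0, max(0.0, candidate[i] + random.gauss(0, 0.1)))
    score = evaluate(candidate)
    if score > best_score:
        best, best_score = candidate, score

print(f"Best weighted score found: {best_score:.3f}")
```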
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
In practice the easiest trade-off is endurance versus retention. That's because many manufacturers choose settings that offer retention that can be measured in months, but in a data center environment an SSD may only be required to retain data for a day or two when the power is off. By altering the settings to lower the retention time big gains can often be made in endurance.

Intel's datacenter-oriented SSDs used eMLC and were rated at 10x the endurance of consumer SSDs. The trade-off was a retention time of only 3 months.

Intel's client SSDs are rated for 12 months of retention time.
 

Glaring_Mistake

Senior member
Mar 2, 2015
310
117
126
String stacking?

....Should probably have thought of that.
I shall blame the late hour.

Interestingly, Phison is incorporating machine learning into their SSD controllers:

[Image: http://media.bestofmicro.com/5/M/707818/original/QLC-Plan.jpg]


Some more info on why they are doing that:

http://www.taipeitimes.com/News/biz/archives/2017/09/25/2003679068





And here is some info on another company using machine learning with SSDs:

http://www.infostor.com/disk-arrays/ssds-to-benefit-from-machine-learning-in-data-storage.html

Interesting that controllers may get smarter and may even enable features like self-repair.

That you can trade retention for endurance is something I point out sometimes when endurance tests are brought up.
People tend to look at how many writes a drive can endure but neglect to consider how much the retention has suffered after all those writes.


Intel's datacenter-oriented SSDs used eMLC and were rated at 10x the endurance of consumer SSDs. The trade-off was a retention time of only 3 months.

Intel's client SSDs are rated for 12 months of retention time.

I think eMLC NAND gives up some retention for higher endurance, as you say, but are there no other Intel SSDs for datacenters using different NAND with the same retention time?
I would expect so, since JEDEC specifies a three-month retention time for enterprise-class SSDs.

One thing I find interesting, which is often mentioned regarding how they improve the endurance of eMLC NAND, is that write speeds are slowed down compared to regular MLC NAND.
It is not that endurance is improved simply because the drive will have written less than it would have otherwise.
Rather, if two drives both write the same amount of data but at different speeds, the one with the slower write speed will be less worn (everything else being equal).


Wow, that is amazing.

Did not expect 2D TLC NAND below 20nm to fare so well when compared to 3D TLC NAND, did you?
 
  • Like
Reactions: cbn

cbn

Lifer
Mar 27, 2009
12,968
221
106
Did not expect 2D TLC NAND below 20nm to fare so well when compared to 3D TLC NAND, did you?

I think I found the answer here:

https://www.micron.com/about/blogs/2015/may/addressing-data-retention-in-ssds

The notion that a NAND-based storage device will retain data for a bounded time period under very specific environmental conditions is not a new one and, in fact, it is very well-defined. What is also well-understood is that the data retention attribute changes over the lifetime of a NAND flash-based device. When a NAND flash device is new from the factory, it can retain data in an unpowered state (at specific temperature conditions) for many years. However, data retention specs are not given for a new device. It is very important to note that published specifications for data retention describe the behavior of the device only at the end of its prescribed service life.

TBW ratings are also referred to as endurance ratings. Endurance and data retention are very strongly linked. If the SSD in your server or data center has a lifetime endurance rating of 7 petabytes written, this does not mean that the SSD will fail when it writes that 8th petabyte. Rather, it means that when that 7th petabyte is written, the data retention in an unpowered state at a specific temperature (40°C or 104°F) is down to that three-month enterprise specification.

Similarly, for a client SSD, JEDEC set the data retention spec at one year, which allows the user much more time to go back to get data from an unused device, if needed. Again, keep in mind that this one-year spec only applies to the end of the SSD’s lifetime. When the SSD is new or of modest age, measured in TBW, the data retention is much longer. In our experience, it is very rare for the typical Windows or MacOS user to get anywhere near the TBW specification during the lifetime of the host computer system. In almost all client computing cases, the end user can be comfortable knowing that the SSD should have several years of data retention, until the drive approaches the end of the drive service life.

So I am thinking (all things being equal) the 3D TLC NAND in question was a lower bin than the 2D TLC NAND and was also optimized for performance and endurance rather than data retention.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Rather, if two drives both write the same amount of data but at different speeds, the one with the slower write speed will be less worn (everything else being equal).

So maybe an SSD controller with a higher number of channels could help? (re: more NAND used in parallel would more easily hit certain write targets)

Two examples I am thinking of right now:

1. PCIe 3.0 x 4 quad channel controller with four packages (each with four 64L 512Gb 3D TLC dies)

2. PCIe 3.0 x 4 octa channel controller with eight packages (each with four 64L 256Gb 3D TLC dies).

Of the two options above, the second one has 2x the parallelism at the same capacity. So for #2 I am thinking it could either have 2x the write speed of the first option at the same endurance and retention... or it could be tuned for the same level of write speed with either greater endurance or retention (or some combination of both greater endurance and greater retention).

If that is true, I also wonder how a PCIe 3.0 x 4 quad channel controller with four packages (each with four 64L 512Gb 3D MLC dies) would compare? Or maybe how a PCIe 3.0 x 4 sixteen channel controller would factor in (with various combinations of NAND)?
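Rough numbers for those two examples (the per-die write bandwidth below is just a placeholder I made up to show the scaling, not a real spec):

```python
# Compare raw capacity and die-level parallelism of the two hypothetical configs.
PER_DIE_WRITE_MBPS = 50  # assumed placeholder for per-die TLC program bandwidth

def summarize(name, packages, dies_per_package, die_gbit):
    dies = packages * dies_per_package
    capacity_gb = dies * die_gbit / 8           # Gbit -> GB (raw, before overprovisioning)
    aggregate_mbps = dies * PER_DIE_WRITE_MBPS  # upper bound from die parallelism alone
    print(f"{name}: {dies} dies, {capacity_gb:.0f} GB raw, ~{aggregate_mbps} MB/s")

summarize("#1 quad-channel, 4 packages x 4 x 512Gb", packages=4, dies_per_package=4, die_gbit=512)
summarize("#2 octa-channel, 8 packages x 4 x 256Gb", packages=8, dies_per_package=4, die_gbit=256)
# Both come out to 1024 GB raw, but #2 has twice the dies working in parallel,
# so (channel and interface limits aside) roughly twice the sustained write budget.
```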
 
Last edited:

Glaring_Mistake

Senior member
Mar 2, 2015
310
117
126
I think I found the answer here:

https://www.micron.com/about/blogs/2015/may/addressing-data-retention-in-ssds







So I am thinking (all things being equal) the 3D TLC NAND in question was a lower bin than the 2D TLC NAND and was also optimized for performance and endurance rather than data retention.

That depends; both drives were pretty worn when these tests were run (around 80% of P/E cycles used for the one using 3D TLC NAND and 90% for the one using 2D TLC NAND).
So both have seen some wear, but the number of P/E cycles used differs since they are not specified for the same number of P/E cycles, with the drive using 3D TLC NAND (unsurprisingly) being the one with the higher endurance.
The same 3D TLC NAND is supposedly able to handle significantly more wear than this particular drive is specified for, so it may not be the best-binned NAND.
At the same time this was a pretty steep drop, so we'll see whether even this somewhat more conservative rating will save it from any corruption.
It may be the other way around with the drive using 2D TLC NAND, however, since it still holds up pretty well.
In fact it performed about as well at 90% wear as it did before even a single P/E cycle was consumed, which is not something you would expect from 2D TLC NAND (at such a small lithography).

Also worth mentioning is that the drive using 3D TLC NAND was worn out more quickly (from 0% straight to about 80%) while the other had retention tests at 25%, 50% and 75% before reaching 90% wear, and as mentioned that might have worked in favor of the drive using 2D TLC NAND.
Not sure how much of a difference it would have made though.

And of course there's also the fact that there are several factors determining how a drive is going to behave.
For example, 2D FG (floating gate) does not behave like 3D CT (charge trap), or even like 3D FG.
Even just the controller and how well tuned its algorithms are can make a pretty big difference.

So maybe an SSD controller with a higher number of channels could help? (re: more NAND used in parallel would more easily hit certain write targets)

Two examples I am thinking of right now:

1. PCIe 3.0 x 4 quad channel controller with four packages (each with four 64L 512Gb 3D TLC dies)

2. PCIe 3.0 x 4 octa channel controller with eight packages (each with four 64L 256Gb 3D TLC dies).

Of the two options above, the second one has 2x the parallelism at the same capacity. So for #2 I am thinking it could either have 2x the write speed of the first option at the same endurance and retention... or it could be tuned for the same level of write speed with either greater endurance or retention (or some combination of both greater endurance and greater retention).

If that is true, I also wonder how a PCIe 3.0 x 4 quad channel controller with four packages (each with four 64L 512Gb 3D MLC dies) would compare? Or maybe how a PCIe 3.0 x 4 sixteen channel controller would factor in (with various combinations of NAND)?

I think that should work, though I'm not entirely sure of exactly how it works.

Here's some info related to that however:

"Regarding point two, it is known for NAND Flash memory that damage created with each p/e cycle partially recovers or heals during the delays between p/e cycles (see JESD22-A117).
Therefore, an endurance stress test that is performed over just a few weeks results in more net damage than would be experienced in normal use over several years.
The main effect of this higher net damage is to reduce the data retention capability compared to the capability that would exist in real use."

JESD218B-01

"The degradation rate of EEPROM products may depend strongly on the cycling frequency.
That is because some cycling-induced damage mechanisms exhibit partial recovery in between cycles; increasing the cycling rate may prevent that recovery and lead to early failures.
Typical recoverable degradation mechanisms are the detrapping of charge trapped during cycling in the transfer-dielectric layer of floating
gate devices, or detrapping of excess trapped charge in trapping-based non-volatile memories."

JESD22-A117C
 
  • Like
Reactions: cbn

Glaring_Mistake

Senior member
Mar 2, 2015
310
117
126
I noticed something about the Intel consumer SSDs using 3D TLC NAND a while ago.
That is, the UBER for the Intel 545s, 600P and 760P is <1 sector per 10^15 bits read.
Yet, for example, the Intel 540s (using the same SK Hynix 16nm TLC NAND found in the drive that had trouble with voltage drift according to VirtualLarry) has an UBER of <1 sector per 10^16 bits read.

I find it a bit odd that SSDs using their 3D TLC NAND (which should barely need any ECC, according to TheMemoryGuy) should have a worse UBER rating than one using third-party (2D) 16nm TLC NAND.
Relaxed ECC?
Maybe they consider an UBER of <1 sector per 10^15 bits read sufficient for a consumer drive?

Anyway, I don't know why the UBER has worsened, but I noticed that and thought it was interesting since Intel used to have an UBER of <1 sector per 10^16 bits read for their consumer drives.
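For a sense of scale, those ratings convert to roughly the following amounts of data read per expected uncorrectable sector (simple unit conversion on my part, nothing vendor-specific):

```python
# Convert UBER ratings of 1 error per 10^15 and 10^16 bits read into terabytes read.
for exponent in (15, 16):
    terabytes = 10 ** exponent / 8 / 1e12
    print(f"<1 sector per 10^{exponent} bits read ~= one error per {terabytes:.0f} TB read")
# 10^15 bits -> ~125 TB read; 10^16 bits -> ~1250 TB read.
```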


Of course it seems that UBER may not be quite 100% accurate: "The number of SSDs in the sample shall be sufficient to establish that both the FFR and UBER requirements are met at 60% confidence."
JESD218B-01

I don't know what degree of confidence is required in tests of other components, but odds only a bit better than a coin toss don't instill me with a lot of confidence.



Also, finally got my hands on an Intel 545s 128GB.
It will take a while before I get any results regarding voltage drift, but I've tested it a bit with AS-SSD and CDM.
I have noticed that the SLC cache can behave a bit oddly.

It's a bit small (I'd estimate about 1GB), but sequential write speeds vary depending on how you test it, or even between runs of the same test.
But just look at this result with CDM:

[Image: CrystalDiskMark result (20180210171856Crysta.png)]


Not even 400MB/s with a 50MiB test size?
That is... poor.
 
  • Like
Reactions: cbn