
Two technological innovations that could reduce the need for NAND?

cbn

Lifer
Mar 27, 2009
12,968
221
106
Multi-layer and Multi-bit 3DXpoint (reducing need for NAND at the top end) plus Hard drives using a form of internal RAID (reducing need for NAND at the low end)?

https://www.anandtech.com/show/1120...-dive-into-3d-xpoint-enterprise-performance/2

This frees up 3D XPoint to use a multi-layer structure, though not one that is as easy to manufacture as 3D NAND flash. This initial iteration of 3D XPoint uses just two layers and provides a per-die capacity of 128Gb, a step or two behind NAND flash but far ahead of the density of DRAM. 3D XPoint is currently storing just one bit per memory cell while today's NAND flash is mostly storing two or three bits per cell. Intel has indicated that the technology they are using, with sufficient R&D, can support more bits per cell to help raise density.

https://patents.google.com/patent/US6546499B1/en

The present invention relates in general to using data management and storage techniques and concepts from Redundant Array of Independent Disks (RAID) technology and incorporating these techniques and concepts into a single disk drive and in particular to providing and using a redundant array of inexpensive platters (RAIP) within a single disk drive.

The RAID technology provides excellent solutions for storage and high performance access of data. However, the use of multiple disk drives, at times and instances, may be cost prohibitive, expensive, and infeasible in implementing the RAID methodology for deriving the advantages therefrom for desired applications and purposes. Thus, it would be highly desired at these times and instances to incorporate the RAID concepts and techniques into a single disk drive, particularly for providing the cost advantages of using less disk drives.

Of the two techs I listed I think number 1 (the Multi-layer Multi-bit 3DXpoint) is far more likely, but at the same time I feel like number 2 (the "Internal RAID" for hard drives) would have a bigger impact. (re: With 18 heads, a 3.5" hard drive with 9 platters should be able to saturate an x2 wide port SAS 24Gbps link).

P.S. SAS 24 Gbps has more usable bandwidth per lane than NVMe over PCIe 4.0 (19.2 Gbps vs 15.8 Gbps per lane), though IOPS (even if increased by the internal RAID inside the hard drive) will still be much lower.
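For anyone checking the per-lane numbers: they fall out of line rate times encoding efficiency. A rough sketch, assuming SAS-4 ("24G") signals at 22.5 Gbaud with 128b/150b encoding and PCIe 4.0 runs at 16 GT/s with 128b/130b encoding. The 250 MB/s per-head sequential rate is only an illustrative assumption, not a measured figure, so whether 18 heads actually saturate the x2 wide port depends on what each head can really stream.

```python
# Usable bandwidth per lane = raw line rate * encoding efficiency.
def usable_gbps(line_rate_gbaud: float, payload_bits: int, total_bits: int) -> float:
    return line_rate_gbaud * payload_bits / total_bits

sas4_lane = usable_gbps(22.5, 128, 150)   # SAS-4 "24G" (assumed 128b/150b)
pcie4_lane = usable_gbps(16.0, 128, 130)  # PCIe 4.0 (128b/130b)

# Hypothetical drive: 18 heads streaming at once.
PER_HEAD_MB_S = 250  # assumed sequential rate per head (illustrative only)
HEADS = 18
drive_gbps = HEADS * PER_HEAD_MB_S * 8 / 1000  # MB/s -> Gbps

wide_port_gbps = 2 * sas4_lane  # x2 wide port

print(f"SAS-4 lane:  {sas4_lane:.1f} Gbps")
print(f"PCIe 4 lane: {pcie4_lane:.2f} Gbps")
print(f"18 heads:    {drive_gbps:.1f} Gbps vs x2 wide port {wide_port_gbps:.1f} Gbps")
```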
 

whm1974

Diamond Member
Jul 24, 2016
9,436
1,571
126
I think RAID in a single HDD would be more prone to failure. Personally I think 3D Xpoint could replace HDDs entirely if Intel and Micron can bring up the size and reduce the price to produce affordable SSDs.
 
Feb 25, 2011
16,992
1,621
126
w/r/t the old patent application for the unfortunately-named "RAIP" technology, it would increase fault tolerance, but actually decrease performance.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
whm1974 said:
Personally I think 3D Xpoint could replace HDDs entirely if Intel and Micron can bring up the size and reduce the price to produce affordable SSDs.

I don't think that would happen any time soon (re: I don't know how many layers and bits Intel/Micron 3DXpoint can scale to, or how fast. Also, hard drives have platter density increases coming in the future through MAMR), but one interesting thing that could happen is the integration of hard drive and 3DXpoint through the controller (eg, SSHD controller using internal platter RAID + 3DXpoint for read cache).

Platters (using internal RAID) = for larger file sizes and optimized for sequential read and write.
3DXpoint (read cache) = for small file sizes (held in 3DXpoint by an algorithm) optimized for random read.

Some info below on how the solid state read cache works on current NAND based SSHDs:

https://www.anandtech.com/show/3734/seagates-momentus-xt-review-finally-a-good-hybrid-hdd

The size of the NAND was a shocker to me when I first heard it. I honestly expected something much larger. In the Momentus XT however, the SLC NAND acts exclusively as a read cache - writes never touch the NAND. The drive looks at access patterns over time (most likely via a history table of LBAs and their frequency of access) and pulls some data into the NAND. If a read request comes in for an LBA that is present in the NAND, it's serviced out of the 4GB chip. If the LBA isn't present in the NAND, the data comes from the platters.
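To make the quoted description concrete, here is a toy sketch of a read-only cache driven by a history table of LBAs and their access counts. This is only a guess at the general shape, not Seagate's actual algorithm: the class name, the eviction rule (replace the coldest cached LBA when a hotter one shows up) and the invalidate-on-write behavior are all assumptions made for illustration.

```python
from collections import Counter

class ReadCache:
    """Toy read-only cache: frequently read LBAs get copied into flash."""

    def __init__(self, capacity_lbas: int):
        self.capacity = capacity_lbas
        self.history = Counter()  # LBA -> number of reads seen so far
        self.cached = set()       # LBAs currently held in flash

    def read(self, lba: int) -> str:
        self.history[lba] += 1
        source = "flash" if lba in self.cached else "platters"
        self._promote(lba)
        return source

    def write(self, lba: int) -> str:
        # Writes never go to flash; drop any stale cached copy (assumption).
        self.cached.discard(lba)
        return "platters"

    def _promote(self, lba: int) -> None:
        if lba in self.cached:
            return
        if len(self.cached) < self.capacity:
            self.cached.add(lba)
            return
        # Cache is full: evict the coldest entry only if this LBA is hotter.
        coldest = min(self.cached, key=lambda x: self.history[x])
        if self.history[lba] > self.history[coldest]:
            self.cached.remove(coldest)
            self.cached.add(lba)

cache = ReadCache(capacity_lbas=2)
results = [cache.read(lba) for lba in (7, 7, 9, 7, 3, 3, 3)]
print(results)
```

The first read of any LBA always comes from the platters; only repeat reads can be served from flash, which is why a small cache can still help if the hot set is small.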

P.S. I found out using the link from this post, that 3DXpoint can actually be designed in at least two different ways:

http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1="20160276022".PGNR.&OS=DN/20160276022&RS=DN/20160276022

Some embodiments include architectures in which two or more memory array decks are vertically stacked. One or more of the stacked decks is configured to have different operational characteristics relative to others of the stacked decks. For instance, one or more of the decks may be configured to have rapid access times suitable for utilization in XIP (execute in place) applications and/or dynamic random access memory (DRAM) emulation applications, and one or more others of the decks may be configured to have stabile, possibly slower access, storage suitable for utilization in long-term storage applications. Further, one or more of the decks may be configured to have more endurance than others of the decks. For instance, one or more of the decks may be suitable for a lifetime of approximately 100,000 cycles, whereas one or more others of the decks may be suitable for about 1,000,000 cycles (in other words, at least one of the decks may have a durability of at least about 10-fold more cycling times than another of the decks). The difference between the endurance of the decks may result from structural differences between the decks. For instance, a deck with higher endurance may have reduced thermal disturb and/or other memory-loss mechanisms as compared to a deck with less endurance. However, the deck with less endurance may have other advantages (for instance, faster access times, etc.) as compared to the deck with higher endurance. Accordingly, each memory array deck may be tailored for applicability relative to specific memory functions.

So a high endurance layer design with slower access time vs. low endurance layer design with faster access time.

So maybe Intel/Micron could deploy chips with lower endurance layers but fast access time as read cache for hard drives? (And other consumer electronics/PCs)
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
w/r/t the old patent application for the unfortunately-named "RAIP" technology

I don't like that acronym either.

How about RAIS3D? Redundant array independent surfaces, 3DXpoint (This is for the combined controller system of platter surfaces and solid state memory)
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Or how about RAIS3D IOPS?

Redundant array independent surfaces, 3DXpoint increasing operations per second?

Example: In an 18-head, 9-platter helium 7200rpm 3.5" hard drive, having every platter surface read or written by its head simultaneously would hypothetically increase read and write IOPS (and sequential throughput) by 18x compared to a conventional hard drive (where only one head can read or write at a time). This assumes RPM and platter density do not change. If platter density and/or RPM had to decrease, IOPS (and sequential throughput) would decrease proportionally.

Then 3DXpoint (as read cache) increases read IOPS (well beyond the hard drive platter surface IOPS) for anything stored on it.
 
Feb 25, 2011
16,992
1,621
126
Or how about RAIS3D IOPS?

Redundant array independent surfaces, 3DXpoint increasing operations per second?

So now it's an SSHD with an even faster cache? Woohoo. :-|

Example: In an 18-head, 9-platter helium 7200rpm 3.5" hard drive, having every platter surface read or written by its head simultaneously would hypothetically increase read and write IOPS (and sequential throughput) by 18x compared to a conventional hard drive (where only one head can read or write at a time).

I'm pretty sure we've had this simultaneous head read/write discussion before. It was tried. It's impractical.

http://www.tomshardware.com/news/seagate-hdd-harddrive,8279.html

Also, it wouldn't increase IOPS: the nine heads for nine platters don't move independently, they're on a shared actuator. Actual random I/O would be the same.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I'm pretty sure we've had this simultaneous head read/write discussion before. It was tried. It's impractical.

http://www.tomshardware.com/news/seagate-hdd-harddrive,8279.html

That is something different. What I am thinking of (with my previous example) is 18 heads (on 9 platters, each with two sides) reading or writing simultaneously via one actuator.

Also, it wouldn't increase IOPS: the nine heads for nine platters don't move independently, they're on a shared actuator. Actual random I/O would be the same.

IOPS would increase, but there would be wasted storage for small files since each sector is 4K and "the sector size represents the minimum amount of capacity that will be consumed, even if the file being written is smaller".

http://www.tomshardware.com/reviews/advanced-format-4k-sector-size-hard-drive,2759.html

the sector size represents the minimum amount of capacity that will be consumed, even if the file being written is smaller.

So (for example) instead of putting a single 3.5K file in a single 4K sector (as a conventional hard drive with only one active head would), the hard drive with 9 platters and 18 simultaneously active heads would have the same 3.5K spread over 18 4K sectors (each sector on a different platter surface).
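The wasted space in that example is easy to put a number on. A minimal sketch, assuming the file is split evenly across the surfaces and each surface rounds its share up to whole 4K sectors (the function name and the even-split rule are my own assumptions):

```python
import math

SECTOR = 4096  # bytes, Advanced Format sector size

def consumed_bytes(file_size: int, surfaces: int = 1) -> int:
    """Capacity consumed when a file is striped evenly over `surfaces`,
    with each surface rounding its share up to whole sectors."""
    share = math.ceil(file_size / surfaces)
    sectors_per_surface = max(1, math.ceil(share / SECTOR))
    return surfaces * sectors_per_surface * SECTOR

file_size = 3584  # a 3.5K file
single = consumed_bytes(file_size, surfaces=1)
striped = consumed_bytes(file_size, surfaces=18)

print(f"one active head: {single // 1024}K consumed")
print(f"18 active heads: {striped // 1024}K consumed")
```

So the 3.5K file goes from occupying 4K to occupying 72K. For large files the overhead shrinks, since at most one partly filled sector per surface is wasted.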

 
Feb 25, 2011
16,992
1,621
126
That is something different. What I am thinking of (with my previous example) is 18 heads (on 9 platters, each with two sides) reading or writing simultaneously via one actuator.



IOPS would increase, but there would be wasted storage for small files since each sector is 4K and "the sector size represents the minimum amount of capacity that will be consumed, even if the file being written is smaller".

http://www.tomshardware.com/reviews/advanced-format-4k-sector-size-hard-drive,2759.html

So (for example) instead of putting a single 3.5K file in a single 4K sector (as a conventional hard drive with only one active head would), the hard drive with 9 platters and 18 simultaneously active heads would have the same 3.5K spread over 18 4K sectors (each sector on a different platter surface).

IOPS would not increase, since with a single actuator, if you're doing a truly random workload, you're still limited by the speed of the actuator and the RPM of the HDD. (Reading one sector from one spot or 1 sector striped across 18 sectors from 18 spots at the same time, would take the exact same amount of time. Seek time is the same, etc.)

But it's a moot point. Data density has gotten to the point where you can't reliably align more than one head at a time using a single actuator, which is mostly why that wouldn't work.
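A back-of-the-envelope model of the seek-bound argument above, assuming each random request costs one average seek plus half a rotation. The 8.5 ms seek time is an assumed typical figure for a 7200rpm drive, not a spec for any particular model, and transfer time is ignored.

```python
def random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random IOPS: one request per average seek plus half a rotation."""
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)

AVG_SEEK_MS = 8.5  # assumed average seek time (illustrative)

one_head = random_iops(AVG_SEEK_MS, 7200)

# 18 heads on a shared actuator still pay one seek and one rotational wait
# per request, so the per-request service time does not change.
eighteen_heads_shared = random_iops(AVG_SEEK_MS, 7200)

# 18 separate drives can each work on a different request at once
# (best case: enough queued requests, spread evenly over the drives).
eighteen_drives = 18 * one_head

print(f"single head:           {one_head:.0f} IOPS")
print(f"18 heads, 1 actuator:  {eighteen_heads_shared:.0f} IOPS")
print(f"18 drives (best case): {eighteen_drives:.0f} IOPS")
```

In this model the only way to raise random IOPS is to cut seek or rotational latency, or to have more independent actuators working in parallel.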
 

whm1974

Diamond Member
Jul 24, 2016
9,436
1,571
126
Personally I'd rather have a full-blown SSD anyway so we can get the full speed benefits.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
IOPS would not increase, since with a single actuator, if you're doing a truly random workload, you're still limited by the speed of the actuator and the RPM of the HDD. (Reading one sector from one spot or 1 sector striped across 18 sectors from 18 spots at the same time, would take the exact same amount of time. Seek time is the same, etc.)

The speed of the actuator and RPM of the HDD would also be limiting factors for 18 hard drives in RAID-0, but IOPS do increase for that by 18x:

http://www.thecloudcalculator.com/calculators/disk-raid-and-iops.html

But it's a moot point. Data density has gotten to the point where you can't reliably align more than one head at a time using a single actuator, which is mostly why that wouldn't work.

After doing a Google search using "Data density has gotten to the point where you can't reliably align more than one head at a time using a single actuator" I did find this thread (from 2005) with two really informative posts by Mark R.

So unless things have changed, you are right, it won't work. (EDIT: For anyone reading this for the first time see post #14 for some new tech I think might change the situation)
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Personally I'd rather have a full-blown SSD anyway so we can get the full speed benefits.

I think SSDs are great devices, but the NAND supply is getting so used up by mobile devices now.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
More or less. Because there are 18 drives with 18 actuators that can do 18 different random things at the same time.

Micro Actuators for each head do exist:

https://www.hgst.com/sites/default/files/resources/HGST-Micro-Actuator-TB.pdf (produced 11/15. Revised 9/17)

When differential voltage is applied to the HMA, one piezo element expands as the other contracts. This action causes a slight rotational motion of the read-write head.

NOTE: HMA = HGST Micro actuator

Previous to this there were Milli actuators:

https://www.hgst.com/sites/default/files/resources/WP_DSA.pdf (produced 5/13. Revised 10/13)

When voltage is applied to the MA, one piezo element expands as the other contracts. This action causes a slight – less than one millionth of a meter -- but exquisitely controlled motion of the read-write head. Since the MA’s stroke at the head element is so short and the moving mass so small and light, this element’s vibrational resonance frequency is much higher than that of the VCM single-stage actuator. As a result, the DSA can rapidly and accurately position the head element over the correct data track

NOTE: MA = Milli actuator



So with this tech (based on what I know) I could imagine multiple heads being positioned independently even with just the single actuator arm (If not with Micro actuation, then Milli actuation).
 
Feb 25, 2011
16,992
1,621
126
Micro Actuators for each head do exist:

https://www.hgst.com/sites/default/files/resources/HGST-Micro-Actuator-TB.pdf (produced 11/15. Revised 9/17)



NOTE: HMA = HGST Micro actuator

Previous to this there were Milli actuators:

https://www.hgst.com/sites/default/files/resources/WP_DSA.pdf (produced 5/13. Revised 10/13)



NOTE: MA = Milli actuator



So with this tech (based on what I know) I could imagine multiple heads being positioned independently even with just the single actuator arm (If not with Micro actuation, then Milli actuation).

That would make drives mechanically more complex and more prone to failure. It would address the alignment issue and allow simultaneous r/w from multiple platters, but not the random workload problem. (So, no usable increase in IOPS.)
 

beginner99

Diamond Member
Jun 2, 2009
5,318
1,763
136
The HDD internal RAID doesn't solve the HDD's main problem under client workloads: slow random reads and writes at low queue depth. HDDs have terrible latency in this area because they are mechanical. SSDs are a lot better but are also fairly slow, bandwidth-wise, at low queue depth. 3DXpoint here is the clear winner as it is a lot faster than SSDs at low queue depth which is ideal for consumer workloads.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
It would address the alignment issue and allow simultaneous r/w from multiple platters, but not the random workload problem. (So, no usable increase in IOPS.)

If a number of hard drives in RAID-0 increase IOPS, then why would a number of platter surfaces in "RAID-0" not increase IOPS? (Both are basically the same thing....having essentially the same latency and seek time).
 
Feb 25, 2011
16,992
1,621
126
If a number of hard drives in RAID-0 increase IOPS, then why would a number of platter surfaces in "RAID-0" not increase IOPS? (Both are basically the same thing....having essentially the same latency and seek time).

It's about randomness.

Let's say you have 500GB of SQL data striped across eight drives. If you pick eight random pieces of data, chances are pretty good they're spread across five or six different drives, in 8 completely different platters/tracks/sectors. Which means the drives can go get them all at almost the same time. Add in mirroring (like a RAID-10) and you're almost certainly going to be able to get eight pieces of data from eight different drives, simultaneously. Stack up a few million random requests so we can queue them and execute them as fast as possible, and we get near-perfect IOPS scaling. (Hooray!)

If the data is on a single drive, as you know, 8 requests is 8 operations is 8 movements of the actuator, etc. It occurs in serial.
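The randomness argument can be put in numbers. A small sketch, assuming requests land on drives uniformly at random (which striping roughly gives you); the function name is made up for illustration:

```python
def expected_busy_drives(drives: int, requests: int) -> float:
    """Expected number of distinct drives hit by uniformly random requests."""
    return drives * (1 - (1 - 1 / drives) ** requests)

shallow = expected_busy_drives(8, 8)    # 8 outstanding requests on 8 drives
deep = expected_busy_drives(8, 64)      # a deeper queue
single = expected_busy_drives(1, 8)     # one drive: everything is serial

print(f"8 drives, 8 requests:  {shallow:.2f} drives busy on average")
print(f"8 drives, 64 requests: {deep:.2f} drives busy on average")
print(f"1 drive, 8 requests:   {single:.2f} drive busy")
```

With exactly eight outstanding requests, a bit over five drives are busy on average; with a deep queue it is effectively all eight, which is where the near-linear IOPS scaling of an array comes from. A single drive never gets past one.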

Now, if you use micro-actuators to address the alignment problem and read entire cylinders (tossing that link in here since we haven't used the term cylinder yet - sorry if you already knew it) at once, that's great, but what are the odds that more than one of the chunks of data you want is stored in that cylinder? Very, very small. So now you have to reposition the heads and read that data. You've essentially "welded" the multiple platters into one ultradense platter with huge sectors. You get a boost to sequential speeds, but sequential speeds aren't really the reason people prefer SSDs (as we explain repeatedly every time somebody starts yet another "what about raid-0 with 10k drives" thread.)

If you give the actuators more freedom of movement, so they can read from different tracks, you're still somewhat restricted - if head #1 has to read something from track 100, then the other heads are only able to read, say, tracks 95-105 on their platters, for the particular sector. You've increased the total number of options, and might see a small boost in performance sometimes, but as a % of the total data on the drive, it's still very small. More importantly, there's no change at all to "worst case" (which is another way of saying "guaranteed") performance, which is what keeps most people up at night.

And you've done a lot of engineering work and increased the cost of your drive a lot, to help increase a performance metric that nobody really cares about anymore, since your performance-oriented customers are going to go buy SSDs and your budget customers are using their HDD arrays for cold storage, so they just want the best $/TB.
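To put a rough number on the track-window point above (head #1 on track 100, the others limited to tracks 95-105): a sketch, assuming requests are uniformly distributed over the tracks. The 400,000 tracks per surface is a made-up round figure for illustration, and the function name is hypothetical.

```python
def piggyback_probability(window_tracks: int, total_tracks: int, queued: int) -> float:
    """Chance that at least one of `queued` random requests falls inside the
    track window the other heads can reach without moving the main actuator."""
    p_single = window_tracks / total_tracks
    return 1 - (1 - p_single) ** queued

TOTAL_TRACKS = 400_000  # assumed tracks per surface (illustrative only)
WINDOW = 11             # tracks 95-105 around track 100

chance = piggyback_probability(WINDOW, TOTAL_TRACKS, queued=32)
print(f"chance of a free extra read with 32 queued requests: {chance:.4%}")
```

Even with 32 requests queued, the chance that any of them can ride along is well under 0.1% in this model, so the average gain is tiny and the worst case does not move at all.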
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
If a number of hard drives in RAID-0 increase IOPS, then why would a number of platter surfaces in "RAID-0" not increase IOPS? (Both are basically the same thing....having essentially the same latency and seek time).

It's about randomness.

Let's say you have 500GB of SQL data striped across eight drives. If you pick eight random pieces of data, chances are pretty good they're spread across five or six different drives, in 8 completely different platters/tracks/sectors. Which means the drives can go get them all at almost the same time. Add in mirroring (like a RAID-10) and you're almost certainly going to be able to get eight pieces of data from eight different drives, simultaneously. Stack up a few million random requests so we can queue them and execute them as fast as possible, and we get near-perfect IOPS scaling. (Hooray!)

If the data is on a single drive, as you know, 8 requests is 8 operations is 8 movements of the actuator, etc. It occurs in serial.

The way I was thinking about this all data (even the smallest files) would be equally split (written) across all the platters.

This way the data stays "synchronized". Otherwise, I can imagine subsequent data written beginning to deviate further from the actuator arm on different platters more and more as the drive fills up. (ie, data that should be in the same cylinder starts getting split into different cylinders)

z_q_cylinder.gif
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
You've essentially "welded" the multiple platters into one ultradense platter with huge sectors.

That is basically what RAID-5 does because data is written to all the drives in the array. (And I thought RAID-0 did the same thing in all cases).

Except in this case the "welded" multiple platters are each located in a different drive rather than all in the same drive.

And Read IOPS are increased.
 
Feb 25, 2011
16,992
1,621
126
That is basically what RAID-5 does because data is written to all the drives in the array. (And I thought RAID-0 did the same thing in all cases).

Except in this case the "welded" multiple platters are each located in a different drive rather than all in the same drive.

And Read IOPS are increased.

No, because the individual drives in a RAID stripe can read from completely different parts of the data set at the same time, and the RAID controller knows which parts of which stripe on which drive hold the requested data. So it isn't going to tell 8 drives to read a chunk of data when it's going to turn around and drop seven of those chunks on the floor - it's going to assign queued tasks to the other drives.
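A minimal sketch of what "the controller knows which drive has the data" looks like for plain RAID-0. The stripe unit of 16 blocks, the function names, and the simple round-robin layout are assumptions for illustration; real controllers differ in the details.

```python
from collections import defaultdict

def locate(lba: int, drives: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive)."""
    stripe_unit = lba // stripe_blocks   # which stripe unit overall
    drive = stripe_unit % drives         # units rotate across the drives
    offset = (stripe_unit // drives) * stripe_blocks + lba % stripe_blocks
    return drive, offset

def dispatch(lbas: list[int], drives: int = 8, stripe_blocks: int = 16) -> dict[int, list[int]]:
    """Group queued requests into per-drive queues; only the owning drive seeks."""
    queues = defaultdict(list)
    for lba in lbas:
        drive, offset = locate(lba, drives, stripe_blocks)
        queues[drive].append(offset)
    return dict(queues)

queues = dispatch([5, 16, 130, 1000])
print(queues)
```

A small read touches exactly one drive, so the other drives stay free to service other queued requests. That is the difference from heads on a shared actuator, which all have to go wherever the arm goes.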
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Assuming "Internal RAID" hard drives became available what impact do you think that would have on caching strategy?

For example, I noticed this program allows a maximum size for files to be cached.

Smallfilecache.Maxsize - This will be the maximum allowed size of any file to be cached. This is currently set to 3MB and is the recommended size.
 

sdifox

No Lifer
Sep 30, 2005
100,364
17,925
126
Controller level caching is going to be cheaper and more reliable than this idea.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
So now it's an SSHD with an even faster cache? Woohoo. :-|

What they can do is rather than having regular hard drives with 256MB DRAM-based buffers, they can move to a 3D XPoint based buffer at 1-2GB.

3DXpoint here is the clear winner as it is a lot faster than SSDs at low queue depth which is ideal for consumer workloads.

They can also replace DRAM and SLC caching by using 3D XPoint caching instead.

The limit to that is really whether IMFT is willing to make 3D XPoint dies at 8Gbit sizes. They are at 128Gbit for the smallest one now. It seems manufacturers use older deprecated DDR2 or DDR3 chips for the buffers, so maybe in about a decade we'll see hard drive and SSD vendors use 8GB first generation 3D XPoint chips for buffers, with their 80TB hard drives and 15TB SSDs.
 