Why SSD RAID-0 does not scale in 4k random read

tunggad

Junior Member
May 14, 2010
3
0
0
We have seen a lot of people experimenting with RAID-0 on SSDs (from the cheap Intel X25-V up to the high-end Crucial RealSSD C300), with 2, 3, 4 or more disks.

While the other benchmark numbers (seq. read/write, 4k random write) do increase/scale with the number of disks in the RAID-0 array, the 4k RANDOM READ result (queue depth = 1) always stays about the same as with a single disk, or only increases minimally.

Can someone give some reasons or an explanation for this behavior?

In the SSD world, isn't 4k random read generally the most important of the four operations (the others being seq. read/write and 4k random write)?
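
For reference, this is roughly what I mean by 4k random read at queue depth 1 (a minimal Python sketch, not a real benchmark tool; the test file path is just a placeholder, and unlike AS SSD or CrystalDiskMark it does not bypass the OS page cache, so it only illustrates the serial nature of QD=1):

Code:
# Queue depth 1: each 4 KiB read is issued only after the previous one has
# completed, so at most one array member is ever busy.
import os, random, time

PATH = "testfile.bin"   # placeholder: a large file on the RAID-0 volume
BLOCK = 4096            # 4 KiB request size
COUNT = 10000

fd = os.open(PATH, os.O_RDONLY)
blocks = os.fstat(fd).st_size // BLOCK

start = time.perf_counter()
for _ in range(COUNT):
    offset = random.randrange(blocks) * BLOCK
    os.pread(fd, BLOCK, offset)
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{COUNT / elapsed:.0f} IOPS, {COUNT * BLOCK / elapsed / 1e6:.1f} MB/s")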
 

GlacierFreeze

Golden Member
May 23, 2005
1,125
1
0
Can someone give some reasons or an explanation for this behavior?

In the SSD world, isn't 4k random read generally the most important of the four operations (the others being seq. read/write and 4k random write)?

Dunno but the thread title insinuates that you were gonna tell us.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
raid-0 improves bandwidth (how quickly the 4KB file is transferred once it has been located) but does nothing to improve latency (the time it takes the controller to locate the 4KB file).

This is true whether you are on SSD or spindle.

4KB file bandwidth (aka IOPs) improved with SSD over spindle mostly because the latency improved, for all files.

If you want the latency to improve further, as is necessary for small-file transfer speeds to increase, then you need to improve the memory storage technology as well as the manner in which you access it.

This is what proprietary hardware like the fusion-IO guys bring to the table and is why there is a cost premium involved.

(and the technical truth is that effective latency actually degrades ever so slightly with raid-0, because you add the time-penalty overhead of the extra controller being in the loop as well; the SSD latency doesn't change, but as far as your cpu is concerned it does take a wee bit longer to get that 4KB file off a raid controller connected to the SSD versus getting it off the SSD directly)
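
To put rough numbers on that (the latency and transfer rates below are purely illustrative assumptions, not measurements of any particular drive):

Code:
# Service time of one QD=1 4KB random read = access latency + transfer time.
# With SSDs the latency term dominates, so striping (which only raises the
# transfer rate) barely moves the QD=1 IOPS number.
LATENCY = 100e-6    # assumed ~0.1 ms flash/controller access latency
BLOCK = 4096        # 4 KiB

for bandwidth in (250e6, 500e6, 1000e6):   # e.g. 1, 2 and 4 striped drives
    service = LATENCY + BLOCK / bandwidth
    print(f"{bandwidth / 1e6:5.0f} MB/s stripe -> {1 / service:5.0f} IOPS at QD=1")

Quadrupling the stripe bandwidth only buys roughly 10-15% more QD=1 IOPS in this toy calculation, because the latency term never shrinks.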
 

tunggad

Junior Member
May 14, 2010
3
0
0
raid-0 improves bandwidth (how quickly the 4KB file is transferred once it has been located) but does nothing to improve latency (the time it takes the controller to locate the 4KB file).

This is true whether you are on SSD or spindle.

4KB file bandwidth (aka IOPs) improved with SSD over spindle mostly because the latency improved, for all files.

If you want the latency to improve further, as is necessary for small-file transfer speeds to increase, then you need to improve the memory storage technology as well as the manner in which you access it.

This is what proprietary hardware like the fusion-IO guys bring to the table and is why there is a cost premium involved.

(and the technical truth is that effective latency actually degrades ever so slightly with raid-0, because you add the time-penalty overhead of the extra controller being in the loop as well; the SSD latency doesn't change, but as far as your cpu is concerned it does take a wee bit longer to get that 4KB file off a raid controller connected to the SSD versus getting it off the SSD directly)

Ok, thank you for the clear explanation.

With QUEUE DEPTH = 1 it does not matter how many disks we have in the RAID-0 array, because only one 4k request is sent to the array at a time and that request can only be processed by one disk of the array. Therefore it takes as long as: RAID controller overhead + access latency of one disk. That's why RAID-0 does not improve 4k random read with QD=1.

But SSD RAID-0 should improve 4k random read with a higher queue depth (4, 8, 16), right?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
But SSD RAID-0 should improve 4k random read with a higher queue depth (4, 8, 16), right?

Yes, that is correct (see the top three results in the graph below).

[Image: AS-SSD-Benchmark_Results.jpg]

http://benchmarkreviews.com/index.p...sk=view&id=513&Itemid=60&limit=1&limitstart=4
 

sub.mesa

Senior member
Feb 16, 2010
611
0
0
Some results of a RAID0 with 5 Intel X25-V 40GB, random read on FreeBSD:

This benchmark reads with request sizes between 4KiB and 128KiB; thus the average request size or 'file size' is a little over 64KiB. For SSDs, that means it's more of a sequential read benchmark; HDDs would have much more trouble with this.

1-queue = 240MB/s
2-queue = 430MB/s
4-queue = 684MB/s
16-queue = 1100MB/s
64-queue = 1233MB/s
128-queue = 1234MB/s
256-queue = 1236MB/s

As you can see, you really need a higher queue depth to unleash the power of RAID0. The more RAID0 member disks you have, the higher the queue depth you need to gain anything from RAID0 in random read situations.

When we look at random write, however, things are different. Here the disks can buffer the request and complete it instantly, just like HDDs with write buffering enabled do. The result is that multiple RAID0 members can be at work at the same time, since each keeps its own queue of writes to do. This is not possible with random read, as a read request can only be completed after the requested data has been supplied; thus no shortcuts are possible here.

That's the reason why random write with queue depth = 1 still scales with RAID0. A higher queue depth might still allow higher speeds by preventing any SSD from being without work at any particular time.
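
As a rough back-of-envelope model (ignoring controller overhead, the request-size mix and NCQ inside each SSD; the per-member throughput is just taken from the 1-queue result above): with QD independent outstanding requests spread uniformly over N stripe members, the expected number of busy members is N*(1 - (1 - 1/N)^QD).

Code:
# Toy model: expected number of busy stripe members with QD outstanding
# random reads spread uniformly over N members, times per-member throughput.
N = 5             # stripe members, as in the X25-V array above
PER_DISK = 240.0  # MB/s one member delivers on this workload (from the QD=1 result)

for qd in (1, 2, 4, 16, 64):
    busy = N * (1 - (1 - 1 / N) ** qd)
    print(f"QD={qd:3d}: ~{busy:.2f} members busy -> ~{busy * PER_DISK:5.0f} MB/s")

That predicts roughly 240 / 432 / 708 / 1166 / 1200 MB/s for QD = 1 / 2 / 4 / 16 / 64, which is in the same ballpark as the measured numbers above.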
 

tunggad

Junior Member
May 14, 2010
3
0
0
Some results of a RAID0 with 5 Intel X25-V 40GB, random read on FreeBSD:

This benchmark reads with request sizes between 4KiB and 128KiB; thus the average request size or 'file size' is a little over 64KiB. For SSDs, that means it's more of a sequential read benchmark; HDDs would have much more trouble with this.

1-queue = 240MB/s
2-queue = 430MB/s
4-queue = 684MB/s
16-queue = 1100MB/s
64-queue = 1233MB/s
128-queue = 1234MB/s
256-queue = 1236MB/s

As you can see, you really need a higher queue depth to unleash the power of RAID0. The more RAID0 member disks you have, the higher the queue depth you need to gain anything from RAID0 in random read situations.

When we look at random write, however, things are different. Here the disks can buffer the request and complete it instantly, just like HDDs with write buffering enabled do. The result is that multiple RAID0 members can be at work at the same time, since each keeps its own queue of writes to do. This is not possible with random read, as a read request can only be completed after the requested data has been supplied; thus no shortcuts are possible here.

That's the reason why random write with queue depth = 1 still scales with RAID0. A higher queue depth might still allow higher speeds by preventing any SSD from being without work at any particular time.

Another super clear explanation!

Your benchmark numbers are very impressive.

What kind of RAID controller do you have?

How large is the stripe size of the RAID-0 array?

Did you have to do any special partition alignment for the filesystem, and which filesystem was it (XFS, ext4, etc.)?


64-queue = 1233MB/s > 5 * 180MB/s = 900MB/s - more than 5x the single-drive specification (180MB/s), really impressive!
 

sub.mesa

Senior member
Feb 16, 2010
611
0
0
The Intel SSDs are connected to the onboard chipset SATA ports, but it's software RAID that does the work. The disks operate in AHCI/NCQ mode, which is required to use SSDs to their full potential.

The RAID engine is geom_stripe (generic RAID0 on FreeBSD). The stripe size is 1 megabyte (1MiB). My tests suggest that the higher the stripe size, the more random IOps.

Generally I found large stripe sizes to work well for random I/O, and small stripe sizes to work well for sequential transfers. With a 1 megabyte stripe size, you need to read ahead a lot to still put the other disks to work when doing sequential I/O.
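
To illustrate that (a minimal Python sketch of a generic RAID0 offset-to-member mapping, not the actual geom_stripe code; the stripe size and disk count are just the values from this array):

Code:
# Sketch of a generic RAID0 offset-to-member mapping: a 4 KiB request always
# lands on a single member, while a long sequential read has to span many
# 1 MiB stripes before all members are busy.
def members_hit(offset, length, stripe_size, n_disks):
    """Return the set of stripe members touched by a request."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return {stripe % n_disks for stripe in range(first, last + 1)}

STRIPE = 1024 * 1024   # 1 MiB stripe size, as used here
DISKS = 5

print(members_hit(305_418_240, 4096, STRIPE, DISKS))    # random 4 KiB read: one member
print(members_hit(0, 8 * 1024 * 1024, STRIPE, DISKS))   # 8 MiB sequential read: all five members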

FreeBSD (and also Linux) does not require partitions on disks; you can put the filesystem directly on the bare device node (/dev/sda, for example). This has the advantage of always being perfectly aligned. The downside is that booting into Windows may quickly overwrite portions of the disk, as Windows asks to 'initialize' the disks. This is only a concern if you ever connect these disks to another OS like Windows.

The Intel X25-V reads at 245MB/s, and random read is slightly below that. So RAID0 in my case appears to scale perfectly; I couldn't be happier about its performance. It writes at 40MB/s per disk, thus 160MB/s with 4 or 200MB/s with 5 SSDs in the RAID.

Some background info:

I'm using the SSDs to power all five of my Ubuntu Linux workstations, which do not have any system disk or other local disk. Each workstation uses network boot and has its system drive on the central server instead, accessed using iSCSI and NFS. This has the advantage of having all my workstations running on SSDs on the central server. The downside is that gigabit limits my throughput considerably, which is why I'm looking at 10GBaseT and at teaming several cheap gigabit NICs together into a faster (2Gbps) network interface.
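
Back-of-envelope on that bottleneck (raw line rates only; actual iSCSI/NFS payload throughput will be somewhat lower):

Code:
# Raw line rate of gigabit vs. the array's measured throughput.
ARRAY_MB_S = 1233                       # measured 64-queue random read above
for nics in (1, 2):                     # single GbE link vs. two teamed links
    link_mb_s = nics * 1e9 / 8 / 1e6    # 1 Gbps = 125 MB/s before protocol overhead
    print(f"{nics} x GbE ~ {link_mb_s:.0f} MB/s, about {link_mb_s / ARRAY_MB_S:.0%} of the array")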

One other sleek feature is snapshots of my system disks. Whenever I do an update, I snapshot first. If anything goes wrong with the update, which happened once during an upgrade to a beta version, I can simply roll back to before the update and my system disk 'goes back in time', so to speak.

So I really love ZFS+SSD. :)

Right now, I'm using the SSDs as:
- iSCSI images of system disks
- NAS central storage accessed with NFS
- ZFS L2ARC cache device for a huge multi RAID-Z array storing mass data

I'm still looking into multi-NIC setups, to increase my network bandwidth.