SQL Server and RAID SSD (or not to RAID)? Any tips?

Emulex

Diamond Member
Jan 28, 2001
I plan to run 4 or 6 Kingston SSDs. Got two 128GB for $80 and a few 96GB for $90 shipped. They have GC (garbage collection), which should work well with my cheap RAID controllers.

With SQL Server Enterprise you get parallel index operations, and SQL Server dispatches more concurrent I/O when you split files across drives.

I could RAID-0 them on this quad-core box with 8 or 16GB of RAM.

Say one big RAID of 4 or 6 drives, or two RAIDs of 2-3 drives each.

Or JBOD.

Given Enterprise edition's parallel indexing and the ability to split files apart, the other RAID/JBOD option would be:

1. DB on drive 0/1
2. TMP/MASTER on 2/3
3. LOG on 3/4

or
DB on 1/2/3
tmp on 4
LOG on 5/6

I could use two RAID controllers (LSI) in JBOD mode, so SQL Server could dispatch 3 threads to work on the DB, 1 thread on tempdb, and 2 threads for the log?
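Roughly what I'm thinking in T-SQL - just a sketch, the drive letters, sizes, and the CatalogCache name are made up, and moving tempdb only takes effect after a service restart:

Code:
CREATE DATABASE CatalogCache
ON PRIMARY
    (NAME = catalog_data1, FILENAME = 'D:\sqldata\catalog_data1.mdf', SIZE = 50GB),
    (NAME = catalog_data2, FILENAME = 'E:\sqldata\catalog_data2.ndf', SIZE = 50GB),
    (NAME = catalog_data3, FILENAME = 'F:\sqldata\catalog_data3.ndf', SIZE = 50GB)
LOG ON
    (NAME = catalog_log1, FILENAME = 'H:\sqllog\catalog_log1.ldf', SIZE = 20GB);
GO

-- tempdb on its own spindle; new location is picked up on the next service restart
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'G:\tempdb\tempdb.mdf', SIZE = 20GB);
GO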

Or if I RAID them, I might get 1 thread for the DB, 1 thread for tempdb, 1 thread for the log?

Or if I put them all together in one big RAID-0, I might get higher burst read/write but at the cost of fewer worker threads?

Any thoughts? The primary goal is to use consumer throwaway SSDs to cache and handle huge catalogs of data and ETL (transformation between data formats).

The core work will be done on a normal disk server with 6 x 600GB 15K 3.5" SAS drives in RAID-10 (pretty fast), but that's not scalable since 1.8TB is about $4,200, and my working set is not that big at all. I need the number of spindles in RAID-10 for safety and disk IOPS.

With six 96GB SSDs (about 90GB formatted) I have 540GB at roughly $1/GB, so I'll keep my writes down on this box (mostly dev, but also catalog full-text/parametric search). No way I can afford an Intel 710; the big ones are $10K each (no joke, look it up). Same with Fusion-io, $10K a pop.

So back to RAID-0 vs. JBOD: more volumes means more worker threads, so a quad-core or 6-core box could dispatch more I/O threads to run concurrently, and Enterprise edition can divvy up index scans across more than one thread (per index) to push even more IOPS.
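The parallel index piece looks something like this (sketch only - the table and index names are made up, and 'max degree of parallelism' is an advanced sp_configure option):

Code:
-- Enterprise can rebuild an index with multiple threads (ONLINE = ON is Enterprise-only)
ALTER INDEX IX_Catalog_PartNumber ON dbo.Catalog
REBUILD WITH (MAXDOP = 4, ONLINE = ON);

-- Instance-wide cap on parallel plans for big scans
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;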

Anyone have a dev setup with SQL Server and thoughts? I'm not worried about reliability; I make backups, and this is dev and read caching, so I can build two boxes and if one fails just reroute at the model.
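Backups would just get pushed off to the spinning-disk server, something like this (sketch - the UNC path and database name are placeholders, and WITH COMPRESSION needs 2008 Enterprise or 2008 R2 Standard and up):

Code:
BACKUP DATABASE CatalogCache
TO DISK = '\\diskserver\backups\CatalogCache_full.bak'
WITH COMPRESSION, INIT, CHECKSUM, STATS = 10;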

Bare-metal Server 2008 R2 + SQL Server, no VMware or anything; I want raw max speed at the cheapest cost. I know there are faster SSDs, but there's no point torching a fancy SSD if a runaway routine burns up 100 terabytes of writes on a holiday weekend. That would not be nice.

I could move to RAID-1/10 to add reliability if this works out. Also, if I don't partition the full drive, is it correct that the lifetime will increase? Or is that hardcoded into the firmware?

Intel 710 = Intel 320 plus custom firmware, and it exposes 200GB out of 320GB. So if I take my 96GB drives and chunk them down to 50GB by not partitioning the rest, would that be the same? Or do you need firmware hacks to change the reservation?
 

razel

Platinum Member
May 14, 2002
It sounds like the Kingston SSDs you're using aren't the HyperX version. At $1 per GB, as much as I love recommending the Kingston V100+ to friends and family, they do not support NCQ and are hopeless for high-I/O duties like an MSSQL server. You could get the same performance as RAIDed V100+ drives from a single, different SSD.
 

Emulex

Diamond Member
Jan 28, 2001
Sure, but which other drive under $1/GB will perform better? Remember, 10 drives dispatch more worker threads than 1 drive can. Ten drives at low queue depth beat one drive with a large queue.
 

sub.mesa

Senior member
Feb 16, 2010
The Crucial M4 64GB would probably give you the most performance per dollar, since even at that low capacity it has very high multi-queue random read and write performance. In RAID-0 your IOPS will fly!

But you should consider the Intel 320 too: slightly lower performance, particularly for random writes, but you do get supercapacitors, which protect against corruption of the fragile mapping tables as well as loss of recently written data. Intel controllers also have the lowest write amplification of all.

Either way, with massive random writes you would be wise to use massive overprovisioning as well; 20-30% is highly recommended. The more the better, really, both for write endurance and for keeping performance degradation to a minimum.
 

Emulex

Diamond Member
Jan 28, 2001
Well, if the RAID controller has 1GB of flash-backed write cache and TRIM is out of the question, does high queue depth matter? You're right, many (SandForce, junk) write without a capacitor; that's too dangerous IMO. To be safest I should disable write caching on the drives and rely on the RAID controller cache to handle the queue. I wonder if that is even possible.

1. Is the Crucial M4 64GB about $1/GB or less?
2. Is the Crucial M4 safe (capacitor-backed write cache)?

With 128GB for $80 I could afford more spindles and perhaps provision at 50% to increase life and performance. The garbage collection on the Toshiba controller (same as Apple uses) is really good, so you don't have to worry about TRIM.

But if I use JBOD I could use TRIM; I'd lose the RAID controller cache, though, which is very fast (DDR2 is faster than any SSD in RAID-0). Hmm.

TRIM doesn't necessarily increase the lifespan of the drive? I thought it decreased the lifetime but increased the speed?
 

sub.mesa

Senior member
Feb 16, 2010
TRIM support on RAID/SAS controllers
Your RAID controller will not support TRIM, whether in JBOD mode or not. Only ATA controllers support TRIM, meaning IDE or AHCI controllers. It also requires TRIM-compliant drivers; on the Windows platform that means Microsoft, Intel or AMD drivers. No other TRIM-compliant drivers exist to my knowledge. There has been some talk about Marvell drivers supporting TRIM, but only on their AHCI controllers, and that's just talk, nothing substantive.

Write-back buffercache
Write-back caching (actually buffering; a cache is something else) means the controller saves up write requests. This is especially useful in RAID5 arrays, but shouldn't be necessary for other configurations, especially on SSDs. Still, it might give a small performance boost, but at the cost of lost writes on power failure. Only use write-back if you have a BBU (Battery Backup Unit).

Supercapacitors
But you still need a supercapacitor for data integrity, not just for recently written data but for keeping the mapping tables free of corruption. Whenever SSDs lose power unexpectedly, they may corrupt their tables if you're unlucky. You may recall the Intel 320 "8MB" bug as an example, but not everyone knows that virtually all SSDs are susceptible to this issue. It's just that the Intel 320 series features supercapacitors, so this should not happen for that series.

Mapping table corruption
Corruption of the mapping tables can show up as odd behavior: data corruption ('blue screens' during boot), altered capacity (as with the Intel SSDs), or simply an SSD that no longer responds to any I/O, i.e. a 'bricked' SSD. The mapping tables record where data was actually written relative to how the operating system thinks it was written; they track the disparity between logical LBA and NAND cell addressing. In plain English: the mapping tables store the difference between where Windows thinks the data lives and where the SSD controller actually put it, without Windows knowing.

The mapping tables are always cached in (D)RAM, and are updated at regular intervals and during safe shutdowns. SSDs require the host to send a "STANDBY IMMEDIATE" signal before the power is cut, to allow SSDs to prevent corruption of the mapping tables. You may have 99 unsafe shutdowns, but the 100th could be your unlucky number where you lose a fair amount of data or even the entire SSD.

So if you need data integrity, you really need supercapacitors, or you need some kind of safe storage solution like ZFS can provide. For example, storing the actual SQL data on a ZFS NAS with HDDs and SSD as L2ARC + SLOG may be just what you need. High IOps and superior data protection. But this is a more complicated setup than just connecting SSDs to your Windows server, of course.

Crucial or Intel?
As for Crucial: from a performance standpoint it is superior to the Intels, but regarding data integrity it has two main drawbacks:
1) lack of a supercapacitor to protect the mapping tables from corruption;
2) lack of redundancy at the NAND level, so bit errors on NAND are not corrected. The Intel 320 series as well as SandForce SSDs (RAISE) use redundancy not unlike RAID5, though in Intel's case it behaves more like RAID4 (dedicated parity). This protects against bit errors that occur at the NAND level. For the same reason, the Intel 510 series, which uses a Marvell controller, uses 34nm NAND instead of 25nm, since 25nm NAND is more susceptible to bit errors.

So if you need data consistency I can highly recommend the Intel 320 series; simply disabling the drive's write-back cache will not save you here. Regarding overprovisioning, the more the better, obviously. With enough OP you don't really need TRIM at all. I'm also not a big fan of aggressive garbage collection, since it increases the write amplification factor (WAf) considerably. Intel is known for extremely low WAf, while Samsung has very aggressive GC resulting in extremely high WAf. Especially with heavy random write workloads like your SQL server will generate, that drastically lowers the endurance of your SSDs, perhaps to the point of them failing within their service life. My Intels can live for more than 50 years with my current usage pattern; write endurance is usually not a major concern for consumers, but it is for servers with heavy random write workloads.

TRIM and spare space
Your last question: while TRIM results in extra spare space, it does not directly affect lifespan. Whether you have TRIM or not does not make much of a difference if you have a high degree of overprovisioning; it's the spare space the SSD has at its disposal that matters. TRIM can help give the SSD more spare space, which can improve its lifespan, yes. But in your case I would recommend a huge amount of spare space. Also consider that you can always grow your filesystem later in Windows 7, so you can start with even 70% OP, using only 30% of the capacity, and give it more when you need more space. But remember: shrinking will not work! Overprovisioning only works if you have NEVER written to those LBAs before. SSDs start with 100% spare space, and only the parts you've written to are occupied and subtracted from the spare space; TRIM will free them and release them as spare space again. In practice, only set up overprovisioning on brand-new SSDs or after a secure erase.
 

Emulex

Diamond Member
Jan 28, 2001
Thanks, I'll take that into consideration. Other than ETL delta changes, the server will mostly be designed as a published (or mirrored) read cache; writes should be about 5% of reads (similar to a desktop), with daily or hourly delta changes. The problem with ZFS is obviously the latency of connecting to the storage; any NAS adds latency that eats into the huge advantage of SSDs. If the Windows OS does the striping, Windows would still see each drive as an individual drive, so TRIM would work?
 

LokutusofBorg

Golden Member
Mar 20, 2001
If you get decent SSDs then you shouldn't be trying to optimize for (higher) thread count. The super-low latency/service time of SSDs will service each thread *much* faster, and therefore the I/O queue will act more single-threaded, if that makes sense.

Also, unless you're dealing with only a single database with a single log file, separating the log file from the data files won't net you much gain. A single log file has a sequential I/O profile, but as soon as you put anything else on the same volume you get a more random I/O profile. So you might as well throw it all into one large array/volume and optimize for the lowest latency you can.
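One way to sanity-check that on the existing box (rough sketch): SQL Server keeps cumulative per-file I/O counts and stall times, so you can see whether the log files are actually where the latency is before carving out a dedicated volume for them.

Code:
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;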
 

Emulex

Diamond Member
Jan 28, 2001
Yeah, right now I have six 15K SAS drives in RAID-10, so the average queue really never goes past 1 except during ETLs, which can last an hour due to the size of the dataset and the complexity of the data. So high queue depth (per perfmon) never shows up now. The lazy writer in SQLOS probably contributes to this.
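If I want to see what SQL Server itself thinks is outstanding during an ETL (rather than perfmon's per-disk queue), something like this should do it - rough sketch, just the pending-I/O DMV:

Code:
-- I/O requests SQL Server has issued that haven't completed yet
SELECT io_type,
       COUNT(*)                 AS pending_ios,
       SUM(io_pending)          AS still_queued_in_windows,
       MAX(io_pending_ms_ticks) AS oldest_pending_ms
FROM sys.dm_io_pending_io_requests
GROUP BY io_type;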