Horrible I/O performance on our servers

Zucarita9000

Golden Member
Aug 24, 2001
1,590
0
0
Hey guys,

This morning I decided to run some tests to benchmark the I/O performance on our servers, since it's Saturday morning and there's no one here.

The first test was with the AJA System Test, which is not exactly a server-grade random read/write test, but running the same test on all servers would give me a fair comparison, right? FYI: the test writes a single 256MB file.

So, here's the breakdown:

ML350 G5 RAID5 4x SAS SFF 10k 146GB
Quad-Core Intel Xeon 2.5GHz, 6GB RAM
HP E200i 128MB BBWC SAS RAID
Windows Server 2003 R2

Write: 17MB/s
Read: 175MB/s

ML110 G5 RAID1 2x SATA 7.2k 1TB
Dual-Core Intel Pentium 1.8GHz, 3GB RAM
Intel 3200 Integrated SATA RAID
Windows Server 2008 Standard

Write: 38MB/s
Read: 86MB/s

ML115 G5 RAID10 4x SATA 7.2k 500GB
Dual-Core AMD Opteron 2.2GHz, 3GB RAM
Nvidia MCP55S Integrated SATA RAID
Windows Server 2008 Standard

Write: 125MB/s
Read: 126MB/s

ML150 G2 JBOD 2x SATA 7.2k 250GB
Intel Xeon 3.2GHz, 1GB RAM
Windows Server 2003 Small Business Server

Write: 43MB/s
Read: 60MB/s

This is pretty much standard I/O performance for SATA drives, either single or striped.
However, the shocker came with the ML350 G5, the most powerful server of the bunch.
It's running a 4-drive RAID5 stripe of 10k SAS drives, which I assumed would obliterate a 2x RAID0 stripe of slower SATA drives. I could only get 17MB/s writes. I believe this is due to the nature of RAID5 (having to calculate parity for each block) but come on! 17MB/s?!? This is a 256MB file striped into 64KB blocks, so there's quite a lot of parity to calculate. Maybe the performance would be higher using smaller files.
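
To put a number on "quite a lot of parity", here is a minimal Python sketch of the arithmetic. It assumes the 64KB figure is the per-drive strip size and that the 4-drive RAID 5 set holds 3 data strips plus 1 parity strip per stripe. The function names are made up for illustration, and nothing here reflects how the E200i actually implements parity.

```python
from functools import reduce

STRIP_SIZE = 64 * 1024          # 64KB strip, as quoted in the post
FILE_SIZE = 256 * 1024 * 1024   # 256MB file written by the AJA System Test
DRIVES = 4                      # assumed layout: 3 data strips + 1 parity strip


def xor_parity(strips):
    """Return the byte-wise XOR of equally sized strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def stripe_counts(file_size, strip_size, drives):
    """Return (data_strips, stripes) needed to lay a file out on RAID 5."""
    data_strips = -(-file_size // strip_size)        # ceiling division
    stripes = -(-data_strips // (drives - 1))        # one parity strip per stripe
    return data_strips, stripes


data_strips, stripes = stripe_counts(FILE_SIZE, STRIP_SIZE, DRIVES)
print(data_strips, stripes)  # 4096 data strips, 1366 parity strips

# Tiny demonstration that parity lets a lost strip be rebuilt.
d0, d1, d2 = b"\x01\x02\x03", b"\x04\x05\x06", b"\x07\x08\x09"
parity = xor_parity([d0, d1, d2])
rebuilt_d1 = xor_parity([d0, d2, parity])
print(rebuilt_d1 == d1)  # True
```

Under these assumptions the test file needs roughly 1,366 parity strips of 64KB each, about 85MB of extra writes on top of the 256MB of data.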

Read performance is quite acceptable (175MB/s) but I was hoping to break 230MB/s with 4 drives. Either the E200i controller sucks, or RAID5 just sucks altogether.

Should I switch to RAID10 instead? I will lose a couple of GB of usable space, but hopefully I'll get faster writes.

It's so funny that I get faster write performance on a $500 ML115 with integrated Nvidia SATA RAID controller than on a $2500 ML350 with Enterprise grade SAS drives and dedicated RAID controller. It's actually kind of frustrating.

To keep this in perspective, these are the results for my MBP:

MacBook Pro (late 2008) 1x SATA 5k 320GB
Dual-Core Intel Core 2 Duo 2.53GHz, 4GB RAM
Mac OS X 10.5.7

Write: 53MB/s
Read: 58MB/s
 

pjkenned

Senior member
Jan 14, 2008
630
0
71
www.servethehome.com
That's wild, to the point that I would wonder if there isn't some sort of software config issue. Read speeds are OK, however write speeds... with a 512MB raid card cache and a 256MB file... writing to 10k rpm SAS seems off. FYI, even the cheapo Dell Perc 5/i can write on an 8 disk raid 5 array at 300+ MB/s

I really doubt you have a HW raid controller that maxes out at 17MB/s write on a 4 disk array, especially since that box wasn't built in 1995.
 

Zucarita9000

Yeah, I've been browsing the HP forums for the past few hours trying to find a solution. I'll take a look at the array configuration, maybe there's something wrong there. I remember it was configured at 50% Read 50% Write performance. Changing those settings may have a positive impact.
 

Zucarita9000

Well, using the CLI ACU doesn't seem to do anything. I disabled the write cache, enabled it again, modified the cache ratio and all... it still tops out at 17MB/s with the AJA System Test.

controller slot=0 modify drivewritecache=enable
controller slot=0 modify cacheratio=25/75

I also ran ATTO Disk Benchmark to compare it with your results. This is what I got:

ML350 G5 E200i with 4 10k SAS drives in RAID 5 (50% write/50% read cache ratio)

ML350 G5 E200i with 4 10k SAS drives in RAID 5 (75% write/25% read cache ratio)

ML115 G5 with 4 7.2k SATA drives in RAID 10

ATTO reports higher reads, almost 300MB/s, and somewhat faster writes (about 25MB/s), but still nowhere near as fast as it should be.
 

tcsenter

Lifer
Sep 7, 2001
18,352
259
126
It's probably the cheap HP E200i controller. It's fine for RAID 0/1 but this controller does RAID 5 in software. Perc 5/i is not a 'cheapo' controller. It costs 2x ~ 3x more than the E200, depending on the configuration. Besides, I don't think the E200 even has a 512MB option. Are you sure it's not the HP P400?

Anyway, try updating the firmware:

Latest F/W for E200/E200i

HP Bulletin/Advisory for E200/E200i
 

Zucarita9000

Originally posted by: tcsenter
It's probably the cheap HP E200i controller. It's fine for RAID 0/1 but this controller does RAID 5 in software. Perc 5/i is not a 'cheapo' controller. It costs 2x ~ 3x more than the E200, depending on the configuration. Besides, I don't think the E200 even has a 512MB option. Are you sure it's not the HP P400?

Anyway, try updating the firmware:

Latest F/W for E200/E200i

HP Bulletin/Advisory for E200/E200i

My bad. It's only 128MB of cache, but still... write performance is very low. I might just get two more drives and switch to RAID 10.
Thanks for the FW links, I'll see if I can update them.
 

tcsenter

Don't assume your array will still be intact after updating the firmware. Back up beforehand just in case and be prepared to rebuild it.
 

Zucarita9000

Originally posted by: tcsenter
Don't assume your array will still be intact after updating the firmware. Back up beforehand just in case and be prepared to rebuild it.

I know, thanks!
Rebuilding the RAID 5 array at this speed would take forever, so I'll just switch to RAID 10. Using 6 drives in RAID 10 will give me the exact same usable space so I'll just create snapshots for both volumes (OS and Data) with Acronis True Image, re-create the array, boot into Acronis and restore. Shouldn't take more than two hours.

Migrating the array would only result in one big headache.
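
The "exact same usable space" claim is easy to check. The sketch below uses nominal drive sizes and ignores formatting and controller overhead. The helper name `usable_gb` is hypothetical.

```python
def usable_gb(level, drives, drive_gb):
    """Nominal usable capacity of an array of identical drives."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb      # one drive's worth goes to parity
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return (drives // 2) * drive_gb     # every drive is mirrored
    raise ValueError(f"unsupported level: {level}")


print(usable_gb("raid5", 4, 146))    # 438 (current array)
print(usable_gb("raid10", 6, 146))   # 438 (planned array)
print(usable_gb("raid10", 4, 146))   # 292 (RAID 10 without the two extra drives)
```

With 146GB drives, a 4-drive RAID 5 and a 6-drive RAID 10 both come out at a nominal 438GB, while RAID 10 on the original four drives would drop to 292GB.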
 

najames

Senior member
Oct 11, 2004
393
0
0
I'll be interested to hear how your RAID10 goes. I have a Sun box here and admins just tried to give me a RAID10 setup, claiming it was soooo much faster than the RAID5 setup, blah, blah, blah. It was ok when I first submitted a program, then basically would grind to a halt: 0.4% CPU, disks 100% busy and 100% wait in iostat while the program was still running. I told them to keep their RAID10 setup if they couldn't make it work any better than that. There is no way a properly configured RAID10 could be that freaking slow, people. I'm back to RAID5 with a few more disks hung on it.

I'm building a single CPU core i7 920, onboard controller, with 6 disks, one for OS and data storage, 3 WD 640s short stroked in RAID0 for scratch space, 2 WD Black 1TB short stroked in RAID0 for sorting scratch space (read from one space sort/write to another space). I'll bet my $1000 "test server" PC is faster than our 5yr old 4x1.28GHz Sun server with their super duper fiber channel RAID setup that costs us $50,000 per year for "maintenance".

Reads are sequential, CrystalDiskMark showed the WD Black 1TB at about 130MB/s reads and 110MB/s writes. I know from previous LAN testing that the 640s will do over 100MB/s too. Not sure yet about the onboard ICH10R speed to run it all, will know more tonight hopefully.
 

Zucarita9000

I've just ordered the drives, but they won't be here for a few days so I'm gonna have to wait until the weekend to set up the RAID 10 array. I've had a few users complaining about access speed with some apps on the server, which I suspect is related to the abysmal write performance.
 

najames

I installed Win7 64bit and set up the 2 RAID0 arrays. Two of the older WD 640GB drives get about 275MB/s reads and 220MB/s writes. The newer 2 WD 1TB drives get 300MB/s reads and about 225MB/s writes. Both RAID0 arrays use 128K stripes, 100GB total (50GB off each drive), with the writeback (?) option turned on. One set of disks is for scratch sort space and one for regular scratch space. Again, my app will run the data onto the regular scratch space and then use the other for sorting. I tried to install the 32bit app from old CDs and it doesn't know what a 64bit OS is and fails the install. A new set of 64bit app disks is on the way.
 

Zucarita9000

That's great I/O for just two drives!
I've just received the two additional SAS drives, so I'll be re-creating the array over the weekend and post back my results. If you could get 225MB/s writes with only two drives, I hope to go beyond 300 using a 3-disk stripe of 10k SAS drives.
 

RebateMonger

Elite Member
Dec 24, 2005
11,588
0
0
Originally posted by: Zucarita9000
If you could get 225MB/s writes with only two drives, I hope to go beyond 300 using a 3-disk stripe of 10k SAS drives.
I imagine you'd be happy just getting better than 17 MB/s writes.
 

pjkenned

Originally posted by: najames
I installed Win7 64bit and set up the 2 RAID0 arrays. Two of the older WD 640GB drives get about 275MB/s reads and 220MB/s writes. The newer 2 WD 1TB drives get 300MB/s reads and about 225MB/s writes. Both RAID0 arrays use 128K stripes, 100GB total (50GB off each drive), with the writeback (?) option turned on. One set of disks is for scratch sort space and one for regular scratch space. Again, my app will run the data onto the regular scratch space and then use the other for sorting. I tried to install the 32bit app from old CDs and it doesn't know what a 64bit OS is and fails the install. A new set of 64bit app disks is on the way.

Can you post an ATTO benchmark of this 300MB/s? I've never seen a 7200rpm SATA disk do 150MB/s sustained (not burst) writes, much less in raid 0 where you don't get 100% perfect scaling.
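
The scaling doubt can be made concrete with the figures najames posted earlier in the thread: a single WD Black 1TB at about 130MB/s reads and 110MB/s writes (CrystalDiskMark), against 300MB/s and 225MB/s for the two-drive RAID0. The sketch below only divides those numbers. The function name is made up, and the two measurements may not be directly comparable, since they may come from different tools and the array is short stroked.

```python
def scaling_efficiency(array_mb_s, single_disk_mb_s, disks):
    """Array throughput as a fraction of perfect linear scaling."""
    return array_mb_s / (single_disk_mb_s * disks)


read_eff = scaling_efficiency(300, 130, 2)
write_eff = scaling_efficiency(225, 110, 2)

print(f"read:  {read_eff:.0%}")   # read:  115%
print(f"write: {write_eff:.0%}")  # write: 102%
```

Both ratios come out above 100%, which is better than perfect scaling. That would be consistent with caching or burst effects in the array numbers, which is why a sustained ATTO run is a reasonable thing to ask for.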
 

Zucarita9000

Alright, this is getting ridiculous.
I've spent the past two hours working on this server. I installed two more drives, created a RAID 10 with 6 10k SAS drives and restored the system. The RAID 10 uses 128K stripes. This is the controller configuration:

Smart Array E200i in Slot 0
Bus Interface: PCI
Slot: 0
Serial Number: QT88MP3175
Cache Serial Number: P9A3A0B9SWH0LV
RAID 6 (ADG) Status: Disabled
Controller Status: OK
Chassis Slot:
Hardware Revision: Rev A
Firmware Version: 1.78
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 sec
Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 50% Read / 50% Write
Drive Write Cache: Enabled
Total Cache Size: 128 MB
Battery Pack Count: 1
Battery Status: OK
SATA NCQ Supported: False

Remember that I used to get 17MB/s writes? Well, now I get a lousy 65MB/s, which is absolutely nowhere near what a 3-disk stripe of SAS drives should give me. Reads are OK, pretty much the same as with RAID 5 (about 220-250MB/s).

This is really f***ed up. Either the E200i is faulty or it's a plain old piece of crap.

Here's the ATTO benchmark. Take a look at how the write performance scales up to the 128K mark, and then drops abysmally:

ATTO Disk Benchmark

Setting the cache to 100% write made a slight improvement

All 6 drives are Seagate 10K.2 SAS Savvio 3Gb/s 146GB hard drives (which are NOT cheap btw).

Any help will be greatly appreciated.
 

Zucarita9000

Small update: Using ATTO with a 32MB length and a queue depth of 10 gives me 420MB/s reads and 100MB/s writes.

I tried to perform real world tests and did the following:

1. Copied Ubuntu 9.04 iso image (698MB) from Serverhp3 (RAID 1 of two SATA drives) over Gigabit ethernet. It copied the file in 11 seconds, about 63MB/s

2. Copied the same file to a different volume in the array (the RAID 10 array has two volumes). The copy went almost instantly, couldn't even start the timer.

3. Copied the same iso image file from Serverhp4 (RAID 10 of 4 SATA drives) over Gigabit ethernet. It was copied in about the same amount of time, I figure the bottleneck is GigE (70MB/s top speed).

4. Copied a 456MB folder containing over 400 items of various sizes from and to the server over GigE, again, very good performance.
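
For anyone wanting to repeat tests like these without a stopwatch, here is a minimal Python sketch. It recomputes the rate from test 1 and times a sequential write that ends with `os.fsync`, so the clock does not stop until the OS has handed the data to the storage device. That matters for cases like test 2, where an "almost instant" copy may just be the OS cache absorbing the file. The function names, the 16MB size and the 64KB block size are arbitrary choices for illustration. A battery-backed controller cache can still absorb writes this small, so treat the result as a rough indication only.

```python
import os
import tempfile
import time


def throughput_mb_s(size_mb, seconds):
    """Average transfer rate in MB/s."""
    return size_mb / seconds


def timed_write(path, size_mb, block_kb=64):
    """Write size_mb of random data sequentially and return the rate in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = size_mb * 1024 // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())     # wait for the OS to push the data out
    return throughput_mb_s(size_mb, time.perf_counter() - start)


# Test 1 from the post: 698MB ISO copied in 11 seconds.
print(round(throughput_mb_s(698, 11), 1))  # 63.5

# Small timed write into a temporary directory on the volume under test.
with tempfile.TemporaryDirectory() as tmp:
    rate = timed_write(os.path.join(tmp, "test.bin"), size_mb=16)
    print(f"{rate:.0f} MB/s")
```

Pointing `tempfile.TemporaryDirectory(dir=...)` at a folder on the array under test, and raising `size_mb` well past the controller's 128MB cache, should give numbers closer to sustained disk speed.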

So, I don't know what to make of these numbers.
 

Zucarita9000

Originally posted by: tcsenter
Originally posted by: Zucarita9000
Hardware Revision: Rev A
Firmware Version: 1.78
F/W 1.82 is the latest and "strongly recommended" by HP.

All right, by popular demand I've updated the firmware to 1.82 and run the benchmark again.

F/W 1.82 results w/queue depth of 8
F/W 1.82 results w/queue depth of 4

One thing I noticed was that after upgrading the firmware I had the ability to enable advanced write caching options, so I did. After restarting, those options were not available anymore :S
 

tcsenter

It would have helped had you done more benchmarking with the other setup. Hard to interpret these results in a vacuum, but your write performance did increase by over a factor of three. E200 is an entry-level RAID controller, though.
 

Zucarita9000

Originally posted by: tcsenter
It would have helped had you done more benchmarking with the other setup. Hard to interpret these results in a vacuum, but your write performance did increase by over a factor of three. E200 is an entry-level RAID controller, though.

I know, and since the move to RAID 10, our users no longer complain about performance. It seems the benchmarks don't really reflect real-world usage. The server doesn't appear to be slow; it's quite snappy actually.