
Server performance problems.

Armitage

Banned
Feb 23, 2001
8,086
0
0
Ok, I need you guys' help to convince my local tech-support guy that we have a problem :rolleyes:


First, the system...
Tyan S2721GNN - E7501 chipset MB, 533MHz FSB
2x2.6GHz Xeon
4GB PC2100 Registered DDR
3xU160, 73GB, 10K RPM SCSI drives
Mylex RAID controller with 32MB cache
RedHat Linux 8.0

Drives are configured in RAID5 as a single partition

The problem is that the drive performance SUCKS.

Bonnie++ results
Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.local 14G  4536  16  4495   2  3869   2 23837  74 110626  29 280.0   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
      files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald 16  2387  86 +++++ +++ +++++ +++  2604  91 +++++ +++  6695 100

Compare to a similar machine, only with 2GB ram and a single 7200RPM IDE drive:

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ssao3.aero.org   4G 21846  79 21285  13  7828   3 21298  65 45388   7 114.6   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2634  96 +++++ +++ +++++ +++  2742  97 +++++ +++  7137  99

per character & block output (writing to disk) is more then 4x better on the IDE machine, and rewrite is > 2x better!

The last straw came today as I'm trying to move a database onto this machine ... On one of the big tables, the network connection kept failing, so I dumped the table to a file and FTPed it over. The file is about 1GB and has approx. 6 million records and a few indexes. FWIW, this is MySQL we're dealing with.
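(For the record, the dump-and-reload is basically this; the database and table names are placeholders:)

# dump the problem table to a flat SQL file
mysqldump mydb bigtable > bigtable.sql
# ... FTP it over, then feed it back in
cat bigtable.sql | mysql mydb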

Now I'm trying to cat the file straight into the database ... it's been going for a few hours now and I have about 60% of the data in the database at this point. Here's what top looks like:

5:45pm up 34 days, 4:10, 4 users, load average: 142.66, 138.09, 126.78
404 processes: 403 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 0.0% user, 0.2% system, 0.0% nice, 99.3% idle
CPU1 states: 0.0% user, 2.1% system, 0.0% nice, 97.3% idle
CPU2 states: 0.0% user, 0.2% system, 0.0% nice, 99.3% idle
CPU3 states: 0.1% user, 0.5% system, 0.0% nice, 98.4% idle
Mem: 3870836K av, 3855188K used, 15648K free, 0K shrd, 137492K buff
Swap: 17864240K av, 15028K used, 17849212K free 3476796K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
2051 ******** 16 0 1224 1224 796 R 0.9 0.0 28:40 top
12 root 16 0 0 0 0 DW 0.3 0.0 18:16 bdflush
13 root 15 0 0 0 0 DW 0.3 0.0 0:58 kupdated
24 root 15 0 0 0 0 DW 0.1 0.0 13:02 kjournald
757 root 15 0 3136 728 624 S 0.1 0.0 1:21 ypserv
1 root 15 0 472 436 424 S 0.0 0.0 0:22 init


As you can see, the CPUs are basically idle, but the system load is astronomical, and of course, the performance is dismal.
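(If I understand the load average right, it's counting all those processes stuck in uninterruptible disk wait rather than anything using CPU; something like this should list them:)

# processes in state D (uninterruptible sleep, usually disk wait)
# are what inflate the load average while the CPUs sit idle
ps axo stat,pid,comm | grep '^D'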

When I ask one of our support guys about it, he says it's because of the RAID5 and suggests we do RAID0 and more frequent backups :rolleyes:
Or he says it might be because we don't have a separate OS drive (the OS is on the RAID partition). I'm not buying it. There is no way this should be that slow. Agreed?? Suggestions??
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Sweet ... broke 200, and still about 1 million records left :disgust:

6:35pm up 34 days, 5:00, 4 users, load average: 200.41, 196.06, 183.88
522 processes: 521 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 1.1% user, 1.2% system, 0.0% nice, 97.0% idle
CPU1 states: 1.3% user, 1.3% system, 0.0% nice, 96.2% idle
CPU2 states: 0.5% user, 0.5% system, 0.0% nice, 98.0% idle
CPU3 states: 0.4% user, 0.5% system, 0.0% nice, 98.1% idle
Mem: 3870836K av, 3857856K used, 12980K free, 0K shrd, 137652K buff
Swap: 17864240K av, 15028K used, 17849212K free 3474940K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
2051 ******** 15 0 1296 1296 796 R 1.1 0.0 29:08 top
12 root 15 0 0 0 0 DW 0.1 0.0 18:23 bdflush
24 root 15 0 0 0 0 DW 0.1 0.0 13:08 kjournald
867 ntp 15 0 1868 1868 1808 S 0.1 0.0 0:58 ntpd
1 root 15 0 472 436 424 S 0.0 0.0 0:22 init
2 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU0
3 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU1
4 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU2

Just for kicks, I tried loading this table back onto my desktop where it originated. This is a 1GHz PIII, 512MB, software RAID1 on a pair of 7200RPM IDE drives, RedHat 9, and already busy with a bunch of stuff running. I've got about 1/3 of the data in after about 40 minutes, and the system load is hanging around 4.

FWIW, the server is using the ext3 filesystem. How do I check which journaling mode is in use?
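(Best guess from the docs: something like this should show it, though the exact kernel message varies:)

# an explicit data= option shows up in the mount table
grep ext3 /proc/mounts
# otherwise the kernel logs the mode at mount time, e.g.
# "EXT3-fs: mounted filesystem with ordered data mode"
dmesg | grep -i 'data mode'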
Thanks
 

sharkeeper

Lifer
Jan 13, 2001
10,886
2
0
Which Mylex HBA?

Models under the 2000 series were not stellar performers, especially for RAID5 I/O.

Cheers!
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: sharkeeper
Which Mylex HBA?

Models under the 2000 series were not stellar performers, especially for RAID5 I/O.

Cheers!

Don't recall, I'll have to check tomorrow.
 

sciencewhiz

Diamond Member
Jun 30, 2000
5,885
8
81
Write performance will suck with RAID5. Less so with a more expensive controller, but it won't ever approach RAID0. Every sub-stripe write costs reading the old data and old parity plus writing the new data and new parity, so figure roughly four disk operations per logical write.
 

Sunner

Elite Member
Oct 9, 1999
11,641
0
76
No, it most definitely shouldn't. RAID-5's write performance isn't as good as 0 or 10, but it's not dismal either.

Are you sure you don't have any renegade processes doing lots and lots of disk IO?
What do you get from iostat?
Try something like "iostat 1 20".
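Something along these lines shows both the per-device rates and how many processes are sitting blocked on IO; iostat comes with the sysstat package, and -x only works if your build supports extended stats:

# per-device transfer rates, 1-second samples
iostat 1 20
# extended per-device stats, if available
iostat -x 1 20
# the 'b' column counts processes in uninterruptible sleep (disk wait)
vmstat 1 10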

The closest I've come to that was a Sun E250 box which had a logrotate program that went wild. The load increased by roughly 1 every day, purely from the IO wait this program caused. The box actually kept on performing its duties, so it went unnoticed until I logged in and ran "w" just to see how things were hanging :)
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
Here are some tips from redhat about SCSI. I don't know how up to date they are; I've never really used SCSI a whole lot, especially in RAIDs.


There's also a different SCSI howto worth reading; see the next section of it as well.

What drivers are you using? Try the "dmesg | less" command to see if you can find some info.

Maybe the cables are f**ked up, or not terminated correctly ... I've seen this warned about on a few pages.
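If the driver is logging trouble, grepping beats paging through the whole buffer; the exact strings vary by driver, so these patterns are only guesses:

# look for resets/timeouts/parity complaints that would point at
# cabling or termination problems
dmesg | grep -i -e scsi -e parity -e reset -e timeout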

Another multi disk tuning page.


Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
spock          496M 13118  99 39488  23 17981   9 17562  90 39667  11 215.2   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1355  97 +++++ +++ +++++ +++  1389  99 +++++ +++  3475  98

This is what I get with my machine: a Western Digital 7200RPM 80GB with an 8MB cache, on a 1700+ with 256MB of RAM. Not a slow IDE hard drive...

One thing I noticed though was that the test for your hard drive used a 14G size, vs. the 4G used in the other one... I don't know if that makes a big difference in the benchmark though...

I also cat'd a 700MB AVI file into another file in the same directory. It took about 15-30 seconds, so I think there is definitely something weird going on with your RAID setup.
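If you want an actual number out of that kind of test, something like this works (the sync makes sure buffered writes get counted; the paths are just examples):

# rough sequential-write check on the array
time sh -c 'cat /tmp/bigfile.avi > /raid/testcopy.avi && sync'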
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
The large file size on the bonnie test is to make sure the machine isn't caching the whole thing. They recommend 2x RAM, so the test on this server needed to be at least 8GB; it shouldn't matter once you're past that, though. Thanks for the tips guys ... not sure that I'll get to this today ... I'll be working out of a different building.
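For reference, a run along these lines forces the 8GB file set (the mount point and user are placeholders):

# -d = test directory, -s = file set size in MB,
# -u = user to run as (bonnie++ won't run as root without it)
bonnie++ -d /mnt/raid -s 8192 -u nobody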
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: Sunner
No, it most definitely shouldn't. RAID-5's write performance isn't as good as 0 or 10, but it's not dismal either.

Yeah, that's what I thought ... for hardware SCSI RAID, it should still be pretty fast ... slow compared to other RAID modes maybe, but still fast.

Are you sure you don't have any renegade processes doing lots and lots of disk IO?

Don't think that's it, because when I'm not doing anything, the system load is 0 ... idle. And the performance is great for non-IO stuff.

What do you get from iostat?
Try something like "iostat 1 20".

The closest I've come to that was a Sun E250 box which had a logrotate program that went wild. The load increased by roughly 1 every day, purely from the IO wait this program caused. The box actually kept on performing its duties, so it went unnoticed until I logged in and ran "w" just to see how things were hanging :)

This is just frustrating because the guy who is supposed to take care of this stuff isn't, and I don't have the time or experience with this sort of hardware to sort it out myself.
 

dkozloski

Diamond Member
Oct 9, 1999
3,005
0
76
I hate to suggest the simple stuff, but have you checked your RAID manager to see if write caching is enabled on your Mylex caching controller? Someone may have turned it off in the interest of data integrity. Also, my experience is that a Mylex caching controller is sensitive to caching stripe size, not to be confused with RAID stripe size.
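If the card is handled by the Linux DAC960 driver (a guess, but most Mylex cards of that era are), the controller status is visible without rebooting into the BIOS utility:

# overall controller health and current configuration
# (the c0 controller number is an assumption)
cat /proc/rd/status
cat /proc/rd/c0/current_status

Failing that, Mylex's Global Array Manager should show the write-cache setting per logical drive.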
 

Sunner

Elite Member
Oct 9, 1999
11,641
0
76
To continue on the "easy" route, have you checked, or asked him to check, to make sure you don't have a bad drive?
That'll slow it down a fair bit.

Not to mention you'd pretty much be running a RAID-0 array if that's the case :)
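A dead or rebuilding drive should announce itself in the driver status and the kernel log; the exact wording varies, and this again assumes the DAC960 driver:

# degraded/rebuilding arrays usually show up here
cat /proc/rd/c0/current_status
# and the kernel log tends to have the complaints
dmesg | grep -i -e dac960 -e sense -e error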