Clovertown vs. Kentsfield for SQL Server 2005

JohnVM

Member
May 25, 2004
170
0
76
I'm building a rig for a big SQL database which I use for my home business. It will have a limited number of simultaneous connections but will need to quickly query large tables. For reference, my largest table atm has 5,158,499,623 rows, and this number is rapidly growing (that's five billion, btw).

Would I be better off getting, for instance, 2 Intel Xeon X3220 Kentsfield 2.4GHz LGA 775 procs, or 2 Intel Xeon E5320 Clovertown 1.86GHz Socket 771 procs for this machine? Both are quad core, but it looks like Clovertown is more meant for servers and has a 1333MHz FSB vs. the Kentsfield's 1066MHz.

Update: Well, after browsing Newegg it seems there aren't any dual socket LGA 775 motherboards... (for the Kentsfield). Is that correct?
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: JohnVM
Update: Well, after browsing Newegg it seems there aren't any dual socket LGA 775 motherboards... (for the Kentsfield). Is that correct?
Yes, Intel doesn't want businesses to be able to do what you want to do: use more than one of the cheaper LGA 775 quads. If you want more than one quad-core, you have to use the more expensive LGA 771 chips. BTW, how exactly is this server being used? In a lot of instances, you'll become I/O-bound much faster than you'll become CPU-bound, and a single faster quad paired with SAS HDs will outperform dual slower quads that aren't paired with fast HDs.
 

dandragonrage

Senior member
Jun 6, 2004
385
0
0
Edit: On second thought, Intel may be a better option than the T2 in your case, given the low number of connections. But I'd still look into it.

Also, what database do you use? EnterpriseDB may end up better than Oracle.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,067
3,574
126
Well, here are some numbers that I've been keeping up with.

My Xeon 3220 @ 3.375GHz would produce somewhere around 14-18k WCG points per day.

My friend's dual Clovertown machine would produce around 25-30k points per day.

So if you double up on the dual Kentsfields, which most likely would be a Penryn V8 board, @ 3.375GHz you could push approximately 28-36k points per day.

My current Q6600 @ 3.6GHz does about 19-22k points per day. And no, I'm not kidding. She's a monster.
 

sonoran

Member
May 9, 2002
174
0
0
Originally posted by: JohnVM
my largest table atm has 5,158,499,623 rows, and this number is rapidly growing
Holy smokes! 5 billion records, and this is a home-based business!?! I can't help but wonder what the heck you could be keeping track of?

Anyhow, I'd second what Myocardia said about I/O. A SAN is probably your best option. They definitely don't come cheap, but a well-configured SAN can easily do 10x the I/O of local disk. I'd also suggest getting the fastest system you can that allows you to use LOTS of RAM. This is a case where 4GB probably is not cutting it.
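If you do load up on RAM, remember to actually let SQL Server use it; roughly something like this (the 16GB figure is just an example, tune it to whatever you install):

-- make the memory setting visible
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- let the buffer pool use up to 16GB (value is in MB)
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;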

You might also think about deleting some old data, if that's an option.
 

JohnVM

Member
May 25, 2004
170
0
76
Yeah, 5 billion records for a home business. The business is finance-related. The only reason the database is only 5 billion records is because my current rig can't handle more -- I'll probably rocket instantly up to 15-20 billion as soon as I get a new rig.

Box is going to run SQL Server 2005 Enterprise.

The current hw setup I'm planning is:

1 * ASUS DSBF-D/SAS Dual Socket 771 Intel 5000P SSI EEB 3.61 Server Motherboard - Retail
1 * areca ARC-1160 64-bit/133MHz PCI-X SATA II Controller Card - Retail
1 * BFG Tech BFGR800WPSU ATX 12V Ver.2.2/ EPS 12V Ver.2.91 800W Power Supply - Retail
2 * Intel Xeon E5335 Clovertown 2.0GHz Socket 771 Active or 1U Processor Model BX80563E5335A - Retail
2 * Kingston 4GB(2 x 2GB) 240-Pin DDR2 FB-DIMM DDR2 667 (PC2 5300) ECC Fully Buffered Dual Channel Kit Server Memory Model KVR667D2D4F5K2/4G - Retail
12 * SAMSUNG SpinPoint T Series HD501LJ 500GB 7200 RPM SATA 3.0Gb/s Hard Drive - OEM

Box costs ~$5500. It's a big hit to take though :-\
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Do you really need 6TB of disk space? I ask because the Areca card can be maxed out from a bandwidth standpoint with fewer drives, and if you go with Raptors you will still have reasonable disk space (12x150GB) but with better access times/lower latencies.

Areca's published performance with IOP341

Also consider looking into the PCI-e variants of the Areca cards...that PCI-X card will have an Intel IOP331 chip, which gets maxed out at lower bandwidth than the 800MHz IOP341 processor Areca uses on their PCI-e cards.

Areca PCI-e

(I own a 1280ML)

But in any event - you'd be much better off doing what you can to have enough RAM to avoid thrashing the disk array. A single SQL file containing 5 billion entries should come out to something around 2-3GB of data, give or take obviously, but my point is the SQL file isn't 500GB.

Can you comment on the size of the file? If it is reasonable - <16GB - then consider getting a dual-socket LGA-771 board with enough FB-DIMM slots that you can buy affordable RAM and have it all run within the cache, so to speak, on Vista64. Or perhaps give a ramdisk a try.
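If you're not sure of the size, SQL Server will tell you directly; something along these lines (the table name here is just a placeholder for whatever your big table is called):

-- totals for the whole database
EXEC sp_spaceused;
-- rows, data size and index size for a single table
EXEC sp_spaceused 'dbo.YourBigTable';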
 

JohnVM

Member
May 25, 2004
170
0
76
The SQL file is about 500GB atm, and once I get all the records in there it'd actually be close to 2TB.

The actual data is about half that, but I have it indexed.... the indexes approximately doubled the space (but they're critical -- they allow me to search it in under a second or so vs. like 30 mins to search unindexed).
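For the curious, the indexes are nothing exotic -- plain nonclustered indexes over the columns I query on, roughly like this (real table/column names changed, obviously):

-- the general shape of it: seek on symbol + time, with the queried
-- columns included so lookups never have to touch the base table
CREATE NONCLUSTERED INDEX IX_Ticks_Symbol_Time
ON dbo.Ticks (Symbol, TickTime)
INCLUDE (Price, Volume);

That's what turns the 30-minute scan into a sub-second seek.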
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Ah, I see now! A 6TB array would be in order then :D

Likely you'll want even more than that - you wouldn't want to lose a 2TB file due to disk failure or read error.

The 1280s will let you put 24 disks into a single array.
 

JohnVM

Member
May 25, 2004
170
0
76
So it looks like the Areca PCI-e cards are all ML? What's the difference with multilane vs. straight SATA?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: JohnVM
So it looks like the Areca PCI-e cards are all ML? What's the difference with multilane vs. straight SATA?

If you want the IOP341 then yes, I believe you must get the ML version.

I too was reluctant to take on this seemingly newfangled multilane connection, but it is pretty simple actually. Basically you have a cable with 4 SATA connectors on one end and one ML connector on the other end. It plugs into your SATA drives just like any other SATA cable, and plugs into the Areca card without a problem.

It works just fine. My 1280ML came with all the cables - 6 ML cables, that is. I don't know if they still ship them with cables; the guys at flickerdown could tell you right quick, though.

There are no penalties of any kind in using ML versus discrete SATA connections, if that is what you were worried about. It is just a packaging twist to simplify the logistics of managing up to 24 cable connectors on a PCB (you run out of real estate real quick).
 

MerlinRML

Senior member
Sep 9, 2005
207
0
71
I'd second the need for PCI-E bandwidth on your RAID controller. With 12 disks, you're going to be pushing the upper limits of a PCI-X 133MHz bus, never mind if you have anything else on the bus sharing bandwidth with the card.

Do you have a case in mind that will handle all 12 of your hard drives? You'll also need a pretty beefy power supply, considering you're going to need around 300 watts just to handle the disks. Your 800W supply should handle it, but how will you get power to all the disks? Are you looking at a backplane of some sort? Perhaps you should look at redundant power supplies as well.

That also makes me ask: will you be installing your OS on the array? I'm positive that 32-bit versions of Windows won't boot off a disk larger than 2TB; I'm not positive about the 64-bit versions.

And for the record, Kentsfield is a uniprocessor config only. For dual processors you're looking at dual-core Woodcrests or quad-core Clovertowns.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: myocardia
Originally posted by: JohnVM
Update: Well, after browsing Newegg it seems there aren't any dual socket LGA 775 motherboards... (for the Kentsfield). Is that correct?
Yes, Intel doesn't want businesses to be able to do what you want to do: use more than one of the cheaper LGA 775 quads. If you want more than one quad-core, you have to use the more expensive LGA 771 chips. BTW, how exactly is this server being used? In alot of instances, you'll become I/O-bound much faster than you'll become cpu-bound, and a single faster quad, paired with SAS HD's will outperform dual slower quads, that aren't paired with fast HD's.

It's true right now that you can't get a 2-socket motherboard for Intel's 775. It is untrue, however, that Intel doesn't want its customers building 2-socket 775 machines. Skulltrail will be released in August. It will use 2 socket-775 chips on a 1600MHz FSB, probably the server chip with 775 pins. It will use DDR3, of course, and have dual FSBs.

Now that doesn't sound like Intel doesn't want its customers buying a cheaper workstation, now does it??
 

JohnVM

Member
May 25, 2004
170
0
76
Good responses guys -- thanks a lot for all the help thus far. I'll definitely be looking into those PCI-E cards then.

I *had* a case in mind to handle all 12 HDDs, but Newegg seems to have discontinued it, as it's no longer in my wishlist etc., and I can't recall the name of it (QUITE pissed about this). Any good recommendations? The card supposedly supports staggered spin-up too, which should help w/ the PSU. I might have to look into a 2nd one though; you're correct. Why would I need a backplane? Can't you use those PSU cable extender things that keep splitting the power outputs so it can get to all the drives?

This array will be running Windows 2003 Enterprise x64.
 

tshen83

Member
Apr 8, 2001
176
0
0
JohnVM:

Your post is interesting, and I think I know what you need the system for. Since you mentioned finance, I am thinking you are probably trying to create a terabyte's worth of financial data? (equity, options, commodities time series data?)

If I am right in my assumption of what you want to do, I don't think this system is going to get you what you want.

1. It's the I/O. The Areca will get you 500MB/sec fully configured. If you were to query a 2TB database, that's 4,000 seconds, or about an hour. That won't do you any good.

2. SQL isn't designed for large time series databases. You should look for a column-based database...go google for them (there are commercial ones like kx.com, but since you think $5000 is expensive, kx is another story).

3. 4GB of memory is a mismatch for the amount of data in the SQL server. For what you want to do, a dual dual-core Opteron system on Socket F with 8GB-16GB of RAM is more of what you need, for NUMA memory bandwidth instead of the 1066MHz FSB bandwidth on the Xeons. Plus, DDR is cheaper than FB-DIMM.

4. It's actually better to look at this problem from a scale-out perspective (having a cluster of cheaper systems) instead of scale-up (having 1 system that's like 1500 cores). You should look into either splitting the table into smaller ones (see the sketch after this list), or spreading a distributed file system (Lustre, the Hadoop file system) over multiple cheaper systems; that will do you more good.

5. One more thought: I hope your Windows 2003 Enterprise x64 and SQL 2005 Enterprise are properly licensed :) If so, those licenses could cost you more than the physical system. A lot of distributed computing problems are not exactly done in Windows environments; look into Linux-based distributed file systems and clustering technologies for free. I recommend CentOS (since Red Hat Advanced Server and SUSE Enterprise are very expensive).
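On the table-splitting point in #4: SQL Server 2005 Enterprise actually has native table partitioning, so the split doesn't have to be done by hand. A rough sketch (the names, column types, and boundary dates are all made up for illustration):

-- carve the time series into yearly ranges
CREATE PARTITION FUNCTION pfByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2005-01-01', '2006-01-01', '2007-01-01');

-- map every range to a filegroup (all to PRIMARY here just to keep it
-- simple; in practice you'd spread the filegroups over different spindles)
CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

-- create the big table on the scheme; queries with a date predicate
-- then only touch the partitions they actually need
CREATE TABLE dbo.Trades (
    Symbol    char(8)  NOT NULL,
    TradeTime datetime NOT NULL,
    Price     money    NOT NULL,
    Volume    bigint   NOT NULL
) ON psByYear (TradeTime);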

That's just what I think...keep in touch.

If you don't go with the distributed approach and insist on 1 big server, I recommend the following system:

1. Dual dual-core Opterons (find the cheapest ones you can; you won't be CPU-bound, and you are going dual socket purely for the dual integrated NUMA memory controllers found on the Opterons). You can get 2 of them for $400.

2. 8GB (8x1GB) of memory minimum, 16GB (8x2GB) if you can afford them. $400-500.

3. A dual socket mobo, $300-400.

4. The 12 Samsung drives, $800.

5. The Areca is fine, but get a DDR DIMM for its cache (as big as it supports), although for your application, I am thinking maybe getting dual 8-channel PCI-E Arecas and then software-RAIDing the two RAID-5s together might get you close to 1GB/sec of I/O. $1000.

So yes, you are looking at a $5000 system.

The distributed way:

Get 4 1U Tyan barebones and stick some cheap dual-core E2160s in them, 2GB each node. Since the 1U Tyans come with 4 hot-swappable bays (software RAID-5), you are looking at buying 16 Samsung SATAs to fill them, but this way it's cheaper; you can pull this off for $3000. Each 1U would have dual GigE, giving you 8 gigabits of peak bandwidth over Ethernet, probably a little slower than having dual Arecas in a single system.

As for the case, the Cooler Master Stacker is decent for lots of drives. If you want 1 system, you are going to have to pay for a hot-swappable SATA case like http://www.newegg.com/Product/...x?Item=N82E16811152048
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: sonoran
Anyhow, I'd second what Myocardia said about I/O. A SAN is probably your best option. They definitely don't come cheap, but a well-configured SAN can easily do 10x the I/O of local disk. I'd also suggest getting the fastest system you can that allows you to use LOTS of RAM. This is a case where 4GB probably is not cutting it.
I wasn't talking about a SAN; I was talking about Serial Attached SCSI (SAS) hard drives, the replacement for the slow old SCSI we're all used to servers using as their storage medium.
Originally posted by: Nemesis1
Skulltrail will be released in aug. It will use 2 socket 775 chips on a 1600 FSB probabaly the server chip with 775 pins. It will use DDR3 of course and have dual fsb.

Now that doesn't sound like INTEL doesn't want its customers buying a cheaper workstation now does it??
Now that you mention it, yes. Here is Intel's current price differential between their quad-core chips: the 2.4GHz LGA 775 quad for ~$530, and the 2.0GHz LGA 771 quad for $733. Be sure to bookmark this thread, though, so you can bring it back up when Intel starts giving any of their "server" processors away in cereal boxes, okay? ;)
 

JohnVM

Member
May 25, 2004
170
0
76
tshen83: You nailed it. I'm data mining massive amounts of historical financial data, primarily for backtesting trading models/algorithms. The only problem w/ your statement is that it's not even 1TB, it's 6TB+ (at the moment, and this figure will only increase over time. Rapidly.) Much of the data is indeed time series data. I do however have a lot of data which is not... a few hundred GB, which I also need to data mine.

The "scale out" vs. "scale up" argument is an old one when it comes to big sets of data (I've read about it for years w/ Google etc.), and you're probably right - scaling out is the better long-term solution. The difficulty is that I don't really have any experience doing it, and as this is (what is to me) a substantial amount of money, I need to know that it's going to work. If the $5000 weren't a lot to me, I'd be more than willing to try the scale-out, and if it didn't work, just buy new hardware and build a huge rig. But as that's not an option, I have to know whatever solution I go with is going to work.

Your idea about 2 Areca cards in software RAID is interesting. Would, for instance, formatting the Windows install and reinstalling at some point mean the RAID is then dead/gone for good (aka, is a software RAID like that bound to a specific install of Windows)? I've never messed with software RAID outside of LVM in Debian. Designing one big rig to the specs you described is moderately different from the original one I had spec'd out -- do you really think the higher-end CPUs (the Xeons) wouldn't help much? Those Opterons are mega cheaper than the Xeons I had spec'd out. I like the idea of a lot of RAM, and liked that idea before too -- I was just price constrained. However, if I could save on the CPUs, that becomes an option. Would FB-DIMMs help for this type of server vs. DDR? Also, in this setup, for this purpose, does a lot of RAM even help much? Regardless of how much RAM I put into the machine, it still won't be able to hold the whole DB.

Re: licensing -- believe it or not, that's actually one of the original reasons why I *did* choose the Windows setup. My college roommate owns a substantially sized colo/hosting co, and they have extra licenses kicking around going unused through some deal they have with Microsoft. He doesn't mind "lending" me one until they need it (which he thinks they never will), so that offers HUGE savings to me on the software side of things. I just looked at kx.com's site -- that looks incredible and would be exactly what I need, but I don't think it's going to be at all affordable.

I'm sending you a PM. Perhaps we could talk on AIM or something sometime in the next few days?