Recommend me a data-crunching computer build?

bovinda

Senior member
Nov 26, 2004
692
0
0
I need to build a computer that will primarily be used for data analysis (so I assume multi-core, possibly multi-CPU?). Specifically, it will be using the program Gaussian (a chemistry analysis program), which does huge calculations for very long times... I've never built a computer like this, so any recs are appreciated.

Budget is approximately $2200-2500, maybe more if it makes sense.

It probably will not be used for gaming. But they might watch movies or stream TV on it to a 4K TV.

Might be overclocking, again, if it makes sense and can still run stable (I assume this computer will be left on for long periods of time while it runs these calculations).

Only thing that will be reused will be a mouse, keyboard, and monitor. But all the other hardware needs to be new.

Thanks for all suggestions! :)
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,250
3,845
75
Overclocking and data-crunching usually do not go well together. But let me ask you this: How important is it that this system get the right answer the first time, every time?

I also have a suspicion that, since Gaussian was originally written on non-x86, it may not utilize AVX and similar features of x86. It also looks like the software hasn't changed in seven years. :eek: So it might make sense to find an old, used dual Xeon system.

Can you provide any indication of disk requirements for this software? How much disk space does it need, and how frequently does it access it?

And will this system use the GPU accelerated version of Gaussian if/when it comes out?
 

bovinda

Senior member
Nov 26, 2004
692
0
0
Hey Ken, thanks for the response.

In that case, maybe I'll nix the overclocking idea. I think accuracy would definitely be more important for him.

It looks to me like the most recent release of Gaussian is Dec 14 2015, from the release notes on their website (it looks like the "9" in Gaussian 9 refers to the version, not the year). From talking with the family member who's familiar with this software, they're pretty sure it's able to utilize multiple cores/threads for processing. I guess at work they use multi-CPU set-ups to run it. But he wants to do some at home, on the side. Their website also says it will run with Windows 10, and has both 64 and 32 bit versions available.

Here's from their website:

Recommended Minimum System Requirements

The 64-bit version of G09W is not limited in the number of processors (or cores) that can be accessed for shared memory parallel calculations. It does not have the 32-bit addressing limitations that the 32-bit G09W version has, so for all practical purposes, the 64-bit version of G09W can access an unlimited amount of memory and disk space.

Processor: AMD64 or Intel64 (EM64T) system running supported 64-bit Windows version.
Operating System:
64-bit Microsoft Windows 7 Home, Premium, Professional, Ultimate, Windows 8, 8.1, Windows 10, Windows Server 2012 R2.
Memory (RAM): >2 GB
Disk: 1.5 GB (G09W storage); and 500 MB or more (scratch space)
Other: CD-ROM drive; Mouse


As for disk usage, it sounds like it does a lot of reading and writing during the calculation process.

I'm not sure if the most recent version of Gaussian is GPU accelerated or not, but let's assume that yes, he would use that.

Does this help in making any other suggestions?

Thanks Ken and others!
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,250
3,845
75
it looks like the "9" in Gaussian 9 refers to the version, not the year
I dunno, from their revision history (PDF), it looks like the 09 does refer to the year. They have versions back to "82".

That's what gives me pause about considering video cards. That and the fact that I don't actually see evidence that there is a working version of Gaussian that uses GPUs.
 

bovinda

Senior member
Nov 26, 2004
692
0
0
Ah, I see, I think you're right. But regardless of the name being 7 years old, the software's most recent version is from December 2015, and it looks like it would benefit from more cores, doesn't it? And why don't we assume it can't really utilize a video card for processing at this point, even if it might in the future.

So, knowing/assuming this, can you (or anyone else) recommend some parts for me to consider in building?
 

MarkLuvsCS

Senior member
Jun 13, 2004
740
0
76
It would be ideal to check with the family member who is familiar with the software, to get an idea of how he uses it. Dual E5-2670s or 2680s offer a lot of computational power if the software can utilize 16+ cores. It would be good to find out how much peak memory it may use so you can build for that need (32, 64, 128, or 256 GB). If disk access is also heavy while running, then the Intel PCI-E SSDs would be ideal. Their capacities even range up to 1.2 TB should space be a concern.

My E5-2670s run at 3 GHz under full load, so the 2680s can probably maintain their higher boost of 3.5 GHz pretty easily with good cooling. Unfortunately, some of my neural nets can't even fit in 128 GB of RAM. I couldn't find out my requirements ahead of time, but you should have a base case to work from.

My recommendation would certainly be retired enterprise gear, if the software can handle 16+ threads. Yes, it is used, but sellers typically guarantee against DOA, and I have no doubt this equipment will last for years to come. You can get 2x E5-2670s + 128 GB ECC RAM + mobo for ~$550-600. That leaves $200 for a case, $100 for cooling, and plenty of room for PCI-E SSDs/SSDs/HDDs.
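To make the budget arithmetic concrete, here is a minimal sketch in Python. It assumes the $2200-2500 budget from the first post and the ballpark prices quoted above; none of the figures are actual quotes, and the variable names are only illustrative.

```python
# Rough budget check for the used dual-Xeon route.
# All prices are the ballpark figures from this thread, not real quotes.
BUDGET_LOW, BUDGET_HIGH = 2200, 2500

core_low, core_high = 550, 600   # 2x E5-2670 + 128 GB ECC RAM + motherboard
case = 200
cooling = 100

def remaining(budget, core_cost):
    """Money left after CPUs/RAM/board, case, and cooling."""
    return budget - (core_cost + case + cooling)

worst = remaining(BUDGET_LOW, core_high)   # smallest budget, priciest bundle
best = remaining(BUDGET_HIGH, core_low)    # largest budget, cheapest bundle
print(f"Left over for storage and everything else: ${worst} to ${best}")
```

Even in the worst case, well over half the budget is still available for storage and the remaining parts.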
 

strategyfreak

Junior Member
May 30, 2016
17
12
51
Gaussian does use AVX in their most recent builds (Rev E.) Get this distribution if you can. There is no GPU acceleration yet (rumors are that Gaussian might have it in the next version), but your relative might want to look into TeraChem if he's interested. I haven't used it myself, but I have heard good things about it. They use NVIDIA GPUs to get incredible speedups, but the scope of calculations is more limited compared to Gaussian, which is probably the most versatile out there.

Don't use the Linda system for multinode computing. The performance boost is minimal. If you can, build as powerful a dual-socket node as you can afford.

It scales decently well, but clockspeed is also important. Benchmarks on the Internet suggest that it scales very well to 12-20 cores, and good out to 48 cores, so pretty much get the biggest 2P system you can afford (dual E5-2699v4, ideally).

Get as much RAM as you can get your hands on. This is just as important as cores and clockspeed, and you need to have the RAM to match your processors. Frequency calculations on larger systems require enormous amounts of memory, like 5 GB or more of RAM per core to run efficiently. Otherwise, it can't keep all the cores fed and will only use a few. Most computing clusters actually don't have this much memory per node, so your relative will have a significant speed advantage if he can get like 1TB. If you can get enough RAM, you might be able to run your calculation "in core", which means entirely in memory instead of having to recompute the integrals as the processor works through them.
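As a rough sizing aid, the per-core rule of thumb can be turned into a quick calculation. This is only a sketch: the 5 GB per core figure is the lower bound quoted above, the kit sizes are the ones mentioned earlier in the thread, and the core counts assume 8 cores per E5-2670 and 22 cores per E5-2699 v4.

```python
# Rule of thumb from this post: about 5 GB of RAM per core (or more)
# for large frequency calculations. Treat the result as a floor, not a target.
GB_PER_CORE = 5
KIT_SIZES_GB = [32, 64, 128, 256]   # sizes mentioned earlier in the thread

def ram_needed_gb(cores, gb_per_core=GB_PER_CORE):
    """Minimum RAM in GB suggested by the per-core rule of thumb."""
    return cores * gb_per_core

def smallest_kit(cores, kits=KIT_SIZES_GB):
    """Smallest listed kit that covers the need, or None if none is big enough."""
    need = ram_needed_gb(cores)
    for size in sorted(kits):
        if size >= need:
            return size
    return None

# 16 = dual E5-2670, 44 = dual E5-2699 v4
for cores in (16, 44):
    print(cores, "cores:", ram_needed_gb(cores), "GB needed, kit:", smallest_kit(cores))
```

By this estimate, a 16-core dual E5-2670 box already wants more than 64 GB, which is why the 128 GB bundles are a sensible starting point.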

Get a very fast SSD if he's planning on running the "conventional" algorithm.

Here's some links about Gaussian's need for memory and fast I/O.
http://www.lct.jussieu.fr/manuels/Gaussian03/g_ur/m_eff.htm
http://www.cup.uni-muenchen.de/ch/compchem/energy/hf5.html

Also, disable HyperThreading. Gaussian does not gain from it, and in some cases it will slow it down. Finally, if you are interested in the effects of various configurations on speed, DO NOT bench Gaussian and report the times on the Internet. It is not allowed and you could get yourself as well as any institution associated with you banned.
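If HyperThreading is left on, or just to be safe, the job can be told to use only as many threads as there are physical cores. The sketch below builds the Link 0 lines for the top of a Gaussian input file. It assumes the `%NProcShared`, `%Mem`, and `%Chk` Link 0 commands as documented for G09 (check the manual for your revision); the helper name, the memory figure, and the checkpoint file name are made up for illustration.

```python
def link0_lines(logical_cpus, threads_per_core, mem_gb, chk_name="job.chk"):
    """Build Link 0 lines that use one thread per physical core.

    logical_cpus     -- CPUs the OS reports (doubled if HyperThreading is on)
    threads_per_core -- 2 with HyperThreading enabled, 1 with it disabled
    mem_gb           -- memory to give Gaussian; leave some for the OS
    """
    physical = logical_cpus // threads_per_core
    return [
        f"%NProcShared={physical}",
        f"%Mem={mem_gb}GB",
        f"%Chk={chk_name}",
    ]

# Dual E5-2670 with HyperThreading still on: 32 logical CPUs, 16 physical cores.
# 100 GB out of 128 GB is an arbitrary example value.
print("\n".join(link0_lines(logical_cpus=32, threads_per_core=2, mem_gb=100)))
```

The route section and molecule specification would follow these lines in the actual input file.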

Edit: A note on overclocking - anecdotal experience from the internet and "other sources" seems to suggest that a significant speedup can be obtained by doing so, but do so at your own risk. Recommend checking geometries and energies yourself and maybe running a duplicate calculation on a system at stock speed. Chances are if the program gets an error, it will simply crash rather than proceeding. Take all of this with a grain of salt though and proceed carefully. Either way Intel does not allow overclocking on Xeons post Nehalem so this will only apply if you get the 5960x or 6950x (don't buy the 6950x, get a dual Xeon instead).

Anyways, I'm curious to see what you get. I'd be so jealous if you got a dual 2699v4 system with 1TB of RAM. That would be a beast.
 

bovinda

Senior member
Nov 26, 2004
692
0
0
Thanks for the replies everyone, especially strategyfreak. This is all really helpful info. I'll let you know what we go with. Thanks again!