Recommendations for high RAM, single-threaded system?

mattk8

Junior Member
Sep 13, 2006
12
0
0
I am a student and researcher who does extensive analysis of large datasets using R, a statistical software package. R is single-threaded, and many of the functions I need use a lot of memory given the size of my data. I have upgraded my PC (a Dell i7 system) several times in quick succession and am now at 12GB, its maximum supported RAM, and it's not enough to analyze my data. The servers available to me at my university top out at 16GB, so I need to find another solution.

My data is growing rapidly, and I would like to purchase a system that will remain usable as my dataset grows to many times its current size. So I am looking for a system that supports a large amount of RAM and is as fast as possible for single-threaded operations.

What system build would you recommend?

What I have observed so far:
As far as I can tell, newer i7 systems only support up to 24GB of RAM, but I would outgrow that in a few months or sooner. While i5 or i7 systems might offer the best computational performance for my single-threaded needs, their RAM maximums are just too low.

Multi-socket Xeon systems support the largest amounts of RAM, as far as I can tell: up to 144GB or even 192GB in some cases, and more if you can pay huge amounts of money (I can't!). However, since my processes are single-threaded, I am unsure whether I could even make use of all the RAM in a multi-processor build. For example, would a dual-processor build with 12 x 8GB = 96GB of RAM actually provide 96GB to my processes, or just the 48GB associated with one processor?

Assuming the answer to my previous question is 48GB, then it seems the best I can do is find a single-processor Xeon system that supports a large number of DIMMs, using 16GB DIMMs if possible.

Is this the best design for such a system? Am I failing to consider other important factors? I may have to buy such a system myself, so cost is an issue. Any help or recommendations are greatly appreciated. Thank you.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Is R able to use the hard drive for analyzing data instead of needing to hold it all in memory? If so, a 120 GB SSD for ~$250 might be your most practical choice, or possibly even a 250 - 320 GB drive.
 

mattk8

Junior Member
Sep 13, 2006
12
0
0
@ DaveSimmons

That's a great question/suggestion. I have looked into this but have been unable to find any R documentation on it. Based on my experience, the Windows version simply throws a memory exception when it cannot allocate RAM for an object, but on my OS X machine it does seem to spill over to the SSD (I have one installed in my MBP laptop, but it only has around 10GB of free space, which, together with the 8GB of RAM in that system, is often still not enough).

Not surprisingly, when operations do hit the SSD on my MBP they're really (really) slow compared to operations performed in RAM, but it's certainly much more cost-effective than buying 200GB of RAM.

If most of my analyses were batch operations, I would probably consider this a perfect solution. However, my analyses are often exploratory/iterative, so I am trying to at least see what is possible and semi-affordable in the world of high-RAM systems. If I can't find a solution (or if I eventually outgrow all RAM limits), then a fast SSD may be my only option, even if it is painfully slow.
 
Last edited:

mattk8

Junior Member
Sep 13, 2006
12
0
0
@Sandorski

That's very interesting! If I used a dual-processor motherboard but only installed one CPU, could I install and utilize RAM in all the slots? Or do you need the second processor installed for some reason?
 

dorion

Senior member
Jun 12, 2006
256
0
76
Is R able to use the hard drive for analyzing data instead of needing to hold it all in memory? If so, a 120 GB SSD for ~$250 might be your most practical choice, or possibly even a 250 - 320 GB drive.

With an SSD, just set the swap file to a gigantic size.
 
Last edited:

mattk8

Junior Member
Sep 13, 2006
12
0
0
@ mnewsham

I don't have a fixed budget. Less expensive is better (obviously), but there isn't a set upper limit. It looks like I could get a 48GB system for around $4000 and 96GB for under $6000, assuming all the RAM in a dual-processor build is usable by a single-threaded process. In the few instances where I've found systems that support well over 192GB of RAM, prices are astronomical and definitely way above what I'd like to spend.
 

mnewsham

Lifer
Oct 2, 2010
14,539
428
136
Well, I have found a few systems that can support 512GB or 1TB of RAM, but they're probably out of your budget ;) (especially since the CPUs alone are $4k apiece and you need four of them)

It looks like you can get a Dell with 64GB-96GB of RAM for about $6-8k (with a 3.6GHz quad core).
 

sandorski

No Lifer
Oct 10, 1999
70,697
6,257
126
@Sandorski

That's very interesting! If I used a dual processor MB but only installed one CPU, could I install and utilize RAM in all the sockets? Or do you have to have the second processor installed for some reason?

I was thinking Single CPU with Multiple Cores. I'm no expert, but am pretty sure that each bank of Memory Slots is dedicated to a particular CPU in a Multiprocessor system.
 

mnewsham

Lifer
Oct 2, 2010
14,539
428
136
I was thinking Single CPU with Multiple Cores. I'm no expert, but am pretty sure that each bank of Memory Slots is dedicated to a particular CPU in a Multiprocessor system.

I think that's right, but I'm not 100% sure either ;)
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
I was thinking Single CPU with Multiple Cores. I'm no expert, but am pretty sure that each bank of Memory Slots is dedicated to a particular CPU in a Multiprocessor system.

In a modern multi-socket system, the RAM banks are divided up between the CPU sockets. The RAM connects to a memory controller - on modern systems, the memory controller is in the CPU.

So, if you leave a socket empty, those RAM slots associated with that socket won't work, because there will be no memory controller for them.

If you install both CPUs, then all the RAM will work and will be available to every core on every socket (though performance is slightly reduced when a CPU core accesses RAM connected to a different CPU socket; Windows 7 understands this mismatch in RAM speed and will try to arrange for programs and their RAM blocks to run on matched sockets, where possible).
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
In a modern multi-socket system, the RAM banks are divided up between the CPU sockets. The RAM connects to a memory controller - on modern systems, the memory controller is in the CPU.

So, if you leave a socket empty, those RAM slots associated with that socket won't work, because there will be no memory controller for them.

If you install both CPUs, then the RAM will work. All the RAM will be available to every core on every socket (but performance will be slightly reduced when a CPU core accesses RAM connected to a different CPU socket - windows 7 supports this mismatch in RAM speed and will try to arrange for programs and RAM blocks to run on matched sockets, if this is possible)

:thumbsup: This is correct.

OP, first of all, I'd suggest looking into server-class machines. I doubt your advisor would want to sink a bunch of money into a machine that could only be used by you; time-sharing the system across several users is a great way to make the cost more attractive. There are plenty of remote display options for both Windows and Linux that will give you a desktop-like experience across the network.

AMD is probably the best way to go to get a bunch of DIMM slots on the cheap. Take a look at the Dell R815. You can get a 16X8GB=128GB config for about $9000 (before academic discounts) with 16 free DIMM slots available to grow to 256GB as necessary.

However, throwing hardware at the problem is probably not a viable long-term strategy. There are functions in R that let you write datasets out to a file and read them back in later. You'll have to be a bit more sophisticated with your coding, but you should be able to intelligently unload and reload datasets as needed.
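For example, a minimal sketch of that unload/reload pattern using base R's saveRDS/readRDS ('big_df' is just a placeholder name for one of your large datasets):

```r
# Sketch: free RAM by writing a large object to disk and reloading it later.
# 'big_df' is a placeholder standing in for a real large dataset.
big_df <- data.frame(x = rnorm(1000), y = rnorm(1000))

path <- tempfile(fileext = ".rds")
saveRDS(big_df, path)   # serialize the object to a file
rm(big_df)              # drop it from the workspace
gc()                    # ask R to actually release the memory

# ...later, when the data is needed again...
big_df <- readRDS(path)
```

With several datasets you'd just save each one out and only keep the one you're actively working on in memory.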

By the way, as your dataset grows, you will probably have to parallelize your code in order to get it to run in a reasonable amount of time. There are plenty of R frameworks available to do this.
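As a rough sketch of the general shape, here's what one of those frameworks (the parallel package, bundled with recent versions of R) looks like; 'fit_one' is a hypothetical placeholder for a per-chunk model fit:

```r
library(parallel)

# Sketch: run independent fits across cores with mclapply.
# mclapply forks on Unix-alikes; on Windows you'd use parLapply
# with a cluster from makeCluster() instead.
fit_one <- function(i) {
  mean(rnorm(100))  # placeholder for a real per-subset model fit
}

results <- mclapply(1:4, fit_one, mc.cores = 2)
```

This only helps when the work splits into independent pieces, which is worth keeping in mind for regression-style workloads.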
 

dorion

Senior member
Jun 12, 2006
256
0
76
:thumbsup: This is correct.

OP, first of all, I'd suggest looking into server-class machines. I doubt your advisor would want to sink a bunch of money into a machine that could only be used by you. Time sharing the system across several users is a great way to make the cost more attractive. There are plenty of remote display options for both Windows and Linux that will allow you to have a desktop like experience across the network.

AMD is probably the best way to go to get a bunch of DIMM slots on the cheap. Take a look at the Dell R815. You can get a 16X8GB=128GB config for about $9000 (before academic discounts) with 16 free DIMM slots available to grow to 256GB as necessary.

However, throwing hardware at the problem is probably not viable long-term strategy. There are functions in R that let you write datasets out to a file and then read it back in. You'll have to be a bit more sophisticated with your coding, but you should be able to intelligently unload and load the datasets as needed.

By the way, as your dataset grows, you will probably have to parallelize your code in order to get it to run in a reasonable amount of time. There are plenty of R frameworks available to do this.

+1
 

mattk8

Junior Member
Sep 13, 2006
12
0
0
Thank you everyone for your very helpful comments.

It sounds like a multiprocessor Xeon system, or an AMD server, is the way to go. This should allow me to install enough RAM to analyze my data now and to grow the system as my data continues to grow.

If we set this up as a server, do you have any recommendations for a remote display solution? I have been using LogMeIn as a low-rent (free) option, but the latency is extremely cumbersome for my analysis. I've looked at other options but am not sure where to turn.

The main memory hog for me is multi-level regression analysis in R. I suspect these routines are written fairly inefficiently, since while running they take up much more RAM than my entire dataset does when loaded into memory (and the dataset itself is quite large).

I have experimented with packages like "ff", which offload storage to the HDD rather than being constrained by RAM, but these seem very immature and haven't worked well for me (they work for simple matrix operations but choke on anything remotely complex). There are some specialized "big data" packages that perform quite well but cover only ordinary regression, not the multi-level regression functions I unfortunately need (I can already run ordinary regressions easily today, without special packages). If any of you know of a good package for "big data" multi-level regression, I'd love to hear about it.
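For reference, the basic ff pattern I've been trying looks roughly like this (a minimal sketch; simple element access like this works fine, it's anything more complex that chokes):

```r
library(ff)  # file-backed vectors/arrays: data lives on disk, not in RAM

# Sketch: a disk-backed numeric vector that could be far larger than RAM.
x <- ff(vmode = "double", length = 1e6)  # backed by a temp file on disk
x[1:5] <- c(1, 2, 3, 4, 5)               # chunks are paged in on access
total <- sum(x[1:5])
```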

I have also searched for parallelized versions of multi-level regression functions but have so far failed to find any. It might be possible to write my own parallelized multi-level regression routine, but I don't have the time or expertise to do that at this point, assuming it is even possible. Hopefully such a function will be created, but for now I seem to be stuck with what is available. I've tried every R package I can find that offers multi-level regression, and all seem to have similar performance characteristics. In fact, every major statistical package seems to be single-threaded as far as I can tell (e.g. SAS, SPSS). If you happen to know of a parallelized multi-level regression package, I'd love to hear about it.

Again, thank you all for your help.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Sorry, I didn't know you were doing multi-level regression. That can get pretty hairy :(

As for remote display: since you're at a university, you should be able to get the server set up with a public IP very easily. Then just use the Remote Desktop Session Host capabilities of Windows Server, or XRDP for Linux (I know there is a package for RHEL/CentOS/Fedora, and I'd imagine one exists for Debian as well).