Suggested - Data Crunching Machine

Status
Not open for further replies.

imported_stantuna

Junior Member
I need to buy or build a special-purpose, value-oriented Windows desktop (no, I can't use Unix/Linux) to do lots of batch data crunching, manipulation, merging, de-duping, etc. on large (100+ GB) files. This will be an ongoing process with lots of different datasets.

What would be the biggest bottlenecks, i.e., the most important things to spend $$ on?
Lots of Fast RAM
What type of Disk I/O subsystem, recommended drives
Single fast CPU
etc....

Any off the shelf systems for this?
 
Hm, can you elaborate on "data crunching, manipulation"? What data access patterns would you use? What software are you using that has to be in Windows?

"Merging" would be fine reading from two fast drives. De-duping could be harder, unless you just mean checking that a file isn't identical to any file previously seen. Block-level de-duping would be harder still.

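For what it's worth, the two easy cases (merging, and whole-file de-duping) look roughly like this as a Python sketch. The thread never says what software is involved, so the language, the function names `merge_sorted_files`, `file_digest` and `unique_files`, and the assumption that the merge inputs are already-sorted, newline-delimited text files are all hypothetical. The merge reads each input once, front to back; the file-level de-dupe just hashes each file's contents with SHA-256 and skips any hash it has already seen.

```python
import hashlib
import heapq
from pathlib import Path
from typing import Iterable, Iterator, TextIO


def _lines(f: TextIO) -> Iterator[str]:
    # Make sure every line, including the last, ends with "\n",
    # so merged lines never run together.
    for line in f:
        yield line if line.endswith("\n") else line + "\n"


def merge_sorted_files(path_a: Path, path_b: Path, out_path: Path) -> None:
    """Merge two already-sorted text files into one sorted file.

    Each input is read front to back exactly once, so with one input per
    drive, both drives only ever see sequential reads.
    """
    with open(path_a, encoding="utf-8") as a, \
         open(path_b, encoding="utf-8") as b, \
         open(out_path, "w", encoding="utf-8") as out:
        # heapq.merge is lazy: it keeps only one pending line per input.
        out.writelines(heapq.merge(_lines(a), _lines(b)))


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_files(paths: Iterable[Path]) -> Iterator[Path]:
    """Yield each path whose contents have not been seen before."""
    seen: set[str] = set()
    for p in paths:
        digest = file_digest(p)
        if digest not in seen:
            seen.add(digest)
            yield p
```

Neither function ever holds more than a chunk or a couple of lines of file data in memory, which is why these cases stay cheap even for 100GB files.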
Fast RAM probably won't help if a single file can't reliably fit in it. That said, how compressible is the data?
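If the data is line-oriented records rather than whole files, one common workaround when it won't fit in RAM is hash partitioning: scatter the lines into buckets by hash so duplicates always land in the same bucket, then de-dupe one bucket at a time. A minimal Python sketch, assuming newline-delimited records where a duplicate means an exact byte-for-byte match; `dedupe_large_file` and `num_buckets` are hypothetical names, not anything from the OP's tools.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path


def dedupe_large_file(src: Path, dst: Path, num_buckets: int = 64) -> int:
    """Drop duplicate lines from a file that may be larger than RAM.

    Pass 1 scatters each line into one of `num_buckets` temporary files by
    hash, so identical lines always land in the same bucket. Pass 2 loads
    one bucket at a time into a set, so only one bucket has to fit in RAM.
    Output is grouped by bucket, not in the original order.
    Returns the number of unique lines written.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i}") for i in range(num_buckets)]

        # Pass 1: partition by hash.
        with contextlib.ExitStack() as stack:
            buckets = [stack.enter_context(open(p, "wb")) for p in paths]
            with open(src, "rb") as f:
                for line in f:
                    if not line.endswith(b"\n"):
                        line += b"\n"
                    # blake2b is deterministic across runs, unlike the
                    # per-process salted built-in hash().
                    h = int.from_bytes(
                        hashlib.blake2b(line, digest_size=8).digest(), "little")
                    buckets[h % num_buckets].write(line)

        # Pass 2: de-dupe each bucket independently.
        written = 0
        with open(dst, "wb") as out:
            for p in paths:
                seen: set[bytes] = set()
                with open(p, "rb") as b:
                    for line in b:
                        if line not in seen:
                            seen.add(line)
                            out.write(line)
                            written += 1
        return written
```

The cost is writing and re-reading the whole file one extra time, so disk throughput matters as much as RAM here; pick `num_buckets` so that one bucket's set fits comfortably in memory.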

I'd love to suggest an SSD for this, but consumer SSDs don't last long on this kind of work. And it sounds like you can't afford the pro-level SLC SSDs.

P.S. Bagged a lurker! 😀
 
Lots of RAM
No need to worry over fast RAM. All RAM is fast, compared to anything but CPU cache.
What type of Disk I/O subsystem, recommended drives
At the least, fast SSDs with significant user-added over-provisioning (OP). The point of the OP is mainly to keep write amplification (WA) down by never letting the drive truly fill up, rather than performance, though it would generally improve that, too. Proper server SSDs might be needed, though, depending on how much, and how, the drives will be written to.
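The user-added OP arithmetic is simple: leave part of the drive unpartitioned, and that space counts as extra spare area. A small Python sketch with hypothetical function names; it ignores the drive's built-in spare area, and it assumes the unpartitioned space has never been written (or was cleared with a secure erase first), since otherwise the controller can't treat it as free.

```python
def user_added_op(raw_capacity_gb: float, partitioned_gb: float) -> float:
    """User-added over-provisioning: unpartitioned space relative to the
    space the OS can use (0.2 means 20%)."""
    if not 0 < partitioned_gb <= raw_capacity_gb:
        raise ValueError("partitioned size must be in (0, raw capacity]")
    return (raw_capacity_gb - partitioned_gb) / partitioned_gb


def partition_size_for_op(raw_capacity_gb: float, target_op: float) -> float:
    """Largest partition that still leaves `target_op` of user-added OP."""
    if target_op < 0:
        raise ValueError("target_op must be non-negative")
    return raw_capacity_gb / (1 + target_op)
```

For example, a 240GB drive partitioned to 200GB gets 20% user-added OP; going the other way, 20% OP on a 240GB drive means a 200GB partition.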

Maybe RAID 0 if everything is temporary and the job can easily be restarted after downtime, or RAID 10 if not. For Windows and RAID, go with LSI, and for the sake of IOPS, avoid parity RAID (RAID 5 might be fast enough with SSDs, but in a minimal configuration a RAID 10 needs only 33% more drives: four instead of three).
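To put numbers on the RAID 5 vs RAID 10 trade-off, here's a Python sketch using textbook figures: minimum drive counts, usable capacity, and the usual small-random-write penalty (4 back-end I/Os per host write for RAID 5, 2 for RAID 10, 1 for RAID 0). It ignores controller cache and SSD quirks, so treat it as a rule of thumb; the names and the per-drive IOPS figure in the example are hypothetical.

```python
from dataclasses import dataclass

# Textbook minimum drive counts and small-random-write penalties
# (back-end disk I/Os per host write), ignoring controller write cache.
MIN_DRIVES = {"0": 2, "5": 3, "10": 4}
WRITE_PENALTY = {"0": 1, "5": 4, "10": 2}


@dataclass(frozen=True)
class RaidLayout:
    level: str
    drives: int
    usable_drives: int  # usable capacity, in drives' worth
    survives_one_failure: bool


def raid_layout(level: str, drives: int) -> RaidLayout:
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return RaidLayout(level, drives, drives, False)
    if level == "5":
        # One drive's worth of capacity goes to parity.
        return RaidLayout(level, drives, drives - 1, True)
    # RAID 10: mirrored pairs, striped together.
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return RaidLayout(level, drives, drives // 2, True)


def est_random_write_iops(layout: RaidLayout, per_drive_iops: float) -> float:
    """Rule-of-thumb random-write IOPS: total drive IOPS / write penalty."""
    return layout.drives * per_drive_iops / WRITE_PENALTY[layout.level]
```

With hypothetical 1000-IOPS drives, the minimal RAID 5 (3 drives) and minimal RAID 10 (4 drives) both give two drives' worth of space, but this estimate puts them at 750 vs 2000 random-write IOPS. That fourth drive is the "33% more".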

Of consumer SSDs, the SanDisk Extreme II and Samsung 840 Pro hold low latencies pretty well, don't get bogged down at high queue depths (QDs), and don't die too often (sorry, Vertexes...). The 840 Pro needs some added OP to really shine, but with it, it does: it stays inconsistent, but it is very fast on average. Given recent pricing, I'd get the Extreme II myself.
Any off the shelf systems for this?
Sure. But open your wallet. Value-oriented does not exist at all here, and it certainly won't for big OEM servers with lots of RAM and SSD arrays. As it is, with 100GB+ files, you'd ideally want much more than that in RAM alone. If that's not an option (I don't even want to know what 256GB costs with the price hikes lately), SSDs backing whatever RAM you can afford will probably be the best you can do (and more cost-effective, too).

Since it sounds pretty businessy and all, I'd seriously look at a low-end Xeon server, be it a diskless ThinkServer or a Supermicro barebones, with ECC RAM. Upgrade via a third party with compatible SKUs, like Crucial and Kingston. Note that LGA1150 platforms (such as a Xeon E3) only go up to 32GB.
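A back-of-the-envelope way to see how far a 32GB box is from holding 100GB+ files: if the data gets split into chunks that are each loaded into an in-memory structure, the number of chunks is roughly the data size times the per-record overhead, divided by the RAM you can spare. A Python sketch; `buckets_needed` is a hypothetical name, and the 3x overhead and half-of-RAM budget are placeholder assumptions, so measure your own data before trusting them.

```python
import math


def buckets_needed(data_gb: float, ram_gb: float,
                   overhead: float = 3.0, usable_fraction: float = 0.5) -> int:
    """Rough number of chunks so that one chunk, once loaded into an
    in-memory structure, fits in the RAM you can spare.

    overhead: assumed in-memory size / on-disk size (placeholder).
    usable_fraction: assumed share of RAM free for the job (placeholder).
    """
    budget_gb = ram_gb * usable_fraction
    return max(1, math.ceil(data_gb * overhead / budget_gb))
```

With those defaults, 100GB on a 32GB machine works out to 19 chunks, i.e., 19 extra passes' worth of chunk loads that the disks have to feed.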
 