Question Choosing the right CPU/mobo for a system that will support 4-6 Titan RTXs, for the purposes of TensorFlow calculations

Gatecrasher3

Senior member
Oct 15, 2004
417
0
76
I have been tasked with building a system for my company; the system will be completely dedicated to machine learning using TensorFlow. We have chosen to use GPUs with large amounts of VRAM, so we will be using 4 or 6 Titan RTX GPUs (because of the added cost, we didn't go with Quadros).

I already have the dual PSUs and chassis purchased; the only parts I'm still debating are the CPU and mobo.
I was unclear about what CPU to use because of the bus lanes that 4-6 Titans could be using if all of them were working at the same time. I don't know if GPUs doing machine learning calculations require the same number of bus lanes as a GPU doing game rendering would. Also, let's say I'm using 6 RTX GPUs, and each (I believe) would use 16 lanes; does that mean I would need a CPU that supports 96 lanes for them to run without a bus lane bottleneck?
I see that the 3000 series of Threadripper has 88 lanes for PCIe IO; is that the highest number of lanes a consumer-grade CPU offers?

Anyways, I don't know enough about machine learning or bus lanes to fully choose the best CPU or mobo for what we need, so if anyone could shed some light, that would be super helpful.

Any help would be SUPER appreciated.
Thank you!
 

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,791
136
Gatecrasher3 said:
Hmmm, this is saying "PCIe lanes don't matter for 4 or fewer GPUs", but we could be running 6, so I will need to keep looking. Thanks though.

The key takeaway from that link on multi-GPU PCIe lanes, for me, was this:

"I would make sure that I can get a support of 8 PCIe lanes per GPU"

So for 6 GPUs, you will want a motherboard that can give you a minimum of 48 PCIe lanes dedicated to GPUs.
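
For a rough sense of why x8 per card is considered enough, here's a quick back-of-the-envelope bandwidth check (PCIe 3.0 figures; the ~0.985 GB/s per lane number is the theoretical per-direction rate, so treat it as an upper bound, not what real transfers hit):

Python:
# Theoretical PCIe 3.0 throughput: 8 GT/s with 128b/130b encoding,
# roughly 0.985 GB/s per lane, per direction
GB_PER_LANE_PCIE3 = 0.985

for lanes in (8, 16):
    print(f"x{lanes} link: ~{lanes * GB_PER_LANE_PCIE3:.1f} GB/s per direction")

# Lane budget for six cards at the x8 rule of thumb
print("6 GPUs at x8 ->", 6 * 8, "lanes dedicated to GPUs")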
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Gatecrasher3 said:
so we will be using 4 or 6 Titan RTX GPUs (because of the added cost, we didn't go with Quadros)
Be careful here. You might want to get cards with ECC, otherwise hours (if not days) spent training might lead to a crash and/or wrong results. It's the last thing you want.
And the same goes for the CPU/MB/RAM. Get something with ECC.
We have gone through these things in our team too. Might be worth thinking it over.

Just keep in mind it is not only the GPUs. A lot of things happen on the CPU before the work can be scheduled on the GPU: padding, tokenization, reshaping and whatnot.
Ideally you want a fast CPU with enough PCIe lanes.
You might already be aware of this, but you should run your pipeline with a log showing where execution is happening:

Python:
import tensorflow as tf

if tf.test.is_built_with_gpu_support():
    print("The installed version of TensorFlow includes GPU support.")
    if tf.test.is_built_with_rocm() or tf.test.is_built_with_cuda():
        # Enumerate only the GPU devices TensorFlow can actually see
        gpus = tf.config.list_physical_devices('GPU')
        for device in gpus:
            print('GPU found', device.name)
        if not gpus:
            print("No GPU found")
        # Log which device (CPU or GPU) every op gets placed on
        tf.debugging.set_log_device_placement(True)
else:
    print("The installed version of TensorFlow does not include GPU support.")
 

juergbi

Junior Member
Apr 27, 2019
12
14
41
Gatecrasher3 said:
I see that the 3000 series of Threadripper has 88 lanes for PCIe IO; is that the highest number of lanes a consumer-grade CPU offers?

3rd Gen Threadripper on TRX40 "only" has 72 PCIe 4.0 lanes available to devices: up to 56 from the CPU and up to 16 from the chipset. 88 is mentioned in some places, but it's a misleading and useless number as it also counts the lanes used by the CPU-chipset link.

Additional PCIe switches on the motherboard could theoretically support six x16 PCIe 3.0 cards without any bandwidth restriction; however, I'm not aware of a TRX40 motherboard with additional PCIe switches.

If you need more lanes, you probably want Threadripper Pro or EPYC.
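
Putting those numbers together with the 8-lanes-per-GPU rule of thumb from earlier in the thread, a quick sanity check:

Python:
cpu_lanes = 56      # TRX40: usable PCIe 4.0 lanes from the CPU
chipset_lanes = 16  # up to 16 more hang off the chipset link
total = cpu_lanes + chipset_lanes  # 72 available to devices

for lanes_per_gpu in (16, 8):
    need = 6 * lanes_per_gpu
    verdict = "fits" if need <= total else "does not fit"
    print(f"6 GPUs at x{lanes_per_gpu}: {need} lanes needed -> {verdict} in {total}")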
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,012
136
DisEnchantment said:
Be careful here. You might want to get cards with ECC, otherwise hours (if not days) spent training might lead to a crash and/or wrong results. It's the last thing you want.
And the same goes for the CPU/MB/RAM. Get something with ECC.
We have gone through these things in our team too. Might be worth thinking it over.

Absolutely this. Don't cheap out and waste developer time.
 

thesmokingman

Platinum Member
May 6, 2010
2,307
231
106
Wait for the Pro CPUs or you will be stuck with the low-clocking EPYC. That'll probably mean returning what ya bought and getting the Lenovo, then slapping in your GPUs.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,698
136
For TRX40, what would he have available for motherboards with six x16 physical slots? There's a selection of X299 boards with that feature.
If going with RTX cards, you could probably fit them in single-slot spacing if you're willing to go with water cooling.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Neural net processing is failure tolerant. ECC on the GPUs is not really that important. You can use INT8 or even lower precision for some models; there's no way a tiny amount of bit flips matters. The obsession over ECC for "professional" work is completely overblown.

Maybe, just maybe, it is worthwhile on the CPU, where the code may be less failure tolerant.

If your entire training process is one enormous monolithic job with no threading, subprocesses, sub-job scheduling, or any other modern scale-out technique (the sort of setup where a bit flip can cause real issues), then ECC is the last of your worries.
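
On the low-precision point: INT8 is mostly an inference-time option (e.g. via TensorRT); for training, the closest built-in knob in TensorFlow is mixed float16. A minimal sketch of the Keras mixed-precision API (the toy model is illustrative, not a recommendation):

Python:
import tensorflow as tf

# Compute in float16 but keep the weights in float32; Keras handles
# loss scaling automatically when you compile and fit the model
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Keep the final activation in float32 for numerical stability
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])

print(model.layers[0].compute_dtype)   # float16 (math)
print(model.layers[0].variable_dtype)  # float32 (weights)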
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,377
126
Is this a situation where the CPU isn't actually loaded much overall? In that case the cheapest EPYC, or even a cheap used Xeon (LGA 3647) platform (I believe these go way down to like 8C/8T at low frequency), should do the job fine if it's basically an interface to run a bunch of PCIe lanes. This is similar to how people built the cheapest possible mining rigs with whatever CPU was cheapest to buy and run, with the requisite GPUs taking all the budget.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Well, ECC rules out the Titan RTX. He'll have to use Quadros.

Anything that isn't a Quadro or Tesla is, by Nvidia's license, not allowed to run in a server anyway. You can put it in a workstation for a single person, but as soon as you share the machine like a server, the Titan RTX is not allowed. NV did this because all the cloud players etc. were using cheaper GeForce/Titan cards for their servers, and this way NV makes a lot more money.

The term "datacenter" might be ambiguous, and as a small player you will probably get away with it, but it's still not something I would do without my manager's written approval.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,698
136
Get sign-off on it, but at least for now it sounds like he's building a single machine. Even if multiple users access it, you'd be hard pressed to call that a datacenter deployment.
Nvidia Rep said:
“In contrast to PCs and small-scale LANs used for company and academic research projects, data centers typically are larger-scale deployments, often in multi-server racks, that provide access to always-on GPUs to many users,”