Question Choosing the right CPU/mobo for a system that will support 4-6 Titan RTXs, for the purposes of TensorFlow calculations

Gatecrasher3

Senior member
Oct 15, 2004
417
0
76
I have been tasked with building a system for my company; the system will be completely dedicated to machine learning using TensorFlow. We have chosen to use GPUs with large amounts of VRAM, so we will be using 4 or 6 Titan RTX GPUs (because of the added cost, we didn't go with Quadros).

I already have the dual PSUs and chassis purchased; the only parts I'm still debating are the CPU and mobo.
I was unclear about what CPU to use because of the bus lanes that 4-6 Titans could be using if all of them were working at the same time. I don't know if GPUs doing machine learning calculations require the same number of bus lanes as a GPU doing game rendering would. Also, let's say I'm using 6 RTX GPUs, and each (I believe) would use 16 lanes; does that mean I would need a CPU that supports 96 lanes for them to run without a bus lane bottleneck?
I see that the 3000 series of Threadripper has 88 lanes for PCIe IO; is that the highest number of lanes a consumer-grade CPU offers?

Anyways, I don't know enough about machine learning or bus lanes to fully choose the best CPU or mobo for what we need, so if anyone could shed some light, that would be super helpful.

Any help would be SUPER appreciated.
Thank you!
 

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,791
136
Gatecrasher3 said:
Hmmm, this is saying "PCIe lanes don't matter for 4 or fewer GPUs", but we could be running 6, so I will need to keep looking. Thanks though.

The key takeaway from that link on multi-GPU PCIe lanes, for me, was this:

"I would make sure that I can get a support of 8 PCIe lanes per GPU"

So for 6 GPUs, you will want a motherboard that can give you a minimum of 48 PCIe lanes dedicated to GPUs.
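
For a rough sense of why x8 per card is considered enough, here's a quick back-of-the-envelope bandwidth check (PCIe 3.0 figures; the ~0.985 GB/s per lane number is the theoretical per-direction rate, so treat it as an upper bound, not what real transfers hit):

Python:
# Theoretical PCIe 3.0 throughput: 8 GT/s with 128b/130b encoding,
# roughly 0.985 GB/s per lane, per direction
GB_PER_LANE_PCIE3 = 0.985

for lanes in (8, 16):
    print(f"x{lanes} link: ~{lanes * GB_PER_LANE_PCIE3:.1f} GB/s per direction")

# Lane budget for six cards at the x8 rule of thumb
print("6 GPUs at x8 ->", 6 * 8, "lanes dedicated to GPUs")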
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Gatecrasher3 said:
so we will be using 4 or 6 Titan RTX GPUs (because of the added cost, we didn't go with Quadros)
Be careful here. You might want to get cards with ECC, otherwise hours (if not days) spent training might lead to a crash and/or wrong results. It's the last thing you want.
And the same goes for the CPU/MB/RAM. Get something with ECC.
We have gone through these things in our team too. Might be worth thinking it over.

Just keep in mind it is not only the GPUs. A lot of things happen on the CPU before the work can be scheduled on the GPU: padding, tokenization, reshaping and whatnot.
Ideally you want a fast CPU with enough PCIe lanes.
You might already be aware of this, but you should run your pipeline with a log showing where execution is happening:

Python:
import tensorflow as tf

if tf.test.is_built_with_gpu_support():
    print("The installed version of TensorFlow includes GPU support.")
    if tf.test.is_built_with_rocm() or tf.test.is_built_with_cuda():
        # Enumerate only the GPU devices TensorFlow can actually see
        gpus = tf.config.list_physical_devices('GPU')
        for device in gpus:
            print('GPU found', device.name)
        if not gpus:
            print("No GPU found")
        # Log which device (CPU or GPU) every op gets placed on
        tf.debugging.set_log_device_placement(True)
else:
    print("The installed version of TensorFlow does not include GPU support.")
 

juergbi

Junior Member
Apr 27, 2019
12
14
41
Gatecrasher3 said:
I see that the 3000 series of Threadripper has 88 lanes for PCIe IO; is that the highest number of lanes a consumer-grade CPU offers?

3rd Gen Threadripper on TRX40 "only" has 72 PCIe 4.0 lanes available to devices: up to 56 from the CPU and up to 16 from the chipset. 88 is mentioned in some places, but it's a misleading and useless number as it also counts the lanes used by the CPU-chipset link.

Additional PCIe switches on the motherboard could theoretically support six x16 PCIe 3.0 cards without any bandwidth restriction; however, I'm not aware of a TRX40 motherboard with additional PCIe switches.

If you need more lanes, you probably want Threadripper Pro or EPYC.
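
Putting those numbers together with the 8-lanes-per-GPU rule of thumb from earlier in the thread, a quick sanity check:

Python:
cpu_lanes = 56      # TRX40: usable PCIe 4.0 lanes from the CPU
chipset_lanes = 16  # up to 16 more hang off the chipset link
total = cpu_lanes + chipset_lanes  # 72 available to devices

for lanes_per_gpu in (16, 8):
    need = 6 * lanes_per_gpu
    verdict = "fits" if need <= total else "does not fit"
    print(f"6 GPUs at x{lanes_per_gpu}: {need} lanes needed -> {verdict} in {total}")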
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,012
136
DisEnchantment said:
Be careful here. You might want to get cards with ECC, otherwise hours (if not days) spent training might lead to a crash and/or wrong results. It's the last thing you want.
And the same goes for the CPU/MB/RAM. Get something with ECC.
We have gone through these things in our team too. Might be worth thinking it over.

Absolutely this. Don't cheap out and waste developer time.
 

thesmokingman

Platinum Member
May 6, 2010
2,307
231
106
Wait for the Pro CPUs or you will be stuck with the low-clocking EPYC. That'll probably mean returning what ya bought and getting the Lenovo, then slapping in your GPUs.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,698
136
For TRX40, what would he have available for motherboards with six x16 physical slots? There's a selection of X299 boards with that feature.
If going with RTX cards, you could probably fit them in single-slot spacing if you're willing to go with water cooling.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Neural net processing is failure tolerant. ECC on the GPUs is not really that important. You can use INT8 or even lower precision for some models; there's no way a tiny amount of bit flips matters. The obsession over ECC for "professional" work is completely overblown.

Maybe, just maybe, it is worthwhile on the CPU, where the code may be less failure tolerant.

If your entire training process is one enormous monolithic job with no threading, subprocesses, sub-job scheduling, or any other modern scale-out technique (the sort of setup where a bit flip can cause real issues), then ECC is the last of your worries.
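
On the low-precision point: INT8 is mostly an inference-time option (e.g. via TensorRT); for training, the closest built-in knob in TensorFlow is mixed float16. A minimal sketch of the Keras mixed-precision API (the toy model is illustrative, not a recommendation):

Python:
import tensorflow as tf

# Compute in float16 but keep the weights in float32; Keras handles
# loss scaling automatically when you compile and fit the model
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Keep the final activation in float32 for numerical stability
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])

print(model.layers[0].compute_dtype)   # float16 (math)
print(model.layers[0].variable_dtype)  # float32 (weights)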
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,377
126
Is this a situation where the CPU isn't actually loaded much overall? In that case the cheapest EPYC, or even a cheap used Xeon (LGA 3647) platform (I believe these go way down to like 8C/8T at low frequency), should do the job fine if it's basically an interface to run a bunch of PCIe lanes. This is similar to how people built the cheapest possible mining rigs with whatever CPU was cheapest to buy and run, with the requisite GPUs taking all the budget.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Well, ECC rules out the Titan RTX. He'll have to use Quadros.

Anything that isn't a Quadro or Tesla is, by Nvidia's license, not allowed to run in a server anyway. You can put it in a workstation for a single person, but as soon as you share the machine like a server, the Titan RTX is not allowed. NV did this because all the cloud players etc. were using cheaper GeForce/Titan cards for their servers, and this way NV makes a lot more money.

The term "datacenter" might be ambiguous, and as a small player you will probably get away with it, but it's still not something I would do without my manager's written approval.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,698
136
Get sign-off on it, but at least for now it sounds like he's building a single machine. Even if multiple users access it, you'd be hard pressed to call that a datacenter deployment.
Nvidia Rep said:
“In contrast to PCs and small-scale LANs used for company and academic research projects, data centers typically are larger-scale deployments, often in multi-server racks, that provide access to always-on GPUs to many users,”