Dual XEON 2696 v4 Workstation Build for Machine Vision, Gaming, Video Editing & Trading Build Log

traderjay

Senior member
Sep 24, 2015
220
165
116
My i7-3970X has served me well for the past 5 years, but with the advent of 4K video and better-optimized multi-core applications, I've finally decided to upgrade my system. My new build consists of the following:

CPU - Dual Xeon E5 2696V4
RAM - 64 GB Crucial DDR4 PC-2666 (8x8GB)
Storage: 2X800GB HGST SAS SSD RAID 0 for application/scratch disk
OS - Samsung 512 GB 850Pro
GPU - Nvidia GTX 970
Case - Lian-Li PC-V2120x
Motherboard - Supermicro X10DAX
PSU - Seasonic Titanium 1000W

This system will also be used to benchmark various machine vision applications, such as the 3D point cloud processing used in machine vision inspection systems for quality control (ScanXtream).

My other workstation is a single E5-2696 v3; Vegas 15 and Adobe Media Encoder can take advantage of all physical cores when encoding video, and I can't wait to see the additional improvement from my new workstation. If you guys have good benchmark ideas, let me know and I'll keep this thread up to date.
 

pjmssn

Member
Aug 17, 2017
89
11
71
Wow, awesome workstation, that's gonna be a beast! I look forward to seeing the benchmark results.
 

traderjay

Senior member
Sep 24, 2015
220
165
116
I crammed X10DAX with dual 2690v4 into an ATX midi tower:
https://forums.anandtech.com/threads/water-cooling-vs-fan-cooling.2505326/page-2#post-38902843
Later I did the same with dual 2696v4, but with storage reduced from PCIe to SATA.
These are Linux compute servers.

Amazing build you got there! Monster power in a small package! How is the noise level on the fans? I am using Noctua NH-U12DX i4 coolers. Also, did you use any of Supermicro's overclocking features on the board?
 

hasu

Senior member
Apr 5, 2001
993
10
81
I crammed X10DAX with dual 2690v4 into an ATX midi tower:
https://forums.anandtech.com/threads/water-cooling-vs-fan-cooling.2505326/page-2#post-38902843
Later I did the same with dual 2696v4, but with storage reduced from PCIe to SATA.
These are Linux compute servers.

Beautiful build!

Looking at the relatively low 135 W TDP of the 2690v4, I am wondering what the total power consumption of your setup is while benchmarking (with all cores hitting 100%)? How about the 2696v4?

While on turbo, do all the cores hit 3.5GHz (and 3.6)?
 

StefanR5R

Elite Member
Dec 10, 2016
6,504
10,113
136
Looking at the relatively low 135 W TDP of the 2690v4, I am wondering what the total power consumption of your setup is while benchmarking (with all cores hitting 100%)? How about the 2696v4?
Under all-core non-AVX load, the dual 2690v4 box pulls a bit over 350 W, and the dual 2696v4 box a bit under 400 W. These are approximate numbers because I usually have only one power meter in front of several computers and some other devices. For the next few days I don't want to replug anything, because I don't want to shut the machines down and restore their current jobs.

I don't recall AVX power consumption exactly; it's somewhat higher.

While on turbo, do all the cores hit 3.5GHz (and 3.6)?
Under all-core load, the X10DAX (and also an Asus X99 board which I have with a single 2696v4) maintains only the all-core turbo. More than that is only possible, among socket 2011-3 Xeons at least, with microcode hacks on E5 v3 parts.

For reference, here are turbo bins of the 14-core 2690v4, the 22-core 2696v4, and its non-OEM sibling 2699v4:
Code:
               base      non-AVX turbo offset in 100 MHz, while n cores are used
        TDP    clock      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
-----------------------------------------------------------------------------------
2699v4  145 W  2.2 GHz   14 14 12 11 10  9  8  7  6  6  6  6  6  6  6  6  6  6  6  6  6  6
2696v4  150 W  2.2 GHz   15 15 13 12 11 10  9  8  7  6  6  6  6  6  6  6  6  6  6  6  6  6
2690v4  135 W  2.6 GHz    9  9  7  6  6  6  6  6  6  6  6  6  6  6  -  -  -  -  -  -  -  -

               AVX base  AVX turbo offset in 100 MHz, while n cores are used
        TDP    clock      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
-----------------------------------------------------------------------------------
2699v4  145 W  1.8 GHz   18 18 16 15 14 13 12 11 10  9  8  8  8  8  8  8  8  8  8  8  8  8
2696v4  150 W  1.8 GHz    ?  ...                                                         8
2690v4  135 W  2.1 GHz   14 14 12 11 10  9  8  8  8  8  8  8  8  8  -  -  -  -  -  -  -  -
I.e. in my usage with many-core workloads, the relevant processor clocks are 2.8/2.6 GHz (2696v4 non-AVX and AVX) and 3.2/2.9 GHz (2690v4 non-AVX and AVX).
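
For anyone who wants to play with these bins, the non-AVX rows above can be transcribed into a quick sketch (the numbers are from the table in this post; the dict layout and helper name are mine):

```python
# Non-AVX turbo offsets (in 100 MHz steps) per active-core count,
# transcribed from the table above.
TURBO_BINS = {
    # model: (base_GHz, [offset while 1..n cores are used])
    "2696v4": (2.2, [15, 15, 13, 12, 11, 10, 9, 8, 7] + [6] * 13),
    "2690v4": (2.6, [9, 9, 7, 6] + [6] * 10),
}

def all_core_turbo_ghz(model: str) -> float:
    """Effective clock with every core loaded: base + last bin * 0.1 GHz."""
    base, bins = TURBO_BINS[model]
    return round(base + bins[-1] * 0.1, 1)
```

The last bin is the all-core turbo, which is why these boards settle at 2.8 GHz (2696v4) and 3.2 GHz (2690v4) under full non-AVX load.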

How is the noise level on the fans?
IMO very satisfying with the 2690v4, but less so with the 2696v4.

So far I am using one of the built-in fan profiles of the BIOS; I don't remember which one. Subjectively, this works well enough for me with the 2690v4, but not as well with the 2696v4. That's because the maximum package temperatures are defined very differently on these two chips. Linux's sensors tool says:
2690v4: high = 93 °C, crit = 103 °C
2696v4: high = 83 °C, crit = 85 °C​
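
Those "high"/"crit" limits come from the coretemp driver, which exposes them in sysfs; here is a minimal sketch for reading them directly (standard hwmon layout, but paths and labels vary per machine):

```python
from pathlib import Path

def read_temp_limits(hwmon_root="/sys/class/hwmon"):
    """Yield (label, high_C, crit_C) for each sensor that defines limits.

    coretemp exposes the "high" limit as temp*_max and the critical
    limit as temp*_crit, both in millidegrees Celsius.
    """
    for f in sorted(Path(hwmon_root).glob("hwmon*/temp*_max")):
        crit = f.with_name(f.name.replace("_max", "_crit"))
        label = f.with_name(f.name.replace("_max", "_label"))
        if not crit.exists():
            continue  # some sensors define no critical limit
        name = label.read_text().strip() if label.exists() else f.stem
        yield name, int(f.read_text()) / 1000, int(crit.read_text()) / 1000
```

Filtering the yielded labels for "Package id" picks out the per-socket limits quoted above.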

Under the current all-core non-AVX load and cosy ~26 °C room temperature, I get
2690v4: 60...63 °C CPU temp, ~750 RPM case fans, ~810 RPM cooler fans
2696v4: 49...51 °C CPU temp, ~920 RPM case fans, ~900 RPM cooler fans​

I.e. the BIOS's fan curve is aligned with the lower "high" temperature of the 2696v4 and drives its fans faster than on the 2690v4. (Or maybe the fan speed curve is aligned with TDP, but I doubt that.)

With more restrictive cases, I wouldn't like the fan speeds even of the 2690v4. But it's fine for me with the Corsair 400Q. (The front fascia is permanently off, but the front dust filter is attached.) In fact, I had this thing running 2 or 3 meters from my bed for a while, and it didn't disturb my sleep at all.

The 900+ RPM of the 140 mm fans in the 2696v4 box causes more air noise which I wouldn't want under my desk, or in a bedroom. I haven't checked in the BIOS yet whether there is a slower fan profile, nor have I tried low noise adapters on the fans yet, as I had no pressing need for better acoustics so far.

My points for comparison are a Fractal Design R5 PCGH edition ( = no top vents, no side vent), in which I prefer 140 mm case fans and CPU cooler fan to stay below 650 RPM, and a Thermaltake Core case with 420 mm radiator at the top whose 140 mm radiator fans I preferred to stay below 700 RPM IIRC. (Both these setups are more restrictive than the 400Q, and the R5 also adds an irritating tonality at a certain level of air speed.)

By the way, as you may have noticed, my setup differs from yours regarding cooling config by lack of a high-wattage GPU. Therefore, both CPUs draw air almost at room temperature, not from the exhaust of the cooler of another high-powered device.

Furthermore, the front CPU is cooler than the back CPU by a few °C, typically by 2 °C, due to marginally better air supply of the CPU cooler in the front.

Also, did you use any of SuperMicro's overclocking features on the board?
I only enabled "Hyper-Turbo", which I presume helps with running the CPUs at turbo rather than base clock under all loads. I haven't experimented yet to see what happens if I switch it off.

I have not enabled BCLK overclocking yet, which Supermicro calls "Hyper-Speed"; it would overclock memory and PCIe too. I have also kept the core voltage offset disabled so far.

Originally, I chose the X10DAX for the 2690v4-based build (two builds, actually) because I wanted at least the all-core turbo applied at all times during many-core loads, and since these were my first dual-processor builds ever, I was not sure whether other boards would give me that. I stuck with the X10DAX for the 2696v4 build about a year later (two builds as well), simply out of resistance to trying anything new. By then, clock speeds weren't that much of a criterion for me anymore, but rather performance per socket and performance per watt.
 
Last edited:

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
My i7-3970X has served me well for the past 5 years, but with the advent of 4K video and better-optimized multi-core applications, I've finally decided to upgrade my system. My new build consists of the following:

CPU - Dual Xeon E5 2696V4
RAM - 64 GB Crucial DDR4 PC-2666 (8x8GB)
Storage: 2X800GB HGST SAS SSD RAID 0 for application/scratch disk
OS - Samsung 512 GB 850Pro
GPU - Nvidia GTX 970
Case - Lian-Li PC-V2120x
Motherboard - Supermicro X10DAX
PSU - Seasonic Titanium 1000W

This system will also be used to benchmark various machine vision applications, such as the 3D point cloud processing used in machine vision inspection systems for quality control (ScanXtream).

My other workstation is a single E5-2696 v3; Vegas 15 and Adobe Media Encoder can take advantage of all physical cores when encoding video, and I can't wait to see the additional improvement from my new workstation. If you guys have good benchmark ideas, let me know and I'll keep this thread up to date.

Fantastic machine. I do wonder about the memory, though. 64GB seems low for a system rocking 44c/88t. I don't know how much memory your work would have you using, but ~1.5 GB per core seems smallish.
 

Burpo

Diamond Member
Sep 10, 2013
4,223
473
126
NICE BUILD! Small & packs a punch! :cool:



Puter porn :cool:
 

traderjay

Senior member
Sep 24, 2015
220
165
116
@StefanR5R - thanks for the info. Hopefully my case's built-in sound dampening will help a bit with the fan noise. I will also make a DIY fan shroud to direct airflow across the two CPUs. It's normal for HCC CPUs to have a lower Tjmax due to the increased transistor count.
 

pjmssn

Member
Aug 17, 2017
89
11
71
Fantastic machine. I do wonder about the memory, though. 64GB seems low for a system rocking 44c/88t. I don't know how much memory your work would have you using, but ~1.5 GB per core seems smallish.

I would agree with this. I realize that it depends on the workflow, but I would not use less than 4GB per core (my workflow includes lots of FEA simulations); my dual E5-2690 has 16 cores and 8GB of memory per core.
 

traderjay

Senior member
Sep 24, 2015
220
165
116
I would agree with this. I realize that it depends on the workflow, but I would not use less than 4GB per core (my workflow includes lots of FEA simulations); my dual E5-2690 has 16 cores and 8GB of memory per core.

My applications aren't that memory intensive, but will there be a performance penalty with only 64GB of RAM?
 

StefanR5R

Elite Member
Dec 10, 2016
6,504
10,113
136
I have got 64 GB myself, and it is plenty for most of my workloads. There are situations in which this is difficult to work with, but I am waiting for RAM prices to come down again before I upgrade to 128 GB.

There is no performance penalty if you have all 2x4 channels populated. I have 8x8 GB = 1 DIMM per channel, registered ECC (RDIMMs).

BTW, DDR4-2400 is only supported with 1 DIMM per channel with RDIMMs like mine, or 1...2 DIMMs per channel with LRDIMMs. In contrast, 2 DIMMs per channel with RDIMMs are supported only up to DDR4-2133.
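
Those rules can be summed up as a small lookup (values are the ones stated above; the table and helper names are mine):

```python
# Maximum supported DDR4 transfer rate (MT/s) on this platform, keyed by
# (DIMM type, DIMMs per channel). RDIMMs at 2 DPC drop to 2133; LRDIMMs
# keep 2400 at either population.
MAX_SPEED = {
    ("RDIMM", 1): 2400,
    ("RDIMM", 2): 2133,
    ("LRDIMM", 1): 2400,
    ("LRDIMM", 2): 2400,
}

def supported_speed(dimm_type, dpc):
    """Return the maximum supported transfer rate in MT/s."""
    return MAX_SPEED[(dimm_type.upper(), dpc)]
```

So an 8x8GB RDIMM kit at 1 DPC keeps DDR4-2400, while filling all 16 slots with RDIMMs would drop the whole system to 2133.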

Oh, @traderjay, unregistered DIMMs are not supported by X10DAX, supposedly. I am curious myself whether unregistered DIMMs might work regardless, but never took the time to dismantle one of the servers to try it.

Edit:
E5-2696v4 + Asus X99-A single-socket board work with 2 DIMMs per channel, unregistered non-ECC DDR4-2400. But this says little about what's really possible on dual socket boards.
 
Last edited:

kjboughton

Senior member
Dec 19, 2007
330
118
116
To add value to the conversation, let me just say that dual Xeon configurations up the ante: the chipset will enforce Registered ECC memory requirements.

BTW, the way I operate my system (NUMA disabled, no HTT, 16 cores/CPU for 32 total logical/real cores), I would have only 2GB of memory "per core" (64GB total).
 
Last edited:

traderjay

Senior member
Sep 24, 2015
220
165
116
To add value to the conversation, let me just say that dual Xeon configurations up the ante: the chipset will enforce Registered ECC memory requirements.

BTW, the way I operate my system (NUMA disabled, no HTT, 16 cores/CPU for 32 total logical/real cores), I would have only 2GB of memory "per core" (64GB total).

Thanks for the info! The memory I will be using is this Crucial kit: http://www.crucial.com/usa/en/x10dac/CT9315072

The machine vision application that takes 3D point clouds captured by laser scanners for CAD comparison relies heavily on in-memory data shuffling, so ECC is a must :)

I have got 64 GB myself, and it is plenty for most of my workloads. There are situations in which this is difficult to work with, but I am waiting for RAM prices to come down again before I upgrade to 128 GB.

There is no performance penalty if you have all 2x4 channels populated. I have 8x8 GB = 1 DIMM per channel, registered ECC (RDIMMs).

BTW, DDR4-2400 is only supported with 1 DIMM per channel with RDIMMs like mine, or 1...2 DIMMs per channel with LRDIMMs. In contrast, 2 DIMMs per channel with RDIMMs are supported only up to DDR4-2133.

Oh, @traderjay, unregistered DIMMs are not supported by X10DAX, supposedly. I am curious myself whether unregistered DIMMs might work regardless, but never took the time to dismantle one of the servers to try it.

Edit:
E5-2696v4 + Asus X99-A single-socket board work with 2 DIMMs per channel, unregistered non-ECC DDR4-2400. But this says little about what's really possible on dual socket boards.

Under what situation did you run into memory issues? I guess in my case I am stuck with DDR4-2133 speed :( Also, what is the proper way of populating the slots? The manual is quite vague ("Starting with P1-DIMMA1").
 
Last edited:

kjboughton

Senior member
Dec 19, 2007
330
118
116
1 DPC (DIMM per channel) on C612 with Broadwell-EP allows up to DDR4-2400; 2 DPC will drop to 2133 MT/s (effective). Each CPU is quad channel, for a total of 8 channels in the system. I would highly recommend 8x8GB, or 8x16GB if you feel you need more. Besides keeping speeds up, this gives you an upgrade path down the line without replacing existing DIMMs.

Make sure you pay attention to the ranking: you can't mix rank types. If you go 1 DPC, I would recommend DR (dual-rank) DIMMs to squeeze out that last ~5% of bandwidth from rank interleaving on top of channel interleaving.
 
  • Like
Reactions: Burpo

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
I would agree with this. I realize that it depends on the workflow, but I would not use less than 4GB per core (my workflow includes lots of FEA simulations); my dual E5-2690 has 16 cores and 8GB of memory per core.
Again, I didn't know the workload. But generally the idea is that the more cores you are working with, the more memory your system needs to feed them. That is a simple rule of thumb and obviously has its faults. But when looking at a 2S 44-core system, 64GB looks rather small (hell, my 8-core system has 32GB). He knows his tasks better than I do, though, and if he can feed the beast with 64GB, more power to him.
 

kjboughton

Senior member
Dec 19, 2007
330
118
116
@traderjay

Firstly, read through this. Twice.
https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-broadwell-ep-memory-performance-ww-en.pdf

Money quote:
"The most striking performance disadvantages in the table, for example with the 8GB 1Rx4 RDIMM in 1DPC, can be explained by the lack of a rank interleave. Except for 1DPC configurations with single-rank DIMMs this case can also occur with mixed configurations, for example with the 16GB 2Rx4 RDIMM in the first bank and the 8GB 1Rx4 RDIMM in the second bank. In these cases of a missing or 1-way rank interleave you should be aware of a certain performance disadvantage. This case should be avoided in sensitive use cases, particularly with powerful processor models."

Then go here:
https://www.kingston.com/us/MemorySearch/MemoryType?MemoryType=DDR4 2400MHz,,23,54

This is pre-selected for DDR4 memory... filter for what you want:
- Registered (there are registered and non-registered ECC memory types)
- ECC
- 2R Ranking <--- important!
- Kit of 4
- ...

Here it is pre-filled:
https://www.kingston.com/us/MemorySearch/MemoryType?MemoryType=DDR4 2400MHz,,23,54&BUSINESS_TYPE=ValueRam&TotalCapacity=64&Capacity=16&ModuleType=DIMM&Voltage=1.2V&DIMMType=Registered&ErrorCheck=ECC&Kits=KIT OF 4&ChannelConfig=Quad Channel Kit&Ranking=2R

There's your memory right there:
http://www.kingston.com/dataSheets/KVR24R17D8K4_64.pdf

Recommended:
KVR24R17D8K4/64
64GB (16GB 2Rx8 2G x 72-Bit x 4 pcs.) PC4-2400
CL17 Registered w/Parity 288-Pin DIMM Kit

Close second (uses lower-density SDRAM chips, so it will look "busy" compared to the part above, which has about half the number of discrete packages):
KVR24R17D4K4/64
64GB (16GB 2Rx4 2G x 72-Bit x 4 pcs.) PC4-2400
CL17 Registered w/Parity 288-Pin DIMM

You could mix these kits if you had to (same ranking), but I wouldn't.

One kit for 64GB, two kits for 128GB. If you go 2 DPC you will drop to DDR4-2133, which these will handle. The higher speed may come back into play in the case of an upgrade down the line...

EDIT: Looking at the Fujitsu whitepaper above (which has been revised since its release for the v3-series Xeons), perhaps you can maintain DDR4-2400 @ 2 DPC. Not sure if this is a Fujitsu-only development or if other motherboard makers have done the same.
 
Last edited:

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,034
3,513
126
You have posted this thread in the wrong section.
This is CPU and OVERCLOCKING.
Its primary focus is discussion of CPUs and overclocking.

This thread belongs in Cases and Cooling which features build logs / Work Logs / Show off my bling bling rig skills.
I will move this thread over to that section in a bit.

Cases and Cooling Moderator Aigo.
 

StefanR5R

Elite Member
Dec 10, 2016
6,504
10,113
136
Under what situation did you run into memory issues?
This was an application which I rarely run, and many instances of it in parallel. I simply had to take care not to run too many instances at a time, that's it.
Also, what is the proper way of populating the slots? The manual is quiet vague "Starting with P1-DIMMA1"
I populated all *1 slots (and left all *2 slots empty).
 

traderjay

Senior member
Sep 24, 2015
220
165
116
You have posted this thread in the wrong section.
This is CPU and OVERCLOCKING.
Its primary focus is discussion of CPUs and overclocking.

This thread belongs in Cases and Cooling which features build logs / Work Logs / Show off my bling bling rig skills.
I will move this thread over to that section in a bit.

Cases and Cooling Moderator Aigo.

Thanks, and my apologies for posting in the wrong forum. Once the rig is built, may I create a new thread in the CPU forum to discuss benchmarks and performance?
 

Micrornd

Golden Member
Mar 2, 2013
1,340
220
106
1 DPC (DIMM per Channel) on C612 with Broadwell-EP will allow for up to DDR4-2400; 2 DPC will drop to 2133MHz (effective).
I believe you'll find that to be dependent on the motherboard BIOS rather than the CPU/chipset. ;)
As an example, your pair of Haswells should be running at 2133 MHz in quad channel, since you have only 4 DIMMs per CPU (1 DPC), right?
Yet my same pair of Haswells (E5-2696 v3) is also running at 2133 MHz with 8 DIMMs per CPU (2 DPC).