Techspot: Haswell-E vs Dual Xeon (SB-EP)

Page 6

jhu

Lifer
Oct 10, 1999
11,918
9
81
So I got the PCIe 6-pin to EPS 8-pin adapter for the second CPU. I plugged it in, and the second CPU and the full 64 GB of memory now show up. I tried out Blender a bit, and the thing just scorches with both CPUs. I now also have an Nvidia GPU, so I can render on the GPU as well; previously I only had an i5 and an AMD GPU, so it was much slower.

BUT... the power draw is beyond the limits of my APC UPS, so it beeps. It actually used to do that when I ran the GPU at full power too, but I almost never did since I stopped gaming. The power in my house is surprisingly inconsistent - I have a problem several times a year - so before I start running the CPUs at full power I need to get a bigger UPS.

Once I get the new UPS in a month or so, I'll run some burn-in tests and benchmarks.

Another note - since I have the fans on a fan controller, they don't turn off when the computer is sleeping, so the computer is never silent unless it is off. I dislike that, but I'm not sure there is a workaround. Also, boot-up time with this motherboard is quite a bit longer than on my old computer - maybe 30 seconds, I'm not sure. But turning it off regularly is also a pain. Hmm...

What GPU do you have? How does rendering speed compare with the dual Xeons?
 

NAC

Golden Member
Dec 30, 2000
1,105
11
81
What GPU do you have? How does rendering speed compare with the dual Xeons?

I have a GTX 960. I expect it may be about the same speed as the dual CPUs. I've read online that rendering on Linux with two CPUs can be something like twice as fast as on Windows, so I'll experiment with that, but not for a few weeks.

And I've discovered having so many cores is a bit of a pain in the a** actually. Or at least wasteful. Most apps aren't any faster with all of the cores, but many will still max them out. Apparently the same goes for some rendering, like in Premiere - it will use all the cores but be no faster than if it used half of them. So when I run HandBrake I have to manually set affinity, or just let the computer blaze away burning energy without finishing any faster. And right now, with the weak UPS, I have no choice but to set affinity or hear the beeping. I guess I will create custom shortcuts for some apps and set the affinity in the shortcuts, choosing how to spread the cores around according to the tasks I may run simultaneously.
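On Windows, one way to do those affinity shortcuts is the built-in `start /affinity` switch, which takes a hex bitmask of logical cores. A small sketch for building that mask and the shortcut target (the HandBrake path is a placeholder, and how hyperthread siblings and the second socket's cores are numbered varies by system, so check Task Manager before picking core indices):

```python
def affinity_mask(cores):
    """Build the hex affinity mask `start /affinity` expects:
    one bit per logical core, bit 0 = logical core 0."""
    mask = 0
    for c in cores:
        mask |= 1 << c
    return format(mask, "X")

def shortcut_target(exe_path, cores):
    """Command line for a Windows shortcut that launches exe_path
    pinned to the given logical cores."""
    return f'cmd /c start "" /affinity {affinity_mask(cores)} "{exe_path}"'

# Pin HandBrake to the first 8 logical cores (path hypothetical):
print(shortcut_target(r"C:\Program Files\HandBrake\HandBrake.exe", range(8)))
```

Pasting that printed command into a shortcut's Target field gives a one-click launcher that never touches the other cores.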
 

NAC

Golden Member
Dec 30, 2000
1,105
11
81
Another idea: I can create multiple instances of some apps and divide the work across different cores to get it done faster. I could probably run 4 instances of HandBrake and encode 4 videos at once much faster than encoding them sequentially. Even Blender - I can render some frames on the GPU plus a core or two, and the rest of the frames on the remaining cores in CPU mode, then merge everything together. But it is a lot of manual setup. I'm sure in the future more and more apps will do this automatically.
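A sketch of automating that split, assuming 32 logical cores and four HandBrake instances (both numbers are assumptions): deal the files round-robin into batches and give each batch a disjoint affinity mask for `start /affinity`:

```python
def split_jobs(files, n_workers):
    """Deal the files round-robin across n_workers batches."""
    batches = [[] for _ in range(n_workers)]
    for i, f in enumerate(files):
        batches[i % n_workers].append(f)
    return batches

def core_masks(total_cores, n_workers):
    """Disjoint hex affinity masks: total_cores split evenly,
    worker 0 gets the lowest-numbered cores."""
    per = total_cores // n_workers
    masks = []
    for w in range(n_workers):
        m = 0
        for c in range(w * per, (w + 1) * per):
            m |= 1 << c
        masks.append(format(m, "X"))
    return masks

# Pair each batch with its mask (file names hypothetical):
videos = ["a.mkv", "b.mkv", "c.mkv", "d.mkv", "e.mkv"]
for mask, batch in zip(core_masks(32, 4), split_jobs(videos, 4)):
    print(f"/affinity {mask} -> {batch}")
```

Each worker would then run its batch sequentially inside its own mask, so the four encodes never fight over the same cores.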
 

nitromullet

Diamond Member
Jan 7, 2004
9,031
36
91
And I've discovered having so many cores is a bit of a pain in the a** actually. Or at least wasteful.

Yeah, I'm with you on this. I have a newfound respect for high clocks and high IPC. For my needs, my i5 4670K is faster at most things.

Even for audio work, I can run more tracks with less latency on the i5 than I can on the Xeons. The only thing that I've tried that really is massively faster on the Xeons is Cinebench, and I don't do any 3D rendering.

My original plan was to migrate to the Xeon machine, sell my i5, and perhaps build a Skylake-E machine in the future. Instead, I have decided to use the Xeon box as a secondary rig and keep my Haswell i5 for the time being.

That being said, I don't have any regrets. The dual Xeon box is still a very powerful machine, and I got it for a steal.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Even for audio work, I can run more tracks with less latency on the i5 than I can on the Xeons. The only thing that I've tried that really is massively faster on the Xeons is Cinebench, and I don't do any 3D rendering.

Doh! All I really do is render, so I'm seeing a pretty good performance boost over the prior machine (FX8350).
 

Sonikku13

Member
Jun 16, 2009
37
0
61
I pounced on one of these CPUs at $67 a while ago. I still have not built the tower based on the E5-2670, yet... but will this processor bottleneck a Radeon R9 Nano? And will the E5-2670 be better than an A10-7850K in Final Fantasy XIV: Heavensward?

Also, anyone think this will occur again... with Ivy Bridge, then with Haswell, then with Broadwell, then with Skylake, and so on and so forth?
 
Last edited:

NAC

Golden Member
Dec 30, 2000
1,105
11
81
So based on some preliminary tests in Blender on Windows, the two CPUs are a tiny bit faster than my GTX 960 for rendering. When rendering a final product, I verified that I can create two Blender instances - one rendering on the GPU and one on the CPU - and then combine the results. If I only cared about 3D rendering, though, putting the money into a better GPU would make more sense.
 

NAC

Golden Member
Dec 30, 2000
1,105
11
81
I pounced on one of these CPUs at $67 a while ago. I still have not built the tower based on the E5-2670, yet... but will this processor bottleneck a Radeon R9 Nano? And will the E5-2670 be better than an A10-7850K in Final Fantasy XIV: Heavensward?

Also, anyone think this will occur again... with Ivy Bridge, then with Haswell, then with Broadwell, then with Skylake, and so on and so forth?

In general for gaming, faster cores are better than more cores. But I would expect the E5-2670 to be fast enough at least most of the time. I can't speak to anything specific though.

Also, I think this will occur again. I'm already thinking about which E5 v4 or v5 CPU will be worth upgrading to. CPUs are just not getting much faster year over year like they used to. But they are getting more energy efficient, so it will be worthwhile for server companies to upgrade. I suspect, however, that Intel is trying to figure out how to prevent this from happening every generation.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
I pounced on one of these CPUs at $67 a while ago. I still have not built the tower based on the E5-2670, yet... but will this processor bottleneck a Radeon R9 Nano? And will the E5-2670 be better than an A10-7850K in Final Fantasy XIV: Heavensward?

Also, anyone think this will occur again... with Ivy Bridge, then with Haswell, then with Broadwell, then with Skylake, and so on and so forth?

It seems that it will. I see only one reason why it wouldn't: companies deciding to hold on to their servers for longer, but that seems very unlikely. Running costs account for a big chunk of TCO because servers run 24/7, and power usage is effectively multiplied by close to 2x (what is that metric called? PUE, I think) in most server rooms because the AC units have to remove the heat. So a CPU that cuts power consumption and ups efficiency is much more valuable to servers than to us.

PS: I made a thread about the viability of buying decommissioned server CPUs as opposed to new ones:
http://forums.anandtech.com/showthread.php?t=2473157
This thread is specifically about SB-EP; mine is about the viability of this approach to building computers in general.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
So based on some preliminary tests in Blender on Windows, the two CPUs are a tiny bit faster than my GTX 960 for rendering. When rendering a final product, I verified that I can create two Blender instances - one rendering on the GPU and one on the CPU - and then combine the results. If I only cared about 3D rendering, though, putting the money into a better GPU would make more sense.

The problem is that large scenes don't fit in GPU's RAM. My scenes regularly use 12+GB of RAM. Maybe I'm just not that memory efficient…
 

NAC

Golden Member
Dec 30, 2000
1,105
11
81
So I did some benchmarks rendering in Premiere Elements. My findings seem to mirror that link. Note that these aren't rigorous benchmarks - I literally started a stopwatch, clicked render, and stopped it when the render was done.

First I chose a 26-second section of my video. It includes some 4K footage adjusted for brightness, and some 1080p. I rendered an MP4 H.264 file. In all cases I included the hyperthreaded cores - so when I tested 4 cores, it was actually 4 physical and 4 hyperthreaded cores.
4 cores: 97 seconds
6 cores on 1 CPU: 69 seconds
6 cores, 3 on each CPU: 73 seconds
8 cores on 1 CPU: 61 seconds
8 cores, 4 on each CPU: 62 seconds
10 cores, 5 on each CPU: 60 seconds
all cores: 58 seconds
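For reference, here is the scaling implied by the numbers above, using the single-CPU timings where both layouts were tested and treating "all cores" as 16 physical (an assumption):

```python
# Render times from the 26-second clip test (physical cores -> seconds).
times = {4: 97, 6: 69, 8: 61, 10: 60, 16: 58}
base_cores, base_time = 4, times[4]

for cores, t in sorted(times.items()):
    speedup = base_time / t
    # Efficiency: fraction of ideal (linear) scaling from the 4-core baseline.
    efficiency = speedup / (cores / base_cores)
    print(f"{cores:2d} cores: {speedup:.2f}x vs 4 cores, "
          f"{efficiency:.0%} parallel efficiency")
```

Going from 4 to 16 cores is only about a 1.67x speedup, i.e. roughly 42% parallel efficiency, which matches the observation below that this export is not using the extra cores well.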

I also generated previews (by pressing enter) of a section of the video in which every frame is a PNG file I created in Blender. So this would be the stop-motion use case, although in my scenario it wasn't stop motion - each frame was just a picture. There was basically zero benefit from more cores:
1 core: 53 seconds
2 cores: 49 seconds
4 cores: 49 seconds
unlimited cores: 62 seconds.

And I generated previews from another section with brightness-adjusted and cropped 4K video, plus an overlay of 1080p video - like picture-in-picture. Similar to that link, I saw the biggest benefit from cores during normal preview generation:
1 core: 183 seconds
8 cores: 30 seconds
unlimited cores: 20 seconds.

In general, when previewing the timeline (not generating previews by pressing enter), it still stutters massively during the sequence of single-frame PNG files. Curiously, it is only reading between 1 and about 8 MB per second from the hard drive, so I'm not sure why it can't preview in real time. I'll try it with the files on an SSD at a later time.

When previewing the timeline over regular clips - either 4K or 1080p - I can sometimes get it to stutter a little with only 8 cores on one CPU active. But not consistently, and I'm not sure if it is CPU limited or just a matter of loading files into memory. With all cores, I don't think I was able to get it to stutter at all. This is a huge improvement over my i5, which had trouble with anything but the simplest parts of the timeline.

I'm quite pleased.
 
Last edited:

jhu

Lifer
Oct 10, 1999
11,918
9
81
Another idea: I can create multiple instances of some apps and divide the work across different cores to get it done faster. I could probably run 4 instances of HandBrake and encode 4 videos at once much faster than encoding them sequentially. Even Blender - I can render some frames on the GPU plus a core or two, and the rest of the frames on the remaining cores in CPU mode, then merge everything together. But it is a lot of manual setup. I'm sure in the future more and more apps will do this automatically.

You know, I just tested this out. That's a fantastic idea. I don't have a discrete GPU, but since Blender is not NUMA aware (at least that's what I'm attributing the speedup to, but I could be wrong), it is about 16% faster (based on the BMW1M scene) to run an instance of Blender on each CPU.
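On Linux, one way to run one instance per socket is to pin each with `numactl` so both the cores and the memory stay on the local NUMA node. A sketch that builds those commands, splitting an animation's frame range between the two nodes (the scene name, frame range, and two-node layout are all placeholders):

```python
def numa_render_cmds(blend_file, first, last, nodes=2):
    """One Blender command per NUMA node, each pinned with numactl
    and given an equal slice of frames first..last (inclusive)."""
    total = last - first + 1
    per = total // nodes
    cmds = []
    for n in range(nodes):
        s = first + n * per
        e = last if n == nodes - 1 else s + per - 1
        cmds.append(["numactl", f"--cpunodebind={n}", f"--membind={n}",
                     "blender", "-b", blend_file,
                     "-s", str(s), "-e", str(e), "-a"])
    return cmds

# Launch both and wait, e.g. with subprocess.Popen on each command
# (left commented so the sketch stays inert):
# procs = [subprocess.Popen(c) for c in numa_render_cmds("scene.blend", 1, 250)]
# for p in procs: p.wait()
```

Blender's `-b` (background), `-s`/`-e` (start/end frame), and `-a` (render animation) flags handle the frame split; the two output frame sets can then simply be concatenated.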
 
Last edited:

2blzd

Senior member
May 16, 2016
318
41
91
So I did some benchmarks rendering in Premiere Elements. My findings seem to mirror that link. Note that these aren't rigorous benchmarks - I literally started a stopwatch, clicked render, and stopped it when the render was done.

First I chose a 26-second section of my video. It includes some 4K footage adjusted for brightness, and some 1080p. I rendered an MP4 H.264 file. In all cases I included the hyperthreaded cores - so when I tested 4 cores, it was actually 4 physical and 4 hyperthreaded cores.
4 cores: 97 seconds
6 cores on 1 CPU: 69 seconds
6 cores, 3 on each CPU: 73 seconds
8 cores on 1 CPU: 61 seconds
8 cores, 4 on each CPU: 62 seconds
10 cores, 5 on each CPU: 60 seconds
all cores: 58 seconds

I also generated previews (by pressing enter) of a section of the video in which every frame is a PNG file I created in Blender. So this would be the stop-motion use case, although in my scenario it wasn't stop motion - each frame was just a picture. There was basically zero benefit from more cores:
1 core: 53 seconds
2 cores: 49 seconds
4 cores: 49 seconds
unlimited cores: 62 seconds.

And I generated previews from another section with brightness-adjusted and cropped 4K video, plus an overlay of 1080p video - like picture-in-picture. Similar to that link, I saw the biggest benefit from cores during normal preview generation:
1 core: 183 seconds
8 cores: 30 seconds
unlimited cores: 20 seconds.

In general, when previewing the timeline (not generating previews by pressing enter), it still stutters massively during the sequence of single-frame PNG files. Curiously, it is only reading between 1 and about 8 MB per second from the hard drive, so I'm not sure why it can't preview in real time. I'll try it with the files on an SSD at a later time.

When previewing the timeline over regular clips - either 4K or 1080p - I can sometimes get it to stutter a little with only 8 cores on one CPU active. But not consistently, and I'm not sure if it is CPU limited or just a matter of loading files into memory. With all cores, I don't think I was able to get it to stutter at all. This is a huge improvement over my i5, which had trouble with anything but the simplest parts of the timeline.

I'm quite pleased.


Great info.

Although I think your results will change drastically if you put your files on separate SSDs - one each for media, media cache, previews, exports, projects, etc.
 

NAC

Golden Member
Dec 30, 2000
1,105
11
81
Great info.

Although I think your results will change drastically if you put your files on separate SSDs - one each for media, media cache, previews, exports, projects, etc.

Good point. I currently have all Premiere scratch disks on my C: drive, which is a SanDisk Ultra SSD that also holds the program files. My media and the project file are on a 7,200 RPM drive. For the benchmarks above, I rendered onto the C: drive.

I tried a few more benchmarks, always with all cores, just shuffling data around to see how it changes. I created a small benchmark project - just about 4 GB of media files - for this purpose. I think that matters because during normal work Premiere will load RAM up with the media files and whatever else it needs, so benchmark speeds were affected by whether I ran a test first or second. I created a ramdisk using ImDisk for these benchmarks, assuming that to Premiere a ramdisk is at minimum equivalent to a very fast SSD (and probably quite a bit faster). See results below.

Semi-complex timeline - effects applied, pan and crop on 4K files, several fades so two video files are needed at once, titles with motion. Ran previews (press enter). "Cache" means all of the Premiere Elements cache-type settings (media cache database, media cache, video previews, audio previews):
A- Cache on SSD, media on 7200: 43
B- Cache on SSD, media on ramdisk: 36
C- Cache on SSD, media on 7200 (repeat of test A): 40
D- Cache on ramdisk, media on 7200: 37
E- Cache on ramdisk, media on ramdisk: 36

Note that I suspect test C (the repeat of test A) came in faster because Premiere had more data in RAM - but obviously not everything, otherwise it would have been closer to the other ramdisk scores.

Simple timeline - no effects or fades, just a gentle zoom into a single 4K file. Generated previews again:
F- Cache on ramdisk, media on ramdisk: 57
G- Cache on ramdisk, media on 7200: 55
H- Cache on SSD, media on 7200: 64

What do I learn from all this?

* Based on test E - I may have run up against the limits of CPU processing; even moving all files to the fastest possible drive didn't help much.

* Based on tests F/G - a 7,200 RPM drive looks good enough for media when using a single 4K file at a time, but based on tests B/C it is a limitation when two files are needed at once, such as during a fade or picture-in-picture.
Note that opening my 2-hour Premiere project, which has dozens of different media files, takes a long time, so I may want media on an SSD anyway so projects don't take forever to open. I'm not sure why it doesn't use the cache fully to speed up opening; perhaps it re-verifies things in case anything changed.

* Based on tests G/H and C/D - having cache files on a ramdisk or a very fast SSD makes a difference. I think Premiere Elements has a 10 GB limit on cache files, and I've never seen it use more than about 30 GB of physical RAM. So I may create a ramdisk on startup, plus a process to synchronize between the ramdisk and a hard drive on startup and shutdown. This would be cheaper and faster than, say, a Samsung 950 Pro. Premiere Pro may allow a larger cache, in which case this wouldn't work unless you have even more memory.
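A minimal sketch of that sync step, assuming a ramdisk mounted at R: and a backup folder on a regular drive (both paths hypothetical): copy only files that are missing or newer, so the shutdown sync is fast.

```python
import os, shutil

def sync_newer(src, dst):
    """Copy files from src to dst, skipping ones whose dst copy is
    already at least as new. Run disk -> ramdisk at startup and
    ramdisk -> disk at shutdown."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out = os.path.join(dst, rel)
        os.makedirs(out, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(out, name)
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)

# At startup:  sync_newer(r"D:\PremiereCacheBackup", r"R:\PremiereCache")
# At shutdown: sync_newer(r"R:\PremiereCache", r"D:\PremiereCacheBackup")
```

A pair of scheduled tasks (or logon/logoff scripts) calling these two lines would make the ramdisk cache survive reboots.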


Also, I didn't want to take over this thread with Premiere benchmarks, but I've already posted a lot about it and others may be following along here. To help people in the future, I'll later copy these posts into a new thread in the software section so they can find them.
 

2blzd

Senior member
May 16, 2016
318
41
91
You should really look into http://ppbm7.com/



A bunch of guys from the Adobe hardware forum came together and created this benchmark for Premiere. It's pretty much the de facto benchmark standard for Premiere, where every hardware config is measured and tested. There is even a list of the highest-scoring rigs... the top 10 all have SSDs and 6-8 cores with a GTX 970 or above. The #1 rig is using the newer M.2 SSDs...

You download their benchmark, then upload your results. I've been following it for years, and they continually tweak and update it with each new Premiere release. Right now they're running tests on the 6950X and Pascal GPUs. I can't wait for the results.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,143
136
Dual Xeon E5-2670 is back, now against Broadwell-E (Core i7-6950X):

(benchmark charts: rYUud8I.png, sTsnRTH.png)


www.hardwareunboxed.com/6950x-vs-dual-xeons-premiere-pro-cc-encoding-battle
 

StrangerGuy

Diamond Member
May 9, 2004
8,443
124
106
In general for gaming, faster cores are better than more cores. But I would expect the E5-2670 to be fast enough at least most of the time. I can't speak to anything specific though.

Also, I think this will occur again. I'm already thinking about which E5 v4 or v5 CPU will be worth upgrading to. CPUs are just not getting much faster year over year like they used to. But they are getting more energy efficient, so it will be worthwhile for server companies to upgrade. I suspect, however, that Intel is trying to figure out how to prevent this from happening every generation.

Intel can get datacenters to upgrade by offering more cores per socket while charging a premium up to (or slightly below) the overall TCO savings. A single-socket 16-core with a slightly higher CPU TDP is still going to be a hell of a lot more energy efficient than a dual-socket 8-core, much less two separate single-socket 8-core servers, and that's before considering supporting equipment like switches, UPSes, etc.
 

daniel1926

Junior Member
Feb 18, 2015
23
1
11
Dual Xeon E5-2670 is back, now against Broadwell-E (Core i7-6950X):

(benchmark charts: rYUud8I.png, sTsnRTH.png)


www.hardwareunboxed.com/6950x-vs-dual-xeons-premiere-pro-cc-encoding-battle

I am surprised that they only put 32 GB into this machine. Most of the dual-2670 builds that I have seen in my area are stocked with either 128 GB or 192 GB of memory.

I know a lot of guys here are gamers, so they need high single-thread performance, but for my use case, the more threads and the more memory the better. Indeed, I would happily trade slower memory for greater capacity. Likewise, I would trade lower ST performance for significantly more cores.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
I am surprised that they only put 32 GB into this machine. Most of the dual-2670 builds that I have seen in my area are stocked with either 128 GB or 192 GB of memory.

I know a lot of guys here are gamers, so they need high single-thread performance, but for my use case, the more threads and the more memory the better. Indeed, I would happily trade slower memory for greater capacity. Likewise, I would trade lower ST performance for significantly more cores.

I have 32 GiB in mine. It's unlikely I'll need more for a while, if ever.