Max single thread performance = 6700K OC?

etherealfocus

Senior member
Jun 2, 2009
488
13
81
Got an issue with one of my clients needing maximum single thread performance for some data crunching. It's worth real money, and there are a variety of reasons (some boiling down to lack of time from the programmers) we cannot get multithreaded code anytime in the foreseeable future. They've already cleaned things up as much as they're going to get cleaned up, and it's still a real choke point.

Right now the job is running on an i5 4590. Based on the benchmarks I've seen it looks like we wouldn't get more than maybe 10-20% max going to a 4790K or 6700K. We've discussed overclocking and nobody wants that of course but if it can yield a major throughput advantage we could set it up for fault tolerance with a backup machine.

We don't care about the Z170 platform advantages, M.2, etc. All we need is one thread going absolutely as fast as possible short of liquid nitrogen and other crazy solutions.

Suggestions?
 

mnewsham

Lifer
Oct 2, 2010
14,539
428
136
Why go with an i7? Just get a 4690K or 6600K and OC it to the highest single-core clock you can. Should be able to get 4.5GHz+ pretty easily.
 

zlejedi

Senior member
Mar 23, 2009
303
0
0
Overclocking will add maybe 10% on top of 6700 stock performance.

I don't know of any existing CPU with faster single-thread performance than the 6700K.
 

mnewsham

Lifer
Oct 2, 2010
14,539
428
136
Figured the i7s might be binned better, but if it doesn't improve our OC then sure, i5 is fine.

If you're literally looking for every single ounce you can squeeze from it, then sure. But if you're just shooting for a reasonable OC, you can use an i5.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
The Core i5 has less cache. The i7's extra cache might be worth only 1-2% more performance, but if you want every ounce of performance then go for the i7.

P.S. In some corner cases the 5775C might be faster due to its L4 cache.
 

SAAA

Senior member
May 14, 2014
541
126
116
If your workload benefits in any way or shape from more cache, try Broadwell with eDRAM; there are both Xeons and an overclockable desktop chip.

In some applications that cache can make enough of a difference to outrun a 500MHz clock advantage on Skylake.
 

Ansau

Member
Oct 15, 2015
40
20
81
Overclocking is the only solution that would bring tangible benefits.

Getting a 6600K with DDR4-3000 or faster and overclocking it to 4.8-4.9GHz would bring about +35% single-core performance.
Moving to a 6700K or 4790K at stock only means 10-15% more single-core performance, which is not enough for the money spent.
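
A rough way to sanity-check these percentages is to scale single-thread performance by clock speed times an IPC factor. This is only a back-of-envelope sketch, not a benchmark: 3.7GHz is taken as the i5 4590's max turbo and 4.2GHz as the 6700K's, and the 5% Skylake-over-Haswell IPC gain is purely an assumed placeholder. Measure the real workload before buying anything.

```python
def relative_speed(new_ghz, old_ghz, ipc_gain=1.0):
    """Estimated single-thread speedup, assuming performance scales with clock * IPC."""
    return (new_ghz / old_ghz) * ipc_gain


BASELINE_GHZ = 3.7        # i5 4590 single-core turbo, assumed to be sustained
ASSUMED_IPC_GAIN = 1.05   # hypothetical Skylake-vs-Haswell gain, not measured

# Candidate clocks: 6700K at stock turbo, 6600K at the overclock suggested above.
candidates = {"6700K stock (4.2GHz)": 4.2, "6600K OC (4.8GHz)": 4.8}

for name, ghz in candidates.items():
    gain = relative_speed(ghz, BASELINE_GHZ, ASSUMED_IPC_GAIN)
    print(f"{name}: {(gain - 1) * 100:+.0f}% vs i5 4590")
```

Real code rarely scales perfectly with clock speed, so treat the result as an upper bound rather than a prediction.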

The 5775C/5675C would only be good if the workload relies on moving a lot of data around; it wouldn't matter if the workload is mostly raw calculation.
 

etherealfocus

Senior member
Jun 2, 2009
488
13
81
My guess is this is more a calculation limit than anything, but I'm not a programmer. The program is just crunching a bunch of data feeds as part of a rather long pipeline that essentially prints our money. Right now it takes over two hours to run, and that lag time hurts: basically, the more times we can cycle it per day, the more money we make.
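
To put numbers on the cycles-per-day point, here is a small sketch. The 2.0-hour run time is a stand-in for "over two hours", the speedup values are just the rough figures floated in this thread, and it assumes runs are started back to back around the clock.

```python
def runs_per_day(run_hours, speedup=1.0, hours_available=24.0):
    """Whole back-to-back runs that fit in a day once each run is `speedup` times faster."""
    return int(hours_available // (run_hours / speedup))


RUN_HOURS = 2.0  # stand-in value; the real job takes "over two hours"

for label, speedup in [("current box", 1.0), ("stock upgrade", 1.15), ("aggressive OC", 1.35)]:
    print(f"{label}: {runs_per_day(RUN_HOURS, speedup)} runs/day")
```

Because only whole runs count, a small speedup may add just one extra cycle per day, while a larger one can add several.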

There's not enough money/political will in this to justify LN2, but anything up to that point is probably fair game.

Given that neither I nor the programmer I asked knows whether the L4 cache in the 5775C would make any difference, I'd be inclined to play it safe... but will try and track down a better answer. If the cache does matter, might a 6770HQ a la the new NUC be a better choice? I understand Skylake made some architectural improvements to facilitate the eDRAM's use as an L4 cache.

If we do (likely) decide to chase pure performance, I'd be inclined to lean toward Skylake on the chance that the IPC improvements balance out the 4790K clocking a hair higher on average. Is that a reasonable choice or is the 4790K generally faster with aggressive OC in the real world? This being high end air, maybe water cooling... again, not LN2. Which is a bummer, cause that would've been tons of fun to build. :)

And finally, any tips on specific hardware to use?

Need 16GB RAM and support for 2x 240GB BX200s in RAID1. RAM speed and SSD speed don't matter; CPU is the only bottleneck.
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
Skylake is actually a pretty large leap over Haswell, more than Haswell was over Ivy and more than Ivy was over Sandy. It honestly might not be too far from the Sandy to Haswell jump. If you want the highest possible single thread performance, an overclockable Skylake chip is definitely the way to go. I kinda wish I had just waited until I could have gotten a 6700K for a good price rather than simply going for a 4790K.
 

escrow4

Diamond Member
Feb 4, 2013
3,339
122
106
Fire your current programmers, hire some foreign programmers to multithread it for cheap. Time is money. No point sinking money into more hardware when the software needs to be fixed.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,226
9,990
126
And finally, any tips on specific hardware to use?

Need 16GB RAM and support for 2x 240GB BX200s in RAID1. RAM speed and SSD speed don't matter; CPU is the only bottleneck.

You do know that the BX200 is possibly the poorest-performing modern SSD on the market (and TLC, for that matter), right?

If using SATA, you should have gone with Samsung 850 Pro MLC 512GB drives. If moving to a Skylake / 1151 platform, you should invest in a motherboard with dual M.2 PCI-E 3.0 x4 slots ("Ultra M.2 32Gbit/sec") and a pair of Samsung 950 Pro M.2 NVMe PCI-E 3.0 x4 SSDs, in RAID.
 

etherealfocus

Senior member
Jun 2, 2009
488
13
81
Illogical - 6700k it is!

escrow - We've seen the damage cheapo programmers can do. The current guy is missing a few tricks but he's a damn ninja in SQL and solved a bunch of problems that other supposedly excellent teams failed at. Including one of the top recommended BigCommerce firms...

Larry - SSD utilization is usually under 10%. Moving to 2x 950 Pros in RAID0 would probably save us 10 seconds a day. Per the snip you quoted, RAM speed and SSD speed don't matter; CPU is the only bottleneck. We use BX200s because we already had some on hand, they're cheap as dirt, and the failure rate is practically zero.
 

etherealfocus

Senior member
Jun 2, 2009
488
13
81
Also, some things just don't lend themselves to parallelism. I have no idea if this is one of those... regardless, some new hardware is way the hell cheaper than training some overseas team on our systems, praying they can actually do everything our current guy can AND make it better without screwing things up in some way that doesn't reveal itself until the damage is done, etc.

As the guy who'd have to manage that project, no thanks. :p
 

Dasa2

Senior member
Nov 22, 2014
245
29
91
RAM speed don't matter; CPU is the only bottleneck..
Are you sure? Have you tested RAM speed? The one thing RAM speed really makes a difference to is effective CPU speed.
Some software has all it needs in the CPU cache, so it sees no benefit from faster RAM.
Other software that frequently needs data from RAM can see fairly large gains from faster RAM; up to a ~20% increase in CPU efficiency is possible in some cases.

However, if the data is crucial and a single error can screw things up big time, then overclocking may not be such a good idea. Maybe you should even be looking at ECC RAM.
 

etherealfocus

Senior member
Jun 2, 2009
488
13
81
There is some error checking built in, but as the program's grown we are considering ECC. My current thought is that it might be best to have an OC speed demon doing the primary processing and a cheap Xeon box for backup and error checking - still discussing the details of how we might implement something like this... it might be smarter to just forget the OC and move to a Xeon E3-1270 v5 though. 6700K minus 200MHz ain't too bad for the reduced complexity of ECC and speed in the same box.
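
One simple form the backup-and-check idea could take is to run the same job on both machines and compare a hash of the outputs. This is only a sketch under assumptions: it presumes the job writes its results to a file and that the output is deterministic (same input, byte-identical output). The function names are made up for illustration and are not part of the existing pipeline.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large outputs need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(primary_path, backup_path):
    """True when the overclocked box and the ECC box produced identical output files."""
    return file_digest(primary_path) == file_digest(backup_path)
```

A mismatch would not say which machine was wrong, only that the run should be repeated before anything downstream acts on it.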

Speaking of which, given that our usual load is 100% on core0, 10% on core1, and 0-5% on core3 and core4, would the retail cooling on a 1270 be sufficient to keep it close to max turbo or should I be looking at a 212 Evo or something aftermarket?

And yeah our resource monitoring is pretty good. All the RAM is doing is throwing math at the CPU and waiting for it to hurry up already.

It's kinda surprising to me that there's no CPU really optimized for major single thread performance. As I noted above, there are some workloads that just can't be parallelized anymore or where it'd be extremely expensive relative to even a high end Xeon box.
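
As a toy illustration of why some work resists being split across cores (not a claim about what this particular program does), compare a running calculation, where every step needs the previous result, with independent per-item work. Both functions are hypothetical examples.

```python
def running_average(values, alpha=0.5):
    """Exponential moving average: each output depends on the previous output."""
    out = []
    state = None
    for value in values:
        # The new state needs the old state, so the steps must run in order.
        state = value if state is None else alpha * value + (1 - alpha) * state
        out.append(state)
    return out


def scale_all(values, factor):
    """Independent per-item work: any item could be handled by any core."""
    return [value * factor for value in values]


print(running_average([1, 2, 3]))
print(scale_all([1, 2, 3], 2))
```

Written the straightforward way, the first loop cannot simply be handed to four cores, while the second could be split into chunks with little effort.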

If we could buy something with 2x the single thread performance of a 6700K for $3000 it'd be a no-brainer.
 

SAAA

Senior member
May 14, 2014
541
126
116
Whoa 3000$? What about an FPGA then? It might be ridiculously faster if you code the program directly on chip, cheaper and easier than an ASIC too.
Sadly aside from the name I've got no other suggestions on these, never used them myself, but everyone who needs a script to run faster uses them.
 

Keljian

Member
Jun 16, 2004
85
16
71
Don't be fooled by your load being 100% - the cpu could be waiting on memory, and wait states could be holding things up.
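
A crude way to see the effect: time the same arithmetic with cache-friendly and cache-hostile access patterns. The sketch below is illustrative only. In CPython the interpreter overhead hides much of the gap, the array size is an arbitrary choice, and it says nothing about the actual Perl/Java job; both loops would show as 100% CPU in a task manager either way.

```python
import random
import time
from array import array


def time_sum(data, order):
    """Seconds taken to sum data[i] for every i in the given index order."""
    start = time.perf_counter()
    total = 0.0
    for i in order:
        total += data[i]
    return time.perf_counter() - start


def compare_access_patterns(n=2_000_000, seed=0):
    """Return (sequential_seconds, shuffled_seconds) for summing the same array."""
    data = array("d", (float(i) for i in range(n)))
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # same indices, cache-hostile order
    return time_sum(data, sequential), time_sum(data, shuffled)


if __name__ == "__main__":
    seq_s, rnd_s = compare_access_patterns()
    print(f"sequential: {seq_s:.3f}s, shuffled: {rnd_s:.3f}s")
```

If the shuffled pass is noticeably slower on your machine, that difference is time spent waiting on memory rather than calculating.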



If speed is such an issue, and you are coding this yourself (or having it built) you should budget $500 or so for Intel's compiler, and enable things like AVX2, FMA3, automatic multithreading and automatic vectorisation (note that different compiles and compilers should be tested to find the fastest results!). This can have a dramatic effect on speed when running on Intel chips.



https://software.intel.com/en-us/c-compilers



http://programmers.stackexchange.co...mpilers-really-better-than-the-microsoft-ones "If the code is really computationally expensive, yes, definitely. I have seen an improvement of over 20x times with the former Intel C++ Compiler (now Intel Studio if I recall correctly) vs the standard Microsoft Visual C++ Compiler."

P.s. Forget overclocking, and go Xeon with as fast ECC ram as you can afford.
 

Keljian

Member
Jun 16, 2004
85
16
71
Whoa 3000$? What about an FPGA then? It might be ridiculously faster if you code the program directly on chip, cheaper and easier than an ASIC too.
Sadly aside from the name I've got no other suggestions on these, never used them myself, but everyone who needs a script to run faster uses them.


Wow, OK, so FPGA work requires a very specialized set of skills, and $3k won't cut it for the skills, hardware and coding. Plus, FPGAs are good for big parallel workloads, not so much for serial workloads.

If this thing can be multithreaded (big if) then it would be worth considering Xeon Phi, as that would fit in said $3k budget and potentially not need the skills required for an FPGA.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
If this thing can be multithreaded (big if) then it would be worth considering Xeon Phi, as that would fit in said $3k budget and potentially not need the skills required for an FPGA.
Just because something is parallel doesn't mean it is massively parallel. Amdahl's law: there's bound to be at least some % of serial code. If it's less than 3%, the Xeon Phi might make sense; otherwise a regular Xeon would be better.
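
Amdahl's law can be worked through in a few lines. The sketch below uses the standard formula, speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n the core count. The 60-core figure is an assumed round number for a Xeon Phi-class card, and the model ignores that each of those cores is much slower than a regular Xeon core.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


# Serial fractions to compare, with 3% being the threshold mentioned above.
for serial in (0.03, 0.10, 0.50):
    quad = amdahl_speedup(serial, 4)    # regular quad-core chip
    many = amdahl_speedup(serial, 60)   # assumed many-core card
    print(f"serial {serial:.0%}: 4 cores -> {quad:.1f}x, 60 cores -> {many:.1f}x")
```

As the serial fraction grows, the many-core advantage collapses quickly, and once the slower per-core speed is factored in, a handful of fast cores wins.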
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,219
3,798
75
In case you didn't notice, I run PrimeGrid, which is heavily dependent on both CPU speed and RAM bandwidth. In a recent race, my i3 6100 was somehow the second-fastest machine per-core, behind a heavily overclocked i5. There could be a few reasons for this. I run Linux, though that's not all that unusual. My RAM speed is above stock - though not that much, 2666, but there's 2x16GB sticks of it. And the reason for that is that I have a mini-ITX motherboard. So maybe the size of the mini-ITX board improves RAM throughput? Or maybe it's the 16GB per stick?

The point is individual motherboard characteristics may affect your speed if you're RAM bandwidth-bound.
 

.vodka

Golden Member
Dec 5, 2014
1,203
1,537
136
Your i3 6100 has above-average RAM performance because your sticks are probably dual rank (16GB requires lots of chips, at least fully populating both sides of the module). In benchmarks, such modules perform like single rank modules an extra speed bin or two higher. The effect is similar to tightening command rate from 2T to 1T.


But yeah, if single threaded speed is most important and the workload can't be multithreaded, a 6700K (you want the full 8MB L3) with a healthy OC to >4.5GHz, the fastest dual rank DDR4 available (overclocked to >3500-4000MHz), and a decent heatsink or watercooling should do the trick, short of going LN2. As others have mentioned, the i7 5775C is another good candidate because its L4 can help with certain workloads, but it's a wildcard that may or may not be useful.


Of course such a rig should be tested for utmost stability at those settings. If OC is out of consideration, then the next step down is a stock 6700K, or the highest clocked Skylake Xeon equivalent plus ECC RAM as a bonus, as you've mentioned, OP. Problem is, Skylake scales wonderfully with RAM speed and Intel officially supports only up to DDR4-2133; that's well over 1GHz less memory speed than you can probably run reliably with a 6700K, an OC and a truckload of stress testing.
 

Keljian

Member
Jun 16, 2004
85
16
71
So much FUD. Intel compiler + flags, Skylake Xeon, fast ECC memory.

No overclocking (yeah it may work, but you can't guarantee it won't flip a bit or something down the track).
 

etherealfocus

Senior member
Jun 2, 2009
488
13
81
Awesome feedback, thanks guys!

BTW just to be clear we have two programmers. They're both extremely talented and have complementary skills. The lead is very deep into Linux, writes his own firewalls, does low-level text processing with heavy use of regex, etc. He's a little old school and doesn't do object-oriented or some other more modern stuff. The other guy used to run a large ecommerce company he mostly built and coded himself - most of his skill is in enterprise Java.

FPGA: I'll discuss with the programmers; the lead does a lot of low level stuff and might be able to do it... kinda doubt it though. And if we can't distribute the load across two cores, I really doubt we could do something massively parallel like this. Also, both of them are busy building out the infrastructure and saving their time is half the reason we're pushing hardware improvements.

Compiler - This is mostly a Perl and Java + MySQL project; I don't believe a compiler is required?