Max single thread performance = 6700K OC?

ZGR

Platinum Member
Oct 26, 2012
An overclocked 6700K is by far the safe bet. A 5775C can beat it in single-threaded performance, but maybe only in a stock vs. stock comparison. The 5775C hits a wall at about 4.2 GHz, whereas the 6700K can do 4.6+ GHz. At that point, unfortunately, efficiency goes out the window.

eDRAM has a lot of potential in your field. It may be worth looking into, but who knows how hard it will be to take advantage of effectively.

I think the best solution for you is the upcoming Broadwell-E. It supports much faster RAM than Haswell-E and should also improve single-threaded performance. The 6-core chip would pay for itself in no time.
 

etherealfocus

Senior member
Jun 2, 2009
ZGR - six cores won't help; like I said, we can barely use 2 cores. The only way BDW-E would help is if its core clocks are significantly higher than the 6700K/1270's, which seems unlikely since it'll have 2-4 additional cores to deal with.
 

ZGR

Platinum Member
Oct 26, 2012
etherealfocus said:
ZGR - six cores won't help; like I said, we can barely use 2 cores. The only way BDW-E would help is if its core clocks are significantly higher than the 6700K/1270's, which seems unlikely since it'll have 2-4 additional cores to deal with.

Oops, my bad. Two cores need all the clock speed they can get. Definitely OC the 6700K.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
Quote:
Compiler - This is mostly a Perl and Java + MySQL project; I don't believe a compiler is required?

Perl + Java?! :eek: Well, I guess Java isn't too bad for number crunching. Perl can be fairly slow, though. Most heavy number crunching is done in either C/C++ or Fortran. You might be able to find a more optimized JVM or Java compiler for your work.

Is that MySQL on the same server, or a different server? Have you verified the time spent waiting on the DB is not significant?
 

Keljian

Member
Jun 16, 2004
Yeah, I would look at C++ rather than Java or Perl. That said, Java can be compiled and optimised for speed, but it won't touch C++ built with optimisation flags on the Intel compiler.
 

etherealfocus

Senior member
Jun 2, 2009
I'm pretty far over my head on this stuff - I had a couple of years of comp sci and speak fluent HTML/CSS/JS and a bit of PHP and Python, but I don't know anything about real code optimization.

I know Perl is generally considered a little sluggish, but, like I said, he makes heavy use of regex and we've seen him make huge improvements to execution times with it. No idea how it compares to C++/Intel, of course.

He doesn't know C++ but does know vanilla C... would that have a lot of the same advantages?

Part of the workload is converting a bunch of different datafeeds into our native format, part of it is number crunching for our price/availability/channel logic, and part is a black box I don't know anything about because it's outside my department. That said, the SQL server is on another box. One of the functions of this program is to feed data into SQL, and the SQL box is always at low utilization, so I don't think that's a bottleneck.

Any idea how fast Perl regex is compared to C++?

I know we're also using Talend Big Data (free version) for some of this... don't think it's in any of the performance-critical areas but again I'm over my head on some of this.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
Yeah, that's confusing. I think multiple people have written different programs to be tested. So I guess it's possible, but not easy, to write some regexes faster in PHP.
 

Keljian

Member
Jun 16, 2004
If I were in charge of the project I would:
1. Profile the code (time the loops/sections) to find the bottlenecks
2. Refactor the bottlenecking code into C or C++, using Intel's compiler with the aforementioned options
3. Profile again to find the next bottlenecks, then refactor the next batch of bottleneck code into C/C++
4. Throw more hardware at it

I would also analyse the code to see where the big slowdowns are and why they are happening (e.g., if it is crunching a data set of less than 128 MB, then a chip with eDRAM would be preferable to one without; the Xeon E3-12xx v4 series might work better for you, or the Skylake i7-6770HQ).

Hardware alone is not the answer here though. A car is only as good as its driver.

You may get enormous (20-40x) speedups doing this, whereas you will likely only get a factor of 1.3-1.5x by throwing hardware at it.

So the combination of the two could potentially result in up to a 60x speedup (40x from the code times 1.5x from the hardware), or more if the compiler's auto-multithreading works to your advantage.
 

Keljian

Member
Jun 16, 2004
Btw, C++ is basically a hybrid of Java and C, so if your coders are fluent in both, there is almost no learning required for C++.
 

etherealfocus

Senior member
Jun 2, 2009
Normally I'd agree entirely. Confounding factors:

1. Programmer time is very expensive, both in terms of actual hourly expense and in terms of taking them off other projects. Right now the additional value they generate by working on other stuff is bigger than the value to be gained by optimizing this.

2. The lead knows procedural programming only - no OO. The other guy knows OO through Java but we're very hesitant to have code they can't both support efficiently.

3. We've discussed bringing on additional programmers and we are actually hiring right now. If anyone's a SQL/PHP/Java ninja in the North Dallas area and wouldn't mind commuting to Denton/Aubrey/Argyle, feel free to PM me a resume. This would be a FT job and you would mostly be required to be physically present, although remote work may be on the table once in a while. This is NOT an entry level position - if you can't learn a complex system, keep pace with industry veterans, and crank out quality code promptly and reliably, it's not gonna work. If you also happen to be good at writing very fast C++, obviously that's awesome.

Having a third person will definitely open up time for refactoring some of this mess. If we find someone soon, I may hold off on the hardware for now. If not, or if they tell me it'll take a while to train up whoever we hire, I'll probably go ahead and buy something.

The eDRAM cache and/or OC is tempting, but I'm leaning toward a Xeon for the ECC - I'd rather lose money by running fewer cycles than have a bit flip bork our system. Something tells me developing a way to validate the results of an OC machine using a slower Xeon would take at least as much developer time as refactoring the code properly.

There aren't any OCable chips with eDRAM, are there? Just being thorough...
 

etherealfocus

Senior member
Jun 2, 2009
I should note, again, the code is already fairly well optimized. The initial versions were all kinds of messy and slow... not anymore. We've got some painful bottlenecks, but most of the system is pretty solid. Making real improvements from here is gonna take some doing - the low hanging fruit is gone.
 

Keljian

Member
Jun 16, 2004
Ok some points:
1. The low-hanging fruit is C code for the bottlenecks/algorithms. You can't convince me otherwise. Even without Intel's compiler, it's plainly going to be faster. You can refactor portions of the code without refactoring the whole enchilada.

2. The Xeon E3 v4 is a Broadwell Xeon with eDRAM and ECC support; top speeds are circa 3.8 GHz. http://ark.intel.com/products/88046/Intel-Xeon-Processor-E3-1285-v4-6M-Cache-3_50-GHz

I confess I am biased towards I/O, RAM, eDRAM and cache. My main machine is based around an i7-5775C (@4.0 GHz) with 32 GB of DDR3-2400 CL10 RAM and a 950 Pro (on PCIe 3.0 x4).

3. If you are printing money with this as stated, a 40x speedup is significant: 40 days' work in 1, or a year's work in just over 9 days. This, IMO, is worth some programming time. Refactoring to C shouldn't take too much effort, considering the ideas, constructs, loops etc. are already created and just have to be translated - the hard work is done.

4. This is not about optimising code; this is about using the right language and optimisations for the application. Based on what you're saying, this hasn't been the case.
 

.vodka

Golden Member
Dec 5, 2014
etherealfocus said:
There aren't any OCable chips with eDRAM, are there? Just being thorough...

Yes, the i5-5675C and i7-5775C. These are Broadwell-based chips and come with a 128 MB L4 eDRAM cache. They hit a hard wall at about 4.2 GHz and don't overclock well... yet if the workload fits in the L4 cache, they punch way above their weight. They fit socket 1150 boards and use DDR3.


Sadly there are no overclockable Skylake chips with L4 cache.
 

selni

Senior member
Oct 24, 2013
Quote:
Fire your current programmers, hire some foreign programmers to multithread it for cheap. Time is money. No point sinking money into more hardware when the software needs to be fixed.

There are still a lot of widely used algorithms that have no practical (i.e. faster than serial) parallel implementation. It's not just lazy programming, it's mathematics.

Keljian said:
Btw, C++ is basically a hybrid of Java and C, so if your coders are fluent in both, there is almost no learning required for C++.

C and C++ done well are actually a pretty long way apart, and have been for quite a while. I agree with most of the rest of what you've said here, though.
 

hhhd1

Senior member
Apr 8, 2012
6700K: disable Hyper-Threading and 1 or 2 of the cores, and then overclock.
The lower heat will give you more room to OC.
 

Magic Carpet

Diamond Member
Oct 2, 2011
hhhd1 said:
6700K: disable Hyper-Threading and 1 or 2 of the cores, and then overclock.
I tried this trick with my 4770K. I can't say I saw big gains with this tactic. With HT off, yeah, better temps. Stick with 4C, I say.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
I'm satisfied the Java is sufficiently optimized. It's the Perl in the critical path I'd kill and replace with C and some regex library. Unless the Perl isn't in the critical path - have you checked what percentage of the time is used by Perl code?

Or you could introduce your Perl programmer to ithreads. They're one of the simplest threading models I've ever seen. C on one core is still likely to beat Perl on 4, however.
 

ehume

Golden Member
Nov 6, 2009
If you run into thermal throttling, don't use the Hyper 212. That is a three-heatpipe heatsink. If you need aftermarket cooling, go with a heatsink with 6+ heatpipes. Noctua makes them. So do others. A comparative review is here, in case you need some ideas.
 

etherealfocus

Senior member
Jun 2, 2009
Ok, conclusions/questions from above:

1. I'll meet sometime this coming week and discuss refactoring in C.
2. Heatsink suggestion from ehume noted. I believe I've read on AT that high-end air is almost as good as mid-to-high-end water; is that correct? Or would water still give me appreciably better speed (say, at least 200 MHz)?
3. Given that we get ~0 benefit from >2 cores, if I do go with an OC solution I'll certainly kill HT and cut it back to dual core. No reason not to.
4. Programmers have pretty good data on critical-path code, but it's not in front of me... will check at the meeting next week. Most of it will likely be Perl/regex or Java/regex. As I said, you guys have pretty well convinced me that C/C++ is worth pushing for. If I get a strong rebuttal, I'll post here for feedback.
5. What's the benefit of the 5775C vs. Skylake + Iris Pro 580 a la Skull Canyon? Just the extra clock speed/TDP? I assume that's generally more significant than Skylake's L4 architecture improvements? I was under the impression that neither could OC, but vodka above mentioned getting 4.2 GHz on the 5775C...?
6. I should probably mention that the value curve on speeding this thing up is not linear. Cutting execution time down to ~20 mins would probably extract most of the value that can be extracted... anything better than that is icing and future proofing. Our dataset isn't growing fast anymore, but it is growing.