Max single thread performance = 6700K OC?

ehume · May 29, 2016

If you are not overclocking, high-end air is fine. If you overclock your CPU and can afford it, high-end air is still fine, but the Kraken X61 all-in-one should be better. It has two 140mm fans on a 140x280mm rad. Better still would be the EK Predator 360, with three 120mm fans on a 120x360mm radiator unit.

The AIO units contain water, though, and your equipment is mission-critical. So an AIO carries a minuscule risk that high-end air doesn't. As usual, There Ain't No Free Lunch (TM).

Keljian · May 29, 2016

High-end air is better than water IMO, as there is less chance of failure; that's why I have a Noctua D15, which is huge. Typically the fans barely go above 10% for me unless I am firing off a heavy AVX load.

As for overclocking the 5775C, as others have said, once you hit 4-4.2GHz that's about it. I personally haven't really experimented above 4GHz, as it requires more voltage with my chip and I don't want to do that; plus, I like round numbers. I have no doubt the D15 would handle 4.2GHz if I wanted it; there is plenty of thermal headroom.

There is no reason to turn off cores/HT to reach 4.2GHz on a 5775C. There is a lot of thermal headroom on the chip.

Motherboard overclocking support for the 5775C is patchy too, so if you are choosing this path, do some research.

On the other hand, Skull Canyon tops out at 3.2GHz, so I don't think it would compete with the 5775C, given a 500MHz delta at stock.

Skylake has some architectural improvements and will probably hit 4.5-4.6GHz.

Note that you will not get ECC with overclocking. It's one or the other.

Personally, I am leaning towards the Xeon I mentioned + ECC.

Ken g6 · May 29, 2016

etherealfocus said:
Right now it takes over two hours to run, and that lag time hurts - basically the more times we can cycle it per day, the more money we make.

Can I ask one more stupid question? You say cycling it more is better, but you also mention lag time. Which is the problem? Would it make sense to run two instances of the code in question, staggered in time so they finish about once every hour?

Keljian · May 29, 2016

Ken g6 said:
Can I ask one more stupid question? You say cycling it more is better, but you also mention lag time. Which is the problem? Would it make sense to run two instances of the code in question, staggered in time so they finish about once every hour?

Sounds to me like you need the result of one run to feed into the next one.

ehume · May 29, 2016

Really, though, he asks a simple question. Could you see OP running 4 instances on an i7?

Keljian · May 29, 2016

ehume said:
Really, though, he asks a simple question. Could you see OP running 4 instances on an i7?

Hey I am not criticising - just trying to elucidate. As the old saying goes, 9 women cannot produce a baby in one month.

Keljian · May 29, 2016

I should note that good Skylake overclockers will do 5.2GHz on air; 1000MHz worth of speed trumps eDRAM for single-threaded performance.

ehume · May 29, 2016

Keljian said:
Hey I am not criticising - just trying to elucidate. As the old saying goes, 9 women cannot produce a baby in one month.

But nine women could produce nine babies in 9 months, which averages out to a baby a month.

ZGR · May 29, 2016

Honestly, this all sounds like overkill to me now. I would go with a cheap Z87 board and a Pentium OC, or an i3-6100 OC on Z170.

If OP can run several instances of the software, I'd go for Broadwell-E.

An overclock without ECC would really speed things up over stock with ECC. A tough call. There are many solutions to this, and none of them are bad.

Keljian · May 30, 2016

A Pentium is a really bad idea, as it doesn't have AVX2.

AVX2 instructions could be used for regex work, and getting a processor without them would be short-sighted (up to 70% speedup with AVX2; see the Linpack performance): http://www.hardware-infos.com/news/4437/haswell-unter-avx2-und-fma3-sehr-schnell.html

And up to 80% faster for regex specifically: http://www.icgrep.com/ and http://www.tuicode.com/article/562d48bb614007710ff9c6a7

Keljian · May 30, 2016

OP: This may be helpful for your coders: https://01.org/hyperscan/blogs/geofflangdale/2015/welcome-hyperscan

Also, if you want a "pre-tested" chip, you could try: https://siliconlottery.com/collections/all

etherealfocus · Jun 1, 2016

Thanks again guys, another round of conclusions/questions:

1. A 6700K @ 4.6GHz is probably at least equal to a 5775C, and its benefit is more reliable: we know for a fact that the app scales more or less linearly with clock, whereas we are only speculating about the value of a big L4.

2. I talked to the team today and they were leaning pretty heavily toward ECC over overclocking. Apparently they could build an error correction system, but it'd be a lot of labor better spent elsewhere. Therefore my top target is now the Xeon E3-1270 v5... EXCEPT that Intel just released a bunch of Xeon E3s with L4 cache. They lose a couple hundred MHz if I remember right, but no biggie... and L4 plus ECC would be great. Any reason not to go this route? It would also save me the hassle of being on the hook for any overclocking problems, since I'm in charge of the hardware.

Of course, I'm still looking at a high-end cooler to maximize the time spent at max turbo, unless it's truly a waste of money...? The client's server room is probably 75°F or so.

3. I cannot run multiple iterations in parallel: as someone guessed above, each iteration depends on the previous iteration's output. We've already trimmed the dependency as much as possible.

4. Our lead programmer spent a bunch of time last night reoptimizing the code. Hopefully I'm explaining this right... he switched from whatever we were doing to something event-driven. I asked him to send me an explanation of exactly what he did and will post it here whenever I get it. Regardless, he was able to knock 40 mins off our execution time. He just does that stuff sometimes... he wouldn't have had time during business hours (other parts of the app are still missing major functionality, which causes clogs in the supply chain and hurts the CSRs' ability to help customers), so he just knocked it out over a few beers at home. We'd have leaned toward ECC anyway, but this made the choice easier.

5. He agrees that C and C++ are faster than Perl, but thinks refactoring to C would only yield a 10-20% improvement. He also said he thought COBOL would be a better choice than C. Apparently some of his other clients use it for large data processing jobs and get extremely good performance even on legacy hardware. My understanding was that COBOL was essentially proto-BASIC and FORTRAN was the language of maximum math performance and supercomputing. I assume I'm just wrong here...?

6. I just emailed the programmers a link to Hyperscan; will post back with their thoughts. If it takes significant time, the response is probably "get to it after the fires are out"... but that shouldn't be more than a few months, with any luck.

etherealfocus · Jun 1, 2016

Ken g6 said:
Can I ask one more stupid question? You say cycling it more is better, but you also mention lag time. Which is the problem? Would it make sense to run two instances of the code in question, staggered in time so they finish about once every hour?

Sorry, sloppy language. The lag was just the lag between cycles... there's nothing else running on it, so there's no lag in the slow-computer sense of the word. The only goal here is to reduce the 200-min runtime (now ~160 min) as much as possible, although as I said above there are diminishing returns; half an hour would probably extract 80% of the value we can extract.

As per above, we cannot run multiple instances, since each depends on the previous iteration. We did discuss an error-checking algorithm that runs three instances, each on its own core, and uses a voting system to accept only outputs validated by another process. We didn't reach a solid verdict on it, but they didn't seem scared of building something like that. They said it wouldn't be as good as ECC, but not a whole lot worse.

The downside would be loading 3 cores rather than 1, which means more heat and probably a little lost per-core performance.
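The three-instance voting scheme described above is essentially a 2-of-3 majority vote (the idea behind triple modular redundancy). A minimal Python sketch, assuming each run produces its iteration output as a byte string; the `vote` helper and the choice to compare SHA-256 digests are illustrative assumptions, not the team's actual design:

```python
import hashlib
from collections import Counter


def vote(outputs: list[bytes]) -> bytes:
    """Accept an output only if at least two of the three runs agree on it.

    Hypothetical helper: outputs are reduced to SHA-256 digests so they are
    hashable and small enough to exchange between processes; comparing the
    raw bytes directly would work too.
    """
    if len(outputs) != 3:
        raise ValueError("expected exactly three outputs")
    digests = [hashlib.sha256(out).hexdigest() for out in outputs]
    winner, count = Counter(digests).most_common(1)[0]
    if count < 2:
        # All three runs disagree: no quorum, so the iteration must be redone.
        raise RuntimeError("no two runs agree; rerun this iteration")
    return outputs[digests.index(winner)]


# One faulty run (e.g. a flipped bit) is outvoted by the two that agree.
print(vote([b"result-A", b"result-A", b"result-B"]))  # b'result-A'
```

A bug in the code itself would produce the same wrong answer in all three runs, so, like ECC, voting only catches faults that hit a single run.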

So am I better off with a Xeon E3-1585 v5 (http://www.anandtech.com/show/10361...500-v5-iris-pro-and-edram-for-streaming-video )? ECC, 3.5/3.9GHz, 128MB L4.

Or a 6700K @ ~4.4GHz, but running at high load with 3 cores pegged at 100% 24/7/365 and the last core probably at 50% load or more handling overhead?

I'm taking what I hope is a conservative estimate of the max stable clock we can sustain long-term at close to 100% load. Seem reasonable?

A 500MHz advantage would be nice, but I'm leaning toward the Xeon: it's labor-free for the programmers, runs at lower load, and will hopefully make up most of that clock deficit with its L4 cache. Plus, the programmers already said they like that solution.
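Since item 1 above says the app scales more or less linearly with clock, the 500MHz gap can be put in runtime terms. Both chips are Skylake parts, so per-clock performance should be similar apart from the eDRAM. A rough sketch, assuming purely linear clock scaling and ignoring the L4 cache; the 160-minute baseline is hypothetical here, since that figure was measured on the current hardware, not on either candidate:

```python
def scaled_runtime(runtime_min: float, from_ghz: float, to_ghz: float) -> float:
    """Runtime after a clock change, assuming runtime is inversely proportional to clock."""
    return runtime_min * from_ghz / to_ghz


xeon_ghz = 3.9       # E3-1585 v5 max turbo, as quoted in the post
oc_6700k_ghz = 4.4   # the conservative 24/7 overclock proposed in the post
baseline_min = 160   # hypothetical: pretend the Xeon run takes 160 minutes

t = scaled_runtime(baseline_min, xeon_ghz, oc_6700k_ghz)
print(f"6700K run: {t:.1f} min ({1 - xeon_ghz / oc_6700k_ghz:.1%} shorter)")
# 6700K run: 141.8 min (11.4% shorter)
```

Under these assumptions the overclock buys roughly an 11% shorter run; whether the L4 cache claws that back is the open question.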

.vodka · Jun 1, 2016

Sounds like the Broadwell xeon and its L4 cache + ECC support is the best solution for you, all things considered.

Your guys can probably optimize the code even more on top of what they've already achieved lately (which is probably already more than what you would've gained from moving to a highly overclocked 6700K).

Keljian · Jun 1, 2016

C with Intel's compiler will be faster than COBOL, but Intel also has a Fortran compiler.

ECC won't stop you from having errors if you have errors in your code or input data. It is purely for hardware reliability.

The new Xeons only just got announced, and they are embedded parts; I don't think you will see them in the mainstream. Better to get a Broadwell Xeon (with L4) now than a Skylake one later.

OpenCL or CUDA may be the answer if regex is the bottleneck (OpenCL code is available here: https://github.com/crepererum/oclgrep).

It seems strange that this can't be multithreaded. I am curious to know more, but appreciate that I may not be able to. PM me if you are willing to share more.

I am very keen to help you optimise this.

Headfoot · Jun 1, 2016

Hand-writing regex is a huge waste of time when you need scalability and things like Elasticsearch (i.e. super Apache Lucene) exist off the shelf. I heard "hand-written regex in Perl" and wondered if this was 1995.

TBH, hardware is just going to be a band-aid for what sounds like a kludge. If the dude isn't even familiar with OO programming, which was the latest and greatest in the early '90s, there's no way he's fluent in highly scalable (scale-out) modern tech like the various NoSQL solutions, Elasticsearch, etc. Even just swapping the Perl script for a Node.js version of it would likely speed it up substantially, and it would be very easy to translate. Even more so if you translate it to Python and then use any of the many C-in-Python libraries to drop into C for performance-sensitive portions, and even down to assembly, since you can do that from C.

Being able to hand-write stuff is great and all, but these days Facebook, Twitter, Google, Amazon, etc. have all written massively scalable, battle-tested libraries for various common high-performance tasks, and they pay a small army of very, very smart people to write things like that. I'll take the open source cream of the crop over hand-written Perl any day.

Textual analysis is usually pretty easy to parallelize, at least in part, provided it's not a single contiguous thing. Good ol' fashioned chop / analyze / recombine, very basic. But I don't know the workload, so it could be more complicated than that.
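The chop / analyze / recombine pattern can be sketched with Python's standard `re` and `concurrent.futures` modules. This is a minimal illustration, not the OP's workload: the pattern `ERROR \d+` and the sample lines are made up, and chunking on line boundaries assumes no match spans two lines. It only parallelizes work inside one iteration; it does nothing for the iteration-to-iteration dependency described earlier.

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical pattern standing in for whatever the real job searches for.
PATTERN = r"ERROR \d+"


def chop(lines, chunk_size):
    """Chop: split the input records into fixed-size chunks of lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_matches(chunk, pattern=PATTERN):
    """Analyze: count regex matches in one chunk, independently of other chunks."""
    rx = re.compile(pattern)
    return sum(len(rx.findall(line)) for line in chunk)


def parallel_count(lines, chunk_size=1000, map_fn=map):
    """Recombine: sum the per-chunk counts.

    map_fn defaults to the built-in (serial) map; pass an executor's map
    to analyze the chunks in parallel worker processes.
    """
    return sum(map_fn(count_matches, chop(lines, chunk_size)))


if __name__ == "__main__":
    lines = ["ok", "ERROR 42 disk", "ERROR 7 net ERROR 8 net"] * 10_000
    # Processes, not threads: CPython's re engine holds the GIL while matching.
    with ProcessPoolExecutor() as pool:
        print(parallel_count(lines, map_fn=pool.map))  # 30000
```

Passing the executor's `map` in keeps the chop and recombine logic identical whether the chunks run serially or across cores.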

IMO buying faster hardware worked in the '90s and early '00s, but the paradigm has changed: software is now expected to be scalable. As you see, you can barely buy faster single-threaded performance no matter how much money you throw at it. There is no end-game in buying faster hardware anymore; there is only buying more hardware and scaling across it. For example, if you got your text search working via Elasticsearch, you could instantly scale it out to as many machines as you need and get both speed and redundancy. Change your MySQL to an ES-Hadoop stack and you now have a highly redundant, highly scalable platform.

An OC'd 6700K would certainly help some in the meantime, while you figure out how to get to your 30-minute target.

etherealfocus · Jun 3, 2016

Had a very productive meeting today. The programmer agreed that prefab code is probably better; he's just not familiar with it. The dude's very talented but also old school, and used to solving problems by writing code rather than using someone else's. He's agreed to dig into it and will be looking into Elasticsearch in particular. Would Solr be worth looking into as well? I mostly know it as the default search in Magento: awesome as long as you have a ninja to manage it correctly.

We're also looking into moving a lot of dev to Talend Big Data. It's free, its code management features are supposed to be awesome, and it's got several of the other free versions baked in for good measure. I installed it a while ago and have been slowly getting the hang of it.

<rant>Talend = most painful installation process I've ever experienced. My i3+850 Evo usually does everything I want just about instantly, but I spent a solid 45 mins wishing I had an i7 and a 950 Pro. And holy crap do I hate Java's micromanagement requirements. Here, install these dependencies because other dependencies depend on them.</rant>

I dunno how popular Talend is... its claim to fame is reusing known best-in-class code blocks in essentially a drag and drop fashion to assemble your pipeline... except apparently unlike its competitors, it actually works. Anyone here have experience with it? Good/bad/ugly?

I expect that even if we do go with Talend, it'll take a while to reassemble the pipeline in a scalable fashion. I think we'll probably still go with a hardware upgrade... I'm a little tempted to grab a 2P E5 rig in expectation of major gains down the line, but will probably play it safe and just nab an E3. If we get anything resembling linear scaling, 8 threads should get us under half an hour anyway.
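Linear scaling is the best case. Amdahl's law (a standard model, swapped in here for illustration) shows how sensitive the half-hour target is to the share of each iteration that cannot be parallelized. A sketch using the ~160-minute single-threaded runtime mentioned earlier in the thread; the serial fractions are hypothetical, and treating 8 threads as 8 full cores is optimistic for a 4-core/8-thread E3, where the extra threads come from Hyper-Threading:

```python
def amdahl_runtime(t1_min: float, serial_fraction: float, threads: int) -> float:
    """Amdahl's law: only the parallel share of the work shrinks with more threads."""
    return t1_min * (serial_fraction + (1 - serial_fraction) / threads)


def max_serial_fraction(t1_min: float, target_min: float, threads: int) -> float:
    """Largest serial fraction that still meets target_min (solves amdahl_runtime == target)."""
    return (target_min / t1_min - 1 / threads) / (1 - 1 / threads)


T1 = 160  # current single-threaded runtime in minutes, per the thread
for s in (0.0, 0.05, 0.10, 0.25):
    print(f"serial {s:.0%}: {amdahl_runtime(T1, s, 8):.1f} min")
print(f"max serial fraction for 30 min: {max_serial_fraction(T1, 30, 8):.1%}")
# serial 0%: 20.0 min
# serial 5%: 27.0 min
# serial 10%: 34.0 min
# serial 25%: 55.0 min
# max serial fraction for 30 min: 7.1%
```

With these numbers, perfect scaling gives 20 minutes, but a 10% serial share already pushes the run to 34 minutes; the target needs the serial share below about 7%.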

And yeah, I didn't notice the Skylake+L4 Xeon doesn't have a release date yet. Based on the AT review, the E3-1285L v4 looks like a winner... although it does get thoroughly stomped by the 6700K even at stock... sigh. http://www.anandtech.com/show/9532/...on-e3-v4-review-95w-65w-35w-1285-1285l-1265/8

Keljian, I appreciate the offer, but unfortunately I signed an NDA, so this is about as much as I can say. I'll try to get a sanitized description of the problem as I mentioned before; hopefully it'll still be detailed enough to be useful. I brought up OpenCL/CUDA and got some blank stares... hopefully they'll be able to read up and respond at the next meeting.
