Oh, yeah, we're way beyond double precision! (And yet not.)
Way back in the Pentium days, Intel noticed that there were these eight 80-bit x87 registers (usually used for 64-bit double-precision math at most) that couldn't be used for anything integer. So they came up with MMX, which aliased the low 64 bits of those registers and let a single instruction operate on sets of either 8 bytes, 4 pairs-of-bytes ("words"), or occasionally 2 double-words (32-bit integers).
Well, this was a hit. But floating-point math remained as it had always been: awkward and slow. AMD actually had the bright idea of working with two 32-bit floating-point numbers at once in the same way, in the same registers. This was called "3DNow!". But it never really took off.
So in the Pentium III, Intel created eight 128-bit SSE registers. Each holds 4 single-precision numbers or 2 double-precision numbers, and instructions can operate on them as sets. You can also work with just a single FP number at a time here. This is the new standard way to do floating-point math. (Except for some 80-bit FPU math hijinks.)
Well, SSE2 added double-precision and integer math to the mix, and SSE3, 4.1, and 4.2 have added new instructions. With 64-bit processors they even added 8 more SSE registers. But the registers stayed 128 bits, until Sandy Bridge.
Sandy Bridge and Bulldozer have 256-bit AVX registers, the lower 128 bits of which can still be used as SSE registers. Applications built specifically for AVX (either through compiler optimizations that generally don't exist yet or through hand-written assembly) can work with 4 double-precision or 8 single-precision FP numbers at a time. But "at a time" can be misleading: I believe Sandy Bridge simply works on half of an AVX register in each clock cycle. So there are fewer instructions to decode if that's a bottleneck, but otherwise speed stays the same for now.
Bulldozer has an interesting trick: it shares one 256-bit FPU between each pair of integer cores. In applications that use regular SSE registers, half the FPU is allocated to each core, and both can do work at the same time. If one (and only one) core of a pair wants to execute an AVX instruction, that instruction takes only one clock cycle, apparently running twice as fast. But if, as in DC, all the cores want to do AVX work at the same time, they have to trade off, and the speed again becomes 2 clock cycles per instruction, just like Intel.
In case you're wondering: yes, you could run 4 AVX-heavy WUs and 4 integer-heavy WUs on an 8-core Bulldozer at the same time. If they landed on the proper cores, the AVX work would run up to twice as fast. But current OSes don't know how to allocate work to the proper cores. Reportedly Windows 8 will know how, but it would seem to be a very tough thing to do.