AMD FX-8150 shows up in Passmark CPU benchmark

aigomorla · Oct 3, 2011

996GT2 said:
Why does i7-2600K score 1000 points higher than i7-2600 when they are exactly the same performance @ stock?

duh.. cuz one system obviously has the intel inside sticker..

didnt u know stickers make your system faster!!!

put an intel inside sticker on a AMD system and OMG, you'll break physics even...

Dresdenboy · Oct 3, 2011

Abwx said:
http://en.wikipedia.org/wiki/Gauss–Legendre_algorithm

Super PI utilizes the x86 instruction set. These instructions date all the way back to the 8086 math coprocessor. While they were important for 80386, 80486, and Pentium they became obsolete when 3DNow! and Streaming SIMD Extensions were released.

http://en.wikipedia.org/wiki/Super_PI

super_pi_mod.exe

CPU clocks not halted (cycles) 552894
Retired instructions 298664
IPC 0,54
Retired uops 382941
Retired branch instructions 26436
Retired mispredicted branch instructions 292
Retired x87 floating point ops (add/sub/mul/div) 50288
Retired MMX/FP (x87, MMX, 3DNow!, SSE) ops 160136
Retired SSE ops 0

http://www.planet3dnow.de/vbulletin/showpost.php?p=4460110&postcount=3382
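For reference, the derived figures in that CodeAnalyst dump can be recomputed from the raw counters. This is only a minimal sketch: the variable names are made up for illustration, and the uops-per-instruction and mispredict-rate ratios are not in the original output.

```python
# Counter values copied from the CodeAnalyst output quoted above.
cycles = 552894
retired_instructions = 298664
retired_uops = 382941
retired_branches = 26436
mispredicted_branches = 292

# IPC is retired instructions divided by unhalted CPU cycles.
ipc = retired_instructions / cycles

# Extra ratios (not part of the quoted output, added for illustration).
uops_per_instruction = retired_uops / retired_instructions
mispredict_rate = mispredicted_branches / retired_branches

print(f"IPC: {ipc:.2f}")
print(f"uops per instruction: {uops_per_instruction:.2f}")
print(f"branch mispredict rate: {mispredict_rate:.2%}")
```

The IPC value matches the 0,54 reported by the tool (which prints it with a decimal comma).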

That an algorithm can be calculated by using integer numbers only doesn't mean that there is no way to use FPU ops for that task. Prime95 is an excellent example of calculating integer multiplications mod (2^n)-1 using SSE2 floating point FFTs.
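To illustrate the general idea (not Prime95's actual code, which as far as I know uses a more elaborate weighted transform), here is a minimal sketch of multiplying integers through a floating-point FFT convolution. The function names are hypothetical, and the pure-Python recursive FFT is chosen for readability, not speed.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def multiply_via_fft(a, b):
    """Multiply two non-negative integers using floating-point FFTs."""
    # Split both numbers into decimal digits, least significant first.
    digits_a = [int(d) for d in reversed(str(a))]
    digits_b = [int(d) for d in reversed(str(b))]
    size = 1
    while size < len(digits_a) + len(digits_b):
        size *= 2
    fa = fft([complex(d) for d in digits_a] + [0j] * (size - len(digits_a)))
    fb = fft([complex(d) for d in digits_b] + [0j] * (size - len(digits_b)))
    # Pointwise product in the frequency domain equals digit convolution.
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The unscaled inverse needs a division by size; rounding recovers
    # the exact integer coefficients as long as the FP error stays < 0.5.
    coeffs = [round(c.real / size) for c in product]
    # Summing coeff * 10^i performs the carry propagation.
    return sum(c * 10**i for i, c in enumerate(coeffs))


def mulmod_mersenne(a, b, n):
    """Product of a and b reduced mod (2^n)-1, via the FFT multiply."""
    return multiply_via_fft(a, b) % ((1 << n) - 1)


print(multiply_via_fft(123456789, 987654321) == 123456789 * 987654321)
```

All the heavy lifting happens in floating-point math, yet the result is an exact integer because of the final rounding step.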

Abwx · Oct 3, 2011

Dresdenboy said:
http://en.wikipedia.org/wiki/Super_PI

http://www.planet3dnow.de/vbulletin/showpost.php?p=4460110&postcount=3382

That an algorithm can be calculated by using integer numbers only doesn't mean that there is no way to use FPU ops for that task. Prime95 is an excellent example of calculating integer multiplications mod (2^n)-1 using SSE2 floating point FFTs.

It could be, but according to your quote from Wikipedia:

Super PI utilizes the x86 instruction set. These instructions date all the way back to the 8086 math coprocessor. While they were important for 80386, 80486, and Pentium they became obsolete when 3DNow! and Streaming SIMD Extensions were released.

Super PI was designed before even MMX was implemented, let alone SSE1/2...

But then, looking at your example of the Super PI analysis by CodeAnalyst:

super_pi_mod.exe

CPU clocks not halted (cycles) 552894
Retired instructions 298664
IPC 0,54
Retired uops 382941
Retired branch instructions 26436
Retired mispredicted branch instructions 292
Retired x87 floating point ops (add/sub/mul/div) 50288
Retired MMX/FP (x87, MMX, 3DNow!, SSE) ops 160136
Retired SSE ops 0

What does the term "mod" stand for? Modded? I ask because the line of the analysis that I marked in red doesn't make sense: Super PI supports none of the listed instructions apart from x87, but the retired count for those is already in the line above. So we can assume that the second line is for purely integer ops, whatever name CodeAnalyst gives it...

soccerballtux · Oct 3, 2011

zlejedi said:
VirtualLarry said:

Still sounds decent to me. I might get one. Looking at this motherboard. Looks pretty sick to me. Imagine that with an 8-core BullDozer, and four GTX460s, all crunching Distributed Computing.

Edit: I think that AMD is still ahead of Intel, not in pure computing horsepower, but as a platform company.

AMD's newest chipsets sport ALL SATA 6G ports, not this pathetic mish-mash of SATA 6G and SATA 3G like Intel. I think (could be wrong), that AMD's newest chipsets also support USB3 natively. Also, AMD's higher-end chipsets like the 890FX and 990FX, have mucho PCI-E lanes. That's one thing that you cannot get with Sandy Bridge, and that's a motherboard with four PCI-E x16 slots (that are at least PCI-E 2.0 x8 electrically). SB's on-chip PCI-E lanes simply aren't numerous enough, and even if they were, they cannot be split four ways, AFAIK.


And all of those are utterly pointless to the huge majority of users.

SATA2 will never bottleneck traditional HDDs
SATA3 is needed only for fast SSDs
PCI-ex lanes x8 vs x16 have been shown many times to have no noticeable impact on multi-GPU performance

welcome to the world of marketing friend

Tuna-Fish · Oct 3, 2011

Abwx said:
What does the term "mod" stand for? Modded? I ask because the line of the analysis that I marked in red doesn't make sense: Super PI supports none of the listed instructions apart from x87, but the retired count for those is already in the line above. So we can assume that the second line is for purely integer ops, whatever name CodeAnalyst gives it...

Actually, no. The heading also covers several old x87 instructions, notably moves. AFAIK the Super PI "mod" only added the checksum, nothing more.

Dresdenboy · Oct 4, 2011

Abwx said:
It could be, but according to your quote from Wikipedia:

Super PI was designed before even MMX was implemented, let alone SSE1/2...

But then, looking at your example of the Super PI analysis by CodeAnalyst:

What does the term "mod" stand for? Modded? I ask because the line of the analysis that I marked in red doesn't make sense: Super PI supports none of the listed instructions apart from x87, but the retired count for those is already in the line above. So we can assume that the second line is for purely integer ops, whatever name CodeAnalyst gives it...

It's not my measurement. But any mods I know of just added changes like showing more digits of the measured time or adding prefetches (SSE3 mod, but I'm not sure about that). The main calculation hasn't been changed (very difficult and time consuming w/o the source code).

But to explain the difference in the measurement: There are performance monitor counter events, which could be mapped to these lines. And the 2nd line still includes "x87" (which also had instructions to load a float value from integer data in mem or store fp as int). Some x87 related event groups are (as listed by oprofile):

RETIRED_X87_FLOATING_POINT_OPERATIONS
0x01: Add/subtract ops
0x02: Multiply ops
0x04: Divide ops

RETIRED_MMX_FP_INSTRUCTIONS
0x01: x87 instructions
0x02: MMX & 3DNow instructions
0x04: SSE instructions (SSE, SSE2, SSE3, and SSE4A)

The first seems to cover only pure calculation ops, so even fxch, fcomp etc. might land in the second group.
http://oprofile.sourceforge.net/docs/amd-family10-events.php
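As a rough sketch of how such unit masks combine, the bit values below are the ones listed above; the class and function names are hypothetical, and how CodeAnalyst actually maps its report lines to these events is an assumption.

```python
from enum import IntFlag


class RetiredX87FpOps(IntFlag):
    """Unit-mask bits of RETIRED_X87_FLOATING_POINT_OPERATIONS."""
    ADD_SUB = 0x01
    MULTIPLY = 0x02
    DIVIDE = 0x04


class RetiredMmxFpInstructions(IntFlag):
    """Unit-mask bits of RETIRED_MMX_FP_INSTRUCTIONS."""
    X87 = 0x01
    MMX_3DNOW = 0x02
    SSE = 0x04


def unit_mask(*flags):
    """OR the selected sub-event bits into a single unit mask."""
    mask = 0
    for flag in flags:
        mask |= int(flag)
    return mask


# Counting add/sub/mul/div together means selecting all three bits.
x87_arith_mask = unit_mask(RetiredX87FpOps.ADD_SUB,
                           RetiredX87FpOps.MULTIPLY,
                           RetiredX87FpOps.DIVIDE)

# A combined "x87, MMX, 3DNow!, SSE" line would select every class,
# so x87 instructions are counted there even when no SSE is used.
all_fp_mask = unit_mask(*RetiredMmxFpInstructions)

print(hex(x87_arith_mask), hex(all_fp_mask))
```

With every bit selected, the second event counts any x87 instruction, which would explain a count well above the pure add/sub/mul/div figure even with zero SSE ops.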

cantholdanymore · Oct 4, 2011

aigomorla said:
duh.. cuz one system obviously has the intel inside sticker..

didnt u know stickers make your system faster!!!

put an intel inside sticker on a AMD system and OMG, you'll break physics even...

There are some differences, but whether they are enough to explain the different results, I don't know.

http://ark.intel.com/compare/52213,52214

nemesismk2 · Oct 4, 2011

aigomorla said:
duh.. cuz one system obviously has the intel inside sticker..

didnt u know stickers make your system faster!!!

put an intel inside sticker on a AMD system and OMG, you'll break physics even...

wow I didn't know the sticker made all the difference to the performance of an intel cpu. your comment made me laugh lol

psolord · Oct 4, 2011

NostaSeronx said:
Clock to clock
All hardware similar
4GB of RAM
570GTX

AMD FX is in the red(is bad) in 2d graphics, 3d graphics, and memory

i7 950 w/o hyperthreading and i7 2600K has a 70MHz advantage

@all
Sorry for the late reply, but some questions have been hovering around my head, since I saw this.

Shouldn't we expect better integer performance instead of better fpu performance, since BD has eight native integer units and only 4 native fpu units?

Will this FPU measurement affect gaming? If Passmark registers this fpu processing power, will games do the same?

Idontcare · Oct 4, 2011

psolord said:
Shouldn't we expect better integer performance instead of better fpu performance, since BD has eight native integer units and only 4 native fpu units?

That will be the difference between theoretical peak IPC versus actual IPC once other limiting factors are accounted for (cache latency, bandwidth, etc).

I'm not as on top of the details of BD as you are, but if what you say above is true then one possible conclusion here is that despite having a technically more powerful integer capability, that capability is somehow undermined by other limitations in the microarchitecture such that the technically weaker fpu manages to produce superior throughput at the platform level.

bronxzv · Oct 4, 2011

psolord said:
Shouldn't we expect better integer performance instead of better fpu performance, since BD has eight native integer units and only 4 native fpu units?

high throughput integer code (amenable to SIMD) will use SSE2 packed integers and here a BD module will be more or less on par with a SNB core, well at least on paper.

SNB is able to issue 3 simple SSE2 128-bit integer instructions per clock (*1) and BD only 2 per clock (*2) so a SNB core is even a bit faster than 2 BD "cores" on paper

*1: based on the Intel 64 and IA-32 Optimization Reference Manual
*2: based on my understanding (FPU pipes 2 and 3 "FPMAL" for SSE-2 integer); unfortunately the Bulldozer Optimization Guide has no instruction throughput figures, which makes it a lot less valuable than it should be

Dresdenboy · Oct 4, 2011

bronxzv said:
high throughput integer code (amenable to SIMD) will use SSE2 packed integers and here a BD module will be more or less on par with a SNB core, well at least on paper.

SNB is able to issue 3 simple SSE2 128-bit integer instructions per clock (*1) and BD only 2 per clock (*2) so a SNB core is even a bit faster than 2 BD "cores" on paper

*1: based on the Intel 64 and IA-32 Optimization Reference Manual
*2: based on my understanding (FPU pipes 2 and 3 "FPMAL" for SSE-2 integer); unfortunately the Bulldozer Optimization Guide has no instruction throughput figures, which makes it a lot less valuable than it should be

According to the BD SOM, all 4 FP pipelines do integer SSE stuff with different capabilities:
Pipe 0: simd, mmx, multiplier
Pipe 1: shuffles, packs, permutes
Pipe 2: simd, mmx, ALU
Pipe 3: simd, mmx, ALU, store

And move ops are eliminated.

psolord · Oct 4, 2011

Great! Thanks guys!

So let's wait and see what the train brings!

bronxzv · Oct 4, 2011

Dresdenboy said:
According to the BD SOM, all 4 FP pipelines do integer SSE stuff with different capabilities:
Pipe 0: simd, mmx, multiplier
Pipe 1: shuffles, packs, permutes
Pipe 2: simd, mmx, ALU
Pipe 3: simd, mmx, ALU, store

And move ops are eliminated.

you must have a more recent version than me, link welcome;
in mine (rev 3.03), bottom of page 37, 3rd bullet, it's stated that:

"

There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.

"

inf64 · Oct 4, 2011

It looks like the pipes that execute fma ops are also capable of doing integer simd stuff and this was maybe omitted from rev 3.03

bronxzv · Oct 4, 2011

inf64 said:
It looks like the pipes that execute fma ops are also capable of doing integer simd stuff and this was maybe omitted from rev 3.03

can you provide a link to this more accurate version of the doc?

Dresdenboy · Oct 4, 2011

bronxzv said:
you must have a more recent version than me, link welcome;
in mine (rev 3.03), bottom of page 37, 3rd bullet, it's stated that:

"

There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.
"

I collected that info from page 232 of the most recent version. It seems to me that what they mean by "packed integer" are the pipes capable of doing MMX ALU stuff.

inf64 · Oct 4, 2011

Sorry, I can't since I don't have it. DDB has it. Edit: he posted already

aigomorla · Oct 4, 2011

cantholdanymore said:
There are some differences, but whether they are enough to explain the different results, I don't know.

http://ark.intel.com/compare/52213,52214

incase u didnt understand i totally meant my comment as a joke.. as in i havent the clue either on why its not the same... so i am putting blame on the STICKER!!!

nemesismk2 said:
wow I didn't know the sticker made all the difference to the performance of an intel cpu. your comment made me laugh lol

Im glad it made one person laugh!

All hail the powered by ~~HKS~~ errr Intel Sticker!

bronxzv · Oct 4, 2011

Dresdenboy said:
I collected that info from page 232 of the most recent version. It seems to me that what they mean by "packed integer" are the pipes capable of doing MMX ALU stuff.

thanks, I see it in my version, so it looks like pipes 0, 2 and 3 are able to execute simple SSE2 integer instructions (PAND, POR) just like Sandy, though there's only one pipe for shuffles (PSHUFx), half of Sandy here. Not sure about packed arithmetic (PADDx, PMULx); maybe better than Sandy (i.e. more than 2 padd per clock, 1 pmul per clock) in this area?

anyway, it would be far clearer if AMD published instruction throughput values

Jake_Chicago · Oct 4, 2011

The Passmark score is right about where I thought it would be. They (AMD) claimed the FX series competed against Intel's Extreme edition CPU, and quite frankly it does, though perhaps at higher clock speeds. For instance, based on the one sample provided, I think it's fair to reason the FX-8150 would score 10,334 points at 5GHz. Plus, depending on the speed of the RAM and additional samples of FX processors, the FX-8150 has the potential to score somewhere in the area of 11,000-12,000 points or so. Therefore, I think the [initial] test sample at Passmark is legitimate.

aigomorla · Oct 4, 2011

You guys are all missing the point... if a test cant even compare the same processors as equal, how the hell can u guys take the rest of the test with credibility?

This is why a lot of people are calling foul.

A stock i7-2600 should perform like a stock i7-2600K, regardless of whether it's a Xeon, unless it's tied to ECC RAM.

This graph doesnt show that... so people who know how to read graphs are all calling FOUL.
People who have no clue how to read it.. and are just looking at pretty numbers are only going to get confused even more.

Dresdenboy · Oct 4, 2011

bronxzv said:
thanks, I see it in my version, so it looks like pipes 0, 2 and 3 are able to execute simple SSE2 integer instructions (PAND, POR) just like Sandy, though there's only one pipe for shuffles (PSHUFx), half of Sandy here. Not sure about packed arithmetic (PADDx, PMULx); maybe better than Sandy (i.e. more than 2 padd per clock, 1 pmul per clock) in this area?

anyway, it would be far clearer if AMD published instruction throughput values

x264 developer "Dark_Shikari" documents his findings in his chat log here: http://akuvian.org/src/x264/freenode-x264dev.log.bz2
He already posted some throughput numbers over the last few days. For a start: http://www.planet3dnow.de/vbulletin/showthread.php?p=4501020#post4501020
and:

2011-10-04 04:46:38 < Dark_Shikari> C, with mode analysis shortcuts: 253 cycles
2011-10-04 04:46:45 < Dark_Shikari> My crappy, badly optimized XOP asm: 93 cycles
2011-10-04 04:46:56 < Dark_Shikari> This is kinda awesome
2011-10-04 04:49:35 < Dark_Shikari> Oh, and old without shortcuts: 379 cycles
2011-10-04 04:49:45 < Dark_Shikari> My asm is 4 times faster than the existing... wait where have we seen this before? XD
2011-10-04 04:49:57 < Dark_Shikari> It's just like SAD_4x4_x9 all over again!
2011-10-04 04:50:10 < JEEB>
2011-10-04 04:50:18 < JEEB> that sounds pretty awesome
2011-10-04 04:50:21 < Dark_Shikari> Except this time I'm still wondering how best to do it without vpperm
2011-10-04 04:50:33 < Dark_Shikari> Thanks AMD, for bringing back the best instruction ever after 15+ years of hiatus.
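A quick check of the speedups implied by the cycle counts in that log (a minimal sketch; the dictionary keys are made-up labels, the numbers are straight from the chat lines):

```python
# Cycle counts quoted in the chat log above.
cycles = {
    "c_old_no_shortcuts": 379,
    "c_with_shortcuts": 253,
    "xop_asm": 93,
}

# Speedup = cycles of the slower version / cycles of the XOP asm version.
speedup_vs_old = cycles["c_old_no_shortcuts"] / cycles["xop_asm"]
speedup_vs_shortcuts = cycles["c_with_shortcuts"] / cycles["xop_asm"]

print(f"XOP asm vs old C: {speedup_vs_old:.2f}x")
print(f"XOP asm vs C with shortcuts: {speedup_vs_shortcuts:.2f}x")
```

So the "4 times faster" remark refers to the old code without shortcuts; against the C version with mode analysis shortcuts the gain is closer to 2.7x.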

bronxzv · Oct 4, 2011

Dresdenboy said:
He already posted some throughput numbers over the last few days. For a start: http://www.planet3dnow.de/vbulletin/showthread.php?p=4501020#post4501020

Interesting, so it looks like 128-bit AVX/FMA4 is faster than 256-bit. If Bulldozer is faster with legacy SSE than with AVX-256, it will be a legit optimization to disable the AVX path for Bulldozer; I'm not sure that's something I will do, however.

Since people with access to systems are already freely discussing the timings, I would be very interested to know which path is faster in these benchmarks:

http://www.inartis.com/Products/Kribi%203D%20Engine/Default.aspx

Simply select the "AVX disabled" link to force the SSE path

@DresdenBoy, since you post at planet3dnow.de, can you ask them to test it? It should take less than 5 minutes.
