Maybe we're having some miscommunication about what the "module penalty" is? Piledriver and Bulldozer had the infuriating quality of producing inferior throughput in some sparsely-threaded applications when the thread scheduler would indiscriminately load two threads onto the same module. For example, a 4m/8t Bulldozer running an application that spawned four threads would run significantly faster if one thread were allocated to each module instead of allowing two threads per module (or something similar) for any significant amount of time.
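For anyone who never dealt with it: the usual workaround back then was to pin worker threads manually so each got its own module. A rough Linux-only sketch (the CPU numbering is my assumption; on the 4m/8t FX parts, logical CPUs 2k and 2k+1 typically shared a module):

```python
import os

# Assumption: logical CPUs 2k and 2k+1 share a module (typical for the
# 4m/8t FX chips on Linux), so the even-numbered CPUs are each the
# first CPU of their own module. Linux-only: sched_setaffinity is not
# available on other platforms.
MODULE_FIRST_CPUS = [0, 2, 4, 6]

def pin_to_own_module(pid: int, worker_index: int) -> None:
    """Restrict `pid` to the first logical CPU of its own module."""
    os.sched_setaffinity(pid, {MODULE_FIRST_CPUS[worker_index]})

# e.g. pin the current process (pid 0 means "self") to module 0's CPU
pin_to_own_module(0, 0)
```

With one worker pinned per module, no two of the four threads contend for the same module's shared front end and FPU.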
SR and XV aren't nearly so bad, as your own comparisons highlight, though there still isn't 100% performance scaling moving from one thread per module to two. It's a lot closer for SR and XV than it was for Bulldozer, but let's face it, AMD's CMT implementation isn't good enough to deliver a 100% performance increase. On Cinebench R10, my 7700K manages 84% thread scaling going from ST to MT (4 threads). That's quite good compared to some of the nightmarish scaling in FP apps that happened on Bulldozer.
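For reference, that scaling figure is just the MT score divided by (thread count x ST score). Using made-up scores purely to illustrate the arithmetic (not my actual R10 numbers):

```python
def thread_scaling(st_score: float, mt_score: float, n_threads: int) -> float:
    """Fraction of ideal linear scaling achieved going from 1 to n threads."""
    return mt_score / (n_threads * st_score)

# Hypothetical scores chosen only to show the math:
st, mt = 4000, 13440
print(f"{thread_scaling(st, mt, 4):.0%}")  # 84% of an ideal 4x speedup
```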
Of course, it also helps that modern OSes (Linux and Win10 in particular) are much better about scheduling threads on modules than Windows 7 or XP used to be.
As for SuperPi, I think XV is doing well in that bench due to the cache improvements XV has over SR. And SR was no slouch in SuperPi compared to other AMD chips!
The module penalty is still there, but, yes, it is tiny compared to Bulldozer or Piledriver... but huge compared to not having one at all.
A little rundown of the difference I calculated between the measured x4 XV scores and a hypothetical module-penalty-free XV (estimated using Phenom II X4 scaling values):
Code:
Bench    | x4 XV | x4 NoMod | Improvement
CB R10   |  7708 |     8378 |        8.7%
CB R11.5 |  2.99 |     3.26 |        9.0%
CB R15   |   258 |      298 |       15.5%
GeekInt  |  7935 |     8406 |        5.9%
GeekFPU  |  6474 |     7312 |       12.9%
3dPM     |   153 |      195 |       27.4%
7-zip    |  9055 |     9644 |        6.5%
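The Improvement column is just NoMod / XV - 1 for each bench. A quick check of the numbers above (every row lands within a tenth of a percent of the table):

```python
# (bench, measured x4 XV score, estimated "no module penalty" score)
results = [
    ("CB R10",   7708, 8378),
    ("CB R11.5", 2.99, 3.26),
    ("CB R15",    258,  298),
    ("GeekInt",  7935, 8406),
    ("GeekFPU",  6474, 7312),
    ("3dPM",      153,  195),
    ("7-zip",    9055, 9644),
]

for bench, xv, nomod in results:
    improvement = (nomod / xv - 1) * 100  # percent gained without the penalty
    print(f"{bench:<8} {improvement:5.1f}%")
```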
This also includes some cache penalties (mostly L3), which probably impact 3dPM more than anything else, but should only be a small part of the rest of the results. In any event, we'd never really expect 100% scaling; 90% is quite decent. I have some code designed to scale to the highest degree possible, and it scales at 100.2%. Yeah, figure that one out, LOL!*
SuperPi is definitely a sweet spot for XV. It gains more there than in almost anything else.
*There's a fixed cost regardless of thread count, so once you get enough threads going, the parallel work overwhelms the cost of executing in one thread and scaling gets better and better. With synthetic loads, Intel's HT can manage 100% scaling - giving me 802% more performance than with just one thread.
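To illustrate the footnote with a toy model (my own simplification, not the actual benchmark code): treat runtime as a fixed cost F that doesn't shrink with thread count plus parallel work W that divides evenly across threads. As W dwarfs F, measured scaling efficiency climbs toward 100%; anything past 100% (like that 100.2%, or superlinear HT results) has to come from something outside this model, e.g. cache effects or measurement noise.

```python
def efficiency(n_threads: int, fixed: float, work: float) -> float:
    """Fraction of an ideal n-x speedup under a fixed-cost-plus-work model."""
    t1 = fixed + work                  # single-threaded runtime
    tn = fixed + work / n_threads      # n-threaded runtime
    return t1 / (n_threads * tn)

# The bigger the parallel work relative to the fixed cost,
# the closer scaling gets to 100% (never past it in this model).
for work in (10, 100, 10_000):
    print(f"W/F = {work:>6}: {efficiency(8, 1, work):.1%}")
```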