Steamroller on AM3+


lifeblood

Senior member
Oct 17, 2001
999
88
91
I wouldn't get my hopes up for Steamroller on AM3+. FM2 and later FM3 will rule them all in the desktop segment. And with the continual layoffs, you're going to see AMD trying to simplify everything as much as possible.
I had no expectation of Steamroller on AM3+. Instead I fully expected it on FM2 or FM3. That's the point. I expected AMD to release its next "enthusiast processor" after Vishera as an APU on FM2/3 with Steamroller cores. Their most recent roadmaps made it clear that Vishera would be the last of the pure CPUs (except for some low-end server CPUs). Now I'm hearing different. Not that it changes anything, but does anybody else believe these rumors? Or is somebody just misinformed?

I need a new CPU. If PD is competitive I'll get it since I already own an AM3+ motherboard and hate rebuilding my PC from scratch. If it's not competitive then I'll bite the bullet and get a new Intel CPU/motherboard. I'm not going to keep using my current CPU in the hope that SR is awesome. BD was supposed to be awesome. Then PD was supposed to be awesome. Now SR is supposed to be awesome. I'm tired of it. :mad:
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
I had no expectation of Steamroller on AM3+. Instead I fully expected it on FM2 or FM3. That's the point. I expected AMD to release its next "enthusiast processor" after Vishera as an APU on FM2/3 with Steamroller cores. Their most recent roadmaps made it clear that Vishera would be the last of the pure CPUs (except for some low-end server CPUs). Now I'm hearing different. Not that it changes anything, but does anybody else believe these rumors? Or is somebody just misinformed?

I need a new CPU. If PD is competitive I'll get it since I already own an AM3+ motherboard and hate rebuilding my PC from scratch. If it's not competitive then I'll bite the bullet and get a new Intel CPU/motherboard. I'm not going to keep using my current CPU in the hope that SR is awesome. BD was supposed to be awesome. Then PD was supposed to be awesome. Now SR is supposed to be awesome. I'm tired of it. :mad:

Here is your PD for AM3+
http://forums.anandtech.com/showthread.php?t=2276475

I can highly recommend the 3570K :p
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
I can highly recommend the 3570K
FX-8350 beats the i7 3770K in MT and the i5 3570K in ST while benchmarking the new Cinebench R13(w/ R14 render). Only issue is power where Piledriver Orochi uses 1.33x to 1.5x as much power in comparison to the i7 3770K.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
FX-8350 beats the i7 3770K in MT and the i5 3570K in ST while benchmarking the new Cinebench R13(w/ R14 render). Only issue is power where Piledriver Orochi uses 1.33x to 1.5x as much power in comparison to the i7 3770K.

So somehow the FX-8350 gained 50% ST performance in an unreleased Cinebench version? Oh where did I hear something similar before...

:whiste:
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Actually early bulldozer was going to be the opposite of the traditional - aka "fat" - core design.

The original incarnation of what was to come after the Stars core microarchitecture was going to be slim cores and lots of them - it really was planned to be Niagara on x86 (i.e. Moar cores with piss poor IPC).

When the quad-core Kentsfield chips came out in late 2006, that roadmap was torn up because AMD realized they couldn't possibly go to market and compete with the equivalent of a 12-core Bobcat versus a quad-core Nehalem; single-threaded IPC had to be competitive.

That is when the fattened up version of bulldozer was envisioned, and is what we got now. A fatter-core version was never in the plans as far as I am aware.

The only thing I found is that Bulldozer, at that time, was going to use SSE5 :p

And Bobcat was really delayed too... from 2009 to 2011 o_O
Hehe, if Bobcat had been released in 2009, things would be very different :eek:
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
So somehow the FX-8350 gained 50% ST performance in an unreleased Cinebench version? Oh where did I hear something similar before...

:whiste:
Somehow, Cinebench R13(w/ R14 render) is using a newer instruction set that has greater ILP which allows a single core to utilize the full floating point unit.

:whiste:

Bulldozer, Powerful Single-threaded Engine, Mediocre Multi-threaded Engine. Steamroller, Powerful Single-threaded Engine, Above Average Multi-threaded Engine. Ivy Bridge SMT(CMP), same as Bulldozer/Steamroller, Awful Multi-threaded Engine(CMP:Excellent Multi-threaded Engine).
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
Somehow, Cinebench R13(w/ R14 render) is using a newer instruction set that has greater ILP which allows a single core to utilize the full floating point unit.

:whiste:

Bulldozer, Powerful Single-threaded Engine, Mediocre Multi-threaded Engine. Steamroller, Powerful Single-threaded Engine, Above Average Multi-threaded Engine. Ivy Bridge SMT(CMP), same as Bulldozer/Steamroller, Awful Multi-threaded Engine(CMP:Excellent Multi-threaded Engine).


There are 2 options.
1: Utter BS.
2: Cinebench added AVX etc., and you are comparing across 2 different versions when claiming PD is faster.

The numbers might change with a new benchmark. But the result will stay the same:
[Two attached images: 44768.png, 44769.png]


And a stock FX-8350 gets 6.82 points in the MP test, mainly from the higher clock.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
ShintaiDK: You are correct that Cinebench 11.5 and R13 with R14 render produce totally different scores. The problem is that for most of us R13 is not available yet, so comparing the scores is ridiculous. It's like saying the Piledriver 8350 at stock is faster than the Bulldozer 8150 at stock. No kidding! 4 GHz vs 3.6 GHz, plus Piledriver has some improvements.

BTW, my 8150 in rig 3 below at 4.6 GHz scores 7.51 in Cinebench MT, so I'm satisfied.
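
For anyone who wants to put rough numbers on the clock-versus-architecture point, here is a minimal sketch. It only uses figures already posted in this thread (7.51 at 4.6 GHz for the overclocked 8150, 6.82 for a stock 8350, 4 GHz vs 3.6 GHz stock clocks). It assumes the 8350 score was taken at its 4 GHz base clock with turbo ignored, and that scores scale linearly with clock, neither of which is guaranteed. The function name is made up for illustration.

```python
def points_per_ghz(score: float, clock_ghz: float) -> float:
    """Crude per-clock throughput: benchmark points divided by clock speed."""
    return score / clock_ghz

# Figures quoted in the thread; the clocks paired with them are assumptions.
fx8150_oc = points_per_ghz(7.51, 4.6)  # overclocked Bulldozer FX-8150
fx8350 = points_per_ghz(6.82, 4.0)     # stock Piledriver FX-8350, base clock assumed

clock_ratio = 4.0 / 3.6                # stock 8350 clock vs stock 8150 clock
per_clock_ratio = fx8350 / fx8150_oc   # per-clock gain implied by these two scores

print(f"stock clock advantage: {clock_ratio:.3f}x")
print(f"per-clock advantage:   {per_clock_ratio:.3f}x")
```

With these inputs most of the stock-versus-stock gap comes from the clock bump, and only a few percent from per-clock improvements. Treat it as a back-of-envelope estimate, since the two scores come from different rigs.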
 

Abwx

Lifer
Apr 2, 2011
12,038
5,014
136
The funny thing is that Cinebench is compiled using ICC, which is known to dispatch differently depending on the CPU: if it is an Authentic_Intel_Cpu it gets the fast routines, and if it does not fulfil this requirement a slower path is used.
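
To make the claim concrete, here is a minimal sketch of the difference between dispatching on the vendor string and dispatching on reported features. It is a toy model, not ICC's actual dispatcher: the `Cpu` class, the path names, and both functions are hypothetical. The only real-world details are the CPUID vendor strings, which are "GenuineIntel" and "AuthenticAMD".

```python
from dataclasses import dataclass

# Hypothetical code paths, fastest first. The names are illustrative only.
PATHS = ("avx", "sse4_1", "sse2")

@dataclass(frozen=True)
class Cpu:
    vendor: str          # CPUID vendor string
    features: frozenset  # instruction-set extensions the CPU reports

def dispatch_by_features(cpu: Cpu) -> str:
    """Pick the fastest path whose instruction set the CPU supports."""
    for path in PATHS:
        if path in cpu.features:
            return path
    return "generic"

def dispatch_by_vendor(cpu: Cpu) -> str:
    """Vendor-gated variant: any non-Intel CPU is sent to the baseline path."""
    if cpu.vendor != "GenuineIntel":
        return "sse2"
    return dispatch_by_features(cpu)

fx_8350 = Cpu("AuthenticAMD", frozenset({"sse2", "sse4_1", "avx"}))
i7_3770k = Cpu("GenuineIntel", frozenset({"sse2", "sse4_1", "avx"}))

for cpu in (fx_8350, i7_3770k):
    print(cpu.vendor, dispatch_by_vendor(cpu), dispatch_by_features(cpu))
```

In this model the two CPUs report the same features, yet the vendor-gated dispatcher hands them different paths. Whether a given Cinebench build actually behaves this way is exactly what is being argued here.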
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
The funny thing is that Cinebench is compiled using ICC, which is known to dispatch differently depending on the CPU: if it is an Authentic_Intel_Cpu it gets the fast routines, and if it does not fulfil this requirement a slower path is used.
The issue isn't that AMD is on the slower path, it is that Bulldozer's FPU hates legacy code. You won't get Bulldozer anywhere near maximum throughput if you don't use SSE4.1 and newer.

Cinebench R11.5: SSE2 on AMD, AVX128 on Intel.
Cinebench R13: SSE4.1 on AMD, AVX256 on Intel.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
The funny thing is that Cinebench is compiled using ICC, which is known to dispatch differently depending on the CPU: if it is an Authentic_Intel_Cpu it gets the fast routines, and if it does not fulfil this requirement a slower path is used.

This argument gets old over the years.

If it is true and AMD has still not managed to figure out a way to improve the situation for the benefit of themselves as well as their customers that have little choice but to purchase and run ICC compiled code then at this point what does it really matter anymore?

The performance is what it is, for whatever reason, and excuses don't pay the bills or sell the product.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,696
136
That's how a patched Win7 system (and win8 natively) schedules for it now. It's much, much closer to what the architecture really is, but it didn't help much. Originally, MS listened to AMD marketing and treated them the same, but it works slightly better scheduling it just like HT.

Why not go the full Monty and treat each module as an 8-issue front-end with two FPUs...? :whiste:
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
If it is true and AMD has still not managed to figure out a way to improve the situation for the benefit of themselves as well as their customers that have little choice but to purchase and run ICC compiled code then at this point what does it really matter anymore?
Most people who run AMD hardware compile with industrial compilers like PGI. In most cases, though, they can't show benchmarks because those are internal and under NDA. The main issue with AMD is capacity: there aren't enough Interlagos/Abu Dhabi procs to go around.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Most people who run AMD hardware compile with industrial compilers like PGI. In most cases, though, they can't show benchmarks because those are internal and under NDA. The main issue with AMD is capacity: there aren't enough Interlagos/Abu Dhabi procs to go around.

I've personally used PGI since 1997. I know a fair number of people who run AMD hardware. I don't know a single person in real-life other than myself who uses PGI compilers...and yet you are claiming that most people who run AMD hardware go to the trouble of recompiling with PGI?

Regarding capacity...they found enough capacity somehow to create so many Llano chips that they couldn't sell and are now having to write-down to the tune of $100m. Capacity exists, demand doesn't.

But if the mythical performance exists as you claim then surely word would get around, as it always does, and demand would have existed and AMD would have put capacity to better use than making a bunch of unsellable Llanos.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
I think NostaSeronx was referring to people who run Opterons, not home users. That being said, AMD is certainly not capacity constrained at the moment, at least on their 32nm product-lines at this point in time.
 

sequoia464

Senior member
Feb 12, 2003
870
0
71
Lots of interesting posts in this thread as always, but back to the OP's question; does anyone have any insight beyond the posted links on Steamroller being released on AM3+?

All I have ever seen, and any references that I have found, refer back to that inquirer article that, as was pointed out to me previously, does not contain a source or any type of press release from AMD.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I think NostaSeronx was referring to people who run Opterons, not home users. That being said, AMD is certainly not capacity constrained at the moment, at least on their 32nm product-lines at this point in time.

The thread is about Steamroller on AM3+, a consumer platform, and the app under discussion was Cinebench, a consumer app. If he is talking about other segments with other hardware and other apps, then that would explain why I am not able to follow the context of his statements.

But the point still stands, regardless, that if there were any "unspoken rumors" hidden by NDA's and so forth then NDA or not the rumor would get back to the ears of people at AMD who most surely spend some amount of time and effort testing compilers and looking for shenanigans so they can make more money by pressing Intel over anti-competitive measures.

I hope they would be; I don't want Intel holding back performance on AMD chips any more than you do. But it is up to AMD to care about its customers enough to ferret out those things if they really are happening. AMD's customers should not be left wondering why a random blog dude is finding this stuff but AMD isn't (or isn't doing anything about it when it does become aware of it).

The problem isn't that there is a blog guy (and yes I know he is credible on the matter, but that isn't the point) or that Intel prioritizes spending its own dollars towards advancing the performance of its own chips with its own compilers over that of spending money to enhance the performance of its competitor's chips; rather, the problem there is that AMD is standing idle leaving their customers to the mercy of Intel's compilers (if the compilers really are pulling shenanigans as has been claimed by the blog).
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Why not go the full Monty and treat each module as an 8-issue front-end with two FPUs...?

Power consumption would be astronomical if the modules were treated like individual cores + HT. The 20% CMT cut, although quite high, means you can leave the other 3 modules asleep. They're damned with either approach. Either the chip performs slightly better in lightly threaded scenarios, anything from 2-to-4, with huge power consumption, or it performs slightly worse but consumes far less power. Though the performance was obviously lacking, the power consumption is a much larger concern given the wide variety of systems the chips find themselves in -- remember that OEMs used Bulldozer as well and they don't exactly pick quality PSUs and motherboards

- it's far more than that, actually. You'll also want to stick the threads sharing common resources on the same module due to the shared cache, namely the L2, so it's much more complicated than =>0>3 then 4>7 in chronological order. Then there's also the turbo and how it fares within modules and between modules.
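
The trade-off between the two scheduling approaches can be sketched as a toy model. It assumes a 4-module chip with 2 integer cores per module (as on an 8-core FX part) and only counts how many modules have to be awake. It does not model the shared L2, turbo, or the CMT penalty, and the function names and policy labels are made up for illustration.

```python
def core_order(modules: int = 4, cores_per_module: int = 2, policy: str = "spread"):
    """Return (module, core) pairs in the order new threads would be placed."""
    cores = [(m, c) for m in range(modules) for c in range(cores_per_module)]
    if policy == "pack":
        # Fill both cores of module 0, then module 1, and so on.
        return cores
    if policy == "spread":
        # HT-style: first core of every module, then the second cores.
        return sorted(cores, key=lambda mc: (mc[1], mc[0]))
    raise ValueError(f"unknown policy: {policy}")

def active_modules(n_threads: int, policy: str) -> int:
    """Count modules that must stay awake to run n_threads threads."""
    placed = core_order(policy=policy)[:n_threads]
    return len({module for module, _ in placed})

for n in (2, 4):
    print(n, "threads:",
          "pack ->", active_modules(n, "pack"), "modules,",
          "spread ->", active_modules(n, "spread"), "modules")
```

Packing two threads keeps one module busy and leaves the other three asleep, at the cost of the sharing penalty. Spreading avoids the penalty but wakes twice as many modules for the same thread count.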
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,696
136
Power consumption would be astronomical if the modules were treated like individual cores + HT. The 20% CMT cut, although quite high, means you can leave the other 3 modules asleep. They're damned with either approach. Either the chip performs slightly better in lightly threaded scenarios, anything from 2-to-4, with huge power consumption, or it performs slightly worse but consumes far less power. Though the performance was obviously lacking, the power consumption is a much larger concern given the wide variety of systems the chips find themselves in -- remember that OEMs used Bulldozer as well and they don't exactly pick quality PSUs and motherboards

- it's far more than that, actually. You'll also want to stick the threads sharing common resources on the same module due to the shared cache, namely the L2, so it's much more complicated than =>0>3 then 4>7 in chronological order. Then there's also the turbo and how it fares within modules and between modules.

That's what I like around here, you ask a question and you get a well reasoned answer...():)

I was just being curious...:confused:
 

lifeblood

Senior member
Oct 17, 2001
999
88
91
All I have ever seen, and any references that I have found, refer back to that inquirer article that, as was pointed out to me previously, does not contain a source or any type of press release from AMD.
Yeah, that's part of my problem. I haven't found anything else, official or otherwise, that gives this any credence. Their last roadmap says that after Vishera everything desktop (budget to performance) will be an APU. That means FM2/3 and rules out AM3+.

However, things can change. Let's pretend they do not feel they will be ready to release an enthusiast-class chipset on FM2/3 by the time SR is released. What can they do? The easy answer is to release SR on AM3+. But that makes me wonder why the hell they can't build and release an enthusiast-class chipset for FM2. It just doesn't add up.

Hopefully AMD will release new roadmaps to clear this up. Regardless I'm not going to trust this until something official comes out.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
Yeah, that's part of my problem. I haven't found anything else, official or otherwise, that gives this any credence. Their last roadmap says that after Vishera everything desktop (budget to performance) will be an APU. That means FM2/3 and rules out AM3+.

However, things can change. Let's pretend they do not feel they will be ready to release an enthusiast-class chipset on FM2/3 by the time SR is released. What can they do? The easy answer is to release SR on AM3+. But that makes me wonder why the hell they can't build and release an enthusiast-class chipset for FM2. It just doesn't add up.

Hopefully AMD will release new roadmaps to clear this up. Regardless I'm not going to trust this until something official comes out.

Enthusiast chipset? Didn't they die long ago? Just look at Z77 vs the others. Nothing there besides a switch flipped. :p
 

dezz

Junior Member
Sep 16, 2011
4
0
0
I can fix BD architecture for you:

Die-shrink + lower latency caches + double the FPUs. Done.

As long as they try this bullcrap 'module' idea with gimped/incomplete 'cores', then performance will remain very hit or miss depending on app. Fix it so Integer + FPU = 1 core, multiply by 4, 6, or 8, and bam, it will work well for all scenarios.
No. It seems you have no idea what's really wrong with BD. I'm sorry, but it's just funny that you are showing your 'ignorance' in an AnandTech Forums topic on Steamroller, as it was AnandTech that recently published an article about the most important changes in Steamroller, thus revealing the weakest points of Bulldozer (and Piledriver)... The weakest one being the shared instruction decoders! Having dedicated decoders for each integer core allows for up to 30% IPC improvement! There are a few other changes as well. (No, the latency of the caches won't change much. Lower latency wouldn't hurt, but it isn't that important.)

Here is the article:
AMD's Steamroller Detailed: 3rd Generation Bulldozer Core

Look at an 8350 as a quad core with an AMD form of HT, and it actually looks half decent.

Yes, that's true. BD was also a marketing failure...