Thoughts, Rumors, or Specs of AMD fx series steamroller cpu


kernelc

Member
Aug 4, 2011
77
0
66
www.ilsistemista.net
I suspect that's part of the reason for large L2$ for Bulldozer. The P4 had a large L2$, but no L3$.

Yes, I think so. In a bandwidth-constrained scenario (such as the current Trinity, a dual-module part plus iGPU) a large cache surely helps.

Llano also had twice the L2 of the Athlon II X4 for this very reason.

P4 needed a large L2 cache for two reasons:
1) its very high branch misprediction penalty
2) its uop replay system, which depends heavily on a fast, large L2.

Thanks.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I may be confusing the Pentium/486 launch with the Pentium II launch. But nobody was a big fan of those original Pentiums as I recall. :) And everyone was very gung-ho about the high-clocked 486 clones AMD was putting out.

Anyway, the general pattern is of the first launch of a new microarchitecture being underwhelming.

Oh, yeah, I just realized that what you were/are thinking of is probably the Pentium Pro (which was then reworked for the PII and PIII and Core).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
They were initially underwhelming though. The original Pentium clocked way too low, and didn't really take off until it was paired with faster memory and FSB. Pentium Pro was good in 32-bit Windows NT environments, not so much with 16-bit desktop code, which was still common even with Windows 95. Pentium II fixed the weaknesses of the Pentium Pro.
 

kernelc

Member
Aug 4, 2011
77
0
66
www.ilsistemista.net
Question to kernelc: is code that fills up the WCC too quickly prevalent enough to be a large concern or is it more of a worst case scenario?

Hi,
store code is generally bursty: you have many stores, then nothing for a relatively long time, then many more stores.

Anyway, 4KB of continuous stores is quite rare in general code, so the WCC can work more or less well in the general case. However, the heaviest programs are generally store-intensive and, considering how small the WCC is and that it is shared at the module level (between two cores), it can sometimes really fill up. To make things worse, the low L2 bandwidth means that flushing the WCC takes some time.

Testing BD with a very small program whose data never exceeds the WCC size would be interesting: if performance in this kind of program is OK, we will have found what hampers Bulldozer the most ;)

Unfortunately I don't have a Bulldozer sample: it would be very interesting to profile it :|
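
To make the reasoning above a bit more concrete, here is a rough toy model in Python, not a measurement and not a description of the real hardware. It treats the WCC as a small buffer that bursty stores fill and that drains toward L2 at a limited rate. The 64-line capacity assumes 4KB of WCC split into 64-byte lines; the drain rate (one line every 4 cycles), the one-new-line-per-store worst case, and the name `simulate_wcc` are all assumptions chosen purely for illustration.

```python
def simulate_wcc(bursts, capacity_lines=64, drain_interval=4):
    """Toy model of a small write-coalescing buffer in front of a slow L2.

    bursts: list of (n_stores, idle_cycles) pairs describing bursty store
            traffic. Every store is assumed to touch a new line (worst
            case, no coalescing at all).
    capacity_lines: buffer size in lines (assumed 4KB / 64-byte lines).
    drain_interval: one line is written back to L2 every this many cycles
                    (a hypothetical figure, not a measured one).
    Returns the number of cycles a store had to wait on a full buffer.
    """
    occupancy = 0      # lines currently held in the buffer
    cycle = 0
    stall_cycles = 0

    def tick():
        # Advance one cycle and drain one line if the L2 port is free.
        nonlocal occupancy, cycle
        cycle += 1
        if cycle % drain_interval == 0 and occupancy > 0:
            occupancy -= 1

    for n_stores, idle_cycles in bursts:
        pending = n_stores
        while pending > 0:
            tick()
            if occupancy < capacity_lines:
                occupancy += 1      # store accepted into the buffer
                pending -= 1
            else:
                stall_cycles += 1   # buffer full: the store must wait
        for _ in range(idle_cycles):
            tick()                  # quiet period: the buffer only drains

    return stall_cycles


# Short bursts with long gaps versus one long run of stores.
print("short bursts:", simulate_wcc([(32, 1000)] * 10))
print("long burst:  ", simulate_wcc([(256, 1000)]))
```

In this model, short bursts separated by long idle periods never fill the buffer, while a long run of stores fills it and then proceeds only at the drain rate. That is the same qualitative behaviour described above; the actual numbers would have to come from profiling real hardware.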

Regards.
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,736
155
106
AMD might as well just skip piledriver
bring on the steamroller !

That's what this thread and the promise of 10% more performance make me think, for some reason ...
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
I keep going back to this and still think the L1$ is too small. Well, either the L1$ or the WCC or both. A slow L2 is a given in a deeper-pipeline, higher-clocking architecture, so why not mask it by increasing the size of the L1$? WCC? It seems to me they exacerbated the L2 issues by cutting down the L1$/WCC. Granted, given higher clockspeeds we would likely not even be mentioning the L2 cache speeds.
It's a matter of perspective. Server workloads, outside of HPC, tend to be many very small operations. Looking over a small segment of a string here, re-arranging a portion of a tree there. The total data set for most procedures can usually fit in a very small cache, provided that the prefetchers into that cache are giving good enough accuracy.

On the desktop side, you're often dealing with GUIs whose address space is a complete mess, game entities where you'll need to process tens to hundreds of KB of one or two data structures at a time, work on whole video frames or images, or at least significant tiles or slices of them, and so on, and much of it has poor regularity. I'd like to see some attempts to delve into it, because it's not going to be easy to test, but I don't think the 2-way L1I, nor 16KB L1D, are doing desktops any favors, when they're backed by such a slow cache. I think Intel's use of a small fast L2 and very large L3 fits much better for desktops and notebooks.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
On the desktop side, you're often dealing with GUIs whose address space is a complete mess, game entities where you'll need to process tens to hundreds of KB of one or two data structures at a time, work on whole video frames or images, or at least significant tiles or slices of them, and so on, and much of it has poor regularity. I'd like to see some attempts to delve into it, because it's not going to be easy to test, but I don't think the 2-way L1I, nor 16KB L1D, are doing desktops any favors, when they're backed by such a slow cache. I think Intel's use of a small fast L2 and very large L3 fits much better for desktops and notebooks.

I agree. The server market is still important to AMD, and I think this may be a large part of this design decision. Larger L1$ and a smaller faster L2$ in Steamroller would be a boon to desktop apps, but AMD is either unable to or unwilling to differentiate its designs between APU, DT CPU & SERVER; it's probably the former, especially after the recent cuts. In actuality, the APU is already a different die. Why not double the L2$ on the APU, but use smaller faster L2$ on the server/DT die?
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
I agree. The server market is still important to AMD, and I think this may be a large part of this design decision. Larger L1$ and a smaller faster L2$ in Steamroller would be a boon to desktop apps, but AMD is either unable to or unwilling to differentiate its designs between APU, DT CPU & SERVER; it's probably the former, especially after the recent cuts. In actuality, the APU is already a different die. Why not double the L2$ on the APU, but use smaller faster L2$ on the server/DT die?

AMD might _think_ the server segment is important. But with a market share of ~5%, it's a place they should stop focusing on, because they aren't good at it. It's the worst segment of all for AMD by a huge margin. And it affects their desktop and laptop segments negatively because they move the failed concepts from the server segment down. Not to mention that I bet the segment is just a pure financial loss.

I'm not sure who even buys Opterons besides Cray anymore.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
AMD might _think_ the server segment is important. But with a market share of 4-5%, it's a place they should stop focusing on, because they aren't good at it. It's the worst segment of all for AMD. And it affects their desktop and laptop segments negatively because they move the failed concepts from the server segment down.

I'm not sure who even buys Opterons besides Cray.

Just like AMD's professional video card lines (FireGL and compute cards), it's a way more profitable market per sale. Ideally, you get scale by selling your stuff for whatever you can on the consumer market, and then reap the benefits on the highly profitable professional market.

Plus, AMD had some good advantages in the server market until Nehalem, and was even pretty competitive up until bulldozer.

That said, I think more focus on the consumer market would have benefited them greatly, even if it was just with more appropriate product placement.
I.e., focus on low-power designs for laptops and ultraportables, and claim that highly profitable segment from Intel. AMD never put much effort into low-power designs, even when they've had good power efficiency.
Instead of Phenom being a quad core, it should have been a native dual-core design similar to the Core 2 Duo, with no L3 cache. In many real-world situations, they probably could have had equal or better performance than what they offered, but with vastly lower costs per chip.
Really, AMD shouldn't have launched consumer lines with L3 cache at all (at least not on the low end); it bloats die size considerably and AMD's designs still underperform.
Instead of Bulldozer, a die-shrunk and up-clocked Phenom II based design would have been better. A Bulldozer module is nearly 2x the size of a Phenom II core, i.e., their innovative design doesn't seem to have saved them very much die space.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
The problem is that FireGL/FirePro are the exact same silicon. Server CPUs are not: they require not only a new design, new masks, and a new fab run, but also a lot of costly validation, a different platform, different chipsets, support, and so on. Unless you go the single-socket route to save a bit on some of the parts.

I agree fully on the shrunk Phenom. AMD just seems to live in some alternate world. Plus their history is really against them. It's full of arrogance and hubris. We all remember the P4s on fire and the fire truck. Yet it's the other way around today. Phenom as 50% faster than Core 2. Bulldozer's IPC increase, faster than everything by 50% as well... the FX brand! And before all those we can talk about the "native quad-core", "true dual-core", etc., only to see the same company jump on the MCM wagon itself.

It's as if they hope people don't remember. The entire brand has just been slammed into the ground.

And we can see it with JF still being employed at AMD, even though they had mass layoffs. Something is terribly wrong.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
And we can see it with JF still being employed at AMD, even though they had mass layoffs. Something is terribly wrong.

I agree a lot of mistakes were made at AMD, but I think it is in poor taste to wish that JFAMD had been fired. I do not think he lied to us at any point, and even if he did it was on his own time. He was just a guy blowing smoke in a forum.

Anyway, I don't think Bulldozer is a lost cause. When BD was released I was sure Trinity would be a disaster, but it is actually pretty good. Steamroller + better adoption rates of newer instructions could show AMD in a much better light.
 

Charles Kozierok

Elite Member
May 14, 2012
6,762
1
0
I agree fully on the shrunk Phenom. AMD just seems to live in some alternate world. Plus their history is really against them. It's full of arrogance and hubris.

Good gravy -- you act like this is some sort of personal conspiracy against you.

If they could make the chip of your dreams, they would. For you, Bulldozer is a disappointment -- for them it is much, much more serious.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
Good gravy -- you act like this is some sort of personal conspiracy against you.

If they could make the chip of your dreams, they would. For you, Bulldozer is a disappointment -- for them it is much, much more serious.

Why shouldn't I complain as a consumer? Should I just roll over on my back and be a yes-man to the companies?

That's exactly the mentality that lets AMD keep lying to people, as it has since 2005. Simply because they can get away with it.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
I agree a lot of mistakes were made at AMD, but I think it is in poor taste to wish that JFAMD had been fired. I do not think he lied to us at any point, and even if he did it was on his own time. He was just a guy blowing smoke in a forum.

Anyway, I don't think Bulldozer is a lost cause. When BD was released I was sure Trinity would be a disaster, but it is actually pretty good. Steamroller + better adoption rates of newer instructions could show AMD in a much better light.

Why is it in poor taste? Would you want incompetent people who give your company bad PR kept employed? And how many bad ones should they keep? Enough for the ship to sink? I think it would be better for both him and AMD if he started to work elsewhere.

You have more faith in BD than I do. Plus the new instructions won't change anything. It's a bit like the magic patch... remember? The one that changed to Win8. And now it's... I don't know... Win9? Any day now.
You just can't fix something that is fundamentally broken. It's like saying the P4 could be fixed. BD is a speed-racer design. The only way to fix it is to clock it up massively. But then you hit that other famous problem, like the P4.
 

Charles Kozierok

Elite Member
May 14, 2012
6,762
1
0
Why shouldn't I complain as a consumer? Should I just roll over on my back and be a yes-man to the companies?

You can print up picket signs for all I care. But the stuff about "hubris" and "arrogance" makes no sense. This is a very tough industry and I assure you they are doing their best.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
You can print up picket signs for all I care. But the stuff about "hubris" and "arrogance" makes no sense. This is a very tough industry and I assure you they are doing their best.

It doesn't make sense? Do you remember the official AMD PDF with P4 jokes? Do you remember "Barcelona is 50% faster than Core 2"? Or do you remember "Bulldozer will be 50% faster than Nehalem"? The 9xx chipset scandal? All said to you with a straight face until release day, so everyone with preorders sat with the duds.
Or the native dual-core, native quad-core arguments?

Maybe you are too young to remember those. :)

But the point is, don't let a company keep pissing on your back while you basically thank them for it.
Had AMD just been honest all the way through, it would be a whole other matter.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
A Bulldozer module is nearly 2x the size of a Phenom II core, ie, their innovative design doesn't seem to have saved them very much die space.

It's actually smaller and they've saved some space. Iirc ~20%? That's not a bad tradeoff at all.

The people thumping Deneb/Thuban should realize that there's a BD-based Piledriver chip that already does everything better than Thuban/Deneb do on the same node. Trinity > Llano, and that's with smaller cores and more space dedicated to the uncore.

I hate CMT and Bulldozer as much as the next guy (probably even more actually) but you have to give credit where credit is due: AMD actually pulled a rabbit out of the hat with Trinity and managed to increase performance and perf-per-watt from Llano on the same node more than Intel managed with a die shrink. Looking at the Trinity benchmarks it's hard to think that that's a Bulldozer derived core. Maybe they're not as stupid as we all thought they were.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
It's actually smaller and they've saved some space. Iirc ~20%? That's not a bad tradeoff at all.

The people thumping Deneb/Thuban should realize that there's a BD-based Piledriver chip that already does everything better than Thuban/Deneb do on the same node. Trinity > Llano, and that's with smaller cores and more space dedicated to the uncore.

I hate CMT and Bulldozer as much as the next guy (probably even more actually) but you have to give credit where credit is due: AMD actually pulled a rabbit out of the hat with Trinity and managed to increase performance and perf-per-watt from Llano on the same node more than Intel managed with a die shrink. Looking at the Trinity benchmarks it's hard to think that that's a Bulldozer derived core. Maybe they're not as stupid as we all thought they were.

We've only seen a prototype laptop, haven't we? Try looking at the same review with the prototype Llano. Things turned out differently in real-world products.

I am looking forward to the desktop parts so we can get better data.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
How is the CPU performance going to be any different? The only hindrance that can possibly change the benchmarks at all would be the RAM used and that depends on the individual laptops and even then only the GPU side.

That prototype was a Lenovo build, iirc. All they did was change the stickers.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
How is the CPU performance going to be any different? The only hindrance that can possibly change the benchmarks at all would be the RAM used and that depends on the individual laptops and even then only the GPU side.

That prototype was a Lenovo build, iirc. All they did was change the stickers.

It was a prototype laptop. And I doubt it was Lenovo.

When you talk about performance/watt improvements, that's not exactly a good way of measuring them. You use two completely different laptops and draw a conclusion from that. Furthermore, they can be more optimized and have better, lower-power components. That's why I prefer real products.

http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope/4

Just look at the AMD Llano vs. the OEM Llano. Yes, it's a smaller battery in the OEM. But the results... a huge difference apart from the H.264 test. More than the battery would account for.
http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope/8
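
One way to check whether a gap is "more than the battery would account for" is to normalize runtime by battery capacity before comparing. A minimal sketch in Python; the helper name `minutes_per_wh` and every number below are hypothetical placeholders, not figures from the linked review:

```python
def minutes_per_wh(runtime_minutes: float, battery_wh: float) -> float:
    """Battery life normalized by capacity: minutes of runtime per watt-hour."""
    if battery_wh <= 0:
        raise ValueError("battery capacity must be positive")
    return runtime_minutes / battery_wh


# Hypothetical placeholder values for two laptops built around the same chip.
reference = minutes_per_wh(runtime_minutes=300, battery_wh=60)
oem = minutes_per_wh(runtime_minutes=180, battery_wh=48)

print(f"reference: {reference:.2f} min/Wh, OEM: {oem:.2f} min/Wh")
```

If the normalized figures still differ a lot, the gap comes from the rest of the platform (screen, components, tuning) rather than from battery size alone.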

Or just look at the Intel laptops. You couldn't make a single claim based on that illogical spread either.

If desktop Trinity gives the same results, then hurrah, it's all fine and dandy. But I just wish to see them first, because I know how much laptop results can be manipulated, both for good and bad.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Comparing the production laptops is always better because you take into account the various things that surround the CPU, but that review was strictly based on the CPU itself. You can't claim that AMD somehow supersized its chip.

The reason the Llano has such shitty perf-per-watt there is that Llanos have always been overvolted. That's why K10stat + Llano was considered (and still is) an awesome buy: the Llanos overclocked and underclocked like mad. People have been getting close to 3GHz with their A8-3500s on all 4 cores! At stock voltage you can easily get the chips over 2.4GHz on all 4 cores. Even so, the Llanos weren't able to match SB in power consumption, yet Trinity does that and even surpasses them both.

The video playback results are also off because Trinity uses a bit more of the CPU/GPU when playing video. Take a look at the review.

That was still a very crappy review. Techreport did a far better one despite not having access to an i5 SB to compare it to. If you read them both carefully you'll see AMD increased die size slightly (all of it to GPU), decreased the core size and yet managed to get really good GPU performance and better CPU performance from smaller cores (or module in this case). So yes, they increased performance with Trinity>Llano in every way imaginable and they did this on the same node. If you look at % of improvement AMD actually did more on the same node than Intel managed using a die shrink (and that includes power consumption figures which should be the most shocking).

The only benchmark where Llano compares to Trinity is in perf-per-watt in video playback and in cinebench multi-threaded due to the spread of the workload. It doesn't lose either of those but actually ties.

It isn't all dandy. I expect Trinity for desktop to under-impress because of how well the IB chips perform and the diminishing returns as you expand the chip's TDP and clocks as was the case with Llano.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
Why is it in poor taste? Would you want incompetent people who give your company bad PR kept employed? And how many bad ones should they keep? Enough for the ship to sink? I think it would be better for both him and AMD if he started to work elsewhere.

You have more faith in BD than I do. Plus the new instructions won't change anything. It's a bit like the magic patch... remember? The one that changed to Win8. And now it's... I don't know... Win9? Any day now.
You just can't fix something that is fundamentally broken. It's like saying the P4 could be fixed. BD is a speed-racer design. The only way to fix it is to clock it up massively. But then you hit that other famous problem, like the P4.


I know the general consensus is that BD sucks, modules suck, and AMD sucks, but I think the general idea of modules makes a ton of sense and the current implementation is just poor. Clearly modules aren't great for peak performance, but for perf/watt they are the way to go. The idea behind the module configuration is to achieve very high resource utilization.

I'm hoping that they improve on the caches in Vishera (wouldn't be surprising, given that it will be based off of server designs), and then for steamroller improve the frontend.