First Steamroller processor core exposure

Uh, links to both resumes and the 15h manual?
I posted the link to the news about the new SOG manual update in the Kabini thread. Unfortunately it got swallowed up by spam and off-topic bickering. The manual itself is public and anyone can download it 😉.

Since I'm such a nice guy, here it is (news about it at p3dnow):
http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi?category=1&id=1368122313

Old manual (prior to May 2013 update):
file.php


New manual (May update):
file.php


Direct link to the manual @ AMD's website.
You want page 591; it's the table above 😉.

A simple Google search will get you the other information you seek:
http://ca.linkedin.com/pub/james-fry/49/216/a56

James Fry's Experience


SOC Director for Kaveri and Kaveri2.0 APUs

AMD


Public Company; 10,001+ employees; AMD; Semiconductors industry
September 2010 – Present (2 years 9 months)
Managed a large project team spanning the globe on a multi-year project with multi-billion dollars of expected revenue
 
LOL, Excavator is still on 28nm but they went ahead and doubled FPU resources?

What power envelope are they targeting? And they'd better be able to charge high prices for these, since the dies will be huge.
 
New manual (May update):
file.php



Piledriver can execute 256-bit AVX, but to do so it must first break each instruction into two 128-bit operations and then handle the two halves.

It seems that SR can execute a 256-bit instruction in one pass, without
the need to break it up, which reduces the number of cycles needed.
But this doesn't imply more FP units, just better use of the existing ones.
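
To picture the difference described above, here is a toy Python sketch. It models a 256-bit vector as 8 float lanes and only counts internal operations. The op counts are illustrative assumptions, not measured cycle counts, and both function names are made up for the example.

```python
def add_256_cracked(a, b):
    """Add two 8-lane vectors as two 128-bit (4-lane) halves, one after the other."""
    assert len(a) == len(b) == 8
    lo = [x + y for x, y in zip(a[:4], b[:4])]   # first 128-bit op
    hi = [x + y for x, y in zip(a[4:], b[4:])]   # second 128-bit op
    return lo + hi, 2                            # result, internal ops issued


def add_256_single(a, b):
    """Add two 8-lane vectors as one 256-bit op (hypothetical unsplit handling)."""
    assert len(a) == len(b) == 8
    return [x + y for x, y in zip(a, b)], 1      # result, internal ops issued


a = [1.0] * 8
b = [float(i) for i in range(8)]
cracked_result, cracked_ops = add_256_cracked(a, b)
single_result, single_ops = add_256_single(a, b)
print(cracked_result == single_result, cracked_ops, single_ops)
```

Both paths give the same result; only the number of internal operations differs, which is the point being argued, and it says nothing about how many FP units exist.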
 
LOL, Excavator is still on 28nm but they went ahead and doubled FPU resources?

What power envelope are they targeting? And they'd better be able to charge high prices for these, since the dies will be huge.

Please, you are talking about things you simply do not know (none of us do, except AMD themselves).

1st we do not know if the screenshot is genuine or not.

2nd, what is its die size? It's not done on 32nm, so the only option is 28nm or below. We know AMD has the means to reduce die area and power by using specialized tools, plus the new core has the advantage of a smaller node (presumably 28nm).
Present module die area with L2 cache (2MB) is ~30.9mm^2. Given that AMD can reduce the size with the HDL library by up to 30% (their claim for an example done on an FP unit in BD), plus the advantage of smaller structures from the 32->28nm shrink, it's possible the new module won't be larger than a BD/PD module (if that). The cache will scale nicely with a shrink, so a 4C/2M part with no L3 and a (beefed-up GCN) iGPU may end up being just somewhat bigger than Richland @ 32nm is. Something in the range of 265-280mm^2 for a 2M+512SP APU with 4MB of L2 cache (total).

3rd, if the shot is real, we do not know what the module in the OP is. Is it SR, SR+? An EX module?
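
As a rough check on the arithmetic in the 2nd point, here is a minimal Python sketch. It assumes ideal quadratic area scaling from 32nm to 28nm and applies the 30% HDL-library saving to the whole module. Neither assumption is confirmed (the 30% claim covered one FP unit), and `scaled_area` is a hypothetical helper for illustration only.

```python
def scaled_area(area_mm2, library_saving, old_node_nm, new_node_nm):
    """Estimate area after a library-driven saving and an ideal node shrink.

    Assumption: area scales with the square of the node ratio, which real
    processes rarely achieve.
    """
    ideal_shrink = (new_node_nm / old_node_nm) ** 2
    return area_mm2 * (1.0 - library_saving) * ideal_shrink


bd_module_mm2 = 30.9                      # module + 2MB L2, figure from the post
estimate = scaled_area(bd_module_mm2, 0.30, 32, 28)
headroom = bd_module_mm2 / estimate       # how much logic growth fits in the old footprint

print(f"shrunk module: {estimate:.2f} mm^2, growth headroom: {headroom:.2f}x")
```

Under those optimistic assumptions the existing module would shrink to about 16.6mm^2, leaving room for a bit under 1.9x growth before it exceeds today's 30.9mm^2 footprint.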
 
Piledriver can execute 256-bit AVX, but to do so it must first break each instruction into two 128-bit operations and then handle the two halves.

It seems that SR can execute a 256-bit instruction in one pass, without
the need to break it up, which reduces the number of cycles needed.
But this doesn't imply more FP units, just better use of the existing ones.

That's much more likely considering AMD has officially stated that Steamroller packs 2x128bit FMACs.
 
What power envelope are they targeting? And they'd better be able to charge high prices for these, since the dies will be huge.

Even if the die were 50% larger than BD/PD, it would still get roughly two-thirds of the dies per wafer. That's ~460mm^2 at around 120 DPW.

The question is would you pay $300 for a CPU that will crush even an 8-thread Haswell (at an obvious power consumption penalty), because that would get AMD the same money as they are making on the $200 8350 now (assuming a 50% larger die).

AMD has a lot of wafers to use up at Global F(l)oundries, so there's no better way to do it than to go large and take back the performance crown. We're just guessing, but if you were AMD what would you have done?
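
The dies-per-wafer figures above can be sanity-checked with the common gross-die approximation. This is a minimal sketch, assuming a 300mm wafer, no yield loss, and no scribe lines or edge exclusion. `dies_per_wafer` is a hypothetical helper, and the baseline area is simply 460mm^2 divided by 1.5.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Gross dies per wafer: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return gross - edge_loss


big_die = dies_per_wafer(460.0)            # the hypothetical 50% larger die
baseline = dies_per_wafer(460.0 / 1.5)     # assumed BD/PD-sized die
ratio = big_die / baseline

print(f"big die: {big_die:.0f} DPW, baseline: {baseline:.0f} DPW, ratio: {ratio:.2f}")
```

With these assumptions the larger die lands a little above 120 gross dies per wafer, and the ratio comes out at about 0.64, close to the two-thirds quoted. Yield is ignored here, so the real cost gap could be wider.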
 
Even if the die were 50% larger than BD/PD, it would still get roughly two-thirds of the dies per wafer. That's ~460mm^2 at around 120 DPW.

The question is would you pay $300 for a CPU that will crush even an 8-thread Haswell (at an obvious power consumption penalty), because that would get AMD the same money as they are making on the $200 8350 now (assuming a 50% larger die).

AMD has a lot of wafers to use up at Global F(l)oundries, so there's no better way to do it than to go large and take back the performance crown. We're just guessing, but if you were AMD what would you have done?

You assume GF's yields on the 28nm process will be good. Remember how craptacular yields were on 32nm at first? So 50% larger die doesn't mean that cost per die scales up proportionally...it could be much worse.

I would much rather they move their GPUs and "Cat" cores to GloFo...not sure why they're having their stuff built at TSMC other than that GloFo is a bunch of hypesters that don't yet have a viable 28nm process.
 
Not only that, what's the TDP of such a big chip? 200W?

it isn't that much bigger. look at the L2 array on the right-hand edge and compare that to bulldozer: relative to the L2 it's maybe 10-15% bigger. factor in say 10% smaller from 28nm and it's almost the same size.


all the expert naysayers, explain how you'd go about faking something like this; you're obviously so across the uarch that you can point out all the BS in this fake.

i think the question is what is it, not is it fake. it's not SR as detailed @ hotchips, that's for sure.


It seems that SR can execute a 256-bit instruction in one pass, without
the need to break it up, which reduces the number of cycles needed.
But this doesn't imply more FP units, just better use of the existing ones.

to me that seems unlikely: it increases FPU scheduler complexity for only a 1-cycle gain, adds no MT benefit (could actually be a penalty) and almost no ST benefit (1 cycle in a 22-stage pipeline...)
 
it isn't that much bigger. look at the L2 array on the right-hand edge and compare that to bulldozer: relative to the L2 it's maybe 10-15% bigger. factor in say 10% smaller from 28nm and it's almost the same size.


all the expert naysayers, explain how you'd go about faking something like this; you're obviously so across the uarch that you can point out all the BS in this fake.

i think the question is what is it, not is it fake. it's not SR as detailed @ hotchips, that's for sure.


to me that seems unlikely: it increases FPU scheduler complexity for only a 1-cycle gain, adds no MT benefit (could actually be a penalty) and almost no ST benefit (1 cycle in a 22-stage pipeline...)

My guess is that this is Excavator.
 
My guess is that this is Excavator.

Yet if they have Excavator in this state (a taped-out module), why even bother with Steamroller? This floor plan still looks "Bulldozer era". Excavator was supposed to bring a much more automated floor plan, thus units would be less symmetrical (look at Bobcat/Jaguar). Everything is still symmetric, and the FPUs, ALUs and AGLUs all look like Bulldozer/Piledriver.

Maybe with the less aggressive node transitions this is a halfway point between what SR and EX were going to be.

If this is the module that's getting released as Kaveri then I have to buy one regardless of performance, because they have obviously given it a red hot go :biggrin:.
 
Yet if they have Excavator in this state (a taped-out module), why even bother with Steamroller? This floor plan still looks "Bulldozer era". Excavator was supposed to bring a much more automated floor plan, thus units would be less symmetrical (look at Bobcat/Jaguar). Everything is still symmetric, and the FPUs, ALUs and AGLUs all look like Bulldozer/Piledriver.

Maybe with the less aggressive node transitions this is a halfway point between what SR and EX were going to be.

If this is the module that's getting released as Kaveri then I have to buy one regardless of performance, because they have obviously given it a red hot go :biggrin:.

Erm, if this is scheduled for late 2014/early 2015, then they had better have something taped out by now...
 
Erm, if this is scheduled for late 2014/early 2015, then they had better have something taped out by now...

No way! To get to this point they have already completely finalized the uarch, finalized a floor plan and sent it off to the fab; samples have come back, and then someone has decided to give out a die shot.

Show me one other die shot that came out 18-24 months ahead of the chip! Where is the Steamroller die shot, and where is the more automated design of Excavator? There hasn't even been a single detail of Excavator given, yet we have a complete die shot.

Sorry that doesn't add up.
 
No way! To get to this point they have already completely finalized the uarch, finalized a floor plan and sent it off to the fab; samples have come back, and then someone has decided to give out a die shot.

Show me one other die shot that came out 18-24 months ahead of the chip! Where is the Steamroller die shot, and where is the more automated design of Excavator? There hasn't even been a single detail of Excavator given, yet we have a complete die shot.

Sorry that doesn't add up.

Then what is this die shot of? Steamroller has already been detailed and the much wider FPU/VPU doesn't match the uarch description.
 
The question is would you pay $300 for a CPU that will crush even an 8-thread Haswell (at an obvious power consumption penalty), because that would get AMD the same money as they are making on the $200 8350 now (assuming a 50% larger die).
No, it actually gets AMD more money. It may be the same amount of $ per mm^2 (assuming good yields), but when the actual variable cost of a die is measured in the dozens of dollars and you are getting paid $200+ vs. $100+ for it, it is better to get the $200. Your fixed costs, the R&D and the salaries, are going to be the same regardless.
 
Then what is this die shot of? Steamroller has already been detailed and the much wider FPU/VPU doesn't match the uarch description.

I know it doesn't. I don't know what it is for sure. If it is Excavator, it looks like it's 28nm, not 22/20nm, but without some kind of external interface it's very hard to tell. I think the question to ask is what happened to the Steamroller core that was detailed at Hot Chips, and where is it? Could it have already been dead before that presentation was given :awe:.

Do you really expect AMD to be on 28nm in 2015?
 
Then what is this die shot of? Steamroller has already been detailed and the much wider FPU/VPU doesn't match the uarch description.

Wouldn't be the first time AMD published a design detail that was totally wrong.

Like the correction moving Zambezi from 2b to 1.2b transistors.

If you ask me, the timing of the SOG update fits Steamroller better than Piledriver. Make that change too far in advance and it'll just confuse people. There were some Steamroller-related updates to GCC not that long ago.
 
Wouldn't be the first time AMD published a design detail that was totally wrong.

Like the correction moving Zambezi from 2b to 1.2b transistors.

If you ask me, the timing of the SOG update fits Steamroller better than Piledriver. Make that change too far in advance and it'll just confuse people. There were some Steamroller-related updates to GCC not that long ago.

So you think each SR module gets 2x256bit FMACs?
 
So you think each SR module gets 2x256bit FMACs?


It could be 4x128. They look very much like the existing ones (higher/lower order bits are split); there is just double the amount of them. There also looks to be double the amount of register/queue for the FPU, so maybe some kind of coarse-grained separation between them can take pressure off needing so many read/write ports on one register/queue with a 4x128-bit design.
 
It could be 4x128. They look very much like the existing ones (higher/lower order bits are split); there is just double the amount of them. There also looks to be double the amount of register/queue for the FPU, so maybe some kind of coarse-grained separation between them can take pressure off needing so many read/write ports on one register/queue with a 4x128-bit design.

Interesting. AMD may be trying to gain back share in HPC by beefing up the FPU. Should be interesting to see Steamroller vs. Haswell or, probably more likely, Steamroller vs. Broadwell.
 