First Steamroller processor core exposure


Ventanni

Golden Member
Jul 25, 2011
1,432
142
106
Isn't experience and fine-tuning on new nodes something that AMD is good at, though? No doubt we'll see slightly lower clocks with Steamroller, given that it's on a new process while GF's 32nm SOI process is probably quite mature, but that doesn't mean 28nm bulk is going to mean <3 GHz clock speeds. That'd be a nightmare of a press release...
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
I have just completed giving this thread a fairly good scrubbing. To those of you who reported posts rather than replying to them or otherwise acting out, thank you very much. To those of you who received infractions and vacations, you've earned them.
-ViRGE
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
Isn't experience and fine-tuning on new nodes something that AMD is good at, though? No doubt we'll see slightly lower clocks with Steamroller, given that it's on a new process while GF's 32nm SOI process is probably quite mature, but that doesn't mean 28nm bulk is going to mean <3 GHz clock speeds. That'd be a nightmare of a press release...
You know what is a nightmare? Finding numbers that fit into 998.4 and 1049.x. (x being 5 to 9)

Server parts go by base frequency and desktop parts go by boost frequency.

Berlin: 998.4 GFlops
Kaveri: 1049.9 GFlops

Opteron 6386 SE: 179.2 GFlops ÷ 140 Watts = 1.28 GFlops/Watt; 1.28 GFlops/Watt * 7.8x improvement = 9.984 GFlops/Watt; 9.984 GFlops/Watt * 100 Watts = 998.4 GFlops
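
The perf-per-watt chain above can be re-run as a small sketch. The numbers are the ones quoted in the post; the function and variable names are made up for illustration, and the 100 W target is the post's assumption, not a confirmed spec.

```python
# Re-running the GFlops/Watt scaling from the post.
# All names here are illustrative; the inputs are the post's figures.

def scaled_gflops(base_gflops, base_watts, factor, target_watts):
    """Scale a baseline GFlops/Watt figure by `factor`, then apply a power budget."""
    base_efficiency = base_gflops / base_watts      # GFlops per Watt
    target_efficiency = base_efficiency * factor    # improved GFlops per Watt
    return target_efficiency * target_watts         # GFlops at the target power

opteron_gflops = 179.2   # Opteron 6386 SE figure used in the post
opteron_watts = 140.0    # Opteron 6386 SE power figure used in the post
improvement = 7.8        # GFlops/Watt multiplier used in the post
berlin_watts = 100.0     # power budget assumed in the post

berlin_gflops = scaled_gflops(opteron_gflops, opteron_watts, improvement, berlin_watts)
print(f"{berlin_gflops:.1f} GFlops")
```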

So far, for Berlin, which was REALLY EASY to do:
CPU: 2.4 GHz
GPU: 0.9 GHz

(2.4 GHz * 2 Modules * 16 Flops)->76.8
+
(0.9 GHz * 512 ALUs * 2 Flops)->921.6
= 998.4 GFlops

So, for the base clocks for Kaveri/Berlin it must be 2.4 GHz for CPU and 0.9 GHz for GPU.
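
As a rough sketch of the arithmetic above: the per-clock figures (16 Flops per module, 2 Flops per ALU) and the unit counts (2 modules, 512 ALUs) are taken from the post, and the function name and parameter names are hypothetical.

```python
# Peak-GFlops estimate using the post's assumptions:
# 2 modules at 16 Flops/clock each, 512 GPU ALUs at 2 Flops/clock each.

def peak_gflops(cpu_ghz, gpu_ghz, modules=2, flops_per_module=16,
                gpu_alus=512, flops_per_alu=2):
    """Return (cpu_gflops, gpu_gflops, total_gflops) for the given clocks in GHz."""
    cpu = cpu_ghz * modules * flops_per_module
    gpu = gpu_ghz * gpu_alus * flops_per_alu
    return cpu, gpu, cpu + gpu

cpu, gpu, total = peak_gflops(2.4, 0.9)
print(f"CPU {cpu:.1f} + GPU {gpu:.1f} = {total:.1f} GFlops")
```

Note that the GPU term dominates: at these clocks the CPU contributes under 8% of the total, so the result is much more sensitive to the GPU clock guess than to the CPU one.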

Knowing that the GFlops figures in the PDF slides for Llano/Trinity/Richland/Kaveri are heavily rounded, I set out to find the boost clocks and ultimately failed! So here is a really bull explanation of why I think the GPU clock is what it is.

Trinity - 800 MHz
Richland - 844 MHz
Berlin GPU Base - 900 MHz
Berlin GPU Boost - 944 MHz
Yep, only a 100+ MHz boost, and the expectation that the GPU may actually have a boost clock.

Kaveri Base: 2.4 GHz CPU/0.9 GHz GPU -> 998.4 GFlops
Kaveri Boost: 2.6 GHz CPU/0.944 GHz GPU -> 1049.9 GFlops
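
To show why the boost clocks are hard to pin down from the total alone, here is a small brute-force sketch. It assumes the same per-clock figures as the Berlin math (2 modules * 16 Flops, 512 ALUs * 2 Flops); the search ranges and step sizes (100 MHz CPU steps, 1 MHz GPU steps) are arbitrary choices for illustration, not anything from the slide.

```python
# Enumerate CPU/GPU clock pairs whose peak total lands in the 1049.5-1049.9 range.
# Ranges and step sizes are arbitrary assumptions for this sketch.

def peak_gflops_mhz(cpu_mhz, gpu_mhz, modules=2, gpu_alus=512):
    """Peak GFlops from clocks in MHz, using 16 Flops/module and 2 Flops/ALU per clock."""
    return (cpu_mhz * modules * 16 + gpu_mhz * gpu_alus * 2) / 1000.0

def matching_clocks(low=1049.5, high=1050.0):
    """Return (cpu_mhz, gpu_mhz, gflops) tuples with low <= gflops < high."""
    matches = []
    for cpu_mhz in range(2400, 4100, 100):      # 2.4 GHz to 4.0 GHz
        for gpu_mhz in range(900, 1001):        # 900 MHz to 1000 MHz
            total = peak_gflops_mhz(cpu_mhz, gpu_mhz)
            if low <= total < high:
                matches.append((cpu_mhz, gpu_mhz, round(total, 1)))
    return matches

for cpu_mhz, gpu_mhz, total in matching_clocks():
    print(f"{cpu_mhz} MHz CPU / {gpu_mhz} MHz GPU -> {total} GFlops")
```

The 2.6 GHz / 944 MHz pair is among the results, but so is 2.4 GHz / 950 MHz, so the total by itself does not single out one combination.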

https://semiaccurate.com/assets/uploads/2013/06/AMD-Berlin-Slide.png
^-- what I am basing everything on.

I'm 100% certain that the GFlops/Watt calculation is dealing with SPEC2006, not theoretical max GFlops.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Desktop SKUs are usually clocked higher than server, at least for AMD.
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
The German Tom's Hardware reports on the leaked die photo. This seems to confirm that it is not a fake.

The part that most fascinated me was:

Die einzelnen Module können nun bis zu vier Threads gleichzeitig ausführen.
[The individual modules can now execute up to four threads simultaneously.]
This agrees with the analysis/speculation of some posters here. It is interesting to me for two reasons:

  • Kaveri 2M/4C will match the eight threads in consoles.
  • Kaveri 2M/4C will match the eight-thread i7s from Intel.

Up to now I had considered Kaveri 4C as a competitor of the Haswell i5, and the formerly rumoured 6C Kaveri version as competing with the i7. But now that the 6C version has been eliminated from the 2013 roadmap, I am considering that Kaveri 4C will be aimed at competing with the i7. In fact, the i7-4770K is an 848 GFLOP chip, and a recent Kaveri presentation (slide image: 04.jpg) promises Kaveri will be a 1.05 TFLOP chip.

The server roadmap now makes more sense as well, with 8C Piledriver-based Opteron chips being replaced by 4C Steamroller-based Berlin chips.

http://www.tomshardware.de/amd-steamroller,news-249250.html

Your thoughts?
 

insertcarehere

Senior member
Jan 17, 2013
712
701
136
That will require a huge (>50%) IPC increase for 2M4T Kaveri just to catch up to Haswell in single-threaded performance, let alone account for Hyper-Threading on the i7s. Unless AMD straight up redid the whole architecture, in which case it's not really based on Bulldozer anymore, I would say this is very unlikely.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
I don't think GPU compute is comprehensive enough to fully compete against faster CPUs, not unless you run specific applications that will gain full benefit.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
My German is rusty, but the translation seems correct. However, what's being proposed doesn't make a lot of sense. Each module is already oversubscribed; it only has duplicate integer resources. So the decoder frontend and FPU in particular can already be overtaxed by 2 threads. Consequently, I'm not sure why AMD would add SMT on top of CMT, since it would further oversubscribe the hardware. I have to seriously wonder if something got lost along the way, or if someone is being fed bad information (Tom.de's record on rumors is not very good; they'll publish just about anything).

In any case, because a module's resources are already oversubscribed, I would not expect to see the same gains that we've seen on SNB/HSW architectures from HT. Not that HT adds a lot in the first place, but AMD's CMT architecture means there's even less underutilized hardware to tap into via SMT. You should pick up some performance, but you wouldn't expect to see i5->i7-like gains in this scenario.
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
So far, for Berlin, which was REALLY EASY to do:
CPU: 2.4 GHz
GPU: 0.9 GHz

(2.4 GHz * 2 Modules * 16 Flops)->76.8
+
(0.9 GHz * 512 ALUs * 2 Flops)->921.6
= 998.4 GFlops

So, for the base clocks for Kaveri/Berlin it must be 2.4 GHz for CPU and 0.9 GHz for GPU.

Therefore 4 SR cores have 2x the theoretical performance of 4 PD cores. No?
 

ShadowVVL

Senior member
May 1, 2010
758
0
71
Steamroller being 2x faster seems like hype.
I am expecting at best 15-28% faster and about 20-30% better performance per watt for the chip.

I am wondering: has anyone heard whether Steamroller will be on a new socket?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
Steamroller being 2x faster seems like hype.
Only if clock to clock.
I am wondering: has anyone heard whether Steamroller will be on a new socket?
Only on the server side, where there will be a new platform, eventually.

AM3+ = DEAD/DECOMMISSIONED
C32 = DEAD/DECOMMISSIONED
G34 = GC34
New Socket = GC36

GC34 = 1974 pins, Quad-Channel, CPU only. No MCM.
GC36 = 2000+ pins, Octo-Channel, CPU/APU. MCM only on CPUs.
GC36 -> "Hawaii" GPU was said to be an APU that will be on it.
 

NTMBK

Lifer
Nov 14, 2011
10,448
5,831
136
My German is rusty, but the translation seems correct. However, what's being proposed doesn't make a lot of sense. Each module is already oversubscribed; it only has duplicate integer resources. So the decoder frontend and FPU in particular can already be overtaxed by 2 threads. Consequently, I'm not sure why AMD would add SMT on top of CMT, since it would further oversubscribe the hardware. I have to seriously wonder if something got lost along the way, or if someone is being fed bad information (Tom.de's record on rumors is not very good; they'll publish just about anything).

In any case, because a module's resources are already oversubscribed, I would not expect to see the same gains that we've seen on SNB/HSW architectures from HT. Not that HT adds a lot in the first place, but AMD's CMT architecture means there's even less underutilized hardware to tap into via SMT. You should pick up some performance, but you wouldn't expect to see i5->i7-like gains in this scenario.

It kind of makes sense for the die shot seen earlier in this thread, which appeared to have 4 x 128-bit FMACs in a single module. That sort of doubling up of the FP hardware could make SMT worthwhile. However, I remain unconvinced that it is a die shot of a Steamroller module; there have been so many slides showing SR modules with 2 x 128-bit FMACs. Three options:

1. It's a fake
2. It's Excavator
3. It's a massively overhauled version of Kaveri, with little resemblance to what it used to be
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
We seem pretty sure Steamroller will be on the FM2+ socket, but what about the AM3+ socket? What concerns me is whether AMD's decision to release the Centurion on the AM3+ socket signals an internal decision to break with the AM3+ socket for Steamroller.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
We seem pretty sure Steamroller will be on the FM2+ socket, but what about the AM3+ socket? What concerns me is whether AMD's decision to release the Centurion on the AM3+ socket signals an internal decision to break with the AM3+ socket for Steamroller.

Seems AM3+ is dead, future-wise. The future is 2M APUs.

The Centurion looks like a last-ditch effort to get some PR before it's over. That's also why you (most likely) won't see it outside special OEM builds.
 

NTMBK

Lifer
Nov 14, 2011
10,448
5,831
136
We seem pretty sure Steamroller will be on the FM2+ socket, but what about the AM3+ socket? What concerns me is whether AMD's decision to release the Centurion on the AM3+ socket signals an internal decision to break with the AM3+ socket for Steamroller.

AMD's roadmaps show only 32nm Warsaw (i.e. Piledriver-based, or another tweak of it) for their big Opterons. The only Steamroller parts they have announced are the Berlin APUs, which are just rebranded Kaveri. The big Opterons share the same die as the FX series, so if there were a new AM3+ FX Steamroller coming, we would have seen it in their Opteron roadmap. The fact that they are bringing out another 32nm part for those platforms indicates to me that the 4-module Steamroller is dead.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
The Centurion looks like a last-ditch effort to get some PR before it's over. That's also why you (most likely) won't see it outside special OEM builds.

The FX-9590 will be sold at retail as well; the price will not be for the faint-hearted :rolleyes:
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
AMD's roadmaps show only 32nm Warsaw (i.e. Piledriver-based, or another tweak of it) for their big Opterons. The only Steamroller parts they have announced are the Berlin APUs, which are just rebranded Kaveri. The big Opterons share the same die as the FX series, so if there were a new AM3+ FX Steamroller coming, we would have seen it in their Opteron roadmap. The fact that they are bringing out another 32nm part for those platforms indicates to me that the 4-module Steamroller is dead.

That's been my feeling for some time. I'm not surprised, though. In fairness, Intel's socket 1155 only supported 2 generations, Sandy and Ivy.
 

NTMBK

Lifer
Nov 14, 2011
10,448
5,831
136
Or it will not be produced at 28nm.

Yikes, I hope that GloFo's 28nm isn't bad enough to push them to release it on 32nm. :\ Or do you think they're holding out for TSMC 20nm? Either way, porting it to a new process will be a pretty big job.