Kaveri APU Features 20% CPU and 30% GPU Performance Uplift Over Richland

csbin

Senior member
Feb 4, 2013
908
614
136
AMD-Kaveri-APU-Platform-Details.jpg


kaveri.jpg



We already know that AMD’s Steamroller core architecture will sit at the heart of the new APU. The Kaveri APU will feature up to four SteamrollerB cores with a total of 4 MB of L2 cache. This will be backed by the new temperature smart turbo core technology (Turbo Core 3.0), which boosts the APU’s clock speeds depending on temperature and power levels. We won’t really know how this works until AMD sends us a Kaveri sample to test, and that isn’t going to happen until January 2014, so it’s best to wait for more information.
The key detail is that the leaked slide says the SteamrollerB and GCN cores provide a 20% CPU-side performance uplift and a 30% GPU-side improvement over the Richland APU. This follows the trend in AMD’s high-performance core roadmap, which showed an annual 10-15% IPC improvement over each predecessor. Since Richland is based on the Piledriver core, which is itself an improved Bulldozer variant, the new Kaveri APU’s 20% CPU-side improvement was pretty much expected.
As for the graphics side, the 30% increase from the new Graphics Core Next architecture seems pretty good, though I still think there’s room for more improvement over the older VLIW-based IGP. The 512 stream processors are going to bring performance close to a Radeon HD 7750, which would allow users to play Battlefield 4 at 1080p without a discrete GPU. The slide also mentions that the GPU will be available in various configurations, since it belongs to the Radeon R7 and Radeon R5 families.


Read more: http://wccftech.com/amd-kaveri-apu-...land-platform-details-unveiled/#ixzz2lf07gjWG
 

NTMBK

Lifer
Nov 14, 2011
10,448
5,829
136
Interesting that WCCFTech have gone back on their claims of 4 cores/8 threads.
 

(sic)Klown12

Senior member
Nov 27, 2010
572
0
76
Interesting that WCCFTech have gone back on their claims of 4 cores/8 threads.

I'm guessing that in their rush to get the information out they mistook the four blocks in the diagram for modules. These types of sites seem to value speed over accuracy in order to drive up hits.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
This has already been posted and discussed in detail in another topic.
 
Aug 11, 2008
10,451
642
126
This is already being discussed in the other Kaveri thread. It is qualified by the usual "up to" marketing speak, and the footnote says it is based only on estimates of the silicon design, not real performance figures.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Great. "Up to 20%" and "up to 30%" mean absolutely nothing when the figures are 1) based on pre-silicon estimates and 2) "up to" values.

AMD has consistently overestimated performance gains for Bulldozer, Trinity, and Richland. On average, I'm only expecting around 50-75% of what they state on that slide in real-world situations.
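
To put numbers on that discount, here is a minimal sketch. The 50-75% range is the poster's own guess, not a measured figure, and the variable names are made up for illustration.

```python
# Claimed "up to" gains from the leaked slide (fractions of Richland performance).
claimed_gain = {"CPU": 0.20, "GPU": 0.30}

# Assumption from the post above: only 50-75% of the claim shows up in practice.
realized_low, realized_high = 0.50, 0.75

# Expected real-world gain range per side, as (low, high) fractions.
expected_gain = {
    side: (gain * realized_low, gain * realized_high)
    for side, gain in claimed_gain.items()
}

for side, (low, high) in expected_gain.items():
    print(f"{side}: {100 * low:.1f}% to {100 * high:.1f}% over Richland")
```

Under that guess the CPU side lands at about 10-15% and the GPU side at about 15-22.5%.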
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
There's something weird about it anyway. Obviously they know exactly what the performance of final silicon is, as they are shipping it.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Up to 20% more performance. But if the clocks are 10% lower, then you're only looking at a 10% gain. And that could be entirely due to the enhanced turbo. If it is temperature-based, then it should turbo much higher, since Richland runs pretty cool under a single-core load.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
Could they mean 20% enhanced performance taking into account a clock speed drop?

I.e., IPC gains could be greater than 20%, but the overall performance increase is 20% because of the clock speed drop? Let's hope so.
 

Ventanni

Golden Member
Jul 25, 2011
1,432
142
106
So the first slide is for mobile Kaveri chips and the second chart is for desktop Kaveri parts. Are we sure we want to be making mobile claims for the desktop world since we know how AMD's clockspeeds affect their IPC?
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
Well, first of all, this is a mobile parts slide ;). The clock speeds we have seen on the ES side are 1.8 GHz base and 2.3 GHz turbo, which is rather low. Richland, for example, clocks at 2.5/3.5 GHz, so for Kaveri to outperform it on the x86 side there are only two possibilities:
1) Mobile Kaveri's CPU clocks at launch will be roughly comparable to Richland's, or ~10% lower. In that case IPC will need to be at least 20% higher (or roughly 30% higher if clocks are ~10% lower).
2) Mobile Kaveri's CPU clocks at launch will be a lot lower than Richland's and somewhat higher than what the ES parts run now. That leaves us in the 2-2.5 GHz base-turbo range. This would imply that to outperform Richland AT ALL, the SR core needs a rather impossible 40-50% IPC increase, which is very, very unlikely to happen. SR is basically a radically reworked BD/PD on a new process node, so such gains from microarchitectural changes alone are out of the question.
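
The arithmetic behind the two scenarios can be sketched as follows. This assumes the usual rough model that performance scales as IPC times clock, which is a simplification and not AMD's methodology; the function name is hypothetical.

```python
# Rough model (an assumption): performance ~ IPC x clock frequency.

def required_ipc_ratio(perf_ratio: float, clock_ratio: float) -> float:
    """IPC multiplier needed to reach perf_ratio when clocks scale by clock_ratio."""
    return perf_ratio / clock_ratio

# Scenario 1: 20% faster than Richland at equal clocks, or at clocks 10% lower.
same_clock = required_ipc_ratio(1.20, 1.00)
lower_clock = required_ipc_ratio(1.20, 0.90)

# Scenario 2: merely matching Richland (2.5/3.5 GHz base/turbo)
# with Kaveri at an assumed 2.0/2.5 GHz base/turbo.
match_base = required_ipc_ratio(1.00, 2.0 / 2.5)
match_turbo = required_ipc_ratio(1.00, 2.5 / 3.5)

for label, ratio in [
    ("20% uplift, same clocks", same_clock),
    ("20% uplift, clocks 10% lower", lower_clock),
    ("match Richland at base clocks", match_base),
    ("match Richland at turbo clocks", match_turbo),
]:
    print(f"{label}: {100 * (ratio - 1):.0f}% IPC gain needed")
```

In this model the 10%-lower-clock case needs about 33% more IPC, and just matching Richland turbo-to-turbo at 2.5 GHz already needs 40%, which is where the "rather impossible" figure comes from.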

Of the two scenarios above, only one is probable IMO, and that is number 1). BTW, when AMD talks performance they mean total performance vs. Richland, i.e. what you will see comparing end products. They didn't mention clock speeds because they are already factored into this projection. Now, the projection could be flawed for all we know, since the footnote says it's not based on real silicon. But AMD should know by now how good (or bad) Kaveri really is, so there is no point in them making claims that will turn out false in one month's time ;).

I for one think the claims may even be conservative, on the lower end of the scale, since AMD has not yet finalized the Turbo Core clock speed for desktop Kaveri, for example. This tells me they are not yet sure what the maximum performance is, as it all depends on where the mean clock speed lands with the new Turbo Core functionality (higher turbo means a higher average clock and thus higher total performance). The same goes for mobile parts.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
Well 1st of all this is mobile parts slide ;). The clock speeds we have seen on the ES side are 1.8 GHz base and 2.3 GHz turbo, which is rather low. Richland for example is clocking at 2.5/3.5 GHz so in order for Kaveri to outperform it on x86 side there are only two possibilities
The A1 samples are actually quite close to the production samples.

Kaveri Mobile:
1.4 GHz base/1.8 GHz average clock/2.3 GHz Max TDP clock

Kaveri Desktop:
3.5 GHz base/3.7 GHz average clock/3.8 GHz Max TDP clock

--

I do enjoy that people don't use the actual slide and instead use my edited one.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
The Kaveri 7850K does not have a 3.5 GHz base clock. AMD lists it at 3.7 GHz base on their own slide from APU13...
And we have no idea what the base/turbo clocks are for the mobile parts that will launch. To outperform Richland at all with those pathetically low clocks, SR-B needs between 40% and 50% higher IPC. Sorry, this won't happen in this universe, maybe in some parallel one :D.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
In order to outperform Richland at all with those pathetically low clocks SR-B needs to have between 40% and 50% higher IPC.
You also have to point out that the clocks for SteamrollerB are half the clock rate shown, due to the whole 2 Inst. Fetch/2 Inst. Decode/4 ALUs/4 AGUs/8 FMACs.

So, it is more along the lines of 2.2x to 2.5x, actually.

Kaveri, Carrizo, and Basilisk will not have configurations like 1M/2C/4T, 2M/4C/8T, 3M/6C/12T, etc.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
Come on SeronX, come back from dreamland :). Clocks are what they are; there are no double-pumped units in Kaveri... Also, there are no 4 ALUs, 4 AGUs and 8(????) FMACs per core. Where did you even get 8 FMAC units from?? That's just ludicrous, sorry :(.

Kaveri is a 2 ALU/2 AGU per core design, and one module shares 2x128-bit FMA units (plus one 128-bit MMX unit, down one unit from BD/PD as the FMACs picked up some of the functionality). There are no double-pumped execution units in the SR-B core either. The new decoders run every other cycle, as per AMD.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
So wait, the title of that image is "Steamroller module" and that is what actually makes it a Steamroller module? That image is from a dubious source and does not correlate with what AMD disclosed about the SR core one year ago and this month at APU13. Thus the image is something else: either an EX core or a fake.

This is what AMD showed at APU13, a real SR-B:
4FX4D50.jpg


Granted, it's just a diagram of the core blocks, but it shows in no uncertain terms:
1) 4 integer pipelines, just like PD/BD: 2 ALU + 2 AGU, just as before.
2) 2x128-bit FMACs and 1 MMX unit, slightly different from PD/BD, but nowhere do we see 2x256-bit FMACs or anything similar.

So let us stick with reality and leave fantasy in the dreamland where it belongs ;).
 

NaroonGTX

Member
Nov 6, 2013
106
0
76
Due to the whole 2 Inst. Fetch/2 Inst. Decode/4 ALUs/4 AGUs/8 FMACs.
Where did you get this from? Pretty sure SR still has 2x ALUs/AGUs and 2x FMACs per module, just like BD/PD.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
How about AMD stating that the FP unit is 2x128-bit? There is your proof. There are no 256-bit FP units in SR-B, there is no double pumping of units, and there are no 4 ALUs and 4 AGUs. All you have as "proof" is a dubious image of unknown origin. That's not evidence that proves anything, sorry.
 

NaroonGTX

Member
Nov 6, 2013
106
0
76
Lots of things were doubled-up in that module shot. AMD didn't say anything about things being doubled up at APU13, so it clearly doesn't point to that image being SR-B. We'll find out for sure at CES 2014, but I'm positive it isn't SR-B either.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
That cannot be an SR core, since we now know 100% that the SR-B core has 16KB dedicated L1 data caches. The module in this image was dissected by Hans and a few other people, and they stated that the data cache structure looked doubled and was now 32KB per core. Clearly this is not the core that is in Kaveri. We have CPU-Z shots and Geekbench entries for Kaveri, and they all show the same L1 cache structure, which aligns with what AMD stated: 1x96KB L1 instruction cache and 2x16KB L1 data caches per module. Sorry, the image cannot be of an SR module.

1) CPU-Z shot
900x900px-LL-9933b803_1464715_10151756108733946_1811656005_n.jpeg


2) Geekbench entry for Kaveri
L1 Instruction Cache 96 KB x 2
L1 Data Cache 16 KB x 4
L2 Cache 2048 KB x 2
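
As a quick sanity check, the entry above can be totalled up and compared with the figures quoted in this thread. A minimal sketch: the two-module, four-core layout is an assumption read off the "x 2" and "x 4" counts, and the variable names are made up.

```python
# (size in KB, instance count), copied from the Geekbench entry above.
caches = {
    "L1 instruction": (96, 2),
    "L1 data": (16, 4),
    "L2": (2048, 2),
}

# Total capacity per cache level across the whole chip.
totals_kb = {name: size_kb * count for name, (size_kb, count) in caches.items()}

# Assumption: 2 modules with 2 cores each (4 cores total).
modules, cores = 2, 4
l1i_per_module_kb = totals_kb["L1 instruction"] // modules
l1d_per_core_kb = totals_kb["L1 data"] // cores
l2_total_mb = totals_kb["L2"] / 1024

print(f"L1I per module: {l1i_per_module_kb} KB")
print(f"L1D per core:   {l1d_per_core_kb} KB")
print(f"L2 total:       {l2_total_mb:.0f} MB")
```

That works out to 96 KB of L1 instruction cache per module, 16 KB of L1 data cache per core, and 4 MB of L2 in total, which matches the 4 MB L2 figure in the original article.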
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
Either way, we are always talking about the same performance ballpark, whether it's the more centered projections or Seronx's wackyland ones.

Actually, I think Seronx's SR version would perform worse, cutting clocks in half in order to double almost everything in the integer units and FPUs. Going 2x wide doesn't necessarily mean 2x IPC in real-world scenarios.