Kaveri APU Features 20% CPU and 30% GPU Performance Uplift Over Richland

csbin · Nov 25, 2013

We already know that AMD's Steamroller core architecture will be at the heart of the new APU. The Kaveri APU would feature up to four SteamrollerB cores with a total of 4 MB L2 cache. This would be backed up by the new temperature smart turbo core technology (Turbo Core 3.0), which would provide the APU with boosted clock speeds depending on temperature and power levels. We really don't know how this would work until AMD sends us a Kaveri sample to test, but that isn't going to happen until January 2014, so it's best to wait for more information.
The part you want to learn is that the leaked slide mentions that the SteamrollerB and GCN cores provide a 20% CPU-side performance uplift and a 30% GPU-side improvement over the Richland APU. This is the same trend seen in AMD's high-performance core roadmap, which showed an annual 10-15% IPC improvement over predecessors. Since Richland is based on the Piledriver core, which itself is an improved Bulldozer variant, the new Kaveri APU's 20% CPU-side improvement was pretty much expected.
As for the graphics side, the 30% increase due to the new Graphics Core Next architecture seems pretty good, but I still think there's room for more improvement over the older VLIW-based IGP. The 512 stream processors are going to bring performance close to a Radeon HD 7750, which would allow users to play Battlefield 4 at 1080p without a discrete GPU. It is also mentioned that the GPU would be available in various configurations since it belongs to the Radeon R7 and Radeon R5 families.

Read more: http://wccftech.com/amd-kaveri-apu-...land-platform-details-unveiled/#ixzz2lf07gjWG

NTMBK · Nov 25, 2013

Interesting that WCCFTech have gone back on their claims of 4 cores/8 threads.

(sic)Klown12 · Nov 25, 2013

NTMBK said:
Interesting that WCCFTech have gone back on their claims of 4 cores/8 threads.

I'm guessing in their rush to get the information out that they mistook the four blocks in the diagram as modules. These types of sites seem to value speed over accuracy in order to drive up hits.

inf64 · Nov 25, 2013

This has already been posted and discussed in detail in another topic.

frozentundra123456 · Nov 25, 2013

This is already discussed in the other Kaveri thread. It is qualified by the usual "up to" market speak, and the footnote says it is based only on estimates of the silicon design, not real performance figures.

Enigmoid · Nov 25, 2013

Great, up to 20% and up to 30% mean absolutely nothing when they are 1) based on pre-silicon figures and 2) 'up to' values.

AMD has consistently overestimated performance gains for Bulldozer, Trinity, and Richland. I'm only expecting around 50-75% of what they state on that slide on average in real world situations.

SiliconWars · Nov 25, 2013

There's something weird about it anyway. Obviously they know exactly what the performance of final silicon is, as they are shipping it.

sm625 · Nov 25, 2013

Up to 20% more performance. But if the clocks are 10% lower then you're only looking at a 10% gain. And that could be entirely due to the enhanced turbo. If it is temperature based then it should turbo much higher, since Richland runs pretty cool under a single-core load.

Ancalagon44 · Nov 25, 2013

Could they mean 20% enhanced performance taking into account a clock speed drop?

I.e., IPC gains could be greater than 20%, but the overall performance increase is 20% because of a clock speed drop? Let's hope so.

Ventanni · Nov 25, 2013

So the first slide is for mobile Kaveri chips and the second chart is for desktop Kaveri parts. Are we sure we want to be making mobile claims for the desktop world since we know how AMD's clockspeeds affect their IPC?

inf64 · Nov 25, 2013

Well, first of all, this is a mobile parts slide. The clock speeds we have seen on the ES side are 1.8 GHz base and 2.3 GHz turbo, which is rather low. Richland for example is clocking at 2.5/3.5 GHz, so in order for Kaveri to outperform it on the x86 side there are only two possibilities:
1) Mobile Kaveri's CPU clocks at launch will be roughly comparable to Richland's or ~10% lower. IPC will need to be, in that case, at least 20% higher (or ~30% higher if clocks are ~10% lower, roughly).
2) Mobile Kaveri's CPU clocks at launch will be a lot lower than Richland's and somewhat higher than what the ES have now. This leaves us in the 2-2.5 GHz base-turbo range. This would imply that in order to outperform Richland AT ALL, the SR core needs a rather impossible 40-50% IPC increase, which is very, very unlikely to happen. SR is basically a radically reworked BD/PD done on a new process node, so such gains just from microarchitectural changes are out of the question.

Out of the two scenarios above only one is probable IMO, and that is number 1). BTW, when AMD talks performance they mean total performance vs. Richland, i.e. what you will see comparing end products. They didn't mention clock speeds as those are already figured into this projection. Now the projection could be flawed for all we know, since the footnote says it's not based on real silicon. But AMD should know by now how good (or bad) Kaveri really is, so there is no point in them making claims that will turn out false in one month's time.

I for one think the claims may even be conservative and on the lower end of the scale, since for example AMD has not yet finalized the Turbo Core clock speed for desktop Kaveri. This tells me they are not yet sure what the maximum performance is, as it all depends on where the mean clock speed will land with the new Turbo Core functionality (higher Turbo means higher average clock and thus higher total performance). The same goes for mobile parts.
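The arithmetic behind the two scenarios can be sketched as follows. This is only a rough model that assumes total CPU performance scales as IPC times clock speed, ignoring memory effects and how long a chip actually stays at its turbo clock. The function name `required_ipc_uplift` is made up for illustration; the clock figures are the ES and Richland numbers quoted in this post.

```python
def required_ipc_uplift(perf_ratio: float, clock_ratio: float) -> float:
    """Fractional IPC gain needed to reach perf_ratio at a given clock_ratio.

    Assumption (simplified model): performance = IPC * clock.
    Both ratios are new/old, e.g. clock_ratio=0.9 means 10% lower clocks.
    """
    return perf_ratio / clock_ratio - 1.0


# Scenario 1: clocks ~10% lower than Richland, 20% total uplift claimed.
s1 = required_ipc_uplift(perf_ratio=1.20, clock_ratio=0.90)

# Scenario 2: ES clocks (1.8/2.3 GHz) vs. Richland (2.5/3.5 GHz),
# merely matching Richland (perf_ratio = 1.0).
base = required_ipc_uplift(perf_ratio=1.0, clock_ratio=1.8 / 2.5)
turbo = required_ipc_uplift(perf_ratio=1.0, clock_ratio=2.3 / 3.5)

for label, value in [("scenario 1", s1), ("ES base", base), ("ES turbo", turbo)]:
    print(f"{label}: IPC must rise by about {value:.0%}")
```

Under that model, scenario 1 needs roughly a one-third IPC gain, while matching Richland at the ES clocks needs roughly 39% (base) to 52% (turbo), which is in line with the 40-50% range mentioned above.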

NostaSeronx · Nov 25, 2013

inf64 said:
Well, first of all, this is a mobile parts slide. The clock speeds we have seen on the ES side are 1.8 GHz base and 2.3 GHz turbo, which is rather low. Richland for example is clocking at 2.5/3.5 GHz, so in order for Kaveri to outperform it on the x86 side there are only two possibilities

With the A1 samples, they are actually quite close to the production samples.

Kaveri Mobile:
1.4 GHz base/1.8 GHz average clock/2.3 GHz Max TDP clock

Kaveri Desktop:
3.5 GHz base/3.7 GHz average clock/3.8 GHz Max TDP clock

--

I do enjoy that people don't use the actual slide, instead use my edited one.

inf64 · Nov 25, 2013

Kaveri 7850K does not have a 3.5 GHz base clock. AMD lists it as a 3.7 GHz base clock on their own slide from APU13...
And we have no idea what the base/turbo clocks are for the mobile parts that will launch. In order to outperform Richland at all with those pathetically low clocks, SR-B needs to have between 40% and 50% higher IPC. Sorry, this won't happen in this universe, maybe in some other parallel one.

NostaSeronx · Nov 25, 2013

inf64 said:
In order to outperform Richland at all with those pathetically low clocks SR-B needs to have between 40% and 50% higher IPC.

You also have to point out the clocks for SteamrollerB are half the clock rate shown. Due to the whole 2 Inst. Fetch/2 Inst. Decode/4 ALUs/4 AGUs/8 FMACs.

So, it is more in the line of 2.2x to 2.5x, actually.

Kaveri, Carrizo, and Basilisk will not have configurations like; 1M/2C/4T - 2M/4C/8T - 3M/6C/12T - etc.

inf64 · Nov 25, 2013

Come on SeronX, come back from the dreamland. Clocks are what they are, there are no double pumped units in Kaveri... Also there are no 4 ALUs, 4 AGUs and 8(????) FMACs per core. Where did you even get 8 FMAC units from?? That's just ludicrous, sorry.

Kaveri is a 2ALU/2AGU per core design and one module shares 2x128bit FMA units (+ one 128bit MMX unit, down 1 unit from BD/PD as the FMACs picked up some of the functionality). There are no double pumped exec. units in the SR-B core either. The new decoders run every other cycle as per AMD.

NostaSeronx · Nov 25, 2013

http://imgur.com/HDjJSET

Just in case you forgot it is labelled Steamroller Module, or in detail AMD Steamroller Module.

NTMBK · Nov 25, 2013

NostaSeronx said:
http://imgur.com/HDjJSET

That's not a Steamroller module.

inf64 · Nov 25, 2013

So wait, the title of that image is "Steamroller module" and that is what actually makes it a Steamroller module? That image is from a dubious source and does not correlate with what AMD disclosed about the SR core one year ago and this month at APU13. Thus the image is something else: either an EX core or a fake.

This is what AMD showed on APU13, a real SR-B:

Granted it's just a diagram of the core blocks but it shows in no uncertain terms:
1) 4 integer pipelines, just like PD/BD. 2ALU+2AGU just as before.
2) 2x128bit FMACs and 1xMMX unit, slightly different from PD/BD, but nowhere do we see 2x256bit FMACs or anything similar to that.

So let us stick with reality and leave fantasy in the dreamland where it belongs.

NaroonGTX · Nov 25, 2013

Due to the whole 2 Inst. Fetch/2 Inst. Decode/4 ALUs/4 AGUs/8 FMACs.

Where did you get this from? Pretty sure SR still has 2x ALUs/AGUs and 2x FMACs per module, just like BD/PD.

NostaSeronx · Nov 25, 2013

inf64 said:
Granted it's just a diagram of the core blocks but it shows in no uncertain terms

Granted it isn't a die shot, you have nothing to prove that module shot isn't Steamroller.

inf64 · Nov 25, 2013

How about AMD stating that the FP unit is 2x128bit? There is your proof. There are no 256bit FP units in SR-B, there is no double pumping of units, there are no 4 ALUs and 4 AGUs. All you have as "proof" is a dubious image of unknown origin. That's not evidence that proves anything, sorry.

NaroonGTX · Nov 25, 2013

Lots of things were doubled-up in that module shot. AMD didn't say anything about things being doubled up at APU13, so it clearly doesn't point to that image being SR-B. We'll find out for sure at CES 2014, but I'm positive it isn't SR-B either.

NostaSeronx · Nov 25, 2013

inf64 said:
All you have as a "proof" is a dubious image of unknown origin.

The die shots for Jaguar(Kabini) and Steamroller(Kaveri) were exposed around the same time frame by AMD.


inf64 · Nov 25, 2013

That cannot be an SR core, since we now know 100% that the SR-B core has 16KB dedicated L1 data caches. The module in this image was dissected by Hans and a few other people, and they stated the data cache structure looked doubled and was now 32KB per core. Clearly this is not a core that is in Kaveri. We have CPU-Z shots and Geekbench entries for Kaveri and they all state a correct L1 cache structure that aligns with what AMD stated: 1x96KB L1 instr. cache and 2x16KB L1 data cache per module. Sorry, the image cannot be of an SR module.

1) CPU-Z shot

2) Geekbench entry for Kaveri

L1 Instruction Cache 96 KB x 2
L1 Data Cache 16 KB x 4
L2 Cache 2048 KB x 2
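
As a quick cross-check of those Geekbench figures against the per-module numbers, here is a minimal sketch. It assumes the chip has two modules (inferred from the "x 2" on the instruction cache and L2 lines); `parse_cache_line` and `MODULES` are names made up for this example.

```python
import re

# Cache lines copied from the Geekbench entry above.
GEEKBENCH_LINES = [
    "L1 Instruction Cache 96 KB x 2",
    "L1 Data Cache 16 KB x 4",
    "L2 Cache 2048 KB x 2",
]

MODULES = 2  # assumption: a two-module part


def parse_cache_line(line: str) -> tuple[str, int, int]:
    """Split '<name> <size> KB x <count>' into (name, size_kb, count)."""
    match = re.fullmatch(r"(.+?) (\d+) KB x (\d+)", line)
    if match is None:
        raise ValueError(f"unrecognized cache line: {line!r}")
    name, size_kb, count = match.groups()
    return name, int(size_kb), int(count)


totals = {}
per_module = {}
for line in GEEKBENCH_LINES:
    name, size_kb, count = parse_cache_line(line)
    totals[name] = size_kb * count              # total across the chip, in KB
    per_module[name] = totals[name] // MODULES  # share per module, in KB

for name in totals:
    print(f"{name}: {totals[name]} KB total, {per_module[name]} KB per module")
```

With two modules this works out to 96 KB of L1 instruction cache and 32 KB (2x16 KB) of L1 data cache per module, plus 4096 KB of L2 in total, consistent with the 4 MB L2 figure in the article.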

PPB · Nov 25, 2013

Either way we are always talking about the same perf ballpark, whether it's the more centered projections or Seronx wackyland's ones.

Actually, I think Seronx's SR version would perform worse, cutting clocks in half in order to double almost everything in the integer/FPU units. Going 2x wide doesn't necessarily mean 2x IPC in real world scenarios.
