Official Improvements of Piledriver Cores.


pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Considering that clock for clock Ivy Bridge > Sandy Bridge > Westmere > Nehalem > Yorkfield > Kentsfield > Phenom II > Bulldozer, I think it is reasonable to lowball expectations from AMD on the high end at this point.

I think it's going to depend on how much they managed to bump up the IPC. Trinity desktop parts clock between 3GHz and the low 4GHz range, so if they managed to make up the ~10% IPC gap created by the Bulldozer slip-up, it's reasonable to conclude that any clock speed increase will translate directly into a performance increase. Llano tops out at 3GHz while Trinity tops out at 4.2GHz (desktop parts, both of them), so the performance gain may in fact match that 20-30% claim, with the +/-10% spread coming from IPC, a term that has to cover all general workloads. The question is whether they managed to get back the 10% of IPC lost going from Phenom II/Thuban to Bulldozer. The leaked benchmarks showed a mixed bag, with IPC equal, slightly higher or slightly lower, and they only represent a very narrow workload (I believe it was Folding?). Needless to say it's difficult to estimate, but we've got the clock speeds, so all we're missing is how much they've improved per clock.
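To put rough numbers on that, here is a minimal sketch assuming performance scales linearly with IPC times clock speed, which ignores memory bottlenecks and turbo behavior. The clock speeds are the ones quoted in this post; the two IPC scenarios are hypothetical, not measurements.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float) -> float:
    """First-order model: performance scales with IPC times clock speed."""
    return ipc_ratio * clock_ratio


LLANO_GHZ = 3.0    # top desktop Llano clock quoted in the post
TRINITY_GHZ = 4.2  # top desktop Trinity clock quoted in the post

clock_ratio = TRINITY_GHZ / LLANO_GHZ

# Hypothetical IPC scenarios relative to Llano's cores (assumptions only).
scenarios = {
    "IPC still 10% behind": 0.90,
    "IPC parity": 1.00,
}

for label, ipc_ratio in scenarios.items():
    gain = relative_performance(ipc_ratio, clock_ratio) - 1.0
    print(f"{label}: {gain:+.0%} vs. Llano")
```

Under these assumptions the 10%-behind case lands at roughly a 26% gain, inside the 20-30% range, while IPC parity would push it to about 40%.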

Bulldozer was initially designed to maintain IPC while boosting clock speeds by 30%. Neither goal was attained and we received a Faildozer instead. This time around, though, AMD has managed to clock an absolutely massive APU (there's a monstrous GPU inside it, probably taking up more than half the die space) and still hit 4.2GHz at the same 100W TDP as Llano. Clearly they've improved perf-per-watt quite a bit compared to Bulldozer. It's not easy getting 384 VLIW4 shaders clocked at 800MHz and 2 modules at 4.2GHz under 100W, so that's incredibly impressive.

Ultimately it boils down to IPC gains. What I'd also like to see in the Trinity APU is some significant IMC gains, because relying on DRAM frequency is pointless if you're unable to saturate the 128-bit bus that the GPU and CPU share. Some of the other improvements should help the CPU, though I'm not sure how much that matters as far as GPU performance goes.

Improved prefetch, branch prediction and L2 efficiency are huge and mean AMD went right to the heart of Bulldozer's problems. That's a laundry list of changes on those slides, and there's likely far more that we're not seeing. Clearly they realized they fucked up big time and absolutely had to address the issues immediately. This bodes well for Vishera, for which AMD has already hinted at further improvements and not just the addition of L3 cache, so there's a good chance Vishera looks even better than what we see in the Piledriver cores in Trinity. Ultimately the real question is how much they've improved upon Bulldozer and how much they've chipped away at that -10% IPC. At the moment that's something we don't know :/

The clock speeds for Vishera are also an unknown at this point. It's quite clear that Trinity is clock speed limited by those absolutely massive VLIW4 shaders and that its clock speeds could likely have exceeded the mid 4GHz range had it not been TDP limited and forced to lug around those extra transistors. There's also the issue of diminishing returns past 4GHz on the licensed RCM tech that AMD applied in Piledriver. So how high can the Piledriver cores really clock? And how high can they clock with that 8MB L3 cache? What's the OC ceiling? Who knows, but what seems certain is that the mobile parts will clock quite close to their desktop brethren at lower TDP, aggressive turbo included. Good news for laptops but conceivably not-as-impressive news for Vishera.

I wouldn't expect Sandy levels of performance but maybe what Bulldozer was gunning for originally, somewhere in that 10-15% better than Nehalem range, mainly attributed to clock speed differences.
 

formulav8

Diamond Member
Sep 18, 2000
7,004
523
126
They went to the worst areas and it should show quite well. Also, the desktop clock speeds are supposed to be at the original speeds they were gunning for at release (4.2-4.4GHz). In the end, it will be a nicely done CPU that will still do everything nearly every person needs a CPU to do.
 

Axon

Platinum Member
Sep 25, 2003
2,541
1
76
I'd love to see a nice offering from AMD. And if they can get nice iGPU numbers, that would be well in their favor. With that said, I'm not holding my breath.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
I'd love to see a nice offering from AMD. And if they can get nice iGPU numbers, that would be well in their favor. With that said, I'm not holding my breath.

I don't think anyone doubts the GPU's performance here. I do think the VLIW4 architecture will pull through for them, and with those 800MHz clock speeds it should perform 30-50% better than the previous Llano, which Intel still hasn't caught up to. If AMD can do one thing extremely well, it's GPUs. The CPU performance, OTOH, is still an uncertainty.
 

DeeDot78

Member
Jul 29, 2011
77
0
0
AMD-Piledriver-vs-Bulldozer.png


Comparison with Bulldozer from my other post. If the Piledriver cores are improved further in the desktop version, we may have something to look forward to.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
That graph is confusing. The Llano has 2 scores while the Trinity part has 3? 4? The 4500M Trinity varies from 2573 to 2370 in per-GHz integer score, a 10% gap. The two Llanos differ by almost 200pts in per-GHz integer score as well. The FX-4100 scores look to be roughly equal while the 8150's differ by the better part of 200.

It's hard to get a read on the performance judging by that graph. If the per-clock scores differ by as much as 10% in that specific workload, then it's nearly impossible to call.
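As a quick check on that spread, here is a small sketch using the two Trinity per-GHz scores read off the graph. The helper names and the idea of treating the run-to-run spread as a noise floor are my own assumptions, not anything from the benchmark itself.

```python
def spread_pct(high: float, low: float) -> float:
    """Gap between two scores as a percentage of the lower one."""
    return (high - low) / low * 100.0


def is_distinguishable(claimed_gain_pct: float, noise_pct: float) -> bool:
    """A claimed gain only means something if it exceeds the run-to-run spread."""
    return claimed_gain_pct > noise_pct


# Per-GHz integer scores for the same Trinity part, as read from the graph.
TRINITY_HIGH = 2573
TRINITY_LOW = 2370

noise = spread_pct(TRINITY_HIGH, TRINITY_LOW)
print(f"Run-to-run spread: {noise:.1f}%")
print("5% IPC gain visible?", is_distinguishable(5.0, noise))
```

The spread works out to a bit under 9% of the lower score, so an IPC change of a few percent either way would be lost in the noise of this data set.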
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
nEO_IMG_trinity-5.jpg.jpeg


Anyone who followed the Piledriver white papers knows that Trinity is incomplete... the real Piledriver will be out with Vishera... and no, it's not just L3 cache

^This. This is AMD's "Tick/Tock": implement part of the uArch improvements on mobile and the full range of improvements on desktop. Depending on how much GF's 32nm SHP node has improved, clocks could be in for a nice boost as well.
 

Hatisherrif

Senior member
May 10, 2009
226
0
0
Considering that clock for clock Ivy Bridge > Sandy Bridge > Westmere > Nehalem > Yorkfield > Kentsfield > Phenom II > Bulldozer, I think it is reasonable to lowball expectations from AMD on the high end at this point.

Kentsfield? You sir, you are unbelievable.
 

dguy6789

Diamond Member
Dec 9, 2002
8,558
3
76
Kentsfield? You sir, you are unbelievable.

lol I'm unbelievable? How about AMD? Yes, the 65nm Core 2 Quads are faster clock for clock than Phenom II. It's just been so long since anyone discussed Kentsfield that it seems to have been forgotten. AMD fanboys are hilarious
 

Rvenger

Elite Member, Super Moderator, Video Cards
Apr 6, 2004
6,283
5
81
AMD fanboys are hilarious


Funny that Hatisherrif has a 2500K in his sig. I'm also with Hatisherrif on this: Kentsfield is a little far-fetched.



Llano A6 trades blows with a Q6600 and the A8 beats it, and Llano is slower than a Phenom II due to the lack of L3 cache. Clock for clock, Kentsfield is about on par with Llano. Phenom II and Bulldozer are faster for sure.

http://www.anandtech.com/bench/Product/53?vs=399
 

nehalem256

Lifer
Apr 13, 2012
15,669
8
0
Funny that Hatisherrif has a 2500K in his sig. I'm also with Hatisherrif on this: Kentsfield is a little far-fetched.



Llano A6 trades blows with a Q6600 and the A8 beats it, and Llano is slower than a Phenom II due to the lack of L3 cache. Clock for clock, Kentsfield is about on par with Llano. Phenom II and Bulldozer are faster for sure.

http://www.anandtech.com/bench/Product/53?vs=399

But the A6 and A8 enjoy a clock advantage of 200-500MHz. And Llano has double the L2 cache of Phenom II.
 

blckgrffn

Diamond Member
May 1, 2003
9,686
4,345
136
www.teamjuchems.com
The benches I just ran last week give the nod to Deneb clock for clock vs Kentsfield. It is pretty close, and having the best chipset for C2Q (which came out well after Kentsfield was available) may turn the tide.
 

nismotigerwvu

Golden Member
May 13, 2004
1,568
33
91
4 int cores/2 fp units maybe? Still doesn't roll off the tongue I guess. Does it even matter how many fp units the CPU has if it performs similarly? Does lacking a L3 cache make Trinity a 8/3 core cpu? Is each Bulldozer int core only a 2/3 of a core compared to Phenom that has 3 ALUs and 3 AGUs?

You see, the reason that Bulldozer tanked was because it only had 8*2/3*1/2=16/6 cores! And it's even worse for Trinity, 4*2/3*1/2*2/3=16/18, that's not even a full core! What is AMD thinking? D:

I guess it's too late for me to avoid a silly discussion about what is a core? (a miserable pile of secrets!)

This post is full of win. For what profit is it to a man if he gains the world, and loses his own soul.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
Oh dear lord, stop arguing over the IPC of Conroe vs. Stars. It's been common knowledge for some time that Conroe is slightly faster clock for clock than Stars, by 3%.

So yes, both at 3.2GHz, a Core 2 Quad Q6xxx would just edge out a Phenom II X4. This alone should show you just how far behind AMD is in per-core performance since Stars' IPC is around 10% higher than that of Bulldozer.
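Chaining those two figures gives a rough idea of how much extra clock Bulldozer needs to break even. This is only a sketch: it assumes the 3% and 10% figures from this post multiply cleanly and that performance is simply IPC times clock, and `clock_to_match` is a helper name made up for illustration.

```python
def clock_to_match(target_ghz: float, ipc_advantage: float) -> float:
    """Clock the slower-per-clock chip needs to match the faster one.

    ipc_advantage is the faster chip's IPC divided by the slower chip's IPC.
    """
    return target_ghz * ipc_advantage


CONROE_OVER_STARS = 1.03     # "by 3%", from the post
STARS_OVER_BULLDOZER = 1.10  # "around 10%", from the post

# Per-clock advantages compound multiplicatively along the chain.
conroe_over_bulldozer = CONROE_OVER_STARS * STARS_OVER_BULLDOZER

bd_vs_stars = clock_to_match(3.2, STARS_OVER_BULLDOZER)
bd_vs_conroe = clock_to_match(3.2, conroe_over_bulldozer)

print(f"Conroe IPC advantage over Bulldozer: {conroe_over_bulldozer - 1:.1%}")
print(f"Bulldozer clock to match a 3.2GHz Phenom II: {bd_vs_stars:.2f}GHz")
print(f"Bulldozer clock to match a 3.2GHz Core 2 Quad: {bd_vs_conroe:.2f}GHz")
```

On those assumptions Bulldozer would need roughly 3.5GHz to match a 3.2GHz Phenom II and a little over 3.6GHz to match a 3.2GHz Core 2 Quad.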

overall.png
 

Subyman

Moderator, VC&G Forum
Mar 18, 2005
7,876
32
86
I would love to see AMD back in action. I've been a fan of the Bulldozer vision, but its clock-for-clock performance kept it from reaching the goal. Hopefully Piledriver will fix some of those shortcomings, but I don't see this being a legitimate choice over IB except for very particular needs. Can't wait to see more.
 

Makaveli

Diamond Member
Feb 8, 2002
4,976
1,571
136
I wouldn't expect Sandy levels of performance but maybe what Bulldozer was gunning for originally, somewhere in that 10-15% better than Nehalem range, mainly attributed to clock speed differences.

SB = 10-15% better than Nehalem so you are expecting it.

Wow... I always thought C2Q == PII

I doubt Piledriver will beat Nehalem. The only reason Bulldozer ever looked competitive vs Nehalem is the 920's lousy stock clock speed.

This^^
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
Oh dear lord, stop arguing over the IPC of Conroe vs. Stars. It's been common knowledge for some time that Conroe is slightly faster clock for clock than Stars, by 3%.

So yes, both at 3.2GHz, a Core 2 Quad Q6xxx would just edge out a Phenom II X4. This alone should show you just how far behind AMD is in per-core performance since Stars' IPC is around 10% higher than that of Bulldozer.

overall.png

I'd love to see this graph updated with Bulldozer, Atom, and Bobcat.

(I'd also like to see an Atom and Bobcat @ 3GHz).

I wonder how close Bobcat IPC would be to Phenom II and BD...
 

nyker96

Diamond Member
Apr 19, 2005
5,630
2
81
Why is the pic sooooo large? I had to shrink the fonts to see it. Now my browser font is like 4pt.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
I'd love to see this graph updated with Bulldozer, Atom, and Bobcat.

(I'd also like to see an Atom and Bobcat @ 3GHz).

I wonder how close Bobcat IPC would be to Phenom II and BD...

You mean how far, right?

K8 has 10% higher IPC than Bobcat. Add 10% to the time it took for the X2 6000+ to complete and there's your answer.

Atom... well, Bobcat has about 40% higher IPC than Atom. Yes, Atom is that slow.
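Here's that estimate as a tiny sketch. It assumes all three chips run at the same clock and that completion time scales inversely with IPC. The 100-second baseline is a made-up placeholder, not the X2 6000+'s actual result from the graph.

```python
def scaled_time(reference_seconds: float, ipc_advantage: float) -> float:
    """Time for a lower-IPC chip at the same clock.

    ipc_advantage is the reference chip's IPC divided by the slower chip's IPC.
    """
    return reference_seconds * ipc_advantage


K8_OVER_BOBCAT = 1.10    # "K8 has 10% higher IPC than Bobcat"
BOBCAT_OVER_ATOM = 1.40  # "Bobcat has about 40% higher IPC than Atom"

k8_seconds = 100.0  # hypothetical placeholder for the X2 6000+ time

bobcat_seconds = scaled_time(k8_seconds, K8_OVER_BOBCAT)
atom_seconds = scaled_time(bobcat_seconds, BOBCAT_OVER_ATOM)

print(f"Bobcat at the same clock: {bobcat_seconds:.0f}s")
print(f"Atom at the same clock: {atom_seconds:.0f}s")
```

So for every 100 seconds the K8 takes, Bobcat would take about 110 and Atom about 154 at the same clock, under these assumptions.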