Official Improvements of Piledriver Cores.


LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
Easy, Nvidia has far fewer resources than Intel, and Nvidia doesn't have fabs or a process node advantage. Nvidia vs AMD and Intel vs AMD is pretty much apples to oranges.

And that ~7% loss in IPC is a pretty huge deal and won't get resolved until Win8.

Honestly, I think Bulldozer is a much more forward-thinking arch than SB, to the point that I'd call it subjectively "better". Most apps don't use more than 4 cores, and 2 modules seem to scale much better than 2 "full" cores + HT while taking about the same amount of space, allowing for a beefier iGPU. You also have less redundancy, which means fewer idle transistors leaking current. What's more, laptop chips are clocked a lot lower than desktop chips, so the increased pipeline depth is definitely worth it there.

Not really, because Phenom II was still good (which shouldn't surprise you, since it was based on an architecture that, while old, was inherently sound). All AMD needed to do was price it right, and unfortunately they weren't THAT aggressive when it came to that because the platform price difference when choosing between a Core 2 Quad Q9*00 or Q9450 and a Phenom II X4 was very small, less than $50.

Bulldozer isn't a forward-looking architecture. Not now, not ever.
Also, a dual-core with HT from Intel is comparable to or faster than a "quad"-core with CMT from AMD in multi-threaded, and in single-threaded AMD gets slaughtered. From a review comparing the Core i3-2120 and FX-4100 in Linux, which favors the Bulldozer architecture more than Windows does:

The AMD FX-4100 is currently carrying a retail price of about $130 USD while the Intel Core i3 2120 is coming in right now at $140 USD. In most of the Linux tests that were carried out, this lower-end Sandy Bridge CPU blew AMD's FX-4100 out of the water. The i3-2120 operates at just 3.3GHz and is a dual-core part with Hyper Threading while the FX-4100 operates at 3.6/3.8GHz and is a true quad-core.
That alone should tell you the story here: you can't polish a turd. Link to article.
 
Last edited:

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
I still wonder why AMD didn't put Steamroller on its desktop roadmap.

Apparently they're making some massive improvements with Steamroller that will steer more towards IPC and away from Bulldozer's clock speed aims, and likely away from socket compatibility as well.

There was initially some talk of abandoning the two desktop platforms for a single FM2/FM2-like APU platform that would cover both the OEM and desktop segments of the market, which is why we saw more info and slides on server CPUs and APUs than we did on desktop CPUs or even an AM3+ successor, which were essentially devoid of anything after Vishera. I don't think that's true anymore, and they'll likely create another socket after AM3+ for Steamroller. Though, tbh, I'd rather see a 3-module APU than a server-based CPU.

I think that AMD is banking on HSA/GPGPU to cover the gap in FMAC/AVX2, but I'm not sure just what they've got in store for vector int. This would give credibility to the "all APU but server" claims and would mean they'd waste less money by streamlining.
 

bononos

Diamond Member
Aug 21, 2011
3,938
190
106
No, they just need to close the gap with Intel on IPC as much as possible. Have overall performance that's as comparable as possible, and then attack Intel aggressively on bang-for-buck. Again, the resources/R&D argument is an excuse. AMD has no problems competing with NVIDIA when it comes to GPUs, so why do they struggle so much against Intel when it comes to CPUs? I'll say the same thing again: it's the engineers, probably combined with the product marketing guys.
......
You're focused incessantly on IPC, which is but one part of the overall CPU architecture.
AMD's engineers definitely aren't dumb. BD was something of a gamble because Intel was getting too far ahead on process node and AMD had to scrap K9. BD had to be pushed out in a risky move with unproven tech, and AMD fell flat.
Intel has the advantage of far larger resources and could pursue multiple designs, which is how the whole Core 2 family came about. Radeon/Nvidia does not have the same David/Goliath dynamic as AMD/Intel.
 

KompuKare

Golden Member
Jul 28, 2009
1,228
1,597
136
The gains in Win8 will be more substantial, on the order of 1-10%. Games that utilize between 3 and 7 threads should show close to that 10% number. Which is great, but it's also completely ridiculous to think that you have to spend another $100+ to buy a new OS just because Microsoft wants to hold its updated thread scheduler hostage...

But that's not an excuse. AMD should have worked with MS to release a proper scheduler in time for BD's launch instead of waiting several months. Furthermore, the improved Win8 scheduler should be included in a Win7 update rather than requiring people to buy a new OS.

Ah, but seeing how Win8 is going to be something almost no desktop/laptop user wants (the GUI is hopelessly dumbed down for anything except tablets), about the only thing MS can do to get people to eventually 'upgrade' to it is to gimp the scheduler in Win7 or introduce some new 'feature' (DX12, maybe?) which they then claim cannot be backported to Win7.
 

Arzachel

Senior member
Apr 7, 2011
903
76
91
Bulldozer isn't a forward-looking architecture. Not now, not ever.
Also, a dual-core with HT from Intel is comparable to or faster than a "quad"-core with CMT from AMD in multi-threaded, and in single-threaded AMD gets slaughtered. From a review comparing the Core i3-2120 and FX-4100 in Linux, which favors the Bulldozer architecture more than Windows does:

That alone should tell you the story here: you can't polish a turd. Link to article.

Except that you conveniently skip the laptop part. A 300 MHz increase on laptops is double the percentage increase you'd get on desktops.
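A quick back-of-the-envelope sketch of that claim; the base clocks below are assumed round numbers for illustration, not figures for any specific SKU:

```python
# Rough illustration of why a fixed MHz bump matters more on laptops.
# Base clocks are assumed example values, not figures from any review.
laptop_base_mhz = 1800   # assumed mobile base clock
desktop_base_mhz = 3600  # assumed desktop base clock
bump_mhz = 300

laptop_gain = bump_mhz / laptop_base_mhz    # ~16.7%
desktop_gain = bump_mhz / desktop_base_mhz  # ~8.3%

print(f"laptop:  +{laptop_gain:.1%}")
print(f"desktop: +{desktop_gain:.1%}")
```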
 

Makaveli

Diamond Member
Feb 8, 2002
4,976
1,571
136
AMD has no problems competing with NVIDIA when it comes to GPUs, so why do they struggle so much in comparison to Intel when it comes to CPUs?

Just wanted to address this point.

AMD (really ATI) vs NV is not even remotely close to Intel vs AMD.

The gap between the two GPU makers was never as big as on the CPU side. Even when the Athlon 64 was destroying the P4, that gap was still huge.

So it's a bit of an apples-to-oranges comparison if you ask me.

Windows scheduling is another old excuse now. Patches came out addressing the issue, and guess what: they barely improved anything. You're not gonna polish a turd with some fixes to the OS scheduler.

I totally agree with this.

Even if AMD had hit the clock speeds they wanted and didn't have an issue with GLO, BD would still have been slower than SB. I've never seen a 40% improvement in IPC between new generations, which is what they would have needed to be competitive. They released a CPU that is still slower than Nehalem, and Piledriver looks like it may finally equal 2008's Bloomfield.

This is not a good thing 4 years later!
 
Last edited:

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
You're focused incessantly on IPC, which is but one part of the overall CPU architecture.
AMD's engineers definitely aren't dumb. BD was something of a gamble because Intel was getting too far ahead on process node and AMD had to scrap K9. BD had to be pushed out in a risky move with unproven tech, and AMD fell flat.
Intel has the advantage of far larger resources and could pursue multiple designs, which is how the whole Core 2 family came about. Radeon/Nvidia does not have the same David/Goliath dynamic as AMD/Intel.

It's one part, but the most important part, at least for desktops. When were AMD CPUs amazing? Oh, back in 2002-2005. What did they have that made them superior to Intel? Higher IPC and lower power consumption. Why are AMD doing the exact opposite of what made them successful then and going with low IPC and high power consumption?

It's better to focus an architecture on being as fast as possible than to focus on making it reach the highest clock speeds. Why? Because clock speed is also affected in a big way by external factors like process node, manufacturing foundry, and leakage. Focus on improving the architecture itself first, then focus on the clock speeds. That's what Intel has done since 2006, and it's worked well for them. Conroe didn't have high clock speeds, but it gave them a solid foundation: a good architecture. Once they had that good architecture, they focused on improving clock speeds so they had the whole package. That's what AMD needed to do.
 

zebrax2

Senior member
Nov 18, 2007
977
70
91
Here's the thing with single-threaded performance: it affects speed in ALL workloads, single- or multi-threaded, while multi-threaded performance only matters in multi-threaded workloads. Say theoretical CPU A scores 4000 in single-threaded and theoretical CPU B scores 2500; CPU A has four cores, and CPU B has six to make up the difference in MT. But as you add cores, scaling decreases, so diminishing returns set in. Let's say CPU A's four cores scale almost perfectly, behaving like 3.99 cores, while CPU B's six cores scale worse, behaving like 5.90 cores. Then you end up with a theoretical MT score of 15960 for CPU A (4000*3.99) and 14750 for CPU B (2500*5.90). That's why single-threaded performance is so critical on desktops: it affects single AND multi-threaded programs.

The problem, though, is that single-threaded performance also depends on clock speed; that is why people keep saying that IPC alone as a basis for determining if an architecture is good is a load of BS. Don't get me wrong, I do think BD is pretty crappy, I just don't agree with your views here.
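To make the arithmetic being tossed around here concrete, here's a minimal sketch assuming a toy model where single-threaded score is proportional to IPC × clock; all numbers are the hypothetical values from the posts above (or invented to match them), not benchmark results:

```python
# Toy model: single-threaded score ~ IPC * clock. All numbers are the
# hypothetical values from the discussion above (or invented to match
# them), not benchmark results.

def st_score(ipc, clock_ghz, scale=1000):
    """Toy single-threaded score, proportional to IPC * clock."""
    return ipc * clock_ghz * scale

# The same single-threaded score can come from a high-IPC/low-clock
# design or a lower-IPC/high-clock one (the clock-speed point above).
print(st_score(1.25, 3.2), st_score(1.00, 4.0))  # both ~4000

# The quoted example: CPU A scores 4000 ST and its 4 cores scale to an
# effective 3.99 cores; CPU B scores 2500 ST and its 6 cores scale to 5.90.
cpu_a_mt = 4000 * 3.99   # -> 15960
cpu_b_mt = 2500 * 5.90   # -> 14750
print(cpu_a_mt, cpu_b_mt)
```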
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Another hypothetical scenario here: how high would BD have clocked had it been designed as a 2-module CPU rather than a larger 4-module one, along with a smaller L3 cache? My perspective has always been that the clock speeds were hampered by the dead and empty space, along with the added die size required to fit the monstrous cache and 4 modules, sacrificing both clock speed and single-threaded performance for the added benefit of integer-heavy multi-threaded workloads (and mainly workloads with more than 4 threads, as having exactly 4 has been shown to kick BD in the ass).

Here's Trinity:

[Trinity die shot]

Llano:

[Llano die shot]

And here's BD for comparison:

[Bulldozer die shot]


The size of a single BD module (2 AMD cores) is 30.9 mm² with the shared L2 included.
Piledriver already looks to be more neatly packed, and to be honest I think the APUs just look a little neater in general, Llano and Trinity both.

That's essentially my main gripe with the CMT approach. Though it saves die space, it does so to cater to highly threaded workloads (again, 5-8 integer-heavy threads), which just doesn't make much sense to me on the desktop and even less on the laptop. This explains why we had no BD mobiles and why Trinity has only 2 modules. AMD also introduced the 1MB L2 per core in Llano, which makes sense given that it's an APU, but I'm not sure it's necessary given the 8MB L3 (though decreasing the size of the L2 would require a ground-up change to the way the cache itself works, due to the WCC included in L2 and the tiny L1).

It all makes more sense when you realize it's a server architecture first and not a mobile/desktop architecture, but that doesn't mean I have to like it :D

edit - does anyone know how much space AMD saved by going with CMT/modules when compared to 2 Llano cores, L2 included?
 
Last edited:

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
It's one part, but the most important part, at least for desktops. When were AMD CPUs amazing? Oh, back in 2002-2005. What did they have that made them superior to Intel? Higher IPC and lower power consumption. Why are AMD doing the exact opposite of what made them successful then and going with low IPC and high power consumption?

I'm being speculative here, and not defending AMD for releasing a turd...

But Bulldozer isn't just about clocks; its pipeline is similar to POWER7, which reached 4.25GHz at 45nm... so AMD thought they would be safe at 32nm.

Almost everything important in Bulldozer's white papers suggested it would have better IPC than Phenom II... actually, even with AVX1, Bulldozer looked to beat Sandy hard.

But then we have the trade-offs... the caches were slower but bigger... the FPU was better, but shared... the front end was "modern" but half-clocked... and so on.

In the end, GF's 32nm node was worse than its 45nm, the trade-offs were even worse, and everything that looks good in Bulldozer is aimed at servers and HPC.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
Except that you conveniently skip the laptop part. A 300 MHz increase on laptops is double the percentage increase you'd get on desktops.

Your point is exactly... what?

Ivy Bridge means a clock speed increase in comparison to Sandy Bridge for laptops, too.

Intel isn't just gonna sit there idle and wait five years for AMD to catch up before introducing new products. They need to keep their momentum going, and they are.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
The problem, though, is that single-threaded performance also depends on clock speed; that is why people keep saying that IPC alone as a basis for determining if an architecture is good is a load of BS. Don't get me wrong, I do think BD is pretty crappy, I just don't agree with your views here.

Yes, but IPC is more important than clock speed, especially when you're developing a completely new architecture. If you don't have high IPC, you don't have a good foundation to build upon, which is why the Pentium 4 was scrapped after many tries at making it work.

Again, when you design an architecture hoping for high clock speeds, compromises need to be made. If you start out with a high-IPC, low-clock-speed architecture, you can still improve clock speed via new process nodes and as those processes mature. New steppings help, too. If you go for low IPC and high clocks, you'll hit a voltage and power consumption wall because, again, it rises exponentially. This is exactly the problem you see in NetBurst and Bulldozer, and since this architecture is what AMD will be moving forward with, don't expect them to overtake Intel in the next five years. Intel will keep the CPU lead for some time to come.
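A rough sketch of why that wall shows up: dynamic CMOS power scales roughly with V²·f, and higher clocks generally need higher voltage, so power climbs much faster than frequency. The voltage/frequency points below are invented purely for illustration:

```python
# Rough sketch of the voltage/power wall. Dynamic power ~ C * V^2 * f;
# capacitance is folded into an arbitrary constant. The voltage needed
# at each clock is an invented, illustrative curve, not real silicon data.
points = [
    # (clock_ghz, core_voltage)
    (3.0, 1.10),
    (4.0, 1.25),
    (5.0, 1.45),
]

base_ghz, base_v = points[0]
base_power = base_v ** 2 * base_ghz

for ghz, volts in points:
    power = volts ** 2 * ghz
    print(f"{ghz:.1f} GHz: clock x{ghz / base_ghz:.2f}, "
          f"power x{power / base_power:.2f}")
```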
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
So you will begin to use it properly?

I think he did, but only on his last few replies. It makes it difficult to take seriously anyone who's looking to make a point on IPC by using the term "CPU architecture."

It's like taking aim at a small window on a barn door, hitting the roof and saying "good enough."
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
I think he did, but only on his last few replies. It makes it difficult to take seriously anyone who's looking to make a point on IPC by using the term "CPU architecture."

It's like taking aim at a small window on a barn door, hitting the roof and saying "good enough."

IPC = the raw performance of a CPU architecture.

I explained that simply enough, and that's what it is.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
You're aware that's still wrong, right?

http://en.wikipedia.org/wiki/Instructions_per_cycle
I'd implore you to read the link you yourself posted earlier in the thread.

IPC is workload dependent, ISA dependent, and affected by the fab process and the entirety of the CPU architecture itself. It's just a small piece of the puzzle that you've misrepresented and assumed is representative of the whole shebang (shimove shimove).

I'm not saying that it's not important or that AMD wasn't wrong in attempting to keep IPC the same, I'm just saying you should have an idea of what you're talking about before you assume others don't.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Hmm, missed that somehow when BD came out. Is that why, even with a 4-issue scheduler, BD was limited to dual-issue throughput?

Actually, no... BD has a 4-issue scheduler, but it needs 8... so each core only uses 2 issues... :\

Ignore the "half-clocked" part... my mistake. :\