Official Improvements of Piledriver Cores.

Ancalagon44 · May 2, 2012

Hatisherrif said:
So where is it? Where is the CPU you designed and/or fabricated? I am dying to see it, since nobody at AMD knows what they're doing.

Well I for one agree with him.

In both relative (ie clock for clock) and absolute measurements (ie at different clock speeds), Thuban is better than Bulldozer in the majority of cases taking into account both performance and clockspeed. By the time you have clocked BD high enough to beat Thuban, it consumes significantly more power. Which is even more embarrassing when you consider Thuban is a full process node behind.

In other words, it sucks.

But actually I dont really blame that on the engineers, I blame it on the product managers, who should have seen the writing on the wall about Bulldozer about 3 years ago, and changed direction. But they didnt, they persisted, even when all of their internal testing must have been showing them it sucked. And they released it, complete with marketing up the wazoo claiming this is the best CPU EVAR, and it sucked. Even if it isnt as bad as its detractors, such as myself, claim, it was bad enough compared to its predecessor that it has significantly damaged AMD's brand.

I went from owning a K7-700, then Athlon XP 2400, then Athlon X2 5600+, then Athlon II X4 620, and finally Phenom II X6 1050 (bought after BD was released!), to never wanting to buy an AMD CPU again. Why should I? I finally realized why AMD is in the financial doldrums - its because its executives could not organize a piss up in a brewery.

Chiropteran · May 2, 2012

LOL_Wut_Axel said:
Have you been paying attention to any of the discussion in this thread? We're discussing CPU ARCHITECTURE. You evaluate how fast a CPU architecture is in comparison to another by comparing them at the same clock speed and with a single core, to evaluate per-core performance. Simply adding 50% more cores onto the exact same architecture won't magically make it a faster architecture. The 3930K has 50% more cores than the 2600K. That doesn't mean it has a faster architecture, which for both is Sandy Bridge. Simply adding cores doesn't make a CPU architecture faster, easily seen by the fact that the Phenom II X6 and Phenom II X4 are within 1% of each other when comparing architectures in the chart.

It feels like you are changing your argument as you go along. You said, very clearly:

"but the reality is that AMD at the very best can only match Intel's CPU architecture from 6 years ago"

Which is what I disputed, and so far you haven't refuted anything.

IPC is irrelevant when some CPU such as bulldozer can scale to much higher clock speed than other CPU.

Single-threaded IPC is incredibly useless as an indicator of real world performance, because the vast majority of CPU bound tasks can run in multiple threads.

Correct me if I am wrong, but it looks like you argument is as follows:

If you remove AMD's multicore advantage, and if you remove AMD's clockspeed advantage, compare AMD vs Intel based on a single core of identical clockspeed, Intel wins.

I just don't get how you think that your argument is proving anything at all. You cherry pick benchmarks and you stack the game against a certain CPU architecture you can make the alternative look vastly superior, congratulations. Even I can dig up a benchmark showing the 8150 as faster than a 2500k, it doesn't really prove anything though.

LOL_Wut_Axel · May 2, 2012

Hatisherrif said:
So where is it? Where is the CPU you designed and/or fabricated? I am dying to see it, since nobody at AMD knows what they're doing.

That's as stupid an argument as saying that the designers/builders of the Russian Lada were good because you've never made a car yourself. Or as stupid as saying that you can't criticize a pop music artist that uses auto-tune because you've never made songs or an album yourself.

Arzachel · May 2, 2012

LOL_Wut_Axel said:
CPU performance is derived from that, yes. I'm comparing one aspect: IPC, or architecture raw speed. Nothing else. I made that very clear from the beginning.

Unfortunately, AMD can't defy nature, so they should've been smart enough from the beginning to know that making CPUs focused only on high clock speeds wasn't gonna get them anywhere. They didn't learn from Intel and the Pentium 4, and now it's come back to haunt them. They also forgot Intel has their own foundries which are much more advanced than GF's. It seems like their engineers completely forgot about the fact that power consumption increases exponentially as you raise clock speed, thinking they'd be able to achieve stock clock speeds of 5GHz.

Instead, they can only compete with Intel when it comes to clock speeds, meaning they weren't even able to achieve their high clocks goal--the average overclock of an unlocked Sandy/Ivy Bridge and Bulldozer is 4.5GHz, and the base clock speeds are nearly identical with an FX-8150 being 3.6GHz and a 3770K 3.5GHz. That's why their engineers are idiots in comparison to Intel's: they ignored fundamental mistakes Intel made way back at the end of 2000, while Intel used this harsh lesson to improve by unprecedented amounts. At the same time Intel was moving forwards, AMD moved backwards.

And trying to match Intel's IPC is going to get them nowhere even quicker. You manage to absolutely ignore that AMD has maybe a tenth of the resources to spare and trying to beat Intel where Intel is the strongest is plain suicide. Yes, the GPU wing of AMD manages to do more with less, but that's the exception not the law. Multi-billion companies don't do anything because they "feel like it", like you're trying to claim. They wouldn't go with the design they did if there wouldn't be tons of theoretical proof that what they're doing will work within a margin of error. While you can blame AMD for Bulldozer, GF's 32nm process and Windows scheduling did the most of the burying. To call the engineers doing the work stupid you'd have to be willfully ignorant or kind of a dick.

Hatisherrif said:
So where is it? Where is the CPU you designed and/or fabricated? I am dying to see it, since nobody at AMD knows what they're doing.

While I don't agree with Axel, this line of thinking is silly. You don't have to be a cook to say the soup tastes terrible.

Olikan · May 2, 2012

pelov said:
If you're going to quote me to tell me I'm wrong then you may want to read the following paragraph, otherwise you might end up making a bit of a fool of yourself.

ROLF... yeah my bad...

i have been saying the same thing over, and over again and didn't read the rest...

i have to say, if the ipc increases here, ppl will be all "wow, L3 cache is trully important"

Hatisherrif · May 2, 2012

Stupid argument?

If you call someone stupid, you better say compared to WHOM they are stupid. Compared to you, they are definitely smarter since you can't make anything. Maybe compared to Intel engineers they are stupid, who knows...

That is the only reason I asked him if he had his own CPU. You can say something is a bad product, but you should not insult the people who made it without knowing squat about their position and what they had in mind. I don't think that line of thinking is silly.

pelov · May 2, 2012

Olikan said:
ROLF... yeah my bad...

i have been saying the same thing over, and over again and didn't read the rest...

i have to say, if the ipc increases here, ppl will be all "wow, L3 cache is trully important"

Haha. Well, it kind of is. Bulldozer has more L3 cache and is also able to utilize more of its L3 than SB or IB due to the way it handles data writes and subsequently reads as well. L2 cache is generally copied (at least some of it) to the L3 as it moves up but with Bulldozer they work independently (though some of it will still be copied) with a backwards access. This sounds great because you've more megerrbitz to utilize in cache but it creates big problems if you have cache misses, meaning you've got to re-access to the L2 cache to reach the L3 store. This wouldn't be too big of a deal if the L2 cache was quick, but it's horrendously slow with Bulldozer. The L3 is going to give the Vishera chips an added IPC bump without a doubt, but it's likely to be less beneficial then efficiency improvements in the L2.

This shouldn't sound surprising given how many people have been saying much the same since the reviews came out:

Fix the damn L2!!

LOL_Wut_Axel · May 2, 2012

Arzachel said:
And trying to match Intel's IPC is going to get them nowhere even quicker. You manage to absolutely ignore that AMD has maybe a tenth of the resources to spare and trying to beat Intel where Intel is the strongest is plain suicide. Yes, the GPU wing of AMD manages to do more with less, but that's the exception not the law. Multi-billion companies don't do anything because they "feel like it", like you're trying to claim. They wouldn't go with the design they did if there wouldn't be tons of theoretical proof that what they're doing will work within a margin of error. While you can blame AMD for Bulldozer, GF's 32nm process and Windows scheduling did the most of the burying. To call the engineers doing the work stupid you'd have to be willfully ignorant or kind of a dick.

While I don't agree with Axel, this line of thinking is silly. You don't have to be a cook to say the soup tastes terrible.

No, they just need to close in as much as possible to Intel on IPC. Have as much comparable overall performance as possible, and then attack Intel aggressively when it comes to bang-for-buck. Again, resources/R&D argument: excuse. AMD has no problems competing with NVIDIA when it comes to GPUs, so why do they struggle so much in comparison to Intel when it comes to CPUs? I'll say the same thing again: it's the engineers, probably combined with the product marketing guys.

Windows scheduling is another old excuse now. Patches came out addressing the issue, guess what, it barely improved anything. You're not gonna polish a turd with some fixes to the OS scheduler.

SlowSpyder · May 2, 2012

LOL_Wut_Axel said:
CPU performance is derived from that, yes. I'm comparing one aspect: IPC, or architecture raw speed. Nothing else. I made that very clear from the beginning.

Unfortunately, AMD can't defy nature, so they should've been smart enough from the beginning to know that making CPUs focused only on high clock speeds wasn't gonna get them anywhere. They didn't learn from Intel and the Pentium 4, and now it's come back to haunt them. They also forgot Intel has their own foundries which are much more advanced than GF's. It seems like their engineers completely forgot about the fact that power consumption increases exponentially as you raise clock speed, thinking they'd be able to achieve stock clock speeds of 5GHz.

Instead, they can only compete with Intel when it comes to clock speeds, meaning they weren't even able to achieve their high clocks goal--the average overclock of an unlocked Sandy/Ivy Bridge and Bulldozer is 4.5GHz, and the base clock speeds are nearly identical with an FX-8150 being 3.6GHz and a 3770K 3.5GHz. That's why their engineers are idiots in comparison to Intel's: they ignored fundamental mistakes Intel made way back at the end of 2000, while Intel used this harsh lesson to improve by unprecedented amounts. At the same time Intel was moving forwards, AMD moved backwards.

But what if SB only could run at 2GHz? The CPU's architecture has a lot to do with how high clock speeds can go. I'm not necessarily disagreeing with you, it is just that I think clock speed and IPC aren't as completely separate as they are made to sound in this thread. Clock speed may have zero to do with IPC, but the architecture does have something to do with clock seed.

Obviously Intel enjoys the best of both worlds, but I don't think that is a smart vs. dumb engineer thing as much is it an AMD R&D budget vs. Intel R&D budget thing.

nehalem256 · May 2, 2012

I dont know how anyone would reach the conclusion AMDs engineers are stupider than Intel. It seem more likely that when you throw more Engineers at CPU desgin you will get a better cpu.

Maybe with a few more Engineers AMD would have been able to tune Bulldozer better to match the IPC of Thuban. And then you would have had a CPU with 2 more cores, 10% higher clock speeds, and working turbo. Hardly a failure then.

pelov · May 2, 2012

LOL_Wut_Axel said:
Windows scheduling is another old excuse now. Patches came out addressing the issue, guess what, it barely improved anything. You're not gonna polish a turd with some fixes to the OS scheduler.

The gains in Win8 will be more substantial, on the order of 1-10%. Games that utilize between 3-7 threads should show close to that 10% number. Which is great, but it's also completely ridiculous to think that you have to spend another $100+ to buy a new OS just because Microsoft wants to hold hostage its updated thread scheduler...

It is an OS/programming issue, though. BD does bode far better in Linux than it does on Windows mainly due to the naturally worse threading in Win apps and the thread scheduler itself.

But that's not an excuse. AMD should have worked with MS to release a proper scheduler in time for BD's release instead of waiting several months and furthermore the improved Win8 scheduler should be included in a Win7 update rather than require people to buy a new OS. Then there's the argument that most users don't need more than 4 cores anyway as an overwhelming majority of their workloads are at or under 4 and past that they should be gunning for efficiency rather than moar coars, which is my personal reason why BD sucks as bad as it does

LOL_Wut_Axel · May 2, 2012

SlowSpyder said:
But what if SB only could run at 2GHz? The CPU's architecture has a lot to do with how high clock speeds can go. I'm not necessarily disagreeing with you, it is just that I think clock speed and IPC aren't as completely separate as they are made to sound in this thread. Clock speed may have zero to do with IPC, but the architecture does have something to do with clock seed.

Obviously Intel enjoys the best of both worlds, but I don't think that is a smart vs. dumb engineer thing as much is it an AMD R&D budget vs. Intel R&D budget thing.

Again, the comparison I made is raw speed, to see what each does clock-for-clock.

If you want to compare two CPUs or two CPU series running at different clock speeds to represent what they achieve in the real-world, that's fine, but even if SB could only clock to 2GHz and BD to 4GHz, SB would still have a faster architecture/IPC, but not the faster overall speed because, again, CPU performance is derived from architecture/IPC and clock speeds. Architecture decisions can affect clock speeds, but that's a trade-off the engineers have to see if it's worth it. Among other things, Bulldozer has a somewhat longer pipeline than Stars for higher clock speeds, but it didn't work out too well because power consumption limited the clock speed potential. Global Foundries also comes into the picture there, hence why AMD recently ended their exclusivity agreement.

R&D, again, an excuse. AMD and NVIDIA have been very strong competitors for four years.

Lepton87 · May 2, 2012

SlowSpyder said:
Clock speed may have zero to do with IPC, but the architecture does have something to do with clock seed.

That is not true, IPC changes with clock-speed. Look at charts that compare prescott to northwood. The situation looks different at 4GHz than at 3GHz.

SlowSpyder · May 2, 2012

Lepton87 said:
That is not true, IPC changes with clock-speed. Look at charts that compare prescott to northwood. The situation looks different at 4GHz than at 3GHz.

I meant on a given architecture. That is a 3.2GHz Northwood has the same IPC as a 3.0GHz Northwood. And what I was trying to point out is that the architecture plays a role in how far the clock speed can go, so I think you can't completely separate the two.

LOL_Wut_Axel · May 2, 2012

nehalem256 said:
I dont know how anyone would reach the conclusion AMDs engineers are stupider than Intel. It seem more likely that when you throw more Engineers at CPU desgin you will get a better cpu.

Maybe with a few more Engineers AMD would have been able to tune Bulldozer better to match the IPC of Thuban. And then you would have had a CPU with 2 more cores, 10% higher clock speeds, and working turbo. Hardly a failure then.

Small group of intelligent, experienced folks>>>big group of dumb, inexperienced folks.

I don't know how valid the article was, but I read a piece that said that AMD fired their small group of engineers back in 2006 or so, which were the ones that made the Athlon XP and 64, for a much bigger group of younger, less experienced ones. Apparently these are the ones that have since brought us gems like Bulldozer. CPU architectures also take some five-six years from idea to design to execution stage. I hope you can see where I'm going...

SlowSpyder · May 2, 2012

LOL_Wut_Axel said:
Again, the comparison I made is raw speed, to see what each does clock-for-clock.

If you want to compare two CPUs or two CPU series running at different clock speeds to represent what they achieve in the real-world, that's fine, but even if SB could only clock to 2GHz and BD to 4GHz, SB would still have a faster architecture/IPC, but not the faster overall speed because, again, CPU performance is derived from architecture/IPC and clock speeds. Architecture decisions can affect clock speeds, but that's a trade-off the engineers have to see if it's worth it. Among other things, Bulldozer has a somewhat longer pipeline than Stars for higher clock speeds, but it didn't work out too well because power consumption limited the clock speed potential. Global Foundries also comes into the picture there, hence why AMD recently ended their exclusivity agreement.

R&D, again, an excuse. AMD and NVIDIA have been very strong competitors for four years.

Right, and Nvidia has had the faster single GPU more often then not. But even so, it's still apples and oranges. AMD and Nvidia have a common thread that keep either one from getting too far out of reach of the other, that is they both use TSMC. Intel has a lot of R&D dollars to stay ahead in the manufacturing game and to taylor their architecture for a given process. AMD's 32nm parts are not nearly as mature and Intel's.

I'm not saying AMD didn't make some missteps with Bulldozer. I think AMD's biggest mistake has more to do with management focusing so heavily on multithreaded performance vs. single threaded/lightly threaded performance. The engineers work on what they are told to build, they don't steer the ship.

LOL_Wut_Axel · May 2, 2012

Lepton87 said:
That is not true, IPC changes with clock-speed. Look at charts that compare prescott to northwood. The situation looks different at 4GHz than at 3GHz.

http://en.wikipedia.org/wiki/Instructions_per_cycle
http://en.wikipedia.org/wiki/Clock_cycle

Also, you can't just look at Prescott and Northwood and say that because, even though they're both based on the Netburst architecture, there are small architectural differences between both. Those small architectural differences can affect IPC.

Ivy Bridge is based on the Sandy Bridge architecture, but that doesn't mean they're the same thing: there's small tweaks in Ivy Bridge.

Abwx · May 2, 2012

LOL_Wut_Axel said:
Except you forgot the simple fact that you need higher voltage with higher clock speed, so yes, power consumption will rise exponentially, because higher voltage raises power consumption exponentially. That's why Intel was never able to reach 7-10GHz like they wanted to on the Pentium 4 and instead abandoned the idea altogether.

Your line about me not understanding EE is hilarious, because that's exactly what you've just displayed in your comment.

I forgot nothing , anybody would understand unless one is illiterate
in matter of EE..

Here what you were responding to :

Abwx said:
Increasing both parameters would yield a cubic law but certainly
not an exponential law that is only relevant for leakages.....

So learn to read before even attempting to draw one of your
numerous twisted logic.

Still ,a cubic (power) law is not an exponential law..

power law f(x) = x^c , c being constant
exponential law f(x) = c^x , c being constant.

Indeed , it s not only mathematically that your brain is
running at counterclock....

Lepton87 · May 2, 2012

SlowSpyder said:
I meant on a given architecture. That is a 3.2GHz Northwood has the same IPC as a 3.0GHz Northwood.

No, it does not have the same IPC, although with such a minor difference in clock speed it would be hard to measure it. When you increase CPU frequency by 50% does everything that is CPU limited gets faster by the same amount? It would if IPC stayed the same.

nehalem256 · May 2, 2012

LOL_Wut_Axel said:
Small group of intelligent, experienced folks>>>big group of dumb, inexperienced folks.

I don't know how valid the article was, but I read a piece that said that AMD fired their small group of engineers back in 2006 or so, which were the ones that made the Athlon XP and 64, for a much bigger group of younger, less experienced ones. Apparently these are the ones that have since brought us gems like Bulldozer. CPU architectures also take some five-six years from idea to design to execution stage. I hope you can see where I'm going...

But a large group of intelligent, experienced folks is still greater than a large one.

Arzachel · May 2, 2012

LOL_Wut_Axel said:
No, they just need to close in as much as possible to Intel on IPC. Have as much comparable overall performance as possible, and then attack Intel aggressively when it comes to bang-for-buck. Again, resources/R&D argument: excuse. AMD has no problems competing with NVIDIA when it comes to GPUs, so why do they struggle so much in comparison to Intel when it comes to CPUs? I'll say the same thing again: it's the engineers, probably combined with the product marketing guys.

Windows scheduling is another old excuse now. Patches came out addressing the issue, guess what, it barely improved anything. You're not gonna polish a turd with some fixes to the OS scheduler.

Easy, Nvidia has far less resources than Intel and Nvidia doesn't have fabs and a process node advantage. Nvidia vs AMD and Intel vs AMD is pretty apples to oranges.

And that ~7% loss in IPC is a pretty huge deal and won't get resolved until Win8.

Honestly, I think Bulldozer is a much forward thinking arch than SB to the point that I'd call it subjectively "better". Most apps don't use over 4 cores and 2 modules seem to scale much better than 2 "full" cores + HT and take about the same amount of space allowing for a beefier iGPU. You also have less redundancy which means less idle transistors leaking current. What's more, laptop chips are clocked a lot lower than desktop chips so the increased pipeline depth is definitely worth it there.

sefsefsefsef · May 2, 2012

Nobody in the world cares about IPC separate from performance. Performance is what people care about, and IPC is just one factor in it. Generally speaking, performance is IPC x clock speed. IPC can vary with clock speed, though, because not all performance-critical parts of the system work on the same clock (e.g., available memory bandwidth is largely independent of CPU clock speed), even on the same architecture.

So bragging about IPC is literally worthless, unless that IPC is accompanied by a high clock speed that can get real work done.

Ajay · May 2, 2012

Olikan said:
but clock speed is a part of cpu arquitecture ¬¬

Yep, uArch + implementation + process maturity (being able to bin higher clocks).

LOL_Wut_Axel · May 2, 2012

SlowSpyder said:
Right, and Nvidia has had the faster single GPU more often then not. But even so, it's still apples and oranges. AMD and Nvidia have a common thread that keep either one from getting too far out of reach of the other, that is they both use TSMC. Intel has a lot of R&D dollars to stay ahead in the manufacturing game and to taylor their architecture for a given process. AMD's 32nm parts are not nearly as mature and Intel's.

I'm not saying AMD didn't make some missteps with Bulldozer. I think AMD's biggest mistake has more to do with management focusing so heavily on multithreaded performance vs. single threaded/lightly threaded performance. The engineers work on what they are told to build, they don't steer the ship.

Except outright performance alone isn't the goal of everything. AMD won against NVIDIA with the HD 4000-6000 series when it came to performance/watt and performance/mm^2, and therefore had higher efficiency. AMD has now gone for a multi-purpose approach of sorts with the HD 7900 series, having comparable FP/compute performance to the GTX 580 but still having great gaming performance and good performance/watt and performance/mm^2.

Tahiti is 352mm^2 while GK104 is 294mm^2 and the fastest card featuring GK104 is almost 10% faster than the one featuring Tahiti, but AMD made that small tradeoff for 2x higher compute performance than NVIDIA, which admittedly almost no consumers care about. But in comparison to what we had before, which was NVIDIA with enormous 500mm^2+ GPUs for a small gain of 15% in comparison to AMD, it's not bad at all that AMD has a slightly bigger GPU and slightly lower gaming performance for 2x higher compute. I suspect they did this to appeal more to the professional market while still also not losing focus of their main market, which was gaming.

Here's the thing with single-threaded performance: it affects speed in ALL workloads, no matter if single or multi-threaded, while multi-threaded performance only affects speed in multi-threaded workloads. Theoretical CPU A has a theoretical score of 4000 in single-threaded. Theoretical CPU B has a theoretical score of 2500 in single-threaded. Theoretical CPU A has four cores, and theoretical CPU B has six to make up for the difference in MT. But then there's the problem that as you add cores, core scaling decreases and therefore there are bigger points of diminishing returns. For CPU A, running a heavily multi-threaded workload, let's say you get 99% more performance than two cores. Add two cores more on top of that, though, and scaling decreases to 90%. Then you end up with a theoretical score of 15960 for CPU A (4000*3.99) and a theoretical score of 14750 for CPU B (2500*5.90). That's why single-threaded performance is so critical to desktops: it affects single AND multi-threaded programs.

Ajay · May 2, 2012

Olikan said:
actually no, there is more changes...

the biggest one, is that vishera will be able to do 4 64bit Movs per clock, while trinity will do 2 64bit Movs per clock

Movs are very important in x86 code, the IPC gain could be huge if there is no bottleneck (but it will)

AMD is going this route....
Bulldozer -> Piledriver v1 + GPU -> Piledriver v2 + No GPU -> Steamroller v1 + GPU -> Steamroller v2 + No GPU -> etc.

Good to know. One problem AMD will still have is a single 256 FMAC per module whereas Haswell will have two per core (which can do vector int operations as well).

I still wonder why AMD didn't put Steamroller on it's desktop roadmap.

Official Improvements of Piledriver Cores.

Diamond Member

Diamond Member

Diamond Member

Senior member

Platinum Member

Senior member

Diamond Member

Diamond Member

Lifer

Lifer

Diamond Member

Diamond Member

Platinum Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Platinum Member

Lifer

Senior member

Senior member

Lifer

Diamond Member

Lifer