Official Improvements of Piledriver Cores.


Don Karnage

Platinum Member
Oct 11, 2011
2,865
0
0
It's the same speedracer concept. You bet on clock speed to counter a long pipeline and/or a reduced core with limited issue ports to reach the desired performance. That's also why Bulldozer/Piledriver is a 2-issue-wide INT uarch with a shared FPU vs Conroe/Nehalem/Sandy Bridge's 4-issue-wide INT/FPU.

AMD took a huge gamble and lost. Small, simple cores at high speed. They just couldn't reach the speed. I assume they had expected 5-6GHz as stock speeds.

Another loss was the shared FPU. But that tradeoff seems to have been the first of many victims.

Netburst for example had 1 complex and 2 simple issue ports. K8 had 3 complex.
image001.gif

Bulldozer at 6Ghz would be interesting
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
It's the same speedracer concept. You bet on clock speed to counter a long pipeline and/or a reduced core with limited issue ports to reach the desired performance.

This seems to indicate a fundamental lack of understanding of CPU architecture. You don't accidentally create a long pipeline and then clock it high to overcome its shortcomings. You create a long pipeline *so you can* crank up the clock speed. There is absolutely nothing wrong with long pipelines or high clock speeds.

The clock speed of a CPU is dictated by the latency of its longest pipeline stage. If you can break that longest pipeline stage into two or more shorter sub-stages, then you suddenly have a shorter "longest" pipeline stage, so your clock speed can now go higher.
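To put rough numbers on that, here is a minimal Python sketch. The stage latencies are made up for illustration and are not taken from any real CPU, and the model is idealised: it assumes a stage can be split into equal parts with no latch overhead or clock skew. The function names are hypothetical.

```python
def max_clock_ghz(stage_latencies_ns):
    """The clock period is set by the slowest stage; return frequency in GHz."""
    return 1.0 / max(stage_latencies_ns)


def split_stage(stage_latencies_ns, index, parts):
    """Split one stage into `parts` equal sub-stages (idealised, no latch overhead)."""
    stages = list(stage_latencies_ns)
    sub_latency = stages[index] / parts
    return stages[:index] + [sub_latency] * parts + stages[index + 1:]


# Hypothetical 4-stage pipeline, latencies in nanoseconds (assumed values).
base = [0.25, 0.5, 0.25, 0.2]

# Split the slowest stage (index 1) into two sub-stages.
deeper = split_stage(base, 1, 2)

print(len(base), "stages:", max_clock_ghz(base), "GHz")
print(len(deeper), "stages:", max_clock_ghz(deeper), "GHz")
```

In this toy case the pipeline gets one stage longer and the achievable clock doubles, because the slowest stage was the only thing holding the clock back.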

The thing that people say is bad about a long pipeline is the branch mis-predict penalty. This is a problem that can be fixed with improved branch predictors. I heard a rumor that there was something wrong with Bulldozer's branch predictor (which was supposed to be superior to anything AMD had made before), and that is one of the big reasons why Bulldozer wasn't great. If you can get the branch predictor right, though, then having a longer pipeline is only good for performance.
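A back-of-the-envelope model of that trade-off, as a hedged sketch: time per instruction is CPI divided by clock, and mispredicted branches add a penalty that grows with pipeline depth. Every number below (clocks, penalties, branch fraction, mispredict rates) is an assumption picked for illustration, not a measurement of Bulldozer or any other chip.

```python
def ns_per_instruction(clock_ghz, base_cpi, branch_fraction,
                       mispredict_rate, penalty_cycles):
    """Average time per instruction with a simple branch-mispredict penalty."""
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return cpi / clock_ghz


# Two hypothetical designs: a shorter pipeline at a lower clock and a
# longer pipeline at a higher clock (all values assumed).
SHORT = dict(clock_ghz=3.0, penalty_cycles=12)
LONG = dict(clock_ghz=4.0, penalty_cycles=20)

for mispredict_rate in (0.02, 0.05, 0.10):
    short_ns = ns_per_instruction(base_cpi=1.0, branch_fraction=0.2,
                                  mispredict_rate=mispredict_rate, **SHORT)
    long_ns = ns_per_instruction(base_cpi=1.0, branch_fraction=0.2,
                                 mispredict_rate=mispredict_rate, **LONG)
    print(f"mispredict {mispredict_rate:.0%}: "
          f"short {short_ns:.3f} ns, long {long_ns:.3f} ns")
```

In this toy model the longer pipeline comes out ahead as long as it actually hits its higher clock and the predictor is reasonable; the advantage shrinks as the mispredict rate rises or the clock target is missed.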
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
actually, no... BD has a 4-issue scheduler, but it needs 8...
so each core only uses 2 issue... :\

ignore the half-clocked....my mistake:\

That is not entirely correct.

Each INT core has 4 micro-op execution pipelines (4-wide) and each INT core has its own Scheduler. Each Scheduler is four (4) micro-ops wide.
If you had a single Scheduler for both INT cores (2x 4-wide) you would need an 8-op-wide Scheduler.

bulldozer-diagram-1.jpg


2.10.1 Integer Scheduler
The scheduler can receive and schedule up to four micro-ops (μops) in a dispatch group per cycle.
The scheduler tracks operand availability and dependency information as part of its task of issuing
μops to be executed. It also assures that older μops which have been waiting for operands are
executed in a timely manner. The scheduler also manages register mapping and renaming.



2.10.2 Integer Execution Unit
There are four integer execution units per core. Two units which handle all arithmetic, logical and
shift operations (EX). And two which handle address generation and simple ALU operations
(AGLU). Figure 2 shows a block diagram for one integer cluster. There are two such integer clusters
per compute unit.

scaled.php


Macro-ops are broken down into micro-ops in the schedulers. Micro-ops are executed when their
operands are available, either from the register file or result buses. Micro-ops from a single operation
can execute out-of-order. In addition, a particular integer pipe can execute two micro-ops from
different macro-ops (one in the ALU and one in the AGLU) at the same time. (See Figure 1 on
page 32.) The scheduler can receive up to four macro-ops per cycle. This group of macro-ops is
called a dispatch group.

EX0 contains a variable latency non-pipelined integer divider. EX1 contains a pipelined integer
multiplier. The AGLUs contain a simple ALU to execute arithmetic and logical operations and
generate effective addresses. A load and store unit (LSU) reads and writes data to and from the L1
data cache. The integer scheduler sends a completion status to the ICU when the outstanding
micro-ops for a given macro-op are executed.

The INT Scheduler can issue up to 4 micro-ops per cycle (4-wide), and the INT core (4 pipelines wide) can execute up to four (4) micro-ops per cycle.

http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
Yeah cause official anything from AMD means so much after JF fed us lies for months about BD. Good luck getting anyone to believe this crap AMD.

I'll wait for reviews from trusted sites.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Yeah cause official anything from AMD means so much after JF fed us lies for months about BD. Good luck getting anyone to believe this crap AMD.

I'll wait for reviews from trusted sites.
+1 :thumbsup:
 

PhoenixEnigma

Senior member
Aug 6, 2011
229
0
0
I still have hope for piledriver
+1

If it can at least make the cost/performance of a new CPU respectable when compared to the cost/performance of a new platform, I'll be fairly happy. Assuming, of course, it's a drop in upgrade for AM3+.

Don't really have a lot of hope or interest if it means replacing the motherboard, though; that's just too much ground for AMD to cover in one generation, performance-wise.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
Funny thing. I noticed the Bulldozer 8150 FX dropped to 219.99 at the Egg (yes I know AMD dropped prices) however, they also added a $15 promotional code drop. I've been watching this for awhile. Perhaps the Desktop version of Piledriver is closer than we think? Cleaning out inventory? Your thoughts?
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Yeah cause official anything from AMD means so much after JF fed us lies for months about BD. Good luck getting anyone to believe this crap AMD.

I'll wait for reviews from trusted sites.

Leaving the IPC debate out, what else do you believe JF and/or AMD lied about regarding BD's architecture?
 

Arzachel

Senior member
Apr 7, 2011
903
76
91
my power 7 comparison is the pipeline length...

prescott had 31 stages
Bulldozer and Power7, around 20

And Sandy Bridge has 18 stages iirc so the Netburst comparison is stupid.

Yeah cause official anything from AMD means so much after JF fed us lies for months about BD. Good luck getting anyone to believe this crap AMD.

I'll wait for reviews from trusted sites.

Did JFAMD kill your puppy or something?
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
Did JFAMD kill your puppy or something?

No, but he did lie for months even after accurate leaked benchmarks were out proving him dead wrong. And then he did attack any posters posting the leaked benchmarks saying they were fake when he knew damn well they were not fake and did indeed reflect BD's actual performance. There were even posters on other forums banned over conflicts/arguments they had with JF about the whole IPC deal, arguments he carried out long after he would have known he was feeding us lies.

I basically have zero faith in AMD's official statements anymore.
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
Leaving the IPC debate out, what else do you believe JF and/or AMD lied about regarding BD's architecture?

Other than IPC, JF was honest, but since IPC is the most important metric for a CPU, it wasn't a small detail he was lying about.

It also pissed me off that AMD announced BD would be backwards compatible with AM3 just to get people to buy AM3 systems, and then a year later decided to screw us all over and make it AM3+. Since it turned out to be slower than PH II, there is no reason it could not have been released on AM3.

This has made me very wary of taking anything AMD says seriously anymore. I like AMD, I own more AMD systems than Intel, only my gaming box is Intel. But the last 2 years have been brutal for AMD and they seriously need to start competing with CPUs Intel has released after socket 775.

I hope PD is a success, I'm just not buying anything AMD says about it till I read reviews from neutral sites.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
+1

If it can at least make the cost/performance of a new CPU respectable when compared to the cost/performance of a new platform, I'll be fairly happy. Assuming, of course, it's a drop in upgrade for AM3+.

Don't really have a lot of hope or interest if it means replacing the motherboard, though; that's just too much ground for AMD to cover in one generation, performance-wise.

-1

piledriver v2 seems to bring IPC up, and clocks\watt lower....but i don't expect miracles

That is not entirely correct

:hmm:
not that bad when you get everything from your brain XD
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Other than IPC, JF was honest, but since IPC is the most important metric for a CPU, it wasn't a small detail he was lying about.

It also pissed me off that AMD announced BD would be backwards compatible with AM3 just to get people to buy AM3 systems, and then a year later decided to screw us all over and make it AM3+. Since it turned out to be slower than PH II, there is no reason it could not have been released on AM3.

This has made me very wary of taking anything AMD says seriously anymore. I like AMD, I own more AMD systems than Intel, only my gaming box is Intel. But the last 2 years have been brutal for AMD and they seriously need to start competing with CPUs Intel has released after socket 775.

I hope PD is a success, I'm just not buying anything AMD says about it till I read reviews from neutral sites.


Since IPC is application dependent (ISAs, compilers, etc.), I can show you a lot of benchmarks where BD has higher IPC than Phenom. While all of us desktop users experienced an IPC decrease in benchmarks, server systems show an increase in IPC, MT scalability and throughput.
I'm not trying to defend him, but JF was and still is a server guy and he always mentioned that.

4x Opteron 6272 16core 2.1GHz 115W are faster than 4x Opteron 6142 12 core 2.2GHz 115W.

It is very interesting to note that the BD Opteron 6272 has 4 FPUs vs the Opteron 6172's 6 FPUs and it is still faster ;)

spec2012-01a.jpg


A testament to how much better BD is in the server realm is the 1.1% increase in server market share AMD showed in Q1 2012.

http://www.investorvillage.com/mbth...mValue=235910&dValue=1&tid=11666535&showall=1

overall 18.9M units 19.1% share +0.3% over Q4
server 286K units 6.8% share +1.1% over Q4
desktop 9.7M units 22.7% share +0.4% over Q4
portable 8.9M units 17.1% share +0.1% over Q4
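A quick sanity check on those figures, as a small Python sketch. It only uses the numbers quoted above. The "previous share" and "relative growth" values are derived by simple subtraction and division and assume the share changes are in percentage points, which the quoted figures do not state explicitly.

```python
# (units in millions, share in %, change vs Q4 in percentage points),
# copied from the figures quoted above.
segments = {
    "server": (0.286, 6.8, 1.1),
    "desktop": (9.7, 22.7, 0.4),
    "portable": (8.9, 17.1, 0.1),
}

# The three segments should add up to the quoted overall unit count.
total_units_m = sum(units for units, _, _ in segments.values())
print(f"sum of segments: {total_units_m:.3f}M units (quoted overall: 18.9M)")

for name, (units, share, delta) in segments.items():
    previous_share = share - delta           # assumed: delta is in points
    relative_growth = delta / previous_share  # growth relative to the Q4 share
    print(f"{name}: {previous_share:.1f}% -> {share:.1f}% "
          f"({relative_growth:.1%} relative growth)")
```

The segment units do add up to the quoted overall figure, and the server gain is small in absolute points but large relative to the share AMD started from.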
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Funny thing. I noticed the Bulldozer 8150 FX dropped to 219.99 at the Egg (yes I know AMD dropped prices) however, they also added a $15 promotional code drop. I've been watching this for awhile. Perhaps the Desktop version of Piledriver is closer than we think? Cleaning out inventory? Your thoughts?

Newegg is selling them at exactly the price they buy them at, or at least per 1k units at $205, so it's a decent deal. The better deal would be the 8120 at even cheaper but I'd still steer away from BD unless you know exactly why you want/need it and that it will perform better.

In some cases the BD IPC is actually higher than SB/Thuban, so this notion of "IPC above all else" is only relevant to particular workloads. That said, it's also undeniable that with the longer pipeline, slower L2 latency and failure to reach higher clock speed goals, BD generally has a big IPC gap compared to its Intel 32nm counterparts.

And JF, or at least the marketing team, also lied about performance :p "50% better than Phenom II" was something I remember being thrown around. AMD's PR team needs a serious lesson in how to release a CPU. If it's not performing up to expectations then you're better off keeping quiet, and if it performs better than you expected then it's better to undersell it. Certainly this time around that should be their motto after the Bulldozer fiasco.

The clock speeds on the Vishera parts should be interesting to see. The BD pipeline is apparently 25% longer than Phenom II's, but those 30%+ clock speed goals weren't reached. Judging by the L2 cache latency, though, you'd assume that BD has a 50% longer pipeline, so there are probably some issues with the L2 access speeds that still need to be addressed. At low-4GHz stock clocks and high-4GHz Turbo speeds it should look like an entirely different chip once a few issues are ironed out. I wouldn't expect it to be nearly as efficient as SB and certainly not IB at 22nm, but let's at least hope it doesn't suck, as AMD can't afford yet another failure (and neither can my stock portfolio).
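To see why missing the clock target hurts so much, here is a rough Python sketch that treats single-thread performance as IPC times clock. The 30% clock goal comes from the paragraph above; the 15% IPC loss and the 10% "clock actually achieved" figure are invented purely for illustration.

```python
def relative_performance(clock_gain, ipc_change):
    """Single-thread performance relative to the old design (1.0 = same)."""
    return (1.0 + clock_gain) * (1.0 + ipc_change)


def break_even_ipc_loss(clock_gain):
    """Largest fractional IPC drop that a given clock gain can absorb."""
    return 1.0 - 1.0 / (1.0 + clock_gain)


# Hypothetical case: the new design gives up 15% IPC (assumed value).
with_target_clock = relative_performance(0.30, -0.15)  # 30% clock goal met
with_missed_clock = relative_performance(0.10, -0.15)  # only 10% gained (assumed)

print(f"clock goal met:    {with_target_clock:.3f}x")
print(f"clock goal missed: {with_missed_clock:.3f}x")
print(f"IPC loss a 30% clock gain can absorb: {break_even_ipc_loss(0.30):.1%}")
```

Under these assumed numbers the same core is faster than its predecessor if it reaches the clock goal and slower if it does not, which is the gap the Vishera clocks would have to close.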
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
-1

piledriver v2 seems to bring IPC up, and clocks\watt lower....but i don't expect miracles.

I wonder if Steamroller will be built on GF's 22nm SHP? AMD could have left Steamroller off the desktop roadmap because they will use Opterons to ramp up the process, since yields and top bins won't be as important. Based on the chart on the first page, Steamroller will have more modules, but maybe it will have cut lines to carve out the desktop dice later on. Then again, since Excavator shows a marked upward jump in performance, it may be the first 22nm CPU.

Have you seen a Steamroller evolution of the Trinity APU on any roadmaps?
 
Last edited:
Aug 11, 2008
10,451
642
126
Funny thing. I noticed the Bulldozer 8150 FX dropped to 219.99 at the Egg (yes I know AMD dropped prices) however, they also added a $15 promotional code drop. I've been watching this for awhile. Perhaps the Desktop version of Piledriver is closer than we think? Cleaning out inventory? Your thoughts?

My thought is that people are finally realizing BD was grossly overpriced initially for the performance it delivered in the vast majority of desktop workloads, and the marketplace is adjusting for that.
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
A testament to how much better BD is in the server realm is the 1.1% increase in server market share AMD showed in Q1 2012.


hahaha.gif


It must be because Bulldozer is sooo much better than Phenom II, there are no other variables for AMD's incredible growth from 5% to 6%.
 

ctsoth

Member
Feb 6, 2011
148
0
0
hahaha.gif


It must be because Bulldozer is sooo much better than Phenom II, there are no other variables for AMD's incredible growth from 5% to 6%.

Not to sound abundantly silly, but a 1.1% increase in server market share is significant... You did read server share right? I think total market share did increase by ~1%, a trend I expect to continue. In a theoretical vacuum bulldozer is a good processor, it just didn't live up to expectations, or perform well price/performance depending on the workload.

I expect that trinity/piledriver will improve greatly on the platform, but in recognition of the general enthusiast attitude, if it doesn't bake you a cake and win your video games it will be a "failure."
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Have you seen a Steamroller evolution of the Trinity APU on any roadmaps?

roadmaps just say, greater parallelism...

but there was a something about radix-8, not long ago

and some "rumors" about 8x decoders
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
Not to sound abundantly silly, but a 1.1% increase in server market share is significant... You did read server share right? I think total market share did increase by ~1%, a trend I expect to continue. In a theoretical vacuum bulldozer is a good processor, it just didn't live up to expectations, or perform well price/performance depending on the workload.

I expect that trinity/piledriver will improve greatly on the platform, but in recognition of the general enthusiast attitude, if it doesn't bake you a cake and win your video games it will be a "failure."

If a 1.1% increase while Intel was waiting to transition off their old servers onto the new SBe lineup is significant, what is the significance of Intel's 94% market share compared to AMD's 6%?