Fudzilla: Bulldozer performance figures are in

Status
Not open for further replies.

Voo

Golden Member
Feb 27, 2009
1,684
0
76
He suspects... What does that tell you? He hasn't signed any NDA, and has no figures, or chips. He may as well suspect to know the winner of 2012 election. In short, it amounts to plain speculation, and nothing more. Certainly not anything worthwhile. I'll tend to think JF who works for AMD, has both access to chips and engineers would know more than him about the processor.
Yeah and as we all know PR NEVER ever would misrepresent or spin facts in a way that the company looks better. I mean it's also well known that Intel never cheated on any benchmarks or built compilers that penalized AMD chips - you can find lots of PR statements about that after all! (Well, until they got caught with facts). So clearly no neutral person can state any facts, but hey that's what the whole thread is about.


In many tests at [H], a good air cooling setup was found to be as good as or better than most ready-to-go water cooling setups available. They used an Antec solution, which someone on another thread mentioned is only as good as a Noctua NH-D14.
As I understood it the Antec cooler wasn't released yet - any link to which model they used if that isn't the case? Would be interesting.
 

garagisti

Senior member
Aug 7, 2007
592
7
81
You have already posted the same thing 3 times, in 3 different threads (2 of which are unrelated). Do you have a good reason for this? :hmm:

For a good reason alright. I knew that you and others from team glue, I mean blue, wouldn't read it even once.
 

BlueBlazer

Senior member
Nov 25, 2008
555
0
76
For a good reason alright. I knew that you and others from team glue, I mean blue, wouldn't read it even once.
There are in fact already multiple posts about this AMD overclocking event news in at least 4 other threads, including the thread about the Sept 12 thingy (I posted there first and in fact I broke the news first). Each of those news items comes from a different site, while yours all come from a single site. This can be considered spam. Are you advertising for SemiAccurate, or are you Groo (Charlie D)? :p
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
No, he's saying that AMD already came out and said that the project goals for Bulldozer were 25%+ higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

The full quote:

The 2.37mm² integer execution unit supports single-cycle data bypass among four independent functional units. Compared to previous AMD x86-64 cores, project goals reduce the number of FO4 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.

As we are only talking about the integer execution units themselves, IPC remains constant relative to previous generations.

The previous generation had 3 ALU/AGUs, we now have 2/2. Previous generation was unable to feed the integer unit well enough, and some instructions could only be used on one ALU/AGU.

Bulldozer has 33% more throughput to the integer unit, better prediction & prefetch, more efficient cache layout, 4x the available L2 cache ( and not just a victim cache anymore! ), and MUCH more going for it.

All together, I'd anticipate 25% better IPC ( JF-AMD has said MANY MANY times that IPC increases ), a bit more/less here/there... and 10% average overhead/core due to shared resources within a module under heavy load.

That means it will be 10-15% behind i5 2500k in lightly threaded loads (thanks to turbo, sometimes on par / faster), and about 20-30% faster in heavily threaded workloads (lack of turbo).

i5 2500k is a good 40-50% ahead.:'(

However, FPU is one area where things get REALLY interesting... hard to determine the performance of the FMAC, though... nothing to go on...

8x128, 4x256 arrangement should cause equal AVX/SSE performance with heavy threading... how that compares to SB.. :confused:

--The loon
 

BlueBlazer

Senior member
Nov 25, 2008
555
0
76
I'm sorry, i really don't understand.:(
He doesn't have a "good reason" at all. He says I "wouldn't read it", which is false. There's already a thread for the news. He simply posts the same news from the same site in multiple threads, the majority of which are unrelated, which IMHO counts as "spamming". :hmm:
 

garagisti

Senior member
Aug 7, 2007
592
7
81
There are in fact already multiple posts about this AMD overclocking event news in at least 4 other threads, including the thread about the Sept 12 thingy (I posted there first and in fact I broke the news first). Each of those news items comes from a different site, while yours all come from a single site. This can be considered spam. Are you advertising for SemiAccurate, or are you Groo (Charlie D)? :p

You didn't read the link. It outlines the difference between the Intel chip that hit 8.3 and BD that hit 8.4. Also they mention a couple of other things which people from team glue will obviously ignore, like you did.

Sometime ago, all one heard was "BD will never match overclocking ability of SB." Now, "IPC will suck..." and "what difference does it make." Honestly, hypocrisy is strong on threads here.

If you ask me, i may still buy SB/ SB-E, IB if i don't like numbers that will come up. However, numbers also mean perf/$. Let us see... However, the shite move to cut features on X79 has pissed off a lot of people. Let us see how this turns out to be.

Oh yeah, by the by, i'm not Charlie. I'm someone who ALSO reads S/A though.
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
Hypocrisy is only strong with fanboys. I don't think you understand most of the AT enthusiasts though if you think most here are fanboys. Virtually everyone dismissing the 8.4GHz BD run also dismisses the 8.3GHz Celeron run as meaningless. The only overclocking that matters:

(1)- OC with stock cooling / voltage.
(2)- OC with decent air cooling / modest voltage jump.
(3)- OC with water / upper voltage limits.

Suicide runs, liquid nitrogen, all of this stuff is just masturbation really.

I really want BD to be good. Right now for gaming I have to recommend SB, and it's kind of annoying because the cheapest unlocked CPU is the 2500k. Even if BD isn't all that competitive on the higher end, it'll be nice if AMD comes out with one that has around a $150 price that overclocks and outperforms 2100, 2400, etc, and gets to around stock 2500 levels. That'd be good enough for me to recommend to people, because it would give about $70 extra to put towards SSD, more memory, better GPU, etc.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
The full quote:



As we are only talking about the integer execution units themselves, IPC remains constant relative to previous generations.

The previous generation had 3 ALU/AGUs, we now have 2/2. Previous generation was unable to feed the integer unit well enough, and some instructions could only be used on one ALU/AGU.



--The loon

What if he refers to the fact that the previous generation has fluctuating
IPC while BD has less fluctuating IPC, i.e., higher average IPC?...

That would be in line with the reduced execution resources,
which thus must have better average throughput...
 

BlueBlazer

Senior member
Nov 25, 2008
555
0
76
You didn't read the link. It outlines the difference between the Intel chip that hit 8.3 and BD that hit 8.4. Also they mention a couple of other things which people from team glue will obviously ignore, like you did.
I don't find anything new in there that wasn't already published in other sites. :p

Sometime ago, all one heard was "BD will never match overclocking ability of SB." Now, "IPC will suck..." and "what difference does it make." Honestly, hypocrisy is strong on threads here.
Does that give you a "good reason" to "spam"? There's already an exciting/heated discussion (including "mudslinging" action) over at the Overclocking Preview Bulldozer 8150 thread. If you have something to contribute then feel free to join their discussion. ;)

Oh yeah, by the by, i'm not Charlie. I'm someone who ALSO reads S/A though.
I stopped taking Charlie seriously after he started the dancing-in-the-aisle trend. I also prefer to do my own homework.. :D
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
All falling on deaf ears Matt. :D JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.

Anyways, what will be your concise opinion of IPC improvements in percentage over Phenom II, unless they've found some other TLB bug... I'd tend to opine about 10-20% in most scenarios unless using fancy coding where gains could be more.
Well, I think that your range is a little bit too high for the average. But depending on the app IPC might even be better than that (though rare cases I assume) and sometimes even be lower than PHII's. A mixed bag as it comes with a totally new arch :)

I wonder what some people would say when they get to know that the BD ES tested at SiSoftware in some tests has a lower per core/clock performance than Bobcat.

However, there are still some unknowns regarding AMD's definition of IPC. Do they look at IPC when code/data is in the caches or is it the measurable IPC when running real world apps? And did they talk about IPC using optimized code or code being optimized for 10h, SB or something else? And

BTW according to an AMD white paper about compiler optimizations they even recommend not to use the AVX flag when compiling with ICC. This means that AVX code produced by ICC will likely perform worse than when using SSEn extensions (the recommended setting).

I don't understand your point.

There is non-linear scaling with overclocking, but IPC of SB at 4.7ghz is still around 40% superior over Phenom II at 4.7ghz as it is 40% superior at 3.3ghz.
What's wrong with that? Scaling is often related to the memory subsystem. Since SB has an advantage there...


This 'attack' on IPC has become a recent phenomenon from AMD's side. Should we revisit AMD's historical roots, when their CPUs with superior IPC were actually good?

Athlon XP+
Athlon 64
Athlon X2 / FX

It's interesting how AMD keeps dismissing IPC as irrelevant in the last 5 years given that only a handful of code exists that uses 6-8 threads. I find it very ironic because in the last 10 years a CPU with superior IPC has shown to be superior most of the time.
Source? One reason for going to more cores can be found in Pollack's rule, as defined by this former Intel researcher. For servers, HPC, video encoding, rendering etc. (highly parallelizable tasks) more cores have a better energy efficiency than a fat wide core. There are hundreds of papers on that. But as you know still a lot of apps esp. on the desktop, where games and so on even just use a handful of threads, would benefit from using that fat wide core.
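
For a rough feel of what Pollack's rule implies here, a minimal Python sketch follows. It assumes the usual rule-of-thumb form (single-thread performance grows roughly with the square root of core area/complexity) plus an Amdahl-style split into serial and parallel work. The area budget, the parallel fractions and the function names are made up for illustration and do not come from any of the papers.

```python
import math

def core_perf(area):
    # Pollack's rule (rule of thumb): single-thread performance ~ sqrt(area).
    return math.sqrt(area)

def chip_throughput(total_area, n_cores, parallel_fraction):
    """Relative speed of a chip that splits total_area evenly over n_cores.

    Amdahl-style model: the serial part runs on one core, the parallel
    part runs on all cores at once. Purely illustrative.
    """
    per_core = core_perf(total_area / n_cores)
    serial_time = (1.0 - parallel_fraction) / per_core
    parallel_time = parallel_fraction / (per_core * n_cores)
    return 1.0 / (serial_time + parallel_time)

if __name__ == "__main__":
    # Hypothetical area budget of 16 units, spent on 1 fat core or 8 small ones.
    for p in (0.2, 0.95):
        fat = chip_throughput(16.0, 1, p)
        many = chip_throughput(16.0, 8, p)
        print(f"parallel fraction {p}: 1 fat core = {fat:.2f}, 8 small cores = {many:.2f}")
```

Under these assumptions the eight small cores win on the highly parallel workload and lose on the mostly serial one, which is the server vs desktop split described above.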

It's sad to see that everything Athlon 64 stood for is what SB is today, while AMD went backwards towards Pentium 4 era of throwing more "specs" (cores) to try to beat efficiency. It's even more ironic considering AMD's GPU division is doing the exact opposite of their CPU division.
The last time I checked the number of shaders went up and are planned to go up even further. Changing architecture to GCN is another story.

Also, isn't it better to get 40-50% faster performance per clock so that you can reduce power consumption since you won't have to clock your processor's frequency as high? Isn't this what gave Athlon 64 the edge over P4?
It's not that simple. IIRC P4 still had more transistors per core than Athlon 64. It even had a higher issue rate per cycle in several cases. And energy efficiency is not just defined by issue rate. What about not having the data in place (prefetched and waiting in the cache)? This will cost lots of cycles where such older archs consumed power (well, not that much) because they could neither clock gate nor power gate their logic. P4 had another option then: run the other thread.

You might find this paper interesting:
http://www.duke.edu/~BCL15/documents/azizi2010-isca-opt.pdf
"Energy-Performance Tradeoffs in Processor Architecture and Circuit Design: A Marginal Cost Analysis"

It shows different architecture types and how they perform. A nice way to look at these things.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
The 2.37mm² integer execution unit supports single-cycle data bypass among four independent functional units. Compared to previous AMD x86-64 cores, project goals reduce the number of FO4 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.

English is not my native language but the way I read it is,

Compared to previous AMD x86-64 cores (Deneb/Thuban in Phenom X4/X6, Int cores), (comma) project goals reduce the number of FO4 inverter delays per cycle by more than 20% (lower delay = higher performance IPC), (comma) while maintaining constant IPC (this line explains that we have 20% less delay with a constant IPC, i.e. not fluctuating), (comma) to achieve higher frequency and performance in the same power envelope (to achieve higher frequency and performance due to the 20% less delay at the same power envelope), (comma) even with increased core counts (we can achieve the performance and higher frequencies in the same power envelope even when we increase core count).

So, I would say that translates into:

A 4 module 8 core BD will have 20% higher IPC per Int core, will be able to operate at higher frequencies and at the same power envelope (95 or 125W) vs 6-core (Deneb/Thuban).

We cannot have 20% less delay and at the same time have the same IPC as before, can we?? Unless I'm missing something.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Just to have an idea of past processors IPC ;)

[Image: frequencyvsipc.jpg]
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
What if he refers to the fact that the previous generation has fluctuating
IPC while BD has less fluctuating IPC, i.e., higher average IPC?...

That would be in line with the reduced execution resources,
which thus must have better average throughput...

The issue before was both. Limited bandwidth to the integer unit, and uneven performance within the integer unit.

It appears that they went both ways from the Bulldozer programming optimization docs I've read.

Before, int mult could only occur on ALU0, and LZCNT and POPCNT could only occur on ALU2... I'm sure there's more.

If the new ALU units are both equal, we could see some doubling of performance(internally) in some areas such as multiplication.

Interesting times coming.. can't wait to see how all these changes work together...

http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=1 is a nice source...

--The loon
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
@AtenRa:
Nice pic!

But IPC is not in direct connection to clock frequency or the resulting cycle time. Integer ops executing 1 per cycle wouldn't execute more (e.g. 1.2) per cycle if this cycle time is shorter.
 

intangir

Member
Jun 13, 2005
113
0
76
We cannot have 20% less delay and at the same time have the same IPC as before, can we?? Unless I'm missing something.

You're missing something. FO4 delay only directly affects frequency, not IPC. It's a measure of the logic depth between pipeline stages (how much work gets done per stage) and limits the minimum cycle time needed for a pipestage of logic to stabilize its output value. Minimum cycle time is directly translated to maximum frequency.

The statement is commenting on both essential elements of per-core performance. Performance is the product of IPC and frequency. No analysis of performance is complete without knowing both values. It's saying frequency increases while IPC remains the same.
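
To put numbers on that, here is a small Python sketch of "performance = IPC x frequency" with the cycle time set by FO4 depth. All figures are hypothetical: 25 and 20 FO4 delays per cycle, a 12.5 ps FO4 delay and an IPC of 1.5 are picked only so the 20% reduction is easy to follow. They are not AMD's real numbers.

```python
def max_frequency_ghz(fo4_per_cycle, fo4_delay_ps):
    """Upper bound on clock frequency from the logic depth per pipeline stage."""
    # Minimum cycle time = number of FO4 delays per stage x delay of one FO4 inverter.
    cycle_time_ps = fo4_per_cycle * fo4_delay_ps
    # A period in picoseconds converts to GHz as 1000 / period.
    return 1000.0 / cycle_time_ps

def per_core_performance(ipc, freq_ghz):
    # Instructions per nanosecond = instructions per cycle x cycles per nanosecond.
    return ipc * freq_ghz

if __name__ == "__main__":
    FO4_DELAY_PS = 12.5   # hypothetical process-dependent FO4 delay
    IPC = 1.5             # held constant, as in the quoted statement

    old_f = max_frequency_ghz(25, FO4_DELAY_PS)  # hypothetical older core
    new_f = max_frequency_ghz(20, FO4_DELAY_PS)  # 20% fewer FO4 delays per cycle

    old_perf = per_core_performance(IPC, old_f)
    new_perf = per_core_performance(IPC, new_f)
    print(f"frequency: {old_f:.2f} GHz -> {new_f:.2f} GHz")
    print(f"performance ratio at constant IPC: {new_perf / old_perf:.2f}")
```

With IPC unchanged, the whole gain comes from the shorter cycle: 20% fewer FO4 delays per cycle allows up to 25% more frequency, and so up to 25% more per-core performance, if the chip can actually be clocked that high.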
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
What if he refers to the fact that the previous generation has fluctuating
IPC while BD has less fluctuating IPC, i.e., higher average IPC?...

That would be in line with the reduced execution resources,
which thus must have better average throughput...

I responded earlier, but my post seems to have been lost...

Which is okay, I think I have a better way of saying it this time anyway:

I read the information as meaning the FO4 delay improvements didn't affect IPC themselves.

Indeed they shouldn't, but they should help clocking and power usage.

Bulldozer IPC is unrelated to this change unless a change in performance resulted - which it did not.

The integer unit has been optimized because it was in need of it. It could feed 3 units, but it had 6. The units were not very flexible, with only one ALU capable of multiplication. I'd hope that the 2/2 setup now uses general purpose ALU/AGU designs so 4 ops instead of the previous 3 can be accomplished concurrently in the same unit.

That would be a 33% increase over Phenom II as a starting point of IPC improvements.

--The loon
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
OK i believe i got it,

20% less FO4 at the same IPC (Constant) means = we execute one Instruction faster in time, but number of Instructions remain the same(Edit: per Cycle).

So at the end, even if we have the same IPC(Instructions Per Cycle) but because FO4 = 20% less, each Instruction is executed 20% faster(takes less time).

Phenom II = 1.5 IPC takes 10ns to execute
Bulldozer = 1.5 IPC takes 20% less time to execute, 8,33ns (Edit: 8ns)
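
A quick way to check the arithmetic in the two lines above: "20% less time" and "20% more clock" are not the same thing, and that is where 8 ns vs 8.33 ns comes from. A minimal Python sketch, using the 10 ns figure from the example (the function names are made up for illustration):

```python
def time_with_less_delay_ns(base_ns, reduction):
    # Cutting the time by a fraction: 20% less time means multiplying by 0.8.
    return base_ns * (1.0 - reduction)

def time_at_higher_clock_ns(base_ns, clock_gain):
    # Raising the clock by a fraction: 20% more clock means dividing by 1.2.
    return base_ns / (1.0 + clock_gain)

def clock_gain_from_time_cut(reduction):
    # Frequency is 1 / cycle time, so a 20% shorter cycle is a 25% higher clock.
    return 1.0 / (1.0 - reduction) - 1.0

if __name__ == "__main__":
    print(f"20% less time:  {time_with_less_delay_ns(10.0, 0.20):.2f} ns")
    print(f"20% more clock: {time_at_higher_clock_ns(10.0, 0.20):.2f} ns")
    print(f"clock gain from a 20% shorter cycle: {clock_gain_from_time_cut(0.20):.0%}")
```

So 10 ns with 20% less delay is 8 ns, while 8.33 ns is what a 20% higher clock would give.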


did i pass the class ?? :D
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
OK i believe i got it,

20% less FO4 at the same IPC (Constant) means = we execute one Instruction faster in time, but number of Instructions remain the same.

So at the end, even if we have the same IPC(Instructions Per Cycle) but because FO4 = 20% less each Instruction is executed 20% faster(takes less time).

Phenom II = 1.5 IPC takes 10ns to execute
Bulldozer = 1.5 IPC takes 20% less time to execute, 8,33ns


did i pass the class ?? :D

Precisely my understanding of it as well, and because the response is ready sooner, clock rate can increase so the potential for performance benefit can be realized ( doesn't help if the result is ready in 8ns if you don't fetch it but every 12ns... ).

An IPC improvement would be making it "fatter" so that more results are ready per cycle. If the integer core took 12ns to provide 2 results that is better than a result every 7ns.
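
As a sanity check on that comparison, a tiny Python sketch using only the two figures above (2 results per 12 ns vs 1 result per 7 ns). It treats both units as delivering results at a steady rate, which is a simplification:

```python
def throughput_per_ns(results, interval_ns):
    # Results delivered per nanosecond at a steady rate.
    return results / interval_ns

if __name__ == "__main__":
    wide = throughput_per_ns(2, 12.0)    # "fatter" unit: 2 results every 12 ns
    narrow = throughput_per_ns(1, 7.0)   # faster but narrower: 1 result every 7 ns
    print(f"wide:   {wide:.4f} results/ns, i.e. one per {1 / wide:.1f} ns on average")
    print(f"narrow: {narrow:.4f} results/ns, i.e. one per {1 / narrow:.1f} ns on average")
```

The wider unit averages one result per 6 ns and so comes out ahead on throughput, even though each individual result takes longer to arrive (12 ns vs 7 ns).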

I'm just a hobbyist, though, so I could have it completely wrong :whiste:

--The loon
 

sequoia464

Senior member
Feb 12, 2003
870
0
71
Precisely my understanding of it as well, and because the response is ready sooner, clock rate can increase so the potential for performance benefit can be realized ( doesn't help if the result is ready in 8ns if you don't fetch it but every 12ns... ).

An IPC improvement would be making it "fatter" so that more results are ready per cycle. If the integer core took 12ns to provide 2 results that is better than a result every 7ns.

I'm just a hobbyist, though, so I could have it completely wrong :whiste:

--The loon

Dude - If you're just a hobbyist it must mean I'm still dragging my knuckles when I walk. (hate it when I step on them)
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Dude - If you're just a hobbyist it must mean I'm still dragging my knuckles when I walk. (hate it when I step on them)

Well, then I hope you have a healthy supply of Bactine!! We wouldn't want your knuckles to get infected, now, would we?

Indeed so, I am just a hobbyist when it comes to microprocessor design. By trade, I am a lowly computer tech...

--The loon
 

Kalessian

Senior member
Aug 18, 2004
825
12
81
I really want BD to be good. Right now for gaming I have to recommend SB, and it's kind of annoying because the cheapest unlocked CPU is the 2500k.

I agree, I really hope AMD gets this right. I kind of don't want to buy Intel just because of this. If AMD can force Intel to release more K series chips, or if AMD has great Black Editions of their own, I'll vote for the winner with my wallet.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
All I read is that a BD core is more powerful than a K8 core,
according to AMD techies.
The link was posted around here..

Today, we have announced that we’re shipping a product (“Interlagos”,) that’s slightly more than 100 watts. There are actually 16 processors in there, and each one is more powerful than the original AMD Opteron processor.

http://blogs.amd.com/fusion/2011/09/14/what-the-amd-tech-guy-said/
 