Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Status
Not open for further replies.

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Intel didn't do FMA4 in SB and likely won't use FMA4 at all. But FMA3, I hope Intel uses 3 and not 4. Intel's had FMA for years on Itanic; it won't be a problem. When we have use for it, Intel will be ready. Same as OpenCL on the IGP: when there are programs, Intel will be there.

There is no jump since there is nothing after K8. That is called a jump away. No, it's called a run away from K9 doggy jokes, ruff ruff!
And saying SB isn't anywhere near 20% faster is an out-and-out LIE. It's only true if you throw out the highs, which you all did to fit your little worlds.

When averaging, you can't throw out the highs and lows to fit your agenda. You said AVX doesn't have many programs ready. That's part of the IPC, so every time a program comes out that puts it to use, the IPC increases. So as we move forward, SB increases its IPC. That's pretty cool.

YOU talk about FMA. AMD hasn't got a CPU on the market with FMA. Please show it to me. IB will be at least 20% faster than SB, plus any surprises that are thrown in. Then let's figure QS into this IPC equation. Ya, not even close to 20%, ya right, only if you don't figure in all benchmarks. Nice math you people were taught. Throw out the highs, then don't figure AVX in because of recompiles. Nice logic.
 

Riek

Senior member
Dec 16, 2008
409
15
76
Nemesis 1 said:
Intel didn't do FMA4 in SB and likely won't use FMA4 at all. But FMA3, I hope Intel uses 3 and not 4. Intel's had FMA for years on Itanic; it won't be a problem. When we have use for it, Intel will be ready. Same as OpenCL on the IGP: when there are programs, Intel will be there.

Indeed, SB doesn't do FMA at all. At least BD has some form of AVX support. Although they do not get a throughput advantage for AVX, they do with FMA, however.

Nemesis 1 said:
When averaging, you can't throw out the highs and lows to fit your agenda. You said AVX doesn't have many programs ready. That's part of the IPC, so every time a program comes out that puts it to use, the IPC increases. So as we move forward, SB increases its IPC. That's pretty cool.

So every app that uses FMA improves BD's IPC too? Neat! Or is that way of thinking only for Intel products? Since you seem able to make a big distinction between them in the same post, it almost seems deliberate that you make that distinction.

When averaging you can't throw out the highs and the lows? So the highest difference is, for you, the IPC difference? Fine by me. But that would just mislead people, since those benefits only show up in very specialized cases. If you held Llano to the same metric, we might even see >30% increases due to the Div unit in very specific calculations. Even BD would probably show above 50% using integer SSE. Besides being completely useless as a metric, it would give nice Excel bars to brag with.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
You are making a fool of yourself..

When only 128-bit data are issued by the scheduler,
AMD's FP execution unit can execute two simultaneously,
while SB will be stuck at one 128-bit data per execution unit,
though it will execute two if AVX is implemented, and this only
when very well optimised code is implemented.

In short, BD will have very high FP single-thread performance
even with current SSE code, while SB will have to wait for
optimised software that will only allow it to be on par with BD.

BS. AMD did a 64+64=128, not so good. The only fools in these threads are the ones who bought into AMD's fairytales.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Riek said:
Indeed, SB doesn't do FMA at all. At least BD has some form of AVX support. Although they do not get a throughput advantage for AVX, they do with FMA, however.

So every app that uses FMA improves BD's IPC too? Neat! Or is that way of thinking only for Intel products? Since you seem able to make a big distinction between them in the same post, it almost seems deliberate that you make that distinction.

When averaging you can't throw out the highs and the lows? So the highest difference is, for you, the IPC difference? Fine by me. But that would just mislead people, since those benefits only show up in very specialized cases. If you held Llano to the same metric, we might even see >30% increases due to the Div unit in very specific calculations. Even BD would probably show above 50% using integer SSE. Besides being completely useless as a metric, it would give nice Excel bars to brag with.

Hey, what BD? Show me one for sale. There is no BD. If you insist there is, great. So stop trying to compare a die that's not here to a die that is. Let's do it like this: BD (not around) vs. IB (not around). That is the fairer comparison. We have all seen IB running.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Riek said:
So every app that uses FMA improves BD's IPC too? Neat! Or is that way of thinking only for Intel products? Since you seem able to make a big distinction between them in the same post, it almost seems deliberate that you make that distinction.

Nope, that's perfectly reasonable to me.
 

jones377

Senior member
May 2, 2004
461
64
91
It's very simple.... On Linpack it will look something like this

8c Bulldozer: 1x256 FMA per module per clock = 8 DP FLOPS/clock/module * 4 Modules @ 3GHz = 96GFLOPS
4c SandyBridge: 2x256 per core per clock = 8 DP FLOPS/clock/core * 4 cores @ 3GHz = 96GFLOPS

Peak FLOPS per cycle will be the same between an 8-core Bulldozer and a 4-core Sandy Bridge when Bulldozer uses FMA; otherwise SB will be 2x higher. The difference will be in clock speed and efficiency. SB runs up to 3.4GHz + Turbo currently; BD's clocks are unknown as of yet.
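
The arithmetic above can be checked with a short sketch. This is only an illustration of the peak-FLOPS formula used in the post (units x FLOPs per clock per unit x clock). The function and variable names are made up for the example, and the 3GHz clock is the post's own assumption for both chips, not a confirmed Bulldozer spec.

```python
def peak_gflops(units, flops_per_clock_per_unit, clock_ghz):
    """Theoretical peak DP GFLOPS = units * FLOPs/clock/unit * clock in GHz."""
    return units * flops_per_clock_per_unit * clock_ghz

DOUBLES_PER_256 = 256 // 64  # four 64-bit doubles per 256-bit vector
DOUBLES_PER_128 = 128 // 64  # two 64-bit doubles per 128-bit vector
CLOCK_GHZ = 3.0              # same 3GHz assumed for both chips, as in the post

# Bulldozer with FMA: one 256-bit FMA per module per clock,
# and each FMA counts as 2 FLOPs (multiply + add) per element.
bd_fma = peak_gflops(4, 1 * DOUBLES_PER_256 * 2, CLOCK_GHZ)

# Bulldozer without FMA: two 128-bit pipes, 1 FLOP per element each.
bd_no_fma = peak_gflops(4, 2 * DOUBLES_PER_128 * 1, CLOCK_GHZ)

# Sandy Bridge with AVX: one 256-bit FADD plus one 256-bit FMUL per core per clock.
sb_avx = peak_gflops(4, 2 * DOUBLES_PER_256 * 1, CLOCK_GHZ)

print(f"BD 4 modules, FMA:    {bd_fma:.0f} GFLOPS")
print(f"BD 4 modules, no FMA: {bd_no_fma:.0f} GFLOPS")
print(f"SB 4 cores, AVX:      {sb_avx:.0f} GFLOPS")
```

These are theoretical peaks only; real Linpack results depend on clocks and efficiency, which is exactly the caveat in the post.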
 

Abwx

Lifer
Apr 2, 2011
11,879
4,864
136
Nemesis 1 said:
BS. AMD did a 64+64=128, not so good. The only fools in these threads are the ones who bought into AMD's fairytales.

You are making (false) assumptions out of thin air and apparently
with absolutely no knowledge about BD FPUs...

Code doesn't have to be rewritten to support AVX in order to take advantage of the Flex FP, however. The Flex FP is comprised of two 128-bit FMAC units capable of performing FMAC, FADD, or FMUL instructions per cycle, either combined as a 256-bit instruction or split as two 128-bit instructions, which apparently provides significantly higher performance than 'competing solutions.'

http://www.bit-tech.net/news/hardware/2010/10/28/amd-unveils-flex-fp/1
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Ya, so tell me what desktop programs use FMA. All these developers are going to develop FMA for the server side, for a CPU that has less than 7% of the server market? Don't hold your breath.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
You are making (false) assumptions out of thin air and apparently
with absolutely no knowledge about BD FPUs...

JF writes a blog and he isn't trying to sell AMD? LOL. Let me know when BD shows in June. JF said it would be here in June.



http://www.bit-tech.net/news/hardware/2010/10/28/amd-unveils-flex-fp/1

OK, so you tell me all about AMD's FPUs vs. Intel's. This should be good. I mean really good. So, JF's blog: when BD shows in June I will take JF more seriously.
 

Abwx

Lifer
Apr 2, 2011
11,879
4,864
136
jones377 said:
It's very simple.... On Linpack it will look something like this

Peak FLOPS per cycle will be the same between an 8-core Bulldozer and a 4-core Sandy Bridge when Bulldozer uses FMA; otherwise SB will be 2x higher.

It will be 2X higher if AVX is used for 100% of the code,
so with the software highly optimised for SB and in case it's not
optimised for FMA capabilities...

That's quite a "fair" comparison but it will surely please
the Intel aficionado ego..

But what happens if only regular 64b or 128b code is fed
to the competing CPUs?.. ;)

Edit :

Btw ,yet another rumour...

http://tech2.in.com/news/motherboards/computex-2011-asus-may-show-amd-bulldozer-in-action/222052
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
Anand said:
Just above Llano we will have the long awaited Bulldozer CPU. AMD originally wanted to launch Bulldozer at Computex but performance issues with its B0 and B1 stepping chips pushed back the launch. Now we're looking at a late July launch with B2 silicon, but performance today is a big unknown. Apparently the performance of B1 stepping silicon doesn't look too good.


What am I missing? :confused: Where is this posted...?
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
Yea, I just found it. So many Computex articles :twisted:


Well if Anand is saying it, it must be true. BD is delayed... again.

At least they seem to have gotten Llano out the door. Llano is going to be the real money maker in 2011, and always was. It is discouraging though. They can't keep riding K8/K10 forever.
 

jones377

Senior member
May 2, 2004
461
64
91
Abwx said:
It will be 2X higher if AVX is used for 100% of the code,
so with the software highly optimised for SB and in case it's not
optimised for FMA capabilities...

That's quite a "fair" comparison but it will surely please
the Intel aficionado ego..

But what happens if only regular 64b or 128b code is fed
to the competing CPUs?.. ;)

Edit :

Btw ,yet another rumour...

http://tech2.in.com/news/motherboards/computex-2011-asus-may-show-amd-bulldozer-in-action/222052

NO

BD has 2x128-bit FMA pipes that can be ganged together for 1x256-bit FMA when using AVX. But the overall peak FLOPS is the same in SSE and AVX, excluding FMA (which needs a recompile just like AVX does).

Running a legacy SSE binary for Linpack, Bulldozer will have the same peak FLOPS per cycle as all the previous gen CPUs from Phenom and up for AMD and Core2 and up for Intel. Yes I do expect Bulldozer will have a higher efficiency (and possibly higher clockspeeds) but in Linpack we're already at about 90% of peak FLOPS. There is not much room to improve there. Running a Linpack compiled for AVX and FMA, 8C Bulldozer should get about the same FLOPS/cycle as 4C Sandy Bridge using AVX (see my earlier post). These are facts that are indisputable. It's simple math based on execution resources.

For normal codes it's a whole different ballgame, however. BD can do 2x128-bit FADD or 2x128-bit FMUL or any 2x combination thereof per cycle on legacy SSE code (the majority of apps today). PhII, Sandy, Nehalem etc. can only do 1x128-bit FADD + 1x128-bit FMUL, meaning they are not as flexible but have the same peak as Bulldozer. We have yet to see how much impact this will have on regular applications.

There is a reason why until Bulldozer all x86 FPUs have one FADD pipe and one FMUL pipe. Those are the most common FP operations in codes and tend to have a fairly even split between them, usually not exactly 50/50 though. Linpack is the exception, which is why it will also get almost a 100% boost from FMA not only on x86 but on POWER, IPF etc.

Please give this some thought before responding in your usual manner.
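
A rough way to see the flexibility argument is a toy issue model, sketched below. It assumes fully independent 128-bit operations and ignores latency, dependencies, loads/stores and front-end limits, so it is an upper bound and not a prediction. The function names are hypothetical.

```python
import math

def cycles_fixed_pipes(n_fadd, n_fmul):
    """One dedicated FADD pipe plus one dedicated FMUL pipe (PhII/Nehalem/Sandy style).
    Each pipe only handles its own op type, so the busier pipe sets the time."""
    return max(n_fadd, n_fmul)

def cycles_flexible_pipes(n_fadd, n_fmul):
    """Two pipes that can each take either op type (Bulldozer-style FMAC pipes).
    Any two ops can issue per cycle, so only the total count matters."""
    return math.ceil((n_fadd + n_fmul) / 2)

for n_fadd, n_fmul in [(50, 50), (60, 40), (75, 25), (100, 0)]:
    fixed = cycles_fixed_pipes(n_fadd, n_fmul)
    flexible = cycles_flexible_pipes(n_fadd, n_fmul)
    print(f"{n_fadd:3d} FADD / {n_fmul:3d} FMUL: fixed={fixed:3d} cycles, flexible={flexible:3d} cycles")
```

With an exact 50/50 split both designs need the same number of cycles, which matches the "same peak" point above. The further the mix drifts from 50/50, the more the flexible pipes help in this simplified model.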
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Hey guys, my wife just read this and she says I'm riding you AMD guys too hard. But hey, you got Llano, and for what it does it's going to be good for AMD. BD will be OK too. Actually, as a hardware forum we've got the best guys around. We're not Beyond3D; that's a great place also. But as a whole we're better than XS, [H], X-bit. I won't say Tom's. They're a good bunch. But anyway, I'm sorry I'm pushing your buttons so hard. But that's what we AMD, Intel, NV (xATI) guys do. I know I don't bring a lot to this forum, but what the heck. I like that ya kinda sorta put up with me. I have a lot of fun with that. I'm sure BD will be a really good CPU once GF gets the process down.

But I've got to tell you something that just drives me insane about you AMDers: YOU always want to compare to Intel's present generation.

To me that's not even plausible. You're beat before the game is afoot. Tick-tock will not and cannot work for AMD; they haven't the resources.
BD is just too late; they needed it in '09. AMD is a fighter, but if instead of fighting against Intel it worked with them, it would go much better. After all, Intel knows they need AMD. It's AMD that just doesn't get it. You get more bees with nectar than vinegar.

I've just got to say: have any of ya really thought about Intel's 22nm? If what's been reported is true or understated, do ya really get it? It's over. AMD has to stop with the vinegar. It wasn't long ago I envisioned, and I stated it here, that Intel would produce a CPU with an AMD (ATI) APU. That could still happen, but AMD has to change and change fast. If what I have heard about 22nm, as you have, is in fact true, the IGP on Intel's IB isn't what it's rumored to be. That would make no sense. If power reduction is as stated, that allows a 16-core, 32-thread IB. That's insane. Intel and AMD need to tag-team ARM. Intel has to stop making IGPs and use AMD's design team, and everyone that really counts wins as far as this CPU forum is concerned. Every day that passes, that vision is fading. I felt sure that's how it would turn out. ATI has great tech. AMD happy, Intel happy. Of course AMD would need a CPU as a brand name also, to maintain a 2-horse race.

But anyway, sorry for pushing. I try to ease off but it's so dam-----n fun. You guys are a really good bunch. But stay out of O/T & PN; that's straightjacket territory there.

A good example of just how good ya are: go read the BD delayed thread @ XS. Just read it. We're way better informed and artistic. LOL.

Why did Intel go from 1156 to 1155 on Sandy? That's the debate, or part of it. SoC wouldn't have anything to do with it, would it? Then some fools are trying to stick SBs in generation 1 boards.
Why would a desktop PC ever need 50GB of memory? We'll all know that shortly.
I mean, honestly, that thread is equal to a thread with nothing but Nemesis 1 and tweakboy. I'm not kidding.

But anyway, sorry for the hard time. BD will be just fine; it'll be like wine, it needs to age a bit.
 

GlacierFreeze

Golden Member
May 23, 2005
1,125
1
0
Someone has no life. Wife saying you ride AMDers too hard, is just her way of saying you need to pay her more attention and not some anonymous posters. :awe:
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Nemesis 1 said:
BS. AMD did a 64+64=128, not so good. The only fools in these threads are the ones who bought into AMD's fairytales.

What do you mean? 2 64b doubles in a SIMD register or the processing of 128b as 2 64b executions? The latter has been replaced by 128b processing (2 64b double FMUL and 2 64b double FADD per cycle) with K10 a couple of years ago.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Give it a rest. So AMD never used K9 but did a K8; that's a jump over. You've been led like a little lamb. I've never seen anything good in AMD's design. AVX is a joke on BD: 128 + 128 = 256 vs. Intel's 256. All the talk, and there really isn't anything there. IPC per core will be slower, period. AVX doesn't count yet, as none have figured it into SB's IPC over generation 1. SB is at least 20% faster than generation 1 or it wouldn't be kicking Intel's highest-end CPU around. That has 2 more cores. Ya, I know it's slower in multi-threaded, but not by much. Yet the losers still use the 15% figure; it's really amusing. To top it off, ya blame GF; that is rich. The fabs moved out of Germany and hired all new people? Really.
There were cancelled 6-wide and 8-wide designs. Who cares?

What is the joke in doing e.g. FADDs as 128+128 or 256 vs. doing 128 (1x) or 256? It's a tradeoff, like many other design decisions. Same for Intel. BTW, when using FMAs, IPC (instructions per clock) might even go down a bit because fewer instructions are needed to do the same calculations. Sure, throughput would go up.
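
The IPC-versus-throughput point can be put in numbers with a small sketch. The cycle counts below are hypothetical, picked only to illustrate the effect; they are not measurements of any real CPU, and the helper names are made up for the example.

```python
def ipc(instructions, cycles):
    """Instructions retired per clock."""
    return instructions / cycles

def flops_per_cycle(flops, cycles):
    """Useful floating-point work done per clock."""
    return flops / cycles

N = 1000  # elements in a loop computing a*b + c

# Separate FMUL + FADD: 2 instructions and 2 FLOPs per element.
plain_instr, plain_flops = 2 * N, 2 * N
plain_cycles = 1000  # hypothetical cycle count

# FMA: 1 instruction per element, still 2 FLOPs per element.
fma_instr, fma_flops = N, 2 * N
fma_cycles = 600     # hypothetical: faster, but not twice as fast

print(f"FMUL+FADD: IPC={ipc(plain_instr, plain_cycles):.2f}, "
      f"FLOPs/cycle={flops_per_cycle(plain_flops, plain_cycles):.2f}")
print(f"FMA:       IPC={ipc(fma_instr, fma_cycles):.2f}, "
      f"FLOPs/cycle={flops_per_cycle(fma_flops, fma_cycles):.2f}")
```

In this made-up case the FMA version retires fewer instructions per clock yet finishes sooner and does more FLOPs per cycle, which is why IPC alone is a poor yardstick once the instruction count changes.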

Soon we'll see your +(20%+) performer:
intel_roadmap_ivy_bridge_1.png

http://www.xbitlabs.com/news/cpu/di...t_Generation_Ivy_Bridge_Processors_Slide.html

And GF's fabs in Dresden look like being alive and kickin' (or do you see a moving truck there?):
5470990_7ed30c7a58_m.jpeg

http://citavia.blog.de/2011/04/05/a...onstruction-work-at-globalfoundries-10934679/
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
You missed the point. He was blaming GF, so I said: what, did they move the German fabs and hire all new people? Meaning it's the same bunch that was at AMD already before they sold the fabs. That's all. AMD bit off a lot on this new CPU, in process in more than one way, plus a new arch. That's a big bite, and a hard trick to pull off.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Also, get used to IB hype. Don't worry though, it won't be 2 years of hype, just 6 months, if that's a comfort.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Someone has no life. Wife saying you ride AMDers too hard, is just her way of saying you need to pay her more attention and not some anonymous posters. :awe:

LOL, ya, this 60-year-old guy has no life. I have led a very exciting, adventure-filled life, thank you very much.
 