Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.


itsmydamnation

Diamond Member
Feb 6, 2011
OK, I think we are done with the 6 & 8 vs. 4 core debate.

I would like to know a little more about how tessellation works. From what I've read it's basically like subdivision or something like that. But I'm wondering if someone can explain how it works, why it's so special, and what it's all about.

In three paragraphs or fewer per post, please.

It is basically a form of geometry compression; there is no reason you couldn't build the same meshes without tessellation as you do with it (except for performance). It allows a massive decrease in bandwidth requirements, but then you are limited to what the hardware can do with the data that is supplied.

The issues with tessellation aren't really issues with tessellation, but rather issues with small triangles on current hardware and with rasterization in general. ROPs work on squares of pixels, but each square can only contain part of one triangle. So when you use really small triangles you increase wasted resources in the ROPs: the higher the tessellation level, the smaller the triangles, and the more resources are wasted.

When you start getting sub-pixel-sized triangles (i.e. high tessellation levels, 32x/64x), you really start to kill ROP performance for almost no benefit.

NV has a performance advantage at high tessellation levels in benchmarks because they're not ROP limited; make them ROP limited and, all of a sudden (assuming comparable ROP hardware), performance will be identical.

Turn on wireframe in Heaven: the parts where all you can see is a sea of white and nothing else are the places where you're losing ROP performance for almost no gain.
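A toy model of that small-triangle/quad waste (a Python sketch of my own with made-up constants; real GPUs count this differently, but the trend is the point):

```python
# Toy model: assume the rasteriser shades in 2x2 pixel quads and a quad can
# only hold fragments from one triangle. As tessellation shrinks triangles
# toward a pixel, the share of shading work that lands on covered pixels
# collapses.
def quad_efficiency(triangle_area_px):
    covered = triangle_area_px                                            # pixels the triangle actually covers
    quads = max(1.0, triangle_area_px / 4.0 + triangle_area_px ** 0.5)    # rough quad count, including edge quads
    shaded = 4.0 * quads                                                  # every touched quad shades all 4 pixels
    return covered / shaded                                               # fraction of shading work that is useful

for area in (64, 16, 4, 1, 0.25):
    print(f"triangle area {area:>5} px -> ~{quad_efficiency(area):.0%} of quad work useful")
```

With sub-pixel triangles the model bottoms out around a few percent useful work, which is the "sea of white" case above.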
 

wahdangun

Golden Member
Feb 3, 2011
BTW, what is the max Turbo Core for Bulldozer? I heard that it will be 1 GHz+ when not all cores are being used and 500 MHz when all cores are busy.
 

ShadowVVL

Senior member
May 1, 2010
Is that the 8-core or the 16-core? If it's the 8-core, then damn, that's sweet.

It's probably fake, so I will wait and see what JF has to say before I start getting my hopes up.
 

itsmydamnation

Diamond Member
Feb 6, 2011
Well, from the Hot Chips presentation (last year), IBM is releasing 5.2 GHz chips with 24 MB of L3 and 192 MB of L4 on a 45nm process. So there is no reason an AMD chip designed for higher clocks (remember, the STARS integer core hasn't been touched in over 10 years, and the front end is all new) on 32nm SOI with HKMG couldn't be clocking in the range you speak of. Look at the L2 cache latency: the only way that works (more access cycles but lots of data) is with a high clock speed.
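A rough sanity check of that latency argument (Python, with made-up clock speeds; nothing here is an announced AMD spec):

```python
# Wall-clock cache latency is cycles / frequency, so a cache that takes more
# cycles can still be as fast, or faster, if the design clocks higher.
def l2_latency_ns(cycles, clock_ghz):
    return cycles / clock_ghz

print(l2_latency_ns(15, 3.2))   # a 15-cycle L2 at 3.2 GHz -> ~4.7 ns
print(l2_latency_ns(18, 4.5))   # an 18-cycle L2 at a hypothetical 4.5 GHz -> 4.0 ns
```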

While you would be a brave man to put your eggs in the 5 GHz basket, I wouldn't make any assumptions about clock speed just yet.

Interesting that the IBM chip has "only" two LSUs, with an instruction set that is meant to be far better at extracting IPC.
 

jvroig

Platinum Member
Nov 4, 2009
Well, from the Hot Chips presentation (last year), IBM is releasing 5.2 GHz chips with 24 MB of L3 and 192 MB of L4 on a 45nm process. So there is no reason an AMD chip designed for higher clocks (remember, the STARS integer core hasn't been touched in over 10 years, and the front end is all new) on 32nm SOI with HKMG couldn't be clocking in the range you speak of. Look at the L2 cache latency: the only way that works (more access cycles but lots of data) is with a high clock speed.
IBM's chips are in a whole different class with regards to size, power, and thermals, and cannot be compared to those produced by AMD and Intel.

If Intel and AMD's size, power and thermal budgets were raised, they would certainly have produced very different chips.
 

evolucion8

Platinum Member
Jun 17, 2005
OK, the first one is encoding, which we don't have to argue about; you need a gigantic number of cores or a tiny sample before that runs into problems ;)

Never played Mafia 2, so I can't argue with that, but I've played ME1 and ME2 on an E8400, and if you're claiming they're bottlenecked by the CPU, then either they're completely unoptimized for more than two threads (which IIRC is not the case) or your quad is running at 2 GHz, since my good old E8400 gets over 60 fps there.
A great case where you may see a large percentage increase that isn't really interesting with a refresh rate of 60 Hz.


I really should look up some old posts from when the dual/quad debate was hot - it should be quite funny in retrospect.

Of course ME1 and ME2 will run great on a dual-core CPU, but when you move to a quad you can see that CPU utilization scales with the number of cores. A hexa-core will probably show the same thing, except that the game is FPS capped and will give you no gains anyway.
 

itsmydamnation

Diamond Member
Feb 6, 2011
IBM's chips are in a whole different class with regards to size, power, and thermals, and cannot be compared to those produced by AMD and Intel.

If Intel and AMD's size, power and thermal budgets were raised, they would certainly have produced very different chips.

Not really; Intel has both x86 and IA-64 chips in that size range, and if you factor out the cache the IBM chips aren't that big. Also, that's a 45nm process; 32nm SOI should be faster both because of the shrink and because of the added HKMG.

Thermals/power (the same thing, really) do matter, but again we are talking about adding a 45nm-to-32nm-HKMG jump to the equation. I'm not saying they're directly comparable, but it's just as valid (if not more so) to look at what IBM has done with the same process as it is to compare AMD to Intel.
 

IntelUser2000

Elite Member
Oct 14, 2003
Not really; Intel has both x86 and IA-64 chips in that size range, and if you factor out the cache the IBM chips aren't that big. Also, that's a 45nm process; 32nm SOI should be faster both because of the shrink and because of the added HKMG.

Thermals/power (the same thing, really) do matter, but again we are talking about adding a 45nm-to-32nm-HKMG jump to the equation. I'm not saying they're directly comparable, but it's just as valid (if not more so) to look at what IBM has done with the same process as it is to compare AMD to Intel.

As long as we are speculating, there's no reason Intel chips can't reach those frequencies either. However, it remains true that the only chip capable of reaching 5 GHz is POWER7. I know they also use 200 W of power and cost more than $10k for the high-end chip. Some even use liquid cooling to cool the chips.
 

jvroig

Platinum Member
Nov 4, 2009
but it's just as valid (if not more so) to look at what IBM has done with the same process as it is to compare AMD to Intel.
Are you aware what the Z-chips are for? Or how they are deployed, how they are cooled, and how much power they use?

Have you actually worked with a Z-series from IBM, or even just read the technical manual for it?

If you have any of those experiences, you would know that what clockspeeds IBM can get is no indication of what AMD/Intel may accomplish. You may guess all you want, and that would be well within your right, but using IBM's accomplishment for their own mainframe series as basis for what AMD/Intel can do is certainly more than just a little off.

Even if we were to totally discount the entirely different architecture, size and power, the fact that these Z-chips will be deployed in cages that come with their own modular refrigeration units (the base of the cages are actually ref units) as vital cooling (not just a dinky heatsink+fan, or radiator+fan), should tell you that these clockspeeds are in no way indicative of anything that will be deployed by Intel/AMD. It's simply no use dragging IBM into the picture when talking about mass-market products from Intel/AMD.

You could have just based your guess on what AMD/Intel has already done, and historical clockspeed increases due to process changes without drastic uarch changes, plus account for the new uarch meant for higher clocks, etc. I'd have no argument with that. Dragging IBM's mainframe chips into the picture is a whole different story entirely.
 

itsmydamnation

Diamond Member
Feb 6, 2011
Are you aware what the Z-chips are for? Or how they are deployed, how they are cooled, and how much power they use?
Yes, and clock speed isn't the only thing determining power requirements.

Have you actually worked with a Z-series from IBM, or even just read the technical manual for it?
I haven't worked on them, but I have worked in very large datacentres alongside them (I'm a network engineer/architect).

If you have any of those experiences, you would know that what clockspeeds IBM can get is no indication of what AMD/Intel may accomplish.
I never said Bulldozer will hit 5.0 GHz, but it shows what the process can do on a new design rather than on a 10+-year-old front end and integer core. Saying the Z draws too much power without considering workloads isn't fair either.

You may guess all you want, and that would be well within your right, but using IBM's accomplishment for their own mainframe series as basis for what AMD/Intel can do is certainly more than just a little off.
Not really; no more so than using 45nm STARS or SB to gauge Bulldozer clock speed. You can get what, a 3.6 GHz quad from AMD, whose front end and integer cores largely haven't changed since K7? Just look at the L2 cache latency change from STARS to Bulldozer: 15 cycles to what, 18, while reducing L1 size, adding pipeline stages, etc.

Even if we were to totally discount the entirely different architecture, size and power, the fact that these Z-chips will be deployed in cages that come with their own modular refrigeration units (the base of the cages are actually ref units) as vital cooling (not just a dinky heatsink+fan, or radiator+fan), should tell you that these clockspeeds are in no way indicative of anything that will be deployed by Intel/AMD. It's simply no use dragging IBM into the picture when talking about mass-market products from Intel/AMD.
Again, not really; it's far more efficient to cool like this. I work on far bigger, more power-hungry bits of kit that are fan cooled, with forwarding rates up to 92 Tbit/s (the systems I have worked on aren't that big, but still several racks), and the power allocation just for the fans internal to the device is over 600 watts per rack, with very good localised cold airflow needed on top of that as well. While the heat load is more spread out in my situation, it's far higher as well.

You could have just based your guess on what AMD/Intel has already done, and historical clockspeed increases due to process changes without drastic uarch changes, plus account for the new uarch meant for higher clocks, etc. I'd have no argument with that. Dragging IBM's mainframe chips into the picture is a whole different story entirely.

I just see it as another metric to consider; it gives an idea of what the process is capable of. We really have no idea how much clocks are limited by the existing uarch.

But yes, I expect to be enjoying a 5.0+ GHz OC on an 8-core consumer processor; we will have to wait and see :D

I.e. I'm not using Z as a basis for what Bulldozer can do, but as a basis for what 45nm SOI could do and what we might be able to expect from 32nm SOI HKMG. Hell, POWER7 can do over 4 GHz on 45nm SOI; again, I'm not talking about the Bulldozer uarch, just the SOI process.

Edit: it's 1 am here, so I don't know how coherent this post is... lol
 

Voo

Golden Member
Feb 27, 2009
Again, not really; it's far more efficient to cool like this. I work on far bigger, more power-hungry bits of kit that are fan cooled, with forwarding rates up to 92 Tbit/s (the systems I have worked on aren't that big, but still several racks), and the power allocation just for the fans internal to the device is over 600 watts per rack, with very good localised cold airflow needed on top of that as well. While the heat load is more spread out in my situation, it's far higher as well.
Well, the thing is that IBM is able to cool these things pretty efficiently and doesn't have the gigantic variations Intel/AMD have to live with. Custom cooling solutions do have their advantages, as does being able to spend a considerable amount of money on them (a cooling solution that costs a few thousand bucks is a bit easier to justify when the product you're buying costs millions ;) )

Also, don't forget that noise is really the least of their problems with those things - you almost need hearing protection in some datacenters.

It's just a completely different field of application; I wouldn't want to draw too many conclusions from one to the other. I mean, one z196 with SC has a total chip size of about 10 cm^2.
 

JFAMD

Senior member
May 16, 2009
BTW, what is the max Turbo Core for Bulldozer? I heard that it will be 1 GHz+ when not all cores are being used and 500 MHz when all cores are busy.

We have only disclosed the 500 MHz all-core boost number; we have not disclosed the max.

Is that the 8-core or the 16-core? If it's the 8-core, then damn, that's sweet.

It's probably fake, so I will wait and see what JF has to say before I start getting my hopes up.

I can't say anything about the data but I can tell you that I have never seen a chart in that format at AMD.
 

hamunaptra

Senior member
May 24, 2005
My hopes for BD:
Launch desktop speeds of 3.7 GHz up to 4.3 GHz at the high end... frequencies before turbo.
Or maybe...
A 3.5 GHz clock with a 4.25 GHz single-thread turbo, and at the high end a 4 GHz clock with a 4.75 GHz single-thread turbo.

OC ability: 5 GHz+ on air cooling at decent voltage, before turbo - actually with turbo disabled.
OC ability under extreme cooling with a matured process: hopefully the 10 GHz barrier broken!!!!

If the chart I posted holds true at all and is the 8-core CPU at stock clocks... AMD is going to freaking OWN! That's an 8-core against Intel's 6-core, assuming HT is enabled.

That's 1750 points per core for the 980X,
2125 per core for BD,
2325 per core for SB.

Given that clocks are unknown... yeah, that's something else.
But that puts it in line with all the speculation up to this point: faster than Nehalem per core, slower than SB per core, and much faster than Thuban.
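Just to make the arithmetic explicit (Python; the chart totals are whatever those per-core figures imply, i.e. rumour, not confirmed results):

```python
# Per-core points multiplied back out into the implied chart totals.
chips = {
    "Core i7 980X":         (1750, 6),
    "Bulldozer (rumoured)": (2125, 8),
    "Sandy Bridge 2600K":   (2325, 4),
}
for name, (pts_per_core, cores) in chips.items():
    print(f"{name}: {pts_per_core} pts/core x {cores} cores -> implied total ~{pts_per_core * cores}")
```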

Take away HT on Intel in those graphs and they probably lose another 10-20%...
AMD YOU FREAKIN OWN IF THIS GRAPH IS ANYWHERE NEAR REAL lol!

If AMD keeps overclocking headroom as high as they do on their current products... then enthusiasts are gonna have some fun!
 

Ajay

Lifer
Jan 8, 2001
Hopefully we'll see some numbers, or get a sense of BD, from whatever demos AMD will be putting up @ CeBIT. BD will be getting pretty close to launch, so I think AMD will start priming the PR pipeline by then.
 

Mopetar

Diamond Member
Jan 31, 2011
- So does OpenCL run on both CPUs and GPUs?
- Virtualization on a desktop? Ya, I do it, and it's kinda cool, but isn't that what servers are for? Why not just go whole hog and run a browser over a networked VM?
- Westmere/SB, and I think BD, have hardware AES encryption. Again, it is massively faster than any general-purpose CPU could ever hope to be.
- Flash video? I'm glad you brought it up. You can play 1080p YouTube now on a 1.4 GHz Core 2 Duo? How? GPU acceleration.

I think you'd need something at the OS level similar to Apple's GCD in order to take advantage of being able to delegate code to different chips. OpenCL is designed for GPGPU programming. The main issue there is that there are multiple competing standards right now so developers may eschew writing code for any of them in case it never gains critical market share.
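A minimal sketch of the "runs on CPUs and GPUs" point, assuming the pyopencl package and at least one OpenCL driver are installed (a hypothetical snippet, not tied to any vendor's toolchain):

```python
import numpy as np
import pyopencl as cl

# List every OpenCL device the installed drivers expose - CPU, GPU, or other.
for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(platform.name, "->", dev.name, cl.device_type.to_string(dev.type))

# Run a trivial kernel on whatever device create_some_context() picks.
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a = np.arange(16, dtype=np.float32)
a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=a)
prg = cl.Program(ctx, """
__kernel void double_it(__global float *a) {
    int i = get_global_id(0);
    a[i] = a[i] * 2.0f;
}
""").build()
prg.double_it(queue, a.shape, None, a_buf)
cl.enqueue_copy(queue, a, a_buf)
print(a)
```

The same kernel source runs on a CPU or GPU device; which one you get depends entirely on the installed runtimes, which is part of why adoption has been uneven.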

You could use a thin client approach, but as soon as you lose the network you've lost browser ability. There are also problems if the servers go down or are under too much load.

Pretty sure the AES encryption is dedicated hardware specifically designed for the algorithm. Doing it on dedicated hardware is faster than using general purpose CPU computation or GPU computation.
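To illustrate how invisible that dedicated hardware is from software, here is a hedged Python sketch using the third-party cryptography package, which calls into OpenSSL; whether AES-NI actually gets used underneath depends on the CPU and the OpenSSL build, and the code is identical either way:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)            # 96-bit nonce, as recommended for GCM
aead = AESGCM(key)

ciphertext = aead.encrypt(nonce, b"benchmark data goes here", None)
plaintext = aead.decrypt(nonce, ciphertext, None)
assert plaintext == b"benchmark data goes here"
```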

It's not hardware accelerated in Linux and I think they only just recently added it for Macs.

Again, the most demanding tasks we do on a CPU can be ported over to the GPU or other specialized hardware or instructions (like AES, for example). Encryption, decryption, video encoding, video decoding, 3D rendering, scientific modeling and the user interface itself can all be offloaded to the GPU. . . . Look at ARM. Their A9 chips can do hardware 1080p encoding in real time. I don't think you can do that even on a multi-core CPU.
Most of those special cases are handled by dedicated hardware. Take a look at a picture of the Tegra 2 found in the LG Optimus 2X.
[Image: Tegra 2 SoC]


Most of the chip is dedicated hardware to deal with those tasks. I don't think the ARM CPU cores even touch a lot of the things that you listed. They don't offload it to the GPU because in some cases it's even cheaper to create dedicated hardware. It makes sense for phones, as neither the CPU nor the GPU will be very powerful compared to traditional computers, and the range of supported formats is usually small.

For a CPU it doesn't make as much sense to include dedicated hardware: it's highly specialized, and if the workload doesn't use it, it never gets exercised and is essentially a waste of space. A CPU is also expected to handle a broad range of formats. If one of these devices doesn't have dedicated hardware for WebM (or some other codec), it will have to use the CPU/GPU, which will eat a lot of power. A BD/SB CPU, on the other hand, can easily handle it with one or two cores, even if it is much less efficient. If you thought having extra cores that might not get used was pointless, having a lot of dedicated hardware that has almost no chance of being used is even more pointless.

CPUs have to be made for general purpose. There might be some dedicated hardware for common tasks, but they can't be overly specialized. All of that dedicated H.264 hardware in the current generation of devices won't be very useful when the world moves on to the next codec.

Multiple cores are going to be good at handling whatever workload is thrown at them. They'll also be useful for things such as AI in gaming, something that GPUs would be absolutely abysmal at handling. The more cores you have, the more AI threads can be run, even if the game only uses a small amount of CPU otherwise. Look at games like Civilization V, where an increase in the number of cores can result in massive performance improvements. There was an article linked in a GPU discussion where moving from 4 cores to 6 resulted in a 40% performance improvement when measuring the amount of time it took for the AI to take its turn.
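As a back-of-envelope on that 4-to-6-core result (Python; the 40% figure is the one quoted above, while the parallel fractions are assumptions plugged into Amdahl's law, not measurements from the linked article):

```python
# If the reported ~40% gain from 4 to 6 cores is read as a ~1.4x speedup,
# Amdahl's law hints at how parallel the AI turn would have to be.
def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for p in (0.80, 0.90, 0.95, 0.96, 0.99):
    s = amdahl_speedup(p, 6) / amdahl_speedup(p, 4)
    print(f"parallel fraction {p:.2f}: 4 -> 6 core speedup ~{s:.2f}x")
# A ~1.4x gain only falls out if well over 90% of the turn time is parallel,
# which is why extra cores help this workload so much.
```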

Even though CPUs are very general purpose and there are many tasks that will run better on a GPU, there's always going to be a need for good general purpose computing and there are plenty of uses for more cores. Even if BD doesn't match SB performance per core, if they have more cores, some workloads are going to do better on BD.
 

Mopetar

Diamond Member
Jan 31, 2011
Here's something interesting I've only seen posted on this other forum so far, but... um, check out this post:
If it's anywhere near real... I'm freakin' PUMPED!!!
http://www.amdzone.com/phpbb3/viewtopic.php?f=532&t=138369&start=25#p198909

I'm fairly skeptical.

The other problem is that it doesn't tell us anything about the BD chip being used. It could very well be a 16-core server chip that was overclocked. Great numbers, but if it's for a $1,500 chip, I wouldn't consider it very competitive with the 2600K.
 

Mopetar

Diamond Member
Jan 31, 2011
We have only disclosed the 500 MHz all-core boost number; we have not disclosed the max.

That's not a bad number, considering Intel only steps up 100 MHz when all cores are in use.

However, it does concern me that if you can turbo that high, the regular clock rate might be lower than hoped. Then again, that's not necessarily a bad thing if the IPC is really good, but most of the technical articles I've read have suggested that BD would have a long pipeline and a high clock.
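Some quick arithmetic with hypothetical base clocks (only the 500 MHz all-core boost has been disclosed in this thread; the base clocks and the 1 GHz+ figure are assumption/rumour):

```python
# What the disclosed and rumoured boost numbers would mean for effective clocks.
all_core_boost = 0.5     # GHz, disclosed in this thread
light_boost    = 1.0     # GHz, rumoured, unconfirmed

for base in (3.2, 3.5, 3.8):   # assumed base clocks, not announced
    print(f"base {base:.1f} GHz -> all-core turbo {base + all_core_boost:.1f} GHz, "
          f"lightly threaded up to ~{base + light_boost:.1f} GHz (if the rumour holds)")
```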

Hopefully we'll know soon enough. Does it ever get boring watching everyone guess and speculate while you know how good the performance is going to be?
 

hamunaptra

Senior member
May 24, 2005
I'm fairly skeptical.

The other problem is that it doesn't tell us anything about the BD chip being used. It could very well be a 16-core server chip that was overclocked. Great numbers, but if it's for a $1,500 chip, I wouldn't consider it very competitive with the 2600K.

I think if that were the 16-core chip... AMD would be in serious, serious trouble. Because if 16 cores isn't even double the speed of the 6-core 980X... um, then... yeah, BD truly SUCKS.

I am pretty sure AMD didn't spend the last 7 years on a uarch that requires nearly 3x more cores to compete with Intel, LOL!

If anything, this has to be some form of the 8-core BD, since BD itself is 8 cores; the 16-core chip is an MCM.

Well, technically speaking, BD refers to the individual module. But I still think of BD as the base 8-core, non-MCM part =P
 

Mopetar

Diamond Member
Jan 31, 2011
I think if that were the 16-core chip... AMD would be in serious, serious trouble. Because if 16 cores isn't even double the speed of the 6-core 980X... um, then... yeah, BD truly SUCKS.

I am pretty sure AMD didn't spend the last 7 years on a uarch that requires nearly 3x more cores to compete with Intel, LOL!

If anything, this has to be some form of the 8-core BD, since BD itself is 8 cores; the 16-core chip is an MCM.

Well, technically speaking, BD refers to the individual module. But I still think of BD as the base 8-core, non-MCM part =P

Look at the flip side, though. If that number is accurate, it means that a BD core has better performance than an SB core without Hyper-Threading and only slightly worse performance than an SB core with Hyper-Threading. That, of course, assumes that they have a similar IPC and clock rate, or that the BD chip has worse IPC but a much higher clock rate.

Look at the quoted performance numbers that have appeared in this thread several times: 50% more performance with 33% more cores. With that in mind, an 8-core BD should score around 9,000 based on the score for the 1100T. A score of 17,000 would be nearly triple the 1100T's score with only 33% more cores. The only other explanation is that it's a benchmark that the new architecture really excels at for one reason or another and isn't indicative of general performance.
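The arithmetic behind that, spelled out (Python; the ~6,000 Thuban baseline is just what the ~9,000 expectation implies, and 17,000 is the rumoured chart score, not a confirmed result):

```python
# Compare the rumoured BD score against what the "50% faster with 33% more
# cores" claim would predict from the implied 1100T baseline.
expected_bd = 9000
thuban_1100t = expected_bd / 1.5      # implied 1100T baseline (~6000)
rumoured_bd = 17000
print(f"implied 1100T baseline ~{thuban_1100t:.0f}")
print(f"rumoured BD score is ~{rumoured_bd / thuban_1100t:.1f}x that baseline, "
      f"vs the ~1.5x the '50% faster' rumour would suggest")
```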

The BD score looks too good to be true. If it is real, BD is potentially more beastly than previously imagined, and by a rather large amount at that.
 