Setting performance expectations for Bulldozer(client)


Edrick

Golden Member
Feb 18, 2010
1,939
230
106
s2011 will not be price competitive with Bulldozer. I'm sure Intel can make a really fast 8c/16t chip, but with the CPU at a thousand bucks and a motherboard at $300, who cares?

And you know this how?
 

magomago

Lifer
Sep 28, 2002
10,973
14
76
i've never read such a long thread that talked about absolutely nothing :hmm:

No offense there mate....just gotta point out that we have little to nothing to compare to
 

PreferLinux

Senior member
Dec 29, 2010
420
0
0
I don't "know" it, but Intel has been releasing high end processors at $1000 for a long time now, and I don't expect that to change.
Sure...But you don't expect AMD to do the same???:eek: I'm afraid you're badly mistaken, as they sold similarly priced CPUs when they could (Athlon FX, or something).
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
I don't "know" it, but Intel has been releasing high end processors at $1000 for a long time now, and I don't expect that to change.

If BD is beating SB (lga1155) then expect it to be priced higher. That is just smart business.

People who think AMD will take the performance crown AND continue selling their chips below $200 are sadly mistaken.
 

Mopetar

Diamond Member
Jan 31, 2011
8,494
7,751
136
Single threaded performance might not be as bad as you think. If you consider the fact that with all cores running, Bulldozer is still able to turbo up by 500 MHz, it would suggest that if only a single core is being used, the turbo could be much higher, possibly over 1000 MHz. Additionally, that one core will have the entirety of its module's L1I and L2 caches to itself.

It may not be competitive with Intel from an IPC perspective, but there are other ways to improve performance that they can take advantage of in order to come out even or pull ahead.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
8 core Sandy Bridge will likely end up faster.

Deneb can already reach 3.5GHz with 4 cores and Thuban can do 3.3GHz. With a 32nm process and lowered gate delay it would at least be expected to reach whatever clocks they have out with their current generation parts.

Core size comparison

Bulldozer without the 2nd "Integer core": 15.0mm^2
Westmere without the L2 cache: 15.3mm^2
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,076
3,908
136
8 core Sandy Bridge will likely end up faster.

Deneb can already reach 3.5GHz with 4 cores and Thuban can do 3.3GHz. With a 32nm process and lowered gate delay it would at least be expected to reach whatever clocks they have out with their current generation parts.

Core size comparison

Bulldozer without the 2nd "Integer core": 15.0mm^2
Westmere without the L2 cache: 15.3mm^2

8 core SB will also be bigger. Bulldozer looks like it will be larger than 4 core SB. Who knows about 6 core SB vs 8 core Bulldozer, but 8 core SB would be quite a bit larger. It also depends on how Intel scales the L3: do they still keep it at 2MB per core? What would be interesting at that point (8 core SB) is that the cache sizes would be very close: 16MB of L3 vs 16MB of L2/L3. From the pics I have seen, AMD's L3 looks very similar in size to its L2. It would be really good to get an idea of how much space the "other stuff" on each architecture takes up.


edit:
Wait, what kind of cache architecture does bulldozer have that that works out? xX
L1D per core
L1I per module
L2 per module
L3 per chip but distributed (like intel)
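
The sharing scheme in that list can be sketched as a small lookup. This is only an illustration: it assumes 2 cores per module and 4 modules for the 8 core part, and the function name `cache_instances` is made up for this example. The distributed L3 is counted as one logical cache.

```python
# Sharing scope of each cache level, as listed above.
CACHE_SCOPE = {
    "L1D": "core",
    "L1I": "module",
    "L2": "module",
    "L3": "chip",  # physically distributed, but one logical cache
}

def cache_instances(modules: int, cores_per_module: int = 2) -> dict:
    """Count how many separate instances of each cache level a chip would have."""
    units = {
        "core": modules * cores_per_module,
        "module": modules,
        "chip": 1,
    }
    return {level: units[scope] for level, scope in CACHE_SCOPE.items()}

# Assumed layout of the 8 core part: 4 modules of 2 cores each.
print(cache_instances(modules=4))
# {'L1D': 8, 'L1I': 4, 'L2': 4, 'L3': 1}
```

So with only one core of a module active, that core is the sole user of one of the four L1I/L2 pairs, not of every L1I and L2 on the chip.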
 

Mopetar

Diamond Member
Jan 31, 2011
8,494
7,751
136
Wait, what kind of cache architecture does bulldozer have that that works out? xX

The L1I and L2 caches are shared in each module so both cores have access to them. Since they're most likely designed to be effective when both cores are in use, it should be beneficial for certain workloads when only one core is being used.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
The L1I and L2 caches are shared in each module so both cores have access to them. Since they're most likely designed to be effective when both cores are in use, it should be beneficial for certain workloads when only one core is being used.
Okay, shared caches per module is understandable and sounds reasonable. I thought you meant the complete L1/L2 caches of the chip, not just those of one module.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
So why would they use L3 instead of L2 if they are the same size in both die size and capacity?

/noob question

What? They are not. The L3 caches on AMD chips are used as a victim buffer and are in general made for multi-core. There's an advantage to having independent (or pseudo-independent, because they're shared at the module level) L2 caches and a shared L3.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
I am wondering if we will really see a clock as low as 3.5 GHz on the client Zambezi Bulldozer parts. It might be possible if TurboCore is set very aggressively (going up to 4.5-5 GHz if other modules are switched off).

I ask because 3.5 GHz was the minimum figure given at ISSCC, and that was meant for the server parts, which are traditionally clocked lower.

Given the high speed design, the clock potential of Zambezi is about 4.5 GHz. To be more exact, the design parameter changed from 22 to 17, which means 22/17 = 1.29, i.e. roughly 30% higher clocks. And then you get a little additional boost from the process shrink on top of that.
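
The arithmetic behind that estimate can be checked with a short sketch. It assumes, as the post does, that clock scales inversely with the design parameter, and it assumes a 3.5 GHz starting clock (the figure quoted for current parts earlier in the thread). The function name is made up for this example.

```python
def scaled_clock_ghz(base_ghz: float, old_param: float = 22.0, new_param: float = 17.0) -> float:
    """Scale a clock by old_param / new_param (assumed inverse relationship)."""
    return base_ghz * old_param / new_param

ratio = 22 / 17
print(f"{(ratio - 1) * 100:.1f}% higher")    # 29.4% higher
print(f"{scaled_clock_ghz(3.5):.2f} GHz")    # 4.53 GHz
```

Under those assumptions a 3.5 GHz design lands at about 4.5 GHz before any gain from the process shrink.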

Though we might not see this clock initially because of the ramping of the 32 nm process, I also doubt that Zambezi will be stuck at a low 3.5 GHz. If we really see initial parts clocked as low as 3.5 GHz, then it is only because that is enough for the competition so far and they want to get the most out of the new process. According to leaked roadmaps there will be a 90 W and a 125 W part. Maybe the 3.5 GHz is for the 90 W TDP one.

Therefore regarding those frequencies we should be careful since very little is known especially about the client Zambezi ones.

Another thing regarding performance and the use of 8 core parts. The argument is that a 4 core CPU does it all for you. That might be, but consider the following: you run applications that use all 4 cores of a 75 Watt 4 core CPU, and it consumes 75 Watt. Or you run them on 4 cores of an 8 core CPU that would consume 75 Watt with all 8 cores loaded, but with only 4 cores in use it consumes only e.g. 50 Watt.

That means that with an 8 core CPU you get either greater performance (if the application makes use of more cores) or lower power consumption (if the application does not make use of more cores) compared to a 4 core CPU.
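
One way to read those example numbers is a simple linear model: a fixed share for everything outside the cores plus a per-core share. The 25 W and 6.25 W values below are not measurements; they are just back-fitted so that the model reproduces the illustrative 75 W (8 cores loaded) and 50 W (4 cores loaded) figures above.

```python
def package_power_w(active_cores: int, fixed_w: float = 25.0, per_core_w: float = 6.25) -> float:
    """Hypothetical linear power model: fixed share plus a share per active core."""
    return fixed_w + active_cores * per_core_w

print(package_power_w(8))  # 75.0, all 8 cores loaded
print(package_power_w(4))  # 50.0, only 4 of the 8 cores loaded
```

A real chip will not be this linear (turbo raises clocks and voltage when cores are idle), so treat it as a reading of the example, not a prediction.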
 

Riek

Senior member
Dec 16, 2008
409
15
76
8 core Sandy Bridge will likely end up faster.

Deneb can already reach 3.5GHz with 4 cores and Thuban can do 3.3GHz. With a 32nm process and lowered gate delay it would at least be expected to reach whatever clocks they have out with their current generation parts.

Core size comparison

Bulldozer without the 2nd "Integer core": 15.0mm^2
Westmere without the L2 cache: 15.3mm^2

Faster than what? BD 8c or BD 16c?

The biggest problem for SB-E will be clock speeds. Gulftown barely makes 6 cores @ 3.46GHz. Add 2 cores and a bucket of additional cache on the same process... Wouldn't be surprised if we see SB-E barely reaching 3.4GHz with all cores enabled and a 140W TDP. In that case it will strongly depend on the type of workload which one will be faster (since BD would reach 3.8GHz on all cores if the base clock is 3.3GHz). BD could still have an advantage in integer workloads.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,076
3,908
136
Faster than what? BD 8c or BD 16c?

The biggest problem for SB-E will be clock speeds. Gulftown barely makes 6 cores @ 3.46GHz. Add 2 cores and a bucket of additional cache on the same process... Wouldn't be surprised if we see SB-E barely reaching 3.4GHz with all cores enabled and a 140W TDP. In that case it will strongly depend on the type of workload which one will be faster (since BD would reach 3.8GHz on all cores if the base clock is 3.3GHz). BD could still have an advantage in integer workloads.

Cache is very power-light. Remember, a BD module is around the same size as a SB core, so if a SB core can clock high there is no reason (power-wise) that an 8 core Bulldozer can't.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Cache is very power-light. Remember, a BD module is around the same size as a SB core, so if a SB core can clock high there is no reason (power-wise) that an 8 core Bulldozer can't.


errrrhhmmm....

I thought the 8 "core" Bulldozers were gonna be ~294mm^2. This is from people who have die shots and a statement from AMD about a core size being 30-something mm^2, then scaling that up to the picture and comparing to get an idea of how big the "total die area" of the CPU is.

So ~294mm^2 for a 8 "core", maybe half that for the 4 "core" version (if its just the same cpu cut in half), and the 6 "core" probably being the same as the 8 core, with 2 cores turned off.

Now Sandy bridges are like 216mm^2, which is clearly smaller than the 8 core bulldozers.
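
Putting the numbers from this post side by side, as a rough sketch only. The 294mm^2 and 216mm^2 figures are the estimates quoted above, and halving the die for the 4 "core" version is the same simplification made above, not a known figure.

```python
BD_8CORE_MM2 = 294.0  # estimated 8 "core" Bulldozer die area (from the post)
SB_MM2 = 216.0        # Sandy Bridge die area (from the post)

# Assumption from the post: the 4 "core" part is the same die cut in half.
bd_4core_estimate = BD_8CORE_MM2 / 2

print(f"BD 4c estimate: {bd_4core_estimate:.0f} mm^2")             # 147 mm^2
print(f"BD 8c vs SB: {BD_8CORE_MM2 / SB_MM2:.2f}x")                # 1.36x
print(f"BD 4c estimate vs SB: {bd_4core_estimate / SB_MM2:.2f}x")  # 0.68x
```

Halving is optimistic, since the memory controller and I/O would not shrink along with the core count.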


a BD module is around the same size as a SB core so if sb core

Maybe.. but a cpu is more than just the "core".
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,076
3,908
136
Source please.

http://semiaccurate.com/forums/showthread.php?t=4351

That's BD compared to Westmere, and not a massive amount has changed going to SB core-wise (AVX reuses the existing SIMD units). Remember, I said core to module, not SoC to SoC.

So ~294mm^2 for a 8 "core", maybe half that for the 4 "core" version (if its just the same cpu cut in half), and the 6 "core" probably being the same as the 8 core, with 2 cores turned off.
A lot of the difference here is about cache. They could go several different ways on cache for a consumer-specific SoC; if they halved the L2/L3 or got rid of the L3, they would be much closer in die size.

The ring bus saves a nice amount of space for Intel, but I wonder how much performance it actually costs in the form of increased latency?
Maybe.. but a cpu is more than just the "core".

I know, I never said it wasn't. But then you get a completely different SoC design, with BD having almost twice the cache.