Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Status
Not open for further replies.

purefun1965

Member
Dec 23, 2009
109
0
76
While they may know what you are posting, the next question is, are they actively trying to find out who you are, then doing this behind the scenes...
;)

Then again, you could be JF-AMD as well... :hmm: :ninja: :biggrin:

They know who I am and they aren't upset. I haven't said anything yet.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Sorry people no takers. Thought so lol!
You could use what is assumed to be JF's secret information transport protocol (SITraP):

Someone asks a yes/no question.

You answer:
a) Yes, if that's the right answer and you're allowed to talk about it.
b) No, if that's the right answer and you're allowed to talk about it OR if you're not allowed to talk about it but there is a need to prevent additional rumours.
c) No comment, if the answer is "Yes" and you're not allowed to talk about it. ;)
 

psolord

Golden Member
Sep 16, 2009
1,976
1,207
136
But.. who cares?

Remember back in the day, when having the fastest CPU actually meant something? When going with an Athlon meant you could get a couple more fps in games at real resolutions (or, more to the point, we were actually gaming at 1024x768 in those days).

Now it's like... I don't really care that much. I mean, it is marginally useful for photo editing and stuff, but... it's not particularly exciting. I sometimes edit photos on my 1.86GHz Core 2 Duo. We're talking about a system with 1/8th the performance of 8-core Bulldozer, and it still works fine, just not quite as smooth.

There has never been another time when a system could be 1/8th the speed of another and still be able to perform the same tasks, with such little sacrifice in "use" performance. The difference in the experience between having a 300MHz Pentium 3 and a 3GHz Pentium 4 is astronomical. For 90% of people, a Core 2 Duo with an SSD will feel faster than Bulldozer with a HDD, even if the Bulldozer scores 10 times higher in Handbrake.

Am I the only one that finds PC hardware to be less exciting now than it was 5 years ago? And the consoles have made the PC graphics world kinda boring as well.

You would be absolutely right, if you hadn't mentioned gaming at all.

CPU performance can and will affect gaming performance in many cases. I am not talking about going from 100fps to 150fps with a faster CPU. I am talking about gaming below 60fps and how the CPU can affect that.

I did some quick testing on Witcher 2 yesterday (bench location=camp at the beginning of the game) and found out the following.

i7 860 stock + GTX 570 @ 850 MHz
2011-05-18 19:15:01 - witcher2
Frames: 2454 - Time: 72728ms - Avg: 33.742 - Min: 14 - Max: 43

i7 860 @ 4 GHz + GTX 570 @ 850 MHz
2011-05-18 19:26:39 - witcher2
Frames: 3649 - Time: 76784ms - Avg: 47.523 - Min: 29 - Max: 61

Also their respective CPU, GPU and framerate graphs:

[Graph: i7 stock]

[Graph: i7 4 GHz]

What we see here is a 41% performance difference, which scales almost linearly with the CPU clock increase. This is the CPU affecting gaming at its finest!
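For anyone who wants to check the numbers, here is a rough sketch in Python. It recomputes the average fps from the two log lines above and compares the gain with the clock increase. The 2.8 GHz stock clock is an assumption (the base clock of the i7 860, ignoring Turbo), and the function names are only illustrative.

```python
def avg_fps(frames: int, time_ms: int) -> float:
    """Average frames per second from a frame count and a duration in ms."""
    return frames / (time_ms / 1000.0)


def gain_percent(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Frame counts and times taken from the two benchmark runs above.
stock_fps = avg_fps(2454, 72728)
oc_fps = avg_fps(3649, 76784)
fps_gain = gain_percent(stock_fps, oc_fps)

# Assumption: stock means the 2.8 GHz base clock, with Turbo ignored.
STOCK_GHZ = 2.8
OC_GHZ = 4.0
clock_gain = gain_percent(STOCK_GHZ, OC_GHZ)

print(f"stock: {stock_fps:.3f} fps, overclocked: {oc_fps:.3f} fps")
print(f"fps gain: {fps_gain:.1f}%, clock gain: {clock_gain:.1f}%")
```

Under that assumption the framerate gain lands a couple of points below the clock gain, which fits a scene that is almost entirely CPU-bound.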

The same behavior has been seen in a few other games as well, Arcania being one of them. So from a gamer's point of view, a Bulldozer could be a very exciting product, if that claim of a 50% performance gain over the i7 950 is anywhere near true.

From the CPU usage graphs above, we can see that the i7 is barely sweating and yet the framerate of the game is well below 60fps. For me this is unacceptable and unplayable. The reason for this is too technical for my understanding, but bad programming comes to mind. Also, the lack of a DX11 render path may mean that there is no multithreaded rendering going on in Witcher 2, and thus this is as far as an i7 can go.

Now I don't know if Bulldozer can somehow help with this, so I'd like some input from more experienced users.

My wishful thinking, along with my limited understanding, says that Bulldozer with its 8 cores and four 256-bit FPUs could break down existing 128-bit SSE FPU instructions in such a way that it could execute more instructions per clock, hence giving a crazy performance boost. If something like that could happen, we may see cases where even a dual-module/quad-core 4110 would be faster than an 860, and Witcher 2 could be such a case.

Am I right or wrong here? Have I gotten things completely wrong? :p Anyone?
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
But.. who cares?

Remember back in the day, when having the fastest CPU actually meant something? When going with an Athlon meant you could get a couple more fps in games at real resolutions (or, more to the point, we were actually gaming at 1024x768 in those days).

Now it's like... I don't really care that much. I mean, it is marginally useful for photo editing and stuff, but... it's not particularly exciting. I sometimes edit photos on my 1.86GHz Core 2 Duo. We're talking about a system with 1/8th the performance of 8-core Bulldozer, and it still works fine, just not quite as smooth.

There has never been another time when a system could be 1/8th the speed of another and still be able to perform the same tasks, with such little sacrifice in "use" performance. The difference in the experience between having a 300MHz Pentium 3 and a 3GHz Pentium 4 is astronomical. For 90% of people, a Core 2 Duo with an SSD will feel faster than Bulldozer with a HDD, even if the Bulldozer scores 10 times higher in Handbrake.

Am I the only one that finds PC hardware to be less exciting now than it was 5 years ago? And the consoles have made the PC graphics world kinda boring as well.

Don't worry. Give it time. The hardware manufacturers will be sure to supply enough bloated code (out of the kindness of their hearts to help out the poor software devs) to slow lesser procs and video cards to a crawl. Then you won't feel so bad buying that shiny new hardware. :p

Actually, just being overly cynical. I'm sure that devs will find some real use for all that computing power.
 

ydnas7

Member
Jun 13, 2010
160
0
0
My prediction

Bulldozer will match the Ivy Bridge shrink of the i7-2600K in highly threaded scenarios (i.e. rendering)

Bulldozer will match late-model Lynnfield/Bloomfield i7 in lightly threaded scenarios (i.e. gaming)

So in general, it will be much better or much worse than Sandy Bridge. That's what you get for squeezing twice as many cores into the same die area/transistor budget.

The performance profiles of AMD and Intel are diverging, like when Intel was using NetBurst, but this time AMD will be king of both core count and frequency, vs. Intel as king of latency, IPC and thermals.
 

Riek

Senior member
Dec 16, 2008
409
14
76
My prediction

Bulldozer will match the Ivy Bridge shrink of the i7-2600K in highly threaded scenarios (i.e. rendering)

I don't see how highly threaded, FPU-oriented applications will be an advantage for BD. For integer workloads yes, but FP workloads will be difficult due to the shared nature of the FP unit and its width. (It's already a miracle that a 4M BD can be close to a 6-core Gulftown in those applications.)


Bulldozer will match late-model Lynnfield/Bloomfield i7 in lightly threaded scenarios (i.e. gaming)

So in general, it will be much better or much worse than Sandy Bridge. That's what you get for squeezing twice as many cores into the same die area/transistor budget.

The performance profiles of AMD and Intel are diverging, like when Intel was using NetBurst, but this time AMD will be king of both core count and frequency, vs. Intel as king of latency, IPC and thermals.

Frequency? Thermals? Latency? IPC?
The only sure thing is cores.
IPC: we don't know.
Frequency? If rumours are correct, the clock is lower than current SB, Gulftown and Phenom II.
Thermals? We don't know. (Both will come in 95W TDP and 125/130W TDP.)
Latency? I have no idea what you mean by that :p
 

Abwx

Lifer
Apr 2, 2011
11,061
3,722
136
JFAMD clearly said that 4 Module 8 Core Bulldozer will have 50% more THROUGHPUT (not performance) with 33% more cores.

Since he's in the server department, he is talking about integer computation.

As for FP, which is heavily used by CB11.5, performance will be +80%...
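As a quick sanity check on the "50% more throughput with 33% more cores" figure quoted above, here is a small Python sketch of what it would imply per core. It is plain arithmetic on the quoted ratios (8 cores vs. 6 gives the 33%), not a performance estimate, and the variable names are only illustrative.

```python
# Figures as quoted in the post above.
old_cores, new_cores = 6, 8   # 33% more cores
throughput_ratio = 1.50       # "50% more throughput"

core_ratio = new_cores / old_cores
per_core_ratio = throughput_ratio / core_ratio

print(f"cores: +{(core_ratio - 1) * 100:.1f}%")
print(f"throughput per core: +{(per_core_ratio - 1) * 100:.1f}%")
```

So if the quoted numbers hold, each core would deliver roughly an eighth more throughput than before, on whatever workload the claim was measured on.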
 

purefun1965

Member
Dec 23, 2009
109
0
76
You could use what is assumed to be JF's secret information transport protocol (SITraP):

Someone asks a yes/no question.

You answer:
a) Yes, if that's the right answer and you're allowed to talk about it.
b) No, if that's the right answer and you're allowed to talk about it OR if you're not allowed to talk about it but there is a need to prevent additional rumours.
c) No comment, if the answer is "Yes" and you're not allowed to talk about it. ;)

Got a chuckle! I have been reading you for years, I'm a big fan :awe:
 

chihlidog

Senior member
Apr 12, 2011
884
1
81
OK purefun, as someone who is trying to hold out for Bulldozer specifically to build a gaming rig, tell me this: based on what you've seen, is it worth it for me to wait or to just go ahead with a 2500K? Price isn't a huge concern; I'd even pay a small premium to go with AMD if they brought the goods compared to the 2500K.

Are you allowed to respond to that in any way?
 

RyanGreener

Senior member
Nov 9, 2009
550
0
76
OK purefun, as someone who is trying to hold out for Bulldozer specifically to build a gaming rig, tell me this: based on what you've seen, is it worth it for me to wait or to just go ahead with a 2500K? Price isn't a huge concern; I'd even pay a small premium to go with AMD if they brought the goods compared to the 2500K.

Are you allowed to respond to that in any way?

I wonder if he can respond to this, as it basically gives away the performance, but I believe in one of his older posts he said that it won't be the 2600k, but that was a while ago when he had an older stepping/older BIOS.
 

nonameo

Diamond Member
Mar 13, 2006
5,949
3
76
JFAMD clearly said that 4 Module 8 Core Bulldozer will have 50% more THROUGHPUT (not performance) with 33% more cores.

Ah yeah, I thought about that well after I had posted... but isn't performance just a more generic name for throughput? I mean, isn't throughput just another measure of performance?

:)
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
Ah yeah, I thought about that well after I had posted... but isn't performance just a more generic name for throughput? I mean, isn't throughput just another measure of performance?

:)

It is one dimension of performance, but is not the end-all.

For example, a dual-core Sandy Bridge i3 is actually faster than a 980X in some singlethreaded benchmarks, but (of course) loses in throughput.

This is also the same idea behind those super-dense Atom servers that are supposedly going to be the rage. Very slow singlethreaded performance, but since you have thousands of threads, throughput is very high.

It is looking more and more like Bulldozer is a high-throughput, integer-oriented design. I am hoping they didn't sacrifice too much on the singlethread performance, just because that means the chips really won't be suitable for desktop use...
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Ah yeah, I thought about that well after I had posted... but isn't performance just a more generic name for throughput? I mean, isn't throughput just another measure of performance?

:)

Actually the reference to throughput could mean nothing more than the memory bandwidth (or cache bandwidth) has increased 50%.

It really is a well-obfuscated term in computer science and the vernacular is applied, correctly too, as an adjective to describe a wide swath of underlying microarchitecture features.
 

Soleron

Senior member
May 10, 2009
337
0
71
Actually the reference to throughput could mean nothing more than the memory bandwidth (or cache bandwidth) has increased 50%.

It really is a well-obfuscated term in computer science and the vernacular is applied, correctly too, as an adjective to describe a wide swath of underlying microarchitecture features.

I think he means performance on server workloads with very high numbers of threads. What he's trying to avoid by saying "throughput" is people taking the numbers to mean client workloads or anything that doesn't fully use all of BD's threads and FP units.

Quoting JF (long but I didn't want to omit and quote out of context):

First: There is only ONE performance number that has been legally cleared, 16-core Interlagos will give 50% more throughput than 12-core Opteron 6100. This is a statement about throughput and about server workloads only. You CANNOT make any client performance assumptions about that statement.

Now, let's get started.

First, everything that I am about to say below is about THROUGHPUT and throughput is different than speed. If you do not understand that, then please stop reading here.

Second, ALL comparisons are against the same cores; these are not comparisons across different generations, nor are they comparisons against different architectures.

Assume that a processor core has 100% throughput.

Adding a second core to an architecture is typically going to give ~95% greater throughput. There is obviously some overhead because the threads will stall, the threads will wait for each other and the threads may share data. So, two completely independent cores would equal 195% (100% for the first core, 95% for the second core.)


Looking at SPEC int and SPEC FP, Hyperthreading gives you 14% greater throughput for integer and 22% greater throughput for FP. Let's just average the two together.

One core is 100%. One core running two threads with Hyperthreading is 118%. Everyone following so far? We have 195% for 2 threads on 2 cores and we have 118% for 2 threads on 1 core.

Now, one Bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%; it's consistent with running on two independent cores.

Running 2 threads on the same module is ~180%.

You can see why the strategy is more appealing than HT when it comes to threaded workloads. And, yes, the world is becoming more threaded.

Now, where does the 90% come from? What is 180% /2? 90%.

People have argued that there is a 10% overhead for sharing because you are not getting 200%. But, as we saw before, 2 cores actually only equals 195%, so the net per core if you divide the workload is actually 97.5%, so it is roughly a 7-8% delta from just having cores.

Now, before anyone starts complaining about this overhead and saying that AMD is compromising single thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT. But ultimately that is not how people judge these things. Having 5 people in a car consumes more gas than driving alone, but nobody talks about the increase in gas consumption because it is so much less than 5 individual cars driving to the same place.
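The per-thread figures in the quote so far can be reproduced with a short Python sketch. All inputs are the percentages from the quote; the names are only illustrative. The ~36% figure appears to be measured against a 95% second core (95 - 59 = 36), which is an assumption about how it was derived. Measured against the 97.5% per-core average it would come out at 38.5 points.

```python
def per_thread(total_percent: float, threads: int = 2) -> float:
    """Split a combined throughput figure evenly across threads."""
    return total_percent / threads


# Inputs taken from the quote.
ht_gain = (14 + 22) / 2        # average of the SPECint and SPECfp HT gains
two_cores = 100 + 95           # two independent cores
one_core_ht = 100 + ht_gain    # one core running two threads with HT
one_module = 180               # two threads on one Bulldozer module

cores_pt = per_thread(two_cores)
module_pt = per_thread(one_module)
ht_pt = per_thread(one_core_ht)

module_delta = cores_pt - module_pt   # the "7-8%" sharing cost
ht_delta_vs_95 = 95 - ht_pt           # assumed basis of the "~36%" figure
ht_delta_vs_avg = cores_pt - ht_pt    # same hit against the per-core average

print(f"per thread: cores {cores_pt}%, module {module_pt}%, HT {ht_pt}%")
print(f"module hit: {module_delta} points")
print(f"HT hit: {ht_delta_vs_95} points (vs 95%), {ht_delta_vs_avg} points (vs {cores_pt}%)")
```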

So, now you know the approximate metrics about how the numbers work out. But what does that mean to a processor? Well, let's do some rough math to show where the architecture shines.

An Orochi die has 8 cores. Let's say, for sake of argument, that if we blew up the design and said not modules, only independent cores, we'd end up with about 6 cores.

Now let's compare the two with the assumption that all of the cores are independent on one and in modules on the other. For sake of argument we will assume that all cores scale identically and that all modules scale identically. The fact that incremental cores scale to something less than 100% is already comprehended in the 180% number, so don't fixate on that. In reality the 3rd core would not be at 95% but we are holding that constant for example.

Mythical 6-core bulldozer:
100% + 95% + 95% + 95% + 95% + 95% = 575%

Orochi die with 4 modules:
180% + 180% + 180% + 180% = 720%

What if we had just done a 4 core and added HT (keeping in the same die space):
100% + 95% +95% +95% + 18% + 18% + 18% + 18% = 457%

What about a 6 core with HT (has to assume more die space):
100% + 95% +95% +95% +95% +95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%

(Spoiler alert - this is a comparison using the same cores, do NOT start saying that there is a 25% performance gain over a 6-core Thuban, which I am sure someone is already starting to type.)
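The four totals above, and the 25% figure in the spoiler alert, can be checked with a few lines of Python. This is only a sketch of the arithmetic in the quote, using its simplifying assumptions (every extra core at 95%, every module at 180%, HT worth 18% per core); the function names are made up for illustration.

```python
FIRST_CORE = 100   # first core, in percent
EXTRA_CORE = 95    # each additional independent core
MODULE = 180       # one two-core module
HT_BONUS = 18      # extra throughput per core from HT


def independent_cores(n: int) -> int:
    """Total throughput of n independent cores."""
    return FIRST_CORE + EXTRA_CORE * (n - 1)


def modules(n: int) -> int:
    """Total throughput of n two-core modules."""
    return MODULE * n


def cores_with_ht(n: int) -> int:
    """Total throughput of n independent cores, each with HT."""
    return independent_cores(n) + HT_BONUS * n


totals = {
    "6 independent cores": independent_cores(6),
    "4 modules (Orochi)": modules(4),
    "4 cores + HT": cores_with_ht(4),
    "6 cores + HT": cores_with_ht(6),
}

for name, total in totals.items():
    print(f"{name}: {total}%")

module_vs_cores = (modules(4) / independent_cores(6) - 1) * 100
print(f"4 modules vs 6 independent cores: +{module_vs_cores:.1f}%")
```

The last line is where the 25% comes from, and as the quote says, it compares the same cores arranged differently, not Bulldozer against Thuban.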

The reality is that by making the architecture modular and by sharing some resources you are able to squeeze more throughput out of the design than if you tried to use independent cores or tried to use HT. In the last example I did not take into consideration that the HT circuitry would have delivered an extra 5% circuitry overhead....

Every design has some degree of tradeoff involved, there is no free lunch. The goal behind BD was to increase core count and get more throughput. Because cores scale better than HT, it's the most predictable way to get there.

When you do the math on die space vs. throughput, you find that adding more cores is the best way to get to higher throughput. Taking a small hit on overall performance but having the extra space for additional cores is a much better tradeoff in my mind.

Nothing I have provided above would allow anyone to make a performance estimate of BD vs. either our current architecture or our competition, so, everyone please use this as a learning experience and do not try to make a performance estimate, OK?
 

JFAMD

Senior member
May 16, 2009
565
0
0
Should be even more since they also claim 82% more performance in the FP area, also with 33% more cores.

A score of about 11 would be more in line with such claims, as Cinebench surely uses a lot of FPU resources.

AMD never made an FP-specific quote.


JFAMD clearly said that 4 Module 8 Core Bulldozer will have 50% more THROUGHPUT (not performance) with 33% more cores.

Yes, but that is a server statement. Please keep that in mind.


While they may know what you are posting, the next question is, are they actively trying to find out who you are, then doing this behind the scenes...
;)

Then again, you could be JF-AMD as well... :hmm: :ninja: :biggrin:

No, I only post under my name.
 

Hard Ball

Senior member
Jul 3, 2005
594
0
0
I don't see how highly threaded fpu oriented applications will be an advantage for BD? For integer workloads yes, fp workloads will be difficult due to the shared nature of the fp and the width. (its already a miracle that a 4M BD can be close to a 6core gulftown in those applications.)

That will be highly dependent on the fraction of mem references within the instruction stream; so you will need to analyze specific compiled applications as well as know more specific details about the microarchitecture than what is available in the public domain right now.
 

Hard Ball

Senior member
Jul 3, 2005
594
0
0
Actually the reference to throughput could mean nothing more than the memory bandwidth (or cache bandwidth) has increased 50%.

It really is a well-obfuscated term in computer science and the vernacular is applied, correctly too, as an adjective to describe a wide swath of underlying microarchitecture features.

That is very true;

The term "throughput" by itself means very little in terms of actual work (in the mathematical sense) gets done by a pipeline. Most importantly, you will need to specify the fractions of the major instruction formats/types in the machine code. Something like 35% ALU, 15% FP, 25% MEM, 20% BR/JMP, and another 5% miscellaneous is the minimum you would need to get any semblance of a real metric. Without knowing that sort of basic assumption of the instruction stream, you would not even know how many entries for the Instr Queue vs. ROB (most often 4:3 or 5:4) to go into a design.
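To illustrate why the instruction mix matters, here is a minimal Python sketch that weights a per-class cost by the mix from the paragraph above. The cycles-per-instruction values and the second, FP-heavy mix are purely hypothetical numbers picked for the example; they do not describe Bulldozer or any real pipeline.

```python
# Instruction mix from the paragraph above (fractions sum to 1).
mix = {"ALU": 0.35, "FP": 0.15, "MEM": 0.25, "BR": 0.20, "MISC": 0.05}

# Hypothetical average cycles per instruction for each class.
cpi = {"ALU": 1.0, "FP": 2.0, "MEM": 3.0, "BR": 1.5, "MISC": 1.0}


def effective_cpi(mix: dict, cpi: dict) -> float:
    """Mix-weighted average cycles per instruction."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(fraction * cpi[kind] for kind, fraction in mix.items())


# A second, made-up mix with far more FP work, run on the same pipeline.
fp_heavy = {"ALU": 0.20, "FP": 0.40, "MEM": 0.25, "BR": 0.10, "MISC": 0.05}

for name, m in (("quoted mix", mix), ("FP-heavy mix", fp_heavy)):
    c = effective_cpi(m, cpi)
    print(f"{name}: CPI {c:.2f}, IPC {1 / c:.2f}")
```

The same hypothetical pipeline gives a different throughput for each mix, which is why a bare "throughput" number says little without the workload behind it.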

I'm always very amazed how many people can come to such specific conclusions about certain performance metrics of these uarchitectures (in threads like this one), given how little precise information there is in the public domain.
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
JF said:
Now, before anyone starts complaining about this overhead and saying that AMD is compromising single thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT.
Exactly!!!! This should stick it to all the Intel fanboys! Just because I'm shameless, I'll kick them while they're down.

Intel processors will never be as good as AMD processors. A very wise and literate forum member revealed AMD's secret indirectly. AMD's green is just a better color than Intel's blue. Blue light has a shorter wavelength meaning more energy. Since the color seen is more or less the absence of a specific color, Intel's blue has to absorb all that high energy light. All this extra energy is the reason why Intel processors run so hot. AMD's green has a much milder wavelength, absorbing less energy allowing it to stay cool.


Lol, 36% hit from HTT. What a joke. Maybe in Windows 95 running Linpack. Remember us fanboys use consumer processors with consumer applications. There is a reason why you never hear Power7 on this forum. Anandtech as a whole cares more about i7s and Phenoms instead of Xeons and Opterons.

I'm not saying that the server space isn't important. Just remember where you are at, before you start pointing fingers and calling fanboy.
 

videoclone

Golden Member
Jun 5, 2003
1,465
0
0
Looks like the pricing was correct and we have 2 weeks left until we see the goods at E3 on the 7th of June YEY
 

Elixer

Lifer
May 7, 2002
10,376
762
126
Looks like the pricing was correct and we have 2 weeks left until we see the goods at E3 on the 7th of June YEY

This is going to be the longest 2 weeks ever.
Then again, we don't really know for sure what will be shown there, it could just be a dog and pony show.
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
This is going to be the longest 2 weeks ever.
Then again, we don't really know for sure what will be shown there, it could just be a dog and pony show.

Sony is releasing a new mirrorless APS C camera, and there should be some new Battlefield 3 multiplayer footage, so it won't be all bad.
 