Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Status
Not open for further replies.

zebrax2

Senior member
Nov 18, 2007
976
69
91
Does anyone here know if HT increases power consumption, and if it does, by how much? It just occurred to me that the new turbo could possibly be their answer to HT from a performance/power point of view.
 

Mopetar

Diamond Member
Jan 31, 2011
8,460
7,682
136
Does anyone here know if HT increases power consumption, and if it does, by how much? It just occurred to me that the new turbo could possibly be their answer to HT from a performance/power point of view.

It does, but not by terribly much. AMD's answer to HT is their use of CMT. The higher turbo is most likely an answer to their (assumed) lower IPC.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,058
3,870
136
It does, but not by terribly much. AMD's answer to HT is their use of CMT. The higher turbo is most likely an answer to their (assumed) lower IPC.

HT can add a ton of heat/power depending on the workload, but that's because you're getting a ton more throughput. During summer I disable HT and turn it back on in winter (average summer temp 30C, avg winter temp 0C).
 

zebrax2

Senior member
Nov 18, 2007
976
69
91
It does, but not by terribly much. AMD's answer to HT is their use of CMT. The higher turbo is most likely an answer to their (assumed) lower IPC.

The way I look at it, HT's purpose was to make a single core do more work, which is what AMD is doing with their new turbo by increasing clocks based on TDP. CMT, for me, is just a way of minimizing the die space consumed by individual cores. I'm basing this on the purpose of the feature rather than on why it was made/used. That's my opinion, anyway.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
Unfortunately, no one has a job that demands that they use their computer to produce excellent SPEC results so it's not as useful as various video rendering or photoshop benchmarks for most professionals. I don't mean to say that SPEC is utterly useless, but regardless of how well a CPU does under that benchmark, I'm going to choose the one that best suits my needs based on the applications I most commonly use.
Yes, but I do it with the SPEC results. So if I want to do e.g. video rendering, I take the SPEC result for that application (POV-Ray). Or I take the H.264 video encoder results if that's what I want to do, or the zip results if I am interested in compression.

The SPEC set of applications is a bit wider than AnandTech's suite; e.g. it also includes chess engines and in general a wider range of applications, with the exception of games, but those are, as we know, GPU-limited anyway.

The difference between SPEC and e.g. AnandTech is that the set of 30 applications contained in SPEC CPU is unbiased, whereas any other set of applications is accidentally or willingly biased. That is basically why the SPEC organization was created: to get unbiased application results.

I agree with you that if another benchmark set hits exactly your application and its version, then it will be perfectly accurate for that use (but for that application and version only). But for a general comparison between different CPUs, the other sets are more or less inaccurate. And if your application is in SPEC, then SPEC is even better for you.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
You sure about that? I remember differently.



What about people who represent that they are from companies, and they get a special member title indicating that they are from a company?
Last I heard, that requires some sort of official company confirmation to get that title.

keysplayr was a very special case; iirc he volunteered to put the "focus group member" info in his sig. I was not aware that this was a focus group requirement, however. Don't take my word for it, though; go ask him.
 

Bearach

Senior member
Dec 11, 2010
312
0
0
http://forums.nvidia.com/index.php?showuser=29408&f=0 According to this person, he must have a Focus Group signature, as that is part of their rules. But I could be wrong.

What is this mysterious NVIDIA Focus Group, you ask, and how do we interact with NVIDIA? Good questions. We're a very small team of forum users that receives information, hardware and software from NVIDIA. We don't hide our identity. As a matter of policy, every NVIDIA Focus Group member (we're often called the forum champs) must display the disclaimer that I currently display in their forum signature.
 
Last edited:

bryanW1995

Lifer
May 22, 2007
11,144
32
91
That change was made a few years ago iirc. Either that or Rollo was a rogue operative who thumbed his nose at all the rules yet still maintained enough clout to get NV to convince anand to reinstate him. I'm going with A.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Actually... you're not; there are 13 signs, ignored by astrologers, as they ignore reality.

O/T: Also, there are 13 months in a real year. A lunar month is 28 days; how many days in a week? 7, and 7 divided into 28 = 4 weeks. The Romans knew exactly what they were doing when they changed that: screwed us out of 1 month's pay, ALL of us. This is old, old news to history buffs, but it seems to be news to the rest of the world.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Yes, but I do it with the SPEC results. So if I want to do e.g. video rendering, I take the SPEC result for that application (POV-Ray). Or I take the H.264 video encoder results if that's what I want to do, or the zip results if I am interested in compression.

The SPEC set of applications is a bit wider than AnandTech's suite; e.g. it also includes chess engines and in general a wider range of applications, with the exception of games, but those are, as we know, GPU-limited anyway.

The difference between SPEC and e.g. AnandTech is that the set of 30 applications contained in SPEC CPU is unbiased, whereas any other set of applications is accidentally or willingly biased. That is basically why the SPEC organization was created: to get unbiased application results.

I agree with you that if another benchmark set hits exactly your application and its version, then it will be perfectly accurate for that use (but for that application and version only). But for a general comparison between different CPUs, the other sets are more or less inaccurate. And if your application is in SPEC, then SPEC is even better for you.

Be careful with that. h.264 and zip may not look similar at all to a processor. Similarly, POV-ray and video rendering may look very different to a processor. Just because the tasks appear similar at a 10,000ft view ("compression", "rendering") doesn't mean that they have anything in common at all at lower levels. I would actually be shocked if there's a strong correlation between zip and h.264 performance given how different the algorithms are. Also, POV-ray is a raytracer, which has very different behavior (from the standpoint of a processor) from scanline / triangle-pipeline renderers (i.e. most real-time or near-real-time renderers) and has nothing to do with Windows Movie Maker-type performance.

edit: I see I misread, and you're not trying to use h.264 to indicate zip performance (or vice versa)... but the video rendering stuff still stands.
 
Last edited:

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Be careful with that. h.264 and zip may not look similar at all to a processor. Similarly, POV-ray and video rendering may look very different to a processor. Just because the tasks appear similar at a 10,000ft view ("compression", "rendering") doesn't mean that they have anything in common at all at lower levels. I would actually be shocked if there's a strong correlation between zip and h.264 performance given how different the algorithms are. Also, POV-ray is a raytracer, which has very different behavior (from the standpoint of a processor) from scanline / triangle-pipeline renderers (i.e. most real-time or near-real-time renderers) and has nothing to do with Windows Movie Maker-type performance.

edit: I see I misread, and you're not trying to use h.264 to indicate zip performance (or vice versa)... but the video rendering stuff still stands.
zip performance MIGHT indicate H.264 performance to some extent. The problem is more that zipping a file up is pretty much completely hard-drive bound now (which makes it less than perfect).

Where it is similar (and this is strictly speaking about x264 now) is that zipping, like most lossless compression, focuses heavily on integer instructions. So as a test of speed, you'll get somewhat similar results. The difference would come in the usage of SIMD instructions: while most lossless compression schemes could benefit from them, I doubt they have taken to them as fast as x264 has.

You might be shocked at how well some applications can predict the performance of other applications. For the CPU, there are pretty much 3 types of instructions when it comes to application performance: branch instructions, floating-point instructions, and integer instructions. So long as two applications have a similar mix of those instructions, they are going to see roughly the same performance increases and decreases across two different platforms.
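That instruction-mix idea can be sketched numerically. This is a toy model with made-up mixes and throughput figures (none of these numbers come from real CPUs): if two workloads have similar branch/float/integer mixes, a simple weighted-throughput model predicts similar relative speedups on a faster chip.

```python
# Hypothetical instruction mixes (fractions of branch / float / integer ops)
zip_mix  = {"branch": 0.20, "float": 0.00, "int": 0.80}
x264_mix = {"branch": 0.15, "float": 0.05, "int": 0.80}

# Per-type throughput (instructions/cycle) on two imaginary CPUs
cpu_a = {"branch": 1.0, "float": 2.0, "int": 2.0}
cpu_b = {"branch": 1.5, "float": 2.0, "int": 3.0}

def ipc(mix, cpu):
    # Average time per instruction is sum(fraction / rate); IPC is its inverse
    return 1.0 / sum(mix[k] / cpu[k] for k in mix if mix[k] > 0)

for name, mix in (("zip", zip_mix), ("x264", x264_mix)):
    speedup = ipc(mix, cpu_b) / ipc(mix, cpu_a)
    print(f"{name}: CPU B is {speedup:.2f}x CPU A")
```

Because the two mixes are close, the model predicts nearly the same speedup for both workloads, which is Cogman's point; workloads with very different mixes would diverge.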
 

Mopetar

Diamond Member
Jan 31, 2011
8,460
7,682
136
Seems interesting that they've doubled the L2 cache but decreased the L1 cache sizes. Each core now has a 16 kB L1 data cache, although it is 4-way associative. The instruction cache is also shared between the cores.

Has any information been released about the latencies for the caches? There was some expectation that the L2 would be faster on the new process, but I don't recall any confirmation of that.
 

BladeVenom

Lifer
Jun 2, 2005
13,365
16
0
O/T: Also, there are 13 months in a real year. A lunar month is 28 days; how many days in a week? 7, and 7 divided into 28 = 4 weeks. The Romans knew exactly what they were doing when they changed that: screwed us out of 1 month's pay, ALL of us. This is old, old news to history buffs, but it seems to be news to the rest of the world.

On planet Earth a lunar cycle is 29.53 days, so most cultures came up with a 12 month year.
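The arithmetic is easy to check. Using the commonly quoted 29.53-day synodic month:

```python
# A solar year holds a bit more than 12 lunar (synodic) months,
# which is why 12-month calendars won out over 13-month ones.
solar_year = 365.25      # days, average Julian year
synodic_month = 29.53    # days, new moon to new moon
print(solar_year / synodic_month)  # just over 12 lunations per year
```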
 

Triskain

Member
Sep 7, 2009
63
33
91
Has any information been released about the latencies for the caches? There was some expectation that the L2 would be faster on the new process, but I don't recall any confirmation of that.

The L1D Cache has a latency of 4 cycles, the L2 Cache a latency of 18 cycles.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
The problem is more that zipping a file up is pretty much completely hard-drive bound now (which makes it less than perfect).
Depends on what you mean by "zipping". If you include some of the better compression algorithms out there, you become CPU-bound quite easily. So in that case that wouldn't be the problem.
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Depends on what you mean by "zipping". If you include some of the better compression algorithms out there, you become CPU-bound quite easily. So in that case that wouldn't be the problem.

By zipping, I mean applying the DEFLATE algorithm as it is currently used in the .zip file standard (in other words, none of this LZMA stuff; just standard, zlib-like compression).

I don't include other lossless compression methods when I talk about zipping a file. If I did mean that, then I would have said "lossless compression algorithms".

You are quite right, there are several that will tax the CPU (and even memory) beyond insanity. My favorite being the PAQ compressor.
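The DEFLATE-vs-heavier-scheme distinction is easy to demonstrate with Python's standard library. This sketch uses synthetic input data, and on most machines it shows LZMA spending considerably more CPU time than zlib on the same bytes, which is why the heavier compressors go CPU-bound long before the disk matters:

```python
import lzma
import os
import time
import zlib

# Synthetic input: 1 MB of incompressible noise plus ~1 MB of repetitive text
data = os.urandom(1 << 20) + b"the quick brown fox " * 50_000

for name, compress in (("zlib (DEFLATE)", lambda d: zlib.compress(d, 9)),
                       ("lzma", lambda d: lzma.compress(d))):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data)} -> {len(out)} bytes in {elapsed:.2f} s")
```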
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
There's always going to be a use for more CPU power. Right now with my Phenom II I'm finding that I could use more single threaded performance in certain games that can only take advantage of one core.

Once programmers get around to optimizing their software, 8 core CPUs are going to be incredibly fast and powerful. There is already a great deal of software that can take advantage of pretty much as many cores as you can give it.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Has any information been released about the latencies for the caches? There was some expectation that the L2 would be faster on the new process, but I don't recall any confirmation of that.

There's always a trade off. They've increased the L2 cache by 4x to 2MB, so that alone will result in higher latency.

New processes usually bring 25% or so faster transistors at equal power consumption. If you want lower power instead, you have to give up some of that speed gain; of course, you can land anywhere between the two.

Another thing is the design target. If they decided to increase the operating frequency by 20%, then everything that is synchronized with the clock has to operate 20% faster too. L2 cache latency is always expressed relative to the CPU clock, so a 20% higher frequency at the same 20 cycles means the L2 cache is 20% faster in absolute terms. That also means that if the CPU is designed to operate at a much higher frequency, there won't necessarily be any reduction in the L2 cache's cycle count.
 

Schmide

Diamond Member
Mar 7, 2002
5,729
1,021
126
There's always a trade off.

I wonder if they're going to reach a 2-cycle L1 or remain at 3 cycles and keep extra headroom for clock speed. It most certainly seems that they traded size for the extra associativity, and snooping vs. shared resources probably played a part in the small L1 as well.
 

Mopetar

Diamond Member
Jan 31, 2011
8,460
7,682
136
There's always a trade off. They've increased the L2 cache by 4x to 2MB, so that alone will result in higher latency.

Technically it's doubled since the L2 cache is shared by both cores on the module. Of course if only one core is running then it's essentially quadrupled.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Technically it's doubled since the L2 cache is shared by both cores on the module. Of course if only one core is running then it's essentially quadrupled.

Which wouldn't matter in latency calculations. :)

Schmide said:
I wonder if they're going to reach a 2-cycle L1 or remain at 3 cycles and keep extra headroom for clock speed. It most certainly seems that they traded size for the extra associativity, and snooping vs. shared resources probably played a part in the small L1 as well.

64KB L1-I: 3 cycles
16KB L1-D : 4 cycles
2MB L2: 18-20 cycles
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
Also, POV-ray is a raytracer, which has very different behavior (from the standpoint of a processor) from scanline / triangle-pipeline renderers (i.e. most real-time or near-real-time renderers) and has nothing to do with Windows Movie Maker-type performance.
Maybe you're mixing up video editing (Windows Movie Maker) with rendering (POV-Ray)? Anyway, of course you look at the suited ones.
 