First real-world benches of Barcelona?

achim71 · Sep 4, 2007

For better comparison I ran the benchmarks on a quad-core Opteron (Santa Rosa, 1.8 GHz) server (NUMA enabled).

http://www.momc.org/wp-content...eron-numa/multiple.jpg
http://www.momc.org/wp-content...n-numa/cinebench10.jpg

This one (run without NUMA) shows more CPU-Z info.

http://www.momc.org/wp-content...pteron/cinebench10.jpg

Regs · Sep 4, 2007

Thank you Gary for the added information. You guys demonstrate excellent business practices.

classy · Sep 4, 2007

Originally posted by: keysplayr2003

Originally posted by: JEDIYoda
I see all the closet Amd supporters are out in force..lol

Yes. The CPU forum will be getting very, very ugly the next few weeks. I'll have my work cut out for me.

sigh
This is a mod call out ... there will be no more of this and no further warnings

CPU moderator apoppin

Keysplayr · Sep 4, 2007

Originally posted by: classy

Originally posted by: keysplayr2003

Originally posted by: JEDIYoda
I see all the closet Amd supporters are out in force..lol

Yes. The CPU forum will be getting very, very ugly the next few weeks. I'll have my work cut out for me.

You'll be the main culprit, you might have to ban yourself :wine:

Is that right? Well, we'll just have to see now won't we, Classy?

ltcommanderdata · Sep 4, 2007

Originally posted by: BitByBit
None of that makes any sense.

1. There is no 'pressure' within a processor.

2. AMD's three-issue/retire architecture is not a bottleneck - there are far more factors that determine performance than issue rate alone. Beyond three issue, there are diminishing returns due to limited instruction-level parallelism.

3. Core 2 can decode more instructions per clock, but it still only has three ALUs and three FPUs - the same as K10.

4. K10 has double the instruction fetch bandwidth of Core 2, meaning its decoders are more likely to be fully utilised when executing large instructions.

5. I fail to see how you can compare or attempt to draw any parallels between the fuel/air ratio of an engine and the issue rate of a processor. The mind boggles.

Well, Merom actually has 3 ALUs, 2 FPUs, and 3 SSE units, although I believe the FPUs and SSE units share some resources so are not completely exclusive. In terms of K10 fetching instructions in 32-byte blocks, Merom is actually a little better than K8's 16-byte fetch in that it also has a 64-byte buffer that stores previous instructions, from which it can fetch in 32-byte blocks. That is useful for repetitive instructions, although still not as good as K10's pure 32-byte fetch. Still, when things don't fit neatly in the buffer, I do agree that Merom will have difficulty fully utilizing its 4 decoders and reaching its potential of issuing up to 5 instructions (including macro-op fusion).

formulav8 · Sep 4, 2007

Originally posted by: classy

Originally posted by: keysplayr2003

Originally posted by: JEDIYoda
I see all the closet Amd supporters are out in force..lol

Yes. The CPU forum will be getting very, very ugly the next few weeks. I'll have my work cut out for me.

You'll be the main culprit, you might have to ban yourself :wine:

Yuck Yuck Yuck

No it isn't funny ... mod "call outs" are not tolerated.
CPU moderator apoppin

Ajay · Sep 4, 2007

Originally posted by: Viditor

Isn't it in fact the B01 stepping? That will be released on the 10th of Sept. But you're right, it's just a 1 stepping.

No...they are releasing stepping B2 or B3 (I believe B3). B1 was from April...
This is the reason for the 6 month delay, they've created at least 2 more steppings since then.

B2 will be the stepping shipping, B3 is at least 2 months out, IIRC.

Ajay · Sep 4, 2007

Originally posted by: Nemesis 1
I agree with you 100% Gary. So would it be safe to assume that A0 steppings of Penryn, on a brand new process type and logic, as well as software improvements in Penryn, stand to gain as well? Seeing as it's an A0 stepping, I would say large improvements. Or do you think Intel showed their whole hand in this poker game?

Gains from whatever the first released stepping of Penryn is will likely be less significant (percentage-wise) because Intel isn't having any big problems with their 45nm process, and the chipset infrastructure will be more mature and feature complete at release. Still, the clock advantage Intel will have (especially for overclockers) will likely dominate AMD's offerings for most applications.

PlasmaBomb · Sep 5, 2007

Originally posted by: BitByBit

Originally posted by: PlasmaBomb
Higher RPM doesn't lead to higher compression.

No, I just made it up.

"...This makes the maintenance of smoothly increasing RPM far harder with turbochargers than with belt-driven superchargers which apply boost in direct proportion to the engine RPM."
Source

Clearly.

Sorry, that reply was quite brief. Boost is different to compression in the engine, so we may be talking at cross purposes. If by compression in the first post you meant boost, you have my apologies.

However when it comes to compression -

The dynamic compression ratio (DCR) is always lower than the static compression ratio (SCR)
The DCR does not change at any time during the operation of the engine

EDIT: To the mods and thread readers, sorry I didn't realise the statement would drag on so much...

classy · Sep 5, 2007

Originally posted by: classy

Originally posted by: keysplayr2003

Originally posted by: JEDIYoda
I see all the closet Amd supporters are out in force..lol

Yes. The CPU forum will be getting very, very ugly the next few weeks. I'll have my work cut out for me.

You'll be the main culprit, you might have to ban yourself :wine:

This is a mod call out ... there will be no more of this and no further warnings

CPU moderator apoppin

Instead I'll PM you

Thank you for your explanation that it was 'in jest' ... please just no more giving the appearance of it. Yours was just one of three posters' comments that needed to be edited.
CPU Moderator apoppin

Scottie Wang · Sep 5, 2007

SUPER PI 26S for a 2G or 3G BARCELONA?
CONFUSED...

Originally posted by: Gary Key

I've never heard of any case whereby features become active at certain clock speeds. If this is indeed the case, then my apologies, but it doesn't make sense. Why would AMD deactivate features at lower clock speeds?

Throughout the entire prototype and pre-production (as stated in my last message) process, certain features on the CPU, in the BIOS, or on the chipsets have been turned off/on, latencies have changed, etc, etc. This is a normal part of the engineering process as the design is fleshed out and finalized. It does not represent final silicon capabilities and performance.

As I said earlier, I used a poor example as it was not meant to be taken literally spec for spec when comparing engines and CPUs. Regardless of the example, the point was that the platform performance improved significantly as the core speeds improved and this included performance per watt among other indicators. There is a myriad of reasons as to why this occurred, but considering the early silicon, BIOS, and chipset designs, we could only speculate as to why, and I tried to present a few reasons that we homed in on.

If you compare a B00 chip from May to a B02 today, there is a significant difference in performance in all areas (26 seconds in SuperPI 1m for one) and my comments represent observations of what has occurred over this time period. We have final silicon now and results will be posted in the near future. My observations today are different than they were two weeks ago and as the platform matures they will change again.

Once we see the HT 3.0 capable chipsets and Phenom cores mature then we will have an even better indication of the performance of this core design in the consumer market but for now the initial release is Barcelona in the enterprise market.

Hulk · Sep 5, 2007

I for one am happy to see Barcelona doing well on this benchmark. Regardless of who ran the test or the benchmark itself. This at least shows Barcelona may yet be a competitor.

I have no dog in this fight except the new dog. I always want the new chip coming out to top the old one. I was happy when C2D raised the bar for X2 and I'll be happy if Barcelona does the same to C2D.

If IPC for Barcelona is better than C2D then that is great news.

I would think it's easier to scale frequency than to be stuck with an underperforming core from an IPC point of view.

bryanW1995 · Sep 5, 2007

Originally posted by: Scottie Wang
SUPER PI 26S for a 2G or 3G BARCELONA?
CONFUSED...

Originally posted by: Gary Key

I've never heard of any case whereby features become active at certain clock speeds. If this is indeed the case, then my apologies, but it doesn't make sense. Why would AMD deactivate features at lower clock speeds?

Throughout the entire prototype and pre-production (as stated in my last message) process, certain features on the CPU, in the BIOS, or on the chipsets have been turned off/on, latencies have changed, etc, etc. This is a normal part of the engineering process as the design is fleshed out and finalized. It does not represent final silicon capabilities and performance.

As I said earlier, I used a poor example as it was not meant to be taken literally spec for spec when comparing engines and CPUs. Regardless of the example, the point was that the platform performance improved significantly as the core speeds improved and this included performance per watt among other indicators. There is a myriad of reasons as to why this occurred, but considering the early silicon, BIOS, and chipset designs, we could only speculate as to why, and I tried to present a few reasons that we homed in on.

If you compare a B00 chip from May to a B02 today, there is a significant difference in performance in all areas (26 seconds in SuperPI 1m for one) and my comments represent observations of what has occurred over this time period. We have final silicon now and results will be posted in the near future. My observations today are different than they were two weeks ago and as the platform matures they will change again.

Once we see the HT 3.0 capable chipsets and Phenom cores mature then we will have an even better indication of the performance of this core design in the consumer market but for now the initial release is Barcelona in the enterprise market.

that is almost definitely 2g barcelona since gary only has a 2ghz barcelona cpu to play with. I get 14.283 on my cpu at 3.512 ghz. hmmmm, I wonder what I'll get at 2...

edit: uh, heh heh, it's kinda funny that my ram is NOT stable at 4-4-3-11 1000...go figure. anyway, I dropped the cpu down to 8x250, ran 2:3 memory at 4-4-3-11, and got 25.328 superpi. that puts the 26 seconds that gary recorded in perspective, though it's still a tremendous improvement from the high 30's that was rumored last week.
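As a rough sanity check on those two runs, here is a minimal sketch that rescales the 3.512 GHz result to 2 GHz. It assumes SuperPI 1M time is inversely proportional to core clock and ignores memory speed and timings, which also changed between the runs, so treat it as a ballpark only. The function name `scale_time` is made up for illustration.

```python
def scale_time(measured_seconds, measured_ghz, target_ghz):
    """Estimate run time at target_ghz, assuming time scales as 1 / clock."""
    return measured_seconds * measured_ghz / target_ghz

# 14.283 s at 3.512 GHz, rescaled to 2.0 GHz (8 x 250 MHz)
estimate = scale_time(14.283, 3.512, 2.0)
measured = 25.328  # the 2 GHz run reported in the post

print(f"estimated {estimate:.2f} s vs measured {measured:.3f} s")
```

Under that assumption the estimate comes out close to the measured 25.328 seconds, which suggests the result is largely clock-bound on this setup.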

Pederv · Sep 5, 2007

"If you look at performance instructions, Barcelona is about 30 percent faster than Clovertown. However, if you look at energy instructions, Clovertown is about 30 percent faster than Barcelona," Dell said.

EETimes

Wasn't that helpful.

HopJokey · Sep 5, 2007

Originally posted by: Pederv
"If you look at performance instructions, Barcelona is about 30 percent faster than Clovertown. However, if you look at energy instructions, Clovertown is about 30 percent faster than Barcelona," Dell said.

EETimes

Wasn't that helpful.

WTH is a "performance instruction" and an "energy instruction"?

Pederv · Sep 5, 2007

Originally posted by: HopJokey

Originally posted by: Pederv
"If you look at performance instructions, Barcelona is about 30 percent faster than Clovertown. However, if you look at energy instructions, Clovertown is about 30 percent faster than Barcelona," Dell said.

EETimes

Wasn't that helpful.

WTH is a "performance instruction" and an "energy instruction"?

I'm thinkin' it breaks down to what we have known for a long time - AMD CPUs are better at some things, Intel CPUs are better at other things.

classy · Sep 5, 2007

Originally posted by: HopJokey

Originally posted by: Pederv
"If you look at performance instructions, Barcelona is about 30 percent faster than Clovertown. However, if you look at energy instructions, Clovertown is about 30 percent faster than Barcelona," Dell said.

EETimes

Wasn't that helpful.

WTH is a "performance instruction" and an "energy instruction"?

LOL, I was thinking the same thing.

lopri · Sep 5, 2007

What I read is this?

"If you look at floating point instructions, Barcelona is about 30 percent faster than Clovertown. However, if you look at integer instructions, Clovertown is about 30 percent faster than Barcelona,"

Of course that 30% number is out of Mr. Dell's a**, but the relative expectation is in line with what's known - Barcelona will be very fast in FP calculations. And this enhancement, while understandable (AMD had always been behind Intel when it comes to FP), makes me scratch my head because it doesn't benefit servers as much (as Integers). Workstations, maybe.

Amaroque · Sep 5, 2007

Originally posted by: Pederv
"If you look at performance instructions, Barcelona is about 30 percent faster than Clovertown. However, if you look at energy instructions, Clovertown is about 30 percent faster than Barcelona," Dell said.

EETimes

Wasn't that helpful.

My interpretation of the quote is this...

If you look at equal power draw on both systems, Clovertown is 30% faster at the same power usage. If you look at IPC Barcelona is 30% faster clock for clock.

But Clovertown will be clocked at least 30% higher, so it might be a moot point.

bryanW1995 · Sep 5, 2007

that would be an amazing pull-out-of-their-ass maneuver from amd to get a 30% clock for clock advantage. that would put a 2.5 ghz barcelona equivalent to a 3.25 ghz penryn. The energy efficiency of the 45 nm process also makes sense, allowing intel to clock higher at the same power draw. of course, if barcelona is 30% faster clock for clock and clovertown is 30% more energy efficient at the same clock speed, they should end up pretty close to equal in power draw at the same "equivalent speed".
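The 2.5 GHz to 3.25 GHz figure is just the clock multiplied by the claimed per-clock advantage. A minimal sketch of that arithmetic, taking the 30% number at face value (it is an unverified quote) and assuming performance is simply work per clock times clock speed. The helper name `equivalent_clock` is hypothetical.

```python
def equivalent_clock(clock_ghz, per_clock_advantage):
    """Clock a rival would need to match, assuming performance = IPC x clock."""
    return clock_ghz * (1.0 + per_clock_advantage)

# A 2.5 GHz part with an assumed 30% per-clock advantage
rival_ghz = equivalent_clock(2.5, 0.30)
print(f"{rival_ghz:.2f} GHz")
```

The same relation explains why a 30% per-clock lead is cancelled out if the other chip ships at clocks about 30% higher.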

BitByBit · Sep 6, 2007

I'd be very surprised if Clovertown were 30% faster per watt, given K10's power-saving features. If K10 is indeed faster per clock, I'd expect it to have a slight advantage in performance per watt. 30% per watt is roughly the advantage Core holds over K8.

Gary Key · Sep 6, 2007

Depending upon the application, Barcelona clock for clock is equal to Clovertown or just a tad faster in some areas and in others it is 20%+ (heavy emphasis on "plus" until Monday) faster. I have to say that final silicon is looking really nice at this point, especially in memory sensitive applications where this processor shines, but core speeds need to come up in a hurry to compete with Tigerton in general server applications.

The memory bandwidth numbers we noticed are a little surprising considering the results in certain memory intensive applications. The latencies are much improved over the last core stepping we looked at as is pure throughput which makes you think twice about the relationship between bandwidth and performance in general from this CPU.

My personal opinion is that if AMD had hit their original launch targets and speeds, the server market would be incredibly competitive at this point with Tigerton being a much needed response to Barcelona. As it stands, AMD is going to have an uphill battle at this time. But this is all subjective thinking from a guy who tried to compare an engine to a CPU, but maybe the results on Monday will clarify some of the observations I was trying to convey without breaking the rules.

Stoneburner · Sep 6, 2007

Originally posted by: Gary Key
Depending upon the application, Barcelona clock for clock is equal to Clovertown or just a tad faster in some areas and in others it is 20%+ (heavy emphasis on "plus" until Monday) faster. I have to say that final silicon is looking really nice at this point, especially in memory sensitive applications where this processor shines, but core speeds need to come up in a hurry to compete with Tigerton in general server applications.

The memory bandwidth numbers we noticed are a little surprising considering the results in certain memory intensive applications. The latencies are much improved over the last core stepping we looked at as is pure throughput which makes you think twice about the relationship between bandwidth and performance in general from this CPU.

My personal opinion is that if AMD had hit their original launch targets and speeds, the server market would be incredibly competitive at this point with Tigerton being a much needed response to Barcelona. As it stands, AMD is going to have an uphill battle at this time. But this is all subjective thinking from a guy who tried to compare an engine to a CPU, but maybe the results on Monday will clarify some of the observations I was trying to convey without breaking the rules.

I wish there were more editors like you willing to throw morsels to us salivating masses. Any chance you could be a little (just a little) more specific on what areas barcelona and tigerton are roughly equal?

Viditor · Sep 6, 2007

Originally posted by: Gary Key
Depending upon the application, Barcelona clock for clock is equal to Clovertown or just a tad faster in some areas and in others it is 20%+ (heavy emphasis on "plus" until Monday) faster. I have to say that final silicon is looking really nice at this point, especially in memory sensitive applications where this processor shines, but core speeds need to come up in a hurry to compete with Tigerton in general server applications.

The memory bandwidth numbers we noticed are a little surprising considering the results in certain memory intensive applications. The latencies are much improved over the last core stepping we looked at as is pure throughput which makes you think twice about the relationship between bandwidth and performance in general from this CPU.

My personal opinion is that if AMD had hit their original launch targets and speeds, the server market would be incredibly competitive at this point with Tigerton being a much needed response to Barcelona. As it stands, AMD is going to have an uphill battle at this time. But this is all subjective thinking from a guy who tried to compare an engine to a CPU, but maybe the results on Monday will clarify some of the observations I was trying to convey without breaking the rules.

Thanks for that Gary...are you saying you are benching Tigerton as well?
Are you using it in a 4S configuration?
