Sneak peek at FX-8150 with SuperPi & wPrime

BlueBlazer · Sep 19, 2011

LOL_Wut_Axel said:
You're on ignore, but Nemesis quoted you. Some things: you compare using cores, not threads. If you want to see IPC, you use a single-threaded application, preferably at the same or close clock speed.

The comparison is valid because both CPUs are running a multi-threaded program, and HyperThreading is being used on both CPUs. Furthermore, it's about comparing two Penryn cores versus two HyperThreaded Atom D510 cores, or one Penryn core versus one HyperThreaded Atom D510 core (since wPrime is, after all, a multi-threaded application).

LOL_Wut_Axel said:
The implementation of HyperThreading in the Pentium 4 is very poor and at many times actually hinders performance, while on an Atom it can bring up to a 20% improvement. That alone makes your comparison invalid.

Depending on the application, HyperThreading can degrade or improve performance in multi-threaded applications. In particular, some multi-threaded programs benefit from HyperThreading (for example, as shown in this Intel Pentium 4 3.06GHz CPU with Hyper-Threading Technology: Killing Two Birds with a Stone.... Page 2, check the "Multi-Threading Tasks" section). To say the Pentium 4 loses performance due to HyperThreading most of the time is incorrect. :hmm:

LOL_Wut_Axel said:
And yes, Atom does have lower IPC than Pentium 4. That's been known for a long time now. Intel's focus with Atom wasn't energy efficiency, but providing "decent-enough" performance and very cheap manufacturing costs. If you compare the energy efficiency of Atom to a Core i3, it'll come out losing by a huge margin.

And did you forget that, per-core (2 threads), the Atom D510 in the above application shows 68% higher performance per clock? You can easily see that certain applications (synthetic and multi-threaded, like Fritz Chess and possibly wPrime) can perform "60%" faster than on the Pentium 4. That shows the Atom D510 can at times have better IPC than the Pentium 4 (by quite a margin, due to its efficiency).

LOL_Wut_Axel said:
Again, go troll somewhere else.

You were proven wrong before (about x87 "support") in this thread, and proven wrong again here (about IPC on Atom D510 and HyperThreading).

Schmide · Sep 19, 2011

RussianSensation said:
Actually, SuperPi performance scales almost perfectly with IPC between each of my Intel Core 2 Duo E6400, Q6600, Core i7 860 and i5 2500k!! I can't comment on AMD side, but SuperPi scaling shows very accurate linear IPC improvement across Intel's lineup in the last 5 years.

But if you strictly want to look at IPC improvements, it's one of the best benchmarks (assuming you don't skew the results with DDR3-2400 mhz memory, etc.).

I know you know this, your last comment about memory kind of shows it. (below)

LOL_Wut_Axel said:
No, SuperPi is horrible for testing because it uses a very outdated instruction set that does not represent performance in any recent real-world program. X87 is old junk.

It's not the instruction set nor the IPC that gives SuperPi its numbers; it's the cache and memory system, in which Intel has dominated AMD for years. If you read up on the nature of calculating pi, you'll understand why.

I would also bet if you look at the memory and cache speed of the various processors RussianSensation was talking about, it would scale most accurately to this.

BlueBlazer · Sep 19, 2011

Schmide said:
It's not the instruction set nor the IPC that gives SuperPi its numbers; it's the cache and memory system, in which Intel has dominated AMD for years. If you read up on the nature of calculating pi, you'll understand why.

It's not only about cache and memory; it's also about IPC and FP (x87) performance. Before the advent of Core 2 Duo, the K7s and K8s dominated SuperPi. The K7 had better IPC and x87 performance. Even with the arrival of dual-channel memory on the Pentium 4 platform, the situation did not change. The K8, with its integrated memory controller, had lower latency and better memory bandwidth than FSB-based Intel CPUs like the Pentium 4. The early Socket 754 Athlon 64s had only a single-channel memory controller. It should be noted that Core 2 Duo was also still using the antiquated FSB.

Riek · Sep 19, 2011

BlueBlazer said:
It's not only about cache and memory; it's also about IPC and FP (x87) performance. Before the advent of Core 2 Duo, the K7s and K8s dominated SuperPi. The K7 had better IPC and x87 performance. Even with the arrival of dual-channel memory on the Pentium 4 platform, the situation did not change. The K8, with its integrated memory controller, had lower latency and better memory bandwidth than FSB-based Intel CPUs like the Pentium 4. The early Socket 754 Athlon 64s had only a single-channel memory controller. It should be noted that Core 2 Duo was also still using the antiquated FSB.

He is talking about L1/L2 cache speeds.

Schmide · Sep 19, 2011

The P4 had probably one of the worst penalties for a cache load/miss ever. Add to that its anemic L1 system and super long pipeline and you can see why it did poorly. However, when you pumped the FSB/cache on the P4, I believe it actually put forth some good numbers for SuperPi and some records were held for a while by super clocked P4s.

BlueBlazer · Sep 19, 2011

Schmide said:
The P4 had probably one of the worst penalties for a cache load/miss ever. Add to that its anemic L1 system and super long pipeline and you can see why it did poorly. However, when you pumped the FSB/cache on the P4, I believe it actually put forth some good numbers for SuperPi and some records were held for a while by super clocked P4s.

The SuperPi records weren't held by Pentium 4s; they were in fact held by overclocked K8s and Yonah (Core Duo) back in those days (before Core 2 Duo arrived).

blackened23 · Sep 19, 2011

OBR has admitted as of this morning that these benchmarks were falsified.

lol123 · Sep 19, 2011

blackened23 said:
OBR has admitted as of this morning that these benchmarks were falsified.

These benchmarks were run by NordicHardware, not by OBR.

LOL_Wut_Axel · Sep 19, 2011

BlueBlazer said:
The comparison is valid because both CPUs are running a multi-threaded program, and HyperThreading is being used on both CPUs. Furthermore, it's about comparing two Penryn cores versus two HyperThreaded Atom D510 cores, or one Penryn core versus one HyperThreaded Atom D510 core (since wPrime is, after all, a multi-threaded application).

Depending on the application, HyperThreading can degrade or improve performance in multi-threaded applications. In particular, some multi-threaded programs benefit from HyperThreading (for example, as shown in this Intel Pentium 4 3.06GHz CPU with Hyper-Threading Technology: Killing Two Birds with a Stone.... Page 2, check the "Multi-Threading Tasks" section). To say the Pentium 4 loses performance due to HyperThreading most of the time is incorrect. :hmm:

And did you forget that, per-core (2 threads), the Atom D510 in the above application shows 68% higher performance per clock? You can easily see that certain applications (synthetic and multi-threaded, like Fritz Chess and possibly wPrime) can perform "60%" faster than on the Pentium 4. That shows the Atom D510 can at times have better IPC than the Pentium 4 (by quite a margin, due to its efficiency).

You were proven wrong before (about x87 "support") in this thread, and proven wrong again here (about IPC on Atom D510 and HyperThreading).

Yeah, no. Comparing IPC, and therefore architectures, is only done well on single-threaded programs and at similar clock speeds. HyperThreading is NOT part of IPC. Not to mention, again, the implementation of HyperThreading on Atom is much better than that of the Pentium 4.

If you still don't get it by now, it's hopeless. In-order execution and a 2-issue design reduce performance by a devastating amount on Atom, which is why it's so slow. Even with HyperThreading, it's impossible for Atom to catch up to Penryn because Penryn has more than 100% higher IPC, while on applications that benefit the most from HyperThreading you only gain 25%, plus the additional 23% clock speed. Still a higher-than-50% gap. If you think an Atom Dual-Core can be the same speed as a CULV Core 2 Duo, you're insane.
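
As a quick sanity check on the arithmetic above, here is a minimal sketch that combines the three figures quoted in this post (100% higher IPC for Penryn, a 25% HyperThreading gain and a 23% clock advantage for Atom). It assumes the factors multiply, which is a simplification the post does not state. Subtracting the percentages (100 - 25 - 23) lands at roughly 50%, while compounding them gives a smaller gap. The function name is made up for illustration.

```python
def relative_throughput(ipc: float, clock: float, smt_gain: float = 0.0) -> float:
    """Throughput of one core relative to a baseline, assuming the factors multiply."""
    return ipc * clock * (1.0 + smt_gain)


# Figures quoted in the post; a single-threaded Atom core at Penryn's clock is the baseline (1.0).
penryn = relative_throughput(ipc=2.0, clock=1.0)
atom = relative_throughput(ipc=1.0, clock=1.23, smt_gain=0.25)

# How far ahead one Penryn core is of one HyperThreaded Atom core.
gap = penryn / atom - 1.0
print(f"Penryn ahead by {gap:.0%} per core")  # prints: Penryn ahead by 30% per core
```

Either way Penryn stays ahead under these inputs; only the size of the gap depends on how the percentages are combined.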

RussianSensation · Sep 19, 2011

Schmide said:
I would also bet if you look at the memory and cache speed of the various processors RussianSensation was talking about, it would scale most accurately to this.

Ya, that's the thing. If BD performs poorly in SuperPi, it may also be a function of its memory controller, latency and cache speed being inferior to those of Intel processors. These things may or may not translate directly to poor performance in real-world programs.

SuperPi is also very memory bandwidth AND latency dependent. DDR3-2400 RAM at CL6 (if such a thing existed) would show massive improvements in 32M vs. DDR3-1066 CL9.
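
To put rough numbers on those two memory configurations, here is a small sketch using the usual back-of-the-envelope formulas: CAS latency in nanoseconds is the CAS cycle count times the clock period, and peak bandwidth is the transfer rate times the 8-byte width of one channel. It only covers the CAS part of latency and theoretical peak bandwidth, so treat it as an illustration rather than a prediction of SuperPi times. The function names are made up for this example.

```python
def cas_latency_ns(data_rate_mt_s: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds.

    DDR transfers twice per clock, so the memory clock in MHz is half the
    data rate and one clock lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mt_s


def peak_bandwidth_mb_s(data_rate_mt_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one 64-bit channel in MB/s."""
    return data_rate_mt_s * bus_bytes


for name, rate, cl in [("DDR3-2400 CL6", 2400, 6), ("DDR3-1066 CL9", 1066, 9)]:
    print(f"{name}: {cas_latency_ns(rate, cl):.1f} ns CAS, "
          f"{peak_bandwidth_mb_s(rate):.0f} MB/s per channel")
```

On paper the faster kit has less than a third of the CAS latency and more than twice the peak bandwidth, which is why it would skew a latency-sensitive run like 32M.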

SuperPi may not be exact, but it's pretty close to real world differences in IPC. Comparing Intel 2500k and AMD X6 1100T directly, we see that 2500k is about 50% faster. SB's real world IPC advantage in single-threaded apps at the same clock speed is also about 40-50% greater:

Still, since BD is an entirely new architecture from AMD, it may be the first processor that performs poorly in SuperPi, and yet performs extremely well under real world conditions. We'll know soon enough.

BD will have some cores turbo up to 4.2GHz, so in the context of single-threaded performance it will still end up faster than Phenom II CPUs.

BlueBlazer · Sep 19, 2011

blackened23 said:
OBR has admitted as of this morning that these benchmarks were falsified.

No kidding! AFAIK he admitted that months ago.

LOL_Wut_Axel said:
Yeah, no. Comparing IPC, and therefore architectures, is only done well on single-threaded programs and at similar clock speeds. HyperThreading is NOT part of IPC. Not to mention, again, the implementation of HyperThreading on Atom is much better than that of the Pentium 4.

As I've already mentioned, I'm comparing one Penryn core to one HyperThreaded (2 threads) Atom D510 core. You may be implying they should be compared as one core to one thread per core. However, in that wPrime benchmark, the program utilizes all the hardware threads it can use. In other words, it's going to spawn 2 threads for 2 cores on the Penryn, and 4 threads for 2 cores on the Atom D510. That means each thread on the Atom D510 no longer has 100% of its core's performance. Thus, for this particular benchmark, you logically have to compare one Penryn core to 2 threads on one Atom D510 core.
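
The per-core comparison described above can be sketched as follows. The core and thread counts come from this thread. The clock speeds are nominal values and should be treated as assumptions, and the run times are placeholders, not real wPrime results. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    threads_per_core: int
    ghz: float  # nominal clock, an assumption for this sketch

    @property
    def hardware_threads(self) -> int:
        # A benchmark that uses every hardware thread spawns this many workers.
        return self.cores * self.threads_per_core


def per_core_per_ghz(cpu: Cpu, seconds: float) -> float:
    """Work rate per physical core per GHz for a timed run (higher is better)."""
    return 1.0 / (seconds * cpu.cores * cpu.ghz)


penryn = Cpu("Penryn SU7300", cores=2, threads_per_core=1, ghz=1.3)
atom = Cpu("Atom D510", cores=2, threads_per_core=2, ghz=1.66)

# Placeholder times: both chips finishing in the same 100 seconds.
ratio = per_core_per_ghz(atom, 100.0) / per_core_per_ghz(penryn, 100.0)
print(penryn.hardware_threads, atom.hardware_threads)
print(f"Atom per-core, per-clock rate relative to Penryn: {ratio:.2f}")
```

Normalising by physical cores rather than by threads is the choice being argued for here: each Atom core is credited with the combined work of both of its threads.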

LOL_Wut_Axel said:
If you still don't get it by now, it's hopeless. In-order execution and a 2-issue design reduce performance by a devastating amount on Atom, which is why it's so slow. Even with HyperThreading, it's impossible for Atom to catch up to Penryn because Penryn has more than 100% higher IPC, while on applications that benefit the most from HyperThreading you only gain 25%, plus the additional 23% clock speed. Still a higher-than-50% gap. If you think an Atom Dual-Core can be the same speed as a CULV Core 2 Duo, you're insane.

First of all did you check....

BlueBlazer said:
In particular, some multi-threaded programs benefit from HyperThreading (for example, as shown in this Intel Pentium 4 3.06GHz CPU with Hyper-Threading Technology: Killing Two Birds with a Stone.... Page 2, check the "Multi-Threading Tasks" section).

Look at the figures in the table: performance gains from HyperThreading can be anywhere from 24.9% to as high as 45.7%, depending on the application.

Here is another example >> First Look at Presler: Intel Pentium Extreme Edition 955 CPU Review. Page 13 especially on the (multi-threaded) 3ds max 7.0 SPECapc CPU render graph.....

Compare the Pentium D 950, Pentium XE 955 and Athlon X2 4800+. Everyone knows the Athlon X2 4800+ is a far superior CPU in every aspect; however, in this particular multi-threaded application the tables are turned, as the Pentium XE 955 edges ahead. The Pentium XE 955 is almost the same as the Pentium D 950, except for a slightly higher clock speed (3.46GHz vs 3.4GHz), a faster bus (1066MHz vs 800MHz) and HyperThreading. That feature alone allowed the Pentium XE 955 to overtake the better Athlon X2 4800+.

RussianSensation · Sep 19, 2011

BlueBlazer said:
First of all did you check.... Look at the figures in the table: performance gains from HyperThreading can be anywhere from 24.9% to as high as 45.7%, depending on the application.

Those are extreme examples from the Pentium 4 era. HT was more effective for the P4. On a modern SB CPU, the difference is far less dramatic.

HardwareCanucks: 18% on average
Computerbase.de: 4-5% on average

In Cinebench, HT helps the 2600k a lot though over the 2500k.

sm625 · Sep 19, 2011

Atom probably does relatively well at finding prime numbers because OoO doesn't optimize that code.

BlueBlazer · Sep 19, 2011

RussianSensation said:
In Cinebench, HT helps the 2600k a lot though over the 2500k.

As much as 29.5% for Cinebench R11.5, and 52.6% for wPrime 32m (from this example, adjusted clock-to-clock).
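
For anyone wondering what "adjusted clock-to-clock" means in practice, here is a minimal sketch: divide each score by the clock it ran at, then compare. The inputs below are placeholders chosen to make the arithmetic easy, not measurements from any review, and the helper name is made up. Turbo behaviour is ignored, which is a real limitation of this kind of adjustment.

```python
def clock_adjusted_gain(score_a: float, ghz_a: float,
                        score_b: float, ghz_b: float) -> float:
    """Fractional gain of A over B after dividing each score by its clock.

    Scores must be "higher is better"; for timed runs pass 1 / seconds.
    """
    return (score_a / ghz_a) / (score_b / ghz_b) - 1.0


# Placeholder numbers only: chip A scores 6.8 at 3.4GHz, chip B scores 5.0 at 3.3GHz.
gain = clock_adjusted_gain(score_a=6.8, ghz_a=3.4, score_b=5.0, ghz_b=3.3)
print(f"{gain:.1%}")  # prints: 32.0%
```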

Spikesoldier · Sep 19, 2011

RussianSensation said:
Those are extreme examples from the Pentium 4 era. HT was more effective for the P4. On a modern SB CPU, the difference is far less dramatic.

HardwareCanucks: 18% on average
Computerbase.de: 4-5% on average

In Cinebench, HT helps the 2600k a lot though over the 2500k.

you forget the 2600K has 2MB more L3 cache than the 2500K.

P4's inefficient long pipe let HT help performance more than it does on SB.

aigomorla · Sep 19, 2011

RussianSensation said:
Those are extreme examples from the Pentium 4 era. HT was more effective for the P4. On a modern SB CPU, the difference is far less dramatic.

HardwareCanucks: 18% on average
Computerbase.de: 4-5% on average

In Cinebench, HT helps the 2600k a lot though over the 2500k.

NetBurst would only have worked if Intel could have reached 10GHz on the GHz scale it was on at the time.

The guys at Intel went ROFLCOPTER when they realized how hot these Prescotts, Preslers and Smithfields got when loaded and overclocked.

C2D reduced heat considerably while giving us a faster platform.
Then Bloomfield came along and changed everything, along with bringing back HT on the new GHz scale, which actually worked.

P4->C2D = a big Step
C2D -> Bloomfield = another big Step
Bloomfield -> Gulftown = Growth.
Gulftown -> SB-E = Lets make the rich people spend money they dont have to, unless ur in the corp. IT sector!

LOL_Wut_Axel · Sep 19, 2011

BlueBlazer said:
As I've already mentioned, I'm comparing one Penryn core to one HyperThreaded (2 threads) Atom D510 core. You may be implying they should be compared as one core to one thread per core.

That's not how you compare. That's the problem with all of your arguments. HyperThreading is not part of IPC, yet you're counting it as if it were. For the last time, Penryn has at least 100% higher IPC than Atom. That means both on one core, one thread, at the same frequency, Penryn will perform at least 100% more instructions. You don't compare one core/one thread to one core/two threads when looking at IPC. If you want to compare multi-threaded performance, THEN you look at HyperThreading. As I've repeated to you many times to no avail, in an accurate benchmark there should be at least a 50% performance difference between the D510/N570 and the SU7300, yet in wPrime they're almost equal. No, the HyperThreading won't make up for the difference. It's only a 20-25% improvement.

RussianSensation, SuperPi clearly favors Intel's modern CPUs over AMD's and presents skewed numbers because of it. Again, it puts Conroe as performing 25% better than Phenom II when the biggest performance difference you'll find at the same clock speed in almost all other benchmarks is 5% or so. If all SuperPi takes into account is memory bandwidth as you guys say, then that alone should be enough to discard it. Given how much memory bandwidth we've had since Phenom II and Nehalem came out, it makes little difference in real-world programs except file compression and some synthetics. You can also see this comparing Bloomfield and Lynnfield.

aigomorla · Sep 19, 2011

996GT2 said:
...and this 8 core FX-8150 came out considerably slower than even a Thuban (which does wPrime32M in around 9 seconds).

which is why u heard me tell people i am hoping the X6 drops in price and then migrating my friend in that direction and not bulldozer. :biggrin:

LOL_Wut_Axel said:
You can also see this comparing Bloomfield and Lynnfield.

lulz... another one i hated...
WTF was the point in 1156? someone please tell me.

It was a stepping stone to 1155, which probably had one of the shortest platform lives Intel has ever had.
Basically another repeat of socket 462, but thank god we weren't forced onto Rambus RAM like we were then.

Arkaign · Sep 19, 2011

aigomorla said:
which is why u heard me tell people i am hoping the X6 drops in price and then migrating my friend in that direction and not bulldozer. :biggrin:

lulz... another one i hated...
WTF was the point in 1156? someone please tell me.

It was a stepping stone to 1155, which probably had one of the shortest platform lives Intel has ever had.
Basically another repeat of socket 462, but thank god we weren't forced onto Rambus RAM like we were then.

not to nitpick but Socket 462 was AMD Socket A, you're thinking of s423

aigomorla · Sep 19, 2011

Arkaign said:
not to nitpick but Socket 462 was AMD Socket A, you're thinking of s423

oh my gosh... i am tripping... lol..

S423 was the one with the rambus ram correct?

RussianSensation · Sep 19, 2011

Spikesoldier said:
you forget the 2600K has 2MB more L3 cache than the 2500K.

P4's inefficient long pipe let HT help performance more than it does on SB.

No, I didn't forget the 2MB cache difference, because the benches linked compare the 2600 with and without HT. ^_^ Also, if you look at either review, you'll see the cache difference amounts to almost nothing.

LOL_Wut_Axel said:
If all SuperPi takes into account is memory bandwidth as you guys say, then that alone should be enough to discard it.

I don't think anyone said this in the thread. Memory bandwidth, IPC, latency, cache speed are all key factors in SuperPi performance. Memory bandwidth alone is not the most important factor.

Arkaign · Sep 19, 2011

aigomorla said:
oh my gosh... i am tripping... lol..

S423 was the one with the rambus ram correct?

Aye aye. Also, many S478 boards had RDRAM as well, at least the early high-end ones. RDRAM was fast for the time at 800 and 1066 speeds, but waaaaaaaaay too expensive to make any real sense. A while after they were out I got a used system with 4x512MB Samsung PC1066 modules, which I promptly ebayed.

The kit brought over $1k! Even though 2GB of DDR400 was available at the time for less than $150!

tweakboy · Sep 19, 2011

Man, there are too many people ready to throw a party on the 22nd, when BD is available, I assume? If it's not delayed to Oct. We'll find out soon.

I'd still rather have 4 cores plus Intel's HT (4 more logical cores) than 8 cores of BD.

aigomorla · Sep 19, 2011

... :X

BTW OC'd Gulftown @ 4.6ghz numbers on Water /w 1.42vcore!

But yes, everyone knows SuperPi and wPrime are more geared to Intel processors.
And you can see how extreme it is...

kcidmai · Sep 19, 2011

LOL_Wut_Axel said:
That's not how you compare. That's the problem with all of your arguments. HyperThreading is not part of IPC, yet you're counting it as if it were. For the last time, Penryn has at least 100% higher IPC than Atom. That means both on one core, one thread, at the same frequency, Penryn will perform at least 100% more instructions. You don't compare one core/one thread to one core/two threads when looking at IPC. If you want to compare multi-threaded performance, THEN you look at HyperThreading. As I've repeated to you many times to no avail, in an accurate benchmark there should be at least a 50% performance difference between the D510/N570 and the SU7300, yet in wPrime they're almost equal. No, the HyperThreading won't make up for the difference. It's only a 20-25% improvement.

RussianSensation, SuperPi clearly favors Intel's modern CPUs over AMD's and presents skewed numbers because of it. Again, it puts Conroe as performing 25% better than Phenom II when the biggest performance difference you'll find at the same clock speed in almost all other benchmarks is 5% or so. If all SuperPi takes into account is memory bandwidth as you guys say, then that alone should be enough to discard it. Given how much memory bandwidth we've had since Phenom II and Nehalem came out, it makes little difference in real-world programs except file compression and some synthetics. You can also see this comparing Bloomfield and Lynnfield.

Well, what if HT does improve IPC? IPC is a measure of how many instructions you can perform per clock cycle. If HT shoves more instructions into a starved pipeline every time, guess what: it improves IPC.
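
A toy model makes the point concrete. This is only a sketch under heavy assumptions: a single-issue core, fixed stall patterns per thread, and no cost for sharing resources, none of which describes a real Atom or Pentium 4. Each thread alone keeps the pipeline busy one cycle in three; with two threads, the core issues from whichever thread is ready.

```python
def core_ipc(stall_patterns, cycles):
    """Instructions per cycle for a toy single-issue core.

    stall_patterns holds one repeating list per hardware thread, where True
    means that thread is stalled in that cycle. Each cycle the core issues
    one instruction from the first thread that is not stalled.
    """
    issued = 0
    for cycle in range(cycles):
        for pattern in stall_patterns:
            if not pattern[cycle % len(pattern)]:
                issued += 1
                break  # only one issue slot per cycle
    return issued / cycles


thread_a = [False, True, True]  # ready in cycle 0, stalled in cycles 1 and 2
thread_b = [True, False, True]  # ready in cycle 1, stalled in cycles 0 and 2

print(core_ipc([thread_a], 6))            # one thread on the core
print(core_ipc([thread_a, thread_b], 6))  # two threads sharing the core
```

In this model the core's IPC doubles with the second thread while each thread's own IPC stays the same, which is arguably why the two sides here keep talking past each other: it depends on whether IPC is counted per thread or per core.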

Have a nice day.
