ClawHammer to Perform 30-50% Better Than Athlon XP at Same Clock Speed


Bluga



<< I have been asked by several people for performance estimates of AMD's
Hammer series of processors so I'll give it a shot. Keep in mind that x86-64
is a new microarchitecture so there is wide room for error on any or all of
these factors:
Reference point:
- K7 XP 2000+
- at or near end of performance scaling in 0.18 um bulk CMOS
- 1667 MHz, ~700 SPECint_base2k, ~600 SPECfp_base2k

Hammer top bin clock rate (early/mature):
- 5%/5% bump from 12 stage pipeline (extra stages mostly for IPC
gain and for handling extra complexity of x86-64)
- 20%/25% gain from 0.13 um (wire limitation, limited Leff reduction
from late model 0.18 um K7s vs use of 0.09 um FET techniques)
- 10%/15% gain from SOI
Total +35% early, +45% mature

Microarchitectural gains:
- biggest difference is on-chip memory controller. If we assume best of
class K7 chip sets average 100 ns access for ~50% page hit mix and
moderate traffic and integrating the memory controller shaves 30 ns
(probably a bit generous) off read latency, and integer app performance
scalability is 60%, then speedup is approximately 1/(0.6 + 0.4*(70/100))
or 19%. Assume other efficiencies like better buffering and round that to
20%. For larger cache/wider memory Sledgehammer, I'll say 25% bump
for integer apps. FP apps are much more bandwidth sensitive than
latency sensitive so I'll apportion 5%/40% for Claw/Sledge.
- The improved front end I'll apportion 5%/0% for int/FP apps.
- for x86-64 compiled apps I'll apportion 5/10% for int/FP apps from
increased number of GPRs available and other efficiencies.

So "IPC" improvements relative to XP (with x86-64 recompilation):

Clawhammer:

int: 20% MC + 5% FE + 5% x86-64 = 30%
FP: 5% MC + 0% FE + 10% x86-64 = 15%

Sledgehammer:

int: 25% MC + 5% FE + 5% x86-64 = 35%
FP: 40% MC + 0% FE + 10% x86-64 = 50%

SPECint/fp_base2k estimates (assume 70%/50% int/FP perf scaling with F)
with full x86-64 recompilation:

Early top bin (+35%, ~2250 MHz)

Claw: 1150 / 800
Sledge: 1200 / 1050

Mature top bin (+45%, ~2400 MHz)

Claw: 1200 / 850
Sledge: 1250 / 1100

If the Hammers are running generic or P4 optimized 32 bit x86
code then I would discard the x86-64 IPC bump and cut the
FE bump in half. That will reduce the performance by about 6
to 8%.

FWIW a 3400 MHz XP would probably score roughly around 1050
SPECint_base2k so if Hammer's model rating number was based
on SPECint then a 3400+ Clawhammer would clock around 2 GHz.
Conversely, a 2.25 GHz Claw would rate around a 4000+ rating.

Now remember folks that is a 15 minute, back of an envelope
calculation/estimate/WAG and 5 minutes was taken to find the
envelope. ;-)
>>
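
The arithmetic in the estimate above can be replayed in a few lines. This is only a sketch: the function names (`latency_speedup`, `spec_estimate`) are made up for illustration, and the model (a latency-insensitive share plus a share that scales with memory latency, then partial clock scaling times an IPC factor) is one reading of the quoted post, not anything published by AMD. Evaluated exactly as written, 1/(0.6 + 0.4*(70/100)) comes to about 1.14 rather than 1.19, so the sketch prints the computed value and takes the per-chip IPC totals straight from the post instead of deriving them.

```python
def latency_speedup(scalability, old_ns, new_ns):
    """Speedup when only the latency-bound share (1 - scalability) shrinks."""
    return 1.0 / (scalability + (1.0 - scalability) * (new_ns / old_ns))


def spec_estimate(base_score, clock_gain, freq_scaling, ipc_gain):
    """Scale a baseline SPEC score by a partially effective clock gain and an IPC gain."""
    return base_score * (1.0 + clock_gain * freq_scaling) * (1.0 + ipc_gain)


# Baseline from the post: Athlon XP 2000+ at ~700 SPECint / ~600 SPECfp.
BASE_INT, BASE_FP = 700, 600

# (int, FP) IPC gains with x86-64 recompilation, as totalled in the post.
IPC = {"Claw": (0.30, 0.15), "Sledge": (0.35, 0.50)}

if __name__ == "__main__":
    print(f"MC speedup as written: {latency_speedup(0.6, 100, 70):.3f}")
    for label, clock_gain in (("early", 0.35), ("mature", 0.45)):
        for chip, (ipc_int, ipc_fp) in IPC.items():
            # 70% / 50% frequency scaling for int / FP, per the post.
            int_score = spec_estimate(BASE_INT, clock_gain, 0.70, ipc_int)
            fp_score = spec_estimate(BASE_FP, clock_gain, 0.50, ipc_fp)
            print(f"{label:6} {chip:6} int ~{int_score:.0f}  fp ~{fp_score:.0f}")
```

The computed scores land within about 2% of the table in the post, which is consistent with the figures having been rounded by hand.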

 

Duvie

agodspeed,

Good news if it becomes reality....

1) Doesn't increasing the number of pipeline stages, while allowing better clock speed ramping, actually lower FPU performance??? I.e., just like we bitched about with P4 performance? So this 30-50 percent is on top of taking a hit in the FPU category....

2) I have read 3-4 reviews and I have always seen the 2000+ on par with the 2GHz Northwood, not the 2.2 Northwood...do you have links to more than one site stating that???? I think you are wrong....I think the Athlon XP PR rating lost its conservative rating when the Northwoods came out.

Athlon 2000+ > P4 2GHz Willamette
Athlon 2000+ = P4 2GHz Northwood

This is my opinion but I base it on quite a few different reviews....
 

BuckleDownBen

Well, I don't believe that 30 to 50% figure. AMD had to do something, cuz they're getting killed in the enthusiast market with the 1.6A Northwood. They quote this figure and hope that some people that are about to buy a Northwood will wait for the new AMD chip.

 

AGodspeed

<< 2) I have read 3-4 reviews and I have always seen the 2000+ on par with the 2GHz Northwood, not the 2.2 Northwood...do you have links to more than one site stating that???? I think you are wrong....I think the Athlon XP PR rating lost its conservative rating when the Northwoods came out. >>

Yeah, here are some Athlon XP 2000+ vs. 2.2GHz Northwood reviews:

1. AnandTech

Conclusion: The second point we attempted to make is that the Athlon XP is indeed a very impressive offering from AMD. We've all known this for a while but seeing how well it is able to stand up to the 2.2GHz Pentium 4 is like one day rediscovering the beauty of a wife or girlfriend of many years. In virtually all of the tests we conducted the Athlon XP 2000+ was within a negligible amount of percentage points of the 2.2GHz Pentium 4.

2. Aceshardware

Conclusion: In the desktop market, it is a different game. Northwood features better gaming performance than the Willamette Pentium 4, but fails to leave the fastest Athlon XP behind. Knowing that a 2.2 GHz Pentium 4 costs $562 and that an Athlon XP 2000+ (on average slightly faster) comes with a $339 pricetag, it is crystal clear that the Athlon is still the king in the price/performance department.

3. TechReport

Conclusion: Performance-wise, it's a toss-up. I would like to declare one or the other of these processors the clear winner, but that's just not possible. The Athlon XP 2000+ and Pentium 4 2.2GHz are locked in a dead heat for the title of "fastest x86 processor."

Those are just a few that I searched for. Tom's Hardware and a few other good hardware review sites conclude roughly the same thing, with a few declaring that the 2.2GHz Northwood has a slight performance edge (but most of the time within negligible territory).
 

AGodspeed

<< Well, I don't believe that 30 to 50% figure. AMD had to do something, cuz they're getting killed in the enthusiast market with the 1.6A Northwood. They quote this figure and hope that some people that are about to buy a Northwood will wait for the new AMD chip. >>

Whatever your reason for not believing the 30-50% figure, you should still take a look at Paul DeMone's estimates, which Bluga cut and pasted in the first post on this page. I'll cut and paste the most important parts:

Clawhammer:

int: 20% MC + 5% FE + 5% x86-64 = 30%
FP: 5% MC + 0% FE + 10% x86-64 = 15%

Sledgehammer:

int: 25% MC + 5% FE + 5% x86-64 = 35%
FP: 40% MC + 0% FE + 10% x86-64 = 50%
 

Rand





<< Doesn't increasing the number of pipeline stages, while allowing better clock speed ramping, actually lower FPU performance??? >>



The largest performance impact it has is making a branch misprediction more costly, in terms of clock cycles wasted working down a branch path that turns out not to be taken; it does not inherently hurt FPU performance. In fact, it tends to have a minimal impact on FPU performance, as FPU code tends to have relatively few potential branches.
It has a much larger impact upon integer-intensive code.
The improved BPU of the Hammer should more than offset the slightly longer pipeline though; IMHO the Athlon's BPU is far and away one of the biggest hindrances in the K7 core. The BPU on the old K6 was quite a bit more advanced IMHO, though the K6's BPU would have been ill-suited to the K7, so obviously that's not quite a fair comparison. But then, the K6 did have an extremely advanced BPU for its time. The larger TLB should also help to offset any minimal penalties imposed by the longer pipeline.
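
To put rough numbers on that argument, here is a toy model of how misprediction cost grows with pipeline depth. Every number in it (branch frequencies, misprediction rates, the 10 and 12 cycle flush penalties, the base CPI of 1.0) is a made-up placeholder rather than a measurement of any K7 or Hammer part; only the shape of the result matters.

```python
def effective_cpi(base_cpi, branch_freq, mispredict_rate, flush_penalty):
    """Base CPI plus the average cycles lost to mispredicted branches."""
    return base_cpi + branch_freq * mispredict_rate * flush_penalty


def slowdown(branch_freq, mispredict_rate, short_penalty, long_penalty, base_cpi=1.0):
    """Relative CPI increase caused by lengthening the flush penalty."""
    short = effective_cpi(base_cpi, branch_freq, mispredict_rate, short_penalty)
    long_ = effective_cpi(base_cpi, branch_freq, mispredict_rate, long_penalty)
    return long_ / short - 1.0


if __name__ == "__main__":
    # Hypothetical workloads: branchy integer code vs. loop-heavy FP code.
    int_hit = slowdown(branch_freq=0.20, mispredict_rate=0.08,
                       short_penalty=10, long_penalty=12)
    fp_hit = slowdown(branch_freq=0.05, mispredict_rate=0.02,
                      short_penalty=10, long_penalty=12)
    print(f"integer-like code: {int_hit:.2%} more cycles per instruction")
    print(f"FP-like code:      {fp_hit:.2%} more cycles per instruction")
```

With fewer and more predictable branches, the FP-like case barely moves, while the integer-like case takes the larger hit. Lowering `mispredict_rate` in the model is the knob a better BPU would turn.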



<< I think athlon xp pr rating lost its conservative rating when the northwoods came out. >>



I would still say the AthlonXP's model rating is indeed rather conservative when put in the context of its performance relative to the 400MHz FSB Northwood + i850 combo.
My own personal experience tends to put the AthlonXP as being roughly equal to a Northwood of 200MHz higher 'rating'. This is of course heavily dependent upon the applications used for testing though.
This opinion is also mostly backed up by the tests done by Lost Circuits, Tom's Hardware, AnandTech and AcesHardware.... I specify those reviews in particular as they've done the most intensive and thorough testing of the processors' relative capabilities IMHO.

As for Paul DeMone's estimates... I put little faith in those. Not that I'm saying I disbelieve it will be a 30-50% performance differential; it may well be. But it's quite hard to judge the impact of the integrated memory controller, as this will be the first x86 processor to do such a thing; we still don't know all the details of the new BPU; and it's somewhat difficult to gauge the effectiveness of the extra stages that decode the added complexity of x86-64, and the potential IPC gain related to them, versus the trade-off in wasted clock cycles in case of a branch misprediction. Potential gains in a limited number of applications due to SSE2, SSE2/64-bit code that can make use of the extra registers, and potentially improved compiler optimizations for the P4/AXP/Hammer could help or hinder performance too.

Just too many unknown variables to make a truly accurate estimate of performance without having any first-hand data from software architectural simulations or A0 stepping performance.
As Mr. DeMone himself said:

<< "Keep in mind that x86-64 is a new microarchitecture so there is wide room for error on any or all of these factors" >>



I have quite a bit of respect for Mr. DeMone, but it's simply too early, and there is too little data and too many unknowns, to make anything more than a very vague guesstimate of performance... unless, as in the case of AMD, you have first-hand simulations and actual silicon to test.
 

Duvie

Yeah, it looks like in most cases it is true that the 2000+ XP outscores the P4 2.0GHz Northwood....I stand corrected...

Negligible for ppl who play games maybe....But for me even a few percentage points mean fewer minutes in DivX encodings and LAME MP3 conversions of 2 hour AC3 soundtracks.....


I don't have to worry about it too much...I am doing 2.4GHz w/ 533FSB and PC2700 DDR doing 2525MB/sec of memory bandwidth...I still maintain it will take a 2200+ Athlon to beat this in the majority of tests....


I do hope the Hammer is a big success...I wish AMD would have implemented more than a die shrink for the Thoroughbred. I'm looking forward to a Hammer dual channel DDR system next year...
 

AGodspeed

<< I'm looking forward to a Hammer dual channel DDR system next year... >>

Unless you're going to be spending the bucks on a SledgeHammer, you're going to have to stick with a single channel DDR ClawHammer. :)
 

ToBeMe



<< ClawHammer to Perform 30-50% Better Than Athlon XP at Same Clock Speed >>


Sounds great...and I don't doubt that it may well perform 30-50% faster in some applications, but I also believe a lot of this is hype at this point...too many variables yet. Intel has said that numbers have skyrocketed since the Northwood release, to the point that they are having some problems keeping up with demand...Dell and other manufacturers' numbers substantiate this, so I'm sure this concerns AMD. Soon the Northwoods will debut at 533FSB, which will add distance between them as well, and with AMD's 0.13 process problems with T-Bred, it will most likely take "Hammer" to really compete with the new 533 Northys. I just hope that AMD is able to stay on track and get the Hammer out before X-Mas '02...
 

Sohcan

I sent this estimate to AGodspeed earlier, and he asked me to post it....I hesitated at first, since predictions have a tendency to bite you in the a$$, so I'm not necessarily holding steadfast to this. I used some of DeMone's int and fp scalability figures from his thread at RWT that I read while thinking about this.



Assuming a 100ns DRAM page hit, 115ns row hit, and 130ns precharge latency (with 50%, 40%, and 10% respective occurrence rates), the average main memory access time is around 109 ns.....say that the on-die memory controller shaves off 25ns to bring the main memory access time to 84ns.

Let's assume that Clawhammer doubles the L2 cache size to 512KB...using the 1.67 GHz XP as a base (3 cycle L1 with 1.8% miss-rate, 11 cycle L2 with 1% miss-rate). The 512KB cache will bring the L2 miss-rate down to 1% / 2^.5 = .71%. I think I remember the Hammer MPF PDF implying that the L1 and L2 have load-use latencies of 2 and 8 cycles respectively.

Thus the 1.67 GHz XP has an average memory access time of:
3 cycles + .018 * 11 cycles + .01 * (109 * 1.67) cycles = 5.02 cycles

An estimate of the 2 GHz Clawhammer's average memory access time is:
2 cycles + .018 * 8 cycles + .0071 * (84 * 2) cycles = 3.34 cycles
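
The two averages above can be recomputed as follows. This is a sketch of the same arithmetic, not a cache simulator; `dram_latency_ns` and `amat_cycles` are names chosen here, and like the post it applies the L2 miss rate directly to the main-memory term.

```python
def dram_latency_ns(cases):
    """Weighted mean latency from (latency_ns, probability) pairs."""
    return sum(latency * prob for latency, prob in cases)


def amat_cycles(l1_cycles, l1_miss, l2_cycles, l2_miss, mem_ns, clock_ghz):
    """Average memory access time in CPU cycles, as formulated in the post."""
    mem_cycles = mem_ns * clock_ghz  # ns times cycles-per-ns
    return l1_cycles + l1_miss * l2_cycles + l2_miss * mem_cycles


if __name__ == "__main__":
    # Page hit / row hit / precharge mix from the post: about 109 ns.
    mem_xp = dram_latency_ns([(100, 0.5), (115, 0.4), (130, 0.1)])
    mem_claw = mem_xp - 25  # assumed saving from the on-die controller

    xp = amat_cycles(3, 0.018, 11, 0.01, mem_xp, 1.67)
    # 0.0071 is 1% / sqrt(2), rounded the same way the post rounds it.
    claw = amat_cycles(2, 0.018, 8, 0.0071, mem_claw, 2.0)
    print(f"XP 1.67 GHz:      {xp:.2f} cycles")
    print(f"Clawhammer 2 GHz: {claw:.2f} cycles")
```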

With an integer code scalability of 60%, the speedup is 1/(.6 + .4 * (3.34/5.02)) = 1.155, or roughly 16%. With another 5% improvement in performance for front-end tweaks to instruction decoding, dispatching and branch prediction, that brings the total for 32-bit code to ~20% (plus or minus 5%). x86-64 recompilation will add another 5-10% with use of more general purpose registers (less than I originally thought; x86 is a two operand format and will benefit less from more registers than a three operand format ISA), to a total of ~25-30%.
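
A sketch of the speedup step, assuming the same split the post uses: 60% of integer run time is unaffected by memory access time and the remaining 40% shrinks in proportion to it. The function name is invented here, and the post does not say whether the separate gains are added or multiplied, so both are shown.

```python
def memory_speedup(scalability, amat_old, amat_new):
    """Speedup when only the memory-bound share (1 - scalability) improves."""
    return 1.0 / (scalability + (1.0 - scalability) * (amat_new / amat_old))


if __name__ == "__main__":
    mem_gain = memory_speedup(0.6, amat_old=5.02, amat_new=3.34) - 1.0
    front_end = 0.05       # front-end tweaks, per the post
    x86_64 = (0.05, 0.10)  # low/high recompilation benefit, per the post

    print(f"memory hierarchy alone: {mem_gain:.1%}")
    print(f"32-bit, added:      {mem_gain + front_end:.1%}")
    print(f"32-bit, compounded: {(1 + mem_gain) * (1 + front_end) - 1:.1%}")
    for extra in x86_64:
        added = mem_gain + front_end + extra
        compounded = (1 + mem_gain) * (1 + front_end) * (1 + extra) - 1
        print(f"x86-64 (+{extra:.0%}): added {added:.1%}, compounded {compounded:.1%}")
```

Simple addition gives figures close to the ~20% and ~25-30% quoted in the post; compounding the same gains comes out one to three points higher.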

Floating-point code will see much less of a benefit from the on-die memory controller and a hypothetical 512KB L2; FP code is much more inner-loop intensive and thus more dependent on memory bandwidth than latency. Figure 5-10% base improvement in 32-bit FP code (improved front-end will likely have no benefit), perhaps another 10% for x86-64 recompiled code with effective use of the extra SSE registers.

I'd expect Clawhammer to be top-of-the-line for integer performance, but that's to be expected....x86 MPUs have had excellent integer performance since the 486. In SPECint_base2K, the 2.2GHz Northwood ties the 1.3 GHz Power4 for the top spot at 790, and 5 of the top 7 scores are from x86 MPUs. With the 1.67 GHz XP scoring 697, and assuming 70% integer scalability and 30% Clawhammer IPC boost with x86-64 recompilation, a SPECint_base2K score of around 1000 - 1100 is likely: (((2 - 1.667)/1.667)*.7 + 1) * 700 * 1.3 = 1037.

Floating-point performance should still be respectable, but not as relatively high as the integer. With a 1.67 GHz XP scoring 596 in SPECfp_base2K, and assuming a 50% FP scalability and 15% FP improvement with Clawhammer using x86-64 recompilation, I'd imagine it will score roughly between 750-800 (((2 - 1.667)/1.667)*.5 + 1) * 600 * 1.15 = 760). While respectable for an x86 MPU, it can't really compare to McKinley or EV7, which should both be in the range of 1400-1500. Sledgehammer may tighten the gap a little bit, with its wider memory bus and the possibility of a larger L2 cache...perhaps 40-50% FP IPC boost compared to the XP for a SPECfp_base2K score of 900-1000.
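
The two parenthesised formulas above share one shape, sketched here with hypothetical names. It assumes, as the post does, that only a fraction of the clock increase shows up in the score (70% for integer, 50% for FP) and that the IPC gain multiplies on top.

```python
def projected_score(base_score, base_ghz, new_ghz, freq_scaling, ipc_gain):
    """Scale a SPEC score for a partially effective clock bump and an IPC gain."""
    clock_gain = (new_ghz - base_ghz) / base_ghz
    return base_score * (1.0 + clock_gain * freq_scaling) * (1.0 + ipc_gain)


if __name__ == "__main__":
    # Inputs are the rounded values used in the post (700 / 600 baselines).
    spec_int = projected_score(700, 1.667, 2.0, freq_scaling=0.7, ipc_gain=0.30)
    spec_fp = projected_score(600, 1.667, 2.0, freq_scaling=0.5, ipc_gain=0.15)
    print(f"2 GHz Clawhammer SPECint_base2K ~ {spec_int:.0f}")
    print(f"2 GHz Clawhammer SPECfp_base2K  ~ {spec_fp:.0f}")
```

The FP line comes out at about 759, which the post rounds to 760.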


There are still a lot of uncertainties involved, namely cache size, associativity, and latency, as well as clock speed. This is by no means meant to be an accurate prediction of performance given the ~10 minutes I spent thinking about this, so don't quote me on it as proof of anything.
 

Texmaster

Clawhammer will be AMD's test baby for a new technology. It will only get better in time but I really don't see any major competition with Intel for at least the first 3 months it's out.
 

dullard



<< Clawhammer will be AMD's test baby for a new technology. It will only get better in time but I really don't see any major competition with Intel for at least the first 3 months it's out. >>


Texmaster, there isn't much we agree on. But this time, I agree with you 100%. The 3400+ Clawhammer doesn't seem like it will be a big success. It is the over 4000+ Hammers, which are supposed to be out shortly afterward, that will be impressive. Especially since that will give time for the price to settle down and for more motherboard selection.
 

Texmaster



<< Texmaster, there isn't much we agree on. But this time, I agree with you 100%. The 3400+ Clawhammer doesn't seem like it will be a big success. It is the over 4000+ Hammers, which are supposed to be out shortly afterward, that will be impressive. Especially since that will give time for the price to settle down and for more motherboard selection. >>



Agreed. Did I say that? :D