<< I have been asked by several people for performance estimates of AMD's
Hammer series of processors, so I'll give it a shot. Keep in mind that Hammer
is a new microarchitecture, so there is wide room for error on any or all of
these factors:
Reference point:
- K7 XP 2000+
- at or near end of performance scaling in 0.18 um bulk CMOS
- 1667 MHz, ~700 SPECint_base2k, ~600 SPECfp_base2k
Hammer top bin clock rate (early/mature):
- 5%/5% bump from 12 stage pipeline (extra stages mostly for IPC
gain and for handling extra complexity of x86-64)
- 20%/25% gain from 0.13 um (wire limitation, limited Leff reduction
from late model 0.18 um K7s vs use of 0.09 um FET techniques)
- 10%/15% gain from SOI
Total +35% early, +45% mature
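A quick sketch of how the clock-rate totals above add up, for anyone who wants to poke at the inputs. The totals in the post are simple sums of the three gains; the compounded figure is shown only for comparison and is my addition, as are the names GAINS, additive and compounded.

```python
# Clock-rate gains quoted in the post, as fractions: (early, mature).
GAINS = {
    "pipeline": (0.05, 0.05),
    "0.13 um": (0.20, 0.25),
    "SOI": (0.10, 0.15),
}
BASE_MHZ = 1667  # XP 2000+ reference clock from the post


def additive(gains):
    """Sum the gains, which is how the post arrives at +35% / +45%."""
    return sum(gains)


def compounded(gains):
    """Multiply the gains together instead (assumption: shown for comparison only)."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


for i, label in enumerate(("early", "mature")):
    column = [g[i] for g in GAINS.values()]
    add = additive(column)
    mul = compounded(column)
    print(f"{label}: additive +{add:.0%} -> {BASE_MHZ * (1 + add):.0f} MHz, "
          f"compounded +{mul:.1%}")
```

The additive totals land on roughly 2250 MHz and 2417 MHz, in line with the ~2250 and ~2400 MHz top bins used later in the post.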
Microarchitectural gains:
- biggest difference is the on-chip memory controller. Assume best of
class K7 chip sets average 100 ns access for a ~50% page hit mix and
moderate traffic, and that integrating the memory controller shaves 30 ns
(probably a bit generous) off read latency. If integer app performance
scalability is 60%, then speedup is approximately 1/(0.6 + 0.4*(70/100))
or 19%. Assume other efficiencies like better buffering and round that to
20%. For the larger cache/wider memory Sledgehammer, I'll say a 25% bump
for integer apps. FP apps are much more bandwidth sensitive than
latency sensitive so I'll apportion 5%/40% for Claw/Sledge.
- The improved front end I'll apportion 5%/0% for int/FP apps.
- for x86-64 compiled apps I'll apportion 5%/10% for int/FP apps from
the increased number of GPRs available and other efficiencies.
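The memory controller estimate in the first bullet is an Amdahl-style calculation: only the memory-bound share of run time shrinks with latency. A minimal sketch, assuming the 60% figure is the share of time that does not depend on memory latency (the function and parameter names are mine):

```python
def latency_speedup(core_fraction, old_ns, new_ns):
    """Speedup when only the memory-bound share of run time scales with latency."""
    memory_fraction = 1.0 - core_fraction
    return 1.0 / (core_fraction + memory_fraction * (new_ns / old_ns))


# Inputs from the post: 60% scalability, 100 ns reduced to 70 ns.
speedup = latency_speedup(0.6, 100.0, 70.0)
print(f"{speedup:.3f}")  # prints 1.136
```

Evaluated literally, the expression comes out nearer 14% than the 19% quoted, so the 20% carried into the IPC totals is probably best read as a rounded judgment call that folds in the other efficiencies rather than as the bare output of this formula.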
So "IPC" improvements relative to XP (with x86-64 recompilation):
Clawhammer:
int: 20% MC + 5% FE + 5% x86-64 = 30%
FP: 5% MC + 0% FE + 10% x86-64 = 15%
Sledgehammer:
int: 25% MC + 5% FE + 5% x86-64 = 35%
FP: 40% MC + 0% FE + 10% x86-64 = 50%
SPECint/fp_base2k estimates (assume 70%/50% int/FP perf scaling with clock frequency F)
with full x86-64 recompilation:
Early top bin (+35%, ~2250 MHz)
Claw: 1150 / 800
Sledge: 1200 / 1050
Mature top bin (+45%, ~2400 MHz)
Claw: 1200 / 850
Sledge: 1250 / 1100
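The table above can be reproduced from the earlier numbers. This is a sketch of one reading that happens to match: scale the XP 2000+ baseline by the frequency term (70%/50% of the clock gain) and by the IPC gain, then round to the nearest 50. The rounding step and all names here are my assumptions, not something the post spells out.

```python
BASE_SCORE = {"int": 700, "fp": 600}        # XP 2000+ baseline from the post
FREQ_SCALING = {"int": 0.70, "fp": 0.50}    # share of clock gain that shows up in score
IPC_GAIN = {
    "Claw": {"int": 0.30, "fp": 0.15},
    "Sledge": {"int": 0.35, "fp": 0.50},
}
CLOCK_GAIN = {"early": 0.35, "mature": 0.45}


def estimate(chip, kind, clock_gain):
    """Baseline score times frequency factor times IPC factor."""
    freq_factor = 1.0 + FREQ_SCALING[kind] * clock_gain
    ipc_factor = 1.0 + IPC_GAIN[chip][kind]
    return BASE_SCORE[kind] * freq_factor * ipc_factor


def round_to_50(score):
    """Round to the nearest 50 (assumed rounding granularity)."""
    return int(50 * round(score / 50))


for bin_name, gain in CLOCK_GAIN.items():
    for chip in ("Claw", "Sledge"):
        int_score = round_to_50(estimate(chip, "int", gain))
        fp_score = round_to_50(estimate(chip, "fp", gain))
        print(f"{bin_name:6s} {chip:6s}: {int_score} / {fp_score}")
```

Changing any single input (say, the 70% integer frequency scaling) shows how sensitive the final scores are to that guess.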
If the Hammers are running generic or P4-optimized 32-bit x86
code then I would discard the x86-64 IPC bump and cut the
FE bump in half. That will reduce performance by about 6
to 8%.
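To see where the 6 to 8% comes from, here is a rough sketch that drops the x86-64 bump and halves the front-end bump for each chip and workload, then compares against the recompiled case. The IPC parts are the ones listed earlier in the post; the function names are mine.

```python
# IPC contributions from the post, as fractions: (memory controller, front end, x86-64).
IPC_PARTS = {
    ("Claw", "int"): (0.20, 0.05, 0.05),
    ("Claw", "fp"): (0.05, 0.00, 0.10),
    ("Sledge", "int"): (0.25, 0.05, 0.05),
    ("Sledge", "fp"): (0.40, 0.00, 0.10),
}


def ipc_gain_64bit(mc, fe, x64):
    """Full IPC gain with x86-64 recompilation."""
    return mc + fe + x64


def ipc_gain_32bit(mc, fe, x64):
    """Drop the x86-64 bump entirely and keep half of the front-end bump."""
    return mc + fe / 2.0


def slowdown(mc, fe, x64):
    """Fractional performance loss of 32-bit code relative to recompiled code."""
    return 1.0 - (1.0 + ipc_gain_32bit(mc, fe, x64)) / (1.0 + ipc_gain_64bit(mc, fe, x64))


for (chip, kind), parts in IPC_PARTS.items():
    print(f"{chip} {kind}: -{slowdown(*parts):.1%}")
```

The four cases land between roughly 5.5% and 9%, which brackets the quoted range.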
FWIW a 3400 MHz XP would probably score roughly 1050
SPECint_base2k, so if Hammer's model rating number were based
on SPECint then a 3400+ Clawhammer would clock around 2 GHz.
Conversely, a 2.25 GHz Claw would rate around 4000+.
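One way to sanity-check the rating conversion is sketched below. It rests on assumptions that are mine, not the post's: the XP score is scaled from the 2000+ baseline by the ratio of model ratings (3400/2000) using the same 70% integer frequency scaling, and the Clawhammer gets the 30% integer IPC gain from earlier. All function names are hypothetical.

```python
BASE_MHZ = 1667.0          # XP 2000+ clock from the post
BASE_RATING = 2000.0       # XP 2000+ model rating
BASE_SPECINT = 700.0       # XP 2000+ SPECint_base2k from the post
INT_FREQ_SCALING = 0.70    # integer perf scaling with frequency, from the post
CLAW_INT_IPC_GAIN = 0.30   # Clawhammer integer IPC gain, from the post


def xp_specint_for_rating(rating):
    """Assumed XP score at a given model rating (rating ratio stands in for clock ratio)."""
    return BASE_SPECINT * (1.0 + INT_FREQ_SCALING * (rating / BASE_RATING - 1.0))


def claw_specint_at(mhz):
    """Estimated Clawhammer SPECint at a given clock."""
    freq_factor = 1.0 + INT_FREQ_SCALING * (mhz / BASE_MHZ - 1.0)
    return BASE_SPECINT * (1.0 + CLAW_INT_IPC_GAIN) * freq_factor


def claw_mhz_for_rating(rating):
    """Clock at which a Clawhammer matches the assumed XP score for that rating."""
    target = xp_specint_for_rating(rating)
    freq_factor = target / (BASE_SPECINT * (1.0 + CLAW_INT_IPC_GAIN))
    return BASE_MHZ * (1.0 + (freq_factor - 1.0) / INT_FREQ_SCALING)


def rating_for_claw_mhz(mhz):
    """Inverse mapping: model rating implied by a Clawhammer clock."""
    score = claw_specint_at(mhz)
    return BASE_RATING * (1.0 + (score / BASE_SPECINT - 1.0) / INT_FREQ_SCALING)


print(f"XP score at 3400 rating: {xp_specint_for_rating(3400):.0f}")
print(f"Claw clock for 3400+:    {claw_mhz_for_rating(3400):.0f} MHz")
print(f"Rating of 2250 MHz Claw: {rating_for_claw_mhz(2250):.0f}")
```

Under these assumptions the XP score comes out near 1050 and the 3400+ Clawhammer near 2 GHz, while a 2.25 GHz part lands somewhat under 4000, in the same ballpark as the figure in the post.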
Now remember, folks, that this is a 15 minute, back-of-an-envelope
calculation/estimate/WAG, and 5 minutes was taken to find the
envelope. ;-)
>>
