Originally posted by: Cerb
Take an already efficient chip, enter 5 years or so of real tweaks (per-clock performance gains from slot Athlons to Bartons have been entirely dependent on cache).
Cache that is faster by an order of magnitude.
Better register renaming.
Better branch predicting.
Swap a crippling 3.2GB/s FSB for a 3.2GB/s each way HT link, with RAM having its own bus to get to the CPU, no sharing.
Remove the latency of going to the chipset, then out to RAM. Speed up the RAM controller on top of that.
Then there are tweaks I haven't mentioned, and many which AMD won't say a word about (have you seen how many patents they filed? And that was just for x86-64...who knows how many "trade secrets" they have?).
All this without sacrificing a design with short pipelines, so it is great for branching tasks and office multitasking.
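To put the bus figures above in numbers, here is a rough back-of-the-envelope sketch. The clock rates and bus widths are my assumptions (a 400MT/s 64-bit Athlon XP FSB, a 16-bit HT link at 800MHz double-pumped, and single-channel DDR400), not figures stated in the post, and these are theoretical peaks, not measured throughput.

```python
# Peak theoretical bandwidth = transfers per second * bytes per transfer.
def peak_gb_per_s(mega_transfers_per_s, bus_width_bits):
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# Assumed figures, not taken from the post:
fsb = peak_gb_per_s(400, 64)           # Athlon XP FSB: 400 MT/s, 64 bits wide
ht_each_way = peak_gb_per_s(1600, 16)  # HT: 800 MHz double-pumped, 16 bits per direction
ddr400 = peak_gb_per_s(400, 64)        # single-channel DDR400 on the on-die controller

# Old layout: RAM traffic and I/O traffic all squeeze through the one FSB.
old_total = fsb

# New layout: RAM has its own bus, and I/O gets the HT link in both directions.
new_total = ddr400 + 2 * ht_each_way

print(f"Shared FSB:       {old_total:.1f} GB/s for RAM and I/O combined")
print(f"RAM bus + HT:     {new_total:.1f} GB/s in aggregate")
```

The point is not the exact totals but that RAM traffic no longer competes with I/O traffic for the same 3.2GB/s.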
There are a handful of synthetic math benches where the A64 is only 10% or so faster than an equivalently-clocked Athlon, so many things are similar...but in EVERY real benchmark (as in how real software performs doing real tasks), the A64 handily beats Athlons clocked significantly higher.
As far as boot times, that's mostly HDD limited. A SCSI HDD will get faster boot times.