Okay, I'm back....just to clarify, this article isn't necessarily bad, but there are a few major points that IMHO the author missed:
- I think he assumes too readily that AMD has already succeeded with x86-64 when he says things like "the same company who is now in control of IA-32, AMD". They may, now more than ever, have the potential to control the direction of x86, but it's still going to be a long, uphill battle that won't be resolved for years. And to assume that Intel will just stand by and watch is irresponsible.
- The author uses microarchitectural terms that he doesn't understand, and his discussion of 3DNow and SSE is rubbish.
So while Intel has to slap on yet another execution unit and more registers, AMD just figures what FPU pipes to use and registers to dedicate to it
This is no basis for any merit on AMD's part; the latter half of the sentence simply describes how any datapath functions. The Athlon has vector units just like the P3/P4, and they share issue ports with its FPUs. And just as with any MPU, changes in the decode control and datapath can support new instructions. And while AMD has said little about its SSE implementation in the Athlon, it is quite probable that it has a physical SSE register set, since SSE registers are 128 bits wide while the Athlon's renaming registers are 88 bits (IIRC). But whether the registers are physical or aliased is hardly a basis for merit on either side....he's just grasping at straws here.
Whenever Intel adds another 50+ opcodes for some fancy, schmancy multimedia niche, AMD just writes some microcode to leverage its FPU (or ALU in some case) to do it
Again, rubbish...first of all, all SIMD sets in the Athlon use the FP reservation station and issue ports, not the integer ones. Secondly, microcode is only used for the most esoteric x86/x87 instructions, those that decode to over 4 uops. I really doubt he understands the intricacies of microcode vs. hardwired control.
- There are a number of errors in his IA64 discussion:
And in an effort to completely eliminate the dreadful even of a processor stall caused by branch misprediction, it introduced the concept of branch "predication" where both branches are executed and the road not taken result is discarded when the branch is resolved
Predication is not meant to eliminate branch mispredictions; even Intel only claims that it cuts the number of mispredictions in half. While you can predicate just about any IA64 instruction, it is only used for short if-then blocks; predicating loops would use too many execution resources. In cases where predication is not used, dynamic and static branch prediction is used.
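To make the point about short if-then blocks concrete, here is a rough Python model of if-conversion. It is only an illustration, not IA64 code: the function names and the slots_used cost model are my own assumptions, and the loop over arms merely stands in for the hardware discarding an operation whose predicate is false.

```python
def abs_diff_branch(a, b):
    # Conventional control flow: only one arm executes, chosen by a branch
    # that the hardware has to predict.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    # If-converted form: the compare produces a predicate and its complement,
    # both arms are evaluated, and only the arm whose predicate is true commits.
    p_true = a > b
    p_false = not p_true
    arms = [(p_true, a - b), (p_false, b - a)]  # both subtractions are computed
    result = None
    for predicate, value in arms:
        if predicate:  # stands in for squashing a false-predicated operation
            result = value
    return result


def slots_used(then_ops, else_ops, predicated):
    # Hypothetical cost model: predication pays for both arms every time,
    # while a branch pays only for the arm taken (the longer arm at worst).
    return then_ops + else_ops if predicated else max(then_ops, else_ops)
```

With two-operation arms, slots_used(2, 2, True) is 4 against 2 for the branch, which is cheap next to a misprediction. With 40-operation arms it is 80 against 40, which is why long blocks and loops are left to the branch predictor.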
I mean, while an 800MHz, 0.18um Itanium toasts even a 2.4GHz, 0.13um Pentium 4 at floating point, even 3-year old, 600MHz 0.35um Alpha 264s _outperforms_ that same Itanium by an even wider margin!
On the contrary, the 800 MHz Itanium (Merced) has a SPECfp2k base score of 703, compared to the 2.4GHz P4's 803 and the 833 MHz Alpha 21264B's 643. I don't know where he gets the idea that the 600MHz 21264A beats the Itanium by a wide margin (in SPECint, perhaps, but that wasn't his discussion).
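For anyone who wants the margins spelled out, here is a quick Python sketch that normalizes those three scores to the Itanium. The scores are the ones quoted above; the dictionary and the relative_to helper are just illustrative names.

```python
# SPECfp2000 base scores exactly as quoted in this post.
specfp_base = {
    "Itanium 800 MHz (Merced)": 703,
    "Pentium 4 2.4 GHz": 803,
    "Alpha 21264B 833 MHz": 643,
}


def relative_to(scores, baseline):
    # Divide every score by the baseline chip's score.
    ref = scores[baseline]
    return {name: score / ref for name, score in scores.items()}


relative = relative_to(specfp_base, "Itanium 800 MHz (Merced)")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x")
# Itanium 800 MHz (Merced): 1.00x
# Pentium 4 2.4 GHz: 1.14x
# Alpha 21264B 833 MHz: 0.91x
```

So on these numbers the P4 leads the Itanium by about 14%, and the faster 833 MHz Alpha trails it by about 9%.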
Since IA-64 hasn't "caught on" yet and there is a very good chance that even 2nd gen "McKinley" won't either
I think he greatly underestimates McKinley. Not only is it going to alleviate Merced's performance problems (even in its IA32 implementation according to comp.arch discussions), but it already has far more system support than Merced enjoyed.
Personally, I think the concept of AMD's own VLIW ISA that will leave Intel in the dust is a lot like the talk from the Apple folks that we will all be using MacOS and Apple hardware within a few years. And speaking as a (as of this fall) grad student studying computer architecture, I think the author puts too much stock in VLIW as an ISA for general purpose MPUs. One of EPIC's goals (as of 10 years ago) was to simplify issuing and scheduling hardware, since an n-way out-of-order superscalar core requires scheduling hardware that is on the order of n^2 complexity. As of yet, the prediction that this would become a limiting factor hasn't come true; we are still doubling transistors every 18 months, and the trend towards larger caches eliminates any logic advantage VLIW may have. And the two IA64 implementations, Merced and McKinley, haven't shown any advantage yet in complexity and clock speed (not that it matters, IA64 implementations are still capable of excellent performance, as we will see with McKinley). But there is absolutely no reason, as of yet, for everyone else to jump ship and develop their own VLIW ISA and implementations.
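To give a feel for where the n^2 comes from, here is a deliberately simplified Python sketch. It counts only the comparators needed to cross-check source registers against the destinations of earlier instructions issued in the same cycle. Two sources per instruction is my assumption, and real schedulers have other quadratic terms (wakeup logic, bypass paths) that this ignores.

```python
def dependency_comparators(issue_width, sources_per_inst=2):
    # Each instruction's sources must be compared against the destination of
    # every earlier instruction in the same issue group: n*(n-1)/2 ordered
    # pairs, times the number of source operands per instruction.
    pairs = issue_width * (issue_width - 1) // 2
    return sources_per_inst * pairs


for width in (2, 4, 8, 16):
    print(width, dependency_comparators(width))
# 2 2
# 4 12
# 8 56
# 16 240
```

Doubling the issue width roughly quadruples the count, which is the growth EPIC wanted to push onto the compiler.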
Many of the current ideas for new microarchitectural paradigms for billion+ transistor MPUs are less dependent on design philosophy. First of all, advanced superscalar and superspeculative processors extend current out-of-order superscalar designs to up to 64-way issue with data value speculation, instruction trace caches, larger tournament-based branch predictors, and much, much larger reorder windows (2000 instructions). Many of the other ideas (chip-level multiprocessors, SMT processors, and trace and multiscalar processors) are designed to exploit thread-level-like parallelism that can be realized under any ISA or design philosophy, be it RISC/x86/IA64 or out-of-order superscalar or in-order VLIW. The only strictly vector/VLIW-like approach among the future paradigms is Berkeley's V-IRAM, which embeds relatively simple vector and CPU cores (around 7 million transistors, including caches) onto 96 MB of embedded memory (800 million transistors) with a high-bandwidth memory crossbar. The CPU is a 2-way in-order superscalar core, and the vector unit contains two 8-way 64-bit pipelines. I haven't heard favorable opinions about this concept, though...the analysis that I've read acknowledges that its integer performance would be worse than even in-order VLIW (IA64).
I see very, very little reason for AMD to set out on such a risky path towards a new VLIW ISA and design philosophy when x86-64 and out-of-order superscalar can suit them just fine. The only reason Intel is capable of doing VLIW with IA64 is that they have the time, money, and design resources to try to make such an unorthodox path work. There are a couple of students in the department here that have interned with AMD on the K9, perhaps I could ask them.
🙂
Doesn't x86-64 allow larger double-precision floating point numbers than x86-32?
Not that I'm aware of....x87 has supported 32-bit single-precision, 64-bit double-precision, and 80-bit extended precision ever since the 8087 (1980).
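For reference, here is how those three x87 formats break down into fields, as a small Python table. The field widths are the standard ones; the dictionary name and the total_bits helper are just for illustration.

```python
# (sign bits, exponent bits, significand bits stored in the encoding)
X87_FORMATS = {
    "single": (1, 8, 23),     # leading 1 of the significand is implicit
    "double": (1, 11, 52),    # leading 1 of the significand is implicit
    "extended": (1, 15, 64),  # integer bit is stored explicitly
}


def total_bits(name):
    # Sum the field widths to get the size of the encoding.
    return sum(X87_FORMATS[name])


for name in X87_FORMATS:
    print(name, total_bits(name))
# single 32
# double 64
# extended 80
```

None of these widths depends on the address or integer register width of the machine.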
McKinley will be released this month, Hammer is still unsure
This person wasn't talking about when it would be released; he/she was talking about:
Quote
--------------------------------------------------------------------------------
IMHO, if this happens, Intel will have its issues go exponential. Not only will they have a tough time proving to people that IA-64 is viable versus this new VLIW2, but their other strategy revolves around the, "now dying," x86-64 ISA. Since IA-64 hasn't "caught on" yet and there is a very good chance that even 2nd gen "McKinley" won't either (the consumer version isn't due until late 2003), the only chance Intel has is to go x86-64 "full bore" and keep people from moving off it. So we're back to Intel actually being the "x86 forever" guys!
--------------------------------------------------------------------------------
Well, I don't agree...I think he was confusing the release dates of McKinley and Sledgehammer (the latter of which isn't due, by AMD's roadmaps, until first half 2003). And what he calls the "consumer version" of IA64 is likely Deerfield, which is actually a lower-power variant of Madison that is meant for slim rack-mount servers. Intel still has no official plans to bring IA64 to the desktop in the near future. When he says that IA64 and McKinley have a strong chance of failing, while treating x86-64, and even some pipe-dream AMD VLIW ISA, as if they had already succeeded, I think his bias is showing through a bit too much. This course of events is certainly not impossible, but definitely very improbable IMHO.
...being stuck here in a hospital)
That sucks, I hope you're alright.
🙁