
How AMD and its partners are putting x86 back on the right track

Oh my, what in the world happened? For one thing, I'm sorry for all that posting, but I in NO WAY posted that many times. I hit post once and my PC started to freak out, and now I see a lot of the same post from me... not my fault, though something clearly went wrong somewhere!

Once again, to you people that post here and to the mods: sorry for all of the mess!
 
Well, it was so long ago I could prolly forget 😛. Don't take it all too serious, anyway. It was not AMD, but VESA. Just fine. Any difference? 😀

true, but then again, what about HyperTransport? PCI-X? DDR SDRAM? AMD and its partner companies were all instrumental in making these technologies into working products (ALi uses HyperTransport, and so does AMD for its Hammer series and AMD-8000 series chipsets, and if it weren't for the Athlon's DDR FSB and its ability to use all of the FSB bandwidth, DDR SDRAM would never have made it).

Sohcan

Um, this is just ridiculous and hypocritical. I can be very sure that the Athlon does not use microcode to execute SSE. Why criticize Intel for implementing SSE, then praise AMD for adopting it?

while I barely understand what microcode is (or for that matter, what they mean by an x86 instruction set), I do know that he had a very good explanation for saying what he/she did (at least about 3DNow!), which was:
Only once have they introduced instruction set extensions (3DNow!), and those were done to address the _shortcomings_ of a marketing-driven extension set from Intel (MMX).
which happens to be true. MMX did have SIMD abilities, but AFAIK they were only for integer operations. This was introduced around the same time the Voodoo 2 was popular, if I remember correctly, which means gaming/3D/decoding MPEG-2/MPEG-4 wasn't necessarily on everyone's mind... in fact, this was about the time MP3s became somewhat popular on the internet (though they were hard to find, as you had to use an FTP search engine or just strike it lucky finding a website with them).
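To make the "integer-only SIMD" point concrete, here is a rough sketch (in Python, purely illustrative, not real MMX code) of what MMX-style packed arithmetic does: one instruction operates on several narrow integer lanes packed into a 64-bit register, the way MMX's `paddw` adds four 16-bit values at once. 3DNow! extended the same idea to packed single-precision floats, which MMX could not do. The lane values below are made up for the example.

```python
def packed_add_16(a, b):
    """Add two 64-bit words as four independent 16-bit lanes (wraps per lane)."""
    result = 0
    for lane in range(4):
        shift = lane * 16
        lane_a = (a >> shift) & 0xFFFF
        lane_b = (b >> shift) & 0xFFFF
        result |= ((lane_a + lane_b) & 0xFFFF) << shift  # wrap within the lane
    return result

# Four lanes: [1, 2, 3, 4] + [10, 20, 30, 40] -> [11, 22, 33, 44]
a = (4 << 48) | (3 << 32) | (2 << 16) | 1
b = (40 << 48) | (30 << 32) | (20 << 16) | 10
print(hex(packed_add_16(a, b)))
```

The key property is that all four lanes are processed by a single operation, which is what made MMX useful for pixel and audio-sample math even without floating point.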

McKinley will be released this month, Hammer is still unsure

this person wasn't talking about when it would be released, he/she was talking about:
IMHO, if this happens, Intel will have its issues go exponential. Not only will they have a tough time proving to people that IA-64 is viable versus this new VLIW2, but their other strategy revolves around the, "now dying," x86-64 ISA. Since IA-64 hasn't "caught on" yet and there is a very good chance that even 2nd gen "McKinley" won't either (the consumer version isn't due until late 2003), the only chance Intel has is to go x86-64 "full bore" and keep people from moving off it. So we're back to Intel actually being the "x86 forever" guys!
that may or may not be accurate when it comes to the date (I don't know, I haven't looked at an Intel roadmap for ages, being stuck here in a hospital), but also remember how much trouble Intel had actually producing the Itanium: even if they 'released' it today, the likelihood of us actually SEEING it reminds me of those 0.18-micron P3 days, when all you could do was dream about the 1GHz P3 while the lucky AMD guys actually got theirs (grr).

LOL, whatever... x86-64 may be a (small) step in the right direction, but calling any extension of x86 "engineering bliss" is pretty laughable. x86-64 does nothing to address x87 floating point, or the decoding nightmare that x86 is with its numerous prefixes and addressing modes.

I agree, I don't see x86-64 helping us much, even though I have no idea what features they added other than it being 64-bit (which means more addressable memory, AFAIR). However, I do know that if it catches on where Intel's IA-64 so far has not (which surprised me, because I expected more response to the Itanium), that could spell trouble for Intel, like he/she says in this 'article.' It could be the first step in AMD moving towards the 'dream architecture' that Alpha fans everywhere would flock to. You have to notice that this is more of a connect-the-dots game this guy is playing: obviously he sees the advantages in Alpha's and Transmeta's technologies, and somehow found a way for his dreams to come true by adding a little company called AMD into the mix... after all, do you actually think either one of those two companies is going to be able to do something like this now that one of them is dead?? His idea is certainly interesting.
 
Interesting read, but I think he is grasping at straws...

A minor correction: I am almost positive that x86-64 is an openly licensed technology. Intel wouldn't have to license anything from AMD. They would just have to eat crow for being second if they ever do release Yamhill.
 
<<x86-64 does nothing to address x87 floating-point, as well as the decoding nightmare that x86 is with its numerous prefixes and addressing modes>>

Doesn't x86-64 allow larger double-precision floating point numbers than x86-32?
 
Okay, I'm back....just to clarify, this article isn't necessarily bad, but there are a few major points that IMHO the author missed:
- I think he assumes too much that AMD has already succeeded with x86-64 if he is saying stuff like "the same company who is now in control of IA-32, AMD". They may, now more than ever, have the potential to control the direction of x86, but it's still going to be a long, uphill battle that won't be resolved for years. And to assume that Intel will just stand by and watch is irresponsible.

- The author uses microarchitectural terms that he doesn't understand, and his discussion of 3DNow and SSE is rubbish.
So while Intel has to slap on yet another execution unit and more registers, AMD just figures what FPU pipes to use and registers to dedicate to it
This is not a basis for any merit on AMD's part; the latter half of the sentence describes how any datapath functions. The Athlon has vector units just like the P3/P4, which share issue ports with its FPUs. And just as is possible with any MPU, changes in the decoding control and datapath can support new instructions. And while AMD has said little about its SSE implementation in the Athlon, it is quite probable that it has a physical SSE register set, since SSE registers are 128-bit while the renaming registers in the Athlon are 88-bit (IIRC). But the existence of a physical register set or aliased registers is hardly any basis for merit on either side... he's just grasping at straws here.

Whenever Intel adds another 50+ opcodes for some fancy, schmancy multimedia niche, AMD just writes some microcode to leverage its FPU (or ALU in some case) to do it
Again, rubbish... first of all, all SIMD sets in the Athlon use the FP reservation station and issue ports, not those of the integer side. Secondly, microcode is only used for the most esoteric x86/x87 instructions, those that decode to over 4 uops. I really doubt he understands the intricacies of microcode vs. hardwired control.
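The hardwired-vs-microcode distinction above can be sketched with a toy decoder model (the instruction names and uop counts below are invented for illustration, not taken from any real datasheet): simple instructions go straight through hardwired decoders into a handful of uops, and only rare, complex instructions fall back to a sequenced microcode ROM.

```python
# Toy model of x86 decode: hardwired decoders handle the common cases;
# the microcode ROM is only a fallback for long, complex instructions.
DIRECT_DECODE = {           # simple instructions -> fixed small uop counts
    "add reg, reg": 1,
    "add reg, mem": 2,      # split into a load uop plus an add uop
    "addps xmm, xmm": 2,    # e.g. a 128-bit SSE op cracked into two halves
}
MICROCODE_ROM = {           # complex instructions -> sequenced uop routines
    "rep movsb": ["load", "store", "update-regs", "test-count", "branch"],
}

def decode(insn):
    """Return the uop sequence for an instruction, using the ROM as a fallback."""
    if insn in DIRECT_DECODE:
        return ["uop"] * DIRECT_DECODE[insn]
    if insn in MICROCODE_ROM:
        return list(MICROCODE_ROM[insn])
    raise ValueError("unknown instruction: " + insn)

print(len(decode("add reg, reg")))   # 1 uop, hardwired
print(len(decode("rep movsb")))      # 5 uops, from the microcode ROM
```

This is why "AMD just writes some microcode" misdescribes SSE on the Athlon: adding a new SIMD instruction that decodes to one or two uops is decoder and datapath work, not a microcode routine.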

- There are a number of errors in his IA64 discussion:
And in an effort to completely eliminate the dreadful event of a processor stall caused by branch misprediction, it introduced the concept of branch "predication", where both branches are executed and the road-not-taken result is discarded when the branch is resolved
Predication is not meant to eliminate branch mispredictions; even Intel states it cuts the number of mispredictions in half. While you can predicate just about any IA64 instruction, it is only used for short if-then blocks; predicating loops would use too many execution resources. In cases where predication is not used, dynamic and static branch prediction are used.
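For readers unfamiliar with the term, here is a minimal sketch of the if-conversion idea behind predication (in Python, as a behavioral model only; real IA-64 uses predicate registers attached to individual instructions): both sides of a short if-then-else are computed unconditionally, and a predicate selects which result commits, so no branch exists to mispredict.

```python
def abs_with_branch(x):
    """Ordinary control flow: a real branch that can be mispredicted."""
    if x < 0:
        return -x
    return x

def abs_predicated(x):
    """If-converted version: no branch, both sides execute."""
    p = x < 0            # the predicate (a predicate register, in IA-64 terms)
    neg = -x             # "then" side executes unconditionally...
    pos = x              # ...and so does the "else" side
    return neg if p else pos   # the predicate selects which result commits

print(abs_predicated(-5), abs_predicated(3))
```

The sketch also shows why predicating a whole loop body would be wasteful: every predicated-off instruction still occupies an execution slot, which is fine for a two-instruction if-then but ruinous at loop scale.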

I mean, while an 800MHz, 0.18um Itanium toasts even a 2.4GHz, 0.13um Pentium 4 at floating point, even 3-year old, 600MHz 0.35um Alpha 264s _outperforms_ that same Itanium by an even wider margin!
On the contrary, the 800 MHz Itanium (Merced) has a SPECfp2k base score of 703, compared to the 2.4GHz P4's 803 and the 833 MHz Alpha 21264B's 643. I don't know where he gets the idea that the 600MHz 21264A beats the Itanium by a wide margin (in SPECint, perhaps, but that wasn't his discussion).
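Plugging in the scores quoted above makes the relative standings explicit (simple ratios of the SPECfp2000 base scores as given in this post; nothing beyond that arithmetic is claimed):

```python
# SPECfp2000 base scores as quoted in the post above
itanium_800 = 703    # 800 MHz Itanium (Merced)
p4_2400    = 803     # 2.4 GHz Pentium 4
alpha_833  = 643     # 833 MHz Alpha 21264B

print(round(p4_2400 / itanium_800, 2))   # P4 leads the Itanium by ~14%
print(round(itanium_800 / alpha_833, 2)) # Itanium leads the 21264B by ~9%
```

So on these numbers the Itanium sits between the two, rather than trailing the Alpha by "an even wider margin."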

Since IA-64 hasn't "caught on" yet and there is a very good chance that even 2nd gen "McKinley" won't either
I think he greatly underestimates McKinley. Not only is it going to alleviate Merced's performance problems (even in its IA32 implementation according to comp.arch discussions), but it already has far more system support than Merced enjoyed.

Personally, I think the concept of AMD's own VLIW ISA that will leave Intel in the dust is a lot like the talk from the Apple folks that we will all be using MacOS and Apple hardware within a few years. And speaking as a (as of this fall) grad student studying computer architecture, I think the author puts too much stock in VLIW as an ISA for general-purpose MPUs. One of EPIC's goals (as of 10 years ago) was to simplify issuing and scheduling hardware, since an n-way out-of-order superscalar core requires scheduling hardware that is on the order of n^2 complexity. As of yet, this prediction hasn't come true; we are still doubling transistors every 18 months, and the trend towards larger caches eliminates any logic advantage VLIW may have. And the two IA64 implementations, Merced and McKinley, haven't shown any advantage yet in complexity or clock speed (not that it matters; IA64 implementations are still capable of excellent performance, as we will see with McKinley). But there is absolutely no reason, as of yet, for everyone else to jump ship and develop their own VLIW ISAs and implementations.
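The n^2 scheduling-complexity claim can be illustrated with a crude back-of-envelope model (the formula and numbers below are a simplification for illustration, ignoring real-world optimizations like segmented windows): each result tag broadcast per cycle must be compared against every waiting source operand in the issue window, so if the window grows in proportion to the issue width, comparator count grows quadratically.

```python
def wakeup_comparators(issue_width, window_entries, src_ops=2):
    """Tag comparators needed if every result broadcast each cycle is
    checked against every waiting source operand (a crude model)."""
    return issue_width * window_entries * src_ops

# Scale the window with the issue width and the cost quadruples per doubling:
for n in (2, 4, 8):
    print(n, wakeup_comparators(n, window_entries=16 * n))
```

This quadratic growth in wakeup/select logic is exactly what EPIC hoped to avoid by moving scheduling to the compiler; the post's point is that transistor budgets grew fast enough that it never became the binding constraint.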

Many of the current ideas for new microarchitectural paradigms for billion-plus-transistor MPUs are less design-philosophy dependent. First of all, advanced superscalar and superspeculative processors extend current out-of-order superscalar to up to 64-way issue with data value speculation, instruction trace caches, larger tournament-based branch predictors, and much, much larger reorder windows (2000 instructions). Many of the other ideas (chip-level multiprocessors, SMT processors, and trace and multiscalar processors) are designed to exploit thread-level-like parallelism that can be realized on any ISA design philosophy, be it RISC/x86/IA64 or out-of-order superscalar or in-order VLIW. The only strictly vector/VLIW-like approach among future paradigms is that of Berkeley's V-IRAM, which embeds relatively simple vector and CPU cores (around 7 million transistors, including caches) onto 96 MB of embedded memory (800 million transistors) with a high-bandwidth memory crossbar. The CPU is a 2-way in-order superscalar core, and the vector unit contains two 8-way 64-bit pipelines. I haven't heard favorable opinions about this concept, though... analysis that I've read has acknowledged that integer performance would be worse than even in-order VLIW (IA64).

I see very, very little reason for AMD to embark on such a risky path towards a new VLIW ISA and design philosophy when x86-64 and out-of-order superscalar can suit them just fine. The only reason Intel is capable of doing VLIW with IA64 is because they have the time, money, and design resources to try to make such an unorthodox path work. There are a couple of students in the department here that have interned with AMD on the K9; perhaps I could ask them. 🙂

Doesn't x86-64 allow larger double-precision floating point numbers than x86-32?
Not that I'm aware of... x87 has already supported (since the late 70s) 32-bit single precision, 64-bit double precision, and 80-bit extended precision.
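For the curious, the first two of those x87 formats are the standard IEEE encodings, and Python's `struct` module can demonstrate them directly (the 80-bit extended format has no portable Python type, so it is only described in a comment):

```python
import struct

# x87 has supported three formats since the 8087:
#   single:   1 sign + 8 exponent + 23 fraction bits  = 32 bits
#   double:   1 sign + 11 exponent + 52 fraction bits = 64 bits
#   extended: 1 sign + 15 exponent + 64 significand bits = 80 bits
print(struct.calcsize('f'))   # 4 bytes: single precision
print(struct.calcsize('d'))   # 8 bytes: double precision

# 0.1 is not exactly representable in binary; double just carries more of it.
single_round_trip = struct.unpack('f', struct.pack('f', 0.1))[0]
print(single_round_trip == 0.1)   # False: single precision loses bits
```

So x86-64 didn't need to add a wider floating-point type; the width options were already there, and its floating-point change was about registers and instruction sets, not precision.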


McKinley will be released this month, Hammer is still unsure

this person wasn't talking about when it would be released, he/she was talking about:

<<IMHO, if this happens, Intel will have its issues go exponential. Not only will they have a tough time proving to people that IA-64 is viable versus this new VLIW2, but their other strategy revolves around the, "now dying," x86-64 ISA. Since IA-64 hasn't "caught on" yet and there is a very good chance that even 2nd gen "McKinley" won't either (the consumer version isn't due until late 2003), the only chance Intel has is to go x86-64 "full bore" and keep people from moving off it. So we're back to Intel actually being the "x86 forever" guys!>>
Well, I don't agree... I think he was confusing the release dates of McKinley and Sledgehammer (the latter of which isn't due, by AMD's roadmaps, until the first half of 2003). And what he calls the "consumer version" of IA64 is likely Deerfield, which is actually a lower-power variant of Madison meant for slim rack-mount servers. Intel still has no official plans to bring IA64 to the desktop in the near future. When he says that IA64 and McKinley have a strong chance of failing, whereas x86-64 has already succeeded, as has some pipe-dreamed AMD VLIW ISA, I think his bias is showing through a bit too much. This course is certainly not impossible, but definitely very improbable IMHO.


...being stuck here in a hospital)
That sucks, I hope you're alright. 🙁
 
Although I am only able to comprehend maybe 50~60% of the article and the rebuttal discussions I can say this is an excellent read (and re-read).

Keep the discussion going!
 
((I mean, while an 800MHz, 0.18um Itanium toasts even a 2.4GHz, 0.13um Pentium 4 at floating point, even 3-year old, 600MHz 0.35um Alpha 264s _outperforms_ that same Itanium by an even wider margin!))

<<On the contrary, the 800 MHz Itanium (Merced) has a SPECfp2k base score of 703, compared to the 2.4GHz P4's 803 and the 833 MHz Alpha 21264B's 643. I don't know where he gets the idea that the 600MHz 21264A beats the Itanium by a wide margin (in SPECint, perhaps, but that wasn't his discussion).>>

Maybe he meant SPECfp95? It seems like that benchmark didn't support modern SIMD optimizations.

 
 
Maybe he meant SPECfp95? It seems like that benchmark didn't support modern SIMD optimizations.

SPECfp95 and SPECfp2000 are suites of C, C++, and Fortran source programs that are COMPILED. It's up to the compiler whether it should use SIMD instructions or not.
 
Originally posted by: MadRat
((I mean, while an 800MHz, 0.18um Itanium toasts even a 2.4GHz, 0.13um Pentium 4 at floating point, even 3-year old, 600MHz 0.35um Alpha 264s _outperforms_ that same Itanium by an even wider margin!))

<<On the contrary, the 800 MHz Itanium (Merced) has a SPECfp2k base score of 703, compared to the 2.4GHz P4's 803 and the 833 MHz Alpha 21264B's 643. I don't know where he gets the idea that the 600MHz 21264A beats the Itanium by a wide margin (in SPECint, perhaps, but that wasn't his discussion).>>

Maybe he meant SPECfp95? It seems like that benchmark didn't support modern SIMD optimizations.
he obviously thinks lower is better.

 