Originally posted by: MadRat
CTho9305-
You probably meant to say the 180nm P!!!'s topped out at 1.2GHz, not the 130nm P!!!'s. Also they did release the 130nm P!!! up to 1.4GHz on a 133fsb, and the 130nm Celeron went up to 1.4GHz on 100fsb. Banias, on the other hand, runs on a 400fsb (100MHz-QDR) just like the original P4 family. To be perfectly honest, though, the P!!! never really was given much consideration for Intel's upper end once they brought out the 130nm version because it was so starved for front-side bus and raising it above 1.4GHz made little sense. There are plenty of people that were able to push 130nm P!!!'s to over 1.6GHz on air, so it's not like it couldn't have scaled more in raw MHz.
If I remember correctly, my impression at the time was that Intel didn't want to raise P6 core speeds too fast because otherwise it would cannibalize Pentium 4 sales. Also, Pentium 4 was a year or two late out the door thanks to a last-minute decision to shove in HyperThreading. If it had been released on schedule, I think Intel would've been the first to 1 GHz and hyperpipelining would never have become such a public issue. I mean, when Alpha took the speed crown with their 500MHz chip, I don't remember much hoopla over the deep (for the time) pipeline.
As for Banias, its bus interface is the Pentium 4's design, last I heard. It actually would be pin compatible if it weren't for Intel changing one little pin on the Banias chip (you have no idea how sore I am over that).
Oh, yeah, and by the time Pentium 4 came out, I think the P6 core was almost five years old. That's pretty damn old.
And if Banias is to be considered the top of the P6 line, then it's not a 1.6GHz cap but rather a 1.7GHz cap. Banias helped to uncork the P!!! legacy by mating it to the P4's front-side bus. It makes me wonder how much headroom it would realize on either the 533fsb (133MHz-QDR) or the 800fsb (200MHz-QDR) of the Northwood, considering how AMD was only able to keep the XP line alive above 2GHz by scaling up to a 200MHz-DDR fsb and strapping on twice the L2 cache. Last time I checked the Banias L2 was 1MB, twice what we find in Mr. Athlon Barton. Banias on the higher fsb could probably easily compete with the Barton.
I can't help but recall reading of the Banias core being designed from scratch. It borrowed lots of features from the P6 and Pentium 4 cores and also mixed in a few new ideas. Also, scaling a chip has little to do with the FSB. You could technically scale a CPU to 3 GHz on a 33MHz FSB. Obviously, it's a bad idea, but I'm just pointing out that the ability to increase core clock is not dependent on FSB. Well, unless you run out of multipliers, or something.
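To put rough numbers on that point, here is a minimal Python sketch of the clock arithmetic (core clock = base bus clock x multiplier). The function names are made up for illustration, and the clocks are nominal round figures, not measured values.

```python
def core_clock_mhz(bus_clock_mhz: float, multiplier: float) -> float:
    """Core clock is the base (not effective) bus clock times the multiplier."""
    return bus_clock_mhz * multiplier


def required_multiplier(target_core_mhz: float, bus_clock_mhz: float) -> float:
    """Multiplier needed to reach a target core clock on a given base bus clock."""
    return target_core_mhz / bus_clock_mhz


# A 1.7GHz Banias on a 100MHz base clock (the "400fsb" is 100MHz quad-pumped).
print(core_clock_mhz(100, 17))

# The hypothetical 3GHz chip on a 33MHz bus would need a multiplier of about 91.
print(round(required_multiplier(3000, 33), 1))
```

Nothing in the arithmetic stops the second case from working. It just needs a multiplier far beyond what shipping chips offered, which is the "run out of multipliers" caveat.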
I'm not sure what your point was about SMT. The main reason Intel uses it in the P4 is because it needs to fill idle pipelines to fight its otherwise high bandwidth, high latency memory design. AMD on the other hand keeps the need for SMT at a minimum by keeping the pipeline short and the memory latency to a minimum. SMT and other TLP designs benefit from high memory bandwidth (not necessarily low latency) and prefetching.
Unfortunately, I keep seeing a different picture than you. Everywhere I look, the trend is towards longer pipelines, multiple threads, and VLIW designs. Not a single paper I've read that has been published in the last ten years has concluded that shorter pipelines are required for future designs. In fact, most papers tend towards decreasing complexity, longer pipelines, and multithreading. Some papers are suggesting anywhere from 40-60 pipeline stages depending on the complexity of the architecture. However, the biggest problem with deep pipelines is ILP followed by wire delay. The ILP is such a vexing problem due to x86, which is why Intel chose to try ditching it. In fact, they've been trying since before 1994, the year Intel and HP announced they were already working on Itanium.
Oh, and btw, pipelines have always been "high latency" designs. They trade latency for throughput and it's especially true these days due to increasing latency everywhere in the system. SMT can benefit from low latency. It would be like comparing a wide data bus on a slow clock with a narrow bus on a high clock.
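A toy model of that latency-for-throughput trade, as a rough Python sketch. It assumes a fixed amount of logic delay split evenly across the stages plus a fixed latch overhead per stage. The 10 ns and 0.1 ns figures are invented for illustration and don't describe any real chip.

```python
def pipeline_model(logic_delay_ns: float, stages: int, latch_overhead_ns: float):
    """Return (cycle time, single-instruction latency, peak throughput).

    Assumes the logic divides evenly across stages and every stage pays
    the same latch overhead. Stalls and branch mispredicts are ignored.
    """
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    latency_ns = cycle_ns * stages        # time for one instruction to pass through
    throughput_per_ns = 1.0 / cycle_ns    # best case: one instruction per cycle
    return cycle_ns, latency_ns, throughput_per_ns


for depth in (10, 20, 40):
    cycle, latency, throughput = pipeline_model(10.0, depth, 0.1)
    print(f"{depth:2d} stages: cycle {cycle:.2f} ns, "
          f"latency {latency:.1f} ns, peak {throughput:.2f} instr/ns")
```

In this model every extra stage adds latch overhead to the total latency while shortening the cycle, so peak throughput goes up and single-instruction latency gets worse.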
I would venture to say AMD doesn't use SMT on the Athlon because they don't have the resources to implement it. Unfortunately, I don't know much about the memory bus for Athlon or Pentium 4. They both run at the same clock speed so I assume the memory latency is pretty close. Unless, of course, one or the other decided to deeply pipeline the memory bus.
As a design decreases in pipeline complexity, load latency becomes more important than raw memory bandwidth, which is why AMD chose a very fast front-side bus.
The Pentium 4 doesn't "need" HyperThreading in the sense that it's useless otherwise. HyperThreading was thrown in smack dab in the middle of the design cycle because tests showed that on average, about 50% of the Pentium 4's execution engines were in use at any given time. 50% meant you could almost fit another thread in there. It's not a "trick" to keep Pentium 4 on par with Athlon much the same as Athlon's 200MHz FSB isn't a "trick" to keep it competitive with Pentium 4. 200MHz EV6 bus was on the charts long before Pentium 4 came out simply because high FSB becomes a necessity when the core speed scales too high.
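For a sense of scale, here is a small Python sketch of peak front-side bus bandwidth. It assumes a 64-bit data bus for every case and uses nominal base clocks, so treat the output as ballpark peak figures and not as measured throughput. The function name is made up for this example.

```python
def peak_bandwidth_gb_s(base_clock_mhz: float, transfers_per_clock: int,
                        bus_width_bits: int = 64) -> float:
    """Peak bandwidth = base clock x transfers per clock x bytes per transfer.

    The 64-bit default width is an assumption made for this sketch.
    """
    transfers_per_second = base_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * (bus_width_bits / 8) / 1e9


# (label, nominal base clock in MHz, transfers per clock)
buses = [
    ("P!!! 133fsb (single data rate)", 133, 1),
    ("P4/Banias 400fsb (100MHz-QDR)", 100, 4),
    ("Athlon XP 200MHz-DDR", 200, 2),
    ("Northwood 800fsb (200MHz-QDR)", 200, 4),
]

for label, clock, pumps in buses:
    print(f"{label}: {peak_bandwidth_gb_s(clock, pumps):.2f} GB/s peak")
```

Under these assumptions a 100MHz quad-pumped bus and a 200MHz double-pumped bus land on the same peak figure, since both do 400 million transfers per second.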
The future for Intel is long pipelines reinforced by SMT tricks, multiple cores running multiple threads in parallel, high memory bandwidth, small L1 caches since a cache miss isn't necessarily doom and gloom, and large L2/L3 caches for efficient prefetching. The future for AMD would seem to be short pipelines, large L1 caches, SMP on a chip, hypertransport links between CPUs, NUMA memory architecture, and low latency front-side buses. They both have the same endgame and actually somewhat aim at differing markets, but in the end the performance is nearly identical across the board.
I am willing to bet that the future of both companies is "long pipelines reinforced by SMT tricks." Single thread performance has pretty much hit a brick wall due to ISA constraints and the nature of programs in general. The easiest way to increase performance is multi-threading.
Multiple cores are a byproduct of shrinking transistors. When your logic consumes on the order of 15% of your die space, it suddenly becomes feasible to shove another CPU on die. I remember both Intel and AMD saying that CMP could become feasible with 90nm, and more likely than not with 65nm.
Smaller L1 caches are a byproduct of lower latency designs. It may become the trend if clock speeds ramp high enough, or we may see several small caches feeding several cores. L2 caches may become smaller as well to keep up with clock speed, in which case L3 would be required to avoid capacity misses.
The future of AMD is longer pipelines, as evidenced by the K8 core. Granted, it's only 2 more stages, but what can you expect from a cash-strapped company? At least the rest of the core isn't simply copied over from the K7.
The larger size of AMD's chips comes from the shorter pipelines. Shorter pipes don't scale as well in raw clock speed, so it's kind of pointless to make a really low latency cache when the larger capacity makes up for the small loss in latency. If AMD does come out with a hyperpipelined CPU, I don't think it would have large L1 caches as well. The P6 core only had small L1 caches because it was designed at a time when 16k L1 wasn't really all that small. As to why the Pentium III revision didn't have larger caches, I can only speculate.
Hypertransport is definitely AMD's CPU interface for years to come. Heck, they developed it in the first place and brought memory controllers on-die for that reason. Byproduct of on-die memory controller: low-latency CPU to northbridge connection. Unfortunately, main memory (read: DRAM) is still high latency so thanks to Amdahl's law, that idea is gonna hit a brick wall unless memory technology improves. However, Intel would also benefit.
In the same vein, NUMA memory architecture is likely something AMD would want done, but isn't something the company would kill for. Opteron systems have localized memory banks for sure, but I don't know enough about the latency between banks to wonder if NUMA would bring any benefit.
Intel and AMD have their own roadmaps, but there are similarities. The main differences come from Intel's massive financial and manufacturing advantage and possibly some differences in opinion. If both companies were equally wealthy and capable, I don't think their roadmaps would be too different. After all, the K5 was technically superior to the Pentium, yet shared many of the same features.
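To show what that brick wall looks like, here is a quick Python sketch of Amdahl's law. The 30% share of memory access time spent on the CPU-to-controller hop is a number I made up for illustration. I don't have real figures for either platform.

```python
def amdahl_speedup(fraction_improved: float, improvement_factor: float) -> float:
    """Overall speedup when only part of the total time gets faster."""
    return 1.0 / ((1.0 - fraction_improved)
                  + fraction_improved / improvement_factor)


# Hypothetical: 30% of memory access time is the hop to the memory controller,
# and the remaining 70% is the DRAM itself, which an on-die controller can't fix.
hop_fraction = 0.30

for factor in (2, 10, 1000):
    speedup = amdahl_speedup(hop_fraction, factor)
    print(f"hop made {factor}x faster -> memory access {speedup:.2f}x faster")
```

However fast the hop gets, the overall gain in this sketch can never exceed 1 / (1 - 0.30), or roughly 1.43x, because the DRAM portion doesn't shrink.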