Thoughts and speculations about Prescott design decisions


Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: CTho9305
You have to remember that older processors were designed to work with main memory that was "almost" as fast as the processors themselves, so other improvements (superscalar stuff, all that fancy technology) aside, you still wouldn't get as much performance out of them with memory as relatively slow as it is now.
Even if we were to run memory at 100 MHz and the CPU at a reasonable 3x faster at 300 MHz (multipliers went to at least 3 for Pentium), a Pentium II 300 with 66 MHz SDRAM could probably beat it. If we ran the memory at 200 MHz and the core at 600 MHz, my bet would be on the Pentium III 600 even with memory at 133 MHz. My point was that at the time of the P6 core introduction, in-order execution was reaching its limit. It took OOOE to stay on track with Moore's Law.
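
As a rough sketch of the arithmetic behind that comparison, the snippet below works out the core-to-memory clock ratio for each setup, using only the clock figures mentioned above. The "in-order" entries are the hypothetical overclocked Pentiums, the names `CONFIGS` and `core_to_memory_ratio` are made up for illustration, and the ratio by itself says nothing about measured performance.

```python
# Clock figures (MHz) are the ones from the post; the in-order parts are hypothetical.
CONFIGS = [
    ("hypothetical in-order Pentium 300", 300, 100),
    ("Pentium II 300", 300, 66),
    ("hypothetical in-order Pentium 600", 600, 200),
    ("Pentium III 600", 600, 133),
]


def core_to_memory_ratio(core_mhz, memory_mhz):
    """Return how many core clocks elapse per memory clock."""
    return core_mhz / memory_mhz


for name, core_mhz, memory_mhz in CONFIGS:
    ratio = core_to_memory_ratio(core_mhz, memory_mhz)
    print(f"{name}: core/memory clock ratio = {ratio:.2f}")
```

The P6 parts end up with the larger ratio (relatively slower memory), which is why the bet on them rests on OOOE rather than on the memory speed.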

It seems you're right. So what were the ~133MHz CPUs I put in 486 boards that were rated as a P90?
I don't know for sure. However, I do recall some tidbit about a company producing a RISC cpu and running x86 emulation. Wouldn't bet anything on it, though.
 

SuperTool

Lifer
Jan 25, 2000
14,000
2
0
http://users.pandora.be/NielsBockx/PW1%20CPU's.htm

13. Pentium 4 (> 2 GHz): (The Northwood Core)

- Die Size of the processor: 146 mm²

- Number of transistors: 55 000 000

- Initial clock speed / maximal clock speed: 2 GHz / 3,2 GHz

- L1 cache: 20 KB

- L2 cache: 512 KB

15. Centrino:

- Die Size of the processor: 84 mm²

- Number of transistors: 77 000 000

- Initial clock speed / maximal clock speed: 1,6 GHz / 1,7 GHz

- L1 cache: 64 KB

- L2 cache: 1 MB

That's what I mean. If you look at Centrino, about 1/2 of the die is L2$, so the core size minus L2$ is around 45 mm².
If you look at the P4 core, around 1/6th is L2$, so the core size minus L2$ is about 120 mm². That means you can fit about 2 Centrino cores and more L2$ than the P4 on the same die area that holds just one P4 core and less L2$. Which one do you think would have better overall performance, a single-CPU P4 or a 2-CPU CMT Centrino, provided that single-processor Centrino performance is very close to that of the P4?
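
A minimal sketch of that area estimate, assuming the die sizes from the list above and my own eyeballed L2$ fractions (1/6 and 1/2). The names `DIES` and `core_area_mm2` are just for illustration, and this ignores pads, I/O, and any glue logic a dual-core part would need.

```python
# Die sizes come from the quoted list; the L2 fractions are rough estimates.
DIES = {
    "P4 Northwood": {"die_mm2": 146.0, "l2_fraction": 1 / 6},
    "Centrino (Banias)": {"die_mm2": 84.0, "l2_fraction": 1 / 2},
}


def core_area_mm2(die_mm2, l2_fraction):
    """Die area left once the estimated L2 share is subtracted."""
    return die_mm2 * (1.0 - l2_fraction)


p4 = DIES["P4 Northwood"]
banias = DIES["Centrino (Banias)"]

p4_core = core_area_mm2(p4["die_mm2"], p4["l2_fraction"])
banias_core = core_area_mm2(banias["die_mm2"], banias["l2_fraction"])

# Area budget: two Banias cores placed on a die the size of Northwood.
p4_l2_area = p4["die_mm2"] * p4["l2_fraction"]
left_for_l2 = p4["die_mm2"] - 2 * banias_core

print(f"P4 core without L2: {p4_core:.0f} mm2")
print(f"Banias core without L2: {banias_core:.0f} mm2")
print(f"Area left for L2 after two Banias cores: {left_for_l2:.0f} mm2")
print(f"Area the P4 spends on L2: {p4_l2_area:.0f} mm2")
```

With these estimates the two Banias cores leave more area for L2$ than the P4 spends on its own L2$, which is the tradeoff I'm pointing at.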
 

MadRat

Lifer
Oct 14, 1999
11,999
307
126
AMD's 486-class processors were its copies of Intel's 486SX and the 5x86 family. The 5x86 family went all the way up to 160MHz officially on a 40MHz FSB, but people were pushing a 50MHz FSB through them to run 200MHz. The problem with the 5x86 (*I googled*) was that it lacked a Pentium-compatible (80-bit) FPU, some Pentium-specific instructions (RDTSC, etc.), 4MB paging, OOOE, and a superscalar design. The 5x86 family was AMD's own design and unrelated to NexGen's Nx586 design, which was more 386 than 486 or Pentium, although it was rated against Pentiums... You probably remember NexGen's Nx686, which became the K6 in Socket 7. The K5 was an AMD design that was late to market and did not scale well. AMD bought NexGen because they had a mature Nx686 design almost ready for launch. The problem with the Nx processors was that each revision needed its own unique socket, which was too costly to maintain. AMD rehashed the Nx686, added Pentium-specific instructions (if you had a Cyrix/IBM 6x86 then you know how critical those missing Pentium-specific instructions were!) and launched the K6-233 to market.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Sahakiel
Originally posted by: CTho9305
You have to remember that older processors were designed to work with main memory that was "almost" as fast as the processors themselves, so other improvements (superscalar stuff, all that fancy technology) aside, you still wouldn't get as much performance out of them with memory as relatively slow as it is now.
Even if we were to run memory at 100 MHz and the CPU at a reasonable 3x faster at 300 MHz (multipliers went to at least 3 for Pentium), a Pentium II 300 with 66 MHz SDRAM could probably beat it. If we ran the memory at 200 MHz and the core at 600 MHz, my bet would be on the Pentium III 600 even with memory at 133 MHz. My point was that at the time of the P6 core introduction, in-order execution was reaching its limit. It took OOOE to stay on track with Moore's Law.

The P6 core was the first major x86 core to use internal RISC-like ops.
 

Eskimo

Member
Jun 18, 2000
134
0
0
Originally posted by: CTho9305


It is generally accepted that Intel has better fabs than AMD...

Only by those who either work for Intel or receive their information on the subject from Intel. If you look at the results from Sematech's survey you'll see that AMD is very competitive and leads the industry in quite a few indices of manufacturing excellence. Most people equate bigger = better and/or receive what little knowledge they have on the subject from Intel itself.
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: CTho9305
The P6 core was the first major x86 core to use internal RISC-like ops.

P6 was the first core from Intel to use micro-ops and the first to have OOOE. AMD's K5 had OOOE and generally ran faster clock for clock than Pentium at the 100-200MHz range. Micro-ops allow faster clock speeds whereas OOOE allows higher issue rate.
If you were to graph Intel's processors on a performance vs clock speed graph, you would find a slight dip below the line defined by Moore's law starting at the 180-200MHz range for Intel's Pentium line. The Pentium Pro/Pentium II at the 200-300 MHz performed slightly above the line extrapolated from the Pentium family; enough to re-align Intel's processor performance with Moore's Law. There are several smaller dips here and there throughout the entire history from 8086 to Pentium 4, but if you ignore raw clock speed, some of those dips correspond with the limits of certain technologies.
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: SuperTool

That's what I mean. If you look at Centrino, about 1/2 of the die is L2$, so the core size minus L2$ is around 45 mm².
If you look at the P4 core, around 1/6th is L2$, so the core size minus L2$ is about 120 mm². That means you can fit about 2 Centrino cores and more L2$ than the P4 on the same die area that holds just one P4 core and less L2$. Which one do you think would have better overall performance, a single-CPU P4 or a 2-CPU CMT Centrino, provided that single-processor Centrino performance is very close to that of the P4?

First off, I don't think Banias has multiprocessing support.
Second, I don't think Banias has HyperThreading.
Third, I don't think Banias has a trace cache.
Fourth, I don't think Banias supports PAE

I can't seem to find information on Banias' pipeline depth, though estimates seem to put it between 10-20. My guess is it's closer to 20 than 10, but including x86 decoding.
I also don't know how many instructions in flight Banias can support.

What's my point? Simply that both cores are different from each other and it's difficult to tell whether you could even get two Banias cores to work together. You might as well compare Pentium 4 to Athlon and see if you can go dual-core Athlon with shared L2 cache on the same die size.
 

SuperTool

Lifer
Jan 25, 2000
14,000
2
0
Originally posted by: Sahakiel
Originally posted by: SuperTool

That's what I mean. If you look at Centrino, about 1/2 of the die is L2$, so the core size minus L2$ is around 45 mm².
If you look at the P4 core, around 1/6th is L2$, so the core size minus L2$ is about 120 mm². That means you can fit about 2 Centrino cores and more L2$ than the P4 on the same die area that holds just one P4 core and less L2$. Which one do you think would have better overall performance, a single-CPU P4 or a 2-CPU CMT Centrino, provided that single-processor Centrino performance is very close to that of the P4?

First off, I don't think Banias has multiprocessing support.
Second, I don't think Banias has HyperThreading.
Third, I don't think Banias has a trace cache.
Fourth, I don't think Banias supports PAE

I can't seem to find information on Banias' pipeline depth, though estimates seem to put it between 10-20. My guess is it's closer to 20 than 10, but including x86 decoding.
I also don't know how many instructions in flight Banias can support.

What's my point? Simply that both cores are different from each other and it's difficult to tell whether you could even get two Banias cores to work together. You might as well compare Pentium 4 to Athlon and see if you can go dual-core Athlon with shared L2 cache on the same die size.

Would you rather have SMT or CMT? I personally would rather have two dedicated cores than one thread picking up another one's leftovers. I am guessing that if you can make an SMP P3 Xeon, you can make two Banias cores work together on a chip with less effort than it takes to make a 31-stage-pipeline P4 work. Obviously the cores are different, which is why we are talking about the tradeoffs made in each core. I like the tradeoffs made in Banias better than those made in Northwood and Prescott. The P4 is really climbing up the area and power tradeoff curve and fighting against diminishing returns. Banias is in the sweet spot. If it's within striking distance as is, either adding an extra core or doubling the L2$ will still leave Banias smaller than the P4 core, and will provide much improved performance in real-world applications.
 

MadRat

Lifer
Oct 14, 1999
11,999
307
126
PAE has been around since the PPro. Banias surely supports it since it's more or less a Pentium-family standard.

I'm not so sure we'd want just CMT or SMT in a dual-core Banias setup. The mere presence of CMT or SMT really isn't much benefit unless the compilers are optimized for it, since it ends up boiling down to doing work on one logical processor at a time. Simply having four logical processors doesn't mean it can do four separate processes at a time, nor four threads from any one process. Perhaps it would benefit both AMD and Intel to design a common CMT/SMT framework with a future scaling of up to thirty-two or more logical processors.

The cool thing about Windows Longhorn should be its ability to scale logical processor support much like IBM does on their midrange servers, but only using XML-based interprogram communications: each program should be able to snag control of a logical processor and digest its own XML scripts independent of the others. If I understand correctly, XML transfers should operate a lot like packet traffic on a LAN, with each program not really caring what the others are doing, which is the total opposite of API-based programming. Not only that, but your system shouldn't need to rely on logical processors within its own system, allowing multiple machines to be tied together at a more intimate level.

When it comes down to it, Intel should more or less make SMT a standard across their Pentium family now. I'd be interested to see how the trace cache concept works in a Banias core, too, since it's designed around SMT support and ultra-low latencies, a good match for either SMT or short pipeline designs. I'm guessing that Intel's SMT is pretty dependent on their trace cache and that a conventional L1 wouldn't allow SMT to be much benefit.
 

rimshaker

Senior member
Dec 7, 2001
722
0
0
The 90nm process is not the main problem.

It's more that new materials and processes are being integrated with the 90nm node:
1) low k dielectrics (for interconnects)
2) Strained silicon

Obviously not all the kinks have been discovered or worked out yet.
 

Wolfdog

Member
Aug 25, 2001
187
0
0
I think that the full version of Prescott probably will show actual improvements across the board. There has been speculation that they have already included quite a few unannounced improvements. I think that they had to turn them off in the short run, just like hyperthreading in the Willamette core. With the power issues they have had with their new core, it just couldn't be done before they launched the product. What is surprising is that they didn't include any of their "next generation" building blocks. They have the L1 cache technology to keep it single cycle, but they didn't use it. When it comes down to it, though, the P3 Tualatin was Intel's last good all-around processor. It ran very cool and could easily have been brought back to life. When you have a 1.4GHz P3 processor that performs as well as or better than a 1.8GHz P4, they should have taken a hint. I am looking forward to their launch of a desktop Pentium M product. It has the best of both worlds: low power consumption and high performance.