Question Need some help with this CPU chart

Hulk

Diamond Member
Oct 9, 1999
4,191
1,975
136
I have been trying to extend a CPU chart from AnandTech to include CPUs all the way back to the 8086, but I'm not knowledgeable enough to complete it. Can you help me finish/correct it?

| | 8086 (16 bit/16 bit) | 286 (16 bit/24 bit) | 386 (32 bit/32 bit) | 486 | P5/Pentium | P6/Pentium Pro | P4/Netburst | Conroe/Penryn | Nehalem/Westmere | Sandy Bridge | Ivy Bridge | Haswell | Broadwell | Skylake | Sunny Cove | Willow Cove |
| L1-D Cache | 6 Byte Prefetch | RAM of the time was fast enough to serve the processor. | While not technically L1 cache (definition - speeds up memory access), the Memory Management Unit (MMU) has a 32 entry (128 Byte) TLB (Translation Lookaside Buffer) which stores page table for faster translation from virtual to physical memory addresses. | 8 KB, 4-way set associative (unified, write-through policy) | 8KB (2-way) | 16KB (4-way) | 8KB (8-way) | 32KiB/8-way, 3 cycles | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 48KiB/12-way, 5 cycles | 48KiB/12-way, 5 cycles |
| L1-I Cache | | | | | 8KB (2-way) | 16KB (4-way) | 12K µops trace cache | 32KiB/8-way | 32KiB/4-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way |
| L1 Cache Decoders | None | None | None | | 3 total, 1 complex, 1 simple, 4-5 µops/cycle | 3 total, 1 complex, 2 simple, 4-5 µops/cycle | 1 complex, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle | 5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle | 5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle |
| L2 Cache | None | None | Available on many mainboards | None | | 256KiB/8-way | 256KB (8-way) | 2-3MiB/16-way, dynamically shared, 14 cycles | 256KiB/8-way, 10 cycles | 256KiB/8-way, 12 cycles | 256KiB/8-way, 12 cycles | 256KiB/8-way, 12 cycles | 256KiB/8-way, 12 cycles | 256KiB/4-way, Inclusive, 12 cycles | 512KiB/8-way, Inclusive, 13 cycles | 1.25MiB/20-way, Non-Inclusive |
| L3 Cache | None | None | None | None | None | N/A | | | 2MiB/16-way, 46 cycles | 2MiB/16-way, 29 cycles | 2MiB/12-way, 30 cycles | 2MiB/16-way, Inclusive 36 cycles | 2MiB/12-way, Inclusive, 38 cycles | 2MiB, 16-way, Inclusive, 34 cycles | 2MiB/12-way, Inclusive, 41 cycles | 3MiB/12-way, Non-Inclusive |
| µop Cache entries | None | None | None | None | None | None | None | None | None | 1.5k | 1.5k | 1.5k | 1.5k | 1.5k | 2.25k | 2.25k |
| Reorder Buffer | None | None | None | None | None | 40 | 126 | 96 | 128 | 168 | 168 | 192 | 192 | 224 | 352 | 352 |
| Integer Registers | | | | | | | | N/A | N/A | 160 | 160 | 168 | 168 | 180 | 280 | 280 |
| FP/AVX Registers | | | | | | | | N/A | N/A | 144 | 144 | 168 | 168 | 168 | 224 | 224 |
| Branch Order Buffer | | | | | | | | 32 | 36 | 48 | 48 | 48 | 72 | 48 | ? | ? |
| In-Flight Loads | | | | | | | 48 | 32 | 48 | 64 | 64 | 72 | 72 | 72 | 128 | 128 |
| In-Flight Stores | | | | | | | 24 | 20 | 32 | 36 | 36 | 42 | 42 | 56 | 72 | 72 |
| Scheduler Entries | | | | | Control Unit | 20 (unified) | 38 Int/FP, 8 Memory | 32 | 36 | 54 | 54 | 60 | 60 | 97 | 160 | 160 |
| Execution Ports | | | | | 3 | 5 | 4 | 6 | 6 | 6 | 6 | 8 | 8 | 8 | 10 | 10 |
| Instruction pipeline | | | 3 | 5 | 5 (6 for MMX) | 6, 10 | 20, 31 | 14 | 16 | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | ? | ? |
* In most real-world situations latency was 5 cycles; 4 cycles occurred only in rare situations.
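
One way to keep a chart like this from drifting out of alignment is to store every row with an explicit placeholder for each core, so a missing cell can never shift the values after it. The sketch below is only an illustration: the names `CORES`, `REORDER_BUFFER`, and `make_row` are made up for this example, and it assumes the Reorder Buffer row reads left to right across all sixteen cores in the chart header.

```python
# Column order copied from the chart header.
CORES = [
    "8086", "286", "386", "486", "P5/Pentium", "P6/Pentium Pro",
    "P4/Netburst", "Conroe/Penryn", "Nehalem/Westmere", "Sandy Bridge",
    "Ivy Bridge", "Haswell", "Broadwell", "Skylake", "Sunny Cove",
    "Willow Cove",
]

# Reorder Buffer row from the chart; None stands for the "None" cells.
REORDER_BUFFER = [
    None, None, None, None, None,
    40, 126, 96, 128, 168, 168, 192, 192, 224, 352, 352,
]


def make_row(values):
    """Pair each value with its core, refusing rows of the wrong length."""
    if len(values) != len(CORES):
        raise ValueError(f"expected {len(CORES)} cells, got {len(values)}")
    return dict(zip(CORES, values))


rob = make_row(REORDER_BUFFER)
print(rob["Skylake"], rob["486"])
```

A row with a cell left out raises an error instead of silently sliding every later value one column to the left.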
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
The 80486 family supported an L2 cache on the motherboard that could be up to 256KB, maybe even 512KB. This I am certain of, as I owned a 486DX2/66 with a 256KB L2. The Pentium line had all sorts of cache configs, from a motherboard cache on the P1, cache on the slot cartridge for the P2, to an integrated one on the Celerons, to a large slot cartridge cache on the P3, to the large integrated cache of the Tualatin P3-S.
 

Hulk

P4/Netburst also had different cache sizes. 512KB with Northwood, 1 & 2MB L2 with Prescott/PD. I think the L1D also doubled to 16KB with Prescott.

Yes you are correct. I am putting down the first processor of that particular "line." Perhaps when (if) I complete this I'll break it down into subcategories.
 

zir_blazer

Golden Member
Jun 6, 2013
1,160
400
136
I don't know where you got your early CPUs' Cache values from, but they're all wrong.

The 8086 had just an internal 6 Byte prefetch Cache, not 8 KB! The 8088 had a smaller one, at 4 Bytes.
Not sure about the 80286, since it is not covered as much as the first x86 generation, so I don't know if it had the prefetch Cache.
The 386 only supported external Cache, but the MMU Paging Unit had a TLB Cache with 32 entries, thus 128 Bytes in size.
The 486s had a lot of internal Cache configurations. I recall 8 KiB and 16 KiB in both Write-Through and Write-Back variants, the latter needing new BIOS support if I recall correctly. This is when you get Cache L1 and L2 referring to internal and external, respectively.
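
The 128 Byte figure for the 386 TLB can be checked with a couple of lines of arithmetic. This is a rough sketch, assuming each of the 32 entries is counted as one 4-byte (32-bit) page table entry; the constant names are made up for the example. It also shows how much memory those 32 entries can map with the 386's 4 KiB pages.

```python
TLB_ENTRIES = 32         # entry count quoted above
BYTES_PER_ENTRY = 4      # assumption: one 32-bit page table entry per slot
PAGE_SIZE = 4 * 1024     # the 386 uses 4 KiB pages

tlb_bytes = TLB_ENTRIES * BYTES_PER_ENTRY   # storage held in the TLB
tlb_reach = TLB_ENTRIES * PAGE_SIZE         # memory covered by those entries

print(f"TLB storage: {tlb_bytes} bytes")
print(f"TLB reach:   {tlb_reach // 1024} KiB")
```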


The Pentium Pro/II/III, as stated before, also had a truckload of different Cache configurations. Typically you had the BSB (Back Side Bus) wiring the CPU Core to the Cache L2. You had the Pentium Pro MCMs with Cache L2 on-package but off-chip, the original Celeron with no Cache L2, the Mendocino Celeron with integrated 128 KiB Cache L2, the other Pentium 2 and 3 variants with off-chip MCM Cache, the Pentium 2/3 Xeons, which had VERY big packages and up to 4 high end Cache L2 chips (Pentium 2 used commodity SRAM as Cache L2, the Xeons' was custom made) for up to 2 MiB Cache L2, and Coppermine and Tualatin, which had it on die like the Mendocino.

Pentium 4 Willamette had 256 KiB Cache L2, Northwood 512 KiB, Prescott 1 MiB, Prescott 2M 2 MiB. There were some server variants like Gallatin, which was a Northwood with 2 MiB Cache L3, and I think there were some massive Xeon MP models with 4 MiB (the famous P4 EE used the 2 MiB die, not the 4 MiB one).


I don't really like these "one size fits all" tables for obvious reasons.
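
If a single chart cell has to stand in for a whole family, one option is to keep the per-core values and derive the cell from them, so the spread stays visible. A minimal sketch using only the Pentium 4 L2 sizes listed above; `P4_L2_BYTES` and `l2_summary` are hypothetical names for this example.

```python
KIB = 1024

# L2 sizes per Pentium 4 core, as listed in the post above.
P4_L2_BYTES = {
    "Willamette": 256 * KIB,
    "Northwood": 512 * KIB,
    "Prescott": 1024 * KIB,
    "Prescott 2M": 2048 * KIB,
}


def l2_summary(variants):
    """Collapse a family into one chart cell without hiding the spread."""
    sizes = sorted(size // KIB for size in variants.values())
    if sizes[0] == sizes[-1]:
        return f"{sizes[0]} KiB"
    return f"{sizes[0]}-{sizes[-1]} KiB"


print(l2_summary(P4_L2_BYTES))
```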
 

Hulk

Thank you for responding. I don't like general tables that are inaccurate either. That's why I posted this here for help and why I appreciate your comments.

I'm only including cores that are the first of a generation of similar cores, meaning all the P4 variants are P4s. P6 represents Pentium Pro, PII, PIII, and the first "Core." As I wrote above, I'll try to break it out further if I can get what I have finished in a manner that makes some semblance of sense. And you are right, maybe that isn't possible, but the exercise won't hurt my brain (too much).

Regarding the L1 cache, I guess I had to make a call that the TLB, while it sits between the CPU and main memory, isn't technically L1 cache. I'm defining L1 as cache whose primary function is speeding up processor access to main memory, and since the early processors ran at a 1:1 CPU:FSB ratio there was really no need for L1 cache.

Since the TLB stores page table entries so the MMU can translate virtual addresses to physical ones faster, I wasn't considering it L1 in the conventional (modern) sense.
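
The distinction can be made concrete with a toy model: a TLB remembers where a page lives, not what the page contains. The sketch below is a simplification and not how the 386 does it in hardware. `TinyTLB`, its dictionary-based page table, and the least-recently-used replacement are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096     # 4 KiB pages
TLB_ENTRIES = 32     # entry count quoted in this thread


class TinyTLB:
    """Toy TLB: caches virtual page -> physical frame mappings only."""

    def __init__(self, page_table):
        self.page_table = page_table   # stand-in for the page tables in RAM
        self.entries = OrderedDict()   # cached translations, oldest first
        self.walks = 0                 # number of slow page table lookups

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)          # hit: no memory access
        else:
            self.walks += 1                        # miss: consult page table
            if len(self.entries) >= TLB_ENTRIES:
                self.entries.popitem(last=False)   # evict the oldest entry
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = TinyTLB({0: 7, 1: 3})
first = tlb.translate(0x0010)    # miss on page 0
second = tlb.translate(0x0020)   # hit on page 0
third = tlb.translate(0x1004)    # miss on page 1
print(first, second, third, tlb.walks)
```

Note that the TLB never stores the data at the translated address. The actual load or store still goes out to memory, which is the job a data cache would speed up.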

Yes, I see how these terms have kind of morphed over the last 40 years! I think I can make a note of this in the table to make the entry specific and correct.

Again, thanks for your expertise!
 

zir_blazer

I'm only including cores that are the first of a generation of similar cores, meaning all the P4 variants are P4s. P6 represents Pentium Pro, PII, PIII, and the first "Core." As I wrote above, I'll try to break it out further if I can get what I have finished in a manner that makes some semblance of sense.
Willamette/Northwood and Prescott are different enough to be considered on their own. You may also want to read "What makes a "new" CPU new?" as a source of inspiration.


Regarding the L1 cache, I guess I had to make a call that the TLB, while it sits between the CPU and main memory, isn't technically L1 cache. I'm defining L1 as cache whose primary function is speeding up processor access to main memory, and since the early processors ran at a 1:1 CPU:FSB ratio there was really no need for L1 cache.
The TLB is MMU Cache, the classic "internal Cache" is associated with the CPU part of the Processor, so I'm not suggesting mixing them, either. It would require a new row if you want to add it.
Actually, Intel considered adding a small Cache to the 386 but didn't, because adding a useful one would have made the die too big for manufacturing, and a small one was useless.

The 1:1 CPU:FSB ratio isn't what you think, because asynchronous DRAM didn't rely on a platform clock source to begin with (as the later SDRAM did). You're missing the infamous Memory Wait States: extra do-nothing cycles on every RAM access. 1 or 2 Memory WS were usually supported choices in Turbo XT platforms with 8088s clocked at 8, 10, or 12 MHz, so even the earliest x86 Processor designs could be clocked high enough to have to wait for memory.
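
To put rough numbers on the wait state point: an 8088 bus cycle takes four clocks (T1 to T4) with zero wait states, and each wait state adds one more clock. The sketch below just multiplies that out for the Turbo XT clock speeds mentioned; `bus_cycle_ns` is a made-up helper, and real boards differed in which accesses actually got wait states.

```python
BASE_BUS_CLOCKS = 4   # 8088 bus cycle: T1..T4 with zero wait states


def bus_cycle_ns(clock_mhz, wait_states):
    """Length of one memory bus cycle in nanoseconds."""
    clocks = BASE_BUS_CLOCKS + wait_states
    return clocks * 1000.0 / clock_mhz


for mhz in (8, 10, 12):
    timings = ", ".join(
        f"{ws} WS = {bus_cycle_ns(mhz, ws):.0f} ns" for ws in (0, 1, 2)
    )
    print(f"{mhz} MHz: {timings}")
```

At 10 MHz, one wait state stretches each memory access from four clocks to five, so the processor spends a fifth of every bus cycle doing nothing.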



These tables are VERY hard to do because each entry can relate to its immediate predecessors or successors, until something breaks the chain and you need to find another way to present the table.
 

Hulk

I follow regarding the memory latency. I think the designers of the early processors had limited transistors to work with and until the CPU started running at multiples of main memory those transistors were better spent elsewhere.

I have a different opinion of the P4s for the purpose of this chart. They are all Netburst, and the main differences (except for Prescott going to a 31-stage pipeline in hopes of pushing clocks further) were memory improvements, namely faster main memory access and larger caches. I don't consider these changes "architectural" in terms of being significant in the history of CPU development.

As you mentioned it's a rabbit hole you can fall into if you are not careful.

I think I've set up the cores reasonably logically in terms of the big developments. Sure, each category can be broken down into further subdivisions, but that's not my intent. I'm looking at more of an "x86 architecture at a glance." I think it's pretty clear that the Pentium led to the Pentium Pro and variants, all the way to Yonah and the first "Core." Then Core 2 Duo, Nehalem, etc...

The P4 was an additional "branch" of development going on simultaneously with the P6, namely during the PIII to Core development.

I'll update the chart. If you can provide any help in clarifying it I do appreciate it. I'm having fun with it.