Question Need some help with this CPU chart

Hulk · Jan 7, 2021

I have been trying to extend a CPU chart from AnandTech to include CPUs all the way back to the 8086, but I'm not knowledgeable enough to complete it. Can you help me finish/correct it?

8086 (16 bit/16 bit)

286 (16bit/24 bit)

386 (32 bit/32 bit)

486

P5/Pentium

P6/Pentium Pro

P4/Netburst

Conroe/Penryn

Nehalem/Westmere

Sandy Bridge

Ivy Bridge

Haswell

Broadwell

Skylake

Sunny Cove

Willow Cove

L1-D Cache

6 Byte Prefetch

RAM of the time was fast enough to serve the processor.

While not technically L1 cache (defined here as cache that speeds up memory access), the Memory Management Unit (MMU) has a 32-entry (128 Byte) TLB (Translation Lookaside Buffer), which caches page table entries for faster translation from virtual to physical memory addresses.

8 KB, 4-way set associative (unified, write-through policy)

8KB (2-way)

16KB (4-way)

8KB (8-way)

32KiB/8-way, 3 cycles

32KiB/8-way, 4 cycles*

48KiB/12-way, 5 cycles

L1-I Cache

8KB (2-way)

16KB (4-way)

12K µops trace cache

32KiB/8-way

32KiB/4-way

32KiB/8-way

L1 Cache Decoders

None

3 total, 1 complex, 1 simple, 4-5 µops/cycle

3 total, 1 complex, 2 simple, 4-5 µops/cycle

1 complex, 4-5 µops/cycle

4 total, 1 complex, 3 simple, 4-5 µops/cycle

5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle

L2 Cache

None

Available on many mainboards

None

256KiB/8-way

256KB (8-way)

2-3MiB/16-way, dynamically shared, 14 cycles

256KiB/8-way, 10 cycles

256KiB/8-way, 12 cycles

256KiB/4-way, Inclusive, 12 cycles

512KiB/8-way, Inclusive, 13 cycles

1.25MiB/20-way, Non-Inclusive

L3 Cache

None

N/A

2MiB/16-way, 46 cycles

2MiB/16-way, 29 cycles

2MiB/12-way, 30 cycles

2MiB/16-way, Inclusive 36 cycles

2MiB/12-way, Inclusive, 38 cycles

2MiB, 16-way, Inclusive, 34 cycles

2MiB/12-way, Inclusive, 41 cycles

3MiB/12-way, Non-Inclusive

µop Cache entries

None

1.5k

2.25k

Reorder Buffer

None

40

126

96

128

168

192

224

352

Integer Registers

N/A

160

168

180

280

FP/AVX Registers

N/A

144

168

224

Branch Order Buffer

32

36

48

72

48

?

In-Flight Loads

48

32

48

64

72

128

In-Flight Stores

24

20

32

36

42

56

72

Scheduler Entries

Control Unit

20 (unified)

38 Int/FP, 8 Memory

32

36

54

60

97

160

Execution Ports

3

5

4

6

8

10

Instruction pipeline

3

5

5 (6 for MMX)

6, 10

20, 31

14

16

14-19 dep. on µop hits/misses, 80% hit rate

?

* In most real-world situations latency was 5 cycles; 4 cycles occurred only in rare situations.
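The "14-19 dep. on µop hits/misses, 80% hit rate" pipeline entry can be boiled down to one rough number. This is only a sketch: it assumes 14 stages apply on a µop cache hit and 19 on a miss, and that a simple weighted average is a fair summary. The function name is made up for illustration.

```python
def average_pipeline_depth(hit_depth, miss_depth, hit_rate):
    """Weighted average of pipeline depth over uop cache hits and misses."""
    return hit_rate * hit_depth + (1.0 - hit_rate) * miss_depth

# Chart values: 14 stages (assumed: uop cache hit), 19 stages (assumed: miss),
# and the quoted 80% hit rate.
depth = average_pipeline_depth(14, 19, 0.80)
print(f"{depth:.1f} stages on average")
```

With the chart's numbers this comes out at roughly 15 stages, which is why the short end of the range matters more than the long end.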

damian101 · Jan 7, 2021

This might be worth including: https://en.wikipedia.org/wiki/FLOPS#FLOPS_per_cycle_for_various_processors

LightningZ71 · Jan 8, 2021

The 80486 family supported an L2 cache on the motherboard that could be up to 256KB, maybe even 512KB. This I am certain of, as I owned a 486DX2/66 with a 256KB L2. The Pentium had all sorts of cache configs, from a motherboard cache on the P1, cache on the slot cartridge for the P2, to an integrated one on the Celerons, to a large slot cartridge cache on the P3, to the large integrated cache of the Tualatin P3-S.

Thunder 57 · Jan 8, 2021

P4/Netburst also had different cache sizes. 512KB with Northwood, 1 & 2MB L2 with Prescott/PD. I think the L1D also doubled to 16KB with Prescott.

Hulk · Jan 8, 2021

Thunder 57 said:
P4/Netburst also had different cache sizes. 512KB with Northwood, 1 & 2MB L2 with Prescott/PD. I think the L1D also doubled to 16KB with Prescott.

Yes you are correct. I am putting down the first processor of that particular "line." Perhaps when (if) I complete this I'll break it down into subcategories.

Shmee · Jan 8, 2021

What about the - E/EP variants?

moinmoin · Jan 8, 2021

Do you intend to include the x87 FPU co-processors (all external before being included as part of the instruction set since 486)? Those have an 8 register stack.
https://en.wikipedia.org/wiki/X87#Performance

Hulk · Jan 8, 2021

Shmee said:
What about the - E/EP variants?

No just the ones listed in the chart.

zir_blazer · Jan 8, 2021

I don't know where you got your early CPUs Cache values from, but they're all wrong.

The 8086 had just an internal 6 Byte prefetch Cache, not 8 KB! The 8088 had a smaller 4 Bytes.
Not sure about the 80286 since it is not covered as much as the first x86 generation, so I don't know if it had the prefetch Cache.
The 386 only supported external Cache, but the MMU Paging Unit had a TLB Cache with 32 entries, thus 128 Bytes in size.
The 486s had a lot of internal Cache configurations. I recall 8 KiB and 16 KiB in both Write Through and Write Back variants, the latter needing new BIOS support if I recall correctly. This is when you get Cache L1 and L2 referring to internal and external, respectively.
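The "32 entries thus 128 Bytes" figure for the 386 TLB is just entries times entry size. A quick sketch, assuming each entry is counted as one 32-bit (4 Byte) word and using the 386's 4 KiB page size; the constant names are only for illustration:

```python
TLB_ENTRIES = 32        # from the post above
BYTES_PER_ENTRY = 4     # assumption: one 32-bit word counted per entry
PAGE_SIZE = 4096        # the 386 uses 4 KiB pages

tlb_bytes = TLB_ENTRIES * BYTES_PER_ENTRY   # storage counted for the TLB itself
tlb_reach = TLB_ENTRIES * PAGE_SIZE         # memory covered without a page table walk

print(tlb_bytes, "Bytes of TLB storage")
print(tlb_reach // 1024, "KiB of address space covered")
```

So 128 Bytes of TLB can cover 128 KiB of memory at a time, which is a very different job from an 8 KB data cache.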

The Pentium Pro/II/III, as stated before, also had a truckload of different Cache configurations. Typically you had the BSB (Back Side Bus) wiring the CPU Core to the Cache L2. You had the Pentium Pro MCMs with Cache L2 on-package but off-chip, the original Celeron with no Cache L2, the Mendocino Celeron with integrated 128 KiB Cache L2, the other Pentium 2 and 3 variants with off-chip MCM Cache, the Pentium 2/3 Xeons, which had VERY big packages and up to 4 high end Cache L2 chips (the Pentium 2 used commodity SRAM as Cache L2, the Xeons' was custom made) for up to 2 MiB Cache L2, and Coppermine and Tualatin, which had it on die like the Mendocino.

Pentium 4 Willamette had 256 KiB Cache L2, Northwood 512 KiB, Prescott 1 MiB, Prescott 2M 2 MiB. There were some server variants like Gallatin, which was a Northwood with 2 MiB Cache L3, and I think there were some massive Xeon MP models with 4 MiB (the famous P4 EE used the 2 MiB die, not the 4 MiB one).

I don't really like these "one size fits all" tables for obvious reasons.

Hulk · Jan 8, 2021

zir_blazer said:
I don't know where you got your early CPUs Cache values from, but they're all wrong.

The 8086 had just an internal 6 Byte prefetch Cache, not 8 KB! The 8088 had a smaller 4 Bytes.
Not sure about the 80286 since it is not covered as much as the first x86 generation, so I don't know if it had the prefetch Cache.
The 386 only supported external Cache, but the MMU Paging Unit had a TLB Cache with 32 entries, thus 128 Bytes in size.
The 486s had a lot of internal Cache configurations. I recall 8 KiB and 16 KiB in both Write Through and Write Back variants, the latter needing new BIOS support if I recall correctly. This is when you get Cache L1 and L2 referring to internal and external, respectively.

The Pentium Pro/II/III, as stated before, also had a truckload of different Cache configurations. Typically you had the BSB (Back Side Bus) wiring the CPU Core to the Cache L2. You had the Pentium Pro MCMs with Cache L2 on-package but off-chip, the original Celeron with no Cache L2, the Mendocino Celeron with integrated 128 KiB Cache L2, the other Pentium 2 and 3 variants with off-chip MCM Cache, the Pentium 2/3 Xeons, which had VERY big packages and up to 4 high end Cache L2 chips (the Pentium 2 used commodity SRAM as Cache L2, the Xeons' was custom made) for up to 2 MiB Cache L2, and Coppermine and Tualatin, which had it on die like the Mendocino.

Pentium 4 Willamette had 256 KiB Cache L2, Northwood 512 KiB, Prescott 1 MiB, Prescott 2M 2 MiB. There were some server variants like Gallatin, which was a Northwood with 2 MiB Cache L3, and I think there were some massive Xeon MP models with 4 MiB (the famous P4 EE used the 2 MiB die, not the 4 MiB one).

I don't really like these "one size fits all" tables for obvious reasons.

Thank you for responding. I don't like general tables that are inaccurate either. That's why I posted this here for help and why I appreciate your comments.

I am only including cores that are the first of a generation of similar cores, meaning all the P4 variants are P4s. P6 represents Pentium Pro, PII, PIII, and the first Core. As I wrote above, I'll try to break it out further if I can get what I have finished in a manner that makes some semblance of sense. And you are right, maybe that isn't possible, but the exercise won't hurt my brain (too much).

Regarding the L1 cache, I guess I had to make a call that the TLB, while it sits between the CPU and main memory, isn't technically L1 cache, which I'm defining as cache whose primary function is speeding up processor access to main memory. Since the early processors ran at a 1:1 CPU:FSB ratio, there was really no need for L1 cache.

Since the TLB caches page table entries so the MMU can translate virtual addresses to physical ones faster, I wasn't considering it L1 in the conventional (modern) sense.
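To make the distinction concrete, here is a toy model of what a TLB does: it caches virtual-page-to-physical-frame translations, not the data itself. This is only a sketch under loud assumptions: the class and field names are made up, the dict stands in for the real page tables in memory, and the lookup is fully associative with FIFO eviction, which is a simplification and not how the 386's TLB is actually organized.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4 KiB pages, as on the 386


class ToyTLB:
    """Tiny translation cache: virtual page number -> physical frame number."""

    def __init__(self, page_table, capacity=32):
        self.page_table = page_table   # stand-in for the page tables in RAM
        self.capacity = capacity       # 32 entries, like the 386 TLB
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1             # fast path: translation already cached
        else:
            self.misses += 1           # slow path: consult the page table
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the oldest entry (FIFO)
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 0 -> frame 7, virtual page 1 -> frame 3.
tlb = ToyTLB({0: 7, 1: 3})
first = tlb.translate(0x0010)    # miss, then cached
second = tlb.translate(0x0020)   # hit, same page as before
third = tlb.translate(0x1004)    # miss, different page
print(hex(first), hex(second), hex(third), tlb.hits, tlb.misses)
```

Note that even on a hit the data still has to come from memory at the translated address, which is why it makes sense to keep the TLB out of the L1 row.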

Yes, I see how these terms have kind of morphed over the last 40 years! I think I can make a note of this in the table to make the entry specific and correct.

Again, thanks for your expertise!

zir_blazer · Jan 8, 2021

Hulk said:
I am only including cores that are the first of a generation of similar cores, meaning all the P4 variants are P4s. P6 represents Pentium Pro, PII, PIII, and the first Core. As I wrote above, I'll try to break it out further if I can get what I have finished in a manner that makes some semblance of sense.

Willamette/Northwood and Prescott are different enough to be considered on their own. You may also want to read "What makes a "new" CPU new?" as a source of inspiration.

Hulk said:
Regarding the L1 cache, I guess I had to make a call that the TLB, while it sits between the CPU and main memory, isn't technically L1 cache, which I'm defining as cache whose primary function is speeding up processor access to main memory. Since the early processors ran at a 1:1 CPU:FSB ratio, there was really no need for L1 cache.

The TLB is MMU Cache, the classic "internal Cache" is associated with the CPU part of the Processor, so I'm not suggesting mixing them, either. It would require a new row if you want to add it.
Actually, Intel considered adding a small Cache in the 386 but didn't, because adding a useful one would have made the die too big for manufacturing, and a small one was useless.

The 1:1 CPU:FSB ratio isn't what you think, because asynchronous DRAM didn't rely on a platform clock source to begin with (unlike the later SDRAM). You're missing the infamous Memory Wait States, where you had extra do-nothing cycles on every RAM access. 1 or 2 Memory WS were usually supported choices in Turbo XT platforms with 8-10-12 MHz clocked 8088s, so even the earliest x86 Processor designs could be clocked high enough to have to wait for memory.
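A rough way to see the effect of wait states: an 8088 bus cycle takes four clocks (T1-T4) with no waits, and each wait state adds one more clock. The sketch below turns that into time per memory access. The function name and the clock/WS pairings are just for illustration and are not taken from any specific board.

```python
BASE_BUS_CYCLE_CLOCKS = 4  # 8088 bus cycle: T1-T4, before any wait states


def bus_cycle_ns(clock_mhz, wait_states):
    """Time for one memory bus cycle in nanoseconds."""
    clocks = BASE_BUS_CYCLE_CLOCKS + wait_states
    return clocks * 1000.0 / clock_mhz


# Illustrative pairings of Turbo XT clock speeds and wait states.
for mhz, ws in ((8, 0), (10, 1), (12, 2)):
    print(f"{mhz} MHz, {ws} WS: {bus_cycle_ns(mhz, ws):.0f} ns per access")
```

With these pairings every line works out to the same time per access, so the faster clock buys nothing on memory-bound work: the CPU is simply waiting longer, measured in its own cycles, for the same DRAM.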

These tables are VERY hard to do because each entry can relate to its immediate predecessors or successors, until something breaks the chain and you need to find another way to present the table.

Hulk · Jan 8, 2021

zir_blazer said:
Willamette/Northwood and Prescott are different enough to be considered on their own. You may also want to read "What makes a "new" CPU new?" as a source of inspiration.

The 1:1 CPU:FSB ratio isn't what you think, because asynchronous DRAM didn't rely on a platform clock source to begin with (unlike the later SDRAM). You're missing the infamous Memory Wait States, where you had extra do-nothing cycles on every RAM access. 1 or 2 Memory WS were usually supported choices in Turbo XT platforms with 8-10-12 MHz clocked 8088s, so even the earliest x86 Processor designs could be clocked high enough to have to wait for memory.

These tables are VERY hard to do because each entry can relate to its immediate predecessors or successors, until something breaks the chain and you need to find another way to present the table.

I follow regarding the memory latency. I think the designers of the early processors had limited transistors to work with and until the CPU started running at multiples of main memory those transistors were better spent elsewhere.

I have a different opinion of the P4s for the purpose of this chart. They are all Netburst, and the main differences (except for Prescott going to a 31-stage pipe in hopes of pushing clocks further) were memory improvements, namely faster main memory access and larger caches. I don't consider these changes "architectural" in terms of being significant in the history of CPU development.

As you mentioned it's a rabbit hole you can fall into if you are not careful.

I think I've set up the cores reasonably logically in terms of the big developments. Sure, each category can be broken down into further subdivisions, but that's not my intent. I'm looking at more of an "x86 architecture at a glance." I think it's pretty clear that the Pentium led to the Pentium Pro and variants, all the way to Yonah and the first "Core." Then Core2Duo, Nehalem, etc...

The P4 was an additional "branch" of development going on simultaneously with the P6, namely during the PIII to Core development.

I'll update the chart. If you can provide any help in clarifying it I do appreciate it. I'm having fun with it.
