I have been trying to extend a CPU chart from AnandTech to include CPUs all the way back to the 8086, but I'm not knowledgeable enough to complete it. Can you help me finish/correct it?
8086 (16 bit/16 bit) | 286 (16 bit/24 bit) | 386 (32 bit/32 bit) | 486 | P5/Pentium | P6/Pentium Pro | P4/NetBurst | Conroe/Penryn | Nehalem/Westmere | Sandy Bridge | Ivy Bridge | Haswell | Broadwell | Skylake | Sunny Cove | Willow Cove | |
L1-D Cache | 6 Byte Prefetch | RAM of the time was fast enough to serve the processor. | Not technically an L1 cache (that is, a cache that speeds up memory access), but the Memory Management Unit (MMU) has a 32-entry (128-byte) Translation Lookaside Buffer (TLB) that caches page table entries for faster translation from virtual to physical memory addresses. | 8 KB, 4-way set associative (unified, write-through policy) | 8KB (2-way) | 16KB (4-way) | 8KB (8-way) | 32KiB/8-way, 3 cycles | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 32KiB/8-way, 4 cycles* | 48KiB/12-way, 5 cycles | 48KiB/12-way, 5 cycles |
L1-I Cache | 8KB (2-way) | 16KB (4-way) | 12K µops trace cache | 32KiB/8-way | 32KiB/4-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | 32KiB/8-way | ||||
L1 Cache Decoders | None | None | None | 3 total, 1 complex, 1 simple, 4-5 µops/cycle | 3 total, 1 complex, 2 simple, 4-5 µops/cycle | 1 complex, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 4 total, 1 complex, 3 simple, 4-5 µops/cycle | 5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle | 5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle | 5 total, 1 complex and 4 simple can deliver up to 5 µops/cycle | |
L2 Cache | None | None | Available on many mainboards | None | 256KiB/8-way | 256KB (8-way) | 2-3MiB/16-way, dynamically shared, 14 cycles | 256KiB/8-way, 10 cycles | 256KiB/8-way, 12 cycles | 256KiB/8-way, 12 cycles | 256KiB/8-way, 12 cycles | 256KiB/8-way, 12 cycles | 256KiB/4-way, Inclusive, 12 cycles | 512KiB/8-way, Inclusive, 13 cycles | 1.25MiB/20-way, Non-Inclusive | |
L3 Cache | None | None | None | None | None | N/A | 2MiB/16-way, 46 cycles | 2MiB/16-way, 29 cycles | 2MiB/12-way, 30 cycles | 2MiB/16-way, Inclusive, 36 cycles | 2MiB/12-way, Inclusive, 38 cycles | 2MiB/16-way, Inclusive, 34 cycles | 2MiB/12-way, Inclusive, 41 cycles | 3MiB/12-way, Non-Inclusive | ||
µop Cache entries | None | None | None | None | None | None | None | None | None | 1.5k | 1.5k | 1.5k | 1.5k | 1.5k | 2.25k | 2.25k |
Reorder Buffer | None | None | None | None | None | 40 | 126 | 96 | 128 | 168 | 168 | 192 | 192 | 224 | 352 | 352 |
Integer Registers | N/A | N/A | 160 | 160 | 168 | 168 | 180 | 280 | 280 | |||||||
FP/AVX Registers | N/A | N/A | 144 | 144 | 168 | 168 | 168 | 224 | 224 | |||||||
Branch Order Buffer | 32 | 36 | 48 | 48 | 48 | 72 | 48 | ? | ? | |||||||
In-Flight Loads | 48 | 32 | 48 | 64 | 64 | 72 | 72 | 72 | 128 | 128 | ||||||
In-Flight Stores | 24 | 20 | 32 | 36 | 36 | 42 | 42 | 56 | 72 | 72 | ||||||
Scheduler Entries | Control Unit | 20 (unified) | 38 Int/FP, 8 Memory | 32 | 36 | 54 | 54 | 60 | 60 | 97 | 160 | 160 | ||||
Execution Ports | 3 | 5 | 4 | 6 | 6 | 6 | 6 | 8 | 8 | 8 | 10 | 10 | ||||
Instruction pipeline | 3 | 5 | 5 (6 for MMX) | 6, 10 | 20, 31 | 14 | 16 | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | 14-19 dep. on µop hits/misses, 80% hit rate | ? | ? | ||
* In most real-world situations latency was 5 cycles; 4 cycles occurred only in rare cases.
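For anyone reading the "size/N-way" cache entries in the chart, here is a rough Python sketch of how a size and associativity translate into a number of sets. The helper name `cache_sets` is made up for illustration, and the 64-byte line size is my assumption (the chart doesn't list line sizes, and older parts such as the 486 used smaller lines), so treat the results as illustrative only.

```python
def cache_sets(size_kib, ways, line_bytes=64):
    """Return the number of sets in a set-associative cache.

    size_kib   -- total capacity in KiB, as listed in the chart
    ways       -- associativity (the "N-way" figure)
    line_bytes -- cache line size; 64 is an ASSUMPTION, not from the chart
    """
    total_bytes = size_kib * 1024
    # Each set holds `ways` lines, so one set covers ways * line_bytes bytes.
    sets, remainder = divmod(total_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a whole number of sets")
    return sets


if __name__ == "__main__":
    # A few entries taken from the chart: (label, size in KiB, ways).
    examples = [
        ("L1-D 32KiB/8-way", 32, 8),
        ("L1-D 48KiB/12-way", 48, 12),
        ("L2 256KiB/8-way", 256, 8),
        ("L2 1.25MiB/20-way", 1280, 20),
    ]
    for label, size_kib, ways in examples:
        print(f"{label}: {cache_sets(size_kib, ways)} sets")
```

Under that 64-byte assumption, the 32KiB/8-way and 48KiB/12-way L1-D entries both come out to 64 sets, which suggests the larger L1-D was reached by adding ways rather than sets.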