L1 vs L2 vs L3, and Sandy Bridge's Micro-Op Cache

lexspop

Junior Member
Sep 14, 2010
First post! (for me, anyway)

Am I right in thinking that, speed-wise, L1 > L2 > L3?

To make things more complicated, how is Sandy Bridge's new Micro-Op cache different from traditional caches? Aren't they all just storing instructions?
 

IntelUser2000

Elite Member
Oct 14, 2003
First question: Yes

Second question: At the L1 level, the cache is separated into two, one for data and one for instructions. All the other levels are unified.

Sandy Bridge's micro-op cache stores only decoded instructions (micro-ops), while the regular caches store ordinary x86 instructions.
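To make that concrete, here's a toy C loop (my own sketch, nothing official). The loop body compiles to only a handful of x86 instructions, so after the first trip through, a front end like Sandy Bridge's can deliver the already-decoded micro-ops from the micro-op cache instead of re-decoding the same bytes on every iteration:

Code:
#include <stdio.h>

int main(void)
{
    static int data[1 << 16];
    long sum = 0;
    /* Tiny, hot loop body: decoded into micro-ops once, then
       (on a micro-op cache equipped CPU) replayed from the
       decoded form on subsequent iterations. */
    for (int pass = 0; pass < 1000; pass++)
        for (int i = 0; i < (1 << 16); i++)
            sum += data[i];
    printf("%ld\n", sum);
    return 0;
}

On Linux you can sanity-check this with hardware counters, e.g. perf stat -e idq.dsb_uops,idq.mite_uops (DSB is Intel's name for the micro-op cache, MITE for the legacy decode path) - though event names vary by CPU, so check perf list first.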
 

lexspop

Junior Member
Sep 14, 2010
IntelUser2000 said:
Sandy Bridge's micro-op cache stores only decoded instructions (micro-ops), while the regular caches store ordinary x86 instructions.

Thanks for the quick reply! These forums rock.

As for the last part of your reply, I'm guessing it's faster to re-use decoded instructions than regular x86 instructions?
 

Hard Ball

Senior member
Jul 3, 2005
lexspop said:
Am I right in thinking that, speed-wise, L1 > L2 > L3?

To make things more complicated, how is Sandy Bridge's new Micro-Op cache different from traditional caches? Aren't they all just storing instructions?

IntelUser2000 already gave you a good answer, but perhaps I can add a bit more nuance.

I'm not sure what exactly you mean by 'speed'. Assuming you mean access latency, then it would be:
L1 < L2 < L3

If you mean the number of requests serviced per unit of time, then as I understand it, it would be:
L1 > L2 >= L3
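
If you want to see those latency steps yourself, here is a rough pointer-chasing sketch (my own toy code, so treat the details as assumptions). Each load depends on the previous one, which defeats any overlapping of misses, so you measure latency rather than throughput; the ns/load figure jumps as the working set outgrows each cache level:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Walk the dependency chain and return average ns per load. */
static double walk_ns(const size_t *chain, size_t steps)
{
    struct timespec t0, t1;
    size_t i = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        i = chain[i];                 /* serially dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (i == (size_t)-1) puts("");    /* keep 'i' live */
    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / (double)steps;
}

int main(void)
{
    /* 4 KB fits in L1; 64 MB should spill to DRAM on most CPUs. */
    for (size_t kb = 4; kb <= 65536; kb *= 4) {
        size_t n = kb * 1024 / sizeof(size_t);
        size_t *chain = malloc(n * sizeof *chain);
        /* Sattolo's algorithm: one random cycle over all slots,
           so the walk visits every element in a cache-hostile order. */
        for (size_t i = 0; i < n; i++) chain[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;   /* 0 <= j < i */
            size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
        }
        printf("%8zu KB: %6.1f ns/load\n", kb, walk_ns(chain, 10000000));
        free(chain);
    }
    return 0;
}

Compile with something like gcc -O2 chase.c; the sizes where the steps show up depend on your particular CPU's cache sizes.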

In terms of the L0 I-cache in the front end, it likely stores partially decoded instructions. Current CISC microarchitectures all contain two main components in the decode stage of the pipeline (sometimes other functionality as well, such as detecting ops that can be fused). The first is x86 decode, which converts the compiler-facing (architectural) x86 representation into the core's native representation for the remainder of the pipeline. The second is format decode, which determines which bits of the instruction go to which part of the instruction control circuitry, or on to the datapath (for immediates); the format is generally determined by the number of register operands and immediate operands, as well as by the length of the opcode.
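
To make the format-decode part concrete, here's a toy sketch (mine, and hugely simplified - real x86 decode also deals with prefixes, escape bytes, and instructions up to 15 bytes long). It hand-picks apart two hard-coded x86-64 encodings to show which bits steer control and which feed the datapath as immediates/displacements:

Code:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* add eax, 0x12345678 -> opcode 0x05, then a 4-byte immediate */
    uint8_t add_imm[] = { 0x05, 0x78, 0x56, 0x34, 0x12 };
    uint32_t imm = (uint32_t)add_imm[1]       | (uint32_t)add_imm[2] << 8 |
                   (uint32_t)add_imm[3] << 16 | (uint32_t)add_imm[4] << 24;
    printf("add eax, imm32:  opcode=%#04x immediate=%#010x (5 bytes)\n",
           add_imm[0], imm);

    /* mov rbx, [rax+8] -> REX.W prefix, opcode 0x8B, ModRM, disp8 */
    uint8_t mov_mem[] = { 0x48, 0x8B, 0x58, 0x08 };
    uint8_t modrm = mov_mem[2];
    printf("mov rbx,[rax+8]: rex=%#04x opcode=%#04x mod=%u reg=%u rm=%u disp8=%d (4 bytes)\n",
           mov_mem[0], mov_mem[1],
           modrm >> 6, (modrm >> 3) & 7, modrm & 7, (int8_t)mov_mem[3]);
    return 0;
}

The two encodings have different lengths and different field layouts, which is exactly the "which format is this?" question the format-decode hardware has to answer before it even knows where the immediate starts.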

For most of the designs out there that contain an instruction $ holding a processor-internal representation of a CISC ISA, the representation in the L0 has usually gone through the first major component of decode, but not the second. Sandy Bridge probably has this implementation too.

Hope this helps.