First post! (for me, anyway)
Am I right in thinking that, speed-wise, L1 > L2 > L3?
To make things more complicated, how is Sandy Bridge's new micro-op cache different from traditional caches? Aren't they all just storing instructions?
Inteluser2000 already gave you a good answer, but perhaps I can provide a somewhat more nuanced answer.
I'm not sure what exactly you mean by 'speed'. Assuming you mean access latency, then it would be:
L1 < L2 < L3
If you mean the number of requests serviced per unit of time, then as I understand it, it would be:
L1 > L2 >= L3
In terms of the L0 I-cache in the front end, it likely stores partially decoded instructions. Current CISC microarchitectures all contain two main components in the decode stage of the pipeline (sometimes other functionality as well, such as detecting ops that can be fused). The first is x86 decode, which converts the compiler-visible x86 representation into the machine's native internal representation for the remainder of the pipeline. The second is format decode, which determines which bits of the instruction go to which part of the instruction control circuitry, or to the following datapath (for immediates); the format is generally determined by the number of register operands and immediate operands, as well as by the length of the opcode.
For most designs out there that contain an instruction $ holding a processor-internal representation of a CISC ISA, the representation in the L0 has usually gone through the first major component of decode, but not the second. Sandy Bridge probably has this implementation too.
Hope this helps.