Hi all,
Quick question... I often see cache latencies quoted something like (i7, for example):
L1: 4 cycles
L2: 11 cycles
L3: ~40 cycles
RAM: ~100 cycles
So I understand that to mean that if I want to read line X, and X is sitting in L1 cache, it takes me 4 cycles to retrieve it. And if X isn't in L1 but it is in L2, do I:
1) incur 4 cycles (check L1 for X) + 11 cycles (check & get X from L2) + 4 cycles (put X in L1) before I can use the contents of X
2) incur 11 (get from L2) + 4 (put in L1) before I can use contents of X (i.e. asking whether something is/is not in cache is fast)
3) incur 11 (get from L2) + nothing for L1 to do before I can use X
4) incur 11 (includes getting from L2 & putting in L1)
?
Or is it like case 2), but X is available immediately, so I don't have to wait for L1 to write in line X?
Also, does the quoted main memory latency include the time it takes for the RAM to act? Like if we have 533 MHz (bus rate) DDR2 with 6-6-6 timings, it'll take something like 180 ns (worst) to 60 ns (best) for the memory to send the first bit of data back after the request goes through. But this could vary pretty widely depending on what kind of RAM you have.
On a 3 GHz processor, 90 ns is about 270 cycles, way more than 100... so my guess is that "100 cycles" means it takes 100 cycles for the processor to ask the RAM to do something (figuring out that X isn't in any cache, looking up X in the TLB or doing address translation, physically sending the signals around, etc.), and then the RAM takes however long on top of that to respond. Is this correct?