- Oct 19, 2004
There are several things that I don't understand about TLBs.
First of all, AMD64 CPUs have 40/40 L1 TLBs (instruction/data) and
512/512 L2 TLBs.
That seems low to me. 512 entries at page size 4096 bytes means that
only 2 MB worth of data can be accessed without having a TLB miss and
walking the page table. That is just twice as big as the L2 cache.
Do I have some misunderstanding here?
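To make the arithmetic above concrete, here is a minimal sketch of the "TLB reach" calculation (entries times page size). The function name `tlb_reach_bytes` is just something I made up for illustration; the entry counts and the 4096-byte page size are the figures quoted above, and nothing here is measured on real hardware.

```python
def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by a TLB when every entry maps one page of page_size bytes."""
    return entries * page_size


PAGE_SIZE = 4096          # 4 KB pages, as assumed in the post
L1_DTLB_ENTRIES = 40      # L1 data TLB entries quoted above
L2_DTLB_ENTRIES = 512     # L2 data TLB entries quoted above

l1_reach = tlb_reach_bytes(L1_DTLB_ENTRIES, PAGE_SIZE)
l2_reach = tlb_reach_bytes(L2_DTLB_ENTRIES, PAGE_SIZE)

print(f"L1 DTLB reach: {l1_reach // 1024} KB")
print(f"L2 DTLB reach: {l2_reach // (1024 * 1024)} MB")
```

So with 4 KB pages the L1 DTLB covers 160 KB and the L2 DTLB covers the 2 MB mentioned above.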
Then, since AMD64 and all other x86 processors have caches working on
physical addresses, does that mean that a TLB lookup is needed
every time any address is used in the program, even if that
address' data is currently cached?
I am looking at the performance counters for an application I want to
improve. Compared to a reference program assumed to be average (gcc),
I get better L2 data hit rates but more L1+L2 DTLB misses. I wonder
what I should do about it, and which other performance counter will
tell me more about the resulting memory accesses (because I now walk
the page table).
Semi-related question: where is AMD hiding the list of performance
counters and what they mean? Is there anything published by AMD on
this?
Thanks