CPU TLB size

ap0g33

Junior Member
Dec 12, 2014
Hi

I am trying to find out why the TLB size of my CPU changes depending on the page size:

$ cpuid | grep -i tlb
cache and TLB information (2):
0x5a: data TLB: 2M/4M pages, 4-way, 32 entries
0x55: instruction TLB: 2M/4M pages, fully, 7 entries

0x03: data TLB: 4K pages, 4-way, 64 entries
0xb2: instruction TLB: 4K, 4-way, 64 entries
0xca: L2 TLB: 4K, 4-way, 512 entries

With 4K pages there are many more entries. Why?
 

Nothingness

Diamond Member
Jul 3, 2013
The TLBs are split by page size. Supporting multiple page sizes in a single TLB means doing multiple comparisons, which increases logic depth and might end up hurting clock speed.

There are more 4KB entries because 4KB is the most commonly used page size.
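To make the "multiple comparisons" point concrete, here is a toy Python model (hypothetical, not how any specific Intel core is wired). In a unified TLB each entry carries its own page size, so every comparator has to shift the address by that entry's size before comparing. In a split TLB the page size is fixed, so the virtual page number is computed once and every comparison is the same width.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Entry:
    shift: int  # log2(page size): 12 for 4KB pages, 21 for 2MB pages
    vpn: int    # virtual page number = virtual address >> shift
    pfn: int    # physical frame number, in units of the same page size


def unified_lookup(entries: Iterable[Entry], va: int) -> Optional[int]:
    """Unified TLB: entries may map different page sizes, so each comparison
    first shifts the address by that entry's own page size.
    (Hardware compares all entries in parallel; the loop is only a model.)"""
    for e in entries:
        if va >> e.shift == e.vpn:
            offset = va & ((1 << e.shift) - 1)
            return (e.pfn << e.shift) | offset
    return None  # TLB miss


def split_lookup(tlb: Dict[int, int], shift: int, va: int) -> Optional[int]:
    """Split TLB: one fixed page size, so the VPN is computed once
    (va >> shift) and used as the key for every comparison."""
    pfn = tlb.get(va >> shift)
    if pfn is None:
        return None  # TLB miss
    return (pfn << shift) | (va & ((1 << shift) - 1))
```

For example, unified_lookup([Entry(12, 0x12345, 0x777), Entry(21, 0x200, 0x5)], 0x40001234) hits the 2MB entry and returns 0xA01234.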
 

ap0g33

Junior Member
Dec 12, 2014
So is it not valid to compare performance between 4KB and 2MB pages on the same CPU, since the number of TLB entries differs?
 

Nothingness

Diamond Member
Jul 3, 2013
The number of entries doesn't vary over time; these are distinct cache-like structures. Since a single 2MB entry maps 512 times as much memory as a 4KB entry, you can see a good speedup if you work with lots of data. Look at hugetlb in Linux, for instance.
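A quick way to see the speedup argument is TLB "reach" (entries × page size): how much memory can be touched without a TLB miss. The sketch below just multiplies the entry counts from the cpuid output in the first post. It assumes the 2M/4M entries hold 2MB pages (the x86-64 case; 4MB pages only occur with legacy 32-bit paging).

```python
KIB = 1024
MIB = 1024 * KIB

# (name, entries, page size in bytes), taken from the cpuid output above.
# Assumption: the 2M/4M entries are counted as 2MB pages.
TLBS = [
    ("data TLB, 4K pages", 64, 4 * KIB),
    ("data TLB, 2M pages", 32, 2 * MIB),
    ("instruction TLB, 4K pages", 64, 4 * KIB),
    ("instruction TLB, 2M pages", 7, 2 * MIB),
    ("L2 TLB, 4K pages", 512, 4 * KIB),
]


def reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every entry holds a translation."""
    return entries * page_size


if __name__ == "__main__":
    for name, entries, page_size in TLBS:
        print(f"{name}: {reach(entries, page_size) // KIB} KiB")
```

For the data TLB this gives 256 KiB with 4K pages versus 64 MiB with 2M pages: 256 times the coverage from half as many entries.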
 

ap0g33

Junior Member
Dec 12, 2014
I am trying to check whether Linux THP (transparent huge pages) can really provide performance gains for different workloads.

But since the number of TLB entries differs between 4KB and 2MB pages, I am afraid the comparison will not show exactly how much better huge pages are.
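For a THP on/off comparison it helps to confirm which mode is active before each run. On Linux the setting is exposed in /sys/kernel/mm/transparent_hugepage/enabled, which lists the modes with the active one in brackets (for example "always [madvise] never"). A small sketch that reads it; the helper names are hypothetical, and writing a new mode to that file needs root.

```python
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from text like 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Active THP mode, or None if the file is missing (not Linux, or no THP)."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

One way to use it: run the same benchmark once with "never" and once with "always", and check the AnonHugePages line in /proc/meminfo to see how much memory actually ended up backed by huge pages.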
 

Schmide

Diamond Member
Mar 7, 2002
In a virtual memory system you have to encode the page table, valid, attribute, and tag bits. With 4K pages you need 12 bits for the offset; with 2M/4M pages you need 21 or 22. I'm sure the above is some balance of this allocation.
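The offset widths follow from offset bits = log2(page size); everything above the offset is the virtual page number. A sketch that computes both, assuming 48-bit virtual addresses (x86-64 with 4-level paging) by default. Since 4MB pages only occur with 32-bit paging, the example evaluates them with a 32-bit address width.

```python
VA_BITS_X86_64 = 48  # assumption: 4-level paging


def offset_bits(page_size: int) -> int:
    """log2(page size) for a power-of-two page size."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    return page_size.bit_length() - 1


def vpn_bits(page_size: int, va_bits: int = VA_BITS_X86_64) -> int:
    """Bits left for the virtual page number once the offset is removed."""
    return va_bits - offset_bits(page_size)


if __name__ == "__main__":
    for name, size, va_bits in [("4K", 4 << 10, 48), ("2M", 2 << 20, 48),
                                ("4M", 4 << 20, 32), ("1G", 1 << 30, 48)]:
        print(f"{name}: {offset_bits(size)} offset bits, "
              f"{vpn_bits(size, va_bits)} VPN bits ({va_bits}-bit VA)")
```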
 

ap0g33

Junior Member
Dec 12, 2014
Good argument, Schmide.

So since huge-page TLB entries have a larger offset, it seems fair to compare huge pages against 4KB pages even though the number of TLB entries is different.

Thanks
 

sefsefsefsef

Senior member
Jun 21, 2007
The TLB doesn't care about the offset. It only cares about the tag, and the tag is whatever bits are left over when you mask off the offset (or do address>>12 for 4KB pages, or address>>21 for 2MB pages). The tricky thing about TLB design is that you can't know if a virtual address is part of a 4KB or 2MB page (and therefore which tag you should really be using) until AFTER the TLB lookup is complete, so you end up having to search for the tag among both the 4KB and 2MB TLB entries.
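The point about not knowing the page size until after the lookup can be shown with a toy model: two split TLBs sized like the ones in the first post (64 entries, 4-way for 4KB; 32 entries, 4-way for 2MB), each computing its own tag from the same address, and a translate() that has to probe both. Hypothetical sketch: taking the set index from the low VPN bits and evicting FIFO are assumptions, not documented Intel behavior.

```python
from typing import Dict, List, Optional, Tuple


class SetAssocTLB:
    """Toy set-associative TLB for ONE page size (hypothetical model)."""

    def __init__(self, shift: int, entries: int, ways: int) -> None:
        self.shift = shift                  # 12 for 4KB, 21 for 2MB
        self.ways = ways
        self.num_sets = entries // ways     # e.g. 64 entries / 4 ways = 16 sets
        self.sets: List[Dict[int, int]] = [{} for _ in range(self.num_sets)]

    def _index_and_tag(self, va: int) -> Tuple[int, int]:
        vpn = va >> self.shift              # drop the page offset
        return vpn % self.num_sets, vpn // self.num_sets

    def insert(self, va: int, pfn: int) -> None:
        index, tag = self._index_and_tag(va)
        ways = self.sets[index]
        if tag not in ways and len(ways) >= self.ways:
            del ways[next(iter(ways))]      # set full: evict the oldest entry
        ways[tag] = pfn

    def lookup(self, va: int) -> Optional[int]:
        index, tag = self._index_and_tag(va)
        pfn = self.sets[index].get(tag)
        if pfn is None:
            return None
        return (pfn << self.shift) | (va & ((1 << self.shift) - 1))


def translate(va: int, tlb_4k: SetAssocTLB, tlb_2m: SetAssocTLB) -> Optional[int]:
    """The page size of va is unknown before the lookup, so both TLBs are
    probed, each with its own tag (va >> 12 vs va >> 21). Hardware does
    this in parallel; at most one should hit."""
    for tlb in (tlb_4k, tlb_2m):
        pa = tlb.lookup(va)
        if pa is not None:
            return pa
    return None  # miss in both: a page walk would follow
```

With tlb_4k = SetAssocTLB(12, 64, 4) and tlb_2m = SetAssocTLB(21, 32, 4), any address inside a cached 2MB page hits only the 2MB structure, even though the 4KB structure was searched too.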
 

ap0g33

Junior Member
Dec 12, 2014
How can I find out my real TLB size (including offset)?

What I want to do is check the TLB size for 4KB pages and work out how many entries I would have if other sizes like 16KB, 32KB, 64KB... were supported.
 

JoeRambo

Golden Member
Jun 13, 2013
If you run cat /proc/meminfo on Linux, you will see the DirectMap lines (DirectMap4k, DirectMap2M, DirectMap1G); they tell how much of physical memory the kernel's direct mapping covers with each page size.
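Reading those lines from a script is straightforward. A minimal sketch (hypothetical helper names) that parses /proc/meminfo and keeps the DirectMap entries. Note they describe the page-size mix of the kernel's own linear mapping of physical memory, not TLB entry counts.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key to its first numeric field
    (kB for most keys; HugePages_* counts have no unit)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def direct_map_kb(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Only the DirectMap* lines (e.g. DirectMap4k, DirectMap2M, DirectMap1G)."""
    info = parse_meminfo(Path(path).read_text())
    return {k: v for k, v in info.items() if k.startswith("DirectMap")}


if __name__ == "__main__" and Path("/proc/meminfo").exists():
    for key, kb in direct_map_kb().items():
        print(f"{key}: {kb / 1024:.0f} MiB")
```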


TLBs these days handle 4K, 2M or 1G pages (1G is for the kernel, or some crazy micromanaged hugetlb setup). Overall I would not focus too much on them either. The reason is simple: a workload has to be fairly special to be hit hard by TLB misses. Most normal workloads get hurt by cache misses long before the TLB matters.

Ironically, the main use of THP (and hugetlb before it) was to deal with the way Oracle and co. used shared memory: each process had its own copy of the page tables, so for each process and each gigabyte of shared memory, page-table space for 262144 4KB pages had to be allocated. It was not uncommon to see 15%+ of system memory used for OS page tables.
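The 262144 figure is 1 GiB / 4 KiB. Assuming 8-byte x86-64 page-table entries and counting only the last level of the tables, that is 2 MiB of page tables per process per GiB of shared memory, versus 4 KiB with 2MB pages. A sketch of the arithmetic; the 500-process, 16 GiB setup is a hypothetical example.

```python
PTE_BYTES = 8        # x86-64 page-table entries are 8 bytes
GIB = 1 << 30


def leaf_table_bytes(mapped_bytes: int, page_size: int) -> int:
    """Bytes of lowest-level page-table entries needed to map a region
    (upper-level tables ignored; they are comparatively small)."""
    return (mapped_bytes // page_size) * PTE_BYTES


def total_overhead(processes: int, shared_bytes: int, page_size: int) -> int:
    """Every process maps the shared segment with its own page tables."""
    return processes * leaf_table_bytes(shared_bytes, page_size)


if __name__ == "__main__":
    print(GIB // 4096, "4KB PTEs per GiB")  # 262144
    print(leaf_table_bytes(GIB, 4096) // 1024, "KiB per GiB with 4KB pages")
    print(leaf_table_bytes(GIB, 2 << 20), "bytes per GiB with 2MB pages")
    # Hypothetical setup: 500 processes each mapping a 16 GiB shared segment.
    print(total_overhead(500, 16 * GIB, 4096) / GIB, "GiB of page tables")
```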
 

dmens

Platinum Member
Mar 18, 2005
Your original post had the info:

0x5a: data TLB: 2M/4M pages, 4-way, 32 entries
0x55: instruction TLB: 2M/4M pages, fully, 7 entries

etc.

The machine has a separate TLB for each page size; which page sizes actually get used depends on the paging bits the OS sets in the relevant control registers. The page sizes supported (as of today) are 4K, 2M, 4M and 1G, as per the PRM, chapter 4:

http://www.intel.com/content/dam/ww...eveloper-system-programming-manual-325384.pdf

For some reason Intel ARK doesn't show the number of TLB entries. Lame. :(
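If you want those numbers programmatically, the descriptor lines can be parsed. A minimal sketch, assuming the line format quoted in the first post (other versions of the cpuid tool, or other tools, print different text); TLBInfo and its field names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines like "0x5a: data TLB: 2M/4M pages, 4-way, 32 entries"
LINE_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+):\s*(.+?TLB):\s*([^,]+?)(?: pages)?,"
    r"\s*(fully|\d+-way),\s*(\d+) entries"
)


class TLBInfo(NamedTuple):
    descriptor: str       # cpuid leaf 2 descriptor byte, e.g. "0x5a"
    kind: str             # "data TLB", "instruction TLB", "L2 TLB", ...
    pages: str            # page sizes covered, e.g. "4K" or "2M/4M"
    ways: Optional[int]   # None means fully associative
    entries: int


def parse_cpuid_tlb(text: str) -> List[TLBInfo]:
    """Extract every TLB descriptor line; other lines are skipped."""
    infos = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        desc, kind, pages, assoc, entries = m.groups()
        ways = None if assoc == "fully" else int(assoc.split("-")[0])
        infos.append(TLBInfo(desc, kind, pages, ways, int(entries)))
    return infos
```

To feed it live data you could pass it the stdout of subprocess.run(["cpuid"], capture_output=True, text=True), provided the cpuid utility is installed.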