Here's a good explanation from Anand's review of the AMD 760MP chipset...
"The third improvement offered by the Athlon MP is a set of three enhancements to the processor's Translation Look-aside Buffers (TLBs). As taken from AMD?s tech docs on the Palomino core, the three TLB enhancements are:
1. The L1 Data TLB increases from 32 to 40 entries
2. Both the L2 Instruction TLB and L2 Data TLB use an exclusive architecture
3. TLB entries can be speculatively reloaded
As you will remember from our initial story on the Athlon 4 processor, the task of the TLB is to cache translated memory addresses. This translation process is necessary for the CPU to gain access to the data stored in main memory, and by caching the translated addresses, it becomes much quicker to find data in main memory.
The first improvement comes by increasing the number of entries in the L1 Data TLB. This increase allows for a greater hit rate (probability of finding what the CPU needs in the TLB) in the L1 Data TLB. You will also remember that the Pentium III has an L1 Data TLB with significantly more entries than even the new 40 entry TLB on the Athlon MP.
The next Athlon MP TLB enhancement comes by moving the L2 TLBs to an exclusive architecture. This means that data contained within the L1 TLBs is not duplicated in the L2 TLBs, which obviously saves space in the L2 TLBs, meaning that they can be used to store even more translated addresses. The downside to this exclusive architecture is that there is a latency sacrifice that is made since the addresses aren't duplicated in the L2 TLBs.
The final improvement is that the TLB entries can be speculatively reloaded. This means that in the event that an address is not found in the TLB, the address can be loaded into the TLB before the instruction that requested the address is finished executing. On older Athlon cores, this was not possible, resulting in a bit of a performance hit in this situation. According to AMD, this situation is usually observed in "high-end software applications."
In fact, AMD states that the TLB enhancements of the Athlon MP are most useful in these "high-end software applications." Hopefully, we will see whether or not they are correct with our benchmarks, which are composed of a number of very high-end tests."
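If it helps to see the exclusive TLB idea in action, here is a rough toy model in Python. It is only a sketch under my own assumptions, not how the Athlon MP actually implements anything: the class names (LruTlb, TwoLevelTlb), the LRU replacement policy, the 4 KiB page size, and the tiny 2-entry/4-entry sizes are all made up for illustration. It shows a TLB caching page translations, and compares an exclusive L2 (entries live in L1 or L2, never both) with one that keeps a duplicate of everything in L1.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages, purely for illustration


class LruTlb:
    """A tiny fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def lookup(self, vpn):
        """Return the cached frame number, or None on a miss."""
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return self.map[vpn]
        return None

    def insert(self, vpn, pfn):
        """Cache a translation; return the evicted (vpn, pfn) pair, if any."""
        self.map[vpn] = pfn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            return self.map.popitem(last=False)  # drop least recently used
        return None

    def remove(self, vpn):
        """Take a translation out of this level; return it, or None."""
        return self.map.pop(vpn, None)


class TwoLevelTlb:
    """L1 + L2 TLB; `exclusive` chooses whether L2 may duplicate L1 entries."""

    def __init__(self, l1_entries, l2_entries, exclusive):
        self.l1 = LruTlb(l1_entries)
        self.l2 = LruTlb(l2_entries)
        self.exclusive = exclusive
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pfn = self.l1.lookup(vpn)
        if pfn is not None:
            self.hits += 1
            return pfn * PAGE_SIZE + offset
        # L1 miss: an exclusive L2 hands the entry over, a duplicating L2 keeps it.
        pfn = self.l2.remove(vpn) if self.exclusive else self.l2.lookup(vpn)
        if pfn is None:
            self.misses += 1
            pfn = page_table[vpn]  # stands in for the slow page-table walk
            if not self.exclusive:
                self.l2.insert(vpn, pfn)  # duplicating design fills both levels
        else:
            self.hits += 1
        victim = self.l1.insert(vpn, pfn)
        if victim is not None and self.exclusive:
            self.l2.insert(*victim)  # L1 castoffs move down into L2
        return pfn * PAGE_SIZE + offset


def simulate(exclusive, pages=6, rounds=10):
    """Walk `pages` pages in a loop and return (hits, misses)."""
    page_table = {vpn: vpn + 100 for vpn in range(pages)}  # made-up mapping
    tlb = TwoLevelTlb(l1_entries=2, l2_entries=4, exclusive=exclusive)
    for _ in range(rounds):
        for vpn in range(pages):
            tlb.translate(vpn * PAGE_SIZE, page_table)
    return tlb.hits, tlb.misses


if __name__ == "__main__":
    print("exclusive  :", simulate(exclusive=True))
    print("duplicating:", simulate(exclusive=False))
```

With this deliberately contrived loop over 6 pages, the exclusive setup can hold all 6 translations (2 in L1 plus 4 in L2), so it only misses on the first pass. The duplicating setup effectively has just 4 distinct slots and keeps thrashing. Real workloads won't be anywhere near this clear-cut, and the toy model ignores the extra latency the article mentions, but it does show why not duplicating L1 entries frees up room in L2.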