Originally posted by: Jeff7181
To be fair, people often use incorrect terminology when talking about this. It's been a while since I read up on it, but more informed sources use the correct term, "physically address" rather than "access." Because as we all know, the Xeon is a 32-bit processor and is capable of using more than 4 GB of RAM. I'm guessing it's better (easier, faster, etc.) if the processor has the native ability to physically address more than 4 GB rather than have it be able to use more than 4 GB with software "tricks."
People confuse virtual space with memory.
From the software's point of view, it lives in virtual space, not in physical RAM. Every byte of data and code, every access to OS services and resources, has its neat little place in virtual space.
Every virtual "address" that is actually used is mapped, by the CPU's memory management unit and the OS, to some physical location: either in RAM or in swap on the hard disk. There is no need for any correlation of bit widths in this mapping.
16-bit software uses a virtual space composed of a number of 64 KB segments. The application itself needs to explicitly manage these segments and keep track of which segment every piece of data belongs to. This is contorted, to say the least.
Every used location inside these 16-bit segments is then mapped to some physical address - 24 bits wide, 32 bits wide, 36 bits wide, doesn't matter. This is business as usual; no penalties or software tricks here. The problems are entirely inside the application's own code, which has to deal with a multi-segmented virtual space.
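Purely as an illustrative sketch (this is not real 16-bit Windows code, and the segment numbers are made up): in a segmented model a "pointer" is really a (segment, offset) pair, and the application has to carry both parts around and split anything larger than 64 KB across segments by hand.

    #include <stdint.h>
    #include <stdio.h>

    /* A "far pointer" in a segmented model: two 16-bit halves. */
    typedef struct {
        uint16_t segment;  /* which 64 KB segment the data lives in    */
        uint16_t offset;   /* position of the data within that segment */
    } far_ptr;

    int main(void) {
        /* A 200 KB buffer does not fit in one 64 KB segment, so the app must
           split it over four segments and remember where each piece went. */
        far_ptr chunk[4];
        for (int i = 0; i < 4; i++) {
            chunk[i].segment = (uint16_t)(0x1000 + i);  /* made-up segment ids */
            chunk[i].offset  = 0;
        }
        long byte = 150000L;  /* which segment holds byte 150000 of the buffer? */
        printf("byte %ld lives in segment %#x, offset %ld\n",
               byte, chunk[byte / 65536].segment, byte % 65536);
        return 0;
    }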
32-bit software has 4 GB segments. This opens up the possibility of a software model that resides entirely inside one single flat 4 GB segment. The software can then be completely ignorant of the concept of segments and can assume that every address is singular and unique. It has a FLAT, linear virtual space. Address arithmetic is a breeze and performance goes up.
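For contrast with the segmented sketch above, here is the same kind of access in a flat model: a pointer is a single number and plain pointer arithmetic works across the whole buffer (ordinary C, nothing Windows-specific assumed).

    #include <stdlib.h>
    #include <stdio.h>

    int main(void) {
        char *buf = malloc(200 * 1024);  /* one contiguous 200 KB virtual range */
        if (buf == NULL)
            return 1;
        buf[150000] = 42;                /* no segment bookkeeping needed */
        printf("byte 150000 sits at %p\n", (void *)&buf[150000]);
        free(buf);
        return 0;
    }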
The OS can also now assume that every process has its own 32-bit segment, and can easily separate them this way.
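A small way to see that separation (a sketch only; whether the printed addresses actually match across runs depends on things like address-space randomization): two instances of the same program can report the same virtual address while holding different data there, because each process has its own private virtual space mapped onto different physical pages.

    #include <stdio.h>

    static int value;

    int main(int argc, char **argv) {
        (void)argv;
        value = argc;  /* give each instance something different to hold */
        printf("&value = %p, value = %d\n", (void *)&value, value);
        return 0;
    }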
So basically, what we have now is data in use inside a number of 32-bit virtual spaces, being mapped to 36-bit physical addresses.
Every used location inside these 32-bit segments is then mapped to a physical location. It works something like this:
The application wants to set up some piece of data and needs an 800 KB block for it in its virtual space, so it asks the OS to allocate 800 KB. The OS then finds two hundred (200) free 4 KB memory pages in RAM. These can be scattered anywhere and in any order. The OS also finds a previously unused 800 KB free block inside the app's virtual space. Then each sequential 4 KB piece of this 800 KB block is associated with one of the 4 KB pages in RAM. Finally, the OS hands the application the 32-bit number pointing to the start of the 800 KB block inside its own virtual space. The application never concerns itself with physical addresses. It actually cannot; it is cut off and isolated by the OS.
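In Win32 terms that request looks roughly like the sketch below, using the real VirtualAlloc/VirtualFree calls. The point is simply that the only thing ever handed back to the app is a virtual address; the OS picks the physical pages behind it.

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T size = 800 * 1024;  /* 800 KB = 200 pages of 4 KB */
        void *block = VirtualAlloc(NULL, size,
                                   MEM_RESERVE | MEM_COMMIT,
                                   PAGE_READWRITE);
        if (block == NULL)
            return 1;
        /* All the app ever sees is this virtual address; the physical
           pages behind it are scattered wherever the OS found room. */
        printf("virtual address of the 800 KB block: %p\n", block);
        VirtualFree(block, 0, MEM_RELEASE);
        return 0;
    }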
The OS may later want to swap out some rarely used 4 KB pages to the hard disk (to get more free RAM). It then changes the association for those pages so they point into swap instead. The software doesn't know anything about this; the virtual addresses inside its virtual space remain the same. If and when the app accesses data at such a virtual address, the OS will go - "Oops, that address is on a page in swap" - find a free 4 KB page in RAM, load the page from swap, and change the association so it points to the new page in RAM.
When the Windows 4 GB space then doesn't contain enough unfragmented blocks of addresses to represent everything needed (application code, application data, .dlls, shared data, OS APIs, OS resources - disc cache, AGP aperture...), we run into a barrier. This barrier has nothing directly to do with physical RAM or physical addressing. It has to do with the virtual space of our software model.
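One way to watch that barrier, sketched with the same VirtualAlloc call (run it as a 32-bit process; a 64-bit process has room for far longer): keep reserving large contiguous chunks of virtual space, without committing any physical RAM at all, until no contiguous free region is left.

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T chunk = 256 * 1024 * 1024;  /* 256 MB of virtual space per reservation */
        int count = 0;
        /* MEM_RESERVE claims virtual addresses only - no RAM, no swap. */
        while (count < 64 &&
               VirtualAlloc(NULL, chunk, MEM_RESERVE, PAGE_NOACCESS) != NULL)
            count++;
        printf("reserved %d x 256 MB of virtual space before running out\n", count);
        return 0;
    }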
Since we want to keep the single flat, linear space and all the advantages of that, we go to a 64-bit software model, featuring a 64-bit, 16 exabyte (EB) virtual space. This requires a new CPU with an instruction set that has 64-bit address fields.
Current AMD K8 CPUs feature the hardware to map a total of 1 TB of physical addresses, from somewhere in the lowest 256 TB of the virtual space. But WindowsXP64 will only map 16 GB from a 16 TB virtual space for a Windows64 app, if I'm correctly informed.
This does seem a bit constricted, but the scheme is expandable. The hardware concept, x86-64, is ultimately expandable to mapping 4 PB of physical memory from the full 16 EB virtual space, and any x86-64 software can ultimately run in that environment.
Note that this doesn't necessarily mean a Windows64 (software model) app will be able to use 4 PB from a 16 EB virtual space. But WindowsXP64 will probably be able to run on a future CPU featuring hardware support for mapping the 16 EB virtual space onto 4 PB of physical memory.
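The Windows limits quoted above are as I understand them; the sizes themselves are just powers of two, which a quick check makes concrete (the labels in the comments are the figures mentioned in this post plus the well-known architectural limits):

    #include <stdio.h>

    int main(void) {
        printf("2^32 = %llu bytes  (4 GB, one 32-bit virtual space)\n",    1ULL << 32);
        printf("2^36 = %llu bytes  (64 GB, 36-bit physical addressing)\n", 1ULL << 36);
        printf("2^40 = %llu bytes  (1 TB, K8 physical address limit)\n",   1ULL << 40);
        printf("2^48 = %llu bytes  (256 TB, K8 virtual address limit)\n",  1ULL << 48);
        printf("2^52 = %llu bytes  (4 PB, x86-64 paging format limit)\n",  1ULL << 52);
        /* 2^64 bytes = 16 EB, the full 64-bit virtual space, is one more
           than fits in a 64-bit integer, so it is stated rather than printed. */
        return 0;
    }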
The option of instead having a virtual space composed of multiple 32-bit segments is not at all attractive.
Note that in 64-bit computing we map from a larger virtual space to a smaller physical space, whereas the opposite is the case with 16- and 32-bit computing. 64-bit computing is more flexible and also solves the problem of fragmentation of the virtual space.