memory address

beginner99

Diamond Member
Jun 2, 2009
5,315
1,760
136
Probably the right place for this anyway. In my current course on the basics of operating systems we also looked into memory addresses and virtual memory; however, we never really went into detail about what is actually at an address.

As an example, say we want to access a file that is in memory. Is one address enough for this, assuming the file is unsegmented (contiguous) in memory? Or does each byte have to be addressed separately?

What about a 64-bit integer?
 
Jul 18, 2009
122
0
0
This is a Hard Question so I will oversimplify and also resort to the Rules of Old English Capitalization:

1) Addresses in Logical Address Space are always contiguous unless You, The Programmer, manage to fuck things up (and it will be entirely Your fault if You do).

2) Logical Address Space is divided into Pages. All the Logical Addresses on a given Page correspond to one contiguous block of Physical Memory (a Frame), but no two Pages are guaranteed to map to physically contiguous Frames.

2a) As a Programmer You will only ever work with Logical Addresses so it is not Your fucking Business whether or not Your Pages are physically contiguous.

3) Each CPU architecture may determine its own Memory Virtualization scheme, so the exact Rules of Memory Virtualization vary from one architecture to another (and they even vary between generations of an architecture, like how x86 only* had Protected Mode but x86-64 has both Protected Mode and Long Mode addressing schemes).

4) If I were to map a File to contiguous Logical Address Space then I would be able to read the entire File given only the Logical Address of the First Byte of said File (and I would possibly also need to know the Total Bytes in the File, unless it contains an EOF character I can read). All the contiguous bytes of the File would be mapped to contiguous bytes in Logical Space.** (There is a sketch of this below, after the footnotes.)

*Technically x86 had other modes that no one cares about anymore.

**Obviously I am ignoring all the complexities of a modern file system.

***Are You starting to get a headache yet? Because I can make it more complicated if You want.
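To make point 4 concrete, here is a minimal sketch in C (POSIX mmap; the file name is just an example of mine, nothing from this thread) of mapping a file to contiguous logical address space and then reading it with nothing more than the address of its first byte plus its length:

Code:
/* Minimal sketch: map a file into contiguous logical address space and read it
   given one pointer and a length. "example.txt" is a hypothetical file name. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("example.txt", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) < 0) return 1;            /* total bytes in the file */

    /* The kernel maps the file's pages somewhere in our logical address space
       and hands back one pointer: the logical address of the first byte. */
    const unsigned char *base =
        mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) return 1;

    /* Every byte of the file is now reachable as base[0] .. base[size-1],
       regardless of which physical frames (or disk blocks) back it. */
    for (off_t i = 0; i < st.st_size; i++)
        putchar(base[i]);

    munmap((void *)base, st.st_size);
    close(fd);
    return 0;
}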
 

beginner99

Diamond Member
Jun 2, 2009
5,315
1,760
136
Go ahead and make it more complicated.

I understand how you can translate from virtual to physical address but what do I find there?

I mean an address is a specific place within a page. Assume I want to read variable x, which is a 64-bit integer, and we are using 64-bit Windows.

How many addresses do I need to retrieve it? 1 I would assume?
How does the system know how many bits/bytes it must read from memory?
 
Jul 18, 2009
122
0
0
All eight bytes would have contiguous addresses in virtual space. Given a pointer to the first byte at &X, you could also read the values at &X+1, &X+2, ... &X+7.

The corresponding physical addresses to go along with &X and &X+7 are not guaranteed to be on the same page. There is a chance that &X could physically reside somewhere in your RAM, whereas &X+7 is on your hard drive in your disk cache.

Address translation is done automatically in hardware by the CPU. As a programmer it doesn't matter to you whether or not your data is physically contiguous, so long as it is logically contiguous.
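A tiny C sketch of that point, with names of my own choosing: the eight bytes of a 64-bit integer sit at eight consecutive virtual addresses, &X through &X+7.

Code:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t X = 0x1122334455667788ULL;
    const unsigned char *p = (const unsigned char *)&X;  /* address of byte 0 */

    for (int i = 0; i < 8; i++)                  /* &X+0 through &X+7 */
        printf("&X+%d -> 0x%02X\n", i, (unsigned)p[i]);

    /* Whether those virtual addresses land on one physical page or two
       (or partly in the page file) is invisible at this level. */
    return 0;
}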

 

martixy

Member
Jan 16, 2011
93
6
71
Read up on real and protected mode for some history.

P.S. The x64 extension allows greater data sizes and also allows addressing more than 4 GB (2^32 bytes) of memory. However, that does not automatically mean that you can address 2^64 bytes of data, due to various other limitations present in CPUs/motherboards for economic reasons.
 
Jul 18, 2009
122
0
0
Read up on real and protected mode for some history.

P.S. The x64 extension allows greater data sizes and also allows addressing more than 4 GB (2^32 bytes) of memory.

x64 actually adds a whole new CPU addressing mode called Long Mode, which is entirely distinct from (although essentially backwards compatible with) Protected Mode.

However, that does not automatically mean that you can address 2^64 bytes of data, due to various other limitations present in CPUs/motherboards for economic reasons.

Actually Long Mode is restricted to 2^48 addresses (= 2^48 bytes = 256 TB), not really for economic reasons but because Long Mode addressing is a superset of Physical Address Extension (PAE), and also because some of those 64 bits are used for other purposes like the NX Bit.

On an EFI motherboard it would probably be pretty trivial to hit the 256TB address limit assuming you can get your hands on a 256TB RAID with DMA support.
 

martixy

Member
Jan 16, 2011
93
6
71
Well, I read somewhere that a lot of motherboards don't even go to 2^48 because manufacturers don't wanna waste material or bandwidth or whatever to provide for a feature that is more or less practically impossible to utilize presently.
 

beginner99

Diamond Member
Jun 2, 2009
5,315
1,760
136
All eight bytes would have contiguous addresses in virtual space. Given a pointer to the first byte at &X, you could also read the values at &X+1, &X+2, ... &X+7.

The corresponding physical addresses to go along with &X and &X+7 are not guaranteed to be on the same page. There is a chance that &X could physically reside somewhere in your RAM, whereas &X+7 is on your hard drive in your disk cache.

Address translation is done automatically in hardware by the CPU. As a programmer it doesn't matter to you whether or not your data is physically contiguous, so long as it is logically contiguous.


Sorry, I still don't get it. One 64-bit integer: is one address enough to access it? How is it fetched? Get address xyz plus the 7 following bytes (in virtual memory), or are 8 addresses sent and then combined together?

I know about that picture and that in physical memory those 8 bytes can be separated. Still, it seems kind of a waste to store 8 addresses for one 64-bit int. Assuming 64-bit addresses, that would be 64 bytes per 64-bit int just for address pointers?
 

martixy

Member
Jan 16, 2011
93
6
71
There is an abstraction between virtual memory and its physical place of residence. That translation is the job of the OS and the CPU.

The program may only see a continuous range of memory.
Yet beyond the operating system, parts of that seemingly continuous range may reside in any number of locations in the RAM or on the HDD of the system, not necessarily themselves contiguous in any form or way.
 
Jul 18, 2009
122
0
0
You only need one address. There's a reason I keep harping on about "virtual addresses are contiguous" and "programmers don't care about that."

Let's say you write a program that has a 64 bit integer, stored in the address range 0xF000-0xF007. This is the VIRTUAL address, the one used internally by your program.

Let's also say that your CPU/memory architecture allows you to read data in 64-bit chunks. So you issue the instruction READ64 0xF000, which simultaneously reads everything in the range 0xF000-0xF007.

The CPU sees that you are accessing the memory range 0xF000-0xF007, so it sneaks in and replaces the addresses you're looking at, according to the Page Lookup Table.



For simplicity, say the first two hex digits of the address (0xF0) are the page number and the last two hex digits (0x00-0x07) are the offset. So the CPU looks up page number 0xF0 and finds that the corresponding physical frame number is, let's say, 0xAA.

So now the CPU changes your instruction to READ64 0xAA00, which reads every byte from 0xAA00 to 0xAA07. Note that this is the PHYSICAL address, which in a very simple computer would correspond to the actual physical cells in memory where your data is stored. You could say to yourself, "Oh, 0xAA00 is on the third chip on the second DIMM of my RAM. I could surgically apply a tiny current to those exact DRAM cells, and that would change the data stored in my program."
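Here is the same lookup as a C sketch, using the same made-up numbers (an 8-bit page number, an 8-bit offset, page 0xF0 mapped to frame 0xAA). Real page tables are far bigger and live in kernel memory; this only shows the arithmetic.

Code:
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 8           /* the last two hex digits are the offset */

static uint8_t page_table[256]; /* page number -> frame number */

static uint16_t translate(uint16_t vaddr)
{
    uint8_t page   = vaddr >> OFFSET_BITS;          /* 0xF000 -> 0xF0 */
    uint8_t offset = vaddr & ((1 << OFFSET_BITS) - 1);
    uint8_t frame  = page_table[page];              /* 0xF0 -> 0xAA   */
    return (uint16_t)((frame << OFFSET_BITS) | offset);
}

int main(void)
{
    page_table[0xF0] = 0xAA;
    /* READ64 0xF000 becomes READ64 0xAA00; 0xF007 becomes 0xAA07. */
    printf("0x%04X -> 0x%04X\n", 0xF000, translate(0xF000));
    printf("0x%04X -> 0x%04X\n", 0xF007, translate(0xF007));
    return 0;
}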

The situation only gets complicated when you read data that happens to lie across two different pages. Many CPU architectures require memory accesses to be aligned to their size, which guarantees a single aligned access never touches two different frames of physical memory; architectures that allow unaligned accesses (like x86) handle the split in hardware, at a performance cost.

I don't know if you are more or less confused now, but if you want, the rabbit hole keeps going down.
 
Jul 18, 2009
122
0
0
All personnel please clear the area immediately.

The program may only see a continuous range of memory.

"CONTINUOUS" IS A TERM IN NUMBER THEORY TO DESCRIBE A RANGE OF VALUES WHICH ARE BOTH CLOSED (CONTAINING ALL THEIR LIMITS) AND DENSE (GIVEN ANY TWO VALUES IN THE RANGE, THERE EXISTS A THIRD VALUE BETWEEN THOSE TWO VALUES THAT IS ALSO IN THAT RANGE).

"CONTIGUOUS" IS A TERM IN TOPOLOGY TO DESCRIBE WHEN TWO THINGS SHARE A BOUNDARY WITH NO GAP SEPARATING THEM.

COMPUTER ADDRESS SPACES CAN BE CONTIGUOUS, BUT THEY ARE NEVER CONTINUOUS.
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
Some more info on x86 that might help clarify: on x86, a machine "word" is defined as 16 bits / 2 bytes. On many other architectures (almost all of them), a "word" is 32 bits / 4 bytes. That means, on x86:

byte = 8 bits, the smallest unit the CPU can address when communicating with memory; that is to say, to access individual bits you have to read at least a byte, toggle the bits you want, then write back at least a whole byte

word = 2 bytes, or 16 bits

dword = 4 bytes, or 32 bits

qword = 8 bytes, or 64 bits

On some architectures, an address refers to a whole word, so a machine with 32 address lines (4 billion addresses) accessing a 4-byte word at each address totals 16 GB of physical RAM. On x86, each address refers to exactly one byte, so a 32-bit address space = 4 billion bytes (4 GB).

How much it reads at a given address is determined by the instruction. A byte instruction like "mov al, [0x00000000]" will read only that byte. "mov ax, [0x00000000]" will move 2 bytes from 0x00000000-0x00000001. "mov eax, [0x00000000]" will load the 4 bytes from 0x00000000-0x00000003. "mov rax, [0x00000000]" will load 8 bytes from 0x00000000-0x00000007.
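The same idea in C, if that helps (my own illustration, not code from this thread): how many bytes get fetched is decided by the type you read with, not by the address itself.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char mem[8] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};

    uint8_t  b; memcpy(&b, mem, 1);   /* like mov al,  [addr]: 1 byte  */
    uint16_t w; memcpy(&w, mem, 2);   /* like mov ax,  [addr]: 2 bytes */
    uint32_t d; memcpy(&d, mem, 4);   /* like mov eax, [addr]: 4 bytes */
    uint64_t q; memcpy(&q, mem, 8);   /* like mov rax, [addr]: 8 bytes */

    /* On x86 these print little-endian: 0x2211, 0x44332211, and so on. */
    printf("byte  0x%02X\n", (unsigned)b);
    printf("word  0x%04X\n", (unsigned)w);
    printf("dword 0x%08X\n", d);
    printf("qword 0x%016llX\n", (unsigned long long)q);
    return 0;
}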

You can access individual bytes, but it's not an efficient use of cycles. Also, word or larger reads that cross a word boundary are broken into two accesses (for example, mov eax, [0x00000001] has to read from 0x00000000 for the first 3 bytes and from 0x00000004 for the last byte). This is "word alignment".
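A quick way to check whether an access straddles a boundary, as a small C helper (the function is mine, not any real API):

Code:
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Does an access of `size` bytes starting at `addr` spill outside one
   naturally aligned `width`-byte word? */
static bool crosses_boundary(uint64_t addr, unsigned size, unsigned width)
{
    return (addr % width) + size > width;
}

int main(void)
{
    printf("%d\n", crosses_boundary(0x00000001, 4, 4)); /* 1: the mov eax,[0x00000001] case */
    printf("%d\n", crosses_boundary(0x00000000, 4, 4)); /* 0: aligned, single access        */
    return 0;
}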

This is all you need to know as a software programmer.

To further complicate matters you have cache. Regardless of whether you do a byte read or a quadword read, the CPU only physically reads/writes memory in fixed-size blocks (typically 64 bytes on a modern x86 CPU), known as a "cache line". This is completely transparent to the programmer; the only things you need to know are that it's there, how it works, and how to organize your reads/writes so as not to cause unnecessary cache misses and reloads, especially given the way addresses map onto cache lines (the term is "cache thrashing"), etc.
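A small sketch of that cache-line mapping, assuming 64-byte lines (typical for modern x86, but an assumption on my part, and the addresses are arbitrary examples):

Code:
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64

int main(void)
{
    uint64_t a = 0x1000;
    uint64_t b = 0x103F;
    uint64_t c = 0x1040;

    /* Two addresses hit the same cache line iff addr / LINE_SIZE matches. */
    printf("a line %llu, b line %llu, c line %llu\n",
           (unsigned long long)(a / LINE_SIZE),
           (unsigned long long)(b / LINE_SIZE),
           (unsigned long long)(c / LINE_SIZE));
    /* a and b share a line (one DRAM burst); c starts the next line. */
    return 0;
}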

The only reason you can't read 0x00000000 in practice is that it's a "NULL" address purely by convention: the first entry in the page table (or the first several) of every process is intentionally left invalid (not backed by real memory) for debugging purposes, so you know when you've accessed a zero memory address or an uninitialized pointer. Hence the term "null pointer" and the related access violations. This is purely convention; there is no physical reason why address 0x00000000 couldn't hold valid data.

Behind the scenes, you have virtual memory mapping: page tables are special data structures in RAM, managed by the OS and defined by the x86 architecture, that the hardware walks to perform virtual-to-physical lookups and raise page faults; the CPU caches recently used translations in its translation lookaside buffer (TLB). These structures, along with file system buffers and memory-mapped hardware, make up the majority of the memory taken by the OS kernel.

All processes have their own page table and page table directory, managed by the OS; these are the most important part of a task switch, in addition to the saved CPU registers, when the OS multitasks. When the address of the page table directory is swapped (system register CR3 on x86, if I recall) what WAS at 0x00000000-0x7FFFFFFF (your application) disappears and is instantly replaced by the contents of another process. This is multitasking.

0x80000000-0xFFFFFFFF is the same in all processes' page tables; this is the OS, and it is protected memory (ring 0 on x86). Trying to look at this memory in any form results in ???????? in a debugger. You cannot view this memory unless you are doing kernel/driver development with the Win32 DDK, and any attempt to access it from a normal program will cause an access violation and crash your app.

The process of mapping virtual memory addresses to the correct physical address signals on the buses is handled entirely in hardware, automatically, by the CPU (its memory management unit). All the OS needs to do is follow the protocol for setting up and loading the page tables correctly.
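For the curious, here is how classic 32-bit x86 (non-PAE) paging carves up a virtual address: 10 bits of page-directory index, 10 bits of page-table index, 12 bits of page offset. This C sketch only does the bit-slicing; the actual tables are walked by the MMU, and the example address is arbitrary.

Code:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t va = 0x00401234;                  /* arbitrary example address  */

    uint32_t dir_index   = (va >> 22) & 0x3FF; /* which page-directory entry */
    uint32_t table_index = (va >> 12) & 0x3FF; /* which page-table entry     */
    uint32_t offset      =  va        & 0xFFF; /* byte within the 4 KB page  */

    printf("PDE %u, PTE %u, offset 0x%03X\n", dir_index, table_index, offset);
    return 0;
}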

Why is this complex mechanism used at all, and why not just straight physical addressing? Notice how every application gets the same address space (the virtual one, the only one we ever need be concerned with unless writing an OS) and how multitasking, system sharing, data/code security, privileges, etc. become possible. Not having this, and having a straight flat physical address space, = DOS, where any app can write to the interrupt vector table, all apps have to fit in the address space at the same time and start at various locations, etc. Virtual memory addressing solves all these problems very neatly. You need not worry about an application altering another application's memory (unless deliberately, through system calls) or messing with the OS (because it can't see it); all processes start at the same address in their own virtual address sandbox and have the whole address space to themselves, etc.

As a programmer (using the 32-bit example), all processes have the same virtual memory addresses, 0x00000000 to 0x7FFFFFFF; this is the process's virtual address space and it is identical for all apps. Not all of this space is mapped at once. Initially only a very small portion is mapped, whatever the OS allocates (in multiples of 4 KB) to load the image (.exe), its resources, the initial stack, .dlls, the initial heap, etc.

Accessing any address in that range that isn't valid (nothing was ever allocated there) causes an invalid page fault and the process is terminated. Accessing memory with incorrect permissions (writing to read-only) causes a fault. Accessing a page that is valid but marked "not present" causes the same page fault, but the OS can examine the page table and see that there is in fact something there; it's just in the page file. Then it allocates or steals a page from another process (after writing its contents to the swap file and marking it not present in the donor process), retrieves 4 KB from the swap file, and resumes your program like the fault never happened.

Faults are really just built-in hardware interrupts raised by the CPU itself, based on information in the page tables, to transfer control to the OS, which then investigates why the fault occurred and takes action (either kill the process or retrieve the page from swap, etc.).
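The decision the OS makes on a fault boils down to something like the following. This is conceptual pseudocode in C, not Windows' (or any kernel's) real fault handler, and every name in it is made up:

Code:
#include <stdbool.h>
#include <stdio.h>

enum fault_action { KILL_PROCESS, LOAD_FROM_SWAP, PROTECTION_FAULT };

struct pte_info {            /* what the OS can learn from the page table   */
    bool valid;              /* was anything ever allocated at this page?   */
    bool present;            /* is it currently backed by a physical frame? */
    bool write_allowed;
};

static enum fault_action on_page_fault(struct pte_info pte, bool was_write)
{
    if (!pte.valid)
        return KILL_PROCESS;     /* invalid page fault: access violation    */
    if (!pte.present)
        return LOAD_FROM_SWAP;   /* valid but paged out: get a frame, read
                                    4 KB back from the swap file, mark it
                                    present, resume the program             */
    if (was_write && !pte.write_allowed)
        return PROTECTION_FAULT; /* e.g. writing to a read-only page        */
    return KILL_PROCESS;         /* anything else in this sketch: give up   */
}

int main(void)
{
    struct pte_info swapped_out = { true, false, true };
    printf("%d\n", on_page_fault(swapped_out, false)); /* 1 == LOAD_FROM_SWAP */
    return 0;
}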

New pages are mapped in as required. The keyword "new", for example, will return memory if there is some free memory in the heap, or, if the heap is used up, call the OS to request that more RAM be mapped into the current process, THEN return the new memory at the newly mapped virtual address. The OS merely makes a change to the page table, and instantly a 4 KB page of something appears in your address space. This can be memory-mapped hardware, a shared read-only section of library code (one physical 4 KB page entered into many page tables), heap memory requested with malloc/new, etc.

The whole address space can't be used by the application because the OS needs to be reachable in RAM on a context change. That is, the range from 0x80000000 to 0xFFFFFFFF is address space that has OS and hardware resources ready immediately when the privilege level changes on a system call, interrupt, etc. The CPU has to vector immediately to OS code from the currently running process, thus the OS must be immediately visible in the virtual address space of ALL running processes.
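As a concrete (if simplified) example of asking the OS to map new pages into the current process, here is the Win32 call VirtualAlloc used directly; a heap allocator does roughly this under the hood when the heap runs out. The sizes and usage here are just for illustration.

Code:
#include <stdio.h>
#include <windows.h>

int main(void)
{
    SIZE_T size = 4096;  /* one 4 KB page */

    /* Reserve and commit one page of readable/writable memory. The OS picks
       a free region of our virtual address space and updates the page tables. */
    unsigned char *p = VirtualAlloc(NULL, size,
                                    MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL) return 1;

    p[0] = 42;                       /* the page is now backed and usable */
    printf("page mapped at %p, first byte = %d\n", (void *)p, p[0]);

    VirtualFree(p, 0, MEM_RELEASE);  /* unmap it again */
    return 0;
}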

This is what is referred to as the 2/2 GB split (or the optional 3/1 GB split), and the reason an application does not have 4 GB to itself. This virtual address space crowding is *the* biggest reason for 64-bit, not necessarily physical RAM amount (which already went to 16+ GB on 32-bit with PAE). This is what causes System Properties to show only 2-3 GB of RAM when you have 4 GB installed. The push for 64-bit wasn't so much for physical RAM stick capacity as it was for the cramped virtual address space. You could have 16 GB of physical RAM on a 32-bit CPU, but an application can only see it through a virtual window of 2 GB at a time (from 0x00000000 to 0x7FFFFFFF). A 64-bit address space eliminates this problem even if you only have 4 GB of RAM installed.

PS: No regard for knowledge levels here, I'm assuming there is some level of understanding of programming/OS to be asking this question :)
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
That's the whole story above.

The simple answer for the direct question asked:

A memory address contains a single byte. The next address, another byte, and so on. Physically the CPU can, depending on the instruction used, access 1 byte or up to 16 bytes at a time (128 bit SIMD/MMX/SSE type vector instructions). The address provided for any of those instructions is simply the start of the data for a given size. If you are only accessing a byte at that address, all it loads is the one byte. If you were accessing 4 bytes at a time (32 bit movs) it would load 4 bytes, with the first one starting at the address you give, and you would increment your memory address by 4 each time to read successive groups of 4 bytes.

Let's say you have this 6-byte string of text in memory: Hello!
Each byte/character is at these addresses:

0x0000 'H'
0x0001 'e'
0x0002 'l'
0x0003 'l'
0x0004 'o'
0x0005 '!'

A byte read at 0x0003 produces the single byte 'l'. A 16-bit word read at the same address produces 'lo'. A 32-bit read at 0x0001 loads the register with 'ello'. A 64-bit mov from address 0x0000 to a 64-bit register would result in the register containing 'Hello!**', where * is whatever random garbage bytes are at addresses 0x0006 and 0x0007, since I didn't specify them (or they would be the bytes 0x0D 0x0A, carriage return + line feed, and the file would be 8 bytes total, if this were an ASCII text file created from, say, Notepad).

A 32-bit read at 0x0001 sees 'ello' and the next 32-bit read at 0x0002 sees 'llo!' if you only increment by 1 instead of 4. The memory address of the data being processed is always incremented by the size (in bytes, on x86) of each element.

Note: there is something called "endianness" (byte order, which byte is most significant) that I'm leaving out here for simplicity. That is to say, a 32-bit read from 0x0000 in the above example on x86 results in a register containing "lleH" and not "Hell", and this is purely a hardware convention having to do with which end of the register is loaded first, the big end or the little end. It's just convention that we read left to right, while register bits are ordered right to left, as in [31, 30, 29, ..., 3, 2, 1, 0]. x86 stuffs the register starting with the byte at the lowest address going into the little end of the register (bits 0-7) first, otherwise known as "little endian". This seems backwards, especially to someone who preaches RISC/big endian, but little endian is actually the "correct" way in a physical/electrical sense.
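If you want to see that on your own machine, here is a short C check of the 'Hello!' walkthrough (the buffer is mine; the two trailing bytes are the CR/LF pair mentioned above). On little-endian x86 the 32-bit read prints 0x6C6C6548, i.e. 'H','e','l','l' loaded little end first, which reads as "lleH" left to right.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char mem[8] = { 'H', 'e', 'l', 'l', 'o', '!', 0x0D, 0x0A };

    uint32_t reg32;
    memcpy(&reg32, &mem[0], 4);          /* 32-bit read at 0x0000           */
    printf("32-bit read at 0: 0x%08X\n", reg32);

    uint16_t reg16;
    memcpy(&reg16, &mem[3], 2);          /* 16-bit read at 0x0003: 'l','o'  */
    printf("16-bit read at 3: 0x%04X\n", (unsigned)reg16);

    uint64_t reg64;
    memcpy(&reg64, &mem[0], 8);          /* 64-bit read at 0x0000           */
    printf("64-bit read at 0: 0x%016llX\n", (unsigned long long)reg64);
    return 0;
}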
 

Modelworks

Lifer
Feb 22, 2007
16,240
7
76
Sorry, I still don't get it. One 64-bit integer: is one address enough to access it? How is it fetched? Get address xyz plus the 7 following bytes (in virtual memory), or are 8 addresses sent and then combined together?



Start with the hardware layer. Memory chips are designed with varying layouts.

Blocks - if I want to read 4 bits I have to read all the bytes that make up that block and discard what I don't want. Some blocks can be as large as 8 KB, so imagine the waste if you only need a few bits.

Sectors - groups of blocks that make up a set amount of storage. If you look at how memory chips are sold, they have designations like 16x64; basically the chip has 16 sectors, with each sector containing 64 blocks.

Pages - groups of sectors

The hardware is responsible for translating the chips internal layout into addresses that can be accessed to read or write information.

Internally it gets a lot more complicated. Suppose the stored information was written to two different sectors, with the data at the end of one sector and the start of another; the size doesn't matter, let's say 124 bits. The hardware would have to issue commands in order, like:
Convert 124 bits into the correct number of blocks for the chip used
Find which sector and block the start address of the data is in
Calculate which sector and block will hold the last part of the data
Select the start sector
Select the start block
Read the block until the sector boundary is encountered
Select the next sector
Select the block
Read the blocks until all the data is read
Remove any bits not needed for the requested data: software wanted 124 bits, we can only read 128 bits, so discard 4 bits
Return 124 bits of data to the program

This part of the hardware is called the MMU
http://en.wikipedia.org/wiki/Memory_management_unit


If the data is stored at 3 different addresses, then the process would be to get the start address for data part 1 and determine how many sectors and blocks it consumes, repeat for parts 2 and 3, then combine them into one value.

The OS would store the information like this, with each address and the size required to make up the total data:
0x401000, 72 kB
0x413000, 20 kB
0x41c000, 32 kB


To get the data it has to access each location, then combine them into one in another part of memory and return that value to the program. The more fragmented the data, the bigger the list of addresses and the more memory that reference table requires. When you go to 64-bit, 32-bit applications use more memory (not a lot, but more). The reason for this is that the addresses get longer and it takes more memory to store the reference addresses for the data. Regardless of whether the program is using more than 32 bits' worth of memory, the OS still has to store the reference values of the locations in its memory, and if those happen to be in the upper part of memory then the values to store get larger.

Above would become:
0xC001401000, 72 kB
0xC001413000, 20 kB
0xC00141c000, 32 kB

The data at the addresses are still the same amount but the addresses got longer.
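Purely to illustrate that bookkeeping cost (the struct names are mine, not any OS's real data structures): a list of (address, length) entries simply gets wider per entry once the addresses are stored as 64-bit values instead of 32-bit ones.

Code:
#include <stdint.h>
#include <stdio.h>

struct extent32 { uint32_t addr; uint32_t len_bytes; };
struct extent64 { uint64_t addr; uint64_t len_bytes; };

int main(void)
{
    /* The three example pieces from the post, with 64-bit addresses. */
    struct extent64 parts[3] = {
        { 0xC001401000ULL, 72 * 1024 },
        { 0xC001413000ULL, 20 * 1024 },
        { 0xC00141c000ULL, 32 * 1024 },
    };

    printf("per-entry cost: %zu bytes (32-bit addresses) vs %zu bytes (64-bit)\n",
           sizeof(struct extent32), sizeof(struct extent64));
    printf("table for %zu parts: %zu bytes\n",
           sizeof parts / sizeof parts[0], sizeof parts);
    return 0;
}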

If the memory is not fragmented, then one address (plus a length) can cover any amount of data:
0xffff, 100 GB

It isn't the most efficient of systems, but it works :)
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
Sorry, I still don't get it. One 64-bit integer: is one address enough to access it? How is it fetched? Get address xyz plus the 7 following bytes (in virtual memory), or are 8 addresses sent and then combined together?

Forget virtual memory; it really doesn't matter here, other than recognizing that the address the application requests and the address that comes out on the address pins of the bus will differ due to translation.

How much it can physically access per cycle is determined not by the address bus but by the data bus. A 64-bit CPU with 64-bit registers and a 64-bit address space can have a 32-bit DATA bus. You issue one instruction to read a 64-bit int from a single address, and the CPU ends up issuing two 32-bit bus cycles, for your address and address+4. A perfect example of this is the 8088: it's a 16-bit CPU with 20-bit addresses, but it only has an 8-bit data bus. "mov ax, [address]" results in a 16-bit read that requires two bus transactions to complete on an 8088, but only one transaction on an 8086 (the same exact thing as the 8088, but with 16 data lines instead of 8).

A true 64 bit CPU with a 64 bit address AND data bus would access all 8 bytes in one bus cycle with one address generation. The address of the first byte is put out on the address lines, and all 64 bits of the data bus are connected to the 8 bytes in memory starting with the one at your address. *(this is responsible for 2 below)

Multiple bus accesses are only required on two occasions:

1) if the data size is larger than the physical data bus size.

2) Typically, to save cost and complexity, the hardware is wired to address memory in multiples of the data bus width. If the address of your 8 bytes is not evenly divisible by the size of the data bus ("straddles a 64-bit boundary", or "isn't 64-bit aligned" in programming lingo), two accesses are required, even though the CPU is capable of one, since the addressing hardware is optimized to address only multiples of the data bus size and not just anywhere arbitrarily. 0x00000001 is a valid address for reading an INT64, but it will be slower than reading from 0x00000000 or 0x00000008 because of the split, due to the way the hardware works. (A quick way to count the resulting bus transactions is sketched below.)
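A small C helper (mine, purely illustrative) that counts how many bus transactions an access needs when the data bus can only transfer naturally aligned chunks of a given size:

Code:
#include <stdint.h>
#include <stdio.h>

static unsigned bus_cycles(uint64_t addr, unsigned size, unsigned bus_bytes)
{
    uint64_t first_chunk = addr / bus_bytes;
    uint64_t last_chunk  = (addr + size - 1) / bus_bytes;
    return (unsigned)(last_chunk - first_chunk + 1);
}

int main(void)
{
    /* 64-bit int on a 64-bit data bus: aligned vs. straddling a boundary. */
    printf("%u\n", bus_cycles(0x00000000, 8, 8));  /* 1 cycle            */
    printf("%u\n", bus_cycles(0x00000001, 8, 8));  /* 2 cycles (split)   */
    /* 64-bit int on a 32-bit data bus, like the example above.          */
    printf("%u\n", bus_cycles(0x00000000, 8, 4));  /* 2 cycles           */
    /* 16-bit read on the 8088's 8-bit data bus.                         */
    printf("%u\n", bus_cycles(0x00000000, 2, 1));  /* 2 cycles           */
    return 0;
}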

In reality no modern CPU accesses memory in individual bytes like this, but in large bursted blocks known as "cache lines". The time it takes for DRAM to select rows and columns, plus the setup time for the read/write, is too long; so when you read some bytes, the CPU generates the address once, for the entire cache line your address lies in, and bursts the whole cache line into cache (many bytes at a time) even if you're just reading 1 byte. So in the real world, on a modern CPU, your 64-bit read from one address loads something like 64 or 128 bytes, to minimize the impact of the slow DRAM and amortize the DRAM setup time. DRAM is a sequential burst technology; it's not very good at randomly hopping around, due to the way it's organized in rows and columns with various latencies to start a new transaction.
 