Inside a DB File

OogyWaWa

Senior member
Jan 20, 2009
623
0
71
I've been googling this for a while, but haven't come up with any decent results. Does anyone have a reference to a white paper or site that explains the internals of a DBMS managed database file. I'm thinking there won't be one for MSSQL, Oracle, etc. but for the open source systems MySQL, SQLite, etc. I would think there is some documentation on this.

What I'm looking for is something that explains what the inside of a DB file looks like. The actual file that holds the table information, row values, etc. I realize a lot of it would probably look like garbage since the logic is built into the DBMS, but i'm just curious as to HOW different it is from a flat file (XML, CSV, etc).

Thanks :)
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
As you point out, a lot depends on the DB. You might want to start with a database text.

Briefly, a table consists of a plurality of pages. Each page contains a tiny amount of metadata, e.g., some lock bits, pointers to other pages, etc., but mostly a page contains individual records, laid out in a DBMS-specific format. A DB page corresponds to a plurality of disk pages, in the even that your DBMS operates on raw disk, or may correspond to a file in the OS's file system otherwise.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Not only will it depend on the DBMS you're using, but many support multiple storage engines as well. For example, mysql supports InnoDB and MYISAM among others.
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
Not only will it depend on the DBMS you're using, but many support multiple storage engines as well. For example, mysql supports InnoDB and MYISAM among others.

Is this unique to MySQL since it is opensource? Or do other DBMSs support different engines to make switching easier?
 

Markbnj

Elite Member <br>Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Is this unique to MySQL since it is opensource? Or do other DBMSs support different engines to make switching easier?

So far MySQL is the only DBMS I've run into that supports switching the storage engine, but I certainly haven't used them all.
 

brandonb

Diamond Member
Oct 17, 2006
3,731
2
0
Microsoft has listed in their SQL books online the nature of their files.

It's spread across multiple help pages, but here is an idea:

Microsoft said:
The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level. That is, SQL Server reads or writes whole data pages.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.

Pages
In SQL Server, the page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.

The following table shows the page types used in the data files of a SQL Server database.

Page type Contents
Data
Data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.

Index
Index entries.

Text/Image
Large object data types:

text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data


Variable length columns when the data row exceeds 8 KB:

varchar, nvarchar, varbinary, and sql_variant



Global Allocation Map, Shared Global Allocation Map
Information about whether extents are allocated.

Page Free Space
Information about page allocation and free space available on pages.

Index Allocation Map
Information about extents used by a table or index per allocation unit.

Bulk Changed Map
Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.

Differential Changed Map
Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit.

Note:
Log files do not contain pages; they contain a series of log records.



Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.


Large Row Support
Rows cannot span pages, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. This restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8060 byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page. For more information, see Row-Overflow Data Exceeding 8 KB.

Extents
Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.

To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:

Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.


Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.


A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

Of course there are no pictures in my quote, and has more details on other pages about each thing they talk about.
 
Last edited:

Crusty

Lifer
Sep 30, 2001
12,684
2
81
So far MySQL is the only DBMS I've run into that supports switching the storage engine, but I certainly haven't used them all.

I was under the impression that PostgreSQL had the same features, but after a quick search it turns out that it doesn't. MySQL very well might be the only large DMBS that supports multiple storage engines.
 

Markbnj

Elite Member <br>Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
I was under the impression that PostgreSQL had the same features, but after a quick search it turns out that it doesn't. MySQL very well might be the only large DMBS that supports multiple storage engines.

Yeah, Postgre was the main one that I haven't used, that I thought might support it. I doubt any of the mainstream commercial engines like Oracle/Sybase/SQL Server would have any incentive to do so.