How do DBMSes work?

Red Squirrel · Mar 14, 2010

I will be writing a custom DBMS that will be custom tailored for a specific application, unlike something like mysql, the data is not really structured in a way where it makes sense to use mysql or other such DB.

I want to read up on how DBMSes work as far as how they store the data to disk, and how they do it so fast and efficiently, to give me an idea of techniques I can use.

For example how does all the table data get stored into a single file, and how does it get updated without having to read the entire file into memory then rewriting it, each time? I can't seem to find much info online on such techniques, so I'm basically just looking for stuff I can read.

Another good example is the MS Exchange store. It's a single file yet stored tons of data and is fairly efficient at being accessed.

KLin · Mar 15, 2010

Sounds like a bunch of a work with not a lot of value. Why not use a database engine already built that has tons of support?

Schmide · Mar 15, 2010

Up until this century it was mostly ISAM (Indexed Sequential Access Method) but recently there has been a divergence into virtual and very tabled algorithms. IBM calls its method method VSAM (Virtual storage access method). MySQL calls its method MySAM.

No matter what you call it, it's a bunch of tables and hashes pointing to chunks of memory.

chusteczka · Mar 15, 2010

If you really want to create a novel database, then it may be useful to peruse the ACM Digital Library.

A search for "how a database stores data to disk" turned up the following papers.
Storing semi-structured data on disk drives
StorHouse metanoia - new applications for database, storage & data warehousing
The next database revolution
A data warehouse appliance for the mass market
Office documents on a database kernel—filing, retrieval, and archiving

Another useful search term would be "parallel I/O" (input/output) or "parallel file system".

GodlessAstronomer · Mar 15, 2010

If you data isn't suited to traditional relational DBs you may want to consider CouchDB or MongoDB, a pair of interesting document-oriented databases.

degibson · Mar 15, 2010

I need to throw the brakes on this one, Red. I understand an innate need to write things like connection libraries and floating-point printing routines (sometimes I like to do stuff like that, too), but a DBMS is rivaled only by an OS in size and complexity.

For a guide on how a basic DBMS works, I recommend my old college text (the cow book).
http://highered.mcgraw-hill.com/sites/0072465638/
http://pages.cs.wisc.edu/~dbbook/

For more practical backend details, I recommend google and/or this forum via specific questions.

For example how does all the table data get stored into a single file, and how does it get updated without having to read the entire file into memory then rewriting it, each time?

1. mmap()
2. Not all DBs use files. Some use raw unparititioned space. DBs are usually smarter than the OS as far as I/O scheduling... or at least think they are.
3. As far as in-place updates, well, thats complicated and requires an understanding of the various systems within a DB. In short, all data is organized on to pages (not necessarily the same size as a disk block) in a "buffer pool". Records are overlaid on top of pages by the record manager. If the DB is ACID, a log of updates is written and flushed before buffer pool pages are released, etc.

Markbnj · Mar 15, 2010

Red might be using the term "DBMS" somewhat loosely, but still I have to toss my lot in with the other repliers and suggest that there must be something out there already that is suited to your data. Of course, if it's truly a unique requirement... have fun 🙂.

Red Squirrel · Mar 15, 2010

Yeah DBMS may be the wrong term for what I n eed. Basically it will be an app that runs on a tcp port and accepts data and stores it to disk, updating existing records as needed. I'll read up on what has been posted so far.

Crusty · Mar 15, 2010

RedSquirrel said:
Yeah DBMS may be the wrong term for what I n eed. Basically it will be an app that runs on a tcp port and accepts data and stores it to disk, updating existing records as needed. I'll read up on what has been posted so far.

You mean a web service?

Red Squirrel · Mar 15, 2010

Crusty said:
You mean a web service?

Web service is too vague. That could be a web server, a mail server etc...

My app will be for serialzied data storage.

HardcoreRobot · Mar 15, 2010

Reinvent the Wheel.

Dont do it.

Crusty · Mar 15, 2010

RedSquirrel said:
Web service is too vague. That could be a web server, a mail server etc...

My app will be for serialzied data storage.

No, not quite. A mail server is a mail server, just like a web server is a web server.

Wikipedia said:
Web services are typically application programming interfaces (API) or web APIs that can be accessed over a network, such as the Internet, and executed on a remote system hosting the requested services.

What you described fits the definition of a web service perfectly. You want your users to be able to make changes to your data with a confined set of restrictions. That is exactly what a web service is designed to do.

If you are in the .NET world you can look into using WCF if you need to offer compatibility for legacy services via SOAP while supporting other RPC mechanisms like namedpipes or TCP channels.

Nothinman · Mar 15, 2010

I will be writing a custom DBMS that will be custom tailored for a specific application, unlike something like mysql, the data is not really structured in a way where it makes sense to use mysql or other such DB

Here we go again...

You've obviously got some structure to your data because you mention records so what's wrong with MySQL or even SQLite?

Cogman · Mar 15, 2010

HardcoreRobot said:
Reinvent the Wheel.

Dont do it.

There's nothing wrong with reinventing the wheel, so long as it is something done in your spare time (or for a class).

HardcoreRobot · Mar 15, 2010

Cogman said:
There's nothing wrong with reinventing the wheel, so long as it is something done in your spare time (or for a class).

good point. i assumed non-educational purposes.

sourceninja · Mar 15, 2010

Every time I come here I seem to find Red asking a very complex question for some huge project where the solution seems to have already been solved by people much smarter then all of us. What gives?

Red Squirrel · Mar 15, 2010

Nothinman said:
Here we go again...

You've obviously got some structure to your data because you mention records so what's wrong with MySQL or even SQLite?

I'm currently using MySQL for this, but I'm pretty much killing the DB server. My data is not structured, it's a header, then a variable lenght stream of serialized data. Ex:

Code:

INSERT INTO `items` (`iserial`, `iupdatecounter`, `itypeid`, `itypestr`, `iserializedata`) VALUES (1073799188, 0, 193, 'Server.Items.LargeCrate', '0;68554768;1491786;5492;576;58;3644;0;1;0;2;0;2;0;0;0;5;00;100;0;0;00;');

The last field can range anywhere from a few chars, to 100k chars. Idealy this data should be "compressed" to binary. Ex a 32-bit int only needs to take 4 chars. That part will be easy, I wrote a class to take care of this.

Markbnj · Mar 15, 2010

sourceninja said:
Every time I come here I seem to find Red asking a very complex question for some huge project where the solution seems to have already been solved by people much smarter then all of us. What gives?

If he told you, he'd have to... yeah, you know.

BoberFett · Mar 15, 2010

RedSquirrel said:
I'm currently using MySQL for this, but I'm pretty much killing the DB server. My data is not structured, it's a header, then a variable lenght stream of serialized data. Ex:

Code:

INSERT INTO `items` (`iserial`, `iupdatecounter`, `itypeid`, `itypestr`, `iserializedata`) VALUES (1073799188, 0, 193, 'Server.Items.LargeCrate', '0;68554768;1491786;5492;576;58;3644;0;1;0;2;0;2;0;0;0;5;00;100;0;0;00;');

The last field can range anywhere from a few chars, to 100k chars. Idealy this data should be "compressed" to binary. Ex a 32-bit int only needs to take 4 chars. That part will be easy, I wrote a class to take care of this.

What's wrong with BLOBs?

Alternatively, why not just use files if you think a database can't handle it?

As I think about it, I have to wonder what on earth you could be doing that a modern database can't handle that you think your own code could?

Red Squirrel · Mar 15, 2010

BoberFett said:
What's wrong with BLOBs?

Alternatively, why not just use files if you think a database can't handle it?

As I think about it, I have to wonder what on earth you could be doing that a modern database can't handle that you think your own code could?

That's essentially what my system will do, is use file(s).

degibson · Mar 15, 2010

RedSquirrel said:
That's essentially what my system will do, is use file(s).

So what more will your system do than will be done by the file system?

sourceninja · Mar 15, 2010

Maybe you would like a noSQL database? http://en.wikipedia.org/wiki/NoSQL

KLin · Mar 16, 2010

OP needs to post more details about what he's wanting to do.

Schmide · Mar 16, 2010

KLin said:
OP needs to post more details about what he's wanting to do.

The same thing we do every night, Pinky - try to take over the world!

Nothinman · Mar 16, 2010

Schmide said:
The same thing we do every night, Pinky - try to take over the world!

Or in RS' case, re-implement the world...

How do DBMSes work?

No Lifer

Lifer

Diamond Member

Diamond Member

Lifer

Golden Member

Elite Member <br>Moderator Emeritus

No Lifer

Lifer

No Lifer

Lifer

Lifer

Elite Member

Lifer

Lifer

Diamond Member

No Lifer

Elite Member <br>Moderator Emeritus

Lifer

No Lifer

Golden Member

Diamond Member

Lifer

Diamond Member

Elite Member