• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

How do DBMSes work?

Red Squirrel

No Lifer
I will be writing a custom DBMS that will be custom tailored for a specific application, unlike something like mysql, the data is not really structured in a way where it makes sense to use mysql or other such DB.

I want to read up on how DBMSes work as far as how they store the data to disk, and how they do it so fast and efficiently, to give me an idea of techniques I can use.

For example how does all the table data get stored into a single file, and how does it get updated without having to read the entire file into memory then rewriting it, each time? I can't seem to find much info online on such techniques, so I'm basically just looking for stuff I can read.

Another good example is the MS Exchange store. It's a single file yet stored tons of data and is fairly efficient at being accessed.
 
Sounds like a bunch of a work with not a lot of value. Why not use a database engine already built that has tons of support?
 
Up until this century it was mostly ISAM (Indexed Sequential Access Method) but recently there has been a divergence into virtual and very tabled algorithms. IBM calls its method method VSAM (Virtual storage access method). MySQL calls its method MySAM.

No matter what you call it, it's a bunch of tables and hashes pointing to chunks of memory.
 
If you really want to create a novel database, then it may be useful to peruse the ACM Digital Library.

A search for "how a database stores data to disk" turned up the following papers.
Storing semi-structured data on disk drives
StorHouse metanoia - new applications for database, storage & data warehousing
The next database revolution
A data warehouse appliance for the mass market
Office documents on a database kernel—filing, retrieval, and archiving

Another useful search term would be "parallel I/O" (input/output) or "parallel file system".
 
Last edited:
I need to throw the brakes on this one, Red. I understand an innate need to write things like connection libraries and floating-point printing routines (sometimes I like to do stuff like that, too), but a DBMS is rivaled only by an OS in size and complexity.

For a guide on how a basic DBMS works, I recommend my old college text (the cow book).
http://highered.mcgraw-hill.com/sites/0072465638/
http://pages.cs.wisc.edu/~dbbook/

For more practical backend details, I recommend google and/or this forum via specific questions.

For example how does all the table data get stored into a single file, and how does it get updated without having to read the entire file into memory then rewriting it, each time?

1. mmap()
2. Not all DBs use files. Some use raw unparititioned space. DBs are usually smarter than the OS as far as I/O scheduling... or at least think they are.
3. As far as in-place updates, well, thats complicated and requires an understanding of the various systems within a DB. In short, all data is organized on to pages (not necessarily the same size as a disk block) in a "buffer pool". Records are overlaid on top of pages by the record manager. If the DB is ACID, a log of updates is written and flushed before buffer pool pages are released, etc.
 
Red might be using the term "DBMS" somewhat loosely, but still I have to toss my lot in with the other repliers and suggest that there must be something out there already that is suited to your data. Of course, if it's truly a unique requirement... have fun 🙂.
 
Yeah DBMS may be the wrong term for what I n eed. Basically it will be an app that runs on a tcp port and accepts data and stores it to disk, updating existing records as needed. I'll read up on what has been posted so far.
 
Yeah DBMS may be the wrong term for what I n eed. Basically it will be an app that runs on a tcp port and accepts data and stores it to disk, updating existing records as needed. I'll read up on what has been posted so far.

You mean a web service?
 
Web service is too vague. That could be a web server, a mail server etc...

My app will be for serialzied data storage.

No, not quite. A mail server is a mail server, just like a web server is a web server.

Wikipedia said:
Web services are typically application programming interfaces (API) or web APIs that can be accessed over a network, such as the Internet, and executed on a remote system hosting the requested services.

What you described fits the definition of a web service perfectly. You want your users to be able to make changes to your data with a confined set of restrictions. That is exactly what a web service is designed to do.

If you are in the .NET world you can look into using WCF if you need to offer compatibility for legacy services via SOAP while supporting other RPC mechanisms like namedpipes or TCP channels.
 
I will be writing a custom DBMS that will be custom tailored for a specific application, unlike something like mysql, the data is not really structured in a way where it makes sense to use mysql or other such DB

Here we go again...

You've obviously got some structure to your data because you mention records so what's wrong with MySQL or even SQLite?
 
Every time I come here I seem to find Red asking a very complex question for some huge project where the solution seems to have already been solved by people much smarter then all of us. What gives?
 
Here we go again...

You've obviously got some structure to your data because you mention records so what's wrong with MySQL or even SQLite?

I'm currently using MySQL for this, but I'm pretty much killing the DB server. My data is not structured, it's a header, then a variable lenght stream of serialized data. Ex:

Code:
INSERT INTO `items` (`iserial`, `iupdatecounter`, `itypeid`, `itypestr`, `iserializedata`) VALUES (1073799188, 0, 193, 'Server.Items.LargeCrate', '0;68554768;1491786;5492;576;58;3644;0;1;0;2;0;2;0;0;0;5;00;100;0;0;00;');

The last field can range anywhere from a few chars, to 100k chars. Idealy this data should be "compressed" to binary. Ex a 32-bit int only needs to take 4 chars. That part will be easy, I wrote a class to take care of this.
 
Every time I come here I seem to find Red asking a very complex question for some huge project where the solution seems to have already been solved by people much smarter then all of us. What gives?

If he told you, he'd have to... yeah, you know.
 
I'm currently using MySQL for this, but I'm pretty much killing the DB server. My data is not structured, it's a header, then a variable lenght stream of serialized data. Ex:

Code:
INSERT INTO `items` (`iserial`, `iupdatecounter`, `itypeid`, `itypestr`, `iserializedata`) VALUES (1073799188, 0, 193, 'Server.Items.LargeCrate', '0;68554768;1491786;5492;576;58;3644;0;1;0;2;0;2;0;0;0;5;00;100;0;0;00;');

The last field can range anywhere from a few chars, to 100k chars. Idealy this data should be "compressed" to binary. Ex a 32-bit int only needs to take 4 chars. That part will be easy, I wrote a class to take care of this.

What's wrong with BLOBs?

Alternatively, why not just use files if you think a database can't handle it?

As I think about it, I have to wonder what on earth you could be doing that a modern database can't handle that you think your own code could?
 
Last edited:
What's wrong with BLOBs?

Alternatively, why not just use files if you think a database can't handle it?

As I think about it, I have to wonder what on earth you could be doing that a modern database can't handle that you think your own code could?

That's essentially what my system will do, is use file(s).
 
Back
Top