How do DBMSes work?


degibson

Golden Member
Mar 21, 2008
1,389
0
0
Or in RS' case, re-implement the world...

Well, I for one am interested in what RedSquirrel is up to. Re-implementations have value, after all, both from the perspective of improving one's own skill, and they can also yield improvements in the target implementations. In this particular case, I'm just not sure what the heck RS is trying to do. :)
 

Markbnj

Elite Member
Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
I think when someone states they are trying to do something, and others see little value in it, they're going to hear back on that. But once the point has been made, and the person persists in pursuit of the goal, then we shrug and help 'em out.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Well, I for one am interested in what RedSquirrel is up to. Re-implementations have value, after all, both from the perspective of improving one's own skill, and they can also yield improvements in the target implementations. In this particular case, I'm just not sure what the heck RS is trying to do. :)

I agree that they can have value when done to fix a certain problem in the old implementation that can't easily be fixed. But generally when RS goes down one of these paths he's trying to fix problems that aren't really problems.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
I agree that they can have value when done to fix a certain problem in the old implementation that can't easily be fixed. But generally when RS goes down one of these paths he's trying to fix problems that aren't really problems.

Yea, but every time RS does this, he learns a helluva lot about the problem domain. I can't speak for the rest of the forum, but I like looking at problems from a new perspective, too -- and I usually manage to learn something. That's part of the value of these re-implementations -- just plain ol' experience! It's no inconvenience to us to help out, after all.
 

Red Squirrel

No Lifer
May 24, 2003
70,583
13,805
126
www.anyf.ca
Yea, but every time RS does this, he learns a helluva lot about the problem domain. I can't speak for the rest of the forum, but I like looking at problems from a new perspective, too -- and I usually manage to learn something. That's part of the value of these re-implementations -- just plain ol' experience! It's no inconvenience to us to help out, after all.

This is actually more or less why I want to know as well, for the learning experience. I'm sure something like MySQL must have all sorts of advanced techniques for writing to the DB files and formatting them in a way that makes it easy to retrieve information from disk fast, as well as update it fast. I know there's a concept called a lazy writer; that's one technique used so a query similar to a past query can be answered quickly. I'm sure there are tons of other techniques used as well.


For more background, here is how the app worked before I touched it:

The app, which is a game server, loads everything from serialized files. This is probably the fastest possible approach for this type of app and data. To change the file, though, the ENTIRE data set had to be written out again, completely. That process is fast enough on its own, given there is very little overhead. The problem is that, to avoid corruption, it had to halt the entire app to do the save. As the data set grows, so do the saves. If the program crashes or is stopped prematurely, the game data has to be reverted to the last save, which makes it important to save often enough. If someone killed a very hard boss, got a very rare item, and the server crashes before a save, I have a very pissed off player!

So my fix for this issue was to make it so "saves" only save what has changed. The save goes to a memory location; this still halts the game, but only for a few milliseconds, and it happens every 5 seconds (in theory). Very unnoticeable. As data changes, it is saved to memory. Another thread then takes the data from that memory location and performs a series of updates, adds, or deletes against the database, depending on what is needed. Once this is done, if it's been more than 5 seconds since the last save, it saves again. This works rather flawlessly. My issue is that MySQL is not really meant for the type of data I'm giving it. It's not really well structured.
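As a rough sketch of that hand-off pattern (names and the 5-second interval are just illustrative; the real game server is C#, this is C++ purely for the sake of example):

```cpp
// Sketch: the game loop appends changed objects to a dirty list under a
// short-lived lock; a background thread swaps the list out and persists it,
// so the game is only ever blocked for the duration of the swap.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

struct Change { uint32_t serial; std::vector<char> data; };

std::mutex dirtyLock;
std::vector<Change> dirty;          // filled by the game loop as objects change
std::atomic<bool> running{true};

void flushWorker() {
    while (running) {
        std::vector<Change> batch;
        {   // the only moment the game loop is blocked: a cheap swap
            std::lock_guard<std::mutex> g(dirtyLock);
            batch.swap(dirty);
        }
        for (const Change& c : batch) {
            (void)c;  // placeholder for the UPDATE/INSERT/DELETE against the store
        }
        std::this_thread::sleep_for(std::chrono::seconds(5));
    }
}
```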

So I decided I may be better off writing a custom app that is designed for accepting this type of data. I have a good idea how I will do most of it.

I just want tips on how to properly store this data to disk so the app can keep up with the updates fast enough. For example: is fstream fast enough? Should I have an index file that points to a char location in a data file? Is there a way I can write data into the middle of a file and push the rest of the data over? All questions I'm wondering about. Should I maybe load the entire thing into memory? (I'd rather not, as this data grows all the time, and in a leased server environment RAM is VERY expensive.) Should I have one file per record? There are a lot of different ways I could do this, and I want to learn how to do it efficiently.

You may be wondering why I don't just code this directly into the game server app instead of having a second app. 1) C++ / Linux will probably be faster (the game server is C#), and most importantly 2) it gives me the option to run it on a separate server, so the game server can waste less time on data saving and simply hand it off to another server. In the future I may even add clustering support, but that's a whole other ball game.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Ah, I think I see how this fits in with the other projects you mentioned.

General Response: First of all, based on your description of requirements, what you need is a transaction processing system. You don't want a system failure to leave a user's character in an inconsistent state. Hence, you need atomicity and durability. You'll also need composability if you ever have more than one transaction in flight. You probably do not need isolation, so you can run at lower consistency degrees than serializability.

Transaction processing is pretty hard to get right. (Some) databases do this by keeping a disk-based log of all updates (old values and new values) -- updates are pushed to the log before anything is considered 'done'. This is called Write-Ahead Logging (WAL). As a performance optimization, the original data structure is updated at the same time but not pushed to disk; the log is just replayed if the system crashes.
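A bare-bones illustration of that idea (this is nowhere near ARIES, which adds log sequence numbers, checkpoints, and redo/undo passes; the record layout here is made up):

```cpp
// Minimal write-ahead-log idea: append old+new values and fsync the log
// *before* the update is acknowledged. Recovery replays the log.
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

int logFd = -1;  // opened once, e.g. open("store.wal", O_WRONLY|O_APPEND|O_CREAT, 0644)

bool logUpdate(uint32_t serial,
               const std::vector<char>& oldVal,
               const std::vector<char>& newVal) {
    std::vector<char> rec;
    auto put32 = [&rec](uint32_t v) {
        rec.insert(rec.end(), reinterpret_cast<char*>(&v),
                   reinterpret_cast<char*>(&v) + sizeof v);
    };
    put32(serial);
    put32(static_cast<uint32_t>(oldVal.size()));
    rec.insert(rec.end(), oldVal.begin(), oldVal.end());
    put32(static_cast<uint32_t>(newVal.size()));
    rec.insert(rec.end(), newVal.begin(), newVal.end());

    if (write(logFd, rec.data(), rec.size()) != (ssize_t)rec.size()) return false;
    return fsync(logFd) == 0;   // durable before the caller treats it as done
}
```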

For an example, I suggest you google "aries logging protocol".

I'm not sure what MySQL's transactional capabilities are. I.e., I don't know how protected your current set-up is.

Specific Responses:
Is there a way I can write data in the middle of a file, and push the rest of the data over?
Yes. Use the C-language file routines, e.g., fseek(), or go all the way down to the syscalls themselves with lseek(). Those routines can be used to overwrite (as opposed to insert) bytes mid-file.
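For example, overwriting bytes at a known offset looks roughly like this (overwrite only; the bytes after it stay where they are):

```cpp
// Overwrite 'len' bytes at 'offset' in an existing file. Using the raw
// syscalls here; fseek()/fwrite() on a FILE* works the same way.
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

bool overwriteAt(const char* path, off_t offset, const void* buf, size_t len) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return false;
    bool ok = lseek(fd, offset, SEEK_SET) == offset &&
              write(fd, buf, len) == (ssize_t)len;
    close(fd);
    return ok;
}
```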

... should I have an index ... ?
Yes. Either implicit (e.g., easily computable) or explicit.

...should I have 1 file per record?
If your goal is to avoid corruption (from crashing), you shouldn't use the file system at all (see transaction processing system, above). If you use the FS anyway, at least use it with the no-cache flag set (O_DIRECT).

2) it gives me the option to run it on a separate server so the game server can waste less time on data saving
It's usually possible to set up writes to be non-blocking, with or without completion notification. Under Linux, you would use fcntl() (IIRC).
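Something along these lines, with the caveat that for regular files O_NONBLOCK doesn't buy much on Linux; truly asynchronous disk writes usually mean POSIX AIO (aio_write) or a dedicated writer thread:

```cpp
// Set a descriptor to non-blocking via fcntl(), as suggested above.
#include <fcntl.h>

bool setNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}
```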

C++ / Linux will probably be faster
You might as well use interpreted Python if it's as I/O-bound as you claim.

Lastly:
MySQL is not really meant for the type of data I'm giving it. It's not really well structured.
Are you sure? I'm no schema expert, but it seems to me things like player data fit nicely into the relational DB model.
 

Red Squirrel

No Lifer
May 24, 2003
70,583
13,805
126
www.anyf.ca
Thanks, that's the type of tips I was looking for. I'll read up on the ARIES logging protocol.

And yeah, the way the app is designed (not my idea), the data is not very DB-friendly. If I had designed this from the ground up I definitely would have structured it better for a database.
 

Red Squirrel

No Lifer
May 24, 2003
70,583
13,805
126
www.anyf.ca
I finally got started on my app. I'm only at the structural part right now; I got it to listen on the port and accept traffic using non-blocking sockets. I may add multithreading, but I'll see how it goes with one thread. Multithreading makes writing an app much more complex, so I always keep it in mind but only use it if I really need to.

For the actual data storage part, here is what I have in mind:

A store will consist of multiple storebanks, which are basically equivalent to "tables".

A storebank will basically be an index file and a data file. The data file will contain serialized data in the format [32-bit serial] [32-bit size] [data] for each element. The index will contain a listing of [serial] [position in file] entries.
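Appending a record in that layout could look roughly like this (a sketch; it assumes the data file is opened in binary read/write mode and that host byte order is acceptable). The returned offset is what the caller would record as the [serial] [position in file] index entry:

```cpp
// Append one record in the [32-bit serial][32-bit size][data] layout and
// return the offset where it landed, so the index can point at it.
#include <cstdint>
#include <fstream>
#include <vector>

std::streamoff appendRecord(std::fstream& dataFile, uint32_t serial,
                            const std::vector<char>& payload) {
    dataFile.seekp(0, std::ios::end);
    std::streamoff pos = dataFile.tellp();
    uint32_t size = static_cast<uint32_t>(payload.size());
    dataFile.write(reinterpret_cast<const char*>(&serial), sizeof serial);
    dataFile.write(reinterpret_cast<const char*>(&size), sizeof size);
    dataFile.write(payload.data(), size);
    return pos;
}
```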

Upon program load, it will do a partial load of the entries into memory using a dictionary data type (I read up quickly; it appears "std::map" is what I want to use), with the serial as the key and a blank data set as the value. These will only load as required, so I'm not loading an unknown amount of data into memory and keeping it there.

In the case of my app, I only really need to load once, and load everything. So when the load command is sent from the client it will basically "dump" the data file to the client over the network. The client will then keep the data in memory in its own existing format.

From this point on, the program will only really need to write data, not read. Commands will come in quite regularly from the client to update an object. The data will be added to the entry that matches the key in the map, and it will start populating the entries. I might change this up a little and just make it a queue system instead.

A background process will go through these and write them to the file. The goal is to get everything onto the file system as fast as possible, but the priority is taking data from the client faster, so if it needs to buffer in memory longer it will, but it will flush to file as soon as it can.

Now, determining WHERE to write in the file is where the index (loaded into memory) comes in. The index will basically be a mapping from each item's serial to its location in the file. To write to the file it will simply seek and write over the existing data. If the new data is bigger and there's not enough room in that spot, it will be written at the end, or into another empty space. The old entry will be obsolete and no longer in the records, kind of like how deleting a file leaves the data on the hard drive but the link is gone.
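A sketch of that update-in-place-or-append logic using the in-memory index (names made up; it tracks each slot's original capacity so a record that shrank can still be rewritten in place):

```cpp
// Overwrite in place when the new payload fits in the existing slot,
// otherwise append at the end and repoint the index; the old slot becomes
// dead space for the defrag pass to reclaim.
#include <cstdint>
#include <fstream>
#include <map>
#include <vector>

struct Slot { std::streamoff pos; uint32_t capacity; };
std::map<uint32_t, Slot> recordIndex;   // serial -> location of the record's slot

void writeEntry(std::fstream& dataFile, uint32_t serial,
                const std::vector<char>& payload) {
    uint32_t size = static_cast<uint32_t>(payload.size());
    auto it = recordIndex.find(serial);
    if (it != recordIndex.end() && size <= it->second.capacity) {
        // fits in the existing slot: rewrite the size field and the payload
        dataFile.seekp(it->second.pos + (std::streamoff)sizeof(uint32_t));
        dataFile.write(reinterpret_cast<const char*>(&size), sizeof size);
        dataFile.write(payload.data(), size);
    } else {
        // doesn't fit (or is new): append at the end and update the index
        dataFile.seekp(0, std::ios::end);
        std::streamoff pos = dataFile.tellp();
        dataFile.write(reinterpret_cast<const char*>(&serial), sizeof serial);
        dataFile.write(reinterpret_cast<const char*>(&size), sizeof size);
        dataFile.write(payload.data(), size);
        recordIndex[serial] = Slot{pos, size};
    }
}
```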

In the background the app will always be doing a defrag, which will basically check whether entries should be moved over.

This is where multithreading may come into play. I will have a mutex to make sure the file is only opened once. So if a new data change is required it will just stay in memory until the file is available again.

So far, does this system sound like it would work well? Is there a better way I can do it? I want to stick to the file system; I can't really imagine what the alternatives would be without getting super complicated.

If I changed this up and made it so I have one file per data entry (TONS of small files), would that be better or worse? It would be way easier to do. If it matters, this app will run on Linux using the ext3 file system.

There will also be built-in backup/snapshot features, but these will be more or less copies of the store that are renamed in a systematic way, with the program taking care of everything.
 
Oct 27, 2007
17,009
5
0
You're really going to go through with this? You never answered this question:
What's wrong with BLOBs?
Seems like a BLOB solves exactly the problem you're looking at (or maybe CLOBs, you haven't given enough detail to determine that). If you write your own pseudo-DBMS you're going to miss out on the millions of man-hours that have made modern DBs so great: caching, indexing, atomicity, reliability, isolation, durability, data consistency, etc.

It's your choice and it sounds like an interesting project to whittle away some free time, but I'd seriously advise you to consider the overwhelming sentiment of this thread: use an existing solution.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Why would you start with the networking code? That seems like one of the last things to get done, IMO. Design your API for accessing the server and then build a network client that calls that API. If you are trying to debug the networking code at the same time as you are working on the storage engine, you won't get very far very fast.

What happens if you lose power or your server crashes before the data gets committed to disk? Is your client going to assume that on a successful 'query/update call' everything is saved permanently, or are you going to use an asynchronous callback to notify the client? If all of a sudden you have a flurry of activity and your disks can't keep up with your updates, you could lose data if the client has no way of knowing whether the data is on disk.

Are you going to use SQL as your query language or do you not need that kind of flexibility? It sounds like you have no typing system for the data so I don't see why you would need a full blown query language if all you are doing is getting/setting key/value pairs.

Look into memory mapped files for your storage... tons and tons of small files will destroy your I/O.
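For what it's worth, the basic mmap() pattern Crusty is pointing at looks something like this (a sketch; error handling and unmapping trimmed):

```cpp
// Memory-mapped file: the file's contents appear as a byte array in the
// process's address space; stores to it are written back by the kernel,
// or forced out with msync().
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

char* mapDataFile(const char* path, size_t* lenOut) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, static_cast<size_t>(st.st_size),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                       // the mapping stays valid after close
    if (p == MAP_FAILED) return nullptr;
    *lenOut = static_cast<size_t>(st.st_size);
    return static_cast<char*>(p);
}

// Example use: memcpy(base + offset, payload, len); then
// msync(base, fileLen, MS_SYNC) to force dirty pages out
// (msync needs a page-aligned start address, which mmap's base is).
```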
 

Red Squirrel

No Lifer
May 24, 2003
70,583
13,805
126
www.anyf.ca
Why would you start with the networking code? That seems like one of the last things to get done, IMO. Design your API for accessing the server and then build a network client that calls that API. If you are trying to debug the networking code at the same time as you are working on the storage engine, you won't get very far very fast.

What happens if you lose power or your server crashes before the data gets committed to disk? Is your client going to assume that on a successful 'query/update call' everything is saved permanently, or are you going to use an asynchronous callback to notify the client? If all of a sudden you have a flurry of activity and your disks can't keep up with your updates, you could lose data if the client has no way of knowing whether the data is on disk.

Are you going to use SQL as your query language or do you not need that kind of flexibility? It sounds like you have no typing system for the data so I don't see why you would need a full blown query language if all you are doing is getting/setting key/value pairs.

Look into memory mapped files for your storage... tons and tons of small files will destroy your I/O.

The network portion is the first thing that gets loaded, so it made sense to start with that. It's also the easiest part, as I already have a class I wrote a while back to handle all of that. I still need to write the actual packet handlers, but those will come as I do the data engine.

I won't be using SQL; the language will actually be quite simple and custom.

For example the client writing data will send a packet that looks like this:

[packetid][itemserial][size][data] for write

This will be sent in a packet; the ID of the packet will call up the "add data" handler, which will take this data and either update the entry if that serial is already in use, or add it. I will also need another packet to delete an item. It will be a lot of small packets going around. The first byte will always be an ID for the type of packet being sent. There will probably be some kind of handshake packets too, to acknowledge that something was successful.
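Decoding one of those write packets might look like this (a sketch; it assumes the socket layer already delivered one complete packet and both ends agree on byte order):

```cpp
// Decode a write packet in the [packetid][itemserial][size][data] layout:
// 1-byte type, 32-bit serial, 32-bit size, then the payload.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WritePacket { uint8_t packetId; uint32_t serial; std::vector<char> data; };

bool decodeWritePacket(const char* buf, size_t len, WritePacket& out) {
    if (len < 1 + 4 + 4) return false;         // header incomplete
    out.packetId = static_cast<uint8_t>(buf[0]);
    uint32_t serial, size;
    std::memcpy(&serial, buf + 1, 4);
    std::memcpy(&size,   buf + 5, 4);
    if (len < 9 + size) return false;          // truncated payload
    out.serial = serial;
    out.data.assign(buf + 9, buf + 9 + size);
    return true;
}
```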

And stuff like power outages and all that: same thing as if the power went out on a regular MySQL server. If the data was not written to disk yet, it will most likely be lost. This is why I want it written as fast as possible. If the actual app goes down but the client is still up, the client starts writing to a local transaction log, and once the data server comes back it starts firing off the queries. Not the most efficient approach, as these logs grow FAST, but I would bring the server down if there were an extended outage. This is more to cover blips that could potentially occur.

There will be a snapshot feature where the client occasionally sends a "take snapshot" command, and the server sets the existing data set aside. If the program crashes, it will load from there and ignore any later progress, as that progress may not be atomic.

As for BLOBs, they aren't very efficient. MySQL is better suited to storing structured, textual data. Start throwing binary stuff at it and things slow down, at least that's what I noticed with my existing setup.

So there's a lot of stuff to consider, but I think once I'm done this should be quite solid.

I googled memory mapped files and it does sound interesting; I'll look into it further.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
In MySQL the default setting is auto-commit on, which means that as soon as your statement is done executing, the changes are written to disk.

So your operations look like this
Client Send Update
Server Receive Update
start transaction
Execute query
Server sends query response back
commit to disk

So yes, it would suffer the same problem... but nobody in their right mind would run a production MySQL database without a binary log. The binary log isn't everything, though; to be really safe you need to change some settings to tell it to flush its contents after every update. Writing to the binary log is a much safer and cheaper operation than updating the data files on disk, so it's performed before the commit to disk occurs. Since it occurs immediately following the query execution, if the server were to crash, chances are you would only lose that query instead of your whole buffer of updates.

My original point was really that you need to pay attention to your I/O code to make sure your disk writes don't lag too far behind the incoming updates. If someone in the game knows this, they can exploit it to cause data loss.

All that said... I'm still with most people in this thread: why re-invent the wheel? What you've described seems to be a simple key/value storage engine, for which there are several open source servers, as well as the option of just using an RDBMS as a key/value store. Just use a BLOB field to store your data.

Or how about http://memcachedb.org/ ?
 

Red Squirrel

No Lifer
May 24, 2003
70,583
13,805
126
www.anyf.ca
One of the main reasons I'm doing this myself is speed. It will be much faster to write this rather simple app myself than to try to figure out a 3rd-party one, which would then require libraries to be configured, installed, maintained, etc. I want to be able to easily send this to someone and have them compile it without dependencies or anything too complicated. I can also custom tailor it to my exact needs.

The binary log is a good idea too; I might consider that. I will try to go without it and see how things go, and if I get backlogged during high-activity times, I might add it in. It will add a bit more overhead overall, but it may still be needed. For my needs the snapshot feature might be OK: occasionally the client will send a packet to say that the current data is in a consistent state, and the server will then take a snapshot. Any issues that occur after that will simply involve reverting to the last snapshot. This will be automatic. I'm hoping to do these snapshots on a per-minute basis, but it will depend on server activity.

That's actually the main issue I have with my existing MySQL implementation: the queries back up and it can take a couple of hours to catch up! That's during very drastic events, mind you, like spawning a couple million monsters at once, which will never actually happen during normal operation.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Why not use something like SQLite if speed is the issue? You'd still get to write your own networking code, but it should be pretty fast while providing most of the features you want in a DB system. As a bonus, it's in the public domain, so you could literally distribute the SQLite header and C file with your project and compile it right in.
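For reference, using SQLite as a simple key/value store is only a handful of calls. A sketch (the table and column names here are made up):

```cpp
// Minimal SQLite-as-key/value-store sketch. Link against sqlite3; the single
// sqlite3.c/sqlite3.h amalgamation can be compiled straight into the project.
#include <sqlite3.h>
#include <cstdint>
#include <vector>

bool putRecord(sqlite3* db, uint32_t serial, const std::vector<char>& blob) {
    const char* sql =
        "INSERT OR REPLACE INTO records(serial, data) VALUES(?, ?);";
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK) return false;
    sqlite3_bind_int64(stmt, 1, serial);
    sqlite3_bind_blob(stmt, 2, blob.data(), (int)blob.size(), SQLITE_TRANSIENT);
    bool ok = sqlite3_step(stmt) == SQLITE_DONE;
    sqlite3_finalize(stmt);
    return ok;
}

// Setup (once): sqlite3_open("store.db", &db);
//               sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS records("
//                                "serial INTEGER PRIMARY KEY, data BLOB);",
//                            nullptr, nullptr, nullptr);
```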