
Database Question ( MySQL particularly... ) LOTS of tables, or ONE BIG ONE?

Superwormy

Golden Member
Feb 7, 2001
1,637
0
0
All kinda theoretical at this point but...

Using MySQL currently, though that might change later. Say I have a table with only about 4 KB per record, but an anticipated number of records somewhere around 20 MILLION. That's a lot of data.
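
Just to make it concrete, I'm picturing something like this (column names made up, the real record has more fields):

CREATE TABLE records (
    myprimarykey INT UNSIGNED NOT NULL,
    payload      TEXT NOT NULL,          -- stand-in for the ~4 KB of record data
    PRIMARY KEY (myprimarykey)
);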

Now, would it be better to break the table into multiple tables, say something like this:

table_a ( records whose main field starts with the letter a )
table_b ( records whose main field starts with the letter b )
table_c ( ... etc., you get the point )

OR

Would it be better to just leave it in one HUGE table?


If it helps any, the only selects that are done select by the IDX of the record, the Primary key. So the select is just:
SELECT * FROM table WHERE myprimarykey = 233

If it was split into multiple tables, it would just change to:
SELECT * FROM table_" . $firstcharacter . " WHERE myprimarykey = 233


Any thoughts? How much faster would 26 smaller tables be than 1 huge table when selecting by index?
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
With a smart system, multiple tables would be better (faster) if you can split them across drives.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
Would it be faster to look through a card catalog if it was just one long drawer, or a different drawer for each letter of the alphabet?
 

manly

Lifer
Jan 25, 2000
13,292
4,064
136
I'm gonna be contrarian and say the manual split-table approach won't necessarily help out that much. You can physically partition a database (or the underlying data storage) without modifying the logical database anyway.
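
(Purely as a sketch of what I mean by that, and assuming a MySQL version that has native partitioning at all -- the syntax below is from later releases, so treat it as hypothetical. The point is the queries stay identical to the one-table case:)

CREATE TABLE records (
    myprimarykey INT UNSIGNED NOT NULL,
    payload      TEXT NOT NULL,
    PRIMARY KEY (myprimarykey)
)
PARTITION BY HASH(myprimarykey)   -- physical split into 26 pieces, invisible to the queries
PARTITIONS 26;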

For starters, it's unlikely you'll achieve any kind of uniform distribution across the 26 tables. Secondly, this sounds like the basic case where a secondary index would help out a lot. As usual, take my opinion with a grain of salt. I'm just another AT geek, not a professional DBA. ;)

The only real question (that I can't answer) is how scalable is MySQL? Are 10+ GB tables* in common use with MySQL?

* 10 GB nowadays isn't as big as it sounds ;)
 

Superwormy

Golden Member
Feb 7, 2001
1,637
0
0
Anyone else? Other opinions?

I guess another issue to keep in mind is that I believe MySQL limits the total number of tables that can be open at one time... will tables have to be opened/closed more often with this many more tables, and will this slow things down?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
I think that indexing the table should logically accomplish what you hope to accomplish physically by splitting up the tables. Being able to split the tables between multiple spindles may help, but you may be better off RAIDing those spindles together.
But I'm guessing that the only way to tell is to try it.
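
One cheap way to "try it" is to ask MySQL how it plans to run the lookup (table name assumed from the sketch earlier in the thread):

EXPLAIN SELECT * FROM records WHERE myprimarykey = 233;

If the primary key is doing its job you should see type = const and rows = 1, i.e. a single index lookup regardless of how many rows are in the table.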
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
The time that it takes your code to figure out what table to look in for a particular piece of data is going to be much more than the difference in time for SQL to check the index on a larger table.

The time it takes SQL to union results out of multiple tables is going to be a hell of a lot more than it would take to just query with the big table.

So you're much better off staying with the one big table.
 

Superwormy

Golden Member
Feb 7, 2001
1,637
0
0
Aight, but it won't really take me any extra time to determine which table to grab from if I sort everything by the first letter of the main record, right?

So I can just do ( PHP code here ):
SELECT * FROM table_" . $singlechar . " WHERE ID = 25


And I'll only ever be doing single selects, I really should never need to do a JOIN to join the result sets... so then do I still go with one table?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
If it's all going to be on one physical disk spindle, I don't think you'll see any advantage to splitting up the tables. Across multiple spindles it might help when the load really cranks up. But as I said before, RAID might buy you the same advantage without the hassle of dealing with multiple tables.
What kind of load do you expect?

Sounds like an interesting experiment though ... sometimes all you can do is test it.
Let us know what you find, cuz I think most of the responses you've had so far are just intuition or WAGS.
 

Superwormy

Golden Member
Feb 7, 2001
1,637
0
0
I will definitely be trying it both ways if I can, and I'll let you guys know what I find out. Might be at least another few weeks before I find out, though. At least.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Originally posted by: Superwormy
Anyone else? Other opinions?

I guess another issue to keep in mind is that I believe MySQL limits the total number of tables that can be open at one time... will tables have to be opened/closed more often with this many more tables, and will this slow things down?

I know that Access has a limit of 30-odd tables that can be open at once.
I suspect that MySQL may have a similar limit.

 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
You are better off calling an sp, with the key value you are looking for as a param, so that SQL only has to parse the select statement and determine an optimization strategy once, rather than having PHP build the select statement dynamically.

Although, for the multiple-table case, PHP would handle building the table name better than using a SQL exec for that kind of logic inside an sp -- having SQL just query one huge table still ends up being by far the best option.

The only exception to this would be if you actually had multiple database servers and distributed the data across the servers, especially if the data was too big to become cached on one server.
 

chsh1ca

Golden Member
Feb 17, 2003
1,179
0
0
Access is a flatfile database format akin to DBF.
MySQL is a database server capable of handling hundreds of simultaneous connections.

They are nothing alike.

The answer to your question on open tables is that it's configurable: http://www.mysql.com/doc/en/Table_cache.html

Generally speaking, the multi-table approach is a BAD idea. We've got a database of image information that's indexed here: something along the lines of 3 million rows taking up roughly 63.7 GB of data, all in a single table on MS SQL Server 2000. The drives the thing is on are in RAID 5, and more importantly, the server has 8 GB of RAM.

RAM is more likely to be a key factor in indexing. If the entire index is loaded into RAM with room to spare, you shouldn't have any issues pulling up any record, depending on how MySQL handles indexes.

You're far more likely to run into slowdowns within your application than on the DB side, unless you're talking 10+ million rows.
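
If you want to sanity-check that on your own box, MySQL will tell you how big the index actually is and how much cache it has to work with (variable names here are the MyISAM-era ones, table name from the earlier sketch):

SHOW TABLE STATUS LIKE 'records';          -- Index_length = total index size in bytes
SHOW VARIABLES LIKE 'key_buffer_size';     -- RAM set aside for caching MyISAM index blocks
SHOW VARIABLES LIKE 'table_cache';         -- the open-tables limit from the link above

As long as Index_length fits comfortably inside key_buffer_size, a primary-key lookup should stay fast no matter how many rows are underneath it.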
 

Superwormy

Golden Member
Feb 7, 2001
1,637
0
0
Well, heh, that's the thing, we're talking 20+ million rows :)

glugglug, what do you mean by this:
You are better off calling an sp, with the key value you are looking for as a param, so that SQL only has to parse the select statement and determine an optimization strategy once, rather than having PHP build the select statement dynamically.


Thanks!
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
If you send the SQL server "select * from xtable where ID=123", that statement needs to get parsed and the optimizer needs to figure out that the index on ID is the best one to use before it executes, and this happens every time you send the server the select.

If you call a sp that looks like this:

create procedure blah @IDparam as int
AS
select * from xtable where ID = @IDparam


that SP gets compiled and run through the optimizer ONCE the first time you run it, and the query plan to use the index gets cached. For a select of a single row based on an indexed field like this, the actual search itself is tiny compared to the optimization process.
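
(That's MSSQL-style syntax; if your MySQL version supports stored procedures at all, the rough equivalent, using the made-up names from earlier in the thread, would be:)

DELIMITER //
CREATE PROCEDURE get_record(IN p_id INT)
BEGIN
    SELECT * FROM records WHERE myprimarykey = p_id;
END //
DELIMITER ;

CALL get_record(233);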
 

manly

Lifer
Jan 25, 2000
13,292
4,064
136
Particularly for the one-table approach, you could also use a prepared statement, which is also cached by the DBMS. You don't need to implement a stored procedure until you know you need one.
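
(Sketch only, using MySQL's server-side PREPARE syntax and the made-up names from above; most client libraries expose the same thing through their own API:)

PREPARE get_rec FROM 'SELECT * FROM records WHERE myprimarykey = ?';
SET @id = 233;
EXECUTE get_rec USING @id;
DEALLOCATE PREPARE get_rec;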
 

ztadmin

Member
Feb 11, 2002
199
0
0
What the hell are you doing that needs a 20-million-row table? I mean, that's pretty big.
 

joinT

Lifer
Jan 19, 2001
11,172
0
0
Originally posted by: notfred
Would it be faster to look through a card catalog if it was just one long drawer, or a different drawer for each letter of the alphabet?

ROFL - good answer