
Scalability of PHP/MySQL solutions

spyordie007

Diamond Member
I have a membership database at the non-profit I work for. We have a single table for contacts and use (primarily) PHP pages for our users to manage those contacts. Currently we are looking at about 1,000 entries, and as you would expect we don't really have any scalability/speed/stability problems. However, we are likely going to start expanding and adding tables for other portions of the organization, and one of them has upwards of 3 million entries. As far as I know MySQL should be able to handle tables with that many entries, but I have never run with that many. Do any of you have experience with MySQL tables with that many entries, and/or have you seen any problems with tables this large? At this point I'm just looking to get a feel for how well I could expect MySQL to handle it.

Thanks

-Spy
 
I have never worked with that many records myself. The most I've worked with is around 900,000 records on MySQL, and it's running fine. I've heard of people having more than 5 million records on MySQL, with over 80 tables on one server.
 
I have not worked with that many myself, but I'm pretty sure each will be able to handle it. You're just going to reach a point where how you write your queries and your PHP code will make a bigger difference to the speed.
 
Try webhostingtalk.com forums, I would bet a lot of people there have experience with this.

Most I've had in a table was about half a million, and simple queries seem acceptably fast, but it's on a very low-traffic site, with no optimizations, etc. Just don't you dare use REGEXP; can you say 40-second queries? 😀

Indexing will likely also be important (I'm not sure if you are aware of this or not so forgive me if I'm patronizing you 🙂)
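In case it helps, here's roughly what an index buys you on a lookup. This is a sketch using Python's sqlite3 purely as a convenient stand-in for MySQL; the table and column names are made up for illustration:

```python
import sqlite3

# Toy contacts table, standing in for the membership database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO contacts (last_name) VALUES (?)",
    [("name%06d" % i,) for i in range(50000)],
)

# Without an index, the lookup is a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE last_name = 'name025000'"
).fetchone()[3]

# With an index, the same lookup becomes a B-tree search.
conn.execute("CREATE INDEX idx_last_name ON contacts (last_name)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE last_name = 'name025000'"
).fetchone()[3]

print(plan_before)  # a SCAN over the whole table
print(plan_after)   # a SEARCH using idx_last_name
```

On MySQL the equivalent check would be `CREATE INDEX idx_last_name ON contacts (last_name);` followed by `EXPLAIN SELECT ...` to confirm the index is used.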
 
I'll give you what I've been playing with at work lately.

We've been working on a logging setup for Apache called mod_log_sql. Rather than writing a line with all that stuff to a flat file, it writes it to a MySQL table. We also get a lot of traffic... so dealing with the size of that table and what MySQL can handle has been my charge for a few days.

The table in question consists of a few varchars (5, 40, 50, 50, 255, 255, 255), two unsigned INTs and an unsigned SMALLINT. If I remember right off the top of my head, that's something like 920 bytes a record, so let's say 1K to make it easy. This table has no indexes.

At about the 200,000-record mark, queries started to break what most of us are used to with MySQL, sometimes taking a full second or two to complete. Around the time we hit 650,000 records, a query with two "LIKE '%string%'" clauses in it was taking 12 to 15 seconds to complete. I'm not complaining, it does what I need it to do, but it certainly is not fast enough to embed in a web application.

BTW: this is on a P3 1.13GHz machine with 256MB of PC133. The only other load on the system is the stream of inserts putting the records into the table.

Now in my case I'm hosed for a few reasons.
1.) "LIKE '%blah%'" searches are a lot of overhead. Fortunately I'm doing them on relatively short varchars, and not large packed TEXT columns.
2.) I have no indexes. They could probably drastically speed up some of that searching, at the cost of INSERT time and disk space. Unfortunately mod_log_sql will bog down the Apache process very badly if it doesn't get fairly immediate satisfaction from the SQL server, so I can't have indexes.
3.) MySQL's locking granularity is by table, not by record (on MyISAM). This means that as INSERTs are constantly adding to the end of the table, my SELECT query has trouble getting its turn to read the table.
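On point 1, it's worth noting that a LIKE with a leading wildcard can't use an index even when one exists, because a B-tree can only narrow by prefix. A quick sketch (sqlite3 again standing in for MySQL; the log table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (request TEXT)")
conn.execute("CREATE INDEX idx_request ON log (request)")
conn.executemany(
    "INSERT INTO log VALUES (?)",
    [("GET /page%05d.html" % i,) for i in range(10000)],
)

# Leading wildcard: the index can't narrow anything, every row is examined.
like_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM log WHERE request LIKE '%page00042%'"
).fetchone()[3]

# Exact match on the same column: the index does the work.
eq_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM log WHERE request = 'GET /page00042.html'"
).fetchone()[3]

print(like_plan)  # still a SCAN, despite idx_request existing
print(eq_plan)    # a SEARCH using idx_request
```

The same holds on MySQL: `LIKE 'page%'` can use an index on the column, but `LIKE '%page%'` forces a scan.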

Similarly, on another application I'm working on we have 8 tables, and nearly every query joins 2 - 4 of them. They all complete in under a quarter of a second though, as no table has yet broken 100 records (this is on an almost completely unloaded 700MHz Xeon).

So enough about my stuff; for your task you've got to address a few of the basic considerations.
- Concurrent access: will 2 - 5 office workers be accessing this frequently throughout the day, or will one person load it two or three times a day?
- Is your insert-to-select ratio extremely low (as is usually true), so you can afford many indexes?
- Will your application often involve multi-table joins?
- Will your application often involve fulltext or LIKE searches?

Even if you have 5 concurrent users on 6 or 7 tables of ~500 records with no indexes, using mostly 2-table joins with one or two fuzzy-comparison searches, you'll probably be fine as long as you're not on an already near-overloaded shared server. However, if you start joining against or searching through that 3-million-record table, you may find things a bit sluggish if your queries are complex at all.

Interestingly enough, PHP will probably be faster than you need it to be. Most of the 'work' in these apps is performed by MySQL, unless you're using the data to dynamically generate images or PDFs or anything, or you're not using joins properly in SQL and are replicating them with array searches and ugly nested looping in PHP.
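To illustrate that last point: the join belongs in SQL, and doing the same work with nested loops in application code gets the same answer while throwing away everything the database is good at. A hedged sketch (sqlite3 standing in for MySQL; the tables are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
CREATE TABLE depts (id INTEGER PRIMARY KEY, dept TEXT);
INSERT INTO depts VALUES (1, 'outreach'), (2, 'fundraising');
INSERT INTO contacts VALUES (1, 'alice', 1), (2, 'bob', 2), (3, 'carol', 1);
""")

# Let the database do the join...
joined = conn.execute("""
    SELECT c.name, d.dept
    FROM contacts c JOIN depts d ON c.dept_id = d.id
    ORDER BY c.name
""").fetchall()

# ...instead of pulling both tables out and re-joining them by hand
# in application code, which is what sloppy PHP often ends up doing.
contacts = conn.execute("SELECT name, dept_id FROM contacts").fetchall()
depts = dict(conn.execute("SELECT id, dept FROM depts").fetchall())
looped = sorted((name, depts[dept_id]) for name, dept_id in contacts)

print(joined == looped)  # True: same result, but the join stays in SQL
```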

Now ask yourself: is there a logical point in this application where some page does not have to be generated on the fly each time a user loads it? For instance, for my task I'm going to write some daily and monthly cron'd Perl scripts that do the SELECTs I need and create the HTML files to show the statistics.
Take this: your contacts database is stored in MySQL, but every time someone adds a record, the PHP script forks off a Perl script that extracts the data, builds static HTML files sorted alphabetically and organized into a few pages, and replaces the old ones. At that point a user can go to the "M - S" page for "robinson" and use their browser's built-in find feature after they have downloaded the static HTML page.
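That workflow might look something like this sketch, in Python rather than Perl (the bucket ranges, file names, and contacts schema are all made up for illustration):

```python
import sqlite3

def build_static_pages(conn):
    """Regenerate static alphabetical contact pages from the database.

    Meant to run from cron, or be forked off after each insert, so that
    ordinary page views never have to touch the database at all.
    """
    buckets = [("a", "f"), ("g", "l"), ("m", "s"), ("t", "z")]
    pages = {}
    for lo, hi in buckets:
        rows = conn.execute(
            "SELECT name FROM contacts "
            "WHERE lower(substr(name, 1, 1)) BETWEEN ? AND ? ORDER BY name",
            (lo, hi),
        ).fetchall()
        body = "".join("<li>%s</li>" % name for (name,) in rows)
        pages["contacts-%s-%s.html" % (lo, hi)] = (
            "<html><body><ul>%s</ul></body></html>" % body
        )
    return pages  # caller writes each file to disk, replacing the old copy

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?)",
                 [("adams",), ("robinson",), ("young",)])
pages = build_static_pages(conn)
print(sorted(pages))  # four static pages, one per alphabetical bucket
```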

Anyway, this might not work for you at all; I just find that more web developers should be asking themselves things like "does my blog need to pull from the database every single time a user visits... or once each time I update it?"
 
Great post Buddha Bart! I have a question about something you said though.
2.) I have no indexes. They could probably drastically speed up some of that searching, at the cost of INSERT time and disk space. Unfortunately mod_log_sql will bog down the Apache process very badly if it doesn't get fairly immediate satisfaction from the SQL server, so I can't have indexes.
Why do indexes cause mod_log_sql to bog down?

I'm facing a similar situation here at work. We have an ASP page that logs access to a particular site on our server. Right now it's hitting an Access DB <shudder>, and that thing has grown huge in just a week. I'm talking 60MB for ~20,000 rows. And this thing needs to run forever, so I can imagine I will be out of disk space before long. I'm currently looking to implement MySQL in Access's place.
 
Originally posted by: BingBongWongFooey
Indexes slow down insertions.

Yes, think about it this way, AmigaMan: with an index, every time you make an INSERT you have to regenerate the index (which can also get pretty big). This is often worth it if you are doing a lot of SELECTs and need the added speed, but in some applications the added overhead for the INSERTs is not worth it compared to the speed increase for the SELECTs.
 
That's interesting, I never knew that. The database I'm building probably won't have indices then since the only time a select would be used would be for reporting done at the end of the month. The managers will just have to wait a couple seconds longer...hehehe
 
Originally posted by: jonmullen
Originally posted by: BingBongWongFooey
Indexes slow down insertions.

Yes, think about it this way, AmigaMan: with an index, every time you make an INSERT you have to regenerate the index (which can also get pretty big). This is often worth it if you are doing a lot of SELECTs and need the added speed, but in some applications the added overhead for the INSERTs is not worth it compared to the speed increase for the SELECTs.

It doesn't have to regenerate the whole index, it just has to find out where the current element fits in the index. So you're basically searching for a position in a sorted list ... O(log n) where n is the size of the list. Get a few million entries and that will start to bite you hard 😀
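That per-row maintenance cost shows up on bulk loads. A toy measurement (sqlite3 standing in for MySQL; absolute times are machine-dependent, so treat this as a sketch rather than a benchmark):

```python
import sqlite3
import time

def bulk_insert(with_index, n=20000):
    """Time n inserts into a fresh table, with or without an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (val TEXT)")
    if with_index:
        conn.execute("CREATE INDEX idx_val ON t (val)")
    start = time.perf_counter()
    conn.executemany("INSERT INTO t VALUES (?)",
                     (("row%08d" % i,) for i in range(n)))
    conn.commit()
    return time.perf_counter() - start

plain = bulk_insert(with_index=False)
indexed = bulk_insert(with_index=True)
# The indexed load pays for a B-tree update on every row; the gap grows
# with the number of indexes and the size of the table.
print("no index: %.3fs  with index: %.3fs" % (plain, indexed))
```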
 
Wow, so I guess the tip is if you want plenty of replies, just don't revisit the issue for several days. Thanks a lot for the input, guys!

I guess the way we are working now we will likely be just fine; my MySQL box is a dual ~400MHz box with 768MB of RAM. I think the biggest concern would be the RAM if we started doing a large number of searches, so I could always add more if need be.

We're actually in the process of rebuilding many of our PHP pages as well; we want an interface that makes sense to more than just the handful of us who currently use it. We are keeping scalability well in mind this time around (our current pages have some pretty sloppy code), so hopefully everything will be just fine. I doubt the PHP server will give us many problems; it is an Athlon 2200+, and (as many of you indicated) for the majority of these pages the MySQL server will be doing the majority of the work. I'm currently in the process of building a quad P3 Xeon server, and if the current server becomes an issue I could always decommission our current MySQL server and run MySQL on the quad-CPU box (its currently planned services are well below its capabilities).

Thanks again for all your input!

-Spy
 