• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

anyone know SQL? i have a general question about efficiency in SQL

purbeast0

No Lifer
I just have a question about which is more efficient.

Say for a webpage, on the front page, you have news combined from all consoles. So basically all ps2, ps3, xbox, etc, news is posted on the front page in order of the time.

However on the side menu you also have a way to browse news by consoles, so if you click on PS2 on the side, it will go to the page listing only ps2 news, ordered by time.

Now my question is about setting up the tables in the db ...

For this type of deal, would it be more efficient to have 1 giant table that just has ALL of the news articles in it, and when you see the home page, it just gets the latest 30 or so and posts them, and then when you click on the PS2 link on the side, it then selects only the PS2 articles from this huge table?

Or is it more efficient to have a table for each console (PS2 table, xbox table, etc), and then on the main front page, do a JOIN and select from that to display the main page, and then when they click on the PS2 or XBOX link on the side, it will just order them by time?

If you are to eventually have a HHUUGGEE database, which one of these is more efficient in the long run?

I've just heard before that JOIN's are pretty intensive and I didn't know if it was such a good thing to use on every homepage load.
 
In that situation, your most common scenario will be to display news from all consoles (as this is the default for the home page of the site). In that case, you are certainly better off storing the news in one table and filtering that table as needed. If each row in the table has a console column and that column is properly indexed, filtering the data by column should be an efficient process.
 
Originally posted by: MrChad
In that situation, your most common scenario will be to display news from all consoles (as this is the default for the home page of the site). In that case, you are certainly better off storing the news in one table and filtering that table as needed. If each row in the table has a console column and that column is properly indexed, filtering the data by column should be an efficient process.

See the thing i'm wondering, is sure the homepage will display it for ALL consoles. However I think that most users will click on a side bar as soon as they see the main page. atleast I know that's how I personally do it sometimes heh.
 
Having multiple tables, one per console type, is extremely inefficient. In order to display a list of articles that included all topics, you'd have to query each table and then concat the results and sort. Not to mention adding a console would be a headache.

Having a single table with a foreign key to a table which contains the names of the separate consoles is the way to do it. Then it's a simple join to retrieve the news per console. Joins are not intensive, it's what RDBMSs were designed for. More intensive than a query from a single table? Sure. But far less work than doing a dozen queries and then compiling and sorting the results.
 
Originally posted by: BoberFett
Having multiple tables, one per console type, is extremely inefficient. In order to display a list of articles that included all topics, you'd have to query each table and then concat the results and sort. Not to mention adding a console would be a headache.

Having a single table with a foreign key to a table which contains the names of the separate consoles is the way to do it. Then it's a simple join to retrieve the news per console. Joins are not intensive, it's what RDBMSs were designed for. More intensive than a query from a single table? Sure. But far less work than doing a dozen queries and then compiling and sorting the results.

see I don't htink I know enough about SQL to do it the way you are talkin about heh. I don't know much about key's and stuff (as noob as that sounds).

what if I had console specific tables (console_ps2, console_xbox, etc) that were used for the console pages, and then what if i ALSO had a main console table that just mixed all consoles. so when i add to the db, it would add in 2 places.

that way i wouldn't have to join any results at all, i would just call a select and order by time on the main console table for the homepage, and do the console_specific queries for the console specific links.

would that be efficient? it almost seems like that's what you were saying, but in a more advanced way that also may be more efficient than the way i'm putting out there.
 
Duplicating data? Terrible idea.

Two tables:

tblConsoles
ConsoleID, primarykey, autoincrement, int, an ID to use in other tables to reference
ConsoleName, char(255), the name of the console

tblNews
NewsID, primarykey, autoincrement, int, the ID of the news item used for display, linking, etc.
ConsoleID, foreignkey, int, a reference to the name of the console the news is for
NewsTime, datetime, the time the news was added
NewsDescription, char(255), a short description of the news
NewsText, text blob, the news


You query the news table and display the most recent X news item limiting by date, or by TOP 20 or whatever on the home page.

On the side you query the console table to retrieve a list of consoles that can be linked to another page.

On each console's page, you don't even have to join, you can just run one query and limit it by WHERE ConsoleID = X, where X is the number of the console you want to display news for.

Adding a new console is easy, just add another entry to the console table and use that ID when adding news.
 
Yeah, if you have several sets of data that have the same components, it's usually bad form to break them into multiple tables. If you do something like selecting the 10 most recent articles from one table, the database will already have an index (hopefully) on the date field, so it could sort very quickly. I suppose it's possible, but less likely, that a database would keep an index on the combination of date fields across multiple tables. I'm not even sure how you'd do it, maybe declare a view that unions all the tables and has an index on the date field...

I once worked on a project where there were multiple identical tables (normally a bad idea) but it was necessary for performance reasons. There were something like 8 queues being backed by the database, each one with the potential for several hundred thousand items and constant enqueuing and dequeuing (making indexes difficult to keep up to date) as fast as the app could process the data in the queues. The number of queues was a fairly central part of the design so it wasn't likely to change. But data from multiple queues never had to be combined.
 
Originally posted by: BoberFett
Duplicating data? Terrible idea.

Two tables:

tblConsoles
ConsoleID, primarykey, autoincrement, int, an ID to use in other tables to reference
ConsoleName, char(255), the name of the console

tblNews
NewsID, primarykey, autoincrement, int, the ID of the news item used for display, linking, etc.
ConsoleID, foreignkey, int, a reference to the name of the console the news is for
NewsTime, datetime, the time the news was added
NewsDescription, char(255), a short description of the news
NewsText, text blob, the news


You query the news table and display the most recent X news item limiting by date, or by TOP 20 or whatever on the home page.

On the side you query the console table to retrieve a list of consoles that can be linked to another page.

On each console's page, you don't even have to join, you can just run one query and limit it by WHERE ConsoleID = X, where X is the number of the console you want to display news for.

Adding a new console is easy, just add another entry to the console table and use that ID when adding news.

Aaah okay I gotcha. That makes a lot of sense actually.

I think my main concern was when the db gets huge, say over 100K entries in it, will displaying console news by console be a slow query since it has to go throug this huge table of data to retrieve it? that is what i was unsure of.

he i know how sql works and stuff, but since i never took a class on it i don't know what is the most efficient way to do stuff 🙂

EDIT:

Side question ...

is it a better idea to have the forum's data in a completely different database for a website, or is it okay to have it mixed in with the data that is used for the news and stuff?
 
Originally posted by: kamper
Yeah, if you have several sets of data that have the same components, it's usually bad form to break them into multiple tables. If you do something like selecting the 10 most recent articles from one table, the database will already have an index (hopefully) on the date field, so it could sort very quickly. I suppose it's possible, but less likely, that a database would keep an index on the combination of date fields across multiple tables. I'm not even sure how you'd do it, maybe declare a view that unions all the tables and has an index on the date field...

I once worked on a project where there were multiple identical tables (normally a bad idea) but it was necessary for performance reasons. There were something like 8 queues being backed by the database, each one with the potential for several hundred thousand items and constant enqueuing and dequeuing (making indexes difficult to keep up to date) as fast as the app could process the data in the queues. The number of queues was a fairly central part of the design so it wasn't likely to change. But data from multiple queues never had to be combined.

Was immediate access to the data in that case required by the app? That sounds like a crazy setup, if immediate access wasn't important I'd skip the database and write the data to a text file, and then bulk insert it later.
 
Originally posted by: BoberFett
Was immediate access to the data in that case required by the app? That sounds like a crazy setup, if immediate access wasn't important I'd skip the database and write the data to a text file, and then bulk insert it later.
Yes, it was a queue which means data was thrown on as soon as it was received from other systems and pulled off as soon as soon as our app was ready to process it. A text file would be completely out of the question. It could technically have been done in memory, but that would have mean data loss if the server went down unexpectedly and might have gotten messy if there was a really big backup. It would have been theoretically better to use a canned jms queue (as were using java) but jboss's queuing (we were using jboss) wasn't too hot at the time, something like mq series would have cost too damn much and the standard jms api didn't have all the features we needed (starvation prevention when using priorities).
 
One more quick question ...

What is a number where the number of entries in a DB begins to make the DB select queries slow?

10,000 entries in a table?
100,000?
1,000,000?
 
thats a really tuff question. How many tables are you joinging? are there any delimiters? are there any indexes in place?
I've seen a query operate on a table with only 100 entries slower than a 1 million row table come back with simple results that had an index in place.
 
Originally posted by: Drakkon
thats a really tuff question. How many tables are you joinging? are there any delimiters? are there any indexes in place?
I've seen a query operate on a table with only 100 entries slower than a 1 million row table come back with simple results that had an index in place.

most likely i'm going to take bobberfett's idea that is like 5 posts up. I think that's how I'm going to work my DB.

if that was the case what would you say?
 
Originally posted by: purbeast0
Originally posted by: Drakkon
thats a really tuff question. How many tables are you joinging? are there any delimiters? are there any indexes in place?
I've seen a query operate on a table with only 100 entries slower than a 1 million row table come back with simple results that had an index in place.

most likely i'm going to take bobberfett's idea that is like 5 posts up. I think that's how I'm going to work my DB.

if that was the case what would you say?
There's no set line when things start to get slow. As Drakkon said, it depends on how you've got your table set up. It depends on what database software you're using, how it's configured and what hardware you're running on. And 'slow' is a very relative term. It will probably gradually get slower as the number of rows increases. Just how many news stories are you really planning on archiving?
 
Originally posted by: BoberFett
Duplicating data? Terrible idea.

Two tables:

tblConsoles
ConsoleID, primarykey, autoincrement, int, an ID to use in other tables to reference
ConsoleName, char(255), the name of the console

tblNews
NewsID, primarykey, autoincrement, int, the ID of the news item used for display, linking, etc.
ConsoleID, foreignkey, int, a reference to the name of the console the news is for
NewsTime, datetime, the time the news was added
NewsDescription, char(255), a short description of the news
NewsText, text blob, the news


You query the news table and display the most recent X news item limiting by date, or by TOP 20 or whatever on the home page.

On the side you query the console table to retrieve a list of consoles that can be linked to another page.

On each console's page, you don't even have to join, you can just run one query and limit it by WHERE ConsoleID = X, where X is the number of the console you want to display news for.

Adding a new console is easy, just add another entry to the console table and use that ID when adding news.

Okay well i'm looking at this suggestion here and I have another question ...

What is the point of the tblConsoles? I mean, if I just know in my mind which # refers to which console, wouldn't having that table be kind of pointless and not needed, which could be some overhead that I don't necessarily need?

Like it seems that once I get the console ID from? Or is the tblConsoles kind of just for my own personal reference?
 
for that description I'd be inclined to say that you wouldnt see any signifigant slowdown on a decent webserver (i.e. p4 /w 4GB of ram) running say apache, mysql and php until maybe 100K records if they were all displayed at once. And there your biggest bottleneck would be the PHP processing the data. Your not doing any complex joins or searching by like terms so i'd be inclined to say that you would never notice any slowdown except by your processing script/application because as it has more rows to process it will take longer.

The reason to serpeate out tblConsoles is jsut to make searches easier later on. Searching by a primary key helps in what they call "indexing" a table to making searches on it run faster. If your searching for a number instead of a full string it helps in organzing the search easier when it comes to most DB's. It also could help later on if say you wanted to make a review section where instead of news stories you wanted to create reviews and a serperate "tblReviews" whcih then you could have consoles off that.
 
Originally posted by: Drakkon
for that description I'd be inclined to say that you wouldnt see any signifigant slowdown on a decent webserver (i.e. p4 /w 4GB of ram) running say apache, mysql and php until maybe 100K records if they were all displayed at once. And there your biggest bottleneck would be the PHP processing the data. Your not doing any complex joins or searching by like terms so i'd be inclined to say that you would never notice any slowdown except by your processing script/application because as it has more rows to process it will take longer.

The reason to serpeate out tblConsoles is jsut to make searches easier later on. Searching by a primary key helps in what they call "indexing" a table to making searches on it run faster. If your searching for a number instead of a full string it helps in organzing the search easier when it comes to most DB's. It also could help later on if say you wanted to make a review section where instead of news stories you wanted to create reviews and a serperate "tblReviews" whcih then you could have consoles off that.

Okay I see what you are saying ... kind of.

Like say I want to get the news article out of the tblNews for PS2 (which say is ConsoleID 2). What would be the point of querying tblConsoles to get the ConsoleID for PS2, and THEN querying the tblNews with that result, when I could just query tblNews with the ConsoleID 2 since I already know that ps2 = consoleID 2 ?

that is the whole thing I'm missing.
 
anyone else have any comments on my last confusion question?

just waiting to hear before I go start my DB, thanks 🙂
 
If you know the indexing then you can use it.

What would happen if the indexing changes?
 
Back
Top