anyone know SQL? i have a general question about efficiency in SQL

purbeast0 · Nov 4, 2006

I just have a question about which is more efficient.

Say for a webpage, on the front page, you have news combined from all consoles. So basically all ps2, ps3, xbox, etc, news is posted on the front page in order of the time.

However on the side menu you also have a way to browse news by consoles, so if you click on PS2 on the side, it will go to the page listing only ps2 news, ordered by time.

Now my question is about setting up the tables in the db ...

For this type of deal, would it be more efficient to have 1 giant table that just has ALL of the news articles in it, and when you see the home page, it just gets the latest 30 or so and posts them, and then when you click on the PS2 link on the side, it then selects only the PS2 articles from this huge table?

Or is it more efficient to have a table for each console (PS2 table, xbox table, etc), and then on the main front page, do a JOIN and select from that to display the main page, and then when they click on the PS2 or XBOX link on the side, it will just order them by time?

If you are to eventually have a HHUUGGEE database, which one of these is more efficient in the long run?

I've just heard before that JOIN's are pretty intensive and I didn't know if it was such a good thing to use on every homepage load.

MrChad · Nov 4, 2006

In that situation, your most common scenario will be to display news from all consoles (as this is the default for the home page of the site). In that case, you are certainly better off storing the news in one table and filtering that table as needed. If each row in the table has a console column and that column is properly indexed, filtering the data by column should be an efficient process.

purbeast0 · Nov 4, 2006

Originally posted by: MrChad
In that situation, your most common scenario will be to display news from all consoles (as this is the default for the home page of the site). In that case, you are certainly better off storing the news in one table and filtering that table as needed. If each row in the table has a console column and that column is properly indexed, filtering the data by column should be an efficient process.

See the thing i'm wondering, is sure the homepage will display it for ALL consoles. However I think that most users will click on a side bar as soon as they see the main page. atleast I know that's how I personally do it sometimes heh.

BoberFett · Nov 4, 2006

Having multiple tables, one per console type, is extremely inefficient. In order to display a list of articles that included all topics, you'd have to query each table and then concat the results and sort. Not to mention adding a console would be a headache.

Having a single table with a foreign key to a table which contains the names of the separate consoles is the way to do it. Then it's a simple join to retrieve the news per console. Joins are not intensive, it's what RDBMSs were designed for. More intensive than a query from a single table? Sure. But far less work than doing a dozen queries and then compiling and sorting the results.

purbeast0 · Nov 4, 2006

Originally posted by: BoberFett
Having multiple tables, one per console type, is extremely inefficient. In order to display a list of articles that included all topics, you'd have to query each table and then concat the results and sort. Not to mention adding a console would be a headache.

Having a single table with a foreign key to a table which contains the names of the separate consoles is the way to do it. Then it's a simple join to retrieve the news per console. Joins are not intensive, it's what RDBMSs were designed for. More intensive than a query from a single table? Sure. But far less work than doing a dozen queries and then compiling and sorting the results.

see I don't htink I know enough about SQL to do it the way you are talkin about heh. I don't know much about key's and stuff (as noob as that sounds).

what if I had console specific tables (console_ps2, console_xbox, etc) that were used for the console pages, and then what if i ALSO had a main console table that just mixed all consoles. so when i add to the db, it would add in 2 places.

that way i wouldn't have to join any results at all, i would just call a select and order by time on the main console table for the homepage, and do the console_specific queries for the console specific links.

would that be efficient? it almost seems like that's what you were saying, but in a more advanced way that also may be more efficient than the way i'm putting out there.

BoberFett · Nov 4, 2006

Duplicating data? Terrible idea.

Two tables:

tblConsoles
ConsoleID, primarykey, autoincrement, int, an ID to use in other tables to reference
ConsoleName, char(255), the name of the console

tblNews
NewsID, primarykey, autoincrement, int, the ID of the news item used for display, linking, etc.
ConsoleID, foreignkey, int, a reference to the name of the console the news is for
NewsTime, datetime, the time the news was added
NewsDescription, char(255), a short description of the news
NewsText, text blob, the news

You query the news table and display the most recent X news item limiting by date, or by TOP 20 or whatever on the home page.

On the side you query the console table to retrieve a list of consoles that can be linked to another page.

On each console's page, you don't even have to join, you can just run one query and limit it by WHERE ConsoleID = X, where X is the number of the console you want to display news for.

Adding a new console is easy, just add another entry to the console table and use that ID when adding news.

kamper · Nov 4, 2006

Yeah, if you have several sets of data that have the same components, it's usually bad form to break them into multiple tables. If you do something like selecting the 10 most recent articles from one table, the database will already have an index (hopefully) on the date field, so it could sort very quickly. I suppose it's possible, but less likely, that a database would keep an index on the combination of date fields across multiple tables. I'm not even sure how you'd do it, maybe declare a view that unions all the tables and has an index on the date field...

I once worked on a project where there were multiple identical tables (normally a bad idea) but it was necessary for performance reasons. There were something like 8 queues being backed by the database, each one with the potential for several hundred thousand items and constant enqueuing and dequeuing (making indexes difficult to keep up to date) as fast as the app could process the data in the queues. The number of queues was a fairly central part of the design so it wasn't likely to change. But data from multiple queues never had to be combined.

purbeast0 · Nov 4, 2006

Originally posted by: BoberFett
Duplicating data? Terrible idea.

Two tables:

tblConsoles
ConsoleID, primarykey, autoincrement, int, an ID to use in other tables to reference
ConsoleName, char(255), the name of the console

tblNews
NewsID, primarykey, autoincrement, int, the ID of the news item used for display, linking, etc.
ConsoleID, foreignkey, int, a reference to the name of the console the news is for
NewsTime, datetime, the time the news was added
NewsDescription, char(255), a short description of the news
NewsText, text blob, the news

You query the news table and display the most recent X news item limiting by date, or by TOP 20 or whatever on the home page.

On the side you query the console table to retrieve a list of consoles that can be linked to another page.

On each console's page, you don't even have to join, you can just run one query and limit it by WHERE ConsoleID = X, where X is the number of the console you want to display news for.

Adding a new console is easy, just add another entry to the console table and use that ID when adding news.

Aaah okay I gotcha. That makes a lot of sense actually.

I think my main concern was when the db gets huge, say over 100K entries in it, will displaying console news by console be a slow query since it has to go throug this huge table of data to retrieve it? that is what i was unsure of.

he i know how sql works and stuff, but since i never took a class on it i don't know what is the most efficient way to do stuff 🙂

EDIT:

Side question ...

is it a better idea to have the forum's data in a completely different database for a website, or is it okay to have it mixed in with the data that is used for the news and stuff?

BoberFett · Nov 4, 2006

Originally posted by: kamper
Yeah, if you have several sets of data that have the same components, it's usually bad form to break them into multiple tables. If you do something like selecting the 10 most recent articles from one table, the database will already have an index (hopefully) on the date field, so it could sort very quickly. I suppose it's possible, but less likely, that a database would keep an index on the combination of date fields across multiple tables. I'm not even sure how you'd do it, maybe declare a view that unions all the tables and has an index on the date field...

I once worked on a project where there were multiple identical tables (normally a bad idea) but it was necessary for performance reasons. There were something like 8 queues being backed by the database, each one with the potential for several hundred thousand items and constant enqueuing and dequeuing (making indexes difficult to keep up to date) as fast as the app could process the data in the queues. The number of queues was a fairly central part of the design so it wasn't likely to change. But data from multiple queues never had to be combined.

Was immediate access to the data in that case required by the app? That sounds like a crazy setup, if immediate access wasn't important I'd skip the database and write the data to a text file, and then bulk insert it later.

kamper · Nov 4, 2006

Originally posted by: BoberFett
Was immediate access to the data in that case required by the app? That sounds like a crazy setup, if immediate access wasn't important I'd skip the database and write the data to a text file, and then bulk insert it later.

Yes, it was a queue which means data was thrown on as soon as it was received from other systems and pulled off as soon as soon as our app was ready to process it. A text file would be completely out of the question. It could technically have been done in memory, but that would have mean data loss if the server went down unexpectedly and might have gotten messy if there was a really big backup. It would have been theoretically better to use a canned jms queue (as were using java) but jboss's queuing (we were using jboss) wasn't too hot at the time, something like mq series would have cost too damn much and the standard jms api didn't have all the features we needed (starvation prevention when using priorities).

purbeast0 · Nov 4, 2006

One more quick question ...

What is a number where the number of entries in a DB begins to make the DB select queries slow?

10,000 entries in a table?
100,000?
1,000,000?

Drakkon · Nov 4, 2006

thats a really tuff question. How many tables are you joinging? are there any delimiters? are there any indexes in place?
I've seen a query operate on a table with only 100 entries slower than a 1 million row table come back with simple results that had an index in place.

purbeast0 · Nov 4, 2006

Originally posted by: Drakkon
thats a really tuff question. How many tables are you joinging? are there any delimiters? are there any indexes in place?
I've seen a query operate on a table with only 100 entries slower than a 1 million row table come back with simple results that had an index in place.

most likely i'm going to take bobberfett's idea that is like 5 posts up. I think that's how I'm going to work my DB.

if that was the case what would you say?

kamper · Nov 4, 2006

Originally posted by: purbeast0

Originally posted by: Drakkon
thats a really tuff question. How many tables are you joinging? are there any delimiters? are there any indexes in place?
I've seen a query operate on a table with only 100 entries slower than a 1 million row table come back with simple results that had an index in place.

Click to expand...

most likely i'm going to take bobberfett's idea that is like 5 posts up. I think that's how I'm going to work my DB.

if that was the case what would you say?

There's no set line when things start to get slow. As Drakkon said, it depends on how you've got your table set up. It depends on what database software you're using, how it's configured and what hardware you're running on. And 'slow' is a very relative term. It will probably gradually get slower as the number of rows increases. Just how many news stories are you really planning on archiving?

purbeast0 · Nov 5, 2006

Originally posted by: BoberFett
Duplicating data? Terrible idea.

Two tables:

tblConsoles
ConsoleID, primarykey, autoincrement, int, an ID to use in other tables to reference
ConsoleName, char(255), the name of the console

tblNews
NewsID, primarykey, autoincrement, int, the ID of the news item used for display, linking, etc.
ConsoleID, foreignkey, int, a reference to the name of the console the news is for
NewsTime, datetime, the time the news was added
NewsDescription, char(255), a short description of the news
NewsText, text blob, the news

You query the news table and display the most recent X news item limiting by date, or by TOP 20 or whatever on the home page.

On the side you query the console table to retrieve a list of consoles that can be linked to another page.

On each console's page, you don't even have to join, you can just run one query and limit it by WHERE ConsoleID = X, where X is the number of the console you want to display news for.

Adding a new console is easy, just add another entry to the console table and use that ID when adding news.

Okay well i'm looking at this suggestion here and I have another question ...

What is the point of the tblConsoles? I mean, if I just know in my mind which # refers to which console, wouldn't having that table be kind of pointless and not needed, which could be some overhead that I don't necessarily need?

Like it seems that once I get the console ID from? Or is the tblConsoles kind of just for my own personal reference?

Drakkon · Nov 5, 2006

for that description I'd be inclined to say that you wouldnt see any signifigant slowdown on a decent webserver (i.e. p4 /w 4GB of ram) running say apache, mysql and php until maybe 100K records if they were all displayed at once. And there your biggest bottleneck would be the PHP processing the data. Your not doing any complex joins or searching by like terms so i'd be inclined to say that you would never notice any slowdown except by your processing script/application because as it has more rows to process it will take longer.

The reason to serpeate out tblConsoles is jsut to make searches easier later on. Searching by a primary key helps in what they call "indexing" a table to making searches on it run faster. If your searching for a number instead of a full string it helps in organzing the search easier when it comes to most DB's. It also could help later on if say you wanted to make a review section where instead of news stories you wanted to create reviews and a serperate "tblReviews" whcih then you could have consoles off that.

purbeast0 · Nov 5, 2006

Originally posted by: Drakkon
for that description I'd be inclined to say that you wouldnt see any signifigant slowdown on a decent webserver (i.e. p4 /w 4GB of ram) running say apache, mysql and php until maybe 100K records if they were all displayed at once. And there your biggest bottleneck would be the PHP processing the data. Your not doing any complex joins or searching by like terms so i'd be inclined to say that you would never notice any slowdown except by your processing script/application because as it has more rows to process it will take longer.

The reason to serpeate out tblConsoles is jsut to make searches easier later on. Searching by a primary key helps in what they call "indexing" a table to making searches on it run faster. If your searching for a number instead of a full string it helps in organzing the search easier when it comes to most DB's. It also could help later on if say you wanted to make a review section where instead of news stories you wanted to create reviews and a serperate "tblReviews" whcih then you could have consoles off that.

Okay I see what you are saying ... kind of.

Like say I want to get the news article out of the tblNews for PS2 (which say is ConsoleID 2). What would be the point of querying tblConsoles to get the ConsoleID for PS2, and THEN querying the tblNews with that result, when I could just query tblNews with the ConsoleID 2 since I already know that ps2 = consoleID 2 ?

that is the whole thing I'm missing.

purbeast0 · Nov 5, 2006

anyone else have any comments on my last confusion question?

just waiting to hear before I go start my DB, thanks 🙂

EagleKeeper · Nov 5, 2006

If you know the indexing then you can use it.

What would happen if the indexing changes?

anyone know SQL? i have a general question about efficiency in SQL

purbeast0

No Lifer

MrChad

Lifer

purbeast0

No Lifer

BoberFett

Lifer

purbeast0

No Lifer

BoberFett

Lifer

kamper

Diamond Member

purbeast0

No Lifer

BoberFett

Lifer

kamper

Diamond Member

purbeast0

No Lifer

Drakkon

Diamond Member

purbeast0

No Lifer

kamper

Diamond Member

purbeast0

No Lifer

Drakkon

Diamond Member

purbeast0

No Lifer

purbeast0

No Lifer

EagleKeeper

Discussion Club Moderator<br>Elite Member

TRENDING THREADS