SQL Question

Aug 25, 2004
11,151
1
81
I writing a PHP app where one of the SQL tables has a column (lets call it "file") containing file paths (e.g. 'example/folder1/image1.gif'). Most of my queries would be something similar to:

select file from sometable where file='example/file1.ext';

I'm a beginner when it comes to databases - I know my way around with code, but I don't have much knowledge about the theoretical stuff (like normalization). Anyway, I was thinking the above query would get slow as the table size increased. Someone suggested using a hash value, so I was thinking I'd add a new column (lets call it "hash") and this would have hash values from 0 to 99. The sql query could then be modified to look like this:

select file from sometable where hash=5 and file='example/file1.ext';

My question is, will this help speed up my app, or am I just wasting my time???

Thank you :)
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
Just build an index on the file field. Your database will take up slightly more disk space, but doing selects on the file field will stay nice and fast, and it will scale well.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
How big do you expect the table to get?
Is it worth separating the file name and the path into two seperate columns?
 
Aug 25, 2004
11,151
1
81
Originally posted by: Armitage
How big do you expect the table to get?
Is it worth separating the file name and the path into two seperate columns?

The table's gonna start with about 3000+ rows, I'm expecting it to grow to atleast 10,000+ rows.

I hadn't thought about splitting the file name and path... is this something I should be looking into?
 

Drakkon

Diamond Member
Aug 14, 2001
8,401
1
0
indexing is what your looking for....it is what optimizes lookups and scalability....

filname and path really shouldnt matter...unless your doing searches by filename/path...
 

KB

Diamond Member
Nov 8, 1999
5,406
389
126
Originally posted by: hotchilisauce
I writing a PHP app where one of the SQL tables has a column (lets call it "file") containing file paths (e.g. 'example/folder1/image1.gif'). Most of my queries would be something similar to:

select file from sometable where file='example/file1.ext';

I'm a beginner when it comes to databases - I know my way around with code, but I don't have much knowledge about the theoretical stuff (like normalization). Anyway, I was thinking the above query would get slow as the table size increased. Someone suggested using a hash value, so I was thinking I'd add a new column (lets call it "hash") and this would have hash values from 0 to 99. The sql query could then be modified to look like this:

select file from sometable where hash=5 and file='example/file1.ext';

My question is, will this help speed up my app, or am I just wasting my time???

Thank you :)



Doing what you propose would be a step in the wrong direction. You would be adding a second comparison to your SQL statement, thus making it take longer.

If you will be searching by path only, do as notfred said and build an index on the filepath field.

The best thing to do is to try to give your files IDs and then index that unique ID. Then you will be searching for a number (which is faster than searching for a string) in an indexed field (which is faster than searching non-indexed).

So table looks like:

FileID - FileName
1 - file1.gif
2 - file2.gif
3 - fil3.jpg


query is
'Select file where file_id = 2'
 

statik213

Golden Member
Oct 31, 2004
1,654
0
0
Originally posted by: KB
Originally posted by: hotchilisauce
I writing a PHP app where one of the SQL tables has a column (lets call it "file") containing file paths (e.g. 'example/folder1/image1.gif'). Most of my queries would be something similar to:

select file from sometable where file='example/file1.ext';

I'm a beginner when it comes to databases - I know my way around with code, but I don't have much knowledge about the theoretical stuff (like normalization). Anyway, I was thinking the above query would get slow as the table size increased. Someone suggested using a hash value, so I was thinking I'd add a new column (lets call it "hash") and this would have hash values from 0 to 99. The sql query could then be modified to look like this:

select file from sometable where hash=5 and file='example/file1.ext';

My question is, will this help speed up my app, or am I just wasting my time???

Thank you :)



Doing what you propose would be a step in the wrong direction. You would be adding a second comparison to your SQL statement, thus making it take longer.

If you will be searching by path only, do as notfred said and build an index on the filepath field.

The best thing to do is to try to give your files IDs and then index that unique ID. Then you will be searching for a number (which is faster than searching for a string) in an indexed field (which is faster than searching non-indexed).

So table looks like:

FileID - FileName
1 - file1.gif
2 - file2.gif
3 - fil3.jpg


query is
'Select file where file_id = 2'


actually the second comparison would make things faster depending on how good his hasihing algorithm is. Since he has:
where hash=5 and file='example/file1.ext'
The second longer string comparision would occur only if the hash=5 test passes (I'm pretty sure SQL evaluates expressions left to right).... but if his hashing mecahnism is bad then he wouldn't see much of an improvement.

But, Indexing is really the way to go.... Don't waste time with the hashiing stuff. Also if you know that you'll be matching files most of the time on it's basename and not the complete path then split the name and path in to seperate fields.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: hotchilisauce
Originally posted by: Armitage
How big do you expect the table to get?
Is it worth separating the file name and the path into two seperate columns?

The table's gonna start with about 3000+ rows, I'm expecting it to grow to atleast 10,000+ rows.

I hadn't thought about splitting the file name and path... is this something I should be looking into?

Meh - for that size, don't even worry about it. Keep the path & filename together (unless there's another reason to split them) and index the whole thing.

Indexes on text columns can get expensive, and it increases with the length of the string. If you split the filename from the pathname, you might gain some advantage in the indexing, at the expense of slightly more complicated code to work with it. But your table is so small, I doubt it will make any difference unless you're doing some really ugly stuff with it.