Does anyone else think this is a ridiculously stupid algorithm?

notfred

Lifer
Feb 12, 2001
38,241
4
0
I'm reverse engineering a commercial php shopping cart application. I needed to figure out how it comes up with item numbers when the administrator adds a new product to the catalog.

This is the method it uses:

1) Select a random 8 digit integer.
2) Query the database to see if a product already exists with that item number.
3) repeat until step 2 is false.

Am I the only one who thinks that's ridiculous?
 

ArmchairAthlete

Diamond Member
Dec 3, 2002
3,763
0
0
As more and more products are added to the database the algorithm will get more and more inefficient.

Why can't the 8 digit number start at "00000000" and go up by one with every item added?

EDIT: You could go further and organize it so that seperate categories of items have their own group of numbers. Like processors are "1xxxxxxxx", motherboards "2xxxxxxx", etc.
 

StageLeft

No Lifer
Sep 29, 2000
70,150
5
0
Originally posted by: ArmchairAthlete
As more and more products are added to the database the algorithm will get more and more inefficient.

Why can't the 8 digit number start at "00000000" and go up by one with every item added?

EDIT: You could go further and organize it so that seperate categories of items have their own group of numbers. Like processors are "1xxxxxxxx", motherboards "2xxxxxxx", etc.
Yeah it should just be based on a primary key automatically increasing one by one. It sounds like whomever did the original one either didn't understand chapter one of databases or perhaps he merely wanted his product numbers to seem like the company had a lot of products instead of starting somewhere low like 0, but he could always have seeded the field at 1000 or something.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Hmm, if the database has 100,000 products the odds of a collision / retry adding a new product are still only 1 in 1,000 so retries shouldn't happen too often. Average number of db queries would be 1.001

It seems like a very reasonable method for under 10 million products, especially if there's any reason to want the IDs sparsely placed and/or out of order.
 

StageLeft

No Lifer
Sep 29, 2000
70,150
5
0
It seems like a very reasonable method for under 10 million products, especially if there's any reason to want the IDs sparsely placed and/or out of order.
If you need randomized entries, for some reason, it's really the only way to do it. I'm not sure why you would though :)
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Well, this works fine for most small stores and makes it look like they have a lot of products. OTOH, wouldn't it be smarter to use an auto increment row for ID ???
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.
 
Aug 16, 2001
22,505
4
81
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.
 

manly

Lifer
Jan 25, 2000
13,318
4,093
136
Originally posted by: Skoorb
It seems like a very reasonable method for under 10 million products, especially if there's any reason to want the IDs sparsely placed and/or out of order.
If you need randomized entries, for some reason, it's really the only way to do it. I'm not sure why you would though :)
This post wins the obvious logic of the day award. :p
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: amdfanboy
Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

To find the last ID number used is a constant time operation - the database keeps track of the next ID to use (primary key / auto_increment).

For plain old number columns (i.e. not the primary key), you would probably have to do a search, but since he's using it as a primary key, no search is required. Even if you had to search, one pass is faster than generating a random number, doing a full pass anyway, potentially multiple times.
 
Aug 16, 2001
22,505
4
81
Originally posted by: amdfanboy
Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

Forgot that.
Either way I don't think it is any slower to use the random numbers, just different. It will obviously be slower if the random number genrator generates the same number twice.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: FrustratedUser
Originally posted by: amdfanboy
Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

Forgot that.
Either way I don't think it is any slower to use the random numbers, just different. It will obviously be slower if the random number genrator generates the same number twice.

If it's no slower, why do you think DB designers provide auto-increment primary keys? It's an idiotic algorithm.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
The speed of that algorithm would sort of depend on a bunch of external things.

Does the database have a good index on the primary key? (if so this would drastically reduce lookup time)
How good is the random number generation? If it doesn't get a unique seed every time then it might regenerate the same sequence every time, causing it to recheck every previous entry every time before it finds a match. Sounds stupid but I've seen it happen.

That being said, it's not a very wise choice of algorithms, not least because it's probably somewhat confusing to reverse-engineer, making it harder for future developers to stay sane. The only advantage that I see is that he knows the primary key before inserting the new record (which could be done in a pseudo-auto_increment anyways).
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
What they should do is choose a serial key that makes sense, something that a human can relate to or at least look up in the book.

Like if it's a department store. Sporting goods will be a string like sprt012345-1, then if the item gets replaced by a updated product then you could have the number be srpt012345-2 and so on and so forth. Then a salesment would just do a query for sprt012345 and then results woudl come back and they could easily tell a customer "well we discontinued that item, but we have several things that can replace your broken/lost/warenteed item"

Stuff like that. The number should at least reflect some sort of information, so sort of coded thing.

That way when you add a item to the database, you chose "sports dept", "ball", etc etc. Then a alpha-numeric number will be generated on that. Then tacked onto the end would be a random number however large is needed to make sure everything stays unique.

Ever looked up Vin numbers for cars? Thats basicly what they do. So instead of keeping a database of every f-ing car in existance, a record is made for each car, but most places (like insurance places) just decode the vin to find out what year/model/color/engine type etc etc. That way you don't have to mirror the entire dealership database for every computer that needs to deal with VIN numbers.

Sounds like whoever designed the system for the store was a lazy bastard and knew the people he worked for either wern't smart enough or cared enough to spot it.

Either that it was designed by commitee who were to ignorant to let the poor guy get is work done and gave him innane instuctions "like make the numbers for the items look high so it looks like we have lots of stuff".


Also keep in mind that item numbers, serial numbers and stuff like that are not the same as primary keys for records in a database, Although they can be.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Originally posted by: drag
Also keep in mind that item numbers, serial numbers and stuff like that are not the same as primary keys for records in a database, Although they can be.

Exactly drag. He's not talking about serial numbers. He is talking about primary keys.

Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
Originally posted by: kamper
Originally posted by: drag
Also keep in mind that item numbers, serial numbers and stuff like that are not the same as primary keys for records in a database, Although they can be.

Exactly drag. He's not talking about serial numbers. He is talking about primary keys.

Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.

Ya, I guess I didn't make it down that far.... eh....:eek:
 

StageLeft

No Lifer
Sep 29, 2000
70,150
5
0
Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.
Then it's simply wrong. He should just insert a product and the field will auto-increment! :)
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Originally posted by: Skoorb
Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.
Then it's simply wrong. He should just insert a product and the field will auto-increment! :)

DING DING DING
 

tinyabs

Member
Mar 8, 2003
158
0
0
When you insert a random primary key into the table, wouldn't the database returns an exception when there is a collision? How about keep generating and inserting until it succeed.

It's single pass instead of 2 passes.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Super idea!! So now instead of checking a well defined result set you can try to figure out what the error was when your sql statement fails. In a scripting language no less. Optimizations that give you linear improvement are almost always a bad idea as the extra confusion isn't worth the efficiency savings. The only benefit to your method is the transactional security you gain from only performing one query per iteration. The same thing could be had with auto_increment.
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Originally posted by: tinyabs
When you insert a random primary key into the table, wouldn't the database returns an exception when there is a collision? How about keep generating and inserting until it succeed.

It's single pass instead of 2 passes.

?
That is step two.

do{
$r = random();
$results = query(insert......$r);
}
while($results == false)

Step 1: Make random number
Step 2: Query and get results
Step 3: Check result and decide if you should repeat.