Does anyone else think this is a ridiculously stupid algorithm?

notfred · Jun 26, 2004

I'm reverse engineering a commercial php shopping cart application. I needed to figure out how it comes up with item numbers when the administrator adds a new product to the catalog.

This is the method it uses:

1) Select a random 8 digit integer.
2) Query the database to see if a product already exists with that item number.
3) repeat until step 2 is false.

Am I the only one who thinks that's ridiculous?

ArmchairAthlete · Jun 26, 2004

As more and more products are added to the database the algorithm will get more and more inefficient.

Why can't the 8 digit number start at "00000000" and go up by one with every item added?

EDIT: You could go further and organize it so that seperate categories of items have their own group of numbers. Like processors are "1xxxxxxxx", motherboards "2xxxxxxx", etc.

JavaMomma · Jun 26, 2004

Yes it is stupid.
Although I have seen much much worse.

StageLeft · Jun 26, 2004

Originally posted by: ArmchairAthlete
As more and more products are added to the database the algorithm will get more and more inefficient.

Why can't the 8 digit number start at "00000000" and go up by one with every item added?

EDIT: You could go further and organize it so that seperate categories of items have their own group of numbers. Like processors are "1xxxxxxxx", motherboards "2xxxxxxx", etc.

Yeah it should just be based on a primary key automatically increasing one by one. It sounds like whomever did the original one either didn't understand chapter one of databases or perhaps he merely wanted his product numbers to seem like the company had a lot of products instead of starting somewhere low like 0, but he could always have seeded the field at 1000 or something.

DaveSimmons · Jun 26, 2004

Hmm, if the database has 100,000 products the odds of a collision / retry adding a new product are still only 1 in 1,000 so retries shouldn't happen too often. Average number of db queries would be 1.001

It seems like a very reasonable method for under 10 million products, especially if there's any reason to want the IDs sparsely placed and/or out of order.

StageLeft · Jun 26, 2004

It seems like a very reasonable method for under 10 million products, especially if there's any reason to want the IDs sparsely placed and/or out of order.

If you need randomized entries, for some reason, it's really the only way to do it. I'm not sure why you would though

AFB · Jun 26, 2004

Well, this works fine for most small stores and makes it look like they have a lot of products. OTOH, wouldn't it be smarter to use an auto increment row for ID ???

notfred · Jun 26, 2004

I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.

FrustratedUser · Jun 26, 2004

It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

manly · Jun 26, 2004

Originally posted by: Skoorb

It seems like a very reasonable method for under 10 million products, especially if there's any reason to want the IDs sparsely placed and/or out of order.

Click to expand...

If you need randomized entries, for some reason, it's really the only way to do it. I'm not sure why you would though

This post wins the obvious logic of the day award.

AFB · Jun 26, 2004

Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

CTho9305 · Jun 26, 2004

Originally posted by: amdfanboy

Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Click to expand...

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

To find the last ID number used is a constant time operation - the database keeps track of the next ID to use (primary key / auto_increment).

For plain old number columns (i.e. not the primary key), you would probably have to do a search, but since he's using it as a primary key, no search is required. Even if you had to search, one pass is faster than generating a random number, doing a full pass anyway, potentially multiple times.

FrustratedUser · Jun 26, 2004

Originally posted by: amdfanboy

Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Click to expand...

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

Forgot that.
Either way I don't think it is any slower to use the random numbers, just different. It will obviously be slower if the random number genrator generates the same number twice.

CTho9305 · Jun 27, 2004

Originally posted by: FrustratedUser

Originally posted by: amdfanboy

Originally posted by: FrustratedUser
It might be faster doing it that way rather than to make a query to the db to check the last number.
The odds of a collision is pretty small with an eight digit number.

Click to expand...

Did you not read ? It makes a query to the DB to search the whole thing to see if it is taken. If so, it must repeat that.

Click to expand...

Forgot that.
Either way I don't think it is any slower to use the random numbers, just different. It will obviously be slower if the random number genrator generates the same number twice.

If it's no slower, why do you think DB designers provide auto-increment primary keys? It's an idiotic algorithm.

kamper · Jun 27, 2004

The speed of that algorithm would sort of depend on a bunch of external things.

Does the database have a good index on the primary key? (if so this would drastically reduce lookup time)
How good is the random number generation? If it doesn't get a unique seed every time then it might regenerate the same sequence every time, causing it to recheck every previous entry every time before it finds a match. Sounds stupid but I've seen it happen.

That being said, it's not a very wise choice of algorithms, not least because it's probably somewhat confusing to reverse-engineer, making it harder for future developers to stay sane. The only advantage that I see is that he knows the primary key before inserting the new record (which could be done in a pseudo-auto_increment anyways).

drag · Jun 27, 2004

What they should do is choose a serial key that makes sense, something that a human can relate to or at least look up in the book.

Like if it's a department store. Sporting goods will be a string like sprt012345-1, then if the item gets replaced by a updated product then you could have the number be srpt012345-2 and so on and so forth. Then a salesment would just do a query for sprt012345 and then results woudl come back and they could easily tell a customer "well we discontinued that item, but we have several things that can replace your broken/lost/warenteed item"

Stuff like that. The number should at least reflect some sort of information, so sort of coded thing.

That way when you add a item to the database, you chose "sports dept", "ball", etc etc. Then a alpha-numeric number will be generated on that. Then tacked onto the end would be a random number however large is needed to make sure everything stays unique.

Ever looked up Vin numbers for cars? Thats basicly what they do. So instead of keeping a database of every f-ing car in existance, a record is made for each car, but most places (like insurance places) just decode the vin to find out what year/model/color/engine type etc etc. That way you don't have to mirror the entire dealership database for every computer that needs to deal with VIN numbers.

Sounds like whoever designed the system for the store was a lazy bastard and knew the people he worked for either wern't smart enough or cared enough to spot it.

Either that it was designed by commitee who were to ignorant to let the poor guy get is work done and gave him innane instuctions "like make the numbers for the items look high so it looks like we have lots of stuff".

Also keep in mind that item numbers, serial numbers and stuff like that are not the same as primary keys for records in a database, Although they can be.

kamper · Jun 27, 2004

Originally posted by: drag
Also keep in mind that item numbers, serial numbers and stuff like that are not the same as primary keys for records in a database, Although they can be.

Exactly drag. He's not talking about serial numbers. He is talking about primary keys.

Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.

drag · Jun 27, 2004

Originally posted by: kamper

Originally posted by: drag
Also keep in mind that item numbers, serial numbers and stuff like that are not the same as primary keys for records in a database, Although they can be.

Click to expand...

Exactly drag. He's not talking about serial numbers. He is talking about primary keys.

Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.

Click to expand...

Ya, I guess I didn't make it down that far.... eh....

StageLeft · Jun 27, 2004

Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.

Then it's simply wrong. He should just insert a product and the field will auto-increment!

AFB · Jun 27, 2004

Originally posted by: Skoorb

Originally posted by: notfred
I should say that these aren't even really product numbers, they're just id numbers that the database uses as a primary key. There's a seperate field in the table for a user-visible product number.

Click to expand...

Then it's simply wrong. He should just insert a product and the field will auto-increment!

DING DING DING

tinyabs · Jun 28, 2004

When you insert a random primary key into the table, wouldn't the database returns an exception when there is a collision? How about keep generating and inserting until it succeed.

It's single pass instead of 2 passes.

kamper · Jun 29, 2004

Super idea!! So now instead of checking a well defined result set you can try to figure out what the error was when your sql statement fails. In a scripting language no less. Optimizations that give you linear improvement are almost always a bad idea as the extra confusion isn't worth the efficiency savings. The only benefit to your method is the transactional security you gain from only performing one query per iteration. The same thing could be had with auto_increment.

AFB · Jun 29, 2004

Originally posted by: tinyabs
When you insert a random primary key into the table, wouldn't the database returns an exception when there is a collision? How about keep generating and inserting until it succeed.

It's single pass instead of 2 passes.

?
That is step two.

do{
$r = random();
$results = query(insert......$r);
}
while($results == false)

Step 1: Make random number
Step 2: Query and get results
Step 3: Check result and decide if you should repeat.

Jeraden · Jun 29, 2004

Thats what you get for outsourcing your development to India.

notfred · Jun 30, 2004

Originally posted by: Jeraden
Thats what you get for outsourcing your development to India.

I'm pretty sure it was developed in Russia, actually.

Does anyone else think this is a ridiculously stupid algorithm?

Lifer

Diamond Member

Senior member

No Lifer

Elite Member

No Lifer

Lifer

Lifer

Lifer

Lifer

Lifer

Elite Member

Lifer

Elite Member

Diamond Member

Elite Member

Diamond Member

Elite Member

No Lifer

Lifer

Member

Diamond Member

Lifer

Platinum Member

Lifer