• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

creating userID (int) - Use AutoIncrement?

Hi Guys,

Let's say you are building a user table.

Each user is going to get a userID, which is just a unique number for every account. What the number is is inconsequential as long as it's unique.

Would it be best to use the primaryKey/rowID for each userID?

For instance, let's look at a basic table.

userID (int, PK, auto)
userName (varchar255)
email (varchar255)

Is that ok? The UserID would simply autoincrement as records are inserted and the DBMS would decide what ID# was used next.

Or would it be best to determine this record ID # programatically so that I'm not reliant on the DBMS to keep userID's in order. In that case, I would use

rowID (int,auto)
userID (int, PK)
userName (varchar255)
email (varchar255)

The rowID would autoincrease as records were inserted, but I would programmatically decide on a userID prior to the insert. Of course I could enforce unique ID's with the DBMS by making sure the inserted userID is unique during the insert, but if I do things correctly that shouldn't be much of an issue.

Thoughts?

1. Rely on DBMS to create unique ID
2. Create uniqueID externally
 
I don't have any problem with an auto-increment PK for the user ID. In some cases I might make it a GUID, but I think deriving it externally and passing it in is probably more error prone than relying on the DB.
 
> so that I'm not reliant on the DBMS to keep userID's in order.

Please explain this requirement in more detail. The auto-increment will guarantee uniqueness (never a duplicate ID), but with multiple near-simultaneous row inserts you might not know which process (or server in a cluster) would get its new row added first.

No ID numbers would be skipped at first, but if you delete rows then you'll add gaps in the existing numbers. (New inserts will still get new, unique IDs after that, deleted IDs would not be re-used.)

The only reason why I'd bother with GUIDs is security, if you don't want a bad person to be able to guess at valid ID numbers for some reason.
 
Last edited:
Autoincrement/Identity or GUID. Relying on the application to generate this column may eliminate any ability to import data through SQL tools (for whatever reason you may have).
 
1. I'd like to avoid long userID's so a GUID would be pretty nasty. In fact, I'd like to keep the userID's simple. I don't have any real security reason to hide/encrypt them. I'd be perfectly happy with the 1st record being '1' and it auto incrementing by 1.

ex.

UserID UserName Email

1 Techboyjk techboyjk@gmail.com
2 DaveSimmons dave@gmail.com
3 nickbits nick@gmail.com


2. Performance wise, I've leaned towards having the DBMS do it since the application would have to calculate the next ID, lock it, and insert it. If I do it via the DBMS, by design it's not going to screw up the autoincrement, and it will more than likely be significantly more efficient at determining what's the next ID number.
 
Since you just need a simple number, auto-incement is the best and safest way to do it. The DB engine will guarantee unique IDs.

Anything you do in code yourself could have race conditions if there is ever more than one process running at a time.
 
If you need auto-incrementing integers, have the database generate them.

If you need something else, your application should generate GUIDs. There is no reason to worry about race conditions or having to manually lock anything if you generate a GUID and insert it into the database. The chance of creating a duplicate GUID is astronomically low. That's the whole point of GUID.
 
I'd personally just use an auto-incrementing INT datatype. Requires much less storage per row than a GUID and is automatically handled by the sql engine.

Why reinvent the wheel?
 
If it's a web app and exposed anywhere in the UI: GUID. Predictable IDs are insecure.

Only if the ID number lets you do anything by itself. If your site allows something like

display-secret-documents?user=12345

... without requiring more such as a valid session cookie tied to that specific user, then it's already broken beyond the ability of a GUID to help.
 
It's not security by obscurity. Obviously don't make your system insecure in the first place where even if the key was a sequence it would not matter. At the same time, giving someone who is tying to own your system the ability to predict the next element in a sequence is bad practice.

Point is, you may think your code is airtight because you wrote it, that does not mean you throw best security practices out the window just because you're feeling confident.
 
You're arguing not to use row IDs in tables, because then an attacker might break past all other security.

I'd argue the effort needed to replace the row IDs with GUIDs is better spent on testing and hardening the rest of your code.

Both are of course opinions.
 
Let the database do autoincrement IDs. Make sure you scrub and sanitize data on your front end to ensure security. Keeping a numeric allows indexing to be faster.

Also, as a final consideration, if you are planning on having more than 2147483647 records in that table, consider bigint. I've seen plenty of bad bugs because of a bad datatype.
 
It's not security by obscurity. Obviously don't make your system insecure in the first place where even if the key was a sequence it would not matter. At the same time, giving someone who is tying to own your system the ability to predict the next element in a sequence is bad practice.

Point is, you may think your code is airtight because you wrote it, that does not mean you throw best security practices out the window just because you're feeling confident.

That means everything out there that utilizes numerical data types are bad practices.

http://forums.anandtech.com/showthread.php?t=2244269
http://forums.anandtech.com/showthread.php?t=2244268
http://forums.anandtech.com/showthread.php?t=2244267

Quick, someone alert a mod about a vulnerability in the site!
 
Last edited:
That means everything out there that utilizes numerical data types are bad practices.

http://forums.anandtech.com/showthread.php?t=2244269
http://forums.anandtech.com/showthread.php?t=2244268
http://forums.anandtech.com/showthread.php?t=2244267

Quick, someone alert a mod about a vulnerability in the site!

For data you want to obscure, he's correct. AutoIncrement makes things predictable. If you don't want somebody scraping your site, then use GUIDs. If you want to make it easy to scrape then go ahead and use an autoincrement and put it right in the URL. There are reasons to do both.
 
One reason that you might NOT want to use "userID" as the Primary Key is that primary keys can never be changed/edited. You would have to delete the entire row and recreate it with the new PK/userID. If you don't ever plan on changing userID's, this is not an issue. Just something to think about.
 
One reason that you might NOT want to use "userID" as the Primary Key is that primary keys can never be changed/edited. You would have to delete the entire row and recreate it with the new PK/userID. If you don't ever plan on changing userID's, this is not an issue. Just something to think about.

What dbms are you talking about?
Should have no issues updating the primary key as long as the foreign have 'on update cascade'.
 
Quick, someone alert a mod about a vulnerability in the site!

It does not matter much on a forum, it does matter if you don't want someone to mine your site and create metrics on your userbase and sequence of signups, etc. As mentioned, it's case dependent.

Also if you're sharding you'll avoid hotspots by using UUIDs.
 
For data you want to obscure, he's correct. AutoIncrement makes things predictable. If you don't want somebody scraping your site, then use GUIDs. If you want to make it easy to scrape then go ahead and use an autoincrement and put it right in the URL. There are reasons to do both.

If it's data someone shouldn't see, then you should not show it, unless they are authenticated. Plain and simple. Making the ID unpredictable doesn't change this basic need.

And if someone CAN gain information (like if they are denied information based on auth, or the use in question does not exist) then that is information leakage, and should also be prevented. On my secure sites, if you request data by ID, and the app cannot show it for any reason (not found, not authorized, etc) the app returns a generic error, not what the error is.
 
If it's data someone shouldn't see, then you should not show it, unless they are authenticated. Plain and simple. Making the ID unpredictable doesn't change this basic need.

And if someone CAN gain information (like if they are denied information based on auth, or the use in question does not exist) then that is information leakage, and should also be prevented. On my secure sites, if you request data by ID, and the app cannot show it for any reason (not found, not authorized, etc) the app returns a generic error, not what the error is.

Again, there are different solutions for different reasons. You may want randomized IDs but still want it in the URL so it can be bookmarked for instance. And yes, authentication and security is part of that. There is no one right answer, use the pieces that are correct for any particular problem.
 
Back
Top