Why should every table use a unique integer column as its primary key?

ICRS

Banned
Apr 20, 2008
1,328
0
0
I just thought about this, after seing a post. Why is this a requirement for so many people. Our DBA requires this, and I don't see it as compleatly efficient. We have tables with unique integers as primary key which would never be used a join, or a filter, or anything. I see it serving no real purpose.
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
Just as a side note, the new norm for large tables and data warehouses is to actually use bigint (instead of the conventional int) in PK, IDENTITY columns.
 

techfuzz

Diamond Member
Feb 11, 2001
3,107
0
76
For properly designed databases that have been normalized, a 2NF/3NF form more or less requires primary keys in just about every situation. Now requiring the primary keys to be int/bigint columns that has use is just plain stupid. In addition to wasting space, it actually slows down your database because for every insert/delete the primary column's index has to be updated. If that column is never used you're adding overhead that doesn't need to be there.

When I design databases, I do my best to normalize them but maybe not to a strict 2NF/3NF form, but they're close in most situations. Consequently, I usually end up with int/big primary keys on just about every table that I do plan to use for joins and filtering because they are very easy to deal with from a uniqueness perspective. Virtually all the data I work with is non-unique so I typically utilize a int/bigint primary column for uniqueness.

Something else you have to consider, even though you as a database person might not use a primary key, your application developers might. I might not need a primary key in a table, but I will add one if the contents of the table are used in an application where it is easier for the programmers to reference the table rows by a primary key instead of other row values. From an application programmer's view, primary keys as int/bigint are probably preferable to most.

techfuzz
 

ICRS

Banned
Apr 20, 2008
1,328
0
0
Originally posted by: techfuzz
For properly designed databases that have been normalized, a 2NF/3NF form more or less requires primary keys in just about every situation. Now requiring the primary keys to be int/bigint columns that has use is just plain stupid. In addition to wasting space, it actually slows down your database because for every insert/delete the primary column's index has to be updated. If that column is never used you're adding overhead that doesn't need to be there.

When I design databases, I do my best to normalize them but maybe not to a strict 2NF/3NF form, but they're close in most situations. Consequently, I usually end up with int/big primary keys on just about every table that I do plan to use for joins and filtering because they are very easy to deal with from a uniqueness perspective. Virtually all the data I work with is non-unique so I typically utilize a int/bigint primary column for uniqueness.

Something else you have to consider, even though you as a database person might not use a primary key, your application developers might. I might not need a primary key in a table, but I will add one if the contents of the table are used in an application where it is easier for the programmers to reference the table rows by a primary key instead of other row values. From an application programmer's view, primary keys as int/bigint are probably preferable to most.

techfuzz

We have a table.

Col1: ID (Identity) Int
Col2: ForeignID Int (Foreign key from another table)
Col3: Date Datetime
Col4: Amount Money

Here Col2 and Col3 together creat a unique row. What I don't really see is why we must have Col1, which not event the software developers would have use for.
 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Originally posted by: ICRS
Originally posted by: techfuzz
For properly designed databases that have been normalized, a 2NF/3NF form more or less requires primary keys in just about every situation. Now requiring the primary keys to be int/bigint columns that has use is just plain stupid. In addition to wasting space, it actually slows down your database because for every insert/delete the primary column's index has to be updated. If that column is never used you're adding overhead that doesn't need to be there.

When I design databases, I do my best to normalize them but maybe not to a strict 2NF/3NF form, but they're close in most situations. Consequently, I usually end up with int/big primary keys on just about every table that I do plan to use for joins and filtering because they are very easy to deal with from a uniqueness perspective. Virtually all the data I work with is non-unique so I typically utilize a int/bigint primary column for uniqueness.

Something else you have to consider, even though you as a database person might not use a primary key, your application developers might. I might not need a primary key in a table, but I will add one if the contents of the table are used in an application where it is easier for the programmers to reference the table rows by a primary key instead of other row values. From an application programmer's view, primary keys as int/bigint are probably preferable to most.

techfuzz

We have a table.

Col1: ID (Identity) Int
Col2: ForeignID Int (Foreign key from another table)
Col3: Date Datetime
Col4: Amount Money

Here Col2 and Col3 together creat a unique row. What I don't really see is why we must have Col1, which not event the software developers would have use for.
Depending on your resolution of time, you could end up with a duplicate "unique" record.

Is it wise to take than chance?

 

nakedfrog

No Lifer
Apr 3, 2001
62,794
18,987
136
Originally posted by: techfuzz
For properly designed databases that have been normalized, a 2NF/3NF form more or less requires primary keys in just about every situation. Now requiring the primary keys to be int/bigint columns that has use is just plain stupid. In addition to wasting space, it actually slows down your database because for every insert/delete the primary column's index has to be updated. If that column is never used you're adding overhead that doesn't need to be there.

How much space are you really wasting, honestly, especially when storage is dirt cheap these days? What would you use that's more efficient as a PK than int/bigint?
I've also always heard that you should always have a unique PK for performance reasons.
 

techfuzz

Diamond Member
Feb 11, 2001
3,107
0
76
Originally posted by: nakedfrog
How much space are you really wasting, honestly, especially when storage is dirt cheap these days? What would you use that's more efficient as a PK than int/bigint?
I've also always heard that you should always have a unique PK for performance reasons.
True, storage is cheap and you're not wasting that much space with an int/bigint unless you're already bumping up against the row limit page size. Then considerations like this have to be taken into account.

I am interested to see anything that says a unique PK is good performance reasons if it is never used.

techfuzz
 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Because someone is not using the PK at present, does not mean that it may not be useful down the road.

Pennywise, pound foolish
 

nakedfrog

No Lifer
Apr 3, 2001
62,794
18,987
136
Originally posted by: techfuzz
Originally posted by: nakedfrog
How much space are you really wasting, honestly, especially when storage is dirt cheap these days? What would you use that's more efficient as a PK than int/bigint?
I've also always heard that you should always have a unique PK for performance reasons.
True, storage is cheap and you're not wasting that much space with an int/bigint unless you're already bumping up against the row limit page size. Then considerations like this have to be taken into account.

I am interested to see anything that says a unique PK is good performance reasons if it is never used.

techfuzz

Free reg required, but available from bugmenot.com
 

techfuzz

Diamond Member
Feb 11, 2001
3,107
0
76
Originally posted by: nakedfrog
Originally posted by: techfuzz
Originally posted by: nakedfrog
How much space are you really wasting, honestly, especially when storage is dirt cheap these days? What would you use that's more efficient as a PK than int/bigint?
I've also always heard that you should always have a unique PK for performance reasons.
True, storage is cheap and you're not wasting that much space with an int/bigint unless you're already bumping up against the row limit page size. Then considerations like this have to be taken into account.

I am interested to see anything that says a unique PK is good performance reasons if it is never used.

techfuzz

Free reg required, but available from bugmenot.com

Ok, so I read the article and while he does give many valid reasons, there are no performance reasons given. Everything he says is a common sense answer that has already been debated here. What I would like to read is something from someone who has figured out that adding a PK to a table increases performance when the PK is not used. Charts, graphs, plain metrics would be nice.

I find it amusing that later in the article, the writer even admits that while he recommends you add a PK and clustered index to every table, it is not the best practice.... but continues to insist it.

techfuzz
 

ICRS

Banned
Apr 20, 2008
1,328
0
0
Originally posted by: nakedfrog
Originally posted by: techfuzz
Originally posted by: nakedfrog
How much space are you really wasting, honestly, especially when storage is dirt cheap these days? What would you use that's more efficient as a PK than int/bigint?
I've also always heard that you should always have a unique PK for performance reasons.
True, storage is cheap and you're not wasting that much space with an int/bigint unless you're already bumping up against the row limit page size. Then considerations like this have to be taken into account.

I am interested to see anything that says a unique PK is good performance reasons if it is never used.

techfuzz

Free reg required, but available from bugmenot.com

What is wrong with a multicolumn primary key?
 

nakedfrog

No Lifer
Apr 3, 2001
62,794
18,987
136
Originally posted by: ICRS
Originally posted by: nakedfrog
Originally posted by: techfuzz
Originally posted by: nakedfrog
How much space are you really wasting, honestly, especially when storage is dirt cheap these days? What would you use that's more efficient as a PK than int/bigint?
I've also always heard that you should always have a unique PK for performance reasons.
True, storage is cheap and you're not wasting that much space with an int/bigint unless you're already bumping up against the row limit page size. Then considerations like this have to be taken into account.

I am interested to see anything that says a unique PK is good performance reasons if it is never used.

techfuzz

Free reg required, but available from bugmenot.com

What is wrong with a multicolumn primary key?

Besides being a PITA?
Wouldn't joining on a single int field be better than joining on multiple fields, especially if one of them is a char field?
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
I can't produce any physical evidence, but there might be a slight performance hit when you have a PK (int/bigint) that is not used. However, it does not necessarily mean that you have a clustered index on that key. We usually have a few tables that have PKs that are hardly used in our datawarehousing applications... we simply don't create a clustered index on that column as long as there is another column that can be used for joining on. As I said, I can't give you the evidence, but my opinions are based on huge tables that I deal with on a daily basis. Regarding the page size issue, I don't think that matters, too (unless you're dealing with varbinary-like data types)... you can always play around with things like the fill factor and set them according to your preferences (again, this is specific to SQL Server) to tweak the performance.

A composite key is definitely a pain in terms of performance and size implications... most likely, you'll have your index covering both the columns, which means a bigger performance hit when page split occurs as well as a bigger index size. Also, in the OP's case, the index can't be unique, which means more work for the underlying algorithms of the DB engine when it comes to sorting and hashing data. Having an auto-incrementing PK, which is unique, but not used is perfectly fine - many people (including me) have been burned around schemas that didn't contain such columns at later points in time when it came to maintaining the data. It is better to have a redundant PK now and use later as opposed to not have a proper PK now and then getting burned later when you need - in most such scenarios, a schema change would be ugly.

I'd also argue that your design/normalization could be flawed if you have a table whose PK is not being referenced anywhere within your schema. Unless, of course, it is a datwarehousing-type of schema where denormalization is norm. Most data warehouses have a PK that is not being used anywhere else except to perform incremental loads and things of that nature.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Originally posted by: Dhaval00
I can't produce any physical evidence, but there might be a slight performance hit when you have a PK (int/bigint) that is not used. However, it does not necessarily mean that you have a clustered index on that key. We usually have a few tables that have PKs that are hardly used in our datawarehousing applications... we simply don't create a clustered index on that column as long as there is another column that can be used for joining on. As I said, I can't give you the evidence, but my opinions are based on huge tables that I deal with on a daily basis. Regarding the page size issue, I don't think that matters, too (unless you're dealing with varbinary-like data types)... you can always play around with things like the fill factor and set them according to your preferences (again, this is specific to SQL Server) to tweak the performance.

A composite key is definitely a pain in terms of performance and size implications... most likely, you'll have your index covering both the columns, which means a bigger performance hit when page split occurs as well as a bigger index size. Also, in the OP's case, the index can't be unique, which means more work for the underlying algorithms of the DB engine when it comes to sorting and hashing data. Having an auto-incrementing PK, which is unique, but not used is perfectly fine - many people (including me) have been burned around schemas that didn't contain such columns at later points in time when it came to maintaining the data. It is better to have a redundant PK now and use later as opposed to not have a proper PK now and then getting burned later when you need - in most such scenarios, a schema change would be ugly.

I'd also argue that your design/normalization could be flawed if you have a table whose PK is not being referenced anywhere within your schema. Unless, of course, it is a datwarehousing-type of schema where denormalization is norm. Most data warehouses have a PK that is not being used anywhere else except to perform incremental loads and things of that nature.

That and asking your DBA to add a PK for a table with 800 million + rows will put you on his/her bad side :p

 

techfuzz

Diamond Member
Feb 11, 2001
3,107
0
76
Originally posted by: Dhaval00
Regarding the page size issue, I don't think that matters, too (unless you're dealing with varbinary-like data types)...
I have to work with these kind of data types occasionally, but I don't take them into consideration when it comes to page size. My databases are certainly not "mission critical" and the performance impact of the page size just isn't a consideration. I only meant to bring it up to show it could be a consideration for some people.

Having an auto-incrementing PK, which is unique, but not used is perfectly fine - many people (including me) have been burned around schemas that didn't contain such columns at later points in time when it came to maintaining the data. It is better to have a redundant PK now and use later as opposed to not have a proper PK now and then getting burned later when you need - in most such scenarios, a schema change would be ugly.
Agreed, I would prefer to have the PK available from the beginning so that later when you do need it, it's already there. I'm not really advocating against their use, I meant to show that adding an int/bigint PK just for the sake of it is stupid. If you can use another EXISTING column, by all means use it.

I'd also argue that your design/normalization could be flawed if you have a table whose PK is not being referenced anywhere within your schema.
I also agree with this point of yours. I wholeheartedly think that if your design comes up with a table without a PK after normalization, you've probably made a mistake. I'd have to pull out my db design books from school to confirm this opinion, but I'm pretty sure they would back it up.

techfuzz

 

brandonb

Diamond Member
Oct 17, 2006
3,731
2
0
Even though it may already be mentioned or mentioned in an article link to someons post.

The reason why you use an identity column for new rows in the database even if its used as a search criteria or not is because identity columns are added to the back of the database file.

In Microsoft SQL server (my last training class was 2003 so it may have changed)...

The database is stored in a series of 8k pages in the file.


8k page - 8k page - 8k page - 8k page - 8k page -> to the end of the data.

Identity makes sure that the row is added to the last page. If you don't do this, what happens if you fill up the page? SQL server has to break 2 pages apart, fill them up with data (4k on each page) then add your row. That slows down processing. Its much more effecient to just add a new blank page to the db at the end rather than copying data from one page to another and update indexes, etc.

Thats why you use identity column for your clustered index (clustered index is the main database file which should be build in ascending order like that)... All other indexes can be built off that with whatever criteria with a non clustered index.
 

JACKDRUID

Senior member
Nov 28, 2007
729
0
0
techfuzz

We have a table.

Col1: ID (Identity) Int
Col2: ForeignID Int (Foreign key from another table)
Col3: Date Datetime
Col4: Amount Money

Here Col2 and Col3 together creat a unique row. What I don't really see is why we must have Col1, which not event the software developers would have use for.


1. not true. for instance, 2 users using same login name (col2), user#1 live in california, user#2 live in east coast. user #1 enter data at 10:00am california time, user#2 enter data at 1:00pm easter time. Server based in california. it will show up 10:00am for both records.

2. software developers hate using time fields as unique identifier. reason being time datatype is implemented differently across different systems.

3. if you change foreign id, you lose the data, theoratically. Unique id (identity) is something that can never be changes, or better yet, not even visible to user. with col1, your will still be able to link to the record even if col2 has been modified.
 

ICRS

Banned
Apr 20, 2008
1,328
0
0
Originally posted by: JACKDRUID
techfuzz

We have a table.

Col1: ID (Identity) Int
Col2: ForeignID Int (Foreign key from another table)
Col3: Date Datetime
Col4: Amount Money

Here Col2 and Col3 together creat a unique row. What I don't really see is why we must have Col1, which not event the software developers would have use for.


1. not true. for instance, 2 users using same login name (col2), user#1 live in california, user#2 live in east coast. user #1 enter data at 10:00am california time, user#2 enter data at 1:00pm easter time. Server based in california. it will show up 10:00am for both records.

2. software developers hate using time fields as unique identifier. reason being time datatype is implemented differently across different systems.

3. if you change foreign id, you lose the data, theoratically. Unique id (identity) is something that can never be changes, or better yet, not even visible to user. with col1, your will still be able to link to the record even if col2 has been modified.

These are actually contract dates, it is predefined. It isn't the time the user enters the data.
 

sao123

Lifer
May 27, 2002
12,653
205
106
Originally posted by: ICRS
Originally posted by: JACKDRUID
techfuzz

We have a table.

Col1: ID (Identity) Int
Col2: ForeignID Int (Foreign key from another table)
Col3: Date Datetime
Col4: Amount Money

Here Col2 and Col3 together creat a unique row. What I don't really see is why we must have Col1, which not event the software developers would have use for.


1. not true. for instance, 2 users using same login name (col2), user#1 live in california, user#2 live in east coast. user #1 enter data at 10:00am california time, user#2 enter data at 1:00pm easter time. Server based in california. it will show up 10:00am for both records.

2. software developers hate using time fields as unique identifier. reason being time datatype is implemented differently across different systems.

3. if you change foreign id, you lose the data, theoratically. Unique id (identity) is something that can never be changes, or better yet, not even visible to user. with col1, your will still be able to link to the record even if col2 has been modified.

These are actually contract dates, it is predefined. It isn't the time the user enters the data.


A foreign key (which create a table relationship) should never be part of the primary key.
So now your primary key consists largely of a datetime field. This is just bad practice in general.