Creating a unique identifier based on a string?

uclabachelor

Senior member
Nov 9, 2009
448
0
71
This question mainly applies to PHP, but can be generalized to any programming language.

Given any string of any length, are there any methods available that guarantee a fixed-length, alphanumeric, unique identifier derived from that string?

I know base64 encoding does the trick, but its output is variable length.

CRC32 and MD5 hashes are both fixed-length; they don't guarantee uniqueness, but they do a pretty good job.
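To make the fixed-length versus variable-length point concrete, here is a minimal sketch in Python (the thread is about PHP, so treat the language choice and the helper name `candidate_ids` as assumptions made for illustration). It computes CRC32, MD5, and base64 for the same input so the output lengths can be compared.

```python
import base64
import hashlib
import zlib


def candidate_ids(text: str) -> dict:
    """Return three identifiers derived from the same string (illustrative helper)."""
    data = text.encode("utf-8")
    return {
        # CRC32 is a 32-bit checksum: always 8 hex characters.
        "crc32": format(zlib.crc32(data), "08x"),
        # MD5 is a 128-bit digest: always 32 hex characters.
        "md5": hashlib.md5(data).hexdigest(),
        # base64 is reversible, so its length grows with the input.
        "base64": base64.b64encode(data).decode("ascii"),
    }


short = candidate_ids("a")
long_ = candidate_ids("a" * 1000)
for name in ("crc32", "md5", "base64"):
    print(name, len(short[name]), len(long_[name]))
```

The two hash columns stay the same length for both inputs, while the base64 length scales with the input. That reversibility is exactly why base64 can be unique and the hashes cannot.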
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Out of curiosity, what do you need the guaranteed uniqueness part for?

For most applications close-to-unique and deterministic (string 'a' always creates hash 'b') is good enough.
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
There is no way to have a fixed-length key that is completely unique based on a variable-length string (unless the string is shorter than the fixed length).

Look up hashing algorithms. If you need something as close to unique as possible, look at cryptographic hash algorithms. If you just want speed and fairly good uniqueness, a regular hashing algorithm should do the trick.
 

esun

Platinum Member
Nov 12, 2001
2,214
0
0
What you're asking for is literally impossible. Consider:

There are infinitely many strings that qualify as "any string of any length."

There is a finite number of alphanumeric identifiers of a fixed length.

Therefore, you are trying to create a one-to-one mapping from an infinite set to a finite set. This is not possible. If it's not clear to you why, read this:

http://en.wikipedia.org/wiki/Injective_function

In particular, this statement:

"If f : X → Y is an injective function, then Y has at least as many elements as X."
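The same argument can be watched in action on a small scale. The sketch below is only an illustration, and the two-character identifier and the function names are assumptions chosen to keep the search short: a 2-hex-character ID has 256 possible values, while there are 702 lowercase strings of length 1 or 2, so at least two of them must share an ID.

```python
import hashlib
import string
from itertools import product


def tiny_id(text: str) -> str:
    """Fixed-length ID with only 256 possible values (first 2 hex chars of MD5)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:2]


def find_collision():
    """Search the 26 + 26*26 = 702 short strings for two that share an ID."""
    seen = {}
    for length in (1, 2):
        for chars in product(string.ascii_lowercase, repeat=length):
            candidate = "".join(chars)
            key = tiny_id(candidate)
            if key in seen:
                return seen[key], candidate, key
            seen[key] = candidate
    return None  # unreachable: 702 inputs cannot fit into 256 distinct IDs


collision = find_collision()
print(collision)
```

Making the identifier longer only pushes the first collision further out. It never removes it, because the set of inputs stays infinite.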
 

Red Squirrel

No Lifer
May 24, 2003
71,335
14,092
126
www.anyf.ca
MD5 is probably the best bet for your needs. Note that any given MD5 hash can be produced by an unlimited number of different input strings, but in practice it's fairly "unique".

If you want it 100% unique, then your identifier would have to be variable length. Basically you'd just be encrypting/encoding the string.
 

Argo

Lifer
Apr 8, 2000
10,045
0
0
SHA will provide fewer collisions. OP, what are you trying to do?
 

uclabachelor

Senior member
Nov 9, 2009
448
0
71
I have a dataset containing a few million records, and each one needs a unique id assigned, preferably an id that is randomly generated and fixed length. The ISBN of a book is a perfect example of what I'm trying to do, except more randomized.

I was going to do something like:

var a = generateRandomKey();

// retry until the key is not already in the database
while (databaseExists(a))
    a = generateRandomKey();

save(a);

The number of possibilities would be X^Y, where X is the number of characters in the character set and Y is the key length.

So for X = 36 (uppercase letters and 0-9) and Y = 10, there are about 3.6 x 10^15 possibilities, which is quite a long way to go before the chance of duplicates becomes significant.
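As a rough check on that estimate, the chance of at least one duplicate among n random keys can be approximated with the standard birthday bound, 1 - exp(-n(n-1)/(2N)), where N = X^Y. The sketch below assumes 3 million records as a stand-in for "a few million"; that figure and the function name are illustrative assumptions, not numbers from the thread.

```python
import math


def collision_probability(n_records: int, alphabet_size: int, key_length: int) -> float:
    """Birthday-bound approximation of P(at least one duplicate key)."""
    space = alphabet_size ** key_length  # N = X^Y possible keys
    exponent = n_records * (n_records - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


key_space = 36 ** 10
p = collision_probability(3_000_000, 36, 10)
print(f"key space: {key_space:.3e}, P(duplicate) ~ {p:.4%}")
```

Under these assumptions the probability of any duplicate across the whole dataset comes out at roughly 0.1%. That is small but not zero, so the existence check in the loop above is still worth keeping.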

The generateRandomKey() function is the function I'm looking for... as of right now, it's a truncated md5/sha1/uniqid/crc32 hash of a random string/number.
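One possible shape for generateRandomKey(), sketched in Python rather than PHP. It draws characters directly from the 36-character set instead of truncating a hash. The in-memory set stands in for the database, and the names `generate_random_key` and `assign_unique_key` are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # X = 36 characters


def generate_random_key(length: int = 10) -> str:
    """Pick `length` characters uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def assign_unique_key(existing: set, length: int = 10) -> str:
    """Retry until the key is unused, then record it (set = stand-in for the DB)."""
    key = generate_random_key(length)
    while key in existing:
        key = generate_random_key(length)
    existing.add(key)
    return key


used = set()
keys = [assign_unique_key(used) for _ in range(1000)]
print(keys[:3])
```

With a real database, a unique constraint on the id column plus a retry on failure would likely be safer than check-then-insert, since two writers could otherwise pass the existence check at the same moment.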
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
How about this? (trivially obeys uniqueness requirements)
Code:
static int lastKey = 0;  /* last identifier handed out */

int generateRandomKey() {
  return lastKey++;  /* sequential, so no value repeats */
}
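If a sequential counter like this is acceptable, it can still be rendered as a fixed-length alphanumeric string by writing it in base 36 and zero-padding. This is a sketch under that assumption; `encode_counter` and `BASE36_DIGITS` are illustrative names, and the result is unique but not random.

```python
import string

BASE36_DIGITS = string.digits + string.ascii_uppercase  # 0-9 then A-Z


def encode_counter(n: int, length: int = 10) -> str:
    """Render a non-negative counter as a zero-padded base-36 string."""
    base = len(BASE36_DIGITS)
    if n < 0 or n >= base ** length:
        raise ValueError("counter does not fit in the requested length")
    digits = []
    for _ in range(length):
        n, remainder = divmod(n, base)
        digits.append(BASE36_DIGITS[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_counter(0))   # 0000000000
print(encode_counter(35))  # 000000000Z
print(encode_counter(36))  # 0000000010
```

Because the encoding is one-to-one over the counter range, uniqueness holds by construction and no lookup loop is needed. The trade-off is that ids are predictable.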
 

Argo

Lifer
Apr 8, 2000
10,045
0
0
Read up on UUIDs; that should be exactly what you need. I tend to base64-encode them, which gives a 22-character, human-readable string once the trailing "==" padding is dropped.
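A minimal sketch of that idea in Python. The choice of a random (version 4) UUID and of the URL-safe base64 alphabet are assumptions, since the post does not say which variants are used, and `short_uuid` is a hypothetical name.

```python
import base64
import uuid


def short_uuid() -> str:
    """Encode a random UUID's 16 raw bytes as 22 base64 characters."""
    raw = uuid.uuid4().bytes  # 16 bytes = 128 bits
    # 16 bytes encode to 24 base64 characters ending in "=="; strip the padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


example = short_uuid()
print(example, len(example))
```

Note that base64 output can contain two non-alphanumeric characters ("-" and "_" in the URL-safe variant, "+" and "/" in the standard one), so this is not strictly alphanumeric. A base-36 or base-62 encoding of the same bytes would be, at the cost of a slightly longer string.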