I'm building an app. To keep URLs short, I am considering a notation other than decimal to represent record IDs in the URL. A lot of sites already do this (Imgur comes to mind), but I have always wondered: how do they prevent myDomain.com/f*ck or /cu*t, etc.?
Let's explore some options:
Hexadecimal:
FFFF = 65,535 saving 1 char,
FFFFF = 1,048,575 saving 2 chars
Worst possible words to show up in a url: BED, FED, DED?
* No case sensitivity required
Base36:
zzz = 46,655 saving 2 chars
zzzz = 1,679,615 saving 3 chars
Worst possible words in 3 chars: sex, f*g, dik, with variations like f4g, s3x, d1k
Four letter possibilities: a lot of bad ones, with even more variations.
* No case sensitivity required
Base62:
ZZZ = 238,327 saving 3 chars
ZZZZ = 14,776,335 saving 4 chars
Worst possible words: all the same as Base36
* Case sensitivity required
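To double-check the figures above, here is a minimal sketch that encodes the largest value of a given length in each base and counts the characters saved against decimal. The digit ordering (0-9, then a-z, then A-Z) and the names `ALPHABET`, `encode` and `max_id` are my own assumptions, not taken from any particular site. The largest n-character value in base b is b^n - 1.

```python
import string

# Assumed digit ordering: 0-9, then a-z, then A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n, base):
    """Encode a non-negative integer using the first `base` symbols of ALPHABET."""
    if not 2 <= base <= len(ALPHABET):
        raise ValueError("base must be between 2 and 62")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, base)
        digits.append(ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def max_id(base, length):
    """Largest integer that fits in `length` characters of the given base."""
    return base ** length - 1


if __name__ == "__main__":
    for base, length in [(16, 4), (16, 5), (36, 3), (36, 4), (62, 3), (62, 4)]:
        top = max_id(base, length)
        saved = len(str(top)) - length
        print(f"base {base}: {encode(top, base)} = {top:,} (saves {saved} chars)")
```

With this ordering, hex comes out in lowercase, and the top base62 digit is `Z`.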
Keeping a list of bad words that has to be checked every time a new ID is generated would be a pain in the ass. I'm also considering letting users choose their own "URL slug" when creating a record (checking only for uniqueness), with automated IDs just falling back to something in decimal or hex. That would do double duty, since it would also help with SEO.
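For reference, the bad-word check I have in mind would look roughly like the sketch below. The blocklist entries are placeholders, the digit-to-letter table is my guess at the common substitutions, and `is_clean` and `next_clean_id` are hypothetical names. It assumes the app allocates IDs itself, so it is free to skip a number.

```python
import string

# Base36 alphabet, assumed ordering: 0-9 then a-z.
ALPHABET = string.digits + string.ascii_lowercase

# Placeholder blocklist; a real one would be much longer.
BLOCKLIST = {"sex", "dik"}

# Assumed "leetspeak" substitutions: 0->o, 1->i, 3->e, 4->a, 5->s, 7->t.
LEET = str.maketrans("013457", "oieast")


def to_base36(n):
    """Encode a non-negative integer in base36."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def is_clean(candidate):
    """True if no blocklisted word appears after lowercasing and undoing digit swaps."""
    normalized = candidate.lower().translate(LEET)
    return not any(bad in normalized for bad in BLOCKLIST)


def next_clean_id(n):
    """Return (number, slug) for the first number >= n whose base36 form is clean."""
    while True:
        slug = to_base36(n)
        if is_clean(slug):
            return n, slug
        n += 1
```

The substring test is deliberately strict, so it will also reject harmless IDs that merely contain a listed word. That is the part that worries me about maintaining it.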
How do sites like Imgur prevent this? My app is supposed to be "family friendly", as a lot of the sponsors are family-safe brands. I'd hate to have someone complain because their daughter's bookmarked event is myDomain.com/slUt.