SQL query question - finding the longest part of a string

Homerboy

Lifer
Mar 1, 2000
30,856
4,974
126
I'm trying to figure out a way to extract the street name from a full street address
While I know there's no 100% fool-proof way of doing this (unless the data is uniformly formatted, which is far from the case), I'm hoping to be able to extract the the street name on most of the entries in my column.
My thought is I want to pull the value of the LONGEST parsed substring in a value

So if the entry is "123 BROADWAY ST # 456"
The longest substring, based on " " being the delimiter of that string is "BROADWAY" so I want my returning value to pull "BROADWAY"

Again, I know it won't work 100% of the time "1234 ELM St." is going to end up returning "1234" but due to the way this data is arranged, I should be able to at least highlight those for the end user who is cleaning up this data.

Does that make sense? Or am I nuts in this line of thinking?
Thanks for any suggestions/help/advice you can provide.
 

purbeast0

No Lifer
Sep 13, 2001
52,870
5,740
126
I don't think that is a good way to do that. Not only does it break in your one example, but if you have a street name that is 2 words, you are screwed. Then if you have Boulevard spelled out instead of Blvd. you are also screwed.

"123 Martin Luther King Boulevard" breaks on multiple levels.

What I would do is do a delimeter on space like you were doing, then just scrap the first and last entry. Then just add all the remaining strings together with a space between and you have the street name.

Most addresses are <number> <street_name> <street_type> so that would handle multiple word addresses.

This isn't fool proof either. It breaks on a P.O. Box address, if that matters. Probably more cases too if I thought more about it.

Also, this isn't really SQL related. You'll want to just pull the entire column of street numbers then run this logic on it outside the context of SQL.

EDIT:

This is super simple logic.

JavaScript:
getStreet(addressString) {
    const split = addressString.split(' ');
    const streetNameArray = split.slice(1, split.length - 2);
    const streetName = streetNameArray.join(' ');
    return streetName;
}

There is no error checking in case split didn't have an array of at least 3, but I'm not sure how you would want to handle those cases anyways but it would be easy to implement whatever you choose.

Could probably do it via regex too but I'm far from a regex expert.
 
Last edited:

mike8675309

Senior member
Jul 17, 2013
507
116
116
Look at your data and try to see how it's formatted? USA Addresses or also Canada or Mexico?

Most USA addresses are
building number street name [street ext...apt#...suite#]
City, State, postal code

If this is a static set of data (not going to change, so you don't need to keep doing this) then review the data for the rules to apply.
Maybe start with doing a first pass where you grab the first section of the string by cutting off the first space, that gets rid of the building number.
Then look at common usages for apartments or suites. From there you could go through and eliminate those from the data.
With those gone the rules for just the street should be easier.

If the data isn't static, fix the system capturing this so it breaks up the data in the format you need, or at least applies a delimiter you can use to break it up into the component parts.
 

pauldun170

Diamond Member
Sep 26, 2011
9,133
5,072
136
I'm trying to figure out a way to extract the street name from a full street address
While I know there's no 100% fool-proof way of doing this (unless the data is uniformly formatted, which is far from the case), I'm hoping to be able to extract the the street name on most of the entries in my column.
My thought is I want to pull the value of the LONGEST parsed substring in a value

So if the entry is "123 BROADWAY ST # 456"
The longest substring, based on " " being the delimiter of that string is "BROADWAY" so I want my returning value to pull "BROADWAY"

Again, I know it won't work 100% of the time "1234 ELM St." is going to end up returning "1234" but due to the way this data is arranged, I should be able to at least highlight those for the end user who is cleaning up this data.

Does that make sense? Or am I nuts in this line of thinking?
Thanks for any suggestions/help/advice you can provide.

When you say full street address do you mean just the street or also including state and ZIP?
If it just the street, then you'll want a function that does a couple of things.

Start off with your string and replace all double spaces with single spaces to in case you got sloppy data,
Then treat the single space as a separator.
So for example
1600 Pennsylvania Avenue
becomes a array\map\list\whatever of

1. 1600
2. Pennsylvania
3. Avenue

Lets say you got some silly data that includes apartments are crap like this
B. Apartment 2, 515 George Street
You can include in you logic to Set the first position to the comma then break it out
becomes a wee array\map\list\whatever of

1. 515
3. George
3. Street


Have a lookup table or whatever the kids like to do nowadays set up with this monstrosity (every.single.street.type.including all variations)
You'll want to slap an upper() on your query on both the data you are searching against and you'll probably want to make sure your reference table is all upper case.

If the position is greater than 1 (because a street will ALWAYS have the name and type) and any of the other rows contain a value from the reference
- Whatever the position contains the street type, the position(s) prior will have the street name.
- If you only have 3 values, then you know 1 has the street number, 2 has the name, 3 has the type
- If you have more than 3, well you can be confident that the first position will contain the street number and that you need to concatenate the blocks between the first and the last with the street type that was identified.

69 Your Mom on a Tuesday lane

1. 69
2. Your
3. Mom
4. on
5. a
6. Tuesday
7. lane

First position assumed to be street number
7th position was identified as a street type
Grab everything in between
"Your Mom on a Tuesday"

What about weird ones?

21 Lane ln
1. 21
2. Lane
3. Ln

Your logic should always choose the block with the highest # in case of multiple results.

21 Jump Street Cove

So you end up with "Jump Street."

Going back to your example

123 BROADWAY ST # 456
1. 123
2. BROADWAY
3. ST
4. #
5. 456

You don't care about anything after 3 and you don't care about 1.
"BROADWAY"
 

pauldun170

Diamond Member
Sep 26, 2011
9,133
5,072
136
When you say full street address do you mean just the street or also including state and ZIP?
If it just the street, then you'll want a function that does a couple of things.

Start off with your string and replace all double spaces with single spaces to in case you got sloppy data,
Then treat the single space as a separator.
So for example
1600 Pennsylvania Avenue
becomes a array\map\list\whatever of

1. 1600
2. Pennsylvania
3. Avenue

Lets say you got some silly data that includes apartments are crap like this
B. Apartment 2, 515 George Street
You can include in you logic to Set the first position to the comma then break it out
becomes a wee array\map\list\whatever of

1. 515
3. George
3. Street


Have a lookup table or whatever the kids like to do nowadays set up with this monstrosity (every.single.street.type.including all variations)
You'll want to slap an upper() on your query on both the data you are searching against and you'll probably want to make sure your reference table is all upper case.

If the position is greater than 1 (because a street will ALWAYS have the name and type) and any of the other rows contain a value from the reference
- Whatever the position contains the street type, the position(s) prior will have the street name.
- If you only have 3 values, then you know 1 has the street number, 2 has the name, 3 has the type
- If you have more than 3, well you can be confident that the first position will contain the street number and that you need to concatenate the blocks between the first and the last with the street type that was identified.

69 Your Mom on a Tuesday lane

1. 69
2. Your
3. Mom
4. on
5. a
6. Tuesday
7. lane

First position assumed to be street number
7th position was identified as a street type
Grab everything in between
"Your Mom on a Tuesday"

What about weird ones?

21 Lane ln
1. 21
2. Lane
3. Ln

Your logic should always choose the block with the highest # in case of multiple results.

21 Jump Street Cove

So you end up with "Jump Street."

Going back to your example

123 BROADWAY ST # 456
1. 123
2. BROADWAY
3. ST
4. #
5. 456

You don't care about anything after 3 and you don't care about 1.
"BROADWAY"

Posting to old thread because apparently I'm bored