ColdFusion Question: Parsing Strings to find different elements

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Hi Guys,

I have some strings that need to be parsed, and I'm having trouble figuring out the best way to do it. I'll try to explain what I need, and hopefully someone can point me in the right direction. My goal is to take a user-entered string that contains location info (city, state, zip) and parse each element out separately. Granted, this involves a lot of guesswork (could this string be a city name? yes/no), but I still need to have something in place. If anybody could explain in CFML code how to achieve each of the parse jobs listed below, I would be very grateful!

First, let's assume the string being worked with is a variable named 'string'


Parse Job 1 (find zip code)

-Search 'string' to find zip code.
-Find segment where first character is numeric and length = 5
-Check that all 5 characters of the segment are numeric
-set variable 'zipstring' equal to segment
-set variable 'string' equal to 'string - segment' so that future operations on string exclude zip code

ex. original string = "st. louis, MO 63101"
ex. updated string = "st. louis, MO"

Parse Job 2 (find state abbr ex. MO)

-Search 'string' to find State Abbreviation
-find segment where first char is "," (comma) and the following string is no more than 3 characters
-remove any spaces from segment: ", IL" becomes ",IL"
-remove comma from segment: ",IL" becomes "IL"


Parse Job 3 (find state abbr ex. MO)

-find segment where first char is alphabetic and length = 2
-check that all chars in the segment are alphabetic
 

MrChad

Lifer
Aug 22, 2001
13,507
3
81
Regular expressions are ideal for this. REFind or REFindNoCase will do what you're looking for.
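
As a rough, untested sketch of the state-abbreviation job: when its fourth argument is true, REFindNoCase returns a structure with pos and len arrays instead of a single position, and Mid can then pull out the matched group. The variable names loc, statematch, and state are placeholders I made up, and the pattern assumes the state always follows a comma.

```cfml
<!--- Sketch only: assumes the state abbreviation follows a comma --->
<CFSET loc = "st. louis, MO 63101">

<!--- Group 1 is the two letters; they must be followed by a non-letter or the end of the string --->
<CFSET statematch = REFindNoCase(", *([a-z]{2})([^a-z]|$)", loc, 1, true)>

<CFIF statematch.pos[1] gt 0>
<!--- Element 1 of pos/len is the whole match, so group 1 is element 2 --->
<CFSET state = UCase(Mid(loc, statematch.pos[2], statematch.len[2]))>
<CFELSE>
<CFSET state = "">
</CFIF>

<CFOUTPUT>#state#</CFOUTPUT>
```

For the sample string this should print MO. A full state name such as ", Missouri" should not match, because the two letters have to be followed by a non-letter.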
 

TechBoyJK

Thanks! Those were the functions I was leaning towards, but I'm having trouble figuring out how to phrase the expressions to parse out the data I need.

Like, how do I use REFind to search for a string of 5 numeric characters? Do I need to find the first numeric character and note its position, say 7 from the left, then look at 8 from the left (to see if it's numeric), then 9, 10, and 11, checking each spot to determine if it's numeric?
 

TechBoyJK

TY!

If I can't get this figured out by Jan 1, I might have to PayPal you a few $$ to help!! :p I really need to get this figured out because this is one of the last functions I need working before my site is ready.
 

TechBoyJK

Ok..

<CFSET ziptemp = REFIND("[0-9]{5}","St. Louis, MO 63101")>

<CFOUTPUT>#ziptemp#</CFOUTPUT>

This is just returning the place where the zip starts, '15'

So I guess I would just use 'mid', providing the string, a starting spot of 15, and an end of 20 (5 digits), and that would pull out the zip. So let's say I would set zip=mid(string).

Then if I wanted to find out what the string would be with the zip removed, I would do a replacelist on the original string to replace the 'zip' with an empty value. The leftover value would then be the original string minus the zip.
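
For the removal step on its own, this is roughly what I have in mind. It is a sketch that assumes zip already holds the five digits (hard-coded here), and it uses plain Replace rather than ReplaceList since there is only one value to swap out. The name remainder is just a placeholder.

```cfml
<CFSET string = "st. louis, MO 63101">
<!--- Assumption: the zip has already been extracted; hard-coded for this sketch --->
<CFSET zip = "63101">

<!--- Replace swaps the first occurrence by default; Trim drops the trailing space left behind --->
<CFSET remainder = Trim(Replace(string, zip, ""))>

<CFOUTPUT>#remainder#</CFOUTPUT>
```

That should leave "st. louis, MO".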

Does this sound correct?
 

TechBoyJK

Ok, I'm not sure why this isn't working. If I put in a numeric string shorter than 5 digits, it works, returning 0 for the zip, BUT if I use a numeric string longer than 5 digits, it still uses it. I need to find ONLY strings that are 5 digits in length. But shouldn't the mid function take care of that by providing a start and stop point? It's like the mid function isn't stopping when it's told to.

This below kinda works. It will return 63101 as the zip value. However, if I use 6310101, it will still return 6310101.

<CFSET string = "st. louis, MO 63101">

<CFSET zipindex = REFIND("[0-9]{5}",string)>

<CFIF zipindex gte 1>
<CFSET zip = Mid(string, zipindex, (zipindex+4))>
<CFELSE>
<CFSET zip = 0>

</CFIF>

<CFOUTPUT>#zip#</CFOUTPUT>
 

TechBoyJK

ok, if i change

<CFSET zip = Mid(string, zipindex, (zipindex+4))>

to

<CFSET zip = Mid(string, zipindex, 5)>

It works, since the third argument to Mid is the number of characters to return, not an end position.

Now the only problem I have is that if given a string

st. louis, MO 63101

I get 63101

But if I use a string that has a longer sequence of numbers, the script just takes the first 5 digits.

st. louis, MO 631012345

would yield 63101

So I guess I need to first find numeric strings that are LONGER than a zip code, and remove them from the string.
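
An alternative I might try, instead of removing the longer runs first, is to make the pattern itself require a non-digit (or the start or end of the string) on both sides of the five digits. This is an untested sketch: zipmatch is a name I made up, and I'm using RemoveChars to cut the zip out by position.

```cfml
<CFSET string = "st. louis, MO 631012345">

<!--- Group 2 is the zip; groups 1 and 3 demand a non-digit or the string's start/end on each side --->
<CFSET zipmatch = REFind("(^|[^0-9])([0-9]{5})([^0-9]|$)", string, 1, true)>

<CFIF zipmatch.pos[1] gt 0>
<!--- Element 1 of pos/len is the whole match, so group 2 is element 3 --->
<CFSET zip = Mid(string, zipmatch.pos[3], zipmatch.len[3])>
<CFSET string = Trim(RemoveChars(string, zipmatch.pos[3], zipmatch.len[3]))>
<CFELSE>
<!--- No standalone 5-digit run found; leave the string untouched --->
<CFSET zip = "">
</CFIF>

<CFOUTPUT>zip=[#zip#] string=[#string#]</CFOUTPUT>
```

With the 9-digit run above, nothing should match, so zip stays empty and the string is unchanged. With "st. louis, MO 63101" it should give zip 63101 and leave "st. louis, MO".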