Text Parser

Tekari · Mar 15, 2006

I have a massive text file at work that is set up as one long string of text. Basically, it's a plain text file you can open in Notepad, containing a bunch of different accounts run together in a single string.

We are searching the data for accounts that contain a certain string sequence, say JK101. Once we find JK101, we have to backtrack to find the start and end of the account. JK101 is always in the middle, and every account starts and ends with CP00. But because they are all in one giant string and the file is HUGE, it would take a lot of man-hours to CTRL-F through each match individually to find all the data.

Does anyone know of a program that can go through the text and, whenever it finds JK101, copy that account out to another Notepad/Word/Excel/whatever document?

If I knew how to program I'm sure it wouldn't be too difficult, but in my searching on Google so far I have not found a parser that lets me specify the start and end markers I would need.

Any help would be greatly appreciated.

Thanks a bunch!

EagleKeeper · Mar 15, 2006

Check out sourceforge.net, twocows.com or other similar sites. Many people post small utilities on those sites that may do the trick.

It should take only a couple of hours to throw together a package that will support your requirements.

It may be better to get the information into some tool that is much more readable and maintain it from there.

kamper · Mar 15, 2006

Sounds like it would be fairly simple to implement in perl. If you post more specific details of exactly what the data can be (and a sample), I'm sure someone here can whip something up in no time. I'm just a perl beginner, but I'd give it a whirl if it's as simple as it sounds.

agnitrate · Mar 15, 2006

Is the file format like this:

CP00 asdfasfdzxczxcvzxcvJK101
CP00
CP00
adsfasdfscvzxcvlkasfjasdklfjJK101adsfasdCP00
CP00
asdflasflasldfasdlkfasfdsafdaJK101asdfasdfdasfCP00

?

Is there a unique CP00 for each account or is it like this:

CP00oneadsfdasfafasJK101dsfasdlfasCP00
asdfalkfdafdsJK101bbbbbbbCP00asdfadsfdsafdsafadsJK101afafasdfaasCP00

?

agnitrate · Mar 15, 2006

Assuming it's the former, where they are delineated properly, the following code works. Run it as test.pl file_you_want_to_parse and it will write each account to a file called outputX, where X is the number of the account.

I'm a perl newbie, so I bet notfred will have a much nicer solution. This is just a free bump with a potential solution.
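
The script from this post did not survive in the thread. As a rough stand-in, and explicitly not agnitrate's original Perl, here is a minimal Python sketch of the behavior described above. It assumes the first format: every account opens with its own CP00 and closes with its own CP00, and CP00 never appears inside an account. The names split_accounts and write_accounts are hypothetical, chosen only for this illustration.

```python
import re
from pathlib import Path

# Assumption: each account starts with its own CP00 and ends with its own CP00.
# The non-greedy .*? stops at the first closing CP00; DOTALL lets an account
# span line breaks, as in the sample format.
ACCOUNT = re.compile(r"CP00.*?CP00", re.DOTALL)


def split_accounts(text):
    """Return every CP00...CP00 span found in text, in order of appearance."""
    return ACCOUNT.findall(text)


def write_accounts(accounts, out_dir="."):
    """Write account number N to a file named outputN; return the paths written."""
    paths = []
    for n, account in enumerate(accounts, start=1):
        path = Path(out_dir) / f"output{n}"
        path.write_text(account)
        paths.append(path)
    return paths
```

To try it, something like write_accounts(split_accounts(Path("file_you_want_to_parse").read_text())) should do. If accounts share a single CP00 between them (the second format), this pattern would pair up the wrong delimiters, so it only fits the first case.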

notfred · Mar 15, 2006

It's amazing how BAD some of the people on this forum are with software considering they're the IT people at actual businesses. I know I sound like a complete asshole, but come on, these are simple problems. You can't find all the lines of text containing a certain string? Can I have your jobs?

One line in Unix. Replace "file.txt" with the name of your file.

cat file.txt | perl -e 'while(<> ){s/CP00/\nCP00/g; print}' | grep JK101

EagleKeeper · Mar 15, 2006

Originally posted by: notfred
It's amazing how BAD some of the people on this forum are with software considering they're the IT people at actual businesses. I know I sound like a complete asshole, but come on, these are simple problems. You can't find all the lines of text containing a certain string? Can I have your jobs?

One line in Unix. Replace "file.txt" with the name of your file.

cat file.txt | perl -e 'while(<> ){s/CP00/\nCP00/g; print}' | grep JK101

Will that run on a Windows platform in some form? The OP referenced "Notepad", implying that it may be Windows.

Also, your comments on the solution indicate "lines of text".
The OP states one line of text.

Is it still valid given my two questions?

notfred · Mar 15, 2006

Originally posted by: EagleKeeper

Will that run on a Windows platform in some form? The OP referenced "Notepad", implying that it may be Windows.

Also, your comments on the solution indicate "lines of text".
The OP states one line of text.

Is it still valid given my two questions?

It'll run on Windows with Cygwin installed.
It creates lines of text by inserting a newline character before each occurrence of "CP00", so each account starts on its own line and grep can pick out the lines containing JK101.
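
For anyone on Windows without Cygwin, the same idea can be sketched in Python instead of Perl and grep. This is only an approximation of the one-liner, not a replacement tested on the OP's data. The function name accounts_containing and its parameters are hypothetical.

```python
def accounts_containing(text, marker="JK101", delimiter="CP00"):
    """Mimic the one-liner: break the text before every delimiter,
    then keep only the resulting lines that contain the marker."""
    # Equivalent of s/CP00/\nCP00/g: every delimiter now starts a new line.
    lines = text.replace(delimiter, "\n" + delimiter).split("\n")
    # Equivalent of grep JK101: keep the lines that mention the marker.
    return [line for line in lines if marker in line]
```

A possible way to use it is to read the file with pathlib.Path("file.txt").read_text(), pass the result in, and write "\n".join(...) of the returned list to a new file. Two caveats carry over from the one-liner: the whole file is held in memory at once, and each account's closing CP00 ends up at the start of the next line rather than at the end of the account.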

Kaeishiwaza · Mar 15, 2006

Dag Notfred beat me to it. I woulda done 3 lines of perl instead of using UNIX commands, but his solution is about as clean as it gets.

Tekari · Mar 15, 2006

Thanks a bunch guys, that's exactly what I needed. As a disclaimer though, I am not an IT guy. I have nothing to do with anything like that, which is the reason I came here to ask: I knew most of the people on this forum were very fluent in all of this stuff. I know really basic stuff, but am just trying to help out everyone at work by creating a faster solution. Thanks again though, much appreciated!
