How to return value of matching regex instead of entire line?

Given a sample of my log file that is constantly written to:

11/07/2011 12:09:20.115783, x5994, End PG01 (panel TIMG093) Txntime 0.411531 sec, pid 2864241
11/07/2011 12:09:20.596116, x5994, Begin PG01, pid 2295144
11/07/2011 12:09:20.750298, x11kxk, Begin PG01, pid 2213044
11/07/2011 12:09:20.830860, x4026, Begin PM01, pid 681707
11/07/2011 12:09:21.050209, x11kxk, End PG01 (panel TIMG020) Txntime 0.299911 sec, pid 2213044
11/07/2011 12:09:21.291839, x10913, Begin PD20, pid 558718
11/07/2011 12:09:21.354671, x4026, End PM01 (panel TIMM010) Txntime 0.523811 sec, pid 681707
11/07/2011 12:09:21.613560, x10913, End PD20 (panel TIMD20G) Txntime 0.321721 sec, pid 558718
11/07/2011 12:09:21.661077, x11epi, Begin PD08, pid 599314
11/07/2011 12:09:21.871423, x5994, End PG01 (panel TIMG093) Txntime 1.275307 sec, pid 2295144
11/07/2011 12:09:21.923036, x4797, Begin PX27, pid 2327543
11/07/2011 12:09:22.091868, x4835, Begin PASS, pid 625412
11/07/2011 12:09:22.198136, x11epi, End PD08 (panel TIMD081) Txntime 0.537059 sec, pid 599314
11/07/2011 12:09:22.379719, x5994, Begin PG01, pid 710105
11/07/2011 12:09:22.526667, x4797, End PX27 (panel TIMX270) Txntime 0.603631 sec, pid 2327543
11/07/2011 12:09:22.964977, x5994, End PG01 (panel TIMG093) Txntime 0.585258 sec, pid 710105
11/07/2011 12:09:23.426903, x10913, Begin PD20, pid 494923
11/07/2011 12:09:23.884180, x4026, Begin PM01, pid 542721
11/07/2011 12:09:23.888926, x11epi, Begin PD08, pid 740103
11/07/2011 12:09:23.935553, x4835, End PASS (panel ) Txntime 1.843685 sec, pid 625412
I want to return the panel ID (TIMX270, TIMD081, etc.). I've made it this far.

tail -f $infile | sed -n '/TIM[A-Z][0-9][0-9][0-9]/p'
But this prints each matching line in full, like this:

11/07/2011 11:56:47.652304, x6278, End PM10 (panel TIMM100) Txntime 0.681214 sec, pid 1739722
11/07/2011 11:56:47.753127, x8172, End PD03 (panel TIMD030) Txntime 0.372174 sec, pid 1798617
11/07/2011 11:56:47.975405, x8172, End PX00 (panel TIMX000) Txntime 0.198091 sec, pid 1588831
11/07/2011 11:56:48.655058, x6188, End PM10 (panel TIMM100) Txntime 0.804459 sec, pid 2006503
11/07/2011 11:56:50.978475, x5641, End PN33 (panel TIMN331) Txntime 0.470516 sec, pid 1459044
11/07/2011 11:56:51.500535, x7600, End PM10 (panel TIMM102) Txntime 0.451346 sec, pid 1564634
11/07/2011 11:56:51.609356, x11kxk, End PG01 (panel TIMG020) Txntime 0.394369 sec, pid 1782118
11/07/2011 11:56:51.888329, x4797, End PD08 (panel TIMD081) Txntime 1.062310 sec, pid 1568333
My desired output is:

TIMM102
TIMG020
TIMD081
...
I tried piping the output to cut, but it didn't work (the panel ID is in the same character location). I would prefer to do this within awk/sed/perl so that it stays robust. In the future we will be doing the same thing for IP addresses.
 
I'm not sure which language you're scripting in or what it can do, but the general approach is:

  • For each line, locate the TIM string.
  • Grab the next 4 characters and build an output string that you can work with.
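The two steps above can be sketched in plain POSIX shell. This is only an illustration under assumptions: `panel_from_line` is a hypothetical helper name, and it assumes the ID is always "TIM" plus exactly four more characters.

```shell
# panel_from_line is a hypothetical helper: it prints the panel ID found in
# its first argument, or nothing if the line contains no "TIM".
panel_from_line() {
    case $1 in
        *TIM*)
            rest=${1#*TIM}    # step 1: everything after the first "TIM"
            # step 2: keep the next 4 characters and put "TIM" back in front
            printf 'TIM%s\n' "$(printf '%s' "$rest" | cut -c1-4)"
            ;;
    esac
}

panel_from_line '11/07/2011 11:56:51.500535, x7600, End PM10 (panel TIMM102) Txntime 0.451346 sec, pid 1564634'
```

Calling a shell function per line is slow on a busy log, so this is better read as a picture of the idea than as a replacement for a sed or perl one-liner.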
 
Code:
tail -f $infile | grep "TIM[A-Z][0-9][0-9][0-9]" | sed -e 's/^.*\(TIM[A-Z][0-9][0-9][0-9]\).*$/\1/'

Beware of lines that contain two TIM* matches: sed doesn't do non-greedy wildcards, so you'll get the last one.

Edit: Did you want unique entries? That's pretty easy to do with a Perl hash.
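To see the greedy behaviour in action, here is a small sketch. The input line is made up for illustration (the real log never shows two panel IDs on one line), and `last_id` is a hypothetical helper name.

```shell
# Hypothetical line with two panel IDs, for illustration only.
line='End PG01 (panel TIMG093) then (panel TIMX270) done'

# The leading .* is greedy: it consumes as much as it can while still
# letting the rest of the pattern match, so the group captures the LAST ID.
last_id() {
    sed -e 's/^.*\(TIM[A-Z][0-9][0-9][0-9]\).*$/\1/'
}

printf '%s\n' "$line" | last_id
```

This prints TIMX270, not TIMG093.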

Edit2:
Code:
tail -f $infile | sed -n 's/^.*\(TIM[A-Z][0-9][0-9][0-9]\).*$/\1/p'
Huh, sed and grep in one command. Learn something new every day!
 
Ken, your code is close to something I attempted earlier. I had the .* and \1/ in a previous revision, but I don't really understand what they mean. I thought the \1/ said to output only the matched value. But the .* and the backslash-escaped parentheses are blowing my mind.

I can't get it to work and I'm not sure why there is no output.
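One likely explanation for the missing output (an assumption, since the exact command that was run isn't shown): when grep writes to a pipe instead of a terminal, it normally collects its output in a large buffer, so on a slowly growing `tail -f` stream nothing reaches sed for a long time. GNU and BSD grep accept `--line-buffered` to flush after every line. A minimal sketch, with `follow_panels` as a hypothetical helper name:

```shell
# follow_panels is a hypothetical helper that reads log lines on stdin.
# --line-buffered (GNU and BSD grep) makes grep flush after every line, so
# matches reach sed immediately instead of waiting for a full buffer.
follow_panels() {
    grep --line-buffered 'TIM[A-Z][0-9][0-9][0-9]' |
        sed -e 's/^.*\(TIM[A-Z][0-9][0-9][0-9]\).*$/\1/'
}

# Intended use on the live log:
#   tail -f "$infile" | follow_panels
```

Dropping grep from the pipeline altogether avoids the problem as well, because sed then writes straight to the terminal.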

EDIT:

Never mind, I tried removing the grep and it sort of works. Now my output looks like this:

TIMG015
11/07/2011 14:14:52.564827, x6869, Begin PM10, pid 1220124
TIMD200
TIMP210
11/07/2011 14:14:52.805283, x5424, Begin PM10, pid 1017144
TIMI061
11/07/2011 14:14:52.894918, x6869, End PM10 (panel TIMM10B) Txntime 0.330091 sec, pid 1220124
11/07/2011 14:14:53.365499, x5424, End PM10 (panel TIMM10A) Txntime 0.560216 sec, pid 1017144
11/07/2011 14:14:53.372030, x5424, Begin PX00, pid 1013044
TIMX000
11/07/2011 14:14:54.096242, x3882, Begin PX28, pid 1039242
11/07/2011 14:14:54.250906, x7564, Begin PD20, pid 1438901
11/07/2011 14:14:54.426720, x9706, Begin PP20, pid 1439001
Is this happening because the script can't keep up? EDIT again: never mind, I changed my regex to include [0-9A-Z].

Still not sure about the other lines, though...
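The leftover lines are most likely not a speed problem. Without grep in front, sed's `s` command only rewrites lines where the pattern matches and prints every other line unchanged. The sketch below contrasts that with the `-n` option plus `p` flag, using three lines copied from the samples in this thread; `without_n` and `with_n` are hypothetical helper names.

```shell
sample='11/07/2011 14:14:52.564827, x6869, Begin PM10, pid 1220124
11/07/2011 14:14:52.894918, x6869, End PM10 (panel TIMM10B) Txntime 0.330091 sec, pid 1220124
11/07/2011 11:56:47.975405, x8172, End PX00 (panel TIMX000) Txntime 0.198091 sec, pid 1588831'

# Default sed: every input line is printed, rewritten or not.
without_n() { sed -e 's/^.*\(TIM[A-Z][0-9][0-9][0-9]\).*$/\1/'; }

# -n suppresses automatic printing; the p flag prints a line only when the
# substitution actually happened on it.
with_n() { sed -n 's/^.*\(TIM[A-Z][0-9][0-9][0-9]\).*$/\1/p'; }

printf '%s\n' "$sample" | without_n   # 3 lines: two untouched, then TIMX000
printf '%s\n' "$sample" | with_n      # 1 line: TIMX000
```

The TIMM10B line passes through untouched in the first case because its last character is a letter, which `[0-9]` does not match.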
 
\(.*\) means:
\( starts a capturing group.
.* matches any run of characters, as many as possible.
\) ends the capturing group.
\1 in the replacement stands for whatever the first capturing group matched.
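A tiny example of groups and back-references, separate from the log problem. `swap_fields` is a hypothetical name, and the input is just the middle of one sample log line.

```shell
# sed's basic regex syntax: \( and \) delimit a capturing group; \1 and \2
# in the replacement refer to the text matched by the first and second group.
swap_fields() {
    sed -e 's/^\(.*\), \(.*\)$/\2, \1/'
}

printf '%s\n' 'x5994, Begin PG01' | swap_fields
```

This prints "Begin PG01, x5994": the whole line is matched, and the replacement writes the two captured pieces back in the opposite order.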

I've been editing my previous post and added a better version at the end. For unique entries with Perl:
Code:
tail -f $infile | perl -ne 'if(/TIM[A-Z][0-9][0-9][0-9]/) { $seen{$&} || print $&, "\n"; $seen{$&} = 1;}'
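For comparison, roughly the same idea can be written in awk. This is an alternative sketch, not Ken's method: an awk array plays the role of the Perl hash, and `unique_panels` is a hypothetical helper name.

```shell
sample='11/07/2011 11:56:47.652304, x6278, End PM10 (panel TIMM100) Txntime 0.681214 sec, pid 1739722
11/07/2011 11:56:47.753127, x8172, End PD03 (panel TIMD030) Txntime 0.372174 sec, pid 1798617
11/07/2011 11:56:48.655058, x6188, End PM10 (panel TIMM100) Txntime 0.804459 sec, pid 2006503'

# unique_panels is a hypothetical helper that reads log lines on stdin.
# match() sets RSTART and RLENGTH to the position and length of the match;
# seen[] remembers which IDs have already been printed.
unique_panels() {
    awk 'match($0, /TIM[A-Z][0-9][0-9][0-9]/) {
             id = substr($0, RSTART, RLENGTH)
             if (!seen[id]++) print id
         }'
}

printf '%s\n' "$sample" | unique_panels
```

TIMM100 appears twice in the sample but is printed only once, followed by TIMD030.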
 
Not quite as elegant as a one-liner, but this perl script should do the trick.

Code:
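use strict;
use warnings;

my $filein = "new";
open (my $fh, "<", $filein) or die "Unable to open $filein: $!\n";
my $fileout = "test.txt";
open (my $fo, ">", $fileout) or die "Unable to open $fileout: $!\n";

# Read one line at a time instead of loading the whole file into a list.
while (my $line = <$fh>)
{
    next if $line =~ /Begin/i;               # skip the "Begin" lines
    $line =~ s/^.+panel\s(.*?)\).*$/$1/i;    # keep only the text between "panel " and ")"
    print $fo $line;
}
close($fo) or die "Unable to close $fileout: $!\n";
Just set $filein and $fileout to whatever files you want to read/write. You will get a blank line for the "PASS" line, because its panel field is empty, but everything else should work fine.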
 
@Ken

Code:
(/TIM[A-Z][0-9][0-9][0-9]/)

Won't work too well because some of the codes have a letter in the last slot.

Code:
(/TIM\w{4}/)
or
(/TIM[A-Z0-9]{4}/)

Would probably work out better. Also, I couldn't get your code to run 🙁
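Carried over to the sed command used earlier in the thread, the bracket form might look like the sketch below. Assumptions: `panel_ids_any` is a hypothetical name, and the `\{4\}` repeat count is written in sed's basic regex syntax. `\w` is left out because it is an extension that not every sed supports.

```shell
sample='11/07/2011 14:14:52.894918, x6869, End PM10 (panel TIMM10B) Txntime 0.330091 sec, pid 1220124
11/07/2011 14:14:53.372030, x5424, Begin PX00, pid 1013044
11/07/2011 11:56:47.975405, x8172, End PX00 (panel TIMX000) Txntime 0.198091 sec, pid 1588831'

# [A-Z0-9]\{4\} accepts a letter or a digit in each of the four positions
# after "TIM", so IDs such as TIMM10B are matched too.
panel_ids_any() {
    sed -n 's/^.*\(TIM[A-Z0-9]\{4\}\).*$/\1/p'
}

printf '%s\n' "$sample" | panel_ids_any
```

For these three lines it prints TIMM10B and TIMX000 and skips the Begin line.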
 
Any of them. Are you just running it from the command line? I replaced $infile (or whatever it was) with the name of a test file. More than likely it's a PEBKAC error on my part 😀
 