regex help - find replace issue

cirrrocco · Jun 13, 2011

Hey guys, I need help with some regex. I have tried using some advanced find and replace, but it doesn't seem to help.

So I have this:

<html>
content
</html>
<doctype>
<html>
Crap content
</html>

I want to be able to remove everything below the first </html> for a bunch of files.

I tried using BBEdit and Notepad++, and I am now not sure whether a regular find and replace can do this.

I checked online and found that regex could possibly help. Is there a search pattern that matches everything from

</html>
<doctype>
to End of file

and replace with
</html>

Thanks a lot for your help

PhatoseAlpha · Jun 13, 2011

Should be as simple as a properly escaped </html><doctype> followed by .* The dot (any character) will start eating, and since the repeat operator (*) is greedy, it will eat up the entire rest of the file. Then just replace that match with </html>.

Wait, no, you'll need to group the dot with a newline character, since dot doesn't match newlines by default, then star-repeat the group. The pattern also has to allow for the line break between </html> and <doctype>.
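
For anyone who wants to try the idea outside an editor, here is a minimal sketch in Python (my choice of language, not something anyone in the thread used). The helper name `strip_after_html` is made up for illustration, and it assumes only whitespace sits between the first `</html>` and the stray `<doctype>`.

```python
import re

# Closing tag, optional whitespace, the stray <doctype>, then everything up to
# the end of the input. (?:.|\n)* is the "dot grouped with a newline" trick,
# repeated greedily.
TRAILING_JUNK = re.compile(r"</html>\s*<doctype>(?:.|\n)*", re.IGNORECASE)


def strip_after_html(text):
    """Hypothetical helper: cut everything after the closing tag that precedes <doctype>."""
    return TRAILING_JUNK.sub("</html>", text, count=1)


sample = "<html>\ncontent\n</html>\n<doctype>\n<html>\nCrap content\n</html>\n"
print(strip_after_html(sample))
```

Passing `re.DOTALL` and writing a plain `.*` should behave the same way here; the explicit group just mirrors the description above. Text with no `<doctype>` after `</html>` is left untouched.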

Markbnj · Jun 13, 2011

Why use a regex for this? I'm curious. Looks like a simple line-oriented file scan to me. Read from one file, scan each line for the tags you need, and write to another file. When you find the last tag you need, close the second file, delete the first, and rename the second.
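
A rough sketch of that line-oriented scan, in Python purely for illustration. The function name `truncate_after_closing_html` is hypothetical, and the sketch assumes the files are UTF-8 and that the first `</html>` sits on its own line (anything after it on the same line would be kept).

```python
import os
import sys
import tempfile


def truncate_after_closing_html(path):
    """Hypothetical helper: keep lines up to and including the first </html>."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the copy next to the original so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    # newline="" keeps the original line endings untouched on read and write.
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst, \
            open(path, encoding="utf-8", newline="") as src:
        for line in src:
            dst.write(line)
            if "</html>" in line.lower():
                break  # first closing tag written; skip the rest of the file
    os.replace(tmp_path, path)  # swap the cleaned copy in over the original


if __name__ == "__main__":
    # Usage (assumed): python truncate.py page1.html page2.html ...
    for name in sys.argv[1:]:
        truncate_after_closing_html(name)
```

Note that `tempfile.mkstemp` creates the copy with restrictive permissions, so the cleaned file may not keep the original's mode; back up the files before running anything like this on a real batch.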

Ken g6 · Jun 13, 2011

Code:

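perl -n -i~ -e "print unless($ended);$ended=1 if(/<\/html>/i);$ended=0 if(eof);" [files]

Replace the double quotes with single quotes on Linux, so the shell doesn't expand $ended. The eof check resets the flag at the end of each file; without it, every file after the first would come out empty.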

play · Dec 2, 2011

Sorry about the bump, but I was looking through some old questions and saw this one was incomplete.

I tested this regex in PHP; it does exactly what cirrrocco requested:

Code:

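<?php
// Sample input: good content, then the stray <doctype> and everything after it.
$sub='blah blah blah</html><doctype>blah';
// (?s) lets . match newlines; \W* allows whitespace or other non-word characters between the tags.
$sub=preg_replace('%(?s)</html>\W*<doctype>.*%','</html>',$sub);
echo $sub."<br /><br />";
?>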
