regex help - find replace issue

cirrrocco

Golden Member
Sep 7, 2004
Hey guys, I need help with some regex. I have tried some advanced find and replace, but it doesn't seem to help.

So I have this:

<html>
content
</html>
<doctype>
<html>
Crap content
</html>

I want to remove everything after the first </html> in a bunch of files.

I tried BBEdit and Notepad++, and I am now not sure that regular find and replace can do this.

I checked online and found that regex could possibly help. Is there a search pattern that can find

</html>
<doctype>
through to the end of the file

and replace it with
</html>

Thanks a lot for your help
 

PhatoseAlpha

Platinum Member
Apr 10, 2005
Should be as simple as a properly escaped </html><doctype> followed by .* The dot (any character) will start eating, and since the repeat operator (*) is greedy, it will eat up the entire rest of the file. Then just replace that match with </html>.

Wait, no, the dot doesn't match newline characters, so you'll need to group the dot with a newline in an alternation, something like (.|\n), and then star-repeat the group. You'll also need to allow for the line break between </html> and <doctype>.
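
To make the dot-versus-newline point concrete, here is a rough sketch in Python (not one of the tools mentioned in the thread, so treat it as an illustration only). The sample string is modeled on the question, and the \s* between the two tags is my own assumption to absorb the line break.

```python
import re

# Sample text modeled on the question; real files would be read from disk.
text = "<html>\ncontent\n</html>\n<doctype>\n<html>\nCrap content\n</html>\n"

# Assumption: \s* absorbs the line break between </html> and <doctype>.
pattern = r"</html>\s*<doctype>.*"

# Without DOTALL, "." stops at the first newline, so only the <doctype> line is removed.
partial = re.sub(pattern, "</html>", text)

# With DOTALL, "." also matches newlines, so the greedy ".*" runs to the end of the text.
trimmed = re.sub(pattern, "</html>", text, flags=re.DOTALL)

# Grouping the dot with a newline and star-repeating the group gives the same result.
grouped = re.sub(r"</html>\s*<doctype>(?:.|\n)*", "</html>", text)

print(repr(partial))
print(repr(trimmed))
print(repr(grouped))
```

Either the DOTALL flag or the grouped form should do the job; the flag is usually easier to read.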
 

Markbnj

Elite Member, Moderator Emeritus
Sep 16, 2005
www.markbetz.net
Why use a regex for this? I'm curious. Looks like a simple line-oriented file scan to me. Read from one file, copy each line to a second file, and check the line for the tag you need. When you find the closing </html> tag, close the second file, delete the first, and rename the second.
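
A minimal sketch of that line-oriented approach, written in Python purely for illustration. The function name truncate_after_html is hypothetical, and it assumes the closing tag sits on its own line, as in the example from the question (anything after the tag on the same line would be kept).

```python
import os
import tempfile


def truncate_after_html(path):
    """Keep everything up to and including the first line containing </html>."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem. newline="" preserves the original line endings.
    with open(path, "r", encoding="utf-8", newline="") as src, \
         tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                     dir=dir_name, delete=False) as dst:
        for line in src:
            dst.write(line)
            # Case-insensitive check for the closing tag; stop copying once seen.
            if "</html>" in line.lower():
                break
    # os.replace covers the "delete the first, rename the second" step in one call.
    os.replace(dst.name, path)
```

Calling truncate_after_html on each file in a loop would handle a whole folder. The temporary file may not carry the original file's permissions, so check that if it matters.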
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
Code:
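perl -n -i~ -e "print unless($ended);$ended=1 if(/<\/html>/i);$ended=0 if eof;" [files]

Replace double-quotes with single-quotes on Linux. The trailing eof check resets the flag at the end of each file, so every file in the list gets trimmed rather than only the first.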
 

play

Junior Member
Dec 2, 2011
Sorry about the bump, but I was looking through some old questions and saw this one was incomplete.

I tested this regex in PHP; it does exactly what cirrrocco requested:
Code:
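<?php
// Sample input: everything from <doctype> onward should be dropped.
$sub='blah blah blah</html><doctype>blah';
// % is the pattern delimiter; (?s) lets . match newlines, so .* runs to the end of the input.
// \W* skips any non-word characters (such as line breaks) between the two tags.
$sub=preg_replace('%(?s)</html>\W*<doctype>.*%','</html>',$sub);
echo $sub."<br /><br />";
?>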