
Need help with regular expressions


I'm trying to make the web crawler that indexes my website a little better.
It allows me to do meta and title rewrites in the index using regular expressions.

My experience with them is minimal.

Basically, all I need to know is how to extract the text between two tags.

Example:
<h1 class="headline">herty smerty flip flip</h1>

The syntax for the rewrite itself is easy, but I don't know how to extract 'herty smerty flip flip' from the code.
 
This should work.

/<[^>]+>([^<]+)<[^>]+>/

It only works if everything is on the same line and the text you want to extract doesn't contain any < or > characters.
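To see what that generic pattern actually captures, here is a minimal sketch in Python. Python is only an illustration; the crawler's own regex engine isn't named in the thread and may behave differently. The helper name `first_tag_text` is hypothetical. It shows the pattern working on the example line, stopping early when the text contains a `<`, and grabbing whichever tag pair comes first in a full page.

```python
import re

# The generic pattern from the reply: an opening tag, then one or more
# characters that are not "<" (this part is captured), then a closing tag.
GENERIC = re.compile(r"<[^>]+>([^<]+)<[^>]+>")

def first_tag_text(html):
    """Return the captured text of the first matching tag pair, or None."""
    match = GENERIC.search(html)
    return match.group(1) if match else None

# Works on the example line.
print(first_tag_text('<h1 class="headline">herty smerty flip flip</h1>'))

# A "<" inside the text (here a nested <b> tag) ends the capture early,
# so only the text before <b> is returned.
print(repr(first_tag_text('<h1 class="headline">herty <b>smerty</b></h1>')))

# On a full page it matches the FIRST tag pair it finds, e.g. the <title>,
# not necessarily the headline.
print(first_tag_text('<title>My Site</title><h1 class="headline">x</h1>'))
```

The last case is why a pattern anchored on the specific `h1` tag is needed when the page contains other markup.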
 
Let me clarify.

The source will contain lots of other markup, but there will be only one <h1 class="headline"> tag (and its closing tag).
I want to get the phrase that's between that h1 tag and its closing </h1> tag.

Can I modify what you wrote to do that?
 
/<h1 class\=\"headline\">(.*?)<\/h1>/

That might be closer to what you're looking for. Do you then want to save the part you extracted (the (.*?) group) to a variable, or something else?
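As a sketch of the "save it to a variable" step, here is the same idea in Python, assuming the crawler's engine is Perl-compatible enough for the pattern to carry over (the thread doesn't say). Python strings don't use `/.../` delimiters, so `\/` and the escaped quotes aren't needed. `re.DOTALL` is added as an assumption so that `.*?` also crosses line breaks; without it (or the `/s` flag in Perl-style syntax) a headline split across lines won't match. The function name `extract_headline` and the sample page are hypothetical.

```python
import re

# The headline pattern as a Python raw string. The lazy ".*?" stops at the
# first "</h1>", and re.DOTALL lets "." match newlines too.
HEADLINE = re.compile(r'<h1 class="headline">(.*?)</h1>', re.DOTALL)

def extract_headline(html):
    """Return the headline text with surrounding whitespace stripped, or None."""
    match = HEADLINE.search(html)
    if match is None:
        return None
    return match.group(1).strip()

# Hypothetical page: other markup first, headline split over two lines.
page = """<html><head><title>My Site</title></head>
<body>
<h1 class="headline">herty smerty
flip flip</h1>
<p>Other content</p>
</body></html>"""

headline = extract_headline(page)  # the captured group, saved to a variable
print(headline)
```

Note that the opening tag must match character for character: an extra attribute or single quotes around `headline` would make this pattern miss it.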

Rob
 