
Regular expression help

Red Squirrel

I'm working on converting a forum and having issues with nested quotes. In phpBB the quote format is the following:

(replacing square brackets with parentheses)

Code:
(quote:aaaaaaaaaa="author") text (/quote:aaaaaaaaaa)

aaaaaaaaaa is a random alphanumeric string. I managed to figure out how to convert a single quote with this code:

Code:
    $ret =  preg_replace('~\[quote:(.*)="(.*)"\]~', '[quote author="${2}"]', $ret);
    $ret =  preg_replace('~\[/quote:(.*)\]~', '[/quote]', $ret);

But if there is a nested quote, it somehow deletes the first occurrence of the quote start. So I end up with the deepest nested quote intact, but all the others only have a [/quote] at the end because their start tags get deleted.
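
The symptom described above looks consistent with greedy matching: `(.*)` grabs as much as it can, so on a single line the first pattern can stretch from the first `[quote:` all the way to the last `="`, swallowing the outer start tag. A minimal sketch of that, assuming a made-up one-line post and a made-up ID `abc123`, with negated character classes as one possible fix:

```php
<?php
// Hypothetical sample post: two nested quotes sharing the same ID.
$post = '[quote:abc123="alice"][quote:abc123="bob"]inner[/quote:abc123]outer[/quote:abc123]';

// Original opening-tag pattern: the first (.*) is greedy and runs up to the LAST '="',
// so both opening tags are matched as one and only the last author survives.
$greedy = preg_replace('~\[quote:(.*)="(.*)"\]~', '[quote author="${2}"]', $post);

// Negated character classes cannot run past the end of a single tag.
$fixed = preg_replace('~\[quote:([^=\]]*)="([^"]*)"\]~', '[quote author="${2}"]', $post);
$fixed = preg_replace('~\[/quote:[^\]]*\]~', '[/quote]', $fixed);

echo $greedy, "\n"; // [quote author="bob"]inner[/quote:abc123]outer[/quote:abc123]
echo $fixed, "\n";  // [quote author="alice"][quote author="bob"]inner[/quote]outer[/quote]
```

Using `(.*?)` instead of the negated classes would also keep each match inside one tag for input like this.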

Is there a better way I can do this? Regular expressions aren't really my strength so there's probably a better way of doing it.
 
I have used regular expressions but not for something like this. I would probably use a loop to do this.
This is my pseudocode:

for each line in post
    if "[quote:" is found, increment a counter
    if "[/quote" is found, decrement the counter
    if counter > 0
        add the line to a string array
end for
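
A rough PHP take on the counter idea above, as a sketch only: it splits on tags rather than lines (my variation, not what the pseudocode does), rewrites each tag as it goes, and returns the final counter so an unbalanced post can be spotted. The function name `convert_quotes_with_depth` is made up, and it assumes author names never contain `]`.

```php
<?php
// Hypothetical helper: returns [converted text, final depth]; depth 0 means balanced.
function convert_quotes_with_depth(string $post): array
{
    // Split the post into plain text and quote tags, keeping the tags in the result.
    $parts = preg_split('~(\[/?quote[^\]]*\])~', $post, -1, PREG_SPLIT_DELIM_CAPTURE);
    $depth = 0;
    $out = '';
    foreach ($parts as $part) {
        if (preg_match('~^\[quote:[^=\]]*="([^"]*)"\]$~', $part, $m)) {
            $depth++; // opening tag: one level deeper
            $out .= '[quote author="' . $m[1] . '"]';
        } elseif (preg_match('~^\[/quote:[^\]]*\]$~', $part)) {
            $depth--; // closing tag: back up one level
            $out .= '[/quote]';
        } else {
            $out .= $part; // ordinary text (or an unrecognised tag) is copied as-is
        }
    }
    return [$out, $depth];
}

list($converted, $depth) = convert_quotes_with_depth(
    '[quote:u1="a"]x[quote:u1="b"]y[/quote:u1]z[/quote:u1]'
);
echo $converted, "\n"; // [quote author="a"]x[quote author="b"]y[/quote]z[/quote]
```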
 
Yes, I would also probably just write code for it. I've done regex stuff over the years, and it invariably ends up with me banging my head against it until it finally does what I want. I built a simple testing program that let me put in example text and modify my regex on the fly to make the process faster.
 
I think you want something like:

Code:
preg_replace('~\[quote:([^=]*)="([^"]*)"\](.*)\[/quote:\1\]~', '[quote author="${2}"]${3}[/quote]', $ret);

Or you could find the text after the first [quote: and stick it into both regexes.
 
I had another idea. If the page conforms to XHTML you might try to load the HTML output into the XML DOM. This gives you loads of flexibility for extracting data.
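
Something like the following is what I have in mind for the DOM route. It is only a sketch: the markup in `$xhtml` is invented for illustration (real phpBB output depends on version and theme), and it needs PHP's DOM extension.

```php
<?php
// Hypothetical rendered markup: nested <blockquote> elements with a <cite> for the author.
$xhtml = '<div class="post"><blockquote><cite>alice</cite>'
       . '<blockquote><cite>bob</cite>inner</blockquote>outer</blockquote>reply</div>';

$doc = new DOMDocument();
$doc->loadXML($xhtml); // only works if the markup is well-formed XML
$xpath = new DOMXPath($doc);

$quotes = [];
foreach ($xpath->query('//blockquote') as $node) {
    // Nesting depth = number of enclosing blockquotes.
    $depth = (int) $xpath->evaluate('count(ancestor::blockquote)', $node);
    // Author = text of the first <cite> child of this blockquote.
    $author = $xpath->evaluate('string(cite)', $node);
    $quotes[] = [$author, $depth];
}

foreach ($quotes as list($author, $depth)) {
    echo str_repeat('  ', $depth), $author, "\n";
}
```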
 
Or, let me try a non-regex way:

PHP:
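// Locate the first opening tag; do nothing if the post has no phpBB quote.
$tag_pos = strpos($ret, "[quote:");
if ($tag_pos !== false) {
    // The ID runs from just after "[quote:" up to the '="' that starts the author.
    $quote_id_pos = $tag_pos + strlen("[quote:");
    $quote_author_pos = strpos($ret, '="', $quote_id_pos);
    $quote_id = substr($ret, $quote_id_pos, $quote_author_pos - $quote_id_pos);
    // The author runs from just after '="' up to the closing '"]'.
    $quote_author_pos += 2;
    $quote_author = substr($ret, $quote_author_pos, strpos($ret, '"]', $quote_author_pos) - $quote_author_pos);
    // Rewrite this author's opening tag and every closing tag that carries the same ID.
    $ret = str_replace("[quote:$quote_id=\"$quote_author\"]", "[quote author=\"$quote_author\"]", $ret);
    $ret = str_replace("[/quote:$quote_id]", "[/quote]", $ret);
}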
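
As written, that only rewrites the opening tag of the first author it finds, so a nested quote from someone else would be left alone. One way to cover that, as a sketch, is to repeat the same steps until no `[quote:` is left. The function name `convert_quotes_plain` is made up, and it gives up on a tag it cannot parse.

```php
<?php
// Hypothetical helper: repeats the strpos/str_replace steps for every remaining opening tag.
function convert_quotes_plain(string $ret): string
{
    while (($start = strpos($ret, '[quote:')) !== false) {
        $id_pos = $start + strlen('[quote:');
        $eq = strpos($ret, '="', $id_pos);
        $end = ($eq === false) ? false : strpos($ret, '"]', $eq + 2);
        if ($eq === false || $end === false) {
            break; // malformed tag: stop rather than guess
        }
        $id = substr($ret, $id_pos, $eq - $id_pos);
        $author = substr($ret, $eq + 2, $end - ($eq + 2));
        // Rewrite this opening tag, then any closing tags with the same ID.
        $ret = str_replace("[quote:$id=\"$author\"]", "[quote author=\"$author\"]", $ret);
        $ret = str_replace("[/quote:$id]", "[/quote]", $ret);
    }
    return $ret;
}

$plain = convert_quotes_plain('[quote:u1="a"]x[quote:u1="b"]y[/quote:u1]z[/quote:u1]');
echo $plain, "\n"; // [quote author="a"]x[quote author="b"]y[/quote]z[/quote]
```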
 
Thanks, I'll give these ideas a shot, but if not I might look into just doing it through code as well. In fact, if I can programmatically figure out what that ID is, it will save some grief, as I can just use it literally instead of through regex.
 
Ok so it turned out to be much simpler to just programmatically get that ID. It's a column in the post table for each post. So I ended up going with this:

Code:
    $ret =  preg_replace('~\[quote:'.$bbcode_uid.'="(.*?)"\]~', '[quote author="${1}"]', $ret);
    $ret =  preg_replace('~\[/quote:'.$bbcode_uid.'\]~', '[/quote]', $ret);

The entire post table row is passed as an argument to that function, figured I'd just standardize on doing that as I may need other elements. Trying to do it in one shot was proving problematic with nested quotes so it was easier to do it as two separate statements.
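
For anyone finding this later, here is roughly how that could look as a function taking the row. This is a sketch, not the exact code from the converter: the function name `convert_phpbb_quotes` and the `post_text` key are assumptions, and `preg_quote()` is an extra precaution in case the ID ever contains a character that is special in a regex.

```php
<?php
// Hypothetical wrapper: $row is assumed to hold the post's 'bbcode_uid' and 'post_text'.
function convert_phpbb_quotes(array $row): string
{
    // Escape the ID so it is matched literally inside the pattern.
    $uid = preg_quote($row['bbcode_uid'], '~');
    $ret = $row['post_text'];
    // Opening tags: the lazy (.*?) stops at the first '"]', so each tag is matched on its own.
    $ret = preg_replace('~\[quote:' . $uid . '="(.*?)"\]~', '[quote author="${1}"]', $ret);
    // Closing tags carry only the ID, so a literal match is enough.
    $ret = preg_replace('~\[/quote:' . $uid . '\]~', '[/quote]', $ret);
    return $ret;
}

$result = convert_phpbb_quotes([
    'bbcode_uid' => 'abc123',
    'post_text'  => '[quote:abc123="alice"][quote:abc123="bob"]inner[/quote:abc123]outer[/quote:abc123]',
]);
echo $result, "\n"; // [quote author="alice"][quote author="bob"]inner[/quote]outer[/quote]
```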

Even in the phpBB code the uid is always known when doing any preg replaces. I'm not even sure why they did it this way; I guess it was to avoid parse issues if someone actually types out some code.
 
The problem is this...

The non-nested:
aaaa"bbbbb"ccccc"dddd"aaaaaa
is no different than the nested:
aaaa"bbbbb"ccccc"bbbb"aaaaaa

How do you distinguish between the two cases?
 