Regular Expression is borken

Sep 29, 2004
18,656
67
91
Well, I'm working with Java's built in regular expression classes.

I need to look for expressions that take the form:
1 - 1/2
12 - 1/4
123 - 1/8
etc


I have the following regular expression:
^[0-9]+\s*-\s*(([0-7]/8)|([0-3]/4))|([0-1]/2)$

Why does the following show up as a pattern match:
-12 - 1/2

12 - 1/2 should show up. But the leading negative sign should cause the string not to come up as a match. What is going on here? Is this a Java bug???

Thanks....
 

Thyme

Platinum Member
Nov 30, 2000
2,330
0
0
Are you escaping your backslashes? Also, your parens don't seem to be grouped the way you intend. I think you want to have all three of the last groups inside one group. Not sure why the -12 - 1/2 would match, though.
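On the backslash question: in Java source the regex \s has to be written as "\\s" inside a string literal, because the compiler processes the string's own escapes before the regex engine ever sees it. A minimal sketch of that, using a simplified pattern and a hypothetical class name that are not from the thread:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the simplified pattern are only for illustration.
class EscapeDemo {
    // Each regex backslash is doubled in the string literal: "\\s" is the regex \s.
    static final Pattern SPACED_DASH = Pattern.compile("[0-9]+\\s*-\\s*[0-9]");

    // matches() requires the whole input to fit the pattern.
    static boolean matchesWhole(String input) {
        return SPACED_DASH.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("12 - 1"));  // true
        System.out.println(matchesWhole("12-1"));    // true
        System.out.println(matchesWhole("12 x 1"));  // false
    }
}
```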
 
Sep 29, 2004
18,656
67
91
It's taken me at least 4 hours now ... and I just made a breakthrough!

Thyme ... you are probably right about how I am grouping the 'or' statements.

Again ... Doesn't work:
^[0-9]+\s*-\s*(([0-7]/8)|([0-3]/4))|([0-1]/2)$

BUT:
^[0-9]+\s*-\s*(([0-1]/2)|([0-3]/4))|([0-7]/8)$
works.

All I did was flip the #/8 and #/2 arguments. Found this by coincidence. So, I am using the pipe (|, the 'or' statement) incorrectly.
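Worth noting that the flipped version may only appear to work because the test string ends in 1/2. A rough sketch, assuming the check is done with Matcher.find() (the thread does not say which method is used) and using a hypothetical class name:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; assumes find() is the method being used.
class FlippedOrderDemo {
    // The reordered pattern from the post above, backslashes doubled for Java.
    static final Pattern FLIPPED =
        Pattern.compile("^[0-9]+\\s*-\\s*(([0-1]/2)|([0-3]/4))|([0-7]/8)$");

    // find() succeeds if any substring of the input fits the pattern.
    static boolean found(String input) {
        return FLIPPED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("12 - 1/2"));   // true
        System.out.println(found("-12 - 1/2"));  // false
        System.out.println(found("-12 - 1/8"));  // true, so the false positive just moved
    }
}
```

The last alternative is still outside the group, so any input ending in a valid eighths fraction gets through.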
 
Sep 29, 2004
18,656
67
91
CONFIRMED!!!!

^[0-9]+\s*-\s*(([0-1]/2)|([0-3]/4)|([0-7]/8))$
WORKS!!!

It was all a nesting problem with the OR statements.

When ORing 3 things, do (A)|(B)|(C) ..... otherwise the equivalent I showed above doesn't work ??????
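For anyone who wants to check it, here is a small sketch that runs the confirmed pattern against the sample inputs from the first post plus a few that should be rejected. It assumes Matcher.find() is the method in use, and the class name is made up for the example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; assumes find() is the method being used.
class FixedPatternDemo {
    // All three fraction alternatives now sit inside one group, before the $.
    static final Pattern FIXED =
        Pattern.compile("^[0-9]+\\s*-\\s*(([0-1]/2)|([0-3]/4)|([0-7]/8))$");

    static boolean found(String input) {
        return FIXED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("1 - 1/2"));    // true
        System.out.println(found("12 - 1/4"));   // true
        System.out.println(found("123 - 1/8"));  // true
        System.out.println(found("-12 - 1/2"));  // false
        System.out.println(found("-12 - 1/8"));  // false
        System.out.println(found("12 - 5/4"));   // false
    }
}
```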

I think that's a bug ... but I'm not about to check the W3C spec.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
I doubt it's a bug. Regexes get way more complex than that and work correctly, so I'm sure whatever it does, it's intentional. And I don't think the W3C tells the Java regex designers what standard they should support.

I believe your original example failed not because of the nesting of the fractions, but because one of the pipes was outside any parentheses, so you had essentially this:
(^[0-9]+\s*-\s*(([0-7]/8)|([0-3]/4)))I(([0-1]/2)$)
So what matched was just the "([0-1]/2)$" which explains why changing the order fixed it for that example, which happened to end in "1/2".
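This is easy to see by printing what the matcher actually found. A minimal sketch, assuming the original code called Matcher.find() rather than Matcher.matches() (the thread does not say), with a hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; assumes the original code used find().
class WhatMatchedDemo {
    // The original pattern from the first post, backslashes doubled for Java.
    static final Pattern ORIGINAL =
        Pattern.compile("^[0-9]+\\s*-\\s*(([0-7]/8)|([0-3]/4))|([0-1]/2)$");

    // Returns the substring that find() located, or null if nothing was found.
    static String matchedText(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    // matches() demands that the entire input fit the pattern.
    static boolean wholeMatch(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchedText("-12 - 1/2"));  // 1/2
        System.out.println(matchedText("12 - 1/4"));   // 12 - 1/4
        System.out.println(matchedText("hello 0/2"));  // 0/2
        System.out.println(wholeMatch("-12 - 1/2"));   // false
    }
}
```

Only the trailing fraction is reported for the first and third inputs, which is the right-hand side of the top-level pipe matching on its own.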

You could have fixed that by simply adding an extra set of parentheses around the whole fraction part to limit the range of the pipe:
^[0-9]+\s*-\s*((([0-7]/8)|([0-3]/4))I([0-1]/2))$

But you've essentially done that in cleaner fashion in your fix anyway :)

Edit: damn, bold pipes aren't very clear, so I replaced them with capital 'i's.