What does the regex (?<n>a)(b)\2(c) match?



What does this regex match?

It doesn’t match abc, abac, abbc, abcc, or ab\x02c, and it doesn’t throw an exception.

If you take off the (c), it matches aba. My understanding is that unnamed captures are numbered first, and then the named captures. So (c) should be group 2, except that I back-reference it before it’s defined. I thought \2 might then match a, with the a group being renumbered once the engine reaches (c), but that doesn’t appear to be the case either.


Solution 1:

\2 in your regex refers to the (c) group, as your experiments have shown. Unfortunately, that means your regex never matches anything: when the engine reaches \2, (c) has not captured anything yet.

You can find this in the documentation, though it isn’t very clear and seems to be followed by an unrelated example:

If a group has not captured any substrings, a backreference to that group is undefined and never matches.
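To illustrate the quoted rule, here is a minimal Python sketch of a hypothetical matcher, `linear_match`, for straight-line patterns only: a sequence of capture groups that each match one literal, plus numbered backreferences, with no alternation or quantifiers (so no backtracking is needed). It is not the real regex engine. It assumes the numbering described in the question (unnamed groups first, then named), so in (?<n>a)(b)\2(c) the (b) group is 1, (c) is 2, and n is 3, and it treats a backreference to a group that has not captured yet as a failure.

```python
from typing import Optional

# Hypothetical token representation for a straight-line pattern:
#   ("group", number, literal)  a capture group that matches one literal
#   ("ref", number)             a numbered backreference such as \2


def linear_match(tokens: list[tuple], text: str) -> Optional[dict[int, str]]:
    """Match tokens against the whole of text; return the captures or None."""
    captures: dict[int, str] = {}
    pos = 0
    for token in tokens:
        if token[0] == "group":
            _, number, literal = token
            if not text.startswith(literal, pos):
                return None
            captures[number] = literal
        else:
            _, number = token
            if number not in captures:
                # The group has not captured anything yet, so the
                # backreference is undefined and never matches.
                return None
            literal = captures[number]
            if not text.startswith(literal, pos):
                return None
        pos += len(literal)
    return captures if pos == len(text) else None


# (?<n>a)(b)\2(c), assuming unnamed-first numbering: (b) = 1, (c) = 2, n = 3.
WITH_C = [("group", 3, "a"), ("group", 1, "b"), ("ref", 2), ("group", 2, "c")]
# (?<n>a)(b)\2, same assumption: (b) = 1, n = 2.
WITHOUT_C = [("group", 2, "a"), ("group", 1, "b"), ("ref", 2)]

for candidate in ["abc", "abac", "abbc", "abcc"]:
    print(candidate, linear_match(WITH_C, candidate))  # None every time
print(linear_match(WITHOUT_C, "aba"))  # {2: 'a', 1: 'b'}
```

Because \2 always comes before (c) in the token sequence, the failure does not depend on the input, which matches what the asker observed; dropping (c) moves n to group 2, and the pattern then accepts aba.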

Solution 2:

It matches “abbc” for me. I am using the Perl regex flavor, and I get “a” as group 1, “b” as group 2, and “c” as group 3. The \2 refers to the second group, which is “b”:

(?<n>a) – A named capture group: the name is “n” and the pattern is “a”
(b)\2 – A capture group matching “b”, then a backreference to the second group, which is “b”
(c) – The third capture group, matching “c”
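Python’s `re` module numbers groups the way this answer describes for Perl: named and unnamed groups share a single left-to-right sequence. Python spells a named group `(?P<name>...)` rather than `(?<name>...)`, so this small sketch uses that syntax; under this numbering, \2 refers to (b).

```python
import re

# Named and unnamed groups are numbered together, left to right,
# so (?P<n>a) is group 1, (b) is group 2, and (c) is group 3.
pattern = re.compile(r"(?P<n>a)(b)\2(c)")

m = pattern.fullmatch("abbc")
print(m.group(1), m.group(2), m.group(3))  # a b c
print(m.group("n"))                        # a
print(pattern.fullmatch("abcc"))           # None
```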

Solution 3:

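The a group is referenced by the name n, the b group by 1, and the c group by 2.

For example, (?<n>a)(b)\1(c) matches abbc, where n is a, 1 is b, and 2 is c.

Unnamed groups are numbered starting at 1 and counting up, and non-capturing parentheses are not numbered. So in your pattern \2 refers to (c), which has not captured anything when the engine reaches \2, and the pattern can’t match anything.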

Regular-Expressions.info on Brackets
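The numbering scheme described in the question and in this answer (unnamed groups first, left to right, then named groups, with non-capturing constructs skipped) is the one the .NET documentation describes. It can be sketched in Python as a hypothetical helper, `dotnet_group_numbers`. This is a simplified sketch, not a real parser: it assumes a well-formed pattern, recognizes only the `(?<name>...)` form of named group, and ignores other .NET features such as `(?'name'...)`, balancing groups, numeric names like `(?<2>...)`, and explicit-capture mode.

```python
import re

# Opening of a (?<name>...) group; lookbehinds like (?<= and (?<! don't match.
NAMED_OPEN = re.compile(r"\(\?<[A-Za-z_]\w*>")


def dotnet_group_numbers(pattern: str) -> dict[int, str]:
    """Map group numbers to group source text: unnamed first, then named."""
    unnamed: list[tuple[int, int]] = []  # (start, end) of plain (...) groups
    named: list[tuple[int, int]] = []    # (start, end) of (?<name>...) groups
    stack: list[tuple[int, str]] = []    # currently open parens: (start, kind)
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \2 or \(
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            if NAMED_OPEN.match(pattern, i):
                kind = "named"
            elif pattern.startswith("(?", i):
                kind = "other"  # (?:...), lookarounds, inline options: not numbered
            else:
                kind = "unnamed"
            stack.append((i, kind))
        elif ch == ")":
            start, kind = stack.pop()
            if kind == "unnamed":
                unnamed.append((start, i))
            elif kind == "named":
                named.append((start, i))
        i += 1
    # Within each category, groups are ordered by their opening parenthesis,
    # so sort by start index (nested groups close in the opposite order).
    ordered = sorted(unnamed) + sorted(named)
    return {n: pattern[s:e + 1] for n, (s, e) in enumerate(ordered, start=1)}


print(dotnet_group_numbers(r"(?<n>a)(b)\2(c)"))  # {1: '(b)', 2: '(c)', 3: '(?<n>a)'}
print(dotnet_group_numbers(r"(?<n>a)(b)\2"))     # {1: '(b)', 2: '(?<n>a)'}
```

The second call shows why removing (c) changes the result: the named group moves up to number 2, so \2 then refers to a, which is consistent with the asker seeing aba match.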