Expert Refresh


1) In regex, parentheses are NOT used to:

2) Can you use parentheses to group for a backreference?

3) Which metasequence is used for a backreference?

Parentheses scope
So far, we have seen two uses for parentheses: to limit the scope of alternation, and to group multiple characters so that a quantifier applies to the whole group.
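The two uses can be illustrated with a short sketch, assuming PHP's PCRE functions (preg_match). The sample patterns and strings are made up for illustration:

```php
<?php
// Use 1: parentheses limit the alternation to the one letter that differs.
var_dump(preg_match('/gr(e|a)y/', 'gray')); // int(1)
var_dump(preg_match('/gr(e|a)y/', 'ay'));   // int(0)

// Without parentheses the alternatives are "gre" and "ay", so "ay" alone matches.
var_dump(preg_match('/gre|ay/', 'ay'));     // int(1)

// Use 2: parentheses group "ab" so that + repeats the whole pair.
preg_match('/(ab)+/', 'xababab', $grouped);
echo $grouped[0], "\n";                     // prints: ababab

// Without the group, + applies only to the "b".
preg_match('/ab+/', 'xababab', $ungrouped);
echo $ungrouped[0], "\n";                   // prints: ab
```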
In many regular-expression flavors, parentheses can "remember" text matched by the subexpression they enclose. Wouldn't it be nice if we could match one generic word, and then say "now match the same thing again"?
Backreferencing is a regular-expression feature that allows you to match new text that is the same as some text matched earlier in the expression.
Example "double double" word
We start with /\b(the) +(the)\b/ and replace "the" with a regex that matches a general word, say /[A-Za-z]+/. Finally, we replace the second word with the metasequence \1. The new regex /\b([a-zA-Z]+) +\1\b/ matches "anyword anyword". In this way we can replace "anyword anyword" with "anyword". PHP example here:
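A minimal sketch of that replacement, assuming PHP's PCRE function preg_replace. The function name collapse_doubled_words and the sample sentences are made up for illustration:

```php
<?php
// Collapse an immediately repeated word ("the the") into a single word.
// Hypothetical helper, not part of any library.
function collapse_doubled_words(string $text): string
{
    // \b([a-zA-Z]+)  a word, remembered as group 1
    //  +             one or more spaces
    // \1\b           the same text again, ending at a word boundary
    return preg_replace('/\b([a-zA-Z]+) +\1\b/', '$1', $text);
}

echo collapse_doubled_words("Paris in the the spring"), "\n"; // prints: Paris in the spring
echo collapse_doubled_words("the theory"), "\n";              // prints: the theory
```

The second call shows why the trailing \b matters: without it, "the the" inside "the theory" would count as a doubled word. Note that the match is case-sensitive, so "The the" is left alone.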
More parentheses
Of course, you can have more than one set of parentheses. Use \1, \2, and so on to refer to the text matched by the first, second, and later sets, counted by their opening parentheses from left to right.
It is important to understand the limitations of this approach with egrep. Because egrep considers each line in isolation, it cannot find a doubled word when the last word of one line is repeated at the beginning of the next.
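As a sketch of two backreferences in one expression, the pattern below replays two captured letters in reverse order, so it matches four-letter words that read the same in both directions. It assumes PHP's preg_match; the helper name is hypothetical:

```php
<?php
// Hypothetical helper: true if $text contains a four-letter mirrored word such as "noon".
function contains_mirrored_word(string $text): bool
{
    // ([a-z]) is group 1, the next ([a-z]) is group 2;
    // \2 then \1 require the same two letters again in reverse order.
    return preg_match('/\b([a-z])([a-z])\2\1\b/', $text) === 1;
}

var_dump(contains_mirrored_word("see you at noon")); // bool(true)
var_dump(contains_mirrored_word("close the door"));  // bool(false)
```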
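A tool that sees the whole text as one string does not share this line-by-line limitation. The sketch below is an assumption beyond the notes above: it uses PHP's preg_match_all and swaps the literal space for \s+, which also matches newlines. The helper name is hypothetical:

```php
<?php
// Hypothetical helper: lists doubled words, even when the repeat starts the next line.
function find_doubled_words(string $text): array
{
    // \s+ matches spaces, tabs, and newlines, so the gap may contain a line break.
    preg_match_all('/\b([a-zA-Z]+)\s+\1\b/', $text, $matches);
    return $matches[1]; // the text captured by group 1 for each match
}

print_r(find_doubled_words("one of the\nthe best, and and more"));
// The returned array holds "the" and "and".
```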
The great escape
How do you match a character that a regex would normally interpret as a metacharacter? Use a backslash. The metasequence to match a literal dot is a dot preceded by a backslash (\.). As another example, the regex /\([a-zA-Z]+\)/ matches "(very)".
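A short sketch of escaping in practice, assuming PHP's preg_match and preg_quote. The sample strings are made up for illustration:

```php
<?php
// An unescaped dot matches any character; an escaped dot matches only a literal ".".
var_dump(preg_match('/a.c/', 'abc'));  // int(1)
var_dump(preg_match('/a\.c/', 'abc')); // int(0)
var_dump(preg_match('/a\.c/', 'a.c')); // int(1)

// Escaped parentheses are literal characters, not a group.
preg_match('/\([a-zA-Z]+\)/', 'a (very) short text', $found);
echo $found[0], "\n";                  // prints: (very)

// preg_quote() adds the backslashes for you; the second argument
// names the pattern delimiter so that it gets escaped as well.
echo preg_quote('(very)', '/'), "\n";  // prints: \(very\)
```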