Parentheses are used to limit the scope of alternation, and to group
multiple characters to which you can apply quantifiers.
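
A minimal sketch of both uses, written with Python's re module. The patterns gr(a|e)y, gra|ey, (ab)+ and ab+ and the helper fully_matches are illustrative assumptions, not taken from these notes.

```python
import re

# Parentheses limit the alternation to the single letter between "gr" and "y".
scoped = re.compile(r"gr(a|e)y")
# Without parentheses the alternation splits the whole regex: "gra" OR "ey".
unscoped = re.compile(r"gra|ey")

# Parentheses group "ab" so the + applies to the pair, not just to "b".
grouped = re.compile(r"(ab)+")
ungrouped = re.compile(r"ab+")


def fully_matches(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for word in ("gray", "grey", "gra", "ey"):
        print(word, fully_matches(scoped, word), fully_matches(unscoped, word))
    for word in ("ababab", "abbb"):
        print(word, fully_matches(grouped, word), fully_matches(ungrouped, word))
```

The scoped pattern accepts only "gray" and "grey", while the unscoped one accepts only "gra" and "ey", which is rarely what was intended.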
In many regular-expression flavors, parentheses can remember text matched
by the subexpression they enclose.
Wouldn't it be nice if we could match one generic word, and then say "now match the same thing again"?
Backreferencing is a regular-expression feature that allows you to match
new text that is the same as some text matched earlier in the expression.
Example - double double
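We start with \b(the) +(the)\b and replace the first "the" with a regex that matches a general word, say [a-zA-Z]+.
Finally, we replace the second word with the metasequence \1, which matches whatever text the first set of parentheses matched.
The new regex \b([a-zA-Z]+) +\1\b matches any doubled word, such as "anyword anyword".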
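
The doubled-word regex can be tried out directly. This is a sketch that assumes Python's re module (the notes themselves use egrep); the function name find_doubled_words and the sample sentences are made up for illustration.

```python
import re

# Same regex as in the notes: a word, one or more spaces, then the same word.
DOUBLED_WORD = re.compile(r"\b([a-zA-Z]+) +\1\b")


def find_doubled_words(text):
    """Return the word from each doubled pair found in text (hypothetical helper)."""
    return [match.group(1) for match in DOUBLED_WORD.finditer(text)]


if __name__ == "__main__":
    print(find_doubled_words("Paris in the the spring"))
    # The trailing \b stops "the" from matching the start of "theory".
    print(find_doubled_words("the theory"))
    # \1 must match the same text exactly, so "The the" is not reported.
    print(find_doubled_words("The the"))
```

Note that the backreference compares text exactly, so a pair that differs only in capitalization is missed unless the match is made case-insensitive.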
Of course, you can have more than one set of parentheses. Use \1, \2, etc.
to refer to the first, second, etc. sets. Sets are numbered by counting
opening parentheses from left to right.
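
A small sketch with two sets of parentheses, again assuming Python's re module. The pattern below is an illustration that is not in the notes: it matches four-letter lowercase words that read the same backwards, by repeating the second letter with \2 and then the first with \1.

```python
import re

# Group 1 is the first letter, group 2 the second; \2\1 repeats them reversed.
FOUR_LETTER_PALINDROME = re.compile(r"\b([a-z])([a-z])\2\1\b")


def find_palindromes(text):
    """Return every four-letter palindrome in text (hypothetical helper)."""
    return [match.group(0) for match in FOUR_LETTER_PALINDROME.finditer(text)]


if __name__ == "__main__":
    print(find_palindromes("noon deed door abba"))
    match = FOUR_LETTER_PALINDROME.search("noon")
    # group(1) and group(2) hold the text remembered by each set of parentheses.
    print(match.group(1), match.group(2))
```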
Since egrep considers each line in isolation, it isn't able to find cases
where the ending word of one line is repeated at the beginning of the next.
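
The limitation can be reproduced with a sketch that imitates line-by-line searching in Python. This is only an approximation of egrep's behavior, and the second pattern, which swaps " +" for "\s+" so that a newline counts as a separator, is an assumption added here rather than something from the notes.

```python
import re

# The regex from the notes: only spaces may separate the two words.
DOUBLED_SAME_LINE = re.compile(r"\b([a-zA-Z]+) +\1\b")
# Assumed variation: \s+ also accepts a newline between the two words.
DOUBLED_ANY_WHITESPACE = re.compile(r"\b([a-zA-Z]+)\s+\1\b")

SAMPLE = "this line ends with the\nthe next line starts with it"


def search_line_by_line(pattern, text):
    """Apply the pattern to each line separately, roughly as egrep would."""
    hits = []
    for line in text.splitlines():
        hits.extend(match.group(1) for match in pattern.finditer(line))
    return hits


def search_whole_text(pattern, text):
    """Apply the pattern once to the whole text, newlines included."""
    return [match.group(1) for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(search_line_by_line(DOUBLED_SAME_LINE, SAMPLE))
    print(search_whole_text(DOUBLED_ANY_WHITESPACE, SAMPLE))
```

Searching line by line finds nothing in the sample, while the whole-text search with the relaxed separator reports the repeated "the".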
The great escape
How do you actually match a character that a regex would normally interpret as a metacharacter? We use a backslash to escape it.
The metasequence to match a dot
is a dot preceded by a backslash (\.).
Another example: the regex \([a-zA-Z]+\) matches a word within literal parentheses, such as "(very)".
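
Both escapes can be checked with a short sketch, assuming Python's re module. The sample strings and the "3.14" pattern are made up for illustration.

```python
import re

# An unescaped dot matches any character, so "3.14" also matches "3514".
unescaped_dot = re.compile(r"3.14")
# An escaped dot matches only a literal ".".
escaped_dot = re.compile(r"3\.14")

# Escaped parentheses match literal "(" and ")" instead of forming a group.
word_in_parens = re.compile(r"\([a-zA-Z]+\)")
# Unescaped, the same parentheses merely capture each word.
captured_word = re.compile(r"([a-zA-Z]+)")

if __name__ == "__main__":
    print(unescaped_dot.search("3514") is not None)
    print(escaped_dot.search("3514") is not None)
    print(escaped_dot.search("pi is 3.14") is not None)
    print(word_in_parens.findall("a (very) nice (day)"))
    print(captured_word.findall("a (very) nice (day)"))
```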