Expert Refresh

Lookaround

1) Lookahead metasequence
2) Lookbehind metasequence
3) Negated lookahead metasequence

Lookaround metasequences
Lookaround is a much more general construct than the special-case word-boundary and anchor metacharacters (\b, ^, $). Like them, it matches a position rather than text.
Lookahead
One type of lookaround, called lookahead, peeks forward in the text (toward the right) to see whether its subexpression can match. Positive lookahead is specified with the special sequence (?= ), as in (?=\d), which is successful at positions where a digit comes next.
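A minimal sketch of this behavior, assuming the sample string "ab12c3" (chosen only for illustration): it collects every position at which (?=\d) succeeds.

```perl
use strict;
use warnings;

# (?=\d) matches a position, not a character, so every match is empty.
# pos() reports the offset at which each empty match was found.
my $text = "ab12c3";
my @positions;
while ($text =~ /(?=\d)/g) {
    push @positions, pos($text);
}
print "@positions\n";   # 2 3 5 (the offsets just before "1", "2" and "3")
```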
Lookbehind
Another type of lookaround is lookbehind, which looks back (toward the left). It's given with the special sequence (?<= ), as in (?<=\d), which is successful at positions with a digit to the left.
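As an illustration, the sketch below uses lookbehind to pick out only the numbers that come directly after a dollar sign. The sample text and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# (?<=\$) requires a "$" immediately to the left, without including it
# in the match. Single quotes keep Perl from interpolating "$42".
my $text = 'Cost: $42 for 17 units';
my @amounts = $text =~ /(?<=\$)\d+/g;
print "@amounts\n";   # 42
```

The number 17 is skipped because the character to its left is a space, not a dollar sign.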
Position
An important thing to understand is that lookaround constructs don't actually "consume" any text. The regex /Jeffrey/ matches Jeffrey in "Jeffrey Friedl", but the same regex within lookahead, (?=Jeffrey), matches only the location (or position) before Jeffrey.
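One way to see the difference is to compare where each match starts and ends. This sketch uses Perl's built-in @- and @+ arrays (match start and end offsets); the sample string is an assumption for the example.

```perl
use strict;
use warnings;

my $text = "by Jeffrey Friedl";

# Plain regex: the match spans the seven characters of "Jeffrey".
$text =~ /Jeffrey/ or die "plain regex did not match";
my ($plain_start, $plain_end) = ($-[0], $+[0]);

# Same subexpression inside lookahead: start and end coincide,
# because only a position is matched.
$text =~ /(?=Jeffrey)/ or die "lookahead did not match";
my ($look_start, $look_end) = ($-[0], $+[0]);

print "plain:     $plain_start..$plain_end\n";   # 3..10
print "lookahead: $look_start..$look_end\n";     # 3..3
```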
Order
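The order in which lookaround is combined with the rest of the regex also matters. Jeff(?=Jeffrey) doesn't match "by Jeffrey Friedl": it matches "Jeff" only when "Jeffrey" follows immediately, as in "JeffJeffrey". Putting the lookahead first, (?=Jeffrey)Jeff, does match "by Jeffrey Friedl", at the "Jeff" of "Jeffrey".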
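The effect of ordering can be checked directly. In this sketch, matches() is a hypothetical helper written for the example, and "JeffJeffrey" is an artificial string chosen so that the second ordering has something to match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $regex matches $text, else 0.
sub matches {
    my ($regex, $text) = @_;
    return $text =~ $regex ? 1 : 0;
}

my $lookahead_first = qr/(?=Jeffrey)Jeff/;   # "Jeff" that begins "Jeffrey"
my $lookahead_last  = qr/Jeff(?=Jeffrey)/;   # "Jeff" followed by "Jeffrey"

# "by Jeffrey Friedl": first=1 last=0
# "JeffJeffrey":       first=1 last=1
for my $text ("by Jeffrey Friedl", "JeffJeffrey") {
    printf "%-18s first=%d last=%d\n", $text,
        matches($lookahead_first, $text),
        matches($lookahead_last,  $text);
}
```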
Open parenthesis
There are a number of special "open parenthesis" sequences, and they all begin with the two-character sequence "(?". We've already seen group-but-don't-capture "(?: )", lookahead "(?= )", and lookbehind "(?<= )".
Four types of lookaround
Positive Lookahead (?= ): successful if can MATCH to the RIGHT
Positive Lookbehind (?<= ): successful if can MATCH to the LEFT
Negative Lookahead (?! ): successful if can NOT match to the RIGHT
Negative Lookbehind (?<! ): successful if can NOT match to the LEFT
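To compare the four types side by side, the sketch below marks with "|" every position at which each one succeeds. It assumes the sample string "a1b2" and uses \d as the subexpression throughout.

```perl
use strict;
use warnings;

my $text = "a1b2";
my @types = (
    [ 'positive lookahead  (?=\d)',  qr/(?=\d)/  ],
    [ 'positive lookbehind (?<=\d)', qr/(?<=\d)/ ],
    [ 'negative lookahead  (?!\d)',  qr/(?!\d)/  ],
    [ 'negative lookbehind (?<!\d)', qr/(?<!\d)/ ],
);

my @marked;
for my $type (@types) {
    my ($label, $regex) = @$type;
    (my $copy = $text) =~ s/$regex/|/g;   # mark each matching position
    push @marked, $copy;
    print "$label  $copy\n";
}
# Marked strings, in order: a|1b|2   a1|b2|   |a1|b2|   |a|1b|2
```

Note that the negative forms also succeed at the very end or very start of the string, where there is no character at all to inspect.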
Common mistake
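You might think that \D (something that is not a digit) is the same as (?!\d). It is not: with \D a character is required (one that is not a digit), while (?!\d) requires no character at all, so it also succeeds at the end of the string.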
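A short sketch of the difference, assuming the sample strings "item 42" and "42Hz" (both chosen for illustration):

```perl
use strict;
use warnings;

# At the end of the string nothing follows the digits.
my $at_end      = "item 42";
my $D_at_end    = $at_end =~ /\d+\D/     ? 1 : 0;   # 0: \D needs a character
my $look_at_end = $at_end =~ /\d+(?!\d)/ ? 1 : 0;   # 1: "no digit next" holds

# When a non-digit does follow, \D consumes it and the lookahead does not.
my ($with_D)    = "42Hz" =~ /(\d+\D)/;       # "42H"
my ($with_look) = "42Hz" =~ /(\d+(?!\d))/;   # "42"

print "$D_at_end $look_at_end $with_D $with_look\n";   # 0 1 42H 42
```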
Lookaround Examples
1) Match "Jeff" only if it is part of "Jeffrey"
    $var = "Jeffrey Friedl";
    $var =~ s/(?=Jeffrey)(Jeff)/by $1/;
    print $var;   # Outputs: by Jeffrey Friedl
    $var = "Thomas Jefferson";
    $var =~ s/(?=Jeffrey)(Jeff)/by $1/;
    print $var;   # No match; outputs: Thomas Jefferson
2) Replace "Jeffs" with "Jeff's" (with lookahead)
    $var = "Jeffs articles";
    $var =~ s/\bJeff(?=s\b)/Jeff'/g;
    print $var;   # Outputs: Jeff's articles
3) Replace "Jeffs" with "Jeff's" (with lookbehind and lookahead)
    $var = "Jeffs articles";
    $var =~ s/(?<=\bJeff)(?=s\b)/'/g;
    print $var;   # Outputs: Jeff's articles
4) Commafying numbers (first attempt: too many commas)
    $var = "The population of 2298444215 is growing";
    $var =~ s/(?<=\d)(?=(\d\d\d)+)/,/g;
    print $var . "\n";   # Outputs: The population of 2,2,9,8,4,4,4,215 is growing
4.1) If we add \b, it works
    $var = "The population of 2298444215 is growing";
    $var =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
    print $var . "\n";   # Outputs: The population of 2,298,444,215 is growing
4.2) But it doesn't match something like "12345Hz", where no word boundary follows the digits
    $var = "12345Hz";
    $var =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
    print $var . "\n";   # Outputs: 12345Hz
4.3) We use (?!\d) instead of \b to mark the end of the three-digit groups
    $var = "12345Hz";
    $var =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;
    print $var . "\n";   # Outputs: 12,345Hz


References