Expert Refresh

Text to html

Perl

1) Perl modifier to use on multiple lines




2) Permitted delimiters





3) Which is the readability modifier?






New line
If a file contains multiple lines, each line is terminated by \n (Unix) or \r\n (Windows). A variable holding such text will look something like:
$text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
Separating paragraphs
We might be tempted to use /^$/ to separate paragraphs. Because ^ and $ by default refer not to logical line positions but to the absolute start-of-string and end-of-string positions, the following code won't work.
$text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
$text =~ s/^$/<p>/g; # won't work
print $text;
Multiline modifier
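With the multiline modifier /m, the meaning of ^ and $ changes from string-related to logical-line-related. Also, ^ and $ match only a position, not actual characters (like \r\n). Whitespace left on an otherwise empty line, such as the \r of a Windows line ending, therefore has to be matched explicitly, which is why the pattern below uses \s+.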
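To see the change in meaning directly, the following minimal sketch counts the positions at which ^ matches in the sample string, first without and then with /m. The variable names are illustrative only, and Windows line endings (\r\n) are assumed.

```perl
use strict;
use warnings;

# Sample text with Windows line endings (assumed for illustration).
my $text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";

# Without /m, ^ matches only at the very start of the string.
my $without_m = () = $text =~ /^/g;

# With /m, ^ also matches right after every embedded newline.
my $with_m = () = $text =~ /^/mg;

print "without /m: $without_m\n";   # 1
print "with /m:    $with_m\n";      # 4 (string start plus three newlines)
```

The string contains three \n characters, so /m adds three line-start positions to the single start-of-string position.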
$text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
$text =~ s/^\s+$/<p>/mg;
print $text;
Email address link
The basic form of an email address is "username\@hostname", where username and hostname stand for sub-patterns that are filled in later, so the regex will look like:
$text =~ s/\b(username\@hostname)\b/<a href="mailto:$1">$1<\/a>/g;
Custom delimiters
The first things to notice are the two backslashes (\@ and <\/a>). The \@ will be discussed later (Perl requires @ to be escaped). The backslash in <\/a> escapes the forward slash, which Perl otherwise treats as the search-and-replace delimiter. It's a little ugly, but Perl allows us to pick our own delimiters, so the escape is no longer necessary:
s|regex|replacement|modifiers
OR
s{regex}{replacement}modifiers
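As a quick check that the choice of delimiter does not affect the result, this minimal sketch runs the same substitution with slashes, pipes, and braces. The simplified \w-based pattern and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $input = 'jfriedl@regex.info';   # single quotes: @ needs no escape here

# Default slash delimiters: the / in </a> has to be escaped.
(my $slashes = $input) =~ s/(\w+\@\w+\.\w+)/<a href="mailto:$1">$1<\/a>/;

# Pipe delimiters: the / needs no escape.
(my $pipes = $input) =~ s|(\w+\@\w+\.\w+)|<a href="mailto:$1">$1</a>|;

# Brace delimiters: regex and replacement each get their own pair.
(my $braces = $input) =~ s{(\w+\@\w+\.\w+)}{<a href="mailto:$1">$1</a>};

print "$slashes\n";
print "all three agree\n" if $slashes eq $pipes and $pipes eq $braces;
```

Braces tend to read best for longer patterns, because the regex and the replacement are visually separated.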
$text = "jfriedl\@regex.info";
$text =~ s{(\w+\@\w+(\.\w+))}{<a href="mailto:$1">$1</a>}g;
print $text;
Readability modifier /x
The metacharacter \w is not quite appropriate, because besides letters and digits it also matches the underscore, which is not allowed in a hostname. A more accurate regex is too long to fit comfortably on one line, so we use Perl's /x modifier, which lets us spread the regex over multiple lines. This modifier does two simple but powerful things. First, it causes most whitespace to be ignored, so you can "free-format" the expression for readability. Second, it allows comments with a leading #.
$text = "jfriedl\@regex.info";
# search and replace
$text =~ s{
    # capture the whole email address into the first group
    (
        # username regex
        [-a-zA-Z0-9]+
        \@
        # hostname regex
        [-a-z0-9]+(?:\.[-a-z0-9]+)*\.(?:com|edu|info)
    )
}{<a href="mailto:$1">$1</a>}gix; # x modifier (Wow!)
print $text;
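The same /x pattern can be applied to running text that contains several addresses. The sketch below wraps it in a helper named linkify_emails, which is a hypothetical name introduced here for illustration. The sample sentence is also made up, and the list of top-level domains is kept as short as in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original text): wraps every address
# that the /x pattern recognizes in a mailto link.
sub linkify_emails {
    my ($text) = @_;
    $text =~ s{
        (                               # capture the whole address
            [-a-z0-9]+                  # username
            \@
            [-a-z0-9]+                  # first hostname label
            (?: \. [-a-z0-9]+ )*        # any further labels
            \. (?: com | edu | info )   # accepted top-level domains
        )
    }{<a href="mailto:$1">$1</a>}gix;
    return $text;
}

my $sample = 'Write to jfriedl@regex.info or admin@mail.example.edu today.';
print linkify_emails($sample), "\n";
```

Addresses whose hostname does not end in one of the listed domains are left untouched, which keeps the sketch conservative at the cost of missing many valid addresses.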

