


Text to HTML






New line

If a file contains multiple lines, every line ends with a newline sequence: \n on Unix or \r\n on Windows. A variable holding such text looks something like this:

    $text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
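Because the two newline conventions differ, it can help to convert everything to a single form before applying any other regex. The following is a minimal sketch of that idea; the helper name normalize_newlines is hypothetical and not part of the original notes.

```perl
use strict;
use warnings;

# Hypothetical helper: turn Windows line endings (\r\n) into plain \n
# so that later regexes only have to deal with one convention.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

my $text  = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
my $clean = normalize_newlines($text);

# After normalization no carriage return is left in the string.
print "no carriage returns left\n" if $clean !~ /\r/;
```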

Separating paragraphs

We might be tempted to use ^$ to separate paragraphs. By default, however, ^ and $ refer not to logical line positions but to the absolute start and end of the string, so the following code won't work:

    $text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
    $text =~ s/^$/<p>/g;   # won't work
    print $text;
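As a quick check of this claim, the sketch below counts how many substitutions s/^$/<p>/g performs when no modifier other than /g is given. The variable names are illustrative, and the sample uses plain \n line endings for brevity.

```perl
use strict;
use warnings;

my $text = "This is a sample text.\n\nIt has three lines.\nThat's all.";

# s/// returns the number of substitutions made, or a false value if none.
# Without /m, ^$ can only match an empty string, so nothing is replaced.
my $count = ($text =~ s/^$/<p>/g) || 0;

print "substitutions: $count\n";   # prints "substitutions: 0"
```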

Multiline modifier

With the multiline modifier /m, the meaning of ^ and $ changes from string-related to logical-line-related. Also, ^ and $ match only a position, not actual characters (such as \r or \n), so \s* is used to consume any whitespace left on an otherwise empty line:

    $text = "This is a sample text.\r\n\r\nIt has three lines.\r\nThat's all.";
    $text =~ s/^\s*$/<p>/mg;
    print $text;
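The same idea can be wrapped in a small function. This is only a sketch: the name mark_paragraphs is hypothetical, and it uses \s* so that a completely empty line matches as well as a line holding only whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the content of each blank line with <p>.
# With /m, ^ and $ match at the start and end of every logical line.
sub mark_paragraphs {
    my ($text) = @_;
    $text =~ s/^\s*$/<p>/mg;
    return $text;
}

my $text = "This is a sample text.\n\nIt has three lines.\nThat's all.";
print mark_paragraphs($text), "\n";
```

For this input the blank second line becomes <p>, while the other lines are left alone.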

Email address link

The basic form of an email address is "username@hostname", so the regex will look like:

    $text =~ s/\b(username\@hostname)\b/<a href="mailto:$1">$1<\/a>/g;

Custom delimiters

The first things to notice are the two backslashes (\@ and <\/a>). The \@ will be discussed later (Perl requires @ to be escaped, because an unescaped @ would be interpolated as an array). The other backslash escapes a forward slash, which Perl would otherwise read as the search-and-replace delimiter. It's a little ugly, but Perl allows us to pick our own delimiters, so that escape is no longer necessary:

    s|regex|replacement|modifiers
    s{regex}{replacement}modifiers

For example:

    $text = "jfriedl\@regex.info";
    $text =~ s{(\w+\@\w+(\.\w+))}{<a href="mailto:$1">$1</a>}g;
    print $text;
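To see that the choice of delimiter changes only the escaping and not the result, this sketch runs the same substitution once with slashes and once with braces. The variable names $slashes and $braces are illustrative.

```perl
use strict;
use warnings;

my $slashes = "jfriedl\@regex.info";
my $braces  = $slashes;

# Slash delimiters: the / in </a> has to be escaped.
$slashes =~ s/(\w+\@\w+(\.\w+))/<a href="mailto:$1">$1<\/a>/g;

# Brace delimiters: no escaping of / is needed.
$braces =~ s{(\w+\@\w+(\.\w+))}{<a href="mailto:$1">$1</a>}g;

print $braces, "\n";
print "both forms give the same result\n" if $slashes eq $braces;
```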

Readability modifier /x

The metacharacter \w is not quite appropriate: it matches ASCII letters and digits but also the underscore, which is not allowed in a hostname, and it does not match the hyphen, which is allowed. The corrected regex is too long to fit comfortably on one line, so we use Perl's /x modifier, which lets us spread a regex over multiple lines. This modifier does two simple but powerful things. First, it causes most whitespace to be ignored, so you can "free-format" the expression for readability. Second, it allows comments with a leading #.

    $text = "jfriedl\@regex.info";
    # search and replace
    $text =~ s{
        # capture the whole address as group 1
        (
            # username regex
            [-a-zA-Z0-9]+
            \@
            # hostname regex
            [-a-z0-9]+(?:\.[-a-z0-9]+)*\.(?:com|edu|info)
        )
    }{<a href="mailto:$1">$1</a>}gix;   # x modifier (Wow!)
    print $text;
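As a sketch of how the free-formatted regex might be reused, the function below wraps the substitution so it can be applied to any string. The name link_emails is hypothetical, the \b anchors are borrowed from the basic form shown earlier, and the list of top-level domains is only the small sample used on this page.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each email address in a mailto: link.
sub link_emails {
    my ($text) = @_;
    $text =~ s{
        \b
        (                          # capture the whole address as group 1
            [-a-z0-9]+             # username
            \@
            [-a-z0-9]+             # first hostname label
            (?: \. [-a-z0-9]+ )*   # any further labels
            \. (?: com|edu|info )  # sample list of top-level domains
        )
        \b
    }{<a href="mailto:$1">$1</a>}gix;
    return $text;
}

print link_emails("Write to jfriedl\@regex.info."), "\n";
```

The sentence-ending period stays outside the link, and an address whose hostname contains an underscore is left untouched because the character classes exclude it.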