How to Match Multiple Lines using Regex in Perl One-liners
Perl one-liners that use Perl's regular expressions can be very powerful text-processing tools, whether typed in a terminal or used in a script. By default, a one-liner run with the -p or -n option receives its input line by line. That causes trouble when the text we want to match spans multiple lines. In this post we look at a technique for matching multiple lines in a Perl one-liner.
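As an example, let's find and remove the content between <PRE> and </PRE> (both tags included) when that content consists only of newlines (\n), spaces, and <BR>/<HR> tags. A simple regex like <PRE>[\s{<BR>}{<HR>}]*</PRE> matches such content. (Strictly speaking, [\s{<BR>}{<HR>}] is a character class that matches one character at a time, so it also accepts stray characters such as B, R, H, { and } that are not part of a complete tag.)
However, with -p the substitution never matches across lines. The reason is not that \s fails to match \n (it does match it). Instead, perl hands the code one line at a time, so no single record contains both <PRE> and </PRE>. The trick is to add the option -0777. The -0 option sets the input record separator ($/) as an octal number. Because 0777 is not a valid byte value, perl reads each input file (or standard input) whole as a single record instead of splitting it at \n.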
perl -0777 -pe 's|<PRE>[\s{<BR>}{<HR>}]*</PRE>||g'
The meanings of the options used here (-0777, -p and -e) are documented in the perlrun manual.
Here is an example of the one-liner above in use.
$ echo -e "text\n<PRE>\n<BR>\n<HR><HR>\n \n</PRE>more text"
text
<PRE>
<BR>
<HR><HR>
 
</PRE>more text
$ echo -e "text\n<PRE>\n<BR>\n<HR><HR>\n \n</PRE>more text" |
perl -0777 -pe 's|<PRE>[\s{<BR>}{<HR>}]*</PRE>||g'
text
more text
The same technique can be used for grep too: How to Grep 2 Lines using grep in Linux.