pcrecallout(3)
NAME
PCRE - Perl-compatible regular expressions
SYNOPSIS
#include <pcre.h>
int (*pcre_callout)(pcre_callout_block *);
int (*pcre16_callout)(pcre16_callout_block *);
int (*pcre32_callout)(pcre32_callout_block *);
DESCRIPTION
PCRE provides a feature called "callout", which is a means of temporarily passing control to the caller of PCRE in the middle of pattern matching. The caller of PCRE provides an external function by putting its entry point in the global variable pcre_callout (pcre16_callout for the 16-bit library, pcre32_callout for the 32-bit library). By default, this variable contains NULL, which disables all calling out.
Within a regular expression, (?C) indicates the points at which the external function is to be called. Different callout points can be identified by putting a number less than 256 after the letter C. The default value is zero. For example, this pattern has two callout points:
  (?C1)abc(?C2)def
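The hook mechanism described above can be sketched without linking against PCRE. In this minimal illustration every demo_ name is a hypothetical stand-in: demo_callout plays the role of the global pcre_callout variable, and demo_callout_block models only the callout number, not the full block that PCRE passes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real callout block; only the number is modelled. */
typedef struct {
    int callout_number;   /* the n in (?Cn); 0 for a plain (?C) */
} demo_callout_block;

/* Stand-in for the global hook: NULL (the default) disables calling out. */
static int (*demo_callout)(demo_callout_block *) = NULL;

static int demo_seen[256];   /* how often each callout number was reported */

/* Example callout function: record the number and return 0. */
static int demo_record_callout(demo_callout_block *block)
{
    assert(block != NULL);
    demo_seen[block->callout_number]++;
    return 0;
}

/*
 * What a matcher does on reaching (?Cn) in this model: call out only if
 * a function has been installed. Returns 1 if the call was made, else 0.
 */
static int demo_reach_callout_point(int number)
{
    demo_callout_block block;

    assert(number >= 0 && number < 256);
    if (demo_callout == NULL)
        return 0;                     /* calling out is disabled */
    block.callout_number = number;
    (void)demo_callout(&block);
    return 1;
}
```

Assigning demo_record_callout to demo_callout and then reaching points 1 and 2 leaves one entry each in demo_seen[1] and demo_seen[2]. With the real library the same shape would apply, using pcre_callout_block and the pcre_callout variable from pcre.h.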
If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE
automatically inserts callouts, all with number 255, before each item in the
pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
  A(\d{2}|--)
it is processed as if it were
  (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
Notice that there is a callout before and after each parenthesis and
alternation bar. If the pattern contains a conditional group whose condition is
an assertion, an automatic callout is inserted immediately before the
condition. Such a callout may also be inserted explicitly, for example:
  (?(?C9)(?=a)ab|de)
This applies only to assertion conditions (because they are themselves
independent groups).
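The placement of automatic callouts described above can be visualised with a small text-level sketch. This is only an illustration under stated assumptions: PCRE inserts the callouts while compiling, not by rewriting the pattern text, and the hypothetical helper demo_show_auto_callouts understands only literals, backslash escapes, simple [...] classes, plain parentheses, the alternation bar, and the quantifiers * + ? and {..}.

```c
#include <assert.h>
#include <string.h>

#define DEMO_TAG "(?C255)"

/* Appends len bytes of src to out at *pos; returns -1 if they do not fit. */
static int demo_append(char *out, size_t out_size, size_t *pos,
                       const char *src, size_t len)
{
    if (*pos + len + 1 > out_size)
        return -1;
    memcpy(out + *pos, src, len);
    *pos += len;
    out[*pos] = '\0';
    return 0;
}

/*
 * Hypothetical helper, not part of PCRE: copies pattern into out with
 * "(?C255)" placed before every item and once more at the very end.
 * Returns 0 on success, -1 if out is too small.
 */
static int demo_show_auto_callouts(const char *pattern, char *out,
                                   size_t out_size)
{
    const char *p = pattern;
    size_t pos = 0;

    assert(pattern != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    while (*p != '\0') {
        const char *start = p;
        int is_structural = (*p == '(' || *p == '|');

        if (*p == '\\' && p[1] != '\0') {
            p += 2;                            /* escape such as \d */
        } else if (*p == '[') {
            p++;
            while (*p != '\0' && *p != ']')    /* skip to the closing bracket */
                p += (*p == '\\' && p[1] != '\0') ? 2 : 1;
            if (*p == ']')
                p++;
        } else {
            p++;                               /* literal, '(', ')' or '|' */
        }

        /* A quantifier belongs to the item it follows. */
        if (!is_structural) {
            if (*p == '{') {
                while (*p != '\0' && *p != '}')
                    p++;
                if (*p == '}')
                    p++;
            } else if (*p == '*' || *p == '+' || *p == '?') {
                p++;
            }
        }

        if (demo_append(out, out_size, &pos, DEMO_TAG, strlen(DEMO_TAG)) != 0 ||
            demo_append(out, out_size, &pos, start, (size_t)(p - start)) != 0)
            return -1;
    }
    /* One final callout after the last item. */
    return demo_append(out, out_size, &pos, DEMO_TAG, strlen(DEMO_TAG));
}
```

Called with the C string "A(\\d{2}|--)", the helper reproduces the expanded form shown earlier in this section. Special group syntax such as (?:...) is deliberately out of scope for the sketch.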
Automatic callouts can be used for tracking the progress of pattern matching.
The pcretest program has a pattern qualifier (/C) that sets automatic callouts;
when it is used, the output indicates how the pattern is being matched. This is
useful information when you are trying to optimize the performance of a
particular pattern.
MISSING CALLOUTS
You should be aware that, because of optimizations in the way PCRE compiles and
matches patterns, callouts sometimes do not happen exactly as you might expect.
At compile time, PCRE "auto-possessifies" repeated items when it knows that
what follows cannot be part of the repeat. For example, a+[bc] is compiled as
if it were a++[bc]. When this pattern is anchored and then applied with
automatic callouts to the string "aaaa", the pcretest output shows that, when
matching [bc] fails, there is no backtracking into a+, and therefore the
callouts that would be taken for the backtracks do not occur.
You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS
to pcre_compile(), or by starting the pattern with (*NO_AUTO_POSSESS). If
this is done in pcretest (using the /O qualifier), the output shows that, when
matching [bc] fails, the matcher backtracks into a+ and tries again,
repeatedly, until a+ itself fails.
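The effect of auto-possessification on the number of callouts can be illustrated with a toy model of the anchored pattern a+[bc]. This is a sketch under simplifying assumptions, not PCRE's matcher: the hypothetical function demo_count_class_attempts only counts how often the [bc] item is tried, which corresponds to the automatic callouts that would be reported for that item.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not PCRE code: matches a+[bc] anchored at the start
 * of subject and returns how many times the [bc] item is attempted.
 * If possessive is nonzero, a+ never gives characters back (like a++).
 */
static int demo_count_class_attempts(const char *subject, int possessive)
{
    size_t run = 0;       /* number of leading 'a' characters held by a+ */
    int attempts = 0;

    assert(subject != NULL);
    while (subject[run] == 'a')
        run++;                        /* a+ is greedy: take every 'a' */
    if (run == 0)
        return 0;                     /* a+ fails, [bc] is never reached */

    for (;;) {
        char next = subject[run];

        attempts++;                   /* callout point before [bc] */
        if (next == 'b' || next == 'c')
            return attempts;          /* overall match */
        if (possessive || run == 1)
            return attempts;          /* no further backtracking possible */
        run--;                        /* give one 'a' back and retry */
    }
}
```

For the subject "aaaa" the model tries [bc] once when a+ is possessive and four times when backtracking is allowed, one attempt for each shorter run of "a" characters.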
Other optimizations that provide fast "no match" results also affect callouts.
For example, if the pattern is
  ab(?C4)cd
PCRE knows that any matching string must contain the letter "d". If the subject
string is "abyz", the lack of "d" means that matching never starts, and
the callout is never reached. However, with "abyd", though the result is still
no match, the callout is obeyed.
If the pattern is studied, PCRE knows the minimum length of a matching string,
and will immediately give a "no match" return without actually running a match
if the subject is not long enough, or, for unanchored patterns, if it has
been scanned far enough.
You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
option to the matching function, or by starting the pattern with
(*NO_START_OPT). This slows down the matching process, but does ensure that
callouts such as the example above are obeyed.
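The start-of-match checks described in this section can be modelled roughly as a pre-filter that runs before any matching begins. The sketch below is an assumption-laden illustration, not PCRE's implementation: demo_match_would_start is a hypothetical function, and the required character and minimum length are passed in by hand rather than derived from a compiled or studied pattern.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model, not PCRE code: decides whether a match attempt would
 * start at all. required_char is a character every match must contain
 * ('\0' for none); min_length is the minimum match length known from
 * studying the pattern (0 if unknown). With optimize set to 0, as when
 * PCRE_NO_START_OPTIMIZE is passed, the attempt always starts.
 */
static int demo_match_would_start(const char *subject, char required_char,
                                  size_t min_length, int optimize)
{
    assert(subject != NULL);
    if (!optimize)
        return 1;
    if (required_char != '\0' && strchr(subject, required_char) == NULL)
        return 0;                 /* required character is absent */
    if (strlen(subject) < min_length)
        return 0;                 /* subject is too short to match */
    return 1;
}
```

With a required character of 'd' and an assumed minimum length of 4, the subject "abyz" is rejected before matching starts, so no callout can run, whereas "abyd" passes the pre-filter and its callouts are reached even though the overall result is still no match.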
THE CALLOUT INTERFACE
During matching, when PCRE reaches a callout point, the external function
defined by pcre_callout or pcre[16|32]_callout is called (if it is
set). This applies to both normal and DFA matching. The only argument to the
callout function is a pointer to a pcre_callout_block or
pcre[16|32]_callout_block structure. These structures contain the following
fields: