123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164 |
- .TH PCRE2CONVERT 3 "28 June 2018" "PCRE2 10.32"
- .SH NAME
- PCRE2 - Perl-compatible regular expressions (revised API)
- .SH "EXPERIMENTAL PATTERN CONVERSION FUNCTIONS"
- .rs
- .sp
- This document describes a set of functions that can be used to convert
- "foreign" patterns into PCRE2 regular expressions. This facility is currently
- experimental, and may be changed in future releases. Two kinds of pattern,
- globs and POSIX patterns, are supported.
- .
- .
- .SH "THE CONVERT CONTEXT"
- .rs
- .sp
- .nf
- .B pcre2_convert_context *pcre2_convert_context_create(
- .B " pcre2_general_context *\fIgcontext\fP);"
- .sp
- .B pcre2_convert_context *pcre2_convert_context_copy(
- .B " pcre2_convert_context *\fIcvcontext\fP);"
- .sp
- .B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP);
- .sp
- .B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP,
- .B " uint32_t \fIescape_char\fP);"
- .sp
- .B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP,
- .B " uint32_t \fIseparator_char\fP);"
- .fi
- .sp
- A convert context is used to hold parameters that affect the way that pattern
- conversion works. Like all PCRE2 contexts, you need to use a context only if
- you want to override the defaults. There are the usual create, copy, and free
- functions. If custom memory management functions are set in a general context
- that is passed to \fBpcre2_convert_context_create()\fP, they are used for all
- memory management within the conversion functions.
- .P
- There are only two parameters in the convert context at present. Both apply
- only to glob conversions. The escape character defaults to grave accent under
- Windows, otherwise backslash. It can be set to zero, meaning no escape
- character, or to any punctuation character with a code point less than 256.
- The separator character defaults to backslash under Windows, otherwise forward
- slash. It can be set to forward slash, backslash, or dot.
- .P
- The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
- their second argument is invalid.
- .
- .
- .SH "THE CONVERSION FUNCTION"
- .rs
- .sp
- .nf
- .B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP,
- .B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP,"
- .B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);"
- .sp
- .B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP);
- .fi
- .sp
- The first two arguments of \fBpcre2_pattern_convert()\fP define the foreign
- pattern that is to be converted. The length may be given as
- PCRE2_ZERO_TERMINATED. The \fBoptions\fP argument defines how the pattern is to
- be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
- PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
- One or more of the glob options, or one of the following POSIX options must be
- set to define the type of conversion that is required:
- .sp
- PCRE2_CONVERT_GLOB
- PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
- PCRE2_CONVERT_GLOB_NO_STARSTAR
- PCRE2_CONVERT_POSIX_BASIC
- PCRE2_CONVERT_POSIX_EXTENDED
- .sp
- Details of the conversions are given below. The \fBbuffer\fP and \fBblength\fP
- arguments define how the output is handled:
- .P
- If \fBbuffer\fP is NULL, the function just returns the length of the converted
- pattern via \fBblength\fP. This is one less than the length of buffer needed,
- because a terminating zero is always added to the output.
- .P
- If \fBbuffer\fP points to a NULL pointer, an output buffer is obtained using
- the allocator in the context or \fBmalloc()\fP if no context is supplied. A
- pointer to this buffer is placed in the variable to which \fBbuffer\fP points.
- When no longer needed the output buffer must be freed by calling
- \fBpcre2_converted_pattern_free()\fP. If this function is called with a NULL
- argument, it returns immediately without doing anything.
- .P
- If \fBbuffer\fP points to a non-NULL pointer, \fBblength\fP must be set to the
- actual length of the buffer provided (in code units).
- .P
- In all cases, after successful conversion, the variable pointed to by
- \fBblength\fP is updated to the length actually used (in code units), excluding
- the terminating zero that is always added.
- .P
- If an error occurs, the length (via \fBblength\fP) is set to the offset
- within the input pattern where the error was detected. Only gross syntax errors
- are caught; there are plenty of errors that will get passed on for
- \fBpcre2_compile()\fP to discover.
- .P
- The return from \fBpcre2_pattern_convert()\fP is zero on success or a non-zero
- PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
- \fBpcre2_compile()\fP uses mostly positive codes and \fBpcre2_match()\fP
- negative ones; \fBpcre2_convert()\fP uses existing codes of both kinds. A
- textual error message can be obtained by calling
- \fBpcre2_get_error_message()\fP.
- .
- .
- .SH "CONVERTING GLOBS"
- .rs
- .sp
- Globs are used to match file names, and consequently have the concept of a
- "path separator", which defaults to backslash under Windows and forward slash
- otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not
- permitted to match separator characters, but the double-star (**) feature
- (which does match separators) is supported.
- .P
- PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to
- match separator characters. PCRE2_GLOB_NO_STARSTAR matches globs with the
- double-star feature disabled. These options may be given together.
- .
- .
- .SH "CONVERTING POSIX PATTERNS"
- .rs
- .sp
- POSIX defines two kinds of regular expression pattern: basic and extended.
- These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
- PCRE2_CONVERT_POSIX_EXTENDED, respectively.
- .P
- In POSIX patterns, backslash is not special in a character class. Unmatched
- closing parentheses are treated as literals.
- .P
- In basic patterns, ? + | {} and () must be escaped to be recognized
- as metacharacters outside a character class. If the first character in the
- pattern is * it is treated as a literal. ^ is a metacharacter only at the start
- of a branch.
- .P
- In extended patterns, a backslash not in a character class always
- makes the next character literal, whatever it is. There are no backreferences.
- .P
- Note: POSIX mandates that the longest possible match at the first matching
- position must be found. This is not what \fBpcre2_match()\fP does; it yields
- the first match that is found. An application can use \fBpcre2_dfa_match()\fP
- to find the longest match, but that does not support backreferences (but then
- neither do POSIX extended patterns).
- .
- .
- .SH AUTHOR
- .rs
- .sp
- .nf
- Philip Hazel
- University Computing Service
- Cambridge, England.
- .fi
- .
- .
- .SH REVISION
- .rs
- .sp
- .nf
- Last updated: 28 June 2018
- Copyright (c) 1997-2018 University of Cambridge.
- .fi
|