.TH PCRE2CONVERT 3 "28 June 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "EXPERIMENTAL PATTERN CONVERSION FUNCTIONS"
.rs
.sp
This document describes a set of functions that can be used to convert
"foreign" patterns into PCRE2 regular expressions. This facility is currently
experimental, and may be changed in future releases. Two kinds of pattern,
globs and POSIX patterns, are supported.
.
.
.SH "THE CONVERT CONTEXT"
.rs
.sp
.nf
.B pcre2_convert_context *pcre2_convert_context_create(
.B " pcre2_general_context *\fIgcontext\fP);"
.sp
.B pcre2_convert_context *pcre2_convert_context_copy(
.B " pcre2_convert_context *\fIcvcontext\fP);"
.sp
.B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP);
.sp
.B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP,
.B " uint32_t \fIescape_char\fP);"
.sp
.B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP,
.B " uint32_t \fIseparator_char\fP);"
.fi
.sp
A convert context is used to hold parameters that affect the way that pattern
conversion works. As with all PCRE2 contexts, you need to use one only if
you want to override the defaults. There are the usual create, copy, and free
functions. If custom memory management functions are set in a general context
that is passed to \fBpcre2_convert_context_create()\fP, they are used for all
memory management within the conversion functions.
.P
There are only two parameters in the convert context at present. Both apply
only to glob conversions. The escape character defaults to grave accent under
Windows, otherwise backslash. It can be set to zero, meaning no escape
character, or to any punctuation character with a code point less than 256.
The separator character defaults to backslash under Windows, otherwise forward
slash. It can be set to forward slash, backslash, or dot.
.P
The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
their second argument is invalid.
.
.
.SH "THE CONVERSION FUNCTION"
.rs
.sp
.nf
.B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP,
.B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP,"
.B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);"
.sp
.B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP);
.fi
.sp
The first two arguments of \fBpcre2_pattern_convert()\fP define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The \fBoptions\fP argument defines how the pattern is to
be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
To define the type of conversion that is required, one or more of the
following glob options, or one of the following POSIX options, must be set:
.sp
  PCRE2_CONVERT_GLOB
  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  PCRE2_CONVERT_GLOB_NO_STARSTAR
  PCRE2_CONVERT_POSIX_BASIC
  PCRE2_CONVERT_POSIX_EXTENDED
.sp
Details of the conversions are given below. The \fBbuffer\fP and \fBblength\fP
arguments define how the output is handled:
.P
If \fBbuffer\fP is NULL, the function just returns the length of the converted
pattern via \fBblength\fP. This is one less than the length of buffer needed,
because a terminating zero is always added to the output.
.P
If \fBbuffer\fP points to a NULL pointer, an output buffer is obtained using
the allocator in the context, or \fBmalloc()\fP if no context is supplied. A
pointer to this buffer is placed in the variable to which \fBbuffer\fP points.
When no longer needed, the output buffer must be freed by calling
\fBpcre2_converted_pattern_free()\fP. If this function is called with a NULL
argument, it returns immediately without doing anything.
.P
If \fBbuffer\fP points to a non-NULL pointer, \fBblength\fP must be set to the
actual length of the buffer provided (in code units).
.P
In all cases, after successful conversion, the variable pointed to by
\fBblength\fP is updated to the length actually used (in code units), excluding
the terminating zero that is always added.
.P
If an error occurs, the length (via \fBblength\fP) is set to the offset
within the input pattern where the error was detected. Only gross syntax errors
are caught; there are plenty of errors that will get passed on for
\fBpcre2_compile()\fP to discover.
.P
The return from \fBpcre2_pattern_convert()\fP is zero on success or a non-zero
PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
\fBpcre2_compile()\fP uses mostly positive codes and \fBpcre2_match()\fP
negative ones; \fBpcre2_pattern_convert()\fP uses existing codes of both kinds.
A textual error message can be obtained by calling
\fBpcre2_get_error_message()\fP.
.
.
.SH "CONVERTING GLOBS"
.rs
.sp
Globs are used to match file names, and consequently have the concept of a
"path separator", which defaults to backslash under Windows and forward slash
otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not
permitted to match separator characters, but the double-star (**) feature
(which does match separators) is supported.
.P
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to
match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with
the double-star feature disabled. These options may be given together.
.
.
.SH "CONVERTING POSIX PATTERNS"
.rs
.sp
POSIX defines two kinds of regular expression pattern: basic and extended.
These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
PCRE2_CONVERT_POSIX_EXTENDED, respectively.
.P
In POSIX patterns, backslash is not special in a character class. Unmatched
closing parentheses are treated as literals.
.P
In basic patterns, ? + | {} and () must be escaped to be recognized
as metacharacters outside a character class. If the first character in the
pattern is * it is treated as a literal. ^ is a metacharacter only at the start
of a branch.
.P
In extended patterns, a backslash not in a character class always
makes the next character literal, whatever it is. There are no backreferences.
.P
Note: POSIX mandates that the longest possible match at the first matching
position must be found. This is not what \fBpcre2_match()\fP does; it yields
the first match that is found. An application can use \fBpcre2_dfa_match()\fP
to find the longest match. That function does not support backreferences, but
neither do POSIX extended patterns.
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 28 June 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi