<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreposix man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
<li><a name="TOC9" href="#SEC9">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &#60;pcreposix.h&#62;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of PCRE's native API, which contains much
additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the option REG_EXTENDED is defined with the value
zero. This has no effect, but since programs that are written to the POSIX
interface often use it, this makes it easier to slot in PCRE as a replacement
library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types, <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<P>
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
<pre>
  REG_DOTALL
</pre>
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
<pre>
  REG_ICASE
</pre>
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
<pre>
  REG_NEWLINE
</pre>
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
  REG_NOSUB
</pre>
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to <b>regexec()</b> for matching, the
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
are returned.
<pre>
  REG_UTF8
</pre>
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means that the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
</P>
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
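<P>
As an illustration of the compile step described above, the following is a
minimal sketch rather than code taken from PCRE. The helper name
<i>count_groups</i> and the macro <i>USE_PCREPOSIX</i> are invented for this
example. By default the sketch includes the system &#60;regex.h&#62; so that it
builds without PCRE; defining <i>USE_PCREPOSIX</i> and linking with
<b>-lpcreposix -lpcre</b> is assumed to select the wrapper instead.
</P>

```c
#include <stddef.h>

/* USE_PCREPOSIX is a macro name made up for this sketch; without it the
   system POSIX header is used so the example builds anywhere. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Compile `pattern` with `cflags` and return the number of capturing
   subpatterns (the public re_nsub member), or -1 if regcomp() fails. */
long count_groups(const char *pattern, int cflags)
{
    regex_t re;
    long nsub;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;              /* non-zero yield: compilation failed */

    nsub = (long)re.re_nsub;    /* filled in only on success */
    regfree(&re);               /* release memory tied to the structure */
    return nsub;
}
```

<P>
A call such as <i>count_groups("(a+)(b+)", REG_EXTENDED | REG_ICASE)</i> would
be expected to report two capturing subpatterns. REG_EXTENDED is passed only
because programs written to the POSIX interface usually do so; under the PCRE
wrapper it is zero and has no effect.
</P>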
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
<pre>
                          Default   Change with

  . matches newline          no     PCRE_DOTALL
  newline matches [^a]       yes    not changeable
  $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
  $ matches \n in middle     no     PCRE_MULTILINE
  ^ matches \n in middle     no     PCRE_MULTILINE
</pre>
This is the equivalent table for POSIX:
<pre>
                          Default   Change with

  . matches newline          yes    REG_NEWLINE
  newline matches [^a]       yes    REG_NEWLINE
  $ matches \n at end        no     REG_NEWLINE
  $ matches \n in middle     no     REG_NEWLINE
  ^ matches \n in middle     no     REG_NEWLINE
</pre>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
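<P>
The rows of the two tables above can be probed with a small helper. This is
only a sketch: the name <i>matches</i> and the macro <i>USE_PCREPOSIX</i> are
invented here, and the system &#60;regex.h&#62; is included by default so the
code builds without PCRE.
</P>

```c
#include <stddef.h>

/* USE_PCREPOSIX is a macro name made up for this sketch. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Return 1 if `pattern` matches somewhere in `subject`, 0 if it does not
   (or if matching fails for another reason), -1 on a compile error.
   REG_NOSUB is added because only a yes/no answer is wanted. */
int matches(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

<P>
Going by the tables, <i>matches("^b", "a\nb", REG_NEWLINE)</i> should succeed
under both PCRE and POSIX, and the same call with no flags should fail under
both. The call <i>matches("a.b", "a\nb", 0)</i> is where the two are expected
to disagree: the PCRE table predicts no match, the POSIX table a match.
</P>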
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
against a given <i>string</i>, which is by default terminated by a zero byte
(but see REG_STARTEND below), subject to the options in <i>eflags</i>. These can
be:
<pre>
  REG_NOTBOL
</pre>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
<pre>
  REG_NOTEOL
</pre>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
<pre>
  REG_STARTEND
</pre>
The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
(there need not actually be a NUL at that location), regardless of the value of
<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
how it is matched.
</P>
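<P>
A possible use of REG_STARTEND is to search only a slice of a larger buffer.
The sketch below assumes the header in use defines REG_STARTEND (it is an
extension, as noted above); the helper name <i>span_matches</i> and the macro
<i>USE_PCREPOSIX</i> are invented for illustration. The offsets written back
into <i>pmatch</i> are deliberately not inspected here, since whether they are
relative to the buffer or to the slice is not stated above.
</P>

```c
#include <stddef.h>

/* USE_PCREPOSIX is a macro name made up for this sketch. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Return 1 if `pattern` matches within buf[start..end), 0 if it does not,
   -1 on a compile error. The slice need not be NUL-terminated at `end`. */
int span_matches(const char *pattern, const char *buf,
                 size_t start, size_t end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* With REG_STARTEND, pmatch[0] carries the slice bounds on input. */
    m[0].rm_so = (regoff_t)start;
    m[0].rm_eo = (regoff_t)end;
    rc = regexec(&re, buf, 1, m, REG_STARTEND);

    regfree(&re);
    return rc == 0;
}
```

<P>
For the buffer "abc123def", <i>span_matches("[0-9]+", buf, 3, 6)</i> looks only
at "123", while bounds of 0 and 3 restrict the search to "abc".
</P>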
<P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
<b>regexec()</b> are ignored.
</P>
<P>
Otherwise, the portion of the string that was matched, and also any captured
substrings, are returned via the <i>pmatch</i> argument, which points to an
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of <i>string</i> that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
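<P>
The <i>rm_so</i> and <i>rm_eo</i> offsets described above can be turned into a
copied substring along the following lines. This is a sketch under stated
assumptions: the helper name <i>capture</i>, the fixed limit of ten
<i>pmatch</i> entries, and the macro <i>USE_PCREPOSIX</i> are all choices made
for this example.
</P>

```c
#include <stddef.h>
#include <string.h>

/* USE_PCREPOSIX is a macro name made up for this sketch. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

#define MAX_GROUPS 10   /* arbitrary limit chosen for this example */

/* Copy capture `group` (0 = whole match) of the first match of `pattern`
   in `subject` into `out`. Returns the copied length, or -1 if there is
   no match, the group is unset, `out` is too small, or compilation fails. */
int capture(const char *pattern, const char *subject, size_t group,
            char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int len = -1;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Unused entries have both members set to -1, so test rm_so. */
    if (regexec(&re, subject, MAX_GROUPS, m, 0) == 0 && m[group].rm_so != -1) {
        size_t n = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (n < outsz) {
            memcpy(out, subject + m[group].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }

    regfree(&re);
    return len;
}
```

<P>
Because <i>rm_eo</i> is the offset of the first character after the substring,
the length is simply <i>rm_eo</i> minus <i>rm_so</i>, with no adjustment by one.
</P>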
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero error code from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
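<P>
Since the yield of <b>regerror()</b> is the buffer size needed for the whole
message, one way to avoid truncation is to call it twice: once to learn the
size and once to fill a buffer of that size. The sketch below assumes that a
call with an <i>errbuf_size</i> of zero writes nothing, which is the POSIX
convention. The helper name <i>describe_error</i> and the macro
<i>USE_PCREPOSIX</i> are invented for this example.
</P>

```c
#include <stddef.h>
#include <stdlib.h>

/* USE_PCREPOSIX is a macro name made up for this sketch. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Return a heap-allocated message for `errcode`, or NULL if memory cannot
   be allocated. `re` may be NULL. The caller frees the result. */
char *describe_error(int errcode, const regex_t *re)
{
    /* First call: size 0, so nothing is written; only the yield is used. */
    size_t need = regerror(errcode, re, NULL, 0);
    char *msg;

    if (need == 0)
        need = 1;               /* defensive: always room for the zero */

    msg = malloc(need);
    if (msg == NULL)
        return NULL;

    msg[0] = '\0';
    /* Second call: the buffer is now large enough for the whole message. */
    regerror(errcode, re, msg, need);
    return msg;
}
```

<P>
A caller would typically pass the non-zero value returned by <b>regcomp()</b>
together with the same <i>preg</i> structure, print the message, and then free
it.
</P>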
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
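<P>
In practice this suggests compiling once, matching as often as needed, and
calling <b>regfree()</b> exactly once when the pattern is no longer required.
The following sketch shows that lifecycle; the helper name
<i>count_matching</i> and the macro <i>USE_PCREPOSIX</i> are invented for this
example.
</P>

```c
#include <stddef.h>

/* USE_PCREPOSIX is a macro name made up for this sketch. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Count how many of the `n` strings in `subjects` match `pattern`.
   Returns -1 if the pattern does not compile. */
int count_matching(const char *pattern, const char *const subjects[],
                   size_t n)
{
    regex_t re;
    int hits = 0;
    size_t i;

    /* One compilation; memory is now associated with `re`. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* The compiled form is reused for every subject string. */
    for (i = 0; i < n; i++) {
        if (regexec(&re, subjects[i], 0, NULL, 0) == 0)
            hits++;
    }

    /* One release; `re` must not be passed to regexec() after this. */
    regfree(&re);
    return hits;
}
```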
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
<P>
Last updated: 05 April 2008
<br>
Copyright &copy; 1997-2008 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>