.TH PCRECPP 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF C++ WRAPPER"
.rs
.sp
.B #include <pcrecpp.h>
.
.SH DESCRIPTION
.rs
.sp
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
further details.
.
.
.SH "MATCHING INTERFACE"
.rs
.sp
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
.sp
  Example: successful match:
     pcrecpp::RE re("h.*o");
     re.FullMatch("hello");
.sp
  Example: unsuccessful match (requires full match):
     pcrecpp::RE re("e");
     !re.FullMatch("hello");
.sp
  Example: creating a temporary RE object:
     pcrecpp::RE("h.*o").FullMatch("hello");
.sp
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
.P
You must supply extra pointer arguments to extract matched subpieces.
.sp
  Example: extracts "ruby" into "s" and 1234 into "i":
     int i;
     string s;
     pcrecpp::RE re("(\e\ew+):(\e\ed+)");
     re.FullMatch("ruby:1234", &s, &i);
.sp
  Example: does not try to extract any extra sub-patterns:
     re.FullMatch("ruby:1234", &s);
.sp
  Example: does not try to extract into NULL:
     re.FullMatch("ruby:1234", NULL, &i);
.sp
  Example: integer overflow causes failure:
     !re.FullMatch("ruby:1234567891234", NULL, &i);
.sp
  Example: fails because there aren't enough sub-patterns:
     !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
.sp
  Example: fails because string cannot be stored in integer:
     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
.sp
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
.sp
   string        (matched piece is copied to string)
   StringPiece   (StringPiece is mutated to point to matched piece)
   T             (where "bool T::ParseFrom(const char*, int)" exists)
   NULL          (the corresponding matched sub-pattern is not copied)
.sp
The function returns true iff all of the following conditions are satisfied:
.sp
  a. "text" matches "pattern" exactly;
.sp
  b. The number of matched sub-patterns is >= number of supplied
     pointers;
.sp
  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     void * NULL for the "i"th argument, or a non-void * NULL
     of the correct type, or pass fewer arguments than the
     number of sub-patterns, the "i"th captured sub-pattern is
     ignored.
.sp
CAVEAT: An optional sub-pattern that does not exist in the matched
string is assigned the empty string. Therefore, the following will
return false (because the empty string is not a valid number):
.sp
   int number;
   pcrecpp::RE("[a-z]+(\e\ed+)?").FullMatch("abc", &number);
.sp
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
\fBDoMatch\fP.
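.P
The overflow and empty-string failures described above come down to how a
captured piece of text is converted to a number. The following is a minimal
standalone sketch of such a conversion for "int"; the name parse_int_sketch is
hypothetical and the code is not taken from pcrecpp, it only illustrates why
"1234567891234" and "" are rejected.
.nf

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of pcrecpp: converts the n bytes at
// `str` to an int, failing on empty input, leading whitespace,
// trailing garbage, or a value that does not fit in an int.
bool parse_int_sketch(const char* str, int n, int* dest) {
  if (n <= 0) return false;             // empty capture is not a number
  const std::string buf(str, n);        // NUL-terminated copy for strtol
  if (std::isspace(static_cast<unsigned char>(buf[0]))) return false;

  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(buf.c_str(), &end, 10);

  if (end != buf.c_str() + buf.size()) return false;     // unparsed bytes remain
  if (errno == ERANGE) return false;                     // does not fit in long
  if (value < INT_MIN || value > INT_MAX) return false;  // does not fit in int

  if (dest != nullptr) *dest = static_cast<int>(value);  // NULL: check only
  return true;
}
```

.fi
.P
With this sketch, parse_int_sketch("1234", 4, &i) succeeds and stores 1234,
while the 13-digit input from the overflow example and a zero-length capture
both return false.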
.
.SH "QUOTING METACHARACTERS"
.rs
.sp
You can use the "QuoteMeta" operation to insert backslashes before all
potentially meaningful characters in a string. The returned string, used as a
regular expression, will exactly match the original string.
.sp
  Example:
     string quoted = RE::QuoteMeta(unquoted);
.sp
Note that it's legal to escape a character even if it has no special meaning in
a regular expression -- so this function does that. (This also makes it
identical to the Perl function of the same name; see "perldoc -f quotemeta".)
For example, "1.5-2.0?" becomes "1\e.5\e-2\e.0\e?".
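.P
The quoting rule can be sketched without the library. The function below is a
hypothetical stand-in (quote_meta_sketch is not a pcrecpp name) that puts a
backslash before every ASCII byte that is not a letter, digit, or underscore.
Leaving bytes with the high bit set untouched is an assumption of this sketch;
consult \fIpcrecpp.h\fP for what QuoteMeta itself does with such bytes.
.nf

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the quoting rule described above; it is
// not the pcrecpp implementation.
std::string quote_meta_sketch(const std::string& unquoted) {
  std::string result;
  result.reserve(unquoted.size() * 2);
  for (std::string::size_type i = 0; i < unquoted.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(unquoted[i]);
    const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9') || c == '_';
    // Assumption: bytes >= 0x80 are copied as-is so that multi-byte
    // sequences are not split by backslashes.
    if (!is_word && c < 0x80) result += '\\';
    result += static_cast<char>(c);
  }
  return result;
}
```

.fi
.P
Applied to the example above, the sketch turns "1.5-2.0?" into the same
backslash-escaped string, and leaves "abc_123" unchanged.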
.
.SH "PARTIAL MATCHES"
.rs
.sp
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
.sp
  Example: simple search for a string:
     pcrecpp::RE("ell").PartialMatch("hello");
.sp
  Example: find first number in a string:
     int number;
     pcrecpp::RE re("(\e\ed+)");
     re.PartialMatch("x*100 + 20", &number);
     assert(number == 100);
.
.
.SH "UTF-8 AND THE MATCHING INTERFACE"
.rs
.sp
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
match all the bytes of a multi-byte character.
.sp
  Example:
     pcrecpp::RE_Options options;
     options.set_utf8();
     pcrecpp::RE re(utf8_pattern, options);
     re.FullMatch(utf8_string);
.sp
  Example: using the convenience function UTF8():
     pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
     re.FullMatch(utf8_string);
.sp
NOTE: The UTF8 flag is ignored if PCRE was not configured with the
      --enable-utf8 flag.
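.P
To make the byte versus character distinction concrete, the sketch below
counts UTF-8 characters in a byte string by inspecting each lead byte. Both
function names are hypothetical, nothing here is a pcrecpp call, and the code
does not validate its input (overlong or truncated sequences are not
detected).
.nf

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that
// starts with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // continuation or invalid byte
}

// Counts characters rather than bytes. A byte that cannot start a
// sequence is counted as one unit so the loop always advances.
std::size_t utf8_char_count(const std::string& text) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < text.size(); ++count) {
    const std::size_t len =
        utf8_sequence_length(static_cast<unsigned char>(text[i]));
    i += (len == 0) ? 1 : len;
  }
  return count;
}
```

.fi
.P
A string holding "h", a two-byte e-acute, and "llo" is six bytes long but five
characters long under this count, which is the difference the UTF8 flag makes
to what "." consumes.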
.
.
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
.rs
.sp
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
.sp
   modifier              description                 Perl equivalent
.sp
   PCRE_CASELESS         case insensitive match      /i
   PCRE_MULTILINE        multiple lines match        /m
   PCRE_DOTALL           dot matches newlines        /s
   PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
   PCRE_EXTRA            strict escape parsing       N/A
   PCRE_EXTENDED         ignore whitespace           /x
   PCRE_UTF8             handles UTF8 chars          built-in
   PCRE_UNGREEDY         reverses * and *?           N/A
   PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
.sp
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
"?:" modifier within the pattern itself. For example, (?:ab|cd) does not
capture, while (ab|cd) does.
.P
For a full account of how each modifier works, please check the
PCRE API reference page.
.P
For each modifier, there are two member functions whose names are derived
from the modifier in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
.sp
  bool caseless()
.sp
which returns true if the modifier is set, and
.sp
  RE_Options & set_caseless(bool)
.sp
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member
functions. Setting \fImatch_limit\fR to a non-zero value will limit the
execution of PCRE to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables
match limiting. Alternatively, you can call \fBmatch_limit_recursion()\fP
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
recurses. \fBmatch_limit()\fP limits how many times PCRE's internal match
function may be called; \fBmatch_limit_recursion()\fP limits the depth of
internal recursion, and therefore the amount of stack that is used.
.P
Normally, to pass one or more modifiers to a RE class, you declare
a \fIRE_Options\fR object, set the appropriate options, and pass this
object to a RE constructor. Example:
.sp
   RE_Options opt;
   opt.set_caseless(true);
   if (RE("HELLO", opt).PartialMatch("hello world")) ...
.sp
RE_Options has two constructors. The default constructor takes no arguments and
creates a set of flags that are all off. The other takes an
\fIoption_flags\fR parameter, which exists to ease the transfer of legacy code
from C programs. This lets you do
.sp
   RE(pattern,
     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
.sp
However, new code is better off doing
.sp
   RE(pattern,
     RE_Options().set_caseless(true).set_multiline(true))
       .PartialMatch(str);
.sp
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR,
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR.
.P
If you need to set several options at once, and you don't want to go to
the trouble of declaring a RE_Options object and setting several options, there
is a parallel method that gives you this ability on the fly. You can chain
several \fBset_xxxxx()\fR member functions, since each of them returns a
reference to its class object. For example, to pass PCRE_CASELESS,
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
.sp
   RE(" ^ xyz \e\es+ .* blah$",
     RE_Options()
       .set_caseless(true)
       .set_extended(true)
       .set_multiline(true)).PartialMatch(sometext);
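.P
The chaining above works because every setter returns a reference to the
object it was called on. The class below is a hypothetical miniature written
to show that idea together with the flag-word constructor; OptionsSketch, its
members, and its bit values are local to this sketch and are not taken from
pcre.h or pcrecpp.h.
.nf

```cpp
#include <cassert>

// Hypothetical miniature of an options class with chainable setters.
class OptionsSketch {
 public:
  // Bit values chosen for this sketch only.
  enum Flag { kCaseless = 1 << 0, kMultiline = 1 << 1, kExtended = 1 << 2 };

  OptionsSketch() : flags_(0) {}                        // all flags off
  explicit OptionsSketch(int option_flags) : flags_(option_flags) {}

  bool caseless() const { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const { return (flags_ & kExtended) != 0; }

  // Each setter returns *this, which is what makes chaining possible.
  OptionsSketch& set_caseless(bool on) { return set_flag(kCaseless, on); }
  OptionsSketch& set_multiline(bool on) { return set_flag(kMultiline, on); }
  OptionsSketch& set_extended(bool on) { return set_flag(kExtended, on); }

  int all_options() const { return flags_; }

 private:
  OptionsSketch& set_flag(int flag, bool on) {
    if (on) {
      flags_ |= flag;
    } else {
      flags_ &= ~flag;
    }
    return *this;
  }

  int flags_;
};
```

.fi
.P
In this sketch, OptionsSketch().set_caseless(true).set_multiline(true) ends up
with the same flag word as OptionsSketch(kCaseless | kMultiline), which is the
equivalence between the two styles shown above.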
.sp
.
.
.SH "SCANNING TEXT INCREMENTALLY"
.rs
.sp
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
.sp
  Example: read lines of the form "var = value" from a string.
     string contents = ...;                 // Fill string somehow
     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

     string var;
     int value;
     pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
     while (re.Consume(&input, &var, &value)) {
       ...;
     }
.sp
Each successful call to "Consume" will set "var" and "value", and also
advance "input" so it points past the matched text.
.P
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
.sp
  pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
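.P
The anchor-at-front-then-advance behaviour of "Consume" can be imitated with
the standard library alone. The sketch below deliberately uses std::regex
instead of pcrecpp, with the match_continuous flag standing in for the
anchoring; consume_sketch and parse_assignments are hypothetical names, and
the pattern syntax is ECMAScript rather than PCRE.
.nf

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: matches `re` only at the front of [*pos, end)
// and, on success, advances *pos past the matched text.
bool consume_sketch(const std::regex& re,
                    std::string::const_iterator* pos,
                    std::string::const_iterator end,
                    std::smatch* match) {
  if (!std::regex_search(*pos, end, *match, re,
                         std::regex_constants::match_continuous)) {
    return false;
  }
  *pos = (*match)[0].second;  // skip over what was just matched
  return true;
}

// Reads leading lines of the form "var = value" and stops at the first
// line that does not fit. std::stoi throws on values too large for int.
std::vector<std::pair<std::string, int> > parse_assignments(
    const std::string& contents) {
  static const std::regex re("(\\w+) = (\\d+)\\n");
  std::vector<std::pair<std::string, int> > out;
  std::string::const_iterator pos = contents.begin();
  std::smatch m;
  while (consume_sketch(re, &pos, contents.end(), &m)) {
    out.push_back(std::make_pair(m[1].str(), std::stoi(m[2].str())));
  }
  return out;
}
```

.fi
.P
Dropping the match_continuous flag from the regex_search call gives the
unanchored behaviour that "FindAndConsume" is described as having.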
.
.
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
.rs
.sp
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
.sp
  Example:
    int a, b, c, d;
    pcrecpp::RE re("(.*) (.*) (.*) (.*)");
    re.FullMatch("100 40 0100 0x40",
                 pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                 pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
.sp
will leave 64 in a, b, c, and d.
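.P
The three radix modes correspond closely to the base argument of the C
library's strtol(), where base 0 means "detect a 0 or 0x prefix, otherwise
decimal". The helper below is a hypothetical illustration of that
correspondence (parse_long_radix is not a pcrecpp name) and makes no claim
about how the wrapper itself parses numbers.
.nf

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses all of `text` as a long in the given
// base. Base 8 plays the role of Octal(), base 16 of Hex(), and base 0
// of CRadix(), since strtol then honours "0" and "0x" prefixes and
// otherwise assumes decimal.
bool parse_long_radix(const char* text, int base, long* dest) {
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, base);
  if (end == text || *end != '\0' || errno == ERANGE) {
    return false;  // nothing parsed, trailing bytes, or out of range
  }
  *dest = value;
  return true;
}
```

.fi
.P
Feeding the four pieces of the example above through this helper, "100" in
base 8, "40" in base 16, and "0100" and "0x40" in base 0, yields 64 each time.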
.
.
.SH "REPLACING PARTS OF STRINGS"
.rs
.sp
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
used to insert the text matching the corresponding parenthesized group
from the pattern. \e0 in "rewrite" refers to the entire matching
text. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
.sp
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
.P
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
.sp
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
.P
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
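.P
Replace, GlobalReplace and Extract all rely on expanding a "rewrite" string
against the captured groups. The function below is a standalone sketch of that
expansion step only; expand_rewrite_sketch is a hypothetical name, and its
handling of a doubled backslash and of unknown escapes is an assumption of the
sketch, not something this page specifies.
.nf

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: groups[0] is the whole match and groups[n] is
// the nth parenthesized group. On failure *out is left untouched.
bool expand_rewrite_sketch(const std::string& rewrite,
                           const std::vector<std::string>& groups,
                           std::string* out) {
  std::string result;
  for (std::string::size_type i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c != '\\') {
      result += c;                              // ordinary character
      continue;
    }
    if (++i == rewrite.size()) return false;    // dangling backslash
    const char next = rewrite[i];
    if (next >= '0' && next <= '9') {
      const std::string::size_type n = next - '0';
      if (n >= groups.size()) return false;     // no such group
      result += groups[n];
    } else if (next == '\\') {
      result += '\\';   // assumed: doubled backslash is a literal one
    } else {
      return false;     // assumed: any other escape is rejected
    }
  }
  out->swap(result);
  return true;
}
```

.fi
.P
Given groups "key=42", "key" and "42", a rewrite consisting of backslash-2,
" <- ", backslash-1, and backslash-0 in parentheses expands to
"42 <- key (key=42)".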
.
.
.SH AUTHOR
.rs
.sp
.nf
The C++ wrapper was contributed by Google Inc.
Copyright (c) 2007 Google Inc.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 12 November 2007
.fi