<html>
<head>
<title>pcrebuild specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrebuild man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
<li><a name="TOC6" href="#SEC6">WHAT \R MATCHES</a>
<li><a name="TOC7" href="#SEC7">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC8" href="#SEC8">POSIX MALLOC USAGE</a>
<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC11" href="#SEC11">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC12" href="#SEC12">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC13" href="#SEC13">USING EBCDIC CODE</a>
<li><a name="TOC14" href="#SEC14">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC15" href="#SEC15">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC16" href="#SEC16">SEE ALSO</a>
<li><a name="TOC17" href="#SEC17">AUTHOR</a>
<li><a name="TOC18" href="#SEC18">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the <b>configure</b> script, where
the optional features are selected or deselected by providing options to
<b>configure</b> before running the <b>make</b> command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of <b>CMakeSetup</b> if you are using <b>CMake</b> instead of
<b>configure</b> to build PCRE.
</P>
<P>
The complete list of options for <b>configure</b> (which includes the standard
ones such as the selection of the installation directory) can be obtained by
running
<pre>
  ./configure --help
</pre>
The following sections include descriptions of options whose names begin with
--enable or --disable. These settings specify changes to the defaults for the
<b>configure</b> command. Because of the way that <b>configure</b> works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
<P>
By default, the <b>configure</b> script will search for a C++ compiler and C++
header files. If it finds them, it automatically builds the C++ wrapper library
for PCRE. You can disable this by adding
<pre>
  --disable-cpp
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
To build PCRE with support for UTF-8 character strings, add
<pre>
  --enable-utf8
</pre>
to the <b>configure</b> command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
function.
</P>
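<P>
In UTF-8 mode a single character can occupy from one to four bytes of the
subject string, which is how character values above 255 become reachable. The C
sketch below decodes one UTF-8 sequence into a code point to make this concrete.
It is an illustration only, not PCRE's internal decoder; the function name
<b>utf8_decode()</b> is hypothetical, and it performs only minimal validation
(overlong forms and surrogates are not rejected).
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 character from s[0..len-1].
   Stores the code point in *cp and returns the number of bytes used,
   or 0 if the bytes do not start a complete, well-formed sequence.
   Overlong encodings and surrogates are NOT rejected (sketch only). */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t value;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }                          /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; value = s[0] & 0x1F; }  /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; value = s[0] & 0x0F; }  /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; value = s[0] & 0x07; }  /* 11110xxx */
    else return 0;                                   /* stray continuation byte */

    if (len < need) return 0;                        /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;         /* must be 10xxxxxx */
        value = (value << 6) | (s[i] & 0x3F);        /* append 6 payload bits */
    }
    *cp = value;
    return need;
}
```

<P>
For example, the two bytes 0xC3 0xA9 decode to U+00E9, and the three bytes
0xE2 0x82 0xAC decode to U+20AC, a value that no single byte can hold.
</P>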
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<P>
UTF-8 support allows PCRE to process character values greater than 255 in the
strings that it handles. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
character properties, you must add
<pre>
  --enable-unicode-properties
</pre>
to the <b>configure</b> command. This implies UTF-8 support, even if you have
not explicitly requested it.
</P>
<P>
Including Unicode property support adds around 30K of tables to the PCRE
library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
supported. Details are given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation.
</P>
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use character 13 (carriage return, CR) instead, by adding
<pre>
  --enable-newline-is-cr
</pre>
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
<br>
<br>
Alternatively, you can specify that line endings are to be indicated by the
two-character sequence CRLF. If you want this, add
<pre>
  --enable-newline-is-crlf
</pre>
to the <b>configure</b> command. There is a fourth option, specified by
<pre>
  --enable-newline-is-anycrlf
</pre>
which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
<pre>
  --enable-newline-is-any
</pre>
causes PCRE to recognize any Unicode newline sequence.
</P>
<P>
Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
</P>
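<P>
To make the difference between the conventions concrete, here is a minimal C
sketch of how a line ending might be recognized at a given offset under the CR,
LF, CRLF, and ANYCRLF settings. The enum and function names are hypothetical and
this is not PCRE's own code; the ANY convention, which also recognizes further
Unicode characters, is left out to keep the example short.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names mirroring the --enable-newline-is-* settings. */
enum newline_mode { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF };

/* Returns the length (0, 1, or 2) of the line ending that starts at s[pos]
   under the given convention, or 0 if there is none. */
size_t newline_at(const char *s, size_t len, size_t pos, enum newline_mode mode)
{
    int is_crlf;

    if (pos >= len) return 0;
    is_crlf = s[pos] == '\r' && pos + 1 < len && s[pos + 1] == '\n';

    switch (mode) {
    case NL_CR:   return s[pos] == '\r';
    case NL_LF:   return s[pos] == '\n';
    case NL_CRLF: return is_crlf ? 2 : 0;      /* a lone CR or LF is just data */
    case NL_ANYCRLF:
        if (is_crlf) return 2;                 /* prefer the two-character form */
        return s[pos] == '\r' || s[pos] == '\n';
    }
    return 0;
}

/* Counts line endings in s[0..len-1], stepping over each one whole. */
size_t count_newlines(const char *s, size_t len, enum newline_mode mode)
{
    size_t pos = 0, count = 0;

    while (pos < len) {
        size_t n = newline_at(s, len, pos, mode);
        if (n > 0) { count++; pos += n; }
        else pos++;
    }
    return count;
}
```

<P>
For the eight-character string "a\r\nb\nc\rd", this sketch counts one line
ending under CRLF, two under CR or LF, and three under ANYCRLF.
</P>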
<br><a name="SEC6" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
By default, the sequence \R in a pattern matches any Unicode newline sequence,
whatever has been selected as the line ending sequence. If you specify
<pre>
  --enable-bsr-anycrlf
</pre>
the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
selected when PCRE is built can be overridden when the library functions are
called.
</P>
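<P>
The two \R settings can be contrasted with a short C sketch, assuming the
subject has already been decoded into an array of code points. The function
name is hypothetical. Besides CR, LF, and CRLF, the Unicode newline sequences
that PCRE recognizes are the single characters VT (U+000B), FF (U+000C), NEL
(U+0085), LS (U+2028), and PS (U+2029).
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns how many code points \R would consume at s[pos]: 2 for CR LF,
   1 for a single newline character, 0 for no match. When anycrlf_only is
   nonzero (the effect of --enable-bsr-anycrlf), only CR, LF and CRLF count. */
size_t bsr_length(const uint32_t *s, size_t n, size_t pos, int anycrlf_only)
{
    if (pos >= n) return 0;
    if (s[pos] == 0x0D)                                  /* CR, possibly CR LF */
        return (pos + 1 < n && s[pos + 1] == 0x0A) ? 2 : 1;
    if (s[pos] == 0x0A) return 1;                        /* LF */
    if (anycrlf_only) return 0;

    switch (s[pos]) {
    case 0x0B:    /* VT  */
    case 0x0C:    /* FF  */
    case 0x85:    /* NEL */
    case 0x2028:  /* LS  */
    case 0x2029:  /* PS  */
        return 1;
    default:
        return 0;
    }
}
```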
<br><a name="SEC7" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
<pre>
  --disable-shared
  --disable-static
</pre>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC8" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When PCRE is called through the POSIX interface (see the
<a href="pcreposix.html"><b>pcreposix</b></a>
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using <b>malloc()</b> for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
<pre>
  --with-posix-malloc-threshold=20
</pre>
to the <b>configure</b> command.
</P>
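<P>
The stack-or-heap decision can be sketched in C as follows. This mirrors the
idea described above rather than reproducing the <b>pcreposix</b> source; the
type and function names are hypothetical, and POSIX_MALLOC_THRESHOLD is a local
macro standing in for the configured threshold (default 10).
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for the --with-posix-malloc-threshold setting (default 10). */
#define POSIX_MALLOC_THRESHOLD 10

/* Hypothetical buffer for PCRE-style offsets: three ints per substring. */
typedef struct {
    int small[POSIX_MALLOC_THRESHOLD * 3]; /* used when nmatch is small */
    int *v;                                /* points at small[] or at heap memory */
    size_t size;                           /* number of ints available via v */
} ovector_buf;

/* Prepares room for nmatch substrings. Returns 0 on success, -1 on failure.
   The struct must not be copied while v points into its own small[] array. */
int ovector_init(ovector_buf *b, size_t nmatch)
{
    b->v = NULL;
    b->size = 0;
    if (nmatch <= POSIX_MALLOC_THRESHOLD) {
        b->v = b->small;                   /* fast path: no allocation at all */
    } else {
        if (nmatch > SIZE_MAX / (3 * sizeof(int))) return -1;  /* size overflow */
        b->v = malloc(nmatch * 3 * sizeof(int));
        if (b->v == NULL) return -1;
    }
    b->size = nmatch * 3;
    return 0;
}

/* Releases heap memory if any was allocated. */
void ovector_release(ovector_buf *b)
{
    if (b->v != b->small) free(b->v);
    b->v = NULL;
    b->size = 0;
}
```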
<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
<pre>
  --with-link-size=3
</pre>
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
</P>
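<P>
The effect of the link size can be illustrated with a small C sketch that
stores and reads back an offset of 2, 3, or 4 bytes, most significant byte
first. The function names are hypothetical, not PCRE's internal macros; the
sketch shows why a larger link size raises the ceiling on offsets and also why
every offset access then touches more bytes.
</P>

```c
#include <assert.h>
#include <stdint.h>

/* Stores offset in link_size bytes (2, 3, or 4), most significant byte first. */
void put_link(uint8_t *p, int link_size, uint32_t offset)
{
    for (int i = link_size - 1; i >= 0; i--) {
        p[i] = (uint8_t)(offset & 0xFF);
        offset >>= 8;
    }
}

/* Reads back an offset written by put_link(): one load per byte. */
uint32_t get_link(const uint8_t *p, int link_size)
{
    uint32_t offset = 0;
    for (int i = 0; i < link_size; i++)
        offset = (offset << 8) | p[i];
    return offset;
}

/* Largest value that fits in an offset field of link_size bytes. */
uint64_t max_link_value(int link_size)
{
    return ((uint64_t)1 << (8 * link_size)) - 1;
}
```

<P>
With a link size of 2 the largest representable offset is 65535, which is where
the "around 64K" limit on compiled patterns comes from.
</P>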
<br><a name="SEC10" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
by making recursive calls to an internal function called <b>match()</b>. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
<pre>
  --disable-stack-for-recursion
</pre>
to the <b>configure</b> command. With this configuration, PCRE will use the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
management functions. By default these point to <b>malloc()</b> and
<b>free()</b>, but you can replace the pointers so that your own functions are
used.
</P>
<P>
Separate functions are provided rather than using <b>pcre_malloc</b> and
<b>pcre_free</b> because the usage is very predictable: the block sizes
requested are always the same, and the blocks are always freed in reverse
order. A calling program might be able to implement optimized functions that
perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
slowly when built in this way. This option affects only the <b>pcre_exec()</b>
function; it is not relevant for the <b>pcre_dfa_exec()</b> function.
</P>
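<P>
Because the blocks are always freed in reverse order, a replacement can be as
simple as a bump pointer over a fixed arena. The C sketch below is one such
pair of functions, with signatures matching the types of the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> pointers
(<b>void *(*)(size_t)</b> and <b>void (*)(void *)</b>). The function names, the
64K arena size, the 16-byte alignment, and the fall-back to <b>malloc()</b> are
all assumptions of this sketch, and it is not thread-safe. In a program linked
with PCRE you would then assign the pointers before matching, for example
<b>pcre_stack_malloc = lifo_stack_malloc;</b>.
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE (64 * 1024)   /* assumed arena size; tune for your workload */
#define ARENA_ALIGN 16           /* assumed alignment, enough for common types */

static _Alignas(ARENA_ALIGN) unsigned char arena[ARENA_SIZE];
static size_t arena_top = 0;     /* bytes currently in use, from the bottom */

/* Allocates from the top of the arena; falls back to malloc() when full.
   Not thread-safe: a real program would need one arena per thread. */
void *lifo_stack_malloc(size_t size)
{
    if (size <= ARENA_SIZE) {
        size_t rounded = (size + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
        if (rounded <= ARENA_SIZE - arena_top) {
            void *p = arena + arena_top;
            arena_top += rounded;
            return p;
        }
    }
    return malloc(size);
}

/* Releases a block. Because frees arrive in reverse order of allocation,
   freeing an arena block just moves the top back to that block's start. */
void lifo_stack_free(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)arena;

    if (addr >= lo && addr < lo + ARENA_SIZE)
        arena_top = (size_t)(addr - lo);
    else
        free(p);                 /* block came from the malloc() fall-back */
}
```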
<br><a name="SEC11" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
at run time, as described in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The default is 10 million, but this can be changed by adding a
setting such as
<pre>
  --with-match-limit=500000
</pre>
to the <b>configure</b> command. This setting has no effect on the
<b>pcre_dfa_exec()</b> matching function.
</P>
<P>
In some environments it is desirable to limit the depth of recursive calls of
<b>match()</b> more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
<pre>
  --with-match-limit-recursion=10000
</pre>
to the <b>configure</b> command. This value can also be overridden at run time.
</P>
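<P>
How the two limits act can be illustrated with a toy backtracking matcher in C
that counts its own calls and tracks its nesting depth. This is not PCRE's
<b>match()</b>: it understands only literal characters, <b>.</b>, and <b>*</b>,
matches only at the start of the subject, and all names are hypothetical. It
returns -1 when either limit is reached, much as <b>pcre_exec()</b> gives up
with PCRE_ERROR_MATCHLIMIT or PCRE_ERROR_RECURSIONLIMIT instead of continuing.
</P>

```c
#include <assert.h>

/* Hypothetical counters playing the roles of the two limits. */
typedef struct {
    unsigned long calls;        /* total calls of match_here() so far */
    unsigned long call_limit;   /* cap on total calls (cf. --with-match-limit) */
    unsigned long depth_limit;  /* cap on nesting (cf. --with-match-limit-recursion) */
    int exceeded;               /* set once either cap is hit */
} match_limits;

static int match_here(const char *re, const char *s, unsigned long depth,
                      match_limits *lim);

/* Handles "c*": take as many characters as possible, then give them back
   one at a time until the rest of the pattern matches (backtracking). */
static int match_star(char c, const char *re, const char *s,
                      unsigned long depth, match_limits *lim)
{
    const char *t = s;

    while (*t != '\0' && (c == '.' || *t == c)) t++;
    for (;;) {
        if (match_here(re, t, depth + 1, lim)) return 1;
        if (lim->exceeded || t == s) return 0;
        t--;
    }
}

/* Matches re at the start of s. Every call is counted, and the nesting
   depth is checked against the second limit. */
static int match_here(const char *re, const char *s, unsigned long depth,
                      match_limits *lim)
{
    if (++lim->calls > lim->call_limit || depth > lim->depth_limit) {
        lim->exceeded = 1;
        return 0;
    }
    if (re[0] == '\0') return 1;
    if (re[1] == '*') return match_star(re[0], re + 2, s, depth, lim);
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return match_here(re + 1, s + 1, depth + 1, lim);
    return 0;
}

/* Returns 1 for a match, 0 for no match, -1 if a limit stopped the search. */
int limited_match(const char *re, const char *s,
                  unsigned long call_limit, unsigned long depth_limit)
{
    match_limits lim = {0, call_limit, depth_limit, 0};
    int r = match_here(re, s, 0, &lim);
    return lim.exceeded ? -1 : r;
}
```

<P>
With generous limits, the pattern a*a*a*a*b simply fails to match a run of
sixteen a's after several thousand calls; with a call limit of 1000, or a depth
limit of 2, the same search is abandoned and -1 is returned.
</P>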
<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P>
PCRE uses fixed tables for processing characters whose code values are less
than 256. By default, PCRE is built with a set of tables that are distributed
in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
only. If you add
<pre>
  --enable-rebuild-chartables
</pre>
to the <b>configure</b> command, the distributed tables are no longer used.
Instead, a program called <b>dftables</b> is compiled and run. This outputs the
source for a new set of tables, created in the default locale of your C runtime
system. (This method of replacing the tables does not work if you are
cross-compiling, because <b>dftables</b> is run on the local host. If you need
to create alternative tables when cross-compiling, you will have to do so "by
hand".)
</P>
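<P>
The idea behind <b>dftables</b>, deriving tables from the C library's view of the
current locale, can be sketched in C as below. This sketch builds only a
lower-case table and a case-flip table and writes them out as C source; the real
<b>dftables</b> also produces character-class bitmaps and a character-type table
in its own format, and all names here are hypothetical.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Builds 256-entry lower-case and case-flip tables using the ctype
   functions of whatever locale is currently selected. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256])
{
    for (int c = 0; c < 256; c++) {
        lower[c] = (unsigned char)tolower(c);
        flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
}

/* Writes one table as a C array definition, eight values per line. */
void emit_table(FILE *out, const char *name, const unsigned char t[256])
{
    fprintf(out, "static const unsigned char %s[256] = {\n", name);
    for (int c = 0; c < 256; c++)
        fprintf(out, "%3u,%s", (unsigned)t[c], (c % 8 == 7) ? "\n" : " ");
    fprintf(out, "};\n");
}
```

<P>
A generator built this way would call <b>setlocale(LC_ALL, "")</b> first, so that
the tables reflect the locale chosen by the environment, and then pass
<b>stdout</b> to <b>emit_table()</b>.
</P>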
<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
most computer operating systems. PCRE can, however, be compiled to run in an
EBCDIC environment by adding
<pre>
  --enable-ebcdic
</pre>
to the <b>configure</b> command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
an EBCDIC environment (for example, an IBM mainframe operating system).
</P>
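<P>
If you are unsure which character set your compiler targets, a quick check is
possible in C, because character constants are encoded in the execution
character set. The function names below are hypothetical; the code values used
(0x41 for A in ASCII, 0xC1 in common EBCDIC code pages; 0x81 for a and 0xA9 for z
in EBCDIC) are standard.
</P>

```c
#include <assert.h>

/* Returns 1 if character constants use an EBCDIC encoding of 'A'. */
int charset_looks_like_ebcdic(void)
{
    return 'A' == 0xC1;   /* ASCII 'A' is 0x41; common EBCDIC code pages use 0xC1 */
}

/* Returns 1 if 'a'..'z' form one unbroken range, as in ASCII. In EBCDIC the
   lower-case letters lie in three separate blocks ('a' = 0x81, 'z' = 0xA9),
   so tables built for ASCII code values cannot simply be reused there. */
int letters_are_contiguous(void)
{
    return 'z' - 'a' == 25;
}
```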
<br><a name="SEC14" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
<P>
By default, <b>pcregrep</b> reads all files as plain text. You can build it so
that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
<pre>
  --enable-pcregrep-libz
  --enable-pcregrep-libbz2
</pre>
to the <b>configure</b> command. These options naturally require that the
relevant libraries are installed on your system. Configuration will fail if
they are not.
</P>
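<P>
The recognition described above depends only on the file name. A minimal C
sketch of that decision follows; the enum and function names are hypothetical,
and the code does not open or decompress anything. In a build with these
options, a compressed file would then be read through <b>libz</b> or
<b>libbz2</b> rather than as plain text.
</P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of an input file by its name. */
enum file_kind { PLAIN_FILE, GZIP_FILE, BZIP2_FILE };

/* Returns nonzero if name ends with suffix and is longer than it. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), k = strlen(suffix);
    return n > k && strcmp(name + n - k, suffix) == 0;
}

/* Decides how a file would be read, judging only by its name. */
enum file_kind classify_by_name(const char *name)
{
    if (ends_with(name, ".gz"))  return GZIP_FILE;
    if (ends_with(name, ".bz2")) return BZIP2_FILE;
    return PLAIN_FILE;
}
```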
<br><a name="SEC15" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<P>
If you add
<pre>
  --enable-pcretest-libreadline
</pre>
to the <b>configure</b> command, <b>pcretest</b> is linked with the
<b>libreadline</b> library, and when its input is from a terminal, it reads it
using the <b>readline()</b> function. This provides line-editing and history
facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
binary of <b>pcretest</b> linked in this way, there may be licensing issues.
</P>
<P>
Setting this option causes the <b>-lreadline</b> option to be added to the
<b>pcretest</b> build. In many operating environments with a system-installed
<b>libreadline</b> this is sufficient. However, in some environments (e.g.
if an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for <b>libreadline</b> says
this:
<pre>
  "Readline uses the termcap functions, but does not link with the
  termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library."
</pre>
If your environment has not been set up so that an appropriate library is
automatically included, you may need to add something like
<pre>
  LIBS="-lncurses"
</pre>
immediately before the <b>configure</b> command.
</P>
<br><a name="SEC16" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcreapi</b>(3), <b>pcre_config</b>(3).
</P>
<br><a name="SEC17" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC18" href="#TOC1">REVISION</a><br>
<P>
Last updated: 13 April 2008
<br>
Copyright &copy; 1997-2008 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>