/doc/index.html

http://github.com/edicl/cl-ppcre
  1. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  2. <html>
  3. <head>
  4. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  5. <title>CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp</title>
  6. <style type="text/css">
  7. pre { padding:5px; background-color:#e0e0e0 }
  8. h3, h4 { text-decoration: underline; }
  9. a { text-decoration: none; padding: 1px 2px 1px 2px; }
  10. a:visited { text-decoration: none; padding: 1px 2px 1px 2px; }
  11. a:hover { text-decoration: none; padding: 1px 1px 1px 1px; border: 1px solid #000000; }
  12. a:focus { text-decoration: none; padding: 1px 2px 1px 2px; border: none; }
  13. a.none { text-decoration: none; padding: 0; }
  14. a.none:visited { text-decoration: none; padding: 0; }
  15. a.none:hover { text-decoration: none; border: none; padding: 0; }
  16. a.none:focus { text-decoration: none; border: none; padding: 0; }
  17. a.noborder { text-decoration: none; padding: 0; }
  18. a.noborder:visited { text-decoration: none; padding: 0; }
  19. a.noborder:hover { text-decoration: none; border: none; padding: 0; }
  20. a.noborder:focus { text-decoration: none; border: none; padding: 0; }
  21. pre.none { padding:5px; background-color:#ffffff }
  22. </style>
  23. <meta name="description" content="Fast and portable perl-compatible regular expressions for Common Lisp.">
  24. </head>
  25. <body bgcolor=white>
  26. <h2>CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp</h2>
  27. <blockquote>
  28. <br>&nbsp;<br><h3>Abstract</h3>
  29. CL-PPCRE is a portable regular expression library for Common Lisp
  30. which has the following features:
  31. <ul>
  32. <li>It is <b>compatible with Perl</b>.
  33. <li>It is pretty <b>fast</b>.
  34. <li>It is <b>portable</b> between ANSI-compliant Common Lisp
  35. implementations.
  36. <li>It is <b>thread-safe</b>.
  37. <li>In addition to specifying regular expressions as strings like in
  38. Perl you can also use <a
  39. href="#create-scanner2"><b>S-expressions</b></a>.
  40. <li>It comes with a <a
  41. href="http://www.opensource.org/licenses/bsd-license.php"><b>BSD-style
  42. license</b></a> so you can basically do with it whatever you want.
  43. </ul>
  44. CL-PPCRE has been used successfully in various applications like <a
  45. href="http://nostoc.stanford.edu/Docs/">BioBike</a>,
  46. <a href="http://clutu.com/">clutu</a>,
  47. <a
  48. href="http://www.hpc.unm.edu/~download/LoGS/">LoGS</a>, <a href="http://cafespot.net/">CafeSpot</a>, <a href="http://www.eboy.com/">Eboy</a>, or <a
  49. href="http://weitz.de/regex-coach/">The Regex Coach</a>.
  50. <p>
  51. <font color=red>Download shortcut:</font> <a href="http://weitz.de/files/cl-ppcre.tar.gz">http://weitz.de/files/cl-ppcre.tar.gz</a>.
  52. </blockquote>
  53. <br>&nbsp;<br><h3><a class=none name="contents">Contents</a></h3>
  54. <ol>
  55. <li><a href="#install">Download and installation</a>
  56. <li><a href="#support">Support</a>
  57. <li><a href="#dict">The CL-PPCRE dictionary</a>
  58. <ol>
  59. <li><a href="#scanning">Scanning</a>
  60. <ol>
  61. <li><a href="#create-scanner"><code>create-scanner</code></a> (for Perl regex strings)
  62. <li><a href="#create-scanner2"><code>create-scanner</code></a> (for parse trees)
  63. <li><a href="#scan"><code>scan</code></a>
  64. <li><a href="#scan-to-strings"><code>scan-to-strings</code></a>
  65. <li><a href="#register-groups-bind"><code>register-groups-bind</code></a>
  66. <li><a href="#do-scans"><code>do-scans</code></a>
  67. <li><a href="#do-matches"><code>do-matches</code></a>
  68. <li><a href="#do-matches-as-strings"><code>do-matches-as-strings</code></a>
  69. <li><a href="#do-register-groups"><code>do-register-groups</code></a>
  70. <li><a href="#all-matches"><code>all-matches</code></a>
  71. <li><a href="#all-matches-as-strings"><code>all-matches-as-strings</code></a>
  72. </ol>
  73. <li><a href="#splitting">Splitting and replacing</a>
  74. <ol>
  75. <li><a href="#split"><code>split</code></a>
  76. <li><a href="#regex-replace"><code>regex-replace</code></a>
  77. <li><a href="#regex-replace-all"><code>regex-replace-all</code></a>
  78. </ol>
  79. <li><a href="#modify">Modifying scanner behaviour</a>
  80. <ol>
  81. <li><a href="#*property-resolver*"><code>*property-resolver*</code></a>
  82. <li><a href="#parse-tree-synonym"><code>parse-tree-synonym</code></a>
  83. <li><a href="#define-parse-tree-synonym"><code>define-parse-tree-synonym</code></a>
  84. <li><a href="#*regex-char-code-limit*"><code>*regex-char-code-limit*</code></a>
  85. <li><a href="#*use-bmh-matchers*"><code>*use-bmh-matchers*</code></a>
  86. <li><a href="#*optimize-char-classes*"><code>*optimize-char-classes*</code></a>
  87. <li><a href="#*allow-quoting*"><code>*allow-quoting*</code></a>
  88. <li><a href="#*allow-named-registers*"><code>*allow-named-registers*</code></a>
  89. </ol>
  90. <li><a href="#misc">Miscellaneous</a>
  91. <ol>
  92. <li><a href="#parse-string"><code>parse-string</code></a>
  93. <li><a href="#create-optimized-test-function"><code>create-optimized-test-function</code></a>
  94. <li><a href="#quote-meta-chars"><code>quote-meta-chars</code></a>
  95. <li><a href="#regex-apropos"><code>regex-apropos</code></a>
  96. <li><a href="#regex-apropos-list"><code>regex-apropos-list</code></a>
  97. </ol>
  98. <li><a href="#conditions">Conditions</a>
  99. <ol>
  100. <li><a href="#ppcre-error"><code>ppcre-error</code></a>
  101. <li><a href="#ppcre-invocation-error"><code>ppcre-invocation-error</code></a>
  102. <li><a href="#ppcre-syntax-error"><code>ppcre-syntax-error</code></a>
  103. <li><a href="#ppcre-syntax-error-string"><code>ppcre-syntax-error-string</code></a>
  104. <li><a href="#ppcre-syntax-error-pos"><code>ppcre-syntax-error-pos</code></a>
  105. </ol>
  106. </ol>
  107. <li><a href="#unicode">Unicode properties</a>
  108. <ol>
  109. <li><a href="#unicode-property-resolver"><code>unicode-property-resolver</code></a>
  110. </ol>
  111. <li><a href="#filters">Filters</a>
  112. <li><a href="#perl">Compatibility with Perl</a>
  113. <ol>
  114. <li><a href="#empty">Empty strings instead of <code>undef</code> in <code>$1</code>, <code>$2</code>, etc.</a>
  115. <li><a href="#scope">Strange scoping of embedded modifiers</a>
  116. <li><a href="#inconsistent">Inconsistent capturing of <code>$1</code>, <code>$2</code>, etc.</a>
  117. <li><a href="#lookaround">Captured groups not available outside of look-aheads and look-behinds</a>
  118. <li><a href="#order">Alternations don't always work from left to right</a>
  119. <li><a href="#uprops">Different names for Unicode properties</a>
  120. <li><a href="#mac"><code>&quot;\r&quot;</code> doesn't work with MCL</a>
  121. <li><a href="#alpha">What about <code>&quot;\w&quot;</code>?</a>
  122. </ol>
  123. <li><a href="#bugs">Bugs and problems</a>
  124. <ol>
  125. <li><a href="#quote"><code>&quot;\Q&quot;</code> doesn't work, or does it?</a>
  126. <li><a href="#backslash">Backslashes may confuse you...</a>
  127. </ol>
  128. <li><a href="#allegro">AllegroCL compatibility mode</a>
  129. <li><a href="#blabla">Hints, comments, performance considerations</a>
  130. <li><a href="#ack">Acknowledgements</a>
  131. </ol>
  132. <br>&nbsp;<br><h3><a name="install" class=none>Download and installation</a></h3>
  133. CL-PPCRE together with this documentation can be downloaded from <a
  134. href="http://weitz.de/files/cl-ppcre.tar.gz">http://weitz.de/files/cl-ppcre.tar.gz</a>. The
  135. current version is 2.0.11.
  136. <p>
  137. CL-PPCRE comes with a system definition
  138. for <a href="http://www.cliki.net/asdf">ASDF</a> and you compile and
  139. load it in the usual way. There are no dependencies (except that the
  140. <a href="#test">test suite</a>, which is not needed for normal operation, depends
  141. on <a href="http://weitz.de/flexi-streams/">FLEXI-STREAMS</a>).
  142. <p>
  143. The preferred way to install CL-PPCRE is
  144. through <a href="http://www.quicklisp.org/" target="_new">Quicklisp</a>:
  145. <pre>(ql:quickload :cl-ppcre)</pre>
  146. </p>
  147. <p>
  148. <a class=none name="test">You</a> can run a test suite which tests most aspects of the library with
  149. <pre>
  150. (asdf:oos 'asdf:test-op :cl-ppcre)
  151. </pre>
  152. <p>
  153. The current development version of CL-PPCRE can be found
  154. at <a href="https://github.com/edicl/cl-ppcre">https://github.com/edicl/cl-ppcre</a>. If you want to send patches, please fork the GitHub repository and send pull requests.
  155. <p>
  156. <br>&nbsp;<br><h3><a name="support" class=none>Support</a></h3>
  157. The development version of CL-PPCRE can be
  158. found <a href="https://github.com/edicl/cl-ppcre" target="_new">on
  159. GitHub</a>. Please use the GitHub issue tracking system to submit bug
  160. reports. Patches are welcome; please
  161. use <a href="https://github.com/edicl/cl-ppcre/pulls">GitHub pull
  162. requests</a>. If you want to make a change,
  163. please <a href="http://weitz.de/patches.html" target="_new">read this
  164. first</a>.
  165. <br>&nbsp;<br><h3><a class=none name="dict">The CL-PPCRE dictionary</a></h3>
  166. <h4><a name="scanning" class=none>Scanning</a></h4>
  167. <p><br>[Method]
  168. <br><a class=none name="create-scanner"><b>create-scanner</b> <i>(string string)<tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner, register-names</i></a>
  169. <blockquote><br> Accepts a string which is a regular expression in
  170. Perl syntax and returns a closure which will scan strings for this
  171. regular expression. The second value is only returned if <a href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a> is <i>true</i>. It is a list of strings mapping registers to their respective names: the first element stands for the first register, the second element for the second register, etc. You have to store this value if you want to map a register number to its name later, as <i>scanner</i> doesn't capture any information about register names. If a register isn't named, it has <code>NIL</code> as its name.
  172. <p>
  173. The mode keyword arguments are equivalent to the
  174. <code>&quot;imsx&quot;</code> modifiers in Perl. The
  175. <code>destructive</code> keyword will be ignored.
  176. <p>
  177. The function accepts most of the regex syntax of Perl 5.8 as described
  178. in <a href="http://perldoc.perl.org/5.8.8/perlre.html"><code>man
  179. perlre</code></a> including extended features like non-greedy
  180. repetitions, positive and negative look-ahead and look-behind
  181. assertions, &quot;standalone&quot; subexpressions, and conditional
  182. subpatterns. The following Perl features are (currently) <b>not</b>
  183. supported:
  184. <ul>
  185. <li><code>(?{ code })</code> and <code>(??{ code })</code> because
  186. they obviously don't make sense in Lisp.
  187. <li><code>\N{name}</code> (named characters), <code>\x{263a}</code>
  188. (wide hex characters), <code>\l</code>, <code>\u</code>,
  189. <code>\L</code>, and <code>\U</code>
  190. because they're actually not part of Perl's <em>regex</em> syntax - but see <a href="http://weitz.de/cl-interpol/">CL-INTERPOL</a>.
  191. <li><code>\X</code> (extended Unicode), and <code>\C</code> (single
  192. character). But you can of course use all characters
  193. supported by your CL implementation.
  194. <li>POSIX character classes like <code>[[:alpha:]]</code>.
  195. Use <a href="#unicode">Unicode properties</a> instead.
  196. <li><code>\G</code> for Perl's <code>pos()</code> because we don't have it.
  197. </ul>
  198. Note, however, that <code>\t</code>, <code>\n</code>, <code>\r</code>,
  199. <code>\f</code>, <code>\a</code>, <code>\e</code>, <code>\033</code>
  200. (octal character codes), <code>\x1B</code> (hexadecimal character
  201. codes), <code>\c[</code> (control characters), <code>\w</code>,
  202. <code>\W</code>, <code>\s</code>, <code>\S</code>, <code>\d</code>,
  203. <code>\D</code>, <code>\b</code>, <code>\B</code>, <code>\A</code>,
  204. <code>\Z</code>, and <code>\z</code> <b>are</b> supported.
  205. <p>
  206. Since version 0.6.0, CL-PPCRE also supports Perl's <code>\Q</code> and <code>\E</code> - see <a
  207. href="#*allow-quoting*"><code>*ALLOW-QUOTING*</code></a> below. Make sure you also read <a href="#quote">the relevant section</a> in &quot;<a href="#bugs">Bugs and problems</a>.&quot;
  208. <p>
  209. Since version 1.3.0, CL-PPCRE offers support for <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm#regexp-new-capturing-2">AllegroCL's</a> <code>(?&lt;name&gt;"&lt;regex&gt;")</code> named registers and <code>\k&lt;name&gt;</code> back-reference syntax. Have a look at <a href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a> for details.
  210. <p>
  211. Since version 2.0.0, CL-PPCRE
  212. supports <a href="#*property-resolver*">named properties</a>
  213. (<code>\p</code> and <code>\P</code>), but only the long form with
  214. braces is supported, i.e. <code>\p{Letter}</code>
  215. and <code>\p{L}</code> will work while <code>\pL</code> won't.
  216. <p>
  217. The keyword arguments are just for your
  218. convenience. You can always use embedded modifiers like
  219. <code>&quot;(?i-s)&quot;</code> instead.</blockquote>
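<blockquote>
<p>
As a minimal sketch of the two ways to set modes (assuming CL-PPCRE is
already loaded and its symbols are referenced with the
<code>cl-ppcre:</code> package prefix), the forms below request
case-insensitive matching once with a keyword argument and once with an
embedded modifier. The last form illustrates the second return value; the
register name <code>year</code> is made up for this example.
<pre>
;; Mode set through a keyword argument.
(let ((scanner (cl-ppcre:create-scanner "hello" :case-insensitive-mode t)))
  (cl-ppcre:scan scanner "Say HELLO"))   ; first two values: 4 and 9

;; The same mode set through an embedded modifier.
(cl-ppcre:scan "(?i)hello" "Say HELLO")  ; first two values: 4 and 9

;; With named registers enabled, the second value lists the register names;
;; the unnamed second register is represented by NIL.
(let ((cl-ppcre:*allow-named-registers* t))
  (nth-value 1 (cl-ppcre:create-scanner "(?&lt;year&gt;\\d{4})-(\\d{2})")))
;; expected: ("year" NIL)
</pre>
</blockquote>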
  220. <p><br>[Method]
  221. <br><a class=none name="create-scanner"><b>create-scanner</b> <i>(function function)<tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner</i></a>
  222. <blockquote><br> In this case <code><i>function</i></code> should be a
  223. scanner returned by another invocation
  224. of <code>CREATE-SCANNER</code>. It will be returned as is. You can't
  225. use any of the keyword arguments because the scanner has already been
  226. created and is immutable.
  227. </blockquote>
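<blockquote>
<p>
A small sketch of what &quot;returned as is&quot; means in practice
(assuming the <code>cl-ppcre:</code> package prefix; the helper name
<code>count-matching-lines</code> is hypothetical). Creating the scanner
once and passing it around means the regex string does not have to be
parsed again for each call.
<pre>
;; Passing a scanner back to CREATE-SCANNER yields the identical object.
(let ((scanner (cl-ppcre:create-scanner "a+")))
  (eq scanner (cl-ppcre:create-scanner scanner)))    ; =&gt; T

;; Hypothetical helper: build the scanner once, reuse it for many strings.
(defun count-matching-lines (regex lines)
  (let ((scanner (cl-ppcre:create-scanner regex)))
    (count-if (lambda (line) (cl-ppcre:scan scanner line)) lines)))

(count-matching-lines "^\\d+$" '("123" "abc" "42"))  ; =&gt; 2
</pre>
</blockquote>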
  228. <p><br>[Method]
  229. <br><a class=none name="create-scanner2"><b>create-scanner</b> <i>(parse-tree t)<tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner, register-names</i></a>
  230. <blockquote><br>
  231. This is similar to <a
  232. href="#create-scanner"><code>CREATE-SCANNER</code></a> for regex strings above but
  233. accepts a <em>parse tree</em> as its first argument. A parse tree is an S-expression
  234. conforming to the following syntax:
  235. <ul>
  236. <li>Every string and character is a parse tree and is treated
  237. <em>literally</em> as a part of the regular expression,
  238. i.e. parentheses, brackets, asterisks and such aren't special.
  239. <li>The symbol <code>:VOID</code> is equivalent to the empty string.
  240. <li>The symbol <code>:EVERYTHING</code> is equivalent to Perl's dot,
  241. i.e. it matches everything (except maybe a newline character depending
  242. on the mode).
  243. <li>The symbols <code>:WORD-BOUNDARY</code> and
  244. <code>:NON-WORD-BOUNDARY</code> are equivalent to Perl's
  245. <code>&quot;\b&quot;</code> and <code>&quot;\B&quot;</code>.
  246. <li>The symbols <code>:DIGIT-CLASS</code>,
  247. <code>:NON-DIGIT-CLASS</code>, <code>:WORD-CHAR-CLASS</code>,
  248. <code>:NON-WORD-CHAR-CLASS</code>,
  249. <code>:WHITESPACE-CHAR-CLASS</code>, and
  250. <code>:NON-WHITESPACE-CHAR-CLASS</code> are equivalent to Perl's
  251. <em>special character classes</em> <code>&quot;\d&quot;</code>,
  252. <code>&quot;\D&quot;</code>, <code>&quot;\w&quot;</code>,
  253. <code>&quot;\W&quot;</code>, <code>&quot;\s&quot;</code>, and
  254. <code>&quot;\S&quot;</code> respectively.
  255. <li>The symbols <code>:START-ANCHOR</code>, <code>:END-ANCHOR</code>,
  256. <code>:MODELESS-START-ANCHOR</code>,
  257. <code>:MODELESS-END-ANCHOR</code>, and
  258. <code>:MODELESS-END-ANCHOR-NO-NEWLINE</code> are equivalent to Perl's
  259. <code>&quot;^&quot;</code>, <code>&quot;$&quot;</code>,
  260. <code>&quot;\A&quot;</code>, <code>&quot;\Z&quot;</code>, and
  261. <code>&quot;\z&quot;</code> respectively.
  262. <li>The symbols <code>:CASE-INSENSITIVE-P</code>,
  263. <code>:CASE-SENSITIVE-P</code>, <code>:MULTI-LINE-MODE-P</code>,
  264. <code>:NOT-MULTI-LINE-MODE-P</code>, <code>:SINGLE-LINE-MODE-P</code>,
  265. and <code>:NOT-SINGLE-LINE-MODE-P</code> are equivalent to Perl's
  266. <em>embedded modifiers</em> <code>&quot;(?i)&quot;</code>,
  267. <code>&quot;(?-i)&quot;</code>, <code>&quot;(?m)&quot;</code>,
  268. <code>&quot;(?-m)&quot;</code>, <code>&quot;(?s)&quot;</code>, and
  269. <code>&quot;(?-s)&quot;</code>. As usual, changes applied to modes are
  270. kept local to the innermost enclosing grouping or clustering
  271. construct.
  272. </li><li>All other symbols will signal an error of type <a
  273. href="#ppcre-syntax-error"><code>PPCRE-SYNTAX-ERROR</code></a>
  274. <em>unless</em> they are defined to be <a
  275. href="#parse-tree-synonym"><em>parse tree synonyms</em></a>.
  276. <li><code>(:FLAGS {&lt;modifier&gt;}*)</code> where
  277. <code>&lt;modifier&gt;</code> is one of the modifier symbols from
  278. above is used to group modifier symbols. The modifiers are applied
  279. from left to right. (This construct is obviously redundant. It is only
  280. there because it's used by the parser.)
  281. <li><code>(:SEQUENCE {&lt;<i>parse-tree</i>&gt;}*)</code> means a
  282. sequence of parse trees, i.e. the parse trees must match one after
  283. another. Example: <code>(:SEQUENCE #\f #\o #\o)</code> is equivalent
  284. to the parse tree <code>&quot;foo&quot;</code>.
  285. <li><code>(:GROUP {&lt;<i>parse-tree</i>&gt;}*)</code> is like
  286. <code>:SEQUENCE</code> but changes applied to modifier flags (see
  287. above) are kept local to the parse trees enclosed by this
  288. construct. Think of it as the S-expression variant of Perl's
  289. <code>&quot;(?:&lt;<i>pattern</i>&gt;)&quot;</code> construct.
  290. <li><code>(:ALTERNATION {&lt;<i>parse-tree</i>&gt;}*)</code> means an
  291. alternation of parse trees, i.e. one of the parse trees must
  292. match. Example: <code>(:ALTERNATION #\b #\a #\z)</code> is equivalent
  293. to the Perl regex string <code>&quot;b|a|z&quot;</code>.
  294. <li><code>(:BRANCH &lt;<i>test</i>&gt;
  295. &lt;<i>parse-tree</i>&gt;)</code> is for conditional regular
  296. expressions. <code>&lt;<i>test</i>&gt;</code> is either a number which
  297. stands for a register or a parse tree which is a look-ahead or
  298. look-behind assertion. See the entry for
  299. <code>(?(&lt;<i>condition</i>&gt;)&lt;<i>yes-pattern</i>&gt;|&lt;<i>no-pattern</i>&gt;)</code>
  300. in <a
  301. href="http://perldoc.perl.org/perlre.html#Extended-Patterns"><code>man
  302. perlre</code></a> for the semantics of this construct. If
  303. <code>&lt;<i>parse-tree</i>&gt;</code> is an alternation it
  304. <em>must</em> enclose exactly one or two parse trees where the second
  305. one (if present) will be treated as the &quot;no-pattern&quot; - in
  306. all other cases <code>&lt;<i>parse-tree</i>&gt;</code> will be treated
  307. as the &quot;yes-pattern&quot;.
  308. <li><code>(:POSITIVE-LOOKAHEAD|:NEGATIVE-LOOKAHEAD|:POSITIVE-LOOKBEHIND|:NEGATIVE-LOOKBEHIND
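  309. &lt;<i>parse-tree</i>&gt;)</code> are the look-ahead and look-behind assertions, equivalent to Perl's <code>&quot;(?=...)&quot;</code>, <code>&quot;(?!...)&quot;</code>, <code>&quot;(?&lt;=...)&quot;</code>, and <code>&quot;(?&lt;!...)&quot;</code> respectively.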
  310. <li><code>(:GREEDY-REPETITION|:NON-GREEDY-REPETITION
  311. &lt;<i>min</i>&gt; &lt;<i>max</i>&gt;
  312. &lt;<i>parse-tree</i>&gt;)</code> where
  313. <code>&lt;<i>min</i>&gt;</code> is a non-negative integer and
  314. <code>&lt;<i>max</i>&gt;</code> is either a non-negative integer not
  315. smaller than <code>&lt;<i>min</i>&gt;</code> or <code>NIL</code> will
  316. result in a regular expression which tries to match
  317. <code>&lt;<i>parse-tree</i>&gt;</code> at least
  318. <code>&lt;<i>min</i>&gt;</code> times and at most
  319. <code>&lt;<i>max</i>&gt;</code> times (or as often as possible if
  320. <code>&lt;<i>max</i>&gt;</code> is <code>NIL</code>). So, e.g.,
  321. <code>(:NON-GREEDY-REPETITION 0 1 &quot;ab&quot;)</code> is equivalent
  322. to the Perl regex string <code>&quot;(?:ab)??&quot;</code>.
  323. <li><code>(:STANDALONE &lt;<i>parse-tree</i>&gt;)</code> is an
  324. &quot;independent&quot; subexpression, i.e. <code>(:STANDALONE
  325. &quot;bar&quot;)</code> is equivalent to the Perl regex string
  326. <code>&quot;(?>bar)&quot;</code>.
  327. <li><code>(:REGISTER &lt;<i>parse-tree</i>&gt;)</code> is a capturing
  328. register group. As usual, registers are counted from left to right
  329. beginning with 1.
  330. <li><code>(:NAMED-REGISTER &lt;<i>name</i>&gt; &lt;<i>parse-tree</i>&gt;)</code> is a named capturing
  331. register group. It acts like <code>:REGISTER</code> but also assigns <code>&lt;<i>name</i>&gt;</code> to the register. This <code>&lt;<i>name</i>&gt;</code> can later be referred to via <code>:BACK-REFERENCE</code>. Names are case-sensitive and don't need to be unique. See <a href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a> for details.
  332. <li><code>(:BACK-REFERENCE &lt;<i>ref</i>&gt;)</code> is a
  333. back-reference to a register group. <code>&lt;<i>ref</i>&gt;</code> is
  334. a positive integer or a string denoting a register name. If there are
  335. several registers with the same name, the regex engine tries to
  336. successfully match at least one of them, starting with the most recently
  337. seen register continuing to the least recently seen one, until a match
  338. is found. See <a
  339. href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a>
  340. for more information.
  341. <li><code>(:PROPERTY|:INVERTED-PROPERTY &lt;<i>property</i>&gt;)</code> is
  342. a <a href="#*property-resolver*">named property</a> (or its inverse) with
  343. <code>&lt;<i>property</i>&gt;</code> being a function designator or a
  344. string which must be resolved
  345. by <a href="#*property-resolver*"><code>*PROPERTY-RESOLVER*</code></a>.
  346. <li><a class=none name="filterdef"><code>(:FILTER &lt;<i>function</i>&gt; <tt>&amp;optional</tt>
  347. &lt;<i>length</i>&gt;)</code></a> where
  348. <code>&lt;<i>function</i>&gt;</code> is a <a
  349. href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
  350. designator</a> and <code>&lt;<i>length</i>&gt;</code> is a
  351. non-negative integer or <code>NIL</code> is a user-defined <a
  352. href="#filters">filter</a>.
  353. <li><code>(:REGEX &lt;<i>string</i>&gt;)</code> where
  354. <code>&lt;<i>string</i>&gt;</code> is an
  355. embedded <a href="#create-scanner">regular expression in Perl
  356. syntax</a>.
  357. <li><code>(:CHAR-CLASS|:INVERTED-CHAR-CLASS
  358. {&lt;<i>item</i>&gt;}*)</code> where <code>&lt;<i>item</i>&gt;</code>
  359. is either a character, a <em>character range</em>, a named property
  360. (see above), or a symbol for a special character class (see above)
  361. will be translated into a (one character wide) character
  362. class. A <em>character range</em> looks like
  363. <code>(:RANGE &lt;<i>char1</i>&gt; &lt;<i>char2</i>&gt;)</code> where
  364. <code>&lt;<i>char1</i>&gt;</code> and
  365. <code>&lt;<i>char2</i>&gt;</code> are characters such that
  366. <code>(CHAR&lt;= &lt;<i>char1</i>&gt; &lt;<i>char2</i>&gt;)</code> is
  367. true. Example: <code>(:INVERTED-CHAR-CLASS #\a (:RANGE #\D #\G)
  368. :DIGIT-CLASS)</code> is equivalent to the Perl regex string
  369. <code>&quot;[^aD-G\d]&quot;</code>.
  370. </ul>
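<p>
To make the syntax above concrete, here is a small sketch (assuming the
<code>cl-ppcre:</code> package prefix) of a parse tree that should be
equivalent to the Perl regex string <code>&quot;^(\d+)-([a-z]+)$&quot;</code>.
It uses only constructs from the list above.
<pre>
(cl-ppcre:scan '(:sequence
                 :start-anchor
                 (:register (:greedy-repetition 1 nil :digit-class))
                 #\-
                 (:register (:greedy-repetition 1 nil
                                                (:char-class (:range #\a #\z))))
                 :end-anchor)
               "42-abc")
;; expected values: 0, 6, #(0 3), #(2 6)
</pre>
<p>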
  371. Because <code>CREATE-SCANNER</code> is defined as a generic function
  372. which dispatches on its first argument there's a certain ambiguity:
  373. Although strings are valid parse trees they will be interpreted as
  374. Perl regex strings when given to <code>CREATE-SCANNER</code>. To
  375. circumvent this you can always use the equivalent parse tree <code>(:GROUP
  376. &lt;<i>string</i>&gt;)</code> instead.
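<p>
A short sketch of the difference (assuming the <code>cl-ppcre:</code>
package prefix): the bare string is parsed as a Perl regex, while the same
string wrapped in <code>:GROUP</code> is matched literally.
<pre>
;; "a+" as a Perl regex: one or more "a", so only the first character matches.
(cl-ppcre:scan "a+" "a+")             ; first two values: 0 and 1

;; "a+" as a parse tree: the two literal characters "a" and "+".
(cl-ppcre:scan '(:group "a+") "a+")   ; first two values: 0 and 2
</pre>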
  377. <p>
  378. Note that <code>CREATE-SCANNER</code> doesn't always check
  379. for the well-formedness of its first argument, i.e. you are expected
  380. to provide <em>correct</em> parse trees.
  381. <p>
  382. The usage of the keyword argument <code>extended-mode</code> obviously
  383. doesn't make sense if <code>CREATE-SCANNER</code> is applied to parse
  384. trees and will signal an error.
  385. <p>
  386. If <code>destructive</code> is not <code>NIL</code> (the default is
  387. <code>NIL</code>), the function is allowed to destructively modify
  388. <code><i>parse-tree</i></code> while creating the scanner.
  389. <p>
  390. If you want to find out how parse trees are related to Perl regex
  391. strings, you should play around with
  392. <a href="#parse-string"><code>PARSE-STRING</code></a>:
  393. <pre>
  394. * (parse-string "(ab)*")
  395. (:GREEDY-REPETITION 0 NIL (:REGISTER "ab"))
  396. * (parse-string "(a(b))")
  397. (:REGISTER (:SEQUENCE #\a (:REGISTER #\b)))
  398. * (parse-string "(?:abc){3,5}")
  399. (:GREEDY-REPETITION 3 5 (:GROUP "abc"))
  400. <font color=orange>;; (:GREEDY-REPETITION 3 5 "abc") would also be OK</font>
  401. * (parse-string "a(?i)b(?-i)c")
  402. (:SEQUENCE #\a
  403. (:SEQUENCE (:FLAGS :CASE-INSENSITIVE-P)
  404. (:SEQUENCE #\b (:SEQUENCE (:FLAGS :CASE-SENSITIVE-P) #\c))))
  405. <font color=orange>;; same as (:SEQUENCE #\a :CASE-INSENSITIVE-P #\b :CASE-SENSITIVE-P #\c)</font>
  406. * (parse-string "(?=a)b")
  407. (:SEQUENCE (:POSITIVE-LOOKAHEAD #\a) #\b)
  408. </pre></blockquote>
  409. <p><br>
  410. <font color=green><b>For the rest of the dictionary, </b><code><i>regex</i></code><b> can
  411. always be a string (which is interpreted as a Perl regular
  412. expression), a parse tree, or a scanner created by
  413. <a href="#create-scanner"><font color=green><code>CREATE-SCANNER</code></font></a>. The
  414. </b><code><i>start</i></code><b> and </b><code><i>end</i></code><b>
  415. keyword parameters are always used as in <a
  416. href="#scan"><font color=green><code>SCAN</code></font></a>.</b></font>
  417. <p><br>[Generic Function]
  418. <br><a class=none name="scan"><b>scan</b> <i>regex target-string <tt>&amp;key</tt> start end</i> =&gt; <i>match-start, match-end, reg-starts, reg-ends</i></a>
  419. <blockquote><br>
  420. Searches the string <code><i>target-string</i></code>
  421. from <code><i>start</i></code> (which defaults to 0) to
  422. <code><i>end</i></code> (which defaults to the length of
  423. <code><i>target-string</i></code>) and tries to match
  424. <code><i>regex</i></code>. On success returns four values - the start
  425. of the match, the end of the match, and two arrays denoting the
  426. beginnings and ends of register matches. On failure returns
  427. <code>NIL</code>. <code><i>target-string</i></code> will be coerced
  428. to a simple string if it isn't one already. (There's another keyword
  429. parameter <code><i>real-start-pos</i></code>. This one should
  430. <em>never</em> be set from user code - it is only used internally.)
  431. <p>
  432. <code>SCAN</code> acts as if the part of
  433. <code><i>target-string</i></code> between <code><i>start</i></code>
  434. and <code><i>end</i></code> were a standalone string, i.e. look-aheads
  435. and look-behinds can't look beyond these boundaries.
  436. <pre>
  437. * (scan "(a)*b" "xaaabd")
  438. 1
  439. 5
  440. #(3)
  441. #(4)
  442. * (scan "(a)*b" "xaaabd" :start 1)
  443. 1
  444. 5
  445. #(3)
  446. #(4)
  447. * (scan "(a)*b" "xaaabd" :start 2)
  448. 2
  449. 5
  450. #(3)
  451. #(4)
  452. * (scan "(a)*b" "xaaabd" :end 4)
  453. NIL
  454. * (scan '(:greedy-repetition 0 nil #\b) "bbbc")
  455. 0
  456. 3
  457. #()
  458. #()
  459. * (scan '(:greedy-repetition 4 6 #\b) "bbbc")
  460. NIL
  461. * (let ((s (create-scanner "(([a-c])+)x")))
  462. (scan s "abcxy"))
  463. 0
  464. 4
  465. #(0 2)
  466. #(3 3)
  467. </pre></blockquote>
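<blockquote>
<p>
The positions returned by <code>SCAN</code> are typically fed to
<code>SUBSEQ</code>. The following is a sketch only (the function name
<code>first-range</code> is hypothetical and the <code>cl-ppcre:</code>
package prefix is assumed). The last two forms illustrate a look-ahead
that fails once <code>end</code> hides the character it needs.
<pre>
(defun first-range (target)
  "Return the two number strings of the first LOW-HIGH range in TARGET, or NIL."
  (multiple-value-bind (match-start match-end reg-starts reg-ends)
      (cl-ppcre:scan "(\\d+)-(\\d+)" target)
    (declare (ignore match-end))
    (when match-start
      (list (subseq target (aref reg-starts 0) (aref reg-ends 0))
            (subseq target (aref reg-starts 1) (aref reg-ends 1))))))

(first-range "range 10-25 ok")   ; =&gt; ("10" "25")
(first-range "no range here")    ; =&gt; NIL

;; The look-ahead cannot see the "b" beyond END, so there is no match.
(cl-ppcre:scan "a(?=b)" "ab")          ; first two values: 0 and 1
(cl-ppcre:scan "a(?=b)" "ab" :end 1)   ; =&gt; NIL
</pre>
</blockquote>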
  468. <p><br>[Function]
  469. <br><a class=none name="scan-to-strings"><b>scan-to-strings</b> <i>regex target-string <tt>&amp;key</tt> start end sharedp</i> =&gt; <i>match, regs</i></a>
  470. <blockquote><br>
  471. Like <a href="#scan"><code>SCAN</code></a> but returns substrings of
  472. <code><i>target-string</i></code> instead of positions, i.e. this
  473. function returns two values on success: the whole match as a string
  474. plus an array of substrings (or <code>NIL</code>s) corresponding to
  475. the matched registers. If <code><i>sharedp</i></code> is true, the substrings may share structure with
  476. <code><i>target-string</i></code>.
  477. <pre>
  478. * (scan-to-strings "[^b]*b" "aaabd")
  479. "aaab"
  480. #()
  481. * (scan-to-strings "([^b])*b" "aaabd")
  482. "aaab"
  483. #("a")
  484. * (scan-to-strings "(([^b])*)b" "aaabd")
  485. "aaab"
  486. #("aaa" "a")
  487. </pre></blockquote>
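<blockquote>
<p>
Two further sketches (assuming the <code>cl-ppcre:</code> package prefix):
a register that takes no part in the match shows up as <code>NIL</code> in
the array, and <code>NTH-VALUE</code> is a convenient way to get at the
register array alone.
<pre>
;; Only the second alternative matches, so the first register is NIL.
(cl-ppcre:scan-to-strings "(a)|(b)" "b")
;; expected values: "b" and #(NIL "b")

;; Take just the register substrings.
(nth-value 1 (cl-ppcre:scan-to-strings "(\\w+)@(\\w+)" "mail joe@example now"))
;; expected: #("joe" "example")
</pre>
</blockquote>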
  488. <p><br>[Macro]
  489. <br><a class=none name="register-groups-bind"><b>register-groups-bind</b> <i>var-list (regex target-string <tt>&amp;key</tt> start end sharedp) declaration* statement*</i> =&gt; <i>result*</i></a>
  490. <blockquote><br>
  491. Evaluates <code><i>statement*</i></code> with the variables in <code><i>var-list</i></code> bound to the
  492. corresponding register groups after <code><i>target-string</i></code> has been matched
  493. against <code><i>regex</i></code>, i.e. each variable is either
  494. bound to a string or to <code>NIL</code>.
  495. As a shortcut, the elements of <code><i>var-list</i></code> can also be lists of the form <code>(FN&nbsp;VAR)</code> where <code>VAR</code> is the variable symbol
  496. and <code>FN</code> is a <a
  497. href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
  498. designator</a> (which is evaluated) denoting a function which is to be applied to the string before the result is bound to <code>VAR</code>.
  499. To make this even more convenient the form <code>(FN&nbsp;VAR1&nbsp;...VARn)</code> can be used as an abbreviation for
  500. <code>(FN&nbsp;VAR1)&nbsp;...&nbsp;(FN&nbsp;VARn)</code>.
  501. <p>
  502. If there is no match, the <code><i>statement*</i></code> forms are <em>not</em>
  503. executed. For each element of
  504. <code><i>var-list</i></code> which is <code>NIL</code> there's no binding to the corresponding register
  505. group. The number of variables in <code><i>var-list</i></code> must not be greater than
  506. the number of register groups. If <code><i>sharedp</i></code> is true, the substrings may
  507. share structure with <code><i>target-string</i></code>.
  508. <pre>
  509. * (register-groups-bind (first second third fourth)
  510. (&quot;((a)|(b)|(c))+&quot; &quot;abababc&quot; :sharedp t)
  511. (list first second third fourth))
  512. (&quot;c&quot; &quot;a&quot; &quot;b&quot; &quot;c&quot;)
  513. * (register-groups-bind (nil second third fourth)
  514. <font color=orange>;; note that we don't bind the first and fifth register group</font>
  515. (&quot;((a)|(b)|(c))()+&quot; &quot;abababc&quot; :start 6)
  516. (list second third fourth))
  517. (NIL NIL &quot;c&quot;)
  518. * (register-groups-bind (first)
  519. (&quot;(a|b)+&quot; &quot;accc&quot; :start 1)
  520. (format t &quot;This will not be printed: ~A&quot; first))
  521. NIL
  522. * (register-groups-bind (fname lname (#'parse-integer date month year))
  523. (&quot;(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})&quot; &quot;Frank Zappa 21.12.1940&quot;)
  524. (list fname lname (encode-universal-time 0 0 0 date month year 0)))
  525. ("Frank" "Zappa" 1292889600)
  526. </pre>
  527. </blockquote>
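<blockquote>
<p>
Because the body is skipped when there is no match, the macro works well
as the whole body of a small parsing function. The sketch below assumes
the <code>cl-ppcre:</code> package prefix; the function name
<code>parse-key-value</code> and the <code>key = value</code> line format
are made up for illustration.
<pre>
(defun parse-key-value (line)
  "Return (KEY . VALUE) for a line like \"key = value\", or NIL if it doesn't match."
  (cl-ppcre:register-groups-bind (key value)
      ("^\\s*(\\w+)\\s*=\\s*(.*?)\\s*$" line)
    (cons key value)))

(parse-key-value "  name = Frank Zappa ")   ; =&gt; ("name" . "Frank Zappa")
(parse-key-value "no equals sign")          ; =&gt; NIL
</pre>
</blockquote>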
  528. <p><br>[Macro]
  529. <br><a class=none name="do-scans"><b>do-scans</b> <i>(match-start match-end reg-starts reg-ends regex target-string <tt>&amp;optional</tt> result-form <tt>&amp;key</tt> start end) declaration* statement*</i> =&gt; <i>result*</i></a>
  530. <blockquote><br>
  531. A macro which iterates over <code><i>target-string</i></code> and
  532. tries to match <code><i>regex</i></code> as often as possible,
  533. evaluating <code><i>statement*</i></code> with
  534. <code><i>match-start</i></code>, <code><i>match-end</i></code>,
  535. <code><i>reg-starts</i></code>, and <code><i>reg-ends</i></code> bound
  536. to the four return values of each match (see <a
  537. href="#scan"><code>SCAN</code></a>) in turn. After the last match,
  538. returns <code><i>result-form</i></code> if provided or
  539. <code>NIL</code> otherwise. An implicit block named <code>NIL</code>
  540. surrounds <code>DO-SCANS</code>; <code>RETURN</code> may be used to
  541. terminate the loop immediately. If <code><i>regex</i></code> matches
  542. an empty string, the scan is continued one position after this match.
  543. <p>
  544. This is the most general macro to iterate over all matches in a target
  545. string. See the source code of <a
  546. href="#do-matches"><code>DO-MATCHES</code></a>, <a
  547. href="#all-matches"><code>ALL-MATCHES</code></a>, <a
  548. href="#split"><code>SPLIT</code></a>, or <a
  549. href="#regex-replace-all"><code>REGEX-REPLACE-ALL</code></a> for examples of its
  550. usage.</blockquote>
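<blockquote>
A minimal sketch of using <code>DO-SCANS</code> directly: for each match it collects the match boundaries together with the boundaries of the second register. The regular expression, the target string, and the variable name <code>RESULT</code> are assumptions made for this illustration only.
<pre>
* (let (result)
    (do-scans (match-start match-end reg-starts reg-ends
               "(\\d+)-(\\d+)" "1-2 30-40"
               (nreverse result))
      <font color=orange>;; the register arrays are zero-based, so index 1 is the second register</font>
      (push (list match-start match-end
                  (aref reg-starts 1) (aref reg-ends 1))
            result)))
((0 3 2 3) (4 9 7 9))
</pre>
</blockquote>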
  551. <p><br>[Macro]
  552. <br><a class=none name="do-matches"><b>do-matches</b> <i>(match-start match-end regex target-string <tt>&amp;optional</tt> result-form <tt>&amp;key</tt> start end) declaration* statement*</i> =&gt; <i>result*</i></a>
  553. <blockquote><br>
  554. Like <a href="#do-scans"><code>DO-SCANS</code></a> but doesn't bind
  555. variables to the register arrays.
  556. <pre>
  557. * (defun foo (regex target-string &amp;key (start 0) (end (length target-string)))
  558. (let ((sum 0))
  559. (do-matches (s e regex target-string nil :start start :end end)
  560. (incf sum (- e s)))
  561. (format t "~,2F% of the string was inside of a match~%"
  562. <font color=orange>;; note: doesn't check for division by zero</font>
  563. (float (* 100 (/ sum (- end start)))))))
  564. FOO
  565. * (foo "a" "abcabcabc")
  566. 33.33% of the string was inside of a match
  567. NIL
  568. * (foo "aa|b" "aacabcbbc")
  569. 55.56% of the string was inside of a match
  570. NIL
  571. </pre></blockquote>
  572. <p><br>[Macro]
  573. <br><a class=none name="do-matches-as-strings"><b>do-matches-as-strings</b> <i>(match-var regex target-string <tt>&amp;optional</tt> result-form <tt>&amp;key</tt> start end sharedp) declaration* statement*</i> =&gt; <i>result*</i></a>
  574. <blockquote><br>
  575. Like <a href="#do-matches"><code>DO-MATCHES</code></a> but binds
  576. <code><i>match-var</i></code> to the substring of
  577. <code><i>target-string</i></code> corresponding to each match in turn. If <code><i>sharedp</i></code> is true, the substrings may share structure with
  578. <code><i>target-string</i></code>.
  579. <pre>
  580. * (defun crossfoot (target-string &amp;key (start 0) (end (length target-string)))
  581. (let ((sum 0))
  582. (do-matches-as-strings (m :digit-class
  583. target-string nil
  584. :start start :end end)
  585. (incf sum (parse-integer m)))
  586. (if (&lt; sum 10)
  587. sum
  588. (crossfoot (format nil "~A" sum)))))
  589. CROSSFOOT
  590. * (crossfoot "bar")
  591. 0
  592. * (crossfoot "a3x")
  593. 3
  594. * (crossfoot "12345")
  595. 6
  596. </pre>
  597. Of course, in real life you would do this with <a href="#do-matches"><code>DO-MATCHES</code></a> and use the <code><i>start</i></code> and <code><i>end</i></code> keyword parameters of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_parse_.htm"><code>PARSE-INTEGER</code></a>.</blockquote>
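<blockquote>
The remark above could be realized roughly as follows. This is only a sketch; the function name <code>CROSSFOOT*</code> is hypothetical and merely chosen to avoid a clash with the definition shown before.
<pre>
* (defun crossfoot* (target-string &amp;key (start 0) (end (length target-string)))
    (let ((sum 0))
      (do-matches (s e :digit-class target-string nil :start start :end end)
        <font color=orange>;; parse the digit in place instead of creating a substring</font>
        (incf sum (parse-integer target-string :start s :end e)))
      (if (&lt; sum 10)
        sum
        (crossfoot* (format nil "~A" sum)))))
CROSSFOOT*
* (crossfoot* "12345")
6
</pre>
</blockquote>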
  598. <p><br>[Macro]
  599. <br><a class=none name="do-register-groups"><b>do-register-groups</b> <i>var-list (regex target-string <tt>&amp;optional</tt> result-form <tt>&amp;key</tt> start end sharedp) declaration* statement*</i> =&gt; <i>result*</i></a>
  600. <blockquote><br>
  601. Iterates over <code><i>target-string</i></code> and tries to match <code><i>regex</i></code> as often as
  602. possible, evaluating <code><i>statement*</i></code> with the variables in <code><i>var-list</i></code> bound to the
  603. corresponding register groups for each match in turn, i.e. each
  604. variable is either bound to a string or to <code>NIL</code>. You can use the same shortcuts and abbreviations as in <a href="#register-groups-bind"><code>REGISTER-GROUPS-BIND</code></a>. The number of
  605. variables in <code><i>var-list</i></code> must not be greater than the number of register
  606. groups. For each element of
  607. <code><i>var-list</i></code> which is <code>NIL</code> there's no binding to the corresponding register
  608. group. After the last match, returns <code><i>result-form</i></code> if provided or <code>NIL</code>
  609. otherwise. An implicit block named <code>NIL</code> surrounds <code>DO-REGISTER-GROUPS</code>;
  610. <code>RETURN</code> may be used to terminate the loop immediately. If <code><i>regex</i></code> matches
  611. an empty string, the scan is continued one position after this
  612. match. If <code><i>sharedp</i></code> is true, the substrings may share structure with
  613. <code><i>target-string</i></code>.
  614. <pre>
  615. * (do-register-groups (first second third fourth)
  616. (&quot;((a)|(b)|(c))&quot; &quot;abababc&quot; nil :start 2 :sharedp t)
  617. (print (list first second third fourth)))
  618. (&quot;a&quot; &quot;a&quot; NIL NIL)
  619. (&quot;b&quot; NIL &quot;b&quot; NIL)
  620. (&quot;a&quot; &quot;a&quot; NIL NIL)
  621. (&quot;b&quot; NIL &quot;b&quot; NIL)
  622. (&quot;c&quot; NIL NIL &quot;c&quot;)
  623. NIL
  624. * (let (result)
  625. (do-register-groups ((#'parse-integer n) (#'intern sign) whitespace)
  626. (&quot;(\\d+)|(\\+|-|\\*|/)|(\\s+)&quot; &quot;12*15 - 42/3&quot;)
  627. (unless whitespace
  628. (push (or n sign) result)))
  629. (nreverse result))
  630. (12 * 15 - 42 / 3)
  631. </pre>
  632. </blockquote>
  633. <p><br>[Function]
  634. <br><a class=none name="all-matches"><b>all-matches</b> <i>regex target-string <tt>&amp;key</tt> start end</i> =&gt; <i>list</i></a>
  635. <blockquote><br>
  636. Returns a list containing the start and end positions of all matches
  637. of <code><i>regex</i></code> against
  638. <code><i>target-string</i></code>, i.e. if there are <code>N</code>
  639. matches the list contains <code>(* 2 N)</code> elements. If
  640. <code><i>regex</i></code> matches an empty string the scan is
  641. continued one position after this match.
  642. <pre>
  643. * (all-matches "a" "foo bar baz")
  644. (5 6 9 10)
  645. * (all-matches "\\w*" "foo bar baz")
  646. (0 3 3 3 4 7 7 7 8 11 11 11)
  647. </pre></blockquote>
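<blockquote>
Because the result is a flat list of alternating start and end positions, it is usually consumed two elements at a time. The following sketch shows one way to do that with <code>LOOP</code>; the target string and the variable names <code>TARGET</code>, <code>START</code>, and <code>END</code> are assumptions made for this example.
<pre>
* (let ((target "a1b22c333"))
    <font color=orange>;; step through the position list in pairs</font>
    (loop for (start end) on (all-matches "\\d+" target) by #'cddr
          collect (subseq target start end)))
("1" "22" "333")
</pre>
</blockquote>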
  648. <p><br>[Function]
  649. <br><a class=none name="all-matches-as-strings"><b>all-matches-as-strings</b> <i>regex target-string <tt>&amp;key</tt> start end sharedp</i> =&gt; <i>list</i></a>
  650. <blockquote><br>
  651. Like <a href="#all-matches"><code>ALL-MATCHES</code></a> but
  652. returns a list of substrings instead. If <code><i>sharedp</i></code> is true, the substrings may share structure with
  653. <code><i>target-string</i></code>.
  654. <pre>
  655. * (all-matches-as-strings "a" "foo bar baz")
  656. ("a" "a")
  657. * (all-matches-as-strings "\\w*" "foo bar baz")
  658. ("foo" "" "bar" "" "baz" "")
  659. </pre></blockquote>
  660. <h4><a name="splitting" class=none>Splitting and replacing</a></h4>
  661. <p><br>[Function]
  662. <br><a class=none name="split"><b>split</b> <i>regex target-string <tt>&amp;key</tt> start end limit with-registers-p omit-unmatched-p sharedp</i> =&gt; <i>list</i></a>
  663. <blockquote><br>
  664. Matches <code><i>regex</i></code> against
  665. <code><i>target-string</i></code> as often as possible and returns a
  666. list of the substrings between the matches. If
  667. <code><i>with-registers-p</i></code> is true, substrings corresponding
  668. to matched registers are inserted into the list as well. If
  669. <code><i>omit-unmatched-p</i></code> is true, unmatched registers will
  670. simply be left out, otherwise they will show up as
  671. <code>NIL</code>. <code><i>limit</i></code> limits the number of
  672. elements returned - registers aren't counted. If
  673. <code><i>limit</i></code> is <code>NIL</code> (or 0 which is
  674. equivalent), trailing empty strings are removed from the result list.
  675. If <code><i>regex</i></code> matches an empty string, the scan is
  676. continued one position after this match. If <code><i>sharedp</i></code> is true, the substrings may share structure with
  677. <code><i>target-string</i></code>.
  678. <p>
  679. This function also tries hard to be
  680. Perl-compatible - thus the somewhat peculiar behaviour.
  681. <pre>
  682. * (split "\\s+" "foo bar baz
  683. frob")
  684. ("foo" "bar" "baz" "frob")
  685. * (split "\\s*" "foo bar baz")
  686. ("f" "o" "o" "b" "a" "r" "b" "a" "z")
  687. * (split "(\\s+)" "foo bar baz")
  688. ("foo" "bar" "baz")
  689. * (split "(\\s+)" "foo bar baz" :with-registers-p t)
  690. ("foo" " " "bar" " " "baz")
  691. * (split "(\\s)(\\s*)" "foo bar   baz" :with-registers-p t)
  692. ("foo" " " "" "bar" " " "  " "baz")
  693. * (split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
  694. ("foo" "," NIL "bar" NIL ";" "baz")
  695. * (split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
  696. ("foo" "," "bar" ";" "baz")
  697. * (split ":" "a:b:c:d:e:f:g::")
  698. ("a" "b" "c" "d" "e" "f" "g")
  699. * (split ":" "a:b:c:d:e:f:g::" :limit 1)
  700. ("a:b:c:d:e:f:g::")
  701. * (split ":" "a:b:c:d:e:f:g::" :limit 2)
  702. ("a" "b:c:d:e:f:g::")
  703. * (split ":" "a:b:c:d:e:f:g::" :limit 3)
  704. ("a" "b" "c:d:e:f:g::")
  705. * (split ":" "a:b:c:d:e:f:g::" :limit 1000)
  706. ("a" "b" "c" "d" "e" "f" "g" "" "")
  707. </pre></blockquote>
  708. <p><br>[Function]
  709. <br><a class=none name="regex-replace"><b>regex-replace</b> <i>regex target-string replacement <tt>&amp;key</tt> start end preserve-case simple-calls element-type</i> =&gt; <i>string, matchp</i></a>
  710. <blockquote><br> Tries to match <code><i>target-string</i></code>
  711. between <code><i>start</i></code> and <code><i>end</i></code> against
  712. <code><i>regex</i></code> and replaces the first match with
  713. <code><i>replacement</i></code>. Two values are returned: the modified
  714. string, and <code>T</code> if <code><i>regex</i></code> matched or
  715. <code>NIL</code> otherwise.
  716. <p>
  717. <code><i>replacement</i></code> can be a string which may contain the
  718. special substrings <code>&quot;\&amp;&quot;</code> for the whole
  719. match, <code>&quot;\`&quot;</code> for the part of
  720. <code><i>target-string</i></code> before the match,
  721. <code>&quot;\'&quot;</code> for the part of
  722. <code><i>target-string</i></code> after the match,
  723. <code>&quot;\N&quot;</code> or <code>&quot;\{N}&quot;</code> for the
  724. <code>N</code>th register where <code>N</code> is a positive integer.
  725. <p>
  726. <code><i>replacement</i></code> can also be a <a
  727. href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
  728. designator</a> in which case the match will be replaced with the
  729. result of calling the function designated by
  730. <code><i>replacement</i></code> with the arguments
  731. <code><i>target-string</i></code>, <code><i>start</i></code>,
  732. <code><i>end</i></code>, <code><i>match-start</i></code>,
  733. <code><i>match-end</i></code>, <code><i>reg-starts</i></code>, and
  734. <code><i>reg-ends</i></code>. (<code><i>reg-starts</i></code> and
  735. <code><i>reg-ends</i></code> are arrays holding the start and end
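  736. positions of matched registers (or <code>NIL</code>); the other
  737. arguments are the target string, the bounds given by <code><i>start</i></code> and <code><i>end</i></code>, and the bounds of the current match.)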
  738. <p>
  739. If <code><i>simple-calls</i></code> is true, a function designated by
  740. <code><i>replacement</i></code> will instead be called with the
  741. arguments <code><i>match</i></code>, <code><i>register-1</i></code>,
  742. ..., <code><i>register-n</i></code> where <code><i>match</i></code> is
  743. the whole match as a string and <code><i>register-1</i></code> to
  744. <code><i>register-n</i></code> are the matched registers, also as
  745. strings (or <code>NIL</code>). Note that these strings share structure with
  746. <code><i>target-string</i></code> so you must not modify them.
  747. <p>
  748. Finally, <code><i>replacement</i></code> can be a list where each
  749. element is a string (which will be inserted verbatim), one of the
  750. symbols <code>:match</code>, <code>:before-match</code>, or
  751. <code>:after-match</code> (corresponding to
  752. <code>&quot;\&amp;&quot;</code>, <code>&quot;\`&quot;</code>, and
  753. <code>&quot;\'&quot;</code> above), an integer <code>N</code>
  754. (representing register <code>(1+&nbsp;N)</code>), or a function
  755. designator.
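<p>
A short sketch of the list form with integer elements, which are zero-based: <code>0</code> stands for the first register and <code>1</code> for the second. The regular expression and the target string are made up for this illustration.
<pre>
* (regex-replace "(\\w+) (\\w+)" "hello world" '(1 " " 0))
"world hello"
T
</pre>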
  756. <p>
  757. If <code><i>preserve-case</i></code> is true (default is
  758. <code>NIL</code>), the replacement will try to preserve the case (all
  759. upper case, all lower case, or capitalized) of the match. The result
  760. will always be a <a
  761. href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#fresh">fresh</a>
  762. string, even if <code><i>regex</i></code> doesn't match.
  763. <p>
  764. <code><i>element-type</i></code> specifies
  765. the <a
  766. href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_a.htm#array_element_type">array
  767. element type</a> of the string which is returned, the default
  768. is <a
  769. href="http://www.lispworks.com/documentation/lw50/LWRM/html/lwref-346.htm"><code>LW:SIMPLE-CHAR</code></a>
  770. for LispWorks
  771. and <a
  772. href="http://www.lispworks.com/documentation/HyperSpec/Body/t_ch.htm"><code>CHARACTER</code></a>
  773. for other Lisps.
  774. <pre>
  775. * (regex-replace "fo+" "foo bar" "frob")
  776. "frob bar"
  777. T
  778. * (regex-replace "fo+" "FOO bar" "frob")
  779. "FOO bar"
  780. NIL
  781. * (regex-replace "(?i)fo+" "FOO bar" "frob")
  782. "frob bar"
  783. T
  784. * (regex-replace "(?i)fo+" "FOO bar" "frob" :preserve-case t)
  785. "FROB bar"
  786. T
  787. * (regex-replace "(?i)fo+" "Foo bar" "frob" :preserve-case t)
  788. "Frob bar"
  789. T
  790. * (regex-replace "bar" "foo bar baz" "[frob (was '\\&amp;' between '\\`' and '\\'')]")
  791. "foo [frob (was 'bar' between 'foo ' and ' baz')] baz"
  792. T
  793. * (regex-replace "bar" "foo bar baz"
  794. '("[frob (was '" :match "' between '" :before-match "' and '" :after-match "')]"))
  795. "foo [frob (was 'bar' between 'foo ' and ' baz')] baz"
  796. T
  797. * (regex-replace "(be)(nev)(o)(lent)"
  798. "benevolent: adj. generous, kind"
  799. #'(lambda (match &amp;rest registers)
  800. (format nil "~A [~{~A~^.~}]" match registers))
  801. :simple-calls t)
  802. "benevolent [be.nev.o.lent]: adj. generous, kind"
  803. T
  804. </pre></blockquote>
  805. <p><br>[Function]
  806. <br><a class=none name="regex-replace-all"><b>regex-replace-all</b> <i>regex target-string replacement <tt>&amp;key</tt> start end preserve-case simple-calls element-type</i> =&gt; <i>string, matchp</i></a>
  807. <blockquote><br>
  808. Like <a href="#regex-replace"><code>REGEX-REPLACE</code></a> but replaces all matches.
  809. <pre>
  810. * (regex-replace-all "(?i)fo+" "foo Fooo FOOOO bar" "frob" :preserve-case t)
  811. "frob Frob FROB bar"
  812. T
  813. * (regex-replace-all "(?i)f(o+)" "foo Fooo FOOOO bar" "fr\\1b" :preserve-case t)
  814. "froob Frooob FROOOOB bar"
  815. T
  816. * (let ((qp-regex (create-scanner "[\\x80-\\xff]")))
  817. (defun encode-quoted-printable (string)
  818. "Converts 8-bit string to quoted-printable representation."
  819. <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
  820. (flet ((convert (target-string start end match-start match-end reg-starts reg-ends)
  821. (declare (ignore start end match-end reg-starts reg-ends))
  822. (format nil "=~2,'0x" (char-code (char target-string match-start)))))
  823. (regex-replace-all qp-regex string #'convert))))
  824. Converted ENCODE-QUOTED-PRINTABLE.
  825. ENCODE-QUOTED-PRINTABLE
  826. * (encode-quoted-printable "F&ecirc;te S&oslash;rensen na&iuml;ve H&uuml;hner Stra&szlig;e")
  827. "F=EAte S=F8rensen na=EFve H=FChner Stra=DFe"
  828. T
  829. * (let ((url-regex (create-scanner "[^a-zA-Z0-9_\\-.]")))
  830. (defun url-encode (string)
  831. "URL-encodes a string."
  832. <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
  833. (flet ((convert (target-string start end match-start match-end reg-starts reg-ends)
  834. (declare (ignore start end match-end reg-starts reg-ends))
  835. (format nil "%~2,'0x" (char-code (char target-string match-start)))))
  836. (regex-replace-all url-regex string #'convert))))
  837. Converted URL-ENCODE.
  838. URL-ENCODE
  839. * (url-encode "F&ecirc;te S&oslash;rensen na&iuml;ve H&uuml;hner Stra&szlig;e")
  840. "F%EAte%20S%F8rensen%20na%EFve%20H%FChner%20Stra%DFe"
  841. T
  842. * (defun how-many (target-string start end match-start match-end reg-starts reg-ends)
  843. (declare (ignore target-string start end match-start match-end))
  844. (format nil "~A" (- (svref reg-ends 0)
  845. (svref reg-starts 0))))
  846. HOW-MANY
  847. * (regex-replace-all "{(.+?)}"
  848. "foo{...}bar{.....}{..}baz{....}frob"
  849. (list "[" 'how-many " dots]"))
  850. "foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
  851. T
  852. * (let ((qp-regex (create-scanner "[\\x80-\\xff]")))
  853. (defun encode-quoted-printable (string)
  854. "Converts 8-bit string to quoted-printable representation.
  855. Version using SIMPLE-CALLS keyword argument."
  856. <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
  857. (flet ((convert (match)
  858. (format nil "=~2,'0x" (char-code (char match 0)))))
  859. (regex-replace-all qp-regex string #'convert
  860. :simple-calls t))))
  861. Converted ENCODE-QUOTED-PRINTABLE.
  862. ENCODE-QUOTED-PRINTABLE
  863. * (encode-quoted-printable "F&ecirc;te S&oslash;rensen na&iuml;ve H&uuml;hner Stra&szlig;e")
  864. "F=EAte S=F8rensen na=EFve H=FChner Stra=DFe"
  865. T
  866. * (defun how-many (match first-register)
  867. (declare (ignore match))
  868. (format nil "~A" (length first-register)))
  869. HOW-MANY
  870. * (regex-replace-all "{(.+?)}"
  871. "foo{...}bar{.....}{..}baz{....}frob"
  872. (list "[" 'how-many " dots]")
  873. :simple-calls t)
  874. "foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
  875. T
  876. </pre></blockquote>
  877. <h4><a name="modify" class=none>Modifying scanner behaviour</a></h4>
  878. <p><br>[Special variable]
  879. <br><a class=none name="*property-resolver*"><b>*property-resolver*</b></a>
  880. </p><blockquote><br> This is the designator for a function responsible
  881. for resolving named properties like <code>\p{Number}</code>. If
  882. CL-PPCRE encounters a <code>\p</code> or a <code>\P</code> it expects
  883. to see an opening curly brace immediately afterwards and will then
  884. read everything following that brace until it sees a closing curly
  885. brace. The resolver function will be called with this string and must
  886. return a corresponding unary test function which accepts a character
  887. as its argument and returns a true value if and only if the character
  888. has the named property. If the resolver returns <code>NIL</code>
  889. instead, this indicates that a property of that name is unknown.
  890. <pre>
  891. * (labels ((char-code-odd-p (char)
  892. (oddp (char-code char)))
  893. (char-code-even-p (char)
  894. (evenp (char-code char)))
  895. (resolver (name)
  896. (cond ((string= name "odd") #'char-code-odd-p)
  897. ((string= name "even") #'char-code-even-p)
  898. ((string= name "true") (constantly t))
  899. (t (error "Can't resolve ~S." name)))))
  900. (let ((*property-resolver* #'resolver))
  901. <font color=orange>;; quiz question - why do we need CREATE-SCANNER here?</font>
  902. (list (regex-replace-all (create-scanner "\\p{odd}") "abcd" "+")
  903. (regex-replace-all (create-scanner "\\p{even}") "abcd" "+")
  904. (regex-replace-all (create-scanner "\\p{true}") "abcd" "+"))))
  905. ("+b+d" "a+c+" "++++")
  906. </pre>
  907. If the value
  908. of <a href="#*property-resolver*"><code>*PROPERTY-RESOLVER*</code></a>
  909. is <code>NIL</code> (which is the default), <code>\p</code> and <code>\P</code> in regex
  910. strings will simply be treated like <code>p</code> or <code>P</code>
  911. as in CL-PPCRE&nbsp;1.4.1 and earlier. Note that this does not affect
  912. the validity of <code>(:PROPERTY&nbsp;&lt;<i>name</i>&gt;)</code>
  913. parts in <a href="#create-scanner2">S-expression syntax</a>.
  914. </blockquote>
  915. <p><br>[Accessor]
  916. <br><a class="none" name="parse-tree-synonym"><b>parse-tree-synonym</b> <i>symbol</i> =&gt; <i>parse-tree</i>
  917. <br><tt>(setf (</tt><b>parse-tree-synonym</b> <i>symbol</i><tt>)</tt> <i>new-parse-tree</i><tt>)</tt></a>
  918. </p><blockquote><br>
  919. Any symbol (unless it's a keyword with a special meaning in parse
  920. trees) can be made a "synonym", i.e. an abbreviation, for another parse
  921. tree by this accessor. <code>PARSE-TREE-SYNONYM</code> returns <code>NIL</code> if <code><i>symbol</i></code> isn't a synonym yet.
  922. <pre>
  923. * (parse-string "a*b+")
  924. (:SEQUENCE (:GREEDY-REPETITION 0 NIL #\a) (:GREEDY-REPETITION 1 NIL #\b))
  925. * (defun my-repetition (char min)
  926. `(:greedy-repetition ,min nil ,char))
  927. MY-REPETITION
  928. * (setf (parse-tree-synonym 'a*) (my-repetition #\a 0))
  929. (:GREEDY-REPETITION 0 NIL #\a)
  930. * (setf (parse-tree-synonym 'b+) (my-repetition #\b 1))
  931. (:GREEDY-REPETITION 1 NIL #\b)
  932. * (let ((scanner (create-scanner '(:sequence a* b+))))
  933. (dolist (string '("ab" "b" "aab" "a" "x"))
  934. (print (scan scanner string)))
  935. (values))
  936. 0
  937. 0
  938. 0
  939. NIL
  940. NIL
  941. * (parse-tree-synonym 'a*)
  942. (:GREEDY-REPETITION 0 NIL #\a)
  943. * (parse-tree-synonym 'a+)
  944. NIL
  945. </pre></blockquote>
  946. <p><br>[Macro]
  947. <br><a class="none" name="define-parse-tree-synonym"><b>define-parse-tree-synonym</b> <i>name parse-tree</i> =&gt; <i>parse-tree</i></a>
  948. </p><blockquote><br>
  949. This is a convenience macro for parse tree synonyms defined as
  950. <pre>
  951. (defmacro define-parse-tree-synonym (name parse-tree)
  952. `(eval-when (:compile-toplevel :load-toplevel :execute)
  953. (setf (parse-tree-synonym ',name) ',parse-tree)))
  954. </pre>
  955. so you can write code like this:
  956. <pre>
  957. (define-parse-tree-synonym a-z
  958. (:char-class (:range #\a #\z) (:range #\A #\Z)))
  959. (define-parse-tree-synon
