- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
- <html>
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- <title>CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp</title>
- <style type="text/css">
- pre { padding:5px; background-color:#e0e0e0 }
- h3, h4 { text-decoration: underline; }
- a { text-decoration: none; padding: 1px 2px 1px 2px; }
- a:visited { text-decoration: none; padding: 1px 2px 1px 2px; }
- a:hover { text-decoration: none; padding: 1px 1px 1px 1px; border: 1px solid #000000; }
- a:focus { text-decoration: none; padding: 1px 2px 1px 2px; border: none; }
- a.none { text-decoration: none; padding: 0; }
- a.none:visited { text-decoration: none; padding: 0; }
- a.none:hover { text-decoration: none; border: none; padding: 0; }
- a.none:focus { text-decoration: none; border: none; padding: 0; }
- a.noborder { text-decoration: none; padding: 0; }
- a.noborder:visited { text-decoration: none; padding: 0; }
- a.noborder:hover { text-decoration: none; border: none; padding: 0; }
- a.noborder:focus { text-decoration: none; border: none; padding: 0; }
- pre.none { padding:5px; background-color:#ffffff }
- </style>
- <meta name="description" content="Fast and portable perl-compatible regular expressions for Common Lisp.">
- </head>
- <body bgcolor=white>
- <h2>CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp</h2>
- <blockquote>
- <br> <br><h3>Abstract</h3>
- CL-PPCRE is a portable regular expression library for Common Lisp
- which has the following features:
- <ul>
- <li>It is <b>compatible with Perl</b>.
- <li>It is pretty <b>fast</b>.
- <li>It is <b>portable</b> between ANSI-compliant Common Lisp
- implementations.
- <li>It is <b>thread-safe</b>.
- <li>In addition to specifying regular expressions as strings, as in
- Perl, you can also use <a
- href="#create-scanner2"><b>S-expressions</b></a>.
- <li>It comes with a <a
- href="http://www.opensource.org/licenses/bsd-license.php"><b>BSD-style
- license</b></a> so you can basically do with it whatever you want.
- </ul>
- CL-PPCRE has been used successfully in various applications like <a
- href="http://nostoc.stanford.edu/Docs/">BioBike</a>,
- <a href="http://clutu.com/">clutu</a>,
- <a
- href="http://www.hpc.unm.edu/~download/LoGS/">LoGS</a>, <a href="http://cafespot.net/">CafeSpot</a>, <a href="http://www.eboy.com/">Eboy</a>, or <a
- href="http://weitz.de/regex-coach/">The Regex Coach</a>.
- <p>
- <font color=red>Download shortcut:</font> <a href="http://weitz.de/files/cl-ppcre.tar.gz">http://weitz.de/files/cl-ppcre.tar.gz</a>.
- </blockquote>
- <br> <br><h3><a class=none name="contents">Contents</a></h3>
- <ol>
- <li><a href="#install">Download and installation</a>
- <li><a href="#support">Support</a>
- <li><a href="#dict">The CL-PPCRE dictionary</a>
- <ol>
- <li><a href="#scanning">Scanning</a>
- <ol>
- <li><a href="#create-scanner"><code>create-scanner</code></a> (for Perl regex strings)
- <li><a href="#create-scanner2"><code>create-scanner</code></a> (for parse trees)
- <li><a href="#scan"><code>scan</code></a>
- <li><a href="#scan-to-strings"><code>scan-to-strings</code></a>
- <li><a href="#register-groups-bind"><code>register-groups-bind</code></a>
- <li><a href="#do-scans"><code>do-scans</code></a>
- <li><a href="#do-matches"><code>do-matches</code></a>
- <li><a href="#do-matches-as-strings"><code>do-matches-as-strings</code></a>
- <li><a href="#do-register-groups"><code>do-register-groups</code></a>
- <li><a href="#all-matches"><code>all-matches</code></a>
- <li><a href="#all-matches-as-strings"><code>all-matches-as-strings</code></a>
- </ol>
- <li><a href="#splitting">Splitting and replacing</a>
- <ol>
- <li><a href="#split"><code>split</code></a>
- <li><a href="#regex-replace"><code>regex-replace</code></a>
- <li><a href="#regex-replace-all"><code>regex-replace-all</code></a>
- </ol>
- <li><a href="#modify">Modifying scanner behaviour</a>
- <ol>
- <li><a href="#*property-resolver*"><code>*property-resolver*</code></a>
- <li><a href="#parse-tree-synonym"><code>parse-tree-synonym</code></a>
- <li><a href="#define-parse-tree-synonym"><code>define-parse-tree-synonym</code></a>
- <li><a href="#*regex-char-code-limit*"><code>*regex-char-code-limit*</code></a>
- <li><a href="#*use-bmh-matchers*"><code>*use-bmh-matchers*</code></a>
- <li><a href="#*optimize-char-classes*"><code>*optimize-char-classes*</code></a>
- <li><a href="#*allow-quoting*"><code>*allow-quoting*</code></a>
- <li><a href="#*allow-named-registers*"><code>*allow-named-registers*</code></a>
- </ol>
- <li><a href="#misc">Miscellaneous</a>
- <ol>
- <li><a href="#parse-string"><code>parse-string</code></a>
- <li><a href="#create-optimized-test-function"><code>create-optimized-test-function</code></a>
- <li><a href="#quote-meta-chars"><code>quote-meta-chars</code></a>
- <li><a href="#regex-apropos"><code>regex-apropos</code></a>
- <li><a href="#regex-apropos-list"><code>regex-apropos-list</code></a>
- </ol>
- <li><a href="#conditions">Conditions</a>
- <ol>
- <li><a href="#ppcre-error"><code>ppcre-error</code></a>
- <li><a href="#ppcre-invocation-error"><code>ppcre-invocation-error</code></a>
- <li><a href="#ppcre-syntax-error"><code>ppcre-syntax-error</code></a>
- <li><a href="#ppcre-syntax-error-string"><code>ppcre-syntax-error-string</code></a>
- <li><a href="#ppcre-syntax-error-pos"><code>ppcre-syntax-error-pos</code></a>
- </ol>
- </ol>
- <li><a href="#unicode">Unicode properties</a>
- <ol>
- <li><a href="#unicode-property-resolver"><code>unicode-property-resolver</code></a>
- </ol>
- <li><a href="#filters">Filters</a>
- <li><a href="#perl">Compatibility with Perl</a>
- <ol>
- <li><a href="#empty">Empty strings instead of <code>undef</code> in <code>$1</code>, <code>$2</code>, etc.</a>
- <li><a href="#scope">Strange scoping of embedded modifiers</a>
- <li><a href="#inconsistent">Inconsistent capturing of <code>$1</code>, <code>$2</code>, etc.</a>
- <li><a href="#lookaround">Captured groups not available outside of look-aheads and look-behinds</a>
- <li><a href="#order">Alternations don't always work from left to right</a>
- <li><a href="#uprops">Different names for Unicode properties</a>
- <li><a href="#mac"><code>"\r"</code> doesn't work with MCL</a>
- <li><a href="#alpha">What about <code>"\w"</code>?</a>
- </ol>
- <li><a href="#bugs">Bugs and problems</a>
- <ol>
- <li><a href="#quote"><code>"\Q"</code> doesn't work, or does it?</a>
- <li><a href="#backslash">Backslashes may confuse you...</a>
- </ol>
- <li><a href="#allegro">AllegroCL compatibility mode</a>
- <li><a href="#blabla">Hints, comments, performance considerations</a>
- <li><a href="#ack">Acknowledgements</a>
- </ol>
- <br> <br><h3><a name="install" class=none>Download and installation</a></h3>
- CL-PPCRE together with this documentation can be downloaded from <a
- href="http://weitz.de/files/cl-ppcre.tar.gz">http://weitz.de/files/cl-ppcre.tar.gz</a>. The
- current version is 2.0.11.
- <p>
- CL-PPCRE comes with a system definition
- for <a href="http://www.cliki.net/asdf">ASDF</a> and you compile and
- load it in the usual way. There are no dependencies (except that the
- <a href="#test">test suite</a>, which is not needed for normal operation, depends
- on <a href="http://weitz.de/flexi-streams/">FLEXI-STREAMS</a>).
- <p>
- The preferred way to install CL-PPCRE is
- through <a href="http://www.quicklisp.org/" target="_new">Quicklisp</a>:
- <pre>(ql:quickload :cl-ppcre)</pre>
- </p>
- <p>
- <a class=none name="test">You</a> can run a test suite which tests most aspects of the library with
- <pre>
- (asdf:oos 'asdf:test-op :cl-ppcre)
- </pre>
- <p>
- The current development version of CL-PPCRE can be found
- at <a href="https://github.com/edicl/cl-ppcre">https://github.com/edicl/cl-ppcre</a>. If you want to send patches, please fork the GitHub repository and send pull requests.
- <p>
- <br> <br><h3><a name="support" class=none>Support</a></h3>
- The development version of CL-PPCRE can be
- found <a href="https://github.com/edicl/cl-ppcre" target="_new">on
- GitHub</a>. Please use the GitHub issue tracker to submit bug
- reports. Patches are welcome; please
- use <a href="https://github.com/edicl/cl-ppcre/pulls">GitHub pull
- requests</a>. If you want to make a change,
- please <a href="http://weitz.de/patches.html" target="_new">read this
- first</a>.
- <br> <br><h3><a class=none name="dict">The CL-PPCRE dictionary</a></h3>
- <h4><a name="scanning" class=none>Scanning</a></h4>
- <p><br>[Method]
- <br><a class=none name="create-scanner"><b>create-scanner</b> <i>(string string)<tt>&key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> => <i>scanner, register-names</i></a>
- <blockquote><br> Accepts a string which is a regular expression in
- Perl syntax and returns a closure which will scan strings for this
- regular expression. The second value is only returned if <a href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a> is <i>true</i>. It is a list of strings mapping registers to their names: the first element stands for the first register, the second element for the second register, etc. You have to store this value if you want to map a register number to its name later, as <i>scanner</i> doesn't capture any information about register names. If a register isn't named, it has <code>NIL</code> as its name.
- <p>
- The mode keyword arguments are equivalent to the
- <code>"imsx"</code> modifiers in Perl. The
- <code>destructive</code> keyword will be ignored.
- <p>
- The function accepts most of the regex syntax of Perl 5.8 as described
- in <a href="http://perldoc.perl.org/5.8.8/perlre.html"><code>man
- perlre</code></a> including extended features like non-greedy
- repetitions, positive and negative look-ahead and look-behind
- assertions, "standalone" subexpressions, and conditional
- subpatterns. The following Perl features are (currently) <b>not</b>
- supported:
- <ul>
- <li><code>(?{ code })</code> and <code>(??{ code })</code> because
- they obviously don't make sense in Lisp.
- <li><code>\N{name}</code> (named characters), <code>\x{263a}</code>
- (wide hex characters), <code>\l</code>, <code>\u</code>,
- <code>\L</code>, and <code>\U</code>
- because they're actually not part of Perl's <em>regex</em> syntax - but see <a href="http://weitz.de/cl-interpol/">CL-INTERPOL</a>.
- <li><code>\X</code> (extended Unicode), and <code>\C</code> (single
- character). But you can of course use all characters
- supported by your CL implementation.
- <li>POSIX character classes like <code>[[:alpha:]]</code>.
- Use <a href="#unicode">Unicode properties</a> instead.
- <li><code>\G</code> for Perl's <code>pos()</code> because we don't have it.
- </ul>
- Note, however, that <code>\t</code>, <code>\n</code>, <code>\r</code>,
- <code>\f</code>, <code>\a</code>, <code>\e</code>, <code>\033</code>
- (octal character codes), <code>\x1B</code> (hexadecimal character
- codes), <code>\c[</code> (control characters), <code>\w</code>,
- <code>\W</code>, <code>\s</code>, <code>\S</code>, <code>\d</code>,
- <code>\D</code>, <code>\b</code>, <code>\B</code>, <code>\A</code>,
- <code>\Z</code>, and <code>\z</code> <b>are</b> supported.
- <p>
- Since version 0.6.0, CL-PPCRE also supports Perl's <code>\Q</code> and <code>\E</code> - see <a
- href="#*allow-quoting*"><code>*ALLOW-QUOTING*</code></a> below. Make sure you also read <a href="#quote">the relevant section</a> in "<a href="#bugs">Bugs and problems</a>."
- <p>
- Since version 1.3.0, CL-PPCRE offers support for <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm#regexp-new-capturing-2">AllegroCL's</a> <code>(?&lt;name&gt;"&lt;regex&gt;")</code> named registers and <code>\k&lt;name&gt;</code> back-reference syntax; have a look at <a href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a> for details.
- <p>
- Since version 2.0.0, CL-PPCRE
- supports <a href="#*property-resolver*">named properties</a>
- (<code>\p</code> and <code>\P</code>), but only the long form with
- braces is supported, i.e. <code>\p{Letter}</code>
- and <code>\p{L}</code> will work while <code>\pL</code> won't.
- <p>
- The keyword arguments are just for your
- convenience. You can always use embedded modifiers like
- <code>"(?i-s)"</code> instead.</blockquote>
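<p>
As a minimal sketch of the points above (the variable names are made up for illustration, and Quicklisp is assumed to be available), the following creates the same case-insensitive scanner once with a keyword argument and once with an embedded modifier, and then captures the second return value while <code>*ALLOW-NAMED-REGISTERS*</code> is bound to a true value:

```lisp
(ql:quickload :cl-ppcre)

;; Two ways to ask for case-insensitive matching: a keyword argument
;; or the equivalent embedded modifier.
(defparameter *by-keyword*
  (cl-ppcre:create-scanner "abc" :case-insensitive-mode t))
(defparameter *by-modifier*
  (cl-ppcre:create-scanner "(?i)abc"))

;; The list of register names is the second return value and is only
;; produced while *ALLOW-NAMED-REGISTERS* is true.  The first register
;; is named "year", the second one is unnamed.
(defparameter *register-names*
  (let ((cl-ppcre:*allow-named-registers* t))
    (nth-value 1 (cl-ppcre:create-scanner "(?<year>\\d{4})-(\\d{2})"))))
```

Both scanners should behave identically when passed to <code>SCAN</code>; the list in <code>*REGISTER-NAMES*</code> has to be kept by the caller because the scanner itself does not expose it.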
- <p><br>[Method]
- <br><a class=none name="create-scanner"><b>create-scanner</b> <i>(function function)<tt>&key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> => <i>scanner</i></a>
- <blockquote><br> In this case <code><i>function</i></code> should be a
- scanner returned by another invocation
- of <code>CREATE-SCANNER</code>. It will be returned as is. You can't
- use any of the keyword arguments because the scanner has already been
- created and is immutable.
- </blockquote>
- <p><br>[Method]
- <br><a class=none name="create-scanner2"><b>create-scanner</b> <i>(parse-tree t)<tt>&key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> => <i>scanner, register-names</i></a>
- <blockquote><br>
- This is similar to <a
- href="#create-scanner"><code>CREATE-SCANNER</code></a> for regex strings above but
- accepts a <em>parse tree</em> as its first argument. A parse tree is an S-expression
- conforming to the following syntax:
- <ul>
- <li>Every string and character is a parse tree and is treated
- <em>literally</em> as a part of the regular expression,
- i.e. parentheses, brackets, asterisks and such aren't special.
- <li>The symbol <code>:VOID</code> is equivalent to the empty string.
- <li>The symbol <code>:EVERYTHING</code> is equivalent to Perl's dot,
- i.e. it matches everything (except maybe a newline character depending
- on the mode).
- <li>The symbols <code>:WORD-BOUNDARY</code> and
- <code>:NON-WORD-BOUNDARY</code> are equivalent to Perl's
- <code>"\b"</code> and <code>"\B"</code>.
- <li>The symbols <code>:DIGIT-CLASS</code>,
- <code>:NON-DIGIT-CLASS</code>, <code>:WORD-CHAR-CLASS</code>,
- <code>:NON-WORD-CHAR-CLASS</code>,
- <code>:WHITESPACE-CHAR-CLASS</code>, and
- <code>:NON-WHITESPACE-CHAR-CLASS</code> are equivalent to Perl's
- <em>special character classes</em> <code>"\d"</code>,
- <code>"\D"</code>, <code>"\w"</code>,
- <code>"\W"</code>, <code>"\s"</code>, and
- <code>"\S"</code> respectively.
- <li>The symbols <code>:START-ANCHOR</code>, <code>:END-ANCHOR</code>,
- <code>:MODELESS-START-ANCHOR</code>,
- <code>:MODELESS-END-ANCHOR</code>, and
- <code>:MODELESS-END-ANCHOR-NO-NEWLINE</code> are equivalent to Perl's
- <code>"^"</code>, <code>"$"</code>,
- <code>"\A"</code>, <code>"\Z"</code>, and
- <code>"\z"</code> respectively.
- <li>The symbols <code>:CASE-INSENSITIVE-P</code>,
- <code>:CASE-SENSITIVE-P</code>, <code>:MULTI-LINE-MODE-P</code>,
- <code>:NOT-MULTI-LINE-MODE-P</code>, <code>:SINGLE-LINE-MODE-P</code>,
- and <code>:NOT-SINGLE-LINE-MODE-P</code> are equivalent to Perl's
- <em>embedded modifiers</em> <code>"(?i)"</code>,
- <code>"(?-i)"</code>, <code>"(?m)"</code>,
- <code>"(?-m)"</code>, <code>"(?s)"</code>, and
- <code>"(?-s)"</code>. As usual, changes applied to modes are
- kept local to the innermost enclosing grouping or clustering
- construct.
- </li><li>All other symbols will signal an error of type <a
- href="#ppcre-syntax-error"><code>PPCRE-SYNTAX-ERROR</code></a>
- <em>unless</em> they are defined to be <a
- href="#parse-tree-synonym"><em>parse tree synonyms</em></a>.
- <li><code>(:FLAGS {&lt;modifier&gt;}*)</code> where
- <code>&lt;modifier&gt;</code> is one of the modifier symbols from
- above is used to group modifier symbols. The modifiers are applied
- from left to right. (This construct is obviously redundant. It is only
- there because it's used by the parser.)
- <li><code>(:SEQUENCE {<<i>parse-tree</i>>}*)</code> means a
- sequence of parse trees, i.e. the parse trees must match one after
- another. Example: <code>(:SEQUENCE #\f #\o #\o)</code> is equivalent
- to the parse tree <code>"foo"</code>.
- <li><code>(:GROUP {<<i>parse-tree</i>>}*)</code> is like
- <code>:SEQUENCE</code> but changes applied to modifier flags (see
- above) are kept local to the parse trees enclosed by this
- construct. Think of it as the S-expression variant of Perl's
- <code>"(?:<<i>pattern</i>>)"</code> construct.
- <li><code>(:ALTERNATION {<<i>parse-tree</i>>}*)</code> means an
- alternation of parse trees, i.e. one of the parse trees must
- match. Example: <code>(:ALTERNATION #\b #\a #\z)</code> is equivalent
- to the Perl regex string <code>"b|a|z"</code>.
- <li><code>(:BRANCH <<i>test</i>>
- <<i>parse-tree</i>>)</code> is for conditional regular
- expressions. <code><<i>test</i>></code> is either a number which
- stands for a register or a parse tree which is a look-ahead or
- look-behind assertion. See the entry for
- <code>(?(<<i>condition</i>>)<<i>yes-pattern</i>>|<<i>no-pattern</i>>)</code>
- in <a
- href="http://perldoc.perl.org/perlre.html#Extended-Patterns"><code>man
- perlre</code></a> for the semantics of this construct. If
- <code><<i>parse-tree</i>></code> is an alternation, it
- <em>must</em> enclose exactly one or two parse trees where the second
- one (if present) will be treated as the "no-pattern" - in
- all other cases <code><<i>parse-tree</i>></code> will be treated
- as the "yes-pattern".
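- <li><code>(:POSITIVE-LOOKAHEAD|:NEGATIVE-LOOKAHEAD|:POSITIVE-LOOKBEHIND|:NEGATIVE-LOOKBEHIND
- <<i>parse-tree</i>>)</code> are the look-around assertions, i.e. the S-expression variants of Perl's <code>"(?=...)"</code>, <code>"(?!...)"</code>, <code>"(?&lt;=...)"</code>, and <code>"(?&lt;!...)"</code> respectively.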
- <li><code>(:GREEDY-REPETITION|:NON-GREEDY-REPETITION
- <<i>min</i>> <<i>max</i>>
- <<i>parse-tree</i>>)</code> where
- <code><<i>min</i>></code> is a non-negative integer and
- <code><<i>max</i>></code> is either a non-negative integer not
- smaller than <code><<i>min</i>></code> or <code>NIL</code> will
- result in a regular expression which tries to match
- <code><<i>parse-tree</i>></code> at least
- <code><<i>min</i>></code> times and at most
- <code><<i>max</i>></code> times (or as often as possible if
- <code><<i>max</i>></code> is <code>NIL</code>). So, e.g.,
- <code>(:NON-GREEDY-REPETITION 0 1 "ab")</code> is equivalent
- to the Perl regex string <code>"(?:ab)??"</code>.
- <li><code>(:STANDALONE <<i>parse-tree</i>>)</code> is an
- "independent" subexpression, i.e. <code>(:STANDALONE
- "bar")</code> is equivalent to the Perl regex string
- <code>"(?>bar)"</code>.
- <li><code>(:REGISTER <<i>parse-tree</i>>)</code> is a capturing
- register group. As usual, registers are counted from left to right
- beginning with 1.
- <li><code>(:NAMED-REGISTER <<i>name</i>> <<i>parse-tree</i>>)</code> is a named capturing
- register group. Acts as <code>:REGISTER</code>, but assigns <code><<i>name</i>></code> to a register too. This <code><<i>name</i>></code> can be later referred to via <code>:BACK-REFERENCE</code>. Names are case-sensitive and don't need to be unique. See <a href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a> for details.
- <li><code>(:BACK-REFERENCE <<i>ref</i>>)</code> is a
- back-reference to a register group. <code><<i>ref</i>></code> is
- a positive integer or a string denoting a register name. If there are
- several registers with the same name, the regex engine tries to
- successfully match at least one of them, starting with the most recently
- seen register continuing to the least recently seen one, until a match
- is found. See <a
- href="#*allow-named-registers*"><code>*ALLOW-NAMED-REGISTERS*</code></a>
- for more information.
- <li><code>(:PROPERTY|:INVERTED-PROPERTY <<i>property</i>>)</code> is
- a <a href="#*property-resolver*">named property</a> (or its inverse) with
- <code><<i>property</i>></code> being a function designator or a
- string which must be resolved
- by <a href="#*property-resolver*"><code>*PROPERTY-RESOLVER*</code></a>.
- <li><a class=none name="filterdef"><code>(:FILTER <<i>function</i>> <tt>&optional</tt>
- <<i>length</i>>)</code></a> where
- <code><<i>function</i>></code> is a <a
- href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
- designator</a> and <code><<i>length</i>></code> is a
- non-negative integer or <code>NIL</code> is a user-defined <a
- href="#filters">filter</a>.
- <li><code>(:REGEX <<i>string</i>>)</code> where
- <code><<i>string</i>></code> is an
- embedded <a href="#create-scanner">regular expression in Perl
- syntax</a>.
- <li><code>(:CHAR-CLASS|:INVERTED-CHAR-CLASS
- {<<i>item</i>>}*)</code> where <code><<i>item</i>></code>
- is either a character, a <em>character range</em>, a named property
- (see above), or a symbol for a special character class (see above)
- will be translated into a (one character wide) character
- class. A <em>character range</em> looks like
- <code>(:RANGE <<i>char1</i>> <<i>char2</i>>)</code> where
- <code><<i>char1</i>></code> and
- <code><<i>char2</i>></code> are characters such that
- <code>(CHAR<= <<i>char1</i>> <<i>char2</i>>)</code> is
- true. Example: <code>(:INVERTED-CHAR-CLASS #\a (:RANGE #\D #\G)
- :DIGIT-CLASS)</code> is equivalent to the Perl regex string
- <code>"[^aD-G\d]"</code>.
- </ul>
- Because <code>CREATE-SCANNER</code> is defined as a generic function
- which dispatches on its first argument there's a certain ambiguity:
- Although strings are valid parse trees they will be interpreted as
- Perl regex strings when given to <code>CREATE-SCANNER</code>. To
- circumvent this you can always use the equivalent parse tree <code>(:GROUP
- <<i>string</i>>)</code> instead.
- <p>
- Note that <code>CREATE-SCANNER</code> doesn't always check
- for the well-formedness of its first argument, i.e. you are expected
- to provide <em>correct</em> parse trees.
- <p>
- The usage of the keyword argument <code>extended-mode</code> obviously
- doesn't make sense if <code>CREATE-SCANNER</code> is applied to parse
- trees and will signal an error.
- <p>
- If <code>destructive</code> is not <code>NIL</code> (the default is
- <code>NIL</code>), the function is allowed to destructively modify
- <code><i>parse-tree</i></code> while creating the scanner.
- <p>
- If you want to find out how parse trees are related to Perl regex
- strings, you should play around with
- <a href="#parse-string"><code>PARSE-STRING</code></a>:
- <pre>
- * (parse-string "(ab)*")
- (:GREEDY-REPETITION 0 NIL (:REGISTER "ab"))
- * (parse-string "(a(b))")
- (:REGISTER (:SEQUENCE #\a (:REGISTER #\b)))
- * (parse-string "(?:abc){3,5}")
- (:GREEDY-REPETITION 3 5 (:GROUP "abc"))
- <font color=orange>;; (:GREEDY-REPETITION 3 5 "abc") would also be OK</font>
- * (parse-string "a(?i)b(?-i)c")
- (:SEQUENCE #\a
-  (:SEQUENCE (:FLAGS :CASE-INSENSITIVE-P)
-   (:SEQUENCE #\b (:SEQUENCE (:FLAGS :CASE-SENSITIVE-P) #\c))))
- <font color=orange>;; same as (:SEQUENCE #\a :CASE-INSENSITIVE-P #\b :CASE-SENSITIVE-P #\c)</font>
- * (parse-string "(?=a)b")
- (:SEQUENCE (:POSITIVE-LOOKAHEAD #\a) #\b)
- </pre></blockquote>
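<p>
The ambiguity between regex strings and literal strings described above can be sketched as follows. <code>MATCHES-P</code> is a hypothetical helper defined only for this example, and Quicklisp is assumed to be available:

```lisp
(ql:quickload :cl-ppcre)

;; Hypothetical helper: true if REGEX matches somewhere in TARGET.
(defun matches-p (regex target)
  (not (null (cl-ppcre:scan regex target))))

;; A bare string is read as a Perl regex, so the dot is a wildcard.
(matches-p "a.c" "abc")

;; Wrapped in (:GROUP ...), the same string is a literal parse tree
;; and the dot only matches a dot.
(matches-p '(:group "a.c") "abc")
(matches-p '(:group "a.c") "a.c")

;; A parse tree built from the symbols listed above; it corresponds
;; to the Perl regex string "^\\d+$".
(matches-p '(:sequence :start-anchor
             (:greedy-repetition 1 nil :digit-class)
             :end-anchor)
           "2024")
```

Writing the parse tree by hand avoids any quoting of metacharacters, which can be convenient when parts of the pattern come from user input.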
- <p><br>
- <font color=green><b>For the rest of the dictionary, </b><code><i>regex</i></code><b> can
- always be a string (which is interpreted as a Perl regular
- expression), a parse tree, or a scanner created by
- <a href="#create-scanner"><font color=green><code>CREATE-SCANNER</code></font></a>. The
- </b><code><i>start</i></code><b> and </b><code><i>end</i></code><b>
- keyword parameters are always used as in <a
- href="#scan"><font color=green><code>SCAN</code></font></a>.</b></font>
- <p><br>[Generic Function]
- <br><a class=none name="scan"><b>scan</b> <i>regex target-string <tt>&key</tt> start end</i> => <i>match-start, match-end, reg-starts, reg-ends</i></a>
- <blockquote><br>
- Searches the string <code><i>target-string</i></code>
- from <code><i>start</i></code> (which defaults to 0) to
- <code><i>end</i></code> (which defaults to the length of
- <code><i>target-string</i></code>) and tries to match
- <code><i>regex</i></code>. On success returns four values - the start
- of the match, the end of the match, and two arrays denoting the
- beginnings and ends of register matches. On failure returns
- <code>NIL</code>. <code><i>target-string</i></code> will be coerced
- to a simple string if it isn't one already. (There's another keyword
- parameter <code><i>real-start-pos</i></code>. This one should
- <em>never</em> be set from user code - it is only used internally.)
- <p>
- <code>SCAN</code> acts as if the part of
- <code><i>target-string</i></code> between <code><i>start</i></code>
- and <code><i>end</i></code> were a standalone string, i.e. look-aheads
- and look-behinds can't look beyond these boundaries.
- <pre>
- * (scan "(a)*b" "xaaabd")
- 1
- 5
- #(3)
- #(4)
- * (scan "(a)*b" "xaaabd" :start 1)
- 1
- 5
- #(3)
- #(4)
- * (scan "(a)*b" "xaaabd" :start 2)
- 2
- 5
- #(3)
- #(4)
- * (scan "(a)*b" "xaaabd" :end 4)
- NIL
- * (scan '(:greedy-repetition 0 nil #\b) "bbbc")
- 0
- 3
- #()
- #()
- * (scan '(:greedy-repetition 4 6 #\b) "bbbc")
- NIL
- * (let ((s (create-scanner "(([a-c])+)x")))
-     (scan s "abcxy"))
- 0
- 4
- #(0 2)
- #(3 3)
- </pre></blockquote>
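<p>
Because <code>SCAN</code> returns positions rather than strings, a typical caller combines its values with <code>SUBSEQ</code>. The following is a minimal sketch; <code>FIRST-REGISTER</code> is a hypothetical helper, not part of CL-PPCRE, and Quicklisp is assumed to be available:

```lisp
(ql:quickload :cl-ppcre)

;; Hypothetical helper: return the text captured by the first register
;; of REGEX in TARGET, or NIL if there is no match, no register, or the
;; register did not take part in the match.
(defun first-register (regex target)
  (multiple-value-bind (match-start match-end reg-starts reg-ends)
      (cl-ppcre:scan regex target)
    (declare (ignore match-end))
    (when (and match-start
               (plusp (length reg-starts))
               (aref reg-starts 0))
      (subseq target (aref reg-starts 0) (aref reg-ends 0)))))

(first-register "(\\d+)-" "tel 555-1234")   ; => "555"
(first-register "(\\d+)-" "no digits here") ; => NIL
```

For everyday use <code>SCAN-TO-STRINGS</code> or <code>REGISTER-GROUPS-BIND</code> do this work for you; working with the raw positions mainly pays off when you want to avoid creating substrings.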
- <p><br>[Function]
- <br><a class=none name="scan-to-strings"><b>scan-to-strings</b> <i>regex target-string <tt>&key</tt> start end sharedp</i> => <i>match, regs</i></a>
- <blockquote><br>
- Like <a href="#scan"><code>SCAN</code></a> but returns substrings of
- <code><i>target-string</i></code> instead of positions, i.e. this
- function returns two values on success: the whole match as a string
- plus an array of substrings (or <code>NIL</code>s) corresponding to
- the matched registers. If <code><i>sharedp</i></code> is true, the substrings may share structure with
- <code><i>target-string</i></code>.
- <pre>
- * (scan-to-strings "[^b]*b" "aaabd")
- "aaab"
- #()
- * (scan-to-strings "([^b])*b" "aaabd")
- "aaab"
- #("a")
- * (scan-to-strings "(([^b])*)b" "aaabd")
- "aaab"
- #("aaa" "a")
- </pre></blockquote>
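<p>
A small sketch of how the two return values are typically consumed. <code>PARSE-KEY-VALUE</code> is a hypothetical helper written for this example, and Quicklisp is assumed to be available:

```lisp
(ql:quickload :cl-ppcre)

;; Hypothetical helper: split "key=value" into a cons of two strings,
;; or return NIL if LINE doesn't have that shape.
(defun parse-key-value (line)
  (multiple-value-bind (match regs)
      (cl-ppcre:scan-to-strings "^(\\w+)=(.*)$" line)
    (when match
      (cons (aref regs 0) (aref regs 1)))))

(parse-key-value "colour=red")     ; => ("colour" . "red")
(parse-key-value "no equals sign") ; => NIL

;; A register which didn't take part in the match shows up as NIL.
(nth-value 1 (cl-ppcre:scan-to-strings "(a)|(b)" "b")) ; => #(NIL "b")
```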
- <p><br>[Macro]
- <br><a class=none name="register-groups-bind"><b>register-groups-bind</b> <i>var-list (regex target-string <tt>&key</tt> start end sharedp) declaration* statement*</i> => <i>result*</i></a>
- <blockquote><br>
- Evaluates <code><i>statement*</i></code> with the variables in <code><i>var-list</i></code> bound to the
- corresponding register groups after <code><i>target-string</i></code> has been matched
- against <code><i>regex</i></code>, i.e. each variable is either
- bound to a string or to <code>NIL</code>.
- As a shortcut, the elements of <code><i>var-list</i></code> can also be lists of the form <code>(FN VAR)</code> where <code>VAR</code> is the variable symbol
- and <code>FN</code> is a <a
- href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
- designator</a> (which is evaluated) denoting a function which is to be applied to the string before the result is bound to <code>VAR</code>.
- To make this even more convenient the form <code>(FN VAR1 ...VARn)</code> can be used as an abbreviation for
- <code>(FN VAR1) ... (FN VARn)</code>.
- <p>
- If there is no match, the <code><i>statement*</i></code> forms are <em>not</em>
- executed. For each element of
- <code><i>var-list</i></code> which is <code>NIL</code> there's no binding to the corresponding register
- group. The number of variables in <code><i>var-list</i></code> must not be greater than
- the number of register groups. If <code><i>sharedp</i></code> is true, the substrings may
- share structure with <code><i>target-string</i></code>.
- <pre>
- * (register-groups-bind (first second third fourth)
-       ("((a)|(b)|(c))+" "abababc" :sharedp t)
-     (list first second third fourth))
- ("c" "a" "b" "c")
- * (register-groups-bind (nil second third fourth)
-       <font color=orange>;; note that we don't bind the first and fifth register group</font>
-       ("((a)|(b)|(c))()+" "abababc" :start 6)
-     (list second third fourth))
- (NIL NIL "c")
- * (register-groups-bind (first)
-       ("(a|b)+" "accc" :start 1)
-     (format t "This will not be printed: ~A" first))
- NIL
- * (register-groups-bind (fname lname (#'parse-integer date month year))
-       ("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})" "Frank Zappa 21.12.1940")
-     (list fname lname (encode-universal-time 0 0 0 date month year 0)))
- ("Frank" "Zappa" 1292889600)
- </pre>
- </blockquote>
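<p>
Since the body is skipped when there is no match, <code>REGISTER-GROUPS-BIND</code> can double as a small validating parser. A minimal sketch, with <code>PARSE-CLOCK</code> being a hypothetical helper and Quicklisp assumed to be available:

```lisp
(ql:quickload :cl-ppcre)

;; Hypothetical helper: convert "HH:MM" to minutes since midnight.
;; Returns NIL for input that doesn't match, because the body of
;; REGISTER-GROUPS-BIND isn't executed in that case.
(defun parse-clock (string)
  (cl-ppcre:register-groups-bind ((#'parse-integer hours minutes))
      ("^(\\d{1,2}):(\\d{2})$" string)
    (+ (* 60 hours) minutes)))

(parse-clock "07:45") ; => 465
(parse-clock "noon")  ; => NIL
```

The <code>(#'parse-integer hours minutes)</code> form is the abbreviation described above: both registers are converted to integers before they are bound.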
- <p><br>[Macro]
- <br><a class=none name="do-scans"><b>do-scans</b> <i>(match-start match-end reg-starts reg-ends regex target-string <tt>&optional</tt> result-form <tt>&key</tt> start end) declaration* statement*</i> => <i>result*</i></a>
- <blockquote><br>
- A macro which iterates over <code><i>target-string</i></code> and
- tries to match <code><i>regex</i></code> as often as possible
- evaluating <code><i>statement*</i></code> with
- <code><i>match-start</i></code>, <code><i>match-end</i></code>,
- <code><i>reg-starts</i></code>, and <code><i>reg-ends</i></code> bound
- to the four return values of each match (see <a
- href="#scan"><code>SCAN</code></a>) in turn. After the last match,
- returns <code><i>result-form</i></code> if provided or
- <code>NIL</code> otherwise. An implicit block named <code>NIL</code>
- surrounds <code>DO-SCANS</code>; <code>RETURN</code> may be used to
- terminate the loop immediately. If <code><i>regex</i></code> matches
- an empty string, the next scan starts one position past this match.
- <p>
- This is the most general macro to iterate over all matches in a target
- string. See the source code of <a
- href="#do-matches"><code>DO-MATCHES</code></a>, <a
- href="#all-matches"><code>ALL-MATCHES</code></a>, <a
- href="#split"><code>SPLIT</code></a>, or <a
- href="#regex-replace-all"><code>REGEX-REPLACE-ALL</code></a> for examples of its
- usage.</blockquote>
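<p>
A minimal sketch of <code>DO-SCANS</code> in use. <code>MATCH-RECORDS</code> is a hypothetical helper defined for this example only, and Quicklisp is assumed to be available. It collects the four values of every match and returns them through <code><i>result-form</i></code>:

```lisp
(ql:quickload :cl-ppcre)

;; Hypothetical helper: one list (match-start match-end reg-starts
;; reg-ends) per match, in order of occurrence.  The register arrays
;; are copied defensively before they are stored.
(defun match-records (regex target)
  (let ((records '()))
    (cl-ppcre:do-scans (match-start match-end reg-starts reg-ends
                        regex target
                        (nreverse records))
      (push (list match-start match-end
                  (copy-seq reg-starts) (copy-seq reg-ends))
            records))))

(match-records "(a)b" "abxab")
;; => ((0 2 #(0) #(1)) (3 5 #(3) #(4)))
```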
- <p><br>[Macro]
- <br><a class=none name="do-matches"><b>do-matches</b> <i>(match-start match-end regex target-string <tt>&optional</tt> result-form <tt>&key</tt> start end) declaration* statement*</i> => <i>result*</i></a>
- <blockquote><br>
- Like <a href="#do-scans"><code>DO-SCANS</code></a> but doesn't bind
- variables to the register arrays.
- <pre>
- * (defun foo (regex target-string &key (start 0) (end (length target-string)))
-     (let ((sum 0))
-       (do-matches (s e regex target-string nil :start start :end end)
-         (incf sum (- e s)))
-       (format t "~,2F% of the string was inside of a match~%"
-               <font color=orange>;; note: doesn't check for division by zero</font>
-               (float (* 100 (/ sum (- end start)))))))
- FOO
- * (foo "a" "abcabcabc")
- 33.33% of the string was inside of a match
- NIL
- * (foo "aa|b" "aacabcbbc")
- 55.56% of the string was inside of a match
- NIL
- </pre></blockquote>
- <p><br>[Macro]
- <br><a class=none name="do-matches-as-strings"><b>do-matches-as-strings</b> <i>(match-var regex target-string <tt>&optional</tt> result-form <tt>&key</tt> start end sharedp) declaration* statement*</i> => <i>result*</i></a>
- <blockquote><br>
- Like <a href="#do-matches"><code>DO-MATCHES</code></a> but binds
- <code><i>match-var</i></code> to the substring of
- <code><i>target-string</i></code> corresponding to each match in turn. If <code><i>sharedp</i></code> is true, the substrings may share structure with
- <code><i>target-string</i></code>.
- <pre>
- * (defun crossfoot (target-string &key (start 0) (end (length target-string)))
-     (let ((sum 0))
-       (do-matches-as-strings (m :digit-class
-                                 target-string nil
-                                 :start start :end end)
-         (incf sum (parse-integer m)))
-       (if (< sum 10)
-         sum
-         (crossfoot (format nil "~A" sum)))))
- CROSSFOOT
- * (crossfoot "bar")
- 0
- * (crossfoot "a3x")
- 3
- * (crossfoot "12345")
- 6
- </pre>
- Of course, in real life you would do this with <a href="#do-matches"><code>DO-MATCHES</code></a> and use the <code><i>start</i></code> and <code><i>end</i></code> keyword parameters of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_parse_.htm"><code>PARSE-INTEGER</code></a>.</blockquote>
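<p>
The alternative mentioned in the last sentence might look like the following sketch. <code>SUM-DIGITS</code> is a hypothetical name, it only covers the summing loop of <code>CROSSFOOT</code>, and Quicklisp is assumed to be available:

```lisp
(ql:quickload :cl-ppcre)

;; Hypothetical helper: add up all digits in TARGET-STRING without
;; creating a substring per match.  DO-MATCHES only yields positions,
;; which are handed to PARSE-INTEGER as :START and :END.
(defun sum-digits (target-string)
  (let ((sum 0))
    (cl-ppcre:do-matches (s e :digit-class target-string sum)
      (incf sum (parse-integer target-string :start s :end e)))))

(sum-digits "12345") ; => 15
(sum-digits "a3x")   ; => 3
(sum-digits "bar")   ; => 0
```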
- <p><br>[Macro]
- <br><a class=none name="do-register-groups"><b>do-register-groups</b> <i>var-list (regex target-string <tt>&optional</tt> result-form <tt>&key</tt> start end sharedp) declaration* statement*</i> => <i>result*</i></a>
- <blockquote><br>
- Iterates over <code><i>target-string</i></code> and tries to match <code><i>regex</i></code> as often as
- possible evaluating <code><i>statement*</i></code> with the variables in <code><i>var-list</i></code> bound to the
- corresponding register groups for each match in turn, i.e. each
- variable is either bound to a string or to <code>NIL</code>. You can use the same shortcuts and abbreviations as in <a href="#register-groups-bind"><code>REGISTER-GROUPS-BIND</code></a>. The number of
- variables in <code><i>var-list</i></code> must not be greater than the number of register
- groups. For each element of
- <code><i>var-list</i></code> which is <code>NIL</code> there's no binding to the corresponding register
- group. After the last match, returns <code><i>result-form</i></code> if provided or <code>NIL</code>
- otherwise. An implicit block named <code>NIL</code> surrounds <code>DO-REGISTER-GROUPS</code>;
- <code>RETURN</code> may be used to terminate the loop immediately. If <code><i>regex</i></code> matches
- an empty string, the scan is continued one position after this
- match. If <code><i>sharedp</i></code> is true, the substrings may share structure with
- <code><i>target-string</i></code>.
- <pre>
- * (do-register-groups (first second third fourth)
-       ("((a)|(b)|(c))" "abababc" nil :start 2 :sharedp t)
-     (print (list first second third fourth)))
- ("a" "a" NIL NIL)
- ("b" NIL "b" NIL)
- ("a" "a" NIL NIL)
- ("b" NIL "b" NIL)
- ("c" NIL NIL "c")
- NIL
- * (let (result)
-     (do-register-groups ((#'parse-integer n) (#'intern sign) whitespace)
-         ("(\\d+)|(\\+|-|\\*|/)|(\\s+)" "12*15 - 42/3")
-       (unless whitespace
-         (push (or n sign) result)))
-     (nreverse result))
- (12 * 15 - 42 / 3)
- </pre>
- </blockquote>
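<p>
The implicit <code>NIL</code> block and the optional <code><i>result-form</i></code> described above can be combined to stop at the first interesting match. The following is a minimal sketch, assuming CL-PPCRE is already loaded; the function name <code>first-number-above</code> and the fallback value <code>:none</code> are made up for this illustration.

```lisp
;; Assumes CL-PPCRE is already loaded.
;; FIRST-NUMBER-ABOVE and the :NONE marker are hypothetical.
(defun first-number-above (threshold target-string)
  "Returns the first integer in TARGET-STRING greater than THRESHOLD,
or :NONE if there is no such integer."
  (cl-ppcre:do-register-groups ((#'parse-integer n))
      ("(\\d+)" target-string :none) ; :NONE is the result form
    (when (> n threshold)
      ;; RETURN leaves the implicit NIL block, ending the loop early.
      (return n))))
```

If the loop runs to completion without calling <code>RETURN</code>, the result form is evaluated and its value is returned instead.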
- <p><br>[Function]
- <br><a class=none name="all-matches"><b>all-matches</b> <i>regex target-string <tt>&key</tt> start end</i> => <i>list</i></a>
- <blockquote><br>
- Returns a list containing the start and end positions of all matches
- of <code><i>regex</i></code> against
- <code><i>target-string</i></code>, i.e. if there are <code>N</code>
- matches the list contains <code>(* 2 N)</code> elements. If
- <code><i>regex</i></code> matches an empty string, the scan is
- continued one position after this match.
- <pre>
- * (all-matches "a" "foo bar baz")
- (5 6 9 10)
- * (all-matches "\\w*" "foo bar baz")
- (0 3 3 3 4 7 7 7 8 11 11 11)
- </pre></blockquote>
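<p>
Since the result of <code>ALL-MATCHES</code> is a flat list of alternating start and end positions, it is often convenient to group it into pairs. A small sketch, assuming CL-PPCRE is already loaded; the helper name <code>match-spans</code> is hypothetical.

```lisp
;; Assumes CL-PPCRE is already loaded.
;; MATCH-SPANS is a hypothetical helper, not part of the library.
(defun match-spans (regex target-string)
  "Returns a list of (START . END) conses, one per match."
  ;; Walk the flat list two elements at a time.
  (loop for (start end) on (cl-ppcre:all-matches regex target-string)
          by #'cddr
        collect (cons start end)))

;; (match-spans "a" "foo bar baz") => ((5 . 6) (9 . 10))
```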
- <p><br>[Function]
- <br><a class=none name="all-matches-as-strings"><b>all-matches-as-strings</b> <i>regex target-string <tt>&key</tt> start end sharedp</i> => <i>list</i></a>
- <blockquote><br>
- Like <a href="#all-matches"><code>ALL-MATCHES</code></a> but
- returns a list of substrings instead. If <code><i>sharedp</i></code> is true, the substrings may share structure with
- <code><i>target-string</i></code>.
- <pre>
- * (all-matches-as-strings "a" "foo bar baz")
- ("a" "a")
- * (all-matches-as-strings "\\w*" "foo bar baz")
- ("foo" "" "bar" "" "baz" "")
- </pre></blockquote>
- <h4><a name="splitting" class=none>Splitting and replacing</a></h4>
- <p><br>[Function]
- <br><a class=none name="split"><b>split</b> <i>regex target-string <tt>&key</tt> start end limit with-registers-p omit-unmatched-p sharedp</i> => <i>list</i></a>
- <blockquote><br>
- Matches <code><i>regex</i></code> against
- <code><i>target-string</i></code> as often as possible and returns a
- list of the substrings between the matches. If
- <code><i>with-registers-p</i></code> is true, substrings corresponding
- to matched registers are inserted into the list as well. If
- <code><i>omit-unmatched-p</i></code> is true, unmatched registers will
- simply be left out, otherwise they will show up as
- <code>NIL</code>. <code><i>limit</i></code> limits the number of
- elements returned - registers aren't counted. If
- <code><i>limit</i></code> is <code>NIL</code> (or 0 which is
- equivalent), trailing empty strings are removed from the result list.
- If <code><i>regex</i></code> matches an empty string, the scan is
- continued one position after this match. If <code><i>sharedp</i></code> is true, the substrings may share structure with
- <code><i>target-string</i></code>.
- <p>
- This function also tries hard to be
- Perl-compatible - thus the somewhat peculiar behaviour.
- <pre>
- * (split "\\s+" "foo bar baz
- frob")
- ("foo" "bar" "baz" "frob")
- * (split "\\s*" "foo bar baz")
- ("f" "o" "o" "b" "a" "r" "b" "a" "z")
- * (split "(\\s+)" "foo bar baz")
- ("foo" "bar" "baz")
- * (split "(\\s+)" "foo bar baz" :with-registers-p t)
- ("foo" " " "bar" " " "baz")
- * (split "(\\s)(\\s*)" "foo bar   baz" :with-registers-p t)
- ("foo" " " "" "bar" " " "  " "baz")
- * (split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
- ("foo" "," NIL "bar" NIL ";" "baz")
- * (split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
- ("foo" "," "bar" ";" "baz")
- * (split ":" "a:b:c:d:e:f:g::")
- ("a" "b" "c" "d" "e" "f" "g")
- * (split ":" "a:b:c:d:e:f:g::" :limit 1)
- ("a:b:c:d:e:f:g::")
- * (split ":" "a:b:c:d:e:f:g::" :limit 2)
- ("a" "b:c:d:e:f:g::")
- * (split ":" "a:b:c:d:e:f:g::" :limit 3)
- ("a" "b" "c:d:e:f:g::")
- * (split ":" "a:b:c:d:e:f:g::" :limit 1000)
- ("a" "b" "c" "d" "e" "f" "g" "" "")
- </pre></blockquote>
- <p><br>[Function]
- <br><a class=none name="regex-replace"><b>regex-replace</b> <i>regex target-string replacement <tt>&key</tt> start end preserve-case simple-calls element-type</i> => <i>string, matchp</i></a>
- <blockquote><br> Try to match <code><i>target-string</i></code>
- between <code><i>start</i></code> and <code><i>end</i></code> against
- <code><i>regex</i></code> and replace the first match with
- <code><i>replacement</i></code>. Two values are returned: the modified
- string, and <code>T</code> if <code><i>regex</i></code> matched or
- <code>NIL</code> otherwise.
- <p>
- <code><i>replacement</i></code> can be a string which may contain the
- special substrings <code>"\&"</code> for the whole
- match, <code>"\`"</code> for the part of
- <code><i>target-string</i></code> before the match,
- <code>"\'"</code> for the part of
- <code><i>target-string</i></code> after the match,
- <code>"\N"</code> or <code>"\{N}"</code> for the
- <code>N</code>th register where <code>N</code> is a positive integer.
- <p>
- <code><i>replacement</i></code> can also be a <a
- href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
- designator</a> in which case the match will be replaced with the
- result of calling the function designated by
- <code><i>replacement</i></code> with the arguments
- <code><i>target-string</i></code>, <code><i>start</i></code>,
- <code><i>end</i></code>, <code><i>match-start</i></code>,
- <code><i>match-end</i></code>, <code><i>reg-starts</i></code>, and
- <code><i>reg-ends</i></code>. (<code><i>reg-starts</i></code> and
- <code><i>reg-ends</i></code> are arrays holding the start and end
- positions of matched registers (or <code>NIL</code>);
- <code><i>match-start</i></code> and <code><i>match-end</i></code> are the
- start and end positions of the match itself.)
- <p>
- If <code><i>simple-calls</i></code> is true, a function designated by
- <code><i>replacement</i></code> will instead be called with the
- arguments <code><i>match</i></code>, <code><i>register-1</i></code>,
- ..., <code><i>register-n</i></code> where <code><i>match</i></code> is
- the whole match as a string and <code><i>register-1</i></code> to
- <code><i>register-n</i></code> are the matched registers, also as
- strings (or <code>NIL</code>). Note that these strings share structure with
- <code><i>target-string</i></code> so you must not modify them.
- <p>
- Finally, <code><i>replacement</i></code> can be a list where each
- element is a string (which will be inserted verbatim), one of the
- symbols <code>:match</code>, <code>:before-match</code>, or
- <code>:after-match</code> (corresponding to
- <code>"\&"</code>, <code>"\`"</code>, and
- <code>"\'"</code> above), an integer <code>N</code>
- (representing register <code>(1+ N)</code>), or a function
- designator.
- <p>
- If <code><i>preserve-case</i></code> is true (default is
- <code>NIL</code>), the replacement will try to preserve the case (all
- upper case, all lower case, or capitalized) of the match. The result
- will always be a <a
- href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#fresh">fresh</a>
- string, even if <code><i>regex</i></code> doesn't match.
- <p>
- <code><i>element-type</i></code> specifies
- the <a
- href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_a.htm#array_element_type">array
- element type</a> of the string which is returned, the default
- is <a
- href="http://www.lispworks.com/documentation/lw50/LWRM/html/lwref-346.htm"><code>LW:SIMPLE-CHAR</code></a>
- for LispWorks
- and <a
- href="http://www.lispworks.com/documentation/HyperSpec/Body/t_ch.htm"><code>CHARACTER</code></a>
- for other Lisps.
- <pre>
- * (regex-replace "fo+" "foo bar" "frob")
- "frob bar"
- T
- * (regex-replace "fo+" "FOO bar" "frob")
- "FOO bar"
- NIL
- * (regex-replace "(?i)fo+" "FOO bar" "frob")
- "frob bar"
- T
- * (regex-replace "(?i)fo+" "FOO bar" "frob" :preserve-case t)
- "FROB bar"
- T
- * (regex-replace "(?i)fo+" "Foo bar" "frob" :preserve-case t)
- "Frob bar"
- T
- * (regex-replace "bar" "foo bar baz" "[frob (was '\\&' between '\\`' and '\\'')]")
- "foo [frob (was 'bar' between 'foo ' and ' baz')] baz"
- T
- * (regex-replace "bar" "foo bar baz"
-                  '("[frob (was '" :match "' between '" :before-match "' and '" :after-match "')]"))
- "foo [frob (was 'bar' between 'foo ' and ' baz')] baz"
- T
- * (regex-replace "(be)(nev)(o)(lent)"
-                  "benevolent: adj. generous, kind"
-                  #'(lambda (match &rest registers)
-                      (format nil "~A [~{~A~^.~}]" match registers))
-                  :simple-calls t)
- "benevolent [be.nev.o.lent]: adj. generous, kind"
- T
- </pre></blockquote>
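<p>
The register references described above (<code>"\N"</code> in a replacement string, or an integer in a replacement list) can be illustrated with a small word-swapping sketch. It assumes CL-PPCRE is already loaded; both function names are hypothetical.

```lisp
;; Assumes CL-PPCRE is already loaded.
;; Both function names are hypothetical and exist only for illustration.
(defun swap-words-with-template (string)
  "Swaps the first two words using \\N references in a replacement string."
  (cl-ppcre:regex-replace "(\\w+) (\\w+)" string "\\2 \\1"))

(defun swap-words-with-list (string)
  "Swaps the first two words using a replacement list.
The integers are zero-based: 1 denotes the second register and
0 denotes the first register."
  (cl-ppcre:regex-replace "(\\w+) (\\w+)" string '(1 " " 0)))

;; (swap-words-with-template "hello world") => "world hello", T
;; (swap-words-with-list "hello world")     => "world hello", T
```

Note the off-by-one difference between the two notations: <code>"\\1"</code> in a string and <code>0</code> in a list both refer to the first register.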
- <p><br>[Function]
- <br><a class=none name="regex-replace-all"><b>regex-replace-all</b> <i>regex target-string replacement <tt>&key</tt> start end preserve-case simple-calls element-type</i> => <i>string, matchp</i></a>
- <blockquote><br>
- Like <a href="#regex-replace"><code>REGEX-REPLACE</code></a> but replaces all matches.
- <pre>
- * (regex-replace-all "(?i)fo+" "foo Fooo FOOOO bar" "frob" :preserve-case t)
- "frob Frob FROB bar"
- T
- * (regex-replace-all "(?i)f(o+)" "foo Fooo FOOOO bar" "fr\\1b" :preserve-case t)
- "froob Frooob FROOOOB bar"
- T
- * (let ((qp-regex (create-scanner "[\\x80-\\xff]")))
-     (defun encode-quoted-printable (string)
-       "Converts 8-bit string to quoted-printable representation."
-       <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
-       (flet ((convert (target-string start end match-start match-end reg-starts reg-ends)
-                (declare (ignore start end match-end reg-starts reg-ends))
-                (format nil "=~2,'0x" (char-code (char target-string match-start)))))
-         (regex-replace-all qp-regex string #'convert))))
- Converted ENCODE-QUOTED-PRINTABLE.
- ENCODE-QUOTED-PRINTABLE
- * (encode-quoted-printable "Fête Sørensen naïve Hühner Straße")
- "F=EAte S=F8rensen na=EFve H=FChner Stra=DFe"
- T
- * (let ((url-regex (create-scanner "[^a-zA-Z0-9_\\-.]")))
-     (defun url-encode (string)
-       "URL-encodes a string."
-       <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
-       (flet ((convert (target-string start end match-start match-end reg-starts reg-ends)
-                (declare (ignore start end match-end reg-starts reg-ends))
-                (format nil "%~2,'0x" (char-code (char target-string match-start)))))
-         (regex-replace-all url-regex string #'convert))))
- Converted URL-ENCODE.
- URL-ENCODE
- * (url-encode "Fête Sørensen naïve Hühner Straße")
- "F%EAte%20S%F8rensen%20na%EFve%20H%FChner%20Stra%DFe"
- T
- * (defun how-many (target-string start end match-start match-end reg-starts reg-ends)
-     (declare (ignore target-string start end match-start match-end))
-     (format nil "~A" (- (svref reg-ends 0)
-                         (svref reg-starts 0))))
- HOW-MANY
- * (regex-replace-all "{(.+?)}"
-                      "foo{...}bar{.....}{..}baz{....}frob"
-                      (list "[" 'how-many " dots]"))
- "foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
- T
- * (let ((qp-regex (create-scanner "[\\x80-\\xff]")))
-     (defun encode-quoted-printable (string)
-       "Converts 8-bit string to quoted-printable representation.
- Version using SIMPLE-CALLS keyword argument."
-       <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
-       (flet ((convert (match)
-                (format nil "=~2,'0x" (char-code (char match 0)))))
-         (regex-replace-all qp-regex string #'convert
-                            :simple-calls t))))
- Converted ENCODE-QUOTED-PRINTABLE.
- ENCODE-QUOTED-PRINTABLE
- * (encode-quoted-printable "Fête Sørensen naïve Hühner Straße")
- "F=EAte S=F8rensen na=EFve H=FChner Stra=DFe"
- T
- * (defun how-many (match first-register)
-     (declare (ignore match))
-     (format nil "~A" (length first-register)))
- HOW-MANY
- * (regex-replace-all "{(.+?)}"
-                      "foo{...}bar{.....}{..}baz{....}frob"
-                      (list "[" 'how-many " dots]")
-                      :simple-calls t)
- "foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
- T
- </pre></blockquote>
- <h4><a name="modify" class=none>Modifying scanner behaviour</a></h4>
- <p><br>[Special variable]
- <br><a class=none name="*property-resolver*"><b>*property-resolver*</b></a>
- </p><blockquote><br> This is the designator for a function responsible
- for resolving named properties like <code>\p{Number}</code>. If
- CL-PPCRE encounters a <code>\p</code> or a <code>\P</code> it expects
- to see an opening curly brace immediately afterwards and will then
- read everything following that brace until it sees a closing curly
- brace. The resolver function will be called with this string and must
- return a corresponding unary test function which accepts a character
- as its argument and returns a true value if and only if the character
- has the named property. If the resolver returns <code>NIL</code>
- instead, it signals that a property of that name is unknown.
- <pre>
- * (labels ((char-code-odd-p (char)
-              (oddp (char-code char)))
-            (char-code-even-p (char)
-              (evenp (char-code char)))
-            (resolver (name)
-              (cond ((string= name "odd") #'char-code-odd-p)
-                    ((string= name "even") #'char-code-even-p)
-                    ((string= name "true") (constantly t))
-                    (t (error "Can't resolve ~S." name)))))
-     (let ((*property-resolver* #'resolver))
-       <font color=orange>;; quiz question - why do we need CREATE-SCANNER here?</font>
-       (list (regex-replace-all (create-scanner "\\p{odd}") "abcd" "+")
-             (regex-replace-all (create-scanner "\\p{even}") "abcd" "+")
-             (regex-replace-all (create-scanner "\\p{true}") "abcd" "+"))))
- ("+b+d" "a+c+" "++++")
- </pre>
- If the value
- of <a href="#*property-resolver*"><code>*PROPERTY-RESOLVER*</code></a>
- is <code>NIL</code> (which is the default), <code>\p</code> and <code>\P</code> in regex
- strings will simply be treated like <code>p</code> or <code>P</code>
- as in CL-PPCRE 1.4.1 and earlier. Note that this does not affect
- the validity of <code>(:PROPERTY <<i>name</i>>)</code>
- parts in <a href="#create-scanner2">S-expression syntax</a>.
- </blockquote>
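<p>
As a further illustration of the resolver protocol described above, a resolver can simply map property names to standard character predicates. This is a minimal sketch, assuming CL-PPCRE is already loaded; the property names <code>"Alpha"</code> and <code>"Digit"</code> and both function names are invented for this example and are not predefined by the library.

```lisp
;; Assumes CL-PPCRE is already loaded.
;; The property names and function names below are hypothetical.
(defun resolve-simple-property (name)
  "Maps a property name to a unary character test, or NIL if unknown."
  (cond ((string= name "Alpha") #'alpha-char-p)
        ((string= name "Digit") #'digit-char-p)
        (t nil)))

(defun digit-runs (target-string)
  "Returns all maximal runs of digits in TARGET-STRING as strings."
  ;; LET* binds sequentially, so the resolver is already in effect
  ;; when CREATE-SCANNER parses the regex string.
  (let* ((cl-ppcre:*property-resolver* #'resolve-simple-property)
         (scanner (cl-ppcre:create-scanner "\\p{Digit}+")))
    (cl-ppcre:all-matches-as-strings scanner target-string)))

;; (digit-runs "ab12cd345") => ("12" "345")
```

The scanner is created explicitly while the resolver is bound, in the same way as in the example with <code>CREATE-SCANNER</code> above.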
- <p><br>[Accessor]
- <br><a class="none" name="parse-tree-synonym"><b>parse-tree-synonym</b> <i>symbol</i> => <i>parse-tree</i>
- <br><tt>(setf (</tt><b>parse-tree-synonym</b> <i>symbol</i><tt>)</tt> <i>new-parse-tree</i><tt>)</tt></a>
- </p><blockquote><br>
- Any symbol (unless it's a keyword with a special meaning in parse
- trees) can be made a "synonym", i.e. an abbreviation, for another parse
- tree by this accessor. <code>PARSE-TREE-SYNONYM</code> returns <code>NIL</code> if <code><i>symbol</i></code> isn't a synonym yet.
- <pre>
- * (parse-string "a*b+")
- (:SEQUENCE (:GREEDY-REPETITION 0 NIL #\a) (:GREEDY-REPETITION 1 NIL #\b))
- * (defun my-repetition (char min)
-     `(:greedy-repetition ,min nil ,char))
- MY-REPETITION
- * (setf (parse-tree-synonym 'a*) (my-repetition #\a 0))
- (:GREEDY-REPETITION 0 NIL #\a)
- * (setf (parse-tree-synonym 'b+) (my-repetition #\b 1))
- (:GREEDY-REPETITION 1 NIL #\b)
- * (let ((scanner (create-scanner '(:sequence a* b+))))
-     (dolist (string '("ab" "b" "aab" "a" "x"))
-       (print (scan scanner string)))
-     (values))
- 0
- 0
- 0
- NIL
- NIL
- * (parse-tree-synonym 'a*)
- (:GREEDY-REPETITION 0 NIL #\a)
- * (parse-tree-synonym 'a+)
- NIL
- </pre></blockquote>
- <p><br>[Macro]
- <br><a class="none" name="define-parse-tree-synonym"><b>define-parse-tree-synonym</b> <i>name parse-tree</i> => <i>parse-tree</i></a>
- </p><blockquote><br>
- This is a convenience macro for parse tree synonyms defined as
- <pre>
- (defmacro define-parse-tree-synonym (name parse-tree)
-   `(eval-when (:compile-toplevel :load-toplevel :execute)
-      (setf (parse-tree-synonym ',name) ',parse-tree)))
- </pre>
- so you can write code like this:
- <pre>
- (define-parse-tree-synonym a-z
-   (:char-class (:range #\a #\z) (:range #\A #\Z)))
- (define-parse-tree-synon…