<!-- jEdit buffer-local properties: -->
<!-- :indentSize=1:noTabs=yes: -->
<!-- :xml.root=users-guide.xml: -->
<appendix id="regexps"><title>Regular Expressions</title>
 <para>
  jEdit 4.3pre5 and later versions use regular expressions from <ulink url="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern</ulink>
  to implement inexact search and replace. See that page for a complete reference to all supported metacharacters. </para>
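 <para>
  The following minimal Java sketch shows what a regular-expression search and replace does by calling java.util.regex directly. It is not jEdit's own search implementation. The class <literal>RegexSearchDemo</literal> and its methods are hypothetical names chosen for this example.
 </para>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: a plain java.util.regex search and replace,
// not jEdit's own search code.
final class RegexSearchDemo {

    // Returns the start offset of every match of regex in text, joined by commas.
    static String matchOffsets(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(m.start());
        }
        return out.toString();
    }

    // Replaces every match of regex in text with replacement.
    static String replaceAll(String regex, String text, String replacement) {
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(matchOffsets("cat", "cat concat catalog"));      // 0,7,11
        System.out.println(replaceAll("cat", "cat concat catalog", "dog")); // dog condog dogalog
    }
}
```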

 <para>
  A regular expression is a string in which some characters have special meaning for pattern matching.
 </para>

 <note><title>Inside XML files</title>
  <para>Inside XML files (such as jEdit mode files), you must escape XML special characters such as &amp;, &lt; and &gt;. For example, write <literal>&amp;lt;</literal> instead of <literal>&lt;</literal>. The XML plugin's <quote>characters to entities</quote> command can perform this conversion.</para>
 </note>

 <note><title>Inside Java / BeanShell / properties files</title>
  <para>In these files, backslash escapes in strings are processed before the string reaches the regular expression engine. Every backslash in a regular expression must therefore be escaped with an extra backslash (<literal>\\</literal>). For example, the regular expression <literal>\d+</literal> is written as <literal>"\\d+"</literal> in a Java string.</para> </note>
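 <para>
  This minimal Java sketch illustrates the backslash rule. The compiler turns each <literal>\\</literal> in a string literal into a single backslash before <literal>Pattern</literal> sees it, so matching one literal backslash takes four backslashes in the source. The class <literal>EscapingDemo</literal> and its members are hypothetical.
 </para>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: backslashes in a Java string literal are
// consumed by the compiler before java.util.regex parses the pattern.
final class EscapingDemo {

    // The regular expression \d+ (one or more digits) as a Java literal.
    static final String DIGITS = "\\d+";

    // The regular expression \\ (one literal backslash) as a Java literal.
    static final String ONE_BACKSLASH = "\\\\";

    // True when regex matches the whole of text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(DIGITS);                            // prints \d+
        System.out.println(matchesWhole(DIGITS, "2024"));      // prints true
        // "\\" in Java source is a one-character string holding a backslash.
        System.out.println(matchesWhole(ONE_BACKSLASH, "\\")); // prints true
    }
}
```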

 <para>
  Within a regular expression, the following characters have special meaning:
 </para>
 <bridgehead renderas="sect3">Positional Operators</bridgehead>
 <itemizedlist>
  <listitem><para><literal>^</literal> matches at the beginning of a line</para></listitem>
  <listitem><para><literal>$</literal> matches at the end of a line</para></listitem>
  <listitem><para><literal>\b</literal> matches at a word boundary</para></listitem>
  <listitem><para><literal>\B</literal> matches at any position that is not a word boundary</para></listitem>
 </itemizedlist>
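 <para>
  Positional operators match positions rather than characters. The Java sketch below counts matches in a three-line string. It compiles the pattern with <literal>Pattern.MULTILINE</literal> so that <literal>^</literal> and <literal>$</literal> apply to every line. That flag is an assumption made here to mirror line-oriented search, not a statement about how jEdit configures the engine. <literal>PositionalDemo</literal> is a hypothetical name.
 </para>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the positional operators. MULTILINE is an
// assumption: it makes ^ and $ match at every line, not only at the ends
// of the whole text.
final class PositionalDemo {

    // Counts the matches of regex in text, compiled with MULTILINE.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "import a;\nimport b;\nreimport c;";
        System.out.println(count("^import", text)); // 2: only at line starts
        System.out.println(count(";$", text));      // 3: every line ends with ;
        System.out.println(count("\\bport", text)); // 0: "port" never starts a word
        System.out.println(count("\\Bport", text)); // 3: "port" only occurs inside words
    }
}
```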
 <bridgehead renderas="sect3">One-Character Operators</bridgehead>
 <itemizedlist>
  <listitem><para><literal>.</literal> matches any single character</para></listitem>
  <listitem><para><literal>\d</literal> matches any decimal digit</para></listitem>
  <listitem><para><literal>\D</literal> matches any non-digit</para></listitem>
  <listitem><para><literal>\n</literal> matches the newline character</para></listitem>
  <listitem><para><literal>\s</literal> matches any whitespace character</para></listitem>
  <listitem><para><literal>\S</literal> matches any non-whitespace character</para></listitem>
  <listitem><para><literal>\t</literal> matches a horizontal tab character</para></listitem>
  <listitem><para><literal>\w</literal> matches any word character (an ASCII letter, digit, or underscore)</para></listitem>
  <listitem><para><literal>\W</literal> matches any non-word character</para></listitem>
  <listitem><para><literal>\\</literal> matches the backslash
  (<quote>\</quote>) character</para></listitem>
 </itemizedlist>
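 <para>
  A short Java sketch of the one-character operators follows. The hypothetical helper <literal>findAll</literal> lists every match, which makes it easy to see, for example, that <literal>\w</literal> includes the underscore.
 </para>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the one-character operators.
final class OneCharDemo {

    // Returns every match of regex in text, joined by commas.
    static String findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "id_7 = x2\tcount";
        System.out.println(findAll("\\d", text));          // 7,2
        System.out.println(findAll("\\w+", text));         // id_7,x2,count
        System.out.println(findAll("\\S+", text));         // id_7,=,x2,count
        System.out.println(findAll("x.", text));           // x2
        System.out.println(findAll("\\t", text).length()); // 1: a single tab
    }
}
```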
 <bridgehead renderas="sect3">Character Class Operator</bridgehead>
 <itemizedlist>
  <listitem><para><literal>[<replaceable>abc</replaceable>]</literal> matches
  any character in the set <replaceable>a</replaceable>, <replaceable>b</replaceable> or
  <replaceable>c</replaceable></para></listitem>
  <listitem><para><literal>[^<replaceable>abc</replaceable>]</literal> matches
  any character not in the set <replaceable>a</replaceable>, <replaceable>b</replaceable> or
  <replaceable>c</replaceable></para></listitem>
  <listitem><para><literal>[<replaceable>a-z</replaceable>]</literal> matches
  any character in the range <replaceable>a</replaceable> to <replaceable>z</replaceable>, inclusive.
  A leading or trailing dash is interpreted literally</para></listitem>
 </itemizedlist>
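 <para>
  The character class rules can be checked with a small Java sketch. The hypothetical helper <literal>mark</literal> uses <literal>String.replaceAll</literal> to replace each character the class matches with an asterisk. The last two calls show a trailing and a leading dash being taken literally.
 </para>

```java
// Hypothetical demo class for the character class operator.
final class CharacterClassDemo {

    // Replaces every character matched by charClass with '*'.
    static String mark(String charClass, String text) {
        return text.replaceAll(charClass, "*");
    }

    public static void main(String[] args) {
        System.out.println(mark("[abc]", "abcdef"));  // ***def
        System.out.println(mark("[^abc]", "abcdef")); // abc***
        System.out.println(mark("[a-c]", "abcdef"));  // ***def
        System.out.println(mark("[a-]", "a-b"));      // **b (trailing dash is literal)
        System.out.println(mark("[-0-9]", "x-42"));   // x*** (leading dash is literal)
    }
}
```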
 <bridgehead renderas="sect3">Subexpressions and Backreferences</bridgehead>
 <itemizedlist>
  <listitem><para><literal>(<replaceable>abc</replaceable>)</literal> matches
  whatever the expression <replaceable>abc</replaceable> would match, and saves it as a subexpression.
  Also used for grouping</para></listitem>
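  <listitem><para><literal>(?:<replaceable>...</replaceable>)</literal> is a pure
  grouping operator; it does not save its contents</para></listitem>
  <listitem><para><literal>(?#<replaceable>...</replaceable>)</literal> embedded
  comments are not supported by java.util.regex. Instead, the <literal>(?x)</literal> (comments) flag makes the engine ignore whitespace and treat everything from <literal>#</literal> to the end of the line as a comment</para></listitem>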
  <listitem><para><literal>(?=<replaceable>...</replaceable>)</literal> positive
  lookahead; the regular expression matches only if the expression inside the parentheses
  matches at this position, but the text it matches is not considered part of the match</para></listitem>
  <listitem><para><literal>(?!<replaceable>...</replaceable>)</literal> negative
  lookahead; the regular expression matches only if the expression inside the parentheses
  does not match at this position; the lookahead itself consumes no text</para></listitem>
  <listitem><para><literal>\<replaceable>n</replaceable></literal>, where 0 &lt;
  <replaceable>n</replaceable> &lt; 10,
  matches the same text that the <replaceable>n</replaceable>th
  subexpression matched. It can only be used in the search string</para></listitem>
  <listitem><para><literal>$<replaceable>n</replaceable></literal>, where 0 &lt;
  <replaceable>n</replaceable> &lt; 10,
  is replaced with the text matched by the <replaceable>n</replaceable>th
  subexpression. It can only be used in the replacement string</para></listitem>
 </itemizedlist>
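 <para>
  The Java sketch below exercises each construct in this list: a capturing group next to a non-capturing one, positive and negative lookahead, a <literal>\1</literal> backreference in the search string, and <literal>$1</literal>/<literal>$2</literal> in the replacement string. The class <literal>GroupDemo</literal> and its helpers are hypothetical.
 </para>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for groups, lookahead, backreferences and $n.
final class GroupDemo {

    // Returns the text captured by group 1 in the first match, or null.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Returns the whole first match (group 0), or null.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Replaces every match; $n in replacement refers to group n.
    static String replace(String regex, String text, String replacement) {
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // (?:Mr|Ms) only groups, so (\w+) is subexpression 1.
        System.out.println(firstGroup("(?:Mr|Ms)\\. (\\w+)", "Dear Ms. Smith")); // Smith

        // Lookahead tests "bar" without making it part of the match.
        System.out.println(replace("foo(?=bar)", "foobaz foobar", "X")); // foobaz Xbar
        System.out.println(replace("foo(?!bar)", "foobaz foobar", "X")); // Xbaz foobar

        // \1 in the search string: find a doubled word.
        System.out.println(firstMatch("\\b(\\w+) \\1\\b", "this is is a test")); // is is

        // $1 and $2 in the replacement string: swap two words.
        System.out.println(replace("(\\w+) (\\w+)", "John Smith", "$2, $1")); // Smith, John
    }
}
```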
 <bridgehead renderas="sect3">Branching (Alternation) Operator</bridgehead>
 <itemizedlist>
  <listitem><para><literal><replaceable>a</replaceable>|<replaceable>b</replaceable></literal>
  matches whatever the expression <replaceable>a</replaceable> would match, or whatever
  the expression <replaceable>b</replaceable> would match.</para></listitem>
 </itemizedlist>
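 <para>
  Alternation splits the whole expression on either side of the <literal>|</literal> unless a group limits it. The Java sketch below shows this by bracketing each match. The class <literal>AlternationDemo</literal> and its <literal>mark</literal> helper are hypothetical.
 </para>

```java
// Hypothetical demo class for the alternation operator.
final class AlternationDemo {

    // Wraps every match of regex in text in square brackets ($0 is the whole match).
    static String mark(String regex, String text) {
        return text.replaceAll(regex, "[$0]");
    }

    public static void main(String[] args) {
        System.out.println(mark("cat|dog", "cat, dog, cow")); // [cat], [dog], cow
        // ab|cd means "ab" or "cd", not a(b|c)d.
        System.out.println(mark("ab|cd", "abd acd"));         // [ab]d a[cd]
        // A group restricts the alternation to part of the expression.
        System.out.println(mark("a(b|c)d", "abd acd"));       // [abd] [acd]
    }
}
```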
 <bridgehead renderas="sect3">Repeating Operators</bridgehead>
 <para>
  These symbols operate on the preceding atomic expression: a single character, an escape such as <literal>\d</literal>, a character class, or a parenthesized group.
 </para>
 <itemizedlist>
  <listitem><para><literal>?</literal> matches zero or one occurrence of the preceding expression</para></listitem>
  <listitem><para><literal>*</literal> matches zero or more repetitions of the preceding expression</para></listitem>
  <listitem><para><literal>+</literal> matches one or more repetitions of the preceding
  expression</para></listitem>
  <listitem><para><literal>{<replaceable>m</replaceable>}</literal> matches exactly
  <replaceable>m</replaceable> repetitions of the preceding expression</para></listitem>
  <listitem><para><literal>{<replaceable>m</replaceable>,<replaceable>n</replaceable>}</literal>
  matches between <replaceable>m</replaceable> and <replaceable>n</replaceable> repetitions of the preceding
  expression, inclusive</para></listitem>
  <listitem><para><literal>{<replaceable>m</replaceable>,}</literal> matches
  <replaceable>m</replaceable> or more repetitions of the preceding expression</para></listitem>
 </itemizedlist>
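 <para>
  The following Java sketch brackets every match so the effect of each repeating operator is visible. Note that in <literal>ab*</literal> the <literal>*</literal> applies only to the <literal>b</literal>. The class <literal>RepetitionDemo</literal> is hypothetical.
 </para>

```java
// Hypothetical demo class for the repeating operators.
final class RepetitionDemo {

    // Wraps every match of regex in text in square brackets ($0 is the whole match).
    static String mark(String regex, String text) {
        return text.replaceAll(regex, "[$0]");
    }

    public static void main(String[] args) {
        System.out.println(mark("colou?r", "color colour")); // [color] [colour]
        System.out.println(mark("ab*", "a ab abbb"));        // [a] [ab] [abbb]
        System.out.println(mark("ab+", "a ab abbb"));        // a [ab] [abbb]
        System.out.println(mark("\\d{3}", "12 345 6789"));   // 12 [345] [678]9
        System.out.println(mark("\\d{2,3}", "1 12 1234"));   // 1 [12] [123]4
        System.out.println(mark("\\d{2,}", "1 12 1234"));    // 1 [12] [1234]
    }
}
```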
 <bridgehead renderas="sect3">Stingy (Minimal) Matching</bridgehead>
 <para>
  If a repeating operator (above) is immediately followed by a
  <literal>?</literal>, the repeating operator matches the smallest
  number of repetitions that can complete the rest of the match. For example, <literal>*?</literal> and <literal>+?</literal> are the stingy forms of <literal>*</literal> and <literal>+</literal>.
 </para>
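 <para>
  The difference between a greedy operator and its stingy form is easiest to see on text with two possible end points. In this Java sketch, the greedy <literal>.*</literal> runs to the last closing bracket, while <literal>.*?</literal> stops at the first one that completes the match. The class <literal>StingyDemo</literal> is hypothetical.
 </para>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting greedy and stingy repetition.
final class StingyDemo {

    // Returns the first match of regex in text, or null if there is none.
    static String first(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "[one] and [two]";
        // Greedy: .* extends as far as possible, up to the last ']'.
        System.out.println(first("\\[.*\\]", text));     // [one] and [two]
        // Stingy: .*? stops at the first ']' that completes the match.
        System.out.println(first("\\[.*?\\]", text));    // [one]
        // {m,n}? prefers the minimum count m.
        System.out.println(first("\\d{2,4}?", "12345")); // 12
    }
}
```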
</appendix>