<!-- jEdit buffer-local properties: -->
<!-- :indentSize=1:noTabs=yes: -->
<!-- :xml.root=users-guide.xml: -->
<appendix id="regexps"><title>Regular Expressions</title>
<para>
jEdit 4.3pre5 and later versions use the regular expression engine of <ulink url="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern</ulink>
to implement inexact search and replace. Follow that link for a complete reference to all supported metacharacters.</para>
<para>
A regular expression is a string in which some characters are given special meaning for pattern matching; all other characters simply match themselves.
</para>
<note><title>Inside XML files</title>
<para>Inside XML files (such as jEdit mode files), you must escape XML special characters such as &amp;, &lt; and &gt;. The XML plugin's <quote>characters to entities</quote> command can perform this mapping for you.</para>
</note>
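<para>
The following sketch shows what this mapping does to a pattern before it goes into a mode file. It is plain Java. The class <literal>XmlRegexEscape</literal> is a hypothetical helper written for this illustration and is not part of jEdit or the XML plugin:
</para>
<programlisting><![CDATA[
```java
// Hypothetical helper, not part of jEdit or the XML plugin: replaces the
// XML special characters in a regular expression with entity references
// so the pattern can be pasted into an XML file such as a mode file.
class XmlRegexEscape {
    static String escape(String regex) {
        StringBuilder out = new StringBuilder();
        // Each character is examined exactly once, so an '&' produced by an
        // earlier replacement is never escaped a second time.
        for (char c : regex.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A pattern matching an opening or closing tag such as <b> or </b>.
        String regex = "</?[a-z]+>";
        System.out.println(escape(regex)); // prints &lt;/?[a-z]+&gt;
    }
}
```
]]></programlisting>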
<note><title>Inside Java, BeanShell, and properties files</title>
<para>Java and BeanShell string literals, and values in properties files, are unescaped before the regular expression engine sees them, so every backslash in the pattern must be doubled (<literal>\\</literal>). For example, the regular expression <literal>\d+</literal> is written as <literal>"\\d+"</literal> in Java source.</para></note>
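<para>
A minimal sketch of the doubled backslashes. It calls java.util.regex directly rather than jEdit's own search code; the class and method names are hypothetical:
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaStringEscaping {
    // The regex \bcat\b written as a Java literal: each backslash is doubled.
    // Writing "\bcat\b" instead would embed two backspace characters (U+0008).
    static final Pattern WHOLE_WORD_CAT = Pattern.compile("\\bcat\\b");

    // The regex \\ (one literal backslash) needs four backslashes in Java source.
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // Counts occurrences of "cat" as a whole word.
    static int countWholeWord(String text) {
        Matcher m = WHOLE_WORD_CAT.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True only if text consists of exactly one backslash character.
    static boolean isSingleBackslash(String text) {
        return ONE_BACKSLASH.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("cat concatenate cats cat")); // prints 2
        System.out.println(isSingleBackslash("\\"));                  // prints true
        System.out.println("\\b".length() + " vs " + "\b".length());  // prints 2 vs 1
    }
}
```
]]></programlisting>
<para>
In jEdit's search dialog the same patterns are typed with single backslashes; the doubling is only needed inside Java, BeanShell, or properties source.
</para>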
<para>
Within a regular expression, the following characters and escape sequences have special meaning:
</para>
<bridgehead renderas="sect3">Positional Operators</bridgehead>
<itemizedlist>
<listitem><para><literal>^</literal> matches at the beginning of a line</para></listitem>
<listitem><para><literal>$</literal> matches at the end of a line</para></listitem>
<listitem><para><literal>\b</literal> matches at a word boundary</para></listitem>
<listitem><para><literal>\B</literal> matches at any position that is not a word boundary</para></listitem>
</itemizedlist>
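<para>
The sketch below exercises these operators with java.util.regex directly. It assumes the <literal>Pattern.MULTILINE</literal> flag, which gives <literal>^</literal> and <literal>$</literal> the per-line behaviour described above; it does not show how jEdit itself compiles search patterns, and the class name is hypothetical:
</para>
<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositionalOperators {
    // Collects every match of regex in text. Pattern.MULTILINE makes ^ and $
    // match at the start and end of each line instead of only at the start
    // and end of the whole input.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        List<String> found = new ArrayList<String>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "import foo\nexport bar\nimport baz";
        System.out.println(findAll("^import \\w+", text));       // [import foo, import baz]
        System.out.println(findAll("\\w+$", text));              // [foo, bar, baz]
        // \b: "cat" standing alone as a word.
        System.out.println(findAll("\\bcat\\b", "cat concat"));  // [cat]
        // \B: "cat" preceded by another word character, i.e. inside a word.
        System.out.println(findAll("\\w*\\Bcat", "cat concat")); // [concat]
    }
}
```
]]></programlisting>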
<bridgehead renderas="sect3">One-Character Operators</bridgehead>
<itemizedlist>
<listitem><para><literal>.</literal> matches any single character</para></listitem>
<listitem><para><literal>\d</literal> matches any decimal digit</para></listitem>
<listitem><para><literal>\D</literal> matches any non-digit</para></listitem>
<listitem><para><literal>\n</literal> matches the newline character</para></listitem>
<listitem><para><literal>\s</literal> matches any whitespace character</para></listitem>
<listitem><para><literal>\S</literal> matches any non-whitespace character</para></listitem>
<listitem><para><literal>\t</literal> matches a horizontal tab character</para></listitem>
<listitem><para><literal>\w</literal> matches any word character (an ASCII letter, a digit, or the underscore)</para></listitem>
<listitem><para><literal>\W</literal> matches any non-word character</para></listitem>
<listitem><para><literal>\\</literal> matches the backslash
(<quote>\</quote>) character</para></listitem>
</itemizedlist>
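<para>
A few whole-string matches illustrate the one-character operators. This is a plain java.util.regex sketch with hypothetical names. Note that with Pattern's default flags, <literal>.</literal> does not match a line terminator:
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Pattern;

class OneCharacterOperators {
    // Pattern.matches succeeds only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\d\\d:\\d\\d", "09:45"));    // true
        System.out.println(matchesWhole("\\D+", "no digits"));         // true
        System.out.println(matchesWhole("\\w+", "snake_case42"));      // true: '_' is a word character
        System.out.println(matchesWhole("\\w+", "kebab-case"));        // false: '-' is not
        System.out.println(matchesWhole("\\S+\\s\\S+", "key value"));  // true
        System.out.println(matchesWhole("\\S+\\t\\S+", "key\tvalue")); // true: a real tab in the input
        System.out.println(matchesWhole("a.c", "a-c"));                // true
        System.out.println(matchesWhole("a.c", "a\nc"));               // false with default flags
        System.out.println(matchesWhole("C:\\\\temp", "C:\\temp"));    // true: \\ matches one backslash
    }
}
```
]]></programlisting>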
<bridgehead renderas="sect3">Character Class Operator</bridgehead>
<itemizedlist>
<listitem><para><literal>[<replaceable>abc</replaceable>]</literal> matches
any character in the set <replaceable>a</replaceable>, <replaceable>b</replaceable> or
<replaceable>c</replaceable></para></listitem>
<listitem><para><literal>[^<replaceable>abc</replaceable>]</literal> matches
any character not in the set <replaceable>a</replaceable>, <replaceable>b</replaceable> or
<replaceable>c</replaceable></para></listitem>
<listitem><para><literal>[<replaceable>a-z</replaceable>]</literal> matches
any character in the range <replaceable>a</replaceable> to <replaceable>z</replaceable>, inclusive.
A leading or trailing dash will be interpreted literally</para></listitem>
</itemizedlist>
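<para>
The sketch below tests a few character classes against whole strings, including the literal leading and trailing dash. It uses java.util.regex directly; the class name is hypothetical:
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Pattern;

class CharacterClasses {
    // Pattern.matches succeeds only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[abc]+", "cab"));         // true
        System.out.println(matchesWhole("[abc]+", "cabbage"));     // false: 'g' and 'e' are not in the set
        System.out.println(matchesWhole("[^0-9]+", "no digits"));  // true: no character is a digit
        System.out.println(matchesWhole("[a-f0-9]+", "c0ffee"));   // true: two ranges in one class
        System.out.println(matchesWhole("[a-z]", "-"));            // false: here '-' forms a range
        System.out.println(matchesWhole("[-+]?\\d+", "-42"));      // true: leading '-' is literal
        System.out.println(matchesWhole("[a-z-]+", "well-known")); // true: trailing '-' is literal
    }
}
```
]]></programlisting>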
<bridgehead renderas="sect3">Subexpressions and Backreferences</bridgehead>
<itemizedlist>
<listitem><para><literal>(<replaceable>abc</replaceable>)</literal> matches
whatever the expression <replaceable>abc</replaceable> would match, and saves it as a subexpression.
Also used for grouping</para></listitem>
<listitem><para><literal>(?:<replaceable>...</replaceable>)</literal> pure
grouping operator; does not save its contents</para></listitem>
<listitem><para><literal>(?#<replaceable>...</replaceable>)</literal> is an embedded
comment in some other regular expression dialects, but java.util.regex.Pattern does not
support it. To annotate a pattern, enable comments mode with <literal>(?x)</literal>: whitespace
in the pattern is then ignored, and <literal>#</literal> starts a comment that runs to the end
of the line</para></listitem>
<listitem><para><literal>(?=<replaceable>...</replaceable>)</literal> positive
lookahead; the regular expression matches only if the expression in the parentheses
matches at this point, but the text it matches is not included in the overall match</para></listitem>
<listitem><para><literal>(?!<replaceable>...</replaceable>)</literal> negative
lookahead; the regular expression matches only if the expression in the parentheses
does not match at this point; no text is consumed by the lookahead</para></listitem>
<listitem><para><literal>\<replaceable>n</replaceable></literal>, where 0 &lt;
<replaceable>n</replaceable> &lt; 10, matches the same text that the <replaceable>n</replaceable>th
subexpression matched. Can only be used in the search string</para></listitem>
<listitem><para><literal>$<replaceable>n</replaceable></literal>, where 0 &lt;
<replaceable>n</replaceable> &lt; 10, is replaced with the text matched by the <replaceable>n</replaceable>th
subexpression. Can only be used in the replacement string</para></listitem>
</itemizedlist>
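<para>
The following sketch uses Java's <literal>Matcher.replaceAll</literal> and <literal>String.replaceAll</literal>, whose replacement strings also use <literal>$n</literal>. It illustrates the operators above with java.util.regex; it is not jEdit's search-and-replace code, and the class and method names are hypothetical:
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsAndBackreferences {
    // \1 in the pattern must match the same text that group 1 captured,
    // so this finds a word that is immediately repeated, such as "is is".
    static final Pattern DOUBLED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    // $1 in the replacement inserts the text captured by group 1.
    static String removeDoubledWords(String text) {
        return DOUBLED_WORD.matcher(text).replaceAll("$1");
    }

    // Two capturing groups, swapped in the replacement: "Doe, John" becomes "John Doe".
    static String swapName(String name) {
        return name.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(removeDoubledWords("this is is a test test")); // this is a test
        System.out.println(swapName("Doe, John"));                        // John Doe
        // (?:...) groups without capturing, so (c) is still group 1.
        Matcher m = Pattern.compile("(?:ab)+(c)").matcher("ababc");
        System.out.println(m.matches() + " " + m.group(1));               // true c
        // Positive lookahead: a ':' must follow, but it is not part of the match.
        System.out.println(firstMatch("\\w+(?=:)", "host: example"));    // host
        // Negative lookahead: skip words that start with "test".
        System.out.println(firstMatch("\\b(?!test)\\w+", "testing tested ok")); // ok
    }
}
```
]]></programlisting>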
<bridgehead renderas="sect3">Branching (Alternation) Operator</bridgehead>
<itemizedlist>
<listitem><para><literal><replaceable>a</replaceable>|<replaceable>b</replaceable></literal>
matches whatever the expression <replaceable>a</replaceable> would match, or whatever
the expression <replaceable>b</replaceable> would match.</para></listitem>
</itemizedlist>
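<para>
The sketch below shows that <literal>|</literal> separates whole expressions unless parentheses limit it. It also shows that java.util.regex tries alternatives from left to right. The class name is hypothetical:
</para>
<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Alternation {
    // Collects every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<String>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("cat|dog", "cat, dog, bird"));    // [cat, dog]
        // | splits the whole expression: this means "gra" or "ey".
        System.out.println(findAll("gra|ey", "gray grey"));          // [gra, ey]
        // A group confines the alternation to the part inside it.
        System.out.println(findAll("gr(?:a|e)y", "gray grey groy")); // [gray, grey]
        // Alternatives are tried left to right; the first that succeeds wins.
        System.out.println(findAll("in|inside", "inside"));          // [in]
    }
}
```
]]></programlisting>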
<bridgehead renderas="sect3">Repeating Operators</bridgehead>
<para>
These symbols apply to the preceding atom: a single character or escape sequence, a character class, or a parenthesized group.
</para>
<itemizedlist>
<listitem><para><literal>?</literal> matches zero or one occurrence of the preceding
expression (that is, the expression or the empty string)</para></listitem>
<listitem><para><literal>*</literal> matches the empty string or any number of repetitions
of the preceding expression</para></listitem>
<listitem><para><literal>+</literal> matches one or more repetitions of the preceding
expression</para></listitem>
<listitem><para><literal>{<replaceable>m</replaceable>}</literal> matches exactly
<replaceable>m</replaceable> repetitions of the preceding expression</para></listitem>
<listitem><para><literal>{<replaceable>m</replaceable>,<replaceable>n</replaceable>}</literal>
matches between <replaceable>m</replaceable> and <replaceable>n</replaceable> repetitions of the preceding
expression, inclusive</para></listitem>
<listitem><para><literal>{<replaceable>m</replaceable>,}</literal> matches
<replaceable>m</replaceable> or more repetitions of the preceding expression</para></listitem>
</itemizedlist>
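<para>
A short java.util.regex sketch of each repeating operator, matched against whole strings. The last two lines show that a repeat binds only to the preceding atom. The class name is hypothetical:
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Pattern;

class RepeatingOperators {
    // Pattern.matches succeeds only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("colou?r", "color"));     // true: zero 'u'
        System.out.println(matchesWhole("colou?r", "colour"));    // true: one 'u'
        System.out.println(matchesWhole("ab*c", "ac"));           // true: * allows zero 'b'
        System.out.println(matchesWhole("ab+c", "ac"));           // false: + needs at least one
        System.out.println(matchesWhole("\\d{4}", "2024"));       // true: exactly four digits
        System.out.println(matchesWhole("\\d{2,3}", "1234"));     // false: more than three
        System.out.println(matchesWhole("(?:ha){2,}", "hahaha")); // true: two or more "ha"
        // The + in ab+ repeats only 'b'; a group makes it repeat "ab".
        System.out.println(matchesWhole("ab+", "abab"));          // false
        System.out.println(matchesWhole("(?:ab)+", "abab"));      // true
    }
}
```
]]></programlisting>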
<bridgehead renderas="sect3">Stingy (Minimal) Matching</bridgehead>
<para>
If a repeating operator (above) is immediately followed by a
<literal>?</literal>, the repeating operator stops at the smallest
number of repetitions that still allows the rest of the pattern to match.
The java.util.regex documentation calls these <quote>reluctant</quote> quantifiers.
</para>
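<para>
Comparing a greedy and a stingy repeat on the same input shows the difference. This is a plain java.util.regex sketch with hypothetical names:
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StingyMatching {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: .+ runs to the last '>' on the line.
        System.out.println(firstMatch("<.+>", html));      // <b>bold</b>
        // Stingy: .+? stops at the first '>' that completes the match.
        System.out.println(firstMatch("<.+?>", html));     // <b>
        System.out.println(firstMatch("a{2,4}", "aaaa"));  // aaaa
        System.out.println(firstMatch("a{2,4}?", "aaaa")); // aa
    }
}
```
]]></programlisting>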
</appendix>