<!-- jEdit buffer-local properties: -->
<!-- :indentSize=1:noTabs=yes: -->
<appendix id="regexps"><title>Regular Expressions</title>
<para>
jEdit uses regular expressions to implement inexact search and replace.
A regular expression is a string in which some
characters have special meaning for pattern matching.
</para>
<para>
Within a regular expression, the following characters have special meaning:
</para>
<bridgehead renderas="sect3">Positional Operators</bridgehead>
<itemizedlist>
<listitem><para><literal>^</literal> matches at the beginning of a line</para></listitem>
<listitem><para><literal>$</literal> matches at the end of a line</para></listitem>
<listitem><para><literal>\b</literal> matches at a word break</para></listitem>
<listitem><para><literal>\B</literal> matches at a non-word break</para></listitem>
<listitem><para><literal>\&lt;</literal> matches at the start of a word</para></listitem>
<listitem><para><literal>\&gt;</literal> matches at the end of a word</para></listitem>
</itemizedlist>
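<para>
As a rough illustration of the positional operators, the sketch below uses
Java's <literal>java.util.regex</literal> package as a stand-in engine. This
is an assumption: jEdit's own engine may differ in detail, and
<literal>java.util.regex</literal> has no <literal>\&lt;</literal> or
<literal>\&gt;</literal> operators, so those two are left out. The class and
method names are hypothetical.
</para>
<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of jEdit.
final class PositionalDemo {
    // Returns the start offset of every match of regex in text.
    // MULTILINE makes ^ and $ match at line boundaries, not only at
    // the start and end of the whole input.
    static List<Integer> starts(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        List<Integer> offsets = new ArrayList<>();
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "cat concat\ncatalog cat";
        System.out.println(starts("^cat", text));       // [0, 11]
        System.out.println(starts("cat$", text));       // [7, 19]
        System.out.println(starts("\\bcat\\b", text));  // [0, 19]
        System.out.println(starts("\\Bcat", text));     // [7]
    }
}
```
]]></programlisting>
<para>
Only the whole-word pattern skips the <quote>cat</quote> inside
<quote>concat</quote> and <quote>catalog</quote>, while
<literal>\B</literal> finds only the occurrence that is preceded by another
word character.
</para>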
<bridgehead renderas="sect3">One-Character Operators</bridgehead>
<itemizedlist>
<listitem><para><literal>.</literal> matches any single character</para></listitem>
<listitem><para><literal>\d</literal> matches any decimal digit</para></listitem>
<listitem><para><literal>\D</literal> matches any non-digit</para></listitem>
<listitem><para><literal>\n</literal> matches the newline character</para></listitem>
<listitem><para><literal>\s</literal> matches any whitespace character</para></listitem>
<listitem><para><literal>\S</literal> matches any non-whitespace character</para></listitem>
<listitem><para><literal>\t</literal> matches a horizontal tab character</para></listitem>
<listitem><para><literal>\w</literal> matches any word (alphanumeric) character</para></listitem>
<listitem><para><literal>\W</literal> matches any non-word (non-alphanumeric)
character</para></listitem>
<listitem><para><literal>\\</literal> matches the backslash
(<quote>\</quote>) character</para></listitem>
</itemizedlist>
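<para>
The one-character operators can be tried out with a small sketch like the
following. It assumes Java's <literal>java.util.regex</literal> engine (via
<literal>String.replaceAll</literal>) as a stand-in, which may not agree with
jEdit's engine in every detail; the class and method names are hypothetical.
Note that a backslash has to be doubled inside a Java string literal.
</para>
<programlisting><![CDATA[
```java
// Hypothetical demo class; not part of jEdit.
final class OneCharDemo {
    // Replaces every character matched by the operator with '#',
    // which makes it easy to see what the operator selects.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String input = "A1 b2\t-";  // A, 1, space, b, 2, tab, dash
        System.out.println(mask("\\d", input));  // digits become '#'
        System.out.println(mask("\\s", input));  // A1#b2#-
        System.out.println(mask("\\W", input));  // A1#b2##
        System.out.println(mask("\\t", input));  // only the tab becomes '#'
        // The regex \\ (written "\\\\" in Java source) matches one backslash.
        System.out.println(mask("\\\\", "a\\b"));  // a#b
    }
}
```
]]></programlisting>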
<bridgehead renderas="sect3">Character Class Operator</bridgehead>
<itemizedlist>
<listitem><para><literal>[<replaceable>abc</replaceable>]</literal> matches
any character in
the set <replaceable>a</replaceable>, <replaceable>b</replaceable> or
<replaceable>c</replaceable></para></listitem>
<listitem><para><literal>[^<replaceable>abc</replaceable>]</literal> matches
any character not
in the set <replaceable>a</replaceable>, <replaceable>b</replaceable> or
<replaceable>c</replaceable></para></listitem>
<listitem><para><literal>[<replaceable>a-z</replaceable>]</literal> matches
any character in the
range <replaceable>a</replaceable> to <replaceable>z</replaceable>, inclusive.
A leading or trailing dash will be interpreted literally</para></listitem>
</itemizedlist>
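<para>
A minimal sketch of sets, negated sets and ranges, again assuming Java's
<literal>java.util.regex</literal> engine as a stand-in for jEdit's own. The
hypothetical <literal>strip</literal> helper deletes every character that the
class matches, so what remains is what the class did not select.
</para>
<programlisting><![CDATA[
```java
// Hypothetical demo class; not part of jEdit.
final class CharClassDemo {
    // Deletes every character matched by the character class.
    static String strip(String charClass, String input) {
        return input.replaceAll(charClass, "");
    }

    public static void main(String[] args) {
        String text = "regular expression";
        System.out.println(strip("[aeiou]", text));   // rglr xprssn
        System.out.println(strip("[^aeiou]", text));  // euaeeio
        System.out.println(strip("[a-m]", text));     // rur xprsson
        // The trailing dash is literal, so digits and dashes are removed.
        System.out.println(strip("[0-9-]", "555-0100 ext"));  // " ext"
    }
}
```
]]></programlisting>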
<para>
Within a character class expression, the following sequences have special meaning:
</para>
<itemizedlist>
<listitem><para><literal>[:alnum:]</literal> Any alphanumeric
character</para></listitem>
<listitem><para><literal>[:alpha:]</literal> Any alphabetical character</para></listitem>
<listitem><para><literal>[:blank:]</literal> A space or horizontal tab</para></listitem>
<listitem><para><literal>[:cntrl:]</literal> A control character</para></listitem>
<listitem><para><literal>[:digit:]</literal> A decimal digit</para></listitem>
<listitem><para><literal>[:graph:]</literal> A non-space, non-control character</para></listitem>
<listitem><para><literal>[:lower:]</literal> A lowercase letter</para></listitem>
<listitem><para><literal>[:print:]</literal> Same as <literal>[:graph:]</literal>, but also space and tab</para></listitem>
<listitem><para><literal>[:punct:]</literal> A punctuation character</para></listitem>
<listitem><para><literal>[:space:]</literal> Any whitespace character, including newlines</para></listitem>
<listitem><para><literal>[:upper:]</literal> An uppercase letter</para></listitem>
<listitem><para><literal>[:xdigit:]</literal> A valid hexadecimal digit</para></listitem>
</itemizedlist>
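<para>
The named classes above can be approximated in Java, with one caveat:
<literal>java.util.regex</literal> does not use the
<literal>[:name:]</literal> spelling. Its closest equivalents are written
<literal>\p{Alnum}</literal>, <literal>\p{Alpha}</literal>,
<literal>\p{Punct}</literal>, <literal>\p{XDigit}</literal> and so on. The
sketch below therefore shows the Java spelling only, as an assumed stand-in
for jEdit's engine; the class and method names are hypothetical.
</para>
<programlisting><![CDATA[
```java
// Hypothetical demo class; not part of jEdit.
final class NamedClassDemo {
    // A letter or underscore, then letters, digits or underscores.
    // Named classes are combined with other members inside [ ].
    static boolean isIdentifier(String s) {
        return s.matches("[\\p{Alpha}_][\\p{Alnum}_]*");
    }

    // One or more hexadecimal digits and nothing else.
    static boolean isHex(String s) {
        return s.matches("\\p{XDigit}+");
    }

    // Removes every punctuation character.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("x1"));           // true
        System.out.println(isIdentifier("1x"));           // false
        System.out.println(isHex("CAFE42"));              // true
        System.out.println(stripPunct("Hello, world!"));  // Hello world
    }
}
```
]]></programlisting>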
<bridgehead renderas="sect3">Subexpressions and Backreferences</bridgehead>
<itemizedlist>
<listitem><para><literal>(<replaceable>abc</replaceable>)</literal> matches
whatever the expression
<replaceable>abc</replaceable> would match, and saves it as a subexpression.
Also used for grouping</para></listitem>
<listitem><para><literal>(?:<replaceable>...</replaceable>)</literal> pure
grouping operator, does not
save contents</para></listitem>
<listitem><para><literal>(?#<replaceable>...</replaceable>)</literal> embedded
comment, ignored by engine</para></listitem>
<listitem><para><literal>(?=<replaceable>...</replaceable>)</literal> positive
lookahead; the regular expression matches only if the expression in the
parentheses matches at that position, but the text it matches is not
considered part of the match</para></listitem>
<listitem><para><literal>(?!<replaceable>...</replaceable>)</literal> negative
lookahead; the regular expression matches only if the expression in the
parentheses does not match at that position; no text is consumed</para></listitem>
<listitem><para><literal>\<replaceable>n</replaceable></literal> where 0 &lt;
<replaceable>n</replaceable> &lt; 10,
matches the same thing the <replaceable>n</replaceable>th
subexpression matched. Can only be used in the search string</para></listitem>
<listitem><para><literal>$<replaceable>n</replaceable></literal> where 0 &lt;
<replaceable>n</replaceable> &lt; 10,
substituted with the text matched by the <replaceable>n</replaceable>th
subexpression. Can only be used in the replacement string</para></listitem>
</itemizedlist>
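<para>
Subexpressions, backreferences and lookahead are easiest to follow in a
search and replace. The sketch below assumes Java's
<literal>java.util.regex</literal> engine as a stand-in; it happens to use
the same <literal>\<replaceable>n</replaceable></literal> and
<literal>$<replaceable>n</replaceable></literal> notation, but jEdit's engine
may differ in detail. All class and method names are hypothetical, and the
embedded comment form is not exercised.
</para>
<programlisting><![CDATA[
```java
// Hypothetical demo class; not part of jEdit.
final class BackrefDemo {
    // \1 in the search string matches the same text as group 1,
    // so this collapses a doubled word into a single one.
    static String dedupe(String s) {
        return s.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // $1 and $2 in the replacement string insert the saved groups.
    static String swap(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    // (?:...) groups without saving, so the name is still group 1.
    static String stripTitle(String s) {
        return s.replaceAll("(?:Mr|Ms) (\\w+)", "$1");
    }

    // Positive lookahead: digits must be followed by "px",
    // but "px" itself is not part of the match and is kept.
    static String maskPx(String s) {
        return s.replaceAll("\\d+(?=px)", "N");
    }

    // Negative lookahead: a 'q' that is not followed by 'u'.
    static String markLoneQ(String s) {
        return s.replaceAll("q(?!u)", "Q");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("it is is fine"));             // it is fine
        System.out.println(swap("key=value"));                   // value=key
        System.out.println(stripTitle("Mr Smith met Ms Jones")); // Smith met Jones
        System.out.println(maskPx("12px 30em 7px"));             // Npx 30em Npx
        System.out.println(markLoneQ("qatar quit iraq"));        // Qatar quit iraQ
    }
}
```
]]></programlisting>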
<bridgehead renderas="sect3">Branching (Alternation) Operator</bridgehead>
<itemizedlist>
<listitem><para><literal><replaceable>a</replaceable>|<replaceable>b</replaceable></literal>
matches whatever the expression <replaceable>a</replaceable> would match, or whatever
the expression <replaceable>b</replaceable> would match.</para></listitem>
</itemizedlist>
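<para>
A short sketch of alternation, assuming Java's
<literal>java.util.regex</literal> engine as a stand-in for jEdit's own; the
class and method names are hypothetical. The second method shows alternation
limited to part of a pattern by grouping.
</para>
<programlisting><![CDATA[
```java
// Hypothetical demo class; not part of jEdit.
final class AlternationDemo {
    // Either whole alternative may match at any position.
    static String tag(String s) {
        return s.replaceAll("cat|dog", "pet");
    }

    // Parentheses restrict the choice to a single letter.
    static boolean isGrey(String s) {
        return s.matches("gr(a|e)y");
    }

    public static void main(String[] args) {
        System.out.println(tag("cat dog bird"));  // pet pet bird
        System.out.println(isGrey("gray"));       // true
        System.out.println(isGrey("grey"));       // true
        System.out.println(isGrey("gry"));        // false
    }
}
```
]]></programlisting>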
<bridgehead renderas="sect3">Repeating Operators</bridgehead>
<para>
These symbols operate on the previous atomic expression.
</para>
<itemizedlist>
<listitem><para><literal>?</literal> matches the preceding expression or the
null string</para></listitem>
<listitem><para><literal>*</literal> matches the null string or any number of repetitions
of the preceding expression</para></listitem>
<listitem><para><literal>+</literal> matches one or more repetitions of the preceding
expression</para></listitem>
<listitem><para><literal>{<replaceable>m</replaceable>}</literal> matches exactly
<replaceable>m</replaceable>
repetitions of the preceding expression</para></listitem>
<listitem><para><literal>{<replaceable>m</replaceable>,<replaceable>n</replaceable>}</literal>
matches between
<replaceable>m</replaceable> and <replaceable>n</replaceable> repetitions of the preceding
expression, inclusive</para></listitem>
<listitem><para><literal>{<replaceable>m</replaceable>,}</literal> matches
<replaceable>m</replaceable> or more
repetitions of the preceding expression</para></listitem>
</itemizedlist>
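<para>
The repeating operators can be checked against whole strings with a sketch
like the one below. It assumes Java's <literal>java.util.regex</literal>
engine as a stand-in for jEdit's own, and the class and method names are
hypothetical. The last case shows that a parenthesized group counts as one
atomic expression.
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of jEdit.
final class RepeatDemo {
    // True only when the entire input matches the pattern.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));    // true: zero 'u'
        System.out.println(fullMatch("colou?r", "colour"));   // true: one 'u'
        System.out.println(fullMatch("ab*", "a"));            // true: zero 'b'
        System.out.println(fullMatch("ab+", "a"));            // false: needs a 'b'
        System.out.println(fullMatch("\\d{4}", "2024"));      // true
        System.out.println(fullMatch("\\d{2,4}", "12345"));   // false: five digits
        System.out.println(fullMatch("\\d{2,}", "12345"));    // true
        System.out.println(fullMatch("(ab){2}", "abab"));     // true
    }
}
```
]]></programlisting>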
<bridgehead renderas="sect3">Stingy (Minimal) Matching</bridgehead>
<para>
If a repeating operator (above) is immediately followed by a
<literal>?</literal>, the repeating operator will stop at the smallest
number of repetitions that can complete the rest of the match.
</para>
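<para>
The difference between the default and the stingy form shows up when a
pattern could end in more than one place. The sketch below assumes Java's
<literal>java.util.regex</literal> engine as a stand-in for jEdit's own; the
class and method names are hypothetical.
</para>
<programlisting><![CDATA[
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of jEdit.
final class StingyDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "<b>one</b> <b>two</b>";
        // Default: .* takes as much as it can, up to the last </b>.
        System.out.println(firstMatch("<b>.*</b>", text));   // <b>one</b> <b>two</b>
        // Stingy: .*? stops at the first </b> that completes the match.
        System.out.println(firstMatch("<b>.*?</b>", text));  // <b>one</b>
        System.out.println(firstMatch("a{2,4}", "aaaaa"));   // aaaa
        System.out.println(firstMatch("a{2,4}?", "aaaaa"));  // aa
    }
}
```
]]></programlisting>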
</appendix>