/jEdit/tags/jedit-4-3-pre14/doc/users-guide/writing-modes.xml

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <chapter id="writing-modes">
  3. <title>Mode Definition Syntax</title>
  4. <!-- jEdit buffer-local properties: -->
  5. <!-- :indentSize=1:noTabs=true: -->
  6. <!-- :xml.root=users-guide.xml: -->
  7. <para>Edit modes are defined using XML, the <firstterm>eXtensible Markup
  8. Language</firstterm>; mode files have the extension
  9. <filename>.xml</filename>. XML is a very simple language, and as a result
  10. edit modes are easy to create and modify. This section will start with a
  11. short XML primer, followed by detailed information about each supported tag
  12. and highlighting rule.</para>
  13. <para>Editing a mode or a mode catalog file within jEdit will cause the
  14. changes to take effect immediately. If you edit modes using another
  15. application, the changes will take effect after the
  16. <guimenu>Utilities</guimenu> &gt; <guisubmenu>Troubleshooting</guisubmenu>
  17. &gt; <guimenuitem>Reload Edit Modes</guimenuitem> command is invoked.</para>
  18. <sect1 id="xml-primer">
  19. <title>An XML Primer</title>
  20. <para>A very simple XML file (which also happens to be an edit mode)
  21. looks like so:</para>
  22. <programlisting>&lt;?xml version="1.0"?&gt;
  23. &lt;!DOCTYPE MODE SYSTEM "xmode.dtd"&gt;
  24. &lt;MODE&gt;
  25. &lt;PROPS&gt;
  26. &lt;PROPERTY NAME="commentStart" VALUE="/*" /&gt;
  27. &lt;PROPERTY NAME="commentEnd" VALUE="*/" /&gt;
  28. &lt;/PROPS&gt;
  29. &lt;RULES&gt;
  30. &lt;SPAN TYPE="COMMENT1"&gt;
  31. &lt;BEGIN&gt;/*&lt;/BEGIN&gt;
  32. &lt;END&gt;*/&lt;/END&gt;
  33. &lt;/SPAN&gt;
  34. &lt;/RULES&gt;
  35. &lt;/MODE&gt;</programlisting>
  36. <para>Note that each opening tag must have a corresponding closing tag.
  37. If there is nothing between the opening and closing tags, for example
  38. <literal>&lt;TAG&gt;&lt;/TAG&gt;</literal>, the shorthand notation
  39. <literal>&lt;TAG /&gt;</literal> may be used. An example of this
  40. shorthand can be seen in the <literal>&lt;PROPERTY&gt;</literal> tags
  41. above.</para>
  42. <para>XML is case sensitive. <literal>Span</literal> or
  43. <literal>span</literal> is not the same as
  44. <literal>SPAN</literal>.</para>
  45. <para>To insert a special character such as &lt; or &gt; literally in
  46. XML (for example, inside an attribute value), you must write it as an
  47. <firstterm>entity</firstterm>. An entity consists of the character's
  48. symbolic name enclosed within <quote>&amp;</quote> and <quote>;</quote>.
  49. The most frequently used entities are:</para>
  50. <itemizedlist>
  51. <listitem>
  52. <para><literal>&amp;lt;</literal> - The less-than (&lt;)
  53. character</para>
  54. </listitem>
  55. <listitem>
  56. <para><literal>&amp;gt;</literal> - The greater-than (&gt;)
  57. character</para>
  58. </listitem>
  59. <listitem>
  60. <para><literal>&amp;amp;</literal> - The ampersand (&amp;)
  61. character</para>
  62. </listitem>
  63. </itemizedlist>
  64. <para>For example, the following will cause a syntax error:</para>
  65. <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;&lt;/SEQ&gt;</programlisting>
  66. <para>Instead, you must write:</para>
  67. <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;amp;&lt;/SEQ&gt;</programlisting>
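<para>For element content (though not for attribute values), standard XML
also offers the CDATA section: everything between
<literal>&lt;![CDATA[</literal> and <literal>]]&gt;</literal> is taken
literally. As a sketch, the rule above could presumably also be written
as:</para>
<programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&lt;![CDATA[&amp;]]&gt;&lt;/SEQ&gt;</programlisting>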
  68. <para>Now that the basics of XML have been covered, the rest of this
  69. section will cover each construct in detail.</para>
  70. </sect1>
  71. <sect1 id="mode-preamble">
  72. <title>The Preamble and MODE tag</title>
  73. <para>Each mode definition must begin with the following:</para>
  74. <programlisting>&lt;?xml version="1.0"?&gt;
  75. &lt;!DOCTYPE MODE SYSTEM "xmode.dtd"&gt;</programlisting>
  76. <para>Each mode definition must also contain exactly one
  77. <literal>MODE</literal> tag. All other tags (<literal>PROPS</literal>,
  78. <literal>RULES</literal>) must be placed inside the
  79. <literal>MODE</literal> tag. The <literal>MODE</literal> tag does not
  80. have any defined attributes. Here is an example:</para>
  81. <programlisting>&lt;MODE&gt;
  82. <replaceable>... mode definition goes here ...</replaceable>
  83. &lt;/MODE&gt;</programlisting>
  84. </sect1>
  85. <sect1 id="mode-tag-props">
  86. <title>The PROPS Tag</title>
  87. <para>The <literal>PROPS</literal> tag and the
  88. <literal>PROPERTY</literal> tags inside it are used to define
  89. mode-specific properties. Each <literal>PROPERTY</literal> tag must have
  90. a <literal>NAME</literal> attribute set to the property's name, and a
  91. <literal>VALUE</literal> attribute with the property's value.</para>
  92. <para>All buffer-local properties listed in <xref
  93. linkend="buffer-local" /> may be given values in edit modes.</para>
  94. <para>The following mode properties specify commenting strings:</para>
  95. <itemizedlist>
  96. <listitem>
  97. <para><literal>commentEnd</literal> - the comment end string,
  98. used by the <guimenuitem>Range Comment</guimenuitem>
  99. command.</para>
  100. </listitem>
  101. <listitem>
  102. <para><literal>commentStart</literal> - the comment start
  103. string, used by the <guimenuitem>Range Comment</guimenuitem>
  104. command.</para>
  105. </listitem>
  106. <listitem>
  107. <para><literal>lineComment</literal> - the line comment string,
  108. used by the <guimenuitem>Line Comment</guimenuitem>
  109. command.</para>
  110. </listitem>
  111. </itemizedlist>
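<para>As a minimal sketch, a mode for a hypothetical language that only
has <literal>#</literal> line comments (an assumption made purely for
illustration) would need just one of these properties:</para>
<programlisting>&lt;PROPS&gt;
    &lt;!-- hypothetical language: line comments only --&gt;
    &lt;PROPERTY NAME="lineComment" VALUE="#" /&gt;
&lt;/PROPS&gt;</programlisting>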
  112. <para>When performing auto indent, a number of mode properties determine
  113. the resulting indent level:</para>
  114. <itemizedlist>
  115. <listitem>
  116. <para>The line and the one before it are scanned for brackets
  117. listed in the <literal>indentCloseBrackets</literal> and
  118. <literal>indentOpenBrackets</literal> properties. Opening
  119. brackets in the previous line increase indent.</para>
  120. <para>If <literal>lineUpClosingBracket</literal> is set to
  121. <literal>true</literal>, then closing brackets on the current
  122. line will line up with the line containing the matching opening
  123. bracket. For example, in Java mode
  124. <literal>lineUpClosingBracket</literal> is set to
  125. <literal>true</literal>, resulting in brackets being indented
  126. like so:</para>
<programlisting>{
    // Code
    {
        // More code
    }
}</programlisting>
  133. <para>If <literal>lineUpClosingBracket</literal> is set to
  134. <literal>false</literal>, the line <emphasis>after</emphasis> a
  135. closing bracket will be lined up with the line containing the
  136. matching opening bracket. For example, in Lisp mode
  137. <literal>lineUpClosingBracket</literal> is set to
  138. <literal>false</literal>, resulting in brackets being indented
  139. like so:</para>
<programlisting>(foo 'a-parameter
    (crazy-p)
    (bar baz ()))
(print "hello world")</programlisting>
  144. </listitem>
  145. <listitem>
  146. <para>If the previous line contains no opening brackets, or if
  147. the <literal>doubleBracketIndent</literal> property is set to
  148. <literal>true</literal>, the previous line is checked against
  149. the regular expressions in the <literal>indentNextLine</literal>
  150. and <literal>indentNextLines</literal> properties. If the
  151. previous line matches the former, the indent of the current line
  152. is increased and the subsequent line is shifted back again. If
  153. the previous line matches the latter, the indent of the current
  154. and subsequent lines is increased.</para>
  155. <para>In Java mode, for example, the
  156. <literal>indentNextLine</literal> property is set to match
  157. control structures such as <quote>if</quote>,
  158. <quote>else</quote>, <quote>while</quote>, and so on.</para>
<para>The <literal>doubleBracketIndent</literal> property, if
set to the default of <literal>false</literal>, results in code
indented like so:</para>
<programlisting>while(objects.hasNext())
{
    Object next = objects.next();
    if(next instanceof Paintable)
        ((Paintable)next).paint(g);
}</programlisting>
<para>On the other hand, setting this property to
<quote>true</quote> will give the following result:</para>
<programlisting>while(objects.hasNext())
    {
        Object next = objects.next();
        if(next instanceof Paintable)
            ((Paintable)next).paint(g);
    }</programlisting>
  176. </listitem>
  177. </itemizedlist>
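<para>As a rough sketch of the <literal>indentNextLines</literal>
property, consider a hypothetical language in which a line ending in a
colon opens a block. The regular expression below is an illustrative
assumption, not taken from any mode shipped with jEdit:</para>
<programlisting>&lt;PROPS&gt;
    &lt;!-- hypothetical: indent all lines following a line that ends in ":" --&gt;
    &lt;PROPERTY NAME="indentNextLines" VALUE=".*:\s*" /&gt;
&lt;/PROPS&gt;</programlisting>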
  178. <para>Here is the complete <literal>&lt;PROPS&gt;</literal> tag for Java
  179. mode:</para>
  180. <programlisting>&lt;PROPS&gt;
  181. &lt;PROPERTY NAME="commentStart" VALUE="/*" /&gt;
  182. &lt;PROPERTY NAME="commentEnd" VALUE="*/" /&gt;
  183. &lt;PROPERTY NAME="lineComment" VALUE="//" /&gt;
  184. &lt;PROPERTY NAME="wordBreakChars" VALUE=",+-=&amp;lt;&amp;gt;/?^&amp;amp;*" /&gt;
  185. &lt;!-- Auto indent --&gt;
  186. &lt;PROPERTY NAME="indentOpenBrackets" VALUE="{" /&gt;
  187. &lt;PROPERTY NAME="indentCloseBrackets" VALUE="}" /&gt;
  188. &lt;PROPERTY NAME="unalignedOpenBrackets" VALUE="(" /&gt;
  189. &lt;PROPERTY NAME="unalignedCloseBrackets" VALUE=")" /&gt;
  190. &lt;PROPERTY NAME="indentNextLine"
  191. VALUE="\s*(((if|while)\s*\(|else\s*|else\s+if\s*\(|for\s*\(.*\))[^{;]*)" /&gt;
  192. &lt;PROPERTY NAME="unindentThisLine"
  193. VALUE="^.*(default:\s*|case.*:.*)$" /&gt;
  194. &lt;PROPERTY NAME="electricKeys" VALUE=":" /&gt;
  195. &lt;!-- set this to 'true' if you want to use GNU coding style --&gt;
  196. &lt;PROPERTY NAME="doubleBracketIndent" VALUE="false" /&gt;
  197. &lt;PROPERTY NAME="lineUpClosingBracket" VALUE="true" /&gt;
  198. &lt;/PROPS&gt;</programlisting>
  199. </sect1>
  200. <sect1 id="mode-tag-rules">
  201. <title>The RULES Tag</title>
  202. <para><literal>RULES</literal> tags must be placed inside the
  203. <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
  204. <firstterm>ruleset</firstterm>. A ruleset consists of a number of
  205. <firstterm>parser rules</firstterm>, with each parser rule specifying
  206. how to highlight a specific syntax token. There must be at least one
  207. ruleset in each edit mode. There can also be more than one, with
  208. different rulesets being used to highlight different parts of a buffer
(for example, in HTML mode, one ruleset highlights HTML tags, and
  210. another highlights inline JavaScript). For information about using more
  211. than one ruleset, see <xref linkend="mode-rule-span" />.</para>
  212. <para>The <literal>RULES</literal> tag supports the following
  213. attributes, all of which are optional:</para>
  214. <itemizedlist>
  215. <listitem>
  216. <para><literal>SET</literal> - the name of this ruleset. All
  217. rulesets other than the first must have a name.</para>
  218. </listitem>
  219. <listitem>
  220. <para><literal>IGNORE_CASE</literal> - if set to
  221. <literal>FALSE</literal>, matches will be case sensitive.
  222. Otherwise, case will not matter. Default is
  223. <literal>TRUE</literal>.</para>
  224. </listitem>
  225. <listitem>
<para><literal>ESCAPE</literal> - specifies a character sequence
for escaping literals. The first character following the escape
sequence is not considered as input for syntax highlighting, and
is therefore highlighted with the default token for the ruleset.
</para>
  231. </listitem>
  232. <listitem>
  233. <para><literal>NO_WORD_SEP</literal> - any non-alphanumeric
  234. character <emphasis>not</emphasis> in this list is treated as a
  235. word separator for the purposes of syntax highlighting.</para>
  236. </listitem>
  237. <listitem>
  238. <para><literal>DEFAULT</literal> - the token type for text which
  239. doesn't match any specific rule. Default is
  240. <literal>NULL</literal>. See <xref
  241. linkend="mode-syntax-tokens" /> for a list of token
  242. types.</para>
  243. </listitem>
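<listitem>
<para><literal>HIGHLIGHT_DIGITS</literal> - if set to
<literal>TRUE</literal>, jEdit will attempt to highlight numbers
in this ruleset. See below for details.</para>
</listitem>
<listitem>
<para><literal>DIGIT_RE</literal> - a regular expression matching
numbers that contain characters other than digits. See below for
details.</para>
</listitem>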
  251. </itemizedlist>
  252. <para>Here is an example <literal>RULES</literal> tag:</para>
  253. <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"&gt;
  254. <replaceable>... parser rules go here ...</replaceable>
  255. &lt;/RULES&gt;</programlisting>
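<para>A second, named ruleset might look like the following sketch. The
name <literal>STRINGS</literal> and the attribute values are hypothetical
and chosen only to show the remaining attributes in use:</para>
<programlisting>&lt;RULES SET="STRINGS" DEFAULT="LITERAL1" IGNORE_CASE="FALSE" ESCAPE="\"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>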
  256. <sect2>
  257. <title>Highlighting Numbers</title>
  258. <para>If the <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
  259. <literal>TRUE</literal>, jEdit will attempt to highlight numbers in
  260. this ruleset.</para>
  261. <para>Any word consisting entirely of digits (0-9) will be
  262. highlighted with the <literal>DIGIT</literal> token type. A word
  263. that contains other letters in addition to digits will be
  264. highlighted with the <literal>DIGIT</literal> token type only if it
  265. matches the regular expression specified in the
<literal>DIGIT_RE</literal> attribute. If this attribute is not
specified, such words will not be highlighted.</para>
  268. <para>Here is an example <literal>DIGIT_RE</literal> regular
  269. expression that highlights Java-style numeric literals (normal
  270. numbers, hexadecimals prefixed with <literal>0x</literal>, numbers
  271. suffixed with various type indicators, and floating point literals
  272. containing an exponent):</para>
  273. <programlisting>DIGIT_RE="(0[lL]?|[1-9]\d{0,9}(\d{0,9}[lL])?|0[xX]\p{XDigit}{1,8}(\p{XDigit}{0,8}[lL])?|0[0-7]{1,11}([0-7]{0,11}[lL])?|([0-9]+\.[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?[fFdD]?|[0-9]+([eE][+-]?[0-9]+[fFdD]?|([eE][+-]?[0-9]+)?[fFdD]))"</programlisting>
  274. <para>Regular expression syntax is described in <xref
  275. linkend="regexps" />.</para>
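<para>A much simpler sketch, assuming a hypothetical language whose only
non-decimal literals are hexadecimal numbers prefixed with
<literal>0x</literal>, could look like this:</para>
<programlisting>&lt;RULES HIGHLIGHT_DIGITS="TRUE" DIGIT_RE="0[xX]\p{XDigit}+"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>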
  276. </sect2>
  277. <sect2 id="rule-ordering">
  278. <title>Rule Ordering Requirements</title>
  279. <para>You might encounter this very common pitfall when writing your
  280. own modes.</para>
  281. <para>Since jEdit checks buffer text against parser rules in the
  282. order they appear in the ruleset, more specific rules must be placed
  283. before generalized ones, otherwise the generalized rules will catch
  284. everything.</para>
  285. <para>This is best demonstrated with an example. The following is
  286. incorrect rule ordering:</para>
  287. <programlisting>&lt;SPAN TYPE="MARKUP"&gt;
  288. &lt;BEGIN&gt;[&lt;/BEGIN&gt;
  289. &lt;END&gt;]&lt;/END&gt;
  290. &lt;/SPAN&gt;
  291. &lt;SPAN TYPE="KEYWORD1"&gt;
  292. &lt;BEGIN&gt;[!&lt;/BEGIN&gt;
  293. &lt;END&gt;]&lt;/END&gt;
  294. &lt;/SPAN&gt;</programlisting>
<para>If you write the above in a ruleset, any occurrence of
<quote>[</quote> (even things like <quote>[!DEFINE</quote>, etc.)
will be highlighted using the first rule, because it will be the
first to match. This is most likely not the intended
behavior.</para>
  300. <para>The problem can be solved by placing the more specific rule
  301. before the general one:</para>
  302. <programlisting>&lt;SPAN TYPE="KEYWORD1"&gt;
  303. &lt;BEGIN&gt;[!&lt;/BEGIN&gt;
  304. &lt;END&gt;]&lt;/END&gt;
  305. &lt;/SPAN&gt;
  306. &lt;SPAN TYPE="MARKUP"&gt;
  307. &lt;BEGIN&gt;[&lt;/BEGIN&gt;
  308. &lt;END&gt;]&lt;/END&gt;
  309. &lt;/SPAN&gt;</programlisting>
  310. <para>Now, if the buffer contains the text
  311. <quote>[!SPECIAL]</quote>, the rules will be checked in order, and
  312. the first rule will be the first to match. However, if you write
  313. <quote>[FOO]</quote>, it will be highlighted using the second rule,
  314. which is exactly what you would expect.</para>
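<para>The same reasoning should apply to any pair of rules where one
start sequence is a prefix of another. As a sketch, documentation
comments beginning with <literal>/**</literal> would need to be listed
before ordinary comments beginning with <literal>/*</literal>:</para>
<programlisting>&lt;!-- more specific start sequence first --&gt;
&lt;SPAN TYPE="COMMENT2"&gt;
    &lt;BEGIN&gt;/**&lt;/BEGIN&gt;
    &lt;END&gt;*/&lt;/END&gt;
&lt;/SPAN&gt;
&lt;SPAN TYPE="COMMENT1"&gt;
    &lt;BEGIN&gt;/*&lt;/BEGIN&gt;
    &lt;END&gt;*/&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>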
  315. </sect2>
  316. <sect2>
  317. <title>Per-Ruleset Properties</title>
  318. <para>The <literal>PROPS</literal> tag (described in <xref
  319. linkend="mode-tag-props" />) can also be placed inside the
  320. <literal>RULES</literal> tag to define ruleset-specific properties.
  321. The following properties can be set on a per-ruleset basis:</para>
  322. <itemizedlist>
  323. <listitem>
  324. <para><literal>commentEnd</literal> - the comment end
  325. string.</para>
  326. </listitem>
  327. <listitem>
  328. <para><literal>commentStart</literal> - the comment start
  329. string.</para>
  330. </listitem>
  331. <listitem>
  332. <para><literal>lineComment</literal> - the line comment
  333. string.</para>
  334. </listitem>
  335. </itemizedlist>
  336. <para>This allows different parts of a file to have different
  337. comment strings (in the case of HTML, for example, in HTML text and
  338. inline JavaScript). For information about the commenting commands,
  339. see <xref linkend="commenting" />.</para>
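<para>For instance, a ruleset for inline JavaScript might declare its own
comment strings as in the following sketch. The ruleset name
<literal>JAVASCRIPT</literal> is an assumption made for
illustration:</para>
<programlisting>&lt;RULES SET="JAVASCRIPT"&gt;
    &lt;PROPS&gt;
        &lt;PROPERTY NAME="commentStart" VALUE="/*" /&gt;
        &lt;PROPERTY NAME="commentEnd" VALUE="*/" /&gt;
        &lt;PROPERTY NAME="lineComment" VALUE="//" /&gt;
    &lt;/PROPS&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>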
  340. </sect2>
  341. </sect1>
  342. <sect1 id="mode-rule-terminate">
  343. <title>The TERMINATE Tag</title>
  344. <para>The <literal>TERMINATE</literal> rule, which must be placed inside
  345. a <literal>RULES</literal> tag, specifies that parsing should stop after
  346. the specified number of characters have been read from a line. The
  347. number of characters to terminate after should be specified with the
  348. <literal>AT_CHAR</literal> attribute. Here is an example:</para>
  349. <programlisting>&lt;TERMINATE AT_CHAR="1" /&gt;</programlisting>
  350. <para>This rule is used in Patch mode, for example, because only the
  351. first character of each line affects highlighting.</para>
  352. </sect1>
  353. <sect1 id="mode-rule-span">
  354. <title>The SPAN Tag</title>
  355. <para>The <literal>SPAN</literal> rule, which must be placed inside a
  356. <literal>RULES</literal> tag, highlights text between a start and end
  357. string. The start and end strings are specified inside child elements of
  358. the <literal>SPAN</literal> tag. The following attributes are
  359. supported:</para>
  360. <itemizedlist>
  361. <listitem>
  362. <para><literal>TYPE</literal> - The token type to highlight the
  363. span with. See <xref linkend="mode-syntax-tokens" /> for a list
  364. of token types.</para>
  365. </listitem>
  366. <listitem>
  367. <para><literal>AT_LINE_START</literal> - If set to
  368. <literal>TRUE</literal>, the span will only be highlighted if
  369. the start sequence occurs at the beginning of a line.</para>
  370. </listitem>
  371. <listitem>
  372. <para><literal>AT_WHITESPACE_END</literal> - If set to
  373. <literal>TRUE</literal>, the span will only be highlighted if
  374. the start sequence is the first non-whitespace text in the
  375. line.</para>
  376. </listitem>
  377. <listitem>
  378. <para><literal>AT_WORD_START</literal> - If set to
  379. <literal>TRUE</literal>, the span will only be highlighted if
  380. the start sequence occurs at the beginning of a word.</para>
  381. </listitem>
  382. <listitem>
  383. <para><literal>DELEGATE</literal> - text inside the span will be
  384. highlighted with the specified ruleset. To delegate to a ruleset
  385. defined in the current mode, just specify its name. To delegate
  386. to a ruleset defined in another mode, specify a name of the form
  387. <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
  388. Note that the first (unnamed) ruleset in a mode is called
  389. <quote>MAIN</quote>.</para>
  390. </listitem>
  391. <listitem>
  392. <para><literal>MATCH_TYPE</literal> - Controls how the start and
  393. end of the sequence will be highlighted. See <xref
  394. linkend="mode-match-type" /> for more information.</para>
  395. </listitem>
  396. <listitem>
<para><literal>ESCAPE</literal> - specifies a character sequence
for escaping characters. The first character following the escape
sequence is not considered as input for syntax highlighting, and
is therefore highlighted with the rule's token.
</para>
  402. </listitem>
  403. <listitem>
  404. <para><literal>NO_LINE_BREAK</literal> - If set to
  405. <literal>TRUE</literal>, the span will not cross line
  406. breaks.</para>
  407. </listitem>
  408. <listitem>
  409. <para><literal>NO_WORD_BREAK</literal> - If set to
  410. <literal>TRUE</literal>, the span will not cross word
  411. breaks.</para>
  412. </listitem>
  413. </itemizedlist>
  414. <para>Note that the <literal>AT_LINE_START</literal>,
  415. <literal>AT_WHITESPACE_END</literal> and
  416. <literal>AT_WORD_START</literal> attributes can also be used on the
  417. <literal>BEGIN</literal> and <literal>END</literal> elements. Setting
  418. these attributes to the same value on both elements has the same effect
  419. as setting them on the <literal>SPAN</literal> element.</para>
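<para>As a sketch of placing these attributes on the child elements,
here is a <literal>SPAN</literal> for block comments whose
<literal>=begin</literal> and <literal>=end</literal> markers must each
start a line, as in Ruby. The choice of token type is an
assumption:</para>
<programlisting>&lt;SPAN TYPE="COMMENT2"&gt;
    &lt;BEGIN AT_LINE_START="TRUE"&gt;=begin&lt;/BEGIN&gt;
    &lt;END AT_LINE_START="TRUE"&gt;=end&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>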
  420. <para>Here is a <literal>SPAN</literal> that highlights Java string
  421. literals, which cannot include line breaks:</para>
  422. <programlisting>&lt;SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE"&gt;
  423. &lt;BEGIN&gt;"&lt;/BEGIN&gt;
  424. &lt;END&gt;"&lt;/END&gt;
  425. &lt;/SPAN&gt;</programlisting>
  426. <para>Here is a <literal>SPAN</literal> that highlights Java
  427. documentation comments by delegating to the <quote>JAVADOC</quote>
  428. ruleset defined elsewhere in the current mode:</para>
  429. <programlisting>&lt;SPAN TYPE="COMMENT2" DELEGATE="JAVADOC"&gt;
  430. &lt;BEGIN&gt;/**&lt;/BEGIN&gt;
  431. &lt;END&gt;*/&lt;/END&gt;
  432. &lt;/SPAN&gt;</programlisting>
  433. <para>Here is a <literal>SPAN</literal> that highlights HTML cascading
  434. stylesheets inside <literal>&lt;STYLE&gt;</literal> tags by delegating
  435. to the main ruleset in the CSS edit mode:</para>
  436. <programlisting>&lt;SPAN TYPE="MARKUP" DELEGATE="css::MAIN"&gt;
  437. &lt;BEGIN&gt;&amp;lt;style&amp;gt;&lt;/BEGIN&gt;
  438. &lt;END&gt;&amp;lt;/style&amp;gt;&lt;/END&gt;
  439. &lt;/SPAN&gt;</programlisting>
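<para>The <literal>ESCAPE</literal> attribute can be sketched with a
variant of the string literal rule above. Assuming the backslash is the
escape character, a <literal>\"</literal> inside the literal should no
longer be treated as the end of the span:</para>
<programlisting>&lt;SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE" ESCAPE="\"&gt;
    &lt;BEGIN&gt;"&lt;/BEGIN&gt;
    &lt;END&gt;"&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>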
  440. </sect1>
  441. <sect1 id="mode-rule-span-regexp">
  442. <title>The SPAN_REGEXP Tag</title>
  443. <para>The <literal>SPAN_REGEXP</literal> rule is similar to the
  444. <literal>SPAN</literal> rule except the start sequence is taken to be a
  445. regular expression. In addition to the attributes supported by the
  446. <literal>SPAN</literal> tag, the following attributes are
  447. supported:</para>
  448. <itemizedlist>
  449. <listitem>
  450. <para><literal>HASH_CHAR</literal> - a literal string which must
  451. be at the start of a regular expression.</para>
  452. </listitem>
  453. <listitem>
  454. <para><literal>HASH_CHARS</literal> - a list of possible literal
  455. characters, one of which must match at the start of the regular
  456. expression.</para>
  457. </listitem>
  458. </itemizedlist>
  459. <para><literal>HASH_CHAR</literal> and <literal>HASH_CHARS</literal>
  460. attributes are both optional, but you may only specify one, not both. If
  461. both are specified, <literal>HASH_CHARS</literal> is ignored and an
  462. error is shown. Whenever possible, use a literal prefix to specify a
  463. <literal>SPAN_REGEXP</literal>. If the starting prefix is always the
  464. same, use <literal>HASH_CHAR</literal> and provide as much prefix as
  465. possible. Only in rare cases would you omit both attributes, such as the
  466. case where there is no other reliable way to get the highlighting you
  467. need, for example, with comments in the Cobol programming
  468. language.</para>
  469. <para>The regular expression match cannot span more than one line. Any
  470. text matched by groups in the <literal>BEGIN</literal> regular
  471. expression is substituted in the <literal>END</literal> string. See
  472. below for an example of where this is useful.</para>
  473. <para>Regular expression syntax is described in <xref
  474. linkend="regexps" />.</para>
  475. <para>Here is a <literal>SPAN_REGEXP</literal> rule that highlights
  476. <quote>read-ins</quote> in shell scripts:</para>
  477. <programlisting>&lt;SPAN_REGEXP HASH_CHAR="&amp;lt;" TYPE="LITERAL1" DELEGATE="LITERAL"&gt;
  478. &lt;BEGIN&gt;&lt;![CDATA[&lt;&lt;[\p{Space}'"]*([\p{Alnum}_]+)[\p{Space}'"]*]]&gt;&lt;/BEGIN&gt;
  479. &lt;END&gt;$1&lt;/END&gt;
  480. &lt;/SPAN_REGEXP&gt;</programlisting>
  481. <para>Here is a <literal>SPAN_REGEXP</literal> rule that highlights
  482. constructs placed between <literal>&lt;#ftl</literal> and
  483. <literal>&gt;</literal>, as long as the <literal>&lt;#ftl</literal> is
  484. followed by a word break:</para>
  485. <programlisting>&lt;SPAN_REGEXP TYPE="KEYWORD1" HASH_CHAR="&amp;lt;" DELEGATE="EXPRESSION"&gt;
  486. &lt;BEGIN&gt;&amp;lt;#ftl\b&lt;/BEGIN&gt;
  487. &lt;END&gt;&amp;gt;&lt;/END&gt;
  488. &lt;/SPAN_REGEXP&gt;</programlisting>
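<para>When the match can start with one of several characters,
<literal>HASH_CHARS</literal> can be used instead. The following is only
a sketch for a hypothetical mode in which raw string literals are
introduced by <literal>r</literal> or <literal>R</literal> followed by a
double quote:</para>
<programlisting>&lt;SPAN_REGEXP HASH_CHARS="rR" TYPE="LITERAL1" NO_LINE_BREAK="TRUE"&gt;
    &lt;BEGIN&gt;[rR]"&lt;/BEGIN&gt;
    &lt;END&gt;"&lt;/END&gt;
&lt;/SPAN_REGEXP&gt;</programlisting>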
  489. </sect1>
  490. <sect1 id="mode-rule-eol-span">
  491. <title>The EOL_SPAN Tag</title>
  492. <para>An <literal>EOL_SPAN</literal> is similar to a
  493. <literal>SPAN</literal> except that highlighting stops at the end of the
  494. line, and no end sequence needs to be specified. The text to match is
  495. specified between the opening and closing <literal>EOL_SPAN</literal>
  496. tags. The following attributes are supported:</para>
  497. <itemizedlist>
  498. <listitem>
  499. <para><literal>TYPE</literal> - The token type to highlight the
  500. span with. See <xref linkend="mode-syntax-tokens" /> for a list
  501. of token types.</para>
  502. </listitem>
  503. <listitem>
  504. <para><literal>AT_LINE_START</literal> - If set to
  505. <literal>TRUE</literal>, the span will only be highlighted if
  506. the start sequence occurs at the beginning of a line.</para>
  507. </listitem>
  508. <listitem>
  509. <para><literal>AT_WHITESPACE_END</literal> - If set to
  510. <literal>TRUE</literal>, the span will only be highlighted if
  511. the sequence is the first non-whitespace text in the
  512. line.</para>
  513. </listitem>
  514. <listitem>
  515. <para><literal>AT_WORD_START</literal> - If set to
  516. <literal>TRUE</literal>, the span will only be highlighted if
  517. the start sequence occurs at the beginning of a word.</para>
  518. </listitem>
  519. <listitem>
  520. <para><literal>DELEGATE</literal> - text inside the span will be
  521. highlighted with the specified ruleset. To delegate to a ruleset
  522. defined in the current mode, just specify its name. To delegate
  523. to a ruleset defined in another mode, specify a name of the form
  524. <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
  525. Note that the first (unnamed) ruleset in a mode is called
  526. <quote>MAIN</quote>.</para>
  527. </listitem>
  528. <listitem>
  529. <para><literal>MATCH_TYPE</literal> - Controls how the start of
  530. the sequence will be highlighted. See <xref
  531. linkend="mode-match-type" /> for more information.</para>
  532. </listitem>
  533. </itemizedlist>
  534. <para>Here is an <literal>EOL_SPAN</literal> that highlights C++
  535. comments:</para>
  536. <programlisting>&lt;EOL_SPAN TYPE="COMMENT1"&gt;//&lt;/EOL_SPAN&gt;</programlisting>
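<para>As a further sketch, the following rule would highlight a
<literal>#</literal> comment only when the <literal>#</literal> is the
first non-whitespace text in the line. The comment character is an
assumption chosen for illustration:</para>
<programlisting>&lt;EOL_SPAN TYPE="COMMENT1" AT_WHITESPACE_END="TRUE"&gt;#&lt;/EOL_SPAN&gt;</programlisting>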
  537. </sect1>
  538. <sect1 id="mode-rule-eol-span-regexp">
  539. <title>The EOL_SPAN_REGEXP Tag</title>
  540. <para>The <literal>EOL_SPAN_REGEXP</literal> rule is similar to the
  541. <literal>EOL_SPAN</literal> rule except the match sequence is taken to
  542. be a regular expression. In addition to the attributes supported by the
  543. <literal>EOL_SPAN</literal> tag, the following attributes are
  544. supported:</para>
  545. <itemizedlist>
  546. <listitem>
  547. <para><literal>HASH_CHAR</literal> - a literal string which must
  548. be at the start of a regular expression.</para>
  549. </listitem>
  550. <listitem>
  551. <para><literal>HASH_CHARS</literal> - a list of possible literal
  552. characters, one of which must match at the start of the regular
  553. expression.</para>
  554. </listitem>
  555. </itemizedlist>
  556. <para><literal>HASH_CHAR</literal> and <literal>HASH_CHARS</literal>
  557. attributes are both optional, but you may only specify one, not both. If
  558. both are specified, <literal>HASH_CHARS</literal> is ignored and an
  559. error is shown. Whenever possible, use a literal prefix to specify a
  560. <literal>EOL_SPAN_REGEXP</literal>. If the starting prefix is always the
  561. same, use <literal>HASH_CHAR</literal> and provide as much prefix as
  562. possible. Only in rare cases would you omit both attributes, such as the
  563. case where there is no other reliable way to get the highlighting you
  564. need, for example, with comments in the Cobol programming
  565. language.</para>
  566. <para>The regular expression match cannot span more than one
  567. line.</para>
  568. <para>Regular expression syntax is described in <xref
  569. linkend="regexps" />.</para>
  570. <para>Here is an <literal>EOL_SPAN_REGEXP</literal> that highlights
  571. MS-DOS batch file comments, which start with <literal>REM</literal>,
  572. followed by any whitespace character, and extend until the end of the
  573. line:</para>
  574. <programlisting>&lt;EOL_SPAN_REGEXP AT_WHITESPACE_END="TRUE" HASH_CHAR="REM" TYPE="COMMENT1"&gt;REM\s&lt;/EOL_SPAN_REGEXP&gt;</programlisting>
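<para>A sketch using <literal>HASH_CHARS</literal>, for a hypothetical
language in which a comment starts with either <literal>#</literal> or
<literal>;</literal> followed by a whitespace character:</para>
<programlisting>&lt;EOL_SPAN_REGEXP HASH_CHARS="#;" TYPE="COMMENT1"&gt;[#;]\s&lt;/EOL_SPAN_REGEXP&gt;</programlisting>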
  575. </sect1>
  576. <sect1 id="mode-rule-mark-prev">
  577. <title>The MARK_PREVIOUS Tag</title>
  578. <para>The <literal>MARK_PREVIOUS</literal> rule, which must be placed
  579. inside a <literal>RULES</literal> tag, highlights from the end of the
  580. previous syntax token to the matched text. The text to match is
  581. specified between opening and closing <literal>MARK_PREVIOUS</literal>
  582. tags. The following attributes are supported:</para>
  583. <itemizedlist>
  584. <listitem>
  585. <para><literal>TYPE</literal> - The token type to highlight the
  586. text with. See <xref linkend="mode-syntax-tokens" /> for a list
  587. of token types.</para>
  588. </listitem>
  589. <listitem>
  590. <para><literal>AT_LINE_START</literal> - If set to
  591. <literal>TRUE</literal>, the sequence will only be highlighted
  592. if it occurs at the beginning of a line.</para>
  593. </listitem>
  594. <listitem>
  595. <para><literal>AT_WHITESPACE_END</literal> - If set to
  596. <literal>TRUE</literal>, the sequence will only be highlighted
  597. if it is the first non-whitespace text in the line.</para>
  598. </listitem>
  599. <listitem>
  600. <para><literal>AT_WORD_START</literal> - If set to
  601. <literal>TRUE</literal>, the sequence will only be highlighted
  602. if it occurs at the beginning of a word.</para>
  603. </listitem>
  604. <listitem>
  605. <para><literal>MATCH_TYPE</literal> - Controls how the matched
  606. region will be highlighted. See <xref
  607. linkend="mode-match-type" /> for more information.</para>
  608. </listitem>
  609. </itemizedlist>
  610. <para>Here is a rule that highlights labels in Java mode (for example,
  611. <quote>XXX:</quote>):</para>
  612. <programlisting>&lt;MARK_PREVIOUS AT_WHITESPACE_END="TRUE"
  613. MATCH_TYPE="DEFAULT"&gt;:&lt;/MARK_PREVIOUS&gt;</programlisting>
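<para>Another possible use, shown here only as a sketch, is to highlight
the word before an opening parenthesis as a function name. Without a
<literal>MATCH_TYPE</literal> attribute, the parenthesis itself would
presumably receive the same token type:</para>
<programlisting>&lt;MARK_PREVIOUS TYPE="FUNCTION"&gt;(&lt;/MARK_PREVIOUS&gt;</programlisting>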
  614. </sect1>
  615. <sect1 id="mode-rule-mark-following">
  616. <title>The MARK_FOLLOWING Tag</title>
  617. <para>The <literal>MARK_FOLLOWING</literal> rule, which must be placed
  618. inside a <literal>RULES</literal> tag, highlights from the start of the
  619. match to the next syntax token. The text to match is specified between
  620. opening and closing <literal>MARK_FOLLOWING</literal> tags. The
  621. following attributes are supported:</para>
  622. <itemizedlist>
  623. <listitem>
  624. <para><literal>TYPE</literal> - The token type to highlight the
  625. text with. See <xref linkend="mode-syntax-tokens" /> for a list
  626. of token types.</para>
  627. </listitem>
  628. <listitem>
  629. <para><literal>AT_LINE_START</literal> - If set to
  630. <literal>TRUE</literal>, the sequence will only be highlighted
  631. if it occurs at the beginning of a line.</para>
  632. </listitem>
  633. <listitem>
  634. <para><literal>AT_WHITESPACE_END</literal> - If set to
  635. <literal>TRUE</literal>, the sequence will only be highlighted
  636. if it is the first non-whitespace text in the line.</para>
  637. </listitem>
  638. <listitem>
  639. <para><literal>AT_WORD_START</literal> - If set to
  640. <literal>TRUE</literal>, the sequence will only be highlighted
  641. if it occurs at the beginning of a word.</para>
  642. </listitem>
  643. <listitem>
  644. <para><literal>MATCH_TYPE</literal> - Controls how the matched
  645. region will be highlighted. See <xref
  646. linkend="mode-match-type" /> for more information.</para>
  647. </listitem>
  648. </itemizedlist>
<para>Here is a rule that highlights variables in Unix shell scripts
(<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, and so on):</para>
  651. <programlisting>&lt;MARK_FOLLOWING TYPE="KEYWORD2"&gt;$&lt;/MARK_FOLLOWING&gt;</programlisting>
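<para>The position attributes and <literal>MATCH_TYPE</literal> can be
combined with this rule. The following sketch assumes a hypothetical
language in which a line may begin with a directive such as
<quote>@include</quote>; the choice of token types is likewise only an
assumption. The <quote>@</quote> is matched only when it is the first
non-whitespace text in the line and is highlighted as
<literal>OPERATOR</literal>, while the directive name that follows is
highlighted as <literal>KEYWORD3</literal>:</para>
<programlisting>&lt;MARK_FOLLOWING TYPE="KEYWORD3" AT_WHITESPACE_END="TRUE"
    MATCH_TYPE="OPERATOR"&gt;@&lt;/MARK_FOLLOWING&gt;</programlisting>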
  652. </sect1>
  653. <sect1 id="mode-rule-seq">
  654. <title>The SEQ Tag</title>
  655. <para>The <literal>SEQ</literal> rule, which must be placed inside a
  656. <literal>RULES</literal> tag, highlights fixed sequences of text. The
  657. text to highlight is specified between opening and closing
  658. <literal>SEQ</literal> tags. The following attributes are
  659. supported:</para>
  660. <itemizedlist>
  661. <listitem>
<para><literal>TYPE</literal> - The token type to highlight the
sequence with. See <xref linkend="mode-syntax-tokens" /> for a
list of token types.</para>
  665. </listitem>
  666. <listitem>
  667. <para><literal>AT_LINE_START</literal> - If set to
  668. <literal>TRUE</literal>, the sequence will only be highlighted
  669. if it occurs at the beginning of a line.</para>
  670. </listitem>
  671. <listitem>
  672. <para><literal>AT_WHITESPACE_END</literal> - If set to
  673. <literal>TRUE</literal>, the sequence will only be highlighted
  674. if it is the first non-whitespace text in the line.</para>
  675. </listitem>
  676. <listitem>
  677. <para><literal>AT_WORD_START</literal> - If set to
  678. <literal>TRUE</literal>, the sequence will only be highlighted
  679. if it occurs at the beginning of a word.</para>
  680. </listitem>
  681. <listitem>
<para><literal>DELEGATE</literal> - If this attribute is
  683. specified, all text after the sequence will be highlighted using
  684. this ruleset. To delegate to a ruleset defined in the current
  685. mode, just specify its name. To delegate to a ruleset defined in
  686. another mode, specify a name of the form
  687. <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
  688. Note that the first (unnamed) ruleset in a mode is called
  689. <quote>MAIN</quote>.</para>
  690. </listitem>
  691. </itemizedlist>
  692. <para>The following rules highlight a few Java operators:</para>
  693. <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;+&lt;/SEQ&gt;
  694. &lt;SEQ TYPE="OPERATOR"&gt;-&lt;/SEQ&gt;
  695. &lt;SEQ TYPE="OPERATOR"&gt;*&lt;/SEQ&gt;
  696. &lt;SEQ TYPE="OPERATOR"&gt;/&lt;/SEQ&gt;</programlisting>
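<para>The <literal>DELEGATE</literal> attribute could be used along the
following lines. This is only a sketch: the <quote>__END__</quote> marker
and the <literal>TRAILER</literal> ruleset are hypothetical names chosen
for illustration, and the example assumes the <literal>DEFAULT</literal>
attribute of the <literal>RULES</literal> tag. Once the marker is seen at
the start of a line, all text after it is highlighted using the
<literal>TRAILER</literal> ruleset, which contains no rules of its own and
so falls back to its default token type:</para>
<programlisting>&lt;RULES&gt;
    &lt;SEQ TYPE="KEYWORD1" AT_LINE_START="TRUE"
        DELEGATE="TRAILER"&gt;__END__&lt;/SEQ&gt;
&lt;/RULES&gt;

&lt;RULES SET="TRAILER" DEFAULT="COMMENT1" /&gt;</programlisting>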
  697. </sect1>
  698. <sect1 id="mode-rule-seq-regexp">
  699. <title>The SEQ_REGEXP Tag</title>
  700. <para>The <literal>SEQ_REGEXP</literal> rule is similar to the
  701. <literal>SEQ</literal> rule except the match sequence is taken to be a
  702. regular expression. In addition to the attributes supported by the
  703. <literal>SEQ</literal> tag, the following attributes are
  704. supported:</para>
  705. <itemizedlist>
  706. <listitem>
<para><literal>HASH_CHAR</literal> - A literal string that every
match of the regular expression must begin with.</para>
  709. </listitem>
  710. <listitem>
<para><literal>HASH_CHARS</literal> - A set of literal
characters, one of which must be the first character of every
match of the regular expression.</para>
  714. </listitem>
  715. </itemizedlist>
<para>The <literal>HASH_CHAR</literal> and <literal>HASH_CHARS</literal>
attributes are both optional, and at most one of them may be specified. If
both are specified, <literal>HASH_CHARS</literal> is ignored and an
error is shown. Whenever possible, give a
<literal>SEQ_REGEXP</literal> rule a literal prefix. If every match
starts with the same text, use <literal>HASH_CHAR</literal> and provide
as much of the prefix as possible. Omit both attributes only in the rare
cases where there is no other reliable way to get the highlighting you
need, for example, for comments in the Cobol programming
language.</para>
  726. <para>The regular expression match cannot span more than one
  727. line.</para>
  728. <para>Regular expression syntax is described in <xref
  729. linkend="regexps" />.</para>
<para><emphasis role="bold">NOTE</emphasis>: C-style character escapes
for literals (such as <literal>\t</literal> for the tab character) do not
work in XML attribute values. Use an XML character reference instead; for
example, <literal>&amp;#09;</literal> instead of <literal>\t</literal>.</para>
<para>Here is a <literal>SEQ_REGEXP</literal> rule from
<filename>moin.xml</filename> that uses the <literal>HASH_CHARS</literal>
attribute to describe a keyword (a wiki word) that starts with an uppercase
letter followed by lowercase letters and contains at least one further
uppercase letter in the middle:</para>
  738. <programlisting>
  739. &lt;SEQ_REGEXP HASH_CHARS="ABCDEFGHIJKLMNOPQRSTUVWXYZ" AT_WORD_START="TRUE" TYPE="KEYWORD2"&gt;[A-Z][a-z]+[A-Z][a-zA-Z]+&lt;/SEQ_REGEXP&gt;
  740. </programlisting>
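<para>For comparison, here is a sketch of a rule that uses
<literal>HASH_CHAR</literal>. It is a hypothetical rule, not taken from any
mode shipped with jEdit, and the token type is an arbitrary choice. Because
every hexadecimal literal of the form <quote>0x1F</quote> starts with the
same two characters, they are given as the literal prefix:</para>
<programlisting>&lt;SEQ_REGEXP HASH_CHAR="0x" AT_WORD_START="TRUE"
    TYPE="LITERAL2"&gt;0x[0-9a-fA-F]+&lt;/SEQ_REGEXP&gt;</programlisting>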
  741. </sect1>
  742. <sect1 id="mode-rule-import">
  743. <title>The IMPORT Tag</title>
  744. <para>The <literal>IMPORT</literal> tag, which must be placed inside a
  745. <literal>RULES</literal> tag, loads all rules defined in a given ruleset
  746. into the current ruleset; in other words, it has the same effect as
  747. copying and pasting the imported ruleset.</para>
  748. <para>The only required attribute <literal>DELEGATE</literal> must be
  749. set to the name of a ruleset. To import a ruleset defined in the current
  750. mode, just specify its name. To import a ruleset defined in another
  751. mode, specify a name of the form
  752. <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
  753. Note that the first (unnamed) ruleset in a mode is called
  754. <quote>MAIN</quote>.</para>
<para>One quirk is that the definition of the imported ruleset is not
copied to the location of the <literal>IMPORT</literal> tag, but rather
to the end of the containing ruleset. This affects rule ordering; see
<xref linkend="rule-ordering" />.</para>
  759. <para>Here is an example from the PHP mode, which extends the inline
  760. JavaScript highlighting to support embedded PHP:</para>
  761. <programlisting>
  762. &lt;RULES SET="JAVASCRIPT+PHP"&gt;
  763. &lt;SPAN TYPE="MARKUP" DELEGATE="php::PHP"&gt;
  764. &lt;BEGIN&gt;&amp;lt;?php&lt;/BEGIN&gt;
  765. &lt;END&gt;?&amp;gt;&lt;/END&gt;
  766. &lt;/SPAN&gt;
  767. &lt;SPAN TYPE="MARKUP" DELEGATE="php::PHP"&gt;
  768. &lt;BEGIN&gt;&amp;lt;?&lt;/BEGIN&gt;
  769. &lt;END&gt;?&amp;gt;&lt;/END&gt;
  770. &lt;/SPAN&gt;
  771. &lt;SPAN TYPE="MARKUP" DELEGATE="php::PHP"&gt;
  772. &lt;BEGIN&gt;&amp;lt;%=&lt;/BEGIN&gt;
  773. &lt;END&gt;%&amp;gt;&lt;/END&gt;
  774. &lt;/SPAN&gt;
  775. &lt;IMPORT DELEGATE="javascript::MAIN"/&gt;
  776. &lt;/RULES&gt;</programlisting>
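<para>Importing a ruleset from the current mode works the same way. In the
following sketch, the ruleset names <literal>COMMON</literal> and
<literal>BLOCK</literal> are hypothetical, as are the rules themselves; the
rules in <literal>COMMON</literal> are written once and shared by the main
ruleset and by <literal>BLOCK</literal>:</para>
<programlisting>&lt;RULES&gt;
    &lt;SPAN TYPE="LITERAL1" DELEGATE="BLOCK"&gt;
        &lt;BEGIN&gt;{&lt;/BEGIN&gt;
        &lt;END&gt;}&lt;/END&gt;
    &lt;/SPAN&gt;
    &lt;IMPORT DELEGATE="COMMON" /&gt;
&lt;/RULES&gt;

&lt;RULES SET="BLOCK"&gt;
    &lt;IMPORT DELEGATE="COMMON" /&gt;
&lt;/RULES&gt;

&lt;RULES SET="COMMON"&gt;
    &lt;SEQ TYPE="OPERATOR"&gt;=&lt;/SEQ&gt;
    &lt;SEQ TYPE="OPERATOR"&gt;+&lt;/SEQ&gt;
&lt;/RULES&gt;</programlisting>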
  777. </sect1>
  778. <sect1 id="mode-rule-keywords">
  779. <title>The KEYWORDS Tag</title>
  780. <para>The <literal>KEYWORDS</literal> tag, which must be placed inside a
  781. <literal>RULES</literal> tag and can only appear once, specifies a list
  782. of keywords to highlight. Keywords are similar to
  783. <literal>SEQ</literal>s, except that <literal>SEQ</literal>s match
  784. anywhere in the text, whereas keywords only match whole words. Words are
  785. considered to be runs of text separated by non-alphanumeric
  786. characters.</para>
  787. <para>The <literal>KEYWORDS</literal> tag does not define any
  788. attributes.</para>
<para>Each child element of the <literal>KEYWORDS</literal> tag is an
element whose name is a token type, and whose content is the keyword to
highlight. For example, the following rule highlights a few
Java keywords:</para>
  793. <programlisting>&lt;KEYWORDS&gt;
  794. &lt;KEYWORD1&gt;if&lt;/KEYWORD1&gt;
  795. &lt;KEYWORD1&gt;else&lt;/KEYWORD1&gt;
  796. &lt;KEYWORD3&gt;int&lt;/KEYWORD3&gt;
  797. &lt;KEYWORD3&gt;void&lt;/KEYWORD3&gt;
  798. &lt;/KEYWORDS&gt;</programlisting>
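<para>Token types other than the <literal>KEYWORD</literal> types can be
used as child element names too. As a further sketch (the assignment of
words to token types here is an arbitrary choice, not copied from jEdit's
Java mode), literal values can be given a token type of their own:</para>
<programlisting>&lt;KEYWORDS&gt;
    &lt;KEYWORD1&gt;return&lt;/KEYWORD1&gt;
    &lt;LITERAL2&gt;true&lt;/LITERAL2&gt;
    &lt;LITERAL2&gt;false&lt;/LITERAL2&gt;
    &lt;LITERAL2&gt;null&lt;/LITERAL2&gt;
&lt;/KEYWORDS&gt;</programlisting>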
  799. </sect1>
  800. <sect1 id="mode-syntax-tokens">
  801. <title>Token Types</title>
  802. <para>Parser rules can highlight tokens using any of the following token
  803. types:</para>
  804. <itemizedlist>
  805. <listitem>
  806. <para><literal>NULL</literal> - no special highlighting is
  807. performed on tokens of type <literal>NULL</literal></para>
  808. </listitem>
  809. <listitem>
  810. <para><literal>COMMENT1</literal></para>
  811. </listitem>
  812. <listitem>
  813. <para><literal>COMMENT2</literal></para>
  814. </listitem>
  815. <listitem>
  816. <para><literal>COMMENT3</literal></para>
  817. </listitem>
  818. <listitem>
  819. <para><literal>COMMENT4</literal></para>
  820. </listitem>
  821. <listitem>
  822. <para><literal>FUNCTION</literal></para>
  823. </listitem>
  824. <listitem>
  825. <para><literal>INVALID</literal><!-- - tokens of this type are
  826. automatically added if a <literal>NO_WORD_BREAK</literal> or
  827. <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
  828. one word or line, respectively. --></para>
  829. </listitem>
  830. <listitem>
  831. <para><literal>KEYWORD1</literal></para>
  832. </listitem>
  833. <listitem>
  834. <para><literal>KEYWORD2</literal></para>
  835. </listitem>
  836. <listitem>
  837. <para><literal>KEYWORD3</literal></para>
  838. </listitem>
  839. <listitem>
  840. <para><literal>KEYWORD4</literal></para>
  841. </listitem>
  842. <listitem>
  843. <para><literal>LABEL</literal></para>
  844. </listitem>
  845. <listitem>
  846. <para><literal>LITERAL1</literal></para>
  847. </listitem>
  848. <listitem>
  849. <para><literal>LITERAL2</literal></para>
  850. </listitem>
  851. <listitem>
  852. <para><literal>LITERAL3</literal></para>
  853. </listitem>
  854. <listitem>
  855. <para><literal>LITERAL4</literal></para>
  856. </listitem>
  857. <listitem>
  858. <para><literal>MARKUP</literal></para>
  859. </listitem>
  860. <listitem>
  861. <para><literal>OPERATOR</literal></para>
  862. </listitem>
  863. </itemizedlist>
  864. </sect1>
  865. <sect1 id="mode-match-type">
  866. <title>The MATCH_TYPE Attribute</title>
  867. <para>The <literal>MATCH_TYPE</literal> attribute is used by some of the
  868. rules to control how the region matched by the rule will be
  869. highlighted.</para>
  870. <para>For example, when using a <literal>MARK_PREVIOUS</literal> rule to
  871. highlight a function call of the form <literal>fcall()</literal>, the
  872. following rule could be used:</para>
  873. <programlisting>
  874. &lt;MARK_PREVIOUS TYPE="FUNCTION" MATCH_TYPE="OPERATOR"&gt;(&lt;/MARK_PREVIOUS&gt;</programlisting>
<para>This would cause <literal>fcall</literal> to be highlighted as
<literal>FUNCTION</literal>, and <literal>(</literal> to be highlighted
as <literal>OPERATOR</literal>. In this case, to keep bracket
matching working, a <literal>SEQ</literal> rule would have to be added
to match <literal>)</literal> and mark it as
<literal>OPERATOR</literal>.</para>
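<para>Put together, the pair of rules for this function call case might
look like this (a sketch; a real mode would contain further rules):</para>
<programlisting>&lt;MARK_PREVIOUS TYPE="FUNCTION" MATCH_TYPE="OPERATOR"&gt;(&lt;/MARK_PREVIOUS&gt;
&lt;SEQ TYPE="OPERATOR"&gt;)&lt;/SEQ&gt;</programlisting>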
  881. <para>The <literal>MATCH_TYPE</literal> attribute value can be any of
  882. the valid token types, or the following special values:</para>
  883. <itemizedlist>
  884. <listitem>
  885. <para><literal>RULE</literal>: this is the default value. It
  886. tells the syntax system to use the same token type as the TYPE
  887. attribute of the rule. This is equivalent to
  888. <literal>EXCLUDE_MATCH="FALSE"</literal> in 4.2 and earlier mode
  889. files.</para>
  890. </listitem>
  891. <listitem>
  892. <para><literal>CONTEXT</literal>: using this value tells the
  893. syntax system to mark the matched region using the default token
  894. type for the current rule set. In 4.2 and earlier mode files,
  895. this was specified by
  896. <literal>EXCLUDE_MATCH="TRUE"</literal>.</para>
  897. </listitem>
  898. </itemizedlist>
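<para>To illustrate the two special values, consider again the shell
variable rule from <xref linkend="mode-rule-mark-following" />. The
following two variants are sketches written for this comparison and are
alternatives, not rules to be used together. With
<literal>RULE</literal>, the <literal>$</literal> itself is highlighted as
<literal>KEYWORD2</literal> together with the variable name; with
<literal>CONTEXT</literal>, only the name is highlighted as
<literal>KEYWORD2</literal> and the <literal>$</literal> receives the
default token type of the enclosing ruleset:</para>
<programlisting>&lt;!-- "$" and the variable name are both KEYWORD2 --&gt;
&lt;MARK_FOLLOWING TYPE="KEYWORD2" MATCH_TYPE="RULE"&gt;$&lt;/MARK_FOLLOWING&gt;

&lt;!-- only the variable name is KEYWORD2 --&gt;
&lt;MARK_FOLLOWING TYPE="KEYWORD2" MATCH_TYPE="CONTEXT"&gt;$&lt;/MARK_FOLLOWING&gt;</programlisting>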
  899. </sect1>
  900. </chapter>