/jEdit/tags/jedit-4-3-pre5/doc/users-guide/writing-modes.xml

  1<!-- jEdit buffer-local properties: -->
  2<!-- :indentSize=1:noTabs=true: -->
  3<!-- :xml.root=users-guide.xml: -->
  4
  5<chapter id="writing-modes"><title>Mode Definition Syntax</title>
  6
  7 <para>
  8  Edit modes are defined using XML, the <firstterm>extensible markup
  9  language</firstterm>; mode files have the extension
 10  <filename>.xml</filename>. XML is a very simple language, and as a result
 11  edit modes are easy to create and modify. This section will
 12  start with a short XML primer, followed by detailed information about
 13  each supported tag and highlighting rule.
 14 </para>
 15 <para>
 16  Editing a mode or a mode catalog file within jEdit will cause the
 17  changes to take effect immediately. If you edit modes using another
 18  application, the changes will take effect after the
 19  <guimenu>Utilities</guimenu>&gt;<guimenuitem>Reload Edit Modes</guimenuitem>
 20  command is invoked.
 21 </para>
 22 <sect1 id="xml-primer"><title>An XML Primer</title>
 23  <para>
 24   A very simple XML file (which also happens to be an edit mode) looks like so:
 25  </para>
 26  <programlisting><![CDATA[<?xml version="1.0"?>
 27
 28<!DOCTYPE MODE SYSTEM "xmode.dtd">
 29
 30<MODE>
 31    <PROPS>
 32        <PROPERTY NAME="commentStart" VALUE="/*" />
 33        <PROPERTY NAME="commentEnd" VALUE="*/" />
 34    </PROPS>
 35
 36    <RULES>
 37        <SPAN TYPE="COMMENT1">
 38            <BEGIN>/*</BEGIN>
 39            <END>*/</END>
 40        </SPAN>
 41    </RULES>
 42</MODE>]]></programlisting>
 43  <para>
 44   Note that each opening tag must have a corresponding closing tag.
 45   If there is nothing between the opening and closing tags, for example
 46   <literal>&lt;TAG&gt;&lt;/TAG&gt;</literal>, the shorthand notation
 47   <literal>&lt;TAG /&gt;</literal> may be used. An example of this shorthand
 48   can be seen
 49   in the <literal>&lt;PROPERTY&gt;</literal> tags above.
 50  </para>
 51  <para>
 52   XML is case sensitive. <literal>Span</literal> or <literal>span</literal>
 53   is not the same as <literal>SPAN</literal>.
 54  </para>
 55  <para>
 56   To insert a special character such as &lt; or &gt; literally in XML
 57   (for example, inside an attribute value), you must write it as
 58   an <firstterm>entity</firstterm>. An
 59   entity consists of the character's symbolic name enclosed with
 60   <quote>&amp;</quote> and <quote>;</quote>. The most frequently used entities
 61   are:
 62  </para>
 63  <itemizedlist>
 64   <listitem><para><literal>&amp;lt;</literal> - The less-than (&lt;)
 65   character</para></listitem>
 66   <listitem><para><literal>&amp;gt;</literal> - The greater-than (&gt;)
 67   character</para></listitem>
 68   <listitem><para><literal>&amp;amp;</literal> - The ampersand (&amp;)
 69   character</para></listitem>
 70  </itemizedlist>
 71  <para>
 72   For example, the following will cause a syntax error:
 73  </para>
 74  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;&lt;/SEQ&gt;</programlisting>
 75  <para>
 76   Instead, you must write:
 77  </para>
 78  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;amp;&lt;/SEQ&gt;</programlisting>
 79  <para>
 80   Now that the basics of XML have been covered, the rest of this
 81   section will cover each construct in detail.
 82  </para>
 83 </sect1>
 84 <sect1 id="mode-preamble"><title>The Preamble and MODE tag</title>
 85   <para>
 86    Each mode definition must begin with the following:
 87   </para>
 88   <programlisting>&lt;?xml version="1.0"?&gt;
 89&lt;!DOCTYPE MODE SYSTEM "xmode.dtd"&gt;</programlisting>
 90  <para>
 91   Each mode definition must also contain exactly one <literal>MODE</literal>
 92   tag. All other tags (<literal>PROPS</literal>, <literal>RULES</literal>)
 93   must be placed inside the <literal>MODE</literal> tag. The
 94   <literal>MODE</literal> tag does not have any defined attributes.
 95   Here is an example:
 96  </para>
 97  <programlisting><![CDATA[<MODE>]]>
 98    <replaceable>... mode definition goes here ...</replaceable>
 99<![CDATA[</MODE>]]></programlisting>
100 </sect1>
101 <sect1 id="mode-tag-props"><title>The PROPS Tag</title>
102  <para>
103   The <literal>PROPS</literal> tag and the <literal>PROPERTY</literal> tags
104   inside it are used to define mode-specific
105   properties. Each <literal>PROPERTY</literal> tag must have a
106   <literal>NAME</literal> attribute set to the property's name, and a
107   <literal>VALUE</literal> attribute with the property's value.
108  </para>
109  <para>
110   All buffer-local properties listed in <xref linkend="buffer-local" />
111   may be given values in edit modes.
112  </para>
113  <para>
114   The following mode properties specify commenting strings:
115  </para>
116  <itemizedlist>
117   <listitem><para><literal>commentEnd</literal> - the comment end
118   string, used by the <guimenuitem>Range Comment</guimenuitem> command.
119   </para></listitem>
120   <listitem><para><literal>commentStart</literal> - the comment start
121   string, used by the <guimenuitem>Range Comment</guimenuitem> command.
122   </para></listitem>
123   <listitem><para><literal>lineComment</literal> - the line comment
124   string, used by the <guimenuitem>Line Comment</guimenuitem> command.
125   </para></listitem>
126  </itemizedlist>
127  <para>
128   When performing auto indent, a number of mode properties determine the
129   resulting indent level:
130  </para>
131  <itemizedlist>
   <listitem><para>The current line and the one before it are scanned for brackets
   listed in the <literal>indentCloseBrackets</literal> and
   <literal>indentOpenBrackets</literal> properties.
   Opening brackets in the previous line increase the indent.
136  </para>
137  <para>
138   If <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
139   then closing brackets on the current line will line up with
140   the line containing the matching opening bracket. For example, in Java mode
141   <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
142   resulting in brackets being indented like so:
143  </para>
144  <programlisting>{
145    // Code
146    {
147        // More code
148    }
149}</programlisting>
150   <para>
151    If <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
152    the line <emphasis>after</emphasis> a closing bracket will be lined up with
153    the line containing the matching opening bracket. For example, in Lisp mode
154    <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
155    resulting in brackets being indented like so:
156   </para>
157   <programlisting>(foo 'a-parameter
158    (crazy-p)
159    (bar baz ()))
160(print "hello world")</programlisting>
161   </listitem>
162   <listitem>
163    <para>If the previous line contains no opening brackets, or if the
164     <literal>doubleBracketIndent</literal> property is set to <literal>true</literal>,
165     the previous line is checked against the regular expressions in the
166     <literal>indentNextLine</literal> and <literal>indentNextLines</literal>
167     properties. If the previous line matches the former, the indent of the
168     current line is increased and the subsequent line is shifted back again.
169     If the previous line matches the latter, the indent of the current
170     and subsequent lines is increased.
171    </para>
172    <para>
173     In Java mode, for example, the <literal>indentNextLine</literal>
174     property is set to match control structures such as <quote>if</quote>,
175     <quote>else</quote>, <quote>while</quote>, and so on.
176    </para>
177    <para>
178     The
179     <literal>doubleBracketIndent</literal> property, if set to the default of
180     <literal>false</literal>, results in code indented like so:
181    </para>
    <programlisting>while(objects.hasNext())
{
    Object next = objects.next();
    if(next instanceof Paintable)
        ((Paintable)next).paint(g);
}</programlisting>
188   <para>
     On the other hand, setting this property to <quote>true</quote> will
     give the following result:
191   </para>
     <programlisting>while(objects.hasNext())
    {
        Object next = objects.next();
        if(next instanceof Paintable)
            ((Paintable)next).paint(g);
    }</programlisting></listitem>
198  </itemizedlist>
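  <para>
   As a rough sketch of how the two regular expression properties differ,
   a hypothetical mode (not one shipped with jEdit) might set them as
   follows. Both patterns are illustrative assumptions, and the sketch
   assumes each expression is matched against the whole previous line:
   the first indents only the single line after a bare
   <quote>if</quote> or <quote>while</quote>, the second indents all lines
   that follow a line ending with a colon.
  </para>
  <programlisting><![CDATA[<PROPS>
    <!-- hypothetical: indent only the next line after "if (...)" or "while (...)" -->
    <PROPERTY NAME="indentNextLine" VALUE="\s*(if|while)\s*\(.*\)\s*" />
    <!-- hypothetical: indent all following lines after a line ending in ":" -->
    <PROPERTY NAME="indentNextLines" VALUE=".*:\s*" />
</PROPS>]]></programlisting>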
199  <para>
200   Here is the complete <literal>&lt;PROPS&gt;</literal> tag for Java mode:
201  </para>
202  <programlisting><![CDATA[<PROPS>
203    <PROPERTY NAME="commentStart" VALUE="/*" />
204    <PROPERTY NAME="commentEnd" VALUE="*/" />
205    <PROPERTY NAME="lineComment" VALUE="//" />
206    <PROPERTY NAME="wordBreakChars" VALUE=",+-=&lt;&gt;/?^&amp;*" />
207
208    <!-- Auto indent -->
209    <PROPERTY NAME="indentOpenBrackets" VALUE="{" />
210    <PROPERTY NAME="indentCloseBrackets" VALUE="}" />
211    <PROPERTY NAME="indentNextLine"
212    	VALUE="\s*(((if|while)\s*\(|else\s*|else\s+if\s*\(|for\s*\(.*\))[^{;]*)" />
213    <!-- set this to 'true' if you want to use GNU coding style -->
214    <PROPERTY NAME="doubleBracketIndent" VALUE="false" />
215    <PROPERTY NAME="lineUpClosingBracket" VALUE="true" />
216</PROPS>]]></programlisting>
217 </sect1>
218 <sect1 id="mode-tag-rules"><title>The RULES Tag</title>
219  <para>
220   <literal>RULES</literal> tags must be placed inside the
221   <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
222   <firstterm>ruleset</firstterm>. A ruleset consists of a number of
223   <firstterm>parser rules</firstterm>, with each parser
224   rule specifying how to highlight a specific syntax token. There must
225   be at least one ruleset in each edit mode. There can also be more
226   than one, with different rulesets being used to highlight different
   parts of a buffer (for example, in HTML mode, one ruleset
228   highlights HTML tags, and another highlights inline JavaScript).
229   For information about using more
230   than one ruleset, see <xref linkend="mode-rule-span" />.
231  </para>
232  <para>
233   The <literal>RULES</literal> tag supports the following attributes, all of
234   which are optional:
235  </para>
236  <itemizedlist>
237   <listitem><para><literal>SET</literal> - the name of this ruleset.
238   All rulesets other than the first must have a name.
239   </para></listitem>
240   <listitem><para><literal>IGNORE_CASE</literal> - if set to
241   <literal>FALSE</literal>, matches will be case sensitive. Otherwise, case
242   will not matter. Default is <literal>TRUE</literal>.
243   </para></listitem>
244   <listitem><para><literal>NO_WORD_SEP</literal> - any non-alphanumeric
245   character <emphasis>not</emphasis> in this list is treated as a word separator
246   for the purposes of syntax highlighting.
247   </para></listitem>
248   <listitem><para><literal>DEFAULT</literal> - the token type for
249   text which doesn't match
250   any specific rule. Default is <literal>NULL</literal>. See
251   <xref linkend="mode-syntax-tokens" /> for a list of token types.
252   </para></listitem>
   <listitem><para><literal>HIGHLIGHT_DIGITS</literal> and
   <literal>DIGIT_RE</literal> - control the highlighting of numbers; see
   below for information about these two attributes.</para></listitem>
257  </itemizedlist>
258  <para>
259   Here is an example <literal>RULES</literal> tag:
260  </para>
261  <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"&gt;
262    <replaceable>... parser rules go here ...</replaceable>
263&lt;/RULES&gt;</programlisting>
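  <para>
   As a further sketch, a named ruleset for a hypothetical CSS-like
   language, in which hyphens and underscores count as part of a word,
   might be declared as follows. The ruleset name and attribute values
   are assumptions chosen for illustration:
  </para>
  <programlisting>&lt;RULES SET="PROPERTIES" DEFAULT="NULL" NO_WORD_SEP="-_"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>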
264
265  <sect2><title>Highlighting Numbers</title>
266   <para>
267    If the <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
268    <literal>TRUE</literal>, jEdit will attempt to highlight numbers in this
269    ruleset.
270   </para>
271   <para>
272    Any word consisting entirely of digits (0-9) will be highlighted with the
273    <literal>DIGIT</literal> token type.
274    A word that contains other letters in addition to digits will be
275    highlighted with the
276    <literal>DIGIT</literal> token type only if it matches the regular
277    expression specified in the <literal>DIGIT_RE</literal> attribute.
278    If this attribute is not specified, it will not be highlighted.
279   </para>
280   <para>
281    Here is an example <literal>DIGIT_RE</literal> regular expression that highlights
282    Java-style numeric literals (normal numbers, hexadecimals
283    prefixed with <literal>0x</literal>, numbers suffixed with various
284    type indicators, and floating point literals containing an exponent):
285   </para>
286   <programlisting>DIGIT_RE="(0x\p{Xdigit}+|\d+(e\d*)?)[lLdDfF]?"</programlisting>
287   <para>
288    Regular expression syntax is described in <xref linkend="regexps" />.
289   </para>
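   <para>
    Putting the two attributes together, a ruleset that highlights such
    literals might be declared roughly as follows. This is a sketch that
    reuses the expression above; the <literal>IGNORE_CASE</literal> value
    is only illustrative:
   </para>
   <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"
    DIGIT_RE="(0x\p{Xdigit}+|\d+(e\d*)?)[lLdDfF]?"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>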
290   
291  </sect2>
292  
293  <sect2 id="rule-ordering"><title>Rule Ordering Requirements</title>
294   <para>
    Rule ordering is a very common pitfall when writing your own modes.
296   </para>
297   <para>
298    Since jEdit checks buffer text against parser rules in the order they appear
299    in the ruleset, more specific rules must be placed before generalized ones,
300    otherwise the generalized rules will catch everything.
301   </para>
302   <para>
303    This is best demonstrated with an example. The following is incorrect rule
304    ordering:
305   </para>
306   <programlisting><![CDATA[<SPAN TYPE="MARKUP">
307    <BEGIN>[</BEGIN>
308    <END>]</END>
309</SPAN>
310
311<SPAN TYPE="KEYWORD1">
312    <BEGIN>[!</BEGIN>
313    <END>]</END>
314</SPAN>]]></programlisting>
315   <para>
    If you write the above in a ruleset, any occurrence of <quote>[</quote>
    (even things like <quote>[!DEFINE</quote>, etc.)
    will be highlighted using the first rule, because it will be the
    first to match. This is most likely not the intended behavior.
320   </para>
321   <para>
322    The problem can be solved by placing the more specific rule before the
323    general one:
324   </para>
325   <programlisting><![CDATA[<SPAN TYPE="KEYWORD1">
326    <BEGIN>[!</BEGIN>
327    <END>]</END>
328</SPAN>
329
330<SPAN TYPE="MARKUP">
331    <BEGIN>[</BEGIN>
332    <END>]</END>
333</SPAN>]]></programlisting>
334   <para>
335    Now, if the buffer contains the text <quote>[!SPECIAL]</quote>, the
336    rules will be checked in order, and the first rule will be the first
337    to match. However, if you write <quote>[FOO]</quote>, it will be highlighted
338    using the second rule, which is exactly what you would expect.
339   </para>
340  </sect2>
341  <sect2><title>Per-Ruleset Properties</title>
342   <para>
343    The <literal>PROPS</literal> tag (described in <xref linkend="mode-tag-props"/>)
344    can also be placed inside the <literal>RULES</literal> tag to define
345    ruleset-specific properties. The following properties can
346    be set on a per-ruleset basis:
347   </para>
348   <itemizedlist>
349    <listitem><para><literal>commentEnd</literal> - the comment end
350    string.
351    </para></listitem>
352    <listitem><para><literal>commentStart</literal> - the comment start
353    string.
354    </para></listitem>
355    <listitem><para><literal>lineComment</literal> - the line comment
356    string.
357    </para></listitem>
358   </itemizedlist>
359   <para>
360    This allows different parts of a file to have different comment strings
361    (in the case of HTML, for example, in HTML text and inline JavaScript).
362    For information about the commenting commands,
363    see <xref linkend="commenting"/>.
364   </para>
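   <para>
    As a minimal sketch, a hypothetical HTML-like mode could give its main
    ruleset and an inline JavaScript ruleset different comment strings as
    follows. The ruleset name and the property values are assumptions
    chosen for illustration and are not copied from the HTML mode shipped
    with jEdit:
   </para>
   <programlisting><![CDATA[<RULES>
    <PROPS>
        <PROPERTY NAME="commentStart" VALUE="&lt;!--" />
        <PROPERTY NAME="commentEnd" VALUE="--&gt;" />
    </PROPS>
    <!-- parser rules for HTML text go here -->
</RULES>

<RULES SET="JAVASCRIPT">
    <PROPS>
        <PROPERTY NAME="lineComment" VALUE="//" />
    </PROPS>
    <!-- parser rules for inline JavaScript go here -->
</RULES>]]></programlisting>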
365  </sect2>
366 </sect1>
367  <sect1 id="mode-rule-terminate"><title>The TERMINATE Tag</title>
368  <para>
369   The <literal>TERMINATE</literal> rule, which must be placed inside a
370   <literal>RULES</literal> tag, specifies that parsing should stop
371   after the specified number of characters have been read from a line. The
372   number of characters to terminate after should be specified with the
373   <literal>AT_CHAR</literal> attribute. Here is an example:
374  </para>
375  <programlisting>&lt;TERMINATE AT_CHAR="1" /&gt;</programlisting>
376  <para>
377   This rule is used in Patch mode, for example, because only the first
378   character of each line affects highlighting.
379  </para>
380 </sect1>
381  <sect1 id="mode-rule-span"><title>The SPAN Tag</title>
382  <para>
383   The <literal>SPAN</literal> rule, which must be placed inside a
384   <literal>RULES</literal> tag, highlights text between a start
385   and end string. The start and end strings are specified inside
386   child elements of the <literal>SPAN</literal> tag.
387   The following attributes are supported:
388  </para>
389  <itemizedlist>
390   <listitem><para><literal>TYPE</literal> - The token type to highlight the
391   span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
392   types.</para></listitem>
393   <listitem><para><literal>AT_LINE_START</literal> - If set to
394   <literal>TRUE</literal>, the span will only be highlighted if the start
395   sequence occurs at the beginning of a line.</para></listitem>
396   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
397   <literal>TRUE</literal>, the span will only be highlighted if the
398   start sequence is the first non-whitespace text in the line.</para>
399   </listitem>
400   <listitem><para><literal>AT_WORD_START</literal> - If set to
401   <literal>TRUE</literal>, the span will only be highlighted if the start
402   sequence occurs at the beginning of a word.</para></listitem>
403   <listitem><para><literal>DELEGATE</literal> - text inside the span will be
404   highlighted with the specified ruleset. To delegate to a ruleset defined
405   in the current mode, just specify its name. To delegate to a ruleset
406   defined in another mode, specify a name of the form
407   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
408   Note that the first (unnamed) ruleset in a mode is called
409   <quote>MAIN</quote>.</para></listitem>
410   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
411   <literal>TRUE</literal>, the start and end sequences will not be highlighted,
412   only the text between them will.</para></listitem>
413   <listitem><para><literal>NO_ESCAPE</literal> - If set to
414   <literal>TRUE</literal>, the ruleset's escape character will have no
415   effect before the span's end string. Otherwise, the presence of the escape
416   character will cause that occurrence of the end string to be ignored.</para></listitem>
417   <listitem><para><literal>NO_LINE_BREAK</literal> - If set to
418   <literal>TRUE</literal>, the span will not cross line breaks.</para></listitem>
419   <listitem><para><literal>NO_WORD_BREAK</literal> - If set to
420   <literal>TRUE</literal>, the span will not cross word breaks.</para></listitem>
421  </itemizedlist>
422  <para>
423   Note that the <literal>AT_LINE_START</literal>,
424   <literal>AT_WHITESPACE_END</literal> and
425   <literal>AT_WORD_START</literal> attributes can also be used on the
426   <literal>BEGIN</literal> and <literal>END</literal> elements. Setting these
427   attributes to the same value on both elements has the same effect as
428   setting them on the <literal>SPAN</literal> element.
429  </para>
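  <para>
   For instance, a hypothetical rule for block comments whose delimiters
   must each start a line (in the style of Ruby's
   <literal>=begin</literal> and <literal>=end</literal>) might put the
   attribute on both child elements. This is a sketch and is not taken
   from a mode shipped with jEdit:
  </para>
  <programlisting>&lt;SPAN TYPE="COMMENT2"&gt;
  &lt;BEGIN AT_LINE_START="TRUE"&gt;=begin&lt;/BEGIN&gt;
  &lt;END AT_LINE_START="TRUE"&gt;=end&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>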
430  <para>
431   Here is a <literal>SPAN</literal> that highlights Java string literals,
432   which cannot include line breaks:
433  </para>
434  <programlisting>&lt;SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE"&gt;
435  &lt;BEGIN&gt;"&lt;/BEGIN&gt;
436  &lt;END&gt;"&lt;/END&gt;
437&lt;/SPAN&gt;</programlisting>
438  <para>
439   Here is a <literal>SPAN</literal> that highlights Java documentation
440   comments by delegating to the <quote>JAVADOC</quote> ruleset defined
441   elsewhere in the current mode:
442  </para>
443  <programlisting>&lt;SPAN TYPE="COMMENT2" DELEGATE="JAVADOC"&gt;
444  &lt;BEGIN&gt;/**&lt;/BEGIN&gt;
445  &lt;END&gt;*/&lt;/END&gt;
446&lt;/SPAN&gt;</programlisting>
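  <para>
   The delegated ruleset is an ordinary named <literal>RULES</literal> tag
   in the same mode. A simplified sketch of what such a ruleset might look
   like follows; the rule inside it is an illustrative assumption and not
   the definition used by jEdit's Java mode:
  </para>
  <programlisting>&lt;RULES SET="JAVADOC" DEFAULT="COMMENT2"&gt;
  &lt;!-- highlight tags such as @param and @return inside the comment --&gt;
  &lt;MARK_FOLLOWING TYPE="LABEL"&gt;@&lt;/MARK_FOLLOWING&gt;
&lt;/RULES&gt;</programlisting>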
447  <para>
448   Here is a <literal>SPAN</literal> that highlights HTML cascading stylesheets
449   inside <literal>&lt;STYLE&gt;</literal> tags by delegating to the main
450   ruleset in the CSS edit mode:
451  </para>
452  <programlisting>&lt;SPAN TYPE="MARKUP" DELEGATE="css::MAIN"&gt;
453  &lt;BEGIN&gt;&amp;lt;style&amp;gt;&lt;/BEGIN&gt;
454  &lt;END&gt;&amp;lt;/style&amp;gt;&lt;/END&gt;
455&lt;/SPAN&gt;</programlisting>
456 </sect1>
457 <sect1 id="mode-rule-span-regexp"><title>The SPAN_REGEXP Tag</title>
458  <para>
459   The <literal>SPAN_REGEXP</literal> rule is similar to the
460   <literal>SPAN</literal> rule except the start sequence is taken to be
461   a regular expression. In addition to the attributes supported by
462   the <literal>SPAN</literal> tag, the
463   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
464   the first character that
465   the regular expression matches.  This rules out using regular expressions
466   which can match more than one character at the start position. The regular expression match cannot span more than one line, either.
467  </para>
468  <para>
469   Any text matched by groups in the <literal>BEGIN</literal> regular
470   expression is substituted in the <literal>END</literal> string. See
471   below for an example of where this is useful.
472  </para>
473  <para>
474   Regular expression syntax is described in <xref linkend="regexps" />.
475  </para>
476  <para>
   Here is a <literal>SPAN_REGEXP</literal> rule that highlights
   <quote>read-ins</quote> (here-documents) in shell scripts:
479  </para>
  <programlisting>&lt;SPAN_REGEXP HASH_CHAR="&amp;lt;" TYPE="LITERAL1" DELEGATE="LITERAL"&gt;
    &lt;BEGIN&gt;&lt;![CDATA[&lt;&lt;['"\s]*(\w+)[\s'"]*]]&gt;&lt;/BEGIN&gt;
    &lt;END&gt;$1&lt;/END&gt;
&lt;/SPAN_REGEXP&gt;</programlisting>
484  <para>
485   Here is a <literal>SPAN_REGEXP</literal> rule that highlights constructs
486   placed between <literal>&lt;#ftl</literal> and <literal>&gt;</literal>,
487   as long as the <literal>&lt;#ftl</literal> is followed by a word break:
488  </para>
489  <programlisting><![CDATA[<SPAN_REGEXP TYPE="KEYWORD1" HASH_CHAR="&lt;" DELEGATE="EXPRESSION">
490    <BEGIN>&lt;#ftl\&gt;</BEGIN>
491    <END>&gt;</END>
492</SPAN_REGEXP>]]></programlisting>
493 </sect1>
494 <sect1 id="mode-rule-eol-span"><title>The EOL_SPAN Tag</title>
495  <para>
496   An <literal>EOL_SPAN</literal> is similar to a <literal>SPAN</literal>
497   except that highlighting stops at the end of the line, and no end sequence
498   needs to be specified. The text to match is specified between the opening and
499   closing <literal>EOL_SPAN</literal> tags.
500   The following attributes are supported:
501  </para>
502  <itemizedlist>
503  <listitem><para><literal>TYPE</literal> - The token type to highlight the
504  span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
505  types.</para></listitem>
506  <listitem><para><literal>AT_LINE_START</literal> - If set to
507  <literal>TRUE</literal>, the span will only be highlighted if the start
508  sequence occurs at the beginning of a line.</para></listitem>
509  <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
510  <literal>TRUE</literal>, the span will only be highlighted if the
511   sequence is the first non-whitespace text in the line.</para>
512  </listitem>
513  <listitem><para><literal>AT_WORD_START</literal> - If set to
514  <literal>TRUE</literal>, the span will only be highlighted if the start
515  sequence occurs at the beginning of a word.</para></listitem>
516  <listitem><para><literal>DELEGATE</literal> - text inside the span will be
517  highlighted with the specified ruleset. To delegate to a ruleset defined
518  in the current mode, just specify its name. To delegate to a ruleset
519  defined in another mode, specify a name of the form
520  <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
521  Note that the first (unnamed) ruleset in a mode is called
522  <quote>MAIN</quote>.</para></listitem>
  <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
  <literal>TRUE</literal>, the start sequence will not be highlighted,
  only the text after it will.</para></listitem>
526  </itemizedlist>
527  <para>
528   Here is an <literal>EOL_SPAN</literal> that highlights C++ comments:
529  </para>
530  <programlisting>&lt;EOL_SPAN TYPE="COMMENT1"&gt;//&lt;/EOL_SPAN&gt;</programlisting>
531 </sect1>
532 <sect1 id="mode-rule-eol-span-regexp"><title>The EOL_SPAN_REGEXP Tag</title>
533  <para>
534   The <literal>EOL_SPAN_REGEXP</literal> rule is similar to the
535   <literal>EOL_SPAN</literal> rule except the match sequence is taken to be
536   a regular expression. In addition to the attributes supported by
537   the <literal>EOL_SPAN</literal> tag, the
538   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
539   the first character that
540   the regular expression matches.  This rules out using regular expressions
541   which can match more than one character at the start position. The regular expression match cannot span more than one line, either.
542  </para>
543  <para>
544   Regular expression syntax is described in <xref linkend="regexps" />.
545  </para>
546  <para>
547   Here is an <literal>EOL_SPAN_REGEXP</literal> that highlights MS-DOS batch file comments, which start with <literal>REM</literal>, followed by any whitespace character, and extend until the end of the line:
548  </para>
549  <programlisting><![CDATA[<EOL_SPAN_REGEXP AT_WHITESPACE_END="TRUE" HASH_CHAR="R" TYPE="COMMENT1">REM\s</EOL_SPAN_REGEXP>]]></programlisting>
550 </sect1>
551 <sect1 id="mode-rule-mark-prev"><title>The MARK_PREVIOUS Tag</title>
552  <para>
553   The <literal>MARK_PREVIOUS</literal> rule, which must be placed inside a
554   <literal>RULES</literal> tag, highlights from the end of the
555   previous syntax token to the matched text. The text to match
556   is specified between opening and closing <literal>MARK_PREVIOUS</literal>
557   tags. The following attributes are supported:
558  </para>
559  <itemizedlist>
560   <listitem><para><literal>TYPE</literal> - The token type to highlight the
561   text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
562   types.</para></listitem>
563   <listitem><para><literal>AT_LINE_START</literal> - If set to
564   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
565   at the beginning of a line.</para></listitem>
566   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
567   <literal>TRUE</literal>, the sequence will only be highlighted if it is
568   the first non-whitespace text in the line.</para>
569   </listitem>
570   <listitem><para><literal>AT_WORD_START</literal> - If set to
571   <literal>TRUE</literal>, the sequence will only be highlighted if
572   it occurs at the beginning of a word.</para></listitem>
573   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
574   <literal>TRUE</literal>, the match will not be highlighted,
575   only the text before it will.</para></listitem>
576  </itemizedlist>
577  <para>
578   Here is a rule that highlights labels in Java mode (for example,
579   <quote>XXX:</quote>):
580  </para>
  <programlisting>&lt;MARK_PREVIOUS TYPE="LABEL" AT_WHITESPACE_END="TRUE"
   EXCLUDE_MATCH="TRUE"&gt;:&lt;/MARK_PREVIOUS&gt;</programlisting>
583 </sect1>
584 <sect1 id="mode-rule-mark-following"><title>The MARK_FOLLOWING Tag</title>
585  <para>
586   The <literal>MARK_FOLLOWING</literal> rule, which must be placed inside a
587   <literal>RULES</literal> tag, highlights from the start of the
588   match to the next syntax token. The text to match
589   is specified between opening and closing <literal>MARK_FOLLOWING</literal>
590   tags. The following attributes are supported:
591  </para>
592  <itemizedlist>
593   <listitem><para><literal>TYPE</literal> - The token type to highlight the
594   text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
595   types.</para></listitem>
596   <listitem><para><literal>AT_LINE_START</literal> - If set to
597   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
598   at the beginning of a line.</para></listitem>
599   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
600   <literal>TRUE</literal>, the sequence will only be highlighted if it is
601   the first non-whitespace text in the line.</para>
602   </listitem>
603   <listitem><para><literal>AT_WORD_START</literal> - If set to
604   <literal>TRUE</literal>, the sequence will only be highlighted if
605   it occurs at the beginning of a word.</para></listitem>
606   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
607   <literal>TRUE</literal>, the match will not be highlighted,
608   only the text after it will.</para></listitem>
609  </itemizedlist>
610  <para>
611   Here is a rule that highlights variables in Unix shell scripts
   (<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, etc.):
613  </para>
614  <programlisting>&lt;MARK_FOLLOWING TYPE="KEYWORD2"&gt;$&lt;/MARK_FOLLOWING&gt;</programlisting>
615 </sect1>
616  <sect1 id="mode-rule-seq"><title>The SEQ Tag</title>
617  <para>
618   The <literal>SEQ</literal> rule, which must be placed inside a
619   <literal>RULES</literal> tag, highlights fixed sequences of text. The text
620   to highlight is specified between opening and closing <literal>SEQ</literal>
621   tags. The following attributes are supported:
622  </para>
623  <itemizedlist>
624   <listitem><para><literal>TYPE</literal> - the token type to highlight the
625   sequence with. See <xref linkend="mode-syntax-tokens" /> for a list of token
626   types.</para></listitem>
627   <listitem><para><literal>AT_LINE_START</literal> - If set to
628   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
629   at the beginning of a line.</para></listitem>
630   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
631   <literal>TRUE</literal>, the sequence will only be highlighted if it is
632   the first non-whitespace text in the line.</para>
633   </listitem>
634   <listitem><para><literal>AT_WORD_START</literal> - If set to
635   <literal>TRUE</literal>, the sequence will only be highlighted if
636   it occurs at the beginning of a word.</para></listitem>
637   <listitem><para><literal>DELEGATE</literal> - if this attribute is specified,
638   all text after the sequence will be highlighted using this ruleset.
639   To delegate to a ruleset defined
640   in the current mode, just specify its name. To delegate to a ruleset
641   defined in another mode, specify a name of the form
642   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
643   Note that the first (unnamed) ruleset in a mode is called
644   <quote>MAIN</quote>.</para></listitem>
645  </itemizedlist>
646  <para>
647   The following rules highlight a few Java operators:
648  </para>
649  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;+&lt;/SEQ&gt;
650&lt;SEQ TYPE="OPERATOR"&gt;-&lt;/SEQ&gt;
651&lt;SEQ TYPE="OPERATOR"&gt;*&lt;/SEQ&gt;
652&lt;SEQ TYPE="OPERATOR"&gt;/&lt;/SEQ&gt;</programlisting>
 </sect1>
 <sect1 id="mode-rule-seq-regexp"><title>The SEQ_REGEXP Tag</title>
  <para>
   The <literal>SEQ_REGEXP</literal> rule is similar to the
   <literal>SEQ</literal> rule, except that the match sequence is taken to be
   a regular expression. In addition to the attributes supported by
   the <literal>SEQ</literal> tag, the
   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
   the first character that the regular expression matches. This rules out
   regular expressions that can match more than one character at the start
   position. A regular expression match cannot span more than one line,
   either.
  </para>

  <para>
   Regular expression syntax is described in <xref linkend="regexps" />.
  </para>
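  <para>
   As a minimal sketch (not taken from a shipped mode), the following rule
   highlights hexadecimal literals such as <literal>0x1F</literal>. Every
   possible match begins with the character <literal>0</literal>, so that is
   the value given for <literal>HASH_CHAR</literal>. The token type and the
   use of <literal>AT_WORD_START</literal> are illustrative choices:
  </para>
  <programlisting>&lt;SEQ_REGEXP TYPE="LITERAL2" HASH_CHAR="0"
    AT_WORD_START="TRUE"&gt;0[xX][0-9a-fA-F]+&lt;/SEQ_REGEXP&gt;</programlisting>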
 </sect1>
 <sect1 id="mode-rule-import"><title>The IMPORT Tag</title>
  <para>
   The <literal>IMPORT</literal> tag, which must be placed inside a
   <literal>RULES</literal> tag, loads all rules defined in a given ruleset
   into the current ruleset; in other words, it has the same effect as
   copying and pasting the imported ruleset.
  </para>
  <para>
   The only required attribute, <literal>DELEGATE</literal>, must be set to
   the name of a ruleset. To import a ruleset defined
   in the current mode, just specify its name. To import a ruleset
   defined in another mode, specify a name of the form
   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
   Note that the first (unnamed) ruleset in a mode is called
   <quote>MAIN</quote>.
  </para>
  <para>
   One quirk is that the imported rules are not inserted at the location of
   the <literal>IMPORT</literal> tag, but are instead added to the end of the
   containing ruleset. This has implications for rule ordering; see
   <xref linkend="rule-ordering"/>.
  </para>
  <para>
   Here is an example from the PHP mode, which extends the inline JavaScript
   highlighting to support embedded PHP:
  </para>
  <programlisting><![CDATA[<RULES SET="JAVASCRIPT+PHP">

    <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
        <BEGIN>&lt;?php</BEGIN>
        <END>?&gt;</END>
    </SPAN>

    <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
        <BEGIN>&lt;?</BEGIN>
        <END>?&gt;</END>
    </SPAN>

    <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
        <BEGIN>&lt;%=</BEGIN>
        <END>%&gt;</END>
    </SPAN>

    <IMPORT DELEGATE="javascript::MAIN"/>
</RULES>]]></programlisting>
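  <para>
   Importing a ruleset from the current mode looks much the same, except that
   the mode prefix is omitted. The sketch below is hypothetical: the ruleset
   names <literal>COMMENTS</literal> and <literal>BLOCK</literal>, and the
   rules they contain, are assumptions made for illustration:
  </para>
  <programlisting><![CDATA[<RULES SET="COMMENTS">
    <EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>
</RULES>

<RULES SET="BLOCK">
    <SEQ TYPE="OPERATOR">=</SEQ>
    <!-- The rules of COMMENTS are added to the end of this ruleset -->
    <IMPORT DELEGATE="COMMENTS"/>
</RULES>]]></programlisting>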
 </sect1>
 <sect1 id="mode-rule-keywords"><title>The KEYWORDS Tag</title>
  <para>
   The <literal>KEYWORDS</literal> tag, which must be placed inside a
   <literal>RULES</literal> tag and can only appear once, specifies a list of
   keywords to highlight.
   Keywords are similar to <literal>SEQ</literal>s, except that
   <literal>SEQ</literal>s match anywhere in the text, whereas keywords only
   match whole words. Words are considered to be runs of text separated by
   non-alphanumeric characters.
  </para>
  <para>
   The <literal>KEYWORDS</literal> tag does not define any attributes.
  </para>
  <para>
   Each child element of the <literal>KEYWORDS</literal> tag is an element
   whose name is a token type, and whose content is the keyword to
   highlight. For example, the following rule highlights a few common Java
   keywords:
  </para>
  <programlisting>&lt;KEYWORDS&gt;
  &lt;KEYWORD1&gt;if&lt;/KEYWORD1&gt;
  &lt;KEYWORD1&gt;else&lt;/KEYWORD1&gt;
  &lt;KEYWORD3&gt;int&lt;/KEYWORD3&gt;
  &lt;KEYWORD3&gt;void&lt;/KEYWORD3&gt;
&lt;/KEYWORDS&gt;</programlisting>
 </sect1>
 <sect1 id="mode-syntax-tokens"><title>Token Types</title>
  <para>
   Parser rules can highlight tokens using any of the following token
   types:
  </para>
  <itemizedlist>
  <listitem><para><literal>NULL</literal> - no special
  highlighting is performed on tokens of type <literal>NULL</literal>.
  </para></listitem>
  <listitem><para><literal>COMMENT1</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT2</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT3</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT4</literal>
  </para></listitem>
  <listitem><para><literal>FUNCTION</literal>
  </para></listitem>
  <listitem><para><literal>INVALID</literal><!--  - tokens of this type are
  automatically added if a <literal>NO_WORD_BREAK</literal> or
  <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
  one word or line, respectively. -->
  </para></listitem>
  <listitem><para><literal>KEYWORD1</literal>
  </para></listitem>
  <listitem><para><literal>KEYWORD2</literal>
  </para></listitem>
  <listitem><para><literal>KEYWORD3</literal>
  </para></listitem>
  <listitem><para><literal>KEYWORD4</literal>
  </para></listitem>
  <listitem><para><literal>LABEL</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL1</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL2</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL3</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL4</literal>
  </para></listitem>
  <listitem><para><literal>MARKUP</literal>
  </para></listitem>
  <listitem><para><literal>OPERATOR</literal>
  </para></listitem>
  </itemizedlist>
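  <para>
   A token type is given either as the <literal>TYPE</literal> attribute of
   a rule or as the name of a child element of <literal>KEYWORDS</literal>.
   The following fragment is a brief sketch of both forms; the sequences and
   the token types assigned to them are chosen purely for illustration:
  </para>
  <programlisting>&lt;SPAN TYPE="LITERAL1"&gt;
    &lt;BEGIN&gt;"&lt;/BEGIN&gt;
    &lt;END&gt;"&lt;/END&gt;
&lt;/SPAN&gt;
&lt;SEQ TYPE="OPERATOR"&gt;=&lt;/SEQ&gt;
&lt;KEYWORDS&gt;
    &lt;KEYWORD2&gt;import&lt;/KEYWORD2&gt;
&lt;/KEYWORDS&gt;</programlisting>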
 </sect1>
</chapter>