/jEdit/tags/jedit-4-2-pre14/doc/users-guide/writing-modes.xml

  1<!-- jEdit buffer-local properties: -->
  2<!-- :indentSize=1:noTabs=true: -->
  3<!-- :xml.root=users-guide.xml: -->
  4
  5<chapter id="writing-modes"><title>Mode Definition Syntax</title>
  6 <para>
  7  Edit modes are defined using XML, the <firstterm>extensible markup
  8  language</firstterm>; mode files have the extension
  9  <filename>.xml</filename>. XML is a very simple language, and as a result
  edit modes are easy to create and modify. This chapter
  starts with a short XML primer, followed by detailed information about
  each supported tag and highlighting rule.
 13 </para>
 14 <para>
 15  Editing a mode or a mode catalog file within jEdit will cause the
 16  changes to take effect immediately. If you edit modes using another
 17  application, the changes will take effect after the
 18  <guimenu>Utilities</guimenu>&gt;<guimenuitem>Reload Edit Modes</guimenuitem>
 19  command is invoked.
 20 </para>
 21 <sect1 id="xml-primer"><title>An XML Primer</title>
 22  <para>
 23   A very simple XML file (which also happens to be an edit mode) looks like so:
 24  </para>
 25  <programlisting><![CDATA[<?xml version="1.0"?>
 26
 27<!DOCTYPE MODE SYSTEM "xmode.dtd">
 28
 29<MODE>
 30    <PROPS>
 31        <PROPERTY NAME="commentStart" VALUE="/*" />
 32        <PROPERTY NAME="commentEnd" VALUE="*/" />
 33    </PROPS>
 34
 35    <RULES>
 36        <SPAN TYPE="COMMENT1">
 37            <BEGIN>/*</BEGIN>
 38            <END>*/</END>
 39        </SPAN>
 40    </RULES>
 41</MODE>]]></programlisting>
 42  <para>
 43   Note that each opening tag must have a corresponding closing tag.
 44   If there is nothing between the opening and closing tags, for example
 45   <literal>&lt;TAG&gt;&lt;/TAG&gt;</literal>, the shorthand notation
 46   <literal>&lt;TAG /&gt;</literal> may be used. An example of this shorthand
 47   can be seen
 48   in the <literal>&lt;PROPERTY&gt;</literal> tags above.
 49  </para>
 50  <para>
 51   XML is case sensitive. <literal>Span</literal> or <literal>span</literal>
 52   is not the same as <literal>SPAN</literal>.
 53  </para>
 54  <para>
 55   To insert a special character such as &lt; or &gt; literally in XML
 56   (for example, inside an attribute value), you must write it as
 57   an <firstterm>entity</firstterm>. An
 58   entity consists of the character's symbolic name enclosed with
 59   <quote>&amp;</quote> and <quote>;</quote>. The most frequently used entities
 60   are:
 61  </para>
 62  <itemizedlist>
 63   <listitem><para><literal>&amp;lt;</literal> - The less-than (&lt;)
 64   character</para></listitem>
 65   <listitem><para><literal>&amp;gt;</literal> - The greater-than (&gt;)
 66   character</para></listitem>
 67   <listitem><para><literal>&amp;amp;</literal> - The ampersand (&amp;)
 68   character</para></listitem>
 69  </itemizedlist>
 70  <para>
 71   For example, the following will cause a syntax error:
 72  </para>
 73  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;&lt;/SEQ&gt;</programlisting>
 74  <para>
 75   Instead, you must write:
 76  </para>
 77  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;amp;&lt;/SEQ&gt;</programlisting>
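  <para>
   Alternatively, because mode files are ordinary XML, element text (though
   not attribute values) can be wrapped in a CDATA section, inside which
   <quote>&amp;</quote> and <quote>&lt;</quote> are taken literally. The
   following is a minimal sketch using a hypothetical rule for a
   <quote>&amp;&amp;</quote> operator; it is not taken from any bundled mode:
  </para>
  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&lt;![CDATA[&amp;&amp;]]&gt;&lt;/SEQ&gt;</programlisting>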
 78  <para>
   Now that the basics of XML have been covered, the rest of this
   chapter will cover each construct in detail.
 81  </para>
 82 </sect1>
 83 <sect1 id="mode-preamble"><title>The Preamble and MODE tag</title>
 84   <para>
 85    Each mode definition must begin with the following:
 86   </para>
 87   <programlisting>&lt;?xml version="1.0"?&gt;
 88&lt;!DOCTYPE MODE SYSTEM "xmode.dtd"&gt;</programlisting>
 89  <para>
 90   Each mode definition must also contain exactly one <literal>MODE</literal>
 91   tag. All other tags (<literal>PROPS</literal>, <literal>RULES</literal>)
 92   must be placed inside the <literal>MODE</literal> tag. The
 93   <literal>MODE</literal> tag does not have any defined attributes.
 94   Here is an example:
 95  </para>
 96  <programlisting><![CDATA[<MODE>]]>
 97    <replaceable>... mode definition goes here ...</replaceable>
 98<![CDATA[</MODE>]]></programlisting>
 99 </sect1>
100 <sect1 id="mode-tag-props"><title>The PROPS Tag</title>
101  <para>
102   The <literal>PROPS</literal> tag and the <literal>PROPERTY</literal> tags
103   inside it are used to define mode-specific
104   properties. Each <literal>PROPERTY</literal> tag must have a
105   <literal>NAME</literal> attribute set to the property's name, and a
106   <literal>VALUE</literal> attribute with the property's value.
107  </para>
108  <para>
109   All buffer-local properties listed in <xref linkend="buffer-local" />
110   may be given values in edit modes.
111  </para>
112  <para>
113   The following mode properties specify commenting strings:
114  </para>
115  <itemizedlist>
116   <listitem><para><literal>commentEnd</literal> - the comment end
117   string, used by the <guimenuitem>Range Comment</guimenuitem> command.
118   </para></listitem>
119   <listitem><para><literal>commentStart</literal> - the comment start
120   string, used by the <guimenuitem>Range Comment</guimenuitem> command.
121   </para></listitem>
122   <listitem><para><literal>lineComment</literal> - the line comment
123   string, used by the <guimenuitem>Line Comment</guimenuitem> command.
124   </para></listitem>
125  </itemizedlist>
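  <para>
   As an illustration, a mode for a hypothetical language that has only
   <quote>#</quote> line comments might define just the
   <literal>lineComment</literal> property (a sketch, not taken from a
   bundled mode):
  </para>
  <programlisting><![CDATA[<PROPS>
    <PROPERTY NAME="lineComment" VALUE="#" />
</PROPS>]]></programlisting>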
126  <para>
127   When performing auto indent, a number of mode properties determine the
128   resulting indent level:
129  </para>
130  <itemizedlist>
   <listitem><para>The current line and the one before it are scanned for brackets
132   listed in the <literal>indentCloseBrackets</literal> and
133   <literal>indentOpenBrackets</literal> properties.
134   Opening brackets in the previous line increase indent.
135  </para>
136  <para>
137   If <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
138   then closing brackets on the current line will line up with
139   the line containing the matching opening bracket. For example, in Java mode
140   <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
141   resulting in brackets being indented like so:
142  </para>
143  <programlisting>{
144    // Code
145    {
146        // More code
147    }
148}</programlisting>
149   <para>
150    If <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
151    the line <emphasis>after</emphasis> a closing bracket will be lined up with
152    the line containing the matching opening bracket. For example, in Lisp mode
153    <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
154    resulting in brackets being indented like so:
155   </para>
156   <programlisting>(foo 'a-parameter
157    (crazy-p)
158    (bar baz ()))
159(print "hello world")</programlisting>
160   </listitem>
161   <listitem>
162    <para>If the previous line contains no opening brackets, or if the
163     <literal>doubleBracketIndent</literal> property is set to <literal>true</literal>,
164     the previous line is checked against the regular expressions in the
165     <literal>indentNextLine</literal> and <literal>indentNextLines</literal>
166     properties. If the previous line matches the former, the indent of the
167     current line is increased and the subsequent line is shifted back again.
168     If the previous line matches the latter, the indent of the current
169     and subsequent lines is increased.
170    </para>
171    <para>
172     In Java mode, for example, the <literal>indentNextLine</literal>
173     property is set to match control structures such as <quote>if</quote>,
174     <quote>else</quote>, <quote>while</quote>, and so on.
175    </para>
176    <para>
177     The
178     <literal>doubleBracketIndent</literal> property, if set to the default of
179     <literal>false</literal>, results in code indented like so:
180    </para>
    <programlisting>while(objects.hasNext())
{
    Object next = objects.next();
    if(next instanceof Paintable)
        ((Paintable)next).paint(g);
}</programlisting>
   <para>
     On the other hand, setting this property to <quote>true</quote> will
     give the following result:
   </para>
     <programlisting>while(objects.hasNext())
    {
        Object next = objects.next();
        if(next instanceof Paintable)
            ((Paintable)next).paint(g);
    }</programlisting></listitem>
197  </itemizedlist>
198  <para>
199   Here is the complete <literal>&lt;PROPS&gt;</literal> tag for Java mode:
200  </para>
201  <programlisting><![CDATA[<PROPS>
202    <PROPERTY NAME="commentStart" VALUE="/*" />
203    <PROPERTY NAME="commentEnd" VALUE="*/" />
204    <PROPERTY NAME="lineComment" VALUE="//" />
205    <PROPERTY NAME="wordBreakChars" VALUE=",+-=&lt;&gt;/?^&amp;*" />
206
207    <!-- Auto indent -->
208    <PROPERTY NAME="indentOpenBrackets" VALUE="{" />
209    <PROPERTY NAME="indentCloseBrackets" VALUE="}" />
210    <PROPERTY NAME="indentNextLine"
211    	VALUE="\s*(((if|while)\s*\(|else\s*|else\s+if\s*\(|for\s*\(.*\))[^{;]*)" />
212    <!-- set this to 'true' if you want to use GNU coding style -->
213    <PROPERTY NAME="doubleBracketIndent" VALUE="false" />
214    <PROPERTY NAME="lineUpClosingBracket" VALUE="true" />
215</PROPS>]]></programlisting>
216 </sect1>
217 <sect1 id="mode-tag-rules"><title>The RULES Tag</title>
218  <para>
219   <literal>RULES</literal> tags must be placed inside the
220   <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
221   <firstterm>ruleset</firstterm>. A ruleset consists of a number of
222   <firstterm>parser rules</firstterm>, with each parser
223   rule specifying how to highlight a specific syntax token. There must
224   be at least one ruleset in each edit mode. There can also be more
225   than one, with different rulesets being used to highlight different
226   parts of a buffer (for example, in HTML mode, one rule set
227   highlights HTML tags, and another highlights inline JavaScript).
228   For information about using more
229   than one ruleset, see <xref linkend="mode-rule-span" />.
230  </para>
231  <para>
232   The <literal>RULES</literal> tag supports the following attributes, all of
233   which are optional:
234  </para>
235  <itemizedlist>
236   <listitem><para><literal>SET</literal> - the name of this ruleset.
237   All rulesets other than the first must have a name.
238   </para></listitem>
239   <listitem><para><literal>IGNORE_CASE</literal> - if set to
240   <literal>FALSE</literal>, matches will be case sensitive. Otherwise, case
241   will not matter. Default is <literal>TRUE</literal>.
242   </para></listitem>
243   <listitem><para><literal>NO_WORD_SEP</literal> - any non-alphanumeric
244   character <emphasis>not</emphasis> in this list is treated as a word separator
245   for the purposes of syntax highlighting.
246   </para></listitem>
247   <listitem><para><literal>DEFAULT</literal> - the token type for
248   text which doesn't match
249   any specific rule. Default is <literal>NULL</literal>. See
250   <xref linkend="mode-syntax-tokens" /> for a list of token types.
251   </para></listitem>
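   <listitem><para><literal>HIGHLIGHT_DIGITS</literal> - if set to
   <literal>TRUE</literal>, jEdit will attempt to highlight numbers in this
   ruleset; see below.
   </para></listitem>
   <listitem><para><literal>DIGIT_RE</literal> - a regular expression
   matching numbers that contain characters other than digits; see
   below.</para></listitem>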
256  </itemizedlist>
257  <para>
258   Here is an example <literal>RULES</literal> tag:
259  </para>
260  <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"&gt;
261    <replaceable>... parser rules go here ...</replaceable>
262&lt;/RULES&gt;</programlisting>
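  <para>
   A mode with more than one ruleset might be laid out as in the following
   sketch. The ruleset name <literal>TAGS</literal> and the choice of
   <literal>DEFAULT</literal> token type are hypothetical and serve only to
   show where the attributes go:
  </para>
  <programlisting>&lt;MODE&gt;
    &lt;!-- first ruleset: unnamed, referred to as MAIN --&gt;
    &lt;RULES IGNORE_CASE="TRUE"&gt;
        <replaceable>... parser rules go here ...</replaceable>
    &lt;/RULES&gt;

    &lt;!-- every further ruleset needs a SET name --&gt;
    &lt;RULES SET="TAGS" DEFAULT="MARKUP" IGNORE_CASE="TRUE"&gt;
        <replaceable>... parser rules go here ...</replaceable>
    &lt;/RULES&gt;
&lt;/MODE&gt;</programlisting>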
263
264  <sect2><title>Highlighting Numbers</title>
265   <para>
266    If the <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
267    <literal>TRUE</literal>, jEdit will attempt to highlight numbers in this
268    ruleset.
269   </para>
270   <para>
271    Any word consisting entirely of digits (0-9) will be highlighted with the
272    <literal>DIGIT</literal> token type.
273    A word that contains other letters in addition to digits will be
274    highlighted with the
275    <literal>DIGIT</literal> token type only if it matches the regular
276    expression specified in the <literal>DIGIT_RE</literal> attribute.
277    If this attribute is not specified, it will not be highlighted.
278   </para>
279   <para>
280    Here is an example <literal>DIGIT_RE</literal> regular expression that highlights
281    Java-style numeric literals (normal numbers, hexadecimals
282    prefixed with <literal>0x</literal>, numbers suffixed with various
283    type indicators, and floating point literals containing an exponent):
284   </para>
285   <programlisting>DIGIT_RE="(0x[[:xdigit:]]+|[[:digit:]]+(e[[:digit:]]*)?)[lLdDfF]?"</programlisting>
286   <para>
287    Regular expression syntax is described in <xref linkend="regexps" />.
288   </para>
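   <para>
    Put together, a <literal>RULES</literal> tag using both attributes might
    look like the following sketch, which simply combines the expression
    above with the earlier <literal>RULES</literal> example. With it, words
    such as <literal>42</literal>, <literal>0x1F</literal> and
    <literal>10L</literal> should receive the <literal>DIGIT</literal>
    token type:
   </para>
   <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"
    DIGIT_RE="(0x[[:xdigit:]]+|[[:digit:]]+(e[[:digit:]]*)?)[lLdDfF]?"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>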
289  </sect2>
290  
291  <sect2 id="rule-ordering"><title>Rule Ordering Requirements</title>
292   <para>
    Rule ordering is a very common pitfall when writing your own modes.
294   </para>
295   <para>
296    Since jEdit checks buffer text against parser rules in the order they appear
297    in the ruleset, more specific rules must be placed before generalized ones,
298    otherwise the generalized rules will catch everything.
299   </para>
300   <para>
301    This is best demonstrated with an example. The following is incorrect rule
302    ordering:
303   </para>
304   <programlisting><![CDATA[<SPAN TYPE="MARKUP">
305    <BEGIN>[</BEGIN>
306    <END>]</END>
307</SPAN>
308
309<SPAN TYPE="KEYWORD1">
310    <BEGIN>[!</BEGIN>
311    <END>]</END>
312</SPAN>]]></programlisting>
313   <para>
314    If you write the above in a rule set, any occurrence of <quote>[</quote>
315    (even things like <quote>[!DEFINE</quote>, etc) 
316    will be highlighted using the first rule, because it will be the
317    first to match. This is most likely not the intended behavior.
318   </para>
319   <para>
320    The problem can be solved by placing the more specific rule before the
321    general one:
322   </para>
323   <programlisting><![CDATA[<SPAN TYPE="KEYWORD1">
324    <BEGIN>[!</BEGIN>
325    <END>]</END>
326</SPAN>
327
328<SPAN TYPE="MARKUP">
329    <BEGIN>[</BEGIN>
330    <END>]</END>
331</SPAN>]]></programlisting>
332   <para>
333    Now, if the buffer contains the text <quote>[!SPECIAL]</quote>, the
334    rules will be checked in order, and the first rule will be the first
335    to match. However, if you write <quote>[FOO]</quote>, it will be highlighted
336    using the second rule, which is exactly what you would expect.
337   </para>
338  </sect2>
339  <sect2><title>Per-Ruleset Properties</title>
340   <para>
341    The <literal>PROPS</literal> tag (described in <xref linkend="mode-tag-props"/>)
342    can also be placed inside the <literal>RULES</literal> tag to define
343    ruleset-specific properties. The following properties can
344    be set on a per-ruleset basis:
345   </para>
346   <itemizedlist>
347    <listitem><para><literal>commentEnd</literal> - the comment end
348    string.
349    </para></listitem>
350    <listitem><para><literal>commentStart</literal> - the comment start
351    string.
352    </para></listitem>
353    <listitem><para><literal>lineComment</literal> - the line comment
354    string.
355    </para></listitem>
356   </itemizedlist>
357   <para>
358    This allows different parts of a file to have different comment strings
359    (in the case of HTML, for example, in HTML text and inline JavaScript).
360    For information about the commenting commands,
361    see <xref linkend="commenting"/>.
362   </para>
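   <para>
    For instance, a ruleset for inline JavaScript might carry its own comment
    strings, as in this sketch (the ruleset name is hypothetical and not
    necessarily the one used by the bundled HTML mode):
   </para>
   <programlisting><![CDATA[<RULES SET="JAVASCRIPT">
    <PROPS>
        <PROPERTY NAME="commentStart" VALUE="/*" />
        <PROPERTY NAME="commentEnd" VALUE="*/" />
        <PROPERTY NAME="lineComment" VALUE="//" />
    </PROPS>
    <!-- parser rules go here -->
</RULES>]]></programlisting>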
363  </sect2>
364 </sect1>
365  <sect1 id="mode-rule-terminate"><title>The TERMINATE Tag</title>
366  <para>
367   The <literal>TERMINATE</literal> rule, which must be placed inside a
368   <literal>RULES</literal> tag, specifies that parsing should stop
369   after the specified number of characters have been read from a line. The
370   number of characters to terminate after should be specified with the
371   <literal>AT_CHAR</literal> attribute. Here is an example:
372  </para>
373  <programlisting>&lt;TERMINATE AT_CHAR="1" /&gt;</programlisting>
374  <para>
375   This rule is used in Patch mode, for example, because only the first
376   character of each line affects highlighting.
377  </para>
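  <para>
   The following sketch shows how such a ruleset could be put together. It
   is a simplified illustration with assumed token types, not the actual
   Patch mode definition: each line is classified by its first character,
   and <literal>TERMINATE</literal> keeps the parser from examining the
   rest of the line:
  </para>
  <programlisting><![CDATA[<RULES>
    <TERMINATE AT_CHAR="1" />
    <EOL_SPAN TYPE="LITERAL1" AT_LINE_START="TRUE">+</EOL_SPAN>
    <EOL_SPAN TYPE="LITERAL2" AT_LINE_START="TRUE">-</EOL_SPAN>
</RULES>]]></programlisting>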
378 </sect1>
379  <sect1 id="mode-rule-span"><title>The SPAN Tag</title>
380  <para>
381   The <literal>SPAN</literal> rule, which must be placed inside a
382   <literal>RULES</literal> tag, highlights text between a start
383   and end string. The start and end strings are specified inside
384   child elements of the <literal>SPAN</literal> tag.
385   The following attributes are supported:
386  </para>
387  <itemizedlist>
388   <listitem><para><literal>TYPE</literal> - The token type to highlight the
389   span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
390   types.</para></listitem>
391   <listitem><para><literal>AT_LINE_START</literal> - If set to
392   <literal>TRUE</literal>, the span will only be highlighted if the start
393   sequence occurs at the beginning of a line.</para></listitem>
394   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
395   <literal>TRUE</literal>, the span will only be highlighted if the
396   start sequence is the first non-whitespace text in the line.</para>
397   </listitem>
398   <listitem><para><literal>AT_WORD_START</literal> - If set to
399   <literal>TRUE</literal>, the span will only be highlighted if the start
400   sequence occurs at the beginning of a word.</para></listitem>
401   <listitem><para><literal>DELEGATE</literal> - text inside the span will be
402   highlighted with the specified ruleset. To delegate to a ruleset defined
403   in the current mode, just specify its name. To delegate to a ruleset
404   defined in another mode, specify a name of the form
405   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
406   Note that the first (unnamed) ruleset in a mode is called
407   <quote>MAIN</quote>.</para></listitem>
408   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
409   <literal>TRUE</literal>, the start and end sequences will not be highlighted,
410   only the text between them will.</para></listitem>
411   <listitem><para><literal>NO_ESCAPE</literal> - If set to
412   <literal>TRUE</literal>, the ruleset's escape character will have no
413   effect before the span's end string. Otherwise, the presence of the escape
414   character will cause that occurrence of the end string to be ignored.</para></listitem>
415   <listitem><para><literal>NO_LINE_BREAK</literal> - If set to
416   <literal>TRUE</literal>, the span will not cross line breaks.</para></listitem>
417   <listitem><para><literal>NO_WORD_BREAK</literal> - If set to
418   <literal>TRUE</literal>, the span will not cross word breaks.</para></listitem>
419  </itemizedlist>
420  <para>
421   Note that the <literal>AT_LINE_START</literal>,
422   <literal>AT_WHITESPACE_END</literal> and
423   <literal>AT_WORD_START</literal> attributes can also be used on the
424   <literal>BEGIN</literal> and <literal>END</literal> elements. Setting these
425   attributes to the same value on both elements has the same effect as
426   setting them on the <literal>SPAN</literal> element.
427  </para>
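  <para>
   Setting an attribute on only one of the two elements restricts just that
   sequence. As a sketch, the following hypothetical rule (the delimiters and
   token type are invented for illustration) begins a span only when
   <quote>{{</quote> starts a line, while the closing <quote>}}</quote> may
   appear anywhere:
  </para>
  <programlisting><![CDATA[<SPAN TYPE="LITERAL2">
    <BEGIN AT_LINE_START="TRUE">{{</BEGIN>
    <END>}}</END>
</SPAN>]]></programlisting>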
428  <para>
429   Here is a <literal>SPAN</literal> that highlights Java string literals,
430   which cannot include line breaks:
431  </para>
432  <programlisting>&lt;SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE"&gt;
433  &lt;BEGIN&gt;"&lt;/BEGIN&gt;
434  &lt;END&gt;"&lt;/END&gt;
435&lt;/SPAN&gt;</programlisting>
436  <para>
437   Here is a <literal>SPAN</literal> that highlights Java documentation
438   comments by delegating to the <quote>JAVADOC</quote> ruleset defined
439   elsewhere in the current mode:
440  </para>
441  <programlisting>&lt;SPAN TYPE="COMMENT2" DELEGATE="JAVADOC"&gt;
442  &lt;BEGIN&gt;/**&lt;/BEGIN&gt;
443  &lt;END&gt;*/&lt;/END&gt;
444&lt;/SPAN&gt;</programlisting>
445  <para>
446   Here is a <literal>SPAN</literal> that highlights HTML cascading stylesheets
447   inside <literal>&lt;STYLE&gt;</literal> tags by delegating to the main
448   ruleset in the CSS edit mode:
449  </para>
450  <programlisting>&lt;SPAN TYPE="MARKUP" DELEGATE="css::MAIN"&gt;
451  &lt;BEGIN&gt;&amp;lt;style&amp;gt;&lt;/BEGIN&gt;
452  &lt;END&gt;&amp;lt;/style&amp;gt;&lt;/END&gt;
453&lt;/SPAN&gt;</programlisting>
454 </sect1>
455 <sect1 id="mode-rule-span-regexp"><title>The SPAN_REGEXP Tag</title>
456  <para>
457   The <literal>SPAN_REGEXP</literal> rule is similar to the
458   <literal>SPAN</literal> rule except the start sequence is taken to be
459   a regular expression. In addition to the attributes supported by
460   the <literal>SPAN</literal> tag, the
461   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
462   the first character that
   the regular expression matches. This rules out regular expressions
   whose matches can begin with more than one possible character. A regular
   expression match cannot span more than one line, either.
465  </para>
466  <para>
467   Any text matched by groups in the <literal>BEGIN</literal> regular
468   expression is substituted in the <literal>END</literal> string. See
469   below for an example of where this is useful.
470  </para>
471  <para>
472   Regular expression syntax is described in <xref linkend="regexps" />.
473  </para>
474  <para>
475   Here is a <literal>SPAN_REGEXP</literal> rule that highlights
476   <quote>read-ins</quote> in shell scripts:
477  </para>
  <programlisting>&lt;SPAN_REGEXP HASH_CHAR="&amp;lt;" TYPE="LITERAL1" DELEGATE="LITERAL"&gt;
    &lt;BEGIN&gt;&lt;![CDATA[&lt;&lt;[[:space:]'"]*([[:alnum:]_]+)[[:space:]'"]*]]&gt;&lt;/BEGIN&gt;
    &lt;END&gt;$1&lt;/END&gt;
&lt;/SPAN_REGEXP&gt;</programlisting>
482  <para>
483   Here is a <literal>SPAN_REGEXP</literal> rule that highlights constructs
484   placed between <literal>&lt;#ftl</literal> and <literal>&gt;</literal>,
485   as long as the <literal>&lt;#ftl</literal> is followed by a word break:
486  </para>
487  <programlisting><![CDATA[<SPAN_REGEXP TYPE="KEYWORD1" HASH_CHAR="&lt;" DELEGATE="EXPRESSION">
488    <BEGIN>&lt;#ftl\&gt;</BEGIN>
489    <END>&gt;</END>
490</SPAN_REGEXP>]]></programlisting>
491 </sect1>
492 <sect1 id="mode-rule-eol-span"><title>The EOL_SPAN Tag</title>
493  <para>
494   An <literal>EOL_SPAN</literal> is similar to a <literal>SPAN</literal>
495   except that highlighting stops at the end of the line, and no end sequence
496   needs to be specified. The text to match is specified between the opening and
497   closing <literal>EOL_SPAN</literal> tags.
498   The following attributes are supported:
499  </para>
500  <itemizedlist>
501  <listitem><para><literal>TYPE</literal> - The token type to highlight the
502  span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
503  types.</para></listitem>
504  <listitem><para><literal>AT_LINE_START</literal> - If set to
505  <literal>TRUE</literal>, the span will only be highlighted if the start
506  sequence occurs at the beginning of a line.</para></listitem>
507  <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
508  <literal>TRUE</literal>, the span will only be highlighted if the
509   sequence is the first non-whitespace text in the line.</para>
510  </listitem>
511  <listitem><para><literal>AT_WORD_START</literal> - If set to
512  <literal>TRUE</literal>, the span will only be highlighted if the start
513  sequence occurs at the beginning of a word.</para></listitem>
514  <listitem><para><literal>DELEGATE</literal> - text inside the span will be
515  highlighted with the specified ruleset. To delegate to a ruleset defined
516  in the current mode, just specify its name. To delegate to a ruleset
517  defined in another mode, specify a name of the form
518  <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
519  Note that the first (unnamed) ruleset in a mode is called
520  <quote>MAIN</quote>.</para></listitem>
  <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
  <literal>TRUE</literal>, the start sequence will not be highlighted,
  only the text after it will.</para></listitem>
524  </itemizedlist>
525  <para>
526   Here is an <literal>EOL_SPAN</literal> that highlights C++ comments:
527  </para>
528  <programlisting>&lt;EOL_SPAN TYPE="COMMENT1"&gt;//&lt;/EOL_SPAN&gt;</programlisting>
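  <para>
   The positional attributes can be combined with
   <literal>EOL_SPAN</literal>. As a sketch, this hypothetical rule
   highlights a line as a preprocessor-style directive only when
   <quote>#</quote> is the first non-whitespace text on it (the token type
   is an arbitrary choice):
  </para>
  <programlisting>&lt;EOL_SPAN TYPE="KEYWORD2" AT_WHITESPACE_END="TRUE"&gt;#&lt;/EOL_SPAN&gt;</programlisting>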
529 </sect1>
530 <sect1 id="mode-rule-eol-span-regexp"><title>The EOL_SPAN_REGEXP Tag</title>
531  <para>
532   The <literal>EOL_SPAN_REGEXP</literal> rule is similar to the
533   <literal>EOL_SPAN</literal> rule except the match sequence is taken to be
534   a regular expression. In addition to the attributes supported by
535   the <literal>EOL_SPAN</literal> tag, the
536   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
537   the first character that
   the regular expression matches. This rules out regular expressions
   whose matches can begin with more than one possible character. A regular
   expression match cannot span more than one line, either.
540  </para>
541  <para>
542   Regular expression syntax is described in <xref linkend="regexps" />.
543  </para>
544  <para>
545   Here is an <literal>EOL_SPAN_REGEXP</literal> that highlights MS-DOS batch file comments, which start with <literal>REM</literal>, followed by any whitespace character, and extend until the end of the line:
546  </para>
547  <programlisting><![CDATA[<EOL_SPAN_REGEXP AT_WHITESPACE_END="TRUE" HASH_CHAR="R" TYPE="COMMENT1">REM\s</EOL_SPAN_REGEXP>]]></programlisting>
548 </sect1>
549 <sect1 id="mode-rule-mark-prev"><title>The MARK_PREVIOUS Tag</title>
550  <para>
551   The <literal>MARK_PREVIOUS</literal> rule, which must be placed inside a
552   <literal>RULES</literal> tag, highlights from the end of the
553   previous syntax token to the matched text. The text to match
554   is specified between opening and closing <literal>MARK_PREVIOUS</literal>
555   tags. The following attributes are supported:
556  </para>
557  <itemizedlist>
558   <listitem><para><literal>TYPE</literal> - The token type to highlight the
559   text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
560   types.</para></listitem>
561   <listitem><para><literal>AT_LINE_START</literal> - If set to
562   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
563   at the beginning of a line.</para></listitem>
564   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
565   <literal>TRUE</literal>, the sequence will only be highlighted if it is
566   the first non-whitespace text in the line.</para>
567   </listitem>
568   <listitem><para><literal>AT_WORD_START</literal> - If set to
569   <literal>TRUE</literal>, the sequence will only be highlighted if
570   it occurs at the beginning of a word.</para></listitem>
571   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
572   <literal>TRUE</literal>, the match will not be highlighted,
573   only the text before it will.</para></listitem>
574  </itemizedlist>
575  <para>
576   Here is a rule that highlights labels in Java mode (for example,
577   <quote>XXX:</quote>):
578  </para>
  <programlisting>&lt;MARK_PREVIOUS TYPE="LABEL" AT_WHITESPACE_END="TRUE"
   EXCLUDE_MATCH="TRUE"&gt;:&lt;/MARK_PREVIOUS&gt;</programlisting>
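  <para>
   A similar rule can mark the word in front of an opening parenthesis as a
   function name. The sketch below assumes the <literal>FUNCTION</literal>
   token type; with <literal>EXCLUDE_MATCH</literal> set, the parenthesis
   itself is left unhighlighted:
  </para>
  <programlisting>&lt;MARK_PREVIOUS TYPE="FUNCTION" EXCLUDE_MATCH="TRUE"&gt;(&lt;/MARK_PREVIOUS&gt;</programlisting>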
581 </sect1>
582 <sect1 id="mode-rule-mark-following"><title>The MARK_FOLLOWING Tag</title>
583  <para>
584   The <literal>MARK_FOLLOWING</literal> rule, which must be placed inside a
585   <literal>RULES</literal> tag, highlights from the start of the
586   match to the next syntax token. The text to match
587   is specified between opening and closing <literal>MARK_FOLLOWING</literal>
588   tags. The following attributes are supported:
589  </para>
590  <itemizedlist>
591   <listitem><para><literal>TYPE</literal> - The token type to highlight the
592   text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
593   types.</para></listitem>
594   <listitem><para><literal>AT_LINE_START</literal> - If set to
595   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
596   at the beginning of a line.</para></listitem>
597   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
598   <literal>TRUE</literal>, the sequence will only be highlighted if it is
599   the first non-whitespace text in the line.</para>
600   </listitem>
601   <listitem><para><literal>AT_WORD_START</literal> - If set to
602   <literal>TRUE</literal>, the sequence will only be highlighted if
603   it occurs at the beginning of a word.</para></listitem>
604   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
605   <literal>TRUE</literal>, the match will not be highlighted,
606   only the text after it will.</para></listitem>
607  </itemizedlist>
608  <para>
609   Here is a rule that highlights variables in Unix shell scripts
610   (<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, etc):
611  </para>
612  <programlisting>&lt;MARK_FOLLOWING TYPE="KEYWORD2"&gt;$&lt;/MARK_FOLLOWING&gt;</programlisting>
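  <para>
   The positional attributes apply here as well. As a sketch, the following
   hypothetical rule highlights batch-file style labels such as
   <quote>:start</quote>, but only when the colon begins the line (the
   <literal>LABEL</literal> token type is an assumed choice):
  </para>
  <programlisting>&lt;MARK_FOLLOWING TYPE="LABEL" AT_LINE_START="TRUE"&gt;:&lt;/MARK_FOLLOWING&gt;</programlisting>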
613 </sect1>
614  <sect1 id="mode-rule-seq"><title>The SEQ Tag</title>
615  <para>
616   The <literal>SEQ</literal> rule, which must be placed inside a
617   <literal>RULES</literal> tag, highlights fixed sequences of text. The text
618   to highlight is specified between opening and closing <literal>SEQ</literal>
619   tags. The following attributes are supported:
620  </para>
621  <itemizedlist>
622   <listitem><para><literal>TYPE</literal> - the token type to highlight the
623   sequence with. See <xref linkend="mode-syntax-tokens" /> for a list of token
624   types.</para></listitem>
625   <listitem><para><literal>AT_LINE_START</literal> - If set to
626   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
627   at the beginning of a line.</para></listitem>
628   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
629   <literal>TRUE</literal>, the sequence will only be highlighted if it is
630   the first non-whitespace text in the line.</para>
631   </listitem>
632   <listitem><para><literal>AT_WORD_START</literal> - If set to
633   <literal>TRUE</literal>, the sequence will only be highlighted if
634   it occurs at the beginning of a word.</para></listitem>
635   <listitem><para><literal>DELEGATE</literal> - if this attribute is specified,
636   all text after the sequence will be highlighted using this ruleset.
637   To delegate to a ruleset defined
638   in the current mode, just specify its name. To delegate to a ruleset
639   defined in another mode, specify a name of the form
640   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
641   Note that the first (unnamed) ruleset in a mode is called
642   <quote>MAIN</quote>.</para></listitem>
643  </itemizedlist>
644  <para>
645   The following rules highlight a few Java operators:
646  </para>
647  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;+&lt;/SEQ&gt;
648&lt;SEQ TYPE="OPERATOR"&gt;-&lt;/SEQ&gt;
649&lt;SEQ TYPE="OPERATOR"&gt;*&lt;/SEQ&gt;
650&lt;SEQ TYPE="OPERATOR"&gt;/&lt;/SEQ&gt;</programlisting>
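  <para>
   Operators that contain special characters must be written with entities,
   and, following the guideline in <xref linkend="rule-ordering" />, longer
   sequences are best placed before the shorter ones they begin with. A
   sketch for a few more Java operators:
  </para>
  <programlisting><![CDATA[<SEQ TYPE="OPERATOR">&lt;=</SEQ>
<SEQ TYPE="OPERATOR">&lt;</SEQ>
<SEQ TYPE="OPERATOR">&amp;&amp;</SEQ>
<SEQ TYPE="OPERATOR">&amp;</SEQ>]]></programlisting>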
651 </sect1>
652 <sect1 id="mode-rule-seq-regexp"><title>The SEQ_REGEXP Tag</title>
653  <para>
654   The <literal>SEQ_REGEXP</literal> rule is similar to the
655   <literal>SEQ</literal> rule, except that the match sequence is taken to be
656   a regular expression. In addition to the attributes supported by
657   the <literal>SEQ</literal> tag, the
658   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
659   the first character that
660   the regular expression matches. This rules out regular expressions
661   whose matches can begin with more than one possible character. A regular expression match cannot span more than one line, either.
662  </para>
663  <para>
664   Here is an example of a <literal>SEQ_REGEXP</literal> rule that highlights
665   Perl's matcher constructions such as <literal>m/(.+):(\d+):(.+)/</literal>:
666  </para>
667  <programlisting><![CDATA[<SEQ_REGEXP TYPE="MARKUP"
668    HASH_CHAR="m"
669    AT_WORD_START="TRUE"
670>m([[:punct:]])(?:.*?[^\\])*?\1[sgiexom]*</SEQ_REGEXP>]]></programlisting>
671
672  <para>
673   Regular expression syntax is described in <xref linkend="regexps" />.
674  </para>
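  <para>
   As a further illustration of the <literal>HASH_CHAR</literal> restriction,
   the following sketch (a hypothetical rule, not taken from a shipped mode)
   highlights hexadecimal literals such as <literal>0x1F</literal>. Every
   match begins with <literal>0</literal>, so that character can serve as the
   <literal>HASH_CHAR</literal>. The choice of the
   <literal>LITERAL2</literal> token type is arbitrary:
  </para>
  <programlisting><![CDATA[<SEQ_REGEXP TYPE="LITERAL2"
    HASH_CHAR="0"
    AT_WORD_START="TRUE"
>0[xX][0-9a-fA-F]+</SEQ_REGEXP>]]></programlisting>
  <para>
   By contrast, an expression such as <literal>(0x|#)[0-9a-fA-F]+</literal>
   could begin with either <literal>0</literal> or <literal>#</literal>, so
   under the restriction described above it would presumably have to be split
   into two rules, one for each starting character.
  </para>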
675 </sect1>
676 <sect1 id="mode-rule-import"><title>The IMPORT Tag</title>
677  <para>
678   The <literal>IMPORT</literal> tag, which must be placed inside a <literal>RULES</literal> tag, loads all rules defined in a given ruleset into the current ruleset; in other words, it has the same effect as copying and pasting the imported ruleset.
679  </para>
680  <para>
681   The only required attribute, <literal>DELEGATE</literal>, must be set to the name of a ruleset. To import a ruleset defined
682   in the current mode, just specify its name. To import a ruleset
683   defined in another mode, specify a name of the form
684   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
685   Note that the first (unnamed) ruleset in a mode is called
686   <quote>MAIN</quote>.
687  </para>
688  <para>
689   One quirk is that the imported rules are not inserted at the location of the <literal>IMPORT</literal> tag, but rather at the end of the containing ruleset. This has implications for rule ordering; see <xref linkend="rule-ordering"/>.
690  </para>
691  <para>
692   Here is an example from the PHP mode, which extends the inline JavaScript highlighting to support embedded PHP:
693  </para>
694  <programlisting><![CDATA[<RULES SET="JAVASCRIPT+PHP">
696
697   <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
698       <BEGIN>&lt;?php</BEGIN>
699       <END>?&gt;</END>
700   </SPAN>
701   
702   <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
703       <BEGIN>&lt;?</BEGIN>
704       <END>?&gt;</END>
705   </SPAN>
706   
707   <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
708       <BEGIN>&lt;%=</BEGIN>
709       <END>%&gt;</END>
710   </SPAN>
711
712   <IMPORT DELEGATE="javascript::MAIN"/>
713</RULES>]]></programlisting>
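  <para>
   Importing a ruleset from the current mode works the same way. In the
   following sketch, the ruleset names <literal>COMMON</literal> and
   <literal>EMBEDDED</literal> are hypothetical and serve only to show the
   shape of the markup:
  </para>
  <programlisting><![CDATA[<RULES SET="COMMON">
    <SEQ TYPE="OPERATOR">+</SEQ>
    <SEQ TYPE="OPERATOR">-</SEQ>
</RULES>

<RULES SET="EMBEDDED">
    <SPAN TYPE="COMMENT1">
        <BEGIN>/*</BEGIN>
        <END>*/</END>
    </SPAN>

    <!-- same effect as repeating the COMMON rules at the end of this ruleset -->
    <IMPORT DELEGATE="COMMON"/>
</RULES>]]></programlisting>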
714 </sect1>
715 <sect1 id="mode-rule-keywords"><title>The KEYWORDS Tag</title>
716  <para>
717   The <literal>KEYWORDS</literal> tag, which must be placed inside a
718   <literal>RULES</literal> tag and can only appear once, specifies a list of
719   keywords to highlight.
720   Keywords are similar to <literal>SEQ</literal>s, except that
721   <literal>SEQ</literal>s match anywhere in the text, whereas keywords only
722   match whole words. Words are considered to be runs of text separated by
723   non-alphanumeric characters.
724  </para>
725  <para>
726   The <literal>KEYWORDS</literal> tag does not define any attributes.
727  </para>
728  <para>
729   Each child element of the <literal>KEYWORDS</literal> tag is an element
730   whose name is a token type, and whose content is the keyword to
731   highlight. For example, the following rule highlights a few common Java
732   keywords:
733  </para>
734  <programlisting>&lt;KEYWORDS&gt;
735  &lt;KEYWORD1&gt;if&lt;/KEYWORD1&gt;
736  &lt;KEYWORD1&gt;else&lt;/KEYWORD1&gt;
737  &lt;KEYWORD3&gt;int&lt;/KEYWORD3&gt;
738  &lt;KEYWORD3&gt;void&lt;/KEYWORD3&gt;
739&lt;/KEYWORDS&gt;</programlisting>
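  <para>
   Going by the description above, the difference between keywords and
   <literal>SEQ</literal>s can be sketched with a hypothetical ruleset that
   contains one of each:
  </para>
  <programlisting><![CDATA[<RULES>
    <!-- a SEQ matches anywhere in the text, including inside "a=b" -->
    <SEQ TYPE="OPERATOR">=</SEQ>

    <KEYWORDS>
        <!-- matches the whole word "int", but not the "int" in "print" -->
        <KEYWORD3>int</KEYWORD3>
    </KEYWORDS>
</RULES>]]></programlisting>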
740 </sect1>
741  <sect1 id="mode-syntax-tokens"><title>Token Types</title>
742  <para>
743   Parser rules can highlight tokens using any of the following token
744   types:
745  </para>
746  <itemizedlist>
747  <listitem><para><literal>NULL</literal> - no special
748  highlighting is performed on tokens of type <literal>NULL</literal>.
749  </para></listitem>
750  <listitem><para><literal>COMMENT1</literal>
751  </para></listitem>
752  <listitem><para><literal>COMMENT2</literal>
753  </para></listitem>
754  <listitem><para><literal>COMMENT3</literal>
755  </para></listitem>
756  <listitem><para><literal>COMMENT4</literal>
757  </para></listitem>
758  <listitem><para><literal>FUNCTION</literal>
759  </para></listitem>
760  <listitem><para><literal>INVALID</literal><!--  - tokens of this type are
761  automatically added if a <literal>NO_WORD_BREAK</literal> or
762  <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
763  one word or line, respectively. -->
764  </para></listitem>
765  <listitem><para><literal>KEYWORD1</literal>
766  </para></listitem>
767  <listitem><para><literal>KEYWORD2</literal>
768  </para></listitem>
769  <listitem><para><literal>KEYWORD3</literal>
770  </para></listitem>
771  <listitem><para><literal>KEYWORD4</literal>
772  </para></listitem>
773  <listitem><para><literal>LABEL</literal>
774  </para></listitem>
775  <listitem><para><literal>LITERAL1</literal>
776  </para></listitem>
777  <listitem><para><literal>LITERAL2</literal>
778  </para></listitem>
779  <listitem><para><literal>LITERAL3</literal>
780  </para></listitem>
781  <listitem><para><literal>LITERAL4</literal>
782  </para></listitem>
783  <listitem><para><literal>MARKUP</literal>
784  </para></listitem>
785  <listitem><para><literal>OPERATOR</literal>
786  </para></listitem>
787  </itemizedlist>
788 </sect1>
789</chapter>