<!-- jEdit buffer-local properties: -->
<!-- :indentSize=1:noTabs=true: -->
<!-- :xml.root=users-guide.xml: -->

<chapter id="writing-modes"><title>Mode Definition Syntax</title>
 <para>
  Edit modes are defined using XML, the <firstterm>extensible markup
  language</firstterm>; mode files have the extension
  <filename>.xml</filename>. XML is a very simple language, and as a result
  edit modes are easy to create and modify. This chapter starts with a
  short XML primer, followed by detailed information about each supported
  tag and highlighting rule.
 </para>
 <para>
  Editing a mode or a mode catalog file within jEdit will cause the
  changes to take effect immediately. If you edit modes using another
  application, the changes will take effect after the
  <guimenu>Utilities</guimenu>&gt;<guimenuitem>Reload Edit Modes</guimenuitem>
  command is invoked.
 </para>
 <sect1 id="xml-primer"><title>An XML Primer</title>
  <para>
   A very simple XML file (which also happens to be an edit mode) looks like so:
  </para>
  <programlisting><![CDATA[<?xml version="1.0"?>

<!DOCTYPE MODE SYSTEM "xmode.dtd">

<MODE>
    <PROPS>
        <PROPERTY NAME="commentStart" VALUE="/*" />
        <PROPERTY NAME="commentEnd" VALUE="*/" />
    </PROPS>

    <RULES>
        <SPAN TYPE="COMMENT1">
            <BEGIN>/*</BEGIN>
            <END>*/</END>
        </SPAN>
    </RULES>
</MODE>]]></programlisting>
  <para>
   Note that each opening tag must have a corresponding closing tag.
   If there is nothing between the opening and closing tags, for example
   <literal>&lt;TAG&gt;&lt;/TAG&gt;</literal>, the shorthand notation
   <literal>&lt;TAG /&gt;</literal> may be used. An example of this shorthand
   can be seen
   in the <literal>&lt;PROPERTY&gt;</literal> tags above.
  </para>
  <para>
   XML is case sensitive. <literal>Span</literal> or <literal>span</literal>
   is not the same as <literal>SPAN</literal>.
  </para>
  <para>
   To insert a special character such as &lt; or &gt; literally in XML
   (for example, inside an attribute value), you must write it as
   an <firstterm>entity</firstterm>. An
   entity consists of the character's symbolic name between
   <quote>&amp;</quote> and <quote>;</quote>. The most frequently used entities
   are:
  </para>
  <itemizedlist>
   <listitem><para><literal>&amp;lt;</literal> - The less-than (&lt;)
   character</para></listitem>
   <listitem><para><literal>&amp;gt;</literal> - The greater-than (&gt;)
   character</para></listitem>
   <listitem><para><literal>&amp;amp;</literal> - The ampersand (&amp;)
   character</para></listitem>
  </itemizedlist>
  <para>
   For example, the following will cause a syntax error:
  </para>
  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;&lt;/SEQ&gt;</programlisting>
  <para>
   Instead, you must write:
  </para>
  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;amp;&lt;/SEQ&gt;</programlisting>
  <para>
   Now that the basics of XML have been covered, the rest of this
   chapter covers each construct in detail.
  </para>
 </sect1>
 <sect1 id="mode-preamble"><title>The Preamble and MODE tag</title>
   <para>
    Each mode definition must begin with the following:
   </para>
   <programlisting>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE MODE SYSTEM "xmode.dtd"&gt;</programlisting>
  <para>
   Each mode definition must also contain exactly one <literal>MODE</literal>
   tag. All other tags (<literal>PROPS</literal>, <literal>RULES</literal>)
   must be placed inside the <literal>MODE</literal> tag. The
   <literal>MODE</literal> tag does not have any defined attributes.
   Here is an example:
  </para>
  <programlisting><![CDATA[<MODE>]]>
    <replaceable>... mode definition goes here ...</replaceable>
<![CDATA[</MODE>]]></programlisting>
 </sect1>
 <sect1 id="mode-tag-props"><title>The PROPS Tag</title>
  <para>
   The <literal>PROPS</literal> tag and the <literal>PROPERTY</literal> tags
   inside it are used to define mode-specific
   properties. Each <literal>PROPERTY</literal> tag must have a
   <literal>NAME</literal> attribute set to the property's name, and a
   <literal>VALUE</literal> attribute with the property's value.
  </para>
  <para>
   All buffer-local properties listed in <xref linkend="buffer-local" />
   may be given values in edit modes.
  </para>
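  <para>
   For instance, a mode can give every buffer that uses it its own
   indentation settings. The following is only a sketch: it sets the
   buffer-local properties <literal>tabSize</literal>,
   <literal>indentSize</literal> and <literal>noTabs</literal>, and the
   values shown are arbitrary choices rather than recommendations.
  </para>
  <programlisting><![CDATA[<PROPS>
    <!-- display width of a tab character -->
    <PROPERTY NAME="tabSize" VALUE="4" />
    <!-- width of one indent level -->
    <PROPERTY NAME="indentSize" VALUE="4" />
    <!-- indent with spaces instead of tab characters -->
    <PROPERTY NAME="noTabs" VALUE="true" />
</PROPS>]]></programlisting>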
  <para>
   The following mode properties specify commenting strings:
  </para>
  <itemizedlist>
   <listitem><para><literal>commentEnd</literal> - the comment end
   string, used by the <guimenuitem>Range Comment</guimenuitem> command.
   </para></listitem>
   <listitem><para><literal>commentStart</literal> - the comment start
   string, used by the <guimenuitem>Range Comment</guimenuitem> command.
   </para></listitem>
   <listitem><para><literal>lineComment</literal> - the line comment
   string, used by the <guimenuitem>Line Comment</guimenuitem> command.
   </para></listitem>
  </itemizedlist>
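  <para>
   Since mode files are XML, a comment string containing
   <literal>&lt;</literal> must be written using entities. Here is a sketch
   for a hypothetical mode whose language uses SGML-style block comments:
  </para>
  <programlisting><![CDATA[<PROPS>
    <!-- "<" must be written as an entity inside an attribute value -->
    <PROPERTY NAME="commentStart" VALUE="&lt;!--" />
    <PROPERTY NAME="commentEnd" VALUE="--&gt;" />
</PROPS>]]></programlisting>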
  <para>
   When performing auto indent, a number of mode properties determine the
   resulting indent level:
  </para>
  <itemizedlist>
   <listitem><para>The line and the one before it are scanned for brackets
   listed in the <literal>indentCloseBrackets</literal> and
   <literal>indentOpenBrackets</literal> properties.
   Opening brackets in the previous line increase indent.
  </para>
  <para>
   If <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
   then closing brackets on the current line will line up with
   the line containing the matching opening bracket. For example, in Java mode
   <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
   resulting in brackets being indented like so:
  </para>
  <programlisting>{
    // Code
    {
        // More code
    }
}</programlisting>
   <para>
    If <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
    the line <emphasis>after</emphasis> a closing bracket will be lined up with
    the line containing the matching opening bracket. For example, in Lisp mode
    <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
    resulting in brackets being indented like so:
   </para>
   <programlisting>(foo 'a-parameter
    (crazy-p)
    (bar baz ()))
(print "hello world")</programlisting>
   </listitem>
   <listitem>
    <para>If the previous line contains no opening brackets, or if the
     <literal>doubleBracketIndent</literal> property is set to <literal>true</literal>,
     the previous line is checked against the regular expressions in the
     <literal>indentNextLine</literal> and <literal>indentNextLines</literal>
     properties. If the previous line matches the former, the indent of the
     current line is increased and the subsequent line is shifted back again.
     If the previous line matches the latter, the indent of the current
     and subsequent lines is increased.
    </para>
    <para>
     In Java mode, for example, the <literal>indentNextLine</literal>
     property is set to match control structures such as <quote>if</quote>,
     <quote>else</quote>, <quote>while</quote>, and so on.
    </para>
    <para>
     The
     <literal>doubleBracketIndent</literal> property, if set to the default of
     <literal>false</literal>, results in code indented like so:
    </para>
    <programlisting>while(objects.hasNext())
{
    Object next = objects.next();
    if(next instanceof Paintable)
        ((Paintable)next).paint(g);
}</programlisting>
   <para>
     On the other hand, setting this property to <literal>true</literal> will
     give the following result:
   </para>
     <programlisting>while(objects.hasNext())
    {
        Object next = objects.next();
        if(next instanceof Paintable)
            ((Paintable)next).paint(g);
    }</programlisting></listitem>
  </itemizedlist>
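  <para>
   To show how these properties combine, here is a sketch for a hypothetical
   language with Python-like syntax. In this language, a line ending in a colon
   opens an indented block, and any of three bracket kinds may span lines.
   The regular expression is an assumption and is not taken from a mode
   shipped with jEdit. Like the Java pattern below, it is written on the
   assumption that it must match the entire previous line.
  </para>
  <programlisting><![CDATA[<PROPS>
    <!-- hypothetical Python-like language -->
    <PROPERTY NAME="indentOpenBrackets" VALUE="([{" />
    <PROPERTY NAME="indentCloseBrackets" VALUE=")]}" />
    <PROPERTY NAME="lineUpClosingBracket" VALUE="true" />
    <!-- a line ending in ":" (with no # comment before the colon)
         increases the indent of the lines that follow it -->
    <PROPERTY NAME="indentNextLines" VALUE="[^#]*:\s*" />
</PROPS>]]></programlisting>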
  <para>
   Here is the complete <literal>&lt;PROPS&gt;</literal> tag for Java mode:
  </para>
  <programlisting><![CDATA[<PROPS>
    <PROPERTY NAME="commentStart" VALUE="/*" />
    <PROPERTY NAME="commentEnd" VALUE="*/" />
    <PROPERTY NAME="lineComment" VALUE="//" />
    <PROPERTY NAME="wordBreakChars" VALUE=",+-=&lt;&gt;/?^&amp;*" />

    <!-- Auto indent -->
    <PROPERTY NAME="indentOpenBrackets" VALUE="{" />
    <PROPERTY NAME="indentCloseBrackets" VALUE="}" />
    <PROPERTY NAME="indentNextLine"
    	VALUE="\s*(((if|while)\s*\(|else\s*|else\s+if\s*\(|for\s*\(.*\))[^{;]*)" />
    <!-- set this to 'true' if you want to use GNU coding style -->
    <PROPERTY NAME="doubleBracketIndent" VALUE="false" />
    <PROPERTY NAME="lineUpClosingBracket" VALUE="true" />
</PROPS>]]></programlisting>
 </sect1>
 <sect1 id="mode-tag-rules"><title>The RULES Tag</title>
  <para>
   <literal>RULES</literal> tags must be placed inside the
   <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
   <firstterm>ruleset</firstterm>. A ruleset consists of a number of
   <firstterm>parser rules</firstterm>, with each parser
   rule specifying how to highlight a specific syntax token. There must
   be at least one ruleset in each edit mode. There can also be more
   than one, with different rulesets being used to highlight different
   parts of a buffer (for example, in HTML mode, one ruleset
   highlights HTML tags, and another highlights inline JavaScript).
   For information about using more
   than one ruleset, see <xref linkend="mode-rule-span" />.
  </para>
  <para>
   The <literal>RULES</literal> tag supports the following attributes, all of
   which are optional:
  </para>
  <itemizedlist>
   <listitem><para><literal>SET</literal> - the name of this ruleset.
   All rulesets other than the first must have a name.
   </para></listitem>
   <listitem><para><literal>IGNORE_CASE</literal> - if set to
   <literal>FALSE</literal>, matches will be case sensitive. Otherwise, case
   will not matter. Default is <literal>TRUE</literal>.
   </para></listitem>
   <listitem><para><literal>NO_WORD_SEP</literal> - any non-alphanumeric
   character <emphasis>not</emphasis> in this list is treated as a word separator
   for the purposes of syntax highlighting.
   </para></listitem>
   <listitem><para><literal>DEFAULT</literal> - the token type for
   text which doesn't match
   any specific rule. Default is <literal>NULL</literal>. See
   <xref linkend="mode-syntax-tokens" /> for a list of token types.
   </para></listitem>
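   <listitem><para><literal>HIGHLIGHT_DIGITS</literal> - if set to
   <literal>TRUE</literal>, numbers in this ruleset are highlighted; see
   below.
   </para></listitem>
   <listitem><para><literal>DIGIT_RE</literal> - a regular expression that
   determines which words containing non-digit characters are highlighted
   as numbers; see below.</para></listitem>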
  </itemizedlist>
  <para>
   Here is an example <literal>RULES</literal> tag:
  </para>
  <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>
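  <para>
   A mode with more than one ruleset might be laid out as follows. This is a
   sketch: the ruleset name <literal>COMMENT</literal> and the choice to
   highlight <literal>TODO</literal> markers are hypothetical. The first
   ruleset has no <literal>SET</literal> attribute and is therefore known as
   <quote>MAIN</quote>. The second ruleset must be named so that the
   <literal>DELEGATE</literal> attribute of the <literal>SPAN</literal> rule
   can refer to it.
  </para>
  <programlisting><![CDATA[<MODE>
    <!-- first ruleset: no SET attribute, known as MAIN;
         NO_WORD_SEP="_" keeps underscores inside words -->
    <RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE" NO_WORD_SEP="_">
        <!-- text inside /* ... */ is handed to the COMMENT ruleset -->
        <SPAN TYPE="COMMENT1" DELEGATE="COMMENT">
            <BEGIN>/*</BEGIN>
            <END>*/</END>
        </SPAN>
    </RULES>

    <!-- second ruleset: must be named; unmatched text gets COMMENT1 -->
    <RULES SET="COMMENT" DEFAULT="COMMENT1">
        <KEYWORDS>
            <LABEL>TODO</LABEL>
        </KEYWORDS>
    </RULES>
</MODE>]]></programlisting>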

  <sect2><title>Highlighting Numbers</title>
   <para>
    If the <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
    <literal>TRUE</literal>, jEdit will attempt to highlight numbers in this
    ruleset.
   </para>
   <para>
    Any word consisting entirely of digits (0-9) will be highlighted with the
    <literal>DIGIT</literal> token type.
    A word that contains other letters in addition to digits will be
    highlighted with the
    <literal>DIGIT</literal> token type only if it matches the regular
    expression specified in the <literal>DIGIT_RE</literal> attribute.
    If this attribute is not specified, it will not be highlighted.
   </para>
   <para>
    Here is an example <literal>DIGIT_RE</literal> regular expression that highlights
    Java-style numeric literals (normal numbers, hexadecimals
    prefixed with <literal>0x</literal>, numbers suffixed with various
    type indicators, and floating point literals containing an exponent):
   </para>
   <programlisting>DIGIT_RE="(0x[[:xdigit:]]+|[[:digit:]]+(e[[:digit:]]*)?)[lLdDfF]?"</programlisting>
   <para>
    Regular expression syntax is described in <xref linkend="regexps" />.
   </para>
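   <para>
    As a sketch, a ruleset for a hypothetical language whose only
    non-decimal literals are hexadecimal might be declared as follows. Under
    the rules above, <literal>255</literal> would be highlighted because it
    consists only of digits. <literal>0xFF</literal> would be highlighted
    because it matches <literal>DIGIT_RE</literal>. A word such as
    <literal>12px</literal> would not be highlighted as a number.
   </para>
   <programlisting><![CDATA[<!-- hypothetical ruleset: all-digit words are always DIGIT;
     other words are DIGIT only if they match DIGIT_RE -->
<RULES HIGHLIGHT_DIGITS="TRUE" DIGIT_RE="0x[[:xdigit:]]+">
    <!-- parser rules go here -->
</RULES>]]></programlisting>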
  </sect2>
  
  <sect2 id="rule-ordering"><title>Rule Ordering Requirements</title>
   <para>
    A very common pitfall when writing your own modes is incorrect rule
    ordering.
   </para>
   <para>
    Since jEdit checks buffer text against parser rules in the order they appear
    in the ruleset, more specific rules must be placed before general ones;
    otherwise, the general rules will match first and catch everything.
   </para>
   <para>
    This is best demonstrated with an example. The following is incorrect rule
    ordering:
   </para>
   <programlisting><![CDATA[<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>]]></programlisting>
   <para>
    If you write the above in a ruleset, any occurrence of <quote>[</quote>
    (even text such as <quote>[!DEFINE</quote>) will be highlighted using
    the first rule, because it will be the first to match. This is most
    likely not the intended behavior.
   </para>
   <para>
    The problem can be solved by placing the more specific rule before the
    general one:
   </para>
   <programlisting><![CDATA[<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>]]></programlisting>
   <para>
    Now, if the buffer contains the text <quote>[!SPECIAL]</quote>, the
    rules will be checked in order, and the first rule will be the first
    to match. However, if you write <quote>[FOO]</quote>, it will be highlighted
    using the second rule, which is exactly what you would expect.
   </para>
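   <para>
    The same requirement applies across different rule types. Consider the C++
    comment rule from the <literal>EOL_SPAN</literal> section and the division
    operator rule from the <literal>SEQ</literal> section. If the operator
    rule were listed first, it would match the first <literal>/</literal> of
    every <literal>//</literal>, and comments would not be highlighted. Here is
    a sketch of the correct order:
   </para>
   <programlisting><![CDATA[<!-- the more specific comment rule comes first... -->
<EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>
<!-- ...so this rule no longer captures the first "/" of "//" -->
<SEQ TYPE="OPERATOR">/</SEQ>]]></programlisting>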
  </sect2>
  <sect2><title>Per-Ruleset Properties</title>
   <para>
    The <literal>PROPS</literal> tag (described in <xref linkend="mode-tag-props"/>)
    can also be placed inside the <literal>RULES</literal> tag to define
    ruleset-specific properties. The following properties can
    be set on a per-ruleset basis:
   </para>
   <itemizedlist>
    <listitem><para><literal>commentEnd</literal> - the comment end
    string.
    </para></listitem>
    <listitem><para><literal>commentStart</literal> - the comment start
    string.
    </para></listitem>
    <listitem><para><literal>lineComment</literal> - the line comment
    string.
    </para></listitem>
   </itemizedlist>
   <para>
    This allows different parts of a file to have different comment strings
    (in HTML mode, for example, HTML text and inline JavaScript are commented
    differently).
    For information about the commenting commands,
    see <xref linkend="commenting"/>.
   </para>
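   <para>
    Here is a sketch of a ruleset for embedded script code that has its own
    comment strings. The ruleset name <literal>JAVASCRIPT</literal> is
    hypothetical, and the <literal>PROPS</literal> tag is assumed to come
    before the parser rules.
   </para>
   <programlisting><![CDATA[<RULES SET="JAVASCRIPT">
    <!-- comment strings for text highlighted by this ruleset -->
    <PROPS>
        <PROPERTY NAME="lineComment" VALUE="//" />
        <PROPERTY NAME="commentStart" VALUE="/*" />
        <PROPERTY NAME="commentEnd" VALUE="*/" />
    </PROPS>
    <!-- parser rules go here -->
</RULES>]]></programlisting>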
  </sect2>
 </sect1>
  <sect1 id="mode-rule-terminate"><title>The TERMINATE Tag</title>
  <para>
   The <literal>TERMINATE</literal> rule, which must be placed inside a
   <literal>RULES</literal> tag, specifies that parsing should stop
   after the specified number of characters have been read from a line. The
   number of characters to terminate after should be specified with the
   <literal>AT_CHAR</literal> attribute. Here is an example:
  </para>
  <programlisting>&lt;TERMINATE AT_CHAR="1" /&gt;</programlisting>
  <para>
   This rule is used in Patch mode, for example, because only the first
   character of each line affects highlighting.
  </para>
 </sect1>
  <sect1 id="mode-rule-span"><title>The SPAN Tag</title>
  <para>
   The <literal>SPAN</literal> rule, which must be placed inside a
   <literal>RULES</literal> tag, highlights text between a start
   and end string. The start and end strings are specified inside
   child elements of the <literal>SPAN</literal> tag.
   The following attributes are supported:
  </para>
  <itemizedlist>
   <listitem><para><literal>TYPE</literal> - The token type to highlight the
   span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
   types.</para></listitem>
   <listitem><para><literal>AT_LINE_START</literal> - If set to
   <literal>TRUE</literal>, the span will only be highlighted if the start
   sequence occurs at the beginning of a line.</para></listitem>
   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
   <literal>TRUE</literal>, the span will only be highlighted if the
   start sequence is the first non-whitespace text in the line.</para>
   </listitem>
   <listitem><para><literal>AT_WORD_START</literal> - If set to
   <literal>TRUE</literal>, the span will only be highlighted if the start
   sequence occurs at the beginning of a word.</para></listitem>
   <listitem><para><literal>DELEGATE</literal> - text inside the span will be
   highlighted with the specified ruleset. To delegate to a ruleset defined
   in the current mode, just specify its name. To delegate to a ruleset
   defined in another mode, specify a name of the form
   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
   Note that the first (unnamed) ruleset in a mode is called
   <quote>MAIN</quote>.</para></listitem>
   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
   <literal>TRUE</literal>, the start and end sequences will not be highlighted,
   only the text between them will.</para></listitem>
   <listitem><para><literal>NO_ESCAPE</literal> - If set to
   <literal>TRUE</literal>, the ruleset's escape character will have no
   effect before the span's end string. Otherwise, the presence of the escape
   character will cause that occurrence of the end string to be ignored.</para></listitem>
   <listitem><para><literal>NO_LINE_BREAK</literal> - If set to
   <literal>TRUE</literal>, the span will not cross line breaks.</para></listitem>
   <listitem><para><literal>NO_WORD_BREAK</literal> - If set to
   <literal>TRUE</literal>, the span will not cross word breaks.</para></listitem>
  </itemizedlist>
  <para>
   Note that the <literal>AT_LINE_START</literal>,
   <literal>AT_WHITESPACE_END</literal> and
   <literal>AT_WORD_START</literal> attributes can also be used on the
   <literal>BEGIN</literal> and <literal>END</literal> elements. Setting these
   attributes to the same value on both elements has the same effect as
   setting them on the <literal>SPAN</literal> element.
  </para>
  <para>
   Here is a <literal>SPAN</literal> that highlights Java string literals,
   which cannot include line breaks:
  </para>
  <programlisting>&lt;SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE"&gt;
  &lt;BEGIN&gt;"&lt;/BEGIN&gt;
  &lt;END&gt;"&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>
  <para>
   Here is a <literal>SPAN</literal> that highlights Java documentation
   comments by delegating to the <quote>JAVADOC</quote> ruleset defined
   elsewhere in the current mode:
  </para>
  <programlisting>&lt;SPAN TYPE="COMMENT2" DELEGATE="JAVADOC"&gt;
  &lt;BEGIN&gt;/**&lt;/BEGIN&gt;
  &lt;END&gt;*/&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>
  <para>
   Here is a <literal>SPAN</literal> that highlights Cascading Style Sheets
   inside HTML <literal>&lt;STYLE&gt;</literal> tags by delegating to the main
   ruleset in the CSS edit mode:
  </para>
  <programlisting>&lt;SPAN TYPE="MARKUP" DELEGATE="css::MAIN"&gt;
  &lt;BEGIN&gt;&amp;lt;style&amp;gt;&lt;/BEGIN&gt;
  &lt;END&gt;&amp;lt;/style&amp;gt;&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>
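  <para>
   The remaining attributes can be combined in the same way. The following
   sketch is not taken from any shipped mode, and both the placeholder syntax
   and the quoting convention are hypothetical. The first rule highlights only
   the text between the braces. The second rule lets a closing quote end the
   string even when it follows the ruleset's escape character, if one is
   defined.
  </para>
  <programlisting><![CDATA[<!-- hypothetical template placeholders: only the name between
     the braces is highlighted, and the span may not cross lines -->
<SPAN TYPE="KEYWORD2" EXCLUDE_MATCH="TRUE" NO_LINE_BREAK="TRUE">
    <BEGIN>{{</BEGIN>
    <END>}}</END>
</SPAN>

<!-- hypothetical SQL-style string: the escape character has
     no effect before the closing quote -->
<SPAN TYPE="LITERAL1" NO_ESCAPE="TRUE">
    <BEGIN>'</BEGIN>
    <END>'</END>
</SPAN>]]></programlisting>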
 </sect1>
 <sect1 id="mode-rule-span-regexp"><title>The SPAN_REGEXP Tag</title>
  <para>
   The <literal>SPAN_REGEXP</literal> rule is similar to the
   <literal>SPAN</literal> rule except the start sequence is taken to be
   a regular expression. In addition to the attributes supported by
   the <literal>SPAN</literal> tag, the
   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
   the first character that
   the regular expression matches. This rules out regular expressions whose
   first character can vary. The regular expression match also cannot span
   more than one line.
  </para>
  <para>
   Text matched by groups in the <literal>BEGIN</literal> regular
   expression can be substituted into the <literal>END</literal> string,
   where <literal>$1</literal> stands for the first group. See
   below for an example of where this is useful.
  </para>
  <para>
   Regular expression syntax is described in <xref linkend="regexps" />.
  </para>
  <para>
   Here is a <literal>SPAN_REGEXP</literal> rule that highlights
   here-documents (<quote>read-ins</quote>) in shell scripts:
  </para>
  <programlisting>&lt;SPAN_REGEXP HASH_CHAR="&lt;" TYPE="LITERAL1" DELEGATE="LITERAL"&gt;
    &lt;BEGIN&gt;&lt;![CDATA[&lt;&lt;[[:space:]'"]*([[:alnum:]_]+)[[:space:]'"]*]]&gt;&lt;/BEGIN&gt;
    &lt;END&gt;$1&lt;/END&gt;
&lt;/SPAN_REGEXP&gt;</programlisting>
  <para>
   Here is a <literal>SPAN_REGEXP</literal> rule that highlights constructs
   placed between <literal>&lt;#ftl</literal> and <literal>&gt;</literal>,
   as long as the <literal>&lt;#ftl</literal> is followed by a word break:
  </para>
  <programlisting><![CDATA[<SPAN_REGEXP TYPE="KEYWORD1" HASH_CHAR="&lt;" DELEGATE="EXPRESSION">
    <BEGIN>&lt;#ftl\&gt;</BEGIN>
    <END>&gt;</END>
</SPAN_REGEXP>]]></programlisting>
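  <para>
   Group substitution also helps when the closing delimiter must repeat part
   of the opening one. The following sketch is for a hypothetical mode. It
   assumes Lua-style long strings, where the closing bracket must contain the
   same number of <literal>=</literal> signs as the opening bracket.
  </para>
  <programlisting><![CDATA[<!-- hypothetical: long strings such as [==[ ... ]==];
     the "=" signs captured by group 1 are reused in END -->
<SPAN_REGEXP TYPE="LITERAL2" HASH_CHAR="[">
    <BEGIN>\[(=*)\[</BEGIN>
    <END>]$1]</END>
</SPAN_REGEXP>]]></programlisting>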
 </sect1>
 <sect1 id="mode-rule-eol-span"><title>The EOL_SPAN Tag</title>
  <para>
   An <literal>EOL_SPAN</literal> is similar to a <literal>SPAN</literal>
   except that highlighting stops at the end of the line, and no end sequence
   needs to be specified. The text to match is specified between the opening and
   closing <literal>EOL_SPAN</literal> tags.
   The following attributes are supported:
  </para>
  <itemizedlist>
  <listitem><para><literal>TYPE</literal> - The token type to highlight the
  span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
  types.</para></listitem>
  <listitem><para><literal>AT_LINE_START</literal> - If set to
  <literal>TRUE</literal>, the span will only be highlighted if the start
  sequence occurs at the beginning of a line.</para></listitem>
  <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
  <literal>TRUE</literal>, the span will only be highlighted if the start
  sequence is the first non-whitespace text in the line.</para>
  </listitem>
  <listitem><para><literal>AT_WORD_START</literal> - If set to
  <literal>TRUE</literal>, the span will only be highlighted if the start
  sequence occurs at the beginning of a word.</para></listitem>
  <listitem><para><literal>DELEGATE</literal> - text inside the span will be
  highlighted with the specified ruleset. To delegate to a ruleset defined
  in the current mode, just specify its name. To delegate to a ruleset
  defined in another mode, specify a name of the form
  <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
  Note that the first (unnamed) ruleset in a mode is called
  <quote>MAIN</quote>.</para></listitem>
  <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
  <literal>TRUE</literal>, the matched sequence will not be highlighted,
  only the text after it will.</para></listitem>
  </itemizedlist>
  <para>
   Here is an <literal>EOL_SPAN</literal> that highlights C++ comments:
  </para>
  <programlisting>&lt;EOL_SPAN TYPE="COMMENT1"&gt;//&lt;/EOL_SPAN&gt;</programlisting>
 </sect1>
 <sect1 id="mode-rule-eol-span-regexp"><title>The EOL_SPAN_REGEXP Tag</title>
  <para>
   The <literal>EOL_SPAN_REGEXP</literal> rule is similar to the
   <literal>EOL_SPAN</literal> rule except the match sequence is taken to be
   a regular expression. In addition to the attributes supported by
   the <literal>EOL_SPAN</literal> tag, the
   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
   the first character that
   the regular expression matches. This rules out regular expressions whose
   first character can vary. The regular expression match also cannot span
   more than one line.
  </para>
  <para>
   Regular expression syntax is described in <xref linkend="regexps" />.
  </para>
  <para>
   Here is an <literal>EOL_SPAN_REGEXP</literal> that highlights MS-DOS batch
   file comments, which start with <literal>REM</literal> followed by a
   whitespace character and extend until the end of the line:
  </para>
  <programlisting><![CDATA[<EOL_SPAN_REGEXP AT_WHITESPACE_END="TRUE" HASH_CHAR="R" TYPE="COMMENT1">REM\s</EOL_SPAN_REGEXP>]]></programlisting>
 </sect1>
 <sect1 id="mode-rule-mark-prev"><title>The MARK_PREVIOUS Tag</title>
  <para>
   The <literal>MARK_PREVIOUS</literal> rule, which must be placed inside a
   <literal>RULES</literal> tag, highlights from the end of the
   previous syntax token to the matched text. The text to match
   is specified between opening and closing <literal>MARK_PREVIOUS</literal>
   tags. The following attributes are supported:
  </para>
  <itemizedlist>
   <listitem><para><literal>TYPE</literal> - The token type to highlight the
   text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
   types.</para></listitem>
   <listitem><para><literal>AT_LINE_START</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
   at the beginning of a line.</para></listitem>
   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if it is
   the first non-whitespace text in the line.</para>
   </listitem>
   <listitem><para><literal>AT_WORD_START</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if
   it occurs at the beginning of a word.</para></listitem>
   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
   <literal>TRUE</literal>, the match will not be highlighted,
   only the text before it will.</para></listitem>
  </itemizedlist>
  <para>
   Here is a rule that highlights labels in Java mode (for example,
   <quote>XXX:</quote>):
  </para>
  <programlisting>&lt;MARK_PREVIOUS AT_WHITESPACE_END="TRUE"
   EXCLUDE_MATCH="TRUE"&gt;:&lt;/MARK_PREVIOUS&gt;</programlisting>
 </sect1>
 <sect1 id="mode-rule-mark-following"><title>The MARK_FOLLOWING Tag</title>
  <para>
   The <literal>MARK_FOLLOWING</literal> rule, which must be placed inside a
   <literal>RULES</literal> tag, highlights from the start of the
   match to the next syntax token. The text to match
   is specified between opening and closing <literal>MARK_FOLLOWING</literal>
   tags. The following attributes are supported:
  </para>
  <itemizedlist>
   <listitem><para><literal>TYPE</literal> - The token type to highlight the
   text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
   types.</para></listitem>
   <listitem><para><literal>AT_LINE_START</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
   at the beginning of a line.</para></listitem>
   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if it is
   the first non-whitespace text in the line.</para>
   </listitem>
   <listitem><para><literal>AT_WORD_START</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if
   it occurs at the beginning of a word.</para></listitem>
   <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
   <literal>TRUE</literal>, the match will not be highlighted,
   only the text after it will.</para></listitem>
  </itemizedlist>
  <para>
   Here is a rule that highlights variables in Unix shell scripts
   (<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, etc.):
  </para>
  <programlisting>&lt;MARK_FOLLOWING TYPE="KEYWORD2"&gt;$&lt;/MARK_FOLLOWING&gt;</programlisting>
 </sect1>
  <sect1 id="mode-rule-seq"><title>The SEQ Tag</title>
  <para>
   The <literal>SEQ</literal> rule, which must be placed inside a
   <literal>RULES</literal> tag, highlights fixed sequences of text. The text
   to highlight is specified between opening and closing <literal>SEQ</literal>
   tags. The following attributes are supported:
  </para>
  <itemizedlist>
   <listitem><para><literal>TYPE</literal> - the token type to highlight the
   sequence with. See <xref linkend="mode-syntax-tokens" /> for a list of token
   types.</para></listitem>
   <listitem><para><literal>AT_LINE_START</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
   at the beginning of a line.</para></listitem>
   <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if it is
   the first non-whitespace text in the line.</para>
   </listitem>
   <listitem><para><literal>AT_WORD_START</literal> - If set to
   <literal>TRUE</literal>, the sequence will only be highlighted if
   it occurs at the beginning of a word.</para></listitem>
   <listitem><para><literal>DELEGATE</literal> - if this attribute is specified,
   all text after the sequence will be highlighted using this ruleset.
   To delegate to a ruleset defined
   in the current mode, just specify its name. To delegate to a ruleset
   defined in another mode, specify a name of the form
   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
   Note that the first (unnamed) ruleset in a mode is called
   <quote>MAIN</quote>.</para></listitem>
  </itemizedlist>
  <para>
   The following rules highlight a few Java operators:
  </para>
  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;+&lt;/SEQ&gt;
&lt;SEQ TYPE="OPERATOR"&gt;-&lt;/SEQ&gt;
&lt;SEQ TYPE="OPERATOR"&gt;*&lt;/SEQ&gt;
&lt;SEQ TYPE="OPERATOR"&gt;/&lt;/SEQ&gt;</programlisting>
 </sect1>
 <sect1 id="mode-rule-seq-regexp"><title>The SEQ_REGEXP Tag</title>
  <para>
   The <literal>SEQ_REGEXP</literal> rule is similar to the
   <literal>SEQ</literal> rule except the match sequence is taken to be
   a regular expression. In addition to the attributes supported by
   the <literal>SEQ</literal> tag, the
   <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
   the first character that
   the regular expression matches. This rules out regular expressions whose
   first character can vary. The regular expression match also cannot span
   more than one line.
  </para>
  <para>
   Here is an example of a <literal>SEQ_REGEXP</literal> rule that highlights
   Perl match operators such as <literal>m/(.+):(\d+):(.+)/</literal>:
  </para>
  <programlisting><![CDATA[<SEQ_REGEXP TYPE="MARKUP"
    HASH_CHAR="m"
    AT_WORD_START="TRUE"
>m([[:punct:]])(?:.*?[^\\])*?\1[sgiexom]*</SEQ_REGEXP>]]></programlisting>

  <para>
   Regular expression syntax is described in <xref linkend="regexps" />.
  </para>
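  <para>
   A simpler sketch for a hypothetical mode highlights CSS-style colour
   literals such as <literal>#ff8800</literal>. <literal>HASH_CHAR</literal>
   is <literal>#</literal> because every match starts with that character.
  </para>
  <programlisting><![CDATA[<!-- hypothetical: "#" followed by one or more hex digits -->
<SEQ_REGEXP TYPE="LITERAL2" HASH_CHAR="#">#[[:xdigit:]]+</SEQ_REGEXP>]]></programlisting>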
 </sect1>
 <sect1 id="mode-rule-import"><title>The IMPORT Tag</title>
  <para>
   The <literal>IMPORT</literal> tag, which must be placed inside a <literal>RULES</literal> tag, loads all rules defined in a given ruleset into the current ruleset, as if they had been copied and pasted into it.
  </para>
  <para>
   The <literal>DELEGATE</literal> attribute is required and must be set to the name of a ruleset. To import a ruleset defined
   in the current mode, just specify its name. To import a ruleset
   defined in another mode, specify a name of the form
   <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
   Note that the first (unnamed) ruleset in a mode is called
   <quote>MAIN</quote>.
  </para>
  <para>
   One quirk is that the imported rules are not inserted at the location of the <literal>IMPORT</literal> tag, but at the end of the containing ruleset. This has implications for rule ordering; see <xref linkend="rule-ordering"/>.
  </para>
  <para>
   Here is an example from the PHP mode, which extends the inline JavaScript highlighting to support embedded PHP:
  </para>
  <programlisting><![CDATA[<RULES SET="JAVASCRIPT+PHP">

    <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
        <BEGIN>&lt;?php</BEGIN>
        <END>?&gt;</END>
    </SPAN>

    <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
        <BEGIN>&lt;?</BEGIN>
        <END>?&gt;</END>
    </SPAN>

    <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
        <BEGIN>&lt;%=</BEGIN>
        <END>%&gt;</END>
    </SPAN>

    <IMPORT DELEGATE="javascript::MAIN"/>
</RULES>]]></programlisting>
 </sect1>
 <sect1 id="mode-rule-keywords"><title>The KEYWORDS Tag</title>
  <para>
   The <literal>KEYWORDS</literal> tag, which must be placed inside a
   <literal>RULES</literal> tag and can appear only once in each ruleset,
   specifies a list of keywords to highlight.
   Keywords are similar to <literal>SEQ</literal>s, except that
   <literal>SEQ</literal>s match anywhere in the text, whereas keywords only
   match whole words. Words are runs of text separated by non-alphanumeric
   characters (other than those listed in the ruleset's
   <literal>NO_WORD_SEP</literal> attribute).
  </para>
  <para>
   The <literal>KEYWORDS</literal> tag does not define any attributes.
  </para>
  <para>
   Each child element of the <literal>KEYWORDS</literal> tag is an element
   whose name is a token type, and whose content is the keyword to
   highlight. For example, the following rule highlights the most common Java
   keywords:
  </para>
  <programlisting>&lt;KEYWORDS&gt;
  &lt;KEYWORD1&gt;if&lt;/KEYWORD1&gt;
  &lt;KEYWORD1&gt;else&lt;/KEYWORD1&gt;
  &lt;KEYWORD3&gt;int&lt;/KEYWORD3&gt;
  &lt;KEYWORD3&gt;void&lt;/KEYWORD3&gt;
&lt;/KEYWORDS&gt;</programlisting>
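  <para>
   Here is a sketch of a ruleset for a hypothetical SQL-like language.
   It assumes that the ruleset's <literal>IGNORE_CASE</literal> attribute
   applies to keywords in the same way as it does to other matches. Under
   that assumption, keywords can be listed once in upper case and still match
   text written in any case.
  </para>
  <programlisting><![CDATA[<!-- hypothetical SQL-like ruleset: with IGNORE_CASE="TRUE" (the
     default), "select", "SELECT" and "Select" all match KEYWORD1 -->
<RULES IGNORE_CASE="TRUE">
    <KEYWORDS>
        <KEYWORD1>SELECT</KEYWORD1>
        <KEYWORD1>FROM</KEYWORD1>
        <KEYWORD1>WHERE</KEYWORD1>
        <LITERAL2>NULL</LITERAL2>
    </KEYWORDS>
</RULES>]]></programlisting>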
 </sect1>
  <sect1 id="mode-syntax-tokens"><title>Token Types</title>
  <para>
   Parser rules can highlight tokens using any of the following token
   types:
  </para>
  <itemizedlist>
  <listitem><para><literal>NULL</literal> - no special
  highlighting is performed on tokens of type <literal>NULL</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT1</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT2</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT3</literal>
  </para></listitem>
  <listitem><para><literal>COMMENT4</literal>
  </para></listitem>
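  <listitem><para><literal>DIGIT</literal> - used for numbers when the
  ruleset's <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
  <literal>TRUE</literal>
  </para></listitem>
  <listitem><para><literal>FUNCTION</literal>
  </para></listitem>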
  <listitem><para><literal>INVALID</literal><!--  - tokens of this type are
  automatically added if a <literal>NO_WORD_BREAK</literal> or
  <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
  one word or line, respectively. -->
  </para></listitem>
  <listitem><para><literal>KEYWORD1</literal>
  </para></listitem>
  <listitem><para><literal>KEYWORD2</literal>
  </para></listitem>
  <listitem><para><literal>KEYWORD3</literal>
  </para></listitem>
  <listitem><para><literal>KEYWORD4</literal>
  </para></listitem>
  <listitem><para><literal>LABEL</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL1</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL2</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL3</literal>
  </para></listitem>
  <listitem><para><literal>LITERAL4</literal>
  </para></listitem>
  <listitem><para><literal>MARKUP</literal>
  </para></listitem>
  <listitem><para><literal>OPERATOR</literal>
  </para></listitem>
  </itemizedlist>
 </sect1>
</chapter>