<!-- jEdit buffer-local properties: -->
<!-- :indentSize=1:noTabs=true:folding=indent:collapseFolds=1: -->

<chapter id="writing-modes"><title>Writing Edit Modes</title>
 <para>
  Edit modes are defined using XML, the <firstterm>extensible markup
  language</firstterm>; mode files have the extension
  <filename>.xml</filename>. XML is a very simple language, and as a result
  edit modes are easy to create and modify. This chapter
  starts with a short XML primer, followed by detailed information about
  each supported tag and highlighting rule.
 </para>
 <sidebar><title>Changes to modes take effect immediately</title>
  <para>
   When a mode file or a mode catalog file is edited within jEdit, the
   edit modes are reloaded automatically as soon as the file is saved.
  </para>
  <para>
   <guimenu>Utilities</guimenu>&gt;<guimenuitem>Reload Edit Modes</guimenuitem>
   can be used to reload edit modes after changes to mode files are made
   outside jEdit.
  </para>
 </sidebar>
 <sect1 id="xml-primer"><title>An XML Primer</title>
  <para>
   A very simple edit mode looks like so:
  </para>
  <programlisting><![CDATA[<?xml version="1.0"?>

<!DOCTYPE MODE SYSTEM "xmode.dtd">

<MODE>
    <PROPS>
        <PROPERTY NAME="commentStart" VALUE="/*" />
        <PROPERTY NAME="commentEnd" VALUE="*/" />
    </PROPS>

    <RULES>
        <SPAN TYPE="COMMENT1">
            <BEGIN>/*</BEGIN>
            <END>*/</END>
        </SPAN>
    </RULES>
</MODE>]]></programlisting>
  <para>
   Note that each opening tag must have a corresponding closing tag.
   If there is nothing between the opening and closing tags, for example
   <literal>&lt;TAG&gt;&lt;/TAG&gt;</literal>, the shorthand notation
   <literal>&lt;TAG /&gt;</literal> may be used. An example of this shorthand
   can be seen
   in the <literal>&lt;PROPERTY&gt;</literal> tags above.
  </para>
  <para>
   XML is case sensitive. <literal>Span</literal> or <literal>span</literal>
   is not the same as <literal>SPAN</literal>.
  </para>
  <para>
   To insert a special character such as &lt; or &gt; literally in XML
   (for example, inside an attribute value), you must write it as
   an <firstterm>entity</firstterm>. An
   entity consists of the character's symbolic name enclosed between
   <quote>&amp;</quote> and <quote>;</quote>. A full list of entities is beyond
   the scope of this section, but the most important are:
  </para>
  <itemizedlist>
   <listitem><para><literal>&amp;lt;</literal> - The less-than (&lt;)
   character</para></listitem>
   <listitem><para><literal>&amp;gt;</literal> - The greater-than (&gt;)
   character</para></listitem>
   <listitem><para><literal>&amp;amp;</literal> - The ampersand (&amp;)
   character</para></listitem>
  </itemizedlist>
  <para>
   For example, the following will cause a syntax error, because a bare
   <quote>&amp;</quote> is read as the start of an entity:
  </para>
  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;&lt;/SEQ&gt;</programlisting>
  <para>
   Instead, you must write:
  </para>
  <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;amp;&lt;/SEQ&gt;</programlisting>
  <para>
   Now that the basics of XML have been covered, the rest of this
   chapter will cover each construct in detail.
  </para>
 </sect1>
 <sect1 id="mode-preamble"><title>The Preamble and MODE Tag</title>
   <para>
    Each mode definition must begin with the following:
   </para>
   <programlisting>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE MODE SYSTEM "xmode.dtd"&gt;</programlisting>
  <para>
   Each mode definition must also contain exactly one <literal>MODE</literal>
   tag. All other tags (<literal>PROPS</literal>, <literal>RULES</literal>)
   must be placed inside the <literal>MODE</literal> tag.
  </para>
 </sect1>
 <sect1 id="mode-tag-props"><title>The PROPS Tag</title>
  <para>
   The <literal>PROPS</literal> tag and the <literal>PROPERTY</literal> tags
   inside it are used to define mode-specific
   properties. Each <literal>PROPERTY</literal> tag must have a
   <literal>NAME</literal> attribute set to the property's name, and a
   <literal>VALUE</literal> attribute with the property's value.
  </para>
  <para>
   All buffer-local properties listed in <xref linkend="buffer-local" />
   may be given values in edit modes. In addition, the following mode
   properties have no buffer-local equivalent:
  </para>
  <itemizedlist>
   <listitem><para><literal>indentCloseBrackets</literal> -
   A list of characters (usually brackets) that subtract indent from
   the <emphasis>current</emphasis> line. For example, in Java mode this
   property is set to <quote>}</quote>.</para></listitem>
   <listitem><para><literal>indentOpenBrackets</literal> -
   A list of characters (usually brackets) that add indent to
   the <emphasis>next</emphasis> line. For example, in Java mode this
   property is set to <quote>{</quote>.</para></listitem>
   <listitem><para><literal>indentPrevLine</literal> -
   When indenting a line, jEdit checks if the previous line matches
   the regular expression stored in this property. If it does, a level
   of indent is added. For example, in Java mode this regular expression
   matches language constructs such as
   <quote>if</quote>, <quote>else</quote>, <quote>while</quote>, etc.</para>
   </listitem>
   <listitem><para><literal>doubleBracketIndent</literal> -
   If a line matches the <literal>indentPrevLine</literal> regular
   expression and the next line contains an opening bracket,
   a level of indent will not be added to the next line, unless
   this property is set to <quote>true</quote>. For example, with this
   property set to <quote>false</quote>, Java code will be indented like so:
   </para>
   <programlisting>while(objects.hasMoreElements())
{
        ((Drawable)objects.nextElement()).draw();
}</programlisting>
   <para>
     On the other hand, setting this property to <quote>true</quote> will
     give the following result:
   </para>
     <programlisting>while(objects.hasMoreElements())
        {
                ((Drawable)objects.nextElement()).draw();
        }</programlisting></listitem>
  </itemizedlist>
  <para>
   Here is the complete <literal>&lt;PROPS&gt;</literal> tag for Java mode:
  </para>
  <programlisting>&lt;PROPS&gt;
    &lt;PROPERTY NAME="indentOpenBrackets" VALUE="{" /&gt;
    &lt;PROPERTY NAME="indentCloseBrackets" VALUE="}" /&gt;
    &lt;PROPERTY NAME="indentPrevLine" VALUE="\s*(((if|while)
        \s*\(|else|case|default)[^;]*|for\s*\(.*)" /&gt;
    &lt;PROPERTY NAME="doubleBracketIndent" VALUE="false" /&gt;
    &lt;PROPERTY NAME="commentStart" VALUE="/*" /&gt;
    &lt;PROPERTY NAME="commentEnd" VALUE="*/" /&gt;
    &lt;PROPERTY NAME="blockComment" VALUE="//" /&gt;
    &lt;PROPERTY NAME="wordBreakChars" VALUE=",+-=&lt;&gt;/?^&amp;*" /&gt;
&lt;/PROPS&gt;</programlisting>
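  <para>
   Buffer-local properties can be given mode-wide values in the same way.
   The following is a minimal sketch for a hypothetical mode, not taken from
   any mode shipped with jEdit. It assumes that
   <literal>indentSize</literal> and <literal>noTabs</literal> are among the
   properties listed in <xref linkend="buffer-local" />, and the values are
   arbitrary:
  </para>
  <programlisting><![CDATA[<PROPS>
    <!-- Hypothetical values, chosen only for illustration -->
    <PROPERTY NAME="indentSize" VALUE="4" />
    <PROPERTY NAME="noTabs" VALUE="true" />
</PROPS>]]></programlisting>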
 </sect1>
 <sect1 id="mode-tag-rules"><title>The RULES Tag</title>
  <para>
   <literal>RULES</literal> tags must be placed inside the
   <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
   <firstterm>ruleset</firstterm>. A ruleset consists of a number of
   <firstterm>parser rules</firstterm>, with each parser
   rule specifying how to highlight a specific syntax token. There must
   be at least one ruleset in each edit mode. There can also be more
   than one, with different rulesets being used to highlight different
   parts of a buffer (for example, in HTML mode, one ruleset
   highlights HTML tags, and another highlights inline JavaScript).
   For information about using more
   than one ruleset, see <xref linkend="mode-rule-span" />.
  </para>
  <para>
   The <literal>RULES</literal> tag supports the following attributes, all of
   which are optional:
  </para>
  <itemizedlist>
   <listitem><para><literal>SET</literal> - the name of this ruleset.
   All rulesets other than the first must have a name.
   </para></listitem>
   <listitem><para><literal>HIGHLIGHT_DIGITS</literal> - if set to
   <literal>TRUE</literal>, digits (0-9, as well as hexadecimal literals
   prefixed with <quote>0x</quote>) will be highlighted with the
   <classname>DIGIT</classname> token type. Default is <literal>FALSE</literal>.
   </para></listitem>
   <listitem><para><literal>IGNORE_CASE</literal> - if set to
   <literal>FALSE</literal>, matches will be case sensitive. Otherwise, case
   will not matter. Default is <literal>TRUE</literal>.
   </para></listitem>
   <listitem><para><literal>DEFAULT</literal> - the token type for
   text which doesn't match
   any specific rule. Default is <literal>NULL</literal>. See
   <xref linkend="mode-syntax-tokens" /> for a list of token types.
   </para></listitem>
  </itemizedlist>
  <para>
   Here is an example <literal>RULES</literal> tag:
  </para>
  <programlisting>&lt;RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>
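  <para>
   Any ruleset after the first needs a name. The following sketch shows what
   such a ruleset might look like. The name <quote>JAVADOC</quote> and the
   choice of <literal>DEFAULT</literal> token type are illustrative
   assumptions, not a copy of the actual Java mode:
  </para>
  <programlisting>&lt;!-- Text matching no rule in this ruleset gets the COMMENT2 type --&gt;
&lt;RULES SET="JAVADOC" DEFAULT="COMMENT2"&gt;
    <replaceable>... parser rules go here ...</replaceable>
&lt;/RULES&gt;</programlisting>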
  <sidebar><title>Rule Ordering Requirements</title>
   <para>
    You might encounter this very common pitfall when writing your own modes.
   </para>
   <para>
    Since jEdit checks buffer text against parser rules in the order they appear
    in the ruleset, more specific rules must be placed before generalized ones,
    otherwise the generalized rules will catch everything.
   </para>
   <para>
    This is best demonstrated with an example. The following is incorrect rule
    ordering:
   </para>
   <programlisting><![CDATA[<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>]]></programlisting>
   <para>
    If you write the above in a ruleset, any occurrence of <quote>[</quote>
    (even things like <quote>[!DEFINE</quote>, etc.)
    will be highlighted using the first rule, because it will be the
    first to match. This is most likely not the intended behavior.
   </para>
   <para>
    The problem can be solved by placing the more specific rule before the
    general one:
   </para>
   <programlisting><![CDATA[<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>]]></programlisting>
   <para>
    Now, if the buffer contains the text <quote>[!SPECIAL]</quote>, the
    rules will be checked in order, and the first rule will be the first
    to match. However, if you write <quote>[FOO]</quote>, it will be highlighted
    using the second rule, which is exactly what you would expect.
   </para>
  </sidebar>
  <sect2 id="mode-rule-terminate"><title>The TERMINATE Rule</title>
   <para>
    The <literal>TERMINATE</literal> rule specifies that parsing should stop
    after the specified number of characters have been read from a line. The
    number of characters to terminate after should be specified with the
    <literal>AT_CHAR</literal> attribute. Here is an example:
   </para>
   <programlisting>&lt;TERMINATE AT_CHAR="1" /&gt;</programlisting>
   <para>
    This rule is used in Patch mode, for example, because only the first
    character of each line affects highlighting. 
   </para>
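   <para>
    As a rough sketch of how this might be used (an illustration only, not
    the actual Patch mode ruleset), the rules below stop parsing after the
    first character of each line and highlight whole lines according to
    whether they begin with <quote>+</quote> or <quote>-</quote>. The token
    types are arbitrary choices; <literal>EOL_SPAN</literal> is described in
    <xref linkend="mode-rule-eol-span" />:
   </para>
   <programlisting><![CDATA[<RULES>
    <!-- Only the first character of each line is examined -->
    <TERMINATE AT_CHAR="1" />

    <!-- Hypothetical rules; token types chosen for illustration -->
    <EOL_SPAN TYPE="LITERAL1" AT_LINE_START="TRUE">+</EOL_SPAN>
    <EOL_SPAN TYPE="LITERAL2" AT_LINE_START="TRUE">-</EOL_SPAN>
</RULES>]]></programlisting>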
  </sect2>
  <sect2 id="mode-rule-whitespace"><title>The WHITESPACE Rule</title>
   <para>
    The <literal>WHITESPACE</literal> rule specifies characters which are to
    be treated as keyword delimiters.
    Most rulesets will have <literal>WHITESPACE</literal> tags for spaces and
    tabs. Here is an example:
   </para>
   <programlisting>&lt;WHITESPACE&gt; &lt;/WHITESPACE&gt;
&lt;WHITESPACE&gt;        &lt;/WHITESPACE&gt;</programlisting>
  </sect2>
  <sect2 id="mode-rule-span"><title>The SPAN Rule</title>
   <para>
    The <literal>SPAN</literal> rule highlights text between a start
    and end string. The start and end strings are specified inside
    child elements of the <literal>SPAN</literal> tag.
    The following attributes are supported:
   </para>
   <itemizedlist>
    <listitem><para><literal>TYPE</literal> - The token type to highlight the
    span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
    types</para></listitem>
    <listitem><para><literal>AT_LINE_START</literal> - If set to
    <literal>TRUE</literal>, the span will only be highlighted if the start
    sequence occurs at the beginning of a line</para></listitem>
    <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
    <literal>TRUE</literal>, the start and end sequences will not be highlighted,
    only the text between them will</para></listitem>
    <listitem><para><literal>NO_LINE_BREAK</literal> - If set to
    <literal>TRUE</literal>, the span will be highlighted with the
    <classname>INVALID</classname> token type if it spans more than one
    line</para></listitem>
    <listitem><para><literal>NO_WORD_BREAK</literal> - If set to
    <literal>TRUE</literal>, the span will be highlighted with the
    <classname>INVALID</classname> token type if it includes
    whitespace</para></listitem>
    <listitem><para><literal>DELEGATE</literal> - text inside the span will be
    highlighted with the specified ruleset. To delegate to a ruleset defined
    in the current mode, just specify its name. To delegate to a ruleset
    defined in another mode, specify a name of the form
    <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
    Note that the first (unnamed) ruleset in a mode is called
    <quote>MAIN</quote>.</para></listitem>
   </itemizedlist>
   <note>
    <para>
     Do not delegate to rulesets that define a <literal>TERMINATE</literal> rule
     (examples of such rulesets include <literal>text::MAIN</literal> and
     <literal>patch::MAIN</literal>). Delegation to such rulesets does not work.
    </para>
   </note>
   <para>
    Here is a <literal>SPAN</literal> that highlights Java string literals,
    which cannot include line breaks:
   </para>
   <programlisting>&lt;SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE"&gt;
   &lt;BEGIN&gt;"&lt;/BEGIN&gt;
   &lt;END&gt;"&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>
   <para>
    Here is a <literal>SPAN</literal> that highlights Java documentation
    comments by delegating to the <quote>JAVADOC</quote> ruleset defined
    elsewhere in the current mode:
   </para>
   <programlisting>&lt;SPAN TYPE="COMMENT2" DELEGATE="JAVADOC"&gt;
   &lt;BEGIN&gt;/**&lt;/BEGIN&gt;
   &lt;END&gt;*/&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>
   <para>
    Here is a <literal>SPAN</literal> that highlights cascading stylesheets
    inside HTML <literal>&lt;STYLE&gt;</literal> tags by delegating to the main
    ruleset in the CSS edit mode:
   </para>
   <programlisting>&lt;SPAN TYPE="MARKUP" DELEGATE="css::MAIN"&gt;
   &lt;BEGIN&gt;&amp;lt;style&amp;gt;&lt;/BEGIN&gt;
   &lt;END&gt;&amp;lt;/style&amp;gt;&lt;/END&gt;
&lt;/SPAN&gt;</programlisting>
   <tip>
    <para>
     The <literal>&lt;END&gt;</literal> tag is optional. If it is not specified,
     any occurrence of the start string will cause the remainder of the buffer
     to be highlighted with this rule.
    </para>
    <para>
     This can be very useful when combined with delegation.
    </para>
   </tip>
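   <para>
    For instance, the following sketch hands the rest of the buffer over to
    another ruleset once a marker is seen. Both the marker text
    <quote>__DATA__</quote> and the ruleset name <quote>DATA</quote> are
    hypothetical and would have to be defined by your own mode:
   </para>
   <programlisting><![CDATA[<!-- No END tag: everything after the marker is parsed with the DATA ruleset -->
<SPAN TYPE="KEYWORD2" DELEGATE="DATA">
    <BEGIN>__DATA__</BEGIN>
</SPAN>]]></programlisting>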
  </sect2>
  <sect2 id="mode-rule-eol-span"><title>The EOL_SPAN Rule</title>
   <para>
    An <literal>EOL_SPAN</literal> is similar to a <literal>SPAN</literal>
    except that highlighting stops at the end of the line, not after the end
    sequence is found. The text to match is specified between the opening and
    closing <literal>EOL_SPAN</literal> tags.
    The following attributes are supported:
   </para>
   <itemizedlist>
    <listitem><para><literal>TYPE</literal> - The token type to highlight the span
    with. See <xref linkend="mode-syntax-tokens" /> for a list of token
    types</para></listitem>
    <listitem><para><literal>AT_LINE_START</literal> - If set to
    <literal>TRUE</literal>, the span will only be highlighted if the start
    sequence occurs at the beginning of a line</para></listitem>
    <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
    <literal>TRUE</literal>, the start sequence will not be highlighted,
    only the text after it will</para></listitem>
   </itemizedlist>
   <para>
    Here is an <literal>EOL_SPAN</literal> that highlights C++ comments:
   </para>
   <programlisting>&lt;EOL_SPAN TYPE="COMMENT1"&gt;//&lt;/EOL_SPAN&gt;</programlisting>
  </sect2>
  <sect2 id="mode-rule-mark-prev"><title>The MARK_PREVIOUS Rule</title>
   <para>
    The <literal>MARK_PREVIOUS</literal> rule highlights from the end of the
    previous syntax token to the matched text. The text to match
    is specified between opening and closing <literal>MARK_PREVIOUS</literal>
    tags. The following attributes are supported:
   </para>
   <itemizedlist>
    <listitem><para><literal>TYPE</literal> - The token type to highlight the
    text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
    types</para></listitem>
    <listitem><para><literal>AT_LINE_START</literal> - If set to
    <literal>TRUE</literal>,
    the text will only be highlighted if it occurs at the beginning of
    the line</para></listitem>
    <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
    <literal>TRUE</literal>, the match will not be highlighted,
    only the text before it will</para></listitem>
   </itemizedlist>
   <para>
    Here is a rule that highlights labels in Java mode (for example,
    <quote>XXX:</quote>):
   </para>
   <programlisting>&lt;MARK_PREVIOUS TYPE="LABEL" AT_LINE_START="TRUE"
    EXCLUDE_MATCH="TRUE"&gt;:&lt;/MARK_PREVIOUS&gt;</programlisting>
  </sect2>
  <sect2 id="mode-rule-mark-following"><title>The MARK_FOLLOWING Rule</title>
   <para>
    The <literal>MARK_FOLLOWING</literal> rule highlights from the start of the
    match to the next syntax token. The text to match
    is specified between opening and closing <literal>MARK_FOLLOWING</literal>
    tags. The following attributes are supported:
   </para>
   <itemizedlist>
    <listitem><para><literal>TYPE</literal> - The token type to highlight the
    text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
    types</para></listitem>
    <listitem><para><literal>AT_LINE_START</literal> - If set to
    <literal>TRUE</literal>, the text will only be highlighted if the start
    sequence occurs at the beginning of a line</para></listitem>
    <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
    <literal>TRUE</literal>, the match will not be highlighted,
    only the text after it will</para></listitem>
   </itemizedlist>
   <para>
    Here is a rule that highlights variables in Unix shell scripts
    (<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, etc.):
   </para>
   <programlisting>&lt;MARK_FOLLOWING TYPE="KEYWORD2"&gt;$&lt;/MARK_FOLLOWING&gt;</programlisting>
  </sect2>
  <sect2 id="mode-rule-seq"><title>The SEQ Rule</title>
   <para>
    The <literal>SEQ</literal> rule highlights fixed sequences of text. The text
    to highlight is specified between opening and closing <literal>SEQ</literal>
    tags. The following attributes are supported:
   </para>
   <itemizedlist>
    <listitem><para><literal>TYPE</literal> - the token type to highlight the
    sequence with. See <xref linkend="mode-syntax-tokens" /> for a list of token
    types</para></listitem>
    <listitem><para><literal>AT_LINE_START</literal> - If set to
    <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
    at the beginning of a line</para></listitem>
   </itemizedlist>
   <para>
    The following rules highlight a few Java operators:
   </para>
   <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;+&lt;/SEQ&gt;
&lt;SEQ TYPE="OPERATOR"&gt;-&lt;/SEQ&gt;
&lt;SEQ TYPE="OPERATOR"&gt;*&lt;/SEQ&gt;
&lt;SEQ TYPE="OPERATOR"&gt;/&lt;/SEQ&gt;</programlisting>
  </sect2>
  <sect2 id="mode-rule-keywords"><title>The KEYWORDS Rule</title>
   <para>
    There can only be one <literal>KEYWORDS</literal> tag per ruleset.
    The <literal>KEYWORDS</literal> rule defines keywords to highlight.
    Keywords are similar to <literal>SEQ</literal>s, except that
    <literal>SEQ</literal>s match anywhere in the text, whereas keywords only
    match whole words.
   </para>
   <para>
    The <literal>KEYWORDS</literal> tag supports only one attribute,
    <literal>IGNORE_CASE</literal>. If set to <literal>FALSE</literal>, keywords
    will be case sensitive. Otherwise, case will not matter. Default is
    <literal>TRUE</literal>.
   </para>
   <para>
    Each child element of the <literal>KEYWORDS</literal> tag should be named
    after the desired token type, with the keyword text between the start and
    end tags. For example, the following rule highlights a few common Java
    keywords:
   </para>
   <programlisting>&lt;KEYWORDS IGNORE_CASE="FALSE"&gt;
   &lt;KEYWORD1&gt;if&lt;/KEYWORD1&gt;
   &lt;KEYWORD1&gt;else&lt;/KEYWORD1&gt;
   &lt;KEYWORD3&gt;int&lt;/KEYWORD3&gt;
   &lt;KEYWORD3&gt;void&lt;/KEYWORD3&gt;
&lt;/KEYWORDS&gt;</programlisting>
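   <para>
    Because child elements are named after token types, it should also be
    possible to use types other than the <literal>KEYWORD</literal> ones.
    The following sketch assumes you want boolean and null literals to stand
    out from ordinary keywords; the particular words and types are
    illustrative:
   </para>
   <programlisting><![CDATA[<KEYWORDS IGNORE_CASE="FALSE">
    <KEYWORD1>return</KEYWORD1>
    <!-- Literal-like words use a literal token type instead -->
    <LITERAL2>true</LITERAL2>
    <LITERAL2>false</LITERAL2>
    <LITERAL2>null</LITERAL2>
</KEYWORDS>]]></programlisting>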
  </sect2>
  <sect2 id="mode-syntax-tokens"><title>Token Types</title>
   <para>
    Parser rules can highlight tokens using any of the following token
    types:
   </para>
   <itemizedlist>
   <listitem><para><literal>NULL</literal> - no special
   highlighting is performed on tokens of type <literal>NULL</literal>
   </para></listitem>
   <listitem><para><literal>COMMENT1</literal>
   </para></listitem>
   <listitem><para><literal>COMMENT2</literal>
   </para></listitem>
   <listitem><para><literal>FUNCTION</literal>
   </para></listitem>
   <listitem><para><literal>INVALID</literal> - tokens of this type are
   automatically added if a <literal>NO_WORD_BREAK</literal> or
   <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
   one word or line, respectively.
   </para></listitem>
   <listitem><para><literal>KEYWORD1</literal>
   </para></listitem>
   <listitem><para><literal>KEYWORD2</literal>
   </para></listitem>
   <listitem><para><literal>KEYWORD3</literal>
   </para></listitem>
   <listitem><para><literal>LABEL</literal>
   </para></listitem>
   <listitem><para><literal>LITERAL1</literal>
   </para></listitem>
   <listitem><para><literal>LITERAL2</literal>
   </para></listitem>
   <listitem><para><literal>MARKUP</literal>
   </para></listitem>
   <listitem><para><literal>OPERATOR</literal>
   </para></listitem>
   </itemizedlist>
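   <para>
    To show several token types working together, here is a minimal sketch
    of a complete mode for a hypothetical configuration-file language. The
    language, its comment character and its keywords are invented for
    illustration. Only tags and attributes described in this chapter are
    used:
   </para>
   <programlisting><![CDATA[<?xml version="1.0"?>

<!DOCTYPE MODE SYSTEM "xmode.dtd">

<MODE>
    <PROPS>
        <PROPERTY NAME="blockComment" VALUE="#" />
    </PROPS>

    <RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
        <!-- Spaces delimit keywords -->
        <WHITESPACE> </WHITESPACE>

        <!-- Comments run from "#" to the end of the line -->
        <EOL_SPAN TYPE="COMMENT1">#</EOL_SPAN>

        <!-- Quoted strings may not span lines -->
        <SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
            <BEGIN>"</BEGIN>
            <END>"</END>
        </SPAN>

        <SEQ TYPE="OPERATOR">=</SEQ>

        <KEYWORDS IGNORE_CASE="FALSE">
            <KEYWORD1>include</KEYWORD1>
            <LITERAL2>true</LITERAL2>
            <LITERAL2>false</LITERAL2>
        </KEYWORDS>
    </RULES>
</MODE>]]></programlisting>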
  </sect2>
 </sect1>
</chapter>