- <!-- jEdit buffer-local properties: -->
- <!-- :indentSize=1:noTabs=true: -->
- <!-- :xml.root=users-guide.xml: -->
- <chapter id="writing-modes"><title>Mode Definition Syntax</title>
- <para>
- Edit modes are defined using XML, the <firstterm>extensible markup
- language</firstterm>; mode files have the extension
- <filename>.xml</filename>. XML is a very simple language, and as a result
- edit modes are easy to create and modify. This chapter starts with a
- short XML primer, followed by detailed information about each supported
- tag and highlighting rule.
- </para>
- <para>
- Editing a mode or a mode catalog file within jEdit will cause the
- changes to take effect immediately. If you edit modes using another
- application, the changes will take effect after the
- <guimenu>Utilities</guimenu>><guimenuitem>Reload Edit Modes</guimenuitem>
- command is invoked.
- </para>
- <sect1 id="xml-primer"><title>An XML Primer</title>
- <para>
- A very simple XML file (which also happens to be an edit mode) looks like so:
- </para>
- <programlisting><![CDATA[<?xml version="1.0"?>
- <!DOCTYPE MODE SYSTEM "xmode.dtd">
- <MODE>
-     <PROPS>
-         <PROPERTY NAME="commentStart" VALUE="/*" />
-         <PROPERTY NAME="commentEnd" VALUE="*/" />
-     </PROPS>
-     <RULES>
-         <SPAN TYPE="COMMENT1">
-             <BEGIN>/*</BEGIN>
-             <END>*/</END>
-         </SPAN>
-     </RULES>
- </MODE>]]></programlisting>
- <para>
- Note that each opening tag must have a corresponding closing tag.
- If there is nothing between the opening and closing tags, for example
- <literal>&lt;TAG&gt;&lt;/TAG&gt;</literal>, the shorthand notation
- <literal>&lt;TAG /&gt;</literal> may be used. An example of this shorthand
- can be seen in the <literal>&lt;PROPERTY&gt;</literal> tags above.
- </para>
- <para>
- XML is case sensitive. <literal>Span</literal> or <literal>span</literal>
- is not the same as <literal>SPAN</literal>.
- </para>
- <para>
- To insert a special character such as &lt; or &amp; literally in XML
- (for example, inside an attribute value), you must write it as
- an <firstterm>entity</firstterm>. An
- entity consists of the character's symbolic name enclosed between
- <quote>&amp;</quote> and <quote>;</quote>. The most frequently used entities
- are:
- </para>
- <itemizedlist>
- <listitem><para><literal>&amp;lt;</literal> - The less-than (&lt;)
- character</para></listitem>
- <listitem><para><literal>&amp;gt;</literal> - The greater-than (&gt;)
- character</para></listitem>
- <listitem><para><literal>&amp;amp;</literal> - The ampersand (&amp;)
- character</para></listitem>
- </itemizedlist>
- <para>
- For example, the following will cause a syntax error:
- </para>
- <programlisting><![CDATA[<SEQ TYPE="OPERATOR">&</SEQ>]]></programlisting>
- <para>
- Instead, you must write:
- </para>
- <programlisting><![CDATA[<SEQ TYPE="OPERATOR">&amp;</SEQ>]]></programlisting>
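- <para>
- An alternative to writing entities is a <firstterm>CDATA section</firstterm>,
- inside which everything up to the closing <literal>]]&gt;</literal> is taken
- literally. As far as the XML parser is concerned, the following two rules
- are equivalent; both would match the <literal>&amp;&amp;</literal> operator:
- </para>
- <programlisting>&lt;SEQ TYPE="OPERATOR"&gt;&amp;amp;&amp;amp;&lt;/SEQ&gt;
- &lt;SEQ TYPE="OPERATOR"&gt;&lt;![CDATA[&amp;&amp;]]&gt;&lt;/SEQ&gt;</programlisting>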
- <para>
- Now that the basics of XML have been covered, the rest of this
- chapter covers each construct in detail.
- </para>
- </sect1>
- <sect1 id="mode-preamble"><title>The Preamble and MODE tag</title>
- <para>
- Each mode definition must begin with the following:
- </para>
- <programlisting><![CDATA[<?xml version="1.0"?>
- <!DOCTYPE MODE SYSTEM "xmode.dtd">]]></programlisting>
- <para>
- Each mode definition must also contain exactly one <literal>MODE</literal>
- tag. All other tags (<literal>PROPS</literal>, <literal>RULES</literal>)
- must be placed inside the <literal>MODE</literal> tag. The
- <literal>MODE</literal> tag does not have any defined attributes.
- Here is an example:
- </para>
- <programlisting><![CDATA[<MODE>]]>
- <replaceable>... mode definition goes here ...</replaceable>
- <![CDATA[</MODE>]]></programlisting>
- </sect1>
- <sect1 id="mode-tag-props"><title>The PROPS Tag</title>
- <para>
- The <literal>PROPS</literal> tag and the <literal>PROPERTY</literal> tags
- inside it are used to define mode-specific
- properties. Each <literal>PROPERTY</literal> tag must have a
- <literal>NAME</literal> attribute set to the property's name, and a
- <literal>VALUE</literal> attribute with the property's value.
- </para>
- <para>
- All buffer-local properties listed in <xref linkend="buffer-local" />
- may be given values in edit modes.
- </para>
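- <para>
- As an illustration, the following sketch sets three buffer-local
- properties from a mode. It assumes that <literal>tabSize</literal>,
- <literal>indentSize</literal> and <literal>noTabs</literal> are among the
- properties listed in <xref linkend="buffer-local" />; the values are
- arbitrary examples:
- </para>
- <programlisting><![CDATA[<PROPS>
-     <!-- hypothetical values; adjust to taste -->
-     <PROPERTY NAME="tabSize" VALUE="4" />
-     <PROPERTY NAME="indentSize" VALUE="4" />
-     <PROPERTY NAME="noTabs" VALUE="true" />
- </PROPS>]]></programlisting>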
- <para>
- The following mode properties specify commenting strings:
- </para>
- <itemizedlist>
- <listitem><para><literal>commentEnd</literal> - the comment end
- string, used by the <guimenuitem>Range Comment</guimenuitem> command.
- </para></listitem>
- <listitem><para><literal>commentStart</literal> - the comment start
- string, used by the <guimenuitem>Range Comment</guimenuitem> command.
- </para></listitem>
- <listitem><para><literal>lineComment</literal> - the line comment
- string, used by the <guimenuitem>Line Comment</guimenuitem> command.
- </para></listitem>
- </itemizedlist>
- <para>
- When performing auto indent, a number of mode properties determine the
- resulting indent level:
- </para>
- <itemizedlist>
- <listitem><para>The line and the one before it are scanned for brackets
- listed in the <literal>indentCloseBrackets</literal> and
- <literal>indentOpenBrackets</literal> properties.
- Opening brackets in the previous line increase indent.
- </para>
- <para>
- If <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
- then closing brackets on the current line will line up with
- the line containing the matching opening bracket. For example, in Java mode
- <literal>lineUpClosingBracket</literal> is set to <literal>true</literal>,
- resulting in brackets being indented like so:
- </para>
- <programlisting>{
-     // Code
-     {
-         // More code
-     }
- }</programlisting>
- <para>
- If <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
- the line <emphasis>after</emphasis> a closing bracket will be lined up with
- the line containing the matching opening bracket. For example, in Lisp mode
- <literal>lineUpClosingBracket</literal> is set to <literal>false</literal>,
- resulting in brackets being indented like so:
- </para>
- <programlisting>(foo 'a-parameter
-     (crazy-p)
-     (bar baz ()))
- (print "hello world")</programlisting>
- </listitem>
- <listitem>
- <para>If the previous line contains no opening brackets, or if the
- <literal>doubleBracketIndent</literal> property is set to <literal>true</literal>,
- the previous line is checked against the regular expressions in the
- <literal>indentNextLine</literal> and <literal>indentNextLines</literal>
- properties. If the previous line matches the former, the indent of the
- current line is increased and the subsequent line is shifted back again.
- If the previous line matches the latter, the indent of the current
- and subsequent lines is increased.
- </para>
- <para>
- In Java mode, for example, the <literal>indentNextLine</literal>
- property is set to match control structures such as <quote>if</quote>,
- <quote>else</quote>, <quote>while</quote>, and so on.
- </para>
- <para>
- The
- <literal>doubleBracketIndent</literal> property, if set to the default of
- <literal>false</literal>, results in code indented like so:
- </para>
- <programlisting>while(objects.hasNext())
- {
-     Object next = objects.next();
-     if(next instanceof Paintable)
-         ((Paintable)next).paint(g);
- }</programlisting>
- <para>
- On the other hand, setting this property to <literal>true</literal> will
- give the following result:
- </para>
- <programlisting>while(objects.hasNext())
-     {
-         Object next = objects.next();
-         if(next instanceof Paintable)
-             ((Paintable)next).paint(g);
-     }</programlisting></listitem>
- </itemizedlist>
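- <para>
- The difference between <literal>indentNextLine</literal> and
- <literal>indentNextLines</literal> can be sketched with a hypothetical mode
- for a language in which a line ending with a colon introduces an indented
- block, similar to Python. The regular expression below is an illustrative
- assumption, not a value taken from any mode shipped with jEdit:
- </para>
- <programlisting><![CDATA[<PROPS>
-     <!-- hypothetical: a line ending in ":" indents all following lines -->
-     <PROPERTY NAME="indentNextLines" VALUE=".*:\s*" />
- </PROPS>]]></programlisting>
- <para>
- Assuming the expression is matched against the whole previous line, a line
- such as <literal>while running:</literal> would cause the lines typed after
- it to be indented. If the same expression were given as
- <literal>indentNextLine</literal> instead, only the single line after it
- would be indented, as with the Java control structures described above.
- </para>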
- <para>
- Here is the complete <literal>&lt;PROPS&gt;</literal> tag for Java mode:
- </para>
- <programlisting><![CDATA[<PROPS>
-     <PROPERTY NAME="commentStart" VALUE="/*" />
-     <PROPERTY NAME="commentEnd" VALUE="*/" />
-     <PROPERTY NAME="lineComment" VALUE="//" />
-     <PROPERTY NAME="wordBreakChars" VALUE=",+-=&lt;&gt;/?^&amp;*" />
-     <!-- Auto indent -->
-     <PROPERTY NAME="indentOpenBrackets" VALUE="{" />
-     <PROPERTY NAME="indentCloseBrackets" VALUE="}" />
-     <PROPERTY NAME="indentNextLine"
-         VALUE="\s*(((if|while)\s*\(|else\s*|else\s+if\s*\(|for\s*\(.*\))[^{;]*)" />
-     <!-- set this to 'true' if you want to use GNU coding style -->
-     <PROPERTY NAME="doubleBracketIndent" VALUE="false" />
-     <PROPERTY NAME="lineUpClosingBracket" VALUE="true" />
- </PROPS>]]></programlisting>
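- <para>
- For comparison, the following sketch shows auto indent properties for a
- parenthesis-based language such as Lisp, with
- <literal>lineUpClosingBracket</literal> set to <literal>false</literal> as
- described above. These values are assumptions; the ones in jEdit's own
- Lisp mode may differ:
- </para>
- <programlisting><![CDATA[<PROPS>
-     <PROPERTY NAME="lineComment" VALUE=";" />
-     <!-- Auto indent: parentheses take the place of braces -->
-     <PROPERTY NAME="indentOpenBrackets" VALUE="(" />
-     <PROPERTY NAME="indentCloseBrackets" VALUE=")" />
-     <PROPERTY NAME="lineUpClosingBracket" VALUE="false" />
- </PROPS>]]></programlisting>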
- </sect1>
- <sect1 id="mode-tag-rules"><title>The RULES Tag</title>
- <para>
- <literal>RULES</literal> tags must be placed inside the
- <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
- <firstterm>ruleset</firstterm>. A ruleset consists of a number of
- <firstterm>parser rules</firstterm>, with each parser
- rule specifying how to highlight a specific syntax token. There must
- be at least one ruleset in each edit mode. There can also be more
- than one, with different rulesets being used to highlight different
- parts of a buffer (for example, in HTML mode, one ruleset
- highlights HTML tags, and another highlights inline JavaScript).
- For information about using more
- than one ruleset, see <xref linkend="mode-rule-span" />.
- </para>
- <para>
- The <literal>RULES</literal> tag supports the following attributes, all of
- which are optional:
- </para>
- <itemizedlist>
- <listitem><para><literal>SET</literal> - the name of this ruleset.
- All rulesets other than the first must have a name.
- </para></listitem>
- <listitem><para><literal>IGNORE_CASE</literal> - if set to
- <literal>FALSE</literal>, matches will be case sensitive. Otherwise, case
- will not matter. Default is <literal>TRUE</literal>.
- </para></listitem>
- <listitem><para><literal>NO_WORD_SEP</literal> - any non-alphanumeric
- character <emphasis>not</emphasis> in this list is treated as a word separator
- for the purposes of syntax highlighting.
- </para></listitem>
- <listitem><para><literal>DEFAULT</literal> - the token type for
- text which doesn't match
- any specific rule. Default is <literal>NULL</literal>. See
- <xref linkend="mode-syntax-tokens" /> for a list of token types.
- </para></listitem>
- <listitem><para><literal>HIGHLIGHT_DIGITS</literal> - whether numbers
- should be highlighted.</para></listitem>
- <listitem><para><literal>DIGIT_RE</literal> - a regular expression for
- numbers that contain letters. See below for details about these two
- attributes.</para></listitem>
- </itemizedlist>
- <para>
- Here is an example <literal>RULES</literal> tag:
- </para>
- <programlisting><![CDATA[<RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">]]>
- <replaceable>... parser rules go here ...</replaceable>
- <![CDATA[</RULES>]]></programlisting>
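- <para>
- A second ruleset might combine several of these attributes, as in the
- following sketch. The name <literal>COMMENT_TEXT</literal> is hypothetical.
- Because this is not the first ruleset, it must have a <literal>SET</literal>
- name, which other rules can refer to through their
- <literal>DELEGATE</literal> attribute. Unmatched text is given the
- <literal>COMMENT2</literal> token type, and the underscore is not treated
- as a word separator:
- </para>
- <programlisting><![CDATA[<RULES SET="COMMENT_TEXT" DEFAULT="COMMENT2" IGNORE_CASE="TRUE"
-     NO_WORD_SEP="_">
-     <!-- parser rules go here -->
- </RULES>]]></programlisting>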
- <sect2><title>Highlighting Numbers</title>
- <para>
- If the <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
- <literal>TRUE</literal>, jEdit will attempt to highlight numbers in this
- ruleset.
- </para>
- <para>
- Any word consisting entirely of digits (0-9) will be highlighted with the
- <literal>DIGIT</literal> token type.
- A word that contains letters in addition to digits will be
- highlighted with the
- <literal>DIGIT</literal> token type only if it matches the regular
- expression specified in the <literal>DIGIT_RE</literal> attribute.
- If <literal>DIGIT_RE</literal> is not specified, such words are not
- highlighted.
- </para>
- <para>
- Here is an example <literal>DIGIT_RE</literal> regular expression that highlights
- Java-style numeric literals (normal numbers, hexadecimals
- prefixed with <literal>0x</literal>, numbers suffixed with various
- type indicators, and floating point literals containing an exponent):
- </para>
- <programlisting>DIGIT_RE="(0x[[:xdigit:]]+|[[:digit:]]+(e[[:digit:]]*)?)[lLdDfF]?"</programlisting>
- <para>
- Regular expression syntax is described in <xref linkend="regexps" />.
- </para>
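- <para>
- As a worked sketch, suppose a ruleset uses the following hypothetical
- <literal>RULES</literal> tag, whose <literal>DIGIT_RE</literal> accepts only
- hexadecimal literals:
- </para>
- <programlisting><![CDATA[<RULES HIGHLIGHT_DIGITS="TRUE" DIGIT_RE="0x[[:xdigit:]]+">
-     <!-- parser rules go here -->
- </RULES>]]></programlisting>
- <para>
- Following the rules above, <literal>42</literal> would be highlighted as
- <literal>DIGIT</literal> because it consists entirely of digits,
- <literal>0x1F</literal> would be highlighted because it matches
- <literal>DIGIT_RE</literal>, and <literal>7up</literal> would not be
- highlighted, since it contains letters but does not match the expression.
- </para>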
- </sect2>
-
- <sect2><title>Rule Ordering Requirements</title>
- <para>
- You might encounter this very common pitfall when writing your own modes.
- </para>
- <para>
- Since jEdit checks buffer text against parser rules in the order they appear
- in the ruleset, more specific rules must be placed before generalized ones,
- otherwise the generalized rules will catch everything.
- </para>
- <para>
- This is best demonstrated with an example. The following is incorrect rule
- ordering:
- </para>
- <programlisting><![CDATA[<SPAN TYPE="MARKUP">
-     <BEGIN>[</BEGIN>
-     <END>]</END>
- </SPAN>
- <SPAN TYPE="KEYWORD1">
-     <BEGIN>[!</BEGIN>
-     <END>]</END>
- </SPAN>]]></programlisting>
- <para>
- If you write the above in a ruleset, any occurrence of <quote>[</quote>
- (including text such as <quote>[!DEFINE</quote>)
- will be highlighted using the first rule, because it will be the
- first to match. This is most likely not the intended behavior.
- </para>
- <para>
- The problem can be solved by placing the more specific rule before the
- general one:
- </para>
- <programlisting><![CDATA[<SPAN TYPE="KEYWORD1">
-     <BEGIN>[!</BEGIN>
-     <END>]</END>
- </SPAN>
- <SPAN TYPE="MARKUP">
-     <BEGIN>[</BEGIN>
-     <END>]</END>
- </SPAN>]]></programlisting>
- <para>
- Now, if the buffer contains the text <quote>[!SPECIAL]</quote>, the
- rules will be checked in order, and the first rule will be the first
- to match. However, if you write <quote>[FOO]</quote>, it will be highlighted
- using the second rule, which is exactly what you would expect.
- </para>
- </sect2>
- <sect2><title>Per-Ruleset Properties</title>
- <para>
- The <literal>PROPS</literal> tag (described in <xref linkend="mode-tag-props"/>)
- can also be placed inside the <literal>RULES</literal> tag to define
- ruleset-specific properties. The following properties can
- be set on a per-ruleset basis:
- </para>
- <itemizedlist>
- <listitem><para><literal>commentEnd</literal> - the comment end
- string.
- </para></listitem>
- <listitem><para><literal>commentStart</literal> - the comment start
- string.
- </para></listitem>
- <listitem><para><literal>lineComment</literal> - the line comment
- string.
- </para></listitem>
- </itemizedlist>
- <para>
- This allows different parts of a file to have different comment strings;
- in HTML mode, for example, HTML text and inline JavaScript are commented
- differently.
- For information about the commenting commands,
- see <xref linkend="commenting"/>.
- </para>
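- <para>
- The following sketch shows a per-ruleset <literal>PROPS</literal> tag. The
- ruleset name <literal>SCRIPT</literal> is hypothetical. Text highlighted by
- this ruleset would be expected to be commented using these C-style strings
- rather than the mode-wide ones:
- </para>
- <programlisting><![CDATA[<RULES SET="SCRIPT">
-     <PROPS>
-         <PROPERTY NAME="lineComment" VALUE="//" />
-         <PROPERTY NAME="commentStart" VALUE="/*" />
-         <PROPERTY NAME="commentEnd" VALUE="*/" />
-     </PROPS>
-     <!-- parser rules for the embedded script go here -->
- </RULES>]]></programlisting>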
- </sect2>
- </sect1>
- <sect1 id="mode-rule-terminate"><title>The TERMINATE Tag</title>
- <para>
- The <literal>TERMINATE</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, specifies that parsing should stop
- after the specified number of characters have been read from a line. The
- number of characters to terminate after should be specified with the
- <literal>AT_CHAR</literal> attribute. Here is an example:
- </para>
- <programlisting><![CDATA[<TERMINATE AT_CHAR="1" />]]></programlisting>
- <para>
- This rule is used in Patch mode, for example, because only the first
- character of each line affects highlighting.
- </para>
- </sect1>
- <sect1 id="mode-rule-span"><title>The SPAN Tag</title>
- <para>
- The <literal>SPAN</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, highlights text between a start
- and end string. The start and end strings are specified inside the
- <literal>BEGIN</literal> and <literal>END</literal> child elements of the
- <literal>SPAN</literal> tag.
- The following attributes are supported:
- </para>
- <itemizedlist>
- <listitem><para><literal>TYPE</literal> - The token type to highlight the
- span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
- types.</para></listitem>
- <listitem><para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if the start
- sequence occurs at the beginning of a line.</para></listitem>
- <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if the
- start sequence is the first non-whitespace text in the line.</para>
- </listitem>
- <listitem><para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if the start
- sequence occurs at the beginning of a word.</para></listitem>
- <listitem><para><literal>DELEGATE</literal> - text inside the span will be
- highlighted with the specified ruleset. To delegate to a ruleset defined
- in the current mode, just specify its name. To delegate to a ruleset
- defined in another mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para></listitem>
- <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
- <literal>TRUE</literal>, the start and end sequences will not be highlighted,
- only the text between them will.</para></listitem>
- <listitem><para><literal>NO_ESCAPE</literal> - If set to
- <literal>TRUE</literal>, the ruleset's escape character will have no
- effect before the span's end string. Otherwise, the presence of the escape
- character will cause that occurrence of the end string to be ignored.</para></listitem>
- <listitem><para><literal>NO_LINE_BREAK</literal> - If set to
- <literal>TRUE</literal>, the span will not cross line breaks.</para></listitem>
- <listitem><para><literal>NO_WORD_BREAK</literal> - If set to
- <literal>TRUE</literal>, the span will not cross word breaks.</para></listitem>
- </itemizedlist>
- <para>
- Note that the <literal>AT_LINE_START</literal>,
- <literal>AT_WHITESPACE_END</literal> and
- <literal>AT_WORD_START</literal> attributes can also be used on the
- <literal>BEGIN</literal> and <literal>END</literal> elements. Setting these
- attributes to the same value on both elements has the same effect as
- setting them on the <literal>SPAN</literal> element.
- </para>
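- <para>
- For example, a block of Perl POD documentation that starts with a line
- beginning with <literal>=pod</literal> and ends with a line beginning with
- <literal>=cut</literal> could be sketched as follows. This is an
- illustration, not a rule copied from jEdit's Perl mode. Because both
- elements carry the same value, this is equivalent to setting
- <literal>AT_LINE_START</literal> on the <literal>SPAN</literal> tag itself:
- </para>
- <programlisting><![CDATA[<SPAN TYPE="COMMENT2">
-     <BEGIN AT_LINE_START="TRUE">=pod</BEGIN>
-     <END AT_LINE_START="TRUE">=cut</END>
- </SPAN>]]></programlisting>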
- <para>
- Here is a <literal>SPAN</literal> that highlights Java string literals,
- which cannot include line breaks:
- </para>
- <programlisting><![CDATA[<SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
-     <BEGIN>"</BEGIN>
-     <END>"</END>
- </SPAN>]]></programlisting>
- <para>
- Here is a <literal>SPAN</literal> that highlights Java documentation
- comments by delegating to the <quote>JAVADOC</quote> ruleset defined
- elsewhere in the current mode:
- </para>
- <programlisting><![CDATA[<SPAN TYPE="COMMENT2" DELEGATE="JAVADOC">
-     <BEGIN>/**</BEGIN>
-     <END>*/</END>
- </SPAN>]]></programlisting>
- <para>
- Here is a <literal>SPAN</literal> that highlights cascading style sheets
- inside HTML <literal>&lt;STYLE&gt;</literal> tags by delegating to the main
- ruleset in the CSS edit mode:
- </para>
- <programlisting><![CDATA[<SPAN TYPE="MARKUP" DELEGATE="css::MAIN">
-     <BEGIN>&lt;style&gt;</BEGIN>
-     <END>&lt;/style&gt;</END>
- </SPAN>]]></programlisting>
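- <para>
- Putting delegation together, the following sketch shows a minimal mode with
- two rulesets. The first (<quote>MAIN</quote>) ruleset delegates
- documentation comments to a named <literal>JAVADOC</literal> ruleset
- defined in the same file. The contents of <literal>JAVADOC</literal> are a
- hypothetical simplification, not jEdit's actual Java mode. The
- <literal>/**</literal> span is placed before the <literal>/*</literal> span,
- following the rule ordering requirements in <xref linkend="mode-tag-rules" />:
- </para>
- <programlisting><![CDATA[<?xml version="1.0"?>
- <!DOCTYPE MODE SYSTEM "xmode.dtd">
- <MODE>
-     <!-- first ruleset: implicitly named MAIN -->
-     <RULES>
-         <SPAN TYPE="COMMENT2" DELEGATE="JAVADOC">
-             <BEGIN>/**</BEGIN>
-             <END>*/</END>
-         </SPAN>
-         <SPAN TYPE="COMMENT1">
-             <BEGIN>/*</BEGIN>
-             <END>*/</END>
-         </SPAN>
-     </RULES>
-     <!-- second ruleset: must be named -->
-     <RULES SET="JAVADOC" DEFAULT="COMMENT2">
-         <SEQ TYPE="LABEL">@param</SEQ>
-         <SEQ TYPE="LABEL">@return</SEQ>
-     </RULES>
- </MODE>]]></programlisting>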
- </sect1>
- <sect1 id="mode-rule-span-regexp"><title>The SPAN_REGEXP Tag</title>
- <para>
- The <literal>SPAN_REGEXP</literal> rule is similar to the
- <literal>SPAN</literal> rule except the start sequence is taken to be
- a regular expression. In addition to the attributes supported by
- the <literal>SPAN</literal> tag, the
- <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
- the first character that the regular expression matches. As a consequence,
- the regular expression must always begin with that same character; an
- expression whose first character could vary (for example, one that starts
- with a character class) cannot be used.
- </para>
- <para>
- Text matched by parenthesized groups in the <literal>BEGIN</literal> regular
- expression can be referenced in the <literal>END</literal> string as
- <literal>$1</literal>, <literal>$2</literal>, and so on. See
- below for an example of where this is useful.
- </para>
- <para>
- Regular expression syntax is described in <xref linkend="regexps" />.
- </para>
- <para>
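- Here is a <literal>SPAN_REGEXP</literal> rule that highlights
- here-documents (<quote>read-ins</quote>) in shell scripts; the word
- captured by the group in <literal>BEGIN</literal> is used as the end string: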
- </para>
- <programlisting>&lt;SPAN_REGEXP HASH_CHAR="&amp;lt;" TYPE="LITERAL1" DELEGATE="LITERAL"&gt;
-     &lt;BEGIN&gt;&lt;![CDATA[&lt;&lt;[[:space:]'"]*([[:alnum:]_]+)[[:space:]'"]*]]&gt;&lt;/BEGIN&gt;
-     &lt;END&gt;$1&lt;/END&gt;
- &lt;/SPAN_REGEXP&gt;</programlisting>
- <para>
- Here is a <literal>SPAN_REGEXP</literal> rule that highlights constructs
- placed between <literal>&lt;#ftl</literal> and <literal>&gt;</literal>,
- as long as the <literal>&lt;#ftl</literal> is followed by a word break:
- </para>
- <programlisting><![CDATA[<SPAN_REGEXP TYPE="KEYWORD1" HASH_CHAR="&lt;" DELEGATE="EXPRESSION">
-     <BEGIN>&lt;#ftl\&gt;</BEGIN>
-     <END>&gt;</END>
- </SPAN_REGEXP>]]></programlisting>
- </sect1>
- <sect1 id="mode-rule-eol-span"><title>The EOL_SPAN Tag</title>
- <para>
- An <literal>EOL_SPAN</literal> is similar to a <literal>SPAN</literal>
- except that highlighting stops at the end of the line, and no end sequence
- needs to be specified. The text to match is specified between the opening and
- closing <literal>EOL_SPAN</literal> tags.
- The following attributes are supported:
- </para>
- <itemizedlist>
- <listitem><para><literal>TYPE</literal> - The token type to highlight the
- span with. See <xref linkend="mode-syntax-tokens" /> for a list of token
- types.</para></listitem>
- <listitem><para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if the start
- sequence occurs at the beginning of a line.</para></listitem>
- <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if the
- sequence is the first non-whitespace text in the line.</para>
- </listitem>
- <listitem><para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if the start
- sequence occurs at the beginning of a word.</para></listitem>
- <listitem><para><literal>DELEGATE</literal> - text inside the span will be
- highlighted with the specified ruleset. To delegate to a ruleset defined
- in the current mode, just specify its name. To delegate to a ruleset
- defined in another mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para></listitem>
- <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
- <literal>TRUE</literal>, the matched sequence will not be highlighted,
- only the text after it will.</para></listitem>
- </itemizedlist>
- <para>
- Here is an <literal>EOL_SPAN</literal> that highlights C++ comments:
- </para>
- <programlisting><![CDATA[<EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>]]></programlisting>
- </sect1>
- <sect1 id="mode-rule-eol-span-regexp"><title>The EOL_SPAN_REGEXP Tag</title>
- <para>
- The <literal>EOL_SPAN_REGEXP</literal> rule is similar to the
- <literal>EOL_SPAN</literal> rule except the match sequence is taken to be
- a regular expression. In addition to the attributes supported by
- the <literal>EOL_SPAN</literal> tag, the
- <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
- the first character that the regular expression matches. As a consequence,
- the regular expression must always begin with that same character; an
- expression whose first character could vary (for example, one that starts
- with a character class) cannot be used.
- </para>
- <para>
- Regular expression syntax is described in <xref linkend="regexps" />.
- </para>
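- <para>
- Here is a hypothetical <literal>EOL_SPAN_REGEXP</literal> rule, not taken
- from a shipped mode, that would highlight C preprocessor directives to the
- end of the line. Every match starts with <literal>#</literal>, so that
- character is used as the <literal>HASH_CHAR</literal>:
- </para>
- <programlisting><![CDATA[<EOL_SPAN_REGEXP HASH_CHAR="#" TYPE="KEYWORD2"
-     AT_WHITESPACE_END="TRUE">#[[:space:]]*[[:alpha:]]+</EOL_SPAN_REGEXP>]]></programlisting>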
- </sect1>
- <sect1 id="mode-rule-mark-prev"><title>The MARK_PREVIOUS Tag</title>
- <para>
- The <literal>MARK_PREVIOUS</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, highlights from the end of the
- previous syntax token to the matched text. The text to match
- is specified between opening and closing <literal>MARK_PREVIOUS</literal>
- tags. The following attributes are supported:
- </para>
- <itemizedlist>
- <listitem><para><literal>TYPE</literal> - The token type to highlight the
- text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
- types.</para></listitem>
- <listitem><para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
- at the beginning of a line.</para></listitem>
- <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if it is
- the first non-whitespace text in the line.</para>
- </listitem>
- <listitem><para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if
- it occurs at the beginning of a word.</para></listitem>
- <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
- <literal>TRUE</literal>, the match will not be highlighted,
- only the text before it will.</para></listitem>
- </itemizedlist>
- <para>
- Here is a rule that highlights labels in Java mode (for example,
- <quote>XXX:</quote>):
- </para>
- <programlisting><![CDATA[<MARK_PREVIOUS TYPE="LABEL" AT_WHITESPACE_END="TRUE"
-     EXCLUDE_MATCH="TRUE">:</MARK_PREVIOUS>]]></programlisting>
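- <para>
- <literal>MARK_PREVIOUS</literal> can also be used to highlight the names of
- called functions. In the following sketch (an illustration, not necessarily
- what any shipped mode does), the text before an opening parenthesis is given
- the <literal>FUNCTION</literal> token type, while the parenthesis itself is
- left out because of <literal>EXCLUDE_MATCH</literal>. In a call such as
- <literal>foo(bar)</literal>, <literal>foo</literal> would be expected to
- be highlighted as a function:
- </para>
- <programlisting><![CDATA[<MARK_PREVIOUS TYPE="FUNCTION" EXCLUDE_MATCH="TRUE">(</MARK_PREVIOUS>]]></programlisting>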
- </sect1>
- <sect1 id="mode-rule-mark-following"><title>The MARK_FOLLOWING Tag</title>
- <para>
- The <literal>MARK_FOLLOWING</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, highlights from the start of the
- match to the next syntax token. The text to match
- is specified between opening and closing <literal>MARK_FOLLOWING</literal>
- tags. The following attributes are supported:
- </para>
- <itemizedlist>
- <listitem><para><literal>TYPE</literal> - The token type to highlight the
- text with. See <xref linkend="mode-syntax-tokens" /> for a list of token
- types.</para></listitem>
- <listitem><para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
- at the beginning of a line.</para></listitem>
- <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if it is
- the first non-whitespace text in the line.</para>
- </listitem>
- <listitem><para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if
- it occurs at the beginning of a word.</para></listitem>
- <listitem><para><literal>EXCLUDE_MATCH</literal> - If set to
- <literal>TRUE</literal>, the match will not be highlighted,
- only the text after it will.</para></listitem>
- </itemizedlist>
- <para>
- Here is a rule that highlights variables in Unix shell scripts
- (<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, etc):
- </para>
- <programlisting><![CDATA[<MARK_FOLLOWING TYPE="KEYWORD2">$</MARK_FOLLOWING>]]></programlisting>
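- <para>
- As a variation, adding <literal>EXCLUDE_MATCH</literal> leaves the
- <literal>$</literal> itself unhighlighted. In
- <literal>$CLASSPATH</literal>, only <literal>CLASSPATH</literal> would then
- receive the <literal>KEYWORD2</literal> token type:
- </para>
- <programlisting><![CDATA[<MARK_FOLLOWING TYPE="KEYWORD2" EXCLUDE_MATCH="TRUE">$</MARK_FOLLOWING>]]></programlisting>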
- </sect1>
- <sect1 id="mode-rule-seq"><title>The SEQ Tag</title>
- <para>
- The <literal>SEQ</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, highlights fixed sequences of text. The text
- to highlight is specified between opening and closing <literal>SEQ</literal>
- tags. The following attributes are supported:
- </para>
- <itemizedlist>
- <listitem><para><literal>TYPE</literal> - the token type to highlight the
- sequence with. See <xref linkend="mode-syntax-tokens" /> for a list of token
- types.</para></listitem>
- <listitem><para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if it occurs
- at the beginning of a line.</para></listitem>
- <listitem><para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if it is
- the first non-whitespace text in the line.</para>
- </listitem>
- <listitem><para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted if
- it occurs at the beginning of a word.</para></listitem>
- <listitem><para><literal>DELEGATE</literal> - if this attribute is specified,
- all text after the sequence will be highlighted using this ruleset.
- To delegate to a ruleset defined
- in the current mode, just specify its name. To delegate to a ruleset
- defined in another mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para></listitem>
- </itemizedlist>
- <para>
- The following rules highlight a few Java operators:
- </para>
- <programlisting><![CDATA[<SEQ TYPE="OPERATOR">+</SEQ>
- <SEQ TYPE="OPERATOR">-</SEQ>
- <SEQ TYPE="OPERATOR">*</SEQ>
- <SEQ TYPE="OPERATOR">/</SEQ>]]></programlisting>
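- <para>
- The <literal>DELEGATE</literal> attribute lets a sequence hand the rest of
- the text over to another ruleset. As a hypothetical sketch (the
- <literal>DATA</literal> ruleset name is made up), a Perl-like mode could
- treat everything after an <literal>__END__</literal> marker at the start of
- a line as data, highlighted with the <literal>COMMENT2</literal> token type:
- </para>
- <programlisting><![CDATA[<!-- in the first (MAIN) ruleset -->
- <SEQ TYPE="MARKUP" AT_LINE_START="TRUE" DELEGATE="DATA">__END__</SEQ>
- <!-- elsewhere in the same MODE tag -->
- <RULES SET="DATA" DEFAULT="COMMENT2">
-     <!-- no parser rules: all text gets the DEFAULT token type -->
- </RULES>]]></programlisting>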
- </sect1>
- <sect1 id="mode-rule-seq-regexp"><title>The SEQ_REGEXP Tag</title>
- <para>
- The <literal>SEQ_REGEXP</literal> rule is similar to the
- <literal>SEQ</literal> rule except the match sequence is taken to be
- a regular expression. In addition to the attributes supported by
- the <literal>SEQ</literal> tag, the
- <literal>HASH_CHAR</literal> attribute must be specified. It must be set to
- the first character that the regular expression matches. As a consequence,
- the regular expression must always begin with that same character; an
- expression whose first character could vary (for example, one that starts
- with a character class) cannot be used.
- </para>
- <para>
- Here is an example of a <literal>SEQ_REGEXP</literal> rule that highlights
- Perl match operators such as <literal>m/(.+):(\d+):(.+)/</literal>:
- </para>
- <programlisting><![CDATA[<SEQ_REGEXP TYPE="MARKUP"
- HASH_CHAR="m"
- AT_WORD_START="TRUE"
- >m([[:punct:]])(?:.*?[^\\])*?\1[sgiexom]*</SEQ_REGEXP>]]></programlisting>
- <para>
- Regular expression syntax is described in <xref linkend="regexps" />.
- </para>
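- <para>
- A simpler, hypothetical <literal>SEQ_REGEXP</literal> rule could highlight
- CSS-style hexadecimal color values such as <literal>#fff</literal> or
- <literal>#a0b1c2</literal>. Because every match begins with
- <literal>#</literal>, that character is the <literal>HASH_CHAR</literal>:
- </para>
- <programlisting><![CDATA[<SEQ_REGEXP TYPE="LITERAL2" HASH_CHAR="#">#[[:xdigit:]]+</SEQ_REGEXP>]]></programlisting>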
- </sect1>
- <sect1 id="mode-rule-keywords"><title>The KEYWORDS Tag</title>
- <para>
- The <literal>KEYWORDS</literal> tag, which must be placed inside a
- <literal>RULES</literal> tag and can appear only once in each ruleset, specifies a list of
- keywords to highlight.
- Keywords are similar to <literal>SEQ</literal>s, except that
- <literal>SEQ</literal>s match anywhere in the text, whereas keywords only
- match whole words. Words are considered to be runs of text separated by
- non-alphanumeric characters.
- </para>
- <para>
- The <literal>KEYWORDS</literal> tag does not define any attributes.
- </para>
- <para>
- Each child element of the <literal>KEYWORDS</literal> tag is an element
- whose name is a token type, and whose content is the keyword to
- highlight. For example, the following rule highlights the most common Java
- keywords:
- </para>
- <programlisting><![CDATA[<KEYWORDS>
-     <KEYWORD1>if</KEYWORD1>
-     <KEYWORD1>else</KEYWORD1>
-     <KEYWORD3>int</KEYWORD3>
-     <KEYWORD3>void</KEYWORD3>
- </KEYWORDS>]]></programlisting>
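- <para>
- Because keywords only match whole words, a keyword containing a
- non-alphanumeric character would need that character listed in the
- <literal>NO_WORD_SEP</literal> attribute of the enclosing
- <literal>RULES</literal> tag. Otherwise the character would act as a word
- separator and the keyword would never match. A sketch for a hypothetical
- language with hyphenated keywords:
- </para>
- <programlisting><![CDATA[<RULES NO_WORD_SEP="-">
-     <KEYWORDS>
-         <KEYWORD1>end-if</KEYWORD1>
-         <KEYWORD1>end-while</KEYWORD1>
-     </KEYWORDS>
- </RULES>]]></programlisting>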
- </sect1>
- <sect1 id="mode-syntax-tokens"><title>Token Types</title>
- <para>
- Parser rules can highlight tokens using any of the following token
- types:
- </para>
- <itemizedlist>
- <listitem><para><literal>NULL</literal> - no special
- highlighting is performed on tokens of type <literal>NULL</literal>
- </para></listitem>
- <listitem><para><literal>COMMENT1</literal>
- </para></listitem>
- <listitem><para><literal>COMMENT2</literal>
- </para></listitem>
- <listitem><para><literal>COMMENT3</literal>
- </para></listitem>
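- <listitem><para><literal>COMMENT4</literal>
- </para></listitem>
- <listitem><para><literal>DIGIT</literal> - used for numbers when
- <literal>HIGHLIGHT_DIGITS</literal> is enabled; see
- <xref linkend="mode-tag-rules" />
- </para></listitem>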
- <listitem><para><literal>FUNCTION</literal>
- </para></listitem>
- <listitem><para><literal>INVALID</literal><!-- - tokens of this type are
- automatically added if a <literal>NO_WORD_BREAK</literal> or
- <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
- one word or line, respectively. -->
- </para></listitem>
- <listitem><para><literal>KEYWORD1</literal>
- </para></listitem>
- <listitem><para><literal>KEYWORD2</literal>
- </para></listitem>
- <listitem><para><literal>KEYWORD3</literal>
- </para></listitem>
- <listitem><para><literal>KEYWORD4</literal>
- </para></listitem>
- <listitem><para><literal>LABEL</literal>
- </para></listitem>
- <listitem><para><literal>LITERAL1</literal>
- </para></listitem>
- <listitem><para><literal>LITERAL2</literal>
- </para></listitem>
- <listitem><para><literal>LITERAL3</literal>
- </para></listitem>
- <listitem><para><literal>LITERAL4</literal>
- </para></listitem>
- <listitem><para><literal>MARKUP</literal>
- </para></listitem>
- <listitem><para><literal>OPERATOR</literal>
- </para></listitem>
- </itemizedlist>
- </sect1>
- </chapter>