- <?xml version="1.0" encoding="UTF-8"?>
- <chapter id="writing-modes">
- <title>Mode Definition Syntax</title>
- <!-- jEdit buffer-local properties: -->
- <!-- :indentSize=1:noTabs=true: -->
- <!-- :xml.root=users-guide.xml: -->
- <para>Edit modes are defined using XML, the <firstterm>eXtensible Markup
- Language</firstterm>; mode files have the extension
- <filename>.xml</filename>. XML is a very simple language, and as a result
- edit modes are easy to create and modify. This section will start with a
- short XML primer, followed by detailed information about each supported tag
- and highlighting rule.</para>
- <para>Editing a mode or a mode catalog file within jEdit will cause the
- changes to take effect immediately. If you edit modes using another
- application, the changes will take effect after the
- <guimenu>Utilities</guimenu> > <guisubmenu>Troubleshooting</guisubmenu>
- > <guimenuitem>Reload Edit Modes</guimenuitem> command is invoked.</para>
- <sect1 id="xml-primer">
- <title>An XML Primer</title>
- <para>A very simple XML file (which also happens to be an edit mode)
- looks like so:</para>
- <programlisting><?xml version="1.0"?>
- <!DOCTYPE MODE SYSTEM "xmode.dtd">
- <MODE>
- <PROPS>
- <PROPERTY NAME="commentStart" VALUE="/*" />
- <PROPERTY NAME="commentEnd" VALUE="*/" />
- </PROPS>
- <RULES>
- <SPAN TYPE="COMMENT1">
- <BEGIN>/*</BEGIN>
- <END>*/</END>
- </SPAN>
- </RULES>
- </MODE></programlisting>
- <para>Note that each opening tag must have a corresponding closing tag.
- If there is nothing between the opening and closing tags, for example
- <literal><TAG></TAG></literal>, the shorthand notation
- <literal><TAG /></literal> may be used. An example of this
- shorthand can be seen in the <literal><PROPERTY></literal> tags
- above.</para>
- <para>XML is case sensitive. <literal>Span</literal> or
- <literal>span</literal> is not the same as
- <literal>SPAN</literal>.</para>
- <para>To insert a special character such as < or > literally in
- XML (for example, inside an attribute value), you must write it as an
- <firstterm>entity</firstterm>. An entity consists of the character's
- symbolic name enclosed within <quote>&</quote> and <quote>;</quote>.
- The most frequently used entities are:</para>
- <itemizedlist>
- <listitem>
- <para><literal>&lt;</literal> - The less-than (<)
- character</para>
- </listitem>
- <listitem>
- <para><literal>&gt;</literal> - The greater-than (>)
- character</para>
- </listitem>
- <listitem>
- <para><literal>&amp;</literal> - The ampersand (&)
- character</para>
- </listitem>
- </itemizedlist>
- <para>For example, the following will cause a syntax error:</para>
- <programlisting><SEQ TYPE="OPERATOR">&</SEQ></programlisting>
- <para>Instead, you must write:</para>
- <programlisting><SEQ TYPE="OPERATOR">&amp;</SEQ></programlisting>
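- <para>The same rule applies to longer sequences. As a small sketch
- (the operators are chosen only for illustration), rules matching
- <quote><=</quote> and <quote>&&</quote> would be written
- as:</para>
- <programlisting><SEQ TYPE="OPERATOR">&lt;=</SEQ>
- <SEQ TYPE="OPERATOR">&amp;&amp;</SEQ></programlisting>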
- <para>Now that the basics of XML have been covered, the rest of this
- section will cover each construct in detail.</para>
- </sect1>
- <sect1 id="mode-preamble">
- <title>The Preamble and MODE tag</title>
- <para>Each mode definition must begin with the following:</para>
- <programlisting><?xml version="1.0"?>
- <!DOCTYPE MODE SYSTEM "xmode.dtd"></programlisting>
- <para>Each mode definition must also contain exactly one
- <literal>MODE</literal> tag. All other tags (<literal>PROPS</literal>,
- <literal>RULES</literal>) must be placed inside the
- <literal>MODE</literal> tag. The <literal>MODE</literal> tag does not
- have any defined attributes. Here is an example:</para>
- <programlisting><MODE>
- <replaceable>... mode definition goes here ...</replaceable>
- </MODE></programlisting>
- </sect1>
- <sect1 id="mode-tag-props">
- <title>The PROPS Tag</title>
- <para>The <literal>PROPS</literal> tag and the
- <literal>PROPERTY</literal> tags inside it are used to define
- mode-specific properties. Each <literal>PROPERTY</literal> tag must have
- a <literal>NAME</literal> attribute set to the property's name, and a
- <literal>VALUE</literal> attribute with the property's value.</para>
- <para>All buffer-local properties listed in <xref
- linkend="buffer-local" /> may be given values in edit modes.</para>
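- <para>For instance, a mode could supply default indentation settings
- as sketched below. The values are illustrative only and are not taken
- from any shipped mode:</para>
- <programlisting><PROPS>
- <PROPERTY NAME="indentSize" VALUE="4" />
- <PROPERTY NAME="noTabs" VALUE="true" />
- </PROPS></programlisting>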
- <para>The following mode properties specify commenting strings:</para>
- <itemizedlist>
- <listitem>
- <para><literal>commentEnd</literal> - the comment end string,
- used by the <guimenuitem>Range Comment</guimenuitem>
- command.</para>
- </listitem>
- <listitem>
- <para><literal>commentStart</literal> - the comment start
- string, used by the <guimenuitem>Range Comment</guimenuitem>
- command.</para>
- </listitem>
- <listitem>
- <para><literal>lineComment</literal> - the line comment string,
- used by the <guimenuitem>Line Comment</guimenuitem>
- command.</para>
- </listitem>
- </itemizedlist>
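- <para>A mode does not have to define all three. As an illustration, a
- mode for a hypothetical scripting language that only has
- <quote>#</quote> line comments might define just
- <literal>lineComment</literal>:</para>
- <programlisting><PROPS>
- <PROPERTY NAME="lineComment" VALUE="#" />
- </PROPS></programlisting>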
- <para>When performing auto indent, a number of mode properties determine
- the resulting indent level:</para>
- <itemizedlist>
- <listitem>
- <para>The line and the one before it are scanned for brackets
- listed in the <literal>indentCloseBrackets</literal> and
- <literal>indentOpenBrackets</literal> properties. Opening
- brackets in the previous line increase indent.</para>
- <para>If <literal>lineUpClosingBracket</literal> is set to
- <literal>true</literal>, then closing brackets on the current
- line will line up with the line containing the matching opening
- bracket. For example, in Java mode
- <literal>lineUpClosingBracket</literal> is set to
- <literal>true</literal>, resulting in brackets being indented
- like so:</para>
- <programlisting>{
-     // Code
-     {
-         // More code
-     }
- }</programlisting>
- <para>If <literal>lineUpClosingBracket</literal> is set to
- <literal>false</literal>, the line <emphasis>after</emphasis> a
- closing bracket will be lined up with the line containing the
- matching opening bracket. For example, in Lisp mode
- <literal>lineUpClosingBracket</literal> is set to
- <literal>false</literal>, resulting in brackets being indented
- like so:</para>
- <programlisting>(foo 'a-parameter
-     (crazy-p)
-     (bar baz ()))
- (print "hello world")</programlisting>
- </listitem>
- <listitem>
- <para>If the previous line contains no opening brackets, or if
- the <literal>doubleBracketIndent</literal> property is set to
- <literal>true</literal>, the previous line is checked against
- the regular expressions in the <literal>indentNextLine</literal>
- and <literal>indentNextLines</literal> properties. If the
- previous line matches the former, the indent of the current line
- is increased and the subsequent line is shifted back again. If
- the previous line matches the latter, the indent of the current
- and subsequent lines is increased.</para>
- <para>In Java mode, for example, the
- <literal>indentNextLine</literal> property is set to match
- control structures such as <quote>if</quote>,
- <quote>else</quote>, <quote>while</quote>, and so on.</para>
- <para>The <literal>doubleBracketIndent</literal> property, if
- set to the default of <literal>false</literal>, results in code
- indented like so:</para>
- <programlisting>while(objects.hasNext())
- {
-     Object next = objects.next();
-     if(next instanceof Paintable)
-         ((Paintable)next).paint(g);
- }</programlisting>
- <para>On the other hand, setting this property to
- <quote>true</quote> will give the following result:</para>
- <programlisting>while(objects.hasNext())
-     {
-         Object next = objects.next();
-         if(next instanceof Paintable)
-             ((Paintable)next).paint(g);
-     }</programlisting>
- </listitem>
- </itemizedlist>
- <para>Here is the complete <literal><PROPS></literal> tag for Java
- mode:</para>
- <programlisting><PROPS>
- <PROPERTY NAME="commentStart" VALUE="/*" />
- <PROPERTY NAME="commentEnd" VALUE="*/" />
- <PROPERTY NAME="lineComment" VALUE="//" />
- <PROPERTY NAME="wordBreakChars" VALUE=",+-=&lt;&gt;/?^&amp;*" />
- <!-- Auto indent -->
- <PROPERTY NAME="indentOpenBrackets" VALUE="{" />
- <PROPERTY NAME="indentCloseBrackets" VALUE="}" />
- <PROPERTY NAME="unalignedOpenBrackets" VALUE="(" />
- <PROPERTY NAME="unalignedCloseBrackets" VALUE=")" />
- <PROPERTY NAME="indentNextLine"
- VALUE="\s*(((if|while)\s*\(|else\s*|else\s+if\s*\(|for\s*\(.*\))[^{;]*)" />
- <PROPERTY NAME="unindentThisLine"
- VALUE="^.*(default:\s*|case.*:.*)$" />
- <PROPERTY NAME="electricKeys" VALUE=":" />
- <!-- set this to 'true' if you want to use GNU coding style -->
- <PROPERTY NAME="doubleBracketIndent" VALUE="false" />
- <PROPERTY NAME="lineUpClosingBracket" VALUE="true" />
- </PROPS></programlisting>
- </sect1>
- <sect1 id="mode-tag-rules">
- <title>The RULES Tag</title>
- <para><literal>RULES</literal> tags must be placed inside the
- <literal>MODE</literal> tag. Each <literal>RULES</literal> tag defines a
- <firstterm>ruleset</firstterm>. A ruleset consists of a number of
- <firstterm>parser rules</firstterm>, with each parser rule specifying
- how to highlight a specific syntax token. There must be at least one
- ruleset in each edit mode. There can also be more than one, with
- different rulesets being used to highlight different parts of a buffer
- (for example, in HTML mode, one rule set highlights HTML tags, and
- another highlights inline JavaScript). For information about using more
- than one ruleset, see <xref linkend="mode-rule-span" />.</para>
- <para>The <literal>RULES</literal> tag supports the following
- attributes, all of which are optional:</para>
- <itemizedlist>
- <listitem>
- <para><literal>SET</literal> - the name of this ruleset. All
- rulesets other than the first must have a name.</para>
- </listitem>
- <listitem>
- <para><literal>IGNORE_CASE</literal> - if set to
- <literal>FALSE</literal>, matches will be case sensitive.
- Otherwise, case will not matter. Default is
- <literal>TRUE</literal>.</para>
- </listitem>
- <listitem>
- <para><literal>ESCAPE</literal> - specifies a character sequence
- for escaping literals. The first character following the escape
- sequence is not considered as input for syntax highlighting, and
- is therefore highlighted with the default token for the ruleset.
- </para>
- </listitem>
- <listitem>
- <para><literal>NO_WORD_SEP</literal> - any non-alphanumeric
- character <emphasis>not</emphasis> in this list is treated as a
- word separator for the purposes of syntax highlighting.</para>
- </listitem>
- <listitem>
- <para><literal>DEFAULT</literal> - the token type for text which
- doesn't match any specific rule. Default is
- <literal>NULL</literal>. See <xref
- linkend="mode-syntax-tokens" /> for a list of token
- types.</para>
- </listitem>
- <listitem>
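- <para><literal>HIGHLIGHT_DIGITS</literal> - if set to
- <literal>TRUE</literal>, jEdit will attempt to highlight numbers
- in this ruleset.</para>
- </listitem>
- <listitem>
- <para><literal>DIGIT_RE</literal> - a regular expression that
- decides whether words containing characters other than digits
- are highlighted as numbers. See below for more information
- about these two attributes.</para>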
- </listitem>
- </itemizedlist>
- <para>Here is an example <literal>RULES</literal> tag:</para>
- <programlisting><RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
- <replaceable>... parser rules go here ...</replaceable>
- </RULES></programlisting>
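- <para>Additional rulesets in the same mode need a name. The sketch
- below assumes a hypothetical second ruleset called
- <quote>STRINGS</quote> in which unmatched text is highlighted as
- <literal>LITERAL1</literal> and a backslash escapes the character that
- follows it:</para>
- <programlisting><RULES SET="STRINGS" DEFAULT="LITERAL1" ESCAPE="\" IGNORE_CASE="FALSE">
- <replaceable>... parser rules go here ...</replaceable>
- </RULES></programlisting>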
- <sect2>
- <title>Highlighting Numbers</title>
- <para>If the <literal>HIGHLIGHT_DIGITS</literal> attribute is set to
- <literal>TRUE</literal>, jEdit will attempt to highlight numbers in
- this ruleset.</para>
- <para>Any word consisting entirely of digits (0-9) will be
- highlighted with the <literal>DIGIT</literal> token type. A word
- that contains other characters in addition to digits will be
- highlighted with the <literal>DIGIT</literal> token type only if it
- matches the regular expression specified in the
- <literal>DIGIT_RE</literal> attribute. If this attribute is not
- specified, such words will not be highlighted as numbers.</para>
- <para>Here is an example <literal>DIGIT_RE</literal> regular
- expression that highlights Java-style numeric literals (normal
- numbers, hexadecimals prefixed with <literal>0x</literal>, numbers
- suffixed with various type indicators, and floating point literals
- containing an exponent):</para>
- <programlisting>DIGIT_RE="(0[lL]?|[1-9]\d{0,9}(\d{0,9}[lL])?|0[xX]\p{XDigit}{1,8}(\p{XDigit}{0,8}[lL])?|0[0-7]{1,11}([0-7]{0,11}[lL])?|([0-9]+\.[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?[fFdD]?|[0-9]+([eE][+-]?[0-9]+[fFdD]?|([eE][+-]?[0-9]+)?[fFdD]))"</programlisting>
- <para>Regular expression syntax is described in <xref
- linkend="regexps" />.</para>
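- <para>A much simpler expression is often enough. As a sketch for a
- hypothetical language whose only non-decimal literals are hexadecimal
- numbers prefixed with <literal>0x</literal>, the opening
- <literal>RULES</literal> tag might look like this:</para>
- <programlisting><RULES HIGHLIGHT_DIGITS="TRUE" DIGIT_RE="0[xX]\p{XDigit}+">
- <replaceable>... parser rules go here ...</replaceable>
- </RULES></programlisting>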
- </sect2>
- <sect2 id="rule-ordering">
- <title>Rule Ordering Requirements</title>
- <para>Rule ordering is a very common pitfall when writing your
- own modes.</para>
- <para>Since jEdit checks buffer text against parser rules in the
- order they appear in the ruleset, more specific rules must be placed
- before generalized ones, otherwise the generalized rules will catch
- everything.</para>
- <para>This is best demonstrated with an example. The following is
- incorrect rule ordering:</para>
- <programlisting><SPAN TYPE="MARKUP">
- <BEGIN>[</BEGIN>
- <END>]</END>
- </SPAN>
- <SPAN TYPE="KEYWORD1">
- <BEGIN>[!</BEGIN>
- <END>]</END>
- </SPAN></programlisting>
- <para>If you write the above in a ruleset, any occurrence of
- <quote>[</quote> (even in text such as <quote>[!DEFINE</quote>)
- will be highlighted using the first rule, because it will be the
- first to match. This is most likely not the intended
- behavior.</para>
- <para>The problem can be solved by placing the more specific rule
- before the general one:</para>
- <programlisting><SPAN TYPE="KEYWORD1">
- <BEGIN>[!</BEGIN>
- <END>]</END>
- </SPAN>
- <SPAN TYPE="MARKUP">
- <BEGIN>[</BEGIN>
- <END>]</END>
- </SPAN></programlisting>
- <para>Now, if the buffer contains the text
- <quote>[!SPECIAL]</quote>, the rules will be checked in order, and
- the first rule will be the first to match. However, if you write
- <quote>[FOO]</quote>, it will be highlighted using the second rule,
- which is exactly what you would expect.</para>
- </sect2>
- <sect2>
- <title>Per-Ruleset Properties</title>
- <para>The <literal>PROPS</literal> tag (described in <xref
- linkend="mode-tag-props" />) can also be placed inside the
- <literal>RULES</literal> tag to define ruleset-specific properties.
- The following properties can be set on a per-ruleset basis:</para>
- <itemizedlist>
- <listitem>
- <para><literal>commentEnd</literal> - the comment end
- string.</para>
- </listitem>
- <listitem>
- <para><literal>commentStart</literal> - the comment start
- string.</para>
- </listitem>
- <listitem>
- <para><literal>lineComment</literal> - the line comment
- string.</para>
- </listitem>
- </itemizedlist>
- <para>This allows different parts of a file to have different
- comment strings (in the case of HTML, for example, in HTML text and
- inline JavaScript). For information about the commenting commands,
- see <xref linkend="commenting" />.</para>
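- <para>As a minimal sketch, a ruleset for inline JavaScript could carry
- its own comment strings as shown below. The ruleset name
- <quote>JAVASCRIPT</quote> is an assumption made for this
- illustration:</para>
- <programlisting><RULES SET="JAVASCRIPT">
- <PROPS>
- <PROPERTY NAME="commentStart" VALUE="/*" />
- <PROPERTY NAME="commentEnd" VALUE="*/" />
- <PROPERTY NAME="lineComment" VALUE="//" />
- </PROPS>
- <replaceable>... parser rules go here ...</replaceable>
- </RULES></programlisting>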
- </sect2>
- </sect1>
- <sect1 id="mode-rule-terminate">
- <title>The TERMINATE Tag</title>
- <para>The <literal>TERMINATE</literal> rule, which must be placed inside
- a <literal>RULES</literal> tag, specifies that parsing should stop after
- the specified number of characters have been read from a line. The
- number of characters to terminate after should be specified with the
- <literal>AT_CHAR</literal> attribute. Here is an example:</para>
- <programlisting><TERMINATE AT_CHAR="1" /></programlisting>
- <para>This rule is used in Patch mode, for example, because only the
- first character of each line affects highlighting.</para>
- </sect1>
- <sect1 id="mode-rule-span">
- <title>The SPAN Tag</title>
- <para>The <literal>SPAN</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, highlights text between a start and end
- string. The start and end strings are specified inside child elements of
- the <literal>SPAN</literal> tag. The following attributes are
- supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>TYPE</literal> - The token type to highlight the
- span with. See <xref linkend="mode-syntax-tokens" /> for a list
- of token types.</para>
- </listitem>
- <listitem>
- <para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if
- the start sequence occurs at the beginning of a line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if
- the start sequence is the first non-whitespace text in the
- line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if
- the start sequence occurs at the beginning of a word.</para>
- </listitem>
- <listitem>
- <para><literal>DELEGATE</literal> - text inside the span will be
- highlighted with the specified ruleset. To delegate to a ruleset
- defined in the current mode, just specify its name. To delegate
- to a ruleset defined in another mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para>
- </listitem>
- <listitem>
- <para><literal>MATCH_TYPE</literal> - Controls how the start and
- end of the sequence will be highlighted. See <xref
- linkend="mode-match-type" /> for more information.</para>
- </listitem>
- <listitem>
- <para><literal>ESCAPE</literal> - specifies a character sequence
- for escaping characters. The first character following the escape
- sequence is not considered as input for syntax highlighting, and
- is therefore highlighted with the rule's token.
- </para>
- </listitem>
- <listitem>
- <para><literal>NO_LINE_BREAK</literal> - If set to
- <literal>TRUE</literal>, the span will not cross line
- breaks.</para>
- </listitem>
- <listitem>
- <para><literal>NO_WORD_BREAK</literal> - If set to
- <literal>TRUE</literal>, the span will not cross word
- breaks.</para>
- </listitem>
- </itemizedlist>
- <para>Note that the <literal>AT_LINE_START</literal>,
- <literal>AT_WHITESPACE_END</literal> and
- <literal>AT_WORD_START</literal> attributes can also be used on the
- <literal>BEGIN</literal> and <literal>END</literal> elements. Setting
- these attributes to the same value on both elements has the same effect
- as setting them on the <literal>SPAN</literal> element.</para>
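- <para>Setting an attribute on only one of the two elements restricts
- just that end of the span. The following sketch uses invented
- delimiters (<quote>BEGIN_DOC</quote> and <quote>END_DOC</quote>) for a
- block whose start must be at the beginning of a line but whose end may
- appear anywhere:</para>
- <programlisting><SPAN TYPE="COMMENT2">
- <BEGIN AT_LINE_START="TRUE">BEGIN_DOC</BEGIN>
- <END>END_DOC</END>
- </SPAN></programlisting>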
- <para>Here is a <literal>SPAN</literal> that highlights Java string
- literals, which cannot include line breaks:</para>
- <programlisting><SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
- <BEGIN>"</BEGIN>
- <END>"</END>
- </SPAN></programlisting>
- <para>Here is a <literal>SPAN</literal> that highlights Java
- documentation comments by delegating to the <quote>JAVADOC</quote>
- ruleset defined elsewhere in the current mode:</para>
- <programlisting><SPAN TYPE="COMMENT2" DELEGATE="JAVADOC">
- <BEGIN>/**</BEGIN>
- <END>*/</END>
- </SPAN></programlisting>
- <para>Here is a <literal>SPAN</literal> that highlights HTML cascading
- stylesheets inside <literal><STYLE></literal> tags by delegating
- to the main ruleset in the CSS edit mode:</para>
- <programlisting><SPAN TYPE="MARKUP" DELEGATE="css::MAIN">
- <BEGIN>&lt;style&gt;</BEGIN>
- <END>&lt;/style&gt;</END>
- </SPAN></programlisting>
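- <para>The <literal>ESCAPE</literal> attribute can be combined with
- the string literal rule shown earlier. As a sketch, assuming a
- language in which a backslash escapes the next character, the
- following span would not end at an escaped double quote:</para>
- <programlisting><SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE" ESCAPE="\">
- <BEGIN>"</BEGIN>
- <END>"</END>
- </SPAN></programlisting>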
- </sect1>
- <sect1 id="mode-rule-span-regexp">
- <title>The SPAN_REGEXP Tag</title>
- <para>The <literal>SPAN_REGEXP</literal> rule is similar to the
- <literal>SPAN</literal> rule except the start sequence is taken to be a
- regular expression. In addition to the attributes supported by the
- <literal>SPAN</literal> tag, the following attributes are
- supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>HASH_CHAR</literal> - a literal string which must
- be at the start of a regular expression.</para>
- </listitem>
- <listitem>
- <para><literal>HASH_CHARS</literal> - a list of possible literal
- characters, one of which must match at the start of the regular
- expression.</para>
- </listitem>
- </itemizedlist>
- <para><literal>HASH_CHAR</literal> and <literal>HASH_CHARS</literal>
- attributes are both optional, but you may only specify one, not both. If
- both are specified, <literal>HASH_CHARS</literal> is ignored and an
- error is shown. Whenever possible, use a literal prefix to specify a
- <literal>SPAN_REGEXP</literal>. If the starting prefix is always the
- same, use <literal>HASH_CHAR</literal> and provide as much prefix as
- possible. Only in rare cases would you omit both attributes, such as the
- case where there is no other reliable way to get the highlighting you
- need, for example, with comments in the Cobol programming
- language.</para>
- <para>The regular expression match cannot span more than one line. Any
- text matched by groups in the <literal>BEGIN</literal> regular
- expression is substituted in the <literal>END</literal> string. See
- below for an example of where this is useful.</para>
- <para>Regular expression syntax is described in <xref
- linkend="regexps" />.</para>
- <para>Here is a <literal>SPAN_REGEXP</literal> rule that highlights
- <quote>read-ins</quote> in shell scripts:</para>
- <programlisting><SPAN_REGEXP HASH_CHAR="&lt;" TYPE="LITERAL1" DELEGATE="LITERAL">
- <BEGIN><![CDATA[<<[\p{Space}'"]*([\p{Alnum}_]+)[\p{Space}'"]*]]></BEGIN>
- <END>$1</END>
- </SPAN_REGEXP></programlisting>
- <para>Here is a <literal>SPAN_REGEXP</literal> rule that highlights
- constructs placed between <literal><#ftl</literal> and
- <literal>></literal>, as long as the <literal><#ftl</literal> is
- followed by a word break:</para>
- <programlisting><SPAN_REGEXP TYPE="KEYWORD1" HASH_CHAR="&lt;" DELEGATE="EXPRESSION">
- <BEGIN>&lt;#ftl\b</BEGIN>
- <END>&gt;</END>
- </SPAN_REGEXP></programlisting>
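- <para>When the match can start with one of several characters,
- <literal>HASH_CHARS</literal> can be used instead. The following is
- only a sketch for a hypothetical language in which raw string literals
- are introduced by <quote>r</quote> or <quote>R</quote> followed by a
- double quote:</para>
- <programlisting><SPAN_REGEXP TYPE="LITERAL2" HASH_CHARS="rR" NO_LINE_BREAK="TRUE">
- <BEGIN>[rR]"</BEGIN>
- <END>"</END>
- </SPAN_REGEXP></programlisting>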
- </sect1>
- <sect1 id="mode-rule-eol-span">
- <title>The EOL_SPAN Tag</title>
- <para>An <literal>EOL_SPAN</literal> is similar to a
- <literal>SPAN</literal> except that highlighting stops at the end of the
- line, and no end sequence needs to be specified. The text to match is
- specified between the opening and closing <literal>EOL_SPAN</literal>
- tags. The following attributes are supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>TYPE</literal> - The token type to highlight the
- span with. See <xref linkend="mode-syntax-tokens" /> for a list
- of token types.</para>
- </listitem>
- <listitem>
- <para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if
- the start sequence occurs at the beginning of a line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if
- the sequence is the first non-whitespace text in the
- line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the span will only be highlighted if
- the start sequence occurs at the beginning of a word.</para>
- </listitem>
- <listitem>
- <para><literal>DELEGATE</literal> - text inside the span will be
- highlighted with the specified ruleset. To delegate to a ruleset
- defined in the current mode, just specify its name. To delegate
- to a ruleset defined in another mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para>
- </listitem>
- <listitem>
- <para><literal>MATCH_TYPE</literal> - Controls how the start of
- the sequence will be highlighted. See <xref
- linkend="mode-match-type" /> for more information.</para>
- </listitem>
- </itemizedlist>
- <para>Here is an <literal>EOL_SPAN</literal> that highlights C++
- comments:</para>
- <programlisting><EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN></programlisting>
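- <para>The positional attributes narrow where the match may occur. As a
- sketch, assuming a language in which a <quote>#</quote> that is the
- first non-whitespace text on a line introduces a directive, the rule
- could be written as:</para>
- <programlisting><EOL_SPAN TYPE="KEYWORD2" AT_WHITESPACE_END="TRUE">#</EOL_SPAN></programlisting>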
- </sect1>
- <sect1 id="mode-rule-eol-span-regexp">
- <title>The EOL_SPAN_REGEXP Tag</title>
- <para>The <literal>EOL_SPAN_REGEXP</literal> rule is similar to the
- <literal>EOL_SPAN</literal> rule except the match sequence is taken to
- be a regular expression. In addition to the attributes supported by the
- <literal>EOL_SPAN</literal> tag, the following attributes are
- supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>HASH_CHAR</literal> - a literal string which must
- be at the start of a regular expression.</para>
- </listitem>
- <listitem>
- <para><literal>HASH_CHARS</literal> - a list of possible literal
- characters, one of which must match at the start of the regular
- expression.</para>
- </listitem>
- </itemizedlist>
- <para><literal>HASH_CHAR</literal> and <literal>HASH_CHARS</literal>
- attributes are both optional, but you may only specify one, not both. If
- both are specified, <literal>HASH_CHARS</literal> is ignored and an
- error is shown. Whenever possible, use a literal prefix to specify a
- <literal>EOL_SPAN_REGEXP</literal>. If the starting prefix is always the
- same, use <literal>HASH_CHAR</literal> and provide as much prefix as
- possible. Only in rare cases would you omit both attributes, such as the
- case where there is no other reliable way to get the highlighting you
- need, for example, with comments in the Cobol programming
- language.</para>
- <para>The regular expression match cannot span more than one
- line.</para>
- <para>Regular expression syntax is described in <xref
- linkend="regexps" />.</para>
- <para>Here is an <literal>EOL_SPAN_REGEXP</literal> that highlights
- MS-DOS batch file comments, which start with <literal>REM</literal>,
- followed by any whitespace character, and extend until the end of the
- line:</para>
- <programlisting><EOL_SPAN_REGEXP AT_WHITESPACE_END="TRUE" HASH_CHAR="REM" TYPE="COMMENT1">REM\s</EOL_SPAN_REGEXP></programlisting>
- </sect1>
- <sect1 id="mode-rule-mark-prev">
- <title>The MARK_PREVIOUS Tag</title>
- <para>The <literal>MARK_PREVIOUS</literal> rule, which must be placed
- inside a <literal>RULES</literal> tag, highlights from the end of the
- previous syntax token to the matched text. The text to match is
- specified between opening and closing <literal>MARK_PREVIOUS</literal>
- tags. The following attributes are supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>TYPE</literal> - The token type to highlight the
- text with. See <xref linkend="mode-syntax-tokens" /> for a list
- of token types.</para>
- </listitem>
- <listitem>
- <para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it occurs at the beginning of a line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it is the first non-whitespace text in the line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it occurs at the beginning of a word.</para>
- </listitem>
- <listitem>
- <para><literal>MATCH_TYPE</literal> - Controls how the matched
- region will be highlighted. See <xref
- linkend="mode-match-type" /> for more information.</para>
- </listitem>
- </itemizedlist>
- <para>Here is a rule that highlights labels in Java mode (for example,
- <quote>XXX:</quote>):</para>
- <programlisting><MARK_PREVIOUS AT_WHITESPACE_END="TRUE"
- MATCH_TYPE="DEFAULT">:</MARK_PREVIOUS></programlisting>
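- <para>As a further sketch, the rule below would highlight the word
- preceding an opening parenthesis, which is one way to mark function
- calls. It assumes that <literal>FUNCTION</literal> is among the token
- types listed in <xref linkend="mode-syntax-tokens" />; a
- <literal>MATCH_TYPE</literal> attribute could be added to control how
- the parenthesis itself is highlighted:</para>
- <programlisting><MARK_PREVIOUS TYPE="FUNCTION">(</MARK_PREVIOUS></programlisting>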
- </sect1>
- <sect1 id="mode-rule-mark-following">
- <title>The MARK_FOLLOWING Tag</title>
- <para>The <literal>MARK_FOLLOWING</literal> rule, which must be placed
- inside a <literal>RULES</literal> tag, highlights from the start of the
- match to the next syntax token. The text to match is specified between
- opening and closing <literal>MARK_FOLLOWING</literal> tags. The
- following attributes are supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>TYPE</literal> - The token type to highlight the
- text with. See <xref linkend="mode-syntax-tokens" /> for a list
- of token types.</para>
- </listitem>
- <listitem>
- <para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it occurs at the beginning of a line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it is the first non-whitespace text in the line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it occurs at the beginning of a word.</para>
- </listitem>
- <listitem>
- <para><literal>MATCH_TYPE</literal> - Controls how the matched
- region will be highlighted. See <xref
- linkend="mode-match-type" /> for more information.</para>
- </listitem>
- </itemizedlist>
- <para>Here is a rule that highlights variables in Unix shell scripts
- (<quote>$CLASSPATH</quote>, <quote>$IFS</quote>, etc):</para>
- <programlisting><MARK_FOLLOWING TYPE="KEYWORD2">$</MARK_FOLLOWING></programlisting>
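- <para>The same technique works for other prefix characters. As an
- illustration only, the following rule would highlight names introduced
- by <quote>@</quote>; the choice of the <literal>KEYWORD3</literal>
- token type is an assumption:</para>
- <programlisting><MARK_FOLLOWING TYPE="KEYWORD3">@</MARK_FOLLOWING></programlisting>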
- </sect1>
- <sect1 id="mode-rule-seq">
- <title>The SEQ Tag</title>
- <para>The <literal>SEQ</literal> rule, which must be placed inside a
- <literal>RULES</literal> tag, highlights fixed sequences of text. The
- text to highlight is specified between opening and closing
- <literal>SEQ</literal> tags. The following attributes are
- supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>TYPE</literal> - the token type to highlight the
- sequence with. See <xref linkend="mode-syntax-tokens" /> for a
- list of token types.</para>
- </listitem>
- <listitem>
- <para><literal>AT_LINE_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it occurs at the beginning of a line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WHITESPACE_END</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it is the first non-whitespace text in the line.</para>
- </listitem>
- <listitem>
- <para><literal>AT_WORD_START</literal> - If set to
- <literal>TRUE</literal>, the sequence will only be highlighted
- if it occurs at the beginning of a word.</para>
- </listitem>
- <listitem>
- <para><literal>DELEGATE</literal> - if this attribute is
- specified, all text after the sequence will be highlighted using
- this ruleset. To delegate to a ruleset defined in the current
- mode, just specify its name. To delegate to a ruleset defined in
- another mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para>
- </listitem>
- </itemizedlist>
- <para>The following rules highlight a few Java operators:</para>
- <programlisting><SEQ TYPE="OPERATOR">+</SEQ>
- <SEQ TYPE="OPERATOR">-</SEQ>
- <SEQ TYPE="OPERATOR">*</SEQ>
- <SEQ TYPE="OPERATOR">/</SEQ></programlisting>
- </sect1>
- <sect1 id="mode-rule-seq-regexp">
- <title>The SEQ_REGEXP Tag</title>
- <para>The <literal>SEQ_REGEXP</literal> rule is similar to the
- <literal>SEQ</literal> rule except the match sequence is taken to be a
- regular expression. In addition to the attributes supported by the
- <literal>SEQ</literal> tag, the following attributes are
- supported:</para>
- <itemizedlist>
- <listitem>
- <para><literal>HASH_CHAR</literal> - a literal string which must
- be at the start of a regular expression.</para>
- </listitem>
- <listitem>
- <para><literal>HASH_CHARS</literal> - a list of possible literal
- characters, one of which must match at the start of the regular
- expression.</para>
- </listitem>
- </itemizedlist>
- <para>The <literal>HASH_CHAR</literal> and <literal>HASH_CHARS</literal>
- attributes are both optional, but only one of them may be specified. If
- both are specified, <literal>HASH_CHARS</literal> is ignored and an
- error is shown. Whenever possible, give a
- <literal>SEQ_REGEXP</literal> a literal prefix. If the matched text
- always starts with the same prefix, use <literal>HASH_CHAR</literal>
- and provide as much of the prefix as possible. Omit both attributes
- only in the rare cases where there is no other reliable way to get the
- highlighting you need, for example, with comments in the Cobol
- programming language.</para>
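- <para>To illustrate the advice above, a rule for hexadecimal literals
- could name its fixed prefix with <literal>HASH_CHAR</literal>. This is
- only a sketch: the rule is not taken from any bundled mode, and the
- token type is an arbitrary choice:</para>
- <programlisting><!-- Hypothetical rule: every match begins with the literal prefix "0x" -->
- <SEQ_REGEXP HASH_CHAR="0x" TYPE="LITERAL2">0x[0-9a-fA-F]+</SEQ_REGEXP></programlisting>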
- <para>The regular expression match cannot span more than one
- line.</para>
- <para>Regular expression syntax is described in <xref
- linkend="regexps" />.</para>
- <para><emphasis role="bold">NOTE</emphasis>: C-style character escapes
- for literals (such as \t for the tab character) do not work in XML
- attribute values. Use an XML character reference instead, for example
- &#09; instead of \t.</para>
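- <para>For instance, a hypothetical rule whose matches always begin with
- a tab character might be sketched like this (the rule and its token
- type are illustrative only):</para>
- <programlisting><!-- &#09; stands for the tab character; "\t" would not work in the attribute -->
- <SEQ_REGEXP HASH_CHAR="&#09;" TYPE="LABEL">&#09;[A-Za-z]+:</SEQ_REGEXP></programlisting>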
- <para>Here is a <literal>SEQ_REGEXP</literal> rule from
- <filename>moin.xml</filename> that uses the
- <literal>HASH_CHARS</literal> attribute to describe a keyword (a
- wikiword): an uppercase letter, followed by one or more lowercase
- letters, another uppercase letter, and one or more further
- letters.</para>
- <programlisting>
- <SEQ_REGEXP HASH_CHARS="ABCDEFGHIJKLMNOPQRSTUVWXYZ" AT_WORD_START="TRUE" TYPE="KEYWORD2">[A-Z][a-z]+[A-Z][a-zA-Z]+</SEQ_REGEXP>
- </programlisting>
- </sect1>
- <sect1 id="mode-rule-import">
- <title>The IMPORT Tag</title>
- <para>The <literal>IMPORT</literal> tag, which must be placed inside a
- <literal>RULES</literal> tag, loads all rules defined in a given ruleset
- into the current ruleset; in other words, it has the same effect as
- copying and pasting the imported ruleset.</para>
- <para>The only required attribute, <literal>DELEGATE</literal>, must be
- set to the name of a ruleset. To import a ruleset defined in the current
- mode, just specify its name. To import a ruleset defined in another
- mode, specify a name of the form
- <literal><replaceable>mode</replaceable>::<replaceable>ruleset</replaceable></literal>.
- Note that the first (unnamed) ruleset in a mode is called
- <quote>MAIN</quote>.</para>
- <para>One quirk is that the rules of the imported ruleset are not
- inserted at the location of the <literal>IMPORT</literal> tag, but
- rather appended to the end of the containing ruleset. This has
- implications for rule ordering; see <xref linkend="rule-ordering" />.</para>
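- <para>A minimal sketch of this quirk, using the hypothetical ruleset
- names <literal>BASE</literal> and <literal>EXTENDED</literal>:</para>
- <programlisting><RULES SET="EXTENDED">
- <!-- Written first, but the rules of BASE end up after the SEQ below -->
- <IMPORT DELEGATE="BASE"/>
- <SEQ TYPE="OPERATOR">=></SEQ>
- </RULES></programlisting>
- <para>The resulting ruleset behaves as though the
- <literal>SEQ</literal> rule had been written first, followed by the
- rules copied from <literal>BASE</literal>.</para>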
- <para>Here is an example from the PHP mode, which extends the inline
- JavaScript highlighting to support embedded PHP:</para>
- <programlisting>
- <RULES SET="JAVASCRIPT+PHP">
- <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
- <BEGIN>&lt;?php</BEGIN>
- <END>?&gt;</END>
- </SPAN>
- <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
- <BEGIN>&lt;?</BEGIN>
- <END>?&gt;</END>
- </SPAN>
- <SPAN TYPE="MARKUP" DELEGATE="php::PHP">
- <BEGIN>&lt;%=</BEGIN>
- <END>%&gt;</END>
- </SPAN>
- <IMPORT DELEGATE="javascript::MAIN"/>
- </RULES></programlisting>
- </sect1>
- <sect1 id="mode-rule-keywords">
- <title>The KEYWORDS Tag</title>
- <para>The <literal>KEYWORDS</literal> tag, which must be placed inside a
- <literal>RULES</literal> tag and can only appear once, specifies a list
- of keywords to highlight. Keywords are similar to
- <literal>SEQ</literal>s, except that <literal>SEQ</literal>s match
- anywhere in the text, whereas keywords only match whole words. Words are
- considered to be runs of text separated by non-alphanumeric
- characters.</para>
- <para>The <literal>KEYWORDS</literal> tag does not define any
- attributes.</para>
- <para>Each child element of the <literal>KEYWORDS</literal> tag is an
- element whose name is a token type, and whose content is the keyword to
- highlight. For example, the following rule highlights the most common
- Java keywords:</para>
- <programlisting><KEYWORDS>
- <KEYWORD1>if</KEYWORD1>
- <KEYWORD1>else</KEYWORD1>
- <KEYWORD3>int</KEYWORD3>
- <KEYWORD3>void</KEYWORD3>
- </KEYWORDS></programlisting>
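- <para>Any token type can serve as the element name. As a further
- sketch (the choice of token type here is arbitrary), the following
- list highlights a few Java literals:</para>
- <programlisting><KEYWORDS>
- <LITERAL2>true</LITERAL2>
- <LITERAL2>false</LITERAL2>
- <LITERAL2>null</LITERAL2>
- </KEYWORDS></programlisting>
- <para>Because keywords only match whole words, the text
- <literal>true</literal> inside a longer word such as
- <literal>construe</literal> is not highlighted.</para>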
- </sect1>
- <sect1 id="mode-syntax-tokens">
- <title>Token Types</title>
- <para>Parser rules can highlight tokens using any of the following token
- types:</para>
- <itemizedlist>
- <listitem>
- <para><literal>NULL</literal> - no special highlighting is
- performed on tokens of type <literal>NULL</literal></para>
- </listitem>
- <listitem>
- <para><literal>COMMENT1</literal></para>
- </listitem>
- <listitem>
- <para><literal>COMMENT2</literal></para>
- </listitem>
- <listitem>
- <para><literal>COMMENT3</literal></para>
- </listitem>
- <listitem>
- <para><literal>COMMENT4</literal></para>
- </listitem>
- <listitem>
- <para><literal>FUNCTION</literal></para>
- </listitem>
- <listitem>
- <para><literal>INVALID</literal><!-- - tokens of this type are
- automatically added if a <literal>NO_WORD_BREAK</literal> or
- <literal>NO_LINE_BREAK</literal> <literal>SPAN</literal> spans more than
- one word or line, respectively. --></para>
- </listitem>
- <listitem>
- <para><literal>KEYWORD1</literal></para>
- </listitem>
- <listitem>
- <para><literal>KEYWORD2</literal></para>
- </listitem>
- <listitem>
- <para><literal>KEYWORD3</literal></para>
- </listitem>
- <listitem>
- <para><literal>KEYWORD4</literal></para>
- </listitem>
- <listitem>
- <para><literal>LABEL</literal></para>
- </listitem>
- <listitem>
- <para><literal>LITERAL1</literal></para>
- </listitem>
- <listitem>
- <para><literal>LITERAL2</literal></para>
- </listitem>
- <listitem>
- <para><literal>LITERAL3</literal></para>
- </listitem>
- <listitem>
- <para><literal>LITERAL4</literal></para>
- </listitem>
- <listitem>
- <para><literal>MARKUP</literal></para>
- </listitem>
- <listitem>
- <para><literal>OPERATOR</literal></para>
- </listitem>
- </itemizedlist>
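- <para>A token type is selected either through the
- <literal>TYPE</literal> attribute of a rule or through the element
- name inside a <literal>KEYWORDS</literal> tag. A brief sketch, with
- arbitrarily chosen token types:</para>
- <programlisting><!-- Illustrative only: the token types are arbitrary choices -->
- <SPAN TYPE="LITERAL1">
- <BEGIN>"</BEGIN>
- <END>"</END>
- </SPAN>
- <SEQ TYPE="OPERATOR">=</SEQ>
- <KEYWORDS>
- <KEYWORD1>return</KEYWORD1>
- </KEYWORDS></programlisting>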
- </sect1>
- <sect1 id="mode-match-type">
- <title>The MATCH_TYPE Attribute</title>
- <para>The <literal>MATCH_TYPE</literal> attribute is used by some of the
- rules to control how the region matched by the rule will be
- highlighted.</para>
- <para>For example, when using a <literal>MARK_PREVIOUS</literal> rule to
- highlight a function call of the form <literal>fcall()</literal>, the
- following rule could be used:</para>
- <programlisting>
- <MARK_PREVIOUS TYPE="FUNCTION" MATCH_TYPE="OPERATOR">(</MARK_PREVIOUS></programlisting>
- <para>This would cause <literal>fcall</literal> to be highlighted as
- <literal>FUNCTION</literal>, and <literal>(</literal> to be highlighted
- as <literal>OPERATOR</literal>. In this case, to keep bracket
- matching working, a <literal>SEQ</literal> rule would have to be added
- to match <literal>)</literal> and mark it as
- <literal>OPERATOR</literal>.</para>
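- <para>Taken together, the pair of rules might look like this
- sketch:</para>
- <programlisting><MARK_PREVIOUS TYPE="FUNCTION" MATCH_TYPE="OPERATOR">(</MARK_PREVIOUS>
- <!-- The closing bracket gets the same token type as the opening one -->
- <SEQ TYPE="OPERATOR">)</SEQ></programlisting>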
- <para>The <literal>MATCH_TYPE</literal> attribute value can be any of
- the valid token types, or the following special values:</para>
- <itemizedlist>
- <listitem>
- <para><literal>RULE</literal>: this is the default value. It
- tells the syntax system to use the same token type as the
- <literal>TYPE</literal> attribute of the rule. This is equivalent to
- <literal>EXCLUDE_MATCH="FALSE"</literal> in 4.2 and earlier mode
- files.</para>
- </listitem>
- <listitem>
- <para><literal>CONTEXT</literal>: this value tells the
- syntax system to mark the matched region using the default token
- type for the current ruleset. In 4.2 and earlier mode files,
- this was specified by
- <literal>EXCLUDE_MATCH="TRUE"</literal>.</para>
- </listitem>
- </itemizedlist>
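- <para>As a sketch of the <literal>CONTEXT</literal> value, the
- following hypothetical rule highlights the word before a colon as
- <literal>LABEL</literal> while leaving the colon itself in the default
- token type of the ruleset. The second form shows the spelling described
- above for 4.2 and earlier mode files:</para>
- <programlisting><!-- Current syntax -->
- <MARK_PREVIOUS TYPE="LABEL" MATCH_TYPE="CONTEXT">:</MARK_PREVIOUS>
- <!-- 4.2 and earlier mode files -->
- <MARK_PREVIOUS TYPE="LABEL" EXCLUDE_MATCH="TRUE">:</MARK_PREVIOUS></programlisting>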
- </sect1>
- </chapter>