/Src/Dependencies/Boost/libs/spirit/classic/doc/basic_concepts.html
http://hadesmem.googlecode.com/ · HTML · 354 lines · 347 code · 7 blank · 0 comment · 0 complexity · 57a994d176b071b3be503ee2135f8967 MD5 · raw file
- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- <html>
- <head>
- <meta content=
- "HTML Tidy for Windows (vers 1st February 2003), see www.w3.org"
- name="generator">
- <title>
- Basic Concepts
- </title>
- <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
- <link rel="stylesheet" href="theme/style.css" type="text/css">
- </head>
- <body>
- <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
- <tr>
- <td width="10"></td>
- <td width="85%">
- <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Basic
- Concepts</b></font>
- </td>
- <td width="112">
- <a href="http://spirit.sf.net"><img src="theme/spirit.gif"
- width="112" height="48" align="right" border="0"></a>
- </td>
- </tr>
- </table><br>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30">
- <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
- </td>
- <td width="30">
- <a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a>
- </td>
- <td width="30">
- <a href="organization.html"><img src="theme/r_arr.gif" border="0">
- </a>
- </td>
- </tr>
- </table>
- <p>
- There are a few fundamental concepts that need to be understood well: 1)
- The <strong>Parser</strong>, 2) <strong>Match</strong>, 3) The
- <strong>Scanner</strong>, and 4) <strong>Semantic Actions</strong>. These
- basic concepts interact with one another, and the functionalities of each
- interweave throughout the framework to make it one coherent whole.
- </p>
- <table width="48%" border="0" align="center">
- <tr>
- <td height="211">
- <img src="theme/intro1.png">
- </td>
- </tr>
- </table>
- <h2>
- The Parser
- </h2>
- <p>
- Central to the framework is the parser. The parser does the actual work
- of recognizing a linear input stream of data read sequentially from start
- to end by the scanner. The parser attempts to match the input following a
- well-defined set of specifications known as grammar rules. The parser
- reports the success or failure to its client through a match object. When
- successful, the parser calls a client-supplied semantic action. Finally,
- the semantic action extracts structural information depending on the data
- passed by the parser and the hierarchical context of the parser it is
- attached to.
- </p>
- <p>
- Parsers come in different flavors. The Spirit framework comes bundled
- with an extensive set of pre-defined parsers that perform various parsing
- tasks from the trivial to the complex. The parser, as a concept, has a
- public conceptual interface contract. Following the contract, anyone can
- write a conforming parser that will play along well with the framework's
- predefined components. We shall provide a blueprint detailing the
- conceptual interface of the parser later.
- </p>
- <p>
- Clients of the framework generally do not need to write their own
- hand-coded parsers at all. Spirit has an immense repertoire of
- pre-defined parsers covering all aspects of syntax and semantic analysis.
- We shall examine this repertoire of parsers in the following sections. In
- the rare case where a specific functionality is not available, it is
- extremely easy to write a user-defined parser. The ease in writing a
- parser entity is the main reason for Spirit's extensibility.
- </p>
- <h2>
- Primitives and Composites
- </h2>
- <p>
- Spirit parsers fall into two categories: <b>primitives</b> and
- <b>composites</b>. These two categories are more or less synonymous to
- terminals and non-terminals in parsing lingo. Primitives are
- non-decomposable atomic units. Composites on the other hand are parsers
- that are composed of other parsers which can in turn be a primitive or
- another composite. To illustrate, consider the Spirit expression:
- </p>
- <pre><code><font color="#000000"> </font></code><code><span class="identifier">real_p</span> <span class=
- "special">>></span> <span class="special">*(</span><span class=
- "literal">','</span> <span class="special">>></span> <span class=
- "identifier">real_p</span><span class="special">)</span></code>
- </pre>
- <p>
- <tt><tt>real_p</tt></tt> is a primitive parser that can parse real
- numbers. The quoted comma <tt class="quotes">','</tt> in the expression
- is a shortcut and is equivalent to <tt>ch_p<span class=
- "operators">(</span><span class="quotes">','</span><span class=
- "operators">)</span></tt>, which is another primitive parser that
- recognizes single characters.
- </p>
- <p>
- The expression above corresponds to the following parse tree:
- </p>
- <table width="29%" border="0" align="center">
- <tr>
- <td>
- <img src="theme/intro7.png">
- </td>
- </tr>
- </table>
- <p>
- The expression:
- </p>
- <pre><code><font color="#000000"> </font></code><span class=
- "literal">','</span> <span class="special">>></span> <span class=
- "identifier">real_p</span>
- </pre>
- <p>
- composes a <b>sequence</b> parser. The <tt>sequence</tt> parser is a
- composite parser comprising two parsers: the one on its left hand side
- (lhs), <tt>ch_p<span class="operators">(</span><span class=
- "quotes">','</span><span class="operators">)</span></tt> ; and the other
- on its right hand side (rhs), <tt>real_p</tt>. This composite parser,
- when called, calls its lhs and rhs in sequence and reports a successful
- match only if both are successful.
- </p>
- <table width="14%" border="0" align="center">
- <tr>
- <td>
- <img src="theme/intro2.png">
- </td>
- </tr>
- </table>
- <p>
- The <tt>sequence</tt> parser is a binary composite. It is composed of two
- parsers. There are unary composites as well. Unary composites hold only a
- single subject. Like the binary composite, the unary composite may change
- the behavior of its embedded subject. One particular example is the
- <b>Kleene star</b>. The Kleene star, when called to parse, calls its sole
- subject zero or more times. "Zero or more" implies that the Kleene star
- always returns a successful match, possibly matching the null string: "".
- </p>
- <p>
- The expression:
- </p>
- <pre><code><font color="#000000"> </font></code><code><span class=
- "special">*(</span><span class="literal">','</span> <span class=
- "special">>></span> <span class=
- "identifier">real_p</span><span class="special">)</span></code>
- </pre>
- <p>
- wraps the whole sequence composite above inside a <tt>kleene_star</tt>.
- </p>
- <table width="17%" border="0" align="center">
- <tr>
- <td>
- <img src="theme/intro3.png">
- </td>
- </tr>
- </table>
- <p>
- Finally, the full expression composes a <tt>real_p</tt> primitive parser
- and the <tt>kleene_star</tt> we have above into another higher level
- <tt>sequence</tt> parser composite.
- </p>
- <table width="34%" border="0" align="center">
- <tr>
- <td>
- <img src="theme/intro4.png">
- </td>
- </tr>
- </table>
- <p>
- A few simple classes, when composed and structured in a hierarchy, form a
- very powerful object-oriented recursive-descent parsing engine. These
- classes provide the infrastructure needed for the construction of
- more-complex parsers. The final parser composite is a non-deterministic
- recursive-descent parser with infinite look-ahead.
- </p>
- <p>
- Top-down descent traverses the hierarchy. The outer <tt>sequence</tt>
- calls the leftmost <tt>real_p</tt> parser. If successful, the
- <tt>kleene_star</tt> is called next. The <tt>kleene_star</tt> calls the
- inner <tt>sequence</tt> repeatedly in a loop until it fails to match, or
- the input is exhausted. Inside, <tt>ch_p(',')</tt> and then
- <tt>real_p</tt> are called in sequence. The following diagram illustrates
- what is happening, somewhat reminiscent of Pascal syntax diagrams.
- </p>
- <table width="37%" border="0" align="center">
- <tr>
- <td>
- <img src="theme/intro5.png">
- </td>
- </tr>
- </table>
- <p>
- The flexibility of object embedding and composition combined with
- recursion opens up a unique approach to parsing. Subclasses are free to
- form aggregates and algorithms of arbitrary complexity. Complex parsers
- can be created with the composition of only a few primitive classes.
- </p>
- <p>
- The framework is designed to be fully open-ended and extensible. New
- primitives or composites, from the trivial to the complex, may be added
- any time. Composition happens (statically) at compile time. This is
- possible through the expressive flexibility of C++ expression templates
- and template meta-programming.
- </p>
- <p>
- The result is a composite composed of primitives and smaller composites.
- This embedding strategy gives us the ability to build hierarchical
- structures that fully model EBNF expressions of arbitrary complexity.
- Later on, we shall see more primitive and composite building blocks.
- </p>
- <h2>
- The Scanner
- </h2>
- <p>
- Like the parser, the scanner is also an abstract concept. The task of the
- scanner is to feed the sequential input data stream to the parser. The
- scanner is composed of two STL conforming forward iterators, first and
- last, where first is held by reference and last, by value. The first
- iterator is held by reference to allow re-positioning by the parser. A
- set of policies governs how the scanner behaves. Parsers extract data
- from the scanner and position the iterator appropriately through its
- member functions.
- </p>
- <p>
- Knowledge of the intricacies of these policies is not required at all in
- most cases. However, knowledge of the scanner's basic API is required to
- write fully-conforming Spirit parsers. The scanner's API will be outlined
- in a separate section. In addition, for the power users and the
- adventurous among us, a full section will be devoted to covering the
- scanner policies. The scanner policies make Spirit very flexible and
- extensible. For instance, some of the policies may be modified to filter
- data. A practical example is a scanner policy that does not distinguish
- upper and lower case whereby making it useful for parsing case
- insensitive input. Another example is a scanner policy that strips white
- spaces from the input.
- </p>
- <h2>
- The Match
- </h2>
- <p>
- The parser has a conceptual parse member function taking in a scanner and
- returning a match object. The primary function of the match object is to
- report parsing success (or failure) back to the parser's caller; i.e., it
- evaluates to true if the parse function is successful, false otherwise.
- If the parse is successful, the match object may also be queried to
- report the number of characters matched (using <tt>match.length()</tt>).
- The length is non-negative if the match is successful, and the typical
- length of a parse failure is -1. A zero length is perfectly valid and
- still represents a successful match.
- </p>
- <p>
- Parsers may have attribute data associated with it. For example, the
- real_p parser has a numeric datum associated with it. This attribute is
- the parsed number. This attribute is passed on to the returned match
- object. The match object may be queried to get this attribute. This datum
- is valid only when the match is successful.
- </p>
- <h2>
- Semantic Actions
- </h2>
- <p>
- A composite parser forms a hierarchy. Parsing proceeds from the topmost
- parent parser which delegates and apportions the parsing task to its
- children recursively to its children's children and so on until a
- primitive is reached. By attaching semantic actions to various points in
- this hierarchy, in effect we can transform the flat linear input stream
- into a structured representation. This is essentially what parsers do.
- </p>
- <p>
- Recall our example above:
- </p>
- <pre><code><font color="#000000"> </font></code><code><span class=
- "identifier">real_p</span> <span class=
- "special">>></span> <span class="special">*(</span><span class=
- "literal">','</span> <span class="special">>></span> <span class=
- "identifier">real_p</span><span class="special">)</span></code>
- </pre>
- <p>
- By hooking a function (or functor) into the real_p parsers, we can
- extract the numbers from the input:
- </p>
- <pre><code><font color="#000000"> </font></code><span class=
- "identifier">real_p</span><span class="special">[&</span><span class=
- "identifier">f</span><span class="special">]</span> <span class=
- "special">>></span> <span class="special">*(</span><span class=
- "literal">','</span> <span class="special">>></span> <span class=
- "identifier">real_p</span><span class="special">[&</span><span class=
- "identifier">f</span><span class="special">])</span>
- </pre>
- <table width="41%" border="0" align="center">
- <tr>
- <td>
- <img src="theme/intro6.png">
- </td>
- </tr>
- </table>
- <p> where <tt>f</tt> is a function that takes in a single argument. The <tt><span class="operators">[&</span>f<span class=
- "operators">]</span></tt> hooks the parser with the function such that when
- <tt>real_p</tt> recognizes a valid number, the function <tt>f</tt> is called.
- It is up to the function then to do what is appropriate. For example, it can
- stuff the numbers in a vector. Or perhaps, if the grammar is changed slightly
- by replacing <tt class="quotes">','</tt> with <tt class="quotes">'+'</tt>, then
- we have a primitive calculator that computes sums. The function <tt>f</tt> then
- can then be made to add all incoming numbers.<br>
- </p>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30">
- <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
- </td>
- <td width="30">
- <a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a>
- </td>
- <td width="30">
- <a href="organization.html"><img src="theme/r_arr.gif" border="0">
- </a>
- </td>
- </tr>
- </table><br>
- <hr size="1">
- <p class="copyright">
- Copyright © 1998-2003 Joel de Guzman<br>
- <br>
- <font size="2">Use, modification and distribution is subject to the Boost
- Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or
- copy at http://www.boost.org/LICENSE_1_0.txt)</font>
- </p>
- <p>
- </p>
- </body>
- </html>