/lib/antlr-2.7.5/doc/trees.html

https://github.com/boo/boo-lang
Possible License(s): GPL-2.0
  1. <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
  2. <html>
  3. <head>
  4. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  5. <title>ANTLR Tree Construction</title>
  6. </head>
  7. <body bgcolor="#FFFFFF" text="#000000">
  8. <h1><a name="_bb1">ANTLR Tree Construction</a></h1>
  9. <p>
  10. ANTLR helps you build intermediate form trees, or abstract syntax trees
  11. (ASTs), by providing grammar annotations that indicate what tokens are to
  12. be treated as subtree roots, which are to be leaves, and which are to be
  13. ignored with respect to tree construction.&nbsp; As with PCCTS 1.33, you
  14. may manipulate trees using tree grammar actions.
  15. </p>
  16. <p>
Programmers often have existing tree definitions or need a special
physical structure, which prevents ANTLR from dictating the
implementation of AST nodes. ANTLR therefore specifies only an interface describing
  21. minimum behavior. Your tree implementation must implement this
  22. interface so ANTLR knows how to work with your trees. Further,
  23. you must tell the parser the name of your tree nodes or
  24. provide a tree &quot;factory&quot; so that ANTLR knows how to
  25. create nodes with the correct type (rather than hardcoding in
  26. a <tt>new AST()</tt> expression everywhere). &nbsp; ANTLR can
  27. construct and walk any tree that satisfies the AST
  28. interface.&nbsp; A number of common tree definitions are
  29. provided. Unfortunately, ANTLR cannot parse XML DOM trees since our
  30. method names conflict (e.g., <tt>getFirstChild()</tt>); ANTLR was here
first &lt;wink&gt;. Argh!
  32. </p>
  33. <h2><a name="_bb2"></a><a name="Notation">Notation</a></h2>
  34. <p>
  35. In this and other documents, tree structures are represented by a
  36. LISP-like notation, for example:
  37. </p>
  38. <pre><tt>#(A B C)</tt></pre>
  39. <p>
  40. is a tree with A at the root, and children B and C. This notation can be
  41. nested to describe trees of arbitrary structure, for example:
  42. </p>
  43. <pre><tt>#(A B #(C D E))</tt></pre>
  44. <p>
  45. is a tree with A at the root, B as a first child, and an entire subtree
  46. as the second child. The subtree, in turn, has C at the root and D,E as
  47. children.
  48. </p>
  49. <h2><a name="_bb3"></a><a name="Controlling AST construction">Controlling AST construction</a></h2>
  50. <p>
  51. AST construction in an ANTLR Parser, or AST transformation in a
  52. Tree-Parser, is turned on and off by the <a href="options.html#buildAST">
  53. <tt>buildAST</tt> option</a>.
  54. </p>
  55. <p>
  56. From an AST construction and walking point of view, ANTLR considers all
  57. tree nodes to look the same (i.e., they appear to be homogeneous).&nbsp;
  58. Through a tree factory or by specification, however, you can instruct ANTLR
  59. to create nodes of different types. &nbsp; See the section below on
  60. heterogeneous trees.
  61. </p>
  62. <h2><a name="_bb4"></a><a name="Grammar annotations for building ASTs">Grammar annotations for building ASTs</a></h2> <h3><a name="_bb5"></a><a name="Leaf nodes">Leaf nodes</a></h3>
  63. <p>
  64. ANTLR assumes that any nonsuffixed token reference or token-range is a
  65. leaf node in the resulting tree for the enclosing rule. If no suffixes at
  66. all are specified in a grammar, then a Parser will construct a linked-list
  67. of the tokens (a degenerate AST), and a Tree-Parser will copy the input
  68. AST.
  69. </p>
  70. <h3><a name="_bb6"></a><a name="Root nodes">Root nodes</a></h3>
  71. <p>
  72. Any token suffixed with the &quot;<tt>^</tt>&quot; operator is
  73. considered a root token. A tree node is constructed for that token and is
  74. made the root of whatever portion of the tree has been built
  75. </p>
  76. <pre><tt>a : A B^ C^ ;</tt></pre>
  77. <p>
  78. results in tree <tt>#(C #(B A))</tt>.
  79. </p>
  80. <p>
  81. First A is matched and made a lonely child, followed by B which is made the parent of the current tree, A. Finally, C is matched and made the parent of the current tree, making it the parent of the B node. Note that the same rule without any operators results in the flat tree <tt>A B C</tt>.
  82. </p>
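<p>
The effect of the root operator can be modeled in a few lines of plain Java. The following is only a sketch of the semantics described above, not the code ANTLR generates; the class <tt>RootOperatorSketch</tt>, its nested <tt>Node</tt> type, and the methods <tt>leaf</tt> and <tt>rootOp</tt> are hypothetical names invented for this illustration.
</p>

```java
class RootOperatorSketch {
    /** Hypothetical child-sibling node; stands in for an ANTLR AST node. */
    static final class Node {
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(String text) { this.text = text; }

        /** Print this node and its following siblings, LISP-style. */
        String toLispList() {
            StringBuilder sb = new StringBuilder();
            for (Node n = this; n != null; n = n.nextSibling) {
                if (sb.length() > 0) sb.append(' ');
                if (n.firstChild == null) {
                    sb.append(n.text);
                } else {
                    sb.append("#(").append(n.text).append(' ')
                      .append(n.firstChild.toLispList()).append(')');
                }
            }
            return sb.toString();
        }
    }

    Node root;  // first node of the tree (or flat list) built so far
    Node last;  // node after which the next leaf is attached

    /** Unsuffixed token: attach a leaf to what has been built so far. */
    RootOperatorSketch leaf(String text) {
        Node n = new Node(text);
        if (root == null) {
            root = n;                 // first node: it is the whole tree
        } else if (last == null) {
            root.firstChild = n;      // root has no children yet
        } else {
            last.nextSibling = n;     // append after the previous node
        }
        last = n;
        return this;
    }

    /** Token suffixed with ^: becomes the parent of everything built so far. */
    RootOperatorSketch rootOp(String text) {
        Node n = new Node(text);
        n.firstChild = root;
        last = root;
        while (last != null && last.nextSibling != null) {
            last = last.nextSibling;  // later leaves go after the last old node
        }
        root = n;
        return this;
    }

    String result() {
        return root == null ? "" : root.toLispList();
    }

    public static void main(String[] args) {
        // a : A B^ C^ ;
        System.out.println(new RootOperatorSketch().leaf("A").rootOp("B").rootOp("C").result());
        // a : A B C ;
        System.out.println(new RootOperatorSketch().leaf("A").leaf("B").leaf("C").result());
    }
}
```

<p>
Running the sketch prints <tt>#(C #(B A))</tt> for the suffixed rule and the flat list <tt>A B C</tt> for the unsuffixed one, matching the trees given in the text.
</p>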
  83. <h3><a name="_bb7"></a><a name="Turning off standard tree construction">Turning off standard tree construction</a></h3>
  84. <p>
  85. Suffix a token reference with &quot;<tt>!</tt>&quot; to prevent
  86. incorporation of the node for that token into the resulting tree (the AST
  87. node for the token is still constructed and may be referenced in actions,
  88. it is just not added to the result tree automatically). Suffix a rule
  89. reference &quot;<tt>!</tt>&quot; to indicate that the tree constructed by
  90. the invoked rule should not be linked into the tree constructed for the
  91. current rule.
  92. </p>
  93. <p>
  94. Suffix a rule definition with &quot;<tt>!</tt>&quot; to indicate that
  95. tree construction for the rule is to be turned off. Rules and tokens
  96. referenced within that rule still create ASTs, but they are not linked into
  97. a result tree. The following rule does no automatic tree construction.
  98. Actions must be used to set the return AST value, for example:
  99. </p>
<pre><tt>begin!
    : a:INT PLUS b:INT
      { #begin = #(PLUS, a, b); }
    ;</tt></pre>
  104. <p>
  105. For finer granularity, prefix alternatives with &quot;<tt>!</tt>&quot;
  106. to shut off tree construction for that alternative only. This granularity
  107. is useful, for example, if you have a large number of alternatives and you
  108. only want one to have manual tree construction:
  109. </p>
<pre><tt>stat:
    ID EQUALS^ expr   // auto construction
    ... some alternatives ...
  |! RETURN expr
    {#stat = #([IMAGINARY_TOKEN_TYPE], expr);}
    ... more alternatives ...
  ;</tt></pre>
<h3><a name="_bb8"></a><a name="Tree and tree node construction">Tree node construction</a></h3>
  117. <p>
  118. With automatic tree construction off (but with <code>buildAST</code>
  119. on), you must construct your own tree nodes and combine them into tree
  120. structures within embedded actions. There are several ways to create a tree
  121. node in an action:
  122. <ol>
  123. <li>
  124. use <tt>new <i>T</i>(<i>arg</i>)</tt> where <i>T</i> is your tree
  125. node type and <i>arg</i> is either a single token type, a token type and
  126. token text, or a <tt>Token</tt>.
  127. </li>
  128. <li>
use <tt>ASTFactory.create(<i>arg</i>)</tt> where <i>arg</i> is either
a single token type, a token type and token text, or a <tt>Token</tt>.
Using the factory is more general than creating a new node directly:
it defers the node-type decision to the factory, so the node type can
be changed for the entire grammar in one place.
  135. </li>
  136. <li>
  137. use the shorthand notation #[TYPE] or #[TYPE,&quot;text&quot;] or
  138. #[TYPE,&quot;text&quot;,ASTClassNameToConstruct]. The shorthand notation
  139. results in a call to ASTFactory.create() with any specified arguments.
  140. </li>
  141. <li>
  142. use the shorthand notation #<i>id</i>, where <i>id</i> is either a
  143. token matched in the rule, a label, or a rule-reference.
  144. </li>
  145. </ol>
  146. <p>
  147. To construct a tree structure from a set of nodes, you can set the
  148. first-child and next-sibling references yourself or call the factory
  149. <tt>make</tt> method or use <tt>#(...)</tt> notation described below.
  150. </p>
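<p>
To make the "first-child and next-sibling" wiring concrete, here is a minimal sketch of what a <tt>make</tt>-style helper does. It is not the ANTLR implementation: <tt>MakeSketch</tt>, its nested <tt>Node</tt> class, and the simplified <tt>make(root, children...)</tt> signature are assumptions made for this example, and the helper assumes the root has no children yet.
</p>

```java
class MakeSketch {
    /** Hypothetical child-sibling node used only for this sketch. */
    static final class Node {
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(String text) { this.text = text; }

        String toLisp() {
            if (firstChild == null) return text;
            StringBuilder sb = new StringBuilder("#(").append(text);
            for (Node c = firstChild; c != null; c = c.nextSibling) {
                sb.append(' ').append(c.toLisp());
            }
            return sb.append(')').toString();
        }
    }

    /** The first argument is the root; the rest become its children, in order. */
    static Node make(Node root, Node... children) {
        Node tail = null;
        for (Node c : children) {
            if (c == null) continue;      // tolerate absent optional subtrees
            if (tail == null) {
                root.firstChild = c;      // first child hangs off the root
            } else {
                tail.nextSibling = c;     // later children chain as siblings
            }
            tail = c;
        }
        return root;
    }

    /** Builds a PLUS subtree, then wraps it under an imaginary DECL root. */
    static Node sample() {
        Node sum = make(new Node("PLUS"), new Node("3"), new Node("4"));
        return make(new Node("DECL"), sum);
    }

    public static void main(String[] args) {
        System.out.println(sample().toLisp()); // prints #(DECL #(PLUS 3 4))
    }
}
```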
  151. <h3><a name="_bb9"></a><a name="ActionTranslation">AST Action Translation</a></h3>
  152. <p>
  153. In parsers and tree parsers with <tt>buildAST</tt> set to true, ANTLR
  154. will translate portions of user actions in order to make it easier to build
  155. ASTs within actions. In particular, the following constructs starting with
  156. '#' will be translated:
  157. <dl>
  158. <dt>
  159. <tt>#<i>label</i></tt>
  160. </dt>
  161. <dd>
  162. The AST associated with a labeled token-reference or rule-reference
  163. may be accessed as <tt>#<i>label</i></tt>. The translation is to a
  164. variable containing the AST node built from that token, or the AST
  165. returned from the rule.
  166. </dd>
  167. <dt>
  168. <tt>#<i>rule</i></tt>
  169. </dt>
  170. <dd>
  171. When <i>rule</i> is the name of the enclosing rule, ANTLR will
  172. translate this into the variable containing the result AST for the rule.
  173. This allows you to set the return AST for a rule or examine it from
  174. within an action. This can be used when AST generation is on or
suppressed for the rule or alternative. For example:
  176. <pre><tt>r! : a:A { #r = #a; }</tt></pre>
  177. </dd>
  178. <dd>
Setting the return tree is very useful
in combination with normal tree construction because you can have
ANTLR do all the work of building a tree and then add an imaginary
root node, such as:
  183. </dd>
  184. <dd>
  185. &nbsp;
  186. </dd>
  187. <dd>
  188. <pre><tt>decl : ( TYPE ID )+
  189. { #decl = #([DECL,&quot;decl&quot;], #decl); }
  190. ;</tt></pre>
  191. </dd>
  192. <dd>
  193. ANTLR allows you to assign to <tt>#rule</tt> anywhere within an
alternative of the rule. ANTLR ensures that references to and
  195. assignments to <tt>#rule</tt> within an action force the parser's
  196. internal AST construction variables into a stable state. After you
  197. assign to <tt>#rule</tt>, the state of the parser's automatic AST
  198. construction variables will be set as if ANTLR had generated the tree
  199. rooted at <tt>#rule</tt>. For example, any children nodes added after
  200. the action will be added to the children of <tt>#rule</tt>.
  201. </dd>
  202. <dt>
  203. <tt>#<i>label</i>_in</tt>
  204. </dt>
  205. <dd>
  206. In a tree parser, the <b>input</b> AST associated with a labeled
  207. token reference or rule reference may be accessed as
  208. <tt>#<i>label</i>_in</tt>. The translation is to a variable containing the
  209. input-tree AST node from which the rule or token was extracted. Input
  210. variables are seldom used. You almost always want to use
  211. <tt>#<i>label</i></tt> instead of <tt>#<i>label</i>_in</tt>.
  212. </dd>
  213. <dt>
  214. &nbsp;
  215. </dt>
  216. <dt>
  217. <tt>#<i>id</i></tt>
  218. </dt>
  219. <dd>
  220. ANTLR supports the translation of unlabeled token references as a
  221. shorthand notation, as long as the token is unique within the scope
of a single alternative. In these cases, using an unlabeled
token reference is identical to using a label. For example, this:
  224. <pre><tt>
  225. r! : A { #r = #A; }
  226. </tt></pre>
  227. <p>
  228. is equivalent to:
  229. </p>
  230. <pre><tt>
  231. r! : a:A { #r = #a; }</tt></pre>
  232. </dd>
  233. <dd>
  234. <tt>#<i>id</i>_in</tt> is given similar treatment to
  235. <tt>#<i>label</i>_in.</tt>
  236. </dd>
  237. <dt>
  238. &nbsp;
  239. </dt>
  240. <dt>
  241. <tt>#[<i>TOKEN_TYPE</i>]</tt> or <tt>#[<i>TOKEN_TYPE</i>,&quot;text&quot;] or #[TYPE,&quot;text&quot;,ASTClassNameToConstruct]</tt>
  242. </dt>
  243. <dd>
AST node constructor shorthand. The translation is a call to the <tt>ASTFactory.create()</tt> method.&nbsp; For example, <tt>#[T]</tt> is translated to: <pre><tt>ASTFactory.create(T)</tt></pre>
  245. </dd>
  246. <dt>
  247. <tt>#(<i>root</i>, <i>c1</i>, ..., <i>cn</i>)</tt>
  248. </dt>
  249. <dd>
  250. AST tree construction shorthand. ANTLR looks for the comma character
  251. to separate the tree arguments. Commas within method call tree elements are
  252. handled properly; i.e., an element of &quot;<tt>foo(#a,34)</tt>&quot; is ok
  253. and will not conflict with the comma separator between the other tree
  254. elements in the tree. This tree construct is translated to a &quot;make
  255. tree&quot; call. The &quot;make-tree&quot; call is complex due to the need
  256. to simulate variable arguments in languages like Java, but the result will
  257. be something like: <pre><tt>ASTFactory.make(<i>root</i>, <i>c1</i>, ...,
  258. <i>cn</i>);</tt></pre>
  259. <p>
  260. In addition to the translation of the <tt>#(...)</tt> as a whole,
  261. the root and each child <tt><i>c1</i>..<i>cn</i></tt> will be translated.
  262. Within the context of a <tt>#(...)</tt> construct, you may use:
  263. <ul>
  264. <li>
  265. <i><tt>id</tt></i> or <i><tt>label</tt></i> as a shorthand for
  266. <tt>#<i>id</i></tt> or <i><tt>#label</tt></i>.
  267. </li>
  268. <li>
  269. <tt>[...]</tt> as a shorthand for <tt>#[...]</tt>.
  270. </li>
  271. <li>
  272. <tt>(...)</tt> as a shorthand for <tt>#(...)</tt>.
  273. </li>
  274. </ul>
  275. </dd>
  276. </dl>
  277. <p>
  278. The target code generator performs this translation with the help of a
  279. special lexer that parses the actions and asks the code-generator to create
  280. appropriate substitutions for each translated item. This lexer might impose
some restrictions on label names (think of C/C++ preprocessor directives).
  282. </p>
  283. <h2><a name="_bb10"></a><a name="Invoking parsers that build trees">Invoking parsers that build trees</a></h2>
  284. <p>
  285. Assuming that you have defined a lexer <tt>L</tt> and a parser <tt>P</tt> in your grammar, you can invoke them sequentially on the system input stream as follows.
  286. </p>
  287. <pre><tt><i>L</i> lexer = new <i>L</i>(System.in);
  288. <i>P</i> parser = new <i>P</i>(lexer);
  289. parser.setASTNodeType(&quot;MyAST&quot;);
  290. parser.<i>startRule</i>();</tt> </pre>
  291. <p>
  292. If you have set <tt>buildAST=true</tt> in your parser grammar, then it will build an AST, which can be accessed via <tt>parser.getAST()</tt>. If you have defined a tree parser called <tt>T</tt>, you can invoke it with:
  293. </p>
  294. <pre><tt>T walker = new T();
  295. walker.<i>startRule</i>(parser.getAST()); // walk tree</tt> </pre>
  296. <p>
  297. If, in addition, you have set <tt>buildAST=true</tt> in your tree-parser to turn on transform mode, then you can access the resulting AST of the tree-walker:
  298. </p>
  299. <pre><tt>AST results = walker.getAST();
  300. DumpASTVisitor visitor = new DumpASTVisitor();
  301. visitor.visit(results);</tt></pre>
  302. <p>
  303. Where <tt>DumpASTVisitor</tt> is a predefined <tt>ASTVisitor</tt> implementation that simply prints the tree to the standard output.
  304. </p>
  305. <p>
You can also get a LISP-like printout of a tree via
  307. </p>
  308. <pre>String s = parser.getAST().toStringList();</pre> <h2><a name="_bb11"></a><a name="AST Factories">AST Factories</a></h2>
  309. <p>
ANTLR uses a factory pattern to create and connect AST nodes. This is done primarily to separate the tree construction facility from the parser, but it also gives you a hook between the parser and tree node construction.&nbsp; Subclass <tt>ASTFactory</tt> to alter the <tt>create</tt> methods.
  311. </p>
  312. <p>
  313. If you are only interested in specifying the AST node type at runtime, use the
  314. </p>
  315. <pre><tt>setASTNodeType(String className)</tt></pre>
  316. <p>
  317. method on the parser or factory.&nbsp; By default, trees are constructed of nodes of type <tt>antlr.CommonAST</tt>. (You must use the fully-qualified class name).
  318. </p>
  319. <p>
  320. You can also specify a different class name for each token type to generate heterogeneous trees:
  321. <pre>
/** Specify an "override" for the Java AST object created for a
 *  specific token.  It is provided as a convenience so
 *  you can specify node types dynamically.  ANTLR sets
 *  the token type mapping automatically from the tokens{...}
 *  section, but you can change that mapping with this method.
 *  ANTLR does its best to statically determine the node
 *  type for generating parsers, but it cannot deal with
 *  dynamic values like #[LT(1)].  In this case, it relies
 *  on the mapping.  Beware differences in the tokens{...}
 *  section and what you set via this method.  Make sure
 *  they are the same.
 *
 *  Set className to null to remove the mapping.
 *
 *  @since 2.7.2
 */
public void setTokenTypeASTNodeType(int tokenType, String className)
    throws IllegalArgumentException;
  340. </pre>
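<p>
The idea behind a per-token-type override can be sketched without the ANTLR runtime. In the example below everything is hypothetical: <tt>FactorySketch</tt>, <tt>PlainNode</tt>, <tt>PlusNode</tt>, and the token type values are invented for illustration, and constructor references stand in for the class-name strings that the real factory accepts.
</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class FactorySketch {
    /** Hypothetical node classes; they are not ANTLR types. */
    static class PlainNode {
        final int type;
        final String text;
        PlainNode(int type, String text) { this.type = type; this.text = text; }
    }

    static class PlusNode extends PlainNode {
        PlusNode(int type, String text) { super(type, text); }
    }

    // Assumed token type values, chosen only for this sketch.
    static final int PLUS = 4;
    static final int INT = 6;

    private final Map<Integer, BiFunction<Integer, String, PlainNode>> byTokenType = new HashMap<>();
    private BiFunction<Integer, String, PlainNode> defaultCtor = PlainNode::new;

    /** Similar in spirit to setASTNodeType: change the default node class. */
    void setDefaultNodeType(BiFunction<Integer, String, PlainNode> ctor) {
        defaultCtor = ctor;
    }

    /** Similar in spirit to setTokenTypeASTNodeType; null removes the mapping. */
    void setTokenTypeNodeType(int tokenType, BiFunction<Integer, String, PlainNode> ctor) {
        if (ctor == null) {
            byTokenType.remove(tokenType);
        } else {
            byTokenType.put(tokenType, ctor);
        }
    }

    /** All node creation goes through here, so the node class is decided in one place. */
    PlainNode create(int tokenType, String text) {
        return byTokenType.getOrDefault(tokenType, defaultCtor).apply(tokenType, text);
    }

    public static void main(String[] args) {
        FactorySketch factory = new FactorySketch();
        factory.setTokenTypeNodeType(PLUS, PlusNode::new);
        System.out.println(factory.create(PLUS, "+").getClass().getSimpleName());
        System.out.println(factory.create(INT, "3").getClass().getSimpleName());
    }
}
```

<p>
With the mapping in place, <tt>PLUS</tt> tokens yield <tt>PlusNode</tt> objects while every other token type falls back to the default class.
</p>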
  341. <p>
  342. The ASTFactory has some generically useful methods:
  343. </p>
<pre>
/** Copy a single node with same Java AST object type.
 *  Ignore the tokenType->Class mapping since you know
 *  the type of the node, t.getClass(), and doing a dup.
 *
 *  clone() is not used because we want all AST creation
 *  to go thru the factory so creation can be
 *  tracked.  Returns null if t is null.
 */
public AST dup(AST t);</pre>
  354. <pre>
  355. /** Duplicate tree including siblings
  356. * of root.
  357. */
  358. public AST dupList(AST t);</pre> <pre>/**Duplicate a tree, assuming this is a
  359. * root node of a tree--duplicate that node
  360. * and what's below; ignore siblings of root
  361. * node.
  362. */
  363. public AST dupTree(AST t);</pre> <h2><a name="Heterogeneous ASTs">Heterogeneous ASTs</a></h2>
  364. <p>
  365. Each node in an AST must encode information about the kind of node it is; for example, is it an ADD operator or a leaf node such as an INT?&nbsp; There are two ways to encode this: with a token type or with a Java (or C++ etc...) class type.&nbsp; In other words, do you have a single class type with numerous token types or no token types and numerous classes?&nbsp; For lack of better terms, I (Terence) have been calling ASTs with a single class type <em>homogeneous</em> trees and ASTs with many class types <em>heterogeneous</em> trees.
  366. </p>
  367. <p>
  368. The only reason to have a different class type for the various kinds of nodes is for the case where you want to execute a bunch of hand-coded tree walks or your nodes store radically different kinds of data.&nbsp; The example I use below demonstrates an expression tree where each node overrides <font face="Courier New">value()</font> so that <font face="Courier New">root.value()</font> is the result of evaluating the input expression. &nbsp; From the perspective of building trees and walking them with a generated tree parser, it is best to consider every node as an identical AST node.&nbsp; Hence, the schism that exists between the hetero- and homogeneous AST camps.
  369. </p>
  370. <p>
  371. ANTLR supports both kinds of tree nodes--at the same time!&nbsp; If you do nothing but turn on the &quot;<font face="Courier New">buildAST=true</font>&quot; option, you get a homogeneous tree.&nbsp; Later, if you want to use physically separate class types for some of the nodes, just specify that in the grammar that builds the tree.&nbsp; Then you can have the best of both worlds--the trees are built automatically, but you can apply different methods to and store different data in the various nodes.&nbsp; Note that the structure of the tree is unaffected; just the type of the nodes changes.
  372. </p>
  373. <p>
  374. ANTLR applies a &quot;scoping&quot; sort of algorithm for determining the class type of a particular AST node that it needs to create.&nbsp; The default type is <font face="Courier New">CommonAST</font> unless, prior to parser invocation, you override that with a call to:
  375. </p>
  376. <pre> <em>myParser</em>.setASTNodeType(&quot;<em>com.acme.MyAST</em>&quot;);</pre>
  377. <p>
  378. where you must use a fully qualified class name.
</p>
<p>
  380. In the grammar, you can override the default class type by setting the type for nodes created from a particular input token.&nbsp; Use the element option <font face="Courier New">&lt;AST=<em>typename</em>&gt;</font> in the <font face="Courier New">tokens</font> section:
  381. </p>
  382. <pre>tokens {
  383. PLUS&lt;AST=PLUSNode&gt;;
  384. ...
  385. }</pre>
  386. <p>
  387. You may further override the class type by annotating a particular token reference in your parser grammar:
  388. </p>
  389. <pre>anInt : INT&lt;AST=INTNode&gt; ;</pre>
  390. <p>
  391. This reference override is super useful for tokens such as <font face="Courier New">ID</font> that you might want converted to a <font face="Courier New">TYPENAME</font> node in one context and a <font face="Courier New">VARREF</font> in another context.
  392. </p>
  393. <p>
  394. ANTLR uses the AST factory to create all AST nodes even if it knows the specific type. &nbsp; In other words, ANTLR generates code similar to the following:
  395. </p>
  396. <pre>ANode tmp1_AST = (ANode)astFactory.create(LT(1),"ANode");
  397. </pre>
<p>
from the rule:
</p>
<pre>a : A&lt;AST=ANode&gt; ;</pre>
  400. <h3><a name="An Expression Tree Example"><font size="3">An Expression Tree Example</font></a></h3>
  401. <p>
  402. <font size="3">This example includes a parser that constructs expression ASTs, the usual lexer, and some AST node class definitions.</font>
  403. </p>
  404. <p>
  405. <font size="3">Let's start by describing the AST structure and node types. &nbsp; Expressions have plus and multiply operators and integers.&nbsp; The operators will be subtree roots (nonleaf nodes) and integers will be leaf nodes.&nbsp; For example, input 3+4*5+21 yields a tree with structure:</font>
  406. </p>
  407. <p>
  408. (&nbsp; + (&nbsp; +&nbsp; 3 (&nbsp; *&nbsp; 4&nbsp; 5 ) )&nbsp; 21 )
  409. </p>
  410. <p>
  411. or:
  412. </p>
<pre>  +
  |
  +--21
  |
  3--*
     |
     4--5</pre>
  420. <p>
All AST nodes are subclasses of <font face="Courier New">CalcAST</font>, which extends <font face="Courier New">BaseAST</font> and adds the abstract method <font face="Courier New">value()</font>. &nbsp; Method <font face="Courier New">value()</font> evaluates the tree starting at that node.&nbsp; Naturally, for integer nodes, <font face="Courier New">value()</font> will simply return the value stored within that node.&nbsp; Here is <font face="Courier New">CalcAST</font>:
  422. </p>
<pre>public abstract class CalcAST
    extends antlr.BaseAST
{
    public abstract int value();
}</pre>
  428. <p>
  429. The AST operator nodes must combine the results of computing the value of their two subtrees.&nbsp; They must perform a depth-first walk of the tree below them.&nbsp; For fun and to make the operations more obvious, the operator nodes define left() and right() instead, making them appear even more different than the normal child-sibling tree representation.&nbsp; Consequently, these expression trees can be treated as both homogeneous child-sibling trees and heterogeneous expression trees.
  430. </p>
<pre>public abstract class BinaryOperatorAST extends
    CalcAST
{
    /** Make me look like a heterogeneous tree */
    public CalcAST left() {
        return (CalcAST)getFirstChild();
    }
    public CalcAST right() {
        CalcAST t = left();
        if ( t==null ) return null;
        return (CalcAST)t.getNextSibling();
    }
}</pre>
  444. <p>
  445. The simplest node in the tree looks like:
  446. </p>
<pre>import antlr.BaseAST;
import antlr.Token;
import antlr.collections.AST;
import java.io.*;

/** A simple node to represent an INT */
public class INTNode extends CalcAST {
    int v=0;

    public INTNode(Token tok) {
        v = Integer.parseInt(tok.getText());
    }

    /** Compute value of subtree; this is
     *  heterogeneous part :)
     */
    public int value() {
        return v;
    }

    public String toString() {
        return &quot; &quot;+v;
    }

    // satisfy abstract methods from BaseAST
    public void initialize(int t, String txt) {
    }
    public void initialize(AST t) {
    }
    public void initialize(Token tok) {
    }
}</pre>
  474. <p>
  475. The operators derive from <font face="Courier New">BinaryOperatorAST</font> and define <font face="Courier New">value()</font> in terms of <font face="Courier New">left()</font> and <font face="Courier New">right()</font>.&nbsp; For example, here is <font face="Courier New">PLUSNode</font>:
  476. </p>
<pre>import antlr.BaseAST;
import antlr.Token;
import antlr.collections.AST;
import java.io.*;

/** A simple node to represent PLUS operation */
public class PLUSNode extends BinaryOperatorAST {
    public PLUSNode(Token tok) {
    }

    /** Compute value of subtree;
     *  this is heterogeneous part :)
     */
    public int value() {
        return left().value() + right().value();
    }

    public String toString() {
        return &quot; +&quot;;
    }

    // satisfy abstract methods from BaseAST
    public void initialize(int t, String txt) {
    }
    public void initialize(AST t) {
    }
    public void initialize(Token tok) {
    }
}</pre>
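<p>
The node classes above need the ANTLR runtime and a parser to build the tree. If you just want to see the heterogeneous evaluation idea in isolation, the following standalone sketch may help. All names in it (<tt>HeteroSketch</tt>, <tt>ExprNode</tt>, <tt>BinaryNode</tt>, <tt>IntLeaf</tt>, <tt>PlusOp</tt>, <tt>MultOp</tt>) are hypothetical stand-ins, and the tree is wired by hand rather than built by a parser.
</p>

```java
class HeteroSketch {
    /** Plays the role of CalcAST: a child-sibling node that can evaluate itself. */
    abstract static class ExprNode {
        ExprNode firstChild;
        ExprNode nextSibling;
        abstract int value();
    }

    /** Plays the role of BinaryOperatorAST: left/right views over child-sibling links. */
    abstract static class BinaryNode extends ExprNode {
        BinaryNode(ExprNode l, ExprNode r) {
            firstChild = l;
            l.nextSibling = r;
        }
        ExprNode left() { return firstChild; }
        ExprNode right() { return firstChild == null ? null : firstChild.nextSibling; }
    }

    static final class IntLeaf extends ExprNode {
        private final int v;
        IntLeaf(int v) { this.v = v; }
        int value() { return v; }
    }

    static final class PlusOp extends BinaryNode {
        PlusOp(ExprNode l, ExprNode r) { super(l, r); }
        int value() { return left().value() + right().value(); }
    }

    static final class MultOp extends BinaryNode {
        MultOp(ExprNode l, ExprNode r) { super(l, r); }
        int value() { return left().value() * right().value(); }
    }

    /** Hand-built tree for 3+4*5+21, i.e. ( + ( + 3 ( * 4 5 ) ) 21 ). */
    static ExprNode sample() {
        return new PlusOp(
                new PlusOp(new IntLeaf(3), new MultOp(new IntLeaf(4), new IntLeaf(5))),
                new IntLeaf(21));
    }

    public static void main(String[] args) {
        System.out.println("value is " + sample().value()); // prints value is 44
    }
}
```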
  502. <p>
The parser is pretty straightforward except that you have to add the options to tell ANTLR what node types you want to create for which token matched on the input stream. &nbsp; The <font face="Courier New">tokens</font> section lists the operators with element option AST appended to their definitions.&nbsp; This tells ANTLR to build <font face="Courier New">PLUSNode</font> objects for any <font face="Courier New">PLUS</font> tokens seen on the input stream, for example.&nbsp; For demonstration purposes, <font face="Courier New">INT</font> is not included in the <font face="Courier New">tokens</font> section--the specific token reference is suffixed with the element option to specify that nodes created from that <font face="Courier New">INT</font> should be of type <font face="Courier New">INTNode</font> (of course, the effect is the same as there is only that one reference to <font face="Courier New">INT</font>).
  504. </p>
<pre>class CalcParser extends Parser;
options {
    buildAST = true; // uses CommonAST by default
}

// define a bunch of specific AST nodes to build.
// can override at actual reference of tokens in
// grammar below.
tokens {
    PLUS&lt;AST=PLUSNode&gt;;
    STAR&lt;AST=MULTNode&gt;;
}

expr:   mexpr (PLUS^ mexpr)* SEMI!
    ;

mexpr
    :   atom (STAR^ atom)*
    ;

// Demonstrate token reference option
atom:   INT&lt;AST=INTNode&gt;
    ;</pre>
  524. <p>
  525. Invoking the parser is done as usual.&nbsp; Computing the value of the resulting AST is accomplished by simply calling method <font face="Courier New">value()</font> on the root.
  526. </p>
<pre>import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;

class Main {
    public static void main(String[] args) {
        try {
            CalcLexer lexer =
                new CalcLexer(
                    new DataInputStream(System.in)
                );
            CalcParser parser =
                new CalcParser(lexer);
            // Parse the input expression
            parser.expr();
            CalcAST t = (CalcAST)parser.getAST();
            System.out.println(t.toStringTree());
            // Compute value and return
            int r = t.value();
            System.out.println(&quot;value is &quot;+r);
        } catch(Exception e) {
            System.err.println(&quot;exception: &quot;+e);
            e.printStackTrace();
        }
    }
}</pre>
  552. <p>
  553. For completeness, here is the lexer:
  554. </p>
<pre>class CalcLexer extends Lexer;

WS  :   (' '
    |   '\t'
    |   '\n'
    |   '\r')
        { $setType(Token.SKIP); }
    ;

LPAREN: '(' ;
RPAREN: ')' ;
STAR:   '*' ;
PLUS:   '+' ;
SEMI:   ';' ;

protected
DIGIT
    :   '0'..'9' ;

INT :   (DIGIT)+ ;</pre>
<h3><a name="Describing Heterogeneous Trees With Grammars">Describing Heterogeneous Trees With Grammars</a></h3>
  571. <p>
  572. So what's the difference between this approach and default homogeneous tree construction?&nbsp; The big difference is that you need a tree grammar to describe the expression tree and compute resulting values.&nbsp; But, that's a good thing as it's &quot;executable documentation&quot; and negates the need to handcode the tree parser (the <font face="Courier New">value()</font> methods).&nbsp; If you used homogeneous trees, here is all you would need beyond the parser/lexer to evaluate the expressions:&nbsp; [<em>This code comes from the <font face="Courier New">examples/java/calc</font> directory</em>.]
  573. </p>
<pre>class CalcTreeWalker extends TreeParser;

expr returns [float r]
{
    float a,b;
    r=0;
}
    :   #(PLUS a=expr b=expr)   {r = a+b;}
    |   #(STAR a=expr b=expr)   {r = a*b;}
    |   i:INT
        {r = (float)
         Integer.parseInt(i.getText());}
    ;</pre>
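<p>
For comparison, here is roughly what evaluating a homogeneous tree by hand looks like in plain Java: one node class, with a switch on the token type doing the work that the three alternatives of <tt>expr</tt> do in the tree grammar. This is a hand-written sketch, not the code ANTLR generates; <tt>WalkerSketch</tt>, its nested <tt>Node</tt> class, and the value chosen for <tt>INT</tt> are assumptions made for the example.
</p>

```java
class WalkerSketch {
    // PLUS and STAR follow the sample token types interface; INT = 6 is assumed.
    static final int PLUS = 4;
    static final int STAR = 5;
    static final int INT = 6;

    /** Hypothetical homogeneous node: one class, distinguished only by token type. */
    static final class Node {
        final int type;
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(int type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node tail = null;
            for (Node k : kids) {
                if (tail == null) firstChild = k; else tail.nextSibling = k;
                tail = k;
            }
        }
    }

    /** Mirrors the three alternatives of rule expr in the tree grammar. */
    static float expr(Node t) {
        switch (t.type) {
            case PLUS:
                return expr(t.firstChild) + expr(t.firstChild.nextSibling);
            case STAR:
                return expr(t.firstChild) * expr(t.firstChild.nextSibling);
            case INT:
                return (float) Integer.parseInt(t.text);
            default:
                throw new IllegalArgumentException("unexpected token type " + t.type);
        }
    }

    /** Hand-built tree for 3+4*5+21, i.e. ( + ( + 3 ( * 4 5 ) ) 21 ). */
    static Node sample() {
        Node product = new Node(STAR, "*", new Node(INT, "4"), new Node(INT, "5"));
        Node inner = new Node(PLUS, "+", new Node(INT, "3"), product);
        return new Node(PLUS, "+", inner, new Node(INT, "21"));
    }

    public static void main(String[] args) {
        System.out.println("value is " + expr(sample()));
    }
}
```

<p>
The tree grammar version is shorter and documents the tree structure at the same time, which is the argument made above for preferring it over handcoded walks.
</p>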
  586. <p>
  587. Because Terence wants you to use tree grammars even when constructing heterogeneous ASTs (to avoid handcoding methods that implement a depth-first-search), implement the following methods in your various heterogeneous AST node class definitions:
  588. </p>
  589. <pre> /** Get the token text for this node */
  590. public String getText();
  591. /** Get the token type for this node */
  592. public int getType();</pre>
  593. <p>
  594. That is how you can use heterogeneous trees with a tree grammar.&nbsp; Note that your token types must match the <font face="Courier New">PLUS</font> and <font face="Courier New">STAR</font> token types imported from your parser.&nbsp; I.e., make sure <font face="Courier New">PLUSNode.getType()</font> returns <font face="Courier New">CalcParserTokenTypes.PLUS</font>. &nbsp; The token types are generated by ANTLR in interface files that look like:
  595. </p>
  596. <pre>public interface CalcParserTokenTypes {
  597. ...
  598. int PLUS = 4;
  599. int STAR = 5;
  600. ...
  601. }</pre> <h2><a name="AST Serialization">AST (XML) Serialization</a></h2>
  602. <p>
<font size="2">[Oliver Zeigermann <a href="mailto:olli@zeigermann.de">olli@zeigermann.de</a> provided the initial implementation of this serialization.&nbsp; His <a href="http://www.zeigermann.de/xtal.html">XTAL</a> XML translation code is worth checking out, particularly for reading XML-serialized ASTs back in.]</font>
  604. </p>
  605. <p>
For a variety of reasons, you may want to store an AST or pass it to another program or computer.&nbsp; With the Java code generator, class <font face="Courier New">antlr.BaseAST</font> is <font face="Courier New">Serializable</font>, which means you can write ASTs to disk using standard Java serialization.&nbsp; You can also write the ASTs out in XML form using the following methods from <font face="Courier New">BaseAST</font>:
  607. <ul>
  608. <li>
  609. <font face="Courier New">public void xmlSerialize(Writer out)</font>
  610. </li>
  611. <li>
  612. <font face="Courier New">public void xmlSerializeNode(Writer out)</font>
  613. </li>
  614. <li>
  615. <font face="Courier New">public void xmlSerializeRootOpen(Writer out)</font>
  616. </li>
  617. <li>
  618. <font face="Courier New">public void xmlSerializeRootClose(Writer out)</font>
  619. </li>
  620. </ul>
  621. <p>
  622. All methods throw <font face="Courier New">IOException</font>.
  623. </p>
  624. <p>
  625. You can override <font face="Courier New">xmlSerializeNode</font> and so on to change the way nodes are written out.&nbsp; By default the serialization uses the class type name as the tag name and has attributes <font face="Courier New">text</font> and <font face="Courier New">type</font> to store the text and token type of the node.
  626. </p>
  627. <p>
  628. Running the simple heterogeneous tree example, examples/java/heteroAST, produces:
  629. </p>
  630. <pre> ( + ( + 3 ( * 4 5 ) ) 21 )
  631. &lt;PLUS&gt;&lt;PLUS&gt;&lt;int&gt;3&lt;/int&gt;&lt;MULT&gt;
  632. &lt;int&gt;4&lt;/int&gt;&lt;int&gt;5&lt;/int&gt;
  633. &lt;/MULT&gt;&lt;/PLUS&gt;&lt;int&gt;21&lt;/int&gt;&lt;/PLUS&gt;
  634. value is 44</pre>
  635. <p>
  636. The LISP-form of the tree shows the structure and contents.&nbsp; The various heterogeneous nodes override the open and close tags and change the way leaf nodes are serialized to use <font face="Courier New">&lt;int&gt;<em>value</em>&lt;/int&gt;</font> instead of tag attributes of a single node.
  637. </p>
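<p>
To illustrate how overriding the tag and leaf methods changes the output, here is a small self-contained sketch.&nbsp; The classes are hypothetical stand-ins that only borrow the four method names listed earlier; they are not the code from examples/java/heteroAST, and the default attribute form is simplified.
</p>

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class HeteroXmlDemo {
    /** Hypothetical child-sibling node; only the method names mirror BaseAST. */
    static class Node {
        final String tag;   // element name used when writing this node
        final String text;  // token text
        Node firstChild;
        Node nextSibling;

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        /** Set the children in order and return this node, for compact construction. */
        Node add(Node... kids) {
            Node last = null;
            for (Node k : kids) {
                if (last == null) firstChild = k; else last.nextSibling = k;
                last = k;
            }
            return this;
        }

        /** Serialize this node and all of its right siblings. */
        void xmlSerialize(Writer out) throws IOException {
            for (Node n = this; n != null; n = n.nextSibling) {
                if (n.firstChild == null) {
                    n.xmlSerializeNode(out);          // leaf
                } else {
                    n.xmlSerializeRootOpen(out);      // interior node: open tag,
                    n.firstChild.xmlSerialize(out);   // children,
                    n.xmlSerializeRootClose(out);     // close tag
                }
            }
        }

        /** Simplified default: the text is kept in an attribute. */
        void xmlSerializeNode(Writer out) throws IOException {
            out.write("<" + tag + " text=\"" + text + "\"/>");
        }

        void xmlSerializeRootOpen(Writer out) throws IOException {
            out.write("<" + tag + ">");
        }

        void xmlSerializeRootClose(Writer out) throws IOException {
            out.write("</" + tag + ">");
        }
    }

    /** Leaf that overrides the default to write the value as element content. */
    static class IntNode extends Node {
        IntNode(String text) {
            super("int", text);
        }

        @Override
        void xmlSerializeNode(Writer out) throws IOException {
            out.write("<int>" + text + "</int>");
        }
    }

    static String toXml(Node root) throws IOException {
        StringWriter w = new StringWriter();
        root.xmlSerialize(w);
        return w.toString();
    }

    public static void main(String[] args) throws IOException {
        // Build ( + ( + 3 ( * 4 5 ) ) 21 ) by hand.
        Node mult = new Node("MULT", "*").add(new IntNode("4"), new IntNode("5"));
        Node inner = new Node("PLUS", "+").add(new IntNode("3"), mult);
        Node root = new Node("PLUS", "+").add(inner, new IntNode("21"));
        System.out.println(toXml(root));
    }
}
```

<p>
For the tree built in <tt>main</tt>, the sketch writes the same element structure as the XML shown above, on a single line.
</p>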
  638. <p>
  639. Here is the code that generates the XML:
  640. </p>
  641. <pre>Writer w = new OutputStreamWriter(System.out);
  642. t.xmlSerialize(w);
  643. w.write(&quot;\n&quot;);
  644. w.flush();</pre> <h2><a name="_bb12">AST enumerations</a></h2>
  645. <p>
  646. The AST <tt>findAll</tt> and <tt>findAllPartial</tt> methods return enumerations of tree nodes that you can walk.&nbsp; Interface
  647. </p>
  648. <pre>antlr.collections.ASTEnumeration</pre>
  649. <p>
  650. and
  651. </p>
  652. <pre>class antlr.collections.impl.ASTEnumerator</pre>
  653. <p>
  654. implement this functionality.&nbsp; Here is an example:
  655. </p>
  656. <pre>// Print out all instances of
  657. // <em>a-subtree-of-interest
  658. // </em>found within tree 't'.
  659. ASTEnumeration nodes;
  660. nodes = t.findAll(<em>a-subtree-of-interest</em>);
  661. while ( nodes.hasMoreNodes() ) {
  662. System.out.println(
  663. nodes.nextNode().toStringList()
  664. );
  665. }</pre> <h2><a name="_bb13"></a><a name="A few examples">A few examples</a></h2> <pre><tt>
  666. sum :term ( PLUS^ term)*
  667. ;</tt> </pre>
  668. <p>
  669. The &quot;<tt>^</tt>&quot; suffix on the <tt>PLUS</tt> tells ANTLR to create an additional node and place it as the root of whatever subtree has been constructed up until that point for rule <tt>sum</tt>. The subtrees returned by the <tt>term</tt> references are collected as children of the addition nodes.&nbsp; If the subrule is not matched, the associated nodes would not be added to the tree. The rule returns either the tree matched for the first <tt>term</tt> reference or a <tt>PLUS</tt>-rooted tree.
  670. </p>
  671. <p>
  672. The grammar annotations should be viewed as operators, not static specifications. In the above example, each iteration of the (...)* will create a new PLUS root, with the previous tree on the left, and the tree from the new <tt>term</tt> on the right, thus preserving the usual left associativity of &quot;+&quot;.
  673. </p>
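<p>
The operator view can be mimicked in plain Java.&nbsp; The sketch below is not generated parser code; it uses a hypothetical <tt>Node</tt> class and assumes the terms have already been parsed into simple leaves, and it only shows how each loop iteration wraps the tree built so far in a new <tt>+</tt> root.
</p>

```java
class SumTreeDemo {
    /** Hypothetical binary-operator node, used only for this illustration. */
    static final class Node {
        final String text;
        final Node left;
        final Node right;

        Node(String text, Node left, Node right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }

        Node(String text) {
            this(text, null, null);
        }

        /** LISP-style form, e.g. ( + a b ). */
        String toStringTree() {
            if (left == null) {
                return text;
            }
            return "( " + text + " " + left.toStringTree() + " " + right.toStringTree() + " )";
        }
    }

    /** Mimic sum : term ( PLUS^ term )* for one or more already-parsed terms. */
    static Node sum(String... terms) {
        Node tree = new Node(terms[0]);               // the first term
        for (int i = 1; i < terms.length; i++) {      // one pass per ( PLUS^ term )
            // New root; previous tree on the left, new term on the right.
            tree = new Node("+", tree, new Node(terms[i]));
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(sum("a").toStringTree());
        System.out.println(sum("a", "b", "c").toStringTree());
    }
}
```

<p>
With a single term the result is just that term's tree; with three terms the result is <tt>( + ( + a b ) c )</tt>, the left-associative shape described above.
</p>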
  674. <p>
  675. The following rule turns off default tree construction (note the <tt>!</tt> after the rule name) and builds its tree in an action instead.
  676. </p>
  677. <pre><tt>decl!:
  678. modifiers type ID SEMI
  679. { #decl = #([DECL], ID, ([TYPE] type),
  680. ([MOD] modifiers) ); }
  681. ;</tt></pre>
  682. <p>
  683. In this example, a declaration is matched. The resulting AST has an &quot;imaginary&quot; <tt>DECL</tt> node at the root, with three children. The first child is the <tt>ID</tt> of the declaration. The second child is a subtree with an imaginary <tt>TYPE</tt> node at the root and the AST from the <tt>type</tt> rule as its child. The third child is a subtree with an imaginary <tt>MOD</tt> at the root and the results of the <tt>modifiers</tt> rule as its child.
  684. </p>
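<p>
The shape of the resulting tree can be reproduced by hand in a child-sibling representation.&nbsp; This is a sketch only: <tt>Node</tt> and <tt>make</tt> are hypothetical helpers standing in for the tree constructor in the action, and each argument is assumed to be a single node rather than a list.
</p>

```java
class DeclTreeDemo {
    /** Hypothetical child-sibling node, used only for this illustration. */
    static final class Node {
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(String text) {
            this.text = text;
        }

        /** LISP-style form of this node and its children (siblings are ignored). */
        String toStringTree() {
            if (firstChild == null) {
                return text;
            }
            StringBuilder sb = new StringBuilder("( ").append(text);
            for (Node c = firstChild; c != null; c = c.nextSibling) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(" )").toString();
        }
    }

    /** Rough analogue of a tree constructor: link the children, in order, under root. */
    static Node make(Node root, Node... children) {
        Node last = null;
        for (Node c : children) {
            if (last == null) root.firstChild = c; else last.nextSibling = c;
            last = c;
        }
        return root;
    }

    /** Build the DECL tree from the results of the three matched elements. */
    static Node decl(Node modifiers, Node type, Node id) {
        return make(new Node("DECL"),                    // imaginary root
                    id,                                  // first child
                    make(new Node("TYPE"), type),        // second child
                    make(new Node("MOD"), modifiers));   // third child
    }

    public static void main(String[] args) {
        // As if the input were: public int x;
        Node tree = decl(new Node("public"), new Node("int"), new Node("x"));
        System.out.println(tree.toStringTree());
    }
}
```

<p>
For the input assumed in <tt>main</tt>, this prints <tt>( DECL x ( TYPE int ) ( MOD public ) )</tt>.
</p>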
  685. <h2><a name="_bb14"></a><a name="Labeled subrules">Labeled subrules</a></h2>
  686. <p>
  687. [<big><i>THIS WILL NOT BE IMPLEMENTED AS LABELED SUBRULES...We'll do something else eventually.</i></big>]
  688. </p>
  689. <p>
  690. In ANTLR 2.00, each rule has exactly one tree associated with it. Subrules simply add elements to the tree for the enclosing rule, which is normally what you want. For example, expression trees are easily built via:
  691. </p>
  692. <pre><tt>
  693. expr: ID ( PLUS^ ID )*
  694. ;
  695. </tt> </pre>
  696. <p>
  697. However, many times you want the elements of a subrule to produce a tree that is independent of the rule's tree. Recall that exponents must be computed before coefficients are multiplied in for exponent terms. The following grammar matches the correct syntax.
  698. </p>
  699. <pre><tt>
  700. // match exponent terms such as &quot;3*x^4&quot;
  701. eterm
  702. : expr MULT ID EXPONENT expr
  703. ;
  704. </tt> </pre>
  705. <p>
  706. However, to produce the correct AST, you would normally split the <tt>ID EXPONENT expr</tt> portion into another rule like this:
  707. </p>
  708. <pre><tt>
  709. eterm:
  710. expr MULT^ exp
  711. ;
  712. exp:
  713. ID EXPONENT^ expr
  714. ;
  715. </tt> </pre>
  716. <p>
  717. In this manner, each operator would be the root of the appropriate subrule. For input <tt>3*x^4</tt>, the tree would look like:
  718. </p>
  719. <pre><tt>
  720. #(MULT 3 #(EXPONENT ID 4))
  721. </tt> </pre>
  722. <p>
  723. However, if you attempted to keep this grammar in the same rule:
  724. </p>
  725. <pre><tt>
  726. eterm
  727. : expr MULT^ (ID EXPONENT^ expr)
  728. ;
  729. </tt> </pre>
  730. <p>
  731. both &quot;<tt>^</tt>&quot; root operators would modify the same tree, yielding
  732. </p>
  733. <pre><tt>
  734. #(EXPONENT #(MULT 3 ID) 4)
  735. </tt> </pre>
  736. <p>
  737. This tree has the operators as roots, but they are associated with the wrong operands.
  738. </p>
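<p>
The difference between one shared tree and a separate tree per rule can be simulated with a small model.&nbsp; The sketch below assumes a hypothetical <tt>RuleTree</tt> that tracks the current root and the last child added, loosely following the behavior of the &quot;<tt>^</tt>&quot; operator described in this section; it is not ANTLR's generated code.
</p>

```java
class RootOperatorDemo {
    /** Hypothetical child-sibling node, used only for this illustration. */
    static final class Node {
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(String text) {
            this.text = text;
        }

        String toStringTree() {
            if (firstChild == null) {
                return text;
            }
            StringBuilder sb = new StringBuilder("#(").append(text);
            for (Node c = firstChild; c != null; c = c.nextSibling) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    /** Hypothetical per-rule state: the tree built so far for one rule. */
    static final class RuleTree {
        Node root;
        Node lastChild;

        /** Unannotated element: first element becomes the root, later ones go rightmost. */
        void addChild(Node n) {
            if (root == null) {
                root = n;
            } else if (lastChild == null) {
                root.firstChild = n;
            } else {
                lastChild.nextSibling = n;
            }
            lastChild = n;
        }

        /** Element annotated with ^: everything built so far hangs under the new root. */
        void makeRoot(Node n) {
            n.firstChild = root;
            lastChild = root;
            while (lastChild != null && lastChild.nextSibling != null) {
                lastChild = lastChild.nextSibling;
            }
            root = n;
        }
    }

    /** eterm : expr MULT^ (ID EXPONENT^ expr) with a single shared tree. */
    static String singleRule() {
        RuleTree eterm = new RuleTree();
        eterm.addChild(new Node("3"));
        eterm.makeRoot(new Node("MULT"));
        eterm.addChild(new Node("ID"));
        eterm.makeRoot(new Node("EXPONENT"));
        eterm.addChild(new Node("4"));
        return eterm.root.toStringTree();
    }

    /** The same input with ID EXPONENT^ expr split into its own rule (own tree). */
    static String twoRules() {
        RuleTree exp = new RuleTree();
        exp.addChild(new Node("ID"));
        exp.makeRoot(new Node("EXPONENT"));
        exp.addChild(new Node("4"));

        RuleTree eterm = new RuleTree();
        eterm.addChild(new Node("3"));
        eterm.makeRoot(new Node("MULT"));
        eterm.addChild(exp.root);
        return eterm.root.toStringTree();
    }

    public static void main(String[] args) {
        System.out.println(singleRule());
        System.out.println(twoRules());
    }
}
```

<p>
Under these assumptions, <tt>singleRule()</tt> produces <tt>#(EXPONENT #(MULT 3 ID) 4)</tt> and <tt>twoRules()</tt> produces <tt>#(MULT 3 #(EXPONENT ID 4))</tt>, matching the two trees discussed above.
</p>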
  739. <p>
  740. Using a labeled subrule allows the original rule to generate the correct tree.
  741. </p>
  742. <pre><tt>
  743. eterm
  744. : expr MULT^ e:(ID EXPONENT^ expr)
  745. ;
  746. </tt> </pre>
  747. <p>
  748. In this case, for the same input <tt>3*x^4</tt>, the labeled subrule would build up its own subtree and make it the operand of the <tt>MULT</tt> tree of the <tt>eterm</tt> rule. The presence of the label alters the AST code generation for the elements within the subrule, making it operate more like a normal rule. Annotations of &quot;<tt>^</tt>&quot; make the node created for that token reference the root of the tree for the <tt>e</tt> subrule.
  749. </p>
  750. <p>
  751. Labeled subrules have a result AST that can be accessed just like the result AST for a rule. For example, we could rewrite the above decl example using labeled subrules (note the use of <tt>!</tt> at the start of the subrules to suppress automatic construction for the subrule):
  752. </p>
  753. <pre><tt>
  754. decl!:
  755. m:(! modifiers { #m = #([MOD] modifiers); } )
  756. t:(! type { #t = #([TYPE] type); } )
  757. ID
  758. SEMI
  759. { #decl = #( [DECL] ID t m ); }
  760. ;
  761. </tt> </pre>
  762. <p>
  763. What about subrules that are closure loops? The same rules apply to a closure subrule--there is a single tree for that loop that is built up according to the AST operators annotating the elements of that loop. For example, consider the following rule.
  764. </p>
  765. <pre><tt>
  766. term: T^ i:(OP^ expr)+
  767. ;
  768. </tt> </pre>
  769. <p>
  770. For input <tt>T OP A OP B OP C</tt>, the following tree structure would be created:
  771. </p>
  772. <pre><tt>
  773. #(T #(OP #(OP #(OP A) B) C) )
  774. </tt> </pre>
  775. <p>
  776. which can be drawn graphically as
  777. </p>
  778. <pre><tt>
  779. T
  780. |
  781. OP
  782. |
  783. OP--C
  784. |
  785. OP--B
  786. |
  787. A
  788. </tt> </pre>
  789. <p>
  790. The first important thing to note is that each iteration of the loop in the subrule operates on the same tree. The resulting tree, after all iterations of the loop, is associated with the subrule label. The result tree for the above labeled subrule is:
  791. </p>
  792. <pre><tt>
  793. #(OP #(OP #(OP A) B) C)
  794. </tt> </pre>
  795. <p>
  796. The second thing to note is that, because <tt>T</tt> is matched first and there is a root operator after it in the rule, <tt>T</tt> would be at the bottom of the tree if it were not for the label on the subrule.
  797. </p>
  798. <p>
  799. Loops will generally be used to build up lists of subtrees. For example, if you want a list of polynomial assignments to produce a sibling list of <tt>ASSIGN</tt> subtrees, then you would normally split the following rule into two rules.
  800. </p>
  801. <pre><tt>
  802. interp
  803. : ( ID ASSIGN poly &quot;;&quot; )+
  804. ;
  805. </tt> </pre>
  806. <p>
  807. Normally, the following would be required:
  808. </p>
  809. <pre><tt>
  810. interp
  811. : ( assign )+
  812. ;
  813. assign
  814. : ID ASSIGN^ poly &quot;;&quot;!
  815. ;
  816. </tt> </pre>
  817. <p>
  818. Labeling a subrule allows you to write the above example more easily as:
  819. </p>
  820. <pre><tt>
  821. interp
  822. : ( r:(ID ASSIGN^ poly &quot;;&quot;) )+
  823. ;
  824. </tt> </pre>
  825. <p>
  826. Each recognition of a subrule results in a tree, and if the subrule is nested in a loop, all of the trees are returned as a list (i.e., the roots of the subtrees are siblings). If the labeled subrule is suffixed with a &quot;<tt>!</tt>&quot;, then the tree(s) created by the subrule are not linked into the tree for the enclosing rule or subrule.
  827. </p>
  828. <p>
  829. Labeled subrules within labeled subrules result in trees that are linked into the surrounding subrule's tree. For example, the following rule results in a tree of the form <tt>X #( A #(B C) D) Y</tt>.
  830. </p>
  831. <pre><tt>
  832. a : X r:( A^ s:(B^ C) D) Y
  833. ;
  834. </tt> </pre>
  835. <p>
  836. Labeled subrules within nonlabeled subrules result in trees that are linked into the surrounding rule's tree. For example, the following rule results in a tree of the form <tt>#(A X #(B C) D Y)</tt>.
  837. </p>
  838. <pre><tt>
  839. a : X ( A^ s:(B^ C) D) Y
  840. ;</tt> </pre> <h2><a name="_bb15"></a><a name="Reference nodes">Reference nodes</a></h2>
  841. <p>
  842. <b>Not implemented.</b> A reference node would do nothing but refer to another node in the tree, which would be useful for embedding the same tree in multiple lists.
  843. </p>
  844. <h2><a name="_bb16"></a><a name="Required AST functionality and form">Required AST functionality and form</a></h2>
  845. <p>
  846. The data structure representing your trees can have any form or type name as long as it implements the <tt>AST</tt> interface:
  847. </p>
  848. <pre><tt>package antlr.collections;
  849. /** Minimal AST node interface used by ANTLR
  850. * AST generation and tree-walker.
  851. */
  852. public interface AST {
  853. /** Get the token type for this node */
  854. public int getType();
  855. /** Set the token type for this node */
  856. public void setType(int ttype);
  857. /** Get the token text for this node */
  858. public String getText();
  859. /** Set the token text for this node */
  860. public void setText(String text);
  861. /** Get the first child of this node;
  862. * null if no children */
  863. public AST getFirstChild();
  864. /** Set the first child of a node */
  865. public void setFirstChild(AST c);
  866. /** Get the next sibling in line after this
  867. * one
  868. */
  869. public AST getNextSibling();
  870. /** Set the next sibling after this one */
  871. public void setNextSibling(AST n);
  872. /** Add a (rightmost) child to this node */
  873. public void addChild(AST node);</tt></pre> <pre> /** Are two nodes exactly equal? */
  874. public boolean equals(AST t);</pre> <pre> /** Are two lists of nodes/subtrees exactly
  875. * equal in structure and content? */
  876. public boolean equalsList(AST t);</pre> <pre> /** Are two lists of nodes/subtrees
  877. * partially equal? In other words, 'this'
  878. * can be bigger than 't'
  879. */
  880. public boolean equalsListPartial(AST t);</pre> <pre> /** Are two nodes/subtrees exactly equal? */
  881. public boolean equalsTree(AST t);</pre> <pre> /** Are two nodes/subtrees exactly partially
  882. * equal? In other words, 'this' can be
  883. * bigger than 't'.
  884. */
  885. public boolean equalsTreePartial(AST t);</pre> <pre> /** Return an enumeration of all exact tree
  886. * matches for tree within 'this'.
  887. */
  888. public ASTEnumeration findAll(AST tree);</pre> <pre> /** Return an enumeration of all partial
  889. * tree matches for tree within 'this'.
  890. */
  891. public ASTEnumeration findAllPartial(
  892. AST subtree);</pre> <pre> /** Init a node with token type and text */
  893. public void initialize(int t, String txt);</pre> <pre> /** Init a node using content from 't' */
  894. public void initialize(AST t);</pre> <pre> /** Init a node using content from 't' */
  895. public void initialize(Token t);</pre> <pre> /** Convert node to printable form */
  896. public String toString();</pre> <pre> /** Treat 'this' as list (i.e.,
  897. * consider 'this'
  898. * siblings) and convert to printable
  899. * form
  900. */
  901. public String toStringList();</pre> <pre> /** Treat 'this' as tree root
  902. * (i.e., don't consider
  903. * 'this' siblings) and convert
  904. * to printable form */
  905. public String toStringTree();<tt>
  906. }</tt></pre>
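<p>
The difference between the exact and partial comparisons in the interface above can be sketched as follows.&nbsp; This is one plausible reading of the comments, written against a hypothetical <tt>Node</tt> class rather than ANTLR's own implementation; it assumes that two nodes are equal when their token type and text match, and the <tt>INT</tt> token type is made up.
</p>

```java
class PartialMatchDemo {
    static final int PLUS = 4;   // as in the token types shown earlier
    static final int STAR = 5;   // as in the token types shown earlier
    static final int INT = 6;    // assumed value, for illustration only

    /** Hypothetical child-sibling node, used only for this illustration. */
    static final class Node {
        final int type;
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(int type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node last = null;
            for (Node k : kids) {
                if (last == null) firstChild = k; else last.nextSibling = k;
                last = k;
            }
        }

        /** Node-level equality: same token type and same text (an assumption). */
        boolean equalsNode(Node t) {
            return t != null && type == t.type && text.equals(t.text);
        }

        /** Exact match: same root and identical child lists; siblings of 'this' are ignored. */
        boolean equalsTree(Node t) {
            return equalsNode(t) && listEquals(firstChild, t.firstChild, false);
        }

        /** Partial match: 'this' may have extra trailing children that 't' lacks. */
        boolean equalsTreePartial(Node t) {
            return t == null || (equalsNode(t) && listEquals(firstChild, t.firstChild, true));
        }

        private static boolean listEquals(Node a, Node b, boolean partial) {
            while (a != null && b != null) {
                boolean same = partial ? a.equalsTreePartial(b) : a.equalsTree(b);
                if (!same) {
                    return false;
                }
                a = a.nextSibling;
                b = b.nextSibling;
            }
            // 'b' must be used up; an exact match also requires 'a' to be used up.
            return b == null && (partial || a == null);
        }
    }

    public static void main(String[] args) {
        Node full = new Node(PLUS, "+",
                new Node(INT, "3"),
                new Node(STAR, "*", new Node(INT, "4"), new Node(INT, "5")));
        Node pattern = new Node(PLUS, "+", new Node(INT, "3"));

        System.out.println("exact:   " + full.equalsTree(pattern));
        System.out.println("partial: " + full.equalsTreePartial(pattern));
    }
}
```

<p>
Here the exact comparison fails because <tt>full</tt> has a second child, while the partial comparison succeeds because <tt>full</tt> is merely bigger than <tt>pattern</tt>.
</p>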
  907. <p>
  908. This scheme does not preclude the use of heterogeneous trees instead of homogeneous trees. However, heterogeneous trees require extra work: you must either write a subclass of <tt>ASTFactory</tt> or specify the node types at the token reference sites or in the <font face="Courier New">tokens</font> section, whereas homogeneous trees come for free.
  909. </p>
  910. <pre><font face="Arial" size="2">Version: $Id: //depot/code/org.antlr/release/antlr-2.7.5/doc/trees.html#1 $</font></pre>
  911. </body>
  912. </html>