/lib/antlr-2.7.5/doc/glossary.html

https://github.com/boo/boo-lang
Possible License(s): GPL-2.0
<html>
<head>
<title>ANTLR-centric Language Glossary</title>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font face=Arial>
<h1><a id="ANTLR-centric_Language_Glossary" name="ANTLR-centric_Language_Glossary">ANTLR-centric Language Glossary</a></h1>
<i>Terence Parr</i>
<br><br>
This glossary defines some of the more important terms used in the
ANTLR documentation. I have tried to be very informal and provide
references to other pages that are useful. For another great source
of information about formal computer languages, see <a
href="http://www.wikipedia.org/wiki/Context-free_grammar"><b>Wikipedia</b></a>.
<font size=2>
<!--index-->
<ul>
<li><a href="#Ambiguous">Ambiguous</a></li>
<li><a href="#ANTLR">ANTLR</a></li>
<li><a href="#AST">AST</a></li>
<li><a href="#Bit_set">Bit set</a></li>
<li><a href="#Child-sibling_Tree">Child-sibling Tree</a></li>
<li><a href="#Context-free_grammar">Context-free grammar</a></li>
<li><a href="#Context-sensitive">Context-sensitive</a></li>
<li><a href="#DFA">DFA</a></li>
<li><a href="#FIRST">FIRST</a></li>
<li><a href="#FOLLOW">FOLLOW</a></li>
<li><a href="#Grammar">Grammar</a></li>
<li><a href="#Hoisting">Hoisting</a></li>
<li><a href="#Inheritance,_grammar">Inheritance, grammar</a></li>
<li><a href="#LA(n)">LA(n)</a></li>
<li><a href="#Left-prefix,_left_factor">Left-prefix, left factor</a></li>
<li><a href="#Literal">Literal</a></li>
<li><a href="#Linear_approximate_lookahead">Linear approximate lookahead</a></li>
<li><a href="#LL(k)">LL(k)</a></li>
<li><a href="#LT(n)">LT(n)</a></li>
<li><a href="#Language">Language</a></li>
<li><a href="#Lexer">Lexer</a></li>
<li><a href="#Lookahead">Lookahead</a></li>
<li><a href="#nextToken">nextToken</a></li>
<li><a href="#NFA">NFA</a></li>
<li><a href="#Nondeterministic">Nondeterministic</a></li>
<li><a href="#Parser">Parser</a></li>
<li><a href="#Predicate,_semantic">Predicate, semantic</a></li>
<li><a href="#Predicate,_syntactic">Predicate, syntactic</a></li>
<li><a href="#Production">Production</a></li>
<li><a href="#Protected">Protected</a></li>
<li><a href="#Recursive-descent">Recursive-descent</a></li>
<li><a href="#Regular">Regular</a></li>
<li><a href="#Rule">Rule</a></li>
<li><a href="#Scanner">Scanner</a></li>
<li><a href="#Semantics">Semantics</a></li>
<li><a href="#Subrule">Subrule</a></li>
<li><a href="#Syntax">Syntax</a></li>
<li><a href="#Token">Token</a></li>
<li><a href="#Token_stream">Token stream</a></li>
<li><a href="#Tree">Tree</a></li>
<li><a href="#Tree_parser">Tree parser</a></li>
<li><a href="#Vocabulary">Vocabulary</a></li>
<li><a href="#Wow">Wow</a></li>
</ul>
<!--/index-->
</font>
<h3><a id="Ambiguous" name="Ambiguous">Ambiguous</a></h3>
A language is ambiguous if the same sentence or phrase can be
interpreted in more than a single way. For example, the following
sentence by Groucho Marx is easily interpreted in two ways: "I once
shot an elephant in my pajamas. How he got in my pajamas I'll never
know!" In the computer world, a typical language ambiguity is the
if-then-else ambiguity, where the else-clause may be attached to either
the most recent if-then or an older one. Reference manuals for
computer languages resolve this ambiguity by stating that else-clauses
always match up with the most recent if-then.
<p>
A grammar is ambiguous if the same input sequence can be derived in
multiple ways. Ambiguous languages always yield ambiguous grammars
unless you can find a way to encode semantics (actions or predicates
etc...) that resolve the ambiguity. Most language tools like ANTLR
resolve the if-then-else ambiguity by simply choosing to match
greedily (i.e., as soon as possible). This matches the else with the
most recent if-then. See nondeterministic.
<h3><a id="ANTLR" name="ANTLR">ANTLR</a></h3>
<b>AN</b>other <b>T</b>ool for <b>L</b>anguage <b>R</b>ecognition, a
predicated-LL(k) parser generator that handles lexers, parsers, and
tree parsers. ANTLR has been available since 1990 and led to a
resurgence of recursive-descent parsing after decades dominated by LR
and other DFA-based strategies.
<h3><a id="AST" name="AST">AST</a></h3>
<b>A</b>bstract <b>S</b>yntax <b>T</b>ree. ASTs are used as internal
representations of an input stream, normally constructed during a
parsing phase. Because ASTs are two-dimensional trees, they
can encode the structure (as determined by the parser) of the input as
well as the input symbols.
<p>
A homogeneous AST is one in which the physical objects are all of
the same type; e.g., <tt>CommonAST</tt> in ANTLR. A heterogeneous
tree may have multiple types such as <tt>PlusNode</tt>,
<tt>MultNode</tt> etc...
<p> An AST is not a parse tree, which encodes the sequence of rules
used to match input symbols. See <a
href="http://www.jguru.com/faq/view.jsp?EID=814505"><b>What's the
difference between a parse tree and an abstract syntax tree (AST)? Why
doesn't ANTLR generate trees with nodes for grammar rules like JJTree
does?</b></a>.
<p>An AST for input <tt>3+4</tt> might be represented as
<pre>
  +
 / \
3   4
</pre>
or more typically (a la ANTLR) in child-sibling form:
<pre>
+
|
3--4
</pre>
Operators are usually subtree roots and operands are usually leaves.
<h3><a id="Bit_set" name="Bit_set">Bit set</a></h3>
Bit sets are an extremely efficient representation for dense integer
sets. You can easily encode sets of strings also by mapping unique
strings to unique integers. ANTLR uses bitsets for lookahead
prediction in parsers and lexers. Simple bit set implementations do
not work so well for sparse sets, particularly when the maximum
integer stored in the set is large.
<p> ANTLR's bit set represents membership with a bit for each possible
integer value. For a maximum value of <i>n</i>, a bit set needs
<i>n/64</i> long words or <i>n/8</i> bytes. For ASCII bit sets with a
maximum value of 127, you only need 16 bytes or 2 long words. UNICODE
has a max value of \uFFFF or 65535, requiring 8k bytes, and these sets
are typically sparse. Fortunately most lexers only need a few of
these space inefficient (but speedy) bitsets and so it's not really a
problem.
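<p>As a rough sketch of the idea (a Python stand-in, not ANTLR's actual <tt>BitSet</tt> class), membership is one bit per possible integer value, so testing is just a shift and a mask into an array of 64-bit words:

```python
# Minimal bit-set sketch: one bit per possible integer value, packed
# into 64-bit words; an ASCII set (max value 127) needs only 2 words.
class BitSet:
    def __init__(self, max_value):
        # max_value/64 words, rounded up
        self.words = [0] * ((max_value + 63) // 64)

    def add(self, v):
        self.words[v // 64] |= 1 << (v % 64)

    def contains(self, v):
        return (self.words[v // 64] >> (v % 64)) & 1 == 1

ascii_set = BitSet(128)              # 2 long words of payload
for ch in "abc":
    ascii_set.add(ord(ch))
print(ascii_set.contains(ord("b")))  # True
print(ascii_set.contains(ord("z")))  # False
print(len(ascii_set.words))          # 2
```

Adding and testing are O(1) regardless of how many values are in the set, which is why the representation stays fast even when it is space-inefficient for sparse Unicode sets.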
<h3><a id="Child-sibling_Tree" name="Child-sibling_Tree">Child-sibling Tree</a></h3>
A particularly efficient data structure for representing trees. See AST.
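<p>The efficiency comes from each node holding just two references, a first child and a next sibling, so arbitrary fan-out costs no per-node child array. A small sketch (hypothetical <tt>Node</tt> class, not ANTLR's <tt>BaseAST</tt>) building the <tt>3+4</tt> tree from the AST entry:

```python
# Child-sibling tree sketch: two pointers per node, any number of
# children reached by walking the sibling chain of the first child.
class Node:
    def __init__(self, text):
        self.text = text
        self.first_child = None
        self.next_sibling = None

def children(node):
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling

# AST for 3+4: "+" is the root; 3 and 4 are linked as siblings.
plus, three, four = Node("+"), Node("3"), Node("4")
plus.first_child = three
three.next_sibling = four
print([c.text for c in children(plus)])  # ['3', '4']
```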
<h3><a id="Context-free_grammar" name="Context-free_grammar">Context-free grammar</a></h3>
A grammar where recognition of a particular construct does not depend
on whether it is in a particular syntactic context. A context-free
grammar has a set of rules like
<pre>
stat : IF expr THEN stat
     | ...
     ;
</pre>
where there is no restriction on when the <tt>IF</tt> alternative may
be applied--if you are in rule <tt>stat</tt>, you may apply the
alternative.
<h3><a id="Context-sensitive" name="Context-sensitive">Context-sensitive</a></h3>
A grammar where recognition of a particular construct may depend on a
syntactic context. You never see these grammars in practice because
they are impractical (note, an Earley parser is O(n^3) worst-case for
context-<i>free</i> grammars). A context-free rule looks like:
<pre>
&Alpha; &rarr; &gamma;
</pre>
but a context-<i>sensitive</i> rule may have context on the left-side:
<pre>
&alpha;&Alpha;&beta; &rarr; &alpha;&gamma;&beta;
</pre>
meaning that rule &Alpha; may only be applied (converted to &gamma;)
in between &alpha; and &beta;.
<p>
In an ANTLR sense, you can recognize context-sensitive constructs with
a semantic predicate. The action evaluates to true or false,
indicating the validity of applying the alternative.
<p>
See <a
href="http://www.wikipedia.org/wiki/Context-sensitive"><b>Context-sensitive grammar</b></a>.
<h3><a id="DFA" name="DFA">DFA</a></h3>
<b>D</b>eterministic <b>F</b>inite <b>A</b>utomata. A state machine
typically used to formally describe lexical analyzers. <tt>lex</tt>
builds a DFA to recognize tokens, whereas ANTLR builds a recursive
descent lexer similar to what you would build by hand. See <a
href="http://www.wikipedia.org/wiki/Finite_state_machine"><b>Finite
state machine</b></a> and ANTLR's lexer documentation.
<h3><a id="FIRST" name="FIRST">FIRST</a></h3>
The set of symbols that may be matched on the left-edge of a rule.
For example, FIRST(decl) is the set {ID, INT} for the following:
<pre>
decl : ID ID SEMICOLON
     | INT ID SEMICOLON
     ;
</pre>
The situation gets more complicated when you have optional
constructs. FIRST(a) below is {A,B,C}
<pre>
a : (A)? B
  | C
  ;
</pre>
because the A is optional and the B may be seen on the left-edge.
<p>
Naturally, k>1 lookahead symbols make this even more complicated.
FIRST_k must track sets of k-sequences, not just individual symbols.
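<p>For k=1 and alternatives of plain terminals, the computation above can be sketched directly: walk an alternative's left edge, collecting symbols until the first non-optional one. (The list-of-pairs grammar encoding here is a made-up illustration, not ANTLR's internal representation.)

```python
# FIRST(1) sketch: an optional symbol lets lookahead "fall through"
# to the next symbol; a required symbol ends the left edge.
def first(alternative):
    result = set()
    for symbol, optional in alternative:
        result.add(symbol)
        if not optional:
            return result
    return result

# a : (A)? B | C ;
alt1 = [("A", True), ("B", False)]
alt2 = [("C", False)]
print(first(alt1))               # {'A', 'B'}
print(first(alt1) | first(alt2)) # FIRST(a) = {'A', 'B', 'C'}
```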
<h3><a id="FOLLOW" name="FOLLOW">FOLLOW</a></h3>
The set of input symbols that may follow any reference to the
specified rule. For example, FOLLOW(decl) is {RPAREN, SEMICOLON}:
<pre>
methodHead : ID LPAREN decl RPAREN ;
var : decl SEMICOLON ;
decl : TYPENAME ID ;
</pre>
because RPAREN and SEMICOLON both follow references to rule decl.
FIRST and FOLLOW computations are used to analyze grammars and
generate parsing decisions.
<p>
This grammar analysis gets much more complicated when k>1.
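<p>The k=1 case can be sketched as a scan over all productions for references to the rule, collecting whichever symbol comes next. (This toy version ignores the end-of-production case, where FOLLOW of the enclosing rule would chain in; the dict encoding of the grammar is an illustration only.)

```python
# FOLLOW(1) sketch over the three-rule grammar above: collect the
# symbol after each reference to `rule` in every production.
grammar = {
    "methodHead": [["ID", "LPAREN", "decl", "RPAREN"]],
    "var":        [["decl", "SEMICOLON"]],
    "decl":       [["TYPENAME", "ID"]],
}

def follow(rule):
    result = set()
    for productions in grammar.values():
        for prod in productions:
            for i, sym in enumerate(prod):
                if sym == rule and i + 1 < len(prod):
                    result.add(prod[i + 1])
    return result

print(follow("decl"))  # {'RPAREN', 'SEMICOLON'}
```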
<h3><a id="Grammar" name="Grammar">Grammar</a></h3>
A finite means of formally describing the structure of a possibly
infinite language. Parser generators build parsers that recognize
sentences in the language described by a grammar. Most parser
generators allow you to add actions to be executed during the parse.
<h3><a id="Hoisting" name="Hoisting">Hoisting</a></h3>
Semantic predicates describe the semantic context in which a rule or
alternative applies. The predicate is hoisted into a prediction
expression. Hoisting typically refers to pulling a predicate out of
its enclosing rule and into the prediction expression of another
rule. For example,
<pre>
decl : typename ID SEMICOLON
     | ID ID SEMICOLON
     ;
typename : {isType(LT(1))}? ID
         ;
</pre>
The predicate is not needed in typename, as there is no decision;
however, rule decl needs it to distinguish between its two
alternatives. The first alternative would look like:
<pre>
if ( LA(1)==ID && isType(LT(1)) ) {
    typename();
    match(ID);
    match(SEMICOLON);
}
</pre>
PCCTS 1.33 hoisted predicates into other rules, but ANTLR currently
does not.
<h3><a id="Inheritance,_grammar" name="Inheritance,_grammar">Inheritance, grammar</a></h3>
The ability of ANTLR to define a new grammar as it differs from an
existing grammar. See the ANTLR documentation.
<h3><a id="LA(n)" name="LA(n)">LA(n)</a></h3>
The nth lookahead character, token type, or AST node type depending
on the grammar type (lexer, parser, or tree parser respectively).
<h3><a id="Left-prefix,_left_factor" name="Left-prefix,_left_factor">Left-prefix, left factor</a></h3>
A common sequence of symbols on the left-edge of a set of alternatives
such as:
<pre>
a : A B X
  | A B Y
  ;
</pre>
The left-prefix is A B, which you can remove by left-factoring:
<pre>
a : A B (X|Y)
  ;
</pre>
Left-factoring is done to reduce lookahead requirements.
<h3><a id="Literal" name="Literal">Literal</a></h3>
Generally a literal refers to a fixed string such as <tt>begin</tt>
that you wish to match. When you reference a literal in an ANTLR
grammar via <tt>"begin"</tt>, ANTLR assigns it a token type like any
other token. If you have defined a lexer, ANTLR provides information
about the literal (type and text) to the lexer so it may detect
occurrences of the literal.
<h3><a id="Linear_approximate_lookahead" name="Linear_approximate_lookahead">Linear approximate lookahead</a></h3>
An approximation to full lookahead (that can be applied to both LL and LR
parsers) for k>1 that reduces the complexity of storing and testing
lookahead from O(n^k) to O(nk); an exponential to linear reduction. When
linear approximate lookahead is insufficient (results in a
nondeterministic parser), you can use the approximate lookahead to
attenuate the cost of building the full decision.
<p>Here is a simple example illustrating the difference between full
and approximate lookahead:
<pre>
a : (A B | C D)
  | A D
  ;
</pre>
This rule is LL(2) but not linear approximate LL(2). The real
FIRST_2(a) is {AB,CD} for alternative 1 and {AD} for alternative 2.
No intersection, so no problem. Linear approximate lookahead
collapses all symbols at depth i, yielding k sets instead of possibly
n^k k-sequences. The approximate (compressed) sets are
{AB,AD,CD,CB} and {AD}. Note the introduction of the spurious
k-sequences AD and CB. Unfortunately, this compression introduces a
conflict upon AD between the alternatives. PCCTS did full LL(k),
but ANTLR does linear approximate only, as I found that linear approximate
lookahead works for the vast majority of parsing decisions and is
extremely fast. I usually find one or two problem spots in a large
grammar with ANTLR, which forces me to reorganize my grammar in a
slightly unnatural manner. Unfortunately, your brain does full LL(k)
and ANTLR does a slightly weaker linear approximate lookahead--a
source of many (invalid) bug reports ;)
<p>
This compression was the subject of <a href="http://www.antlr.org/papers/parr.phd.thesis.pdf"><b>my doctoral
dissertation</b></a> (PDF 477k) at Purdue.
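<p>The compression in the example above can be reproduced mechanically: split the k-sequences into one set per depth, then take the cross product. This is an illustrative sketch of the set arithmetic only, not ANTLR's analysis code:

```python
# Collapse a set of k-sequences into k per-depth sets, then recombine.
# Spurious sequences appear because per-depth sets forget which
# first-symbol went with which second-symbol.
from itertools import product

def approximate(sequences, k=2):
    depth_sets = [{seq[i] for seq in sequences} for i in range(k)]
    return {"".join(p) for p in product(*depth_sets)}

alt1 = {"AB", "CD"}        # real FIRST_2 of alternative 1
alt2 = {"AD"}              # real FIRST_2 of alternative 2
compressed = approximate(alt1)
print(sorted(compressed))  # ['AB', 'AD', 'CB', 'CD']
print(compressed & alt2)   # {'AD'}  <- the spurious conflict
```

The real sets {AB,CD} and {AD} are disjoint, but the compressed set gains AD and CB, and the intersection on AD is exactly the nondeterminism described above.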
<h3><a id="LL(k)" name="LL(k)">LL(k)</a></h3>
Formally, LL(k) represents a class of parsers and grammars that parse
symbols from left-to-right (beginning to end of input stream) using a
leftmost derivation and using k symbols of lookahead. A leftmost
derivation is one in which derivations (parses) proceed by attempting
to replace rule references from left-to-right within a production.
Given the following rule
<pre>
stat : IF expr THEN stat
     | ...
     ;
</pre>
an LL parser would match the IF and then attempt to parse expr, unlike
a rightmost derivation, which would attempt to parse stat first.
<p>
LL(k) is synonymous with a "top-down" parser because the
parser begins at the start symbol and works its way down the
derivation/parse tree (tree here means the stack of method activations
for recursive descent or the symbol stack for a table-driven parser). A
recursive-descent parser is a particular implementation of an LL parser
that uses functions or method calls to implement the parser rather
than a table.
<p>
ANTLR generates predicated-LL(k) parsers that support syntactic and
semantic predicates, allowing you to specify many context-free and
context-sensitive grammars (with a bit of work).
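<p>A recursive-descent rendering of the <tt>stat</tt> rule above might look like the following sketch (a hand-written Python toy with made-up token names, not ANTLR-generated code): one method per rule, with LA(1) predicting which alternative to take.

```python
# Recursive-descent sketch: each rule becomes a method; lookahead
# LA(1) selects the alternative, match() consumes one token.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def LA(self, i):                       # i-th lookahead token type
        j = self.pos + i - 1
        return self.tokens[j] if j < len(self.tokens) else "EOF"

    def match(self, ttype):
        if self.LA(1) != ttype:
            raise SyntaxError("expecting " + ttype)
        self.pos += 1

    def stat(self):
        if self.LA(1) == "IF":             # predict the IF alternative
            self.match("IF"); self.expr(); self.match("THEN"); self.stat()
        else:
            self.match("ID")               # stand-in for the "..." alternatives

    def expr(self):
        self.match("ID")                   # toy expression rule

Parser(["IF", "ID", "THEN", "ID"]).stat()  # parses without error
```

The method-call stack here plays the role of the derivation tree: calling <tt>stat</tt> inside <tt>stat</tt> is exactly the leftmost expansion of the nested statement.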
<h3><a id="LT(n)" name="LT(n)">LT(n)</a></h3>
In a parser, this is the nth lookahead Token object.
<h3><a id="Language" name="Language">Language</a></h3> A possibly infinite set of valid sentences. The
vocabulary symbols may be characters, tokens, or tree nodes in an
ANTLR context.
<h3><a id="Lexer" name="Lexer">Lexer</a></h3>
A recognizer that breaks up a stream of characters into
vocabulary symbols for a parser. The parser pulls vocabulary symbols
from the lexer via a queue.
<h3><a id="Lookahead" name="Lookahead">Lookahead</a></h3>
When parsing a stream of input symbols, a parser has matched (and no
longer needs to consider) a portion of the stream to the left of its
read pointer. The next k symbols to the right of the read pointer are
considered the fixed lookahead. This information is used to direct
the parser to the next state. In an LL(k) parser this means to
predict which path to take from the current state using the next k
symbols of lookahead.
<p> ANTLR supports syntactic predicates, a manually-specified form of
backtracking that effectively gives you infinite lookahead. For
example, consider the following rule that distinguishes between sets
(comma-separated lists of words) and parallel assignments (one list
assigned to another):
<pre>
stat : ( list "=" )=> list "=" list
     | list
     ;
</pre>
If a list followed by an assignment operator is found on the input
stream, the first production is predicted. If not, the second
alternative production is attempted.
<h3><a id="nextToken" name="nextToken">nextToken</a></h3>
A lexer method automatically generated by ANTLR that figures out which
of the lexer rules to apply. For example, if you have two rules ID
and INT in your lexer, ANTLR will generate a lexer with methods for ID
and INT as well as a nextToken method that figures out which rule
method to attempt given k input characters.
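<p>The dispatch can be pictured with a toy sketch (hand-written Python, not ANTLR output): peek at the next character and jump to the rule whose FIRST set contains it.

```python
# Toy nextToken for two rules, ID : ('a'..'z'|'A'..'Z')+ and
# INT : ('0'..'9')+ ; one character of lookahead picks the rule.
def next_token(text, pos):
    ch = text[pos]
    if ch.isdigit():                 # lookahead predicts INT
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return ("INT", text[pos:end], end)
    if ch.isalpha():                 # lookahead predicts ID
        end = pos
        while end < len(text) and text[end].isalpha():
            end += 1
        return ("ID", text[pos:end], end)
    raise ValueError("no lexer rule matches " + repr(ch))

print(next_token("abc42", 0))  # ('ID', 'abc', 3)
print(next_token("abc42", 3))  # ('INT', '42', 5)
```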
<h3><a id="NFA" name="NFA">NFA</a></h3>
<b>N</b>ondeterministic <b>F</b>inite <b>A</b>utomata. See <a
href="http://www.wikipedia.org/wiki/Finite_state_machine"><b>Finite
state machine</b></a>.
<h3><a id="Nondeterministic" name="Nondeterministic">Nondeterministic</a></h3>
A parser is nondeterministic if there is at least one decision point
where the parser cannot resolve which path to take. Nondeterminisms
arise because of parsing strategy weaknesses.
<ul>
<li>If your strategy works only for unambiguous grammars, then
ambiguous grammars will yield nondeterministic parsers; this is true
of the basic LL, LR strategies. Even unambiguous grammars can yield
nondeterministic parsers though. Here is a nondeterministic LL(1)
grammar:
<pre>
decl : ID ID SEMICOLON
     | ID SEMICOLON
     ;
</pre>
Rule <tt>decl</tt> is, however, LL(2) because the second lookahead
symbol (either ID or SEMICOLON) uniquely determines which alternative
to predict. You could also left-factor the rule to reduce the
lookahead requirements.<br><br>
<li>
If you are willing to pay a
performance hit or simply need to handle ambiguous grammars, you can
use an Earley parser or a Tomita parser (LR-based) that match all
possible interpretations of the input, thus avoiding the idea of
nondeterminism altogether. This does present problems when trying to
execute actions, however, because multiple parses are, in effect,
occurring in parallel.
</ul>
<p>
Note that a parser may have multiple decision points that are
nondeterministic.
<h3><a id="Parser" name="Parser">Parser</a></h3>
A recognizer that applies a grammatical structure to a stream
of vocabulary symbols called tokens.
<h3><a id="Predicate,_semantic" name="Predicate,_semantic">Predicate, semantic</a></h3>
A semantic predicate is a boolean expression used to alter the parse
based upon semantic information. This information is usually a
function of the constructs/input that have already been matched, but
can even be a flag that turns on and off subsets of the language (as
you might do for a grammar handling both K&R and ANSI C). One of the
most common semantic predicates uses a symbol table to help
distinguish between syntactically identical, but semantically
different, productions. In FORTRAN, array references and function
calls look the same, but may be distinguished by checking the type of
the identifier.
<pre>
expr : {isVar(LT(1))}? ID LPAREN args RPAREN      // array ref
     | {isFunction(LT(1))}? ID LPAREN args RPAREN // func call
     ;
</pre>
<h3><a id="Predicate,_syntactic" name="Predicate,_syntactic">Predicate, syntactic</a></h3>
A selective form of backtracking used to recognize language constructs
that cannot be distinguished without seeing all or most of the
construct. For example, in C++ some declarations look exactly like
expressions. You have to check to see if it is a declaration. If it
parses like a declaration, assume it is a declaration--reparse it with
"feeling" (execute your actions). If not, it must be an expression or
an error:
<pre>
stat : (declaration) => declaration
     | expression
     ;
</pre>
<h3><a id="Production" name="Production">Production</a></h3>
An alternative in a grammar rule.
<h3><a id="Protected" name="Protected">Protected</a></h3>
A protected lexer rule does not represent a complete token--it is a
helper rule referenced by another lexer rule. This overloading of the
access-visibility Java term occurs because if the rule is not visible,
it cannot be "seen" by the parser (yes, this nomenclature sucks).
<h3><a id="Recursive-descent" name="Recursive-descent">Recursive-descent</a></h3>
See LL(k).
<h3><a id="Regular" name="Regular">Regular</a></h3>
A regular language is one that can be described by a regular grammar
or regular expression or accepted by a DFA-based lexer such as those
generated by <tt>lex</tt>. Regular languages are normally used to
describe tokens.
<p> In practice you can pick out a regular grammar by noticing that
references to other rules are not allowed except at the end of a
production. The following grammar is regular because the reference to
<tt>B</tt> occurs at the right-edge of rule <tt>A</tt>.
<pre>
A : ('a')+ B ;
B : 'b' ;
</pre>
Another way to look at it is, "what can I recognize without a stack
(such as a method return address stack)?".
<p>
Regular grammars cannot describe context-free languages; hence, LL- or
LR-based grammars are used to describe programming languages. ANTLR
is not restricted to regular languages for tokens because it generates
recursive-descent lexers. This makes it handy to recognize HTML tags
and so on all in the lexer.
<h3><a id="Rule" name="Rule">Rule</a></h3>
A rule describes a partial sentence in a language such as a statement
or expression in a programming language. Rules may have one or more
alternative productions.
<h3><a id="Scanner" name="Scanner">Scanner</a></h3>
See Lexer.
<h3><a id="Semantics" name="Semantics">Semantics</a></h3> See <a href="http://www.jguru.com/faq/view.jsp?EID=81"><b>What
do "syntax" and "semantics" mean and how are they different?</b></a>.
<h3><a id="Subrule" name="Subrule">Subrule</a></h3>
Essentially a rule that has been expanded inline. Subrules are
enclosed in parentheses and may have suffixes like star, plus, and
question mark that indicate zero-or-more, one-or-more, or optional.
The following rule has 3 subrules:
<pre>
a : (A|B)+ (C)* (D)?
  ;
</pre>
<h3><a id="Syntax" name="Syntax">Syntax</a></h3>
See <a href="http://www.jguru.com/faq/view.jsp?EID=81"><b>What
do "syntax" and "semantics" mean and how are they different?</b></a>.
<h3><a id="Token" name="Token">Token</a></h3>
A vocabulary symbol for a language. This term typically refers to the
vocabulary symbols of a parser. A token may represent a constant
symbol such as a keyword like <tt>begin</tt> or a "class" of input
symbols like <tt>ID</tt> or <tt>INTEGER_LITERAL</tt>.
<h3><a id="Token_stream" name="Token_stream">Token stream</a></h3>
See <a href="http://www.antlr.org/doc/streams.html"><b>Token
Streams</b></a> in the ANTLR documentation.
<h3><a id="Tree" name="Tree">Tree</a></h3>
See AST and <a
href="http://www.jguru.com/faq/view.jsp?EID=814505"><b>What's the
difference between a parse tree and an abstract syntax tree (AST)? Why
doesn't ANTLR generate trees with nodes for grammar rules like JJTree
does?</b></a>.
<h3><a id="Tree_parser" name="Tree_parser">Tree parser</a></h3> A recognizer that applies a grammatical structure to a
two-dimensional input tree. Grammatical rules are like an "executable
comment" that describes the tree structure. These parsers are useful
during translation to (1) annotate trees with, for example, symbol
table information, (2) perform tree rewrites, and (3) generate output.
<h3><a id="Vocabulary" name="Vocabulary">Vocabulary</a></h3>
The set of symbols used to construct sentences in a language. These
symbols are usually called tokens or token types. For lexers, the
vocabulary is a set of characters.
<h3><a id="Wow" name="Wow">Wow</a></h3> See ANTLR.
</font>
</body>
</html>