<html>
<head>
	<title>ANTLR-centric Language Glossary</title>
</head>
<body bgcolor="#FFFFFF" text="#000000">

<font face=Arial>

<h1><a id="ANTLR-centric_Language_Glossary" name="ANTLR-centric_Language_Glossary">ANTLR-centric Language Glossary</a></h1>

<i>Terence Parr</i>
<br><br>
This glossary defines some of the more important terms used in the
ANTLR documentation.  I have tried to be very informal and provide
references to other pages that are useful.  For another great source
of information about formal computer languages, see <a
href="http://www.wikipedia.org/wiki/Context-free_grammar"><b>Wikipedia</b></a>.

<font size=2>
<!--index-->
<ul>
	<li><a href="#Ambiguous">Ambiguous</a></li>
	<li><a href="#ANTLR">ANTLR</a></li>
	<li><a href="#AST">AST</a></li>
	<li><a href="#Bit_set">Bit set</a></li>
	<li><a href="#Child-sibling_Tree">Child-sibling Tree</a></li>
	<li><a href="#Context-free_grammar">Context-free grammar</a></li>
	<li><a href="#Context-sensitive">Context-sensitive</a></li>
	<li><a href="#DFA">DFA</a></li>
	<li><a href="#FIRST">FIRST</a></li>
	<li><a href="#FOLLOW">FOLLOW</a></li>
	<li><a href="#Grammar">Grammar</a></li>
	<li><a href="#Hoisting">Hoisting</a></li>
	<li><a href="#Inheritance,_grammar">Inheritance, grammar</a></li>
	<li><a href="#LA(n)">LA(n)</a></li>
	<li><a href="#Left-prefix,_left_factor">Left-prefix, left factor</a></li>
	<li><a href="#Literal">Literal</a></li>
	<li><a href="#Linear_approximate_lookahead">Linear approximate lookahead</a></li>
	<li><a href="#LL(k)">LL(k)</a></li>
	<li><a href="#LT(n)">LT(n)</a></li>
	<li><a href="#Language">Language</a></li>
	<li><a href="#Lexer">Lexer</a></li>
	<li><a href="#Lookahead">Lookahead</a></li>
	<li><a href="#nextToken">nextToken</a></li>
	<li><a href="#NFA">NFA</a></li>
	<li><a href="#Nondeterministic">Nondeterministic</a></li>
	<li><a href="#Parser">Parser</a></li>
	<li><a href="#Predicate,_semantic">Predicate, semantic</a></li>
	<li><a href="#Predicate,_syntactic">Predicate, syntactic</a></li>
	<li><a href="#Production">Production</a></li>
	<li><a href="#Protected">Protected</a></li>
	<li><a href="#Recursive-descent">Recursive-descent</a></li>
	<li><a href="#Regular">Regular</a></li>
	<li><a href="#Rule">Rule</a></li>
	<li><a href="#Scanner">Scanner</a></li>
	<li><a href="#Semantics">Semantics</a></li>
	<li><a href="#Subrule">Subrule</a></li>
	<li><a href="#Syntax">Syntax</a></li>
	<li><a href="#Token">Token</a></li>
	<li><a href="#Token_stream">Token stream</a></li>
	<li><a href="#Tree">Tree</a></li>
	<li><a href="#Tree_parser">Tree parser</a></li>
	<li><a href="#Vocabulary">Vocabulary</a></li>
	<li><a href="#Wow">Wow</a></li>
</ul>
<!--/index-->
</font>

<h3><a id="Ambiguous" name="Ambiguous">Ambiguous</a></h3>

A language is ambiguous if the same sentence or phrase can be
interpreted in more than a single way.  For example, the following
sentence by Groucho Marx is easily interpreted in two ways: "I once
shot an elephant in my pajamas.  How he got in my pajamas I'll never
know!"  In the computer world, a typical language ambiguity is the
if-then-else ambiguity where the else-clause may be attached to either
the most recent if-then or an older one.  Reference manuals for
computer languages resolve this ambiguity by stating that else-clauses
always match up with the most recent if-then.

<p>
A grammar is ambiguous if the same input sequence can be derived in
multiple ways.  Ambiguous languages always yield ambiguous grammars
unless you can find a way to encode semantics (actions, predicates,
etc.) that resolve the ambiguity.  Most language tools like ANTLR
resolve the if-then-else ambiguity by simply choosing to match
greedily (i.e., as soon as possible), which attaches the else to the
most recent if-then, as the Java snippet below illustrates.  See
nondeterministic.
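
<p>
A concrete illustration in Java (a hypothetical snippet, not anything
generated by ANTLR): under the greedy interpretation the <tt>else</tt>
binds to the nearest <tt>if</tt>, so this program prints nothing when
<tt>a</tt> is false.

<pre>
class DanglingElse {
    public static void main(String[] args) {
        boolean a = false, b = false;
        if (a)
            if (b)
                System.out.println("a and b");
            else                                    // binds to the *inner* if-then,
                System.out.println("a but not b");  // per the greedy resolution
        // the outer if has no else-clause, so nothing prints when a is false
    }
}
</pre>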

<h3><a id="ANTLR" name="ANTLR">ANTLR</a></h3>

<b>AN</b>other <b>T</b>ool for <b>L</b>anguage <b>R</b>ecognition, a
predicated-LL(k) parser generator that handles lexers, parsers, and
tree parsers.  ANTLR has been available since 1990 and led to a
resurgence of recursive-descent parsing after decades dominated by LR
and other DFA-based strategies.

<h3><a id="AST" name="AST">AST</a></h3>

<b>A</b>bstract <b>S</b>yntax <b>T</b>ree.  ASTs are used as internal
representations of an input stream, normally constructed during a
parsing phase.  Because ASTs are two-dimensional trees, they can
encode the structure (as determined by the parser) of the input as
well as the input symbols.

<p>
A homogeneous AST is one in which the physical node objects are all of
the same type, e.g., <tt>CommonAST</tt> in ANTLR.  A heterogeneous
tree may have multiple node types such as <tt>PlusNode</tt>,
<tt>MultNode</tt>, etc.

<p> An AST is not a parse tree, which encodes the sequence of rules
used to match input symbols.  See <a
href="http://www.jguru.com/faq/view.jsp?EID=814505"><b>What's the
difference between a parse tree and an abstract syntax tree (AST)? Why
doesn't ANTLR generate trees with nodes for grammar rules like JJTree
does?</b></a>.

<p>An AST for input <tt>3+4</tt> might be represented as
<pre>
   +
  / \
 3   4
</pre>

or more typically (ala ANTLR) in child-sibling form:

<pre>
+
|
3--4
</pre>

Operators are usually subtree roots and operands are usually leaves.

<h3><a id="Bit_set" name="Bit_set">Bit set</a></h3>

Bit sets are an extremely efficient representation for dense integer
sets.  You can also encode sets of strings by mapping unique strings
to unique integers.  ANTLR uses bit sets for lookahead prediction in
parsers and lexers.  Simple bit set implementations do not work so
well for sparse sets, particularly when the maximum integer stored in
the set is large.

<p> ANTLR's bit set represents membership with a bit for each possible
integer value.  For a maximum value of <i>n</i>, a bit set needs
<i>n/64</i> long words or <i>n/8</i> bytes.  For ASCII bit sets with a
maximum value of 127, you need only 16 bytes or 2 long words.  Unicode
has a maximum value of \uFFFF or 65535, requiring 8KB, and these sets
are typically sparse.  Fortunately, most lexers only need a few of
these space-inefficient (but speedy) bit sets, so it's not really a
problem.
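
<p>
A minimal sketch (this is not ANTLR's actual <tt>BitSet</tt> class) of
how such a set stores membership, one bit per possible integer value,
packed into 64-bit long words:

<pre>
class SimpleBitSet {
    long[] bits;                                   // one bit per possible value

    SimpleBitSet(int maxValue) {
        bits = new long[(maxValue >> 6) + 1];      // maxValue/64 long words, rounded up
    }
    void add(int el) {
        bits[el >> 6] |= (1L << (el & 0x3F));      // set bit el%64 of word el/64
    }
    boolean member(int el) {
        return (bits[el >> 6] & (1L << (el & 0x3F))) != 0;
    }
}
</pre>

For example, a set over ASCII characters (maximum value 127) uses a
two-element <tt>long</tt> array, i.e., 16 bytes.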

<h3><a id="Child-sibling_Tree" name="Child-sibling_Tree">Child-sibling Tree</a></h3>

A particularly efficient data structure for representing trees.  See AST.
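
<p>
Each node needs only two pointers: one to its first child and one to
its next sibling.  A minimal sketch (a hypothetical class; ANTLR's
<tt>CommonAST</tt> plays this role) that builds the <tt>3+4</tt> tree
shown in the AST entry:

<pre>
class CSNode {
    String text;
    CSNode firstChild;     // points "down" to the first child
    CSNode nextSibling;    // points "right" to the next sibling

    CSNode(String text) { this.text = text; }

    public static void main(String[] args) {
        // build:  +
        //         |
        //         3--4
        CSNode plus  = new CSNode("+");
        CSNode three = new CSNode("3");
        CSNode four  = new CSNode("4");
        plus.firstChild   = three;   // first operand hangs below the operator
        three.nextSibling = four;    // remaining operands chain off as siblings
    }
}
</pre>

Any n-ary tree can be represented this way with just two pointers per node.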

<h3><a id="Context-free_grammar" name="Context-free_grammar">Context-free grammar</a></h3>

A grammar where recognition of a particular construct does not depend
on whether it is in a particular syntactic context.  A context-free
grammar has a set of rules like

<pre>
stat : IF expr THEN stat
     | ...
     ;
</pre>

where there is no restriction on when the <tt>IF</tt> alternative may
be applied--if you are in rule <tt>stat</tt>, you may apply the
alternative.

<h3><a id="Context-sensitive" name="Context-sensitive">Context-sensitive</a></h3>

A grammar where recognition of a particular construct may depend on a
syntactic context.  You never see these grammars in practice because
they are impractical to parse (note that even an Earley parser, which
handles any context-<i>free</i> grammar, is O(n^3) in the worst case).
A context-free rule looks like:

<pre>
&Alpha; &rarr; &gamma;
</pre>

but a context-<i>sensitive</i> rule may have context on the left-side:

<pre>
&alpha;&Alpha;&beta; &rarr; &alpha;&gamma;&beta;
</pre>

meaning that rule &Alpha; may only be applied (converted to &gamma;)
in between &alpha; and &beta;.

<p>
In an ANTLR sense, you can recognize context-sensitive constructs with
a semantic predicate.  The predicate expression evaluates to true or
false, indicating whether the alternative may be applied.

<p>
See <a
href="http://www.wikipedia.org/wiki/Context-sensitive"><b>Context-sensitive gramar</b></a>.

<h3><a id="DFA" name="DFA">DFA</a></h3>

<b>D</b>eterministic <b>F</b>inite <b>A</b>utomaton.  A state machine
typically used to formally describe lexical analyzers.  <tt>lex</tt>
builds a DFA to recognize tokens, whereas ANTLR builds a
recursive-descent lexer similar to what you would build by hand.  See <a
href="http://www.wikipedia.org/wiki/Finite_state_machine"><b>Finite
state machine</b></a> and ANTLR's lexer documentation.

<h3><a id="FIRST" name="FIRST">FIRST</a></h3>

The set of symbols that may be matched on the left edge of a rule.
For example, FIRST(decl) is the set {ID, INT} for the following:

<pre>
decl : ID ID SEMICOLON
     | INT ID SEMICOLON
     ;
</pre>

The situation gets more complicated when you have optional
constructs.  The FIRST(a) below is {A,B,C}

<pre>
a : (A)? B
  | C
  ;
</pre>

because the A is optional and the B may be seen on the left-edge.

<p>
Naturally, k>1 symbols of lookahead make this even more complicated:
FIRST_k must track sets of k-sequences, not just individual symbols.
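
<p>
In the generated parser, FIRST sets show up directly as lookahead
tests.  A rough, hand-written sketch of the decision for rule
<tt>decl</tt> at the top of this entry (the token constants and helper
methods are stand-ins, not ANTLR's actual generated code):

<pre>
// Hand-written sketch of an LL(1) decision driven by FIRST sets.
class DeclParser {
    static final int ID = 4, INT = 5, SEMICOLON = 6;   // stand-in token types
    int[] tokens;            // token types delivered by the lexer
    int p = 0;               // read pointer

    int  LA(int i)       { return tokens[p + i - 1]; }
    void match(int type) {
        if (LA(1) != type) throw new RuntimeException("mismatched token");
        p++;
    }

    void decl() {
        if (LA(1) == ID) {            // FIRST of alternative 1 is {ID}
            match(ID); match(ID); match(SEMICOLON);
        } else if (LA(1) == INT) {    // FIRST of alternative 2 is {INT}
            match(INT); match(ID); match(SEMICOLON);
        } else {
            throw new RuntimeException("no viable alternative for decl");
        }
    }
}
</pre>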

<h3><a id="FOLLOW" name="FOLLOW">FOLLOW</a></h3>

The set of input symbols that may follow any reference to the
specified rule.  For example, FOLLOW(decl) is {RPAREN, SEMICOLON}:

<pre>
methodHead : ID LPAREN decl RPAREN ;
var : decl SEMICOLON ;
decl : TYPENAME ID ;
</pre>

because RPAREN and SEMICOLON both follow references to rule decl.
FIRST and FOLLOW computations are used to analyze grammars and
generate parsing decisions.

<p>
All of this grammar analysis gets considerably more complicated when k>1.
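
<p>
FOLLOW matters most when a rule or subrule can match nothing at all.
A hand-written sketch, using a hypothetical rule with an empty
alternative and the same kind of stand-in helpers as the FIRST entry
(this is not ANTLR's actual generated code):

<pre>
// Hypothetical rules:
//   block   : LCURLY expr optSemi RCURLY ;
//   optSemi : SEMICOLON |  /* empty */ ;
// so FOLLOW(optSemi) = {RCURLY}.
void optSemi() {
    if (LA(1) == SEMICOLON) {
        match(SEMICOLON);          // alternative 1
    } else if (LA(1) == RCURLY) {  // LA(1) in FOLLOW(optSemi): take the empty alternative
        ;                          // match nothing
    } else {
        throw new RuntimeException("no viable alternative for optSemi");
    }
}
</pre>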

<h3><a id="Grammar" name="Grammar">Grammar</a></h3>

A finite means of formally describing the structure of a possibly
infinite language.  Parser generators build parsers that recognize
sentences in the language described by a grammar.  Most parser
generators allow you to add actions to be executed during the parse.

<h3><a id="Hoisting" name="Hoisting">Hoisting</a></h3>

Semantic predicates describe the semantic context in which a rule or
alternative applies.  The predicate is hoisted into a prediction
expression.  Hoisting typically refers to pulling a predicate out of
its enclosing rule and into the prediction expression of another
rule.  For example,

<pre>
decl     : typename ID SEMICOLON
         | ID ID SEMICOLON
         ;
typename : {isType(LT(1))}? ID
         ;
</pre>

The predicate is not needed in typename, as there is no decision to
make there; however, rule decl needs it to distinguish between its two
alternatives.  The generated code for the first alternative would look like:

<pre>
if ( LA(1)==ID && isType(LT(1)) ) {
  typename();
  match(ID);
  match(SEMICOLON);
}
</pre>

PCCTS 1.33 hoisted predicates into other rules, but ANTLR currently
does not.

<h3><a id="Inheritance,_grammar" name="Inheritance,_grammar">Inheritance, grammar</a></h3>

The ability of ANTLR to define a new grammar as it differs from an
existing grammar.  See the ANTLR documentation.

<h3><a id="LA(n)" name="LA(n)">LA(n)</a></h3>
The nth lookahead character, token type, or AST node type depending
on the grammar type (lexer, parser, or tree parser respectively).

<h3><a id="Left-prefix,_left_factor" name="Left-prefix,_left_factor">Left-prefix, left factor</a></h3>

A common sequence of symbols on the left-edge of a set of alternatives
such as:

<pre>
a : A B X
  | A B Y
  ;
</pre>

The left-prefix is A B, which you can remove by left-factoring:

<pre>
a : A B (X|Y)
  ;
</pre>

Left-factoring is done to reduce lookahead requirements.

<h3><a id="Literal" name="Literal">Literal</a></h3>

Generally a literal refers to a fixed string such as <tt>begin</tt>
that you wish to match.  When you reference a literal in an ANTLR
grammar via <tt>"begin"</tt>, ANTLR assigns it a token type like any
other token.  If you have defined a lexer, ANTLR provides information
about the literal (type and text) to the lexer so it may detect
occurrences of the literal.

<h3><a id="Linear_approximate_lookahead" name="Linear_approximate_lookahead">Linear approximate lookahead</a></h3>

An approximation to full lookahead (applicable to both LL and LR
parsers) for k>1 that reduces the complexity of storing and testing
lookahead from O(n^k) to O(nk)--an exponential-to-linear reduction.
When linear approximate lookahead is insufficient (i.e., results in a
nondeterministic parser), the approximate lookahead can still be used
to attenuate the cost of building the full decision.

<p>Here is a simple example illustrating the difference between full
and approximate lookahead:

<pre>
a : (A B | C D)
  | A D
  ;
</pre>

This rule is LL(2) but not linear approximate LL(2).  The real
FIRST_2(a) is {AB, CD} for alternative 1 and {AD} for alternative 2.
No intersection, so no problem.  Linear approximate lookahead
collapses all symbols at each depth i, yielding k sets instead of
possibly n^k k-sequences.  The approximate (compressed) sets are
{AB, AD, CD, CB} and {AD}.  Note the introduction of the spurious
k-sequences AD and CB.  Unfortunately, this compression introduces a
conflict on AD between the two alternatives.  PCCTS did full LL(k);
ANTLR does only linear approximate lookahead, as I found that it works
for the vast majority of parsing decisions and is extremely fast.
With ANTLR I usually find one or two problem spots in a large grammar,
which forces me to reorganize the grammar in a slightly unnatural
manner.  Unfortunately, your brain does full LL(k) while ANTLR does
the slightly weaker linear approximate lookahead--a source of many
(invalid) bug reports ;)

<p>
This compression was the subject of <a href="http://www.antlr.org/papers/parr.phd.thesis.pdf"><b>my doctoral
dissertation</b></a> (PDF 477k) at Purdue.

<h3><a id="LL(k)" name="LL(k)">LL(k)</a></h3>

Formally, LL(k) represents a class of parsers and grammars that parse
symbols from left-to-right (beginning to end of input stream) using a
leftmost derivation and using k symbols of lookahead.  A leftmost
derivation is one in which derivations (parses) proceed by attempting
to replace rule references from left-to-right within a production.
Given the following rule

<pre>
stat : IF expr THEN stat
     | ...
     ;
</pre>

an LL parser would match the IF and then attempt to parse expr,
whereas a rightmost derivation would expand the rightmost rule
reference, the trailing stat, first.

<p>
LL(k) is synonymous with a "top-down" parser because the
parser begins at the start symbol and works its way down the
derivation/parse tree (tree here means the stack of method activations
for recursive descent or symbol stack for a table-driven parser).  A
recursive-descent parser is a particular implementation of an LL
parser that uses functions or method calls rather than a table; the
sketch below shows the idea.
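
<p>
A hand-written sketch of the <tt>stat</tt> rule above as a
recursive-descent method (the token constants and the
<tt>LA</tt>/<tt>match</tt> helpers are stand-ins like those in the
FIRST entry, not ANTLR's exact generated code):

<pre>
// stat : IF expr THEN stat | OTHER ;   (OTHER stands in for the elided alternatives)
void stat() {
    if (LA(1) == IF) {       // predict alternative 1 from one symbol of lookahead
        match(IF);
        expr();              // leftmost derivation: expand expr before the trailing stat
        match(THEN);
        stat();              // rule references become (possibly recursive) method calls
    } else {
        match(OTHER);        // hypothetical second alternative
    }
}
</pre>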

<p>
ANTLR generates predicated-LL(k) parsers that support syntactic and
semantic predicates, allowing you to specify many context-free and
context-sensitive grammars (with a bit of work).


<h3><a id="LT(n)" name="LT(n)">LT(n)</a></h3>
In a parser, this is the nth lookahead Token object.

<h3><a id="Language" name="Language">Language</a></h3> A possibly infinite set of valid sentences.  The
vocabulary symbols may be characters, tokens, and tree nodes in an
ANTLR context.

<h3><a id="Lexer" name="Lexer">Lexer</a></h3>
   A recognizer that breaks up a stream of characters into
vocabulary symbols for a parser.  The parser pulls vocabulary symbols
from the lexer via a queue. 

<h3><a id="Lookahead" name="Lookahead">Lookahead</a></h3>

When parsing a stream of input symbols, a parser has matched (and no
longer needs to consider) a portion of the stream to the left of its
read pointer.  The next k symbols to the right of the read pointer are
considered the fixed lookahead.  This information is used to direct
the parser to the next state.  In an LL(k) parser this means to
predict which path to take from the current state using the next k
symbols of lookahead.

<p> ANTLR supports syntactic predicates, a manually-specified form of
backtracking that effectively gives you infinite lookahead.  For
example, consider the following rule that distinguishes between sets
(comma-separated lists of words) and parallel assignments (one list
assigned to another):

<pre>
stat:   ( list "=" )=> list "=" list
    |   list
    ;
</pre>

If a list followed by an assignment operator is found on the input
stream, the first production is predicted. If not, the second
alternative production is attempted.

<h3><a id="nextToken" name="nextToken">nextToken</a></h3>

A lexer method automatically generated by ANTLR that figures out which
of the lexer rules to apply.  For example, if you have two rules ID
and INT in your lexer, ANTLR will generate a lexer with methods for ID
and INT as well as a nextToken method that figures out which rule
method to attempt given k input characters.
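
<p>
Roughly speaking, <tt>nextToken</tt> looks at the first character of
lookahead and dispatches to the appropriate rule method.  A
hand-written sketch of the shape of such a method (the helpers
<tt>consume</tt>, <tt>makeToken</tt>, <tt>mID</tt>, and <tt>mINT</tt>
are stand-ins, not ANTLR's actual generated code):

<pre>
// Token, EOF_CHAR, EOF_TYPE, makeToken(), consume(), mID(), mINT(), LA() are stand-ins.
Token nextToken() {
    for (;;) {
        char c = (char)LA(1);                      // first lookahead character
        if (c == EOF_CHAR) return makeToken(EOF_TYPE);
        if (Character.isLetter(c)) return mID();   // ID  : (LETTER)+ ;
        if (Character.isDigit(c))  return mINT();  // INT : (DIGIT)+  ;
        if (c == ' ' || c == '\t' || c == '\n') {  // skip whitespace between tokens
            consume();
            continue;
        }
        throw new RuntimeException("unexpected character: " + c);
    }
}
</pre>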

<h3><a id="NFA" name="NFA">NFA</a></h3>

<b>N</b>ondeterministic <b>F</b>inite <b>A</b>utomaton.  See <a
href="http://www.wikipedia.org/wiki/Finite_state_machine"><b>Finite
state machine</b></a>.

<h3><a id="Nondeterministic" name="Nondeterministic">Nondeterministic</a></h3>

A parser is nondeterministic if there is at least one decision point
where the parser cannot resolve which path to take.  Nondeterminisms
arise because of parsing strategy weaknesses.

<ul>

<li>If your strategy works only for unambiguous grammars, then
ambiguous grammars will yield nondeterministic parsers; this is true
of the basic LL and LR strategies.  Even unambiguous grammars can
yield nondeterministic parsers, though.  Here is a grammar that is
nondeterministic for an LL(1) parser:

<pre>
decl : ID ID SEMICOLON
     | ID SEMICOLON
     ;
</pre>

Rule <tt>decl</tt> is, however, LL(2) because the second lookahead
symbol (either ID or SEMICOLON) uniquely determines which alternative
to predict.  You could also left-factor the rule to reduce the
lookahead requirements.<br><br>

<li>
If you are willing to pay a performance hit or simply need to handle
ambiguous grammars, you can use an Earley parser or a Tomita parser
(LR-based), which match all possible interpretations of the input,
thus avoiding the issue of nondeterminism altogether.  This does
present problems when trying to execute actions, however, because
multiple parses are, in effect, occurring in parallel.

</ul>

<p>
Note that a parser may have multiple decision points that are
nondeterministic.

<h3><a id="Parser" name="Parser">Parser</a></h3>
A recognizer that applies a grammatical structure to a stream
of vocabulary symbols called tokens.

<h3><a id="Predicate,_semantic" name="Predicate,_semantic">Predicate, semantic</a></h3>

A semantic predicate is a boolean expression used to alter the parse
based upon semantic information.  This information is usually a
function of the constructs/input that have already been matched, but
can even be a flag that turns on and off subsets of the language (as
you might do for a grammar handling both K&R and ANSI C).  One of the
most common semantic predicates uses a symbol table to help
distinguish between syntactically identical, but semantically
different, productions.  In FORTRAN, array references and function
calls look the same, but can be distinguished by checking the type of
the identifier.

<pre>
expr : {isVar(LT(1))}? ID LPAREN args RPAREN  // array ref
     | {isFunction(LT(1))}? ID LPAREN args RPAREN // func call
     ;
</pre>

<h3><a id="Predicate,_syntactic" name="Predicate,_syntactic">Predicate, syntactic</a></h3>

A selective form of backtracking used to recognize language constructs
that cannot be distinguished without seeing all or most of the
construct.  For example, in C++ some declarations look exactly like
expressions.  You have to check to see if it is a declaration.  If it
parses like a declaration, assume it is a declaration--reparse it with
"feeling" (execute your actions).  If not, it must be an expression or
an error:

<pre>
stat : (declaration) => declaration
     | expression
     ;
</pre>

<h3><a id="Production" name="Production">Production</a></h3>

An alternative in a grammar rule.

<h3><a id="Protected" name="Protected">Protected</a></h3>

A protected lexer rule does not represent a complete token--it is a
helper rule referenced by another lexer rule.  This overloading of the
access-visibility Java term occurs because if the rule is not visible,
it cannot be "seen" by the parser (yes, this nomeclature sucks).

<h3><a id="Recursive-descent" name="Recursive-descent">Recursive-descent</a></h3>

See LL(k).

<h3><a id="Regular" name="Regular">Regular</a></h3>

A regular language is one that can be described by a regular grammar
or regular expression or accepted by a DFA-based lexer such as those
generated by <tt>lex</tt>.  Regular languages are normally used to
describe tokens.

<p> In practice you can pick out a regular grammar by noticing that
references to other rules are not allowed except at the end of a
production.  The following grammar is regular because the reference to
<tt>B</tt> occurs at the right edge of rule <tt>A</tt>.

<pre>
A : ('a')+ B ;
B : 'b' ;
</pre>

Another way to look at it is, "what can I recognize without a stack
(such as a method return address stack)?".

<p>
Regular grammars cannot describe context-free languages; hence, LL- or
LR-based grammars are used to describe programming languages.  ANTLR
is not restricted to regular languages for tokens because it generates
recursive-descent lexers.  This makes it handy for recognizing HTML
tags and the like entirely in the lexer.

<h3><a id="Rule" name="Rule">Rule</a></h3>

A rule describes a partial sentence in a language such as a statement
or expression in a programming language.  Rules may have one or more
alternative productions.

<h3><a id="Scanner" name="Scanner">Scanner</a></h3>

See Lexer.

<h3><a id="Semantics" name="Semantics">Semantics</a></h3> See <a href="http://www.jguru.com/faq/view.jsp?EID=81"><b>What
do "syntax" and "semantics" mean and how are they different?</b></a>.

<h3><a id="Subrule" name="Subrule">Subrule</a></h3>

Essentially a rule that has been expanded inline.  Subrules are
enclosed in parentheses and may have suffixes like star, plus, and
question mark that indicate zero-or-more, one-or-more, or optional
matches.  The following rule has three subrules:

<pre>
a : (A|B)+ (C)* (D)?
  ;
</pre>

<h3><a id="Syntax" name="Syntax">Syntax</a></h3>
See <a href="http://www.jguru.com/faq/view.jsp?EID=81"><b>What
do "syntax" and "semantics" mean and how are they different?</b></a>.

<h3><a id="Token" name="Token">Token</a></h3>

A vocabulary symbol for a language.  This term typically refers to the
vocabulary symbols of a parser.  A token may represent a constant
symbol such as a keyword like <tt>begin</tt> or a "class" of input
symbols like <tt>ID</tt> or <tt>INTEGER_LITERAL</tt>.

<h3><a id="Token_stream" name="Token_stream">Token stream</a></h3>

See <a href="http://www.antlr.org/doc/streams.html"><b>Token
Streams</b></a> in the ANTLR documentation.

<h3><a id="Tree" name="Tree">Tree</a></h3>
See AST and <a
href="http://www.jguru.com/faq/view.jsp?EID=814505"><b>What's the
difference between a parse tree and an abstract syntax tree (AST)? Why
doesn't ANTLR generate trees with nodes for grammar rules like JJTree
does?</b></a>.
  
<h3><a id="Tree_parser" name="Tree_parser">Tree parser</a></h3>   A recognizer that applies a grammatical structure to a
two-dimensional input tree.  Grammatical rules are like an "executable
comment" that describe the tree structure.  These parsers are useful
during translation to (1) annotate trees with, for example, symbol
table information, (2) perform tree rewrites, and (3) generate output.

<h3><a id="Vocabulary" name="Vocabulary">Vocabulary</a></h3>

The set of symbols used to construct sentences in a language.  These
symbols are usually called tokens or token types.  For lexers, the
vocabulary is a set of characters.

<h3><a id="Wow" name="Wow">Wow</a></h3>  See ANTLR.

</font>
</body>
</html>