.\" Copyright (c) 2007 by Ian Piumarta
.\" All rights reserved.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the 'Software'),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, and/or sell
.\" copies of the Software, and to permit persons to whom the Software is
.\" furnished to do so, provided that the above copyright notice(s) and this
.\" permission notice appear in all copies of the Software.  Acknowledgement
.\" of the use of this Software in supporting documentation would be
.\" appreciated but is not required.
.\"
.\" THE SOFTWARE IS PROVIDED 'AS IS'.  USE ENTIRELY AT YOUR OWN RISK.
.\"
.\" Last edited: 2007-09-13 08:40:20 by piumarta on emilia.local
.\"
.TH PEG 1 "May 2007" "Version 0.1"
.SH NAME
peg, leg \- parser generators
.SH SYNOPSIS
.B peg
.B [\-hvV \-ooutput]
.I [filename ...]
.sp 0
.B leg
.B [\-hvV \-ooutput]
.I [filename ...]
.SH DESCRIPTION
.I peg
and
.I leg
are tools for generating recursive-descent parsers: programs that
perform pattern matching on text.  They process a Parsing Expression
Grammar (PEG) [Ford 2004] to produce a program that recognises legal
sentences of that grammar.
.I peg
processes PEGs written using the original syntax described by Ford;
.I leg
processes PEGs written using slightly different syntax and conventions
that are intended to make it an attractive replacement for parsers
built with
.IR lex (1)
and
.IR yacc (1).
Unlike
.I lex
and
.IR yacc ,
.I peg
and
.I leg
support unlimited backtracking, provide ordered choice as a means for
disambiguation, and can combine scanning (lexical analysis) and
parsing (syntactic analysis) into a single activity.
.PP
.I peg
reads the specified
.IR filename s,
or standard input if no
.IR filename s
are given, for a grammar describing the parser to generate.
.I peg
then generates a C source file that defines a function
.IR yyparse ().
This C source file can be included in, or compiled and then linked
with, a client program.  Each time the client program calls
.IR yyparse ()
the parser consumes input text according to the parsing rules,
starting from the first rule in the grammar.
.IR yyparse ()
returns non-zero if the input could be parsed according to the
grammar; it returns zero if the input could not be parsed.
.PP
The prefix 'yy' or 'YY' is prepended to all externally-visible symbols
in the generated parser.  This is intended to reduce the risk of
namespace pollution in client programs.  (The choice of 'yy' is
historical; see
.IR lex (1)
and
.IR yacc (1),
for example.)
.SH OPTIONS
.I peg
and
.I leg
provide the following options:
.TP
.B \-h
prints a summary of available options and then exits.
.TP
.B \-ooutput
writes the generated parser to the file
.B output
instead of the standard output.
.TP
.B \-v
writes verbose information to standard error while working.
.TP
.B \-V
writes version information to standard error and then exits.
.SH A SIMPLE EXAMPLE
The following
.I peg
input specifies a grammar with a single rule (called 'start') that is
satisfied when the input contains the string "username".
.nf

    start <- "username"

.fi
(The quotation marks are
.I not
part of the matched text; they serve to indicate a literal string to
be matched.)  In other words,
.IR yyparse ()
in the generated C source will return non-zero only if the next eight
characters read from the input spell the word "username".  If the
input contains anything else,
.IR yyparse ()
returns zero and no input will have been consumed.  (Subsequent calls
to
.IR yyparse ()
will also return zero, since the parser is effectively blocked looking
for the string "username".)  To ensure progress we can add an
alternative clause to the 'start' rule that will match any single
character if "username" is not found.
.nf

    start <- "username"
           / .

.fi
.IR yyparse ()
now always returns non-zero (except at the very end of the input).  To
do something useful we can add actions to the rules.  These actions
are performed after a complete match is found (starting from the first
rule) and are chosen according to the 'path' taken through the grammar
to match the input.  (Linguists would call this path a 'phrase
marker'.)
.nf

    start <- "username"    { printf("%s\\n", getlogin()); }
           / < . >         { putchar(yytext[0]); }

.fi
The first line instructs the parser to print the user's login name
whenever it sees "username" in the input.  If that match fails, the
second line tells the parser to echo the next input character to
the standard output.  Our parser is now performing useful work: it
will copy the input to the output, replacing all occurrences of
"username" with the user's account name.
.PP
Note the angle brackets ('<' and '>') that were added to the second
alternative.  These have no effect on the meaning of the rule, but
serve to delimit the text made available to the following action in
the variable
.IR yytext .
.PP
If the above grammar is placed in the file
.BR username.peg ,
running the command
.nf

    peg -o username.c username.peg

.fi
will save the corresponding parser in the file
.BR username.c .
To create a complete program this parser could be included by a C
program as follows.
.nf

    #include <stdio.h>      /* printf(), putchar() */
    #include <unistd.h>     /* getlogin() */

    #include "username.c"   /* yyparse() */

    int main()
    {
      while (yyparse())     /* repeat until EOF */
        ;
      return 0;
    }
.fi
.SH PEG GRAMMARS
A grammar consists of a set of named rules.
.nf

    name <- pattern

.fi
The
.B pattern
contains one or more of the following elements.
.TP
.B name
The element stands for the entire pattern in the rule with the given
.BR name .
.TP
.BR \(dq characters \(dq
A character or string enclosed in double quotes is matched literally.
The ANSI C escape sequences are recognised within the
.IR characters .
.TP
.BR ' characters '
A character or string enclosed in single quotes is matched literally, as above.
.TP
.BR [ characters ]
A set of characters enclosed in square brackets matches any single
character from the set, with escape characters recognised as above.
If the set begins with an uparrow (^) then the set is negated (the
element matches any character
.I not
in the set).  Any pair of characters separated with a dash (-)
represents the range of characters from the first to the second,
inclusive.  A single alphabetic character or underscore is matched by
the following set.
.nf

    [a-zA-Z_]

.fi
Similarly, the following matches any single non-digit character.
.nf

    [^0-9]

.fi
.TP
.B .
A dot matches any character.  Note that the only time this fails is at
the end of file, where there is no character to match.
.TP
.BR ( \ pattern\  )
Parentheses are used for grouping (modifying the precedence of the
operators described below).
.TP
.BR { \ action\  }
Curly braces surround actions.  The action is arbitrary C source code
to be executed at the end of matching.  Any braces within the action
must be properly nested.  Any input text that was matched before the
action and delimited by angle brackets (see below) is made available
within the action as the contents of the character array
.IR yytext .
The length of (number of characters in)
.I yytext
is available in the variable
.IR yyleng .
(These variable names are historical; see
.IR lex (1).)
.TP
.B <
An opening angle bracket always matches (consuming no input) and
causes the parser to begin accumulating matched text.  This text will
be made available to actions in the variable
.IR yytext .
.TP
.B >
A closing angle bracket always matches (consuming no input) and causes
the parser to stop accumulating text for
.IR yytext .
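.PP
The matching rule for a bracketed character class can be sketched in C
as follows.  This is an illustration only, not the code that
.I peg
generates: the function name class_matches is hypothetical, the class is
passed as the text between the brackets, and escape sequences are not
handled.
.nf
```c
#include <stdbool.h>

/* Hypothetical helper: does character c belong to the class whose text
   (between the brackets) is set, e.g. "a-zA-Z_" or "^0-9"?
   Escape sequences are not handled in this sketch. */
static bool class_matches(const char *set, int c)
{
    bool negated = false;
    bool found = false;

    if (*set == '^') {                  /* a leading uparrow negates the set */
        negated = true;
        set++;
    }
    while (*set != 0) {
        unsigned char first = (unsigned char)set[0];
        if (set[1] == '-' && set[2] != 0) {
            /* "x-y" is the inclusive range from x to y */
            unsigned char last = (unsigned char)set[2];
            if (c >= first && c <= last)
                found = true;
            set += 3;
        } else {
            /* any other character stands for itself */
            if (c == first)
                found = true;
            set++;
        }
    }
    return negated ? !found : found;
}
```
.fi
For example, class_matches("a-zA-Z_", '_') is true, while
class_matches("^0-9", '5') is false because the uparrow negates the set.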
.PP
The above
.IR element s
can be made optional and/or repeatable with the following suffixes:
.TP
.RB element\  ?
The element is optional.  If present on the input, it is consumed and
the match succeeds.  If not present on the input, no text is consumed
and the match succeeds anyway.
.TP
.RB element\  +
The element is repeatable.  If present on the input, one or more
occurrences of
.I element
are consumed and the match succeeds.  If no occurrences of
.I element
are present on the input, the match fails.
.TP
.RB element\  *
The element is optional and repeatable.  If present on the input, one or more
occurrences of
.I element
are consumed and the match succeeds.  If no occurrences of
.I element
are present on the input, the match succeeds anyway.
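.PP
The three suffixes can be sketched in C as loops over a hypothetical
cursor holding the input text and the current position.  The names
below are illustrative assumptions rather than part of
.IR peg ,
and each element function is assumed to advance the cursor only when it
succeeds.
.nf
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical matcher state: NUL-terminated input and current offset. */
struct cursor {
    const char *text;
    size_t pos;
};

typedef bool (*element_fn)(struct cursor *);

/* Example element, equivalent to [0-9]: consumes one digit on success
   and leaves the position unchanged on failure. */
static bool digit(struct cursor *cur)
{
    char c = cur->text[cur->pos];
    if (c >= '0' && c <= '9') {
        cur->pos++;
        return true;
    }
    return false;
}

/* element ?  : try the element once; succeed either way. */
static bool optional(struct cursor *cur, element_fn element)
{
    (void)element(cur);
    return true;
}

/* element *  : consume as many occurrences as possible; always succeed. */
static bool zero_or_more(struct cursor *cur, element_fn element)
{
    while (element(cur))
        ;
    return true;
}

/* element +  : require one occurrence, then continue as for element *. */
static bool one_or_more(struct cursor *cur, element_fn element)
{
    if (!element(cur))
        return false;
    return zero_or_more(cur, element);
}
```
.fi
Since element * repeats for as long as the element succeeds, this
sketch also assumes that every successful element consumes at least one
character; otherwise the loop would never terminate.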
.PP
The above elements and suffixes can be converted into predicates (that
match arbitrary input text and subsequently succeed or fail
.I without
consuming that input) with the following prefixes:
.TP
.BR & \ element
The predicate succeeds only if
.I element
can be matched.  Input text scanned while matching
.I element
is not consumed from the input and remains available for subsequent
matching.
.TP
.BR ! \ element
The predicate succeeds only if
.I element
cannot be matched.  Input text scanned while matching
.I element
is not consumed from the input and remains available for subsequent
matching.  A popular idiom is
.nf

    !.

.fi
which matches the end of file, after the last character of the input
has already been consumed.
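.PP
Both predicates can be sketched in C by saving the input position,
attempting the element, and then restoring the position whatever the
outcome.  The cursor type and the function names are hypothetical.
.nf
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical matcher state: NUL-terminated input and current offset. */
struct cursor {
    const char *text;
    size_t pos;
};

typedef bool (*element_fn)(struct cursor *);

/* The element "." : any single character; fails only at end of input. */
static bool any_char(struct cursor *cur)
{
    if (cur->text[cur->pos] == 0)
        return false;
    cur->pos++;
    return true;
}

/* & element : succeed if element matches, then rewind so nothing is consumed. */
static bool and_predicate(struct cursor *cur, element_fn element)
{
    size_t saved = cur->pos;
    bool matched = element(cur);
    cur->pos = saved;
    return matched;
}

/* ! element : succeed if element does not match; also consumes nothing. */
static bool not_predicate(struct cursor *cur, element_fn element)
{
    return !and_predicate(cur, element);
}
```
.fi
With the cursor at the end of the input, calling not_predicate with
any_char succeeds without moving the cursor, which mirrors the !. idiom.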
.PP
A special form of the '&' predicate is provided:
.TP
.BR & {\ expression\ }
In this predicate the simple C
.I expression
.RB ( not
statement) is evaluated immediately when the parser reaches the
predicate.  If the
.I expression
yields non-zero (true) the 'match' succeeds and the parser continues
with the next element in the pattern.  If the
.I expression
yields zero (false) the 'match' fails and the parser backs up to look
for an alternative parse of the input.
.PP
Several elements (with or without prefixes and suffixes) can be
combined into a
.I sequence
by writing them one after the other.  The entire sequence matches only
if each individual element within it matches, from left to right.
.PP
Sequences can be separated into ordered alternatives by the
alternation operator '/'.
.TP
.RB sequence-1\  / \ sequence-2\  / \ ...\  / \ sequence-N
Each sequence is tried in turn until one of them matches, at which
time matching for the overall pattern succeeds.  If none of the
sequences matches then the match of the overall pattern fails.
.PP
Finally, the pound sign (#) introduces a comment (discarded) that
continues until the end of the line.
.PP
To summarise the above, the parser tries to match the input text
against a pattern containing literals, names (representing other
rules), and various operators (written as prefixes, suffixes,
juxtaposition for sequencing, and an infix alternation operator) that
modify how the elements within the pattern are matched.  Matches are
made from left to right, 'descending' into named sub-rules as they are
encountered.  If the matching process fails, the parser 'backtracks'
('rewinding' the input appropriately in the process) to find the
nearest alternative 'path' through the grammar.  In other words the
parser performs a depth-first, left-to-right search for the first
successfully-matching path through the rules.  If found, the actions
along the successful path are executed (in the order they were
encountered).
.PP
Note that predicates are evaluated
.I immediately
during the search for a successful match, since they contribute to the
success or failure of the search.  Actions, however, are evaluated
only after a successful match has been found.
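.PP
The search strategy summarised above can be sketched by hand in C for a
small hypothetical rule, start <- "ab" {1} "c" / "a" {2} . where {1}
and {2} stand for two actions.  This is not the code that
.I peg
generates; the parser structure and all function names are assumptions.
It only illustrates the idea: each alternative starts from a saved input
position, a failing alternative rewinds that position and forgets any
actions it queued, and queued actions are run only after the whole match
has succeeded.
.nf
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 16

/* Hypothetical parser state: input, position, and actions queued so far. */
struct parser {
    const char *text;
    size_t pos;
    int actions[MAX_ACTIONS];   /* identifiers of actions to run later */
    size_t nactions;
};

static bool literal(struct parser *p, const char *s)
{
    size_t n = strlen(s);
    if (strncmp(p->text + p->pos, s, n) != 0)
        return false;
    p->pos += n;
    return true;
}

static bool any_char(struct parser *p)
{
    if (p->text[p->pos] == 0)
        return false;
    p->pos++;
    return true;
}

/* Queue an action instead of running it immediately. */
static void defer(struct parser *p, int action)
{
    if (p->nactions < MAX_ACTIONS)
        p->actions[p->nactions++] = action;
}

/* start <- "ab" {1} "c" / "a" {2} . */
static bool start(struct parser *p)
{
    size_t saved_pos = p->pos;
    size_t saved_n = p->nactions;

    /* First alternative. */
    if (literal(p, "ab")) {
        defer(p, 1);
        if (literal(p, "c"))
            return true;
    }
    p->pos = saved_pos;          /* backtrack: rewind the input */
    p->nactions = saved_n;       /* and drop actions queued on the failed path */

    /* Second alternative. */
    if (literal(p, "a")) {
        defer(p, 2);
        if (any_char(p))
            return true;
    }
    p->pos = saved_pos;
    p->nactions = saved_n;
    return false;
}
```
.fi
A driver would call start() and, only if it returns true, run the queued
actions in order.  For the input "abd" the first alternative is abandoned
after consuming "ab", so only action 2 remains queued.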
.SH PEG GRAMMAR FOR PEG GRAMMARS
The grammar for
.I peg
grammars is shown below.  This will both illustrate and formalise
the above description.
.nf

    Grammar         <- Spacing Definition+ EndOfFile

    Definition      <- Identifier LEFTARROW Expression
    Expression      <- Sequence ( SLASH Sequence )*
    Sequence        <- Prefix*
    Prefix          <- AND Action
                     / ( AND / NOT )? Suffix
    Suffix          <- Primary ( QUERY / STAR / PLUS )?
    Primary         <- Identifier !LEFTARROW
                     / OPEN Expression CLOSE
                     / Literal
                     / Class
                     / DOT
                     / Action
                     / BEGIN
                     / END

    Identifier      <- < IdentStart IdentCont* > Spacing
    IdentStart      <- [a-zA-Z_]
    IdentCont       <- IdentStart / [0-9]
    Literal         <- ['] < ( !['] Char  )* > ['] Spacing
                     / ["] < ( !["] Char  )* > ["] Spacing
    Class           <- '[' < ( !']' Range )* > ']' Spacing
    Range           <- Char '-' Char / Char
    Char            <- '\\\\' [abefnrtv'"\\[\\]\\\\]
                     / '\\\\' [0-3][0-7][0-7]
                     / '\\\\' [0-7][0-7]?
                     / '\\\\' '-'
                     / !'\\\\' .
    LEFTARROW       <- '<-' Spacing
    SLASH           <- '/' Spacing
    AND             <- '&' Spacing
    NOT             <- '!' Spacing
    QUERY           <- '?' Spacing
    STAR            <- '*' Spacing
    PLUS            <- '+' Spacing
    OPEN            <- '(' Spacing
    CLOSE           <- ')' Spacing
    DOT             <- '.' Spacing
    Spacing         <- ( Space / Comment )*
    Comment         <- '#' ( !EndOfLine . )* EndOfLine
    Space           <- ' ' / '\\t' / EndOfLine
    EndOfLine       <- '\\r\\n' / '\\n' / '\\r'
    EndOfFile       <- !.
    Action          <- '{' < [^}]* > '}' Spacing
    BEGIN           <- '<' Spacing
    END             <- '>' Spacing

.fi
.SH LEG GRAMMARS
.I leg
is a variant of
.I peg
that adds some features of
.IR lex (1)
and
.IR yacc (1).
It differs from
.I peg
in the following ways.
.TP
.BI %{\  text... \ %}
A declaration section can appear anywhere that a rule definition is
expected.  The
.I text
between the delimiters '%{' and '%}' is copied verbatim to the
generated C parser code
.I before
the code that implements the parser itself.
.TP
.IB name\  = \ pattern
The 'assignment' operator replaces the left arrow operator '<-'.
.TP
.B rule-name
Hyphens can appear as letters in the names of rules.  Each hyphen is
converted into an underscore in the generated C source code.  A single
hyphen '-' is a legal rule name.
.nf

    -       = [ \\t\\n\\r]*
    number  = [0-9]+                 -
    name    = [a-zA-Z_][a-zA-Z_0-9]* -
    l-paren = '('                    -
    r-paren = ')'                    -

.fi
This example shows how ignored whitespace can be obvious when reading
the grammar and yet unobtrusive when placed liberally at the end of
every rule associated with a lexical element.
.TP
.IB seq-1\  | \ seq-2
The alternation operator is vertical bar '|' rather than forward
slash '/'.  The
.I peg
rule
.nf

    name <- sequence-1
          / sequence-2
          / sequence-3

.fi
is therefore written
.nf

    name = sequence-1
         | sequence-2
         | sequence-3
         ;

.fi
in
.I leg
(with the final semicolon being optional, as described next).
.TP
.IB pattern\  ;
A semicolon punctuator can optionally terminate a
.IR pattern .
.TP
.BI %% \ text...
A double percent '%%' terminates the rules (and declarations) section of
the grammar.  All
.I text
following '%%' is copied verbatim to the generated C parser code
.I after
the parser implementation code.
.TP
.BI $$\ = \ value
A sub-rule can return a semantic
.I value
from an action by assigning it to the pseudo-variable '$$'.  All
semantic values must have the same type (which defaults to 'int').
This type can be changed by defining YYSTYPE in a declaration section.
.TP
.IB identifier : name
The semantic value returned (by assigning to '$$') from the sub-rule
.I name
is associated with the
.I identifier
and can be referred to in subsequent actions.
.PP
The desk calculator example below illustrates the use of '$$' and ':'.
.SH LEG EXAMPLE: A DESK CALCULATOR
The extensions in
.I leg
described above allow useful parsers and evaluators (including
declarations, grammar rules, and supporting C functions such
as 'main') to be kept within a single source file.  To illustrate this
we show a simple desk calculator supporting the four common arithmetic
operators and named variables.  The intermediate results of arithmetic
evaluation will be accumulated on an implicit stack by returning them
as semantic values from sub-rules.
.nf

    %{
    #include <stdio.h>     /* printf() */
    #include <stdlib.h>    /* atoi() */
    int vars[26];
    %}

    Stmt    = - e:Expr EOL                  { printf("%d\\n", e); }
            | ( !EOL . )* EOL               { printf("error\\n"); }

    Expr    = i:ID ASSIGN s:Sum             { $$ = vars[i] = s; }
            | s:Sum                         { $$ = s; }

    Sum     = l:Product
                    ( PLUS  r:Product       { l += r; }
                    | MINUS r:Product       { l -= r; }
                    )*                      { $$ = l; }

    Product = l:Value
                    ( TIMES  r:Value        { l *= r; }
                    | DIVIDE r:Value        { l /= r; }
                    )*                      { $$ = l; }

    Value   = i:NUMBER                      { $$ = atoi(yytext); }
            | i:ID !ASSIGN                  { $$ = vars[i]; }
            | OPEN i:Expr CLOSE             { $$ = i; }

    NUMBER  = < [0-9]+ >    -               { $$ = atoi(yytext); }
    ID      = < [a-z]  >    -               { $$ = yytext[0] - 'a'; }
    ASSIGN  = '='           -
    PLUS    = '+'           -
    MINUS   = '-'           -
    TIMES   = '*'           -
    DIVIDE  = '/'           -
    OPEN    = '('           -
    CLOSE   = ')'           -

    -       = [ \\t]*
    EOL     = '\\n' | '\\r\\n' | '\\r' | ';'

    %%

    int main()
    {
      while (yyparse())
        ;
      return 0;
    }

.fi
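.PP
For comparison, the Sum, Product and Value rules can be mirrored by a
small hand-written recursive-descent evaluator in C.  This is a
hypothetical sketch, not the parser that
.I leg
would generate: it omits variables and assignment, assumes well-formed
input containing no division by zero, and skips only spaces.  It shows
how the repetition in Sum and Product folds each new operand into the
running value l from left to right, so that 8 - 3 - 2 evaluates to 3.
.nf
```c
#include <ctype.h>

/* Hypothetical input cursor shared by the functions below. */
static const char *src;

static int sum(void);

/* Counterpart of the '-' rule, restricted to spaces. */
static void spacing(void)
{
    while (*src == ' ')
        src++;
}

/* Value = NUMBER | OPEN Sum CLOSE   (variables and assignment omitted) */
static int value(void)
{
    int v = 0;

    if (*src == '(') {
        src++;
        spacing();
        v = sum();
        if (*src == ')')
            src++;
        spacing();
        return v;
    }
    while (isdigit((unsigned char)*src))
        v = v * 10 + (*src++ - '0');
    spacing();
    return v;
}

/* Product: fold each further Value into l, left to right. */
static int product(void)
{
    int l = value();

    for (;;) {
        if (*src == '*') {
            src++;
            spacing();
            l *= value();
        } else if (*src == '/') {
            src++;
            spacing();
            l /= value();
        } else {
            return l;
        }
    }
}

/* Sum: fold each further Product into l, left to right. */
static int sum(void)
{
    int l = product();

    for (;;) {
        if (*src == '+') {
            src++;
            spacing();
            l += product();
        } else if (*src == '-') {
            src++;
            spacing();
            l -= product();
        } else {
            return l;
        }
    }
}

/* Evaluate one expression held in a string. */
static int evaluate(const char *text)
{
    src = text;
    spacing();
    return sum();
}
```
.fi
Because each loop combines the accumulated value with the next operand
as soon as it is read, subtraction and division group to the left, just
as the actions { l -= r; } and { l /= r; } do inside the repetitions of
the grammar.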
.SH LEG GRAMMAR FOR LEG GRAMMARS
The grammar for
.I leg
grammars is shown below.  This will both illustrate and formalise the
above description.
.nf

    grammar =       -
                    ( declaration | definition )+
                    trailer? end-of-file

    declaration =   '%{' < ( !'%}' . )* > RPERCENT

    trailer =       '%%' < .* >

    definition =    identifier EQUAL expression SEMICOLON?

    expression =    sequence ( BAR sequence )*

    sequence =      prefix+

    prefix =        AND action
    |               ( AND | NOT )? suffix

    suffix =        primary ( QUERY | STAR | PLUS )?

    primary =       identifier COLON identifier !EQUAL
    |               identifier !EQUAL
    |               OPEN expression CLOSE
    |               literal
    |               class
    |               DOT
    |               action
    |               BEGIN
    |               END

    identifier =    < [-a-zA-Z_][-a-zA-Z_0-9]* > -

    literal =       ['] < ( !['] char )* > ['] -
    |               ["] < ( !["] char )* > ["] -

    class =         '[' < ( !']' range )* > ']' -

    range =         char '-' char | char

    char =          '\\\\' [abefnrtv'"\\[\\]\\\\]
    |               '\\\\' [0-3][0-7][0-7]
    |               '\\\\' [0-7][0-7]?
    |               !'\\\\' .

    action =        '{' < [^}]* > '}' -

    EQUAL =         '=' -
    COLON =         ':' -
    SEMICOLON =     ';' -
    BAR =           '|' -
    AND =           '&' -
    NOT =           '!' -
    QUERY =         '?' -
    STAR =          '*' -
    PLUS =          '+' -
    OPEN =          '(' -
    CLOSE =         ')' -
    DOT =           '.' -
    BEGIN =         '<' -
    END =           '>' -
    RPERCENT =      '%}' -

    - =             ( space | comment )*
    space =         ' ' | '\\t' | end-of-line
    comment =       '#' ( !end-of-line . )* end-of-line
    end-of-line =   '\\r\\n' | '\\n' | '\\r'
    end-of-file =   !.

.fi
.SH CUSTOMISING THE PARSER
The following symbols can be redefined in declaration sections to
modify the generated parser code.
.TP
.B YYSTYPE
The semantic value type.  The pseudo-variable '$$' and the
identifiers 'bound' to rule results with the colon operator ':' should
all be considered as being declared to have this type.  The default
value is 'int'.
.TP
.B YYPARSE
The name of the main entry point to the parser.  The default value
is 'yyparse'.
.TP
.B YYPARSEFROM
The name of an alternative entry point to the parser.  This function
expects one argument: the function corresponding to the rule from
which the search for a match should begin.  The default
is 'yyparsefrom'.  Note that yyparse() is defined as
.nf

    int yyparse() { return yyparsefrom(yy_foo); }

.fi
where 'foo' is the name of the first rule in the grammar.
.TP
.BI YY_INPUT( buf , \ result , \ max_size )
This macro is invoked by the parser to obtain more input text.
.I buf
points to an area of memory that can hold at most
.I max_size
characters.  The macro should copy input text to
.I buf
and then assign the integer variable
.I result
to indicate the number of characters copied.  If no more input is available,
the macro should assign 0 to
.IR result .
By default, the YY_INPUT macro is defined as follows.
.nf

    #define YY_INPUT(buf, result, max_size)        \\
    {                                              \\
      int yyc= getchar();                          \\
      result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \\
    }

.fi
.TP
.B YY_DEBUG
If this symbol is defined then additional code will be included in
the parser that prints vast quantities of arcane information to the
standard error while the parser is running.
.TP
.B YY_BEGIN
This macro is invoked to mark the start of input text that will be
made available in actions as 'yytext'.  This corresponds to
occurrences of '<' in the grammar.  These are converted into
predicates that are expected to succeed.  The default definition
.nf

    #define YY_BEGIN (yybegin= yypos, 1)

.fi
therefore saves the current input position and returns 1 ('true') as
the result of the predicate.
.TP
.B YY_END
This macro corresponds to '>' in the grammar.  Again, it is a
predicate so the default definition saves the input position
before 'succeeding'.
.nf

    #define YY_END (yyend= yypos, 1)

.fi
.TP
.BI YY_PARSE( T )
This macro declares the parser entry points (yyparse and yyparsefrom)
to be of type
.IR T .
The default definition
.nf

    #define YY_PARSE(T) T

.fi
leaves yyparse() and yyparsefrom() with global visibility.  If they
should not be externally visible in other source files, this macro can
be redefined to declare them 'static'.
.nf

    #define YY_PARSE(T) static T

.fi
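.PP
As an illustration of redefining YY_INPUT, the following sketch supplies
input from a string held in memory instead of from standard input.  The
names input_text, input_pos and read_from_string are hypothetical; in a
.I leg
grammar this code would be placed in a declaration section so that it
precedes the parser code.
.nf
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input. */
static const char *input_text = "x = 1 + 2;";
static size_t input_pos = 0;

/* Copy at most max_size characters of the remaining input into buf and
   return how many were copied; 0 signals that no more input is available. */
static int read_from_string(char *buf, int max_size)
{
    size_t remaining = strlen(input_text) - input_pos;
    size_t n = remaining < (size_t)max_size ? remaining : (size_t)max_size;

    memcpy(buf, input_text + input_pos, n);
    input_pos += n;
    return (int)n;
}

#define YY_INPUT(buf, result, max_size) { (result) = read_from_string((buf), (max_size)); }
```
.fi
Each invocation hands the parser the next chunk of the string; once the
string is exhausted, result is set to 0, which the parser treats as the
end of the input.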
.PP
The following variables can be referred to within actions.
.TP
.B char *yybuf
This variable points to the parser's input buffer used to store input
text that has not yet been matched.
.TP
.B int yypos
This is the offset (in yybuf) of the next character to be matched and
consumed.
.TP
.B char *yytext
The most recent matched text delimited by '<' and '>' is stored in this variable.
.TP
.B int yyleng
This variable indicates the number of characters in 'yytext'.
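.PP
The following sketch shows how the default YY_BEGIN and YY_END
definitions, being comma expressions that always yield 1, can be chained
with ordinary matching code, and how the text between the two recorded
positions relates to yytext and yyleng.  The declarations of yybuf,
yypos, yybegin, yyend, yytext and yyleng here are hypothetical stand-ins
for the parser's internal variables, and number() is a hand-written
stand-in for the grammar fragment < [0-9]+ >; this is not the code that
.I peg
generates.
.nf
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the parser's internal state. */
static char yybuf[] = "  42 apples";
static int yypos, yybegin, yyend;
static char yytext[64];
static int yyleng;

/* The default definitions quoted above. */
#define YY_BEGIN (yybegin= yypos, 1)
#define YY_END   (yyend= yypos, 1)

static int digit(void)
{
    if (yybuf[yypos] >= '0' && yybuf[yypos] <= '9') {
        yypos++;
        return 1;
    }
    return 0;
}

/* Roughly what  < [0-9]+ >  amounts to: the markers are predicates that
   always succeed, so they can sit in the same && chain as real matches. */
static int number(void)
{
    if (YY_BEGIN && digit()) {
        while (digit())
            ;
        if (YY_END) {
            yyleng = yyend - yybegin;
            if (yyleng >= (int)sizeof yytext) {
                yypos = yybegin;          /* too long for this sketch */
                return 0;
            }
            memcpy(yytext, yybuf + yybegin, (size_t)yyleng);
            yytext[yyleng] = 0;
            return 1;
        }
    }
    return 0;
}
```
.fi
Starting with yypos at the first digit of "42", number() succeeds,
leaves yypos just past the digits, and sets yytext to "42" and yyleng
to 2.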
.SH DIAGNOSTICS
.I peg
and
.I leg
warn about the following conditions while converting a grammar into a parser.
.TP
.B syntax error
The input grammar was malformed in some way.  The error message will
include the text about to be matched (often backed up a huge amount
from the actual location of the error) and the line number of the most
recently considered character (which is often the real location of the
problem).
.TP
.B rule 'foo' used but not defined
The grammar referred to a rule named 'foo' but no definition for it
was given.  Attempting to use the generated parser will likely result
in errors from the linker due to undefined symbols associated with the
missing rule.
.TP
.B rule 'foo' defined but not used
The grammar defined a rule named 'foo' and then ignored it.  The code
associated with the rule is included in the generated parser which
will in all other respects be healthy.
.TP
.B possible infinite left recursion in rule 'foo'
There exists at least one path through the grammar that leads from the
rule 'foo' back to (a recursive invocation of) the same rule without
consuming any input.
.PP
Left recursion, especially that found in standards documents, is
often 'direct' and implies trivial repetition.
.nf

    # (6.7.6)
    direct-abstract-declarator =
        LPAREN abstract-declarator RPAREN
    |   direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
    |   direct-abstract-declarator? LBRACKET STAR RBRACKET
    |   direct-abstract-declarator? LPAREN param-type-list? RPAREN

.fi
The recursion can easily be eliminated by converting the parts of the
pattern following the recursion into a repeatable suffix.
.nf

    # (6.7.6)
    direct-abstract-declarator =
        direct-abstract-declarator-head?
        direct-abstract-declarator-tail*

    direct-abstract-declarator-head =
        LPAREN abstract-declarator RPAREN

    direct-abstract-declarator-tail =
        LBRACKET assign-expr? RBRACKET
    |   LBRACKET STAR RBRACKET
    |   LPAREN param-type-list? RPAREN

.fi
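.PP
The rewritten rule describes iteration rather than recursion, which is
what allows a recursive-descent parser to handle it.  The sketch below
mirrors direct-abstract-declarator-head? direct-abstract-declarator-tail*
in C over a deliberately simplified, hypothetical token set: each token
is a single character, abstract-declarator is reduced to a single '*',
and the optional assign-expr and param-type-list are omitted.  It
illustrates the transformation only and is not part of
.IR leg .
.nf
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input: one character per token. */
struct cursor {
    const char *text;
    size_t pos;
};

/* Match the exact token sequence s, or consume nothing. */
static bool tokens(struct cursor *cur, const char *s)
{
    size_t n = strlen(s);
    if (strncmp(cur->text + cur->pos, s, n) != 0)
        return false;
    cur->pos += n;
    return true;
}

/* head = LPAREN abstract-declarator RPAREN   (declarator reduced to '*') */
static bool head(struct cursor *cur)
{
    return tokens(cur, "(*)");
}

/* tail = LBRACKET RBRACKET | LBRACKET STAR RBRACKET | LPAREN RPAREN */
static bool tail(struct cursor *cur)
{
    return tokens(cur, "[]") || tokens(cur, "[*]") || tokens(cur, "()");
}

/* direct-abstract-declarator = head? tail*
   The former left recursion has become a loop. */
static bool direct_abstract_declarator(struct cursor *cur)
{
    (void)head(cur);
    while (tail(cur))
        ;
    return true;
}
```
.fi
For the input (*)[][*]() the head consumes (*) and the loop then
consumes the three tails in turn, with no recursive call to
direct_abstract_declarator at all.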
.SH BUGS
The 'yy' and 'YY' prefixes cannot be changed.
.PP
Left recursion is detected in the input grammar but is not handled
correctly in the generated parser.
.PP
Diagnostics for errors in the input grammar are obscure and not
particularly helpful.
.PP
Several commonly-used
.IR lex (1)
features (yywrap(), yyin, etc.) are completely absent.
.PP
The generated parser does not contain '#line' directives to direct C
compiler errors back to the grammar description when appropriate.
.SH SEE ALSO
D. Val Schorre,
.I META II, a syntax-oriented compiler writing language,
19th ACM National Conference, 1964, pp.\ 41.301--41.311.  Describes a
self-implementing parser generator for analytic grammars with no
backtracking.
.PP
Alexander Birman,
.I The TMG Recognition Schema,
Ph.D. dissertation, Princeton, 1970.  A mathematical treatment of the
power and complexity of recursive-descent parsing with backtracking.
.PP
Bryan Ford,
.I Parsing Expression Grammars: A Recognition-Based Syntactic Foundation,
ACM SIGPLAN Symposium on Principles of Programming Languages, 2004.
Defines PEGs and analyses them in relation to context-free and regular
grammars.  Introduces the syntax adopted in
.IR peg .
.PP
The standard Unix utilities
.IR lex (1)
and
.IR yacc (1)
which influenced the syntax and features of
.IR leg .
.PP
The source code for
.I peg
and
.I leg
whose grammar parsers are written using themselves.
.PP
The latest version of this software and documentation:
.nf

    http://piumarta.com/software/peg

.fi
.SH AUTHOR
.IR peg ,
.I leg
and this manual page were written by Ian Piumarta (first-name at
last-name dot com) while investigating the viability of regular- and
parsing-expression grammars for efficiently extracting type and
signature information from C header files.
.PP
Please send bug reports and suggestions for improvements to the author
at the above address.