- .\" Copyright (c) 2007 by Ian Piumarta
- .\" All rights reserved.
- .\"
- .\" Permission is hereby granted, free of charge, to any person obtaining a
- .\" copy of this software and associated documentation files (the 'Software'),
- .\" to deal in the Software without restriction, including without limitation
- .\" the rights to use, copy, modify, merge, publish, distribute, and/or sell
- .\" copies of the Software, and to permit persons to whom the Software is
- .\" furnished to do so, provided that the above copyright notice(s) and this
- .\" permission notice appear in all copies of the Software. Acknowledgement
- .\" of the use of this Software in supporting documentation would be
- .\" appreciated but is not required.
- .\"
- .\" THE SOFTWARE IS PROVIDED 'AS IS'. USE ENTIRELY AT YOUR OWN RISK.
- .\"
- .\" Last edited: 2007-09-13 08:40:20 by piumarta on emilia.local
- .\"
- .TH PEG 1 "May 2007" "Version 0.1"
- .SH NAME
- peg, leg \- parser generators
- .SH SYNOPSIS
- .B peg
- .B [\-hvV \-ooutput]
- .I [filename ...]
- .sp 0
- .B leg
- .B [\-hvV \-ooutput]
- .I [filename ...]
- .SH DESCRIPTION
- .I peg
- and
- .I leg
- are tools for generating recursive-descent parsers: programs that
- perform pattern matching on text. They process a Parsing Expression
- Grammar (PEG) [Ford 2004] to produce a program that recognises legal
- sentences of that grammar.
- .I peg
- processes PEGs written using the original syntax described by Ford;
- .I leg
- processes PEGs written using slightly different syntax and conventions
- that are intended to make it an attractive replacement for parsers
- built with
- .IR lex (1)
- and
- .IR yacc (1).
- Unlike
- .I lex
- and
- .IR yacc ,
- .I peg
- and
- .I leg
- support unlimited backtracking, provide ordered choice as a means for
- disambiguation, and can combine scanning (lexical analysis) and
- parsing (syntactic analysis) into a single activity.
- .PP
- .I peg
- reads the specified
- .IR filename s,
- or standard input if no
- .IR filename s
- are given, for a grammar describing the parser to generate.
- .I peg
- then generates a C source file that defines a function
- .IR yyparse ().
- This C source file can be included in, or compiled and then linked
- with, a client program. Each time the client program calls
- .IR yyparse ()
- the parser consumes input text according to the parsing rules,
- starting from the first rule in the grammar.
- .IR yyparse ()
- returns non-zero if the input could be parsed according to the
- grammar; it returns zero if the input could not be parsed.
- .PP
- The prefix 'yy' or 'YY' is prepended to all externally-visible symbols
- in the generated parser. This is intended to reduce the risk of
- namespace pollution in client programs. (The choice of 'yy' is
- historical; see
- .IR lex (1)
- and
- .IR yacc (1),
- for example.)
- .SH OPTIONS
- .I peg
- and
- .I leg
- provide the following options:
- .TP
- .B \-h
- prints a summary of available options and then exits.
- .TP
- .B \-ooutput
- writes the generated parser to the file
- .B output
- instead of the standard output.
- .TP
- .B \-v
- writes verbose information to standard error while working.
- .TP
- .B \-V
- writes version information to standard error then exits.
- .SH A SIMPLE EXAMPLE
- The following
- .I peg
- input specifies a grammar with a single rule (called 'start') that is
- satisfied when the input contains the string "username".
- .nf
- start <- "username"
- .fi
- (The quotation marks are
- .I not
- part of the matched text; they serve to indicate a literal string to
- be matched.) In other words,
- .IR yyparse ()
- in the generated C source will return non-zero only if the next eight
- characters read from the input spell the word "username". If the
- input contains anything else,
- .IR yyparse ()
- returns zero and no input will have been consumed. (Subsequent calls
- to
- .IR yyparse ()
- will also return zero, since the parser is effectively blocked looking
- for the string "username".) To ensure progress we can add an
- alternative clause to the 'start' rule that will match any single
- character if "username" is not found.
- .nf
- start <- "username"
- / .
- .fi
- .IR yyparse ()
- now always returns non-zero (except at the very end of the input). To
- do something useful we can add actions to the rules. These actions
- are performed after a complete match is found (starting from the first
- rule) and are chosen according to the 'path' taken through the grammar
- to match the input. (Linguists would call this path a 'phrase
- marker'.)
- .nf
- start <- "username" { printf("%s\\n", getlogin()); }
- / < . > { putchar(yytext[0]); }
- .fi
- The first line instructs the parser to print the user's login name
- whenever it sees "username" in the input. If that match fails, the
- second line tells the parser to echo the next character on the input to
- the standard output. Our parser is now performing useful work: it
- will copy the input to the output, replacing all occurrences of
- "username" with the user's account name.
- .PP
- Note the angle brackets ('<' and '>') that were added to the second
- alternative. These have no effect on the meaning of the rule, but
- serve to delimit the text made available to the following action in
- the variable
- .IR yytext .
- .PP
- If the above grammar is placed in the file
- .BR username.peg ,
- running the command
- .nf
- peg -o username.c username.peg
- .fi
- will save the corresponding parser in the file
- .BR username.c .
- To create a complete program this parser could be included by a C
- program as follows.
- .nf
- #include <stdio.h> /* printf(), putchar() */
- #include <unistd.h> /* getlogin() */
- #include "username.c" /* yyparse() */
- int main()
- {
- while (yyparse()) /* repeat until EOF */
- ;
- return 0;
- }
- .fi
- .SH PEG GRAMMARS
- A grammar consists of a set of named rules.
- .nf
- name <- pattern
- .fi
- The
- .B pattern
- contains one or more of the following elements.
- .TP
- .B name
- The element stands for the entire pattern in the rule with the given
- .BR name .
- .TP
- .BR \(dq characters \(dq
- A character or string enclosed in double quotes is matched literally.
- The ANSI C escape sequences are recognised within the
- .IR characters .
- .TP
- .BR ' characters '
- A character or string enclosed in single quotes is matched literally, as above.
- .TP
- .BR [ characters ]
- A set of characters enclosed in square brackets matches any single
- character from the set, with escape characters recognised as above.
- If the set begins with an uparrow (^) then the set is negated (the
- element matches any character
- .I not
- in the set). Any pair of characters separated with a dash (-)
- represents the range of characters from the first to the second,
- inclusive. A single alphabetic character or underscore is matched by
- the following set.
- .nf
- [a-zA-Z_]
- .fi
- Similarly, the following matches any single non-digit character.
- .nf
- [^0-9]
- .fi
- .TP
- .B .
- A dot matches any character. Note that the only time this fails is at
- the end of file, where there is no character to match.
- .TP
- .BR ( \ pattern\ )
- Parentheses are used for grouping (modifying the precedence of the
- operators described below).
- .TP
- .BR { \ action\ }
- Curly braces surround actions. The action is arbitrary C source code
- to be executed at the end of matching. Any braces within the action
- must be properly nested. Any input text that was matched before the
- action and delimited by angle brackets (see below) is made available
- within the action as the contents of the character array
- .IR yytext .
- The length of (number of characters in)
- .I yytext
- is available in the variable
- .IR yyleng .
- (These variable names are historical; see
- .IR lex (1).)
- .TP
- .B <
- An opening angle bracket always matches (consuming no input) and
- causes the parser to begin accumulating matched text. This text will
- be made available to actions in the variable
- .IR yytext .
- .TP
- .B >
- A closing angle bracket always matches (consuming no input) and causes
- the parser to stop accumulating text for
- .IR yytext .
- .PP
- The above
- .IR element s
- can be made optional and/or repeatable with the following suffixes:
- .TP
- .RB element\ ?
- The element is optional. If present on the input, it is consumed and
- the match succeeds. If not present on the input, no text is consumed
- and the match succeeds anyway.
- .TP
- .RB element\ +
- The element is repeatable. If present on the input, one or more
- occurrences of
- .I element
- are consumed and the match succeeds. If no occurrences of
- .I element
- are present on the input, the match fails.
- .TP
- .RB element\ *
- The element is optional and repeatable. If present on the input, one or more
- occurrences of
- .I element
- are consumed and the match succeeds. If no occurrences of
- .I element
- are present on the input, the match succeeds anyway.
- .PP
- The above elements and suffixes can be converted into predicates (that
- match arbitrary input text and subsequently succeed or fail
- .I without
- consuming that input) with the following prefixes:
- .TP
- .BR & \ element
- The predicate succeeds only if
- .I element
- can be matched. Input text scanned while matching
- .I element
- is not consumed from the input and remains available for subsequent
- matching.
- .TP
- .BR ! \ element
- The predicate succeeds only if
- .I element
- cannot be matched. Input text scanned while matching
- .I element
- is not consumed from the input and remains available for subsequent
- matching. A popular idiom is
- .nf
- !.
- .fi
- which matches the end of file, after the last character of the input
- has already been consumed.
- .PP
- A special form of the '&' predicate is provided:
- .TP
- .BR & {\ expression\ }
- In this predicate the simple C
- .I expression
- .RB ( not
- statement) is evaluated immediately when the parser reaches the
- predicate. If the
- .I expression
- yields non-zero (true) the 'match' succeeds and the parser continues
- with the next element in the pattern. If the
- .I expression
- yields zero (false) the 'match' fails and the parser backs up to look
- for an alternative parse of the input.
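- .PP
- The behaviour of the '&' and '!' predicates can be sketched in plain C.
- This is a minimal illustration, not the code that
- .I peg
- generates: the Cursor type and all four function names are hypothetical,
- and the input is assumed to be a NUL-terminated string held in memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over an in-memory input (not part of peg's output). */
typedef struct {
    const char *text;  /* NUL-terminated input */
    size_t      pos;   /* offset of the next unconsumed character */
} Cursor;

/* "lit": matches a literal string, consuming it on success. */
static int match_literal(Cursor *c, const char *lit)
{
    size_t n = strlen(lit);
    if (strncmp(c->text + c->pos, lit, n) != 0)
        return 0;
    c->pos += n;
    return 1;
}

/* &"lit": succeeds if the literal comes next, then restores the position. */
static int and_literal(Cursor *c, const char *lit)
{
    size_t saved = c->pos;
    int ok = match_literal(c, lit);
    c->pos = saved;
    return ok;
}

/* !"lit": succeeds if the literal does NOT come next; consumes nothing. */
static int not_literal(Cursor *c, const char *lit)
{
    return !and_literal(c, lit);
}

/* !. : succeeds only when no character is left to match. */
static int at_end_of_input(const Cursor *c)
{
    return c->text[c->pos] == '\0';
}
```

- Both predicates leave the position unchanged whether they succeed or
- fail, which is what makes the scanned text available for subsequent matching.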
- .PP
- Several elements (with or without prefixes and suffixes) can be
- combined into a
- .I sequence
- by writing them one after the other. The entire sequence matches only
- if each individual element within it matches, from left to right.
- .PP
- Sequences can be separated into disjoint alternatives by the
- alternation operator '/'.
- .TP
- .RB sequence-1\ / \ sequence-2\ / \ ...\ / \ sequence-N
- Each sequence is tried in turn until one of them matches, at which
- time matching for the overall pattern succeeds. If none of the
- sequences matches then the match of the overall pattern fails.
- .PP
- Finally, the pound sign (#) introduces a comment (discarded) that
- continues until the end of the line.
- .PP
- To summarise the above, the parser tries to match the input text
- against a pattern containing literals, names (representing other
- rules), and various operators (written as prefixes, suffixes,
- juxtaposition for sequencing and an infix alternation operator) that
- modify how the elements within the pattern are matched. Matches are
- made from left to right, 'descending' into named sub-rules as they are
- encountered. If the matching process fails, the parser 'backtracks'
- ('rewinding' the input appropriately in the process) to find the
- nearest alternative 'path' through the grammar. In other words the
- parser performs a depth-first, left-to-right search for the first
- successfully-matching path through the rules. If found, the actions
- along the successful path are executed (in the order they were
- encountered).
- .PP
- Note that predicates are evaluated
- .I immediately
- during the search for a successful match, since they contribute to the
- success or failure of the search. Actions, however, are evaluated
- only after a successful match has been found.
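- .PP
- Ordered choice with backtracking can be sketched by hand in C. The
- example assumes a hypothetical grammar with the single rule
- .nf
- rule <- "ab" "c" / "a" "bd"
- .fi
- and an input held in a NUL-terminated string; the function names are
- illustrative and do not correspond to the code that
- .I peg
- generates.

```c
#include <stddef.h>
#include <string.h>

/* Consumes lit at *pos if present; leaves *pos untouched otherwise. */
static int literal(const char *in, size_t *pos, const char *lit)
{
    size_t n = strlen(lit);
    if (strncmp(in + *pos, lit, n) != 0)
        return 0;
    *pos += n;
    return 1;
}

/* rule <- "ab" "c" / "a" "bd"
 * Returns 1 or 2 for the alternative that matched, or 0 on failure. */
static int rule(const char *in, size_t *pos)
{
    size_t start = *pos;

    /* First alternative: a sequence, tried from left to right. */
    if (literal(in, pos, "ab") && literal(in, pos, "c"))
        return 1;
    *pos = start;               /* backtrack: rewind the input */

    /* Second alternative: only tried because the first one failed. */
    if (literal(in, pos, "a") && literal(in, pos, "bd"))
        return 2;
    *pos = start;               /* no alternative matched */
    return 0;
}
```

- On the input "abd" the first alternative consumes "ab", fails on "c",
- and the position is rewound before the second alternative succeeds.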
- .SH PEG GRAMMAR FOR PEG GRAMMARS
- The grammar for
- .I peg
- grammars is shown below. This will both illustrate and formalise
- the above description.
- .nf
- Grammar <- Spacing Definition+ EndOfFile
-
- Definition <- Identifier LEFTARROW Expression
- Expression <- Sequence ( SLASH Sequence )*
- Sequence <- Prefix*
- Prefix <- AND Action
- / ( AND / NOT )? Suffix
- Suffix <- Primary ( QUERY / STAR / PLUS )?
- Primary <- Identifier !LEFTARROW
- / OPEN Expression CLOSE
- / Literal
- / Class
- / DOT
- / Action
- / BEGIN
- / END
-
- Identifier <- < IdentStart IdentCont* > Spacing
- IdentStart <- [a-zA-Z_]
- IdentCont <- IdentStart / [0-9]
- Literal <- ['] < ( !['] Char )* > ['] Spacing
- / ["] < ( !["] Char )* > ["] Spacing
- Class <- '[' < ( !']' Range )* > ']' Spacing
- Range <- Char '-' Char / Char
- Char <- '\\\\' [abefnrtv'"\\[\\]\\\\]
- / '\\\\' [0-3][0-7][0-7]
- / '\\\\' [0-7][0-7]?
- / '\\\\' '-'
- / !'\\\\' .
- LEFTARROW <- '<-' Spacing
- SLASH <- '/' Spacing
- AND <- '&' Spacing
- NOT <- '!' Spacing
- QUERY <- '?' Spacing
- STAR <- '*' Spacing
- PLUS <- '+' Spacing
- OPEN <- '(' Spacing
- CLOSE <- ')' Spacing
- DOT <- '.' Spacing
- Spacing <- ( Space / Comment )*
- Comment <- '#' ( !EndOfLine . )* EndOfLine
- Space <- ' ' / '\\t' / EndOfLine
- EndOfLine <- '\\r\\n' / '\\n' / '\\r'
- EndOfFile <- !.
- Action <- '{' < [^}]* > '}' Spacing
- BEGIN <- '<' Spacing
- END <- '>' Spacing
- .fi
- .SH LEG GRAMMARS
- .I leg
- is a variant of
- .I peg
- that adds some features of
- .IR lex (1)
- and
- .IR yacc (1).
- It differs from
- .I peg
- in the following ways.
- .TP
- .BI %{\ text... \ %}
- A declaration section can appear anywhere that a rule definition is
- expected. The
- .I text
- between the delimiters '%{' and '%}' is copied verbatim to the
- generated C parser code
- .I before
- the code that implements the parser itself.
- .TP
- .IB name\ = \ pattern
- The 'assignment' operator replaces the left arrow operator '<-'.
- .TP
- .B rule-name
- Hyphens can appear as letters in the names of rules. Each hyphen is
- converted into an underscore in the generated C source code. A
- single hyphen '-' is a legal rule name.
- .nf
- - = [ \\t\\n\\r]*
- number = [0-9]+ -
- name = [a-zA-Z_][a-zA-Z_0-9]* -
- l-paren = '(' -
- r-paren = ')' -
-
- .fi
- This example shows how ignored whitespace can be obvious when reading
- the grammar and yet unobtrusive when placed liberally at the end of
- every rule associated with a lexical element.
- .TP
- .IB seq-1\ | \ seq-2
- The alternation operator is vertical bar '|' rather than forward
- slash '/'. The
- .I peg
- rule
- .nf
- name <- sequence-1
- / sequence-2
- / sequence-3
- .fi
- is therefore written
- .nf
- name = sequence-1
- | sequence-2
- | sequence-3
- ;
- .fi
- in
- .I leg
- (with the final semicolon being optional, as described next).
- .TP
- .IB pattern\ ;
- A semicolon punctuator can optionally terminate a
- .IR pattern .
- .TP
- .BI %% \ text...
- A double percent '%%' terminates the rules (and declarations) section of
- the grammar. All
- .I text
- following '%%' is copied verbatim to the generated C parser code
- .I after
- the parser implementation code.
- .TP
- .BI $$\ = \ value
- A sub-rule can return a semantic
- .I value
- from an action by assigning it to the pseudo-variable '$$'. All
- semantic values must have the same type (which defaults to 'int').
- This type can be changed by defining YYSTYPE in a declaration section.
- .TP
- .IB identifier : name
- The semantic value returned (by assigning to '$$') from the sub-rule
- .I name
- is associated with the
- .I identifier
- and can be referred to in subsequent actions.
- .PP
- The desk calculator example below illustrates the use of '$$' and ':'.
- .SH LEG EXAMPLE: A DESK CALCULATOR
- The extensions in
- .I leg
- described above allow useful parsers and evaluators (including
- declarations, grammar rules, and supporting C functions such
- as 'main') to be kept within a single source file. To illustrate this
- we show a simple desk calculator supporting the four common arithmetic
- operators and named variables. The intermediate results of arithmetic
- evaluation will be accumulated on an implicit stack by returning them
- as semantic values from sub-rules.
- .nf
- %{
- #include <stdio.h> /* printf() */
- #include <stdlib.h> /* atoi() */
- int vars[26];
- %}
-
- Stmt = - e:Expr EOL { printf("%d\\n", e); }
- | ( !EOL . )* EOL { printf("error\\n"); }
-
- Expr = i:ID ASSIGN s:Sum { $$ = vars[i] = s; }
- | s:Sum { $$ = s; }
-
- Sum = l:Product
- ( PLUS r:Product { l += r; }
- | MINUS r:Product { l -= r; }
- )* { $$ = l; }
-
- Product = l:Value
- ( TIMES r:Value { l *= r; }
- | DIVIDE r:Value { l /= r; }
- )* { $$ = l; }
-
- Value = i:NUMBER { $$ = atoi(yytext); }
- | i:ID !ASSIGN { $$ = vars[i]; }
- | OPEN i:Expr CLOSE { $$ = i; }
-
- NUMBER = < [0-9]+ > - { $$ = atoi(yytext); }
- ID = < [a-z] > - { $$ = yytext[0] - 'a'; }
- ASSIGN = '=' -
- PLUS = '+' -
- MINUS = '-' -
- TIMES = '*' -
- DIVIDE = '/' -
- OPEN = '(' -
- CLOSE = ')' -
-
- - = [ \\t]*
- EOL = '\\n' | '\\r\\n' | '\\r' | ';'
-
- %%
-
- int main()
- {
- while (yyparse())
- ;
- return 0;
- }
- .fi
- .SH LEG GRAMMAR FOR LEG GRAMMARS
- The grammar for
- .I leg
- grammars is shown below. This will both illustrate and formalise the
- above description.
- .nf
- grammar = -
- ( declaration | definition )+
- trailer? end-of-file
-
- declaration = '%{' < ( !'%}' . )* > RPERCENT
-
- trailer = '%%' < .* >
-
- definition = identifier EQUAL expression SEMICOLON?
-
- expression = sequence ( BAR sequence )*
-
- sequence = prefix+
-
- prefix = AND action
- | ( AND | NOT )? suffix
-
- suffix = primary ( QUERY | STAR | PLUS )?
-
- primary = identifier COLON identifier !EQUAL
- | identifier !EQUAL
- | OPEN expression CLOSE
- | literal
- | class
- | DOT
- | action
- | BEGIN
- | END
-
- identifier = < [-a-zA-Z_][-a-zA-Z_0-9]* > -
-
- literal = ['] < ( !['] char )* > ['] -
- | ["] < ( !["] char )* > ["] -
-
- class = '[' < ( !']' range )* > ']' -
-
- range = char '-' char | char
-
- char = '\\\\' [abefnrtv'"\\[\\]\\\\]
- | '\\\\' [0-3][0-7][0-7]
- | '\\\\' [0-7][0-7]?
- | !'\\\\' .
-
- action = '{' < [^}]* > '}' -
-
- EQUAL = '=' -
- COLON = ':' -
- SEMICOLON = ';' -
- BAR = '|' -
- AND = '&' -
- NOT = '!' -
- QUERY = '?' -
- STAR = '*' -
- PLUS = '+' -
- OPEN = '(' -
- CLOSE = ')' -
- DOT = '.' -
- BEGIN = '<' -
- END = '>' -
- RPERCENT = '%}' -
-
- - = ( space | comment )*
- space = ' ' | '\\t' | end-of-line
- comment = '#' ( !end-of-line . )* end-of-line
- end-of-line = '\\r\\n' | '\\n' | '\\r'
- end-of-file = !.
- .fi
- .SH CUSTOMISING THE PARSER
- The following symbols can be redefined in declaration sections to
- modify the generated parser code.
- .TP
- .B YYSTYPE
- The semantic value type. The pseudo-variable '$$' and the
- identifiers 'bound' to rule results with the colon operator ':' should
- all be considered as being declared to have this type. The default
- value is 'int'.
- .TP
- .B YYPARSE
- The name of the main entry point to the parser. The default value
- is 'yyparse'.
- .TP
- .B YYPARSEFROM
- The name of an alternative entry point to the parser. This function
- expects one argument: the function corresponding to the rule from
- which the search for a match should begin. The default
- is 'yyparsefrom'. Note that yyparse() is defined as
- .nf
- int yyparse() { return yyparsefrom(yy_foo); }
- .fi
- where 'foo' is the name of the first rule in the grammar.
- .TP
- .BI YY_INPUT( buf , \ result , \ max_size )
- This macro is invoked by the parser to obtain more input text.
- .I buf
- points to an area of memory that can hold at most
- .I max_size
- characters. The macro should copy input text to
- .I buf
- and then assign the integer variable
- .I result
- to indicate the number of characters copied. If no more input is available,
- the macro should assign 0 to
- .IR result .
- By default, the YY_INPUT macro is defined as follows.
- .nf
- #define YY_INPUT(buf, result, max_size) \\
- { \\
- int yyc= getchar(); \\
- result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \\
- }
- .fi
- .TP
- .B YY_DEBUG
- If this symbol is defined then additional code will be included in
- the parser that prints vast quantities of arcane information to the
- standard error while the parser is running.
- .TP
- .B YY_BEGIN
- This macro is invoked to mark the start of input text that will be
- made available in actions as 'yytext'. This corresponds to
- occurrences of '<' in the grammar. These are converted into
- predicates that are expected to succeed. The default definition
- .nf
- #define YY_BEGIN (yybegin= yypos, 1)
- .fi
- therefore saves the current input position and returns 1 ('true') as
- the result of the predicate.
- .TP
- .B YY_END
- This macro corresponds to '>' in the grammar. Again, it is a
- predicate so the default definition saves the input position
- before 'succeeding'.
- .nf
- #define YY_END (yyend= yypos, 1)
- .fi
- .TP
- .BI YY_PARSE( T )
- This macro declares the parser entry points (yyparse and yyparsefrom)
- to be of type
- .IR T .
- The default definition
- .nf
- #define YY_PARSE(T) T
- .fi
- leaves yyparse() and yyparsefrom() with global visibility. If they
- should not be externally visible in other source files, this macro can
- be redefined to declare them 'static'.
- .nf
- #define YY_PARSE(T) static T
- .fi
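- .PP
- As an illustration of redefining YY_INPUT, the following sketch feeds
- the parser from a string in memory instead of standard input. The
- variables input_text and input_pos are hypothetical names chosen for
- this example; they are not defined by
- .I peg
- or
- .IR leg .

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names: neither variable is provided by peg or leg. */
static const char *input_text = "";   /* NUL-terminated text to parse */
static size_t      input_pos  = 0;    /* offset of the next unread character */

/* Copies at most max_size characters from input_text into buf and stores
 * the number copied in result; result is 0 once the string is exhausted. */
#define YY_INPUT(buf, result, max_size)                             \
    {                                                               \
        size_t yyn = strlen(input_text + input_pos);                \
        if (yyn > (size_t)(max_size)) yyn = (size_t)(max_size);     \
        memcpy((buf), input_text + input_pos, yyn);                 \
        input_pos += yyn;                                           \
        (result) = (int)yyn;                                        \
    }
```

- In a
- .I leg
- grammar a definition like this would sit in a declaration section, so
- that it appears before the code that implements the parser.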
- .PP
- The following variables can be referred to within actions.
- .TP
- .B char *yybuf
- This variable points to the parser's input buffer used to store input
- text that has not yet been matched.
- .TP
- .B int yypos
- This is the offset (in yybuf) of the next character to be matched and
- consumed.
- .TP
- .B char *yytext
- The most recent matched text delimited by '<' and '>' is stored in this variable.
- .TP
- .B int yyleng
- This variable indicates the number of characters in 'yytext'.
- .SH DIAGNOSTICS
- .I peg
- and
- .I leg
- warn about the following conditions while converting a grammar into a parser.
- .TP
- .B syntax error
- The input grammar was malformed in some way. The error message will
- include the text about to be matched (often backed up a huge amount
- from the actual location of the error) and the line number of the most
- recently considered character (which is often the real location of the
- problem).
- .TP
- .B rule 'foo' used but not defined
- The grammar referred to a rule named 'foo' but no definition for it
- was given. Attempting to use the generated parser will likely result
- in errors from the linker due to undefined symbols associated with the
- missing rule.
- .TP
- .B rule 'foo' defined but not used
- The grammar defined a rule named 'foo' and then ignored it. The code
- associated with the rule is included in the generated parser which
- will in all other respects be healthy.
- .TP
- .B possible infinite left recursion in rule 'foo'
- There exists at least one path through the grammar that leads from the
- rule 'foo' back to (a recursive invocation of) the same rule without
- consuming any input.
- .PP
- Left recursion, especially that found in standards documents, is
- often 'direct' and implies trivial repetition.
- .nf
- # (6.7.6)
- direct-abstract-declarator =
- LPAREN abstract-declarator RPAREN
- | direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
- | direct-abstract-declarator? LBRACKET STAR RBRACKET
- | direct-abstract-declarator? LPAREN param-type-list? RPAREN
- .fi
- The recursion can easily be eliminated by converting the parts of the
- pattern following the recursion into a repeatable suffix.
- .nf
-
- # (6.7.6)
- direct-abstract-declarator =
- direct-abstract-declarator-head?
- direct-abstract-declarator-tail*
-
- direct-abstract-declarator-head =
- LPAREN abstract-declarator RPAREN
-
- direct-abstract-declarator-tail =
- LBRACKET assign-expr? RBRACKET
- | LBRACKET STAR RBRACKET
- | LPAREN param-type-list? RPAREN
- .fi
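- .PP
- The same transformation, written by hand in C for a smaller and purely
- hypothetical rule, shows why the repeatable suffix preserves left
- associativity. The left-recursive rule
- .nf
- sum = sum '+' number | sum '-' number | number
- .fi
- becomes a head followed by a loop over the tail. The function names are
- illustrative only.

```c
#include <ctype.h>

/* number = [0-9]+ ; advances *s past the digits, returns 0 if there are none. */
static int number(const char **s, int *value)
{
    if (!isdigit((unsigned char)**s))
        return 0;
    *value = 0;
    while (isdigit((unsigned char)**s))
        *value = *value * 10 + (*(*s)++ - '0');
    return 1;
}

/* sum = number ( '+' number | '-' number )*
 * The loop plays the role of the repeatable suffix. */
static int sum(const char **s, int *value)
{
    int rhs;

    if (!number(s, value))              /* head */
        return 0;
    for (;;) {                          /* tail, zero or more times */
        const char *save = *s;
        char op = **s;
        if (op != '+' && op != '-')
            break;
        ++*s;
        if (!number(s, &rhs)) {
            *s = save;                  /* backtrack: this tail did not match */
            break;
        }
        *value = (op == '+') ? *value + rhs : *value - rhs;
    }
    return 1;
}
```

- Each iteration folds the next operand into the running value, so
- "10-3-2" is evaluated as (10-3)-2.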
- .SH BUGS
- The 'yy' and 'YY' prefixes cannot be changed.
- .PP
- Left recursion is detected in the input grammar but is not handled
- correctly in the generated parser.
- .PP
- Diagnostics for errors in the input grammar are obscure and not
- particularly helpful.
- .PP
- Several commonly-used
- .IR lex (1)
- features (yywrap(), yyin, etc.) are completely absent.
- .PP
- The generated parser does not contain '#line' directives to direct C
- compiler errors back to the grammar description when appropriate.
- .SH SEE ALSO
- D. Val Schorre,
- .I META II, a syntax-oriented compiler writing language,
- 19th ACM National Conference, 1964, pp.\ 41.301--41.311. Describes a
- self-implementing parser generator for analytic grammars with no
- backtracking.
- .PP
- Alexander Birman,
- .I The TMG Recognition Schema,
- Ph.D. dissertation, Princeton, 1970. A mathematical treatment of the
- power and complexity of recursive-descent parsing with backtracking.
- .PP
- Bryan Ford,
- .I Parsing Expression Grammars: A Recognition-Based Syntactic Foundation,
- ACM SIGPLAN Symposium on Principles of Programming Languages, 2004.
- Defines PEGs and analyses them in relation to context-free and regular
- grammars. Introduces the syntax adopted in
- .IR peg .
- .PP
- The standard Unix utilities
- .IR lex (1)
- and
- .IR yacc (1)
- which influenced the syntax and features of
- .IR leg .
- .PP
- The source code for
- .I peg
- and
- .I leg
- whose grammar parsers are written using themselves.
- .PP
- The latest version of this software and documentation:
- .nf
- http://piumarta.com/software/peg
- .fi
- .SH AUTHOR
- .IR peg ,
- .I leg
- and this manual page were written by Ian Piumarta (first-name at
- last-name dot com) while investigating the viability of regular- and
- parsing-expression grammars for efficiently extracting type and
- signature information from C header files.
- .PP
- Please send bug reports and suggestions for improvements to the author
- at the above address.