.\" Copyright (c) 2007 by Ian Piumarta
.\" All rights reserved.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the 'Software'),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, and/or sell
.\" copies of the Software, and to permit persons to whom the Software is
.\" furnished to do so, provided that the above copyright notice(s) and this
.\" permission notice appear in all copies of the Software. Acknowledgement
.\" of the use of this Software in supporting documentation would be
.\" appreciated but is not required.
.\"
.\" THE SOFTWARE IS PROVIDED 'AS IS'. USE ENTIRELY AT YOUR OWN RISK.
.\"
.\" Last edited: 2007-09-13 08:40:20 by piumarta on emilia.local
.\"
.TH PEG 1 "May 2007" "Version 0.1"
.SH NAME
peg, leg \- parser generators
.SH SYNOPSIS
.B peg
.B [\-hvV \-ooutput]
.I [filename ...]
.sp 0
.B leg
.B [\-hvV \-ooutput]
.I [filename ...]
.SH DESCRIPTION
.I peg
and
.I leg
are tools for generating recursive-descent parsers: programs that
perform pattern matching on text. They process a Parsing Expression
Grammar (PEG) [Ford 2004] to produce a program that recognises legal
sentences of that grammar.
.I peg
processes PEGs written using the original syntax described by Ford;
.I leg
processes PEGs written using slightly different syntax and conventions
that are intended to make it an attractive replacement for parsers
built with
.IR lex (1)
and
.IR yacc (1).
Unlike
.I lex
and
.IR yacc ,
.I peg
and
.I leg
support unlimited backtracking, provide ordered choice as a means for
disambiguation, and can combine scanning (lexical analysis) and
parsing (syntactic analysis) into a single activity.
.PP
.I peg
reads the specified
.IR filename s,
or standard input if no
.IR filename s
are given, for a grammar describing the parser to generate.
.I peg
then generates a C source file that defines a function
.IR yyparse ().
This C source file can be included in, or compiled and then linked
with, a client program. Each time the client program calls
.IR yyparse ()
the parser consumes input text according to the parsing rules,
starting from the first rule in the grammar.
.IR yyparse ()
returns non-zero if the input could be parsed according to the
grammar; it returns zero if the input could not be parsed.
.PP
The prefix 'yy' or 'YY' is prepended to all externally-visible symbols
in the generated parser. This is intended to reduce the risk of
namespace pollution in client programs. (The choice of 'yy' is
historical; see
.IR lex (1)
and
.IR yacc (1),
for example.)
.SH OPTIONS
.I peg
and
.I leg
provide the following options:
.TP
.B \-h
prints a summary of available options and then exits.
.TP
.B \-ooutput
writes the generated parser to the file
.B output
instead of the standard output.
.TP
.B \-v
writes verbose information to standard error while working.
.TP
.B \-V
writes version information to standard error and then exits.
.SH A SIMPLE EXAMPLE
The following
.I peg
input specifies a grammar with a single rule (called 'start') that is
satisfied when the input contains the string "username".
.nf
    start <- "username"
.fi
(The quotation marks are
.I not
part of the matched text; they serve to indicate a literal string to
be matched.) In other words,
.IR yyparse ()
in the generated C source will return non-zero only if the next eight
characters read from the input spell the word "username". If the
input contains anything else,
.IR yyparse ()
returns zero and no input will have been consumed. (Subsequent calls
to
.IR yyparse ()
will also return zero, since the parser is effectively blocked looking
for the string "username".) To ensure progress we can add an
alternative clause to the 'start' rule that will match any single
character if "username" is not found.
.nf
    start <- "username"
          / .
.fi
.IR yyparse ()
now always returns non-zero (except at the very end of the input). To
do something useful we can add actions to the rules. These actions
are performed after a complete match is found (starting from the first
rule) and are chosen according to the 'path' taken through the grammar
to match the input. (Linguists would call this path a 'phrase
marker'.)
.nf
    start <- "username"    { printf("%s\\n", getlogin()); }
          / < . >          { putchar(yytext[0]); }
.fi
The first line instructs the parser to print the user's login name
whenever it sees "username" in the input. If that match fails, the
second line tells the parser to echo the next character of the input
to the standard output. Our parser is now performing useful work: it
will copy the input to the output, replacing all occurrences of
"username" with the user's account name.
.PP
Note the angle brackets ('<' and '>') that were added to the second
alternative. These have no effect on the meaning of the rule, but
serve to delimit the text made available to the following action in
the variable
.IR yytext .
.PP
If the above grammar is placed in the file
.BR username.peg ,
running the command
.nf
    peg -o username.c username.peg
.fi
will save the corresponding parser in the file
.BR username.c .
To create a complete program this parser could be included by a C
program as follows.
.nf
    #include <stdio.h>      /* printf(), putchar() */
    #include <unistd.h>     /* getlogin() */
    #include "username.c"   /* yyparse() */
    int main()
    {
      while (yyparse())     /* repeat until EOF */
        ;
      return 0;
    }
.fi
.SH PEG GRAMMARS
A grammar consists of a set of named rules.
.nf
    name <- pattern
.fi
The
.B pattern
contains one or more of the following elements.
.TP
.B name
The element stands for the entire pattern in the rule with the given
.BR name .
.TP
.BR \(dq characters \(dq
A character or string enclosed in double quotes is matched literally.
The ANSI C escape sequences are recognised within the
.IR characters .
.TP
.BR ' characters '
A character or string enclosed in single quotes is matched literally, as above.
.TP
.BR [ characters ]
A set of characters enclosed in square brackets matches any single
character from the set, with escape sequences recognised as above.
If the set begins with a caret (^) then the set is negated (the
element matches any character
.I not
in the set). Any pair of characters separated with a dash (-)
represents the range of characters from the first to the second,
inclusive. A single alphabetic character or underscore is matched by
the following set.
.nf
    [a-zA-Z_]
.fi
Similarly, the following matches any single non-digit character.
.nf
    [^0-9]
.fi
.TP
.B .
A dot matches any character. Note that the only time this fails is at
the end of file, where there is no character to match.
.TP
.BR ( \ pattern\ )
Parentheses are used for grouping (modifying the precedence of the
operators described below).
.TP
.BR { \ action\ }
Curly braces surround actions. The action is arbitrary C source code
to be executed at the end of matching. Any braces within the action
must be properly nested. Any input text that was matched before the
action and delimited by angle brackets (see below) is made available
within the action as the contents of the character array
.IR yytext .
The length of (number of characters in)
.I yytext
is available in the variable
.IR yyleng .
(These variable names are historical; see
.IR lex (1).)
.TP
.B <
An opening angle bracket always matches (consuming no input) and
causes the parser to begin accumulating matched text. This text will
be made available to actions in the variable
.IR yytext .
.TP
.B >
A closing angle bracket always matches (consuming no input) and causes
the parser to stop accumulating text for
.IR yytext .
.PP
The above
.IR element s
can be made optional and/or repeatable with the following suffixes:
.TP
.RB element\ ?
The element is optional. If present on the input, it is consumed and
the match succeeds. If not present on the input, no text is consumed
and the match succeeds anyway.
.TP
.RB element\ +
The element is repeatable. If present on the input, as many
occurrences of
.I element
as possible (at least one) are consumed and the match succeeds. If no occurrences of
.I element
are present on the input, the match fails.
.TP
.RB element\ *
The element is optional and repeatable. If present on the input, as many
occurrences of
.I element
as possible are consumed and the match succeeds. If no occurrences of
.I element
are present on the input, the match succeeds anyway.
.PP
The above elements and suffixes can be converted into predicates (that
match arbitrary input text and subsequently succeed or fail
.I without
consuming that input) with the following prefixes:
.TP
.BR & \ element
The predicate succeeds only if
.I element
can be matched. Input text scanned while matching
.I element
is not consumed from the input and remains available for subsequent
matching.
.TP
.BR ! \ element
The predicate succeeds only if
.I element
cannot be matched. Input text scanned while matching
.I element
is not consumed from the input and remains available for subsequent
matching. A popular idiom is
.nf
    !.
.fi
which matches the end of file, after the last character of the input
has already been consumed.
.PP
A special form of the '&' predicate is provided:
.TP
.BR & {\ expression\ }
In this predicate the simple C
.I expression
.RB ( not
statement) is evaluated immediately when the parser reaches the
predicate. If the
.I expression
yields non-zero (true) the 'match' succeeds and the parser continues
with the next element in the pattern. If the
.I expression
yields zero (false) the 'match' fails and the parser backs up to look
for an alternative parse of the input.
.PP
Several elements (with or without prefixes and suffixes) can be
combined into a
.I sequence
by writing them one after the other. The entire sequence matches only
if each individual element within it matches, from left to right.
.PP
Sequences can be separated into ordered alternatives by the
alternation operator '/'.
.TP
.RB sequence-1\ / \ sequence-2\ / \ ...\ / \ sequence-N
Each sequence is tried in turn, in the order written, until one of them
matches, at which time matching for the overall pattern succeeds and the
remaining alternatives are not tried. If none of the sequences matches
then the match of the overall pattern fails.
.PP
Finally, the pound sign (#) introduces a comment (discarded) that
continues until the end of the line.
.PP
To summarise the above, the parser tries to match the input text
against a pattern containing literals, names (representing other
rules), and various operators (written as prefixes, suffixes,
juxtaposition for sequencing, and an infix alternation operator) that
modify how the elements within the pattern are matched. Matches are
made from left to right, 'descending' into named sub-rules as they are
encountered. If the matching process fails, the parser 'backtracks'
('rewinding' the input appropriately in the process) to find the
nearest alternative 'path' through the grammar. In other words the
parser performs a depth-first, left-to-right search for the first
successfully-matching path through the rules. If found, the actions
along the successful path are executed (in the order they were
encountered).
.PP
Note that predicates are evaluated
.I immediately
during the search for a successful match, since they contribute to the
success or failure of the search. Actions, however, are evaluated
only after a successful match has been found.
.SH PEG GRAMMAR FOR PEG GRAMMARS
The grammar for
.I peg
grammars is shown below. This will both illustrate and formalise
the above description.
.nf
    Grammar     <- Spacing Definition+ EndOfFile
    Definition  <- Identifier LEFTARROW Expression
    Expression  <- Sequence ( SLASH Sequence )*
    Sequence    <- Prefix*
    Prefix      <- AND Action
                / ( AND / NOT )? Suffix
    Suffix      <- Primary ( QUERY / STAR / PLUS )?
    Primary     <- Identifier !LEFTARROW
                / OPEN Expression CLOSE
                / Literal
                / Class
                / DOT
                / Action
                / BEGIN
                / END
    Identifier  <- < IdentStart IdentCont* > Spacing
    IdentStart  <- [a-zA-Z_]
    IdentCont   <- IdentStart / [0-9]
    Literal     <- ['] < ( !['] Char )* > ['] Spacing
                / ["] < ( !["] Char )* > ["] Spacing
    Class       <- '[' < ( !']' Range )* > ']' Spacing
    Range       <- Char '-' Char / Char
    Char        <- '\\\\' [abefnrtv'"\\[\\]\\\\]
                / '\\\\' [0-3][0-7][0-7]
                / '\\\\' [0-7][0-7]?
                / '\\\\' '-'
                / !'\\\\' .
    LEFTARROW   <- '<-' Spacing
    SLASH       <- '/' Spacing
    AND         <- '&' Spacing
    NOT         <- '!' Spacing
    QUERY       <- '?' Spacing
    STAR        <- '*' Spacing
    PLUS        <- '+' Spacing
    OPEN        <- '(' Spacing
    CLOSE       <- ')' Spacing
    DOT         <- '.' Spacing
    Spacing     <- ( Space / Comment )*
    Comment     <- '#' ( !EndOfLine . )* EndOfLine
    Space       <- ' ' / '\\t' / EndOfLine
    EndOfLine   <- '\\r\\n' / '\\n' / '\\r'
    EndOfFile   <- !.
    Action      <- '{' < [^}]* > '}' Spacing
    BEGIN       <- '<' Spacing
    END         <- '>' Spacing
.fi
.SH LEG GRAMMARS
.I leg
is a variant of
.I peg
that adds some features of
.IR lex (1)
and
.IR yacc (1).
It differs from
.I peg
in the following ways.
.TP
.BI %{\ text... \ %}
A declaration section can appear anywhere that a rule definition is
expected. The
.I text
between the delimiters '%{' and '%}' is copied verbatim to the
generated C parser code
.I before
the code that implements the parser itself.
.TP
.IB name\ = \ pattern
The 'assignment' operator replaces the left arrow operator '<-'.
.TP
.B rule-name
Hyphens can appear as letters in the names of rules. Each hyphen is
converted into an underscore in the generated C source code. A single
hyphen '-' is a legal rule name.
.nf
    -       = [ \\t\\n\\r]*
    number  = [0-9]+ -
    name    = [a-zA-Z_][a-zA-Z_0-9]* -
    l-paren = '(' -
    r-paren = ')' -
.fi
This example shows how ignored whitespace can be obvious when reading
the grammar and yet unobtrusive when placed liberally at the end of
every rule associated with a lexical element.
.TP
.IB seq-1\ | \ seq-2
The alternation operator is vertical bar '|' rather than forward
slash '/'. The
.I peg
rule
.nf
    name <- sequence-1
         / sequence-2
         / sequence-3
.fi
is therefore written
.nf
    name = sequence-1
         | sequence-2
         | sequence-3
         ;
.fi
in
.I leg
(with the final semicolon being optional, as described next).
.TP
.IB pattern\ ;
A semicolon punctuator can optionally terminate a
.IR pattern .
.TP
.BI %% \ text...
A double percent '%%' terminates the rules (and declarations) section of
the grammar. All
.I text
following '%%' is copied verbatim to the generated C parser code
.I after
the parser implementation code.
.TP
.BI $$\ = \ value
A sub-rule can return a semantic
.I value
from an action by assigning it to the pseudo-variable '$$'. All
semantic values must have the same type (which defaults to 'int').
This type can be changed by defining YYSTYPE in a declaration section.
.TP
.IB identifier : name
The semantic value returned (by assigning to '$$') from the sub-rule
.I name
is associated with the
.I identifier
and can be referred to in subsequent actions.
.PP
The desk calculator example below illustrates the use of '$$' and ':'.
.SH LEG EXAMPLE: A DESK CALCULATOR
The extensions in
.I leg
described above allow useful parsers and evaluators (including
declarations, grammar rules, and supporting C functions such
as 'main') to be kept within a single source file. To illustrate this
we show a simple desk calculator supporting the four common arithmetic
operators and named variables. The intermediate results of arithmetic
evaluation will be accumulated on an implicit stack by returning them
as semantic values from sub-rules.
.nf
    %{
    #include <stdio.h>     /* printf() */
    #include <stdlib.h>    /* atoi() */
    int vars[26];
    %}
    Stmt    = - e:Expr EOL          { printf("%d\\n", e); }
            | ( !EOL . )* EOL       { printf("error\\n"); }
    Expr    = i:ID ASSIGN s:Sum     { $$ = vars[i] = s; }
            | s:Sum                 { $$ = s; }
    Sum     = l:Product
                ( PLUS r:Product    { l += r; }
                | MINUS r:Product   { l -= r; }
                )*                  { $$ = l; }
    Product = l:Value
                ( TIMES r:Value     { l *= r; }
                | DIVIDE r:Value    { l /= r; }
                )*                  { $$ = l; }
    Value   = i:NUMBER              { $$ = atoi(yytext); }
            | i:ID !ASSIGN          { $$ = vars[i]; }
            | OPEN i:Expr CLOSE     { $$ = i; }
    NUMBER  = < [0-9]+ > -          { $$ = atoi(yytext); }
    ID      = < [a-z] > -           { $$ = yytext[0] - 'a'; }
    ASSIGN  = '=' -
    PLUS    = '+' -
    MINUS   = '-' -
    TIMES   = '*' -
    DIVIDE  = '/' -
    OPEN    = '(' -
    CLOSE   = ')' -
    -       = [ \\t]*
    EOL     = '\\n' | '\\r\\n' | '\\r' | ';'
    %%
    int main()
    {
      while (yyparse())
        ;
      return 0;
    }
.fi
.SH LEG GRAMMAR FOR LEG GRAMMARS
The grammar for
.I leg
grammars is shown below. This will both illustrate and formalise the
above description.
.nf
    grammar     = -
                  ( declaration | definition )+
                  trailer? end-of-file
    declaration = '%{' < ( !'%}' . )* > RPERCENT
    trailer     = '%%' < .* >
    definition  = identifier EQUAL expression SEMICOLON?
    expression  = sequence ( BAR sequence )*
    sequence    = prefix+
    prefix      = AND action
                | ( AND | NOT )? suffix
    suffix      = primary ( QUERY | STAR | PLUS )?
    primary     = identifier COLON identifier !EQUAL
                | identifier !EQUAL
                | OPEN expression CLOSE
                | literal
                | class
                | DOT
                | action
                | BEGIN
                | END
    identifier  = < [-a-zA-Z_][-a-zA-Z_0-9]* > -
    literal     = ['] < ( !['] char )* > ['] -
                | ["] < ( !["] char )* > ["] -
    class       = '[' < ( !']' range )* > ']' -
    range       = char '-' char | char
    char        = '\\\\' [abefnrtv'"\\[\\]\\\\]
                | '\\\\' [0-3][0-7][0-7]
                | '\\\\' [0-7][0-7]?
                | !'\\\\' .
    action      = '{' < [^}]* > '}' -
    EQUAL       = '=' -
    COLON       = ':' -
    SEMICOLON   = ';' -
    BAR         = '|' -
    AND         = '&' -
    NOT         = '!' -
    QUERY       = '?' -
    STAR        = '*' -
    PLUS        = '+' -
    OPEN        = '(' -
    CLOSE       = ')' -
    DOT         = '.' -
    BEGIN       = '<' -
    END         = '>' -
    RPERCENT    = '%}' -
    -           = ( space | comment )*
    space       = ' ' | '\\t' | end-of-line
    comment     = '#' ( !end-of-line . )* end-of-line
    end-of-line = '\\r\\n' | '\\n' | '\\r'
    end-of-file = !.
.fi
.SH CUSTOMISING THE PARSER
The following symbols can be redefined in declaration sections to
modify the generated parser code.
.TP
.B YYSTYPE
The semantic value type. The pseudo-variable '$$' and the
identifiers 'bound' to rule results with the colon operator ':' should
all be considered as being declared to have this type. The default
value is 'int'.
.TP
.B YYPARSE
The name of the main entry point to the parser. The default value
is 'yyparse'.
.TP
.B YYPARSEFROM
The name of an alternative entry point to the parser. This function
expects one argument: the function corresponding to the rule from
which the search for a match should begin. The default
is 'yyparsefrom'. Note that yyparse() is defined as
.nf
    int yyparse() { return yyparsefrom(yy_foo); }
.fi
where 'foo' is the name of the first rule in the grammar.
.TP
.BI YY_INPUT( buf , \ result , \ max_size )
This macro is invoked by the parser to obtain more input text.
.I buf
points to an area of memory that can hold at most
.I max_size
characters. The macro should copy input text to
.I buf
and then assign the integer variable
.I result
to indicate the number of characters copied. If no more input is available,
the macro should assign 0 to
.IR result .
By default, the YY_INPUT macro is defined as follows.
.nf
    #define YY_INPUT(buf, result, max_size)         \\
    {                                               \\
      int yyc= getchar();                           \\
      result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1);  \\
    }
.fi
.TP
.B YY_DEBUG
If this symbol is defined then additional code will be included in
the parser that prints vast quantities of arcane information to the
standard error while the parser is running.
.TP
.B YY_BEGIN
This macro is invoked to mark the start of input text that will be
made available in actions as 'yytext'. This corresponds to
occurrences of '<' in the grammar. These are converted into
predicates that are expected to succeed. The default definition
.nf
    #define YY_BEGIN (yybegin= yypos, 1)
.fi
therefore saves the current input position and returns 1 ('true') as
the result of the predicate.
.TP
.B YY_END
This macro corresponds to '>' in the grammar. Again, it is a
predicate so the default definition saves the input position
before 'succeeding'.
.nf
    #define YY_END (yyend= yypos, 1)
.fi
.TP
.BI YY_PARSE( T )
This macro declares the parser entry points (yyparse and yyparsefrom)
to be of type
.IR T .
The default definition
.nf
    #define YY_PARSE(T) T
.fi
leaves yyparse() and yyparsefrom() with global visibility. If they
should not be externally visible in other source files, this macro can
be redefined to declare them 'static'.
.nf
    #define YY_PARSE(T) static T
.fi
.PP
The following variables can be referred to within actions.
.TP
.B char *yybuf
This variable points to the parser's input buffer used to store input
text that has not yet been matched.
.TP
.B int yypos
This is the offset (in yybuf) of the next character to be matched and
consumed.
.TP
.B char *yytext
The most recent matched text delimited by '<' and '>' is stored in this variable.
.TP
.B int yyleng
This variable indicates the number of characters in 'yytext'.
.SH DIAGNOSTICS
.I peg
and
.I leg
warn about the following conditions while converting a grammar into a parser.
.TP
.B syntax error
The input grammar was malformed in some way. The error message will
include the text about to be matched (which often begins a long way
before the actual location of the error) and the line number of the most
recently considered character (which is often the real location of the
problem).
.TP
.B rule 'foo' used but not defined
The grammar referred to a rule named 'foo' but no definition for it
was given. Attempting to use the generated parser will likely result
in errors from the linker due to undefined symbols associated with the
missing rule.
.TP
.B rule 'foo' defined but not used
The grammar defined a rule named 'foo' and then ignored it. The code
associated with the rule is included in the generated parser which
will in all other respects be healthy.
.TP
.B possible infinite left recursion in rule 'foo'
There exists at least one path through the grammar that leads from the
rule 'foo' back to (a recursive invocation of) the same rule without
consuming any input.
.PP
Left recursion, especially that found in standards documents, is
often 'direct' and implies trivial repetition.
.nf
    # (6.7.6)
    direct-abstract-declarator =
        LPAREN abstract-declarator RPAREN
      | direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
      | direct-abstract-declarator? LBRACKET STAR RBRACKET
      | direct-abstract-declarator? LPAREN param-type-list? RPAREN
.fi
The recursion can easily be eliminated by converting the parts of the
pattern following the recursion into a repeatable suffix.
.nf
    # (6.7.6)
    direct-abstract-declarator =
        direct-abstract-declarator-head?
        direct-abstract-declarator-tail*
    direct-abstract-declarator-head =
        LPAREN abstract-declarator RPAREN
    direct-abstract-declarator-tail =
        LBRACKET assign-expr? RBRACKET
      | LBRACKET STAR RBRACKET
      | LPAREN param-type-list? RPAREN
.fi
.SH BUGS
The 'yy' and 'YY' prefixes cannot be changed.
.PP
Left recursion is detected in the input grammar but is not handled
correctly in the generated parser.
.PP
Diagnostics for errors in the input grammar are obscure and not
particularly helpful.
.PP
Several commonly-used
.IR lex (1)
features (yywrap(), yyin, etc.) are completely absent.
.PP
The generated parser does not contain '#line' directives to direct C
compiler errors back to the grammar description when appropriate.
.SH SEE ALSO
D. Val Schorre,
.I META II, a syntax-oriented compiler writing language,
19th ACM National Conference, 1964, pp.\ 41.301--41.311. Describes a
self-implementing parser generator for analytic grammars with no
backtracking.
.PP
Alexander Birman,
.I The TMG Recognition Schema,
Ph.D. dissertation, Princeton, 1970. A mathematical treatment of the
power and complexity of recursive-descent parsing with backtracking.
.PP
Bryan Ford,
.I Parsing Expression Grammars: A Recognition-Based Syntactic Foundation,
ACM SIGPLAN Symposium on Principles of Programming Languages, 2004.
Defines PEGs and analyses them in relation to context-free and regular
grammars. Introduces the syntax adopted in
.IR peg .
.PP
The standard Unix utilities
.IR lex (1)
and
.IR yacc (1),
which influenced the syntax and features of
.IR leg .
.PP
The source code for
.I peg
and
.IR leg ,
whose grammar parsers are written using themselves.
.PP
The latest version of this software and documentation:
.nf
    http://piumarta.com/software/peg
.fi
.SH AUTHOR
.IR peg ,
.I leg
and this manual page were written by Ian Piumarta (first-name at
last-name dot com) while investigating the viability of regular- and
parsing-expression grammars for efficiently extracting type and
signature information from C header files.
.PP
Please send bug reports and suggestions for improvements to the author
at the above address.