/projects/javacc-5.0/www/doc/javaccgrm.html

https://gitlab.com/essere.lab.public/qualitas.class-corpus
  1. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2. <html xmlns="http://www.w3.org/1999/xhtml">
  3. <!--
  4. Copyright (c) 2006, Sun Microsystems, Inc.
  5. All rights reserved.
  6. Redistribution and use in source and binary forms, with or without
  7. modification, are permitted provided that the following conditions are met:
  8. * Redistributions of source code must retain the above copyright notice,
  9. this list of conditions and the following disclaimer.
  10. * Redistributions in binary form must reproduce the above copyright
  11. notice, this list of conditions and the following disclaimer in the
  12. documentation and/or other materials provided with the distribution.
  13. * Neither the name of the Sun Microsystems, Inc. nor the names of its
  14. contributors may be used to endorse or promote products derived from
  15. this software without specific prior written permission.
  16. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  17. AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  18. IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  19. ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  20. LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  21. CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  22. SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  23. INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  24. CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  25. ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  26. THE POSSIBILITY OF SUCH DAMAGE.
  27. -->
  28. <head>
  29. <title>JavaCC Grammar Files</title>
  30. <!-- Changed by: Michael Van De Vanter, 14-Jan-2003 -->
  31. </head>
  32. <body bgcolor="#FFFFFF" >
  33. <h1>JavaCC [tm]: Grammar Files</h1>
  34. <p>
  35. This page contains the complete syntax of Java Compiler Compiler [tm]
  36. grammar files with detailed explanations of each construct.
  37. </p>
  38. <p>
  39. Tokens in the grammar files follow the same conventions as for the Java programming language.
  40. Hence identifiers, strings, characters, etc. used in the grammars are
  41. the same as Java identifiers, Java strings, Java characters, etc.
  42. </p>
  43. <p>
  44. <em>White space</em> in the grammar files also follows the same conventions as
  45. for the Java programming language. This includes the syntax for comments. Most comments present in
  46. the grammar files are copied into the generated parser/lexical analyzer.
  47. </p>
  48. <p>
  49. Grammar files are preprocessed for Unicode escapes just as Java files
  50. are (i.e., occurrences of strings such as <code>\uxxxx</code> - where <code>xxxx</code> is a hex value -
  51. are converted to the corresponding Unicode character before lexical analysis).
  52. </p>
  53. <p>
  54. <em>Exceptions to the above rules:</em>
  55. The Java operators "<code>&lt;&lt;</code>", "<code>&gt;&gt;</code>", "<code>&gt;&gt;&gt;</code>", "<code>&lt;&lt;=</code>",
  56. "<code>&gt;&gt;=</code>", and "<code>&gt;&gt;&gt;=</code>" are left out of JavaCC's input token list
  57. in order to allow convenient nested use of token specifications.
  58. Finally, the following are the additional reserved words in the Java Compiler
  59. Compiler [tm] grammar files.
  60. </p>
  61. <table cellpadding="3">
  62. <tr>
  63. <td align="left"><strong>EOF</strong></td>
  64. <td align="left"><strong><a href="#IGNORE_CASE">IGNORE_CASE</a></strong></td>
  65. <td align="left"><strong><a href="#JAVACODE">JAVACODE</a></strong></td>
  66. <td align="left"><strong><a href="#LOOKAHEAD">LOOKAHEAD</a></strong></td>
  67. </tr>
  68. <tr>
  69. <td align="left"><strong><a href="#MORE">MORE</a></strong></td>
  70. <td align="left"><strong><a href="#PARSER_BEGIN">PARSER_BEGIN</a></strong></td>
  71. <td align="left"><strong><a href="#PARSER_END">PARSER_END</a></strong></td>
  72. <td align="left"><strong><a href="#SKIP">SKIP</a></strong></td>
  73. </tr>
  74. <tr>
  75. <td align="left"><strong><a href="#SPECIAL_TOKEN">SPECIAL_TOKEN</a></strong></td>
  76. <td align="left"><strong><a href="#TOKEN">TOKEN</a></strong></td>
  77. <td align="left"><strong><a href="#TOKEN_MGR_DECLS">TOKEN_MGR_DECLS</a></strong></td>
  78. </tr>
  79. </table>
  80. <p>
  81. Any Java entities used in the grammar rules that follow appear italicized
  82. with the prefix <em>java_</em> (<em>e.g.</em>, <em>java_compilation_unit</em>).
  83. </p>
  84. <hr />
  85. <a name="PARSER_BEGIN"></a><a name="PARSER_END"></a>
  86. <table>
  87. <tr>
  88. <td align="right" valign="baseline"><a name="prod1">javacc_input</a></td>
  89. <td align="center" valign="baseline">::=</td>
  90. <td align="left" valign="baseline"><a href="#prod2">javacc_options</a></td>
  91. </tr>
  92. <tr>
  93. <td></td><td></td>
  94. <td align="left" valign="baseline">"PARSER_BEGIN" "(" &lt;IDENTIFIER&gt; ")"</td>
  95. </tr>
  96. <tr>
  97. <td></td><td></td>
  98. <td align="left" valign="baseline"><em>java_compilation_unit</em></td>
  99. </tr>
  100. <tr>
  101. <td></td><td></td>
  102. <td align="left" valign="baseline">"PARSER_END" "(" &lt;IDENTIFIER&gt; ")"</td>
  103. </tr>
  104. <tr>
  105. <td></td><td></td>
  106. <td align="left" valign="baseline">( <a href="#prod5">production</a> )*</td>
  107. </tr>
  108. <tr>
  109. <td></td><td></td>
  110. <td align="left" valign="baseline">&lt;EOF&gt;</td>
  111. </tr>
  112. </table>
  113. <p>
  114. The grammar file starts with an optional list of options.
  115. This is then followed by a Java compilation unit enclosed between
  116. "PARSER_BEGIN(name)" and "PARSER_END(name)". After this is a list
  117. of grammar productions. <a href="#prod2">Options</a> and
  118. <a href="#prod5">productions</a> are described later.
  119. </p>
  120. <p>
  121. The <em>name</em> that follows "PARSER_BEGIN" and "PARSER_END" must
  122. be the same; it becomes the name of the generated parser.
  123. For example, if <em>name</em> is "MyParser", then the following files
  124. are generated:
  125. </p>
  126. <p>
  127. <strong>MyParser.java:</strong>
  128. The generated parser.
  129. <br />
  130. <strong>MyParserTokenManager.java:</strong>
  131. The generated token manager (or scanner/lexical analyzer).
  132. <br />
  133. <strong>MyParserConstants.java:</strong>
  134. Constants (such as token kinds and lexical states) shared by the parser and the token manager.
  135. </p>
  136. <p>
  137. Other files such as "Token.java", "ParseException.java", etc. are also
  138. generated. However, these files contain boilerplate code that is
  139. the same for any grammar, so they may be reused across grammars (provided the
  140. grammars use compatible options).
  141. </p>
  142. <p>
  143. Between the PARSER_BEGIN and PARSER_END constructs is a regular
  144. Java compilation unit (a compilation unit in Java lingo is the entire
  145. contents of a Java file). This may be any arbitrary
  146. Java compilation unit so long as it contains a class declaration
  147. whose name is the same as the name of the generated parser ("MyParser"
  148. in the above example). Hence, in general, this part of the grammar
  149. file looks like:
  150. </p>
  151. <pre>
  152. PARSER_BEGIN(parser_name)
  153. . . .
  154. class parser_name . . . {
  155. . . .
  156. }
  157. . . .
  158. PARSER_END(parser_name)
  159. </pre>
  160. <p>
  161. JavaCC does not perform detailed checks on the compilation unit, so
  162. it is possible for a grammar file to pass through JavaCC and generate
  163. Java files that produce errors when they are compiled.
  164. </p>
  165. <p>
  166. If the compilation unit includes a package declaration, this is
  167. included in all the generated files. If the compilation unit includes
  168. import declarations, these are included in the generated parser and
  169. token manager files.
  170. </p>
  171. <p>
  172. The generated parser file contains everything in the compilation unit
  173. and, in addition, contains the generated parser code that is included at
  174. the end of the parser class. For the above example, the generated
  175. parser will look like:
  176. </p>
  177. <pre>
  178. . . .
  179. class parser_name . . . {
  180. . . .
  181. // generated parser is inserted here.
  182. }
  183. . . .
  184. </pre>
  185. <p>
  186. The generated parser includes a public method declaration corresponding
  187. to each non-terminal (see <a href="#prod9">javacode_production</a> and
  188. <a href="#prod11">bnf_production</a>) in the grammar file. Parsing with
  189. respect to a non-terminal is achieved by calling the method corresponding
  190. to that non-terminal. Unlike yacc, there is no single start symbol in
  191. JavaCC - one can parse with respect to any non-terminal in the grammar.
  192. </p>
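<p>
As a minimal sketch of how these pieces fit together, the following grammar file
declares a parser named "MyParser" with a single non-terminal. The non-terminal
name <code>Start</code>, the token name <code>NUMBER</code>, and the <code>main</code>
method are illustrative assumptions rather than anything JavaCC requires, and the
sketch assumes the default value of the STATIC option. The SKIP and TOKEN
productions it uses are regular expression productions.
</p>
<pre>
PARSER_BEGIN(MyParser)

public class MyParser {
  public static void main(String[] args) throws ParseException {
    MyParser parser = new MyParser(System.in);
    // Parse with respect to the non-terminal Start by calling its method.
    parser.Start();
  }
}

PARSER_END(MyParser)

// Characters skipped between tokens.
SKIP : { " " | "\t" | "\n" | "\r" }

// One or more decimal digits.
TOKEN : { &lt; NUMBER : (["0"-"9"])+ &gt; }

// Accepts one or more numbers followed by end of file.
void Start() :
{}
{
  ( &lt;NUMBER&gt; )+ &lt;EOF&gt;
}
</pre>
<p>
Any other non-terminal added to this grammar could be used as an entry point in
the same way, by calling its method instead of <code>Start</code>.
</p>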
  193. <p>
  194. The generated token manager provides one public method:
  195. </p>
  196. <pre>
  197. Token getNextToken() throws ParseError;
  198. </pre>
  199. <p>
  200. For more details on how this method may be used, please read
  201. <a href="apiroutines.html">the description of the Java Compiler Compiler
  202. API</a>.
  203. </p>
  204. <hr />
  205. <table>
  206. <tr>
  207. <td align="right" valign="baseline"><a name="prod2">javacc_options</a></td>
  208. <td align="center" valign="baseline">::=</td>
  209. <td align="left" valign="baseline">[ "<a name="options">options</a>" "{" ( <a href="#prod6">option_binding</a> )* "}" ]</td>
  210. </tr>
  211. </table>
  212. <p>
  213. The options section, if present, starts with the reserved word "options" followed
  214. by a list of zero or more option bindings within braces. Each option
  215. binding specifies the setting of one option. The same option may not be
  216. set multiple times.
  217. </p>
  218. <p>
  219. Options may be specified either here in the grammar file, or from
  220. <a href="commandline.html">the command line</a>. If the option is set
  221. from <a href="commandline.html">the command line</a>, that takes precedence.
  222. </p>
  223. <p>
  224. Option names are not case-sensitive.
  225. </p>
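<p>
For illustration, an options section might look like the following sketch. The
particular values, including the directory name "generated", are assumptions
chosen for the example. As the syntax of javacc_input shows, the section must
appear before "PARSER_BEGIN".
</p>
<pre>
options {
  LOOKAHEAD = 2;                    // integer option
  STATIC = false;                   // boolean option
  debug_parser = true;              // option names are not case-sensitive
  OUTPUT_DIRECTORY = "generated";   // string option
}
</pre>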
  226. <hr />
  227. <table>
  228. <tr>
  229. <td align="right" valign="baseline"><a name="prod6">option_binding</a></td>
  230. <td align="center" valign="baseline">::=</td>
  231. <td align="left" valign="baseline">"LOOKAHEAD" "=" <em>java_integer_literal</em> ";"</td>
  232. </tr>
  233. <tr>
  234. <td></td><td align="center" valign="baseline">|</td>
  235. <td align="left" valign="baseline">"CHOICE_AMBIGUITY_CHECK" "=" <em>java_integer_literal</em> ";"</td>
  236. </tr>
  237. <tr>
  238. <td></td><td align="center" valign="baseline">|</td>
  239. <td align="left" valign="baseline">"OTHER_AMBIGUITY_CHECK" "=" <em>java_integer_literal</em> ";"</td>
  240. </tr>
  241. <tr>
  242. <td></td><td align="center" valign="baseline">|</td>
  243. <td align="left" valign="baseline">"STATIC" "=" <em>java_boolean_literal</em> ";"</td>
  244. </tr>
  245. <tr>
  246. <td></td><td align="center" valign="baseline">|</td>
  247. <td align="left" valign="baseline">"SUPPORT_CLASS_VISIBILITY_PUBLIC" "=" <em>java_boolean_literal</em> ";"</td>
  248. </tr>
  249. <tr>
  250. <td></td><td align="center" valign="baseline">|</td>
  251. <td align="left" valign="baseline">"DEBUG_PARSER" "=" <em>java_boolean_literal</em> ";"</td>
  252. </tr>
  253. <tr>
  254. <td></td><td align="center" valign="baseline">|</td>
  255. <td align="left" valign="baseline">"DEBUG_LOOKAHEAD" "=" <em>java_boolean_literal</em> ";"</td>
  256. </tr>
  257. <tr>
  258. <td></td><td align="center" valign="baseline">|</td>
  259. <td align="left" valign="baseline">"DEBUG_TOKEN_MANAGER" "=" <em>java_boolean_literal</em> ";"</td>
  260. </tr>
  261. <tr>
  262. <td></td><td align="center" valign="baseline">|</td>
  263. <td align="left" valign="baseline">"ERROR_REPORTING" "=" <em>java_boolean_literal</em> ";"</td>
  264. </tr>
  265. <tr>
  266. <td></td><td align="center" valign="baseline">|</td>
  267. <td align="left" valign="baseline">"JAVA_UNICODE_ESCAPE" "=" <em>java_boolean_literal</em> ";"</td>
  268. </tr>
  269. <tr>
  270. <td></td><td align="center" valign="baseline">|</td>
  271. <td align="left" valign="baseline">"UNICODE_INPUT" "=" <em>java_boolean_literal</em> ";"</td>
  272. </tr>
  273. <tr>
  274. <td></td><td align="center" valign="baseline">|</td>
  275. <td align="left" valign="baseline">"IGNORE_CASE" "=" <em>java_boolean_literal</em> ";"</td>
  276. </tr>
  277. <tr>
  278. <td></td><td align="center" valign="baseline">|</td>
  279. <td align="left" valign="baseline">"USER_TOKEN_MANAGER" "=" <em>java_boolean_literal</em> ";"</td>
  280. </tr>
  281. <tr>
  282. <td></td><td align="center" valign="baseline">|</td>
  283. <td align="left" valign="baseline">"USER_CHAR_STREAM" "=" <em>java_boolean_literal</em> ";"</td>
  284. </tr>
  285. <tr>
  286. <td></td><td align="center" valign="baseline">|</td>
  287. <td align="left" valign="baseline">"BUILD_PARSER" "=" <em>java_boolean_literal</em> ";"</td>
  288. </tr>
  289. <tr>
  290. <td></td><td align="center" valign="baseline">|</td>
  291. <td align="left" valign="baseline">"BUILD_TOKEN_MANAGER" "=" <em>java_boolean_literal</em> ";"</td>
  292. </tr>
  293. <tr>
  294. <td></td><td align="center" valign="baseline">|</td>
  295. <td align="left" valign="baseline">"TOKEN_EXTENDS" "=" <em>java_string_literal</em> ";"</td>
  296. </tr>
  297. <tr>
  298. <td></td><td align="center" valign="baseline">|</td>
  299. <td align="left" valign="baseline">"TOKEN_FACTORY" "=" <em>java_string_literal</em> ";"</td>
  300. </tr>
  301. <tr>
  302. <td></td><td align="center" valign="baseline">|</td>
  303. <td align="left" valign="baseline">"TOKEN_MANAGER_USES_PARSER" "=" <em>java_boolean_literal</em> ";"</td>
  304. </tr>
  305. <tr>
  306. <td></td><td align="center" valign="baseline">|</td>
  307. <td align="left" valign="baseline">"SANITY_CHECK" "=" <em>java_boolean_literal</em> ";"</td>
  308. </tr>
  309. <tr>
  310. <td></td><td align="center" valign="baseline">|</td>
  311. <td align="left" valign="baseline">"FORCE_LA_CHECK" "=" <em>java_boolean_literal</em> ";"</td>
  312. </tr>
  313. <tr>
  314. <td></td><td align="center" valign="baseline">|</td>
  315. <td align="left" valign="baseline">"COMMON_TOKEN_ACTION" "=" <em>java_boolean_literal</em> ";"</td>
  316. </tr>
  317. <tr>
  318. <td></td><td align="center" valign="baseline">|</td>
  319. <td align="left" valign="baseline">"CACHE_TOKENS" "=" <em>java_boolean_literal</em> ";"</td>
  320. </tr>
  321. <tr>
  322. <td></td><td align="center" valign="baseline">|</td>
  323. <td align="left" valign="baseline">"OUTPUT_DIRECTORY" "=" <em>java_string_literal</em> ";"</td>
  324. </tr>
  325. </table>
  326. <ul>
  327. <li>
  328. <strong><a name="LOOKAHEAD">LOOKAHEAD</a>:</strong>
  329. The number of tokens to look ahead before making a
  330. decision at a choice point during parsing. The default value is 1.
  331. The smaller this number, the faster the parser. This number may be
  332. overridden for specific productions within the grammar as described
  333. later. See the description of
  334. <a href="lookahead.html">the lookahead algorithm</a> for complete
  335. details on how lookahead works.
  336. </li>
  337. <li>
  338. <strong>CHOICE_AMBIGUITY_CHECK:</strong>
  339. This is an integer option whose default value is 2.
  340. This is the number of tokens considered in checking choices of the
  341. form "A | B | ..." for ambiguity. For example, if there is a common
  342. two token prefix for both A and B, but no common three token prefix,
  343. (assume this option is set to 3) then JavaCC can tell you to use a
  344. lookahead of 3 for disambiguation purposes. And if A and B have a
  345. common three token prefix, then JavaCC can only tell you that you need to
  346. have a lookahead of 3 <em>or more</em>. Increasing this can give you more
  347. comprehensive ambiguity information at the cost of more processing
  348. time. For large grammars such as the Java grammar, increasing this number
  349. any further causes the checking to take too much time.
  350. </li>
  351. <li>
  352. <strong>OTHER_AMBIGUITY_CHECK:</strong>
  353. This is an integer option whose default value is 1.
  354. This is the number of tokens considered in checking all other kinds of
  355. choices (i.e., of the forms "(A)*", "(A)+", and "(A)?") for ambiguity.
  356. This takes more time to do than the choice checking, and hence the
  357. default value is set to 1 rather than 2.
  358. </li>
  359. <li>
  360. <strong>STATIC:</strong>
  361. This is a boolean option whose default value is true. If
  362. true, all methods and class variables are specified as static in the
  363. generated parser and token manager. This allows only one parser object to be present,
  364. but it improves the performance of the parser. To perform multiple
  365. parses during one run of your Java program, you will have to call the
  366. <a href="apiroutines.html">ReInit()</a>
  367. method to reinitialize your parser if it is static.
  368. If the parser is non-static, you may use the "new" operator to
  369. construct as many parsers as you wish. These can all be used
  370. simultaneously from different threads.
  371. </li>
  372. <li>
  373. <strong>DEBUG_PARSER:</strong>
  374. This is a boolean option whose default value is false. This
  375. option is used to obtain debugging information from the generated
  376. parser. Setting this option to true causes the parser to generate
  377. a trace of its actions. Tracing may be disabled by
  378. calling the method <a href="apiroutines.html">disable_tracing()</a>
  379. in the generated parser class. Tracing may be subsequently enabled
  380. by calling the method <a href="apiroutines.html">enable_tracing()</a>
  381. in the generated parser class.
  382. </li>
  383. <li>
  384. <strong>DEBUG_LOOKAHEAD:</strong>
  385. This is a boolean option whose default value is false. Setting this
  386. option to true causes the parser to generate all the tracing information
  387. it does when the option DEBUG_PARSER is true, and in addition, also
  388. causes it to generate a trace of actions performed during
  389. <a href="lookahead.html">lookahead operation</a>.
  390. </li>
  391. <li>
  392. <strong>DEBUG_TOKEN_MANAGER:</strong>
  393. This is a boolean option whose default value is false. This
  394. option is used to obtain debugging information from the generated
  395. token manager. Setting this option to true causes the token manager to generate
  396. a trace of its actions. This trace is rather large and should only
  397. be used when you have a lexical error that has been reported to you
  398. and you cannot understand why. Typically, in this situation, you
  399. can determine the problem by looking at the last few lines of this trace.
  400. </li>
  401. <li>
  402. <strong>ERROR_REPORTING:</strong>
  403. This is a boolean option whose default value is
  404. true. Setting it to false causes parse errors to be
  405. reported in somewhat less detail. The only reason to set this
  406. option to false is to improve performance.
  407. </li>
  408. <li>
  409. <strong>JAVA_UNICODE_ESCAPE:</strong>
  410. This is a boolean option whose default value is
  411. false. When set to true, the generated parser uses
  412. an input stream object that processes Java Unicode escapes
  413. (\u...) before sending characters to the token manager. By
  414. default, Java Unicode escapes are not processed.
  415. <br />
  416. This option is ignored if either of options USER_TOKEN_MANAGER,
  417. USER_CHAR_STREAM is set to true.
  418. </li>
  419. <li>
  420. <strong>UNICODE_INPUT:</strong>
  421. This is a boolean option whose default value is
  422. false. When set to true, the generated parser uses
  423. an input stream object that reads Unicode files. By default,
  424. ASCII files are assumed.
  425. <br />
  426. This option is ignored if either of
  427. options USER_TOKEN_MANAGER, USER_CHAR_STREAM is set to true.
  428. </li>
  429. <li>
  430. <strong><a name="IGNORE_CASE">IGNORE_CASE:</a></strong>
  431. This is a boolean option whose default value is false.
  432. Setting this option to true causes the generated token manager to ignore
  433. case in the token specifications and the input files. This is useful
  434. for writing grammars for languages such as HTML. It is also possible
  435. to localize the effect of IGNORE_CASE by using
  436. <a href="#prod10">an alternate mechanism described later</a>.
  437. </li>
  438. <li>
  439. <strong>USER_TOKEN_MANAGER:</strong>
  440. This is a boolean option whose default value is
  441. false. The default action is to generate a token manager
  442. that works on the specified grammar tokens. If this
  443. option is set to true, then the parser is generated to accept tokens
  444. from any token manager of type "TokenManager" - this interface
  445. is generated into the generated parser directory.
  446. </li>
  447. <li>
  448. <strong>SUPPORT_CLASS_VISIBILITY_PUBLIC:</strong>
  449. This is a boolean option whose default value is
  450. true. The default action is to generate support classes (such as
  451. Token.java, ParseException.java, etc.) with <em>public</em> visibility. If
  452. set to false, the classes will be generated with package-private
  453. visibility.
  454. </li>
  455. <li>
  456. <strong>USER_CHAR_STREAM:</strong>
  457. This is a boolean option whose default value is
  458. false. The default action is to generate a character stream reader
  459. as specified by the options JAVA_UNICODE_ESCAPE and UNICODE_INPUT.
  460. The generated token manager receives characters
  461. from this stream reader. If this option is set to true, then the
  462. token manager is generated to read characters from any character
  463. stream reader of type "CharStream". The file "CharStream.java" is generated
  464. into the generated parser directory.
  465. <br />
  466. This option is ignored if USER_TOKEN_MANAGER is set to true.
  467. </li>
  468. <li>
  469. <strong>BUILD_PARSER:</strong>
  470. This is a boolean option whose default value is true.
  471. The default action is to generate the parser file ("MyParser.java"
  472. in the above example). When set to false, the parser file is
  473. not generated. Typically, this option is set to false when
  474. you wish to generate only the token manager and use it without
  475. the associated parser.
  476. </li>
  477. <li>
  478. <strong>BUILD_TOKEN_MANAGER:</strong>
  479. This is a boolean option whose default value is true.
  480. The default action is to generate the token manager file
  481. ("MyParserTokenManager.java" in the above example). When set to
  482. false, the token manager file is not generated. The only reason
  483. to set this option to false is to save some time during parser
  484. generation when you fix problems in the parser part of the grammar
  485. file and leave the lexical specifications untouched.
  486. </li>
  487. <li>
  488. <strong>TOKEN_MANAGER_USES_PARSER:</strong>
  489. This is a boolean option whose default value is false.
  490. When set to true, the generated token manager will include a field
  491. called <code>parser</code> that references the instantiating parser
  492. instance (of type <code>MyParser</code> in the above example).
  493. The main reason for having a parser in a token manager is using
  494. some of its logic in lexical actions.
  495. This option has no effect if the STATIC option is set to true.
  496. </li>
  497. <li>
  498. <strong>TOKEN_EXTENDS:</strong>
  499. This is a string option whose default value is
  500. "", meaning that the generated Token class will extend
  501. java.lang.Object. This option may be set to the name of a
  502. class that will be used as the base class for the generated
  503. <code>Token</code> class.
  504. </li>
  505. <li>
  506. <strong>TOKEN_FACTORY:</strong>
  507. This is a string option whose default value is
  508. "", meaning that Tokens will be created by calling
  509. <code>Token.newToken()</code>. If set, the option names a
  510. Token factory class containing a
  511. <code>public static Token newToken(int ofKind, String image)</code>
  512. method.
  513. </li>
  514. <li>
  515. <strong>SANITY_CHECK:</strong>
  516. This is a boolean option whose default value is true.
  517. JavaCC performs many syntactic and semantic checks on the grammar
  518. file during parser generation. Some checks such as detection of
  519. left recursion, detection of ambiguity, and bad usage of empty
  520. expansions may be suppressed for faster parser generation by
  521. setting this option to false. Note that the presence of these
  522. errors (even if they are not detected and reported by setting this
  523. option to false) can cause unexpected behavior from the generated
  524. parser.
  525. </li>
  526. <li>
  527. <strong>FORCE_LA_CHECK:</strong>
  528. This is a boolean option whose default value is false.
  529. This option controls lookahead ambiguity checking performed
  530. by JavaCC. By default (when this option is false), lookahead
  531. ambiguity checking is performed for all choice points where the
  532. default lookahead of 1 is used. Lookahead ambiguity checking is
  533. not performed at choice points where there is an
  534. <a href="lookahead.html">explicit lookahead specification</a>,
  535. or if the option LOOKAHEAD is set to something other than 1.
  536. Setting this option to true performs lookahead ambiguity checking
  537. at <em>all</em> choice points regardless of the lookahead specifications
  538. in the grammar file.
  539. </li>
  540. <li>
  541. <strong>COMMON_TOKEN_ACTION:</strong>
  542. This is a boolean option whose default value is false.
  543. When set to true, every call to the token manager's method
  544. "getNextToken" (<a href="apiroutines.html">see the description of the
  545. Java Compiler Compiler API</a>) will cause a call to a user-defined
  546. method "CommonTokenAction" after the token has been scanned in by the
  547. token manager. The user must define this method within the
  548. <a href="#prod12">TOKEN_MGR_DECLS</a> section.
  549. The signature of this method is:
  550. <pre>
  551. void CommonTokenAction(Token t)
  552. </pre>
  553. </li>
  554. <li>
  555. <strong>CACHE_TOKENS:</strong>
  556. This is a boolean option whose default value is false.
  557. Setting this option to true causes the generated parser to look ahead for
  558. extra tokens ahead of time. This facilitates some performance improvements.
  559. However, in this case (when the option is true), interactive
  560. applications may not work since the parser needs to work synchronously
  561. with the availability of tokens from the input stream. In such cases,
  562. it's best to leave this option at its default value.
  563. </li>
  564. <li>
  565. <strong>OUTPUT_DIRECTORY:</strong>
  566. This is a string option whose default value is the current
  567. directory. This controls where output files are generated.
  568. </li>
  569. </ul>
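<p>
The situation described under CHOICE_AMBIGUITY_CHECK can be sketched as follows.
The two alternatives share the two-token prefix <code>&lt;ID&gt; "("</code> but have
no common three-token prefix, so a lookahead of 3 on the first alternative is
enough to choose between them. The production name <code>Call</code> and the token
<code>ID</code> are hypothetical, and <code>ID</code> is assumed to be defined
elsewhere in the grammar.
</p>
<pre>
void Call() :
{}
{
  // The third token is ")" here and &lt;ID&gt; in the other alternative.
  LOOKAHEAD(3) &lt;ID&gt; "(" ")"
|
  &lt;ID&gt; "(" &lt;ID&gt; ")"
}
</pre>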
  570. <hr />
  571. <table>
  572. <tr>
  573. <td align="right" valign="baseline"><a name="prod5">production</a></td>
  574. <td align="center" valign="baseline">::=</td>
  575. <td align="left" valign="baseline"><a href="#prod9">javacode_production</a></td>
  576. </tr>
  577. <tr>
  578. <td align="right" valign="baseline"></td>
  579. <td align="center" valign="baseline">|</td>
  580. <td align="left" valign="baseline"><a href="#prod10">regular_expr_production</a></td>
  581. </tr>
  582. <tr>
  583. <td align="right" valign="baseline"></td>
  584. <td align="center" valign="baseline">|</td>
  585. <td align="left" valign="baseline"><a href="#prod11">bnf_production</a></td>
  586. </tr>
  587. <tr>
  588. <td align="right" valign="baseline"></td>
  589. <td align="center" valign="baseline">|</td>
  590. <td align="left" valign="baseline"><a href="#prod12">token_manager_decls</a></td>
  591. </tr>
  592. </table>
  593. <p>
  594. There are four kinds of productions in JavaCC.
  595. <a href="#prod9">javacode_production</a> and <a href="#prod11">bnf_production</a>
  596. are used to define the grammar from which the parser is generated.
  597. <a href="#prod10">regular_expr_production</a> is used to define the grammar
  598. tokens - the token manager is generated from this information (as well as from
  599. inline token specifications in the parser grammar).
  600. <a href="#prod12">token_manager_decls</a> is used to introduce declarations
  601. that get inserted into the generated token manager.
  602. </p>
  603. <hr />
  604. <table>
  605. <tr>
  606. <td align="right" valign="baseline"><a name="prod9">javacode_production</a></td>
  607. <td align="center" valign="baseline">::=</td>
  608. <td align="left" valign="baseline">"<a name="JAVACODE">JAVACODE</a>"</td>
  609. </tr>
  610. <tr>
  611. <td></td><td></td>
  612. <td align="left" valign="baseline"><em>java_access_modifier</em> <em>java_return_type</em> <em>java_identifier</em> "(" <em>java_parameter_list</em> ")"</td>
  613. </tr>
  614. <tr>
  615. <td></td><td></td>
  616. <td align="left" valign="baseline"><em>java_block</em></td>
  617. </tr>
  618. </table>
  619. <p>
  620. The JAVACODE production is a way to write Java code for some
  621. productions instead of the usual EBNF expansion. This is useful when
  622. there is the need to recognize something that is not context-free
  623. or for whatever reason is very difficult to write a grammar for.
  624. An example of the use of JAVACODE is shown below. In this example,
  625. the non-terminal "skip_to_matching_brace" consumes tokens in the input
  626. stream all the way up to a matching closing brace (the opening brace
  627. is assumed to have been just scanned):
  628. </p>
  629. <pre>
  630. JAVACODE
  631. void skip_to_matching_brace() {
  632. <a href="apiroutines.html">Token</a> tok;
  633. int nesting = 1;
  634. while (true) {
  635. tok = <a href="apiroutines.html">getToken</a>(1);
  636. if (tok.kind == LBRACE) nesting++;
  637. if (tok.kind == RBRACE) {
  638. nesting--;
  639. if (nesting == 0) break;
  640. }
  641. tok = <a href="apiroutines.html">getNextToken</a>();
  642. }
  643. }
  644. </pre>
  645. <p>
  646. Care must be taken when using JAVACODE productions. While you can
  647. say pretty much what you want with these productions, JavaCC simply
  648. considers each one a black box (that somehow performs its parsing task).
  649. This becomes a problem when JAVACODE productions appear at
  650. <a href="lookahead.html">choice points</a>. For example, if the
  651. above JAVACODE production was referred to from the following production:
  652. </p>
  653. <pre>
  654. void NT() :
  655. {}
  656. {
  657. skip_to_matching_brace()
  658. |
  659. some_other_production()
  660. }
  661. </pre>
  662. <p>
  663. Then JavaCC would not know how to choose between the two choices.
  664. On the other hand, if the JAVACODE production is used at a non-choice
  665. point as in the following example, there is no problem:
  666. </p>
  667. <pre>
  668. void NT() :
  669. {}
  670. {
  671. "{" skip_to_matching_brace()
  672. |
  673. "(" parameter_list() ")"
  674. }
  675. </pre>
  676. <p>
  677. JAVACODE productions at choice points may also be preceded by syntactic or
  678. semantic LOOKAHEAD, as in this example:
  679. </p>
  680. <pre>
  681. void NT() :
  682. {}
  683. {
  684. LOOKAHEAD( {errorOccurred} ) skip_to_matching_brace()
  685. |
  686. "(" parameter_list() ")"
  687. }
  688. </pre>
  689. <!-- JavaCC *should* print a warning message, but currently doesn't (see issue 166).
  690. <p>
  691. When this issue is fixed we should reinstate this paragraph.
  692. When JAVACODE productions are used at choice points, JavaCC will
  693. print a warning message stating this fact. You will then have to
  694. insert some explicit LOOKAHEAD specifications to help JavaCC. See
  695. <a href="lookahead.html">the minitutorial on LOOKAHEAD</a> for a
  696. detailed guide on such issues.
  697. </p>
  698. -->
  699. <p>
  700. The default access modifier for JAVACODE productions is package private.
  701. </p>
  702. <hr />
  703. <table>
  704. <tr>
  705. <td align="right" valign="baseline"><a name="prod11">bnf_production</a></td>
  706. <td align="center" valign="baseline">::=</td>
  707. <td align="left" valign="baseline"><em>java_access_modifier</em> <em>java_return_type</em> <em>java_identifier</em> "(" <em>java_parameter_list</em> ")" ":"</td>
  708. </tr>
  709. <tr>
  710. <td></td><td></td>
  711. <td align="left" valign="baseline"><em>java_block</em></td>
  712. </tr>
  713. <tr>
  714. <td></td><td></td>
  715. <td align="left" valign="baseline">"{" <a href="#prod16">expansion_choices</a> "}"</td>
  716. </tr>
  717. </table>
  718. <p>
  719. The BNF production is the standard production used
  720. in specifying JavaCC grammars. Each BNF production has a left hand
  721. side which is a non-terminal specification. The BNF production then
  722. defines this non-terminal in terms of BNF expansions on the right hand
  723. side. The non-terminal is written exactly like a declared Java method.
  724. Since each non-terminal is translated into a method
  725. in the generated parser, this style of writing the non-terminal makes
  726. this association obvious. The name of the non-terminal is the name of
  727. the method, and the parameters and return value declared are the means
  728. to pass values up and down the parse tree. As will be seen later,
  729. non-terminals on the right hand sides of productions are written as
  730. method calls, so the passing of values up and down the tree is done
  731. using exactly the same paradigm as method call and return.
  732. The default access modifier for BNF productions is public.
  733. </p>
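<p>
The method-call paradigm can be illustrated with a small hand-written parser.
This is a sketch only, assuming a pre-split array of token strings; it is not
code generated by JavaCC, and the names SumParserSketch, sum and number are
hypothetical. It shows one value passed down as a parameter and one value passed
back up as a return value.
</p>

```java
// Hand-written sketch, not JavaCC output: each non-terminal is a method, and
// values travel down as parameters and up as return values.
final class SumParserSketch {
    private final String[] tokens;
    private int pos = 0;

    SumParserSketch(String... tokens) {
        this.tokens = tokens;
    }

    // sum(scale) ::= number() ( "+" number() )*
    // `scale` is passed down the tree; the scaled total is passed back up.
    int sum(int scale) {
        int total = number();
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;                  // consume "+"
            total += number();      // value returned by the nested non-terminal
        }
        return total * scale;
    }

    // number() ::= a single numeric token
    int number() {
        return Integer.parseInt(tokens[pos++]);
    }
}
```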
  734. <p>
  735. There are two parts on the right hand side of a BNF production. The
  736. first part is a set of arbitrary Java declarations and code (the Java
  737. block). This code is generated at the beginning
  738. of the method generated for the Java non-terminal. Hence, every time
  739. this non-terminal is used in the parsing process, these declarations and
  740. code are executed. The declarations in this part are visible to all Java
  741. code in actions in the BNF expansions. JavaCC does not do any processing
  742. of these declarations and code, except to skip to the matching ending
  743. brace, collecting all text encountered on the way. Hence, a Java compiler
  744. may still detect errors in this code after JavaCC has processed it.
  745. </p>
  746. <p>
  747. The second part of the right hand side is the set of BNF expansions. These
  748. are described <a href="#prod16">later</a>.
  749. </p>
  750. <hr />
  751. <table>
  752. <tr>
  753. <td align="right" valign="baseline"><a name="prod10">regular_expr_production</a></td>
  754. <td align="center" valign="baseline">::=</td>
  755. <td align="left" valign="baseline">[ <a href="#newprod1">lexical_state_list</a> ]</td>
  756. </tr>
  757. <tr>
  758. <td></td><td></td>
  759. <td align="left" valign="baseline"><a href="#prod17">regexpr_kind</a> [ "[" "IGNORE_CASE" "]" ] ":"</td>
  760. </tr>
  761. <tr>
  762. <td></td><td></td>
  763. <td align="left" valign="baseline">"{" <a href="#prod18">regexpr_spec</a> ( "|" <a href="#prod18">regexpr_spec</a> )* "}"</td>
  764. </tr>
  765. </table>
  766. <p>
  767. A regular expression production is used to define lexical entities
  768. that get processed by the generated token manager. A detailed description
  769. of how the token manager works is provided in
  770. <a href="tokenmanager.html">this minitutorial (click here)</a>. This
  771. page describes the syntactic aspects of specifying lexical entities,
  772. while <a href="tokenmanager.html">the minitutorial</a> describes how
  773. these syntactic constructs tie in with how the token manager actually
  774. works.
  775. </p>
  776. <p>
  777. A regular expression production starts with a specification of the
  778. lexical states to which it applies (the
  779. <a href="#newprod1">lexical state list</a>).
  780. There is a standard lexical state called "DEFAULT". If the
  781. <a href="#newprod1">lexical state list</a> is omitted, the regular
  782. expression production applies to the lexical state "DEFAULT".
  783. </p>
  784. <p>
  785. Following this is a description of what kind of regular expression
  786. production this is (<a href="#prod17">see below for what this means</a>).
  787. </p>
  788. <p>
  789. After this is an optional "[IGNORE_CASE]". If this is present, the
  790. regular expression production is case insensitive - it has the same
  791. effect as the
  792. <a href="#prod6">IGNORE_CASE</a>
  793. option, except that in this case it applies locally to this regular
  794. expression production.
  795. </p>
  796. <p>
  797. This is then followed by a list of regular expression specifications
  798. that describe in more detail the lexical entities of this regular
  799. expression production.
  800. </p>
  801. <hr />
  802. <table>
  803. <tr>
  804. <td align="right" valign="baseline"><a name="prod12">token_manager_decls</a></td>
  805. <td align="center" valign="baseline">::=</td>
  806. <td align="left" valign="baseline">"<a name="TOKEN_MGR_DECLS">TOKEN_MGR_DECLS</a>" ":" <em>java_block</em></td>
  807. </tr>
  808. </table>
  809. <p>
  810. The token manager declarations start with the reserved word
  811. "TOKEN_MGR_DECLS" followed by a ":" and then a set of Java declarations
  812. and statements (the Java block). These declarations and statements are
  813. written into the generated token manager and are accessible from within
  814. <a href="#prod18">lexical actions</a>. See
  815. <a href="tokenmanager.html">the minitutorial on the token manager</a>
  816. for more details.
  817. </p>
  818. <p>
  819. There can only be one token manager declaration in a JavaCC grammar file.
  820. </p>
  821. <hr />
  822. <table>
  823. <tr>
  824. <td align="right" valign="baseline"><a name="newprod1">lexical_state_list</a></td>
  825. <td align="center" valign="baseline">::=</td>
  826. <td align="left" valign="baseline">"&lt;" "*" "&gt;"</td>
  827. </tr>
  828. <tr>
  829. <td></td><td>|</td>
  830. <td align="left" valign="baseline">"&lt;" <em>java_identifier</em> ( "," <em>java_identifier</em> )* "&gt;"</td>
  831. </tr>
  832. </table>
  833. <p>
  834. The lexical state list describes the set of lexical states to which
  835. the corresponding <a href="#prod10">regular expression production</a>
  836. applies. If this is written as "&lt;*&gt;", the regular expression production
  837. applies to all lexical states. Otherwise, it applies to all the lexical
  838. states in the identifier list within the angle brackets.
  839. </p>
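<p>
The applicability rules can be summarized in a few lines of Java. This is a
sketch of the rules as stated on this page, not of the token manager's actual
implementation; the class LexicalStateListSketch and its representation of the
list (null when omitted, a single "*" entry for "&lt;*&gt;") are assumptions made
for illustration.
</p>

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of when a regular expression production applies.
final class LexicalStateListSketch {
    // `stateList` models the list as written in the grammar: null when it is
    // omitted, ["*"] for "<*>", otherwise the identifiers between the brackets.
    static boolean appliesTo(List<String> stateList, String currentState) {
        if (stateList == null) {
            return currentState.equals("DEFAULT");   // omitted list means DEFAULT only
        }
        if (stateList.equals(Arrays.asList("*"))) {
            return true;                             // "<*>" covers every lexical state
        }
        return stateList.contains(currentState);     // explicit identifier list
    }
}
```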
  840. <hr />
  841. <table>
  842. <tr>
  843. <td align="right" valign="baseline"><a name="prod17">regexpr_kind</a></td>
  844. <td align="center" valign="baseline">::=</td>
  845. <td align="left" valign="baseline">"TOKEN"</td>
  846. </tr>
  847. <tr>
  848. <td align="right" valign="baseline"></td>
  849. <td align="center" valign="baseline">|</td>
  850. <td align="left" valign="baseline">"SPECIAL_TOKEN"</td>
  851. </tr>
  852. <tr>
  853. <td align="right" valign="baseline"></td>
  854. <td align="center" valign="baseline">|</td>
  855. <td align="left" valign="baseline">"SKIP"</td>
  856. </tr>
  857. <tr>
  858. <td align="right" valign="baseline"></td>
  859. <td align="center" valign="baseline">|</td>
  860. <td align="left" valign="baseline">"MORE"</td>
  861. </tr>
  862. </table>
  863. <p>
  864. This specifies the kind of
  865. <a href="#prod10">regular expression production</a>.
  866. There are four kinds:
  867. </p>
  868. <ul>
  869. <li>
  870. <strong><a name="TOKEN">TOKEN</a></strong>:
  871. The regular expressions in this regular expression production describe
  872. <em>tokens</em> in the grammar. The token manager creates a
  873. <a href="apiroutines.html">Token</a> object for each match of such
  874. a regular expression and returns it to the parser.
  875. </li>
  876. <li>
  877. <strong><a name="SPECIAL_TOKEN">SPECIAL_TOKEN</a></strong>:
  878. The regular expressions in this regular expression production describe
  879. <em>special tokens</em>. Special tokens are like tokens, except that
  880. they do not have significance during parsing - that is, the BNF productions
  881. ignore them. Special tokens are, however, still passed on to the parser
  882. so that parser actions can access them. Special tokens are passed
  883. to the parser by linking them to neighboring real tokens using the
  884. field "specialToken" in the <a href="apiroutines.html">Token</a>
  885. class. Special tokens are useful in the processing of lexical entities
  886. such as comments which have no significance to parsing, but still
  887. are an important part of the input file. See
  888. <a href="tokenmanager.html">the minitutorial on the token manager</a>
  889. for more details of special token handling.
  890. </li>
  891. <li>
  892. <strong><a name="SKIP">SKIP</a></strong>:
  893. Matches to regular expressions in this regular expression production
  894. are simply skipped (ignored) by the token manager.
  895. </li>
  896. <li>
  897. <strong><a name="MORE">MORE</a></strong>:
  898. Sometimes it is useful to gradually build up a token to be passed on
  899. to the parser. Matches to this kind of regular expression are stored
  900. in a buffer until the next TOKEN or SPECIAL_TOKEN match. Then all
  901. the matches in the buffer and the final TOKEN/SPECIAL_TOKEN match
  902. are concatenated together to form one TOKEN/SPECIAL_TOKEN that is
  903. passed on to the parser. If a match to a SKIP regular expression
  904. follows a sequence of MORE matches, the contents of the buffer are
  905. discarded.
  906. </li>
  907. </ul>
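<p>
The interplay of the four kinds can be modelled with a small buffer-based
sketch. It follows only the behaviour described in the list above and is not the
generated token manager. The class RegexprKindSketch is hypothetical, and for
simplicity it assumes that pending special tokens are attached to the next
regular token, represented here as plain strings rather than
<a href="apiroutines.html">Token</a> objects.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how matches of each kind are handled.
final class RegexprKindSketch {
    enum Kind { TOKEN, SPECIAL_TOKEN, SKIP, MORE }

    private final StringBuilder buffer = new StringBuilder();        // pending MORE matches
    private final List<String> pendingSpecials = new ArrayList<>();  // special tokens not yet attached
    final List<String> delivered = new ArrayList<>();                // what the parser receives

    // Feed one match (its kind and the matched text) to the model.
    void match(Kind kind, String image) {
        switch (kind) {
            case MORE:
                buffer.append(image);            // keep building the current token
                break;
            case SKIP:
                buffer.setLength(0);             // a SKIP match discards buffered MORE text
                break;
            case SPECIAL_TOKEN:
                pendingSpecials.add(buffer + image);
                buffer.setLength(0);
                break;
            case TOKEN:
                delivered.add(buffer + image + " specials=" + pendingSpecials);
                buffer.setLength(0);
                pendingSpecials.clear();
                break;
        }
    }
}
```

<p>
Feeding the model two MORE matches followed by a SPECIAL_TOKEN match yields one
concatenated special token, which then travels with the next TOKEN match.
</p>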
  908. <hr />
  909. <table>
  910. <tr>
  911. <td align="right" valign="baseline"><a name="prod18">regexpr_spec</a></td>
  912. <td align="center" valign="baseline">::=</td>
  913. <td align="left" valign="baseline"><a href="#prod19">regular_expression</a> [ <em>java_block</em> ] [ ":" <em>java_identifier</em> ]</td>
  914. </tr>
  915. </table>
  916. <p>
  917. The regular expression specification begins the actual description
  918. of the lexical entities that are part of this
  919. <a href="#prod10">regular expression production</a>.
  920. Each regular expression production may contain any number of
  921. regular expression specifications.
  922. </p>
  923. <p>
  924. Each regular expression specification contains a regular expression
  925. followed by a Java block (the lexical action) which is optional.
  926. This is then followed by an identifier of a lexical state (which
  927. is also optional). Whenever this regular expression is matched,
  928. the lexical action (if any) gets executed, followed by any
  929. <a href="#prod6">common token actions</a>. Then the action depending
  930. on the
  931. <a href="#prod17">regular expression production kind</a>
  932. is taken. Finally, if a lexical state is specified, the token
  933. manager moves to that lexical state for further processing (the
  934. token manager starts initially in the state "DEFAULT").
  935. </p>
  936. <hr />
  937. <table>
  938. <tr>
  939. <td align="right" valign="baseline"><a name="prod16">expansion_choices</a></td>
  940. <td align="center" valign="baseline">::=</td>
  941. <td align="left" valign="baseline"><a href="#prod20">expansion</a> ( "|" <a href="#prod20">expansion</a> )*</td>
  942. </tr>
  943. </table>
  944. <p>
  945. Expansion choices are written as a list of one or more expansions
  946. separated by "|"s. A legal parse of the expansion choices is a legal
  947. parse of any one of the contained expansions.
  948. </p>
  949. <hr />
  950. <table>
  951. <tr>
  952. <td align="right" valign="baseline"><a name="prod20">expansion</a></td>
  953. <td align="center" valign="baseline">::=</td>
  954. <td align="left" valign="baseline">( <a href="#prod22">expansion_unit</a> )*</td>
  955. </tr>
  956. </table>
  957. <p>
  958. An expansion is written as a sequence of expansion units.
  959. A concatenation of legal
  960. parses of the expansion units is a legal parse of the expansion.
  961. </p>
  962. <p>
  963. For example, the expansion "{" decls() "}" consists of three expansion
  964. units - "{", decls(), and "}". A match for the expansion is a concatenation
  965. of the matches for the individual expansion units - in this case, that would
  966. be any string that begins with a "{", ends with a "}", and contains a match
  967. for decls() in between.
  968. </p>
  969. <hr />
  970. <table>
  971. <tr>
  972. <td align="right" valign="baseline"><a name="prod22">expansion_unit</a></td>
  973. <td align="center" valign="baseline">::=</td>
  974. <td align="left" valign="baseline"><a href="#prod21">local_lookahead</a></td>
  975. </tr>
  976. <tr>
  977. <td align="right" valign="baseline"></td>
  978. <td align="center" valign="baseline">|</td>
  979. <td align="left" valign="baseline"><em>java_block</em></td>
  980. </tr>
  981. <tr>
  982. <td align="right" valign="baseline"></td>
  983. <td align="center" valign="baseline">|</td>
  984. <td align="left" valign="baseline">"(" <a href="#prod16">expansion_choices</a> ")" [ "+" | "*" | "?" ]</td>
  985. </tr>
  986. <tr>
  987. <td align="right" valign="baseline"></td>
  988. <td align="center" valign="baseline">|</td>
  989. <td align="left" valign="baseline">"[" <a href="#prod16">expansion_choices</a> "]"</td>
  990. </tr>
  991. <tr>
  992. <td align="right" valign="baseline"></td>
  993. <td align="center" valign="baseline">|</td>
  994. <td align="left" valign="baseline">[ <em>java_assignment_lhs</em> "=" ] <a href="#prod19">regular_expression</a></td>
  995. </tr>
  996. <tr>
  997. <td align="right" valign="baseline"></td>
  998. <td align="center" valign="baseline">|</td>
  999. <td align="left" valign="baseline">[ <em>java_assignment_lhs</em> "=" ] <em>java_identifier</em> "(" <em>java_expression_list</em> ")"</td>
  1000. </tr>
  1001. </table>
  1002. <p>
  1003. An expansion unit can be a <a href="#prod21">local LOOKAHEAD specification</a>.
  1004. This instructs the
  1005. generated parser on how to make choices at choice points. For details
  1006. on how LOOKAHEAD specifications work and how to write LOOKAHEAD specifications,
  1007. <a href="lookahead.html">click here to visit the minitutorial on LOOKAHEAD</a>.
  1008. </p>
  1009. <p>
  1010. An expansion unit can be a set of Java declarations and code enclosed
  1011. within braces (the Java block). These are also called <em>parser
  1012. actions</em>. This is generated into the method parsing the
  1013. non-terminal at the appropriate location. This block is executed
  1014. whenever the parsing process crosses this point successfully.
  1015. When JavaCC processes the Java block, it does not perform any detailed
  1016. syntax or semantic checking. Hence it is possible that the Java compiler
  1017. will find errors in your actions that have been processed by JavaCC.
  1018. <em>Actions are not executed during
  1019. <a href="lookahead.html">lookahead evaluation</a>.</em>
  1020. </p>
  1021. <p>
  1022. An expansion unit can be a parenthesized set of one or more
  1023. <a href="#prod16">expansion choices</a>. In this case, a legal parse of the expansion
  1024. unit is any legal parse of the nested expansion choices.
  1025. The parenthesized set of expansion choices can be suffixed (optionally) by:
  1026. </p>
  1027. <ul>
  1028. <li>
  1029. <strong>"+":</strong>
  1030. Then any legal parse of the expansion unit is one or more
  1031. repetitions of a legal parse of the parenthesized set of
  1032. expansion choices.
  1033. </li>
  1034. <li>
  1035. <strong>"*":</strong>
  1036. Then any legal parse of the expansion unit is zero or more
  1037. repetitions of a legal parse of the parenthesized set of
  1038. expansion choices.
  1039. </li>
  1040. <li>
  1041. <strong>"?":</strong>
  1042. Then a legal parse of the expansion unit is either the
  1043. empty token sequence or any legal parse of the nested expansion choices.
  1044. An alternate syntax for this construct is to enclose the
  1045. expansion choices within brackets "[...]".
  1046. </li>
  1047. </ul>
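<p>
The three suffixes correspond to familiar control-flow shapes in a hand-written
parser. The sketch below is illustrative only and is not JavaCC output; the
class RepetitionSketch is hypothetical, and the nested expansion is assumed to
be the single token "a" so that the difference between the operators stays
visible.
</p>

```java
// Hypothetical hand-written equivalents of ( "a" )*, ( "a" )+ and ( "a" )?.
// Each method returns how many leading "a" tokens the construct consumes,
// or -1 if the construct cannot match at the start of the input.
final class RepetitionSketch {
    // ( "a" )*  : zero or more repetitions
    static int star(String[] in) {
        int n = 0;
        while (n < in.length && in[n].equals("a")) {
            n++;
        }
        return n;
    }

    // ( "a" )+  : one or more repetitions
    static int plus(String[] in) {
        int n = star(in);
        return n >= 1 ? n : -1;
    }

    // ( "a" )?  or  [ "a" ] : the empty token sequence or one match
    static int optional(String[] in) {
        return (in.length > 0 && in[0].equals("a")) ? 1 : 0;
    }
}
```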
  1048. <p>
  1049. An expansion unit can be a <a href="#prod19">regular expression</a>. Then a legal parse
  1050. of the expansion unit is any token that matches this regular
  1051. expression. When a regular expression is matched, it creates an
  1052. object of type <a href="apiroutines.html">Token</a>. This object
  1053. can be accessed by assigning it to a variable by prefixing the
  1054. regular expression with "variable =". In general, you may have any
  1055. valid Java assignment left-hand side to the left of the "=".
  1056. <em>This assignment is not performed during
  1057. <a href="lookahead.html">lookahead evaluation</a>.</em>
  1058. </p>
  1059. <p>
  1060. An expansion unit can be a non-terminal (the last choice in the syntax
  1061. above). In this case, it takes
  1062. the form of a method call with the non-terminal name used as the
  1063. name of the method. A successful parse of the non-terminal causes
  1064. the parameters placed in the method call to be operated on and a
  1065. value returned (in case the non-terminal was not declared to be
  1066. of type "void"). The return value can be assigned (optionally) to
  1067. a variable by prefixing the method call with "variable =".
  1068. In general, you may have any
  1069. valid Java assignment left-hand side to the left of the "=".
  1070. <em>This assignment is not performed during
  1071. <a href="lookahead.html">lookahead evaluation</a>.</em>
  1072. Non-terminals may not be used in an expansion in a manner that introduces
  1073. left-recursion. JavaCC checks this for you.
  1074. </p>
  1075. <hr />
  1076. <table>
  1077. <tr>
  1078. <td align="right" valign="baseline"><a name="prod21">local_lookahead</a></td>
  1079. <td align="center" valign="baseline">::=</td>
  1080. <td align="left" valign="baseline">"LOOKAHEAD" "(" [ <em>java_integer_literal</em> ] [ "," ] [ <a href="#prod16">expansion_choices</a> ] [ "," ] [ "{" <em>java_expression</em> "}" ] ")"</td>
  1081. </tr>
  1082. </table>
  1083. <p>
  1084. A local lookahead specification is used to influence the way the generated
  1085. parser makes choices at the various
  1086. <a href="lookahead.html">choice points</a>
  1087. in the grammar. A local lookahead specification starts with the reserved
  1088. word "LOOKAHEAD" followed by a set of lookahead constraints within parentheses.
  1089. There are three different kinds of lookahead constraints - a lookahead limit
  1090. (the integer literal), a syntactic lookahead (the expansion choices), and
  1091. a semantic lookahead (the expression within braces). At least one lookahead
  1092. constraint must be present. If more than one lookahead constraint is present,
  1093. they must be separated by commas.
  1094. </p>
  1095. <p>
  1096. For a detailed description of how lookahead works, please
  1097. <a href="lookahead.html">click here to visit the minitutorial on LOOKAHEAD</a>.
  1098. A brief description of each kind of lookahead constraint is given below:
  1099. </p>
  1100. <ul>
  1101. <li>
  1102. <strong>Lookahead Limit:</strong>
  1103. This is the maximum number of tokens of lookahead that may be used for choice
  1104. determination purposes. This overrides the default value which is specified
  1105. by the <a href="#prod2">LOOKAHEAD option</a>. This lookahead limit applies
  1106. only to the <a href="lookahead.html">choice point</a>
  1107. at the location of the local lookahead specification.
  1108. If the local lookahead specification is not at a choice point, the lookahead
  1109. limit (if any) is ignored.
  1110. </li>
  1111. <li>
  1112. <strong>Syntactic Lookahead:</strong>
  1113. This is an expansion (or expansion choices) that is used for the purpose of
  1114. determining whether or not the particular choice that this local lookahead
  1115. specification applies to is to be taken. If this is not provided, the parser
  1116. uses the expansion to be selected as the syntactic lookahead.
  1117. If the local lookahead specification is not at a
  1118. <a href="lookahead.html">choice point</a>, the syntactic
  1119. lookahead (if any) is ignored.
  1120. </li>
  1121. <li>
  1122. <strong>Semantic Lookahead:</strong>
  1123. This is a boolean expression that is evaluated whenever the parser crosses this
  1124. point during parsing. If the expression evaluates to true, the parsing
  1125. continues normally. If the expression evaluates to false and the local
  1126. lookahead specification is at a <a href="lookahead.html">choice point</a>,
  1127. the current choice is not taken and the next choice is considered.
  1128. If the expression evaluates to false and the local lookahead specification
  1129. is <em>not</em> at a choice point, then parsing aborts with a parse error.
  1130. Unlike the other two lookahead constraints that are ignored at non-choice
  1131. points, semantic lookahead is always evaluated. In fact, semantic lookahead
  1132. is even evaluated if it is encountered during the evaluation of some other
  1133. syntactic lookahead check (for more details
  1134. <a href="lookahead.html">click here to visit the minitutorial on LOOKAHEAD</a>).
  1135. </li>
  1136. </ul>
  1137. <p>
  1138. <strong>Default values for lookahead constraints:</strong>
  1139. If a local lookahead specification has been provided, but not all lookahead
  1140. constraints have been included, then the missing ones are assigned default
  1141. values as follows:
  1142. </p>
  1143. <ul>
  1144. <li>
  1145. If the lookahead limit is not provided and if the syntactic lookahead is
  1146. provided, then the lookahead limit defaults to the largest integer value
  1147. (2147483647). This essentially implements "infinite lookahead" - namely,
  1148. look ahead as many tokens as necessary to match the syntactic lookahead that
  1149. has been provided.
  1150. </li>
  1151. <li>
  1152. If neither the lookahead limit nor the syntactic lookahead has been
  1153. provided (which means the semantic lookahead is provided), the lookahead
  1154. limit defaults to 0. This means that syntactic lookahead is not performed
  1155. (it passes trivially), and only semantic lookahead is performed.
  1156. </li>
  1157. <li>
  1158. If the syntactic lookahead is not provided, it defaults to the choice
  1159. to which the local lookahead specification applies. If the local lookahead
  1160. specification is not at a choice point, then the syntactic lookahead is
  1161. ignored - hence a default value is not relevant.
  1162. </li>
  1163. <li>
  1164. If the semantic lookahead is not provided, it defaults to the boolean
  1165. expression "true". That is, it trivially passes.
  1166. </li>
  1167. </ul>
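<p>
The default rules for the lookahead limit and the semantic lookahead can be
written down as two small functions. This is a sketch of the rules listed above,
not of JavaCC's internals; the class LookaheadDefaultsSketch and the use of null
to mean "not provided" are assumptions made for illustration.
</p>

```java
// Hypothetical model of the defaults applied to a local lookahead specification.
final class LookaheadDefaultsSketch {
    // `limit` is null when no integer literal was written; `hasSyntactic`
    // says whether expansion choices were given inside LOOKAHEAD( ... ).
    static int effectiveLimit(Integer limit, boolean hasSyntactic) {
        if (limit != null) {
            return limit;                                // an explicit limit always wins
        }
        // Syntactic lookahead without a limit means "infinite" lookahead;
        // with neither present, only semantic lookahead is performed.
        return hasSyntactic ? Integer.MAX_VALUE : 0;
    }

    // `semantic` is null when no boolean expression was written.
    static boolean effectiveSemantic(Boolean semantic) {
        return semantic == null || semantic;             // missing semantic lookahead passes
    }
}
```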
  1168. <hr />
  1169. <table>
  1170. <tr>
  1171. <td align="right" valign="baseline"><a name="prod19">regular_expression</a></td>
  1172. <td align="center" valign="baseline">::=</td>
  1173. <td align="left" valign="baseline"><em>java_string_literal</em></td>
  1174. </tr>
  1175. <tr>
  1176. <td align="right" valign="baseline"></td>
  1177. <td align="center" valign="baseline">|</td>
  1178. <td align="left" valign="baseline">"&lt;" [ [ "#" ] <em>java_identifier</em> ":" ] <a href="#prod29">complex_regular_expression_choices</a> "&gt;"</td>
  1179. </tr>
  1180. <tr>
  1181. <td align="right" valign="baseline"></td>
  1182. <td align="center" valign="baseline">|</td>
  1183. <td align="left" valign="baseline">"&lt;" <em>java_identifier</em> "&gt;"</td>
  1184. </tr>
  1185. <tr>
  1186. <td align="right" valign="baseline"></td>
  1187. <td align="center" valign="baseline">|</td>
  1188. <td align="left" valign="baseline">"&lt;" "EOF" "&gt;"</td>
  1189. </tr>
  1190. </table>
  1191. <p>
  1192. There are two places in a grammar file where regular expressions may be
  1193. written:
  1194. </p>
  1195. <ul>
  1196. <li>
  1197. Within a <a href="#prod18">regular expression specification</a>
  1198. (part of a <a href="#prod10">regular expression production</a>),
  1199. </li>
  1200. <li>
  1201. As an <a href="#prod22">expansion unit</a> within an <a href="#prod20">expansion</a>.
  1202. When a regular expression is used in this manner, it is as if the regular expression
  1203. were defined in the following manner at this location and then referred to by its
  1204. label from the expansion unit:
  1205. <pre>
  1206. &lt;DEFAULT&gt; TOKEN :
  1207. {
  1208. regular expression
  1209. }
  1210. </pre>
  1211. That is, this usage of regular expression can be rewritten using the other
  1212. kind of usage.
  1213. </li>
  1214. </ul>
  1215. <p>
  1216. The complete details of regular expression matching by the token manager are
  1217. available in
  1218. <a href="tokenmanager.html">the minitutorial on the token manager</a>. The
  1219. description of the syntactic constructs follows.
  1220. </p>
  1221. <p>
  1222. The first kind of regular expression is a string literal. The input being
  1223. parsed matches this regular expression if the token manager is in a
  1224. <a href="#prod10">lexical state</a> for which this regular expression applies
  1225. and the next set of characters in the input stream is the same (possibly with
  1226. case ignored) as this string literal.
  1227. </p>
  1228. <p>
  1229. A regular expression may