  1. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2. <html xmlns="http://www.w3.org/1999/xhtml">
  3. <!--
  4. Copyright (c) 2006, Sun Microsystems, Inc.
  5. All rights reserved.
  6. Redistribution and use in source and binary forms, with or without
  7. modification, are permitted provided that the following conditions are met:
  8. * Redistributions of source code must retain the above copyright notice,
  9. this list of conditions and the following disclaimer.
  10. * Redistributions in binary form must reproduce the above copyright
  11. notice, this list of conditions and the following disclaimer in the
  12. documentation and/or other materials provided with the distribution.
  13. * Neither the name of the Sun Microsystems, Inc. nor the names of its
  14. contributors may be used to endorse or promote products derived from
  15. this software without specific prior written permission.
  16. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  17. AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  18. IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  19. ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  20. LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  21. CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  22. SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  23. INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  24. CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  25. ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  26. THE POSSIBILITY OF SUCH DAMAGE.
  27. -->
  28. <head>
  29. <title>JavaCC README for SimpleExamples</title>
  30. <!-- Changed by: Michael Van De Vanter, 14-Jan-2003 -->
  31. </head>
  32. <body bgcolor="#FFFFFF" >
  33. <h1>JavaCC [tm]: README for SimpleExamples</h1>
  34. <p>
  35. This directory contains six examples to get you started using JavaCC.
  36. Each example is contained in a single grammar file; the files are listed
  37. below:
  38. </p>
  39. <pre>
  40. Simple1.jj
  41. Simple2.jj
  42. Simple3.jj
  43. Simple4.jj
  44. NL_Xlator.jj
  45. IdList.jj
  46. </pre>
  47. <p>
  48. Once you have tried out and understood each of these examples, you
  49. should take a look at more complex examples in other sub-directories
  50. under the examples directory. But even with just these examples, you
  51. should be able to get started on reasonably complex grammars.
  52. </p>
  53. <h2>Summary Instructions</h2>
  54. <p>
  55. If you are a parser and lexical analyzer expert and can understand the
  56. examples by just reading them, the following instructions show you how
  57. to get started with JavaCC. The instructions below are with respect
  58. to Simple1.jj, but you can build any parser using the same set of
  59. commands.
  60. </p>
  61. <ol>
  62. <li>
  63. Run javacc on the grammar input file to generate a bunch of Java
  64. files that implement the parser and lexical analyzer (or token
  65. manager):
  66. <pre>
  67. javacc Simple1.jj
  68. </pre>
  69. </li>
  70. <li> Now compile the resulting Java programs:
  71. <pre>
  72. javac *.java
  73. </pre>
  74. </li>
  75. <li> The parser is now ready to use. To run the parser, type:
  76. <pre>
  77. java Simple1
  78. </pre>
  79. </li>
  80. </ol>
  81. <p>
  82. The Simple1 parser and others in this directory are designed to take
  83. input from standard input. Simple1 recognizes matching braces
  84. followed by zero or more line terminators and then an end of file.
  85. </p>
  86. <p>
  87. Examples of legal strings in this grammar are:</p>
  88. <pre>
  89. "{}", "{{{{{}}}}}", etc.
  90. </pre>
  91. <p>Examples of illegal strings are:</p>
  92. <pre>
  93. "{{{{", "{}{}", "{}}", "{{}{}}", "{ }", "{x}", etc.
  94. </pre>
  95. <p>
  96. Try typing various inputs to Simple1. Remember that &lt;control-d&gt;
  97. indicates the end of file on the UNIX platform.
  98. Here are some sample runs:
  99. </p>
  100. <pre>
  101. % java Simple1
  102. {{}}&lt;return&gt;
  103. &lt;control-d&gt;
  104. %
  105. % java Simple1
  106. {x&lt;return&gt;
  107. Lexical error at line 1, column 2. Encountered: "x"
  108. TokenMgrError: Lexical error at line 1, column 2. Encountered: "x" (120), after : ""
  109. at Simple1TokenManager.getNextToken(Simple1TokenManager.java:146)
  110. at Simple1.getToken(Simple1.java:140)
  111. at Simple1.MatchedBraces(Simple1.java:51)
  112. at Simple1.Input(Simple1.java:10)
  113. at Simple1.main(Simple1.java:6)
  114. %
  115. % java Simple1
  116. {}}&lt;return&gt;
  117. ParseException: Encountered "}" at line 1, column 3.
  118. Was expecting one of:
  119. &lt;EOF&gt;
  120. "\n" ...
  121. "\r" ...
  122. at Simple1.generateParseException(Simple1.java:184)
  123. at Simple1.jj_consume_token(Simple1.java:126)
  124. at Simple1.Input(Simple1.java:32)
  125. at Simple1.main(Simple1.java:6)
  126. %
  127. </pre>
  128. <h2>DETAILED DESCRIPTION OF Simple1.jj</h2>
  129. <p>
  130. This is a simple JavaCC grammar that recognizes a sequence of left braces
  131. followed by the same number of right braces, then zero or more line
  132. terminators, and finally an end of file. Examples of
  133. legal strings in this grammar are:
  134. </p>
  135. <pre>
  136. "{}", "{{{{{}}}}}", etc.
  137. </pre>
  138. <p>Examples of illegal strings are:</p>
  139. <pre>
  140. "{{{{", "{}{}", "{}}", "{{}{}}", etc.
  141. </pre>
  142. <p>
  143. This grammar file starts with settings for all the options offered by
  144. JavaCC. In this case the option settings are their default values.
  145. Hence these option settings were really not necessary. One could as
  146. well have completely omitted the options section, or omitted one or
  147. more of the individual option settings. The details of the individual
  148. options are described in the JavaCC documentation on the web pages.
  149. </p>
  150. <p>
  151. Following this is a Java compilation unit enclosed between
  152. "PARSER_BEGIN(name)" and "PARSER_END(name)". This compilation unit
  153. can be of arbitrary complexity. The only constraint on this
  154. compilation unit is that it must define a class called "name" - the
  155. same as the arguments to PARSER_BEGIN and PARSER_END. This is the
  156. name that is used as the prefix for the Java files generated by the
  157. parser generator. The parser code that is generated is inserted
  158. immediately before the closing brace of the class called "name".
  159. </p>
  160. <p>
  161. In the above example, the class in which the parser is generated
  162. contains a main program. This main program creates an instance of the
  163. parser object (an object of type Simple1) by using a constructor that
  164. takes one argument of type java.io.InputStream ("System.in" in this
  165. case).
  166. </p>
  167. <p>
  168. The main program then makes a call to the non-terminal in the grammar
  169. that it would like to parse - "Input" in this case. All non-terminals
  170. have equal status in a JavaCC generated parser, and hence one may
  171. parse with respect to any grammar non-terminal.
  172. </p>
  173. <p>
  174. Following this is a list of productions. In this example, there are
  175. two productions that define the non-terminals "Input" and
  176. "MatchedBraces" respectively. In JavaCC grammars, non-terminals are
  177. written and implemented (by JavaCC) as Java methods. When the
  178. non-terminal is used on the left-hand side of a production, it is
  179. considered to be declared and its syntax follows the Java syntax. On
  180. the right-hand side, its use is similar to a method call in Java.
  181. </p>
  182. <p>
  183. Each production defines its left-hand side non-terminal followed by a
  184. colon. This is followed by a block of declarations and statements
  185. within braces (in both cases in the above example, there are no
  186. declarations and hence this appears as "{}"), which are copied as
  187. common declarations and statements into the generated method. This is
  188. then followed by a set of expansions, also enclosed within braces.
  189. </p>
  190. <p>
  191. Lexical tokens (regular expressions) in a JavaCC input grammar are
  192. either simple strings ("{", "}", "\n", and "\r" in the above example),
  193. or a more complex regular expression. In our example above, there is
  194. one such regular expression "&lt;EOF&gt;" which is matched by the end of
  195. file. All complex regular expressions are enclosed within angle
  196. brackets.
  197. </p>
  198. <p>
  199. The first production above says that the non-terminal "Input" expands
  200. to the non-terminal "MatchedBraces" followed by zero or more line
  201. terminators ("\n" or "\r") and then the end of file.
  202. </p>
  203. <p>
  204. The second production above says that the non-terminal "MatchedBraces"
  205. expands to the token "{" followed by an optional nested expansion of
  206. "MatchedBraces" followed by the token "}". Square brackets [...]
  207. in a JavaCC input file indicate that the ... is optional.
  208. </p>
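<p>
The mapping from productions to methods can be pictured with a small
hand-written recognizer. This is only a sketch of the idea, not the code
that JavaCC generates: the class name BraceRecognizer and the helpers
accepts() and peek() are invented for illustration. Each method mirrors one
of the two productions described above.
</p>

```java
// Hand-written illustration; JavaCC's generated parser looks different.
class BraceRecognizer {
    private final String text;
    private int pos = 0;

    private BraceRecognizer(String text) {
        this.text = text;
    }

    /** Returns true if the whole input matches the Input production. */
    static boolean accepts(String text) {
        return new BraceRecognizer(text).input();
    }

    // Input : MatchedBraces ( "\n" | "\r" )* <EOF>
    private boolean input() {
        if (!matchedBraces()) {
            return false;
        }
        while (peek() == '\n' || peek() == '\r') {
            pos++;
        }
        return pos == text.length(); // <EOF>
    }

    // MatchedBraces : "{" [ MatchedBraces ] "}"
    private boolean matchedBraces() {
        if (peek() != '{') {
            return false;
        }
        pos++;
        if (peek() == '{') { // the optional [...] part, chosen by looking one character ahead
            if (!matchedBraces()) {
                return false;
            }
        }
        if (peek() != '}') {
            return false;
        }
        pos++;
        return true;
    }

    // Returns the current character, or '\0' at the end of the input.
    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(accepts("{{}}\n")); // prints true
        System.out.println(accepts("{}{}"));   // prints false
    }
}
```

<p>
Note how the use of MatchedBraces on the right-hand side becomes an ordinary
recursive method call.
</p>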
  209. <p>
  210. [...] may also be written as (...)?. These two forms are equivalent.
  211. Other structures that may appear in expansions are:
  212. </p>
  213. <pre>
  214. e1 | e2 | e3 | ... : A choice of e1, e2, e3, etc.
  215. ( e )+ : One or more occurrences of e
  216. ( e )* : Zero or more occurrences of e
  217. </pre>
  218. <p>
  219. Note that these may be nested within each other, so we can have
  220. something like:
  221. </p>
  222. <pre>
  223. (( e1 | e2 )* [ e3 ] ) | e4
  224. </pre>
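<p>
To see what such a nested expansion accepts, the sketch below stands in
single characters for the sub-expansions: e1 is 'a', e2 is 'b', e3 is 'c',
and e4 is 'd'. These stand-ins and the class name NestedExpansion are
assumptions made for illustration. The check is hand-written with
one-character lookahead and does not reproduce how JavaCC itself resolves
choices.
</p>

```java
class NestedExpansion {
    /** Returns true if s matches (( 'a' | 'b' )* [ 'c' ] ) | 'd'. */
    static boolean matches(String s) {
        int pos = 0;
        if (pos < s.length() && s.charAt(pos) == 'd') {
            pos++; // e4
        } else {
            // ( e1 | e2 )* : zero or more 'a' or 'b'
            while (pos < s.length() && (s.charAt(pos) == 'a' || s.charAt(pos) == 'b')) {
                pos++;
            }
            // [ e3 ] : an optional 'c'
            if (pos < s.length() && s.charAt(pos) == 'c') {
                pos++;
            }
        }
        return pos == s.length();
    }
}
```

<p>
Because both the repetition and the optional part may match nothing, the
empty string is accepted by the first alternative.
</p>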
  225. <p>
  226. To build this parser, simply run JavaCC on this file and compile the
  227. resulting Java files:</p>
  228. <pre>
  229. javacc Simple1.jj
  230. javac *.java
  231. </pre>
  232. <p>
  233. Now you should be able to run the generated parser. Make sure that
  234. the current directory is in your CLASSPATH and type:
  235. </p>
  236. <pre>
  237. java Simple1
  238. </pre>
  239. <p>
  240. Now type a sequence of matching braces followed by a return and an end
  241. of file (CTRL-D on UNIX machines). If this is a problem on your
  242. machine, you can create a file and redirect it as input to the generated
  243. parser as shown below. Redirection does not work on all machines either; if
  244. that is a problem, replace "System.in" in the grammar file with
  245. 'new FileInputStream("testfile")' and place your input inside that
  246. file:
  247. </p>
  248. <pre>
  249. java Simple1 &lt; myfile
  250. </pre>
  251. <p>
  252. Also try entering illegal sequences such as mismatched braces, spaces,
  253. and carriage returns between braces as well as other characters and
  254. take a look at the error messages produced by the parser.
  255. </p>
  256. <h3>DETAILED DESCRIPTION OF Simple2.jj</h3>
  257. <p>
  258. Simple2.jj is a minor modification to Simple1.jj to allow white space
  259. characters to be interspersed among the braces. So then input such
  260. as:
  261. </p>
  262. <pre>
  263. "{{ }\n}\n\n"
  264. </pre>
  265. <p>
  266. will now be legal.
  267. </p>
  268. <p>
  269. Take a look at Simple2.jj. The first thing you will note is that we
  270. have omitted the options section. This does not change anything since
  271. the options in Simple1.jj were all assigned their default values.
  272. </p>
  273. <p>
  274. The other difference between this file and Simple1.jj is that this
  275. file contains a lexical specification - the region that starts with
  276. "SKIP". Within this region are 4 regular expressions - space, tab,
  277. newline, and return. This says that matches of these regular
  278. expressions are to be ignored (and not considered for parsing). Hence
  279. whenever any of these 4 characters are encountered, they are just
  280. thrown away.
  281. </p>
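<p>
The effect of the SKIP region on this particular grammar can be imitated in
plain Java. The sketch below is hand-written and the names SkipDemo,
dropSkipped() and accepts() are invented. Dropping the skipped characters
everywhere is only valid here because every token is a single character; in
general SKIP applies between tokens, not inside them.
</p>

```java
class SkipDemo {
    // Drops the four characters listed in the SKIP region: space, tab, newline, return.
    static String dropSkipped(String input) {
        StringBuilder kept = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    // Checks that what remains is n left braces followed by n right braces, with n >= 1.
    static boolean accepts(String input) {
        String s = dropSkipped(input);
        if (s.isEmpty() || s.length() % 2 != 0) {
            return false;
        }
        int n = s.length() / 2;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != (i < n ? '{' : '}')) {
                return false;
            }
        }
        return true;
    }
}
```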
  282. <p>
  283. In addition to SKIP, JavaCC has three other lexical specification
  284. regions. These are:
  285. </p>
  286. <pre>
  287. . TOKEN: This is used to specify lexical tokens (see next example)
  288. . SPECIAL_TOKEN: This is used to specify lexical tokens that are to be
  289. ignored during parsing. In this sense, SPECIAL_TOKEN is
  290. the same as SKIP. However, these tokens can be recovered
  291. within parser actions to be handled appropriately.
  292. . MORE: This specifies a partial token. A complete token is
  293. made up of a sequence of MORE's followed by a TOKEN
  294. or SPECIAL_TOKEN.
  295. </pre>
  296. <p>
  297. Please take a look at some of the more complex grammars such as the
  298. Java grammars for examples of usage of these lexical specification
  299. regions.
  300. </p>
  301. <p>
  302. You may build Simple2 and invoke the generated parser with input from
  303. the keyboard as standard input.
  304. </p>
  305. <p>
  306. You can also try generating the parser with the various debug options
  307. turned on and see what the output looks like. To do this, type:
  308. </p>
  309. <pre>
  310. javacc -debug_parser Simple2.jj
  311. javac Simple2*.java
  312. java Simple2
  313. </pre>
  314. <p>
  315. Then, to debug the token manager instead, type:
  316. </p>
  317. <pre>
  318. javacc -debug_token_manager Simple2.jj
  319. javac Simple2*.java
  320. java Simple2
  321. </pre>
  322. <p>
  323. Note that token manager debugging produces a lot of diagnostic
  324. information, so it is typically used to examine debug traces a single
  325. token at a time.
  326. </p>
  327. <h3>DETAILED DESCRIPTION OF Simple3.jj</h3>
  328. <p>
  329. Simple3.jj is the third and final version of our matching brace
  330. detector. This example illustrates the use of the TOKEN region for
  331. specifying lexical tokens. In this case, "{" and "}" are defined as
  332. tokens and given names LBRACE and RBRACE respectively. These labels
  333. can then be used within angle brackets (as in the example) to refer
  334. to these tokens. Typically such token specifications are used for
  335. complex tokens such as identifiers and literals. Tokens that are
  336. simple strings are left as is (as in the previous examples).
  337. </p>
  338. <p>
  339. This example also illustrates the use of actions in the grammar
  340. productions. The actions inserted in this example count the number of
  341. matching braces. Note the use of the declaration region to declare
  342. variables "count" and "nested_count". Also note how the non-terminal
  343. "MatchedBraces" returns its value as a function return value.
  344. </p>
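<p>
The idea of a non-terminal returning a value can be sketched by hand in
Java. The following is an illustration only, not the contents of Simple3.jj
or of the parser generated from it: the class NestingCounter and the helper
expect() are invented, and white space handling is left out to keep the
sketch short. The local variable nestedCount plays the role of
"nested_count" in the text above.
</p>

```java
class NestingCounter {
    private final String text;
    private int pos = 0;

    private NestingCounter(String text) {
        this.text = text;
    }

    /** Returns the number of nesting levels; throws if the input is not matched braces. */
    static int count(String text) {
        NestingCounter counter = new NestingCounter(text);
        int count = counter.matchedBraces();
        if (counter.pos != text.length()) {
            throw new IllegalArgumentException("unexpected input at position " + counter.pos);
        }
        return count;
    }

    // "{" [ MatchedBraces ] "}" with the nesting depth passed back as the return value
    private int matchedBraces() {
        int nestedCount = 0;
        expect('{');
        if (pos < text.length() && text.charAt(pos) == '{') {
            nestedCount = matchedBraces();
        }
        expect('}');
        return nestedCount + 1;
    }

    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }
}
```

<p>
For example, NestingCounter.count("{{{}}}") returns 3.
</p>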
  345. <h3>DETAILED DESCRIPTION OF NL_Xlator.jj</h3>
  346. <p>
  347. This example goes into the details of writing regular expressions in
  348. JavaCC grammar files. It also illustrates a slightly more complex set
  349. of actions that translate the expressions described by the grammar
  350. into English.
  351. </p>
  352. <p>
  353. The new concept in the above example is the use of more complex
  354. regular expressions. The regular expression:
  355. </p>
  356. <pre>
  357. &lt; ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* &gt;
  358. </pre>
  359. <p>
  360. creates a new regular expression whose name is ID. This can be
  361. referred to anywhere else in the grammar simply as &lt;ID&gt;. What follows in
  362. square brackets is a set of allowable characters - in this case it is
  363. any of the lower or upper case letters or the underscore. This is
  364. followed by 0 or more occurrences of any of the lower or upper case
  365. letters, digits, or the underscore.
  366. </p>
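<p>
Readers who know java.util.regex may find it helpful to compare the ID
definition with its closest counterpart there. The class name IdPattern and
the method isId() below are invented for this comparison; JavaCC does not
use java.util.regex for its token manager.
</p>

```java
import java.util.regex.Pattern;

class IdPattern {
    // Counterpart of: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )*
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");

    /** Returns true if the whole string is a single identifier. */
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }
}
```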
  367. <p>
  368. Other constructs that may appear in regular expressions are:
  369. </p>
  370. <pre>
  371. ( ... )+ : One or more occurrences of ...
  372. ( ... )? : An optional occurrence of ... (Note that in the case
  373. of lexical tokens, (...)? and [...] are not equivalent)
  374. ( r1 | r2 | ... ) : Any one of r1, r2, ...
  375. </pre>
  376. <p>
  377. A construct of the form [...] is a pattern that is matched by the
  378. characters specified in ... . These characters can be individual
  379. characters or character ranges. A "~" before this construct is a
  380. pattern that matches any character not specified in ... . Therefore:
  381. </p>
  382. <pre>
  383. ["a"-"z"] matches all lower case letters
  384. ~[] matches any character
  385. ~["\n","\r"] matches any character except the new line characters
  386. </pre>
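<p>
The three character-list patterns above have rough java.util.regex
equivalents, shown below for comparison. The class and field names are
invented. Note that "~[]" is written as [\s\S] here because "." in
java.util.regex does not match line terminators by default.
</p>

```java
import java.util.regex.Pattern;

class CharClasses {
    static final Pattern LOWER = Pattern.compile("[a-z]");           // ["a"-"z"]
    static final Pattern ANY = Pattern.compile("[\\s\\S]");          // ~[]
    static final Pattern NOT_NEWLINE = Pattern.compile("[^\\n\\r]"); // ~["\n","\r"]

    /** Returns true if the single character c is matched by pattern p. */
    static boolean matchesOne(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }
}
```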
  387. <p>
  388. When a regular expression is used in an expansion, it takes a value of
  389. type "Token". This class is generated into the generated parser directory
  390. as "Token.java". In the above example, we have defined a variable of
  391. type "Token" and assigned the value of the regular expression to it.
  392. </p>
  393. <h3>DETAILED DESCRIPTION OF IdList.jj</h3>
  394. <p>
  395. This example illustrates an important attribute of the SKIP
  396. specification. The main point to note is that the regular expressions
  397. in the SKIP specification are only ignored *between tokens* and not
  398. *within tokens*. This grammar accepts any sequence of identifiers
  399. with white space in between.
  400. </p>
  401. <p>
  402. A legal input for this grammar is:
  403. </p>
  404. <pre>
  405. "abc xyz123 A B C \t\n aaa"
  406. </pre>
  407. <p>
  408. This is because any number of the SKIP regular expressions are allowed
  409. in between consecutive &lt;Id&gt;'s. However, the following is not a legal
  410. input:
  411. </p>
  412. <pre>
  413. "xyz 123"
  414. </pre>
  415. <p>
  416. This is because the space character after "xyz" is in the SKIP
  417. category and therefore causes one token to end and another to begin.
  418. This requires "123" to be a separate token and hence does not match
  419. the grammar.
  420. </p>
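<p>
The behaviour described above can be imitated with a small hand-written
tokenizer. This is a sketch under assumptions, not the generated token
manager: it assumes that &lt;Id&gt; is a letter followed by letters or digits
and that the SKIP characters are space, tab, newline, and return. The class
name IdListLexer and its methods are invented for illustration.
</p>

```java
import java.util.ArrayList;
import java.util.List;

class IdListLexer {
    /** Splits the input into identifiers; throws on a character that cannot start a token. */
    static List<String> tokenize(String input) {
        List<String> ids = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pos++; // skipped, but only between tokens
                continue;
            }
            if (!isLetter(c)) {
                throw new IllegalArgumentException("unexpected '" + c + "' at position " + pos);
            }
            int start = pos++;
            // Extend the token as far as possible; a skipped character ends it.
            while (pos < input.length()
                    && (isLetter(input.charAt(pos)) || isDigit(input.charAt(pos)))) {
                pos++;
            }
            ids.add(input.substring(start, pos));
        }
        return ids;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

<p>
With this sketch, "xyz123" is one identifier, whereas "xyz 123" ends the
first token at the space and then fails on "1", which cannot start an
identifier.
</p>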
  421. <p>
  422. If spaces were OK within &lt;Id&gt;'s, then all one has to do is to replace
  423. the definition of Id with:
  424. </p>
  425. <pre>
  426. TOKEN :
  427. {
  428. &lt; Id: ["a"-"z","A"-"Z"] ( (" ")* ["a"-"z","A"-"Z","0"-"9"] )* &gt;
  429. }
  430. </pre>
  431. <p>
  432. Note that having a space character within a TOKEN specification does
  433. not mean that the space character cannot be used in the SKIP
  434. specification. All this means is that any space character that
  435. appears in the context where it can be placed within an identifier
  436. will participate in the match for &lt;Id&gt;, whereas all other space
  437. characters will be ignored. The details of the matching algorithm are
  438. described in the JavaCC documentation.
  439. </p>
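<p>
A java.util.regex counterpart of the modified Id definition makes the
effect easy to try out. The class name SpacedId and the method isSingleId()
are invented for illustration, and the comparison is only approximate since
JavaCC's matching algorithm is its own.
</p>

```java
import java.util.regex.Pattern;

class SpacedId {
    // Counterpart of: ["a"-"z","A"-"Z"] ( (" ")* ["a"-"z","A"-"Z","0"-"9"] )*
    static final Pattern ID = Pattern.compile("[a-zA-Z]( *[a-zA-Z0-9])*");

    /** Returns true if the whole string forms one Id under the modified definition. */
    static boolean isSingleId(String s) {
        return ID.matcher(s).matches();
    }
}
```

<p>
Here "xyz 123" is a single Id because the space sits between two characters
that can belong to the identifier. Leading or trailing spaces are not part of
the pattern, which corresponds to the spaces that remain for SKIP to discard.
</p>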
  440. <p>
  441. As a corollary, one must define as tokens anything within which
  442. characters such as white space characters must not be present. In the
  443. above example, if &lt;Id&gt; were defined as a grammar production rather than
  444. a lexical token, as shown below this paragraph, then "xyz 123" would
  445. have been wrongly recognized as a legitimate &lt;Id&gt;.
  446. </p>
  447. <pre>
  448. void Id() :
  449. {}
  450. {
  451. &lt;["a"-"z","A"-"Z"]&gt; ( &lt;["a"-"z","A"-"Z","0"-"9"]&gt; )*
  452. }
  453. </pre>
  454. <p>
  455. Note that in the above definition of non-terminal Id, it is made up of
  456. a sequence of single character tokens (note the location of &lt;...&gt;s),
  457. and hence white space is allowed between these characters.
  458. </p>
  459. </body>
  460. </html>