<html>
<body>
<p><b>The XML DTD Parser API</b></p>
<dl>
<dt><b>Version: </b></dt><dd> </dd>
<dt><b>Author: </b></dt><dd><a href="mailto:totok@intalio.com">Alexander Totok</a></dd>
</dl>
This package consists of two parsers:
<ul>
<li>{@link org.exolab.castor.xml.dtd.parser.DTDInitialParser Initial Parser}
parses the input text, searches for <b>parameter
entity</b> declarations (i.e. entities used only within an XML DTD)
and replaces <b>parameter entity
references</b> with the corresponding <b>replacement text</b>. All other
text is passed to the output "as is".<br>
The initial parser handles <b>internal parameter entity</b> declarations only, like:<pre>
&lt;!ENTITY % name "John White" &gt; </pre>
and signals an error if it encounters an <b>external parameter entity</b> declaration, like:<pre>
&lt;!ENTITY % ISOLat2 SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" &gt;</pre>
Future versions will be able to parse and handle <b>external parameter
entity</b> declarations.<br>
The output of this parser is a document with no parameter entity
declarations and with all parameter entity references replaced by the corresponding
replacement text.
<li>{@link org.exolab.castor.xml.dtd.parser.DTDParser Main Parser}
performs the main parsing process. It is able to parse:
<ul>
<li>ELEMENT declarations
<li>ATTRIBUTE declarations
<li>GENERAL ENTITY declarations
<li>NOTATION declarations
<li>Comments
</ul>
The parser <font color="red">does not</font> parse:
<ul>
<li>Conditional Sections:<pre>
&lt;![ INCLUDE [ ... ]]&gt;
&lt;![ IGNORE [ ... ]]&gt; </pre>
<li>Processing Instructions, like: <pre>
&lt;?xml version="1.0" encoding="UTF-16"?&gt; </pre>
</ul>
The parser does not expand general entity references or character
references that occur within attribute and entity values.
</ul>
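<p>
The substitution step performed by the initial parser can be illustrated with a
small standalone sketch. It is not the JavaCC-generated parser: the class name
<tt>ParameterEntitySketch</tt>, its <tt>expand</tt> method and the regular
expressions are assumptions made for illustration only. The sketch also ignores
comments, declaration order and references nested inside replacement text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only (hypothetical class); not the real DTDInitialParser. */
final class ParameterEntitySketch {

    // Internal parameter entity declaration, e.g. <!ENTITY % name "John White" >
    private static final Pattern DECLARATION = Pattern.compile(
        "<!ENTITY\\s+%\\s+([A-Za-z_:][\\w.:-]*)\\s+(\"[^\"]*\"|'[^']*')\\s*>");

    // Parameter entity reference, e.g. %name;
    private static final Pattern REFERENCE =
        Pattern.compile("%([A-Za-z_:][\\w.:-]*);");

    /** Removes the declarations and replaces each reference with its replacement text. */
    static String expand(String dtdText) {
        Map<String, String> entities = new HashMap<>();

        // Pass 1: collect the declarations and drop them from the text.
        StringBuffer withoutDeclarations = new StringBuffer();
        Matcher declaration = DECLARATION.matcher(dtdText);
        while (declaration.find()) {
            String quoted = declaration.group(2);
            String value = quoted.substring(1, quoted.length() - 1);
            // If an entity is declared twice, the first declaration is kept.
            entities.putIfAbsent(declaration.group(1), value);
            declaration.appendReplacement(withoutDeclarations, "");
        }
        declaration.appendTail(withoutDeclarations);

        // Pass 2: substitute the references; everything else is copied "as is".
        StringBuffer output = new StringBuffer();
        Matcher reference = REFERENCE.matcher(withoutDeclarations);
        while (reference.find()) {
            String replacement = entities.get(reference.group(1));
            if (replacement == null) {
                throw new IllegalArgumentException(
                    "Undeclared parameter entity: " + reference.group(1));
            }
            reference.appendReplacement(output, Matcher.quoteReplacement(replacement));
        }
        reference.appendTail(output);
        return output.toString();
    }
}
```

Under these assumptions, a declaration of <tt>content</tt> followed by a
reference <tt>%content;</tt> yields text in which the declaration is gone and
the reference has been replaced by the declared value.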
<p>
Unless stated otherwise, the parser complies with the current
<a href="http://www.w3.org/TR/REC-xml">XML specification</a>.
For instance, it is able to parse Unicode text,
provided the {@link java.io.Reader java.io.Reader} used to instantiate
the parser is correctly set up.
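<p>
What "correctly set up" means can be shown with the standard library alone.
The following minimal sketch decodes the same bytes through a
{@link java.io.InputStreamReader java.io.InputStreamReader} created with an
explicit charset. The class <tt>ReaderEncodingSketch</tt> and its
<tt>decode</tt> method are hypothetical helpers, not part of this package.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Illustration only (hypothetical class): the charset of the reader decides what the parser sees. */
final class ReaderEncodingSketch {

    /** Reads all characters from the bytes using a reader configured with the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder text = new StringBuilder();
            int c;
            // read() returns one decoded character at a time, or -1 at end of input
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decoding UTF-8 bytes with a UTF-8 reader returns the original text, while a
US-ASCII reader over the same bytes does not, so any parser fed by that reader
would receive different characters.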
<p>
<b>The structure of the package:</b>
<p>
The parser was written using <a href="http://www.metamata.com/JavaCC/">JavaCC</a>
(Java Compiler Compiler), an automated tool that generates parsers in the Java
programming language.
<p>
The package consists of the following classes and files:
<ul>
<li><font color="blue">DTDInitialParser.jj</font>
- initial parser's JavaCC grammar file with
the syntax specification and processing code. This file is used
by JavaCC to automatically generate Java classes for the initial parser.
<li>{@link org.exolab.castor.xml.dtd.parser.DTDInitialParser DTDInitialParser},
{@link org.exolab.castor.xml.dtd.parser.DTDInitialParserConstants DTDInitialParserConstants},
{@link org.exolab.castor.xml.dtd.parser.DTDInitialParserTokenManager DTDInitialParserTokenManager}
- classes of the initial parser automatically generated by JavaCC from
the <font color="blue">DTDInitialParser.jj</font> file.
<li><font color="blue">DTDParser.jj</font>
- main parser's JavaCC grammar file with
the syntax specification and processing code. This file is used
by JavaCC to automatically generate Java classes for the main parser.
<li>{@link org.exolab.castor.xml.dtd.parser.DTDParser DTDParser},
{@link org.exolab.castor.xml.dtd.parser.DTDParserConstants DTDParserConstants},
{@link org.exolab.castor.xml.dtd.parser.DTDParserTokenManager DTDParserTokenManager}
- classes of the main parser automatically generated by JavaCC from
the <font color="blue">DTDParser.jj</font> file.
<li>{@link org.exolab.castor.xml.dtd.parser.Token Token},
{@link org.exolab.castor.xml.dtd.parser.ParseException ParseException},
{@link org.exolab.castor.xml.dtd.parser.TokenMgrError TokenMgrError},
{@link org.exolab.castor.xml.dtd.parser.CharStream CharStream} -
classes used by both parsers and suitable for any grammar. JavaCC
first looks for these files and generates them only if they are absent.
However, <font color="red">do not edit</font> the first line of these files;
otherwise JavaCC will be unable to authenticate them and will issue a warning.
<br>
{@link org.exolab.castor.xml.dtd.parser.TokenMgrError TokenMgrError}
is thrown if the Token Manager of the parser encounters
a syntax error in the text of a DTD document and is unable to produce
the next token.
<br>
{@link org.exolab.castor.xml.dtd.parser.ParseException ParseException}
is thrown if a DTD document does not comply with DTD syntax
and the parser is unable to parse the document.
<li>{@link org.exolab.castor.xml.dtd.parser.InputCharStream InputCharStream}
- an implementation of the interface
{@link org.exolab.castor.xml.dtd.parser.CharStream CharStream}.
It implements an input character
stream that maintains the line and column positions of the characters.
It can also back up the stream to some extent.<br>
An object of this class is constructed from a
{@link java.io.Reader java.io.Reader} <tt>reader</tt>, and it is left to
whoever constructs the <tt>reader</tt> to set up the character encoding correctly.
This means that the method <u><font color="blue">read</font></u> of
the <tt>reader</tt> is used to get the next characters, on the assumption that it returns
appropriate values. It is recommended to use the class
{@link java.io.InputStreamReader java.io.InputStreamReader} as
the <tt>reader</tt>, because it allows the desired character encoding to be set.
This class is an intermediate component between the input
character reader and the parser.<br>
The code of this class is based on the class
<b>ASCII_CharStream</b>, the implementation of the interface
{@link org.exolab.castor.xml.dtd.parser.CharStream CharStream} that
JavaCC would have generated with the following options set in
a JavaCC grammar file: <pre>
JAVA_UNICODE_ESCAPE = false;
UNICODE_INPUT = false;
USER_CHAR_STREAM = false; </pre>
Note that this class is not fully JavaCC generated.
</ul>
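<p>
The idea behind a character stream that tracks positions and can be backed up
may be easier to see in a reduced form. The sketch below is not the code of
<tt>InputCharStream</tt>: the class <tt>PositionTrackingStream</tt> and its
method names are assumptions, and it keeps every character it has read, whereas
a real implementation would normally use a bounded buffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Illustration only (hypothetical class): line/column tracking with backup. */
final class PositionTrackingStream {
    private final Reader reader;
    private final StringBuilder chars = new StringBuilder();   // every character read so far
    private final List<int[]> positions = new ArrayList<>();   // {line, column} per character
    private int next = 0;              // index of the next character to hand out
    private int line = 1;
    private int column = 0;
    private boolean previousWasNewline = false;

    PositionTrackingStream(Reader reader) {
        this.reader = reader;
    }

    /** Returns the next character, or -1 at the end of the input. */
    int readChar() {
        if (next == chars.length()) {
            int c;
            try {
                c = reader.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (c == -1) {
                return -1;
            }
            // A character that follows '\n' starts a new line at column 1.
            if (previousWasNewline) {
                line++;
                column = 1;
            } else {
                column++;
            }
            previousWasNewline = (c == '\n');
            chars.append((char) c);
            positions.add(new int[] {line, column});
        }
        return chars.charAt(next++);
    }

    /** Pushes the last {@code amount} characters back so they are returned again. */
    void backup(int amount) {
        if (amount < 0 || amount > next) {
            throw new IllegalArgumentException("Cannot back up " + amount + " characters");
        }
        next -= amount;
    }

    /** Line of the most recently returned character (1-based). */
    int getLine() {
        return lastPosition()[0];
    }

    /** Column of the most recently returned character (1-based). */
    int getColumn() {
        return lastPosition()[1];
    }

    private int[] lastPosition() {
        if (next == 0) {
            throw new IllegalStateException("No character has been read yet");
        }
        return positions.get(next - 1);
    }
}
```

Because positions are stored per character, backing up also restores the line
and column reported for the character that was current before the backup.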
<p>
The following example parses the XML DTD file <tt>dtd-document.dtd</tt>
and constructs the corresponding
{@link org.exolab.castor.xml.dtd.DTDdocument XML DTD document} object <tt>dtd</tt>.
<pre>
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import org.exolab.castor.xml.dtd.DTDdocument;
import org.exolab.castor.xml.dtd.parser.DTDInitialParser;
import org.exolab.castor.xml.dtd.parser.DTDParser;
import org.exolab.castor.xml.dtd.parser.InputCharStream;

FileInputStream inputStream;
InputStreamReader reader;
InputCharStream charStream;
DTDInitialParser initialParser;
String intermedResult;
StringReader strReader;
DTDParser parser;
DTDdocument dtd;
<font color="red">// instantiate input byte stream, associated with the input file</font>
inputStream = new FileInputStream( "dtd-document.dtd" );
<font color="red">// instantiate character reader from the input file byte stream</font>
reader = new InputStreamReader( inputStream, "US-ASCII" );
<font color="red">// instantiate char stream for initial parser from the input reader</font>
charStream = new InputCharStream( reader );
<font color="red">// instantiate initial parser</font>
initialParser = new DTDInitialParser( charStream );
<font color="red">// get result of initial parsing - DTD text document with parameter
// entity references expanded</font>
intermedResult = initialParser.Input();
<font color="red">// construct StringReader from the intermediate parsing result</font>
strReader = new StringReader( intermedResult );
<font color="red">// instantiate char stream for the main parser</font>
charStream = new InputCharStream( strReader );
<font color="red">// instantiate main parser</font>
parser = new DTDParser( charStream );
<font color="red">// parse intermediate parsing result with the main parser
// and get corresponding DTD document object</font>
dtd = parser.Input();
</pre>
</body>
</html>