/projects/castor-1.3.3/schema/src/main/java/org/exolab/castor/xml/dtd/parser/package.html
HTML | 169 lines | 158 code | 11 blank | 0 comment | 0 complexity | 06f30876ce458ae4e56e11f03239876a MD5 | raw file
- <html>
- <body>
- <p><b>The XML DTD Parser API</b></p>
-
- <dl>
- <dt><b>Version: </b></dt><dd>$Revision: 6213 $ $Date: 2003-03-03 00:05:44 -0700 (Mon, 03 Mar 2003) $</dd>
- <dt><b>Author: </b></dt><dd><a href="mailto:totok@intalio.com">Alexander Totok</a></dd>
- </dl>
-
- This package consists of two parsers:
- <ul>
- <li>{@link org.exolab.castor.xml.dtd.parser.DTDInitialParser Initial Parser}
- parses the input text, searches for <b>parameter
- entity</b> declarations (i.e. entities used only within XML DTD)
- and substitutes <b>parameter entity
- references</b> by corresponding <b>replacement text</b>. All other
- text is passed to the output "as is".<br>
- The initial parser parses <b>internal parameter entity</b> declarations only, like:<pre>
- <!ENTITY % name "John White" > </pre>
- signaling an error if an <b>external parameter entity</b> declaration, like:<pre>
- <!ENTITY % ISOLat2 SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" ></pre>
- is met.
- Future versions will be able to parse and handle <b>external parameter
- entity</b> declarations.<br>
- The output of this parser is a document without paramater entity
- declarations and all parameter entity references substituted by corresponding
- replacement text.
- <li>{@link org.exolab.castor.xml.dtd.parser.DTDParser Main Parser}
- performes the main parsing process. It is able to parse:
- <ul>
- <li>ELEMENT declarations
- <li>ATTRIBUTE declarations
- <li>GENERAL ENTITY declarations
- <li>NOTATION declarations
- <li>Comments
- </ul>
- The parser <font color="red">does not</font> parse:
- <ul>
- <li>Conditional Sections:<pre>
- <![ INCLUDE [ ... ]]>
- <![ IGNORE [ ... ]]> </pre>
- <li>Processing Instructions, like: <pre>
- <?xml version="1.0" encoding="UTF-16"?> </pre>
- </ul>
- The parser does not expand general entity references or character
- references occurring within attribute and entity values.
- </ul>
- <p>
- The parser is fully compliant with the current
- <a href="http://www.w3.org/TR/REC-xml">XML specification</a>,
- unless otherwise is stated, for instance it is able to parse Unicode text,
- provided the {@link java.io.Reader java.io.Reader} used to instantiate
- the parser is correctly set up.
- </p>
- <p><b>The structure of the package:</b></p>
- <p>
- The parser was written using <a href="http://www.metamata.com/JavaCC/">JavaCC</a>
- (Java Compiler Compiler) - an automated tool to generate Java programming
- language parsers.
- </p>
- <p>This package consists of the following classes and files:</p>
- <ul>
- <li><font color="blue">DTDInitialParser.jj</font>
- - initial parser's JavaCC grammar file with
- the syntax specification and processing code. This file is used
- by JavaCC to automatically generate Java classes for the initial parser.
- <li>{@link org.exolab.castor.xml.dtd.parser.DTDInitialParser DTDInitialParser},
- {@link org.exolab.castor.xml.dtd.parser.DTDInitialParserConstants DTDInitialParserConstants},
- {@link org.exolab.castor.xml.dtd.parser.DTDInitialParserTokenManager DTDInitialParserTokenManager}
- - classes of the initial parser automatically generated by JavaCC from
- the <font color="blue">DTDInitialParser.jj</font> file.
- <li><font color="blue">DTDParser.jj</font>
- - main parser's JavaCC grammar file with
- the syntax specification and processing code. This file is used
- by JavaCC to automatically generate Java classes for the main parser.
- <li>{@link org.exolab.castor.xml.dtd.parser.DTDParser DTDParser},
- {@link org.exolab.castor.xml.dtd.parser.DTDParserConstants DTDParserConstants},
- {@link org.exolab.castor.xml.dtd.parser.DTDParserTokenManager DTDParserTokenManager}
- - classes of the main parser automatically generated by JavaCC from
- the <font color="blue">DTDParser.jj</font> file.
- <li>{@link org.exolab.castor.xml.dtd.parser.Token Token},
- {@link org.exolab.castor.xml.dtd.parser.ParseException ParseException},
- {@link org.exolab.castor.xml.dtd.parser.TokenMgrError TokenMgrError},
- {@link org.exolab.castor.xml.dtd.parser.CharStream CharStream} -
- classes used by both parsers and suitable for any grammar. JavaCC
- first looks for these files and generates them only if they are absent.
- But <font color="red">do not edit</font> the first line of these files,
- as JavaCC will give warning message about being unable to authenticate them.
- <br>
- {@link org.exolab.castor.xml.dtd.parser.TokenMgrError TokenMgrError}
- is thrown if the Token Manager of the parser has encountered
- a syntax error in the text of DTD document and is unable to produce
- next token.
- <br>
- {@link org.exolab.castor.xml.dtd.parser.ParseException ParseException}
- is thrown if a DTD document does not comply with the DTD syntax
- and the parser is unable to parse the document.
- <li>{@link org.exolab.castor.xml.dtd.parser.InputCharStream InputCharStream}
- - an implementation of interface
- {@link org.exolab.castor.xml.dtd.parser.CharStream CharStream}.
- Implements an input character
- stream that maintains line and column number positions of the characters.
- It also has the capability to backup the stream to some extent.<br>
- The object of this class is constructed using a
- {@link java.io.Reader java.io.Reader} <tt>reader</tt> and it is left to
- constructor of the <tt>reader</tt> to set up character encoding correctly.
- This means that method <u><font color="blue">read</font></u> of
- the <tt>reader</tt> is used to get the next characters, assuming it returns
- appropriate values. It is recommended to use class
- {@link java.io.InputStreamReader java.io.InputStreamReader} as
- a <tt>reader</tt>, which allows you to set the desired character encoding.
- This class is an intermediate component between input
- character reader and the parser.<br>
- The code of this class is based on the class
- <b>ASCII_CharStream</b> - implementation of interface
- {@link org.exolab.castor.xml.dtd.parser.CharStream CharStream}, that
- JavaCC would have generated with the following options set in
- a JavaCC grammar file: <pre>
- JAVA_UNICODE_ESCAPE = false;
- UNICODE_INPUT = false;
- USER_CHAR_STREAM = false; </pre>
- Note that this class is not fully JavaCC generated.
- </ul>
- <p>
- The following example parses the XML DTD file <tt>dtd-document.dtd</tt>
- and constructs the corresponding
- {@link org.exolab.castor.xml.dtd.DTDdocument XML DTD document} object <tt>dtd</tt>.
- </p>
- <pre>
- FileInputStream inputStream;
- InputStreamReader reader;
- InputCharStream charStream;
- DTDInitialParser initialParser;
- String intermedResult;
- StringReader strReader;
- DTDParser parser;
- DTDdocument dtd;
-
- <font color="red">// instantiate input byte stream, associated with the input file</font>
- inputStream = new FileInputStream( "dtd-document.dtd" );
-
- <font color="red">// instantiate character reader from the input file byte stream</font>
- reader = new InputStreamReader( inputStream, "US-ASCII" );
-
- <font color="red">// instantiate char stream for initial parser from the input reader</font>
- charStream = new InputCharStream( reader );
-
- <font color="red">// instantiate initial parser</font>
- initialParser = new DTDInitialParser( charStream );
-
- <font color="red">// get result of initial parsing - DTD text document with parameter
- // entity references expanded</font>
- intermedResult = initialParser.Input();
-
- <font color="red">// construct StringReader from the intermediate parsing result</font>
- strReader= new StringReader( intermedResult );
-
- <font color="red">// instantiate char stream for the main parser</font>
- charStream = new InputCharStream( strReader );
-
- <font color="red">// instantiate main parser</font>
- parser = new DTDParser( charStream );
-
- <font color="red">// parse intermediate parsing result with the main parser
- // and get corresponding DTD document oblect</font>
- dtd = parser.Input();
- </pre>
- </body>
- </html>