<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Trang Manual</title>
</head>
<body>

<h1>Trang Manual</h1>

<p>Copyright &#169; 2002, 2003, 2008 Thai Open Source Software Center Ltd</p>

<p>See the file <a href="copying.txt">copying.txt</a> for copying
permission.</p>

<h3>Version 20090818</h3>

<h2>Contents</h2>

<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#running">Running Trang</a></li>
<li><a href="#arguments">Command-line arguments</a></li>
<li><a href="#input-modules">Input modules</a>
<ul>
<li><a href="#rng-input">RELAX NG (XML syntax)</a></li>
<li><a href="#rnc-input">RELAX NG (compact syntax)</a></li>
<li><a href="#dtd-input">XML DTD</a></li>
<li><a href="#xml-input">XML</a></li>
</ul>
</li>

<li><a href="#output-modules">Output modules</a>
<ul>
<li><a href="#rng-output">RELAX NG (XML syntax)</a></li>
<li><a href="#rnc-output">RELAX NG (compact syntax)</a></li>
<li><a href="#dtd-output">XML DTD</a></li>
<li><a href="#xsd-output">W3C XML Schema</a></li>
</ul>
</li>

<!--
<li><a href="#examples">Examples</a></li>
-->
</ul>

<h2><a name="introduction">Introduction</a></h2>

<p>Trang takes as input a schema written in any of the following formats:</p>

<ul>
<li>RELAX NG (XML syntax)</li>
<li>RELAX NG (compact syntax)</li>
<li>XML 1.0 DTD</li>
</ul>

<p>and produces as output a schema written in any of the following formats:</p>

<ul>
<li>RELAX NG (XML syntax)</li>
<li>RELAX NG (compact syntax)</li>
<li>XML 1.0 DTD</li>
<li>W3C XML Schema</li>
</ul>

<p>Trang can also infer a schema from one or more example XML
documents.</p>

<p>Trang uses an internal representation based on RELAX NG.  For each
supported input format, there is an input module that converts a
schema in that input format into this internal representation.  For
each supported output format, there is an output module that converts
the internal representation into a schema in that output format.
Thus, any supported input format can be translated to any supported
output format.</p>

<h2><a name="running">Running Trang</a></h2>

<p>The file <code>trang.jar</code> contains Trang packaged for use
with a Java runtime. It requires a Java runtime compatible with the
Java 2 Platform, Standard Edition (J2SE) version 5 (or any later
version), such as the Java Runtime Environment (JRE), which can be
downloaded <a href="http://java.sun.com/j2se/downloads.html">here</a>.</p>

<p>Once you have installed a suitable Java runtime, you can run Trang
by using the command:</p>

<pre>java -jar trang.jar <var>args</var></pre>

<p>where <code><var>args</var></code> are additional command-line
arguments described <a href="#arguments">below</a>.</p>

<h2><a name="arguments">Command-line arguments</a></h2>

<p>Trang requires two command-line arguments: the first is the URI or
filename of the schema to be translated; the second is the output
filename.</p>

<p>Trang infers the input and output modules to be used from the
extension of input and output filenames as follows:</p>

<dl>

<dt><code>.rng</code></dt>
<dd>RELAX NG (XML syntax)</dd>

<dt><code>.rnc</code></dt>

<dd>RELAX NG (compact syntax)</dd>

<dt><code>.dtd</code></dt>

<dd>XML 1.0 DTD</dd>

<dt><code>.xsd</code></dt>

<dd>W3C XML Schema</dd>

<dt><code>.xml</code></dt>

<dd>XML documents (used as examples from which to infer a schema)</dd>

</dl>

<p>This inference can be overridden using the <code>-I</code> and
<code>-O</code> options.</p>
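
<p>As an illustration (the filenames here are hypothetical), the
following command would translate a RELAX NG compact syntax schema
into a W3C XML Schema, with both modules inferred from the
extensions:</p>

<pre>java -jar trang.jar schema.rnc schema.xsd</pre>

<p>If the files did not carry the usual extensions, the modules could
be named explicitly instead:</p>

<pre>java -jar trang.jar -I rnc -O xsd schema.txt schema.out</pre>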

<p>When the input is XML documents used as examples to infer a schema,
more than one input file may be specified as arguments.  All the input
files are specified before the output file.</p>
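
<p>For example, assuming three sample documents with the hypothetical
names shown, a schema in RELAX NG compact syntax could be inferred
along these lines:</p>

<pre>java -jar trang.jar sample1.xml sample2.xml sample3.xml inferred.rnc</pre>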

<p>The arguments specifying the input and output files can be preceded
by arguments specifying options. Trang accepts the following
options:</p>

<dl>

<dt><code>-I <a href="#rng-input">rng</a></code></dt>
<dt><code>-I <a href="#rnc-input">rnc</a></code></dt>
<dt><code>-I <a href="#dtd-input">dtd</a></code></dt>
<dt><code>-I <a href="#xml-input">xml</a></code></dt>

<dd>Specifies the input module.</dd>

<dt><code>-O <a href="#rng-output">rng</a></code></dt>
<dt><code>-O <a href="#rnc-output">rnc</a></code></dt>
<dt><code>-O <a href="#dtd-output">dtd</a></code></dt>
<dt><code>-O <a href="#xsd-output">xsd</a></code></dt>

<dd>Specifies the output module.</dd>

<dt><code>-i <var>param</var></code></dt>
<dt><code>-o <var>param</var></code></dt>

<dd>Specifies an additional parameter for an input (<code>-i</code>)
or output (<code>-o</code>) module.  The <code>-i</code> and
<code>-o</code> options may be used multiple times in order to specify
multiple parameters.  There are two kinds of parameter: boolean
parameters and string-valued parameters.  A string-valued parameter is
specified using the form
<code><var>name</var>=<var>value</var></code>.  A boolean parameter is
specified using the form <code><var>name</var></code> or
<code>no-<var>name</var></code>.  The applicable parameters depend on
the particular input and output module and are described in the
documentation for the <a href="#input-modules">input</a> or <a
href="#output-modules">output</a> modules.</dd>

</dl>
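
<p>A sketch of how these options combine, using hypothetical filenames
and a hypothetical namespace URI:</p>

<pre>java -jar trang.jar -i xmlns=http://example.org/ns -i no-generate-start -o indent=4 doc.dtd doc.rng</pre>

<p>Here <code>xmlns</code> is a string-valued parameter and
<code>no-generate-start</code> is the negated form of the boolean
parameter <code>generate-start</code>, both belonging to the <a
href="#dtd-input">DTD input module</a>; <code>indent</code> is a
string-valued parameter accepted by the <a
href="#output-modules">output modules</a>.</p>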

<h2><a name="input-modules">Input modules</a></h2>

<h3><a name="rng-input">RELAX NG (XML syntax) input module</a></h3>

<p>This input module accepts RELAX NG schemas in XML syntax as defined
by the RELAX NG 1.0 <a
href="http://www.oasis-open.org/committees/relax-ng/spec.html"
>Committee Specification</a>.</p>

<p>It accepts the following parameters:</p>

<dl>
<dt><code>-i encoding=<var>name</var></code></dt>

<dd>Use an encoding of <var>name</var> rather than the encoding
specified in the encoding declaration of the XML document.</dd>

</dl>

<!-- XXX mention incomplete schemas -->

<h3><a name="rnc-input">RELAX NG Compact Syntax input module</a></h3>

<p>This input module accepts RELAX NG schemas using the compact syntax
as defined in the RELAX NG Compact Syntax <a
href="http://www.oasis-open.org/committees/relax-ng/compact-20021121.html"
>Committee Specification</a>.</p>

<p>It accepts the following parameters:</p>

<dl>
<dt><code>-i encoding=<var>name</var></code></dt>

<dd>Use an encoding of <var>name</var>.  By default, Trang will
autodetect an encoding of UTF-8 or UTF-16.</dd>

</dl>


<h3><a name="dtd-input">DTD input module</a></h3>

<p>This input module accepts DTDs as defined by the XML 1.0 <a
href="http://www.w3.org/TR/REC-xml">Recommendation</a>.</p>

<!-- Say something about namespaces -->

<p>It accepts the following parameters:</p>

<dl>
<dt><code>-i xmlns=<var>uri</var></code></dt>

<dd>Specifies the default namespace, that is, the namespace used for
unqualified element names.</dd>

<dt><code>-i xmlns:<var>prefix</var>=<var>uri</var></code></dt>

<dd>Specifies the namespace for the element and attribute names using
<code><var>prefix</var></code>.</dd>

<dt><code>-i colon-replacement=<var>chars</var></code></dt>

<dd><a name="colon-replacement">Replaces colons in element names by
<code><var>chars</var></code> when constructing the names of
definitions used to represent the element declarations and attribute
list declarations in the DTD.  Trang generates a definition for each
element declaration and attlist declaration in the DTD. The name of
the definition is based on the name of the element. In RELAX NG, the
names of definitions cannot contain colons.  However, in the DTD, the
element name may contain a colon.  By default, Trang will first try to
use the element names without prefixes.  If this causes a conflict, it
will instead replace the colon by a legal name character (it tries
first to use a period).</a></dd>

<dt><code>-i element-define=<var>name-pattern</var></code></dt>

<dd>Specifies how to construct the name of the definition representing
an element declaration from the name of the element.  The
<code><var>name-pattern</var></code> must contain exactly one percent
character.  This percent character is replaced by the name of the
element (after <a href="#colon-replacement">colon replacement</a>) and
the result is used as the name of the definition.</dd>

<dt><a name="inline-attlist"><code>-i inline-attlist</code></a></dt>

<dd>Specifies not to generate definitions for attribute list
declarations and instead move attributes declared in attribute list
declarations into the definitions generated for element declarations.
This is the default behavior when the output module is
<code>xsd</code>.  Otherwise, the default behavior is as described in
the <a href="#no-inline-attlist"><code>-i no-inline-attlist</code></a>
parameter.</dd>

<dt><a name="no-inline-attlist"><code>-i no-inline-attlist</code></a></dt>

<dd>Generates a distinct definition (with
<code>combine="interleave"</code>) for each attribute list declaration
in the DTD; the definition for each element declaration references the
definition for the corresponding attribute list declaration.  This is
the default behavior, except when the output module is
<code>xsd</code>, for which the default behavior is as described in
the <a href="#inline-attlist"><code>-i inline-attlist</code></a>
parameter.</dd>

<dt><code>-i attlist-define=<var>name-pattern</var></code></dt>

<dd>Specifies how to construct the name of the definition
representing an attribute list declaration from the name of the
element. The <code><var>name-pattern</var></code> must contain exactly
one percent character.  This percent character is replaced by the name
of the element (after <a href="#colon-replacement">colon
replacement</a>) and the result is used as the name of the
definition.</dd>

<dt><code>-i any-name=<var>name</var></code></dt>

<dd>Specifies the name of the definition generated for the content of
elements declared in the DTD as having a content model of ANY.</dd>

<dt><code>-i strict-any</code></dt>

<dd>Preserves the exact semantics of ANY content models by using an
explicit choice of references to all declared elements.  By default,
Trang uses a wildcard that allows any element.</dd>

<dt><code>-i annotation-prefix=<var>prefix</var></code></dt>

<dd>Default values are represented using an annotation attribute
<code><var>prefix</var>:defaultValue</code> where
<code><var>prefix</var></code> is bound to
<code>http://relaxng.org/ns/compatibility/annotations/1.0</code> as
defined by the RELAX NG DTD Compatibility <a
href="http://www.oasis-open.org/committees/relax-ng/compatibility.html"
>Committee Specification</a>. By default, Trang will use
<code>a</code> for <code><var>prefix</var></code> unless that
conflicts with a prefix used in the DTD.</dd>

<dt><code>-i generate-start</code></dt>
<dt><code>-i no-generate-start</code></dt>

<dd>Specifies whether Trang should generate a <code>start</code>
element. DTDs do not indicate what elements are allowed as document
elements.  Trang assumes that all elements that are defined but never
referenced are allowed as document elements.</dd>

</dl>
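
<p>To illustrate the difference between <code>inline-attlist</code>
and <code>no-inline-attlist</code>, consider a small hypothetical
DTD:</p>

<pre>&lt;!ELEMENT p (#PCDATA)&gt;
&lt;!ATTLIST p class CDATA #IMPLIED&gt;</pre>

<p>With <code>-i no-inline-attlist</code>, the result in RELAX NG
compact syntax would be expected to have roughly the following shape.
The definition name <code>attlist.p</code> is an assumption made for
illustration; the actual name depends on the
<code>attlist-define</code> pattern in effect, and the exact output
may differ in detail:</p>

<pre>p = element p { attlist.p, text }
attlist.p &amp;= attribute class { text }?</pre>

<p>With <code>-i inline-attlist</code>, the attribute would instead
appear directly inside the element definition, roughly as:</p>

<pre>p = element p { attribute class { text }?, text }</pre>

<p>Similarly, a pattern such as <code>-i element-define=%.element</code>
would name the element definition <code>p.element</code> rather than
<code>p</code>, since the percent character is replaced by the element
name.</p>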

<!-- Say something about limitations wrt marked sections -->

<h3><a name="xml-input">XML input module</a></h3>

<p>This input module accepts one or more XML documents and infers a
schema.  All the XML documents will be valid with respect to the
inferred schema.</p>

<p>It accepts the following parameters:</p>

<dl>
<dt><code>-i encoding=<var>name</var></code></dt>

<dd>Use an encoding of <var>name</var> rather than the encoding
specified in the encoding declaration of the XML document.</dd>

</dl>

<h2><a name="output-modules">Output modules</a></h2>

<p>All output modules accept the following parameters:</p>

<dl>

<dt><code>-o encoding=<var>name</var></code></dt>

<dd>Use an encoding of <code><var>name</var></code> for the output
files.</dd>
<!-- describe default -->

<dt><code>-o indent=<var>n</var></code></dt>

<dd>Indent by <code><var>n</var></code> spaces for each indentation
level.</dd>

</dl>
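
<p>For instance, with hypothetical filenames, the following command
would write the output in UTF-8 with four spaces per indentation
level:</p>

<pre>java -jar trang.jar -o encoding=UTF-8 -o indent=4 schema.rnc schema.rng</pre>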

<h3><a name="rng-output">RELAX NG (XML syntax) output module</a></h3>

<p>This output module outputs RELAX NG schemas in XML syntax as
defined by the RELAX NG 1.0 <a
href="http://www.oasis-open.org/committees/relax-ng/spec.html">Committee
Specification</a>.</p>

<h3><a name="rnc-output">RELAX NG Compact Syntax output module</a></h3>

<p>This output module outputs RELAX NG schemas in compact syntax as
defined by the RELAX NG Compact Syntax <a
href="http://www.oasis-open.org/committees/relax-ng/compact-20021121.html"
>Committee Specification</a>.</p>

<h3><a name="dtd-output">DTD output module</a></h3>

<p>This output module outputs DTDs as defined by the XML 1.0 <a
href="http://www.w3.org/TR/REC-xml">Recommendation</a>.</p>

<p>It has many limitations. There are many RELAX NG features that it
cannot handle, including:</p>

<ul>

<li>Wildcards</li>

<li>Multiple <code>element</code> patterns with the same name</li>

<li><code>externalRef</code></li>

<li>Overriding definitions (in an <code>include</code>)</li>

<li>Combining definitions with <code>combine="choice"</code></li>

</ul>

<p>However, it can handle many RELAX NG features, including some
that go beyond the capabilities of DTDs.  When some part of a RELAX NG
schema cannot be represented exactly in DTD, Trang will try to
<i>approximate</i> it. The approximation will always be more general,
that is, the DTD will allow everything that is allowed by the RELAX NG
schema, but there may be some things that are allowed by the DTD that
are not allowed by the RELAX NG schema.  For example, if the RELAX NG
schema specifies that the content of an element is a string conforming
to some datatype, then Trang will make the content of the element be
<code>(#PCDATA)</code>; or if the RELAX NG schema specifies a choice
between two attributes <var>x</var> and <var>y</var>, then the DTD
will allow both <var>x</var> and <var>y</var> optionally. Whenever
Trang approximates, it will give a warning message.</p>
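
<p>Both approximations can be seen in a small sketch.  The element
name <code>item</code> is hypothetical, and the DTD shown is only the
general shape that such an approximation would take, not a transcript
of actual Trang output.  Given this pattern in RELAX NG compact
syntax:</p>

<pre>element item {
  (attribute x { text } | attribute y { text }),
  xsd:integer
}</pre>

<p>a more general DTD along the following lines would accept every
document the RELAX NG schema accepts, while also accepting documents
with both attributes, with neither attribute, or with non-integer
content:</p>

<pre>&lt;!ELEMENT item (#PCDATA)&gt;
&lt;!ATTLIST item
  x CDATA #IMPLIED
  y CDATA #IMPLIED&gt;</pre>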

<p>If you want to be able to generate a DTD but need to use some
feature of RELAX NG that Trang is unable to convert into a DTD, then
you might try one of the following approaches:</p>

<ul>

<li>Create a RELAX NG schema including the features you need, and then
use XSLT (or some other XML transformation language) to transform the
schema into something that Trang can handle, perhaps making use of
annotations in the schema to guide the transformation.</li>

<li>Create a RELAX NG schema <var>S</var><sub>1</sub> which uses only
features that Trang can handle but which, consequently, does not
capture all the desired constraints; then create a second RELAX NG
schema <var>S</var><sub>2</sub> that <code>include</code>s
<var>S</var><sub>1</sub>, and overrides definitions in
<var>S</var><sub>1</sub> replacing them with definitions that make
unrestricted use of the features of RELAX NG.</li>

</ul>
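
<p>A minimal sketch of the second approach follows.  It assumes that
<var>S</var><sub>1</sub> is stored in a hypothetical file
<code>s1.rng</code> and contains a loosely constrained definition
named <code>price</code>; both names are illustrative only.  The
schema <var>S</var><sub>2</sub> could then tighten that definition
like this:</p>

<pre>&lt;grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"&gt;
  &lt;include href="s1.rng"&gt;
    &lt;!-- replaces the definition of "price" found in s1.rng --&gt;
    &lt;define name="price"&gt;
      &lt;element name="price"&gt;
        &lt;data type="decimal"/&gt;
      &lt;/element&gt;
    &lt;/define&gt;
  &lt;/include&gt;
&lt;/grammar&gt;</pre>

<p>In such an arrangement, <var>S</var><sub>1</sub> would presumably be
the schema given to Trang to produce the DTD, while
<var>S</var><sub>2</sub> would be used wherever full RELAX NG
validation is available.</p>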

<h3><a name="xsd-output">W3C XML Schema output module</a></h3>

<p>This output module outputs a W3C XML Schema as defined by the XML
Schema <a href="http://www.w3.org/TR/xmlschema-1/"
>Recommendation</a>.</p>

<p>It supports the following parameters:</p>

<dl>
<dt><code>-o disable-abstract-elements</code></dt>

<dd>Disables the use of abstract elements and substitution groups in
the generated XML Schema.  This can also be controlled using an <a
href="#enable-abstract-elements">annotation attribute</a>.</dd>

<dt><code>-o any-process-contents=strict</code>|<code>lax</code>|<code>skip</code></dt>

<dd>Specifies the value for the <code>processContents</code> attribute
of <code>any</code> elements.  The default is <code>skip</code>
(corresponding to RELAX NG semantics) unless the input format is
<code>dtd</code>, in which case the default is <code>strict</code>
(corresponding to DTD semantics).</dd>

<dt><code>-o any-attribute-process-contents=strict</code>|<code>lax</code>|<code>skip</code></dt>

<dd>Specifies the value for the <code>processContents</code> attribute
of <code>anyAttribute</code> elements.  The default is
<code>skip</code> (corresponding to RELAX NG semantics).</dd>

</dl>
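
<p>As an example with hypothetical filenames, the following command
would generate an XML Schema without abstract elements and with
<code>processContents="lax"</code> on <code>any</code> elements:</p>

<pre>java -jar trang.jar -o disable-abstract-elements -o any-process-contents=lax schema.rng schema.xsd</pre>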


<p>It has the following limitations:</p>

<ul>

<li>it may generate schemas that violate W3C XML Schema's restrictions
on ambiguous content models;</li>

<li>it may generate schemas that violate W3C XML Schema's restrictions
on consistent element types;</li>

<li>when the RELAX NG schema cannot be represented by W3C XML Schema,
a generalization is generated; it should give a warning in this case,
but does not always do so.</li>

</ul>

<p>Annotations can be added to the RELAX NG schema to guide the
translation.  These annotations have the namespace URI
<code>http://www.thaiopensource.com/ns/relaxng/xsd</code>. This document
will use the convention that the prefix <code>tx</code> refers to this
namespace URI; in other words, it will assume a namespace declaration
of</p>

<pre>xmlns:tx="http://www.thaiopensource.com/ns/relaxng/xsd"</pre>

<p><a name="enable-abstract-elements"/>Currently, only one annotation
is supported, an attribute <code>tx:enableAbstractElements</code>.
The value of this must be <code>true</code> or <code>false</code>.  It
applies to RELAX NG <code>define</code> elements.  Trang has the
ability to translate a <code>define</code> that contains a choice of
element patterns into an abstract element declaration, which will be
used as the head of a substitution group whose members are the
elements in the choice.  Whether it does this is determined by the
value of the <code>tx:enableAbstractElements</code> annotation
attribute.  If the value is <code>true</code>, it will attempt to use
an abstract element.  If the value is <code>false</code>, it
will not, which means the <code>define</code> will typically be
translated into a group definition.</p>
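
<p>The following sketch shows where the annotation goes.  The
definition name <code>inline</code> and the element names are
hypothetical:</p>

<pre>&lt;grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:tx="http://www.thaiopensource.com/ns/relaxng/xsd"&gt;
  &lt;start&gt;
    &lt;element name="doc"&gt;
      &lt;zeroOrMore&gt;&lt;ref name="inline"/&gt;&lt;/zeroOrMore&gt;
    &lt;/element&gt;
  &lt;/start&gt;
  &lt;!-- a choice of element patterns: a candidate for an abstract element --&gt;
  &lt;define name="inline" tx:enableAbstractElements="true"&gt;
    &lt;choice&gt;
      &lt;element name="em"&gt;&lt;text/&gt;&lt;/element&gt;
      &lt;element name="code"&gt;&lt;text/&gt;&lt;/element&gt;
    &lt;/choice&gt;
  &lt;/define&gt;
&lt;/grammar&gt;</pre>

<p>In W3C XML Schema terms, an abstract element heading a substitution
group has the general form below.  This is an illustration of the
construct, assuming the conventional <code>xs</code> prefix, and is
not a transcript of what Trang emits for the schema above:</p>

<pre>&lt;xs:element name="inline" abstract="true"/&gt;
&lt;xs:element name="em" substitutionGroup="inline" type="xs:string"/&gt;
&lt;xs:element name="code" substitutionGroup="inline" type="xs:string"/&gt;</pre>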

<p>The <code>tx:enableAbstractElements</code> attribute is inherited
in a similar way to the <code>ns</code> attribute: it can be specified
on a <code>grammar</code>, <code>div</code> or <code>include</code>
element to enable or disable the use of abstract elements for all
descendant <code>define</code> elements. In the absence of any
inherited <code>tx:enableAbstractElements</code> attribute, the use of
abstract elements is enabled unless the <code>-o
disable-abstract-elements</code> option was specified.</p>
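
<p>A sketch of this inheritance, with hypothetical definition and
element names: abstract elements are disabled for the whole grammar
and re-enabled for a single <code>define</code>.</p>

<pre>&lt;grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:tx="http://www.thaiopensource.com/ns/relaxng/xsd"
         tx:enableAbstractElements="false"&gt;
  &lt;start&gt;
    &lt;element name="doc"&gt;
      &lt;zeroOrMore&gt;
        &lt;choice&gt;
          &lt;ref name="block"/&gt;
          &lt;ref name="inline"/&gt;
        &lt;/choice&gt;
      &lt;/zeroOrMore&gt;
    &lt;/element&gt;
  &lt;/start&gt;
  &lt;!-- inherits "false" from the grammar element --&gt;
  &lt;define name="block"&gt;
    &lt;choice&gt;
      &lt;element name="para"&gt;&lt;text/&gt;&lt;/element&gt;
      &lt;element name="note"&gt;&lt;text/&gt;&lt;/element&gt;
    &lt;/choice&gt;
  &lt;/define&gt;
  &lt;!-- overrides the inherited value for this definition only --&gt;
  &lt;define name="inline" tx:enableAbstractElements="true"&gt;
    &lt;choice&gt;
      &lt;element name="em"&gt;&lt;text/&gt;&lt;/element&gt;
      &lt;element name="code"&gt;&lt;text/&gt;&lt;/element&gt;
    &lt;/choice&gt;
  &lt;/define&gt;
&lt;/grammar&gt;</pre>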

<p>It can happen that the same element name occurs in a choice in more
than one <code>define</code> element; at most one of these
<code>define</code> elements can be translated to an abstract element.
In this case, Trang will not translate any of them to an abstract
element, unless the use of abstract elements has been disabled by
<code>tx:enableAbstractElements</code> for all except one of the
<code>define</code> elements.</p>

<p>In fact, the use of abstract elements is not restricted to the case
where the <code>define</code> consists of a <code>choice</code> that
contains only <code>element</code> patterns; the <code>choice</code>
may also contain <code>ref</code> patterns referring to definitions
that are to be translated into element declarations, whether abstract
or not. The <code>tx:enableAbstractElements</code> attribute applies
equally to these definitions.</p>

<!--
<h2><a name="examples">Examples</a></h2>
-->

</body>
</html>