<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Extending SWIG to support new languages</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>

<body bgcolor="#ffffff">
<H1><a name="Extending"></a>35 Extending SWIG to support new languages</H1>
<!-- INDEX -->
<div class="sectiontoc">
<ul>
<li><a href="#Extending_nn2">Introduction</a>
<li><a href="#Extending_nn3">Prerequisites</a>
<li><a href="#Extending_nn4">The Big Picture</a>
<li><a href="#Extending_nn5">Execution Model</a>
<ul>
<li><a href="#Extending_nn6">Preprocessing</a>
<li><a href="#Extending_nn7">Parsing</a>
<li><a href="#Extending_nn8">Parse Trees</a>
<li><a href="#Extending_nn9">Attribute namespaces</a>
<li><a href="#Extending_nn10">Symbol Tables</a>
<li><a href="#Extending_nn11">The %feature directive</a>
<li><a href="#Extending_nn12">Code Generation</a>
<li><a href="#Extending_nn13">SWIG and XML</a>
</ul>
<li><a href="#Extending_nn14">Primitive Data Structures</a>
<ul>
<li><a href="#Extending_nn15">Strings</a>
<li><a href="#Extending_nn16">Hashes</a>
<li><a href="#Extending_nn17">Lists</a>
<li><a href="#Extending_nn18">Common operations</a>
<li><a href="#Extending_nn19">Iterating over Lists and Hashes</a>
<li><a href="#Extending_nn20">I/O</a>
</ul>
<li><a href="#Extending_nn21">Navigating and manipulating parse trees</a>
<li><a href="#Extending_nn22">Working with attributes</a>
<li><a href="#Extending_nn23">Type system</a>
<ul>
<li><a href="#Extending_nn24">String encoding of types</a>
<li><a href="#Extending_nn25">Type construction</a>
<li><a href="#Extending_nn26">Type tests</a>
<li><a href="#Extending_nn27">Typedef and inheritance</a>
<li><a href="#Extending_nn28">Lvalues</a>
<li><a href="#Extending_nn29">Output functions</a>
</ul>
<li><a href="#Extending_nn30">Parameters</a>
<li><a href="#Extending_nn31">Writing a Language Module</a>
<ul>
<li><a href="#Extending_nn32">Execution model</a>
<li><a href="#Extending_starting_out">Starting out</a>
<li><a href="#Extending_nn34">Command line options</a>
<li><a href="#Extending_nn35">Configuration and preprocessing</a>
<li><a href="#Extending_nn36">Entry point to code generation</a>
<li><a href="#Extending_nn37">Module I/O and wrapper skeleton</a>
<li><a href="#Extending_nn38">Low-level code generators</a>
<li><a href="#Extending_nn39">Configuration files</a>
<li><a href="#Extending_nn40">Runtime support</a>
<li><a href="#Extending_nn41">Standard library files</a>
<li><a href="#Extending_nn42">User examples</a>
<li><a href="#Extending_test_suite">Test driven development and the test-suite</a>
<ul>
<li><a href="#Extending_running_test_suite">Running the test-suite</a>
</ul>
<li><a href="#Extending_nn43">Documentation</a>
<li><a href="#Extending_prerequisites">Prerequisites for adding a new language module to the SWIG distribution</a>
<li><a href="#Extending_coding_style_guidelines">Coding style guidelines</a>
</ul>
<li><a href="#Extending_debugging_options">Debugging Options</a>
<li><a href="#Extending_nn46">Guide to parse tree nodes</a>
<li><a href="#Extending_further_info">Further Development Information</a>
</ul>
</div>
<!-- INDEX -->



<H2><a name="Extending_nn2"></a>35.1 Introduction</H2>


<p>
This chapter describes SWIG's internal organization and the process by which
new target languages can be developed.  First, a brief word of warning: SWIG
is continually evolving.  The information in this chapter is mostly up to
date, but changes are ongoing, so expect a few inconsistencies.
</p>

<p>
Also, this chapter is not meant to be a hand-holding tutorial.  As a starting point,
you should probably look at one of SWIG's existing modules.
</p>

<H2><a name="Extending_nn3"></a>35.2 Prerequisites</H2>


<p>
In order to extend SWIG, it is useful to have the following background:
</p>

<ul>
<li>An understanding of the C API for the target language.
<li>A good grasp of the C++ type system.
<li>An understanding of typemaps and some of SWIG's advanced features.
<li>Some familiarity with writing C++ (language modules are currently written in C++).
</ul>

<p>
Since SWIG is essentially a specialized C++ compiler, it may be useful
to have some prior experience with compiler design (perhaps even a
compilers course) to better understand certain parts of the system.  A
number of books will also be useful.  For example, "The C Programming
Language" by Kernighan and Ritchie (a.k.a. "K&amp;R") and the C++ standard,
"ISO/IEC 14882 Programming Languages - C++" will be of great use.
</p>

<p>
Also, it is useful to keep in mind that SWIG primarily operates as an
extension of the C++ <em>type</em> system.  At first glance, this might not be
obvious, but almost all SWIG directives as well as the low-level generation of
wrapper code are driven by C++ datatypes.
</p>

<H2><a name="Extending_nn4"></a>35.3 The Big Picture</H2>


<p>
SWIG is a special purpose compiler that parses C++ declarations to
generate wrapper code.  To make this conversion possible, SWIG makes
three fundamental extensions to the C++ language:
</p>

<ul>
<li><b>Typemaps</b>. Typemaps are used to define the
conversion/marshalling behavior of specific C++ datatypes.  All type conversion in SWIG is
based on typemaps.  Furthermore, the association of typemaps to datatypes utilizes an advanced pattern matching
mechanism that is fully integrated with the C++ type system.
</li>

<li><b>Declaration Annotation</b>. To customize wrapper code
generation, most declarations can be annotated with special features.
For example, you can make a variable read-only, you can ignore a
declaration, you can rename a member function, you can add exception
handling, and so forth.  Virtually all of these customizations are built on top of a low-level
declaration annotator that can attach arbitrary attributes to any declaration.
Code generation modules can look for these attributes to guide the wrapping process.
</li>

<li><b>Class extension</b>. SWIG allows classes and structures to be extended with new
methods and attributes (the <tt>%extend</tt> directive).   This has the effect of altering
the API in the target language and can be used to generate OO interfaces to C libraries.
</ul>

<p>
It is important to emphasize that virtually all SWIG features reduce to one of these three
fundamental concepts.  The type system and pattern matching rules also play a critical
role in making the system work.  For example, both typemaps and declaration annotation are
based on pattern matching and interact heavily with the underlying type system.
</p>
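
<p>
As a rough sketch of how the three extensions look in practice, the following
small interface uses one of each.  The module name, the <tt>Vector</tt> structure,
and the function and variable names are invented for illustration, and the typemap
body assumes a Python target (it calls <tt>PyLong_AsLong()</tt> from the Python C API).
</p>

<div class="code">
<pre>
%module example

/* 1. Typemap: conversion rule applied to every "int n" argument
      (assumes the Python target) */
%typemap(in) int n {
  $1 = (int) PyLong_AsLong($input);
}

/* 2. Declaration annotation: features attached to later declarations */
%immutable version;               /* wrap "version" as read-only   */
%rename(vector_scale) scale;      /* target-language name differs  */

%inline %{
typedef struct {
  double x, y;
} Vector;

int version = 1;

void scale(Vector *v, int n) {
  v-&gt;x *= n;
  v-&gt;y *= n;
}
%}

/* 3. Class extension: give the C structure a method */
%extend Vector {
  double dot(Vector *other) {
    return $self-&gt;x * other-&gt;x + $self-&gt;y * other-&gt;y;
  }
}
</pre>
</div>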

<H2><a name="Extending_nn5"></a>35.4 Execution Model</H2>


<p>
When you run SWIG on an interface, processing is handled in stages by a series of system components:
</p>

<ul>
<li>An integrated C preprocessor reads a collection of configuration
files and the specified interface file into memory.  The preprocessor
performs the usual functions including macro expansion and file
inclusion.   However, the preprocessor also performs some transformations of the
interface.  For instance, <tt>#define</tt> statements are sometimes transformed into
<tt>%constant</tt> declarations.  In addition, information related to file/line number
tracking is inserted.
</li>

<li>A C/C++ parser reads the preprocessed input and generates a full
parse tree of all of the SWIG directives and C declarations found.
The parser is responsible for many aspects of the system including
renaming, declaration annotation, and template expansion.  However, the parser
does not produce any output nor does it interact with the target
language module as it runs.  SWIG is not a one-pass compiler.
</li>

<li>A type-checking pass is made. This adjusts all of the C++ typenames to properly
handle namespaces, typedefs, nested classes, and other issues related to type scoping.
</li>

<li>A semantic pass is made on the parse tree to collect information
related to properties of the C++ interface.  For example, this pass
would determine whether or not a class allows a default constructor.
</li>

<li>A code generation pass is made using a specific target language
module.  This phase is responsible for generating the actual wrapper
code.  All of SWIG's user-defined modules are invoked during this
latter stage of compilation.
</li>
</ul>

<p>
The next few sections briefly describe some of these stages.
</p>

<H3><a name="Extending_nn6"></a>35.4.1 Preprocessing</H3>


<p>
The preprocessor plays a critical role in the SWIG implementation.  This is because a lot
of SWIG's processing and internal configuration is managed not by code written in C, but
by configuration files in the SWIG library.  In fact, when you
run SWIG, parsing starts with a small interface file like this (note: this explains
the cryptic error messages that new users sometimes get when SWIG is misconfigured or installed
incorrectly):
</p>

<div class="code">
<pre>
%include "swig.swg"             // Global SWIG configuration
%include "<em>langconfig.swg</em>"       // Language specific configuration
%include "yourinterface.i"      // Your interface file
</pre>
</div>

<p>
The <tt>swig.swg</tt> file contains global configuration information.  In addition, this file
defines many of SWIG's standard directives as macros.  For instance, part of
<tt>swig.swg</tt> looks like this:
</p>

<div class="code">
<pre>
...
/* Code insertion directives such as %wrapper %{ ... %} */

#define %begin       %insert("begin")
#define %runtime     %insert("runtime")
#define %header      %insert("header")
#define %wrapper     %insert("wrapper")
#define %init        %insert("init")

/* Access control directives */

#define %immutable   %feature("immutable","1")
#define %mutable     %feature("immutable")

/* Directives for callback functions */

#define %callback(x) %feature("callback") `x`;
#define %nocallback  %feature("callback");

/* %ignore directive */

#define %ignore         %rename($ignore)
#define %ignorewarn(x)  %rename("$ignore:" x)
...
</pre>
</div>

<p>
The fact that most of the standard SWIG directives are macros is
intended to simplify the implementation of the internals.  For instance,
rather than having to support dozens of special directives, it is
easier to have a few basic primitives such as <tt>%feature</tt> or
<tt>%insert</tt>.
</p>
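
<p>
Going only by the macro definitions quoted above, each directive below is what a
user writes, and the comment beside it is the text the parser would see after
preprocessing.  The names <tt>counter</tt> and <tt>debug_dump</tt> are
placeholders, and other versions of <tt>swig.swg</tt> may define these macros
differently.
</p>

<div class="code">
<pre>
%immutable counter;       /* becomes: %feature("immutable","1") counter; */
%mutable   counter;       /* becomes: %feature("immutable") counter;     */
%ignore    debug_dump;    /* becomes: %rename($ignore) debug_dump;       */

%header %{                /* becomes: %insert("header") %{ ... %}        */
#include "example.h"
%}
</pre>
</div>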

<p>
The <em><tt>langconfig.swg</tt></em> file is supplied by the target
language. This file contains language-specific configuration
information.  More often than not, this file provides run-time wrapper
support code (e.g., the type-checker) as well as a collection of
typemaps that define the default wrapping behavior.  Note: the name of this
file depends on the target language and is usually something like <tt>python.swg</tt>
or <tt>perl5.swg</tt>.
</p>

<p>
As a debugging aid, the text that SWIG feeds to its C++ parser can be
obtained by running <tt>swig -E interface.i</tt>.  This output
probably isn't too useful in general, but it will show how macros have
been expanded as well as everything else that goes into the low-level
construction of the wrapper code.
</p>

<H3><a name="Extending_nn7"></a>35.4.2 Parsing</H3>


<p>
The current C++ parser handles a subset of C++.  Most incompatibilities with C are due to
subtle aspects of how SWIG parses declarations.  Specifically, SWIG expects all C/C++ declarations to follow this general form:
</p>

<div class="diagram">
<pre>
<em>storage</em> <em>type</em> <em>declarator</em> <em>initializer</em>;
</pre>
</div>

<p>
<tt><em>storage</em></tt> is a keyword such as <tt>extern</tt>,
<tt>static</tt>, <tt>typedef</tt>, or <tt>virtual</tt>.  <tt><em>type</em></tt> is a primitive
datatype such as <tt>int</tt> or <tt>void</tt>.   <tt><em>type</em></tt> may be optionally
qualified with a qualifier such as <tt>const</tt> or <tt>volatile</tt>. <tt><em>declarator</em></tt>
is a name with additional type-construction modifiers attached to it (pointers, arrays, references,
functions, etc.).  Examples of declarators include <tt>*x</tt>, <tt>**x</tt>, <tt>x[20]</tt>, and
<tt>(*x)(int,double)</tt>.   The <tt><em>initializer</em></tt> may be a value assigned using <tt>=</tt> or
a body of code enclosed in braces <tt>{ ... }</tt>.
</p>
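
<p>
To make the four parts concrete, here are a few ordinary declarations laid out
in columns.  The identifiers are made up; only the decomposition matters.
</p>

<div class="code">
<pre>
/* storage   type           declarator                initializer      */
   extern    int            count;                 /* (none)           */
   static    const double   rate                      = 0.5;
   typedef   unsigned int   table[20];             /* (none)           */
   extern    int            (*handler)(int,double);/* (none)           */
   static    int            twice(int n)              { return 2 * n; }
</pre>
</div>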

<p>
This declaration format covers most common C++ declarations. However, the C++ standard
is somewhat more flexible in the placement of the parts.  For example, it is technically legal, although
uncommon, to write something like <tt>int typedef const a</tt> in your program.   SWIG simply
doesn't bother to deal with this case.
</p>

<p>
The other significant difference between C++ and SWIG is in the
treatment of typenames.  In C++, if you have a declaration like this,
</p>

<div class="code">
<pre>
int blah(Foo *x, Bar *y);
</pre>
</div>

<p>
it won't parse correctly unless <tt>Foo</tt> and <tt>Bar</tt> have
been previously defined as types either using a <tt>class</tt>
definition or a <tt>typedef</tt>.  The reasons for this are subtle,
but this treatment of typenames is normally integrated at the level of the C
tokenizer---when a typename appears, a different token is returned to the parser
instead of an identifier.
</p>

<p>
SWIG does not operate in this manner: any legal identifier can be used
as a type name.  This design is primarily motivated by the use
of SWIG with partially defined data.  Specifically,
SWIG is supposed to be easy to use on interfaces with missing type information.
</p>

<p>
Because of the different treatment of typenames, the most serious
limitation of the SWIG parser is that it can't process type declarations where
an extra (and unnecessary) grouping operator is used.  For example:
</p>

<div class="code">
<pre>
int (x);         /* A variable x */
int (y)(int);    /* A function y */
</pre>
</div>

<p>
The placing of extra parentheses in type declarations like this is
already recognized by the C++ community as a potential source of
strange programming errors. For example, Scott Meyers' "Effective STL"
discusses this problem in a section on avoiding C++'s "most vexing
parse."
</p>
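
<p>
If such declarations appear in a header you need to wrap, one possible workaround
(a suggestion, not something the parser does for you) is to give SWIG equivalent
declarations without the redundant parentheses:
</p>

<div class="code">
<pre>
int x;           /* The same variable x, no grouping operator */
int y(int);      /* The same function y */
</pre>
</div>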

<p>
The parser is also unable to handle declarations with no return type or bare argument names.
For example, in an old C program, you might see things like this:
</p>

<div class="code">
<pre>
foo(a,b) {
...
}
</pre>
</div>

<p>
In this case, the C compiler takes the return type as well as the types
of the arguments to be <tt>int</tt>.  However, SWIG
interprets the above code as an abstract declarator for a function
returning a <tt>foo</tt> and taking types <tt>a</tt> and <tt>b</tt> as
arguments.
</p>
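
<p>
A sketch of what to hand SWIG instead: restate the old-style function in the
interface file as a prototype with explicit types.  The <tt>int</tt> types below
simply spell out what the C compiler assumes for the code above.
</p>

<div class="code">
<pre>
int foo(int a, int b);
</pre>
</div>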

<H3><a name="Extending_nn8"></a>35.4.3 Parse Trees</H3>


<p>
The SWIG parser produces a complete parse tree of the input file before any wrapper code
is actually generated.  Each item in the tree is known as a "Node".   Each node is identified
by a symbolic tag.   Furthermore, a node may have an arbitrary number of children.
The parse tree structure and tag names of an interface can be displayed using <tt>swig -debug-tags</tt>.
For example:
</p>

<div class="shell">
<pre>
$ <b>swig -c++ -python -debug-tags example.i</b>
 . top (example.i:1)
 . top . include (example.i:1)
 . top . include . typemap (/r0/beazley/Projects/lib/swig1.3/swig.swg:71)
 . top . include . typemap . typemapitem (/r0/beazley/Projects/lib/swig1.3/swig.swg:71)
 . top . include . typemap (/r0/beazley/Projects/lib/swig1.3/swig.swg:83)
 . top . include . typemap . typemapitem (/r0/beazley/Projects/lib/swig1.3/swig.swg:83)
 . top . include (example.i:4)
 . top . include . insert (/r0/beazley/Projects/lib/swig1.3/python/python.swg:7)
 . top . include . insert (/r0/beazley/Projects/lib/swig1.3/python/python.swg:8)
 . top . include . typemap (/r0/beazley/Projects/lib/swig1.3/python/python.swg:19)
...
 . top . include (example.i:6)
 . top . include . module (example.i:2)
 . top . include . insert (example.i:6)
 . top . include . include (example.i:9)
 . top . include . include . class (example.h:3)
 . top . include . include . class . access (example.h:4)
 . top . include . include . class . constructor (example.h:7)
 . top . include . include . class . destructor (example.h:10)
 . top . include . include . class . cdecl (example.h:11)
 . top . include . include . class . cdecl (example.h:11)
 . top . include . include . class . cdecl (example.h:12)
 . top . include . include . class . cdecl (example.h:13)
 . top . include . include . class . cdecl (example.h:14)
 . top . include . include . class . cdecl (example.h:15)
 . top . include . include . class (example.h:18)
 . top . include . include . class . access (example.h:19)
 . top . include . include . class . cdecl (example.h:20)
 . top . include . include . class . access (example.h:21)
 . top . include . include . class . constructor (example.h:22)
 . top . include . include . class . cdecl (example.h:23)
 . top . include . include . class . cdecl (example.h:24)
 . top . include . include . class (example.h:27)
 . top . include . include . class . access (example.h:28)
 . top . include . include . class . cdecl (example.h:29)
 . top . include . include . class . access (example.h:30)
 . top . include . include . class . constructor (example.h:31)
 . top . include . include . class . cdecl (example.h:32)
 . top . include . include . class . cdecl (example.h:33)
</pre>
</div>

<p>
Even for the simplest interface, the parse tree structure is larger than you might expect.  For example, in the
above output, a substantial number of nodes are actually generated by the <tt>python.swg</tt> configuration file
which defines typemaps and other directives.   The contents of the user-supplied input file don't appear until the end
of the output.
</p>

<p>
The contents of each parse tree node consist of a collection of attribute/value
pairs.  Internally, the nodes are simply represented by hash tables.  A display of
the entire parse-tree structure can be obtained using <tt>swig -debug-top &lt;n&gt;</tt>, where <tt>n</tt> is
the stage being processed.
There are a number of other parse tree display options.  For example, <tt>swig -debug-module &lt;n&gt;</tt>
avoids displaying system parse information and displays only the parse tree pertaining to the user's module at
stage <tt>n</tt> of processing.
</p>

<div class="shell">
<pre>
$ swig -c++ -python -debug-module 4 example.i
      +++ include ----------------------------------------
      | name         - "example.i"

            +++ module ----------------------------------------
            | name         - "example"
            |
            +++ insert ----------------------------------------
            | code         - "\n#include \"example.h\"\n"
            |
            +++ include ----------------------------------------
            | name         - "example.h"

                  +++ class ----------------------------------------
                  | abstract     - "1"
                  | sym:name     - "Shape"
                  | name         - "Shape"
                  | kind         - "class"
                  | symtab       - 0x40194140
                  | sym:symtab   - 0x40191078

                        +++ access ----------------------------------------
                        | kind         - "public"
                        |
                        +++ constructor ----------------------------------------
                        | sym:name     - "Shape"
                        | name         - "Shape"
                        | decl         - "f()."
                        | code         - "{\n    nshapes++;\n  }"
                        | sym:symtab   - 0x40194140
                        |
                        +++ destructor ----------------------------------------
                        | sym:name     - "~Shape"
                        | name         - "~Shape"
                        | storage      - "virtual"
                        | code         - "{\n    nshapes--;\n  }"
                        | sym:symtab   - 0x40194140
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "x"
                        | name         - "x"
                        | decl         - ""
                        | type         - "double"
                        | sym:symtab   - 0x40194140
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "y"
                        | name         - "y"
                        | decl         - ""
                        | type         - "double"
                        | sym:symtab   - 0x40194140
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "move"
                        | name         - "move"
                        | decl         - "f(double,double)."
                        | parms        - double ,double
                        | type         - "void"
                        | sym:symtab   - 0x40194140
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "area"
                        | name         - "area"
                        | decl         - "f(void)."
                        | parms        - void
                        | storage      - "virtual"
                        | value        - "0"
                        | type         - "double"
                        | sym:symtab   - 0x40194140
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "perimeter"
                        | name         - "perimeter"
                        | decl         - "f(void)."
                        | parms        - void
                        | storage      - "virtual"
                        | value        - "0"
                        | type         - "double"
                        | sym:symtab   - 0x40194140
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "nshapes"
                        | name         - "nshapes"
                        | decl         - ""
                        | storage      - "static"
                        | type         - "int"
                        | sym:symtab   - 0x40194140
                        |
                  +++ class ----------------------------------------
                  | sym:name     - "Circle"
                  | name         - "Circle"
                  | kind         - "class"
                  | bases        - 0x40194510
                  | symtab       - 0x40194538
                  | sym:symtab   - 0x40191078

                        +++ access ----------------------------------------
                        | kind         - "private"
                        |
                        +++ cdecl ----------------------------------------
                        | name         - "radius"
                        | decl         - ""
                        | type         - "double"
                        |
                        +++ access ----------------------------------------
                        | kind         - "public"
                        |
                        +++ constructor ----------------------------------------
                        | sym:name     - "Circle"
                        | name         - "Circle"
                        | parms        - double
                        | decl         - "f(double)."
                        | code         - "{ }"
                        | sym:symtab   - 0x40194538
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "area"
                        | name         - "area"
                        | decl         - "f(void)."
                        | parms        - void
                        | storage      - "virtual"
                        | type         - "double"
                        | sym:symtab   - 0x40194538
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "perimeter"
                        | name         - "perimeter"
                        | decl         - "f(void)."
                        | parms        - void
                        | storage      - "virtual"
                        | type         - "double"
                        | sym:symtab   - 0x40194538
                        |
                  +++ class ----------------------------------------
                  | sym:name     - "Square"
                  | name         - "Square"
                  | kind         - "class"
                  | bases        - 0x40194760
                  | symtab       - 0x40194788
                  | sym:symtab   - 0x40191078

                        +++ access ----------------------------------------
                        | kind         - "private"
                        |
                        +++ cdecl ----------------------------------------
                        | name         - "width"
                        | decl         - ""
                        | type         - "double"
                        |
                        +++ access ----------------------------------------
                        | kind         - "public"
                        |
                        +++ constructor ----------------------------------------
                        | sym:name     - "Square"
                        | name         - "Square"
                        | parms        - double
                        | decl         - "f(double)."
                        | code         - "{ }"
                        | sym:symtab   - 0x40194788
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "area"
                        | name         - "area"
                        | decl         - "f(void)."
                        | parms        - void
                        | storage      - "virtual"
                        | type         - "double"
                        | sym:symtab   - 0x40194788
                        |
                        +++ cdecl ----------------------------------------
                        | sym:name     - "perimeter"
                        | name         - "perimeter"
                        | decl         - "f(void)."
                        | parms        - void
                        | storage      - "virtual"
                        | type         - "double"
                        | sym:symtab   - 0x40194788
</pre>
</div>

<H3><a name="Extending_nn9"></a>35.4.4 Attribute namespaces</H3>


<p>
Attributes of parse tree nodes are often prepended with a namespace qualifier.
For example, the attributes
<tt>sym:name</tt> and <tt>sym:symtab</tt> are attributes related to
symbol table management and are prefixed with <tt>sym:</tt>.  As a
general rule, only those attributes which are directly related to the raw declaration
appear without a prefix (type, name, declarator, etc.).
</p>

<p>
Target language modules may add additional attributes to nodes to assist the generation
of wrapper code.  The convention for doing this is to place these attributes in a namespace
that matches the name of the target language.  For example, <tt>python:foo</tt> or
<tt>perl:foo</tt>.
</p>

<H3><a name="Extending_nn10"></a>35.4.5 Symbol Tables</H3>


<p>
During parsing, all symbols are managed in the space of the target
language.  The <tt>sym:name</tt> attribute of each node contains the symbol name
selected by the parser.  Normally, <tt>sym:name</tt> and <tt>name</tt>
are the same.  However, the <tt>%rename</tt> directive can be used to
change the value of <tt>sym:name</tt>.  You can see the effect of
<tt>%rename</tt> by trying it on a simple interface and dumping the
parse tree.  For example:
</p>

<div class="code">
<pre>
%rename(foo_i) foo(int);
%rename(foo_d) foo(double);

void foo(int);
void foo(double);
void foo(Bar *b);
</pre>
</div>

<p>
There are various <tt>-debug-</tt> options that can be useful for debugging and analysing the parse tree.
For example, the <tt>-debug-top &lt;n&gt;</tt> and <tt>-debug-module &lt;n&gt;</tt> options
dump the entire parse tree or only the module subtree at one of the four stages <tt>n</tt> of processing.
The parse tree after the final stage of processing can be viewed by running SWIG like this:
</p>

<div class="shell">
<pre>
$ swig -debug-top 4 example.i
...
            +++ cdecl ----------------------------------------
            | sym:name     - "foo_i"
            | name         - "foo"
            | decl         - "f(int)."
            | parms        - int
            | type         - "void"
            | sym:symtab   - 0x40165078
            |
            +++ cdecl ----------------------------------------
            | sym:name     - "foo_d"
            | name         - "foo"
            | decl         - "f(double)."
            | parms        - double
            | type         - "void"
            | sym:symtab   - 0x40165078
            |
            +++ cdecl ----------------------------------------
            | sym:name     - "foo"
            | name         - "foo"
            | decl         - "f(p.Bar)."
            | parms        - Bar *
            | type         - "void"
            | sym:symtab   - 0x40165078
</pre>
</div>

<p>
All symbol-related conflicts and complaints about overloading are based on <tt>sym:name</tt> values.
For instance, the following example uses <tt>%rename</tt> in reverse to generate a name clash.
</p>

<div class="code">
<pre>
%rename(foo) foo_i(int);
%rename(foo) foo_d(double);

void foo_i(int);
void foo_d(double);
void foo(Bar *b);
</pre>
</div>

<p>
When you run SWIG on this you now get:
</p>

<div class="shell">
<pre>
$ ./swig example.i
example.i:6. Overloaded declaration ignored.  foo_d(double )
example.i:5. Previous declaration is foo_i(int )
example.i:7. Overloaded declaration ignored.  foo(Bar *)
example.i:5. Previous declaration is foo_i(int )
</pre>
</div>

<H3><a name="Extending_nn11"></a>35.4.6 The %feature directive</H3>


<p>
A number of SWIG directives such as <tt>%exception</tt> are implemented using the
low-level <tt>%feature</tt> directive.  For example:
</p>

<div class="code">
<pre>
%feature("except") getitem(int) {
  try {
     $action
  } catch (badindex) {
     ...
  }
}

...
class Foo {
public:
    Object *getitem(int index) throw(badindex);
    ...
};
</pre>
</div>

<p>
The behavior of <tt>%feature</tt> is very easy to describe--it simply
attaches a new attribute to any parse tree node that matches the
given prototype.   When a feature is added, it shows up as an attribute in the <tt>feature:</tt> namespace.
You can see this when running with the <tt>-debug-top 4</tt> option.   For example:
</p>

<div class="shell">
<pre>
 +++ cdecl ----------------------------------------
 | sym:name     - "getitem"
 | name         - "getitem"
 | decl         - "f(int).p."
 | parms        - int
 | type         - "Object"
 | feature:except - "{\n    try {\n       $action\n    } catc..."
 | sym:symtab   - 0x40168ac8
 |
</pre>
</div>

<p>
Feature names are completely arbitrary and a target language module can be
programmed to respond to any feature name that it wants to recognize.  The 
data stored in a feature attribute is usually just a raw unparsed string.   
For example, the exception code above is simply
stored without any modifications.
</p>
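
<p>
Because feature names are arbitrary, a module author is free to invent one.  In the
sketch below, <tt>audit</tt> is a made-up feature name that no existing language
module is assumed to recognize.  Following the rule described above, the directive
would attach the string <tt>"1"</tt> to the matching node as a
<tt>feature:audit</tt> attribute, and it would be up to a language module to look
for that attribute and act on it.
</p>

<div class="code">
<pre>
%feature("audit", "1") Foo::getitem;

class Foo {
public:
    Object *getitem(int index);
};
</pre>
</div>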

<H3><a name="Extending_nn12"></a>35.4.7 Code Generation</H3>


<p>
Language modules work by defining handler functions that know how to respond to
different types of parse-tree nodes.  These handlers simply look at the
attributes of each node in order to produce low-level code.
</p>

<p>
In reality, the generation of code is somewhat more subtle than simply
invoking handler functions.  This is because parse-tree nodes might be
transformed. For example, suppose you are wrapping a class like this:
</p>

<div class="code">
<pre>
class Foo {
public:
    virtual int *bar(int x);
};
</pre>
</div>

<p>
When the parser constructs a node for the member <tt>bar</tt>, it creates a raw "cdecl" node with the following
attributes:
</p>

<div class="diagram">
<pre>
nodeType    : cdecl
name        : bar
type        : int
decl        : f(int).p
parms       : int x
storage     : virtual
sym:name    : bar
</pre>
</div>

<p>
To produce wrapper code, this "cdecl" node undergoes a number of transformations.  First, the node is recognized as a function declaration.   This adjusts some of the type information--specifically, the declarator is joined with the base datatype to produce this:
</p>

<div class="diagram">
<pre>
nodeType    : cdecl
name        : bar
type        : p.int        &lt;-- Notice change in return type
decl        : f(int).p
parms       : int x
storage     : virtual
sym:name    : bar
</pre>
</div>

<p>
Next, the context of the node indicates that the node is really a
member function.  This produces a transformation to a low-level
accessor function like this:
</p>

<div class="diagram">
<pre>
nodeType    : cdecl
name        : bar
type        : p.int
decl        : f(int).p
parms       : Foo *self, int x            &lt;-- Added parameter
storage     : virtual
wrap:action : result = (arg1)-&gt;bar(arg2)  &lt;-- Action code added
sym:name    : Foo_bar                     &lt;-- Symbol name changed
</pre>
</div>

<p>
In this transformation, notice how an additional parameter was added
to the parameter list and how the symbol name of the node has suddenly
changed into an accessor using the naming scheme described in the
"SWIG Basics" chapter.  A small fragment of "action" code has also
been generated--notice how the <tt>wrap:action</tt> attribute defines
the access to the underlying method.  The data in this transformed
node is then used to generate a wrapper.
</p>
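<p>
The renaming and action-code steps of this transformation can be mimicked in a few
lines of plain C.  This is only a sketch under stated assumptions: <tt>accessor_name</tt>
and <tt>member_action</tt> are hypothetical helpers, not SWIG functions, and the
comma-separated argument format for more than one argument is assumed.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the accessor symbol name "Class_member",
   for example "Foo_bar".  Returns 0 on success, -1 if out is too small. */
int accessor_name(const char *cls, const char *member, char *out, size_t outsz) {
  int n = snprintf(out, outsz, "%s_%s", cls, member);
  return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}

/* Hypothetical helper: builds the action string for a member call taking
   nargs user arguments.  arg1 is the added "self" pointer, so the user
   arguments start at arg2.  Returns 0 on success, -1 if out is too small. */
int member_action(const char *member, int nargs, char *out, size_t outsz) {
  size_t used;
  int i, n;
  assert(nargs >= 0);
  n = snprintf(out, outsz, "result = (arg1)->%s(", member);
  if (n < 0 || (size_t)n >= outsz) return -1;
  used = (size_t)n;
  for (i = 0; i < nargs; i++) {
    n = snprintf(out + used, outsz - used, "%sarg%d", i ? "," : "", i + 2);
    if (n < 0 || (size_t)n >= outsz - used) return -1;
    used += (size_t)n;
  }
  n = snprintf(out + used, outsz - used, ")");
  if (n < 0 || (size_t)n >= outsz - used) return -1;
  return 0;
}
```

<p>
Called with the class <tt>Foo</tt>, the member <tt>bar</tt>, and one argument, these helpers
reproduce the <tt>sym:name</tt> and <tt>wrap:action</tt> values shown in the diagram above.
</p>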

<p>
Language modules work by registering handler functions for dealing with
various types of nodes at different stages of transformation.   This is done by
inheriting from a special <tt>Language</tt> class and defining a collection
of virtual methods.   For example, the Python module defines a class as
follows:
</p>

<div class="code">
<pre>
class PYTHON : public Language {
protected:
public :
  virtual void main(int, char *argv[]);
  virtual int  top(Node *);
  virtual int  functionWrapper(Node *);
  virtual int  constantWrapper(Node *);
  virtual int  variableWrapper(Node *);
  virtual int  nativeWrapper(Node *);
  virtual int  membervariableHandler(Node *);
  virtual int  memberconstantHandler(Node *);
  virtual int  memberfunctionHandler(Node *);
  virtual int  constructorHandler(Node *);
  virtual int  destructorHandler(Node *);
  virtual int  classHandler(Node *);
  virtual int  classforwardDeclaration(Node *);
  virtual int  insertDirective(Node *);
  virtual int  importDirective(Node *);
};
</pre>
</div>

<p>
The role of these functions is described shortly.
</p>
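<p>
The idea of routing each node to a handler chosen by its node type can be
illustrated without any C++.  The sketch below is a toy model in plain C:
SWIG itself dispatches through the virtual methods of <tt>Language</tt>, and the
table, the handler names, and <tt>dispatch_node</tt> are all hypothetical.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: SWIG dispatches through virtual methods of the C++
   Language class, not through a lookup table like this one. */
typedef int (*NodeHandler)(const char *sym_name);

struct HandlerEntry {
  const char *node_type;   /* for example "cdecl" or "class" */
  NodeHandler handler;
};

static int functions_seen = 0;
static int classes_seen = 0;

/* Hypothetical handlers that merely count the nodes they are given. */
static int on_cdecl(const char *sym_name) { (void)sym_name; functions_seen++; return 0; }
static int on_class(const char *sym_name) { (void)sym_name; classes_seen++; return 0; }

static const struct HandlerEntry toy_handlers[] = {
  { "cdecl", on_cdecl },
  { "class", on_class },
};

/* Calls the handler registered for node_type.  Returns the handler's
   result, or -1 if no handler is registered for that node type. */
int dispatch_node(const char *node_type, const char *sym_name) {
  size_t i;
  assert(node_type != NULL);
  for (i = 0; i < sizeof toy_handlers / sizeof toy_handlers[0]; i++) {
    if (strcmp(toy_handlers[i].node_type, node_type) == 0)
      return toy_handlers[i].handler(sym_name);
  }
  return -1;
}
```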

<H3><a name="Extending_nn13"></a>35.4.8 SWIG and XML</H3>


<p>
Much of SWIG's current parser design was originally motivated by
interest in using XML to represent SWIG parse trees.  Although XML is
not currently used in any direct manner, the parse tree structure, use
of node tags, attributes, and attribute namespaces are all influenced
by aspects of XML parsing.  Therefore, in trying to understand SWIG's
internal data structures, it may be useful to keep XML in the back of
your mind as a model.
</p>

<H2><a name="Extending_nn14"></a>35.5 Primitive Data Structures</H2>


<p>
Most of SWIG is constructed using three basic data structures:
strings, hashes, and lists.  These data structures are dynamic in the same way as
similar structures found in many scripting languages.  For instance,
you can have containers (lists and hash tables) of mixed types and
certain operations are polymorphic.
</p>

<p>
This section briefly describes the basic structures so that later
sections of this chapter make more sense.
</p>

<p>
When describing the low-level API, the following type name conventions are
used:
</p>

<ul>
<li><tt>String</tt>.  A string object.
<li><tt>Hash</tt>. A hash object.
<li><tt>List</tt>. A list object.
<li><tt>String_or_char</tt>. A string object or a <tt>char *</tt>.
<li><tt>Object_or_char</tt>. An object or a <tt>char *</tt>.
<li><tt>Object</tt>.  Any object (string, hash, list, etc.)
</ul>

<p>
In most cases, other typenames in the source are aliases for one of these
primitive types.   Specifically:
</p>

<div class="code">
<pre>
typedef String SwigType;
typedef Hash   Parm;
typedef Hash   ParmList;
typedef Hash   Node;
typedef Hash   Symtab;
typedef Hash   Typetab;
</pre>
</div>

<H3><a name="Extending_nn15"></a>35.5.1 Strings</H3>


<p>
<b><tt>String *NewString(const String_or_char *val)</tt></b>
</p>

<div class="indent">
Creates a new string with initial value <tt>val</tt>. <tt>val</tt> may
be a <tt>char *</tt> or another <tt>String</tt> object.   If you want
to create an empty string, use "" for val.
</div>

<p>
<b><tt>String *NewStringf(const char *fmt, ...)</tt></b>
</p>

<div class="indent">
Creates a new string whose initial value is set according to a C <tt>printf</tt> style
format string in <tt>fmt</tt>.   Additional arguments follow depending
on <tt>fmt</tt>.
</div>

<p>
<b><tt>String *Copy(String *s)</tt></b>
</p>

<div class="indent">
Make a copy of the string <tt>s</tt>.
</div>

<p>
<b><tt>void Delete(String *s)</tt></b>
</p>

<div class="indent">
Deletes <tt>s</tt>.
</div>

<p>
<b><tt>int Len(const String_or_char *s)</tt></b>
</p>

<div class="indent">
Returns the length of the string.
</div>

<p>
<b><tt>char *Char(const String_or_char *s)</tt></b>
</p>

<div class="indent">
Returns a pointer to the first character in a string.
</div>

<p>
<b><tt>void Append(String *s, const String_or_char *t)</tt></b>
</p>

<div class="indent">
Appends <tt>t</tt> to the end of string <tt>s</tt>.
</div>

<p>
<b><tt>void Insert(String *s, int pos, const String_or_char *t)</tt></b>
</p>

<div class="indent">
Inserts <tt>t</tt> into <tt>s</tt> at position <tt>pos</tt>.  The contents
of <tt>s</tt> are shifted accordingly.    The special value <tt>DOH_END</tt>
can be used for <tt>pos</tt> to indicate insertion at the end of the string (appending).
</div>

<p>
<b><tt>int Strcmp(const String_or_char *s, const String_or_char *t)</tt></b>
</p>

<div class="indent">
Compare strings <tt>s</tt> and <tt>t</tt>.   Same as the C <tt>strcmp()</tt>
function.
</div>

<p>
<b><tt>int Strncmp(const String_or_char *s, const String_or_char *t, int len)</tt></b>
</p>

<div class="indent">
Compare the first <tt>len</tt> characters of strings <tt>s</tt> and <tt>t</tt>.   Same as the C <tt>strncmp()</tt>
function.
</div>

<p>
<b><tt>char *Strstr(const String_or_char *s, const String_or_char *pat)</tt></b>
</p>

<div class="indent">
Returns a pointer to the first occurrence of <tt>pat</tt> in <tt>s</tt>.
Same as the C <tt>strstr()</tt> function.
</div>

<p>
<b><tt>char *Strchr(const String_or_char *s, char ch)</tt></b>
</p>

<div class="indent">
Returns a pointer to the first occurrence of character <tt>ch</tt> in <tt>s</tt>.
Same as the C <tt>strchr()</tt> function.
</div>

<p>
<b><tt>void Chop(String *s)</tt></b>
</p>

<div class="indent">
Chops trailing whitespace off the end of <tt>s</tt>.
</div>

<p>
<b><tt>int Replace(String *s, const String_or_char *pat, const String_or_char *rep, int flags)</tt></b>
</p>

<div class="indent">
<p>
Replaces the pattern <tt>pat</tt> with <tt>rep</tt> in string <tt>s</tt>.
<tt>flags</tt> is a combination of the following flags:</p>

<div class="code">
<pre>
DOH_REPLACE_ANY       - Replace all occurrences
DOH_REPLACE_ID        - Valid C identifiers only
DOH_REPLACE_NOQUOTE   - Don't replace in quoted strings
DOH_REPLACE_FIRST     - Replace first occurrence only.
</pre>
</div>

<p>
Returns the number of replacements made (if any).
</p>

</div>

<H3><a name="Extending_nn16"></a>35.5.2 Hashes</H3>


<p>
<b><tt>Hash *NewHash()</tt></b>
</p>

<div class="indent">
Creates a new empty hash table.
</div>

<p>
<b><tt>Hash *Copy(Hash *h)</tt></b>
</p>

<div class="indent">
Make a shallow copy of the hash <tt>h</tt>.
</div>

<p>
<b><tt>void Delete(Hash *h)</tt></b>
</p>

<div class="indent">
Deletes <tt>h</tt>.
</div>

<p>
<b><tt>int Len(Hash *h)</tt></b>
</p>

<div class="indent">
Returns the number of items in <tt>h</tt>.
</div>

<p>
<b><tt>Object *Getattr(Hash *h, const String_or_char *key)</tt></b>
</p>

<div class="indent">
Gets an object from <tt>h</tt>.  <tt>key</tt> may be a string or
a simple <tt>char *</tt> string.   Returns NULL if not found.
</div>

<p>
<b><tt>int Setattr(Hash *h, const String_or_char *key, const Object_or_char *val)</tt></b>
</p>

<div class="indent">
Stores <tt>val</tt> in <tt>h</tt>. <tt>key</tt> may be a string or
a simple <tt>char *</tt>. If <tt>val</tt> is not a standard
object (String, Hash, or List) it is assumed to be a <tt>char *</tt> in which
case it is used to construct a <tt>String</tt> that is stored in the hash.
If <tt>val</tt> is NULL, the object is deleted. Increases the reference count
of <tt>val</tt>.   Returns 1 if this operation replaced an existing hash entry,
0 otherwise.
</div>

<p>
<b><tt>int Delattr(Hash *h, const String_or_char *key)</tt></b>
</p>

<div class="indent">
Deletes the hash item referenced by <tt>key</tt>.  Decreases the
reference count on the corresponding object (if any).  Returns 1
if an object was removed, 0 otherwise.
</div>

<p>
<b><tt>List *Keys(Hash *h)</tt></b>
</p>

<div class="indent">
Returns the list of hash table keys.
</div>
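<p>
The return-value conventions of <tt>Getattr</tt>, <tt>Setattr</tt>, and <tt>Delattr</tt>
are easy to mix up, so the following toy model spells them out in plain C.  It is
not SWIG's implementation: <tt>ToyHash</tt> and the <tt>toy_*</tt> functions are
hypothetical, values are borrowed C strings, reference counting is not modelled,
and returning 0 when a NULL value deletes an entry is an assumption.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Toy stand-in for a Hash; keys and values are borrowed C strings. */
struct ToyHash {
  const char *keys[TOY_MAX];
  const char *vals[TOY_MAX];
  int len;
};

/* Returns the slot holding key, or -1 if key is absent. */
static int toy_find(const struct ToyHash *h, const char *key) {
  int i;
  for (i = 0; i < h->len; i++)
    if (strcmp(h->keys[i], key) == 0) return i;
  return -1;
}

/* Like Getattr: returns the value, or NULL if key is not found. */
const char *toy_getattr(const struct ToyHash *h, const char *key) {
  int i = toy_find(h, key);
  return i < 0 ? NULL : h->vals[i];
}

/* Like Delattr: returns 1 if an entry was removed, 0 otherwise. */
int toy_delattr(struct ToyHash *h, const char *key) {
  int i = toy_find(h, key);
  if (i < 0) return 0;
  h->len--;
  h->keys[i] = h->keys[h->len];   /* move the last entry into the gap */
  h->vals[i] = h->vals[h->len];
  return 1;
}

/* Like Setattr: returns 1 if an existing entry was replaced, 0 otherwise.
   A NULL val deletes the entry; returning 0 in that case is an assumption. */
int toy_setattr(struct ToyHash *h, const char *key, const char *val) {
  int i = toy_find(h, key);
  if (val == NULL) { toy_delattr(h, key); return 0; }
  if (i >= 0) { h->vals[i] = val; return 1; }
  assert(h->len < TOY_MAX);
  h->keys[h->len] = key;
  h->vals[h->len] = val;
  h->len++;
  return 0;
}
```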


<H3><a name="Extending_nn17"></a>35.5.3 Lists</H3>


<p>
<b><tt>List *NewList()</tt></b>
</p>

<div class="indent">
Creates a new empty list.
</div>

<p>
<b><tt>List *Copy(List *x)</tt></b>
</p>

<div class="indent">
Make a shallow copy of the List <tt>x</tt>.
</div>

<p>
<b><tt>void Delete(List *x)</tt></b>
</p>

<div class="indent">
Deletes <tt>x</tt>.
</div>

<p>
<b><tt>int Len(List *x)</tt></b>
</p>

<div class="indent">
Returns the number of items in <tt>x</tt>.
</div>

<p>
<b><tt>Object *Getitem(List *x, int n)</tt></b>
</p>

<div class="indent">
Returns an object from <tt>x</tt> with index <tt>n</tt>.  If <tt>n</tt> is
beyond the end of the list, the last item is returned. If <tt>n</tt> is
negative, the first item is returned.
</div>
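<p>
The out-of-range rule described for <tt>Getitem</tt> amounts to clamping the index
into the valid range.  The helper below is a hypothetical illustration of that rule
for a non-empty list; it is not taken from SWIG's source.
</p>

```c
#include <assert.h>

/* Hypothetical helper: maps any index n onto a valid position in a list
   of len items, following the rule described above.  Negative indices
   select the first item and indices past the end select the last item. */
int clamp_index(int n, int len) {
  assert(len > 0);
  if (n < 0) return 0;
  if (n >= len) return len - 1;
  return n;
}
```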

<p>
<b><tt>int Setitem(List *x, int n, const Object_or_char *val)</tt></b>
</p>

<div class="indent">
Stores <tt>val</tt> in <tt>x</tt>.
If <tt>val</tt> is not a standard
object (String, Hash, or List) it is assumed to be a <tt>char *</tt> in which
case it is used to construct a <tt>String</tt> that is stored in the list.
<tt>n</tt> must be in range.  Otherwise, an assertion will be raised.
</div>

<p>
<b><tt>int Delitem(List *x, int n)</tt></b>
</p>

<div class="indent">
Deletes item <tt>n</tt> from the list, shifting items down if necessary.
To delete the last item in the list, use the special value <tt>DOH_END</tt>
for <tt>n</tt>.
</div>

<p>
<b><tt>void Append(List *x, const Object_or_char *t)</tt></b>
</p>

<div class="indent">
Appends <tt>t</tt> to the end of <tt>x</tt>.  If <tt>t</tt> is not
a standard object, it is assumed to be a <tt>char *</tt> and is
used to create a String object.
</div>

<p>
<b><tt>void Insert(List *x, int pos, const Object_or_char *t)</tt></b>
</p>

<div class="indent">
Inserts <tt>t</tt> into <tt>x</tt> at position <tt>pos</tt>.  The contents
of <tt>x</tt> are shifted accordingly.    The special value <tt>DOH_END</tt>
can be used for <tt>pos</tt> to indicate insertion at the end of the list (appending).
If <tt>t</tt> is not a standard object, it is assumed to be a <tt>char *</tt>
and is used to create a String object.
</div>

<H3><a name="Extending_nn18"></a>35.5.4 Common operations</H3>


The following operations are applicable to all datatypes.

<p>
<b><tt>Object *Copy(Object *x)</tt></b>
</p>

<div class="indent">
Make a copy of the object <tt>x</tt>.
</div>

<p>
<b><tt>void Delete(Object *x)</tt></b>
</p>

<div class="indent">
Deletes <tt>x</tt>.
</div>

<p>
<b><tt>void Setfile(Object *x, String_or_char *f)</tt></b>
</p>

<div class="indent">
Sets the filename associated with <tt>x</tt>.  Used to track
objects and report errors.
</div>

<p>
<b><tt>String *Getfile(Object *x)</tt></b>
</p>

<div class="indent">
Gets the filename associated with <tt>x</tt>.
</div>

<p>
<b><tt>void Setline(Object *x, int n)</tt></b>
</p>

<div class="indent">
Sets the line number associated with <tt>x</tt>.  Used to track
objects and report errors.
</div>

<p>
<b><tt>int Getline(Object *x)</tt></b>
</p>

<div class="indent">
Gets the line number associated with <tt>x</tt>.
</div>

<H3><a name="Extending_nn19"></a>35.5.5 Iterating over Lists and Hashes</H3>


To iterate over the elements of a list or a hash table, the following functions are used:

<p>
<b><tt>Iterator First(Object *x)</tt></b>
</p>

<div class="indent">
Returns an iterator object that points to the first item in a list or hash table.  The
<tt>item</tt> attribute of the Iterator object is a pointer to the item.  For hash tables, the <tt>key</tt> attribute
of the Iterator object additionally points to the corresponding Hash table key.  The <tt>item</tt> and <tt>key</tt> attributes
are NULL if the object contains no items or if there are no more items.
</div>

<p>
<b><tt>Iterator Next(Iterator i)</tt></b>
</p>

<div class="indent">
<p>Returns an iterator that points to the next item in a list or hash table.

Here are two examples of iteration:</p>

<div class="code">
<pre>
List *l = (some list);
Iterator i;

for (i = First(l); i.item; i = Next(i)) {
    Printf(stdout,"%s\n", i.item);
}

Hash *h = (some hash);
Iterator j;

for (j = First(h); j.item; j = Next(j)) {
    Printf(stdout,"%s : %s\n", j.key, j.item);
}
</pre>
</div>

</div>
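<p>
The iteration protocol relies on the iterator being a small value whose
<tt>item</tt> field becomes NULL when nothing is left.  The sketch below imitates
that protocol over a plain C array.  <tt>ToyList</tt>, <tt>ToyIter</tt>,
<tt>toy_first</tt>, and <tt>toy_next</tt> are hypothetical names and do not
reflect how SWIG's <tt>Iterator</tt> is actually laid out.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy list and iterator; these are not SWIG's List or Iterator types. */
struct ToyList {
  const char **items;
  int len;
};

struct ToyIter {
  const char *item;             /* NULL once iteration is finished */
  const struct ToyList *list;
  int index;
};

/* Returns an iterator positioned on the first item, with item == NULL
   if the list is empty. */
struct ToyIter toy_first(const struct ToyList *l) {
  struct ToyIter it;
  assert(l != NULL);
  it.list = l;
  it.index = 0;
  it.item = l->len > 0 ? l->items[0] : NULL;
  return it;
}

/* Returns an iterator positioned on the following item, with
   item == NULL if there are no more items. */
struct ToyIter toy_next(struct ToyIter it) {
  it.index++;
  it.item = it.index < it.list->len ? it.list->items[it.index] : NULL;
  return it;
}
```

<p>
A loop over such a list has the same shape as the examples above:
<tt>for (i = toy_first(&amp;l); i.item; i = toy_next(i))</tt>.
</p>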

<H3><a name="Extending_nn20"></a>35.5.6 I/O</H3>


Special I/O functions are used for all internal I/O.  These operations
work on C <tt>FILE *</tt> objects, String objects, and special <tt>File</tt> objects
(which are merely a wrapper around <tt>FILE *</tt>).

<p>
<b><tt>int Printf(String_or_FILE *f, const char *fmt, ...)</tt></b>
</p>

<div class="indent">
Formatted I/O.   Same as the C <tt>fprintf()</tt> function except that output
can also be directed to a string object.  Note:  the <tt>%s</tt> format
specifier works with both strings and <tt>char *</tt>.  All other format
operators have the same meaning.
</div>

<p>
<b><tt>int Printv(String_or_FILE *f, String_or_char *arg1,..., NULL)</tt></b>
</p>

<div class="indent">
Prints a variable number of string arguments to the output.  The last
argument to this function must be NULL.   The other arguments can either
be <tt>char *</tt> or string objects.
</div>

<p>
<b><tt>int Putc(int ch, String_or_FILE *f)</tt></b>
</p>

<div class="indent">
Same as the C <tt>fputc()</tt> function.
</div>

<p>
<b><tt>int Write(String_or_FILE *f, void *buf, int len)</tt></b>
</p>

<div class="indent">
Same as the C <tt>write()</tt> function.
</div>

<p>
<b><tt>int Read(String_or_FILE *f, void *buf, int maxlen)</tt></b>
</p>

<div class="indent">
Same as the C <tt>read()</tt> function.
</div>

<p>
<b><tt>int Getc(String_or_FILE *f)</tt></b>
</p>

<div class="indent">
Same as the C <tt>fgetc()</tt> function.
</div>

<p>
<b><tt>int Ungetc(int ch, String_or_FILE *f)</tt></b>
</p>

<div class="indent">
Same as the C <tt>ungetc()</tt> function.
</div>

<p>
<b><tt>int Seek(String_or_FILE *f, int offset, int whence)</tt></b>
</p>

<div class="indent">
Same as the C <tt>fseek()</tt> function.  <tt>offset</tt> is the number
of bytes.  <tt>whence</tt> is one of <tt>SEEK_SET</tt>, <tt>SEEK_CUR</tt>,
or <tt>SEEK_END</tt>.
</div>

<p>
<b><tt>long Tell(String_or_FILE *f)</tt></b>
</p>

<div class="indent">
Same as the C <tt>ftell()</tt> function.
</div>

<p>
<b><tt>File *NewFile(const char *filename, const char *mode, List *newfiles)</tt></b>
</p>

<div class="indent">
Create a File object using the <tt>fopen()</tt> library call.  This
file differs from <tt>FILE *</tt> in that it can be placed in the standard
SWIG containers (lists, hashes, etc.). The <tt>filename</tt> is added to the 
<tt>newfiles</tt> list if <tt>newfiles</tt> is non-zero and the file was created successfully.
</div>

<p>
<b><tt>File *NewFileFromFile(FILE *f)</tt></b>
</p>

<div class="indent">
Create a File object wrapper around an existing <tt>FILE *</tt> object.
</div>

<p>
<b><tt>int Close(String_or_FILE *f)</tt></b>
</p>

<div class="indent">
<p>Closes a file.  Has no effect on strings.</p>

<p>
The above I/O functions and strings play a critical role in SWIG.   It is
common to see small fragments of code generated like this:
</p>

<div class="code">
<pre>
/* Print into a string */
String *s = NewString("");
Printf(s,"Hello\n");
for (i = 0; i &lt; 10; i++) {
    Printf(s,"%d\n", i);
}
...
/* Print string into a file */
Printf(f, "%s\n", s);
</pre>
</div>

<p>
Similarly, the preprocessor and parser both operate on string-files.
</p>

</div>
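<p>
Formatted output into a string can be imitated in standard C with
<tt>vsnprintf()</tt>.  The sketch below is only an analogy for calling
<tt>Printf</tt> with a String target: <tt>ToyString</tt> and <tt>toy_printf</tt>
are hypothetical, and the buffer has a fixed capacity, whereas SWIG's String
objects are dynamic.
</p>

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy fixed-capacity string buffer; not SWIG's String type. */
struct ToyString {
  char data[256];
  size_t len;
};

/* Appends formatted text to s.  Returns the number of characters
   appended, or -1 if the text does not fit in the remaining space. */
int toy_printf(struct ToyString *s, const char *fmt, ...) {
  va_list ap;
  int n;
  size_t room;
  assert(s->len < sizeof s->data);
  room = sizeof s->data - s->len;
  va_start(ap, fmt);
  n = vsnprintf(s->data + s->len, room, fmt, ap);
  va_end(ap);
  if (n < 0 || (size_t)n >= room) {
    s->data[s->len] = '\0';   /* discard the partial write */
    return -1;
  }
  s->len += (size_t)n;
  return n;
}
```

<p>
Successive calls accumulate text in the buffer, which can then be written to a
file in one step, mirroring the pattern shown above.
</p>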

<H2><a name="Extending_nn21"></a>35.6 Navigating and manipulating parse trees</H2>


Parse trees are built as collections of hash tables.   Each node is a hash table in which
arbitrary attributes can be stored.  Certain attributes in the hash table provide links to
other parse tree nodes.   The following macros can be used to move around the parse tree.

<p>
<b><tt>String *nodeType(Node *n)</tt></b>
</p>

<div class="indent">
Returns the node type tag as a string.  The returned string indicates the type of parse
tree node.
</div>

<p>
<b><tt>Node *nextSibling(Node *n)</tt></b>
</p>

<div class="indent">
Returns the next node in the parse tree.  For example, the next C declaration.
</div>

<p>
<b><tt>Node *previousSibling(Node *n)</tt…