Thoughts on the Insanity of C++ Parsing

<h2>Thoughts on the Insanity of C++ Parsing</h2>

<center>
<em>
"Parsing C++ is simply too complex to do correctly." -- Anonymous
</em>
</center>
<p>
Author: David Beazley (beazley@cs.uchicago.edu)

<p>
August 12, 2002

<p>
A central goal of the SWIG project is to generate extension modules by
parsing the contents of C++ header files.  It's not too hard to come up
with reasons why this might be useful---after all, if you've got
several hundred class definitions, do you really want to go off and
write a bunch of hand-crafted wrappers?  No, of course not---you're
busy and, like everyone else, you've got better things to do with
your time.

<p>
Okay, so there are many reasons why parsing C++ would be nice.
However, parsing C++ is also a nightmare.  In fact, C++ would
probably be the last language that any normal person would choose to
serve as an interface specification language.  It's hard to parse,
hard to analyze, and it involves all sorts
of nasty little problems related to scoping, typenames, templates,
access, and so forth.  Because of this, most of the tools that claim
to "parse" C++ don't.  Instead, they parse a subset of the language
that happens to match the C++ programming style used by the tool's
creator (believe me, I know---this is how SWIG started).  Not
surprisingly, these tools tend to break down when presented with code
that starts to challenge the capabilities of the C++ compiler.
Needless to say, critics see this as an opportunity to make bold claims
such as "writing a C++ parser is folly" or "this whole approach is too
hard to ever work correctly."

<p>
Well, one does have to give the critics a little credit---writing a
C++ parser certainly <em>is</em> hard and writing a parser that
actually works correctly is even harder.  However, these tasks are
certainly not "impossible."  After all, there would be no working C++
compiler if such claims were true!  Therefore, the question of whether
or not a wrapper generator can parse C++ is clearly the wrong question
to ask.  Instead, the real question is whether or not a wrapper
generation tool that parses C++ can actually do anything useful.

<h3>The problem with using C++ as an interface definition language</h3>

If you cut through all of the low-level details of parsing, the primary
problem of using C++ as a module specification language is that of
ambiguity. Consider a declaration like this:

<blockquote>
<pre>
void foo(double *x, int n);
</pre>
</blockquote>

If you look at this declaration, you can ask yourself the question,
what is "x"?  Is it a single input value?  Is it an output value
(modified by the function)?  Is it an array?  Is "n" somehow related?
Perhaps the real problem in this example is that of expressing the
programmer's intent.  Yes, the function clearly accepts a pointer to
some object and an integer, but the declaration does not contain
enough additional information to determine the purpose of these
parameters--information that could be useful in generating a suitable
set of wrappers.

<p>
IDL compilers associated with popular component frameworks (e.g.,
CORBA, COM, etc.)  get around this problem by requiring interfaces to
be precisely specified--input and output values are clearly indicated
as such.  Thus, one might adopt a similar approach and extend C++
syntax with some special modifiers or qualifiers.  For example:

<blockquote>
<pre>
void foo(%output double *x, int n);
</pre>
</blockquote>

The problem with this approach is that it breaks from C++ syntax and
it requires the user to annotate their input files (a task that C++
wrapper generators are supposed to eliminate).  Meanwhile, critics sit
back and say "Ha! I told you C++ parsing would never work."

<p>
Another problem with using C++ as an input language is that interface
building often involves more than just blindly wrapping declarations.  For instance,
users might want to rename declarations, specify exception handling procedures,
add customized code, and so forth.  This suggests that a
wrapper generator really needs to do
more than just parse C++---it must give users the freedom to customize
various aspects of the wrapper generation process.  Again, things aren't
looking too good for C++.

<h3>The SWIG approach: pattern matching</h3>

SWIG takes a different approach to the C++ wrapping problem.
Instead of trying to modify C++ with all sorts of little modifiers and
add-ons, wrapping is largely controlled by a pattern matching mechanism that is
built into the underlying C++ type system.

<p>
One part of the pattern matcher is programmed to look for specific sequences of
datatypes and argument names.   These patterns, known as typemaps, are
responsible for all aspects of data conversion.  They work by simply attaching
bits of C conversion code to specific datatypes and argument names in the
input file.  For example, a typemap might be used like this:

<blockquote>
<pre>
%typemap(in) <b>double *items</b> {
   // Get an array from the input
   ...
}
...
void foo(<b>double *items</b>, int n);
</pre>
</blockquote>

With this approach, type and argument names are used as
a basis for specifying customized wrapping behavior.  For example, if a program
always uses an argument of <tt>double *items</tt> to refer to an
array, SWIG can latch onto that and use it to provide customized
processing.  It is even possible to write pattern matching rules for
sequences of arguments.  For example, you could write the following:

<blockquote>
<pre>
%typemap(in) (<b>double *items, int n</b>) {
   // Get an array of items. Set n to number of items
   ...
}
...
void foo(<b>double *items, int n</b>);
</pre>
</blockquote>
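<p>
To make the multi-argument pattern a little more concrete, here is a rough
sketch of how its body might be filled in when the target language is Python.
Everything inside the braces is an assumption made for illustration rather
than code taken from SWIG: it uses the standard Python C API, assumes a SWIG
release that provides the <tt>SWIG_fail</tt> macro, and pairs the conversion
with a <tt>freearg</tt> typemap so that the temporary array is released after
the call.

<blockquote>
<pre>
%typemap(in) (double *items, int n) {
   /* Illustrative only: convert a Python sequence to (double *, int) */
   int i;
   if (!PySequence_Check($input)) {
      PyErr_SetString(PyExc_TypeError, "expected a sequence of numbers");
      SWIG_fail;
   }
   $2 = (int) PySequence_Size($input);
   if ($2 &lt; 0) SWIG_fail;                  /* Python error is already set */
   /* Allocate at least one element so that an empty sequence still works */
   $1 = (double *) malloc(($2 ? $2 : 1) * sizeof(double));
   if (!$1) {
      PyErr_NoMemory();
      SWIG_fail;
   }
   for (i = 0; i &lt; $2; i++) {
      PyObject *o = PySequence_GetItem($input, i);   /* new reference or NULL */
      if (o) {
         $1[i] = PyFloat_AsDouble(o);
         Py_DECREF(o);
      }
      if (!o || PyErr_Occurred()) {
         free($1);
         $1 = NULL;    /* keeps the freearg typemap below from freeing twice */
         SWIG_fail;
      }
   }
}

/* Release the temporary array once the wrapped function has returned */
%typemap(freearg) (double *items, int n) {
   free($1);
}

void foo(double *items, int n);
</pre>
</blockquote>

With rules along these lines in place, a script could call
<tt>foo([1.0, 2.0, 3.0])</tt> and the wrapper would supply both
<tt>items</tt> and <tt>n</tt> from the single list argument.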

The precise details of typemaps are not so important (in fact, most of
this pattern matching is hidden from SWIG users).  What is important
is that pattern matching allows customized data handling to be
specified without breaking C++ syntax--instead, a user merely has to
define a few patterns that get applied across the declarations that
appear in C++ header files.  In some sense, you might view this
approach as providing customization through naming conventions rather than
having to annotate arguments with extra qualifiers.

<p>
The other pattern matching mechanism used by SWIG is a declaration annotator
that is used to attach properties to specific declarations.  A simple example of declaration
annotation might be renaming.  For example:

<blockquote>
<pre>
%rename(cprint) print;  // Rename all occurrences of 'print' to 'cprint'
</pre>
</blockquote>
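<p>
The pattern given to <tt>%rename</tt> can also be narrowed.  As a hedged
sketch (the class <tt>Foo</tt> and the new names below are made up for
illustration, and the three lines are alternatives rather than one
interface file), a rename can be tied to one class or to one overloaded
signature instead of to every declaration with that name:

<blockquote>
<pre>
%rename(cprint) print;                // Any declaration named 'print'
%rename(cprint) Foo::print;           // Only 'print' inside a hypothetical class Foo
%rename(print_int) Foo::print(int);   // Only the overload of Foo::print taking an int
</pre>
</blockquote>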

A more advanced form of declaration matching would be exception handling.
For example:

<blockquote>
<pre>
%exception Foo::getitem(int) {
    try {
        $action
    } catch (std::out_of_range&amp; e) {
        SWIG_exception(SWIG_IndexError,const_cast&lt;char*&gt;(e.what()));
    }
}

...
template&lt;class T&gt; class Foo {
public:
     ...
     T &amp;getitem(int index);    // Exception handling code attached
     ...
};
</pre>
</blockquote>

Like typemaps, declaration matching does not break from C++ syntax.
Instead, a user merely specifies special processing rules in advance.
These rules are then attached to any matching C++
declaration that appears later in the input.  This means that raw C++
header files can often be parsed and customized with few, if any,
modifications.
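<p>
Rules can also be written without naming any declaration at all.  The
following is a minimal sketch, assuming a C++ library that reports errors
with exceptions derived from <tt>std::exception</tt>; it relies on SWIG's
<tt>exception.i</tt> library file, which supplies the
<tt>SWIG_exception</tt> macro.  Because the <tt>%exception</tt>
directive carries no declaration name, the handler serves as the default
for the declarations that follow it:

<blockquote>
<pre>
%include "exception.i"      // Supplies SWIG_exception() and the SWIG_*Error codes

%{
#include &lt;stdexcept&gt;        // Needed by the generated wrapper code
%}

// Default handler: no declaration name, so it is not tied to one function
%exception {
    try {
        $action
    } catch (const std::exception&amp; e) {
        SWIG_exception(SWIG_RuntimeError, const_cast&lt;char*&gt;(e.what()));
    }
}
</pre>
</blockquote>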

<h3>The SWIG difference</h3>

Pattern-based approaches to wrapper code generation are not unique to SWIG.
However, most prior efforts have based their pattern matching engines on simple
regular-expression matching. The key difference between SWIG and these systems
is that SWIG's customization features are fully integrated into the
underlying C++ type system.  This means that SWIG is able to deal with very
complicated types of C/C++ code---especially code that makes heavy use of
<tt>typedef</tt>, namespaces, aliases, class hierarchies, and more.  To
illustrate, consider some code like this:

<blockquote>
<pre>
// A simple SWIG typemap
%typemap(in) int {
    $1 = PyInt_AsLong($input);
}

...
// Some raw C++ code (included later)
namespace X {
  typedef int Integer;

  class _FooImpl {
  public:
      typedef Integer value_type;
  };
  typedef _FooImpl Foo;
}

namespace Y = X;
using Y::Foo;

class Bar : public Foo {
};

void spam(Bar::value_type x);
</pre>
</blockquote>

If you trace your way through this example, you will find that the
<tt>Bar::value_type</tt> argument to function <tt>spam()</tt> is
really an integer.  What's more, if you take a close look at the
SWIG-generated wrappers, you will find that the typemap pattern defined for
<tt>int</tt> is applied to it--in other words, SWIG does exactly the right thing despite
our efforts to make the code confusing.
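<p>
One way to convince yourself of the first claim without tracing the
declarations by hand is to let a C++ compiler check it.  This is only a
sketch and assumes a compiler that supports C++11 (for
<tt>static_assert</tt> and <tt>&lt;type_traits&gt;</tt>); the assertion
is an addition for illustration and is not something SWIG needs:

<blockquote>
<pre>
#include &lt;type_traits&gt;

namespace X {
  typedef int Integer;

  class _FooImpl {
  public:
      typedef Integer value_type;
  };
  typedef _FooImpl Foo;
}

namespace Y = X;
using Y::Foo;

class Bar : public Foo {
};

// Compiles only if the chain of typedefs, aliases, and inheritance ends at int
static_assert(std::is_same&lt;Bar::value_type, int&gt;::value,
              "Bar::value_type should resolve to int");
</pre>
</blockquote>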

<p>
Similarly, declaration annotation is integrated into the type system
and can be used to define properties that span inheritance hierarchies
and more (in fact, there are many similarities between the operation of
SWIG and tools developed for Aspect Oriented Programming).

<h3>What does this mean?</h3>

Pattern-based approaches allow wrapper generation tools to parse C++
declarations and to provide a wide variety of high-level customization
features.  Although this approach is quite different from that found
in a typical IDL, the use of patterns makes it possible to work from
existing header files without having to make many (if any) changes to
those files.  Moreover, when the underlying pattern matching mechanism
is integrated with the C++ type system, it is possible to build
reliable wrappers to real software---even if that software is filled
with namespaces, templates, classes, <tt>typedef</tt> declarations,
pointers, and other bits of nastiness.

<h3>The bottom line</h3>

Not only is it possible to generate extension modules by parsing C++,
it is possible to do so with real software and with a high degree of
reliability.  Don't believe me?  Download SWIG-1.3.14 and try it for
yourself.