/tags/rel-1-3-25/swigweb/article_cpp.ht
Unknown | 275 lines | 225 code | 50 blank | 0 comment | 0 complexity | ac73a5a0e01ee1dbd4005cc687d4999d MD5 | raw file
Possible License(s): LGPL-2.1, Cube, GPL-3.0, 0BSD, GPL-2.0
- Thoughts on the Insanity C++ Parsing
- <h2>Thoughts on the Insanity of C++ Parsing</h2>
- <center>
- <em>
- "Parsing C++ is simply too complex to do correctly." -- Anonymous
- </em>
- </center>
- <p>
- Author: David Beazley (beazley@cs.uchicago.edu)
- <p>
- August 12, 2002
- <p>
- A central goal of the SWIG project is to generate extension modules by
- parsing the contents of C++ header files. It's not too hard to come up
- with reasons why this might be useful---after all, if you've got
- several hundred class definitions, do you really want to go off and
- write a bunch of hand-crafted wrappers? No, of course not---you're
- busy and like everyone else, you've got better things to do with
- your time.
- <p>
- Okay, so there are many reasons why parsing C++ would be nice.
- However, parsing C++ is also a nightmare. In fact, C++ would
- probably the last language that any normal person would choose to
- serve as an interface specification language. It's hard to parse,
- hard to analyze, and it involves all sorts
- of nasty little problems related to scoping, typenames, templates,
- access, and so forth. Because of this, most of the tools that claim
- to "parse" C++ don't. Instead, they parse a subset of the language
- that happens to match the C++ programming style used by the tool's
- creator (believe me, I know---this is how SWIG started). Not
- surprisingly, these tools tend to break down when presented with code
- that starts to challenge the capabilities of the C++ compiler.
- Needless to say, critics see this as opportunity to make bold claims
- such as "writing a C++ parser is folly" or "this whole approach is too
- hard to ever work correctly."
- <p>
- Well, one does have to give the critics a little credit---writing a
- C++ parser certainly <em>is</em> hard and writing a parser that
- actually works correctly is even harder. However, these tasks are
- certainly not "impossible." After all, there would be no working C++
- compiler if such claims were true! Therefore, the question of whether
- or not a wrapper generator can parse C++ is clearly the wrong question
- to ask. Instead, the real question is whether or not a wrapper
- generation tool that parses C++ can actually do anything useful.
- <h3>The problem with using C++ as an interface definition language</h3>
- If you cut through all of the low-level details of parsing, the primary
- problem of using C++ as an module specification language is that of
- ambiguity. Consider a declaration like this:
- <blockquote>
- <pre>
- void foo(double *x, int n);
- </pre>
- </blockquote>
- If you look at this declaration, you can ask yourself the question,
- what is "x"? Is it a single input value? Is it an output value
- (modified by the function)? Is it an array? Is "n" somehow related?
- Perhaps the real problem in this example is that of expressing the
- programmer's intent. Yes, the function clearly accepts a pointer to
- some object and an integer, but the declaration does not contain
- enough additional information to determine the purpose of these
- parameters--information that could be useful in generating a suitable
- set of a wrappers.
- <p>
- IDL compilers associated with popular component frameworks (e.g.,
- CORBA, COM, etc.) get around this problem by requiring interfaces to
- be precisely specified--input and output values are clearly indicated
- as such. Thus, one might adopt a similar approach and extend C++
- syntax with some special modifiers or qualifiers. For example:
- <blockquote>
- <pre>
- void foo(%output double *x, int n);
- </pre>
- </blockquote>
- The problem with this approach is that it breaks from C++ syntax and
- it requires the user to annotate their input files (a task that C++
- wrapper generators are supposed to eliminate). Meanwhile, critics sit
- back and say "Ha! I told you C++ parsing would never work."
- <p>
- Another problem with using C++ as an input language is that interface
- building often involves more than just blindly wrapping declarations. For instance,
- users might want to rename declarations, specify exception handling procedures,
- add customized code, and so forth. This suggests that a
- wrapper generator really needs to do
- more than just parse C++---it must give users the freedom to customize
- various aspects of the wrapper generation process. Again, things aren't
- looking too good for C++.
- <h3>The SWIG approach: pattern matching</h3>
- SWIG takes a different approach to the C++ wrapping problem.
- Instead of trying to modify C++ with all sorts of little modifiers and
- add-ons, wrapping is largely controlled by a pattern matching mechanism that is
- built into the underlying C++ type system.
- <p>
- One part of the pattern matcher is programmed to look for specific sequences of
- datatypes and argument names. These patterns, known as typemaps, are
- responsible for all aspects of data conversion. They work by simply attaching
- bits of C conversion code to specific datatypes and argument names in the
- input file. For example, a typemap might be used like this:
- <blockquote>
- <pre>
- %typemap(in) <b>double *items</b> {
- // Get an array from the input
- ...
- }
- ...
- void foo(<b>double *items</b>, int n);
- </pre>
- </blockquote>
- With this approach, type and argument names are used as
- a basis for specifying customized wrapping behavior. For example, if a program
- always used an argument of <tt>double *items</tt> to refer to an
- array, SWIG can latch onto that and use it to provide customized
- processing. It is even possible to write pattern matching rules for
- sequences of arguments. For example, you could write the following:
- <blockquote>
- <pre>
- %typemap(in) (<b>double *items, int n</b>) {
- // Get an array of items. Set n to number of items
- ...
- }
- ...
- void foo(<b>double *items, int n</b>);
- </pre>
- </blockquote>
- The precise details of typemaps are not so important (in fact, most of
- this pattern matching is hidden from SWIG users). What is important
- is that pattern matching allows customized data handling to be
- specified without breaking C++ syntax--instead, a user merely has to
- define a few patterns that get applied across the declarations that
- appear in C++ header files. In some sense, you might view this
- approach as providing customization through naming conventions rather than
- having to annotate arguments with extra qualifiers.
- <p>
- The other pattern matching mechanism used by SWIG is a declaration annotator
- that is used to attach properties to specific declarations. A simple example of declaration
- annotation might be renaming. For example:
- <blockquote>
- <pre>
- %rename(cprint) print; // Rename all occurrences of 'print' to 'cprint'
- </pre>
- </blockquote>
- A more advanced form of declaration matching would be exception handling.
- For example:
- <blockquote>
- <pre>
- %exception Foo::getitem(int) {
- try {
- $action
- } catch (std::out_of_range& e) {
- SWIG_exception(SWIG_IndexError,const_cast<char*>(e.what()));
- }
- }
- ...
- template<class T> class Foo {
- public:
- ...
- T &getitem(int index); // Exception handling code attached
- ...
- };
- </pre>
- </blockquote>
- Like typemaps, declaration matching does not break from C++ syntax.
- Instead, a user merely specifies special processing rules in advance.
- These rules are then attached to any matching C++
- declaration that appears later in the input. This means that raw C++
- header files can often be parsed and customized with few, if any,
- modifications.
- <h3>The SWIG difference</h3>
- Pattern based approaches to wrapper code generation are not unique to SWIG.
- However, most prior efforts have based their pattern matching engines on simple
- regular-expression matching. The key difference between SWIG and these systems
- is that SWIG's customization features are fully integrated into the
- underlying C++ type system. This means that SWIG is able to deal with very
- complicated types of C/C++ code---especially code that makes heavy use of
- <tt>typedef</tt>, namespaces, aliases, class hierarchies, and more. To
- illustrate, consider some code like this:
- <blockquote>
- <pre>
- // A simple SWIG typemap
- %typemap(in) int {
- $1 = PyInt_AsLong($input);
- }
- ...
- // Some raw C++ code (included later)
- namespace X {
- typedef int Integer;
- class _FooImpl {
- public:
- typedef Integer value_type;
- };
- typedef _FooImpl Foo;
- }
- namespace Y = X;
- using Y::Foo;
- class Bar : public Foo {
- };
- void spam(Bar::value_type x);
- </pre>
- </blockquote>
- If you trace your way through this example, you will find that the
- <tt>Bar::value_type</tt> argument to function <tt>spam()</tt> is
- really an integer. What's more, if you take a close look at the SWIG
- generated wrappers, you will find that the typemap pattern defined for
- <tt>int</tt> is applied to it--in other words, SWIG does exactly the right thing despite
- our efforts to make the code confusing.
- <p>
- Similarly, declaration annotation is integrated into the type system
- and can be used to define properties that span inheritance hierarchies
- and more (in fact, there are many similarities between the operation of
- SWIG and tools developed for Aspect Oriented Programming).
- <h3>What does this mean?</h3>
- Pattern-based approaches allow wrapper generation tools to parse C++
- declarations and to provide a wide variety of high-level customization
- features. Although this approach is quite different than that found
- in a typical IDL, the use of patterns makes it possible to work from
- existing header files without having to make many (if any) changes to
- those files. Moreover, when the underlying pattern matching mechanism
- is integrated with the C++ type system, it is possible to build
- reliable wrappers to real software---even if that software is filled
- with namespaces, templates, classes, <tt>typedef</tt> declarations,
- pointers, and other bits of nastiness.
- <h3>The bottom line</h3>
- Not only is it possible to generate extension modules by parsing C++,
- it is possible to do so with real software and with a high degree of
- reliability. Don't believe me? Download SWIG-1.3.14 and try it for
- yourself.