%template for producing IEEE-format articles using LaTeX.
%written by Matthew Ward, CS Department, Worcester Polytechnic Institute.
%use at your own risk.  Complaints to /dev/null.
%make two column with no page numbering, default is 10 point
%\documentstyle{article}
\documentstyle[twocolumn,times]{article}
\pagestyle{empty}

%set dimensions of columns, gap between columns, and space between paragraphs
%\setlength{\textheight}{8.75in}
\setlength{\textheight}{9.0in}
\setlength{\columnsep}{0.25in}
\setlength{\textwidth}{6.45in}
\setlength{\footheight}{0.0in}
\setlength{\topmargin}{0.0in}
\setlength{\oddsidemargin}{0in}
%\setlength{\oddsidemargin}{-.065in}
%\setlength{\oddsidemargin}{-.17in}
%\setlength{\parindent}{0pc}

%I copied stuff out of art10.sty and modified them to conform to IEEE format

\makeatletter
%as Latex considers descenders in its calculation of interline spacing,
%to get 12 point spacing for normalsize text, must set it to 10 points
\def\@normalsize{\@setsize\normalsize{12pt}\xpt\@xpt
\abovedisplayskip 10pt plus2pt minus5pt\belowdisplayskip \abovedisplayskip
\abovedisplayshortskip \z@ plus3pt\belowdisplayshortskip 6pt plus3pt
minus3pt\let\@listi\@listI}

%need an 11 pt font size for subsection and abstract headings
\def\subsize{\@setsize\subsize{12pt}\xipt\@xipt}

%make section titles bold and 12 point, 2 blank lines before, 1 after
\def\section{\@startsection {section}{1}{\z@}{24pt plus 2pt minus 2pt}
{12pt plus 2pt minus 2pt}{\large\bf}}

%make subsection titles bold and 11 point, 1 blank line before, 1 after
\def\subsection{\@startsection {subsection}{2}{\z@}{12pt plus 2pt minus 2pt}
{12pt plus 2pt minus 2pt}{\subsize\bf}}
\makeatother

\newcommand{\ignore}[1]{}
%\renewcommand{\thesubsection}{\arabic{subsection}.}

\begin{document}

%don't want date printed
\date{}

%make title bold and 14 pt font (Latex default is non-bold, 16 pt)
\title{\Large \bf   An Embedded Error Recovery and Debugging Mechanism for Scripting Language Extensions}

%for single author (just remove % characters)
\author{{David M.\ Beazley} \\
{\em Department of Computer Science} \\
{\em University of Chicago }\\
{\em Chicago, Illinois 60637 }\\
{\em beazley@cs.uchicago.edu }}

%  My Department \\
%  My Institute \\
%  My City, ST, zip}

%for two authors (this is what is printed)
%\author{\begin{tabular}[t]{c@{\extracolsep{8em}}c}
%  Roscoe Giles                          & Pablo Tamayo \\
% \\
%  Department of Electrical, Computer,   & Thinking Machines Corp. \\
%  and Systems Engineering               & Cambridge, MA~~02142.  \\
%  and                                   & \\
%  Center for Computational Science      & \\
%  Boston University, Boston, MA~~02215. &
%\end{tabular}}

\maketitle

%I don't know why I have to reset thispagesyle, but otherwise get page numbers
\thispagestyle{empty}


\subsection*{Abstract}
{\em
In recent years, scripting languages such as Perl, Python, and Tcl
have become popular development tools for the creation of
sophisticated application software.  One of the most useful features
of these languages is their ability to easily interact with compiled
languages such as C and C++.  Although this mixed language approach
has many benefits, one of the greatest drawbacks is the complexity of
debugging that results from using interpreted and compiled code in the
same application.  In part, this is due to the fact that scripting
language interpreters are unable to recover from catastrophic errors
in compiled extension code. Moreover, traditional C/C++ debuggers
do not provide a satisfactory degree of integration with interpreted
languages.  This paper describes an experimental system in which fatal
extension errors such as segmentation faults, bus errors, and failed
assertions are handled as scripting language exceptions.  This system,
which has been implemented as a general purpose shared library,
requires no modifications to the target scripting language, introduces
no performance penalty, and simplifies the debugging of mixed
interpreted-compiled application software.
}

\section{Introduction}

Slightly more than ten years have passed since John Ousterhout
introduced the Tcl scripting language at the 1990 USENIX technical
conference \cite{ousterhout}.  Since then, scripting languages have
been gaining in popularity as evidenced by the wide-spread use of
systems such as Tcl, Perl, Python, Guile, PHP, and Ruby
\cite{ousterhout,perl,python,guile,php,ruby}.

In part, the success of modern scripting languages is due to their
ability to be easily integrated with software written in compiled
languages such as C, C++, and Fortran.  In addition, a wide variety of wrapper
generation tools can be used
to automatically produce bindings between existing code and a
variety of scripting language environments
\cite{swig,sip,pyfort,f2py,advperl,heidrich,vtk,gwrap,wrappy}.  As a result, a large number of
programmers are now using scripting languages to control
complex C/C++ programs or as a tool for re-engineering legacy
software.  This approach is attractive because it allows programmers
to benefit from the flexibility and rapid development of
scripting while retaining the best features of compiled code such as high
performance \cite{ouster1}.

A critical aspect of scripting-compiled code integration is the way in
which it departs from traditional C/C++ development and shell
scripting.  Rather than building stand-alone applications that run as
separate processes, extension programming encourages a style of
programming in which components are tightly integrated within
an interpreter that is responsible for high-level control.
Because of this, scripted software tends to rely heavily on
third-party extensions. In this sense, one might argue that the
benefits of scripting are achieved at the expense of creating a
more complicated development environment.

A consequence of this complexity is an increased degree of difficulty
associated with debugging programs that utilize multiple languages,
dynamically loadable modules, and a sophisticated runtime environment.
To address this problem, this paper describes an experimental system
known as WAD (Wrapped Application Debugger) in which an embedded error
reporting and debugging mechanism is added to common scripting
languages.  This system converts catastrophic signals such as
segmentation faults and failed assertions to exceptions that can be
handled by the scripting language interpreter.  In doing so, it
provides more seamless integration between error handling in
scripting language interpreters and compiled extensions.

\section{The Debugging Problem}

Normally, a programming error in a scripted application
results in an exception that describes the problem and the context in
which it occurred.  For example, an error in a Python script might
produce a traceback similar to the following:

\begin{verbatim}
% python foo.py
Traceback (innermost last):
  File "foo.py", line 11, in ?
    foo()
  File "foo.py", line 8, in foo
    bar()
  File "foo.py", line 5, in bar
    spam()
  File "foo.py", line 2, in spam
    doh()
NameError: doh
\end{verbatim}

In this case, a programmer might be able to apply a fix simply based
on information in the traceback.  Alternatively, if the problem is
more complicated, a script-level debugger can be used to provide more
information.  In contrast, a failure in compiled extension code might
produce the following result:

\begin{verbatim}
% python foo.py
Segmentation Fault (core dumped)
\end{verbatim}

In this case, the user has no idea of what has happened other than that it
appears to be ``very bad.''  Furthermore, script-level debuggers are
unable to identify the problem since they also crash when the error
occurs (they run in the same process as the interpreter).  This means
that the only way for a user to narrow the source of the problem
within a script is through trial-and-error techniques such as
inserting print statements, commenting out sections of scripts, or
having a deep intuition of the underlying implementation. Obviously,
none of these techniques are particularly elegant.

An alternative approach is to run the application under the control of
a traditional debugger such as gdb \cite{gdb}.  Although this provides
some information about the error, the debugger mostly provides
detailed information about the internal implementation of the
scripting language interpreter instead of the script-level code that
was running at the time of the error.  Needless to say, this information
isn't very useful to most programmers.
A related problem is that
the structure of a scripted application tends to be much more complex
than a traditional stand-alone program.  As a result, a user may not
have a good sense of how to actually attach an external debugger to their
script.  In addition, execution may occur within a
complex run-time environment involving events, threads, and network
connections.  Because of this, it can be difficult for the user to reproduce
and identify certain types of catastrophic errors if they depend on
timing or unusual event sequences. Finally, this approach
requires a programmer to have a C development environment installed on
their machine.  Unfortunately, this may not hold in practice.
This is because scripting languages are often used to provide programmability to
applications where end-users write scripts, but do not write low-level C code.

Even if a traditional debugger such as gdb were modified to provide
better integration with scripting languages, it is not clear that this
would be the most natural solution to the problem.  For one,
having to run a separate debugging process to debug
extension code is unnatural when no such requirement exists for
scripts.  Moreover, even if such a debugger existed, an
inexperienced user may not have the expertise or inclination to use
it.  Finally, obscure fatal errors may occur long after an application
has been deployed.  Unless the debugger is distributed along with the
application in some manner, it will be extraordinarily difficult to
obtain useful diagnostics when such errors occur.

\begin{figure*}[t]
{\small
\begin{verbatim}
% python foo.py
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "foo.py", line 16, in ?
    foo()
  File "foo.py", line 13, in foo
    bar()
  File "foo.py", line 10, in bar
    spam()
  File "foo.py", line 7, in spam
    doh.doh(a,b,c)

SegFault: [ C stack trace ]

#2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0) in 'ceval.c',line 2650
#1 0xff083544 in _wrap_doh(self=0x0,args=0x1a1ccc) in 'foo_wrap.c',line 745
#0 0xfe7e0568 in doh(a=3,b=4,c=0x0) in 'foo.c',line 28


    int doh(int a, int b, int *c) {
 =>   *c = a + b;
      return *c;
    }
\end{verbatim}
}
\caption{Cross language traceback generated by WAD for a segmentation fault in a Python extension}
\end{figure*}

The current state of the art in extension debugging is to simply add
as much error checking as possible to extension modules. This is never
a bad thing to do, but in practice it's usually not enough to
eliminate every possible problem.  For one, scripting languages are
sometimes used to control hundreds of thousands to millions of lines
of compiled code.  In this case, it is improbable that a programmer will
foresee every conceivable error.  In addition, scripting languages are
often used to put new user interfaces on legacy software. In this
case, scripting may introduce new modes of execution that cause a
formerly ``bug-free'' application to fail in an unexpected manner.
Finally, certain types of errors such as floating-point exceptions can
be particularly difficult to eliminate because they might be generated
algorithmically (e.g., as the result of instability in a numerical
method). Therefore, even if a programmer has worked hard to eliminate
crashes, there is usually a small probability that an application may
fail under unusual circumstances.

\section{Embedded Error Reporting}

Rather than modifying an existing debugger to support scripting
languages, an alternative approach is to add a more powerful error
handling and reporting mechanism to the scripting language
interpreter.  We have implemented this approach in the form of an
experimental system, WAD, packaged as a shared library that can be loaded as a scripting
language extension module or linked to existing extension modules as a
library.  The core of the system is generic and requires no
modifications to the scripting interpreter or existing extension
modules.  Furthermore, the system does not introduce a performance
penalty as it does not rely upon program instrumentation or tracing.

WAD works by converting fatal signals such as SIGSEGV,
SIGBUS, SIGFPE, and SIGABRT into scripting language exceptions that contain
debugging information collected from the call-stack of compiled
extension code.  By handling errors in this manner, the scripting
language interpreter is able to produce a cross-language stack trace that
contains information from both the script code and extension code as
shown for Python and Tcl/Tk in Figures 1 and 2.  In this case, the user
is given a very clear idea of what has happened without having
to launch a separate debugger.

The advantage of this approach is that it provides more seamless
integration between error handling in scripts and error handling in
extensions.  In addition, it eliminates the most common debugging step
that a developer is likely to perform in the event of a fatal
error--running a separate debugger on a core file and typing 'where'
to get a stack trace.  Finally, this allows end-users to provide
extension writers with useful debugging information since they can
supply a stack trace as opposed to a vague complaint that the program
``crashed.''

\begin{figure*}[t]
\begin{picture}(400,250)(0,0)
\put(50,-110){\special{psfile = tcl.ps hscale = 60 vscale = 60}}
\end{picture}
\caption{Dialog box with WAD generated traceback information for a failed assertion in a Tcl/Tk extension}
\end{figure*}

\section{Scripting Language Internals}

In order to provide embedded error recovery, it is critical to understand how
scripting language interpreters interface with extension code.  Despite the wide variety
of scripting languages, essentially every implementation uses a similar
technique for accessing foreign code.

Virtually all scripting languages provide an extension mechanism in the form of a foreign function
interface in which compiled procedures can be called from the scripting language
interpreter. This is accomplished by writing a collection of wrapper functions that conform
to a specified calling convention. The primary purpose of the wrappers is to
marshal arguments and return values between the two languages and to handle errors.
For example, in Tcl, every wrapper
function must conform to the following prototype:

\begin{verbatim}
int
wrap_foo(ClientData clientData,
         Tcl_Interp *interp,
         int objc,
         Tcl_Obj *CONST objv[])
{
    /* Convert arguments */
    ...
    /* Call a function */

    result = foo(args);
    /* Set result */
    ...
    if (success) {
        return TCL_OK;
    } else {
        return TCL_ERROR;
    }
}
\end{verbatim}

Another common extension mechanism is an object/type interface that allows programmers to create new
kinds of fundamental types or attach special properties to objects in
the interpreter.  For example, both Tcl and Python provide an API for creating new
``built-in'' objects that behave like numbers, strings, lists, etc.
In most cases, this involves setting up tables of function
pointers that define various properties of an object.  For example, if
you wanted to add complex numbers to an interpreter, you might fill in a special
data structure with pointers to methods that implement various numerical operations like this:

\begin{verbatim}
NumberMethods ComplexMethods = {
    complex_add,
    complex_sub,
    complex_mul,
    complex_div,
    ...
};
\end{verbatim}

\noindent
Once registered with the interpreter, the methods in this structure
would be invoked by various interpreter operators such as $+$,
$-$, $*$, and $/$.

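As an illustration of the function-pointer table described above, the following self-contained C sketch defines such a table for complex numbers together with a small dispatcher that plays the role of the interpreter.  The names {\tt Complex}, {\tt BinaryOp}, and {\tt apply\_operator}, as well as the layout of {\tt NumberMethods}, are assumptions made for this example and do not correspond to the API of any particular interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value type; a real interpreter would use its own object type. */
typedef struct { double re, im; } Complex;

typedef Complex (*BinaryOp)(Complex, Complex);

/* Hypothetical method table: one function pointer per numeric operator. */
typedef struct {
    BinaryOp add;
    BinaryOp sub;
    BinaryOp mul;
    BinaryOp div;
} NumberMethods;

static Complex complex_add(Complex a, Complex b) {
    Complex r = { a.re + b.re, a.im + b.im };
    return r;
}

static Complex complex_sub(Complex a, Complex b) {
    Complex r = { a.re - b.re, a.im - b.im };
    return r;
}

static Complex complex_mul(Complex a, Complex b) {
    Complex r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

static Complex complex_div(Complex a, Complex b) {
    double d = b.re * b.re + b.im * b.im;
    Complex r = { (a.re * b.re + a.im * b.im) / d,
                  (a.im * b.re - a.re * b.im) / d };
    return r;
}

static const NumberMethods ComplexMethods = {
    complex_add, complex_sub, complex_mul, complex_div
};

/* Stand-in for the interpreter: map an operator character to a table slot. */
Complex apply_operator(const NumberMethods *m, char op, Complex a, Complex b) {
    assert(m != NULL);
    switch (op) {
    case '+': return m->add(a, b);
    case '-': return m->sub(a, b);
    case '*': return m->mul(a, b);
    case '/': return m->div(a, b);
    default:
        assert(!"unknown operator");
        return a;
    }
}
```

A call such as {\tt apply\_operator(\&ComplexMethods, '+', a, b)} then reaches {\tt complex\_add} only through the table, which is the indirection that lets an interpreter treat extension-defined types like built-in ones.
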
Most interpreters handle errors as a two-step process in which
detailed error information is first registered with the interpreter
and then a special error code is returned. For example, in Tcl, errors
are handled by setting error information in the interpreter and
returning a value of TCL\_ERROR.  Similarly in Python, errors are
handled by calling a special function to raise an exception and returning NULL.  In both cases,
this triggers the interpreter's error handler---possibly resulting in
a stack trace of the running script.  In some cases, an interpreter
might handle errors using a form of the C {\tt longjmp} function.
For example, Perl provides a special function {\tt die} that jumps back
to the interpreter with a fatal error \cite{advperl}.

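The two-step convention described above can be sketched in plain C as follows.  To keep the example self-contained it uses a toy interpreter type rather than the real Tcl or Python headers; {\tt ToyInterp}, {\tt toy\_set\_error}, {\tt wrap\_divide}, and the {\tt TOY\_OK}/{\tt TOY\_ERROR} codes are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, playing the role of TCL_OK / TCL_ERROR. */
enum { TOY_OK = 0, TOY_ERROR = 1 };

/* Hypothetical interpreter state: just a slot for the pending error message. */
typedef struct {
    char errmsg[128];
} ToyInterp;

/* Step 1: register detailed error information with the interpreter. */
static void toy_set_error(ToyInterp *interp, const char *msg) {
    strncpy(interp->errmsg, msg, sizeof interp->errmsg - 1);
    interp->errmsg[sizeof interp->errmsg - 1] = '\0';
}

/* A wrapper function that follows the two-step convention. */
int wrap_divide(ToyInterp *interp, int a, int b, int *result) {
    assert(interp != NULL && result != NULL);
    if (b == 0) {
        toy_set_error(interp, "division by zero");
        return TOY_ERROR;            /* Step 2: return the error code */
    }
    *result = a / b;
    return TOY_OK;
}
```

The interpreter only inspects the stored message when it sees the error code, so a wrapper that fails without ever returning (for example, because of a segmentation fault) never gets the chance to report anything.
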
The precise implementation details of these mechanisms aren't so
important for our discussion.  The critical point is that scripting
languages always access extension code through a well-defined interface
that precisely defines how arguments are to be passed, values are to be
returned, and errors are to be handled.

\section{Scripting Languages and Signals}

Under normal circumstances, errors in extension code are handled
through the error-handling API provided by the scripting language
interpreter.  For example, if an invalid function parameter is passed,
a program can simply set an error message and return to the
interpreter.  Similarly, automatic wrapper generators such as SWIG can produce
code to convert C++ exceptions and other C-related error handling
schemes to scripting language errors \cite{swigexcept}. On the other
hand, segmentation faults, failed assertions, and similar problems
produce signals that cause the interpreter to abort execution.

Most scripting languages provide limited support for Unix signal
handling \cite{stevens}.  However, this support is not sufficiently advanced to
recover from fatal signals produced by extension code.
Unlike signals generated for asynchronous events such as I/O,
execution can {\em not} be resumed at the point of a fatal signal.
Therefore, even if such a signal could be caught and handled by a script,
there isn't much that it can do except to print a diagnostic
message and abort before the signal handler returns.  In addition,
some interpreters block signal delivery while executing
extension code--opting to handle signals at a time when it is more convenient.
In this case, a signal such as SIGSEGV would simply cause the whole application
to freeze since there is no way for execution to continue to a point where
the signal could be delivered.  Thus, scripting languages tend to
either ignore the problem or label it as a ``limitation.''


WAD installs a signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL,
and SIGFPE using the {\tt sigaction} function
\cite{stevens}. Furthermore, it uses a special option (SA\_SIGINFO) of
signal handling that passes process context information to the signal
handler when a signal occurs. Since none of these signals are normally used in the
implementation of the scripting interpreter or by user scripts,
this does not usually override any previous signal handling.
Afterwards, when one of these signals occurs, a two-phase recovery
process executes. First, information is collected about the execution
context including a full stack-trace, symbol table entries, and
debugging information.  Then, the current stream of execution is
aborted and an error is returned to the interpreter.  This process is
illustrated in Figure~3.

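A minimal sketch of this kind of handler installation is shown below.  It assumes a POSIX system and is not WAD's actual code.  In particular, it recovers with {\tt sigsetjmp}/{\tt siglongjmp} purely so that the example can run on its own, whereas WAD returns to the interpreter without a matching {\tt setjmp}.  The names {\tt fatal\_handler}, {\tt install\_fatal\_handlers}, {\tt run\_guarded}, and {\tt failing\_extension} are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;            /* stand-in recovery location */
static volatile sig_atomic_t last_signal;   /* signal seen by the handler */

/* Three-argument handler form selected by SA_SIGINFO. */
static void fatal_handler(int signo, siginfo_t *info, void *context) {
    (void)info;      /* info->si_addr is the faulting address for SIGSEGV/SIGBUS */
    (void)context;   /* points to a ucontext_t holding the saved CPU state */
    last_signal = signo;
    siglongjmp(recover_point, 1);   /* assumption: not how WAD itself returns */
}

/* Install the handler for the fatal signals named in the text. */
int install_fatal_handlers(void) {
    static const int signals[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}

/* Run fn(); return 0 if it completed, or the signal number if it failed. */
int run_guarded(void (*fn)(void)) {
    assert(fn != NULL);
    if (sigsetjmp(recover_point, 1) != 0)
        return (int)last_signal;
    fn();
    return 0;
}

/* Stand-in for extension code that fails an assertion. */
void failing_extension(void) {
    raise(SIGABRT);
}
```

After {\tt install\_fatal\_handlers()} succeeds, {\tt run\_guarded(failing\_extension)} reports the signal number instead of terminating the process.  Jumping out of a handler for a genuine memory fault is not safe in general, which is one reason the context and stack inspection described in the text is needed.
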
The collection of context and debugging information involves the
following steps:

\begin{itemize}
\item The program counter and stack pointer are obtained from
context information passed to the signal handler.

\item The virtual memory map of the process is obtained from /proc
and used to associate virtual memory addresses with executable files,
shared libraries, and dynamically loaded extension modules \cite{proc}.

\item The call stack is unwound to collect traceback information.
At each step of the stack traceback, symbol table and debugging
information is gathered and stored in a generic data structure for later use
in the recovery process.  This data is obtained by memory-mapping
the object files associated with the process and extracting
symbol table and debugging information.
\end{itemize}

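The second step above, mapping an address to the file that backs it, can be sketched as follows.  The sketch assumes the Linux text format of {\tt /proc/self/maps}; other systems expose /proc differently, and this is not WAD's own code.  The {\tt Mapping} type and the {\tt find\_mapping} function are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One line of /proc/self/maps, reduced to the fields used here. */
typedef struct {
    unsigned long start;   /* first address of the region */
    unsigned long end;     /* one past the last address */
    char path[256];        /* backing file, or "" for anonymous memory */
} Mapping;

/*
 * Look up the region that contains addr.
 * Returns 1 and fills *out on success, 0 if no region matches,
 * and -1 if /proc/self/maps cannot be opened.
 */
int find_mapping(unsigned long addr, Mapping *out) {
    char line[1024];
    FILE *fp;

    assert(out != NULL);
    fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        unsigned long start, end;
        char path[256] = "";

        /* Line format: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s",
                   &start, &end, path) < 2)
            continue;
        if (addr >= start && addr < end) {
            out->start = start;
            out->end = end;
            strcpy(out->path, path);
            fclose(fp);
            return 1;
        }
    }
    fclose(fp);
    return 0;
}
```

Given a program counter taken from a stack frame, the returned {\tt path} names the executable or shared library whose symbol table and debugging information should then be consulted.
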
Once debugging information has been collected, the signal handler
enters an error-recovery phase that
attempts to raise a scripting exception and return to a suitable location in the
interpreter.  To do this, the following steps are performed:

\begin{itemize}

\item The stack trace is examined to see if there are any locations in the interpreter
to which control can be returned.

\item If a suitable return location is found, the CPU context is modified in
such a way that the signal handler returns to the interpreter
with an error.  This return process is assisted by a small
trampoline function (partially written in assembly language) that arranges a proper
return to the interpreter after the signal handler returns.
\end{itemize}

\noindent
Of the two phases, the first is the more straightforward to implement
because it involves standard Unix API functions and common file formats such
as ELF and stabs \cite{elf,stabs}.   On the other hand, the recovery phase in
which control is returned to the interpreter is of greater interest.  Therefore,
it is now described in greater detail.

\begin{figure*}[t]
\begin{picture}(480,340)(5,60)

\put(50,330){\framebox(200,70){}}
\put(60,388){\small \tt >>> {\bf foo()}}
\put(60,376){\small \tt Traceback (most recent call last):}
\put(70,364){\small \tt   File "<stdin>", line 1, in ?}
\put(60,352){\small \tt SegFault: [ C stack trace ]}
\put(60,340){\small \tt ...}

\put(55,392){\line(-1,0){25}}
\put(30,392){\line(0,-1){80}}
\put(30,312){\line(1,0){95}}
\put(125,312){\vector(0,-1){10}}
\put(175,302){\line(0,1){10}}
\put(175,312){\line(1,0){95}}
\put(270,312){\line(0,1){65}}
\put(270,377){\vector(-1,0){30}}

\put(50,285){\framebox(200,15)[c]{[Python internals]}}
\put(125,285){\vector(0,-1){10}}
\put(175,275){\vector(0,1){10}}
\put(50,260){\framebox(200,15)[c]{call\_builtin()}}
\put(125,260){\vector(0,-1){10}}
%\put(175,250){\vector(0,1){10}}
\put(50,235){\framebox(200,15)[c]{wrap\_foo()}}
\put(125,235){\vector(0,-1){10}}
\put(50,210){\framebox(200,15)[c]{foo()}}
\put(125,210){\vector(0,-1){10}}
\put(50,185){\framebox(200,15)[c]{doh()}}
\put(125,185){\vector(0,-1){20}}
\put(110,148){SIGSEGV}
\put(160,152){\vector(1,0){100}}
\put(260,70){\framebox(200,100){}}
\put(265,140){1. Unwind C stack}
\put(265,125){2. Gather symbols and debugging info}
\put(265,110){3. Find safe return location}
\put(265,95){4. Raise Python exception}
\put(265,80){5. Modify CPU context and return}

\put(260,185){\framebox(200,15)[c]{return assist}}
\put(365,174){Return from signal}
\put(360,170){\vector(0,1){15}}
\put(360,200){\line(0,1){65}}

%\put(360,70){\line(0,-1){10}}
%\put(360,60){\line(1,0){110}}
%\put(470,60){\line(0,1){130}}
%\put(470,190){\vector(-1,0){10}}

\put(360,265){\vector(-1,0){105}}
\put(255,250){NULL}

\end{picture}

\caption{Control Flow of the Error Recovery Mechanism for Python}
\end{figure*}

\section{Returning to the Interpreter}

To return to the interpreter, WAD maintains a table of symbolic names
that correspond to locations within the interpreter
responsible for invoking wrapper functions and object/type methods.
For example, Table 1 shows a partial list of return locations used in
the Python implementation.  When an error occurs, the call stack is
scanned for the first occurrence of any symbol in this table.  If a
match is found, control is returned to that location by emulating the
return of a wrapper function with the error code from the table. If no
match is found, the error handler simply prints a stack trace to
standard output and aborts.

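The table lookup described above can be sketched in a few lines of C.  The sketch assumes that the call stack has already been reduced to an array of symbol names with the innermost frame first, and that scanning proceeds outward from the faulting frame; both are assumptions of this example.  The {\tt ReturnPoint} type and {\tt find\_return\_frame} function are hypothetical names, and only a few rows of Table 1 are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the return-location table. */
typedef struct {
    const char *symbol;       /* interpreter function to return to */
    long        error_value;  /* 0 stands for NULL, -1 for an integer error */
} ReturnPoint;

/* A few rows taken from Table 1. */
static const ReturnPoint python_return_points[] = {
    { "call_builtin",           0 },
    { "PyObject_Print",        -1 },
    { "PyObject_CallFunction",  0 },
    { "PyObject_GetAttrString", 0 },
};

#define N_RETURN_POINTS \
    (sizeof python_return_points / sizeof python_return_points[0])

/*
 * Scan the stack (stack[0] is the innermost frame) for the first frame
 * whose symbol appears in the table.  Returns the frame index and stores
 * the matching row in *match, or returns -1 and stores NULL if none match.
 */
int find_return_frame(const char *const *stack, size_t depth,
                      const ReturnPoint *table, size_t nrows,
                      const ReturnPoint **match) {
    size_t i, j;

    assert(stack != NULL && table != NULL && match != NULL);
    for (i = 0; i < depth; i++) {
        for (j = 0; j < nrows; j++) {
            if (strcmp(stack[i], table[j].symbol) == 0) {
                *match = &table[j];
                return (int)i;
            }
        }
    }
    *match = NULL;
    return -1;
}
```

For the stack in Figure 1 ({\tt doh}, {\tt \_wrap\_doh}, {\tt call\_builtin}), the scan stops at {\tt call\_builtin}, so the handler would emulate a return of NULL to that frame.  A return of $-1$ from {\tt find\_return\_frame} corresponds to the case in which the handler prints a stack trace and aborts.
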
When a symbolic match is found, WAD invokes a special user-defined
handler function that is written for a specific scripting language.
The primary role of this handler is to take debugging information
gathered from the call stack and generate an appropriate scripting
language error.  One peculiar problem of this step is that the
generation of an error may require the use of parameters passed to a
wrapper function.  For example, in the Tcl wrapper shown earlier, one
of the arguments was an object of type ``{\tt Tcl\_Interp *}''.  This
object contains information specific to the state of the interpreter
(and multiple interpreter objects may exist in a single application).
Unfortunately, no reference to the interpreter object is available in the
signal handler nor is a reference to the interpreter guaranteed to exist in
the context of a function that generated the error.

To work around this problem, WAD implements a feature
known as argument stealing.  When examining the call-stack, the signal
handler has full access to all function arguments and local variables of each function
on the stack.
Therefore, if the handler knows that an error was generated while
calling a wrapper function (as determined by looking at the symbol names),
it can grab the interpreter object from the stack frame of the wrapper and
use it to set an appropriate error code before returning to the interpreter.
Currently, this is managed by allowing the signal handler to steal
arguments from the caller using positional information.
For example, to grab the {\tt Tcl\_Interp *} object from a Tcl wrapper function,
code similar to the following is written:

\begin{verbatim}
Tcl_Interp *interp;
int         err;

interp = (Tcl_Interp *)
           stack,
           "TclExecuteByteCode",
           1,
           &err
  );
  ...
if (!err) {
  Tcl_SetResult(interp,errtype,...);
}
\end{verbatim}

In this case, the Tcl interpreter argument passed to a wrapper function
is stolen and used to generate an error.  Also, the name {\tt TclExecuteByteCode}
refers to the calling function, not the wrapper function itself.
At this time, argument stealing is only applicable to simple types
such as integers and pointers.  However, this appears to be adequate for generating
scripting language errors.


\begin{table}[t]
\begin{center}
\begin{tabular}{ll}
Python symbol                 &   Error return value \\ \hline
call\_builtin                 &   NULL \\
PyObject\_Print               &   -1 \\
PyObject\_CallFunction        &   NULL \\
PyObject\_CallMethod          &   NULL \\
PyObject\_CallObject          &   NULL \\
PyObject\_Cmp                 &   -1 \\
PyObject\_DelAttrString       &   -1 \\
PyObject\_DelItem             &   -1 \\
PyObject\_GetAttrString       &   NULL \\
\end{tabular}
\end{center}

\caption{A partial list of symbolic return locations in the Python interpreter}
\label{returnpoints}
\end{table}

\section{Register Management}

A final issue concerning the return mechanism has to do with the
non-local return of control to the interpreter.  Roughly
speaking, this emulates the C {\tt longjmp}
library call.  However, this is done without the use of a matching
{\tt setjmp} in the interpreter.

The primary problem with aborting execution and returning to the
interpreter in this manner is that most compilers use a register
management technique known as callee-save \cite{prag}.  In this case,
it is the responsibility of the called function to save the state of
the registers and to restore them before returning to the caller. By
making a non-local jump, registers may be left in an inconsistent
state due to the fact that they are not restored to their original
values.  The {\tt longjmp} function in the C library avoids this
problem by relying upon {\tt setjmp} to save the registers.  Unfortunately,
WAD does not have this luxury.   As a result, a return from the signal
handler may produce a corrupted set of registers at the point of return
in the interpreter.

The severity of this problem depends greatly on the architecture and
compiler.  For example, on the SPARC, register windows effectively
solve the callee-save problem \cite{sparc}.  In this case, each stack
frame has its own register window and the windows are flushed to the
stack whenever a signal occurs.  Therefore, the recovery mechanism can
simply examine the stack and arrange to restore the registers to their
proper values when control is returned.  Furthermore, certain
conventions of the SPARC ABI resolve several related issues. For
example, floating point registers are caller-saved and the contents of
the SPARC global registers are not guaranteed to be preserved across
procedure calls (in fact, they are not even saved by {\tt setjmp}).

On other platforms, the problem of register management becomes
more interesting.  In this case, a heuristic approach that examines
the machine code for each function on the call stack can be used to
determine where the registers might have been saved.  This approach is
used by gdb and other debuggers when they allow users to inspect
register values within arbitrary stack frames \cite{gdb}.  Even though
this sounds complicated to implement, the algorithm is greatly
simplified by the fact that compilers typically generate code to store
the callee-save registers immediately upon the entry to each function.
In addition, this code is highly regular and easy to examine.  For
instance, on i386-Linux, the callee-save registers can be restored by
simply examining the first few bytes of the machine code for each
function on the call stack to figure out where values have been saved.
The following code shows a typical sequence of machine instructions
used to store callee-save registers on i386-Linux:

\begin{verbatim}
foo:
55       pushl %ebp
89 e5    mov  %esp, %ebp
83 a0    subl  $0xa0,%esp
56       pushl %esi
57       pushl %edi
...
\end{verbatim}

%
% Include an example
%

% more interesting.  One approach is to simply ignore the problem
% altogether and return to the interpreter with the registers in an
% essentially random state.  Surprisingly, this approach actually seems to work (although a considerable degree of
% caution might be in order).
% This is because the return of an error code tends to trigger
% a cascade of procedure returns within the implementation of the interpreter.
% As a result, the values of the registers are simply discarded and
% overwritten with restored values as the interpreter unwinds itself and prepares to handle an
% exception.  A better solution to this problem is to modify the recovery mechanism to discover and
% restore saved registers from the stack.  Unfortunately, there is
% no standardized way to know exactly where the registers might have been saved.
% Therefore, a heuristic scheme that examines the machine code for each procedure would
% have to be used to try and identify stack locations.  This approach is used by gdb
% and other debuggers when they allow users to inspect register values
% within arbitrary stack frames \cite{gdb}.  However, this technique has
% not yet been implemented in WAD due to its obvious implementation difficulty and the
% fact that the WAD prototype has primarily been developed for the SPARC.

As a fall-back, WAD could be configured to return control to a location
previously specified with {\tt setjmp}.  Unfortunately, this either
requires modifications to the interpreter or its extension modules.
Although this kind of instrumentation could be facilitated by automatic
wrapper code generators, it is not a preferred solution and is not discussed further.
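
The prologue heuristic described earlier in this section for i386 can be sketched as a small byte scanner.  This is a simplified illustration rather than the algorithm used by WAD or gdb: it assumes a function that begins with the frame-pointer setup {\tt pushl \%ebp; movl \%esp,\%ebp}, an optional {\tt subl} that reserves locals, and then a run of single-byte register pushes.  The {\tt Prologue} type, the {\tt SAVED\_*} flags, and {\tt scan\_prologue} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Bit flags for the i386 callee-save registers tracked by this sketch. */
enum { SAVED_EBX = 1 << 0, SAVED_ESI = 1 << 1, SAVED_EDI = 1 << 2 };

typedef struct {
    unsigned locals;       /* bytes reserved by "subl $N,%esp" */
    unsigned saved_mask;   /* OR of SAVED_* flags for registers pushed */
    int      offset[3];    /* %ebp-relative slot of ebx, esi, edi (0 if unsaved) */
} Prologue;

/* Returns 0 if a frame-pointer prologue was recognized, -1 otherwise. */
int scan_prologue(const unsigned char *code, size_t len, Prologue *out) {
    size_t i;
    int next_slot;

    assert(code != NULL && out != NULL);
    out->locals = 0;
    out->saved_mask = 0;
    out->offset[0] = out->offset[1] = out->offset[2] = 0;

    /* 55 = pushl %ebp ; 89 e5 = movl %esp,%ebp */
    if (len < 3 || code[0] != 0x55 || code[1] != 0x89 || code[2] != 0xe5)
        return -1;
    i = 3;

    /* Optional stack adjustment: 83 ec ib (8-bit) or 81 ec id (32-bit). */
    if (i + 3 <= len && code[i] == 0x83 && code[i + 1] == 0xec) {
        out->locals = code[i + 2];
        i += 3;
    } else if (i + 6 <= len && code[i] == 0x81 && code[i + 1] == 0xec) {
        out->locals = (unsigned)code[i + 2]
                    | (unsigned)code[i + 3] << 8
                    | (unsigned)code[i + 4] << 16
                    | (unsigned)code[i + 5] << 24;
        i += 6;
    }

    /* Each push stores a register in the next 4-byte slot below the locals. */
    next_slot = -(int)out->locals - 4;
    for (; i < len; i++) {
        int reg;
        switch (code[i]) {
        case 0x53: reg = 0; break;   /* pushl %ebx */
        case 0x56: reg = 1; break;   /* pushl %esi */
        case 0x57: reg = 2; break;   /* pushl %edi */
        default:   return 0;         /* first other opcode ends the prologue */
        }
        out->saved_mask |= 1u << reg;
        out->offset[reg] = next_slot;
        next_slot -= 4;
    }
    return 0;
}
```

With the saved frame pointer of a stack frame in hand, the recorded offsets indicate where the caller's {\tt \%esi} and {\tt \%edi} were stored, so they can be reloaded before control is handed back.  Code compiled without a frame pointer, or with a differently ordered prologue, would need a more careful decoder than this one.
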
715 716\section{Initialization} 717 718To simplify the debugging of extension modules, it 719is desirable to make the use of WAD as transparent as possible. 720Currently, there are two ways in which the system is used. First, WAD 721may be explicitly loaded as a scripting language extension module. 722For instance, in Python, a user can include the statement {\tt import 723libwadpy} in a script to load the debugger. Alternatively, WAD can be 724enabled by linking it to an extension module as a shared 725library. For instance: 726 727\begin{verbatim} 728% ld -shared$(OBJS) -lwadpy
729\end{verbatim}

In this latter case, WAD initializes itself whenever the extension module is
loaded.  The same shared library is used for both situations by making
sure two types of initialization techniques are used.  First, an empty
initialization function is written to make WAD appear like a proper
scripting language extension module (although it adds no functions to
the interpreter).  Second, the real initialization of the system is
placed into the initialization section of the WAD shared library
object file (the ``init'' section of ELF files).  This code always executes
when a library is loaded by the dynamic loader and is commonly used to
properly initialize C++ objects.  Therefore, a fairly portable way
to force code into the initialization section is to encapsulate the
initialization in a C++ statically constructed object like this:

\begin{verbatim}
class InitWad {
   public:
      InitWad() { wad_init(); }
};
/* This forces InitWad() to execute
   on loading. */
static InitWad init;
\end{verbatim}

With this approach, no special initialization code needs to
be added to an extension module to make it work.  In addition, due to
the way in which the loader resolves and initializes libraries, the
initialization of WAD is guaranteed to execute before any of the code
in the extension module to which it has been linked.  The primary
downside to this approach is that the WAD shared object file cannot be
linked directly to an interpreter.   This is because WAD sometimes needs to call the
interpreter to properly initialize its exception handling mechanism (for instance, in Python,
four new types of exceptions are added to the interpreter).  Clearly this type of initialization
is not possible if WAD is linked directly to an interpreter, since
its initialization process would execute before the main program of the
interpreter started.  However,
if you wanted to permanently add WAD to an interpreter, the problem is easily
corrected by first removing the C++ initializer from WAD and then replacing it with an explicit
initialization call someplace within the interpreter's startup function.

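From the script's side, the explicit loading method described above can be made optional, so that a script still runs on machines where WAD is not installed. The following is only a minimal sketch and not part of WAD: it assumes a Python 3 interpreter, and the helper name {\tt enable\_wad} is hypothetical. Only the module name {\tt libwadpy} comes from the text.

```python
import importlib


def enable_wad(module_name="libwadpy"):
    """Try to load the WAD extension module; return True on success."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # WAD is not installed: carry on without embedded error recovery.
        return False
    return True


if __name__ == "__main__":
    print("WAD enabled:", enable_wad())
```

A script would call {\tt enable\_wad()} once, before importing the extension modules it wants to debug.
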
\section{Exception Objects}

Before WAD returns control to the interpreter, it collects all of the
stack-trace and debugging information it was able to obtain into a
special exception object. This object represents the state of the call
stack and includes things like symbolic names for each stack frame,
the names, types, and values of function parameters and stack
variables, as well as a complete copy of data on the stack. This
information is represented in a generic manner that hides
platform-specific details related to the CPU, object file formats,
debugging tables, and so forth.

Minimally, the exception data is used to print a stack trace as shown
in Figure 1.  However, if the interpreter is successfully able to
regain control, the contents of the exception object can be
freely examined after an error has occurred.  For example, a Python
script could catch a segmentation fault and print debugging information
like this:

\begin{verbatim}
try:
   # Some buggy code
   ...
except SegFault,e:
   print 'Whoa!'
   # Get WAD exception object
   t = e.args[0]
   # Print location info
   print t.__FILE__
   print t.__LINE__
   print t.__NAME__
   print t.__SOURCE__
   ...
\end{verbatim}
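
The handler above can also be packaged as a small reporting helper. The sketch below is written for Python 3 and relies on stand-ins: the {\tt SegFault} class and the {\tt FakeTrace} object are hypothetical substitutes for the exception type and exception object that WAD would supply, and {\tt describe} is a hypothetical helper name. Only the four attribute names are taken from the example above, and the attribute types are assumed.

```python
class SegFault(Exception):
    """Stand-in for the SegFault exception that WAD adds to Python."""


class FakeTrace:
    """Hypothetical substitute for a WAD exception object."""

    def __init__(self, file, line, name, source):
        self.__FILE__ = file
        self.__LINE__ = line
        self.__NAME__ = name
        self.__SOURCE__ = source


def describe(exc):
    """Format the location information carried by a WAD-style exception."""
    t = exc.args[0]  # the exception object is the first argument
    return "{} in {}, line {}\n{}".format(
        t.__NAME__, t.__FILE__, t.__LINE__, t.__SOURCE__)


try:
    # Simulate a fault reported from extension code.
    raise SegFault(FakeTrace("debug.c", 20, "seg_crash", "*a = 3;"))
except SegFault as e:
    report = describe(e)
    print(report)
```
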

Inspection of the exception object also makes it possible to write post-mortem
script debuggers that merge the call stacks of the two languages and
provide cross-language diagnostics.  Figure 4 shows an
example of a simple mixed-language debugging session using the WAD
post-mortem debugger (wpm) after an extension error has occurred in a
Python program.  In the figure, the user is first presented with a
multi-language stack trace.  The information in this trace is obtained
both from the WAD exception object and from the Python traceback
generated when the exception was raised. Next, we see the user walking
up the call stack using the `u' command of the debugger.  As this
proceeds, there is a seamless transition from C to Python where the
trace crosses between the two languages.  An optional feature of the
debugger (not shown) allows the user to walk up the entire C
call-stack (in this case, the trace shows information about the
implementation of the Python interpreter).  More advanced features of
the debugger allow the user to query values of function
parameters, local variables, and stack frames (although some of this
information may not be obtainable due to compiler optimizations and the
difficulties of accurately recovering register values).

\begin{figure*}[t]
{\small
\begin{verbatim}
[ Error occurred ]
>>> from wpm import *
#5   [ Python ] in self.widget._report_exception() in ...
#4   [ Python ] in Button(self,text="Die", command=lambda x=self: ...
#3   [ Python ] in death_by_segmentation() in death.py, line 22
#2   [ Python ] in debug.seg_crash() in death.py, line 5
#1   0xfeee2780 in _wrap_seg_crash(self=0x0,args=0x18f114) in 'pydebug.c', line 512
#0   0xfeee1320 in seg_crash() in 'debug.c', line 20

      int *a = 0;
 =>   *a = 3;
      return 1;

>>> u
#1   0xfeee2780 in _wrap_seg_crash(self=0x0,args=0x18f114) in 'pydebug.c', line 512

        if(!PyArg_ParseTuple(args,":seg_crash")) return NULL;
 =>     result = (int )seg_crash();
        resultobj = PyInt_FromLong((long)result);

>>> u
#2   [ Python ] in debug.seg_crash() in death.py, line 5

    def death_by_segmentation():
 =>     debug.seg_crash()

>>> u
#3   [ Python ] in death_by_segmentation() in death.py, line 22

        if ty == 1:
 =>         death_by_segmentation()
        elif ty == 2:
>>> \end{verbatim}
}
\caption{Cross-language debugging session in Python where a user is walking a mixed language call stack.}
\end{figure*}
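
The merged listing in the figure can be approximated with the standard {\tt traceback} module. This is a sketch under stated assumptions rather than the wpm implementation: the function name {\tt merged\_trace} and the tuple layout used for C frames are hypothetical, and Python frames are labeled with their function name instead of the source text shown in the figure.

```python
import traceback


def merged_trace(c_frames, tb):
    """Combine C frames and a Python traceback into one numbered listing.

    c_frames: innermost-first list of (address, name, file, line) tuples
              (hypothetical layout for the data WAD collects).
    tb:       a Python traceback object.
    """
    entries = []
    for addr, name, filename, line in c_frames:
        entries.append("0x%08x in %s() in '%s', line %d"
                       % (addr, name, filename, line))
    # extract_tb() lists the outermost Python frame first; reverse it so
    # the frame closest to the extension call directly follows the C frames.
    for frame in reversed(traceback.extract_tb(tb)):
        entries.append("[ Python ] in %s() in %s, line %d"
                       % (frame.name, frame.filename, frame.lineno))
    numbered = ["#%-3d %s" % (i, text) for i, text in enumerate(entries)]
    # Return the outermost frame first, as in the figure.
    return list(reversed(numbered))


def death_by_segmentation():
    # Stand-in for a call into a faulting extension function.
    raise RuntimeError("simulated extension fault")


try:
    death_by_segmentation()
except RuntimeError as e:
    trace = merged_trace(
        [(0xfeee1320, "seg_crash", "debug.c", 20)], e.__traceback__)
    print("\n".join(trace))
```

Frame \#0 is the innermost C frame, and the numbering continues upward through the Python frames, which is how the trace crosses from one language to the other.
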

\section{Implementation Details}

Currently, WAD is implemented in ANSI C and a small amount of assembly
code.  The current
implementation supports Python and Tcl extensions on SPARC Solaris and
i386-Linux.  Each scripting language is currently supported by a
separate shared library such as {\tt libwadpy.so} [\ldots]
a stack trace is simply printed to standard error when a problem occurs).
The entire implementation contains approximately 2000
semicolons. Most of this code pertains to the gathering of debugging
information from object files.  Only a small part of the code is
specific to a particular scripting language (170 semicolons for Python
and 50 semicolons for Tcl).

Although there are libraries such as the GNU Binary File Descriptor
(BFD) library that can assist with the manipulation of object files,
these are not used in the implementation \cite{bfd}.  These
libraries tend to be quite large and are oriented more towards
[\ldots]
the behavior of these libraries with respect to memory management
would need to be carefully studied before they could be safely used in
an embedded environment. Finally, given the small size of the prototype
implementation, it didn't seem necessary to rely upon such a
heavyweight solution.

A surprising feature of the implementation is that a significant
amount of the code is language independent.  This is achieved by
placing all of the process introspection, data collection, and
platform-specific code within a centralized core.  To provide a
specific scripting language interface, a developer only needs to
supply two things: a table containing symbolic function names where
control can be returned (Table 1), and a handler function in the form
of a callback.  As input, this handler receives an exception object as
described in an earlier section.  From this, the handler can
raise a scripting language exception in whatever manner is most
appropriate.

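The division of labor between the core and a language module can be modeled in a few lines. The sketch below is a toy model in Python, not the C implementation: every function name in the table is hypothetical, the idea that each entry also carries an error value to return is an assumption, and the stack is reduced to a list of function names.

```python
# Hypothetical return-point table: symbolic function name -> value that
# function should appear to return when an error is reported.
RETURN_POINTS = {
    "hypothetical_call_extension": None,   # e.g. a NULL object pointer
    "hypothetical_eval_command": 1,        # e.g. an error status code
}


def find_return_point(stack, table=RETURN_POINTS):
    """Return (name, value) for the innermost frame listed in the table.

    stack is a list of function names, innermost frame first.
    """
    for name in stack:
        if name in table:
            return name, table[name]
    return None


def recover(stack, exception_object, handler, table=RETURN_POINTS):
    """Invoke the language handler if a return point exists on the stack."""
    target = find_return_point(stack, table)
    if target is not None:
        handler(exception_object)  # raise the scripting-language exception
    return target


raised = []
result = recover(
    ["seg_crash", "_wrap_seg_crash", "hypothetical_call_extension", "main"],
    {"name": "seg_crash", "file": "debug.c", "line": 20},
    raised.append)
print(result)
```

In this model, supporting another scripting language means supplying a different table and a different {\tt handler}; {\tt find\_return\_point} and {\tt recover} stay unchanged.
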
Significant portions of the core are also relatively straightforward
to port between different Unix systems.  For instance, code to read
ELF object files and stabs debugging data is essentially identical for
Linux and Solaris.  In addition, the high-level control logic is
unchanged between platforms.  Platform-specific differences primarily
arise in the obvious places such as the examination of CPU
registers, manipulation of the process context in the signal handler,
[\ldots]
changes would also need to be made on systems with different object
file formats such as COFF and DWARF2.  To the extent that it is possible,
these differences could be hidden by abstraction mechanisms (although
the initial implementation of WAD is weak in this regard and would
benefit from techniques used in more advanced debuggers such as gdb).
Despite these porting issues, the primary requirement for WAD is a fully
functional implementation of SVR4 signal handling that allows for
modifications of the process context.

Due to the heavy dependence on Unix signal handling, process
introspection, and object file formats, it is unlikely that WAD could
be easily ported to non-Unix systems such as Windows.  However, it may
be possible to provide a similar capability using advanced features of
Windows structured exception handling \cite{seh}.  For instance, structured
exception handlers can be used to catch hardware faults, they can
receive process context information, and they can arrange to take
corrective action much like the signal implementation described here.

\section{Modification of Interpreters?}

A natural question is whether
or not it would make sense to modify existing interpreters to assist
in the recovery process. For instance, instrumenting Python or Tcl with setjmp
functions might simplify the implementation since it would eliminate
issues related to register restoration and finding a suitable return
location.

Although it may be possible to make these changes, there are
several drawbacks to this approach.  First, the number of required modifications may be
quite large.  For instance, there are well over 50 entry points to
extension code within the implementation of Python.  Second, an
extension module may perform callbacks and evaluation of script code.
This means that the call stack would cross back and forth
between languages and that these modifications would have to be made
in a way that allows arbitrary nesting of extension calls.  Finally,
instrumenting the code in this manner may introduce a performance
impact--a clearly undesirable side effect considering the infrequent
occurrence of fatal extension errors.

\section{Discussion}

The primary goal of embedded error recovery is to provide an
alternative approach for debugging scripting language extensions.
Although this approach has many benefits, there are a number of
drawbacks and issues that must be discussed.

First, like the C {\tt longjmp} function, the error recovery mechanism
does not cleanly unwind the call stack.  For C++, this means that
objects allocated on the stack will not be finalized (destructors will not
be invoked) and that memory allocated on the heap may be
leaked. Similarly, open files, sockets, and other
system resources may never be released. In a multi-threaded environment,
deadlock may occur if a procedure holds a lock when an error occurs.

In certain cases, the use of signals in WAD may interact adversely with scripting
language signal handling. Since scripting languages ordinarily do not catch signals such as
SIGSEGV, SIGBUS, and SIGABRT, the use of WAD is unlikely to conflict
with any existing signal handling. However, most scripting languages would not
prevent a user from disabling the WAD error recovery mechanism by
simply specifying a new handler for one or more of these signals.  In addition, the use of
certain extensions such as the Perl sigtrap module would completely
disable it.

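Python's standard {\tt signal} module is enough to illustrate how easily a script can displace an existing handler. The sketch below assumes Python 3 and a hypothetical handler named {\tt on\_fault}; it demonstrates only the replacement itself and is not a useful way to handle a real SIGSEGV. It must run in the main thread, since Python only allows handlers to be set there.

```python
import signal


def on_fault(signum, frame):
    # Hypothetical script-level handler.  Catching a real SIGSEGV this way
    # is of little use: the faulting C code is likely to raise it again.
    raise RuntimeError("caught signal %d" % signum)


# Installing a handler from the script displaces any handler that was
# registered earlier for the same signal (for example, by an extension).
previous = signal.signal(signal.SIGSEGV, on_fault)
installed = signal.getsignal(signal.SIGSEGV)

# Restore the previous disposition.  signal.signal() returns None when the
# earlier handler was not installed from Python, so fall back to SIG_DFL.
signal.signal(signal.SIGSEGV,
              previous if previous is not None else signal.SIG_DFL)
restored = signal.getsignal(signal.SIGSEGV)
```

Note that when the earlier handler was installed from C, Python reports it as {\tt None} and the script has no way to reinstall it, so the displacement cannot be undone from the script.
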
A more difficult signal handling problem arises when thread libraries
are used. These libraries tend to override default signal handling
behavior in a way that defines how signals are delivered to each
thread.  In general, a signal may be
delivered to any thread within a process.  However, this does not
appear to be a problem for WAD since hardware exceptions are delivered
to a signal handler that runs within the same thread in which the
error occurred.  Unfortunately, even in this case, personal experience has
shown that certain implementations of user thread libraries (particularly on older versions
of Linux) do not reliably pass
signal context information nor do they universally support advanced
signal operations such as {\tt sigaltstack}.  Because of this, WAD may
be incompatible with a crippled implementation of user threads on
these platforms.

An even more subtle problem with threads is that the recovery process
itself is not thread-safe (i.e., it is not possible to concurrently
handle fatal errors occurring in different threads).  For most
scripting language extensions, this limitation does not apply due to
strict run-time restrictions that interpreters currently place on
[\ldots]
programs, it places a global mutex-lock around the interpreter that
makes it impossible for more than one thread to execute
within the interpreter at once. A consequence of this restriction is
that extension functions are not interruptible by thread-switching
unless they explicitly release the interpreter lock.  Currently, the
behavior of WAD is undefined if extension code releases the lock and
proceeds to generate a fault.  In this case, the recovery process may
either cause an exception to be raised in an entirely different
thread or cause execution to violate the interpreter's mutual exclusion
constraint.

In certain cases, errors may result in an unrecoverable crash.  For
example, if an application overwrites the heap, it may destroy
critical data structures within the interpreter.  Similarly,
destruction of the call stack (via buffer overflow) makes it
impossible for the recovery mechanism to create a stack-trace and
[\ldots]
such as double-freeing of heap allocated memory can also cause a system
to fail in a manner that bears little resemblance to the actual source
of the problem.    Given that WAD lives in the same process as the
faulting application and that such errors may occur, a common
question to ask is to what extent WAD complicates debugging when it
doesn't work.

To handle potential problems in the implementation of WAD itself,
great care is taken to avoid the use of library functions and
functions that rely on heap allocation (malloc, free, etc.).  For
instance, to provide dynamic memory allocation, WAD implements its own
memory allocator using mmap.  In addition, signals are disabled
immediately upon entry to the WAD signal handler.  Should a fatal
error occur inside WAD, the application will dump core and exit.  Since
the resulting core file contains the stack trace of both WAD and the
faulting application, a traditional C debugger can be used to identify
the problem as before.  The only difference is that a few additional
stack frames will appear on the traceback.

An application may also fail after the WAD signal handler has completed
execution if memory or stack frames within the interpreter have been
corrupted in a way that prevents proper exception handling. In this case, the
application may fail in a manner that does not represent the original
programming error. It might also cause the WAD signal handler to be
immediately reinvoked with a different process state--causing it to
[\ldots]
these kinds of problems, WAD creates a tracefile {\tt
wadtrace} in the current working directory that contains information
about each error that it has handled.  If no recovery was possible, a
programmer can look at this file to obtain all of the stack traces
that were generated.

If an application is experiencing a very serious problem, WAD
does not prevent a standard debugger from being attached to the
process.  This is because the debugger overrides the current signal
handling so that it can catch fatal errors.  As a result, even if WAD
is loaded, fatal signals are simply redirected to the attached
debugger.  Such an approach also allows for more complex debugging
tasks such as single-step execution, breakpoints, and
[\ldots]

%
%

Finally, there are a number of issues that pertain
to the interaction of the recovery mechanism with the interpreter.
1069is no way to specify a proper erro…