  1. %template for producing IEEE-format articles using LaTeX.
  2. %written by Matthew Ward, CS Department, Worcester Polytechnic Institute.
  3. %use at your own risk. Complaints to /dev/null.
  4. %make two column with no page numbering, default is 10 point
  5. %\documentstyle{article}
  6. \documentstyle[twocolumn,times]{article}
  7. \pagestyle{empty}
  8. %set dimensions of columns, gap between columns, and space between paragraphs
  9. %\setlength{\textheight}{8.75in}
  10. \setlength{\textheight}{9.0in}
  11. \setlength{\columnsep}{0.25in}
  12. \setlength{\textwidth}{6.45in}
  13. \setlength{\footheight}{0.0in}
  14. \setlength{\topmargin}{0.0in}
  15. \setlength{\headheight}{0.0in}
  16. \setlength{\headsep}{0.0in}
  17. \setlength{\oddsidemargin}{0in}
  18. %\setlength{\oddsidemargin}{-.065in}
  19. %\setlength{\oddsidemargin}{-.17in}
  20. %\setlength{\parindent}{0pc}
  21. %I copied stuff out of art10.sty and modified them to conform to IEEE format
  22. \makeatletter
  23. %as Latex considers descenders in its calculation of interline spacing,
  24. %to get 12 point spacing for normalsize text, must set it to 10 points
  25. \def\@normalsize{\@setsize\normalsize{12pt}\xpt\@xpt
  26. \abovedisplayskip 10pt plus2pt minus5pt\belowdisplayskip \abovedisplayskip
  27. \abovedisplayshortskip \z@ plus3pt\belowdisplayshortskip 6pt plus3pt
  28. minus3pt\let\@listi\@listI}
  29. %need an 11 pt font size for subsection and abstract headings
  30. \def\subsize{\@setsize\subsize{12pt}\xipt\@xipt}
  31. %make section titles bold and 12 point, 2 blank lines before, 1 after
  32. \def\section{\@startsection {section}{1}{\z@}{24pt plus 2pt minus 2pt}
  33. {12pt plus 2pt minus 2pt}{\large\bf}}
  34. %make subsection titles bold and 11 point, 1 blank line before, 1 after
  35. \def\subsection{\@startsection {subsection}{2}{\z@}{12pt plus 2pt minus 2pt}
  36. {12pt plus 2pt minus 2pt}{\subsize\bf}}
  37. \makeatother
  38. \newcommand{\ignore}[1]{}
  39. %\renewcommand{\thesubsection}{\arabic{subsection}.}
  40. \begin{document}
  41. %don't want date printed
  42. \date{}
  43. %make title bold and 14 pt font (Latex default is non-bold, 16 pt)
  44. \title{\Large \bf An Embedded Error Recovery and Debugging Mechanism for Scripting Language Extensions}
  45. %for single author (just remove % characters)
  46. \author{{David M.\ Beazley} \\
  47. {\em Department of Computer Science} \\
  48. {\em University of Chicago }\\
  49. {\em Chicago, Illinois 60637 }\\
  50. {\em beazley@cs.uchicago.edu }}
  51. % My Department \\
  52. % My Institute \\
  53. % My City, ST, zip}
  54. %for two authors (this is what is printed)
  55. %\author{\begin{tabular}[t]{c@{\extracolsep{8em}}c}
  56. % Roscoe Giles & Pablo Tamayo \\
  57. % \\
  58. % Department of Electrical, Computer, & Thinking Machines Corp. \\
  59. % and Systems Engineering & Cambridge, MA~~02142. \\
  60. % and & \\
  61. % Center for Computational Science & \\
  62. % Boston University, Boston, MA~~02215. &
  63. %\end{tabular}}
  64. \maketitle
  65. %I don't know why I have to reset thispagestyle, but otherwise get page numbers
  66. \thispagestyle{empty}
  67. \subsection*{Abstract}
  68. {\em
  69. In recent years, scripting languages such as Perl, Python, and Tcl
  70. have become popular development tools for the creation of
  71. sophisticated application software. One of the most useful features
  72. of these languages is their ability to easily interact with compiled
  73. languages such as C and C++. Although this mixed language approach
  74. has many benefits, one of the greatest drawbacks is the complexity of
  75. debugging that results from using interpreted and compiled code in the
  76. same application. In part, this is due to the fact that scripting
  77. language interpreters are unable to recover from catastrophic errors
  78. in compiled extension code. Moreover, traditional C/C++ debuggers
  79. do not provide a satisfactory degree of integration with interpreted
  80. languages. This paper describes an experimental system in which fatal
  81. extension errors such as segmentation faults, bus errors, and failed
  82. assertions are handled as scripting language exceptions. This system,
  83. which has been implemented as a general purpose shared library,
  84. requires no modifications to the target scripting language, introduces
  85. no performance penalty, and simplifies the debugging of mixed
  86. interpreted-compiled application software.
  87. }
  88. \section{Introduction}
  89. Slightly more than ten years have passed since John Ousterhout
  90. introduced the Tcl scripting language at the 1990 USENIX technical
  91. conference \cite{ousterhout}. Since then, scripting languages have
  92. been gaining in popularity as evidenced by the widespread use of
  93. systems such as Tcl, Perl, Python, Guile, PHP, and Ruby
  94. \cite{ousterhout,perl,python,guile,php,ruby}.
  95. In part, the success of modern scripting languages is due to their
  96. ability to be easily integrated with software written in compiled
  97. languages such as C, C++, and Fortran. In addition, a wide variety of wrapper
  98. generation tools can be used
  99. to automatically produce bindings between existing code and a
  100. variety of scripting language environments
  101. \cite{swig,sip,pyfort,f2py,advperl,heidrich,vtk,gwrap,wrappy}. As a result, a large number of
  102. programmers are now using scripting languages to control
  103. complex C/C++ programs or as a tool for re-engineering legacy
  104. software. This approach is attractive because it allows programmers
  105. to benefit from the flexibility and rapid development of
  106. scripting while retaining the best features of compiled code such as high
  107. performance \cite{ouster1}.
  108. A critical aspect of scripting-compiled code integration is the way in
  109. which it departs from traditional C/C++ development and shell
  110. scripting. Rather than building stand-alone applications that run as
  111. separate processes, extension programming encourages a style of
  112. programming in which components are tightly integrated within
  113. an interpreter that is responsible for high-level control.
  114. Because of this, scripted software tends to rely heavily
  115. upon shared libraries, dynamic loading, scripts, and
  116. third-party extensions. In this sense, one might argue that the
  117. benefits of scripting are achieved at the expense of creating a
  118. more complicated development environment.
  119. A consequence of this complexity is an increased degree of difficulty
  120. associated with debugging programs that utilize multiple languages,
  121. dynamically loadable modules, and a sophisticated runtime environment.
  122. To address this problem, this paper describes an experimental system
  123. known as WAD (Wrapped Application Debugger) in which an embedded error
  124. reporting and debugging mechanism is added to common scripting
  125. languages. This system converts catastrophic signals such as
  126. segmentation faults and failed assertions to exceptions that can be
  127. handled by the scripting language interpreter. In doing so, it
  128. provides more seamless integration between error handling in
  129. scripting language interpreters and compiled extensions.
  130. \section{The Debugging Problem}
  131. Normally, a programming error in a scripted application
  132. results in an exception that describes the problem and the context in
  133. which it occurred. For example, an error in a Python script might
  134. produce a traceback similar to the following:
  135. \begin{verbatim}
  136. % python foo.py
  137. Traceback (innermost last):
  138.   File "foo.py", line 11, in ?
  139.     foo()
  140.   File "foo.py", line 8, in foo
  141.     bar()
  142.   File "foo.py", line 5, in bar
  143.     spam()
  144.   File "foo.py", line 2, in spam
  145.     doh()
  146. NameError: doh
  147. \end{verbatim}
  148. In this case, a programmer might be able to apply a fix simply based
  149. on information in the traceback. Alternatively, if the problem is
  150. more complicated, a script-level debugger can be used to provide more
  151. information. In contrast, a failure in compiled extension code might
  152. produce the following result:
  153. \begin{verbatim}
  154. % python foo.py
  155. Segmentation Fault (core dumped)
  156. \end{verbatim}
  157. In this case, the user has no idea what has happened other than that it
  158. appears to be ``very bad.'' Furthermore, script-level debuggers are
  159. unable to identify the problem since they also crash when the error
  160. occurs (they run in the same process as the interpreter). This means
  161. that the only way for a user to narrow the source of the problem
  162. within a script is through trial-and-error techniques such as
  163. inserting print statements, commenting out sections of scripts, or
  164. having a deep intuition of the underlying implementation. Obviously,
  165. none of these techniques are particularly elegant.
  166. An alternative approach is to run the application under the control of
  167. a traditional debugger such as gdb \cite{gdb}. Although this provides
  168. some information about the error, the debugger mostly provides
  169. detailed information about the internal implementation of the
  170. scripting language interpreter instead of the script-level code that
  171. was running at the time of the error. Needless to say, this information
  172. isn't very useful to most programmers.
  173. A related problem is that
  174. the structure of a scripted application tends to be much more complex
  175. than a traditional stand-alone program. As a result, a user may not
  176. have a good sense of how to actually attach an external debugger to their
  177. script. In addition, execution may occur within a
  178. complex run-time environment involving events, threads, and network
  179. connections. Because of this, it can be difficult for the user to reproduce
  180. and identify certain types of catastrophic errors if they depend on
  181. timing or unusual event sequences. Finally, this approach
  182. requires a programmer to have a C development environment installed on
  183. their machine. Unfortunately, this may not hold in practice.
  184. This is because scripting languages are often used to provide programmability to
  185. applications where end-users write scripts, but do not write low-level C code.
  186. Even if a traditional debugger such as gdb were modified to provide
  187. better integration with scripting languages, it is not clear that this
  188. would be the most natural solution to the problem. For one,
  189. having to run a separate debugging process to debug
  190. extension code is unnatural when no such requirement exists for
  191. scripts. Moreover, even if such a debugger existed, an
  192. inexperienced user may not have the expertise or inclination to use
  193. it. Finally, obscure fatal errors may occur long after an application
  194. has been deployed. Unless the debugger is distributed along with the
  195. application in some manner, it will be extraordinarily difficult to
  196. obtain useful diagnostics when such errors occur.
  197. \begin{figure*}[t]
  198. {\small
  199. \begin{verbatim}
  200. % python foo.py
  201. Traceback (most recent call last):
  202.   File "<stdin>", line 1, in ?
  203.   File "foo.py", line 16, in ?
  204.     foo()
  205.   File "foo.py", line 13, in foo
  206.     bar()
  207.   File "foo.py", line 10, in bar
  208.     spam()
  209.   File "foo.py", line 7, in spam
  210.     doh.doh(a,b,c)
  211. SegFault: [ C stack trace ]
  212. #2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0) in 'ceval.c',line 2650
  213. #1 0xff083544 in _wrap_doh(self=0x0,args=0x1a1ccc) in 'foo_wrap.c',line 745
  214. #0 0xfe7e0568 in doh(a=3,b=4,c=0x0) in 'foo.c',line 28
  215. /u0/beazley/Projects/WAD/Python/foo.c, line 28
  216.     int doh(int a, int b, int *c) {
  217.  =>     *c = a + b;
  218.         return *c;
  219.     }
  220. \end{verbatim}
  221. }
  222. \caption{Cross language traceback generated by WAD for a segmentation fault in a Python extension}
  223. \end{figure*}
  224. The current state of the art in extension debugging is to simply add
  225. as much error checking as possible to extension modules. This is never
  226. a bad thing to do, but in practice it's usually not enough to
  227. eliminate every possible problem. For one, scripting languages are
  228. sometimes used to control hundreds of thousands to millions of lines
  229. of compiled code. In this case, it is improbable that a programmer will
  230. foresee every conceivable error. In addition, scripting languages are
  231. often used to put new user interfaces on legacy software. In this
  232. case, scripting may introduce new modes of execution that cause a
  233. formerly ``bug-free'' application to fail in an unexpected manner.
  234. Finally, certain types of errors such as floating-point exceptions can
  235. be particularly difficult to eliminate because they might be generated
  236. algorithmically (e.g., as the result of instability in a numerical
  237. method). Therefore, even if a programmer has worked hard to eliminate
  238. crashes, there is usually a small probability that an application may
  239. fail under unusual circumstances.
  240. \section{Embedded Error Reporting}
  241. Rather than modifying an existing debugger to support scripting
  242. languages, an alternative approach is to add a more powerful error
  243. handling and reporting mechanism to the scripting language
  244. interpreter. We have implemented this approach in the form of an
  245. experimental system known as WAD. WAD is packaged as a dynamically
  246. loadable shared library that can either be loaded as a scripting
  247. language extension module or linked to existing extension modules as a
  248. library. The core of the system is generic and requires no
  249. modifications to the scripting interpreter or existing extension
  250. modules. Furthermore, the system does not introduce a performance
  251. penalty as it does not rely upon program instrumentation or tracing.
  252. WAD works by converting fatal signals such as SIGSEGV,
  253. SIGBUS, SIGFPE, and SIGABRT into scripting language exceptions that contain
  254. debugging information collected from the call-stack of compiled
  255. extension code. By handling errors in this manner, the scripting
  256. language interpreter is able to produce a cross-language stack trace that
  257. contains information from both the script code and extension code as
  258. shown for Python and Tcl/Tk in Figures 1 and 2. In this case, the user
  259. is given a very clear idea of what has happened without having
  260. to launch a separate debugger.
  261. The advantage of this approach is that it provides more seamless
  262. integration between error handling in scripts and error handling in
  263. extensions. In addition, it eliminates the most common debugging step
  264. that a developer is likely to perform in the event of a fatal
  265. error: running a separate debugger on a core file and typing {\tt where}
  266. to get a stack trace. Finally, this allows end-users to provide
  267. extension writers with useful debugging information since they can
  268. supply a stack trace as opposed to a vague complaint that the program
  269. ``crashed.''
  270. \begin{figure*}[t]
  271. \begin{picture}(400,250)(0,0)
  272. \put(50,-110){\special{psfile = tcl.ps hscale = 60 vscale = 60}}
  273. \end{picture}
  274. \caption{Dialog box with WAD generated traceback information for a failed assertion in a Tcl/Tk extension}
  275. \end{figure*}
  276. \section{Scripting Language Internals}
  277. In order to provide embedded error recovery, it is critical to understand how
  278. scripting language interpreters interface with extension code. Despite the wide variety
  279. of scripting languages, essentially every implementation uses a similar
  280. technique for accessing foreign code.
  281. Virtually all scripting languages provide an extension mechanism in the form of a foreign function
  282. interface in which compiled procedures can be called from the scripting language
  283. interpreter. This is accomplished by writing a collection of wrapper functions that conform
  284. to a specified calling convention. The primary purpose of the wrappers is to
  285. marshal arguments and return values between the two languages and to handle errors.
  286. For example, in Tcl, every wrapper
  287. function must conform to the following prototype:
  288. \begin{verbatim}
  289. int
  290. wrap_foo(ClientData clientData,
  291.          Tcl_Interp *interp,
  292.          int objc,
  293.          Tcl_Obj *CONST objv[])
  294. {
  295.     /* Convert arguments */
  296.     ...
  297.     /* Call a function */
  298.     result = foo(args);
  299.     /* Set result */
  300.     ...
  301.     if (success) {
  302.         return TCL_OK;
  303.     } else {
  304.         return TCL_ERROR;
  305.     }
  306. }
  307. \end{verbatim}
  308. Another common extension mechanism is an object/type interface that allows programmers to create new
  309. kinds of fundamental types or attach special properties to objects in
  310. the interpreter. For example, both Tcl and Python provide an API for creating new
  311. ``built-in'' objects that behave like numbers, strings, lists, etc.
  312. In most cases, this involves setting up tables of function
  313. pointers that define various properties of an object. For example, if
  314. you wanted to add complex numbers to an interpreter, you might fill in a special
  315. data structure with pointers to methods that implement various numerical operations like this:
  316. \begin{verbatim}
  317. NumberMethods ComplexMethods = {
  318.     complex_add,
  319.     complex_sub,
  320.     complex_mul,
  321.     complex_div,
  322.     ...
  323. };\end{verbatim}
  324. \noindent
  325. Once registered with the interpreter, the methods in this structure
  326. would be invoked by various interpreter operators such as $+$,
  327. $-$, $*$, and $/$.
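The table-driven dispatch described above can be sketched in plain C. This is only a minimal illustration and not the actual Tcl or Python type API: the {\tt Complex} type, the four-slot layout of {\tt NumberMethods}, and the {\tt apply\_operator} function are hypothetical names introduced for this example.

```c
/* Hypothetical value type and method table; not Tcl's or Python's real API. */
typedef struct { double re, im; } Complex;

typedef struct {
    Complex (*add)(Complex, Complex);
    Complex (*sub)(Complex, Complex);
    Complex (*mul)(Complex, Complex);
    Complex (*div)(Complex, Complex);
} NumberMethods;

static Complex complex_add(Complex a, Complex b) {
    Complex r = { a.re + b.re, a.im + b.im };
    return r;
}

static Complex complex_sub(Complex a, Complex b) {
    Complex r = { a.re - b.re, a.im - b.im };
    return r;
}

static Complex complex_mul(Complex a, Complex b) {
    Complex r = { a.re * b.re - a.im * b.im,
                  a.re * b.im + a.im * b.re };
    return r;
}

static Complex complex_div(Complex a, Complex b) {
    double d = b.re * b.re + b.im * b.im;
    Complex r = { (a.re * b.re + a.im * b.im) / d,
                  (a.im * b.re - a.re * b.im) / d };
    return r;
}

/* The table that an extension would register with the interpreter. */
const NumberMethods ComplexMethods = {
    complex_add, complex_sub, complex_mul, complex_div
};

/* Dispatch an operator through the table, as an interpreter might.
   Returns 0 on success and -1 for an unsupported operator. */
int apply_operator(const NumberMethods *m, char op,
                   Complex a, Complex b, Complex *out) {
    switch (op) {
    case '+': *out = m->add(a, b); return 0;
    case '-': *out = m->sub(a, b); return 0;
    case '*': *out = m->mul(a, b); return 0;
    case '/': *out = m->div(a, b); return 0;
    default:  return -1;
    }
}
```

A call such as {\tt apply\_operator(\&ComplexMethods, '+', a, b, \&r)} stands in for the interpreter evaluating {\tt a + b}. The interpreter never needs to know how complex arithmetic is implemented, only where the table is.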
  328. Most interpreters handle errors as a two-step process in which
  329. detailed error information is first registered with the interpreter
  330. and then a special error code is returned. For example, in Tcl, errors
  331. are handled by setting error information in the interpreter and
  332. returning a value of TCL\_ERROR. Similarly in Python, errors are
  333. handled by calling a special function to raise an exception and returning NULL. In both cases,
  334. this triggers the interpreter's error handler---possibly resulting in
  335. a stack trace of the running script. In some cases, an interpreter
  336. might handle errors using a form of the C {\tt longjmp} function.
  337. For example, Perl provides a special function {\tt die} that jumps back
  338. to the interpreter with a fatal error \cite{advperl}.
  339. The precise implementation details of these mechanisms aren't so
  340. important for our discussion. The critical point is that scripting
  341. languages always access extension code through a well-defined interface
  342. that precisely defines how arguments are to be passed, values are to be
  343. returned, and errors are to be handled.
  344. \section{Scripting Languages and Signals}
  345. Under normal circumstances, errors in extension code are handled
  346. through the error-handling API provided by the scripting language
  347. interpreter. For example, if an invalid function parameter is passed,
  348. a program can simply set an error message and return to the
  349. interpreter. Similarly, automatic wrapper generators such as SWIG can produce
  350. code to convert C++ exceptions and other C-related error handling
  351. schemes to scripting language errors \cite{swigexcept}. On the other
  352. hand, segmentation faults, failed assertions, and similar problems
  353. produce signals that cause the interpreter to abort execution.
  354. Most scripting languages provide limited support for Unix signal
  355. handling \cite{stevens}. However, this support is not sufficiently advanced to
  356. recover from fatal signals produced by extension code.
  357. Unlike signals generated for asynchronous events such as I/O,
  358. execution {\em cannot} be resumed at the point of a fatal signal.
  359. Therefore, even if such a signal could be caught and handled by a script,
  360. there isn't much that it can do except to print a diagnostic
  361. message and abort before the signal handler returns. In addition,
  362. some interpreters block signal delivery while executing
  363. extension code, opting to handle signals at a more convenient time.
  364. In this case, a signal such as SIGSEGV would simply cause the whole application
  365. to freeze since there is no way for execution to continue to a point where
  366. the signal could be delivered. Thus, scripting languages tend to
  367. either ignore the problem or label it as a ``limitation.''
  368. \section{Overview of WAD}
  369. WAD installs a signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL,
  370. and SIGFPE using the {\tt sigaction} function
  371. \cite{stevens}. Furthermore, it uses a special option (SA\_SIGINFO) of
  372. signal handling that passes process context information to the signal
  373. handler when a signal occurs. Since none of these signals are normally used in the
  374. implementation of the scripting interpreter or by user scripts,
  375. this does not usually override any previous signal handling.
  376. Afterwards, when one of these signals occurs, a two-phase recovery
  377. process executes. First, information is collected about the execution
  378. context including a full stack-trace, symbol table entries, and
  379. debugging information. Then, the current stream of execution is
  380. aborted and an error is returned to the interpreter. This process is
  381. illustrated in Figure~3.
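The handler installation described above can be sketched with the standard POSIX {\tt sigaction} interface. This is a minimal illustration rather than WAD's actual source: the names {\tt install\_fatal\_handlers}, {\tt fatal\_handler}, {\tt last\_signal}, and {\tt last\_fault\_addr} are assumptions made for the example, and the handler only records what it was given instead of performing any recovery.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Filled in by the handler. Illustrative names, not WAD's. */
volatile sig_atomic_t last_signal = 0;
void *volatile last_fault_addr = NULL;

/* With SA_SIGINFO the handler receives a siginfo_t describing the
   signal and an opaque pointer to the interrupted CPU context. */
static void fatal_handler(int signo, siginfo_t *info, void *context) {
    (void)context;                     /* a ucontext_t *, unused in this sketch */
    last_signal = signo;
    last_fault_addr = info->si_addr;   /* meaningful only for hardware faults */
    /* A real handler must not simply return after a hardware fault,
       since the faulting instruction would be executed again. */
}

/* Install the handler for the five signals named in the text.
   Returns 0 on success and -1 if any sigaction call fails. */
int install_fatal_handlers(void) {
    static const int fatal[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof fatal / sizeof fatal[0]; i++) {
        if (sigaction(fatal[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

The sketch can be exercised safely with {\tt raise(SIGABRT)}, because a signal sent that way allows the handler to return normally. A genuine segmentation fault does not, which is why the recovery phase has to redirect control instead of returning to the faulting instruction.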
  382. The collection of context and debugging information involves the
  383. following steps:
  384. \begin{itemize}
  385. \item The program counter and stack pointer are obtained from
  386. context information passed to the signal handler.
  387. \item The virtual memory map of the process is obtained from /proc
  388. and used to associate virtual memory addresses with executable files,
  389. shared libraries, and dynamically loaded extension modules \cite{proc}.
  390. \item The call stack is unwound to collect traceback information.
  391. At each step of the stack traceback, symbol table and debugging
  392. information is gathered and stored in a generic data structure for later use
  393. in the recovery process. This data is obtained by memory-mapping
  394. the object files associated with the process and extracting
  395. symbol table and debugging information.
  396. \end{itemize}
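The second step above, associating an address with the file that is mapped there, can be sketched as follows. The sketch assumes the text format of {\tt /proc/self/maps} found on Linux. Other systems expose the memory map differently, and the paper does not say which interface WAD reads. The {\tt struct mapping} type and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One region of the process memory map. Illustrative type, not WAD's. */
struct mapping {
    unsigned long start;   /* first address of the region */
    unsigned long end;     /* one past the last address */
    unsigned long offset;  /* offset into the mapped file */
    char perms[5];         /* for example "r-xp" */
    char path[256];        /* backing file, or "" for anonymous memory */
};

/* Parse one line of the form
     start-end perms offset dev inode [path]
   Returns 0 on success and -1 if the line does not match. */
int parse_maps_line(const char *line, struct mapping *m) {
    int n;

    m->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255[^\n]",
               &m->start, &m->end, m->perms, &m->offset, m->path);
    return (n >= 4) ? 0 : -1;   /* the path field is optional */
}

/* Find the region of the current process that contains addr.
   Returns 0 and fills in *m if found, -1 otherwise. */
int find_mapping(unsigned long addr, struct mapping *m) {
    char line[512];
    int found = -1;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_maps_line(line, m) == 0 &&
            addr >= m->start && addr < m->end) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Given a program counter taken from the signal context, {\tt find\_mapping} yields the path of the executable or shared library containing it. Subtracting {\tt start} from the address and adding {\tt offset} gives the position within that file, which is what a symbol table lookup needs.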
  397. Once debugging information has been collected, the signal handler
  398. enters an error-recovery phase that
  399. attempts to raise a scripting exception and return to a suitable location in the
  400. interpreter. To do this, the following steps are performed:
  401. \begin{itemize}
  402. \item The stack trace is examined to see if there are any locations in the interpreter
  403. to which control can be returned.
  404. \item If a suitable return location is found, the CPU context is modified in
  405. a manner that makes the signal handler return to the interpreter
  406. with an error. This return process is assisted by a small
  407. trampoline function (partially written in assembly language) that arranges a proper
  408. return to the interpreter after the signal handler returns.
  409. \end{itemize}
  410. \noindent
  411. Of the two phases, the first is the more straightforward to implement
  412. because it involves standard Unix API functions and common file formats such
  413. as ELF and stabs \cite{elf,stabs}. On the other hand, the recovery phase in
  414. which control is returned to the interpreter is of greater interest. Therefore,
  415. it is now described in greater detail.
  416. \begin{figure*}[t]
  417. \begin{picture}(480,340)(5,60)
  418. \put(50,330){\framebox(200,70){}}
  419. \put(60,388){\small \tt >>> {\bf foo()}}
  420. \put(60,376){\small \tt Traceback (most recent call last):}
  421. \put(70,364){\small \tt File "<stdin>", line 1, in ?}
  422. \put(60,352){\small \tt SegFault: [ C stack trace ]}
  423. \put(60,340){\small \tt ...}
  424. \put(55,392){\line(-1,0){25}}
  425. \put(30,392){\line(0,-1){80}}
  426. \put(30,312){\line(1,0){95}}
  427. \put(125,312){\vector(0,-1){10}}
  428. \put(175,302){\line(0,1){10}}
  429. \put(175,312){\line(1,0){95}}
  430. \put(270,312){\line(0,1){65}}
  431. \put(270,377){\vector(-1,0){30}}
  432. \put(50,285){\framebox(200,15)[c]{[Python internals]}}
  433. \put(125,285){\vector(0,-1){10}}
  434. \put(175,275){\vector(0,1){10}}
  435. \put(50,260){\framebox(200,15)[c]{call\_builtin()}}
  436. \put(125,260){\vector(0,-1){10}}
  437. %\put(175,250){\vector(0,1){10}}
  438. \put(50,235){\framebox(200,15)[c]{wrap\_foo()}}
  439. \put(125,235){\vector(0,-1){10}}
  440. \put(50,210){\framebox(200,15)[c]{foo()}}
  441. \put(125,210){\vector(0,-1){10}}
  442. \put(50,185){\framebox(200,15)[c]{doh()}}
  443. \put(125,185){\vector(0,-1){20}}
  444. \put(110,148){SIGSEGV}
  445. \put(160,152){\vector(1,0){100}}
  446. \put(260,70){\framebox(200,100){}}
  447. \put(310,155){WAD signal handler}
  448. \put(265,140){1. Unwind C stack}
  449. \put(265,125){2. Gather symbols and debugging info}
  450. \put(265,110){3. Find safe return location}
  451. \put(265,95){4. Raise Python exception}
  452. \put(265,80){5. Modify CPU context and return}
  453. \put(260,185){\framebox(200,15)[c]{return assist}}
  454. \put(365,174){Return from signal}
  455. \put(360,170){\vector(0,1){15}}
  456. \put(360,200){\line(0,1){65}}
  457. %\put(360,70){\line(0,-1){10}}
  458. %\put(360,60){\line(1,0){110}}
  459. %\put(470,60){\line(0,1){130}}
  460. %\put(470,190){\vector(-1,0){10}}
  461. \put(360,265){\vector(-1,0){105}}
  462. \put(255,250){NULL}
  463. \put(255,270){Return to interpreter}
  464. \end{picture}
  465. \caption{Control Flow of the Error Recovery Mechanism for Python}
  466. \label{wad}
  467. \end{figure*}
  468. \section{Returning to the Interpreter}
  469. To return to the interpreter, WAD maintains a table of symbolic names
  470. that correspond to locations within the interpreter
  471. responsible for invoking wrapper functions and object/type methods.
  472. For example, Table 1 shows a partial list of return locations used in
  473. the Python implementation. When an error occurs, the call stack is
  474. scanned for the first occurrence of any symbol in this table. If a
  475. match is found, control is returned to that location by emulating the
  476. return of a wrapper function with the error code from the table. If no
  477. match is found, the error handler simply prints a stack trace to
  478. standard output and aborts.
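The table lookup described above can be sketched as a simple scan. The entries below are copied from Table 1, but the {\tt struct return\_point} layout and the {\tt find\_return\_point} function are assumptions made for illustration. In particular, the stack is represented here as a plain array of symbol names, whereas the real system works with a richer stack trace structure.

```c
#include <string.h>

struct return_point {
    const char *symbol;   /* interpreter function that called into the extension */
    long error_value;     /* value the aborted call appears to return; 0 means NULL */
};

/* Entries taken from Table 1. */
static const struct return_point return_points[] = {
    { "call_builtin",            0 },
    { "PyObject_Print",         -1 },
    { "PyObject_CallFunction",   0 },
    { "PyObject_CallMethod",     0 },
    { "PyObject_CallObject",     0 },
    { "PyObject_Cmp",           -1 },
    { "PyObject_DelAttrString", -1 },
    { "PyObject_DelItem",       -1 },
    { "PyObject_GetAttrString",  0 },
};

/* Scan the call stack, innermost frame first (frames[0]), for the first
   symbol that appears in the table. On success the matching entry is
   returned and *depth is set to the index of the frame. Returns NULL
   if no frame matches. */
const struct return_point *
find_return_point(const char *const *frames, int nframes, int *depth) {
    int i;
    size_t j;

    for (i = 0; i < nframes; i++) {
        for (j = 0; j < sizeof return_points / sizeof return_points[0]; j++) {
            if (strcmp(frames[i], return_points[j].symbol) == 0) {
                *depth = i;
                return &return_points[j];
            }
        }
    }
    return NULL;
}
```

For the stack in Figure 1 ({\tt doh}, {\tt \_wrap\_doh}, {\tt call\_builtin}), the scan stops at {\tt call\_builtin} with an error value of NULL. The frames below the match are the ones that get discarded when control returns to the interpreter.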
  479. When a symbolic match is found, WAD invokes a special user-defined
  480. handler function that is written for a specific scripting language.
  481. The primary role of this handler is to take debugging information
  482. gathered from the call stack and generate an appropriate scripting
  483. language error. One peculiar problem of this step is that the
  484. generation of an error may require the use of parameters passed to a
  485. wrapper function. For example, in the Tcl wrapper shown earlier, one
  486. of the arguments was an object of type ``{\tt Tcl\_Interp *}''. This
  487. object contains information specific to the state of the interpreter
  488. (and multiple interpreter objects may exist in a single application).
  489. Unfortunately, no reference to the interpreter object is available in the
  490. signal handler nor is a reference to the interpreter guaranteed to exist in
  491. the context of a function that generated the error.
  492. To work around this problem, WAD implements a feature
  493. known as argument stealing. When examining the call-stack, the signal
  494. handler has full access to all function arguments and local variables of each function
  495. on the stack.
  496. Therefore, if the handler knows that an error was generated while
  497. calling a wrapper function (as determined by looking at the symbol names),
  498. it can grab the interpreter object from the stack frame of the wrapper and
  499. use it to set an appropriate error code before returning to the interpreter.
  500. Currently, this is managed by allowing the signal handler to steal
  501. arguments from the caller using positional information.
  502. For example, to grab the {\tt Tcl\_Interp *} object from a Tcl wrapper function,
  503. code similar to the following is written:
  504. \begin{verbatim}
  505. Tcl_Interp *interp;
  506. int err;
  507. interp = (Tcl_Interp *)
  508.     wad_steal_outarg(
  509.         stack,
  510.         "TclExecuteByteCode",
  511.         1,
  512.         &err
  513.     );
  514. ...
  515. if (!err) {
  516.     Tcl_SetResult(interp,errtype,...);
  517.     Tcl_AddErrorInfo(interp,errdetails);
  518. }
  519. \end{verbatim}
  520. In this case, the Tcl interpreter argument passed to a wrapper function
  521. is stolen and used to generate an error. Also, the name {\tt TclExecuteByteCode}
  522. refers to the calling function, not the wrapper function itself.
  523. At this time, argument stealing is only applicable to simple types
  524. such as integers and pointers. However, this appears to be adequate for generating
  525. scripting language errors.
  526. \begin{table}[t]
  527. \begin{center}
  528. \begin{tabular}{ll}
  529. Python symbol & Error return value \\ \hline
  530. call\_builtin & NULL \\
  531. PyObject\_Print & -1 \\
  532. PyObject\_CallFunction & NULL \\
  533. PyObject\_CallMethod & NULL \\
  534. PyObject\_CallObject & NULL \\
  535. PyObject\_Cmp & -1 \\
  536. PyObject\_DelAttrString & -1 \\
  537. PyObject\_DelItem & -1 \\
  538. PyObject\_GetAttrString & NULL \\
  539. \end{tabular}
  540. \end{center}
  541. \caption{A partial list of symbolic return locations in the Python interpreter}
  542. \label{returnpoints}
  543. \end{table}
  544. \section{Register Management}
  545. A final issue concerning the return mechanism has to do with the
  546. behavior of the non-local return to the interpreter. Roughly
  547. speaking, this emulates the C {\tt longjmp}
  548. library call. However, this is done without the use of a matching
  549. {\tt setjmp} in the interpreter.
  550. The primary problem with aborting execution and returning to the
  551. interpreter in this manner is that most compilers use a register
  552. management technique known as callee-save \cite{prag}. In this case,
  553. it is the responsibility of the called function to save the state of
  554. the registers and to restore them before returning to the caller. By
  555. making a non-local jump, registers may be left in an inconsistent
  556. state due to the fact that they are not restored to their original
  557. values. The {\tt longjmp} function in the C library avoids this
  558. problem by relying upon {\tt setjmp} to save the registers. Unfortunately,
  559. WAD does not have this luxury. As a result, a return from the signal
  560. handler may produce a corrupted set of registers at the point of return
  561. in the interpreter.
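For contrast, the conventional form of signal recovery, in which a matching {\tt sigsetjmp} saves the register state before the risky call, can be sketched as follows. This is not how WAD works, since it would require the interpreter itself to cooperate; it is shown only to make clear what the {\tt setjmp} side normally contributes. All function names here are hypothetical, and the ``crash'' is simulated with {\tt raise}.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Register state and signal mask saved by sigsetjmp. */
static sigjmp_buf recover_point;

static void on_fatal(int signo) {
    (void)signo;
    /* Restores the registers and signal mask saved in call_guarded. */
    siglongjmp(recover_point, 1);
}

/* Call fn with SIGSEGV recovery in place.
   Returns 0 if fn returned normally, -1 if it was aborted by SIGSEGV,
   and -2 if the handler could not be installed. */
int call_guarded(void (*fn)(void)) {
    struct sigaction sa, old;
    int status = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -2;

    if (sigsetjmp(recover_point, 1) == 0) {
        fn();            /* normal path */
    } else {
        status = -1;     /* reached through siglongjmp from the handler */
    }

    sigaction(SIGSEGV, &old, NULL);   /* put the previous handler back */
    return status;
}

/* Stand-ins for extension functions, used to exercise the sketch. */
void fake_crash(void) { raise(SIGSEGV); }
void no_crash(void)   { }
```

Here {\tt call\_guarded(fake\_crash)} returns $-1$ and {\tt call\_guarded(no\_crash)} returns 0. The price is that every call site must be wrapped in advance, which is exactly the modification to the interpreter that WAD is designed to avoid.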
  562. The severity of this problem depends greatly on the architecture and
  563. compiler. For example, on the SPARC, register windows effectively
  564. solve the callee-save problem \cite{sparc}. In this case, each stack
  565. frame has its own register window and the windows are flushed to the
  566. stack whenever a signal occurs. Therefore, the recovery mechanism can
  567. simply examine the stack and arrange to restore the registers to their
  568. proper values when control is returned. Furthermore, certain
  569. conventions of the SPARC ABI resolve several related issues. For
  570. example, floating point registers are caller-saved and the contents of
  571. the SPARC global registers are not guaranteed to be preserved across
  572. procedure calls (in fact, they are not even saved by {\tt setjmp}).
  573. On other platforms, the problem of register management becomes
  574. more interesting. In this case, a heuristic approach that examines
  575. the machine code for each function on the call stack can be used to
  576. determine where the registers might have been saved. This approach is
  577. used by gdb and other debuggers when they allow users to inspect
  578. register values within arbitrary stack frames \cite{gdb}. Even though
  579. this sounds complicated to implement, the algorithm is greatly
  580. simplified by the fact that compilers typically generate code to store
  581. the callee-save registers immediately upon the entry to each function.
  582. In addition, this code is highly regular and easy to examine. For
  583. instance, on i386-Linux, the callee-save registers can be restored by
  584. simply examining the first few bytes of the machine code for each
  585. function on the call stack to figure out where values have been saved.
  586. The following code shows a typical sequence of machine instructions
  587. used to store callee-save registers on i386-Linux:
  588. \begin{verbatim}
  589. foo:
  590. 55 pushl %ebp
  591. 89 e5 mov %esp, %ebp
81 ec a0 00 00 00 subl $0xa0,%esp
  593. 56 pushl %esi
  594. 57 pushl %edi
  595. ...
  596. \end{verbatim}
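A minimal sketch of such a prologue scan is shown below. It assumes the conventional frame setup ({\tt pushl \%ebp}, {\tt movl \%esp,\%ebp}, an optional {\tt subl} from {\tt \%esp}, then a run of one-byte pushes) and gives up on anything else. The names {\tt SavedRegs} and {\tt scan\_prologue} are illustrative and do not come from WAD or gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets of saved registers relative to %ebp; 0 means "not saved". */
typedef struct {
    int32_t ebx, esi, edi;
} SavedRegs;

/* Scan a conventional i386 prologue and record where callee-save registers
   were pushed.  Returns 0 on success, -1 if the prologue is not recognized. */
static int scan_prologue(const uint8_t *code, size_t len, SavedRegs *out)
{
    size_t  i;
    int32_t depth = 0;                      /* bytes below %ebp so far */

    assert(code != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    /* pushl %ebp ; movl %esp,%ebp */
    if (len < 3 || code[0] != 0x55 || code[1] != 0x89 || code[2] != 0xe5)
        return -1;
    i = 3;

    /* Optional subl $imm,%esp with an 8-bit or 32-bit immediate. */
    if (i + 3 <= len && code[i] == 0x83 && code[i + 1] == 0xec) {
        depth += code[i + 2] < 0x80 ? code[i + 2] : code[i + 2] - 256;
        i += 3;
    } else if (i + 6 <= len && code[i] == 0x81 && code[i + 1] == 0xec) {
        uint32_t imm = (uint32_t)code[i + 2]
                     | ((uint32_t)code[i + 3] << 8)
                     | ((uint32_t)code[i + 4] << 16)
                     | ((uint32_t)code[i + 5] << 24);
        depth += (int32_t)imm;
        i += 6;
    }

    /* One-byte pushes of the callee-save registers. */
    for (; i < len; i++) {
        int32_t *slot;
        switch (code[i]) {
        case 0x53: slot = &out->ebx; break;
        case 0x56: slot = &out->esi; break;
        case 0x57: slot = &out->edi; break;
        default:   return 0;                /* first non-push ends the scan */
        }
        depth += 4;
        *slot = -depth;
    }
    return 0;
}
```

Given the frame pointer of a stack frame, a saved register would then be read from the address formed by adding the recorded offset to {\tt \%ebp}.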
  597. %
  598. % Include an example
  599. %
  600. % more interesting. One approach is to simply ignore the problem
  601. % altogether and return to the interpreter with the registers in an
  602. % essentially random state. Surprisingly, this approach actually seems to work (although a considerable degree of
  603. % caution might be in order).
  604. % This is because the return of an error code tends to trigger
  605. % a cascade of procedure returns within the implementation of the interpreter.
  606. % As a result, the values of the registers are simply discarded and
  607. % overwritten with restored values as the interpreter unwinds itself and prepares to handle an
  608. % exception. A better solution to this problem is to modify the recovery mechanism to discover and
  609. % restore saved registers from the stack. Unfortunately, there is
  610. % no standardized way to know exactly where the registers might have been saved.
  611. % Therefore, a heuristic scheme that examines the machine code for each procedure would
  612. % have to be used to try and identify stack locations. This approach is used by gdb
  613. % and other debuggers when they allow users to inspect register values
  614. % within arbitrary stack frames \cite{gdb}. However, this technique has
  615. % not yet been implemented in WAD due to its obvious implementation difficulty and the
  616. % fact that the WAD prototype has primarily been developed for the SPARC.
  617. As a fall-back, WAD could be configured to return control to a location
previously specified with {\tt setjmp}. Unfortunately, this
requires modifications to either the interpreter or its extension modules.
  620. Although this kind of instrumentation could be facilitated by automatic
  621. wrapper code generators, it is not a preferred solution and is not discussed further.
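For comparison, the {\tt setjmp}-based fall-back can be sketched with the POSIX functions {\tt sigsetjmp} and {\tt siglongjmp}. The helper {\tt call\_protected} is a hypothetical name, the fault is simulated with {\tt raise}, and only SIGSEGV is handled. As with WAD, the jump does not unwind the stack cleanly.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recovery_point;
static volatile sig_atomic_t caught_signal = 0;

/* Record the signal and jump back to the saved recovery point. */
static void on_fault(int signo)
{
    caught_signal = signo;
    siglongjmp(recovery_point, 1);
}

/* Call fn() with a recovery point in place.  Returns 0 if fn() returned
   normally, or the signal number if a fault was caught. */
static int call_protected(void (*fn)(void))
{
    struct sigaction sa, old;

    assert(fn != NULL);
    sa.sa_handler = on_fault;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    caught_signal = 0;
    /* Passing 1 saves the signal mask so that siglongjmp() restores it. */
    if (sigsetjmp(recovery_point, 1) == 0)
        fn();

    sigaction(SIGSEGV, &old, NULL);
    return caught_signal;
}

/* Stand-ins for extension code: one raises the signal, one does not. */
static void faulty(void)  { raise(SIGSEGV); }
static void healthy(void) { }
```

An interpreter instrumented this way would need one such recovery point at every entry into extension code, which is the modification cost noted above.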
  622. \section{Initialization}
  623. To simplify the debugging of extension modules, it
  624. is desirable to make the use of WAD as transparent as possible.
  625. Currently, there are two ways in which the system is used. First, WAD
  626. may be explicitly loaded as a scripting language extension module.
  627. For instance, in Python, a user can include the statement {\tt import
  628. libwadpy} in a script to load the debugger. Alternatively, WAD can be
  629. enabled by linking it to an extension module as a shared
  630. library. For instance:
  631. \begin{verbatim}
  632. % ld -shared $(OBJS) -lwadpy
  633. \end{verbatim}
  634. In this latter case, WAD initializes itself whenever the extension module is
  635. loaded. The same shared library is used for both situations by making
  636. sure two types of initialization techniques are used. First, an empty
  637. initialization function is written to make WAD appear like a proper
  638. scripting language extension module (although it adds no functions to
  639. the interpreter). Second, the real initialization of the system is
  640. placed into the initialization section of the WAD shared library
object file (the ``init'' section of ELF files). This code always executes
when a library is loaded by the dynamic loader and is commonly used to
properly initialize C++ objects. Therefore, a fairly portable way
  644. to force code into the initialization section is to encapsulate the
  645. initialization in a C++ statically constructed object like this:
  646. \begin{verbatim}
  647. class InitWad {
  648. public:
  649. InitWad() { wad_init(); }
  650. };
  651. /* This forces InitWad() to execute
  652. on loading. */
  653. static InitWad init;
  654. \end{verbatim}
  655. The nice part about this technique is that it allows WAD to be enabled
  656. simply by linking or loading; no special initialization code needs to
  657. be added to an extension module to make it work. In addition, due to
  658. the way in which the loader resolves and initializes libraries, the
  659. initialization of WAD is guaranteed to execute before any of the code
  660. in the extension module to which it has been linked. The primary
downside to this approach is that the WAD shared object file cannot be
  662. linked directly to an interpreter. This is because WAD sometimes needs to call the
  663. interpreter to properly initialize its exception handling mechanism (for instance, in Python,
  664. four new types of exceptions are added to the interpreter). Clearly this type of initialization
  665. is impossible if WAD is linked directly to an interpreter as
its initialization process would execute before the main program of the
  667. interpreter started. However,
  668. if you wanted to permanently add WAD to an interpreter, the problem is easily
  669. corrected by first removing the C++ initializer from WAD and then replacing it with an explicit
  670. initialization call someplace within the interpreter's startup function.
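When a C++ compiler is not available, a similar load-time effect can be obtained in C with the {\tt constructor} function attribute. This is a GCC and Clang extension rather than standard C, and it is not the mechanism WAD uses; the body of {\tt wad\_init} below is a stand-in that merely sets a flag.

```c
#include <assert.h>

static int wad_initialized = 0;

/* Stand-in for the real start-up routine; here it only sets a flag. */
static void wad_init(void)
{
    assert(!wad_initialized);
    wad_initialized = 1;
}

/* GCC/Clang extension: run this function when the object is loaded,
   before main() starts or before dlopen() returns. */
__attribute__((constructor))
static void init_wad_on_load(void)
{
    wad_init();
}
```

The same ordering caveat applies: code run this way executes before the interpreter's main program, so it cannot call into an interpreter that has not yet started.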
  671. \section{Exception Objects}
  672. Before WAD returns control to the interpreter, it collects all of the
  673. stack-trace and debugging information it was able to obtain into a
  674. special exception object. This object represents the state of the call
  675. stack and includes things like symbolic names for each stack frame,
  676. the names, types, and values of function parameters and stack
  677. variables, as well as a complete copy of data on the stack. This
  678. information is represented in a generic manner that hides
  679. platform specific details related to the CPU, object file formats,
  680. debugging tables, and so forth.
  681. Minimally, the exception data is used to print a stack trace as shown
  682. in Figure 1. However, if the interpreter is successfully able to
  683. regain control, the contents of the exception object can be
  684. freely examined after an error has occurred. For example, a Python
  685. script could catch a segmentation fault and print debugging information
  686. like this:
  687. \begin{verbatim}
  688. try:
  689. # Some buggy code
  690. ...
  691. except SegFault,e:
  692. print 'Whoa!'
  693. # Get WAD exception object
  694. t = e.args[0]
  695. # Print location info
  696. print t.__FILE__
  697. print t.__LINE__
  698. print t.__NAME__
  699. print t.__SOURCE__
  700. ...
  701. \end{verbatim}
  702. Inspection of the exception object also makes it possible to write post mortem
  703. script debuggers that merge the call stacks of the two languages and
  704. provide cross language diagnostics. Figure 4 shows an
  705. example of a simple mixed language debugging session using the WAD
  706. post-mortem debugger (wpm) after an extension error has occurred in a
  707. Python program. In the figure, the user is first presented with a
  708. multi-language stack trace. The information in this trace is obtained
  709. both from the WAD exception object and from the Python traceback
  710. generated when the exception was raised. Next, we see the user walking
  711. up the call stack using the 'u' command of the debugger. As this
  712. proceeds, there is a seamless transition from C to Python where the
  713. trace crosses between the two languages. An optional feature of the
  714. debugger (not shown) allows the debugger to walk up the entire C
  715. call-stack (in this case, the trace shows information about the
  716. implementation of the Python interpreter). More advanced features of
  717. the debugger allow the user to query values of function
  718. parameters, local variables, and stack frames (although some of this
  719. information may not be obtainable due to compiler optimizations and the
  720. difficulties of accurately recovering register values).
  721. \begin{figure*}[t]
  722. {\small
  723. \begin{verbatim}
  724. [ Error occurred ]
  725. >>> from wpm import *
  726. *** WAD Debugger ***
  727. #5 [ Python ] in self.widget._report_exception() in ...
  728. #4 [ Python ] in Button(self,text="Die", command=lambda x=self: ...
  729. #3 [ Python ] in death_by_segmentation() in death.py, line 22
  730. #2 [ Python ] in debug.seg_crash() in death.py, line 5
  731. #1 0xfeee2780 in _wrap_seg_crash(self=0x0,args=0x18f114) in 'pydebug.c', line 512
  732. #0 0xfeee1320 in seg_crash() in 'debug.c', line 20
  733. int *a = 0;
  734. => *a = 3;
  735. return 1;
  736. >>> u
  737. #1 0xfeee2780 in _wrap_seg_crash(self=0x0,args=0x18f114) in 'pydebug.c', line 512
  738. if(!PyArg_ParseTuple(args,":seg_crash")) return NULL;
  739. => result = (int )seg_crash();
  740. resultobj = PyInt_FromLong((long)result);
  741. >>> u
  742. #2 [ Python ] in debug.seg_crash() in death.py, line 5
  743. def death_by_segmentation():
  744. => debug.seg_crash()
  745. >>> u
  746. #3 [ Python ] in death_by_segmentation() in death.py, line 22
  747. if ty == 1:
  748. => death_by_segmentation()
  749. elif ty == 2:
  750. >>> \end{verbatim}
  751. }
  752. \caption{Cross-language debugging session in Python where a user is walking a mixed language call stack.}
  753. \end{figure*}
  754. \section{Implementation Details}
Currently, WAD is implemented in ANSI C and a small amount of assembly
  756. code to assist in the return to the interpreter. The current
  757. implementation supports Python and Tcl extensions on SPARC Solaris and
  758. i386-Linux. Each scripting language is currently supported by a
  759. separate shared library such as {\tt libwadpy.so} and {\tt
  760. libwadtcl.so}. In addition, a language neutral library {\tt
  761. libwad.so} can be linked against non-scripted applications (in which case
  762. a stack trace is simply printed to standard error when a problem occurs).
  763. The entire implementation contains approximately 2000
  764. semicolons. Most of this code pertains to the gathering of debugging
  765. information from object files. Only a small part of the code is
  766. specific to a particular scripting language (170 semicolons for Python
  767. and 50 semicolons for Tcl).
  768. Although there are libraries such as the GNU Binary File Descriptor
  769. (BFD) library that can assist with the manipulation of object files,
  770. these are not used in the implementation \cite{bfd}. These
  771. libraries tend to be quite large and are oriented more towards
  772. stand-alone tools such as debuggers, linkers, and loaders. In addition,
  773. the behavior of these libraries with respect to memory management
  774. would need to be carefully studied before they could be safely used in
  775. an embedded environment. Finally, given the small size of the prototype
  776. implementation, it didn't seem necessary to rely upon such a
  777. heavyweight solution.
  778. A surprising feature of the implementation is that a significant
  779. amount of the code is language independent. This is achieved by
  780. placing all of the process introspection, data collection, and
  781. platform specific code within a centralized core. To provide a
  782. specific scripting language interface, a developer only needs to
supply two things: a table containing symbolic function names where
  784. control can be returned (Table 1), and a handler function in the form
  785. of a callback. As input, this handler receives an exception object as
  786. described in an earlier section. From this, the handler can
  787. raise a scripting language exception in whatever manner is most
  788. appropriate.
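The division of labor described above might look roughly as follows in C. All names here ({\tt WadFrame}, {\tt WadHandler}, {\tt wad\_set\_handler}, {\tt wad\_dispatch}) are assumptions made for illustration; the real exception object carries far more information than this reduced frame structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced view of one stack frame. */
typedef struct WadFrame {
    const char      *symbol;   /* function name */
    const char      *file;     /* source file, if known */
    int              line;     /* source line, if known */
    struct WadFrame *next;     /* caller's frame, or NULL */
} WadFrame;

/* Language-specific callback: receives the signal number and the
   innermost frame of the collected trace. */
typedef void (*WadHandler)(int signo, const WadFrame *trace);

static WadHandler current_handler = NULL;

static void wad_set_handler(WadHandler h)
{
    current_handler = h;
}

/* Called by the language-neutral core once the trace has been collected. */
static void wad_dispatch(int signo, const WadFrame *trace)
{
    assert(trace != NULL);
    if (current_handler != NULL)
        current_handler(signo, trace);
}

/* Example handler that only records what it was given; a real one would
   raise a Python or Tcl exception at this point. */
static int         last_signo  = 0;
static const char *last_symbol = NULL;

static void record_handler(int signo, const WadFrame *trace)
{
    last_signo  = signo;
    last_symbol = trace->symbol;
}
```

Keeping the callback this narrow is what allows the language-specific part to stay small relative to the core.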
  789. Significant portions of the core are also relatively straightforward
  790. to port between different Unix systems. For instance, code to read
  791. ELF object files and stabs debugging data is essentially identical for
  792. Linux and Solaris. In addition, the high-level control logic is
  793. unchanged between platforms. Platform specific differences primarily
  794. arise in the obvious places such as the examination of CPU
  795. registers, manipulation of the process context in the signal handler,
reading virtual memory maps from /proc, and so forth. Additional
changes would also need to be made on systems with different object
file and debugging formats such as COFF and DWARF2. To the extent that it is possible,
  799. these differences could be hidden by abstraction mechanisms (although
  800. the initial implementation of WAD is weak in this regard and would
  801. benefit from techniques used in more advanced debuggers such as gdb).
  802. Despite these porting issues, the primary requirement for WAD is a fully
  803. functional implementation of SVR4 signal handling that allows for
  804. modifications of the process context.
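The kind of signal handling WAD depends on is requested with the {\tt SA\_SIGINFO} flag of {\tt sigaction}, which causes the handler to receive a pointer to the saved process context. The sketch below only records that a context was delivered; modifying the saved registers is platform specific and is not shown. The function names are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t seen_signo  = 0;
static volatile sig_atomic_t had_context = 0;

/* Three-argument handler: the third argument points to a ucontext_t
   describing the interrupted register state. */
static void fault_handler(int signo, siginfo_t *info, void *context)
{
    (void)info;
    seen_signo  = signo;
    had_context = (context != NULL);
}

/* Install the handler for one signal; returns 0 on success, -1 on error. */
static int install_fault_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

For a harmless trial run, the handler can be installed for SIGUSR1 and triggered with {\tt raise(SIGUSR1)}; a tool like WAD would install it for SIGSEGV, SIGBUS, and similar signals instead.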
  805. Due to the heavy dependence on Unix signal handling, process
  806. introspection, and object file formats, it is unlikely that WAD could
  807. be easily ported to non-Unix systems such as Windows. However, it may
  808. be possible to provide a similar capability using advanced features of
  809. Windows structured exception handling \cite{seh}. For instance, structured
  810. exception handlers can be used to catch hardware faults, they can
  811. receive process context information, and they can arrange to take
  812. corrective action much like the signal implementation described here.
  813. \section{Modification of Interpreters?}
  814. A logical question to ask about the implementation of WAD is whether
  815. or not it would make sense to modify existing interpreters to assist
in the recovery process. For instance, instrumenting Python or Tcl with {\tt setjmp}
calls might simplify the implementation since it would eliminate
  818. issues related to register restoration and finding a suitable return
  819. location.
  820. Although it may be possible to make these changes, there are
  821. several drawbacks to this approach. First, the number of required modifications may be
  822. quite large. For instance, there are well over 50 entry points to
  823. extension code within the implementation of Python. Second, an
  824. extension module may perform callbacks and evaluation of script code.
  825. This means that the call stack would cross back and forth
  826. between languages and that these modifications would have to be made
  827. in a way that allows arbitrary nesting of extension calls. Finally,
  828. instrumenting the code in this manner may introduce a performance
  829. impact--a clearly undesirable side effect considering the infrequent
  830. occurrence of fatal extension errors.
  831. \section{Discussion}
  832. The primary goal of embedded error recovery is to provide an
  833. alternative approach for debugging scripting language extensions.
Although this approach has many benefits, there are a number of
drawbacks and issues that must be discussed.
  836. First, like the C {\tt longjmp} function, the error recovery mechanism
  837. does not cleanly unwind the call stack. For C++, this means that
objects allocated on the stack will not be finalized (destructors will not
be invoked) and that memory allocated on the heap may be
leaked. Similarly, open files, sockets, and other
system resources may never be released. In a multi-threaded environment,
  842. deadlock may occur if a procedure holds a lock when an error occurs.
  843. In certain cases, the use of signals in WAD may interact adversely with scripting
  844. language signal handling. Since scripting languages ordinarily do not catch signals such as
  845. SIGSEGV, SIGBUS, and SIGABRT, the use of WAD is unlikely to conflict
  846. with any existing signal handling. However, most scripting languages would not
  847. prevent a user from disabling the WAD error recovery mechanism by
  848. simply specifying a new handler for one or more of these signals. In addition, the use of
  849. certain extensions such as the Perl sigtrap module would completely
  850. disable WAD \cite{perl}.
  851. A more difficult signal handling problem arises when thread libraries
  852. are used. These libraries tend to override default signal handling
  853. behavior in a way that defines how signals are delivered to each
  854. thread \cite{thread}. In general, asynchronous signals can be
  855. delivered to any thread within a process. However, this does not
  856. appear to be a problem for WAD since hardware exceptions are delivered
  857. to a signal handler that runs within the same thread in which the
  858. error occurred. Unfortunately, even in this case, personal experience has
  859. shown that certain implementations of user thread libraries (particularly on older versions
  860. of Linux) do not reliably pass
  861. signal context information nor do they universally support advanced
  862. signal operations such as {\tt sigaltstack}. Because of this, WAD may
  863. be incompatible with a crippled implementation of user threads on
  864. these platforms.
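Whether a platform honors {\tt sigaltstack} can be checked with a small test such as the following sketch. It installs an alternate stack, requests it with {\tt SA\_ONSTACK}, and has the handler test whether one of its local variables lies inside that stack. The 64 KB stack size and the function names are arbitrary choices made for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

static char alt_stack[64 * 1024];
static volatile sig_atomic_t ran_on_alt_stack = 0;

/* The local variable lives on whichever stack the handler is running on. */
static void stack_probe_handler(int signo)
{
    char      marker = 0;
    uintptr_t p  = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;

    (void)signo;
    ran_on_alt_stack = (p >= lo && p < lo + sizeof(alt_stack));
}

/* Register the alternate stack and a handler that runs on it.
   Returns 0 on success, -1 on error. */
static int install_on_alt_stack(int signo)
{
    stack_t          ss;
    struct sigaction sa;

    assert(signo > 0);
    ss.ss_sp    = alt_stack;
    ss.ss_size  = sizeof(alt_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    sa.sa_handler = stack_probe_handler;
    sa.sa_flags   = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

If the flag stays zero after a signal has been delivered, the thread library or kernel in use is not switching stacks as requested.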
An even more subtle problem with threads is that the recovery process
  866. itself is not thread-safe (i.e., it is not possible to concurrently
  867. handle fatal errors occurring in different threads). For most
  868. scripting language extensions, this limitation does not apply due to
  869. strict run-time restrictions that interpreters currently place on
  870. thread support. For instance, even though Python supports threaded
  871. programs, it places a global mutex-lock around the interpreter that
makes it impossible for more than one thread to execute
  873. within the interpreter at once. A consequence of this restriction is
  874. that extension functions are not interruptible by thread-switching
  875. unless they explicitly release the interpreter lock. Currently, the
  876. behavior of WAD is undefined if extension code releases the lock and
  877. proceeds to generate a fault. In this case, the recovery process may
  878. either cause an exception to be raised in an entirely different
thread or cause execution to violate the interpreter's mutual exclusion
constraint.
  881. In certain cases, errors may result in an unrecoverable crash. For
  882. example, if an application overwrites the heap, it may destroy
  883. critical data structures within the interpreter. Similarly,
  884. destruction of the call stack (via buffer overflow) makes it
  885. impossible for the recovery mechanism to create a stack-trace and
  886. return to the interpreter. More subtle memory management problems
  887. such as double-freeing of heap allocated memory can also cause a system
to fail in a manner that bears little resemblance to the actual source
  889. of the problem. Given that WAD lives in the same process as the
faulting application and that such errors may occur, a common
question to ask is to what extent WAD complicates debugging when it
fails.
  893. To handle potential problems in the implementation of WAD itself,
  894. great care is taken to avoid the use of library functions and
  895. functions that rely on heap allocation (malloc, free, etc.). For
  896. instance, to provide dynamic memory allocation, WAD implements its own
  897. memory allocator using mmap. In addition, signals are disabled
  898. immediately upon entry to the WAD signal handler. Should a fatal
  899. error occur inside WAD, the application will dump core and exit. Since
  900. the resulting core file contains the stack trace of both WAD and the
  901. faulting application, a traditional C debugger can be used to identify
  902. the problem as before. The only difference is that a few additional
  903. stack frames will appear on the traceback.
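A simple way to obtain memory without touching {\tt malloc} is a bump allocator over a single anonymous mapping. The following sketch is not WAD's allocator; the arena size, the 16-byte alignment, and the name {\tt arena\_alloc} are assumptions, and memory is never handed back to the system.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_SIZE ((size_t)1024 * 1024)

static char  *arena_base = NULL;
static size_t arena_used = 0;

/* Bump allocator backed by one anonymous mapping; it never calls malloc()
   and never frees.  Returns NULL if the request cannot be satisfied. */
static void *arena_alloc(size_t nbytes)
{
    void *result;

    if (nbytes == 0 || nbytes > ARENA_SIZE)
        return NULL;

    if (arena_base == NULL) {
        void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        arena_base = p;
    }

    assert(arena_used <= ARENA_SIZE);
    nbytes = (nbytes + 15) & ~(size_t)15;   /* round up to 16 bytes */
    if (nbytes > ARENA_SIZE - arena_used)
        return NULL;

    result = arena_base + arena_used;
    arena_used += nbytes;
    return result;
}
```

Because the arena is independent of the C library heap, it remains usable even when the fault being handled was caused by heap corruption.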
  904. An application may also fail after the WAD signal handler has completed
  905. execution if memory or stack frames within the interpreter have been
  906. corrupted in a way that prevents proper exception handling. In this case, the
  907. application may fail in a manner that does not represent the original
  908. programming error. It might also cause the WAD signal handler to be
  909. immediately reinvoked with a different process state--causing it to
  910. report information about a different type of failure. To address
  911. these kinds of problems, WAD creates a tracefile {\tt
  912. wadtrace} in the current working directory that contains information
  913. about each error that it has handled. If no recovery was possible, a
  914. programmer can look at this file to obtain all of the stack traces
  915. that were generated.
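Appending to such a trace file from within a signal handler is best done with raw system calls. The sketch below uses {\tt open}, {\tt write}, and {\tt close}; the function name {\tt trace\_append} and the record format are hypothetical, since the layout of {\tt wadtrace} is not described here.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one text record to the file at path, creating it if needed.
   Returns 0 on success, -1 on failure. */
static int trace_append(const char *path, const char *record)
{
    size_t len, done = 0;
    int    fd;

    assert(path != NULL && record != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    len = strlen(record);
    while (done < len) {
        ssize_t n = write(fd, record + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

Opening with {\tt O\_APPEND} means each handled error adds a record after the previous ones, so a sequence of cascading failures is preserved in order.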
  916. If an application is experiencing a very serious problem, WAD
  917. does not prevent a standard debugger from being attached to the
  918. process. This is because the debugger overrides the current signal
  919. handling so that it can catch fatal errors. As a result, even if WAD
  920. is loaded, fatal signals are simply redirected to the attached
  921. debugger. Such an approach also allows for more complex debugging
  922. tasks such as single-step execution, breakpoints, and
  923. watchpoints--none of which are easily added to WAD itself.
  924. %
  925. % Add comments about what WAD does in this case?
  926. %
  927. Finally, there are a number of issues that pertain
  928. to the interaction of the recovery mechanism with the interpreter.
  929. For instance, the recovery scheme is unable to return to procedures
  930. that might invoke wrapper functions with conflicting return codes.
  931. This problem manifests itself when the interpreter's virtual
  932. machine is built around a large {\tt switch} statement from which different
  933. types of wrapper functions are called. For example, in Python, certain
  934. internal procedures call a mix of functions where both NULL and -1 are
  935. returned to indicate errors (depending on the function). In this case, there
  936. is no way to specify a proper error return value because there will be
  937. conflicting entries in the WAD return table (although you could compromise and
  938. return the error value for the most common case). The recovery
  939. process is also extremely inefficient due to its heavy reliance on
  940. {\tt mmap}, file I/O, and linear search algorithms for finding symbols
and debugging information. Therefore, WAD would be
unsuitable as a more general purpose extension related exception handler.
  943. Despite these limitations, embedded error recovery is still a useful
  944. capability that can be applied to a wide variety of extension related
  945. errors. This is because errors such as failed assertions, bus errors,
  946. and floating point exceptions rarely result in a situation where the
  947. recovery process would be unable to run or the interpreter would
  948. crash. Furthermore, more serious errors such as segmentation faults
are more likely to be caused by an uninitialized pointer than a blatant
  950. destruction of the heap or stack.
  951. \section{Related Work}
  952. A huge body of literature is devoted to the topic of exception
  953. handling in various languages and systems. Furthermore, the topic
  954. remains one of active interest in the software community. For
  955. instance, IEEE Transactions on Software Engineering recently devoted
  956. two entire issues to current trends in exception handling
  957. \cite{except1,except2}. Unfortunately, very little of this work seems
  958. to be directly related to mixed compiled-interpreted exception
  959. handling, recovery from fatal signals, and problems pertaining to
  960. mixed-language debugging.
  961. Perhaps the most directly relevant work is that of advanced programming
  962. environments for Common Lisp \cite{lisp}. Not only does CL have a foreign function interface,
but debuggers such as gdb have also previously been modified to walk the Lisp stack
  964. \cite{ffi,wcl}. Furthermore, certain Lisp development environments have
  965. previously provided a high degree of integration between compiled code and
the Lisp interpreter \cite{gabriel}.
  967. In certain cases, a scripting language module has been used to provide
  968. partial information for fatal signals. For example, the Perl {\tt
  969. sigtrap} module can be used to produce a Perl stack trace when a
  970. problem occurs \cite{perl}. Unfortunately, this module does not
  971. provide any information from the C stack. Similarly, advanced software development
  972. environments such as Microsoft's Visual Studio can automatically launch a C/C++
  973. debugger when an error occurs. Unfortunately, this doesn't provide any information
  974. about the script that was running.
  975. In the area of programming languages, a number of efforts have been made to
  976. map signals to exceptions in the form of asynchronous exception handling
  977. \cite{buhr,ml,haskell}. Unfortunately, this work tends to
  978. concentrate on the problem of handling asynchronous signals related to I/O as opposed
  979. to synchronously generated signals caused by software faults.
  980. With respect to debugging, little work appears to have been done in the area of
  981. mixed compiled-interpreted debugging. Although modern debuggers
  982. certainly try to provide advanced capabilities for debugging within a
  983. single language, they tend to ignore the boundary between languages.
  984. As previously mentioned, debuggers have occasionally been modified to
  985. support other languages such as Common Lisp \cite{wcl}. However, little work appears
  986. to have been done in the context of modern scripting languages. One system of possible interest
  987. in the context of mixed compiled-interpreted debugging is the R$^{n}$
  988. system developed at Rice University in the mid-1980's \cite{carle}. This
  989. system, primarily developed for scientific computing, allowed control
  990. to transparently pass between compiled code and an interpreter.
  991. Furthermore, the system allowed dynamic patching of an executable in
  992. which compiled procedures could be replaced by an interpreted
  993. replacement. Although this system does not directly pertain to the problem of
  994. debugging of scripting language extensions, it is one of the few
  995. examples of a system in which compiled and interpreted code have been
  996. tightly integrated within a debugger.
More recently, a couple of efforts have emerged that seem to
  998. address certain issues related to mixed-mode debugging of interpreted
  999. and compiled code. PyDebug is a recently developed system that focuses
  1000. on problems related to the management of breakpoints in Python
  1001. extension code \cite{pydebug}. It may also be possible to perform
  1002. mixed-mode debugging of Java and native methods using features of the
  1003. Java Platform Debugger Architecture (JPDA) \cite{jpda}. Mixed-mode
  1004. debugging support for Java may also be supported in advanced debugging systems
  1005. such as ICAT \cite{icat}.
  1006. However, none of these systems appear to have taken the approach of
  1007. converting hardware faults into Java errors or exceptions.
  1008. \section{Future Directions}
  1009. As of this writing, WAD is only an experimental prototype. Because of
  1010. this, there are certainly a wide variety of incremental improvements
  1011. that could be made to support additional platforms and scripting
  1012. languages. In addition, there are a variety of improvements that could be made
  1013. to provide better integration with threads and C++. One could also
  1014. investigate heuristic schemes such as backward stack tracing that might be able
  1015. to recover partial debugging information from corrupted call stacks \cite{debug}.
  1016. A more interesting extension of this work would be to see how the
  1017. exception handling approach of WAD could be incorporated with
  1018. the integrated development environments and script-level debugging
  1019. systems that have already been developed. For instance, it would be interesting
  1020. to see if a graphical debugging front-end such as DDD could be modified
  1021. to handle mixed-language stack traces within the context of a script-level debugger \cite{ddd}.
  1022. It may also be possible to extend the approach taken by WAD to other
  1023. types of extensible systems. For instance, if one were developing a
  1024. new server module for the Apache web-server, it might be possible to redirect fatal
  1025. module errors back to the server in a way that produces a webpage with
  1026. a stack trace \cite{apache}. The exception handling approach may also have
  1027. applicability to situations where compiled code is used to build software
  1028. components that are used as part of a large distributed system.
  1029. \section{Conclusions and Availability}
  1030. This paper has presented a mechanism by which fatal errors such as
  1031. segmentation faults and failed assertions can be handled as scripting
  1032. language exceptions. This approach, which relies upon advanced
  1033. features of Unix signal handling, allows fatal signals to be caught
  1034. and transformed into errors from which interpreters can produce an
  1035. informative cross-language stack trace. In doing so, it provides more
  1036. seamless integration between scripting languages and compiled
  1037. extensions. Furthermore, this has the potential to greatly simplify the
  1038. frustrating task of debugging complicated mixed scripted-compiled
  1039. software.
The prototype implementation of this system is available at:
  1041. \begin{center}
  1042. {\tt http://systems.cs.uchicago.edu/wad}.
  1043. \end{center}
  1044. \noindent
  1045. Currently, WAD supports Python and Tcl on SPARC Solaris and i386-Linux
  1046. systems. Work to support additional scripting languages and platforms
  1047. is ongoing.
  1048. \section{Acknowledgments}
  1049. Richard Gabriel and Harlan Sexton provided interesting insights
  1050. concerning debugging capabilities in Common Lisp. Stephen Hahn
  1051. provided useful information concerning the low-level details of signal
  1052. handling on Solaris. I would also like to thank the technical
  1053. reviewers and Rob Miller for their useful comments.
  1054. \begin{thebibliography}{99}
  1055. \bibitem{ousterhout} J. K. Ousterhout, {\em Tcl: An Embeddable Command Language},
  1056. Proceedings of the USENIX Association Winter Conference, p. 133-146, 1990.
  1057. \bibitem{perl} L. Wall, T. Christiansen, and R. Schwartz, {\em Programming Perl}, 2nd Ed.,
  1058. O'Reilly \& Associates, 1996.
  1059. \bibitem{python} M. Lutz, {\em Programming Python}, O'Reilly \& Associates, 1996.
  1060. \bibitem{guile} Thomas Lord, {\em An Anatomy of Guile, The Interface to
  1061. Tcl/Tk}, USENIX 3rd Annual Tcl/Tk Workshop 1995.
  1062. \bibitem{php} T. Ratschiller and T. Gerken, {\em Web Application Development with PHP 4.0},
  1063. New Riders, 2000.
  1064. \bibitem{ruby} D. Thomas and A. Hunt, {\em Programming Ruby}, Addison-Wesley, 2001.
  1065. \bibitem{swig} D.M. Beazley, {\em SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++}, Proceedings of the 4th USENIX Tcl/Tk Workshop, p. 129-139, July 1996.
  1066. \bibitem{sip} P. Thompson, {\em SIP},\\
  1067. {\tt http://www.thekompany.com/ projects/pykde}.
  1068. \bibitem{pyfort} P.~F.~Dubois, {\em Climate Data Analysis Software}, 8th International Python Conference,
  1069. Arlington, VA, 2000.
  1070. \bibitem{f2py} P. Peterson, J. Martins, and J. Alonso,
  1071. {\em Fortran to Python Interface Generator with an application to Aerospace
  1072. Engineering}, 9th International Python Conference, submitted, 2000.
  1073. \bibitem{advperl} S. Srinivasan, {\em Advanced Perl Programming}, O'Reilly \& Associates, 1997.
  1074. \bibitem{heidrich} Wolfgang Heidrich and Philipp Slusallek, {\em Automatic Generation of Tcl Bindings for C and C++ Libraries},
  1075. USENIX 3rd Tcl/Tk Workshop, 1995.
  1076. \bibitem{vtk} K. Martin, {\em Automated Wrapping of a C++ Class Library into Tcl},
  1077. USENIX 4th Tcl/Tk Workshop, p. 141-148, 1996.
  1078. \bibitem{gwrap} C. Lee, {\em G-Wrap: A tool for exporting C libraries into Scheme Interpreters},\\
  1079. {\tt http://www.cs.cmu.edu/\~{ }chrislee/
  1080. Software/g-wrap}.
  1081. \bibitem{wrappy} G. Couch, C. Huang, and T. Ferrin, {\em Wrappy: A Python Wrapper
  1082. Generator for C++ Classes}, O'Reilly Open Source Software Convention, 1999.
  1083. \bibitem{ouster1} J. K. Ousterhout, {\em Scripting: Higher-Level Programming for the 21st Century},
  1084. IEEE Computer, Vol 31, No. 3, p. 23-30, 1998.
  1085. \bibitem{gdb} R. Stallman and R. Pesch, {\em Using GDB: A Guide to the GNU Source-Level Debugger}.
  1086. Free Software Foundation and Cygnus Support, Cambridge, MA, 1991.
  1087. \bibitem{swigexcept} D.M. Beazley and P.S. Lomdahl, {\em Feeding a
  1088. Large-scale Physics Application to Python}, 6th International Python
  1089. Conference, co-sponsored by USENIX, p. 21-28, 1997.
  1090. \bibitem{stevens} W. Richard Stevens, {\em UNIX Network Programming: Interprocess Communication, Volume 2}. PTR
  1091. Prentice-Hall, 1998.
  1092. \bibitem{proc} R. Faulkner and R. Gomes, {\em The Process File System and Process Model in UNIX System V}, USENIX Conference Proceedings,
  1093. January 1991.
  1094. \bibitem{elf} J.~R.~Levine, {\em Linkers \& Loaders.} Morgan Kaufmann Publishers, 2000.
  1095. \bibitem{stabs} Free Software Foundation, {\em The ``stabs'' debugging format}. GNU info document.
  1096. \bibitem{prag} M.L. Scott. {\em Programming Language Pragmatics}, Morgan Kaufmann Publishers, 2000.
  1097. \bibitem{sparc} D. Weaver and T. Germond, {\em SPARC Architecture Manual Version 9},
  1098. Prentice-Hall, 1993.
  1099. \bibitem{bfd} S. Chamberlain. {\em libbfd: The Binary File Descriptor Library}. Cygnus Support, bfd version 3.0 edition, April 1991.
  1100. \bibitem{seh} M. Pietrek, {\em A Crash Course on the Depths of Win32 Structured Exception Handling},
  1101. Microsoft Systems Journal, January 1997.
  1102. \bibitem{thread} F. Mueller, {\em A Library Implementation of POSIX Threads Under Unix},
  1103. USENIX Winter Technical Conference, San Diego, CA, p. 29-42, 1993.
  1104. \bibitem{debug} J. B. Rosenberg, {\em How Debuggers Work: Algorithms, Data Structures, and
  1105. Architecture}, John Wiley \& Sons, 1996.
  1106. \bibitem{except1} D.E. Perry, A. Romanovsky, and A. Tripathi, {\em
  1107. Current Trends in Exception Handling-Part I},
  1108. IEEE Transactions on Software Engineering, Vol 26, No. 9, p. 817-819, 2000.
  1109. \bibitem{except2} D.E. Perry, A. Romanovsky, and A. Tripathi, {\em
  1110. Current Trends in Exception Handling-Part II},
  1111. IEEE Transactions on Software Engineering, Vol 26, No. 10, p. 921-922, 2000.
  1112. \bibitem{lisp} G.L. Steele Jr., {\em Common Lisp: The Language, Second Edition}, Digital Press,
  1113. Bedford, MA, 1990.
  1114. \bibitem{gabriel} R. Gabriel, private correspondence.
  1115. \bibitem{ffi} H. Sexton, {\em Foreign Functions and Common Lisp}, in Lisp Pointers, Vol 1, No. 5, 1988.
  1116. \bibitem{wcl} W. Hennessey, {\em WCL: Delivering Efficient Common Lisp Applications Under Unix},
  1117. ACM Conference on Lisp and Functional Programming, p. 260-269, 1992.
  1118. \bibitem{buhr} P.A. Buhr and W.Y.R. Mok, {\em Advanced Exception Handling Mechanisms}, IEEE Transactions on Software Engineering,
  1119. Vol. 26, No. 9, p. 820-836, 2000.
  1120. \bibitem{haskell} S. Marlow, S. Peyton Jones, and A. Moran. {\em
  1121. Asynchronous Exceptions in Haskell.} In 4th International Workshop on
  1122. High-Level Concurrent Languages, September 2000.
  1123. \bibitem{ml} J. H. Reppy, {\em Asynchronous Signals in Standard ML}. Technical Report TR90-1144,
  1124. Cornell University, Computer Science Department, 1990.
  1125. \bibitem{carle} A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren,
  1126. {\em A Practical Environment for Scientific Programming.}
  1127. IEEE Computer, Vol 20, No. 11, p. 75-89, 1987.
  1128. \bibitem{pydebug} P. Stoltz, {\em PyDebug, a New Application for Integrated
  1129. Debugging of Python with C and Fortran Extensions}, O'Reilly Open Source Software Convention,
  1130. San Diego, 2001 (to appear).
  1131. \bibitem{jpda} Sun Microsystems, {\em Java Platform Debugger Architecture},
  1132. {\tt http://java.sun.com/products/jpda}.
  1133. \bibitem{icat} IBM, {\em ICAT Debugger}, \\
  1134. {\tt http://techsupport.services.ibm.com/icat}.
  1135. \bibitem{ddd} A. Zeller, {\em Visual Debugging with DDD}, Dr. Dobb's Journal, March, 2001.
  1136. \bibitem{apache} {\em Apache HTTP Server Project}, \\
  1137. {\tt http://httpd.apache.org/}
  1138. \end{thebibliography}
  1139. \end{document}