implementation.tex | searchcode

/report2/implementation.tex

https://github.com/khskrede/mehh
LaTeX | 336 lines | 256 code | 80 blank | 0 comment | 0 complexity | a8139bb782b4f64c5a184e90ab5056a7 MD5 | raw file



\chapter{System description}
\label{chap:impl}

This chapter discusses the implementation of the project with
references to the source tree (figure \ref{fig:overview}).
Note that the source tree has been refactored by E.W. Thomassen
to better fit with other PyPy projects. This subsection discusses how
and where the specific functionality is implemented. Note that there
are a number of problems with the current implementation, and that
these are discussed in chapter \ref{chap:conc}.

\section{Toplevel}

The toplevel of the source tree contains a "readme" file, a folder
called "pyhaskell" where the main interpreter functionality is 
implemented, and a file called "targethaskellstandalone.py" that 
defines the target and entry point for the PyPy translation tool.

\section{The main functionality}

The pyhaskel folder contains 3 main programs. "main.py",
"makegraph.py" and "run\_tests.py".

"main.py" is the entry point of the compilation system. It defines
the target for the RPython translation tool, imports the \emph{builtin}
functionality and the main function begins the interpretation of the
JSCore program.

The program "makegraph.py" contains a JSON parser and dumps the resulting
JSON tokens to a dot file in order to generate a graph using the graphviz
tool.

The "runtests.py" program contains a program that executes all the programs
in the sub folder "test" and then prints the result of the tests.

\begin{sidewaysfigure}
\begin{figure}[H]
\centering
\includegraphics[width=\textheight]{../diags/overview.png}

\caption{Overview of source tree}
\label{fig:overview}

\end{figure}
\end{sidewaysfigure}

The sub folder "interpreter" contains the actual
interpreter code, the parser and the module system implementation. These are
discussed in the following sections.

\subsection*{Interpreter}

The file "haskell.py" in the folder "interpreter" contains the
Haskell-Python interpreter, the base of the compilation system.
This subsection describes the keys of the Haskell-Python implementation.

Haskell-Python consists of a few base classes; 

\begin{itemize}

\item \emph{Symbol} contains a static dictionary of all Symbols. This is 
simply used to compare "names" by their object identity.

\item \emph{HaskellObject} is a base class for all objects handled by the
interpreter.

\item \emph{Value} is a base class for evaluated values.

\item \emph{Constructor} inherits from \emph{Value} and implements an abstract 
base class for constructors with different number of arguments.

\item \emph{ConstructorN} inherits from \emph{Constructor} and is used as the 
base for a number of other Constructor classes created by a function called 
\emph{make\_arg\_subclass}. This function is again called by another function
called \emph{make\_constructor} that creates a Constructor based on the length
of the list it receives as argument. This function (\emph{make\_constructor})
is called by another function \emph{constr} that takes the name-argument (a string)
and looks it up in the list of symbols to create the actual Constructor. So the
actual Constructor is created by a coll to the function \emph{constr(name, *args)}.

\item \emph{AbstractFunction} inherits from \emph{Value} and is the base class for 
the functions of Haskell-Python. 

\item \emph{Function} inherits from AbstractFunction and defines a user-defined 
Haskell-Python function; it contains a name and a list of \emph{Rules}

\item \emph{Rule} is a field of the Function. It consists of a list of patterns and
an expression. If the function is applied and its arguments matches a pattern, the
expression is evaluated.

\item \emph{Substitution} inherits from HaskellObject and is the body of a function 
with its numbered variables substituted by values.

\item \emph{PrimFunction} inherits from \emph{AbstractFunction} and describes a function
implemented at the machine level. A function called \emph{expose\_primitive} is
be used as a python-decorator to create \emph{PrimFunction} objects.

\item \emph{Var} inherits from \emph{HaskellObject} and describes a Haskell variable. 

\item \emph{NumberedVar} inherits from \emph{HaskellObject}. A \emph{Var} is replaced
by a \emph{NumberedVar} by a call object-function \emph{enumerate\_head} inherited from
\emph{HaskellObject} when creating a \emph{Rule}.

\item \emph{Application} inherits from \emph{HaskellObject} and describes an
abstract base class for Haskell-Python function application. Like the constructor,
classes are created by a function for various numbers of arguments.

\item \emph{ApplicationN} inherits from \emph{Application} and is used by the function
\emph{make\_application} to create Application classes with various numbers of arguments.
This function, like \emph{make\_constructor} is used by the function 
\emph{make\_arg\_subclasses} to create Applications with various number of arguments.

\item \emph{Thunk} inherits from \emph{HaskellObject} and represents an unevaluated
function application. 

\item \emph{StackElement} is a base class for the elements of the evaluation stack.

\item \emph{CopyStackElement} inherits from \emph{StackElement} and contains an
\emph{Application}.

\item \emph{UpdateStackElement} inherits from \emph{StackElement} and contains a
\emph{Thunk}. The \emph{Thunk} contained in the object is updated after its 
content has been evaluated.
\end{itemize}

In addition to these classes, the most important functions are;

\begin{itemize}

\item \emph{expose\_primitive} and the following two functions have already been 
mentioned. \emph{expose\_primitive} can be used as a python-decorator.

\item \emph{constr} creates a \emph{Constructor} object by looking up the "name" in
the dictionary of the \emph{Symbol} class, and selecting the correct sub-class from the
sub-classes generated for \emph{Constructor}.

\item \emph{make\_application} is similar to \emph{make\_constructor}, there is no
need for a wrapper function as it does not have to look up a \emph{Symbol}.

\item \emph{main\_loop} is the main function, it reduces a Haskell-Python 
\emph{Application} to a \emph{Value}. The function \emph{evaluate\_hnf} is
simply a wrapper for the \emph{main\_loop function}
\emph{main\_loop}.

\end{itemize}

This section has described the Haskell-Python implementation in detail, based
on the classes and functions implemented.


\subsection*{Primitive types}

The file "primtypes.py" in the folder "interpreter" contains the primitive types
for the Haskell-Python interpreter. The types implemented inherit from the
\emph{Value} class in "haskell.py".

The following primitive types are implemented;

\begin{itemize}
\item \emph{Char} represents a single Haskell character-value.
\item \emph{Int} represents a Haskell Int value.
\item \emph{Addr} is a memory address. Among other things it is used to represent
the String type.
\item \emph{Double} is simply a double precision floating point value.
\item \emph{Float} is simply a single precision floating point value.
\end{itemize}

Since these classes all inherit from the Value type of Haskell-Python, 
they have a common interface; they all contain a variable called value that
contain its actual contents. This is currently implemented as a python
variable, so it is not represented like its Haskell equivalent at the low-level.
See listing \ref{lst:int1} for an example of how a primitive value is implemented.

\begin{figure}[H]
\lstset{ %
language=Python,
caption=Python class implementing the Haskell Int Value.,
label=lst:int1
}
\begin{lstlisting}
class Int(haskell.Value):
    _immutable_fields_ = ["value"]

    def __init__(self, integer):
        assert isinstance(integer, int)
        self.value = integer

    def match(self, other, subst):
        value = other.getvalue()
        if value:
            assert isinstance(value, Int)
            if self.value == value.value:
                return haskell.DEFINITE_MATCH
            return haskell.NO_MATCH
        return haskell.NEEDS_HNF

    def __eq__(self, other):
        return (isinstance(other, Int) and self.value == other.value)

    def __ne__(self, other):
        return not (self == other)

    def tostr(self):
        return str(self.value)
\end{lstlisting}
\end{figure}


\subsection*{Modules}

The file "module.py" in the folder "interpreter" contains the basics of the
module system. It contains a single class; \emph{CoreMod}. This class corresponds
to a Haskell Module. A \emph{CoreMod} object contains a name, and three dictionaries.
These dictionaries are called \emph{qvars}, \emph{qtycons} and \emph{qdcons} and they
contain \emph{qualified variables}, \emph{qualified type constructors} and 
\emph{qualified data constructors} respectively. When a \emph{CoreMod} object is created
it is at once added to the list of Haskell Modules.

\subsection*{JSCore parser}

The file "jscparser.py" in the "interpreter" folder contains the parser code. 
This code is responsible for loading JSCore files, and creating the AST based 
on the interpreter and module-system implementations.

The parser implementation takes advantage of some of the parsing tools 
available from
the PyPy code base. By writing an EBNF grammar for JSON, and giving it to a 
function called \emph{parse\_ebnf}, a JSON parser is created.

The resulting parser is a base class called \emph{RPythonVisitor}. The 
RPythonVisitor
class implements \emph{visit} functions for the constructs created by the 
EBNF grammar. For JSON this becomes \emph{visit\_object}, 
\emph{visit\_number} etc.

The result of this is that we have a set of functions that get called when 
visiting JSON constructs. Since JSCore is a description of External-Core 
compatible with JSON this lets us parse the Core programs by branching 
on the visitor functions.
This way, the mapping described in chapter \ref{chap:rewrite} is implemented.

\section{Built-in functionality}

The "builtin" folder contains all the Haskell library functionality that
has been implemented in Python. Most of this functionality should have been
stored in the sub folder "ghc\_modules" as JSCore files. However, due to some 
issues that are discussed in conclusions and future work, this is not currently
the case.

As an example, the file "num.py" contains the implementation for the generic
Num class. Listing \ref{lst:zmnum} contains the implementation for the
generic minus function. Currently it only supports the Int type. Note that
it takes three arguments, the first argument is used to resolve which 
function actually implements the minus function, and the other two are the
arguments for this function.

\begin{figure}[H]
\lstset{ %
language=Python,
caption=Python function implementing the generic minus function.,
label=lst:zmnum
}
\begin{lstlisting}
@haskell.expose_primitive(3)
def zm( args ):
    ty = args[0]
    a = args[1]
    b = args[2]

    if ty == mod.qvars["$fNumInt"]:
        return haskell.make_partial_app(izhconstr,
            [haskell.make_partial_app(prim.zmzh, [a.getarg(0), b.getarg(0)])])
    else:
        raise NotImplementedError

mod.qvars["-"] = zm
\end{lstlisting}
\end{figure}

\section{External-Core to JSCore}

The "core" folder contains the Haskell program responsible for generating the
JSCore intermediate format. This program is called "core2js". Currently, the
program uses the External-Core parser from the "extcore" Haskell package and
the JSON package to read the Haskell file in External-Core format and dump it
in JSCore format.

\section{GHC modules}

The folder "ghc\_modules" contain the GHC boot libraries. It also contains a 
script that goes through all these Haskell modules and uses GHC to create External-Core files
using the "-fext-core" flag. Then it uses "core2js" to create JSCore files from the 
resulting External-Core files.

The GHC library files are organized exactly as in the GHC source tree. Each
folder in the "libraries" directory corresponds to a Haskell package. These
packages contain the Haskell modules, e.g. the module "GHC.Tuple" is located
in "ghc-prim/GHC/Tuple.hs".

\section{Tests}

Tests have been designed to test different parts of the interpreter.
The "test" directory contains one subdirectory for each test and the
subdirectories contain a single Haskell file with the same name as the
subdirectory (other files are created and put in the directory when running
the test). The test "fibonacci" is located in the following path:
"test/fibonacci/fibonacci.hs" and after running the test the "fibonacci"
folder should contain the additional files; "factorial.hcr", "factorial.hi",
"factorial.o" and "factorial.hcj". ".hcj" is the extension used for JSCore
files.

The output from running the test script is:

\begin{figure}[H]
\lstset{ %
language=Python,
caption=Output from running the test script,
label=lst:testoutput
}
\begin{lstlisting}
-----------------------
Results (29.96 seconds)
-----------------------
case :  Success
data :  Success
factorial :  Success
fibonacci :  Success
helloworld :  Success
let :  Success
multiply :  Success
partialapp :  Success
\end{lstlisting}
\end{figure}