\chapter{Implementation}
The Preparation chapter looked at the tools that will be used to implement this project. This chapter describes a series of web applications that together form the project and help achieve its goals (section \ref{lab:goals}), and investigates whether \emph{ocamljs} and \emph{froc} are useful tools for constructing web applications.

\section{Log Viewer (Application 1)}
\emph{Design a system using \emph{ocamljs} and \emph{froc} which could replace the logging module in a program to display the messages in a more helpful way}.

The messages from each thread need to be separated out so that the control flow of each thread can be followed. It would also be useful to be able to compare the progress of each thread with that of all the others. Therefore the application should display a time line which shows threads as progress bars and shows any debugging messages as points on this time line. Each thread occupies a single row so that it can be seen relative to the other threads. The control also shows when a thread enters and exits functions.

In addition to the time line, some other widgets will be added, since the purpose of this application is to demonstrate and evaluate the \emph{ocamljs} and \emph{froc} technologies. These are a pie chart that displays the proportions of the types of messages received from the server, and a word-cloud-style widget that shows the most frequent words in the debugging messages and their relative proportions.

The next section looks at how the web application retrieves these messages.

\subsection{JSON}
JavaScript Object Notation (JSON) is a lightweight format for exchanging data. It is human readable and easy to generate and parse (especially with JavaScript). For this reason the server will use JSON to send the messages requested by the web application.
JSON is written using JavaScript's object literal syntax, so parsing a JSON string into an object is simply a case of passing it to the \texttt{eval} function, which runs the JavaScript interpreter on the string~\cite{bib:json}.

\subsubsection{JSON and ocamljs}
Although parsing JSON is straightforward in JavaScript, the same cannot be said of OCaml. In JavaScript, objects are collections of key-value pairs, usually implemented as hash tables to provide fast look-up: the keys are strings and the values can be any object. New key-value pairs can be added at run-time by assigning to unused fields, and if an unused key is referenced then the value \texttt{undefined} is returned~\cite{bib:crock_js}. This would not pass the OCaml type checker because these hash tables do not have a static type: their type changes every time a field is added or replaced with a different type of object. As such, the final type of the message object has to be declared in OCaml before compile time. This can be done using classes and an OCaml feature called \emph{external functions}.

\subsubsection{External Functions}
This feature was added to OCaml because it is sometimes helpful to use C libraries from within an OCaml program. C is a far more popular programming language than OCaml: it is older, more widely supported, and a great deal of software is written in it. As a result, many libraries that would be useful to an OCaml program have already been implemented in C. It is sometimes not worth reimplementing such a library in OCaml, especially if the library performs \emph{unsafe} operations that cannot be written in OCaml. Communication between OCaml and C is achieved using \emph{external} function declarations~\cite{bib:ocaml}.
External functions have the following syntax:

\begin{center}
\texttt{external caml\_name : type = "C\_name"}
\end{center}

The \emph{ocamljs} compiler handles special prefixes in the \texttt{C\_name} string to provide JavaScript field accessors and method calls. Figure \ref{external} lists these prefixes.

With this, an external function can be created which passes a string to the \texttt{eval} function and returns the resulting JavaScript object, wrapped by a dummy OCaml class. For example, if the class has type \texttt{msg}, the external function for converting a JSON string to a message has type \texttt{string -> msg}. The declaration is as follows:

\begin{center}
\texttt{external parse\_json : string -> msg = "@eval"}
\end{center}

To access fields we use the \emph{.} prefix for the external function and pass in the JavaScript object. OCaml has no way of checking whether these types will be correct at run-time, so compilation succeeds even if they are wrong. \label{lab:json-pitfall}One pitfall discovered this way is that a floating point number may be sent as a JSON string (for example \texttt{"1.5"} rather than \texttt{1.5}), and an accessor typed \texttt{msg -> float} still passes the type checker. OCaml then treats the value in memory as a float when it is really a string, causing unpredictable errors at run-time.

\subsubsection{JavaScript Arrays}
It is unlikely that there will be just one message ready each time the server is queried. It is therefore sensible to store a queue of messages at the server and flush the whole queue at once. As a result, the JSON will contain an array of objects. \emph{ocamljs} has a module called \texttt{Javascript}\label{lab:javascript}, which provides wrappers for the common functions and objects built into JavaScript. The \texttt{js\_array} class represents JavaScript array objects.
This class is parameterised by the element type, so every object in the array must have the same type\footnote{JavaScript actually allows an array to hold objects of different types, so we have to be careful that each object in the array we are parsing is of the correct type, or this could cause errors.}. If we are parsing a JSON array of \texttt{msg} objects, the external function is given the type \texttt{string -> msg js\_array} and returns an object which represents the array.

The OCaml standard library has many helper functions for OCaml lists, so a \texttt{js\_array} object is less convenient to work with than a \texttt{list}. A \texttt{js\_array} can be converted into a \texttt{list} in $O(n)$ time with the following function. Because \texttt{pop} removes the last element, the function builds the list from the back so that the original order is preserved. Note that the array is empty afterwards.
\vfill\pagebreak
\begin{lstlisting}[caption={Converting a \texttt{js\_array} to an OCaml \texttt{list}}]
(* pop takes the last element, so consing onto acc keeps the order *)
let js_array_to_list xs =
  let rec loop acc =
    if xs#_get_length > 0 then loop (xs#pop :: acc)
    else acc
  in
  loop []
\end{lstlisting}

\begin{figure}
  \centering
  \begin{tabular}{|l|l|}
    \hline
    \textbf{prefix} & \textbf{compiles to}\\ \hline
    \texttt{\#} & method call\\ \hline
    \texttt{.} & read property\\ \hline
    \texttt{=} & assign property\\ \hline
    \texttt{@} & call built-in function\\ \hline
  \end{tabular}
  \caption{\emph{ocamljs} external function prefixes}
  \label{external}
\end{figure}

\subsection{HTTP, AJAX and JQuery}
In order to request data from the server, an HTTP request is required. The two most common HTTP request methods are \emph{GET} and \emph{POST}. \emph{GET} is a simple method which the client uses to request a data item from a server; the item requested is identified by its URI (Uniform Resource Identifier). \emph{POST} is used by the client to send data to a server. In this application the client is requesting messages from the server, so \emph{GET} is the appropriate method to use~\cite{bib:http}.
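
To make the difference between the two methods concrete, the following sketch splits an HTTP/1.1 request line into its parts: the method, the URI and the version, separated by spaces. It then reports what a GET or a POST asks for. It is plain OCaml and independent of \emph{ocaml-cohttpserver}, whose own request parser is not shown here. The type \texttt{meth}, the functions \texttt{parse\_request\_line} and \texttt{describe}, and the path \texttt{/messages} are all hypothetical.

\begin{lstlisting}[caption={Sketch: parsing an HTTP request line (hypothetical)}]
(* Hypothetical sketch, independent of ocaml-cohttpserver: split an
   HTTP/1.1 request line "METHOD URI VERSION" into its three parts. *)
type meth = Get | Post | Other of string

let meth_of_string = function
  | "GET" -> Get
  | "POST" -> Post
  | s -> Other s

(* None if the line does not have exactly three space-separated
   fields; String.trim also drops the trailing carriage return. *)
let parse_request_line line =
  match String.split_on_char ' ' (String.trim line) with
  | [m; uri; version] -> Some (meth_of_string m, uri, version)
  | _ -> None

(* A GET names the resource it wants in the URI; a POST carries data
   in the request body, which this sketch does not read. *)
let describe line =
  match parse_request_line line with
  | Some (Get, uri, _) -> "fetch " ^ uri
  | Some (Post, uri, _) -> "send data to " ^ uri
  | Some (Other m, _, _) -> "unsupported method " ^ m
  | None -> "malformed request line"

let () = print_endline (describe "GET /messages HTTP/1.1\r")
\end{lstlisting}

Running it prints \texttt{fetch /messages}. The URI alone identifies what a GET retrieves, whereas a POST also requires its request body to be read.
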

\emph{JQuery}\footnote{\url{http://jquery.com/}} is a JavaScript library that provides, among other things, AJAX interactions. To perform the HTTP GET that fetches new messages, we use the AJAX \texttt{get} method. JQuery provides a simple interface for this, and \emph{ocamljs} provides an interface to the JQuery library, much like the \texttt{Javascript} module mentioned in section \ref{lab:javascript}. The \texttt{JQuery} module has a \texttt{get} method with the following interface:

\texttt{method get : string -> 'a -> ('b -> string -> unit)\\ -> Dom.xMLHttpRequest}

This method takes a URL (string), a set of parameters and a callback function, and returns an HTTP request object. The important parts here are the URL and the callback. The callback is a function which takes the resulting data and a status string as its parameters; the status is \emph{success} if the request completed successfully.

\subsection{HTTP server}
An HTTP server needs to run on a remote machine to provide both the application web page and the data requested by the client. The page and the data must come from the same server because browsers only allow AJAX requests to the same protocol, domain and port as the page that makes them; otherwise the browser blocks the request. Browsers enforce this restriction, known as the \emph{same-origin policy}, as a defence against malicious cross-site scripts~\cite{bib:xss}.\label{lab:xss}

The web server used in this project is called \emph{ocaml-cohttpserver}\footnote{\url{https://github.com/avsm/ocaml-cohttpserver}}. It is an OCaml library which parses HTTP requests and responds to them. The reason for using this implementation is that it is written in OCaml, so the program which generates the debugging messages can also be written in OCaml.

\subsubsection{Static Pages}
One issue with this server is that it does not provide a way to serve static files (such as HTML pages).
OCaml's Unix library provides a way of opening files, so we can open a file and put its contents into the body of the HTTP reply. The code for serving a file is as follows:

\begin{lstlisting}[caption={Serve a file from disk}]
let get_file file req =
  (* file size, needed for the body below *)
  let size = (Unix.stat file).Unix.st_size in
  let fd = Unix.openfile file [Unix.O_RDONLY] 0o444 in
  (* wrap the descriptor in an Lwt input channel; closing the
     channel also closes the file *)
  let ic = Lwt_io.of_unix_fd
      ~close:(fun () -> Unix.close fd; Lwt.return ())
      ~mode:Lwt_io.input fd in
  let _t, u = Lwt.wait () in
  (* the reply body reads size bytes from ic *)
  let body = [`Inchan (Int64.of_int size, ic, u)] in
  return (dyn req body)
\end{lstlisting}

\subsubsection{Log Data}
The server also creates some dummy data to test the Log Viewer. To make the data look interesting, a state machine starts and ends threads and creates fake messages and function entries and exits. Each state machine represents a thread in the \emph{application} (the fake one which is being debugged). Each time a state machine moves to its next state, it generates a log message for that new state, and these messages are then sent to the client. The state of each thread advances each time the client requests more messages. Figure \ref{fig:state} shows the state diagram for each state machine. The code for the state machine is displayed below:

\begin{lstlisting}[caption={state machine and \texttt{get\_events} function from thread\_state.ml}]
...
type thread_state = Started | Running | FunEnter
                  | FunExit | Msg | Finished | Stop
...

class thread =
object (self)

...

  (* functions do not nest, and a thread cannot stop
     while it is inside a function *)
  method enterext = if in_fun then FunExit else FunEnter
  method stopexit = if in_fun then FunExit else Finished
  method next_state p = function (* state machine *)
    | Started -> if p < 80 then Running
        else if p < 90 then self#enterext else Msg
    | Running -> if p < 40 then Running
        else if p < 60 then self#enterext
        else if p < 90 then Msg else self#stopexit
    | FunEnter -> if p < 50 then Running
        else if p < 70 then FunExit else Msg
    | FunExit -> if p < 40 then Running
        else if p < 50 then self#enterext
        else if p < 80 then Msg else self#stopexit
    | Msg -> if p < 60 then Running
        else if p < 80 then self#enterext
        else if p < 90 then Msg else self#stopexit
    | Finished -> Stop
    | Stop -> Stop

...

end

...

let threads = ref []
let get_events () = (* fetch next events *)
  threads := List.filter
    (fun t -> t#state <> Stop) !threads;
  if (Random.int 100) < 10 then
    threads := (new thread) :: !threads;
  let events = List.fold_right
    (fun t -> t#next_event) !threads [] in
  Json_io.string_of_json (Events.jsonify events)
\end{lstlisting}

\begin{figure}
  \centering
  \includegraphics[width=10cm]{graphs/state.png}
  \caption{State machine diagram for test data (\emph{f} means in a function, \emph{!f} is not in a function)}
  \label{fig:state}
\end{figure}

\subsubsection{JSON generation}
Before a list of messages can be sent back to the client, the list needs to be serialised into a JSON string. The OCaml library \emph{json-wheel} provides helper functions for this. It has a collection of functions which convert OCaml values into JSON and JSON into strings, as in the call to \texttt{Json\_io.string\_of\_json} above. The resulting string is then sent to the client in the body of the HTTP reply~\cite{bib:json_rfc}.

\subsection{froc}
The messages are displayed on a time line.
As newer messages arrive, the amount of time represented by the time line increases. When this happens, all the elements representing messages have to realign so that they are in the correct place on the time line. The placement function for each element can be \emph{bound} (see section \ref{lab:behavior}) to a \emph{froc} \emph{behavior} which represents the time line. Whenever this \emph{behavior} is updated, the placement function is called for each element and the elements realign themselves.

\subsubsection{froc-lists}
\label{lab:froc-list}
There is also a further use of \emph{froc} in this application. Ideally, all the currently displayed messages would be stored in a special \emph{bound} list. Whenever its contents changed, elements would be created for new messages (and destroyed for old ones), and all the bindings of the existing elements would be applied to the new ones.

Unfortunately, lists in OCaml are immutable, as are most values in functional languages. This means that every change, such as appending an element, produces a new list value rather than modifying the old one. If we used the data type \texttt{'a list behavior}, then whenever the \emph{behavior} changed the callback function would be run over the whole list again. We only want to run it on new elements, to avoid unnecessary work and duplicate elements.

To solve this, a new OCaml data type is needed. It uses two lists, one for the front elements and one for the back elements, so that elements can be added at both ends. Internally it holds a list of functions. Each of these functions is bound to every element added later, and a newly added function is bound to every existing element. There are push and pop methods to add and remove elements, and a method which returns the contents as an OCaml list.
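
The binding described above can be illustrated without \emph{froc}. The sketch below is a toy stand-in, not \emph{froc}'s API or implementation. A hypothetical \texttt{cell} type holds a value and the functions bound to it. \texttt{bind} runs a function immediately and again whenever the value changes, and \texttt{set} only propagates when the new value differs from the old one. The time line width and timestamps are made-up numbers chosen for the example.

\begin{lstlisting}[caption={Sketch: a toy bound value (not froc's API)}]
(* A toy stand-in for a froc behavior (hypothetical; not froc's API):
   a mutable value plus the functions bound to it. *)
type 'a cell = {
  mutable value : 'a;
  mutable watchers : ('a -> unit) list;
}

let cell v = { value = v; watchers = [] }
let get c = c.value

(* Bind f to c: run it now, and again whenever c changes. *)
let bind c f =
  c.watchers <- f :: c.watchers;
  f c.value

(* Store v and notify watchers, but only if the value really changed.
   The value is stored before notifying, so a cycle stops as soon as
   it reaches a cell that already holds v. *)
let set c v =
  if v <> c.value then begin
    c.value <- v;
    List.iter (fun f -> f v) c.watchers
  end

(* A message element whose x position is bound to the end of the time
   line: extending the time line moves the element left. *)
let () =
  let t_end = cell 10.0 in
  let msg_time = 5.0 and width = 800.0 in
  let x = ref 0.0 in
  bind t_end (fun t -> x := msg_time /. t *. width);
  Printf.printf "x = %.1f\n" !x;
  set t_end 20.0;
  Printf.printf "x = %.1f\n" !x

(* Two cells bound to each other, like t0 and min for the spinners:
   the update stops once both hold the same value. *)
let () =
  let a = cell 0 and b = cell 0 in
  bind a (fun v -> set b v);
  bind b (fun v -> set a v);
  set a 7;
  Printf.printf "a = %d, b = %d\n" (get a) (get b)
\end{lstlisting}

Running it prints \texttt{x = 400.0}, then \texttt{x = 200.0} after the time line is extended, and finally \texttt{a = 7, b = 7}. The last case shows why the equality check matters: two cells bound to each other stop updating as soon as both hold the same value. The spinners used to control the time range depend on exactly this behaviour.
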
\vfill\pagebreak
The interface for the \emph{froc-list} is as follows:

\begin{lstlisting}[caption={flist.mli}]
class ['a] flist :
  object
    val mutable first : 'a Froc.behavior list
    val mutable fs : ('a -> unit) list
    val mutable last : 'a Froc.behavior list
    method lift : ('a -> unit) -> unit
    method lift_all : 'a Froc.behavior -> unit
    method list : 'a Froc.behavior list
    method pop : 'a Froc.behavior
    method pop_end : 'a Froc.behavior
    method push : 'a Froc.behavior -> unit
    method push_end : 'a Froc.behavior -> unit
  end
\end{lstlisting}

Here is the code for \emph{flist.ml}:

\begin{lstlisting}[caption={flist.ml}]
class ['a] flist =
object (self)
  val mutable first = []  (* front elements, first one at the head *)
  val mutable last = []   (* back elements, last one at the head *)
  val mutable fs = []     (* functions bound to every element *)
  method lift (f : 'a -> unit) =
    let l o = ignore (Froc.lift f o) in
    fs <- f :: fs;
    List.iter l first;
    List.iter l (List.rev last)
  method lift_all o = (* internal: bind every function to o *)
    let l f = ignore (Froc.lift f o) in
    List.iter l fs
  method list = List.rev_append
    (List.rev first) (List.rev last)
  method push o =
    self#lift_all o;
    first <- o :: first
  method push_end o =
    self#lift_all o;
    last <- o :: last
  (* pop only sees elements added with push, and pop_end only those
     added with push_end; each fails if its own list is empty *)
  method pop =
    match first with
    | hd :: tl -> first <- tl; hd
    | [] -> failwith "flist#pop: empty"
  method pop_end =
    match last with
    | hd :: tl -> last <- tl; hd
    | [] -> failwith "flist#pop_end: empty"
end
\end{lstlisting}

Figure \ref{fig:flist-comp} shows the time complexity of each of the \emph{froc-list} methods. Popping takes constant time, and pushing costs $O(m)$ for $m$ lifted functions, which is small in practice, so both are inexpensive. Lifts take $O(n)$ time but are rare. The method \texttt{list} has the same complexity; however, it is called much more often (every time the data structure needs to be read). A function is \emph{tail recursive} if its recursive call is the last thing it does. This lets the compiler reuse the current stack frame, so the function runs in constant stack space. A function over an $n$-element list that is not tail recursive typically needs $O(n)$ stack, which can limit the size of the data it can handle. \texttt{list} is built only from \texttt{List.rev} and \texttt{List.rev\_append}, both of which are tail recursive in the OCaml standard library. Stack depth therefore does not limit the number of items the \emph{froc-list} can store; the $O(n)$ cost of rebuilding the list on every read is more likely to be the limiting factor.

\begin{figure}
  \centering
  \begin{tabular}{|l|l|}
    \hline
    \textbf{Function} & \textbf{Complexity} \\
    \hline
    lift & $O(n)$ \\
    \hline
    list & $O(n)$\\
    \hline
    push/push\_end & $O(m)$ \\
    \hline
    pop/pop\_end & $O(1)$ \\
    \hline
  \end{tabular}
  \caption{Time complexity of the \emph{froc-list} methods ($n$ is the number of elements, $m$ the number of lifted functions)}
  \label{fig:flist-comp}
\end{figure}

\subsubsection{Controlling the Time Range}
Displaying all the messages at once quickly becomes impractical: messages are positioned so close together that it is difficult to read each individual one. A useful feature is therefore a way to zoom in on a particular time range. All the message elements are bound to the minimum and maximum of the displayed time range, so providing a way to update these two values is sufficient to implement zooming.

One user interface element that can be used for this is a \emph{spinner}, which provides a numerical input box and two buttons to increment and decrement the current value. The value of a spinner can be bound to one of the time range boundaries, so two spinners can control the time range displayed. Any messages whose timestamps are outside the currently displayed range are hidden. Conversely, the value shown in each spinner is bound to the corresponding time range boundary, so that when new messages arrive the spinners display the updated range. Figure \ref{spinners} shows the dependency graph for the spinners. In this graph, the value \emph{t0} depends on \emph{min} and \emph{min} depends on \emph{t0}. This cycle does not cause a problem in \emph{froc} because it does not propagate an update when the value has not changed.
Without this check, the two values would keep updating each other with the same value, and propagation would never terminate.

\begin{figure}
  \centering
  \includegraphics[scale=0.5]{graphs/spinners.png}
  \caption{Dependency Graph for Spinners}
  \label{spinners}
\end{figure}

\begin{figure}
  \centering
  \includegraphics[scale=0.75]{images/visualiser.pdf}
  \caption{Screen-shot of Log Viewer control}
  \label{fig:visualiser}
\end{figure}

\subsection{Pie Chart}
The thread view of the Log Viewer is rendered using HTML \emph{DIV} elements, which are given position and size values and are drawn by the browser. Because DIVs are rectangular, they cannot be used to draw a pie chart. To draw arbitrary shapes we need the HTML \emph{CANVAS} element.

\subsubsection{HTML CANVAS Element}
The CANVAS element is new in \emph{HTML5}. It provides a bitmap image and functions to draw 2D shapes and lines. It is much more flexible than drawing with HTML DIVs; however, removing objects from the scene requires erasing and redrawing whole regions. The functions for drawing and filling arcs and paths can be used to draw the outline and slices of the pie chart~\cite{bib:html5}.

\subsubsection{Counting Message Types}
It would be inefficient to iterate through all the messages, counting how many there are of each type, every time the pie needs to be redrawn. Instead, a data structure can be used which maps the name of each message type to the number of those messages seen so far. When a new message arrives, its type is read and the appropriate counter is incremented. \emph{froc} can be used to bind the value of each counter to the method which redraws the pie chart, so that the chart is redrawn whenever a counter changes.

The size of each slice of the pie represents the proportion of messages of that type.
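
A minimal sketch of this counting scheme in plain OCaml is shown below. It uses a standard library \texttt{Hashtbl} as the map from message type to count. Redrawing is reduced to computing each slice's start and end angle in radians, which are the values the canvas arc functions take. Drawing itself and the \emph{froc} binding are left out, and the message type names are hypothetical.

\begin{lstlisting}[caption={Sketch: counting message types and sizing pie slices}]
(* Message-type counters (the type names used below are hypothetical). *)
let counts : (string, int) Hashtbl.t = Hashtbl.create 8

(* Called once per incoming message; expected O(1). *)
let record msg_type =
  let n = try Hashtbl.find counts msg_type with Not_found -> 0 in
  Hashtbl.replace counts msg_type (n + 1)

(* (type, start angle, end angle) in radians, each slice's sweep being
   proportional to its count. Types are sorted so that slices keep the
   same order between redraws. *)
let slices () =
  let entries =
    List.sort compare (Hashtbl.fold (fun k v acc -> (k, v) :: acc) counts [])
  in
  let total = List.fold_left (fun acc (_, v) -> acc + v) 0 entries in
  let two_pi = 8.0 *. atan 1.0 in
  let _, acc =
    List.fold_left
      (fun (start, acc) (k, v) ->
        let stop = start +. two_pi *. float_of_int v /. float_of_int total in
        (stop, (k, start, stop) :: acc))
      (0.0, []) entries
  in
  List.rev acc

let () =
  List.iter record ["info"; "error"; "info"; "warning"];
  List.iter (fun (k, a, b) -> Printf.printf "%s: %.3f to %.3f\n" k a b)
    (slices ())
\end{lstlisting}

Sorting the entries keeps each type in the same position from one redraw to the next, so slices grow and shrink in place rather than moving around the pie.
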

\begin{figure}
  \centering
  \includegraphics[scale=0.75]{images/pie.pdf}
  \caption{Screen-shot of pie chart control}
  \label{fig:pie}
\end{figure}

\subsection{Word Cloud}
The word cloud is similar to the pie chart. Instead of counting objects in predefined categories (such as message type), it counts occurrences of words drawn from an unbounded set. The data structure used in this case is a hash table in which the key is the word being counted and the value is the number of times it has occurred so far. When a new word appears and the look-up in the hash table fails, a new entry is added. Each update takes expected $O(1)$ time, although growing the hash table occasionally costs $O(n)$, so updates are amortised $O(1)$. The table requires $O(n)$ space for $n$ distinct words.

To display the data, the words are drawn on a canvas with their font size (in pixels) proportional to how often each word is used.

\begin{figure}
  \centering
  \includegraphics[scale=0.75]{images/cloud.pdf}
  \caption{Screen-shot of word cloud control}
  \label{fig:cloud}
\end{figure}

\section{Dataset Graph (Application 2)}
\label{lab:ds-graph}
\emph{This application will show some data with three variables: one on each axis and one that is varied using a \emph{play} function}.

Hans Rosling's \emph{The Joy of Stats}~\cite{bib:stats} contains a good example of a graph with these properties: it plots life expectancy against GDP (Gross Domestic Product) as time advances. The graph itself is straightforward to draw. DIV elements can be used to represent each point, and \emph{froc} can be used to position each DIV as the time variable changes.

\subsubsection{Data Sources}
\emph{The World Bank} has yearly GDP and life expectancy data for most countries from 1960 to 2010. Its website provides an HTTP interface from which the data can be retrieved in JSON format.
This data can be used to recreate a graph similar to the one used in \emph{The Joy of Stats}.

It is a good idea to cache all the required data on the machine that hosts the HTTP server. This avoids repeatedly making the same requests to The World Bank servers, saves bandwidth, and avoids the cross-site restriction mentioned in section \ref{lab:xss}. There are 245 pages of this data, so a shell script was used to make all the requests. This script is shown below.

\lstset{language=bash}
\begin{lstlisting}[caption={Downloading the World Bank data}]
#!/bin/sh
# Download all 245 pages of one indicator ("gdp" or life expectancy).
if [ "$1" = "gdp" ]
then
  OUT="gdp"
  URL="http://api.worldbank.org\
/countries/all/indicators/NY.GDP.MKTP.CD"
else
  OUT="life"
  URL="http://api.worldbank.org\
/countries/all/indicators/SP.DYN.LE00.IN"
fi

# The URL must be quoted: an unquoted & would send wget to the
# background and cut the query string short.
for n in $(seq 1 245)
do
  wget -O "${OUT}-${n}.json" \
    "${URL}?format=json&per_page=50&page=${n}" && sleep 5
done
\end{lstlisting}
\lstset{language=caml}

\subsubsection{Loading Data}
All the data is requested from the server when the application is loaded. The data cannot be put into a single file, because the resulting string (which contains a lot of redundant text) is longer than the largest string the JavaScript \emph{eval} function will accept. This is why the data is requested as 245 pages. It is then stored in a series of hash tables, one for each variable. Each hash table maps a year to a list of key-value pairs, where the key is a country ID and the value is that country's value for the year. Figure \ref{fig:hashtbl} illustrates how data is stored in this application.

\begin{figure}
\centering
\includegraphics{images/hashtbl.pdf}
\caption{Data storage in the Dataset Graph}
\label{fig:hashtbl}
\end{figure}

\emph{froc} is used to move the data points around. Two \emph{behavior}s are needed for each point, one for each axis. There is also a global \emph{behavior} which represents the current time displayed.
When the time variable is updated, it triggers a function which loads the data for the new year and updates the values of the data point \emph{behavior}s. These are stored in a hash table indexed by country ID.

Hash-table look-ups take expected $O(1)$ time. This means that loading the data and repositioning the data points when the year changes should take $O(n)$ time: there are $n$ new data points, and updating a \emph{froc} variable is $O(1)$.

\begin{figure}
  \centering
  \includegraphics[scale=0.5]{images/graph.pdf}
  \caption{Screen-shot of data-set graph control}
  \label{fig:graph}
\end{figure}

\section{Heat Map (Application 3)}
\emph{Create an application which displays data about energy usage over time for a number of rooms in a building}.

A heat map can be used to show the relative energy usage of different rooms at a given time: the higher a room's value, the \emph{hotter} the colour of that part of the map. In this heat map the X and Y axes represent physical location. The implementation is similar to that of the Dataset Graph in section \ref{lab:ds-graph}.

\subsection{Rendering the Map}
One difference from the graph application is that the data points are no longer points but rooms, which have arbitrary shapes. These could be drawn using either a canvas element or multiple DIVs, and it is unlikely to make much difference which is used. Since these are purely aesthetic features, the \emph{heat} of a room is represented as a coloured square DIV positioned over the middle of the room. The layout of the building is displayed by putting an IMG element showing the building plan on the page\footnote{Map images from \url{http://www.cl.cam.ac.uk/maps/}}. This application uses effectively the same data structures as the Dataset Graph: it binds a function which colours the DIV elements to the \emph{froc} \emph{behavior}s stored in a hash table.
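
The colouring function bound to each room's \emph{behavior} has to turn an energy reading into a colour. One simple scheme, assumed here rather than taken from the application, interpolates linearly from blue (coldest) to red (hottest) between a minimum and a maximum reading. The result is formatted as a CSS hex colour that could be assigned to the DIV's background. The function name \texttt{heat\_colour} and its parameters are hypothetical.

\begin{lstlisting}[caption={Sketch: mapping a reading to a heat colour (hypothetical)}]
(* Hypothetical colour scale for the heat map: readings are mapped
   linearly from blue (lo, coldest) to red (hi, hottest) and returned
   as a CSS colour string such as "#ff0000". *)
let heat_colour ~lo ~hi v =
  let t = if hi <= lo then 0.0 else (v -. lo) /. (hi -. lo) in
  let t = max 0.0 (min 1.0 t) in                (* clamp to [0, 1] *)
  let red = int_of_float (255.0 *. t +. 0.5) in  (* round to nearest *)
  Printf.sprintf "#%02x00%02x" red (255 - red)

let () =
  List.iter
    (fun v -> Printf.printf "%5.1f -> %s\n" v (heat_colour ~lo:0.0 ~hi:100.0 v))
    [0.0; 50.0; 100.0]
\end{lstlisting}

For example, with \texttt{lo} set to 0 and \texttt{hi} set to 100, a reading of 100 gives \texttt{\#ff0000} and a reading of 0 gives \texttt{\#0000ff}. Readings outside the range are clamped to the ends of the scale.
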

\begin{figure}
  \centering
  \includegraphics[scale=0.75]{images/map.pdf}
  \caption{Screen-shot of Heat Map control}
  \label{fig:map}
\end{figure}
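
Both the Dataset Graph and the Heat Map follow the same pattern. A hash table maps each time step to a list of (key, value) pairs, and changing the displayed time walks that list and updates one value per key. The sketch below reproduces the pattern in plain OCaml, with mutable references standing in for the \emph{froc} \emph{behavior}s. The names \texttt{by\_year}, \texttt{points}, \texttt{add\_reading} and \texttt{show\_year} are hypothetical, and the keys may be read as either country IDs or room names.

\begin{lstlisting}[caption={Sketch: per-year data and per-key values (hypothetical)}]
(* year -> (key, value) pairs for one variable, e.g. country ID -> GDP *)
let by_year : (int, (string * float) list) Hashtbl.t = Hashtbl.create 64

(* key -> current value; a ref stands in for each froc behavior *)
let points : (string, float ref) Hashtbl.t = Hashtbl.create 256

let add_reading ~year ~key v =
  let old = try Hashtbl.find by_year year with Not_found -> [] in
  Hashtbl.replace by_year year ((key, v) :: old)

(* Called when the displayed time changes: one pass over the n readings
   for that year, each needing an expected O(1) look-up, so O(n).
   Keys with no reading for the year keep their previous value. *)
let show_year year =
  let readings = try Hashtbl.find by_year year with Not_found -> [] in
  List.iter
    (fun (key, v) ->
      match Hashtbl.find_opt points key with
      | Some r -> r := v
      | None -> Hashtbl.add points key (ref v))
    readings
\end{lstlisting}

Because each change of year only touches the readings for the new year, its cost matches the $O(n)$ estimate given for the Dataset Graph.
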