  1\chapter{Implementation}
The preparation section looked at the tools that will be used to implement this project. This chapter describes the series of web applications that make up the project. They are built to help achieve its goals (section \ref{lab:goals}) and to investigate whether \emph{ocamljs} and \emph{froc} are useful tools for constructing web applications.
  3
  4\section{Log Viewer (Application 1)}
  5\emph{Design a system using \emph{ocamljs} and \emph{froc} which could replace the logging module in a program to display the messages in a more helpful way}.
  6
  7The messages from each thread need to be separated out so that the control flow of each thread can be followed. It would also be useful to be able to compare the progress of threads with all the others. Therefore the application should display some kind of time line which shows threads as progress bars and shows any debugging messages as points on this time line. Each thread can occupy a single row so it can be seen relative to the other threads. The control will also show when the thread enters and exits functions.
  8
  9In addition to the time line some other widgets will be added (since the purpose of this application is to demonstrate and evaluate the \emph{ocamljs} and \emph{froc} technologies). These will be a pie chart that displays the proportions of types of messages received from the server and a word cloud style widget that shows the most popular words mentioned in debugging messages and their relative proportions.
 10
 11The next section will look at how to retrieve these messages in the web application.
 12
 13\subsection{JSON}
 14JavaScript Object Notation (JSON) is a lightweight format for exchanging data. It is human readable and easy to generate and parse (especially with JavaScript). For this reason the server will use a JSON format to send messages requested by the web application. JSON takes the same format as objects in JavaScript so parsing a JSON string to an object is simply a case of passing it into the \texttt{eval} function which runs the JavaScript interpreter on the string~\cite{bib:json}.
 15
 16\subsubsection{JSON and ocamljs}
 17Although parsing JSON is straightforward in JavaScript, the same cannot be said about OCaml. In JavaScript, objects are collections of key-value pairs, usually implemented as a hash table to provide a fast look-up: the keys are strings and the values can be any object. New key-value pairs can be added at run-time by setting unused field values; if an unused key is referenced then the null object is returned~\cite{bib:crock_js}. This would not pass the OCaml type checker because these hash tables do not have a static type -- their type changes every time a field is added or replaced with a different type of object. As such, we have to declare the final type for our message object in OCaml before compile time. This can be done using classes and an OCaml feature called \emph{external functions}.
 18
 19\subsubsection{External Functions}
This feature was added to OCaml because it is sometimes helpful to use C libraries from within an OCaml program. C is older and far more widely used than OCaml, so many libraries that would be useful to an OCaml program have already been implemented in C. It is not always worth reimplementing such a library in OCaml, especially if the library performs \emph{unsafe} operations that cannot be written in OCaml. Communication between OCaml and C is achieved using \emph{external} function declarations~\cite{bib:ocaml}. External functions have the following syntax:
 21
 22\begin{center}
 23\texttt{external caml\_name : type = "C\_name"}
 24\end{center}
 25
 26There are special prefixes for the \texttt{C\_name} string which are handled by the \emph{ocamljs} compiler to provide JavaScript field accessors and method calls. Figure \ref{external} shows a table listing these.
 27
With this, an external function can be declared which applies the \texttt{eval} function to a string and returns the resulting JavaScript object, typed as a dummy OCaml class. In this case the class has type \texttt{msg}, so the external function for converting a JSON string to a \texttt{msg} has type \texttt{string -> msg}. The declaration is as follows:
 29
 30\begin{center}
 31\texttt{external parse\_json : string -> msg = "@eval"}
 32\end{center}
 33
In order to access fields we use the \emph{.} prefix for the external function and pass in the JavaScript object. The OCaml compiler has no way of checking that the declared types will match the values present at run-time, so compilation will succeed even if they are wrong. \label{lab:json-pitfall}A pitfall discovered while using this is that the JSON messages represent floating point numbers as strings. If the external function is typed \texttt{msg -> float} it will still pass the type checker, but OCaml will then treat the value in memory as a float when it is really a string, causing unpredictable errors at run-time.
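
One way to guard against this pitfall is to type the accessor as returning a string and to convert it explicitly. The following is a minimal sketch in plain OCaml rather than \emph{ocamljs}; the type \texttt{json\_value} and the function \texttt{float\_of\_field} are hypothetical names introduced for illustration and are not part of the project.

\begin{lstlisting}[caption={Sketch: converting a field that may arrive as a string}]
(* Hypothetical stand-in for a decoded JSON field value. *)
type json_value =
  | Number of float
  | String of string

(* Read a float whether it was sent as a number or as a string.
   A string that is not a valid float yields None. *)
let float_of_field = function
  | Number f -> Some f
  | String s -> float_of_string_opt s
\end{lstlisting}

A malformed field then shows up as \texttt{None} at the point of conversion instead of as a corrupted float later on.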
 35
 36\subsubsection{JavaScript Arrays}
 37It is unlikely there will be just one message ready each time the server is queried. Therefore it would be sensible to store a queue of messages at the server and flush the whole queue at once. As a result, the JSON will contain an array of objects. \emph{ocamljs} has a module called \texttt{Javascript}\label{lab:javascript}. This provides wrappers for the common functions and objects built into JavaScript. The \texttt{js\_array} class represents JavaScript array objects. This class is polymorphic so each object in the array must be of a given type\footnote{JavaScript actually supports arrays of different types of objects so we have to be careful that each object in the array we are parsing is of the correct type or this could cause errors.}. If we are parsing a JSON array of \texttt{msg} objects using the type \texttt{string -> msg js\_array}, the external function should return an object which represents the array.
 38
 39The OCaml standard library has helper functions for handling OCaml lists so having a \texttt{js\_array} object is not very useful. A \texttt{js\_array} object can be converted into a \texttt{list} in \texttt{$O(n)$} time with the following function:
 40\vfill\pagebreak
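\begin{lstlisting}[caption={Converting a \texttt{js\_array} to an OCaml \texttt{list}}]
(* Empties xs; pop takes from the end, so the list is reversed. *)
let rec js_array_to_list xs =
  if xs#_get_length > 0 then
    (* pop before recursing: the evaluation order of the
       operands of (::) is unspecified *)
    let x = xs#pop in
    x :: js_array_to_list xs
  else []
\end{lstlisting}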
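
The conversion above consumes the array by popping elements from it. As a point of comparison, the sketch below shows a non-destructive, order-preserving conversion that reads by index. It uses a plain OCaml \texttt{array} as a stand-in for \texttt{js\_array}, and the name \texttt{array\_to\_list} is an assumption made for this example.

\begin{lstlisting}[caption={Sketch: index-based conversion on a plain OCaml array}]
(* Plain OCaml array used as a stand-in for js_array. *)
let array_to_list (xs : 'a array) : 'a list =
  (* Walk from the last index down, consing onto the accumulator,
     so the result keeps the array order and the loop is
     tail recursive. *)
  let rec go i acc =
    if i < 0 then acc else go (i - 1) (xs.(i) :: acc)
  in
  go (Array.length xs - 1) []
\end{lstlisting}

This also runs in $O(n)$ time and leaves the source array untouched.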
 47
 48\begin{figure}
 49  \centering
 50  \begin{tabular}{|l|l|}
 51    \hline
 52    \textbf{prefix} & \textbf{compiles to}\\ \hline
 53    \texttt{\#} & method call\\ \hline
 54    \texttt{.} & read property\\ \hline
 55    \texttt{=} & assign property\\ \hline
 56    \texttt{@} & call built-in function\\ \hline
 57  \end{tabular}
 58  \caption{\emph{ocamljs} external function prefixes}
 59  \label{external}
 60\end{figure}
 61
 62\subsection{HTTP, AJAX and JQuery}
 63In order to request data from the server an HTTP call is required. There are two common types of HTTP call: \emph{GET} and \emph{POST}. \emph{GET} is a simple method which the client uses to request a data item from a server. The data requested is denoted by the URI address (Uniform Resource Identifier). \emph{POST} is used by the client to send data to a server. In this application the client is requesting messages from the server so \emph{GET} is the appropriate call to use~\cite{bib:http}.
 64
 65\emph{JQuery}\footnote{\url{http://jquery.com/}} is a JavaScript library that provides, among other things, AJAX interactions. In order to perform the HTTP GET to get new messages we use the AJAX \texttt{get} method. JQuery provides a simple interface for this and \emph{ocamljs} provides an interface to the JQuery library, much like the \texttt{Javascript} module mentioned in section \ref{lab:javascript}. The \texttt{JQuery} module has a \texttt{get} method with the following interface:
 66
 67\texttt{method get : string -> 'a -> ('b -> string -> unit)\\ -> Dom.xMLHttpRequest}
 68
This method takes a URL (string), a set of parameters and a callback function, and returns an HTTP request object. Here the important parts are the URL and the callback. The callback is a function which takes the resulting data and the response string as its parameters. The response will be \emph{success} if the request was completed.
 70
 71\subsection{HTTP server}
There needs to be an HTTP server running on a remote machine to provide the application web page and the data requested by the client. Both the page and the data must come from the same server because AJAX requests must be made to the same domain and port as the page. If they are not, the method throws a cross-site scripting exception. This restriction is built into JavaScript because cross-site requests are a security risk~\cite{bib:xss}.\label{lab:xss}
 73
 74The web server this project shall use is called \emph{ocaml-cohttpserver}\footnote{\url{https://github.com/avsm/ocaml-cohttpserver}}. It is an OCaml library which parses HTTP requests and responds to them. The reason for using this implementation is that it is written in OCaml and therefore an OCaml program can be written which generates these debugging messages.
 75
 76\subsubsection{Static Pages}
 77One issue with this server is that it does not provide a way to serve static files (such as HTML pages). The OCaml standard library provides a way of opening files so we can open a file and put its contents into the body of the HTTP reply. The code for serving up a file is as follows:
 78
\begin{lstlisting}[caption={Serve a file from disk}]
let get_file file req =
  (* the reply body is built from the file size and a channel *)
  let size = (Unix.stat file).Unix.st_size in
  let fd = Unix.openfile file [Unix.O_RDONLY] 0o444 in
  (* wrap the descriptor in an Lwt input channel which closes
     the file when the channel is closed *)
  let ic = Lwt_io.of_unix_fd
    ~close:(fun () -> Unix.close fd; Lwt.return ())
    ~mode:Lwt_io.input fd in
  let _, u = Lwt.wait () in
  let body = [`Inchan (Int64.of_int size,  ic, u)] in
  return (dyn req body)
\end{lstlisting}
 90
 91\subsubsection{Log Data}
 92The server will also create some dummy data to test the Log Viewer. In order to make the data look interesting, there will be a state machine that will start and end threads, create fake messages and function enters and exits. Each state machine represents a new thread in the \emph{application} (the fake one which is being debugged). Each time the state machine progresses to the next state it generates a log message for that new state. These messages are then given to the client. The state of each thread is changed each time the client requests more messages. Figure \ref{fig:state} shows the state diagram for each state machine. The code for the state machine is displayed below:
 93
\begin{lstlisting}[caption={state machine and \texttt{get\_events} function from thread\_state.ml}]
...
type thread_state = Started | Running | FunEnter
                  | FunExit | Msg | Finished | Stop
...

class thread =
object (self)

...

  (* don't let us stop when in a function *)
  method enterext = if in_fun then FunExit else FunEnter
  method stopexit = if in_fun then FunExit else Finished
  method next_state p = function (* state machine *)
  | Started -> if p < 80 then Running
          else if p < 90 then self#enterext else Msg
  | Running -> if p < 40 then Running
          else if p < 60 then self#enterext
          else if p < 90 then Msg else self#stopexit
  | FunEnter -> if p < 50 then Running
           else if p < 70 then FunExit else Msg
  | FunExit ->  if p < 40 then Running
           else if p < 50 then self#enterext else
                if p < 80 then Msg else self#stopexit
  | Msg -> if p < 60 then Running
      else if p < 80 then self#enterext
      else if p < 90 then Msg else self#stopexit
  | Finished -> Stop
  | Stop -> Stop

...

end

...

let threads = ref []
let get_events () = (* fetch next events *)
  threads := List.filter
    (fun t -> t#state <> Stop) !threads;
  if (Random.int 100) < 10 then
    threads := (new thread) :: !threads;
  let events = List.fold_right
     (fun t -> t#next_event) !threads [] in
  Json_io.string_of_json (Events.jsonify events)
\end{lstlisting}
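
Because the listing above elides parts of the class, the sketch below restates the transition function as a self-contained pure function that can be run on its own. It assumes, since the original code for it is not shown, that a thread counts as being inside a function from \texttt{FunEnter} until the next \texttt{FunExit}. The free function \texttt{next\_state}, the local names \texttt{enterexit} and \texttt{stopexit}, and \texttt{run\_thread} are illustrative and are not taken from thread\_state.ml.

\begin{lstlisting}[caption={Sketch: a stand-alone version of the test data state machine}]
type thread_state = Started | Running | FunEnter
                  | FunExit | Msg | Finished | Stop

(* in_fun: is the thread inside a function?  p: a roll in 0..99 *)
let next_state ~in_fun p state =
  let enterexit = if in_fun then FunExit else FunEnter in
  let stopexit = if in_fun then FunExit else Finished in
  match state with
  | Started -> if p < 80 then Running
          else if p < 90 then enterexit else Msg
  | Running -> if p < 40 then Running
          else if p < 60 then enterexit
          else if p < 90 then Msg else stopexit
  | FunEnter -> if p < 50 then Running
           else if p < 70 then FunExit else Msg
  | FunExit -> if p < 40 then Running
          else if p < 50 then enterexit
          else if p < 80 then Msg else stopexit
  | Msg -> if p < 60 then Running
      else if p < 80 then enterexit
      else if p < 90 then Msg else stopexit
  | Finished -> Stop
  | Stop -> Stop

(* Run one thread to completion and return the states visited. *)
let run_thread () =
  let rec go in_fun state acc =
    if state = Stop then List.rev (Stop :: acc)
    else
      let next = next_state ~in_fun (Random.int 100) state in
      (* assumption: FunEnter sets the flag, FunExit clears it *)
      let in_fun = match next with
        | FunEnter -> true
        | FunExit -> false
        | _ -> in_fun in
      go in_fun next (state :: acc)
  in
  go false Started []
\end{lstlisting}

Every trace produced by \texttt{run\_thread} starts with \texttt{Started} and ends with \texttt{Finished} followed by \texttt{Stop}, and a thread never finishes while it is inside a function.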
141
142\begin{figure}
143  \centering
144  \includegraphics[width=10cm]{graphs/state.png}
145  \caption{State machine diagram for test data (\emph{f} means in a function, \emph{!f} is not in a function)}
146  \label{fig:state}
147\end{figure}
148
149\subsubsection{JSON generation}
150Before a list of messages can be sent back to the client the list needs to be serialised into a JSON string. The OCaml module \emph{json-wheel} provides helper functions for this. It has a collection of methods which convert different OCaml types into JSON strings. This string can then be sent to the client in the body of the HTTP reply~\cite{bib:json_rfc}.
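
To make the serialisation step concrete, the sketch below builds a JSON array by hand using only the OCaml standard library. It is a stand-in for illustration and does not use the \emph{json-wheel} interface; the \texttt{event} record and its fields are hypothetical, and the escaping handles only quotes, backslashes and newlines.

\begin{lstlisting}[caption={Sketch: hand-written JSON serialisation of events}]
(* Hypothetical event record; the real fields may differ. *)
type event = { thread : int; time : float; text : string }

(* Escape the characters handled by this sketch. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter (function
    | '"' -> Buffer.add_string b "\\\""
    | '\\' -> Buffer.add_string b "\\\\"
    | '\n' -> Buffer.add_string b "\\n"
    | c -> Buffer.add_char b c) s;
  Buffer.contents b

(* One event becomes one JSON object. *)
let json_of_event e =
  Printf.sprintf "{\"thread\":%d,\"time\":%g,\"text\":\"%s\"}"
    e.thread e.time (escape e.text)

(* A list of events becomes a JSON array. *)
let json_of_events es =
  "[" ^ String.concat "," (List.map json_of_event es) ^ "]"
\end{lstlisting}

A library such as \emph{json-wheel} is preferable in practice because it covers the full set of escaping rules.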
151
152\subsection{froc}
153The messages are being displayed on a time line. As newer messages arrive the amount of time represented by the time line will increase. When this happens all the objects representing messages will have to realign so that they are in the correct place on the time line. The placement function for each element can be \emph{bound} (see section \ref{lab:behavior}) to a \emph{froc} \emph{behavior} which represents the time line. Whenever this \emph{behavior} is updated the placement function will be called for each element and they will realign themselves.
154
155\subsubsection{froc-lists}
156\label{lab:froc-list}
157There is also a further use of \emph{froc} in this application. Ideally all the currently displayed messages would be stored in a special list which was \emph{bound} such that whenever the contents of the list changed elements were created for new messages (and destroyed for old ones) and all the bindings for each element in the list were applied to new ones.
158
Unfortunately, in OCaml, list objects are immutable (like most objects in functional languages). This means that appending to a list actually creates a new list object, as does removing an element from the head of a list. If we used the data type \texttt{'a list behavior}, then whenever the \emph{behavior} changed the callback function would be run over the whole list again. We only want to run it on new elements, to avoid doing unnecessary work or creating duplicate elements.
160
To solve this issue a new OCaml data type needs to be created. It uses two lists, one for the front elements and one for the back elements, to allow adding to both ends of the list. Internally it holds a list of functions which are bound to any new element added to the list; conversely, a newly added function is bound to every existing list element. There are push and pop methods to add and remove elements and a method which returns the data structure as an OCaml list.
162\vfill\pagebreak
163The interface for the \emph{froc-list} should be as follows:
164
\begin{lstlisting}[caption={flist.mli}]
class ['a] flist :
  object
    val mutable first : 'a Froc.behavior list
    val mutable fs : ('a -> unit) list
    val mutable last : 'a Froc.behavior list
    method lift : ('a -> unit) -> unit
    method lift_all : 'a Froc.behavior -> unit
    method list : 'a Froc.behavior list
    method pop : 'a Froc.behavior
    method pop_end : 'a Froc.behavior
    method push : 'a Froc.behavior -> unit
    method push_end : 'a Froc.behavior -> unit
  end
\end{lstlisting}
180
181Here is the code for \emph{flist.ml}:
182
\begin{lstlisting}[caption={flist.ml}]
class ['a] flist =
object (self)
  val mutable first = [] (* front elements, head first *)
  val mutable last = []  (* back elements, newest first *)
  val mutable fs = []    (* functions bound to every element *)
  (* bind a new function to all existing elements *)
  method lift (f : 'a -> unit) =
    begin
      let l o = ignore (Froc.lift f o) in
      fs <- f :: fs;
      List.iter l first;
      List.iter l (List.rev last)
    end
  (* bind all known functions to one new element *)
  method lift_all o = (* internal *)
    begin
      let l f = ignore (Froc.lift f o) in
      List.iter l fs
    end
  (* front elements followed by the reversed back elements *)
  method list = List.rev_append
    (List.rev first) (List.rev last)
  method push o =
    begin
      self#lift_all o;
      first <- o :: first
    end
  method push_end o =
    begin
      self#lift_all o;
      last <- o :: last
    end
  (* raises Failure if nothing is stored at the front *)
  method pop =
    begin
      let hd = List.hd first in
      first <- List.tl first;
      hd
    end
  (* raises Failure if nothing is stored at the back *)
  method pop_end =
    begin
      let hd = List.hd last in
      last <- List.tl last;
      hd
    end
end
\end{lstlisting}
227
Figure \ref{fig:flist-comp} shows the complexity of each of the froc-list methods. Pushing to and popping from the list is computationally inexpensive. Lifts occur in $O(n)$ but are rare. The method \texttt{list} has the same complexity; however, it is called much more often (every time the data structure needs to be read). A function is \emph{tail recursive} when its recursive call is the last thing it does, which lets it run in constant stack space; a recursive function that is not tail recursive uses an $O(n)$ sized stack. The functions \texttt{List.rev} and \texttt{List.rev\_append} used by \texttt{list} are tail recursive in the OCaml standard library, so \texttt{list} itself does not need an $O(n)$ sized stack. The potential limiting factor for the number of items the \emph{froc-list} can store is therefore the $O(n)$ time and allocation of each call to \texttt{list}.
229
230\begin{figure}
231  \centering
232  \begin{tabular}{|l|l|}
233    \hline
234    \textbf{Function} & \textbf{Complexity} \\
235    \hline
236      lift & $O(n)$ \\
237      \hline
238      list & $O(n)$\\
239      \hline
240      push/push\_end & $O(1)$ \\
241      \hline
242      pop/pop\_end & $O(1)$ \\
243      \hline
244  \end{tabular}
245  \caption{Time Complexity for \emph{froc-list} methods}
246  \label{fig:flist-comp}
247\end{figure}
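
The behaviour of the data structure can be tried out without \emph{froc} using the simplified model below. This is a sketch under stated assumptions: plain values replace \emph{behavior}s, calling a function directly replaces \texttt{Froc.lift}, and the class name \texttt{model\_list} is invented for the example.

\begin{lstlisting}[caption={Sketch: a froc-free model of the froc-list}]
(* Plain values stand in for behaviors and a direct call
   stands in for Froc.lift. *)
class ['a] model_list =
object
  val mutable first : 'a list = []
  val mutable last : 'a list = []
  val mutable fs : ('a -> unit) list = []
  (* run a new function once on every existing element *)
  method lift (f : 'a -> unit) =
    begin
      fs <- f :: fs;
      List.iter f first;
      List.iter f (List.rev last)
    end
  (* run every known function once on the new element only *)
  method push (x : 'a) =
    begin
      List.iter (fun f -> f x) fs;
      first <- x :: first
    end
  method push_end (x : 'a) =
    begin
      List.iter (fun f -> f x) fs;
      last <- x :: last
    end
  method list = List.rev_append
    (List.rev first) (List.rev last)
end
\end{lstlisting}

Pushing an element runs each registered function on that element alone, which is the property the plain \texttt{'a list behavior} approach lacks.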
248
249\subsubsection{Controlling the Time Range}
250Displaying all the messages at once quickly becomes impractical; messages are positioned so close together it gets difficult to read each individual one. A useful feature would be a way to zoom in on a particular time range. All the message elements are bound to the minimum and maximum time range displayed. Providing a way to update these values should be sufficient to perform the zooming.
251
One such user interface element that can be used for this is a \emph{spinner}. A spinner provides a numerical input box and two buttons to increment and decrement the current value. The value of the spinner can be bound to one of the time range boundaries so two spinners can control the time range displayed. Any messages whose timestamps are outside the currently displayed range can be hidden. The value shown in the spinner can also be bound to the value for the time range so that when new messages arrive the spinners display the new values for the range. Figure \ref{spinners} shows the dependency graph for the spinners. In this graph, the value \emph{t0} depends on \emph{min} and \emph{min} depends on \emph{t0}. This cycle does not cause a problem in \emph{froc} because it does not propagate an update when the value has not changed. Without this check, each value in a cycle would keep updating the other with the same value and propagation would never terminate.
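
The reason the cycle terminates can be illustrated with a small model. The sketch below is not \emph{froc}'s implementation; it is a hypothetical mutable cell (the names \texttt{cell}, \texttt{make}, \texttt{set} and \texttt{on\_change} are assumptions) which notifies its listeners only when the stored value actually changes.

\begin{lstlisting}[caption={Sketch: change-only propagation in a two-cell cycle}]
(* A cell that notifies listeners only on a real change. *)
type 'a cell = {
  mutable value : 'a;
  mutable listeners : ('a -> unit) list;
}

let make v = { value = v; listeners = [] }

let on_change c f = c.listeners <- f :: c.listeners

let set c v =
  if c.value <> v then begin
    c.value <- v;
    List.iter (fun f -> f v) c.listeners
  end

(* Wire two cells into a cycle and count the updates. *)
let updates = ref 0
let t0 = make 0
let min_t = make 0
let () =
  on_change t0 (fun v -> incr updates; set min_t v);
  on_change min_t (fun v -> incr updates; set t0 v);
  set t0 5
\end{lstlisting}

Setting \texttt{t0} updates \texttt{min\_t}, which tries to set \texttt{t0} to the value it already holds, so propagation stops after two notifications.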
253
254\begin{figure}
255  \centering
256  \includegraphics[scale=0.5]{graphs/spinners.png}
257  \caption{Dependency Graph for Spinners}
258  \label{spinners}
259\end{figure}
260
261\begin{figure}
262  \centering
263  \includegraphics[scale=0.75]{images/visualiser.pdf}
264  \caption{Screen-shot of Log Viewer control}
265  \label{fig:visualiser}
266\end{figure}
267
268\subsection{Pie Chart}
269Rendering the thread view for the Log Viewer requires using HTML \emph{DIV} elements. These are given position and size values and are rendered by the browser. A pie chart cannot be drawn by DIVs which are rectangular. In order to draw arbitrary shapes we need to use the HTML \emph{CANVAS} element. 
270
271\subsubsection{HTML CANVAS Element}
The CANVAS element is new to \emph{HTML5}. It provides a bitmap image and functions to draw 2D shapes and lines. It is much more flexible than drawing using HTML DIVs; however, removing objects from the scene requires erasing and redrawing whole sections. The functions for drawing and filling arcs and paths can be used to draw the outline and pieces of the pie chart~\cite{bib:html5}.
273
274\subsubsection{Counting Message Types}
275It would not be very efficient to iterate through all messages counting how many of each type there are every time the pie needs to be redrawn. Instead a data type can be used which maps the names of message types to values representing how many of those messages have been seen. When a new message arrives, its type is read and the appropriate counter is incremented. \emph{froc} can be used to bind the value for each counter to the method which redraws the pie chart and then this will happen whenever the counter values are changed.
276
277The size of each slice of pie represents the proportion of that type of message.
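
As an illustration of how the counters translate into slices, the sketch below turns a list of (message type, count) pairs into start and end angles in radians. The function name \texttt{slices} and the choice to start at angle zero are assumptions made for this example; the drawing calls themselves are omitted.

\begin{lstlisting}[caption={Sketch: computing pie slice angles from counters}]
let pi = 4.0 *. atan 1.0

(* (type, count) pairs -> (type, start angle, end angle) *)
let slices counts =
  let total =
    List.fold_left (fun acc (_, n) -> acc + n) 0 counts in
  if total = 0 then []
  else
    let _, result =
      List.fold_left
        (fun (start, acc) (name, n) ->
          (* each slice sweeps its share of the full circle *)
          let sweep =
            2.0 *. pi *. float_of_int n /. float_of_int total in
          (start +. sweep, (name, start, start +. sweep) :: acc))
        (0.0, []) counts
    in
    List.rev result
\end{lstlisting}

Each triple can then be passed to the canvas arc-drawing function to fill one slice.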
278
279\begin{figure}
280  \centering
281  \includegraphics[scale=0.75]{images/pie.pdf}
282  \caption{Screen-shot of pie chart control}
283  \label{fig:pie}
284\end{figure}
285
286\subsection{Word Cloud}
The word cloud is similar to the pie chart. Instead of counting objects in predefined categories (such as message type) it counts occurrences of an unbounded set of words, so there is no limit to the number of objects counted. The data structure to use in this case is a hash table where the word that is being counted is the key and the number of past occurrences is the value. When a new word appears and the look-up in the hash table fails, a new entry is added. Each update runs in expected $O(1)$ time; however, the space requirements are $O(n)$ and the hash table will require $O(n)$ time each time it grows.
288
To represent the data the words are drawn on a canvas, each with its font size (in pixels) set in proportion to how often the word is used.
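
The counting and sizing steps can be sketched as follows using the standard \texttt{Hashtbl} module. The function names and the pixel bounds \texttt{min\_px} and \texttt{max\_px} are assumptions for this example; the original sizing rule may differ in detail.

\begin{lstlisting}[caption={Sketch: counting words and choosing a font size}]
(* word -> number of occurrences seen so far *)
let counts : (string, int) Hashtbl.t = Hashtbl.create 64

let add_word w =
  let n = match Hashtbl.find_opt counts w with
    | Some n -> n
    | None -> 0 in
  Hashtbl.replace counts w (n + 1)

(* split a message on spaces and count each word *)
let add_message text =
  String.split_on_char ' ' text
  |> List.filter (fun w -> w <> "")
  |> List.iter add_word

(* font size in pixels, scaled by the word's share of all words *)
let font_size ?(min_px = 10) ?(max_px = 40) w =
  let total = Hashtbl.fold (fun _ n acc -> acc + n) counts 0 in
  let n = match Hashtbl.find_opt counts w with
    | Some n -> n
    | None -> 0 in
  if total = 0 then min_px
  else min_px + (max_px - min_px) * n / total
\end{lstlisting}

Recomputing the total on every call is $O(n)$; keeping a running total alongside the table would avoid that cost.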
290
291\begin{figure}
292  \centering
293  \includegraphics[scale=0.75]{images/cloud.pdf}
294  \caption{Screen-shot of word cloud control}
295  \label{fig:cloud}
296\end{figure}
297
298\section{Dataset Graph (Application 2)}
299\label{lab:ds-graph}
300\emph{This application will show some data with three variables, one on each axis and one that is varied using a \emph{play} function}.
301
302Hans Rosling's \emph{The Joy of Stats}\cite{bib:stats} contains a good example of a graph with these properties. It compares Life Expectancy vs GDP (Gross Domestic Product) all against time. The graph itself will be straightforward to draw; DIV elements can be used to represent each point. \emph{froc} can be used to position each div as the time variable changes value.
303
304\subsubsection{Data Sources}
305\emph{The World Bank} has yearly GDP and life expectancy data for most countries from 1960 to 2010. Their website provides an HTTP interface and the data can be retrieved in JSON format. This data can be used to recreate a graph similar to that used in \emph{The Joy of Stats}.
306
307It is a good idea to cache all the data required on the machine that hosts the HTTP server. This saves repeatedly making the same requests to The World Bank servers, saves on bandwidth and avoids the cross-site scripting problem mentioned in section \ref{lab:xss}. There are 245 pages of this data so a shell script was used to make all the data requests. This script is shown below.
308
\lstset{language=bash}
\begin{lstlisting}
#!/bin/sh
# Download every page of one World Bank indicator as JSON.
if [ "$1" = "gdp" ]
then
  OUT="gdp"
  URL="http://api.worldbank.org\
/countries/all/indicators/NY.GDP.MKTP.CD"
else
  OUT="life"
  URL="http://api.worldbank.org\
/countries/all/indicators/SP.DYN.LE00.IN"
fi

for n in $(seq 1 245)
do
  # quote the URL so that the shell does not treat '&'
  # as "run in the background"
  wget -O "${OUT}-${n}.json" \
    "${URL}?format=json&per_page=50&page=${n}" && sleep 5
done
\end{lstlisting}
\lstset{language=caml}
329
330\subsubsection{Loading Data}
All the data is requested from the server when the application is loaded. The data cannot be put into a single file: it contains a lot of bloat, and the resulting string is longer than the maximum size the JavaScript \emph{eval} function will accept. This is why the data is requested as 245 pages. The data is then stored in a series of hash tables, one for each variable. The hash tables map the year to a list of key-value pairs. Here the key is the country ID and the value is the value for that country in that year. Figure \ref{fig:hashtbl} illustrates how data is stored in this application.
332
333\begin{figure}
334\centering
335\includegraphics{images/hashtbl.pdf}
336\caption{Data storage in the Dataset Graph}
337\label{fig:hashtbl}
338\end{figure}
339
\emph{froc} will be used to move the data points around. Two \emph{behavior}s will be needed for each point, one for each axis. There is a global \emph{behavior} which represents the current time displayed. When the time variable is updated, this triggers a function which loads the data for the new year. This updates the values for each of the data point \emph{behavior}s. These \emph{behavior}s are stored in a hash table indexed by country ID.
341
Look-up in a hash table is expected to be $O(1)$. This means that loading the data and repositioning the data points when the year changes should be $O(n)$ (there are $n$ new data points and updating a \emph{froc} variable will be $O(1)$).
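
The storage layout and the per-year update can be sketched with the standard \texttt{Hashtbl} module. In this sketch a plain \texttt{ref} stands in for a \emph{froc} \emph{behavior}, only one axis is shown, and the names \texttt{gdp}, \texttt{points}, \texttt{add\_point} and \texttt{show\_year} are hypothetical.

\begin{lstlisting}[caption={Sketch: year-indexed storage and the per-year update}]
(* year -> (country ID, value) list; one table per variable *)
let gdp : (int, (string * float) list) Hashtbl.t =
  Hashtbl.create 64

let add_point tbl year country value =
  let old = match Hashtbl.find_opt tbl year with
    | Some l -> l
    | None -> [] in
  Hashtbl.replace tbl year ((country, value) :: old)

(* current value of each data point, indexed by country ID;
   a ref stands in for a froc behavior *)
let points : (string, float ref) Hashtbl.t = Hashtbl.create 64

(* load one year: one table look-up, then one update per point *)
let show_year tbl year =
  match Hashtbl.find_opt tbl year with
  | None -> ()
  | Some row ->
    List.iter
      (fun (country, v) ->
        match Hashtbl.find_opt points country with
        | Some r -> r := v
        | None -> Hashtbl.add points country (ref v))
      row
\end{lstlisting}

Changing the year costs one look-up for the row plus one look-up and one assignment per data point, which matches the $O(n)$ estimate above.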
343
344\begin{figure}
345  \centering
346  \includegraphics[scale=0.5]{images/graph.pdf}
347  \caption{Screen-shot of data-set graph control}
348  \label{fig:graph}
349\end{figure}
350
351\section{Heat Map (Application 3)}
352\emph{Create an application which displays data about energy usage over time for a number of rooms in a building}.
353
354A heat map can be used to show the relative amounts of energy usage in different rooms at a given time. The higher the value for that room the \emph{hotter} the colour of that part of the map. A heat map is a graph where the X and Y axes represent physical location. This will have a similar implementation to the Dataset Graph in section \ref{lab:ds-graph}.
355
356\subsection{Rendering the Map}
One difference from the graph application is that the data points are no longer points but rooms, which have arbitrary shapes. These could be drawn using either a canvas element or multiple DIVs; it is unlikely to make a difference which is used. Since the exact shapes are just aesthetic features, the \emph{heat} of a room will be represented as a coloured DIV (square) positioned over the middle of the room. The layout of the building can be displayed by putting an IMAGE element (of the building plan) on the page\footnote{Map images from \url{http://www.cl.cam.ac.uk/maps/}}. This application uses effectively the same data structures as the Dataset Graph: it binds a function which colours the DIV elements to the \emph{froc} \emph{behavior}s stored in a hash table.
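
A colouring function of the kind described could look like the sketch below. The blue-to-red scale, the function name \texttt{heat\_colour} and the \texttt{lo} and \texttt{hi} bounds are assumptions made for this example; the colour scheme actually used is not specified here.

\begin{lstlisting}[caption={Sketch: mapping an energy reading to a CSS colour}]
(* Map a reading in [lo, hi] to a CSS colour running from
   blue (cold) to red (hot); out-of-range readings are clamped. *)
let heat_colour ~lo ~hi v =
  let t =
    if hi <= lo then 0.0 else (v -. lo) /. (hi -. lo) in
  let t = max 0.0 (min 1.0 t) in
  let r = int_of_float (255.0 *. t +. 0.5) in
  Printf.sprintf "rgb(%d,0,%d)" r (255 - r)
\end{lstlisting}

The resulting string can be assigned to the background colour style of a room's DIV whenever the bound value changes.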
358
359\begin{figure}
360  \centering
361  \includegraphics[scale=0.75]{images/map.pdf}
362  \caption{Screen-shot of Heat Map control}
363  \label{fig:map}
364\end{figure}