% /Thesis/formal_pbt.tex
% http://github.com/JacquesCarette/GenCheck
%\section{Formally define property-based testing systems}
% for convenience
\newcommand{\hf}{\ensuremath{\mathtt{f}}\xspace}
\newcommand{\Bool}{\ensuremath{\mathtt{Bool}}}
Testing a Haskell function \hf, using properties, consists of four largely
independent steps:
\begin{enumerate}
\item construct a property $p$ that \hf should have,
\item generate test cases (valid inputs for \hf),
\item evaluate the property over (a selection of) test cases,
\item report the results.
\end{enumerate}
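These four steps can be sketched directly in Haskell. The sketch below is
illustrative only: the property, the fixed list of inputs, and the reporting
are placeholders for the generation, selection, and reporting machinery
discussed in the rest of this section.

\begin{verbatim}
import Data.List (sort)

-- Step 1: a property that sorting should have (idempotence).
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

-- Step 2: test cases; here, a fixed selection of valid inputs.
testCases :: [[Int]]
testCases = [[], [1], [3,1,2], [5,5,4]]

-- Steps 3 and 4: evaluate the property over the cases and report.
main :: IO ()
main = putStrLn (if all prop_sortIdempotent testCases
                 then "all tests passed"
                 else "some test failed")
\end{verbatim}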
A \emph{test system} is a framework to perform these steps. As each
step admits a number of useful variations, such a framework is only
useful if it provides a variety of capabilities. To better establish the
requirements for a \pbt, we first refine the definition and scope of each
step.
\subsection{Properties}
Recall that in section~\ref{pbt}, we defined an \emph{implementation} $M$ as a
computable \emph{interpretation} of a specification $S$, where the specification
defines a set of symbols and their semantics as a set of axioms.
The symbols are simply the names of the functions exported from a module.
The purpose of a \pbt is to supply evidence supporting or refuting
the correctness of this implementation.

A \emph{property} $p : \alpha \ra \Bool$ is the implementation (using $M$) of
a predicate $P : A \ra \boolean$ in the specification. Note
that, as it is an implementation, it is necessarily computable; without
loss of generality, we can restrict properties to be univariate.

It is important to note that there is a concretization step happening here,
not just between predicates and properties, but also between data values:
$\dom{P} \subseteq A$ and $\alpha$. As is usual in computer science,
we will assume that $A$ and $\alpha$ are isomorphic, and will not explicitly
worry about this isomorphism. We can thus talk about $\dom{p}$ as well,
which is a ``subset'' of the values of type $\alpha$.

From now on, we will talk interchangeably about predicates and properties.
Furthermore, we will make some use of set notation over types; in the context
of PBT, this causes no harm since all the data we represent will
eventually be concrete, and thus have a Set model.
We say that a property $p$ holds if, over all values in its domain ($\dom{p}$),
the property evaluates to $\True$. Note that on $\alpha \setminus \dom{p}$, it
is not required to hold, nor even to be defined. If $p$ is partial, we will call
it a \emph{conditional} property. For these, a \pbt must be careful not to
attempt to evaluate the property outside its domain.

If we have a (total) function $f : \alpha \ra \Bool$ such that
$f\, a = \True \Leftrightarrow a \in \dom{p}$, we call $f$ a
\emph{characteristic function} for $p$. Such functions are very
useful for dealing with partial properties.
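For example (an illustrative sketch; the names are ours), the property below
is conditional: \texttt{head} and \texttt{last} are undefined on the empty
list, so its domain is the non-empty lists, and a characteristic function for
that domain lets us guard the evaluation.

\begin{verbatim}
-- A conditional property: meaningful only on non-empty lists.
prop_headOfReverse :: [Int] -> Bool
prop_headOfReverse xs = head (reverse xs) == last xs

-- A total characteristic function for its domain.
inDomain :: [Int] -> Bool
inDomain = not . null

-- Guarded evaluation: Nothing signals a datum outside the domain.
evalGuarded :: [Int] -> Maybe Bool
evalGuarded xs
  | inDomain xs = Just (prop_headOfReverse xs)
  | otherwise   = Nothing
\end{verbatim}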
The collection of properties implementing the predicates of the specification
will be called the testable specification,
or just the specification where it is clear from context that
the reference is to the properties and not the abstract predicates.
\jacques{I deleted all the ``formal'' definitions, as they were completely
equivalent to what is already above. The above is sufficiently formal.}
\subsection{Test Cases and Test Suites}
For this subsection, fix a predicate $p : \alpha \ra \Bool$.
Recall that a \emph{test case} contains a single value (the \emph{datum}) of
type $\alpha$, and that a \emph{test suite} is a collection of test
cases.
Each test case may include additional information (\emph{meta-data})
that can be used to improve the evaluation of the test cases or the reporting of the results.
A datum will not necessarily be in the property's domain,
in which case the test case is characterized as \emph{invalid}.
The meta-data can be arbitrary, but their type must be uniform over a test suite.

Given a test case $\tau : \beta$, we assume that we have a projection function $\datum$
to extract the data.
\begin{df}[Datum Function]
$$\datum : \beta \ra \alpha$$
\end{df}
\begin{df}[Valid Test Case]
A test case $\tau$ is \emph{valid} for a
property $p$ if its datum is guaranteed to be in the property's domain:
$$ \datum(\tau) \in \dom{p} $$
\end{df}
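A concrete Haskell rendering of these definitions might look as follows
(a sketch; the record and field names are illustrative, and the
characteristic function is passed in as an assumed argument).

\begin{verbatim}
-- A test case: a datum of type a together with meta-data of type m.
data TestCase m a = TestCase { meta :: m, datum :: a }

-- Validity of a test case, via a characteristic function for
-- the property's domain.
validFor :: (a -> Bool) -> TestCase m a -> Bool
validFor inDom tc = inDom (datum tc)
\end{verbatim}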
\subsection{Evaluation, Selection}
We first define what happens when evaluating a single test case.
Second, we define some items relating to test suites.
Then we deal with actually running a test suite.
\subsubsection{Verdicts and Results}
The evaluation of a property at a datum does not merely produce a boolean
value. Other data, such as the execution time or system resources used,
can also be tracked as part of the \emph{result} of running a test.
Of course, if the property evaluates to $\True$, we say that the
property holds (a successful test evaluation), while for $\False$,
a failure. But this is not the only way a test can ``fail'':
it is also possible that the test case was not valid, or that the
property evaluation took too long. Lastly, the framework may decide not to
evaluate every test case in a test suite, and this can be reported as well.
In other words, rather than just pass/fail, we define a finer-grained notion
of a \emph{verdict}, which encapsulates the outcome of a test case evaluation.
\begin{df}[Verdict]
A \emph{verdict} describes the result of a test evaluation. We use the label
set $\verdictset = \{ \success, \fail, \nonterm, \invalid, \noteval \}$
as verdicts, and these should be interpreted as:
\begin{description}
\item[$\success$] the property holds over a valid test case,
\item[$\fail$] the property does not hold over a valid test case,
\item[$\nonterm$] the property evaluation did not terminate in the allotted time,
\item[$\invalid$] the test case was not in the property's domain, so was not evaluated,
\item[$\noteval$] the test case was not evaluated and its validity is unknown.
\end{description}
\end{df}
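One possible Haskell rendering of the verdict set is the following sketch;
the constructor order is a choice of ours, made so that a derived
\texttt{Ord} instance agrees with the summary ordering introduced later in
this section.

\begin{verbatim}
-- The verdict set as a Haskell data type. The constructor order is
-- chosen so that the derived Ord matches the summary ordering
-- Fail > NonTerm > Success > Invalid > NotEval.
data Verdict = NotEval | Invalid | Success | NonTerm | Fail
  deriving (Eq, Ord, Enum, Bounded, Show)
\end{verbatim}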
Note that the verdict $\nonterm$ can also be used to indicate that an
exception was raised during the evaluation of the test case,
assuming the evaluation function is capable of trapping it.

In general, different evaluation functions may be used. They differ mainly
in what kinds of verdicts they can return. \emph{Unbounded} evaluation
assumes that $\nonterm$ is impossible; \emph{unconditional} evaluation
can be used when $\invalid$ is impossible.
A \emph{result} consists of a test case, the verdict of the evaluation of the
property for that case, and any additional information about the evaluation of
the property at that value. It must be possible to extract the verdict from
a result.
\begin{df}[Verdict Function]
$$\verdict : \gamma \ra \verdictset$$
\noindent where $\gamma$ is a result type.
\end{df}
\subsubsection{Test suites}
A test suite $T$ is a collection of test cases. $T$ can be
considered to be a set of test cases, although we will rarely
implement it that way. For concreteness, we will use $t$ to
denote a \emph{container}, $\beta$ a type of test cases,
and thus $t \beta$ will be the type of a test suite. $t$
will generally be assumed to be a \texttt{Functor} as well
as \texttt{Traversable}.
We can lift definitions previously made on test cases to test suites.
For example, a test suite is \emph{valid} if all the test cases it contains
are valid.

A test suite may be partitioned (and labelled) to
assist in prioritizing the evaluation of test cases and reporting the results.
The labels could be based on the test data, meta-data or
other evaluation information.
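For instance, given a per-case validity check, the lifting is just a fold
over the container (a sketch assuming only the \texttt{Foldable} superclass
of \texttt{Traversable}; \texttt{validCase} stands for the per-case check).

\begin{verbatim}
-- A suite held in any Foldable container t is valid when every
-- test case it contains is valid.
validSuite :: Foldable t => (b -> Bool) -> t b -> Bool
validSuite validCase = all validCase
\end{verbatim}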
\begin{df}[Labelled Partition]
A labelled partition, with label set $\labels$, is an ordered sequence of labelled test suites
$$ \Pi : \nat \ra (\labels, t \beta)$$
\end{df}
\noindent
The verdict over a test suite is determined by interpreting the collective
verdicts of each result; partial verdicts can also be determined for any part
of the result set.
\subsubsection{Execution of a suite}
While a test suite is any collection of tests, this does not mean that each
of these will actually be run. We can use different methods of choosing
when to stop testing, for example:
\begin{itemize}
\item terminate testing when a fixed number of errors is found (\QC, \SC),
\item terminate after a fixed period of time, or
\item complete all tests.
\end{itemize}
We will naturally say that an execution of a test suite
is \emph{complete} if all cases were evaluated (even if some resulted
in $\invalid$ verdicts); \emph{partial} if some verdicts were $\noteval$;
and \emph{time limited} if some were $\nonterm$. These can be combined.
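The first strategy can be sketched as follows. The sketch is illustrative and
is kept polymorphic in the verdict type: it takes a failure detector and a
verdict to assign to the skipped cases (which would be $\noteval$).

\begin{verbatim}
-- Evaluate cases in order, stopping at the first failure; the
-- remaining cases receive the 'skipped' verdict (e.g. NotEval).
stopOnFirstFail :: (v -> Bool)  -- is this verdict a failure?
                -> (a -> v)     -- evaluate one test case
                -> v            -- verdict for unevaluated cases
                -> [a] -> [v]
stopOnFirstFail isFail eval skipped = go
  where
    go []     = []
    go (c:cs) = let v = eval c
                in if isFail v
                   then v : map (const skipped) cs
                   else v : go cs
\end{verbatim}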
\subsubsection{Summary Verdicts}
The verdict of a test suite is based on the combined verdicts of each individual result.
We compute it by
\begin{itemize}
\item putting an order structure on verdicts:
$$ \fail > \nonterm > \success > \invalid > \noteval$$
\item taking the induced join-semilattice structure (with join $\vee$, i.e.\ maximum), and
\item using the \texttt{Foldable} structure of the container to fold the join
\end{itemize}
\noindent over the verdicts.
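Concretely, with verdicts ordered as above, the join is just \texttt{max} and
$\noteval$ is the bottom element, so the summary verdict is a fold (a sketch
assuming the verdict type carries derived \texttt{Ord} and \texttt{Bounded}
instances consistent with that order).

\begin{verbatim}
-- Summary verdict over any Foldable container of verdicts:
-- fold the join (max under the ordering above), starting from
-- the bottom element (NotEval, i.e. minBound).
summaryVerdict :: (Foldable t, Ord v, Bounded v) => t v -> v
summaryVerdict = foldr max minBound
\end{verbatim}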
\subsection{Reporting}
\jacques{you were doing the $4$ steps up to now, so logically you should
be talking about reporting at this point.}
A report presents the results of a test suite in a way that is useful to the
(human) tester.
Reports provide the verdict of a test and some level of detail about the results,
organizing them in different ways depending on the goal of the report.
Test reports should highlight any failed (or non-terminating) test cases.

Report components should use the generic interface to
the results to get the verdicts and test case values.
Optional evaluation information should be provided by
\emph{specializing} the results data structure, the report component,
and either the test case generator (for test meta-data) or
the execution component (for test evaluation information).
\section{Test Systems}
\jacques{Made this into a section, as it is not about the $4$ steps anymore}
A test system provides additional functionality over the $4$ steps described above.
In particular, it can
\begin{enumerate}
\item obtain actual test cases from a generator,
\item schedule the evaluation of test cases,
\item perform the evaluation of test cases (with potential early termination),
\item organize the test results,
\item report the verdict and optionally a summary and/or details of the results.
\end{enumerate}
\noindent
These components should adhere to a common interface
but can be specialized to exchange additional information.
Scheduling refers to providing the order in which the test cases in a test suite are evaluated.
Scheduling may be implicit,
such as when applying |map| with an evaluation function over the test suite container.
For example,
\QC, \SC and the other packages reviewed in section~\ref{pbt_soft}
generate and evaluate test cases sequentially,
terminating on the first error (by default).
On distributed systems, scheduling would include
the allocation of evaluations to different nodes
and the collection of the results into a common result set.

Test \emph{results} should be considered distinct from \emph{reports}, as discussed above.