\documentclass{sigplanconf}
\input{preamble}
\begin{document}
\maketitle
\section{Introduction}
\todo{page limit = 12}
\todo{distinguish occurrences of ``\ig '' between the approach, the generic
view, and the library}
\todo{Cite \url{http://dreixel.net/research/pdf/fcadgp.pdf}}
Generic programming libraries deliver, in the language, a degree of reuse that
is conventionally attainable only by metalingual capabilities such as special
support from compilers and generative programming. The \ig\ Haskell library,
for example, generically defines a function testing for equality, a function
for printing, and an ``empty'' value, all of which can be instantiated for most
data types with minimal integration effort \citep{instant-generics}. Moreover,
the library user can define their own extensible generic values. The common
trait of these functions is that they can be defined in terms of the type
structure of their domains and ranges.

This paper presents the \yoko\ library, \todo{\ig\ is strictly defined in a way
which forbids extension: it's an approach framework that we're working
in. We're extending its demonstration on Hackage.} which extends the
\ig\ approach both (1) to enable more exact types for generically defined
values and (2) to generically define functions that require more
structural information than existing generic programming libraries provide.
The foundation of \yoko\ is a generic view \citep{generic-view} called
\emph{sum-of-constructors}, which enhances the common \emph{sum-of-products}
view. While sophisticated use of sum-of-products can almost emulate
sum-of-constructors, sum-of-products falls short in a way that violates
encapsulation of the original data type's representation, thereby sacrificing
both modularity and lucidity. Beyond these essential benefits to software
quality, the enhanced view accommodates a more complete reflection of data type
declarations. For example, it enables a generic treatment of
polymorphic variants \citep{poly-variants} and of transformations between
similar data types.
\begin{table}[h]
\begin{tabular}{l|cc}
\textbf{Syntactic Sort} & \textbf{\# of Types} & \textbf{\# of Constructors} \\
\hline
Type & 4 & 15\\ % (^Type^, ^TyVarBndr^, ^Kind^, ^Pred^), nominal: (^ForallT^, ^VarT^, ^ConT^, ^ClassP^)
Declaration & 12 & 37\\ % (^Dec^, ^Con^, ^Strict^, ^Foreign, Callconv^, ^Safety^, ^Pragma^, ^InlineSpec^, ^FunDep^, ^FamFlavour^, ^Fixity, FixityDirection^)
Expression & 8 & 41\\ % (^Exp^, ^Match^, ^Clause^, ^Body^, ^Guard^, ^Stmt^, ^Range^, ^Lit^)
Pattern & 1 & 14\\ % ^Pat^
\hline
\textbf{Total} & 25 & 107
\end{tabular}
\caption{The Template Haskell version 2.6 AST.\label{tab:th-ast}}
\end{table}
The statistics listed in \tref{Table}{tab:th-ast} for the Template Haskell
version 2.6 AST characterize the burden of each new data type. Each data type
in a Haskell compiler's pipeline would be approximately as large as the
Template Haskell AST. \todo{How many such data types in GHC?} Specifying
functions over such large ASTs is a daunting task that can be significantly
eased by leveraging generic programming as much as possible.

Existing generic programming libraries such as \ig\ successfully mitigate the
first burden, defining the same functions for each of the many types, by
nearly direct reuse of the generically defined functions. The second burden,
defining the transformations between adjacent types in the pipeline, however,
has so far remained beyond the purview of generic programming. The tedious
cases of such transformations require constructor-agnostic bookkeeping,
structural recursion, and a correspondence between the similar constructors of
adjacent types in the pipeline. Existing techniques can muster the first two of
these, but are incapable of automatically identifying corresponding
constructors except under prohibitively degenerate circumstances.

By supporting generic transformations between similar types, \yoko\ reduces the
cost of encoding invariants as data types. The key to this support is its
ability to identify corresponding constructors in distinct data types, which is
a reflective capability derived from the sum-of-constructors view. By reducing
the cost of encoding invariants in the type system, the \yoko\ generic
programming library enables higher assurance of compiler correctness.
This paper makes the following contributions.
\begin{enumerate}
\item Demonstrations of basic insufficiencies of the sum-of-products view as
used in the \ig\ Haskell package.
\item The incremental introduction of \yoko's enhancements redressing those
insufficiencies.
\item A definition of a realistic transformation using \yoko's unique
capabilities. Lambda lifting transforms from an object language with
anonymous functions to an object language without them. In our definition of
this transformation, all syntactic constructs that do not directly involve
binding are handled generically.
\item A demonstration that the lambda lifting transformation's generic
treatment of its non-nominal syntactic constructors can be factored out. The
resulting transformation has a purely generic definition using only the
\yoko\ API, and thus it can be directly reused in the definition of other
transformations.
\end{enumerate}
\section{Objective}
\label{sec:objective}
Lambda lifting \citep{lambda-lifting} is a transformation in many compiler
pipelines that maps from an object language with anonymous functions to an
object language without them. The declarations listed in
\tref{Figure}{fig:lambda-lift-sig} include the data types modeling the
terms of each object language as well as the signature of the ^lambdaLift^
function. This section explains the invariants encoded by the two data types
and motivates the generic treatment of most constructors in the definition of
lambda lifting.
\begin{figure}[h]
\begin{haskell}
lambdaLift :: AnonTerm -> TopProg
module Common where
-- object language types, including at least arrows
data Type = ...
module AnonymousFunctions where
data AnonTerm = Lam Type AnonTerm | Var Int
              | Let [Decl] AnonTerm
              | ... -- non-nominal constructors
data Decl = Decl Type AnonTerm
module TopLevelFunctions where
data TopTerm = DVar Int | Var Int
             | ... -- same non-nominal constructors
type FunDec = ([Type], Type, TopTerm)
data TopProg = Prog [FunDec] TopTerm
\end{haskell}
\caption{The signature of lambda lifting.\label{fig:lambda-lift-sig}}
\end{figure}
\subsection{The Signature of Lambda Lifting}
The ^AnonTerm^ type models the terms of a higher-order functional language with
anonymous functions. It includes the ^Lam^ and ^Var^ constructors for de
Bruijn-indexed lambdas and variable occurrences as well as the ^Let^
constructor and ^Decl^ type that model a non-recursive let form with multiple
nested declarations. The ^Let^ constructor introduces mutual recursion with the
^Decl^ type. For our purposes, the type is also assumed to include any number
of other constructors modeling \emph{non-nominal} syntactic constructs, those
that do not involve binding or variable occurrences. Function application,
tuples, lists, and literal values of base types, for example, are
non-nominal. The ^TopProg^ type, on the other hand, models the terms of a
higher-order functional program in which functions can only be named and
globally declared. Such a program is modeled as a pairing of a
topologically-ordered list of top-level function declarations and a body term;
the declaration bodies and the main body are terms without anonymous functions
as modeled by the ^TopTerm^ type. As the successor of ^AnonTerm^ in the
pipeline of data types, ^TopTerm^ retains most of ^AnonTerm^'s constructors. In
order to encode its invariant, however, it drops the ^Lam^ and ^Let^
constructors. For simplicity, we assume the corresponding constructors of both
^AnonTerm^ and ^TopTerm^ have the same name (hence the separate modules) and
the same fields after accounting for the substitution of ^TopTerm^ for
^AnonTerm^.

The ^AnonTerm^ and ^TopProg^ types encode the invariants that the modeled terms
adhere to the respective grammars with and without lambdas. For a Haskell AST,
these are not particularly impressive invariants. They are, however, quite
pragmatic, since the presence or absence of anonymous functions is a
significant property to guarantee statically, and the transformation between
the two types cannot be defined generically with existing generic programming
techniques.

Among common language features, only nominal ones are essential to lambda
lifting. The nub of the ^lambdaLift^ function's semantics is accordingly
present only in its cases for the ^Lam^, ^Var^, and ^Let^ constructors. As
with the transformations between adjacent types in a pipeline discussed in the
introduction, the lambda lifting of non-nominal constructors merely maintains
some bookkeeping and recurs through subterms, mapping each ^AnonTerm^
constructor to the obvious counterpart in ^TopTerm^. In this case, the
bookkeeping consists of generating the list of top-level declarations
corresponding to the lifted lambdas, which
\tref{Section}{sec:lambda-lift-definition} naturally implements with a writer
monad. It is the automated identification of corresponding constructors that
\yoko\ uniquely enables.
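
To make concrete which cases carry the semantics and which are mere
bookkeeping, the following hand-written sketch lambda lifts a hypothetical,
untyped, cut-down ^AnonTerm^ with only ^Lam^, ^Var^, an assumed application
constructor ^App^, and an assumed literal constructor ^Lit^ (no ^Let^). It is a
sketch under simplifying assumptions, not the paper's definition: each lifted
function takes every variable in scope, not only the free ones, as an extra
parameter, and the declaration list is threaded explicitly rather than through
a writer monad. The names ^lift^, ^TVar^, ^TApp^, and ^TLit^ and the
arity-based ^FunDec^ are illustrative assumptions; the ^T^ prefixes stand in
for the separate modules of \tref{Figure}{fig:lambda-lift-sig}.
\begin{haskell}
-- Hypothetical untyped cut-down object languages (no Type annotations, no Let).
data AnonTerm = Lam AnonTerm | Var Int | App AnonTerm AnonTerm | Lit Int
  deriving (Eq, Show)

-- T-prefixed constructors stand in for same-named ones in a separate module.
data TopTerm = DVar Int | TVar Int | TApp TopTerm TopTerm | TLit Int
  deriving (Eq, Show)

-- A lifted declaration: its arity and its body (assumed simplification).
type FunDec = (Int, TopTerm)
data TopProg = Prog [FunDec] TopTerm
  deriving (Eq, Show)

-- lift depth t decs: depth counts the enclosing lambdas; decs holds the
-- declarations lifted so far, inner lambdas before the ones that use them.
lift :: Int -> AnonTerm -> [FunDec] -> (TopTerm, [FunDec])
-- Nominal cases: the actual semantics of lambda lifting.
lift depth (Lam body) decs =
  let (body', decs') = lift (depth + 1) body decs
      k = length decs'  -- index of the declaration about to be added
      -- Apply the new declaration to every variable in scope, outermost first.
      call = foldl TApp (DVar k) [TVar i | i <- [depth - 1, depth - 2 .. 0]]
  in (call, decs' ++ [(depth + 1, body')])
lift _ (Var i) decs = (TVar i, decs)
-- Non-nominal cases: only bookkeeping, recursion, and renaming.
lift depth (App f x) decs =
  let (f', decs1) = lift depth f decs
      (x', decs2) = lift depth x decs1
  in (TApp f' x', decs2)
lift _ (Lit n) decs = (TLit n, decs)

lambdaLift :: AnonTerm -> TopProg
lambdaLift t = let (t', decs) = lift 0 t [] in Prog decs t'
\end{haskell}
Because each lifted function takes the enclosing variables in the same order
as the de Bruijn environment, indices in a lifted body need no adjustment. Only
the ^Lam^ and ^Var^ clauses are specific to lambda lifting; a realistic
language repeats a clause like those for ^App^ and ^Lit^ for every
non-nominal constructor.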
\subsection{Discussion}
The compound first field of ^Let^ and the corresponding mutual recursion make
^AnonTerm^ a realistic example.
%% Modern generic programming approaches continue to depend on metalingual
%% capabilities (\eg\ Template Haskell), but only for convenience: the libraries
%% derive their genericity from declarations within the language. \todo{TODO} TODO
%% what's the benefit of that? Better integration with other language features?
%% Portability?
\section{\ig\ Background}
\todo{grammar for the representation types}
The \ig\ approach \citep{instant-generics} underlies \yoko's generic
programming capabilities. The approach derives its genericity from two major
Haskell language features: type classes and type families
\citep{type-families}. We demonstrate a slight simplification of the library
with an example type and two generically defined values in order to set the
stage for \yoko's enhancements. We apply our own vocabulary to the concepts
underlying the \ig\ Haskell declarations.

\tref{Figure}{fig:instant-generics} lists the core \ig\ declarations. In this
approach to generic programming, any value with a \emph{generic semantics} is
defined as a method of a type class, called a \emph{generic class}. That
method's generic semantics is defined by instances of the generic class for
each of a small finite set of \emph{representation types}: ^Var^, ^Rec^,
\etc. The ^Rep^ type family maps a data type to its sum-of-products structure
\citep{sum-of-products} as encoded with those representation types. A
corresponding instance of the ^Representable^ class converts between a type and
its ^Rep^ structure. Via this conversion, an instance of a generic class for a
data type can delegate to the generic definition by invoking the method on the
type's structure. Instances of a generic class, however, are not required to
rely on the generic semantics: they can use it partially or ignore it
completely.
\subsection{The Sum-of-Products Generic View}
Each representation type models a particular structure in the declaration of a
data type. The ^Rec^ and ^Var^ types represent occurrences of types in the same
mutually recursive family as the represented type (roughly, its binding group)
and occurrences of all other types, respectively. Sums of
constructors are represented by nestings of the higher-order type ^:+:^, and
products of fields are represented similarly by ^:*:^. ^U^ serves as the empty
product. There is no representation type for the empty sum, since it would
represent a data type with no constructors, whose usefulness is questionable.
The representation of each constructor is
annotated by means of ^C^ to carry a bit more reflective information in ^C^'s
phantom type parameter. The ^:+:^, ^:*:^, and ^C^ types are all higher-order
representations in that they expect representations as arguments. If Haskell
supported subkinding \citep{promotion}, these parameters would be of a subkind
of ^*^ specific to representation types. Since the parameters of ^Var^ and
^Rec^ are not representation types, they would have the standard ^*^ kind.
\begin{figure}[h]
\begin{haskell}
type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b
class Representable a where
  to :: Rep a -> a
  from :: a -> Rep a
class Constructor c where
  conName :: C c a -> String
\end{haskell}
\caption{Core declarations of the \ig\ library.}
\label{fig:instant-generics}
\end{figure}
Consider a simple abstract syntax for expressions denoting arithmetic sums,
declared as ^Exp^.
\begin{haskell}
data Exp = Const Int | Plus Exp Exp
\end{haskell}
\noindent An instance of the ^Rep^ type family maps ^Exp^ to its structure as
encoded in terms of the representation types.
\begin{haskell}
type instance Rep Exp =
  C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)
data Const; data Plus
instance Constructor Const where conName _ = "Const"
instance Constructor Plus where conName _ = "Plus"
\end{haskell}
The ^Const^ and ^Plus^ types are considered auxiliary by \ig, added as an
afterthought to the sum-of-products view in order to define another class of
values generically, in particular the methods of ^Show^ and ^Read^. Since
\yoko\ uses them in the same reflective capacity, but to a much greater degree,
we designate them \emph{constructor types}. Each constructor type corresponds
directly to a constructor from the represented data type (much like promotion
\citep{promotion}).

The ^Const^ constructor's field is represented with ^Var^, since ^Int^ is not a
recursive occurrence. The two ^Exp^ occurrences in ^Plus^ are recursive, and so
are represented with ^Rec^. The entire ^Exp^ type is represented as the sum of
its constructors' representations (the products of one and two fields,
respectively), with some further reflective information provided by the ^C^
annotation. The ^Representable^ instance for ^Exp^ follows directly from the
involved signatures.
\begin{haskell}
instance Representable Exp where
  from (Const n) = L (C (Var n))
  from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n))) = Const n
  to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2
\end{haskell}
\subsection{Two Generic Definitions}
We generically define an equality test and the generation of a minimal
value. For equality, we reuse the ^Eq^ class as the generic class.
\begin{haskell}
instance Eq a => Eq (Var a) where
  Var x == Var y = x == y
instance Eq a => Eq (Rec a) where
  Rec x == Rec y = x == y
instance (Eq a, Eq b) => Eq (a :*: b) where
  (x1 :*: x2) == (y1 :*: y2) = x1 == y1 && x2 == y2
instance Eq U where _ == _ = True
instance (Eq a, Eq b) => Eq (a :+: b) where
  L x == L y = x == y
  R x == R y = x == y
  _ == _ = False
instance Eq a => Eq (C c a) where
  C x == C y = x == y
\end{haskell}
\noindent With these instance declarations, ^Eq Exp^ is immediate. As
\citeauthor{fast-and-easy} show, the GHC inliner can be compelled to optimize
away much of the representational overhead.
\begin{haskell}
instance Eq Exp where x == y = from x == from y
\end{haskell}
The method of the ^Empty^ generic class generates a minimal value\footnote{Note
that \inlineHaskell{empty} is not a function. This is why we avoid the term
``generic function''.}.
\begin{haskell}
class Empty a where empty :: a
instance Empty Int where empty = 0
instance Empty Char where empty = '\NUL'
...
\end{haskell}
\noindent In the generic definition of ^empty^, it may seem odd to define an
instance for ^Rec^, since recursion seems contrary to minimality. This instance
is nevertheless necessary for two reasons. First, in a mutually recursive family
of data types, one sibling might have no constructor whose fields are all
non-recursive; generating its minimal value requires recursion to reach a
sibling that has a constructor capable of non-recursion. Second, the ^Rec^
instance enables a reasonable use of ^Empty^ for coinductive data types, in
which corecursion is inevitable.
\begin{haskell}
instance Empty a => Empty (Var a) where
  empty = Var empty
instance Empty a => Empty (Rec a) where
  empty = Rec empty
instance (Empty a, Empty b) => Empty (a :*: b) where
  empty = empty :*: empty
instance Empty U where empty = U
instance Empty a => Empty (C c a) where
  empty = C empty
\end{haskell}
The ^Empty^ instance for ^:+:^ must prefer a summand (\ie\ a constructor)
capable of non-recursion. To do so, it relies on the auxiliary class
^HasRec^, which checks whether a representation value involves recursion. The
instances for ^Var^ and ^Rec^ answer the predicate directly; the other
representation types' instances structurally recur.
\begin{haskell}
class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (a :*: b) = hasRec a || hasRec b
instance HasRec U where hasRec _ = False
instance (HasRec a, HasRec b) => HasRec (a :+: b) where
  hasRec (L x) = hasRec x
  hasRec (R x) = hasRec x
instance HasRec a => HasRec (C c a) where
  hasRec (C x) = hasRec x
\end{haskell}
\noindent The ^Empty^ instance for ^:+:^ uses ^hasRec^ to avoid recursion when
possible. Laziness and the non-strictness of ^hasRec^ for ^Var^, ^Rec^, and ^U^
prevent the use of ^empty^ in the condition from diverging. For coinductive
data types, ^empty^ instead generates an infinite nesting of the last
constructor, or the cyclic analog for a mutually corecursive family.
\begin{haskell}
-- ScopedTypeVariables lets the instance head's a scope over lempty
instance (HasRec a, Empty a, Empty b
         ) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a
\end{haskell}
The ^Empty^ instance for ^Exp^ is a straightforward delegation to the generic
definition and always yields ^Const 0^.
\begin{haskell}
instance Empty Exp where empty = to empty
\end{haskell}
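
The fragments of this section can be assembled into a single module, sketched
below under a few assumptions of ours rather than the library's: GHC with the
^TypeFamilies^, ^TypeOperators^, and ^ScopedTypeVariables^ extensions (the last
so that the instance variable ^a^ scopes over ^lempty^'s annotation), the
^Constructor^ class omitted, and only the ^Int^ base instance of ^Empty^ kept.
It shows the generic ^==^ and ^empty^ being reused for ^Exp^ by delegation
through ^from^ and ^to^.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators, ScopedTypeVariables #-}
-- Simplified representation types, as in the figure above.
type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b

class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

-- The example type, its constructor types, and its representation.
data Exp = Const Int | Plus Exp Exp deriving Show
data Const
data Plus
type instance Rep Exp = C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)

instance Representable Exp where
  from (Const n)    = L (C (Var n))
  from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n)))             = Const n
  to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2

-- Generic equality.
instance Eq a => Eq (Var a) where Var x == Var y = x == y
instance Eq a => Eq (Rec a) where Rec x == Rec y = x == y
instance Eq U where _ == _ = True
instance (Eq a, Eq b) => Eq (a :*: b) where
  (x1 :*: x2) == (y1 :*: y2) = x1 == y1 && x2 == y2
instance (Eq a, Eq b) => Eq (a :+: b) where
  L x == L y = x == y
  R x == R y = x == y
  _   == _   = False
instance Eq a => Eq (C c a) where C x == C y = x == y

instance Eq Exp where x == y = from x == from y

-- Generic minimal value.
class Empty a where empty :: a
instance Empty Int where empty = 0
instance Empty a => Empty (Var a) where empty = Var empty
instance Empty a => Empty (Rec a) where empty = Rec empty
instance Empty U where empty = U
instance (Empty a, Empty b) => Empty (a :*: b) where empty = empty :*: empty
instance Empty a => Empty (C c a) where empty = C empty

-- Does a representation value involve recursion?
class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True
instance HasRec U where hasRec _ = False
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (x :*: y) = hasRec x || hasRec y
instance (HasRec a, HasRec b) => HasRec (a :+: b) where
  hasRec (L x) = hasRec x
  hasRec (R x) = hasRec x
instance HasRec a => HasRec (C c a) where hasRec (C x) = hasRec x

-- Prefer the left summand unless its minimal value involves recursion.
instance (HasRec a, Empty a, Empty b) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a

instance Empty Exp where empty = to empty
\end{haskell}
Loaded into GHCi, ^empty___::___Exp^ evaluates to ^Const___0^, and ^==^ on
^Exp^ compares values structurally via their representations.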
\begin{note}{Nick}{Make \inlineHaskell{HasRec} static}
Let's not stoop to a dynamic predicate when we don't need to.
\begin{enumerate}
\item Search instead for the first constructor with no ^R^s in its structure.
\item Beware mutual recursion: some types need to traverse a finite number of
^R^s in order to reach a sibling type that has a constructor with no
recursive fields.
\item There are a couple of ways to avoid cycles in that search: keep a list of
visited types (or better yet, type names, irrespective of type parameters), or
just limit the depth of the search to the size of the sibling set (assuming
it's not infinite, as for nested types).
\item Flourish: choose the path to a non-recursive constructor with the
smallest number of total fields?
\item Lastly, fall back gracefully for necessarily infinite types.
\end{enumerate}
\end{note}
As demonstrated with ^==^ and ^empty^, generic definitions (\ie\ the
corresponding instances for the representation types) provide an easily
invocable default behavior. If that behavior suffices for a representable type,
then invoking the method on the type's representation provides a simple
way to define the methods in an instance of a generic class for the
representable type. If a particular type needs a distinct ad-hoc definition of
the method, then that type's instance can use its own specific method
definitions, relying on the default generic definition to a lesser degree or
even not at all.
\section{\yoko's Enhancements}
\label{sec:enhancements}
The \yoko\ library extends \ig\ with three principal enhancements.
\subsection{Representation of Compound Fields}
\label{sec:compound-fields}
For the representation of data types with compound fields (\eg\ containing
lists and tuples), the paucity of the \ig\ representation types precludes
precise use of the ^Rec^ representation type. \todo{\ig\ isn't a view,
sum-of-products is.}
Though we intend ^Rec^ to represent only recursive occurrences of types, its
semantics in \ig\ allows it to represent any type that contains recursive
occurrences; otherwise, \ig\ could not represent types with compound
fields. Consider representing a type such as ^Exp2^, where ^Const2^ optionally
takes another expression as a second argument.
\todo{Can we demonstrate mutual recursion? Perhaps Odd/Even lists?}
\begin{haskell}
data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2
data Const2 ; data Plus2
type instance Rep Exp2 =
  C Const2 (Var Int :*: Rec (Maybe Exp2)) :+:
  C Plus2 (Rec Exp2 :*: Rec Exp2)
\end{haskell}
\noindent Note that the argument to ^Rec^ in the representation of the ^Const2^
constructor must be the entire field, since the representation types include no
means of navigating the components of the field in order to apply ^Rec^ only to
the occurrence of ^Exp2^. The danger of simply enlisting ^Maybe^ as an ad-hoc
representation type to represent the field directly as ^Maybe___(Rec___Exp2)^
is discussed at the end of this subsection.
This imprecise use of ^Rec^ usually remains effective because instance
resolution does provide a means of navigating the composition of the field.
For example, the generic definition of ^Eq^ suffices for ^Exp2^, since ^Maybe^
has a standard ^Eq^ instance defining equality in terms of that of its type
parameter (^Eq a^). This would not work if the ^Eq^ instance for ^Rec^
actually depended on the particular semantics of ^Rec^ as representing a
recursive occurrence, as is the case with the ^HasRec^ generic class.

For both ^==^ and ^empty^, the ^Rec^ and ^Var^ instances have exactly the same
form. For ^hasRec^, however, the two instances differ, since distinguishing
recursive occurrences is the very essence of the semantics of ^hasRec^. It is
therefore sensitive to imprecise use of ^Rec^; in particular, the call to
^hasRec^ in the ^empty^ method for the ^:+:^ type indicates that the second
field of ^Const2^ recurs. Because the representation of that field uses ^Rec^,
the corresponding ^hasRec^ method returns ^True^ without regard for whether the
^Maybe^ value actually contains a recursive occurrence (the minimal value
^Nothing^ does not). Unfortunately, this results in ^empty___::___Exp2^
generating an infinite nesting of ^Plus2^ instead of the obvious
^(Const2___0___Nothing)^. In this way, the imprecise use of the ^Rec^
representation type has compromised the definition of ^empty^ for ^Exp2^.
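
The failure can be observed directly in the following sketch. It reuses this
section's simplified declarations and assumes an ^Empty^ instance for ^Maybe^
returning ^Nothing^, which is needed for ^Rec___(Maybe___Exp2)^ to have a
minimal value at all. To stay short, it replaces the ^Rep^ family and
^Representable^ class with a hypothetical type synonym ^RepExp2^ and
conversion ^toExp2^. Laziness lets us inspect finitely many levels of the
infinite nesting.
\begin{haskell}
{-# LANGUAGE TypeOperators, ScopedTypeVariables #-}
-- Simplified representation types (no Rep family in this sketch).
data Var a = Var a
data Rec a = Rec a
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b

class Empty a where empty :: a
instance Empty Int where empty = 0
-- Assumed instance, so that Rec (Maybe Exp2) has a minimal value.
instance Empty (Maybe a) where empty = Nothing
instance Empty a => Empty (Var a) where empty = Var empty
instance Empty a => Empty (Rec a) where empty = Rec empty
instance (Empty a, Empty b) => Empty (a :*: b) where empty = empty :*: empty
instance Empty a => Empty (C c a) where empty = C empty

class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (x :*: y) = hasRec x || hasRec y
instance HasRec a => HasRec (C c a) where hasRec (C x) = hasRec x

-- Prefer the left summand unless its minimal value involves recursion.
instance (HasRec a, Empty a, Empty b) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a

data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2
data Const2
data Plus2

-- Imprecise representation: Rec wraps the whole compound field Maybe Exp2.
type RepExp2 = C Const2 (Var Int :*: Rec (Maybe Exp2))
           :+: C Plus2 (Rec Exp2 :*: Rec Exp2)

-- Hypothetical stand-in for the Representable instance's to.
toExp2 :: RepExp2 -> Exp2
toExp2 (L (C (Var n :*: Rec m))) = Const2 n m
toExp2 (R (C (Rec a :*: Rec b))) = Plus2 a b

instance Empty Exp2 where empty = toExp2 empty
\end{haskell}
The minimal value of the ^Const2^ summand, ^C___(Var___0___:*:___Rec___Nothing)^,
is reported as recursive, so ^empty^ selects ^Plus2^ at every level.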
A more precise alternative reserves the use of ^Rec^ for the actual recursive
occurrences themselves. To do so, \yoko\ enlarges the universe of
types that are representable with precise use of ^Rec^ by extending the
representation types with two types for representing applications of
^*___->___*^ and ^*___->___*___->___*^ types (\eg\ lists and tuples).
\begin{haskell}
-- application of a * -> * type, e.g. Maybe or []
newtype Par1 f c = Par1 (f c)
-- application of a * -> * -> * type, e.g. (,) or Either
newtype Par2 ff c d = Par2 (ff c d)
\end{haskell}
\noindent The resulting universe of representable types is still incomplete,
but we expect it to include significantly more of the data types commonly
declared in Haskell programs, including ^Exp2^. With the new representation
types, the second field of the ^Const2^ constructor now uses the ^Rec^ type only
for the recursive occurrence: ^Par1___Maybe___(Rec___Exp2)^.

Adding two representation types is a simple enhancement; its only cost is the
corresponding additional instances for each generic class. Alternatively,
parametric types such as lists and tuples could themselves be considered
representation types on an ad-hoc basis, but this can lead to ambiguity of its
own. The \ig\ library does not explicitly preclude such ad-hoc representation
types, but its omission of instances of ^HasRec^ for lists and ^Maybe^
seemingly endorses the use of ^Rec^ to represent compound fields. Precise use
of the ^Rec^ representation type enables correspondingly more precise use of
generic definitions in \yoko's more advanced type programming.
General ^Par1^ instances tend to rely on a corresponding class parameterized
over ^*___->___*^ types, such as the ^Functor^ class. For example, the ^Par1^
instance for ^HasRec^ will use the ^Foldable^ class. General instances for the
^Par2^ type similarly rely on a corresponding class parameterized over
^*___->___*___->___*^ types. However, in many cases, including ^Eq^ and
^Empty^, the generic class, which is parameterized over ^*^ types, can be
reused directly. Otherwise, if no general higher-kinded class exists and the
generic class itself cannot be reused, an ad-hoc higher-kinded class must be
declared.
\begin{haskell}
instance Eq (f a) => Eq (Par1 f a) where
  Par1 x == Par1 y = x == y
instance Empty (f a) => Empty (Par1 f a) where
  empty = Par1 empty
-- switch from HasRec to Foldable at the Par1 boundary;
-- the Prelude's any is Foldable-polymorphic
instance (Foldable f, HasRec a) =>
         HasRec (Par1 f a) where
  hasRec (Par1 x) = any hasRec x
instance Empty (Maybe a) where empty = Nothing
\end{haskell}
\noindent Now the constraints over the representation of ^Exp2^ ultimately
incur corresponding constraints over ^Maybe^. As such, with the more precise
^hasRec^, ^empty___::___Exp2^ yields ^(Const2___0___Nothing)^, as expected.
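
A self-contained sketch of the precise representation confirms this. As an
assumption for brevity, it uses a hypothetical type synonym ^RepExp2^ and
conversion ^toExp2^ in place of the ^Rep^ family and ^Representable^ class.
Because ^hasRec^ now looks inside the ^Maybe^ via ^Foldable^, the minimal
^Const2^ summand is non-recursive and ^empty^ selects it.
\begin{haskell}
{-# LANGUAGE TypeOperators, ScopedTypeVariables #-}
data Var a = Var a
data Rec a = Rec a
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b
newtype Par1 f c = Par1 (f c)

class Empty a where empty :: a
instance Empty Int where empty = 0
instance Empty (Maybe a) where empty = Nothing
instance Empty a => Empty (Var a) where empty = Var empty
instance Empty a => Empty (Rec a) where empty = Rec empty
instance Empty (f a) => Empty (Par1 f a) where empty = Par1 empty
instance (Empty a, Empty b) => Empty (a :*: b) where empty = empty :*: empty
instance Empty a => Empty (C c a) where empty = C empty

class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True
-- Switch from HasRec to Foldable at the Par1 boundary.
instance (Foldable f, HasRec a) => HasRec (Par1 f a) where
  hasRec (Par1 x) = any hasRec x
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (x :*: y) = hasRec x || hasRec y
instance HasRec a => HasRec (C c a) where hasRec (C x) = hasRec x

-- Prefer the left summand unless its minimal value involves recursion.
instance (HasRec a, Empty a, Empty b) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a

data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2 deriving (Eq, Show)
data Const2
data Plus2

-- Precise representation: Rec marks only the recursive occurrence.
type RepExp2 = C Const2 (Var Int :*: Par1 Maybe (Rec Exp2))
           :+: C Plus2 (Rec Exp2 :*: Rec Exp2)

-- Hypothetical stand-in for the Representable instance's to.
toExp2 :: RepExp2 -> Exp2
toExp2 (L (C (Var n :*: Par1 m))) = Const2 n (fmap (\(Rec e) -> e) m)
toExp2 (R (C (Rec a :*: Rec b)))  = Plus2 a b

instance Empty Exp2 where empty = toExp2 empty
\end{haskell}
Note that ^hasRec^ still reports ^True^ for a field such as
^Par1___(Just___(Rec___e))^, so the predicate remains faithful to actual
recursive occurrences.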
While it may seem attractive to reuse the ^HasRec^ class for the ^Par1^
instance as with ^Eq^ and ^Empty^, such a ^HasRec___(Par1___f___a)^ instance
with the context ^HasRec___(f___a)^ would be unsound. In isolation, it is
obvious that the correct definition of ^hasRec^ for ^Maybe^ is ^const___False^;
the ^Maybe^ type is not recursive. This, in turn, spoils the hypothetical
^Par1^ instance. For example, ^hasRec^ applied to the representation of
^Const2___5___(Just___(Const2___...))^ would incorrectly reduce to ^False^.

The key insight is that the semantics of ^hasRec^ is entirely dependent on its
type parameter. The unsound ^HasRec^ instance for ^Par1^ changes the head of
the type parameter to ^f^. If ^f^ were a representation type, this would be
fine, because representation types serve as a proxy for the represented nominal
type. However, according to the semantics of ^Par1^, ^f^ is not a
representation type: it is an arbitrary ^*___->___*^ type. Without using
^Par1^ to delimit the representation, there would be no opportunity to switch
from ^HasRec^ to ^Foldable^. This is why adopting ^Maybe^ as an
ad-hoc representation type is dangerous.
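
The difference can be checked with the sketch below. Since the sound and
unsound ^Par1^ instances cannot coexist, it compares two hypothetical helper
functions instead: ^hasRecFoldable^ switches from ^HasRec^ to ^Foldable^ at the
^Par1^ boundary, while ^hasRecReuse^ delegates to a ^HasRec^ instance for
^Maybe^ that is correct in isolation. Applied to the representation of a
^Const2^ field that does contain an expression, only the former reports
recursion.
\begin{haskell}
-- Minimal declarations for comparing two hasRec strategies at Par1.
newtype Par1 f c = Par1 (f c)
data Rec a = Rec a

class HasRec a where hasRec :: a -> Bool
instance HasRec (Rec a) where hasRec _ = True
-- Correct in isolation: the Maybe type itself is not recursive.
instance HasRec (Maybe a) where hasRec _ = False

-- Sound: switch from HasRec to Foldable at the Par1 boundary.
hasRecFoldable :: (Foldable f, HasRec a) => Par1 f a -> Bool
hasRecFoldable (Par1 x) = any hasRec x

-- Unsound: reuse HasRec at f a, whose head f is not a representation type.
hasRecReuse :: HasRec (f a) => Par1 f a -> Bool
hasRecReuse (Par1 x) = hasRec x

data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2

-- Representation of the second field of Const2 5 (Just (Const2 0 Nothing)).
secondField :: Par1 Maybe (Rec Exp2)
secondField = Par1 (Just (Rec (Const2 0 Nothing)))
\end{haskell}
Here ^hasRecFoldable___secondField^ is ^True^, whereas
^hasRecReuse___secondField^ is ^False^ even though the field contains a
recursive occurrence.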
\subsection{Delayed Representation of Constructors}
\label{sec:granularity}
The generic view of data types in \yoko\ delays the representation of
constructors as products of their fields. We motivate this enhancement by
exploring a hypothetical application of generic programming to mitigate a large
number of constructors. Generic programming is most obviously useful when
dealing with large data types, rapid prototyping of data types, or both. With
large data types, it is also likely that a minority of a type's constructors
warrant an interesting deviation from the generic definition. Because \ig\ only
represents the entire data type, it struggles to delegate to the generic
definition on a per-constructor basis.

Consider enhancing the ^Exp^ type's ^Const^ constructor to support various
encodings of numerical constants. The essence of this example is that the
semantics of that constructor no longer corresponds so directly to its
structure.
\begin{haskell}
data Encoding = Base Int | Roman | ...
decode :: Encoding -> String -> Int
data Exp3 = Const3 Encoding String | Plus3 Exp3 Exp3
\end{haskell}
We declare the ^Constants^ generic class with the ^constants^ method for
collecting the values of all numerical constants occurring in a value of a data
type. Again, this use of generic programming is hardly justified for ^Exp3^
itself, but imagine it is a larger data type with numerous constructors,
perhaps one for each arithmetic operation: subtraction, multiplication, \etc.
The generic definition of ^Constants^ merely recurs, collecting constants from
the fields.
\begin{haskell}
class Constants a where constants :: a -> [Int]
instance Constants a => Constants (Var a) where
  constants (Var x) = constants x
instance Constants a => Constants (Rec a) where
  constants (Rec x) = constants x
instance (Constants a, Constants b) =>
         Constants (a :+: b) where
  constants (L x) = constants x
  constants (R x) = constants x
instance (Constants a, Constants b) =>
         Constants (a :*: b) where
  constants (x :*: y) = constants x ++ constants y
instance Constants U where constants _ = []
instance Constants a => Constants (C c a) where
  constants (C a) = constants a
\end{haskell}
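
For contrast, the generic definition is correct for the original ^Exp^, whose
^Const^ field is itself the constant. The sketch below assumes a hypothetical
base instance ^Constants___Int^ returning a singleton list and delegates the
^Exp^ instance entirely to the generic definition; the representation
declarations are repeated so that the module stands alone.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b

class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

-- The generic definition, as above.
class Constants a where constants :: a -> [Int]
instance Constants a => Constants (Var a) where constants (Var x) = constants x
instance Constants a => Constants (Rec a) where constants (Rec x) = constants x
instance (Constants a, Constants b) => Constants (a :+: b) where
  constants (L x) = constants x
  constants (R x) = constants x
instance (Constants a, Constants b) => Constants (a :*: b) where
  constants (x :*: y) = constants x ++ constants y
instance Constants U where constants _ = []
instance Constants a => Constants (C c a) where constants (C a) = constants a

-- Hypothetical base case: an Int field is itself the constant.
instance Constants Int where constants n = [n]

data Exp = Const Int | Plus Exp Exp
data Const
data Plus
type instance Rep Exp = C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)

instance Representable Exp where
  from (Const n)    = L (C (Var n))
  from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n)))             = Const n
  to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2

-- Full delegation to the generic definition is correct for Exp.
instance Constants Exp where constants = constants . from
\end{haskell}
For ^Plus___(Const___1)___(Plus___(Const___2)___(Const___3))^, this collects
^[1,2,3]^; the difficulty with ^Exp3^ arises only because the fields of
^Const3^ are not themselves the constant.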
  513. The semantics of the ^Const3^ constructor and the semantics of the ^Constants^
  514. class overlap in such a way that the generic definition is incorrect for that
  515. constructor: the intent is not to count the numerical constants occurring in
  516. the ^Encoding^ and ^String^ fields. Therefore, the ^Constants^ instance for
  517. ^Exp3^ must treat ^Const3^ with ad-hoc behavior and only delegate to the
  518. generic definition for the other constructors.
  519. \begin{haskell}
-- NB speculative: ill-typed
instance Constants Exp3 where
  constants (Const3 enc s) = [decode enc s]
  constants e = constants (from e)
  524. \end{haskell}
  525. The ill-typedness of this obvious instance is due both to the coarseness of the
sum-of-products interpretation and to a limitation of the Haskell type system. The
  527. resulting type error protests that there is no instance of ^Constants^ for
  528. ^Encoding^ or ^String^. These instances are required because ^constants^ is
  529. applied to the entire structure of ^Exp3^, including the representation of
  530. ^Const3^. Unfortunately, the Haskell type system cannot express that ^e^ will
  531. never be constructed with ^Const3^ and that the offending instances would never
  532. actually be invoked at run-time.
  533. Indeed, the standard ^Constants^ semantics for ^Encoding^ and ^String^,
  534. independent of their role in ^Exp3^, would be ^constants _ = []^ since they
  535. contain no expression constants. This would render the ^Constants Exp3^
  536. instance semantically incorrect. Given that this function is being defined
  537. generically only to cope with the hypothetically numerous constructors for
  538. arithmetic operators, for all of which the generic definition of ^constants^
  539. suffices, it is dubious and perhaps even misleading to instantiate ^Constants^
  540. at these two types at all. Moreover, if such instances had been declared and a
  541. new constructor were added to ^Exp3^ that contained a ^String^ or an
  542. ^Encoding^, that constructor would silently adopt the generic definition of
  543. ^Constants^. Therefore, declaring instances of generic classes that are known
  544. to be ultimately unnecessary is considered harmful. \todo{Similar to these
  545. required-but-extraneous instances?
  546. \url{http://hackage.haskell.org/trac/ghc/ticket/5499}}
An alternative instance, still expressible within \ig, avoids generating the
  548. offending constraints on ^Encoding^ and ^String^ by manipulating the
  549. sum-of-products representation directly.
  550. \begin{haskell}
-- NB speculative: valid, but obfuscated & immodular
instance Constants Exp3 where
  constants e = case from e of
    L (C (Var enc :*: Var s)) -> [decode enc s]
    R x -> constants x
  556. \end{haskell}
  557. \noindent Though this instance is well-typed and semantically correct, it is
  558. also severely obfuscated and immodular, because it exposes the encoding of
  559. ^Exp3^'s structure. In particular, it assumes that ^Const3^ is the left summand
  560. of ^Exp3^'s representation. Matters only worsen if the particular structural
  561. encoding happens to bury the constructor(s) of interest deeper in the nestings
  562. of ^:+:^. This definition is especially fragile with respect to changes of the
declaration of ^Exp3^, because the sum is often expressed as a balanced
nesting of ^:+:^s (for compile-time efficiency); adding a new constructor, no
matter its position in the list of constructors, is likely to displace ^Const3^
  566. from the left summand. The definition of ^constants^ is ultimately so
  567. obfuscated because the anonymous product representing ^Const3^ is used
  568. directly, making no indication that ^Const3^ is the constructor of interest. We
  569. consider these detriments to the software quality unacceptable.
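To make the immodular instance concrete, the following self-contained sketch compiles it against simplified copies of the \ig\ representation types. It is an illustration under assumptions, not the \ig\ code. The hand-written ^fromExp3^ stands in for the library's ^from^, and the names ^RepExp3^, ^fromExp3^, ^Const3C^, and ^Plus3C^ are hypothetical. The hypothetical ^decode^ handles only positional bases. The sketch requires GHC's TypeOperators extension.
\begin{haskell}
import Data.Char (digitToInt)

-- Simplified copies of the instant-generics representation types.
data U = U
newtype Var a = Var a
newtype Rec a = Rec a
data a :+: b = L a | R b
data a :*: b = a :*: b
newtype C c a = C a  -- c: phantom constructor tag

-- Hypothetical encoding: only positional bases are modelled.
data Encoding = Base Int

decode :: Encoding -> String -> Int
decode (Base b) = foldl (\acc d -> acc * b + digitToInt d) 0

data Exp3 = Const3 Encoding String | Plus3 Exp3 Exp3

-- Hand-written stand-ins for Rep Exp3 and from (hypothetical names).
data Const3C
data Plus3C
type RepExp3 = C Const3C (Var Encoding :*: Var String)
           :+: C Plus3C (Rec Exp3 :*: Rec Exp3)

fromExp3 :: Exp3 -> RepExp3
fromExp3 (Const3 enc s) = L (C (Var enc :*: Var s))
fromExp3 (Plus3 a b)    = R (C (Rec a :*: Rec b))

class Constants a where constants :: a -> [Int]
instance Constants a => Constants (Var a) where
  constants (Var x) = constants x
instance Constants a => Constants (Rec a) where
  constants (Rec x) = constants x
instance (Constants a, Constants b) => Constants (a :+: b) where
  constants (L x) = constants x
  constants (R x) = constants x
instance (Constants a, Constants b) => Constants (a :*: b) where
  constants (x :*: y) = constants x ++ constants y
instance Constants U where constants _ = []
instance Constants a => Constants (C c a) where
  constants (C a) = constants a

-- Valid but immodular: hard-codes that Const3 is the left summand.
instance Constants Exp3 where
  constants e = case fromExp3 e of
    L (C (Var enc :*: Var s)) -> [decode enc s]
    R x                       -> constants x
\end{haskell}
No ^Constants^ instance for ^Encoding^ or ^String^ is required. Even so, the instance is correct only because ^Const3^ happens to be the left summand of ^RepExp3^.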
  570. Using a slight simplification of the \yoko\ API, the preferred instance is
  571. declared as follows. Since the representation of the type's structure is not
  572. exposed, this instance is precisely as modular as the ill-typed but ideal first
  573. attempt. Furthermore, if the ^disband^, ^project^, and ^reps^ functions and the
  574. ^*_^ naming convention are recognized as pieces of a familiar library API, then
  575. this instance is also nearly as lucid as the first: the special treatment of
  576. ^Const3^ is obvious.
  577. \begin{haskell}
-- derived from Exp3
data Const3_ = Const3_ Encoding String
-- a simplification of the #\yoko# API
project :: Project a sum => sum -> Either a (sum :-: a)
instance Constants Exp3 where
  constants e = case project (disband e) of
    Left (Const3_ enc s) -> [decode enc s]
    Right x -> constants (reps x)
  586. \end{haskell}
\noindent The ^Const3_^ constructor is the sole constructor of a type derived
by \yoko\ from the declaration of the ^Const3^ constructor. We designate such
  589. types \emph{fields types}. The ^disband^ function converts a data type to a
  590. nested ^:+:^ sum of its constructors, each represented as a fields type---hence
  591. the \emph{sum-of-constructors} generic view.
  592. The ^project^ function, ^Project^ type class, and ^:-:^ type family are
operations on nested ^:+:^ sums. The ^:-:^ family computes the sum that remains
after removing a type, and the ^project^ function determines whether a value in
the sum was a value of the removed type or belongs to that remaining sum. The ^reps^
  596. function completes the sum-of-products structural representation by converting
  597. a sum of constructors to a sum of the corresponding products.
  598. The ^Const3_^ type and the ^:-:^ family are the essential components that avoid
  599. the generation of the superfluous ^Constants^ constraints on the ^Encoding^ and
  600. ^String^ types. In particular, the type of ^x^ in the ^Right^ branch is
  601. ^Rep___Exp3___:-:___Const3_^, which expresses to the typechecker the informal
  602. notion of an ^Exp3^ value not constructed by ^Const3^. The remaining
  603. constructors of ^Exp3^ do not incur the offending constraints when delegated to
  604. the generic definition of ^constants^. This instance declaration thereby uses
  605. ^project^ as a form of pattern matching with a more sophisticated type rule
  606. than the Haskell ^case^ expression's. The type of ^project^ locally refines
  607. types in the branches in much the same manner as does supercompilation
  608. \citep{supercompilation}. By delaying the representation of each constructor as
an anonymous product, the sum-of-constructors view grants this style of pattern
  610. matching both robustness against ambiguities and the lucidity of genuine
  611. constructor syntax.
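The refinement that ^project^ performs can be imitated by hand for ^Exp3^. In the sketch below, the hypothetical functions ^disbandExp3^, ^projectConst3^, and ^repsPlus3^ are monomorphic stand-ins for ^disband^, ^project^, and ^reps^. \yoko\ provides these generically and computes the remaining sum with ^:-:^ rather than writing it out. The hypothetical ^decode^ handles only positional bases, and GHC's TypeOperators extension is assumed.
\begin{haskell}
import Data.Char (digitToInt)

-- Hypothetical encoding: only positional bases are modelled.
data Encoding = Base Int

decode :: Encoding -> String -> Int
decode (Base b) = foldl (\acc d -> acc * b + digitToInt d) 0

data Exp3 = Const3 Encoding String | Plus3 Exp3 Exp3

-- Fields types: one per constructor of Exp3.
data Const3_ = Const3_ Encoding String
data Plus3_  = Plus3_ Exp3 Exp3

newtype N a = N a
data a :+: b = L a | R b
newtype Rec a = Rec a
data a :*: b = a :*: b

-- Monomorphic stand-ins for disband, project, and reps.
disbandExp3 :: Exp3 -> N Const3_ :+: N Plus3_
disbandExp3 (Const3 enc s) = L (N (Const3_ enc s))
disbandExp3 (Plus3 a b)    = R (N (Plus3_ a b))

-- The Right branch carries the remaining sum, here just N Plus3_.
projectConst3 :: N Const3_ :+: N Plus3_ -> Either Const3_ (N Plus3_)
projectConst3 (L (N c)) = Left c
projectConst3 (R rest)  = Right rest

repsPlus3 :: N Plus3_ -> Rec Exp3 :*: Rec Exp3
repsPlus3 (N (Plus3_ a b)) = Rec a :*: Rec b

class Constants a where constants :: a -> [Int]
instance Constants a => Constants (Rec a) where
  constants (Rec x) = constants x
instance (Constants a, Constants b) => Constants (a :*: b) where
  constants (x :*: y) = constants x ++ constants y

-- Neither Encoding nor String needs a Constants instance.
instance Constants Exp3 where
  constants e = case projectConst3 (disbandExp3 e) of
    Left (Const3_ enc s) -> [decode enc s]
    Right x              -> constants (repsPlus3 x)
\end{haskell}
The ^Right^ branch binds ^x^ at type ^N Plus3_^, which never mentions ^Encoding^ or ^String^. The delegated call to ^constants^ therefore raises no constraints on those types.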
  612. % Metaprogramming with Template Haskell provides an alternative means to
  613. % overcome the immodularity, by transforming a slight obfuscation of the ideal
  614. % instance into the immodular one \citep{hackage-th-gd}.
\todo{Make sure to explain that the Par1 and Par2 types exist because they allow
Rec to be used exactly.}
The capability of refining a set of types by projecting out certain
constructors is crucial to our goal of automating transformation functions. The
interesting constructors, whose semantics bear on the semantics of the
transformation, cannot be mapped automatically. Thus, without a way to remove
them from the generic representation of a data type, no transformation can be
automated. The ^project^ and ^partition^ functions therefore underlie automated
transformation by enabling the user to handle the constructors that
\yoko\ cannot transform automatically.
  625. The simplicity of the strict sum-of-products generic view, \ie\ excluding the
  626. ^C^ type, precludes any reflection capabilities requiring non-structural
  627. information. The \ig\ library extends sums and products with the ^C^ type in
  628. order to enable generic definitions of classes like ^Show^ and ^Read^, which
  629. necessarily involve the names of constructors. We consider \yoko's fields types
to be an evolution of the ^C^ type. In fact, the ^C^ constructor and its
associated constructor types could underlie a functionality similar to the
^project^ function from the \yoko\ instance for ^Constants Exp3^. Thus, it is
possible to use the ^C^ type instead of fields types to preserve the modularity
of that instance declaration. However, without relying on Template Haskell
metaprogramming, the obfuscation of such a modular \ig\ instance would
actually be more severe than that of the immodular \ig\ instance above: some
sort of type ascription is necessary in order to tag the structural pattern
with the desired constructor name. Fields types are a
  639. natural solution that embraces the delayed representation hinted at by the ^C^
  640. representation type's constructor types without involving heavyweight
  641. metaprogramming.
  642. \subsection{Reflection of Constructor Names}
  643. \label{sec:reflecting-names}
  644. \todo{We introduced fields types first because it is convenient to index the
  645. \inlineHaskell{Tag} instances with fields types. However, \inlineHaskell{Tag}
  646. instances can also be indexed by the first parameter of the \inlineHaskell{C}
  647. representation type from the \ig\ generic view: \inlineHaskell{Tag} is
  648. orthogonal to fields types.}
  649. For the constructors that can be automatically transformed, the
  650. sum-of-constructors view still does not reflect enough information. Beyond the
  651. finer grained invocation of the generic definition on a per-constructor basis
  652. demonstrated in the previous section, an improved reflection of constructors in
  653. \yoko\ is also key to its support for generic definition of transformations
  654. between types. The strict sum-of-products view distinguishes such constructors
  655. only by their position in the sum. Reflection of non-structural information is
  656. required in order to individually distinguish multiple constructors within a
  657. type that share the same products structure. For example, the
  658. ^Plus___Exp___Exp^ and ^Mult___Exp___Exp^ constructors both have the same
  659. structure, ^Rec Exp :*: Rec Exp^, and hence require more information to be
  660. distinguished outside of a fixed sum. \yoko\ instead infers the correspondence
  661. of constructors the same way users do: by the constructor name.
  662. The library API includes a type family ^Tag^ for mapping a fields type to a
  663. reflection of its constructor name as a type-level string. In the context of a
  664. compiler pipeline, we suppose it is safe to rely on constructor names being
similar in adjacent data types; it is highly likely that in all of the types in
the pipeline, each constructor corresponding to, say, applications will be
similarly named. It might even be the same name if the types are declared in
separate modules. If not, it is likely the same root name, say ^App^, with some
  669. additional suffix.
  670. \todo{For this presentation, write a \inlineHaskell{Letter} data type with a
  671. constructor per character that is allowed to occur in a Haskell
  672. identifier. Define the quadratic instances for the
  673. \inlineHaskell{EqualLetter} type family. Have \inlineHaskell{Tag} yield a
promoted \inlineHaskell{[Letter]}.}
  675. \begin{haskell}
  676. -- #\yoko# API
  677. type family Tag dc :: [Letter]
  678. type instance Tag Const_ = [LC, Lo, Ln, Ls, Lt]
  679. type instance Tag Plus_ = [LP, Ll, Lu, Ls]
  680. type instance Tag Const2_ = [LC, Lo, Ln, Ls, Lt, L2]
  681. type instance Tag Plus2_ = [LP, Ll, Lu, Ls, L2]
  682. type instance Tag Const3_ = [LC, Lo, Ln, Ls, Lt, L3]
  683. type instance Tag Plus3_ = [LP, Ll, Lu, Ls, L3]
  684. \end{haskell}
  685. The generic transformation we define in
  686. \tref{Section}{sec:polymorphic-generic-conversion} converts a sum of
  687. constructors to a type under the assumption that each constructor in the sum
occurs in the type with the same name and the same fields (modulo the
corresponding substitution of recursive occurrences). This transformation will
therefore be able to identify the corresponding constructors when converting
  691. from the ^AnonTerm^ type to the ^TopTerm^ type (from
  692. \tref{Section}{sec:objective}).
  693. A generic transformation of constructors from the ^Exp^ type to the ^Exp2^ or
  694. ^Exp3^ type would require a more sophisticated type-level predicate on
  695. strings. Since we are already emulating type-level strings, we only handle the
  696. simpler case of types with constructors with the same name.
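Correspondence by name can be sketched with GHC's built-in type-level strings. This assumes the ^Symbol^ kind in place of the emulated strings used in \yoko. The closed type family ^FindDC^, the helper ^OrElse^, and the fields types ^SrcApp_^, ^TgtVar_^, and ^TgtApp_^ are all hypothetical. ^FindDC^ searches a target sum of fields types for one whose ^Tag^ equals a given name; here it matches the source ^App^ constructor with the target's ^App^. The sketch assumes the DataKinds, TypeFamilies, TypeOperators, PolyKinds, and UndecidableInstances extensions.
\begin{haskell}
import Data.Kind (Type)
import Data.Type.Bool
import Data.Type.Equality
import GHC.TypeLits (Symbol)

newtype N a = N a
data a :+: b = L a | R b

-- Constructor names as GHC type-level strings.
type family Tag dc :: Symbol

-- Hypothetical fields types: SrcApp_ reflects App in the source type;
-- TgtVar_ and TgtApp_ reflect Var and App in the target type.
data SrcApp_
data TgtVar_
data TgtApp_
type instance Tag SrcApp_ = "App"
type instance Tag TgtVar_ = "Var"
type instance Tag TgtApp_ = "App"

-- The first fields type in a sum whose constructor name is s, if any.
type family FindDC (s :: Symbol) sum :: Maybe Type where
  FindDC s (N dc)    = If (Tag dc == s) ('Just dc) 'Nothing
  FindDC s (l :+: r) = OrElse (FindDC s l) (FindDC s r)

type family OrElse (m :: Maybe k) (n :: Maybe k) :: Maybe k where
  OrElse ('Just x) n = 'Just x
  OrElse 'Nothing  n = n

type TgtDCs = N TgtVar_ :+: N TgtApp_

-- Checked by the type checker: App in the source matches TgtApp_.
found :: FindDC (Tag SrcApp_) TgtDCs :~: 'Just TgtApp_
found = Refl
\end{haskell}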
  697. Reflection of a data type's set of constructors as well as the name and
  698. structure of each constructor provides an abundance of information for
  699. type-level programming in \yoko. The current Haskell features supporting such
  700. type-level programming are powerful but rudimentary, and so the next two
  701. sections introduce some fundamental machinery. The remaining technical sections
  702. present the full \yoko\ API, define the lambda lifting transformation between
  703. the ^AnonTerm^ and ^TopProg^ types, and then factor out its generic essence
  704. into a generic transformer that can be directly reused for other
  705. transformations.
  706. \todo{subsection for new representation type grammar}
  707. \section{Sets as Sums}
  708. \label{sec:sets-as-sums}
  709. \begin{figure}[h]
  710. \begin{haskell}
  711. -- #\tref{Section}{sec:sets-as-sums}#, Sets as sums
  712. data Void -- empty set
  713. newtype N a = N a -- singleton set
  714. data a :+: b = L a | R b -- set union
  715. -- #\tref{Section}{sec:embed-partition}#, Set operations
  716. type family (:-:) sum sum2
  717. class Embed sub sup where embed :: sub -> sup
  718. inject :: Embed (N a) sum => a -> sum
  719. class Partition sup sub where
  partition :: sup -> Either sub (sup :-: sub)
project ::
  Partition sum (N a) => sum -> Either a (sum :-: N a)
  723. \end{haskell}
  724. \caption{The API for sets as sums.}
  725. \label{fig:sets-as-sums}
  726. \end{figure}
  727. The \yoko\ library reflects the set of constructors of a data type as a sum of
  728. the corresponding fields types. \tref{Figure}{fig:sets-as-sums} on
  729. page~\pageref*{fig:sets-as-sums} lists the API for sets of types in \yoko. Sets
  730. of types are modeled as sums using nestings of the ^:+:^ type with set elements
  731. delimited by applications of the ^N^ type, which models singleton sets. As an
  732. example, the set of the base types ^Int^, ^Char^, ^Bool^, and ^()^ is
  733. represented by the type ^Bases^, where
  734. ^type___Bases___=___(N___Int___:+:___N___Char)^ ^:+:^
  735. ^(N___Bool___:+:___N___())^.
Without the ^N^ type, certain essential type families cannot be indexed by type
sets: an instance for an arbitrary element type ^x^ would overlap with the
instance for a sum ^l :+: r^, and overlapping type family instances are
forbidden by Haskell's type system. The ^C^ representation type from \ig\ could
be used to model singleton sets, but the semantics of its two type parameters
are incompatible with fields types.
  741. \subsection{Embedding and Partitioning Sets}
  742. \label{sec:embed-partition}
  743. A value of any type can be injected into a sum modeling a set that contains
  744. that type via the ^inject^ function. For example, any ^Int^, character,
  745. Boolean, or unit can be injected into ^Bases^. A value of any subset of those
types, such as ^N Int :+: N Bool^, can be embedded into ^Bases^ via the ^embed^
function. Similarly, a value of an element type can be projected out of a sum
via the ^project^ function, and a set can be partitioned into two sets via the
^partition^ function.
  750. The ^partition^ function uses the ^Either^ type to model the fact that a value
  751. of the ^sup^ type may be an injection of a type not in the ^sub^ type. In that
  752. case, the ^sup^ type is refined to the difference ^sup :-: sub^ since the
evaluation of the ^partition^ function has established that its argument is not
  754. an injection of one of the ^sub^ types. The ^:-:^ type is a simple type-level
  755. function defined using two other type families.
  756. \begin{haskell}
type instance (:-:) (N x) sum2 =
  If (Elem x sum2) Void (N x)
type instance (:-:) (l :+: r) sum2 =
  Combine (l :-: sum2) (r :-: sum2)
  761. \end{haskell}
  762. \noindent The ^Elem^ family implements the obvious membership predicate; it is
  763. further discussed below. The ^Combine^ family maintains the invariant that the
^Void^ type does not occur as a summand within an otherwise non-empty set; for
  765. example, ^N Int :+: Void^ violates this property.
  766. \begin{haskell}
  767. type family Combine sum sum2
  768. type instance Combine Void x = x
  769. type instance Combine (N x) Void = N x
  770. type instance Combine (N x) (N y) = N x :+: N y
type instance Combine (N x) (l :+: r) =
  N x :+: (l :+: r)
type instance Combine (l :+: r) Void = l :+: r
type instance Combine (l :+: r) (N y) =
  (l :+: r) :+: N y
type instance Combine (ll :+: rl) (lr :+: rr) =
  (ll :+: rl) :+: (lr :+: rr)
  778. \end{haskell}
  779. We choose to maintain this invariant here because the assumption that every
  780. summand is non-degenerate simplifies the definition of the other type set
  781. operations. Because \yoko\ uses type sets to reflect constructors and mutually
recursive types, empty sets are nonsensical. A data type with no constructors
  783. only permits degenerate method definitions, which hardly require
  784. genericity. Similarly, an empty family of mutually recursive types is no type
  785. at all. Thus, any occurrence of ^Void^ outside of intermediate type-level
  786. computations is considered user error. Hence the invariant.
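The difference operation can be prototyped with GHC's closed type families, which make a type-equality predicate like ^Equal^ directly definable. This sketch is a reconstruction under assumptions, not the \yoko\ source. It replaces the open ^:-:^, ^Combine^, and ^Elem^ families with closed ones and defines ^Elem^ directly rather than via ^Locate^. It also uses the ^If^ and ^||^ families from base's Data.Type.Bool module. The final definition checks at compile time that removing ^N Int :+: N Bool^ from ^Bases^ leaves ^N Char :+: N ()^. The sketch assumes the DataKinds, TypeFamilies, TypeOperators, PolyKinds, and UndecidableInstances extensions.
\begin{haskell}
import Data.Type.Bool
import Data.Type.Equality

data Void
newtype N a = N a
data a :+: b = L a | R b

type family Equal (a :: k) (b :: k) :: Bool where
  Equal a a = 'True
  Equal a b = 'False

type family Elem x s :: Bool where
  Elem x Void      = 'False
  Elem x (N y)     = Equal x y
  Elem x (l :+: r) = Elem x l || Elem x r

-- Keeps Void out of non-empty sets; agrees with the seven open
-- Combine instances on sets built from Void, N, and :+:.
type family Combine s t where
  Combine Void t = t
  Combine s Void = s
  Combine s t    = s :+: t

-- Set difference.
type family s :-: t where
  Void      :-: t = Void
  N x       :-: t = If (Elem x t) Void (N x)
  (l :+: r) :-: t = Combine (l :-: t) (r :-: t)

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())

-- Checked by the type checker.
diffCheck :: (Bases :-: (N Int :+: N Bool)) :~: (N Char :+: N ())
diffCheck = Refl
\end{haskell}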
  787. The ^Elem^ family and the instances of ^Embed^ and ^Partition^ are ultimately
  788. predicated upon the ^Locate^ family, which is not exposed to the user. Given a
target type and a sum, ^Locate^ calculates a path through the nested ^:+:^
types of the sum to reach the occurrence of the target type. Even though such a
path might not exist, ^Locate^ has a total definition by structuring its result
  792. in a promotion of the ^Maybe^ type. The ^Elem^ type function simply checks if
  793. its first argument can be located in its second.
  794. \begin{haskell}
  795. data Here a ; data Left x ; data Right x
  796. type family Locate a sum
  797. type instance Locate a Void = Nothing
  798. type instance Locate a (N x) =
  If (Equal x a) (Just (Here a)) Nothing
type instance Locate a (l :+: r) =
  MaybeMap (Left r) (Locate a l) `MaybePlus1`
  MaybeMap (Right l) (Locate a r)
  803. type Elem a sum = IsJust (Locate a sum)
  804. \end{haskell}
  805. The ^Here^, ^Left^, and ^Right^ types are type-level values, composed to model
a path through nested ^:+:^ types. The ^Equal^ type family is explained in
\tref{Section}{sec:scaffolding}. The omitted definitions of ^IsJust^,
^MaybeMap^, and ^MaybePlus1^ are obvious, excepting the caveat that the
^MaybePlus1^ family has no instance where both of its indices are constructed
with ^Just^. As a result, a located type cannot occur in the sum model of a type
set more than once. When type sets model constructors or sibling types, such a
duplicate never arises, and its presence would again indicate user error. The
  813. specific semantics of ^MaybePlus1^ grant \yoko\ robustness against this sort of
  814. ambiguity.
  815. We omit the instances of ^Embed^ and ^Partition^. Both are defined via
  816. auxiliary classes that branch according to the results of ^Locate^, and hence
accommodate no ambiguity. This implementation technique avoids the overlapping
  818. instances Haskell language extension, which would require sums to be linearized
  819. in order to define the instances. Our presumption of a type-level type-equality
  820. predicate (explained in \tref{Section}{sec:scaffolding}) is a linchpin of this
  821. technique. The elided instances of ^Embed^ and ^Partition^ include any type set
  822. built with ^N^ and ^:+:^; ^Void^ is not supported.
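The technique of branching on the result of ^Locate^ can be sketched for ^inject^ alone. This is a reconstruction under assumptions, not the elided \yoko\ instances. It uses closed type families, so the type-equality predicate comes for free. In place of the empty ^Here^, ^Left^, and ^Right^ types, it uses a promoted ^Path^ kind with hypothetical constructors ^Here^, ^TurnL^, and ^TurnR^. The auxiliary class ^InjectAt^ is also hypothetical. The sketch assumes the DataKinds, TypeFamilies, TypeOperators, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, ScopedTypeVariables, and UndecidableInstances extensions.
\begin{haskell}
import Data.Proxy (Proxy (..))

data Void
newtype N a = N a
data a :+: b = L a | R b

-- Promoted paths through nested :+: types (hypothetical names).
data Path = Here | TurnL Path | TurnR Path

type family MapL (m :: Maybe Path) :: Maybe Path where
  MapL 'Nothing  = 'Nothing
  MapL ('Just p) = 'Just ('TurnL p)

type family MapR (m :: Maybe Path) :: Maybe Path where
  MapR 'Nothing  = 'Nothing
  MapR ('Just p) = 'Just ('TurnR p)

-- Like MaybePlus1: no equation for two Justs, so a duplicated
-- element leaves Locate stuck and inject ill-typed.
type family Plus1 (m :: Maybe Path) (n :: Maybe Path) :: Maybe Path where
  Plus1 ('Just p) 'Nothing = 'Just p
  Plus1 'Nothing  n        = n

type family Locate a s :: Maybe Path where
  Locate a Void      = 'Nothing
  Locate a (N a)     = 'Just 'Here
  Locate a (N x)     = 'Nothing
  Locate a (l :+: r) = Plus1 (MapL (Locate a l)) (MapR (Locate a r))

-- Auxiliary class that branches on the computed path.
class InjectAt (p :: Path) a s where
  injectAt :: Proxy p -> a -> s

instance (a ~ b) => InjectAt 'Here a (N b) where
  injectAt _ = N
instance InjectAt p a l => InjectAt ('TurnL p) a (l :+: r) where
  injectAt _ = L . injectAt (Proxy :: Proxy p)
instance InjectAt p a r => InjectAt ('TurnR p) a (l :+: r) where
  injectAt _ = R . injectAt (Proxy :: Proxy p)

inject :: forall a s p. (Locate a s ~ 'Just p, InjectAt p a s) => a -> s
inject = injectAt (Proxy :: Proxy p)

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())
\end{haskell}
For example, ^inject 'c' :: Bases^ evaluates to ^L (R (N 'c'))^. Adding a second ^N Char^ summand to ^Bases^ would instead leave ^Locate^ stuck and make the same call ill-typed.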
  823. \section{The \yoko\ Generic View}
  824. \label{sec:yoko-api}
  825. \tref{Figure}{fig:yoko-api} on page~\pageref*{fig:yoko-api} lists the core
  826. declarations of the \yoko\ generic view. We demonstrate its application with a
  827. working example that leads into the lambda lifting example. Without loss of
generality, we concretize ^AnonTerm^'s underspecified set of non-nominal
constructors. There will be a single non-nominal constructor, ^App^, modelling
application. A concrete type is easier to follow, but the ^App^
  831. constructor is still intended to exemplify an unlimited number of other
  832. non-nominal constructors, all of which would be handled by the lambda lifting
  833. definition as developed in the next section.
  834. \begin{haskell}
  835. module AnonymousFunctions where
data AnonTerm = Lam Type AnonTerm | Var Int
              | Let [Decl] AnonTerm
              | App AnonTerm AnonTerm -- new
data Decl = Decl Type AnonTerm -- no change
  840. \end{haskell}
Each of the following subsections introduces a part of the \yoko\ view and
  842. instantiates it for the ^AnonTerm^ and ^Decl^ types. The presentation proceeds
  843. bottom up, discussing first the representation of fields, then constructors,
  844. and finally whole data types.
  845. \begin{figure}[h]
  846. \begin{haskell}
  847. -- #\tref{Section}{sec:field-representation}#, Representation of fields
  848. data U = U -- empty tuple
  849. data a :*: b = a :*: b -- tuple concatenation
  850. data Dep a = Dep a
  851. data Rec t = Rec t
  852. data Par1 f a = Par1 (f a)
  853. data Par2 ff a b = Par2 (ff a b)
  854. type family Rep dc
class Generic dc where
  rep :: dc -> Rep dc
  obj :: Rep dc -> dc
-- #\tref{Section}{sec:reflect-dcs}#, Reflection of constructors
type family Tag dc :: String -- see footnote#\footnotemark#
type family Range dc
class (Generic dc, DT (Range dc),
       Embed (N dc) (DCs (Range dc))) =>
      DC dc where rejoin :: dc -> Range dc
-- #\tref{Section}{sec:reflect-dts}#, Reflection of data types
newtype DCsOf t sum = DCsOf sum
type family DCs t
type Disbanded t = DCsOf t (DCs t)
class DT t where disband :: t -> Disbanded t
goSolo ::
  (DT (Range dc), Partition sum (N dc)) =>
  DCsOf (Range dc) sum ->
  Either dc (DCsOf (Range dc) (sum :-: N dc))
  873. \end{haskell}
  874. \caption{The \yoko\ generic view.}
  875. \label{fig:yoko-api}
  876. \end{figure}
  877. \footnotetext{We assume type-level strings for this presentation. The actual
  878. implementation emulates type-level strings as in
  879. \tref{Appendix}{sec:scaffolding}.}
  880. \subsection{Representation of Fields}
  881. \label{sec:field-representation}
  882. The delayed representation of \yoko\ expands the role of \ig's constructor
  883. types. Each fields type represents a constructor independently of its data type
  884. of origin. Reflection of the ^AnonTerm^ and ^Decl^ types uses the five obvious
  885. fields types.
  886. \begin{haskell}
  887. data Lam_ = Lam_ Type AnonTerm
  888. data Var_ = Var_ Int
  889. data Let_ = Let_ [Decl] AnonTerm
  890. data App_ = App_ AnonTerm AnonTerm
  891. data Decl_ = Decl_ Type AnonTerm
  892. \end{haskell}
  893. As an extension of the \ig\ generic view, the \yoko\ generic view largely
  894. inherits its representation types. The basic structural representation types
  895. remain the same, excepting the addition of ^N^ and the renaming of ^Var^ to
  896. ^Dep^. The representation types also include the new ^Par1^ and ^Par2^ types
  897. for the navigation of compound fields. Finally, the ^C^ type is removed in
  898. favor of fields types.
  899. As an extension of the \ig\ approach, the \yoko\ approach centers around the
  900. ^Rep^ type family. The ^Rep^ family in \yoko\ maps constructor types, as
  901. opposed to entire data types in the \ig\ approach, to their structural
  902. representation. Accordingly, sums do not occur in \yoko\ ^Rep^ instances. In
  903. combination with operations on sets of fields types, this allows more granular
invocation of generic definitions as demonstrated in
  905. \tref{Section}{sec:granularity}. The constructors of ^AnonTerm^ and ^Decl^ are
  906. represented as follows.
  907. \begin{haskell}
  908. type instance Rep Lam_ = Dep Type :*: Rec AnonTerm
  909. type instance Rep Var_ = Dep Int
  910. type instance Rep Let_ =
  Par1 [] (Rec Decl) :*: Rec AnonTerm
  912. type instance Rep App_ = Rec AnonTerm :*: Rec AnonTerm
  913. type instance Rep Decl_ = Dep Type :*: Rec AnonTerm
  914. \end{haskell}
  915. \noindent The \ig\ approach has no difficulty with mutual recursion, and the
  916. ^Par1^ representation type from \tref{Section}{sec:compound-fields} handles the
  917. compound first field of ^Let^.
  918. The methods of the ^Generic^ class implement the isomorphism between a fields
  919. type and its representation. This class has the same semantics as the
  920. ^Representable^ class from \ig. The only distinction is that sums are absent,
since they are never present in the right-hand side of ^Rep^
  922. instances. Instances of ^Generic^ follow directly from the corresponding ^Rep^
  923. instances, so we omit them for ^AnonTerm^ and ^Decl^.
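The omitted ^Generic^ instances follow mechanically from the ^Rep^ instances. The sketch below writes three of them out under stated assumptions. ^Type^ is given a hypothetical definition, since it is left abstract here, and only ^Lam_^, ^Var_^, and ^Let_^ are covered. The ^Let_^ instance illustrates how ^Par1^ handles the compound ^[Decl]^ field by mapping ^Rec^ over the list. The sketch assumes the TypeFamilies and TypeOperators extensions.
\begin{haskell}
-- Hypothetical stand-in for the object language's types.
data Type = TInt | TFun Type Type deriving (Eq, Show)

data AnonTerm = Lam Type AnonTerm | Var Int
              | Let [Decl] AnonTerm
              | App AnonTerm AnonTerm
  deriving (Eq, Show)
data Decl = Decl Type AnonTerm deriving (Eq, Show)

-- Three of the five fields types.
data Lam_ = Lam_ Type AnonTerm deriving (Eq, Show)
data Var_ = Var_ Int deriving (Eq, Show)
data Let_ = Let_ [Decl] AnonTerm deriving (Eq, Show)

-- Representation types; Rep maps single constructors, so no sums.
data a :*: b = a :*: b
data Dep a = Dep a
data Rec t = Rec t
data Par1 f a = Par1 (f a)

type family Rep dc
type instance Rep Lam_ = Dep Type :*: Rec AnonTerm
type instance Rep Var_ = Dep Int
type instance Rep Let_ = Par1 [] (Rec Decl) :*: Rec AnonTerm

class Generic dc where
  rep :: dc -> Rep dc
  obj :: Rep dc -> dc

instance Generic Lam_ where
  rep (Lam_ ty tm) = Dep ty :*: Rec tm
  obj (Dep ty :*: Rec tm) = Lam_ ty tm

instance Generic Var_ where
  rep (Var_ i) = Dep i
  obj (Dep i) = Var_ i

-- Par1 navigates the compound [Decl] field: each element is wrapped in Rec.
instance Generic Let_ where
  rep (Let_ ds tm) = Par1 (map Rec ds) :*: Rec tm
  obj (Par1 rds :*: Rec tm) = Let_ (map unRec rds) tm
    where unRec (Rec d) = d
\end{haskell}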
  924. \subsection{Reflection of Constructors}
  925. \label{sec:reflect-dcs}
  926. The \yoko\ reflection of constructors is more thorough than the ^Constructor^
  927. class of the \ig\ library. The major enhancements are the software quality
  928. benefits of fields types and the type-level programming enabled by the ^Tag^
  929. type family, as discussed in \tref{Section}{sec:reflecting-names}.
  930. The ^Tag^ family maps a fields type to a type-level reification of the
  931. constructor's name. Generic transformation between two similar types relies
  932. essentially on the reflection of constructor names independently of the type of
  933. origin.
  934. \begin{haskell}
type instance Tag Var_  = [LV, La, Lr]
type instance Tag Lam_  = [LL, La, Lm]
type instance Tag Let_  = [LL, Le, Lt]
type instance Tag App_  = [LA, Lp, Lp]
type instance Tag Decl_ = [LD, Le, Lc, Ll]
  940. \end{haskell}
  941. In order to relate a fields type back to its type of origin, \yoko\ declares
  942. the ^Range^ type family. The ^Range^ family pervades the reflective API,
  943. connecting fields types back to their original type.
  944. \begin{haskell}
type instance Range Lam_  = AnonTerm
type instance Range Var_  = AnonTerm
type instance Range Let_  = AnonTerm
type instance Range App_  = AnonTerm
type instance Range Decl_ = Decl
  950. \end{haskell}
  951. A type ^dc^ is a genuine fields type only if it is an instance of the ^DC^
class. The superclass constraints require that ^dc^ can be converted to a
sum-of-products view, that a ^dc^ value can be injected into the sum of
constructors of its range, and
  954. that its range is a data type satisfying the ^DT^ class, discussed in the next
  955. subsection. The ^rejoin^ method specifies the correspondence between ^dc^ and
  956. its twin constructor in its range type. We do not retain the ^occName^ method
  957. from the \ig\ ^Constructor^ class, since the rest of our developments do not
  958. rely on it. Moreover, with proper support for type-level strings, it could be
  959. defined as the \emph{demotion} of the ^Tag___dc^ type-level string.
  960. \begin{haskell}
instance DC Lam_ where
  rejoin (Lam_ ty tm)   = Lam ty tm
instance DC Var_ where
  rejoin (Var_ i)       = Var i
instance DC Let_ where
  rejoin (Let_ ds tm)   = Let ds tm
instance DC App_ where
  rejoin (App_ tm1 tm2) = App tm1 tm2
instance DC Decl_ where
  rejoin (Decl_ ty tm)  = Decl ty tm
  971. \end{haskell}
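With GHC's built-in type-level strings, the demotion mentioned above is directly expressible. The sketch assumes that ^Tag^ has result kind ^Symbol^ rather than the emulated letter lists. ^occName^ is a hypothetical function, not part of \yoko. The sketch relies on ^KnownSymbol^ and ^symbolVal^ from ^GHC.TypeLits^ and assumes the DataKinds, TypeFamilies, FlexibleContexts, and ScopedTypeVariables extensions.
\begin{haskell}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Two fields types: Var_ from AnonTerm and Nil_ from lists.
data Var_ = Var_ Int
data Nil_ a = Nil_

-- Tag with GHC's built-in type-level strings.
type family Tag dc :: Symbol
type instance Tag Var_     = "Var"
type instance Tag (Nil_ a) = "[]"

-- Hypothetical occName: the demotion of Tag dc to a String.
occName :: forall dc. KnownSymbol (Tag dc) => dc -> String
occName _ = symbolVal (Proxy :: Proxy (Tag dc))
\end{haskell}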
  972. \subsection{Reflection of Data Types}
  973. \label{sec:reflect-dts}
  974. The only distinction between reflection of data types in \ig\ and in \yoko\ is
  975. the delayed representation of constructors. Instead of mapping an entire data
  976. type to its sum-of-products view in one step, \yoko\ maps a data type to a sum
  977. of its fields types. Since \yoko\ reserves the ^Rep^ family for mapping a
  978. fields type to its structural representation, it declares the new ^DCs^ type
  979. family for specifying the representation of a data type.
  980. \begin{haskell}
  981. type instance DCs AnonTerm =
  (N Lam_ :+: N Var_) :+: (N Let_ :+: N App_)
  983. type instance DCs Decl = N Decl_
  984. \end{haskell}
  985. The phantom type parameter of the ^DCsOf^ type tags a sum of constructors with
  986. their shared range type. This declaration itself does not enforce those
  987. semantics, but other functions in \yoko\ use the ^DCsOf^ type under the
  988. assumption of these semantics, sometimes enforcing them. For ^*^ data types,
  989. this rarely provides more than robustness against user error by preventing the
unintended mixing of fields types with disparate ranges within the same
sum. For higher-kinded types, however, it provides an avenue for inference to
unify the type parameters of polymorphic values, thereby reducing the need for
frequent type ascriptions.
  994. For example, the \yoko\ reflection of lists involves the fields type ^Nil_^, in
  995. which the type parameter is phantom.
  996. \begin{haskell}
  997. data Nil_ a = Nil_
  998. type instance Range (Nil_ a) = [a]
  999. \end{haskell}
  1000. \noindent Since the type-level programming both within \yoko\ as well as
  1001. involved in typical uses of \yoko\ relies on type-equality, the polymorphism of
  1002. ^Nil_^ would often cause an unfortunate avalanche of type errors, were it not
  1003. for the ^DCsOf^ type. While the user could ascribe the intended type parameter,
  1004. it is oftentimes burdensome to do so on account of scoped type
  1005. variables. However, the ubiquity of the ^DCsOf^ type throughout the API
  1006. mitigates this burden. For example, if the ^null_^ function is defined as
  1007. follows, the inferred type is ^[a] -> Bool^.
  1008. \begin{haskell}
  1009. -- null_ :: [a] -> Bool -- inferred via DCsOf
  1010. null_ x = case goSolo (disband x) of
  Left Nil_ -> True
  _ -> False
  1013. \end{haskell}
  1014. If the ^DCsOf^ wrapper were hypothetically not maintained by the ^disband^ and
  1015. ^goSolo^ functions, the resulting inferred type of ^null_^ would actually
render the function inapplicable\footnote{Indeed, GHC generates a type error
  1017. with the suggestion: ``Probable cause: the inferred type is
  1018. ambiguous''.}. Its inferred type would involve two separate and completely
  1019. unrelated type variables: one for the type parameter of the ^Nil_^ pattern and
  1020. another for the type of ^null_^'s argument.
  1021. \begin{haskell}
  1022. -- without DCsOf,
  1023. -- null_ :: (DT a, Partition (DCs a) (N (Nil_ b))) =>
  1024. -- a -> Bool
  1025. \end{haskell}
  1026. In particular, ^Nil_^'s type parameter would be unreachable in the type of
  1027. ^null_^. The type variable would, however, participate in a ^Partition^
  1028. constraint, rendering it unsatisfiable. This scenario is avoided because the
  1029. types of ^goSolo^ and ^disband^ do use the ^DCsOf^ wrapper to unify the shared
range type of their arguments. Via the ^DCsOf^ tag, the type of ^x^ is unified with
^Range (Nil_ a)^, \ie\ ^[a]^. The type parameter of the ^Nil_^ pattern is thus
propagated through the types of the \yoko\ API elements all the way to the
  1033. argument of ^null_^.
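The effect of the tag can be reproduced in a few lines of plain GHC Haskell.
The following sketch is a hypothetical miniature, not \yoko's API: ^DCsOf^,
^Range^, ^Nil_^, and ^Cons_^ are simplified local stand-ins, and the class
^Project^ and the functions ^disbandList^ and ^projectTagged^ are illustrative
substitutes for ^Partition^, ^disband^, and ^goSolo^. The equality constraint
between ^Range dc^ and the tag ^t^ in the type of ^projectTagged^ is what fixes
the type parameter of the ^Nil_^ pattern.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
-- Hypothetical miniature of the DCsOf tag; these
-- are stand-ins, not yoko's definitions.

-- Fields types for the two list constructors.
data Nil_ a = Nil_
data Cons_ a = Cons_ a [a]

-- The data type that each fields type belongs to.
type family Range dc
type instance Range (Nil_ a) = [a]
type instance Range (Cons_ a) = [a]

-- The tag t records which data type a sum came from.
newtype DCsOf t sum = DCsOf { unDCsOf :: sum }

disbandList
  :: [a] -> DCsOf [a] (Either (Nil_ a) (Cons_ a))
disbandList []       = DCsOf (Left Nil_)
disbandList (x : xs) = DCsOf (Right (Cons_ x xs))

-- Stand-in for partitioning: select one constructor.
class Project dc sum where
  project :: sum -> Maybe dc

instance Project (Nil_ a) (Either (Nil_ a) r) where
  project (Left n)  = Just n
  project (Right _) = Nothing

-- Range dc ~ t ties dc's type parameter to the tag t.
projectTagged
  :: (Project dc sum, Range dc ~ t)
  => DCsOf t sum -> Maybe dc
projectTagged = project . unDCsOf

-- GHC should infer [a] -> Bool even without this
-- signature, because the tag fixes Nil_'s parameter.
null_ :: [a] -> Bool
null_ x = case projectTagged (disbandList x) of
  Just Nil_ -> True
  _         -> False
\end{haskell}
Removing that equality constraint from the type of ^projectTagged^ should leave
the ^Project^ constraint arising in ^null_^ ambiguous in the type parameter of
^Nil_^, mirroring the error discussed above.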
The ^Disbanded^ type synonym wraps the ^DCsOf^ tag around a type's
sum-of-constructors representation. This synonym occurs in the type of the
^disband^ method of the ^DT^ type class, which specifies how to disband the
constructors of a data type by converting each constructor to a value in the
corresponding sum of constructors.
\begin{haskell}
instance DT AnonTerm where
  disband (Lam ty tm)   = inject $ Lam_ ty tm
  disband (Var i)       = inject $ Var_ i
  disband (Let ds tm)   = inject $ Let_ ds tm
  disband (App tm1 tm2) = inject $ App_ tm1 tm2
instance DT Decl where
  disband (Decl ds tm) = inject $ Decl_ ds tm
\end{haskell}%$
\noindent The ^disband^ method complements the ^rejoin^ method of the ^DC^
class, akin to the inverse relationship between the ^rep^ and ^obj^ methods of
the ^Generic^ class.
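The complementary roles of ^disband^ and ^rejoin^ can be illustrated with a
round trip. The sketch below is hand-written and hypothetical: ^Term^, ^Var_^,
^App_^, and the helper ^rejoinSum^ are illustrative, the two classes are
simplified versions of ^DT^ and ^DC^, ^Either^ stands in for ^:+:^, and the
^DCsOf^ tag is omitted. Rejoining whichever constructor ^disband^ produced
should return the original value.
\begin{haskell}
{-# LANGUAGE TypeFamilies #-}
-- Hypothetical miniature of the DT/DC pair;
-- simplified stand-ins, not yoko's classes.

data Term = Var Int | App Term Term
  deriving (Eq, Show)

-- Fields types for Term's two constructors.
data Var_ = Var_ Int
data App_ = App_ Term Term

-- Each fields type names its range ...
type family Range dc
type instance Range Var_ = Term
type instance Range App_ = Term

-- ... and knows how to rejoin it.
class DC dc where
  rejoin :: dc -> Range dc

instance DC Var_ where rejoin (Var_ i)   = Var i
instance DC App_ where rejoin (App_ f a) = App f a

-- A data type disbands into its sum of constructors.
class DT t where
  type DCs t
  disband :: t -> DCs t

instance DT Term where
  type DCs Term = Either Var_ App_
  disband (Var i)   = Left (Var_ i)
  disband (App f a) = Right (App_ f a)

-- Rejoin whichever constructor the sum holds.
rejoinSum :: Either Var_ App_ -> Term
rejoinSum = either rejoin rejoin
\end{haskell}
For every ^Term^ value ^t^, ^rejoinSum (disband t)^ should equal ^t^.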
\subsection{Discussion}
Once the concept of fields types is understood, all of these instances are
intuitive. They are also straightforward to generate; even the fields types
themselves can be generated.
In the following, we assume that ^TopTerm^ and all of its constructors are
completely reflected in the \yoko\ generic view.
\section{The Lambda Lifting Transformation}
\label{sec:lambda-lift-definition}
\stub
\subsection{Monomorphic}
\begin{figure}[p]
\begin{haskell}
-- locating the constructor tagged s in a sum
type family FindDCs s sum
type instance FindDCs s (N dc) =
  If (Equal s (Tag dc)) (Just (N dc)) Nothing
type instance FindDCs s (a :+: b) =
  DistMaybePlus (FindDCs s a) (FindDCs s b)
-- converting values into Exp2
class ToExp2 a where toExp2 :: a -> Exp2
instance ToExp2 sum => ToExp2 (DCsOf t sum) where
  toExp2 = toExp2 . unDCsOf
instance ToExp2 Exp where
  toExp2 e = case goSolo (disband e) of
    Left (Const_ i) -> Const2 i Nothing
    Right x         -> toExp2 x
-- mapping over the sum
instance (ToExp2 a, ToExp2 b) => ToExp2 (a :+: b) where
  toExp2 = foldPlus toExp2 toExp2
instance (Generic dc,
          Just (N dc') ~ FindDCs (Tag dc) (DCs Exp2),
          ToExp2Rs (Rep dc),
          Rep dc' ~ RsToExp2 (Rep dc),
          DC dc', Range dc' ~ Exp2) =>
         ToExp2 (N dc) where
  toExp2 (N x) = rejoin ((obj . toExp2Rs . rep) x :: dc')
-- mapping into just the recursive fields
type family RsToExp2 prod
class ToExp2Rs prod where toExp2Rs :: prod -> RsToExp2 prod
type instance RsToExp2 (R a) = R Exp2
instance ToExp2 a => ToExp2Rs (R a) where
  toExp2Rs (R x) = R (toExp2 x)
type instance RsToExp2 (D a) = D a
instance ToExp2Rs (D a) where toExp2Rs = id
type instance RsToExp2 (a :*: b) =
  RsToExp2 a :*: RsToExp2 b
instance (ToExp2Rs a, ToExp2Rs b) =>
         ToExp2Rs (a :*: b) where
  toExp2Rs = mapTimes toExp2Rs toExp2Rs
\end{haskell}
\caption{The lambda lifting transformation.\label{fig:dc-conversion}}
\end{figure}
We have not declared instances for ^U^, ^Par1^, or ^Par2^ because they are not
required for this lambda lifting example. Since no constructor in the involved
types has zero fields, ^U^ never occurs. In the instance of ^ToExp2^, the
^Let^ constructor is not delegated to the generic definition, so its compound
field does not generate a ^ToExp2^ constraint involving ^Par1^.
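The essence of the ^RsToExp2^ family and the ^ToExp2Rs^ class, namely that only
^R^ fields change type while ^D^ fields pass through unchanged, can be seen in
isolation. The following sketch is hypothetical and deliberately non-generic:
^R^, ^D^, and ^:*:^ are redeclared locally, the small ^Exp^ and ^Exp2^ types are
illustrative rather than the types used in this section, and each constructor's
representation is written by hand where \yoko\ would apply ^rep^, ^obj^, and
^rejoin^.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
{-# LANGUAGE FlexibleInstances #-}
-- Hypothetical, hand-written miniature of the
-- recursive-field mapping; not yoko's definitions.

newtype R a = R a        -- recursive field
newtype D a = D a        -- non-recursive field
data a :*: b = a :*: b   -- product of fields

data Exp  = Lit Int | Add Exp Exp
data Exp2 = Lit2 Int | Add2 Exp2 Exp2
  deriving Show

-- Only R fields change type; D fields are kept.
type family RsToExp2 prod
type instance RsToExp2 (R Exp)   = R Exp2
type instance RsToExp2 (D a)     = D a
type instance RsToExp2 (a :*: b) =
  RsToExp2 a :*: RsToExp2 b

class ToExp2Rs prod where
  toExp2Rs :: prod -> RsToExp2 prod
instance ToExp2Rs (R Exp) where
  toExp2Rs (R e) = R (toExp2 e)
instance ToExp2Rs (D a) where
  toExp2Rs = id
instance (ToExp2Rs a, ToExp2Rs b) =>
         ToExp2Rs (a :*: b) where
  toExp2Rs (x :*: y) = toExp2Rs x :*: toExp2Rs y

-- Hand-written representations stand in for rep/obj.
toExp2 :: Exp -> Exp2
toExp2 (Lit i) = case toExp2Rs (D i) of
  D j -> Lit2 j
toExp2 (Add x y) = case toExp2Rs (R x :*: R y) of
  R x' :*: R y' -> Add2 x' y'
\end{haskell}
The ^Lit^ case shows a ^D^ field surviving the mapping with its type intact,
while the ^Add^ case converts both ^R^ fields recursively.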
\subsection{Polymorphic}
\label{sec:polymorphic-generic-conversion}
\begin{figure}[h]
\begin{haskell}
type family Cnv cnv a
class Convert cnv a where
  convert :: cnv -> a -> Cnv cnv a
type instance Cnv cnv (DCsOf t sum) = Cnv cnv t
instance ConvertTo cnv sum (Cnv cnv t) =>
         Convert cnv (DCsOf t sum) where
  convert cnv = convertTo cnv . unDCsOf
-- mapping over the sum
class ConvertTo cnv sum t where
  convertTo :: cnv -> sum -> t
instance (ConvertTo cnv a t, ConvertTo cnv b t) =>
         ConvertTo cnv (a :+: b) t where
  convertTo cnv =
    foldPlus (convertTo cnv) (convertTo cnv)
instance (Generic dc,
          Just (N dc') ~ FindDCs (Tag dc) (DCs t),
          ConvertRs cnv (Rep dc),
          Rep dc' ~ RsConvert cnv (Rep dc),
          DC dc', Range dc' ~ t, DT t) =>
         ConvertTo cnv (N dc) t where
  convertTo cnv (N x) =
    rejoin ((obj . convertRs cnv . rep) x :: dc')
-- mapping into just the recursive fields
type family RsConvert cnv prod
class ConvertRs cnv prod where
  convertRs :: cnv -> prod -> RsConvert cnv prod
type instance RsConvert cnv (R a) = R (Cnv cnv a)
instance Convert cnv a => ConvertRs cnv (R a) where
  convertRs cnv (R x) = R (convert cnv x)
type instance RsConvert cnv (D a) = D a
instance ConvertRs cnv (D a) where convertRs _ = id
type instance RsConvert cnv (a :*: b) =
  RsConvert cnv a :*: RsConvert cnv b
instance (ConvertRs cnv a, ConvertRs cnv b) =>
         ConvertRs cnv (a :*: b) where
  convertRs cnv =
    mapTimes (convertRs cnv) (convertRs cnv)
\end{haskell}
\caption{Generic constructor conversion.\label{fig:generic-dc-conversion}}
\end{figure}
\begin{haskell}
data ToExp2 = ToExp2
type instance Cnv ToExp2 Exp = Exp2
instance Convert ToExp2 Exp where
  convert cnv e = case goSolo (disband e) of
    Left (Const_ i) -> Const2 i Nothing
    Right x         -> convert cnv x
\end{haskell}
Type-level programming is a well-known and powerful capability of Haskell, but
the language unfortunately offers little dedicated support for it. In
particular, type-level strings and type-level equality are features planned
for the immediate future. For this presentation, we presume they exist; we
discuss our emulation of them in \tref{Appendix}{sec:scaffolding}.
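In current GHC releases, type-level string literals of kind ^Symbol^, closed
type families, and kind polymorphism supply much of what this section presumes.
As a hypothetical sketch, the tag comparison behind ^FindDCs^ might be written
with them as follows. Here ^N^, ^:+:^, ^Tag^, ^Var2_^, and ^Lam2_^ are local
stand-ins, ^Equal^ and ^If^ are declared locally, and an ^OrElse^ family, which
keeps only the first match, replaces ^DistMaybePlus^ for simplicity. The ^base^
library also exports a type family ^If^ from ^Data.Type.Bool^ and a type-level
equality ^==^ from ^Data.Type.Equality^.
\begin{haskell}
{-# LANGUAGE DataKinds, PolyKinds, TypeFamilies #-}
{-# LANGUAGE TypeOperators, UndecidableInstances #-}
-- Hypothetical sketch for current GHC releases;
-- N, :+:, Tag, Var2_ and Lam2_ are stand-ins.
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))
import GHC.TypeLits (Symbol)

-- Type-level equality via a closed type family.
type family Equal (a :: k) (b :: k) :: Bool where
  Equal a a = 'True
  Equal a b = 'False

type family If (c :: Bool) (t :: k) (e :: k) :: k where
  If 'True  t e = t
  If 'False t e = e

-- Keep the first successful match.
type family OrElse (x :: Maybe k) (y :: Maybe k)
    :: Maybe k where
  OrElse ('Just a) y = 'Just a
  OrElse 'Nothing  y = y

-- Stand-ins for the sum-of-constructors view.
data N dc
data a :+: b

-- Each fields type carries a Symbol tag.
type family Tag dc :: Symbol
data Var2_
data Lam2_
type instance Tag Var2_ = "Var"
type instance Tag Lam2_ = "Lam"

type family FindDC (s :: Symbol) sum :: Maybe Type where
  FindDC s (N dc) =
    If (Equal s (Tag dc)) ('Just dc) 'Nothing
  FindDC s (a :+: b) = OrElse (FindDC s a) (FindDC s b)

-- Both equalities are checked at compile time.
found :: FindDC "Lam" (N Var2_ :+: N Lam2_)
     :~: 'Just Lam2_
found = Refl

missing :: FindDC "App" (N Var2_ :+: N Lam2_)
       :~: 'Nothing
missing = Refl
\end{haskell}
Both ^Refl^ proofs typecheck only if the families reduce as described, so
compiling the module is itself the check.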
\section{Discussion}
\stub
\section{Related Work}
\stub
\section{Conclusion}
\stub
\subsection{Fields Types}
We developed the \yoko\ library by exploring the concepts underlying this
^Constants^ instance as the foundation of a generic programming library. In
particular, decoupling constructors from their data types and sibling
constructors enables many of the library's enhancements. This decoupling is
most explicit in the novel fields types, as demonstrated by the use of
^Const3_^. Each fields type is a simple nominal type with a single constructor
emulating the eponymous constructor of the original data type.
The purpose of fields types is to make the benefits of a genuine constructor,
including concise syntax, familiarity, and direct use as a pattern, more
widely available. Thus a fields type is a copy of a constructor that can be
used in more contexts. In particular, fields types make total those functions
that match just a single constructor; such functions are the most basic
building blocks for assembling ad-hoc behaviors for a subset of a data type's
constructors. The direct use of fields types in the \yoko\ generic view
prevents obfuscation and immodularity when values of representation types are
exposed to the user, as is natural for certain generic capabilities, especially
per-constructor overrides (as in ^Constants^), zippers, algebras, and other
type-indexed data types \citep{indexed-data}.
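A small hand-written example may make the totality argument concrete. In the
sketch below, ^Exp^, the fields type ^Const_^, and the helpers ^rejoinConst^ and
^soloConst^ are hypothetical; they are neither generated by nor taken from
\yoko, and ^soloConst^ only loosely resembles separating one constructor from
its siblings. Because ^Const_^ has exactly one constructor, ^constVal^ is total,
whereas a function on ^Exp^ matching only ^Const^ would fail on ^Var^ and ^Add^.
\begin{haskell}
-- Hypothetical example; not generated by yoko.
data Exp = Const Int | Var String | Add Exp Exp
  deriving (Eq, Show)

-- A fields type: a one-constructor copy of Const.
data Const_ = Const_ Int

-- Total: Const_ admits only the one constructor.
constVal :: Const_ -> Int
constVal (Const_ i) = i

-- Map the fields type back into Exp, like rejoin.
rejoinConst :: Const_ -> Exp
rejoinConst (Const_ i) = Const i

-- Separate Const from its sibling constructors.
soloConst :: Exp -> Either Const_ Exp
soloConst (Const i) = Left (Const_ i)
soloConst e         = Right e

-- Ad-hoc behavior for Const; Add recurses and the
-- remaining constructors are returned unchanged.
bumpConsts :: Exp -> Exp
bumpConsts e = case soloConst e of
  Left c          -> rejoinConst (Const_ (constVal c + 1))
  Right (Add a b) -> Add (bumpConsts a) (bumpConsts b)
  Right other     -> other
\end{haskell}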
The fields of constructors are ultimately represented by exactly the same
representation types as in \ig, via the ^reps^ function. However, delaying the
purely structural representation by relying on the fields types allows partial
exposure of the representation, an occasional necessity, without incurring
drastic detriments to code quality such as immodularity.
\appendix
\section{Scaffolding for Type-level Programming}
\label{sec:scaffolding}
\stub
The type-level programming involved in \yoko's most advanced capabilities
requires significant but quite fundamental scaffolding. This section declares
that scaffolding and other related mechanisms.
\subsection{Promotion and Proxies}
\stub
Explain the
\href{http://hackage.haskell.org/package/type-bool}{\inlineHaskell{type-bool}}
package.
Unfortunately, kind-indexed type families are not yet implemented. As a result,
using data kinds for some type families would be overly restrictive. For
example, the best currently expressible kind for ^IsJust^ is
^Maybe___*___->___Bool^. Because subkinding is not supported, ^IsJust^ cannot
be applied to types of kind ^Maybe Bool^ or ^Maybe (Maybe *)^. The ideal kind is
^forall___k.___Maybe___k___->___Bool^, but GHC currently has no syntax for
expressing that kind. Thus, we do not promote higher-kinded data to kinds.
Promoting data types of kind ^*^, such as ^Bool^, requires no kind
polymorphism, so we do promote ^Bool^.
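For comparison, current GHC releases do accept the ideal kind when the PolyKinds
and DataKinds extensions are enabled. The following hypothetical sketch declares
^IsJust^ with kind ^Maybe k -> Bool^ as a closed type family and applies it at
the element kinds ^Bool^ and ^Maybe Type^ (^Type^ being the modern name for
^*^); the ^KnownBool^ class is an illustrative helper that reflects the promoted
result back to a term-level ^Bool^.
\begin{haskell}
{-# LANGUAGE DataKinds, PolyKinds, TypeFamilies #-}
{-# LANGUAGE FlexibleInstances #-}
-- Hypothetical sketch for current GHC releases.
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- IsJust :: Maybe k -> Bool, for every kind k.
type family IsJust (m :: Maybe k) :: Bool where
  IsJust ('Just a) = 'True
  IsJust 'Nothing  = 'False

-- Reflect a promoted Bool to a term-level Bool.
class KnownBool (b :: Bool) where
  boolVal :: Proxy b -> Bool

instance KnownBool 'True  where boolVal _ = True
instance KnownBool 'False where boolVal _ = False

-- Element kind Bool.
atBool :: Bool
atBool = boolVal (Proxy :: Proxy (IsJust ('Just 'True)))

-- Element kind Maybe Type.
atMaybe :: Bool
atMaybe =
  boolVal (Proxy :: Proxy (IsJust ('Just ('Just Int))))

-- Element kind Type, applied to an empty Maybe.
atType :: Bool
atType =
  boolVal (Proxy :: Proxy (IsJust ('Nothing :: Maybe Type)))
\end{haskell}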
\begin{