/paper/yoko-old.tex
Possible License(s): BSD-3-Clause
- \documentclass{sigplanconf}
- \input{preamble}
- \begin{document}
- \maketitle
- \section{Introduction}
- \todo{page limit = 12}
- \todo{distinguish occurrences of ``\ig '' between the approach, the generic
- view, and the library}
- \todo{Cite \url{http://dreixel.net/research/pdf/fcadgp.pdf}}
- Generic programming libraries deliver, in the language, a degree of reuse that
- is conventionally attainable only by metalingual capabilities such as special
- support from compilers and generative programming. The \ig\ Haskell library,
- for example, generically defines a function testing for equality, a function
- for printing, and an ``empty'' value, all of which can be instantiated for most
- data types with minimal integration effort \citep{instant-generics}. Moreover,
- the library user can define their own extensible generic values. The common
- trait of these functions is that they can be defined in terms of the type
- structure of their domains and ranges.
- This paper presents the \yoko\ library, \todo{\ig\ is strictly defined in a way
- which forbids extension: it's an approach framework that we're working
- in. We're extending its demonstration on Hackage.} which extends the
- \ig\ approach both (1) to enable more exact types for generically defined
- values and also (2) to generically define functions that require more
- structural information than existing generic programming libraries provide.
- The foundation of \yoko\ is a generic view \citep{generic-view} called
- \emph{sum-of-constructors}, which enhances the common \emph{sum-of-products}
- view. While sophisticated use of sum-of-products can almost emulate
- sum-of-constructors, sum-of-products falls short in a way that violates
- encapsulation of the original data type's representation, thereby sacrificing
- both modularity and lucidity. Beyond these essential benefits to software
- quality, the enhanced view accommodates a more complete reflection of data type
- declarations. For example, \yoko\ thereby enables a generic treatment of
- polymorphic variants \citep{poly-variants} and transformation between similar
- data types.
- \begin{table}[h]
- \begin{tabular}{l|cc}
- \textbf{Syntactic Sort} & \textbf{\# of Types} & \textbf{\# of Constructors} \\
- \hline
- Type & 4 & 15\\ % (^Type^, ^TyVarBndr^, ^Kind^, ^Pred^), nominal: (^ForallT^, ^VarT^, ^ConT^, ^ClassP^)
- Declaration & 12 & 37\\ % (^Dec^, ^Con^, ^Strict^, ^Foreign, Callconv^, ^Safety^, ^Pragma^, ^InlineSpec^, ^FunDep^, ^FamFlavour^, ^Fixity, FixityDirection^)
- Expression & 8 & 41\\ % (^Exp^, ^Match^, ^Clause^, ^Body^, ^Guard^, ^Stmt^, ^Range^, ^Lit^)
- Pattern & 1 & 14\\ % ^Pat^
- \textbf{Total} & 25 & 107
- \end{tabular}
- \caption{The Template Haskell version 2.6 AST.\label{tab:th-ast}}
- \end{table}
- The statistics listed in \tref{Table}{tab:th-ast} for the Template Haskell
- version 2.6 AST characterize the burden of each new data type. Each data type
- in a Haskell compiler's pipeline would be approximately as large as the
- Template Haskell AST. \todo{How many such data types in GHC?} Specifying
- functions over such large ASTs is a daunting task that can be significantly
- eased by leveraging generic programming as much as possible. Such a pipeline
- imposes two burdens: defining the standard functions, such as equality and
- printing, for each of the many types, and defining the transformations between
- adjacent types.
- Existing generic programming libraries such as \ig\ successfully mitigate the
- first burden by nearly direct reuse of the generically defined functions for
- each of the many types. The second burden, however, has so far remained beyond
- the purview of generic programming. The tedious cases of transformations
- require constructor-agnostic bookkeeping, structural recursion, and a
- correspondence between the similar constructors of adjacent types in the
- pipeline. Existing techniques can muster the first two of these, but cannot
- automatically identify corresponding constructors except under prohibitively
- degenerate circumstances.
- By supporting generic transformations between similar types, \yoko\ reduces the
- cost of encoding invariants as data types. The key to its support is its
- ability to identify corresponding constructors in distinct data types, which is
- a reflective capability derived from the sum-of-constructors view. By reducing
- the cost of encoding invariants in the type system, the \yoko\ generic
- programming library enables higher assurance of compiler correctness.
- This paper conveys the following contributions.
- \begin{enumerate}
- \item Demonstrations of basic insufficiencies of the sum-of-products view as
- used in the \ig\ Haskell package.
- \item The incremental introduction of \yoko's enhancements redressing those
- insufficiencies.
- \item A definition of a realistic transformation using \yoko's unique
- capabilities. Lambda lifting transforms from an object language with
- anonymous functions to an object language without them. In our definition of
- this transformation, all syntactic constructs that do not directly involve
- binding are handled generically.
- \item The lambda lifting transformation's generic treatment of its non-nominal
- syntactic constructors can be factored out. The resulting transformation has
- a purely generic definition using only the \yoko\ API, and thus it can be
- directly reused in the definition of other transformations.
- \end{enumerate}
- \section{Objective}
- \label{sec:objective}
- Lambda lifting \citep{lambda-lifting} is a transformation in many compiler
- pipelines that maps from an object language with anonymous functions to an
- object language without them. The declarations listed in
- \tref{Figure}{fig:lambda-lift-sig} include the two data types modeling the
- terms of each object language as well as the ^lambdaLift^ function. This
- section explains the invariants encoded by the two data types and motivates the
- generic treatment of most constructors in the definition of lambda lifting.
- \begin{figure}[h]
- \begin{haskell}
- lambdaLift :: AnonTerm -> TopProg
- module Common where
- -- object language types, including at least arrows
- data Type = ...
- module AnonymousFunctions where
- data AnonTerm = Lam Type AnonTerm | Var Int
- | Let [Decl] AnonTerm
- | ... -- non-nominal constructors
- newtype Decl = Decl Type AnonTerm
- module TopLevelFunctions where
- data TopTerm = DVar Int | Var Int
- | ... -- same non-nominal constructors
- type FunDec = ([Type], Type, TopTerm)
- data TopProg = Prog [FunDec] TopTerm
- \end{haskell}
- \caption{The signature of lambda lifting.\label{fig:lambda-lift-sig}}
- \end{figure}
- \subsection{The Signature of Lambda Lifting}
- The ^AnonTerm^ type models the terms of a higher-order functional language
- with anonymous functions. It includes the ^Lam^ and ^Var^ constructors for de
- Bruijn-indexed lambdas and variable occurrences, as well as the ^Let^
- constructor and ^Decl^ type, which together model a non-recursive let form
- with multiple nested declarations. The ^Let^ constructor introduces mutual
- recursion with the ^Decl^ type. For our purposes, the type is also assumed to include any number
- of other constructors modeling \emph{non-nominal} syntactic constructs, those
- that do not involve binding or variable occurrences. Function application,
- tuples, lists, and literal values of base types, for example, are
- non-nominal. The ^TopProg^ type, on the other hand, models the terms of a
- higher-order functional program in which functions can only be named and
- globally declared. Such a program is modeled as a pairing of a
- topologically-ordered list of top-level function declarations and a body term;
- the declaration bodies and the main body are terms without anonymous functions
- as modeled by the ^TopTerm^ type. As the successor of ^AnonTerm^ in the
- pipeline of data types, ^TopTerm^ retains most of ^AnonTerm^'s constructors. In
- order to encode its invariant, however, it drops the ^Lam^ and ^Let^
- constructors. For simplicity, we assume the corresponding constructors of both
- ^AnonTerm^ and ^TopTerm^ have the same name (hence the separate modules) and
- the same fields after accounting for the substitution of ^TopTerm^ for
- ^AnonTerm^.
- The ^AnonTerm^ and ^TopProg^ types encode the invariants that the modeled terms
- adhere to the respective grammars with and without lambdas. For a Haskell AST,
- these are not particularly impressive invariants. They are, however, quite
- pragmatic---the presence or absence of anonymous functions is a significant
- property to statically guarantee---and they require a transformation whose
- many routine cases cannot be defined generically with existing techniques.
- Among common language features, only nominal ones are essential to lambda
- lifting. The nub of the ^lambdaLift^ function's semantics is accordingly
- present only in its cases for the ^Lam^, ^Var^, and ^Let^ constructors. As
- with the second burden described in the introduction for a pipeline with
- numerous data types, the lambda lifting of non-nominal constructors merely maintains some
- bookkeeping and recurs through subterms, mapping each ^AnonTerm^ constructor to
- the obvious counterpart in ^TopTerm^. In this case, the bookkeeping involves
- generating the list of top-level declarations corresponding to the lifted
- lambdas. This is naturally implemented in
- \tref{Section}{sec:lambda-lift-definition} with a writer monad. It is the
- automated identification of corresponding constructors which \yoko\ uniquely
- enables.
- \subsection{Discussion}
- The compound first field of ^Let^ and the corresponding mutual recursion make
- ^AnonTerm^ a realistic example.
- %% Modern generic programming approaches continue to depend on metalingual
- %% capabilities (\eg\ Template Haskell), but only for convenience: the libraries
- %% derive their genericity from declarations within the language. \todo{TODO} TODO
- %% what's the benefit of that? Better integration with other language features?
- %% Portability?
- \section{\ig\ Background}
- \todo{grammar for the representation types}
- The \ig\ approach \citep{instant-generics} underlies \yoko's generic
- programming capabilities. The approach derives its genericity from two major
- Haskell language features: type classes and type families
- \citep{type-families}. We demonstrate a slight simplification of the library
- with an example type and two generically defined functions in order to set the
- stage for \yoko's enhancements. We retrofit our own vocabulary to the concepts
- underlying the \ig\ Haskell declarations.
- \tref{Figure}{fig:instant-generics} lists the core \ig\ declarations. In this
- approach to generic programming, any value with a \emph{generic semantics} is
- defined as a method of a type class, called a \emph{generic class}. That
- method's generic semantics is declared as instances of the generic class for
- each of a small finite set of \emph{representation types}: ^Var^, ^Rec^,
- \etc. The ^Rep^ type family maps a data type to its sum-of-products structure
- \citep{sum-of-products} as encoded with those representation types. A
- corresponding instance of the ^Representable^ class converts between a type and
- its ^Rep^ structure. Via this conversion, an instance of a generic class for a
- data type can delegate to the generic definition by invoking the method on the
- type's structure. Instances of a generic class, however, are not required to
- rely on the generic semantics: they can use it partially or completely ignore
- it.
- \subsection{The Sum-of-Products Generic View}
- Each representation type models a particular structure in the declaration of a
- data type. The ^Rec^ and ^Var^ types represent occurrences of types in the same
- mutually recursive family as the represented type (roughly, its binding group)
- and any non-recursive occurrence of other types, respectively. Sums of
- constructors are represented by nestings of the higher-order type ^:+:^, and
- products of fields are represented similarly by ^:*:^. ^U^ serves as the empty
- product. Since an empty sum would represent a data type with no constructors,
- its usefulness is questionable. The representation of each constructor is
- annotated by means of ^C^ to carry a bit more reflective information in ^C^'s
- phantom type parameter. The ^:+:^, ^:*:^, and ^C^ types are all higher-order
- representations in that they expect representations as arguments. If Haskell
- supported subkinding \citep{promotion}, these parameters would be of a subkind
- of ^*^ specific to representation types. Since the parameters of ^Var^ and
- ^Rec^ are not supposed to be representation types, they would have the
- standard ^*^ kind.
- \begin{figure}[h]
- \begin{haskell}
- type family Rep a
- data Var a = Var a
- data Rec a = Rec a
- data U = U
- data a :*: b = a :*: b
- data C c a = C a
- data a :+: b = L a | R b
- class Representable a where
- to :: Rep a -> a
- from :: a -> Rep a
- class Constructor c where
- conName :: C c a -> String
- \end{haskell}
- \caption{Core declarations of the \ig\ library.}
- \label{fig:instant-generics}
- \end{figure}
- Consider a simple abstract syntax for expressions denoting arithmetic sums,
- declared as ^Exp^.
- \begin{haskell}
- data Exp = Const Int | Plus Exp Exp
- \end{haskell}
- \noindent An instance of the ^Rep^ type family maps ^Exp^ to its structure as
- encoded in terms of the representation types.
- \begin{haskell}
- type instance Rep Exp =
- C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)
- data Const; data Plus
- instance Constructor Const where conName _ = "Const"
- instance Constructor Plus where conName _ = "Plus"
- \end{haskell}
- The ^Const^ and ^Plus^ types are considered auxiliary by \ig, added as an
- afterthought to the sum-of-products view in order to define another class of
- values generically---^Show^ and ^Read^ in particular. Since \yoko\ uses them in
- the same reflective capacity, but to a much greater degree, we designate them
- \emph{constructor types}. Each constructor type corresponds directly to a
- constructor from the represented data type (much like promotion
- \citep{promotion}).
- The ^Const^ constructor's field is represented with ^Var^, since ^Int^ is not a
- recursive occurrence. The two ^Exp^ occurrences in ^Plus^ are recursive, and so
- are represented with ^Rec^. The entire ^Exp^ type is represented as the sum of
- its constructors' representations---the products of one and two fields,
- respectively---with some further reflective information provided by the ^C^
- annotation. The ^Representable^ instance for ^Exp^ follows directly from the
- involved signatures.
- \begin{haskell}
- instance Representable Exp where
- from (Const n) = L (C (Var n))
- from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
- to (L (C (Var n))) = Const n
- to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2
- \end{haskell}
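The declarations of \tref{Figure}{fig:instant-generics} and the ^Exp^ example can be collected into one self-contained module for experimentation. The following is a minimal sketch rather than the \ig\ package itself. It assumes GHC with the ^TypeFamilies^ and ^TypeOperators^ extensions. It derives ^Eq^ and ^Show^ for ^Exp^ only so that the round trip through ^from^ and ^to^ can be checked.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
-- Sketch (assumption): the simplified instant-generics core, specialised to Exp.

-- Representation types.
type family Rep a
data Var a = Var a         -- non-recursive field
data Rec a = Rec a         -- recursive occurrence
data U = U                 -- empty product
data a :*: b = a :*: b     -- product of fields
data C c a = C a           -- c is a phantom constructor type
data a :+: b = L a | R b   -- sum of constructors

-- Conversion between a type and its structure.
class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

class Constructor c where
  conName :: C c a -> String

-- The running example; Eq and Show are derived only for testing.
data Exp = Const Int | Plus Exp Exp deriving (Eq, Show)

-- Constructor types for Exp (type-level names, separate from the data constructors).
data Const
data Plus
instance Constructor Const where conName _ = "Const"
instance Constructor Plus  where conName _ = "Plus"

type instance Rep Exp =
  C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)

instance Representable Exp where
  from (Const n)    = L (C (Var n))
  from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n)))             = Const n
  to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2
\end{haskell}
By construction ^to^ and ^from^ are mutual inverses, so ^to___(from___e)^ equals ^e^ for every ^e___::___Exp^. The ^Constructor^ instances recover each constructor's name from the phantom parameter of ^C^ alone.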
- \subsection{Two Generic Definitions}
- We generically define an equality test and the generation of a minimal
- value. For equality, we reuse the ^Eq^ class as the generic class.
- \begin{haskell}
- instance Eq a => Eq (Var a) where
- Var x == Var y = x == y
- instance Eq a => Eq (Rec a) where
- Rec x == Rec y = x == y
- instance (Eq a, Eq b) => Eq (a :*: b) where
- x1 :*: x2 == y1 :*: y2 = x1 == y1 && x2 == y2
- instance Eq U where _ == _ = True
- instance (Eq a, Eq b) => Eq (a :+: b) where
- L x == L y = x == y
- R x == R y = x == y
- _ == _ = False
- instance Eq a => Eq (C c a) where
- C x == C y = x == y
- \end{haskell}
- \noindent With these instance declarations, ^Eq Exp^ is immediate. As
- \citeauthor{fast-and-easy} show, the GHC inliner can be compelled to optimize
- away much of the representational overhead.
- \begin{haskell}
- instance Eq Exp where x == y = from x == from y
- \end{haskell}
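The generic equality can be checked in a similar self-contained sketch, again assuming GHC's ^TypeFamilies^ and ^TypeOperators^ extensions. It repeats the simplified core declarations so that it compiles on its own. ^Exp^ deliberately does not derive ^Eq^, so every comparison goes through the representation.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
-- Sketch (assumption): simplified core plus the generic Eq instances.

type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b

class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

-- Generic semantics of (==): one instance per representation type.
instance Eq a => Eq (Var a) where Var x == Var y = x == y
instance Eq a => Eq (Rec a) where Rec x == Rec y = x == y
instance Eq U where _ == _ = True
instance (Eq a, Eq b) => Eq (a :*: b) where
  (x1 :*: x2) == (y1 :*: y2) = x1 == y1 && x2 == y2
instance (Eq a, Eq b) => Eq (a :+: b) where
  L x == L y = x == y
  R x == R y = x == y
  _   == _   = False        -- different constructors
instance Eq a => Eq (C c a) where C x == C y = x == y

data Exp = Const Int | Plus Exp Exp
data Const
data Plus
type instance Rep Exp = C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)

instance Representable Exp where
  from (Const n)    = L (C (Var n))
  from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n)))             = Const n
  to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2

-- Eq Exp delegates to the generic definition via the representation.
instance Eq Exp where x == y = from x == from y
\end{haskell}
Comparing ^Plus___(Const___1)___(Const___2)^ with ^Plus___(Const___2)___(Const___1)^ recurs through ^R^, ^C^, ^:*:^, and ^Rec^. The ^Var^ instance then compares the two ^Int^ fields and returns ^False^.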
- The method of the ^Empty^ generic class generates a minimal value\footnote{Note
- that \inlineHaskell{empty} is not a function. This is why we avoid the term
- ``generic function''.}.
- \begin{haskell}
- class Empty a where empty :: a
- instance Empty Int where empty = 0
- instance Empty Char where empty = '\NUL'
- ...
- \end{haskell}
- \noindent In the generic definition of ^empty^, it may seem odd to define an
- instance for ^Rec^, since recursion seems contrary to minimality. This instance
- is ultimately necessary for two reasons. First, in a mutually recursive family
- of data types, one sibling might not have any non-recursive fields; to generate
- its minimal value requires recursion to reach any sibling that has a
- constructor capable of non-recursion. Second, the ^Rec^ instance enables a
- reasonable use of ^Empty^ for coinductive data types, in which corecursion is
- inevitable.
- \begin{haskell}
- instance Empty a => Empty (Var a) where
- empty = Var empty
- instance Empty a => Empty (Rec a) where
- empty = Rec empty
- instance (Empty a, Empty b) => Empty (a :*: b) where
- empty = empty :*: empty
- instance Empty U where empty = U
- instance Empty a => Empty (C c a) where
- empty = C empty
- \end{haskell}
- The ^Empty^ instance for ^:+:^ must prefer a summand (\ie\ a constructor)
- capable of non-recursion. In order to do so, the auxiliary class ^HasRec^
- provides a means for checking if a representation value involves recursion. The
- instances for ^Var^ and ^Rec^ answer the predicate directly; the other
- representation types' instances structurally recur.
- \begin{haskell}
- class HasRec a where hasRec :: a -> Bool
- instance HasRec (Var a) where hasRec _ = False
- instance HasRec (Rec a) where hasRec _ = True
- instance (HasRec a, HasRec b) => HasRec (a :*: b) where
- hasRec (a :*: b) = hasRec a || hasRec b
- instance HasRec U where hasRec _ = False
- instance (HasRec a, HasRec b) => HasRec (a :+: b) where
- hasRec (L x) = hasRec x
- hasRec (R x) = hasRec x
- instance HasRec a => HasRec (C c a) where
- hasRec (C x) = hasRec x
- \end{haskell}
- \noindent The ^Empty^ instance for ^:+:^ uses ^hasRec^ to avoid recursion when
- possible. Laziness and the non-strictness of ^hasRec^ for ^Var^, ^Rec^, and ^U^
- prevent the use of ^empty^ in the condition from diverging. For coinductive
- data types, ^empty^ will instead generate an infinite nesting of the last
- constructor, or the cyclic analog for a mutually corecursive family.
- \begin{haskell}
- instance (HasRec a, Empty a, Empty b
- ) => Empty (a :+: b) where
- empty = if hasRec lempty then R empty else L lempty
- where lempty = empty :: a
- \end{haskell}
- The ^Empty^ instance for ^Exp^ is a straightforward delegation to the generic
- definition and always yields ^Const 0^.
- \begin{haskell}
- instance Empty Exp where empty = to empty
- \end{haskell}
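The pieces of ^Empty^ and ^HasRec^ above can likewise be assembled into a self-contained sketch. It assumes GHC's ^TypeFamilies^, ^TypeOperators^, and ^ScopedTypeVariables^ extensions. The last one lets the annotation ^empty___::___a^ in the ^:+:^ instance refer to the instance's own type variable.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators, ScopedTypeVariables #-}
-- Sketch (assumption): generic Empty guided by HasRec, instantiated for Exp.

type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b

class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

class Empty a where empty :: a
instance Empty Int where empty = 0
instance Empty a => Empty (Var a) where empty = Var empty
instance Empty a => Empty (Rec a) where empty = Rec empty
instance (Empty a, Empty b) => Empty (a :*: b) where empty = empty :*: empty
instance Empty U where empty = U
instance Empty a => Empty (C c a) where empty = C empty

-- The Var, Rec and U cases never inspect their argument.
class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True
instance HasRec U where hasRec _ = False
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (x :*: y) = hasRec x || hasRec y
instance (HasRec a, HasRec b) => HasRec (a :+: b) where
  hasRec (L x) = hasRec x
  hasRec (R x) = hasRec x
instance HasRec a => HasRec (C c a) where hasRec (C x) = hasRec x

-- Prefer the left summand unless its minimal value would recur.
instance (HasRec a, Empty a, Empty b) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a

data Exp = Const Int | Plus Exp Exp deriving Show
data Const
data Plus
type instance Rep Exp = C Const (Var Int) :+: C Plus (Rec Exp :*: Rec Exp)

instance Representable Exp where
  from (Const n)    = L (C (Var n))
  from (Plus e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n)))             = Const n
  to (R (C (Rec e1 :*: Rec e2))) = Plus e1 e2

instance Empty Exp where empty = to empty
\end{haskell}
Evaluating ^empty___::___Exp^ inspects only the left summand. Because ^hasRec^ finds just a ^Var^ there, the result is ^Const___0^ and the ^Plus^ summand is never constructed.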
- \begin{note}{Nick}{Make \inlineHaskell{HasRec} static}
- Let's not stoop to a dynamic predicate when we don't need to.
- \begin{enumerate}
- \item Search instead for the first constructor with no ^R^s in its structure.
- \item Beware mutual recursion: some types need to traverse a finite number of
- ^R^s in order to reach a sibling type that has a constructor with no
- recursive fields.
- \item There are a couple of ways to avoid cycles in that search: keep a list of
- visited types (or better yet, typenames, irrespective of type parameters), or
- just limit the depth of the search to the size of the sibling set (assuming
- it's not infinite, as for nested types)
- \item Flourish: choose the path to a non-recursive constructor with the
- smallest number of total fields?
- \item Lastly, fall back gracefully for necessarily infinite types.
- \end{enumerate}
- \end{note}
- As demonstrated with ^==^ and ^empty^, generic definitions---\ie\ the
- corresponding instances for the representation types---provide an easily
- invocable default behavior. If that behavior suffices for a representable type,
- then instantiation of the method on the type's representation provides a simple
- way to define the methods in an instance of a generic class for the
- representable type. If a particular type needs a distinct ad-hoc definition of
- the method, then that type's instance can use its own specific method
- definitions, relying on the default generic definition to a lesser degree or
- even not at all.
- \section{\yoko's Enhancements}
- \label{sec:enhancements}
- The \yoko\ library extends \ig\ with three principal enhancements.
- \subsection{Representation of Compound Fields}
- \label{sec:compound-fields}
- For the representation of data types with compound fields (\eg\ containing
- lists and tuples), the paucity of the \ig\ representation types precludes
- precise use of the ^Rec^ representation type. \todo{\ig\ isn't a view, sum-of-products is.}
- Though our semantics for ^Rec^ is determined by recursive occurrences of types,
- its semantics in \ig\ allows it to represent any type that contains recursive
- occurrences. Otherwise, it could not represent types with compound
- fields. Consider representing a type such as ^Exp2^, where ^Const2^ optionally
- takes another expression as a second argument.
- \todo{Can we demonstrate mutual recursion? Perhaps Odd/Even lists?}
- \begin{haskell}
- data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2
- data Const2 ; data Plus2
- type instance Rep Exp2 =
- C Const2 (Var Int :*: Rec (Maybe Exp2)) :+:
- C Plus2 (Rec Exp2 :*: Rec Exp2)
- \end{haskell}
- \noindent Note that the argument to ^Rec^ in the representation of the ^Const2^
- constructor must be the entire field, since the representation types include no
- means of navigating the components of the field in order to apply ^Rec^ only to
- the occurrence of ^Exp2^. The danger of simply enlisting ^Maybe^ as an ad-hoc
- representation type to represent the field directly as ^Maybe___(Rec___Exp2)^
- is discussed at the end of this subsection.
- This imprecise use of ^Rec^ usually remains effective because instance
- declarations do provide a means of navigating the composition of the
- field. For example, the generic definition of ^Eq^ suffices for ^Exp2^,
- since ^Maybe^ has a standard ^Eq^ instance defining equality in terms of its
- type parameter (^Eq a^). This would not work if the ^Eq^ instance for ^Rec^
- actually depended on the particular semantics of ^Rec^ as representing a
- recursive occurrence, as is the case with the ^HasRec^ generic class.
- For both ^==^ and ^empty^, the ^Rec^ and ^Var^ instances are exactly the
- same. For ^hasRec^, however, the two instances differ, since distinguishing
- recursive from non-recursive occurrences is the very purpose of ^hasRec^. It
- is therefore sensitive to imprecise use of ^Rec^; in particular, the call to
- ^hasRec^ in the ^empty^ method for the ^:+:^ type wrongly reports that the
- second field of ^Const2^ recurs, regardless of that field's value.
- Unfortunately, this results in ^empty___::___Exp2^ generating an infinite
- nesting of ^Plus2^ instead of the obvious
- ^(Const2___0___Nothing)^. Because the representation of that field uses ^Rec^,
- the corresponding ^hasRec^ method returns ^True^ without regard for the ^Maybe^
- type containing the actual recursive occurrence in that field. In this way, the
- imprecise use of the ^Rec^ representation type has compromised the definition
- of ^empty^ for ^Exp2^.
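The failure can be reproduced in a self-contained sketch, assuming GHC's ^TypeFamilies^, ^TypeOperators^, and ^ScopedTypeVariables^ extensions. It represents ^Exp2^ imprecisely with ^Rec___(Maybe___Exp2)^. It also assumes the instance ^Empty___(Maybe___a)^ yielding ^Nothing^, the same instance declared later in this subsection.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators, ScopedTypeVariables #-}
-- Sketch (assumption): Exp2 with Rec wrapped around its whole compound field.

type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b

class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

class Empty a where empty :: a
instance Empty Int where empty = 0
instance Empty (Maybe a) where empty = Nothing   -- assumed instance
instance Empty a => Empty (Var a) where empty = Var empty
instance Empty a => Empty (Rec a) where empty = Rec empty
instance (Empty a, Empty b) => Empty (a :*: b) where empty = empty :*: empty
instance Empty U where empty = U
instance Empty a => Empty (C c a) where empty = C empty

class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True    -- ignores what Rec wraps
instance HasRec U where hasRec _ = False
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (x :*: y) = hasRec x || hasRec y
instance (HasRec a, HasRec b) => HasRec (a :+: b) where
  hasRec (L x) = hasRec x
  hasRec (R x) = hasRec x
instance HasRec a => HasRec (C c a) where hasRec (C x) = hasRec x

instance (HasRec a, Empty a, Empty b) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a

data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2
data Const2
data Plus2

-- Imprecise: Rec marks the whole Maybe field, not just the Exp2 inside it.
type instance Rep Exp2 =
  C Const2 (Var Int :*: Rec (Maybe Exp2)) :+: C Plus2 (Rec Exp2 :*: Rec Exp2)

instance Representable Exp2 where
  from (Const2 n m)  = L (C (Var n :*: Rec m))
  from (Plus2 e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n :*: Rec m)))   = Const2 n m
  to (R (C (Rec e1 :*: Rec e2))) = Plus2 e1 e2

instance Empty Exp2 where empty = to empty
\end{haskell}
Only the outermost constructor of ^empty___::___Exp2^ should be inspected. It is ^Plus2^, and fully printing the value would not terminate.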
- A more precise alternative reserves the use of ^Rec^ for the actual recursive
- occurrences themselves. In order to do so, \yoko\ enlarges the universe of
- types that are representable with precise use of ^Rec^ by extending the
- representation types with two types for representing applications of
- ^*___->___*^ and ^*___->___*___->___*^ types (\eg\ lists and tuples).
- \begin{haskell}
- newtype Par1 f c = Par1 (f c)
- newtype Par2 ff c d = Par2 (ff c d)
- \end{haskell}
- \noindent The resulting universe of representable types is still incomplete,
- but we suppose it includes significantly more of the data types commonly
- declared in Haskell programs, including ^Exp2^. With the new representation
- type, the second field of the ^Const2^ constructor now uses the ^Rec^ type only
- for the recursive occurrence: ^Par1___(Maybe___(Rec___Exp2))^.
- Adding two representation types is a simple enhancement; its only cost is the
- corresponding additional instances for each generic class. Indeed, parametric
- types such as lists and tuples could themselves be considered representation
- types on an ad-hoc basis, but, as discussed at the end of this subsection, doing so is dangerous. The
- \ig\ library does not explicitly preclude such ad-hoc representation types, but
- its omission of instances of ^HasRec^ for lists and ^Maybe^ seemingly endorses
- the use of ^Rec^ to represent compound fields. Precise use of the ^Rec^
- representation type enables accordingly more precise use of generic definitions
- in \yoko's more advanced type programming.
- General ^Par1^ instances tend to rely on a corresponding class parameterized
- over ^*___->___*^ types, such as the ^Functor^ class. Indeed, the ^Par1^
- instance for ^HasRec^ will use the ^Foldable^ class. General instances for the
- ^Par2^ type similarly rely on a corresponding class parameterized over
- ^*___->___*___->___*^ types. However, in many cases, including ^Eq^ and
- ^Empty^, the generic class itself, parameterized over ^*^ types, can be reused
- directly. Otherwise, if no general higher-kinded class exists and the generic
- class itself cannot be reused, an ad-hoc higher-kinded class must be declared.
- \begin{haskell}
- instance Eq (f a) => Eq (Par1 f a) where
- Par1 x == Par1 y = x == y
- instance Empty (f a) => Empty (Par1 f a) where
- empty = Par1 empty
- instance (Foldable f, HasRec a) =>
- HasRec (Par1 f a) where
- hasRec (Par1 x) = Data.Foldable.any hasRec x
- instance Empty (Maybe a) where empty = Nothing
- \end{haskell}
- \noindent Now constraints over the representation of ^Exp2^ ultimately incur
- corresponding constraints over ^Maybe^. As such, guided by the more precise
- ^hasRec^, ^empty___::___Exp2^ yields ^(Const2___0___Nothing)^, as expected.
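The precise representation can be checked the same way. The following sketch assumes GHC's ^TypeFamilies^, ^TypeOperators^, ^ScopedTypeVariables^, and ^FlexibleContexts^ extensions. The uses of ^fmap___Rec^ and the helper ^unRec^ in the ^Representable^ instance are our own illustration of moving between ^Maybe___Exp2^ and ^Maybe___(Rec___Exp2)^, not \yoko's code.
\begin{haskell}
{-# LANGUAGE TypeFamilies, TypeOperators, ScopedTypeVariables, FlexibleContexts #-}
-- Sketch (assumption): Exp2 represented precisely with Par1.
import qualified Data.Foldable

type family Rep a
data Var a = Var a
data Rec a = Rec a
data U = U
data a :*: b = a :*: b
data C c a = C a
data a :+: b = L a | R b
newtype Par1 f c = Par1 (f c)   -- application of a * -> * type

class Representable a where
  to   :: Rep a -> a
  from :: a -> Rep a

class Empty a where empty :: a
instance Empty Int where empty = 0
instance Empty (Maybe a) where empty = Nothing
instance Empty a => Empty (Var a) where empty = Var empty
instance Empty a => Empty (Rec a) where empty = Rec empty
instance (Empty a, Empty b) => Empty (a :*: b) where empty = empty :*: empty
instance Empty U where empty = U
instance Empty a => Empty (C c a) where empty = C empty
instance Empty (f a) => Empty (Par1 f a) where empty = Par1 empty

class HasRec a where hasRec :: a -> Bool
instance HasRec (Var a) where hasRec _ = False
instance HasRec (Rec a) where hasRec _ = True
instance HasRec U where hasRec _ = False
instance (HasRec a, HasRec b) => HasRec (a :*: b) where
  hasRec (x :*: y) = hasRec x || hasRec y
instance (HasRec a, HasRec b) => HasRec (a :+: b) where
  hasRec (L x) = hasRec x
  hasRec (R x) = hasRec x
instance HasRec a => HasRec (C c a) where hasRec (C x) = hasRec x
-- At the Par1 boundary, HasRec hands over to Foldable.
instance (Foldable f, HasRec a) => HasRec (Par1 f a) where
  hasRec (Par1 x) = Data.Foldable.any hasRec x

instance (HasRec a, Empty a, Empty b) => Empty (a :+: b) where
  empty = if hasRec lempty then R empty else L lempty
    where lempty = empty :: a

data Exp2 = Const2 Int (Maybe Exp2) | Plus2 Exp2 Exp2
data Const2
data Plus2

-- Precise: Rec marks only the recursive occurrence inside the Maybe.
type instance Rep Exp2 =
  C Const2 (Var Int :*: Par1 Maybe (Rec Exp2)) :+: C Plus2 (Rec Exp2 :*: Rec Exp2)

unRec :: Rec a -> a               -- hypothetical helper
unRec (Rec x) = x

instance Representable Exp2 where
  from (Const2 n m)  = L (C (Var n :*: Par1 (fmap Rec m)))
  from (Plus2 e1 e2) = R (C (Rec e1 :*: Rec e2))
  to (L (C (Var n :*: Par1 m)))  = Const2 n (fmap unRec m)
  to (R (C (Rec e1 :*: Rec e2))) = Plus2 e1 e2

instance Empty Exp2 where empty = to empty
\end{haskell}
Here ^hasRec^ reaches the ^Maybe^ field through ^Foldable^. It therefore reports no recursion for ^Nothing^ and recursion for a ^Just^, and ^empty___::___Exp2^ evaluates to ^Const2___0___Nothing^.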
- While it may seem attractive to reuse the ^HasRec^ class for the ^Par1^
- instance as with ^Eq^ and ^Empty^, such a ^HasRec___(Par1___f___a)^ instance
- with the context ^HasRec___(f___a)^ would be unsound. In a vacuum, it is
- obvious that the correct definition of ^hasRec^ for ^Maybe^ is ^const___False^;
- the ^Maybe^ type is not recursive. This, in turn, spoils the hypothetical
- ^Par1^ instance. For example,
- ^hasRec___(Const2___5___(Just___(Const2___...)))^ would incorrectly reduce to
- ^False^.
- The key insight is that the semantics of ^hasRec^ is entirely dependent on its
- type parameter. The unsound ^HasRec^ instance for ^Par1^ changes the head of
- the type parameter to ^f^. If ^f^ were a representation type, this would be
- fine, because representation types serve as a proxy for the represented nominal
- type. However, according to the semantics of ^Par1^, ^f^ is not a
- representation type: it is an unrestrained ^*___->___*^ type. Without using
- ^Par1^ to delimit the representation, there would be no opportunity to enable
- the transition from ^HasRec^ to ^Foldable^. This is why adopting ^Maybe^ as an
- ad-hoc representation type is dangerous.
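The unsoundness can be observed directly in a sketch that assumes GHC's ^FlexibleContexts^ extension. In it, the hypothetical class ^HasRecU^ models reusing ^HasRec^ at ^f___a^, together with the isolated ^const___False^ instance for ^Maybe^.
\begin{haskell}
{-# LANGUAGE FlexibleContexts #-}
-- Sketch (assumption): the sound Foldable-based instance versus a reused class.
import qualified Data.Foldable

data Rec a = Rec a
newtype Par1 f c = Par1 (f c)

-- Sound: at Par1, switch from HasRec to Foldable.
class HasRec a where hasRec :: a -> Bool
instance HasRec (Rec a) where hasRec _ = True
instance (Foldable f, HasRec a) => HasRec (Par1 f a) where
  hasRec (Par1 x) = Data.Foldable.any hasRec x

-- Hypothetical HasRecU: models reusing the generic class at the type f a.
class HasRecU a where hasRecU :: a -> Bool
instance HasRecU (Rec a) where hasRecU _ = True
-- In isolation Maybe is not recursive, so const False is its correct answer...
instance HasRecU (Maybe a) where hasRecU _ = False
-- ...but delegating to it makes Par1 ignore the Rec inside the Maybe.
instance HasRecU (f a) => HasRecU (Par1 f a) where
  hasRecU (Par1 x) = hasRecU x
\end{haskell}
For ^Par1___(Just___(Rec___()))^, the ^Foldable^-based ^hasRec^ returns ^True^. ^hasRecU^ returns ^False^, because it stops at the ^Maybe^ and never sees the ^Rec^.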
- \subsection{Delayed Representation of Constructors}
- \label{sec:granularity}
- The generic view of data types in \yoko\ delays the representation of
- constructors as products of their fields. We motivate this as an enhancement by
- exploring a hypothetical application of generic programming to mitigate a large
- number of constructors. Generic programming is most obviously useful when
- dealing with large data types, rapid prototyping of data types, or both. With
- large data types, it is also likely that a minority of a type's constructors
- deserve interesting deviation from the generic definition. Because \ig\ only
- represents the entire data type, it struggles to delegate to the generic
- definition on a per-constructor basis.
- Consider enhancing the ^Exp^ type's ^Const^ constructor to support various
- encodings of numerical constants. The essence of this example is that the
- semantics of that constructor no longer corresponds so directly to its
- structure.
- \begin{haskell}
- data Encoding = Base Int | Roman | ...
- decode :: Encoding -> String -> Int
- data Exp3 = Const3 Encoding String | Plus3 Exp3 Exp3
- \end{haskell}
- We declare the ^Constants^ generic class with the ^constants^ method for
- collecting the values of all numerical constants occurring in a value of a data
- type. Again, this use of generic programming is hardly justified for ^Exp3^
- itself, but imagine it is a larger data type with numerous
- constructors---perhaps one for each arithmetic operation: subtraction,
- multiplication, \etc. The generic definition of ^Constants^ merely recurs,
- collecting constants from the fields.
- \begin{haskell}
- class Constants a where constants :: a -> [Int]
- instance Constants a => Constants (Var a) where
- constants (Var x) = constants x
- instance Constants a => Constants (Rec a) where
- constants (Rec x) = constants x
- instance (Constants a, Constants b) =>
- Constants (a :+: b) where
- constants (L x) = constants x
- constants (R x) = constants x
- instance (Constants a, Constants b) =>
- Constants (a :*: b) where
- constants (x :*: y) = constants x ++ constants y
- instance Constants U where constants _ = []
- instance Constants a => Constants (C c a) where
- constants (C a) = constants a
- \end{haskell}
- The semantics of the ^Const3^ constructor and the semantics of the ^Constants^
- class overlap in such a way that the generic definition is incorrect for that
- constructor: the intended result for a ^Const3^ value is its decoded numeral,
- not whatever constants the generic definition might collect from its
- ^Encoding^ and ^String^ fields. Therefore, the ^Constants^ instance for
- ^Exp3^ must treat ^Const3^ with ad-hoc behavior and only delegate to the
- generic definition for the other constructors.
- \begin{haskell}
- -- NB speculative: ill-typed
- instance Constants Exp3 where
- constants (Const3 enc s) = [decode enc s]
- constants e = constants (from e)
- \end{haskell}
- The ill-typedness of this obvious instance is due both to the coarseness of the
- sum-of-products interpretation and a limitation of the Haskell type system. The
- resulting type error protests that there is no instance of ^Constants^ for
- ^Encoding^ or ^String^. These instances are required because ^constants^ is
- applied to the entire structure of ^Exp3^, including the representation of
- ^Const3^. Unfortunately, the Haskell type system cannot express that ^e^ will
- never be constructed with ^Const3^ and that the offending instances would never
- actually be invoked at run-time.
- Indeed, the standard ^Constants^ semantics for ^Encoding^ and ^String^,
- independent of their role in ^Exp3^, would be ^constants _ = []^ since they
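- contain no expression constants. Declaring such instances would make the
- first ^Constants___Exp3^ instance well-typed, but they would never be invoked
- by it, and were the generic definition ever applied to ^Const3^, they would
- silently yield the incorrect ^[]^. Given that this function is being defined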
- generically only to cope with the hypothetically numerous constructors for
- arithmetic operators, for all of which the generic definition of ^constants^
- suffices, it is dubious and perhaps even misleading to instantiate ^Constants^
- at these two types at all. Moreover, if such instances had been declared and a
- new constructor were added to ^Exp3^ that contained a ^String^ or an
- ^Encoding^, that constructor would silently adopt the generic definition of
- ^Constants^. Therefore, declaring instances of generic classes that are known
- to be ultimately unnecessary is considered harmful. \todo{Similar to these
- required-but-extraneous instances?
- \url{http://hackage.haskell.org/trac/ghc/ticket/5499}}
- An alternative instance, still expressible within \ig, avoids generating the
- offending constraints on ^Encoding^ and ^String^ by manipulating the
- sum-of-products representation directly.
- \begin{haskell}
- -- NB speculative: valid, but obfuscated & immodular
- instance Constants Exp3 where
- constants e = case from e of
- L (C (Var enc :*: Var s)) -> [decode enc s]
- R x -> constants x
- \end{haskell}
\noindent Though this instance is well-typed and semantically correct, it is
also severely obfuscated and immodular, because it exposes the encoding of
^Exp3^'s structure. In particular, it assumes that ^Const3^ is the left summand
of ^Exp3^'s representation. Matters only worsen if the particular structural
encoding happens to bury the constructor(s) of interest deeper in the nestings
of ^:+:^. This definition is especially fragile with respect to changes to the
declaration of ^Exp3^, because the sum is often expressed as a balanced nesting
of ^:+:^s (for compile-time efficiency); adding a new constructor, no matter its
position in the list of constructors, is likely to displace ^Const3^ from the
left summand. The definition of ^constants^ is ultimately so obfuscated because
the anonymous product representing ^Const3^ is used directly, giving no
indication that ^Const3^ is the constructor of interest. We consider these costs
to software quality unacceptable.
Using a slight simplification of the \yoko\ API, the preferred instance is
declared as follows. Since the representation of the type's structure is not
exposed, this instance is precisely as modular as the ill-typed but ideal first
attempt. Furthermore, if the ^disband^, ^project^, and ^reps^ functions and the
^*_^ naming convention are recognized as pieces of a familiar library API, then
this instance is also nearly as lucid as the first: the special treatment of
^Const3^ is obvious.
\begin{haskell}
-- derived from Exp3
data Const3_ = Const3_ Encoding String
-- a simplification of the #\yoko# API
project :: Project a sum => sum -> Either a (sum :-: a)
instance Constants Exp3 where
  constants e = case project (disband e) of
    Left (Const3_ enc s) -> [decode enc s]
    Right x -> constants (reps x)
\end{haskell}
\noindent The ^Const3_^ constructor is the sole constructor of a type derived
by \yoko\ from the declaration of the ^Const3^ constructor. We designate such
types \emph{fields types}. The ^disband^ function converts a data type to a
nested ^:+:^ sum of its constructors, each represented as a fields type; hence
the name of the \emph{sum-of-constructors} generic view.
The ^project^ function, ^Project^ type class, and ^:-:^ type family are
operations on nested ^:+:^ sums. The ^project^ function uses ^:-:^ to remove a
type from a sum and to determine whether a value in the sum was a value of the
removed type or part of the remaining sum, as computed by ^:-:^. The ^reps^
function completes the sum-of-products structural representation by converting
a sum of constructors to a sum of the corresponding products.
The ^Const3_^ type and the ^:-:^ family are the essential components that avoid
the generation of the superfluous ^Constants^ constraints on the ^Encoding^ and
^String^ types. In particular, the type of ^x^ in the ^Right^ branch is
^Rep___Exp3___:-:___Const3_^, which expresses to the typechecker the informal
notion of an ^Exp3^ value not constructed by ^Const3^. The remaining
constructors of ^Exp3^ do not incur the offending constraints when delegated to
the generic definition of ^constants^. This instance declaration thereby uses
^project^ as a form of pattern matching with a more sophisticated type rule
than that of the Haskell ^case^ expression. The type of ^project^ locally
refines types in the branches in much the same manner as does supercompilation
\citep{supercompilation}. By delaying the representation of each constructor as
an anonymous product, the sum-of-constructors view grants this style of pattern
matching both robustness against ambiguities and the lucidity of genuine
constructor syntax.
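The typing argument can be seen in miniature by specializing the pipeline by hand to a toy ^Exp3^ with one additional constructor. The following Haskell sketch is not the \yoko\ implementation: ^Encoding^, ^decode^, ^Plus3^, ^disbandExp3^, and ^projectConst3^ are hypothetical stand-ins, and the ^N^ and ^:+:^ types anticipate the sets-as-sums encoding. Because the ^Right^ branch has type ^N Plus3_^, which no longer mentions ^Const3_^, no ^Constants^ instance for ^Encoding^ or ^String^ is ever demanded.

```haskell
{-# LANGUAGE TypeOperators #-}
import Numeric (readHex)

-- Hypothetical stand-ins for Encoding and decode.
data Encoding = Decimal | Hex

decode :: Encoding -> String -> Int
decode Decimal s = read s
decode Hex s = case readHex s of
  [(n, "")] -> n
  _         -> error ("decode: bad hex literal " ++ s)

-- A toy Exp3: one interesting constructor, one "boring" one.
data Exp3 = Const3 Encoding String | Plus3 Exp3 Exp3

-- Fields types, one per constructor of Exp3.
data Const3_ = Const3_ Encoding String
data Plus3_  = Plus3_ Exp3 Exp3

-- Sums of fields types, as in the sets-as-sums encoding.
newtype N a = N a
data a :+: b = L a | R b

-- Monomorphic stand-in for disband.
disbandExp3 :: Exp3 -> N Const3_ :+: N Plus3_
disbandExp3 (Const3 enc s) = L (N (Const3_ enc s))
disbandExp3 (Plus3 a b)    = R (N (Plus3_ a b))

-- Monomorphic stand-in for project: the Right type excludes Const3_.
projectConst3 :: N Const3_ :+: N Plus3_ -> Either Const3_ (N Plus3_)
projectConst3 (L (N c)) = Left c
projectConst3 (R rest)  = Right rest

class Constants a where
  constants :: a -> [Int]

-- Plays the role of the generic definition for the remaining
-- constructors: it only recurs into Exp3 fields, so neither
-- Constants Encoding nor Constants String is ever required.
instance Constants Plus3_ where
  constants (Plus3_ a b) = constants a ++ constants b

instance Constants Exp3 where
  constants e = case projectConst3 (disbandExp3 e) of
    Left (Const3_ enc s) -> [decode enc s]
    Right (N x)          -> constants x
```

In \yoko\ proper, the overloaded ^disband^ and ^project^ replace the two monomorphic helpers, and the generic definition replaces the hand-written ^Plus3_^ instance.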
% Metaprogramming with Template Haskell provides an alternative means to
% overcome the immodularity, by transforming a slight obfuscation of the ideal
% instance into the immodular one \citep{hackage-th-gd}.
\todo{Make sure to explain that the Par1 and Par2 types exist because they allow
Rec to be used exactly.}
The capability of refining a set of types by projecting out certain
constructors is crucial to our goal of automating transformation functions. The
interesting constructors, whose semantics bear on the semantics of the
transformation, cannot be mapped automatically. Thus, without a way to remove
them from the generic representation of a data type, no transformation can be
automated. The ^project^ and ^partition^ functions therefore underlie automated
transformation by enabling the user to handle the constructors that
\yoko\ cannot transform automatically.
The simplicity of the strict sum-of-products generic view, \ie\ excluding the
^C^ type, precludes any reflection capabilities requiring non-structural
information. The \ig\ library extends sums and products with the ^C^ type in
order to enable generic definitions of classes like ^Show^ and ^Read^, which
necessarily involve the names of constructors. We consider \yoko's fields types
to be an evolution of the ^C^ type. In fact, the ^C^ constructor and its
associated constructor types could underlie functionality similar to the
^project^ function from the \yoko\ instance for ^Constants Exp3^. Thus, it is
possible to use the ^C^ type instead of fields types to preserve the modularity
of that instance declaration. However, without relying on Template Haskell
metaprogramming, the obfuscation of such a modular, ^C^-based \ig\ instance
would actually be more severe than that of the immodular \ig\ instance above:
some sort of type ascription is necessary in order to tag the structural
pattern with the desired constructor name. Fields types are a natural solution
that embraces the delayed representation hinted at by the ^C^ representation
type's constructor types, without involving heavyweight metaprogramming.
\subsection{Reflection of Constructor Names}
\label{sec:reflecting-names}
\todo{We introduced fields types first because it is convenient to index the
\inlineHaskell{Tag} instances with fields types. However, \inlineHaskell{Tag}
instances can also be indexed by the first parameter of the \inlineHaskell{C}
representation type from the \ig\ generic view: \inlineHaskell{Tag} is
orthogonal to fields types.}
For the constructors that can be automatically transformed, the
sum-of-constructors view still does not reflect enough information. Beyond the
finer-grained invocation of the generic definition on a per-constructor basis
demonstrated in the previous section, an improved reflection of constructors in
\yoko\ is also key to its support for generic definition of transformations
between types. The strict sum-of-products view distinguishes such constructors
only by their position in the sum. Reflection of non-structural information is
required in order to individually distinguish multiple constructors within a
type that share the same product structure. For example, the
^Plus___Exp___Exp^ and ^Mult___Exp___Exp^ constructors both have the same
structure, ^Rec Exp :*: Rec Exp^, and hence require more information to be
distinguished outside of a fixed sum. \yoko\ instead infers the correspondence
of constructors the same way users do: by the constructor name.
The library API includes a type family ^Tag^ that maps a fields type to a
reflection of its constructor name as a type-level string. In the context of a
compiler pipeline, we suppose it is safe to rely on constructor names being
similar in adjacent data types; it is highly likely that, in all of the types in
the pipeline, the constructors corresponding to, say, application will be
similarly named. They might even have the same name if the types are declared
in separate modules. If not, they likely share the same root name, say ^App^,
with some additional suffix.
\todo{For this presentation, write a \inlineHaskell{Letter} data type with a
constructor per character that is allowed to occur in a Haskell
identifier. Define the quadratic instances for the
\inlineHaskell{EqualLetter} type family. Have \inlineHaskell{Tag} yield a
promoted \inlineHaskell{[Letter]}.}
\begin{haskell}
-- #\yoko# API
type family Tag dc :: [Letter]
type instance Tag Const_ = [LC, Lo, Ln, Ls, Lt]
type instance Tag Plus_ = [LP, Ll, Lu, Ls]
type instance Tag Const2_ = [LC, Lo, Ln, Ls, Lt, L2]
type instance Tag Plus2_ = [LP, Ll, Lu, Ls, L2]
type instance Tag Const3_ = [LC, Lo, Ln, Ls, Lt, L3]
type instance Tag Plus3_ = [LP, Ll, Lu, Ls, L3]
\end{haskell}
The generic transformation we define in
\tref{Section}{sec:polymorphic-generic-conversion} converts a sum of
constructors to a type under the assumption that each constructor in the sum
occurs in the type with the same name and the same fields (modulo the
corresponding substitution of recursive occurrences). This transformation will
therefore be able to identify the corresponding constructors when converting
from the ^AnonTerm^ type to the ^TopTerm^ type (from
\tref{Section}{sec:objective}).
A generic transformation of constructors from the ^Exp^ type to the ^Exp2^ or
^Exp3^ type would require a more sophisticated type-level predicate on
strings, for example one that relates ^Const^ to ^Const2^ by ignoring a
suffix. Since we are already emulating type-level strings, we handle only the
simpler case of corresponding constructors with identical names.
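Assuming a GHC recent enough to provide native type-level strings (the ^Symbol^ kind of ^GHC.TypeLits^), rather than the emulation \yoko\ uses, a ^Tag^ family and the name comparison that the conversion relies on might be sketched as follows. The fields types and the helpers ^tagOf^ and ^sameTag^ are hypothetical; ^AppA_^ and ^AppB_^ stand for the ^App_^ fields types of two adjacent types declared in separate modules.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, ScopedTypeVariables,
             TypeApplications, AllowAmbiguousTypes, FlexibleContexts #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Hypothetical fields types. AppA_ and AppB_ stand for the App_ fields
-- types of two adjacent types declared in separate modules.
data AppA_ = AppA_ Int Int
data AppB_ = AppB_ Int Int
data Var_  = Var_ Int

-- Tag maps a fields type to its constructor name as a type-level string.
type family Tag dc :: Symbol
type instance Tag AppA_ = "App"
type instance Tag AppB_ = "App"
type instance Tag Var_  = "Var"

-- Reflect a fields type's name back to a value-level String.
tagOf :: forall dc. KnownSymbol (Tag dc) => String
tagOf = symbolVal (Proxy :: Proxy (Tag dc))

-- Do two fields types reflect the same constructor name?
sameTag :: forall a b. (KnownSymbol (Tag a), KnownSymbol (Tag b)) => Bool
sameTag = tagOf @a == tagOf @b
```

Here ^sameTag @AppA_ @AppB_^ holds while ^sameTag @AppA_ @Var_^ does not; a suffix-insensitive predicate would only change the comparison inside ^sameTag^.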
Reflection of a data type's set of constructors as well as the name and
structure of each constructor provides an abundance of information for
type-level programming in \yoko. The current Haskell features supporting such
type-level programming are powerful but rudimentary, and so the next two
sections introduce some fundamental machinery. The remaining technical sections
present the full \yoko\ API, define the lambda lifting transformation between
the ^AnonTerm^ and ^TopProg^ types, and then factor out its generic essence
into a generic transformer that can be directly reused for other
transformations.
\todo{subsection for new representation type grammar}
\section{Sets as Sums}
\label{sec:sets-as-sums}
\begin{figure}[h]
\begin{haskell}
-- #\tref{Section}{sec:sets-as-sums}#, Sets as sums
data Void -- empty set
newtype N a = N a -- singleton set
data a :+: b = L a | R b -- set union
-- #\tref{Section}{sec:embed-partition}#, Set operations
type family (:-:) sum sum2
class Embed sub sup where embed :: sub -> sup
inject :: Embed (N a) sum => a -> sum
class Partition sup sub where
  partition :: sup -> Either sub (sup :-: sub)
project ::
  Partition sum (N a) => sum -> Either a (sum :-: N a)
\end{haskell}
\caption{The API for sets as sums.}
\label{fig:sets-as-sums}
\end{figure}
The \yoko\ library reflects the set of constructors of a data type as a sum of
the corresponding fields types. \tref{Figure}{fig:sets-as-sums} on
page~\pageref*{fig:sets-as-sums} lists the API for sets of types in \yoko. Sets
of types are modeled as sums using nestings of the ^:+:^ type, with set
elements delimited by applications of the ^N^ type, which models singleton
sets. As an example, the set of the base types ^Int^, ^Char^, ^Bool^, and ^()^
is represented by the type ^Bases^, where
^type___Bases___=___(N___Int___:+:___N___Char)^ ^:+:^
^(N___Bool___:+:___N___())^.
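For illustration, the ^Bases^ set can be written down directly in Haskell. The helpers ^anInt^, ^aUnit^, and ^describe^ are hypothetical; they show that each element type of ^Bases^ is reached by a fixed path of ^L^ and ^R^ constructors, which is what ^inject^ and ^project^ must compute from the shape of the sum.

```haskell
{-# LANGUAGE TypeOperators #-}
newtype N a = N a            -- singleton set
data a :+: b = L a | R b     -- set union

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())

-- Hypothetical injections: each path is fixed by the shape of Bases.
anInt :: Int -> Bases
anInt i = L (L (N i))

aUnit :: Bases
aUnit = R (R (N ()))

-- Case analysis over the four element types of Bases.
describe :: Bases -> String
describe (L (L (N i))) = "Int " ++ show i
describe (L (R (N c))) = "Char " ++ show c
describe (R (L (N b))) = "Bool " ++ show b
describe (R (R (N u))) = "unit " ++ show u
```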
Without the ^N^ type, certain essential type families could not be indexed by
type sets, since this would require overlapping type family instances, which
are forbidden by Haskell's type system. The ^C^ representation type from \ig\
could be used to model singleton sets, but the semantics of its two type
parameters are incompatible with fields types.
\subsection{Embedding and Partitioning Sets}
\label{sec:embed-partition}
The ^inject^ function injects a value of any type into a sum modeling a set
that contains that type. For example, any ^Int^, character, Boolean, or unit
can be injected into ^Bases^. A value of a sum modeling any subset of those
types, such as ^N Int :+: N Bool^, can be embedded into ^Bases^ via the
^embed^ function. Conversely, a single element type can be projected out of a
sum via the ^project^ function, and a set can be partitioned into two sets via
the ^partition^ function.
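For a fixed set, ^project^ and ^partition^ can be written by hand, which makes the type refinement visible. In the sketch below, the names ^projectChar^ and ^partitionBoolUnit^ are hypothetical monomorphic stand-ins. The types of their ^Right^ results are the differences ^Bases :-: N Char^ and ^Bases :-: (N Bool :+: N ())^, assuming ^:-:^ normalizes as described in the remainder of this section.

```haskell
{-# LANGUAGE TypeOperators #-}
newtype N a = N a
data a :+: b = L a | R b

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())

-- Hypothetical, hand-written project for Char. The Right type is
-- Bases without N Char: N Int :+: (N Bool :+: N ()).
projectChar :: Bases -> Either Char (N Int :+: (N Bool :+: N ()))
projectChar (L (R (N c))) = Left c
projectChar (L (L i))     = Right (L i)
projectChar (R rest)      = Right (R rest)

-- Hypothetical, hand-written partition for the subset N Bool :+: N ().
-- The Right type is the remainder N Int :+: N Char.
partitionBoolUnit :: Bases -> Either (N Bool :+: N ()) (N Int :+: N Char)
partitionBoolUnit (R sub)  = Left sub
partitionBoolUnit (L rest) = Right rest
```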
The ^partition^ function uses the ^Either^ type to model the fact that a value
of the ^sup^ type may be an injection of a type not in the ^sub^ type. In that
case, the ^sup^ type is refined to the difference ^sup :-: sub^, since the
evaluation of the ^partition^ function has established that its argument is not
an injection of one of the ^sub^ types. The ^:-:^ type is a simple type-level
function defined using two other type families.
\begin{haskell}
type instance (:-:) (N x) sum2 =
  If (Elem x sum2) Void (N x)
type instance (:-:) (l :+: r) sum2 =
  Combine (l :-: sum2) (r :-: sum2)
\end{haskell}
\noindent The ^Elem^ type function implements the obvious membership predicate;
it is further discussed below. The ^Combine^ family maintains the invariant
that the ^Void^ type does not occur as a summand within an otherwise non-empty
set; for example, ^N Int :+: Void^ violates this property.
\begin{haskell}
type family Combine sum sum2
type instance Combine Void x = x
type instance Combine (N x) Void = N x
type instance Combine (N x) (N y) = N x :+: N y
type instance Combine (N x) (l :+: r) =
  N x :+: (l :+: r)
type instance Combine (l :+: r) Void = l :+: r
type instance Combine (l :+: r) (N y) =
  (l :+: r) :+: N y
type instance Combine (ll :+: rl) (lr :+: rr) =
  (ll :+: rl) :+: (lr :+: rr)
\end{haskell}
We choose to maintain this invariant here because the assumption that every
summand is non-degenerate simplifies the definition of the other type set
operations. Because \yoko\ uses type sets to reflect constructors and mutually
recursive types, empty sets are nonsensical. A data type with no constructors
only permits degenerate method definitions, which hardly require
genericity. Similarly, an empty family of mutually recursive types is no type
at all. Thus, any occurrence of ^Void^ outside of intermediate type-level
computations is considered user error; hence the invariant.
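The definitions of ^:-:^ and ^Combine^ can be checked mechanically. The sketch below assumes a current GHC with closed type families, which the open-family encoding above predates; our ^If^, ^Equal^, ^Or^, and a direct definition of ^Elem^ (not via ^Locate^) are stand-ins for the machinery described elsewhere. The ^Combine^ and ^:-:^ equations follow the text, and the final declaration confirms by ^Refl^ that ^Bases :-: N Char^ normalizes to ^N Int :+: (N Bool :+: N ())^, with no stray ^Void^.

```haskell
{-# LANGUAGE TypeOperators, TypeFamilies, DataKinds, UndecidableInstances #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))

data Void
newtype N a = N a
data a :+: b = L a | R b

type family If (c :: Bool) (t :: Type) (e :: Type) :: Type where
  If 'True  t e = t
  If 'False t e = e

-- Stand-in type equality via a closed family.
type family Equal (a :: Type) (b :: Type) :: Bool where
  Equal a a = 'True
  Equal a b = 'False

type family Or (x :: Bool) (y :: Bool) :: Bool where
  Or 'True  y = 'True
  Or 'False y = y

-- Membership, defined directly rather than via Locate.
type family Elem (x :: Type) (sum :: Type) :: Bool where
  Elem x Void      = 'False
  Elem x (N y)     = Equal x y
  Elem x (l :+: r) = Or (Elem x l) (Elem x r)

-- Combine and :-: follow the equations in the text.
type family Combine (sum :: Type) (sum2 :: Type) :: Type where
  Combine Void x                  = x
  Combine (N x) Void              = N x
  Combine (N x) (N y)             = N x :+: N y
  Combine (N x) (l :+: r)         = N x :+: (l :+: r)
  Combine (l :+: r) Void          = l :+: r
  Combine (l :+: r) (N y)         = (l :+: r) :+: N y
  Combine (ll :+: rl) (lr :+: rr) = (ll :+: rl) :+: (lr :+: rr)

type family (sum :: Type) :-: (sum2 :: Type) :: Type where
  N x :-: sum2       = If (Elem x sum2) Void (N x)
  (l :+: r) :-: sum2 = Combine (l :-: sum2) (r :-: sum2)

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())

-- Removing N Char from Bases leaves a Void-free remainder.
checkChar :: (Bases :-: N Char) :~: (N Int :+: (N Bool :+: N ()))
checkChar = Refl
```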
The ^Elem^ predicate and the instances of ^Embed^ and ^Partition^ are
ultimately predicated upon the ^Locate^ family, which is not exposed to the
user. Given a target type and a sum, ^Locate^ calculates a path through the
nested ^:+:^ types of the sum to reach the occurrence of the target type. Even
though such a path might not exist, ^Locate^ has a total definition because its
result is structured as a promotion of the ^Maybe^ type. The ^Elem^ type
function simply checks if its first argument can be located in its second.
\begin{haskell}
data Here a ; data Left x ; data Right x
type family Locate a sum
type instance Locate a Void = Nothing
type instance Locate a (N x) =
  If (Equal x a) (Just (Here a)) Nothing
type instance Locate a (l :+: r) =
  MaybeMap (Left r) (Locate a l) `MaybePlus1`
  MaybeMap (Right l) (Locate a r)
type Elem a sum = IsJust (Locate a sum)
\end{haskell}
The ^Here^, ^Left^, and ^Right^ types are type-level values, composed to model
a path through nested ^:+:^ types. The ^Equal^ type family is explained in
\tref{Section}{sec:scaffolding}. The omitted definitions of ^IsJust^,
^MaybeMap^, and ^MaybePlus1^ are obvious, except that the ^MaybePlus1^ family
has no instance where both of its indices are constructed with ^Just^. As a
result, a located type cannot occur more than once in the sum modeling a type
set. Since \yoko\ uses these sets to model constructors and sibling types, such
a duplicate would never arise legitimately and would again indicate user error.
The specific semantics of ^MaybePlus1^ grant \yoko\ robustness against this sort
of ambiguity.
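Assuming a current GHC with closed type families, and using a promoted ^Path^ type of our own in place of the separate ^Here^, ^Left^, and ^Right^ declarations, ^Locate^ and its helpers might be sketched as follows. Like ^MaybePlus1^ as described above, our version has no equation for two ^Just^ arguments, so locating a type that occurs twice leaves the family stuck instead of silently choosing one occurrence. All definitions here are illustrative, not the \yoko\ source.

```haskell
{-# LANGUAGE TypeOperators, TypeFamilies, DataKinds, UndecidableInstances #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))

data Void
newtype N a = N a
data a :+: b = L a | R b

-- Our stand-in for Here/Left/Right: a promoted path through nested :+:.
data Path = Here | GoL Path | GoR Path

type family Equal (a :: Type) (b :: Type) :: Bool where
  Equal a a = 'True
  Equal a b = 'False

type family If (c :: Bool) (t :: Maybe Path) (e :: Maybe Path) :: Maybe Path where
  If 'True  t e = t
  If 'False t e = e

type family MaybeMap (f :: Path -> Path) (m :: Maybe Path) :: Maybe Path where
  MaybeMap f 'Nothing  = 'Nothing
  MaybeMap f ('Just p) = 'Just (f p)

-- No equation for two Justs: a duplicated element leaves Locate stuck.
type family MaybePlus1 (m :: Maybe Path) (n :: Maybe Path) :: Maybe Path where
  MaybePlus1 'Nothing n = n
  MaybePlus1 m 'Nothing = m

type family Locate (a :: Type) (sum :: Type) :: Maybe Path where
  Locate a Void      = 'Nothing
  Locate a (N x)     = If (Equal x a) ('Just 'Here) 'Nothing
  Locate a (l :+: r) = MaybePlus1 (MaybeMap 'GoL (Locate a l))
                                  (MaybeMap 'GoR (Locate a r))

type family IsJust (m :: Maybe Path) :: Bool where
  IsJust 'Nothing  = 'False
  IsJust ('Just p) = 'True

type Elem a sum = IsJust (Locate a sum)

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())

-- Char sits in the right summand of the left summand.
locChar :: Locate Char Bases :~: 'Just ('GoL ('GoR 'Here))
locChar = Refl
```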
We omit the instances of ^Embed^ and ^Partition^. Both are defined via
auxiliary classes that branch according to the results of ^Locate^, and hence
accommodate no ambiguity. This implementation technique avoids the overlapping
instances language extension, which would require sums to be linearized in
order to define the instances. Our presumption of a type-level type-equality
predicate (explained in \tref{Section}{sec:scaffolding}) is a linchpin of this
technique. The elided instances of ^Embed^ and ^Partition^ cover any type set
built with ^N^ and ^:+:^; ^Void^ is not supported.
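The idea of branching on ^Locate^'s result can be illustrated for ^inject^. In the following sketch, which is ours and not the elided \yoko\ instances, an auxiliary class ^InjectAt^ is indexed by the path that a simplified, closed-family ^Locate^ computes; each instance head has a distinct ^Path^ constructor, so no instances overlap. Unlike ^MaybePlus1^, the hypothetical helper ^Pick^ simply prefers the leftmost occurrence, so this sketch does not reject duplicates.

```haskell
{-# LANGUAGE TypeOperators, TypeFamilies, DataKinds, UndecidableInstances,
             MultiParamTypeClasses, FlexibleInstances, FlexibleContexts,
             ScopedTypeVariables, KindSignatures #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

newtype N a = N a
data a :+: b = L a | R b

data Path = Here | GoL Path | GoR Path

-- Simplified Locate: the leftmost occurrence wins (no ambiguity check).
type family Locate (a :: Type) (sum :: Type) :: Maybe Path where
  Locate a (N a)     = 'Just 'Here
  Locate a (N x)     = 'Nothing
  Locate a (l :+: r) = Pick (Locate a l) (Locate a r)

type family Pick (m :: Maybe Path) (n :: Maybe Path) :: Maybe Path where
  Pick ('Just p) n        = 'Just ('GoL p)
  Pick 'Nothing ('Just p) = 'Just ('GoR p)
  Pick 'Nothing 'Nothing  = 'Nothing

-- One instance per Path constructor, so no two instance heads overlap.
class InjectAt (p :: Path) a sum where
  injectAt :: Proxy p -> a -> sum

instance InjectAt 'Here a (N a) where
  injectAt _ = N

instance InjectAt p a l => InjectAt ('GoL p) a (l :+: r) where
  injectAt _ = L . injectAt (Proxy :: Proxy p)

instance InjectAt p a r => InjectAt ('GoR p) a (l :+: r) where
  injectAt _ = R . injectAt (Proxy :: Proxy p)

-- inject asks Locate for the path and dispatches on it.
inject :: forall a sum p. (Locate a sum ~ 'Just p, InjectAt p a sum)
       => a -> sum
inject = injectAt (Proxy :: Proxy p)

type Bases = (N Int :+: N Char) :+: (N Bool :+: N ())

-- Observe where an injected value landed.
whichBase :: Bases -> String
whichBase (L (L _)) = "Int"
whichBase (L (R _)) = "Char"
whichBase (R (L _)) = "Bool"
whichBase (R (R _)) = "unit"
```

For example, ^whichBase (inject True)^ evaluates to ^"Bool"^, because ^Locate Bool Bases^ reduces to the path ^'GoR ('GoL 'Here)^.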
\section{The \yoko\ Generic View}
\label{sec:yoko-api}
\tref{Figure}{fig:yoko-api} on page~\pageref*{fig:yoko-api} lists the core
declarations of the \yoko\ generic view. We demonstrate its application with a
working example that leads into the lambda lifting example. Without loss of
generality, we concretize the underspecified set of non-nominal constructors of
^AnonTerm^. There will be a single non-nominal constructor, ^App^, modelling
application. Working through a concrete type is easier to follow, but the ^App^
constructor is still intended to exemplify an unlimited number of other
non-nominal constructors, all of which would be handled by the lambda lifting
definition developed in the next section.
\begin{haskell}
module AnonymousFunctions where
data AnonTerm = Lam Type AnonTerm | Var Int
              | Let [Decl] AnonTerm
              | App AnonTerm AnonTerm -- new
data Decl = Decl Type AnonTerm -- no change
\end{haskell}
Each of the following subsections introduces a part of the \yoko\ view and
instantiates it for the ^AnonTerm^ and ^Decl^ types. The presentation proceeds
bottom up, discussing first the representation of fields, then constructors,
and finally whole data types.
\begin{figure}[h]
\begin{haskell}
-- #\tref{Section}{sec:field-representation}#, Representation of fields
data U = U -- empty tuple
data a :*: b = a :*: b -- tuple concatenation
data Dep a = Dep a
data Rec t = Rec t
data Par1 f a = Par1 (f a)
data Par2 ff a b = Par2 (ff a b)
type family Rep dc
class Generic dc where
  rep :: dc -> Rep dc
  obj :: Rep dc -> dc
-- #\tref{Section}{sec:reflect-dcs}#, Reflection of constructors
type family Tag dc :: String -- see footnote#\footnotemark#
type family Range dc
class (Generic dc, DT (Range dc),
       Embed (N dc) (DCs (Range dc))) =>
  DC dc where rejoin :: dc -> Range dc
-- #\tref{Section}{sec:reflect-dts}#, Reflection of data types
newtype DCsOf t sum = DCsOf sum
type family DCs t
type Disbanded t = DCsOf t (DCs t)
class DT t where disband :: t -> Disbanded t
goSolo ::
  (DT (Range dc), Partition sum (N dc)) =>
  DCsOf (Range dc) sum ->
  Either dc (DCsOf (Range dc) (sum :-: N dc))
\end{haskell}
\caption{The \yoko\ generic view.}
\label{fig:yoko-api}
\end{figure}
\footnotetext{We assume type-level strings for this presentation. The actual
implementation emulates type-level strings as in
\tref{Appendix}{sec:scaffolding}.}
\subsection{Representation of Fields}
\label{sec:field-representation}
The delayed representation of \yoko\ expands the role of \ig's constructor
types. Each fields type represents a constructor independently of its data type
of origin. Reflection of the ^AnonTerm^ and ^Decl^ types uses the five obvious
fields types.
\begin{haskell}
data Lam_ = Lam_ Type AnonTerm
data Var_ = Var_ Int
data Let_ = Let_ [Decl] AnonTerm
data App_ = App_ AnonTerm AnonTerm
data Decl_ = Decl_ Type AnonTerm
\end{haskell}
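To make the role of fields types concrete, the following monomorphic sketch writes by hand, for ^AnonTerm^, a ^disband^-like conversion into a sum of its fields types and a ^rejoin^-like conversion back. It is hypothetical throughout: ^Type^ is a placeholder, the balanced nesting ^AnonTermDCs^ and the names ^disbandAnon^ and ^rejoinAnon^ are our choices, and the ^DCsOf^ wrapper and the ^DC^ and ^DT^ classes of \tref{Figure}{fig:yoko-api} are omitted.

```haskell
{-# LANGUAGE TypeOperators #-}
newtype N a = N a
data a :+: b = L a | R b

-- Placeholder for the object language's types.
data Type = TInt | TArrow Type Type deriving (Eq, Show)

data AnonTerm = Lam Type AnonTerm | Var Int
              | Let [Decl] AnonTerm
              | App AnonTerm AnonTerm
  deriving (Eq, Show)
data Decl = Decl Type AnonTerm deriving (Eq, Show)

-- Fields types, as in the text (Decl_ omitted for brevity).
data Lam_ = Lam_ Type AnonTerm
data Var_ = Var_ Int
data Let_ = Let_ [Decl] AnonTerm
data App_ = App_ AnonTerm AnonTerm

-- A balanced sum of AnonTerm's fields types (nesting chosen for illustration).
type AnonTermDCs = (N Lam_ :+: N Var_) :+: (N Let_ :+: N App_)

-- disband-like: from a term to the sum of its constructors.
disbandAnon :: AnonTerm -> AnonTermDCs
disbandAnon (Lam t b)  = L (L (N (Lam_ t b)))
disbandAnon (Var i)    = L (R (N (Var_ i)))
disbandAnon (Let ds b) = R (L (N (Let_ ds b)))
disbandAnon (App f a)  = R (R (N (App_ f a)))

-- rejoin-like inverse: map whichever fields type is present back to AnonTerm.
rejoinAnon :: AnonTermDCs -> AnonTerm
rejoinAnon (L (L (N (Lam_ t b))))  = Lam t b
rejoinAnon (L (R (N (Var_ i))))    = Var i
rejoinAnon (R (L (N (Let_ ds b)))) = Let ds b
rejoinAnon (R (R (N (App_ f a))))  = App f a
```

The two functions are mutually inverse, so a fields type loses no information about its constructor while being independent of the ^AnonTerm^ declaration's layout.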
As an extension…