From dd82b49ad6f719bd324de7f2d63f3341c0e87694 Mon Sep 17 00:00:00 2001 From: Manuel M T Chakravarty Date: Wed, 8 Oct 2008 06:19:27 +0000 Subject: [PATCH] Users Guide: added type family documentation MERGE TO 6.10 --- docs/users_guide/glasgow_exts.xml | 668 +++++++++++++++++++++++++++++++++++-- 1 file changed, 641 insertions(+), 27 deletions(-) diff --git a/docs/users_guide/glasgow_exts.xml b/docs/users_guide/glasgow_exts.xml index 0eb0f7d..e19f5d0 100644 --- a/docs/users_guide/glasgow_exts.xml +++ b/docs/users_guide/glasgow_exts.xml @@ -3694,6 +3694,647 @@ to work since it gets translated into an equality comparison. + +Type families + + + Indexed type families are a new GHC extension to + facilitate type-level + programming. Type families are a generalisation of associated + data types + (“Associated + Types with Class”, M. Chakravarty, G. Keller, S. Peyton Jones, + and S. Marlow. In Proceedings of “The 32nd Annual ACM SIGPLAN-SIGACT + Symposium on Principles of Programming Languages (POPL'05)”, pages + 1-13, ACM Press, 2005) and associated type synonyms + (“Type + Associated Type Synonyms”. M. Chakravarty, G. Keller, and + S. Peyton Jones. + In Proceedings of “The Tenth ACM SIGPLAN International Conference on + Functional Programming”, ACM Press, pages 241-253, 2005). Type families + themselves are described in the paper “Type + Checking with Open Type Functions”, T. Schrijvers, + S. Peyton-Jones, + M. Chakravarty, and M. Sulzmann, in Proceedings of “ICFP 2008: The + 13th ACM SIGPLAN International Conference on Functional + Programming”, ACM Press, pages 51-62, 2008. Type families + essentially provide type-indexed data types and named functions on types, + which are useful for generic programming and highly parameterised library + interfaces as well as interfaces with enhanced static information, much like + dependent types. They might also be regarded as an alternative to functional + dependencies, but provide a more functional style of type-level programming + than the relational style of functional dependencies. + + + Indexed type families, or type families for short, are type constructors that + represent sets of types. Set members are denoted by supplying the type family + constructor with type parameters, which are called type + indices. The + difference between vanilla parametrised type constructors and family + constructors is much like between parametrically polymorphic functions and + (ad-hoc polymorphic) methods of type classes. Parametric polymorphic functions + behave the same at all type instances, whereas class methods can change their + behaviour in dependence on the class type parameters. Similarly, vanilla type + constructors imply the same data representation for all type instances, but + family constructors can have varying representation types for varying type + indices. + + + Indexed type families come in two flavours: data + families and type synonym + families. They are the indexed family variants of algebraic + data types and type synonyms, respectively. The instances of data families + can be data types and newtypes. + + + Type families are enabled by the flag . + Additional information on the use of type families in GHC is available on + the + Haskell wiki page on type families. + + + + Data families + + + Data families appear in two flavours: (1) they can be defined on the + toplevel + or (2) they can appear inside type classes (in which case they are known as + associated types). The former is the more general variant, as it lacks the + requirement for the type-indexes to coincide with the class + parameters. However, the latter can lead to more clearly structured code and + compiler warnings if some type instances were - possibly accidentally - + omitted. In the following, we always discuss the general toplevel form first + and then cover the additional constraints placed on associated types. + + + + Data family declarations + + + Indexed data families are introduced by a signature, such as + +data family GMap k :: * -> * + + The special family distinguishes family from standard + data declarations. The result kind annotation is optional and, as + usual, defaults to * if omitted. An example is + +data family Array e + + Named arguments can also be given explicit kind signatures if needed. + Just as with + [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT + declarations] named arguments are entirely optional, so that we can + declare Array alternatively with + +data family Array :: * -> * + + + + + Associated data family declarations + + When a data family is declared as part of a type class, we drop + the family special. The GMap + declaration takes the following form + +class GMapKey k where + data GMap k :: * -> * + ... + + In contrast to toplevel declarations, named arguments must be used for + all type parameters that are to be used as type-indexes. Moreover, + the argument names must be class parameters. Each class parameter may + only be used at most once per associated type, but some may be omitted + and they may be in an order other than in the class head. Hence, the + following contrived example is admissible: + + class C a b c where + data T c a :: * + + + + + + + Data instance declarations + + + Instance declarations of data and newtype families are very similar to + standard data and newtype declarations. The only two differences are + that the keyword data or newtype + is followed by instance and that some or all of the + type arguments can be non-variable types, but may not contain forall + types or type synonym families. However, data families are generally + allowed in type parameters, and type synonyms are allowed as long as + they are fully applied and expand to a type that is itself admissible - + exactly as this is required for occurrences of type synonyms in class + instance parameters. For example, the Either + instance for GMap is + +data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) + + In this example, the declaration has only one variant. In general, it + can be any number. + + + Data and newtype instance declarations are only legit when an + appropriate family declaration is in scope - just like class instances + require the class declaration to be visible. Moreover, each instance + declaration has to conform to the kind determined by its family + declaration. This implies that the number of parameters of an instance + declaration matches the arity determined by the kind of the family. + Although, all data families are declared with + the data keyword, instances can be + either data or newtypes, or a mix + of both. + + + Even if type families are defined as toplevel declarations, functions + that perform different computations for different family instances still + need to be defined as methods of type classes. In particular, the + following is not possible: + +data family T a +data instance T Int = A +data instance T Char = B +nonsence :: T a -> Int +nonsence A = 1 -- WRONG: These two equations together... +nonsence B = 2 -- ...will produce a type error. + + Given the functionality provided by GADTs (Generalised Algebraic Data + Types), it might seem as if a definition, such as the above, should be + feasible. However, type families are - in contrast to GADTs - are + open; i.e., new instances can always be added, + possibly in other + modules. Supporting pattern matching across different data instances + would require a form of extensible case construct. + + + + Associated data instances + + When an associated data family instance is declared within a type + class instance, we drop the instance keyword in the + family instance. So, the Either instance + for GMap becomes: + +instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where + data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) + ... + + The most important point about associated family instances is that the + type indexes corresponding to class parameters must be identical to + the type given in the instance head; here this is the first argument + of GMap, namely Either a b, + which coincides with the only class parameter. Any parameters to the + family constructor that do not correspond to class parameters, need to + be variables in every instance; here this is the + variable v. + + + Instances for an associated family can only appear as part of + instances declarations of the class in which the family was declared - + just as with the equations of the methods of a class. Also in + correspondence to how methods are handled, declarations of associated + types can be omitted in class instances. If an associated family + instance is omitted, the corresponding instance type is not inhabited; + i.e., only diverging expressions, such + as undefined, can assume the type. + + + + + Scoping of class parameters + + In the case of multi-parameter type classes, the visibility of class + parameters in the right-hand side of associated family instances + depends solely on the parameters of the data + family. As an example, consider the simple class declaration + +class C a b where + data T a + + Only one of the two class parameters is a parameter to the data + family. Hence, the following instance declaration is invalid: + +instance C [c] d where + data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope + + Here, the right-hand side of the data instance mentions the type + variable d that does not occur in its left-hand + side. We cannot admit such data instances as they would compromise + type safety. + + + + + Type class instances of family instances + + Type class instances of instances of data families can be defined as + usual, and in particular data instance declarations can + have deriving clauses. For example, we can write + +data GMap () v = GMapUnit (Maybe v) + deriving Show + + which implicitly defines an instance of the form + +instance Show v => Show (GMap () v) where ... + + + + Note that class instances are always for + particular instances of a data family and never + for an entire family as a whole. This is for essentially the same + reasons that we cannot define a toplevel function that performs + pattern matching on the data constructors + of different instances of a single type family. + It would require a form of extensible case construct. + + + + + Overlap of data instances + + The instance declarations of a data family used in a single program + may not overlap at all, independent of whether they are associated or + not. In contrast to type class instances, this is not only a matter + of consistency, but one of type safety. + + + + + + + Import and export + + + The association of data constructors with type families is more dynamic + than that is the case with standard data and newtype declarations. In + the standard case, the notation T(..) in an import or + export list denotes the type constructor and all the data constructors + introduced in its declaration. However, a family declaration never + introduces any data constructors; instead, data constructors are + introduced by family instances. As a result, which data constructors + are associated with a type family depends on the currently visible + instance declarations for that family. Consequently, an import or + export item of the form T(..) denotes the family + constructor and all currently visible data constructors - in the case of + an export item, these may be either imported or defined in the current + module. The treatment of import and export items that explicitly list + data constructors, such as GMap(GMapEither), is + analogous. + + + + Associated families + + As expected, an import or export item of the + form C(..) denotes all of the class' methods and + associated types. However, when associated types are explicitly + listed as subitems of a class, we need some new syntax, as uppercase + identifiers as subitems are usually data constructors, not type + constructors. To clarify that we denote types here, each associated + type name needs to be prefixed by the keyword type. + So for example, when explicitly listing the components of + the GMapKey class, we write GMapKey(type + GMap, empty, lookup, insert). + + + + + Examples + + Assuming our running GMapKey class example, let us + look at some export lists and their meaning: + + + module GMap (GMapKey) where...: Exports + just the class name. + + + module GMap (GMapKey(..)) where...: + Exports the class, the associated type GMap + and the member + functions empty, lookup, + and insert. None of the data constructors is + exported. + + + module GMap (GMapKey(..), GMap(..)) + where...: As before, but also exports all the data + constructors GMapInt, + GMapChar, + GMapUnit, GMapPair, + and GMapUnit. + + + module GMap (GMapKey(empty, lookup, insert), + GMap(..)) where...: As before. + + + module GMap (GMapKey, empty, lookup, insert, GMap(..)) + where...: As before. + + + + + Finally, you can write GMapKey(type GMap) to denote + both the class GMapKey as well as its associated + type GMap. However, you cannot + write GMapKey(type GMap(..)) — i.e., + sub-component specifications cannot be nested. To + specify GMap's data constructors, you have to list + it separately. + + + + + Instances + + Family instances are implicitly exported, just like class instances. + However, this applies only to the heads of instances, not to the data + constructors an instance defines. + + + + + + + + + Synonym families + + + Type families appear in two flavours: (1) they can be defined on the + toplevel or (2) they can appear inside type classes (in which case they + are known as associated type synonyms). The former is the more general + variant, as it lacks the requirement for the type-indexes to coincide with + the class parameters. However, the latter can lead to more clearly + structured code and compiler warnings if some type instances were - + possibly accidentally - omitted. In the following, we always discuss the + general toplevel form first and then cover the additional constraints + placed on associated types. + + + + Type family declarations + + + Indexed type families are introduced by a signature, such as + +type family Elem c :: * + + The special family distinguishes family from standard + type declarations. The result kind annotation is optional and, as + usual, defaults to * if omitted. An example is + +type family Elem c + + Parameters can also be given explicit kind signatures if needed. We + call the number of parameters in a type family declaration, the family's + arity, and all applications of a type family must be fully saturated + w.r.t. to that arity. This requirement is unlike ordinary type synonyms + and it implies that the kind of a type family is not sufficient to + determine a family's arity, and hence in general, also insufficient to + determine whether a type family application is well formed. As an + example, consider the following declaration: + +type family F a b :: * -> * -- F's arity is 2, + -- although it's overall kind is * -> * -> * -> * + + Given this declaration the following are examples of well-formed and + malformed types: + +F Char [Int] -- OK! Kind: * -> * +F Char [Int] Bool -- OK! Kind: * +F IO Bool -- WRONG: kind mismatch in the first argument +F Bool -- WRONG: unsaturated application + + + + + Associated type family declarations + + When a type family is declared as part of a type class, we drop + the family special. The Elem + declaration takes the following form + +class Collects ce where + type Elem ce :: * + ... + + The argument names of the type family must be class parameters. Each + class parameter may only be used at most once per associated type, but + some may be omitted and they may be in an order other than in the + class head. Hence, the following contrived example is admissible: + +class C a b c where + type T c a :: * + + These rules are exactly as for associated data families. + + + + + + Type instance declarations + + Instance declarations of type families are very similar to standard type + synonym declarations. The only two differences are that the + keyword type is followed + by instance and that some or all of the type + arguments can be non-variable types, but may not contain forall types or + type synonym families. However, data families are generally allowed, and + type synonyms are allowed as long as they are fully applied and expand + to a type that is admissible - these are the exact same requirements as + for data instances. For example, the [e] instance + for Elem is + +type instance Elem [e] = e + + + + Type family instance declarations are only legitimate when an + appropriate family declaration is in scope - just like class instances + require the class declaration to be visible. Moreover, each instance + declaration has to conform to the kind determined by its family + declaration, and the number of type parameters in an instance + declaration must match the number of type parameters in the family + declaration. Finally, the right-hand side of a type instance must be a + monotype (i.e., it may not include foralls) and after the expansion of + all saturated vanilla type synonyms, no synonyms, except family synonyms + may remain. Here are some examples of admissible and illegal type + instances: + +type family F a :: * +type instance F [Int] = Int -- OK! +type instance F String = Char -- OK! +type instance F (F a) = a -- WRONG: type parameter mentions a type family +type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter +type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type + +type family G a b :: * -> * +type instance G Int = (,) -- WRONG: must be two type parameters +type instance G Int Char Float = Double -- WRONG: must be two type parameters + + + + + Associated type instance declarations + + When an associated family instance is declared within a type class + instance, we drop the instance keyword in the family + instance. So, the [e] instance + for Elem becomes: + +instance (Eq (Elem [e])) => Collects ([e]) where + type Elem [e] = e + ... + + The most important point about associated family instances is that the + type indexes corresponding to class parameters must be identical to the + type given in the instance head; here this is [e], + which coincides with the only class parameter. + + + Instances for an associated family can only appear as part of instances + declarations of the class in which the family was declared - just as + with the equations of the methods of a class. Also in correspondence to + how methods are handled, declarations of associated types can be omitted + in class instances. If an associated family instance is omitted, the + corresponding instance type is not inhabited; i.e., only diverging + expressions, such as undefined, can assume the type. + + + + + Overlap of type synonym instances + + The instance declarations of a type family used in a single program + may only overlap if the right-hand sides of the overlapping instances + coincide for the overlapping types. More formally, two instance + declarations overlap if there is a substitution that makes the + left-hand sides of the instances syntactically the same. Whenever + that is the case, the right-hand sides of the instances must also be + syntactically equal under the same substitution. This condition is + independent of whether the type family is associated or not, and it is + not only a matter of consistency, but one of type safety. + + + Here are two example to illustrate the condition under which overlap + is permitted. + +type instance F (a, Int) = [a] +type instance F (Int, b) = [b] -- overlap permitted + +type instance G (a, Int) = [a] +type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int] + + + + + + Decidability of type synonym instances + + In order to guarantee that type inference in the presence of type + families decidable, we need to place a number of additional + restrictions on the formation of type instance declarations (c.f., + Definition 5 (Relaxed Conditions) of “Type + Checking with Open Type Functions”). Instance + declarations have the general form + +type instance F t1 .. tn = t + + where we require that for every type family application (G s1 + .. sm) in t, + + + s1 .. sm do not contain any type family + constructors, + + + the total number of symbols (data type constructors and type + variables) in s1 .. sm is strictly smaller than + in t1 .. tn, and + + + for every type + variable a, a occurs + in s1 .. sm at most as often as in t1 + .. tn. + + + These restrictions are easily verified and ensure termination of type + inference. However, they are not sufficient to guarantee completeness + of type inference in the presence of, so called, ''loopy equalities'', + such as a ~ [F a], where a recursive occurrence of + a type variable is underneath a family application and data + constructor application - see the above mentioned paper for details. + + + If the option is passed to the + compiler, the above restrictions are not enforced and it is on the + programmer to ensure termination of the normalisation of type families + during type inference. + + + + + + Equality constraints + + Type context can include equality constraints of the form t1 ~ + t2, which denote that the types t1 + and t2 need to be the same. In the presence of type + families, whether two types are equal cannot generally be decided + locally. Hence, the contexts of function signatures may include + equality constraints, as in the following example: + +sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2 + + where we require that the element type of c1 + and c2 are the same. In general, the + types t1 and t2 of an equality + constraint may be arbitrary monotypes; i.e., they may not contain any + quantifiers, independent of whether higher-rank types are otherwise + enabled. + + + Equality constraints can also appear in class and instance contexts. + The former enable a simple translation of programs using functional + dependencies into programs using family synonyms instead. The general + idea is to rewrite a class declaration of the form + +class C a b | a -> b + + to + +class (F a ~ b) => C a b where + type F a + + That is, we represent every functional dependency (FD) a1 .. an + -> b by an FD type family F a1 .. an and a + superclass context equality F a1 .. an ~ b, + essentially giving a name to the functional dependency. In class + instances, we define the type instances of FD families in accordance + with the class head. Method signatures are not affected by that + process. + + + NB: Equalities in superclass contexts are not fully implemented in + GHC 6.10. + + + + + + + Other type system extensions @@ -4975,33 +5616,6 @@ pattern binding must have the same context. For example, this is fine: - -Type families - - - -GHC supports the definition of type families indexed by types. They may be -seen as an extension of Haskell 98's class-based overloading of values to -types. When type families are declared in classes, they are also known as -associated types. - - -There are two forms of type families: data families and type synonym families. -Currently, only the former are fully implemented, while we are still working -on the latter. As a result, the specification of the language extension is -also still to some degree in flux. Hence, a more detailed description of -the language extension and its use is currently available -from the Haskell -wiki page on type families. The material will be moved to this user's -guide when it has stabilised. - - -Type families are enabled by the flag . - - - - - -- 1.7.10.4