Haskell Libraries

The Haskell Libraries Mailing List

libraries@haskell.org

Introduction

This document constitutes a proposal for an extension to the Haskell 98 language. The proposal has several parts:

- A modest language extension to Haskell 98 that adds the character . to the lexical syntax for a module name, allowing a hierarchical module namespace where a module name is a sequence of components separated by periods. The extension is described in "The language extension".
- An allocation of the new module namespace to existing and non-existent libraries, people, organisations, and local use.
- A policy and procedure for allocating new parts of the namespace.
- A set of libraries which are under the control of the community, have reference implementations kept in a standard place, and conform to a set of guidelines and policies set out in this document. We shall call this set of libraries the core libraries.

In addition, this document also describes:

- Guidelines and conventions for organising the hierarchy.
- Our policy with respect to the design and evolution of library APIs, versioning of library APIs, and maintenance of the reference implementation.
- A set of conventions for coding style and portability within the core libraries.

How to contribute

This project is driven by the Haskell community, so contributions of all kinds are welcome. The first step is to join the Haskell libraries mailing list, and maybe browse the list archives. Some of the ways you can contribute are:

- By donating code: for libraries in the core set which don't yet have a reference implementation, or for new contributions to the core set, code is always welcome. Code that conforms to the style guidelines (which aren't very strict, see "Programming Conventions") and comes with documentation (see "Documentation") and a test suite (see "Testing") is better, but these aren't essential. As a library progresses through the stability scale (see "Library Stability") these things become more important, but for an experimental library we're not going to worry too much about this stuff.
- By porting code for an existing library to a new compiler or architecture. A library is classed as portable if it should be available regardless of which compiler/platform combination you're using; however, many libraries are non-portable for one reason or another (see "Portability Considerations"), and broadening the scope of these libraries is always welcome.
- By becoming a library maintainer: if you have a particular interest in and/or knowledge about a certain library, and have the time to spare, and the library in question doesn't already have a maintainer, then you may be a suitable maintainer for the library. The responsibilities of library maintainers are given in "Library Maintainers".
- By participating in the design process for new libraries, and suggesting improvements to existing libraries. Everyone on the Haskell libraries mailing list is invited to participate in the design process, so get involved!

The language extension

The key concept here is to map the module namespace into a hierarchical directory-like structure. We propose using the dot as a separator, analogous to Java's usage for namespaces.

For most compilers and interpreters, this extended module namespace maps directly to a directory/file structure in which the modules are stored. Storing unrelated modules in separate directories (and related modules in the same directory) is a useful and common practice when engineering large systems.

(But note that, just as Haskell'98 does not insist that modules live in files of the same name, this proposal does not insist on it either. However, we expect most tools to use the close correspondence to their advantage.)

There are several issues arising from this proposal:

- This is a surface change to the module naming convention. It does not introduce nested definition of modules.
- The syntax we propose (a dot separator) is familiar from other languages such as Java, but could in principle be something else, for instance a prime ', underscore _ or centred dot · or something different again.
Of the choices of separator, dot requires a change to the Haskell'98 lexical syntax, allowing

    modid  -> qconid
    qconid -> [modid .] conid

where currently the syntax is

    modid  -> conid
    qconid -> [modid .] conid

Note that the new syntax is recursive: a modid may contain multiple components separated by dots, where the final component is a conid.

A consequence of using the dot as the module namespace separator is that it steals one extremely rare construction from Haskell'98: A.B.C.D in Haskell'98 means the composition of constructor D from module C, with constructor B from module A:

    (.) A.B C.D

No-one so far thinks this is any great loss, and if you really want to say the latter, you still can by simply inserting spaces:

    A.B . C.D

A possible extension

The use of qualified imports has become more verbose: for instance

    import qualified XmlParse
    ... XmlParse.element f ...

becomes

    import qualified Text.Xml.Parse
    ... Text.Xml.Parse.element f ...

It is usually more convenient to make use of Haskell's as keyword to shorten qualified identifiers:

    import qualified Text.Xml.Parse as Parse
    ... Parse.element f ...

A possible extension to the proposal is to make this use of as implicit, unless overridden by the programmer with her own as clause. The implicit as clause always uses the final subdivision of the module name. So for instance, either the fully-qualified or abbreviated-qualified names

    Text.Xml.Parse.element
    Parse.element

would be accepted and have the same referent, but a partial qualification like

    Xml.Parse.element

would not be accepted.

Renaming subtrees

Various proposals have been made to allow you to rename a whole subtree. This may occasionally be convenient: for example, suppose there are several libraries under Org.Com.Microsoft that I need to import; it would be easier to rename this subtree to just Microsoft for use in future import declarations. For example:

    import Org.Com.Microsoft.* as Microsoft.*
    import Microsoft.Foo
    import Microsoft.Bar
    ...

The exact syntax of the renaming declaration is up for debate (as is whether we need it at all); please send suggestions to libraries@haskell.org.

The hierarchy layout

We first classify each node in the hierarchy according to one of the following terms:

ToDo: unpublished interfaces.

- Allocated: Nodes in the hierarchy can be allocated to a library (whether the library actually exists or not). The currently allocated nodes are specified in "The hierarchy".
- User: The User hierarchy is reserved for users: a user may always use the portion of the hierarchy which is formed from his/her email address as follows: replace any .s in the username (before the @) with _, replace the @ by a ., reverse the order of the components, capitalise the first letter of each component, and prepend User. to the result. For example, simonmar@microsoft.com becomes User.Com.Microsoft.Simonmar.
- Organisation: The Org hierarchy is reserved for organisations. Any organisation with a DNS domain name owns a unique space in the hierarchy formed by reversing the components of the domain, capitalising the first character of each component, and prepending Org. to the result. ToDo: the Org name isn't great, especially when the domain name also ends with Org (e.g. Org.Org.Haskell?). Contrib has also been suggested.
- Local: The Local hierarchy is reserved for libraries which are local to the current site. Libraries which are to be distributed outside the current site should not be placed in the Local hierarchy.
- Top-level: All top-level names (i.e. module names that don't contain a .) that are otherwise unallocated are available for use by the program. Note that for compatibility with Haskell 98, some modules in this namespace are reserved (e.g. Directory, IO, Time etc.).
- Unallocated: Any node which doesn't belong to any of the above categories is currently unallocated, and is not available for use.

A node in the hierarchy may be both a specific library and a parent node for a number of child nodes. For example, Foreign is a library, and so is Foreign.Ptr.
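The User and Org naming rules described above can be sketched in a few lines of Haskell. This is only an illustration of the stated steps, not part of the proposal: the function names userModule, orgModule, splitOn and capitalise are hypothetical, and the sketch does not check that every resulting component is a legal module name component (for instance, an address containing a hyphen would need further treatment).

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Split a string on a separator character (hypothetical helper).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Capitalise the first letter of a component.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs

-- Org rule: reverse the domain components, capitalise each, prepend "Org".
orgModule :: String -> String
orgModule domain =
  intercalate "." ("Org" : map capitalise (reverse (splitOn '.' domain)))

-- User rule: dots in the username become underscores, the '@' acts as a
-- component separator, then reverse, capitalise and prepend "User".
-- Returns Nothing if the address has no '@' or an empty side.
userModule :: String -> Maybe String
userModule email = case break (== '@') email of
  (user, '@' : domain)
    | not (null user), not (null domain) ->
        let user' = map (\c -> if c == '.' then '_' else c) user
            comps = reverse (user' : splitOn '.' domain)
        in  Just (intercalate "." ("User" : map capitalise comps))
  _ -> Nothing

main :: IO ()
main = do
  print (userModule "simonmar@microsoft.com")  -- Just "User.Com.Microsoft.Simonmar"
  putStrLn (orgModule "haskell.org")           -- Org.Org.Haskell
```

The second line of output reproduces the Org.Org.Haskell case that the ToDo note above flags as awkward.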
Hierarchy design guidelines

Apart from the User, Local and Organisation top-level categories, the rest of the hierarchy is organised with a single principle in mind:

Modules are grouped by functionality, since this is the single property that is most helpful for a user of the library - we want users to be able to find out where to obtain functionality easily, and to easily find all the modules that provide relevant functionality. So, if two modules provide similar functionality, or alternative interfaces to the same functionality, then they should be children of the same node in the hierarchy. Modules are never grouped by standards compliance, portability, stability, or any other property.

There are some other considerations when choosing where to place libraries. Where possible, choose a layout that finds a good compromise between depth of nesting and logical grouping of functionality; for example, although the Text hierarchy could logically be placed as a child of FileFormat, we choose not to because Text is ubiquitous and we don't want to have to type the extra component all the time. Also consider consistency: if a particular sub-hierarchy provides similar functionality to another sub-hierarchy in the tree, then preferably the structure of the two subtrees should also be similar. For example: under Language.Haskell we have children Syntax, Lexer, Parser etc., so under Language.C we should have a similar structure.

Module Naming Conventions

- A module defining a data type or type class X has itself the name X, e.g. StablePtr.
- A module which re-exports the modules in a subtree of the hierarchy has the same name as the root of that subtree, e.g. Foreign re-exports Foreign.Ptr, Foreign.Marshal.Utils etc.
- If a subtree of the hierarchy contains several modules which provide similar functionality (e.g. there are several pretty-printing libraries under Text.PrettyPrinter), then the module at the root of the subtree generally re-exports just one of the modules in the subtree (possibly the most popular or commonly-used alternative).
- In Haskell you sometimes publish two interfaces to your libraries: one for users, and one for library writers or advanced users who might want to extend things. Typically the advanced users need to be able to see past certain abstractions. The current proposal is that, for a module named M, the advanced version would be named M.Internals, e.g.

      import Text.HTML            -- The library
      import Text.HTML.Internals  -- The non-abstract library

- Acronyms are fully capitalised in a module name, e.g. HTML, URI, CGI, etc. Exceptions may be made for acronyms which have an existing well-established alternative capitalisation, or acronyms which are also valid words, and are more often used as such.
- A module name should be made plural only if the module actually defines multiple entities of a particular kind, e.g. Foreign.C.Types. Most module names which define a type or class will follow the name of the type or class, so whether to pluralise is not an issue.

The hierarchy

The currently allocated top-level names are:

- Prelude: Haskell98 Prelude (mostly just re-exports other parts of the tree).
- Control: Libraries which provide functions, types or classes whose purpose is primarily to express control structure.
- Data: Libraries which provide data types, operations over data types, or type classes, except for libraries for which one of the other more specific categories is appropriate.
- Database: Libraries for providing access to or operations for building databases.
- Debug: Support for debugging Haskell programs.
- Edison: The Edison data structure library.
- FileFormat: Support for reading and/or writing various file formats (except: programming language source code which lives in Language, database formats which live in Database, and textual file formats which are catered for in Text).
- Foreign: Interaction with code written in a foreign programming language.
- Graphics: Libraries for producing graphics or providing graphical user interfaces.
- Language: Libraries for operating on or generating source code in various programming languages, including parsers, pretty printers, abstract syntax definitions etc.
- Local: Available for site-local use.
- Numeric: Functions and classes which provide operations over numeric data.
- Network: Libraries for communicating over a network, including implementations of network protocols.
- Org: Allocated to organisations on a domain-name basis (see "The hierarchy layout").
- System: Libraries for communication with the system on which the Haskell program is running (including the runtime system).
- Text: Libraries for parsing and generating data in a textual format (including structured textual formats such as XML, HTML, but not including programming language source, which lives in Language).
- GHC: Libraries specific to the GHC/GHCi system.
- Nhc: Libraries specific to the Nhc compiler.
- Hugs: Libraries specific to the Hugs system.
- User: Allocated to individual users, using email addresses (see "The hierarchy layout").

Licensing

Following some discussion on the mailing list related to how we should license the libraries, the viewpoint that was least offensive to all involved seems to be the following:

We wish to accommodate source code from different contributors, and with different licenses. However, a library of modules where each module is released under a different license, and where the dependencies between modules aren't clear, isn't workable (it's too hard for a user of the library to tell whether they're violating the terms of each license or not).

So the solution is as follows: code under different licenses will be clearly separate in the repository (i.e. in separate subdirectories), and compilers are expected to present packages of modules where all modules in a package fall under the same license, and where the dependencies between packages are clear.

It was decided that certain essential functionality should be available under a BSD style license. Hence, the BSD part of the repository will contain implementations of at least the following modules: Prelude, Foreign, ToDo: what else?.

There is one further requirement: only licenses approved by the Open Source Initiative may be used with the core libraries. See The Open Source Initiative for a list of approved licenses.

ToDo: include a prototype BSD license here.

Versioning

Library Stability

The stability of a library relates primarily to its API. Stability provides an indication of how often the API is likely to change (or whether it may even go away entirely).

The stability scale is also a measure of how strictly the conventions in this document are applied to the library: an experimental library isn't subject to any restrictions regarding coding style and documentation, but a stable library is expected to adhere to the guidelines, and come with full documentation and tests.

To help with the stability issue, library maintainers are allowed to mark functions, types or classes as deprecated, which means simply that the feature will be removed at a later date. (Compilers may have extra support for warning about the use of a deprecated feature, for example GHC's DEPRECATED pragma.) Just how long it will stick around for depends on the stability category of the library (see below). A feature is marked as deprecated in the documentation for the library, and optionally in an implementation-dependent way which enables the system to warn about the use of deprecated features.

The current stability categories are:

- experimental: An experimental library is unrestricted in terms of API changes: the API may change between minor revisions and there is no requirement to retain old interfaces for compatibility. Documentation and tests aren't required for an experimental library.
- provisional: A provisional library is moving towards stability, and the rate of change of the API is slower. API changes between minor revisions must be accompanied by deprecated versions of the old features where possible. API changes between major versions are unrestricted. The library should come with at least rudimentary documentation.
- stable: A stable library has an essentially fixed API. Additions to the API may be made for a minor release, deprecated features must be retained for at least one major revision, and only small changes may be made to the existing API semantics for a major revision. A stable library is expected to include full documentation and tests.
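As an illustration of the implementation-dependent deprecation marking mentioned above, the following minimal sketch uses GHC's DEPRECATED pragma. The functions showDoc and renderDoc are hypothetical names invented for this example; only the pragma itself is GHC's. The old entry point is kept as a thin wrapper over its replacement, which is roughly what a provisional library would do when changing its API between minor revisions.

```haskell
module Main (main) where

-- Hypothetical new API: join lines into a single document string.
renderDoc :: [String] -> String
renderDoc = unlines

-- The old name is retained for compatibility. With this pragma, GHC warns
-- at use sites in other modules that import showDoc.
{-# DEPRECATED showDoc "Use renderDoc instead" #-}
showDoc :: [String] -> String
showDoc = renderDoc

main :: IO ()
main = putStr (renderDoc ["first line", "second line"])
```

Other compilers may ignore the pragma, which is why the text above also asks for the deprecation to be recorded in the library documentation.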
Portability Considerations

The portability status of a library affects which platforms and compilers the library will be available on. Haskell implementations are expected to provide all of the portable core libraries, and those non-portable core libraries which are appropriate for that particular platform/compiler implementation.

The precise meanings of the terms portable and non-portable for our purposes are given below:

- Portable: A portable library may use only Haskell 98 features plus approved extensions, and may not use any platform-specific features. It may make use of other portable libraries only.
- Non-portable: A non-portable library may be non-portable for one or more of the following reasons:
  - Requires extensions: A library which uses non-approved language extensions.
  - Requires non-portable libraries: A library which depends (directly or indirectly) on other non-portable libraries.
  - OS-specific / platform-specific: A library which depends on features or APIs particular to a certain OS or platform is non-portable for that reason.

Approved Extensions

Very few of the core libraries can be implemented using pure Haskell 98. For this reason, we decided to raise the baseline for portable libraries to include a few common extensions; the following language extensions can be assumed to be present when writing libraries:

- The Foreign Function Interface.
- Mutable variables (Data.IORef).
- Unsafe IO monad operations (System.IO.Unsafe).
- Packed strings (Data.PackedString).

Extensions which we'd like to be standard, but aren't currently implemented by one or more of the target compilers:

- Bit operations (Data.Bits).
- Exceptions (synchronous only), defined by the Control.Exception interface.
- The ST monad, defined by Control.Monad.ST, and the associated Data.Array.ST and Data.STRef libraries. ST requires a small typechecker extension for the runST function.
- Concurrent Haskell (pre-emptive multitasking optional). GHC and Hugs implement this, but Nhc currently does not.

The following extensions are not likely to become part of the baseline, but are nevertheless used by one or more libraries in the core set (which are thus designated non-portable):

- Multi-parameter type classes.
- Local universal and existential quantification.
- Concurrent Haskell with pre-emptive multitasking.
- Asynchronous exceptions.
- Stable Names.
- Weak Pointers.

Other extensions are supported by a single compiler only, and can be accessed by libraries under the top level hierarchy for that compiler, e.g. GHC.UnboxedTypes.

Library Maintainers

This is a collaborative project, so we like to devolve control of the design and implementation of libraries to those with an interest or appropriate expertise (or maybe just the time!).

A maintainer isn't necessarily a single person - for example, the listed maintainer for most of the core libraries is libraries@haskell.org, indicating that the library is under the control of the community as a whole. The maintainer for the Foreign hierarchy is ffi@haskell.org, the mailing list for discussion of the Haskell FFI standard.

The responsibilities of a library maintainer include:

- Most importantly: act as a single point of contact for issues relating to the library API and its implementation.
- Manage any discussion related to the library (which can take place on libraries@haskell.org if necessary), and summarise the results. Make final decisions, and implement them.
- Maintain the implementation, including: fixing bugs, updating to keep up with changes in other libraries, porting to new compilers/platforms, and integrating code from other contributors. The maintainer is expected to be the only person/group to make functional changes to the source code (non-functional or trivial changes don't count).
- Maintain/write the documentation and tests.

If you can't maintain the library any more for whatever reason, tell libraries@haskell.org and we'll revert the maintainer status of the library to the default.
The Core Team

The core team is responsible for making final decisions about the project as a whole and resolving disputes where necessary. We expect that needing to invoke the core team will be a rare occurrence.

The core team is also responsible for approving maintainership requests.

Currently, the core team consists of one person from each of the compiler camps, and these are also the people that will primarily be maintaining the library framework for their respective compiler projects:

- Simon Marlow simonmar@microsoft.com (GHC representative)
- Malcolm Wallace Malcolm.Wallace@cs.york.ac.uk (Nhc representative)
- Andy Gill andy@galconn.com (Hugs representative)

Documentation

Testing

Migration path

How compatible will a compiler using the new libraries be with code written for Haskell 98 or older library systems (such as the hslibs suite and GHC's package system), and for how long will compatibility be maintained?

Our current plan for GHC is as follows: by default, with the flag, you'll get access to the core libraries. Compatibility with Haskell 98 code will be maintained using a separate package of wrappers presenting interfaces for the Haskell 98 libraries (IO, Ratio, Directory, etc.). The Haskell 98 compatibility package will be enabled by default, but we plan to add an option to disable it if necessary. For code that uses -package lang, we could also provide a compatibility wrapper package (so -package lang will continue to work as before and present the same library interfaces), but this may prove too much work to maintain - we haven't decided whether to do this or not. It is unlikely that compatibility wrappers for any of the other hslibs packages will be provided.
Programming Conventions

Standard Module Header

The following module header will be used for all core libraries, and we recommend using it for library source code in general:

    -----------------------------------------------------------------------------
    --
    -- Module      :  module
    -- Copyright   :  (c) author year
    -- License     :  license
    --
    -- Maintainer  :  libraries@haskell.org | email-address
    -- Stability   :  experimental | provisional | stable
    -- Portability :  portable | non-portable (reason(s))
    --
    -- $Id: libraries.sgml,v 1.8 2002/06/11 10:53:03 simonmar Exp $
    --
    -- Description
    -----------------------------------------------------------------------------

where:

- $Id: libraries.sgml,v 1.8 2002/06/11 10:53:03 simonmar Exp $ is optional, but usually included if the module is under CVS or RCS control.
- module: the fully qualified module name of the module.
- author/year: the primary author and copyright holder of the module, and the year in which copyright is claimed.
- license: specifies the license on the file (see "Licensing").
- email-address: the email address of the maintainer, or maintainers, of the library (see "Library Maintainers").
- reason(s): the reasons for non-portability must be listed (see "Portability Considerations").
- description: a short description of the module.

Naming Conventions

These naming conventions are pulled straight from the hslibs documentation. They were formed after lengthy discussions and are heavily based on an initial suggestion from Marcin Kowalczyk qrczak@knm.org.pl.

Note that the conventions are not mutually exclusive, e.g. should the function creating a set from a list of elements have the name set or listToSet? (Alas, it currently has neither name.)

The following nomenclature is used:

- Pure, i.e. non-monadic functions are simply called, well, functions.
- Monadic functions, i.e. functions having a type ... -> m a for some Monad m are called actions.

Constructor names

- Empty values of type X have the name emptyX, e.g. emptySet.
- Actions creating a new empty value of type X have the name newEmptyX, e.g. newEmptyMVar.
- Functions creating an arbitrary value of type X have the name X itself (with the first letter downcased), e.g. array. (TODO: This often collides with the xToY convention, how should this be resolved?)
- Actions creating new arbitrary values of type X have the name newX, e.g. newIORef.

Accessor names

- Functions getting an attribute of a value or a part of it have the name of the attribute itself, e.g. length, bounds.
- Actions accessing some kind of reference or state have the name getX, where X is the type of the contents or the name of the part being accessed, e.g. getChar, getEnv. An alternative naming scheme is readY, where Y is the type of the reference or container, e.g. readIORef.
- Functions or actions getting a value via a pointer-like type X should be named deRefX, e.g. deRefStablePtr, deRefWeak.

Modifier names

- Functions returning a value with attribute X set to a new value should be named setX. (TODO: Add Examples.)
- Actions setting some kind of reference or state have the name putX, where X is the type of the contents or the name of the part being accessed, e.g. putChar. An alternative naming scheme is writeY, where Y is the type of the reference or container, e.g. writeIORef.
- Actions in the IO monad setting some global state X are traditionally named setX, too, although putX would be more appropriate, e.g. setReadlineName.
- Actions modifying a container X by a function of type a -> a have the name modifyX, e.g. modifySTRef.

Predicate names

- Predicates, both non-monadic and monadic, testing a property X have the name isX.

Names for conversions

- Functions converting a value of type X to a value of type Y have the name XToY with all leading uppercase characters of X converted to lower case, e.g. stToIO.
- Overloaded conversion functions of type C a => a -> X have the name toX, e.g. toInteger.
- Overloaded conversion functions of type C a => X -> a have the name fromX, e.g. fromInteger.

Miscellaneous naming conventions

- An action that is identical to another one called X, but discards the return value, has the name X_, e.g. mapM and mapM_.
- Functions and actions which are potentially dangerous to use and leave some kind of proof obligation to the programmer have the name unsafeX, e.g. unsafePerformIO.
- There are two conventions for binary and N-ary variants of an associative operation: One convention uses an operator or a short name for the binary operation and a long name for the N-ary variant, e.g. (+) and sum, max and maximum. The other convention suffixes the N-ary variant with Many. (TODO: Add Examples.)
- If possible, names are chosen such that either plain application or arg1 `operation` arg2 is correct English, e.g. isPrefixOf is good for use in backquotes.

Library design conventions

- Actions setting and modifying a kind of reference or state return (); getting the value is separate, e.g. writeIORef and modifyIORef both return (), only readIORef returns the value in an IORef.
- When a function or action takes some kind of state and returns a pair consisting of a result and a new state, the result is the first element of the pair and the new state is the second, see e.g. Random.
- When the type Either is used to encode an error condition and a normal result, Left is used for the former and Right for the latter, see e.g. Control.Monad.Error.
- A module corresponding to a class (e.g. Bits) contains the class definition, perhaps some auxiliary functions, and all sensible instances for Prelude types, but nothing more. Other modules containing types for which an instance for the class in question makes sense contain the code for the instance itself.
- Record-like C bit fields or structs have a record-like interface, i.e. pure getting and setting of fields. (TODO: Clarify a little bit. Add examples.)
- Although the possibility of partial application suggests the type attr -> object -> object for functions setting an attribute or value, infix notation with backquotes implies object -> attr -> object. (TODO: Add Examples.)

Coding style conventions

Changes to standard Haskell 98 libraries

Some changes have been made to the standard Haskell 98 libraries in the new library scheme, both in the names of the modules themselves and in their exported interfaces. Below is a summary of those changes - at this time, the new libraries are marked as provisional and are maintained by libraries@haskell.org, so changes in the interfaces are all up for discussion.

    modules with interface changes
    ------------------------------
    Array     -> Data.Array
        added instance Typeable (Array ix a)

    Char      -> Data.Char
        no interface changes (should have instance Typeable?)

    Complex   -> Data.Complex
        added instance Typeable (Complex a)

    IO        -> System.IO
        added
            hPutBuf           :: Handle -> Ptr a -> Int -> IO ()
            hGetBuf           :: Handle -> Ptr a -> Int -> IO Int
            fixIO             :: (a -> IO a) -> IO a
            hSetEcho          :: Handle -> Bool -> IO ()
            hGetEcho          :: Handle -> IO Bool
            hIsTerminalDevice :: Handle -> IO Bool

    List      -> Data.List
        exports [](..)

    System    -> System.Exit, System.Environment, System.Cmd
        split into three modules

    just renamed, no interface changes:
    -----------------------------------
    CPUTime   -> System.CPUTime
    Directory -> System.IO.Directory
    Ix        -> Data.Ix
    Locale    -> System.Locale
    Maybe     -> Data.Maybe
    Monad     -> Data.Monad
    Numeric   -> Numeric
    Random    -> System.Random
    Ratio     -> Data.Ratio
    Time      -> System.Time
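
To make the renaming concrete, here is a small sketch of a program written against the hierarchical names from the table above, with the Haskell 98 name each import replaces noted in a comment. The function firstLetters is a hypothetical example invented here; only the module names and the imported library functions come from the libraries themselves. It assumes a compiler that provides Data.Char, Data.List, Data.Maybe and System.IO under these names.

```haskell
import Data.Char (toUpper)        -- Haskell 98: import Char
import Data.List (nub, sort)      -- Haskell 98: import List
import Data.Maybe (mapMaybe)      -- Haskell 98: import Maybe
import qualified System.IO as IO  -- Haskell 98: import qualified IO

-- Collect the distinct upper-cased first letters of the non-empty words,
-- in sorted order (hypothetical example function).
firstLetters :: [String] -> String
firstLetters = sort . nub . mapMaybe initial
  where
    initial []      = Nothing
    initial (c : _) = Just (toUpper c)

main :: IO ()
main = IO.hPutStrLn IO.stdout (firstLetters ["data", "control", "", "debug"])
-- prints "CD"
```

The qualified import with an as clause keeps call sites short even though the module name now has two components, which is the usage pattern the language extension section recommends.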