%************************************************************************ %* * \section{How to add an optimisation pass} %* * %************************************************************************ \subsection{Summary of the steps required} Well, with all the preliminaries out of the way, here is all that it takes to add your optimisation pass to the new glorious Glasgow Haskell compiler: \begin{enumerate} \item Select the input and output types for your pass; these will very likely be particular parameterisations of the Core or annotated Core data types. There is a small chance you will prefer to work at the STG-syntax level. (If these data types are inadequate to this purpose, {\em please} let us know!) \item Depending on exactly how you want your pass to work, set up some monad-ery, so as to avoid lots of horrible needless plumbing. The whole compiler is written in a monadic style, and there are plenty of examples from which you may steal. Section~\ref{sec:monadic-style} gives further details about how you might do this. \item Write your optimisation pass, and... {\em Do} use the existing types in the compiler, e.g., @UniType@, and the ``approved'' interfaces to them. {\em Don't} rewrite lots of utility code! Scattered around in various sometimes-obvious places, there is considerable code already written for the mangling and massaging of expressions, types, variables, etc. Section~\ref{sec:reuse-code} says more about how to re-use existing compiler bits. \item Follow our naming conventions \smiley{} Seriously, it may lead to greater acceptance of our code and yours if readers find a system written with at least a veneer of uniformity. \ToDo{mention Style Guide, if it ever really exists.} \item To hook your pass into the compiler, either add something directly to the @Main@ module of the compiler,\srcloc{main/Main.lhs} or into the Core-to-Core simplification driver,\srcloc{simplCore/SimplCore.lhs} or into the STG-to-STG driver.\srcloc{simplStg/SimplStg.lhs} Also add something to the compilation-system driver\srcloc{ghc/driver/ghc.lprl} (called @ghc@ here) so that appropriate user-provided command-line options will be transmogrified into the correct options fed to the @Main@ module. \item Add some appropriate documentation to the user's guide, @ghc/docs/users_guide@. \item Run your optimisation on {\em real programs}, measure, and improve. (Separate from this compiler's distribution, we provide a ``NoFib'' suite of ``real Haskell programs'' \cite{partain92a}. We strongly encourage their use, so you can more readily compare your work with others'.) \item Send us your contribution so we can add it to the distribution! We will be happy to include anything semi-reasonable. This will practically ensure fame, if not fortune, and---with a little luck---considerable notoriety. \end{enumerate} %************************************************************************ %* * \subsection{Using monadic code}\label{sec:monadic-style} %* * %************************************************************************ {\em Monads} are one way of structuring functional programs. Phil Wadler is their champion, and his recent papers on the subject are a good place to start your investigations. ``The essence of functional programming'' even has a section about the use of monads in this compiler \cite{wadler92a}! An earlier paper describes ``monad comprehensions'' \cite{wadler90a}. For a smaller self-contained example, see his ``literate typechecker'' \cite{wadler90b}. We use monads extensively in this compiler, mainly as a way to plumb state around. The simplest example is a monad to plumb a @UniqueSupply@\srcloc{basicTypes/Unique.lhs} (i.e., name supply) through a function. \ToDo{Actually describe one monad thing completely.} We encourage you to use a monadic style, where appropriate, in the code you add to the compiler. To this end, here is a list of monads already in use in the compiler: \begin{description} \item[@UniqueSupply@ monad:] \srcloc{basicTypes/Unique.lhs} To carry a name supply around; do a @getUnique@ when you need one. Used in several parts of the compiler. \item[Typechecker monad:] \srcloc{typecheck/TcMonad.lhs} Quite a complicated monad; carries around a substitution, some source-location information, and a @UniqueSupply@; also plumbs typechecker success/failure back up to the right place. \item[Desugarer monad:] \srcloc{deSugar/DsMonad.lhs} Carries around a @UniqueSupply@ and source-location information (to put in pattern-matching-failure error messages). \item[Code-generator monad:] \srcloc{codeGen/CgMonad.lhs} Carries around an environment that maps variables to addressing modes (e.g., ``in this block, @f@ is at @Node@ offset 3''); also, carries around stack- and heap-usage information. Quite tricky plumbing, in part so that the ``Abstract~C'' output will be produced lazily. \item[Monad for underlying I/O machinery:] \srcloc{ghc/lib/io/GlaIOMonad.lhs} This is the basis of our I/O implementation. See the paper about it \cite{peyton-jones92b}. \end{description} %************************************************************************ %* * \subsection{Adding a new @PrimitiveOp@}\label{sec:add-primop} %* * %************************************************************************ You may find yourself wanting to add a new @PrimitiveOp@\srcloc{prelude/PrimOps.lhs} to the compiler's repertoire: these are the lowest-level operations that cannot be expressed in Haskell---in our case, things written in C. What you would need to do to add a new op: \begin{itemize} \item Add it to the @PrimitiveOp@ datatype in @prelude/PrimOps.lhs@; it's just an enumeration type. \item Most importantly, add an entry in the @primOpInfo@ function for your new primitive. \item If you want your primitive to be visible to some other part of the compiler, export it via the @AbsPrel@\srcloc{prelude/AbsPrel.lhs} interface (and import it from there). \item If you want your primitive to be visible to the user (modulo some ``show-me-nonstd-names'' compiler flag...), add your primitive to one or more of the appropriate lists in @buildinNameFuns@, in @prelude/AbsPrel.lhs@. \item If your primitive can be implemented with just a C macro, add it to @ghc/imports/StgMacros.lh@. If it needs a C function, put that in @ghc/runtime/prims/@, somewhere appropriate; you might need to put a declaration of some kind in a C header file in @ghc/imports/@. \item If these steps are not enough, please get in touch. \end{itemize} %************************************************************************ %* * \section{How to add a new ``PrimOp'' (primitive operation)} %* * %************************************************************************ %************************************************************************ %* * \section{How to add a new ``user pragma''} %* * %************************************************************************ %************************************************************************ %* * \section{GHC utilities and re-usable code}\label{sec:reuse-code} %* * %************************************************************************ %************************************************************************ %* * \subsection{Reuse existing utilities} %* * %************************************************************************ Besides the utility functions provided in Haskell's standard prelude, we have several modules of generally-useful utilities in \mbox{\tt utils/} (no need to re-invent them!): \begin{description} \item[@Maybe@ and @MaybeErr@:] Two very widely used types (and some operations on them): \begin{verbatim} data Maybe a = Nothing | Just a data MaybeErr a b = Succeeded a | Failed b \end{verbatim} \item[@Set@:] A simple implementation of sets (an abstract type). The things you want to have @Sets@ of must be in class @Ord@. \item[@ListSetOps@:] A module providing operations on lists that have @Set@-sounding names; e.g., @unionLists@. \item[@Digraph@:] A few functions to do with directed graphs, notably finding strongly-connected components (and cycles). \item[@Util@:] General grab-bag of utility functions not provided by the standard prelude. \end{description} Much of the compiler is structured around major datatypes, e.g., @UniType@ or @Id@. These datatypes (often ``abstract''---you can't see their actual constructors) are packaged with many useful operations on them. So, again, look around a little for these functions before rolling your own. Section~\ref{sec:reuse-datatypes} goes into this matter in more detail. %************************************************************************ %* * \subsection{Use pretty-printing and forcing machinery} %* * %************************************************************************ All of the non-trivial datatypes in the compiler are in class @Outputable@, meaning you can pretty-print them (method: @ppr@) or force them (method: @frc@). Pretty-printing is by far the more common operation. @ppr@ takes a ``style'' as its first argument; it can be one of @PprForUser@, @PprDebug@, or @PprShowAll@, which---in turn---are intended to show more and more detail. For example, @ppr PprForUser@ on a @UniType@ should print a type that would be recognisable to a Haskell user; @ppr PprDebug@ prints a type in the way an implementer would normally want to see it (e.g., with all the ``for all...''s), and @ppr PprShowAll@ prints everything you could ever want to know about that type. @ppr@ produces a @Pretty@, which should eventually wend its way to @main@. @main@ can then peruse the program's command-line options to decide on a @PprStyle@ and column width in which to print. In particular, it's bad form to @ppShow@ the @Pretty@ into a @String@ deep in the bowels of the compiler, where the user cannot control the printing. If you introduce non-trivial datatypes, please make them instances of class @Outputable@. %************************************************************************ %* * \subsection{Use existing data types appropriately}\label{sec:reuse-datatypes} %* * %************************************************************************ The compiler uses many datatypes. Believe it or not, these have carefully structured interfaces to the ``outside world''! Unfortunately, the current Haskell module system does not let us enforce proper access to these datatypes to the extent we would prefer. Here is a list of datatypes (and their operations) you should feel free to use, as well as how to access them. The first major group of datatypes are the ``syntax datatypes,'' the various ways in which the program text is represented as it makes its way through the compiler. These are notable in that you are allowed to see/make-use-of all of their constructors: \begin{description} \item[Prefix form:]\srcloc{reader/PrefixSyn.lhs} You shouldn't need this. \item[Abstract Haskell syntax:]\srcloc{abstractSyn/AbsSyn.lhs} Access via the @AbsSyn@ interface. An example of what you should {\em not} do is import the @AbsSynFuns@ (or @HsBinds@ or ...) interface directly. @AbsSyn@ tells you what you're supposed to see. \item[Core syntax:]\srcloc{coreSyn/*Core.lhs} Core syntax is parameterised, and you should access it {\em via one of the parameterisations}. The most common is @PlainCore@; another is @TaggedCore@. Don't use @CoreSyn@, though. \item[STG syntax:]\srcloc{stgSyn/StgSyn.lhs} Access via the @StgSyn@ interface. \item[Abstract~C syntax:]\srcloc{absCSyn/AbsCSyn.lhs} Access via the @AbsCSyn@ interface. \end{description} The second major group of datatypes are the ``basic entity'' datatypes; these are notable in that you don't need to know their representation to use them. Several have already been mentioned: \begin{description} \item[UniTypes:]\srcloc{uniType/AbsUniType.lhs} This is a gigantic interface onto the world of @UniTypes@; accessible via the @AbsUniType@ interface. You should import operations on all the {\em pieces} of @UniTypes@ (@TyVars@, @TyVarTemplates@, @TyCons@, @Classes@, and @ClassOps@) from here as well---everything for the ``type world.'' {\em Please don't grab type-related functions from internal modules, behind @AbsUniType@'s back!} (Otherwise, we won't discover the shortcomings of the interface...) \item[Identifiers:]\srcloc{basicTypes/Id.lhs} Interface: @Id@. \item[``Core'' literals:]\srcloc{basicTypes/CoreLit.lhs} These are the unboxed literals used in Core syntax onwards. Interface: @CoreLit@. \item[Environments:]\srcloc{envs/GenericEnv.lhs} A generic environment datatype, plus a generally useful set of operations, is provided via the @GenericEnv@ interface. We encourage you to use this, rather than roll your own; then your code will benefit when we speed up the generic code. All of the typechecker's environment stuff (of which there is plenty) is built on @GenericEnv@, so there are plenty of examples to follow. \item[@Uniques@:]\srcloc{basicTypes/Unique.lhs} Essentially @Ints@. When you need something unique for fast comparisons. Interface: @Unique@. This interface also provides a simple @UniqueSupply@ monad; often just the thing... \item[Wired-in standard prelude knowledge:]\srcloc{prelude/} The compiler has to know a lot about the standard prelude. What it knows is in the @compiler/prelude@ directory; all the rest of the compiler gets its prelude knowledge through the @AbsPrel@ interface. The prelude stuff can get hairy. There is a separate document about it. Check the @ghc/docs/README@ list for a pointer to it... \end{description} The above list isn't exhaustive. By all means, ask if you think ``Surely a function like {\em this} is in here somewhere...'' %************************************************************************ %* * \section{Cross-module pragmatic info: the mysteries revealed} %* * %************************************************************************ ToDo: mention wired-in info. %************************************************************************ %* * \section{GHC hacking tips and ``good practice''} %* * %************************************************************************ ASSERT %************************************************************************ %* * \section{Glasgow pragmatics: build trees, etc.} %* * %************************************************************************