X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fadd_to_compiler%2Fhowto-add.verb;fp=ghc%2Fdocs%2Fadd_to_compiler%2Fhowto-add.verb;h=c5dfcf65b2b2deccc27edc9dd425bea6343db6b5;hb=e7d21ee4f8ac907665a7e170c71d59e13a01da09;hp=0000000000000000000000000000000000000000;hpb=e48474bff05e6cfb506660420f025f694c870d38;p=ghc-hetmet.git
diff --git a/ghc/docs/add_to_compiler/howto-add.verb b/ghc/docs/add_to_compiler/howto-add.verb
new file mode 100644
index 0000000..c5dfcf6
--- /dev/null
+++ b/ghc/docs/add_to_compiler/howto-add.verb
@@ -0,0 +1,353 @@
+%************************************************************************
+%* *
+\section{How to add an optimisation pass}
+%* *
+%************************************************************************
+\subsection{Summary of the steps required}
+
+Well, with all the preliminaries out of the way, here is all that it
+takes to add your optimisation pass to the new glorious Glasgow
+Haskell compiler:
+\begin{enumerate}
+\item
+Select the input and output types for your pass; these will very
+likely be particular parameterisations of the Core or annotated Core
+data types. There is a small chance you will prefer to work at the
+STG-syntax level. (If these data types are inadequate to this
+purpose, {\em please} let us know!)
+
+\item
+Depending on exactly how you want your pass to work, set up some
+monad-ery, so as to avoid lots of horrible needless plumbing. The
+whole compiler is written in a monadic style, and there are plenty of
+examples from which you may steal. Section~\ref{sec:monadic-style}
+gives further details about how you might do this.
+
+\item
+Write your optimisation pass, and...
+
+{\em Do} use the existing types in the compiler, e.g., @UniType@,
+and the ``approved'' interfaces to them.
+
+{\em Don't} rewrite lots of utility code! Scattered around in various
+sometimes-obvious places, there is considerable code already written
+for the mangling and massaging of expressions, types, variables, etc.
+ +Section~\ref{sec:reuse-code} says more about how to re-use existing +compiler bits. + +\item +Follow our naming conventions \smiley{} Seriously, it may lead to greater +acceptance of our code and yours if readers find a system written with +at least a veneer of uniformity. +\ToDo{mention Style Guide, if it ever really exists.} + +\item +To hook your pass into the compiler, either add something directly to +the @Main@ module of the compiler,\srcloc{main/Main.lhs} or into the +Core-to-Core simplification driver,\srcloc{simplCore/SimplCore.lhs} or +into the STG-to-STG driver.\srcloc{simplStg/SimplStg.lhs} + +Also add something to the compilation-system +driver\srcloc{ghc/driver/ghc.lprl} +(called @ghc@ here) so that appropriate user-provided command-line +options will be transmogrified into the correct options fed to the +@Main@ module. + +\item +Add some appropriate documentation to the user's guide, +@ghc/docs/users_guide@. + +\item +Run your optimisation on {\em real programs}, measure, and improve. +(Separate from this compiler's distribution, we provide a ``NoFib'' +suite of ``real Haskell programs'' \cite{partain92a}. We strongly +encourage their use, so you can more readily compare your work with +others'.) + +\item +Send us your contribution so we can add it to the distribution! We +will be happy to include anything semi-reasonable. +This will practically ensure fame, if +not fortune, and---with a little luck---considerable notoriety. +\end{enumerate} + +%************************************************************************ +%* * +\subsection{Using monadic code}\label{sec:monadic-style} +%* * +%************************************************************************ + +{\em Monads} are one way of structuring functional programs. Phil +Wadler is their champion, and his recent papers on the subject are a +good place to start your investigations. 
``The essence of functional +programming'' even has a section about the use of monads in this +compiler \cite{wadler92a}! An earlier paper describes ``monad +comprehensions'' \cite{wadler90a}. For a smaller self-contained +example, see his ``literate typechecker'' \cite{wadler90b}. + +We use monads extensively in this compiler, mainly as a way to plumb +state around. The simplest example is a monad to plumb a +@UniqueSupply@\srcloc{basicTypes/Unique.lhs} (i.e., name supply) +through a function. + +\ToDo{Actually describe one monad thing completely.} + +We encourage you to use a monadic style, where appropriate, in +the code you add to the compiler. To this end, here is a list of +monads already in use in the compiler: +\begin{description} +\item[@UniqueSupply@ monad:] \srcloc{basicTypes/Unique.lhs} +To carry a name supply around; do a @getUnique@ when you +need one. Used in several parts of the compiler. + +\item[Typechecker monad:] \srcloc{typecheck/TcMonad.lhs} +Quite a complicated monad; carries around a substitution, some +source-location information, and a @UniqueSupply@; also plumbs +typechecker success/failure back up to the right place. + +\item[Desugarer monad:] \srcloc{deSugar/DsMonad.lhs} +Carries around a @UniqueSupply@ and source-location information (to +put in pattern-matching-failure error messages). + +\item[Code-generator monad:] \srcloc{codeGen/CgMonad.lhs} +Carries around an environment that maps variables to addressing modes +(e.g., ``in this block, @f@ is at @Node@ offset 3''); also, carries +around stack- and heap-usage information. Quite tricky plumbing, in +part so that the ``Abstract~C'' output will be produced lazily. + +\item[Monad for underlying I/O machinery:] \srcloc{ghc/lib/io/GlaIOMonad.lhs} +This is the basis of our I/O implementation. See the paper about it +\cite{peyton-jones92b}. 
+\end{description} + +%************************************************************************ +%* * +\subsection{Adding a new @PrimitiveOp@}\label{sec:add-primop} +%* * +%************************************************************************ + +You may find yourself wanting to add a new +@PrimitiveOp@\srcloc{prelude/PrimOps.lhs} to the compiler's +repertoire: these are the lowest-level operations that cannot be +expressed in Haskell---in our case, things written in C. + +What you would need to do to add a new op: +\begin{itemize} +\item +Add it to the @PrimitiveOp@ datatype in @prelude/PrimOps.lhs@; it's +just an enumeration type. +\item +Most importantly, add an entry in the @primOpInfo@ function for your +new primitive. +\item +If you want your primitive to be visible to some other part of the +compiler, export it via the @AbsPrel@\srcloc{prelude/AbsPrel.lhs} +interface (and import it from there). +\item +If you want your primitive to be visible to the user (modulo some +``show-me-nonstd-names'' compiler flag...), add your primitive to one +or more of the appropriate lists in @buildinNameFuns@, in +@prelude/AbsPrel.lhs@. +\item +If your primitive can be implemented with just a C macro, add it to +@ghc/imports/StgMacros.lh@. If it needs a C function, put that in +@ghc/runtime/prims/@, somewhere appropriate; you might need to put a +declaration of some kind in a C header file in @ghc/imports/@. +\item +If these steps are not enough, please get in touch. 
+\end{itemize}
+
+%************************************************************************
+%* *
+\section{How to add a new ``PrimOp'' (primitive operation)}
+%* *
+%************************************************************************
+
+%************************************************************************
+%* *
+\section{How to add a new ``user pragma''}
+%* *
+%************************************************************************
+
+%************************************************************************
+%* *
+\section{GHC utilities and re-usable code}\label{sec:reuse-code}
+%* *
+%************************************************************************
+
+%************************************************************************
+%* *
+\subsection{Reuse existing utilities}
+%* *
+%************************************************************************
+
+Besides the utility functions provided in Haskell's standard prelude,
+we have several modules of generally useful utilities in \mbox{\tt utils/}
+(no need to re-invent them!):
+\begin{description}
+\item[@Maybe@ and @MaybeErr@:]
+Two very widely used types (and some operations on them):
+\begin{verbatim}
+data Maybe a = Nothing | Just a
+data MaybeErr a b = Succeeded a | Failed b
+\end{verbatim}
+
+\item[@Set@:]
+A simple implementation of sets (an abstract type). The things you
+want to have @Sets@ of must be in class @Ord@.
+
+\item[@ListSetOps@:]
+A module providing operations on lists that have @Set@-sounding names;
+e.g., @unionLists@.
+
+\item[@Digraph@:]
+A few functions to do with directed graphs, notably finding
+strongly connected components (and cycles).
+
+\item[@Util@:]
+General grab-bag of utility functions not provided by the standard
+prelude.
+\end{description}
+
+Much of the compiler is structured around major datatypes, e.g.,
+@UniType@ or @Id@. These datatypes (often ``abstract''---you can't
+see their actual constructors) are packaged with many useful
+operations on them.
So, again, look around a little for these +functions before rolling your own. Section~\ref{sec:reuse-datatypes} +goes into this matter in more detail. + +%************************************************************************ +%* * +\subsection{Use pretty-printing and forcing machinery} +%* * +%************************************************************************ + +All of the non-trivial datatypes in the compiler are in class +@Outputable@, meaning you can pretty-print them (method: @ppr@) or +force them (method: @frc@). + +Pretty-printing is by far the more common operation. @ppr@ takes a +``style'' as its first argument; it can be one of @PprForUser@, +@PprDebug@, or @PprShowAll@, which---in turn---are intended to show +more and more detail. For example, @ppr PprForUser@ on a @UniType@ +should print a type that would be recognisable to a Haskell user; +@ppr PprDebug@ prints a type in the way an implementer would normally +want to see it (e.g., with all the ``for all...''s), and +@ppr PprShowAll@ prints everything you could ever want to know about that +type. + +@ppr@ produces a @Pretty@, which should eventually wend its way to +@main@. @main@ can then peruse the program's command-line options to +decide on a @PprStyle@ and column width in which to print. In +particular, it's bad form to @ppShow@ the @Pretty@ into a @String@ +deep in the bowels of the compiler, where the user cannot control the +printing. + +If you introduce non-trivial datatypes, please make them instances of +class @Outputable@. + +%************************************************************************ +%* * +\subsection{Use existing data types appropriately}\label{sec:reuse-datatypes} +%* * +%************************************************************************ + +The compiler uses many datatypes. Believe it or not, these have +carefully structured interfaces to the ``outside world''! 
Unfortunately, +the current Haskell module system does not let us enforce proper +access to these datatypes to the extent we would prefer. Here is a +list of datatypes (and their operations) you should feel free to use, +as well as how to access them. + +The first major group of datatypes are the ``syntax datatypes,'' the +various ways in which the program text is represented as it makes its +way through the compiler. These are notable in that you are allowed +to see/make-use-of all of their constructors: +\begin{description} +\item[Prefix form:]\srcloc{reader/PrefixSyn.lhs} You shouldn't need +this. + +\item[Abstract Haskell syntax:]\srcloc{abstractSyn/AbsSyn.lhs} Access +via the @AbsSyn@ interface. An example of what you should {\em not} +do is import the @AbsSynFuns@ (or @HsBinds@ or ...) interface +directly. @AbsSyn@ tells you what you're supposed to see. + +\item[Core syntax:]\srcloc{coreSyn/*Core.lhs} Core syntax is +parameterised, and you should access it {\em via one of the +parameterisations}. The most common is @PlainCore@; another is +@TaggedCore@. Don't use @CoreSyn@, though. + +\item[STG syntax:]\srcloc{stgSyn/StgSyn.lhs} Access via the @StgSyn@ interface. + +\item[Abstract~C syntax:]\srcloc{absCSyn/AbsCSyn.lhs} Access via the +@AbsCSyn@ interface. +\end{description} + +The second major group of datatypes are the ``basic entity'' +datatypes; these are notable in that you don't need to know their +representation to use them. Several have already been mentioned: +\begin{description} +\item[UniTypes:]\srcloc{uniType/AbsUniType.lhs} This is a gigantic +interface onto the world of @UniTypes@; accessible via the +@AbsUniType@ interface. 
You should import operations on all the {\em +pieces} of @UniTypes@ (@TyVars@, @TyVarTemplates@, @TyCons@, +@Classes@, and @ClassOps@) from here as well---everything for the +``type world.'' + +{\em Please don't grab type-related functions from internal modules, +behind @AbsUniType@'s back!} (Otherwise, we won't discover the +shortcomings of the interface...) + +\item[Identifiers:]\srcloc{basicTypes/Id.lhs} Interface: @Id@. + +\item[``Core'' literals:]\srcloc{basicTypes/CoreLit.lhs} These are +the unboxed literals used in Core syntax onwards. Interface: @CoreLit@. + +\item[Environments:]\srcloc{envs/GenericEnv.lhs} +A generic environment datatype, plus a generally useful set of +operations, is provided via the @GenericEnv@ interface. We encourage +you to use this, rather than roll your own; then your code will +benefit when we speed up the generic code. All of the typechecker's +environment stuff (of which there is plenty) is built on @GenericEnv@, +so there are plenty of examples to follow. + +\item[@Uniques@:]\srcloc{basicTypes/Unique.lhs} Essentially @Ints@. +When you need something unique for fast comparisons. Interface: +@Unique@. This interface also provides a simple @UniqueSupply@ monad; +often just the thing... + +\item[Wired-in standard prelude knowledge:]\srcloc{prelude/} The +compiler has to know a lot about the standard prelude. What it knows +is in the @compiler/prelude@ directory; all the rest of the compiler +gets its prelude knowledge through the @AbsPrel@ interface. + +The prelude stuff can get hairy. There is a separate document about +it. Check the @ghc/docs/README@ list for a pointer to it... +\end{description} + +The above list isn't exhaustive. 
By all means, ask if you think +``Surely a function like {\em this} is in here somewhere...'' + + +%************************************************************************ +%* * +\section{Cross-module pragmatic info: the mysteries revealed} +%* * +%************************************************************************ + +ToDo: mention wired-in info. + +%************************************************************************ +%* * +\section{GHC hacking tips and ``good practice''} +%* * +%************************************************************************ + +ASSERT + +%************************************************************************ +%* * +\section{Glasgow pragmatics: build trees, etc.} +%* * +%************************************************************************