+++ /dev/null
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
-<html>
- <head>
- <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
- <title>The GHC Commentary - The Evil Mangler</title>
- </head>
-
- <body BGCOLOR="FFFFFF">
- <h1>The GHC Commentary - The Evil Mangler</h1>
- <p>
- The Evil Mangler (EM) is a Perl script invoked by the <a
- href="driver.html">Glorious Driver</a> after the C compiler (gcc) has
- translated the GHC-produced C code into assembly. Consequently, it is
- only of interest if <code>-fvia-C</code> is in effect (either explicitly
- or implicitly).
-
- <h4>Its purpose</h4>
- <p>
- The EM reads the assembly produced by gcc and re-arranges code blocks as
- well as nukes instructions that it considers <em>non-essential.</em> It
- derives it evilness from its utterly ad hoc, machine, compiler, and
- whatnot dependent design and implementation. More precisely, the EM
- performs the following tasks:
- <ul>
- <li>The code executed when a closure is entered is moved adjacent to
- that closure's infotable. Moreover, the order of the info table
- entries is reversed. Also, SRT pointers are removed from closures that
- don't need them (non-FUN, RET and THUNK ones).
- <li>Function prologue and epilogue code is removed. (GHC generated code
- manages its own stack and uses the system stack only for return
- addresses and during calls to C code.)
- <li>Certain code patterns are replaced by simpler code (eg, loads of
- fast entry points followed by indirect jumps are replaced by direct
- jumps to the fast entry point).
- </ul>
-
- <h4>Implementation</h4>
- <p>
- The EM is located in the Perl script <a
- href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/driver/mangler/ghc-asm.lprl"><code>ghc-asm.lprl</code></a>.
- The script reads the <code>.s</code> file and chops it up into
- <em>chunks</em> (that's how they are actually called in the script) that
- roughly correspond to basic blocks. Each chunk is annotated with an
- educated guess about what kind of code it contains (e.g., infotable,
- fast entry point, slow entry point, etc.). The annotations also contain
- the symbol introducing the chunk of assembly and whether that chunk has
- already been processed or not.
- <p>
- The parsing of the input into chunks as well as recognising assembly
- instructions that are to be removed or altered is based on a large
- number of Perl regular expressions sprinkled over the whole code. These
- expressions are rather fragile as they heavily rely on the structure of
- the generated code - in fact, they even rely on the right amount of
- white space and thus on the formatting of the assembly.
- <p>
- Afterwards, the chunks are reordered, some of them purged, and some
- stripped of some useless instructions. Moreover, some instructions are
- manipulated (eg, loads of fast entry points followed by indirect jumps
- are replaced by direct jumps to the fast entry point).
- <p>
- The EM knows which part of the code belongs to function prologues and
- epilogues as <a href="../rts-libs/stgc.html">STG C</a> adds tags of the
- form <code>--- BEGIN ---</code> and <code>--- END ---</code> the
- assembler just before and after the code proper of a function starts.
- It adds these tags using gcc's <code>__asm__</code> feature.
- <p>
- <strong>Update:</strong> Gcc 2.96 upwards performs more aggressive basic
- block re-ordering and dead code elimination. This seems to make the
- whole <code>--- END ---</code> tag business redundant -- in fact, if
- proper code is generated, no <code>--- END ---</code> tags survive gcc
- optimiser.
-
- <p><small>
-<!-- hhmts start -->
-Last modified: Sun Feb 17 17:55:47 EST 2002
-<!-- hhmts end -->
- </small>
- </body>
-</html>