X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=docs%2Fcomm%2Fthe-beast%2Fmangler.html;fp=docs%2Fcomm%2Fthe-beast%2Fmangler.html;h=1ad80f0d5c38e6aadaecbc4a415c5bd10c42badb;hp=0000000000000000000000000000000000000000;hb=0065d5ab628975892cea1ec7303f968c3338cbe1;hpb=28a464a75e14cece5db40f2765a29348273ff2d2

diff --git a/docs/comm/the-beast/mangler.html b/docs/comm/the-beast/mangler.html
new file mode 100644
index 0000000..1ad80f0
--- /dev/null
+++ b/docs/comm/the-beast/mangler.html
@@ -0,0 +1,79 @@
+
+
+
+
+ The GHC Commentary - The Evil Mangler
+
+
+
+

The GHC Commentary - The Evil Mangler

+

+ The Evil Mangler (EM) is a Perl script invoked by the Glorious Driver
+ after the C compiler (gcc) has translated the GHC-produced C code into
+ assembly. Consequently, it is only of interest if -fvia-C is in effect
+ (either explicitly or implicitly).
+
+

Its purpose

+

+ The EM reads the assembly produced by gcc, rearranges code blocks, and
+ nukes instructions that it considers non-essential. It derives its
+ evilness from its utterly ad hoc, machine-, compiler-, and
+ whatnot-dependent design and implementation. More precisely, the EM
+ performs the following tasks:
+

+ +

Implementation

+

+ The EM is located in the Perl script ghc-asm.lprl.
+ The script reads the .s file and chops it up into
+ chunks (that is what they are actually called in the script) that
+ roughly correspond to basic blocks. Each chunk is annotated with an
+ educated guess about what kind of code it contains (e.g., infotable,
+ fast entry point, slow entry point, etc.). The annotations also record
+ the symbol introducing the chunk of assembly and whether that chunk has
+ already been processed.
+
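The chunking step can be pictured with a small Python sketch. This is an illustration only, not a transcription of ghc-asm.lprl: the label pattern, the `Chunk` record, and the symbol suffixes used by `guess_kind` are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

# Assumed label syntax: an identifier at column 0 followed by a colon.
LABEL = re.compile(r"^([A-Za-z_.$][\w.$]*):")

@dataclass
class Chunk:
    symbol: str                  # label that introduces the chunk
    kind: str                    # educated guess about the chunk's contents
    lines: list = field(default_factory=list)
    processed: bool = False      # flipped once the chunk has been handled

def guess_kind(symbol):
    # The suffix conventions below are hypothetical, chosen for this sketch.
    if symbol.endswith("_info"):
        return "infotable"
    if symbol.endswith("_fast"):
        return "fast_entry"
    if symbol.endswith("_entry"):
        return "slow_entry"
    return "misc"

def split_into_chunks(asm_text):
    """Start a new chunk at every label; earlier text forms an unnamed chunk."""
    chunks = []
    current = Chunk(symbol="", kind="misc")
    for line in asm_text.splitlines():
        m = LABEL.match(line)
        if m:
            if current.lines:
                chunks.append(current)
            current = Chunk(symbol=m.group(1), kind=guess_kind(m.group(1)))
        current.lines.append(line)
    if current.lines:
        chunks.append(current)
    return chunks
```

Calling `split_into_chunks` on the text of a `.s` file yields one record per label, each carrying the three annotations described above: kind, symbol, and a processed flag.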

+ Both the parsing of the input into chunks and the recognition of
+ assembly instructions that are to be removed or altered are based on a
+ large number of Perl regular expressions sprinkled over the whole code.
+ These expressions are rather fragile as they rely heavily on the
+ structure of the generated code - in fact, they even rely on the right
+ amount of white space and thus on the formatting of the assembly.
+
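To see why whitespace matters, compare a strict and a tolerant pattern for the same instruction. Both patterns are invented for this Python illustration and are not taken from the script.

```python
import re

# Hypothetical strict pattern: exactly one tab before the mnemonic and
# exactly one space after the comma.
STRICT = re.compile(r"^\tmovl %esp, %ebp$")

# Hypothetical tolerant pattern: any run of whitespace is accepted.
LENIENT = re.compile(r"^\s*movl\s+%esp\s*,\s*%ebp\s*$")

def matches(pattern, line):
    """Return True if the whole line matches the given pattern."""
    return pattern.match(line) is not None
```

If a compiler release starts emitting `movl %esp,%ebp` without the space, the strict pattern silently stops matching while the tolerant one still does. That is the kind of breakage the paragraph above warns about.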

+ Afterwards, the chunks are reordered, some of them are purged, and some
+ are stripped of useless instructions. Moreover, some instructions are
+ manipulated (e.g., loads of fast entry points followed by indirect jumps
+ are replaced by direct jumps to the fast entry point).
+
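The jump rewrite mentioned in parentheses is a classic peephole transformation. The Python sketch below assumes x86 AT&T syntax and a load of the form `movl $symbol, %reg`; both the instruction shapes and the function name are assumptions for illustration, and the real patterns in the script may differ.

```python
import re

# Assumed instruction shapes (x86, AT&T syntax), not taken from the script.
LOAD = re.compile(r"^\s*movl\s+\$([\w.$]+),\s*(%\w+)\s*$")
IJMP = re.compile(r"^\s*jmp\s+\*(%\w+)\s*$")

def direct_jumps(lines):
    """Replace 'load symbol into reg; jmp *reg' with 'jmp symbol'.

    Dropping the load is only sound if the register is not needed at the
    jump target; this sketch simply assumes that it is not.
    """
    out = []
    i = 0
    while i < len(lines):
        load = LOAD.match(lines[i])
        jump = IJMP.match(lines[i + 1]) if i + 1 < len(lines) else None
        if load and jump and load.group(2) == jump.group(1):
            out.append("\tjmp " + load.group(1))
            i += 2          # consume both the load and the indirect jump
        else:
            out.append(lines[i])
            i += 1
    return out
```

The rewrite fires only when the register loaded is the register jumped through; any other pair of lines is passed through unchanged.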

+ The EM knows which part of the code belongs to function prologues and
+ epilogues as STG C adds tags of the
+ form --- BEGIN --- and --- END --- to the
+ assembly just before and after the code proper of a function.
+ It adds these tags using gcc's __asm__ feature.
+
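Given such tags, separating the code proper from prologue and epilogue comes down to slicing between two marker lines. The Python sketch below assumes that each tag appears somewhere on a line of its own in the assembly; the function name and the fallback behaviour for missing tags are choices made for this example.

```python
BEGIN = "--- BEGIN ---"
END = "--- END ---"

def code_proper(lines):
    """Return the lines strictly between the BEGIN and END tags.

    With no BEGIN tag the input is returned unchanged; with no END tag
    everything after BEGIN is kept.
    """
    begin = next((i for i, l in enumerate(lines) if BEGIN in l), None)
    if begin is None:
        return list(lines)
    end = next(
        (i for i, l in enumerate(lines) if i > begin and END in l),
        len(lines),
    )
    return lines[begin + 1:end]
```

Everything before the BEGIN tag is treated as prologue and everything from the END tag onwards as epilogue, so both can be discarded.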

+ Update: Gcc 2.96 and later perform more aggressive basic
+ block re-ordering and dead code elimination. This seems to make the
+ whole --- END --- tag business redundant. In fact, if
+ proper code is generated, no --- END --- tags survive gcc's
+ optimiser.
+
+

+
+Last modified: Sun Feb 17 17:55:47 EST 2002
+
+
+
+