[project @ 2000-04-27 16:35:29 by sewardj]
A total rewrite of the BCO assembler/linker, and rationalisation of
the code management and code generation phases of Hugs.
Problems with the old linker:
* Didn't have a clean way to insert a pointer to GHC code into a BCO.
This meant CAF GC didn't work properly in combined mode.
* Leaked memory. Each BCO, CAF and constructor generated by Hugs had
a corresponding malloc'd record used in its construction. These
records existed forever. Pointers from the Hugs symbol tables into
the runtime heap always went via these intermediates, for no apparent
reason.
* A global variable holding a list of top-level STG trees was used
during code generation. It was hard to associate trees in this
list with entries in the name/tycon tables. Just too many
mechanisms.
The New World Order is as follows:
* The global code list (stgGlobals) is gone.
* Each name in the name table has a .closure field. This points
to the top-level code for that name. Before bytecode generation
this points to an STG tree. During bytecode generation but before
bytecode linking it is an MPtr pointing to a malloc'd intermediate
structure (an AsmObject). After linking, it is a real live pointer
into the execution heap (CPtr) which is treated as a root during GC.
Because tuples do not have name table entries, tycons which are
tuples also have a .closure field, which is treated identically
to those of name table entries.
* Each module has a code list -- a list of names and tuples. If you
are a name or tuple and you have something (code, CAF or Con) which
needs to wind up in the execution heap, you MUST be on your module's
code list. Otherwise you won't get code generated.
* Lambda lifting generates new name table entries, which of course
also wind up on the code list.
* The initial phase of code generation for a module m traverses m's
code list. The STG trees referenced in the .closure fields are
code generated, creating AsmObjects (AsmBCO, AsmCAF, AsmCon) in
mallocville. The .closure fields then point to these AsmObjects.
Since AsmObjects can be mutually recursive, they can contain
references to:
    * Other AsmObjects            Asm_RefObject
    * Existing closures           Asm_RefNoOp
    * name/tycon table entries    Asm_RefHugs
AsmObjects can also contain BCO insns and non-ptr words.
* A second copy-and-link phase copies the AsmObjects into the
execution heap, resolves the Asm_Ref* items, and frees up
the malloc'd entities.
* Minor cleanups in compile-time storage. There are now 3 kinds of
address-y things available:
    CPtr/mkCPtr/cptrOf -- ptrs to Closures, probably in exec heap,
                          i.e. anything which the exec GC knows about
    MPtr/mkMPtr/mptrOf -- ptrs to mallocville, which the exec GC
                          knows nothing about
    Addr/mkAddr/addrOf -- literal addresses (like literal ints)
* Many hacky cases removed from codegen.c. Referencing code or
data during code generation is a lot simpler, since an entity
is either:
    a CPtr          -- use it as is
    an MPtr         -- stuff it into the AsmObject and the linker
                       will fix it
    a name or tycon -- ditto
* I've checked, using Purify, that Hugs, at least in standalone mode,
no longer leaks malloc'd memory. Prior to this it would leak at
the rate of about 300k per Prelude.
* Added this comment to the top of codegen.c.
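For illustration only, the two-phase scheme described above can be
sketched in C roughly as follows. This is a minimal sketch under
stated assumptions, not the code in codegen.c or Assembler.c: the
types HeapObj, AsmObj, AsmRef and ClosureField, the REF_* and CL_*
tags, and the functions new_asm_obj, resolve and link_code_list are
all hypothetical names, the "execution heap" is faked with calloc,
and a name/tycon reference is modelled as an index into the table.

```c
#include <assert.h>
#include <stdlib.h>

/* All names and layouts here are hypothetical; none are taken from Hugs. */

typedef struct HeapObj {              /* stand-in for an exec-heap closure */
    int             payload;
    struct HeapObj *ptrs[2];
} HeapObj;

typedef struct AsmObj AsmObj;

/* Loosely modelled on Asm_RefObject / Asm_RefNoOp / Asm_RefHugs. */
typedef enum { REF_NONE, REF_OBJECT, REF_NOOP, REF_HUGS } RefKind;

typedef struct {
    RefKind kind;
    union { AsmObj *obj; HeapObj *closure; size_t index; } u;
} AsmRef;

struct AsmObj {                       /* malloc'd intermediate from codegen */
    int      payload;
    AsmRef   refs[2];
    HeapObj *copied;                  /* where the copy pass put this object */
};

typedef enum { CL_MPTR, CL_CPTR } ClosureTag;

typedef struct {                      /* a table entry's .closure field */
    ClosureTag tag;
    union { AsmObj *mptr; HeapObj *cptr; } u;
} ClosureField;

AsmObj *new_asm_obj(int payload) {
    AsmObj *a = calloc(1, sizeof *a); /* zeroed: refs are REF_NONE */
    assert(a != NULL);
    a->payload = payload;
    return a;
}

/* Turn one symbolic reference into a real heap pointer. */
HeapObj *resolve(const AsmRef *r, const ClosureField *table) {
    switch (r->kind) {
    case REF_OBJECT: return r->u.obj->copied;   /* another AsmObj */
    case REF_NOOP:   return r->u.closure;       /* already a closure */
    case REF_HUGS: {                            /* via a table entry */
        const ClosureField *e = &table[r->u.index];
        return e->tag == CL_MPTR ? e->u.mptr->copied : e->u.cptr;
    }
    default:         return NULL;
    }
}

/* Copy-and-link: copy every AsmObj, resolve refs, free the intermediates. */
void link_code_list(ClosureField *table, size_t n) {
    size_t i;
    int j;
    for (i = 0; i < n; i++)           /* pass 1: allocate the copies */
        if (table[i].tag == CL_MPTR) {
            AsmObj *a = table[i].u.mptr;
            a->copied = calloc(1, sizeof *a->copied);
            assert(a->copied != NULL);
            a->copied->payload = a->payload;
        }
    for (i = 0; i < n; i++)           /* pass 2: resolve references */
        if (table[i].tag == CL_MPTR) {
            AsmObj *a = table[i].u.mptr;
            for (j = 0; j < 2; j++)
                a->copied->ptrs[j] = resolve(&a->refs[j], table);
        }
    for (i = 0; i < n; i++)           /* pass 3: MPtr becomes CPtr */
        if (table[i].tag == CL_MPTR) {
            AsmObj *a = table[i].u.mptr;
            table[i].tag = CL_CPTR;
            table[i].u.cptr = a->copied;
            free(a);
        }
}
```

Allocating every copy before resolving any reference is what lets
mutually recursive objects point at each other in this sketch. It
assumes every AsmObj reachable through a REF_OBJECT is itself on the
list being linked.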
Still to do:
* Reinstate peephole optimisation for BCOs.
* Nuke magic number headers in AsmObjects, used for debugging.
* Profile and accelerate. Code generation is slower because linking
is slower. Evaluation GC is slower because markHugsObjects has
slowed down.
* Make setCurrentModule ignore name table entries created by the
lambda-lifter.
* Zap various #if 0's in codegen.c/Assembler.c.
* Zap CRUDE_PROFILING.
19 files changed: