1 \section[codegen-intro]{Intro/background info for the code generator}
3 \tr{NOTES.codeGen} LIVES!!!
6 =======================
7 NEW! 10 Nov 93 Semi-tagging
11 case x of -- NB just a variable scrutinised
13 (p:ps) -> ...p... -- eg. ps not used
18 while TRUE do { switch TAG(Node) {
20 INDIRECTION_TAG : Node = Node[1]; break; -- Dereference indirection
22 OTHER_TAG : adjust stack; push return address; ENTER(Node)
28 R2 := Node[2] -- Get ps
32 * The "return address" is a vector table, which contains pointers to
33 Nil_case and Cons_case.
35 * The "adjust stack" in the case of OTHER_TAG is one word different to
36 that in the case of a constructor tag (0,1,...), because it needs to
37 take account of the return address. That's why the stack adjust
38 shows up in the branches, rather than before the switch.
40 * In the case of *unvectored* returns, the "return address" will be
41 some code which switches on TagReg. Currently, the branches of the
42 case at the return address have the code for the alternatives
47 1 : code for cons case
50 But with semi-tagging, we'll have to label each branch:
57 So there's an extra jump. Boring. Boring. (But things are usually
58 eval'd...in which case we save a jump.)
60 * TAG is a macro which gets a "tag" from the info table. The tag
61 encodes whether the thing is (a) an indirection, (b) evaluated
62 constructor with tag N, or (c) something else. The "something else"
63 usually indicates something unevaluated, but it might also include
64 FETCH_MEs etc. Anything which must be entered.
66 * Maybe we should get the info ptr out of Node, into a temporary
67 InfoPtrReg, so that TAG and ENTER share the info-ptr fetch.
69 * We only load registers which are live in the alternatives. So at
70 the start of an alternative, either the unused fields *will* be in
71 regs (if we came via enter/return) or they *won't* (if we came via
72 the semi-tagging switch). If they aren't, GC had better not follow
73 them. So we can't arrange that all live ptrs are neatly lined up in
74 the first N regs any more. So GC has to take a liveness
75 bit-pattern, not just a "number of live regs" number.
77 * We need to know which of the constructors fields are live in the
78 alternatives. Hence STG code has to be elaborated to keep live vars
79 for each alternative, or to tag each bound-var in the alternatives
80 with whether or not it is used.
82 * The code generator needs to be able to construct unique labels for
83 the case alternatives. (Previously this was done by the AbsC
84 flattening pass.) Reason: we now have an explicit join point at the
85 start of each alternative.
87 * There's some question about how tags are mapped. Is 0 the first
88 tag? (Good when switching on TagReg when there are only two
89 constructors.) What is OTHER_TAG and INDIRECTION_TAG?
91 * This whole deal can be freely mixed with un-semi-tagged code.
92 There should be a compiler flag to control it.
94 =======================
95 Many of the details herein are moldy and dubious, but the general
96 principles are still mostly sound.
99 %************************************************************************
101 \subsection{LIST OF OPTIMISATIONS TO DO}
103 %************************************************************************
107 Register return conventions.
110 Optimisations for Enter when
113 know code ptr, so don't indirect via Node
117 top level closures don't load Node
123 Case of unboxed op with more than one alternative, should generate
124 a switch or an if statement.
131 Don't allocate constructors with no args.
132 Instead have a single global one.
135 Have global closures for all characters, and all small numbers.
143 When a closure is one of its own free variables, don't waste a field
144 on it. Instead just use Node.
148 %************************************************************************
150 \subsection{ENTERING THE GARBAGE COLLECTOR}
152 %************************************************************************
156 There are the following ways to get into the garbage collector:
159 _HEAP_OVERFLOW_ReturnViaNode
160 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
161 Used for the GC trap at closure entry.
163 - Node is only live ptr
164 - After GC, enter Node
166 _HEAP_OVERFLOW_ReturnDirect0, _HEAP_OVERFLOW_ReturnDirect1, ...
167 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
168 Used: for fast entry of functions, and
169 case alternative where values are returned in regs
171 - PtrReg1..n are live ptrs
172 - ReturnReg points to start of code (before hp oflo check)
173 - After GC, jump to ReturnReg
174 - TagReg is preserved, in case this is an unvectored return
177 _HEAP_OVERFLOW_CaseReturnViaNode
178 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
181 Used for case alternatives which return node in heap
183 - Node is only live ptr
184 - RetVecReg points to return vector
185 - After GC, push RetVecReg and enter Node
188 Exactly equivalent to @GC_ReturnViaNode@, preceded by pushing @ReturnVectorReg@.
190 The only reason we re-enter Node is so that in a GRIP-ish world, the
191 closure pointed to be Node is re-loaded into local store if necessary.
193 %************************************************************************
197 %************************************************************************
199 [New stuff 27 Nov 91]
201 \subsubsection{Return conventions}
202 %~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
204 When executing the update continuation code for a constructor,
205 @RetVecReg@ points to the {\em beginning of} the return vector. This is to
206 enable the update code to find the normal continuation code.
207 (@RetVecReg@ is set up by the code which jumps to the update continuation
210 \subsubsection{Stack arrangement}
211 %~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
213 Each stack has a ``stack update ptr'', SuA and SuB, which point to the
214 topmost word of the stack just after an update frame has been pushed.
216 A standard update frame (on the B stack) looks like this
217 (stack grows downward in this picture):
221 |---------------------------------------|
223 |---------------------------------------|
225 |---------------------------------------|
226 | Pointer to closure to be updated |
227 |---------------------------------------|
228 | Pointer to Update return vector |
229 |---------------------------------------|
232 The SuB therefore points to the Update return vector component of the
233 topmost update frame.
235 A {\em constructor} update frame, which is pushed only by closures
236 which know they will evaluate to a data object, looks just the
237 same, but without the saved SuA pointer.
239 \subsubsection{Pushing update frames}
240 %~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
242 An update is pushed right at the start of the code for an updatable
243 closure. But {\em after} the stack overflow check. (The B-stack oflo
244 check should thereby include allowance for the update frame itself.)
246 \subsubsection{Return vectors}
247 %~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
249 Every ``return address'' pushed on the stack by a boxed \tr{case} is a
250 pointer to a vector of one or more pairs of code pointers:
253 ------> -----------------
265 Each pair consists of a {\em continuation} code pointer and an
266 {\em update} code pointer.
268 For data types with only one constructor, or too many constructors for
269 vectoring, the return vector consists of a single pair.
271 When the \tr{data} decl for each data type is compiled, as well as
272 making info tables for each constructor, an update code sequence for
273 each constructor (or a single one, if unvectored) is also created.
275 ToDo: ** record naming convention for these code sequences somewhere **
277 When the update code is entered, it uses the value stored in the
278 return registers used by that constructor to update the thing pointed
279 to by the update frame (all of which except for the return address is
280 still on the B stack). If it can do an update in place (ie
281 constructor takes 3 words or fewer) it does so.
283 In the unvectored case, this code first has to do a switch on the tag,
284 UNLESS the return is in the heap, in which case simply overwrite with
285 an indirection to the thing Node points to.
287 Tricky point: if the update code can't update in place it has to
288 allocate a new object, by performing a heap-oflo check and jumping to
289 the appropriate heap-overflow entry point depending on which RetPtr
290 registers are live (just as when compiling a case alternative).
292 When the update code is entered, a register @ReturnReg@ is assumed to
293 contain the ``return address'' popped from the B stack. This is so
294 that the update code can enter the normal continuation code when it is
297 For standard update frames, the A and B stack update ptrs are restored
298 from the saved versions before returning, too.
300 \subsubsection{Update return vector}
301 %~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
303 Both standard and constructor update frames have as their topmost word
304 a pointer to a static, fixed, update return vector.
306 The ``continuation'' entry of each pair in this vector sets UpdReg to
307 point to the thing to be updated (gotten from the update frame), pops
308 the update frame, and returns to the ``update'' entry of the
309 corresponding pair in the next return vector (now exposed on top of B
312 The ``update'' entry of each pair in this vector overwrites the thing
313 to be updated with an indirection to the thing UpdReg points to, and
314 then returns in the same was as the "continuation" entry above.
316 There need to be enough pairs in the update return vector to cater for
317 any constructor at all.
320 *************************
322 Things which need to be altered if you change the number of constructors
323 which switches off vectored returns:
325 Extra cases in update return vector (file xxx)
326 The value xxxx in yyyy.lhs
329 **************************
331 %************************************************************************
333 \subsection{HEAP OBJECTS}
335 %************************************************************************
337 The heap consists of {\em closures}.
338 A closure can be either:
341 a {\em suspension}, which is an unevaluated thunk.
343 a {\em constructed object} (or just constructor); created by let(recs) and
346 a {\em partial application} (only updating creates these).
349 Closures are laid out with the {\em info pointer} at the lowest
350 address (but see notes on the Global Address field for parallel
351 system). [We don't try to localise knowledge of this! It is a royal
352 pain having to cope with closures laid out backwards.]
354 Ptr fields occur first (before non-ptr ones).
356 Non-normal-form closures are always at least 3 words in size (excl
357 global address), so they can be updated with a list cell (should they
360 Normal form (constructor) closures are always at least 2 words in size
361 (excl global address), so they have room enough for forwarding ptrs
362 during GC, and FETCHME boxes after flushing.
364 1-word closures for normal-form closures in static space. Explain
367 Ideally, the info pointer of a closure would point to...
375 But when C is the target code we can't guarantee the relative
376 positions of code and data. So the info ptr points to
379 info ptr ---->| ------------------------> code
385 That is, there's an extra indirection involved; and the info table
386 occurs AFTER the info pointer rather than before. The info table
387 entries are ``reversed'' too, so that bigger negative offsets in the
388 ``usual'' case turn into bigger positive offsets.
392 The simplest form of suspension is
394 info-ptr, ptr free vars, non-ptr free vars
397 where the info table for info-ptr gives
400 the total number of words of free vars
402 the number of words of ptr free vars (== number of ptr free vars)
403 in its extra-info part.
406 Optimised versions omit the size info from the info table, and instead
407 use specialised GC routines.
410 %************************************************************************
412 \subsection{NAMING CONVENTIONS for compiled code}
414 %************************************************************************
417 Given a top-level closure called f defined in module M,
420 _M_f_closure labels the closure itself
421 (only for top-level (ie static) closures)
423 _M_f_entry labels the slow entry point of the code
424 _M_f_fast labels the fast entry point of the code
426 _M_f_info labels the info pointer for the closure for f
427 (NB the info ptr of a closure isn't public
428 in the sense that these labels
429 are. It is private to a module, and
430 its name can be a secret.)
433 These names are the REAL names that the linker sees. The initial underscores
434 are attached by the C compiler.
436 A non-top-level closure has the same names, but as well as the \tr{f}
437 the labels have the unique number, so that different local closures
438 which share a name don't get confused. The reason we need a naming
439 convention at all is that with a little optimisation a tail call may
440 jump direct to the fast entry of a locally-defined closure.
442 \tr{f} may be a constructor, in the case of closures which are the curried
443 versions of the constructor.
445 For constructor closures, we have the following naming conventions, where
446 the constructor is C defined in module M:
449 _M_C_con_info is the info ptr for the constructor
450 _M_C_con_entry is the corresponding code entry point
453 %************************************************************************
455 \subsection{ENTRY CONVENTIONS}
457 %************************************************************************
460 \item[Constructor objects:]
461 On entry to the code for a constructor (\tr{_M_C_con_entry}), Node
462 points to the constructor object. [Even if the constructor has arity
465 \item[Non-top-level suspensions (both fast and slow entries):]
466 Node points to the closure.
468 \item[Top-level suspensions, slow entry:]
469 ReturnReg points to the slow entry point itself
471 \item[..ditto, fast entry:]
476 %************************************************************************
478 \subsection{CONSTRUCTOR RETURN CONVENTIONS}
480 %************************************************************************
482 There is lots of excitement concerning the way in which constructors
483 are returned to case expressions.
485 {\em Simplest version}
486 %=====================
488 The return address on the stack points directly to some code. It
493 PtrReg1 points to the constructed value (in the heap) (unless arity=0)
494 Tag contains its tag (unless # of constructors = 1)
496 Unboxed Ints: IntReg contains the int
497 Float: FloatReg contains the returned value
500 {\em Small improvement: vectoring}
501 %=================================
503 If there are fewer than (say) 8 constructors in the type, the return
504 address points to a vector of return addresses. The constructor does
505 a vectored return. No CSwitch.
507 Complication: updates. Update frames are built before the type of the
508 thing which will be returned is known. Hence their return address
509 UPDATE has to be able to handle anything (vectored and nonvectored).
511 Hence the vector table goes BACKWARD from ONE WORD BEFORE the word
512 pointed to by the return address.
514 {\em Big improvement: contents in registers}
515 %===========================================
517 Constructor with few enough components (eg 8ish) return their
518 arguments in registers. [If there is only one constructor in the
519 type, the tag register can be pressed into service for this purpose.]
521 Complication: updates. Update frames are built before the type of the
522 thing which will be returned is known. Hence their return address
523 UPDATE has to be able to handle anything.
525 So, a return address is a pointer to a PAIR of return addresses (or
526 maybe a pointer to some code immediately preceded by a pointer to some
529 The ``main'' return address is just as before.
531 The ``update'' return address expects just the same regs to be in use
532 as the ``main'' address, BUT AS WELL the magic loc UpdPtr points to a
533 closure to be updated. It carries out the update, and contines with
534 the main return address.
536 The ``main'' code for UPDATE just loads UpdPtr the thing to be
537 updated, and returns to the "update" entry of the next thing on the
540 The ``update'' entry for UPDATE just overwrites the thing to be
541 updated with an indirection to UpdPtr.
543 These two improvements can be combined orthogonally.
546 %************************************************************************
548 \subsection{REGISTERS}
550 %************************************************************************
552 Separate registers for
554 C stack (incl interrupt handling, if this is not done on
555 another stk) (if interrupts don't mangle the C stack,
556 we could save it for most of the time and reuse the
560 Basic value and control stack
561 These two grow towards each other, so they are each
573 %************************************************************************
575 \subsection{THE OFFSET SWAMP}
577 %************************************************************************
579 There are THREE kinds of offset:
581 \item[virtual offsets:]
583 start at 1 at base of frame, and increase towards top of stack.
585 don't change when you adjust sp/hp.
587 independent of stack direction.
589 only exist inside the code generator, pre Abstract C
591 for multi-word objects, the offset identifies the word of the
592 object with smallest offset
594 \item[reg-relative offsets:]
596 start at 0 for elt to which sp points, and increase ``into the
599 Specifically, towards
602 bottom of stack (for SpA, SpB)
604 beginning of heap (for Hp)
606 end of closure (for Node)
609 offset for a particular item changes when you adjust sp.
611 independent of stack direction.
613 exist in abstract C CVal and CAddr addressing modes
615 for multi-word objects, the offset identifies the word of the
616 object with smallest offset
620 either the negation or identity of sp-relative offset.
622 start at 0 for elt to which sp points, and either increase or
623 decrease towards bottom of stk, depending on stk direction
625 exist in real C, usually as a macro call passing an sp-rel offset
627 for multi-word objects, the offset identifies the word of the
628 object with lowest address
631 %************************************************************************
635 %************************************************************************
637 There are two stacks, as in the STG paper.
640 contains only closure pointers. Its stack ptr is SpA.
643 contains basic values, return addresses, update frames.
644 Its stack ptr is SpB.
647 SpA and SpB point to the topmost allocated word of stack (though they
648 may not be up to date in the middle of a basic block).
650 \subsubsection{STACK ALLOCATION}
652 A stack and B stack grow towards each other, so they overflow when
655 The A stack grows downward; the B stack grows upward. [We'll try to
656 localise stuff which uses this info.]
658 We can check for stack {\em overflow} not just at the start of a basic
659 block, but at the start of an entire expression evaluation. The
660 high-water marks of case-expression alternatives can be max'd.
662 Within the code for a closure, the ``stack frame'' is deemed to start
663 with the last argument taken by the closure (ie the one deepest in the
664 stack). Stack slots are can then be identified by ``virtual offsets''
665 from the base of the frame; the bottom-most word of the frame has
668 For multi-word slots (B stack only) the offset identifies the word
669 with the smallest virtual offset. [If B grows upward, this is the word
670 with the lowest physical address too.]
672 Since there are two stacks, a ``stack frame'' really consists of two
673 stack frames, one on each stack.
675 For each stack, we keep track of the following:
678 * virtSp virtual stack ptr offset of topmost occupied stack slot
679 (initialised to 0 if no args)
681 * realSp real stack ptr offset of real stack ptr reg
682 (initialised to 0 if no args)
684 * tailSp tail-call ptr offset of topmost slot to be retained
685 at next tail call, excluding the
686 argument to the tail call itself
688 * hwSp high-water mark largest value taken by virtSp
692 The real stack pointer is (for now) only adjusted at the tail call itself,
693 at which point it is made to point to the topmost occupied word of the stack.
695 We can't always adjust it at the beginning, because we don't
696 necessarily know which tail call will be made (a conditional might
697 intervene). So stuff is actually put on the stack ``above'' the stack
698 pointer. This is ok because interrupts are serviced on a different
701 The code generator works entirely in terms of stack {\em virtual
702 offsets}. The conversion to real addressing modes is done solely when
703 we look up a binding. When we move a stack pointer, the offsets of
704 variables currently bound to stack offsets in the environment will
705 change. We provide operations in the @cgBindings@ type to perform
706 this offset-change (to wit, @shiftStkOffsets@), leaving open whether
707 it is done pronto, or kept separate and applied to lookups.
709 Stack overflow checking takes place at the start of a closure body, using
710 the high-water mark information gotten from the closure body.
713 %************************************************************************
715 \subsection{HEAP ALLOCATION}
717 %************************************************************************
719 Heap ptr reg (Hp) points to the last word of allocated space (and not
720 to the first word of free space).
722 The heap limit register (HpLim) points to the last word of available
725 A basic block allocates a chunk of heap called a ``heap frame''.
726 The word of the frame nearest to the previously-allocated stuff
727 has virtual offset 1, and offsets increase from 1 to the size of the
730 Closures are allocated with their code pointers having the lowest virtual
733 NOTE: this means that closures are only laid out with code ptr at
734 lowest PHYSICAL address if the heap grows upwards.
736 Heap ptr reg is moved at the beginning of a basic block to account for
737 the allocation of the whole frame. At this time a heap exhaustion
738 check is made (has the heap ptr gone past the heap limit?). In the
739 basic block, indexed accesses off the heap ptr fill in this newly
740 allocated block. [Bias to RISC here: no cheap auto-inc mode, and free
743 We maintain the following information during code generation:
746 * virtHp virtual heap ptr offset of last word
747 of the frame allocated so far
748 Starts at 0 and increases.
749 * realHp virtual offset of
753 Since virtHp only ever increases, it doubles as the heap high water mark.
755 \subsubsection{BINDINGS}
757 The code generator maintains info for each name about where it is.
758 Each variable maps to:
763 - its volatile location:- a temporary variable
764 - a virtual heap offset n, meaning the
765 ADDRESS OF a word in the current
769 - its stable location: - a virtual stack offset n, meaning the
770 CONTENTS OF an object in the
775 \subsubsection{ENTERING AN OBJECT}
777 When a closure is entered at the normal entry point, the magic locs
779 Node points to the closure (unless it is a top-level closure)
780 ReturnReg points to the code being jumped to
782 At the fast entry point, Node is still set up, but ReturnReg may not be.
783 [Not sure about this.]