\section[COptJumps]{Macros for tail-jumping}

% this file is part of the C-as-assembler document

%************************************************************************
\subsection[COptJumps-portable]{Tail-(non-)jumping in ``portable~C''}
%************************************************************************
\begin{code}
#if ! (defined(__STG_TAILJUMPS__) && defined(__GNUC__))

#define JMP_(target)	return((F_) (target))
#define RESUME_(target)	JMP_(target)
\end{code}

We don't need to do anything magical for the mini-interpreter, because
we're really going to use the plain old C one (and the debugging
variant, too, for that matter).
%************************************************************************
\subsection[COptJumps-optimised]{Tail-jumping in ``optimised~C''}
%************************************************************************
\begin{code}
#else /* __STG_TAILJUMPS__ && __GNUC__ */
\end{code}

GCC will have assumed that pushing/popping of C-stack frames is going
on when it generated its code, and used stack space accordingly.
However, we actually {\em post-process away} all such stack-framery
(see \tr{ghc/driver/ghc-asm.lprl}).  Things will be OK, however, if we
initially make sure there are @RESERVED_C_STACK_BYTES@ on the C-stack
to begin with, for local variables.

\begin{code}
#define RESERVED_C_STACK_BYTES (512 * sizeof(I_)) /* MUST BE OF GENEROUS ALIGNMENT */
\end{code}
The platform-specific details are given in alphabetical order.
%************************************************************************
\subsubsection[COptJumps-alpha]{Tail-jumping on Alphas}
%************************************************************************
We have to set the procedure value register (\$27) before branching, so
that the target function can load the gp (\$29) as appropriate.

It seems that \tr{_procedure} can't be declared within the body of the
\tr{JMP_} macro...at least, not if we want it to be \$27, which we do!
\begin{code}
#if alpha_dec_osf1_TARGET
    /* ToDo: less specific? */

/* Jumping to a new block of code, we need to set up $27 to point
   at the target, so that the callee can establish its gp (as an
   offset from its own starting address).  For some reason, gcc
   refuses to give us $27 for _procedure if it's declared as a
   local variable, so the workaround is to make it a global.

   Note: the local variable works in gcc 2.6.2, but fails in 2.5.8.
*/

/* MOVED: to COptRegs.lh -- very unsatisfactorily.
   Otherwise, we can get a "global register variable follows a
   function definition" error.

   Once we can take gcc 2.6.x as std, then we can use
   the local variant, and the problem goes away. (WDP 95/02)

register void *_procedure __asm__("$27");
*/

#define JMP_(cont)				\
    do { _procedure = (void *)(cont);		\
	 goto *_procedure;			\
       } while(0)
\end{code}
When we resume at the point where a call was originally made, we need
to restore \$26, so that the gp can be reloaded appropriately.
However, sometimes we ``resume'' by entering a new function (typically
@EnterNodeCode@), so we need to set up \$27 as well.

\begin{code}
#define RESUME_(cont)				\
    do { _procedure = (void *)(cont);		\
	 __asm__ volatile("mov $27,$26");	\
	 goto *_procedure;			\
       } while(0)
\end{code}
\begin{code}
#define MINI_INTERPRETER_SETUP \
    __asm__ volatile ("stq $9,-8($30)\n" \
		      "stq $10,-16($30)\n" \
		      "stq $11,-24($30)\n" \
		      "stq $12,-32($30)\n" \
		      "stq $13,-40($30)\n" \
		      "stq $14,-48($30)\n" \
		      "stq $15,-56($30)\n" \
		      "stt $f2,-64($30)\n" \
		      "stt $f3,-72($30)\n" \
		      "stt $f4,-80($30)\n" \
		      "stt $f5,-88($30)\n" \
		      "stt $f6,-96($30)\n" \
		      "stt $f7,-104($30)\n" \
		      "stt $f8,-112($30)\n" \
		      "stt $f9,-120($30)\n" \
		      "lda $30,-%0($30)" : : \
		      "K" (RESERVED_C_STACK_BYTES+8*sizeof(double)+8*sizeof(long)));
\end{code}
\begin{code}
#define MINI_INTERPRETER_END \
    __asm__ volatile (".align 3\n" \
		      ".globl miniInterpretEnd\n" \
		      "miniInterpretEnd:\n" \
		      "lda $30,%0($30)\n" \
		      "ldq $9,-8($30)\n" \
		      "ldq $10,-16($30)\n" \
		      "ldq $11,-24($30)\n" \
		      "ldq $12,-32($30)\n" \
		      "ldq $13,-40($30)\n" \
		      "ldq $14,-48($30)\n" \
		      "ldq $15,-56($30)\n" \
		      "ldt $f2,-64($30)\n" \
		      "ldt $f3,-72($30)\n" \
		      "ldt $f4,-80($30)\n" \
		      "ldt $f5,-88($30)\n" \
		      "ldt $f6,-96($30)\n" \
		      "ldt $f7,-104($30)\n" \
		      "ldt $f8,-112($30)\n" \
		      "ldt $f9,-120($30)" : : \
		      "K" (RESERVED_C_STACK_BYTES+8*sizeof(double)+8*sizeof(long)));

#endif /* alpha_dec_osf1_TARGET */
\end{code}
%************************************************************************
\subsubsection[COptJumps-Hpux]{Tail-jumping on a HP-PA machine running HP-UX}
%************************************************************************
\begin{code}
#if hppa1_1_hp_hpux_TARGET

/* do FUNBEGIN/END the easy way */
#define FUNBEGIN    __asm__ volatile ("--- BEGIN ---");
#define FUNEND      __asm__ volatile ("--- END ---");

/* The stack grows up!  Local variables are allocated just above the
   frame pointer, and extra arguments are stashed just below the stack
   pointer, so the safe space is again in the middle (cf. sparc).
*/
\end{code}
Sven Panne <Sven.Panne@informatik.uni-muenchen.de> writes:

But now for the really bad news: some nasty guy in the threaded world
modifies R3 (the frame pointer)!!  This should not happen (as far as I
know, R3 should be a callee-saves register).  Sadly, I can't reproduce
this behaviour consistently.  Perhaps it is some strange quirk of our
boxes here?  (uname -svrm gives HP-UX A.09.05 A 9000/715)

So here is my next try: don't calculate the register buffer by
_adding_ to FP[r3], but by _subtracting_ from SP!  The patch below
should result in the same addresses (+/- some bytes :-).  By the way,
is the SP[r30] after returning from the threaded world the same as the
one before entering it?  I really hope so, otherwise %#*&!!
\begin{code}
#define JMP_(cont)				\
    do { void *_procedure = (void *)(cont);	\
	 goto *_procedure;			\
       } while(0)

#define RESUME_(target)	JMP_(target)
\end{code}
\begin{code}
#define MINI_INTERPRETER_SETUP \
    StgChar space[RESERVED_C_STACK_BYTES+16*sizeof(long)+10*sizeof(double)]; \
    /* __asm__ volatile ("ldo %0(%%r3),%%r19\n" */ \
    __asm__ volatile ("ldo %0(%%r30),%%r19\n" \
		      "\tstw %%r3, 0(0,%%r19)\n" \
		      "\tstw %%r4, 4(0,%%r19)\n" \
		      "\tstw %%r5, 8(0,%%r19)\n" \
		      "\tstw %%r6,12(0,%%r19)\n" \
		      "\tstw %%r7,16(0,%%r19)\n" \
		      "\tstw %%r8,20(0,%%r19)\n" \
		      "\tstw %%r9,24(0,%%r19)\n" \
		      "\tstw %%r10,28(0,%%r19)\n" \
		      "\tstw %%r11,32(0,%%r19)\n" \
		      "\tstw %%r12,36(0,%%r19)\n" \
		      "\tstw %%r13,40(0,%%r19)\n" \
		      "\tstw %%r14,44(0,%%r19)\n" \
		      "\tstw %%r15,48(0,%%r19)\n" \
		      "\tstw %%r16,52(0,%%r19)\n" \
		      "\tstw %%r17,56(0,%%r19)\n" \
		      "\tstw %%r18,60(0,%%r19)\n" \
		      "\tldo 80(%%r19),%%r19\n" \
		      "\tfstds %%fr12,-16(0,%%r19)\n" \
		      "\tfstds %%fr13, -8(0,%%r19)\n" \
		      "\tfstds %%fr14, 0(0,%%r19)\n" \
		      "\tfstds %%fr15, 8(0,%%r19)\n" \
		      "\tldo 32(%%r19),%%r19\n" \
		      "\tfstds %%fr16,-16(0,%%r19)\n" \
		      "\tfstds %%fr17, -8(0,%%r19)\n" \
		      "\tfstds %%fr18, 0(0,%%r19)\n" \
		      "\tfstds %%fr19, 8(0,%%r19)\n" \
		      "\tldo 32(%%r19),%%r19\n" \
		      "\tfstds %%fr20,-16(0,%%r19)\n" \
		      "\tfstds %%fr21, -8(0,%%r19)\n" : : \
    /* "n" (RESERVED_C_STACK_BYTES - (116 * sizeof(long) + 10 * sizeof(double))) : "%r19" ); */ \
    "n" (-(116 * sizeof(long) + 10 * sizeof(double))) : "%r19" );
\end{code}
\begin{code}
#define MINI_INTERPRETER_END \
    __asm__ volatile (".align 4\n" \
		      "\t.EXPORT miniInterpretEnd,CODE\n" \
		      "\t.EXPORT miniInterpretEnd,ENTRY,PRIV_LEV=3\n" \
		      "miniInterpretEnd\n" \
    /* "\tldo %0(%%r3),%%r19\n" */ \
		      "\tldo %0(%%r30),%%r19\n" \
		      "\tldw 0(0,%%r19),%%r3\n" \
		      "\tldw 4(0,%%r19),%%r4\n" \
		      "\tldw 8(0,%%r19),%%r5\n" \
		      "\tldw 12(0,%%r19),%%r6\n" \
		      "\tldw 16(0,%%r19),%%r7\n" \
		      "\tldw 20(0,%%r19),%%r8\n" \
		      "\tldw 24(0,%%r19),%%r9\n" \
		      "\tldw 28(0,%%r19),%%r10\n" \
		      "\tldw 32(0,%%r19),%%r11\n" \
		      "\tldw 36(0,%%r19),%%r12\n" \
		      "\tldw 40(0,%%r19),%%r13\n" \
		      "\tldw 44(0,%%r19),%%r14\n" \
		      "\tldw 48(0,%%r19),%%r15\n" \
		      "\tldw 52(0,%%r19),%%r16\n" \
		      "\tldw 56(0,%%r19),%%r17\n" \
		      "\tldw 60(0,%%r19),%%r18\n" \
		      "\tldo 80(%%r19),%%r19\n" \
		      "\tfldds -16(0,%%r19),%%fr12\n" \
		      "\tfldds -8(0,%%r19),%%fr13\n" \
		      "\tfldds 0(0,%%r19),%%fr14\n" \
		      "\tfldds 8(0,%%r19),%%fr15\n" \
		      "\tldo 32(%%r19),%%r19\n" \
		      "\tfldds -16(0,%%r19),%%fr16\n" \
		      "\tfldds -8(0,%%r19),%%fr17\n" \
		      "\tfldds 0(0,%%r19),%%fr18\n" \
		      "\tfldds 8(0,%%r19),%%fr19\n" \
		      "\tldo 32(%%r19),%%r19\n" \
		      "\tfldds -16(0,%%r19),%%fr20\n" \
		      "\tfldds -8(0,%%r19),%%fr21\n" : : \
    /* "n" (RESERVED_C_STACK_BYTES - (116 * sizeof(long) + 10 * sizeof(double))) : "%r19"); */ \
    "n" (-(116 * sizeof(long) + 10 * sizeof(double))) : "%r19");

#endif /* hppa1.1-hp-hpux* */
\end{code}
%************************************************************************
\subsubsection[COptJumps-iX86]{Tail-jumping on a 386/486}
%************************************************************************
\begin{code}
#if i386_TARGET_ARCH

/* *not* a good way to do this (WDP 96/05) */
#if defined(solaris2_TARGET_OS) || defined(linux_TARGET_OS)
#define MINI_INTERPRET_END "miniInterpretEnd"
#else
#define MINI_INTERPRET_END "_miniInterpretEnd"
#endif

/* do FUNBEGIN/END the easy way */
#define FUNBEGIN    __asm__ volatile ("--- BEGIN ---");
#define FUNEND      __asm__ volatile ("--- END ---");

/* try "m68k-style" for now */
extern void __DISCARD__(STG_NO_ARGS);

#define JMP_(cont)			\
    do { void *target;			\
	 __DISCARD__();			\
	 target = (void *)(cont);	\
	 goto *target;			\
       } while(0)

#define RESUME_(target)	JMP_(target)
\end{code}
/* The safe part of the stack frame is near the top */

\begin{code}
#define MINI_INTERPRETER_SETUP \
    StgChar space[RESERVED_C_STACK_BYTES+4*sizeof(long)]; \
    __asm__ volatile ("leal %c0(%%esp),%%eax\n" \
		      "\tmovl %%ebx,0(%%eax)\n" \
		      "\tmovl %%esi,4(%%eax)\n" \
		      "\tmovl %%edi,8(%%eax)\n" \
		      "\tmovl %%ebp,12(%%eax)\n" \
		      : : "n" (RESERVED_C_STACK_BYTES) \
		      : "%eax");
\end{code}
/* the initial "addl $4,%esp" in ..._END compensates for
   the "call" (rather than a jump) in miniInterpret.
*/

\begin{code}
#define MINI_INTERPRETER_END \
    __asm__ volatile (".align 4\n" \
		      ".globl " MINI_INTERPRET_END "\n" \
		      MINI_INTERPRET_END ":"); \
    __asm__ volatile ("addl $4,%%esp\n" \
		      "\tleal %c0(%%esp),%%eax\n" \
		      "\tmovl 0(%%eax),%%ebx\n" \
		      "\tmovl 4(%%eax),%%esi\n" \
		      "\tmovl 8(%%eax),%%edi\n" \
		      "\tmovl 12(%%eax),%%ebp" \
		      : : "n" (RESERVED_C_STACK_BYTES) : "%eax");
\end{code}
\begin{code}
#endif /* __i[34]86__ */
\end{code}
%************************************************************************
\subsubsection[COptJumps-m68k]{Tail-jumping on m68k boxes}
%************************************************************************
For 680x0s, we use a quite-magic @JMP_@ macro, which includes
beginning- and end-of-function markers.

\begin{code}
#if m68k_TARGET_ARCH

#define FUNBEGIN    __asm__ volatile ("--- BEGIN ---");
#define FUNEND      __asm__ volatile ("--- END ---");
\end{code}
The call to \tr{__DISCARD__} in @JMP_@ is fodder for GCC, to force it
to pop arguments to previous function calls before the end of the
current function.  This is unnecessary if we can manage to compile
with \tr{-fomit-frame-pointer} as well as \tr{-fno-defer-pop}.  (WDP
95/02: either false or dodgy.)  At the moment, the asm mangler removes
these calls to \tr{__DISCARD__}.

\begin{code}
extern void __DISCARD__(STG_NO_ARGS);

#define JMP_(cont)			\
    do { void *target;			\
	 __DISCARD__();			\
	 target = (void *)(cont);	\
	 goto *target;			\
       } while(0)

#define RESUME_(target)	JMP_(target)
\end{code}
\begin{code}
#define MINI_INTERPRETER_SETUP \
    StgChar space[RESERVED_C_STACK_BYTES+11*sizeof(long)]; \
    __asm__ volatile ("moveml a2-a6/d2-d7,sp@(%c0)\n" \
		      "\tlea sp@(%c0),a6" : : "J" (RESERVED_C_STACK_BYTES));

#define MINI_INTERPRETER_END \
    __asm__ volatile (".even\n" \
		      ".globl _miniInterpretEnd\n" \
		      "_miniInterpretEnd:\n" \
		      "\tmoveml sp@(%c0),a2-a6/d2-d7" : : "J" (RESERVED_C_STACK_BYTES));

#endif /* __m68k__ */
\end{code}
%************************************************************************
\subsubsection[COptJumps-mips]{Tail-jumping on a MIPS box}
%************************************************************************
\begin{code}
#if mipseb_TARGET_ARCH || mipsel_TARGET_ARCH

/* do FUNBEGIN/END the easy way */
#define FUNBEGIN    __asm__ volatile ("--- BEGIN ---");
#define FUNEND      __asm__ volatile ("--- END ---");

/* try "m68k-style" for now */
extern void __DISCARD__(STG_NO_ARGS);

/* this is "alpha-style" */
#define JMP_(cont)			\
    do { __DISCARD__();			\
	 _procedure = (void *)(cont);	\
	 goto *_procedure;		\
       } while(0)

#define RESUME_(target)	JMP_(target)
\end{code}
/* _All_ callee-saved regs, whether we steal them or not, must be saved
   (and restored on the way out). */
\begin{code}
#define MINI_INTERPRETER_SETUP \
    StgChar space[RESERVED_C_STACK_BYTES+6*sizeof(double)+9*sizeof(long)]; \
    __asm__ volatile ("addu $2,$sp,%0\n" \
		      "\ts.d $f20,0($2)\n" \
		      "\ts.d $f22,8($2)\n" \
		      "\ts.d $f24,16($2)\n" \
		      "\ts.d $f26,24($2)\n" \
		      "\ts.d $f28,32($2)\n" \
		      "\ts.d $f30,40($2)\n" \
		      "\tsw $16,48($2)\n" \
		      "\tsw $17,52($2)\n" \
		      "\tsw $18,56($2)\n" \
		      "\tsw $19,60($2)\n" \
		      "\tsw $20,64($2)\n" \
		      "\tsw $21,68($2)\n" \
		      "\tsw $22,72($2)\n" \
		      "\tsw $23,76($2)\n" \
		      "\tsw $fp,80($2)\n" \
		      : : "I" (RESERVED_C_STACK_BYTES+16) : "$2" );
\end{code}
/* the 16 bytes is for the argument-register save-area above $sp */
\begin{code}
#define MINI_INTERPRETER_END \
    __asm__ volatile (".align 2\n" \
		      ".globl miniInterpretEnd\n" \
		      "miniInterpretEnd:\n" \
		      "\taddu $2,$sp,%0\n" \
		      "\tl.d $f20,0($2)\n" \
		      "\tl.d $f22,8($2)\n" \
		      "\tl.d $f24,16($2)\n" \
		      "\tl.d $f26,24($2)\n" \
		      "\tl.d $f28,32($2)\n" \
		      "\tl.d $f30,40($2)\n" \
		      "\tlw $16,48($2)\n" \
		      "\tlw $17,52($2)\n" \
		      "\tlw $18,56($2)\n" \
		      "\tlw $19,60($2)\n" \
		      "\tlw $20,64($2)\n" \
		      "\tlw $21,68($2)\n" \
		      "\tlw $22,72($2)\n" \
		      "\tlw $23,76($2)\n" \
		      "\tlw $fp,80($2)\n" \
		      : : "I" (RESERVED_C_STACK_BYTES+16) : "$2" );

#endif /* mipse[bl]_TARGET_ARCH */
\end{code}
%************************************************************************
\subsubsection[COptJumps-powerpc]{Tail-jumping on an IBM PowerPC running AIX}
%************************************************************************
\begin{code}
#if powerpc_TARGET_ARCH || rs6000_TARGET_ARCH

/* do FUNBEGIN/END the easy way */
#define FUNBEGIN    __asm__ volatile ("--- BEGIN ---");
#define FUNEND      __asm__ volatile ("--- END ---");

/* try "m68k-style" for now */
extern void __DISCARD__(STG_NO_ARGS);

/* this is "alpha-style" */
#define JMP_(cont)				\
    do { void *_procedure = (void *)(cont);	\
	 goto *_procedure;			\
       } while(0)

#define RESUME_(target)	JMP_(target)
\end{code}
/* _All_ callee-saved regs, whether we steal them or not, must be saved
   (and restored on the way out). */
\begin{code}
#define MINI_INTERPRETER_SETUP \
    StgChar space[RESERVED_C_STACK_BYTES+6*sizeof(double)+19*sizeof(long)]; \
    __asm__ volatile ("stm 13,-176(1)\n" \
		      "\tstfd 14,-200(1)\n" \
		      "\tstfd 15,-208(1)\n" \
		      "\tstfd 16,-216(1)\n" \
		      "\tstfd 17,-224(1)\n" \
		      "\tstfd 18,-232(1)\n" \
		      "\tstfd 19,-240(1)\n" \
		      : : "I" (RESERVED_C_STACK_BYTES+16) : "1" );
\end{code}
/* the 16 bytes is for the argument-register save-area above $sp */
\begin{code}
#define MINI_INTERPRETER_END \
    __asm__ volatile (".globl miniInterpretEnd\n" \
		      "miniInterpretEnd:\n" \
		      "\tlm 13,-176(1)\n" \
		      "\tlfd 14,-200(1)\n" \
		      "\tlfd 15,-208(1)\n" \
		      "\tlfd 16,-216(1)\n" \
		      "\tlfd 17,-224(1)\n" \
		      "\tlfd 18,-232(1)\n" \
		      "\tlfd 19,-240(1)\n" \
		      : : "I" (RESERVED_C_STACK_BYTES+16) : "1" );

#endif /* powerpc_TARGET_ARCH || rs6000_TARGET_ARCH */
\end{code}
%************************************************************************
\subsubsection[COptJumps-sparc]{Tail-jumping on Sun4s}
%************************************************************************
We want tail-jumps to be calls, because `call xxx' is the only Sparc
branch that allows an arbitrary label as a target.  (Gcc's ``goto
*target'' construct ends up loading the label into a register and then
jumping, at the cost of two extra instructions for the 32-bit load.)
When entering the threaded world, we stash our return address in a
known location so that \tr{%i7} is available as an extra callee-saves
register.  Of course, we have to restore this when coming out of the
threaded world.

I hate this god-forsaken architecture.  Since the top of the reserved
stack space is used for globals and the bottom is reserved for outgoing
arguments, we have to stick our return address somewhere in the middle.
Currently, I'm allowing 100 extra outgoing arguments beyond the first
6.  --JSM
\begin{code}
#if sparc_TARGET_ARCH

#ifdef solaris2_TARGET_OS
#define MINI_INTERPRET_END "miniInterpretEnd"
#else
#define MINI_INTERPRET_END "_miniInterpretEnd"
#endif

#define JMP_(cont)	((F_) (cont))()
    /* Oh so happily, the above turns into a "call" instruction,
       which, on a SPARC, is nothing but a "jmpl" with the
       return address in %o7 [which we don't care about].
    */

#define RESUME_(target)	JMP_(target)
\end{code}
\begin{code}
#define MINI_INTERPRETER_SETUP \
    StgChar space[RESERVED_C_STACK_BYTES+sizeof(void *)]; \
    register void *i7 __asm__("%i7"); \
    ((void **)(space))[100] = i7;

#define MINI_INTERPRETER_END \
    __asm__ volatile (".align 4\n" \
		      ".global " MINI_INTERPRET_END "\n" \
		      MINI_INTERPRET_END ":\n" \
		      "\tld %1,%0" : "=r" (i7) : "m" (((void **)(space))[100]));
\end{code}
\begin{code}
#endif /* __sparc__ */
\end{code}
%************************************************************************
\subsubsection[COptJumps-OOPS]{Someone screwed up here, too...}
%************************************************************************
If one of the above machine-dependent sections wasn't triggered,
@JMP_@ won't be defined and you'll get link errors (if not
C-compiler errors).

\begin{code}
#if ! defined(JMP_)
*???????* No JMP_ macro???
#endif
\end{code}
\begin{code}
#endif /* __STG_TAILJUMPS__ */
\end{code}
If @FUNBEGIN@ and @FUNEND@ weren't defined, give them the default
(nothing).  Also, define @FB_@ and @FE_@ (short forms).

\begin{code}
#if ! defined(FUNBEGIN)
#define FUNBEGIN /* nothing */
#endif
#if ! defined(FUNEND)
#define FUNEND   /* nothing */
#endif

#define FB_	FUNBEGIN /* short forms */
#define FE_	FUNEND

#endif /* ! that's all of... COPTJUMPS_H */
\end{code}