From 4472bf9a7516f18dd07748475eefc4098aca990d Mon Sep 17 00:00:00 2001 From: sewardj Date: Wed, 27 Feb 2002 12:46:25 +0000 Subject: [PATCH] [project @ 2002-02-27 12:46:25 by sewardj] Add description of how f-x-{static,dynamic} are implemented. --- ghc/docs/comm/index.html | 2 + ghc/docs/comm/the-beast/fexport.html | 223 ++++++++++++++++++++++++++++++++++ 2 files changed, 225 insertions(+) create mode 100644 ghc/docs/comm/the-beast/fexport.html diff --git a/ghc/docs/comm/index.html b/ghc/docs/comm/index.html index ac2379e..9ecc7a0 100644 --- a/ghc/docs/comm/index.html +++ b/ghc/docs/comm/index.html @@ -66,6 +66,8 @@

Alien Functions

The Native Code Generator

GHCi +

Implementation of + foreign export

RTS & Libraries

diff --git a/ghc/docs/comm/the-beast/fexport.html b/ghc/docs/comm/the-beast/fexport.html new file mode 100644 index 0000000..531e93a --- /dev/null +++ b/ghc/docs/comm/the-beast/fexport.html @@ -0,0 +1,223 @@ + + + + + The GHC Commentary - foreign export + + + +

The GHC Commentary - foreign export

+ + The implementation scheme for foreign export, as of 27 Feb 02, is + as follows. There are four cases, of which the first two are easy. +

+ (1) static export of an IO-typed function from some module MMM +

+ foreign export foo :: Int -> Int -> IO Int +

+ For this we generate no Haskell code. However, a C stub is + generated, and it looks like this: +

+extern StgClosure* MMM_foo_closure;
+
+HsInt foo (HsInt a1, HsInt a2)
+{
+   SchedulerStatus rc;
+   HaskellObj ret;
+   rc = rts_evalIO(
+           rts_apply(rts_apply(MMM_foo_closure,rts_mkInt(a1)),
+                     rts_mkInt(a2)
+                    ),
+           &ret
+        );
+   rts_checkSchedStatus("foo",rc);
+   return(rts_getInt(ret));
+}
+

+ This does the obvious thing: builds in the heap the expression + (foo a1 a2), calls rts_evalIO to run it, + and uses rts_getInt to fish out the result. + +

+ (2) static export of a non-IO-typed function from some module MMM +

+ foreign export foo :: Int -> Int -> Int +

+ This is identical to case (1), with the sole difference that the + stub calls rts_eval rather than + rts_evalIO. +

+ + (3) dynamic export of an IO-typed function from some module MMM +

+ foreign export dynamic mkCallback :: (Int -> Int -> IO Int) -> IO (FunPtr a) +

+ Dynamic exports are a whole lot more complicated than their static + counterparts. +

+ First of all, we get some Haskell code, which, when given a + function callMe :: (Int -> Int -> IO Int) to be made + C-callable, IO-returns a FunPtr a, which is the + address of the resulting C-callable code. This address can now be + handed out to the C-world, and callers to it will get routed + through to callMe. +

+ The generated Haskell function looks like this: +

+mkCallback f
+  = do sp <- mkStablePtr f
+       r  <- ccall "createAdjustorThunk" sp (&"run_mkCallback")
+       return r
+

+ createAdjustorThunk is a gruesome, + architecture-specific function in the RTS. It takes a stable + pointer to the Haskell function to be run, and the address of the + associated C wrapper, and returns a piece of machine code, + which, when called from the outside (C) world, eventually calls + through to f. +

+ This machine code fragment is called the "Adjustor Thunk" (don't + ask me why). What it does is simply to call onwards to the C + helper + function run_mkCallback, passing all the args given + to it but also conveying sp, which is a stable + pointer + to the Haskell function to run. So: +

+createAdjustorThunk ( StablePtr sp, CCodeAddress addr_of_helper_C_fn ) 
+{
+   create malloc'd piece of machine code "mc", behaving thusly:
+
+   mc ( args_to_mc ) 
+   { 
+      jump to addr_of_helper_C_fn, passing sp as an additional
+      argument
+   }
+}
+

+ This is a horrible hack, because there is no portable way, even at + the machine code level, to write a function which adds one argument and + then transfers onwards to another C function. On x86s args are + pushed right to left onto the stack, so we can just push sp, + fiddle around with return addresses, and jump onwards to the + helper C function. However, on architectures which use register + windows and/or pass args extensively in registers (Sparc, Alpha, + MIPS, IA64), this scheme borders on the unviable. GHC has a + limited createAdjustorThunk implementation for Sparc + and Alpha, which handles only the cases where all args, including + the extra one, fit in registers. +
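The job the adjustor thunk has to do can be modelled in portable C, provided the caller is allowed to hold a small record instead of a bare function pointer. The sketch below is only such a model, not the code in ghc/rts/Adjustor.c: the names adjustor_model, adjustor_call and run_callback are made up for illustration, an ordinary C function pointer stands in for the stable pointer, and the original_return_addr parameter of the real helper is left out. It also shows why machine code is needed in the real thing: a C caller that expects a plain function pointer has nowhere to keep the sp field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Haskell function: a plain C function. */
typedef int (*callee_fn)(int, int);

/* Shape of the helper stub: the caller's args plus one extra leading arg. */
typedef int (*helper_fn)(void *sp, int a1, int a2);

/* Portable model of an adjustor thunk: the two values that the real thunk
   bakes into its generated machine code. */
struct adjustor_model {
    void     *sp;      /* stands in for the stable pointer */
    helper_fn helper;  /* address of the C helper stub */
};

/* What the generated code does: forward the caller's args, adding sp. */
int adjustor_call(const struct adjustor_model *adj, int a1, int a2)
{
    assert(adj != NULL && adj->helper != NULL);
    return adj->helper(adj->sp, a1, a2);
}

/* Model helper: "dereference" sp and apply the result to the args. */
int run_callback(void *sp, int a1, int a2)
{
    callee_fn f = *(callee_fn *)sp;
    return f(a1, a2);
}

int add(int a, int b) { return a + b; }

/* Usage: route add(3, 4) through the model thunk. */
int demo_adjustor(void)
{
    callee_fn target = add;
    struct adjustor_model adj = { &target, run_callback };
    return adjustor_call(&adj, 3, 4);
}
```

In this model the caller must know about adjustor_call and pass the record explicitly. The real adjustor thunk exists precisely so that the caller can remain ignorant of all this and just call through an ordinary C function pointer.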

+ Anyway: the other lump of code generated as a result of an + f-x-dynamic declaration is the C helper stub. This is basically + the same as in the static case, except that it only ever gets + called from the adjustor thunk. It must therefore accept, as an + extra argument, a stable pointer to the Haskell function to run, + since that function is not known until run-time. + It then dereferences the stable pointer and does the call in + the same way as the f-x-static case: +

+HsInt Main_d1kv ( StgStablePtr the_stableptr, 
+                  void* original_return_addr, 
+                  HsInt a1, HsInt a2 )
+{
+   SchedulerStatus rc;
+   HaskellObj ret;
+   rc = rts_evalIO(
+           rts_apply(rts_apply((StgClosure*)deRefStablePtr(the_stableptr),
+                               rts_mkInt(a1)
+                     ),
+                     rts_mkInt(a2)
+           ),
+           &ret
+        );
+   rts_checkSchedStatus("Main_d1kv",rc);
+   return(rts_getInt(ret));
+}
+

+ Note how this function has a purely made-up name + Main_d1kv, since unlike the f-x-static case, this + function is never called from user code, only from the adjustor + thunk. +

+ Note also how the function takes a bogus parameter + original_return_addr, which is part of this extra-arg + hack. The usual scheme is to leave the original caller's return + address in place and merely push the stable pointer above that, + hence the spare parameter. +

+ Finally, there is some extra trickery, detailed in + ghc/rts/Adjustor.c, to get round the following + problem: the adjustor thunk lives in mallocville. It is + quite possible that the Haskell code will actually + call free() on the adjustor thunk used to get to it + -- because otherwise there is no way to reclaim the space used + by the adjustor thunk. That's all very well, but it means that + the C helper cannot return to the adjustor thunk in the obvious + way, since we've already given it back using free(). + So we leave, on the C stack, the address of whoever called the + adjustor thunk, and before calling the helper, mess with the stack + such that when the helper returns, it returns directly to the + adjustor thunk's caller. +

+ That's how the stdcall convention works. If the + adjustor thunk has been called using the ccall + convention, we return indirectly, via a statically-allocated + yet-another-magic-piece-of-code, which takes care of removing the + extra argument that the adjustor thunk pushed onto the stack. + This is needed because in ccall-world, it is the + caller who removes args after the call, and the original caller of + the adjustor thunk has no way to know about the extra arg pushed + by the adjustor thunk. +

+ You didn't really want to know all this stuff, did you? +

+ + + + (4) dynamic export of a non-IO-typed function from some module MMM +

+ foreign export dynamic mkCallback :: (Int -> Int -> Int) -> IO (FunPtr a) +

+ (4) relates to (3) as (2) relates to (1), that is, it's identical, + except the C stub uses rts_eval instead of + rts_evalIO. +

+ + +

Some perspective on f-x-dynamic

+ + The only really horrible problem with f-x-dynamic is how the + adjustor thunk should pass to the C helper the stable pointer to + use. Ideally we would like this to be conveyed via some invisible + side channel, since then the adjustor thunk could simply jump + directly to the C helper, with no non-portable stack fiddling. +

+ Unfortunately there is no obvious candidate for the invisible + side-channel. We've chosen to pass it on the stack, with the + bad consequences detailed above. Another possibility would be to + park it in a global variable, but this is non-reentrant and + non-(OS-)thread-safe. A third idea is to put it into a callee-saves + register, but that has problems too: the C helper may not use that + register and therefore we will have trashed any value placed there + by the caller; and there is no C-level portable way to read from + the register inside the C helper. +

+ In short, we can't think of a really satisfactory solution. I'd + vote for introducing some kind of OS-thread-local-state and passing + it in there, but that introduces complications of its own. + + +
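To make the thread-local idea concrete, here is a rough sketch of what such a side channel might look like. Everything in it is an assumption made for illustration: it uses the C11 _Thread_local storage class (not something GHC's RTS does), the names pending_sp, helper_via_tls and adjustor_via_tls are invented, and an ordinary C function pointer again stands in for the stable pointer. In a real scheme the sp value would be baked into the adjustor thunk rather than passed to it as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Haskell function: a plain C function. */
typedef int (*callee_fn)(int, int);

/* Hypothetical side channel: one slot per OS thread (C11 _Thread_local). */
static _Thread_local void *pending_sp = NULL;

/* The helper keeps the exported C signature: no extra parameters. */
int helper_via_tls(int a1, int a2)
{
    /* Read the slot first, before anything on this thread (for example
       a nested callback) has a chance to overwrite it. */
    void *sp = pending_sp;
    assert(sp != NULL);
    callee_fn f = *(callee_fn *)sp;
    return f(a1, a2);
}

/* What the adjustor would do: store sp, then pass the args on unchanged.
   Here sp is a parameter only because this model has no generated code
   to bake it into. */
int adjustor_via_tls(void *sp, int a1, int a2)
{
    pending_sp = sp;
    return helper_via_tls(a1, a2);
}

int mul(int a, int b) { return a * b; }

/* Usage: route mul(6, 7) through the thread-local side channel. */
int demo_tls(void)
{
    callee_fn target = mul;
    return adjustor_via_tls(&target, 6, 7);
}
```

Since the argument list is untouched, the adjustor no longer has to fiddle with the stack or with return addresses. The complications move elsewhere: the store into the thread-local slot still has to be done by per-callback machine code, and how that slot is addressed is itself platform-specific.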

+ + +Last modified: Weds 27 Feb 02 + + + + -- 1.7.10.4