The bug in this case was that we had a worker thread making a foreign
call which invoked a callback (in this case it was performGC, I
think). When the callback ended, boundTaskExiting() was setting
task->stopped, but the Task is now per-OS-thread, so it is shared by
the worker that made the original foreign call. When the foreign call
returned, because task->stopped was set, the worker was not placed on
the queue of spare workers. Somehow the worker woke up again, and
found the spare_workers queue empty, which lead to a crash.
Two bugs here: task->stopped should not have been set by
boundTaskExiting (this broke when I split the Task and InCall structs,
in 6.12.2), and releaseCapabilityAndQueueWorker() should not be
testing task->stopped anyway, because it should only ever be called
when task->stopped is false (this is now an assertion).
void
boundTaskExiting (Task *task)
{
void
boundTaskExiting (Task *task)
{
- task->stopped = rtsTrue;
-
#if defined(THREADED_RTS)
ASSERT(osThreadId() == task->id);
#endif
#if defined(THREADED_RTS)
ASSERT(osThreadId() == task->id);
#endif
+ // Set task->stopped, but only if this is the last call (#4850).
+ // Remember that we might have a worker Task that makes a foreign
+ // call and then a callback, so it can transform into a bound
+ // Task for the duration of the callback.
+ if (task->incall == NULL) {
+ task->stopped = rtsTrue;
+ }
+
debugTrace(DEBUG_sched, "task exiting");
}
debugTrace(DEBUG_sched, "task exiting");
}