Fix a bug that can sometimes cause noDuplicate# to fail.
The symptom is that under some rare conditions when running in
parallel, an unsafePerformIO or unsafeInterleaveIO computation might
be duplicated, so e.g. lazy I/O might give the wrong answer (the
stream might appear to have duplicate parts or parts missing).
I have a program that demonstrates it: -N3 or more, some lazy I/O, and
a lot of shared mutable state. See the comment on stg_noDuplicatezh
in PrimOps.cmm, which explains the problem and the fix. This took me
about a day to find :-(