author Max Bolingbroke <batterseapower@hotmail.com>
Sat, 14 May 2011 21:50:46 +0000 (22:50 +0100)
committer Max Bolingbroke <batterseapower@hotmail.com>
Sat, 14 May 2011 21:50:46 +0000 (22:50 +0100)
commit 509f28cc93b980d30aca37008cbe66c677a0d6f6
tree 93d3ff075d0442e4c9321f25038263c4fb414bd0
parent b751723d882e51241f04d6d2ec46fce70f0e0817
Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this
patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.

The major changes are:

 1) Make Foreign.C.String.*CString use the locale encoding

    This change follows the FFI specification in Haskell 98, which
    has never actually been implemented before.

    The functions exported from Foreign.C.String are partially-applied
    versions of those from GHC.Foreign, which allows the user to supply
    their own TextEncoding.

    We also introduce foreignEncoding as the name of the text encoding
    that follows the FFI appendix in that it transliterates encoding
    errors.

 2) I also changed the code so that mkTextEncoding always tries the
    native-Haskell decoders in preference to those from iconv, even on
    non-Windows platforms. The motivation is simply that this is better
    for compatibility, and the native decoders are the ones you already
    get for the predefined utf* and latin1* TextEncodings anyway.

 3) Implement surrogate-byte error handling mode for TextEncoding

    This implements PEP383-like behaviour so that we are able to
    roundtrip byte strings through Strings without loss of information.

    The withFilePath function now uses this encoding to get to/from CStrings,
    so any code that uses that will get the right PEP383 behaviour automatically.

 4) Implement three other coding failure modes: ignore, throw error, transliterate

    These mimic the behaviour of the GNU Iconv extensions.
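
As a rough illustration of changes 1 and 3, the sketch below marshals a
String through a CString with an explicitly supplied TextEncoding, and then
round-trips bytes that are not valid UTF-8 through a String. The helper names
roundTripText and roundTripBytes are hypothetical and exist only for this
example. The "//ROUNDTRIP" suffix is the notation documented for
mkTextEncoding in later releases of base; that this patch exposes it under
exactly that name is an assumption.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (mkTextEncoding, utf8)

-- Hypothetical helper: encode a String into a temporary CString using an
-- explicit TextEncoding (here utf8), then decode it again.
roundTripText :: String -> IO String
roundTripText s = GHC.withCString utf8 s (GHC.peekCString utf8)

-- Hypothetical helper: decode raw bytes with the surrogate-escaping
-- (PEP383-like) failure mode, then encode the resulting String back to bytes.
-- Assumes mkTextEncoding accepts the "//ROUNDTRIP" suffix.
roundTripBytes :: [Word8] -> IO [Word8]
roundTripBytes bytes = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  -- Decode: view the byte array as a CStringLen and read it as a String.
  str <- withArrayLen bytes $ \n p -> GHC.peekCStringLen enc (castPtr p, n)
  -- Encode: write the String out again and read the bytes back.
  GHC.withCStringLen enc str $ \(p, n) -> peekArray n (castPtr p)

main :: IO ()
main = do
  -- Text containing a non-ASCII character (U+00E9).
  roundTripText "h\233llo" >>= print
  -- 0xFF can never appear in well-formed UTF-8.
  roundTripBytes [0x66, 0xFF, 0x6F] >>= print
```

Swapping the suffix for "//IGNORE" or "//TRANSLIT", or dropping it, would
select the other failure modes listed in change 4, again assuming the suffix
notation applies.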

24 files changed:
Control/Exception/Base.hs
Foreign/C/String.hs
GHC/Conc/Windows.hs
GHC/Environment.hs
GHC/Foreign.hs [new file with mode: 0644]
GHC/IO.hs
GHC/IO/Encoding.hs
GHC/IO/Encoding.hs-boot [new file with mode: 0644]
GHC/IO/Encoding/CodePage.hs
GHC/IO/Encoding/Failure.hs [new file with mode: 0644]
GHC/IO/Encoding/Iconv.hs
GHC/IO/Encoding/Latin1.hs
GHC/IO/Encoding/Types.hs
GHC/IO/Encoding/UTF16.hs
GHC/IO/Encoding/UTF32.hs
GHC/IO/Encoding/UTF8.hs
GHC/IO/FD.hs
GHC/IO/Handle/Internals.hs
GHC/Windows.hs [new file with mode: 0644]
System/Environment.hs
System/IO.hs
System/Posix/Internals.hs
System/Posix/Internals.hs-boot [new file with mode: 0644]
base.cabal