This was something that came out of a discussion with Makarius about
using the native Windows version with Isabelle but it's probably of
general interest to anyone who uses or might use the native Windows version.
The current master version uses the ANSI interface to Windows API calls
rather than Unicode. I've been experimenting with a version that uses
the Unicode, or more correctly UTF-16, interface. What this means is
that the conversion between ML strings and, for example file-names, is
handled by Poly/ML itself rather than defaulting to the current
code-page. It only affects non-ASCII characters.
The experimental code is in the Windows-Unicode branch and requires
./configure CPPFLAGS="-DUNICODE -D_UNICODE"
to build the Unicode version. The resulting poly takes a --codepage
option to set the code-page to be used for conversion. Probably "utf8"
is the most useful argument to give here.
I've been wondering whether to make the Unicode version the default
rather than ANSI. There's also the question of how best to specify the
codepage. For backwards compatibility I think it should default to the
system code-page but UTF-8 is likely to be very popular. Perhaps there
should be some programmatic way (PolyML.setWindowCodePage ???) to set it
as well as/instead of the command line argument.
Setting the code-page affects file-names, both those used for reading
and writing files but also the names returned by OS.FileSys.readDir. It
also affects command-line arguments and environment variables. When
using the Windows GUI in "poly.exe" it affects the way characters are
displayed when text is written to TextIO.stdOut.
David