This came out of a discussion with Makarius about using the native Windows version with Isabelle, but it's probably of general interest to anyone who uses, or might use, the native Windows version.
The current master version uses the ANSI interface to Windows API calls rather than the Unicode one. I've been experimenting with a version that uses the Unicode, or more correctly UTF-16, interface. What this means is that the conversion between ML strings and, for example, file-names is handled by Poly/ML itself rather than defaulting to the current code-page. It only affects non-ASCII characters; ASCII text is unchanged.
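To make the effect concrete, here is a small sketch in Python rather than ML. It is only an illustration: Python's codecs stand in for whatever conversion Poly/ML performs, the helper name to_utf16_units is made up for this example, and cp1252 is just an assumed example of a system code-page. It shows that the same bytes give the same UTF-16 for ASCII but different UTF-16 once non-ASCII bytes are involved.

```python
# Illustration only: Python codecs stand in for the conversion between
# ML strings (byte sequences) and the UTF-16 used by the Windows API.
# to_utf16_units is a hypothetical helper, not part of Poly/ML.

def to_utf16_units(ml_string, codepage):
    """Decode the bytes with the given code-page and return UTF-16 code units."""
    text = ml_string.decode(codepage)
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

ascii_name = b"notes.txt"
accented_name = b"caf\xc3\xa9.txt"  # "cafe" with e-acute, stored as UTF-8 bytes

# ASCII bytes map to the same code units whichever code-page is chosen.
same_for_ascii = (to_utf16_units(ascii_name, "utf-8")
                  == to_utf16_units(ascii_name, "cp1252"))

# Non-ASCII bytes are where the choice of code-page matters.
as_utf8 = to_utf16_units(accented_name, "utf-8")     # e-acute becomes one unit, 0xE9
as_cp1252 = to_utf16_units(accented_name, "cp1252")  # two units, 0xC3 and 0xA9

print(same_for_ascii)
print([hex(u) for u in as_utf8])
print([hex(u) for u in as_cp1252])
```

If the bytes in the ML string are UTF-8 but they are converted with a single-byte code-page, the name that reaches Windows is not the one intended, which is why the choice of code-page matters.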
The experimental code is in the Windows-Unicode branch and requires ./configure CPPFLAGS="-DUNICODE -D_UNICODE" to build the Unicode version. The resulting poly takes a --codepage option to set the code-page to be used for conversion. Probably "utf8" is the most useful argument to give here.
I've been wondering whether to make the Unicode version the default rather than ANSI. There's also the question of how best to specify the code-page. For backwards compatibility I think it should default to the system code-page, but UTF-8 is likely to be very popular. Perhaps there should be some programmatic way (PolyML.setWindowCodePage ???) to set it as well as, or instead of, the command-line argument.
Setting the code-page affects file-names: both those used for reading and writing files and the names returned by OS.FileSys.readDir. It also affects command-line arguments and environment variables. When using the Windows GUI in "poly.exe" it affects the way characters are displayed when text is written to TextIO.stdOut.
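For names coming back from the system, such as those from OS.FileSys.readDir, the conversion runs the other way, from UTF-16 to the bytes of an ML string. The sketch below, again in Python and purely illustrative, checks which names survive a round trip under a given code-page. The helper names are invented for the example, cp1252 is an assumed system code-page, and what Poly/ML actually does with a character the code-page cannot represent is not covered here; Python's strict encoding error is only a stand-in.

```python
# Illustration only: which file names can be represented as ML strings
# under a given code-page? Helper names are hypothetical, not Poly/ML API.

def to_ml_string(name, codepage):
    """Convert a Unicode file name to the bytes an ML string would hold."""
    return name.encode(codepage)

def round_trips(name, codepage):
    """True if the name survives conversion to bytes and back unchanged."""
    try:
        return to_ml_string(name, codepage).decode(codepage) == name
    except UnicodeEncodeError:
        # The code-page has no representation for some character in the name.
        return False

names = [
    "readme.txt",               # plain ASCII
    "caf\u00e9.txt",            # e-acute, present in cp1252
    "\u03b1\u03b2\u03b3.txt",   # Greek letters, absent from cp1252
]

results = {name: (round_trips(name, "utf-8"), round_trips(name, "cp1252"))
           for name in names}

for name, (utf8_ok, cp1252_ok) in results.items():
    print(ascii(name), "utf-8:", utf8_ok, "cp1252:", cp1252_ok)
```

UTF-8 can represent every name the file system can hold as valid Unicode, whereas a single-byte system code-page covers only a subset, which is one argument for "utf8" being the most useful setting.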
David