This was something that came out of a discussion with Makarius about using the native Windows version with Isabelle but it's probably of general interest to anyone who uses or might use the native Windows version.
The current master version uses the ANSI interface to Windows API calls rather than the Unicode one. I've been experimenting with a version that uses the Unicode, or more correctly UTF-16, interface. What this means is that the conversion between ML strings and, for example, file names is handled by Poly/ML itself rather than defaulting to the current code-page. It only affects non-ASCII characters.
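To make the conversion concrete, here is a minimal sketch in ML of what turning a UTF-8 encoded ML string into UTF-16 code units involves. It is only an illustration under the assumption that the code-page is UTF-8; it is not Poly/ML's actual implementation, which lives in the run-time system, and the function names decodeUtf8 and toUtf16 are made up for this example.

```sml
(* Decode a UTF-8 encoded ML string into Unicode code points.
   Assumes reasonably well-formed input; raises Fail on bad bytes. *)
fun decodeUtf8 (s : string) : int list =
  let
    (* Payload of a continuation byte (10xxxxxx). *)
    fun cont c =
      if c >= 0x80 andalso c < 0xC0 then c - 0x80
      else raise Fail "bad continuation byte"
    fun go [] acc = rev acc
      | go (b :: rest) acc =
          if b < 0x80 then go rest (b :: acc)      (* ASCII: unchanged *)
          else if b >= 0xC0 andalso b < 0xE0 then  (* 2-byte sequence *)
            (case rest of
               b1 :: r => go r (((b - 0xC0) * 64 + cont b1) :: acc)
             | _ => raise Fail "truncated sequence")
          else if b >= 0xE0 andalso b < 0xF0 then  (* 3-byte sequence *)
            (case rest of
               b1 :: b2 :: r =>
                 go r (((b - 0xE0) * 4096 + cont b1 * 64 + cont b2) :: acc)
             | _ => raise Fail "truncated sequence")
          else if b >= 0xF0 andalso b < 0xF8 then  (* 4-byte sequence *)
            (case rest of
               b1 :: b2 :: b3 :: r =>
                 go r (((b - 0xF0) * 262144 + cont b1 * 4096
                        + cont b2 * 64 + cont b3) :: acc)
             | _ => raise Fail "truncated sequence")
          else raise Fail "bad lead byte"
  in
    go (map Char.ord (String.explode s)) []
  end

(* Encode code points as UTF-16 code units, using a surrogate pair
   for anything above U+FFFF. *)
fun toUtf16 (cps : int list) : int list =
  List.concat
    (map (fn cp =>
            if cp < 0x10000 then [cp]
            else
              let val v = cp - 0x10000
              in [0xD800 + v div 0x400, 0xDC00 + v mod 0x400] end)
         cps)
```

For an ASCII string such as "abc" both steps leave the values unchanged, which is why only non-ASCII characters are affected. The two bytes "\195\169" (e-acute in UTF-8) decode to the single code point 233.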
The experimental code is in the Windows-Unicode branch and requires ./configure CPPFLAGS="-DUNICODE -D_UNICODE" to build the Unicode version. The resulting poly takes a --codepage option to set the code-page to be used for conversion. Probably "utf8" is the most useful argument to give here.
I've been wondering whether to make the Unicode version the default rather than ANSI. There's also the question of how best to specify the codepage. For backwards compatibility I think it should default to the system code-page but UTF-8 is likely to be very popular. Perhaps there should be some programmatic way (PolyML.setWindowCodePage ???) to set it as well as/instead of the command line argument.
Setting the code-page affects file names, both those used for reading and writing files and the names returned by OS.FileSys.readDir. It also affects command-line arguments and environment variables. When using the Windows GUI in "poly.exe" it affects the way characters are displayed when text is written to TextIO.stdOut.
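A minimal sketch of how the effect on file names could be observed from ML, assuming a poly built from the Windows-Unicode branch and started with --codepage utf8. The helper names listDir, isAscii and showBytes are made up for this illustration; only the OS.FileSys calls are from the Basis Library. Pure ASCII names should look the same whatever the code-page, so only the other names are printed with their raw bytes.

```sml
(* Collect the entries of a directory as ML strings. *)
fun listDir (dir : string) : string list =
  let
    val d = OS.FileSys.openDir dir
    fun loop acc =
      case OS.FileSys.readDir d of
        NONE => rev acc
      | SOME name => loop (name :: acc)
    (* Close the directory stream even if reading fails. *)
    val names = loop [] handle e => (OS.FileSys.closeDir d; raise e)
  in
    OS.FileSys.closeDir d;
    names
  end

(* True if every character is 7-bit ASCII, i.e. not affected by the code-page. *)
fun isAscii (s : string) : bool =
  List.all (fn c => Char.ord c < 128) (String.explode s)

(* Render the bytes of a string in hexadecimal, separated by spaces. *)
fun showBytes (s : string) : string =
  String.concatWith " "
    (map (fn c => Int.fmt StringCvt.HEX (Char.ord c)) (String.explode s))

(* Print the current directory, with raw bytes for non-ASCII names. *)
val () =
  List.app
    (fn n =>
       if isAscii n then print (n ^ "\n")
       else print (n ^ "  [" ^ showBytes n ^ "]\n"))
    (listDir ".")
```

With a UTF-8 code-page, a file whose name contains e-acute would be expected to show the bytes C3 A9 for that character; with a different code-page the bytes would differ.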
David
IMHO, any move which gets us closer to Unicode everywhere is a good move. Any discomfort this may cause in the short term is well worth enduring.
Cheers
I agree with Pierpaolo.
On 7 September 2015 at 22:37, Pierpaolo Bernardi <olopierpa at gmail.com> wrote:
IMHO, any move which gets us closer to Unicode everywhere is a good move. Any discomfort this may cause in the short term is well worth enduring.
Cheers
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
On Mon, 7 Sep 2015, David Matthews wrote:
The experimental code is in the Windows-Unicode branch and requires ./configure CPPFLAGS="-DUNICODE -D_UNICODE" to build the Unicode version. The resulting poly takes a --codepage option to set the code-page to be used for conversion. Probably "utf8" is the most useful argument to give here.
I've tried that, and updated the current polyml test component for Isabelle accordingly -- see repository version http://isabelle.in.tum.de/repos/isabelle/rev/4010e1559a24. It also includes x86_64-windows now, but that is not used by default.
Accessing the file-system with Unicode works, but there is a small problem with OS.Path.toString: it raises exception InvalidArc for Unicode arcs. For example:
OS.Path.toString {isAbs = false, vol = "", arcs = ["?"]}
The above changeset 4010e1559a24 avoids that slightly odd SML Basis Library operation, so there is no imminent problem.
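A sketch of how a caller might guard against the exception rather than avoid OS.Path.toString altogether. The function name pathToStringOpt is hypothetical, and the arc used here (Greek alpha as UTF-8 bytes) is only a stand-in for some non-ASCII arc. Whether the result is NONE or SOME depends on the Poly/ML version and platform.

```sml
(* Hypothetical stand-in for a non-ASCII arc: Greek alpha as UTF-8 bytes. *)
val arc = "\206\177"

(* Convert a path record to a string, returning NONE instead of
   propagating InvalidArc. *)
fun pathToStringOpt (p : {isAbs : bool, vol : string, arcs : string list})
    : string option =
  SOME (OS.Path.toString p) handle OS.Path.InvalidArc => NONE

val result = pathToStringOpt {isAbs = false, vol = "", arcs = [arc]}

val () =
  case result of
    SOME s => print ("path: " ^ s ^ "\n")
  | NONE => print "OS.Path.toString raised InvalidArc\n"
```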
I've been wondering whether to make the Unicode version the default rather than ANSI. There's also the question of how best to specify the codepage.
An explicit option is OK for our Isabelle setup.
Makarius
On 08/09/2015 21:35, Makarius wrote:
On Mon, 7 Sep 2015, David Matthews wrote:
The experimental code is in the Windows-Unicode branch and requires ./configure CPPFLAGS="-DUNICODE -D_UNICODE" to build the Unicode version. The resulting poly takes a --codepage option to set the code-page to be used for conversion. Probably "utf8" is the most useful argument to give here.
I've tried that, and updated the current polyml test component for Isabelle accordingly -- see repository version http://isabelle.in.tum.de/repos/isabelle/rev/4010e1559a24. It also includes x86_64-windows now, but that is not used by default.
Accessing the file-system with Unicode works, but there is a small problem with OS.Path.toString: it raises exception InvalidArc for Unicode arcs. For example:
OS.Path.toString {isAbs = false, vol = "", arcs = ["?"]}
I've fixed that. The Windows-Unicode branch has now been merged into master. The -DUNICODE -D_UNICODE options are automatically included in the Windows build. There isn't currently a programmatic way to change the code-page but I may add a function to do that.
David
On Thu, 10 Sep 2015, David Matthews wrote:
The Windows-Unicode branch has now been merged into master. The -DUNICODE -D_UNICODE options are automatically included in the Windows build. There isn't currently a programmatic way to change the code-page but I may add a function to do that.
Great. I have now updated the Isabelle setup accordingly. It works smoothly for x86-windows and x86_64-windows.
The formal references are here:
https://github.com/polyml/polyml/commit/4eba188ce05c99959bf20cd0fbc347d515eb...
http://isabelle.in.tum.de/repos/isabelle/file/9e81e87f755b/Admin/polyml/buil...
The above polyml/build script may be taken as a blueprint for anybody who wants to compile it themselves. See also the other files in that directory, especially INSTALL-MinGW.
Makarius