Hi all,
Please forgive the previous posting without data. I've been doing some
get-out-the-vote work in California's famous democratic election, still
scheduled for October 7th. As part of this work I have been using PolyML to
tokenize voter record strings. I thought that we could not only surpass
Florida, but even Canada's overnight hand-counted vote. However, I am
beginning to have my doubts.
To my shock, when token separators are repeated, the tokenization drops
fields which I believe should have come back as empty strings. Are my
expectations reasonable, or have I subtly misinterpreted the use of
String.tokens?
Note that the field that would convey whether Miss Maxwell's male
counterpart was a Sr, Jr, II, III, XIV, etc, has been dropped, making the
results unsuitable for reading into an RDBMS, for example. Also, at the
end, the token "General" is followed directly by the token "\n", when I
believe there should be several empty strings in between.
Thanks,
Byron Hale
Sample input/output follows, with data altered to protect the voter. (This
time with data.:)
val voter = "Miss\tMaxwell\tMaria\tElaine\t\t2323 Fremont St \tSanta \
            \Clara\tCA\t95050\t2323 Fremont St \tSanta Clara CA \
            \95050\t(408)555-3323\tGeneral\t\t\t\t\t\t\n";
String.tokens(fn ch => (#"\t" = ch)) voter;
val it =
["Miss", "Maxwell", "Maria", "Elaine", "2323 Fremont St ", "Santa Clara",
"CA", "95050", "2323 Fremont St ", "Santa Clara CA 95050",
"(408)555-3323", "General", "\n"] : String.string list
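
For comparison, the Basis Library also provides String.fields, which, as I
read its description, treats every separator as a field boundary instead of
collapsing runs of separators. The following is only a minimal sketch on a
shortened, made-up record; the names isTab, sample, asTokens and asFields are
mine, for illustration only.

```sml
(* Hypothetical shortened record; isTab and sample are illustrative names. *)
fun isTab ch = (ch = #"\t");
val sample = "Miss\tMaxwell\t\tGeneral\t\t\n";

(* String.tokens treats a run of separators as one, so empty fields vanish. *)
val asTokens = String.tokens isTab sample;
(* ["Miss", "Maxwell", "General", "\n"] *)

(* String.fields treats each separator as a boundary, keeping empty strings. *)
val asFields = String.fields isTab sample;
(* ["Miss", "Maxwell", "", "General", "", "\n"] *)
```

If that reading is right, the full record above, with its 18 tabs, would come
back as 19 fields, with the empty strings preserved in position.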