Hi all,
Please forgive the previous posting without data. I've been doing some get-out-the-vote work in California's famous democratic election, still scheduled for October 7th. As part of this work I have been using PolyML to tokenize voter record strings. I thought that we could surpass not only Florida, but even Canada's overnight hand-counted vote. However, I am beginning to have my doubts.
To my shock, when token separators are repeated, the tokenization drops fields that I believe should have come back as empty strings. Are my expectations reasonable, or have I subtly misinterpreted the use of String.tokens?
Note that in the output below, the field that would convey whether Miss Maxwell's male counterpart was a Sr, Jr, II, III, XIV, etc., has been dropped, making the results unsuitable for reading into an RDBMS, for example. Also, at the end, the token "General" is followed directly by the token "\n", when I believe there should be several empty strings in between.
Thanks, Byron Hale
Sample input/output follows, with data altered to protect the voter. (This time with data. :)
val voter = "Miss\tMaxwell\tMaria\tElaine\t\t2323 Fremont St \tSanta Clara\tCA\t95050\t2323 Fremont St \tSanta Clara CA 95050\t(408)555-3323\tGeneral\t\t\t\t\t\t\n";
String.tokens(fn ch => (#"\t" = ch)) voter;
val it = ["Miss", "Maxwell", "Maria", "Elaine", "2323 Fremont St ", "Santa Clara", "CA", "95050", "2323 Fremont St ", "Santa Clara CA 95050", "(408)555-3323", "General", "\n"] : String.string list
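
For comparison, the Basis Library also has String.fields, which as far as I can tell is specified to return an empty string between adjacent delimiters, whereas String.tokens treats a run of delimiters as one separator. A minimal sketch, assuming a shortened, made-up record (the names sample, isTab, toks and flds are only for illustration):

```sml
(* Hypothetical shortened record; the data is made up for illustration. *)
val sample = "Miss\tMaxwell\t\tGeneral\t\t\n";

(* Delimiter predicate: true only for the tab character. *)
fun isTab ch = (ch = #"\t");

(* String.tokens: a run of adjacent tabs acts as a single separator,
   so no empty strings appear in the result. *)
val toks = String.tokens isTab sample;
(* ["Miss", "Maxwell", "General", "\n"] *)

(* String.fields: every tab ends a field, so adjacent tabs
   produce empty strings. *)
val flds = String.fields isTab sample;
(* ["Miss", "Maxwell", "", "General", "", "\n"] *)
```

If that reading is right, String.fields keeps the column positions stable, which is what loading into an RDBMS would need. The trailing "\n" would still have to be stripped separately.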