Ersin Er wrote a brief blog post about handling the Turkish language in Haskell. Because Turkish uses a character set that mostly looks familiar to Westerners, it is notorious for its ability to trip up the unwary programmer (see examples in PHP and PostgreSQL).
In the text-icu library, we use the LocaleName type to describe the locale in which we want a function to operate. This type is an instance of the IsString class, so if we enable the OverloadedStrings language extension, we can write plain "tr-TR" to specify a Turkish locale.
The Text type is also an instance of the IsString class, so we can write a literal string like "foo" and the compiler will infer the correct type for it.
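As a small illustration of the two IsString instances, the sketch below gives each literal its type by annotation alone. The module name Literals and the bindings turkish and greeting are made up for this example, and it assumes that text-icu exports LocaleName from Data.Text.ICU.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Literals where  -- hypothetical module name

import Data.Text (Text)
import Data.Text.ICU (LocaleName(..))

-- The annotation picks the IsString instance; no constructor is needed.
turkish :: LocaleName
turkish = "tr-TR"

-- The same literal syntax yields a Text here; no call to pack is needed.
greeting :: Text
greeting = "foo"
```

Without OverloadedStrings, both definitions would be rejected, since a plain string literal always has type String.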
The Data.Text.IO module contains functions for performing locale-sensitive I/O using Text values.
This combination of features can let us write a less cluttered program, following the dictum that simple things should be simple:
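A minimal sketch of a program in this style follows. It assumes that Data.Text.ICU exports toLower and toUpper, each taking a LocaleName and a Text; the binding names trLocale, upStr, and lowStr are illustrative rather than taken from the original listing.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.IO as T
import Data.Text.ICU (toLower, toUpper)

main :: IO ()
main = do
  let trLocale = "tr-TR"      -- inferred as LocaleName from its use below
      upStr    = "ÇIİĞÖŞÜ"    -- inferred as Text
      lowStr   = "çıiğöşü"    -- inferred as Text
  -- Case conversion under Turkish rules, printed with Text-based I/O.
  T.putStrLn (toUpper trLocale lowStr)
  T.putStrLn (toLower trLocale upStr)
```

The explicit import list for Data.Text.ICU is a deliberate choice: the module exports many names, and importing only what is used avoids clashes with the Prelude.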
I've intentionally kept the number of lines the same to preserve clarity, but there are a few advantages to the rewrite:
Less clutter, more speed: we don't need to explicitly pack or unpack Text values to or from String values.
Performance: we're not performing I/O on String values. This would be a big deal if we were writing a real application: I/O with Text is much faster than with String.
Putting inference to work: the compiler correctly infers the type of "tr-TR" to be a LocaleName, and of the strings at the end to be Text, so we don't need to be so explicit.
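For contrast, here is a sketch of the lower-casing step written without OverloadedStrings and with String-based I/O. It is not the code from the original post, only an illustration of the explicit conversions that the points above refer to; it assumes the Locale constructor of LocaleName is exported by Data.Text.ICU.

```haskell
import qualified Data.Text as T
import Data.Text.ICU (LocaleName(Locale), toLower)

main :: IO ()
main = do
  let trLocale = Locale "tr-TR"       -- constructor spelled out by hand
      upStr    = T.pack "ÇIİĞÖŞÜ"     -- String literal packed into Text
  -- The result is unpacked again so the Prelude's String-based putStrLn can print it.
  putStrLn (T.unpack (toLower trLocale upStr))
```

Every pack, unpack, and constructor here is work that the IsString instances and Data.Text.IO take over in the shorter style.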
Oh, and we still give the right answer (look carefully at upper and lower case dotted and dotless "I"):
toLower ÇIİĞÖŞÜ gives çıiğöşü
The full documentation to the text and text-icu libraries is a little difficult to read on Hackage (in fact, the text-icu API docs are completely missing), so here are links: