Unicode character names and aliases.
unicode-data-names provides Haskell APIs to access the Unicode character names and aliases from the Unicode character database (UCD).
The Haskell data structures are generated programmatically from the UCD files. The latest Unicode version supported by this library is 15.1.0.
README
unicode-data-names provides Haskell APIs to efficiently access the Unicode character names and aliases from the Unicode character database.
There are 3 APIs:
String API: enabled by default.
ByteString API: enabled via the package flag has-bytestring.
Text API: enabled via the package flag has-text.
The Haskell data structures are generated programmatically from the Unicode character database (UCD) files. The latest Unicode version supported by this library is 15.1.0.
Please see the Haddock documentation for the API reference.
Comparing with ICU
We can compare the implementation against ICU. This requires working with the source repository, as we need the internal package icu.
Warning: An ICU version with the exact same Unicode version is required.
cabal run -O2 --flag dev-has-icu unicode-data-names:tests -- -m ICU
Comparing with Python
To check the Haskell implementation, we compare its results with those obtained from Python.
Warning: A Python version with the exact same Unicode version is required.
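Before running the comparison, it may help to confirm which Unicode version the local Python interpreter bundles. The following is a minimal sketch, assuming the check script relies on Python's standard unicodedata module (this README does not confirm that). The names EXPECTED_UNICODE_VERSION, python_unicode_version and versions_match are illustrative and not part of this repository.

```python
import unicodedata

# Unicode version supported by the library, as stated in this README.
EXPECTED_UNICODE_VERSION = "15.1.0"


def python_unicode_version() -> str:
    """Return the version of the Unicode database bundled with this Python."""
    return unicodedata.unidata_version


def versions_match(expected: str = EXPECTED_UNICODE_VERSION) -> bool:
    """Return True if this Python's Unicode version equals `expected`."""
    return python_unicode_version() == expected


if __name__ == "__main__":
    print(f"Python bundles Unicode {python_unicode_version()}")
    if not versions_match():
        print(f"Mismatch: the library targets Unicode {EXPECTED_UNICODE_VERSION}")
```

If the versions differ, differences reported by the comparison may reflect the Unicode version gap rather than a defect in the Haskell implementation.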
cabal run -O2 -f "export-all-chars" -v0 export-all-chars > ./test/all_chars.csv
python3 ./test/check.py -v ./test/all_chars.csv
Licensing
unicode-data-names is an open source project available under a liberal Apache-2.0 license.
Contributing
As an open project, we welcome contributions.