Converting text to properly encoded German umlauts.
Converts the convenient ae, oe and ue substitutes for German umlauts into their proper UTF-8 encoded umlauts, while respecting cases where ae, oe and ue must remain, based on an extensible list. Processes a file as a whole.
Replacement of ae, oe, ue by the UTF-8 umlaut glyphs ä, ö, ü
ReplaceUmlaut processes a txt or md file (or all md files in a directory) and replaces each ae, oe or ue with the proper umlaut (capitalized where the original is). It uses an extensible list to avoid replacements that are not appropriate.
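The core substitution can be pictured with a minimal sketch. The function name replaceDigraphs is hypothetical and not taken from the package; it assumes a capitalized umlaut is written as Ae, Oe or Ue, and it deliberately ignores the exception list.

```haskell
-- Hypothetical helper, not part of the package's API.
-- Replaces every ae/oe/ue digraph by its umlaut and keeps the case
-- of the first letter (Ae -> Ä, ae -> ä).
replaceDigraphs :: String -> String
replaceDigraphs ('a':'e':rest) = 'ä' : replaceDigraphs rest
replaceDigraphs ('o':'e':rest) = 'ö' : replaceDigraphs rest
replaceDigraphs ('u':'e':rest) = 'ü' : replaceDigraphs rest
replaceDigraphs ('A':'e':rest) = 'Ä' : replaceDigraphs rest
replaceDigraphs ('O':'e':rest) = 'Ö' : replaceDigraphs rest
replaceDigraphs ('U':'e':rest) = 'Ü' : replaceDigraphs rest
replaceDigraphs (c:rest)       = c : replaceDigraphs rest
replaceDigraphs []             = []
```

Applied to "Aepfel fuer Baeren" this gives "Äpfel für Bären". Applied blindly, it would also turn Koeffizient into Köffizient, which is why a list of exceptions is needed.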
The file nichtUmlaut.txt lists exceptions
An extensible list of word fragments containing character combinations that must not be replaced; for example, Koeffizient. The list need only contain fragments of words (e.g. koeff), not a complete list of all words with non-replaceable combinations.
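How a fragment list can guard the replacement is sketched below. This is an illustration under stated assumptions, not the package's implementation: the names sampleExceptions, isProtected, replaceDigraphs and replaceText are hypothetical, the sample fragments are made up rather than copied from nichtUmlaut.txt, and matching is assumed to be case-insensitive and to protect the whole word.

```haskell
import Data.Char (isAlpha, toLower)
import Data.Function (on)
import Data.List (groupBy, isInfixOf)

-- Hypothetical sample entries; the real list lives in nichtUmlaut.txt.
sampleExceptions :: [String]
sampleExceptions = ["koeff", "neue", "aktuell"]

-- A word is protected if its lower-cased form contains a listed fragment.
isProtected :: [String] -> String -> Bool
isProtected fragments word = any (`isInfixOf` map toLower word) fragments

-- Naive digraph replacement (hypothetical helper), case-preserving.
replaceDigraphs :: String -> String
replaceDigraphs ('a':'e':rest) = 'ä' : replaceDigraphs rest
replaceDigraphs ('o':'e':rest) = 'ö' : replaceDigraphs rest
replaceDigraphs ('u':'e':rest) = 'ü' : replaceDigraphs rest
replaceDigraphs ('A':'e':rest) = 'Ä' : replaceDigraphs rest
replaceDigraphs ('O':'e':rest) = 'Ö' : replaceDigraphs rest
replaceDigraphs ('U':'e':rest) = 'Ü' : replaceDigraphs rest
replaceDigraphs (c:rest)       = c : replaceDigraphs rest
replaceDigraphs []             = []

-- Split the text into runs of letters and runs of non-letters, so that
-- spacing and punctuation survive, then replace only unprotected words.
replaceText :: [String] -> String -> String
replaceText fragments = concatMap fixChunk . groupBy ((==) `on` isAlpha)
  where
    fixChunk chunk
      | isProtected fragments chunk = chunk
      | otherwise                   = replaceDigraphs chunk
```

With the sample list, "Der Koeffizient fuer die neue Loesung" becomes "Der Koeffizient für die neue Lösung". Protecting the whole word is a simplification: a compound containing both a listed fragment and a genuine umlaut digraph would be left untouched by this sketch.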
Command Line Use
The command replaceUmlaut filepath processes only that file and produces the changed file (the original is renamed with the extension bak). Switches:
- -m directory processes all md files in the directory.
- -d is a debug option; files are not changed, but a changed version is written with the extension new.
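The invocation forms described above can be summarized as a small pure argument parser. This is only a sketch: the types Target and Options, the function parseArgs and the usage message are hypothetical, and it assumes that -d may be combined with either form. The actual tool may parse its arguments differently.

```haskell
-- Hypothetical model of the documented invocations.
data Target
  = SingleFile FilePath   -- replaceUmlaut filepath
  | MdDirectory FilePath  -- replaceUmlaut -m directory
  deriving (Eq, Show)

data Options = Options
  { optDebug  :: Bool     -- True when -d is given
  , optTarget :: Target
  } deriving (Eq, Show)

-- Assumption: -d can appear anywhere and combines with both forms.
parseArgs :: [String] -> Either String Options
parseArgs args =
  case filter (/= "-d") args of
    ["-m", dir]                  -> Right (Options debug (MdDirectory dir))
    [path] | take 1 path /= "-"  -> Right (Options debug (SingleFile path))
    _                            -> Left "usage: replaceUmlaut [-d] (filepath | -m directory)"
  where
    debug = "-d" `elem` args
```

In a program, the result of System.Environment.getArgs could be passed to parseArgs, keeping the parsing itself free of IO and easy to check.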
Function
The function procMd1 is useful for processing a single md file from within a program. The list of permitted combinations must be passed as an argument. It returns True if the file is changed. Typical use is:
    changed <- header
        then do
            erl1 <- readErlaubt fnErlaubt
            let addErl = dyDoNotReplace . meta1 $ doc1
                -- allow additions to the list in the YAML header
                erl2 = addErl -- add erl1
            changed1 <- applyReplace debugReplace erl2 fnin
            return changed1