Description
Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software.
Description
Interlinearized glossed texts (IGT) are used in descriptive linguistics to represent the morphological analysis of a text through a morpheme-by-morpheme gloss. 'interlineaR' provides a set of functions that target several popular IGT formats ('SIL Toolbox', 'EMELD XML') and turn an IGT into a set of data frames following a relational model, where the tables represent the different linguistic units: texts, sentences, words and morphemes. The same software ('SIL FLEx', 'SIL Toolbox') typically also produces dictionaries of the morphemes used in the glosses. 'interlineaR' provides a function that turns the LIFT XML dictionary format into a set of data frames, also following a relational model, representing the dictionary entries, the senses attached to each entry, the examples attached to each sense, etc.
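The idea of a relational model for IGT can be illustrated with a small, self-contained sketch in base R. It uses toy English data, not output from interlineaR. The table and column names (`text_id`, `sentence_id`, `word_id`, `form`, `gloss`) are hypothetical and chosen only for illustration; the actual data frames returned by the package may be named and structured differently. Each level points to its parent through an identifier, so the levels can be recombined with ordinary joins.

```r
# Toy tables sketching a relational IGT model: texts > sentences > words > morphemes.
# HYPOTHETICAL names: these are not necessarily the tables/columns interlineaR returns.
texts     <- data.frame(text_id = 1L, title = "Story 1")
sentences <- data.frame(sentence_id = 1:2, text_id = 1L)
words     <- data.frame(word_id = 1:3, sentence_id = c(1L, 1L, 2L),
                        form = c("dogs", "bark", "cats"))
morphemes <- data.frame(word_id = c(1L, 1L, 2L, 3L, 3L),
                        form  = c("dog", "-s", "bark", "cat", "-s"),
                        gloss = c("dog", "PL", "bark", "cat", "PL"))

# Attach every morpheme to its word, sentence and text by joining on the shared ids.
glossed <- merge(morphemes, words, by = "word_id", suffixes = c(".morph", ".word"))
glossed <- merge(glossed, sentences, by = "sentence_id")
glossed <- merge(glossed, texts, by = "text_id")

# Number of morphemes in each sentence.
morphs_per_sentence <- table(glossed$sentence_id)

# Rebuild a word-level gloss by pasting each word's morpheme glosses in their original order.
word_glosses <- vapply(split(morphemes$gloss, morphemes$word_id),
                       paste, character(1), collapse = "-")
print(word_glosses)  # word 1: "dog-PL", word 2: "bark", word 3: "cat-PL"
```

Keeping one row per unit at each level, instead of nesting morphemes inside words, lets standard data-frame tools (joins, `table()`, `aggregate()`) work at whichever level an analysis needs.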
README.md
interlineaR: utility functions for importing interlinearized corpora and dictionaries into R
Author: Sylvain Loiseau
License: BSD_3_clause
Installation
devtools::install_github("sylvainloiseau/interlineaR", build_vignettes=TRUE)
Usage
Import an interlinearized corpus in the EMELD XML format (as exported, for instance, from SIL FieldWorks):
path <- system.file("exampleData", "tuwariInterlinear.xml", package="interlineaR")
corpus <- read.emeld(path, vernacular.languages="tww")
Import an interlinearized corpus in the SIL Toolbox format:
path <- system.file("exampleData", "tuwariToolbox.txt", package="interlineaR")
corpus <- read.toolbox(path)
Import a dictionary in the LIFT XML format (as exported, for instance, from SIL FieldWorks):
dicpath <- system.file("exampleData", "tuwariDictionary.lift", package="interlineaR")
dictionary <- read.lift(dicpath, language.code="tww")
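The relational dictionary model (entries, senses attached to entries, examples attached to senses) can likewise be sketched with toy base-R tables. This is not the structure returned by `read.lift()`: the table and column names (`entry_id`, `sense_id`, `lexeme`, `definition`, `example`) are hypothetical. The sketch shows the two relationships that matter when querying such tables: an entry may have several senses, and a sense may have no example at all.

```r
# Toy tables sketching a relational dictionary model: entries > senses > examples.
# HYPOTHETICAL names: not necessarily the tables/columns produced by read.lift().
entries  <- data.frame(entry_id = 1:2, lexeme = c("dog", "head"))
senses   <- data.frame(sense_id = 1:3, entry_id = c(1L, 2L, 2L),
                       definition = c("domestic canine",
                                      "upper part of the body",
                                      "leader of a group"))
examples <- data.frame(sense_id = c(1L, 2L),
                       example = c("the dog sleeps", "she nodded her head"))

# One entry can have several senses: an inner join yields one row per sense.
entry_senses <- merge(entries, senses, by = "entry_id")
senses_per_entry <- table(entry_senses$lexeme)

# Not every sense has an example: a left join (all.x = TRUE) keeps senses
# without examples, with NA in the example column.
with_examples <- merge(entry_senses, examples, by = "sense_id", all.x = TRUE)
print(with_examples)
```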
Documentation
See the interlineaR vignette (vignette("interlineaR", package = "interlineaR")) for an overview of the data model and of the functions of this package.