MyNixOS website logo
Description

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity.

A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Note: For full functionality a local installation of TreeTagger is recommended. It is also recommended to not load this package directly, but by loading one of the available language support packages from the 'l10n' repository <https://undocumeantit.github.io/repos/l10n/>. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing list (<https://korpusml.reaktanz.de>).

koRpus

koRpus is an R package for text analysis. This includes, amongst others, a wrapper for the POS tagger TreeTagger, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall, Tuldava).

koRpus also includes a plugin for RKWard, a powerful GUI and IDE for R, providing graphical dialogs for its basic features. To make full use of this feature, please install RKWard (plugins are detected automatically).

More information on koRpus is available on the project homepage.

Installation

There are three easy ways of getting koRpus:

Stable releases via CRAN

The latest release that is considered stable for productive work can be found on the CRAN mirrors, which means you can install it from a running R session like this:

install.packages("koRpus")

The CRAN packages are usually a bit behind the recent state of the package, and only updated after a significant amount of changes or important bug fixes.

Development releases via the project repository

Inbetween stable CRAN releases there's usually several testing or development versions released on the project's own repository. These releases should also work without problems, but they are also intended to test new features or supposed bug fixes, and get feedback before the next release goes to CRAN.

Installation is fairly easy, too:

install.packages("koRpus", repo=c(getOption("repos"), reaktanz="https://reaktanz.de/R"))

To automatically get updates, consider adding the repository to your R configuration. You might also want to subscribe to the package's RSS feed to get notified of new releases.

If you're running a Debian based operating system, you might be interested in the precompiled *.deb packages.

Installation via GitHub

To install it directly from GitHub, you can use install_github() from the devtools package:

devtools::install_github("unDocUMeantIt/koRpus") # stable release
devtools::install_github("unDocUMeantIt/koRpus", ref="develop") # development release

Installing language support

koRpus does not support any particular language out-of-the-box. Therefore, after installing the package you'll have to also install at least one language support package to really make use of it. You can find these in the l10n repository, they're called koRpus.lang.*.

The most straight forward way to get these packages is to use the function install.koRpus.lang(). Here's an example how to install support for English and German:

library(koRpus)
install.koRpus.lang(lang=c("en", "de"))

There's also precompiled Debian packages.

Contributing

To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please either subscribe to the koRpus-dev mailing list, or use the issue tracker on GitHub.

Branches

Please note that all development happens in the develop branch. Pull requests against the master branch will be rejected, as it is reserved for the current stable release.

Licence

Copyright 2012-2021 Meik Michalke [email protected]

koRpus is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

koRpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with koRpus. If not, see https://www.gnu.org/licenses/.

Metadata

Version

0.13-8

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows