Description

Customisable Stop-Words in 110 Languages.

Functions to generate stop-word lists in 110 languages, in a way consistent across all the languages supported. The generated lists are based on the morphological tagset from the Universal Dependencies.

tidystopwords: R package for multilingual stopwords

Authors: Silvie Cinková*, Maciej Eder
License: GPL-3

An R package containing customizable lists of stopwords in multiple languages; it attempts to follow tidy data principles.

The idea behind this package is to provide stopwords for less-resourced languages and to give the user control over stopword selection by part of speech. For the purposes of this package, stopwords are defined as forms of function words from closed parts of speech (e.g. prepositions, conjunctions, auxiliary verbs, and pronouns). The core generate_stoplist() function relies on multilingual_stopwords(), a large data frame derived from the current release of the Universal Dependencies Treebanks. All languages in that release are included. The data are encoded in UTF-8. The vocabulary coverage for each language depends on the size, textual diversity, and annotation quality of the available treebanks. No manual post-editing was performed.
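
To make the definition above concrete, here is a minimal base-R sketch of selecting stopwords by part of speech. It does not call the package: the data frame toy_forms, its column names, and the helper toy_stoplist() are hypothetical stand-ins invented for illustration, and the actual structure of multilingual_stopwords() and the arguments of generate_stoplist() may well differ. Only the tag names (ADP, CCONJ, AUX, PRON, NOUN) are taken from the Universal Dependencies part-of-speech tagset.

```r
# Toy table of word forms with UD part-of-speech tags.
# Hypothetical data: this is NOT the package's multilingual_stopwords() data frame.
toy_forms <- data.frame(
  language = c("English", "English", "English", "English", "English", "Czech", "Czech"),
  form     = c("of", "and", "would", "she", "house", "na", "pes"),
  upos     = c("ADP", "CCONJ", "AUX", "PRON", "NOUN", "ADP", "NOUN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the forms of one language whose tag belongs to
# the chosen set of closed parts of speech, then deduplicate and sort them.
toy_stoplist <- function(data, lang, stop_upos = c("ADP", "CCONJ", "AUX", "PRON")) {
  keep <- data$language == lang & data$upos %in% stop_upos
  sort(unique(data$form[keep]))
}

# All closed-class forms for English in the toy table.
english_all <- toy_stoplist(toy_forms, "English")
print(english_all)

# Restricting the selection to adpositions only.
english_adp <- toy_stoplist(toy_forms, "English", stop_upos = "ADP")
print(english_adp)
```

Passing the set of tags as an argument is what gives the user control over which parts of speech count as stopwords; content words such as "house" are never selected because NOUN is not among the closed classes.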

Installation

Install the package directly from the GitHub repository:

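# Requires devtools; if needed, install it first with install.packages("devtools").
library(devtools)
# build_vignettes = TRUE also builds the package vignettes during installation.
install_github("computationalstylistics/stopwoRds", build_vignettes = TRUE)

Metadata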

Version

0.9.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows