Description

All Stop Words in One Place.

A standalone package combining several stop-word lists for 65 languages, with a median of 329 stop words per language and over 1,000 entries for English, Breton, Latin, Slovenian, and Ancient Greek! The user automatically gets access to all the unique stop words contained in: the 'StopwordISO' repository; the 'Natural Language Toolkit' for 'Python'; the 'Snowball' stop-word list; the R package 'quanteda'; the 'marimo' repository; the 'Perseus' project; and A. Berra's list of stop words for Ancient Greek and Latin.
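Combining several sources into "all the unique stop words" amounts to taking the union of the lists and dropping duplicates. The sketch below illustrates that idea in base R. The three short vectors are hypothetical toy stand-ins, not the package's actual data, and `merge_stopwords()` is a hypothetical helper, not part of the morestopwords API.

```r
# Hypothetical toy lists standing in for real sources (not the package's data)
snowball_en <- c("a", "an", "the", "and", "of")
nltk_en     <- c("the", "and", "is", "in", "of")
iso_en      <- c("a", "the", "is", "it", "to")

# Hypothetical helper: pool every source, normalise case and surrounding
# whitespace, drop empty strings, and keep each word only once.
merge_stopwords <- function(...) {
  words <- unlist(list(...), use.names = FALSE)
  words <- tolower(trimws(words))
  sort(unique(words[nzchar(words)]))
}

all_en <- merge_stopwords(snowball_en, nltk_en, iso_en)
print(all_en)
# [1] "a"   "an"  "and" "in"  "is"  "it"  "of"  "the" "to"
```

Each word appears once in the result even when several sources list it, which is why the merged list can be longer than any single source without double-counting.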

morestopwords: All Stop Words in One Place

Author: Fabio Ashtar Telarico, University of Ljubljana, FDV





Introduction

stopwords is an R package originally developed by Kohei Watanabe of the Waseda Institute for Advanced Study. It provides easy access to the stop words of more than 50 languages in the Stopwords ISO library.

The package had not been updated since 22 December 2017 and could no longer be installed from GitHub. This reboot was created to ensure the project's continuity.

Installation

CRAN (Stable release)

install.packages('morestopwords')

GitHub (Development version)

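# Install 'remotes' first if it is missing, then install from GitHub
if (!requireNamespace('remotes', quietly = TRUE)) install.packages('remotes')
remotes::install_github('fatelarico/morestopwords')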

Usage

The code base has changed since version 0.1.0 (the last version maintained by Dr. Watanabe). The function stopwords::stopwords() now supports not only two-letter ISO codes but also three-letter ones. Moreover, it can identify languages by their ISO name, that is, the English name (e.g., German, not Deutsch; Swedish, not Svenska).
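Accepting two-letter codes, three-letter codes and English names means that every identifier has to be resolved to one canonical key before the right list can be looked up. The base-R sketch below shows one way such a lookup could work. The four-row table and the function `normalise_language()` are hypothetical illustrations, not the package's internals; the three-letter codes follow ISO 639-3.

```r
# Hypothetical lookup table: one row per language, with its ISO 639-1 code,
# ISO 639-3 code and English name.
lang_table <- data.frame(
  iso1 = c("de", "sv", "en", "la"),
  iso3 = c("deu", "swe", "eng", "lat"),
  name = c("German", "Swedish", "English", "Latin"),
  stringsAsFactors = FALSE
)

# Hypothetical resolver: try the two-letter code, then the three-letter code,
# then the English name (case-insensitively). Returns the two-letter code,
# or NA when the identifier is unknown.
normalise_language <- function(x, table = lang_table) {
  key <- tolower(trimws(x))
  hit <- match(key, table$iso1)
  hit <- ifelse(is.na(hit), match(key, table$iso3), hit)
  hit <- ifelse(is.na(hit), match(key, tolower(table$name)), hit)
  table$iso1[hit]
}

normalise_language(c("de", "swe", "Latin", "Deutsch"))
# returns c("de", "sv", "la", NA)
```

The native name "Deutsch" resolves to NA because only English names are in the table, mirroring the behaviour described above.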

Comparison to similar packages

The package stopwords is also based on Watanabe's archived GitHub repository and is therefore the package most similar to morestopwords. However, the two packages differ in both design choices and features:

  1. morestopwords has no dependencies and integrates with the package cld2.
  2. morestopwords can (if cld2 is installed) automatically identify the language of one or more strings.
  3. morestopwords can remove stop words from one or more strings, either in conjunction with language detection or independently.
  4. morestopwords does not let the user choose which list of stop words to use. Instead, it aims to provide the most comprehensive list in an intuitive way.
  5. morestopwords's lists include more stop words than any single list included in stopwords.
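Removing stop words independently of language detection comes down to tokenising each string and dropping the tokens found in the chosen list. The base-R sketch below illustrates that step. The short list `stop_en` and the function `remove_stopwords()` are hypothetical and are not morestopwords' own functions; the sketch splits on whitespace only and does not handle punctuation.

```r
# Hypothetical stop-word list; in practice it would come from the package
stop_en <- c("the", "a", "of", "and", "is", "in")

# Hypothetical helper: split each string on whitespace, drop empty tokens,
# remove tokens whose lower-case form is a stop word, and rejoin the rest.
remove_stopwords <- function(x, stopwords) {
  vapply(strsplit(x, "\\s+"), function(tokens) {
    tokens <- tokens[nzchar(tokens)]
    paste(tokens[!tolower(tokens) %in% stopwords], collapse = " ")
  }, character(1))
}

remove_stopwords(c("The cat is in the garden", "A list of stop words"), stop_en)
# returns c("cat garden", "list stop words")
```

To combine removal with detection, one could first obtain a language code for each string (for instance with cld2's detect_language(), which can return codes such as "en") and then pick the matching list; how morestopwords wires these steps together is not shown here.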

Metadata

Version

0.2.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows