Description

Quantitative Analysis of Textual Data.

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

quanteda: quantitative analysis of textual data


About

quanteda is an R package for managing and analyzing text, created and maintained by Kenneth Benoit and Kohei Watanabe. Its creation was funded by the European Research Council grant ERC-2011-StG 283794-QUANTESS and its continued development is supported by the Quanteda Initiative CIC.

For more details, see https://quanteda.io.

quanteda version 4

quanteda 4.0 is a major release that improves functionality and performance, and further improves function consistency by removing previously deprecated functions. It also includes significant new tokeniser rules that make the default tokeniser smarter than ever, with new Unicode and ICU-compliant rules enabling it to work more consistently with even more languages.

We describe more fully these significant changes in:

The quanteda family of packages

We completed the trend of splitting quanteda into modular packages with the release of v3. The quanteda family of packages includes the following:

  • quanteda: contains all of the core natural language processing and textual data management functions
  • quanteda.textmodels: contains all of the text models and supporting functions, namely the textmodel_*() functions. This was split from the main package with the v2 release
  • quanteda.textstats: statistics for textual data, namely the textstat_*() functions, split with the v3 release
  • quanteda.textplots: plots for textual data, namely the textplot_*() functions, split with the v3 release

We are working on additional package releases, available in the meantime from our GitHub pages:

  • quanteda.sentiment: Functions and lexicons for sentiment analysis using dictionaries
  • quanteda.tidy: Extensions for manipulating document variables in core quanteda objects using your favourite tidyverse functions

and more to come.

How To…

Install (binaries) from CRAN

Install the normal way from CRAN, using your R GUI or

install.packages("quanteda") 

(New for quanteda v4.0) For Linux users: Because all installations on Linux are compiled from source, you will first need to install the Intel oneAPI Threading Building Blocks (TBB) library for parallel computing; installation will not work without it.

To install TBB on Linux:

# Fedora, CentOS, RHEL
sudo yum install tbb-devel

# Debian and Ubuntu
sudo apt install libtbb-dev

Windows or macOS users do not have to install TBB or any other packages to enable parallel computing when installing quanteda from CRAN.
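
As a rough sketch of how the two Linux commands above could be selected automatically, the following script reads the ID field from /etc/os-release and prints the matching command. The function name tbb_install_cmd is hypothetical, and the script assumes the distribution reports one of the IDs fedora, centos, rhel, debian, or ubuntu; anything else is reported as unknown. It only prints the command and does not install anything.

```shell
#!/bin/sh
# Map a distribution ID (the ID= field of /etc/os-release) to the TBB
# install command listed above. The function name is hypothetical.
tbb_install_cmd() {
    case "$1" in
        fedora|centos|rhel) echo "sudo yum install tbb-devel" ;;
        debian|ubuntu)      echo "sudo apt install libtbb-dev" ;;
        *)                  echo "unknown"; return 1 ;;
    esac
}

# Read the ID field if /etc/os-release is readable; otherwise keep "unknown".
distro_id="unknown"
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "${ID:-unknown}")
fi

# Print the command instead of running it, so nothing is installed by accident.
tbb_install_cmd "$distro_id" ||
    echo "No known TBB package for '$distro_id'; install TBB manually." >&2
```

Derivative distributions that report a different ID would need their own entry in the case statement.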

Compile from source (macOS and Windows)

Because building from source compiles some C++ and Fortran code, you will need to have the appropriate compilers installed to build the development version.

You will also need to install TBB:

macOS:

First, you will need to install the Xcode command line tools.

xcode-select --install

Then, after installing Homebrew, install the TBB libraries and the pkg-config utility:

brew install tbb pkg-config

Finally, you will need to install gfortran.
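
Before compiling, it may help to confirm that the tools named in the macOS steps above can be found. The sketch below is one possible check, not part of quanteda: the function name missing_tools is hypothetical, and it only tests whether each command is on the PATH. It does not verify that the TBB libraries themselves are installed.

```shell
#!/bin/sh
# Print each named tool that is not found on PATH.
# The function name is hypothetical.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in the macOS steps above; brew assumes Homebrew is used.
missing=$(missing_tools xcode-select brew pkg-config gfortran)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed build tools were found on PATH."
fi
```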

Windows:

Install RTools, which includes the TBB libraries.

Use quanteda

See the quick start guide to learn how to use quanteda.

Get Help

Cite the package

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").

Leave Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contribute

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Metadata

Version

4.0.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows