MyNixOS website logo
Description

Text Processing for Small or Big Data Files.

It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

textTinyR


The textTinyR package consists of text processing functions for small or big data files. More details on the functionality of textTinyR can be found in blog-post1 and blog-post2. The R package can be installed, in the following Operating Systems: Linux, Mac and Windows. However, there is one limitation : chinese, japanese, korean, thai or languages with ambiguous word boundaries are not supported.


UPDATE 01-04-2018 : boost-locale is no longer a system requirement for the textTinyR package.


Installation of the textTinyR package (CRAN, Github)


To install the package from CRAN use,


install.packages('textTinyR')



and to download the latest version from Github use the install_github function of the devtools package,


devtools::install_github(repo = 'mlampros/textTinyR')



Use the following link to report bugs/issues,

https://github.com/mlampros/textTinyR/issues



UPDATE 06-02-2020


Docker images of the textTinyR package are available to download from my dockerhub account. The images come with Rstudio and the R-development version (latest) installed. The whole process was tested on Ubuntu 18.04. To pull & run the image do the following,



docker pull mlampros/texttinyr:rstudiodev

docker run -d --name rstudio_dev -e USER=rstudio -e PASSWORD=give_here_your_password --rm -p 8787:8787 mlampros/texttinyr:rstudiodev


The user can also bind a home directory / folder to the image to use its files by specifying the -v command,



docker run -d --name rstudio_dev -e USER=rstudio -e PASSWORD=give_here_your_password --rm -p 8787:8787 -v /home/YOUR_DIR:/home/rstudio/YOUR_DIR mlampros/texttinyr:rstudiodev



In the latter case you might have first give permission privileges for write access to YOUR_DIR directory (not necessarily) using,



chmod -R 777 /home/YOUR_DIR



The USER defaults to rstudio but you have to give your PASSWORD of preference (see https://rocker-project.org/ for more information).


Open your web-browser and depending where the docker image was build / run give,


1st. Option on your personal computer,


http://0.0.0.0:8787 


2nd. Option on a cloud instance,


http://Public DNS:8787


to access the Rstudio console in order to give your username and password.


Citation:

If you use the code of this repository in your paper or research please cite both textTinyR and the original softwarehttps://CRAN.R-project.org/package=textTinyR/citation.html:


@Manual{,
  title = {{textTinyR}: Text Processing for Small or Big Data Files},
  author = {Lampros Mouselimis},
  year = {2021},
  note = {R package version 1.1.8},
  url = {https://CRAN.R-project.org/package=textTinyR},
}

Metadata

Version

1.1.8

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows