Description

Provide Tools to Extract and Analyze Word Vectors.

Provides access to several word embedding methods (GloVe, fastText, and word2vec) for extracting word vectors through a unified framework, with the aim of improving reproducibility and correctness.

wordsalad

Lifecycle: experimental

The goal of wordsalad is to provide a unified interface to word embedding methods that produce word vectors. wordsalad does not implement these methods itself; it only wraps them in a common interface. For details on a specific method, please refer to its documentation.

The goals of this package are to:

  • Allow the tokenizer to be specified
  • Streamline argument names and order
  • Provide consistent output formats
  • Avoid the need to create temporary files

Installation

To get the development version, install it directly from GitHub:

# install.packages("devtools")
devtools::install_github("EmilHvitfeldt/wordsalad")

Example

library(wordsalad)

glove(fairy_tales)
#> # A tibble: 451 x 11
#>    tokens     V1     V2     V3      V4      V5      V6     V7      V8      V9
#>    <chr>   <dbl>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
#>  1 "\"Do" -0.315 -0.699 -0.287  0.466   0.321   0.568   0.179 -0.0679 -1.00  
#>  2 "\"Go… -0.708 -0.983  0.464  0.589  -0.630   0.446  -1.03   0.447  -0.187 
#>  3 "\"He" -0.199 -0.592  0.259  0.157   0.224   0.456   0.127  0.177  -0.655 
#>  4 "\"He… -0.179 -0.690 -0.539  0.376  -0.367  -0.0658  0.378  0.302  -0.557 
#>  5 "\"Oh" -0.812 -0.327  0.640  1.11   -0.251   0.478  -0.229 -0.242  -0.538 
#>  6 "\"Th… -1.18   0.168 -0.246 -0.189   0.193   0.670   0.171 -0.0168 -0.585 
#>  7 "\"Ye… -0.245 -0.669  0.281  0.0824  0.343   0.977  -0.364  0.695  -0.768 
#>  8 "-"    -0.349 -0.412  0.701  0.501  -0.0532  0.704  -0.248  0.361  -0.757 
#>  9 "All"  -0.218 -0.669 -0.309  0.272  -0.122   0.277  -0.241 -0.0569  0.0158
#> 10 "You"  -0.843 -0.921  0.219  0.112   0.504   0.551   0.184  0.655  -0.958 
#> # … with 441 more rows, and 1 more variable: V10 <dbl>
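
The call above relies on the defaults. As a minimal sketch of the first goal (choosing the tokenizer), the snippet below defines a hypothetical helper, lower_word_tokenizer, in base R and passes it to glove(). It assumes that glove() accepts tokenizer, dim, and composition arguments and that a tokenizer is a function returning a list of character vectors, one per document; check ?glove before relying on those names. The wordsalad call is guarded so the tokenizer itself can be tried without the package installed.

```r
# Hypothetical helper (not part of wordsalad): lowercase the text, split on
# anything that is not a letter or an apostrophe, and drop empty tokens.
lower_word_tokenizer <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z']+")
  lapply(tokens, function(tok) tok[nzchar(tok)])
}

# Quick look at what the tokenizer produces for two short documents.
print(lower_word_tokenizer(c("\"Do it,\" he said.", "All's well")))

# Assumed usage: the argument names tokenizer, dim and composition are taken
# on trust here, so confirm them against ?glove before use.
if (requireNamespace("wordsalad", quietly = TRUE)) {
  library(wordsalad)
  vectors <- glove(
    fairy_tales,
    tokenizer = lower_word_tokenizer,
    dim = 5L,
    composition = "matrix"
  )
  print(dim(vectors))
}
```

Stripping punctuation and case in the tokenizer should avoid tokens such as "\"Do" and "\"He" that appear in the default output, at the cost of merging words that differ only in capitalization.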

Code of Conduct

Please note that the wordsalad project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Metadata

Version

0.2.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD 13
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd13
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd13
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows