Description

Word Embedding Research Framework for Psychological Science.

An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a group of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; and (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <doi:10.48550/arXiv.1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <doi:10.48550/arXiv.1607.04606>.

PsychWordVec

Word Embedding Research Framework for Psychological Science.

An integrative toolbox of word embedding research that provides:

- A collection of pre-trained static word vectors in the .RData compressed format.
- A group of functions to process, analyze, and visualize word vectors.
- A range of tests to examine conceptual associations, including the Word Embedding Association Test (Caliskan et al., 2017) and the Relative Norm Distance (Garg et al., 2018), with permutation test of significance.
- A set of training methods to locally train (static) word vectors from text corpora, including Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and FastText (Bojanowski et al., 2017).

⚠️ All users should update the package to version ≥ 0.3.2. Old versions may have slow processing speed and other problems.

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

📬 [email protected]

📋 psychbruce.github.io

Citation

Bao, H.-W.-S. (2022). PsychWordVec: Word embedding research framework for psychological science. https://CRAN.R-project.org/package=PsychWordVec
- Note: This is the original citation. For the APA-7 citation of the version you installed, see the message printed when you run `library(PsychWordVec)`.
Bao, H.-W.-S., Wang, Z., Cheng, X., Su, Z., Yang, Y., Zhang, G., Wang, B., & Cai, H. (2023). Using word embeddings to investigate human psychology: Methods and applications. Advances in Psychological Science, 31(6), 887–904.
[包寒吴霜, 王梓西, 程曦, 苏展, 杨盈, 张光耀, 王博, 蔡华俭. (2023). 基于词嵌入技术的心理学研究：方法及应用. 心理科学进展, 31(6), 887–904.]

Installation

```r
## Method 1: Install from CRAN
install.packages("PsychWordVec")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/PsychWordVec", force=TRUE)
```

Types of Data for `PsychWordVec`

|  | `embed` | `wordvec` |
|---|---|---|
| Basic class | matrix | data.table |
| Row size | vocabulary size | vocabulary size |
| Column size | dimension size | 2 (variables: `word`, `vec`) |
| Advantage | faster (with matrix operation) | easier to inspect and manage |
| Function to get | `as_embed()` | `as_wordvec()` |
| Function to load | `load_embed()` | `load_wordvec()` |

Note: Word embedding is a natural language processing technique that embeds word semantics into a low-dimensional embedding matrix, in which each word (strictly, each token) is represented as a numeric vector of (uninterpretable) semantic features. Users are advised to import word vector data as the `embed` class using `load_embed()`, which automatically normalizes all word vectors to unit length (see the `normalize()` function) and speeds up most functions in PsychWordVec.
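To make the two layouts and the role of normalization concrete, here is a minimal base R sketch. It uses made-up numbers and plain base R objects rather than the package's own classes or functions, so names such as `embed_like` and `wordvec_like` are illustrative assumptions, and a `data.frame` stands in for the `data.table` class.

```r
# Toy vocabulary: 3 words, 4 dimensions (made-up numbers, not real embeddings).
words <- c("king", "queen", "apple")
embed_like <- matrix(
  c(3, 4, 0, 0,
    4, 3, 0, 0,
    0, 0, 5, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(words, paste0("dim", 1:4))
)

# A wordvec-like layout: one row per word, two columns (word, vec),
# with each vector stored as one element of a list column.
wordvec_like <- data.frame(word = words)
wordvec_like$vec <- lapply(words, function(w) embed_like[w, ])

# Normalize every row to unit length (Euclidean norm 1).
# Dividing a matrix by a vector of row norms recycles down the columns,
# so each row is divided by its own norm.
row_norms <- sqrt(rowSums(embed_like^2))
embed_unit <- embed_like / row_norms

# After normalization, cosine similarity is a plain dot product,
# so all pairwise similarities come from one matrix multiplication.
cos_sim_matrix <- embed_unit %*% t(embed_unit)
cos_sim_matrix["king", "queen"]  # about 0.96 for these toy vectors
```

The matrix layout gets every pairwise similarity from a single multiplication, which is the kind of speed-up the note above attributes to the `embed` class; the two-column layout is easier to inspect row by row.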

Functions in `PsychWordVec`

Word Embeddings Data Management and Transformation
- as_embed(): from wordvec (data.table) to embed (matrix)
- as_wordvec(): from embed (matrix) to wordvec (data.table)
- load_embed(): load word embeddings data as embed (matrix)
- load_wordvec(): load word embeddings data as wordvec (data.table)
- data_transform(): transform plain text word vectors to wordvec or embed
Word Vectors Extraction, Linear Operation, and Visualization
- subset(): extract a subset of wordvec and embed
- normalize(): normalize all word vectors to the unit length 1
- get_wordvec(): extract word vectors
- sum_wordvec(): calculate the sum vector of multiple words
- plot_wordvec(): visualize word vectors
- plot_wordvec_tSNE(): 2D or 3D visualization with t-SNE
- orth_procrustes(): Orthogonal Procrustes matrix alignment
Word Semantic Similarity Analysis, Network Analysis, and Association Test
- cosine_similarity(): cos_sim() or cos_dist()
- pair_similarity(): compute a similarity matrix of word pairs
- plot_similarity(): visualize similarities of word pairs
- tab_similarity(): tabulate similarities of word pairs
- most_similar(): find the Top-N most similar words
- plot_network(): visualize a (partial correlation) network graph of words
- test_WEAT(): WEAT and SC-WEAT with permutation test of significance
- test_RND(): RND with permutation test of significance
Dictionary Automatic Expansion and Reliability Analysis
- dict_expand(): expand a dictionary from the most similar words
- dict_reliability(): reliability analysis and PCA of a dictionary
Local Training of Static Word Embeddings (Word2Vec, GloVe, and FastText)
- tokenize(): tokenize raw text
- train_wordvec(): train static word embeddings

See the documentation (help pages) for their usage and details.
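As a rough illustration of what an association test with a permutation test computes, the following base R sketch follows the commonly cited WEAT definition (Caliskan et al., 2017) on toy 2-D vectors. It is not the implementation behind `test_WEAT()`; the helper names (`cos_sim`, `assoc`, `weat_stat`) and all numbers are assumptions made for this example, and the p-value convention (share of partitions at least as extreme as the observed one) is one of several in use.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cos_sim <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))

# s(w, A, B): mean similarity of w to attribute set A minus that to set B.
assoc <- function(w, A, B) {
  mean(apply(A, 1, cos_sim, v = w)) - mean(apply(B, 1, cos_sim, v = w))
}

# Toy data (one vector per row): targets X, Y and attributes A, B.
X <- rbind(c(1.0, 0.1), c(0.9, 0.2), c(1.0, 0.3))
Y <- rbind(c(0.1, 1.0), c(0.2, 0.9), c(0.3, 1.0))
A <- rbind(c(1.0, 0.0), c(0.9, 0.1))
B <- rbind(c(0.0, 1.0), c(0.1, 0.9))

# Association score of every target word with A relative to B.
s_all <- apply(rbind(X, Y), 1, assoc, A = A, B = B)
n_x <- nrow(X)
s_x <- s_all[seq_len(n_x)]
s_y <- s_all[-seq_len(n_x)]

# Effect size: difference of mean associations, scaled by the
# standard deviation of the scores over all target words.
effect_size <- (mean(s_x) - mean(s_y)) / sd(s_all)

# Test statistic for a given assignment of target words to the X group.
weat_stat <- function(idx_x) sum(s_all[idx_x]) - sum(s_all[-idx_x])
observed <- weat_stat(seq_len(n_x))

# Exact permutation test: enumerate every equal-size split of the targets
# (choose(6, 3) = 20 here) and count splits at least as extreme as observed.
splits <- combn(length(s_all), n_x)
perm_stats <- apply(splits, 2, weat_stat)
p_value <- mean(perm_stats >= observed)
```

With only six target words the partitions can be enumerated exactly; for realistic word lists the number of splits grows quickly, and random sampling of partitions is the usual substitute.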
