Description
Word and Phrase Frequency Tools for CHILDES.
Description
Tools for extracting word and phrase frequencies from the Child Language Data Exchange System (CHILDES) database via the 'childesr' API. Supports type-level word counts, token-mode searches with simple wildcard patterns and part-of-speech filters, optional stemming, and Zipf-scaled frequencies. Provides normalization per number of tokens or utterances, speaker-role breakdowns, dataset summaries, and export to Excel workbooks for reproducible child language research. The CHILDES database is maintained at <https://talkbank.org/childes/>.
README.md
childeswordfreq
Nahar Albudoor
childeswordfreq is an R package for extracting word and phrase frequencies from the CHILDES database using the childesr API. It enables users to:
- Get word counts by speaker role with word_counts().
- Count surface phrases in utterance text with phrase_counts().
- Normalize frequencies by corpus size or utterance count.
- Export results as Excel workbooks with full metadata for replication.
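To make the normalization and Zipf scaling concrete, here is a minimal base R sketch of the arithmetic. It is not the package's implementation: the counts and corpus size are invented, the variable names are hypothetical, and it assumes the common definition of the Zipf scale as log10 of the frequency per billion words (equivalently, log10 of the per-million frequency plus 3). The package may handle details such as zero counts or smoothing differently.

```r
# Toy word counts (invented for illustration, not CHILDES data).
counts <- c(the = 500, dog = 40, ball = 10)

# Assumed total number of tokens in the toy sample.
total_tokens <- 10000

# Normalize raw counts to a rate per million tokens.
per_million <- counts / total_tokens * 1e6

# Zipf scale under the assumed definition: log10(per-million rate) + 3.
zipf <- log10(per_million) + 3

# Collect everything in one table for inspection.
result <- data.frame(
  word = names(counts),
  count = unname(counts),
  per_million = unname(per_million),
  zipf = unname(zipf)
)
print(result)
```

Rates per million make counts comparable across samples of different sizes, and the log scale compresses the very wide range of word frequencies into a small numeric range.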
The primary functions are:
- word_counts() for type- or token-based frequencies by speaker role, with optional stemming, MOR-tier filtering, wildcard matching, normalization, and Zipf scaling.
- phrase_counts() for surface phrase counts in utterance text, with optional wildcard patterns and per-utterance normalization.
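The idea behind wildcard phrase counting with per-utterance normalization can be sketched in base R. This is only an illustration under stated assumptions, not how phrase_counts() is implemented: the helper count_phrase() is hypothetical, the utterances are invented, and the wildcard syntax is assumed to be glob-style ("*" matches any run of characters within a word), which may differ from the syntax the package accepts.

```r
# Toy utterances standing in for CHILDES utterance text (invented data).
utterances <- c(
  "do you want the ball",
  "I want a cookie",
  "she wants to go",
  "want to go want to play"
)

# Hypothetical helper: count occurrences of a phrase in each utterance.
# Each word of the phrase is treated as a glob pattern and matched
# against consecutive words of the utterance.
count_phrase <- function(utterances, phrase) {
  # Convert each phrase word to an anchored regular expression.
  patterns <- utils::glob2rx(strsplit(phrase, " +")[[1]])
  n <- length(patterns)
  vapply(strsplit(utterances, " +"), function(words) {
    # Every position where an n-word window can start.
    starts <- seq_len(max(length(words) - n + 1, 0))
    hits <- vapply(starts, function(i) {
      all(mapply(grepl, patterns, words[i:(i + n - 1)]))
    }, logical(1))
    sum(hits)
  }, integer(1))
}

hits_per_utterance <- count_phrase(utterances, "want* to")

# Total occurrences and occurrences per utterance.
total_hits <- sum(hits_per_utterance)
rate_per_utterance <- total_hits / length(utterances)
```

Matching word by word avoids false hits across word boundaries, and dividing by the number of utterances gives a rate that can be compared across samples of different sizes.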
Requirements
childeswordfreq does not require CLAN or local copies of CHILDES corpora. All queries go through childesr. Optional on-disk caching can be enabled to speed up repeated CHILDES queries.
Installation
# CRAN
install.packages("childeswordfreq")
# Or install the development version from GitHub (requires the remotes package)
remotes::install_github("n-albudoor/childeswordfreq")
Then load the package:
library(childeswordfreq)
The package depends on childesr and its dependencies. An active internet connection is required because all queries are executed through the TalkBank API.
Usage & Best Practices
For more detailed guidance on using the package, see the vignette:
browseVignettes("childeswordfreq")
License
MIT.