Description

Word and Phrase Frequency Tools for CHILDES.

Tools for extracting word and phrase frequencies from the Child Language Data Exchange System (CHILDES) database via the 'childesr' API. Supports type-level word counts, token-mode searches with simple wildcard patterns and part-of-speech filters, optional stemming, and Zipf-scaled frequencies. Provides normalization per number of tokens or utterances, speaker-role breakdowns, dataset summaries, and export to Excel workbooks for reproducible child language research. The CHILDES database is maintained at <https://talkbank.org/childes/>.

childeswordfreq

Nahar Albudoor

childeswordfreq is an R package for extracting word and phrase frequencies from the CHILDES database using the childesr API. It enables users to:

  • Get word counts by speaker role with word_counts().
  • Count surface phrases in utterance text with phrase_counts().
  • Normalize frequencies by the number of tokens or the number of utterances.
  • Export results as Excel workbooks with full metadata for replication.
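To make the normalization step concrete, here is a minimal base-R sketch of rates per 1,000 tokens and per 1,000 utterances. It does not call childeswordfreq. The data frame, the totals, and the column names are hypothetical and chosen only for illustration; the package's own output columns and scaling factor may differ.

```r
# Hypothetical toy counts; these are not real CHILDES data.
counts <- data.frame(
  word = c("dog", "ball", "more"),
  count = c(12, 30, 48),
  stringsAsFactors = FALSE
)

# Assumed sample sizes for the same (hypothetical) set of transcripts.
total_tokens <- 60000
total_utterances <- 15000

# Rate per 1,000 tokens and per 1,000 utterances.
counts$per_1k_tokens <- counts$count / total_tokens * 1000
counts$per_1k_utterances <- counts$count / total_utterances * 1000

print(counts)
```

Normalizing by tokens makes samples with different amounts of speech comparable, while normalizing by utterances is useful when utterance length varies between speakers.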

The primary functions are:

  • word_counts() for type- or token-based frequencies by speaker role, with optional stemming, MOR-tier filtering, wildcard matching, normalization, and Zipf scaling.
  • phrase_counts() for surface phrase counts in utterance text, with optional wildcard patterns and per-utterance normalization.
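As a rough illustration of Zipf scaling, the sketch below uses the common definition of the Zipf scale: the base-10 logarithm of frequency per million tokens, plus 3. The helper zipf_scale() is a hypothetical name written for this example and is not part of childeswordfreq. The package may apply smoothing or a different corpus-size term, so treat this as an assumption rather than its exact formula.

```r
# Hypothetical helper: Zipf value = log10(frequency per million tokens) + 3.
# No smoothing is applied, so a count of zero would give -Inf.
zipf_scale <- function(count, total_tokens) {
  per_million <- count / total_tokens * 1e6
  log10(per_million) + 3
}

# Toy example: counts of 1, 10, and 1000 in a sample of one million tokens.
zipf_values <- zipf_scale(c(1, 10, 1000), total_tokens = 1e6)
print(zipf_values)
```

On this scale, a word that occurs once per million tokens has a value of 3, and each tenfold increase in frequency adds 1.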

Requirements

childeswordfreq does not require CLAN or local copies of CHILDES corpora. All queries go through childesr. Optional on-disk caching can be enabled to speed up repeated CHILDES queries.


Installation

# CRAN
install.packages("childeswordfreq")

# Or install the development version from GitHub (requires the remotes package)
remotes::install_github("n-albudoor/childeswordfreq")

Then load the package:

library(childeswordfreq)

The package depends on childesr and its dependencies. An active internet connection is required because all queries are executed through the TalkBank API.
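Since every query relies on childesr, it can help to confirm that it is available before starting a session. The check below is a small sketch using only base R. The variable name has_childesr is an illustrative choice, and the check does not test the network connection.

```r
# Check whether childesr can be loaded, without attaching it.
has_childesr <- requireNamespace("childesr", quietly = TRUE)

if (!has_childesr) {
  message("childesr is not installed; try install.packages(\"childesr\")")
}
```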


Usage & Best Practices

For more detailed guidance on using the package, see the vignette:

browseVignettes("childeswordfreq")

License

MIT.


Metadata

Version

0.2.0

License

MIT

Platforms (78)

  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows