Description

Base-Rate Item Evaluation and Typicality Scoring Using Large Language Models.

Download typicality rating datasets, generate new stereotype-based typicality ratings using large language models via the Inference Providers API (<https://huggingface.co/docs/inference-providers>), and evaluate them against human-annotated validation data. Also includes functions to extract stereotype strength and base-rate items from typicality matrices. For more details see Beucler et al. (2025) <doi:10.31234/osf.io/eqrfu_v1>.

‘baserater’ package

The baserater package allows you to:

  • Download LLM‑generated base-rate item datasets and human validation ratings from the original paper.
  • Generate new typicality scores with any ‘Inference Provider’ model.
  • Benchmark new scores against human ground truth, and compare performance against strong LLM baselines.
  • Build base-rate item databases from typicality matrices.

It is designed to streamline the creation of base-rate neglect items for reasoning experiments. A base-rate neglect item typically involves two groups (e.g., “engineers” and “construction workers”) and a descriptive trait (e.g., “nerdy”). Participants are presented with statistical information (base rates; e.g., “There are 995 construction workers and 5 engineers”) and stereotypical information (the descriptive trait), and must decide the most likely group membership of an individual described by that trait.

The “typicality rating” generated by large language models quantifies how strongly certain traits (e.g., “nerdy,” “kind”) or descriptions are (stereo)typically associated with specific groups (e.g., engineers, nurses). This allows researchers to precisely measure and control “stereotype strength”: the extent to which a given description is perceived as belonging more strongly to one group than to another (e.g., the trait “nerdy” is typically seen as more characteristic of engineers than of construction workers).
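As a minimal illustration of the idea (not the package’s actual scoring code), stereotype strength for a trait can be sketched in base R as the gap between two groups’ typicality ratings; the numbers below are made up for the example:

```r
# Toy typicality matrix (rows = traits, columns = groups) with
# made-up ratings on a 1-100 scale -- illustrative only.
typicality <- matrix(
  c(90, 20,   # "nerdy": strongly associated with engineers
    40, 45),  # "kind":  roughly neutral between the groups
  nrow = 2, byrow = TRUE,
  dimnames = list(c("nerdy", "kind"),
                  c("engineers", "construction_workers"))
)

# One simple notion of stereotype strength: the rating difference
# between the two groups (the package may define it differently).
stereotype_strength <- typicality[, "engineers"] - typicality[, "construction_workers"]
stereotype_strength  # nerdy: 70, kind: -5
```

On this toy scale, “nerdy” would make a strong base-rate item (large gap between groups), while “kind” would not.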

The full documentation along with a tutorial is available at: https://jeremie-beucler.github.io/baserater/

To learn more about the theoretical framework and studies underlying the ‘baserater’ package, see the paper: Using Large Language Models to Estimate Belief Strength in Reasoning (Beucler et al., 2025), available at https://doi.org/10.31234/osf.io/eqrfu_v1.

Installation

To install the package, run:

install.packages("baserater")

# Alternatively, install via devtools
# install.packages("devtools")  # if not yet installed
devtools::install_github("Jeremie-Beucler/baserater")

Citation

Please cite the package as:

Beucler, J. (2025). baserater: An R package using large language models to estimate belief strength in reasoning. R package version 0.1.2. https://doi.org/10.32614/CRAN.package.baserater

Package overview

[Figure: Schematic overview of the ‘baserater’ package.]

License

MIT

Inference Provider Resources

The baserater package can connect to various inference providers such as ‘Together AI’, ‘Hugging Face’ Inference, ‘Fireworks’, and ‘Replicate’.
These platforms host or serve large language models and let you query them through a standard chat/completions-style API.
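Most of these providers expose an OpenAI-compatible chat/completions endpoint, so a request body is just a nested structure like the one below, expressed here as an R list; the model ID and prompt are purely illustrative:

```r
# Typical chat/completions request body, expressed as an R list.
# The model ID and prompt wording are illustrative, not prescribed
# by the baserater package.
body <- list(
  model = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages = list(
    list(role = "user",
         content = "On a scale from 0 to 100, how typical is the trait 'nerdy' of engineers? Answer with a number only.")
  )
)
# This list would be serialized to JSON and POSTed to the provider's
# chat/completions endpoint with a Bearer token header.
```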


Before generating new scores using the generate_typicality() function, make sure you have completed the following setup steps:

  • Obtain your provider’s API URL and token:
    You can pass them directly to the function or store them as environment variables in R, for example:
    Sys.setenv(PROVIDER_API_URL = "https://api.together.xyz/v1/chat/completions")
    Sys.setenv(PROVIDER_API_TOKEN = "your_secret_token")

  • Check model availability and license terms:
    Some models require that you accept license terms before use. Check your provider’s model catalog for details.

  • Verify the correct model name for your provider:
    Model identifiers can differ between providers (for example, "meta-llama/Llama-3.3-70B-Instruct-Turbo" on ‘Together AI’ vs. "meta-llama/Llama-3.3-70B-Instruct" on ‘Hugging Face’).
    Always use the exact model name listed in your provider’s documentation.
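Putting these steps together, a call might look like the sketch below. The argument names (groups, traits, model, and the credential arguments) are assumptions for illustration; only generate_typicality() itself is named here, so consult the package reference for the real signature.

```r
library(baserater)

# Hypothetical sketch -- the argument names below are assumptions,
# not the package's documented interface.
scores <- generate_typicality(
  groups = c("engineers", "construction workers"),
  traits = c("nerdy", "kind"),
  model  = "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # use your provider's exact ID
  api_url   = Sys.getenv("PROVIDER_API_URL"),
  api_token = Sys.getenv("PROVIDER_API_TOKEN")
)
```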

Metadata

Version

0.1.2

License

MIT

Platforms (78)

  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows