Description

Datasets from Computer Age Statistical Inference.

Provides the datasets from Efron & Hastie (2016, ISBN: 9781108107952), "Computer Age Statistical Inference: Algorithms, Evidence, and Data Science", in an accessible R format for those who want to use them for study or to try to reproduce analyses from the book.

CASIdata

CASIdata provides the datasets from Efron & Hastie (2016, ISBN: 9781108107952), "Computer Age Statistical Inference: Algorithms, Evidence, and Data Science", in an accessible R format for those who want to use them for teaching or study, or to reproduce or extend analyses from the book. The files were downloaded from Trevor Hastie’s website, https://hastie.su.domains/CASI_files/DATA/, but quite a few were messy and required some processing to turn them into R datasets.

Even so, some of the datasets may require data cleaning, renaming of variables, re-shaping or other tidying steps to be useful for analysis. But that’s part of learning.

Installation

This package is not yet on CRAN. You can install it from this GitHub repo or from R-universe:

# from GitHub (requires the remotes package)
remotes::install_github("friendly/CASIdata")
# or from R-universe
install.packages("CASIdata", repos = c("https://friendly.r-universe.dev"))

Datasets included here

| Dataset | dim | Title |
|---|---|---|
| DTI | 15443x4 | DTI Brain Imaging Data |
| als | 1822x371 | ALS Data |
| baseball | 18x3 | Baseball Batting Averages |
| bivnorm | 40x2 | Bivariate Normal Data |
| butterfly | 24x2 | Butterfly Species Data |
| cellinfusion | 25x4 | Cell Infusion Data |
| cholesterol | 164x2 | Cholesterol Data |
| diabetes | 442x12 | Diabetes Data |
| doseresponse | 11x2 | Dose Response Data |
| galaxy | 270x3 | Galaxy Data |
| haplotype | 197x102 | Human Ancestry Haplotype Data |
| insurance | 60x3 | Insurance Life Table Data |
| leukemia_small | 3571x72 | Leukemia Gene Expression Data (Small) |
| ncog | 96x6 | NCOG Head and Neck Cancer Data |
| nodes | 844x2 | Lymph Nodes Cancer Data |
| pediatric | 1620x7 | Pediatric Cancer Survival Data |
| police | 2748x1 | Police Racial Bias Data |
| prostz | 6032x1 | Prostate Cancer Z-values |
| student_score | 22x5 | Student Score Data |
| supernova | 39x11 | Type Ia Supernova Data |
| vasoconstriction | 39x2 | Vasoconstriction Data |
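
The dimensions listed above can be re-derived once the package is installed. The following is a minimal sketch, not the code this README was built with: `dim_table()` is a hypothetical helper, and the guarded part assumes that every entry reported by `data(package = "CASIdata")` is a single object that `data()` can load by name.

```r
# Hypothetical helper: summarise a named list of data frames or matrices
# as one row per object, with dimensions written as "rows x cols".
dim_table <- function(objs) {
  data.frame(
    Dataset = names(objs),
    dim = unname(vapply(objs,
                        function(x) paste(dim(x), collapse = "x"),
                        character(1))),
    stringsAsFactors = FALSE
  )
}

# Runs only if CASIdata is installed; otherwise this part is skipped.
if (requireNamespace("CASIdata", quietly = TRUE)) {
  # Assumption: each "Item" is a plain object name loadable by data().
  nms <- data(package = "CASIdata")$results[, "Item"]
  env <- new.env()
  data(list = nms, package = "CASIdata", envir = env)
  print(dim_table(mget(nms, envir = env)))
}
```

The helper works on any named list, so it can be tried on built-in data first, for example `dim_table(list(mtcars = mtcars, iris = iris))`.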

Missing Datasets

The following dataset appears in data-raw/CASI-save.R but is not (yet) included in the package:

| Dataset | Reason |
|---|---|
| SPAM | Variable names need cleanup; requires mapping from UCI Spambase documentation |

See data-raw/missing-datasets.md for details on resolving this.
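
As an illustration of what such a mapping could look like, here is a rough sketch that pulls attribute names out of lines written in the style of the UCI Spambase `.names` file and applies them to a data frame. It is not the procedure described in data-raw/missing-datasets.md: the sample lines, the `spam_mock` data frame and its values are made up for the example, and the real SPAM file may be laid out differently.

```r
# A few lines in the style of the UCI Spambase ".names" file ("attribute: type.").
names_lines <- c(
  "| lines starting with a bar are comments",
  "word_freq_make:         continuous.",
  "word_freq_address:      continuous.",
  "capital_run_length_total: continuous."
)

# Keep non-comment lines that contain a colon, then take the text before it.
attr_lines <- grep("^[^|].*:", names_lines, value = TRUE)
col_names  <- make.names(sub(":.*$", "", attr_lines), unique = TRUE)

# Made-up stand-in for a slice of the raw data with default column names.
spam_mock <- data.frame(V1 = c(0, 0.21), V2 = c(0.64, 0.28), V3 = c(278, 1028))

# Guard against a mismatch between documentation and data before renaming.
stopifnot(length(col_names) == ncol(spam_mock))
names(spam_mock) <- col_names
```

Passing the names through `make.names()` keeps them syntactically valid in R even when the documentation uses punctuation characters in attribute names.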

External Datasets (Not Included)

These large datasets are referenced in the book but not included in the package due to size constraints. They can be downloaded directly from the sources listed below.

CASI datasets (too large for CRAN)

  • protein_kernel: 1708 x 1708 inner-product (kernel) matrix for human proteins (Section 19.6). Computed using a string kernel on bag-of-4-grams amino acid representations.
  • protein_label: Response labels (-1/+1) for the 1708 proteins (45 positives, 1663 negatives).
  • prostmat: 6033 x 102 gene expression matrix comparing 50 controls vs 52 prostate cancer patients (Section 3.3).
  • leukemia_big: 7128 x 72 gene expression matrix (10MB). A larger version of leukemia_small.
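
One possible way to fetch one of these files is sketched below. The base URL is the download location named earlier in this README; the helper names `casi_url()` and `fetch_casi()` and the file name `leukemia_big.csv` are assumptions made for illustration, so check the actual file listing before relying on them.

```r
# Hypothetical helper: build the URL of a file under the CASI data directory.
casi_url <- function(file,
                     base = "https://hastie.su.domains/CASI_files/DATA/") {
  paste0(base, file)
}

# Hypothetical helper: download a file once and reuse the local copy afterwards.
fetch_casi <- function(file, destdir = tempdir()) {
  dest <- file.path(destdir, file)
  if (!file.exists(dest)) {
    download.file(casi_url(file), dest, mode = "wb")
  }
  dest
}

# Not run here because it needs network access and an assumed file name:
# path <- fetch_casi("leukemia_big.csv")
# leukemia_big <- as.matrix(read.csv(path))
```

Caching in a directory of your choice (via `destdir`) avoids downloading a 10MB file again in every session.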

Image datasets (hosted externally)

Variable Renaming

Some datasets had variables renamed for clarity:

| Dataset | Original | Renamed |
|---|---|---|
| butterfly | x, y | k, count |
| police | X2.411 | z |
| prostz | X1.47236666651029 | z |
| galaxy | (wide format) | Reshaped from wide to long format with mag, red, freq |
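
The two kinds of change listed here, renaming columns and reshaping from wide to long, can be sketched in base R as follows. This is an illustration on small made-up stand-ins (`butterfly_raw`, `galaxy_wide`), not the package's actual processing code, and the assumption that the wide galaxy table has mag in rows and red in columns is mine.

```r
# Illustrative stand-in for the butterfly data as originally read (columns x, y).
butterfly_raw <- data.frame(x = 1:3, y = c(118, 74, 44))
butterfly <- setNames(butterfly_raw, c("k", "count"))

# Made-up wide table: rows indexed by mag, columns by red, cells are counts.
galaxy_wide <- matrix(
  c(1, 6, 6,
    3, 1, 4,
    2, 8, 19),
  nrow = 3, byrow = TRUE,
  dimnames = list(mag = c("17.2", "17.4", "17.6"),
                  red = c("1.22", "1.37", "1.51"))
)

# as.table() + as.data.frame() gives one row per (mag, red) cell.
galaxy_long <- as.data.frame(as.table(galaxy_wide),
                             responseName = "freq",
                             stringsAsFactors = FALSE)

# The dimnames come back as character; convert them to numbers.
galaxy_long$mag <- as.numeric(galaxy_long$mag)
galaxy_long$red <- as.numeric(galaxy_long$red)
```

In the long form every count sits in its own row next to its mag and red values, which is the layout most modelling and plotting functions expect.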

Example

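No worked examples yet. As a starting point, load the package and list the datasets it provides:

library(CASIdata)
## list the datasets shipped with the package
data(package = "CASIdata")
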
Metadata

Version

0.2.1

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows