Description

Data Frame Fingerprints and Lineage Figures.

Profiles R data frames as compact data fingerprints using schema, shape, missingness, distribution, category, uniqueness, time, and role signals. It compares versions, identifies close relatives in a library of historical data sets, and renders portable HTML cards plus static PNG/PDF lineage figures for reports.

DataDNA

DataDNA is an R package that gives every data frame a compact fingerprint, lineage match, and report-ready identity figure.
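To make the idea of a fingerprint concrete, here is a minimal base R sketch that summarises a data frame by a few of the signals named above (type, missingness, uniqueness). The helper `sketch_fingerprint()` and the toy `customers` table are hypothetical and are not part of DataDNA's API; the package's real fingerprint draws on more signals than this.

```r
# Hypothetical helper for illustration only; NOT a DataDNA function.
# Summarises each column by type, share of missing values, and share of
# distinct values among the non-missing entries.
sketch_fingerprint <- function(df) {
  data.frame(
    column = names(df),
    type = unname(vapply(df, function(x) class(x)[1], character(1))),
    missing = unname(vapply(df, function(x) mean(is.na(x)), numeric(1))),
    uniqueness = unname(vapply(df, function(x) {
      present <- x[!is.na(x)]
      length(unique(present)) / max(1, length(present))
    }, numeric(1))),
    stringsAsFactors = FALSE
  )
}

# Toy data, invented for this example.
customers <- data.frame(
  id = 1:4,
  segment = c("a", "b", "a", NA),
  spend = c(10.5, NA, 7.25, 3),
  stringsAsFactors = FALSE
)

fp <- sketch_fingerprint(customers)
print(dim(customers))  # shape: rows and columns
print(fp)              # one row of signals per column
```

A per-column table like this is small enough to store next to each data set version, which is what makes later comparisons cheap.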

Instead of only asking "what is in this table?", DataDNA asks:

  • What kind of data set is this?
  • How stable is its identity?
  • Did this version drift from the previous one?
  • Which columns changed their role, missingness, categories, or distribution?

The package is designed for analysts who receive CSVs, extracts, dashboards, or modeling data sets and need a fast way to recognize and compare them.
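The column-level questions above can be illustrated with a small base R sketch that lists added and removed columns and, for shared columns, reports type and missingness changes. `sketch_column_diff()` and the two toy tables are assumptions made for this example; they do not reproduce what `dna_diff()` actually returns.

```r
# Hypothetical helper for illustration only; NOT a DataDNA function.
sketch_column_diff <- function(old, new) {
  shared <- intersect(names(old), names(new))
  first_class <- function(x) class(x)[1]
  na_share <- function(x) mean(is.na(x))

  changed <- data.frame(
    column = shared,
    old_type = unname(vapply(old[shared], first_class, character(1))),
    new_type = unname(vapply(new[shared], first_class, character(1))),
    missing_change = unname(
      vapply(new[shared], na_share, numeric(1)) -
        vapply(old[shared], na_share, numeric(1))
    ),
    stringsAsFactors = FALSE
  )

  list(
    added = setdiff(names(new), names(old)),
    removed = setdiff(names(old), names(new)),
    changed = changed
  )
}

# Toy versions of the same table, invented for this example.
old <- data.frame(
  id = 1:3,
  plan = c("basic", "pro", "basic"),
  age = c(30, 41, 25),
  stringsAsFactors = FALSE
)
new <- data.frame(
  id = c("1", "2", "3"),
  plan = c("basic", NA, "pro"),
  region = c("n", "s", "n"),
  stringsAsFactors = FALSE
)

d <- sketch_column_diff(old, new)
print(d$added)
print(d$removed)
print(d$changed)
```

In this toy case the `id` column keeps its name but changes type, which a check on column names alone would miss.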

Example

library(DataDNA)

demo <- dna_example_customers()

dna <- data_dna(demo$customers_new, name = "customers_new")
dna

card <- dna_card(dna, file = "customers_dna.html")

dna_compare(demo$customers_old, demo$customers_new)
dna_diff(demo$customers_old, demo$customers_new)

dna_compare() combines exact schema overlap with shape, species, role structure, distribution, missingness, category, and identity signals. The resulting score therefore reflects how similar two data sets are overall, not just whether their column names match.
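As a rough illustration of blending several signals into one score, the sketch below combines column-name overlap, type agreement, and missingness similarity with a weighted average. The function `sketch_similarity()`, the choice of three signals, and the weights are all assumptions for this example; the source does not describe how dna_compare() weights its signals.

```r
# Hypothetical helper for illustration only; NOT how dna_compare() is implemented.
sketch_similarity <- function(old, new,
                              weights = c(schema = 0.5, types = 0.25, missing = 0.25)) {
  shared <- intersect(names(old), names(new))
  all_cols <- union(names(old), names(new))

  # With no shared columns there is nothing to compare column by column.
  if (length(shared) == 0) {
    return(c(schema = 0, types = 0, missing = 0, score = 0))
  }

  # Schema signal: Jaccard overlap of column names.
  schema <- length(shared) / length(all_cols)

  # Type signal: share of shared columns whose class is unchanged.
  types <- mean(vapply(shared, function(col) {
    identical(class(old[[col]]), class(new[[col]]))
  }, logical(1)))

  # Missingness signal: 1 minus the mean absolute gap in NA share.
  miss_sim <- 1 - mean(vapply(shared, function(col) {
    abs(mean(is.na(old[[col]])) - mean(is.na(new[[col]])))
  }, numeric(1)))

  parts <- c(schema = schema, types = types, missing = miss_sim)
  c(parts, score = sum(weights[names(parts)] * parts) / sum(weights))
}

# Toy data, invented for this example.
old <- data.frame(id = 1:4, spend = c(1, 2, 3, 4))
new <- data.frame(
  id = 1:4,
  spend = c(1, NA, 3, NA),
  region = c("n", "s", "n", "s"),
  stringsAsFactors = FALSE
)

res <- sketch_similarity(old, new)
print(round(res, 3))
```

Here the name-only overlap is 2/3, while the blended score also accounts for unchanged types and for the new missing values in `spend`.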

The package also includes lazy-loaded customers_old and customers_new example data sets.

Find the closest ancestor

# Named dna_library so the list does not mask base::library()
dna_library <- list(
  customers_2024 = data_dna(customers_old),
  customers_2025 = data_dna(customers_new)
)

match <- dna_match(customers_new, dna_library)
match

dna_match_plot(match, file = "lineage.png")

dna_match_plot() is now the recommended reporting output. It renders a static PNG/PDF lineage figure with base R graphics: a white background, a compact ranking table, and restrained similarity lines. This format suits technical reports, papers, and slide decks better than a web page does.
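For readers unfamiliar with static base R figures, the sketch below ranks a few candidate ancestors and writes a plain horizontal bar chart to a PDF using only grDevices and graphics. The similarity values are made-up numbers, not output from dna_match(), and the layout is a simplified stand-in for the figure that dna_match_plot() draws.

```r
# Made-up similarity scores for illustration; NOT real dna_match() output.
scores <- c(customers_2023 = 0.62, customers_2024 = 0.91, orders_2024 = 0.18)

# Rank candidates from most to least similar.
ranked <- sort(scores, decreasing = TRUE)

# Write a static figure to a temporary PDF with a white background.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 3, bg = "white")
op <- par(mar = c(4, 9, 2, 1))  # wide left margin for data set names
barplot(
  rev(ranked),                  # reversed so the best match is drawn on top
  horiz = TRUE, las = 1, xlim = c(0, 1),
  col = "grey70", border = NA,
  xlab = "Similarity to new data set",
  main = "Closest ancestors"
)
par(op)
invisible(dev.off())

print(names(ranked))
```

Because the output is an ordinary PDF file, it can be included in a report or slide deck without a browser or any HTML dependencies.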

Core API

data_dna(df)
dna_card(df)
dna_compare(old_df, new_df)
dna_diff(old_df, new_df)
dna_match(new_df, dna_library)
dna_match_card(match)
dna_match_plot(match)
dna_species(df)

Installation

From GitHub:

install.packages("devtools")
devtools::install_github("TonyIsFool/DataDNA")

Or with the lighter remotes package:

install.packages("remotes")
remotes::install_github("TonyIsFool/DataDNA")

From a local source tarball:

install.packages("DataDNA_0.1.0.tar.gz", repos = NULL, type = "source")

Design

The profiling and comparison algorithms use base R. The HTML card uses the lightweight htmltools package so the result is portable and CRAN-friendly.

Metadata

Version

0.1.0

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows