Description

High-Probability Lower Bounds for the Total Variance Distance.

An implementation of high-probability lower bounds for the total variation distance as introduced in Michel & Naef & Meinshausen (2020) <arXiv:2005.06006>. The function HPLB returns an estimated lower bound that holds with high probability on the total variation distance between two probability distributions, computed from samples observed from each.

HPLB


Overview

HPLB is a package intended to provide high-probability lower bounds (HPLB) for the total variation distance (TV) based on finite samples. In particular, it implements the abc and bc estimators described in Michel et al. (2020). The main idea is to compute HPLBs for TV from one-dimensional projections of the data, such as the predictions of a standard learning algorithm trained to tell the two samples apart. For more information, refer to the original paper. Examples of use of the library are shown below.
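To make the projection idea concrete, here is a minimal base-R sketch. It relies only on the standard fact that for any set A, TV(P, Q) >= Q(A) - P(A), so the gap between the two classes on a thresholded one-dimensional projection gives a plug-in estimate that targets a quantity no larger than TV. The helpers `tv_normal_shift` and `projection_gap` are hypothetical names introduced for illustration and are not part of HPLB. The result is a point estimate, not a high-probability bound, and it does not reproduce the bc or abc estimators.

```r
# Closed-form TV between N(mu0, 1) and N(mu1, 1): 2 * pnorm(|mu1 - mu0| / 2) - 1
tv_normal_shift <- function(mu0, mu1) {
  2 * pnorm(abs(mu1 - mu0) / 2) - 1
}

# Plug-in estimate of Q(A) - P(A) for A = {x > threshold}.
# Hypothetical helper, not part of HPLB; no confidence correction is applied.
projection_gap <- function(x, t, threshold) {
  mean(x[t == 1] > threshold) - mean(x[t == 0] > threshold)
}

set.seed(1)
m <- n <- 500
x <- c(rnorm(m, mean = 0), rnorm(n, mean = 2))  # samples from P, then from Q
t <- c(rep(0, m), rep(1, n))                    # 0 = drawn from P, 1 = drawn from Q

tv_true <- tv_normal_shift(0, 2)                # approximately 0.683
gap_hat <- projection_gap(x, t, threshold = 1)  # noisy estimate of the same quantity

print(c(tv_true = tv_true, gap_hat = gap_hat))
```

Because `gap_hat` fluctuates around its target, it can exceed the true TV on a given sample. A high-probability lower bound additionally has to account for this sampling variability, which is the role of the estimators in the package.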

Installation

The package should soon be available on CRAN. To install the package from GitHub, run:

install.packages("devtools")
devtools::install_github("lorismichel/HPLB")

Examples

We provide two examples, a shift in mean and a contamination example.

library(HPLB)
library(stats)
library(ranger)
library(distrEx)

## univariate shift in mean for normals, using a random forest as a lower-dimensional projection
m <- n <- 500

x.train <- c(rnorm(n = m, mean = 0), rnorm(n = n, mean = 2))
y.train <- c(rep(0, m), rep(1, n))

x.test <- c(rnorm(n = m, mean = 0), rnorm(n = n, mean = 2))
y.test <- c(rep(0, m), rep(1, n))

# fitting a classification forest
rf <- ranger(factor(y)~.,data = data.frame(y = y.train, x = x.train))
rf.prob <- ranger(factor(y)~.,data = data.frame(y = y.train, x = x.train), probability = TRUE)

# getting the predictions on the test set (hard or soft)
preds.hard <- as.numeric(predict(rf, data.frame(x = x.test))$predictions)-1
preds.soft <- predict(rf.prob, data.frame(x = x.test))$predictions[,"1"]


# Total variation distance between N(0,1) and N(2,1)
TotalVarDist(Norm(0,1),Norm(2,1))

# getting lower bounds on the total variation distance between N(0,1) and N(2,1) with different estimators

# binary classifier (bc)
HPLB(t = y.test, rho = preds.hard, estimator.type = "bc")

# adaptive binary classifier (abc)
HPLB(t = y.test, rho = preds.hard, estimator.type = "abc")


## contamination, using a random forest as a lower-dimensional projection
m <- n <- 500

x.train <- c(rnorm(n = m, mean = 0), ifelse(runif(n = n) <= 0.05, rnorm(n = n, mean = 5), rnorm(n = n, mean = 0)))
y.train <- c(rep(0, m), rep(1, n))

x.test <- c(rnorm(n = m, mean = 0), ifelse(runif(n = n) <= 0.05, rnorm(n = n, mean = 5), rnorm(n = n, mean = 0)))
y.test <- c(rep(0, m), rep(1, n))

# fitting a classification forest
rf <- ranger(factor(y)~.,data = data.frame(y = y.train, x = x.train))
rf.prob <- ranger(factor(y)~.,data = data.frame(y = y.train, x = x.train), probability = TRUE)

# getting the predictions on the test set (hard or soft)
preds.hard <- as.numeric(predict(rf, data.frame(x = x.test))$predictions)-1
preds.soft <- predict(rf.prob, data.frame(x = x.test))$predictions[,"1"]


# Total variation distance between N(0,1) and 0.05 x N(5,1) + 0.95 x N(0,1)
TotalVarDist(Norm(0,1), UnivarMixingDistribution(Norm(0,1),Norm(5,1), mixCoeff = c(0.95,0.05)))

# getting lower bounds on the total variation distance between N(0,1) and 0.05 x N(5,1) + 0.95 x N(0,1) with different estimators

# binary classifier (bc)
HPLB(t = y.test, rho = preds.hard, estimator.type = "bc")

# adaptive binary classifier (abc)
HPLB(t = y.test, rho = preds.hard, estimator.type = "abc")
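As a sanity check on the target value in the contamination example, the true TV can be worked out in base R without distrEx. With P = N(0,1), C = N(5,1) and Q = (1 - eps) P + eps C, the density difference is q - p = eps (c - p), so TV(P, Q) = eps * TV(P, C). The sketch below computes this in closed form and confirms it by numerical integration. The variable names are illustrative and not part of HPLB.

```r
eps <- 0.05

# TV between N(0,1) and N(5,1): 2 * pnorm(5 / 2) - 1
tv_components <- 2 * pnorm(5 / 2) - 1

# TV between N(0,1) and the contaminated mixture scales linearly in eps
tv_contamination <- eps * tv_components   # approximately 0.0494

# Numerical check: TV = 0.5 * integral of |p(x) - q(x)| dx
abs_density_diff <- function(x) {
  p <- dnorm(x, mean = 0)
  q <- (1 - eps) * dnorm(x, mean = 0) + eps * dnorm(x, mean = 5)
  abs(p - q)
}
# Finite limits; the mass outside [-10, 15] is negligible for these densities
tv_numeric <- 0.5 * integrate(abs_density_diff, lower = -10, upper = 15)$value

print(c(closed_form = tv_contamination, numeric = tv_numeric))
```

The target is small (about 0.05), so any valid lower bound in this example is necessarily close to zero, and a bound of exactly zero is not by itself a sign that something went wrong.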

Issues

To report an issue, please use the issue tracker on github.com.

Metadata

Version

1.0.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows