Description

Sample-Based Estimation of Kullback-Leibler Divergence.

Estimation algorithms for Kullback-Leibler divergence between two probability distributions, based on one or two samples, and including uncertainty quantification. Distributions can be uni- or multivariate and continuous, discrete or mixed.

kldest: Kullback-Leibler divergence estimation

The goal of kldest is to estimate Kullback-Leibler (KL) divergence $D_{KL}(P||Q)$ between two probability distributions $P$ and $Q$ based on:

  • a sample $x_1,...,x_n$ from $P$ and the probability density $q$ of $Q$, or
  • samples $x_1,...,x_n$ from $P$ and $y_1,...,y_m$ from $Q$.

The distributions $P$ and $Q$ may be uni- or multivariate, and they may be discrete, continuous or mixed discrete/continuous.

For continuous distributions, different estimation algorithms are provided, based either on nearest neighbour density estimation or on kernel density estimation. Confidence intervals for KL divergence can also be computed, via either subsampling (preferred) or bootstrapping.
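
As a minimal sketch of how the two estimator families compare on the same data in the one-dimensional, two-sample case: the nearest neighbour estimator `kld_est_nn()` is documented below, while the kernel-density estimator name `kld_est_kde1()` and its default arguments are assumptions based on the package reference.

library(kldest)
set.seed(0)
X <- rnorm(100)                    # sample from P = N(0, 1)
Y <- rnorm(100, mean = 1, sd = 2)  # sample from Q = N(1, 4)
kld_est_nn(X, Y)                   # nearest neighbour based estimate
kld_est_kde1(X, Y)                 # kernel density based estimate (assumed API)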

Installation

You can install kldest from CRAN:

install.packages("kldest")

Alternatively, you can install the development version of kldest from GitHub with:

# install.packages("devtools")
devtools::install_github("niklhart/kldest")

A minimal example for KL divergence estimation

KL divergence estimation based on nearest neighbour density estimates is the most flexible approach.

library(kldest)

Set a seed for reproducibility:

set.seed(0)

KL divergence between 1-D Gaussians

Analytical KL divergence:

kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
#> [1] 0.4431472
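
This value can be checked against the closed-form expression for univariate Gaussians, here with means $\mu_1 = 0$, $\mu_2 = 1$ and standard deviations $\sigma_1 = 1$, $\sigma_2 = 2$:

$$D_{KL}(P||Q) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2} = \log 2 + \frac{1 + 1}{8} - \frac{1}{2} \approx 0.4431$$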

Estimate based on two samples from these Gaussians:

X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
kld_est_nn(X, Y)
#> [1] 0.2169136
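
With larger samples, the nearest neighbour estimate typically moves closer to the true value of roughly 0.44. A quick, illustrative check (output omitted; results depend on the random seed):

n <- 10000
kld_est_nn(rnorm(n), rnorm(n, mean = 1, sd = 2))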

Estimate based on a sample from the first Gaussian and the density of the second:

q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_est_nn(X, q = q)
#> [1] 0.6374628

Uncertainty quantification via subsampling:

kld_ci_subsampling(X, q = q)
#> $est
#> [1] 0.6374628
#> 
#> $ci
#>      2.5%     97.5% 
#> 0.2601375 0.9008446
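
Subsampling also applies in the two-sample setting; a minimal sketch, assuming `kld_ci_subsampling()` accepts a second sample in the same way as the estimators, and that the bootstrap alternative mentioned above is exposed as `kld_ci_bootstrap()` (both assumptions here):

kld_ci_subsampling(X, Y)
# kld_ci_bootstrap(X, Y)  # bootstrap-based CI (assumed function name)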

KL divergence between 2-D Gaussians

Analytical KL divergence between an uncorrelated and a correlated Gaussian:

kld_gaussian(mu1 = rep(0, 2), sigma1 = diag(2),
             mu2 = rep(0, 2), sigma2 = matrix(c(1, 1, 1, 2), nrow = 2))
#> [1] 0.5
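
This matches the general formula for $d$-dimensional Gaussians: with $\mu_1 = \mu_2$, $\Sigma_1 = I$ and $\Sigma_2 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, so that $\det\Sigma_2 = 1$ and $\operatorname{tr}(\Sigma_2^{-1}) = 3$,

$$D_{KL}(P||Q) = \frac{1}{2}\left(\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^\top\Sigma_2^{-1}(\mu_2 - \mu_1) - d + \log\frac{\det\Sigma_2}{\det\Sigma_1}\right) = \frac{1}{2}\left(3 + 0 - 2 + 0\right) = 0.5$$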

Estimate based on two samples from these Gaussians:

X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1,X2)
Y <- cbind(Y1,Y2)

kld_est_nn(X, Y)
#> [1] 0.3358918
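
Discrete distributions are supported as well; a minimal sketch, assuming the package's discrete estimator is exposed as `kld_est_discrete()` with the same two-sample interface (an assumption, not shown above):

# Two samples from binomial distributions (assumed kld_est_discrete API)
X <- rbinom(1000, size = 10, prob = 0.4)
Y <- rbinom(1000, size = 10, prob = 0.5)
kld_est_discrete(X, Y)
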
Metadata

Version

1.0.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows