Description

Sequential Outlier Identification for Model-Based Clustering.

Sequential outlier identification for Gaussian mixture models using the distribution of Mahalanobis distances. The optimal number of outliers is chosen based on the dissimilarity between the theoretical and observed distributions of the scaled squared sample Mahalanobis distances. Also includes an extension for Gaussian linear cluster-weighted models using the distribution of studentized residuals. Doherty, McNicholas, and White (2025) <doi:10.48550/arXiv.2505.11668>.
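
The idea in the description can be sketched in base R for the simplest case of a single Gaussian component. This is an illustrative sketch under stated assumptions, not the outlierMBC implementation. It relies on the standard result that, for n points from a p-variate normal distribution, the squared sample Mahalanobis distances scaled by n / (n - 1)^2 follow a Beta(p / 2, (n - p - 1) / 2) distribution. The simulated data, the helper name `beta_dissimilarity`, and the choice of mean absolute CDF gap as the dissimilarity are all assumptions made for illustration.

```r
# Illustrative sketch only: one Gaussian component, base R,
# not the algorithm used inside outlierMBC.
set.seed(123)
n_in <- 500
n_out <- 10
p <- 2
inliers <- matrix(rnorm(n_in * p), ncol = p)
# Hypothetical contamination placed far from the bulk of the data.
outliers <- matrix(runif(n_out * p, min = 5, max = 8), ncol = p)
x <- rbind(inliers, outliers)

# Assumed dissimilarity: mean absolute gap between the empirical CDF of the
# scaled squared sample Mahalanobis distances and the Beta CDF.
beta_dissimilarity <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
  scaled <- sort(n / (n - 1)^2 * d2)
  mean(abs(seq_len(n) / n - pbeta(scaled, p / 2, (n - p - 1) / 2)))
}

# Sequentially remove the most extreme point and record the dissimilarity
# after 0, 1, ..., max_out removals.
max_out <- 20
dissim <- numeric(max_out + 1)
remaining <- x
for (k in 0:max_out) {
  dissim[k + 1] <- beta_dissimilarity(remaining)
  d2 <- mahalanobis(remaining, colMeans(remaining), cov(remaining))
  remaining <- remaining[-which.max(d2), , drop = FALSE]
}

# The number of removals with the smallest dissimilarity is the estimate.
best_k <- which.min(dissim) - 1
```

Plotting `dissim` against `0:max_out` gives a curve whose minimum marks the chosen number of outliers in this sketch. The package itself works with mixtures of several components and refits the model at each step, which this single-component example does not attempt.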

outlierMBC

Ultán P. Doherty 2025-05-08

Gaussian Mixture Models

library(outlierMBC)
library(dplyr)
library(ggplot2)

gross_gmm_k3n1000o10 <- find_gross(gmm_k3n1000o10[, 1:2], max_out = 20)

ombc_gmm_k3n1000o10 <- ombc_gmm(
  gmm_k3n1000o10[, 1:2],
  comp_num = 3,
  max_out = 20,
  gross_outs = gross_gmm_k3n1000o10$gross_bool
)

print(ombc_gmm_k3n1000o10)
## Starting number of data points:   1010 
## Maximum number of outliers:   20 
## Number of gross outliers:     5 
## Final number of outliers:     10 (minimum dissimilarity)
plot(ombc_gmm_k3n1000o10)

gmm_k3n1000o10 |>
  mutate(ombc = as.factor(ombc_gmm_k3n1000o10$labels), G = as.factor(G)) |>
  ggplot(aes(x = X1, y = X2, colour = ombc, shape = G)) +
  geom_point() +
  labs(colour = "outlierMBC", shape = "Simulation") +
  ggokabeito::scale_colour_okabe_ito(order = c(9, 1:3))

Linear Cluster-Weighted Models

gross_lcwm_k3n1000o10 <- find_gross(lcwm_k3n1000o10[, 1:2], max_out = 20)

ombc_lcwm_k3n1000o10 <- ombc_gmm(
  lcwm_k3n1000o10[, 1:2],
  comp_num = 3,
  max_out = 20,
  gross_outs = gross_lcwm_k3n1000o10$gross_bool
)

print(ombc_lcwm_k3n1000o10)
## Starting number of data points:   1010 
## Maximum number of outliers:   20 
## Number of gross outliers:     0 
## Final number of outliers:     10 (minimum dissimilarity)
plot(ombc_lcwm_k3n1000o10)

lcwm_k3n1000o10 |>
  mutate(ombc = as.factor(ombc_lcwm_k3n1000o10$labels), G = as.factor(G)) |>
  ggplot(aes(x = X1, y = Y, colour = ombc, shape = G)) +
  geom_point() +
  labs(colour = "outlierMBC", shape = "Simulation") +
  ggokabeito::scale_colour_okabe_ito(order = c(9, 1:3))

Metadata

Version

0.0.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows