Description

XOR Pattern Detection and Visualization.

Provides tools for detecting XOR-like patterns in variable pairs in two-class data sets. Includes visualizations for pattern exploration and reporting capabilities with both text and HTML output formats.

detectXOR: XOR pattern detection and visualization in R

Provides tools for detecting XOR-like patterns in variable pairs. Includes visualizations for pattern exploration.

Overview

Traditional feature selection methods often miss complex non-linear relationships where variables interact to produce class differences. The detectXOR package specifically targets XOR patterns - relationships where class discrimination only emerges through variable interactions, not individual variables alone.
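
To make the idea concrete, here is a minimal base R sketch using simulated data (not the package's XOR_data, and not the package's own detection code). The variable names and the sign-based class rule are assumptions chosen for illustration. Each variable is tested on its own with a Wilcoxon rank-sum test, and then the two variables are tested jointly via a χ² test on sign quadrants.

```r
# Simulated two-class data with an XOR structure (illustrative only).
set.seed(42)
n   <- 400
x1  <- runif(n, -1, 1)
x2  <- runif(n, -1, 1)
# Class 1 when both variables share a sign, class 0 otherwise.
cls <- as.integer(x1 * x2 > 0)

# Each variable alone: Wilcoxon rank-sum test between the two classes.
p_x1 <- wilcox.test(x1 ~ cls)$p.value
p_x2 <- wilcox.test(x2 ~ cls)$p.value

# Both variables together: sign quadrants cross-tabulated against class.
quadrant <- interaction(x1 > 0, x2 > 0)
p_joint  <- chisq.test(table(quadrant, cls))$p.value

c(x1_alone = p_x1, x2_alone = p_x2, joint = p_joint)
```

In data like this the single-variable tests carry no class signal by construction, while the joint test is driven almost entirely by the interaction.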

Key capabilities

🔍 XOR pattern detection - Statistical identification using χ² and Wilcoxon tests
📈 Correlation analysis - Class-wise Kendall τ coefficients
📊 Visualization - Spaghetti plots and decision boundary visualizations
Parallel processing - Multi-core acceleration for large datasets
🔬 Robust statistics - Winsorization and scaling options for outlier handling

Installation

Install the development version from GitHub:

# Install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") }
# Install detectXOR
devtools::install_github("JornLotsch/detectXOR")

Dependencies

The package requires R ≥ 3.5.0 and depends on:

  • dplyr, tibble (data manipulation)
  • ggplot2, ggh4x, scales (visualization)
  • future, future.apply, pbmcapply, parallel (parallel processing)
  • reshape2, glue (data processing and string manipulation)
  • DescTools (statistical tools)
  • Base R packages: stats, utils, methods, grDevices

Optional packages (suggested):

  • testthat, knitr, rmarkdown (development and documentation)
  • doParallel, foreach (additional parallel processing options)

Quick start

Basic XOR detection

library(detectXOR)
# Load example data
data(XOR_data)
# Detect XOR patterns with default settings
results <- detect_xor(XOR_data, class_col = "class")
# View summary
print(results$results_df)

Usage with custom parameters

# Detection with custom thresholds and parallel processing
results <- detect_xor(
  data = XOR_data,
  class_col = "class",
  p_threshold = 0.01,
  tau_threshold = 0.4,
  max_cores = 4,
  extreme_handling = "winsorize",
  scale_data = TRUE
)

Function parameters

detect_xor() - Main detection function

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | data.frame | required | Input dataset with variables and class column |
| class_col | character | "class" | Name of the class/target variable column |
| check_tau | logical | TRUE | Compute class-wise Kendall τ correlations |
| compute_axes_parallel_significance | logical | TRUE | Perform group-wise Wilcoxon tests |
| p_threshold | numeric | 0.05 | Significance threshold for statistical tests |
| tau_threshold | numeric | 0.3 | Minimum absolute τ for "strong" correlation |
| abs_diff_threshold | numeric | 20 | Minimum absolute difference for practical significance |
| split_method | character | "quantile" | Tile splitting method: "quantile" or "range" |
| max_cores | integer | NULL | Maximum cores for parallel processing (auto-detect if NULL) |
| extreme_handling | character | "winsorize" | Outlier handling: "winsorize", "remove", or "none" |
| winsor_limits | numeric vector | c(0.05, 0.95) | Winsorization percentiles |
| scale_data | logical | TRUE | Standardize variables before analysis |
| use_complete | logical | TRUE | Use only complete cases (remove NA values) |
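
As a rough illustration of what the extreme_handling = "winsorize", winsor_limits and scale_data options describe, the following base R sketch clamps a vector to its 5th and 95th percentiles and then standardizes it. The helper winsorize_vec is hypothetical and written only for this example; the package may implement these steps differently (for instance through DescTools).

```r
# Hypothetical helper: clamp values to the given quantiles (winsorization).
winsorize_vec <- function(x, limits = c(0.05, 0.95)) {
  q <- quantile(x, probs = limits, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

set.seed(1)
x    <- c(rnorm(98), 50, -40)      # 98 regular values plus two extreme outliers
x_w  <- winsorize_vec(x)           # outliers pulled in to the 5%/95% quantiles
x_ws <- as.numeric(scale(x_w))     # centred to mean 0, scaled to sd 1

range(x)    # original range, dominated by the outliers
range(x_w)  # range after winsorization
```

Winsorizing before scaling keeps a few extreme values from inflating the standard deviation and compressing the rest of the data.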

Output structure

The detect_xor() function returns a list with two components:

results_df - Summary data frame

| Column | Description |
|---|---|
| var1, var2 | Variable pair names |
| xor_shape_detected | Logical: XOR pattern identified |
| chi_sq_p_value | χ² test p-value for tile independence |
| practical_significance | Logical: meets practical significance threshold |
| tau_class_0, tau_class_1 | Class-wise Kendall τ coefficients |
| tau_difference | Absolute difference between class τ values |
| wilcox_p_x, wilcox_p_y | Wilcoxon test p-values for each axis |
| significant_wilcox | Logical: significant group differences detected |
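
Because results_df is an ordinary data frame, it can be filtered and sorted with base R. The sketch below uses a small mock data frame with a subset of the columns listed above; its values are invented placeholders, not output from the package.

```r
# Mock stand-in for results$results_df; all values are invented placeholders.
results_df <- data.frame(
  var1               = c("A", "A", "B"),
  var2               = c("B", "C", "C"),
  xor_shape_detected = c(TRUE, FALSE, TRUE),
  chi_sq_p_value     = c(0.001, 0.40, 0.03),
  stringsAsFactors   = FALSE
)

# Keep only pairs flagged as XOR-like, most significant first.
hits <- results_df[results_df$xor_shape_detected, ]
hits <- hits[order(hits$chi_sq_p_value), ]
hits[, c("var1", "var2", "chi_sq_p_value")]
```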

pair_list - Detailed results

Contains comprehensive analysis for each variable pair including:

  • Tile pattern analysis results
  • Statistical test outputs
  • Processed data subsets
  • Intermediate calculations

Visualization functions

| Function | Description | Key Parameters |
|---|---|---|
| generate_spaghetti_plot_from_results() | Creates connected line plots showing variable trajectories for XOR-detected pairs | results, data, class_col, scale_data = TRUE |
| generate_xy_plot_from_results() | Generates scatter plots with decision boundary lines for detected XOR patterns | results, data, class_col, scale_data = TRUE, quantile_lines = c(1/3, 2/3), line_method = "quantile" |

Both functions return ggplot objects that can be displayed or saved manually.

# Generate plots
generate_spaghetti_plot_from_results(results, XOR_data) 
generate_xy_plot_from_results(results, XOR_data)

Reporting functions

| Function | Description | Key Parameters |
|---|---|---|
| generate_xor_reportConsole() | Creates console-friendly formatted report with optional plots | results, data, class_col, scale_data = TRUE, show_plots = TRUE |
| generate_xor_reportHTML() | Generates comprehensive HTML report with interactive elements | results, data, class_col, output_file, open_browser = TRUE |

Example report

# Generate formatted report 
generate_xor_reportHTML(results, XOR_data, class_col = "class")

The report opens automatically in the system's default web browser.

Methodology

XOR detection pipeline

  1. Pairwise dataset creation - Extract all variable pairs with preprocessing
  2. Tile pattern analysis - Divide variable space into 2×2 tiles and test for XOR-like distributions
  3. Statistical validation - Apply χ² tests for independence and Wilcoxon tests for group differences
  4. Correlation analysis - Compute class-wise Kendall τ to quantify relationship strength
  5. Result aggregation - Combine findings into interpretable summary format
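
The pipeline steps above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the helper tile_test is hypothetical, tiles are formed by a median split (an assumption), the Kendall τ step is omitted, and a significant χ² result here only indicates some association between tiles and class rather than an XOR shape specifically.

```r
set.seed(7)
n   <- 300
dat <- data.frame(A = runif(n, -1, 1), B = runif(n, -1, 1), C = runif(n, -1, 1))
dat$class <- as.integer(dat$A * dat$B > 0)   # XOR structure between A and B only

# Step 1: enumerate all variable pairs.
vars  <- setdiff(names(dat), "class")
pairs <- combn(vars, 2, simplify = FALSE)

# Hypothetical helper covering steps 2 and 3 for one pair.
tile_test <- function(pair, data, class_col = "class") {
  # Step 2: split each variable at its median to form 2x2 tiles.
  tx   <- data[[pair[1]]] > median(data[[pair[1]]])
  ty   <- data[[pair[2]]] > median(data[[pair[2]]])
  tile <- interaction(tx, ty)
  # Step 3: chi-squared test of tile membership against class.
  p <- chisq.test(table(tile, data[[class_col]]))$p.value
  data.frame(var1 = pair[1], var2 = pair[2], chi_sq_p_value = p)
}

# Step 5: aggregate per-pair results into one summary table.
summary_df <- do.call(rbind, lapply(pairs, tile_test, data = dat))
summary_df[order(summary_df$chi_sq_p_value), ]
```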

Statistical tests

  • χ² Test: Tests independence of tile patterns vs. random distribution
  • Wilcoxon rank sum: Evaluates group differences along variable axes
  • Kendall τ: Measures monotonic correlation within each class separately
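
To show why Kendall τ is computed per class rather than on the pooled data, here is a small base R sketch on simulated data. The X-shaped layout (one class rising, the other falling) is an assumption made for the example; it uses cor() with method = "kendall" from the stats package.

```r
set.seed(3)
n   <- 150
x   <- runif(2 * n, -1, 1)
cls <- rep(c(0, 1), each = n)
# Class 0 follows a rising trend, class 1 a falling one (an X-shaped layout).
y   <- ifelse(cls == 0, x, -x) + rnorm(2 * n, sd = 0.3)

# Kendall tau within each class separately.
tau_by_class <- sapply(split(data.frame(x, y), cls),
                       function(d) cor(d$x, d$y, method = "kendall"))
# Kendall tau on the pooled data, ignoring class.
tau_pooled <- cor(x, y, method = "kendall")
# Absolute difference between the class-wise coefficients.
tau_difference <- abs(tau_by_class[["0"]] - tau_by_class[["1"]])

round(c(tau_by_class, pooled = tau_pooled, difference = tau_difference), 2)
```

The opposite-signed class-wise coefficients largely cancel in the pooled estimate, which is why a pooled correlation can hide this kind of structure.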

Use cases

Machine learning

  • Feature selection enhancement - Identify interaction features that complement traditional univariate methods
  • Variable interaction discovery - Find synergistic variable pairs where class separation emerges only through combined effects
  • Preprocessing for ensemble methods - Generate interaction features for boosting algorithms and neural networks
  • Dimensionality reduction guidance - Preserve important variable interactions when reducing feature space
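
As an illustration of the feature-selection use case, the sketch below fits two logistic regressions with glm() on simulated XOR data: one with main effects only and one that adds the interaction term. The data, the 10% label noise, and the accuracy helper are assumptions for this example and are not part of the package.

```r
set.seed(11)
n  <- 400
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
y  <- as.integer(x1 * x2 > 0)
# Flip 10% of labels so the classes are not perfectly separable.
flip    <- runif(n) < 0.1
y[flip] <- 1L - y[flip]
d <- data.frame(x1, x2, y)

# Hypothetical helper: in-sample classification accuracy at a 0.5 cut-off.
accuracy <- function(fit, data) {
  pred <- predict(fit, newdata = data, type = "response") > 0.5
  mean(pred == (data$y == 1))
}

fit_main <- glm(y ~ x1 + x2, family = binomial, data = d)   # main effects only
fit_int  <- glm(y ~ x1 * x2, family = binomial, data = d)   # adds x1:x2 interaction

acc_main <- accuracy(fit_main, d)
acc_int  <- accuracy(fit_int, d)
c(main_effects = acc_main, with_interaction = acc_int)
```

A variable pair flagged as XOR-like is a natural candidate for this kind of explicit interaction term.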

Technical details

Cross-platform compatibility

  • Windows: Uses future::multisession for parallel processing
  • Unix/Linux/macOS: Uses pbmcapply::pbmclapply with fork-based parallelism
  • Memory management: Automatic chunk-based processing for large datasets
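
The platform split described above can be sketched with R's base parallel package, used here as a stand-in for future and pbmcapply. The helper par_apply is hypothetical and only illustrates the pattern of choosing a socket cluster on Windows and fork-based workers elsewhere.

```r
library(parallel)

# Hypothetical helper: apply f over xs using a platform-appropriate backend.
par_apply <- function(xs, f, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # Windows cannot fork, so start separate worker processes instead.
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))
    parLapply(cl, xs, f)
  } else {
    # Unix-alikes can fork the current R session.
    mclapply(xs, f, mc.cores = cores)
  }
}

squares <- par_apply(1:8, function(i) i^2)
unlist(squares)
```

Forked workers share the parent session's memory, whereas socket workers start empty and need data and packages sent to them, which is one reason the two platforms are handled separately.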

Package structure

detectXOR/
├── R/                 # Package source code
├── man/               # Package documentation
├── data/              # Example dataset
├── issues/            # Problem reporting
└── analyses/          # Files used to generate or plot publication data sets (not in library)

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests on GitHub.

License

GPL-3

Citation

For citation details or to request a formal publication reference, please contact the maintainer.

Metadata

Version

0.1.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows