Provides tools for detecting XOR-like patterns in variable pairs. Includes visualizations for pattern exploration.
Traditional feature selection methods often miss complex non-linear relationships where variables interact to produce class differences. The detectXOR package specifically targets XOR patterns - relationships where class discrimination only emerges through variable interactions, not individual variables alone.
🔍 XOR pattern detection - Statistical identification using χ² and Wilcoxon tests
📈 Correlation analysis - Class-wise Kendall τ coefficients
📊 Visualization - Spaghetti plots and decision boundary visualizations
⚡ Parallel processing - Multi-core acceleration for large datasets
🔬 Robust statistics - Winsorization and scaling options for outlier handling
Install the development version from GitHub:
# Install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") }
# Install detectXOR
devtools::install_github("JornLotsch/detectXOR")The package requires R ≥ 3.5.0 and depends on:
dplyr,tibble(data manipulation)ggplot2,ggh4x,scales(visualization)future,future.apply,pbmcapply,parallel(parallel processing)reshape2,glue(data processing and string manipulation)DescTools(statistical tools)- Base R packages:
stats,utils,methods,grDevices
Optional packages (suggested):
testthat,knitr,rmarkdown(development and documentation)doParallel,foreach(additional parallel processing options)
library(detectXOR)
# Load example data
data(XOR_data)
# Detect XOR patterns with default settings
results <- detectXOR(XOR_data, class_col = "class")
# View summary
print(results$results_df)# Detection with custom thresholds and parallel processing
results <- detect_xor(
data = XOR_data,
class_col = "class",
p_threshold = 0.01,
tau_threshold = 0.4,
max_cores = 4,
extreme_handling = "winsorize",
scale_data = TRUE
)| Parameter | Type | Default | Description |
|---|---|---|---|
data |
data.frame | required | Input dataset with variables and class column |
class_col |
character | "class" |
Name of the class/target variable column |
check_tau |
logical | TRUE |
Compute class-wise Kendall τ correlations |
compute_axes_parallel_significance |
logical | TRUE |
Perform group-wise Wilcoxon tests |
p_threshold |
numeric | 0.05 |
Significance threshold for statistical tests |
tau_threshold |
numeric | 0.3 |
Minimum absolute τ for "strong" correlation |
abs_diff_threshold |
numeric | 20 |
Minimum absolute difference for practical significance |
split_method |
character | "quantile" |
Tile splitting method: "quantile" or "range" |
max_cores |
integer | NULL |
Maximum cores for parallel processing (auto-detect if NULL) |
extreme_handling |
character | "winsorize" |
Outlier handling: "winsorize", "remove", or "none" |
winsor_limits |
numeric vector | c(0.05, 0.95) |
Winsorization percentiles |
scale_data |
logical | TRUE |
Standardize variables before analysis |
use_complete |
logical | TRUE |
Use only complete cases (remove NA values) |
The detectXOR() function returns a list with two components:
| Column | Description |
|---|---|
var1, var2 |
Variable pair names |
xor_shape_detected |
Logical: XOR pattern identified |
chi_sq_p_value |
χ² test p-value for tile independence |
tau_class_0, tau_class_1 |
Class-wise Kendall τ coefficients |
tau_difference |
Absolute difference between class τ values |
wilcox_p_x, wilcox_p_y |
Wilcoxon test p-values for each axis |
significant_wilcox |
Logical: significant group differences detected |
Contains comprehensive analysis for each variable pair including:
- Tile pattern analysis results
- Statistical test outputs
- Processed data subsets
- Intermediate calculations
| Function | Description | Key Parameters |
|---|---|---|
generate_spaghetti_plot_from_results() |
Creates connected line plots showing variable trajectories for XOR-detected pairs | results, data, class_col, scale_data = TRUE |
generate_xy_plot_from_results() |
Generates scatter plots with decision boundary lines for detected XOR patterns | results, data, class_col, scale_data = TRUE, quantile_lines = c(1/3, 2/3), line_method = "quantile" |
Both functions return ggplot objects that can be displayed or saved manually.
# Generate plots
generate_spaghetti_plot_from_results(results, XOR_data)
generate_xy_plot_from_results(results, XOR_data)| Function | Description | Key Parameters |
|---|---|---|
generate_xor_reportConsole() |
Creates console-friendly formatted report with optional plots | results, data, class_col, scale_data = TRUE, show_plots = TRUE |
generate_xor_reportHTML() |
Generates comprehensive HTML report with interactive elements | results, data, class_col, output_file, open_browser = TRUE |
# Generate formatted report
generate_xor_reportHTML(results, XOR_data, class_col = "class")The report will be automaticlaly opened in the system standard web browser.
- Pairwise dataset creation - Extract all variable pairs with preprocessing
- Tile pattern analysis - Divide variable space into 2×2 tiles and test for XOR-like distributions
- Statistical validation - Apply χ² tests for independence and Wilcoxon tests for group differences
- Correlation analysis - Compute class-wise Kendall τ to quantify relationship strength
- Result aggregation - Combine findings into interpretable summary format
- χ² Test: Tests independence of tile patterns vs. random distribution
- Wilcoxon rank sum: Evaluates group differences along variable axes
- Kendall τ: Measures monotonic correlation within each class separately
- Feature selection enhancement - Identify interaction features that complement traditional univariate methods
- Variable interaction discovery - Find synergistic variable pairs where class separation emerges only through combined effects
- Preprocessing for ensemble methods - Generate interaction features for boosting algorithms and neural networks
- Dimensionality reduction guidance - Preserve important variable interactions when reducing feature space
- Windows: Uses
future::multisessionfor parallel processing - Unix/Linux/macOS: Uses
pbmcapply::pbmclapplywith fork-based parallelism - Memory management: Automatic chunk-based processing for large datasets
detectXOR/
├── R/ # Package source code
├── man/ # Package documentation
├── data/ # Example dataset
├── issues/ # Problem reporting
└── analyses/ # Files used to generate or plot publictaion data sets (not in library)Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests on GitHub.
GPL-3
For citation details or to request a formal publication reference, please contact the maintainer.