Multivariate Shift Detectors

Consider a source distribution $P_{\text{source}}$ and a target distribution $P_{\text{target}}$:

Source distribution: $(X^{\text{source}}, Y^{\text{source}}) ∼ P_{\text{source}}$
Target distribution: $(X^{\text{target}}, Y^{\text{target}}) ∼ P_{\text{target}}$

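Covariate drift (also called covariate shift) occurs when the conditional distribution of $Y$ given $X$ stays the same while the marginal distribution of $X$ changes: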

$$P_{\text{target}}(Y \mid X) = P_{\text{source}}(Y \mid X) \quad \text{but} \quad P_{\text{target}}(X) \ne P_{\text{source}}(X)$$
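To make the definition concrete, here is a small NumPy simulation. It is an illustrative sketch, not part of the repository: the inputs of the two domains are drawn from Gaussians with different means, while the labels of both domains come from the same hypothetical linear conditional `shared_conditional`. Fitting a linear model on each domain recovers roughly the same coefficients even though the input means differ, which is exactly the situation above: $P(Y \mid X)$ unchanged, $P(X)$ shifted.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_conditional(x, rng):
    # Hypothetical P(Y | X), identical for source and target: y = 2*x_0 - x_1 + noise
    return 2.0 * x[:, 0] - x[:, 1] + rng.normal(scale=0.1, size=len(x))

# P_source(X): standard normal in d = 2 dimensions
x_source = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
# P_target(X): same covariance, mean shifted by +1 in each dimension
x_target = rng.normal(loc=1.0, scale=1.0, size=(500, 2))

# Labels come from the same conditional, so only the marginal P(X) changes
y_source = shared_conditional(x_source, rng)
y_target = shared_conditional(x_target, rng)

# Least-squares fits recover (approximately) the same coefficients on both domains
coef_source = np.linalg.lstsq(x_source, y_source, rcond=None)[0]
coef_target = np.linalg.lstsq(x_target, y_target, rcond=None)[0]
print("mean of X, source vs target:", x_source.mean(axis=0).round(2), x_target.mean(axis=0).round(2))
print("fitted coefficients, source vs target:", coef_source.round(2), coef_target.round(2))
```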

Multivariate drift detectors

Maximum Mean Discrepancy Two-Sample Test - MMD Test

```python
import numpy as np
from source.mmd import MMD_test

# x_before (np.ndarray): first sample, shape (n, d), from the source distribution
# x_after (np.ndarray): second sample, shape (m, d), from the target distribution
# sigma (float): bandwidth parameter of the Gaussian kernel
# n_permutations (int): number of permutations for the permutation test

# Illustrative data only: the target sample has a shifted mean
rng = np.random.default_rng(0)
x_before = rng.normal(loc=0.0, size=(200, 2))
x_after = rng.normal(loc=0.5, size=(200, 2))

sigma = 1.0
n_permutations = 1000
mmd_statistic, mmd_perms, pval = MMD_test(x_before, x_after, sigma, n_permutations=n_permutations)
print(f"MMD Statistic: {mmd_statistic}, p-value: {pval}")
```
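For readers who want to see what such a test computes, the following is a self-contained NumPy sketch of a Gaussian-kernel MMD permutation test. It assumes the kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / (2\sigma^2))$ and the biased estimator below; whether `MMD_test` in `source.mmd` uses the same kernel parametrization, estimator, or p-value correction is not stated here, so treat `mmd2_biased` and `mmd_permutation_test` as hypothetical names.

$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n^2}\sum_{i,j} k(x_i, x_j) + \frac{1}{m^2}\sum_{i,j} k(y_i, y_j) - \frac{2}{nm}\sum_{i,j} k(x_i, y_j)$$

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    # k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)) for every pair of rows
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2_biased(x, y, sigma):
    # Biased (V-statistic) estimate of the squared MMD
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

def mmd_permutation_test(x, y, sigma=1.0, n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, sigma)
    pooled = np.vstack([x, y])
    n = len(x)
    perm_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Under H0 (same distribution) the sample labels are exchangeable
        idx = rng.permutation(len(pooled))
        perm_stats[i] = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], sigma)
    # Add-one correction: the observed split counts as one permutation
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_permutations)
    return observed, perm_stats, p_value

rng = np.random.default_rng(1)
x_before = rng.normal(loc=0.0, size=(100, 3))
x_after = rng.normal(loc=1.0, size=(100, 3))
mmd_statistic, mmd_perms, pval = mmd_permutation_test(x_before, x_after, sigma=1.0)
print(f"MMD^2 statistic: {mmd_statistic:.4f}, p-value: {pval:.4f}")
```

The permutation step reshuffles the pooled sample, so the p-value is the fraction of shuffled splits whose statistic is at least as large as the observed one.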

Log-Likelihood Ratio Test - LLR Test

```python
import numpy as np
from source.ratio import LLR_test

# x_before (np.ndarray): first sample, shape (n, d), from the source distribution
# x_after (np.ndarray): second sample, shape (m, d), from the target distribution
# bandwidth (float): bandwidth parameter of the KDE
# n_permutations (int): number of permutations for the permutation test (default 1000)

# Illustrative data only: the target sample has a shifted mean
rng = np.random.default_rng(0)
x_before = rng.normal(loc=0.0, size=(200, 2))
x_after = rng.normal(loc=0.5, size=(200, 2))

bandwidth = 0.5
n_permutations = 1000
llr_statistic, llr_perms, p_value = LLR_test(x_before, x_after, bandwidth=bandwidth, n_permutations=n_permutations)
print(f"LLR Statistic: {llr_statistic}, p-value: {p_value}")
```
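A possible way to turn kernel density estimates into a two-sample log-likelihood ratio test is sketched below. This is an assumption about the construction, not the implementation of `LLR_test`: each sample gets an isotropic Gaussian KDE with the given bandwidth, the statistic is the symmetrised mean log-likelihood ratio (each sample's own KDE against the other sample's KDE), and significance comes from a permutation test. All function names are hypothetical.

```python
import numpy as np

def kde_log_density(points, data, bandwidth):
    # Log density of an isotropic Gaussian KDE fitted on `data`, evaluated at `points`
    d = data.shape[1]
    sq_dists = (np.sum(points**2, axis=1)[:, None]
                + np.sum(data**2, axis=1)[None, :]
                - 2.0 * points @ data.T)
    log_k = -np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2)
    # Numerically stable log-sum-exp over the kernel centres
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return log_sum - np.log(len(data)) - 0.5 * d * np.log(2.0 * np.pi * bandwidth**2)

def llr_statistic(x, y, bandwidth):
    # Symmetrised mean log-likelihood ratio between the KDEs of x and y
    on_x = np.mean(kde_log_density(x, x, bandwidth) - kde_log_density(x, y, bandwidth))
    on_y = np.mean(kde_log_density(y, y, bandwidth) - kde_log_density(y, x, bandwidth))
    return on_x + on_y

def llr_permutation_test(x, y, bandwidth=0.5, n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = llr_statistic(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    perm_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Reassign points to the two samples at random (valid under H0)
        idx = rng.permutation(len(pooled))
        perm_stats[i] = llr_statistic(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_permutations)
    return observed, perm_stats, p_value

rng = np.random.default_rng(2)
x_before = rng.normal(loc=0.0, size=(100, 3))
x_after = rng.normal(loc=1.0, size=(100, 3))
llr_stat, llr_perms, p_value = llr_permutation_test(x_before, x_after, bandwidth=0.5)
print(f"LLR statistic: {llr_stat:.4f}, p-value: {p_value:.4f}")
```

Evaluating each KDE on its own sample adds an in-sample bias, but the same bias appears in every permuted split, so the permutation p-value remains a fair comparison.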

Streaming batch data simulator

Data stream with simulated mean drifts
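A stream with simulated mean drifts can be produced by a small generator like the one below. It is a sketch assuming Gaussian batches and hand-picked drift segments; `simulate_stream` and its parameters are hypothetical and not the repository's simulator.

```python
import numpy as np

def simulate_stream(n_batches=30, batch_size=50, dim=2,
                    drift_segments=((10, 15, 2.0), (22, 26, -1.5)), seed=0):
    """Yield (batch_index, batch, is_drift); the mean is shifted by `delta` inside each [start, end) segment."""
    rng = np.random.default_rng(seed)
    for t in range(n_batches):
        shift = 0.0
        for start, end, delta in drift_segments:
            if start <= t < end:
                shift = delta
        batch = rng.normal(loc=shift, scale=1.0, size=(batch_size, dim))
        yield t, batch, shift != 0.0

stream = list(simulate_stream())
print("drifting batches:", [t for t, _, is_drift in stream if is_drift])
```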

Drifts detected with MOVING reference window with LLR-test

Useful for building adaptive learning models in streaming environments: the learning model is updated or rebuilt as soon as a drift event is detected.
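A minimal sketch of the moving-window strategy, assuming the reference is simply the previous batch and using a hypothetical permutation test on the distance between batch means (`mean_shift_pvalue`) as a lightweight stand-in for the LLR test; the repository's own windowing may differ. Because the reference slides forward at every step, a sustained drift is flagged when it starts (and when it ends) rather than at every drifting batch.

```python
import numpy as np

def mean_shift_pvalue(ref, cur, n_permutations=200, seed=0):
    # Hypothetical stand-in for the LLR test: permutation test on the distance between batch means
    rng = np.random.default_rng(seed)
    observed = np.linalg.norm(ref.mean(axis=0) - cur.mean(axis=0))
    pooled = np.vstack([ref, cur])
    n = len(ref)
    count = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(pooled))
        stat = np.linalg.norm(pooled[idx[:n]].mean(axis=0) - pooled[idx[n:]].mean(axis=0))
        count += int(stat >= observed)
    # The smallest attainable p-value is 1 / (n_permutations + 1)
    return (1 + count) / (1 + n_permutations)

def detect_moving(batches, alpha=0.01):
    # The reference window is the previous batch and slides forward at every step
    alarms = []
    reference = batches[0]
    for t in range(1, len(batches)):
        if mean_shift_pvalue(reference, batches[t]) < alpha:
            alarms.append(t)  # e.g. trigger a model update or rebuild here
        reference = batches[t]
    return alarms

rng = np.random.default_rng(3)
# 20 batches; batches 8..13 have their mean shifted by +3 in every dimension
batches = [rng.normal(loc=3.0 if 8 <= t < 14 else 0.0, size=(50, 2)) for t in range(20)]
alarms = detect_moving(batches)
print("moving-window alarms at batches:", alarms)
```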

Drifts detected with FIXED reference window with LLR-test

Useful for monitoring automated systems and identifying the full duration of a concept drift. The reference period should be representative of normal operating conditions.
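With a fixed reference, every batch inside the drift period differs from the reference and is flagged, which is what makes this variant suited to measuring drift duration. The sketch below assumes the same hypothetical mean-distance permutation test (`mean_shift_pvalue`) as a stand-in for the LLR test, and a reference period drawn from normal conditions.

```python
import numpy as np

def mean_shift_pvalue(ref, cur, n_permutations=200, seed=0):
    # Hypothetical stand-in for the LLR test: permutation test on the distance between batch means
    rng = np.random.default_rng(seed)
    observed = np.linalg.norm(ref.mean(axis=0) - cur.mean(axis=0))
    pooled = np.vstack([ref, cur])
    n = len(ref)
    count = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(pooled))
        stat = np.linalg.norm(pooled[idx[:n]].mean(axis=0) - pooled[idx[n:]].mean(axis=0))
        count += int(stat >= observed)
    return (1 + count) / (1 + n_permutations)

def detect_fixed(reference, batches, alpha=0.01):
    # The reference window never moves, so every batch that differs from it is flagged
    return [t for t, batch in enumerate(batches) if mean_shift_pvalue(reference, batch) < alpha]

rng = np.random.default_rng(4)
# Representative reference period drawn from normal conditions
reference = rng.normal(loc=0.0, size=(50, 2))
# 20 batches; batches 8..13 have their mean shifted by +3 in every dimension
batches = [rng.normal(loc=3.0 if 8 <= t < 14 else 0.0, size=(50, 2)) for t in range(20)]
alarms = detect_fixed(reference, batches)
print("fixed-window alarms at batches:", alarms)
```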

Lambda framework for near real-time covariate monitoring

Offline layer: define the Reference Component

Using data collected offline, perform the following steps:

1. Define the reference distribution: select a fixed portion of the offline data to construct a stable covariate distribution representing normal conditions.
1. Simulate streaming data via batch sampling: from the remaining offline data, draw multiple batches to simulate streaming behavior. For each batch, compute the statistic of interest against the reference distribution.
1. Model the null distribution: aggregate the batch statistics into an empirical distribution (i.e., the distribution of the statistic under the null hypothesis of no drift) that captures its natural variability under normal conditions. This distribution serves as a reference and is passed to the streaming layer for real-time monitoring.
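The three offline steps can be sketched as follows, assuming the offline data are a single NumPy array whose first rows serve as the reference, and using a hypothetical `batch_statistic` (distance between mean vectors) in place of the MMD or LLR statistic. The empirical $(1-\alpha)$ quantile of the null statistics then becomes the alert threshold handed to the streaming layer.

```python
import numpy as np

def batch_statistic(reference, batch):
    # Hypothetical statistic of interest: distance between mean vectors
    # (a cheap stand-in; the MMD or LLR statistic could be plugged in instead)
    return float(np.linalg.norm(reference.mean(axis=0) - batch.mean(axis=0)))

def build_null_distribution(offline_data, reference_size, batch_size, n_batches, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Reference distribution: a fixed portion of the offline data (here, the first rows)
    reference = offline_data[:reference_size]
    remaining = offline_data[reference_size:]
    # 2. Simulate streaming by sampling batches from the remaining offline data
    null_stats = np.empty(n_batches)
    for i in range(n_batches):
        idx = rng.choice(len(remaining), size=batch_size, replace=False)
        null_stats[i] = batch_statistic(reference, remaining[idx])
    # 3. The collected statistics form the empirical null distribution
    return reference, null_stats

rng = np.random.default_rng(0)
offline_data = rng.normal(size=(2000, 3))  # illustrative "normal condition" data
reference, null_stats = build_null_distribution(offline_data, reference_size=1000,
                                                batch_size=50, n_batches=500)
alpha = 0.01
threshold = np.quantile(null_stats, 1 - alpha)  # passed on to the streaming layer
print(f"alert threshold (quantile {1 - alpha:.2f}): {threshold:.3f}")
```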

Streaming layer: define the Monitoring Component

For each incoming batch of data in streaming:

1. Compare the current batch against the reference distribution by computing the statistic of interest.
1. Check where the computed statistic falls within the null distribution derived in the offline layer. If the statistic exceeds a predefined threshold (e.g., the $(1-\alpha)$ quantile of the null distribution), flag the batch as a potential drift event and trigger an alert.
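On the streaming side, the monitoring loop only needs the reference sample, the null statistics, and a significance level $\alpha$. The sketch below reuses a hypothetical mean-distance `batch_statistic` as a stand-in for MMD or LLR; in the usage example the null statistics are generated directly from fresh normal batches rather than by the offline layer, purely to keep the block self-contained.

```python
import numpy as np

def batch_statistic(reference, batch):
    # Same hypothetical statistic as in the offline layer (distance between mean vectors)
    return float(np.linalg.norm(reference.mean(axis=0) - batch.mean(axis=0)))

def monitor(reference, null_stats, stream, alpha=0.01):
    """Yield (batch_index, statistic, alert) for each incoming batch."""
    threshold = np.quantile(null_stats, 1.0 - alpha)
    for t, batch in enumerate(stream):
        # 1. Compare the current batch against the reference
        stat = batch_statistic(reference, batch)
        # 2. Alert when the statistic exceeds the (1 - alpha) quantile of the null distribution
        alert = bool(stat > threshold)
        if alert:
            print(f"batch {t}: statistic {stat:.3f} > threshold {threshold:.3f}, drift alert")
        yield t, stat, alert

rng = np.random.default_rng(5)
reference = rng.normal(size=(1000, 3))
null_stats = np.array([batch_statistic(reference, rng.normal(size=(50, 3))) for _ in range(500)])
# Ten incoming batches; from batch 5 on, the mean shifts by +2 in every dimension
stream = [rng.normal(loc=2.0 if t >= 5 else 0.0, size=(50, 3)) for t in range(10)]
results = list(monitor(reference, null_stats, stream, alpha=0.01))
```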

giobbu/covariate-shift

Folders and files

Latest commit

History

Repository files navigation

Multivariate Shift Detectors

Multivariate drift detectors

Maximum Mean Discrepancy Two-Sample Test - MMD Test

Log-Likelihood Ratio Test - LLR Test

Streaming batch data simulator

Data stream with simulated mean drifts

Drifts detected with MOVING reference window with LLR-test

Drifts detected with FIXED reference window with LLR-test

Lambda framework for near real-time covariate monitoring

Offline layer: define the Reference Component

Streaming layer: define Monitoring component

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages