We consider two distributions:

- Source distribution: $(X^{\text{source}}, Y^{\text{source}}) \sim P_{\text{source}}$
- Target distribution: $(X^{\text{target}}, Y^{\text{target}}) \sim P_{\text{target}}$
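Covariate drift happens when the distribution of the inputs changes while the relationship between inputs and outputs stays the same:

$$P_{\text{source}}(X) \neq P_{\text{target}}(X), \qquad P_{\text{source}}(Y \mid X) = P_{\text{target}}(Y \mid X)$$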
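A small synthetic illustration of covariate drift (our own example, not from the source). The source inputs are drawn from $N(0, 1)$ and the target inputs from $N(3, 1)$, while both domains share the same assumed conditional $Y = 2X + 1 + \varepsilon$. A line fitted on each domain recovers roughly the same relationship even though $P(X)$ differs.

```python
import numpy as np

rng = np.random.default_rng(42)

def label(x, rng):
    # Shared conditional P(Y | X), assumed for illustration: y = 2x + 1 + small Gaussian noise.
    return 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

# P(X) differs between domains: source inputs centred at 0, target inputs centred at 3.
x_source = rng.normal(loc=0.0, scale=1.0, size=5000)
x_target = rng.normal(loc=3.0, scale=1.0, size=5000)
y_source = label(x_source, rng)
y_target = label(x_target, rng)

# The input means differ (drift in P(X))...
print(f"mean X: source={x_source.mean():.2f}, target={x_target.mean():.2f}")

# ...but a least-squares line fitted on each domain recovers the same P(Y | X).
slope_s, intercept_s = np.polyfit(x_source, y_source, deg=1)
slope_t, intercept_t = np.polyfit(x_target, y_target, deg=1)
print(f"source fit: y = {slope_s:.2f}x + {intercept_s:.2f}")
print(f"target fit: y = {slope_t:.2f}x + {intercept_t:.2f}")
```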
```python
import numpy as np
from source.mmd import MMD_test

# Placeholder samples for illustration; replace them with your own data.
# x_before: array of shape (n, d) from the source (reference) distribution.
# x_after:  array of shape (m, d) from the target distribution.
rng = np.random.default_rng(0)
x_before = rng.normal(size=(200, 2))
x_after = rng.normal(loc=0.5, size=(200, 2))

sigma = 1.0            # bandwidth of the Gaussian kernel
n_permutations = 1000  # number of permutations for the permutation test

mmd_statistic, mmd_perms, pval = MMD_test(x_before, x_after, sigma, n_permutations=n_permutations)
print(f"MMD Statistic: {mmd_statistic}, p-value: {pval}")
```
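To show what an MMD permutation test computes, here is a minimal NumPy sketch. It is an assumption-based illustration, not the implementation behind `source.mmd.MMD_test`. It uses the biased estimator of the squared MMD with a Gaussian kernel of bandwidth `sigma` and obtains a p-value by permuting the pooled samples. The names `gaussian_kernel`, `mmd2` and `mmd_permutation_test` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for every pair of rows of a (n, d) and b (m, d).
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma):
    # Biased (V-statistic) estimate of the squared MMD between samples x and y.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

def mmd_permutation_test(x, y, sigma=1.0, n_permutations=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, sigma)
    pooled = np.vstack([x, y])
    n = len(x)
    perms = np.empty(n_permutations)
    for i in range(n_permutations):
        # Under H0 (same distribution) the sample labels are exchangeable, so reshuffle them.
        idx = rng.permutation(len(pooled))
        perms[i] = mmd2(pooled[idx[:n]], pooled[idx[n:]], sigma)
    # Add-one correction keeps the permutation p-value strictly positive.
    p_value = (1 + np.sum(perms >= observed)) / (1 + n_permutations)
    return observed, perms, p_value

# Usage with placeholder data: the same distribution vs. a mean-shifted one.
rng = np.random.default_rng(1)
x_before = rng.normal(size=(100, 2))
x_same = rng.normal(size=(100, 2))
x_shifted = rng.normal(loc=1.0, size=(100, 2))
stat_shift, perms_shift, p_shift = mmd_permutation_test(x_before, x_shifted, n_permutations=200)
stat_same, perms_same, p_same = mmd_permutation_test(x_before, x_same, n_permutations=200)
print(f"shifted: MMD^2={stat_shift:.4f}, p={p_shift:.3f}")
print(f"same:    MMD^2={stat_same:.4f}, p={p_same:.3f}")
```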
```python
import numpy as np
from source.ratio import LLR_test

# Placeholder samples for illustration; replace them with your own data.
# x_before: array of shape (n, d) from the source (reference) distribution.
# x_after:  array of shape (m, d) from the target distribution.
rng = np.random.default_rng(0)
x_before = rng.normal(size=(200, 2))
x_after = rng.normal(loc=0.5, size=(200, 2))

bandwidth = 0.5        # bandwidth of the kernel density estimate (KDE)
n_permutations = 1000  # number of permutations for the permutation test (default: 1000)

llr_statistic, llr_perms, p_value = LLR_test(x_before, x_after, bandwidth=bandwidth, n_permutations=n_permutations)
print(f"LLR Statistic: {llr_statistic}, p-value: {p_value}")
```
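A KDE-based likelihood-ratio statistic can be sketched in NumPy in the same way. We do not know exactly which statistic `LLR_test` computes. As an assumption, this sketch fits an isotropic Gaussian KDE with the given `bandwidth` to each sample, uses the symmetrised average log-likelihood ratio (a plug-in estimate of the symmetric KL divergence) as the statistic, and calibrates it with a permutation test. All function names are hypothetical.

```python
import numpy as np

def kde_log_density(points, data, bandwidth):
    # Log-density of an isotropic Gaussian KDE fitted on `data` (n, d), evaluated at `points` (m, d).
    n, d = data.shape
    sq_dists = np.sum((points[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    log_kernels = -sq_dists / (2.0 * bandwidth**2)
    # Numerically stable log-sum-exp over the n kernel centres.
    max_log = log_kernels.max(axis=1, keepdims=True)
    log_sum = max_log[:, 0] + np.log(np.exp(log_kernels - max_log).sum(axis=1))
    return log_sum - np.log(n) - 0.5 * d * np.log(2.0 * np.pi * bandwidth**2)

def llr_statistic(x, y, bandwidth):
    # Symmetrised average log-likelihood ratio between the KDEs fitted on x and on y.
    llr_y = np.mean(kde_log_density(y, y, bandwidth) - kde_log_density(y, x, bandwidth))
    llr_x = np.mean(kde_log_density(x, x, bandwidth) - kde_log_density(x, y, bandwidth))
    return llr_x + llr_y

def llr_permutation_test(x, y, bandwidth=0.5, n_permutations=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = llr_statistic(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    perms = np.empty(n_permutations)
    for i in range(n_permutations):
        # Reshuffle the pooled samples; sample sizes stay fixed, so any KDE bias is shared with `observed`.
        idx = rng.permutation(len(pooled))
        perms[i] = llr_statistic(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
    p_value = (1 + np.sum(perms >= observed)) / (1 + n_permutations)
    return observed, perms, p_value

# Usage with placeholder data: a mean shift of 1.0 in both coordinates.
rng = np.random.default_rng(1)
x_before = rng.normal(size=(100, 2))
x_after = rng.normal(loc=1.0, size=(100, 2))
stat, perms, p_value = llr_permutation_test(x_before, x_after, bandwidth=0.5, n_permutations=200)
print(f"LLR statistic: {stat:.3f}, p-value: {p_value:.3f}")
```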
This approach is useful for building adaptive learning models in streaming environments: the learning model is updated or rebuilt as soon as a drift event is detected.

This approach is useful for monitoring automated systems and identifying the full duration of a concept drift. The reference period should be representative of normal operating conditions.
Using data collected offline, perform the following steps:

- Define the reference distribution: select a fixed portion of the offline data to construct a stable covariate distribution representing normal conditions.
- Simulate streaming data via batch sampling: from the remaining offline data, draw multiple batches to simulate streaming behavior. For each batch, compute the statistic of interest.
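The offline steps above can be sketched as follows, under stated assumptions. The biased Gaussian-kernel MMD stands in for "the statistic of interest", `calibrate_offline` is a hypothetical helper, and converting the batch statistics into a drift threshold via an upper quantile (95% by default) is our assumption. The source does not specify how the batch statistics are used.

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    # Biased squared-MMD estimate with a Gaussian kernel (stand-in for the statistic of interest).
    def k(a, b):
        d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def calibrate_offline(offline_data, reference_size, batch_size, n_batches, quantile=0.95, seed=0):
    """Hypothetical helper: build a reference set and a drift threshold from offline data."""
    rng = np.random.default_rng(seed)
    # Step 1: a fixed portion of the offline data serves as the reference distribution.
    reference = offline_data[:reference_size]
    remaining = offline_data[reference_size:]
    # Step 2: simulate streaming by drawing batches from the remaining data.
    stats = np.empty(n_batches)
    for i in range(n_batches):
        idx = rng.choice(len(remaining), size=batch_size, replace=False)
        stats[i] = mmd2(reference, remaining[idx])
    # Assumption: the drift threshold is an upper quantile of these drift-free statistics.
    threshold = np.quantile(stats, quantile)
    return reference, stats, threshold

# Usage with placeholder offline data that contains no drift.
rng = np.random.default_rng(0)
offline_data = rng.normal(size=(1000, 2))
reference, batch_stats, threshold = calibrate_offline(
    offline_data, reference_size=300, batch_size=50, n_batches=100
)
print(reference.shape, batch_stats.shape, f"threshold={threshold:.4f}")
```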
For each incoming batch of streaming data:

- Compare the current batch against the reference distribution by computing the statistic of interest.
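A matching sketch of the streaming step. It again uses the biased Gaussian-kernel MMD as a stand-in statistic and a hypothetical `monitor_stream` helper. The decision rule is an assumption, not something the source prescribes: flag drift when the statistic exceeds a threshold calibrated on drift-free batches (here their 99th percentile).

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    # Biased squared-MMD estimate with a Gaussian kernel (stand-in for the statistic of interest).
    def k(a, b):
        d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def monitor_stream(reference, batches, threshold):
    """Hypothetical helper: compare each incoming batch with the reference and flag drift."""
    results = []
    for t, batch in enumerate(batches):
        stat = mmd2(reference, batch)
        drift = bool(stat > threshold)  # assumed rule: statistic above the calibrated threshold
        results.append((t, stat, drift))
        if drift:
            print(f"batch {t}: drift detected (statistic={stat:.4f})")
    return results

# Usage with placeholder data: 5 batches from the reference distribution, then 5 shifted ones.
rng = np.random.default_rng(0)
reference = rng.normal(size=(300, 2))
null_stats = [mmd2(reference, rng.normal(size=(50, 2))) for _ in range(50)]
threshold = np.quantile(null_stats, 0.99)
stream = [rng.normal(size=(50, 2)) for _ in range(5)] + [rng.normal(loc=2.0, size=(50, 2)) for _ in range(5)]
results = monitor_stream(reference, stream, threshold)
```

In the adaptive-learning setting, the model would be updated or rebuilt at the first flagged batch; in the monitoring setting, one would keep recording flags to delimit the full drift period.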