my RIF estimator, i need ideassssss #31515
Unanswered
GiulioSurya
asked this question in
Q&A
Replies: 0 comments
Hello everyone,
As part of my Master's thesis, I am developing a new estimator based on Isolation Forest that operates on residuals. Without delving into the theoretical background, which isn't relevant here, I'm currently facing a technical issue.
My repository is available at:
(Rif estimator)
The repository includes two modules:
RIF
_residual_gen
The estimator is implemented within the scikit-learn ecosystem and therefore inherits its methods. In particular, here is what happens:
When I call the `fit` method on the `RIF` estimator, it internally invokes `fit_transform` from `_residual_gen`, which is responsible for computing the residuals and using them to fit the Isolation Forest.
These residuals are computed using a Random Forest model. To avoid data leakage, they are calculated with either out-of-bag (OOB) predictions or k-fold cross-validation. (There's also a "vanilla" version without leakage control, but that's not relevant for this issue.)
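For readers unfamiliar with the two leakage-control options, here is a minimal sketch of how such residuals could be computed with plain scikit-learn. It is not the code from the repository: the synthetic data, the choice of predicting the first column from the others, and all variable names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Assumption for this sketch: predict column 0 from the remaining columns.
target = X[:, 0]
features = X[:, 1:]

# Option 1: out-of-bag residuals. Each sample is predicted only by the
# trees whose bootstrap sample did not contain it.
rf_oob = RandomForestRegressor(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
rf_oob.fit(features, target)
residuals_oob = target - rf_oob.oob_prediction_

# Option 2: k-fold residuals. Each sample is predicted by a model
# trained on the folds that do not contain it.
rf_cv = RandomForestRegressor(n_estimators=100, random_state=0)
residuals_cv = target - cross_val_predict(rf_cv, features, target, cv=5)

# The residuals (here a single column) are then used to fit the Isolation Forest.
iso = IsolationForest(random_state=0).fit(residuals_oob.reshape(-1, 1))
labels = iso.predict(residuals_oob.reshape(-1, 1))  # 1 = inlier, -1 = outlier
```

In both options no sample contributes to the model that predicts it, which is what removes the leakage on the training data.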
Once computed, the residuals are cached. Why?
Because when `RIF.predict(X)` is called:
If `X` is the same as the one used in `RIF.fit(X)`, the cached residuals are reused.
If `X` is different, the previously fitted Random Forest is used to compute new residuals, and anomalies are detected on these.
Currently, this distinction between training and prediction data is handled using `id(X)`, which checks whether the two datasets are the same object in memory. I also tried using a hash of the dataset content, but both approaches seem fragile in practice.
I'm looking for a better solution: either one that improves the logic of comparing the two datasets, or a new approach that achieves the same goal more reliably.
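To make the fragility concrete, the following minimal sketch compares an identity check with a content hash on a small NumPy array. It is not the code from the repository: `fingerprint` is a hypothetical helper, and the example assumes the data can be converted to a NumPy array.

```python
import hashlib

import numpy as np


def fingerprint(X):
    """Hypothetical helper: hash the shape, dtype and raw bytes of an array."""
    arr = np.ascontiguousarray(X)
    h = hashlib.sha256()
    h.update(str(arr.shape).encode())
    h.update(str(arr.dtype).encode())
    h.update(arr.tobytes())
    return h.hexdigest()


X_train = np.arange(12, dtype=np.float64).reshape(4, 3)
train_id = id(X_train)
train_fp = fingerprint(X_train)

# A copy holds the same values but is a different object:
# the identity check misses it, the content hash matches.
X_copy = X_train.copy()
copy_same_id = id(X_copy) == train_id
copy_same_fp = fingerprint(X_copy) == train_fp

# A dtype cast keeps the same values but changes the bytes:
# the content hash no longer matches.
X_cast = X_train.astype(np.float32)
cast_same_fp = fingerprint(X_cast) == train_fp

# An in-place edit keeps the same object but changes the values:
# the identity check still reports "same data".
X_train[0, 0] = 99.0
mutated_same_id = id(X_train) == train_id
mutated_same_fp = fingerprint(X_train) == train_fp
```

The identity check fails in both directions (a copy is treated as new data, an in-place edit as unchanged data), while the byte-level hash is sensitive to dtype and memory layout conversions and has to read the whole dataset on every call.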
Any help or suggestions would be greatly appreciated.
Best regards,
Giulio