My RIF estimator: I need ideas #31515

GiulioSurya · 2025-06-10T09:38:02Z

Hello everyone,

As part of my Master's thesis, I am developing a new estimator based on Isolation Forest that operates on residuals. Without delving into the theoretical background, which isn't relevant here, I'm currently facing a technical issue.

My repository is available at:
(Rif estimator)

The repository includes two modules:

RIF
_residual_gen

The estimator is implemented within the scikit-learn ecosystem and therefore inherits its methods. In particular, here is what happens:

When I call the fit method on the RIF estimator, it internally invokes fit_transform from _residual_gen, which is responsible for computing residuals and using them to fit the Isolation Forest.
These residuals are computed using a Random Forest model. To avoid data leakage, they are calculated either with out-of-bag (OOB) predictions or k-fold cross-validation. (There’s also a “vanilla” version without leakage control, but that’s not relevant for this issue.)
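
To make the three residual variants concrete, here is a minimal sketch using plain scikit-learn. It is not the code from my repository: the synthetic data, the target `y`, and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Hypothetical toy data: the target y stands in for whatever the
# Random Forest is asked to predict when residuals are generated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Out-of-bag residuals: each sample is predicted only by trees
# that did not see it in their bootstrap sample.
rf_oob = RandomForestRegressor(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
rf_oob.fit(X, y)
oob_residuals = y - rf_oob.oob_prediction_

# k-fold residuals: each sample is predicted by a model fitted
# on the folds that do not contain it.
rf_cv = RandomForestRegressor(n_estimators=100, random_state=0)
cv_residuals = y - cross_val_predict(rf_cv, X, y, cv=5)

# "Vanilla" residuals: in-sample predictions, no leakage control.
in_sample_residuals = y - rf_oob.predict(X)

print(np.abs(oob_residuals).mean())
print(np.abs(cv_residuals).mean())
print(np.abs(in_sample_residuals).mean())
```

The in-sample residuals come out noticeably smaller than the OOB or k-fold ones, which is why the training-set residuals cannot simply be recomputed with the fitted forest at predict time.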

Once computed, the residuals are cached. Why?
Because when RIF.predict(X) is called:

If the input X is the same as the one used in RIF.fit(X), the cached residuals are reused.
If the input X is different, the previously fitted Random Forest is used to compute new residuals, and anomalies are detected on these.

Currently, this distinction between training and prediction data is made with id(X), which only tells me whether the two datasets are the same Python object in memory. I also tried hashing the dataset content, but both approaches seem fragile in practice.
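
A small sketch of where the two checks break, assuming NumPy input. The `fingerprint` helper is hypothetical and only illustrates the kind of content hash I tried; it is not the function from my repository.

```python
import hashlib
import numpy as np


def fingerprint(X):
    """Hypothetical helper: hash the shape, dtype and raw bytes of an array."""
    arr = np.ascontiguousarray(X)
    h = hashlib.sha256()
    h.update(str(arr.shape).encode())
    h.update(str(arr.dtype).encode())
    h.update(arr.tobytes())
    return h.hexdigest()


X_train = np.arange(12, dtype=np.float64).reshape(4, 3)
X_copy = X_train.copy()                 # same values, different object
X_float32 = X_train.astype(np.float32)  # same values, different dtype

# id() misses an identical copy of the training data.
same_object = id(X_copy) == id(X_train)

# A content hash recognises the copy ...
same_content = fingerprint(X_copy) == fingerprint(X_train)

# ... but not the same values stored with another dtype.
same_after_cast = fingerprint(X_float32) == fingerprint(X_train)

# In-place mutation: the object (and its id) stays the same
# while the content, and therefore the hash, changes.
id_before, key_before = id(X_train), fingerprint(X_train)
X_train[0, 0] = 99.0
id_unchanged = id(X_train) == id_before
hash_changed = fingerprint(X_train) != key_before

print(same_object, same_content, same_after_cast, id_unchanged, hash_changed)
```

So id(X) fails on copies and silently accepts mutated data, while the hash is sensitive to dtype (and would also need extra handling for DataFrames or sparse matrices) and has to read the whole dataset on every call.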

I’m looking for a better solution, either one that improves the logic of comparing the two datasets, or a new approach that achieves the same goal in a more reliable way.

Any help or suggestions would be greatly appreciated.

Best regards,
Giulio

Replies: 0 comments
