The depth formula in iforest is incorrect #8549

Closed
@liuxinuestc

Description

In the Isolation Forest (iforest) paper, the average path length of an unsuccessful search in a BST is given as

avgPathLen = 2. * (np.log(n_samples_leaf - 1.) + 0.5772156649) - 2. * (
    n_samples_leaf - 1.) / n_samples_leaf

but in the code, sklearn.ensemble.iforest._average_path_length uses np.log(n_samples_leaf) instead of np.log(n_samples_leaf - 1.):

2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
    n_samples_leaf - 1.) / n_samples_leaf
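
To make the difference concrete, the following sketch evaluates both expressions next to a reference value computed from exact harmonic numbers. It assumes the paper's definition is c(n) = 2H(n - 1) - 2(n - 1)/n with H(i) approximated by ln(i) + 0.5772156649; the function names `harmonic`, `c_exact`, `c_paper` and `c_sklearn` are hypothetical and are not part of scikit-learn.

```python
import numpy as np

EULER_GAMMA = 0.5772156649  # constant used in both formulas above


def harmonic(i):
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i, with H(0) = 0."""
    return sum(1.0 / k for k in range(1, i + 1))


def c_exact(n):
    # Reference value: 2 * H(n - 1) - 2 * (n - 1) / n, no log approximation.
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1.0) / n


def c_paper(n):
    # Formula quoted from the paper: H(n - 1) ~ ln(n - 1) + gamma.
    return 2.0 * (np.log(n - 1.0) + EULER_GAMMA) - 2.0 * (n - 1.0) / n


def c_sklearn(n):
    # Formula quoted from the code: ln(n) in place of ln(n - 1).
    return 2.0 * (np.log(n) + EULER_GAMMA) - 2.0 * (n - 1.0) / n


# Print the three values side by side for a few leaf sizes (n >= 2).
for n in (2, 3, 10, 256):
    print(n, c_exact(n), c_paper(n), c_sklearn(n))
```

The two formulas differ by exactly 2 * ln(n / (n - 1)), so the gap is largest for small leaves and shrinks as n_samples_leaf grows.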

Steps/Code to Reproduce

Actual Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
    if n_samples_leaf <= 1:
        return 1.
    else:
        return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
            n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
    np.log(n_samples_leaf[not_mask]) + 0.5772156649) - 2. * (
        n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

Expected Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
    if n_samples_leaf <= 1:
        return 1.
    else:
        return 2. * (np.log(n_samples_leaf - 1.) + 0.5772156649) - 2. * (
            n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
    np.log(n_samples_leaf[not_mask] - 1.) + 0.5772156649) - 2. * (
        n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]
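
For experimenting outside scikit-learn, the proposed variant can be sketched as a standalone vectorized helper. This is only an illustration under the assumptions of this issue: `average_path_length_proposed` is a hypothetical name, not scikit-learn's function, and the value 1. for n_samples_leaf <= 1 is copied from the snippets above.

```python
import numpy as np


def average_path_length_proposed(n_samples_leaf):
    """Hypothetical helper using log(n - 1), as proposed in this issue."""
    n = np.atleast_1d(np.asarray(n_samples_leaf, dtype=float))
    # Leaves with n <= 1 keep the value 1., mirroring the snippets above.
    out = np.ones_like(n)
    mask = n > 1
    # The mask is required: log(n - 1) is undefined (-inf) at n = 1.
    out[mask] = 2.0 * (np.log(n[mask] - 1.0) + 0.5772156649) - 2.0 * (
        n[mask] - 1.0) / n[mask]
    return out


values = average_path_length_proposed([1, 2, 256])
print(values)
```

The helper always returns an array, so a scalar input such as `average_path_length_proposed(10)` yields a one-element array.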

Versions

Labels: Bug, Easy
