FEA Support missing-values in ExtraTrees* #28268

Merged 171 commits on Jul 10, 2024
Commits
a8607b2
Added necessary Cython changes
adam2392 Dec 15, 2023
6d69617
Adding random splitter
adam2392 Dec 16, 2023
3474db3
WIP unit tests
adam2392 Dec 17, 2023
7904dee
Fully functioning extratrees with missing-values
adam2392 Dec 18, 2023
a6ac0e1
Add changelog
adam2392 Dec 18, 2023
65d1a51
Fix unit-test
adam2392 Dec 18, 2023
e062b72
Merge branch 'main' into extratreenan
adam2392 Dec 19, 2023
da1b1a1
Try again
adam2392 Dec 19, 2023
352dc4d
Merge branch 'main' into extratreenan
adam2392 Dec 19, 2023
f285a29
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Dec 19, 2023
5654a44
Merge branch 'main' into extratreenan
adam2392 Jan 2, 2024
a9a06e6
Merge branch 'main' into extratreenan
adam2392 Jan 18, 2024
92b21f7
Merge branch 'main' into extratreenan
adam2392 Jan 19, 2024
6b9e387
Merge branch 'main' into extratreenan
adam2392 Jan 19, 2024
1c2b807
Adding update
adam2392 Jan 19, 2024
79f24f4
Merge branch 'scikit-learn:main' into extratreenan
adam2392 Jan 25, 2024
2429774
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jan 25, 2024
69f2563
Fix splitting
adam2392 Jan 25, 2024
e2c6017
Move changelog
adam2392 Jan 25, 2024
078a255
Try to fix unit test
adam2392 Jan 25, 2024
3b9672e
Adding benchmarking scripts to run
adam2392 Jan 25, 2024
31acc24
Adding unit test for forest
adam2392 Jan 25, 2024
b376f51
New file
adam2392 Jan 25, 2024
da56a55
Commit benchmark changes
adam2392 Jan 25, 2024
6ba691f
Add changelog entry
adam2392 Jan 25, 2024
61723a5
Try again
adam2392 Jan 25, 2024
85c34bf
Remove extra file
adam2392 Jan 25, 2024
64ad1cd
Fix extratrees
adam2392 Jan 25, 2024
1d0e7f6
Try again
adam2392 Jan 26, 2024
a4a70ae
Merge branch 'main' into extratreenan
adam2392 Jan 26, 2024
f3f14e5
Try again
adam2392 Jan 26, 2024
3f82051
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jan 26, 2024
bf6cf27
Try again
adam2392 Jan 26, 2024
e01b6c5
Try again
adam2392 Jan 26, 2024
d336470
Tray again
adam2392 Jan 26, 2024
7726518
Not reproducible on local
adam2392 Jan 26, 2024
30af450
Try again
adam2392 Jan 26, 2024
dbdb0f8
Fix Ci
adam2392 Jan 26, 2024
f85ed1a
Try again
adam2392 Jan 29, 2024
4124ac4
Merge branch 'main' into extratreenan
adam2392 Jan 29, 2024
ff7f5d8
Try to fix test
adam2392 Jan 29, 2024
5f6a728
Try again
adam2392 Jan 29, 2024
b152b84
Try again
adam2392 Jan 29, 2024
e9ee8b4
again
adam2392 Jan 29, 2024
83324b7
Merge main
adam2392 Jan 30, 2024
01cb1ad
Try again on ci
adam2392 Jan 30, 2024
8a28e68
Fix bug and add unit test
adam2392 Jan 31, 2024
4cf7bef
Add changelog entry
adam2392 Jan 31, 2024
02fd866
Fix lint
adam2392 Jan 31, 2024
2ecdffe
Changelog and fix build
adam2392 Jan 31, 2024
0fc8f58
Fix unit test
adam2392 Jan 31, 2024
f2a7364
Merge branch 'main' into regtree
adam2392 Jan 31, 2024
60baa80
Merge branch 'regtree' into extratreenan
adam2392 Jan 31, 2024
0dd8cae
Merge branch 'main' into extratreenan
adam2392 Jan 31, 2024
260ad04
Almost working
adam2392 Jan 31, 2024
18701d8
Merge branch 'extratreenan' into extraforest
adam2392 Jan 31, 2024
da61a79
Better performing forests
adam2392 Jan 31, 2024
ffb2c68
Cleanup
adam2392 Jan 31, 2024
2b3de39
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jan 31, 2024
a4b2f43
Apply suggestions from code review
adam2392 Feb 1, 2024
442968b
Merge
adam2392 Feb 1, 2024
6f070c2
Merge branch 'regtree' of https://github.com/adam2392/scikit-learn in…
adam2392 Feb 1, 2024
4782b8a
Change unit test according to Guillame
adam2392 Feb 1, 2024
cfb3ad7
Merge branch 'main' into regtree
adam2392 Feb 1, 2024
284a450
Merge branch 'regtree' of https://github.com/adam2392/scikit-learn in…
adam2392 Feb 1, 2024
13ddf83
Fix unit test docstiring
adam2392 Feb 1, 2024
e6a28f6
Merging
adam2392 Feb 1, 2024
7384b22
Add fixes to unit-test
adam2392 Feb 1, 2024
40ad130
Fix lint
adam2392 Feb 1, 2024
b5c8ddc
Merge branch 'main' into extratreenan
adam2392 Feb 1, 2024
4253cc7
Fix lint
adam2392 Feb 1, 2024
c72e462
Try again
adam2392 Feb 1, 2024
e1fe9be
TST improve regression test
glemaitre Feb 2, 2024
c3e01b8
Merge branch 'regtree' into extratreenan
adam2392 Feb 2, 2024
a4adf70
Try new dataset
adam2392 Feb 2, 2024
80047a6
Merge branch 'main' into extratreenan
adam2392 Feb 2, 2024
4ad56d7
Merge
adam2392 Feb 2, 2024
b7a50fc
Merge branch 'main' into extratreenan
adam2392 Feb 5, 2024
13ca9ee
Add new expected score
adam2392 Feb 6, 2024
e5bff94
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Feb 6, 2024
e8ba177
Merge branch 'main' into extratreenan
adam2392 Feb 13, 2024
19521f1
Clean up
adam2392 Feb 13, 2024
8418b6b
Merge branch 'main' into extratreenan
adam2392 Feb 13, 2024
e72bb62
Cleanup merge
adam2392 Feb 13, 2024
f0b03ab
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Feb 13, 2024
8890d1f
Merged main
adam2392 Feb 13, 2024
45bd80c
merge extratreenan
adam2392 Feb 13, 2024
5398a39
Correct without limiting depth
adam2392 Feb 14, 2024
7b07f1a
Try with noise
adam2392 Feb 15, 2024
17fbce8
Merge branch 'main' into extratreenan
adam2392 Feb 15, 2024
04ceef0
Fix unit test for global seed
adam2392 Feb 15, 2024
065f60b
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Feb 15, 2024
8fb59ea
Merge branch 'extratreenan' into extraforest
adam2392 Feb 15, 2024
e4c9b85
Fix lint
adam2392 Feb 15, 2024
ef1dda5
Merge branch 'main' into extratreenan
adam2392 Feb 16, 2024
03d3869
Merge branch 'main' into extraforest
adam2392 Feb 16, 2024
60b9e43
Merge branch 'main' into extratreenan
adam2392 Feb 19, 2024
d85ca3d
Merge branch 'main' into extratreenan
adam2392 Feb 19, 2024
97acf36
Merge branch 'main' into extratreenan
adam2392 Feb 21, 2024
116de12
Merge branch 'main' into extratreenan
adam2392 Feb 25, 2024
f00aa61
Merge branch 'main' into extratreenan
adam2392 Feb 26, 2024
f2de8e4
Merge branch 'main' into extratreenan
adam2392 Mar 2, 2024
c849418
Merge branch 'main' into extratreenan
adam2392 Mar 14, 2024
02882ae
Merge branch 'main' into extratreenan
adam2392 Mar 15, 2024
401c8d2
Merge branch 'main' into extratreenan
adam2392 Mar 19, 2024
e504b22
Merge branch 'main' into extratreenan
adam2392 Mar 26, 2024
3bcadd1
Merge branch 'main' into extratreenan
adam2392 Apr 15, 2024
0b572a5
Merge branch 'main' into extratreenan
adam2392 May 23, 2024
8096b50
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 May 23, 2024
13572cd
Fix isolation forest that relies on extratree
adam2392 May 23, 2024
42c1a7f
Merge branch 'main' into extratreenan
adam2392 Jun 10, 2024
6345933
Merge branch 'main' into extratreenan
adam2392 Jun 12, 2024
c67895a
Merge branch 'main' into extratreenan
adam2392 Jun 14, 2024
56f04b5
DOC update changelog
glemaitre Jun 20, 2024
0cceaf2
Merge remote-tracking branch 'origin/main' into pr/adam2392/27966
glemaitre Jun 20, 2024
73e4fd8
Merge branch 'main' into extratreenan
adam2392 Jun 20, 2024
d74ba65
Address guillame comments
adam2392 Jun 20, 2024
3526bcb
Do not force all finnite
adam2392 Jun 20, 2024
000363a
Merge branch 'main' into extratreenan
adam2392 Jun 26, 2024
6f63c86
Merge branch 'main' into extraforest
adam2392 Jun 27, 2024
97f24a4
Apply suggestions from code review
adam2392 Jul 1, 2024
994508b
Merging
adam2392 Jul 1, 2024
b53e881
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jul 1, 2024
e2dcca3
Merge branch 'main' into extratreenan
adam2392 Jul 1, 2024
ff39dba
Add extra unit test
adam2392 Jul 1, 2024
618cf53
Merge branch 'main' into extratreenan
adam2392 Jul 1, 2024
e8aadd2
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jul 1, 2024
818c1e5
Fix codecoverage
adam2392 Jul 1, 2024
38154ae
Merge branch 'main' into extratreenan
adam2392 Jul 2, 2024
8226eee
Apply suggestions from code review
adam2392 Jul 2, 2024
9d984a8
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jul 2, 2024
a2f9322
Merge branch 'main' into extratreenan
adam2392 Jul 2, 2024
b817cda
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jul 2, 2024
198769f
Fix lint
adam2392 Jul 2, 2024
e15fa4f
Merge branch 'extratreenan' into extraforest
adam2392 Jul 2, 2024
709b081
Add changelog
adam2392 Jul 2, 2024
3cb6c5e
Remove benchmark files
adam2392 Jul 2, 2024
c508046
Update _splitter.pyx
adam2392 Jul 3, 2024
5a5e0c3
Merge branch 'main' into extratreenan
adam2392 Jul 3, 2024
36e7c10
Fix unit test
adam2392 Jul 3, 2024
542019f
Merge branch 'main' into extratreenan
adam2392 Jul 3, 2024
ac57082
Apply suggestions from code review
adam2392 Jul 4, 2024
0753aa2
Address omar's comments
adam2392 Jul 4, 2024
4266679
Merge branch 'main' into extratreenan
adam2392 Jul 4, 2024
d80b60f
Remove if/else branch
adam2392 Jul 5, 2024
ac6b25a
Add extra section documenting missing-value treatment in extratrees
adam2392 Jul 5, 2024
eec51df
Revert the change which sets max_depth
OmarManzoor Jul 5, 2024
b7afac9
Revert the change
OmarManzoor Jul 5, 2024
01109f4
Merge branch 'extratreenan' into extraforest
adam2392 Jul 5, 2024
90e8cec
Make extratrees documented
adam2392 Jul 6, 2024
9e40707
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jul 6, 2024
058671a
Merge branch 'main' into extratreenan
adam2392 Jul 6, 2024
708b614
Merge branch 'main' into extratreenan
adam2392 Jul 6, 2024
cafbde1
Merge branch 'extratreenan' of https://github.com/adam2392/scikit-lea…
adam2392 Jul 6, 2024
acd5a19
Fix cdef
adam2392 Jul 6, 2024
6b4906a
Fix circle
adam2392 Jul 7, 2024
699c97a
Try again
adam2392 Jul 7, 2024
0313819
Merge branch 'main' into extraforest
adam2392 Jul 7, 2024
b125c1d
Merge branch 'extratreenan' into extraforest
adam2392 Jul 7, 2024
47bb90f
Merge branch 'extraforest' of https://github.com/adam2392/scikit-lear…
adam2392 Jul 7, 2024
f3ec8e1
Merging main
adam2392 Jul 9, 2024
11e9b9e
Remove diff
adam2392 Jul 9, 2024
4ea9bb8
Remove diff
adam2392 Jul 9, 2024
1978f93
Update _classes.py
adam2392 Jul 9, 2024
890f137
Update _classes.py
adam2392 Jul 9, 2024
1441925
Add unit test
adam2392 Jul 9, 2024
329f88f
Merge branch 'extraforest' of https://github.com/adam2392/scikit-lear…
adam2392 Jul 9, 2024
bf16fe9
Fix lint
adam2392 Jul 9, 2024
f782364
Update doc/whats_new/v1.6.rst
adam2392 Jul 9, 2024
e7207f1
Merge branch 'main' into extraforest
adam2392 Jul 9, 2024
963de46
Merge branch 'main' into extraforest
OmarManzoor Jul 10, 2024
5 changes: 5 additions & 0 deletions doc/whats_new/v1.6.rst
@@ -148,6 +148,11 @@ Changelog
   :pr:`28622` by :user:`Adam Li <adam2392>` and
   :user:`Sérgio Pereira <sergiormpereira>`.

+- |Feature| :class:`ensemble.ExtraTreesClassifier` and :class:`ensemble.ExtraTreesRegressor` now support
+  missing-values in the data matrix `X`. Missing-values are handled by randomly moving all of
+  the samples to the left, or right child node as the tree is traversed.
+  :pr:`28268` by :user:`Adam Li <adam2392>`.
+
 :mod:`sklearn.impute`
 .....................

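The changelog entry above describes the missing-value strategy only in words. The following is a minimal pure-Python sketch of one random split under that description, not the Cython splitter from this PR. The function name `random_split_with_missing` and the single coin flip per split are illustrative assumptions.

```python
import math
import random


def random_split_with_missing(values, rng):
    """Split sample indices on one feature, extra-trees style.

    Illustrative only: the threshold is drawn uniformly between the min and
    max of the observed (non-NaN) values, and every sample whose value is
    NaN is sent to the same, randomly chosen side.
    Assumes at least one observed value.
    """
    observed = [v for v in values if not math.isnan(v)]
    threshold = rng.uniform(min(observed), max(observed))
    missing_go_left = rng.random() < 0.5  # one coin flip for all NaN samples

    left, right = [], []
    for i, v in enumerate(values):
        if math.isnan(v):
            goes_left = missing_go_left
        else:
            goes_left = v <= threshold
        (left if goes_left else right).append(i)
    return threshold, missing_go_left, left, right


feature = [0.2, float("nan"), 1.5, 0.9, float("nan"), 2.4]
threshold, missing_go_left, left, right = random_split_with_missing(
    feature, random.Random(0)
)
print(threshold, missing_go_left, left, right)
```

Every sample ends up in exactly one child, and the two NaN samples always travel together.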
22 changes: 17 additions & 5 deletions sklearn/ensemble/tests/test_forest.py
@@ -1767,6 +1767,8 @@ def test_estimators_samples(ForestClass, bootstrap, seed):
     [
         (datasets.make_regression, RandomForestRegressor),
         (datasets.make_classification, RandomForestClassifier),
+        (datasets.make_regression, ExtraTreesRegressor),
+        (datasets.make_classification, ExtraTreesClassifier),
     ],
 )
 def test_missing_values_is_resilient(make_data, Forest):
@@ -1800,12 +1802,21 @@ def test_missing_values_is_resilient(make_data, Forest):
     assert score_with_missing >= 0.80 * score_without_missing


-@pytest.mark.parametrize("Forest", [RandomForestClassifier, RandomForestRegressor])
+@pytest.mark.parametrize(
+    "Forest",
+    [
+        RandomForestClassifier,
+        RandomForestRegressor,
+        ExtraTreesRegressor,
+        ExtraTreesClassifier,
+    ],
+)
 def test_missing_value_is_predictive(Forest):
     """Check that the forest learns when missing values are only present for
     a predictive feature."""
     rng = np.random.RandomState(0)
     n_samples = 300
+    expected_score = 0.75

     X_non_predictive = rng.standard_normal(size=(n_samples, 10))
     y = rng.randint(0, high=2, size=n_samples)
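The rest of this test body is elided in the diff, so the construction below is an assumption rather than the test's actual code. It is a small standard-library sketch of what "missing values are only present for a predictive feature" can mean: when missingness tracks `y`, a rule that looks only at whether the value is missing already clears a threshold such as `expected_score`. The 5% noise rate is hypothetical.

```python
import math
import random

rng = random.Random(0)
n_samples = 300

# Binary target, analogous to the one drawn in the test above.
y = [rng.randint(0, 1) for _ in range(n_samples)]

# Hypothetical construction: the feature is NaN when y == 1, with the
# missingness pattern flipped for roughly 5% of samples to add noise.
feature = []
for label in y:
    is_missing = label == 1
    if rng.random() < 0.05:
        is_missing = not is_missing
    feature.append(float("nan") if is_missing else rng.gauss(0.0, 1.0))

# A rule that only asks "is the value missing?" recovers most of y.
predictions = [1 if math.isnan(v) else 0 for v in feature]
accuracy = sum(p == t for p, t in zip(predictions, y)) / n_samples
print(f"accuracy of the missingness indicator: {accuracy:.2f}")
```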
@@ -1835,19 +1846,20 @@ def test_missing_value_is_predictive(Forest):

     predictive_test_score = forest_predictive.score(X_predictive_test, y_test)

-    assert predictive_test_score >= 0.75
+    assert predictive_test_score >= expected_score
     assert predictive_test_score >= forest_non_predictive.score(
         X_non_predictive_test, y_test
     )


-def test_non_supported_criterion_raises_error_with_missing_values():
+@pytest.mark.parametrize("Forest", FOREST_REGRESSORS.values())
+def test_non_supported_criterion_raises_error_with_missing_values(Forest):
     """Raise error for unsupported criterion when there are missing values."""
     X = np.array([[0, 1, 2], [np.nan, 0, 2.0]])
     y = [0.5, 1.0]

-    forest = RandomForestRegressor(criterion="absolute_error")
+    forest = Forest(criterion="absolute_error")

-    msg = "RandomForestRegressor does not accept missing values"
+    msg = ".*does not accept missing values"
     with pytest.raises(ValueError, match=msg):
         forest.fit(X, y)
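The last change replaces the class-specific message with a pattern so that one test covers every regressor in `FOREST_REGRESSORS`. `pytest.raises(..., match=...)` checks the pattern with `re.search`, so the sketch below uses `re` directly. The `ExtraTreesRegressor` sample string is an assumed message shape; only the shared suffix and the `RandomForestRegressor` string come from the test.

```python
import re

msg = ".*does not accept missing values"

# The first message appears in the old test; the second is an assumed
# analogue for the newly parametrized estimator.
samples = [
    "RandomForestRegressor does not accept missing values",
    "ExtraTreesRegressor does not accept missing values",
]

# pytest.raises(match=...) applies re.search to the exception text.
matches = [re.search(msg, text) is not None for text in samples]
print(matches)
```

Because `re.search` scans the whole string, the leading `.*` is not strictly required; the bare suffix would match as well.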
20 changes: 20 additions & 0 deletions sklearn/tree/_classes.py
@@ -1686,6 +1686,16 @@ def __init__(
             monotonic_cst=monotonic_cst,
         )

+    def _more_tags(self):
+        # XXX: nan is only supported for dense arrays, but we set this for the
+        # common test to pass, specifically: check_estimators_nan_inf
+        allow_nan = self.splitter == "random" and self.criterion in {
+            "gini",
+            "log_loss",
+            "entropy",
+        }
+        return {"multilabel": True, "allow_nan": allow_nan}
+

 class ExtraTreeRegressor(DecisionTreeRegressor):
     """An extremely randomized tree regressor.
@@ -1929,3 +1939,13 @@ def __init__(
             ccp_alpha=ccp_alpha,
             monotonic_cst=monotonic_cst,
         )
+
+    def _more_tags(self):
+        # XXX: nan is only supported for dense arrays, but we set this for the
+        # common test to pass, specifically: check_estimators_nan_inf
+        allow_nan = self.splitter == "random" and self.criterion in {
+            "squared_error",
+            "friedman_mse",
+            "poisson",
+        }
+        return {"allow_nan": allow_nan}
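The two `_more_tags` methods share one rule: NaN is advertised as supported only for the random splitter combined with a listed criterion. The standalone sketch below restates that rule so it can be checked without scikit-learn installed. `NAN_CRITERIA` and `allows_nan` are hypothetical names that do not appear in the PR.

```python
# Criteria listed in the two `_more_tags` methods above.
NAN_CRITERIA = {
    "classifier": {"gini", "log_loss", "entropy"},
    "regressor": {"squared_error", "friedman_mse", "poisson"},
}


def allows_nan(kind, splitter, criterion):
    """Hypothetical helper mirroring the `allow_nan` expression in the diff."""
    return splitter == "random" and criterion in NAN_CRITERIA[kind]


print(allows_nan("regressor", "random", "squared_error"))   # True
print(allows_nan("regressor", "random", "absolute_error"))  # False
print(allows_nan("classifier", "best", "gini"))             # False
```

The `absolute_error` case lines up with the forest test in this PR, which expects a `ValueError` when that criterion is fitted on data containing NaN.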