FEA Trees: Add support for missing values with criterion="absolute_error" by greatly simplifying the logic #32119

cakedev0 · 2025-09-06T08:40:21Z

This PR refactors how missing values are handled in trees by:

- removing missing-value handling from the Criterion subclasses
- making it the responsibility of the splitter and partitioner only

This greatly simplifies the logic and unlocks support for missing values in MAE trees for free.

Reference Issues/PRs

This PR accidentally fixes #32178

Otherwise, I looked but didn't find any issue requesting this feature. I think that's because MAE trees are currently too slow in sklearn to be used much... People who want to use the MAE just look for other options, in sklearn or in other libraries.

What does this implement/fix? Explain your changes.

Currently, part of the missing-value support is implemented by each subclass of Criterion. I believe this is not a great design because:

- Criterion is "X-blind": it is not aware of X values. It only looks at y and sample_weights in the order defined by the sorted indices (sample_indices), and never at X itself. Yet making it handle missing values forces it to deal with X values indirectly. Why not simply use the ordering of sample_indices to account for missing values, as we already do for every other value (including inf/-inf)?
- It requires each criterion to implement several additional methods.

So, to the question "Why not just use the ordering of sample_indices to take into account missing values?", my answer is: "yes, let's just do that". The result removes 200 lines from _criterion.pyx without increasing the complexity of the splitter and the partitioner (it actually simplifies the splitter a bit).
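The ordering argument can be illustrated with a small Python sketch. It assumes (this is not taken from the PR) that the splitter sorts the non-missing samples by feature value and places all missing samples as one contiguous block, either at the end of the index array (missing values go right) or at the start (missing values go left). The criterion-like function `split_cost` then only sees y, the weights, an ordering of samples and a split position, never X. All names (`abs_error`, `split_cost`, `best_split`) are hypothetical, impurities are recomputed from scratch instead of being updated incrementally, and this is not the PR's Cython implementation.

```python
import math


def abs_error(y, w, idx):
    """Weighted sum of |y - weighted median| over the samples in idx (0.0 if empty)."""
    if not idx:
        return 0.0
    pairs = sorted((y[i], w[i]) for i in idx)
    half = sum(wi for _, wi in pairs) / 2
    acc = 0.0
    for med, wi in pairs:
        acc += wi
        if acc >= half:
            break  # med is now a weighted median, which minimizes the weighted absolute error
    return sum(w[i] * abs(y[i] - med) for i in idx)


def split_cost(y, w, order, pos):
    """X-blind criterion: it only sees y, w, an ordering of samples and a split position."""
    return abs_error(y, w, order[:pos]) + abs_error(y, w, order[pos:])


def best_split(x, y, w, sample_indices):
    """Return (cost, threshold, missing_go_left) for one feature; NaN marks a missing value."""
    nm = sorted((i for i in sample_indices if not math.isnan(x[i])), key=lambda i: x[i])
    miss = [i for i in sample_indices if math.isnan(x[i])]
    best = (math.inf, None, None)
    for missing_go_left in ((False, True) if miss else (False,)):
        # The splitter, not the criterion, decides where the missing block sits.
        order = miss + nm if missing_go_left else nm + miss
        offset = len(miss) if missing_go_left else 0
        # With missing values on the right, "all non-missing left / all missing right"
        # is an extra candidate split.
        last = len(nm) if (miss and not missing_go_left) else len(nm) - 1
        for p in range(1, last + 1):
            if p < len(nm):
                if x[nm[p - 1]] == x[nm[p]]:
                    continue  # cannot separate equal feature values
                threshold = (x[nm[p - 1]] + x[nm[p]]) / 2
            else:
                threshold = math.inf
            cost = split_cost(y, w, order, offset + p)
            if cost < best[0]:
                best = (cost, threshold, missing_go_left)
    return best
```

Because both orderings reuse the same `split_cost`, a criterion such as absolute error gets missing-value support without any missing-value code of its own, which is the point made above.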

Any other comments?

I think it might make supporting missing values + monotonic constraints easy, but I haven't looked into it yet.

It might also simplify support for missing values + sparse a bit, but this is still not easy to do.

github-actions · 2025-09-06T08:41:22Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

(Generated for commit: e1379ce. Link to the linter CI: here)

cakedev0 · 2025-09-10T11:46:24Z

Note: as of today, tests pass on my laptop and most CI unit-test pipelines are successful, but some are failing. I managed to reproduce one of the failing pipelines locally using a Docker image; I still need to find the bug, though.

cakedev0 · 2025-09-11T07:52:47Z

Tests pass! 🎊

Well, I learned the difference between memcpy and memmove the hard way 😂
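For readers unfamiliar with this pitfall: in C, `memcpy` has undefined behavior when the source and destination regions overlap, whereas `memmove` is specified to copy as if through a temporary buffer, so overlap is safe. The Python sketch below simulates this on a list. The helper names are hypothetical, `naive_forward_copy` shows just one possible (front-to-back) `memcpy` behavior, and the block move is only an illustration of the kind of in-place rearrangement where overlap matters, not the PR's actual code.

```python
def naive_forward_copy(buf, dst, src, n):
    """Copy n items front to back; corrupts data if the ranges overlap and dst > src."""
    for k in range(n):
        buf[dst + k] = buf[src + k]


def memmove_like(buf, dst, src, n):
    """Overlap-safe copy: choose the direction so no source item is overwritten before it is read."""
    if dst > src:
        for k in reversed(range(n)):
            buf[dst + k] = buf[src + k]
    else:
        for k in range(n):
            buf[dst + k] = buf[src + k]


def move_tail_to_front(buf, n_tail):
    """Hypothetical helper: move the last n_tail items to the front, keeping each block's order."""
    n = len(buf)
    tail = buf[n - n_tail:]                   # temporary copy of the block being moved
    memmove_like(buf, n_tail, 0, n - n_tail)  # overlapping shift to the right
    buf[:n_tail] = tail
```

With the naive copy, shifting `[1, 2, 3, 4, 5]` one slot to the right in place yields `[1, 1, 1, 1, 1]`, while the overlap-safe version yields `[1, 1, 2, 3, 4]`.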

adam2392 · 2025-09-14T17:15:09Z

Let's keep this in draft mode until we merge #32100. Ping here for a review once the initial PRs are sorted out.

cakedev0 added 9 commits September 5, 2025 21:30

first draft: compilation ok

d5da0ec

fixed seg fault bug

61bbe68

some tests with decision tree and missing values are passing

974bd88

WIP

b02427f

WIP

59aae5b

fixed the silly mistake that was causing seg faults & infinite loops

13bd4e2

AE is now supported

944dd55

Fix last major bug - in random splits

d24078e

cleanup

2d867e6

github-actions bot added module:ensemble module:tree cython labels Sep 6, 2025

cakedev0 added 7 commits September 6, 2025 10:41

cleanup prints

f722f28

cleanup unsued var

4a1d061

cleanup

8abd690

attempt at fixing CI failing for 32-bits systems

dc66653

removed checks on criterion; included abs err in more tests

92a8041

removed a line added for exp I forgot to remove

6d6c1cc

WIP debuggin 32-bits tests

a209a9d

cakedev0 added 5 commits September 10, 2025 23:26

fixed silly bug that took way too long to find

628a61a

WIP debugging

b8365c5

tests are passing? Or am I too tired?

99a9a64

Probably fixed all the bugs

1c876bc

Removed all debug prints

79e12ea

cakedev0 added 3 commits September 12, 2025 16:46

cleanup; comments; unit test for swap slices

f687cfa

changelog

b3c06ae

Merge remote-tracking branch 'upstream/main' into tree-simpler-missing

7a5cfce

cakedev0 mentioned this pull request Sep 12, 2025

Unexepected behavior of tree splits: missing values handling is buggy? #32175

Open

cakedev0 marked this pull request as ready for review September 12, 2025 21:44

cakedev0 changed the title ~~[draft] Trees: Add support for missing values with criterion="absolute_error" by greatly simplifying the logic~~ Trees: Add support for missing values with criterion="absolute_error" by greatly simplifying the logic Sep 12, 2025

cakedev0 changed the title ~~Trees: Add support for missing values with criterion="absolute_error" by greatly simplifying the logic~~ FEA Trees: Add support for missing values with criterion="absolute_error" by greatly simplifying the logic Sep 12, 2025

adam2392 self-requested a review September 13, 2025 04:08

simpler next_p logic

ec29673

adam2392 marked this pull request as draft September 14, 2025 17:15

cakedev0 mentioned this pull request Sep 15, 2025

MNT: Decision trees: add test for split optimality #32193

Draft

cakedev0 added 2 commits September 15, 2025 21:44

Merge remote-tracking branch 'upstream/main' into tree-simpler-missing

3c068be

comments

e1379ce
