
Conversation

raixyzaditya

Closes #27159

Added a clarification to the documentation explaining how candidate split thresholds
are chosen in DecisionTree and RandomForest. Thresholds are the midpoints between
successive distinct feature values, leading to O(N × F) candidate splits.


github-actions bot commented Sep 12, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d36bdeb. Link to the linter CI: here

Comment on lines 1021 to 1022
distinct feature values. This results in up to ``O(N × F)`` candidate
thresholds, where ``N`` is the number of samples at the node and ``F`` is
Contributor


Note: not sure how relevant it is for the doc, but for a single feature the number of candidate thresholds can be much smaller than N. It is actually (see the sketch after this list):

  • the number of unique values in X[:, f] minus one, if there are no missing values
  • 2 × n_uniques - 1, if there are missing values (and NaN is not counted in n_uniques)
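
For what it's worth, here is a small sketch of how those counts could be computed for one feature column. The formulas are taken from the comment above and are not verified against the actual Cython splitter:

```python
import numpy as np

def n_candidate_thresholds(x):
    """Number of candidate thresholds for one feature column, per the rule above (sketch)."""
    n_uniques = np.unique(x[~np.isnan(x)]).size  # NaN is not counted in n_uniques
    if np.isnan(x).any():
        return 2 * n_uniques - 1                 # missing values present
    return n_uniques - 1                         # no missing values

print(n_candidate_thresholds(np.array([1.0, 2.0, 2.0, 3.0])))     # 2
print(n_candidate_thresholds(np.array([1.0, 2.0, np.nan, 3.0])))  # 5
```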

Member

@virchan left a comment


Thanks for the PR, @raixyzaditya.

I noticed you already opened #32180 with the same proposed changes, so I'll close this one in favour of that.

Also, just so you know, you don't need to open a new PR each time you want to update your changes. You can simply commit the changes and push them to the same feature branch.

@virchan closed this Sep 15, 2025
Successfully merging this pull request may close these issues.

RandomForest{Classifier,Regressor} split criterion documentation