Conversation

cakedev0
Contributor

In the user guide, remove the sentence:

Note that it fits much slower than the MSE criterion.

From:

Setting criterion="poisson" might be a good choice if your target is a count or a frequency (count per some unit). In any case,
y >= 0 is a necessary condition to use this criterion. Note that it fits much slower than the MSE criterion. For performance reasons the actual implementation minimizes the half mean poisson deviance, i.e. the mean poisson deviance divided by 2.

As it's not true: the poisson criterion is only ~10% slower than the MSE criterion. I ran the experiment with the same script as for this PR #32181, for both criteria; the execution time is vastly dominated by the sort (sort_samples_and_feature_values).
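As a side note on the quoted passage: the "half mean poisson deviance" it mentions is just the mean Poisson deviance divided by 2, which can be checked numerically (a quick sketch using scipy.special.xlogy and sklearn.metrics.mean_poisson_deviance):

```python
import numpy as np
from scipy.special import xlogy
from sklearn.metrics import mean_poisson_deviance

rng = np.random.default_rng(0)
y_true = rng.poisson(3.0, size=1_000).astype(float)  # count-like target, y >= 0
y_pred = np.full_like(y_true, y_true.mean())         # constant, strictly positive prediction

# Half mean Poisson deviance: mean(y * log(y / y_hat) - y + y_hat),
# with the convention y * log(y / y_hat) = 0 when y = 0 (handled by xlogy).
half_dev = np.mean(xlogy(y_true, y_true / y_pred) - y_true + y_pred)

# It is exactly half of sklearn's mean Poisson deviance.
assert np.isclose(2 * half_dev, mean_poisson_deviance(y_true, y_pred))
```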


@cakedev0 cakedev0 changed the title DOC: Poisson criterion is not slower than MSE DOC: Poisson criterion is not slower than MSE in decision trees Sep 17, 2025
@adam2392
Member

For completeness and self containment of the PR, can you link/copy the code and relevant results comparing poisson and MSE?

Thanks!

@cakedev0
Contributor Author

Benchmark script:

from time import perf_counter
import numpy
from sklearn.tree import DecisionTreeRegressor

if __name__ == "__main__":
    d = 20
    n = 3_000_000 // d
    n_fit = 10
    for criterion in ["squared_error", "poisson"]:
        dt = 0
        for _ in range(n_fit):
            X = numpy.random.rand(n, d)
            y = numpy.random.rand(n) + X.sum(axis=1)
            t = perf_counter()
            tree = DecisionTreeRegressor(max_depth=4, max_features=d,
                                        criterion=criterion).fit(X, y)
            dt += perf_counter() - t
        print(f"{criterion}: {dt / n_fit:.3f}s")

Results:

squared_error: 0.990s
poisson: 1.139s

Also a flame graph for a run with just criterion="poisson", showing that the sort dominates (and hence that the Poisson loss computation accounts for little of the execution time):

[flame graph image]

@adam2392
Member

@thomasjpfan do you have any historical context on why the docs say poisson is much slower than MSE?

@cakedev0
Contributor Author

historical context why the docs say poisson is much slower

The PR that added poisson loss: #17386

The original code was basically the same as today.

I quickly browsed the reviews and I think this statement was just not challenged (which seems fair, I tend to challenge people when they say something is fast, but much less when they say it's slow 😂).

My hypothesis: it's true that the criterion-related computations are much slower for the Poisson loss than for MSE, which is why the PR author added this comment. But because tree building is sort-dominated, it doesn't change the total execution time much.

@thomasjpfan
Member

From memory it was because the Poisson's node_impurity needs to recompute the loss by going through the data again:

cdef float64_t node_impurity(self) noexcept nogil:
    """Evaluate the impurity of the current node.

    Evaluate the Poisson criterion as impurity of the current node,
    i.e. the impurity of sample_indices[start:end]. The smaller the
    impurity the better.
    """
    return self.poisson_loss(self.start, self.end, self.sum_total,
                             self.weighted_n_node_samples)

where MSE can compute the node_impurity without going through the data:

impurity = self.sq_sum_total / self.weighted_n_node_samples
for k in range(self.n_outputs):
    impurity -= (self.sum_total[k] / self.weighted_n_node_samples)**2.0
return impurity / self.n_outputs
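The asymmetry can be illustrated in plain NumPy (a sketch of the two computations, not the actual Cython code; scipy.special.xlogy handles the y = 0 convention):

```python
import numpy as np
from scipy.special import xlogy

rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=1_000).astype(float)
w = np.ones_like(y)  # sample weights
weighted_n = w.sum()

# MSE: the node impurity follows from two running sums kept during the scan,
# with no extra pass over the node's samples.
sum_total = np.sum(w * y)
sq_sum_total = np.sum(w * y**2)
mse_impurity = sq_sum_total / weighted_n - (sum_total / weighted_n) ** 2
assert np.isclose(mse_impurity, np.var(y))  # equals the variance

# Poisson: the y_i * log(y_i / y_bar) term depends on every individual sample,
# so evaluating the half deviance needs another full pass over the node's data.
y_bar = sum_total / weighted_n
poisson_impurity = np.sum(w * (xlogy(y, y / y_bar) - y + y_bar)) / weighted_n
```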

Although, "much slower" could be an overstatement. @lorentzenchr Do you recall why poisson was marked as "much slower" in the docs?

@cakedev0
Contributor Author

Bump here.

While it remains unclear why this statement was added to the docs, I feel we have enough evidence to remove it:

  • theoretically: compared to squared error, the Poisson loss only changes an O(n) part of the O(n log n) algorithm, so even if the constant in that O(n) part gets quite a bit bigger, we don't expect a significant impact on execution time.
  • the flame graph from py-spy confirms that the O(n log n) part dominates the execution time.
  • the benchmarks confirm that it is not much slower (at most ~25% slower).
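To make the first bullet concrete, here is a sketch of the per-feature split search under squared error: the O(n log n) sort, followed by an O(n) prefix-sum scan. Swapping in the Poisson loss only changes the constant of this linear scan.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.random(n)      # one feature
y = rng.random(n) + x  # target correlated with the feature

# O(n log n): sort the samples by feature value (the dominant cost in practice).
order = np.argsort(x)
y_sorted = y[order]

# O(n): evaluate every split position with prefix sums. This linear scan is
# the part whose per-sample cost differs between squared error and Poisson.
csum = np.cumsum(y_sorted)
csum_sq = np.cumsum(y_sorted ** 2)
k = np.arange(1, n)  # left-child sizes
sse_left = csum_sq[:-1] - csum[:-1] ** 2 / k
sse_right = (csum_sq[-1] - csum_sq[:-1]) - (csum[-1] - csum[:-1]) ** 2 / (n - k)
best = int(np.argmin(sse_left + sse_right))  # split after k[best] samples
```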

Here is a new, more extensive benchmark, trying to explore cases where the sort might be less dominant in the execution time (duplicates, no max_depth). In all cases, poisson is less than 25% slower than MSE.

from time import perf_counter
import numpy
from sklearn.tree import DecisionTreeRegressor

if __name__ == "__main__":
    n_fit = 15
    n_skip = 5
    for d in [2, 20]:
        for with_duplicates in [False, True]:
            for max_depth in [4, None]:
                for criterion in ["squared_error", "poisson"]:
                    n = 2_000_000 // d
                    dts = []    
                    for _ in range(n_fit):
                        X = numpy.random.rand(n, d)
                        if with_duplicates:
                            X = X.round(2)
                        y = numpy.random.rand(n) + X.sum(axis=1)
                        t = perf_counter()
                        tree = DecisionTreeRegressor(
                            criterion=criterion,
                            max_features=d, max_depth=max_depth,
                        )
                        tree.fit(X, y)
                        dts.append(perf_counter() - t)
                    avg = numpy.mean(dts[n_skip:])
                    std = numpy.std(dts[n_skip:])
                    print(
                        f"d={d}; with_duplicates={with_duplicates}; "
                        f"max_depth={max_depth}; criterion={criterion}:"
                        f" {avg:.2f} ± {std:.3f}s"
                    )
                print()

Results:

d=2; with_duplicates=False; max_depth=4; criterion=squared_error: 0.97 ± 0.025s
d=2; with_duplicates=False; max_depth=4; criterion=poisson: 1.13 ± 0.017s

d=2; with_duplicates=False; max_depth=None; criterion=squared_error: 4.57 ± 0.331s
d=2; with_duplicates=False; max_depth=None; criterion=poisson: 5.43 ± 0.083s

d=2; with_duplicates=True; max_depth=4; criterion=squared_error: 0.33 ± 0.014s
d=2; with_duplicates=True; max_depth=4; criterion=poisson: 0.35 ± 0.007s

d=2; with_duplicates=True; max_depth=None; criterion=squared_error: 0.56 ± 0.007s
d=2; with_duplicates=True; max_depth=None; criterion=poisson: 0.69 ± 0.006s

d=20; with_duplicates=False; max_depth=4; criterion=squared_error: 0.68 ± 0.002s
d=20; with_duplicates=False; max_depth=4; criterion=poisson: 0.77 ± 0.008s

d=20; with_duplicates=False; max_depth=None; criterion=squared_error: 2.01 ± 0.026s
d=20; with_duplicates=False; max_depth=None; criterion=poisson: 2.42 ± 0.058s

d=20; with_duplicates=True; max_depth=4; criterion=squared_error: 0.26 ± 0.006s
d=20; with_duplicates=True; max_depth=4; criterion=poisson: 0.26 ± 0.002s

d=20; with_duplicates=True; max_depth=None; criterion=squared_error: 1.23 ± 0.007s
d=20; with_duplicates=True; max_depth=None; criterion=poisson: 1.38 ± 0.006s
