Faster categorical tick formatter. #13917
Merged
Having thousands of categories is most likely a sign that the user
forgot to convert strings to floats or dates, but we may as well not
take forever to generate the incorrect plot so that they can observe the
failure faster (cf "slow beyond belief" comment in #13910).
Right now StrCategoryFormatter constructs the value-to-label dict at
every call to `__call__`, which leads to quadratic complexity when
iterating over the ticks. Instead, just do this once in `format_ticks`
(and let `__call__` use that implementation too), for linear complexity.
This speeds things up from ~25s to ~11s (and the difference gets bigger
for more ticks, as we're comparing O(n^2) to O(n), modulo dict lookup
terms in log(n), probably).
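A minimal sketch of the difference, not the actual diff: `PerCallFormatter` and `BatchFormatter` are hypothetical stand-ins for the old and new behaviour, and `units_mapping` is assumed to be a plain dict from label string to integer tick position.

```python
class PerCallFormatter:
    """Hypothetical: inverts the mapping on every call, O(n) per tick."""

    def __init__(self, units_mapping):
        self._units = units_mapping  # assumed shape: label -> integer position

    def __call__(self, x, pos=None):
        # The reverse dict is rebuilt for each tick, so n ticks cost O(n^2).
        r_mapping = {v: k for k, v in self._units.items()}
        return r_mapping.get(int(round(x)), "")


class BatchFormatter:
    """Hypothetical: inverts the mapping once per batch of ticks."""

    def __init__(self, units_mapping):
        self._units = units_mapping

    def __call__(self, x, pos=None):
        # A single tick is just a batch of one.
        return self.format_ticks([x])[0]

    def format_ticks(self, values):
        # The reverse dict is built once, then each tick is a dict lookup.
        r_mapping = {v: k for k, v in self._units.items()}
        return [r_mapping.get(int(round(val)), "") for val in values]


mapping = {str(i): i for i in range(1000)}
ticks = list(range(1000))

per_call_labels = [PerCallFormatter(mapping)(t) for t in ticks]
batch_labels = BatchFormatter(mapping).format_ticks(ticks)
```

Both produce the same labels; only the number of times the reverse dict is built differs (once per tick versus once per batch).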
The other option was to make UnitData maintain both a forward and a
backward mapping in sync but this would require passing the UnitData
instance rather than the mapping to the StrCategoryFormatter constructor
and the API break is just not worth it.