Faster categorical tick formatter. #13917

Merged 1 commit on Apr 10, 2019

@anntzer (Contributor) commented Apr 10, 2019

Having thousands of categories is most likely a sign that the user
forgot to convert strings to floats or dates, but we may as well not
take forever to generate the incorrect plot so that they can observe the
failure faster (cf "slow beyond belief" comment in #13910).

Right now `StrCategoryFormatter` constructs the value-to-label dict at
every call to `__call__`, which leads to quadratic complexity when
iterating over the ticks. Instead, just do this once in `format_ticks`
(and let `__call__` use that implementation too), for linear complexity.

This speeds up

    from pylab import *
    cats = [str(x) for x in np.random.rand(4000)]  # Bunch of labels.
    plt.plot(cats)
    plt.gcf().canvas.draw()

from ~25s to ~11s. The difference gets bigger for more ticks, as
we're comparing O(n^2) to O(n) (modulo dict lookup terms in log(n),
probably).
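
The restructuring described above can be sketched without matplotlib. This is a minimal illustration, not the actual patch: `PerCallFormatter` and `BatchFormatter` are hypothetical stand-ins, and the plain `label -> position` dict is an assumed simplification of the mapping the real formatter receives.

```python
# Hypothetical stand-ins for the formatter before and after the change.
# Names and the simplified mapping are assumptions, not matplotlib's code.


class PerCallFormatter:
    """Rebuilds the value-to-label dict on every __call__."""

    def __init__(self, units_mapping):
        self._units = units_mapping  # label -> integer position

    def __call__(self, x, pos=None):
        # O(n) work per tick, so O(n^2) when formatting n ticks.
        r_mapping = {v: k for k, v in self._units.items()}
        return r_mapping.get(int(round(x)), '')


class BatchFormatter:
    """Builds the value-to-label dict once per batch of ticks."""

    def __init__(self, units_mapping):
        self._units = units_mapping  # label -> integer position

    def __call__(self, x, pos=None):
        # Kept for backward compatibility; delegates to the batch path.
        return self.format_ticks([x])[0]

    def format_ticks(self, values):
        # O(n) work once, then one dict lookup per tick.
        r_mapping = {v: k for k, v in self._units.items()}
        return [r_mapping.get(int(round(val)), '') for val in values]


units = {label: i for i, label in enumerate(["apple", "banana", "cherry"])}
ticks = [0, 1, 2, 3]  # position 3 has no category

per_call = PerCallFormatter(units)
old_labels = [per_call(t) for t in ticks]               # one dict build per tick
new_labels = BatchFormatter(units).format_ticks(ticks)  # a single dict build
```

Both paths should produce the same labels; only the number of times the reverse dict is rebuilt differs.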

The other option was to make `UnitData` maintain both a forward and a
backward mapping in sync, but this would require passing the `UnitData`
instance rather than the mapping to the `StrCategoryFormatter` constructor,
and the API break is just not worth it.
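
For reference, the rejected alternative might look roughly like the sketch below. `TwoWayUnitData` and its methods are hypothetical names made up for illustration; this is not how `UnitData` is implemented, only one way a container could keep both directions in sync so that no reverse dict ever needs to be rebuilt.

```python
class TwoWayUnitData:
    """Hypothetical container keeping both mapping directions in sync."""

    def __init__(self, labels=()):
        self._forward = {}   # label -> integer position
        self._backward = {}  # integer position -> label
        self.update(labels)

    def update(self, labels):
        """Register new labels; labels seen before keep their position."""
        for label in labels:
            if label not in self._forward:
                position = len(self._forward)
                self._forward[label] = position
                self._backward[position] = label

    def position_of(self, label):
        return self._forward[label]

    def label_at(self, position):
        """Return the label nearest to *position*, or '' if none exists."""
        return self._backward.get(int(round(position)), '')


data = TwoWayUnitData(["low", "mid", "high"])
data.update(["mid", "peak"])  # "mid" is already known, "peak" is new
```

A formatter built on such a container would have to receive the container itself rather than a plain mapping, which is the constructor change the description wants to avoid.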

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

return r_mapping.get(int(np.round(x)), '')
return '' if pos is None else self.format_ticks([x])[0]

def format_ticks(self, values):

Member

I'm curious/confused about why/how the dict is only built once if there's still a call to `.format_ticks([x])` on every call to `__call__`. Is there caching somewhere?

@anntzer (Contributor, Author) commented Apr 10, 2019

The tick-labeling code calls `format_ticks`, not `__call__`. `__call__` is only used by the mouse-cursor text (as a fallback, because `format_data_short` is not defined) and is left as a backcompat API. So the mouse-cursor text is still slow, but at least the initial draw is less slow.
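
The difference between the two code paths can be made visible by counting how often the reverse dict is built. This is a sketch under assumptions: `CountingFormatter` and its `builds` counter are hypothetical and exist only for this illustration.

```python
class CountingFormatter:
    """Hypothetical formatter that counts reverse-dict constructions."""

    def __init__(self, units_mapping):
        self._units = units_mapping  # label -> integer position
        self.builds = 0

    def format_ticks(self, values):
        self.builds += 1  # one reverse-dict build per batch
        r_mapping = {v: k for k, v in self._units.items()}
        return [r_mapping.get(int(round(val)), '') for val in values]

    def __call__(self, x, pos=None):
        # No caching: each call formats a one-element batch.
        return self.format_ticks([x])[0]


units = {str(i): i for i in range(100)}
positions = list(range(100))

# Path taken when labeling ticks: one batch call.
draw_path = CountingFormatter(units)
draw_path.format_ticks(positions)

# Path taken through __call__: one call per value.
cursor_path = CountingFormatter(units)
for p in positions:
    cursor_path(p)
```

Here `draw_path.builds` ends at 1 and `cursor_path.builds` at 100, which matches the answer above: there is no caching, the batch path simply does the work once.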

Member

:/ I have no objection to switching out the data structure holding the mapping to something more reverse friendly

Contributor, Author

The deprecation is a bit of a pain to handle, and I doubt it's worth it, especially considering that nearly all cases where this would be relevant are actually due to user error (not converting strings to floats/dates).

@tacaswell added this to the v3.2.0 milestone Apr 10, 2019
@tacaswell merged commit d7d6947 into matplotlib:master Apr 10, 2019
@anntzer deleted the fastcat branch April 10, 2019 16:11