Add new-style string formatting option and callable option to fmt in Axes.bar_label(). #23690

Merged
merged 10 commits into from
Sep 7, 2022

Conversation

stefmolin
Contributor

@stefmolin stefmolin commented Aug 20, 2022

PR Summary

Addresses #23689

Enhancement to Axes.bar_label() to make it possible to format values with f-strings.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@stefmolin
Contributor Author

Example usage:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bar_container = ax.barh(['c', 'b', 'a'], [1_000, 5_000, 7_000])
ax.bar_label(bar_container, label_type='center', fmt='{:,.0f}')

Screen Shot 2022-08-20 at 6 44 41 PM

@stefmolin stefmolin changed the title from "[ENH] Add f-string option to fmt in Axes.bar_label()." to "[WIP] Add f-string option to fmt in Axes.bar_label()." Aug 20, 2022
@story645
Member

story645 commented Aug 21, 2022

I think this is totally reasonable given we allow fstrings directly and indirectly for most of the other labeling, so I'm +1 on this.

ETA: Also, sorry should have posted this on the issue!

@tacaswell
Member

In general we have not tried to guess which version of format strings the user passed. In the test used here "label {:f}" and "{%f}" would both fail the wrong way.

This did come up when we added this in 2019 #15602 (comment) and we opted for "old style" at the time.

I do not think we should go down the route of guessing which one the user passed (in addition to the two examples above "a: %d b: {:d}" I think will correctly format under either scheme and we have no way to tell which the user meant (I'm sure there are more realistic examples that have the same problem)). We have not done it elsewhere in the library (this is why we have a FormatStrFormatter (old style) and StrMethodFormatter (new style) classes).

I think our options are

  1. add a flag to control how we interpret fmt
  2. add a second (conflicting) keyword for new-style format (if both are passed error, if neither is passed current behavior)
  3. accept a callable to fmt so you would do ax.bar_label(..., fmt="{:g}".format) (we can reliably tell apart a str and a callable)

I think option 3 is my favorite followed by 2.


My understanding is that "fstring" refers to when you do f"xy {code}", which is effectively a string literal and cannot exist unresolved, whereas this is about "new style format strings".
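
To make the ambiguity concrete, here is a small sketch in plain Python. The helper `make_label` is hypothetical (it is not Matplotlib's implementation); it only illustrates option 3, where the decision depends on the type of `fmt` and never on guessing from its content.

```python
# One format string that is valid under both schemes, with different results.
ambiguous = "a: %d b: {:d}"
print(ambiguous % 5)         # %-style fills %d and leaves the braces alone
print(ambiguous.format(5))   # str.format fills {:d} and leaves %d alone


def make_label(fmt, value):
    """Hypothetical helper: dispatch on the type of fmt, not its content."""
    if isinstance(fmt, str):
        return fmt % (value,)   # old-style formatting of a single value
    if callable(fmt):
        return fmt(value)       # the caller decides how to format
    raise TypeError("fmt must be a str or a callable")


print(make_label("%.1f", 2.5))          # old-style string
print(make_label("{:g}".format, 2.5))   # bound method of a new-style string
```

Passing `"{:g}".format` works because a bound `str.format` method is an ordinary callable, so no flag or second keyword is needed to select the new-style behavior.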

@tacaswell tacaswell added this to the v3.7.0 milestone Aug 21, 2022
@story645
Member

story645 commented Aug 21, 2022

I think allowing a callable would be great as that would be consistent with the tick formatters and provides the most labeling flexibility, so cool if that's the easiest path forward!

ETA 2: @stefmolin you should probably hold off on changing code until more people chime in on what direction this should take.

ETA: I do think f-strings should be allowed directly since they're so widely used, but shifting that discussion to #23694 since I don't want that discussion to clobber this one.

@stefmolin
Contributor Author

stefmolin commented Aug 21, 2022

@tacaswell - Yes, I was referring to new-style format strings – sorry for any confusion.

Going off the other issue, I'm going to update this to accept callables as well. @story645 - I really like the idea of being able to use the tick formatters here as well.

@stefmolin stefmolin changed the title from "[WIP] Add f-string option to fmt in Axes.bar_label()." to "[WIP] Add new-style string formatting option and callable option to fmt in Axes.bar_label()." Aug 21, 2022
@story645
Member

@stefmolin just to be clear, you're 💯% welcome to join in on the formatting discussion wherever it's taking place.

@tacaswell
Member

I continue to be very skeptical of trying to guess which format string the user intended.

Member

@timhoffm timhoffm left a comment

The format function needs thorough testing.

@tacaswell
Member

Is it worth always trying both % and .format and warning (raising) if both work?

To @jklymak 's point about being able to force the users preference, I think "use a lambda" is sufficient for that case.

@timhoffm
Member

timhoffm commented Aug 26, 2022

Is it worth always trying both % and .format and warning (raising) if both work?

I originally also thought that way, but now think otherwise

  • "...".format() returns the string unchanged if there is no {} in the string. I.e. for a simple %-style string {} works as well. - You could test that by checking whether the returned string is different, but that is extra effort.
  • by preferring %-style, we have 100% backward compatibility. - That's what we have been doing so far. We only catch the exception that was earlier surfaced to the user, and now try something additionally. In turn, pathological cases like "%d {}" have worked so far and would break if we raise on both working.

Additional note: My only slight concern with preferring %-style is that I would actually prefer {}-style if I had to write that from scratch, because {} is more common nowadays. But I think that's not important enough to introduce minor backward incompatibilities.
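
The preference order described above can be sketched as follows. The name `auto_format` is made up for this illustration, and the body is an assumption about the shape of the logic rather than a copy of the code in the PR: try %-style first so existing strings behave as before, and fall back to `str.format` only when %-style raises.

```python
def auto_format(fmt, value):
    """Illustrative only: prefer %-style, fall back to str.format."""
    try:
        # The value is wrapped in a tuple so that it is always treated as
        # exactly one positional argument.
        return fmt % (value,)
    except (TypeError, ValueError):
        return fmt.format(value)


print(auto_format("%.2f", 3.14159))   # %-style succeeds
print(auto_format("{:,.0f}", 7000))   # %-style raises TypeError, format is used
print(auto_format("{:.2%}", 0.5))     # "%}" is an invalid conversion: ValueError
print(auto_format("%d {}", 3))        # pathological mix keeps its old meaning
```

Because the fallback only runs after %-style has failed, every string that worked before keeps producing the same label.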

@tacaswell
Member

@timhoffm I'm convinced.

@stefmolin What strings were passing the % formatting unexpectedly? That is an awkward construction of re-raising, it would be good to understand exactly why we need it....

@stefmolin
Contributor Author

@stefmolin What strings were passing the % formatting unexpectedly? That is an awkward construction of re-raising, it would be good to understand exactly why we need it....

@tacaswell - {:,.0f} was one example where the labels became {:,.0f}. I noticed that anything like {:.2%} didn't need the check – for obvious reasons. Here's a complete example that needed that check to work:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bar_container = ax.barh(['c', 'b', 'a'], [1_000, 5_000, 7_000])
ax.bar_label(bar_container, label_type='center', fmt='{:,.0f}')

Screen Shot 2022-08-26 at 1 25 13 PM

@tacaswell
Member

I can reproduce this 🤯 , but I think I have an explanation:

If you put a breakpoint in the formatting function you get to:

fmt='{:,.0f}'
here fmt='{:,.0f}' value=1000.0 fmt % value='{:,.0f}'
> /home/tcaswell/source/p/matplotlib/matplotlib/lib/matplotlib/cbook/__init__.py(2341)_auto_format_str()
-> return fmt % value
(Pdb) fmt
'{:,.0f}'
(Pdb) value
1000.0
(Pdb) fmt % value
'{:,.0f}'
(Pdb) fmt % 5
*** TypeError: not all arguments converted during string formatting
(Pdb) fmt % value
'{:,.0f}'
(Pdb) fmt % 1000.0
*** TypeError: not all arguments converted during string formatting
(Pdb) 1000.0 == value
True
(Pdb) fmt % value
'{:,.0f}'
(Pdb) fmt % 1000.0
*** TypeError: not all arguments converted during string formatting
(Pdb) type(value)
<class 'numpy.float64'>

so I think the problem here is that if the value is a numpy float the string formatting fails to fail. However it is not that numpy floats fully escape formatting:

In [16]: 's' % np.float64(5)
Out[16]: 's'

In [17]: '%d' % np.float64(5)
Out[17]: '5'

@melissawm have you seen anything like this in numpy before? This seems like a bug, but is rather hard to google...

@tacaswell
Member

Ok, thanks to pointers from @melissawm and Sebastian, I think I understand what is going on here:

Parsing the documentation on this very carefully:

If format requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary).

The input can be:

  1. a single value if the format requires a single argument
  2. a tuple with exactly the right number of values
  3. a single mapping (which despite the example is really anything with __getitem__)

If we pass a format string with one % and our value is not a tuple, we hit case 1 and everything is copacetic. However, if we pass a string without any % in it, then we try the next two cases. If it is a tuple (and the C code does a type check), then we get case 2. If the value passed is also not a string but does have __getitem__, then we get into case 3, where it is forgiving of some keys in the mapping not being pulled out (which makes sense for a use case where the library provides a whole lot of values and the user can pass in a constructed format string to use the (possibly empty) subset they want).

Because np.float64 has __getitem__ defined, we end up in case 3 unexpectedly and hence have this issue. By putting the value (which we expect to be treated as a scalar because we told the users to provide us with a string to format a single number) in a one-element tuple, we force case 2.
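
The same effect can be reproduced without NumPy. In the sketch below, `Indexable` is a hypothetical stand-in for np.float64: a float subclass whose only relevant property is that it defines `__getitem__`, which is enough for %-formatting to treat it as a mapping (case 3).

```python
class Indexable(float):
    """Stand-in for np.float64: a float that also defines __getitem__."""

    def __getitem__(self, key):
        raise IndexError("invalid index to scalar variable.")


fmt = "{:,.0f}"  # contains no % conversion at all

# A plain float is neither a tuple nor a mapping, so the unused argument
# is reported as an error.
try:
    fmt % 1000.0
except TypeError as exc:
    print("plain float:", exc)

# The stand-in is treated as a mapping; no key is ever looked up, so the
# format string comes back unchanged and no error is raised.
print("indexable:", fmt % Indexable(1000.0))

# Wrapping the value in a one-element tuple forces the tuple rule (case 2),
# and the mismatch is reported again.
try:
    fmt % (Indexable(1000.0),)
except TypeError as exc:
    print("tuple-wrapped:", exc)
```

With the tuple in place, the TypeError is raised reliably for any value type, which is what lets a fallback to `str.format` trigger.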

Other cases that are surprising:

>>> 'd' % [1, 2, 3]
'd'
>>> 'd %(0)d' % [1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str
>>> 'd %(0)d' % {0: 1}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '0'
>>> import numpy as np
>>> 'd %(0)d' % np.float64(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index to scalar variable.
>>> 'd %(0)d' % np.array([1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> 'd' % np.array([1])
'd'
>>> 'd' % {0:1}
'd'

The % formatting code in CPython https://github.com/python/cpython/blob/29f1b0bb1ff73dcc28f0ca7e11794141b6de58c9/Objects/unicodeobject.c#L14262-L14389 (grepping for the error message worked very well here) and the definition of PyMapping_Check https://github.com/python/cpython/blob/29f1b0bb1ff73dcc28f0ca7e11794141b6de58c9/Objects/abstract.c#L2293-L2300

Co-authored-by: Thomas A Caswell <[email protected]>
@stefmolin
Contributor Author

Thanks for the debugging and explanation @tacaswell - very interesting to understand the why behind that issue.

@stefmolin stefmolin changed the title from "[WIP] Add new-style string formatting option and callable option to fmt in Axes.bar_label()." to "Add new-style string formatting option and callable option to fmt in Axes.bar_label()." Sep 1, 2022
@stefmolin
Contributor Author

Should I add a what's new entry and/or example for this?

@story645
Member

story645 commented Sep 1, 2022

@stefmolin
Contributor Author

yes what's new entry, I think update either https://matplotlib.org/devdocs/gallery/lines_bars_and_markers/bar_label_demo.html#sphx-glr-gallery-lines-bars-and-markers-bar-label-demo-py or https://matplotlib.org/devdocs/gallery/lines_bars_and_markers/barchart.html#sphx-glr-gallery-lines-bars-and-markers-barchart-py with the new formatting? (use callables for one group, strings for the other group maybe?)

@story645 – I added a what's new entry, but I'm thinking it might be better to add the examples from the what's new entry to the Bar Label Demo. The demo already has multiple examples, but it feels a bit contrived to change the current ones to use the new functionality since there is really no benefit to doing so (there's currently only one example that even uses fmt). What do you think?

For reference, here are the examples I created to make it more apparent why you might use the fmt argument in the new way:

import matplotlib.pyplot as plt

animal_names = ['Lion', 'Gazelle', 'Cheetah']
mph_speed = [50, 60, 75]

fig, ax = plt.subplots()
bar_container = ax.bar(animal_names, mph_speed)
ax.set(ylabel='speed in MPH', title='Running speeds', ylim=(0, 80))
ax.bar_label(
    bar_container, fmt=lambda x: '{:.1f} km/h'.format(x * 1.61)
)

Screen Shot 2022-09-03 at 9 20 10 PM

import matplotlib.pyplot as plt

fruit_names = ['Coffee', 'Salted Caramel', 'Pistachio']
fruit_counts = [4000, 2000, 7000]

fig, ax = plt.subplots()
bar_container = ax.bar(fruit_names, fruit_counts)
ax.set(ylabel='pints sold', title='Gelato sales by flavor', ylim=(0, 8000))
ax.bar_label(bar_container, fmt='{:,.0f}')

Screen Shot 2022-09-03 at 9 23 37 PM

@timhoffm
Member

timhoffm commented Sep 4, 2022

The demo already has multiple examples, but it feels a bit contrived to change the current ones to use the new functionality since there is really no benefit to doing so (there's currently only one example that even uses fmt). What do you think?

IMHO the bar_label demo needs an overhaul anyway. The semantic content in the existing examples is quite far-fetched and doesn't really make sense. I'd like to see more realistic content, like your plots.

For now, it's fine to add your plots to the bar label demo as is.

Member

@timhoffm timhoffm left a comment

I still have two minor suggestions, but approve already what is there.

@timhoffm
Member

timhoffm commented Sep 4, 2022

Off-topic: @stefmolin Thanks for working on Matplotlib. I'd like to make you aware of the new contributor meeting. This is a good place to get to know some of the matplotlib developers, to learn more about the project and of course to ask questions in general. If you are interested, you are very welcome to join. The next date is Tuesday, Sept 6 at 5pm UTC.

@stefmolin
Contributor Author

@timhoffm - I've incorporated the suggestions and added the two new examples to the demo. Thanks for letting me know about the new contributor meeting – it's during work for me, but I'll see if I can attend 😄

@story645
Member

story645 commented Sep 7, 2022

Holding off on merging only cause I'm not sure if features are fixed when a release candidate is cut - @QuLogic?

@jklymak
Member

jklymak commented Sep 7, 2022

It is milestoned 3.7, so shouldn't be a problem, right?

@jklymak jklymak merged commit ae53649 into matplotlib:main Sep 7, 2022
@story645
Member

story645 commented Sep 7, 2022

Yeah was just unsure if feature freeze comes in once the RCs are cut or if they can come in - under the wire.

Granted, targeting 3.7 gives time to roll out this new formatting support to all the functions with a fmt keyword (and support for formatters) if we want to go that route.

@timhoffm
Member

timhoffm commented Sep 7, 2022

Yeah was just unsure if feature freeze comes in once the RCs are cut or if they can come in - under the wire.

3.6.0 feature freeze comes in as soon as we have the 3.6.x branch. Then, the PRs are going into main and need to be backported to the 3.6.x branch explicitly.


5 participants