Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Add an auto-labeling helper function for bar charts #15602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Dec 22, 2020

Conversation

immaxchen
Copy link
Contributor

@immaxchen immaxchen commented Nov 4, 2019

PR Summary

For GH #12386 (suggesting new feature: autolabel option for bar plots)
Added a new method bar_label into _axes.py and pyplot.py (via boilerplate.py)
Add a new prop orientation into BarContainer correspondingly
This function is designed to work out-of-the-box with minimum config required while retaining some of the customization capability, such as float format, label captions and spacings, along with other mpl's annotate and text options.
It work in two mode: edge and center, for cumulative and individual values respectively (especially useful in stacked bars). It will automatically avoid error bars and update axis-limits to fit-in the labels.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@immaxchen
Copy link
Contributor Author

have made some improvements 😉 :

  • named it as bar_label
  • add argument validation against mode
  • add tests
  • add what's new

@immaxchen
Copy link
Contributor Author

Example page can be found here! 🎉
https://26425-1385122-gh.circle-artifacts.com/0/home/circleci/project/doc/build/html/gallery/lines_bars_and_markers/bar_label_demo.html
But why can't I pass CircleCI? ... 😕

@ImportanceOfBeingErnest
Copy link
Member

The report says

/home/circleci/project/lib/matplotlib/pyplot.py:docstring of matplotlib.pyplot.bar_label:4: WARNING: py:obj reference target not found: BarContainer
/home/circleci/project/doc/gallery/lines_bars_and_markers/bar_label_demo.rst:14: WARNING: py:obj reference target not found: bar_label
/home/circleci/project/doc/users/next_whats_new/2019-11-06_auto-labeling-for-bar-charts.rst:5: WARNING: py:obj reference target not found: Axes.bar_label

so one would need to add the respective documentation entries, in this case probably in doc/api/axes_api.rst.

@ImportanceOfBeingErnest ImportanceOfBeingErnest added this to the v3.3.0 milestone Nov 7, 2019
@immaxchen
Copy link
Contributor Author

Finally all green! Thank you so much @ImportanceOfBeingErnest !

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this is really a useful feature! The quality of the PR is very good overall. But I still have some comments for further improvement.

Comment on lines 2727 to 2735
if autoscale:
transform = self.transData.inverted()
renderer = self.figure.canvas.get_renderer()
corners = []
for text in annotations:
points = text.get_window_extent(renderer).get_points()
corners.append(transform.transform(points))
self.update_datalim(np.vstack(corners))
self.autoscale_view()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the scaling mechanism. @anntzer @jklymak could the labels be included in the auto-scaling calculation, so that we don't have to explicitly scale here?

Copy link
Contributor

@anntzer anntzer Nov 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Texts normally don't participate (at all) in autoscaling -- they have the same problem as the one recently fixed by @jklymak for scatter(), which is that their extents is in physical units, not in data units.
To be honest I'm not really a fan of the solution here -- it'll likely break if e.g. someone adds more plot elements causing a rescaling, because then the "data limits" computed for the text won't be correct anymore.
I think a simpler solution for now is to just make sure that for "default" settings (a 640x480 figure, default axes size, default 5% margins, default font size), the default padding used by bar_label() ensures that the text indeed fits into the 5% margins, and leave more complex scaling to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the scaling either (and mpl codebase 😂), do you mean I just drop the autoscale stuff and use a smaller default padding and fontsize?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, it's a rather tricky part of the codebase, and your original approach is far from silly.
I would suggest indeed that (if possible) you make the default padding small enough that the things fit (normally) without re-autoscaling (if possible keeping the normal fontsize).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, I tried and found immediately that it seems not feasible for horizontal bar labels.. 😱

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair point re: horizontal labels. Even then I would be tempted to just document this as a limitation of the method and suggest that the user picks whatever limits works for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sphx_glr_bar_label_demo_003

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, if so, I believe I can manage to achieve the goal! 😜

padding : float, optional
Space in points for label to leave the edge of the bar.

shifting : float, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable name feels not quite right. But I don't have a better either right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

displacement? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

padding => longitudinal_offset
shifting => lateral_offset ? 🤔

@immaxchen
Copy link
Contributor Author

Thank you @timhoffm for patiently review and detailed explain for the considerations, very appreciated! 👍

elif position == "edge" and orientation == "horizontal" and xc < 0:
ha, va = "right", "center"

annotation = self.annotate(cap or fmt % value, xy, xytext,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will ignore user input of an empty string, probably better to check for None-ness?

- 'edge': Placed at edge, the cumulative values will be shown.
- 'center': Placed at center, the individual values will be shown.

padding : float, default: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "longitudinal" and "lateral" how about "{parallel, perpendicular} to the direction of the bar" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest:

`padding` : float, default: 0
      Distance of label from end of bar
`offset` : float default: 0
     Distance label is moved relative to the centre line of the bar (positive up/right)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the padding is still effective when position is 'center' and its direction will be the same as the bar points to, how could we describe this behavior? (or we disable it for 'center'?)

68781801-46f9c080-0673-11ea-8b76-fa4154bdd36ff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the shift perpendicular to the bar?

  1. I can't imagine a plot that wants this shift. Anybody has a good example?
  2. As it is implemented no, it's difficult to use because the label is centered in that direction. And offsets relative to center are somewhat messy (neither the centers nor the boundaries of the labels are very distinct reference points).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, perpendicular shifting is a bit over-design, I would vote for remove it because I can't think of a justified / common use case either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is to leave it off for now. Folks can still manually place the text if they don't like the helper. We can always add it later if its requested.

OTOH, don't be confused. Different folks have different opinions. @timhoffm is now API lead so if we don't get a consensus in a day or so, we can ask him for an explicit ruling 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it & thanks for the reply!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jklymak. Leave perpendicular shifts out for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@immaxchen is this taken care of now? (being lazy and not totally refreshing myself on this). If so, I think you are just waiting for @QuLogic to unblock. If you have addressed all his concerns this should be very close...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just @QuLogic need some time to think about if adding datavalues into BarContainer is a good idea, other suggestions should all be already addressed

position : {'edge', 'center'}, default: 'edge'
Position of the label relative to the bar:

- 'edge': Placed at edge, the cumulative values will be shown.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand what this meant, would it be clearer to say "If edge the value is the position of the edge (cumulative for stacked bars)" and "if center the value shown will be the length of the bar"?

@tacaswell
Copy link
Member

Thanks for your work on this @immaxchen !

Given that this is adding to our public API I would like to be cautious and be very sure we have it right before we merge (as once it is released changing it is something between painful and impossible).

Can you also add a whats-new for the new state / init kwarg on container?

@tacaswell
Copy link
Member

Another question for the thread, does bar have an API for injecting the labels in at call time? If it does, should we be pulling those out here? If it does not should we add that functionality and then go back to the second question?

@immaxchen
Copy link
Contributor Author

Sure, cautious is good, thank you for your review! 👍
Not quite get it what is an API for injecting the labels in at call time?


- 'edge': the value is the position of the edge.
(cumulative for stacked bars)
- 'center': the value shown will be the length of the bar.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code this looks like it affects both the position and the value printed? I really don't understand this at all, and the docstring should make it clear what "position" and "value" have to do with one another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry didn't explain well, yes it affects both the label position and the value printed, please refer to the second example for better illustration, as below:

ax.bar_label(p1, position='center')
ax.bar_label(p2, position='center') # print 25 32 34 20 -25
ax.bar_label(p2) # default 'edge', print 45 67 64 55 -52

image
how could we describe it better?

Copy link
Member

@jklymak jklymak Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are trying to do, but this is pretty confusing. I'm probably not the right person to help you because I'm not sure that I have ever plotted a stacked bar in my life.
But this is not really what we usually call a "position" in matplotlib, this is really two distinct ways to label a bar, so suggest perhaps:

label_type: string, default: 'edge'
  'edge' :  Label the end of the bar with the bar's absolute value 
                at the end.  
  'center':  Label the center of the bar, with the length of this segment 
                 (useful for stacked bars, i.e. `link_to_your_example`.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for (label_)type or kind because this does not only affect the label position but also it's semantics.

Description-wise, I think this would be even a bit clearer:

  'edge' :  Place the label beyond the top/right edge and display the
            value of that edge.  
  'center':  Place the label at the center of the bar and display the bar size.
             (useful for stacked bars, i.e. `link_to_your_example`.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t suggest “bar size” because that’s a bit ambiguous. Does it include the whole bar down to the baseline or just the segment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's a bit of terminology. Since we need multiple calls to bar I see every block of unique color (what you call segment) as a separate bar. And these bars are stacked on top of each other. Also if you say "place the label at the center of the bar" you implicitly adopt that perspective.

I'm open to discussion. We should try to find a consistent and understandable description. IMHO it helps if multiple people contribute their impression/expectation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining suggestions into a slightly more verbose one:

label_type : {'edge', 'center'}, default: 'edge'

'edge' :   label placed at the end-point of the bar segment, 
           and the value displayed will be the position of that end-point.

'center' : label placed in the center of the bar segment, 
           and the value displayed will be the length of that segment.
           (useful for stacked bars, i.e. `link_to_your_example`.)

does it make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 For a standard way of formatting see e.g. the paramter align of bar().

- 'edge': Placed at edge, the cumulative values will be shown.
- 'center': Placed at center, the individual values will be shown.

padding : float, default: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest:

`padding` : float, default: 0
      Distance of label from end of bar
`offset` : float default: 0
     Distance label is moved relative to the centre line of the bar (positive up/right)

@@ -2581,6 +2582,123 @@ def barh(self, y, width, height=0.8, left=None, *, align="center",
align=align, **kwargs)
return patches

def bar_label(self, container, labels=None, *, fmt="%g", label_type="edge",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to take new style format strings instead? See arguments both ways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we? both ok for me, currently using old style just for consistency with clabel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could also support both using

try:
    label = fmt % value
except TypeError:
    label = fmt.format(value)

But I'm not sure if that's worth it or if we even want that redundancy.

I'm fine with sticking to %-formatting, but don't have a strong opinion on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tacaswell, could you make a final decision? new, old or both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick to old-style formatting for consistency throughout the lib.

@tacaswell
Copy link
Member

Not quite get it what is an API for injecting the labels in at call time?

Something like

ax.bar(x, y, ..., bar_labels=['a', 'b'])

to avoid have to make a second call.

@immaxchen
Copy link
Contributor Author

I thought we were trying to avoid adding another kwarg because people upvote #12386 (comment) ?
of course, open to discussion.

@timhoffm
Copy link
Member

-1 on a bar_labels kwarg to bar(). It does not currently provide that feature. IMHO there are a number of reasons against that:

  • bar() is already complex enough as is.
  • bar labels need a number of customization options (see the signature in this PR). We neither would want to add all of that to bar() nor is it a good idea to have a semi-capable integration alongside a more complete separate function.
  • bar labels are non-essential to bar plots.
  • We already have established the pattern of adding complex labels by separate functions in clabel().

Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,137 @@
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not update the existing example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applicable for lines_bars_and_markers/barchart and horizontal_barchart_distribution
but sadly, this PR does not provide fine-grained controls needed in statistics/barchart_demo
this PR acts on BarContainer as a whole instead of fine-tuning color/position for every single bar
and for misc/demo_ribbon_box it will not work because they are not bars, they are images

- 'edge': Placed at edge, the cumulative values will be shown.
- 'center': Placed at center, the individual values will be shown.

padding : float, default: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shift by (size of bar)/2 to place the label centered, but over the bar?

Comment on lines 2652 to 2587
if orientation == "vertical":
extrema = max(y0, y1) if yc >= 0 else min(y0, y1)
length = abs(y0 - y1)
elif orientation == "horizontal":
extrema = max(x0, x1) if xc >= 0 else min(x0, x1)
length = abs(x0 - x1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these extrema calculations are quite correct, as you could set the bottom=-1, height=0.5, then the center is at -0.75, and this code would pick below, but it should pick above. I don't know if BarContainer has all information you need to catch this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to perceive the contextual position of labels (i.e. to distinguish between bottom=-1, height=0.5 vs. bottom=-0.5, height=-0.5), I have added an additional datavalues attribute into BarContainer in this new commit, i think this might be the only way to do it but not sure if it is desired.

@tacaswell tacaswell modified the milestones: v3.3.0, v3.4.0 Jun 2, 2020
@tacaswell
Copy link
Member

Pushing this to 3.4 because it needs a rebase and has at least one round of comments that need to be addressed.

Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good, but I need to think about the datavalues addition a bit more.

padding : float, default: 0
Distance of label from the end of the bar.

**kwargs : passed through to `.Axes.annotate`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the type.

Suggested change
**kwargs : passed through to `.Axes.annotate`.
**kwargs
Any remaining keyword arguments are passed through to `.Axes.annotate`.

@immaxchen immaxchen force-pushed the bar-chart-auto-label-gh12386 branch from 44174e6 to 3f2ece5 Compare October 4, 2020 15:17
@immaxchen
Copy link
Contributor Author

sure & thanks

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo docstring addtion in BarContainer.

- 'edge': Placed at edge, the cumulative values will be shown.
- 'center': Placed at center, the individual values will be shown.

padding : float, default: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jklymak. Leave perpendicular shifts out for now.

Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.

@kirksw
Copy link

kirksw commented Aug 18, 2021

I'm wondering if the right choice was made in regards to string formatting using the old-style in 2020/21, all my applications use the new-style formatting. I think this is solid feature, but won't use it for consistency reasons. This will become more relevant as time goes on and the old style diminishes further in use.

@timhoffm
Copy link
Member

timhoffm commented Aug 21, 2021

This choice was intentional to be consistent with similar functions (clabel, thetagrids etc.).

%-formatting is not deprecated and only mildly discouraged (https://docs.python.org/3/library/stdtypes.html#old-string-formatting).

If we want to introduce str.format style, that should happen consistently throughout the library (and I think we could have a reasonable auto-detection to support both so that path is still open without compatibility concerns).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants