scalar categoricals are sometimes interpreted as data keys #9844

Open
anntzer opened this issue Nov 24, 2017 · 11 comments
Labels: keep, status: confirmed bug, topic: categorical

Comments

@anntzer
Contributor

anntzer commented Nov 24, 2017

Bug report

Bug summary

Code for reproduction

from matplotlib import pyplot as plt
fig, axs = plt.subplots(2)
axs[0].bar("thing", 1, data={"thing": 1})  # "thing" is a key of data, so x becomes 1
axs[1].bar("other", 1, data={"thing": 1})  # "other" is not, so it is drawn as a category
plt.show()

Actual outcome

[Screenshot of the resulting figure]

The first plot's x-value ("thing") is interpreted as a lookup into the data dict and replaced by 1.
The second plot's x-value does not appear in the data dict and is interpreted as a categorical.
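A small pure-Python model of the lookup-then-fallback rule reproduces both outcomes. This is a sketch, not Matplotlib's implementation. `resolve_arg` is a hypothetical helper, and it assumes that only string arguments are looked up in `data`, as the `data` docstring quoted later in this thread describes.

```python
def resolve_arg(value, data):
    """Hypothetical model of the data-kwarg substitution rule (not Matplotlib's code)."""
    if data is not None and isinstance(value, str):
        try:
            return data[value]  # string found in data: use the looked-up value
        except Exception:
            pass  # lookup failed: fall through and keep the string itself
    return value


data = {"thing": 1}
print(resolve_arg("thing", data))  # 1 -> plotted as a numeric x-value
print(resolve_arg("other", data))  # other -> left as a string, i.e. a categorical
```

The same string literal is handled in two different ways, and the only deciding factor is the contents of `data`.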

Expected outcome

Not sure what the best option is, but silently changing the interpretation of input based on whether it is present or not in another dict seems a bit finicky. I have proposed elsewhere not to allow scalar categoricals at all (they would always need to be passed in a container: list, array, dataframe, etc.), which would also solve the inconsistency that plot(1, "x") currently specifies a marker whereas plot("x", 1) treats "x" as a categorical.
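Here is a rough sketch of what the container-only proposal could mean for resolving an x argument. It assumes that, under the proposal, a bare string would only ever be a data key. `resolve_x_proposed` is a hypothetical helper, and raising `TypeError` is an assumed choice, not existing Matplotlib behavior.

```python
def resolve_x_proposed(value, data=None):
    """Hypothetical sketch of the proposal: a bare string is only ever a data key;
    categorical values must be passed in a container such as a list."""
    if isinstance(value, str):
        if data is not None:
            try:
                return data[value]  # a bare string can only mean a data key...
            except Exception:
                pass
        # ...so a bare string that is not found is an error, never a category
        raise TypeError(
            f"{value!r} is not a key of data; pass categories in a container, "
            f"e.g. [{value!r}]"
        )
    return value  # containers (and numbers) pass through unchanged


data = {"thing": 1}
print(resolve_x_proposed("thing", data))    # 1
print(resolve_x_proposed(["other"], data))  # ['other'] -> unambiguously categorical
```

Under this rule, the second call from the report, `bar("other", ...)`, would raise instead of silently becoming a category.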

Unlike other categorical issues, I don't actually think this is release-critical per se, but it would still be nice to get the behavior clarified/simplified...

Matplotlib version

  • Operating system:
  • Matplotlib version: 2.1
  • Matplotlib backend (print(matplotlib.get_backend())):
  • Python version:
  • Jupyter version (if applicable):
  • Other libraries:
@jklymak jklymak added this to the v2.1.1 milestone Nov 24, 2017
@story645
Member

This behavior happens with any scalar, so I'm wondering if the solution is something like disallowing plotting of scalars that aren't keys in data when data is present.

from matplotlib import pyplot as plt
fig, axs = plt.subplots(2)
axs[0].bar(10, 1, data={10: 1})
axs[1].bar(20, 1, data={10: 1})
plt.show()

[Screenshot of the resulting figure]
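The stricter rule suggested above might look roughly like this. `resolve_scalar_strict` is a hypothetical helper, and the choice of `ValueError` is an assumption. The sketch treats strings and non-iterables as scalars and passes containers through unchanged.

```python
from collections.abc import Iterable


def resolve_scalar_strict(value, data=None):
    """Hypothetical sketch: when data is given, a scalar argument must be a key
    of data; containers are used as-is."""
    is_scalar = isinstance(value, str) or not isinstance(value, Iterable)
    if data is None or not is_scalar:
        return value
    try:
        return data[value]
    except Exception:
        raise ValueError(f"{value!r} is not a key of data") from None


data = {10: 1}
print(resolve_scalar_strict(10, data))    # 1
print(resolve_scalar_strict([20], data))  # [20]
```

With this rule, `bar(20, 1, data={10: 1})` from the example would raise instead of plotting at x=20.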

@tacaswell tacaswell modified the milestones: v2.1.1, v2.2 Dec 6, 2017
@story645
Member

And looking at the example I posted, it's even buggier, because the scalar isn't being registered as a key at all (floating-point error?).

@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive label Apr 25, 2023
@ksunden
Member

ksunden commented Apr 25, 2023

I'm not sure there is much that can be done here...

There are two rules that come into play with handling strings as data:

  • The data kwarg handling, which uses the string to access the value of a mapping
  • The categorical handling

Necessarily, they must be resolved in that order. If they were resolved in the other order, categorical handling would always take place, and thus the data kwarg would be useless.
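Two toy steps illustrate the ordering argument. Both functions are hypothetical stand-ins, not Matplotlib internals. The `("category", value)` tuple only marks "treated as a categorical label".

```python
def data_lookup(value, data):
    """Step 1 (modelled): replace a string by data[value] if that lookup succeeds."""
    if isinstance(value, str):
        try:
            return data[value]
        except Exception:
            pass
    return value


def categorical(value):
    """Step 2 (modelled): a remaining string becomes a category label."""
    return ("category", value) if isinstance(value, str) else value


data = {"thing": [1, 2, 3]}
print(categorical(data_lookup("thing", data)))  # [1, 2, 3]: data kwarg honoured
print(data_lookup(categorical("thing"), data))  # ('category', 'thing'): data never consulted
```

In the reversed order, every string is already consumed by the categorical step before the lookup can see it.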

As for non-strings as keys of the data kwarg, I think that is unsupported, undefined behavior:

The docstring entry for data kwarg reads:

data : indexable object, optional

If given, all parameters also accept a string s, which is interpreted as data[s] (unless this raises an exception).

As of (at least) #10928, the requirement that keys be strings for any data-kwarg behavior to apply is strictly enforced.
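The string-only rule is consistent with what the numeric example earlier in the thread showed: the integer `10` is never looked up, so no floating-point issue is needed to explain it. Below is a minimal model. `resolve_arg` is a hypothetical helper mirroring the documented rule, not Matplotlib's code.

```python
def resolve_arg(value, data):
    """Hypothetical model of the documented rule: only *strings* are looked up in data."""
    if isinstance(value, str):
        try:
            return data[value]
        except Exception:
            pass
    return value


print(resolve_arg(10, {10: 1}))      # 10: an int is never used as a key
print(resolve_arg("10", {"10": 1}))  # 1: the same key as a string is looked up
```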

I suppose the "do not allow scalar categoricals" proposal is a potential solution (as it would mean that strings are not expected for any parameter except via the data kwarg)... but that is a rather large API change for a rather narrow set of interactions, though perhaps it would have other benefits (and I'm not quite sure how easy that change would be, actually).

If we are unwilling to deprecate bare strings in either of those cases, though, I don't see any other possible behavior change; perhaps some documentation changes, though even then I'm not sure where.

@github-actions github-actions bot removed the status: inactive label Apr 26, 2023


@github-actions github-actions bot added the status: inactive label Jun 14, 2024
@anntzer anntzer added the keep label and removed the status: inactive label Jun 14, 2024
@timhoffm
Member

timhoffm commented Jun 14, 2024

IMHO this is a lost cause. There are cases outside of categoricals in which data-replaceable args can legitimately be strings. Consider the following realistic example:

from matplotlib import pyplot as plt
fig, axs = plt.subplots(2)
data = {
    "x": [0, 1],
    "y": [1, 1], 
    "color": ["c", "m"],
}
axs[0].scatter("x", "y", facecolor="color", data=data)
axs[1].scatter("x", "y", facecolor="red", data=data)
plt.show()

[Screenshot of the resulting figure]

We cannot reasonably prohibit the second case. As a consequence, it would be very tedious to tell acceptable string scalars apart from illegal ones.
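A small model shows why a stricter rule would break this example. Both resolvers are hypothetical sketches. `resolve_lenient` mirrors the documented try-then-fall-back rule. `resolve_strict` assumes a bare string must always be a key of `data`.

```python
def resolve_lenient(value, data):
    """Documented rule (modelled): try data[value], else keep the string."""
    if isinstance(value, str):
        try:
            return data[value]
        except Exception:
            pass
    return value


def resolve_strict(value, data):
    """Hypothetical stricter rule: a string must be a key of data."""
    if isinstance(value, str):
        return data[value]  # raises KeyError for anything not in data
    return value


data = {"x": [0, 1], "y": [1, 1], "color": ["c", "m"]}
print(resolve_lenient("color", data))  # ['c', 'm']
print(resolve_lenient("red", data))    # red
```

Calling `resolve_strict("red", data)` raises `KeyError`. That is exactly the legitimate `facecolor="red"` case the strict rule would forbid.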

I propose to close this. While the behavior criticized above ("silently changing the interpretation of input based on whether it is present or not in another dict seems a bit finicky") is not optimal, it is consistent: "if a string, try to look it up in data; otherwise proceed as normal". A mild improvement could be updating the data description from

If given, the following parameters also accept a string s, which is interpreted as data[s] (unless this raises an exception):

to

If given, the following parameters also accept a string s, which is interpreted as data[s] if data[s] exists.

@anntzer
Contributor Author

anntzer commented Jun 15, 2024

Indeed, this is a good example of why this is difficult to fix.
I'm not sure about the proposed wording change; what does "data[s] exists" actually mean? (other than "the expression does not raise an exception"...)

@timhoffm
Member

You're right, it's not much clearer. I find the parentheses in the original wording a bit confusing: (1) why is it in parentheses, and (2) what happens if that raises? What we need to communicate is:

If the parameter is a string s, we try to look up data[s]. If that works, the resulting value is used for the parameter. If it fails, s itself is used (which may or may not be a valid input type for the parameter).
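The wording above can be transcribed directly into a sketch. `lookup_or_passthrough` is a hypothetical name, and `s` is assumed to already be a string. The sketch shows that "fails" covers any exception, for example a `TypeError` when `data` is a list rather than a mapping.

```python
def lookup_or_passthrough(s, data):
    """Hypothetical sketch of the proposed wording; s is assumed to be a str."""
    try:
        return data[s]   # if the lookup works, its result is used
    except Exception:    # any failure (KeyError, TypeError, ...) is swallowed
        return s         # ...and s itself is passed on unchanged


print(lookup_or_passthrough("x", {"x": [0, 1]}))  # [0, 1]
print(lookup_or_passthrough("x", {"y": [0, 1]}))  # x   (KeyError swallowed)
print(lookup_or_passthrough("x", [10, 20]))       # x   (TypeError swallowed)
```

Whether the returned string is then a valid value is decided later by whatever validates that parameter, not by the lookup itself.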

Can't think of a nice wording for this right now. Help welcome.

@jklymak
Member

jklymak commented Jun 15, 2024

Is it not just "if s is a key of data"?

@timhoffm
Member

timhoffm commented Jun 15, 2024

I'm not sure that "key" is universally understood. For example, I don't think columns of a DataFrame or fields of a structured numpy array are commonly referred to as keys. I believe explicitly using data[s] is helpful.

@jklymak
Member

jklymak commented Jun 15, 2024

Certainly pandas and xarray both refer to keys, and that is the terminology for dictionaries. If people don't know that term when using these data structures, they should probably learn it. However, I agree it is also helpful to have data[s]. My phrasing above was not meant to replace the whole sentence, but it is less awkward than "data[s] raises an exception".

timhoffm added a commit to timhoffm/matplotlib that referenced this issue Jun 15, 2024
timhoffm added a commit to timhoffm/matplotlib that referenced this issue Jun 15, 2024
trygvrad pushed a commit to trygvrad/matplotlib that referenced this issue Jun 25, 2024

7 participants