-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Categorical: Unsorted, String only, fix overwrite bug #9783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[1, 1, -1, 2, 100, 0, 200])] | ||
ids = ["unicode", "single", "basic", "mixed"] | ||
|
||
test_cases = {"unicode": ("Здравствуйте мир", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be an ordered dict so that they are always in the same order so that pytest-xdist works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda surprised since I was always taught that tests should be independent of each other/order but ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are, but xdist imports the tests in both processes and then sorts out who does what (instead of trying to pickle and ship tests across the wire). One of it's start up checks is that it finds exactly the same tests in both processes (to make sure nothing funny is happening). In 3.5 the order is random which is fine from a testing point of view but not fine from xdist counting it's fingers and toes point of view.
please leave for me to merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks okay, but the public facing methods could do with some short docstrings (which would make reviewing easier too!)
lib/matplotlib/category.py
Outdated
def __init__(self, data): | ||
"""Create mapping between unique categorical values | ||
and numerical identifier | ||
|
||
Parameters | ||
---------- | ||
data: iterable | ||
sequence of values | ||
sequence of values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sequence can contain anything including mixed types right? Might be worth adding a note here if that's true.
lib/matplotlib/category.py
Outdated
|
||
@staticmethod | ||
def axisinfo(unit, axis): | ||
majloc = StrCategoryLocator(axis.unit_data.locs) | ||
majfmt = StrCategoryFormatter(axis.unit_data.seq) | ||
majloc = StrCategoryLocator(axis.unit_data._locs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this get a short docstring too?
lib/matplotlib/category.py
Outdated
data = [d.encode('utf-8', 'ignore').decode('utf-8') | ||
for d in data] | ||
return np.array(data, dtype=np.unicode) | ||
def to_str(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short docstring here too please!
@dstansby, plan is to address your comments, just wanna run through this the CI first. |
.appveyor.yml
Outdated
@@ -66,7 +66,7 @@ install: | |||
- activate test-environment | |||
- echo %PYTHON_VERSION% %TARGET_ARCH% | |||
# pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124 | |||
- pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist | |||
- pip install -q "pytest=3.4.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should >=
this so we aren't stuck on 3.4 for ever.... Not sure why Appveryor isn't getting 3.4 anyways.
Pretty sure this PR mostly ended up same as #10212, but here the failing tests are marked as xfail. But this PR also cleans up/consolidates the tests in #10212 (lots of overlap between me and @anntzer ). As for the failing tests, they're due to inconsistencies in unit implementation between scatter, bar, and plot, and that seems to be a fix better suited for a standalone PR. (And an audit of how conversion is done in all the plotting functions). |
xref #9713 |
Can you also deprecated
https://github.com/matplotlib/matplotlib/blob/0ec46336fd142313fa12de936e002e94e2a0dce5/lib/matplotlib/axis.py#L799-L820
|
lib/matplotlib/category.py
Outdated
if pos is None: | ||
return "" | ||
r_mapping = {v: to_str(k) for k, v in self._unit_data._mapping.items()} | ||
return r_mapping.get(int(x), '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to use int(np.round(x))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah...probably
@@ -1,257 +1,273 @@ | |||
# -*- coding: utf-8 -*- | |||
# -*- coding: utf-8 -*-A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra 'A'
lib/matplotlib/category.py
Outdated
|
||
def __init__(self, data): | ||
def __init__(self, data=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to remove passing in data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the way that the mapping is updated below I think you have to not take input (as the counter is just incremented) so it will step on user values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐑 I am 😪 and can not read. Carry on this is 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we needed that? In this case, it's UnitData(['a', 'b', 'c']). Granted, I dunno if it ever gets called with data, or if it's always an indirect via an update...
@@ -66,7 +66,7 @@ install: | |||
- activate test-environment | |||
- echo %PYTHON_VERSION% %TARGET_ARCH% | |||
# pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124 | |||
- pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit worried about requiring the newest version of pytest? Are there new features we need or just bug fixes and this was the easiest way to make sure we got them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what I needed to get pytest.param
to work, which I used to mark the individual failing tests. I can do a rewrite to get around that, it just makes the tests even clunkier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where it's from: https://github.com/matplotlib/matplotlib/pull/9783/files#diff-f36a7b45d6a24734ba38d2da8f52f138R257
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets see if we get push back from the packagers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We mark parametrized tests as xfail without using pytest.param
all over the place. For example, needs_usetex
used in this parameter is really an xfail
.
So I'm not so sure you need to bump requirements here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to go back to an old version of the docs, e.g. 3.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Then will do that and undo the travis/appveyor changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, doesn't work/fails spectacularly when I try. Technically params was introduced in 3.2, but I dunno how to say version >=3.2 and !=3.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd that xfail
doesn't seem to catch did-not-raise errors, but Fedora has 3.2, so if that's all you need, we'd probably be fine with that. Debian other-than-stable is probably okay too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solution for the how to specify problem: pytest!=3.3.0, >=3.20
and travis didn't break this time. 🤞 for appveyor.
I am overall 👍 on this. It fixes several real bugs, and does not make anything worse (plot and bar still take mixed types, but we now have xfail tests to fix). @story645 can you squash this down to fewer commits / get rid of the merges? |
@tacaswell tried but I'm doing the "verified" merges via the website and have now broken my repo like twice... |
506ca48
to
c07b9a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really needs a couple of paragraphs at the top explaining what is going on, and helping future folks follow the code. I get it now, but it took me a while to understand that the str2num mapping is being put in axis.units. I don't know if we want that paragraph to be in the public API or not, but something would sure help.
lib/matplotlib/axis.py
Outdated
def unit_data(self, unit_data): | ||
self._unit_data = unit_data | ||
self.set_units = unit_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this. set_units
is a method on the axis class. What is unit_data
?
lib/matplotlib/category.py
Outdated
(bytes, six.text_type, np.str_, np.bytes_))) | ||
|
||
|
||
def to_str(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be private?
lib/matplotlib/category.py
Outdated
value: string or iterable | ||
value or list of values to be converted | ||
unit: None | ||
units to use for an axis with string data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit
doesn't do anything, so the doc string shouldn't make it sound like it will...
# default_units->axis_info->convert | ||
if axis.unit_data is None: | ||
axis.unit_data = UnitData(data) | ||
if axis.units is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this needs some explanation, somewhere. If I use the jpl toy example, ax.yaxis.units
returns something like 'meters'. dates.py
returns the timezone (if set). Here, you load axis.units
up with the data map. I guess thats OK. But its a little mysterious. Some description of why would be appreciated. Is this really the only place we can carry that map around?
Second, the name UnitData
doesn't help me know whats going on. I'd consider changing this name to CategoricalUnitsMap
or something that makes it clear its categorical that is involved and that its a map. If I want to query an axis as to its units, axis.units
is a useful place to look, and UnitData
doesn't quite convey that (though it will usually be <matplotlib.category.UnitData object at 0x11eda2d68>
, so take as you will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that 'units' is a place to stash what ever the handler feels like stashing there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, although it could do with documenting, I think units
is a place for the converter to store any variables that affect how it does the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see thats how its being used. But the naive user (me as of a couple of months ago) would have a though time understanding that, and might think that this property of the axis
class, namedaxis.units
, might actually be the units of the axis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the initial implementation of categorical we missed this and added unit_data
(which is new being deprecated) to stash the mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the wrinkle to this is that it'd probably be useful to store the sort of sorting info the jpl folks do in their implemention of StrConvertor but that's probably a refactor away. And just another attribute on the unit object...
I took the liberty of pushing commits to address the latest round of comments. |
|
||
import unittest | ||
|
||
def _to_str(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of having two versions of the exact same function. Can the _to_str inside the StrFormatter of call be pulled out into a private class method of StrFormatter so that it can be called in the same say convert can be called?
lib/matplotlib/category.py
Outdated
|
||
|
||
class StrCategoryConverter(units.ConversionInterface): | ||
@staticmethod | ||
def convert(value, unit, axis): | ||
"""Uses axis.unit_data map to encode | ||
data as floats | ||
"""Uses axis.units to encode string data as floats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc string isn't correct in this implementation. It uses unit
to encode the string.
lib/matplotlib/category.py
Outdated
value : string or iterable | ||
value or list of values to be converted | ||
unit : :class:`.UnitData` | ||
units to use for an axis with string data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest "UnitData
contains map between category strings and floats.
lib/matplotlib/category.py
Outdated
""" | ||
Parameters | ||
---------- | ||
units: dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't right either, is it? Its a UnitData object as well, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, updating this now
627f50f
to
cfdbd9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one tiny change, but this looks 👍 overall to me
@@ -0,0 +1,10 @@ | |||
Deprecated `Axis.unt_data` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unt_data --> unit_data
fixed the typo @dstansby found. |
Took the liberty of pushing a bit more docs for the preamble. I think this is an improvement over the previous version, and basically think it can be merged. There are still going to be some confused folks, but at least the rules are clearer now. |
Won't merge w/o someone else checking the pre-amble change I made. |
Did not wait for CI, as the last commit is docs-only. |
🎉 thanks a lot @story645 for putting up with all our reviews! |
Alternative to #9774 and supersedes #9340 that preserves more of the existing behavior. Open discussion on whether that should be preserved, but closes out #9350, #9312, #9336 .
And it's supporting missing values/nans in scatter but not in plot (in plot it's treating them as a categorical). My guess is this is tied into #9713 (plot doesn't behave like the other functions).