Categorical: Unsorted, String only, fix overwrite bug #9783
Changes from all commits
4c9353f
34b8eb4
543d235
4d57690
22e3a66
c7d57f6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
Deprecated `Axis.unit_data` | ||
``````````````````````````` | ||
|
||
Use `Axis.units` (which has long existed) instead. | ||
|
||
Only accept string-like for Categorical input | ||
````````````````````````````````````````````` | ||
|
||
Do not accept mixed string / float / int input, only | ||
strings are valid categoricals. |
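
A minimal sketch of what "string-like only" means in practice. The class name `StringCategoryMap` is hypothetical and stands in for the mapping object; this is not Matplotlib's implementation, only an illustration of the behavior the note describes, written against the standard library.

```python
from collections import OrderedDict
import itertools


class StringCategoryMap:
    """Hypothetical stand-in for a string-to-integer category mapping."""

    def __init__(self, data=None):
        self._mapping = OrderedDict()
        self._counter = itertools.count()
        if data is not None:
            self.update(data)

    def update(self, data):
        # Treat a lone string as one category, not as a sequence of characters.
        if isinstance(data, (str, bytes)):
            data = [data]
        # OrderedDict.fromkeys drops duplicates while keeping first-seen order.
        for val in OrderedDict.fromkeys(data):
            if not isinstance(val, (str, bytes)):
                raise TypeError("{val!r} is not a string".format(val=val))
            if val not in self._mapping:
                self._mapping[val] = next(self._counter)


cats = StringCategoryMap(["d", "f", "a"])
cats.update(["a", "b"])  # "a" keeps its index, "b" gets the next one
print(dict(cats._mapping))

try:
    StringCategoryMap(["a", 1, 2.5])  # mixed string / int / float input
except TypeError as err:
    print("rejected:", err)
```

Categories are numbered in the order they are first seen rather than sorted, and any non-string value raises `TypeError` instead of being coerced with `str()`.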
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,123 +1,206 @@ | ||
# -*- coding: utf-8 OA-*-za | ||
# -*- coding: utf-8 -*- | ||
""" | ||
catch all for categorical functions | ||
Module that allows plotting of string "category" data. i.e. | ||
``plot(['d', 'f', 'a'],[1, 2, 3])`` will plot three points with x-axis | ||
values of 'd', 'f', 'a'. | ||
|
||
See :doc:`/gallery/lines_bars_and_markers/categorical_variables` for an | ||
example. | ||
|
||
The module uses Matplotlib's `matplotlib.units` mechanism to convert from | ||
strings to integers, provides a tick locator and formatter, and the | ||
:class:`.UnitData` that creates and stores the string-to-integer mapping. | ||
""" | ||
from __future__ import (absolute_import, division, print_function, | ||
unicode_literals) | ||
|
||
from collections import OrderedDict | ||
import itertools | ||
|
||
import six | ||
|
||
|
||
import numpy as np | ||
|
||
import matplotlib.units as units | ||
import matplotlib.ticker as ticker | ||
|
||
# np 1.6/1.7 support | ||
from distutils.version import LooseVersion | ||
import collections | ||
|
||
|
||
if LooseVersion(np.__version__) >= LooseVersion('1.8.0'): | ||
def shim_array(data): | ||
return np.array(data, dtype=np.unicode) | ||
else: | ||
def shim_array(data): | ||
if (isinstance(data, six.string_types) or | ||
not isinstance(data, collections.Iterable)): | ||
data = [data] | ||
try: | ||
data = [str(d) for d in data] | ||
except UnicodeEncodeError: | ||
# this yields gibberish but unicode text doesn't | ||
# render under numpy1.6 anyway | ||
data = [d.encode('utf-8', 'ignore').decode('utf-8') | ||
for d in data] | ||
return np.array(data, dtype=np.unicode) | ||
|
||
VALID_TYPES = tuple(set(six.string_types + | ||
(bytes, six.text_type, np.str_, np.bytes_))) | ||
|
||
|
||
class StrCategoryConverter(units.ConversionInterface): | ||
@staticmethod | ||
def convert(value, unit, axis): | ||
"""Uses axis.unit_data map to encode | ||
data as floats | ||
"""Converts strings in value to floats using | ||
mapping information stored in the unit object | ||
|
||
Parameters | ||
---------- | ||
value : string or iterable | ||
value or list of values to be converted | ||
unit : :class:`.UnitData` | ||
object string unit information for value | ||
axis : :class:`~matplotlib.Axis.axis` | ||
axis on which the converted value is plotted | ||
|
||
Returns | ||
------- | ||
mapped_value : float or ndarray[float] | ||
|
||
.. note:: axis is not used in this function | ||
""" | ||
value = np.atleast_1d(value) | ||
# try and update from here.... | ||
if hasattr(axis.unit_data, 'update'): | ||
for val in value: | ||
if isinstance(val, six.string_types): | ||
axis.unit_data.update(val) | ||
vmap = dict(zip(axis.unit_data.seq, axis.unit_data.locs)) | ||
# dtype = object preserves numerical pass throughs | ||
values = np.atleast_1d(np.array(value, dtype=object)) | ||
|
||
if isinstance(value, six.string_types): | ||
return vmap[value] | ||
# pass through sequence of non binary numbers | ||
if all((units.ConversionInterface.is_numlike(v) and | ||
not isinstance(v, VALID_TYPES)) for v in values): | ||
return np.asarray(values, dtype=float) | ||
|
||
vals = shim_array(value) | ||
# force an update so it also does type checking | ||
unit.update(values) | ||
|
||
for lab, loc in vmap.items(): | ||
vals[vals == lab] = loc | ||
str2idx = np.vectorize(unit._mapping.__getitem__, | ||
otypes=[float]) | ||
|
||
return vals.astype('float') | ||
mapped_value = str2idx(values) | ||
return mapped_value | ||
|
||
@staticmethod | ||
def axisinfo(unit, axis): | ||
majloc = StrCategoryLocator(axis.unit_data.locs) | ||
majfmt = StrCategoryFormatter(axis.unit_data.seq) | ||
"""Sets the default axis ticks and labels | ||
|
||
Parameters | ||
--------- | ||
unit : :class:`.UnitData` | ||
object string unit information for value | ||
axis : :class:`~matplotlib.Axis.axis` | ||
axis for which information is being set | ||
|
||
Returns | ||
------- | ||
:class:`~matplotlib.units.AxisInfo` | ||
Information to support default tick labeling | ||
|
||
.. note:: axis is not used | ||
""" | ||
# locator and formatter take mapping dict because | ||
# args need to be pass by reference for updates | ||
majloc = StrCategoryLocator(unit._mapping) | ||
majfmt = StrCategoryFormatter(unit._mapping) | ||
return units.AxisInfo(majloc=majloc, majfmt=majfmt) | ||
|
||
@staticmethod | ||
def default_units(data, axis): | ||
# the conversion call stack is: | ||
"""Sets and updates the :class:`~matplotlib.Axis.axis` units | ||
|
||
Parameters | ||
---------- | ||
data : string or iterable of strings | ||
axis : :class:`~matplotlib.Axis.axis` | ||
axis on which the data is plotted | ||
|
||
Returns | ||
------- | ||
:class:`.UnitData` | ||
object storing string to integer mapping | ||
""" | ||
# the conversion call stack is supposed to be | ||
# default_units->axis_info->convert | ||
if axis.unit_data is None: | ||
axis.unit_data = UnitData(data) | ||
if axis.units is None: | ||
OK, this needs some explanation, somewhere. If I use the jpl toy example, Second, the name
My understanding is that 'units' is a place to stash whatever the handler feels like stashing there.
Yes, although it could do with documenting, I think.
Yes, I see that's how it's being used. But the naive user (me as of a couple of months ago) would have a tough time understanding that, and might think that this property of the
With the initial implementation of categorical we missed this and added
The wrinkle to this is that it'd probably be useful to store the sort of sorting info the jpl folks do in their implementation of StrConvertor, but that's probably a refactor away. And just another attribute on the unit object... |
||
axis.set_units(UnitData(data)) | ||
else: | ||
axis.unit_data.update(data) | ||
return None | ||
axis.units.update(data) | ||
return axis.units | ||
|
||
|
||
class StrCategoryLocator(ticker.Locator): | ||
"""tick at every integer mapping of the string data""" | ||
def __init__(self, units_mapping): | ||
""" | ||
Parameters | ||
----------- | ||
units: dict | ||
string:integer mapping | ||
""" | ||
self._units = units_mapping | ||
|
||
def __call__(self): | ||
return list(self._units.values()) | ||
|
||
class StrCategoryLocator(ticker.FixedLocator): | ||
def __init__(self, locs): | ||
self.locs = locs | ||
self.nbins = None | ||
def tick_values(self, vmin, vmax): | ||
return self() | ||
|
||
|
||
class StrCategoryFormatter(ticker.FixedFormatter): | ||
def __init__(self, seq): | ||
self.seq = seq | ||
self.offset_string = '' | ||
class StrCategoryFormatter(ticker.Formatter): | ||
"""String representation of the data at every tick""" | ||
def __init__(self, units_mapping): | ||
""" | ||
Parameters | ||
---------- | ||
units: dict | ||
string:integer mapping | ||
""" | ||
self._units = units_mapping | ||
|
||
def __call__(self, x, pos=None): | ||
if pos is None: | ||
return "" | ||
r_mapping = {v: StrCategoryFormatter._text(k) | ||
for k, v in self._units.items()} | ||
return r_mapping.get(int(np.round(x)), '') | ||
|
||
class UnitData(object): | ||
# debatable makes sense to special code missing values | ||
spdict = {'nan': -1.0, 'inf': -2.0, '-inf': -3.0} | ||
@staticmethod | ||
def _text(value): | ||
"""Converts text values into `utf-8` or `ascii` strings | ||
""" | ||
if LooseVersion(np.__version__) < LooseVersion('1.7.0'): | ||
if (isinstance(value, (six.text_type, np.unicode))): | ||
value = value.encode('utf-8', 'ignore').decode('utf-8') | ||
if isinstance(value, (np.bytes_, six.binary_type)): | ||
value = value.decode(encoding='utf-8') | ||
elif not isinstance(value, (np.str_, six.string_types)): | ||
value = str(value) | ||
return value | ||
|
||
def __init__(self, data): | ||
"""Create mapping between unique categorical values | ||
and numerical identifier | ||
|
||
Parameters | ||
class UnitData(object): | ||
def __init__(self, data=None): | ||
"""Create mapping between unique categorical values | ||
and integer identifiers | ||
---------- | ||
data: iterable | ||
sequence of values | ||
sequence of string values | ||
""" | ||
self.seq, self.locs = [], [] | ||
self._set_seq_locs(data, 0) | ||
|
||
def update(self, new_data): | ||
# so as not to conflict with spdict | ||
value = max(max(self.locs) + 1, 0) | ||
self._set_seq_locs(new_data, value) | ||
|
||
def _set_seq_locs(self, data, value): | ||
strdata = shim_array(data) | ||
new_s = [d for d in np.unique(strdata) if d not in self.seq] | ||
for ns in new_s: | ||
self.seq.append(ns) | ||
if ns in UnitData.spdict: | ||
self.locs.append(UnitData.spdict[ns]) | ||
else: | ||
self.locs.append(value) | ||
value += 1 | ||
self._mapping = OrderedDict() | ||
self._counter = itertools.count(start=0) | ||
if data is not None: | ||
self.update(data) | ||
|
||
def update(self, data): | ||
"""Maps new values to integer identifiers. | ||
|
||
Parameters | ||
---------- | ||
data: iterable | ||
sequence of string values | ||
|
||
Raises | ||
------ | ||
TypeError | ||
If the value in data is not a string, unicode, bytes type | ||
""" | ||
data = np.atleast_1d(np.array(data, dtype=object)) | ||
|
||
for val in OrderedDict.fromkeys(data): | ||
if not isinstance(val, VALID_TYPES): | ||
raise TypeError("{val!r} is not a string".format(val=val)) | ||
if val not in self._mapping: | ||
self._mapping[val] = next(self._counter) | ||
|
||
|
||
# Connects the converter to matplotlib | ||
|
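
The comment in `axisinfo` about passing the mapping dict "by reference" can be illustrated with a small sketch. `SketchLocator` and `SketchFormatter` are hypothetical stand-ins, not the ticker subclasses in the diff, and the `pos is None` check from the diff is left out for brevity.

```python
from collections import OrderedDict


class SketchLocator:
    """Hypothetical locator: one tick per mapped integer."""

    def __init__(self, units_mapping):
        # Keep the reference instead of copying, so later updates are visible.
        self._units = units_mapping

    def __call__(self):
        return list(self._units.values())


class SketchFormatter:
    """Hypothetical formatter: label for the integer nearest to x."""

    def __init__(self, units_mapping):
        self._units = units_mapping

    def __call__(self, x, pos=None):
        # Invert string -> int into int -> string on every call.
        reverse = {loc: label for label, loc in self._units.items()}
        return reverse.get(int(round(x)), "")


mapping = OrderedDict([("d", 0), ("f", 1)])
locator = SketchLocator(mapping)
formatter = SketchFormatter(mapping)

# A later plot call adds a category; nothing is re-created.
mapping["a"] = 2

print(locator())
print(formatter(1.9))
print(repr(formatter(7)))
```

Because both objects hold the same dict, a category added after they are constructed shows up as a new tick and label without rebuilding the axis info. Positions with no category get an empty label.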
I am a bit worried about requiring the newest version of pytest? Are there new features we need or just bug fixes and this was the easiest way to make sure we got them?
I think this is what I needed to get `pytest.param` to work, which I used to mark the individual failing tests. I can do a rewrite to get around that, it just makes the tests even clunkier.
This is where it's from: https://github.com/matplotlib/matplotlib/pull/9783/files#diff-f36a7b45d6a24734ba38d2da8f52f138R257
Let's see if we get pushback from the packagers.
We mark parametrized tests as xfail without using `pytest.param` all over the place. For example, `needs_usetex` used in this parameter is really an `xfail`. So I'm not so sure you need to bump requirements here.
You have to go back to an old version of the docs, e.g. 3.0.0.
Thanks! Then will do that and undo the travis/appveyor changes.
Ugh, doesn't work/fails spectacularly when I try. Technically params was introduced in 3.2, but I dunno how to say version >=3.2 and !=3.3
Odd that `xfail` doesn't seem to catch did-not-raise errors, but Fedora has 3.2, so if that's all you need, we'd probably be fine with that. Debian other-than-stable is probably okay too.
Solution for the how-to-specify problem: `pytest!=3.3.0, >=3.20`, and travis didn't break this time. 🤞 for appveyor.
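
For reference, a rough sketch of how a "minimum version plus one exclusion" requirement is evaluated. The helpers `release_key` and `allowed` are hypothetical and only handle plain dotted release numbers; real tooling also handles pre-releases, wildcards and so on. The bounds used are the `>=3.2` and `!=3.3` mentioned earlier in the thread.

```python
def release_key(version, width=3):
    """Turn '3.2' into (3, 2, 0) so versions compare component by component."""
    parts = tuple(int(p) for p in version.split("."))
    # Pad with zeros so that '3.3' and '3.3.0' compare equal.
    return parts + (0,) * (width - len(parts))


def allowed(candidate, minimum="3.2", excluded="3.3.0"):
    """True if candidate >= minimum and candidate != excluded."""
    key = release_key(candidate)
    return key >= release_key(minimum) and key != release_key(excluded)


for version in ["3.1.3", "3.2.5", "3.3.0", "3.3.1"]:
    print(version, allowed(version))

# Components are compared as integers, so '3.20' means minor release 20.
print(release_key("3.20") > release_key("3.3.0"))
```

Under this component-wise comparison, `3.20` is read as minor release 20, so it is not the same lower bound as `3.2`.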