Categorical: Unsorted, String only, fix overwrite bug #9783

Merged
merged 6 commits into from
Feb 11, 2018
2 changes: 1 addition & 1 deletion .appveyor.yml
@@ -66,7 +66,7 @@ install:
- activate test-environment
- echo %PYTHON_VERSION% %TARGET_ARCH%
# pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124
- pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist
Member

I am a bit worried about requiring the newest version of pytest? Are there new features we need or just bug fixes and this was the easiest way to make sure we got them?

Member Author (@story645, Feb 8, 2018)

I think this is what I needed to get pytest.param to work, which I used to mark the individual failing tests. I can do a rewrite to get around that, it just makes the tests even clunkier.

Member

Let's see if we get pushback from the packagers.

Member

We mark parametrized tests as xfail without using pytest.param all over the place. For example, needs_usetex used in this parameter is really an xfail.

So I'm not so sure you need to bump requirements here.

Member

You have to go back to an old version of the docs, e.g. 3.0.0.

Member Author

Thanks! Then will do that and undo the travis/appveyor changes.

Member Author (@story645, Feb 8, 2018)

Ugh, doesn't work/fails spectacularly when I try. Technically params was introduced in 3.2, but I dunno how to say version >=3.2 and !=3.3

Member

Odd that xfail doesn't seem to catch did-not-raise errors, but Fedora has 3.2, so if that's all you need, we'd probably be fine with that. Debian other-than-stable is probably okay too.

Member Author (@story645, Feb 8, 2018)

Solution for the how-to-specify problem: pytest!=3.3.0,>=3.2.0, and Travis didn't break this time. 🤞 for AppVeyor.
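
The comma in that requirement string means both clauses must hold at once. A minimal pure-Python sketch of the logic, assuming plain three-part release numbers (the helper names `parse` and `satisfies` are hypothetical, and real installers apply the full PEP 440 rules rather than this simplified tuple comparison):

```python
def parse(version):
    """Turn a simple dotted release such as '3.2.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version):
    """Check the constraint 'pytest!=3.3.0,>=3.2.0' for a three-part release."""
    v = parse(version)
    # A comma between clauses means AND: every clause must be satisfied.
    return v >= (3, 2, 0) and v != (3, 3, 0)


for candidate in ["3.1.3", "3.2.0", "3.3.0", "3.3.1", "3.4.0"]:
    print(candidate, satisfies(candidate))
```

Only 3.1.3 (too old) and 3.3.0 (explicitly excluded) are rejected in this sketch.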

- pip install -q "pytest!=3.3.0,>=3.2.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist

# Apply patch to `subprocess` on Python versions > 2 and < 3.6.3
# https://github.com/matplotlib/matplotlib/issues/9176
2 changes: 1 addition & 1 deletion .travis.yml
@@ -52,7 +52,7 @@ env:
- NUMPY=numpy
- PANDAS=
- PYPARSING=pyparsing
- PYTEST=pytest!=3.3.0
- PYTEST='pytest!=3.3.0,>=3.2.0'
- PYTEST_COV=pytest-cov
- PYTEST_PEP8=
- SPHINX=sphinx
10 changes: 10 additions & 0 deletions doc/api/next_api_changes/2018-02-10-HA.rst
@@ -0,0 +1,10 @@
Deprecated `Axis.unit_data`
```````````````````````````

Use `Axis.units` (which has long existed) instead.

Only accept string-like for Categorical input
`````````````````````````````````````````````

Do not accept mixed string / float / int input, only
strings are valid categoricals.
11 changes: 5 additions & 6 deletions lib/matplotlib/axis.py
@@ -719,7 +719,7 @@ def __init__(self, axes, pickradius=15):
self.label = self._get_label()
self.labelpad = rcParams['axes.labelpad']
self.offsetText = self._get_offset_text()
self.unit_data = None

self.pickradius = pickradius

# Initialize here for testing; later add API
@@ -777,15 +777,14 @@ def limit_range_for_scale(self, vmin, vmax):
return self._scale.limit_range_for_scale(vmin, vmax, self.get_minpos())

@property
@cbook.deprecated("2.2.0")
def unit_data(self):
"""Holds data that a ConversionInterface subclass uses
to convert between labels and indexes
"""
return self._unit_data
return self.units

@unit_data.setter
@cbook.deprecated("2.2.0")
def unit_data(self, unit_data):
self._unit_data = unit_data
self.set_units(unit_data)
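
The getter and setter above turn `unit_data` into a thin alias for `units`. A minimal sketch of that pattern, using the standard `warnings` module in place of Matplotlib's `cbook.deprecated` decorator (the `Axis` class here is a hypothetical stand-in, not the real one):

```python
import warnings


class Axis:
    """Hypothetical stand-in for an axis; not the real matplotlib class."""

    def __init__(self):
        self.units = None

    def set_units(self, units):
        self.units = units

    @property
    def unit_data(self):
        # Old name: warn, then read through to the canonical attribute.
        warnings.warn("unit_data is deprecated; use units instead",
                      DeprecationWarning, stacklevel=2)
        return self.units

    @unit_data.setter
    def unit_data(self, value):
        # Old name: warn, then forward the write to the canonical setter.
        warnings.warn("unit_data is deprecated; use set_units instead",
                      DeprecationWarning, stacklevel=2)
        self.set_units(value)


axis = Axis()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    axis.unit_data = {"a": 0}            # setter warns and forwards
    same = axis.unit_data is axis.units  # getter warns and reads through
print(same, len(caught))
```

Because both names resolve to the same object, old callers keep working while seeing a warning on every access.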

def get_children(self):
children = [self.label, self.offsetText]
237 changes: 160 additions & 77 deletions lib/matplotlib/category.py
@@ -1,123 +1,206 @@
# -*- coding: utf-8 OA-*-za
# -*- coding: utf-8 -*-
"""
catch all for categorical functions
Module that allows plotting of string "category" data. i.e.
``plot(['d', 'f', 'a'],[1, 2, 3])`` will plot three points with x-axis
values of 'd', 'f', 'a'.

See :doc:`/gallery/lines_bars_and_markers/categorical_variables` for an
example.

The module uses Matplotlib's `matplotlib.units` mechanism to convert from
strings to integers, provides a tick locator and formatter, and the
class:`.UnitData` that creates and stores the string-to-integer mapping.
"""
from __future__ import (absolute_import, division, print_function,
unicode_literals)

from collections import OrderedDict
import itertools

import six


import numpy as np

import matplotlib.units as units
import matplotlib.ticker as ticker

# np 1.6/1.7 support
from distutils.version import LooseVersion
import collections


if LooseVersion(np.__version__) >= LooseVersion('1.8.0'):
def shim_array(data):
return np.array(data, dtype=np.unicode)
else:
def shim_array(data):
if (isinstance(data, six.string_types) or
not isinstance(data, collections.Iterable)):
data = [data]
try:
data = [str(d) for d in data]
except UnicodeEncodeError:
# this yields gibberish but unicode text doesn't
# render under numpy1.6 anyway
data = [d.encode('utf-8', 'ignore').decode('utf-8')
for d in data]
return np.array(data, dtype=np.unicode)

VALID_TYPES = tuple(set(six.string_types +
(bytes, six.text_type, np.str_, np.bytes_)))


class StrCategoryConverter(units.ConversionInterface):
@staticmethod
def convert(value, unit, axis):
"""Uses axis.unit_data map to encode
data as floats
"""Converts strings in value to floats using
mapping information store in the unit object

Parameters
----------
value : string or iterable
value or list of values to be converted
unit : :class:`.UnitData`
object string unit information for value
axis : :class:`~matplotlib.Axis.axis`
axis on which the converted value is plotted

Returns
-------
mapped_ value : float or ndarray[float]

.. note:: axis is not used in this function
"""
value = np.atleast_1d(value)
# try and update from here....
if hasattr(axis.unit_data, 'update'):
for val in value:
if isinstance(val, six.string_types):
axis.unit_data.update(val)
vmap = dict(zip(axis.unit_data.seq, axis.unit_data.locs))
# dtype = object preserves numerical pass throughs
values = np.atleast_1d(np.array(value, dtype=object))

if isinstance(value, six.string_types):
return vmap[value]
# pass through sequence of non binary numbers
if all((units.ConversionInterface.is_numlike(v) and
not isinstance(v, VALID_TYPES)) for v in values):
return np.asarray(values, dtype=float)

vals = shim_array(value)
# force an update so it also does type checking
unit.update(values)

for lab, loc in vmap.items():
vals[vals == lab] = loc
str2idx = np.vectorize(unit._mapping.__getitem__,
otypes=[float])

return vals.astype('float')
mapped_value = str2idx(values)
return mapped_value
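
The new `convert` body has two paths: purely numeric input passes through as floats, and anything else must consist of strings that are looked up in the mapping. A simplified pure-Python sketch of that control flow, without NumPy (the function name `convert_values` and the plain `dict` mapping are assumptions for illustration):

```python
from numbers import Number


def convert_values(values, mapping):
    """Map strings to floats via `mapping`; pass plain numbers through."""
    # Path 1: already-numeric input is returned unchanged, as floats.
    if all(isinstance(v, Number) for v in values):
        return [float(v) for v in values]
    # Path 2: every value must be a string; unseen labels get the next integer.
    for v in values:
        if not isinstance(v, str):
            raise TypeError("{val!r} is not a string".format(val=v))
        mapping.setdefault(v, len(mapping))
    return [float(mapping[v]) for v in values]


mapping = {}
print(convert_values([1, 2.5], mapping))
print(convert_values(["d", "f", "d"], mapping))
```

Mixed input such as `["a", 1]` raises `TypeError` in this sketch, matching the string-only rule described in the API change note.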

@staticmethod
def axisinfo(unit, axis):
majloc = StrCategoryLocator(axis.unit_data.locs)
majfmt = StrCategoryFormatter(axis.unit_data.seq)
"""Sets the default axis ticks and labels

Parameters
---------
unit : :class:`.UnitData`
object string unit information for value
axis : :class:`~matplotlib.Axis.axis`
axis for which information is being set

Returns
-------
:class:~matplotlib.units.AxisInfo~
Information to support default tick labeling

.. note: axis is not used
"""
# locator and formatter take mapping dict because
# args need to be pass by reference for updates
majloc = StrCategoryLocator(unit._mapping)
majfmt = StrCategoryFormatter(unit._mapping)
return units.AxisInfo(majloc=majloc, majfmt=majfmt)

@staticmethod
def default_units(data, axis):
# the conversion call stack is:
""" Sets and updates the :class:`~matplotlib.Axis.axis~ units

Parameters
----------
data : string or iterable of strings
axis : :class:`~matplotlib.Axis.axis`
axis on which the data is plotted

Returns
-------
class:~.UnitData~
object storing string to integer mapping
"""
# the conversion call stack is supposed to be
# default_units->axis_info->convert
if axis.unit_data is None:
axis.unit_data = UnitData(data)
if axis.units is None:
Member

OK, this needs some explanation, somewhere. If I use the jpl toy example, ax.yaxis.units returns something like 'meters'. dates.py returns the timezone (if set). Here, you load axis.units up with the data map. I guess that's OK. But it's a little mysterious. Some description of why would be appreciated. Is this really the only place we can carry that map around?

Second, the name UnitData doesn't help me know what's going on. I'd consider changing this name to CategoricalUnitsMap or something that makes it clear it's categorical that is involved and that it's a map. If I want to query an axis as to its units, axis.units is a useful place to look, and UnitData doesn't quite convey that (though it will usually be <matplotlib.category.UnitData object at 0x11eda2d68>, so take as you will).

Member

My understanding is that 'units' is a place to stash whatever the handler feels like stashing there.

Member

Yes, although it could do with documenting, I think units is a place for the converter to store any variables that affect how it does the conversion.

Member

Yes, I see that's how it's being used. But the naive user (me as of a couple of months ago) would have a tough time understanding that, and might think that this property of the axis class, named axis.units, might actually be the units of the axis.

Member

With the initial implementation of categorical we missed this and added unit_data (which is now being deprecated) to stash the mapping.

Member Author (@story645, Feb 11, 2018)

The wrinkle to this is that it'd probably be useful to store the sort of sorting info the jpl folks do in their implementation of StrConverter, but that's probably a refactor away. And just another attribute on the unit object...

axis.set_units(UnitData(data))
else:
axis.unit_data.update(data)
return None
axis.units.update(data)
return axis.units
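
The branch above creates the mapping only when the axis has none yet, and otherwise extends the one already stored on `axis.units`. A minimal sketch of that create-or-update flow, with a plain `dict` standing in for `UnitData` and a hypothetical `FakeAxis` class standing in for the real axis:

```python
class FakeAxis:
    """Hypothetical minimal axis exposing only what this sketch needs."""

    def __init__(self):
        self.units = None

    def set_units(self, units):
        self.units = units


def default_units(data, axis):
    # First call on this axis: create an empty mapping and store it.
    if axis.units is None:
        axis.set_units({})
    # Every call: add unseen labels; integers already assigned never change.
    for label in data:
        axis.units.setdefault(label, len(axis.units))
    return axis.units


axis = FakeAxis()
default_units(["a", "b"], axis)
default_units(["c", "a"], axis)
print(axis.units)  # {'a': 0, 'b': 1, 'c': 2}
```

Plotting a second data set on the same axis therefore keeps the positions of labels that were already drawn.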


class StrCategoryLocator(ticker.Locator):
"""tick at every integer mapping of the string data"""
def __init__(self, units_mapping):
"""
Parameters
-----------
units: dict
string:integer mapping
"""
self._units = units_mapping

def __call__(self):
return list(self._units.values())

class StrCategoryLocator(ticker.FixedLocator):
def __init__(self, locs):
self.locs = locs
self.nbins = None
def tick_values(self, vmin, vmax):
return self()


class StrCategoryFormatter(ticker.FixedFormatter):
def __init__(self, seq):
self.seq = seq
self.offset_string = ''
class StrCategoryFormatter(ticker.Formatter):
"""String representation of the data at every tick"""
def __init__(self, units_mapping):
"""
Parameters
----------
units: dict
string:integer mapping
"""
self._units = units_mapping

def __call__(self, x, pos=None):
if pos is None:
return ""
r_mapping = {v: StrCategoryFormatter._text(k)
for k, v in self._units.items()}
return r_mapping.get(int(np.round(x)), '')
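
The formatter inverts the string-to-integer mapping and looks up the tick position after rounding it to the nearest integer. A small sketch of that lookup, using the built-in `round` instead of `np.round` (the function name `format_tick` is hypothetical):

```python
def format_tick(mapping, x):
    """Return the label whose integer is nearest to x, or '' if there is none."""
    # Invert label -> index into index -> label.
    reverse = {index: label for label, index in mapping.items()}
    # Tick positions arrive as floats, so round before the lookup.
    return reverse.get(int(round(x)), "")


mapping = {"d": 0, "f": 1, "a": 2}
print(format_tick(mapping, 1.2))
print(repr(format_tick(mapping, 7.0)))
```

Positions that do not correspond to any category fall back to an empty label instead of raising.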

class UnitData(object):
# debatable makes sense to special code missing values
spdict = {'nan': -1.0, 'inf': -2.0, '-inf': -3.0}
@staticmethod
def _text(value):
"""Converts text values into `utf-8` or `ascii` strings
"""
if LooseVersion(np.__version__) < LooseVersion('1.7.0'):
if (isinstance(value, (six.text_type, np.unicode))):
value = value.encode('utf-8', 'ignore').decode('utf-8')
if isinstance(value, (np.bytes_, six.binary_type)):
value = value.decode(encoding='utf-8')
elif not isinstance(value, (np.str_, six.string_types)):
value = str(value)
return value

def __init__(self, data):
"""Create mapping between unique categorical values
and numerical identifier

Parameters
class UnitData(object):
def __init__(self, data=None):
"""Create mapping between unique categorical values
and integer identifiers
----------
data: iterable
sequence of values
sequence of string values
"""
self.seq, self.locs = [], []
self._set_seq_locs(data, 0)

def update(self, new_data):
# so as not to conflict with spdict
value = max(max(self.locs) + 1, 0)
self._set_seq_locs(new_data, value)

def _set_seq_locs(self, data, value):
strdata = shim_array(data)
new_s = [d for d in np.unique(strdata) if d not in self.seq]
for ns in new_s:
self.seq.append(ns)
if ns in UnitData.spdict:
self.locs.append(UnitData.spdict[ns])
else:
self.locs.append(value)
value += 1
self._mapping = OrderedDict()
self._counter = itertools.count(start=0)
if data is not None:
self.update(data)

def update(self, data):
"""Maps new values to integer identifiers.

Paramters
---------
data: iterable
sequence of string values

Raises
------
TypeError
If the value in data is not a string, unicode, bytes type
"""
data = np.atleast_1d(np.array(data, dtype=object))

for val in OrderedDict.fromkeys(data):
if not isinstance(val, VALID_TYPES):
raise TypeError("{val!r} is not a string".format(val=val))
if val not in self._mapping:
self._mapping[val] = next(self._counter)
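
The rewritten `UnitData` assigns integers in first-seen order and rejects anything that is not a string. A simplified stand-alone analogue without NumPy (the class name `CategoryMap` is hypothetical, and the valid types are narrowed to `str` and `bytes` for brevity):

```python
from collections import OrderedDict
import itertools


class CategoryMap:
    """Hypothetical simplified analogue of the UnitData class in the diff."""

    def __init__(self, data=None):
        self._mapping = OrderedDict()
        self._counter = itertools.count()
        if data is not None:
            self.update(data)

    def update(self, data):
        # Treat a lone string as one category, not as a sequence of characters.
        if isinstance(data, (str, bytes)):
            data = [data]
        # OrderedDict.fromkeys drops duplicates while keeping first-seen order.
        for val in OrderedDict.fromkeys(data):
            if not isinstance(val, (str, bytes)):
                raise TypeError("{val!r} is not a string".format(val=val))
            if val not in self._mapping:
                self._mapping[val] = next(self._counter)


cats = CategoryMap(["d", "f", "a", "d"])
cats.update(["a", "z"])
print(dict(cats._mapping))  # {'d': 0, 'f': 1, 'a': 2, 'z': 3}

try:
    cats.update([1.5])
except TypeError as err:
    print(err)  # 1.5 is not a string
```

Unlike the removed `np.unique`-based code, nothing here sorts the categories, which corresponds to the "Unsorted" part of the PR title.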


# Connects the convertor to matplotlib