ENH: Added FuncNorm #7631

Closed · wants to merge 4 commits

Conversation

@alvarosg (Contributor) commented Dec 16, 2016:

This PR is part of a larger PR originally proposed to include FuncNorm and PiecewiseNorm. It was decided to split it into several PRs for easier review, starting from the simpler classes.

In this case the functionality added is FuncNorm, a normalization class (inheriting from Normalize) that allows using any arbitrary function as a normalization, by specifying a callable (and a callable for the inverse function), or a string compatible with the brand-new _StringFuncParser.

Examples of usage for log normalization:

    norm_log = colors.FuncNorm(f='log10', vmin=0.01)

the same can be achieved with

    norm_log = colors.FuncNorm(f=np.log10,
                               finv=lambda x: 10.**(x), vmin=0.01)

For root normalization:

    norm_sqrt = colors.FuncNorm(f='sqrt', vmin=0.0)

the same can be achieved with

    norm_sqrt = colors.FuncNorm(f='root{2}', vmin=0.)

or with

    norm_sqrt = colors.FuncNorm(f=lambda x: x**0.5,
                                finv=lambda x: x**2, vmin=0.0)
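
For context, a minimal end-to-end sketch assuming the API proposed in this PR (FuncNorm and its ticks method exist only on this branch; pcolormesh and colorbar are standard Matplotlib):

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.colors as colors

    # Strictly positive data, suitable for a log normalization.
    X, Y = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
    data = 0.01 + 2 * X * Y

    norm_log = colors.FuncNorm(f='log10', vmin=0.01)  # API proposed in this PR

    fig, ax = plt.subplots()
    cax = ax.pcolormesh(X, Y, data, norm=norm_log)
    # ticks() is the new method discussed below; 5 is the requested tick count.
    fig.colorbar(cax, format='%.3g', ticks=cax.norm.ticks(5), ax=ax)
    plt.show()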

Tests have been added, as well as an example producing this output:

[figure: FuncNorm example, showing each normalization curve next to the resulting colormapped image]

Possible caveat:

Most of the behaviour provided by this class does not change the existing interfaces for normalizations, as most public methods of the new class are just overridden methods from Normalize.

The only exception to this is the ticks method, which returns an educated guess on where to add tick values, and which would be a new public method in the class.

On the other hand, I think it works quite well (even for very non-linear normalizations), and it is important to give the user a good guess at where to put the ticks, so they are not required to manually enter values (or come up with their own automated algorithm). This way the only thing the user needs to specify is a number of ticks, which may also easily be given a default value. We should think about whether we want to include this or not, or maybe do it in a different way.

The way it is done now is by an explicit call to norm.ticks():

    fig.colorbar(cax, format='%.3g', ticks=cax.norm.ticks(5), ax=ax2)

Update: a FuncLocator class has now been implemented, so ticks are set automatically at the suggested positions without explicit user input, in exactly the same way it was done for other normalizations like LogNorm.

@alvarosg (Contributor, Author):

@efiring @story645 @QuLogic
In case you are also interested in helping review this one :)

(norm_log, 'Log normalization'),
(norm_sqrt, 'Root normalization')]

for i, (norm, title) in enumerate(normalizations):
Member:

No need for i: for ax_row, (norm, title) in zip(axes, normalizations):

Contributor (Author):

great idea :)

plt.show()


def get_data(_cache=[]):
Member:

I don't understand why the data needs to be produced in the loop if it's just going to be cached. Seems a bit over-engineered for an example.

Contributor (Author):

Yes, you are right. The reason it is like this is because originally each of the plots was done independently inside a function, but I changed it to a loop to comply with feedback from @story645, and I forgot to change that. Thanks :)
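
For readers unfamiliar with the idiom in the excerpt above, a minimal sketch of caching via a mutable default argument (the actual example's data generation differs):

    import numpy as np

    def get_data(_cache=[]):
        # The default list is created once and persists across calls,
        # so it can hold a computed-once result.
        if not _cache:
            _cache.append(np.random.rand(100, 100))
        return _cache[0]

    a = get_data()
    b = get_data()
    assert a is b  # the second call returns the cached array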

"""
Specify the function to be used, and its inverse, as well as other
parameters to be passed to `Normalize`. The normalization will be
calculated as (f(x)-f(vmin))/(f(max)-f(vmin)).
Member:

max or vmax?

values in the [0,1] range.
"""

def __init__(self, f, finv=None, **normalize_kw):
Member:

This should be **kwargs.

Contributor (Author):

I understand that kwargs is what is used in the general case; however, I used normalize_kw because all of these parameters are to be passed to the parent class Normalize. This is the same naming convention used for subplots.

Member:

That's not the same; those are dictionaries passed as individual arguments. In this case, it's not an argument, it's the placeholder that accepts all other non-explicit keyword arguments.

@alvarosg (Contributor, Author) Dec 16, 2016:

You are completely right. In that case I am thinking that it may just be better to take vmin, vmax, and clip directly, and pass them explicitly to the parent class. Any downside to doing it like that?

Member:

We should probably just follow the example of the other classes; LogNorm, BoundaryNorm and NoNorm only accept clip and the rest accept all three explicitly, so being explicit seems to be the best choice.

Contributor (Author):

Actually LogNorm does not have an initialization function, so it implicitly takes all three of them as well. So yes, I am convinced that it may be better to include them explicitly. It may be worth in that case putting the documentation for vmin, vmax, and clip in common variables so it can be reused across different classes, similarly to what they do here.
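
A sketch of what the explicit signature the thread converges on might look like (assuming the PR's internals; attribute names here are illustrative):

    from matplotlib.colors import Normalize

    class FuncNorm(Normalize):
        def __init__(self, f, finv=None, vmin=None, vmax=None, clip=False):
            # Forward the three Normalize parameters explicitly
            # instead of collecting them in **normalize_kw.
            super(FuncNorm, self).__init__(vmin=vmin, vmax=vmax, clip=clip)
            self._f = f        # forward function (callable or parsed string)
            self._finv = finv  # inverse: finv(f(x)) == x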

Inverse function of `f` that satisfies finv(f(x))==x. It is
optional in cases where `f` is provided as a string.
normalize_kw : dict, optional
Dict with keywords (`vmin`,`vmax`,`clip`) passed
Member:

It may be a dict in this function, but it's just all-other-keyword-args to any caller.

Member:

Agree with @QuLogic that these should be individually documented.

resultnorm[mask] = (self._f(result[mask]) - self._f(vmin)) / \
(self._f(vmax) - self._f(vmin))

return np.ma.array(resultnorm)
Member:

Why is it a MaskedArray? Is that just what other Norms do? It doesn't seem like anything is actually masked.

Contributor (Author):

Precisely; the parent class returns masked arrays, even though it does not ever really set the mask to anything. It would make sense to use them for values outside the range; the problem is that in that case there would not be a way to say whether they are above the maximum value or below the minimum, and the plotting methods need this to use the under and over colours.

if clip:
result = np.clip(result, vmin, vmax)
resultnorm = (self._f(result) - self._f(vmin)) / \
(self._f(vmax) - self._f(vmin))
Member:

Would it be useful to cache any of these?

Contributor (Author):

I have been considering this, but the problem is that the cache would depend on vmin and vmax, and checking whether the cache is up to date, along with having to include new variables for the cache, would make the code much uglier.

I guess it would make sense in cases where evaluating f is expensive, but even in those cases, we would still have to evaluate f(result), which typically consists of many values. Also, in general, the functions typically used for normalization should not be very expensive to evaluate... (although we should never underestimate the user, hehehe)
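
To illustrate the trade-off being weighed here, a hedged sketch of what caching the endpoint evaluations might look like; the names are hypothetical, and the invalidation bookkeeping is the "ugliness" referred to above:

    class _EndpointCache(object):
        """Cache f(vmin) and f(vmax), recomputing only when limits change."""

        def __init__(self, f):
            self._f = f
            self._key = None  # the (vmin, vmax) pair the cache was built for

        def get(self, vmin, vmax):
            if self._key != (vmin, vmax):
                # Limits changed since the last call: recompute endpoints.
                self._fvmin = self._f(vmin)
                self._fvmax = self._f(vmax)
                self._key = (vmin, vmax)
            return self._fvmin, self._fvmax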

return value

@staticmethod
def _fun_normalizer(fun):
Member:

This appears unused?

Contributor (Author):

Oh, yes, this was used by some of the derived classes in the original PR, and I figured this was the best place for all of them to have access, as it is a general-purpose normalization feature. I will remove it for now, and then we can decide where to include it when it is first needed.

assert_array_equal(norm([0.01, 2]), [0, 1.0])

def test_limits_without_vmin(self):
norm = mcolors.FuncNorm(f='log10', vmax=2.)
Member:

This is the same vmax you would get if you didn't set it, so I guess it doesn't really test that it's working.

Contributor (Author):

Yes, but that is a test in itself :P
You are right though; I will include tests where the values go above and below vmin and vmax, with and without the clip option.

Contributor (Author):

I have added test_clip_true, test_clip_false, and test_clip__default_false to test the clipping behavior.

Contributor (Author):

Those changes should pretty much address the original comment

if self.vmin > self.vmax:
raise ValueError("vmin must be smaller than vmax")

def ticks(self, nticks=13):
Member:

Not sure about the mixing of concerns here, but I'll leave that to @efiring to determine.

@alvarosg (Contributor, Author) Dec 16, 2016:

Yeah, I also was not sure about this, because technically vmin and vmax do not belong to this class. Actually the only thing my autoscale methods do differently is convert to float, so maybe I should just make a tiny change to the autoscale methods of Normalize to resemble this:

Methods in Normalize:

    def autoscale(self, A):
        self.vmin = np.ma.min(A)
        self.vmax = np.ma.max(A)

    def autoscale_None(self, A):
        ' autoscale only None-valued vmin or vmax'
        if self.vmin is None and np.size(A) > 0:
            self.vmin = np.ma.min(A)
        if self.vmax is None and np.size(A) > 0:
            self.vmax = np.ma.max(A)

Methods in FuncNorm:

    def autoscale(self, A):
        self.vmin = float(np.ma.min(A))
        self.vmax = float(np.ma.max(A))

    def autoscale_None(self, A):
        if self.vmin is None:
            self.vmin = float(np.ma.min(A))
        if self.vmax is None:
            self.vmax = float(np.ma.max(A))
        self.vmin = float(self.vmin)
        self.vmax = float(self.vmax)
        if self.vmin > self.vmax:
            raise ValueError("vmin must be smaller than vmax")

@efiring would it be OK to include those changes (casting to float and the vmax > vmin check) in Normalize, and remove the methods from FuncNorm?


return _cache[0]

main()
Member:

feel like this should be in a main block, and might as well just directly put all the plotting code there instead of shoving it into a function...so

if __name__ == '__main__':
    all the code currently in main()

@alvarosg (Contributor, Author) Dec 17, 2016:

The reason I did not do the main originally is that the examples now generate the figures automatically through some process, and @efiring and I were not sure whether that process would run the file as main.

Yeah, of course; I guess the only reason to have things in a function is so the data generation could come after the rest, but maybe now that it is much shorter, it will not look that bad right in between where the norms are generated and where the loop starts.

@QuLogic (Member) Dec 17, 2016:

The vast majority of examples (like >90%) use neither main, nor __main__ stuff, though I'm sure some examples use classes derived from backend-specific things that I didn't count properly.

fig, axes = plt.subplots(3, 2, gridspec_kw={
'width_ratios': [1, 3.5]}, figsize=plt.figaspect(0.6))

# Example of logarithm normalization using FuncNorm
Member:

extraneous comment 'cause code explicitly shows this


# Example of logarithm normalization using FuncNorm
norm_log = colors.FuncNorm(f='log10', vmin=0.01)
# The same can be achieved with
Member:

this comment should be dropped and there should be a standalone example for that feature


# Example of root normalization using FuncNorm
norm_sqrt = colors.FuncNorm(f='sqrt', vmin=0.0)
# The same can be achieved with
Member:

Same as above; presenting users with 3 ways to do things off the bat can be super confusing. Also, is it really necessary to use two examples of the same norm in 1 example? I know you'll likely tell me it's more realistic, but I think examples should fundamentally be as small/basic as possible while still showing the functionality.

Member:

two examples of the same norm in 1 example?

If you're referring to the two Axes, one is the norm function, the other is the actual usage for a colormap; see the figure at the top.

Contributor (Author):

I do not really see a problem with having an example of multiple uses in this case (both the log and the sqrt), but I am happy to change it if everyone thinks that the example image is not appropriate.

Contributor (Author):

I also think that presenting multiple ways of doing the same thing (with the extra comment) gives the user extra insight into what can be done with the class, at a very low cost. But again, if everyone agrees that the comments are inappropriate, I am also happy to remove them.

Member:

@QuLogic I was talking about the log and the sqrt norms and then also providing multiple methods of doing things. @alvarosg is there a colleague you can "hallway test" these docs (and warning messages) with? You have them read the doc/message and just ask what they think (and how they think it could be improved).

ticks = cax.norm.ticks(5) if norm else np.linspace(0, 1, 6)
fig.colorbar(cax, format='%.3g', ticks=ticks, ax=ax2)
ax2.set_title(title)
ax2.axes.get_xaxis().set_ticks([])
Member:

ax2.xaxis.set_ticks([])
ax2.yaxis.set_ticks([])

f = func_parser.function
finv = func_parser.inverse
if not callable(f):
raise ValueError("`f` must be a callable or a string.")
Member:

f must be a function or a string (I don't like using callable in user facing docs 'cause I think it's a little too dev space)

Contributor (Author):

I always assume the user of a Python module will also be a developer, and callable is a Python keyword, so IMO it is clearer than function.

Member:

That's not true at all in this case, though. You've got plenty of users for matplotlib in particular who are scientists but not devs, who aren't gonna be familiar with any Python keyword they don't use all the time (and callable is rarely in that set).

Member:

I think that at least when they are native English speakers they can figure it out quickly enough from the context and the structure of the word itself, "callable" -> "call" "able" -> "something that can be called". The word "string" would be much harder to understand than "callable"--it's pure comp-sci jargon, not used anywhere else in this way, and not something that can be figured out from the word itself. We are not going to delete uses of "string" or "callable".

@story645 (Member) Dec 24, 2016:

Callable is equivalent to function. You'd still need to mention it was a string. And string is different 'cause it's used in every single intro-Python everything; callable isn't. Honestly, callable trips me up all the time, and I'm a native English speaker with a CS background.

Member:

Basically, I dunno, I see your point, but a) I'm always wary of straight transcriptions of the if statements that triggered the exceptions being the error messages; b) I sort of think there should maybe be a bigger discussion of who matplotlib's expected audience is.

Contributor (Author):

I would leave it callable, because I think it is a more accurate term. I think anyone able to use a callable (to pass it to the function) should know the term, and if not should be able to do a 5-second Google search. In any case, let's not waste our energy discussing this, as I think it is pretty irrelevant.

@story645 (Member) Jan 16, 2017:

While I agree with you that this specific thing probably isn't worth fighting about, I feel in a general sense that it's bad practice to dismiss a usability concern as "well they should know what it's called and how to search for it" 'cause rarely are either of those statements true.

self._f = f
self._finv = finv

super(FuncNorm, self).__init__(**normalize_kw)
Member:

any particular reason this is put at the end rather than upfront?

Contributor (Author):

Nope, I will move it up

result, is_scalar = self.process_value(value)
self.autoscale_None(result)

vmin = float(self.vmin)
Member:

Does self.vmin/self.vmax need to be converted to float? I think there's an import at the top that forces division to always be floating point...

Contributor (Author):

This is a very good point; I had not noticed that. I guess there is no need, in that case. Thanks!

resultnorm = result.copy()
mask_over = result > vmax
mask_under = result < vmin
mask = (result >= vmin) * (result <= vmax)
Member:

What about something like this, 'cause I feel like this is a bit too much on the clever-but-obfuscating side?
mask = mask_over | mask_under
and then just use ~mask everywhere you're using mask.

Member:

Or mask = ~(mask_over | mask_under) or mask = ~mask_over & ~mask_under?

Contributor (Author):

Sure, I am always up for improving the efficiency!
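
For concreteness, the suggested constructions are equivalent (for non-NaN data) on a toy array:

    import numpy as np

    result = np.array([-1.0, 0.5, 2.5])
    vmin, vmax = 0.0, 2.0
    mask_over = result > vmax
    mask_under = result < vmin

    mask_a = (result >= vmin) * (result <= vmax)   # original: * as logical AND
    mask_b = ~(mask_over | mask_under)             # suggested form
    mask_c = ~mask_over & ~mask_under              # equivalent form

    assert (mask_a == mask_b).all() and (mask_b == mask_c).all()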

return ticks

@staticmethod
def _round_ticks(ticks, permanenttick):
Member:

Same as @QuLogic, question for @efiring about mixing concerns. Wonder if all the tick stuff should be in a private class (or public) in ticker, and then normalize should just point to the default formatter and locators it should use. (This is an issue I ran headlong into w/ categorical norming too...)

Contributor (Author):

Yes, I am definitely not very happy with the way this currently is...

and then normalize should just point to the default formatter and locators it should use

How does Normalize communicate with the formatters and tickers? Is there any good example around?

@story645 (Member) Dec 17, 2016:

How does Normalize communicate with the formatters and tickers? Is there any good example around?

It's really messy currently (code is in colorbar.py) and would probably require a refactor in the call stack. But that doesn't really matter to you, since you're having them explicitly get ticks via

    ticks = cax.norm.ticks(5) if norm else np.linspace(0, 1, 6)
    fig.colorbar(cax, format='%.3g', ticks=ticks, ax=ax_right)

cax.norm.ticks should really be its own tick Locator method that locates ticks based on some input (I guess convoluted functions). The downside is that it can't rely on the attributes in the norm (unless it's something like FuncNormLocator(norm)), but I think that prevents scope creep in norms.

Contributor (Author):

Yes, but ideally I would not like the user to have to call ticks manually; they should get those ticks automatically. I was not sure how to change the default ticker, though, to maybe implement a FuncTicker class...

Member:

It would be easy to modify colorbar to use the ticks method of its Norm, if it exists, and if ticks are not provided by the user. The alternative of having all tick locators and formatters in tickers.py, and having Norms include a method or attributes for default locators and formatters, is also reasonable. I'm going to leave this question open for the moment, but we will need to return to it. I suspect the second of these two approaches will turn out to be the best.

Contributor (Author):

Yes, it would, but the problem is that the colorbar has two different ways to represent itself, depending on spacing:

  • 'uniform': Represent the colorbar uniformly between 0 and 1 and then assign values for the ticks that are not uniform, according to the normalization. This is the one I normally use, and the one that should use the tick values returned by ticks.
  • 'proportional': Stretch/compress the colorbar to represent the non-linearities given by the normalization, so the actual axis in the colorbar is uniform in the data values. In this case selecting the ticks is the same as with any linear axis.

What if I just make a FuncNorm locator class, and then add the corresponding line here?

Contributor (Author):

@efiring @story645 I have now made a new class, FuncLocator (and tests), to put the behavior of proposing tick locations in a more appropriate place.

Instead of taking the norm itself as a parameter of the locator, I decided to just pass methods for the direct and inverse transformations; references to the methods of the norm object are passed at initialization. This way the ticker module does not depend directly on colors, or on FuncNorm in particular, but still, if the FuncNorm instance is modified (its limits, for example, after calling clim), FuncNorm will adapt the behaviour of its methods and this will be available to the locator.

If you could please take a look, I am sure you can provide useful feedback.

PS: Happy new year :D
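
A hedged sketch of the idea described above, not the PR's exact code: a Locator built from the norm's direct and inverse transformations, placing ticks uniformly in normalized space and mapping them back to data space:

    import numpy as np
    from matplotlib import ticker

    class FuncLocator(ticker.Locator):
        def __init__(self, forward, inverse, nticks=5):
            self._forward = forward  # e.g. a bound method of the norm
            self._inverse = inverse  # its inverse transformation
            self._nticks = nticks

        def tick_values(self, vmin, vmax):
            fmin, fmax = self._forward(vmin), self._forward(vmax)
            # Uniform in transformed space, mapped back to data space.
            return self._inverse(np.linspace(fmin, fmax, self._nticks))

        def __call__(self):
            vmin, vmax = self.axis.get_view_interval()
            return self.tick_values(vmin, vmax)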

@phobson (Member) commented Dec 16, 2016:

I wonder if supporting/documenting the string arguments adds more complexity than usability and benefit.

From my perspective, we're relying on the private _StringFuncParser, which is fairly new (not battle-tested?). But since we're exposing it through this, we're effectively committing to documenting it and making its API public.

What's a use case where a string is preferable to just using a numpy function or simply one of your own?

@alvarosg (Contributor, Author):

@phobson
About the use case, the main advantage is not having to specify both the function and the inverse: by specifying the string, it can be parsed and both can be obtained.

This may not be very relevant here (even though the first post already shows the simplicity of using a string vs. two callables), because there is only one function, and because if the user is advanced enough to want to use this, they may not mind using two callables.

However, one of the next steps is to implement other classes which take more than one function (and more than one inverse). Some of those may never get implemented, but I am interested in particular in MirrorPiecewiseNorm. This one will normalize a scale symmetrically (or not) around a value, e.g. zero. By setting different functions on each side, one can control how data is stretched around zero, on the positive side and on the negative side, and the easiest way to get a qualitative change is to apply polynomials, or roots of different degrees, to each side. By providing the string option, the user still gets most of the functions they would normally use, and does not need to specify the inverse function for each of them.

I think the key is not exposing _StringFuncParser to the user directly at all, so we can switch to another service in the future if we find something better. About battle-testing the parser, I guess it is simple enough that if there is a problem it can be fixed, and since the range of inputs is very limited, the tests can actually cover most cases.


def main():
fig, axes = plt.subplots(3, 2, gridspec_kw={
'width_ratios': [1, 3.5]}, figsize=plt.figaspect(0.6))
Member:

The indent is kind of funny here; I'd break before the gridspec_kw, not in the middle of its value, if possible. Also, you can add sharex='col' and this will automatically remove the tick labels in between plots.

@phobson (Member) commented Dec 17, 2016:

About the use case, the main advantage is not having to specify both the function and the inverse: by specifying the string, it can be parsed and both can be obtained.

That's pretty compelling. Thanks for clearing that up.

I think the key is not exposing _StringFuncParser to the user directly at all, so we can switch to another service in the future if we find something better

I guess I'm thinking that since we're accepting input and directly passing it to _StringFuncParser, we are exposing the mini-language-ness of it (e.g., braces and the like). So any future change to a new parser will have to implement that same syntax. But I don't really see a way around that, ATM.

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Dec 17, 2016
norm = mcolors.FuncNorm(f='log10', vmin=0.01, vmax=2.)
x = np.linspace(0.01, 2, 10)
assert_array_almost_equal(x, norm.inverse(norm(x)))

Contributor (Author):

Add tests for scalar values

Contributor (Author):

this is now added

@codecov-io commented Dec 24, 2016:

Current coverage is 63.57% (diff: 88.17%)

Merging #7631 into master will increase coverage by 1.51%

@@             master      #7631   diff @@
==========================================
  Files           174        174           
  Lines         56120      65699   +9579   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          34826      41765   +6939   
- Misses        21294      23934   +2640   
  Partials          0          0           

Powered by Codecov. Last update 841a427...44658d4

@efiring (Member) left a review:

Mostly minor recommendations; but there are a couple of major questions that I need to return to:

  • How to handle ticking?
  • Should masked array inputs with masked points be handled differently?

I almost forgot: I am also wondering about whether more restrictions on functions, and checks on values, are needed, specifically to ensure that functions are monotonic, bounded, (and strictly increasing?) over the range of normalization. For example, if a user asks for 'square' and feeds in data from -1 to 1, it won't be good...

"""
Creates a normalizer using a custom function

The normalizer will be a function mapping the data values into colormap
Member:

I would start the docstring (which is describing a class) with "A norm based on a monotonic function". Then a blank line, followed by the second sentence of the present init docstring, followed by the remainder of the present init docstring (Parameters, etc.). This is in accord with the numpydoc specification for classes: the init args and kwargs are described in the class docstring, and there is no need for an init docstring at all.
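
A skeleton of the numpydoc class layout being described, using this PR's parameters and the wording suggested in this review (illustrative, not the merged text):

    from matplotlib.colors import Normalize

    class FuncNorm(Normalize):
        """
        A norm based on a monotonic function.

        Specify the function to be used, and its inverse, as well as other
        parameters to be passed to `Normalize`.

        Parameters
        ----------
        f : callable or str
            Forward function, compatible with scalars and ndarrays.
        finv : callable, optional
            Inverse of `f`: finv(f(x)) == x. Ignored when `f` is a string.
        vmin, vmax : None or float, optional
            Data values to be mapped to 0 and 1.
        clip : bool, optional, default: False
            If True, clip data values to [vmin, vmax].
        """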

f : callable or string
Function to be used for the normalization receiving a single
parameter, compatible with scalar values and ndarrays.
Alternatively a string from the list ['linear', 'quadratic',
Member:

An alternative would be to put the list of strings and the explanation of "p" in the Notes section. The advantage is that it would set it apart, and keep the Parameters block from being so long. The disadvantage is that it might be separating it too much from its parameter. It's up to you.

can be used, replacing 'p' by the corresponding value of the
parameter, when present.
finv : callable, optional
Inverse function of `f` that satisfies finv(f(x))==x.
Member:

I would prefer to see more concise docstrings, comments, and code, in general. I won't try to identify every opportunity for shortening things, but I will make some suggestions. Here, the line could be "Inverse of f: finv(f(x)) == x." Below, clarify by saying "Optional and ignored when f is a string; otherwise, required."

vmin : float or None, optional
Value assigned to the lower limit of the colormap. If None, it
will be assigned to the minimum value of the data provided.
Default None.
Member:

Here you could combine vmin with vmax, reverse the order (-> "vmin, vmax: None or float, optional") and delete the "Default None" line. Then, just "Data values to be mapped to 0 and 1. If either is None, it is assigned the minimum or maximum value of the data supplied to the first call of the norm." Let's leave the word "colormap" out, using it only where necessary, as in the clip explanation.

Default None.
clip : bool, optional
If True, any value below `vmin` will be clipped to `vmin`, and
any value above `vmax` will be clip to `vmin`. This effectively
Member:

'clip : bool, optional, default is False' and then delete the last line of the docstring. In addition to being more concise, having the default up front makes it more obvious. Then, 'If True, clip data values to [vmin, vmax]. This defeats ... colormap. If False, ... respectively.'

Contributor (Author):

As far as I know, the default option is specified in the description, not in the specification, right?
From numpydoc:

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

# the limits vmin and vmax may require changing/updating the
# function depending on vmin/vmax, for example rescaling it
# to accommodate to the new interval.
return
Member:

This should be "pass", not "return". "pass" is the "do nothing" word.


self._check_vmin_vmax()
vmin = float(self.vmin)
vmax = float(self.vmax)
Member:

Should _check_vmin_vmax do the float conversion and return the two values, so you can write vmin, vmax = self._check_vmin_vmax()?


Parameters
----------
value : float or ndarray of floats
Member:

It can be a masked array, to handle missing values, or a python sequence, and it doesn't have to be float. So maybe just say "scalar or array-like".

resultnorm[mask] = (self._f(result[mask]) - self._f(vmin)) / \
(self._f(vmax) - self._f(vmin))

resultnorm = np.ma.array(resultnorm)
Member:

I don't think this is necessary, because process_value() makes result a masked array, and all the operations you are doing after that appear to preserve the masked array type. Since your string-based functions like log10 are np.log10 and not np.ma.log10, however, they are preserving the original mask but not suppressing the warnings as the ma versions would do. (I'm actually surprised that the np versions are returning with the invalid values masked; maybe this has been added in newer numpy versions.)


@alvarosg (Contributor, Author):

@efiring

Thanks for the review; I essentially agree with pretty much everything, and will implement those changes when I have some time :)

About the major questions:

How to handle ticking?

I do not have a strong opinion on what to do here. I think the functionality in the ticks function is interesting, but as for where it should live, I would leave that up to the people with deeper knowledge of the library.

Should masked array inputs with masked points be handled differently?

I tried to do it the same way it was done for the other normalizations; I will double-check, though.

I am also wondering about whether more restrictions on functions, and checks on values, are needed, specifically to ensure that functions are monotonic, bounded, (and strictly increasing?) over the range of normalization. For example, if a user asks for 'square' and feeds in data from -1 to 1, it won't be good...

I completely agree on this. The problem here is that the issue is quite different for callables than for predefined string functions:

  • Callables: It is numerically impossible to check whether a given callable is bounded and strictly increasing with a finite number of operations. I had long discussions about this on a PR for scipy to include functionality for providing the numerical inverse of a function (it also needs to be strictly monotonic), and that was the main conclusion. The only thing we can do is explicitly include the conditions on the callable in the documentation.
  • Strings: In this case, since we have the analytical form of the function, we could actually do those checks. The main problem is checking that the data is within the bounds ((0, Inf) for the 'sqrt' case). The tricky part is that this check is data dependent, and would need to be done for each possible calculation, so it would involve storing some extra data for the checks.

This would be my approach to solve it:

  • Include an extra parameter to FuncNorm: validity_range, which by default would be set to [-np.inf, np.inf].
  • This parameter is stored as an attribute, and data is always checked against those boundaries. Worst-case scenario, the user forgets to set it (or in some cases it is not necessary to set it, which is why I would rather leave it optional), and the behaviour is exactly like now.
  • We add a new method to _FuncInfo (maybe) that returns the validity range of the predefined functions. When the input is a string, these values are used to fill the new validity_range attribute.
  • Some of the tricky aspects of this are related to the cases where the validity ranges are open intervals. For example, in a logarithm normalization, the function is not bounded in the [0, inf) interval, but it is bounded in the [epsilon, inf) interval. This complicates things a little bit, because it would also involve storing another variable indicating whether validity_range represents an open or closed interval on each end. It would be something similar to this (domain and open_domain arguments).

Taking all this into account, I may still prefer to just specify very clearly in the documentation that the function must be strictly increasing and bounded in the [vmin, vmax] interval, which by default will be the bounds of the data to be normalized.
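
A hedged sketch of the validity_range proposal above (the parameter and the _check_domain helper are the author's idea rendered as code, not an implemented API):

    import numpy as np
    from matplotlib.colors import Normalize

    class FuncNorm(Normalize):
        def __init__(self, f, finv=None, vmin=None, vmax=None, clip=False,
                     validity_range=(-np.inf, np.inf)):
            super(FuncNorm, self).__init__(vmin, vmax, clip)
            self._f, self._finv = f, finv
            # Closed interval on which f is assumed monotonic and bounded;
            # open endpoints would need an extra flag, as noted above.
            self._validity_range = validity_range

        def _check_domain(self, data):
            lo, hi = self._validity_range
            if np.ma.min(data) < lo or np.ma.max(data) > hi:
                raise ValueError("data outside the validity range "
                                 "[%g, %g] of the function" % (lo, hi))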

@alvarosg (Contributor, Author) commented Feb 5, 2017:

@efiring Did you get a chance to look at the changes I implemented a couple of weeks ago?

@efiring (Member) commented Feb 6, 2017:

@alvarosg I apologize for having neglected this for so long--it has been on my conscience. I think I can get to it on Saturday, but probably not before then.

@QuLogic (Member) commented May 13, 2017:

Ping @efiring.

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Aug 29, 2017
@anntzer (Contributor) left a review:

(copied from #7294)

I strongly

  • oppose adding a new domain-specific language to describe functions (root{n}, etc.). Yes, the parser has already been merged into cbook, but it is not actually used right now, and I would prefer removing it.
  • believe that we should still first sort out the overlap between scales, norms, locators, and formatters that has been (much) discussed above, before adding more complex functionality.

As usual, other devs should feel free to override this review if there is consensus to do so.

@anntzer anntzer modified the milestones: needs sorting, unassigned Feb 17, 2018
@anntzer (Contributor) commented Feb 17, 2018:

Closing based on the lack of comments since the rejection two weeks ago.

@anntzer anntzer closed this Feb 17, 2018
@jklymak (Member) commented Feb 18, 2018:

I agree w/ @anntzer closing this, given the current API proposed by this PR.

I think this could be re-opened a) if the "string" representation went away, and b) if some thought was put into whether all the norms should be implemented with this mechanism, which would require a bit of refactoring but doesn't seem un-fathomably difficult.

I only somewhat agree that tick locators should be an issue. The obvious default is just equally-spaced ticks in normalized space that will fall where they may in data space, unless a special Locator is provided. I don't see what else a general tool is supposed to do.

I disagree w/ the original author's suggestion to have a bunch of extra parameters to specify the range over which normalization is valid; just specify the ranges in the user-supplied function. Yet another argument against the string representation of the norms.

I am not interested in doing this work myself. But I'd happily re-open if someone else wanted to refactor this a bit.

@story645 (Member) commented Feb 19, 2018:

if some thought was put into if all the norms should be implemented with this mechanism, which would require a bit of refactoring, but doesn't seem un-fathomably difficult.

While I think this is doable, I dunno that a variant of this PR should be held up because of that. A version of FuncNorm could always go in first and then the other norms refactored against it (which I think will be smoother from a reviewing point of view anyway).

@jklymak (Member) commented Feb 19, 2018:

@story645 I agree, that's possible, but ideally some thought would be given to the API of this PR to make sure that works.
