Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Re-write sym-log-norm #16391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Conversation

dstansby
Copy link
Member

@dstansby dstansby commented Feb 2, 2020

This is work I started a while ago to re-write SymLogNorm into code I could understand and read. As I wrote this I realised that the original code was just plain wrong... I have now added tests include values that are easy to manually calculate and verify. Suggestions for more tests welcome.

Fixes #16376

@dstansby dstansby added this to the v3.2.0 milestone Feb 2, 2020
@jklymak
Copy link
Member

jklymak commented Feb 2, 2020

Suggest we let the base be an option. We will need to know the base if we make a scale for the colorbar

The docstring probably needs to change as well. The linear region using is not exactly two decades which I think is done to make the derivative continuous.

I was working on this and found the existing test should be improved to actually test some numbers we could calculate by hand rather than three random numbers.

I’m happy for you to work on this but I would like to see substantive improvements to the tests and the documentation.

After we fix this we can get back to fixing the broken colorbars which I’d propose fixing by associating a scale with the norm. Any norms that don’t have a scale could fallback to the manual ticking.

@jklymak
Copy link
Member

jklymak commented Feb 2, 2020

This was done with a modified SymLogNorm that takes a base. So for sure, using base = np.e is different than base=10. Essentially a "decade" in np.e is smaller than one in base 10, and so the linear region is smaller.

Note however, that even the base=10 case does not yield equally spaced "ticks".

#scale = mscale.SymmetricalLogScale(ax.xaxis, linthreshx=1.0)
fig, ax =plt.subplots()
ax.set_xscale('symlog', linthreshx=1.0, basex=10)

ax.plot(np.arange(-100, 100, 0.1), np.arange(-100, 100, 0.1))
ax.set_xlim([-100, 100])
scale = ax.xaxis._scale

fig, ax = plt.subplots()
for normbase in [np.e, 10]:
    norm = mcolors.SymLogNorm(linthresh=1.0, linscale=1, vmin=-100, vmax=100, base=normbase)
    xx = scale._transform.transform([-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200])

    print('xx', (xx - xx[1])/(xx[-2] - xx[1]))
    nn = norm([-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200])
    print('nn', (nn - nn[1])/(nn[-2] - nn[1]))

    print(norm([-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200]))
    ax.plot(nn, xx, '.', label=f'normbase {normbase:1.3f}')
    ax.plot(norm([-100, 100]), xx[[1, -2]], zorder=0)

Figure_2

vals = np.array([-30, -1, 2, 6], dtype=float)
normed_vals = norm(vals)
expected = [0., 0.53980074, 0.826991, 1.02758204]
expected = [-0.842119, 0.450236, 0.599528, 1.277676]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these values were just plain wrong before...

  • The first value (-30) is less than vmax (5), so should come out as less than zero
  • The second value (-1) is less than 0, so should come out as less than 0.5

@greglucas
Copy link
Contributor

The problem I see here is that the code found in scale for SymLog is basically going to be the exact same as the code in colors for SymLog because the math is the same. This is where I think taking a deep look at where the math should be placed would be helpful.

My thought is that there should be one, and only one, math Transform for SymLog. This Transform is essentially just a function y = f(x).

Scales could inherit from that Transform to identify/place ticks.
Norms could inherit from that Transform to set limits and scale values to be between 0 and 1.
A Norm could also inherit from the Scale if that's what is needed for colorbars because the Scale would have the transform already. Regardless, I think it would be helpful to link these items closer together in implementation.

This would guarantee consistency between the underlying math function chosen.

@jklymak
Copy link
Member

jklymak commented Feb 3, 2020

@anntzer is working towards that. We could also change this to use SymmetricLogTransform as a stopgap until the general solution. But we really should fix this ASAP as folks may be publishing nonsense using the Norm and not realizing it. A proper refactor could wait until after that.

@dstansby
Copy link
Member Author

dstansby commented Feb 3, 2020

I think this is ready for review. Particularly interested in opinions on

  • Forcing vmin = -vmax. I have no idea what to do if this isn't the case, so for now I'm just enforcing it.
  • Any extra tests that could/should be added.

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is different than what we had before, even for base=np.e. I don't have any skin in the game whether this one is better or worse but it needs more than this API note.

It is also inconsistent with SymmetricalLogTransform, and hence SymmetricalLogScale, so that needs to be cleaned up or we can't make a proper axes for it when it gets turned into a colorbar. My understanding is that the current implementation has a smooth derivative at the transition. Does this new implementation? Do we care about that? Not sure we do...

Overall, I think this requires its own gallery page clearly explaining the properties of the transform, or cite a reference that explains exactly what the algorithm here is. People need to be able to cite what this thing is in their papers.

normed_vals = norm(vals)
assert_array_almost_equal(normed_vals, expected)
expected = [0, 0.25, 0.5, 0.75, 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is consistent with SymmetricLogTransform yet. I'm OK if we want to change that as well to be consistent with this code, but they can't be inconsistent!

import matplotlib.scale as mscale

trans = mscale.SymmetricalLogTransform(10, 1, 1)
new = trans.transform([-10, -1, 0, 1, 10])
new = (new - new[0]) / (new[-1] - new[0])
print(new)
[0.0 0.23684210526315788 0.5 0.7631578947368421 1.0]

I'm fine if this implementation is desired, but then we need to change SymmetricLogTransform.

@dstansby
Copy link
Member Author

dstansby commented Feb 3, 2020

Thanks for the comments. re.

My understanding is that the current implementation has a smooth derivative at the transition

I'm not sure what this means or how it could be implemented? Totally agreed that there should be a gallery example going through this carefully.

@jklymak
Copy link
Member

jklymak commented Feb 3, 2020

I'm not sure what this means or how it could be implemented?

I mean, I think thats why the current version is the way it is. https://iopscience.iop.org/article/10.1088/0957-0233/24/2/027001
seems the same and thats why they claim they do it.

@dstansby
Copy link
Member Author

dstansby commented Feb 3, 2020

Ah nice - thoughts on using the approach in that paper (open access here) instead of (what I am interpreting as) the current approach?

I think I'm pro using the published one because

  • It's a published thing to link to
  • It has a simple analytic form

@jklymak
Copy link
Member

jklymak commented Feb 3, 2020

Well, I prefer yours because its simpler and does what we say it should. But open to using either so long as its explained clearly...

@jklymak
Copy link
Member

jklymak commented Feb 4, 2020

@dstansby we discussed this on the call; something like this could/should go in v3.3. For 3.2, we need to add a base kwarg, and deprecate np.e as the base so the default is consistent with our docs and what most folks have probably been assuming. I'll open a PR for that right now.

@jklymak jklymak mentioned this pull request Feb 4, 2020
6 tasks
@anntzer
Copy link
Contributor

anntzer commented Feb 4, 2020

I wonder whether we really need to support a symlog norm and whether we could consider deprecating it instead. We could also consider moving it to a separate package, similar to mpl-probscale.
Especially now that we have FuncScale, it is "relatively" easy for the user to reimplement it if they really need it; conversely, it is a very non-standard scale that we should probably not be encouraging people to use. Note that the linked paper (which is also cited by the Matlab Central implementation of symlog) actually cites matplotlib as its first example for symlog, so it's a bit circular for matplotlib to refer back to it to justify the symlog scale :-) (even though I agree that ensuring continuity in the gradient seems a reasonable criterion). Also, that paper has been cited a grand total of 19 times in 7 years, which is not nothing but hardly a lot for a paper proposing something as fundamental as a new plot scaling function (the number of log-scale or power-law plots published every year is certainly orders of magnitude more than that).
If you really have "log-scaled" data with both signs, you can e.g. plot the absolute value in log scale, using e.g. different colors for the positive and negative sides. See e.g. the log(Gamma) plot by Wolfram (certainly a reasonable reference point...) at http://functions.wolfram.com/GammaBetaErf/LogGamma/introductions/Gammas/ShowAll.html, which effectively plots log(abs(Gamma)) and separately the imaginary part.

@dstansby
Copy link
Member Author

dstansby commented Feb 4, 2020

@dstansby we discussed this on the call; something like this could/should go in v3.3. For 3.2, we need to add a base kwarg, and deprecate np.e as the base so the default is consistent with our docs and what most folks have probably been assuming. I'll open a PR for that right now.

Sure; regardless of cosmetic changes I still think the maths is throwing out incorrect numbers: #16391 (comment) If we're not going to overhaul in 3.2, I think we should put a big warning on it that the function is un-tested and suspected wrong.

@jklymak
Copy link
Member

jklymak commented Feb 4, 2020

Judging by the tests, though, I don't think the original authors thought it was spitting out the wrong maths. Unfortunately they failed to write down what they thought the correct maths are.

That all said, overall I agree with @anntzer. This seems to be something we invented, versus something the scientific community is asking for. If it could be made its own package that would be fabulous.

But if we do persist in providing it in core, it has to be well-defined and documented. At the very least, the cited paper does that.

@greglucas
Copy link
Contributor

I think that everyone brings up really good points about the use of this function and that it is very subjective as to what different people would want out of it (or whether they should be using it at all).
@anntzer, I agree with your point about plotting the magnitude in log space with two different colorbars for +/-. That is exactly what I want out of this function :) this is a convenient wrapper around that to combine with a diverging colormap. It is purely a convenience for me and used to make a subjective figure that gets across the idea that we have positive and negative data spanning orders of magnitude. I'm not opposed to deprecating the function, but I also think it is quite convenient to use.

My specific use-case... I have a vector field that spans orders of magnitude and then I dot that into some other vector path (integral of v dot dx) to get a scalar field that has positive and negative values (spanning orders of magnitude) that depend on the direction of your integration path (dx). For the vector field, I write my own symlog vector scaling function that preserves angles, and I somewhat arbitrarily use a form close to what the referenced paper proposes.

def scale_vectors(x, y, scale=np.log10):
    """Scale vectors while preserving the angle.
    
    Parameters
    ==========
    x: Cartesian x coordinate(s)
    y: Cartesian y coordinate(s)
    
    scale: function to scale the magnitude by (Default: log10)"""
    mag = np.sqrt(x**2 + y**2)
    angle = np.arctan2(y, x)
    newx = scale(1 + mag)*np.cos(angle)
    newy = scale(1 + mag)*np.sin(angle)
    
    return (newx, newy)

(note that I don't think the linked paper will work with vector quantities even though they say it is bisymmetric because of the changing angles, but I didn't read it that closely).

@anntzer
Copy link
Contributor

anntzer commented Feb 4, 2020

Given that you already wrote your custom normalization (which is certainly the "responsible" thing to do), I think what Matplotlib should do is really just making sure that you can easily use it as scale/norm, not providing its own symlog?

@greglucas
Copy link
Contributor

Yes, I agree, and your previous point about using my own FuncNorm would certainly be the way to go for that. This is more for exploratory data analysis/ease of use. I'm really not opposed to getting rid of it from MPL, but since it is there and convenient, I currently use it.

@jklymak
Copy link
Member

jklymak commented Feb 4, 2020

Back in the olden days of Matlab, we had one Norm and we liked it, and we just plotted our data using that linear norm. i.e. you transform the array first and then pcolor it. Surely thats good enough for exploratory data analysis:

X = myData()  
Y = mytransform(X)
pcolormesh(Y)

@greglucas
Copy link
Contributor

I don't disagree... ¯_(ツ)_/¯

The nicety in mpl is now adding to your simple example: colorbar(), which will show -5 to 5 and in my head I'll have to convert that scale or call invtransform on the label strings myself (rather than having mpl just automatically put in -10^5 to 10^5 for me)

Again, I'm not opposed to deprecation, and I think most people are in agreement that this is a pretty niche norm.

@jklymak
Copy link
Member

jklymak commented Feb 4, 2020

Fully agreed, the point of using a Norm is that the colorbar gives you proper numbers. But that means that each norm should have an associated scale, or the advantage is pretty much nullified.

@jklymak
Copy link
Member

jklymak commented Feb 6, 2020

@anntzer, do you have more details on the proposed unification of scales and norms? I think this should be done ASAP so that most norms have a scale associated with it that the colorbar can just use. Will your proposed factory give us that?

@dstansby
Copy link
Member Author

dstansby commented Feb 6, 2020

In the meantime, are we decided on deprecating and removing symlognorm? I am happy to do this, with extensive documentation on alternatives.

@jklymak
Copy link
Member

jklymak commented Feb 6, 2020

Well maybe a quick poll and try to get main developers to vote?

👍 keep SymLogNorm and symlogscale
👎 deprecate both

@anntzer
Copy link
Contributor

anntzer commented Feb 6, 2020

My patch makes it possible to derive norms from scales (in which case the scale can known about the parent norm). Obviously you can still construct independent norms (e.g. BoundaryNorm) so not all norms will have an associated scale.

I think this should be done ASAP

Well, #14916 is not exactly recent :)

@Tillsten
Copy link
Contributor

Tillsten commented Feb 6, 2020

If you break symlog scale you break a lot of code I and a lot of other people in time-resolved spectroscopy use, please don't do that. Matplotlib is generally very careful about breaking code, so I don't see why this is not the case for scales.

As the original author of the retrospective faulty SymlogNorm, I don't care too much about it anymore. I fully agree with the found deficiencies, but my usage at the time of coding does not depend on exactly reproducing the absolute data form a colormap. Instead of the relative amplitude of the values is importent, something which is in some cases better reproduced by a symlog-scale than a linear scale. Note that the data has both positive and negative signals and varies quite a lot in its amplitude. Hence I never cared about the base, since the final differences in the map were not that visible since it the scale is normalized. Again, I retrospect this was wrong.

@rayosborn
Copy link

I have only just noticed this PR and haven't had time to understand all the issues involved, but I would like to strongly urge (aka beg) that SymLogNorm is not deprecated. A new x-ray scattering method called ΔPDF generates both positive and negative probability maps, which are ideally viewed in symmetric (often log) plots, e.g., see Krogstad, M. J. et al. Nat Mater 19, 63–68 (2020). We use MPL to plot with symmetric limits that are automatically enforced in NeXpy by choosing a divergent color map. If this is a niche, I think that it will be a growing one, since we are already collaborating with a number of research groups to produce ΔPDF data on a routine basis.

@jklymak
Copy link
Member

jklymak commented Feb 9, 2020

@rayosborn as discussed on gitter, would arcsinh suit your needs? Not saying we are going that way, but its another option.

@greglucas
Copy link
Contributor

greglucas commented Feb 9, 2020

I actually like the arcsinh suggestion, since it is a "simple" and explainable transform (although my guess is people still don't know what arcsinh is off the top of their head, I had to look it up myself: ln(z + np.sqrt(1+z^2)))

One other option I haven't seen mentioned yet would be removing the linear region completely and just making a threshold instead. This would be very simple to explain, log (of whatever base you want) everywhere. With abs(x) < threshold getting mapped to the midpoint of the scale. This would be similar to using a log scale currently where you specify a vmin and everything less than that gets clipped to the minimum value.

Edit: I just implemented this and realized it works fine for colorbar normalization, but not for x/y plots with symlog scales due to the clipping creating hard cutoffs rather than smoother transitions.

@rayosborn
Copy link

I'm not too bothered about what method is used to generate the symmetric "log" plots, since we don't analyze the images themselves. They are used to guide our interpretation and to present the data in talks and publications. If we fit any models, it is to the actual data, not the plotted representation.

@jklymak
Copy link
Member

jklymak commented Feb 10, 2020

My concern is reproducibility. I’ve hand digitized a good number of Figures from papers where the data was no longer available, and if the scale was not clearly defined in the paper it would lead to errors.

@tacaswell tacaswell modified the milestones: v3.2.0, v3.3.0 Feb 10, 2020
@QuLogic QuLogic added the status: needs comment/discussion needs consensus on next step label Apr 15, 2020
@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 5, 2020
@jklymak jklymak marked this pull request as draft July 23, 2020 01:50
@jklymak
Copy link
Member

jklymak commented Jan 5, 2021

I think this is superseded by #16457? I'll close, but feel free to re-open @dstansby if you think there is more to do.

@jklymak jklymak closed this Jan 5, 2021
@dstansby dstansby deleted the symlog-overhaul branch January 5, 2021 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SymLogNorm and SymLogScale give inconsistent results....
8 participants