Re-write sym-log-norm #16391

dstansby · 2020-02-02T20:40:38Z

This is work I started a while ago to re-write SymLogNorm into code I could understand and read. As I wrote this I realised that the original code was just plain wrong... I have now added tests that include values that are easy to calculate and verify by hand. Suggestions for more tests welcome.

Fixes #16376

jklymak · 2020-02-02T21:04:24Z

Suggest we make the base an option. We will need to know the base if we make a scale for the colorbar.

The docstring probably needs to change as well. The linear region is not exactly two decades wide, which I think is done to make the derivative continuous.
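A worked example of the point above. Assuming the adjustment factor used in the matplotlib source at the time (both `SymmetricalLogTransform` and the old `SymLogNorm` multiply the linear part by `linscale / (1 - 1/base)`), each half of the linear region occupies that many "decades" (factors of `base`) of output space rather than exactly `linscale`. The helper name `linear_half_width` is hypothetical.

```python
import numpy as np


def linear_half_width(linscale, base):
    """Output-space width of one half of the linear region, in units of log_base.

    Assumes the pre-3.3 adjustment factor linscale / (1 - 1/base).
    """
    return linscale / (1.0 - 1.0 / base)


# Base 10: each half spans ~1.111 decades, so the whole region is ~2.22 decades.
print(linear_half_width(1.0, 10))

# Base e (the old SymLogNorm behaviour): ~1.582 e-folds per half,
# which is only ~0.687 decades.
half_e = linear_half_width(1.0, np.e)
print(half_e, half_e / np.log(10))
```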

I was working on this and found the existing test should be improved to actually test some numbers we could calculate by hand rather than three random numbers.

I’m happy for you to work on this but I would like to see substantive improvements to the tests and the documentation.

After we fix this we can get back to fixing the broken colorbars, which I’d propose fixing by associating a scale with the norm. Any norms that don’t have a scale could fall back to the manual ticking.

jklymak · 2020-02-02T21:37:39Z

This was done with a modified SymLogNorm that takes a base. So for sure, using base = np.e is different than base=10. Essentially a "decade" in np.e is smaller than one in base 10, and so the linear region is smaller.

Note however, that even the base=10 case does not yield equally spaced "ticks".

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Reference mapping: the symlog axis scale (keyword names as of matplotlib 3.3+).
fig, ax = plt.subplots()
ax.set_xscale('symlog', linthresh=1.0, base=10)
ax.plot(np.arange(-100, 100, 0.1), np.arange(-100, 100, 0.1))
ax.set_xlim([-100, 100])
trans = ax.xaxis.get_transform()  # the scale's SymmetricalLogTransform

vals = [-200, -100, -10, -1, -0.5, 0, 0.5, 1, 10, 100, 200]
xx = trans.transform(vals)
# Rescale so that -100 -> 0 and 100 -> 1, making the two mappings comparable.
print('xx', (xx - xx[1]) / (xx[-2] - xx[1]))

fig, ax = plt.subplots()
for normbase in [np.e, 10]:
    norm = mcolors.SymLogNorm(linthresh=1.0, linscale=1, vmin=-100, vmax=100, base=normbase)
    nn = norm(vals)
    print('nn', (nn - nn[1]) / (nn[-2] - nn[1]))
    print(nn)
    ax.plot(nn, xx, '.', label=f'normbase {normbase:1.3f}')
    ax.plot(norm([-100, 100]), xx[[1, -2]], zorder=0)
ax.legend()

dstansby · 2020-02-03T13:35:15Z

lib/matplotlib/tests/test_colors.py

    vals = np.array([-30, -1, 2, 6], dtype=float)
    normed_vals = norm(vals)
-    expected = [0., 0.53980074, 0.826991, 1.02758204]
+    expected = [-0.842119, 0.450236, 0.599528, 1.277676]


I think these values were just plain wrong before...

The first value (-30) is less than -vmax (-5), so it should come out as less than zero.

The second value (-1) is less than 0, so it should come out as less than 0.5.

greglucas · 2020-02-03T15:34:02Z

The problem I see here is that the code found in scale for SymLog is basically going to be the exact same as the code in colors for SymLog because the math is the same. This is where I think taking a deep look at where the math should be placed would be helpful.

My thought is that there should be one, and only one, math Transform for SymLog. This Transform is essentially just a function y = f(x).

Scales could inherit from that Transform to identify/place ticks.
Norms could inherit from that Transform to set limits and scale values to be between 0 and 1.
A Norm could also inherit from the Scale if that's what is needed for colorbars because the Scale would have the transform already. Regardless, I think it would be helpful to link these items closer together in implementation.

This would guarantee consistency between the underlying math function chosen.
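A minimal sketch of that design in plain NumPy: one object defines the forward and inverse mapping, and a norm is built on top of it rather than re-implementing the math. It assumes the simple piecewise form (linear below `linthresh`, log_base above, continuous at the threshold); `SymLogTransform` and `normalize_with` are hypothetical names for illustration, not matplotlib's classes.

```python
import numpy as np


class SymLogTransform:
    """One definition of y = f(x), shared by a scale and a norm (hypothetical)."""

    def __init__(self, linthresh=1.0, linscale=1.0, base=10.0):
        self.linthresh = linthresh
        self.linscale = linscale
        self.base = base

    def forward(self, x):
        x = np.asarray(x, dtype=float)
        ax = np.abs(x)
        # The log branch is only used where |x| > linthresh; silence warnings at x == 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            log_branch = np.sign(x) * (
                self.linscale + np.log(ax / self.linthresh) / np.log(self.base))
        linear_branch = self.linscale * x / self.linthresh
        return np.where(ax <= self.linthresh, linear_branch, log_branch)

    def inverse(self, y):
        y = np.asarray(y, dtype=float)
        ay = np.abs(y)
        linear_branch = y * self.linthresh / self.linscale
        log_branch = np.sign(y) * self.linthresh * self.base ** (ay - self.linscale)
        return np.where(ay <= self.linscale, linear_branch, log_branch)


def normalize_with(transform, values, vmin, vmax):
    """A norm that reuses the transform: map [vmin, vmax] onto [0, 1]."""
    t = transform.forward(values)
    t0, t1 = transform.forward([vmin, vmax])
    return (t - t0) / (t1 - t0)


trans = SymLogTransform(linthresh=1.0, linscale=1.0, base=10.0)
print(normalize_with(trans, [-10, -1, 0, 1, 10], vmin=-10, vmax=10))
# -> [0, 0.25, 0.5, 0.75, 1]
```

With `linthresh=1`, `linscale=1` and base 10, this form gives hand-checkable quarter points (0.25 and 0.75 for -1 and 1 on [-10, 10]); a scale object could call the same `forward`/`inverse` pair for tick placement.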

jklymak · 2020-02-03T15:37:38Z

@anntzer is working towards that. We could also change this to use SymmetricalLogTransform as a stopgap until the general solution. But we really should fix this ASAP, as folks may be publishing nonsense using the Norm and not realizing it. A proper refactor could wait until after that.

dstansby · 2020-02-03T15:45:40Z

I think this is ready for review. I'm particularly interested in opinions on:

Forcing vmin = -vmax. I have no idea what to do if this isn't the case, so for now I'm just enforcing it.
Any extra tests that could/should be added.
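On the first question, one alternative to enforcing `vmin = -vmax` is to widen whichever limit is smaller so that the range becomes symmetric about zero. A sketch, assuming the norm should always be centred on zero; `symmetric_limits` is a hypothetical helper, not matplotlib API.

```python
def symmetric_limits(vmin, vmax):
    """Widen (vmin, vmax) to (-m, m), where m is the larger absolute limit."""
    m = max(abs(vmin), abs(vmax))
    return -m, m


print(symmetric_limits(-30, 5))   # -> (-30, 30)
print(symmetric_limits(-2, 100))  # -> (-100, 100)
```

The trade-off is that part of the colormap goes unused when the data are lopsided.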

jklymak

This implementation is different than what we had before, even for base=np.e. I don't have any skin in the game whether this one is better or worse but it needs more than this API note.

It is also inconsistent with SymmetricalLogTransform, and hence SymmetricalLogScale, so that needs to be cleaned up or we can't make a proper axes for it when it gets turned into a colorbar. My understanding is that the current implementation has a smooth derivative at the transition. Does this new implementation? Do we care about that? Not sure we do...

Overall, I think this requires its own gallery page clearly explaining the properties of the transform, or cite a reference that explains exactly what the algorithm here is. People need to be able to cite what this thing is in their papers.

jklymak · 2020-02-03T17:15:42Z

lib/matplotlib/tests/test_colors.py

    normed_vals = norm(vals)
-    assert_array_almost_equal(normed_vals, expected)
+    expected = [0, 0.25, 0.5, 0.75, 1]


I don't think this is consistent with SymmetricalLogTransform yet. I'm OK if we want to change that as well to be consistent with this code, but they can't be inconsistent!

import matplotlib.scale as mscale
trans = mscale.SymmetricalLogTransform(10, 1, 1)
new = trans.transform([-10, -1, 0, 1, 10])
new = (new - new[0]) / (new[-1] - new[0])
print(new)

[0.0 0.23684210526315788 0.5 0.7631578947368421 1.0]

I'm fine if this implementation is desired, but then we need to change SymmetricalLogTransform.
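The numbers above can be reproduced without matplotlib. This sketch assumes the formula `SymmetricalLogTransform` used at the time: inside the linear region values are multiplied by `linscale_adj = linscale / (1 - 1/base)`, and outside it the output is `sign(x) * linthresh * (linscale_adj + log_base(|x| / linthresh))`. The function name `old_symlog` is hypothetical.

```python
import numpy as np


def old_symlog(x, base=10.0, linthresh=1.0, linscale=1.0):
    x = np.asarray(x, dtype=float)
    linscale_adj = linscale / (1.0 - 1.0 / base)
    ax = np.abs(x)
    # Log branch everywhere first (warnings at x == 0 are silenced) ...
    with np.errstate(divide="ignore", invalid="ignore"):
        out = np.sign(x) * linthresh * (linscale_adj + np.log(ax / linthresh) / np.log(base))
    # ... then overwrite the linear region.
    inside = ax <= linthresh
    out[inside] = x[inside] * linscale_adj
    return out


vals = [-10, -1, 0, 1, 10]
new = old_symlog(vals)
new = (new - new[0]) / (new[-1] - new[0])
print(new)  # 9/38 ~ 0.2368 at x = -1, and 29/38 ~ 0.7632 at x = 1
```

The extra factor `1 / (1 - 1/base)` (10/9 for base 10) widens the linear region, which is why the quarter points sit at 9/38 and 29/38 instead of 0.25 and 0.75.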

dstansby · 2020-02-03T18:26:46Z

Thanks for the comments. Re:

My understanding is that the current implementation has a smooth derivative at the transition

I'm not sure what this means or how it could be implemented? Totally agreed that there should be a gallery example going through this carefully.

jklymak · 2020-02-03T18:36:55Z

I'm not sure what this means or how it could be implemented?

I mean, I think that's why the current version is the way it is. https://iopscience.iop.org/article/10.1088/0957-0233/24/2/027001
seems to be the same, and that's why they claim they do it.
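Whether a given form has a continuous derivative can be checked by comparing the one-sided slopes at `linthresh`. A minimal sketch, assuming the same piecewise formula as the `SymmetricalLogTransform` discussed above (linear-region factor `linscale / (1 - 1/base)`); the helper names are hypothetical. Analytically, the slope just inside the threshold is `linscale / (1 - 1/base)` and just outside it is `1 / ln(base)`.

```python
import numpy as np


def one_sided_slopes(base=10.0, linscale=1.0):
    """Slopes of the pre-3.3 symlog form just inside and just outside linthresh.

    inside:  d/dx [x * linscale_adj] = linscale_adj
    outside: d/dx [linthresh * (linscale_adj + log_base(x / linthresh))]
             evaluated at x = linthresh = 1 / ln(base)
    Neither slope depends on linthresh.
    """
    linscale_adj = linscale / (1.0 - 1.0 / base)
    return linscale_adj, 1.0 / np.log(base)


def matching_linscale(base):
    """The linscale for which the two one-sided slopes agree."""
    return (1.0 - 1.0 / base) / np.log(base)


print(one_sided_slopes(base=10.0))   # ~(1.111, 0.434): a kink at linthresh
print(one_sided_slopes(base=np.e))   # ~(1.582, 1.0): also a kink
print(matching_linscale(10.0))       # ~0.391
```

So with `linscale=1` this form is continuous in value at `linthresh` but not in slope; the slopes only agree for one specific `linscale` per base.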

dstansby · 2020-02-03T19:39:27Z

Ah nice - thoughts on using the approach in that paper (open access here) instead of (what I am interpreting as) the current approach?

I think I'm pro using the published one, because:

It's a published thing to link to
It has a simple analytic form
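For comparison, a sketch of the transform from the linked paper, assuming its form is `y = sign(x) * log10(1 + |x / C|)` with the suggested constant `C = 1 / ln(10)`. There is no separate linear piece: near zero it behaves like `x / (C ln 10)` (slope 1 for that `C`) on both sides, and for large `|x|` like `log10(|x|)` plus a constant. The function names are hypothetical.

```python
import numpy as np


def bisymlog(x, C=1.0 / np.log(10)):
    """Bi-symmetric log transform (assumed form): sign(x) * log10(1 + |x / C|)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log10(1.0 + np.abs(x / C))


def bisymlog_inverse(y, C=1.0 / np.log(10)):
    y = np.asarray(y, dtype=float)
    return np.sign(y) * C * (10.0 ** np.abs(y) - 1.0)


# One-sided finite differences at 0: both are ~1, so the slope is continuous there.
h = 1e-6
right = (bisymlog(h) - bisymlog(0.0)) / h
left = (bisymlog(0.0) - bisymlog(-h)) / h
print(right, left)

# The inverse round-trips across many decades and both signs.
x = np.array([-1e4, -3.0, 0.0, 0.5, 1e4])
print(np.allclose(bisymlog_inverse(bisymlog(x)), x))  # True
```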

jklymak · 2020-02-03T19:55:52Z

Well, I prefer yours because it's simpler and does what we say it should. But I'm open to using either, so long as it's explained clearly...

jklymak · 2020-02-04T05:05:32Z

@dstansby we discussed this on the call; something like this could/should go in v3.3. For 3.2, we need to add a base kwarg, and deprecate np.e as the base so the default is consistent with our docs and what most folks have probably been assuming. I'll open a PR for that right now.

anntzer · 2020-02-04T06:48:45Z

I wonder whether we really need to support a symlog norm and whether we could consider deprecating it instead. We could also consider moving it to a separate package, similar to mpl-probscale.
Especially now that we have FuncScale, it is "relatively" easy for the user to reimplement it if they really need it; conversely, it is a very non-standard scale that we should probably not be encouraging people to use. Note that the linked paper (which is also cited by the Matlab Central implementation of symlog) actually cites matplotlib as its first example for symlog, so it's a bit circular for matplotlib to refer back to it to justify the symlog scale :-) (even though I agree that ensuring continuity in the gradient seems a reasonable criterion). Also, that paper has been cited a grand total of 19 times in 7 years, which is not nothing but hardly a lot for a paper proposing something as fundamental as a new plot scaling function (the number of log-scale or power-law plots published every year is certainly orders of magnitude more than that).
If you really have "log-scaled" data with both signs, you can e.g. plot the absolute value in log scale, using e.g. different colors for the positive and negative sides. See e.g. the log(Gamma) plot by Wolfram (certainly a reasonable reference point...) at http://functions.wolfram.com/GammaBetaErf/LogGamma/introductions/Gammas/ShowAll.html, which effectively plots log(abs(Gamma)) and separately the imaginary part.
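A minimal sketch of the alternative described above, with made-up data: plot `|y|` on an ordinary log axis and let colour carry the sign, so no symlog scale is needed.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

x = np.linspace(0.1, 10, 500)
y = np.sinh(x) * np.cos(3 * x)  # made-up data with both signs spanning decades

pos = y > 0
neg = y < 0

fig, ax = plt.subplots()
# NaNs break each line where the sign flips, giving one line per sign.
ax.plot(x, np.where(pos, y, np.nan), color="C0", label="y > 0")
ax.plot(x, np.where(neg, -y, np.nan), color="C1", label="y < 0 (plotted as |y|)")
ax.set_yscale("log")
ax.set_ylabel("|y|")
ax.legend()
```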

dstansby · 2020-02-04T09:37:05Z

@dstansby we discussed this on the call; something like this could/should go in v3.3. For 3.2, we need to add a base kwarg, and deprecate np.e as the base so the default is consistent with our docs and what most folks have probably been assuming. I'll open a PR for that right now.

Sure; regardless of cosmetic changes, I still think the maths is throwing out incorrect numbers: #16391 (comment). If we're not going to overhaul it in 3.2, I think we should put a big warning on it that the function is untested and suspected to be wrong.

jklymak · 2020-02-04T14:48:15Z

Judging by the tests, though, I don't think the original authors thought it was spitting out the wrong maths. Unfortunately they failed to write down what they thought the correct maths are.

That all said, overall I agree with @anntzer. This seems to be something we invented, versus something the scientific community is asking for. If it could be made its own package that would be fabulous.

But if we do persist in providing it in core, it has to be well-defined and documented. At the very least, the cited paper does that.

greglucas · 2020-02-04T16:18:25Z

I think that everyone brings up really good points about the use of this function and that it is very subjective as to what different people would want out of it (or whether they should be using it at all).
@anntzer, I agree with your point about plotting the magnitude in log space with two different colorbars for +/-. That is exactly what I want out of this function :) This is a convenient wrapper around that to combine with a diverging colormap. It is purely a convenience for me and used to make a subjective figure that gets across the idea that we have positive and negative data spanning orders of magnitude. I'm not opposed to deprecating the function, but I also think it is quite convenient to use.

My specific use-case... I have a vector field that spans orders of magnitude and then I dot that into some other vector path (integral of v dot dx) to get a scalar field that has positive and negative values (spanning orders of magnitude) that depend on the direction of your integration path (dx). For the vector field, I write my own symlog vector scaling function that preserves angles, and I somewhat arbitrarily use a form close to what the referenced paper proposes.

import numpy as np


def scale_vectors(x, y, scale=np.log10):
    """Scale vectors while preserving the angle.

    Parameters
    ----------
    x : float or ndarray
        Cartesian x coordinate(s).
    y : float or ndarray
        Cartesian y coordinate(s).
    scale : callable, optional
        Function applied to ``1 + magnitude`` (default: ``np.log10``).
    """
    mag = np.sqrt(x**2 + y**2)
    angle = np.arctan2(y, x)
    new_mag = scale(1 + mag)  # only the length changes; the direction is kept
    newx = new_mag * np.cos(angle)
    newy = new_mag * np.sin(angle)
    return newx, newy

(Note that I don't think the linked paper's transform will work with vector quantities, even though they say it is bisymmetric, because applying it to each component changes the angles; but I didn't read it that closely.)
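The point about angles can be made concrete. Assume a scalar transform `sign(v) * log10(1 + |v|)` (close to the paper's form, with `C = 1`): applying it to each Cartesian component separately rotates the vector, whereas scaling only the magnitude, as `scale_vectors` does, keeps the direction. The function names are hypothetical.

```python
import numpy as np


def symlog_scalar(v):
    return np.sign(v) * np.log10(1 + np.abs(v))


def symlog_components(x, y):
    """Apply the scalar transform to each component separately."""
    return symlog_scalar(x), symlog_scalar(y)


def symlog_magnitude(x, y):
    """Transform only the magnitude, keeping the direction."""
    mag = np.hypot(x, y)
    angle = np.arctan2(y, x)
    new_mag = np.log10(1 + mag)
    return new_mag * np.cos(angle), new_mag * np.sin(angle)


x, y = 1000.0, 10.0
cx, cy = symlog_components(x, y)
mx, my = symlog_magnitude(x, y)
angle_orig = np.degrees(np.arctan2(y, x))
angle_comp = np.degrees(np.arctan2(cy, cx))
angle_mag = np.degrees(np.arctan2(my, mx))
print(angle_orig, angle_comp, angle_mag)  # ~0.57, ~19.1, ~0.57 degrees
```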

anntzer · 2020-02-04T16:23:03Z

Given that you already wrote your custom normalization (which is certainly the "responsible" thing to do), I think what Matplotlib should do is really just make sure that you can easily use it as a scale/norm, rather than provide its own symlog?

greglucas · 2020-02-04T16:26:27Z

Yes, I agree, and your previous point about using my own FuncNorm would certainly be the way to go for that. This is more for exploratory data analysis/ease of use. I'm really not opposed to getting rid of it from MPL, but since it is there and convenient, I currently use it.
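A minimal sketch of that approach using `matplotlib.colors.FuncNorm` (available in newer matplotlib releases; it did not exist yet when this thread was written) together with the `'function'` axis scale. The `sign(x) * log10(1 + |x|)` transform and the data are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors


def forward(x):
    # User-chosen symmetric log: sign(x) * log10(1 + |x|)
    return np.sign(x) * np.log10(1 + np.abs(x))


def inverse(y):
    return np.sign(y) * (10 ** np.abs(y) - 1)


norm = mcolors.FuncNorm((forward, inverse), vmin=-1e5, vmax=1e5)

data = np.outer(np.linspace(-1, 1, 50), np.logspace(0, 5, 50))  # made-up data
fig, (ax1, ax2) = plt.subplots(1, 2)
pc = ax1.pcolormesh(data, norm=norm, cmap="RdBu_r")
fig.colorbar(pc, ax=ax1)  # colorbar labels stay in data units

# The same pair of functions can drive an ordinary axis scale.
ax2.plot(data[:, -1], np.arange(50))
ax2.set_xscale("function", functions=(forward, inverse))
```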

jklymak · 2020-02-04T16:44:10Z

Back in the olden days of Matlab, we had one Norm and we liked it, and we just plotted our data using that linear norm, i.e. you transform the array first and then pcolor it. Surely that's good enough for exploratory data analysis:

X = myData()  
Y = mytransform(X)
pcolormesh(Y)
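A runnable version of that pattern, with `myData()` replaced by made-up random data and `mytransform` replaced by `np.arcsinh` as one possible symmetric transform (an assumption; any monotonic function would do).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Made-up data: both signs, magnitudes spanning about five decades.
X = rng.standard_normal((20, 30)) * 10.0 ** rng.uniform(0, 5, (20, 30))
Y = np.arcsinh(X)  # transform first ...

fig, ax = plt.subplots()
pc = ax.pcolormesh(Y, cmap="RdBu_r")  # ... then plot with the default linear Normalize
cb = fig.colorbar(pc, ax=ax)
cb.set_label("arcsinh(X)")  # the colorbar shows transformed units, not X
```

The catch is that the colorbar ticks are in transformed units, so the reader has to invert the labels by hand.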

greglucas · 2020-02-04T16:50:58Z

I don't disagree... ¯\_(ツ)_/¯

The nicety in mpl comes when you add colorbar() to your simple example: it will show -5 to 5, and I'll have to convert that scale in my head or apply the inverse transform to the label strings myself (rather than having mpl automatically label -10^5 to 10^5 for me).

Again, I'm not opposed to deprecation, and I think most people are in agreement that this is a pretty niche norm.

jklymak · 2020-02-04T17:16:25Z

Fully agreed, the point of using a Norm is that the colorbar gives you proper numbers. But that means that each norm should have an associated scale, or the advantage is pretty much nullified.

jklymak · 2020-02-06T01:57:15Z

@anntzer, do you have more details on the proposed unification of scales and norms? I think this should be done ASAP so that most norms have a scale associated with them that the colorbar can just use. Will your proposed factory give us that?

dstansby · 2020-02-06T09:58:23Z

In the meantime, have we decided on deprecating and removing SymLogNorm? I am happy to do this, with extensive documentation on alternatives.

jklymak · 2020-02-06T15:50:29Z

Maybe a quick poll, to get the main developers to vote?

👍 keep SymLogNorm and symlogscale
👎 deprecate both

anntzer · 2020-02-06T17:32:37Z

My patch makes it possible to derive norms from scales (in which case the scale can know about the parent norm). Obviously you can still construct independent norms (e.g. BoundaryNorm), so not all norms will have an associated scale.

I think this should be done ASAP

Well, #14916 is not exactly recent :)

Tillsten · 2020-02-06T22:47:00Z

If you break symlog scale you break a lot of code I and a lot of other people in time-resolved spectroscopy use, please don't do that. Matplotlib is generally very careful about breaking code, so I don't see why this is not the case for scales.

As the original author of the (in retrospect faulty) SymLogNorm, I don't care too much about it anymore. I fully agree with the deficiencies found, but my usage at the time of coding did not depend on exactly reproducing the absolute data from a colormap. Instead, the relative amplitude of the values is important, something which in some cases is better reproduced by a symlog scale than by a linear scale. Note that the data have both positive and negative signals and vary a lot in amplitude. Hence I never cared about the base, since the final differences in the map were not that visible once the scale is normalized. Again, in retrospect this was wrong.

rayosborn · 2020-02-08T23:50:42Z

I have only just noticed this PR and haven't had time to understand all the issues involved, but I would like to strongly urge (aka beg) that SymLogNorm is not deprecated. A new x-ray scattering method called ΔPDF generates both positive and negative probability maps, which are ideally viewed in symmetric (often log) plots, e.g., see Krogstad, M. J. et al. Nat Mater 19, 63–68 (2020). We use MPL to plot with symmetric limits that are automatically enforced in NeXpy by choosing a divergent color map. If this is a niche, I think that it will be a growing one, since we are already collaborating with a number of research groups to produce ΔPDF data on a routine basis.

jklymak · 2020-02-09T00:11:51Z

@rayosborn as discussed on gitter, would arcsinh suit your needs? Not saying we are going that way, but it's another option.

greglucas · 2020-02-09T03:59:39Z

I actually like the arcsinh suggestion, since it is a "simple" and explainable transform (although my guess is that most people don't know what arcsinh is off the top of their head; I had to look it up myself: arcsinh(z) = ln(z + sqrt(1 + z^2))).

One other option I haven't seen mentioned yet would be to remove the linear region completely and use a threshold instead. This would be very simple to explain: log (of whatever base you want) everywhere, with abs(x) < threshold mapped to the midpoint of the scale. This would be similar to the current log scale, where you specify a vmin and everything less than that gets clipped to the minimum value.
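A quick numerical check of that identity, and of why arcsinh behaves like a symmetric log: it is odd, close to linear near zero, and close to `ln(2|z|)` for large `|z|`, with no piecewise threshold. The sample values are arbitrary.

```python
import numpy as np

z = np.array([0.0, 0.01, 0.5, 3.0, 1e3, 1e6])

# The identity quoted above: arcsinh(z) = ln(z + sqrt(1 + z**2)).
explicit = np.log(z + np.sqrt(1.0 + z**2))
print(np.allclose(np.arcsinh(z), explicit))  # True

# arcsinh is odd, so negative values mirror positive ones without special casing.
print(np.allclose(np.arcsinh(-z), -np.arcsinh(z)))  # True

# Near zero it is ~linear; for large |z| it is ~ln(2|z|), i.e. logarithmic.
print(np.arcsinh(0.01), 0.01)
print(np.arcsinh(1e6), np.log(2e6))
```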

Edit: I just implemented this and realized it works fine for colorbar normalization, but not for x/y plots with symlog scales due to the clipping creating hard cutoffs rather than smoother transitions.
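A minimal sketch of that threshold variant as a plain function, assuming symmetric limits `[-vmax, vmax]`; the name `threshold_symlog_norm` and its parameters are hypothetical.

```python
import numpy as np


def threshold_symlog_norm(x, threshold, vmax, base=10.0):
    """Log_base everywhere, with |x| < threshold sent to the midpoint 0.5.

    Maps [-vmax, vmax] onto [0, 1].
    """
    x = np.asarray(x, dtype=float)
    decades = np.log(vmax / threshold) / np.log(base)  # log span on each side
    # Clamp magnitudes up to the threshold so everything inside maps to 0 decades.
    mag = np.log(np.maximum(np.abs(x), threshold) / threshold) / np.log(base)
    return 0.5 + 0.5 * np.sign(x) * mag / decades


print(threshold_symlog_norm([-1000, -10, -0.5, 0, 0.5, 10, 1000], threshold=1, vmax=1000))
# ~[0, 0.333, 0.5, 0.5, 0.5, 0.667, 1]
```

Everything inside `±threshold` collapses onto a single value: harmless for colours, but it flattens a line plot, which matches the hard cutoff noted in the edit.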

rayosborn · 2020-02-10T00:08:42Z

I'm not too bothered about what method is used to generate the symmetric "log" plots, since we don't analyze the images themselves. They are used to guide our interpretation and to present the data in talks and publications. If we fit any models, it is to the actual data, not the plotted representation.

jklymak · 2020-02-10T01:33:51Z

My concern is reproducibility. I’ve hand digitized a good number of Figures from papers where the data was no longer available, and if the scale was not clearly defined in the paper it would lead to errors.

jklymak · 2021-01-05T17:57:51Z

I think this is superseded by #16457? I'll close, but feel free to re-open @dstansby if you think there is more to do.

dstansby added this to the v3.2.0 milestone Feb 2, 2020

Re-write sym-log-norm

73ae1d3

dstansby force-pushed the symlog-overhaul branch from 9b81d13 to 73ae1d3 February 2, 2020 20:43

dstansby added 3 commits February 3, 2020 10:18

Allow specifying base

943a848

Disallow vmin!=vmax

60f0d69

Update tests

6fdd2e0

dstansby commented Feb 3, 2020


dstansby added 2 commits February 3, 2020 13:38

Add base docs

cc1fa93

Add API change

8904654

dstansby added the topic: color/color & colormaps label Feb 3, 2020

dstansby added 2 commits February 3, 2020 13:56

Check a different base

3fb1dcf

Fix color test

82052d8

jklymak reviewed Feb 3, 2020


dstansby added the status: work in progress label Feb 3, 2020

jklymak mentioned this pull request Feb 4, 2020

FIX: add base kwarg to symlognor #16404

Merged


jklymak mentioned this pull request Feb 10, 2020

Build lognorm/symlognorm from corresponding scales. #16457

Merged


tacaswell modified the milestones: v3.2.0, v3.3.0 Feb 10, 2020

greglucas mentioned this pull request Mar 3, 2020

Changing the symlog function to use arcsinh(x/2). #16639

Closed


QuLogic added the status: needs comment/discussion needs consensus on next step label Apr 15, 2020

QuLogic modified the milestones: v3.3.0, v3.4.0 May 5, 2020

dstansby mentioned this pull request Jun 26, 2020

SymLogNorm and SymLogScale give inconsistent results.... #16376

Closed

jklymak marked this pull request as draft July 23, 2020 01:50

jklymak closed this Jan 5, 2021

dstansby deleted the symlog-overhaul branch January 5, 2021 19:18
