Scatter autoscaling still has issues with log scaling and zero values #16552

Closed
moloney opened this issue Feb 18, 2020 · 7 comments · Fixed by #18642
@moloney

moloney commented Feb 18, 2020

I installed the 3.2.0rc3 package, and I can confirm that this fixes almost all the issues with autoscaling and scatter. However, if you have points with 0 for the y-axis and enable log scaling there is still an issue with the scatter plot auto scaling (that doesn't show up with the standard plot function).

import itertools

import numpy as np
from matplotlib import pyplot as plt


x_vals = [4.38462e-06,
          5.54929e-06,
          7.02332e-06,
          8.88889e-06,
          1.125e-05,
          1.42383e-05,
          1.80203e-05,
          2.2807e-05,
          2.88651e-05,
          3.65324e-05,
          4.62363e-05,
          5.85178e-05,
          7.40616e-05,
          9.37342e-05,
          0.000118632,
          ]

y_vals = [0.0,
          0.10000000000000002,
          0.182,
          0.332,
          0.604,
          1.1,
          2.0,
          3.64,
          6.64,
          12.100000000000001,
          22.0,
          39.60000000000001,
          71.3,
          ]

pts = np.array(list(itertools.product(x_vals, y_vals)))

fig = plt.figure("Scatter plot")
ax = fig.gca()
ax.set_xscale('log')
ax.set_yscale('log')
ax.scatter(pts[:,0], pts[:,1]) # This only shows four rows of points

fig = plt.figure("Regular plot")
ax = fig.gca()
ax.set_xscale('log')
ax.set_yscale('log')
ax.plot(pts[:,0], pts[:,1], marker="o", ls="") # This works

plt.show()

Originally posted by @moloney in #6015 (comment)

@tacaswell tacaswell added this to the v3.2.1 milestone Feb 18, 2020
@JadinLuong

Hey, I'm new to matplotlib and I am willing to contribute. Would this be a good first issue? If so what are some first steps in tackling this issue?

@timhoffm
Member

timhoffm commented Mar 6, 2020

@JadinLuong yes, this is suitable as a first issue.

First steps:

  • Run the above code example and verify that plot() yields a reasonable result while scatter() does not. Also check what each function does with the 0 values, and whether that is reasonable.
  • Dig into the functions and find out why they behave differently.
  • Write a test for scatter() that checks the desired scaling behavior and thus currently fails.
  • Adapt scatter() so that it behaves like plot() with respect to scaling and now passes the above test.
  • Create a PR with your code.

If you have technical trouble or want to learn the development workflow and conventions for Matplotlib, check the developers guide. Since this is quite lengthy, you may also ask if you have a particular question.

vincentt117 added a commit to CSCD01/matplotlib-team28 that referenced this issue Mar 10, 2020
vincentt117 added a commit to CSCD01/matplotlib-team28 that referenced this issue Mar 10, 2020
@tacaswell tacaswell modified the milestones: v3.2.1, v3.2.2 Mar 16, 2020
@QuLogic
Member

QuLogic commented May 9, 2020

The problem here is that plot adds a Line2D, whereas scatter adds a PathCollection. In the former case, the Axes.dataLim is updated in Axes.add_line using Line2D.get_path(). In the latter case, the Axes.dataLim is updated in Axes.add_collection using Collection.get_datalim().

Updating from the Path means every single point is checked, while updating from the collection's data limits means only two points are checked: the (minimum x, minimum y) and (maximum x, maximum y) corners. Both approaches give the same data limits, but on a log scale the y=0 point is ignored when finding the smallest positive value, so they produce a different minposy. With the collection there are only those two points, and since the minimum is 0, minposy is actually set to the maximum y.

On log scales, any view limit <= 0 is set to minposy. For plot, that's just the next value up (0.1), but for scatter, it's the maximum (71.3). You only see something in the end because the nonsingular check sees that vmin==vmax and picks 'nice' values around it.
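The difference described above can be reproduced without matplotlib. The following is a minimal pure-Python sketch, not matplotlib's actual code: `min_positive` and `clamp_log_limits` are hypothetical helpers that only mimic the behavior described in this comment, and the y values are the ones from the report with floating-point noise rounded off.

```python
import math


def min_positive(values):
    """Return the smallest strictly positive value, or inf if there is none."""
    positives = [v for v in values if v > 0]
    return min(positives) if positives else math.inf


def clamp_log_limits(vmin, vmax, minpos):
    """Hypothetical helper: replace any limit <= 0 with minpos (log-scale rule)."""
    if vmin <= 0:
        vmin = minpos
    if vmax <= 0:
        vmax = minpos
    return vmin, vmax


y_vals = [0.0, 0.1, 0.182, 0.332, 0.604, 1.1, 2.0,
          3.64, 6.64, 12.1, 22.0, 39.6, 71.3]
data_min, data_max = min(y_vals), max(y_vals)

# plot(): the Path is used, so every point contributes to minposy.
minpos_plot = min_positive(y_vals)

# scatter(): only the two bounding-box corners (min and max) are seen.
minpos_scatter = min_positive([data_min, data_max])

plot_limits = clamp_log_limits(data_min, data_max, minpos_plot)
scatter_limits = clamp_log_limits(data_min, data_max, minpos_scatter)

print(plot_limits)     # (0.1, 71.3)
print(scatter_limits)  # (71.3, 71.3)
```

In the scatter case the lower and upper limits collapse to the same value, which is the vmin==vmax situation that the nonsingular check then has to expand.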

To fix this, there would need to be some change in add_collection to actually set the right minpos? values. For scatter, it could grab the actual path and update dataLim using it, but I don't think that would work for all collections, and it would be a bit of a hack.

@QuLogic
Member

QuLogic commented May 9, 2020

Collection.get_datalim returns a Bbox, so we do have somewhere to store minpos? without complicated API changes. However, Axes.add_collection converts that into a Path to update the dataLim, throwing away any of that.

So this would need some coordination to say that Collection.get_datalim should fill in minpos? and that Bbox.update_from_data_xy (or probably a new function) should merge that information.
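One way to picture the coordination suggested here is a bounding box that carries its minimum positive values along and merges them when boxes are combined. This is only a sketch under stated assumptions: `ToyBbox`, `from_points`, and `merged` are hypothetical names for illustration and are not matplotlib's Bbox API.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyBbox:
    """Hypothetical stand-in for a bounding box; not matplotlib's Bbox."""
    lo: tuple = (math.inf, math.inf)        # (min x, min y)
    hi: tuple = (-math.inf, -math.inf)      # (max x, max y)
    minpos: tuple = (math.inf, math.inf)    # (min positive x, min positive y)

    @classmethod
    def from_points(cls, points):
        """Build a box from all points, recording the minimum positive values."""
        xs, ys = zip(*points)

        def minpos(vals):
            return min((v for v in vals if v > 0), default=math.inf)

        return cls((min(xs), min(ys)), (max(xs), max(ys)),
                   (minpos(xs), minpos(ys)))

    def merged(self, other):
        """Combine two boxes, keeping the smaller minpos in each direction."""
        return ToyBbox(
            tuple(map(min, self.lo, other.lo)),
            tuple(map(max, self.hi, other.hi)),
            tuple(map(min, self.minpos, other.minpos)),
        )


# The collection computes its own box, including minpos, from its offsets.
collection_box = ToyBbox.from_points([(1e-5, 0.0), (2e-5, 0.1), (3e-5, 71.3)])

# The axes merges that box instead of re-deriving limits from two corners.
axes_box = ToyBbox().merged(collection_box)
print(axes_box.minpos)  # (1e-05, 0.1)
```

Because the merge keeps minpos, the y=0 point no longer causes the smallest positive y to be lost.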

This is perhaps not difficult, but might require some care and thorough testing, so I'm not too sure it'll be fixed for 3.2.2.

@QuLogic
Member

QuLogic commented Oct 3, 2020

Surprisingly, this works and does not break any tests:

diff --git a/lib/matplotlib/collections.py b/lib/matplotlib/collections.py
index 51c6c50a03..e785ceb462 100644
--- a/lib/matplotlib/collections.py
+++ b/lib/matplotlib/collections.py
@@ -290,9 +290,7 @@ class Collection(artist.Artist, cm.ScalarMappable):
                 # note A-B means A B^{-1}
                 offsets = np.ma.masked_invalid(offsets)
                 if not offsets.mask.all():
-                    points = np.row_stack((offsets.min(axis=0),
-                                           offsets.max(axis=0)))
-                    return transforms.Bbox(points)
+                    return offsets
         return transforms.Bbox.null()
 
     def get_window_extent(self, renderer):

It just happens to work because Axes.update_datalim uses the Bbox.update_from_data_xy that I mentioned, and Axes.add_collection passes the above result directly to it. However, it's not really a great patch, as Collection.get_datalim then sometimes returns an array of points and sometimes a bbox. It also doesn't handle the other cases in Collection.get_datalim.

If we added a second method that returned the points, while Collection.get_datalim returned the bbox from it, then we could have add_collection call this new method and update the Axes dataLim with the full data.
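The two-method idea can be sketched in plain Python. Everything here is hypothetical: `ToyCollection`, `ToyAxes`, and the method name `get_limit_points` are made up for illustration and do not reflect what matplotlib or #18642 actually implement.

```python
import math


class ToyCollection:
    """Hypothetical collection holding (x, y) offsets."""

    def __init__(self, offsets):
        self._offsets = list(offsets)

    def get_limit_points(self):
        """Hypothetical second method: every valid point, not just the corners."""
        return [(x, y) for x, y in self._offsets
                if not (math.isnan(x) or math.isnan(y))]

    def get_datalim(self):
        """Keep returning a box (here a pair of corners), derived from the points."""
        pts = self.get_limit_points()
        if not pts:
            return None
        xs, ys = zip(*pts)
        return (min(xs), min(ys)), (max(xs), max(ys))


class ToyAxes:
    """Hypothetical axes that updates its limits from the full point set."""

    def __init__(self):
        self.points_seen = []

    def add_collection(self, coll):
        # Use the full points so the minimum positive value is not lost.
        self.points_seen.extend(coll.get_limit_points())

    def minposy(self):
        return min((y for _, y in self.points_seen if y > 0), default=math.inf)


coll = ToyCollection([(1.0, 0.0), (2.0, 0.1), (3.0, 71.3), (4.0, math.nan)])
ax = ToyAxes()
ax.add_collection(coll)

print(coll.get_datalim())  # ((1.0, 0.0), (3.0, 71.3))
print(ax.minposy())        # 0.1
```

With this split, get_datalim keeps a single return type, while the axes sees enough data to find the correct minimum positive value.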

QuLogic added a commit to QuLogic/matplotlib that referenced this issue Oct 3, 2020
QuLogic added a commit to QuLogic/matplotlib that referenced this issue Oct 7, 2020
QuLogic added a commit to QuLogic/matplotlib that referenced this issue Oct 9, 2020
QuLogic added a commit to QuLogic/matplotlib that referenced this issue Oct 16, 2020
@petor-traffs

Hello!! Is this issue still open? I'd like to try it!

@dopplershift
Contributor

Looks like #18642 has been opened to try to fix this one. Might not be a good starting point.
