Off-axes scatter() points unnecessarily saved to PDF when coloured #2488


Open
mspacek opened this issue Oct 2, 2013 · 10 comments
Labels
backend: pdf, Difficulty: Medium, Good first issue, keep, Performance

Comments

@mspacek
Contributor

mspacek commented Oct 2, 2013

Scatter plotting a bunch of points while specifying the colour of each point, then changing the axes limits so that none of the points are visible, and then saving the result to a PDF produces a file just as big as if all the points were visible within their default axes limits. This doesn't seem to happen if the colour argument isn't passed to scatter(). I haven't tried it, but specifying other per-point attributes, such as size, might also trigger the problem. I also haven't tried any of the other vector backends, but they may be affected as well.

This came out of #2423.

Example code:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.random(20000)
y = np.random.random(20000)
c = np.random.random(20000)

plt.figure()
plt.scatter(x, y)
plt.savefig('scatter.pdf')
plt.xlim(2, 3)  # move axes away for empty plot
plt.savefig('scatter_empty.pdf')
# file sizes in bytes:
# scatter.pdf:       324187
# scatter_empty.pdf:   6617

plt.figure()
plt.scatter(x, y, c=c)
plt.savefig('scatter_color.pdf')
plt.xlim(2, 3)  # move axes away for empty plot
plt.savefig('scatter_color_empty.pdf')
# file sizes in bytes:
# scatter_color.pdf:       410722
# scatter_color_empty.pdf: 413541
@tacaswell
Member

Closed as I believe that #2423 fixed this problem.

@mspacek
Contributor Author

mspacek commented Jan 18, 2014

I was under the impression from @mdboom that scatter() had a very different code path, and fixing this for scatter would be much more difficult than for plot():

#2423 (comment)

Has scatter been tested? I should pull from git to test it out again, but I'm not quite feeling up for that right now...

@tacaswell
Member

@mspacek Yeah, you are right, closed this erroneously.

@tacaswell tacaswell reopened this Jan 20, 2014
@mdboom
Member

mdboom commented Jan 21, 2014

Yes. Scatter is so much more flexible: each item can have its own transform, and the only way to determine (in the general case) whether a patch is off the axes is to actually transform all of its points anyway, which probably doesn't result in terribly large savings (though I suppose one could save the stroking time). If you pre-determine that all of the transformations are scale/translation only, without rotation/skew, one could simply transform the bounding box of the patch to determine whether it's outside of the image, and this would probably be fast enough to be worth the effort.
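A minimal sketch of that shortcut, assuming each marker is a Matplotlib `Path` with an `Affine2D` transform and that `clip_bbox` is the clip rectangle in display coordinates. The helpers `is_scale_translate_only` and `marker_visible` are hypothetical names, not Matplotlib API. When the matrix has no off-diagonal (rotation/skew) terms, only the two corners of the path's bounding box are transformed; otherwise it falls back to transforming every vertex via `Path.get_extents(transform)`.

```python
from matplotlib.path import Path
from matplotlib.transforms import Affine2D, Bbox


def is_scale_translate_only(transform):
    """True if the affine matrix has no rotation/skew (off-diagonal) terms."""
    m = transform.get_matrix()
    return m[0, 1] == 0 and m[1, 0] == 0


def marker_visible(path, transform, clip_bbox):
    """Hypothetical test: may this transformed marker overlap clip_bbox?"""
    if is_scale_translate_only(transform):
        # Cheap case: an axis-aligned box stays axis-aligned under
        # scale + translation, so transforming its two corners is enough.
        corners = transform.transform(path.get_extents().get_points())
        box = Bbox.from_extents(*corners.min(axis=0), *corners.max(axis=0))
    else:
        # General case: transform every vertex of the path.
        box = path.get_extents(transform)
    return box.overlaps(clip_bbox)


clip = Bbox.from_extents(0, 0, 100, 100)  # e.g. the axes area in pixels
circle = Path.unit_circle()
print(marker_visible(circle, Affine2D().scale(5).translate(50, 50), clip))   # True
print(marker_visible(circle, Affine2D().scale(5).translate(500, 50), clip))  # False
```

This ignores the stroke: a real check would pad the box by half the line width so that the edge of a marker just outside the clip rectangle is not dropped.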

@petehuang
Contributor

Sharing my results on 1.5.3 using TkAgg:

01/06/2017  11:31 PM           324,044 scatter.pdf
01/06/2017  11:31 PM           454,639 scatter_color.pdf
01/06/2017  11:31 PM           457,371 scatter_color_empty.pdf
01/06/2017  11:31 PM           326,419 scatter_empty.pdf

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Oct 3, 2017
@github-actions

github-actions bot commented Mar 3, 2023

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive label Mar 3, 2023
@github-actions github-actions bot closed this as not planned Apr 3, 2023
@mspacek
Contributor Author

mspacek commented May 2, 2023

This issue remains in matplotlib 3.7.1, Python 3.8, Qt5Agg backend:

import matplotlib.pyplot as plt
import numpy as np
x = np.random.random(20000)
y = np.random.random(20000)
c = np.random.random(20000)

plt.figure()
plt.scatter(x, y)
plt.savefig('scatter.pdf')
plt.xlim(2, 3) # move axes away for empty plot
plt.savefig('scatter_empty.pdf')
# file sizes in bytes:
# scatter.pdf:       327682
# scatter_empty.pdf:   6313
plt.figure()
plt.scatter(x, y, c=c)
plt.savefig('scatter_color.pdf')
plt.xlim(2, 3) # move axes away for empty plot
plt.savefig('scatter_color_empty.pdf')
# file sizes in bytes:
# scatter_color.pdf:       582991
# scatter_color_empty.pdf: 583963  <--- should be much smaller

@mspacek
Contributor Author

mspacek commented May 2, 2023

@tacaswell this should probably (unfortunately) be re-opened :)

@tacaswell tacaswell added the Good first issue, Difficulty: Medium, and Performance labels and removed the New feature label May 2, 2023
@tacaswell
Member

I'm going to re-open this and label it as "good first issue" but with medium difficulty. It is a good first issue in that there are no API design choices to be made and there are two clear metrics to look at (the file size goes down in the special case, and the run time does not go up (too much) in the general case). It is medium difficulty because this will likely require understanding the draw code in both collections.py and the pdf generation code.

I think Mike's description in #2488 (comment) is still accurate. We do not know until the very (very) end if a given marker will be clipped or not.

Concretely I see two places we might want to do this:

  • in the draw method in collections.py that scatter goes through. We may have to pre-emptively compute the full transform stack to do it, but we could filter the patches there before passing them off to the renderer's draw_path_collection method. The pro of doing it here is that all backends will get a speed-up (the best performance increase is to not do work you do not have to!), but the con is a bunch of extra complexity in the Collection.draw method and possibly extra run time.
  • in draw_path_collection in the pdf backend. At some point we will have the fully computed path for each marker, and right before we write it out to the pdf stream we can make a choice to emit it or drop it on the floor. The pro of this is that it is much less likely to impose a computational cost we are not already paying, but the con is that it only helps the pdf (and maybe eps/ps, as they are all coupled) backends. The SVG backend may benefit from a similar bit of logic, and the Agg backend might as well.
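The two options above can be sketched side by side. This is an illustration under stated assumptions, not Matplotlib's implementation: `visible_offset_mask` and `emit_visible` are hypothetical helpers, each marker is approximated by an axis-aligned square (half-width `half_sizes_px`) or a fixed `Bbox` around its offset, and stroke width is ignored. The first culls in bulk with NumPy, as a `Collection.draw`-level filter might; the second decides marker by marker just before output, as a backend loop might, and assumes each marker is written as a translation relative to the previously written one, so skipped markers must not advance the last position.

```python
import matplotlib
import numpy as np
from matplotlib.figure import Figure
from matplotlib.transforms import Bbox


def visible_offset_mask(offsets_px, half_sizes_px, clip_bbox):
    """Option 1 (Collection.draw level): cull all markers at once.

    offsets_px: (N, 2) marker centres in display (pixel) coordinates.
    half_sizes_px: scalar or (N,) approximate half-width of each marker.
    Returns a boolean mask; True means the marker's square extent may
    overlap clip_bbox, so it still has to be drawn.
    """
    offsets_px = np.asarray(offsets_px, dtype=float)
    r = np.broadcast_to(np.asarray(half_sizes_px, dtype=float), len(offsets_px))
    x, y = offsets_px[:, 0], offsets_px[:, 1]
    return ((x + r >= clip_bbox.xmin) & (x - r <= clip_bbox.xmax)
            & (y + r >= clip_bbox.ymin) & (y - r <= clip_bbox.ymax))


def emit_visible(offsets_px, marker_bbox, clip_bbox):
    """Option 2 (backend level): decide per marker right before output.

    marker_bbox is the marker's extent in pixels relative to its offset.
    Assumes markers are written as translations relative to the last
    *written* marker, so skipped markers must not move the last position.
    Returns a list of (dx, dy, index) records for the markers kept.
    """
    records = []
    lastx, lasty = 0.0, 0.0
    for i, (xo, yo) in enumerate(offsets_px):
        if not marker_bbox.translated(xo, yo).overlaps(clip_bbox):
            continue  # drop it on the floor: nothing is written
        records.append((xo - lastx, yo - lasty, i))
        lastx, lasty = xo, yo
    return records


# Off-axes case from this issue: data in [0, 1), x view limits moved to (2, 3).
rng = np.random.default_rng(0)
x, y = rng.random((2, 20_000))
fig = Figure()
ax = fig.add_subplot()
ax.set_xlim(2, 3)
offsets_px = ax.transData.transform(np.column_stack([x, y]))
# Default scatter size is s = markersize**2 points^2, i.e. a marker about
# markersize points across; convert its half-width to pixels.
half_px = matplotlib.rcParams["lines.markersize"] / 2 * fig.dpi / 72
mask = visible_offset_mask(offsets_px, half_px, ax.bbox)
print(mask.sum())  # 0: every marker lies outside the axes
```

In a real `Collection.draw` filter the same mask would also have to be applied to every per-item property (face and edge colours, sizes, line widths, and so on), which is part of the extra complexity mentioned above; the backend-level variant sidesteps that because it sees one fully resolved marker at a time.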

Without actually implementing both of them I do not have a good sense of which is the better approach. The exact work:

  1. investigate both approaches and verify that they are tractable
  2. if no clear winner yet, implement both
  3. benchmark both: check that they make the files smaller in the edge case, and measure the impact on run time in the general case
  4. add tests
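For step 3, a small harness along these lines could record both metrics for the four cases in the original report. It is a sketch that assumes only public API (`Figure`, `Axes.scatter`, `Figure.savefig` to an in-memory buffer); `pdf_size_and_time` is a hypothetical helper name, and a single timing is noisy, so a real benchmark would repeat each measurement (for example with `timeit`).

```python
import io
import time

import numpy as np
from matplotlib.figure import Figure


def pdf_size_and_time(colored, off_axes, n=20_000, seed=0):
    """Return (PDF size in bytes, savefig wall time in seconds) for one case."""
    rng = np.random.default_rng(seed)
    x, y, c = rng.random((3, n))
    fig = Figure()
    ax = fig.add_subplot()
    ax.scatter(x, y, c=c if colored else None)
    if off_axes:
        ax.set_xlim(2, 3)  # move the view away from the data
    buf = io.BytesIO()
    start = time.perf_counter()
    fig.savefig(buf, format="pdf")
    elapsed = time.perf_counter() - start
    return len(buf.getvalue()), elapsed


for colored in (False, True):
    for off_axes in (False, True):
        size, secs = pdf_size_and_time(colored, off_axes)
        print(f"colored={colored!s:5} off_axes={off_axes!s:5} "
              f"{size:>8} bytes  {secs:.3f} s")
```

Comparing the colored/off-axes row before and after a change covers the file-size metric, while the colored/visible row tracks the run-time cost in the general case.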

@tacaswell tacaswell reopened this May 2, 2023
@github-actions github-actions bot removed the status: inactive label May 3, 2023

github-actions bot commented May 3, 2024

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive label May 3, 2024
@QuLogic QuLogic added the keep label and removed the status: inactive label May 11, 2024
Projects
None yet
Development

No branches or pull requests

5 participants