Excessive rugplot memory usage #4695
Comments
Here is the relevant code from seaborn that interfaces with matplotlib:
So, it is creating 1e7 ax{h,v}lines, each with its own transform stack and Keeping in mind that ax{h,v}lines was originally designed only for doing a On Tue, Jul 14, 2015 at 12:08 PM, Coby Viner [email protected]
Wait, the official word from matplotlib is "we can't draw more than 10 lines on a plot?" OK.
Where did I say that? I merely noted that the ax{v,h}line function was On Tue, Jul 14, 2015 at 12:58 PM, Michael Waskom [email protected]
Note, we do have a way to draw a large number of lines, it is called a On Tue, Jul 14, 2015 at 1:02 PM, Benjamin Root [email protected] wrote:
By similar logic, rugplot is "not intended to draw 1e7 lines".
I would not disagree with that notion, but as the maintainer of seaborn and user of the matplotlib API, it would make sense to utilize its API in an efficient manner, especially when the mechanisms for doing so are available. Productive feedback to matplotlib would be how we can help make those mechanisms more easily apparent (documentation and/or API changes). Now, it may very well be that there are some fixable inefficiencies in those methods that will help improve performance a bit, but I can guarantee you that the biggest gains would come from creating a LineCollection object with 1e7 elements in it, rather than 1e7 Line2D objects. Remember that matplotlib's drawing stack requires sorting the list of artists that it has to handle, along with looping over each artist, calling its draw() at every refresh. Meanwhile, many of the Collection objects can bypass a lot of the typical inefficiencies by assuming certain commonalities.
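A minimal sketch of the collection-based approach described in the comment above, for illustration only. The helper name rug_collection and its height parameter are hypothetical (they are not part of seaborn or matplotlib), and the sketch assumes a vertical rug along the x-axis. It builds one LineCollection holding every tick, using the x-axis transform so that x is in data coordinates and y is in axes coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the original report
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def rug_collection(ax, values, height=0.05, **kwargs):
    """Hypothetical helper: draw a rug as ONE LineCollection artist.

    This is an illustrative sketch, not seaborn's implementation.
    """
    values = np.asarray(values, dtype=float)

    # One segment per value, shape (N, 2, 2): two endpoints, each (x, y).
    segments = np.empty((values.size, 2, 2))
    segments[:, :, 0] = values[:, None]  # same x for both endpoints
    segments[:, 0, 1] = 0.0              # bottom of the tick (axes fraction)
    segments[:, 1, 1] = height           # top of the tick (axes fraction)

    # x in data coordinates, y in axes coordinates.
    trans = ax.get_xaxis_transform()
    lc = LineCollection(segments, transform=trans, **kwargs)

    # autolim=False: y is not in data units, so the caller sets the limits.
    ax.add_collection(lc, autolim=False)
    return lc


# Usage: 100,000 ticks, but only a single artist on the axes.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=100_000)
fig, ax = plt.subplots()
lc = rug_collection(ax, data, height=0.05, linewidths=0.5)
ax.set_xlim(0.0, 1.0)
```

Because the axes hold one artist instead of one Line2D per data point, the per-artist sorting and draw() loop mentioned above runs once for the whole rug. Actual memory and speed gains would need to be measured.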
It looks like it would be straightforward to add
right, I was thinking along those lines, but we would need to be careful of On Tue, Jul 14, 2015 at 3:46 PM, Eric Firing [email protected]
Yes, I think we could do that. I imagine the right way to do it might be with an underlying refactoring, so that LineCollection and Line2D would inherit from a base class. Then the return would be guaranteed to be an instance of that base, but might be further specialized depending on the inputs. This sort of refactoring could help us unify Collections with their related single types. I haven't thought it through; but the combination of close similarity and subtle differences in API between Collection types and the single types has always been problematic.
I thought the grid lines were drawn as part of the axis
Ticks and ticklabels are another performance nightmare; I think they are largely responsible for the abysmal performance in making a 10x10 array of subplots.
The Tick situation is already tracked at #6664. For rugplot's case I think |
Seaborn's sns.distplot uses very large amounts of memory when attempting to plot a large dataset. An attempt to plot a dataset composed of a single vector of 19 591 561 elements (of type float64, all values between 0 and 1, with no NA elements) failed after exceeding 250 GB of memory (using the Agg backend, on Python 2.7.8, with the latest version of all packages obtained from pip). This only occurs when rug=True.

This issue was previously reported on Seaborn's issue tracker, and the package author suggested that this is in fact a matplotlib issue.
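As a rough back-of-the-envelope check on the figures reported above, the arithmetic below compares the size of the raw data with the memory at which the plot failed. It assumes "250 GB" means 250e9 bytes and that essentially all of that memory went to the rug lines; both are assumptions, not measurements.

```python
# Figures taken from the report; the GB convention is an assumption.
n_values = 19_591_561
bytes_per_float64 = 8
reported_memory_bytes = 250e9  # assumed decimal gigabytes

# Size of the raw float64 vector itself.
raw_data_bytes = n_values * bytes_per_float64

# Lower bound on memory per rug line, since the run failed at this point.
min_bytes_per_line = reported_memory_bytes / n_values

print(f"raw data: {raw_data_bytes / 1e6:.0f} MB")
print(f"memory per rug line: at least {min_bytes_per_line / 1e3:.1f} kB")
```

Under those assumptions the data itself is about 157 MB, while each rug line accounts for at least roughly 12.8 kB.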