WIP, MAINT: Improve import time #14083
Conversation
Sorry guys, I really thought this was going to my own branch....
```diff
@@ -25,7 +25,6 @@
 import sys
 import operator
 import warnings
-import textwrap
```
Unless the cost is really high, I think textwrap aids readability too much to be worth removing. If the cost really is that high, then can we vendor the dedent function?
Running benchmarks on my laptop is making it really hot. At first glance, it looks like removing textwrap shaves about 8 ms off an ~80 ms import.
I really disagree with the claim that `textwrap.dedent` improves readability.
`dedent` seems better suited to cases where a user inputs unpredictable text, whereas here we can just press shift-tab or add a new line to align things the way we want.
Looks like the cost stems from an `re` import. It turns out matplotlib vendors `dedent` for performance reasons; I'd be inclined to follow their lead.
`re` is a really hard import to get rid of.
I tried as part of this. `re` is used a little everywhere to parse strings (often it really isn't necessary), but I did find a few precompiled parsers that were too complex to remove.
Ok, I included your thoughts above in the first post. I'm not the one doing regular maintenance, so you all can make the final decision taking everything into account. Either fix is likely going to be OK.
Force-pushed 5b4c734 to eed6620
We don't mind force pushes in PRs, they are encouraged. Clean PRs are nice.
Yeah, this was just at the draft stage; I was looking more for comments and discussion on whether this was a useful direction than for real feedback on code and style.
Force-pushed ab5a957 to e1ed1a3
Force-pushed e1ed1a3 to 79345aa
Might break it up into smaller bits, though; that makes it easier to review.
The obvious place to start is
but that had been shot down due to the fear of breaking backward compatibility. As a whole, you get a 30% improvement. As brought up in earlier attempts, isolated fixes don't always yield results, since they just move the import cost elsewhere. Hopefully the combined improvements can help address those original concerns.
If this is decided to be a worthwhile endeavour, I'm happy to break things up.
Force-pushed a73898a to 3e621f9
Force-pushed a09f560 to 2e13de7
```cython
from cpython.pycapsule cimport PyCapsule_New
from fastrlock.rlock cimport create_fastrlock as Lock
```
This is a new package dependency, import time improvement probably not enough reason to add it?
I agree that we shouldn't add this as a dependency. I just needed a quick way to show that the `threading` module from Python was likely bloated.
An alternative is to do:
`from _thread import acquire_lock as Lock`
but there you are importing from private CPython internals, which is likely incompatible with other implementations of Python.
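For reference, in current CPython the low-level lock constructor in `_thread` is spelled `allocate_lock`. A minimal sketch of the private-module approach, subject to the portability caveat above:

```python
import _thread

# Grab a low-level lock straight from CPython's private _thread module,
# skipping the heavier threading import. Caveat: _thread is a CPython
# implementation detail, which is exactly the concern raised in this thread.
Lock = _thread.allocate_lock

lock = Lock()
acquired = lock.acquire(False)  # non-blocking acquire succeeds on a fresh lock
lock.release()
```

These locks support the same `acquire`/`release`/context-manager protocol as `threading.Lock`, since `threading.Lock` is itself a thin wrapper over this type.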
Yeah, that's not a good idea. Can you think of a robust alternative? If not, better to just revert.
I think we can use:

```cython
from cpython cimport pythread

self.lock = pythread.PyThread_allocate_lock()
```

https://github.com/scoder/fastrlock/blob/master/fastrlock/rlock.pyx#L21
I can see removing
is a problem. What if we just import
I think most of these have been addressed in individual PRs. Thanks!
This was not meant to be posted here with no note; I think I missed which fork I was targeting. I wanted to create a summary of the changes for discussion.
Going to repost my original comments here for discussion. xref: #11457 (comment)
As stated, I think blaming `numpy.testing` alone is probably misleading. There are quite a few other modules that take time.

- `platform` is one of the "innocent"-looking ones. The only location where it is truly necessary is to create the variable `IS_PYPY`. That said, it seems to import all of `threading`, which accounts for a large chunk of the import time. If detecting PyPy were hard and inconvenient that might justify the cost, but in fact it is as easy as `"PyPy" in sys.version`.
- `threading` is also imported for a `Lock` in the random module, which is a Cython module. I made a proof of concept where I imported `fastrlock` (ok, I know that we don't need reentrant locks, but I wanted something that looked API compatible). The random modules are already Cython, and thus this is a small micro-optimization that doesn't add any cost. We can use the same locking primitives that fastrlock uses to speed up the whole module.
- `secrets` is quite slow to import. Since we only need it for a few random bits, we can import what we need ourselves. https://github.com/numpy/numpy/pull/14083/files#diff-89944aec176617da993c6de4d9529348R251
- `unittest` does take quite a bit of time. The warning in the comments indicates that it is likely only used by packages, which can find the relevant documentation to test numpy as needed.
- `pickle` is quite slow too, and a strange one, since many other libraries will import it. From what I found by removing it, almost everywhere it was used except for `numpy.core._methods` it is associated with a warning. Not sure if the omission of the warning there was an honest mistake or an API decision, but avoiding the import of pickle can speed things up for those who don't need it. https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py#L241
- `textwrap` is not a trivial import. It is only used in two locations, where it makes the code "indented according to a certain style". It doesn't seem worthwhile to use it to sanitize static strings. https://github.com/numpy/numpy/blob/master/numpy/core/overrides.py#L166 https://github.com/numpy/numpy/blob/master/numpy/ma/core.py#L2448
- `decimal` takes time, but there isn't much you can do other than ruining code style.
- `pathlib` is also not a trivial import. In fact, the one location where it is imported directly has a comment stating that it should not be the preferred method. https://github.com/numpy/numpy/blob/master/numpy/compat/py3k.py#L105
- `shutil` can be lazily imported in the two locations where it is used.

While some of these might be micro-optimizations, I think many are well justified to help improve numpy's import time in the near term, especially since these optimizations hit code that is considered soft-deprecated or convenient for compatibility reasons that no longer exist (i.e. Python 2 has been dropped).
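Two of the cheapest suggestions above can be sketched in a few lines. This is illustrative only: `get_terminal_width` is a hypothetical helper, not NumPy code, standing in for any function whose module-level import is deferred.

```python
import sys

# 1. Detect PyPy without importing platform (which pulls in much more).
#    sys.version on PyPy contains the string "PyPy"; on CPython it does not.
IS_PYPY = "PyPy" in sys.version

# 2. Defer a slow import (here shutil) into the function that needs it,
#    so callers who never use it pay nothing at module import time.
def get_terminal_width(default=80):
    import shutil  # resolved on first call, cached in sys.modules after that
    return shutil.get_terminal_size((default, 24)).columns
```

The lazy-import trick costs one dictionary lookup per call after the first, which is negligible next to the one-time savings at `import numpy`.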
.I'm going to refer to numpy's own benchmarks regarding the amount of time it takes for numpy to import:
https://pv.github.io/numpy-bench/#bench_import.Import.time_numpy
According to those, on some machine somewhere, it now takes about 900 ms to import numpy, up from about 700 ms in earlier versions. While the benchmarks running on my laptop are not that slow, it also isn't the cheapest laptop around.
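Numbers like these can be reproduced locally with CPython's built-in import profiler (`-X importtime`, available since Python 3.7). The example below profiles `decimal`, one of the stdlib modules discussed above; substitute `numpy` to profile the whole package:

```shell
# Print a per-module import-time breakdown to stderr; each line shows the
# module's self time and cumulative time in microseconds.
python -X importtime -c "import decimal" 2>&1 | tail -n 5
```

Sorting the output by the cumulative column quickly surfaces the heaviest transitive imports, which is how culprits like `re` and `threading` show up.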
Here is a PR made to my own branch showing the changes in case anybody wanted to glance at them: #14083
And an image of the improvements as I made the changes.

I'm happy to cleanup the changes as required.
Other relevant discussion here:
https://news.ycombinator.com/item?id=16978932, linking to a post where Python core devs are worried about import time as well.
I get that in many applications the caller will likely import `threading`, `pathlib`, or `platform` themselves, and thus their application will not see the overall benefit of removing all three imports. But they might see a slight improvement from removing one of the many dependencies that aren't critical, or, at the very least, they might have a nice way of lazily importing them themselves.

Summary
The commits here are kinda my only reference. I really want to keep them until a consensus is reached on what should be kept and what shouldn't be:
deindent