
WIP, MAINT: Improve import time #14083


Closed · wants to merge 11 commits

Conversation

@hmaarrfk (Contributor) commented Jul 23, 2019

This was not meant to be posted here without a note. I think I missed which fork I was targeting. I wanted to create a summary of the changes for discussion.

Going to repost my original comments here for discussion. xref: #11457 (comment)

As stated, I think blaming numpy.testing alone is probably misleading.

There are quite a few other modules that take time.

Some of the "innocent"-looking ones are:

  • platform. The only location where it is truly necessary is to create the variable IS_PYPY. That said, it seems to import all of threading, which accounts for a large chunk of the import time. If detecting PyPy were hard and inconvenient, the import might be justified, but in fact it is as easy as "PyPy" in sys.version (see the sketch after this list).
  • threading is also imported for a Lock in the random module, which is a Cython module. I made a proof of concept where I imported fastrlock (OK, I know we don't need reentrant locks, but I wanted something that looks API-compatible without much work). The random modules are already Cython, so this is a small micro-optimization that doesn't add any cost. We can use the same locking primitives that fastrlock uses to speed up the whole module.
  • secrets is quite slow to import. Since we only need it for a few random bits, we can import what we need ourselves (also sketched after this list). https://github.com/numpy/numpy/pull/14083/files#diff-89944aec176617da993c6de4d9529348R251
  • As stated by others, unittest does take quite a bit of time. The warning in the comments indicates that it is likely only used by downstream packages, which can find the relevant documentation to test numpy as needed.
  • pickle is quite slow too. It is a strange one, since many other libraries import it themselves, but from what I found by removing it, almost everywhere it is used (except for numpy.core._methods) it is associated with a warning. I'm not sure whether the omission of the warning there was an honest mistake or an API decision, but avoiding the import of pickle can speed things up for those who don't need it.
    https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py#L241
  • textwrap is not a trivial import. It is only used in two locations, where it makes the code "indented according to a certain style". It doesn't seem worthwhile to use it to sanitize static strings. https://github.com/numpy/numpy/blob/master/numpy/core/overrides.py#L166 https://github.com/numpy/numpy/blob/master/numpy/ma/core.py#L2448
  • decimal takes time, but there isn't much we can do about it without ruining code style.
  • pathlib is also not a trivial import. In fact, the one location where it is imported directly has a comment stating that it should not be the preferred method. https://github.com/numpy/numpy/blob/master/numpy/compat/py3k.py#L105
  • shutil can be lazily imported in the two locations where it is used (also sketched below).
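
A minimal sketch of three of the substitutions mentioned above (illustration only, not the PR's exact code; the helper names _randbits and _copy_file are hypothetical):

    import os
    import sys

    # 1. Detect PyPy without importing the heavy `platform` module.
    IS_PYPY = "PyPy" in sys.version

    # 2. Get k random bits without importing `secrets`; this mirrors what
    #    secrets.randbits ultimately does via os.urandom on CPython.
    def _randbits(k):
        numbytes = (k + 7) // 8
        return int.from_bytes(os.urandom(numbytes), "big") >> (numbytes * 8 - k)

    # 3. Defer importing `shutil` until the code path that needs it actually runs.
    def _copy_file(src, dst):
        import shutil  # paid only on first use, then cached in sys.modules
        shutil.copy2(src, dst)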

While some of these might be micro-optimizations, I think many are well justified to help improve numpy's import time in the near term, especially since these optimizations hit code that is considered soft-deprecated or kept for compatibility reasons that no longer exist (i.e. Python 2 has been dropped).

I'm going to refer to numpy's own benchmarks regarding the amount of time it takes for numpy to import:
https://pv.github.io/numpy-bench/#bench_import.Import.time_numpy

According to those, on some computer somewhere, it might take about 900 ms to import numpy, up from 700 ms in recent versions. The benchmarks on my laptop are not that slow, but it also isn't the cheapest laptop around.
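
For anyone who wants to reproduce this locally: CPython 3.7+ can print a per-module breakdown with python -X importtime -c "import numpy", and a crude end-to-end number can be taken in a subprocess so the import isn't already cached. A rough sketch (not part of the PR):

    import subprocess
    import sys

    # Time `import numpy` in a fresh interpreter so nothing is cached yet.
    code = ("import time; t0 = time.perf_counter(); import numpy; "
            "print((time.perf_counter() - t0) * 1e3)")
    ms = float(subprocess.check_output([sys.executable, "-c", code]).decode())
    print("numpy import time: %.1f ms" % ms)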

Here is a PR made to my own branch showing the changes in case anybody wanted to glance at them: #14083

And an image of the improvements as I made the changes.
[image: import-time improvements as the changes were made]

I'm happy to clean up the changes as required.
Other relevant discussion here:
https://news.ycombinator.com/item?id=16978932 links to a post where Python core devs are worried about import time as well.

I get that in many applications the caller will likely import threading, pathlib, or platform themselves, and thus their application will not see the overall benefit of removing all three imports. But they might see the slight improvement from removing one of the many dependencies that aren't critical, or, at the very least, they might have a nice way of lazily importing them themselves.

Summary

The commits here are kind of my only reference. I really want to keep them until a consensus is reached on what should be kept and what shouldn't:

| Library | Performance cost | Proposed "fix" | eric-wieser |
| --- | --- | --- | --- |
| total (master) | 100 ms on a good laptop, 1 s according to official benchmarks | | |
| python core libraries | 10% | | |
| platform #14098 | 6% | avoid (1 line vendor) | |
| np.testing #14097 | 9.5% | remove | |
| textwrap #14095 | 3.4% | avoid | vendor dedent |
| pickle | 2% | avoid (deprecated) | |
| pathlib #14093 | 2% | avoid (remove after 3.6) | |
| datetime | 1% | don't know | |
| secrets #14094 | 3% | 1 line vendor | |
| threading (imported by platform) #14098 | 2% | avoid | |

@hmaarrfk (Contributor Author):

Sorry guys, I really thought this was going to my own branch....

    @@ -25,7 +25,6 @@
     import sys
     import operator
     import warnings
    -import textwrap
Member:

Unless the cost is really high, I think textwrap aids readability too much to be worth removing. If the cost really is that high, then can we vendor the dedent function?

Contributor Author:

Running benchmarks on my laptop is making it really hot. At first glance, it looks like removing textwrap shaves about 8 ms out of 80 ms.

I really disagree with the claim that textwrap.dedent improves readability.

dedent seems more like something to use where a user inputs unpredictable text, whereas here we can just press shift-tab or add a newline to align things the way we want.

Member:

Looks like the cost stems from the re import. It turns out matplotlib vendors dedent for performance reasons; I'd be inclined to follow their lead.
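
If vendoring is the route taken, a regex-free dedent is only a handful of lines. A rough sketch (an assumption of what a vendored helper could look like; this is not matplotlib's or textwrap's actual code, and it only handles the simple strings numpy passes in):

    def _dedent(text):
        # Strip the longest common leading whitespace from all non-blank lines.
        lines = text.split("\n")
        margin = None
        for line in lines:
            stripped = line.lstrip()
            if not stripped:
                continue  # ignore blank lines when computing the margin
            indent = line[:len(line) - len(stripped)]
            if margin is None:
                margin = indent
            else:
                # Shrink the margin to the common prefix with this line's indent.
                i = 0
                while i < min(len(margin), len(indent)) and margin[i] == indent[i]:
                    i += 1
                margin = margin[:i]
        if not margin:
            return text
        return "\n".join(line[len(margin):] if line.startswith(margin) else line
                         for line in lines)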

Contributor Author:

re is a really hard import to get rid of.

I tried as part of this. re is used a little everywhere to parse strings (often it really isn't necessary), but I did find a few precompiled patterns that were too complex to get rid of.

Contributor Author:

OK, I included your thoughts above in the first post. I'm not the one doing regular maintenance, so you all can make the final decision taking everything into account. Either fix is likely going to be OK.

@hmaarrfk force-pushed the improve_import_time branch 2 times, most recently from 5b4c734 to eed6620 on July 23, 2019 03:14
@charris (Member) commented Jul 23, 2019

We don't mind force pushes in PRs; they are encouraged. Clean PRs are nice.

@hmaarrfk (Contributor Author):

> We don't mind force pushes in PRs; they are encouraged. Clean PRs are nice.

Yeah, this was just at the draft stage; I was looking more for comments and discussion on whether this was a useful direction than for real feedback on code and style.

@hmaarrfk force-pushed the improve_import_time branch 2 times, most recently from ab5a957 to e1ed1a3 on July 23, 2019 03:36
@hmaarrfk force-pushed the improve_import_time branch from e1ed1a3 to 79345aa on July 23, 2019 03:40
@charris (Member) commented Jul 23, 2019

Might break it up into smaller bits, though; it makes it easier to review.

@hmaarrfk (Contributor Author):

> Might break it up into smaller bits, though; it makes it easier to review.

The obvious place to start is

    from .testing import Tester

but that had been shot down due to the fear of breaking backward compatibility.

As a whole, you get a 30% improvement. As brought up in earlier attempts, isolated fixes don't always yield results, since they just move the import cost elsewhere.
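
As an illustration of why isolated fixes can just move the cost around: once any module on the import path pulls in, say, platform, it lands in sys.modules and later imports of it are essentially free, so removing it from one spot only helps if nothing else on the path imports it. A quick demonstration (not part of the PR):

    import sys
    import time

    t0 = time.perf_counter()
    import platform            # first import: pays the full cost
    t1 = time.perf_counter()
    import platform            # already in sys.modules: essentially free
    t2 = time.perf_counter()

    print("first import:  %.2f ms" % ((t1 - t0) * 1e3))
    print("cached import: %.2f ms" % ((t2 - t1) * 1e3))
    print("cached in sys.modules:", "platform" in sys.modules)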

Hopefully the improvements can help sway those original concerns.

@hmaarrfk (Contributor Author):

> Hopefully the improvements can help sway those original concerns.

If this is decided to be a worthwhile endeavour, I'm happy to break things up.

@hmaarrfk force-pushed the improve_import_time branch from a73898a to 3e621f9 on July 23, 2019 04:32
@hmaarrfk force-pushed the improve_import_time branch from a09f560 to 2e13de7 on July 23, 2019 04:49
     from cpython.pycapsule cimport PyCapsule_New
    +from fastrlock.rlock cimport create_fastrlock as Lock
@pv (Member) commented Jul 23, 2019:

This is a new package dependency; the import time improvement is probably not enough reason to add it?

Contributor Author:

I agree that we shouldn't add this as a dependency. I just needed a quick way to show that Python's threading module was likely bloated.

An alternative is to do:

    from _thread import allocate_lock as Lock

but there you are importing from a private CPython module, which is likely incompatible with other Python implementations.

Member:

Yeah, that's not a good idea. Can you think of a robust alternative? If not, better to just revert.

Contributor Author:

I think we can use:

    from cpython cimport pythread
    self.lock = pythread.PyThread_allocate_lock()

https://github.com/scoder/fastrlock/blob/master/fastrlock/rlock.pyx#L21

@charris changed the title from "Improve import time" to "WIP, MAINT: Improve import time" on Jul 23, 2019
@charris (Member) commented Jul 25, 2019

I can see removing Tester at some point, maybe even separating the tests into their own tree. For now, I wonder why it is so slow; all the instantiation does is set an internal variable. Maybe

    # Pytest testing
    from numpy._pytesttester import PytestTester
    test = PytestTester(__name__)
    del PytestTester

is a problem. What if we just import numpy._pytesttester and set test = numpy._pytesttester.PytestTester(__name__), or some such?
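
A hedged sketch of one way this could be made lazy (an alternative idea, not something proposed in this PR): Python 3.7+ allows a module-level __getattr__ (PEP 562), so numpy/__init__.py could defer the import until someone actually accesses numpy.test:

    def __getattr__(name):
        # Only pay for _pytesttester when `numpy.test` is first accessed.
        if name == "test":
            from numpy._pytesttester import PytestTester
            return PytestTester(__name__)
        raise AttributeError("module {!r} has no attribute {!r}".format(__name__, name))

A real version would probably cache the result in the module globals so the tester is only constructed once.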

@hmaarrfk (Contributor Author):

I think most of these have been addressed in individual PRs. Thanks!

@hmaarrfk closed this on Sep 21, 2019