WIP, MAINT: Improve import time #14083
Conversation
Sorry guys, I really thought this was going to my own branch....
```diff
@@ -25,7 +25,6 @@
 import sys
 import operator
 import warnings
-import textwrap
```
Unless the cost is really high, I think textwrap aids readability too much to be worth removing. If the cost really is that high, then can we vendor the dedent function?
Running benchmarks on my laptop is making it really hot. At first glance, it looks like removing textwrap shaves about 8 ms off an ~80 ms import.
I really disagree with the claim that `textwrap.dedent` improves readability.
`dedent` seems better suited to cases where a user inputs unpredictable text, whereas here we can just press shift-tab or add a new line to align things the way we want.
Looks like the cost stems from an `re` import. It turns out matplotlib vendors `dedent` for performance reasons; I'd be inclined to follow their lead.
`re` is a really hard import to get rid of.
I tried as part of this. `re` is used a little everywhere to parse strings (often it really isn't necessary), but I did find a few precompiled parsers that were too complex to remove.
Ok, I included your thoughts above in the first post. I'm not the one doing regular maintenance, so you all can make the final decision taking everything into account. Either fix is likely going to be OK.
Force-pushed 5b4c734 to eed6620
We don't mind force pushes in PRs, they are encouraged. Clean PRs are nice.
Yeah, this was just at the draft stage; I was looking more for comments and discussion on whether this was a useful direction than for real feedback on code and style.
Force-pushed ab5a957 to e1ed1a3
Force-pushed e1ed1a3 to 79345aa
Might break it up into smaller bits, though; that makes it easier to review.
The obvious place to start is
but that had been shot down due to the fear of breaking backward compatibility. As a whole, you get a 30% improvement. As brought up in earlier attempts, isolated fixes don't always yield results, since they just move the import cost elsewhere. Hopefully the combined improvements can help address those original concerns.
If this is decided to be a worthwhile endeavour, I'm happy to break things up.
Force-pushed a73898a to 3e621f9
Force-pushed a09f560 to 2e13de7
```cython
from cpython.pycapsule cimport PyCapsule_New
from fastrlock.rlock cimport create_fastrlock as Lock
```
This is a new package dependency, import time improvement probably not enough reason to add it?
I agree that we shouldn't add this as a dependency. I just needed a quick way to show that the `threading` module from Python was likely bloated.
An alternative is to do:
`from _thread import acquire_lock as Lock`
but there you are importing from private CPython internals, which is likely incompatible with other implementations of Python.
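For reference, in current CPython the low-level lock constructor in `_thread` is spelled `allocate_lock`. A minimal sketch of the private-module approach, subject to the portability caveat above:

```python
import _thread

# Grab a low-level lock straight from CPython's private _thread module,
# skipping the heavier threading import. Caveat: _thread is a CPython
# implementation detail, which is exactly the concern raised in this thread.
Lock = _thread.allocate_lock

lock = Lock()
acquired = lock.acquire(False)  # non-blocking acquire succeeds on a fresh lock
lock.release()
```

These locks support the same `acquire`/`release`/context-manager protocol as `threading.Lock`, since `threading.Lock` is itself a thin wrapper over this type.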
Yeah, that's not a good idea. Can you think of a robust alternative? If not, better to just revert.
I think we can use:

```cython
from cpython cimport pythread

self.lock = pythread.PyThread_allocate_lock()
```

https://github.com/scoder/fastrlock/blob/master/fastrlock/rlock.pyx#L21
I can see removing
is a problem. What if we just import
I think most of these have been addressed in individual PRs. Thanks!
This was not meant to be posted here with no note; I think I missed which fork I was targeting. I wanted to create a summary of the changes for discussion.
Going to repost my original comments here for discussion. xref: #11457 (comment)
As stated, I think blaming `numpy.testing` alone is probably misleading. There are quite a few other modules that take time.

- `platform` is one of the "innocent"-looking ones. The only location where it is truly necessary is to create the variable `IS_PYPY`. That said, it seems to import all of `threading`, which accounts for a large chunk of the import time. If detecting PyPy were hard and inconvenient that might justify the cost, but in fact it is as easy as `"PyPy" in sys.version`.
- `threading` is also imported for a `Lock` in the random module, which is a Cython module. I made a proof of concept where I imported `fastrlock` (ok, I know that we don't need reentrant locks, but I wanted something that looked API compatible). The random modules are already Cython, and thus this is a small micro-optimization that doesn't add any cost. We can use the same locking primitives that fastrlock uses to speed up the whole module.
- `secrets` is quite slow to import. Since we only need it for a few random bits, we can import what we need ourselves. https://github.com/numpy/numpy/pull/14083/files#diff-89944aec176617da993c6de4d9529348R251
- `unittest` does take quite a bit of time. The warning in the comments indicates that it is likely only used by packages, which can find the relevant documentation to test numpy as needed.
- `pickle` is quite slow too, and a strange one, since many other libraries will import it. From what I found by removing it, almost everywhere it was used except for `numpy.core._methods` it is associated with a warning. Not sure if the omission of the warning there was an honest mistake or an API decision, but avoiding the import of pickle can speed things up for those who don't need it. https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py#L241
- `textwrap` is not a trivial import. It is only used in two locations, where it makes the code "indented according to a certain style". It doesn't seem worthwhile to use it to sanitize static strings. https://github.com/numpy/numpy/blob/master/numpy/core/overrides.py#L166 https://github.com/numpy/numpy/blob/master/numpy/ma/core.py#L2448
- `decimal` takes time, but there isn't much you can do other than ruining code style.
- `pathlib` is also not a trivial import. In fact, the one location where it is imported directly has a comment stating that it should not be the preferred method. https://github.com/numpy/numpy/blob/master/numpy/compat/py3k.py#L105
- `shutil` can be lazily imported in the two locations where it is used.

While some of these might be micro-optimizations, I think many are well justified to help improve numpy's import time in the near term, especially since these optimizations hit code that is considered soft-deprecated or convenient for compatibility reasons that no longer exist (i.e. Python 2 has been dropped).
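Two of the cheapest suggestions above can be sketched in a few lines. This is illustrative only: `get_terminal_width` is a hypothetical helper, not NumPy code, standing in for any function whose module-level import is deferred.

```python
import sys

# 1. Detect PyPy without importing platform (which pulls in much more).
#    sys.version on PyPy contains the string "PyPy"; on CPython it does not.
IS_PYPY = "PyPy" in sys.version

# 2. Defer a slow import (here shutil) into the function that needs it,
#    so callers who never use it pay nothing at module import time.
def get_terminal_width(default=80):
    import shutil  # resolved on first call, cached in sys.modules after that
    return shutil.get_terminal_size((default, 24)).columns
```

The lazy-import trick costs one dictionary lookup per call after the first, which is negligible next to the one-time savings at `import numpy`.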
.I'm going to refer to numpy's own benchmarks regarding the amount of time it takes for numpy to import:
https://pv.github.io/numpy-bench/#bench_import.Import.time_numpy
According to those, on some machine somewhere, it now takes about 900 ms to import numpy, up from about 700 ms in earlier versions. While the benchmarks running on my laptop are not that slow, it also isn't the cheapest laptop around.
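Numbers like these can be reproduced locally with CPython's built-in import profiler (`-X importtime`, available since Python 3.7). The example below profiles `decimal`, one of the stdlib modules discussed above; substitute `numpy` to profile the whole package:

```shell
# Print a per-module import-time breakdown to stderr; each line shows the
# module's self time and cumulative time in microseconds.
python -X importtime -c "import decimal" 2>&1 | tail -n 5
```

Sorting the output by the cumulative column quickly surfaces the heaviest transitive imports, which is how culprits like `re` and `threading` show up.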
Here is a PR made to my own branch showing the changes in case anybody wanted to glance at them: #14083
And an image of the improvements as I made the changes.

I'm happy to cleanup the changes as required.
Other relevant discussion here:
https://news.ycombinator.com/item?id=16978932, linking to a post where Python core devs are worried about import time as well.
I get that in many applications the caller will likely import `threading`, `pathlib`, or `platform` themselves, and thus their application will not see the overall benefit of removing all three imports. But they might see a slight improvement from removing one of the many dependencies that aren't critical, or, at the very least, they might have a nice way of lazily importing them themselves.

Summary
The commits here are kinda my only reference. I really want to keep them until a consensus is reached on what should be kept and what shouldn't be:
deindent