-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] MNT Include all pxd files in the package #15626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] MNT Include all pxd files in the package #15626
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplicity of this solution better. Let's assume that package_data
is properly tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this solution as well (I have not double-checked that it works as expected).
Can we just recognize the contribution of the author in #14896 in some way? Either by merging part of their validation script under maint_tools in that PR, or by adding them a co-author to this PR (I think adding Signed-off-by:
to the final commit message would do it). They did significant work on this, and didn't receive much timely feedback..
Of course. I opened this PR because I was reviewing his PR and found a simpler method by chance. I don't think the validation script is necessary. It's the role of setuptools to test that package_data works. |
We could have a bit of bash script that checks the content of a generated sdist tarball as a CI job. I also think we should change at least one of our CI jobs to build and install from the sdist tarball and run the tests on the installed scikit-learn in a new empty env using the But this can be done in a later PR. |
There is my reasoning why I didn't opt for this solution: Having a pxd doesn't mean you can use it/cimport it, for example because of https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/_libsvm.pxd#L42 It is not enough to tests that a pxd-file is present in the installation - one must really tests that an extension using it gets compiled/linked/loaded (see Thus, I opted to include only pxd-files which really can be used when installed, and skipped others. However, IMO using |
Can someone remind me why we want the .pxd files in the installed package ? @rth ? |
@ogrisel they are already in the sdist (otherwise pip install could not build sklearn), what this pr does and what could be tested is that they are in the installed package. |
We could include the .c and .h files as well ? |
I assume it's for people to be able to use scikit-learn's private API in their own scripts without having to download the source? Sort of like a I think I like this solution too. Worst case scenario, the installed package contains .pxd that cannot actually be used... which is OK right?
+1. We can add a what's new entry crediting both this PR and the former one. |
That is true, but usually one just would like to write "cimport xxxx" without hunting all dependencies down. numpy.pxd is a little bit different, but at least there is After thinking about it, probably |
Indeed (although not mandatory). I renamed those files. Should be ok since it's private now. Thus they won't be included in the install @realead I added a test to check that all pxd files included in the installation directory can be cimported. It's inspired from your test but a bit simpler because I don't check that all pxd files from the repo are actually in the install. Let me know what you think about it. |
@jeremiedbb looks good to me! I think we can assume setuptools will collect all *.pxd correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of a centralized Cython check. Here are a few cosmetic suggestions but feel free to ignore them ;)
with open('tst.pyx', 'w') as f: | ||
for file in pxd_files: | ||
to_import = file.replace(os.path.sep, '.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nipick: file
-> pxd_file
to avoid shadowing the file
built-in type in the globals namespace.
check=True) | ||
|
||
except subprocess.CalledProcessError: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this except clause? Just a try / finally construct would work fine, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually we can get rid of the try / finally if you remove the os.chdir
calls.
|
||
with tempfile.TemporaryDirectory() as tmpdir: | ||
try: | ||
os.chdir(tmpdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid os.chdir
and instead:
with open(tmpdir.name +os.path.sep + 'tst.pyx', 'w') as f:
...
setup_path = tmpdir.name + os.path.sep + "setup_tst.py"
and then:
subprocess.run(["python", setup_path, "build_ext", "-i"],
check=True, cwd=tmpdir.name)
cdef extern from "math.h": | ||
cdef extern double sqrt(double x) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
|
||
|
||
skl_dir = sys.argv[1] | ||
skl_dir = os.path.abspath(skl_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, let's use a oneliner with with a more explicit variable name:
skl_dir = os.path.abspath(skl_dir) | |
sklearn_dir = os.path.abspath(sys.argv[1]) |
skl_dir = os.path.abspath(skl_dir) | ||
|
||
# find all .pxd files | ||
pxd_files = glob.glob(os.path.join(skl_dir, '**', '*.pxd'), recursive=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way for this kind of helper maintenance script we could use recent feature of Python 3.6+ such as pathlib:
import pathlib
tmpdir = pathlib.Path(tmpdir.name)
setup_path = tmpdir / "setup.py"
sklearn_dir = pathlib.Path(sys.argv[1])
pxd_files = sklearn_dir.glob("**/*.pxd")
print("> Found pxd files:")
for pxd_file in pxd_files:
print(' -', str(pxd_file))
...
Is this ready to be merged now? I can't give a review, but it'd be nice to have in the RC3 (working on it now). |
I addressed all comments from @ogrisel so I guess it should be good to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be run by CI? If it's costly, it can go into the wheel build
It's not costly at all (takes a few seconds) but its goal is to be run on scikit-learn installed from source distribution. it could be interesting to generate the sdist and install from that in one of the ci jobs however. |
Can we merge this, have in the RC, and do the CI later? |
Ok for me |
* include all pxd files in package * _libsvm _liblinear to pxi + add a test * simplify stuff + use pathlib * cln * more cln
* include all pxd files in package * _libsvm _liblinear to pxi + add a test * simplify stuff + use pathlib * cln * more cln
* include all pxd files in package * _libsvm _liblinear to pxi + add a test * simplify stuff + use pathlib * cln * more cln
Fixes #14847
Alternative to #14896
We can specify, in the top level setup, the files that need to be included in the package. (in combination to manifest.in but .pxd are already included in it).
I just tested it and it works:
then
find /path/to/conda/envs/tmp/lib/python3.7/site-packages/sklearn/ -name '*.pxd'
shows all 17 .pxd files as in the sklearn repo.
@rth @ogrisel @jnothman @adrinjalali @realead