MAINT Remove -Wcpp warnings when compiling sklearn.svm._liblinear #25112
Conversation
You found an even better solution. 👍
This LGTM given that compilation passes when "sklearn.svm._liblinear" is added to USE_NEWEST_NUMPY_C_API:
Lines 64 to 70 in cbfb6ab
# XXX: add new extensions to this list when they
# are not using the old NumPy C API (i.e. version 1.7)
# TODO: when Cython>=3.0 is used, make sure all Cython extensions
# use the newest NumPy C API by `#defining` `NPY_NO_DEPRECATED_API` to be
# `NPY_1_7_API_VERSION`, and remove this list.
# See: https://github.com/cython/cython/blob/1777f13461f971d064bd1644b02d92b350e6e7d1/docs/src/userguide/migrating_to_cy30.rst#numpy-c-api  # noqa
USE_NEWEST_NUMPY_C_API = (
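For context, a minimal sketch of how such a list can be used to opt an extension into the newest NumPy C API. The helper name and build logic below are illustrative assumptions, not the actual scikit-learn setup.py:

from setuptools import Extension

USE_NEWEST_NUMPY_C_API = (
    "sklearn.svm._liblinear",
    # ... other extensions that no longer use the old NumPy C API
)

def make_extension(name, sources):
    # Hypothetical helper: extensions listed above get the macro that
    # silences the -Wcpp deprecation warning emitted by the NumPy headers.
    define_macros = []
    if name in USE_NEWEST_NUMPY_C_API:
        define_macros.append(("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION"))
    return Extension(name, sources=sources, define_macros=define_macros)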
LGTM!
Just to make sure my comment #25112 (comment) above is not missed: the current state of this PR introduces an additional memory copy of the training set, which we want to avoid in this wrapper.
I've opened OmarManzoor#1 to propose a resolution.
Suggestions for scikit-learn#25112

* Use separate memory views for float64 and float32 to handle the possible dtypes of X
* Separate the functionality to get the bytes of X in functions for sparse and normal ndarray
I've merged.
…kit-learn into cython_liblinear
The CI failures were caused by an actual error. I had split the code out into two functions, but I think that might be resulting in some kind of memory issue, as the code crashes when running the common tests. I have pushed the latest changes so that all the functionality remains in the train_wrap function.
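For illustration, a rough Cython sketch of keeping the dtype dispatch inline in train_wrap with separate float64/float32 memoryviews. The names are illustrative, not the actual _liblinear.pyx code:

import numpy as np

def train_wrap_sketch(X):
    # X is assumed to be a contiguous ndarray of dtype float64 or float32.
    cdef const double[::1] X_data64
    cdef const float[::1] X_data32
    cdef const char *X_ptr
    cdef bint X_has_type_float64 = X.dtype == np.float64

    # Keep both memoryviews in this scope so the buffer they reference
    # stays alive while X_ptr is in use.
    if X_has_type_float64:
        X_data64 = X.ravel()
        X_ptr = <const char *> &X_data64[0]
    else:
        X_data32 = X.ravel()
        X_ptr = <const char *> &X_data32[0]
    # ... X_ptr can now be passed on to the liblinear helpers ...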
sklearn/svm/_liblinear.pyx
Outdated
cdef parameter *param
cdef problem *problem
cdef model *model
cdef char_const_ptr error_msg
cdef int len_w
cdef bint x_has_type_float64 = X.dtype == np.float64
Even if every variable in Python must be snake-cased, in scikit-learn we do capitalize X and related variables. In this case, we would have:
- cdef bint x_has_type_float64 = X.dtype == np.float64
+ cdef bint X_has_type_float64 = X.dtype == np.float64
This suggestion also applies to similar variables hereinafter.
Have you observed memory leaks or other errors due to memory management? Intuitively, if you are wrapping the logic in functions, the memoryviews are local stack-allocated structs in this context, and the returned pointer points to an invalid region of memory after the function returns.
Actually I am not sure whether it was a memory leak or some other memory-related issue, since all I observed was a Python segmentation fault when I ran test_common.py with pytest.
Thank you for the intuitive explanation.
OK, the "segmentation fault" is a (usually fatal) C error which is captured and transferred back by Python in your case. This error indicates that the process accessed an invalid segment of memory during its execution. If this error were not fatal (due to some handling done by CPython), problems related to memory management (like memory leaks) might happen. I think the remark that I gave in #25112 (comment) above holds in this case: you are returning addresses of local variables (i.e. variables that are temporarily allocated in one of the process's memory segments: the stack) via pointers. If such a pointer is then dereferenced to get access to the value it points to, a segmentation fault is raised because the local variable has been deallocated on return. This does not appear in the inline boilerplate I propose because those local variables are still accessible by pointers, as they exist in the same scope. Is this remark understandable?
Yes thank you very much for the explanation!
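To make the failure mode discussed above concrete, a hypothetical Cython sketch (not code from this PR; the helper name is made up) of the broken pattern versus the inline one:

import numpy as np

cdef const char *get_x_bytes(X):
    # BROKEN: X_data may be the only reference keeping a temporary
    # (e.g. copied) buffer alive; that buffer is released when this
    # function returns, so the caller may dereference freed memory.
    cdef const double[::1] X_data = np.ascontiguousarray(X, dtype=np.float64).ravel()
    return <const char *> &X_data[0]

def train_wrap_inline(X):
    # Safe: the memoryview lives in the same scope as the pointer,
    # so the underlying buffer stays allocated while the pointer is used.
    cdef const double[::1] X_data = np.ascontiguousarray(X, dtype=np.float64).ravel()
    cdef const char *X_ptr = <const char *> &X_data[0]
    # ... X_ptr stays valid here because X_data is still in scope ...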
@ogrisel Could you kindly have a look at this PR again when you get the time? |
This looks good to me. The build log is clean, +1 for merge!
Thanks for the contribution @OmarManzoor!
The pending runs for Circle CI are independent of those changes and also happen in other PRs. I am merging this manually and will take responsibility if this merge creates any problems (this is very unlikely in my opinion, as Linux ARM64 is the only untested configuration and all the other configurations' test suites pass).
Thank you!
…ikit-learn#25112)

* MAINT Remove -Wcpp warnings when compiling sklearn.svm._liblinear
* Convert the required data in X to bytes
* Add NULL check for class_weight_label
* Add NULL check for class_weight
* Add sklearn.svm._liblinear in setup.py
* Use intermediate memoryviews for static reinterpretation of dtypes
* Remove the usage of tobytes()
* Use separate memory views for float64 and float32 to handle the possible dtypes of X
* Separate the functionality to get the bytes of X in functions for sparse and normal ndarray
* Remove the use of functions and implement the functionality directly in train_wrap
* Minor refactor
* Refactor variable names involving X to use capital X
* Define y, class_weight and sample_weight as const memory views
* Use const with X_indices and X_indptr memory view declarations

Co-authored-by: Julien Jerphanion <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
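As a reference for the last two commits above, a small Cython sketch (illustrative signature only, not the actual train_wrap) of declaring read-only inputs as const memoryviews:

def train_wrap_signature_sketch(
    X,
    const double[::1] Y,
    const double[::1] sample_weight,
    const double[::1] class_weight,
    const int[::1] X_indices,
    const int[::1] X_indptr,
):
    # const memoryviews also accept read-only (e.g. memory-mapped) NumPy
    # arrays and document that the wrapper never writes to these inputs.
    cdef Py_ssize_t n_samples = Y.shape[0]
    # ...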
Reference Issues/PRs
Towards #24875
What does this implement/fix? Explain your changes.
Any other comments?