TST Change expected result type np.int64 -> np.int #18089
Thanks @ckastner !
It would be good to run this test in CI, likely by specifying the version of pandas to install so that it is not skipped:
scikit-learn/azure-pipelines.yml
Line 148 in 6e0ae8a
PANDAS_VERSION: 'none'
Thank you for the PR @ckastner !
sklearn/datasets/tests/test_base.py
Outdated
@@ -186,12 +186,12 @@ def test_loader(loader_func, data_shape, target_shape, n_target, has_descr,
 @pytest.mark.parametrize("loader_func, data_dtype, target_dtype", [
-    (load_breast_cancer, np.float64, np.int64),
+    (load_breast_cancer, np.float64, np.int),
np.int is being deprecated in numpy. Does the following work?
-    (load_breast_cancer, np.float64, np.int),
+    (load_breast_cancer, np.float64, np.int_),
Actually, my use of np.int was mistaken anyway. The data-loading function uses plain (builtin) int, not np.int as I originally claimed:
scikit-learn/sklearn/datasets/_base.py
Line 267 in 8f09e33
target[i] = np.asarray(ir[-1], dtype=int)
I propose switching to that (PR updated), so that the data-loading function and the test are consistent. If you'd prefer np.int_, please let me know. Sorry about the confusion.
On a side note: grep -R 'np\.int[^_1368a-z]' sklearn/* gives me ~100 occurrences of np.int in the codebase which will probably need to be updated at some point, along with the other deprecated types.
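To make the grep pattern above easier to read, here is a small sketch of the same regular expression in Python. The helper name `uses_bare_np_int` and the sample strings are made up for illustration; only the pattern itself comes from the comment.

```python
import re

# Same pattern as in the grep command: "np.int" followed by one character
# that is NOT "_", one of the digits 1, 3, 6, 8, or a lowercase letter.
# This excludes np.int_, np.int8/16/32/64, np.intp, np.integer, and so on.
BARE_NP_INT = re.compile(r"np\.int[^_1368a-z]")


def uses_bare_np_int(line: str) -> bool:
    """Return True if the line contains a bare ``np.int`` (hypothetical helper)."""
    return BARE_NP_INT.search(line) is not None


# Bare np.int followed by punctuation is reported.
print(uses_bare_np_int("y = y.astype(np.int)"))     # True
print(uses_bare_np_int("dtype=np.int, copy=False"))  # True

# Sized or otherwise suffixed names are not reported.
print(uses_bare_np_int("y = y.astype(np.int64)"))    # False
print(uses_bare_np_int("idx = np.intp(3)"))          # False
print(uses_bare_np_int("x = np.int_(3)"))            # False
```

One limitation of the pattern: the negated character class needs a character to match, so a bare `np.int` that is the very last thing on a line is not reported.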
int works as well.
As for the (I hope we do not run into 32 bit + pandas issues)
The function being tested uses int, so match that in the test. The test otherwise fails on 32-bit architectures. Closes: scikit-learn#18084
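As a rough illustration of why a hard-coded np.int64 expectation can fail on 32-bit builds, the sketch below mirrors the `dtype=int` call from the loader on made-up data. It assumes only that NumPy is installed; the array contents are hypothetical.

```python
import numpy as np

# Mirrors the loader's use of the builtin int as the dtype (sample data is
# invented for this sketch, it is not taken from the breast cancer dataset).
target = np.asarray([0, 1, 1], dtype=int)

# dtype=int resolves to NumPy's default integer type, i.e. np.int_.
print(target.dtype == np.dtype(np.int_))  # True

# The width of that default integer depends on the platform and build:
# typically 8 bytes on 64-bit Linux, 4 bytes on 32-bit builds.
print(target.dtype.itemsize)

# This comparison therefore only holds where the default integer is 64-bit,
# which is why the test should compare against int rather than np.int64.
print(target.dtype == np.int64)
```

Comparing against `int` (or `np.int_`) in the test keeps the expectation tied to whatever the loader actually produces on the current platform.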
To add a data point, the packages we build for Debian are built on a number of 32-bit architectures (x86, ARM, MIPS), and all builds use pandas. All but very few tests pass, and IIRC the few failing tests have indications other than pandas.
Thanks for the information! Could you still enable pandas in the 32 bit build? It's not currently in the diff.
Sorry about that, I thought that comment wasn't meant for me. Enabled and pushed. Just for my own understanding: I assume this enabling is limited to this particular PR and won't be merged? Because if the intention is to merge, then it should probably also be enabled for the 64-bit build, where it is currently disabled.
The change did not have an effect on CI; the tests involving pandas were skipped again. I assume that deploying this change involves further steps?
I updated this PR to enable pandas.
LGTM
@@ -50,7 +50,8 @@ elif [[ "$DISTRIB" == "ubuntu" ]]; then
 elif [[ "$DISTRIB" == "ubuntu-32" ]]; then
     apt-get update
-    apt-get install -y python3-dev python3-scipy python3-matplotlib libatlas3-base libatlas-base-dev python3-virtualenv
+    apt-get install -y python3-dev python3-scipy python3-matplotlib libatlas3-base libatlas-base-dev python3-virtualenv python3-pandas
We use numpy from apt-get, thus we need to use pandas from there as well.
@@ -1200,7 +1200,7 @@ def test_check_fit_params(indices):
 def test_check_sparse_pandas_sp_format(sp_format):
     # check_array converts pandas dataframe with only sparse arrays into
     # sparse matrix
-    pd = pytest.importorskip("pandas")
+    pd = pytest.importorskip("pandas", minversion="0.25.0")
Sparse support was added in 0.25 and the pandas version installed on 18.04 is 0.22.0.
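The effect of `minversion="0.25.0"` on the 18.04 image can be sketched with a plain version comparison. This is a simplified stand-in, not pytest's actual implementation; the helper names `parse_version` and `meets_minversion` are hypothetical and only handle purely numeric dotted versions.

```python
def parse_version(version: str) -> tuple:
    """Turn a numeric dotted version such as "0.25.0" into (0, 25, 0)."""
    return tuple(int(part) for part in version.split("."))


def meets_minversion(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# pandas 0.22.0 (the version mentioned for 18.04) is below 0.25.0, so a test
# guarded this way would be skipped instead of failing.
print(meets_minversion("0.22.0", "0.25.0"))  # False

# A newer pandas satisfies the requirement and the test would run.
print(meets_minversion("1.0.5", "0.25.0"))   # True

# Versions are compared component by component as integers; comparing the
# raw strings would wrongly rank "0.9.0" above "0.25.0".
print(meets_minversion("0.9.0", "0.25.0"))   # False
```

With the guard in place, the sparse-DataFrame test is skipped on the 32-bit build rather than breaking it, while still running wherever a recent enough pandas is available.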
The error in CI is for OSX; it is a timeout issue and is not related.
Thanks @ckastner and @thomasjpfan !
Co-authored-by: Thomas J. Fan <[email protected]>
The function being tested uses np.int, so match that in the test. The
test otherwise fails on 32-bit architectures.
Closes: #18084