TST Change expected result type np.int64 -> np.int #18089

ckastner · 2020-08-04T16:43:50Z

The function being tested uses np.int, so match that in the test. The
test otherwise fails on 32-bit architectures.
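This is a minimal sketch of the underlying problem. It assumes, as established later in this thread, that the loader builds its target array with dtype=int. NumPy maps int to its default integer type, and that type's width follows the platform. An expectation hard-coded as np.int64 therefore only holds where the default is 64 bits wide. The array contents are made up for illustration.

```python
import numpy as np

# Build a target array the same way the loader does: dtype=int means
# "NumPy's default integer type", whose width depends on the platform.
target = np.asarray([0, 1, 1, 0], dtype=int)

# Brittle: only holds where the default integer happens to be 64 bits wide.
matches_int64 = target.dtype == np.int64

# Portable: compare against whatever dtype=int resolves to on this platform.
matches_default = target.dtype == np.dtype(int)

print(target.dtype.itemsize * 8, "bit default integer")
print("equals np.int64:", matches_int64)
print("equals dtype(int):", matches_default)
```

On a typical 64-bit Linux build both comparisons print True. On a 32-bit build only the second one does.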

Closes: #18084

rth

Thanks @ckastner !

It would be good to run this test in CI, likely by specifying the version of pandas to install so that it is not skipped:

scikit-learn/azure-pipelines.yml

Line 148 in 6e0ae8a

PANDAS_VERSION: 'none'

thomasjpfan

Thank you for the PR @ckastner !

thomasjpfan · 2020-08-05T15:21:27Z

sklearn/datasets/tests/test_base.py

@@ -186,12 +186,12 @@ def test_loader(loader_func, data_shape, target_shape, n_target, has_descr,


 @pytest.mark.parametrize("loader_func, data_dtype, target_dtype", [
-    (load_breast_cancer, np.float64, np.int64),
+    (load_breast_cancer, np.float64, np.int),


np.int is being deprecated in numpy. Does the following work?

Suggested change

(load_breast_cancer, np.float64, np.int),

(load_breast_cancer, np.float64, np.int_),

Actually, my use of np.int was mistaken anyway. The data-loading function uses plain (builtin) int, not np.int as I originally claimed:

scikit-learn/sklearn/datasets/_base.py

Line 267 in 8f09e33

target[i] = np.asarray(ir[-1], dtype=int)

I propose switching to that (PR updated), so that the data-loading function and the test are consistent. If you'd prefer np.int_, please let me know. Sorry about the confusion.

On a side note: grep -R 'np\.int[^_1368a-z]' sklearn/* gives me ~100 occurrences of np.int in the codebase which will probably need to be updated at some point, along with the other deprecated types.
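The grep pattern above can also be reproduced in Python, for example to scan sources from a script. This is a hypothetical helper: find_deprecated and DEPRECATED_NP_INT are not scikit-learn code. Like the grep, it needs one character after np.int, so a bare np.int at the very end of a line is not reported.

```python
import re

# Same pattern as the grep command: "np.int" followed by one character that
# is NOT "_" (np.int_), "1"/"3"/"6"/"8" (np.int16/32/64/8), or a lowercase
# letter (np.intp, np.integer, ...).
DEPRECATED_NP_INT = re.compile(r"np\.int[^_1368a-z]")


def find_deprecated(lines):
    """Return (line_number, line) pairs that use the bare np.int alias."""
    return [(i, line) for i, line in enumerate(lines, start=1)
            if DEPRECATED_NP_INT.search(line)]


sample = [
    "x = np.zeros(3, dtype=np.int)",
    "y = np.zeros(3, dtype=np.int_)",
    "z = np.zeros(3, dtype=np.int64)",
    "w = np.zeros(3, dtype=np.intp)",
    "if isinstance(v, np.integer):",
]
for lineno, line in find_deprecated(sample):
    print(lineno, line)
```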

int works as well.
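For reference, here is a quick check that the two options in this thread are interchangeable. The deprecated np.int alias was just the builtin int, and both int and np.int_ resolve to NumPy's default integer dtype. The dictionary names are illustrative only.

```python
import numpy as np

# The two candidate expectations discussed in the review.
candidates = {"int": int, "np.int_": np.int_}

# Both resolve to the same NumPy dtype (the platform's default integer),
# so either one matches an array built with dtype=int.
resolved = {name: np.dtype(t) for name, t in candidates.items()}
target = np.asarray([0, 1], dtype=int)

for name, dt in resolved.items():
    print(f"{name:8s} -> {dt} (matches target: {target.dtype == dt})")
```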

thomasjpfan · 2020-08-05T15:35:27Z

As for the PANDAS_VERSION in config.yml, removing the line will test on the latest version of pandas.

(I hope we do not run into 32 bit + pandas issues)


ckastner · 2020-08-05T22:59:45Z

> As for the PANDAS_VERSION in config.yml, removing the line will test on the latest version of pandas.
>
> (I hope we do not run into 32 bit + pandas issues)

To add a data point: the packages we build for Debian are built on a number of 32-bit architectures (x86, ARM, MIPS), and all builds use pandas. All but a very few tests pass, and IIRC the few failures point to causes other than pandas.

rth · 2020-08-06T10:09:09Z

Thanks for the information! Could you still enable pandas in the 32-bit build? It's not currently in the diff.

ckastner · 2020-08-06T10:26:21Z

Sorry about that, I thought that comment wasn't meant for me. Enabled and pushed.

Just for my own understanding: I assume this enabling is limited to this particular PR and won't be merged? Because if the intention is to merge, then it should probably also be enabled for the 64-bit build, where it is currently disabled.

ckastner · 2020-08-06T12:22:34Z

The change did not have an effect on CI, the tests involving pandas were skipped again. I assume that deploying this change involves further steps?

thomasjpfan

I updated this PR to enable pandas.

LGTM

thomasjpfan · 2020-08-07T01:39:31Z

build_tools/azure/install.sh

@@ -50,7 +50,8 @@ elif [[ "$DISTRIB" == "ubuntu" ]]; then

 elif [[ "$DISTRIB" == "ubuntu-32" ]]; then
    apt-get update
-    apt-get install -y python3-dev python3-scipy python3-matplotlib libatlas3-base libatlas-base-dev python3-virtualenv
+    apt-get install -y python3-dev python3-scipy python3-matplotlib libatlas3-base libatlas-base-dev python3-virtualenv python3-pandas


We use numpy from apt-get thus we need to use pandas from there as well.

thomasjpfan · 2020-08-07T01:40:58Z

sklearn/utils/tests/test_validation.py

@@ -1200,7 +1200,7 @@ def test_check_fit_params(indices):
 def test_check_sparse_pandas_sp_format(sp_format):
    # check_array converts pandas dataframe with only sparse arrays into
    # sparse matrix
-    pd = pytest.importorskip("pandas")
+    pd = pytest.importorskip("pandas", minversion="0.25.0")


Sparse support was added in pandas 0.25, and the pandas version installed on Ubuntu 18.04 is 0.22.0.
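pytest.importorskip(..., minversion=...) skips the test when the installed module's version is older than the given one. The comparison it gates on can be illustrated with a small helper. parse_release and supports_sparse_frames are hypothetical, not pytest or pandas APIs, and pytest's own version parsing is more thorough. The sketch shows why Ubuntu 18.04's pandas 0.22.0 is skipped while 0.25.0 and later run.

```python
import re

# Minimum pandas version cited in the review for sparse support.
MIN_PANDAS_FOR_SPARSE = (0, 25, 0)


def parse_release(version):
    """Parse the leading 'X[.Y[.Z]]' part of a version string into ints.

    Simplified on purpose: suffixes such as 'rc1' or '+dfsg' are ignored,
    and missing components are padded with 0 ('0.25' -> (0, 25, 0)).
    """
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def supports_sparse_frames(version):
    """True if this pandas version meets the minimum used in the test."""
    return parse_release(version) >= MIN_PANDAS_FOR_SPARSE


for v in ["0.22.0", "0.25.0", "1.1.0"]:
    print(v, "->", "run" if supports_sparse_frames(v) else "skip")
```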

thomasjpfan · 2020-08-07T01:41:51Z

The CI error is on OSX; it is a timeout issue and is unrelated.

rth

Thanks @ckastner and @thomasjpfan !

Co-authored-by: Thomas J. Fan <[email protected]>

github-actions bot added the module:datasets label Aug 4, 2020

rth reviewed Aug 5, 2020


thomasjpfan reviewed Aug 5, 2020


TST Change expected result type np.int64 -> int

5ae58bd

The function being tested uses int, so match that in the test. The test otherwise fails on 32-bit architectures. Closes: scikit-learn#18084

ckastner force-pushed the numpy-int branch from f8666b2 to 5ae58bd Compare August 5, 2020 22:48

Enable pandas in CI

02e4f4f

CI Install pandas

ac9418b

thomasjpfan approved these changes Aug 7, 2020


rth approved these changes Aug 7, 2020


rth merged commit 5af6561 into scikit-learn:master Aug 7, 2020

ckastner deleted the numpy-int branch August 7, 2020 06:55

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

TST Change expected result type np.int64 -> np.int (scikit-learn#18089)

3368d41

Co-authored-by: Thomas J. Fan <[email protected]>

