Describe the bug
Our team used sklearn.linear_model.LinearRegression for multiple linear regression and found that the same data, with identical values element by element, gives different coefficients depending on its dtype. All feature values are integers, stored once as float32 and once as float64. The float64 input gives the expected result, but the float32 input does not.
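One detail worth ruling out is whether the cast itself loses information. A short sketch, assuming only NumPy: float32 has a 24-bit significand, so every integer with magnitude up to 2**24 (16,777,216) is stored exactly. The features in this report (below 10,000) and the targets (within roughly ±50,000) fall far inside that range, which suggests the discrepancy comes from float32 arithmetic during fitting rather than from the stored input values. `survives_float32_roundtrip` is a hypothetical helper written only for this check.

```python
import numpy as np


def survives_float32_roundtrip(values):
    """Return True if every integer is unchanged after a cast to float32 and back."""
    values = np.asarray(values, dtype=np.int64)
    return bool(np.array_equal(values.astype(np.float32).astype(np.int64), values))


# float32 has a 24-bit significand, so every integer with |n| <= 2**24 is exact.
print(survives_float32_roundtrip([1000, 9999, -45995, 2**24]))  # True
print(survives_float32_roundtrip([2**24 + 1]))  # False: rounds to 2**24

# Machine epsilon: the relative gap between 1.0 and the next representable float.
print(np.finfo(np.float32).eps)  # about 1.19e-07
print(np.finfo(np.float64).eps)  # about 2.22e-16
```

The inputs are therefore exact in both dtypes, but each float32 operation during fitting carries roughly nine orders of magnitude more relative rounding error than its float64 counterpart.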
The code below is a simple reproducible example.
Steps/Code to Reproduce
from sklearn.linear_model import LinearRegression
import numpy as np

rng = np.random.default_rng(0)

# Three integer-valued features drawn from [1000, 10000), 10 million rows each
x_1_array = rng.integers(low=1000, high=10000, size=10_000_000)
x_2_array = rng.integers(low=1000, high=10000, size=10_000_000)
x_3_array = rng.integers(low=1000, high=10000, size=10_000_000)

# Exact linear target: y = 3*x1 + x2 - 5*x3 + intercept, intercept drawn from [-3, 3)
y_array = 3 * x_1_array + x_2_array - 5 * x_3_array
intercept_data = rng.integers(low=-3, high=3)
print(f'True intercept {intercept_data}')
y_array += intercept_data

X_train = np.stack([x_1_array, x_2_array, x_3_array], axis=-1)
y_train = y_array

linreg = LinearRegression()

# Fit on float64 features
model = linreg.fit(X_train.astype(np.float64), y_train)
print(f'Float 64 Intercept: {linreg.intercept_}')
print(f'Float 64 Coefficient: {linreg.coef_}')

# Refit the same estimator on identical values stored as float32
model = linreg.fit(X_train.astype(np.float32), y_train)
print(f'Float 32 Intercept: {linreg.intercept_}')
print(f'Float 32 Coefficient: {linreg.coef_}')
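To see which fit drifts and by how much, both results can be compared against an independent reference. A minimal sketch, assuming NumPy only: `reference_ols` is a hypothetical helper (not part of scikit-learn and not how LinearRegression is implemented) that upcasts the inputs to float64, appends a column of ones for the intercept, and solves the least-squares problem with `np.linalg.lstsq`. It uses 100,000 rows and a fixed intercept of 2, instead of the report's 10 million rows and random intercept, to keep it quick.

```python
import numpy as np


def reference_ols(X, y):
    """Hypothetical cross-check: OLS with an intercept, computed in float64.

    Appends a column of ones to X and solves the least-squares problem directly.
    Returns (intercept, coefficients).
    """
    X64 = np.asarray(X, dtype=np.float64)
    y64 = np.asarray(y, dtype=np.float64)
    design = np.column_stack([X64, np.ones(len(X64))])
    solution, *_ = np.linalg.lstsq(design, y64, rcond=None)
    return solution[-1], solution[:-1]


rng = np.random.default_rng(0)
n = 100_000  # smaller than the report's 10M rows, to keep this fast
X = rng.integers(low=1000, high=10000, size=(n, 3))
y = 3 * X[:, 0] + X[:, 1] - 5 * X[:, 2] + 2  # intercept fixed at 2 for illustration

# Passing float32 features is fine here: the integers are exact in float32,
# and the helper upcasts them to float64 before solving.
intercept, coef = reference_ols(X.astype(np.float32), y)
print(intercept, coef)
```

Comparing `linreg.intercept_` and `linreg.coef_` from each fit in the report against such a reference (for example with `np.allclose`) quantifies the float32 error. Casting the features with `X_train.astype(np.float64)` before calling `fit` matches the configuration that gave the expected result.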
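Expected Results
Both fits should recover the coefficients [3, 1, -5] and the printed true intercept, since the feature values are identical in both dtypes.
Actual Results
The float64 fit returns the expected intercept and coefficients; the float32 fit returns different ones.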
Versions
System:
python: 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08) [Clang 6.0 (clang-600.0.57)]
executable: /Users/lazy/PycharmProjects/projects/bin/python
machine: Darwin-21.4.0-x86_64-i386-64bit
Python dependencies:
pip: 21.3
setuptools: 58.2.0
sklearn: 1.0.2
numpy: 1.21.4
scipy: 1.7.2
Cython: None
pandas: 1.3.4
matplotlib: 3.4.3
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True