The data type of input data for LinearRegression class will affect the results #23376

Open
@zhenliu26

Describe the bug

Our team used the class sklearn.linear_model.LinearRegression for multiple linear regression. We found that the same data (every element has an identical value) yields different coefficients depending on its dtype. The arrays contain only integer values, stored once as float32 and once as float64. The float64 input gives the expected result, but the float32 input does not.

The code below is a simple reproducible example.

Steps/Code to Reproduce

    from sklearn.linear_model import LinearRegression
    import numpy as np

    # Three integer-valued features, 10 million samples each
    rng = np.random.default_rng(0)
    x_1_array = rng.integers(low=1000, high=10000, size=10_000_000)
    x_2_array = rng.integers(low=1000, high=10000, size=10_000_000)
    x_3_array = rng.integers(low=1000, high=10000, size=10_000_000)

    # Exact linear target: coefficients (3, 1, -5) plus a small integer intercept
    y_array = 3 * x_1_array + x_2_array - 5 * x_3_array
    intercept_data = rng.integers(low=-3, high=3)
    print(f'True intercept {intercept_data}')
    y_array += intercept_data

    X_train = np.stack([x_1_array, x_2_array, x_3_array], axis=-1)
    y_train = y_array
    linreg = LinearRegression()

    # Fit on the same values stored as float64
    model = linreg.fit(X_train.astype(np.float64), y_train)
    print(f'Float 64 Intercept: {linreg.intercept_}')
    print(f'Float 64 Coefficient: {linreg.coef_}')

    # Fit on the same values stored as float32
    model = linreg.fit(X_train.astype(np.float32), y_train)
    print(f'Float 32 Intercept: {linreg.intercept_}')
    print(f'Float 32 Coefficient: {linreg.coef_}')

Expected Results

image
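
Since the screenshot is not preserved here, the following is a minimal sketch of what a correct fit should recover, computed independently of scikit-learn with np.linalg.lstsq in float64. It is only a sanity check under stated assumptions: the sample count n is far smaller than in the reproducer, and true_intercept is a hypothetical fixed value standing in for the randomly drawn intercept_data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000  # assumption: much smaller than the 10_000_000 rows in the report

# Integer-valued features in the same range as the reproducer, stored as float64
X = rng.integers(low=1000, high=10000, size=(n, 3)).astype(np.float64)
true_coef = np.array([3.0, 1.0, -5.0])
true_intercept = 2.0  # hypothetical stand-in for intercept_data
y = X @ true_coef + true_intercept

# Append a column of ones so the last solution entry is the intercept
A = np.column_stack([X, np.ones(n)])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:3], solution[3]

print(f"lstsq coefficients: {coef}")
print(f"lstsq intercept: {intercept}")
```

Because the target is an exact linear function of the features, the float64 solution should match the true coefficients up to rounding error.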

Actual Results

image
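
One plausible factor, not confirmed in this report, is the limited integer precision of float32. The sketch below uses only NumPy and makes no claim about what LinearRegression does internally. It shows that the feature values themselves survive a cast to float32, while quantities derived from them, such as squares, can exceed the range in which float32 represents every integer exactly.

```python
import numpy as np

# float32 has a 24-bit significand, so every integer up to 2**24 is exact
limit = 2**24
exact_at_limit = bool(np.float32(limit) == limit)
# limit + 1 is not representable and rounds back down to limit
rounded_above = bool(np.float32(limit + 1) == limit)

# Feature values in the reproducer's range are small enough to cast losslessly
x = np.arange(1000, 10000, dtype=np.int64)
x_f32 = x.astype(np.float32)
cast_is_lossless = np.array_equal(x_f32.astype(np.int64), x)

# Their squares reach roughly 1e8, beyond 2**24, so some are rounded in float32
sq_exact = x * x
sq_f32 = x_f32 * x_f32
n_inexact = int(np.count_nonzero(sq_f32.astype(np.int64) != sq_exact))

print(f"2**24 exact in float32: {exact_at_limit}")
print(f"2**24 + 1 rounds to 2**24: {rounded_above}")
print(f"cast of features is lossless: {cast_is_lossless}")
print(f"squares not exactly representable: {n_inexact} of {x.size}")
```

If float32 inputs are required elsewhere in a pipeline, casting to float64 just before fitting, as in the first fit of the reproducer, avoids this class of rounding.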

Versions

System:
    python: 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08)  [Clang 6.0 (clang-600.0.57)]
executable: /Users/lazy/PycharmProjects/projects/bin/python
   machine: Darwin-21.4.0-x86_64-i386-64bit

Python dependencies:
          pip: 21.3
   setuptools: 58.2.0
      sklearn: 1.0.2
        numpy: 1.21.4
        scipy: 1.7.2
       Cython: None
       pandas: 1.3.4
   matplotlib: 3.4.3
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True
