
ENH: Add MedianAbsoluteDeviationScaler (MADScaler) to sklearn.preprocessing #31621


Describe the workflow you want to enable

Today, if a user wants to centre features by the median and scale them by the median absolute deviation (MAD), they must hand-roll code like:
median = np.median(X, axis=0)
mad = 1.4826 * np.median(np.abs(X - median), axis=0)
X_scaled = (X - median) / mad

A built-in MedianAbsoluteDeviationScaler (or a statistic="mad" option on RobustScaler) would let them write a single, self-documenting line:
from sklearn.preprocessing import MedianAbsoluteDeviationScaler
X_scaled = MedianAbsoluteDeviationScaler().fit_transform(X)

That makes robust MAD scaling first-class, composable in pipelines, and reversible via inverse_transform().
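
For instance, it would drop into a pipeline like any other scaler. A minimal sketch, assuming the proposed class name (not an existing import) and placeholder training data X_train/y_train:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# MedianAbsoluteDeviationScaler is the proposed transformer, not yet in
# sklearn.preprocessing; shown here only to illustrate composability.
pipe = make_pipeline(MedianAbsoluteDeviationScaler(), LogisticRegression())
pipe.fit(X_train, y_train)  # MAD centering/scaling learned on the training fold only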

Describe your proposed solution

Add a new transformer:
class MedianAbsoluteDeviationScaler(BaseEstimator, TransformerMixin):
    with_centering: bool = True
    with_scaling: bool = True
    copy: bool = True
    unit_variance: bool = False

    # learned in fit
    center_: ndarray
    scale_: ndarray

Fit logic

  1. center_ = np.median(X, axis=0) (if with_centering)

  2. mad = np.median(np.abs(X - center_), axis=0) * 1.4826

  3. Guard against zero scales (e.g. constant features) with a machine-epsilon floor before storing the result in scale_.

transform() and inverse_transform() reuse the pattern from RobustScaler.
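
A minimal sketch of that logic, assuming the class name and parameters proposed above; a real implementation would add input validation via check_array and reuse scikit-learn's existing zero-scale handling:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MedianAbsoluteDeviationScaler(BaseEstimator, TransformerMixin):
    def __init__(self, *, with_centering=True, with_scaling=True,
                 copy=True, unit_variance=False):
        self.with_centering = with_centering
        self.with_scaling = with_scaling
        self.copy = copy
        # unit_variance is kept for API symmetry with RobustScaler; the
        # 1.4826 factor below already makes the MAD sigma-consistent.
        self.unit_variance = unit_variance

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        median = np.median(X, axis=0)
        # Step 1: per-feature median as the robust centre
        self.center_ = median if self.with_centering else None
        if self.with_scaling:
            # Step 2: sigma-consistent MAD (1.4826 ~ 1 / Phi^-1(0.75))
            mad = 1.4826 * np.median(np.abs(X - median), axis=0)
            # Step 3: floor zero scales (constant features) at 1.0
            mad[mad < np.finfo(X.dtype).eps] = 1.0
            self.scale_ = mad
        else:
            self.scale_ = None
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if self.copy:
            X = X.copy()
        if self.with_centering:
            X -= self.center_
        if self.with_scaling:
            X /= self.scale_
        return X

    def inverse_transform(self, X):
        X = np.asarray(X, dtype=float)
        if self.copy:
            X = X.copy()
        if self.with_scaling:
            X *= self.scale_
        if self.with_centering:
            X += self.center_
        return X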

Docs / tests

  • Unit tests for shape preservation, inverse-transform round-trip, and robustness to outliers (a round-trip sketch follows this list).

  • A short subsection in preprocessing.rst and a gallery example comparing the Standard, Robust (IQR), and MAD scalers.

  • Changelog bullet in whats_new/v1.5.rst.
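
A hypothetical version of the round-trip test mentioned above (names are illustrative and assume the sketch from the previous section):

import numpy as np
from numpy.testing import assert_allclose

def test_mad_scaler_inverse_round_trip():
    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 3))
    X[0, 0] = 1e6  # a gross outlier must not blow up the learned scale
    scaler = MedianAbsoluteDeviationScaler().fit(X)
    X_t = scaler.transform(X)
    assert X_t.shape == X.shape
    # centering/scaling must invert exactly (up to float round-off)
    assert_allclose(scaler.inverse_transform(X_t), X)
    # the single outlier should barely move the per-feature scale
    assert np.all(scaler.scale_ < 10)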

I am happy to implement this within ~2 weeks.

Describe alternatives you've considered, if relevant

  • Keep user-land recipes: this fragments the ecosystem and lacks inverse_transform().

  • Extend RobustScaler with statistic={"iqr","mad"} (default "iqr"), sketched below. This also works, but it changes a long-standing API and may require a deprecation cycle.
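
Under that alternative, usage would look like this (statistic is a hypothetical parameter, not part of today's RobustScaler):

from sklearn.preprocessing import RobustScaler

X_scaled = RobustScaler(statistic="mad").fit_transform(X)  # hypothetical API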

Additional context

  • MAD is a well-known σ-consistent robust scale estimator and is more efficient than the IQR for symmetric heavy-tailed or Laplace-like data (a quick numerical check follows this list).

  • Estimated code diff ≈ 20 LOC plus tests/docs, i.e. "good first issue" size.
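
Illustrative check of that consistency claim (any seed works):

import numpy as np

rng = np.random.RandomState(42)
x = rng.normal(loc=0.0, scale=2.0, size=1_000_000)
mad = 1.4826 * np.median(np.abs(x - np.median(x)))
print(round(mad, 3))  # ~2.0, recovering the true sigma of the sample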

/cc @jnothman @glemaitre for initial design feedback — thanks!
