FIX Add validation for FeatureUnion transformer outputs (#31318) #31559

Open: gguiomar wants to merge 2 commits into main

gguiomar

Description

Fixes #31318

This PR adds validation to FeatureUnion._hstack() to prevent silent failures when transformers return 1D arrays instead of the expected 2D arrays.

Changes Made

  • Modified FeatureUnion._hstack() to validate transformer outputs are 2D
  • Added checks for consistent sample counts across transformers
  • Updated transform() method to pass transformer names for better error messages
  • Added informative error messages including transformer name and expected shapes
  • Added test case to verify the validation works correctly
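The checks listed above can be sketched roughly as follows. This is only an illustration, not the code in the PR: `check_and_hstack` is a hypothetical helper name, the sample-count message is invented for the example, and the dimension message merely mirrors the wording quoted in this description.

```python
import numpy as np


def check_and_hstack(names, Xs):
    """Hypothetical stand-in for the validation described in this PR."""
    arrays = [np.asarray(X) for X in Xs]
    n_samples = None
    for name, X in zip(names, arrays):
        # Every transformer output must be 2D: (n_samples, n_features).
        if X.ndim != 2:
            raise ValueError(
                f"Transformer '{name}' returned an array with {X.ndim} "
                "dimensions, but expected 2 dimensions (n_samples, n_features)."
            )
        # All outputs must agree on the number of samples (rows).
        if n_samples is None:
            n_samples = X.shape[0]
        elif X.shape[0] != n_samples:
            raise ValueError(
                f"Transformer '{name}' returned {X.shape[0]} samples, "
                f"but expected {n_samples}."
            )
    return np.hstack(arrays)


a = np.array([0, 1, 2])
b = np.array([0, 1, 2])

# Without validation, np.hstack joins 1D inputs end to end.
flat = np.hstack([a, b])

# With 2D columns, the result has one row per sample and one column per input.
stacked = check_and_hstack(["a", "b"], [a.reshape(-1, 1), b.reshape(-1, 1)])
```

Passing the 1D arrays `a` and `b` directly to `check_and_hstack` raises a `ValueError` naming transformer 'a', which is the kind of early, named failure the PR aims for.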

Before

# This would silently produce meaningless results
FeatureUnion([
    ("a", FunctionTransformer(lambda df: df["a"])),  # Returns 1D
    ("b", FunctionTransformer(lambda df: df["b"])),  # Returns 1D
]).fit_transform(data)
# Output: array([0, 1, 2, 0, 1, 2])  # Meaningless concatenation

After

# This now raises an informative error (see test_feature_union_xs_dims() in tests/test_pipeline.py)
FeatureUnion([
    ("a", FunctionTransformer(lambda df: df["a"])),
    ("b", FunctionTransformer(lambda df: df["b"])),
]).fit_transform(data)
# Raises: ValueError: Transformer 'a' returned an array with 1 dimensions, 
#         but expected 2 dimensions (n_samples, n_features).
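For users who hit this error, one way to satisfy the 2D requirement is to select columns with a list so each transformer returns a one-column DataFrame rather than a Series. The sketch below assumes `data` is a small pandas DataFrame; its values are illustrative and not taken from the PR.

```python
import pandas as pd
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

# Illustrative data; the PR description does not define `data`.
data = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2]})

union = FeatureUnion([
    # df[["a"]] keeps a 2D shape (n_samples, 1); df["a"] would be 1D.
    ("a", FunctionTransformer(lambda df: df[["a"]])),
    ("b", FunctionTransformer(lambda df: df[["b"]])),
])

result = union.fit_transform(data)
print(result.shape)
```

With the default output configuration the result is a NumPy array with one row per sample and one column per transformer.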

github-actions bot commented Jun 16, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f27c86e.

Development

Successfully merging this pull request may close these issues.

ValueError raised by FeatureUnion._set_output with FunctionTransform that outputs a pandas Series in scikit-learn version 1.6