@gguiomar
Contributor

Description

Fixes #31318

This PR adds validation to FeatureUnion._hstack() to prevent silent failures when transformers return 1D arrays instead of the expected 2D arrays.

Changes Made

  • Modified FeatureUnion._hstack() to validate transformer outputs are 2D
  • Added checks for consistent sample counts across transformers
  • Updated transform() method to pass transformer names for better error messages
  • Added informative error messages including transformer name and expected shapes
  • Added test case to verify the validation works correctly
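The dimensionality check described in the list above can be sketched roughly as follows. This is only an illustration and not the code merged in this PR: the helper name `check_outputs_2d` and its signature are hypothetical, and the error wording is borrowed from the example in this description.

```python
import numpy as np


def check_outputs_2d(names, Xs):
    """Raise if any transformer output is not 2D (hypothetical helper)."""
    for name, X in zip(names, Xs):
        ndim = np.ndim(X)  # uses X.ndim when available, else converts to an array
        if ndim != 2:
            raise ValueError(
                f"Transformer '{name}' returned an array with {ndim} dimensions, "
                "but expected 2 dimensions (n_samples, n_features)."
            )


# A 2D column passes silently.
check_outputs_2d(["a"], [np.array([[0], [1], [2]])])

# A 1D vector is rejected, and the message names the offending transformer.
try:
    check_outputs_2d(["a", "b"], [np.array([[0], [1], [2]]), np.array([0, 1, 2])])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Passing the transformer names alongside the outputs is what lets the message point at the specific transformer, which is the motivation given for updating transform().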

Before

# This would silently produce meaningless results
FeatureUnion([
    ("a", FunctionTransformer(lambda df: df["a"])),  # Returns 1D
    ("b", FunctionTransformer(lambda df: df["b"])),  # Returns 1D
]).fit_transform(data)
# Output: array([0, 1, 2, 0, 1, 2])  # Meaningless concatenation

After

# This now raises an informative error (see test_feature_union_xs_dims() in tests/test_pipeline.py)
FeatureUnion([
    ("a", FunctionTransformer(lambda df: df["a"])),
    ("b", FunctionTransformer(lambda df: df["b"])),
]).fit_transform(data)
# Raises: ValueError: Transformer 'a' returned an array with 1 dimensions, 
#         but expected 2 dimensions (n_samples, n_features).
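To see why the 1D case produces a meaningless result, it may help to look at NumPy alone. The sketch below assumes dense outputs are combined with np.hstack, as the review discussion in this thread suggests; the arrays `a` and `b` are stand-ins for the two transformer outputs.

```python
import numpy as np

a = np.array([0, 1, 2])  # 1D, like the output of lambda df: df["a"]
b = np.array([0, 1, 2])

# 1D inputs are joined end to end, so the sample axis is lost.
flat = np.hstack([a, b])

# 2D columns are placed side by side, one feature per column.
stacked = np.hstack([a.reshape(-1, 1), b.reshape(-1, 1)])

print(flat.shape)     # (6,)
print(stacked.shape)  # (3, 2)
```

In the FeatureUnion example, selecting with df[["a"]] instead of df["a"] returns a 2D DataFrame rather than a Series, which is one way to satisfy the new check.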

@github-actions

github-actions bot commented Jun 16, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c1ebb5d.

@jeremiedbb
Member

Thanks for the PR @gguiomar. I directly pushed a commit with some nitpicks and small simplifications.

I also removed the check on n_samples for now because I'm not sure we really want it. If Xs contains arrays of different lengths, np.hstack already fails with an informative error message, so raising our own message first adds little. And if the arrays all have the same length but it differs from the input's, I'm not sure we should raise: there may be a legitimate use case out there. Either way, that check can go in a separate PR if needed, so that this one stays focused on the original bug.
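The two situations mentioned in this comment can be reproduced with NumPy directly. This is a small sketch using made-up arrays, not code from the PR:

```python
import numpy as np

three_rows = np.zeros((3, 1))
two_rows = np.zeros((2, 1))

# Case 1: outputs with different numbers of rows. NumPy raises on its own.
try:
    np.hstack([three_rows, two_rows])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Case 2: outputs agree with each other (2 rows each) but might differ from
# the number of input samples. NumPy has no way to know, so nothing is raised.
same_len = np.hstack([np.zeros((2, 1)), np.ones((2, 1))])
print(same_len.shape)  # (2, 2)
```

Only the second case would need an explicit check in FeatureUnion, and that is the one left open for a possible follow-up PR.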

Member

@jeremiedbb jeremiedbb left a comment

LGTM

Member

@thomasjpfan thomasjpfan left a comment

LGTM

@thomasjpfan thomasjpfan merged commit 6d2c9f2 into scikit-learn:main Jul 19, 2025
40 checks passed
lucyleeow pushed a commit to lucyleeow/scikit-learn that referenced this pull request Aug 22, 2025
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Sep 3, 2025
@jeremiedbb jeremiedbb mentioned this pull request Sep 3, 2025
13 tasks
jeremiedbb added a commit that referenced this pull request Sep 8, 2025

Labels

module:pipeline
Waiting for Second Reviewer (first reviewer is done, need a second one)

Development

Successfully merging this pull request may close these issues.

ValueError raised by FeatureUnion._set_output with FunctionTransform that outputs a pandas Series in scikit-learn version 1.6

3 participants