Improve error message when Pipeline step is a class instead of instance #32721
Reference Issues/PRs
Fixes #32719
What does this implement/fix? Explain your changes.
This PR improves the error message when a user accidentally passes an uninstantiated class to a Pipeline step instead of an instance.
Before (confusing error):
AttributeError: 'numpy.ndarray' object has no attribute '_validate_params'
After (clear error):
TypeError: All steps should be estimator instances (objects), not classes.
Step 'pca' is a class: (PCA). Did you forget parentheses?
Use PCA() instead of PCA.
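A minimal sketch of how a check like this can work, assuming a standalone helper. The name check_steps_are_instances is hypothetical and this is not the actual code added to scikit-learn's _validate_steps(). The stand-in PCA class only lets the sketch run without scikit-learn installed. The key point is that inspect.isclass() is True for the class object PCA and False for the instance PCA(), so the mistake can be caught before any method is called on the step. The message text approximates the "After" message shown above.

```python
import inspect


def check_steps_are_instances(steps):
    """Raise TypeError if any (name, step) pair holds a class instead of an instance.

    Hypothetical helper for illustration only; scikit-learn performs its own
    validation inside Pipeline._validate_steps().
    """
    for name, step in steps:
        # True for the class object PCA, False for the instance PCA()
        # and for strings such as "passthrough".
        if inspect.isclass(step):
            cls_name = step.__name__
            raise TypeError(
                "All steps should be estimator instances (objects), not classes.\n"
                f"Step {name!r} is a class: ({cls_name}). Did you forget parentheses?\n"
                f"Use {cls_name}() instead of {cls_name}."
            )


class PCA:
    """Stand-in for sklearn.decomposition.PCA so the sketch runs on its own."""


# Mistake: the class itself is passed, so the helper raises TypeError.
try:
    check_steps_are_instances([("pca", PCA)])
except TypeError as exc:
    print(exc)

# Correct usage: an instance (and a "passthrough" string) pass silently.
check_steps_are_instances([("pca", PCA()), ("skip", "passthrough")])
```

Checking with inspect.isclass() rather than, say, hasattr(step, "fit") matters here: a class object still has a fit attribute (as an unbound function), so an attribute-based check would not flag the missing parentheses.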
Changes made:
- _validate_steps() now checks whether a step is a class using inspect.isclass() and, if so, raises a TypeError with a helpful message.
- Added test_pipeline_step_class_instead_of_instance() to verify the improved error message.
Any other comments?
This is my first contribution to scikit-learn. I've manually reviewed all changes and run the full test suite. All pipeline tests pass (141 passed, 5 skipped). Please let me know if any changes are needed!