Elegant, production-ready extensions for Scikit-learn pipelines
Save time, build faster, scale better 🚀
scikitelearn-collections is a curated collection of robust utilities, transformers, wrappers, and experiment tools built on top of the Scikit-learn ecosystem. It helps you streamline model development, experiment tracking, and pipeline customization, all with full Scikit-learn compatibility.
- ✅ Plug-and-play Pipeline and ColumnTransformer components
- ✅ Drop-in feature generators (dates, text, outliers, etc.)
- ✅ Advanced custom transformers and meta-estimators
- ✅ Support for nested cross-validation and custom scorers
- ✅ Compatible with GridSearchCV and RandomizedSearchCV
- ✅ Simple model evaluation wrappers with logging
- ✅ Utility functions for feature selection, data cleaning, and split strategies
- ✅ Modular design for experimentation & reproducibility
- ✅ Clean, tested, and production-grade Python code
- ✅ 100% compatible with Scikit-learn’s API & best practices
- Python 3.8+
- scikit-learn >= 1.0
- numpy, pandas, joblib
pip install scikitelearn-collections
Until the package can be installed with pip, you can clone the repository and install it manually:
git clone https://github.com/your-username/scikitelearn-collections.git
cd scikitelearn-collections
pip install -e .
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

from scikitelearn_collections.transformers import DateFeatureGenerator, OutlierRemover

# Chain feature generation, outlier handling, and the model into one estimator.
pipeline = Pipeline([
    ("date_features", DateFeatureGenerator(columns=["signup_date"])),
    ("remove_outliers", OutlierRemover(method="zscore", threshold=3.0)),
    ("classifier", LogisticRegression()),
])

# X_train is your training data (it needs a "signup_date" column); y_train holds the labels.
pipeline.fit(X_train, y_train)
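To show how a transformer of this kind can slot into a Pipeline, here is a minimal sketch of a date-feature transformer that follows the Scikit-learn estimator conventions. The class name SimpleDateFeatures, the derived column names, and the sample data are assumptions made for illustration; this is not the implementation of DateFeatureGenerator in this package.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SimpleDateFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in, not the package's DateFeatureGenerator."""

    def __init__(self, columns):
        # Store constructor arguments unchanged so get_params() and clone() work.
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless transformer: nothing is learned from the data.
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            dates = pd.to_datetime(X[col])
            X[f"{col}_year"] = dates.dt.year
            X[f"{col}_month"] = dates.dt.month
            X[f"{col}_dayofweek"] = dates.dt.dayofweek  # Monday is 0
            # Drop the raw date column so downstream estimators see numbers only.
            X = X.drop(columns=col)
        return X


# Illustrative data, not taken from the project.
df = pd.DataFrame({"signup_date": ["2024-01-15", "2024-03-02"], "age": [31, 45]})
features = SimpleDateFeatures(columns=["signup_date"]).fit_transform(df)
print(features)
```

Because the constructor only stores its arguments and fit returns self, a class written this way can be used as a Pipeline step and cloned during a parameter search.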
Module | Description |
---|---|
transformers/ | Custom transformers (dates, outliers, encodings, etc.) |
pipelines/ | Reusable ML pipelines with preprocessing and modeling |
wrappers/ | Model wrappers for enhanced evaluation, prediction, and logging |
validators/ | Custom cross-validation strategies and metric calculators |
utils/ | Helper utilities for splits, selection, diagnostics |
examples/ | Real-world usage examples in Jupyter notebooks |
scikitelearn-collections/
│
├── transformers/ # Custom transformers
├── pipelines/ # Ready-to-use ML pipelines
├── wrappers/ # Model and metric wrappers
├── utils/ # Helper functions and classes
├── validators/ # Scoring & validation strategies
├── examples/ # Example notebooks and scripts
├── tests/ # Unit tests
└── README.md # You're here!
Explore the examples/ directory for practical Jupyter notebooks:
- ✅ Binary classification with preprocessing
- ✅ Regression with feature engineering
- ✅ Outlier detection & removal
- ✅ Cross-validation with custom scoring
- ✅ Hyperparameter tuning with pipeline integration
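The last two topics in the list, cross-validation with custom scoring and hyperparameter tuning inside a pipeline, can be sketched with plain Scikit-learn as follows. The synthetic data, the F2 scorer, and the parameter grid are illustrative assumptions, and the notebooks themselves may take a different approach.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Custom scorer: F-beta with beta=2 weights recall more heavily than precision.
f2_scorer = make_scorer(fbeta_score, beta=2)

# Inner folds choose hyperparameters; outer folds estimate generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# "classifier__C" addresses the C parameter of the step named "classifier".
search = GridSearchCV(
    pipeline,
    param_grid={"classifier__C": [0.1, 1.0, 10.0]},
    scoring=f2_scorer,
    cv=inner_cv,
)

# Each outer fold refits the whole search, so tuning never sees the held-out data.
nested_scores = cross_val_score(search, X, y, scoring=f2_scorer, cv=outer_cv)
print(f"Nested CV F2: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

Keeping the scaler inside the pipeline means it is refit on each training fold, which avoids leaking statistics from the validation data into preprocessing.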
We ❤️ contributions! To contribute:
- Fork this repository
- Create a new branch: git checkout -b feature/your-feature
- Write clean, tested code
- Ensure all tests pass with pytest
- Submit a pull request 🚀
All modules include unit tests in the tests/ directory. Run:
pytest
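If you are adding a transformer and are unsure what a test might cover, the sketch below shows two checks that pytest would collect. It uses Scikit-learn's StandardScaler as a stand-in and a hypothetical make_transformer helper; neither the helper nor the test names come from this repository's test suite.

```python
import numpy as np
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler


def make_transformer():
    # Stand-in for illustration: return the transformer you want to test.
    return StandardScaler()


def test_clone_preserves_params():
    # clone() must rebuild the estimator from get_params() without losing settings.
    transformer = make_transformer()
    assert clone(transformer).get_params() == transformer.get_params()


def test_transform_keeps_row_count():
    # A transformer used inside a Pipeline should return one row per input row.
    X = np.random.default_rng(0).normal(size=(20, 3))
    Xt = make_transformer().fit_transform(X)
    assert Xt.shape[0] == X.shape[0]
```

Saved in a file whose name starts with test_, these functions are discovered automatically when pytest runs.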
We use Black for code formatting and expect all code to follow PEP8 guidelines.
This project is licensed under the MIT License.
- Built with ❤️ using Scikit-learn
- Inspired by real-world ML use-cases in research & production
- Thanks to open-source contributors and community ideas
Have questions or suggestions? Open an issue or start a discussion!
Let your pipelines be elegant, reusable, and powerful. —
scikitelearn-collections