MOHAMMEDFAHD/Scikit-Learn-Collections

scikitelearn-collections

Elegant, production-ready extensions for Scikit-learn pipelines
Save time, build faster, scale better 🚀


🔍 Overview

scikitelearn-collections is a curated collection of utilities, transformers, wrappers, and experiment tools built on top of the scikit-learn ecosystem. It aims to streamline model development, experiment tracking, and pipeline customization while staying compatible with the scikit-learn estimator API.


✨ Features

  • ✅ Plug-and-play Pipeline and ColumnTransformer components
  • ✅ Drop-in feature generators (dates, text, outliers, etc.)
  • ✅ Advanced custom transformers and meta-estimators
  • ✅ Support for nested cross-validation and custom scorers
  • ✅ Compatible with GridSearchCV and RandomizedSearchCV
  • ✅ Simple model evaluation wrappers with logging
  • ✅ Utility functions for feature selection, data cleaning, and split strategies
  • ✅ Modular design for experimentation & reproducibility
  • ✅ Clean, tested, and production-grade Python code
  • ✅ 100% compatible with Scikit-learn’s API & best practices
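
To make the nested cross-validation and custom scorer features concrete, here is a minimal sketch that uses only plain scikit-learn, not this package's own helpers (whose API is not shown in this README). The synthetic dataset, the F2 scorer, and the step names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Custom scorer: F-beta with beta=2 weights recall more heavily than precision.
f2_scorer = make_scorer(fbeta_score, beta=2)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Inner loop: choose the regularization strength C on each training split.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"classifier__C": [0.1, 1.0, 10.0]},
    scoring=f2_scorer,
    cv=inner_cv,
)

# Outer loop: score the whole tuning procedure on held-out folds.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X, y, scoring=f2_scorer, cv=outer_cv)

print(f"Nested CV F2: {nested_scores.mean():.3f} (+/- {nested_scores.std():.3f})")
```

Because the grid search is refit inside every outer fold, the outer scores estimate how well the full procedure (tuning included) generalizes, rather than how well one tuned model fits the data it was tuned on.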

📦 Installation

Requirements

  • Python 3.8+
  • scikit-learn >= 1.0
  • numpy, pandas, joblib

Install via pip (available once the PyPI release is published):

pip install scikitelearn-collections

Until then, clone the repository and install it in editable mode:

git clone https://github.com/your-username/scikitelearn-collections.git
cd scikitelearn-collections
pip install -e .

🚀 Quick Start

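from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

from scikitelearn_collections.transformers import DateFeatureGenerator, OutlierRemover

# Chain feature generation, outlier handling, and the model into one estimator.
pipeline = Pipeline([
    ("date_features", DateFeatureGenerator(columns=["signup_date"])),
    ("remove_outliers", OutlierRemover(method="zscore", threshold=3.0)),
    ("classifier", LogisticRegression()),
])

# X_train and y_train are assumed to exist already; X_train is expected to
# include the "signup_date" column named above.
pipeline.fit(X_train, y_train)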
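
For readers wondering what a date feature step does inside a pipeline, the following is a rough sketch of a scikit-learn-compatible transformer. It is a hypothetical stand-in: the class name `SimpleDateFeatures`, the derived column names, and the choice of year, month, and day of week are all assumptions, and the actual `DateFeatureGenerator` may behave differently.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SimpleDateFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical example, not the library's DateFeatureGenerator."""

    def __init__(self, columns):
        # Store constructor arguments unchanged so get_params() and clone() work.
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless transformer: nothing is learned from the data.
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            dates = pd.to_datetime(X[col])
            X[f"{col}_year"] = dates.dt.year
            X[f"{col}_month"] = dates.dt.month
            X[f"{col}_dayofweek"] = dates.dt.dayofweek  # Monday is 0
            # Drop the raw date column so downstream estimators see numbers only.
            X = X.drop(columns=col)
        return X


frame = pd.DataFrame({"signup_date": ["2024-01-15", "2023-12-31"], "age": [31, 45]})
features = SimpleDateFeatures(columns=["signup_date"]).fit_transform(frame)
print(features)
```

Keeping `__init__` free of logic and returning `self` from `fit` are the two conventions that let a class like this take part in `Pipeline`, `clone`, and the search utilities.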

🧠 Modules & Components

| Module | Description |
|---|---|
| transformers/ | Custom transformers (dates, outliers, encodings, etc.) |
| pipelines/ | Reusable ML pipelines with preprocessing and modeling |
| wrappers/ | Model wrappers for enhanced evaluation, prediction, and logging |
| validators/ | Custom cross-validation strategies and metric calculators |
| utils/ | Helper utilities for splits, selection, diagnostics |
| examples/ | Real-world usage examples in Jupyter notebooks |

📁 Project Structure

scikitelearn-collections/
│
├── transformers/         # Custom transformers
├── pipelines/            # Ready-to-use ML pipelines
├── wrappers/             # Model and metric wrappers
├── utils/                # Helper functions and classes
├── validators/           # Scoring & validation strategies
├── examples/             # Example notebooks and scripts
├── tests/                # Unit tests
└── README.md             # You're here!

🧪 Examples

Explore the examples/ directory for practical Jupyter notebooks:

  • ✅ Binary classification with preprocessing
  • ✅ Regression with feature engineering
  • ✅ Outlier detection & removal
  • ✅ Cross-validation with custom scoring
  • ✅ Hyperparameter tuning with pipeline integration
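
The notebooks themselves are not reproduced here. As a rough idea of what the binary classification and hyperparameter tuning topics involve, the sketch below combines a `ColumnTransformer` with `RandomizedSearchCV` using only plain scikit-learn. The toy data, column names, and search range are made up for illustration.

```python
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data invented for this example: one numeric and one categorical column.
data = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 41, 38, 55, 26, 33, 60],
    "plan": ["free", "pro", "pro", "free", "free", "pro",
             "pro", "free", "pro", "free", "free", "pro"],
})
target = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

# Route each column type to its own preprocessing step.
preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["age"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"classifier__C": loguniform(1e-2, 1e2)},
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(data, target)

print(search.best_params_)
predictions = search.predict(data)
```

Tuning the pipeline as a whole means the preprocessing is refit on each training fold, which keeps information from the validation fold out of the scaler and encoder.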

✅ Contributing

We ❤️ contributions! To contribute:

  1. Fork this repository
  2. Create a new branch: git checkout -b feature/your-feature
  3. Write clean, tested code
  4. Ensure all tests pass with pytest
  5. Submit a pull request 🚀

🧪 Testing

All modules include unit tests in the tests/ directory. Run:

pytest

We use Black for code formatting and expect all code to follow PEP 8 guidelines.
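
As an illustration of the kind of unit test that could live in tests/, here is a small sketch built around a hypothetical `ZScoreClipper` transformer. It clips values to a z-score band instead of removing rows, because a standard transformer step in a scikit-learn `Pipeline` returns only `X`, so dropping rows there would leave `y` misaligned. How the library's own `OutlierRemover` handles this is not stated in this README, and none of the names below come from the package.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ZScoreClipper(BaseEstimator, TransformerMixin):
    """Hypothetical example: clip each column to mean +/- threshold * std."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned statistics carry a trailing underscore by scikit-learn convention.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        lower = self.mean_ - self.threshold * self.std_
        upper = self.mean_ + self.threshold * self.std_
        return np.clip(X, lower, upper)


def test_clipper_preserves_shape_and_bounds():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    X[0, 0] = 50.0  # inject one extreme value
    clipper = ZScoreClipper(threshold=3.0).fit(X)
    clipped = clipper.transform(X)
    upper = clipper.mean_ + 3.0 * clipper.std_
    assert clipped.shape == X.shape  # no rows are dropped
    assert np.isclose(clipped[0, 0], upper[0])  # the extreme value is capped
    assert np.all(clipped <= upper + 1e-12)


def test_clipper_is_cloneable():
    # clone() must rebuild the estimator from its constructor parameters alone.
    copy = clone(ZScoreClipper(threshold=2.5))
    assert copy.threshold == 2.5
```

Saved as a file such as `tests/test_clipper.py` (an assumed path), these functions would be collected automatically by the `pytest` command above.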


📄 License

This project is licensed under the MIT License.


🙌 Acknowledgements

  • Built with ❤️ using Scikit-learn
  • Inspired by real-world ML use-cases in research & production
  • Thanks to open-source contributors and community ideas

📬 Contact

Have questions or suggestions? Open an issue or start a discussion!


Let your pipelines be elegant, reusable, and powerful. — scikitelearn-collections
