MGS-GRF

If your machine learning project involves imbalanced data, this package can pre-process it for you. It is an efficient, ready-to-use implementation of MGS-GRF, an oversampling strategy presented at the ECML-PKDD 2025 conference and designed to handle large-scale, mixed imbalanced datasets, with both continuous and categorical features.

🛠 Installation

First, clone the repository:

git clone git@github.com:artefactory/mgs-grf.git

Then install the required packages into your environment (conda, mamba or pip):

pip install -r requirements.txt

🚀 How to use the MGS-GRF Algorithm to learn on imbalanced data

Here is a short example of how to use MGS-GRF:

import lightgbm as lgb
import numpy as np
from sklearn.preprocessing import OneHotEncoder

from mgs_grf import MGSGRFOverSampler

# Apply the MGS-GRF procedure to oversample the training data.
# categorical_features and numeric_features index the columns of the NumPy arrays.
mgs_grf = MGSGRFOverSampler(categorical_features=categorical_features, random_state=0)
X_train_balanced, y_train_balanced = mgs_grf.fit_resample(X_train_imbalanced, y_train_imbalanced)

# One-hot encode the categorical variables: fit on the balanced training set,
# then reuse the fitted encoder on the test set.
enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_train_balanced_enc = np.hstack((X_train_balanced[:, numeric_features],
                                  enc.fit_transform(X_train_balanced[:, categorical_features])))
X_test_enc = np.hstack((X_test[:, numeric_features], enc.transform(X_test[:, categorical_features])))

# Fit the final classifier on the augmented data
clf = lgb.LGBMClassifier(n_estimators=100, verbosity=-1, random_state=0)
clf.fit(X_train_balanced_enc, y_train_balanced)

A more detailed example is available as a notebook in the repository.
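
The example above assumes that X_train_imbalanced, y_train_imbalanced, X_test, numeric_features and categorical_features already exist. As a minimal sketch of one way to prepare them, the following builds a small synthetic mixed-type dataset with NumPy only. The column layout, class sizes and the 80/20 split are assumptions chosen for illustration, not part of the package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic mixed-type data (illustrative assumption): columns 0 and 1 are
# continuous, column 2 is an integer-coded categorical feature with 3 levels.
n_majority, n_minority = 950, 50
n_samples = n_majority + n_minority
X = np.column_stack([
    rng.normal(size=n_samples),
    rng.uniform(0, 10, size=n_samples),
    rng.integers(0, 3, size=n_samples),
])
y = np.concatenate([np.zeros(n_majority, dtype=int), np.ones(n_minority, dtype=int)])

# Column indices, used for slicing as in X[:, categorical_features].
numeric_features = [0, 1]
categorical_features = [2]

# Shuffle the rows, then hold out 20% of them as a test set.
perm = rng.permutation(n_samples)
X, y = X[perm], y[perm]
n_test = n_samples // 5
X_test, y_test = X[:n_test], y[:n_test]
X_train_imbalanced, y_train_imbalanced = X[n_test:], y[n_test:]

# Inspect the class imbalance of the training set before oversampling.
counts = np.bincount(y_train_imbalanced, minlength=2)
print(f"majority: {counts[0]}, minority: {counts[1]}")
```

With these variables defined, the oversampling, encoding and classifier steps shown earlier can be run as written. On real data, a stratified split is usually preferable so that the minority class is represented in both sets.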

🙏 Acknowledgements

This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.

📜 Citation

If you find the code useful, please consider citing us:

@inproceedings{sakho2025harnessing,
  title={Harnessing Mixed Features for Imbalance Data Oversampling: Application to Bank Customers Scoring},
  author={Sakho, Abdoulaye and Malherbe, Emmanuel and Gauthier, Carl-Erik and Scornet, Erwan},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages={247--264},
  year={2025},
  organization={Springer}
}
