Welcome to mymodels!

Interpretable machine learning has gained significant prominence across many fields. Machine learning models are valued for their ability to capture complex relationships in data through flexible fitting algorithms. Complementing these models, interpretability frameworks provide essential tools for explaining such "black-box" models. These approaches deliver critical insights by ranking feature importance, identifying nonlinear response thresholds, and analyzing interactions between factors.

The mymodels project aims to provide a tiny, user-friendly, and efficient workflow for scientific researchers and students who want to apply interpretable machine learning in their research.

Supported encoding methods

Onehot
Ordinal
Binary
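To illustrate what the three encodings produce, here is a minimal pure-Python sketch. It is not the mymodels implementation: the function names are hypothetical, categories are assumed to be ordered alphabetically unless an explicit order is passed (truly ordinal features should always be given their real order), and codes start at 0 (some libraries start binary codes at 1).

```python
def one_hot(values, categories=None):
    """One column per category; a 1 marks the category of each value."""
    categories = categories or sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal(values, categories=None):
    """Replace each category by its integer position in `categories`."""
    categories = categories or sorted(set(values))
    position = {c: i for i, c in enumerate(categories)}
    return [position[v] for v in values]


def binary(values, categories=None):
    """Write each ordinal code in base 2, one column per bit."""
    categories = categories or sorted(set(values))
    codes = ordinal(values, categories)
    # Number of bits needed to represent the largest code (at least one column)
    width = max(1, (len(categories) - 1).bit_length())
    return [[int(bit) for bit in format(code, f"0{width}b")] for code in codes]


colors = ["red", "green", "blue", "green"]
print(one_hot(colors))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(ordinal(colors))  # [2, 1, 0, 1]
print(binary(colors))   # [[1, 0], [0, 1], [0, 0], [0, 1]]
```

With three categories, one-hot encoding needs three columns while binary encoding needs two; the gap widens as the number of categories grows, which is why binary encoding is often chosen for high-cardinality features.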

Supported models

Regression		Classification
model_name	Models	model_name	Models
lr	Linear Regression	lc	Logistic Regression
svr	Support Vector Regression	svc	Support Vector Classification
knr	K-Nearest Neighbors Regression	knc	K-Nearest Neighbors Classification
mlpr	Multi-Layer Perceptron Regressor	mlpc	Multi-Layer Perceptron Classifier
dtr	Decision Tree Regressor	dtc	Decision Tree Classifier
rfr	Random Forest Regressor	rfc	Random Forest Classifier
gbdtr	Gradient Boosted Decision Trees (GBDT) Regressor	gbdtc	Gradient Boosted Decision Trees (GBDT) Classifier
adar	AdaBoost Regressor	adac	AdaBoost Classifier
xgbr	XGBoost Regressor	xgbc	XGBoost Classifier
lgbr	LightGBM Regressor	lgbc	LightGBM Classifier
catr	CatBoost Regressor	catc	CatBoost Classifier
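The model_name keys are short aliases for the estimators listed above. The sketch below shows one plausible way such aliases could be resolved to estimator classes. The dotted paths are real class names in scikit-learn, XGBoost, LightGBM, and CatBoost, but which class mymodels actually uses for each alias (for example, GradientBoostingRegressor versus HistGradientBoostingRegressor for gbdtr) is an assumption, and REGRESSORS, CLASSIFIERS, class_path, and load_estimator_class are hypothetical names.

```python
import importlib

# Hypothetical alias -> class mapping; not taken from the mymodels source.
REGRESSORS = {
    "lr": "sklearn.linear_model.LinearRegression",
    "svr": "sklearn.svm.SVR",
    "knr": "sklearn.neighbors.KNeighborsRegressor",
    "mlpr": "sklearn.neural_network.MLPRegressor",
    "dtr": "sklearn.tree.DecisionTreeRegressor",
    "rfr": "sklearn.ensemble.RandomForestRegressor",
    "gbdtr": "sklearn.ensemble.GradientBoostingRegressor",
    "adar": "sklearn.ensemble.AdaBoostRegressor",
    "xgbr": "xgboost.XGBRegressor",
    "lgbr": "lightgbm.LGBMRegressor",
    "catr": "catboost.CatBoostRegressor",
}
CLASSIFIERS = {
    "lc": "sklearn.linear_model.LogisticRegression",
    "svc": "sklearn.svm.SVC",
    "knc": "sklearn.neighbors.KNeighborsClassifier",
    "mlpc": "sklearn.neural_network.MLPClassifier",
    "dtc": "sklearn.tree.DecisionTreeClassifier",
    "rfc": "sklearn.ensemble.RandomForestClassifier",
    "gbdtc": "sklearn.ensemble.GradientBoostingClassifier",
    "adac": "sklearn.ensemble.AdaBoostClassifier",
    "xgbc": "xgboost.XGBClassifier",
    "lgbc": "lightgbm.LGBMClassifier",
    "catc": "catboost.CatBoostClassifier",
}


def class_path(model_name):
    """Return (task, dotted class path) for a model_name alias."""
    if model_name in REGRESSORS:
        return "regression", REGRESSORS[model_name]
    if model_name in CLASSIFIERS:
        return "classification", CLASSIFIERS[model_name]
    raise ValueError(f"unknown model_name: {model_name!r}")


def load_estimator_class(model_name):
    """Import the estimator class only when requested, so optional
    libraries (xgboost, lightgbm, catboost) are needed only if used."""
    _, path = class_path(model_name)
    module_name, class_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)
```

Importing lazily means XGBoost, LightGBM, or CatBoost only need to be installed if the corresponding alias is used. For example, with scikit-learn installed, load_estimator_class("rfr")(n_estimators=100, random_state=0) would return an untrained RandomForestRegressor.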

Recommended usage

Running on multi-core platforms

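Nested parallelism multiplies thread counts: if the main process runs several workers and each worker trains a multi-threaded model, the total number of threads is the product of the two. Running more threads than there are logical cores (oversubscription) causes contention, which can slow training down and destabilize the system. To ensure stability and good performance, thread management should follow one of these rules:

Eliminate Nesting: Refactor the process to remove nested parallelism, restricting parallel execution to the single most appropriate layer.
Constrain Thread Product: If nested parallelism is required, the product of the thread counts at all levels must be less than or equal to ($\le$) the total number of available logical cores.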
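The thread-product rule can be checked mechanically. This minimal sketch (the function name fits_core_budget and the three-level example are hypothetical) multiplies the thread counts of all nesting levels and compares the result with the number of logical cores, which defaults to os.cpu_count(). Note that os.cpu_count() reports the machine's logical CPUs, which may overstate what is actually available inside containers or under CPU-affinity restrictions.

```python
import math
import os


def fits_core_budget(threads_per_level, cores=None):
    """Return True if the product of thread counts across all nesting
    levels does not exceed the number of logical cores."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return math.prod(threads_per_level) <= cores


# Hypothetical three-level setup on a 32-core machine
print(fits_core_budget([4, 2, 4], cores=32))  # True  (4 * 2 * 4 = 32)
print(fits_core_budget([4, 2, 8], cores=32))  # False (4 * 2 * 8 = 64)
```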

For example:

System Specification: 32 available logical cores.
Process Architecture:
- Level 1: Main process (executing 5-fold cross-validation).
- Level 2: Model (internal parallelism).
Recommended Configuration:
- Set Level 1 parallelism = 5
- Set Level 2 (model) maximum parallelism = 6
Result: The total concurrent threads ($5 \times 6 = 30$) remains within the system capacity of 32 cores.
Performance Tuning: Initial testing indicates that performance is optimized when the main process parallelism (Level 1) is set equal to the number of cross-validation folds.
Resource Monitoring: Monitor memory utilization at the same time. Increasing parallelism can add significant memory overhead (for example, process-based workers may each hold their own copy of the data), which must be managed.
Validation: Profile several nested-parallelism combinations to identify the optimal configuration for the specific workload.
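The Validation step can be organized as a small grid search over thread settings. The sketch below assumes a two-level setup like the example above: it caps the outer level at the number of folds (extra outer workers would have no fold to run), gives each configuration the largest inner count that keeps the product within the core budget, and times a user-supplied function. candidate_configs, profile, and run are hypothetical names; run stands for whatever launches one full cross-validation with the given settings.

```python
import time


def candidate_configs(cores, n_folds):
    """Return (outer, inner) pairs with outer <= n_folds and outer * inner <= cores."""
    configs = []
    for outer in range(1, min(n_folds, cores) + 1):
        inner = cores // outer  # largest inner thread count that stays within budget
        configs.append((outer, inner))
    return configs


def profile(run, configs):
    """Time run(outer, inner) for each configuration; fastest first."""
    timings = []
    for outer, inner in configs:
        start = time.perf_counter()
        run(outer, inner)
        timings.append(((outer, inner), time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1])


print(candidate_configs(cores=32, n_folds=5))
# [(1, 32), (2, 16), (3, 10), (4, 8), (5, 6)]
```

With scikit-learn, for instance, run could call cross_val_score(model.set_params(n_jobs=inner), X, y, cv=5, n_jobs=outer) for an estimator that accepts n_jobs, such as RandomForestRegressor. Because timings vary between runs, repeating each measurement while watching memory use gives a more reliable comparison.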

What users should know

This project is intended solely for scientific reference. It may contain calculation errors or logical inaccuracies. Users are responsible for verifying the accuracy of the results independently, and the author shall not be held liable for any consequences arising from the use of this code.
Owing to the developer's limited expertise and time, the project may have shortcomings. Critiques and suggestions for improvement from fellow professionals are warmly welcome.
Explanations may not always be meaningful for real-world tasks, especially after data preprocessing or feature engineering has transformed the original variables. Users are solely responsible for validating that an explanation method is appropriate for their specific use case.
The project is not suitable for time-series tasks.
The hyperparameters shown in models_configs.yml are for demonstration purposes only. Users should try different hyperparameters in their actual applications to ensure the robustness of their results.
The random_state is set to 0 for demonstration purposes only. Users should try different random_state values in their actual applications to ensure the robustness of their results.
The explanations in this project are currently based on SHAP and PDP (Partial Dependence Plot). Other explanation methods (e.g., ALE, FI) are under development.
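To illustrate what a partial dependence plot computes, the sketch below evaluates partial dependence by brute force: for each grid value it overwrites one feature in every row, predicts, and averages the predictions. It is a from-scratch illustration, not the mymodels code; toy_model and the data are made up, and in practice scikit-learn's sklearn.inspection.partial_dependence performs this computation for fitted estimators.

```python
def partial_dependence(predict, rows, feature, grid):
    """For each grid value, set `feature` to that value in every row,
    predict, and return the average prediction per grid value."""
    averages = []
    for value in grid:
        modified = [list(row) for row in rows]  # copy so the input data is untouched
        for row in modified:
            row[feature] = value
        predictions = [predict(row) for row in modified]
        averages.append(sum(predictions) / len(predictions))
    return averages


def toy_model(x):
    # Made-up model: linear in feature 0, quadratic in feature 1
    return 2 * x[0] + x[1] ** 2


data = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
print(partial_dependence(toy_model, data, feature=0, grid=[0.0, 1.0, 2.0]))
# approximately [4.67, 6.67, 8.67] (i.e., 14/3, 20/3, 26/3)
```

Because the toy model is linear in feature 0 with coefficient 2, the averaged predictions rise by 2 per grid step. When features are strongly correlated, overwriting one of them creates unrealistic rows, which is one reason explanations need to be validated for each use case.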
