home
Interpretable machine learning has gained prominence across many fields. Machine learning models are valued for their ability to capture complex relationships within data through sophisticated fitting algorithms. Interpretability frameworks complement these models by providing tools for opening up such "black-box" models. These interpretable approaches deliver insights by ranking feature importance, identifying nonlinear response thresholds, and analyzing interactions between factors.
The mymodels project aims to provide a small, user-friendly, and efficient workflow for researchers and students who want to apply interpretable machine learning in their research.
- Onehot
- Ordinal
- Binary
| Regression `model_name` | Regression model | Classification `model_name` | Classification model |
|---|---|---|---|
| lr | Linear Regression | lc | Logistic Regression |
| svr | Support Vector Regression | svc | Support Vector Classification |
| knr | K-Nearest Neighbors Regression | knc | K-Nearest Neighbors Classification |
| mlpr | Multi-Layer Perceptron Regressor | mlpc | Multi-Layer Perceptron Classifier |
| dtr | Decision Tree Regressor | dtc | Decision Tree Classifier |
| rfr | Random Forest Regressor | rfc | Random Forest Classifier |
| gbdtr | Gradient Boosted Decision Trees (GBDT) Regressor | gbdtc | Gradient Boosted Decision Trees (GBDT) Classifier |
| adar | AdaBoost Regressor | adac | AdaBoost Classifier |
| xgbr | XGBoost Regressor | xgbc | XGBoost Classifier |
| lgbr | LightGBM Regressor | lgbc | LightGBM Classifier |
| catr | CatBoost Regressor | catc | CatBoost Classifier |
To ensure system stability and optimal performance, thread management must adhere to one of the following protocols:
- Eliminate Nesting: Refactor the process to remove nested parallelism. Parallel execution should be restricted to the single, most appropriate layer.
- Constrain Thread Product: If nested parallelism is required, the product of the thread counts at all levels must be less than or equal to ($\le$) the total number of available logical cores.
For example:
- System Specification: 32 available logical cores.
- Process Architecture:
  - Level 1: Main process (executing 5-fold cross-validation).
  - Level 2: Model (internal parallelism).
- Recommended Configuration:
  - Set Level 1 parallelism = 5
  - Set Level 2 (model) maximum parallelism = 6
- Result: The total number of concurrent threads ($5 \times 6 = 30$) remains within the system capacity of 32 cores.
- Performance Tuning: Initial testing indicates that performance is optimized when the main process parallelism (Level 1) is set equal to the number of cross-validation folds.
- Resource Monitoring: Concurrently monitor memory utilization. Increasing parallelism introduces significant memory overhead, which must be managed.
- Validation: It is standard procedure to profile multiple nested parallelism combinations to identify the optimal configuration for the specific workload.
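The thread-product rule above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the standard library. The helper names `max_inner_threads` and `candidate_configs` are hypothetical and are not part of mymodels. It computes the largest Level 2 thread count for a given Level 1 setting and lists the (Level 1, Level 2) pairs that stay within the core budget, which could then be profiled.

```python
from __future__ import annotations

import os


def max_inner_threads(total_cores: int, outer_workers: int) -> int:
    """Largest inner (Level 2) thread count with outer * inner <= total_cores."""
    if outer_workers < 1 or total_cores < outer_workers:
        raise ValueError("need 1 <= outer_workers <= total_cores")
    return total_cores // outer_workers


def candidate_configs(total_cores: int, max_outer: int) -> list[tuple[int, int]]:
    """All (outer, inner) pairs with outer <= max_outer and outer * inner <= total_cores."""
    return [
        (outer, inner)
        for outer in range(1, max_outer + 1)
        for inner in range(1, total_cores // outer + 1)
    ]


if __name__ == "__main__":
    # The example from the text: 32 logical cores, 5-fold cross-validation.
    print(max_inner_threads(32, 5))

    # On the current machine: pairs worth profiling with up to 5 outer workers.
    cores = os.cpu_count() or 1
    print(candidate_configs(cores, max_outer=5))
```

For the 32-core example, `max_inner_threads(32, 5)` gives 6, matching the recommended configuration. Which of the valid pairs is fastest still has to be measured on the actual workload, with memory usage watched at the same time.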
- This project is intended solely for scientific reference. It may contain calculation errors or logical inaccuracies. Users are responsible for verifying the accuracy of the results independently, and the author shall not be held liable for any consequences arising from the use of this code.
- Due to the developer's limited time and expertise, the project inevitably has shortcomings. Critiques and suggestions for improvement from fellow professionals are sincerely welcome.
- Note that explanations may not always be meaningful for real-world tasks, especially after data engineering. Users are solely responsible for validating the appropriateness of explanation methods for their specific use cases.
- The project is not suitable for time-series tasks.
- The hyperparameters shown in `models_configs.yml` are for demonstration purposes only. Users should try different hyperparameters in their actual applications to ensure the robustness of their results.
- The `random_state` is set to `0` for demonstration purposes only. Users should try different `random_state` values in their actual applications to ensure the robustness of their results.
- Explanations in this project are currently based on SHAP and PDP (Partial Dependence Plot). Support for other explanation methods (e.g., ALE, FI) is under development.
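One way to act on the `random_state` note above is to repeat the whole workflow with several seeds and look at the spread of the resulting metric. The sketch below is a generic, standard-library illustration. `summarize_over_seeds` and `toy_run` are hypothetical names, and `toy_run` only stands in for a real train-and-evaluate cycle. Neither is part of mymodels.

```python
import random
import statistics
from typing import Callable, Iterable


def summarize_over_seeds(run: Callable[[int], float], seeds: Iterable[int]) -> dict:
    """Call run(seed) for every seed and summarize the returned scores."""
    scores = [run(seed) for seed in seeds]
    if len(scores) < 2:
        raise ValueError("use at least two seeds to estimate the spread")
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


def toy_run(seed: int) -> float:
    """Hypothetical stand-in: replace with fitting and scoring a model using this seed."""
    rng = random.Random(seed)
    return 0.8 + rng.uniform(-0.05, 0.05)


if __name__ == "__main__":
    print(summarize_over_seeds(toy_run, seeds=range(10)))
```

A large standard deviation relative to the mean suggests that conclusions drawn from a single seed, including feature rankings, should be treated with caution.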