This project focuses on building and improving a machine learning model to predict wine quality based on physicochemical attributes. Through iterative steps including feature engineering, data balancing, and hyperparameter tuning, the model's performance was significantly enhanced.
- Source: Kaggle White Wine Quality Dataset
- Description: Contains physicochemical properties of white wines and their quality scores (3 to 9).
- Features: 11 physicochemical properties such as alcohol, pH, and residual sugar.
- Target Variable: Wine quality (integer scores).
- Objective: Establish a benchmark for comparison.
- Results:
- R² Score: 0.546
- RMSE: 0.593
- Insights: The baseline struggled with the imbalanced quality distribution and did not capture relationships between features.
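The notebook itself is not reproduced here, but the baseline step amounts to fitting an untuned regressor and scoring it with R² and RMSE. A minimal sketch, using synthetic data in place of the wine CSV and scikit-learn's `GradientBoostingRegressor` as a stand-in model:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 11-feature wine dataset (assumption:
# the real notebook loads the Kaggle CSV instead).
X, y = make_regression(n_samples=500, n_features=11, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Untuned model, default hyperparameters — this is what "baseline" means here.
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# The two metrics reported throughout this README.
r2 = r2_score(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"R²: {r2:.3f}, RMSE: {rmse:.3f}")
```

The same two metrics computed this way are what the baseline and improved models are compared on below.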
- Objective: Enhance the dataset with new meaningful features to improve predictive accuracy.
- New Features Added:
  - `sugar_alcohol_ratio`
  - `volatile_acidity_pH_ratio`
  - Interaction terms and other domain-specific features.
- Impact: Improved the model's ability to capture complex relationships.
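The two named ratio features are straightforward column arithmetic in pandas. A sketch on a hypothetical mini-frame with the needed columns (the interaction term shown is an assumed example, as the README does not list the interaction features individually):

```python
import pandas as pd

# Toy frame with the columns the ratios need; the real notebook applies
# the same transforms to the full wine DataFrame.
df = pd.DataFrame({
    "residual sugar": [1.6, 6.9, 8.1],
    "alcohol": [8.8, 9.5, 10.1],
    "volatile acidity": [0.27, 0.30, 0.28],
    "pH": [3.00, 3.30, 3.26],
})

# The two engineered ratios named above.
df["sugar_alcohol_ratio"] = df["residual sugar"] / df["alcohol"]
df["volatile_acidity_pH_ratio"] = df["volatile acidity"] / df["pH"]

# Example interaction term (hypothetical — not listed explicitly in the README).
df["alcohol_x_pH"] = df["alcohol"] * df["pH"]
```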
- Objective: Balance the dataset to prevent bias toward majority classes.
- Method: Applied Synthetic Minority Oversampling Technique (SMOTE).
- Result: The resampled dataset ensured equal representation of all quality levels.
- Objective: Optimize XGBoost model parameters for better performance.
- Key Parameters Tuned:
  `n_estimators`, `max_depth`, `learning_rate`, `subsample`
- Outcome:
- R² Score: 0.954
- RMSE: 0.424
- Feature Importance: Identified key features using SHAP and XGBoost.
- Error Analysis: Highlighted areas for improvement, particularly for underrepresented quality levels.
| Metric | Baseline Model | Improved Model |
|---|---|---|
| R² Score | 0.546 | 0.954 |
| RMSE | 0.593 | 0.424 |
- Python Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `xgboost`, `scikit-learn`, `imbalanced-learn`, `shap`.
- Clone the repository and download the dataset.
- Install dependencies:
  `pip install -r requirements.txt`
- Run the notebook to execute the analysis and model training steps.
- Feature Engineering: Explore external data sources or new feature combinations.
- Advanced Models: Experiment with ensemble methods or neural networks.
- Explainability: Utilize advanced interpretability tools like SHAP or LIME.
- Validation: Test the model on an independent validation dataset.
This project demonstrates how iterative improvements in data preprocessing, feature engineering, and model optimization can lead to significant gains in predictive performance. The resulting model is robust and provides valuable insights into the factors influencing wine quality.
For questions or collaborations, feel free to reach out at [[email protected]].