A machine learning project that predicts wine quality from physicochemical properties. It uses the Wine Quality Dataset from https://www.kaggle.com/datasets/yasserh/wine-quality-dataset
wine-quality-prediction/
├── data/ # Dataset directory
│ └── wine_quality.csv # Wine dataset (to be added)
├── models/ # Trained models
├── src/ # Source code
│ ├── data_loader.py # Data loading and preprocessing
│ ├── model_trainer.py # Model training utilities
│ └── predictor.py # Prediction utilities
├── notebooks/ # Jupyter notebooks
│ └── wine_analysis.ipynb # Data analysis notebook
├── train_model.py # Main training script
└── requirements.txt # Python dependencies
- Install dependencies:
  pip install -r requirements.txt
- Add your dataset:
  - Place your wine quality dataset at data/wine_quality.csv
  - Expected format: CSV with wine attributes and a 'quality' column
  - Common wine attributes: fixed_acidity, volatile_acidity, citric_acid, residual_sugar, chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH, sulphates, alcohol
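Before training, it can help to confirm that the CSV header matches the expected format described above. The following is a minimal sketch using only the standard library; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, not part of this project, and the header spellings in a downloaded file may differ from the names listed here (for example, spaces instead of underscores).

```python
import csv
import io

# Column names taken from the attribute list above; 'quality' is the target.
EXPECTED_COLUMNS = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Demo on an in-memory CSV that lacks the 'alcohol' column.
sample = io.StringIO(
    "fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,"
    "free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,quality\n"
    "7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,5\n"
)
print(missing_columns(sample))  # ['alcohol']
```

For a real file, the same function can be called on `open("data/wine_quality.csv", newline="")`; an empty result means every expected column is present.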
Run the main training script:
python train_model.py
This will:
- Load and preprocess the data
- Train multiple ML models (Random Forest, Gradient Boosting, Linear Regression, SVR)
- Compare model performance
- Save the best model to models/
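The steps above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the contents of `train_model.py`: it uses synthetic data in place of the wine CSV, 5-fold cross-validated RMSE as the comparison metric, and a scaler in front of SVR. The names `candidates`, `scores`, and `best_model` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the wine data: 11 features, one numeric target.
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=0)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "linear_regression": LinearRegression(),
    # SVR is sensitive to feature scale, so it is wrapped with a scaler.
    "svr": make_pipeline(StandardScaler(), SVR()),
}

# Mean 5-fold cross-validated RMSE per model (lower is better).
scores = {}
for name, model in candidates.items():
    neg_rmse = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    scores[name] = -neg_rmse.mean()

# Pick the model with the lowest RMSE and refit it on all of the data.
best_name = min(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
print(best_name, scores)
```

A fitted model like `best_model` could then be written to disk with `joblib.dump(best_model, "models/wine_quality_model.joblib")`, matching the path used in the prediction example.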
from src.predictor import WineQualityPredictor
# Initialize predictor with trained model
predictor = WineQualityPredictor('models/wine_quality_model.joblib')
# Predict single wine quality
wine_features = {
    'fixed_acidity': 7.4,
    'volatile_acidity': 0.7,
    'citric_acid': 0.0,
    'residual_sugar': 1.9,
    'chlorides': 0.076,
    'free_sulfur_dioxide': 11.0,
    'total_sulfur_dioxide': 34.0,
    'density': 0.9978,
    'pH': 3.51,
    'sulphates': 0.56,
    'alcohol': 9.4
}
predicted_quality = predictor.predict_single_wine(wine_features)
print(f"Predicted quality: {predicted_quality:.2f}")
Open the Jupyter notebook for interactive analysis:
jupyter notebook notebooks/wine_analysis.ipynb
The project includes several ML algorithms:
- Random Forest: Ensemble method, good baseline
- Gradient Boosting: Often performs well on tabular data
- Linear Regression: Simple interpretable model
- SVR: Support Vector Regression
Key features:
- Data Loading: Flexible CSV loading with preprocessing
- Model Training: Multiple algorithms with cross-validation
- Model Comparison: Automatic comparison of different models
- Feature Importance: Analysis of which features matter most
- Prediction Confidence: Uncertainty estimation for ensemble models
- Batch Prediction: Process multiple wines at once
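One common way to estimate uncertainty for an ensemble is to look at how much the individual trees of a random forest disagree. The sketch below shows that idea on a batch of rows; it is an assumption about how such a feature might work, not a description of this project's implementation, and `predict_with_spread` is a hypothetical helper. The spread across trees is a heuristic rather than a calibrated confidence interval.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the wine data: 11 features, one numeric target.
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)


def predict_with_spread(model, rows):
    """Return (mean, std) of the per-tree predictions for each row."""
    # Shape (n_trees, n_rows): one prediction per tree per input row.
    per_tree = np.stack([tree.predict(rows) for tree in model.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Batch prediction: five rows in, five means and five spreads out.
mean, spread = predict_with_spread(forest, X[:5])
for m, s in zip(mean, spread):
    print(f"prediction {m:.2f} (tree std {s:.2f})")
```

A larger standard deviation indicates that the trees disagree more on that row, which can be read as lower confidence in the averaged prediction.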
Expected wine attributes (typical wine quality dataset):
- fixed_acidity: Fixed acidity level
- volatile_acidity: Volatile acidity level
- citric_acid: Citric acid content
- residual_sugar: Residual sugar content
- chlorides: Chloride content
- free_sulfur_dioxide: Free sulfur dioxide
- total_sulfur_dioxide: Total sulfur dioxide
- density: Wine density
- pH: pH level
- sulphates: Sulphate content
- alcohol: Alcohol percentage
- quality: Target variable (wine quality score)
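Models trained on tabular data generally expect features in a consistent column order, so a feature dictionary usually has to be converted into an ordered row before prediction. The sketch below assumes the order matches the attribute list above; `FEATURE_ORDER`, `to_feature_row`, and `to_feature_rows` are hypothetical names and may not match what `src/predictor.py` does internally.

```python
# Assumed feature order, following the attribute list above (target excluded).
FEATURE_ORDER = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def to_feature_row(wine):
    """Convert a feature dict into a list of floats in FEATURE_ORDER."""
    missing = [name for name in FEATURE_ORDER if name not in wine]
    if missing:
        raise KeyError(f"missing attributes: {missing}")
    return [float(wine[name]) for name in FEATURE_ORDER]


def to_feature_rows(wines):
    """Convert several feature dicts into a list of ordered rows."""
    return [to_feature_row(wine) for wine in wines]


example = {
    "alcohol": 9.4, "pH": 3.51, "density": 0.9978, "sulphates": 0.56,
    "fixed_acidity": 7.4, "volatile_acidity": 0.7, "citric_acid": 0.0,
    "residual_sugar": 1.9, "chlorides": 0.076,
    "free_sulfur_dioxide": 11.0, "total_sulfur_dioxide": 34.0,
}
print(to_feature_row(example))
```

Because the row is built from `FEATURE_ORDER` rather than from the dictionary's own key order, callers can pass attributes in any order, and a missing attribute fails early with a clear error.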