Python Project 2 Colab

The document outlines a data analysis project on the Wine Quality dataset, detailing the importation of necessary libraries, data preprocessing, and exploratory data analysis. Key findings include the most frequent wine quality, correlations between various features and wine quality, and the training of Decision Tree and Random Forest models to predict wine quality with accuracy scores of 0.675 and 0.725, respectively. The Random Forest model outperformed the Decision Tree model in terms of accuracy.

Uploaded by

Gaurav Rajula

1/22/25, 8:14 PM Finlatics project 2 .ipynb - Colab

FINLATICS Project 2

In this project we analyse the Wine Quality dataset.

# importing necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# importing the data set

df = pd.read_csv('/content/wine_data.csv')

# data preprocessing

df.head()

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        …
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        …
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        …
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        …
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        …


# Checking info about the dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fixed acidity 1599 non-null float64
1 volatile acidity 1599 non-null float64
2 citric acid 1599 non-null float64
3 residual sugar 1599 non-null float64
4 chlorides 1599 non-null float64
5 free sulfur dioxide 1599 non-null float64
6 total sulfur dioxide 1599 non-null float64
7 density 1599 non-null float64
8 pH 1599 non-null float64
9 sulphates 1599 non-null float64
10 alcohol 1599 non-null float64
11 quality 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB

https://colab.research.google.com/drive/1LFtPIQYXP4WcNZ6Aik3xLZMMsvVO1HvQ#scrollTo=mG7oet9mwoPi&printMode=true
# Checking for missing values and duplicates
print(df.isnull().sum())

print("checking duplicate rows")

print(df.duplicated().sum())

print("describing data")

print(df.describe())

fixed acidity 0
volatile acidity 0
citric acid 0
residual sugar 0
chlorides 0
free sulfur dioxide 0
total sulfur dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64
checking duplicate rows
240
describing data
fixed acidity volatile acidity citric acid residual sugar \
count 1599.000000 1599.000000 1599.000000 1599.000000
mean 8.319637 0.527821 0.270976 2.538806
std 1.741096 0.179060 0.194801 1.409928
min 4.600000 0.120000 0.000000 0.900000
25% 7.100000 0.390000 0.090000 1.900000
50% 7.900000 0.520000 0.260000 2.200000
75% 9.200000 0.640000 0.420000 2.600000
max 15.900000 1.580000 1.000000 15.500000

chlorides free sulfur dioxide total sulfur dioxide density \
count 1599.000000 1599.000000 1599.000000 1599.000000
mean 0.087467 15.874922 46.467792 0.996747
std 0.047065 10.460157 32.895324 0.001887
min 0.012000 1.000000 6.000000 0.990070
25% 0.070000 7.000000 22.000000 0.995600
50% 0.079000 14.000000 38.000000 0.996750
75% 0.090000 21.000000 62.000000 0.997835
max 0.611000 72.000000 289.000000 1.003690

pH sulphates alcohol quality
count 1599.000000 1599.000000 1599.000000 1599.000000
mean 3.311113 0.658149 10.422983 5.636023
std 0.154386 0.169507 1.065668 0.807569
min 2.740000 0.330000 8.400000 3.000000
25% 3.210000 0.550000 9.500000 5.000000
50% 3.310000 0.620000 10.200000 6.000000
75% 3.400000 0.730000 11.100000 6.000000
max 4.010000 2.000000 14.900000 8.000000
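The duplicate check above finds 240 repeated rows, and the notebook keeps them for every later step. One caution: exact duplicates that end up on both sides of a train/test split can make test accuracy look better than it would be on genuinely unseen wines. A minimal sketch of dropping them with pandas' drop_duplicates follows; the helper name drop_duplicate_rows and the toy frame (taken from the df.head() preview, where row 4 repeats row 0) are illustrative only.

```python
import pandas as pd


def drop_duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without exact duplicate rows (the first copy is kept)."""
    n_duplicates = int(df.duplicated().sum())
    deduplicated = df.drop_duplicates(keep="first").reset_index(drop=True)
    print(f"removed {n_duplicates} duplicate rows, {len(deduplicated)} rows remain")
    return deduplicated


# Toy stand-in for wine_data.csv: a few columns from the df.head() preview.
# Row 4 is identical to row 0, just as in the notebook's preview.
toy = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
    "volatile acidity": [0.70, 0.88, 0.76, 0.28, 0.70],
    "citric acid": [0.00, 0.00, 0.04, 0.56, 0.00],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
})

clean = drop_duplicate_rows(toy)  # prints: removed 1 duplicate rows, 4 rows remain
```

In the notebook this would be df = drop_duplicate_rows(df), which should leave 1599 - 240 = 1359 rows; the statistics and accuracies reported later would then likely change.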

1. What is the most frequently occurring wine quality? What are the highest and lowest values in the quality column?

# Most frequently occurring wine quality

most_frequent_quality = df['quality'].mode()[0]
quality_count = df['quality'].value_counts()

# Highest and lowest values in the 'quality' column

highest_quality = df['quality'].max()
lowest_quality = df['quality'].min()


print("Most frequent wine quality : ",most_frequent_quality)

print("Frequency of each wine quality : ", quality_count)
print("Highest wine quality : " ,highest_quality)
print("Lowest wine quality : " ,lowest_quality)

Most frequent wine quality : 5

Frequency of each wine quality : quality
5 681
6 638
7 199
4 53
8 18
3 10
Name: count, dtype: int64
Highest wine quality : 8
Lowest wine quality : 3
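The counts above show that quality 5 (681 of 1599 samples, about 42.6%) and quality 6 (638 samples) dominate, while the extremes 3 and 8 have only 10 and 18 samples. A small sketch on a hypothetical Series shows how value_counts(normalize=True) turns the counts into shares, and that mode() matches the top entry of value_counts() when there is no tie.

```python
import pandas as pd

# Hypothetical quality labels; in the notebook this would be df['quality'].
quality = pd.Series([5, 6, 5, 7, 5, 6, 3, 8, 6, 5], name="quality")

counts = quality.value_counts()                # absolute frequency, most common first
shares = quality.value_counts(normalize=True)  # relative frequency, sums to 1

most_frequent = quality.mode()[0]
assert most_frequent == counts.idxmax()        # agree here because there is no tie

print("most frequent:", most_frequent)                        # 5
print("share of most frequent:", shares[most_frequent])       # 0.4
print("range:", quality.min(), "to", quality.max())           # 3 to 8
```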

2. How is fixed acidity correlated to the quality of the wine? How does the alcohol content affect the quality?
How is the free Sulphur dioxide content correlated to the quality of the wine?

# finding correlations between given features

corr_fixed_acidity = df['fixed acidity'].corr(df['quality'])

corr_alcohol = df['alcohol'].corr(df['quality'])
corr_free_sulfur_dioxide = df['free sulfur dioxide'].corr(df['quality'])

print("correlation between fixed acidity and quality of wine : ",corr_fixed_acidity)

print("correlation between alcohol and quality of wine : ",corr_alcohol)

print("correlation between free sulfur dioxide and quality of wine : ",corr_free_sulfur_dioxide)

correlation between fixed acidity and quality of wine : 0.12405164911322428

correlation between alcohol and quality of wine : 0.4761663239995365
correlation between free sulfur dioxide and quality of wine : -0.0506560572442763
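Read as Pearson coefficients, these values indicate a moderate positive linear association for alcohol (about 0.48), a weak positive one for fixed acidity (about 0.12) and essentially none for free sulfur dioxide (about -0.05). Correlation measures linear association, not cause and effect, so on its own it does not show that alcohol "affects" quality. To see where these three features sit among all eleven, one option is to rank every feature by its correlation with quality. The sketch assumes a numeric DataFrame; the helper name correlations_with_target and the synthetic data are illustrative.

```python
import numpy as np
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Pearson correlation of every other column with `target`, strongest |r| first."""
    corr = df.drop(columns=target).corrwith(df[target])
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic stand-in: quality rises with alcohol, falls with volatile acidity,
# and ignores a pure-noise column.
rng = np.random.default_rng(0)
n = 200
alcohol = rng.normal(10.4, 1.0, n)
volatile_acidity = rng.normal(0.53, 0.18, n)
noise = rng.normal(0.0, 1.0, n)
quality = 0.5 * alcohol - 3.0 * volatile_acidity + rng.normal(0.0, 0.5, n)

toy = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": volatile_acidity,
    "noise": noise,
    "quality": quality,
})
ranked = correlations_with_target(toy)
print(ranked)
```

Called as correlations_with_target(df) in the notebook, it would list all eleven features; the entries for alcohol, fixed acidity and free sulfur dioxide should match the three values printed above, since both use Pearson correlation.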

# visualizing the given correlations

import seaborn as sns

plt.figure(figsize=(15, 5))  # wide canvas for three side-by-side subplots

# Fixed acidity vs Quality

plt.subplot(1, 3, 1)
sns.scatterplot(x='fixed acidity', y='quality', data=df, alpha=0.5)
plt.title('Fixed Acidity vs Quality')
plt.xlabel('Fixed Acidity')
plt.ylabel('Quality')

# Alcohol vs Quality
plt.subplot(1, 3, 2)
sns.scatterplot(x='alcohol', y='quality', data=df, alpha=0.5, color='orange')
plt.title('Alcohol vs Quality')
plt.xlabel('Alcohol')
plt.ylabel('Quality')

# Free Sulfur Dioxide vs Quality

plt.subplot(1, 3, 3)
sns.scatterplot(x='free sulfur dioxide', y='quality', data=df, alpha=0.5, color='green')
plt.title('Free Sulfur Dioxide vs Quality')

plt.xlabel('Free Sulfur Dioxide')
plt.ylabel('Quality')

plt.tight_layout()
plt.show()
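Because quality only takes the integer values 3 to 8, the scatter plots above stack the points into six horizontal bands, which can hide how a feature shifts between levels. A box plot per quality level is one alternative. This sketch uses plain matplotlib rather than seaborn; the helper boxplot_by_quality and the synthetic data are assumptions made for illustration, and the Agg backend line is only there so the sketch runs without a display (omit it in Colab and call plt.show()).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running as a script; omit this line in Colab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplot_by_quality(df: pd.DataFrame, feature: str, target: str = "quality"):
    """Draw one box of `feature` per level of `target`; return (fig, ax, artists)."""
    levels = sorted(df[target].unique())
    groups = [df.loc[df[target] == level, feature].to_numpy() for level in levels]
    fig, ax = plt.subplots(figsize=(6, 4))
    artists = ax.boxplot(groups)  # dict of artist lists: "boxes", "medians", ...
    ax.set_xticks(range(1, len(levels) + 1))  # boxplot places boxes at x = 1..k
    ax.set_xticklabels([str(level) for level in levels])
    ax.set_xlabel(target.capitalize())
    ax.set_ylabel(feature.capitalize())
    ax.set_title(f"{feature.capitalize()} by {target}")
    return fig, ax, artists


# Synthetic stand-in where alcohol tends to rise with quality.
rng = np.random.default_rng(1)
quality = rng.choice([4, 5, 6, 7], size=300)
alcohol = 8.0 + 0.4 * quality + rng.normal(0.0, 0.8, size=300)
toy = pd.DataFrame({"alcohol": alcohol, "quality": quality})

fig, ax, artists = boxplot_by_quality(toy, "alcohol")
```

With the notebook's data, boxplot_by_quality(df, "alcohol") would draw six boxes, one for each quality level from 3 to 8.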

3. What is the average residual sugar for the best quality wine and the lowest quality wine in the dataset?

# average residual sugar for the best quality wine and the lowest quality wine

residual_sugar_best_quality = df[df['quality'] == df['quality'].max()]['residual sugar'].mean()

residual_sugar_lowest_quality = df[df['quality'] == df['quality'].min()]['residual sugar'].mean()

print("Average residual sugar for the best quality wine : ",residual_sugar_best_quality)

print("Average residual sugar for the lowest quality wine : ",residual_sugar_lowest_quality)

Average residual sugar for the best quality wine : 2.5777777777777775

Average residual sugar for the lowest quality wine : 2.6350000000000002
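The two averages differ by only about 0.06 (2.58 for quality 8 versus 2.64 for quality 3), and they rest on just 18 and 10 samples respectively, so the gap may not mean much. Looking at every quality level, with the sample count shown next to the mean and median, gives more context. A minimal groupby sketch; the helper name feature_by_quality and the toy values are hypothetical.

```python
import pandas as pd


def feature_by_quality(df: pd.DataFrame, feature: str, target: str = "quality") -> pd.DataFrame:
    """Mean, median and sample count of `feature` at each level of `target`."""
    return df.groupby(target)[feature].agg(["mean", "median", "count"])


# Hypothetical toy data; in the notebook, call feature_by_quality(df, "residual sugar").
toy = pd.DataFrame({
    "quality": [3, 3, 5, 5, 5, 8, 8],
    "residual sugar": [2.0, 3.0, 1.9, 2.6, 2.3, 2.2, 2.8],
})
summary = feature_by_quality(toy, "residual sugar")
print(summary)
```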


4. Does volatile acidity have an effect on the quality of the wine samples in the dataset?

# correlation of volatile acidity and wine quality

corr_volatile_acidity = df['volatile acidity'].corr(df['quality'])

print("correlation between volatile acidity and wine quality : ",corr_volatile_acidity)

# Scatter plot to visualize the relationship

plt.figure(figsize=(8, 5))
sns.scatterplot(x='volatile acidity', y='quality', data=df, alpha=0.5, color='green')
plt.title('Volatile Acidity vs Wine Quality')
plt.xlabel('Volatile Acidity')
plt.ylabel('Wine Quality')
plt.show()

correlation between volatile acidity and wine quality : -0.390557780264007
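A coefficient of about -0.39 indicates a moderate negative linear association: samples with higher volatile acidity tend to have lower quality scores. One way to make that concrete is to cut volatile acidity into quartiles with pd.qcut and compare the mean quality in each. The sketch uses synthetic data spanning the 0.12 to 1.58 range reported by describe(); the helper name mean_quality_by_quartile is hypothetical.

```python
import numpy as np
import pandas as pd


def mean_quality_by_quartile(df: pd.DataFrame, feature: str, target: str = "quality") -> pd.Series:
    """Bin `feature` into four equal-count groups and return the mean `target` per group."""
    quartile = pd.qcut(df[feature], q=4, labels=["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"])
    return df.groupby(quartile, observed=True)[target].mean()


# Synthetic stand-in: quality drifts down as volatile acidity goes up.
rng = np.random.default_rng(2)
volatile_acidity = rng.uniform(0.12, 1.58, 400)
quality = np.clip(np.round(6.5 - 2.0 * volatile_acidity + rng.normal(0.0, 0.6, 400)), 3, 8)
toy = pd.DataFrame({"volatile acidity": volatile_acidity, "quality": quality})

by_quartile = mean_quality_by_quartile(toy, "volatile acidity")
print(by_quartile)
```

Applied to the notebook's df, the same call would show whether the average quality steps down from the lowest to the highest quartile, as the negative coefficient suggests.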

5. Train a Decision Tree model and a Random Forest model separately to predict the quality of the given wine samples. Compare the accuracy scores of both models.

# Train two classifiers and compare their accuracy scores.
# First, import the models and split the data into training and test sets.

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Splitting data into features (X) and target (y)

X = df.drop(columns=['quality'])
y = df['quality']

# Train-test split (80% train, 20% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
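The quality classes are very unbalanced (10 samples of quality 3 and 18 of quality 8 against 681 of quality 5), so a purely random 80/20 split can leave the test set with very few, or none, of the rarest samples. Passing stratify=y to train_test_split keeps the class proportions roughly equal in both parts. This is an optional variation, not what the notebook ran, so the accuracies reported below would likely shift if it were used. The sketch rebuilds labels with the notebook's class counts and a placeholder feature column, using separate *_demo names so it does not overwrite the notebook's variables.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Labels rebuilt from the value_counts() output earlier in the notebook.
class_counts = {5: 681, 6: 638, 7: 199, 4: 53, 8: 18, 3: 10}
y_demo = pd.Series(np.repeat(list(class_counts), list(class_counts.values())), name="quality")
X_demo = pd.DataFrame({"placeholder feature": np.arange(len(y_demo))})  # hypothetical feature

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3, stratify=y_demo
)

# Class shares in the full data and in the stratified test set should be close.
comparison = pd.DataFrame({
    "full": y_demo.value_counts(normalize=True),
    "test": y_te.value_counts(normalize=True),
}).sort_index()
print(comparison)
```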


# Decision Tree Model

dt_model = DecisionTreeClassifier(random_state=3)
# fit the model on the training set, then predict the test set
dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)
# accuracy score of decision tree model
dt_accuracy = accuracy_score(y_test, y_pred_dt)

print("accuracy score of decision tree model for the wine data : ",dt_accuracy)

# Random Forest Model

rf_model = RandomForestClassifier(random_state=3)
# fit the model on the training set, then predict the test set
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
# accuracy score of random forest model
rf_accuracy = accuracy_score(y_test, y_pred_rf)

print("accuracy score of random forest model for the wine data : ",rf_accuracy)

# comparing accuracy score of both models

print("for the given wine data")

print("accuracy score of decision tree model : ",dt_accuracy)
print("accuracy score of random forest model : ",rf_accuracy)

if dt_accuracy > rf_accuracy:
    print("Decision Tree model performs better.")
elif dt_accuracy < rf_accuracy:
    print("Random Forest model performs better.")
else:
    print("Both models have the same accuracy.")

accuracy score of decision tree model for the wine data : 0.675
accuracy score of random forest model for the wine data : 0.725
for the given wine data
accuracy score of decision tree model : 0.675
accuracy score of random forest model : 0.725
Random Forest model performs better.
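The 0.675 versus 0.725 comparison comes from a single test split of 320 rows (20% of 1599), so part of the 0.05 gap could reflect which rows happened to land in the test set. One way to check is stratified k-fold cross-validation, which scores each model on several different splits and reports a mean and a spread. The sketch runs on synthetic data from make_classification so it is self-contained; in the notebook you would pass X and y instead, and no particular result is implied for the wine data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine features and labels (11 features, like the notebook's X).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=3
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
models = {
    "decision tree": DecisionTreeClassifier(random_state=3),
    "random forest": RandomForestClassifier(random_state=3),
}

results = {}
for name, model in models.items():
    # One accuracy score per fold; each fold is held out once while the rest trains.
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

A gap that holds up across folds is stronger evidence that the forest beats the single tree than a difference measured on one split.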
