MACHINE LEARNING USING PYTHON
CAP4013L
School of Engineering & Sciences
Department of Computer Sciences and Engineering
Practical File
Submitted By
Student Name Pankaj Kumar
Enrolment Number 220160307085
Programme Master of Computer Application
Department Computer Science and Engineering
Session/Semester 2022-2024/Third Semester
Submitted To
Faculty Name Dr. Apeksha Mittal
INDEX
S.no Aim of Experiment Date Sign
1. Download a dataset from Kaggle (.csv format, at least 1000 rows and 20 columns) and
write a program in the Python programming language to perform the following operations:
i) Read the dataset file in Python IDE.
ii) Display the dataset.
iii) Display the shape of the dataset.
iv) Display the datatypes of the attributes of the dataset.
v) Find out the mean, median and mode of all the numeric columns.
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance, etc.
Date: 19-OCT-2023
2. Write a program in Python to implement Linear Regression.
Date: 30-OCT-2023
3. Write a program in Python to implement Binary Logistic Regression on a dataset
downloaded from Kaggle.
Date: 03-NOV-2023
4. Write a program in Python to implement Naïve Bayes on the iris dataset. Study the
confusion matrix.
Date: 08-NOV-2023
5. Write a program in Python to implement the Naïve Bayes Algorithm on a dataset from Kaggle.
Also print Confusion Matrix, Accuracy, Precision, Recall.
Date: 16-NOV-2023
6. Write a program in Python to implement Support Vector Machine on the iris dataset.
Date: 21-NOV-2023
1. Download a dataset from Kaggle (.csv format, at least 1000 rows and 20
columns) and write a program in the Python programming language to perform
the following operations:
i) Read the dataset file in Python IDE.
ii) Display the dataset.
iii) Display the shape of the dataset.
iv) Display the datatypes of the attributes of the dataset.
v) Find out the mean, median and mode of all the numeric columns.
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance, etc.
# Import necessary libraries
import pandas as pd
i) Read the dataset file in Python IDE
# Replace 'path/to/match.csv' with the actual path to the downloaded Kaggle file
file_path = 'path/to/match.csv'
df = pd.read_csv(file_path)
ii) Display the dataset
print("Dataset:")
print(df)
iii) Display the shape of the dataset
print("\nShape of the dataset:")
print(df.shape)
iv) Display the datatypes of the attributes of the dataset
print("\nDatatypes of the attributes:")
print(df.dtypes)
v) Find out the mean, median, and mode of all the numeric columns
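# numeric_only=True skips text columns, which would otherwise raise an error in recent pandas versions
print("\nMean of numeric columns:")
print(df.mean(numeric_only=True))
print("\nMedian of numeric columns:")
print(df.median(numeric_only=True))
print("\nMode of numeric columns:")
# mode() can return several rows when values tie; the first row is reported here
print(df.mode(numeric_only=True).iloc[0])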
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance,
etc.
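print("\nSummary statistics of the dataset:")
print(df.describe())
# describe() does not report variance, so it is computed separately
print("\nVariance of numeric columns:")
print(df.var(numeric_only=True))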
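The same operations can be tried without a Kaggle download. The following is a minimal sketch, assuming a small hand-made DataFrame; the name `df_demo` and the columns `runs`, `wickets` and `venue` are hypothetical and are not taken from match.csv.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle file: two numeric columns and one text column
df_demo = pd.DataFrame({
    "runs": [120, 150, 150, 180],
    "wickets": [3, 5, 5, 7],
    "venue": ["A", "B", "B", "C"],
})

# numeric_only=True skips the text column, which has no mean or median
means = df_demo.mean(numeric_only=True)
medians = df_demo.median(numeric_only=True)
# mode() may return several rows on ties; row 0 holds the first mode of each column
modes = df_demo.mode(numeric_only=True).iloc[0]
# Sample variance (ddof=1), which describe() does not report
variances = df_demo.var(numeric_only=True)

print("Shape:", df_demo.shape)
print(df_demo.dtypes)
print(means, medians, modes, variances, sep="\n\n")
```

On a real dataset the only change needed should be replacing `df_demo` with the DataFrame returned by `pd.read_csv`.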
2. Write a program in python to implement Linear Regression
import numpy as np
import matplotlib.pyplot as plt
# Generate some random data for demonstration purposes
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Visualize the data
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Generated Data for Linear Regression')
plt.show()
# Linear Regression via the normal equation: theta = (X^T X)^(-1) X^T y
X_b = np.c_[np.ones((100, 1)), X] # Prepend a column of ones for the intercept (bias) term
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
# Print the calculated parameters
print("Intercept (theta_0):", theta_best[0][0])
print("Slope (theta_1):", theta_best[1][0])
# Make predictions on new data
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta_best)
# Plot the linear regression line
plt.plot(X_new, y_predict, "r-")
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Fit')
plt.show()
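As a cross-check on the normal-equation solution, the same fit can be obtained from NumPy's least-squares solver, which avoids forming an explicit matrix inverse. This is a sketch on freshly generated data, not the exact arrays above; the names `X_demo`, `y_demo`, `theta_normal` and `theta_lstsq` are illustrative.

```python
import numpy as np

# Synthetic data following the same model as above: y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((100, 1))

# Design matrix with a leading column of ones for the intercept
X_b_demo = np.c_[np.ones((100, 1)), X_demo]

# Normal equation: theta = (X^T X)^(-1) X^T y
theta_normal = np.linalg.inv(X_b_demo.T @ X_b_demo) @ X_b_demo.T @ y_demo

# Least-squares solver: minimises ||X theta - y||^2 without an explicit inverse
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b_demo, y_demo, rcond=None)

print("Normal equation:", theta_normal.ravel())
print("lstsq:          ", theta_lstsq.ravel())
```

Both approaches should agree to within floating-point error on well-conditioned data; the solver is generally the safer choice when columns of the design matrix are nearly collinear.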
3. Write a Program in python to implement Binary Logistic Regression on a
dataset downloaded from Kaggle
The Titanic dataset from Kaggle is used for this experiment.
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
# Load the Titanic dataset (replace 'Titanic.csv' with the actual file path)
df = pd.read_csv('Titanic.csv')
# Preprocess the data: keep only the feature columns and the target,
# then drop rows with missing values (mainly missing ages)
df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']].dropna()
# Convert categorical variables to numerical using one-hot encoding
df = pd.get_dummies(df, columns=['Sex'], drop_first=True)
# Separate features and target variable
X = df.drop('Survived', axis=1)
y = df['Survived']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features (optional but often recommended)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create and train the Logistic Regression model
logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = logreg_model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# Print the results
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Confusion Matrix:")
print(conf_matrix)
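To show what the fitted model computes, the sketch below reproduces the predicted probabilities by hand: a linear score passed through the sigmoid function. It assumes synthetic data from `make_classification` rather than the Titanic file, and the names `X_demo`, `y_demo`, `p_manual` and `labels_manual` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (a stand-in for the Titanic features)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = w . x + b for every sample
z = X_demo @ model.coef_.ravel() + model.intercept_[0]
# The sigmoid maps the score to the probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = model.predict_proba(X_demo)[:, 1]

# Thresholding the probability at 0.5 gives the predicted class
labels_manual = (p_manual > 0.5).astype(int)

print("Largest probability difference:", np.abs(p_manual - p_sklearn).max())
print("Labels match predict():", np.array_equal(labels_manual, model.predict(X_demo)))
```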
4. Write a program in Python to implement Naïve Bayes on the iris dataset. Study
the confusion matrix
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn import datasets
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a Naive Bayes model (Gaussian Naive Bayes for continuous features)
nb_model = GaussianNB()
# Train the model
nb_model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = nb_model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# Display the results
print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
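One way to study the confusion matrix is to derive per-class recall and precision from it directly. In scikit-learn's convention, rows are true classes and columns are predicted classes, so the diagonal holds the correct predictions. The sketch below repeats the same split and model as above; `cm`, `recall_per_class` and `precision_per_class` are illustrative names.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)

# Recall: correct predictions divided by the row total (all true members of the class)
recall_per_class = np.diag(cm) / cm.sum(axis=1)
# Precision: correct predictions divided by the column total (all predictions of the class)
precision_per_class = np.diag(cm) / cm.sum(axis=0)

for name, r, p in zip(iris.target_names, recall_per_class, precision_per_class):
    print(f"{name}: recall={r:.3f}, precision={p:.3f}")
```

Off-diagonal entries show which pairs of species the classifier confuses, which the single accuracy figure hides.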
5. Write a program in Python to implement the Naïve Bayes Algorithm on a
dataset from Kaggle. Also print Confusion Matrix, Accuracy, Precision, Recall.
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
# Load the Titanic dataset (download it from Kaggle, or load it with the seaborn library)
# For example, using seaborn (its column names are lowercase, e.g. 'survived'):
# import seaborn as sns
# df = sns.load_dataset('titanic')
# Assuming you have a 'Titanic.csv' file
df = pd.read_csv('Titanic.csv')
# Preprocess the data: keep only the feature columns and the target,
# then drop rows with missing values
df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']].dropna()
# Convert categorical variables to numerical using one-hot encoding
df = pd.get_dummies(df, columns=['Sex'], drop_first=True)
# Separate features and target variable
X = df.drop('Survived', axis=1)
y = df['Survived']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Naive Bayes model (Gaussian Naive Bayes for numerical features)
naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)
# Make predictions on the test set
y_pred = naive_bayes.predict(X_test)
# Evaluate the model
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
# Print the results
print("Confusion Matrix:")
print(conf_matrix)
print("\nAccuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
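The three metrics printed above all follow from the four cells of a binary confusion matrix. The sketch below works through the arithmetic on a small set of made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical, not Titanic results) and compares the hand-computed values with scikit-learn's.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels for illustration only (1 = survived, 0 = did not survive)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision_manual = tp / (tp + fp)                  # share of predicted positives that are correct
recall_manual = tp / (tp + fn)                     # share of actual positives that are found

print("tn, fp, fn, tp:", tn, fp, fn, tp)
print("Accuracy:", accuracy_manual)
print("Precision:", precision_manual)
print("Recall:", recall_manual)
```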
6. Write a program in python to implement Support Vector Machine on the iris
dataset.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Support Vector Machine model
svm_model = SVC(kernel='linear') # You can try different kernels like 'rbf', 'poly', etc.
svm_model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = svm_model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# Print the results
print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
# Visualization (2D plot for simplicity, considering only the first two features)
# The model above was trained on all four features and cannot predict on a 2D grid,
# so a separate SVM is fitted on sepal length and sepal width for the plot
svm_2d = SVC(kernel='linear')
svm_2d.fit(X[:, :2], y)
plt.figure(figsize=(8, 6))
# Plot the decision boundary
h = .02 # Step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
# Plot the points
scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Support Vector Machine on Iris Dataset')
plt.legend(*scatter.legend_elements(), title='Classes')
plt.show()
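The choice of kernel can be compared more reliably with cross-validation than with a single train/test split. The following is a minimal sketch, assuming 5-fold cross-validation and default SVC settings for each kernel; the `scores` dictionary is an illustrative name.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

# Mean 5-fold cross-validated accuracy for each kernel
scores = {}
for kernel in ("linear", "rbf", "poly"):
    cv_scores = cross_val_score(SVC(kernel=kernel), iris.data, iris.target, cv=5)
    scores[kernel] = cv_scores.mean()
    print(f"{kernel}: mean accuracy = {scores[kernel]:.3f}")
```

Each fold serves once as the test set, so every sample contributes to the estimate, which matters on a dataset of only 150 rows.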