Experiment-5
Objective: Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Source Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
# Load dataset
data = pd.read_csv("PlayTennis.csv")
# Encoding categorical features
label_encoders = {}
for column in data.columns[:-1]:  # Excluding target column
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le
target_encoder = LabelEncoder()
data['PlayTennis'] = target_encoder.fit_transform(data['PlayTennis'])
# Splitting dataset into train and test sets
X = data.drop(columns=['PlayTennis'])
y = data['PlayTennis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Naïve Bayes classifier
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# Predictions
y_pred = classifier.predict(X_test)
# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
# Correct and incorrect classifications
correct = (y_pred == y_test).sum()
incorrect = (y_pred != y_test).sum()
print(f"Correct Classifications: {correct}")
print(f"Incorrect Classifications: {incorrect}")
Explanation:
1. Load the Dataset
The script reads a CSV file (PlayTennis.csv) containing categorical features like Outlook,
Temperature, Humidity, Wind, and the target variable PlayTennis.
2. Encode Categorical Features
Since scikit-learn's Naïve Bayes implementations require numerical input, Label Encoding is used
to convert the categorical features into integers. Each categorical column (except the target) is
encoded with its own LabelEncoder().
The target column (PlayTennis) is encoded separately with its own encoder.
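For example, LabelEncoder() assigns integer codes in alphabetical order of the values, so a hypothetical Outlook column might be encoded as follows (the exact mapping depends on the values present in the file):

from sklearn.preprocessing import LabelEncoder
# Hypothetical Outlook values from a typical PlayTennis dataset
outlook = ["Sunny", "Overcast", "Rain", "Sunny", "Rain"]
le = LabelEncoder()
encoded = le.fit_transform(outlook)
# Codes follow alphabetical order: Overcast -> 0, Rain -> 1, Sunny -> 2
print(dict(zip(le.classes_, le.transform(le.classes_))))
print(encoded)  # [2 0 1 2 1]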
3. Split Dataset into Training & Testing Sets
The dataset is split into 80% training and 20% testing using train_test_split().
X (features) and y (target) are separated before splitting.
4. Train the Naïve Bayes Classifier
A Gaussian Naïve Bayes model (GaussianNB()) is trained on the training data.
For each class, the model estimates the mean and variance of every feature and uses these
Gaussian distributions to compute class-conditional probabilities.
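The fitted parameters can be inspected directly. A small sketch (theta_ holds the per-class feature means and var_ the per-class variances in scikit-learn 1.0+; older versions expose sigma_ instead of var_):

# After classifier.fit(X_train, y_train):
print(classifier.class_prior_)  # P(class), estimated from the training labels
print(classifier.theta_)        # per-class mean of each feature
print(classifier.var_)          # per-class variance of each feature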
5. Make Predictions
The classifier predicts outcomes on the test dataset.
6. Compute Accuracy
The model's accuracy is calculated using accuracy_score(y_test, y_pred).
This gives the proportion of correctly classified instances (correct predictions divided by the total number of predictions), reported here as a percentage.
7. Generate Confusion Matrix
The confusion matrix (confusion_matrix(y_test, y_pred)), whose rows correspond to actual classes and columns to predicted classes, shows the number of:
True Positives (Correct Yes)
True Negatives (Correct No)
False Positives (Incorrect Yes)
False Negatives (Incorrect No)
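For a binary Yes/No target, the four counts can be unpacked directly from the matrix and the accuracy recomputed from them as a sanity check (a small sketch reusing the variables defined above; ravel() applies to the binary case only):

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Accuracy from matrix: {(tn + tp) / (tn + fp + fn + tp):.2f}")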
8. Count Correct & Incorrect Classifications
The script calculates and prints the number of correctly and incorrectly classified instances.
Experiment-8
Objective: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data
set for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.
Source Code:
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the dataset
dataset = load_iris()
X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']
# Plotting
plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])
# Real Plot
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')
# KMeans Plot
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
# k-Means cluster numbers are arbitrary; np.choose can permute them to match the true labels
# (the identity permutation [0, 1, 2] may need reordering for a given run)
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')
# GMM Plot
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()
Output:
(Figure: three scatter plots of Petal_Length vs. Petal_Width, titled 'Real', 'KMeans', and 'GMM Classification'.)
Explanation:
1. Data Loading:
• The script loads the Iris dataset, which includes features
like Sepal_Length, Sepal_width, Petal_Length, and Petal_Width, and the target
variable Targets (species of the flower).
2. k-Means Clustering:
• The k-Means algorithm is used to cluster the data into 3 clusters (since there are 3
species in the Iris dataset).
• The predicted cluster labels are used to color the data points in the plot.
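Here k = 3 is chosen from prior knowledge of the dataset. Without that knowledge, a common heuristic is the elbow method; a brief sketch that reuses the imports above (fit k-Means for several values of k and look for the bend in the inertia curve):

# Elbow method: within-cluster sum of squares (inertia) versus k
inertias = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in range(1, 8)]
plt.plot(range(1, 8), inertias, marker='o')
plt.xlabel('k')
plt.ylabel('Inertia')
plt.title('Elbow method')
plt.show()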
3. Gaussian Mixture Model (GMM) Clustering:
• The data is standardized using StandardScaler so that all features contribute
equally to the clustering.
• The GMM is fitted with the Expectation-Maximization (EM) algorithm, modelling the
data as a mixture of 3 Gaussian distributions (components).
• The predicted cluster labels from GMM are used to color the data points in the plot.
4. Visualization:
• Three subplots are created:
• The first subplot shows the real data distribution colored by the true species
labels.
• The second subplot shows the clustering result of k-Means.
• The third subplot shows the clustering result of GMM.
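The objective also asks for a comparison of the two algorithms. One quantitative option, using the sm alias already imported in the script, is the adjusted Rand index, which is invariant to the arbitrary numbering of clusters (a minimal sketch):

# Compare both clusterings against the true species labels
print("k-Means ARI:", sm.adjusted_rand_score(y.Targets, model.labels_))
print("GMM ARI:", sm.adjusted_rand_score(y.Targets, y_cluster_gmm))

On the Iris data, GMM typically scores higher: its per-component covariance model captures the elongated, overlapping versicolor/virginica clusters better than k-Means' implicit assumption of spherical, equally sized clusters.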
Experiment-9
Objective: Build an Artificial Neural Network by implementing the Backpropagation algorithm
and test the same using appropriate datasets.
Source Code:
import pandas as pd
import tensorflow as tf
from google.colab import files
# Upload the dataset
file_upload = files.upload()
# Read the dataset
df = pd.read_csv('Celsius_to_Fahrenheit.csv')
print(df.head())
# Prepare the data
X = df['Celsius']
y = df['Fahrenheit']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
# Plot the data
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x=df['Celsius'], y=df['Fahrenheit'], marker='.', s=20, color='b')
plt.show()
# Initialize the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1, input_shape=[1]))
model.summary()
# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(0.5), loss='mean_squared_error')
# Train the model
epochs_hist = model.fit(X_train, y_train, epochs=500)
# Get the model weights
print(model.get_weights())
# Make a prediction (wrap the input in a NumPy array for model.predict)
import numpy as np
celsius_temp = 100
print(f'Prediction from our perceptron model is: {model.predict(np.array([celsius_temp]))}')
# Calculate the actual value
F = 9/5 * celsius_temp + 32
print(f'Prediction from actual formula is: {F}')
# Plot the loss progression
plt.plot(epochs_hist.history['loss'])
plt.xlabel('Number of epochs')
plt.ylabel('loss')
plt.title('Loss progression during training')
plt.show()
# Plot the regression line on training data
plt.scatter(X_train, y_train, c='b', marker='.')
plt.plot(X_train, model.predict(X_train), c='g')
plt.xlabel('Celsius')
plt.ylabel('Fahrenheit')
plt.title('Regression line on training data')
plt.show()
# Plot the regression line on test data
plt.scatter(X_test, y_test, c='b', marker='.')
plt.plot(X_test, model.predict(X_test), c='r')
plt.xlabel('Celsius')
plt.ylabel('Fahrenheit')
plt.title('Regression line on test data')
plt.show()
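The Keras model above delegates backpropagation to TensorFlow. Since the objective asks for the algorithm itself, here is a minimal hand-rolled sketch of the same single-neuron fit, trained by gradient descent on the mean-squared error (the names w, b, and lr are illustrative, not taken from the script above):

import numpy as np

# Toy data generated from the exact conversion formula
celsius = np.linspace(-40, 100, 50)
fahrenheit = 9 / 5 * celsius + 32

# Single neuron y_hat = w * x + b, trained by gradient descent on the MSE
w, b, lr = 0.0, 0.0, 2e-4  # small learning rate because the inputs are not normalized
for epoch in range(50000):
    y_hat = w * celsius + b                # forward pass
    error = y_hat - fahrenheit
    grad_w = 2 * np.mean(error * celsius)  # dMSE/dw
    grad_b = 2 * np.mean(error)            # dMSE/db
    w -= lr * grad_w                       # backward pass: update parameters
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # should approach 1.8 and 32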
Output:
(Figures: scatter plot of the raw data, training-loss curve, and regression lines over the training and test sets.)
Experiment-10
Objective: For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Candidate-Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples.
Source Code:
import numpy as np
import pandas as pd
# Loading Data from a CSV File
data = pd.read_csv('trainingdata.csv')
print(data)
# Separating concept features from Target
concepts = np.array(data.iloc[:,0:-1])
print(concepts)
# Isolating target into a separate DataFrame
# copying last column to target array
target = np.array(data.iloc[:,-1])
print(target)
def learn(concepts, target):
    '''
    learn() implements the learning step of the Candidate-Elimination algorithm.
    Arguments:
        concepts - NumPy array of all feature rows
        target - NumPy array of the corresponding output values
    '''
    # Initialise S0 with the first instance from concepts
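The listing is truncated here at a page break. For reference, a self-contained sketch of the standard Candidate-Elimination update follows; it assumes the target column holds the strings "Yes" and "No", and it is not necessarily identical to the continuation of the original listing:

def candidate_elimination(concepts, target):
    # S starts as the first training instance; G starts as the most general hypothesis
    specific_h = concepts[0].copy()
    n = len(specific_h)
    general_h = [['?' for _ in range(n)] for _ in range(n)]
    for i, instance in enumerate(concepts):
        if target[i] == "Yes":        # positive example: generalise S, prune G
            for j in range(n):
                if instance[j] != specific_h[j]:
                    specific_h[j] = '?'
                    general_h[j][j] = '?'
        else:                         # negative example: specialise G against S
            for j in range(n):
                if instance[j] != specific_h[j]:
                    general_h[j][j] = specific_h[j]
                else:
                    general_h[j][j] = '?'
    # Drop the rows of G that stayed fully general
    general_h = [h for h in general_h if h != ['?'] * n]
    return specific_h, general_h

s_final, g_final = candidate_elimination(concepts, target)
print("Final S:", s_final)
print("Final G:", g_final)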