ML Lab Manual

This manual outlines experiments demonstrating several machine learning algorithms: FIND-S, Candidate-Elimination, ID3, Backpropagation, the Naive Bayesian classifier, and Bayesian networks. Each experiment includes a problem statement, conceptual Python code, and a potential output for clarity. The focus is on implementing these algorithms with training data read from CSV files and evaluating their performance.


EXPERIMENT 1: FIND-S Algorithm

Problem Statement:

Implement and demonstrate the FIND-S algorithm to find the most specific hypothesis consistent with a
given set of positive training examples. Read the training data from a ‘.CSV’ file.

Similar Question:

Consider the training examples for the concept "EnjoySport" given in the sample data.csv below.

Assuming the hypothesis space is a conjunction of attribute constraints, trace the execution of the FIND-S algorithm and determine the final most specific hypothesis.

Conceptual Code (Python):

import pandas as pd

def find_s(data):
    hypothesis = None  # Start with no hypothesis

    # Iterate through each row of the data
    for i, row in data.iterrows():
        if row['EnjoySport'] == 'Yes':  # Focus only on positive examples
            if hypothesis is None:
                # Initialize with the first positive example
                hypothesis = list(row.iloc[:-1])
            else:
                # Generalize the hypothesis where attributes don't match
                for j in range(len(hypothesis)):
                    if hypothesis[j] != row.iloc[j]:
                        hypothesis[j] = '?'
    return hypothesis

# Sample CSV data (hypothetical data.csv):
# Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
# Sunny,Warm,Normal,Strong,Warm,Same,Yes
# Sunny,Warm,High,Strong,Warm,Same,Yes
# Sunny,Warm,High,Strong,Cool,Change,Yes

# Load the training data from the CSV file
data = pd.read_csv('data.csv')
specific_hypothesis = find_s(data)
print("Most Specific Hypothesis:", specific_hypothesis)

Potential Output:

Most Specific Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
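
For intuition, here is a hand trace of how the hypothesis evolves over the three positive rows in the hypothetical data.csv above (each row is compared attribute by attribute, and mismatching attributes are generalized to '?'):

    After row 1: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
    After row 2: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
    After row 3: ['Sunny', 'Warm', '?', 'Strong', '?', '?']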


EXPERIMENT 2: Candidate-Elimination Algorithm
Problem Statement:

For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-
Elimination algorithm to output a description of the set of all hypotheses consistent with the training
examples (version space).

Similar Question:

Using the same "EnjoySport" dataset from Experiment 1, trace the Candidate-Elimination algorithm,
showing the evolution of the General Boundary (G) and Specific Boundary (S) after each training
example.
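
For reference, assuming Mitchell's standard four-example EnjoySport dataset (the three positive rows from Experiment 1 plus the negative example Rainy,Cold,High,Strong,Warm,Change,No presented third), a hand trace of the boundaries looks like this:

    S after example 1 (positive): ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
    S after example 2 (positive): ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
    G after example 3 (negative): [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'Same']]
    S after example 4 (positive): ['Sunny', 'Warm', '?', 'Strong', '?', '?']
    G after example 4 (positive): [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]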

Conceptual Code (Python):

import pandas as pd

def is_consistent(hypothesis, instance):
    # A hypothesis is consistent with an instance if every non-'?' attribute matches
    for i in range(len(hypothesis)):
        if hypothesis[i] != '?' and hypothesis[i] != instance[i]:
            return False
    return True

def candidate_elimination(data):
    num_attributes = len(data.columns) - 1
    specific_boundary = ['?' for _ in range(num_attributes)]
    general_boundary = [['?' for _ in range(num_attributes)] for _ in range(num_attributes)]

    for i, row in data.iterrows():
        instance = list(row.iloc[:-1])
        target = row['EnjoySport']

        if target == 'Yes':
            # Positive example: generalize S, then drop hypotheses in G inconsistent with the instance
            for j in range(num_attributes):
                if specific_boundary[j] == '?':
                    specific_boundary[j] = instance[j]
                elif specific_boundary[j] != instance[j]:
                    specific_boundary[j] = '?'
            for g in list(general_boundary):
                if not is_consistent(g, instance):
                    general_boundary.remove(g)

        elif target == 'No':
            # Negative example: form candidate hypotheses by taking S and
            # replacing one attribute that conflicts with the instance by '?'
            new_generalizations = []
            for j in range(num_attributes):
                if specific_boundary[j] != '?' and specific_boundary[j] != instance[j]:
                    new_general_hypothesis = list(specific_boundary)
                    new_general_hypothesis[j] = '?'
                    if (new_general_hypothesis not in new_generalizations
                            and new_general_hypothesis not in general_boundary):
                        new_generalizations.append(new_general_hypothesis)

            # Add a candidate only if no existing member of G is already at least as general
            for new_hyp in new_generalizations:
                is_more_general = True
                for g in general_boundary:
                    if all((gh == '?' or gh == nh) for gh, nh in zip(g, new_hyp)):
                        is_more_general = False
                        break
                if is_more_general:
                    general_boundary.append(new_hyp)

            # Prune G: drop hypotheses that simply repeat all of S's specific attribute values
            general_boundary[:] = [g for g in general_boundary
                                   if not all((s == '?' or s == g[i])
                                              for i, s in enumerate(specific_boundary))]

    # Keep only maximally general members of G (drop any hypothesis strictly
    # less general than another member) and remove duplicates
    final_general_boundary = []
    for g1 in general_boundary:
        is_minimal = True
        for g2 in general_boundary:
            if (g1 != g2
                    and all((g2_val == '?' or g2_val == g1_val) for g1_val, g2_val in zip(g1, g2))
                    and any(g1_val != g2_val for g1_val, g2_val in zip(g1, g2))):
                is_minimal = False
                break
        if is_minimal and g1 not in final_general_boundary:
            final_general_boundary.append(g1)

    return specific_boundary, final_general_boundary

# Assuming 'data.csv' from Experiment 1
data = pd.read_csv('data.csv')
s_boundary, g_boundary = candidate_elimination(data)
print("Specific Boundary (S):", s_boundary)
print("General Boundary (G):", g_boundary)

Potential Output (MAY VARY):

Specific Boundary (S): ['Sunny', 'Warm', '?', 'Strong', '?', '?']

General Boundary (G): [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?', '?', '?']]

(A general boundary like this presupposes that data.csv also contains at least one negative example; with only the three positive rows from Experiment 1, G remains maximally general.)
EXPERIMENT 3: ID3 Algorithm
Problem Statement:

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.

Similar Question:

Consider the following "PlayTennis" dataset:


Calculate the initial entropy of the "PlayTennis" attribute. Then, calculate the information gain for the
"Outlook" attribute.

Conceptual Code (Python):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample CSV data (hypothetical tennis.csv):
# Outlook,Temperature,Humidity,Wind,PlayTennis
# Sunny,Hot,High,Weak,No
# ... (rest of the data)

data = pd.read_csv('tennis.csv')
X = data.drop('PlayTennis', axis=1)
y = data['PlayTennis']
X = pd.get_dummies(X, drop_first=True)  # Convert categorical features to indicator columns

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# criterion='entropy' gives ID3-style information-gain splits
# (scikit-learn's tree is CART-based, so splits are binary)
model = DecisionTreeClassifier(criterion='entropy')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Predicting for a new sample; the dummy-column names depend on the values present
# in tennis.csv, so align the new sample's columns with the training columns
new_sample = pd.DataFrame([{'Outlook_Rainy': 0, 'Outlook_Sunny': 1,
                            'Temperature_Hot': 0, 'Temperature_Mild': 1,
                            'Wind_Weak': 1, 'Humidity_Normal': 1}])
new_sample = new_sample.reindex(columns=X.columns, fill_value=0)
prediction = model.predict(new_sample)
print("Prediction for new sample:", prediction)

Potential Output (MAY VARY):

Accuracy: 0.6666666666666666

Prediction for new sample: ['Yes']


EXPERIMENT 4: Backpropagation Algorithm
Problem Statement:

Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same
using appropriate data sets.

Similar Question:

Explain the steps involved in the Backpropagation algorithm for a single layer perceptron with a sigmoid
activation function. Illustrate with a simple example.
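
As a minimal sketch of one backpropagation step, assume a single sigmoid unit trained on one example with squared-error loss (the input, target, weight, and learning-rate values below are illustrative placeholders, not taken from the manual):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Illustrative values: one input vector, its target, initial weights, bias, learning rate
    x = np.array([1.0, 0.5])
    t = 1.0
    w = np.array([0.2, -0.3])
    b = 0.1
    lr = 0.5

    # Forward pass: weighted sum, then sigmoid activation
    o = sigmoid(np.dot(w, x) + b)

    # Backward pass: for E = 0.5 * (t - o)^2, dE/dz = (o - t) * o * (1 - o)
    delta = (o - t) * o * (1 - o)

    # Gradient-descent weight update
    w -= lr * delta * x
    b -= lr * delta

    print("output:", o, "updated weights:", w, "updated bias:", b)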

Conceptual Code (Python - using a library for brevity):

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = MLPClassifier(hidden_layer_sizes=(5,), activation='logistic', max_iter=1000, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Potential Output:

Accuracy: 0.9777777777777777
EXPERIMENT 5: Naive Bayesian Classifier
Problem Statement:

Write a program to implement the naive Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Similar Question:

Given the following training data for classifying emails as "Spam" or "Not Spam":

Calculate the probability of an email containing (Word1=Yes, Word2=No, Word3=Yes) being classified
as "Spam" using the Naive Bayes approach.

Conceptual Code (Python - using a library for brevity):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Sample CSV data (hypothetical email.csv):
# Word1,Word2,Word3,Class
# Yes,No,Yes,Spam
# ...

data = pd.read_csv('email.csv')

X = pd.get_dummies(data.drop('Class', axis=1), drop_first=True)
y = data['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# GaussianNB is used here for simplicity; BernoulliNB is usually a better fit for binary word features
model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Potential Output (MAY VARY):

Accuracy: 0.75
EXPERIMENT 6: Naive Bayesian Classifier for Document Classification
Problem Statement:

Assuming a set of documents that need to be classified, use the naive Bayesian Classifier model to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.

Similar Question (JAVA Focused):

Outline the steps involved in building a Naive Bayes classifier for text classification using Java libraries
like Apache Mahout or Weka.

Conceptual Code (Python - using scikit-learn for text processing):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Sample document data (hypothetical documents.txt - each line is a document followed by its label):
# This is a positive document. POS
# This is another positive one. POS
# This is a negative review. NEG
# Another negative sentence here. NEG

with open('documents.txt', 'r') as f:
    # Split each line into (text, label) at the last space
    documents = [line.strip().rsplit(' ', 1) for line in f if line.strip()]

texts = [doc[0] for doc in documents]
labels = [doc[1] for doc in documents]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)

Potential Output (MAY VARY; perfect scores here simply reflect the tiny four-document sample):

Accuracy: 1.0

Precision: 1.0

Recall: 1.0
EXPERIMENT 7: Bayesian Network for Medical Diagnosis
Problem Statement:

Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate
the diagnosis of heart patients using a standard Heart Disease Data Set. You can use Java/Python ML
library classes/API.

Similar Question:

Describe the structure of a simple Bayesian network for diagnosing a specific medical condition (e.g., flu)
based on symptoms like fever, cough, and sore throat. Define the conditional probability tables for each
node.
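
As an illustrative sketch, the structure could be Flu -> Fever, Flu -> Cough, Flu -> SoreThroat, with one conditional probability table per node. The probability values below are made-up placeholders (not estimated from any dataset), and the snippet assumes the same pgmpy library used in the conceptual code:

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD

    flu_model = BayesianNetwork([('Flu', 'Fever'), ('Flu', 'Cough'), ('Flu', 'SoreThroat')])

    # Prior on Flu (state 0 = No, state 1 = Yes); values are illustrative placeholders
    cpd_flu = TabularCPD(variable='Flu', variable_card=2, values=[[0.95], [0.05]])

    # P(Symptom | Flu): columns correspond to Flu = No, Flu = Yes
    cpd_fever = TabularCPD(variable='Fever', variable_card=2,
                           values=[[0.90, 0.20],   # Fever = No
                                   [0.10, 0.80]],  # Fever = Yes
                           evidence=['Flu'], evidence_card=[2])
    cpd_cough = TabularCPD(variable='Cough', variable_card=2,
                           values=[[0.80, 0.30],
                                   [0.20, 0.70]],
                           evidence=['Flu'], evidence_card=[2])
    cpd_sore = TabularCPD(variable='SoreThroat', variable_card=2,
                          values=[[0.85, 0.40],
                                  [0.15, 0.60]],
                          evidence=['Flu'], evidence_card=[2])

    flu_model.add_cpds(cpd_flu, cpd_fever, cpd_cough, cpd_sore)
    print(flu_model.check_model())  # True if the CPDs are consistent with the structure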

Conceptual Code (Python - using a library for Bayesian Networks):

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Sample Heart Disease Data (hypothetical heart.csv - simplified)
# ChestPain,BlockedArtery,HeartDisease
# Yes,Yes,Yes
# No,Yes,Yes
# Yes,No,No
# No,No,No

data = pd.read_csv('heart.csv')

# Define the Bayesian Network structure
model = BayesianNetwork([('ChestPain', 'HeartDisease'), ('BlockedArtery', 'HeartDisease')])

# Estimate parameters from data
model.fit(data, estimator=MaximumLikelihoodEstimator)

# Perform inference
inference = VariableElimination(model)
query_result = inference.query(variables=['HeartDisease'],
                               evidence={'ChestPain': 'Yes', 'BlockedArtery': 'Yes'})
print(query_result)

Potential Output (MAY VARY):

For the four-row sample above, maximum-likelihood estimation gives P(HeartDisease = Yes | ChestPain = Yes, BlockedArtery = Yes) = 1.0; pgmpy prints the query result as a small factor table over the states of HeartDisease.
