EXPERIMENT 1: FIND-S Algorithm
Problem Statement:
Implement and demonstrate the FIND-S algorithm to find the most specific hypothesis consistent with a
given set of positive training examples. Read the training data from a ‘.CSV’ file.
Similar Question:
Consider the "EnjoySport" training examples given in the sample data.csv below (three positive examples):
Assuming the hypothesis space is a conjunction of attributes, trace the execution of the FIND-S algorithm
and determine the final most specific hypothesis.
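A worked trace over the three positive examples in the sample data.csv below (a sketch; FIND-S starts from the first positive example and generalizes one attribute at a time):
h1 = <Sunny, Warm, Normal, Strong, Warm, Same> (initialized from example 1)
h2 = <Sunny, Warm, ?, Strong, Warm, Same> (Humidity differs in example 2)
h3 = <Sunny, Warm, ?, Strong, ?, ?> (Water and Forecast differ in example 3)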
Conceptual Code (Python):
import pandas as pd

def find_s(data):
    hypothesis = None  # Start with no hypothesis
    # Iterate through each row of the training data
    for _, row in data.iterrows():
        if row['EnjoySport'] == 'Yes':  # FIND-S uses only positive examples
            instance = list(row[:-1])   # Attribute values without the target
            if hypothesis is None:
                # Initialize with the first positive example
                hypothesis = instance
            else:
                # Generalize every attribute that disagrees with the instance
                for j in range(len(hypothesis)):
                    if hypothesis[j] != instance[j]:
                        hypothesis[j] = '?'
    return hypothesis
# Sample CSV data (hypothetical data.csv):
# Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
# Sunny,Warm,Normal,Strong,Warm,Same,Yes
# Sunny,Warm,High,Strong,Warm,Same,Yes
# Sunny,Warm,High,Strong,Cool,Change,Yes
# Load the training data from the CSV file
data = pd.read_csv('data.csv')
specific_hypothesis = find_s(data)
print("Most Specific Hypothesis:", specific_hypothesis)
Potential Output:
Most Specific Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
EXPERIMENT 2: Candidate-Elimination Algorithm
Problem Statement:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-
Elimination algorithm to output a description of the set of all hypotheses consistent with the training
examples (version space).
Similar Question:
Using the "EnjoySport" dataset from Experiment 1, extended with the classic fourth (negative)
example Rainy,Cold,High,Strong,Warm,Change,No so that G actually specializes, trace the
Candidate-Elimination algorithm, showing the evolution of the General Boundary (G) and the
Specific Boundary (S) after each training example.
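A worked trace under that four-example dataset (a sketch):
After example 1 (+): S = <Sunny, Warm, Normal, Strong, Warm, Same>; G = {<?, ?, ?, ?, ?, ?>}
After example 2 (+): S = <Sunny, Warm, ?, Strong, Warm, Same>; G unchanged
After example 3 (+): S = <Sunny, Warm, ?, Strong, ?, ?>; G unchanged
After example 4 (-): S unchanged; G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}, the minimal specializations that exclude the negative example while remaining more general than S.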
Conceptual Code (Python):
import pandas as pd
def candidate_elimination(data):
    attributes = data.values[:, :-1]  # Attribute columns
    target = data.values[:, -1]       # The EnjoySport column
    # Simplified boundary bookkeeping, as commonly used in lab versions of
    # Candidate-Elimination: S starts from the first example (assumed
    # positive), and G is kept as one candidate row per attribute.
    specific_h = list(attributes[0])
    n = len(specific_h)
    general_h = [['?' for _ in range(n)] for _ in range(n)]
    for i, instance in enumerate(attributes):
        if target[i] == 'Yes':
            # Positive example: generalize S where it disagrees, and relax
            # the matching G entries so G stays more general than S
            for j in range(n):
                if instance[j] != specific_h[j]:
                    specific_h[j] = '?'
                    general_h[j][j] = '?'
        else:
            # Negative example: specialize G just enough to exclude it,
            # using the attribute values S has already committed to
            for j in range(n):
                if instance[j] != specific_h[j]:
                    general_h[j][j] = specific_h[j]
                else:
                    general_h[j][j] = '?'
    # Drop the rows of G that were never specialized
    general_h = [g for g in general_h if g != ['?'] * n]
    return specific_h, general_h
# Assuming 'data.csv' from Experiment 1, extended with the classic fourth (negative) row:
# Rainy,Cold,High,Strong,Warm,Change,No
data = pd.read_csv('data.csv')
s_boundary, g_boundary = candidate_elimination(data)
print("Specific Boundary (S):", s_boundary)
print("General Boundary (G):", g_boundary)
Potential Output:
Specific Boundary (S): ['Sunny', 'Warm', '?', 'Strong', '?', '?']
General Boundary (G): [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
EXPERIMENT 3: ID3 Algorithm
Problem Statement:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.
Similar Question:
Consider the standard "PlayTennis" dataset (14 examples described by the attributes Outlook,
Temperature, Humidity, and Wind, with the Boolean target PlayTennis):
Calculate the entropy of the full training set with respect to the target "PlayTennis". Then
calculate the information gain of the "Outlook" attribute, as worked through in the sketch below.
Conceptual Code (Python):
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample CSV data (hypothetical tennis.csv):
# Outlook,Temperature,Humidity,Wind,PlayTennis
# Sunny,Hot,High,Weak,No
# ... (rest of the data)
data = pd.read_csv('tennis.csv')
X = data.drop('PlayTennis', axis=1)
y = data['PlayTennis']
X = pd.get_dummies(X, drop_first=True)  # One-hot encode the categorical features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Note: scikit-learn's tree is CART; with criterion='entropy' on one-hot
# features it approximates ID3's information-gain splitting
model = DecisionTreeClassifier(criterion='entropy')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Predicting for a new sample: reindex against the training columns so the
# dummy-variable names match exactly (absent dummies are filled with 0)
new_sample = pd.DataFrame([{'Outlook_Sunny': 1, 'Temperature_Mild': 1,
                            'Humidity_Normal': 1, 'Wind_Weak': 1}])
new_sample = new_sample.reindex(columns=X.columns, fill_value=0)
prediction = model.predict(new_sample)
print("Prediction for new sample:", prediction)
Potential Output:
Accuracy: 0.6666666666666666
Prediction for new sample: ['Yes']
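To actually inspect the learned tree, which is the point of the experiment, scikit-learn's export_text helper prints its structure; a short sketch continuing the code above:

from sklearn.tree import export_text
print(export_text(model, feature_names=list(X.columns)))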
EXPERIMENT 4: Backpropagation Algorithm
Problem Statement:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same
using appropriate data sets.
Similar Question:
Explain the steps involved in the Backpropagation algorithm for a feed-forward network with one
hidden layer and sigmoid activation functions. Illustrate with a simple example.
Conceptual Code (Python - using a library for brevity):
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# MLPClassifier trains its weights with backpropagation; activation='logistic' is the sigmoid
model = MLPClassifier(hidden_layer_sizes=(5,), activation='logistic', max_iter=1000, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Potential Output:
Accuracy: 0.9777777777777777
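For reference, a minimal from-scratch sketch of backpropagation itself (one hidden layer, sigmoid activations, trained on XOR; the layer size, learning rate, and epoch count are illustrative choices, not taken from the experiment above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))  # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))  # hidden -> output
lr = 0.5

for epoch in range(10000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Backward pass: error times the sigmoid derivative at each layer
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
    # Gradient-descent weight updates
    W2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0, keepdims=True)

print("XOR predictions:", sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(3).ravel())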
EXPERIMENT 5: Naive Bayesian Classifier
Problem Statement:
Write a program to implement the naive Bayesian classifier for a sample training data set stored as a .CSV
file. Compute the accuracy of the classifier using a few held-out test examples.
Similar Question:
Given a small training table for classifying emails as "Spam" or "Not Spam" using binary
word-occurrence features (Word1, Word2, Word3):
Calculate the probability of an email containing (Word1=Yes, Word2=No, Word3=Yes) being classified
as "Spam" using the Naive Bayes approach.
Conceptual Code (Python - using a library for brevity):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Sample CSV data (hypothetical email.csv):
# Word1,Word2,Word3,Class
# Yes,No,Yes,Spam
# ...
data = pd.read_csv('email.csv')
X = pd.get_dummies(data.drop('Class', axis=1), drop_first=True)
y = data['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# GaussianNB treats the 0/1 dummies as continuous features; see the
# CategoricalNB sketch after this experiment for an arguably better fit
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Potential Output (MAY VARY):
Accuracy: 0.75
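Since the features here are categorical rather than continuous, CategoricalNB is arguably a more natural estimator than GaussianNB; a sketch over the same hypothetical email.csv:

from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

X_cat = OrdinalEncoder().fit_transform(data.drop('Class', axis=1))
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_cat, y, test_size=0.2, random_state=42)
cat_model = CategoricalNB().fit(Xc_train, yc_train)
print("CategoricalNB accuracy:", accuracy_score(yc_test, cat_model.predict(Xc_test)))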
EXPERIMENT 6: Naive Bayesian Classifier for Document Classification
Problem Statement:
Assuming a set of documents that need to be classified, use the naive Bayesian Classifier model to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
Similar Question (JAVA Focused):
Outline the steps involved in building a Naive Bayes classifier for text classification using Java libraries
like Apache Mahout or Weka.
Conceptual Code (Python - using scikit-learn for text processing):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score
# Sample document data (hypothetical documents.txt - each line is a document with label)
# This is a positive document. POS
# This is another positive one. POS
# This is a negative review. NEG
# Another negative sentence here. NEG
# Read each line and split the trailing label off the document text
with open('documents.txt', 'r') as f:
    lines = [line.strip() for line in f if line.strip()]
texts = [line.rsplit(' ', 1)[0] for line in lines]
labels = [line.rsplit(' ', 1)[1] for line in lines]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# zero_division=0 avoids warnings when a class is absent from the tiny test split
precision = precision_score(y_test, y_pred, average='weighted', zero_division=0)
recall = recall_score(y_test, y_pred, average='weighted', zero_division=0)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
Potential Output (MAY VARY):
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
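To see per-class precision and recall instead of the weighted averages, a classification report can also be printed for the same predictions:

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, zero_division=0))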
EXPERIMENT 7: Bayesian Network for Medical Diagnosis
Problem Statement:
Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate
the diagnosis of heart patients using a standard Heart Disease Data Set. You can use Java/Python ML
library classes/API.
Similar Question:
Describe the structure of a simple Bayesian network for diagnosing a specific medical condition (e.g., flu)
based on symptoms like fever, cough, and sore throat. Define the conditional probability tables for each
node.
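As a sketch of what those conditional probability tables look like in code (the network shape follows the question; every probability value below is an illustrative assumption):

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

flu_model = BayesianNetwork([('Flu', 'Fever'), ('Flu', 'Cough'), ('Flu', 'SoreThroat')])
# Prior over Flu (state order: No, Yes); values are hypothetical
cpd_flu = TabularCPD('Flu', 2, [[0.9], [0.1]])
# P(Symptom | Flu): columns correspond to Flu = No and Flu = Yes
cpd_fever = TabularCPD('Fever', 2, [[0.8, 0.2], [0.2, 0.8]],
                       evidence=['Flu'], evidence_card=[2])
cpd_cough = TabularCPD('Cough', 2, [[0.7, 0.3], [0.3, 0.7]],
                       evidence=['Flu'], evidence_card=[2])
cpd_throat = TabularCPD('SoreThroat', 2, [[0.9, 0.4], [0.1, 0.6]],
                        evidence=['Flu'], evidence_card=[2])
flu_model.add_cpds(cpd_flu, cpd_fever, cpd_cough, cpd_throat)
assert flu_model.check_model()  # Verifies the CPTs are complete and consistent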
Conceptual Code (Python - using a library for Bayesian Networks):
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
# Sample Heart Disease Data (hypothetical heart.csv - simplified)
# ChestPain,BlockedArtery,HeartDisease
# Yes,Yes,Yes
# No,Yes,Yes
# Yes,No,No
# No,No,No
data = pd.read_csv('heart.csv')
# Define the Bayesian Network structure
model = BayesianNetwork([('ChestPain', 'HeartDisease'), ('BlockedArtery', 'HeartDisease')])
# Estimate parameters from data
model.fit(data, estimator=MaximumLikelihoodEstimator)
# Perform inference
inference = VariableElimination(model)
query_result = inference.query(variables=['HeartDisease'],
                               evidence={'ChestPain': 'Yes', 'BlockedArtery': 'Yes'})
print(query_result)
Potential Output (MAY VARY): a discrete probability table over the two states of HeartDisease,
conditioned on the observed evidence.