Machine Learning
Lab Programs
Lab Program 1: Install and set up Python and essential libraries like NumPy and Pandas.
● Installation and set-up of Python
● Open the official download page https://www.python.org/downloads/ in a web browser and download the latest version of Python.
● This manual uses Python 3.12.1; download its 64-bit or 32-bit Windows installer (an executable .exe file).
Downloading the Python Installer
● Open the downloaded .exe file (for example, the 64-bit Python 3.12.1 installer) to launch the Python installer.
● Choose the option to install the launcher for all users by checking the corresponding checkbox.
● Verify the Python installation in Windows
Type IDLE (Python's Integrated Development and Learning Environment) in the Windows search bar; you will see “IDLE (Python 3.12 64-bit)”. Open IDLE; the installed Python version is shown at the top of the IDLE window.
This confirms that Python was installed successfully.
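The installation can also be checked from a Python script instead of reading the IDLE window. The sketch below uses only the standard library (`sys` and `platform`); the helper name `python_version_ok` and the minimum version (3, 8) are illustrative assumptions, not requirements stated in this manual.

```python
import platform
import sys


def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum` (hypothetical helper)."""
    return sys.version_info[:2] >= minimum


# platform.python_version() returns the version as a string, e.g. "3.12.1"
print("Python version:", platform.python_version())
# sys.executable is the full path of the python.exe that is running this script
print("Interpreter path:", sys.executable)
print("Version check passed:", python_version_ok())
```

Running this file from IDLE (Run > Run Module) should print the same version number that the IDLE shell shows when it starts.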
Installation of the essential packages NumPy and Pandas
a. Install the NumPy package:
NumPy is an open-source Python library for efficient numerical operations on large amounts of data. Several NumPy functions can also be used on pandas DataFrames.
NumPy provides the ndarray, an array type for single-dimensional and multidimensional data, together with functions for performing numerical computations on the array elements. Calculations on NumPy arrays are faster than equivalent operations on ordinary Python lists, and NumPy can handle very large amounts of data.
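A short sketch of the points above: a one-dimensional and a two-dimensional array, element-wise arithmetic without an explicit loop, the equivalent list-based code for comparison, and aggregation along an axis. The score values are arbitrary illustration data.

```python
import numpy as np

# One-dimensional array (vector) and two-dimensional array (matrix)
scores = np.array([45, 80, 95, 97, 35])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print("scores shape:", scores.shape)   # (5,)
print("matrix shape:", matrix.shape)   # (2, 3)

# Vectorized arithmetic: the operation is applied to every element at once
scaled = scores * 1.1
print("scaled scores:", scaled)

# The same result with a plain Python list needs an explicit comprehension
scores_list = [45, 80, 95, 97, 35]
scaled_list = [s * 1.1 for s in scores_list]

# Aggregations run over the whole array or along one axis
print("mean score:", scores.mean())          # 70.4
print("column sums:", matrix.sum(axis=0))    # column sums 5, 7, 9
print("row sums:", matrix.sum(axis=1))       # row sums 6, 15
```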
● Steps to install NumPy:
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install NumPy by typing the command:
python -m pip install numpy
b. Install the pandas package:
Pandas is a very popular library for working with data; its stated goal is to become the most powerful and flexible open-source data analysis tool available. DataFrames are at the center of pandas. A DataFrame is structured like a table or spreadsheet: both the rows and the columns have indexes, and you can perform operations on rows or columns separately. Pandas supports the five significant steps in processing and analyzing data, regardless of where the data comes from: load, manipulate, prepare, model, and analyze.
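A minimal sketch of a DataFrame with labelled rows and columns, and of operating on a column and on a row separately. The three rows reuse names and values from the CSV output in Lab Program 4; the row labels "s1" to "s3" and the pass mark of 50 are hypothetical choices for illustration.

```python
import pandas as pd

# A DataFrame is a table: labelled rows (the index) and labelled columns
df = pd.DataFrame(
    {"Name": ["Manoj", "Dilip", "Rakesh"], "Age": [19, 20, 24], "Score": [95, 97, 45]},
    index=["s1", "s2", "s3"],   # hypothetical row labels
)
print(df)

# Column operation: select one column by its label and aggregate it
print("Average score:", df["Score"].mean())

# Row operation: select one row by its index label
print(df.loc["s2"])

# Add a derived column computed from an existing one (pass mark 50 is an assumption)
df["Passed"] = df["Score"] >= 50
print(df)
```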
● Steps to install pandas:
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install pandas by typing the command:
python -m pip install pandas
A simple program that prints the installed library versions to confirm a successful installation:
import numpy
import pandas
print("numpy library version is: ")
print(numpy.__version__) #please type two underscore symbols before and after version.
print("numpy library is successfully installed")
print(" ")
print("pandas library version is: ")
print(pandas.__version__)
print("pandas library is successfully installed")
Lab Program 2: Introduce scikit-learn as a machine learning library.
Scikit-learn is a popular open-source machine learning library in Python that
offers a comprehensive set of tools and algorithms for data analysis,
modeling, and machine learning tasks. It is built on foundational libraries like
NumPy, SciPy, and Matplotlib. Scikit-learn provides a user-friendly and
efficient framework for both beginners and experts in the field of data
science.
Some key points to introduce scikit-learn as a machine learning library:
1. Comprehensive Machine Learning Library: Scikit-learn offers a wide
range of machine learning algorithms and tools for various tasks such
as classification, regression, clustering, dimensionality reduction, and
more.
2. User-Friendly and Easy to Use: It is designed with a user-friendly
interface and simple syntax, making it accessible for both beginners and
experienced machine learning practitioners.
3. Integration with Scientific Computing Libraries: Scikit-learn
integrates well with other scientific computing libraries in Python such
as NumPy, SciPy, and Matplotlib, providing a powerful environment for
machine learning tasks.
4. Extensive Documentation and Community Support: The library
comes with comprehensive documentation, tutorials, and examples to
help users understand and implement machine learning algorithms
effectively. Additionally, there is a vibrant community around scikit-learn
that provides support and contributions.
5. Efficient Implementation of Algorithms: Scikit-learn is built on top of
NumPy, SciPy, and Cython, which allows for efficient implementation of
machine learning algorithms and scalability to large datasets.
6. Support for Model Evaluation and Validation: The library provides
tools for model evaluation, hyperparameter tuning, cross-validation, and
performance metrics, enabling users to assess and improve the quality
of their machine learning models.
7. Flexibility and Customization: Scikit-learn offers flexibility for
customization and parameter tuning, allowing users to adapt algorithms
to their specific requirements and datasets.
8. Wide Adoption and Industry Usage: Due to its ease of use,
performance, and versatility, scikit-learn is widely adopted in academia,
research, and industry for various machine learning applications.
Overall, scikit-learn is a powerful and versatile machine learning library in
Python that empowers users to build and deploy machine learning models
efficiently for a wide range of tasks and applications.
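Most scikit-learn estimators share the same four-step pattern: create the estimator, call `fit`, call `predict`, and call `score` to evaluate. The sketch below shows this with `LinearRegression` on made-up data that follows y = 2x + 1 exactly; the data is an illustrative assumption, and the k-NN, decision tree, and K-Means programs later in this manual follow the same pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly (illustrative only)
X = np.array([[0], [1], [2], [3], [4]])   # features must be 2-D: (n_samples, n_features)
y = np.array([1, 3, 5, 7, 9])

model = LinearRegression()   # 1. create the estimator
model.fit(X, y)              # 2. learn the parameters from the data
pred = model.predict([[5]])  # 3. predict for new data
r2 = model.score(X, y)       # 4. evaluate (R^2 for regression models)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("prediction for x=5:", pred[0])
print("R^2:", r2)
```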
Lab Program 3: Install and set up scikit-learn and other necessary
tools.
pip is the package manager for Python, which means it allows you to install and manage libraries and dependencies that are supplemental to the standard library. (A package contains all the files you need for a module, and modules are Python code libraries that you can include in your projects.) pip3 is not a separate tool: it is the name of the pip command that belongs to a Python 3 installation, which is useful on systems where Python 2 and Python 3 are both installed. Recent Python 3.x installers include pip, so the pip command can be used directly to install Python libraries.
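pip records every package it installs, so installed versions can be queried from Python without importing the packages. A sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical. Note that pip uses the distribution name (`scikit-learn`), which can differ from the import name (`sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Distribution names as used with "pip install ..."
for name in ["pip", "numpy", "pandas", "scikit-learn"]:
    v = installed_version(name)
    print(f"{name}: {v if v else 'not installed'}")
```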
Scikit-learn (Sklearn) Library:
Scikit-learn is one of the most widely used machine learning libraries. It provides modules for data analysis and statistical modelling, and a wide range of efficient tools for classification, regression, clustering, and dimensionality reduction through a consistent interface in Python. The library, which is largely written in Python, is built on NumPy and SciPy and is commonly used together with Pandas and Matplotlib, so all of these libraries are installed below.
Install the required libraries
● Steps to install NumPy:
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install NumPy by typing the command:
python -m pip install numpy
● Steps to install pandas:
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install pandas by typing the command:
python -m pip install pandas
● Steps to install Matplotlib:
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install Matplotlib by typing the command:
python -m pip install matplotlib
● Steps to install SciPy:
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install SciPy by typing the command:
python -m pip install scipy
● Steps to install scikit-learn (sklearn):
Step 1: Open the Command Prompt (CMD).
Step 2: Go to the Python installation directory, for example:
cd C:\Users\<username>\AppData\Local\Programs\Python\Python312
Step 3: Install scikit-learn by typing the command:
python -m pip install scikit-learn
A simple program that prints the installed library versions to confirm a successful installation:
import numpy
import pandas
import scipy
import matplotlib
import sklearn
print("numpy library version is: ")
print(numpy.__version__) #please type two underscore symbols.
print("numpy library is successfully installed")
print(" ")
print("pandas library version is: ")
print(pandas.__version__)
print("pandas library is successfully installed")
print(" ")
print("scipy library version is: ")
print(scipy.__version__)
print("scipy library is successfully installed")
print(" ")
print("matplotlib library version is: ")
print(matplotlib.__version__)
print("matplotlib library is successfully installed")
print(" ")
print("scikit-learn(sklearn) library version is: ")
print(sklearn.__version__)
print("sklearn library is successfully installed")
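The program above stops with an ImportError at the first missing library. A more forgiving variant, sketched below with the standard-library `importlib.import_module`, tries each library in turn and prints the pip command for any that are missing. The dictionary `LIBRARIES` and the helper `check_libraries` are hypothetical names chosen for this example.

```python
import importlib

# Map each import name to the name used with pip (they differ for scikit-learn)
LIBRARIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
}


def check_libraries(libraries):
    """Try to import each library; return {import_name: version string or None}."""
    results = {}
    for import_name, pip_name in libraries.items():
        try:
            module = importlib.import_module(import_name)
            # Some modules do not define __version__, so fall back to "unknown"
            results[import_name] = getattr(module, "__version__", "unknown")
            print(f"{import_name} {results[import_name]} is successfully installed")
        except ImportError:
            results[import_name] = None
            print(f"{import_name} is missing: run  python -m pip install {pip_name}")
    return results


check_libraries(LIBRARIES)
```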
Lab Program 4: Write a program to load and explore datasets stored in .CSV and Excel files using pandas.
import pandas as pd
# Paths to the sample files (adjust them to where your files are stored)
csv_file_path = 'C:\\ML_Projects\\sample_data.csv'
excel_file_path = 'C:\\ML_Projects\\sample_data.xlsx'
# Load the CSV file into a DataFrame
data_csv = pd.read_csv(csv_file_path)
print("CSV File data:")
print(data_csv)
# Load the Excel file (reading .xlsx files requires the openpyxl package: pip install openpyxl)
data_excel = pd.read_excel(excel_file_path)
print("\nExcel File data:")
print(data_excel)
# Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns
print("\n Data Descriptions:")
print("CSV data Description:")
print(data_csv.describe())
print("\n Excel data Description:")
print(data_excel.describe())
# Data type of each column
print("\n Datatypes in CSV files:")
print(data_csv.dtypes)
print("\n Datatypes in Excel files:")
print(data_excel.dtypes)
Output
CSV File data:
      Name  Age  Score
0    Manoj   19     95
1    Dilip   20     97
2  Manjula   40     35
3   Rakesh   24     45
4   Kushal   22     80
Excel File data:
      Name Course  Sem
0   Rajesh    BCA    1
1   Ramesh    BCA    2
2    Swati   BCOM    1
3  Florina   BCOM    3
4    Pooja    BBA    2
5    Raghu    BBA    4
Data Descriptions:
CSV data Description:
             Age      Score
count   5.000000   5.000000
mean   25.000000  70.400000
std     8.602325  28.736736
min    19.000000  35.000000
25%    20.000000  45.000000
50%    22.000000  80.000000
75%    24.000000  95.000000
max    40.000000  97.000000
Excel data Description:
            Sem
count  6.000000
mean   2.166667
std    1.169045
min    1.000000
25%    1.250000
50%    2.000000
75%    2.750000
max    4.000000
Datatypes in CSV files:
Name     object
Age       int64
Score     int64
dtype: object
Datatypes in Excel files:
Name      object
Course    object
Sem        int64
dtype: object
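Besides `describe()` and `dtypes`, a few other DataFrame methods are handy when exploring a new file: `shape`, `head()`, `info()`, and `isnull().sum()`. The sketch below reads CSV text from memory with `io.StringIO` so that it runs without the `C:\ML_Projects` files; it reuses the CSV rows above with one Score value deliberately left blank, an assumption made only to show how missing values appear.

```python
import io

import pandas as pd

# In-memory copy of the CSV rows shown above, with Manjula's score removed
# to illustrate missing-data checks (illustrative data, not the lab file)
csv_text = """Name,Age,Score
Manoj,19,95
Dilip,20,97
Manjula,40,
Rakesh,24,45
Kushal,22,80
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # (number of rows, number of columns)
print(df.head(3))          # first three rows
df.info()                  # column types and non-null counts
print(df.isnull().sum())   # number of missing values per column
```

Because one Score is missing, pandas stores that column as float64 instead of int64, which is one reason to check `dtypes` after loading.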
Lab Program 5: Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting scatter plots and bar charts.
import pandas as pd
import matplotlib.pyplot as plt
# The CSV file is expected to contain the columns 'Study Hours' and 'Exam Score'
data = pd.read_csv('C:\\ML_Projects\\study_data.csv')
plt.figure(figsize=(14, 7))
# Left panel: scatter plot of study hours against exam scores
plt.subplot(1, 2, 1)
plt.scatter(data['Study Hours'], data['Exam Score'], color='cyan', edgecolor='k', alpha=0.7)
plt.title('Study Hours vs. Exam Scores')
plt.xlabel('Study Hours')
plt.ylabel('Exam Scores')
plt.grid(True)
# Group study hours into 2-hour ranges: [0, 2), [2, 4), ..., [10, 12)
bins = [0, 2, 4, 6, 8, 10, 12]
labels = ['0-2', '2-4', '4-6', '6-8', '8-10', '10-12']
data['Study Hours Range'] = pd.cut(data['Study Hours'], bins=bins, labels=labels, right=False)
# Average exam score per range (observed=False keeps empty ranges in the result)
grouped_data = data.groupby('Study Hours Range', observed=False)['Exam Score'].mean()
# Right panel: bar chart of the average exam score per range
plt.subplot(1, 2, 2)
grouped_data.plot(kind='bar', color='pink')
plt.title('Average Exam Score by Study Hour Range')
plt.xlabel('Study Hours Range')
plt.ylabel('Average Exam Scores')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
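Output: a window with two side-by-side plots, the scatter plot of study hours vs. exam scores and the bar chart of the average exam score for each study-hour range.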
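The binning step is the least obvious part of the program. With `right=False`, each bin includes its left edge and excludes its right edge, so 2.0 hours falls in '2-4', and a value of exactly 12 hours falls outside every bin and becomes NaN (it is then left out of the group averages). The study-hour values below are hypothetical, not taken from study_data.csv.

```python
import pandas as pd

# Hypothetical study-hour values (not from study_data.csv)
hours = pd.Series([0.5, 2.0, 3.9, 7.0, 11.5, 12.0])
bins = [0, 2, 4, 6, 8, 10, 12]
labels = ['0-2', '2-4', '4-6', '6-8', '8-10', '10-12']

# right=False makes each bin closed on the left and open on the right: [0, 2), [2, 4), ...
ranges = pd.cut(hours, bins=bins, labels=labels, right=False)
print(pd.DataFrame({"Study Hours": hours, "Range": ranges}))
```

If the real dataset contains students who studied exactly 12 hours, extending the last bin edge (for example to 13) would keep them in the chart.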
Lab Program 6: Write a program to handle missing data, encode categorical variables, and perform feature scaling.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
data={
'Age': [25, 30, None, 28, 35],
'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
'Income': [50000, 60000, 45000, None, 70000]
}
df= pd.DataFrame(data)
#Handling missing data: replace each missing value with the mean of its column
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Income']] = imputer.fit_transform(df[['Age', 'Income']])
#Print data after handling missing values
print("Data after handling missing values:")
print(df)
#Encoding categorical variables
encoder = OneHotEncoder()
encoded_data = encoder.fit_transform(df[['Gender']]).toarray()
#Print data after categorical encoding
encoded_df= pd.DataFrame(encoded_data,
columns=encoder.get_feature_names_out(['Gender']))
print("\nData after categorical encoding:")
print(encoded_df)
#Feature scaling: standardize Age and Income to zero mean and unit variance
scaler = StandardScaler()
scaled_data =scaler.fit_transform(df[['Age', 'Income']])
#Print data after feature scaling
scaled_df = pd.DataFrame(scaled_data, columns=['Scaled Age', 'Scaled Income'])
print("\nData after feature scaling:")
print(scaled_df)
Output
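To see what the imputer and StandardScaler compute, the sketch below repeats both steps by hand for the Age column in plain Python. Mean imputation replaces the missing age with the mean of the known ages, (25 + 30 + 28 + 35) / 4 = 29.5; standardization then maps each value x to z = (x - mean) / std, where std is the population standard deviation (StandardScaler divides by n, not n - 1).

```python
from statistics import mean, pstdev

# Age column after mean imputation: the missing value has become 29.5
ages = [25, 30, 29.5, 28, 35]

mu = mean(ages)
sigma = pstdev(ages)  # population standard deviation (ddof=0), as StandardScaler uses

# z = (x - mean) / std for every value
scaled_ages = [(a - mu) / sigma for a in ages]
print("mean:", mu, "std:", round(sigma, 4))
print("scaled:", [round(z, 4) for z in scaled_ages])
```

The scaled values have mean 0 and standard deviation 1, which is why Age (tens) and Income (tens of thousands) become comparable after scaling.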
Lab Program 7: Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Dummy student data: the features are exam score 1 and exam score 2
X = np.array([[88, 75], [95, 90], [60, 50], [45, 30], [30, 48],
              [85, 95], [70, 60], [50, 55], [40, 45], [60, 70]])
# Labels: 1 = pass, 0 = fail (binary classes for demonstration)
y = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])
# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the k-NN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)
# Train the classifier on the training data
knn.fit(X_train, y_train)
# Evaluate the classifier's performance on the test set
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on the test set: {:.2f}".format(accuracy))
# Take user input for exam scores
exam_score1 = float(input("Enter Exam Score 1: "))
exam_score2 = float(input("Enter Exam Score 2: "))
# Prepare the user input for prediction (one sample with two features)
user_input = np.array([[exam_score1, exam_score2]])
# Use the trained k-NN classifier to predict the outcome
predicted_outcome = knn.predict(user_input)
if predicted_outcome[0] == 1:
    print("Based on the exam scores provided, the student is predicted to pass.")
else:
    print("Based on the exam scores provided, the student is predicted to fail.")
Output:
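The k-NN idea itself is short: measure the distance from the new point to every training point, take the k closest, and return the majority label. The plain-Python sketch below reimplements that idea on the same exam-score data for illustration; it is not scikit-learn's implementation, although KNeighborsClassifier also uses Euclidean distance by default. The helper name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

# Same exam-score features and pass/fail labels as the program above
X = [[88, 75], [95, 90], [60, 50], [45, 30], [30, 48],
     [85, 95], [70, 60], [50, 55], [40, 45], [60, 70]]
y = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(point, query), label) for point, label in zip(X_train, y_train)]
    distances.sort(key=lambda pair: pair[0])
    k_labels = [label for _, label in distances[:k]]
    # most_common(1) returns [(label, count)] for the most frequent label
    return Counter(k_labels).most_common(1)[0][0]


print(knn_predict(X, y, [90, 85]))  # 1: the three nearest students all passed
print(knn_predict(X, y, [40, 40]))  # 0: the three nearest students all failed
```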
Lab Program 8: Write a program to implement a linear regression model for regression tasks and train the model on a dataset.
# Regression algorithm: simple linear regression Y = aX + b fitted by least squares
import numpy as np
import matplotlib.pyplot as plt
X = np.array([18, 17, 26, 19, 27, 31, 14, 29, 32, 26])  # Experience in months
Y = np.array([16000, 11000, 23000, 23000, 23000, 32000, 15000, 33000, 32000, 32000])  # Salary
print("X-values are:")
print(X)
print("Y-values are:")
print(Y)
print()
# Find the mean values of the X and Y data
mean_x = np.mean(X)
print(f"Mean of X is: {mean_x}")
mean_y = np.mean(Y)
print(f"Mean of Y is: {mean_y}")
print()
# Population variance of X and population covariance of X and Y (both divide by n)
variance_x = np.var(X)
print(f"Variance of X is: {variance_x}")
covariance = np.sum((X - mean_x) * (Y - mean_y)) / len(X)
print(f"Covariance of X and Y is: {covariance}")
print()
# Find the slope a and the intercept b
a = covariance / variance_x
print("a = covariance / variance_x so, ")
print(f"a = {a}")
b = mean_y - a * mean_x
print("b = mean_y - a * mean_x so, ")
print(f"b = {b}")
print()
# Predict Y-values for the existing X-values
Y_pred = a * X + b
print(f"Regression Line: Y = {a:.2f}X {b:+.2f}")
print("Y-values obtained are =", Y_pred)
print("And corresponding X-values are =", X)
print()
plt.scatter(X, Y, label="Original Data")
plt.plot(X, Y_pred, color="red", label=f"Regression Line: Y = {a:.2f}X {b:+.2f}")
plt.xlabel("Experience (months)")
plt.ylabel("Salary")
plt.legend()
plt.grid(True)
plt.show()
# Predict the Y-value (salary) for a new X-value (experience in months)
new_X = 7.5
new_Y = a * new_X + b
print()
print(f"Predict Y-value using Y = {a:.2f}X {b:+.2f} for new X-value = {new_X}")
print(f"Predicted Y-value is = {new_Y:.2f}")
Lab Program 9: Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
import numpy as np
# Define some features [Temperature, Humidity] and the corresponding classifications
features = np.array([[140, 1], [130, 0], [150, 0], [170, 1], [180, 1], [100, 0], [172, 1]])
classifications = np.array(["play", "don't play", "don't play", "play", "play", "don't play", "play"])
# Create a decision tree classifier
clf = DecisionTreeClassifier()
# Train the classifier on the data
clf = clf.fit(features, classifications)
# Predict the class label for a new instance
predictions = clf.predict([[170, 1]])
print("Decision Tree Classifier:")
print("New instance: [170, 1]")
print("Predicted class label for the new instance is:", predictions[0])
# Create a figure and plot the tree; class_names must follow the order of clf.classes_
plt.figure(figsize=(5, 8))
plot_tree(clf, feature_names=["Temperature", "Humidity"], class_names=list(clf.classes_),
          filled=True, rounded=True)
plt.show()
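DecisionTreeClassifier chooses splits by Gini impurity by default (`criterion="gini"`), picking the test that gives the lowest weighted impurity in the child nodes. Computing the impurity by hand for this data shows why the tree is expected to split on Humidity first. The thresholds 0.5 and 160.0 are midpoints between neighbouring feature values; the helper names `gini` and `split_gini` are hypothetical.

```python
from collections import Counter

# Same data as the program above: [Temperature, Humidity] and play / don't play labels
features = [[140, 1], [130, 0], [150, 0], [170, 1], [180, 1], [100, 0], [172, 1]]
labels = ["play", "don't play", "don't play", "play", "play", "don't play", "play"]


def gini(group):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(group)
    return 1.0 - sum((count / n) ** 2 for count in Counter(group).values())


def split_gini(feature_index, threshold):
    """Weighted Gini impurity of the two child nodes for the test feature <= threshold."""
    left = [lab for f, lab in zip(features, labels) if f[feature_index] <= threshold]
    right = [lab for f, lab in zip(features, labels) if f[feature_index] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


print("root impurity:", round(gini(labels), 3))                 # 4 play vs 3 don't play
print("Humidity <= 0.5:", round(split_gini(1, 0.5), 3))         # both children are pure
print("Temperature <= 160.0:", round(split_gini(0, 160.0), 3))  # left child is mixed
```

The Humidity split reduces the impurity from about 0.49 to 0, so no better split exists, and the plotted tree should show a single split on Humidity.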
Lab Program 10: Write a program to implement K-Means clustering and visualize the clusters.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import numpy as np
data = [[1, 1], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]
print("Considered data for K-Means clustering is:")
print(data)
print("Considered data as a NumPy array is:")
data = np.array(data)
print(data)
print("Assumed K-value is:")
k = 3
print(k)
print("K-Means object is given the following values:")
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
print(kmeans)
kmeans.fit(data)
print("Integer labels assigned to each data point are:")
labels = kmeans.labels_
print(labels)
print("Calculated centroid points are:")
centroids = kmeans.cluster_centers_
print(centroids)
# Plot the points coloured by cluster and mark each centroid with a red x
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=60, marker='x', c='red')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("K-Means clustering (k=" + str(k) + ")")
plt.grid()
plt.show()
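K-Means alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points (Lloyd's algorithm). The plain-Python sketch below runs these steps on the same data. Unlike scikit-learn, which uses k-means++ initialization and several restarts (`n_init`), it starts from three fixed data points chosen arbitrarily, so its cluster numbering may differ from `kmeans.labels_`. The function name `kmeans` is hypothetical.

```python
import math

data = [[1, 1], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm with fixed starting centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:     # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(data, initial_centroids=[data[0], data[2], data[5]])
print("labels:", labels)
print("centroids:", centroids)
```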