Practical 1 - Singular Value Decomposition (SVD)
Concept:
SVD is a matrix factorization technique that decomposes a matrix A into the product of three matrices: A = U∑Vᵀ.
It is widely used in dimensionality reduction, noise reduction, and image compression.
Mathematics:
Given a matrix A of size m x n, it is decomposed into:
U: an m x m orthogonal matrix
∑: an m x n diagonal matrix whose diagonal entries are the singular values of A
Vᵀ: the transpose of an n x n orthogonal matrix V
A = U∑Vᵀ
Python Program:
import numpy as np
A = np.array([[3, 2, 2], [2, 3, -2]])
U, S, VT = np.linalg.svd(A)
print("Matrix A:\n", A)
print("U:\n", U)
print("Sigma:\n", S)
print("V^T:\n", VT)
Practical 2 - Principal Component Analysis (PCA)
Concept:
PCA is used to reduce the dimensionality of datasets while preserving variance. It transforms
data into a new coordinate system such that the greatest variance lies on the first principal
component.
Steps:
1. Standardize the data
2. Calculate covariance matrix
3. Compute eigenvalues and eigenvectors
4. Select the eigenvectors corresponding to the k largest eigenvalues
5. Transform original data
Python Program:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=1)
X_pca = pca.fit_transform(X_scaled)
print("Original Data:\n", X)
print("PCA Result:\n", X_pca)
Practical 3 - K-Means Clustering
Concept:
K-Means is an unsupervised learning algorithm used to partition a dataset into a set of distinct,
non-overlapping subgroups (clusters). The algorithm works by minimizing the sum of squared
distances between each point and the centroid of its cluster. The number of clusters (k) is a
hyperparameter.
Working:
1. Choose the number of clusters (k).
2. Initialize k centroids randomly.
3. Assign each data point to the nearest centroid.
4. Compute new centroids by averaging all points in each cluster.
5. Repeat steps 3 and 4 until convergence (i.e., centroids do not change significantly).
Advantages:
Simple to implement.
Efficient for large datasets.
Limitations:
Requires specifying the number of clusters in advance (see the elbow-method sketch after the program below).
Sensitive to outliers and initial centroid placement.
Python Program:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
X = np.array([[1,2],[1.5,1.8],[5,8],[8,8],[1,0.6],[9,11]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # explicit n_init and seed for reproducible centroids
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=100, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red')
plt.title('K-Means Clustering')
plt.xlabel('X1')
plt.ylabel('X2')
plt.grid(True)
plt.show()
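Because k must be chosen in advance, a common heuristic is the elbow method: run K-Means for several values of k, plot the inertia (the within-cluster sum of squared distances), and look for the "elbow" where the curve flattens. A minimal sketch continuing the program above:
inertias = []
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # inertia_ = sum of squared distances to the closest centroid
plt.plot(range(1, 6), inertias, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.grid(True)
plt.show()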
Practical 4 - Branch and Bound
Concept:
Branch and Bound is an algorithmic paradigm used for solving combinatorial and optimization problems
such as the Traveling Salesman Problem (TSP) and 0/1 Knapsack Problem. It systematically explores
branches of a decision tree, which represent subsets of the solution space.
Working Steps for 0/1 Knapsack:
1. Sort items by decreasing value-to-weight ratio.
2. At each node in the decision tree, make a decision to either include or exclude an item.
3. Calculate an upper bound on the best profit achievable from that node; the program below uses the fractional (greedy) knapsack relaxation.
4. If the bound does not exceed the best solution found so far, prune the branch.
Use Case:
Useful for problems where a brute-force search is infeasible due to time complexity but an optimal solution is still required.
Python Program:
class Item:
    def __init__(self, value, weight):
        self.value = value
        self.weight = weight

def knapsack(items, W):
    n = len(items)
    # Step 1: sort items by decreasing value-to-weight ratio
    items.sort(key=lambda x: x.value / x.weight, reverse=True)

    def bound(i, weight, value):
        # Upper bound via the fractional knapsack relaxation:
        # greedily take whole items, then a fraction of the next one
        if weight >= W:
            return 0
        profit_bound = value
        total_weight = weight
        while i < n and total_weight + items[i].weight <= W:
            total_weight += items[i].weight
            profit_bound += items[i].value
            i += 1
        if i < n:
            profit_bound += (W - total_weight) * (items[i].value / items[i].weight)
        return profit_bound

    maxProfit = 0

    def dfs(i, weight, value):
        nonlocal maxProfit
        if i == n:
            if value > maxProfit:
                maxProfit = value
            return
        # Branch 1: include item i if it fits
        if weight + items[i].weight <= W:
            dfs(i + 1, weight + items[i].weight, value + items[i].value)
        # Branch 2: exclude item i, but only if its bound can still beat the best so far
        if bound(i + 1, weight, value) > maxProfit:
            dfs(i + 1, weight, value)

    dfs(0, 0, 0)
    return maxProfit

items = [Item(60, 10), Item(100, 20), Item(120, 30)]
capacity = 50
print("Maximum Profit: ", knapsack(items, capacity))
Practical 5 - Data Visualization
Concept:
Data visualization involves representing data in a graphical or pictorial format to discover patterns, trends,
and insights. It makes complex data more accessible and understandable.
Common Types of Charts:
Bar Plot
Line Plot
Histogram
Heatmap
Python Program:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Sample data
data = pd.DataFrame({
'Subject': ['Math', 'Science', 'English', 'History'],
'Marks': [88, 92, 84, 76]
})
# Bar Plot
sns.barplot(x='Subject', y='Marks', data=data)
plt.title('Student Marks in Subjects')
plt.xlabel('Subject')
plt.ylabel('Marks')
plt.grid(True)
plt.show()
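The other chart types follow the same pattern. For instance, a minimal line-plot sketch of the same data, continuing the program above:
# Line Plot of the same marks
plt.plot(data['Subject'], data['Marks'], marker='o')
plt.title('Marks Trend Across Subjects')
plt.xlabel('Subject')
plt.ylabel('Marks')
plt.grid(True)
plt.show()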
Practical 6 - Feature Selection
Concept:
Feature selection is the process of selecting the most relevant features from the dataset to
improve the model's accuracy, reduce overfitting, and decrease computation time.
Types of Feature Selection Techniques:
1. Filter Methods: Use statistical measures like chi-square, correlation, and mutual
information.
2. Wrapper Methods: Use predictive models to evaluate feature subsets (e.g., Recursive
Feature Elimination).
3. Embedded Methods: Features are selected during model training (e.g., Lasso regression).
Python Program using Filter Method:
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print("Original shape:", X.shape)
print("Reduced shape:", X_new.shape)
Practical 7 - Image Classification
Concept:
Image classification refers to assigning a label to an image based on its visual content. It is
commonly performed using Convolutional Neural Networks (CNNs) or transfer learning from
pre-trained models.
Workflow:
1. Preprocess the image (resize, normalize)
2. Load a pre-trained model (e.g., MobileNetV2)
3. Pass the image to the model
4. Interpret the top predictions
Python Program using Pre-trained Model:
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
import numpy as np
model = MobileNetV2(weights='imagenet')
img_path = 'elephant.jpg'  # assumes a local image file with this name
img = image.load_img(img_path, target_size=(224, 224))  # MobileNetV2 expects 224 x 224 input
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 224, 224, 3)
x = preprocess_input(x)  # scale pixel values as the model expects
preds = model.predict(x)
print("Predicted:", decode_predictions(preds, top=3)[0])  # top-3 (id, label, probability) tuples