Face recognition using PCA
Principal Component Analysis
● Images are high-dimensional, correlated data.
● The goal of PCA is to reduce the dimensionality of the data while retaining as much of the variation in the original data set as possible.
● The simplest way would be to keep one variable and discard all the others, but that throws away most of the information in the data.
● In PCA, we can also visualize an intermediate stage of the algorithm (the eigenfaces shown later).
Dimensionality Reduction
Here we have 3 dimensions: x, y, and z. If we remove z, the change will be insignificant, because the points barely vary along that axis. The data is reduced from 3 dimensions to 2, the data points become much denser in the remaining space, and classification accuracy improves.
We want to preserve the directions along which our data varies the most. In the diagram above, the variance is high along both the x- and y-axes, so it is not easy to simply drop one of them. In PCA, we try to preserve as much variance as possible: we look for a hidden dimension, or vector, onto which the data can be projected so that we keep most of the information in a more compact representation.
We draw a new axis z through these points so that they all fall close to it. We look for a direction in which the projected points move as little as possible, staying nearly where they were in the xy-plane. We cannot simply drop an axis, as that would take away many of the features present in the data; instead, we project the points onto a new axis that preserves the variance between them. The new axis keeps most of the information while going from 2 dimensions down to 1.
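As a minimal sketch of this projection (using synthetic correlated 2-D points, not the face data), scikit-learn's PCA finds that hidden direction and reports how much variance survives the 2-D to 1-D projection:

import numpy as np
from sklearn.decomposition import PCA

# Synthetic 2-D points with strong correlation between x and y (assumed data)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
points = np.column_stack([x, 2 * x + rng.normal(scale=0.3, size=200)])

# Project onto one hidden direction: the new z-axis from the text
pca = PCA(n_components=1).fit(points)
z = pca.transform(points)

# Fraction of the original variance preserved by the single new axis
print(pca.explained_variance_ratio_)  # close to 1 for strongly correlated points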
Steps For PCA
1. Prepare Dataset
Each image is a 64 x 64 pixel photo, flattened into a single row of 4096 pixel columns and stored in a CSV file.
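A minimal sketch of this preparation step, assuming a hypothetical images array of shape (n_samples, 64, 64) and matching targets (the real dataset already ships as face_data.csv):

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the raw photos and their person ids
images = np.random.rand(5, 64, 64)
targets = np.arange(5)

# Flatten each 64 x 64 photo into one row of 4096 pixel columns
flat = images.reshape(len(images), 64 * 64)
df = pd.DataFrame(flat)
df["target"] = targets
df.to_csv("face_data.csv", index=False)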
2. Apply PCA
In the eigenface images on the right, the whitish regions highlight where the variance across the photos is greatest. PCA is one of the few methods where an intermediate stage of the algorithm can be visualized directly. By reducing each photo from 4096 pixels to a small number of components, we also improve performance.
3. Select the number of components
In the diagram above, the maximum variance is captured by the first 5 components, which hold about 95% of the information. There is no point in keeping the rest, and that is where the dimensionality reduction happens: we discard the remaining components.
So, we plot this graph to see how many components we need, or compute the cutoff directly, as in the sketch below.
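A hedged sketch of that choice, with x_train here a random stand-in for the flattened training images: scikit-learn reports the cumulative explained variance, and a float n_components asks it to keep just enough components to reach the requested fraction.

import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for the flattened training images (hypothetical data)
x_train = np.random.rand(300, 4096)

# Find where the cumulative explained variance first crosses 95%
pca = PCA().fit(x_train)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print(n_components)

# Equivalently, let scikit-learn pick the cutoff directly
pca_95 = PCA(n_components=0.95).fit(x_train)
print(pca_95.n_components_)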
4. Project Input Images to Principal Components
5. Train your classifier with the reduced set of features
6. Apply PCA Transform to Test Input
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from time import time
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
from sklearn.svm import SVC
## Helper functions. Use when needed.
def show_original_images(pixels):
    # Display the original face images in a 6 x 10 grid
    fig, axes = plt.subplots(6, 10, figsize=(11, 7),
                             subplot_kw={'xticks': [], 'yticks': []})
    for i, ax in enumerate(axes.flat):
        ax.imshow(np.array(pixels)[i].reshape(64, 64), cmap='gray')
    plt.show()

def show_eigenfaces(pca):
    # Display the first 24 eigenfaces (principal components) as images
    fig, axes = plt.subplots(3, 8, figsize=(9, 4),
                             subplot_kw={'xticks': [], 'yticks': []})
    for i, ax in enumerate(axes.flat):
        ax.imshow(pca.components_[i].reshape(64, 64), cmap='gray')
        ax.set_title("PC " + str(i + 1))
    plt.show()
## Step 1: Read dataset and visualize it.
df = pd.read_csv("face_data.csv")  # read the CSV of flattened images into a DataFrame
# print(df.head())
labels = df["target"]  # the person id for each photo
pixels = df.drop(["target"], axis=1)  # keep only the 4096 pixel columns
# show_original_images(pixels)
## Step 2: Split dataset into training and testing
x_train, x_test, y_train, y_test = train_test_split(pixels, labels)
## Step 3: Perform PCA.
pca = PCA(n_components=135).fit(x_train)  # fit scikit-learn's PCA on the training set only
plt.plot(np.cumsum(pca.explained_variance_ratio_))  # cumulative explained variance curve
# plt.show()
# show_eigenfaces(pca)
## Step 4: Project training data onto the principal components
x_train_pca = pca.transform(x_train)  # each image is now 135 values instead of 4096
## Step 5: Initialize classifier and fit training data
clf = SVC(kernel='rbf', C=1000, gamma=0.01)  # RBF-kernel support vector classifier
clf = clf.fit(x_train_pca, y_train)
## Step 6: Perform testing and get classification report
x_test_pca = pca.transform(x_test)  # apply the same PCA transform fitted on training data
y_pred = clf.predict(x_test_pca)
print(classification_report(y_test, y_pred))
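As an optional follow-up to the program above (an addition, not part of the original), pca.inverse_transform maps the 135-component codes back to 4096 pixels, which shows how much of each face the retained components preserve:

# Continues from the program above: reconstruct test faces from their codes
x_test_rebuilt = pca.inverse_transform(x_test_pca)

# Compare one original face with its 135-component reconstruction
fig, axes = plt.subplots(1, 2, figsize=(6, 3),
                         subplot_kw={'xticks': [], 'yticks': []})
axes[0].imshow(np.array(x_test)[0].reshape(64, 64), cmap='gray')
axes[0].set_title("Original")
axes[1].imshow(x_test_rebuilt[0].reshape(64, 64), cmap='gray')
axes[1].set_title("Reconstructed")
plt.show()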