Robust face recognition by group sparse representation that uses samples from list of subjects

Dimce Kostadinov¹, Sviatoslav Voloshynovskiy¹, Sohrab Ferdowsi¹, Maurits Diephuis¹, Rafał Scherer² and Marcin Gabryel²

¹ University of Geneva, Computer Science Department,
7 Route de Drize, Geneva, Switzerland
group web page: http://sip.unige.ch,
[email protected]
² Częstochowa University of Technology, Department of Computer Science,
Al. Armii Krajowej 36, 42-200 Częstochowa, Poland

Abstract. In this paper we consider group sparsity for robust face recognition. We propose a model for inducing group sparsity with no constraints on the definition of the structure of the group, coupled with locality constrained regularization. We formulate the problem as a bounded distance regularized L2-norm minimization with group sparsity inducing, non-convex constraints. We apply convex relaxation and a branch and bound strategy to find an approximation to the original problem. The empirical results confirm that with this approach, deploying a very simple non-overlapping group structure, we outperform several state-of-the-art sparse coding based image classification methods.

Keywords: Face recognition, sparse representation, group sparsity

1 Introduction

Automatic human face recognition systems are used in a wide range of real-world applications related to identification, verification, posture/gesture recognition, social network linking and multimodal interaction. In the last ten years, the problem of face recognition has been intensively studied in different domains including biometrics, computer vision and machine learning, with the main emphasis on recognition accuracy under various acquisition conditions and, more recently, on security and privacy.
In the past, Nearest Neighbor (NN) [1] and Nearest Feature Subspace (NFS) [2] classifiers were used. NN classifies the query image using only its nearest neighbor in the training data; it thus relies on the local structure of the training data and is easily affected by noise. NFS approximates the query image by using all the images belonging to an identical class, exploiting the linear structure of the data. Class prediction is achieved by selecting the class whose images minimize the reconstruction error. NFS might fail when the classes are highly correlated with each other. Certain aspects of these problems can be overcome by Sparse Representation based Classification (SRC) [3]. According to SRC, a
dictionary is first learned from training images, which can be acquired from the
same subject under different viewing conditions or from various subjects. At the
recognition stage, a query image is first sparsely encoded using the codewords
of the learned dictionary after which the classification is performed by verify-
ing which class yields the smallest coding errors. Other improvement methods
over SRC include for example the Gabor feature based SRC (GSRC) method [4]
which extracts Gabor features to represent face images and estimates an occlu-
sion dictionary for sparse errors and the metaface learning based SRC method
which trains a limited number of metafaces for each class [5]. On the other hand, Qinfeng Shi et al. [6] argue that the lack of sparsity in the data means that the compressive sensing approach cannot be guaranteed to recover the exact signal, and therefore that sparse approximations may not deliver the desired robustness and performance. It has also been shown [7, 8] that in some cases the locality of the dictionary codewords is more essential than the sparsity. Another extension
of SRC [9], called Weighted Sparse Representation based Classification (WSRC), integrates the locality structure of the data into a sparse representation in a unified formulation.
While the previous methods can only promote independent sparsity [10], one can partition variables into disjoint groups and promote group sparsity using the so-called group Lasso regularization [11]. To induce more sophisticated structured sparsity patterns, it becomes essential to use structured sparsity-inducing norms built on overlapping groups of variables [12, 13]. In a direction related to group sparsity, Elhamifar and Vidal [14] proposed a more robust classification method using a structured sparse representation, while Gao et al. [15] introduced a kernelized version of SRC. The authors in [16] improve SRC by constructing a group structured dictionary by concatenating the sub-dictionaries of all classes. Wu et al. [17] introduced a class of structured sparsity inducing norms into the SRC framework to model various corruptions in face images caused by misalignment, shadow (due to illumination change) and occlusion, and developed an automatic face alignment method based on minimizing the structured sparsity norm.
Group Lasso is proven [18] to be robust to stochastic Gaussian noise due to the stability associated with the group structure; however, this holds only when the signal is strongly group sparse or covered by groups of large size. Here we present a more general approach with weaker assumptions on the group sparsity structure. That is, we propose a method for inducing group sparsity with an arbitrary structure, coupled with locality constrained regularization, by introducing non-convex constraints on the approximation coefficients. Furthermore, we propose an approximate solution using a branch and bound strategy for solving the resulting non-convex optimization problem.
The motivation is threefold: (i) via this approach we can impose any structure on the sparsity; (ii) by introducing locality constrained regularization we can control the impact of locality in the approximation; (iii) when one adopts the SRC set-up, the recognition rate is related to the reconstruction that uses the samples from the correct subject (the one related to the probe sample). By defining appropriate simple groups that impose structured sparsity, the approximation may use samples from a list of subjects; in this case we have a restricted reconstruction. By allowing just a few groups to be active (non-zero), one might expect an increase in recognition rate due to the increase of the total error caused by the restricted reconstruction.
In this paper we empirically validate our proposed method based on a simple group sparse representation that uses samples from a fixed list of subjects (similarly to list decoding), with and without locality constrained regularization, and consider face image variability induced by factors such as noise, lighting, expression or pose.
This paper is organized as follows. In Section 2 we give the basic problem formulation, and in Section 3 we present and explain our proposed method. The results of the computer simulations are presented in Section 4. Finally, Section 5 concludes the paper.
Notations: We use capital bold letters to denote real valued matrices (e.g., W ∈ ℝ^{M×N}) and small bold letters to denote real valued vectors (e.g., x ∈ ℝ^M). A sub- and super-indexed vector denotes a single data sample drawn from a given distribution (e.g., x_i(m) ∈ ℝ^M, where m indexes the sample from the distribution). We denote an element of a vector by x, and the estimate of x by x̂. All vectors have finite length, explicitly defined where appropriate. We denote an optimization problem that considers norm approximation without a prior by A; if that problem considers L1-norm approximation we denote it by A_L1, and if it considers L2-norm approximation by A_L2. If the optimization problem includes a fidelity function (e.g., an L2-norm approximation) and a prior (e.g., an L2-norm prior), we denote the problem by A_L2 P_L2. We denote a classifier operating on the L2-norm by C2 and on the L1-norm by C1.

2 Problem formulation

The face recognition system consists of two stages: enrolment and identification.
At the enrolment stage, the photos of each subject are acquired and organized in the form of a codebook. We assume that the recognition system should recognize K subjects. The photos of each subject i, 1 ≤ i ≤ K, are acquired under different imaging conditions such as lighting, expression, pose, etc., which represent the variability of face features and serve as intra-class statistics. To investigate the upper limit of performance, we also assume that the frontal face images are aligned to the same scale, rotation and translation (as in [3]). Each subject i is therefore defined by vectors x_i(m) ∈ ℝ^N, 1 ≤ m ≤ M, representing a concatenation of aligned image columns. The samples from all subjects are arranged into a codebook represented by a matrix:
W = [x_1(1), ..., x_1(m), ..., x_1(M), ..., x_i(1), ..., x_i(m), ..., x_i(M), ..., x_K(1), ..., x_K(m), ..., x_K(M)] ∈ ℝ^{N×(K·M)} .   (1)

At the recognition stage, a probe or query y ∈ ℝ^N is presented to the system. The system should identify the subject i as accurately as possible based on y and W. It is also assumed that y always corresponds to one of the subjects represented in the database.
In the scope of this paper, face recognition is considered as a classification problem where the classifier should produce a decision in favour of the class i whose codebook codewords produce the most accurate approximation of the probe y. One important class of approximations is the sparse linear approximation [3], in which the probe y is approximated by ŷ in the form:

ŷ = Wα ,   (2)

where α ∈ ℝ^{K·M} is a sparse coding vector. The coding vector α weights the codebook codewords gathered from all classes to favour the contribution of the codewords corresponding to the correct class î. The model of approximation can be represented as:

y = ŷ + r ,   (3)

where r ∈ ℝ^N is the residual error vector of the approximation.
For each class i, let δ_i : ℝ^{K·M} → ℝ^{K·M} be the function that selects the coefficients associated with the i-th class: for α ∈ ℝ^{K·M}, δ_i(α) is a new vector whose only nonzero entries are the entries of α that are associated with class i. Using only the coefficients associated with the i-th class, one can find the class approximation ŷ_i ∈ ℝ^N to the given test sample y:

ŷ_i = Wδ_i(α) ,   (4)


and

y = ŷ_i + r_i ,   (5)

where r_i ∈ ℝ^N is the residual error vector of the approximation to class i. The probe y is then classified based on the approximation î that minimizes the L2-norm of the residual error vector between y and ŷ_i. This classifier is denoted as C2:

C2: î = arg min_{1≤i≤K} ||r_i||_2 = arg min_{1≤i≤K} ||Wδ_i(α) − y||_2 .   (6)

Because the δ_i function used in this approximation is a hard-assignment, non-linear function, it might change the optimality of the solution found in the L2-norm, which is known to be unstable. In order to tackle this problem more robustly, we propose a classifier based on the approximation î that minimizes the L1-norm of the residual error vector between y and ŷ_i, and we denote this classifier as C1:

C1: î = arg min_{1≤i≤K} ||r_i||_1 = arg min_{1≤i≤K} ||Wδ_i(α) − y||_1 .   (7)

In the more general case, equations (6) and (7) correspond to minimum Lp distance classification, where p = 2 gives the Euclidean distance and p = 1 the Manhattan distance. A natural extension to (6) and (7) that might be considered is a bounded distance decoding (BDD) rule:

î = {i ∈ {1, ..., K} : ||Wδ_i(α) − y||_p ≤ η} ,   (8)

where η ≥ 0. The BDD rule is useful when the classifier should be able to reject probes that are unrelated to the database. In the general case, the BDD rule will produce a list of candidates that satisfy the above condition. To have only one unique î on the list, the parameter η should be chosen accordingly. Geometrically, in the Lp space this means that the Lp spheres with radius η around the approximations of each class should not overlap, thus producing a unique classification.
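To make the decision rules concrete, the following minimal numpy sketch (all names are illustrative, not taken from the paper's code) implements the per-class residuals of (4)-(5), the C2 and C1 classifiers of (6) and (7), and the BDD list rule of (8), given a coding vector α from any of the approximation models discussed:

```python
import numpy as np

def class_residuals(W, y, alpha, class_index, p):
    """L_p residuals ||W delta_i(alpha) - y||_p for every class i.

    W           : (N, K*M) codebook, columns grouped by class
    y           : (N,) probe vector
    alpha       : (K*M,) coding vector
    class_index : (K*M,) array; class_index[j] is the class of column j
    """
    classes = np.unique(class_index)
    res = np.empty(len(classes))
    for t, i in enumerate(classes):
        # delta_i(alpha): keep only the coefficients of class i
        delta_i = np.where(class_index == i, alpha, 0.0)
        res[t] = np.linalg.norm(W @ delta_i - y, ord=p)
    return classes, res

def classify(W, y, alpha, class_index, p=2):
    """C2 (p=2) or C1 (p=1) rule: the class with the smallest residual."""
    classes, res = class_residuals(W, y, alpha, class_index, p)
    return classes[np.argmin(res)]

def bdd_list(W, y, alpha, class_index, eta, p=2):
    """Bounded distance decoding: every class whose residual is within eta."""
    classes, res = class_residuals(W, y, alpha, class_index, p)
    return classes[res <= eta]
```

For instance, `classify(W, y, alpha, class_index, p=1)` applies the C1 rule of (7), while `bdd_list` returns the (possibly non-unique) candidate list of (8).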

3 Proposed model for group sparsity

Here we propose a model for group sparse approximation with locality constrained regularization. We present a general problem formulation as an approximation with priors and non-convex constraints that induce the use of data samples from a variable-size list of subjects. This is illustrated in Fig. 1.

Fig. 1. a) Ideal case, no noise present: data samples from 4 subjects, where the black dot represents the probe sample. b) A case where noise is present. c) The group sparsity constraints form the three active groups, represented as circles filled with color; the small dashed circle represents the boundary of the locality constrained regularization.

Let y be a test sample and let W denote all training samples as in (1); then we define the problem as follows:

min_{α,s,c}  ||Wα − y||_2 + λ·φ(α, w)
subject to
  ψ(g_k(α)) + s(k)/c = 1/c,  ∀k ∈ G,
  ||α||_1 = 1,
  Σ_{k=1}^{kar(G)} s(k) = kar(G) − c,
  s ∈ {0, 1}^{kar(G)},  c ∈ ℤ,  α ∈ ℝ^{K·M}.   (9)

In the above equation, φ(α, w) is a proximity prior, describing our prior about the location of the probe sample within all data samples. This prior penalizes the distance between y and each training sample x_i(m):

φ(α, w) = Σ_{i=1}^{K·M} |α(i) · w(i)| ,   (10)

where w is a vector defined as:

w = [ ||y − x_1(1)||_1, ..., ||y − x_i(m)||_1, ..., ||y − x_K(M−1)||_1, ||y − x_K(M)||_1 ] .
In equation (9), G is the set of defined groups, and the group g_k(α) is a subset of the set that consists of all the approximation coefficients α. There are in total kar(G) groups, where kar(G) is the cardinality of the set G. ψ(g_k(α)) is a function that sums the absolute values of all the coefficients of α included in the group g_k(α):

ψ(g_k(α)) = Σ_{i∈I_{g_k}} |α(i)| ,   (11)

where I_{g_k} is the set of indexes of the coefficients of α that belong to the group g_k(α).
In the general case, the structure of the group g_k(α), concretely which coefficients of α belong to the group g_k(α), and the set of groups G can be arbitrarily defined. Here we empirically validate the set-up in which we partition the approximation coefficients α into groups related to the samples of each subject; this results in a number of non-overlapping groups equal to the number of subjects K:
g_1(α) = {α(1), α(2), ..., α(M)},
g_2(α) = {α(M+1), α(M+2), ..., α(2M)},
...,
g_K(α) = {α((K−1)·M+1), α((K−1)·M+2), ..., α(K·M)}.
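As an illustration, a small numpy sketch (with assumed, illustrative names) of this non-overlapping partition, of the group sum ψ from (11), and of the proximity prior φ with its weight vector w from (10):

```python
import numpy as np

def subject_groups(K, M):
    """Non-overlapping partition used here: group g_k holds the M coefficient
    indices of subject k, so kar(G) = K."""
    return [np.arange(k * M, (k + 1) * M) for k in range(K)]

def psi(alpha, group):
    """psi(g_k(alpha)): sum of absolute coefficient values in one group (11)."""
    return np.sum(np.abs(alpha[group]))

def locality_weights(W, y):
    """w(i) = ||y - x_i(m)||_1 for every codebook column, as used in (10)."""
    return np.sum(np.abs(W - y[:, None]), axis=0)

def phi(alpha, w):
    """Locality constrained regularizer phi(alpha, w) from (10)."""
    return np.sum(np.abs(alpha * w))
```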

The integer c is the number of active groups, i.e., the groups that have a non-zero sum of absolute values, and s is a binary slack vector. The constraints ψ(g_k(α)) + s(k)/c = 1/c ensure non-zero values for the coefficients included in the sparsity inducing active groups and zeros for the remaining coefficients; note that they also allow non-zero coefficient values across multiple active groups while keeping all remaining coefficients at zero. The constraint ||α||_1 = 1 ensures that the coefficient values are normalized, and the constraint on the sum of s ensures that there are exactly c active groups. These last two constraints are important to ensure the validity of the group constraints.

This non-convex mixed integer program can also be interpreted as a distance constrained, variable-size list decoder, where the actual list is a list of sparsity inducing groups g_k(α). We solve this problem using the branch and bound method [19] (an alternative is the cutting plane method [19]). At each branch we solve a convex relaxation of problem (9). The number of branches that have to be visited to solve this problem is proportional to the product of the number of groups and the number of active (non-zero) groups, so the method has a bilinear number of executions of the convexly relaxed problem in these two variables.
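For intuition only, the following simplified Python sketch enumerates all size-c active sets exhaustively and solves a plain restricted least-squares over the columns of the active groups. This is a stand-in that captures the restricted reconstruction idea, not the authors' branch and bound with convex relaxations (whose cost is bilinear rather than combinatorial), and all names are illustrative:

```python
import itertools
import numpy as np

def restricted_fit(W, y, groups, active):
    """Least-squares fit of y using only the columns of the active groups;
    a simplified stand-in for the convex subproblem solved at each branch."""
    idx = np.concatenate([groups[k] for k in active])
    coeffs, *_ = np.linalg.lstsq(W[:, idx], y, rcond=None)
    alpha = np.zeros(W.shape[1])
    alpha[idx] = coeffs
    return alpha, np.linalg.norm(W @ alpha - y)

def best_active_set(W, y, groups, c):
    """Exhaustive search over all size-c active sets (combinatorial, unlike
    the paper's branch and bound); returns the best error, coding vector
    and active set."""
    best = (np.inf, None, None)
    for active in itertools.combinations(range(len(groups)), c):
        alpha, err = restricted_fit(W, y, groups, active)
        if err < best[0]:
            best = (err, alpha, active)
    return best
```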

Letting the integer variable c be fixed and small, such that it suffices to use data samples from a small list of subjects, has empirically proven to be efficient. The problem formulation with fixed c is identical to the problem defined in (9), except that the problem is now less complex. Applying convex relaxation to the vector of binary variables s, and expressing the constraints ||α||_1 = 1 and ψ(g_k(α)) + s(k)/c = 1/c, ∀k ∈ G in a convexly relaxed form, we obtain the following convex problem:
min_{α,s}  ||Wα − y||_2 + λ·φ(α, w)
subject to
  α ≤ α_a,
  α ≥ −α_a,
  Σ_{i∈I_{g_k}} α_a(i) + s(k)/c = 1/c,  ∀k ∈ G,
  Σ_{i=1}^{K·M} α_a(i) = 1,
  Σ_{k=1}^{kar(G)} s(k) = kar(G) − c,
  0 ≤ s ≤ 1,
  s(i_CR) = v_CR,   (12)

where we fix the variables s(i_CR) to values v_CR ∈ {0, 1} and let the remaining entries of s take values between 0 and 1.
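A sketch of this relaxed subproblem for one branch, using cvxpy as a stand-in for the CVX toolbox used in the paper (the function and variable names are our own, and `groups` follows the index-array convention sketched earlier):

```python
import cvxpy as cp
import numpy as np

def solve_relaxed(W, y, w, groups, c, lam, s_fixed):
    """Convex relaxation (12) for one branch: the entries of s listed in
    s_fixed are pinned to 0/1, the rest are relaxed to [0, 1]."""
    n = W.shape[1]
    alpha = cp.Variable(n)
    alpha_a = cp.Variable(n, nonneg=True)       # bounds |alpha| from above
    s = cp.Variable(len(groups))
    cons = [alpha <= alpha_a, alpha >= -alpha_a,
            cp.sum(alpha_a) == 1,               # relaxed ||alpha||_1 = 1
            cp.sum(s) == len(groups) - c,       # exactly c active groups
            s >= 0, s <= 1]
    for k, g in enumerate(groups):              # relaxed group constraints
        cons.append(cp.sum(alpha_a[list(g)]) + s[k] / c == 1.0 / c)
    for k, v in s_fixed.items():                # branch decisions v in {0, 1}
        cons.append(s[k] == v)
    obj = cp.Minimize(cp.norm(W @ alpha - y, 2)
                      + lam * cp.norm(cp.multiply(w, alpha), 1))
    cp.Problem(obj, cons).solve()
    return alpha.value, s.value
```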
After obtaining the group sparse solution α̂ ∈ ℝ^{K·M}, the identification is performed using equation (6) or (7).

4 Computer simulation

In this section we present the results of the computer simulations, organized in two parts. In the first part we present the results using sparsity priors that promote independent element-wise sparsity; the results using an approximation that induces group sparsity are presented in the second part. In all parts of the computer simulation we present the results using two classifiers: C1 (equation (7)) and C2 (equation (6)).
The computer simulation is carried out on publicly available data: the Extended Yale B face database, which consists of 2414 frontal face images of 38 subjects captured under various laboratory-controlled lighting conditions [20]. All the images from this database are cropped and normalized to 192×168 pixels.
In our set-up, the images from the dataset are rescaled to 10×12 pixels using nearest neighbor interpolation. In all of the computer simulations we use raw image pixel values (blocks of image pixel values) as features. For an unbiased validation of the results we use 5-fold cross-validation, where in a single validation round, for each subject, half of the images are selected at random for training and the remainder for testing.
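For illustration, one such random half split per subject could be implemented as follows (a hypothetical sketch; the paper does not specify its splitting code):

```python
import numpy as np

def random_half_split(labels, rng):
    """One validation round: per subject, half the images at random for
    training and the remainder for testing (repeated 5 times in the paper)."""
    train, test = [], []
    for subject in np.unique(labels):
        idx = rng.permutation(np.where(labels == subject)[0])
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return np.array(train), np.array(test)

# Example: labels = np.repeat(np.arange(38), 64) would roughly mimic the
# Extended Yale B sizes; the 5 rounds could then be
# splits = [random_half_split(labels, np.random.default_rng(i)) for i in range(5)]
```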
All of the optimization problems presented in the previous sections are solved using CVX [21]. In all of the regularized optimization problems, the regularization parameters were chosen to maximize the classification accuracy.

First we present the results using approximations with sparsity priors that promote independent element-wise sparsity. In this set-up we show the recognition accuracy under several models of approximation:
– L2-norm and L1-norm approximation as a baseline without priors
– L2-norm and L1-norm approximation with priors that have a Laplacian or Gaussian distribution
– SRC
– LLC [22]
– WSRC
The resulting estimates are tested with the C1 and C2 classifiers. The details of the parameter set-up for all the above models can be found in [23]. The results presented in bold signify the best achieved results.

Table 1. Identification precision under approximations with sparsity priors that promote independent element-wise sparsity

             A_L2            A_L1            G-A_LLC         G-A_WSRC
             C2      C1      C2      C1      C2      C1      C2      C1
  No prior   90.8%   91%     87.8%   82.9%   -       -       -       -
  P_L2       90.6%   91%     87.5%   89.3%   91.6%   89.8%   -       -
  P_L1       93.7%   94.5%   91.6%   94.1%   -       -       93.2%   94%

In the second series of computer simulations, we present the identification accuracy of the proposed approximation that promotes group sparsity (GS), with and without locality constrained regularization (LCR). Here we define a very simple non-overlapping group structure where the number of groups is equal to the number of subjects and every group covers the data samples of a single subject. We present the results using a fixed number of active groups, where we set the number of active groups (variable c in equation (12)) to 3. In the set-up where we use locality constrained regularization, the Lagrangian multiplier λ is set to 3. Table 2 shows the results for the above method. The results presented in bold signify the best achieved results.

Table 2. Identification precision under approximations that induce group-wise sparsity

  A_GS            A_GS+LCR
  C2      C1      C2      C1
  93.2%   95.3%   93.1%   94.7%

As can be seen from Table 1, the C1 classifier demonstrates superior performance in comparison to the C2 classifier. For the approximations with sparsity priors that promote independent element-wise sparsity, the impact of the approximation model is negligible under the P_L1 prior and the C1 classifier, and the L2-norm regularization is non-informative. The best result is achieved for the G-A_L2 P_L1 C1 set-up. A further detailed explanation of the impact of independent element-wise sparsity based on the prior distribution of the approximation coefficients can be found in [23].
As can be seen from Table 2, when group sparsity is considered the C1 classifier again demonstrates superior performance and robustness in comparison to the C2 classifier. The best results are achieved using the proposed model without locality constrained regularization and the C1 classifier.

5 Conclusion

In this paper we considered group sparsity for robust face recognition. With our proposed model for inducing group sparsity, we have empirically shown that using a very simple non-overlapping group structure we outperform several state-of-the-art sparse coding based image classification methods.
One possible future direction that might bring improvement is to autonomously infer the underlying group sparsity structure that yields the most accurate recognition rate.

6 Acknowledgements

The research has been partially supported by the research project PSPB-125/2010.

7 References

[1] T. M. Cover, P. E. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory 13 (1967) 21–27.
[2] S. Shan, W. Gao, D. Zhao, "Face identification from a single example image based on face-specific subspace (FSS)", IEEE International Conference on Acoustics, Speech and Signal Processing, 2002, pp. II-2125–II-2128.
[3] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, "Robust face recognition via sparse representation", IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2009) 210–227.
[4] Meng Yang, Lei Zhang, "Gabor Feature Based Sparse Representation for Face Recognition with Gabor Occlusion Dictionary", Computer Vision – ECCV 2010, Lecture Notes in Computer Science, Volume 6316, 2010, pp. 448–461.
[5] Meng Yang, Lei Zhang, Jian Yang and David Zhang, "Metaface Learning for Sparse Representation Based Face Recognition", ICIP 2010.
[6] Qinfeng Shi, Anders Eriksson, Anton van den Hengel, Chunhua Shen, "Is face recognition really a Compressive Sensing problem?", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 11), Colorado Springs, USA, June 21-23, 2011.
[7] Adam Coates and Andrew Y. Ng, "The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization", ICML 28, 2011.
[8] Adam Coates, "Demystifying Unsupervised Feature Learning", PhD thesis, Stanford University (2012).
[9] Can-Yi Lu, Hai Min, Jie Gui, Lin Zhu, Ying-Ke Lei, "Face recognition via Weighted Sparse Representation", JVCIR, Volume 24, Issue 2, February 2013, pp. 111–116.
[10] R. Tibshirani, "Regression shrinkage and selection via the Lasso", Journal of the Royal Statistical Society, Series B, pp. 267–288, 1996.
[11] Jerome Friedman, Trevor Hastie and Robert Tibshirani, "A note on the group lasso and a sparse group lasso", Stanford Statistics Department, February 11, 2010.
[12] R. Jenatton, J.-Y. Audibert, F. Bach, "Structured variable selection with sparsity-inducing norms", JMLR, 12, pp. 2777–2824, 2011.
[13] P. Zhao, G. Rocha, B. Yu, "The composite absolute penalties family for grouped and hierarchical variable selection", Annals of Statistics, 37(6A), pp. 3468–3497, 2009.
[14] E. Elhamifar, R. Vidal, "Robust classification using structured sparse representation", IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1873–1879.
[15] S. Gao, I. Tsang, L.-T. Chia, "Kernel sparse representation for image classification and face recognition", in K. Daniilidis, P. Maragos, N. Paragios (Eds.), Computer Vision – ECCV 2010, Springer, Berlin/Heidelberg, 2010, pp. 1–14.
[16] Shu Kong, Donghui Wang, "A Dictionary Learning Approach for Classification: Separating the Particularity and the Commonality", Computer Vision – ECCV 2012, Lecture Notes in Computer Science, Volume 7572, 2012, pp. 186–199.
[17] J. Z. Huang, X. L. Huang, and D. Metaxas, "Learning with dynamic group sparsity", in CVPR, pp. 64–71, 2009.
[18] Junzhou Huang and Tong Zhang, "The Benefit of Group Sparsity", preprint.
[19] D. Bertsekas, "Constrained Optimization and Lagrange Multiplier Methods", Academic Press, 1982.
[20] A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose", IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 643–660.
[21] Michael Grant and Stephen Boyd, "CVX: Matlab software for disciplined convex programming, version 2.0 beta", http://cvxr.com/cvx, September 2013.
[22] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, T. Huang, Yihong Gong, "Locality-constrained Linear Coding for image classification", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3360–3367, 13-18 June 2010.
[23] D. Kostadinov, S. Voloshynovskiy, and S. Ferdowsi, "Robust human face recognition based on locality preserving sparse over complete block approximation", in Proceedings of SPIE Photonics West, Electronic Imaging, Media Forensics and Security V, San Francisco, USA, 2014.
