CS6208 : Advanced Topics in Artificial Intelligence


Graph Machine Learning

Lecture 6 : Graph-based Visualization


Semester 2 2022/23

Xavier Bresson
https://twitter.com/xbresson

Department of Computer Science


National University of Singapore (NUS)


Course lectures

Introduction to Graph Machine Learning

Part 1 : GML without feature learning (before 2014)
  Introduction to Graph Science
  Graph Analysis Techniques without Feature Learning
  Graph clustering
  Graph SVM
  Recommendation on graphs
  Graph-based visualization

Part 2 : GML with shallow feature learning (2014-2016)
  Shallow graph feature learning

Part 3 : GML with deep feature learning, a.k.a. GNNs (after 2016)
  Graph Convolutional Networks (spectral and spatial)
  Weisfeiler-Lehman GNNs
  Graph Transformer & Graph ViT/MLP-Mixer
  Benchmarking GNNs
  Molecular science and generative GNNs
  GNNs for combinatorial optimization
  GNNs for recommendation
  GNNs for knowledge graphs
  Integrating GNNs and LLMs


Outline
Visualization as dimensionality reduction
Linear visualization techniques
  Standard PCA
  Robust PCA
  Graph-based PCA
Non-linear visualization techniques
  LLE
  Laplacian eigenmaps
  TSNE
  UMAP
Conclusion



Visualization
The visualization task involves projecting high-dimensional data, such as images, text documents, user/product attributes, or sequences of actions, into 2D or 3D low-dimensional Euclidean spaces to reveal underlying data structures.
This projection is achieved with dimensionality reduction techniques, which aim to compress the original information while discarding unnecessary details and noise.

[Figure: 28×28 MNIST images visualized in R^3]

Dimensionality reduction
Two classes of dimensionality reduction techniques have been developed :
  Linear techniques : These methods produce low-dimensional Euclidean (flat) spaces. Common examples include Principal Component Analysis (PCA)[1], Linear Discriminant Analysis (LDA)[2], and Independent Component Analysis (ICA)[3].
  Non-linear techniques : These methods compute low-dimensional manifolds, i.e. curved hyper-surfaces. Standard techniques are kernel methods[4], Locally Linear Embedding (LLE)[5], Laplacian Eigenmaps[6], t-distributed Stochastic Neighbor Embedding (t-SNE)[7], and Uniform Manifold Approximation and Projection (UMAP)[8].

[1] Pearson, On lines and planes of closest fit to systems of points in space, 1901
[2] Fisher, The Use of Multiple Measurements in Taxonomic Problems, 1936
[3] Hérault, Jutten, Architectures neuromimétiques adaptatives : Détection de primitives, 1985
[4] Schölkopf et al., Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998
[5] Roweis, Saul, Nonlinear dimensionality reduction by locally linear embedding, 2000
[6] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003
[7] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008
[8] McInnes et al., UMAP: Uniform manifold approximation and projection for dimension reduction, 2018


Linear dimensionality reduction


Assumption: The data distribution exists within a low-dimensional Euclidean space.

[Diagram: x_i ∈ R^d (d ≫ 1), a point in a high-dimensional Euclidean space, is mapped by linear dimensionality reduction to z_i ∈ R^m (m ≪ d), a point on a low-dimensional hyper-plane]

Projection map :  φ : x_i → z_i = φ(x_i) = A x_i

Linear techniques
Task formalization : Restrict the mapping φ to be a linear operator A.
Several techniques exist to compute such an operator :
PCA, LDA, ICA, non-negative matrix factorization (NMF)[1], sparse coding[2], etc.

  z = φ(x) = Ax = [ ⟨A_{1,·}, x⟩ , ... , ⟨A_{m,·}, x⟩ ]^T = [ z_1 , ... , z_m ]^T
with
  x ∈ R^d, d ≫ 1 : high-dimensional data point
  z ∈ R^m, m ≪ d : low-dimensional data point
  A ∈ R^{m×d} : dictionary of patterns or basis functions
  A_{i,·} ∈ R^d : i-th pattern/linear filter
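The mapping is a single matrix-vector product. A minimal numpy sketch, with a random projection standing in for a learned dictionary A (all sizes and names here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 784, 3                             # assumed sizes: 28x28 inputs mapped to 3D
x = rng.normal(size=d)                    # one high-dimensional data point
A = rng.normal(size=(m, d)) / np.sqrt(d)  # stand-in linear operator (random projection)

z = A @ x                                 # z_j = <A_{j,.}, x>, j = 1..m
print(z.shape)                            # (3,)
```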

[1] Lee, Seung, Learning the parts of objects by non-negative matrix factorization, 1999
[2] Olshausen, Field, Learning a sparse code for natural images, 1996


Linear dimensionality reduction


An example where linear dimensionality reduction falls short in producing clear patterns.
This highlights the need for greater expressivity to uncover the underlying structures.

[Figure: 28×28 MNIST images visualized in R^3 with PCA]


Non-linear dimensionality reduction


Assumption: Data distribution resides on low-dimensional curved spaces, known as
manifolds (which can be smooth or non-smooth).
Techniques designed to uncover these structures are referred to as manifold learning.

[Diagram: x_i ∈ R^d (d ≫ 1), a point in a high-dimensional Euclidean space, is mapped by non-linear dimensionality reduction to z_i on a low-dimensional manifold M ⊂ R^d, with dim(M) = m, m ≪ d]

Projection map :  φ : x_i → z_i = φ(x_i) ∈ M


Dimensionality reduction
An example where non-linear reduction effectively reveals clear patterns.
Several non-linear techniques are available, each suited to different data distributions.

  φ(x) = z ,  x ∈ R^{784} ,  z ∈ R^3 ,  φ = Laplacian eigenmaps[1]

[Figure: 28×28 MNIST images, x ∈ R^{28×28} = R^{784}, visualized in 3D as z ∈ R^3]
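A minimal sketch of this pipeline, assuming scikit-learn is available (its SpectralEmbedding implements Laplacian eigenmaps on a k-NN graph; the small 8×8 digits dataset stands in for MNIST):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import SpectralEmbedding

X, y = load_digits(return_X_y=True)              # 1797 images of 8x8 digits
phi = SpectralEmbedding(n_components=3, n_neighbors=10)
Z = phi.fit_transform(X)                         # z_i in R^3

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(Z[:, 0], Z[:, 1], Z[:, 2], c=y, s=5)  # digit classes form clusters
plt.show()
```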
[1] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003



Principal component analysis


PCA[1], introduced in 1901, is inspired by the principal axis theorem in mechanics.
It is the most popular technique for linear dimensionality reduction.
It aims to capture the direction of greatest variance within the data distribution.
[Portrait: Karl Pearson, 1857-1936]

[Figure, three panels: (1) original data distribution in R^2; (2) projection of the data points onto the direction v of the largest variation of the distribution; (3) principal component v and approximation of the original distribution w.r.t. the largest variance]

[1] Pearson, On lines and planes of closest fit to systems of points in space, 1901


Task formulation
Given a set of data points, PCA projects the data onto an orthogonal basis that best captures its variance.
Assuming the data distribution is centered at the origin, PCA defines an orthogonal transformation, i.e. a rotation matrix, that maps the data to a new coordinate system (v_1, v_2, ..., v_K), known as the principal directions, such that :
  The first basis function or principal direction v_1 captures the largest possible variance in the data.
  The second principal direction v_2 captures the second largest possible variance while being orthogonal to the first principal direction, ⟨v_1, v_2⟩ = 0.
  Each subsequent direction v_k captures the k-th largest possible data variance, maintaining orthogonality to all previous directions.

[Figure: rotation of the axes (e_1, e_2) around the origin to the principal directions (v_1, v_2)]

Covariance matrix
Data variance across feature dimensions is captured by the covariance matrix :

  C = X^T X ∈ R^{d×d}

with data matrix X ∈ R^{n×d}, where
  n is the number of data points
  d is the number of data features

Then we have
  C_11 = X_{·,1}^T X_{·,1} = Σ_{i=1}^n X_{i1}^2 = ‖X_{·,1}‖_2^2 : variance in the direction e_1
  C_12 = X_{·,1}^T X_{·,2} = Σ_{i=1}^n X_{i1} X_{i2} : cross-variance in the directions e_1-e_2

Reminder : The data is centered in each feature dimension j, i.e.
  E(X_{·,j}) = Σ_{i=1}^n X_{ij} = 0 ,  ∀j ∈ {1, ..., d}

[Figure: a 2D point cloud with variances C_11, C_22 along e_1, e_2 and cross-variance C_12; the vector e_1 is the direction of the largest variance, with value C_11]
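A minimal numpy sketch of this construction on toy 2D data (the variable names are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])  # n=500, d=2

X = X - X.mean(axis=0)     # center each feature dimension: E(X_{.,j}) = 0
C = X.T @ X                # covariance matrix C = X^T X (unnormalized, as above)
print(C[0, 0], C[0, 1])    # variance along e_1, cross-variance along e_1-e_2
```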


Direction of the largest variation


Consider an arbitrary centered data distribution.
Let us compute the direction of the largest data variance, v_largest :

  v_largest ∈ R^d = argmax_{‖v‖_2=1} Σ_{i=1}^n (x_i^T v)^2 ,  x_i ∈ R^d
                  = argmax_{‖v‖_2=1} v^T C v

given that
  Σ_i (x_i^T v)^2 = ‖Xv‖_2^2 = (Xv)^T (Xv) = v^T X^T X v
and by definition C = X^T X.

Note that x_i^T v is the projection of x_i onto the direction v.

[Figure: a 2D point cloud with variances C_11, C_22, the direction v_largest of the largest variance, and the projection x_i^T v of a point x_i onto v]
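As a quick numerical check, the maximizer of v^T C v over unit vectors can be approximated with power iteration on C (a minimal sketch, continuing the snippet above):

```python
# Repeatedly apply C and renormalize; for a PSD matrix this converges to the
# eigenvector of the largest eigenvalue, i.e. the direction of largest variance.
v = np.ones(C.shape[0]) / np.sqrt(C.shape[0])
for _ in range(100):
    v = C @ v
    v /= np.linalg.norm(v)
print(v, v @ C @ v)   # v_largest and the captured variance v^T C v
```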


Eigenvalue decomposition
Next, we perform the eigenvalue decomposition (EVD) of the positive semi-definite (PSD)
covariance matrix C :
  C v_j = λ_j v_j ,  v_j ∈ R^d ,  j = 1, ..., d
with the eigenvalues λ_max = λ_1 ≥ λ_2 ≥ ... ≥ λ_d = λ_min ≥ 0.

Let us consider the largest eigenvalue; we have
  v_max^T C v_max = λ_max v_max^T v_max = λ_max ‖v_max‖_2^2 = λ_max ≥ λ_j ,  ∀j ≠ 1

In other words, we have v_largest = v_max = v_1, as
  argmax_{‖v‖_2=1} v^T C v = v_max^T C v_max = λ_max = λ_1

[Figure: the principal direction v_largest = v_1 of a 2D point cloud with variances C_11, C_22 and cross-variance C_12]
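The same quantities come directly from numpy's EVD routine for symmetric matrices (a minimal sketch, continuing with C from above):

```python
lam, V = np.linalg.eigh(C)      # eigh returns eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]  # reorder so lambda_1 >= ... >= lambda_d
print(lam)                      # variances along the principal directions
print(V[:, 0])                  # v_1, the direction of largest variance
```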


Direction of the second largest variation


The direction of the greatest data variance, known as the first principal direction (PD), is given by the spectral solution and corresponds to the eigenvector v_1 associated with the largest eigenvalue of the covariance matrix C :

  C v_1 = λ_1 v_1  →  v_1^T C v_1 = λ_1 v_1^T v_1 = λ_1 ‖v_1‖_2^2 = λ_1 = argmax_{‖v‖_2=1} Σ_{i=1}^n (x_i^T v)^2

Similarly, the direction of the second largest data variance, or the second PD, is defined as :

  v_2 ∈ R^d = argmax_{‖v‖_2=1} Σ_{i=1}^n (x_i^T v)^2 ,  s.t. v^T v_1 = 0  (v is orthogonal to v_1)

and the solution is given by the second eigenvalue and its eigenvector :

  v_2^T C v_2 = λ_2 v_2^T v_2 = λ_2 ‖v_2‖_2^2 = λ_2 ≥ λ_j ∀j ≥ 3 ,  and λ_2 ≤ λ_1

In other words, we have

  argmax_{‖v‖_2=1, v^T v_1=0} v^T C v = v_2^T C v_2 = λ_2  (second largest variance)

[Figure: principal directions v_1 and v_2 of a 2D point cloud]


PCA as EVD of covariance matrix


In the same way, the direction of the third largest data variance is defined as

  v_3 ∈ R^d = argmax_{‖v‖_2=1} Σ_{i=1}^n (x_i^T v)^2 ,  s.t. v^T v_1 = 0 and v^T v_2 = 0
  (v is orthogonal to v_1 and v_2)

The solution is given by the third eigenvalue and its eigenvector :
  C v_3 = λ_3 v_3

Altogether, we consider the full matrix factorization of C with EVD :

  C = V Λ V^T ∈ R^{d×d}

with V = [ v_1, ..., v_d ] ∈ R^{d×d} ,  V^T V = I_d ∈ R^{d×d} (identity matrix) ,  Λ = diag(λ_1, ..., λ_d) ∈ R^{d×d}


Principal directions and components


The principal directions (PDs) indicate the directions along which the data has the greatest variance.
The EVD of the covariance matrix C provides
  The principal directions v_j ∈ R^d as the eigenvectors of C.
  The magnitude λ_j of the variance along each direction v_j as the eigenvalues of C.
The principal components of a data point xi are defined as the projection onto the basis formed by
these principal directions :
[Figure: principal directions PD_1 = v_1, PD_2 = v_2 of a 2D point cloud, and the principal components PC_1 = x_i^T v_1, PC_2 = x_i^T v_2 of a data point x_i]

  x_i^pca = V^T x_i ∈ R^d  (rotated data point along the PDs)
  X^pca = X V ∈ R^{n×d}  (rotated data matrix X along the PDs)
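In code, the change of basis is a single matrix product (a minimal sketch, continuing with X and V from above):

```python
X_pca = X @ V                          # rows are the rotated data points
x0_pca = V.T @ X[0]                    # principal components of one data point
print(np.allclose(X_pca[0], x0_pca))   # True
```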



Dimensionality reduction
If the data is primarily concentrated along the first principal directions, the remaining directions, which mostly capture noise or insignificant details, can be discarded.
The first k principal directions (PDs) can be selected as :

  Select k such that ‖X − X_k^pca V_k^T‖_F^2 ≤ ε

where X_k^pca = X V_k ∈ R^{n×k} is the reduced representation of X with the first k PDs (its reconstruction in the original space is X_k^pca V_k^T ∈ R^{n×d}),
and V_k = [ v_1, ..., v_k ] ∈ R^{d×k} ,  V_k^T V_k = I_k ∈ R^{k×k} , is the truncated V with the first k PDs.

  x_i^pca = V^T x_i ∈ R^d
          ≈ V_k^T x_i ∈ R^k
          ≈ v_1^T x_i ∈ R  (k = 1 in this example)

[Figure: original data distribution in R^2, and projection of the data points onto the direction v_1 of the largest variation of the distribution]

Number of reduced dimensions


Simple selection of the hyper-parameter k.
Since principal directions capture the most significant variances in the data distribution, simply
retain the first k directions that collectively account for e.g. 90% of the total data variance.

  Select k such that ‖X − X_k^pca V_k^T‖_F^2 ≤ ε

or simply

  Select k such that (Σ_{j=1}^k λ_j) / (Σ_{j=1}^d λ_j) ≥ 0.9  (90% of total data variance)

[Figure: eigenvalue spectrum λ_j of the YaleBFaces dataset; the leading eigenvalues capture structure, the tail captures noise]
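A minimal sketch of this selection rule (continuing with lam sorted in decreasing order and X, V from above):

```python
ratio = np.cumsum(lam) / np.sum(lam)      # variance fraction of the first k PDs
k = int(np.searchsorted(ratio, 0.9)) + 1  # smallest k reaching 90% of variance
X_k = X @ V[:, :k]                        # reduced n x k representation
```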


PCA as SVD of data matrix


We identified the principal directions of variance by performing EVD on the covariance matrix.
Alternatively, the same information can be derived using a singular value decomposition (SVD) of the data matrix X :

  X = U Σ W^T ∈ R^{n×d}
  with U ∈ R^{n×n}, U^T U = I_n ,  W ∈ R^{d×d}, W^T W = I_d ,  Σ ∈ R^{n×d}

We have
  C = X^T X = (U Σ W^T)^T (U Σ W^T) = W Σ^T (U^T U) Σ W^T = W Σ^2 W^T  (SVD)
  C = X^T X = V Λ V^T  (EVD)

As a result
  V = W ,  Λ = Σ^2  →  λ_j = σ_j^2 ,  X_k^pca = X W_k = U Σ_k ∈ R^{n×k}
with Σ_k, W_k the truncations of Σ, W to the k largest singular values.
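The equivalence is easy to verify numerically (a minimal sketch, continuing with the centered X, lam, and k from above):

```python
U, s, Wt = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) W^T, s decreasing
print(np.allclose(s**2, lam))                     # lambda_j = sigma_j^2
X_k_svd = U[:, :k] * s[:k]                        # equals X @ Wt.T[:, :k] = X_k^pca
```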


EVD or SVD
The choice depends on the size (n × d) of the data matrix X :
  For d < n : use EVD, with complexity O(d^3).
  For n < d : apply SVD, with complexity O(min(nd^2, n^2 d)).
Examples
  MNIST dataset : 60,000 × 784 ⇒ apply EVD
  Microarray-based gene expression dataset[1] : 240 × 7,399 ⇒ apply SVD

[1] Rosenwald et al., The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma, 2002


Lab 1 : (Standard) linear PCA


Run code01.ipynb :
Visualize the principal directions of a Gaussian distribution.
Plot the distribution of data variances and generate new faces.
Compute the 3D PCA embedding of MNIST.

[Figures: principal directions of a Gaussian; new faces generated with PCA; PCA of MNIST]


Robust PCA
Standard PCA is sensitive to outliers; even a single outlier can significantly change the PCA solution.
Robust PCA[1] is a technique designed to separate outliers from the data, allowing PCA to be performed on the clean part of the data.
[Portraits: Emmanuel Candès, Yi Ma]

[Figure: an outlier pulls the noisy principal directions away from those of the clean data]

[1] Candes, Li, Ma, Wright, Robust principal component analysis, 2011 (8,000 citations)


Task formalization
  Standard PCA :  min_{L ∈ R^{n×d}} ‖X − L‖_F^2  s.t. rank(L) = k

  Robust PCA :  min_{L,S ∈ R^{n×d}} rank(L) + γ card(S)  s.t. X = L + S ∈ R^{n×d}   (1)

where
  X ∈ R^{n×d} is the (noisy) data matrix,
  L is a low-rank matrix that captures the clean/standard PCA (data structure),
  S is a sparse matrix that captures outliers (noise),
  card(·) is the cardinality of the matrix, i.e. the number of non-zero elements of the matrix.

The combinatorial optimization problem (1) is NP-hard, and thus requires a continuous relaxation :

  min_{L,S ∈ R^{n×d}} ‖L‖_* + γ ‖S‖_1  s.t. X = L + S ∈ R^{n×d}   (2)

where
  ‖·‖_1 is the L1 norm,
  ‖·‖_* is the nuclear norm (the L1 norm of the singular values).

Theoretical result : the solution of (2) is (almost) the solution of (1)!


Optimization algorithm
Alternating direction method of multipliers (ADMM) technique[1,2] :
Provides a fast, robust, and accurate solution to the relaxed problem (2).
The core idea is to decompose the problem into simpler sub-problems using Lagrangian multipliers.

  min_{L,S ∈ R^{n×d}} ‖L‖_* + γ ‖S‖_1  s.t. X = L + S ∈ R^{n×d}   (2)

which is equivalent to the augmented Lagrangian problem

  min_{L,S,Z ∈ R^{n×d}} ‖L‖_* + γ ‖S‖_1 + ⟨Z, X − (L + S)⟩ + (r/2) ‖X − (L + S)‖_F^2 ,  r > 0

Initialization : L^{m=0} = X ∈ R^{n×d} ,  S^{m=0} = Z^{m=0} = 0_{n×d}

Iterate until convergence, m = 1, 2, ... :
  L^{m+1} = U h_{1/r}(Λ) V^T ∈ R^{n×d} ,  with U Λ V^T the SVD of X − S^m + Z^m/r
  S^{m+1} = h_{γ/r}( X − L^{m+1} + Z^m/r ) ∈ R^{n×d}
  Z^{m+1} = Z^m + r ( X − L^{m+1} − S^{m+1} ) ∈ R^{n×d}

where h_μ(x) = sign(x) · max(|x| − μ, 0) is the shrinkage (soft-thresholding) operator, applied entry-wise.
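A minimal numpy sketch of these iterations (the default γ = 1/√max(n,d) is an assumption of this sketch, following the value suggested by Candès et al.):

```python
import numpy as np

def shrink(A, mu):
    """Shrinkage operator h_mu, applied entry-wise."""
    return np.sign(A) * np.maximum(np.abs(A) - mu, 0.0)

def robust_pca(X, gamma=None, r=1.0, n_iter=200):
    """ADMM iterations for min ||L||_* + gamma ||S||_1 s.t. X = L + S."""
    n, d = X.shape
    if gamma is None:
        gamma = 1.0 / np.sqrt(max(n, d))
    L, S, Z = X.copy(), np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X - S + Z / r, full_matrices=False)
        L = (U * shrink(s, 1.0 / r)) @ Vt      # singular value thresholding
        S = shrink(X - L + Z / r, gamma / r)   # entry-wise shrinkage
        Z = Z + r * (X - L - S)                # dual ascent on the constraint
    return L, S
```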

[1] Glowinski, Le Tallec, Augmented Lagrangian and operator-splitting methods in nonlinear mechanics, 1989
[2] Boyd et al., Distributed optimization and statistical learning via the alternating direction method of multipliers, 2011 (23,000 citations)


Lab 2 : Robust PCA


Run code02.ipynb :
Visualize the principal directions and components of a noisy Gaussian distribution.
Compute the robust PCA solution and compare with the standard (noisy) solution.

[Figures: robust PCA (green) vs. noisy PCA (red); robust principal components]



Graph-based PCA
Standard PCA :   min_{L∈R^{n×d}} ‖X − L‖²_F   s.t.  rank(L) = k

Robust PCA :   min_{L,S∈R^{n×d}} rank(L) + λ card(S)   s.t.  X = L + S ∈ R^{n×d}

Graph PCA[1] :   min_{L,S∈R^{n×d}} rank(L) + λ_S card(S) + λ_G ‖L‖_G   s.t.  X = L + S ∈ R^{n×d}

where ‖·‖_G is a graph smoothness term.


The aim is to enhance PCA with data similarities represented by a graph.
The problem is still NP-hard and requires a new continuous relaxation:

min_{L,S∈R^{n×d}} ‖L‖_* + λ_S ‖S‖_1 + λ_G ‖L‖_DirG   s.t.  X = L + S ∈ R^{n×d}

where ‖·‖_DirG is the graph Dirichlet norm, defined as ‖L‖_DirG = tr(L^T L_G L) with the graph Laplacian L_G ∈ R^{n×n}.

[1] Shahid, Kalofolias, Bresson, Bronstein, Vandergheynst, Robust principal component analysis on graphs, 2015


Optimization algorithm
ADMM technique :
min_{L,S∈R^{n×d}} ‖L‖_* + λ_S ‖S‖_1 + λ_G ‖L‖_DirG   s.t.  X = L + S ∈ R^{n×d}

is equivalent to

min_{L,S,M∈R^{n×d}} ‖L‖_* + λ_S ‖S‖_1 + λ_G ‖M‖_DirG   s.t.  X = L + S ∈ R^{n×d},  M = L ∈ R^{n×d}

as well as

min_{L,S,M,Z_1,Z_2∈R^{n×d}} ‖L‖_* + λ_S ‖S‖_1 + λ_G ‖M‖_DirG + ⟨Z_1, X − (L + S)⟩ + (r_1/2)‖X − (L + S)‖²_F + ⟨Z_2, L − M⟩ + (r_2/2)‖L − M‖²_F

Initialization : L^{m=0} = X ∈ R^{n×d},  S^{m=0} = M^{m=0} = Z_1^{m=0} = Z_2^{m=0} = 0_{n×d}
Iterate until convergence, m = 1, 2, ... :
  L^{m+1} = U h_{1/r_1}(Λ) V^T ∈ R^{n×d},  with the SVD  UΛV^T = X − S^m + Z_1^m/r_1
  S^{m+1} = h_{λ_S/r_1}(X − L^{m+1} + Z_1^m/r_1) ∈ R^{n×d}
  M^{m+1} = (I_n + λ_G L_G)^{-1}(L^{m+1} + Z_2^m/r_2) ∈ R^{n×d}
  Z_1^{m+1} = Z_1^m + r_1 (X − L^{m+1} − S^{m+1}) ∈ R^{n×d}
  Z_2^{m+1} = Z_2^m + r_2 (L^{m+1} − M^{m+1}) ∈ R^{n×d}
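These updates translate almost line-for-line into numpy. Below is a minimal sketch of the ADMM loop, assuming a dense graph Laplacian L_G and a fixed iteration count; the function name, default parameters, and stopping rule are illustrative assumptions, not part of the original algorithm statement:

```python
import numpy as np

def shrink(X, t):
    # entrywise soft-thresholding h_t
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def graph_robust_pca(X, L_G, lam_S, lam_G, r1=1.0, r2=1.0, iters=100):
    # ADMM for min ||L||_* + lam_S ||S||_1 + lam_G ||M||_DirG
    # s.t. X = L + S, M = L; L_G is the (n, n) graph Laplacian
    n, d = X.shape
    L = X.copy()
    S, M = np.zeros((n, d)), np.zeros((n, d))
    Z1, Z2 = np.zeros((n, d)), np.zeros((n, d))
    K = np.eye(n) + lam_G * L_G                  # fixed system matrix for the M-update
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X - S + Z1 / r1, full_matrices=False)
        L = (U * shrink(s, 1.0 / r1)) @ Vt       # singular-value shrinkage
        S = shrink(X - L + Z1 / r1, lam_S / r1)  # sparse outliers
        M = np.linalg.solve(K, L + Z2 / r2)      # graph-smoothed copy of L
        Z1 = Z1 + r1 * (X - L - S)               # dual ascent steps
        Z2 = Z2 + r2 * (L - M)
    return L, S
```

Since the M-update is a linear solve against the fixed matrix I_n + λ_G L_G, that matrix can be pre-factorized once outside the loop.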


Application to video surveillance


Separate the background from moving objects :

X = L + S : each video frame is a row of the data matrix X; the low-rank part L recovers the static background, and the sparse part S recovers the moving objects.


Outline
Visualization as dimensionality reduction
Linear visualization techniques
Standard PCA
Robust PCA
Graph-based PCA
Non-linear visualization techniques
LLE
Laplacian eigenmaps
TSNE
UMAP
Conclusion


Non-linear visualization techniques


All non-linear dimensionality reduction techniques follow a two-step approach:
Construct a k-nearest neighbor (kNN) graph G from the n high-dimensional data points
{xi} ∈ Rd, d ≫ 1.
Compute a low-dimensional embedding of the graph that preserves
The geometric distance between neighboring data points, i.e. the graph structure, and
Additional properties specific to the chosen dimensionality reduction technique, s.a.
physical forces.

Figure: pipeline of non-linear dimensionality reduction — k-NN graph construction followed by low-dim embedding:
G = (V, E, A), A ∈ R^{n×n}, V = {x_1, ..., x_n} ⊂ R^d  →  φ(x_i) ∈ R^k, k ≪ d,
with φ(x_i) close to φ(x_j) whenever x_i and x_j are neighbors in G.

Non-linear visualization techniques


We will explore the following non-linear visualization techniques:
LLE[1] and Laplacian Eigenmaps[2], which are spectral techniques.
TSNE[3], a technique based on probability matching.
UMAP[4], which uses a physics-based approach.

[1] Roweis, Saul, Nonlinear dimensionality reduction by locally linear embedding, 2000 (18,000 citations)
[2] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003 (10,000 citations)
[3] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008 (46,000 citations)
[4] McInnes et-al, UMAP: Uniform manifold approximation and projection for dimension reduction, 2018 (13,000 citations)


Outline
Visualization as dimensionality reduction
Linear visualization techniques
Standard PCA
Robust PCA
Graph-based PCA
Non-linear visualization techniques
LLE
Laplacian eigenmaps
TSNE
UMAP
Conclusion


LLE
Locally Linear Embedding[1] (LLE) was one of the pioneering non-linear visualization techniques.
It involves three key steps:

First, construct a k-nearest neighbor graph G from the high-dimensional data distribution


{xi} ∈ Rd, d ≫ 1.
Second, approximate the high-dimensional data distribution as a manifold M discretized
with local linear patches, i.e. a data point xi and its neighbors {xj}j∈Ni lie on a locally linear
patch of M.
Third, project the high-dimensional data points {xi} ∈ Rd into a low-dimensional Euclidean
space {zi} ∈ Rk, k ≪ d, by preserving data proximity, i.e. if data point i is close to data point j, then zi should be similar to zj.

[1] Roweis, Saul, Nonlinear dimensionality reduction by locally linear embedding, 2000


Algorithm
Step 1 : Compute a k-nearest neighbor graph G.
For each data point xi, we identify its k nearest neighbors {xj}j∈Nk(i).
Then, we compute the adjacency matrix A of the graph :

<latexit sha1_base64="GssJpZernZlqG/DIIZuzNpcMOEg=">AAADk3icbVLbbhMxEN0mXMpyS0E88TKiIUqkdpVEKCAhRLg88ACoSKStVKcrr3c2ceK1V7bTJFrtB/E7vPE3eJNQtSmWdnU0c2bm+IyjTHBj2+0/O5Xqrdt37u7e8+8/ePjocW3vybFRM81wwJRQ+jSiBgWXOLDcCjzNNNI0EngSTT+V+ZML1IYr+dMuMxymdCR5whm1LhTu7fxqfAhzPingHfhEYGJJ7pMIR1zmVGu6LHIhCp/gImsekkRTlhOLC6vTPHbqiuYi5Itw0jrvFjkxfJRSh1rQgH8s4AkUMCFckpTaMaMi/16E0yZvtYAQv32VquwY9ZwbhHKijDcKfKL5aGwDR29ccueOilBfz6wDN+CKwbj+CBnVNEWL+gBiTJwzMWAwCpyOay3ouiZFKqG8DJUMQSVAhYDpoR2DRDc3UtoE25UydqOv+UBYrOzB6t+6lHPZNUI7R5SroJ0riKmlcIHMuuYH27pWWutfz7t1kEqnG91+WNtvB+3VgZugswH73uYchbXfJFZslqK0TFBjzjrtzA6dqZYzgc7WmcGMsikd4ZmD0nlmhvnqTRXw0kViSJR2n7Swil6tyGlqzDKNHLPcq9nOlcH/5c5mNnkzzLnMZhYlWw9KZgKsgvKBOs+080UsHaBMc6cV2NgtlLl9mtKEzvaVb4LjbtDpBb0fr/b73Y0du95z74XX9Drea6/vffGOvIHHKrVKr/K+0q8+q76tfqx+XlMrO5uap961U/32F8kDH9E=</latexit>

(
dist(xi xj )2
Aij = exp( 2 ) if j 2 Nk (i))
0 otherwise
where is the scale parameter, defined e.g. xi
<latexit sha1_base64="w9BAGsLFuSxviSOQepGIeKXJY/g=">AAAB/3icbVDLSsNAFL3xWesrKrhxEyxC3ZSkSHVZcONKKtgHNCFMptN27GQSZiZiiV34K25cKOLW33Dn3zhps9DWAwOHc+7lnjlBzKhUtv1tLC2vrK6tFzaKm1vbO7vm3n5LRonApIkjFolOgCRhlJOmooqRTiwICgNG2sHoMvPb90RIGvFbNY6JF6IBp32KkdKSbx4++Hcu5W6I1BAjll5P/FGZnvpmya7YU1iLxMlJCXI0fPPL7UU4CQlXmCEpu44dKy9FQlHMyKToJpLECI/QgHQ15Sgk0kun+SfWiVZ6Vj8S+nFlTdXfGykKpRyHgZ7Mcsp5LxP/87qJ6l94KeVxogjHs0P9hFkqsrIyrB4VBCs21gRhQXVWCw+RQFjpyoq6BGf+y4ukVa04tUrt5qxUr+Z1FOAIjqEMDpxDHa6gAU3A8AjP8ApvxpPxYrwbH7PRJSPfOYA/MD5/AN4ilfY=</latexit>

xj 2 Nk (i) M
as the mean distance of all k-th neighbors.
and dist(·, ·) is the distance between the two data vectors,
e.g. L2 norm. <latexit sha1_base64="RotrUxR+GlX9YVZ/QG8JAlilxv0=">AAAB6HicbVDLSgNBEOyNrxhfUY9eBoPgKewGiR4DHvSYgHlAsoTZSW8yZnZ2mZkVQsgXePGgiFc/yZt/4yTZgyYWNBRV3XR3BYng2rjut5Pb2Nza3snvFvb2Dw6PiscnLR2nimGTxSJWnYBqFFxi03AjsJMopFEgsB2Mb+d++wmV5rF8MJME/YgOJQ85o8ZKjbt+seSW3QXIOvEyUoIM9X7xqzeIWRqhNExQrbuemxh/SpXhTOCs0Es1JpSN6RC7lkoaofani0Nn5MIqAxLGypY0ZKH+npjSSOtJFNjOiJqRXvXm4n9eNzXhjT/lMkkNSrZcFKaCmJjMvyYDrpAZMbGEMsXtrYSNqKLM2GwKNgRv9eV10qqUvWq52rgq1SpZHHk4g3O4BA+uoQb3UIcmMEB4hld4cx6dF+fd+Vi25pxs5hT+wPn8AZq3jMY=</latexit>


Algorithm
Step 2 : Compute linear patches.
Find the weights Wij ∈ [0,1] which best linearly reconstruct xi from its neighbors :
min_{W∈R^{n×n}} (1/2) Σ_{i=1}^n ‖x_i − Σ_{j=1}^n W_ij A_ij x_j‖²₂   s.t.  Σ_{j=1}^n W_ij = 1 ∀i

Equivalently,

min_{W∈R^{n×n}} (1/2) Σ_{i=1}^n ‖x_i − Σ_{j∈N_i} W_ij x_j‖²₂   s.t.  Σ_{j∈N_i} W_ij = 1 ∀i

min_{W∈R^{n×n}} (1/2) Σ_i ‖Σ_{j∈N_i} W_ij x_i − Σ_{j∈N_i} W_ij x_j‖²₂ = (1/2) Σ_i ‖Σ_{j∈N_i} W_ij (x_i − x_j)‖²₂   s.t.  Σ_{j∈N_i} W_ij = 1 ∀i

We derive the solution of this least-square problem for one data point x_i:
with Z_ij = x_i − x_j ∈ R^{d×1} and Z_i ∈ R^{n_i×d} the matrix whose rows are the (x_i − x_j)^T, j ∈ N_i,
where n_i is the number of points in N_i.

We can re-write the problem as

min_{W_i∈R^{n_i×1}} (1/2) ‖W_i^T Z_i‖²₂   s.t.  W_i^T 1_{n_i} = 1


Algorithm
Step 2 : Compute linear patches.
Find the weights Wij ∈ [0,1] which best linearly reconstruct xi from its neighbors :

min_{W_i∈R^{n_i×1}} (1/2) ‖W_i^T Z_i‖²₂   s.t.  W_i^T 1_{n_i} = 1

Lagrange multiplier technique:

min_{W_i∈R^{n_i×1}, λ_i∈R} (1/2) ‖W_i^T Z_i‖²₂ + λ_i (W_i^T 1_{n_i} − 1)

Derivative w.r.t. W_i :  W_i^T Z_i Z_i^T + λ_i 1_{n_i}^T = 0  ⇒  W_i = −λ_i (Z_i Z_i^T)^{-1} 1_{n_i}
Derivative w.r.t. λ_i :  W_i^T 1_{n_i} − 1 = 0  ⇒  λ_i = −1 / (1_{n_i}^T (Z_i Z_i^T)^{-1} 1_{n_i})

Finally, the solution for data point x_i is  W_i = (Z_i Z_i^T)^{-1} 1_{n_i} / (1_{n_i}^T (Z_i Z_i^T)^{-1} 1_{n_i}) ∈ R^{n_i×1}

The matrix C = Z_i Z_i^T ∈ R^{n_i×n_i} is called the correlation or Gram matrix.
In practice, a small identity matrix is added for numerical stability : C = Z_i Z_i^T + ε I_{n_i}
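This closed form is a linear solve followed by a normalization. A minimal numpy sketch for a single data point; the function name and the trace-scaled regularization are illustrative choices:

```python
import numpy as np

def lle_weights(X, i, neigh_idx, eps=1e-3):
    # reconstruction weights of x_i from its neighbors X[neigh_idx]
    Z = X[i] - X[neigh_idx]                          # (n_i, d), rows are x_i - x_j
    C = Z @ Z.T                                      # local Gram matrix (n_i, n_i)
    C += eps * np.trace(C) * np.eye(len(neigh_idx))  # regularization for stability
    w = np.linalg.solve(C, np.ones(len(neigh_idx)))  # solve C w = 1
    return w / w.sum()                               # closed form C^{-1}1 / (1^T C^{-1} 1)
```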


Algorithm
Step 3 : Compute the low-dim embedding data zi with the weights Wij :
Find the coordinates zi ∈ Rk which best linearly reconstruct zi from its neighbors :
min_{Z=[z_1,...,z_n]∈R^{n×k}} Σ_{i=1}^n ‖z_i − Σ_{j=1}^n W_ij z_j‖²₂   s.t.  Z^T 1_n = 0_n,  Z^T Z = I_k

The solution is given by EVD:

min_Z ‖Z − WZ‖²_F   s.t.  Z^T Z = I_k
min_Z tr((Z − WZ)^T (Z − WZ))   s.t.  Z^T Z = I_k
min_Z tr(Z^T (I_n − W^T)(I_n − W) Z)   s.t.  Z^T Z = I_k

The solution is given by the EVD of the matrix M = (I_n − W^T)(I_n − W) = UΣU^T,
with Z = U_{·,1,...,k} ∈ R^{n×k} (the eigenvectors of the k smallest non-trivial eigenvalues).
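A minimal scipy sketch of this spectral step, assuming the reconstruction weights are stored in a sparse matrix W with rows summing to 1; the small negative shift is an illustrative trick to keep the shift-invert factorization nonsingular:

```python
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def lle_embedding(W, k):
    # W: sparse (n, n) weight matrix with rows summing to 1
    I = sp.identity(W.shape[0], format='csc')
    M = (I - W).T @ (I - W)
    # bottom eigenvectors via shift-invert near 0 (M is PSD)
    vals, vecs = eigsh(M.tocsc(), k=k + 1, sigma=-1e-6)
    return vecs[:, 1:]   # drop the constant eigenvector of eigenvalue 0
```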


Lab 3 : LLE
Run code03.ipynb :
Compute the LLE solution for the Swiss Roll dataset.
Visualize the MNIST dataset with the LLE technique.

Figures: Swiss Roll dataset; 2D and 3D LLE of the Swiss Roll dataset; 2D and 3D LLE of the MNIST dataset.

Outline
Visualization as dimensionality reduction
Linear visualization techniques
Standard PCA
Robust PCA
Graph-based PCA
Non-linear visualization techniques
LLE
Laplacian eigenmaps
TSNE
UMAP
Conclusion


Laplacian eigenmaps
Laplacian eigenmaps technique[1] was one of the first non-linear visualization
techniques grounded in mathematical theory.
It is based on the manifold assumption, i.e. the data distribution is sampled from a smooth and continuous manifold M.
Since the manifold cannot be directly observed, it is approximated using a k-nearest neighbor graph.

Figure: data points sampled from a manifold M are connected into a k-NN graph G that approximates M.

[1] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003


Spectral analysis and differential geometry


Mathematical tools have been developed to analyze smooth, continuous manifolds[1].
The eigenfunctions vk of the continuous Laplace-Beltrami operator ΔM can serve as embedding coordinates for M.
The discretization of ΔM provides the graph Laplacian L (but it is not unique, i.e. multiple
definitions exist).
Figure: the manifold M is unwrapped by the map φ given by the eigenfunction coordinates (v1, v2).

[1] Chung, Spectral graph theory, 1997 (11,000 citations)


Task formalization
Let us begin with a simple 1D dimensionality reduction.
The goal is to map a given graph G = (V,E,A) onto a line with the constraint that neighboring
data points on G remain as close as possible on the line.
To achieve this, we can design a loss function that computes the mapping y = φ(x) such that :
min_{y∈R^n} Σ_ij A_ij (y_i − y_j)²,  with y_i = φ(x_i), y_j = φ(x_j), and

A_ij = 1 if j ∈ N_i, 0 otherwise, as a graph adjacency matrix.

Figure: the graph G is mapped onto the line R¹, x_i ↦ y_i = φ(x_i).


Task formalization
<latexit sha1_base64="7DClLs2RCRZW8Ss+wjWt+UihATw=">AAACSXicbVC7TsMwFHXKu7wKjCwWFRJIUCUVAkYQCyMgCkhNCY7rUhfbiewbIIryeyxsbPwDCwMIMeG0HXgdyfLROfdeX58wFtyA6z47pZHRsfGJyany9Mzs3HxlYfHMRImmrEEjEemLkBgmuGIN4CDYRawZkaFg5+HNQeGf3zJteKROIY1ZS5JrxTucErBSULnyJVdBlvpc+ZJANwyzk/xS5T72TSKDjPdyvD+41tKAb6ZBb/2yvoGtD+wetMzwHYcuzrF1N6yLf0wql4NK1a25feC/xBuSKhriKKg8+e2IJpIpoIIY0/TcGFoZ0cCpYHnZTwyLCb0h16xpqSKSmVbWTyLHq1Zp406k7VGA++r3joxIY1IZ2spiR/PbK8T/vGYCnd1WxlWcAFN08FAnERgiXMSK21wzCiK1hFDN7a6YdokmFGz4RQje7y//JWf1mrdd2z7equ7Vh3FMomW0gtaQh3bQHjpER6iBKHpAL+gNvTuPzqvz4XwOSkvOsGcJ/UBp5At8pbKD</latexit>

X
Let us analyze the loss function : minn Aij (yi yj )2 , with yi , yj 2 R
y2R
ij
When Aij ≈ 1, i.e. xi is close to xj, then minimizing the loss encourages yi to be similar to yj.
When Aij ≈ 0, i.e. xi is far from xj, then minimizing the loss allows yi to differ significantly
from yj.
In summary, minimizing this loss ensures that data points that are close in the high-dim space
remain close in the low-dim space, satisfying the first key property of dimensionality reduction
techniques.
Observe that the non-linear mapping φ is never explicitly computed.



Task formalization
Finally, the loss function can be reformulated in terms of the Laplacian operator :
min_{y∈R^n} Σ_ij A_ij (y_i − y_j)² = Σ_ij L_ij y_i y_j = y^T L y,  with L = D − A

The unit-vector constraint, i.e. the orthogonality constraint y^T y = 1, avoids the trivial solution y = 0.
The constraint y^T 1_n = 0 avoids a constant solution (by centering y).
In summary, we have the constrained optimization problem:

min_{y∈R^n} y^T L y   s.t.  y^T y = 1,  y^T 1_n = 0

whose solution is given by the EVD of L:

L u_1 = λ_1 u_1 ∈ R^n,  where λ_1 is the smallest non-zero eigenvalue of L, with its eigenvector u_1.


Generalization to k dimensions
Let us extend this mapping process, i.e. from a graph to a k-dimensional Euclidean space :

min_{y∈R^n} y^T L y   s.t.  y^T y = 1,  y^T 1_n = 0   (1D mapping)

min_{Y=[y_1,...,y_n]∈R^{n×k}} Σ_{m=1}^k Y_{·,m}^T L Y_{·,m} = tr(Y^T L Y)   (k-D mapping)

with the generalized orthogonality constraint Y^T Y = I_k.

Spectral solution: given by the k smallest non-zero eigenvectors of the graph Laplacian L = D − A:

L = UΛU^T (EVD)  ⇒  Y = U_{·,1,..,k} ∈ R^{n×k}

Properties: global solution (independent of initialization) and O(n²k) complexity.
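A minimal scipy sketch of Laplacian eigenmaps under these definitions, assuming a sparse symmetric adjacency matrix; the function name and the small negative shift (to keep the shift-invert factorization nonsingular) are illustrative choices:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_eigenmaps(A, k):
    # A: sparse symmetric (n, n) adjacency matrix; returns Y in R^{n x k}
    deg = np.asarray(A.sum(axis=1)).ravel()
    L = sp.diags(deg) - A                             # unnormalized Laplacian L = D - A
    vals, vecs = eigsh(L.tocsc(), k=k + 1, sigma=-1e-6)  # eigenpairs closest to 0
    return vecs[:, 1:]                                # drop the constant 0-eigenvector
```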


Normalized Laplacian
Considering the importance of the nodes with the degree matrix D :

min_{Y∈R^{n×k}} tr(Y^T L Y) = tr(Y^T (D − A) Y)   s.t.  Y^T D Y = I_k

Change of variable Z = D^{1/2} Y, so we have

Y^T D Y = (D^{-1/2} Z)^T D (D^{-1/2} Z) = Z^T Z

and

min_{Z∈R^{n×k}} tr((D^{-1/2} Z)^T (D − A)(D^{-1/2} Z))   s.t.  Z^T Z = I_k
= min_Z tr(Z^T D^{-1/2} (D − A) D^{-1/2} Z)   s.t.  Z^T Z = I_k
= min_Z tr(Z^T (I − D^{-1/2} A D^{-1/2}) Z) = tr(Z^T L Z)   s.t.  Z^T Z = I_k

where L = I − D^{-1/2} A D^{-1/2} is the normalized graph Laplacian.
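A minimal sketch of this construction for a sparse symmetric adjacency; the guard against isolated nodes is an implementation choice, not part of the definition:

```python
import numpy as np
import scipy.sparse as sp

def normalized_laplacian(A):
    # L = I - D^{-1/2} A D^{-1/2} for a sparse symmetric adjacency A
    deg = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard isolated nodes
    D_inv_sqrt = sp.diags(d_inv_sqrt)
    return sp.identity(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
```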


Lab 4 : Laplacian eigenmaps


Run code04.ipynb :
Visualize MNIST with PCA.
Compare PCA with Laplacian eigenmaps.

Figures: 2D and 3D PCA of the MNIST dataset; 2D and 3D Laplacian eigenmaps (LapEigMaps) of the MNIST dataset.


Outline
Visualization as dimensionality reduction
Linear visualization techniques
Standard PCA
Robust PCA
Graph-based PCA
Non-linear visualization techniques
LLE
Laplacian eigenmaps
TSNE
UMAP
Conclusion


TSNE
T-distributed Stochastic Neighbor Embedding[1] (TSNE) has been the
most successful non-linear visualization technique.
It involves four steps :

Step 1: Compute a k-nearest neighbor graph G from the high-dimensional data points
{xi} ∈ Rd, d ≫ 1.
Step 2: Represent the distribution of the high-dim points using exponential weights :

p_ij = Prob_high-dim(i, j) = exp(−‖x_i − x_j‖²₂ / σ_i²) / Σ_{m=1}^n exp(−‖x_i − x_m‖²₂ / σ_i²)

where σ_i is the nearest-neighbour distance from data point i.
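A minimal numpy sketch of this step, following the slide's row-wise normalization; note that a full TSNE implementation instead tunes each σ_i by a binary search on the perplexity and symmetrizes P:

```python
import numpy as np

def high_dim_probabilities(X, sigma):
    # p_ij = exp(-||x_i - x_j||^2 / sigma_i^2), row-normalized; sigma has shape (n,)
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    P = np.exp(-D2 / sigma[:, None] ** 2)
    np.fill_diagonal(P, 0.0)                      # exclude self-pairs
    return P / P.sum(axis=1, keepdims=True)
```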

[1] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008
[2] Interactive demo : https://distill.pub/2016/misread-tsne


TSNE
Step 3: Parametrize the distribution of the low-dim points using polynomial weights :

q_ij(y) = Prob_low-dim(i, j) = (1 + ‖y_i − y_j‖²₂)^{−1} / Σ_{m=1}^n (1 + ‖y_i − y_m‖²₂)^{−1}

with y = φ(x) ∈ R^{n×k} the low-dim embedding coordinates of the data points.

Step 4: Minimize the Kullback-Leibler divergence (/distance) between the high-dim distribution P and the low-dim distribution Q parametrized by Y :

min_{Y∈R^{n×k}} D_KL(P, Q(Y)),   with  D_KL(P_1, P_2) = Σ_m P_1(m) log(P_1(m)/P_2(m))

[1] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008


Optimization problem
The minimization problem is a continuous non-convex problem.
Standard gradient descent (GD) can be applied.
However, since the problem is non-convex, the GD solution is not guaranteed to reach a global
minimum.
In practice, PCA is often used as the starting point for initialization.
Although GD is a slow optimization process, TSNE has significant advantages :
TSNE does not enforce the manifold assumption, meaning there is no orthogonality
constraint Y^T Y = I, which allows for greater flexibility in representing complex structures.
Minimizing the KL loss ensures that local distances in the high-dimensional data
distribution are preserved in the low-dimensional representation.


Algorithm
Optimization problem :
<latexit sha1_base64="EFjvrTVqhQfheAxVL2F2SO4NwOE=">AAADtHicdVJbb9MwFM5aLqPcOnjk5YiNKdXaqqnQQJMqDY0HEDx0iN00t8Fx3M5r4mT2CaLL/Ad55I1/g5N2Y1vhSI6/nPv5joM0Eho7nd9Lleqdu/fuLz+oPXz0+MnT+sqzfZ1kivE9lkSJOgyo5pGQfA8FRvwwVZzGQcQPgslOYT/4zpUWifyK05QPYjqWYiQYRavyV5Z+rpNYSD8/IkKSmOJJEORfzDCXBEXMNUyMIUCQ/0AV5++Nfwk/fTZuv7nrHjUa0CM6i/1cnJq0/JIoGZORoiyf/Zv8rLytswFCauvwNyUIDbYBEYtzHkIwBTzhoJHKkKoQxoqGgkuEkGtW3FuwmOGjFChoJM7LmUqXo2GOvY6B3pVXf+edcQ8b/50SFtMiVxQ5ZBJFBCyRlscxl4w3YQ17XrPbbLfba7B12dBZRkOY+sKW3vCK0jNsoHWVM1LWueTqtOeZoay5M4JaM36G2HC9DXJRBGJr6p8OkVz43WG3McxbnnGv6Rtwc5KJ7cGvr3banVJgEXhzsOrMpe/Xf5EwYVlsiWUR1frY66Q4yKlCwSJuaiTTPKVsQsf82EJJLVWDvHx0Bl5ZTQijRNljF1Nqr0fkNNZ6GgfWs2hT37YVyn/ZjjMcvR3kQqYZWrZnhUZZBJhA8YIhFIozjKYWUKbs6hmwE2pfm92XLkjwbo+8CPa7bW+zvbn7enW7O6dj2XnhvHRcx3PeONvOB6fv7Dms4lUOKt8qtLpZJVVW5TPXytI85rlzQ6ryDy5rL1Q=</latexit>

X pij
min DKL (P, Q(Y )) = pij log
Y 2Rn⇥k
ij
qij (Y )
is minimized by the standard gradient descent :
Initialization : Y t=0 = PCA(X) 2 Rn⇥k
Iterate until convergence, t = 1, 2, ... :
Xn
t+1 t t
yi = yi lr (pij qij )(1 + kyit yjt k22 ) 1
(yit yjt ) 2 Rk
j=1
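A minimal numpy sketch of one such update over all points at once; the learning rate is an illustrative default, and real implementations add momentum and early exaggeration:

```python
import numpy as np

def tsne_step(Y, P, lr=1.0):
    # one gradient-descent update of the low-dim coordinates Y (n, k)
    diff = Y[:, None, :] - Y[None, :, :]          # (n, n, k) pairwise differences
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))  # Student-t weights (n, n)
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum(axis=1, keepdims=True)          # low-dim probabilities q_ij
    grad = np.sum(((P - Q) * W)[:, :, None] * diff, axis=1)  # (n, k)
    return Y - lr * grad
```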


Lab 5 : TSNE
Run code05.ipynb :
Compare TSNE with Laplacian Eigenmaps using MNIST.

Figures: 2D and 3D Laplacian eigenmaps of the MNIST dataset; 2D and 3D TSNE of the MNIST dataset.


Outline
Visualization as dimensionality reduction
Linear visualization techniques
Standard PCA
Robust PCA
Graph-based PCA
Non-linear visualization techniques
LLE
Laplacian eigenmaps
TSNE
UMAP
Conclusion


UMAP
Uniform Manifold Approximation and Projection (UMAP)[1] enhances TSNE
in several key aspects.
Notice that the TSNE gradient can be interpreted as an attractive force between data points, similar to interactions in physics.
UMAP generalizes this concept by introducing a more flexible attractive force, controlled by two
hyperparameters.
Additionally, UMAP introduces a repulsive force by sampling a few non-neighboring data points.
Instead of using PCA for initialization in gradient descent, UMAP uses Laplacian Eigenmaps,
which offer a more effective starting point.
In my opinion, both TSNE and UMAP excel in visualization.
The choice boils down to the implementation with the fastest computational speed.

[1] McInnes et-al, UMAP: Uniform manifold approximation and projection for dimension reduction, 2018


Task formalization
The visualization task involves minimizing two physics-based losses, each parameterized by the
low-dimensional embedding coordinates of the data points.
One loss function generates attractive forces between closely connected data points on the graph,
while the other creates repulsive forces for data points that are far apart on the graph.

<latexit sha1_base64="o00dLQWwY178aP0F45YmIptg6kk=">AAACA3icbVDLSsNAFJ3UV42vqDvdBItQNyURqS4Lbly4qGAf0IYwmUzbsZMHMzdiCQE3/oobF4q49Sfc+TdO2iy09cCFwzn3cu89XsyZBMv61kpLyyura+V1fWNza3vH2N1ryygRhLZIxCPR9bCknIW0BQw47caC4sDjtOONL3O/c0+FZFF4C5OYOgEehmzACAYlucbBtdsH+gAiSDGAyKq+m7K77ETXddeoWDVrCnOR2AWpoAJN1/jq+xFJAhoC4VjKnm3F4KRYACOcZno/kTTGZIyHtKdoiAMqnXT6Q2YeK8U3B5FQFYI5VX9PpDiQchJ4qjPAMJLzXi7+5/USGFw4KQvjBGhIZosGCTchMvNATJ8JSoBPFMFEMHWrSUZYYAIqtjwEe/7lRdI+rdn1Wv3mrNKwijjK6BAdoSqy0TlqoCvURC1E0CN6Rq/oTXvSXrR37WPWWtKKmX30B9rnD/YHlwM=</latexit>

Lattr (dij )
<latexit sha1_base64="2M6JP81fG4MGPf6Mxfo4l4Vtv4Y=">AAACBHicbVBNS8NAEN3Urxq/oh57CRahXkoiUj0WvHjwUMF+QFvCZjtt1242YXciltCDF/+KFw+KePVHePPfmLQ9aOuDgcd7M8zM8yPBNTrOt5FbWV1b38hvmlvbO7t71v5BQ4exYlBnoQhVy6caBJdQR44CWpECGvgCmv7oMvOb96A0D+UtjiPoBnQgeZ8ziqnkWYVrr4PwgCpIFESxmJR6XsLvJiemaXpW0Sk7U9jLxJ2TIpmj5llfnV7I4gAkMkG1brtOhN2EKuRMwMTsxBoiykZ0AO2UShqA7ibTJyb2car07H6o0pJoT9XfEwkNtB4HftoZUBzqRS8T//PaMfYvugmXUYwg2WxRPxY2hnaWiN3jChiKcUooUzy91WZDqijDNLcsBHfx5WXSOC27lXLl5qxYdeZx5EmBHJEScck5qZIrUiN1wsgjeSav5M14Ml6Md+Nj1poz5jOH5A+Mzx/Ljpd6</latexit>

Lrepul (dij )

rdij Lattr (dij )


<latexit sha1_base64="l44HWXeQ2NMdfhQzwhUhQj6HJA0=">AAACE3icbVA9S8RAEN2c3/ErammzeAhqcSQip+WBjYWFgqfCXQiTzd7deptN2J2IR8h/sPGv2FgoYmtj578x91H49WDg8d4MM/PCVAqDrvtpVaamZ2bn5hfsxaXllVVnbf3SJJlmvMkSmejrEAyXQvEmCpT8OtUc4lDyq7B/PPSvbrk2IlEXOEi5H0NXiY5ggKUUOHttBaGEII+CXNwUBT0N2sjvUMc5IOpiZ6zv2rYdOFW35o5A/xJvQqpkgrPA+WhHCctirpBJMKbluSn6OWgUTPLCbmeGp8D60OWtkiqIufHz0U8F3S6ViHYSXZZCOlK/T+QQGzOIw7IzBuyZ395Q/M9rZdg58nOh0gy5YuNFnUxSTOgwIBoJzRnKQUmAaVHeSlkPNDAsYxyG4P1++S+53K959Vr9/KDacCdxzJNNskV2iEcOSYOckDPSJIzck0fyTF6sB+vJerXexq0VazKzQX7Aev8CyOyeAg==</latexit>

rdij Lrepul (dij )


<latexit sha1_base64="hABnkatheO1p5noNB88psc5po1o=">AAACFHicbVA9SwNBEN3zM55fUUubxSBEhHAnEi0DNhYWEcwHJOHY20yS1b29Y3dODMf9CBv/io2FIrYWdv4bLx+FJj4YeLw3w8w8P5LCoON8WwuLS8srq7k1e31jc2s7v7NbN2GsOdR4KEPd9JkBKRTUUKCEZqSBBb6Ehn93MfIb96CNCNUNDiPoBKyvRE9whpnk5Y/bivmSeUnXS8RtmtIrr43wgDpINESxTIsT48i2bS9fcErOGHSeuFNSIFNUvfxXuxvyOACFXDJjWq4TYSdhGgWXkNrt2EDE+B3rQyujigVgOsn4qZQeZkqX9kKdlUI6Vn9PJCwwZhj4WWfAcGBmvZH4n9eKsXfeSYSKYgTFJ4t6saQY0lFCtCs0cJTDjDCuRXYr5QOmGccsx1EI7uzL86R+UnLLpfL1aaHiTOPIkX1yQIrEJWekQi5JldQIJ4/kmbySN+vJerHerY9J64I1ndkjf2B9/gCk8p55</latexit>

yj k22
<latexit sha1_base64="HlyvhjY16rp35RqqYVliCGJbxTM=">AAACAnicbVDLSsNAFJ34rPEVdSVuBovgxpIUqW6EghuXFewDmhgmk2k77eTBzEQIaXHjr7hxoYhbv8Kdf+OkzUJbD1w4nHMv997jxYwKaZrf2tLyyuraemlD39za3tk19vZbIko4Jk0csYh3PCQIoyFpSioZ6cScoMBjpO2NrnO//UC4oFF4J9OYOAHqh7RHMZJKco1D383ocHJlj1OXnqXu0B671fuqruuuUTYr5hRwkVgFKYMCDdf4sv0IJwEJJWZIiK5lxtLJEJcUMzLR7USQGOER6pOuoiEKiHCy6QsTeKIUH/YiriqUcKr+nshQIEQaeKozQHIg5r1c/M/rJrJ36WQ0jBNJQjxb1EsYlBHM84A+5QRLliqCMKfqVogHiCMsVWp5CNb8y4ukVa1YtUrt9rxcN4s4SuAIHINTYIELUAc3oAGaAINH8AxewZv2pL1o79rHrHVJK2YOwB9onz9VaZYI</latexit>

yj k22
<latexit sha1_base64="HlyvhjY16rp35RqqYVliCGJbxTM=">AAACAnicbVDLSsNAFJ34rPEVdSVuBovgxpIUqW6EghuXFewDmhgmk2k77eTBzEQIaXHjr7hxoYhbv8Kdf+OkzUJbD1w4nHMv997jxYwKaZrf2tLyyuraemlD39za3tk19vZbIko4Jk0csYh3PCQIoyFpSioZ6cScoMBjpO2NrnO//UC4oFF4J9OYOAHqh7RHMZJKco1D383ocHJlj1OXnqXu0B671fuqruuuUTYr5hRwkVgFKYMCDdf4sv0IJwEJJWZIiK5lxtLJEJcUMzLR7USQGOER6pOuoiEKiHCy6QsTeKIUH/YiriqUcKr+nshQIEQaeKozQHIg5r1c/M/rJrJ36WQ0jBNJQjxb1EsYlBHM84A+5QRLliqCMKfqVogHiCMsVWp5CNb8y4ukVa1YtUrt9rxcN4s4SuAIHINTYIELUAc3oAGaAINH8AxewZv2pL1o79rHrHVJK2YOwB9onz9VaZYI</latexit>

dij = kyi dij = kyi


Attractive loss Repulsive loss
yj k22 ) b , b 2
<latexit sha1_base64="Hwv90l3X1QzqxLze85V4i2FvzxY=">AAACp3icbVFda9RAFJ3Er7p+dG0ffbm4WjbUDUmQ1pdCRQQRwYrd7cLONszMTrbTTj6cmYghzV/zR/jmv3GyjVC7vTBwOPfec++cSwsptAmCP4575+69+w82HvYePX7ydLP/bGui81IxPma5zNWUEs2lyPjYCCP5tFCcpFTyE3rxvs2f/OBKizw7NlXB5ylZZiIRjBhLxf1f2PCfRqX1B3/pQwM78Dn+RylelLIZVh4cwDAcvYtrcd54w3CX4MsqFqMqPseXcXQaeaf1iDavAQPFS/4dIsC4twM4I1SSuFqTtHqjCAhQuKYLOFGE1WFT3z6B7oZNYxs63mtn9OL+IPCDVcA6CDswQF0cxf3feJGzMuWZYZJoPQuDwsxrooxgkjc9XGpeEHZBlnxmYUZSruf1yucGXllmAUmu7MsMrNjrHTVJta5SaitTYs70zVxL3pablSZ5O69FVpSGZ+xqUFJKMDm0R4OFUJwZWVlAmBJ2V2BnxNpl7GlbE8KbX14Hk8gP9/y9r28Gh0FnxwZ6jl6gIQrRPjpEH9ERGiPmvHQ+Od+cY9dzv7gTd3pV6jpdzzb6L1zyFywVyNs=</latexit>

Aij (1 + akyi yj k22 )b , b 2 E.g. Lrepul (y) = (1 Aij )(1 + akyi


<latexit sha1_base64="ia3IcK/GY3dj0L2ZXzUn4gK9/68=">AAACjHicbVFdb9MwFHXC11ZgFHjk5YqKqRUsS6JpDKFJQwjEAw9Dotukuots1+28OR/YN4goy6/hH/HGv8HpggTdrmTp6Jx7j+1zeaGVxTD87fm3bt+5e29tvXf/wcONR/3HT45sXhohxyLXuTnhzEqtMjlGhVqeFEaylGt5zC/et/rxd2msyrOvWBVymrJFpuZKMHRU0v9JUf5Ak9YfgkUADWzC5+QvxRBNM6xGsA/vklqdN8PoJaOXVaK2quScXibxaTw65a+AAqcL+Q1ioLS3CTRjXLOkWrVyPjEw4J0b3GhX863ISR07ah2T/iAMwmXBdRB1YEC6Okz6v+gsF2UqMxSaWTuJwgKnNTOohJZNj5ZWFkxcsIWcOJixVNppvQyzgReOmcE8N+5kCEv234mapdZWKXedKcMzu6q15E3apMT53rRWWVGizMTVRfNSA+bQbgZmykiBunKACaPcW0GcMcMEuv21IUSrX74OjuIg2g12v+wMDsIujjXyjDwnQxKR1+SAfCKHZEyEt+5te3veG3/D3/Hf+vtXrb7XzTwl/5X/8Q9As7/i</latexit>

E.g. Lattr (y) =


ry Lattr = 2abAij (1 + akyi yj k22 )b 1 (yi yj ) 1
ry Lrepul = 2ab(1 Aij ) (yi yj )
(1 + akyi yj k22 )b+1

Algorithm
Minimization problem :
<latexit sha1_base64="KLhyBiZ8bAcIVZG+Vyj62neK1SI=">AAAFy3icjVTfT9swEA6j3Vj3C7bHvZxGh1LRVk00sQkJCYSmMQkQm8YvERo5rtuaOk5mO0AIftw/uLc97j+Z0xYopWhYsnL6znf+7rucg5hRqRqNP1OPpgvFx09mnpaePX/x8tXs3Os9GSUCk10csUgcBEgSRjnZVVQxchALgsKAkf2gt57790+JkDTiP1Qak+MQdThtU4yUgfy56b8LXki5nx16lHshUt0gyL7rZsY9RUMioae1B54i50qE2ab2r0yklND2WvWwAouT/ILECdO2U8uPeF5pAW7SOHVYj8I4UQQ6AsVdQK0ThAnHKRgGgp7DGpxRZXAo9/w+K4xY9kWXa9vbgxANYzndOmxRTkN6QSBI80MtSriCFpEms7o6/zNBresY+MqpoojRi74asAwaDpuZWmloWLkpCsWfaUfbB5V7JYLJ2RURyNSYcEUZ4IibRnRMlaQKZbXiVN1qvV4vw/Iot0GK1KeGxqKT0xjYGmrXiZnQeeEB7djgcRQw5KewOd4ZP6MnujqMNt+T/Ntv1t2Qm2bdE5VfNlZ+b2LNZ10iyN2CHkAzv6ySy94WCGc1FwXepYFrBvYufbeZuXZQcypaZ87ibY+rB1nsIThO9AFkJghwm48b6Mx2FtHIzc0c9N2K7Z0iEy8pi/gos6ZrnDc5K/+nNyYjlFE1KINJbnZAlUAiha4ZYlGLkUAhMf+XNP+Vmd8OtEUUguoSiCOWcoMhJvt9KPmz8416o7/gruEMjXlruHb82d9eK8JJaMYGMyTlkdOI1XGGhKKYEV3yEklihHuoQ46MyQ0TeZz13yIN7w3SgnYkzDbj10dHIzIUSpmGgTmZayDHfTk4yXeUqPan44zy/NngeHBRO2GgIsgfNmhRQbBiqTEQFmauMeCukQnnKuUiOOMl3zX23LqzVF/69mF+1R3KMWO9td5ZtuVYH61Va8PasXYtXNgo8MJZ4by4VZTFi+Ll4OijqWHMG+vWKv76B/cm8Os=</latexit>

min Lattr (A, Y ) + Lrepul (1 A, Y )


Y 2Rn⇥k
1. Compute graph adjacency matrix A with a kG -NN graph
2. Minimize by gradient descent
Initialization : Y t=0 = LapEig(X) 2 Rn⇥k
Iterate until convergence, t = 1, 2, ... :
yit+1 = yit lr ry Lattr (Aij , yit , yjt ) + ry Lrepul (1 Aij , yit , yjt ) 2 Rk
where
2(b 1)
2abkyi yj k2
ry Lattr (Aij , yi , yj ) = Aij (yi y j ) 2 Rk
1 + kyi yj k22
2b
ry Lrepul (1 Aij , yi , yj ) = (1 Aij )(yi yj ) 2 Rk
(1 + akyi yj k2b
2 )(" + kyi yj k22 )
where a, b are arbitrary hyper-parameters coming from the polynomials
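A minimal numpy sketch of one stochastic UMAP-style pass, assuming a dense 0/1 adjacency A; the uniform negative-sampling scheme and the hyperparameter defaults are illustrative simplifications of the actual UMAP implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def umap_step(Y, A, a=1.0, b=1.0, lr=1.0, n_neg=5, eps=1e-3):
    # one pass over all points: attract graph neighbors, repel sampled non-neighbors
    n = Y.shape[0]
    for i in range(n):
        for j in np.flatnonzero(A[i]):            # attractive forces along graph edges
            d2 = np.sum((Y[i] - Y[j]) ** 2)
            g = 2 * a * b * d2 ** (b - 1) / (1.0 + a * d2 ** b)
            Y[i] -= lr * g * (Y[i] - Y[j])        # pull i toward its neighbor j
        for j in rng.integers(0, n, size=n_neg):  # repulsive negative samples
            if j == i or A[i, j]:
                continue
            d2 = np.sum((Y[i] - Y[j]) ** 2)
            g = 2 * b / ((1.0 + a * d2 ** b) * (eps + d2))
            Y[i] += lr * g * (Y[i] - Y[j])        # push i away from non-neighbor j
    return Y
```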


Lab 6 : UMAP
Run code06.ipynb :
Compare UMAP visualization with LapEigenmaps and TSNE on MNIST.
Apply UMAP on CIFAR (raw) images.

Figures: CIFAR dataset samples; 2D and 3D LapEigMaps, TSNE, and UMAP of the MNIST dataset; 2D and 3D UMAP of the CIFAR dataset.

Lab 7 : Visualization with deep learning


Run code07.ipynb :
Visualize CIFAR with TSNE/UMAP and InceptionV3 features.
Create a mosaic of CIFAR images.
Figures: CIFAR dataset samples; 2D and 3D TSNE of CIFAR Inception features and the corresponding mosaic of CIFAR images; 2D and 3D UMAP of CIFAR Inception features and the corresponding mosaic.

Lab 7 : Visualization with deep learning

Figures: mosaic of CIFAR images with TSNE and Inception features; cropped mosaic of CIFAR images with TSNE and Inception features.


Visualizing ImageNet dataset

[1] Andrej Karpathy’s course cs231n on convolutional neural networks for image recognition


Visualizing video games

[1] Mnih, Human-level control through deep reinforcement learning, 2015


Outline
Visualization as dimensionality reduction
Linear visualization techniques
Standard PCA
Robust PCA
Graph-based PCA
Non-linear visualization techniques
LLE
Laplacian eigenmaps
TSNE
UMAP
Conclusion


Conclusion

Figure: summary diagram.
Linear structure: z_i = A x_i (dictionary/pattern matching), mapping high-dim data to low-dim data.
Non-linear structure: z_i = φ(x_i) (non-linear mapping/embedding).
Main property: φ preserves local distances in the high-dim and in the low-dim spaces, e.g. x_i ∈ R^d, d ≫ 1 mapped to φ(x_i) ∈ R³, with φ(x_i) close to φ(x_j) when x_i is close to x_j.

Timeline: PCA (1901), LDA (1936), ICA (1985), Sparse Coding (1996), Kernel PCA (1998), NMF (1999), LLE (2000), LapEigMaps (2003), T-SNE (2008), UMAP (2018).
PCA is the most popular linear technique; T-SNE and UMAP are the most popular non-linear techniques.


Conclusion
Non-linear visualization techniques typically involve two key steps :
Construct a k-nearest neighbors (kNN) graph from the high-dim data points, and
Determine low-dim embedding coordinates, usually in 2D or 3D, that preserve the pairwise
distances between the high-dim data points while maintaining a specific visualization
property :
In spectral-based methods like LLE and Laplacian Eigenmaps, the embedding
coordinates Y are orthogonal,
In TSNE, the low-dim distribution is optimized to match the high-dim distribution using
the Kullback-Leibler (KL) distance,
UMAP achieves its embedding Y by balancing attractive and repulsive forces based on
the high- and low-dim data representations.


Conclusion
LLE and LapEigenmaps
Offer global solutions but lack flexibility in adjusting the visualization outcome.
The spectral orthogonality constraint, needed to avoid trivial solutions, restricts
the low-dimensional embeddings.
TSNE and UMAP
Produce local solutions through non-linear loss functions,
offering greater flexibility and often more visually appealing results.
Without the orthogonality constraint, the class of possible solutions is larger,
and more diverse than those provided by spectral methods.
UMAP includes two hyperparameters that control the visualization aspect.
While this flexibility is advantageous, it also raises the question: what are the optimal
UMAP hyperparameter values for visualizing a new dataset?


Conclusion
No visualization technique, whether linear or non-linear, is universally effective for high-
dimensional data due to the curse of dimensionality.
A key prerequisite for successful visualization is that the input data must be sufficiently
expressive.
In some cases, representing a text document as, e.g., a bag of words (a distribution over the dictionary) may work.
But typically, the most effective visualizations are achieved using the hidden representations
from a deep neural network.
For example, extracting the hidden vector in R^2048 from one of the last layers of an Inception or ResNet network can provide a strong representation of images.
Similarly, using the memory state vector in R^512 from an RNN can effectively represent a time series, or a class-token embedding in R^1024 from the last layer of a Transformer.


Questions?
