Lecture 06: Graph Visualization
Xavier Bresson
https://twitter.com/xbresson
Course lectures
Introduction to Graph Machine Learning
Part 1: GML without feature learning (before 2014)
    Introduction to Graph Science
    Graph Analysis Techniques without Feature Learning
    Graph clustering
    Graph SVM
    Recommendation on graphs
    Graph-based visualization
Part 2: GML with shallow feature learning (2014-2016)
    Shallow graph feature learning
Part 3: GML with deep feature learning a.k.a. GNNs (after 2016)
    Graph Convolutional Networks (spectral and spatial)
    Weisfeiler-Lehman GNNs
    Graph Transformer & Graph ViT/MLP-Mixer
    Benchmarking GNNs
    Molecular science and generative GNNs
    GNNs for combinatorial optimization
    GNNs for recommendation
    GNNs for knowledge graphs
    Integrating GNNs and LLMs
Outline
Visualization as dimensionality reduction
Linear visualization techniques
    Standard PCA
    Robust PCA
    Graph-based PCA
Non-linear visualization techniques
    LLE
    Laplacian eigenmaps
    TSNE
    UMAP
Conclusion
Visualization
The visualization task involves projecting high-dimensional data, such as images, text documents, user/product attributes, or sequences of actions, into 2D or 3D low-dimensional Euclidean spaces to reveal underlying data structures.
This projection is achieved using dimensionality reduction techniques, which aim to compress the
original information while discarding unnecessary details and noise.
[Figure: 28 × 28 MNIST images and their visualization in R^3]
Dimensionality reduction
Two classes of dimensionality reduction techniques have been developed:
    Linear techniques: These methods produce low-dimensional Euclidean (flat) spaces. Common examples include Principal Component Analysis (PCA)[1], Linear Discriminant Analysis (LDA)[2], and Independent Component Analysis (ICA)[3].
    Non-linear techniques: These methods compute low-dimensional manifolds, i.e. curved hyper-surfaces. Standard techniques are kernel methods[4], Locally Linear Embedding (LLE)[5], Laplacian Eigenmaps[6], t-distributed Stochastic Neighbor Embedding (TSNE)[7], and Uniform Manifold Approximation and Projection (UMAP)[8].
[1] Pearson, On lines and planes of closest fit to systems of points in space, 1901
[2] Fisher, The Use of Multiple Measurements in Taxonomic Problems, 1936
[3] Hérault, Jutten, Architectures neuromimétiques adaptatives : Détection de primitives, 1985
[4] Schölkopf et al., Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998
[5] Roweis, Saul, Nonlinear dimensionality reduction by locally linear embedding, 2000
[6] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003
[7] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008
[8] McInnes et al., UMAP: Uniform manifold approximation and projection for dimension reduction, 2018
Linear
[Diagram: dimensionality reduction maps a high-dimensional point x_i to a low-dimensional point z_i through a linear projection map φ]
Linear techniques
Task formalization: Restrict the mapping φ to be a linear operator A.
Several techniques exist to compute a linear operator A:
PCA, LDA, ICA, Non-negative Matrix Factorization (NMF)[1], Sparse Coding[2], etc.

    z = φ(x) = Ax = (⟨A_{1,·}, x⟩, ..., ⟨A_{m,·}, x⟩)^T = (z_1, ..., z_m)^T

with
    x ∈ R^d, d ≫ 1, the high-dimensional data point
    z ∈ R^m, m ≪ d, the low-dimensional data point
    A ∈ R^{m×d}, a dictionary of patterns or basis functions
    A_{i,·} ∈ R^d, the i-th pattern/linear filter
[1] Lee, Seung, Learning the parts of objects by non-negative matrix factorization, 1999
[2] Olshausen, Field, Learning a sparse code for natural images, 1996
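As a quick illustration, a minimal numpy sketch of the linear map z = φ(x) = Ax, with a random A standing in for a learned dictionary (PCA, LDA, ICA, NMF or sparse coding would learn a meaningful one):

    import numpy as np

    d, m = 784, 3                          # high and low dimensions, m << d
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, d))        # placeholder dictionary of m linear filters A_{i,.}
    x = rng.standard_normal(d)             # one high-dimensional data point
    z = A @ x                              # z_i = <A_{i,.}, x>, so z lives in R^m
    print(z.shape)                         # (3,)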
[Figure: 28 × 28 MNIST images and their visualization in R^3 with PCA]
Non-Linear
[Diagram: dimensionality reduction maps a high-dimensional point x_i to a low-dimensional point z_i through a non-linear projection map φ]
Dimensionality reduction
An example where non-linear reduction effectively reveals clear patterns.
Several non-linear techniques are available, each suited to different data distributions.

    φ(x) = z, with x ∈ R^784, z ∈ R^3, φ = Laplacian eigenmaps[1]

[Figure: 28 × 28 MNIST images (x ∈ R^{28×28} = R^784) visualized in 3D (z ∈ R^3)]
[1] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003
<latexit sha1_base64="akX31KZ/iKC9kn1Di4CDaBifSMo=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKVI8FLx4r2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dgobm1vbO8Xd0t7+weFR+fikreNUMWyxWMSqG1CNgktsGW4EdhOFNAoEdoLJ7dzvPKHSPJaPZpqgH9GR5CFn1FjpAQe1QbniVt0FyDrxclKBHM1B+as/jFkaoTRMUK17npsYP6PKcCZwVuqnGhPKJnSEPUsljVD72eLUGbmwypCEsbIlDVmovycyGmk9jQLbGVEz1qveXPzP66UmvPEzLpPUoGTLRWEqiInJ/G8y5AqZEVNLKFPc3krYmCrKjE2nZEPwVl9eJ+1a1atX6/dXlUYtj6MIZ3AOl+DBNTTgDprQAgYjeIZXeHOE8+K8Ox/L1oKTz5zCHzifP+71jYk=</latexit>
e2
<latexit sha1_base64="wymQk+R1PM1aT7hru88wL9SHsnU=">AAAB63icbVBNS8NAEJ3Urxq/qh69LBbBU0mKVI8FLx4r2A9oQ9lsN+3S3U3Y3RRK6F/w4kERr/4hb/4bN20O2vpg4PHeDDPzwoQzbTzv2yltbe/s7pX33YPDo+OTyulZR8epIrRNYh6rXog15UzStmGG016iKBYhp91wep/73RlVmsXyycwTGgg8lixiBJtcmrmuO6xUvZq3BNokfkGqUKA1rHwNRjFJBZWGcKx13/cSE2RYGUY4XbiDVNMEkyke076lEguqg2x56wJdWWWEoljZkgYt1d8TGRZaz0VoOwU2E73u5eJ/Xj810V2QMZmkhkqyWhSlHJkY5Y+jEVOUGD63BBPF7K2ITLDCxNh48hD89Zc3Sade8xu1xuNNtVkv4ijDBVzCNfhwC014gBa0gcAEnuEV3hzhvDjvzseqteQUM+fwB87nD3+sjTE=</latexit>
v
xi xi xi <latexit sha1_base64="wymQk+R1PM1aT7hru88wL9SHsnU=">AAAB63icbVBNS8NAEJ3Urxq/qh69LBbBU0mKVI8FLx4r2A9oQ9lsN+3S3U3Y3RRK6F/w4kERr/4hb/4bN20O2vpg4PHeDDPzwoQzbTzv2yltbe/s7pX33YPDo+OTyulZR8epIrRNYh6rXog15UzStmGG016iKBYhp91wep/73RlVmsXyycwTGgg8lixiBJtcmrmuO6xUvZq3BNokfkGqUKA1rHwNRjFJBZWGcKx13/cSE2RYGUY4XbiDVNMEkyke076lEguqg2x56wJdWWWEoljZkgYt1d8TGRZaz0VoOwU2E73u5eJ/Xj810V2QMZmkhkqyWhSlHJkY5Y+jEVOUGD63BBPF7K2ITLDCxNh48hD89Zc3Sade8xu1xuNNtVkv4ijDBVzCNfhwC014gBa0gcAEnuEV3hzhvDjvzseqteQUM+fwB87nD3+sjTE=</latexit>
+
<latexit sha1_base64="ZYxI7HoimVNbHL0XoWVVcFwioqw=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKVI8FLx4r2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dgobm1vbO8Xd0t7+weFR+fikreNUMWyxWMSqG1CNgktsGW4EdhOFNAoEdoLJ7dzvPKHSPJaPZpqgH9GR5CFn1FjpAQfeoFxxq+4CZJ14OalAjuag/NUfxiyNUBomqNY9z02Mn1FlOBM4K/VTjQllEzrCnqWSRqj9bHHqjFxYZUjCWNmShizU3xMZjbSeRoHtjKgZ61VvLv7n9VIT3vgZl0lqULLlojAVxMRk/jcZcoXMiKkllClubyVsTBVlxqZTsiF4qy+vk3at6tWr9furSqOWx1GEMziHS/DgGhpwB01oAYMRPMMrvDnCeXHenY9la8HJZ07hD5zPH+1xjYg=</latexit>
e1 +
Projection of data points into Principal component v and
Original data
the direction of the largest approximation of the
distribution in R2
variation v of the distribution original distribution w.r.t.
the largest variance
[1] Pearson, On lines and planes of closest fit to systems of points in space, 1901
Task formulation
Given a set of data points, PCA projects the data onto an orthogonal basis that best captures its variance.
Assuming the data distribution is centered at the origin, PCA defines an orthogonal transformation, i.e. a rotation matrix, that maps the data to a new coordinate system (v1, v2, ..., vK), known as the principal directions, such that:
    The first basis function or principal direction v1 captures the largest possible variance in the data.
    The second basis function or principal direction v2 captures the second largest possible variance while being orthogonal to the first principal direction, ⟨v1, v2⟩ = 0.
    Each subsequent direction vk captures the k-th largest possible data variance, maintaining orthogonality to all previous directions.
[Figure: rotation of the coordinate system (e1, e2) around the origin into the principal directions (v1, v2)]
Covariance matrix
Data variance across the feature dimensions is captured by the covariance matrix:

    C = X^T X ∈ R^{d×d}

    C11 = ||X_{·,1}||_2^2 : variance in the direction e1
    C12 = X_{·,1}^T X_{·,2} = Σ_{i=1}^n X_{i1} X_{i2} : cross-variance in the directions e1-e2

Reminder: the data is centered in each feature dimension j, i.e.

    E(X_{·,j}) = Σ_{i=1}^n X_{ij} = 0, ∀j ∈ {1, ..., d}

The vector e1 is the direction of the largest variance, with value C11.
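A minimal numpy sketch of the two steps above, centering and C = X^T X (sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 2))     # data matrix X in R^{n x d}
    X = X - X.mean(axis=0)                 # center each feature dimension: E(X_{.,j}) = 0
    C = X.T @ X                            # covariance matrix in R^{d x d}
    print(C[0, 0], C[0, 1])                # C11 = ||X_{.,1}||^2 (variance), C12 (cross-variance)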
[Figure: data distribution with variances C11, C22 and cross-variance C12; x_i^T v is the projection of the data point x_i on the direction v]
Eigenvalue decomposition
Next, we perform the eigenvalue decomposition (EVD) of the positive semi-definite (PSD) covariance matrix C:

    C v_j = λ_j v_j, with v_j ∈ R^d, j = 1, ..., d

[Figure: the eigenvector associated with the largest eigenvalue, v_largest = v1, points in the direction of the largest variance]
The first PD captures the largest data variance:

    C v1 = λ1 v1  ⇒  v1^T C v1 = λ1 v1^T v1 = λ1 ||v1||_2^2 = λ1 = max_{||v||_2=1} Σ_{i=1}^n (x_i^T v)^2

Similarly, the direction of the second largest data variance, or the second PD, is defined as:

    v2 ∈ R^d = argmax_{||v||_2=1} Σ_{i=1}^n (x_i^T v)^2, s.t. v^T v1 = 0 (v is orthogonal to v1)

and the solution is given by the second eigenvalue and its eigenvector:

    v2^T C v2 = λ2 v2^T v2 = λ2 ||v2||_2^2 = λ2, with λ2 ≥ λ_j ∀j ≥ 3 and λ2 ≤ λ1

In other words, we have

    argmax_{||v||_2=1, v^T v1=0} v^T C v = v2^T C v2 = λ2 (second largest variance)
The third PD is defined as:

    v3 ∈ R^d = argmax_{||v||_2=1} Σ_{i=1}^n (x_i^T v)^2, s.t. v^T v1 = 0 and v^T v2 = 0
    (v is orthogonal to v1 and v2)

The solution is given by the third eigenvalue and its eigenvector: C v3 = λ3 v3.

Altogether, we consider the full matrix factorization of C with EVD:

    C = V Λ V^T ∈ R^{d×d}

with V = [v1, ..., vd] ∈ R^{d×d}, V^T V = I_d ∈ R^{d×d} (identity matrix), and Λ = diag(λ1, ..., λd) ∈ R^{d×d}
Dimensionality reduction
The PCA coordinates of a data point x_i are its projections onto the principal directions, e.g. PC1 = x_i^T v1, PC2 = x_i^T v2:

    x_i^pca = V^T x_i ∈ R^d
            ≈ V_k^T x_i ∈ R^k (keeping only the first k coordinates)
            ≈ v1^T x_i ∈ R (in this example)

Suppose that the data is primarily concentrated along the first principal directions; the remaining directions, which mostly capture noise or insignificant details, can be discarded.
The first k principal directions (PDs) can be selected as:

    k such that ||X − X_k^pca||_F^2 is small,

where X_k^pca = X V_k V_k^T ∈ R^{n×d} is the reconstruction of X from the first k PDs,
and V_k = [v1, ..., vk] ∈ R^{d×k}, V_k^T V_k = I_k ∈ R^{k×k}, is the truncated V with the first k PDs.

Or simply: select k such that ( Σ_{j=1}^k λ_j ) / ( Σ_{j=1}^d λ_j ) ≥ 0.9, i.e. the first k PDs retain 90% of the total data variance.

[Figure: eigenvalue spectrum λ_j of the YaleB Faces dataset, separating structure (leading eigenvalues) from noise (tail)]
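A sketch of this truncation rule in numpy, assuming the 90%-of-variance criterion (np.linalg.eigh returns the spectrum in ascending order, so it is flipped to get λ1 ≥ λ2 ≥ ...):

    import numpy as np

    def pca_truncate(X, var_ratio=0.9):
        X = X - X.mean(axis=0)                      # center the data
        lam, V = np.linalg.eigh(X.T @ X)            # EVD of the PSD covariance matrix
        lam, V = lam[::-1], V[:, ::-1]              # sort as lambda_1 >= ... >= lambda_d
        k = np.searchsorted(np.cumsum(lam) / lam.sum(), var_ratio) + 1
        return X @ V[:, :k]                         # reduced coordinates X V_k in R^{n x k}

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 30)) @ rng.standard_normal((30, 30))
    print(pca_truncate(X).shape)                    # (500, k), k covering 90% of variance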
Singular value decomposition
Alternatively, the principal directions can be obtained from the singular value decomposition (SVD) of the data matrix:

    X = U Σ W^T ∈ R^{n×d}

Since C = X^T X = W Σ^T Σ W^T, the right singular vectors W are the principal directions, with eigenvalues λ_j = σ_j^2.
EVD or SVD
The choice depends on the size (n × d) of the data matrix X:
    For d < n: use EVD of C = X^T X, complexity O(d^3)
    For n < d: apply SVD of X, complexity O(min(nd^2, n^2 d))
Examples:
    MNIST dataset: 60,000 × 784 ⇒ apply EVD
    Microarray-based gene expression dataset[1]: 240 × 7,399 ⇒ apply SVD
[1] Rosenwald et al., The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma, 2002
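A sketch of this rule of thumb (both branches return the same principal directions, up to sign):

    import numpy as np

    def principal_directions(X, k):
        n, d = X.shape
        X = X - X.mean(axis=0)
        if d < n:                                          # EVD route, O(d^3)
            lam, V = np.linalg.eigh(X.T @ X)
            return V[:, ::-1][:, :k]                       # eigenvectors of C, largest first
        U, s, Wt = np.linalg.svd(X, full_matrices=False)   # SVD route, O(min(n d^2, n^2 d))
        return Wt[:k].T                                    # right singular vectors = PDs

    X = np.random.default_rng(0).standard_normal((240, 7399))   # gene-expression-like shape
    print(principal_directions(X, 3).shape)                      # (7399, 3)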
Robust PCA
Standard PCA is sensitive to outliers; even a single outlier can significantly change the PCA solution.
Robust PCA[1] is a technique designed to separate outliers from the data, allowing PCA to be performed on the clean part of the data.
[Photos: Emmanuel Candès, Yi Ma]
[Figure: a single outlier tilts the principal directions of a noisy data distribution]
[1] Candes, Li, Ma, Wright, Robust principal component analysis, 2011 (8,000 citations)
Task formalization
Standard PCA computes a low-rank approximation of the data:

    min_L ||X − L||_F^2 s.t. rank(L) = k    (1)

Robust PCA separates a sparse matrix S of outliers from a low-rank matrix L, leading to the convex problem:

    min_{L,S} ||L||_* + λ||S||_1 s.t. X = L + S    (2)
Optimization algorithm
Alternating direction method of multipliers (ADMM) technique[1,2]:
    Provides a fast, robust, and accurate solution to the relaxed problem (2).
    The core idea is to decompose the problem into simpler sub-problems using Lagrangian multipliers.

    min_{L,S ∈ R^{n×d}} ||L||_* + λ||S||_1 s.t. X = L + S

which is equivalent to the augmented Lagrangian problem

    min_{L,S,Z ∈ R^{n×d}} ||L||_* + λ||S||_1 + ⟨Z, X − (L + S)⟩ + (r/2) ||X − (L + S)||_F^2, r > 0
[1] Glowinski, Le Tallec, Augmented Lagrangian and operator-splitting methods in nonlinear mechanics, 1989
[2] Boyd et al., Distributed optimization and statistical learning via the alternating direction method of multipliers, 2011 (23,000 citations)
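A compact sketch of these ADMM iterations, assuming the standard updates from the Robust PCA literature: singular-value thresholding for L, entrywise soft-thresholding for S, and a dual ascent step for Z (the default λ and r below are illustrative choices):

    import numpy as np

    def robust_pca(X, lam=None, r=1.0, n_iter=100):
        lam = lam if lam is not None else 1.0 / np.sqrt(max(X.shape))
        L, S, Z = (np.zeros_like(X) for _ in range(3))
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(X - S + Z / r, full_matrices=False)
            L = (U * np.maximum(s - 1.0 / r, 0.0)) @ Vt              # singular value thresholding
            T = X - L + Z / r
            S = np.sign(T) * np.maximum(np.abs(T) - lam / r, 0.0)    # soft-thresholding
            Z = Z + r * (X - L - S)                                  # dual update for X = L + S
        return L, S

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))  # low-rank part
    X[rng.random(X.shape) < 0.05] += 10.0                            # sparse outliers
    L, S = robust_pca(X)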
Graph-based PCA
Graph-based PCA[1] augments Robust PCA with a regularization term that enforces smoothness of the low-rank part on a graph G built from the data:

    min_{L,S} ||L||_* + λ_S ||S||_1 + λ_G ||L||_{Dir_G} s.t. X = L + S

where ||·||_{Dir_G} is the Dirichlet smoothness norm on the graph G.
[1] Shahid, Kalofolias, Bresson, Bronstein, Vandergheynst, Robust principal component analysis on graphs, 2015
Optimization algorithm
ADMM technique:

    min_{L,S ∈ R^{n×d}} ||L||_* + λ_S ||S||_1 + λ_G ||L||_{Dir_G} s.t. X = L + S

which is equivalent to

    min_{L,S,M ∈ R^{n×d}} ||L||_* + λ_S ||S||_1 + λ_G ||M||_{Dir_G} s.t. X = L + S ∈ R^{n×d}, M = L ∈ R^{n×d}
[Figure: decomposition X = L + S of the data matrix into a low-rank part L and a sparse part S]
Non-linear techniques
[Diagram: pipeline of graph-based non-linear techniques — from the data points V = {x1, ..., xn} ⊂ R^d, d ≫ 1, build a k-NN graph G = {V, E, A} with A ∈ R^{n×n}; then compute a low-dimensional embedding φ : x_i ↦ φ(x_i) ∈ R^k, k ≪ d]
[1] Roweis, Saul, Nonlinear dimensionality reduction by locally linear embedding, 2000 (18,000 citations)
[2] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003 (10,000 citations)
[3] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008 (46,000 citations)
[4] McInnes et al., UMAP: Uniform manifold approximation and projection for dimension reduction, 2018 (13,000 citations)
LLE
Locally Linear Embedding (LLE)[1] was one of the pioneering non-linear visualization techniques.
[Photos: Sam Roweis (1973-2010), Lawrence Saul]
It involves three key steps:
[1] Roweis, Saul, Nonlinear dimensionality reduction by locally linear embedding, 2000
Algorithm
Step 1: Compute a k-nearest neighbor graph G.
For each data point x_i, we identify its k nearest neighbors {x_j}_{j∈N_k(i)}.
Then, we compute the adjacency matrix A of the graph:

    A_ij = exp(−dist(x_i, x_j)^2 / σ^2) if j ∈ N_k(i), and A_ij = 0 otherwise,

where σ is the scale parameter, e.g. defined as the mean distance of all k-th neighbors, and dist(·,·) is the distance between the two data vectors, e.g. the L2 norm.
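A numpy sketch of Step 1, with σ set to the mean k-th-neighbor distance as suggested above (a dense implementation, for clarity):

    import numpy as np
    from scipy.spatial.distance import cdist

    def knn_gaussian_adjacency(X, k=10):
        D = cdist(X, X)                                  # pairwise L2 distances
        idx = np.argsort(D, axis=1)[:, 1:k+1]            # k nearest neighbors, skipping self
        sigma = D[np.arange(len(X)), idx[:, -1]].mean()  # scale = mean k-th-neighbor distance
        A = np.zeros_like(D)
        rows = np.repeat(np.arange(len(X)), k)
        A[rows, idx.ravel()] = np.exp(-D[rows, idx.ravel()]**2 / sigma**2)
        return np.maximum(A, A.T)                        # symmetrize

    A = knn_gaussian_adjacency(np.random.default_rng(0).standard_normal((100, 3)))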
Algorithm
Step 2: Compute linear patches.
Find the weights W_ij ∈ [0,1] which best linearly reconstruct x_i from its neighbors:

    min_{W ∈ R^{n×n}} (1/2) Σ_{i=1}^n ||x_i − Σ_{j=1}^n W_ij A_ij x_j||_2^2 s.t. Σ_{j=1}^n W_ij = 1 ∀i

Equivalently,

    min_{W ∈ R^{n×n}} (1/2) Σ_{i=1}^n ||x_i − Σ_{j∈N_i} W_ij x_j||_2^2 s.t. Σ_{j∈N_i} W_ij = 1 ∀i

    min_{W ∈ R^{n×n}} (1/2) Σ_i ||Σ_{j∈N_i} W_ij x_i − Σ_{j∈N_i} W_ij x_j||_2^2 = (1/2) Σ_i ||Σ_{j∈N_i} W_ij (x_i − x_j)||_2^2 s.t. Σ_{j∈N_i} W_ij = 1 ∀i

We derive the solution of this least-square problem for one data point x_i, with Z_ij = x_i − x_j ∈ R^{d×1} and Z_i ∈ R^{n_i×d} the matrix stacking the vectors (x_i − x_j)^T for j ∈ N_i, where n_i is the number of points in N_i.
We can re-write the problem as:

    min_{W_i ∈ R^{n_i×1}} (1/2) ||W_i^T Z_i||_2^2 s.t. W_i^T 1_{n_i} = 1
Algorithm
Step 2: Compute linear patches.
Find the weights W_ij ∈ [0,1] which best linearly reconstruct x_i from its neighbors:

    min_{W_i ∈ R^{n_i×1}} (1/2) ||W_i^T Z_i||_2^2 s.t. W_i^T 1_{n_i} = 1

Lagrange multiplier technique:

    min_{W_i ∈ R^{n_i×1}, λ_i ∈ R} (1/2) ||W_i^T Z_i||_2^2 + λ_i (W_i^T 1_{n_i} − 1)

Derivative w.r.t. W_i: W_i^T Z_i Z_i^T + λ_i 1_{n_i}^T = 0 ⇒ W_i = −λ_i (Z_i Z_i^T)^{-1} 1_{n_i}

Derivative w.r.t. λ_i: W_i^T 1_{n_i} − 1 = 0 ⇒ λ_i = −1 / ( 1_{n_i}^T (Z_i Z_i^T)^{-1} 1_{n_i} )

Finally, the solution for data point x_i is

    W_i = (Z_i Z_i^T)^{-1} 1_{n_i} / ( 1_{n_i}^T (Z_i Z_i^T)^{-1} 1_{n_i} ) ∈ R^{n_i×1}

The matrix C = Z_i Z_i^T ∈ R^{n_i×n_i} is called the correlation or Gram matrix.
In practice, a small identity matrix is added for numerical stability: C = Z_i Z_i^T + εI_{n_i}
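A sketch of this closed-form solution for a single point x_i (solving the linear system C w = 1 instead of inverting C explicitly):

    import numpy as np

    def lle_weights(x_i, neighbors, eps=1e-6):
        Z = x_i[None, :] - neighbors               # rows are (x_i - x_j), j in N_i
        C = Z @ Z.T + eps * np.eye(len(Z))         # Gram matrix C = Z_i Z_i^T + eps*I
        w = np.linalg.solve(C, np.ones(len(C)))    # proportional to C^{-1} 1
        return w / w.sum()                         # enforce sum_j W_ij = 1

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    w = lle_weights(x, x + 0.1 * rng.standard_normal((8, 5)))
    print(w.sum())                                 # 1.0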
Algorithm
Step 3: Compute the low-dim embedding z_i with the weights W_ij.
Find the coordinates z_i ∈ R^k which are best linearly reconstructed from their neighbors with the fixed weights W_ij:

    min_{Z=[z1,...,zn]^T ∈ R^{n×k}} Σ_{i=1}^n ||z_i − Σ_{j=1}^n W_ij z_j||_2^2 s.t. Z^T 1_n = 0_k, Z^T Z = I_k

The solution is given by EVD:

    min_Z ||Z − WZ||_F^2 s.t. Z^T Z = I_k
    min_Z tr((Z − WZ)^T (Z − WZ)) s.t. Z^T Z = I_k
    min_Z tr(Z^T (I_n − W^T)(I_n − W) Z) s.t. Z^T Z = I_k

The solution is given by the EVD of the matrix M = (I_n − W^T)(I_n − W) = U Σ U^T, keeping the eigenvectors with the smallest non-zero eigenvalues: Z = U_{·,1,...,k} ∈ R^{n×k}
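A sketch of Step 3: assemble the n × n weight matrix W from Step 2, then take the bottom eigenvectors of M (the very first one, a constant vector with eigenvalue ≈ 0, is discarded):

    import numpy as np

    def lle_embedding(W, k=2):
        n = len(W)
        M = (np.eye(n) - W.T) @ (np.eye(n) - W)
        lam, U = np.linalg.eigh(M)             # ascending eigenvalues
        return U[:, 1:k+1]                     # skip the constant eigenvector

    rng = np.random.default_rng(0)
    W = rng.random((30, 30))
    W /= W.sum(axis=1, keepdims=True)          # toy W with rows summing to 1
    print(lle_embedding(W, k=2).shape)         # (30, 2)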
Lab 3: LLE
Run code03.ipynb:
    Compute the LLE solution for the Swiss Roll dataset.
    Visualize the MNIST dataset with the LLE technique.
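For reference, a one-line version of what the lab computes on the Swiss Roll, here with scikit-learn's implementation rather than the notebook's own code:

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    X, _ = make_swiss_roll(n_samples=1500, random_state=0)
    Z = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)
    print(Z.shape)                             # (1500, 2)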
Laplacian eigenmaps
The Laplacian eigenmaps technique[1] was one of the first non-linear visualization techniques grounded in mathematical theory.
[Photos: Misha Belkin, Partha Niyogi (1967-2010)]
It is based on the manifold assumption, i.e. the data distribution is sampled from a smooth and continuous manifold M.
Since the manifold cannot be directly observed, it is approximated using a k-nearest neighbor graph.
[Figure: data points sampled from the manifold M, and the k-NN graph G approximating M]
[1] Belkin, Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, 2003
[Figure: unwrapping the curved manifold M into a flat low-dimensional space]
Task formalization
Let us begin with a simple 1D dimensionality reduction.
The goal is to map a given graph G = (V, E, A) onto a line with the constraint that neighboring data points on G remain as close as possible on the line.
To achieve this, we can design a loss function that computes the mapping y = φ(x) such that:

    min_{y ∈ R^n} Σ_ij A_ij (y_i − y_j)^2, with y_i = φ(x_i)

[Figure: the graph G is mapped by φ onto the line R^1]
Task formalization
Let us analyze the loss function: min_{y ∈ R^n} Σ_ij A_ij (y_i − y_j)^2, with y_i, y_j ∈ R.
    When A_ij ≈ 1, i.e. x_i is close to x_j, then minimizing the loss encourages y_i to be similar to y_j.
    When A_ij ≈ 0, i.e. x_i is far from x_j, then minimizing the loss allows y_i to differ significantly from y_j.
In summary, minimizing this loss ensures that data points that are close in the high-dim space remain close in the low-dim space, satisfying the first key property of dimensionality reduction techniques.
Observe that the non-linear mapping φ is never explicitly computed.
[Figure: the graph G is mapped by φ onto the line R^1, with y_i = φ(x_i)]
Task formalization
Finally, the loss function can be reformulated in terms of the Laplacian operator:

    Σ_ij A_ij (y_i − y_j)^2 = 2 Σ_ij L_ij y_i y_j = 2 y^T L y, with L = D − A

Since the constant factor 2 does not change the minimizer, we minimize y^T L y.
The unit-vector constraint, i.e. the orthogonality constraint y^T y = 1, avoids the trivial solution y = 0.
The constraint y^T 1_n = 0 avoids a constant solution (by centering y).
In summary, we have the constrained optimization problem:

    min_{y ∈ R^n} y^T L y s.t. y^T y = 1, y^T 1_n = 0
Generalization to k dimensions
Let us extend this mapping process, i.e. from a graph to a k-dimensional Euclidean space :
$$\min_{y \in \mathbb{R}^n} y^T L y \quad \text{s.t.} \quad y^T y = 1, \; y^T 1_n = 0 \qquad \text{(1D mapping)}$$

$$\min_{Y \in \mathbb{R}^{n \times k}} \sum_{m=1}^{k} Y_{\cdot,m}^T L \, Y_{\cdot,m} = \min_{Y \in \mathbb{R}^{n \times k}} \operatorname{tr}(Y^T L Y) \qquad \text{(k-D mapping)}$$

with the generalized orthogonality constraint $Y^T Y = I_k$.
Spectral solution: the solution is given by the eigenvectors associated with the k smallest non-zero eigenvalues of the graph Laplacian $L = D - A$:

$$L = U \Lambda U^T \;\overset{\text{EVD}}{\Rightarrow}\; Y = U_{\cdot,1,..,k} \in \mathbb{R}^{n \times k}$$

Properties: global solution (independent of initialization) and $O(n^2 k)$ complexity.
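As a concrete illustration, here is a minimal numpy/scipy sketch of this spectral solution (assumptions: a small dense adjacency matrix and a full eigendecomposition, with the trivial constant eigenvector skipped; a large graph would instead use a sparse eigensolver such as scipy.sparse.linalg.eigsh):

import numpy as np
from scipy.sparse import csgraph

def laplacian_eigenmaps(A, k=2):
    L = csgraph.laplacian(A)        # unnormalized Laplacian L = D - A
    lam, U = np.linalg.eigh(L)      # EVD: L = U Lambda U^T, eigenvalues ascending
    # Skip the trivial constant eigenvector (eigenvalue 0); the next k
    # eigenvectors give the embedding Y in R^{n x k}
    return U[:, 1:k + 1]

# Toy usage: a 4-node graph embedded in k=2 dimensions
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = laplacian_eigenmaps(A, k=2)     # each row = low-dim coordinates of a node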
Normalized Laplacian
Considering the importance of the nodes, weighted by the degree matrix D:

$$\min_{Z \in \mathbb{R}^{n \times k}} \operatorname{tr}\big(Z^T D^{-1/2} (D - A) D^{-1/2} Z\big) \quad \text{s.t.} \quad Z^T Z = I_k$$

$$= \min_{Z \in \mathbb{R}^{n \times k}} \operatorname{tr}\big(Z^T (I - D^{-1/2} A D^{-1/2}) Z\big) = \min_{Z \in \mathbb{R}^{n \times k}} \operatorname{tr}(Z^T L Z) \quad \text{s.t.} \quad Z^T Z = I_k$$

where $L = I - D^{-1/2} A D^{-1/2}$ is the normalized graph Laplacian.
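The normalized variant only changes how the Laplacian is built; a possible sketch, reusing the toy adjacency matrix from the previous snippet and scipy's normed option:

import numpy as np
from scipy.sparse import csgraph

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L_norm = csgraph.laplacian(A, normed=True)  # I - D^{-1/2} A D^{-1/2}
lam, U = np.linalg.eigh(L_norm)
Z = U[:, 1:3]   # k=2 embedding, again skipping the trivial eigenvector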
TSNE
T-distributed Stochastic Neighbor Embedding (TSNE)[1] has been the most successful non-linear visualization technique.
It involves four steps:
Step 1: Compute a k-nearest neighbor graph G from the high-dimensional data points
{xi} ∈ Rd, d ≫ 1.
Step 2: Represent the distribution of the high-dim points using exponential weights:

$$p_{ij} = \text{Prob}_{\text{high-dim}}(i,j) = \frac{e^{-\|x_i - x_j\|_2^2 / 2\sigma_i^2}}{\sum_{m=1}^{n} e^{-\|x_i - x_m\|_2^2 / 2\sigma_i^2}}$$
[1] Van der Maaten, Hinton, Visualizing data using t-SNE, 2008
[2] Interactive demo : https://distill.pub/2016/misread-tsne
Step 3: Parametrize the distribution of the low-dim points using polynomial weights:

$$q_{ij}(Y) = \text{Prob}_{\text{low-dim}}(i,j) = \frac{(1 + \|y_i - y_j\|_2^2)^{-1}}{\sum_{m=1}^{n} (1 + \|y_i - y_m\|_2^2)^{-1}}$$

with $y = \varphi(x) \in \mathbb{R}^{n \times k}$ the low-dim embedding coordinates of the data points.
Optimization problem
The minimization problem is a continuous non-convex problem.
Standard gradient descent (GD) can be applied.
However, since the problem is non-convex, the GD solution is not guaranteed to reach a global
minimum.
In practice, PCA is often used as the starting point for initialization.
Although GD is a slow optimization process, TSNE has significant advantages :
TSNE does not enforce the manifold assumption, meaning there is no orthogonality constraint $Y^T Y = I$, which allows for greater flexibility in representing complex structures.
Minimizing the KL loss ensures that local distances in the high-dimensional data
distribution are preserved in the low-dimensional representation.
Algorithm
Optimization problem:

$$\min_{Y \in \mathbb{R}^{n \times k}} D_{KL}(P, Q(Y)) = \sum_{ij} p_{ij} \log \frac{p_{ij}}{q_{ij}(Y)}$$

is minimized by the standard gradient descent:
Initialization: $Y^{t=0} = \text{PCA}(X) \in \mathbb{R}^{n \times k}$
Iterate until convergence, $t = 1, 2, ...$:

$$y_i^{t+1} = y_i^t - \text{lr} \sum_{j=1}^{n} (p_{ij} - q_{ij}) \, (1 + \|y_i^t - y_j^t\|_2^2)^{-1} \, (y_i^t - y_j^t) \; \in \mathbb{R}^k$$
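A minimal numpy sketch of this loop (assumptions: a single fixed bandwidth sigma instead of the per-point perplexity search, a globally normalized q as in the original paper, and plain gradient descent without the usual momentum and early-exaggeration tricks):

import numpy as np

def tsne(X, k=2, sigma=1.0, lr=10.0, n_iters=500):
    n = X.shape[0]
    # Step 2: high-dim similarities p_ij (Gaussian kernel, row-normalized);
    # fixed sigma is an assumption, real TSNE tunes sigma_i per point
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)
    # Initialization: project X onto its top-k principal directions (PCA)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:k].T
    for _ in range(n_iters):
        # Step 3: low-dim similarities q_ij (Student-t kernel, normalized)
        W = 1.0 / (1.0 + ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()
        # Gradient step from the slide, in matrix form:
        # sum_j (p_ij - q_ij) w_ij (y_i - y_j) = [(diag(G 1) - G) Y]_i
        G = (P - Q) * W
        Y = Y - lr * (np.diag(G.sum(axis=1)) - G) @ Y
    return Y

# Y = tsne(X)  # X: an n x d data matrix, Y: n x 2 embedding coordinates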
Lab 5 : TSNE
Run code05.ipynb :
Compare TSNE with Laplacian Eigenmaps using MNIST.
UMAP
Uniform Manifold Approximation and Projection (UMAP)[1] enhances TSNE
in several key aspects.
Notice that the TSNE gradient can be interpreted as an attractive force between data points, similar to interactions in physics.
UMAP generalizes this concept by introducing a more flexible attractive force, controlled by two
hyperparameters.
Additionally, UMAP introduces a repulsive force by sampling a few non-neighboring data points.
Instead of using PCA for initialization in gradient descent, UMAP uses Laplacian Eigenmaps,
which offer a more effective starting point.
In my opinion, both TSNE and UMAP excel at visualization.
The choice boils down to the implementation with the fastest computational speed.
[1] McInnes et-al, UMAP: Uniform manifold approximation and projection for dimension reduction, 2018
Task formalization
The visualization task involves minimizing two physics-based losses, each parameterized by the
low-dimensional embedding coordinates of the data points.
One loss function generates attractive forces between closely connected data points on the graph,
while the other creates repulsive forces for data points that are far apart on the graph.
<latexit sha1_base64="o00dLQWwY178aP0F45YmIptg6kk=">AAACA3icbVDLSsNAFJ3UV42vqDvdBItQNyURqS4Lbly4qGAf0IYwmUzbsZMHMzdiCQE3/oobF4q49Sfc+TdO2iy09cCFwzn3cu89XsyZBMv61kpLyyura+V1fWNza3vH2N1ryygRhLZIxCPR9bCknIW0BQw47caC4sDjtOONL3O/c0+FZFF4C5OYOgEehmzACAYlucbBtdsH+gAiSDGAyKq+m7K77ETXddeoWDVrCnOR2AWpoAJN1/jq+xFJAhoC4VjKnm3F4KRYACOcZno/kTTGZIyHtKdoiAMqnXT6Q2YeK8U3B5FQFYI5VX9PpDiQchJ4qjPAMJLzXi7+5/USGFw4KQvjBGhIZosGCTchMvNATJ8JSoBPFMFEMHWrSUZYYAIqtjwEe/7lRdI+rdn1Wv3mrNKwijjK6BAdoSqy0TlqoCvURC1E0CN6Rq/oTXvSXrR37WPWWtKKmX30B9rnD/YHlwM=</latexit>
Lattr (dij )
<latexit sha1_base64="2M6JP81fG4MGPf6Mxfo4l4Vtv4Y=">AAACBHicbVBNS8NAEN3Urxq/oh57CRahXkoiUj0WvHjwUMF+QFvCZjtt1242YXciltCDF/+KFw+KePVHePPfmLQ9aOuDgcd7M8zM8yPBNTrOt5FbWV1b38hvmlvbO7t71v5BQ4exYlBnoQhVy6caBJdQR44CWpECGvgCmv7oMvOb96A0D+UtjiPoBnQgeZ8ziqnkWYVrr4PwgCpIFESxmJR6XsLvJiemaXpW0Sk7U9jLxJ2TIpmj5llfnV7I4gAkMkG1brtOhN2EKuRMwMTsxBoiykZ0AO2UShqA7ibTJyb2car07H6o0pJoT9XfEwkNtB4HftoZUBzqRS8T//PaMfYvugmXUYwg2WxRPxY2hnaWiN3jChiKcUooUzy91WZDqijDNLcsBHfx5WXSOC27lXLl5qxYdeZx5EmBHJEScck5qZIrUiN1wsgjeSav5M14Ml6Md+Nj1poz5jOH5A+Mzx/Ljpd6</latexit>
Lrepul (dij )
yj k22
<latexit sha1_base64="HlyvhjY16rp35RqqYVliCGJbxTM=">AAACAnicbVDLSsNAFJ34rPEVdSVuBovgxpIUqW6EghuXFewDmhgmk2k77eTBzEQIaXHjr7hxoYhbv8Kdf+OkzUJbD1w4nHMv997jxYwKaZrf2tLyyuraemlD39za3tk19vZbIko4Jk0csYh3PCQIoyFpSioZ6cScoMBjpO2NrnO//UC4oFF4J9OYOAHqh7RHMZJKco1D383ocHJlj1OXnqXu0B671fuqruuuUTYr5hRwkVgFKYMCDdf4sv0IJwEJJWZIiK5lxtLJEJcUMzLR7USQGOER6pOuoiEKiHCy6QsTeKIUH/YiriqUcKr+nshQIEQaeKozQHIg5r1c/M/rJrJ36WQ0jBNJQjxb1EsYlBHM84A+5QRLliqCMKfqVogHiCMsVWp5CNb8y4ukVa1YtUrt9rxcN4s4SuAIHINTYIELUAc3oAGaAINH8AxewZv2pL1o79rHrHVJK2YOwB9onz9VaZYI</latexit>
yj k22
<latexit sha1_base64="HlyvhjY16rp35RqqYVliCGJbxTM=">AAACAnicbVDLSsNAFJ34rPEVdSVuBovgxpIUqW6EghuXFewDmhgmk2k77eTBzEQIaXHjr7hxoYhbv8Kdf+OkzUJbD1w4nHMv997jxYwKaZrf2tLyyuraemlD39za3tk19vZbIko4Jk0csYh3PCQIoyFpSioZ6cScoMBjpO2NrnO//UC4oFF4J9OYOAHqh7RHMZJKco1D383ocHJlj1OXnqXu0B671fuqruuuUTYr5hRwkVgFKYMCDdf4sv0IJwEJJWZIiK5lxtLJEJcUMzLR7USQGOER6pOuoiEKiHCy6QsTeKIUH/YiriqUcKr+nshQIEQaeKozQHIg5r1c/M/rJrJ36WQ0jBNJQjxb1EsYlBHM84A+5QRLliqCMKfqVogHiCMsVWp5CNb8y4ukVa1YtUrt9rxcN4s4SuAIHINTYIELUAc3oAGaAINH8AxewZv2pL1o79rHrHVJK2YOwB9onz9VaZYI</latexit>
Algorithm
Minimization problem :
<latexit sha1_base64="KLhyBiZ8bAcIVZG+Vyj62neK1SI=">AAAFy3icjVTfT9swEA6j3Vj3C7bHvZxGh1LRVk00sQkJCYSmMQkQm8YvERo5rtuaOk5mO0AIftw/uLc97j+Z0xYopWhYsnL6znf+7rucg5hRqRqNP1OPpgvFx09mnpaePX/x8tXs3Os9GSUCk10csUgcBEgSRjnZVVQxchALgsKAkf2gt57790+JkDTiP1Qak+MQdThtU4yUgfy56b8LXki5nx16lHshUt0gyL7rZsY9RUMioae1B54i50qE2ab2r0yklND2WvWwAouT/ILECdO2U8uPeF5pAW7SOHVYj8I4UQQ6AsVdQK0ThAnHKRgGgp7DGpxRZXAo9/w+K4xY9kWXa9vbgxANYzndOmxRTkN6QSBI80MtSriCFpEms7o6/zNBresY+MqpoojRi74asAwaDpuZWmloWLkpCsWfaUfbB5V7JYLJ2RURyNSYcEUZ4IibRnRMlaQKZbXiVN1qvV4vw/Iot0GK1KeGxqKT0xjYGmrXiZnQeeEB7djgcRQw5KewOd4ZP6MnujqMNt+T/Ntv1t2Qm2bdE5VfNlZ+b2LNZ10iyN2CHkAzv6ySy94WCGc1FwXepYFrBvYufbeZuXZQcypaZ87ibY+rB1nsIThO9AFkJghwm48b6Mx2FtHIzc0c9N2K7Z0iEy8pi/gos6ZrnDc5K/+nNyYjlFE1KINJbnZAlUAiha4ZYlGLkUAhMf+XNP+Vmd8OtEUUguoSiCOWcoMhJvt9KPmz8416o7/gruEMjXlruHb82d9eK8JJaMYGMyTlkdOI1XGGhKKYEV3yEklihHuoQ46MyQ0TeZz13yIN7w3SgnYkzDbj10dHIzIUSpmGgTmZayDHfTk4yXeUqPan44zy/NngeHBRO2GgIsgfNmhRQbBiqTEQFmauMeCukQnnKuUiOOMl3zX23LqzVF/69mF+1R3KMWO9td5ZtuVYH61Va8PasXYtXNgo8MJZ4by4VZTFi+Ll4OijqWHMG+vWKv76B/cm8Os=</latexit>
Lab 6 : UMAP
Run code06.ipynb :
Compare UMAP visualization with LapEigenmaps and TSNE on MNIST.
Apply UMAP on CIFAR (raw) images.
[Figures: 2D and 3D TSNE of CIFAR inception features; mosaic of CIFAR images laid out with TSNE on inception features; 2D and 3D UMAP of CIFAR inception features; mosaic of CIFAR images laid out with UMAP on inception features]
[Figures from Andrej Karpathy's course cs231n on convolutional neural networks for image recognition]
Conclusion
[Summary diagram]
Linear structure: low-dim data $z_i = A x_i$ from high-dim data $x_i \in \mathbb{R}^d$, $d \gg 1$.
Non-linear structure: low-dim data $z_i = \varphi(x_i)$ from high-dim data, e.g. $\mathbb{R}^d \rightarrow \mathbb{R}^3$.
Main property: $\varphi$ preserves local distances in high-dim and in low-dim spaces: $x_i$ close to $x_j$ $\Rightarrow$ $\varphi(x_i)$ close to $\varphi(x_j)$.
Timeline: PCA (1901), LDA (1936), ICA (1985), Sparse Coding (1996), Kernel PCA (1998), NMF (1999), LLE (2000), LapEigMaps (2003), T-SNE (2008), UMAP (2018).
Non-linear visualization techniques typically involve two key steps :
Construct a k-nearest neighbors (kNN) graph from the high-dim data points, and
Determine low-dim embedding coordinates, usually in 2D or 3D, that preserve the pairwise
distances between the high-dim data points while maintaining a specific visualization
property :
In spectral-based methods like LLE and Laplacian Eigenmaps, the embedding
coordinates Y are orthogonal,
In TSNE, the low-dim distribution is optimized to match the high-dim distribution using
the Kullback-Leibler (KL) distance,
UMAP achieves its embedding Y by balancing attractive and repulsive forces based on
the high- and low-dim data representations.
LLE and LapEigenmaps
Offer global solutions but lack flexibility in adjusting the visualization outcome.
The spectral orthogonality constraint, needed to avoid trivial solutions, restricts
the low-dimensional embeddings.
TSNE and UMAP
Produce local solutions through non-linear loss functions,
offering greater flexibility and often more visually appealing results.
Without the orthogonality constraint, the class of possible solutions is larger and more diverse than those provided by spectral methods.
UMAP includes two hyperparameters that control the appearance of the visualization.
While this flexibility is advantageous, it also raises a question: what are the optimal UMAP hyperparameter values for visualizing a new dataset?
No visualization technique, whether linear or non-linear, is universally effective for high-
dimensional data due to the curse of dimensionality.
A key prerequisite for successful visualization is that the input data must be sufficiently
expressive.
In some cases, representing a text document as, e.g., a bag of words (a distribution over the dictionary) may work.
But typically, the most effective visualizations are achieved using the hidden representations from a deep neural network.
For example, extracting the hidden vector in R^2048 from one of the last layers of an Inception or ResNet network can provide a strong representation of images.
Similarly, the memory state vector in R^512 of an RNN can effectively represent a time series, and the class-token embedding in R^1024 from the last layer of a Transformer can represent a sequence.
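For instance, a possible feature-extraction pipeline of this kind (assuming a recent torch/torchvision and scikit-learn; my_images below is a hypothetical list of PIL images):

import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.manifold import TSNE

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()   # drop the classifier: output is now R^2048
resnet.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

@torch.no_grad()
def extract_features(images):        # images: list of PIL images
    batch = torch.stack([preprocess(im) for im in images])
    return resnet(batch).numpy()     # n x 2048 feature matrix

# features = extract_features(my_images)            # my_images: hypothetical
# Y = TSNE(n_components=2).fit_transform(features)  # n x 2 coordinates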
Questions?