Graph Convolutional Neural Networks
Graphs are irregular data structures, typically used to represent relationships that need not be hierarchical: no parent-child relationship has to exist, and a graph may be a DAG, a tree, or contain loops and self-loops.
A graph G = (V, E) consists of V, the set of vertices, and E, the set of edges (pairs (u, v) with u, v ∈ V). Several problems can easily be embedded into graphs and then solved using existing, efficient graph algorithms (traversal, shortest path, etc.). It has been observed that, for learning-related problems embedded in graphs, graph CNNs can be a very powerful tool: even with few layers and randomly initialized weights, GCN layers can extract meaningful node feature representations.

There are 2 types of GCNs: spectral and spatial.
[Figure: any graph G = (V, E) is fed into a GCN, which outputs node embeddings. Example graph: V = {A, B, C, D, E} (N = 5), E = {AC, AE, BC, BD, CD, CE} (|E| = 6). The learned node embeddings must preserve the relative nearness of nodes in the graph.]
Adjacency matrix A (N × N):

        A  B  C  D  E
    A   0  0  1  0  1
    B   0  0  1  1  0
    C   1  1  0  1  1
    D   0  1  1  0  0
    E   1  0  1  0  0

Each node A, B, C, D, E has a feature vector associated with it. Let us assume that the size of the input feature is F0, i.e., each node has an F0-dimensional feature vector. Stacking the rows f_A, f_B, ..., f_E (with entries f_A1, ..., f_AF0 and so on) gives the feature matrix X of size N × F0.
Hence, from any given graph G = (V, E), we will have 2 matrices extracted as input: the adjacency matrix A (N × N) and the feature matrix X (N × F0).
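As a concrete sketch of these two inputs (assuming NumPy; the feature values here are arbitrary placeholders):

```python
import numpy as np

# Adjacency matrix of the example graph:
# V = {A, B, C, D, E}, E = {AC, AE, BC, BD, CD, CE}
A = np.array([
    [0, 0, 1, 0, 1],  # A
    [0, 0, 1, 1, 0],  # B
    [1, 1, 0, 1, 1],  # C
    [0, 1, 1, 0, 0],  # D
    [1, 0, 1, 0, 0],  # E
], dtype=float)

N, F0 = A.shape[0], 3          # N = 5 nodes, F0-dimensional features
X = np.random.randn(N, F0)     # feature matrix X (N x F0); values arbitrary here

assert np.allclose(A, A.T)     # undirected graph => A is symmetric
```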
A GCN stacks hidden layers computed by a propagation function f:

    H_i = f(H_{i-1}, A),    H_0 = X

The i-th hidden layer takes a feature of size F_{i-1} corresponding to each node and transforms it into F_i dimensions. Hence the feature matrix after i hidden layers will be of size N × F_i.
[Figure: the GCN pipeline. The input (A, X = H_0, of size N × F0) is mapped through hidden layers H_1 (N × F1), H_2 (N × F2), ..., each parameterized by a weight matrix W_i of size F_{i-1} × F_i.]
At each layer, the node embeddings (feature vectors) get aggregated over their neighbourhoods to form the next layer's features, using a propagation rule parameterized by learned weights:

    H_i = f_{W_i}(H_{i-1}, A)
The simplest such rule is

    H_i = f_{W_i}(H_{i-1}, A) = σ(A H_{i-1} W_i)

where A is N × N, H_{i-1} is N × F_{i-1}, and W_i is F_{i-1} × F_i.
For simplicity, let us assume a linear function (σ = identity), as in a fully connected network without the activation, and take W_0 = I:

    H_1 = A H_0 W_0 = A X W_0 = A X
Consider a simple graph with random but easy-to-follow initial node embeddings, to understand what is happening. Node i carries the feature vector (i, i, i):

        A (adjacency)    X (features)
    0   0 1 0 0          0 0 0
    1   0 0 1 1          1 1 1
    2   0 1 0 0          2 2 2
    3   1 0 1 0          3 3 3
Each row of H_1 = A X is then the addition of the features of that node's neighbours:

    0   1 1 1    (neighbour: 1)
    1   5 5 5    (neighbours: 2, 3)
    2   1 1 1    (neighbour: 1)
    3   2 2 2    (neighbours: 0, 2)
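A minimal NumPy check of this neighbour-sum aggregation (redefining A and X for this 4-node example):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 0, 1, 0]], dtype=float)
X = np.array([[i, i, i] for i in range(4)], dtype=float)

H1 = A @ X        # row i = sum of the features of node i's neighbours
print(H1)
# [[1. 1. 1.]
#  [5. 5. 5.]
#  [1. 1. 1.]
#  [2. 2. 2.]]
```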
This is neighbourhood feature aggregation, just as it happens in a convolutional neural network. But since there are no self-loops, each node's own features are ignored (in matrix A the diagonal entries are 0). We just need to add self-loops:

    Â = A + I,    I = the N × N identity matrix
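Continuing the sketch above, adding the identity lets each node contribute its own feature to its aggregate:

```python
A_hat = A + np.eye(4)   # Â = A + I: the diagonal is now 1

print(A_hat @ X)        # row i now includes node i's own feature
# [[1. 1. 1.]
#  [6. 6. 6.]
#  [3. 3. 3.]
#  [5. 5. 5.]]
```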
For any node i we consider the i-th row of the adjacency matrix A: the aggregated feature of node i is the sum of the features of all nodes j that belong to the neighbourhood of node i.
Degree of node i: d_i = Σ_j A_ij. If the degree is very high, too many features get added up; this may lead to exploding absolute values of the features. If the degree is low, fewer features get added and the absolute values may vanish. As optimizers are quite sensitive to input scales, this may result in back-propagation issues. Hence the finally aggregated node features need to be suitably normalized.
Solution: the node features are normalized by the degree of the node, i.e., aggregated feature per unit of degree. With the degree matrix D = diag(d_1, d_2, ..., d_N), the normalizer is its inverse:

    D^{-1} = diag(1/d_1, 1/d_2, ..., 1/d_N)
Hence the final update rule can be

    f(X, A) = D^{-1} A X      (a)
    f(X, A) = D̂^{-1} Â X      (b),    Â = A + I, D̂ = degree matrix of Â

with (b) solving both issues: the ignored self-features (via Â = A + I) and the scale (via D̂^{-1}).
Therefore, for our example, D = diag(1, 2, 1, 2) and D̂ = diag(2, 3, 2, 3), so

    (a) D^{-1} A X  = [[1, 1, 1], [2.5, 2.5, 2.5], [1, 1, 1], [1, 1, 1]]
    (b) D̂^{-1} Â X = [[0.5, 0.5, 0.5], [2, 2, 2], [1.5, 1.5, 1.5], [5/3, 5/3, 5/3]]
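These numbers can be verified with NumPy, continuing with A, A_hat, and X from the sketches above:

```python
D = np.diag(A.sum(axis=1))           # degree matrix of A: diag(1, 2, 1, 2)
D_hat = np.diag(A_hat.sum(axis=1))   # degree matrix of Â: diag(2, 3, 2, 3)

print(np.linalg.inv(D) @ A @ X)          # rule (a)
print(np.linalg.inv(D_hat) @ A_hat @ X)  # rule (b)
```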
This aggregation (b) utilizes the node's own features as well as the aggregated features of its neighbours.
Now the transformation, parameterized by the weight matrices, is applied to the aggregated features:

    (a) f_{W1}(X, A) = (D^{-1} A X) W_1
    (b) f_{W2}(X, A) = (D̂^{-1} Â X) W_2
After aggregation and transformation, non-linearity is introduced using the ReLU activation:

    H_i = σ(D̂^{-1} Â H_{i-1} W_i)

The final layer's output gives the learned node representations.
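Putting aggregation, transformation, and non-linearity together, one full layer can be sketched as follows (a minimal NumPy version; ReLU plays the role of σ and the weights are random, untrained):

```python
def gcn_layer(A, H, W):
    """One GCN layer: normalized aggregation (with self-loops), transform, ReLU."""
    A_hat = A + np.eye(A.shape[0])                # Â = A + I
    D_hat_inv = np.diag(1.0 / A_hat.sum(axis=1))  # D̂^{-1}
    return np.maximum(D_hat_inv @ A_hat @ H @ W, 0.0)

W1 = np.random.randn(X.shape[1], 4)   # F0 -> F1 = 4 (sizes chosen arbitrarily)
H1 = gcn_layer(A, X, W1)              # N x F1 node embeddings after one layer
```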
For the aggregation, Â (and hence D̂) and X are required; for the transformation, W and σ are required. Hence one can see that the function f_W learned at any layer can be represented as

    f_W(H, A) = Transform(Aggregate(A, H), W)

Rule 1 (SUM):
    Aggregate(A, H) = Â H
    Transform(M, W) = σ(M W)

Rule 2 (MEAN):
    Aggregate(A, H) = D̂^{-1} Â H
    Transform(M, W) = σ(M W)
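In code, the two rules differ only in the Aggregate step (a sketch; these helper names are mine, not a standard API):

```python
def aggregate_sum(A_hat, H):                     # Rule 1: SUM
    return A_hat @ H

def aggregate_mean(A_hat, H):                    # Rule 2: MEAN
    return np.diag(1.0 / A_hat.sum(axis=1)) @ A_hat @ H

def transform(M, W):                             # shared: σ(M W), with ReLU as σ
    return np.maximum(M @ W, 0.0)

def f_W(H, A_hat, W, aggregate=aggregate_mean):  # Transform(Aggregate(A, H), W)
    return transform(aggregate(A_hat, H), W)
```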
Basically, both aggregation rules use a weighted sum of the neighbouring features; just the weighting is different.
SUM Rule (neighbourhood aggregation). Let us work on node i, hence using the i-th row of the adjacency matrix:

    agg(A, X)_i = A_i X = Σ_{j=1}^{N} A_ij X_j

The feature of each node j contributes with weight A_ij ∈ {0, 1}: only membership in the neighbourhood of node i matters.
MEAN Rule. Meanwhile,

    agg(A, X)_i = (D^{-1} A X)_i = Σ_{j=1}^{N} (A_ij / D_ii) X_j

X_j is fixed, just the same as before; now the contribution of the j-th node's feature is additionally weighted (divided) by the degree of the i-th node. Since

    Σ_{j=1}^{N} A_ij / D_ii = D_ii / D_ii = 1,

the weights are guaranteed to sum to 1 for any node i (normalized weighting). This can be seen as the mean of the feature representations of the neighbours of node i.
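A one-line check that the MEAN-rule weights indeed sum to 1 per node (reusing A_hat from above):

```python
weights = np.diag(1.0 / A_hat.sum(axis=1)) @ A_hat
print(weights.sum(axis=1))   # every row sums to exactly 1.0
```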
Spectral Rule.

    agg(A, X)_i = (D^{-1/2} A D^{-1/2} X)_i = Σ_{j=1}^{N} (A_ij / (D_ii^{1/2} D_jj^{1/2})) X_j

where D_ii and D_jj are the degrees of the i-th and j-th node. The contribution of X_j to the aggregation at the i-th node is thus weighted by the degrees of both nodes:
if both are highly connected → less contribution;
if both are sparsely connected → high contribution.
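A minimal sketch of the spectral rule's symmetric normalization (reusing A_hat, X, and W1 from the earlier sketches; strictly, this weighting is meant for undirected graphs, unlike the directed toy example):

```python
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
A_spec = D_inv_sqrt @ A_hat @ D_inv_sqrt    # entry (i, j) = Â_ij / sqrt(d_i * d_j)

H1_spec = np.maximum(A_spec @ X @ W1, 0.0)  # one layer under the spectral rule
```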