GRL Unit 3
Graph Neural Networks (GNNs) are advanced deep learning models designed to process graph-structured data by dynamically generating node embeddings through a message passing framework. They address challenges posed by irregular graph structures, utilizing aggregation and update functions to refine node representations over multiple iterations. Various techniques, including attention mechanisms and relational graph processing, enhance GNN performance and expressiveness in tasks such as node classification and molecular analysis.


Introduction
• Graph Neural Networks (GNNs) are a class of deep learning models designed to work with graph-structured data.
• Unlike traditional neural networks (CNNs for images, RNNs for sequences), GNNs can effectively model relationships between nodes in a graph.
Shallow Embeddings vs. Deep Encoders
• Shallow embeddings (e.g., node2vec, DeepWalk) assign a fixed vector to each node without considering graph structure beyond proximity.
• GNNs generate node embeddings dynamically, considering both the graph structure and node features.
Message Passing Framework
• Nodes aggregate information from their neighbors across multiple layers.
• At each layer, a node updates its embedding using its neighbors' embeddings and its own features.
• After multiple layers, each node carries information from a wider neighborhood.
Challenges in Graph Learning
• Traditional deep learning methods (CNNs, RNNs) do not apply directly because graphs have an irregular structure.
• GNNs solve this by defining custom aggregation functions to process graph data.

5.1 Neural Message Passing in GNNs
• Extends traditional deep learning to graphs.
• Inspired by convolutions, belief propagation, and graph isomorphism tests.
• Uses neural message passing, in which nodes exchange vector messages and update their embeddings.
• Given a graph G = (V, E) with node features X, GNNs generate an embedding z_u for each node u.
• Can be applied to nodes, subgraphs, and entire graphs for various tasks.

5.1.1 Overview of the Message Passing Framework
Node Embedding Update:
• Each node u updates its embedding h_u^(k) using information from its neighbors N(u).
Message Aggregation:
• The model aggregates messages from a node's neighbors (e.g., from B, C, and D for node A).
• These messages are in turn based on information from the neighbors' own neighbors.
Mathematical Formulation:
• An aggregation function (AGGREGATE) collects information from the neighbors.
• An update function (UPDATE) combines the node's previous embedding with the aggregated message:
  h_u^(k+1) = UPDATE( h_u^(k), AGGREGATE({ h_v^(k) : v ∈ N(u) }) )
• Initial embeddings are set to the node features: h_u^(0) = x_u.
Final Node Embeddings:
• After K iterations, the final node embedding is z_u = h_u^(K).
Permutation Equivariance:
• Since AGGREGATE takes a set as input, GNNs are order-independent in how they process neighbors.

AGGREGATE collects information from neighbors.


UPDATE integrates this information into the node’s new embedding.
This process repeats for multiple iterations K, refining the embeddings over time.
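To make the loop concrete, here is a minimal sketch of one round of generic message passing with mean aggregation and a linear-plus-ReLU update; the function names and the NumPy implementation are illustrative assumptions, not part of the framework definition.

import numpy as np

def aggregate(neighbor_embeddings):
    # AGGREGATE: permutation-invariant reduction (here, the mean) over the neighbors' embeddings.
    return np.mean(neighbor_embeddings, axis=0)

def update(h_u, m_u, W_self, W_neigh, b):
    # UPDATE: combine the node's previous embedding with the aggregated message, then apply ReLU.
    return np.maximum(0.0, W_self @ h_u + W_neigh @ m_u + b)

def message_passing_round(H, neighbors, W_self, W_neigh, b):
    # H: dict node -> current embedding h_u^(k); neighbors: dict node -> list of neighbor ids.
    return {u: update(h_u, aggregate([H[v] for v in neighbors[u]]), W_self, W_neigh, b)
            for u, h_u in H.items()}

Running K such rounds and reading off the final embeddings gives z_u = h_u^(K) as described above.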

The Basic GNN
• The basic GNN defines the UPDATE and AGGREGATE functions in a simple, structured way.
• Inspired by early models (Merkwirth & Lengauer, 2005; Scarselli et al., 2009).
Information Aggregation:
• Each node gathers information from its neighbors at every iteration.
• After k iterations, a node embeds information from its k-hop neighborhood.
Types of Information Encoded:
Structural Information:
• Captures graph properties such as node degrees and structural motifs.
• Useful for applications like molecular analysis (e.g., detecting benzene rings).
Feature-Based Information:
• Embeds feature details from the node's neighborhood.
• Works similarly to CNNs, but instead of spatially defined patches, GNNs aggregate data from graph neighborhoods.
GNN as an MLP/RNN Analogy:
• The basic GNN update is similar to a multi-layer perceptron (MLP) or an Elman RNN (Elman, 1990).
• Steps:
  • Aggregate messages from the neighbors.
  • Combine them with the node's previous embedding using a linear function.
  • Apply a non-linearity.
• This formulation forms the foundation for more advanced GNN architectures.

Message Passing with Self-Loops
Simplified Approach:
• Adds self-loops to the graph, so the node itself is included in the aggregation:
  h_u^(k) = AGGREGATE({ h_v^(k-1) : v ∈ N(u) ∪ {u} })
• No separate UPDATE function is needed.
Advantages:
• Reduces complexity and helps prevent overfitting.
Disadvantages:
• Limits expressivity, since the node's own features and its neighbors' features are mixed together.
Matrix Form (for the basic GNN with self-loops):
  H^(t) = σ( (A + I) H^(t-1) W^(t) )
• A is the adjacency matrix.
• I is the identity matrix (for self-loops).
• W is a trainable weight matrix.
Self-loop GNN:
• This method is referred to as the self-loop GNN approach.

Example
Problem: Predict a Node's Category in a Social Network
• Imagine a small social network where we want to predict whether a person is interested in sports based on their friends' interests.
Graph Representation:
• Nodes (V) → People
• Edges (E) → Friendships
• Node Features (X) → Whether a person likes sports (1: Yes, 0: No)

• A, C, and E like sports (1).
• B and D do not like sports (0).
• We want to predict B's interest based on its neighbors.
GNN Message Passing (1 Iteration):
• Aggregate neighbor information:
  • B's neighbors: {A(1), D(0), C(1)}
  • Aggregate (mean): (1 + 0 + 1) / 3 ≈ 0.66
• Apply the update rule:
  • Update B's feature using a simple function: h_B = ReLU(W × 0.66 + b)
  • (W and b are trainable parameters.)
• After Multiple Iterations:
  • Each node refines its feature representation based on its neighbors.
  • The model learns patterns and predicts whether B likes sports.
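As a sanity check, the one-iteration update for B can be computed directly; the values W = 1.5 and b = -0.5 below are arbitrary illustrative numbers, not learned parameters.

import numpy as np

neighbors_of_B = np.array([1.0, 0.0, 1.0])   # features of A, D, C
m_B = neighbors_of_B.mean()                  # aggregate (mean) -> 0.666...
W, b = 1.5, -0.5                             # arbitrary values for illustration only
h_B = max(0.0, W * m_B + b)                  # ReLU(W * 0.66 + b) -> 0.5
print(h_B)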
5.2 Generalized Neighborhood Aggregation
• The basic (self-loop) GNN described above can already achieve strong performance.
• However, the basic GNN can be improved upon and generalized in many ways.
• One direction is to generalize and improve the AGGREGATE operator.

5.2.1 Neighborhood Normalization
Problem with Basic Neighborhood Aggregation
• The most fundamental neighborhood aggregation in Graph Neural Networks (GNNs) simply sums the embeddings of a node's neighbors:
  m_N(u) = Σ_{v ∈ N(u)} h_v
• A key issue with this approach is its sensitivity to node degree: if node u has significantly more neighbors than node u′, the magnitude of its embedding sum will be much larger.
• This causes:
  • Numerical instability due to large variations in aggregation values.
  • Optimization challenges, as models struggle to learn meaningful representations when node degrees vary drastically.
Degree-Based Normalization
• To stabilize aggregation and mitigate degree-related distortions, one straightforward approach is degree normalization, which takes an average instead of a sum:
  m_N(u) = (1 / |N(u)|) Σ_{v ∈ N(u)} h_v
• This ensures that all nodes, regardless of their degree, contribute similarly to their own representations.

Symmetric Normalization (Kipf & Welling, 2016)
• A more refined symmetric normalization method considers the degrees of both the central node and its neighbors:
  m_N(u) = Σ_{v ∈ N(u)} h_v / √(|N(u)| |N(v)|)
• This method is particularly useful in citation graphs, where widely cited papers (high-degree nodes) are cited across diverse subfields, making them less informative for community detection.
Connection to Spectral Graph Theory
• Combining symmetric normalization with the basic GNN update function approximates a first-order spectral graph convolution.
• Spectral methods define convolution operations in the graph Fourier domain, and symmetric normalization provides an efficient approximation.
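A small sketch comparing the three aggregation variants above for a single node; the toy graph and the NumPy setup are illustrative.

import numpy as np

# Toy graph: embeddings H (one row per node) and an adjacency list.
H = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]])
N = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

u = 0
sum_agg  = sum(H[v] for v in N[u])                                   # basic sum aggregation
mean_agg = sum_agg / len(N[u])                                       # degree-based normalization
sym_agg  = sum(H[v] / np.sqrt(len(N[u]) * len(N[v])) for v in N[u])  # symmetric normalization
print(sum_agg, mean_agg, sym_agg)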

Graph Convolutional Networks (GCNs)
• Graph Convolutional Networks (GCNs), introduced by Kipf & Welling (2016), build upon these normalization techniques. The GCN message-passing function is:
  h_u^(k) = σ( W^(k) Σ_{v ∈ N(u) ∪ {u}} h_v^(k-1) / √(|N(u)| |N(v)|) )

5.2.2 Set Aggregators
• There are several ways to further improve the AGGREGATE operator beyond simple summation in Graph Neural Networks (GNNs).
• Neighborhood aggregation is a set function and must be permutation invariant.
• Simple summation of neighbor embeddings is common but can be improved.
• Attention-based aggregation (e.g., GAT) assigns learnable weights to neighbors.
• Alternative reduction functions: mean, max-pooling, LSTM-based, or a combination.
• MLP-based aggregation (Deep Sets) applies MLP transformations before summation.
• Higher-order aggregation captures multi-hop neighbors (e.g., MixHop).
• Spectral aggregation (e.g., GCN) applies graph convolution in frequency space.
• Subgraph-based methods use motifs or pooling (e.g., DiffPool) for richer representations.

Set Pooling (Permutation-Invariant Aggregation)
• Set pooling ensures permutation invariance in aggregation functions.
• Zaheer et al. (2017) proposed a universal set function approximator:
  • First, an MLP transforms each neighbor embedding.
  • Then, a sum operation aggregates the transformed embeddings.
  • A final MLP maps the sum to the output embedding.
• Alternative reduction functions: the sum can be replaced with max or min pooling (Qi et al., 2017).
• GraphSAGE-pool (Hamilton et al., 2017) combines this approach with neighborhood normalization.
• Trade-off: improves performance slightly but increases the risk of overfitting.
• Best practice: use single-layer MLPs to avoid excessive parameters.

Janossy Pooling (Permutation-Sensitive Aggregation)
• Janossy pooling is an alternative to set pooling that increases the expressiveness of GNNs.
• Unlike set pooling (which relies on permutation-invariant reductions like sum, mean, or max), Janossy pooling:
  • Applies a permutation-sensitive function to the neighbor embeddings.
  • Averages the result over multiple permutations of the input set.
Challenges & Solutions:
• Summing over all permutations is computationally intractable.
• Two practical solutions:
  • Sample a subset of permutations instead of using all of them.
  • Use a canonical ordering (e.g., sorting neighbors by degree).

• Let's consider a Graph Neural Network (GNN) where a node u has three neighbors with given embeddings. The two aggregation families compare as follows:

Key properties of set pooling:
• Order of neighbors does not matter.
• Efficient and computationally simple.
• Limited expressiveness: loses order-dependent information.
Key properties of Janossy pooling:
• More expressive: captures neighbor relationships based on order.
• Can learn complex interactions between neighbors.
• Computationally expensive: requires processing multiple permutations.
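A minimal sketch of the set-pooling aggregator described above (Deep Sets style: per-neighbor MLP, sum, then an output MLP), written in PyTorch; the layer sizes and module names are illustrative assumptions.

import torch
import torch.nn as nn

class SetPoolAggregator(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())   # transforms each neighbor
        self.rho = nn.Sequential(nn.Linear(hidden_dim, out_dim), nn.ReLU())  # maps the pooled sum

    def forward(self, neighbor_embeddings):
        # neighbor_embeddings: [num_neighbors, in_dim]; the sum makes the result permutation invariant.
        return self.rho(self.phi(neighbor_embeddings).sum(dim=0))

agg = SetPoolAggregator(in_dim=8, hidden_dim=16, out_dim=8)
m_u = agg(torch.randn(3, 8))   # aggregated message for a node with three neighbors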

5.2.3 Neighborhood Attention in GNNs
1. Attention in Aggregation
   • Assigns different importance (weights) to each neighbor during the aggregation step in GNNs.
   • Helps refine information flow by prioritizing more relevant neighbors.
2. Graph Attention Networks (GAT)
   • Use self-attention mechanisms to compute attention scores for each neighbor.
   • The final representation is a weighted sum of the neighbors based on their learned importance.
3. Types of Attention Mechanisms
   • Additive attention (Bahdanau et al., 2015): computes attention using a feedforward network.
   • Dot-product attention: uses the inner product between node embeddings.
   • Bilinear attention: involves a trainable weight matrix between node embeddings.
   • MLP-based attention: uses a multi-layer perceptron (MLP) to compute attention scores.
4. Multi-Head Attention in GNNs
   • Inspired by Transformer models, it applies multiple independent attention mechanisms in parallel.
   • Helps capture diverse perspectives from different attention heads.
5. Advantages of Attention in GNNs
   • Enhances representational power by filtering out irrelevant neighbors.
   • Helps in tasks like node classification by focusing on meaningful relationships.
   • Improves interpretability by highlighting important connections in a graph.

Example of Neighborhood Attention in GNNs
Scenario: Classifying Research Papers in a Citation Network
• Consider a citation network where nodes represent research papers and edges represent citations between them. The goal is to classify each paper into a topic category based on its connections.
Without Attention:
• A standard Graph Neural Network (GNN) aggregates information equally from all neighbors.
• If a paper is cited by many unrelated papers (e.g., interdisciplinary papers), their influence may mislead the classification.
With Attention (GAT):
• The model assigns higher attention weights to relevant neighbors (papers from the same topic).
• It reduces the impact of papers that are cited across multiple fields and might introduce noise.
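A sketch of dot-product neighborhood attention for a single node, as described above; the softmax is taken over the node's neighbors, and the tensor shapes are illustrative.

import torch

def attention_aggregate(h_u, neighbor_embeddings):
    # h_u: [d] embedding of the central node; neighbor_embeddings: [num_neighbors, d].
    scores = neighbor_embeddings @ h_u                             # dot-product attention scores
    alpha = torch.softmax(scores, dim=0)                           # attention weights over the neighbors
    return (alpha.unsqueeze(1) * neighbor_embeddings).sum(dim=0)   # weighted sum of neighbors

m_u = attention_aggregate(torch.randn(8), torch.randn(4, 8))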

5.3 Generalized Update Methods
• The AGGREGATE operator in GNNs has received significant research focus, especially after the GraphSAGE (SAmple and aggreGatE) framework introduced generalized neighborhood aggregation.
• However, the UPDATE operator is equally important in defining the power and inductive bias of GNN models.
• Basic GNN update methods include:
  • A linear combination of the node's current embedding with the aggregated neighbor messages.
  • The self-loop approach, where self-loops are added before aggregation.

5.3.1 Concatenation and Skip-Connections
Problem: Over-Smoothing in GNNs
• Over-smoothing happens when repeated message passing causes node-specific information to be lost.
• The node representation becomes dominated by neighbor information, leading to indistinguishable node embeddings.
Solution: Skip-Connections and Concatenation
• Concatenation approach: keeps previous-layer information by concatenating past node embeddings with the updated embeddings.
• Interpolation approach: uses a weighted combination of past embeddings and updated embeddings.

Example: Using Skip Connections in a Citation Network
• Imagine a citation network, where nodes represent research papers and edges represent citations between them. The goal is to classify papers into different research fields (e.g., AI, Biology, Physics).
• If a highly cited interdisciplinary paper connects to many different fields, a standard GNN might over-smooth its representation.
• After multiple layers of message passing, its unique field-specific identity gets lost, making it harder to classify accurately.
(Figure: blue nodes P1, P2, etc. represent research papers; black arrows show citations, i.e., standard message passing in the GNN; red dashed arrows represent skip connections (concatenation or interpolation) that help retain the original node information.)
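A sketch of the two skip-connection variants described above; the base update h_update and the interpolation weight alpha are illustrative placeholders.

import torch

def concat_update(h_old, h_update):
    # Concatenation approach: keep the previous embedding alongside the new one.
    return torch.cat([h_update, h_old], dim=-1)

def interpolation_update(h_old, h_update, alpha=0.5):
    # Interpolation approach: a weighted combination of old and updated embeddings.
    return alpha * h_update + (1.0 - alpha) * h_old

h_old, h_update = torch.randn(8), torch.randn(8)
h_concat = concat_update(h_old, h_update)          # dimension doubles to 16
h_interp = interpolation_update(h_old, h_update)   # dimension stays 8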

5.3.2 Gated Updates
• Gated updates enhance model performance by integrating mechanisms from Recurrent Neural Networks (RNNs), such as Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) cells.
• Update function: nodes update their hidden states using functions like
  h_u^(k) = GRU( h_u^(k-1), m_N(u)^(k) ),
  treating the aggregated neighborhood message as the observation and the previous embedding as the hidden state.
• Benefits: enhances learning of complex patterns and improves training stability, facilitating deeper GNN architectures.
• Applications: particularly useful in tasks requiring intricate reasoning over entire graph structures, such as program verification.

Example: Visualizing GGNN Operations on a Molecular Graph
• Imagine a molecule represented as a graph where:
  • Nodes represent atoms.
  • Edges represent chemical bonds between atoms.
• A GGNN processes this molecular graph as follows:
  1. Initial representation:
     • Each atom (node) is initialized with a feature vector capturing its properties (e.g., atom type, charge).
     • Each bond (edge) is characterized by its type (e.g., single, double).
  2. Message passing with gating:
     • Over multiple iterations, each atom updates its state by aggregating information from neighboring atoms.
     • Gating mechanisms (similar to GRUs) control the flow of information, allowing the network to manage long-term dependencies and retain relevant information while discarding irrelevant data.
  3. Updated node states:
     • After several iterations, each atom's representation captures not only its own properties but also contextual information from its neighbors, effectively encoding the molecular structure.
  4. Prediction:
     • The aggregated information enables the GGNN to predict specific chemical properties, such as reactivity or stability, based on the learned representations.
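A minimal sketch of a GRU-based gated update using torch.nn.GRUCell; treating the aggregated message as the input and the previous node embedding as the hidden state is the assumption here.

import torch

d = 16
gru = torch.nn.GRUCell(input_size=d, hidden_size=d)

h_prev = torch.randn(5, d)   # previous embeddings h_u^(k-1) for 5 nodes
m = torch.randn(5, d)        # aggregated neighborhood messages m_N(u)^(k)
h_next = gru(m, h_prev)      # gated update: h_u^(k) = GRU(h_u^(k-1), m_N(u)^(k))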

5.4 Edge Features and Multi-relational GNNs
5.4.1 Relational Graph Neural Networks
Separate Transformation Matrices for Each Relation Type:
• In standard GNNs, a single weight matrix is used for all edges.
• In RGCNs, a unique transformation matrix W_r is assigned to each relation type r, allowing the model to process different edge types separately.
Multi-Relational Aggregation:
• A node u aggregates messages from its neighbors v, but the update takes the relation r between them into account:
  m_N(u) = Σ_{r ∈ R} Σ_{v ∈ N_r(u)} W_r h_v / f_n(N(u), N(v))
Normalization Strategies:
• Since different relations may have varying numbers of edges, the normalization function f_n ensures a balanced influence from the different relation types.
• There are various strategies for defining f_n, similar to those used in standard GNN normalization techniques.

Parameter Sharing in RGCNs
• A major drawback of the naïve RGCN approach is the large number of parameters, caused by having a separate transformation matrix for each relation type.
• This can lead to overfitting and slow learning, especially in applications like knowledge graphs that have many distinct relationships.
• Solution: to address this issue, Schlichtkrull et al. (2017) introduced parameter sharing through the basis matrix approach.
Basis Matrix Approach:
• Instead of learning a separate weight matrix W_r for each relation, we define it as a linear combination of a small set of b shared basis matrices B_1, B_2, ..., B_b (a short sketch follows below):
  W_r = Σ_{i=1}^{b} α_{r,i} B_i
Aggregation Function with Parameter Sharing:
• With this new approach, the message-passing formula in RGCNs is rewritten using these shared basis matrices in place of the independent relation-specific matrices.
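A sketch of the relation-specific message computation with basis sharing as described above; the number of relations, the basis count, and the dimensions are illustrative.

import torch

num_relations, num_bases, d = 10, 3, 16
B = torch.randn(num_bases, d, d)               # shared basis matrices B_1, ..., B_b
alpha = torch.randn(num_relations, num_bases)  # relation-specific mixing coefficients

def relation_weight(r):
    # W_r = sum_i alpha[r, i] * B_i  (linear combination of the shared bases)
    return torch.einsum('i,ijk->jk', alpha[r], B)

h_v = torch.randn(d)
msg = relation_weight(r=2) @ h_v               # message from neighbor v under relation r = 2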

Key Benefits:
1. Fewer parameters:
   • Instead of learning one matrix per relation, the model only learns a small set of basis matrices and a few relation-specific weights.
   • This makes training more efficient and helps prevent overfitting.
2. Faster learning and better generalization:
   • Since fewer parameters are involved, the model converges faster and generalizes better, especially for large-scale knowledge graphs with many relations.
3. Alternative interpretation:
   • This approach can also be viewed as learning an embedding for each relation, along with a tensor that is shared across all relations.

5.5 Graph Pooling
• Graph pooling is a technique used to generate an embedding for an entire graph from its node embeddings. While message passing in GNNs results in node-level representations (z_u for each node u), graph pooling aggregates these into a graph-level embedding z_G.
1. Set Pooling Approaches
• Since graphs are unordered sets of nodes, pooling functions must be designed to aggregate node embeddings in a permutation-invariant way.
A. Mean/Sum Pooling (Simple Pooling)
• The most straightforward pooling method is to take the sum or mean of all node embeddings:
  z_G = Σ_{u ∈ V} z_u   or   z_G = (1 / |V|) Σ_{u ∈ V} z_u

B. Attention-Based Pooling (LSTM + Attention)
• This method learns a graph embedding using an LSTM and an attention mechanism, inspired by sequence models.
Step-by-Step Process
1. Initialize Query Vector
   • Define a query vector q_t (initially zeros).
   • Update it iteratively using an LSTM.
2. Compute Attention Scores for Nodes
   • At each iteration t, compute attention scores for all nodes.
3. Compute Attention Weights
   • Convert the scores into a probability distribution using a softmax.
   • This ensures that the most relevant nodes contribute more.
4. Weighted Sum of Node Embeddings
   • Aggregate the node embeddings based on the attention weights.
   • o_t represents the intermediate pooled output.
5. Update Query Vector Using LSTM
   • Update the query vector q_t from o_t using the LSTM.

6. Final Graph Embedding
   • After T iterations, obtain the final graph embedding z_G from the pooled outputs.
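A sketch of the whole procedure using torch.nn.LSTMCell with dot-product attention; the number of iterations T, the use of the last pooled output as z_G, and the module setup are illustrative assumptions (in practice the LSTM is a learned component of the model).

import torch

def attention_pool(Z, T=4):
    # Z: [num_nodes, d] node embeddings; returns a graph embedding via LSTM + attention pooling.
    d = Z.size(1)
    lstm = torch.nn.LSTMCell(input_size=d, hidden_size=d)
    q = torch.zeros(1, d)          # 1. query vector q_t, initially zeros
    c = torch.zeros(1, d)          # LSTM cell state
    o = torch.zeros(1, d)          # intermediate pooled output o_t
    for _ in range(T):
        q, c = lstm(o, (q, c))     # 5. update the query vector with the LSTM
        scores = Z @ q.squeeze(0)  # 2. dot-product attention scores for all nodes
        a = torch.softmax(scores, dim=0)               # 3. attention weights
        o = (a.unsqueeze(1) * Z).sum(0, keepdim=True)  # 4. weighted sum of node embeddings
    return o.squeeze(0)            # 6. final graph embedding after T iterations

z_G = attention_pool(torch.randn(20, 32))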

• Sum and mean pooling treat all nodes equally.
• Attention pooling gives more weight to important nodes: the softmax distribution ensures that nodes with higher importance receive higher weights.

Graph Neural Networks in Practice

6.1 Applications and Loss Functions
Key GNN Tasks:
• Node classification: identifying node properties, e.g., detecting bots in social networks.
• Graph classification: predicting graph-level properties, e.g., molecular property prediction.
• Relation prediction: inferring connections, e.g., recommending content on online platforms.
Loss Functions in GNNs:
• Loss functions are defined on node embeddings (z_u) or graph-level embeddings (z_G).
• Any GNN model can generate these embeddings.
• Loss gradients are optimized using stochastic gradient descent (SGD) or its variants.
Unsupervised Pre-training:
• Helps improve GNN performance on downstream tasks.
• Typically involves self-supervised learning methods.

6.1.1 GNNs for Node Classification
Benchmark Tasks:
• Node classification is a key GNN task, widely studied using datasets like Cora, Citeseer, and Pubmed.
• Research was significantly influenced by Kipf and Welling (2016).
• It typically involves classifying scientific papers based on their position in a citation network.
Training Approach:
• GNNs are trained in a fully supervised manner for node classification.
• Uses softmax classification and a negative log-likelihood loss:
  L = Σ_{u ∈ V_train} −log( softmax(z_u, y_u) )
• The softmax function computes the predicted probability of each node's class.
Loss Computation:
• One-hot encoded labels y_u represent the class assignments.
• The softmax function normalizes the node embedding scores:
  softmax(z_u, y_u) = Σ_{i=1}^{c} y_u[i] · exp(z_u⊤ w_i) / Σ_{j=1}^{c} exp(z_u⊤ w_j)
• The w_i are trainable parameters, one per class.
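A sketch of this loss in PyTorch, where the class scores are computed as z_u⊤ w_i and the negative log-likelihood is taken over the training nodes; the shapes and the random data are illustrative.

import torch
import torch.nn.functional as F

num_nodes, d, num_classes = 100, 16, 7
Z = torch.randn(num_nodes, d)                         # node embeddings z_u from the GNN
W = torch.randn(d, num_classes, requires_grad=True)   # trainable class weights w_i
y = torch.randint(0, num_classes, (num_nodes,))       # class labels
train_mask = torch.rand(num_nodes) < 0.3              # training nodes

logits = Z @ W                                              # scores z_u^T w_i for each class
loss = F.cross_entropy(logits[train_mask], y[train_mask])   # softmax + negative log-likelihood
loss.backward()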



6.1.2 GNNs for Graph Classification
Benchmark Tasks:
• Graph classification is a widely studied problem, historically tackled with kernel methods.
• Early benchmarks were adapted from the kernel-based literature (e.g., enzyme property classification [Morris et al., 2019]).
Loss Function for Graph Classification:
• Uses a softmax classification loss, similar to node classification.
• Key difference: the loss is computed over graph-level embeddings (z_G) instead of node embeddings.
• The training set consists of labeled graphs: T = {G₁, ..., Gₙ}.
Regression Tasks with GNNs:
• GNNs are also used for graph-based regression, particularly in molecular property prediction (e.g., solubility).
• An MLP (multi-layer perceptron) maps each graph embedding to a scalar target value.
• Uses a squared-error loss:
  L = Σ_{Gᵢ ∈ T} || MLP(z_{Gᵢ}) − y_{Gᵢ} ||²

6.1.3 GNNs for Relation Prediction
Applications:
• Used in recommender systems [Ying et al., 2018a] and knowledge graph completion [Schlichtkrull et al., 2017].
• Predicts relationships between entities instead of classifying nodes or graphs.
Loss Functions:
• Pairwise node embedding loss functions are commonly used.
• GNNs replace the shallow embeddings in traditional relation prediction methods.
• They can be integrated with any standard pairwise loss function for relation learning.
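A sketch of the graph-level regression loss above; the MLP architecture and the batch of graph embeddings are illustrative.

import torch
import torch.nn as nn

d = 32
mlp = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))  # maps z_G to a scalar

z_G = torch.randn(10, d)              # embeddings for a batch of 10 graphs
y_G = torch.randn(10, 1)              # scalar targets (e.g., solubility)
loss = ((mlp(z_G) - y_G) ** 2).sum()  # squared-error loss over the training graphs
loss.backward()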

6.1.4 Pre-training GNNs
Concept of Pre-training:
• Pre-training is a common deep learning practice [Goodfellow et al., 2016].
• For GNNs, it was hypothesized that pre-training with neighborhood reconstruction losses (e.g., edge reconstruction) could improve classification performance.
Limited Success of Neighborhood Reconstruction:
• Studies [Veličković et al., 2019] found no significant improvement from pre-training with neighborhood reconstruction.
• Hypothesis: message passing in GNNs already encodes neighborhood information, making the reconstruction loss redundant.
Successful Pre-training Strategies:
• Deep Graph Infomax (DGI) [Veličković et al., 2019] is an effective alternative.
• DGI maximizes the mutual information between node embeddings (z_u) and graph embeddings (z_G).
• D is a discriminator trained to distinguish real node embeddings from corrupted ones.
• Corruption is done by shuffling node features, modifying the adjacency matrix, etc.
Unsupervised GNN Training:
• DGI is one example of an unsupervised GNN objective that works well.
• Other approaches [Hu et al., 2019; Sun et al., 2020] also focus on maximizing mutual information between different graph representations.

6.2 Efficiency Concerns and Node Sampling
Computational Challenges in GNNs:
• Directly implementing node-level message passing can be inefficient.
• Redundant computations occur when multiple nodes share the same neighbors.
Need for Efficient Implementation:
• Optimizing GNN execution requires reducing redundant operations.
• Strategies are needed to improve efficiency without compromising accuracy.
Solution: Node Sampling Techniques:
• Instead of processing all neighbors of a node, sampling methods selectively choose a subset.
• Sampling reduces computational costs while maintaining representation quality.

6.2.1 Graph-level Implementations
• Implement message passing using graph-level equations instead of node-wise operations.
• Leverages sparse matrix multiplications for efficiency.
Graph-level GNN Equation:
• Computes all node embeddings simultaneously using
  H^(k) = σ( A H^(k-1) W_neigh^(k) + H^(k-1) W_self^(k) )
Advantages:
• Avoids redundant computations by updating all nodes simultaneously.
• More efficient than node-level updates.
Challenges:
• Requires loading the entire graph into memory, which may be infeasible for large graphs.
• Works best with full-batch gradient descent, limiting support for mini-batches.
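A sketch of this graph-level update using a SciPy sparse adjacency matrix; the matrix sizes and the use of SciPy/NumPy with a ReLU non-linearity are illustrative assumptions.

import numpy as np
import scipy.sparse as sp

num_nodes, d = 1000, 32
A = sp.random(num_nodes, num_nodes, density=0.01, format='csr')  # sparse (weighted) adjacency matrix
H = np.random.randn(num_nodes, d)                                # current embeddings H^(k-1)
W_neigh = np.random.randn(d, d)
W_self = np.random.randn(d, d)

# One graph-level message-passing step: all nodes are updated with a single sparse matmul.
H_next = np.maximum(0.0, A @ H @ W_neigh + H @ W_self)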

6.2.2 Subsampling and Mini-Batching
Goal:
• Reduce memory usage and enable mini-batch training on large graphs.
• Avoid redundant computations by ensuring that each node embedding is computed at most once per batch.
Challenges:
• Loss of information when running message passing on only a subset of nodes.
• Disconnected subgraphs: random selection of nodes can break graph connectivity.
• Impact on performance: improper mini-batching can degrade model accuracy.
Proposed Solution (Hamilton et al., 2017b): Neighborhood Sampling
1. Select a set of target nodes for a batch.
2. Recursively sample neighbors of these nodes to maintain graph connectivity.
3. Limit the number of sampled neighbors per node to ensure efficient batched computations.
Further Improvements:
• Follow-up works (Chen et al., 2018) introduced more efficient subsampling techniques.
• These methods enable GNNs to scale to large real-world graphs (e.g., recommender systems, social networks).

6.3 Parameter Sharing and Regularization
• Regularization is essential for GNNs, and standard techniques like L2 regularization, dropout, and layer normalization are effective. However, there are also GNN-specific regularization strategies:
Parameter Sharing Across Layers:
• The same parameters are used in all AGGREGATE and UPDATE functions.
• Effective for deep GNNs (more than six layers) and often combined with gated update functions.
Edge Dropout:
• Randomly removes edges from the adjacency matrix during training (see the sketch below).
• Helps reduce overfitting and makes GNNs more robust to noise.
• Applied in knowledge graphs and graph attention networks (GATs).
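A minimal sketch of edge dropout on a dense adjacency matrix; the drop probability and the dense representation are illustrative (a sparse edge list would be handled analogously).

import torch

def edge_dropout(A, p=0.2):
    # Randomly remove (zero out) each edge with probability p during training.
    mask = (torch.rand_like(A) > p).float()
    return A * mask

A = (torch.rand(5, 5) < 0.5).float()   # toy dense adjacency matrix
A_dropped = edge_dropout(A, p=0.2)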
