G-Scan: Graph Neural Networks for Line-Level Vulnerability Identification in Smart Contracts

Christoph Sendner∗, Ruisi Zhang†, Alexander Hefter∗, Alexandra Dmitrienko∗, and Farinaz Koushanfar†
∗ University of Würzburg, Germany
† University of California San Diego, USA

arXiv:2307.08549v1 [cs.CR] 17 Jul 2023

Abstract—Due to the immutable and decentralized nature of the Ethereum (ETH) platform, smart contracts are prone to security risks that can result in financial loss. While existing machine learning-based vulnerability detection algorithms achieve high accuracy at the contract level, they require developers to manually inspect source code to locate bugs. To this end, we present G-Scan, the first end-to-end fine-grained line-level vulnerability detection system evaluated on the first-of-its-kind real-world dataset. G-Scan first converts smart contracts to code graphs in a dependency- and hierarchy-preserving manner. Next, we train a graph neural network to identify vulnerable nodes and assess security risks. Finally, the code graphs with node vulnerability predictions are mapped back to the smart contracts for line-level localization. We train and evaluate G-Scan on a collected real-world smart contract dataset with line-level annotations of the reentrancy vulnerability, one of the most common and severe types of smart contract vulnerabilities. With the well-designed graph representation and high-quality dataset, G-Scan achieves a 93.02% F1-score in contract-level vulnerability detection and a 93.69% F1-score in line-level vulnerability localization. Additionally, the lightweight graph neural network enables G-Scan to localize vulnerabilities in a smart contract of 6.1k lines of code within 1.2 seconds.

I. INTRODUCTION

The recent success of Bitcoin [25] and Ethereum (ETH) [8] has brought decentralized platforms, which allow users to transact and interact in a trustless manner, to the forefront of attention. Bitcoin, as the first-generation blockchain platform, facilitates peer-to-peer digital currency transactions using proof-of-work consensus algorithms. However, due to the limited functionality of Bitcoin scripting languages, ETH and its smart contracts introduced a broader scenario where users can encode complex agreement terms directly into code to enable automated and trustless transactions.

ETH has been widely applied in various domains, from financial services to supply chain management and beyond. Developers on the open-source ETH platform write their agreements as smart contract code in Solidity and publish them on the blockchain to facilitate transactions with third parties. As with any other programming language, vulnerabilities also reside in ETH contracts, potentially leading to loss of funds or theft of sensitive information. The security risks arise from two main aspects. First, due to the distributed nature of blockchain platforms, malicious third parties may publish their own smart contracts to exploit vulnerabilities for profit. Second, as Solidity is a newly developed programming language, there is limited research on formal verification to ensure that smart contracts are written properly. Even minor mistakes in smart contract code can result in significant financial loss.

Many prior works explored vulnerability detection algorithms on Ethereum smart contracts, and most of them can be categorized into two types: (1) contract-level vulnerability detection and (2) line-level/node-level vulnerability detection. Contract-level vulnerability detection treats detection as a classification problem and uses symbolic execution [21], [23], [24], fuzzing [12], [16], [35], and machine learning algorithms [19], [27], [37], [38], [43] to find vulnerable contracts. These methods achieve over 90% F1 scores in classification, but developers are still required to examine the contract line-by-line to localize buggy statements.

More recently, SCScan [14] proposed a vulnerability detection system at the line level, while MANDO [26] proposed a node-level detection system with the aim of reducing the development burden. SCScan first uses a support vector machine (SVM) to detect vulnerable contracts and then a pattern matching algorithm to determine the exact positions of the vulnerabilities. However, SCScan is primitive: its SVM can only accept fixed-size contracts as inputs, and the pattern matching algorithm fails to scale as new attacks appear. MANDO [26], on the other hand, only detects vulnerabilities in the code graph and fails to map them back to the source code. MANDO employs two-stage graph neural network (GNN) training to identify node-level vulnerabilities, where the first stage trains a graph classification model to assess the vulnerability of contracts, and the second stage utilizes node embeddings updated from a topology graph neural network to classify whether a specific node is vulnerable. MANDO can handle contracts of varying sizes, but after getting the node-level predictions, it cannot map vulnerabilities back to the smart contract for localization.

Although node classification is widely studied by the machine learning community, localizing vulnerable nodes within the code graph itself is non-trivial and challenging for the following reasons. First, there is a lack of real-world datasets for fine-grained¹ vulnerability detection model training. MANDO [26] trains its GNN model on a simulated dataset built by injecting vulnerabilities into clean contracts. These injections fail to represent some corner cases in real-world contracts and result in lower detection accuracy. Secondly, code graph construction in prior work [19], [20], [43] mainly extracts the semantic information in source code but fails to capture the code dependencies and hierarchies. Thirdly, as the contract grows, efficiently identifying vulnerable statements with regard to time and computation cost is crucial to save developers’ time and resources.

¹ Throughout the paper, we use “localize” and “fine-grained detection” interchangeably.
To tackle these challenges, we present G-Scan, the first end-to-end fine-grained line-level vulnerability detection system with evaluations on the first-of-its-kind real-world dataset. G-Scan converts contracts to code graphs by first obtaining their abstract syntax tree (AST) representations and then adding extra edges to reflect data dependencies and code hierarchies. Then, it assigns node features by including function types like While and For, variable properties like visibility and storage location, and node member access like send and transfer. Each node also possesses a src attribute that records the source lines from which the node was extracted when constructing the AST.

G-Scan trains a GCN model on the converted graphs and learns the hidden features carrying the vulnerabilities. In the inference stage, G-Scan predicts the node labels of unseen code graphs and maps them back to the contract statements via the src attributes. We train and evaluate G-Scan on the first real-world fine-grained reentrancy vulnerability annotation dataset, consisting of 13,773 smart contracts, and achieve a 93.02% F1-score in contract-level vulnerability detection and a 93.69% F1-score in line-level vulnerability localization.

Our contributions to the community are summarized as follows:

• We present G-Scan, the first end-to-end algorithm that leverages graph neural networks for fine-grained line-level detection of smart contract vulnerabilities.

• G-Scan is a scalable and efficient framework that is agnostic to contract size and localizes vulnerabilities within 1.2 seconds for contracts with more than 6.1k lines of code.

• Proof-of-concept evaluation is demonstrated on the first-of-its-kind real-world data collection consisting of 13,773 smart contracts and a total of 5,363,793 lines of code. Each line is labeled to indicate whether it contains a reentrancy vulnerability or not.

• Our results show that G-Scan can achieve a 93.02% F1-score in contract-level vulnerability detection and a 93.69% F1-score in line-level vulnerability localization by incorporating variable dependencies and code hierarchies into graph modeling.

In summary, G-Scan provides an efficient and precise fine-grained vulnerability detection solution for smart contracts. G-Scan also contributes the first-of-its-kind large-scale real-world smart contract dataset with fine-grained reentrancy vulnerability annotations. Our novel graph representation and lightweight GNN-based node classification model trained on the real-world dataset enable line-level vulnerability detection with both efficiency and accuracy compared with prior art. We will open-source the code along with our collected dataset to promote research in this area.

Paper Organization: For the rest of the paper, we will introduce vulnerability localization background and challenges in Section II; the goal of G-Scan and the threat models in Section III; the detailed design pipeline of G-Scan in Section VI-Section V; the experiments demonstrating G-Scan’s effectiveness in Section VIII; and the conclusion and future work in Section X.

II. BACKGROUND AND CHALLENGES

A. Smart Contracts Vulnerability

Smart contracts are self-executing programs in Ethereum that enable developers to create decentralized blockchain applications, including financial services. These contracts are written in Solidity, a high-level programming language, and compiled into Ethereum Virtual Machine (EVM) bytecode for deployment on the Ethereum network. Despite their benefits, smart contracts can also be vulnerable to security risks, which, if not identified in time, can lead to significant financial losses. In this context, we outline some of the vulnerabilities related to financial transactions, including the reentrancy attack, which is widely regarded as the most common and dangerous vulnerability (e.g., DAO [22]).

Reentrancy Attack: The adversary performs reentrancy attacks [9] by repeatedly calling back into a vulnerable contract before the previous invocation has been completed. It allows the adversary to repeatedly siphon funds.

Integer Overflow/Underflow Attack: The adversary performs overflow/underflow attacks [2] on vulnerable smart contracts that perform mathematical operations on unsigned integers. It allows the adversary to transfer excessive funds as a result of the overflow/underflow.

Access Control Attack: The adversary performs access control attacks [1] on vulnerable smart contracts that do not properly control access to sensitive functions or data. It allows the adversary to steal sensitive private information from the vulnerable smart contract.

B. Graph Neural Networks

A graph $G = (V, E, A)$ consists of a set of nodes $V = \{v_1, v_2, \ldots, v_n\}$ and a set of edges $E = \{e_1, e_2, \ldots, e_m\}$. Every node $v_i \in V$ has a $k$-dimensional feature vector $h_i \in \mathbb{R}^k$. The connections between nodes are stored in the adjacency matrix $A$, where a nonzero entry in row $i$ and column $j$ means a connection from node $i$ to node $j$.

The objective of a GNN is to learn a function $g$ that maps the feature embedding $h_i$ of the $i$-th node $v_i$ to a new embedding vector $\hat{h}_i \in \mathbb{R}^h$ which captures the node’s local and global information. For a multi-layer GNN model, the mapping above is performed iteratively to help nodes update their feature embeddings with the information from their neighbors via message passing. Every node $v_i$ performs message passing by receiving the feature embeddings from its neighbors $j \in N_i$, where $N_i$ is the set of nodes adjacent to $v_i$. The messages are then aggregated using a customized function to obtain an updated representation $\hat{h}_i$.

The computations are formulated in Equation 1, where $f$, $g$, and $\oplus$ are customizable functions, e.g., convolution. Here, $h_v^{(l)}$ and $h_u^{(l)}$ are the node features at layer $l$, $N(v)$ is the set of neighbors of node $v$, $e_{uv}$ is the edge feature between nodes $u$ and $v$, and $h_v^{(l+1)}$ is the updated feature of node $v$ at layer $(l+1)$.

$$h_v^{(l+1)} = g\left( h_v^{(l)} \bigoplus_{u \in N(v)} f\left( h_u^{(l)}, h_v^{(l)}, e_{uv} \right) \right) \tag{1}$$
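One layer of the message passing in Equation 1 can be sketched in plain Python as follows. The concrete choices here are illustrative assumptions and not the paper's GCN configuration: $f$ scales the neighbor feature by a scalar edge weight, $\oplus$ is an element-wise sum, $g$ is a ReLU, and $N(v)$ is taken to be the set of nodes with an edge pointing to $v$. The names `message`, `aggregate`, `update`, and `message_passing_layer` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message(h_u: Vector, h_v: Vector, e_uv: float) -> Vector:
    """f: message from neighbor u to v (assumed: neighbor feature scaled by edge weight)."""
    return [e_uv * x for x in h_u]


def aggregate(vectors: List[Vector]) -> Vector:
    """The aggregation operator (assumed: element-wise sum)."""
    return [sum(column) for column in zip(*vectors)]


def update(z: Vector) -> Vector:
    """g: update function (assumed: ReLU)."""
    return [max(0.0, x) for x in z]


def message_passing_layer(
    h: Dict[int, Vector], edges: Dict[Tuple[int, int], float]
) -> Dict[int, Vector]:
    """Compute h^(l+1) from h^(l) for every node.

    `edges` maps a directed pair (u, v) to its edge feature e_uv.
    """
    new_h: Dict[int, Vector] = {}
    for v, h_v in h.items():
        # Collect messages from every neighbor u that has an edge u -> v.
        msgs = [message(h[u], h_v, e) for (u, dst), e in edges.items() if dst == v]
        # Combine the node's own feature with its messages, then apply g.
        new_h[v] = update(aggregate([h_v] + msgs))
    return new_h


features = {0: [1.0, -2.0], 1: [0.5, 0.5], 2: [-1.0, 3.0]}
graph_edges = {(1, 0): 1.0, (2, 0): 2.0, (0, 1): 1.0}
updated = message_passing_layer(features, graph_edges)
```

Stacking several such layers lets information from nodes more than one hop away reach each node, which is the iterative update described above.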

C. Challenges

To perform fine-grained line-level smart contract vulnerability detection, we identify the following challenges.

Lack of Dataset: Many real-world contract-level vulnerability detection datasets [38] are open-sourced for scientific research. However, these datasets lack line-level annotations that pinpoint the exact location of vulnerabilities. This complicates the training of fine-grained detection models, as high-quality data, such as real-world contracts with annotations, is essential for machine learning models to learn hidden patterns and achieve optimal performance.

Graph Representation: The second challenge is representing code graphs with both semantic and structural information. Previous approaches such as MANDO [26] and DR-GCN [19] extract heterogeneous code graphs based on semantic information from the source code, such as critical function calls/variables and temporal execution traces. However, they did not consider the structural information that indicates the code execution order and the relationships between data elements, which are where many vulnerabilities originate. Apart from the edge connections, designing node feature embedding representations that depict function or expression types and properties helps GNN models learn better local and global code information. Therefore, developing appropriate graph representations is crucial to improving the performance of fine-grained vulnerability detection models.

Efficiency: Achieving efficiency in terms of inference time and cost is also challenging as the size of contracts scales. For time efficiency, line-level detection algorithms like MANDO [26] use a two-level detection algorithm, which first classifies whether a smart contract is vulnerable and then uses the node embeddings obtained from a topology graph neural network for line-level vulnerability classification. However, performing inference over three multi-level graph neural networks results in significant localization overheads. For cost efficiency, prior art that detects contract-level vulnerabilities via DNN-based models [13], [15], [27] is parameter-heavy and requires significant computation resources when contract users intend to verify whether a specific contract is vulnerable.

Mapping between Node and Statement: Another challenge is establishing the relationship between nodes in the code graph and statements in the smart contract. This is because localizing which line is vulnerable relies heavily on identifying the node representing the code block in the graph representation and accurately mapping it back to the corresponding line. While prior work such as MANDO [26] classifies vulnerable nodes within code graphs, it does not explicitly mention how the vulnerable nodes are mapped back line by line. In contrast, G-Scan proposes the first end-to-end vulnerability localization system by leveraging ASTs, which enables us to accurately map vulnerable nodes back to specific lines in the smart contract.

III. GOAL AND THREAT MODEL

In this section, we first introduce the goal of our proposed G-Scan and then introduce the potential threat models that G-Scan aims to defend against.

A. Goal

Our goal is to localize the vulnerabilities within smart contracts. While many works [19], [27], [37], [38], [43] have used machine learning models to detect contract vulnerabilities, few explored how to localize them. Nevertheless, it is crucial for both developers and end users to accurately and efficiently identify which line contains the vulnerability. For developers, identifying the specific line of code containing the bug, instead of just detecting at the contract level and then examining the code line-by-line, can improve working efficiency. Smart contracts can be inevitably complex, with intricate user agreements and transaction logic, making debugging a challenging task even when developers know that vulnerabilities exist in the codebase. G-Scan helps developers focus on fixing the risky part instead of changing the entire codebase, which saves time and resources. For end users, fine-grained vulnerability detection with low inference overhead offers a more transparent way to analyze security concerns within contracts and avoid transactions with malicious parties.

In summary, G-Scan contributes to improving the security of the transaction life-cycle in the following two aspects:

• Before publishing the smart contract, with G-Scan, one can analyze the vulnerabilities within the smart contract at line level and fix them in time.

• When making transactions with untrusted third parties, with G-Scan, one can analyze the vulnerabilities within the smart contracts efficiently without spending too many computation resources.

B. Threat Model

As depicted in Figure 1, a Solidity developer first publishes a vulnerable contract to the blockchain. Then, an adversary exploits the vulnerabilities by publishing an attacking contract that calls the vulnerable contract to steal funds. G-Scan safeguards smart contracts by helping Solidity developers identify and fix possible security risks promptly.

[Figure 1 (diagram): Adversary, Attack Contract, Block Chain, Solidity Developer, Vulnerable Contract.]

Fig. 1: Attack Scenario. When a Solidity developer publishes a vulnerable contract to the blockchain, the adversary leverages another attack contract to exploit the vulnerabilities in the vulnerable contract and steal funds from the developer.

Adversary’s Objective: The adversary is the malicious party in the blockchain and attempts to perform an attack on

the Ethereum platform. In this paper, we use the reentrancy attack as an example, but note that G-Scan can be adapted to detect other vulnerability types. In the reentrancy attack, the adversary deploys an attacking contract designed to exploit the vulnerabilities in the vulnerable contract. The vulnerable contract contains a function that allows it to transfer funds to other contracts. The adversary adds itself as one of the vulnerable contract’s recipients by first depositing funds into the vulnerable contract and then withdrawing them. During the withdrawal execution, the attack contract repeatedly re-enters the vulnerable contract and calls the transfer function multiple times until all funds are stolen.

In Solidity, the above fund transfer is implemented by the following three subtype data structures, where <CA> indicates the recipient’s contract or account address and x describes the amount of funds to be transferred.

• call: <CA>.call{value: x}

• transfer: <CA>.transfer(x)

• send: <CA>.send(x)

The difference among the three subtypes mainly arises from their implementations. Firstly, transfer and send are limited to 2,300 gas (the unit that meters execution cost on the ETH platform), while call has no such limitation. Therefore, if an adversary exploits the call subtype, it can result in a greater loss of funds. Secondly, call cannot automatically handle exceptions thrown by the called contract, which increases the potential for reentrancy attacks. Thirdly, due to the limitations of transfer and send, some developers may opt to use call without mitigating the reentrancy vulnerabilities.

However, reentrancy attacks can still be performed on transfer and send for several reasons. Firstly, if these two subtypes call vulnerable contracts for funds exceeding the upper limit, exceptions can still occur, and if they are not handled properly, reentrancy attacks can be executed to steal funds. Secondly, if the Ethereum Virtual Machine is upgraded and the limit changes, reentrancy attacks can still cause fund loss. Additionally, transfer will throw an exception if gas is depleted, leading to a state change reversal.

Adversary’s Capacity: We consider the most general attack case where the adversary is one of the contract creators. The adversary has access to the public data structure on the chain and can upload their own contract to the Ethereum system but does not have access to the detection results of G-Scan.

IV. GRAPH REPRESENTATION

In this and subsequent sections, we first present how smart contracts are converted into code graphs and how GNN models are trained to classify nodes as vulnerable or not, and then introduce how we collected our reentrancy vulnerability dataset, along with detailed statistics.

The general pipeline of G-Scan is shown in Figure 2. We begin by converting the smart contract into a code graph through AST representations that extract code structural information. We then add extra edges and node feature embeddings to reflect code hierarchies. Next, we train a GNN-based node classification model on the code graphs to classify vulnerable nodes and clean nodes. The training is performed on a collected real-world smart contract dataset with each line annotated as either reentrancy vulnerable or not.

The code graph representation construction consists of three steps, namely, AST generation, code graph edge generation, and node feature embedding generation. We will also illustrate how the code graph is constructed by giving an example at the end of this section.

A. AST Generation

We use the Solidity compiler [10] to convert the smart contract into a tree format, where each node represents a syntactic construct in the source code, such as a statement or an expression, and each edge represents a relationship between these constructs, such as a function call or a value assignment. The nodes within the AST have two sets of node attributes: node type and source src. The node type describes the functionality of the source code, while the source src stores which part of the source code the node comes from. The edges in the AST are not explicitly defined but are given by the nested child nodes within each node.

B. Code Graph Edge Generation

In addition to the edges implicitly defined by the hierarchically nested nodes in the AST, we also add the following edges to represent the structure and semantics of the underlying source code. These edges add control flow, the hierarchy between nodes, and existing data dependencies between nodes to the code graph. Specifically, we add the following edges to the code graph: (1) AST Hierarchy Edges; (2) Control Flow and Ordering Edges; (3) Reference Edges; (4) Branching Edges; (5) Loop Edges; and (6) Break, Continue, and Return Edges. After adding these edges to the code graph, we convert it into a directed homogeneous graph for the classification described in Section V.

AST Hierarchy Edges: The first type of edges are the hierarchy edges generated by the AST. Each node in the AST can have other child nodes defined in its node types, meaning the node and its child nodes have hierarchical relationships. As shown in Figure 3, for example, a Block node has child nodes for statements 1 to k, and Block is the parent node of these child nodes. To represent the parent-child relationship in the code graph, we add two edges, one from the parent to the child and one from the child to the parent.

Control Flow and Ordering Edges: Apart from the hierarchy information defined by the AST, we also add control flow edges indicating the relationship between statements in the child nodes. There are three node types containing function statements, namely, Block, UncheckedBlock, and YulBlock. The function statements in these node types are given in the order they are executed in the original smart contracts. As shown in Figure 3, for example, if a Block contains k statements, in addition to the AST edges added to indicate their hierarchy information, we also add edges between consecutive statements to express the order in which they are executed.

When a given node in the AST has the same child node belonging to two or more node types, ordering edges are

[Figure 2 (diagram). Panels: Graph Representation (Smart Contract, Abstract Syntax Tree, Code Graph, GNN Training); Data Collection (Real-world Contracts, Line-level Reentrancy Vulnerability Annotation); GNN Inference (Unseen Contract, GNN, Vulnerable Node / Clean Node, with example outputs "Line: 12, 31" and "Line: 35, 67").]

Fig. 2: G-Scan overview. First, we obtain the graph representation of the smart contract by converting it into an Abstract Syntax Tree (AST) representation and adding additional edges and node feature embeddings to reflect code hierarchies and dependencies. Next, we train a Graph Neural Network (GNN) model on the code graphs to classify vulnerable nodes and perform inference on unseen smart contracts. We train and evaluate the GNN model on a collected real-world smart contract dataset, with each data line annotated to indicate whether it contains reentrancy vulnerability.

[Figure 3 (diagram): a Block node connected by AST edges to Statement 1, Statement 2, Statement 3, ..., Statement k, with control flow (CF) edges between consecutive statements.]

Fig. 3: Control flow edges between k statements inside a block

TABLE I: Ordering edge directions

Node Type               Start             End
IndexAccess             baseExpression    indexExpression
IndexRangeAccess        baseExpression    startExpression
                        startExpression   endExpression
FunctionCall            arguments         expression
FunctionTypeName        parameterTypes    returnParameterTypes
Assignment              leftHandSide      rightHandSide
BinaryOperation         leftHandSide      rightHandSide
FunctionDefinition      parameters        returnParameters
YulFunctionDefinition   parameters        returnVariables
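The start/end pairs of Table I, together with the keyType to valueType rule for Mapping, can be expressed as a lookup table. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes each AST node is a dictionary with `nodeType` and `id` fields, that a child attribute holds either one node or a list of nodes, and that when an attribute holds a list every start child is connected to every end child. The names `ORDERING_RULES` and `ordering_edges` are invented for this example.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[int, int, str]

# (start attribute, end attribute) pairs per node type, following Table I,
# plus the Mapping rule (keyType -> valueType) described in the text.
ORDERING_RULES: Dict[str, List[Tuple[str, str]]] = {
    "Mapping": [("keyType", "valueType")],
    "IndexAccess": [("baseExpression", "indexExpression")],
    "IndexRangeAccess": [
        ("baseExpression", "startExpression"),
        ("startExpression", "endExpression"),
    ],
    "FunctionCall": [("arguments", "expression")],
    "FunctionTypeName": [("parameterTypes", "returnParameterTypes")],
    "Assignment": [("leftHandSide", "rightHandSide")],
    "BinaryOperation": [("leftHandSide", "rightHandSide")],
    "FunctionDefinition": [("parameters", "returnParameters")],
    "YulFunctionDefinition": [("parameters", "returnVariables")],
}


def _child_ids(node: Dict[str, Any], attr: str) -> List[int]:
    """Return the ids of the child node(s) stored under `attr`, if any."""
    child = node.get(attr)
    if child is None:
        return []
    children = child if isinstance(child, list) else [child]
    return [c["id"] for c in children if isinstance(c, dict) and "id" in c]


def ordering_edges(node: Dict[str, Any]) -> List[Edge]:
    """Directed ordering edges (start child -> end child) for one AST node."""
    edges: List[Edge] = []
    for start_attr, end_attr in ORDERING_RULES.get(node.get("nodeType", ""), []):
        for start_id in _child_ids(node, start_attr):
            for end_id in _child_ids(node, end_attr):
                edges.append((start_id, end_id, "ordering"))
    return edges


mapping_node = {"nodeType": "Mapping", "id": 1, "keyType": {"id": 2}, "valueType": {"id": 3}}
call_node = {
    "nodeType": "FunctionCall",
    "id": 4,
    "expression": {"id": 5},
    "arguments": [{"id": 6}, {"id": 7}],
}
mapping_edges = ordering_edges(mapping_node)
call_edges = ordering_edges(call_node)
```

Keeping the rules in a table rather than in per-type branches makes it straightforward to add a node type whose children need a direction.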


introduced to distinguish each connection. For example, node type Mapping connects a key node A to a value node B, where A and B can be either the same or different. In its AST representation, the key node connects with the value node through the attribute keyType defined in the key node type. The value node connects with its key node through valueType defined in the value node type. When building the AST, these connections are replaced by undirected AST edges, which makes it hard to represent the mapping relationship between keyType and valueType. Therefore, we add an ordering edge from the keyType node to the valueType node to represent the mapping relationship.

Apart from Mapping, we summarize in Table I the other node types that have similar ordering problems and how ordering edges are added for them.

Reference Edges: To ensure that two smart contracts with the same functionality but different function or variable names have the same code graph, we introduce reference edges. At the variable level, the AST introduces a declaration node and assigns it a unique identifier attribute id. If the variable is used again in the smart contract, the identifier node representing the used variable refers to the declaration node. To reflect the data dependency between usages of the same variable without relying on the variable name, we introduce a new edge type called the reference edge. The identifier nodes are linked with their declaration node by the reference edge, which is always added in both directions, from the identifier node to the declaration node and vice versa. Functions are treated in the same manner as variables, where reference edges are added from the identifier node to its declaration node and vice versa.

Branching Edges: In the code graph representation, branching code blocks are only mapped to child nodes without further representation of the context between the child nodes. To handle branching, we add edges between the nodes representing the condition, the true branch, and the false branch. Figure 4 shows how branch conditions are handled for the different node types. In the case of If statements and Conditional statements, three child nodes connect to the statement. One node represents the condition and connects to the statement with control flow edges, one represents the true branch and connects to the condition with true body edges, and the other represents the false branch and connects to the condition with false body edges.
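The branching edges for If and Conditional statements can be sketched as follows. This is a hypothetical illustration rather than the authors' code: it assumes solc-style AST dictionaries with `nodeType`, `id`, `condition`, and the branch attributes `trueBody`/`falseBody` or `trueExpression`/`falseExpression`, and it assumes the control flow edge points from the statement to its condition, a direction the text does not state explicitly. The function name and edge labels are invented for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

Edge = Tuple[int, int, str]

# Attribute names that hold the true and false branches for each node type.
BRANCH_ATTRS: Dict[str, Tuple[str, str]] = {
    "IfStatement": ("trueBody", "falseBody"),
    "Conditional": ("trueExpression", "falseExpression"),
}


def branching_edges(node: Dict[str, Any]) -> List[Edge]:
    """Return control flow, true body, and false body edges for a branching node."""
    attrs: Optional[Tuple[str, str]] = BRANCH_ATTRS.get(node.get("nodeType", ""))
    if attrs is None:
        return []
    true_attr, false_attr = attrs
    condition = node["condition"]
    # Assumed direction: statement -> condition.
    edges: List[Edge] = [(node["id"], condition["id"], "control_flow")]
    true_branch = node.get(true_attr)
    if true_branch:
        edges.append((condition["id"], true_branch["id"], "true_body"))
    false_branch = node.get(false_attr)
    if false_branch:  # an `if` without `else` has no false branch
        edges.append((condition["id"], false_branch["id"], "false_body"))
    return edges


if_else = {
    "nodeType": "IfStatement",
    "id": 10,
    "condition": {"id": 11},
    "trueBody": {"id": 12},
    "falseBody": {"id": 13},
}
if_only = {
    "nodeType": "IfStatement",
    "id": 20,
    "condition": {"id": 21},
    "trueBody": {"id": 22},
    "falseBody": None,
}
if_else_edges = branching_edges(if_else)
if_only_edges = branching_edges(if_only)
```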

[Figure 4 (diagram): each panel shows a statement node and its child nodes, connected by AST, control flow (CF), true body, and false body edges. Panels: (a) If statement (IfStatement: condition, trueBody, falseBody); (b) Conditional (condition, trueExpression, falseExpression); (c) If statement of Yul (YulIf: condition, body); (d) Switch statement of Yul (YulSwitch: expression, YulCase 1, ..., YulCase n); (e) Case statement of Yul (YulCase: value, body).]

Fig. 4: Graph structure of the node types referring to branchings

[Figure 5 (diagram): each panel shows a loop node and its child nodes, connected by AST, control flow (CF), true body, and false body edges. Panels: (a) For loop (ForStatement: pre, condition, body, loopExpression); (b) Do while loop (DoWhileStatement: condition, body); (c) While loop (WhileStatement: condition, body); (d) For loop of Yul (YulForLoop: pre, condition, body, post).]

Fig. 5: Graph structure of the four loop types
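The loop structures of Fig. 5 can be written down as explicit edge lists. The sketch below is a hypothetical illustration, not the authors' implementation. Two edges are assumptions rather than statements from the text: the direction of the edge between the loop statement and the pre node, and the control flow edge from the loop update node back to the condition. Node identifiers are plain strings chosen for readability.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]

CF = "control_flow"
TRUE_BODY = "true_body"
FALSE_BODY = "false_body"


def for_loop_edges(loop: str, pre: str, condition: str, body: str, update: str) -> List[Edge]:
    """Edges for a For loop; `update` is loopExpression (Solidity) or post (Yul)."""
    return [
        (loop, pre, CF),                # entering the loop runs the initialization (direction assumed)
        (pre, condition, CF),           # the condition is evaluated after initialization
        (condition, loop, FALSE_BODY),  # condition false: leave the loop
        (condition, body, TRUE_BODY),   # condition true: execute the body
        (body, update, CF),             # after the body, the loop variable is updated
        (update, condition, CF),        # then the condition is re-evaluated (assumed)
    ]


def while_loop_edges(loop: str, condition: str, body: str) -> List[Edge]:
    """Edges shared by the While and Do while loops."""
    return [
        (loop, condition, CF),          # entry point of the loop
        (condition, loop, FALSE_BODY),  # condition false: leave the loop
        (condition, body, TRUE_BODY),   # condition true: execute the body
        (body, condition, CF),          # after the body, re-evaluate the condition
    ]


def break_edge(break_node: str, loop: str) -> Edge:
    """A break statement leaves the loop."""
    return (break_node, loop, CF)


def continue_edge(continue_node: str, update: str) -> Edge:
    """A continue statement jumps to the loop update node."""
    return (continue_node, update, CF)


for_edges = for_loop_edges("ForStatement", "pre", "condition", "body", "loopExpression")
while_edges = while_loop_edges("WhileStatement", "condition", "body")
```

The Yul For loop can reuse `for_loop_edges` by passing the post node as `update`.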

For If statements in Yul, which only have a true branch and skip the statements in the true body when the condition is false, we drop the false body edges. In the case of Switch statements in Yul, which consist of a switch statement with expressions and several case statements corresponding to the expressions, the switch statements are mapped by the AST edges between the node of type YulSwitch and the case statement nodes of type YulCase.

For Case statements in Yul, AST edges are added between the YulCase node and its two child nodes at the node properties body and value. Additionally, a true body edge is added from value to body to reflect the body execution condition.

Loop Edges: We also include four loop edges that indicate variable dependencies in the code graph. The four loop types in Solidity are for, do-while, while, and for loop of Yul. Figure 5 shows how the loop edges are added to the code graph.

For the For loop, there are four child nodes in the loop statement. When entering the loop, an initialization statement with loop parameters is executed and saved in the pre node. The initialization statement connects to the for statement via a control flow edge. Then, the loop condition is evaluated by the condition node, which we reflect by adding a control flow edge between the pre and condition nodes. If the evaluation result is false, the loop is exited, as reflected by a false body edge inserted from the condition node to the ForStatement node. If the evaluation result is true, the body node is executed, and a true body edge is added from the condition node to the body node. When the execution is finished, the loop is updated, and the loop condition is re-evaluated. To reflect this, we add two control flow edges, one from the body node to the loopExpression node and one from the loopExpression node to the condition node.

In Do while loop and While loop, the body node is executed first, and then the condition is evaluated. Therefore, we add one false body edge from the condition node to either the DoWhileStatement or the WhileStatement node, one true body edge from the condition node to the body node, and a control flow edge from the body node to the condition node. We also add a control flow edge from the DoWhileStatement or WhileStatement node to the condition node, marking the entry point of the loop.

For the For loop of Yul, we add edges similar to the For loop. However, the attribute post gives the loop update here instead of the attribute loopExpression.

Break, Continue, and Return Edges: We also consider six node types that are related to leaving or continuing a loop: Break, Continue, and Return, as well as their Yul counterparts YulBreak, YulContinue, and YulLeave. For Break and YulBreak, we add a control flow edge from the break statement node to the loop node, indicating that the break forces execution to leave the loop. For Continue and YulContinue, we add a control flow edge from the continue statement node to the loop update node to show that the loop variable has been updated. In the case of YulLeave or Return, we use control flow edges from the leave statement node to the function definition node to represent that the loop has been exited.

C. Node Feature Embedding Generation

The nodes in the code graph are derived from the AST, which provides a list of attributes that describe the main functionalities of each node and which part of the source code it comes from. Empirically, [6] has identified 73 node types, but using one-hot encodings of all 73 types can introduce unnecessary computation overhead since some types have similar meanings and functions. Additionally, certain attributes may contribute more to the vulnerability than others.

To address these issues, we merged similar node types and created a 29-dimensional feature embedding. The first 20 dimensions capture information about different function or variable node definitions, the next eight dimensions describe node properties, and the final dimension indicates Solidity member access, specifically whether it involves send or transfer.

Table II displays some of the dimension information, where the first column denotes the dimension, the second column denotes the value assigned to the node type or attribute value listed in the third column, and the last column gives an additional description. For the complete list of 29-dimensional vectors, please refer to the appendix.

Dim   Value   Node Type           Description
0     1       DoWhileStatement    Loop information
0     2       WhileStatement
0     3       ForStatement
0     4       YulForLoop
20    1       'internal'          Different values of visibility
20    2       'external'
20    3       'private'
20    4       'public'
20    5       unknown value
28    1       'transfer'          Member access of attribute memberName
28    -1      'send'

TABLE II: Description of the node feature vectors

Example We demonstrate the process of generating the code graph shown in Figure 5d from the Solidity code below. The function's purpose is to withdraw a specified amount (_amount) of money from the ExampleYul contract and transfer it to a specified target address (_to).

    pragma solidity ^0.8.19;
    contract ExampleYul {
        function withdrawYul(address _to, uint256 _amount) external payable {
            assembly {
                for { let i := 0 } lt(i, 2) { i := add(i, 1) } {
                    // loop body
                }
            }
            // other solidity text
        }
    }

In the Yul loop, variable i is initialized to 0 using let i := 0. As long as i satisfies the condition lt(i, 2), the loop body is executed, and i is then incremented by 1 using i := add(i, 1).

To generate the code graph from this smart contract, we extract the following nodes from the AST: YulForLoop, which defines the loop function; pre, which initializes the variable i; condition, which determines when to exit the loop in lt(i, 2); and post, which updates i using i := add(i, 1). The loop body is represented by the body node.

Next, we add the following edges to the code graph. The first set of blue edges represents the parent-child relationships in the AST. Then, we add black control flow edges to indicate how the code is executed. First, the pre node initializes the variable i, followed by checking if i satisfies the loop condition node condition. When the loop body represented by the body node is executed, we update i following post, and then check if the loop condition node condition is still satisfied. The red and green loop edges connect the control flow edges. If the condition node is true, a true body edge goes from the condition node to the body node to continue the loop. If the condition node is false, a false body edge goes from the condition node to the YulForLoop node to exit the loop. Since there are no branch, return, variable, and function definitions in the code, we do not add these edges to the code graph.

V. GNN TRAINING AND INFERENCE

In this section, we present how we use GNN models to classify vulnerable nodes within code graphs. We train a GNN-based node classification model on the code graphs with annotated ground truth to predict vulnerabilities. The details of how we acquire the dataset are summarized in Section VI.

A. Node Classification Training

The objective of the node classification model is to determine whether a given node is vulnerable. We train a seven-layer GCN model to predict the node label, where 1 means the node is vulnerable and 0 means the node is clean. The GNN model is trained by minimizing the cross-entropy between predicted labels and ground truth, as shown in Equation 2. $N$ is the total number of classes, $G_i$ is the binary indicator (0 or 1) if class $i$ is the correct classification for this code graph, and $P_i$ is the predicted probability that class $i$ is the correct classification for this code graph.

$$L(G, P) = -\sum_{i=1}^{N} G_i \log(P_i) \tag{2}$$

The detailed model architecture is summarized in Table III, which takes the contract code graph as input and predicts the binary labels for each node. The GNN model consists of seven GCN layers and a ReLU activation function after each layer. Then, the updated feature embeddings are passed through three linear layers, followed by a Softmax activation function to generate binary label masks.

Layers      Input Size   Output Size   Activation Function
GCNConv0    29           500           ReLU
GCNConv1    500          500           ReLU
GCNConv2    500          500           ReLU
GCNConv3    500          500           ReLU
GCNConv4    500          500           ReLU
GCNConv5    500          500           ReLU
GCNConv6    500          500           ReLU
Linear0     500          300           ReLU
Linear1     300          100           ReLU
Linear2     100          2             Softmax

TABLE III: GNN architecture

B. Node Classification Inference

In the inference stage, an unseen smart contract is first converted into an AST representation using the Solidity compiler. Then, we add additional edges and node features to form a code graph following the process outlined in Section IV. Next,

the code graph is fed into the trained GNN model for inference. The final Linear2 layer in Table III predicts the labels for each node. Once we obtain the predicted node labels in the code graph, we map them back to the original smart contract using the node's src attributes, which enable us to determine which line the node corresponds to. If a node is labeled as vulnerable, the corresponding line is also marked as vulnerable.

VI. DATA COLLECTION

This section presents how we collect smart contracts and annotate the reentrancy vulnerabilities on each line. We also provide visualizations of the dataset statistics.

A. Dataset Construction

Our dataset of Ethereum smart contracts is built from Peculiar [38], a previous version of the SmartBugs Wild Dataset [17]. The dataset construction process involved two stages. In the first stage, we clean the collected smart contracts and split them into training, validation, and test subsets. In the second stage, we annotate each line in the smart contracts to identify the presence of reentrancy vulnerabilities.

Preprocessing Although the previous Peculiar dataset underwent cleaning, the duplication rate remains over 50%. Duplicates can occur in two ways: (1) one contract may have been copied from another by adding a few blank lines or comments; and (2) one contract may have been copied from another, with changes only made to variable names, function names, or variable values. Moreover, many contracts lacked their reference files, making it impossible to construct an AST.

To address these issues, we adopted a two-step approach. First, we generated an Abstract Syntax Tree (AST) with the Solidity compiler, which removed white space characters and assigned comment lines to an ignored node type. Next, we constructed the Solidity code graph by leveraging information on the smart contract code structure and the AST (refer to Section IV for details on graph construction). This step removed intermediate values, as well as variable and function names. After obtaining the code graphs, we grouped the smart contract codes based on the similarity of their sha256 hashes. We selected one representative contract from each group, resulting in a total of 22,237 smart contracts, which is less than half of the original dataset of 46,057 contracts.

Annotation In the Peculiar dataset [38], reentrancy vulnerability was only annotated at the contract level for the call subtype. To achieve more accurate line-level annotation, we expanded the annotation from the call subtype to include all three subtypes: call, send, and transfer. Contracts that did not include these three subtype functions were deemed non-vulnerable.

In the remaining contracts, we manually inspected the lines containing the three subtype functions to determine if a state change was made after the money transfer, and if there was no reentrancy lock in place. If these conditions were met, we annotated the lines as vulnerable. We then mapped these annotations to the code graphs. Each node in the code graph was assigned an attribute src, indicating the original smart contract line from which the node was derived. If the corresponding code line was marked as vulnerable, the node was labeled as such; otherwise, it was labeled as non-vulnerable. The final reentrancy vulnerability dataset contains only smart contracts that use the three subtype functions, 13,773 smart contracts in total.

B. Dataset Statistics

The dataset is split into training, validation, and test sets, with details provided in Table IV. In the table, the acronym VG represents the number of vulnerable graphs, which corresponds to the number of vulnerable contracts in each dataset. Similarly, the acronym VN represents the total number of vulnerable nodes within the corresponding Solidity code graphs.

Subtype    Dataset          VG      non-VG   VN       non-VN
Call       training set     573     11,040   28,900   11,468,676
Call       validation set   121     959      5,430    1,447,293
Call       test set         118     962      6,603    1,401,601
Send       training set     1,067   10,546   61,508   11,436,068
Send       validation set   200     880      10,678   1,442,045
Send       test set         226     854      13,207   1,394,997
Transfer   training set     1,572   10,041   83,469   11,414,107
Transfer   validation set   320     760      16,782   1,435,941
Transfer   test set         346     734      18,213   1,389,991

TABLE IV: Dataset statistics on each subtype. VG represents the number of vulnerable graphs, VN represents the number of vulnerable nodes.

In addition to the statistical information, we also present visualizations to demonstrate the code graph data distribution. Figure 6 displays the code length distribution of smart contracts, as well as the node and edge distributions of the code graphs. The first and second figures provide insights into the size and complexity of the code graphs, as measured by the number of nodes and edges, respectively. The third figure shows the distribution of contract length in terms of the number of lines of code.

VII. IMPLEMENTATION

In this section, we present the general workflow for implementing G-Scan at both training and inference time.

A. G-Scan Training Implementation

The pipeline for our approach consists of three main steps: (1) constructing an AST from the source files; (2) converting the AST into a code graph with vulnerability labels; and (3) training a GNN-based node classification model. In the following, we provide a detailed description of how each step is implemented.

AST Construction We construct ASTs from source code files using Solc [10]. The AST construction requires the smart contract to be in one file. Therefore, if the smart contract is written in multiple files, we merge them into a single file before performing AST conversion. The following is a sample Solidity program, where x.x.xx represents the Solidity version in which the contract was written. It is worth noting that ASTs generated by Solc versions below 0.4.12 are not compatible with newer versions, and only a small number of contracts are written in these older versions. Therefore, we did not include them in our dataset collection.

(a) Node Number Distribution (b) Edges Number Distribution (c) Lines of Code Distribution

Fig. 6: G-Scan dataset visualization on code graph node number, edge number, and smart contract lines of code
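The deduplication step described under Preprocessing in Section VI-A can be illustrated with a short sketch. It assumes that a code graph, once variable and function names are removed, is reduced to a list of node types plus a list of typed edges, and that two contracts fall into the same group when the sha256 digests of this canonical form are equal. The names graph_digest and deduplicate, the graph representation, and the reading of "similarity" as digest equality are all our assumptions, not the authors' implementation.

```python
import hashlib
import json
from typing import Dict, List, Sequence, Tuple

Graph = Tuple[Sequence[str], Sequence[Tuple[int, int, str]]]


def graph_digest(node_types: Sequence[str],
                 edges: Sequence[Tuple[int, int, str]]) -> str:
    """sha256 over a canonical JSON form of the name-free code graph."""
    payload = json.dumps(
        {"nodes": list(node_types), "edges": sorted(list(e) for e in edges)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(graphs: Dict[str, Graph]) -> List[str]:
    """Keep the first contract seen for every distinct digest."""
    representative: Dict[str, str] = {}
    for name, (node_types, edges) in graphs.items():
        representative.setdefault(graph_digest(node_types, edges), name)
    return list(representative.values())


# Toy input: A_copy.sol differs from A.sol only in identifiers and comments,
# so its name-free graph is identical and it is dropped.
graphs = {
    "A.sol": (["ForStatement", "Block"], [(0, 1, "AST")]),
    "A_copy.sol": (["ForStatement", "Block"], [(0, 1, "AST")]),
    "B.sol": (["WhileStatement", "Block"], [(0, 1, "AST")]),
}
kept = deduplicate(graphs)
```

Hashing the graph rather than the raw source is what makes both kinds of duplicates listed in Section VI-A (added comments or blank lines, renamed identifiers) collapse to the same key.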

    pragma solidity x.x.xx;

We then convert the line-level annotations from smart contracts to AST node labels, as shown in the annotateLabel function below. It takes the AST's node attributes src list (node2sourceCode), the smart contract's source code annotation list (GTlabel), and the source code at byte level (codeBit) as input. The GTlabel variable stores which lines in the original smart contract are vulnerable.

    def annotateLabel(node2sourceCode, GTlabel, codeBit):
        # No annotated line: every node is clean.
        if sum(GTlabel) == 0:
            return [0] * len(node2sourceCode)

        labels = []
        for src in node2sourceCode:
            # src has the form "start:length:file", with byte offsets.
            src = src.split(":")
            startByte = int(src[0])
            endByte = startByte + int(src[1])
            sBit = codeBit[:startByte]
            eBit = codeBit[:endByte]

            # Decode the byte prefixes to count the lines up to the node's start and end.
            start = len(sBit.decode("utf-8").splitlines())
            end = len(eBit.decode("utf-8").splitlines())
            codeLine = [*range(start, end + 1, 1)]
            lineLabel = [GTlabel[i] for i in codeLine if i < len(GTlabel)]

            # A node is vulnerable if any line it spans is annotated as vulnerable.
            labels.append(1 if sum(lineLabel) > 0 else 0)

        return labels

If there are no vulnerable annotations in the current smart contract, we return a list of 0s to indicate that the contract is clean. However, if the contract has vulnerable annotations, we proceed to analyze each node in the AST representation to obtain node-level labels. Since the src attribute in the AST representation points from each node to the corresponding position in the original contract at the byte level instead of the line number, we need to decode the code bytes into utf-8 format to determine the exact line number. We calculate the decoded line length and use this information to label each node as vulnerable (labeled as 1) if one of the corresponding lines is marked as vulnerable in the GTlabel. If none of the corresponding lines are marked as vulnerable, the node is labeled as not vulnerable (labeled as 0).

Code Graph Generation The ASTs are then converted to code graphs as described in Section IV. To generate the code graph, we use the following functions:

    from utils import dicToNodes
    from graphconstruct import createReferenceEdges, createNodeFeatureVectors

    graph = dicToNodes(AST)
    graph = createReferenceEdges(graph, AST)
    graph = createNodeFeatureVectors(graph, AST)

The dicToNodes function is used to extract nodes and edges from the AST and form the initial code graph. The AST object contains a dictionary with node mappings to the source code, as well as node and edge lists, edge type lists, and a list indicating breaks, continues, or returns in substructures. The graph object, on the other hand, includes a node mapping from the code graph id to the AST node id, as well as node and edge lists. The resulting code graph is stored in a PyG data object to facilitate future GNN training.

After creating the initial code graph from the AST, we use the createReferenceEdges function to add additional edges to the code graph as described in Section IV-B. We also add additional node features to the code graph using the createNodeFeatureVectors function, which takes the code graph and AST node properties as inputs and adds node features as described in Section IV-C.

GNN Training The GNN model for node classification is trained with the code graphs in the training set and evaluated with the code graphs from the validation and test sets. We list additional training details, including the hyperparameters, in Table V. The architecture information is summarized in Table III.

Variables       Settings
Epoch           1600
Batch size      100
Optimizer       adam
Learning rate   0.0001
Loss function   cross entropy

TABLE V: GNN training hyperparameters

B. G-Scan Inference Implementation

We utilize the trained GNN for node classification to predict the location of vulnerabilities. As with the training process for G-Scan, we first convert the source code file into

an AST representation using the Solidity compiler, and then transform it into a code graph representation without additional labels indicating vulnerability. We then feed the code graph into the GNN model for node classification inference and obtain a binary mask that indicates whether each node is vulnerable or not. If none of the nodes are vulnerable, we predict that the contract is clean. Otherwise, we predict which nodes are vulnerable. After obtaining the node prediction results, we convert them back to code lines using the src node property and mark the vulnerable lines in the source code file.

The process of code graph generation and inference is the same as described in Section VII-A. Here, we focus on the mapping from the code graph to the original smart contract. As illustrated below, mapping the node-level predictions back to the smart contract can be viewed as a reverse process of AST construction. We use the predictLabel function, which takes in the node predictions (NodeLabel), the source code at the byte level (codeBit), the source code length (length), and the AST's node attributes src list (node2sourceCode) as inputs to generate the line-level predictions (LineLabel).

    def predictLabel(node2sourceCode, NodeLabel, codeBit, length):
        # length is used through len(length), i.e. as a sequence with one entry per source line.
        labels = [0] * len(length)
        if sum(NodeLabel) == 0:
            return labels

        for i, src in enumerate(node2sourceCode):
            # src has the form "start:length:file", with byte offsets.
            src = src.split(":")
            startByte = int(src[0])
            endByte = startByte + int(src[1])
            sBit = codeBit[:startByte]
            eBit = codeBit[:endByte]

            start = len(sBit.decode("utf-8").splitlines())
            end = len(eBit.decode("utf-8").splitlines())
            codeLine = [*range(start, end + 1, 1)]

            # Mark every line spanned by a vulnerable node. Lines are never reset to 0,
            # so a later clean node cannot overwrite an earlier vulnerable one.
            if NodeLabel[i] > 0:
                for line in codeLine:
                    if line < len(labels):
                        labels[line] = 1

        return labels

VIII. EXPERIMENTS

A. Experiment Setup

G-Scan is implemented with Python 3.8.10 and benchmarked on Linux Mint 20.3. Our workflow is built upon PyTorch [34] version 1.9.0 and PyG [32] version 2.0.4. The GNN model is trained on an NVIDIA A16 graphics card consisting of four GPUs, each with 16 GB RAM, and 4 CPUs with 128 GB RAM.

B. Dataset

To evaluate the performance of various vulnerability detection models, we employed the following three datasets:

Peculiar Dataset [38]: This dataset consists of 46,057 smart contracts with contract-level reentrancy vulnerability annotation for subtype call.

G-Scan Dataset: Our collected dataset of 13,773 contracts with line-level reentrancy vulnerability annotations for subtypes call, transfer, and send, as described in Section VI.

Simulated Dataset [26]: This dataset includes 31 smart contracts related to reentrancy vulnerability used by MANDO as their test set. The contracts consist of synthetic samples created by injecting vulnerabilities into clean smart contracts and public smart contracts. All samples include line-level annotations.

C. Evaluation Metrics

In the experiment, we benchmark the model performance at both line level and contract level. For line-level classification, we use G-Scan to determine if a node is vulnerable. For contract-level classification, after obtaining the classification result of each node, if there is a vulnerable node, we classify the contract as vulnerable; otherwise, it is classified as non-vulnerable.

In order to evaluate the performance of different models, we compared their classification results with the annotated ground truth from Section VI, and computed the confusion matrix at both the line level and contract level, including True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). We then used the following metrics to evaluate the performance of the models based on the confusion matrix:

Accuracy (A): The proportion of correctly identified graphs or nodes, which is formulated as $\frac{TP + TN}{TP + FP + TN + FN}$.

Recall (R): The proportion of correctly identified vulnerable graphs or nodes, which is formulated as $\frac{TP}{TP + FN}$.

Precision (P): The proportion of identified vulnerable graphs or nodes that are actually vulnerable, which is formulated as $\frac{TP}{TP + FP}$.

F1 score: A harmonic mean that combines precision and recall, which is formulated as $\frac{2 \cdot (P \cdot R)}{P + R}$.

D. Baselines

For contract-level classification, we compare G-Scan's performance with the following baselines on the Peculiar Dataset [38]. For line-level classification, we compare G-Scan with MANDO on its simulated dataset.

MANDO [26]: It first uses a topology GNN to obtain node embeddings of a heterogeneous code graph and then classifies the vulnerability at the graph level. If the code graph is vulnerable, it trains a node classification model to identify the vulnerable nodes.

DR-GCN [43]: It first converts the contract into a symbolic graph and normalizes the graph by performing node elimination. The resulting code graph is classified using a degree-free GCN.

TMP [43]: The graph processing step is the same as DR-GCN, but it uses a temporal message propagation network to classify code graphs.

AME [20]: It utilizes a rule-based approach to extract local expert patterns and converts the code into a semantic graph to extract global graph features. These two features are then fused using an attention-based network to predict vulnerabilities.

Peculiar [38]: It first extracts the dataflow of important source code variables and creates a crucial dataflow graph. The vulnerability is then classified using GraphCodeBert.
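The metrics in Section VIII-C and the contract-level decision rule can be checked with a few lines of Python. The sketch below is illustrative only: the function names are ours, and the confusion-matrix counts in the example are not reported in the paper. We inferred them so that they agree with the 121 vulnerable and 959 non-vulnerable graphs of the Call validation set in Table IV and with the graph-level percentages for that row in Table VI.

```python
from typing import Dict, Sequence


def contract_is_vulnerable(node_labels: Sequence[int]) -> bool:
    """Contract-level rule: vulnerable as soon as one node is predicted vulnerable."""
    return any(label == 1 for label in node_labels)


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 as defined in Section VIII-C."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Inferred (not reported) counts for the Call validation set at graph level:
# TP + FN = 121 vulnerable graphs, TN + FP = 959 non-vulnerable graphs.
metrics = confusion_metrics(tp=120, tn=942, fp=17, fn=1)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
# percentages == {'accuracy': 98.33, 'precision': 87.59, 'recall': 99.17, 'f1': 93.02}
```

With these counts the four values reproduce the Call validation row at graph level in Table VI, which illustrates how a high recall combined with a moderate precision still yields an F1-score above 93%.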

As mentioned earlier, SCScan [14] also performs line-level vulnerability detection. However, at the time of submission, SCScan's code was not open-sourced, and their results were not reported on an open benchmark. Therefore, we could not compare their performance with that of G-Scan in this paper.

E. Results on G-Scan Dataset

We present the graph-level and node-level classification results on the validation and test parts of the G-Scan dataset in Table VI. The table displays the accuracy, precision, recall, and F1-score for three reentrancy subtypes: Call, Send, and Transfer.

Based on the results presented in the table, we can make the following observations for node-level classification: Firstly, the classification accuracy for all datasets and reentrancy subtypes is above 99%, indicating that G-Scan can successfully localize vulnerable nodes in real-world datasets. This demonstrates the effectiveness of our proposed pipeline. Secondly, the node-level F1-scores are higher than 90% for the most common reentrancy subtype Call, and higher than 79% for the other two subtypes, showing that G-Scan can benefit reentrancy vulnerability localization in real-world smart contracts.

In terms of graph-level classification, we can find the following: Firstly, the classification accuracy for all reentrancy subtypes is higher than 89%. This indicates that even though G-Scan is designed for line-level vulnerability localization, it can still effectively classify vulnerable contracts. Secondly, the F1-scores for all three reentrancy subtypes are higher than 85%, demonstrating that G-Scan can successfully detect potential vulnerabilities.

Subtype    Dataset          Level   Accuracy   Precision   Recall   F1-Score
Call       validation set   node    99.95%     94.88%      92.52%   93.69%
Call       test set         node    99.92%     93.23%      89.26%   91.20%
Call       validation set   graph   98.33%     87.59%      99.17%   93.02%
Call       test set         graph   97.31%     80.27%      100%     89.06%
Send       validation set   node    99.86%     91.34%      89.05%   90.18%
Send       test set         node    99.77%     92.55%      82.65%   87.32%
Send       validation set   graph   94.72%     81.07%      98.50%   88.94%
Send       test set         graph   95.46%     80.07%      99.56%   88.76%
Transfer   validation set   node    99.52%     79.58%      79.19%   79.39%
Transfer   test set         node    99.53%     83.61%      78.92%   81.19%
Transfer   validation set   graph   89.91%     74.71%      99.69%   85.41%
Transfer   test set         graph   90.74%     78.21%      98.55%   87.21%

TABLE VI: G-Scan performance on different reentrancy subtypes

In Figure 7, we display the F1-scores on the train and validation datasets for the three reentrancy subtypes. From the figure, we observe that as the training progresses, G-Scan achieves an average F1-score of over 90% on both the train and validation datasets for the most common subtype, Call, at both the node level and graph level. The average F1-scores for the subtypes Transfer and Send are also higher than 80%. The stable curve indicates that G-Scan continuously learns from the train dataset and generalizes well toward the validation dataset.

F. Comparison with Baselines

1) Contract-level Performance: In Table VII, we present the accuracy and F1-score of the contract-level vulnerability classification performance of G-Scan, along with other baselines. However, since prior work reported their performance on the Peculiar Dataset, which only annotates reentrancy subtype Call, and some of them did not open-source their code, we were unable to benchmark them on our extended reentrancy vulnerability dataset in Section VIII-E. Therefore, we benchmark G-Scan on the Peculiar Dataset and report our results in the first row of the table. We note that prior works mainly constructed code graphs based on semantic information, whereas G-Scan relies on the AST to generate code graphs. If a smart contract is missing its reference file, we cannot construct the AST and perform inference on the contract, which is reasonable as one cannot determine the vulnerability without checking the entire contract code. Additionally, during G-Scan inference, we removed duplicated code graphs, which accounted for more than 50% of the total.

Based on the results, we observe that even when the duplicates are removed, G-Scan can still achieve 8% better accuracy than the baseline methods. Although Peculiar [38] achieves a slightly better F1-score than G-Scan, it cannot localize reentrancy vulnerabilities. Moreover, among the baseline methods, only MANDO performs fine-grained vulnerability localization, and G-Scan achieves a 21% better F1-score at contract-level detection. It is noteworthy that even though G-Scan is designed to localize vulnerabilities at the node level, it still achieves good performance at contract-level detection.

Method          Accuracy   F1-Score
G-Scan          98.15%     91.74%
MANDO [26]      -          75.80%
DR-GCN [43]     81.47%     76.39%
TMP [43]        84.48%     78.11%
AME [20]        90.19%     87.94%
Peculiar [38]   99.81%     92.10%

TABLE VII: Comparison with baselines on Peculiar Dataset.

2) Node-level Performance: We compared G-Scan with MANDO on a simulated dataset since it was not possible to directly compare the two methods on our collected G-Scan Dataset. This was because MANDO did not provide explicit details on how their contract graph is mapped back to the smart contract, so line-level accuracy could not be compared. Additionally, directly comparing the two methods at the node level was not possible as MANDO used metapaths to construct the code graph while G-Scan used the AST to generate the code graph.

Therefore, to compare the two approaches, we use the simulated dataset collected by MANDO. One of the smart contracts could not be compiled, while another one belongs to the subtype transfer, which was correctly identified by G-Scan. As shown in Table VIII, G-Scan achieved a 91.28% F1-score on call reentrancy, outperforming its baseline MANDO. This result suggests that by incorporating code hierarchy information into graph modeling and training on real-world datasets, G-Scan outperformed MANDO in vulnerability localization. We also note that MANDO's effectiveness is heavily reliant on identifying vulnerabilities at the contract level, and its classification F1-score for certain vulnerability types, such as reentrancy, falls to 75.80%.

(a) Subtype Call at node level (b) Subtype Send at node level (c) Subtype Transfer at node level

(d) Subtype Call at graph level (e) Subtype Send at graph level (f) Subtype Transfer at graph level

Fig. 7: The F1 score evaluated during training for train and validation dataset on three subtypes at both node and graph level
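The breakdown of inference time reported in Table IX (Section VIII-G) can be turned into per-stage shares with a short calculation. The timing values below are copied from Table IX; the variable and function names are ours, chosen for illustration.

```python
from typing import List, Tuple

# (LoC, AST ms, CGG ms, Prediction ms, Total ms), copied from Table IX.
TABLE_IX: List[Tuple[int, float, float, float, float]] = [
    (5, 3.60, 1.19, 5.05, 9.84),
    (197, 31.29, 6.04, 9.96, 47.29),
    (245, 55.89, 12.92, 14.15, 82.96),
    (5815, 901.98, 178.08, 95.26, 1175.32),
    (6151, 850.70, 191.48, 108.25, 1150.43),
]


def ast_share(row: Tuple[int, float, float, float, float]) -> float:
    """Percentage of the total inference time spent on AST construction."""
    _, ast_ms, _, _, total_ms = row
    return round(100 * ast_ms / total_ms, 1)


# The three stage timings should add up to the reported total.
totals_consistent = all(
    abs(ast + cgg + pred - total) < 1e-6 for _, ast, cgg, pred, total in TABLE_IX
)
ast_shares = [ast_share(row) for row in TABLE_IX]
# ast_shares == [36.6, 66.2, 67.4, 76.7, 73.9]
```

The shares show AST construction taking roughly two thirds to three quarters of the total time for every contract except the five-line one, where GNN prediction is the largest component.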

Method       Accuracy   F1-Score
G-Scan       97.44%     91.28%
MANDO [26]   -          86.40%

TABLE VIII: Comparison with MANDO on Simulated Dataset.

G. Inference Overhead

We present the inference overhead of smart contracts with varying lengths in Table IX. The inference overhead is measured for five smart contracts with different Lines of Code (LoC) and includes the time it takes for AST construction, code graph generation, and GNN predictions. AST construction and code graph generation are performed on the CPU, while GNN prediction inference is performed on the GPU.

LoC    AST (ms)   CGG (ms)   Prediction (ms)   Total (ms)
5      3.60       1.19       5.05              9.84
197    31.29      6.04       9.96              47.29
245    55.89      12.92      14.15             82.96
5815   901.98     178.08     95.26             1175.32
6151   850.70     191.48     108.25            1150.43

TABLE IX: G-Scan overhead on AST construction (AST), code graph generation (CGG) and prediction

From the table, we can find the following: Firstly, for all contracts except the five-line one, the majority of the G-Scan overhead comes from AST construction using the Solidity compiler. As the size of the smart contract increases, the share of the total overhead spent on AST construction generally increases. This is because the compiler requires more time to parse larger files. Secondly, despite an LoC value as large as 6.1k, the total inference time remains under 1.2s, demonstrating the efficiency of G-Scan.

IX. RELATED WORK

A. Smart Contract Vulnerability Detection

Symbolic Execution: Symbolic execution [3], [18], [30], [33], [36] takes the smart contract's bytecode as input and uses symbolic expressions to represent the contract. Then, it uses a Satisfiability Modulo Theory (SMT) solver to prove whether certain vulnerability conditions can exist in the source code or not. In smart contract vulnerability detection, symbolic execution is typically combined with other rule-based algorithms to detect potential security risks. Osiris [36] detects integer bugs with symbolic execution and additional taint analysis, which tracks data across the control flow. SmarTest [30] applies symbolic execution on the smart contract language models and creates transaction sequences to detect bugs. SAILFISH [7] uses a hybrid approach for source code, where a storage dependence graph is used to analyze side effects on the storage variables during execution, followed by symbolic evaluation to detect vulnerabilities.

Fuzzing: Fuzzing [12], [16], [28], [29], [31], [35] uses different algorithms to automatically generate inputs that potentially trigger errors or unexpected behaviors to detect smart contract vulnerabilities. ContractFuzzer [16] analyzes the ABI interfaces of smart contracts to generate inputs that conform to the invocation grammars of the smart contracts under test. It defines new test oracles for different types of vulnerabilities and instruments the EVM to monitor smart contract executions to detect security vulnerabilities. Echidna [12] incorporates a worst-case gas estimator into a general-purpose fuzzer.
12
When a property violation is detected, a counterexample is automatically minimized to report the sequence of transactions that triggers the failure.

Machine Learning: Machine learning [11], [37], [42] based algorithms take smart contracts as text data or graph data and use different machine learning algorithms to classify whether a smart contract is vulnerable. Machine learning enables the model to learn hidden feature representations of smart contract code that the previously introduced rule-based algorithms fail to capture. Most of them [38], [42], [43] classify vulnerabilities at the contract level, while others, such as MANDO [26] and SCScan [14], first perform contract-level vulnerability detection and then localize the bugs at the line level. SC-VDM [42] converts the bytecode into a greyscale image and applies a convolutional neural network to classify whether a smart contract is vulnerable. Zhuang et al. [43] propose applying a graph convolutional neural network to a contract graph of the source code to detect bugs by graph classification. In follow-up papers, the authors [19], [20] integrate additional expert patterns to increase detection accuracy. Peculiar [38] extracts the dataflow of important variables of the source code into a crucial dataflow graph. Then, the source code and the dataflow graph are fed into GraphCodeBERT [13] for reentrancy vulnerability classification.

B. Source Code Graphs

In machine learning based vulnerability detection algorithms, converting source code into images or text may lose structural information. Therefore, many works first convert the source code into a graph structure and then classify the graph using different GNN models.

Heterogeneous Code Graph: Heterogeneous code graphs [19], [20], [26], [43] construct graph connections based on semantic information and program flow paths, and are widely used in smart contract vulnerability detection. However, they fail to incorporate code dependencies and hierarchies into the modeling. Liu et al. [19], [20], [43] proposed constructing code graphs based on three different node types. Major nodes represent important function invocations or critical variables. Other variables are treated as secondary nodes, and fallback nodes symbolize a fallback function call. Edges are defined by the feasible program flow paths. Additionally, an elimination step is performed to reduce the graph size, which removes secondary and fallback nodes and redirects the edges. This final normalized graph is fed into a GNN that detects reentrancy, timestamp dependence, and infinite loop vulnerabilities. MANDO [26]’s code graphs are based on call graphs and control flows of the source code. It first uses a multi-metapaths extractor to generate metapaths from the node types and their associated edges in these graphs. The node embeddings are created with these metapaths and fused with the metapaths by a heterogeneous attention mechanism at the node level. Afterward, these node embeddings are fed into an MLP for graph classification to check whether the contract is vulnerable. If that is the case, the exact location of the vulnerability inside the contract is determined by node classification using the node embeddings updated during graph classification.

AST based Code Graph: AST [4], [5], [40] based code graph generation is widely used in code vulnerability detection for languages like C/C++. An AST can extract better structure and variable/function dependency from source code. Some recent work [39], [41] also applied AST based graph analysis to smart contract vulnerability detection. Allamanis et al. [4], [5] proposed a program graph based on the node connections in C#’s AST. It first transforms the C# source code into an AST representation using compilers. Then, the nodes in the tree are replaced by the corresponding source code tokens and connected with edges that reflect the original execution order. To represent the control and dataflow structure, they add additional edges, for example, between all occurrences of the same variable or in condition statements to indicate all valid paths. Code Property Graphs [40] construct graphs from the C/C++ AST by combining it with edges from control flow and program dependence. The control flow edges are inserted for subsequent statements, loops, returns, and similar constructs. The program dependence edges are inserted to reflect the influences of other variables or predicates on a certain variable, such as the definition or variable assignment.

X. CONCLUSION AND FUTURE WORK

In this paper, we present G-Scan, the first end-to-end fine-grained line-level reentrancy vulnerability detection system, with evaluations on the first-of-its-kind real world dataset. G-Scan first converts source code smart contracts to code graphs, and then trains node classification models on the graphs to localize vulnerabilities. Our experiments demonstrate that G-Scan achieves 93.02% F1-score in contract-level vulnerability detection and 93.69% F1-score in line-level vulnerability localization on the real world dataset. Moreover, G-Scan incurs low inference overhead and can localize vulnerabilities in less than 1.2s for 6k LoC smart contracts. We will open-source G-Scan along with its dataset to promote research in this area. In the future, we plan to extend our real world dataset to include multiple vulnerability annotations and explore different code graph representations that may improve GNN classification performance.

REFERENCES

[1] “Access control vulnerabilities in solidity smart contracts.” [Online]. Available: https://medium.com/ginger-security/access-control-vulnerabilities-in-solidity-smart-contracts-5e0871a00d77
[2] “Integer overflow and underflow attacks on smart contracts.” [Online]. Available: https://blockgeeks.com/guides/smart-contracts/
[3] A. Ali, Z. U. Abideen, K. Ullah, and F. Ullah, “SESCon: Secure Ethereum Smart Contracts by Vulnerable Patterns’ Detection,” Secur. Commun. Networks, vol. 2021, 2021. [Online]. Available: https://doi.org/10.1155/2021/2897565
[4] M. Allamanis, “Graph Neural Networks in Program Analysis,” in Graph Neural Networks: Foundations, Frontiers, and Applications, L. Wu, P. Cui, J. Pei, and L. Zhao, Eds. Singapore: Springer, 2021. [Online]. Available: https://graph-neural-networks.github.io/gnnbook Chapter22.html
[5] M. Allamanis, M. Brockschmidt, and M. Khademi, “Learning to Represent Programs with Graphs,” CoRR, vol. abs/1711.00740, 2017. [Online]. Available: http://arxiv.org/abs/1711.00740
[6] blitz 1306, “Repository solc-typed-ast,” https://github.com/ConsenSys/solc-typed-ast, Last Access on December 26, 2022.
[7] P. Bose, D. Das, Y. Chen, Y. Feng, C. Kruegel, and G. Vigna, “SAILFISH: Vetting Smart Contract State-Inconsistency Bugs in Seconds,” CoRR, vol. abs/2104.08638, 2021. [Online]. Available: https://arxiv.org/abs/2104.08638
[8] V. Buterin, “Ethereum White Paper: A Next Generation Smart Contract & Decentralized Application Platform,” 2013. [Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper
[9] N. Fatima Samreen and M. H. Alalfi, “Reentrancy vulnerability identification in ethereum smart contracts,” in 2020 IEEE International Workshop on Blockchain Oriented Software Engineering (IWBOSE), 2020. [Online]. Available: https://doi.org/10.1109/IWBOSE50093.2020.9050260
[10] E. Foundation, “Repository Solidity Compiler Solc,” Last Access on December 26, 2022. [Online]. Available: https://github.com/ethereum/solidity/releases
[11] J.-R. Giesen, S. Andreina, M. Rodler, G. O. Karame, and L. Davi, “Practical mitigation of smart contract bugs,” ArXiv, 2022. [Online]. Available: http://arxiv.org/abs/2203.00364
[12] G. Grieco, W. Song, A. Cygan, J. Feist, and A. Groce, “Echidna: Effective, Usable, and Fast Fuzzing for Smart Contracts,” in Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2020. New York, NY, USA: ACM, 2020. [Online]. Available: https://doi.org/10.1145/3395363.3404366
[13] D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu, M. Tufano, S. K. Deng, C. B. Clement, D. Drain, N. Sundaresan, J. Yin, D. Jiang, and M. Zhou, “GraphCodeBERT: Pre-training Code Representations with Data Flow,” CoRR, 2020. [Online]. Available: https://arxiv.org/abs/2009.08366
[14] X. Hao, W. Ren, W. Zheng, and T. Zhu, “SCScan: A SVM-Based Scanning System for Vulnerabilities in Blockchain Smart Contracts,” in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020. [Online]. Available: https://doi.org/10.1109/TrustCom50675.2020.00221
[15] T. H.-D. Huang, “Hunting the ethereum smart contract: Color-inspired inspection of potential attacks,” arXiv preprint arXiv:1807.01868, 2018.
[16] B. Jiang, Y. Liu, and W. K. Chan, “ContractFuzzer: fuzzing smart contracts for vulnerability detection,” Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018. [Online]. Available: http://dx.doi.org/10.1145/3238147.3238177
[17] João F. Ferreira, “SmartBugs Wild Dataset,” https://github.com/smartbugs/smartbugs-wild, Last Access on December 26, 2022.
[18] J. Krupp and C. Rossow, “teEther: Gnawing at Ethereum to Automatically Exploit Smart Contracts,” in 27th USENIX Security Symposium. Baltimore, MD: USENIX Association, 2018. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/krupp
[19] Z. Liu, P. Qian, X. Wang, Y. Zhuang, L. Qiu, and X. Wang, “Combining Graph Neural Networks with Expert Knowledge for Smart Contract Vulnerability Detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 1, no. 01, 2021. [Online]. Available: https://doi.org/10.1109/TKDE.2021.3095196
[20] Z. Liu, P. Qian, X. Wang, L. Zhu, Q. He, and S. Ji, “Smart Contract Vulnerability Detection: From Pure Neural Network to Interpretable Graph Feature and Expert Pattern Fusion,” in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, Z.-H. Zhou, Ed. International Joint Conferences on Artificial Intelligence Organization, 2021. [Online]. Available: https://doi.org/10.24963/ijcai.2021/379
[21] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making Smart Contracts Smarter,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’16. New York, NY, USA: ACM, 2016. [Online]. Available: https://doi.org/10.1145/2976749.2978309
[22] M. I. Mehar, C. L. Shier, A. Giambattista, E. Gong, G. Fletcher, R. Sanayhie, H. M. Kim, and M. Laskowski, “Understanding a revolutionary and flawed grand experiment in blockchain: the dao attack,” Journal of Cases on Information Technology (JCIT), vol. 21, no. 1, pp. 19–32, 2019.
[23] M. Mossberg, F. Manzano, E. Hennenfent, A. Groce, G. Grieco, J. Feist, T. Brunson, and A. Dinaburg, “Manticore: A User-Friendly Symbolic Execution Framework for Binaries and Smart Contracts,” in 34th IEEE/ACM ASE, 2019. [Online]. Available: https://doi.org/10.1109/ASE.2019.00133
[24] B. Mueller, “Smashing Ethereum Smart Contracts for Fun and Real Profit,” in 9th HITBSecConf, Amsterdam, Netherlands, 2018. [Online]. Available: https://github.com/b-mueller/smashing-smart-contracts/blob/master/smashing-smart-contracts-1of1.pdf
[25] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Accessed 2022. [Online]. Available: http://bitcoin.org/bitcoin.pdf
[26] H. H. Nguyen, N.-M. Nguyen, C. Xie, Z. Ahmadi, D. Kudendo, T.-N. Doan, and L. Jiang, “Mando: Multi-level heterogeneous graph embeddings for fine-grained detection of smart contract vulnerabilities,” ArXiv, 2022. [Online]. Available: https://arxiv.org/abs/2208.13252
[27] C. Sendner, H. Chen, H. Fereidooni, L. Petzi, J. König, J. Stang, A. Dmitrienko, A.-R. Sadeghi, and F. Koushanfar, “Smarter contracts: Detecting vulnerabilities in smart contracts with deep transfer learning,” in Proceedings of the 2023 Network and Distributed System Security (NDSS) Symposium, ser. NDSS ’23, 01 2023.
[28] D. She, R. Krishna, L. Yan, S. Jana, and B. Ray, “MTFuzz: Fuzzing with a Multi-Task Neural Network,” in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2020. New York, NY, USA: ACM, 2020. [Online]. Available: https://doi.org/10.1145/3368089.3409723
[29] D. She, K. Pei, D. Epstein, J. Yang, B. Ray, and S. S. Jana, “NEUZZ: Efficient Fuzzing with Neural Program Smoothing,” IEEE Symposium on Security and Privacy (SP), 2019. [Online]. Available: https://doi.org/10.1109/SP.2019.00052
[30] S. So, S. Hong, and H. Oh, “SmarTest: Effectively Hunting Vulnerable Transaction Sequences in Smart Contracts through Language Model-Guided Symbolic Execution,” in 30th USENIX Security Symposium. USENIX Association, 2021. [Online]. Available: https://www.usenix.org/conference/usenixsecurity21/presentation/so
[31] D. Soto, A. Bergel, and A. G. Hevia, “Fuzzing to Estimate Gas Costs of Ethereum Contracts,” IEEE ICSME, 2020. [Online]. Available: https://doi.org/10.1109/ICSME46990.2020.00073
[32] The PyG Team, “PyTorch Geometric,” https://www.pyg.org/, Last Access on December 26, 2022.
[33] S. Tikhomirov, E. Voskresenskaya, I. Ivanitskiy, R. Takhaviev, E. Marchenko, and Y. Alexandrov, “SmartCheck: Static Analysis of Ethereum Smart Contracts,” in 1st IEEE/ACM WETSEB, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8445052
[34] Torch Contributors, “PyTorch,” https://pytorch.org/, Last Access on December 26, 2022.
[35] C. F. Torres, A. K. Iannillo, A. Gervais, and R. State, “ConFuzzius: Towards Smart Hybrid Fuzzing for Smart Contracts,” CoRR, vol. abs/2005.12156, 2020. [Online]. Available: https://arxiv.org/abs/2005.12156
[36] C. F. Torres, J. Schütte, and R. State, “Osiris: Hunting for Integer Bugs in Ethereum Smart Contracts,” in Proceedings of the 34th Annual Computer Security Applications Conference, ser. ACSAC ’18. New York, NY, USA: ACM, 2018. [Online]. Available: https://doi.org/10.1145/3274694.3274737
[37] W. Wang, J. Song, G. Xu, Y. Li, H. Wang, and C. Su, “ContractWard: Automated Vulnerability Detection Models for Ethereum Smart Contracts,” IEEE Transactions on Network Science and Engineering, vol. 8, no. 2, 2021. [Online]. Available: https://doi.org/10.1109/TNSE.2020.2968505
[38] H. Wu, Z. Zhang, S. Wang, Y. Lei, B. Lin, Y. Qin, H. Zhang, and X. Mao, “Peculiar: Smart Contract Vulnerability Detection Based on Crucial Data Flow Graph and Pre-training Techniques,” in 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2021. [Online]. Available: https://shangwenwang.github.io/files/ISSRE-21.pdf
[39] Y. Xu, G. Hu, L. You, and C. Cao, “A novel machine learning-based analysis model for smart contract vulnerability,” Security and Communication Networks, vol. 2021, pp. 1–12, 2021.
[40] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and Discovering Vulnerabilities with Code Property Graphs,” in 2014 IEEE Symposium on Security and Privacy, 2014. [Online]. Available: https://doi.org/10.1109/SP.2014.44
[41] Z. Yang, J. Keung, X. Yu, X. Gu, Z. Wei, X. Ma, and M. Zhang, “A multi-modal transformer-based code summarization approach for smart contracts,” in 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC). IEEE, 2021, pp. 1–12.
[42] K. Zhou, J. Cheng, H. Li, Y. Yuan, L. Liu, and X. Li, “SC-VDM: A Lightweight Smart Contract Vulnerability Detection Model,” in Data Mining and Big Data, Y. Tan, Y. Shi, A. Zomaya, H. Yan, and J. Cai, Eds. Singapore: Springer Singapore, 2021.
[43] Y. Zhuang, Z. Liu, P. Qian, Q. Liu, X. Wang, and Q. He, “Smart Contract Vulnerability Detection using Graph Neural Network,” in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, C. Bessiere, Ed. International Joint Conferences on Artificial Intelligence Organization, 2020. [Online]. Available: https://doi.org/10.24963/ijcai.2020/454
