LLVM Compiler and Its Intermediate Representation
Yulei Sui
University of Technology Sydney, Australia
Software Verification https://github.com/SVF-tools/Teaching-Software-Verification
Where We Are Now and Today’s Class
[Figure: course roadmap]
• Assignment-1: self-built graph and graph traversal algorithm.
• Assignment-2, Assignment-3 and Assignment-4 sit along the pipeline: C program → LLVM compiler → compiler IR → SVF → SVFIR → control-flow graph → control-flow reachability analysis → manual and automated assertion-based verification.
• The diagram marks the part covered by today's class.
What is LLVM?
The LLVM compiler infrastructure is a collection of compiler and tool-chain
technologies.
• Started in 2000 at UIUC. Now an open-source project supported by, and
receiving contributions from, a range of high-tech companies such as Apple,
Google, Intel and ARM.
• A modern compiler infrastructure that can be used to develop a front-end for any
programming language and a back-end for any instruction set
architecture.
• A set of reusable software modules to quickly design your own compiler or
software tool chains.
• Language-independent intermediate representation (IR) used for a variety
of purposes, such as compiler optimizations, static analysis and bug
detection.
• More information on LLVM’s website: https://llvm.org/
Why Learn LLVM or Learn Compilers in General?
• An essential part of the standard curriculum in computer science.
• One of the most complex systems required for building virtually all other
software.
• A perfect base framework to build your own tools for code analysis and
verification.
• Sharpens your software design and implementation skills.
• Widely used by many major software companies. In-demand skills and
competitive salaries in the job market.
LLVM’s Architecture
[Figure: LLVM compilation pipeline]
Source Code (.c/.cpp, Swift, Rust, ...) → Front End (e.g., Clang) → IR → Optimizer Passes (pass1, pass2, ...) → IR → Back End (Code Generation) → Machine Code (X86, ARM, PowerPC, ...)
Example optimizer passes:
• -loop-vectorize: Loop Vectorization
• -loop-unroll: Loop Unrolling
• -dse: Dead Store Elimination
• -mem2reg: Promote Memory to Register
*IR: Human-readable LLVM IR (.ll files) or dense ’bitcode’ binary representation (.bc files)
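To give a feel for what an optimizer pass such as -loop-unroll does, the sketch below shows the transformation written by hand at the C level. This is only an illustration under assumptions: the function names sum_rolled and sum_unrolled4 and the unroll factor of 4 are hypothetical, and LLVM performs the transformation on IR using its own heuristics rather than on C source.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
int sum_rolled(const int *v, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return sum;
}

/* Hand-unrolled by a factor of 4 (hypothetical factor chosen for illustration).
 * The main loop handles four elements per iteration, so the loop test runs
 * roughly a quarter as often; the second loop handles the leftover elements. */
int sum_unrolled4(const int *v, size_t n) {
    int sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; ++i) /* remainder loop */
        sum += v[i];
    return sum;
}
```

Both functions compute the same result for any input; only the shape of the loop changes, which is the property an optimizer pass must preserve.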
LLVM Intermediate Representation (IR)
LLVM IR is LLVM’s code representation, generated by its front-end
clang when compiling a program (https://llvm.org/docs/LangRef.html).
• Language independent. Not machine code, but one step just above
assembly.
• Clear hierarchical scopes: modules, functions, basic blocks, and
instructions.
• 3-address code style in static single assignment (SSA) form.
• Variables are strongly typed.
  - Global variable (symbol starting with ‘@’)
  - Stack/register variable (symbol starting with ‘%’)
• Three addresses and one operator.
  - For example, ‘a = b op c’, where ‘a’, ‘b’, ‘c’ are either programmer-defined variables
    (e.g., heap, global or stack), constants or compiler-generated temporary names.
    ‘op’ stands for an operation which is applied to ‘b’ and ‘c’, with the result assigned to ‘a’.
• In SSA form, each variable is assigned exactly once.
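The 3-address, single-assignment style described above can be mimicked in C. The following is a minimal sketch, assuming a made-up expression (b + c) * (b - c); the function names and the temporaries t0, t1, t2 are hypothetical and only play the role of compiler-generated names.

```c
/* The expression as a programmer would normally write it. */
int eval_direct(int b, int c) {
    return (b + c) * (b - c);
}

/* The same computation in 3-address style: every statement has the shape
 * "a = b op c" with exactly one operator, and every temporary is assigned
 * exactly once (const enforces the single assignment at the C level). */
int eval_three_address(int b, int c) {
    const int t0 = b + c;   /* t0 = b op c with op = '+' */
    const int t1 = b - c;   /* t1 = b op c with op = '-' */
    const int t2 = t0 * t1; /* t2 = t0 op t1 with op = '*' */
    return t2;
}
```

Because each temporary has a single definition, an analysis can find where a value comes from by looking at one statement, which is a large part of why SSA is convenient for static analysis.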
Compiling a C Program to Its LLVM IR
Clang/LLVM compiler options
• Compile a C program ‘swap.c’ to a human-readable IR ‘swap.ll’.
  - clang -c -S -emit-llvm swap.c -o swap.ll
• Compile without optimisation, but without the ‘optnone’ function attribute, so that later opt passes can still transform the code.
  - clang -c -S -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.ll
• Keep the variable names.
  - clang -c -S -fno-discard-value-names -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.ll
• Convert the LLVM IR to a more compact SSA form for later static analysis.
  - opt -S -mem2reg swap.ll -o swap.ll
  - Recent LLVM releases that use the new pass manager spell this as: opt -S -passes=mem2reg swap.ll -o swap.ll
C code to LLVM IR
An example

swap.c:
    void swap(char **p, char **q){
      char* t = *p;
      *p = *q;
      *q = t;
    }
    int main(){
      char a1;
      char *a;
      char b1;
      char *b;
      a = &a1;
      b = &b1;
      swap(&a,&b);
    }

compiles to swap.ll:
    define void @swap(i8** %p, i8** %q) #0 {
    entry:
      %0 = load i8*, i8** %p, align 8
      %1 = load i8*, i8** %q, align 8
      store i8* %1, i8** %p, align 8
      store i8* %0, i8** %q, align 8
      ret void
    }
    define i32 @main() #0 {
    entry:
      %a1 = alloca i8, align 1
      %a = alloca i8*, align 8
      %b1 = alloca i8, align 1
      %b = alloca i8*, align 8
      store i8* %a1, i8** %a, align 8
      store i8* %b1, i8** %b, align 8
      call void @swap(i8** %a, i8** %b)
      ret i32 0
    }

Annotations: each ‘define’ (e.g., @swap) is a Function; the code under the label ‘entry:’ is a BasicBlock; each line inside it (e.g., a store) is an Instruction.
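To make the correspondence between swap.c and swap.ll easier to follow, the sketch below rewrites swap with one C statement per IR instruction. The temporaries v0 and v1 are hypothetical names standing in for the numbered IR values %0 and %1; the behaviour is intended to be the same as the original swap.

```c
/* swap written so that each C statement mirrors one instruction of @swap.
 * v0 and v1 are illustrative stand-ins for the IR temporaries %0 and %1. */
void swap(char **p, char **q) {
    char *v0 = *p; /* %0 = load i8*, i8** %p, align 8   */
    char *v1 = *q; /* %1 = load i8*, i8** %q, align 8   */
    *p = v1;       /* store i8* %1, i8** %p, align 8    */
    *q = v0;       /* store i8* %0, i8** %q, align 8    */
}                  /* ret void                           */
```

Called as in main of swap.c (a = &a1; b = &b1; swap(&a, &b);), a ends up pointing to b1 and b to a1. Note that the local variable t of the original source has no memory slot in the IR: its value lives in the register %0.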
LLVM Intermediate Representation (IR)
Structure Organization
[Figure: LLVM-IR scopes. A Module contains Functions and Global Variables; a Function contains BasicBlocks; a BasicBlock contains Instructions.]
• Module contains Functions and Global Variables.
  - The whole module is the unit of translation, analysis and optimization.
• Function contains BasicBlocks and Arguments; it corresponds to a function in the source program.
• BasicBlock contains a list of instructions.
  - Each block is a contiguous chunk of instructions.
• Instruction is an opcode + vector of operands in '3-address' style.
  - All operands have types.
  - The instruction result is typed.
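The containment hierarchy above can be sketched with plain C structs. This is a deliberately simplified, hypothetical model: the struct and field names below are assumptions made for illustration and are not LLVM's actual C++ classes or API.

```c
#include <stddef.h>

/* Hypothetical, simplified mirror of the LLVM IR scopes (not LLVM's real API). */
typedef struct {
    const char *opcode; /* e.g. "alloca", "load", "store", "call", "ret" */
    const char *result; /* name of the produced value, or NULL if none */
} Instruction;

typedef struct {
    const char *label;        /* e.g. "entry" */
    const Instruction *insts; /* contiguous chunk of instructions */
    size_t num_insts;
} BasicBlock;

typedef struct {
    const char *name;         /* e.g. "swap" */
    const BasicBlock *blocks;
    size_t num_blocks;
} Function;

typedef struct {
    const Function *functions;
    size_t num_functions;
    const char *const *globals; /* names of global variables */
    size_t num_globals;
} Module;

/* Walk Module -> Function -> BasicBlock and count all instructions. */
size_t count_instructions(const Module *m) {
    size_t total = 0;
    for (size_t f = 0; f < m->num_functions; ++f) {
        const Function *fn = &m->functions[f];
        for (size_t b = 0; b < fn->num_blocks; ++b)
            total += fn->blocks[b].num_insts;
    }
    return total;
}
```

Modelling swap.ll this way gives one module with two functions, each with a single entry block: 5 instructions in @swap and 8 in @main, so count_instructions would return 13. Analyses over real LLVM IR typically follow the same nested-loop shape.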
LLVM Intermediate Representation (IR)
LLVM Instructions
%a1 = alloca i8, align 1
• %a1: register variable
• alloca: instruction; allocates sizeof(i8) bytes of memory on the runtime stack
• i8: 8-bit integer
• align 1: memory alignment; indicates the memory operation should be aligned to 1 byte

Identifiers: [%@][-a-zA-Z$._][-a-zA-Z$._0-9]*
- % is for local variables
- @ is for globals
- temporary variables are numbered

C source:
    int main()
    {
      char a1;
      char *a;
      char b1;
      char *b;
      a = &a1;
      b = &b1;
      swap(&a,&b);
    }

LLVM IR:
    define i32 @main() #0 {
    entry:
      %a1 = alloca i8, align 1
      %a = alloca i8*, align 8
      %b1 = alloca i8, align 1
      %b = alloca i8*, align 8
      store i8* %a1, i8** %a, align 8
      store i8* %b1, i8** %b, align 8
      call void @swap(i8** %a, i8** %b)
      ret i32 0
    }
The remaining alloca instructions in main:
• %a = alloca i8*, align 8: allocates an 8-bit integer pointer for a
• %b1 = alloca i8, align 1: allocates an 8-bit integer for b1
• %b = alloca i8*, align 8: allocates an 8-bit integer pointer for b
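The sizes and alignments that appear in the alloca instructions come from the C types of the variables. The sketch below queries them from the C side; the Layout struct and the two function names are hypothetical helpers introduced only for this illustration. The values for char are fixed by the C standard, while those for char * depend on the target (8 and 8 on a typical 64-bit platform, matching ‘align 8’ above).

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical helper type: size and alignment of a C type, in bytes. */
typedef struct {
    size_t size;
    size_t align;
} Layout;

/* Layout of 'char', the C type behind 'alloca i8, align 1'. */
Layout layout_of_char(void) {
    return (Layout){ sizeof(char), alignof(char) };
}

/* Layout of 'char *', the C type behind 'alloca i8*, align 8'.
 * The result is target-dependent. */
Layout layout_of_char_ptr(void) {
    return (Layout){ sizeof(char *), alignof(char *) };
}
```

Comparing the two results on your own machine against the ‘align’ values in swap.ll is a quick way to check which target the IR was generated for.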
store i8* %a1, i8** %a, align 8
• store: instruction
• i8* %a1: 8-bit integer typed pointer %a1
• Stores the pointer %a1 to the memory location that %a points to.

store i8* %b1, i8** %b, align 8
• store: instruction
• i8* %b1: 8-bit integer typed pointer %b1
• Stores the pointer %b1 to the memory location that %b points to.
call void @swap(i8** %a, i8** %b)
• call: function call
• @swap: function name
• (i8** %a, i8** %b): typed params
• The call instruction will be used to build control flow.
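As a rough idea of how call instructions feed into control flow, the sketch below records each call as a caller-to-callee edge and answers reachability queries over those edges. Everything here is an assumption made for illustration: the CallSite struct, the MAX_CALLS bound and the function names are hypothetical, and this is not how LLVM or SVF represent call graphs internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS 64 /* hypothetical upper bound for this sketch */

/* One edge per call instruction: the enclosing function and the callee. */
typedef struct {
    const char *caller;
    const char *callee;
} CallSite;

/* Depth-first search over call edges; each edge is followed at most once,
 * so the search terminates even when the calls are recursive. */
static bool reach(const CallSite *calls, size_t n,
                  const char *from, const char *to, bool *used) {
    if (strcmp(from, to) == 0)
        return true;
    for (size_t i = 0; i < n; ++i) {
        if (!used[i] && strcmp(calls[i].caller, from) == 0) {
            used[i] = true;
            if (reach(calls, n, calls[i].callee, to, used))
                return true;
        }
    }
    return false;
}

/* Returns true when function 'to' is reachable from 'from' via calls. */
bool can_reach(const CallSite *calls, size_t n,
               const char *from, const char *to) {
    bool used[MAX_CALLS] = { false };
    assert(n <= MAX_CALLS);
    return reach(calls, n, from, to, used);
}
```

For swap.ll the only edge is main → swap, so swap is reachable from main but not the other way round.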
LLVM Documentation
• LLVM Language Reference Manual: https://llvm.org/docs/LangRef.html
• LLVM Programmer’s Manual: https://llvm.org/docs/ProgrammersManual.html
• Writing an LLVM Pass: http://llvm.org/docs/WritingAnLLVMPass.html
• Tutorials for Clang/LLVM: https://freecompilercamp.org/clang-llvm-landing
• LLVM Tutorial, IEEE SecDev 2020: https://cs.rochester.edu/u/ejohns48/secdev19/secdev20-llvm-tutorial-version4_copy.pdf