LLVM Compiler and Its Intermediate Representation

Yulei Sui
University of Technology Sydney, Australia

Software Verification https://github.com/SVF-tools/Teaching-Software-Verification
Where We Are Now and Today’s Class
[Figure: course roadmap. Assignment-1: self-built graph, graph traversal algorithm. Assignment-2: C program → LLVM compiler → compiler IR → SVF → SVFIR / control-flow graph → control-flow reachability analysis. Assignment-3: manual assertion-based verification. Assignment-4: automated assertion-based verification. A "Today's class" marker highlights part of this pipeline.]
What is LLVM?
The LLVM compiler infrastructure is a collection of compiler and tool-chain
technologies.
• Started in 2000 at UIUC. An open-source project supported and contributed
to by a range of high-tech companies such as Apple, Google, Intel and ARM.
• A modern compiler infrastructure that can be used to develop a front-end for
any programming language and a back-end for any instruction set
architecture.
• A set of reusable software modules for quickly designing your own compiler
or software tool chains.
• A language-independent intermediate representation (IR) used for a variety
of purposes, such as compiler optimizations, static analysis and bug
detection.
• More information on LLVM’s website: https://llvm.org/
Why Learn LLVM or Learn Compilers in General?

• Compilers are an essential part of the standard curriculum in computer science.
• A compiler is one of the most complex software systems, and is required for
building virtually all other software.
• LLVM is a solid base framework for building your own tools for code analysis
and verification.
• Working with a compiler sharpens your software design and implementation skills.
• LLVM is widely used by many major software companies. These are in-demand
skills with competitive salaries in the job market.

LLVM’s Architecture

[Figure: source code (.c/.cpp, Swift, Rust, ...) → front end (e.g., Clang) → IR → optimizer passes (pass1, pass2, ...) → IR → back end (code generation) → machine code (X86, ARM, PowerPC, ...).]

Example optimizer passes:
• -loop-vectorize: Loop Vectorization
• -loop-unroll: Loop Unrolling
• -dse: Dead Store Elimination
• -mem2reg: Promote Memory to Register

*IR: Human-readable LLVM IR (.ll files) or dense ’bitcode’ binary representation (.bc files)
LLVM Intermediate Representation (IR)
LLVM IR is LLVM’s code representation, which is generated by its front-end
clang when compiling a program (https://llvm.org/docs/LangRef.html)
• Language independent. Not machine code, but one step just above
assembly
• Clear lexical scope, such as modules, functions, basic blocks, and
instructions
• 3-address code style in static single assignment (SSA) form
  • Variables are strongly typed
    • Global variable (symbol starting with ‘@’)
    • Stack/register variable (symbol starting with ‘%’)
  • Three addresses and one operator.
    • For example, ‘a = b op c’, where ‘a’, ‘b’, ‘c’ are either programmer-defined variables
    (e.g., heap, global or stack), constants or compiler-generated temporary names.
    ‘op’ stands for an operation which is applied to ‘b’ and ‘c’, with the result assigned to ‘a’.

Compiling a C Program to Its LLVM IR
Clang/LLVM compiler options

• Compile a C program ‘swap.c’ to a human-readable IR ‘swap.ll’.
  • clang -c -S -emit-llvm swap.c -o swap.ll
• Compile without optimisation, but without marking functions ‘optnone’, so that later ‘opt’ passes can still transform them.
  • clang -c -S -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.ll
• Keep the variable names.
  • clang -c -S -fno-discard-value-names -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.ll
• Convert the LLVM IR to a more compact SSA form for later static analysis.
  • opt -S -mem2reg swap.ll -o swap.ll
  (recent LLVM releases use the new pass manager syntax instead: opt -S -passes=mem2reg swap.ll -o swap.ll)
C code to LLVM IR
An example

swap.c:

    void swap(char **p, char **q){
      char* t = *p;
      *p = *q;
      *q = t;
    }
    int main(){
      char a1;
      char *a;
      char b1;
      char *b;
      a = &a1;
      b = &b1;
      swap(&a,&b);
    }

swap.ll (after compiling):

    define void @swap(i8** %p, i8** %q) #0 {
    entry:
      %0 = load i8*, i8** %p, align 8
      %1 = load i8*, i8** %q, align 8
      store i8* %1, i8** %p, align 8
      store i8* %0, i8** %q, align 8
      ret void
    }

    define i32 @main() #0 {
    entry:
      %a1 = alloca i8, align 1
      %a = alloca i8*, align 8
      %b1 = alloca i8, align 1
      %b = alloca i8*, align 8
      store i8* %a1, i8** %a, align 8
      store i8* %b1, i8** %b, align 8
      call void @swap(i8** %a, i8** %b)
      ret i32 0
    }

Annotations: each ‘define’ is a Function; each labelled block such as ‘entry:’ is a BasicBlock; each line inside a block, such as a ‘store’, is an Instruction.
LLVM Intermediate Representation (IR)
Structure Organization

[Figure: LLVM-IR scopes. A Module contains Global Variables and Functions; a Function contains BasicBlocks; a BasicBlock contains Instructions.]

• Module contains Functions and Global Variables.
  - The whole module is the unit of translation, analysis and optimization.
• Function contains BasicBlocks and Arguments, and corresponds to a function in the source program.
• BasicBlock contains a list of instructions.
  - Each block is a contiguous chunk of instructions.
• Instruction is an opcode + vector of operands in '3-address' style.
  - All operands have types.
  - The instruction result is typed.

LLVM Intermediate Representation (IR)
LLVM Instructions

%a1 = alloca i8, align 1

• %a1: register variable
• alloca: instruction
• i8: 8-bit integer
• align 1: memory alignment

Identifiers: [%@][-a-zA-Z$._][-a-zA-Z$._0-9]*
- % is for local variables
- @ is for globals
- temporary variables are numbered

alloca: the instruction allocates sizeof(i8) bytes of memory on the run-time stack.

align: indicates that the memory operation should be aligned to 1 byte.
%a = alloca i8*, align 8

• Allocates a pointer to an 8-bit integer (i8*) on the stack for a.

%b1 = alloca i8, align 1

• Allocates an 8-bit integer (i8) on the stack for b1.

%b = alloca i8*, align 8

• Allocates a pointer to an 8-bit integer (i8*) on the stack for b.

store i8* %a1, i8** %a, align 8

• store: instruction
• i8* %a1: 8-bit integer typed pointer %a1
• Stores the pointer %a1 to the memory location that %a points to.

store i8* %b1, i8** %b, align 8

• store: instruction
• i8* %b1: 8-bit integer typed pointer %b1
• Stores the pointer %b1 to the memory location that %b points to.

call void @swap(i8** %a, i8** %b)

• call: function call
• @swap: function name
• (i8** %a, i8** %b): typed params
• The call instruction will be used to build control flow.

LLVM Documentation

• LLVM Language Reference Manual: https://llvm.org/docs/LangRef.html
• LLVM Programmer’s Manual: https://llvm.org/docs/ProgrammersManual.html
• Writing an LLVM Pass: http://llvm.org/docs/WritingAnLLVMPass.html
• Tutorials for Clang/LLVM: https://freecompilercamp.org/clang-llvm-landing
• LLVM Tutorial, IEEE SecDev 2020: https://cs.rochester.edu/u/ejohns48/secdev19/secdev20-llvm-tutorial-version4_copy.pdf
