Commit 313d80c

author
Kaipeng Zeng
committed
Create cover_filter.md & kstate_resource.md
1 parent 25e3faf commit 313d80c

2 files changed

+278
-0
lines changed

syzkaller/cover_filter.md

+186
@@ -0,0 +1,186 @@
# Syzkaller cover filter and weighted PCs

## Content

1. Usage.
2. Implementation details.
3. Practice.

To implement a coverage filter in syzkaller, follow these steps:

1. Get the LLVM IR code and assembly code of the target.
2. Build the address map of the target functions by analyzing the IR code, the assembly code and the kernel ELF.
3. Add support for the cover filter and weighted PCs to syzkaller.

After steps 1 and 2, you will have an address map that contains the addresses of the kernel functions you need. You can also attach a weight to every PC based on LLVM IR analysis, e.g. weights derived from CFG information.

## Usage

### Get LLVM IR code and assembly code

Many static analysis tools can parse IR code. However, the IR knows nothing about addresses in the final executable, while the assembly code holds both address offsets and basic block information. By analyzing both, we can associate IR information with addresses.
To get the IR code and assembly code, first pick out the source files that contain your target functions. For example, if your target function is in net/ipv4/tcp.c, run this command in your kernel build tree:

```
make CC=clang net/ipv4/tcp.o -n | grep tcp.c
```

to get the command used to compile tcp.c. The command may look like:

```
clang ...... -c -o net/ipv4/tcp.o net/ipv4/tcp.c
```

To get the LLVM IR code of tcp.c, run:

```
clang ...... -S -o net/ipv4/tcp.ll net/ipv4/tcp.c -emit-llvm
```

To get the assembly code of tcp.c, run:

```
clang ...... -S -o net/ipv4/tcp.s net/ipv4/tcp.c
```

Repeat these steps to get the IR and assembly code for all of your target functions, and move the files into IR_DIR and ASM_DIR respectively. Then build your kernel to get a VMLINUX file.

### Get PCs table

We use the [kcov_map](../static_analysis_tools/IRParser/kcov_map.cpp) tool to get the addresses of the kernel functions we are interested in.
Run the following command to build kcov_map:

```
clang++-10 kcov_map.cpp -o kcov_map -O0 -g `llvm-config-10 --cxxflags --libs --ldflags --system-libs`
```

```
./kcov_map IR_DIR ASM_DIR VMLINUX_FILE FUNCTION_LIST LOG_DIR
```

FUNCTION_LIST: the names of the functions whose addresses we need.
IR_DIR: directory with all the LLVM IR code we need.
ASM_DIR: directory with all the assembly code we need.
VMLINUX_FILE: the kernel ELF.
LOG_DIR: after the command runs, kcov_map creates a "*.json" and a "*.addr.map" file here for every function.

Then run:

```
cat LOG_DIR/*.addr.map > funcaddr.map
```

Copy funcaddr.map to the syzkaller work directory.
This is only one way to build a weighted function address map. You can build the map in whatever way suits your needs.

#### Extend functions list

In our practice, when we choose some member functions as entry points, some of them turn out to be wrappers rather than the functions that do the real work. We use [extend_func](../static_analysis_tools/IRParser/extend_func.cpp) to extend the function list.

```
clang++-10 extend_func.cpp -o extend_func -O0 -g `llvm-config-10 --cxxflags --libs --ldflags --system-libs`
```

```
./extend_func FUNCTION_LIST IR_DIR
```

You will get a FUNCTION_LIST.new file, which you can pass to kcov_map.

### Support cover filter in syzkaller

#### Patch syzkaller

Clone syzkaller, and run:

```
git checkout a2cdad9
git apply harbian-qa/syzkaller/cover_filter/*.patch
```

Build syzkaller as usual.

#### Modify configuration file

Add the following options to the syz-manager configuration file:

```
"covfilter": true,
"coverpcs": "PATH_TO_FUNCTION_ADDRESS_MAP",
```

The "covfilter" option enables the coverage filter in the executor. If you only want the weighted PCs feature without filtering, set it to false. If you only want the cover filter without weighted PCs, create a map in which every PC has weight 1.
Now you can run syzkaller with the cover filter.

## Implementation details of cover filter

### manager

#### Read weighted PCs from funcaddr.map

The configuration specifies which funcaddr.map is loaded and sent to the VM. The function readPCsWeight in syz-manager/manager.go reads funcaddr.map and maintains a pcsWeight map in the manager structure. This pcsWeight map is also used to calculate the weight of a prog for the web UI.
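
As an illustration only, loading such a map could look like the sketch below. It is written in Python for brevity, whereas the patched readPCsWeight is Go code. The line format `<hex pc> <weight>` and the name `read_pcs_weight` are assumptions made for this example; the real format of funcaddr.map is whatever kcov_map emits.

```python
def read_pcs_weight(lines):
    """Parse lines of the ASSUMED form '<hex pc> <weight>' into a dict."""
    pcs_weight = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pc = int(fields[0], 16)  # accepts an optional 0x prefix
        pcs_weight[pc] = int(fields[1])
    return pcs_weight


# Hypothetical sample content, not taken from a real funcaddr.map.
sample = ["0xffffffff81a2b3c0 1", "0xffffffff81a2b3f0 4", ""]
weights = read_pcs_weight(sample)
```

A map where every weight is 1 then behaves as a plain filter list.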

#### RPC interface for sending the address map to the fuzzer

We add a getPCsWeight method to RPCManagerView in syz-manager/rpc.go. It serves RPC calls from the client (the fuzzer) that request the pcsWeight map.

#### Display the PC and its weight in source code

In the "cover" page of the syzkaller web UI, we add an interface called bitmap. It converts the PCs table to source lines. A line shown in black means that the block of this line will not be dropped while fuzzing. The number on the left is the weight of that line. Note that multiple blocks may map to one source line; their weights are added together for that line.
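
The per-line summation can be sketched as follows. This is a Python illustration, not the patched Go code; `pc_to_line` is a hypothetical lookup table standing in for the symbolization step that the web UI performs.

```python
from collections import defaultdict


def line_weights(pcs_weight, pc_to_line):
    """Sum the weights of all PCs that map to the same source line."""
    totals = defaultdict(int)
    for pc, weight in pcs_weight.items():
        line = pc_to_line.get(pc)
        if line is not None:  # PCs without a known source line are skipped
            totals[line] += weight
    return dict(totals)


# Two blocks of a hypothetical function land on the same source line.
pcs_weight = {0x10: 2, 0x20: 3, 0x30: 1}
pc_to_line = {0x10: ("tcp.c", 100), 0x20: ("tcp.c", 100), 0x30: ("tcp.c", 105)}
per_line = line_weights(pcs_weight, pc_to_line)
```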

### fuzzer

#### getPCsWeight from syz-manager

We add a getPCsWeight() call to the fuzzer so that it can fetch the PCs table from syz-manager dynamically. In other words, it is possible to distribute different PCs tables to different fuzzers at run time, for example to lower the weight of PCs in blocks that have already been fully explored.

#### Calculate the prog prio from its cover

We implement a function calCoverWeight in syz-fuzzer/proc.go to calculate the weight and attach it to the prog structure. You can implement your own weight calculation based on the weighted PCs in this function.
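
One simple choice, shown here only as a Python sketch, is to sum the weights of the distinct covered PCs that appear in the map. Whether the patched calCoverWeight does exactly this is not stated above; treat the summation as an assumption.

```python
def cal_cover_weight(cover, pcs_weight):
    """Sum the weights of distinct covered PCs; PCs not in the map count as 0."""
    return sum(pcs_weight.get(pc, 0) for pc in set(cover))


pcs_weight = {0x10: 2, 0x20: 3}
# 0x10 is hit twice but counted once; 0x99 is not in the map.
prog_weight = cal_cover_weight([0x10, 0x10, 0x20, 0x99], pcs_weight)
```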

#### Choose prog to mutate based on prog prio

Syzkaller already prioritizes progs based on the number of signals each prog contributes. We modify the addInputToCorpus function to use our prog weight instead.
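
Priority-based selection of this kind is commonly done with a running sum and a binary search. The Python sketch below shows the idea with prog weights as priorities; the name `choose_program` is illustrative and the sketch is not a copy of the patched Go code.

```python
import bisect
import itertools
import random


def choose_program(prios, rnd):
    """Return an index chosen with probability proportional to prios[i].

    Requires at least one positive priority.
    """
    cumulative = list(itertools.accumulate(prios))
    r = rnd.randrange(cumulative[-1])  # 0 <= r < sum(prios)
    # The first index whose running sum exceeds r owns the slot r.
    return bisect.bisect_right(cumulative, r)


rnd = random.Random(0)
picks = [choose_program([1, 3], rnd) for _ in range(1000)]
```

With priorities `[1, 3]`, index 1 is expected about three times as often as index 0.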

### executor

#### Read PCs map

The functions in executor/bitmap.h load the PCs table from the map.

##### Fast cover filtering.

Unlike the manager and the fuzzer, the executor's coverage filter runs very frequently. Without a fast lookup, the performance impact could be disastrous as the PCs table grows. So we use a fast but rough approach, a bitmap, to address this problem. We assume that the kernel text size is less than 0x3000000, and we maintain a map:
```
#define COVERAGE_BITMAP_SIZE (0x300000 / sizeof(uint32))
static uint32 kTextBitMap[COVERAGE_BITMAP_SIZE];
```
Because of address alignment, the lowest 4 bits are dropped. To quickly set and test the bit that records whether a PC should be filtered, we compute:
```
pc &= 0xffffffff;         // keep the low 32 bits of the address
pc -= KERNEL_TEXT_BASE;   // offset into the kernel text
uint64 pcc = pc >> 4;     // drop the 4 alignment bits
uint64 index = pcc / 32;  // word in the bitmap
uint64 shift = pcc % 32;  // bit within that word

kTextBitMap[index] & (1u << shift)
```
The lookup cost stays constant no matter how many PCs are in the filter.

## Some PCs-weight-guided fuzzing practice

Cover filtering is binary: you can only decide whether the edge of a PC is sent to the fuzzer as a signal or not. Weighted PCs, in contrast, can guide the fuzzer to evolve progs more flexibly. You can assign weights to PCs based on the results of LLVM IR static analysis.

### Cyclomatic complexity based on LLVM CFG

In the theory of cyclomatic complexity[1](https://en.wikipedia.org/wiki/Cyclomatic_complexity), a function can be treated as a single-entry, single-exit model, so its complexity is easy to calculate. In practice, the complexity indicates that testing should pay more attention to the more complex functions.
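
For a single function the standard formula is M = E - N + 2, where E is the number of CFG edges and N the number of basic blocks. A minimal Python sketch follows; the dictionary representation of the CFG is an assumption for the example, while the tools in this repository work on LLVM IR directly.

```python
def cyclomatic_complexity(cfg):
    """cfg maps each basic block name to the list of its successors."""
    nodes = len(cfg)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - nodes + 2  # one connected component


# A function with a single if/else: 4 blocks, 4 edges.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
complexity = cyclomatic_complexity(cfg)
```

The resulting number could then be used as the weight of every PC in that function.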

### Basic block count based on LLVM BlockFrequencyInfo

The LLVM class [BlockFrequencyInfo](https://llvm.org/doxygen/classllvm_1_1BlockFrequencyInfo.html) is a convenient way to estimate how frequently a basic block appears across the potential control-flow paths of a function. It is reasonable to expect that if a basic block appears more frequently, mutating a prog that triggers this block has a higher probability of covering other new PC edges.

### Basic block to basic block count based on LLVM BranchProbabilityInfo

The LLVM class [BranchProbabilityInfo](https://llvm.org/doxygen/classllvm_1_1BranchProbabilityInfo.html) is another tool that can be used in fuzzing. It holds the probability of control flowing from one block to another. If you want the fuzzer to evolve a testcase that covers a specific basic block, weighting the PCs with BranchProbabilityInfo is a good choice.

### Weighted function call stack

The tools above assume that the functions to fuzz have already been picked out, and focus on how to assign priorities to PCs based on CFG information. Sometimes you may want to fuzz an approximate range instead, for example a series of functions from a call stack. The LLVM class [CallGraph](https://llvm.org/doxygen/classllvm_1_1CallGraph.html) can help build the call relationships between functions. You can assign a low weight to functions that are deep in the call graph and not very complex.

syzkaller/kstate_resource.md

+92
@@ -0,0 +1,92 @@
# Kernel state based fuzzer

## Content

1. Usage.
2. Implementation details.
3. Practice.

To collect kernel states as syzkaller resources, follow these steps:

1. Build the kernel with the GEPOperator tracker instrumentation.
2. Add support for collecting kernel states to syzkaller.
3. Assign weights to kernel states for the fuzzer.

## Usage

### Kernel instrumentation

First, we need to implement an LLVM pass to do the instrumentation. As we already know, many kernel states are stored in fields of structures. Tracking store operations through pointers produced by a GEP (GetElementPtr) operator can detect states that may help the fuzzer. Then, refer to [this document]() to build your compiler with the field assignment tracker. When building the kernel, add a line such as:
```
CFLAGS_*.o = -Xclang -load -Xclang PATH_TO_YOUR_PASS.so -fno-discard-value-names
```
to the Makefile for each object file you need to instrument. The kernel state ID is the hash of the structure name and the field name.

### Implement the instrumentation function in kernel

Refer to our [implementation](../instrument/kcov_trace_srt.patch) of the instrumentation function that collects kernel states. Then, build your kernel as usual.

### Patch syzkaller

Clone syzkaller, and run:
```
git checkout a2cdad9
git apply harbian-qa/syzkaller/cover_filter/*.patch
```

Build syzkaller as usual. Add the following line to the configuration file:

```
"kstatemap": "PATH_TO_KERNEL_STATE.map"
```

You can use our tool [kstate_map](../static_analysis_tools/IRParser/kstate_map.cpp) to get the kernel state map. Run:

```
clang++-10 kstate_map.cpp -o kstate_map -O0 -g -fsanitize=address `llvm-config-10 --cxxflags --libs --ldflags --system-libs`
./kstate_map LLVM_IR_DIR ASM_DIR VMLINUX FUNCTION_LIST LOG_DIR
```

FUNCTION_LIST: the names of the functions whose addresses we need.
LLVM_IR_DIR: directory with all the LLVM IR code we need.
LOG_DIR: after the command runs, kstate_map creates a "*.json" and a "*.state.map" file here for every function.
Write the output to PATH_TO_KERNEL_STATE.map, then run the patched syzkaller as usual. This map assigns weights based on how frequently each state is used.

## Kernel state based fuzzer
Now you can run syzkaller as usual. The "/input" interface shows a list of kernel states, and the "/corpus" interface shows the state weight of every prog.

## Implementation details of kernel state resource

### Kernel instrumentation

We reuse the KCOV interface instead of adding a separate mode. To do so, we encode the state ID with 0xfefe in the highest 16 bits. When syzkaller gets a kcov PC that starts with 0xfefe, it recognizes this entry as a kstate ID, and the value and address of the state occupy the following 2*64 bits. No matter how many bits the variable uses, we normalize it to 64 bits. Note that if you want to collect other information, you have to implement the corresponding handling in syzkaller.
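
A decoder for such a mixed stream can be sketched in Python as below. The field order (value first, then address) follows the wording above but is an assumption, as is treating the low 48 bits as the state ID; the function name `split_kcov` is made up for this example.

```python
KSTATE_MAGIC = 0xFEFE


def split_kcov(entries):
    """Split 64-bit kcov entries into plain PCs and (id, value, address) states."""
    pcs, states = [], []
    i = 0
    while i < len(entries):
        word = entries[i]
        if word >> 48 == KSTATE_MAGIC:
            if i + 2 >= len(entries):
                break  # truncated state record, ignore the tail
            state_id = word & 0xFFFFFFFFFFFF  # ASSUMED: low 48 bits hold the ID
            states.append((state_id, entries[i + 1], entries[i + 2]))
            i += 3
        else:
            pcs.append(word)
            i += 1
    return pcs, states


stream = [
    0xFFFFFFFF81000230,  # ordinary PC
    0xFEFE000000001234,  # state marker carrying ID 0x1234
    7,                   # state value
    0xFFFF888000001000,  # state address
    0xFFFFFFFF81000250,  # ordinary PC
]
pcs, states = split_kcov(stream)
```

Ordinary kernel text addresses have 0xffff in their top 16 bits, so they cannot be mistaken for a state marker.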

### Syzkaller support

#### executor

syz-executor has to pick out the kernel states and send them after all signals have been sent. This handling can be found in our patch to the function write_coverage_signal in executor.cc. When the executor reads a PC that starts with 0xfefe, it has received a kernel state. We place the states in a chunk of shared memory located after the coverage signal shared memory. syz-fuzzer handles them later.

#### syz-fuzzer

Correspondingly, parseOutput in pkg/ipc/ipc.go is called by the fuzzer, and we add readKernState to parse the kernel states from the executor output. The kernel state information is stored in a structure called KernState in pkg/kstate/kstate.go. Every input from the executor carries an array of KernState, and every prog has a state weight calculated from its kernel states. KernState also supports searching the map by its ID or by ID^Value, which we call its hash.

syz-fuzzer/proc.go: calStateWeight calculates the weight of a prog. It subtracts the state count to eliminate the influence of the number of kstates. prog/rand.go: the chooseReaProgramIdx function prioritizes progs based on their state weights.

## Kernel state guided fuzzing practice

We have explored two ways of assigning weights to resources.

#### Get the usage frequency of kernel states

This is the kstate_map tool mentioned above. We use the LLVM API to statically analyze how states are used in the target functions. Without any awareness of the value of a state, it simply encourages the fuzzer to prefer progs that frequently rewrite important states, in other words, progs with complex states.

#### Specify kernel state value weight

We use a [clang checker](../static_analysis_tools/ConditionChecker/) to get symbolic information about condition constraints:

```
clang -Xclang -analyze -Xclang -analyzer-checker=debug.ConditionChecker ...... -c -o *.o *.c
```

This gives you some constraint values of variables. The patched syzkaller also supports a hash mode: if ID^value can be found in the kstate map, it is used as a unique state. So you can specify a weight for a state with a particular value. For now, this can only be specified manually in the kstatemap.
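
The hash-mode lookup can be pictured with the Python sketch below. Falling back to the plain ID when ID^value is absent is an assumption of this example, and `state_weight` is a hypothetical name; the patched Go code may resolve the two keys differently.

```python
def state_weight(kstate_map, state_id, value, default=0):
    """Prefer the value-specific entry (ID ^ value), else the plain ID entry."""
    hashed = state_id ^ value
    if hashed in kstate_map:
        return kstate_map[hashed]
    return kstate_map.get(state_id, default)


# Hypothetical map: state 0x1234 has weight 1, but weight 10 when its value is 5.
kstate_map = {0x1234: 1, 0x1234 ^ 5: 10}
special = state_weight(kstate_map, 0x1234, 5)
generic = state_weight(kstate_map, 0x1234, 6)
```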
