
Official code of Callee: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning


vul337/Callee


CALLEE

Official code of CALLEE: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning.

For ease of use, we have made some changes to the original implementation in the paper.

Status: We have substituted the doc2vec model with transformers and released a new dataset.

  • The new work kTrans is here.
  • The new dataset is here.

We have decided to deprecate the old dataset since it was collected several years ago on older versions of Firefox and the Linux kernel.

Usage

Environment

Tested on Ubuntu 18.04 with:

  • Python3 (python-magic, gensim, numpy, torch, tqdm, capstone)
  • IDA Pro 7.6
  • CUDA 10.2

Pipeline

NOTE: This is a single-threaded demo. Consider multiprocessing for production or batch processing.

a. Slice target binary with IDA

python3 run-slice.py -i /path/to/binary -o /path/to/slices -n <num_workers> --ida_path /path/to/idat64

The script invokes IDA Pro to analyze the binary and perform slicing for indirect callsites and candidate callees.
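For batch processing, one option is to run the slicing step over many binaries concurrently. The sketch below is not part of the repository: it only wraps the `run-slice.py` command line shown above, and the helper names (`build_slice_cmd`, `slice_all`), the per-binary output subdirectory, and the `jobs` parameter are assumptions made for illustration. It uses a thread pool that launches one subprocess per binary, which gives process-level parallelism without `multiprocessing` itself.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_slice_cmd(binary, out_root, ida_path, workers=1):
    """Build the run-slice.py command line for one binary."""
    # Assumption: each binary gets its own output subdirectory under out_root.
    out_dir = Path(out_root) / Path(binary).name
    return [
        "python3", "run-slice.py",
        "-i", str(binary),
        "-o", str(out_dir),
        "-n", str(workers),
        "--ida_path", str(ida_path),
    ]


def slice_all(binaries, out_root, ida_path, jobs=4):
    """Run one slicing subprocess per binary, at most `jobs` at a time."""
    cmds = [build_slice_cmd(b, out_root, ida_path) for b in binaries]
    # Threads suffice here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = pool.map(lambda cmd: subprocess.run(cmd).returncode, cmds)
        # Map each binary path to the exit code of its slicing run.
        return dict(zip(map(str, binaries), codes))
```

A call such as `slice_all(["/path/to/a", "/path/to/b"], "/path/to/slices", "/path/to/idat64")` would return a dictionary of exit codes, which makes it easy to retry only the binaries that failed.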

b. Tokenize the slices

python3 preprocess.py -i /path/to/slices -o /path/to/tokenized_slices

The script tokenizes assembly instructions of slices.
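To give a rough idea of what tokenizing an assembly instruction can look like, here is a minimal sketch. It is not the logic of `preprocess.py`: the regular expression, the `<CONST>` placeholder for numeric literals, and the function name `tokenize_instruction` are all assumptions chosen for illustration.

```python
import re

# Hex literals, identifiers (mnemonics, registers, symbols), decimals,
# and a few operand punctuation marks. Commas and whitespace are dropped.
_TOKEN_RE = re.compile(r"0x[0-9a-f]+|[a-z_.$@?][\w.$@?]*|\d+|[\[\]+\-*:]")
_NUMBER_RE = re.compile(r"0x[0-9a-f]+|\d+")


def tokenize_instruction(insn, const_token="<CONST>"):
    """Split one instruction into tokens, replacing numbers by a placeholder."""
    tokens = _TOKEN_RE.findall(insn.lower())
    # Normalizing constants keeps the vocabulary small (an assumed design choice).
    return [const_token if _NUMBER_RE.fullmatch(t) else t for t in tokens]


print(tokenize_instruction("MOV rax, [rbp+0x10]"))
# ['mov', 'rax', '[', 'rbp', '+', '<CONST>', ']']
```

Replacing literals with a single placeholder is a common way to limit out-of-vocabulary tokens; whether the released model expects this normalization is not stated here.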

c. Generate embeddings with doc2vec

python3 store_emb.py -i /path/to/tokenized_slices -o /path/to/embeddings --doc2vec_model /path/to/doc2vec_model

The script transforms slices into embeddings with a pretrained doc2vec model.
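As an illustration of this step, the sketch below infers a vector for one tokenized slice with gensim. It is not the code of `store_emb.py`; the helper names `load_doc2vec` and `embed_slice` are hypothetical, and the sketch assumes the model file was saved with gensim's own `save` method.

```python
def load_doc2vec(model_path):
    """Load a pretrained gensim Doc2Vec model from disk."""
    # Imported lazily so embed_slice can be used without gensim installed.
    from gensim.models.doc2vec import Doc2Vec
    return Doc2Vec.load(model_path)


def embed_slice(model, tokens):
    """Infer a fixed-length vector for one tokenized slice."""
    # infer_vector expects a list of string tokens, not a raw string.
    return model.infer_vector(list(tokens))
```

Typical use would be `embed_slice(load_doc2vec("/path/to/doc2vec_model"), tokens)`. Note that doc2vec inference is randomized, so repeated calls on the same slice can return slightly different vectors.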

d. Predict with the Siamese network

python3 pred.py -i /path/to/embeddings

The script outputs a score for each (indirect callsite, candidate callee) pair.
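One way to turn such scores into call-graph edges is to keep, for each callsite, the candidates whose score clears a threshold. The sketch below assumes the scores are available as `(callsite, callee, score)` tuples; that format, the function name `select_callees`, and the default threshold of 0.5 are assumptions for illustration, not the output format of `pred.py`.

```python
from collections import defaultdict


def select_callees(scored_pairs, threshold=0.5, top_k=None):
    """Group scored pairs by callsite and keep the best-scoring callees."""
    by_site = defaultdict(list)
    for callsite, callee, score in scored_pairs:
        if score >= threshold:
            by_site[callsite].append((score, callee))

    edges = {}
    for callsite, candidates in by_site.items():
        # Highest score first; top_k=None keeps every candidate above threshold.
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        edges[callsite] = [callee for _, callee in candidates[:top_k]]
    return edges


pairs = [
    ("0x401000", "f", 0.9),
    ("0x401000", "g", 0.2),
    ("0x401000", "h", 0.7),
    ("0x402000", "f", 0.1),
]
print(select_callees(pairs))
# {'0x401000': ['f', 'h']}
```

A callsite with no candidate above the threshold is omitted from the result, so the threshold trades precision against recall of the recovered edges.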

Tool for collecting indirect calls

Here is a QEMU TCG plugin that we modified to collect indirect calls on x86_64: ibresolver
