
Munich Rust 2020

William Woodruff's presentation at the Rust Munich Meetup (August 25, 2020) discusses steganography in x86 binaries. It introduces steg86, a tool that hides messages in compiled programs by choosing between semantic duals: pairs of x86 instruction encodings that behave identically. The tool can embed and extract messages while preserving the original binary's behavior and size, exploiting the redundancy in x86 instruction encoding. Woodruff also covers the approach's limitations, including code/data disambiguation and ease of detection.


steg86: hiding messages in x86 binaries

rust munich meetup

william woodruff

august 25 2020

agenda

- yours truly
- steganography?
- steg on programs
- x86 instruction encoding
- steg86

yours truly

- william woodruff
- @8x5clPW2 • yossarian.net • blog.yossarian.net
- senior security engineer @ trail of bits
  - work: program analysis research, mostly in LLVM
  - disclaimer: independent talk, not representing employer
- open source: member of homebrew, miscellaneous contributor


steganography?

- “hiding data within data”
- not cryptography
- different techniques for different data
- popular targets:
  - images
  - sound files
  - plain text
- what about programs?

steg on programs

- programs are a natural choice for steg
  - can be very large (lots of info capacity)
  - complex binary formats (PE, Mach-O, ELF)
  - complex instruction encodings (x86/AMD64, ARM w/ Thumb)
  - present on every computer, not inherently suspicious

steg on programs: approaches

- hide information in stack layout, register selection
  - problem: need the program’s source
  - problem: need to maintain a compiler...
- hide information in the format itself (e.g. segment order)
  - problem: specific to a format, may not apply to others
- rewrite the program after compilation
  - ex: add eax, -50 → sub eax, 50
  - problem: code/data disambiguation (difficult to solve)
  - problem: relocations, position-independent code (-fPIC)
  - problem: CPU-level semantics (arithmetic, status flags)
- can we do better?
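
To see why the rewrite example above can change program behavior, here is a minimal std-only sketch that models just the 32-bit result and the carry flag (CF) of the two forms. `add_eax_imm` and `sub_eax_imm` are hypothetical helpers; the other status flags are not modeled.

```rust
/// Models `add eax, imm` with a sign-extended immediate: returns (result, CF).
/// For unsigned addition, `overflowing_add` reports exactly the carry out.
fn add_eax_imm(eax: u32, imm: i32) -> (u32, bool) {
    eax.overflowing_add(imm as u32)
}

/// Models `sub eax, imm`: returns (result, CF).
/// For unsigned subtraction, `overflowing_sub` reports the borrow (eax < imm).
fn sub_eax_imm(eax: u32, imm: i32) -> (u32, bool) {
    eax.overflowing_sub(imm as u32)
}
```

With `eax = 100`, both forms leave 50 in `eax`, but `add` sets CF while `sub` clears it (and the reverse holds when `eax < 50`). A later `jb` or `adc` that reads CF would behave differently, so a rewriter would have to prove the flag is dead before swapping.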

x86 instruction encoding

- variable length (up to 15 bytes)
- extremely complex (decades of compat, overloaded fields)
- rich source/sink combinations
  - register-to-register (mov ebx, eax)
  - register-to-memory (mov dword [1337], eax)
  - memory-to-register (mov eax, dword [1337])
  - immediate-to-register (mov eax, 1337)
  - immediate-to-memory (mov dword [1337], 1337)

x86 instruction encoding: modr/m

- essentially an 8-bit lookup table of (some) operand encodings
  - doesn’t cover all possible operands, for historical reasons...
- simplest case: encodes one or two operands
  - reg/opcode field: one register operand
  - r/m field: one register or memory operand
  - enables mem-to-reg, reg-to-mem, reg-to-reg operations
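
A sketch of the field layout described above: the ModR/M byte splits into `mod` (bits 7 and 6), `reg` (bits 5 to 3), and `r/m` (bits 2 to 0), and `mod = 0b11` means `r/m` names a register rather than memory. The `ModRm` type and its helpers are hypothetical (std only, no REX handling), not part of iced.

```rust
/// The three fields of an x86 ModR/M byte.
/// (`mod` is a Rust keyword, hence the field name `md`.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct ModRm {
    md: u8,  // bits 7..6: addressing mode (0b11 = register-direct)
    reg: u8, // bits 5..3: register operand (or opcode extension)
    rm: u8,  // bits 2..0: register or memory operand
}

fn decode_modrm(byte: u8) -> ModRm {
    ModRm {
        md: byte >> 6,
        reg: (byte >> 3) & 0b111,
        rm: byte & 0b111,
    }
}

fn encode_modrm(m: ModRm) -> u8 {
    ((m.md & 0b11) << 6) | ((m.reg & 0b111) << 3) | (m.rm & 0b111)
}

/// 32-bit general-purpose register names, indexed by a 3-bit field (no REX).
const REG32: [&str; 8] = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"];

/// True if both operands described by this ModR/M byte are registers.
fn is_reg_to_reg(byte: u8) -> bool {
    decode_modrm(byte).md == 0b11
}
```

For example, `C0` splits into `mod = 3`, `reg = 0`, `r/m = 0`, so both operands are `eax`.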

x86 instruction encoding: xor

opcode   instruction
31 /r    xor r/m32, r32
33 /r    xor r32, r/m32

- reg-to-mem, mem-to-reg, reg-to-reg, ...
- there are two reg-to-reg encodings!
  - 31 C0 → xor eax, eax
  - 33 C0 → also xor eax, eax!
  - they’re even the same size!
- 64-bit variants (w/ REX prefix) work too!
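
In the 31/33 pair (as in the other classic ALU opcodes), bit 1 of the opcode is the direction bit: clear means `r/m` is the destination, set means `reg` is. So the dual of a register-direct encoding can be produced by toggling that bit and swapping the `reg` and `r/m` fields. A hand-rolled sketch for the unprefixed 32-bit case only; steg86 itself does this through iced, and `xor_dual`/`render_xor` are hypothetical helpers.

```rust
/// Returns the dual encoding of a 2-byte, register-direct `xor r32, r32`
/// (opcode 0x31 or 0x33), or None if the bytes aren't that form.
fn xor_dual(bytes: [u8; 2]) -> Option<[u8; 2]> {
    let [opcode, modrm] = bytes;
    if (opcode != 0x31 && opcode != 0x33) || modrm >> 6 != 0b11 {
        return None;
    }
    let reg = (modrm >> 3) & 0b111;
    let rm = modrm & 0b111;
    // Toggle the direction bit (0x31 <-> 0x33) and swap reg with r/m so
    // the destination and source registers stay the same.
    Some([opcode ^ 0b10, 0b1100_0000 | (rm << 3) | reg])
}

/// Renders a register-direct xor as "xor dst, src" (Intel syntax).
fn render_xor(bytes: [u8; 2]) -> Option<String> {
    const REG32: [&str; 8] = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"];
    let [opcode, modrm] = bytes;
    if modrm >> 6 != 0b11 {
        return None;
    }
    let reg = REG32[((modrm >> 3) & 0b111) as usize];
    let rm = REG32[(modrm & 0b111) as usize];
    match opcode {
        0x31 => Some(format!("xor {}, {}", rm, reg)), // xor r/m32, r32
        0x33 => Some(format!("xor {}, {}", reg, rm)), // xor r32, r/m32
        _ => None,
    }
}
```

For instance, `31 C1` and `33 C8` both render as `xor ecx, eax`. With a REX.W prefix (`48 31 C0` vs `48 33 C0`) the same flip applies to the 64-bit forms; supporting r8 to r15 would also require swapping the REX.R and REX.B bits, which this sketch omits.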

steg86

- central conceit: each reg-to-reg pair (two equivalent encodings of one instruction) represents one bit of information
  - with enough bits, we can hide messages!
- binary format independent
  - uses goblin to unpack PE/ELF/Mach-O binaries
- encodings are the same size, so PIC/relocations aren’t broken
- uses iced for decoding/encoding/semantics
- ~700 lines of rust total (much of it constants)
- CLI: steg86 {profile,embed,extract}
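
Since each pair carries one bit, the message has to be serialized into a bit stream and back. A minimal sketch; the least-significant-bit-first order is an assumption chosen for illustration, not necessarily the order steg86 uses.

```rust
/// Expands bytes into bits, least-significant bit of each byte first.
fn to_bits(bytes: &[u8]) -> Vec<bool> {
    bytes
        .iter()
        .flat_map(|&b| (0..8).map(move |i| (b >> i) & 1 == 1))
        .collect()
}

/// Packs bits (LSB-first within each byte) back into bytes;
/// a trailing partial byte is dropped.
fn from_bits(bits: &[bool]) -> Vec<u8> {
    bits.chunks_exact(8)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u8, |acc, (i, &bit)| acc | ((bit as u8) << i))
        })
        .collect()
}
```

A 6-byte message such as `hello!` therefore needs 48 reg-to-reg pairs, plus whatever the header takes.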

steg86: semantic duals

- it turns out there are a bunch of these
  - 9 instructions (add, adc, sub, sbb, and, or, xor, mov, cmp)
  - 4 variants (8, 16, 32, 64-bit) each¹
- each dual gives us 1 bit of information
  - minus a little space for a header with metadata
- how common are these instructions?

$ steg86 profile /bin/bash
Summary for /bin/bash:
175828 total instructions
27957 potential semantic pairs
27925 bits of information capacity (3490 bytes)

- not bad!

¹ actually 3 in any particular CPU mode...
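
The profile numbers are consistent with simple arithmetic: roughly 16% of bash's instructions are candidates, 27957 pairs minus 27925 usable bits leaves 32 bits (presumably the header), and 27925 / 8 rounds down to 3490 whole bytes. A sketch of that calculation; the 32-bit header size is inferred from the output above, not taken from steg86's source, and `capacity` is a hypothetical helper.

```rust
/// Usable message capacity as (bits, whole bytes), given the number of
/// semantic pairs and the bits reserved for a header.
fn capacity(pairs: u64, header_bits: u64) -> (u64, u64) {
    // A binary with fewer pairs than header bits can't hold any message.
    let bits = pairs.saturating_sub(header_bits);
    (bits, bits / 8)
}
```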

steg86: semantic duals

each pair represents (false, true)...

static SEMANTIC_PAIRS: &[(Code, Code)] = &[
    // ADD
    (Code::Add_rm8_r8, Code::Add_r8_rm8),
    (Code::Add_rm16_r16, Code::Add_r16_rm16),
    (Code::Add_rm32_r32, Code::Add_r32_rm32),
    (Code::Add_rm64_r64, Code::Add_r64_rm64),

    // ... snip ...
];
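
Read in the other direction, the same table decodes a message: an instruction whose code is the first element of a pair carries a 0, the second a 1. A std-only sketch of that extract step; `Code` here is a tiny stand-in enum (not iced's), and `bit_for` is a hypothetical helper.

```rust
/// Stand-in for iced-x86's `Code` enum: only a few ADD variants plus one
/// code with no dual, for illustration.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code {
    Add_rm32_r32,
    Add_r32_rm32,
    Add_rm64_r64,
    Add_r64_rm64,
    Nop,
}

/// Each pair is (encoding meaning `false`, encoding meaning `true`).
static SEMANTIC_PAIRS: &[(Code, Code)] = &[
    (Code::Add_rm32_r32, Code::Add_r32_rm32),
    (Code::Add_rm64_r64, Code::Add_r64_rm64),
];

/// Recovers the bit an instruction's encoding carries, or None if its
/// code doesn't appear in any pair.
fn bit_for(code: Code) -> Option<bool> {
    SEMANTIC_PAIRS.iter().find_map(|&(f, t)| {
        if code == f {
            Some(false)
        } else if code == t {
            Some(true)
        } else {
            None
        }
    })
}
```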

steg86: profiling

for every instruction in the program...

// skip instructions we don't support
if !SUPPORTED_OPCODES.contains(&instruction.code()) {
    continue;
}

// skip non reg-to-reg instructions
if instruction.op0_kind() != OpKind::Register
    || instruction.op1_kind() != OpKind::Register
{
    continue;
}

offsets.push(instruction.ip() as usize);
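
The same filter can be phrased as an iterator pipeline. This sketch uses hypothetical stand-in types instead of iced's `Instruction`, so it shows only the selection logic: an instruction is a candidate if its opcode is supported and both operands are registers. As on the slide, `ip` is treated as an offset into the code being scanned.

```rust
/// Hypothetical stand-ins for the decoded-instruction fields the profiler reads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpKind {
    Register,
    Memory,
    Immediate,
}

#[derive(Debug, Clone, Copy)]
struct Insn {
    ip: u64,
    supported: bool, // stands in for SUPPORTED_OPCODES.contains(&code)
    op0: OpKind,
    op1: OpKind,
}

/// Collects the offsets of supported reg-to-reg instructions.
fn candidate_offsets(insns: &[Insn]) -> Vec<usize> {
    insns
        .iter()
        .filter(|i| i.supported)
        .filter(|i| i.op0 == OpKind::Register && i.op1 == OpKind::Register)
        .map(|i| i.ip as usize)
        .collect()
}
```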

steg86: embedding

for each candidate instruction...

let new_code = {
    // find the (false, true) pair this instruction belongs to
    let tuple = SEMANTIC_PAIRS
        .iter()
        .find(|&&t| old_code == t.0 || old_code == t.1)
        .unwrap();

    // (bit to embed, is the instruction currently the `false` encoding?)
    match (bit, tuple.0 == old_code) {
        (false, true) | (true, false) => {
            // already correct!
            continue;
        }
        (false, false) => tuple.0,
        (true, true) => tuple.1,
    }
};
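
The `match` is a four-row truth table over (bit to embed, is the instruction currently in the `false` form?). A standalone sketch of the same decision, returning `None` where the slide uses `continue`; `choose_code` is a hypothetical name, and the xor opcodes below merely serve as example values.

```rust
/// Given a (false_code, true_code) pair, the current code, and the bit to
/// embed, returns Some(code to switch to), or None if no rewrite is needed.
fn choose_code<C: Copy + PartialEq>(pair: (C, C), old: C, bit: bool) -> Option<C> {
    let is_false_form = old == pair.0;
    match (bit, is_false_form) {
        // bit 0 already in the `false` form, or bit 1 already in the `true` form
        (false, true) | (true, false) => None,
        (false, false) => Some(pair.0),
        (true, true) => Some(pair.1),
    }
}
```

On average only about half of the candidates need rewriting, since the other half already happen to encode the right bit.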

steg86: embedding

let new_instruction = Instruction::with_reg_reg(
    new_code,
    instruction.op0_register(),
    instruction.op1_register(),
);
let new_len = encoder
    .encode(&new_instruction, offset as u64)
    .map_err(|s| anyhow!(s))?;

// ... snip ...

text_copy
    .data
    .splice(offset..(offset + new_len), encoder.take_buffer());
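
The rewrite doesn't disturb relocations or PIC because the splice replaces exactly as many bytes as it removes. A std-only sketch of that step; `patch_in_place` is a hypothetical helper, and for same-length patches `copy_from_slice` on the subslice would do the same job.

```rust
/// Overwrites `text[offset..offset + new.len()]` with `new`, as the slide
/// does with `Vec::splice`. Because the replacement has the same length,
/// the section size and every later offset stay unchanged.
fn patch_in_place(text: &mut Vec<u8>, offset: usize, new: &[u8]) {
    let end = offset + new.len();
    assert!(end <= text.len(), "patch would run past the end of the section");
    // Replacing N bytes with N bytes: nothing after `end` moves.
    text.splice(offset..end, new.iter().copied());
}
```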

steg86: results

[figure: binary diff]

$ cargo install steg86

$ echo "hello!" > message.txt

$ steg86 embed \
    /bin/bash test.steg \
    < message.txt

$ steg86 extract test.steg
hello!

steg86: next steps

- other tricks
  - test reg1, reg2 is the same as test reg2, reg1
  - same with xchg
  - multi-byte nops
- deficiencies
  - code/data disambiguation is impossible in the general case
    - many open problems in program analysis reduce to this
    - partial workarounds: CFG recovery, jump table identification
  - very easy to detect (real compilers stick to one encoding)
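
The `test` trick works slightly differently from the ALU pairs: there is only one register/register-or-memory opcode (`85 /r`, `test r/m32, r32`), so the two encodings come from swapping which register sits in each ModR/M field, and the commutativity of AND makes the flags identical. A std-only sketch of both halves; the function names are hypothetical and REX prefixes are ignored.

```rust
/// Swaps the reg and r/m fields of a register-direct ModR/M byte. With
/// opcode 0x85 (`test r/m32, r32`) this turns `test a, b` into `test b, a`.
fn swap_modrm_regs(modrm: u8) -> Option<u8> {
    if modrm >> 6 != 0b11 {
        return None; // memory operand: nothing to swap
    }
    let reg = (modrm >> 3) & 0b111;
    let rm = modrm & 0b111;
    Some(0b1100_0000 | (rm << 3) | reg)
}

/// ZF, SF, and PF as `test` sets them from `a & b` (it always clears CF and OF).
fn test_flags(a: u32, b: u32) -> (bool, bool, bool) {
    let r = a & b;
    let zf = r == 0;
    let sf = r >> 31 == 1;
    let pf = (r as u8).count_ones() % 2 == 0; // parity of the low byte only
    (zf, sf, pf)
}
```

For example, `85 C1` (`test ecx, eax`) and `85 C8` (`test eax, ecx`) compute the same AND and set the same flags.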


thank you!
slides: yossarian.net/publications#munich-rust-2020
github: woodruffw/steg86
blog post: hiding messages in x86 binaries using semantic duals
contact: [email protected] / @8x5clPW2

links and prior work

- A86 assembler (1980s!)
- HYDAN (2004)
- ARMaHYDAN (2019, PoC||GTFO)
