Basic exploitation techniques
2020-01-23
Outline
A primer on x86 assembly
Memory segments
Stack-based buffer overflows
Heap-based buffer overflows
Format strings
A primer on x86 assembly
Introduction
Verily, when the developer herds understand the tools
that drive them to their cubicled pastures every day,
then shall the 0day be depleted — but not before.
– Pastor Manul Laphroaig
It’s a trap!
• ≈ 1000 instructions . . .
• No time to know them all :-)
This overview is meant as a first orientation
Multiple syntaxes
• AT&T
• Intel
Basics
In general
Mnemonics accept from 0 to 3 arguments.
Two-argument mnemonics have the form (Intel syntax)
m dst, src
which roughly means
dst ← dst ⊙ src
where ⊙ is the semantics of m
Endianness
x = 0xdeadbeef
byte address                  0x00  0x01  0x02  0x03
byte content (big-endian)     0xde  0xad  0xbe  0xef
byte content (little-endian)  0xef  0xbe  0xad  0xde
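The byte order of the host can be observed directly from C. The sketch below copies the in-memory bytes of 0xdeadbeef into an array, lowest address first; the function names are illustrative and not part of the slides.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
 * lowest address first. */
void bytes_in_memory(uint32_t x, unsigned char out[4]) {
    memcpy(out, &x, 4);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void) {
    unsigned char b[4];
    bytes_in_memory(0xdeadbeefu, b);
    return b[0] == 0xef;
}
```

On an Intel or AMD machine, is_little_endian() is expected to return 1 and the array to hold ef be ad de.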
Architectures
Big endian
PowerPC, Sparc, 68000
Little endian
Intel, AMD
Bi-endian
ARM, RISC-V
These usually default to little endian.
Resources
• Cheat sheet
• Opcode and Instruction Reference
• Intel full instruction set reference
Basic registers (16/32/64 bits)
64  32  16  name (8080) / use
r   e   ax  accumulator
r   e   bx  base address
r   e   cx  count
r   e   dx  data
r   e   di  destination index
r   e   si  source index
r   e   bp  base pointer
r   e   sp  stack pointer
r   e   ip  instruction pointer
• esp (e = extended) is the 32-bit stack pointer
• rsp (r = register) is the 64-bit one
Less basic registers (64 bits)
x86-64 adds the general-purpose registers r8 to r15
• r8d accesses the lower 32 bits of r8;
• r8w accesses its lower 16 bits;
• r8b accesses its lower 8 bits.
The full story
Register flags (partial)
of overflow flag
cf carry flag
zf zero flag
sf sign flag
df direction flag
pf parity flag
af adjust flag
Signed vs unsigned
At machine-level, every value is a bitvector.
Bitvectors can be seen through different lenses:
• unsigned value
• signed value
• floating-point value (not covered here)
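To illustrate the two integer lenses, the sketch below reinterprets the same 16 bits as a signed value. The helper name is hypothetical; it relies only on the fact that int16_t is a two's complement type.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the same 16 bits as a signed (two's complement) value.
 * The bits are copied unchanged; only the interpretation differs. */
int16_t as_signed16(uint16_t bits) {
    int16_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}
```

For example, the bitvector 0xff00 reads as 65280 when unsigned and as -256 through as_signed16.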
Transfer
Move
mov dst, src dst := src
xchg o1, o2 tmp:= o1; o1 := o2; o2 := tmp
Arithmetic
All 4 arithmetic operations are present
add dst, src    dst ← dst + src
sub dst, src    dst ← dst - src
div src         t64 ← edx @ eax
                eax ← t64 / src
                edx ← t64 % src
mul src         t64 ← eax * src
                edx ← t64{32,63}
                eax ← t64{0,31}
(@ is concatenation: edx holds the upper 32 bits)
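The implicit use of edx and eax by mul and div can be modeled in C. This is a sketch of the unsigned 32-bit forms only; the function names are illustrative, and the -1 return value stands in for the divide error the processor raises when the divisor is zero or the quotient does not fit in eax.

```c
#include <stdint.h>

/* Models unsigned 32-bit `mul src`: edx:eax <- eax * src. */
void mul32(uint32_t *eax, uint32_t *edx, uint32_t src) {
    uint64_t t64 = (uint64_t)*eax * src;
    *edx = (uint32_t)(t64 >> 32);   /* t64{32,63} */
    *eax = (uint32_t)t64;           /* t64{0,31}  */
}

/* Models unsigned 32-bit `div src`: divides edx:eax by src.
 * Returns 0 on success, -1 where the hardware would fault. */
int div32(uint32_t *eax, uint32_t *edx, uint32_t src) {
    if (src == 0) return -1;
    uint64_t t64 = ((uint64_t)*edx << 32) | *eax;   /* edx @ eax */
    uint64_t quotient = t64 / src;
    if (quotient > UINT32_MAX) return -1;
    *edx = (uint32_t)(t64 % src);
    *eax = (uint32_t)quotient;
    return 0;
}
```

Multiplying eax = 0xffffffff by 2 leaves edx = 1 and eax = 0xfffffffe; dividing that pair by 2 restores eax = 0xffffffff with edx = 0.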
Arithmetic
inc dst             dst ← dst + 1
dec dst             dst ← dst - 1
sal/sar dst, src    arithmetic shift left / right
Sign preservation
    mov ax, 0xff00    # unsigned: 65280, signed: -256
                      # ax=1111.1111.0000.0000
    sal ax, 2         # unsigned: 64512, signed: -1024
                      # ax=1111.1100.0000.0000
    sar ax, 5         # unsigned: 65504, signed: -32
                      # ax=1111.1111.1110.0000
Basic logical operators
Basic semantics
and dst, src    dst ← dst & src
or dst, src     dst ← dst | src
xor dst, src    dst ← dst ^ src
not dst         dst ← ~dst
Examples
    xor ax, ax        # ax = 0x0000
    not ax            # ax = 0xffff
    mov bx, 0x5500    # bx = 0x5500
    xor ax, bx        # ax = 0xaaff
Logical shifts
Shift
shl dst, src    logical shift left
shr dst, src    logical shift right
Logical and arithmetic left shifts are the same.
Example
    mov ax, 0xff00    # unsigned: 65280, signed: -256
                      # ax=1111.1111.0000.0000
    shl ax, 2         # unsigned: 64512, signed: -1024
                      # ax=1111.1100.0000.0000
    shr ax, 5         # unsigned: 2016, signed: 2016
                      # ax=0000.0111.1110.0000
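The difference between the arithmetic and logical right shifts can be reproduced in C on 16-bit values. This is a sketch with hypothetical helper names; it assumes a shift count between 0 and 15 and does not model the flags or the processor's masking of the count.

```c
#include <stdint.h>

/* shl / sal: both shift zeros in from the right. */
uint16_t shl16(uint16_t x, unsigned n) {
    return (uint16_t)((unsigned)x << n);
}

/* shr: zeros are shifted in from the left. */
uint16_t shr16(uint16_t x, unsigned n) {
    return (uint16_t)(x >> n);
}

/* sar: copies of the sign bit are shifted in from the left. */
uint16_t sar16(uint16_t x, unsigned n) {
    uint16_t r = (uint16_t)(x >> n);
    if (x & 0x8000u)
        r |= (uint16_t)~(0xffffu >> n);   /* fill the vacated top bits */
    return r;
}
```

Starting from 0xff00, shl16(x, 2) gives 0xfc00; from there sar16(x, 5) gives 0xffe0 while shr16(x, 5) gives 0x07e0.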
Comparison and test instructions
Comparison
cmp dst, src: sets the flags according to dst − src (the result is discarded)
Test
test dst, src: sets the flags according to dst & src (the result is discarded)
Stack manipulation
Push
push src    sp := sp - size(src); @[sp] := src
Pop
pop dst     dst := @[sp]; sp := sp + size(dst)
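A toy model of push and pop on a stack that grows toward lower addresses, as a sketch. The toy_stack type and its 64-byte memory are assumptions made for illustration, values are 4 bytes wide, and no bounds checks are performed.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned char mem[64];   /* toy memory holding the stack */
    uint32_t sp;             /* offset of the top of the stack in mem */
} toy_stack;

/* The stack starts empty, with sp just past the highest address. */
void toy_init(toy_stack *s) { s->sp = sizeof s->mem; }

/* push: decrement sp first, then store. */
void toy_push(toy_stack *s, uint32_t v) {
    s->sp -= 4;
    memcpy(&s->mem[s->sp], &v, 4);
}

/* pop: load first, then increment sp. */
uint32_t toy_pop(toy_stack *s) {
    uint32_t v;
    memcpy(&v, &s->mem[s->sp], 4);
    s->sp += 4;
    return v;
}
```

After toy_init, two pushes leave sp at 56, and the values come back in reverse order.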
Nops
The nop instruction does nothing (it’s skip!).
There are lots of nop instructions.
Assembly                                      Byte sequence
66 NOP                                        66 90H
NOP DWORD ptr [EAX]                           0F 1F 00H
NOP DWORD ptr [EAX + 00H]                     0F 1F 40 00H
NOP DWORD ptr [EAX + EAX*1 + 00H]             0F 1F 44 00 00H
66 NOP DWORD ptr [EAX + EAX*1 + 00H]          66 0F 1F 44 00 00H
NOP DWORD ptr [EAX + 00000000H]               0F 1F 80 00 00 00 00H
NOP DWORD ptr [EAX + EAX*1 + 00000000H]       0F 1F 84 00 00 00 00 00H
66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]    66 0F 1F 84 00 00 00 00 00H
Misc
Lea (load effective address)
lea dst, [src] dst := src
mov dst, [src] dst := @[src]
Int
int n runs interrupt number n
Unconditional jump instructions
Call
call address
call *op
call pushes the return address (the eip of the next instruction), then jumps
Jmp
jmp *op
jmp address
jmp only jumps
Extra jumps
Leave
esp := ebp; ebp := pop();
Ret
esp := esp + 4; eip := @[esp - 4];
Unsigned jumps
jump  if     n version  e version
ja    above  ✓          ✓
jb    below  ✓          ✓
jc    carry  ✓          ✗
Reading
"ja has n and e versions" means that the mnemonics
• jna (not above),
• jae (above or equal),
• jnae (not above or equal)
exist as well
Signed jumps
jump  if        n version  e version
jg    greater   ✓          ✓
jl    less      ✓          ✓
jo    overflow  ✓          ✗
js    sign      ✓          ✗
Addressing modes
The addressing mode determines, for an instruction that
accesses a memory location, how the address for the memory
location is specified.
Mode                             Intel
Immediate                        mov ax, 16h
Direct                           mov ax, [1000h]
Register Direct                  mov bx, ax
Register Indirect (indexed)      mov ax, [di]
Based Indexed                    mov ax, [bx + di]
Based Indexed with Displacement  mov eax, [ebx + edi + 2]
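The address computed by the memory modes in the table can be written as a single formula. The sketch below is illustrative: the function name is hypothetical, and the scale parameter is an addition that does not appear in the table (a scale of 1 corresponds to the forms shown there).

```c
#include <stdint.h>

/* Effective address = base + index * scale + displacement,
 * computed modulo 2^32 as on a 32-bit machine. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp) {
    return base + index * scale + (uint32_t)disp;
}
```

With ebx = 0x1000 and edi = 0x10, the operand [ebx + edi + 2] designates address 0x1012.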
The semantics of instructions may seem intuitive but is complex
Instructions do have side effects
    // 04 16 / add al, 0x16
    0: res8 := (eax(32){0,7} + 22(8))
    1: OF := ((eax(32){0,7}{7} = 22(8){7}) &
              (eax(32){0,7}{7} != res8(8){7}))
    2: SF := (res8(8) <s 0(8))
    3: ZF := (res8(8) = 0(8))
    4: AF := ((extu eax(32){0,7}{0,7} 9) + 22(9)){8}
    5: PF := !
       ((((((((res8(8){0} ^ res8(8){1}) ^ res8(8){2}) ^
       res8(8){3}) ^ res8(8){4}) ^ res8(8){5}) ^
       res8(8){6}) ^ res8(8){7}))
    6: CF := ((extu eax(32){0,7} 9) + 22(9)){8}
    7: eax{0, 7} := res8(8)
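The same side effects can be spelled out in C for an 8-bit addition. This is a sketch following the flag definitions in the listing above; the add8_result type and add8 function are illustrative names, and AF is left out.

```c
#include <stdint.h>

typedef struct {
    uint8_t res;              /* 8-bit result written back to al */
    int of, sf, zf, pf, cf;   /* flags updated as side effects */
} add8_result;

add8_result add8(uint8_t a, uint8_t b) {
    add8_result r;
    unsigned wide = (unsigned)a + b;        /* 9-bit sum */
    r.res = (uint8_t)wide;
    r.cf = (wide >> 8) & 1;                 /* carry out of bit 7 */
    r.zf = (r.res == 0);
    r.sf = (r.res >> 7) & 1;                /* result negative if signed */
    /* Overflow: operands share a sign that the result does not have. */
    r.of = ((a >> 7) == (b >> 7)) && ((a >> 7) != (r.res >> 7));
    /* Parity: set when the result has an even number of 1 bits. */
    uint8_t p = r.res;
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    r.pf = !(p & 1);
    return r;
}
```

For instance, add8(0x70, 0x16) yields 0x86 with OF and SF set and CF clear: the sum of two positive signed bytes came out negative.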
Real behavior of conditions
Mnemonic  Flag            cmp x y   sub x y   test x y
ja        ¬CF ∧ ¬ZF       x >u y    x′ ≠ 0    x & y ≠ 0
jnae      CF              x <u y    x′ ≠ 0    ⊥
je        ZF              x = y     x′ = 0    x & y = 0
jge       OF = SF         x ≥ y     ⊤         x ≥ 0 ∨ y ≥ 0
jle       ZF ∨ OF ≠ SF    x ≤ y     ⊤         x & y = 0 ∨ (x < 0 ∧ y < 0)
Shift left
The OF flag is affected only on 1-bit shifts. For left
shifts, the OF flag is set to 0 if the most-significant
bit of the result is the same as the CF flag (that is, the
top two bits of the original operand were the same);
otherwise, it is set to 1. For the SAR instruction, the
OF flag is cleared for all 1-bit shifts. For the SHR
instruction, the OF flag is set to the most-significant
bit of the original operand.
The OF flag is affected only for 1-bit shifts (see "Description" above); otherwise, it is undefined.
Memory segments
General overview
A compiled program has 5 segments:
1. code (text)
2. stack
3. data segments
3.1 data
3.2 bss
3.3 heap
Execution
1. Read instruction i @ eip
2. Add byte length of i to eip
3. Execute i
4. Goto 1
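The loop above can be sketched as a toy interpreter. The instruction set here is invented for illustration (it is not x86): three one-byte opcodes and one two-byte LOAD, which is enough to show why the instruction length must be known before ip is advanced.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { size_t ip; uint8_t acc; } toy_cpu;

/* Hypothetical opcodes: LOAD takes a one-byte operand, the others none. */
enum { OP_NOP = 0x00, OP_INC = 0x01, OP_LOAD = 0x02, OP_HALT = 0xff };

/* Runs until OP_HALT or the end of code; returns the number of
 * instructions executed. Unknown opcodes behave like OP_NOP. */
size_t toy_run(toy_cpu *cpu, const uint8_t *code, size_t len) {
    size_t executed = 0;
    while (cpu->ip < len) {
        uint8_t op = code[cpu->ip];              /* 1. read instruction at ip */
        size_t size = (op == OP_LOAD) ? 2 : 1;
        if (cpu->ip + size > len) break;         /* truncated instruction */
        const uint8_t *operand = &code[cpu->ip + 1];
        cpu->ip += size;                         /* 2. add its length to ip */
        executed++;
        if (op == OP_HALT) break;                /* 3. execute */
        if (op == OP_INC) cpu->acc++;
        if (op == OP_LOAD) cpu->acc = operand[0];
    }                                            /* 4. go to 1 */
    return executed;
}
```

Note that ip already points at the next instruction while the current one executes, which is why a call saves the address of the instruction that follows it.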
Graphically speaking
stack    function, locals
         the hole
         the break
heap     malloc, free
bss      globals
data
text
Text segment
• The text segment (aka code segment) is where the code resides.
• It is not writable. Any attempt to write to it will kill the program.
• As it is read-only, it can be shared among processes.
• It has a fixed size.
Data & bss segments
• The data segment is filled with initialized global and static variables.
• The bss segment contains the uninitialized ones. It is zeroed on program startup.
• The segments are (of course) writable.
• They have a fixed size.
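A small C sketch of the distinction. The variable and function names are illustrative, and the placement noted in the comments is what toolchains typically do rather than something the C language requires; the zero initialization itself is guaranteed by C.

```c
#include <stddef.h>

static int initialized_counter = 42;   /* has an initializer: typically in data */
static int zeroed_counter;             /* no initializer: typically in bss */
static char zeroed_buffer[64];         /* likewise, zero-filled at startup */

int data_value(void) { return initialized_counter; }

/* Returns 1 if the uninitialized statics really start out as zero. */
int bss_is_zeroed(void) {
    if (zeroed_counter != 0) return 0;
    for (size_t i = 0; i < sizeof zeroed_buffer; i++)
        if (zeroed_buffer[i] != 0) return 0;
    return 1;
}
```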
Heap segment
• The heap segment is directly controlled by the programmer.
• Blocks can be allocated or freed and used for anything.
• It is writable.
• It can grow toward higher memory addresses, or shrink, as needed.
Stack segment
• The stack segment is a temporary scratch pad for functions.
• Since eip changes on function calls, the stack is used to remember the previous state (return address, calling function base, arguments, ...).
• It is writable.
• It grows toward lower memory addresses as functions are called.
In C
    void test_function(int a, int b, int c, int d)
    {
        int flag;
        char buffer[10];
        flag = 31337;
        buffer[0] = 'A';
    }

    int main()
    {
        test_function(1, 2, 3, 4);
    }
Stack-based buffer overflows
C low-level responsibility
In C, the programmer is responsible for data integrity.
This means there are no guards to ensure that data is freed, or that
the contents of a variable fit into the memory reserved for it.
This exposes programs to memory leaks and buffer overflows.
Reminder: stack layout for x86
Stack
  stack frame f
    return address f
    saved frame pointer f
    locals f
  arguments g
  stack frame g
    return address g
    saved frame pointer g
    locals g
      pointer to data
      buffer
Code
  f: ...
     call g
     ...
Data
  val1
  val2
Vulnerability reason
• When an array a is declared in C, space is reserved for it.
• a will be manipulated through offsets from its base pointer.
• At run-time, no information about the array size is present.
• Thus, nothing prevents copying data beyond the end of a.
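Since the array carries no size at run-time, any check has to be supplied by the programmer. Below is a minimal sketch of a copy that receives the destination size explicitly; copy_checked is a hypothetical helper, not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
 * Returns 0 on success, -1 if src is too long (dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (len >= dst_size) return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

A caller would pass sizeof on the destination array, for example copy_checked(buf, sizeof buf, input), and treat -1 as a rejected input.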
A rich history
1972 First documented attack
1988 Morris worm
1995 NCSA httpd 1.3
1996 Smashing the Stack for Fun & Profit
Basic exploitation
Stack
  stack frame f
    return address f
    saved frame pointer f
    locals f
  arguments g
  stack frame g
    return address g
    saved frame pointer g
    locals g
      pointer to data
      injected code
Code
  f: ...
     call g
     ...
Data
  val1
  val2
Frame pointer overwriting
Stack
  stack frame f
    return address f
    saved frame pointer f
    locals f
  arguments g
  stack frame g
    return address g
    saved frame pointer g
    locals g
      pointer to data
      return address f
      saved frame pointer f
      injected code
Code
  f: ...
     call g
     ...
Data
  val1
  val2
Indirect pointer overwriting
Stack
  stack frame f
    return address f
    saved frame pointer f
    locals f
  arguments g
  stack frame g
    return address g
    saved frame pointer g
    locals g
      pointer to data
      injected code
Code
  f: ...
     call g
     ...
Data
  val1
  val2
Example 1
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];
        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) { printf("Usage: %s <password>\n", argv[0]); exit(0); }
        if (check_authentication(argv[1])) printf("\nAccess Granted.\n");
        else printf("\nAccess Denied.\n");
    }
Example 2
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        char password_buffer[16]; /* Putting buffers before variables to impede ... */
        int auth_flag = 0;
        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) { printf("Usage: %s <password>\n", argv[0]); exit(0); }
        if (check_authentication(argv[1])) printf("\nAccess Granted.\n");
        else printf("\nAccess Denied.\n");
    }
Constraints
Needs
• Hardware willing to execute data as code
• No null bytes in the payload (strcpy stops at the first one)
Variants
• Frame pointer corruption
• Causing an exception to execute a specific function pointer
Statistics # (https://nvd.nist.gov/vuln)
Statistics % (https://nvd.nist.gov/vuln)
Heap-based buffer overflows
Vulnerability
Heap memory is dynamically allocated at runtime.
Arrays on the heap overflow just as well as those on the stack.
Warning
The heap grows towards higher addresses instead of lower
addresses.
This is the opposite of the stack.
Basic exploitation
Overwriting heap-based function pointers located after the buffer
Overwriting virtual function pointers
1998            IE4 heap overflow
2002            Slapper worm (Linux, OpenSSL)
CVE-2007-1365   OpenBSD: 2nd remote exploit in 10 years
CVE-2017-11779  Windows DNS client
Overwriting heap-based function pointers
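    #include <string.h>

    /* MAX_LEN is not defined on the slide; 64 is a placeholder assumption. */
    #define MAX_LEN 64

    typedef struct _vulnerable_struct
    {
        char buff[MAX_LEN];
        int (*cmp)(char*,char*);
    } vulnerable;

    int is_file_foobar_using_heap(vulnerable* s, char* one, char* two)
    {
        /* Neither copy is bounded: a long input overruns buff into cmp. */
        strcpy( s->buff, one );
        strcat( s->buff, two );
        return s->cmp(s->buff, "foobar");
    }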
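For contrast, here is a sketch of the same operation with a bounded concatenation, so that the function pointer stored after the buffer cannot be reached by the copy. The guarded type, the MAX_LEN value of 16, and the function name are all assumptions made for this illustration; snprintf is standard C.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 16   /* hypothetical size, chosen for the example */

typedef struct {
    char buff[MAX_LEN];
    int (*cmp)(const char *, const char *);
} guarded;

/* Returns -1 if one followed by two does not fit in buff,
 * otherwise the result of s->cmp. */
int is_file_foobar_checked(guarded *s, const char *one, const char *two) {
    int n = snprintf(s->buff, sizeof s->buff, "%s%s", one, two);
    if (n < 0 || (size_t)n >= sizeof s->buff)
        return -1;   /* truncated: refuse instead of comparing */
    return s->cmp(s->buff, "foobar");
}
```

With cmp set to strcmp, the inputs "foo" and "bar" compare equal (result 0), while two 12-character inputs are rejected and cmp is left intact.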
Constraints
• Ability to determine the address of the heap
• If string-based, no null bytes
Variants
• Corrupt pointers in other (adjacent) data structures
• Corrupt heap metadata
Statistics # (https://nvd.nist.gov/vuln)
Statistics % (https://nvd.nist.gov/vuln)
Format strings
About format strings vulnerabilities
They were the ‘spork‘ of exploitation. ASLR? PIE?
NX Stack/Heap? No problem, fmt had you covered.
Vulnerability
Format functions are variadic.
    int printf(const char *format, ...);
How it works
• The format string is copied to the output unless ’%’ is
encountered.
• Then the format specifier will manipulate the output.
• When an argument is required, it is expected to be on the
stack.
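The mechanism can be sketched with a tiny variadic formatter. mini_format is a hypothetical function that handles only %d, %s and %%; the point is that each conversion fetches the next argument with va_arg purely on the say-so of the format string, whether or not the caller actually passed one.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Formats into out (at most out_size bytes, always terminated).
 * Returns the number of characters stored, terminator excluded. */
size_t mini_format(char *out, size_t out_size, const char *format, ...) {
    va_list args;
    size_t used = 0;
    if (out_size == 0) return 0;
    out[0] = '\0';
    va_start(args, format);
    for (const char *p = format; *p != '\0' && used + 1 < out_size; p++) {
        if (*p != '%') {              /* ordinary character: copy it */
            out[used++] = *p;
            out[used] = '\0';
            continue;
        }
        p++;                          /* look at the conversion character */
        if (*p == '\0') break;
        int n;
        if (*p == 'd')                /* fetch an int because the format says so */
            n = snprintf(out + used, out_size - used, "%d", va_arg(args, int));
        else if (*p == 's')           /* fetch a pointer and dereference it */
            n = snprintf(out + used, out_size - used, "%s",
                         va_arg(args, const char *));
        else                          /* "%%" and unknown conversions: copy as is */
            n = snprintf(out + used, out_size - used, "%c", *p);
        if (n < 0) break;
        size_t room = out_size - used;
        used += ((size_t)n < room) ? (size_t)n : room - 1;
    }
    va_end(args);
    return used;
}
```

Calling mini_format(buf, sizeof buf, "%s has %d%%", "dave", 100) stores "dave has 100%". Nothing in the function can tell how many arguments were really supplied.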
Caveat
And so...
If an attacker is able to specify the format string, they can
control what the function pops from the stack and
can make the program write to arbitrary memory locations.
CVEs
Software                                   CVE
Zend                                       2015-8617
latex2rtf                                  2015-8106
VMware 8x                                  2012-3569
WuFTPD (providing remote root since 1994)  2000
Good & Bad
Good ✓
    int f (char *user) {
        printf("%s", user);
    }
Bad ✗
    int f (char *user) {
        printf(user);
    }
Exploitation
Badly formatted format parameters can lead to:
• arbitrary memory read (data leak)
• arbitrary memory write
• rewriting the .dtors section
• overwriting the Global Offset Table (.got)
Example
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
        char text[1024];
        static int test_val = 65;
        if (argc < 2) {
            printf("Usage: %s <text to print>\n", argv[0]);
            exit(0);
        }
        strcpy(text, argv[1]);
        printf("The right way to print user-controlled input:\n");
        printf("%s", text);
        printf("\nThe wrong way to print user-controlled input:\n");
        printf(text);
        // Debug output (assumes a 32-bit build, where a pointer fits %08x)
        printf("\n[*] test_val @ 0x%08x = %d 0x%08x\n",
               &test_val, test_val, test_val);
        exit(0);
    }
Stack situation
fmt
...
argn
...
arg1
&fmt
Reading from arbitrary addresses
The %s format specifier can be used to read from arbitrary
addresses.
First, %08x walks up the stack until the input's own bytes
(0x41414141 = AAAA) show up, here as the fourth value.
    $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
    The right way to print user-controlled input:
    AAAA%08x.%08x.%08x.%08x
    The wrong way to print user-controlled input:
    AAAAffffcbc0.f7ffcfd4.565555c7.41414141
    [*] test_val @ 0x56557028 = 65 0x00000041
Printing local variable
    $ ./fmt_vuln $(printf "\x28\x70\x55\x56")%08x.%08x.%08x.%s
    The right way to print user-controlled input:
    (pUV%08x.%08x.%08x.%s
    The wrong way to print user-controlled input:
    (pUVffffcbc0.f7ffcfd4.565555c7.A
    [*] test_val @ 0x56557028 = 65 0x00000041
65 is the ASCII value of 'A'
Writing to arbitrary memory
Just as %s reads from arbitrary addresses, %n can be used to write to them.
    $ ./fmt_vuln $(printf "\x28\x70\x55\x56")%08x.%08x.%08x.%n
    The right way to print user-controlled input:
    (pUV%08x.%08x.%08x.%n
    The wrong way to print user-controlled input:
    (pUVffffcbc0.f7ffcfd4.565555c7.
    [*] test_val @ 0x56557028 = 31 0x0000001f
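The value 31 is simply the number of characters printed before %n was reached: 4 address bytes plus three times 9 characters for "%08x.". The sketch below reproduces that count with a legitimate, fixed format string; bytes_before_n is a hypothetical helper, and some C libraries restrict or disable %n, in which case it would not behave as described.

```c
#include <stdio.h>

/* Formats "<prefix>%08x.%08x.%08x." and returns the count stored by %n,
 * i.e. the number of characters produced before %n, or -1 on error. */
int bytes_before_n(const char *prefix) {
    char out[128];
    int count = -1;
    int rc = snprintf(out, sizeof out, "%s%08x.%08x.%08x.%n",
                      prefix, 0xffffcbc0u, 0xf7ffcfd4u, 0x565555c7u, &count);
    return rc < 0 ? -1 : count;
}
```

With a 4-character prefix the count is 31 (0x1f), matching the new value of test_val in the transcript.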
It may be unintentional
• printf("100% dave") prints the stack entry above the saved eip
• printf("%s") prints the bytes pointed to by that stack entry
• printf("%d %d %d ...") prints a series of stack entries as integers
• printf("%08x %08x %08x ...") does the same, as hexadecimal values
• printf("100% no way") writes 3 to the address pointed to by the stack entry
Statistics # (https://nvd.nist.gov/vuln)
Statistics % (https://nvd.nist.gov/vuln)
Looking back
               Buffer overflow      Format string
public since   ≈ 1985               1999
dangerous      1990's               2000
# exploits     thousands            dozens
considered     security threat      programming bug
techniques     evolved & advanced   basic
visibility     sometimes hard       easy
Play (exploitation) games
https://microcorruption.com
Questions ?
https://rbonichon.github.io/teaching/2020/asi36/