ch4 Handouts

The document provides an overview of static and dynamic analysis techniques for examining malware, with a focus on disassembly. It discusses six levels of program abstraction from hardware to high-level languages. The key concepts covered include disassembling malware from machine code into assembly language, the structure of x86 assembly language, basic instructions like MOV and NOP, registers including EIP and EFLAGS, and data types like operands, memory addresses, and opcodes.

Uploaded by

Marah Irshedat

Practical Malware Analysis

Ch 4: A Crash Course in x86


Disassembly
Basic Techniques
• Basic static analysis
– Looks at malware from the outside
• Basic dynamic analysis
– Only shows you how the malware operates in one
case
• Disassembly
– View code of malware & figure out what it does
Levels of Abstraction
Six Levels of Abstraction
• Hardware
• Microcode: Also called firmware
– Only operates on specific hardware it was designed
for
• Machine code: opcodes that tell the processor
to do something
– Usually compiled from high-level language programs
• Low-level languages
• High-level languages
• Interpreted languages
Low-level languages
• Human-readable version of processor's
instruction set
• Assembly language
– PUSH, POP, NOP, MOV, JMP ...
• Disassembler generates assembly language
• This is the highest level language that can be
reliably recovered from malware when source
code is unavailable
High-level languages
• Most programmers use these
• C, C++, etc.
• Converted to machine code by a compiler
Interpreted languages
• Highest level
• Java, C#, Perl, .NET, Python
• Code is not compiled into machine code
• It is translated into bytecode
– An intermediate representation
– Independent of hardware and OS
– Bytecode executes in an interpreter, which
translates bytecode into machine language on the
fly at runtime
– Ex: Java Virtual Machine
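As a rough illustration of the bytecode idea above, the sketch below uses Python's standard dis module to list the bytecode of a small function. The function add is hypothetical, and the exact opcode names depend on the CPython version, so the printed list is an example rather than a fixed result.

```python
import dis


def add(a, b):
    """Tiny function whose bytecode we want to inspect."""
    return a + b


# Collect the opcode names the CPython compiler produced for add().
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

The interpreter executes these opcodes one by one at runtime; none of them is an x86 instruction.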
Reverse Engineering
Disassembly
• Malware on a disk is in binary form at the
machine code level
• Disassembly converts the binary form to
assembly language
• IDA Pro is the most popular disassembler
Assembly Language
• Different versions for each type of processor
• x86 – 32-bit Intel (most common)
• x64 – 64-bit Intel
• SPARC, PowerPC, MIPS, ARM – others
• Windows runs on x86 or x64
• x64 machines can run x86 programs
• Most malware is designed for x86
The x86 Architecture
• Registers
– Data storage within the CPU
– Faster than RAM
• ALU (Arithmetic Logic Unit)
– Executes an instruction and places results in registers or RAM
• Control unit
– Fetches instructions from RAM using a register named the instruction pointer
Data and Code
• Data are values placed in RAM when a
program loads
– These values are sometimes called static
• Their location is fixed when the program loads; the contents of writable data can still change at run time
– They are also global
• Available to any part of the program
• Code: Instructions for the CPU
– Controls what the program does
Heap and Stack
• Heap is dynamic memory
– Changes frequently during program execution
– Program allocates new values, and frees them
when they are no longer needed
• Stack
– Holds local variables and parameters for functions
– Used to control program flow
Instructions
• Mnemonic followed by operands
• mov ecx, 0x42
– Move the value 42 (hex) into the Extended C register
• mov ecx is 0xB9 in hexadecimal
• The 32-bit value 0x42 is stored little-endian as 42 00 00 00
• As machine code, the whole instruction is B9 42 00 00 00
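The encoding described above can be reproduced in a few lines. This is a minimal sketch that handles only the single form mov ecx, imm32; the helper name encode_mov_ecx is hypothetical.

```python
import struct

MOV_ECX_IMM32 = 0xB9  # opcode byte for "mov ecx, imm32"


def encode_mov_ecx(value):
    """Return the 5 machine-code bytes for mov ecx, <value>."""
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return bytes([MOV_ECX_IMM32]) + struct.pack("<I", value)


encoded = encode_mov_ecx(0x42)
print(encoded.hex(" "))  # b9 42 00 00 00
```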
Endianness
• Big-Endian
– Most significant byte first
– 0x42 as a 32-bit value would be 0x00000042
• Little-Endian
– Least significant byte first
– 0x42 as a 32-bit value would be 0x42000000
• Network data uses big-endian
• x86 programs use little-endian
IP Addresses
• 127.0.0.1, or in hex, 7F 00 00 01
• Sent over the network as 0x7F000001
• Stored in RAM as 0x0100007F
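The two byte orders can be checked directly. The sketch below uses only the Python standard library and applies both orders to the value 0x42 and to the address 127.0.0.1 from the slides; the variable names are illustrative.

```python
import ipaddress

value = 0x42
big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first
print(big.hex(), little.hex())        # 00000042 42000000

addr = int(ipaddress.IPv4Address("127.0.0.1"))  # 0x7F000001
wire = addr.to_bytes(4, "big")     # byte order sent over the network
ram = addr.to_bytes(4, "little")   # byte order of the integer in x86 RAM
print(wire.hex(), ram.hex())       # 7f000001 0100007f
```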
Operands
• Immediate
– Fixed values like 0x42
• Register
– eax, ebx, ecx, and so on
• Memory address (indirect addressing)
– Denoted with brackets, like [eax]
Registers
• General registers
– Used by the CPU during execution
• Segment registers
– Used to track sections of memory
• Status flags
– Used to make decisions (EFLAGS)
• Instruction pointer
– Address of next instruction to execute (RIP/EIP)
Registers
x86/x64 Architecture: Registers

64-bit    Lower 32 bits  Lower 16 bits  Lower byte  2nd byte
RAX       EAX            AX             AL          AH
RBX       EBX            BX             BL          BH
RCX       ECX            CX             CL          CH
RDX       EDX            DX             DL          DH
RSI       ESI            SI             SIL
RDI       EDI            DI             DIL
RBP       EBP            BP             BPL
RSP       ESP            SP             SPL
R8–R15    R8D–R15D       R8W–R15W       R8L–R15L
x86/x64 Architecture: Registers
• General purpose registers
– RAX: (Accumulator) used in arithmetic operations
– RBX: (Base) used as a pointer to data
– RCX: (Counter) used in shift/rotate instructions and loops
– RDX: (Data) used in arithmetic operations & I/O operations
– RSI: data source index
– RDI: data destination index
– R8–R15
• Added & available only in x64
– RBP: stack base pointer (base of the current stack frame)
– RSP: stack pointer (top of stack)
• Points to last occupied location (not next one to use)

• Special purpose registers


– RIP: instruction pointer, i.e., program counter (PC)
• Holds address of the next instruction to be executed
x86/x64 Architecture: Registers
• Floating-point arithmetic registers
• Memory segmentation registers
– Used to track sections of memory
– CS, SS, DS, ES, FS, GS
– Seldom used directly by modern code; in 64-bit mode only FS and GS still matter (e.g., for thread-local data)
• Control registers
– Used by the kernel only to control the CPU’s behavior, e.g., to switch
between protected mode & real mode
– CR0, CR2, CR3, CR4, and CR8 (64-bit mode only)
• Debug registers
– Used by the kernel only to provide hardware support for debugging
features such as breakpoints
– DR0–DR7
Size of Registers
• General registers (in x64) are all 64 bits in size
– Can be referenced as 64 bits (rdx)
• General registers (in x86) are all 32 bits in size
– Can be referenced as either 32 bits (edx) or 16 bits
(dx)
• Four registers (eax, ebx, ecx, edx) can also be
referenced as 8-bit values
– AL is the lowest 8 bits (bits 0–7)
– AH is the next 8 bits (bits 8–15)
x64 RAX register breakdown (64 bits)
Binary: 0000 0000 0000 0000 0000 0000 0000 0000 1010 1001 1101 1100 1000 0001 1111 0101
Hex:    00000000A9DC81F5
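The sub-register views of that RAX value can be worked out with masks and shifts. This is a sketch of the arithmetic only, not of how the CPU implements register aliasing.

```python
RAX = 0x00000000A9DC81F5  # value from the register breakdown above

eax = RAX & 0xFFFFFFFF    # lower 32 bits
ax = RAX & 0xFFFF         # lower 16 bits
al = RAX & 0xFF           # lowest byte (bits 0-7)
ah = (RAX >> 8) & 0xFF    # second byte (bits 8-15)

print(f"EAX={eax:#010x} AX={ax:#06x} AL={al:#04x} AH={ah:#04x}")
# EAX=0xa9dc81f5 AX=0x81f5 AL=0xf5 AH=0x81
```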


General Registers
• Typically store data or memory addresses
• Normally interchangeable
• Some instructions reference specific registers
– Multiplication and division use EAX and EDX
• Conventions
– Compilers use registers in consistent ways
– EAX contains the return value for function calls
Flags
• EFLAGS is a status register, 32 bits in size
• Each bit is a flag, SET (1) or Cleared (0)
• ZF Zero flag
– Set when the result of an operation is zero
• CF Carry flag
– Set when result is too large or small for destination
• SF Sign Flag
– Set when result is negative, or when most significant bit
is set after arithmetic
• TF Trap Flag
– Used for debugging—if set, processor executes only one
instruction at a time
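To make the flag definitions concrete, here is a small sketch that models how a 32-bit add would set ZF, CF, and SF. The function add32 is hypothetical and models only these three flags.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Add two 32-bit values; return (result, flags) for ZF, CF, and SF."""
    full = (a & MASK32) + (b & MASK32)
    result = full & MASK32
    flags = {
        "ZF": result == 0,        # result is zero
        "CF": full > MASK32,      # carry out of bit 31
        "SF": bool(result >> 31), # most significant bit of the result
    }
    return result, flags


result, flags = add32(0xFFFFFFFF, 1)
print(hex(result), flags)  # 0x0 {'ZF': True, 'CF': True, 'SF': False}
```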
EIP (Extended Instruction Pointer)
• Contains the memory address of the next
instruction to be executed
• If EIP contains wrong data, the CPU will fetch
non-legitimate instructions and crash
• Buffer overflows target EIP
Simple Instructions
Simple Instructions
• mov destination, source
– Moves data from one location to another
• We use Intel format throughout this course,
with destination first
• Remember indirect addressing
– [ebx] means the memory location pointed to
by EBX
lea (Load Effective Address)
• lea destination, source
• lea eax, [ebx+8]
– Puts ebx + 8 into eax
• Compare to
– mov eax, [ebx+8]
– Moves the data at location ebx+8 into eax
lea (Load Effective Address)
• The lea instruction is not used exclusively to refer to
memory addresses.
• It is useful when calculating values, because it requires
fewer instructions.
– For example, it is common to see an instruction such as
lea ebx, [eax*5+5]
where eax is a number, rather than a memory address.
– This instruction is the functional equivalent of ebx = (eax+1)*5,
but it is cheaper for the compiler to emit than a total of
four instructions, for example:
inc eax
mov ecx, 5
mul ecx
mov ebx, eax
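The equivalence claimed above can be checked with 32-bit wraparound arithmetic. Both function names below are hypothetical, and the second one models only the values involved, not the flags.

```python
MASK32 = 0xFFFFFFFF


def lea_form(eax):
    """lea ebx, [eax*5+5]"""
    return (eax * 5 + 5) & MASK32


def four_instruction_form(eax):
    eax = (eax + 1) & MASK32  # inc eax
    ecx = 5                   # mov ecx, 5
    product = eax * ecx       # mul ecx -> 64-bit result in EDX:EAX
    eax = product & MASK32    # low 32 bits land in EAX
    return eax                # mov ebx, eax


samples = [0, 1, 7, 0x12345678, 0xFFFFFFFF]
print(all(lea_form(v) == four_instruction_form(v) for v in samples))  # True
```

Beyond being shorter, the lea form leaves EAX, ECX, EDX, and the flags untouched, which the four-instruction sequence does not.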
Arithmetic
• sub Subtracts
• add Adds
• inc Increments
• dec Decrements
• mul Multiplies
• div Divides
NOP
• Does nothing
• 0x90
• Commonly repeated to form a NOP sled
• A NOP sled allows attackers to run code even if they are
imprecise about jumping to it
A First Look @ Instruction Set: Intel vs. AT&T

Intel Syntax     AT&T Syntax        Description
add r8, r9       add %r9, %r8       r8 = r8 + r9
mov r8, r9       mov %r9, %r8       Move data from r9 into r8
mov r8, 0x99     mov $0x99, %r8     r8 = 0x99 (immediate value)
mov r8, [r9]     mov (%r9), %r8     Move data from address pointed to by r9 into r8
mov [r8], r9     mov %r9, (%r8)     Move data from r9 to address pointed to by r8
push r8          push %r8           Push r8 onto the stack
pop r8           pop %r8            Pop top of stack into r8
Instructions: Machine-Level Structure

• Variable-length instructions: 1 byte up to 15 bytes
• Instruction code examples

Instruction           Instruction code   Description
mov ecx, 0x42         B9 42 00 00 00     Little-endian value; 0x-prefix hex notation (Linux format)
mov ecx, 2            B9 02 00 00 00     Same opcode
mul ecx               F7 E1              2-byte instruction
mov eax, 80000000h    B8 00 00 00 80     h-suffix hex notation (Windows format)
mov esi, 0F003Fh      BE 3F 00 0F 00     Windows format
mov esi, ecx          89 CE              Different opcode from above
nop                   90                 1-byte instruction; no operation
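Reading the table in the other direction, the sketch below decodes a few of the encodings listed above. It is a toy decoder under stated assumptions: it covers only nop, mov r32, imm32 (opcodes B8 to BF), and the register-to-register form of opcode 89, and the function name decode is hypothetical.

```python
import struct

# Register order used by the x86 encoding (register number 0..7).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code):
    """Decode one instruction; return (text, length in bytes)."""
    op = code[0]
    if op == 0x90:
        return "nop", 1
    if 0xB8 <= op <= 0xBF:  # mov r32, imm32
        (imm,) = struct.unpack_from("<I", code, 1)
        return f"mov {REGS32[op - 0xB8]}, {imm:#x}", 5
    if op == 0x89 and code[1] >> 6 == 0b11:  # mov r/m32, r32, register form
        src = REGS32[(code[1] >> 3) & 7]
        dst = REGS32[code[1] & 7]
        return f"mov {dst}, {src}", 2
    raise ValueError(f"unsupported opcode {op:#x}")


print(decode(bytes.fromhex("b942000000")))  # ('mov ecx, 0x42', 5)
print(decode(bytes.fromhex("89ce")))        # ('mov esi, ecx', 2)
```

A real disassembler handles prefixes, every ModRM form, and the full opcode map; this sketch only shows why the instruction length varies with the first byte.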
Memory Operands

• Only one explicit memory operand per instruction
• Either source or destination
• General format: [base + index * scale + displacement]
– base & index are registers
– scale is an integer with the value 1, 2, 4, or 8
– displacement is a 32-bit constant or a symbol
– All of these components are optional

Instruction             Description
mov eax, [0x4037C4]     Copies 4 bytes @ memory location 0x4037C4 into EAX
mov eax, [ebx]          Copies 4 bytes @ memory location specified by EBX into EAX
mov eax, [ebx+esi*4]    Copies 4 bytes @ memory location specified by the result of "ebx + esi * 4" into EAX
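The general operand format can be expressed as a one-line calculation. In this sketch the function name effective_address and the register values are made up for illustration, and the result is truncated to 32 bits.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement as a 32-bit address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# mov eax, [ebx+esi*4] with hypothetical register contents
ebx, esi = 0x00403000, 3
addr = effective_address(base=ebx, index=esi, scale=4)
print(hex(addr))  # 0x40300c
```

With ebx holding the start of an array of 4-byte integers, this is the address of element 3, which is why compilers use this form for array indexing.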
Common x86 Instructions

Instruction          Description
Data transfer
  mov dst, src       dst = src
  xchg dst1, dst2    Swap dst1 and dst2
  push src           Push src onto the stack & decrement rsp
  pop dst            Pop value from stack into dst & increment rsp
Arithmetic
  add dst, src       dst += src
  sub dst, src       dst -= src
  inc dst            dst += 1
  dec dst            dst -= 1
  neg dst            dst = -dst
  cmp src1, src2     Set status flags based on src1 - src2
Logical/bitwise
  and dst, src       dst &= src
  or dst, src        dst |= src
  xor dst, src       dst ^= src
  not dst            dst = ~dst
  test src1, src2    Set status flags based on src1 & src2
Unconditional branches
  jmp addr           Jump to address
  call addr          Push return address on stack, then call function @ address
  ret                Pop return address from stack & return to that address
  syscall            Enter the kernel to perform a system call
Conditional branches (based on status flags)
  jcc addr           Jumps to address only if condition cc holds, else it falls through
  jncc addr          Inverts the condition, jumping if it does not hold
  je addr/jz addr    Jump if zero flag is set (e.g., operands were equal in last cmp)
  ja addr            Jump if dst > src (“above”) in last comparison (unsigned)
  jb addr            Jump if dst < src (“below”) in last comparison (unsigned)
  jg addr            Jump if dst > src (“greater than”) in last comparison (signed)
  jl addr            Jump if dst < src (“less than”) in last comparison (signed)
  jge addr           Jump if dst >= src in last comparison (signed)
  jle addr           Jump if dst <= src in last comparison (signed)
  js addr            Jump if last comparison set the sign bit (i.e., the result was negative)
Miscellaneous
  lea dst, src       Load memory address into dst (dst = &src, where src must be in memory)
  nop                Do nothing (e.g., used for code padding)
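The difference between the unsigned jumps (ja, jb) and the signed jumps (jg, jl) comes down to which flags they read. The sketch below models cmp on 64-bit values and evaluates the standard conditions; all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign(value):
    """Most significant bit of a 64-bit value."""
    return (value >> 63) & 1


def cmp_flags(dst, src):
    """Flags produced by cmp dst, src, which computes dst - src."""
    dst &= MASK64
    src &= MASK64
    result = (dst - src) & MASK64
    return {
        "ZF": result == 0,
        "CF": dst < src,  # unsigned borrow
        "SF": sign(result) == 1,
        # Signed overflow: operand signs differ and the result's sign flipped.
        "OF": sign(dst) != sign(src) and sign(result) != sign(dst),
    }


def ja(f):
    return not f["CF"] and not f["ZF"]


def jb(f):
    return f["CF"]


def jg(f):
    return not f["ZF"] and f["SF"] == f["OF"]


def jl(f):
    return f["SF"] != f["OF"]


flags = cmp_flags(-1, 1)  # -1 is the bit pattern 0xFFFFFFFFFFFFFFFF
print(ja(flags), jg(flags), jl(flags))  # True False True
```

The same bits compare as "above" when read as unsigned and "less than" when read as signed, so choosing ja versus jg is how the compiler records the type of the operands.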
Notes On Some Instructions

• Conditional jumps

Instructions      Description
cmp rax, rbx      if rax < rbx, jump to label
jb label
test rax, rax     if rax != 0, jump to label
jnz label

• Loading memory addresses (load effective address)

Instruction                Description
lea r8, [rax*5+0x2000]     Loads the address resulting from the expression "rax * 5 + 0x2000" into r8, i.e., r8 = rax * 5 + 0x2000
Signed vs Unsigned
• Many ISAs, such as ARM and x86, do not have distinct signed and unsigned
versions of most operations.
– The burden is on the programmer to handle overflow and underflow.
– Many of the x86 arithmetic instructions are used for both signed and unsigned
types.
– There is no signed add and unsigned add; the same instruction is used for both.
• By contrast, some ISAs (MIPS, for example) pair signed and unsigned instructions whose only
difference is that the signed form can raise an overflow exception.
– Where needed, the EFLAGS (condition codes) set by the operation can be used to
detect the different kinds of overflow.
• Some instructions have special-case variants with a different number of
operands.
– imul src
– Single-operand imul takes the other operand from RAX, computes a 128-bit result,
and stores the low 64 bits in RAX and the high 64 bits in RDX
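The single-operand imul behavior can be modeled with Python's unbounded integers. This is a sketch of the arithmetic only; the function names are hypothetical and flags are not modeled.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def imul_one_operand(rax, src):
    """Signed rax * src; return (rdx, rax) = (high 64 bits, low 64 bits)."""
    product = to_signed64(rax) * to_signed64(src)
    product &= (1 << 128) - 1  # 128-bit two's-complement pattern
    return product >> 64, product & MASK64


rdx, rax = imul_one_operand(-2, 3)
print(hex(rdx), hex(rax))  # 0xffffffffffffffff 0xfffffffffffffffa
```

For a small negative product such as -6, RDX is all ones because it holds the sign extension of the low half.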
Conditional Jumps Instructions
• A cmp or test instruction is generally executed before a
conditional branch to set the flags. The jump instruction
that follows reads those flags to decide whether to take
the branch or continue.
• There are 32 variants of conditional jump,
several of which are synonyms
Conditionals
• test
– Compares two values the way AND does, but does
not alter them
– test eax, eax
• Sets Zero Flag if eax is zero
• cmp eax, ebx
– Sets Zero Flag if the arguments are equal
Branching
• jz loc
– Jump to loc if the Zero Flag is set
• jnz loc
– Jump to loc if the Zero Flag is cleared
Conditional jump Groups
• Conditional jump instructions can be divided
into four groups:
1. Jumps based on the value of a single
arithmetic flag
2. Jumps based on the value of CX or ECX
3. Jumps based on comparisons of signed
operands
4. Jumps based on comparisons of unsigned
operands
Jumps on a Single Flag
Signed and ECX Jumps
• Jumps based on the value of CX and ECX
• Signed jumps, based on comparisons of signed operands
Unsigned Jumps
Summary
• All conditional jumps except two (JCXZ and
JECXZ) use the processor flags for their
criteria.
• Thus, any statement that sets or clears a flag
can serve as a test basis for a conditional
jump.
• The jump statement can be any one of 32
conditional-jump instructions
Reference
Intel® 64 and IA-32 Architectures Software
Developer’s Manual Combined Volumes: 1
Other Stack Instructions
• All used with functions
– Call
– Leave
– Enter
– Ret
The Stack
• Memory for functions, local variables, and
flow control
• Last in, First out
• ESP (Extended Stack Pointer) – top of stack
• EBP (Extended Base Pointer) – base of the current stack frame
• PUSH puts data on the stack
• POP takes data off the stack
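The last-in, first-out behavior and the way ESP moves can be sketched with a toy 32-bit stack. The class Stack32 and its starting address are hypothetical; the stack grows toward lower addresses, as on x86.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack that grows toward lower addresses."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4  # make room first, then store
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)  # read the top of the stack
        self.esp += 4
        return value


stack = Stack32()
stack.push(1)
stack.push(2)
first, second = stack.pop(), stack.pop()
print(first, second, hex(stack.esp))  # 2 1 0x1000
```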
Program/Process Memory Layout

• Code (.text) segment: contains machine code instructions
• Data segment: contains initialized static & global variables
• Heap: stores dynamically allocated variables (managed via malloc, new, free & delete)
• Stack: stores local variables of functions, & data related to function calls (e.g., return address & arguments/parameters)

FFFF…FFFF  High Address
  Stack
  Heap
  Data Segment
  Code Segment
0000…0000  Low Address
Function Calls
• Small programs that do one thing and return,
like printf()
• Prologue
– Instructions at the start of a function that prepare
stack and registers for the function to use
• Epilogue
– Instructions at the end of a function that
restore the stack and registers to their state
before the function was called
Function Calls & Stack

1. Arguments are placed in registers (mov) & on the stack (push)
2. call addr
   2.1 Pushes RIP (the return address) onto the stack
   2.2 RIP is loaded with addr
3. Callee (called function) structure
   3.1 Pushes RBP on the stack, sets RBP to RSP & allocates space on the stack for the new function
         push rbp
         mov rbp, rsp
         sub rsp, size
   3.2 Does work
   3.3 Return value (integer) is put into RAX
   3.4 leave sets RSP to RBP & then pops top of stack into RBP
         mov rsp, rbp
         pop rbp
   3.5 ret: pops top of stack into RIP
4. Stack is adjusted to remove the arguments that were sent (unless they'll be used again later)
Function Calls & Stack

• Upon starting to execute call @90

Code:
 90  push rbp
 91  mov rbp, rsp
 92  sub rsp, 50
     …
 98  leave
 99  ret
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2          ← rip

Stack (address, value):
220  val 2            ← rsp
300  val 1            ← rbp
Function Calls & Stack

• call @90

Code:
 90  push rbp         ← rip
 91  mov rbp, rsp
 92  sub rsp, 50
     …
 98  leave
 99  ret
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2

Stack (address, value):
212  107              ← rsp
220  val 2
300  val 1            ← rbp
Function Calls & Stack

• push rbp

Code:
 90  push rbp
 91  mov rbp, rsp     ← rip
 92  sub rsp, 50
     …
 98  leave
 99  ret
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2

Stack (address, value):
204  300              ← rsp
212  107
220  val 2
300  val 1            ← rbp
Function Calls & Stack

• mov rbp, rsp

Code:
 90  push rbp
 91  mov rbp, rsp
 92  sub rsp, 50      ← rip
     …
 98  leave
 99  ret
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2

Stack (address, value):
204  300              ← rsp=rbp
212  107
220  val 2
300  val 1
Function Calls & Stack

• sub rsp, 50

Code:
 90  push rbp
 91  mov rbp, rsp
 92  sub rsp, 50
     …                ← rip
 98  leave
 99  ret
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2

Stack (address, value):
154  val 3            ← rsp
204  300              ← rbp
212  107
220  val 2
300  val 1
Function Calls & Stack

• leave ≡ (mov rsp, rbp; pop rbp), state after mov rsp, rbp

Code:
 90  push rbp
 91  mov rbp, rsp
 92  sub rsp, 50
     …
 98  leave
 99  ret              ← rip
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2

Stack (address, value):
154  val 3
204  300              ← rbp=rsp
212  107
220  val 2
300  val 1
Function Calls & Stack

• leave ≡ (mov rsp, rbp; pop rbp), state after pop rbp

Code:
 90  push rbp
 91  mov rbp, rsp
 92  sub rsp, 50
     …
 98  leave
 99  ret              ← rip
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2

Stack (address, value):
154  val 3
204  300
212  107              ← rsp
220  val 2
300  val 1            ← rbp
Function Calls & Stack

• ret

Code:
 90  push rbp
 91  mov rbp, rsp
 92  sub rsp, 50
     …
 98  leave
 99  ret
100  push rbp
101  mov rbp, rsp
102  sub rsp, size
105  instr 1
106  call @90
107  instr 2          ← rip

Stack (address, value):
154  val 3
204  300
212  107
220  val 2            ← rsp
300  val 1            ← rbp
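The walkthrough above can be replayed with a toy machine that tracks only rip, rsp, and rbp. The sketch reuses the decimal addresses from the slides (call at 106, callee at 90, rsp starting at 220, rbp at 300) and assumes 8-byte stack slots; the class and helper names are hypothetical.

```python
class Machine:
    """Toy model of rip/rsp/rbp using the decimal addresses from the slides."""

    def __init__(self, rip, rsp, rbp):
        self.rip, self.rsp, self.rbp = rip, rsp, rbp
        self.stack = {}  # address -> 8-byte slot

    def push(self, value):
        self.rsp -= 8
        self.stack[self.rsp] = value

    def pop(self):
        value = self.stack[self.rsp]
        self.rsp += 8
        return value


m = Machine(rip=107, rsp=220, rbp=300)  # state as call @90 starts executing
trace = []


def step(name):
    """Record rsp and rbp after an instruction completes."""
    trace.append((name, m.rsp, m.rbp))


m.push(m.rip)    # call @90: push return address 107 ...
m.rip = 90       # ... and jump to the callee
step("call @90")
m.push(m.rbp)    # push rbp
step("push rbp")
m.rbp = m.rsp    # mov rbp, rsp
step("mov rbp, rsp")
m.rsp -= 50      # sub rsp, 50
step("sub rsp, 50")
m.rsp = m.rbp    # leave, part 1: mov rsp, rbp
m.rbp = m.pop()  # leave, part 2: pop rbp
step("leave")
m.rip = m.pop()  # ret: pop return address into rip
step("ret")

for name, rsp, rbp in trace:
    print(f"{name:14} rsp={rsp} rbp={rbp}")
```

After ret, rsp, rbp, and rip are back at 220, 300, and 107, matching the final slide.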
C Main Method
• Every C program has a main() function
• int main(int argc, char** argv)
– argc contains the number of arguments on the
command line, including the program name
– argv is a pointer to an array of strings containing
the arguments
Example
• cp foo bar
• argc = 3
• argv[0] = cp
• argv[1] = foo
• argv[2] = bar
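The same split can be reproduced in a few lines. This sketch uses shlex.split from the Python standard library as a stand-in for the shell's word splitting; it illustrates the argc/argv relationship and is not how a C runtime builds argv.

```python
import shlex

command_line = "cp foo bar"
argv = shlex.split(command_line)  # split the way a POSIX shell would
argc = len(argv)                  # argc counts the program name too

print(argc, argv)  # 3 ['cp', 'foo', 'bar']
```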
