Part 2: Advanced Static Analysis • Chapter 4: A Crash Course in x86 Disassembly • Chapter 5: IDA Pro • Chapter 6: Recognizing C Code Constructs in Assembly • Chapter 7: Analyzing Malicious Windows Programs
How software works • gcc compiler driver pre-processes, compiles, assembles and links to generate an executable • Links together object code (e.g. game.o) and static libraries (e.g. libc.a) to form the final executable • Links in references to dynamic libraries for code loaded at load time (e.g. libc.so.1) • Executable may still load additional dynamic libraries at run-time
Pipeline: hello.c (program source) -> pre-processor -> hello.i (modified source) -> compiler -> hello.s (assembly code) -> assembler -> hello.o (object code) -> linker -> hello (executable code)
Executables • Various file formats • Linux = Executable and Linkable Format (ELF) • Windows = Portable Executable (PE)
ELF Object File Format • ELF header • Magic number, type (.o, exec, .so), machine, byte ordering, etc. • Program header table • Page size, virtual addresses of memory segments (sections), segment sizes, entry point • .text section • Code • .data section • Initialized (static) data • .bss section • Uninitialized (static) data • “Block Started by Symbol”
File layout, from offset 0: ELF header | Program header table (required for executables) | .text section | .data section | .bss section | .symtab | .rel.text | .rel.data | .debug | Section header table (required for relocatables)
ELF Object File Format (cont) • .rel.text section • Relocation info for .text section • Addresses of instructions that will need to be modified in the executable • Instructions for modifying • .rel.data section • Relocation info for .data section • Addresses of pointer data that will need to be modified in the merged executable • .symtab section • Symbol table • Procedure and static variable names • Section names and locations • .debug section • Info for symbolic debugging (gcc -g)
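The ELF header fields listed above (magic number, type, byte ordering) can be read straight from the first bytes of a file. The following is a minimal sketch, not a full parser: the struct `elf_ident` and the function `parse_elf_ident` are hypothetical names invented for this illustration, and only the first 18 bytes of the header are examined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary struct for this sketch (not part of any ELF API). */
struct elf_ident {
    int is_elf;         /* 1 if the magic number matches */
    int bits;           /* 32 or 64, 0 if unknown */
    int little_endian;  /* 1 = LSB first, 0 = MSB first, -1 if unknown */
    unsigned type;      /* e_type: 1 = relocatable (.o), 2 = executable, 3 = shared object (.so) */
};

struct elf_ident parse_elf_ident(const uint8_t *buf, size_t len)
{
    struct elf_ident id = {0, 0, -1, 0};

    /* The ELF header starts with the 4-byte magic 0x7f 'E' 'L' 'F'. */
    if (len < 18 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return id;
    id.is_elf = 1;

    /* Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit. */
    if (buf[4] == 1)
        id.bits = 32;
    else if (buf[4] == 2)
        id.bits = 64;

    /* Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        id.little_endian = 1;
    else if (buf[5] == 2)
        id.little_endian = 0;

    /* e_type is a 16-bit field at offset 16, stored in the file's byte order. */
    if (id.little_endian == 1)
        id.type = (unsigned)buf[16] | ((unsigned)buf[17] << 8);
    else if (id.little_endian == 0)
        id.type = ((unsigned)buf[16] << 8) | (unsigned)buf[17];

    return id;
}
```

Feeding it the first bytes of a 32-bit little-endian Linux executable should report `bits == 32`, `little_endian == 1` and `type == 2`; a Windows PE file, which starts with "MZ", is rejected by the magic check.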
PE (Portable Executable) file format • Windows file format for executables • Based on COFF Format • Magic Numbers, Headers, Tables, Directories, Sections
Example C Program
m.c:
    #include <stdlib.h>   /* declares exit() */
    int a();              /* defined in a.c */
    int e=7;
    int main() {
        int r = a();
        exit(0);
    }
a.c:
    extern int e;
    int *ep=&e;
    int x=15;
    int y;
    int a() {
        return *ep+x+y;
    }
Merging Relocatable Object Files into an Executable Object File
Relocatable object files:
    system: .text (system code), .data (system data)
    m.o: .text (main()), .data (int e = 7)
    a.o: .text (a()), .data (int *ep = &e, int x = 15), .bss (int y)
Executable object file, from offset 0:
    headers
    .text: system code, main(), a(), more system code
    .data: system data, int e = 7, int *ep = &e, int x = 15
    .bss: int y
    .symtab
    .debug
Program execution • Operating system provides • Protection and resource allocation • Abstract view of resources (files, system calls) • Virtual memory • Uniform memory space abstraction for each process • Gives the illusion that each process has entire memory space
How does a program get loaded? • The operating system creates a new process • Including, among other things, a virtual memory space • System loader • Loads the executable file from the file system into the memory space • Done via DMA (direct memory access) • Executable contains code and statically linked libraries • Executable in file system remains and can be executed again • Loads dynamic shared objects/libraries into memory space • Done via DMA from file system as with original executable • Resolves addresses in code (using .rel.text and .rel.data information) based on where code/data is loaded • Starts a thread of execution running based on specified entry point in ELF/PE header
Loading Executable Binaries
Executable object file for example program p, from offset 0: ELF header | Program header table (required for executables) | .text section | .data section | .bss section | .symtab | .rel.text | .rel.data | .debug | Section header table (required for relocatables)
Process image (virtual address: contents):
    0x080483e0: init and shared lib segments
    0x08048494: .text segment (r/o)
    0x0804a010: .data segment (initialized r/w)
    0x0804a3b0: .bss segment (uninitialized r/w)
Example: Linux virtual memory space (32-bit) • To view a process's actual layout: cat /proc/self/maps
Layout from high to low addresses:
    0xffffffff
        kernel virtual memory (code, data, heap, stack); memory invisible to user code
    0xc0000000
        user stack (created at runtime); %esp (stack pointer) marks its top
        memory mapped region for shared libraries (from 0x40000000)
        run-time heap (managed by malloc); brk marks its top
        read/write segment (.data, .bss), loaded from the executable file
        read-only segment (.init, .text, .rodata)
    0x08048000
        unused
    0
Relocation • Virtual memory abstraction makes compilation and linking easy • Compared to a single, shared real memory address space (e.g. original Mac) • Linker statically binds all program code and data to absolute virtual addresses • Linker decides entire memory layout at build (link) time • Example: DOS/Windows ".com" format is effectively a memory image • Issues • Support dynamic libraries to avoid statically linking things like libc into all processes • Dynamic libraries might want to be loaded at the same address! • Need to support relative addressing and relocation again • Want to support address-space layout randomization • Security defense mechanism requiring everything to be relocatable • What Meltdown/Spectre malware might attack first
More on relocation • Relocation in Windows PE (.exe) and Linux ELF • Requires position-independent code • Compiler makes all jumps and branches relative to current location or relative to a base register set at run-time • Compiler labels any accesses to absolute addresses and has loader rewrite them to their actual run-time values • Compiler uses indirection and dynamically generated offset tables to determine addresses • Example: Procedure Linkage Table (PLT) and Global Offset Table (GOT) in ELF • GOT contains addresses where imported library calls are loaded at run-time • Library calls index GOT to determine location to jump to • Note: Can be targeted by malware for hooks!
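The GOT indirection, and why it is attractive for hooking, can be imitated in plain C with a table of function pointers. This is only a simulation under stated assumptions: `got`, `call_via_got`, `got_patch`, `real_double` and `hooked_double` are hypothetical names for this sketch, and a real GOT is laid out and filled in by the dynamic linker rather than by application code.

```c
#include <stddef.h>

typedef int (*lib_fn)(int);

/* Stand-in for a routine that lives in a shared library (hypothetical). */
int real_double(int x) { return 2 * x; }

/* Stand-in for code that a hook redirects callers to (hypothetical). */
int hooked_double(int x) { return x + 1000; }

enum { GOT_DOUBLE = 0, GOT_SIZE = 1 };

/* Simulated GOT: one slot per imported function, filled in "at load time". */
static lib_fn got[GOT_SIZE] = { real_double };

/* Simulated PLT stub: callers never use the library address directly;
   they jump through whatever address the table currently holds. */
int call_via_got(size_t slot, int arg)
{
    return got[slot](arg);
}

/* Overwrite a slot, as a hook would; returns the previous target. */
lib_fn got_patch(size_t slot, lib_fn replacement)
{
    lib_fn old = got[slot];
    got[slot] = replacement;
    return old;
}
```

Before patching, `call_via_got(GOT_DOUBLE, 21)` reaches `real_double`; after `got_patch(GOT_DOUBLE, hooked_double)` the same call site silently reaches the replacement, which is the effect a GOT hook relies on.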
Program execution: program-visible state • EIP - Instruction Pointer • a.k.a. Program Counter • Address of next instruction • Register File • Heavily used program data • Condition Codes • Store status information about most recent arithmetic operation • Used for conditional branching • Memory • Byte addressable array • Code, user data, OS data • Includes stack used to support procedures
[Figure: the CPU (registers, EIP, condition codes) sends addresses to memory and exchanges data and instructions with it; memory holds object code, program data, OS data and the stack]
IA32 Register file
    32-bit (31..0)   16-bit (15..0)   bits 15..8   bits 7..0
    General purpose registers (mostly):
    %eax             %ax              %ah          %al
    %ecx             %cx              %ch          %cl
    %edx             %dx              %dh          %dl
    %ebx             %bx              %bh          %bl
    %esi             %si
    %edi             %di
    Special purpose registers:
    %esp             %sp              (stack pointer)
    %ebp             %bp              (frame pointer)
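The relationship between a 32-bit register and its 16-bit and 8-bit sub-registers is just bit selection. A small sketch with hypothetical helper names (`reg_low16`, `reg_low8`, `reg_high8`, `reg_write_low8`) makes the overlap concrete:

```c
#include <stdint.h>

/* %ax is bits 15..0 of %eax. */
uint16_t reg_low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* %al is bits 7..0 of %eax. */
uint8_t reg_low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* %ah is bits 15..8 of %eax. */
uint8_t reg_high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing %al replaces only bits 7..0; the upper 24 bits of %eax are kept. */
uint32_t reg_write_low8(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

With `%eax = 0x12345678`, `%ax` reads as `0x5678`, `%ah` as `0x56` and `%al` as `0x78`.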
Registers • The processor operates on data in registers (usually) • movl (%eax), %ecx • Fetch data at address contained in %eax • Store in register %ecx • movl $array, %ecx • Move address of variable array into %ecx • Typically, data is loaded into registers, manipulated or used, and then written back to memory • The IA32 architecture is "register poor" • Few general purpose registers • Source or destination operand is often a memory location • Makes context-switching amongst processes easy (less register-state to store)
Operand types • A typical instruction acts on 1 or more operands • addl %ecx, %edx adds the contents of ecx to edx • Three general types of operands • Immediate • Like a C constant, but preceded by $ • e.g., $0x1F, $-533 • Encoded with 1, 2, or 4 bytes based on instruction • Register: the value in one of the 8 integer registers • Memory: a memory address • There are many modes for addressing memory
Operand examples using mov
    Source   Destination   Instruction            C Analog
    Imm      Reg           movl $0x4,%eax         temp = 0x4;
    Imm      Mem           movl $-147,(%eax)      *p = -147;
    Reg      Reg           movl %eax,%edx         temp2 = temp1;
    Reg      Mem           movl %eax,(%edx)       *p = temp;
    Mem      Reg           movl (%eax),%edx       temp = *p;
Addressing Modes • Immediate and registers have only one mode • Memory on the other hand needs many (so that a load from memory can take a single instruction) • Absolute • specify the address of the data • Indirect • use register to calculate address • Base + displacement • use register plus absolute address to calculate address • Indexed • Add contents of an index register • Scaled index • Add contents of an index register scaled by a constant
Addressing Modes: examples
    Absolute               movl 0x08049000, %eax
    Indirect               movl (%edx), %eax
    Base + displacement    movl 8(%ebp), %eax
    Indexed                movl (%ecx, %edx), %eax
    Scaled Index           movl (%ecx, %edx, 4), %eax
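All of the memory addressing modes above are special cases of one formula: address = base + index * scale + displacement, with unused parts set to zero (and scale 1). A minimal sketch, where `effective_address` is a hypothetical helper and register contents are passed in as plain integers:

```c
#include <stdint.h>

/* Effective address of an operand written disp(base, index, scale).
   Arithmetic wraps modulo 2^32, as 32-bit address computation does. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

For example, with `%ebp = 0x1000` the operand `8(%ebp)` refers to address `0x1008`, and with `%ecx = 0x2000`, `%edx = 3` the operand `(%ecx, %edx, 4)` refers to `0x200c`.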
x86 instructions • Rules • Source operand can be memory, register or constant • Destination can be memory or register • Only one of source and destination can be memory • Source and destination must be same size
What’s the "l" for on the end? • movl 8(%ebp),%eax • It stands for “long” and is 32-bits • Size of the operands • Baggage from the days of 16-bit processors • For x86, x86_64 • 8 bits is a byte (movb) • 16 bits is a word (movw) • 32 bits is a double or long word (movl) • 64 bits is a quad word (movq)
Global vs. Local variables • Global variables stored in either .data or .bss section of process • Local variables stored on stack • Which variables?
m.c:
    int e=7;
    int main() {
        int r = a();
        exit(0);
    }
a.c:
    extern int e;
    int *ep=&e;
    int x=15;
    int y;
    int a() {
        return *ep+x+y;
    }
Global vs. local: Which is which?
Program 1:
    #include <stdio.h>
    void a() {
        int x = 1;
        int y = 2;
        x = x+y;
        printf("Total = %d\n",x);
    }
    int main() {a();}
Program 2:
    #include <stdio.h>
    int x = 1;
    int y = 2;
    void a() {
        x = x+y;
        printf("Total = %d\n",x);
    }
    int main(){a();}
Listing 1:
    080483c4 <a>:
     80483c4: push %ebp
     80483c5: mov  %esp,%ebp
     80483c7: sub  $0x18,%esp
     80483ca: movl $0x1,-0x8(%ebp)
     80483d1: movl $0x2,-0x4(%ebp)
     80483d8: mov  -0x4(%ebp),%eax
     80483db: add  %eax,-0x8(%ebp)
     80483de: mov  -0x8(%ebp),%eax
     80483e1: mov  %eax,0x4(%esp)
     80483e5: movl $0x80484f0,(%esp)
     80483ec: call 80482dc <printf@plt>
     80483f1: leave
     80483f2: ret
Listing 2:
    080483c4 <a>:
     80483c4: push %ebp
     80483c5: mov  %esp,%ebp
     80483c7: sub  $0x8,%esp
     80483ca: mov  0x804966c,%edx
     80483d0: mov  0x8049670,%eax
     80483d5: lea  (%edx,%eax,1),%eax
     80483d8: mov  %eax,0x804966c
     80483dd: mov  0x804966c,%eax
     80483e2: mov  %eax,0x4(%esp)
     80483e6: movl $0x80484f0,(%esp)
     80483ed: call 80482dc <printf@plt>
     80483f2: leave
     80483f3: ret
Arithmetic operations
    void f(){
        int a = 0;
        int b = 1;
        a = a+11;
        a = a-b;
        a--;
        b++;
    }
    int main() { f();}

    08048394 <f>:
     8048394: pushl %ebp
     8048395: movl  %esp,%ebp
     8048397: subl  $0x10,%esp
     804839a: movl  $0x0,-0x8(%ebp)     # a = 0
     80483a1: movl  $0x1,-0x4(%ebp)     # b = 1
     80483a8: addl  $0xb,-0x8(%ebp)     # a = a+11
     80483ac: movl  -0x4(%ebp),%eax     # eax = b
     80483af: subl  %eax,-0x8(%ebp)     # a = a-b
     80483b2: subl  $0x1,-0x8(%ebp)     # a--
     80483b6: addl  $0x1,-0x4(%ebp)     # b++
     80483ba: leave
     80483bb: ret
Condition codes • The IA32 processor has a register called eflags • (extended flags) • Each bit is a flag, or condition code • CF: Carry Flag • SF: Sign Flag • ZF: Zero Flag • OF: Overflow Flag • As programmers, we don’t write to this register and seldom read it directly • Flags are set or cleared by hardware on each arithmetic/logical operation depending on the result of an instruction • Conditional branches handled via EFLAGS
Condition codes (cont.) • Setting condition codes via compare instruction • cmpl b,a • Computes a-b without setting destination • CF set if carry out from most significant bit • Used for unsigned comparisons • ZF set if a == b • SF set if (a-b) < 0 • OF set if two’s complement overflow • (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0) • Byte and word versions cmpb, cmpw
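The flag rules for `cmpl b,a` can be checked by computing them by hand in C. The sketch below is an illustration, not processor documentation: `struct flags` and `cmpl_flags` are hypothetical names, and CF is modeled as the borrow of the subtraction (set when a is below b as unsigned values), which is how x86 reports it for compare and subtract.

```c
#include <stdint.h>

struct flags { int cf, zf, sf, of; };

/* Flags after "cmpl b,a", which computes a - b and discards the result. */
struct flags cmpl_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t r = ua - ub;           /* wraps modulo 2^32, like the hardware */
    struct flags f;

    f.cf = ua < ub;                 /* borrow: a is below b as unsigned */
    f.zf = (r == 0);                /* a == b */
    f.sf = (int)(r >> 31);          /* sign bit of a - b */
    /* Overflow: a and b have different signs and the result's sign
       differs from a's sign. */
    f.of = (int)(((ua ^ ub) & (ua ^ r)) >> 31);
    return f;
}
```

A signed "less than" branch (`jl`) then corresponds to `sf != of`, and an unsigned "below" branch (`jb`) to `cf == 1`.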
Condition codes (cont.) • Setting condition codes via test instruction • testl b,a • Computes a&b without setting destination • Sets condition codes based on result • Useful to have one of the operands be a mask • Often used to test zero, positive • testl %eax, %eax • ZF set when a&b == 0 • SF set when a&b < 0 • Byte and word versions testb, testw
if statements
    #include <stdio.h>
    void f(){
        int x = 1;
        int y = 2;
        if (x==y) printf("x equals y.\n");
        else printf("x is not equal to y.\n");
    }
    int main() { f();}

    080483c4 <f>:
     80483c4: pushl %ebp
     80483c5: movl  %esp,%ebp
     80483c7: subl  $0x18,%esp
     80483ca: movl  $0x1,-0x8(%ebp)         # x = 1
     80483d1: movl  $0x2,-0x4(%ebp)         # y = 2
     80483d8: movl  -0x8(%ebp),%eax         # eax = x
     80483db: cmpl  -0x4(%ebp),%eax         # compare x : y
     80483de: jne   80483ee <f+0x2a>        # if x != y, go to the else branch
     80483e0: movl  $0x80484f0,(%esp)       # argument: "x equals y."
     80483e7: call  80482d8 <puts@plt>
     80483ec: jmp   80483fa <f+0x36>        # skip the else branch
     80483ee: movl  $0x80484fc,(%esp)       # argument: "x is not equal to y."
     80483f5: call  80482d8 <puts@plt>
     80483fa: leave
     80483fb: ret
if statements • Note: Microsoft assembly uses the reverse operand order (destination first, then source)
    int a = 1, b = 3, c;
    if (a > b)
        c = a;
    else
        c = b;

    mov dword ptr [ebp-4],1         ; store a = 1
    mov dword ptr [ebp-8],3         ; store b = 3
    mov eax,dword ptr [ebp-4]       ; move a into EAX register
    cmp eax,dword ptr [ebp-8]       ; compare a with b (subtraction)
    jle 00000036                    ; if (a<=b) jump to line 00000036
    mov ecx,dword ptr [ebp-4]       ; else move a (1) into ECX register, then
    mov dword ptr [ebp-0Ch],ecx     ; move ECX into c (12 bytes down), then
    jmp 0000003C                    ; unconditional jump to 0000003C
    mov edx,dword ptr [ebp-8]       ; move b (3) into EDX register, then
    mov dword ptr [ebp-0Ch],edx     ; move EDX into c (12 bytes down)
Loops
    int factorial_do(int x)
    {
        int result = 1;
        do {
            result *= x;
            x = x-1;
        } while (x > 1);
        return result;
    }

    factorial_do:
        pushl %ebp
        movl  %esp, %ebp
        movl  8(%ebp), %edx     # edx = x
        movl  $1, %eax          # eax = result = 1
    .L2:
        imull %edx, %eax        # result *= x
        decl  %edx              # x = x-1
        cmpl  $1, %edx          # compare x : 1
        jg    .L2               # loop again while x > 1
        leave
        ret
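One way to see how the do-while loop becomes a compare and a backward jump is to rewrite it in C with an explicit `goto`, so each statement lines up with one instruction of the listing. This is a sketch for reading the assembly, with `factorial_goto` as a hypothetical name; it is not how the loop should be written in practice.

```c
/* do-while from factorial_do, rewritten so control flow matches the assembly. */
int factorial_goto(int x)
{
    int result = 1;         /* movl  $1, %eax   */
loop:
    result *= x;            /* imull %edx, %eax */
    x = x - 1;              /* decl  %edx       */
    if (x > 1)              /* cmpl  $1, %edx   */
        goto loop;          /* jg    .L2        */
    return result;
}
```

As with any do-while, the body runs at least once, so `factorial_goto(0)` multiplies by 0 before the test and returns 0.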
C switch statements
    switch (x) {
    case 1:
    case 5:
        code at L0
    case 2:
    case 3:
        code at L1
    default:
        code at L2
    }
C switch statements • Implementation options • Series of conditionals • cmpl or testl followed by je • OK if few cases and large ranges of values • Slow if many cases • Jump table (example below) • Lookup branch target from a table • Possible with a small range of integer constants • GCC picks implementation based on structure • Example:
    switch (x) {
    case 1:
    case 5:
        code at L0
    case 2:
    case 3:
        code at L1
    default:
        code at L2
    }
Jump table at .L3 (one entry per value of x):
    x = 0: .L2
    x = 1: .L0
    x = 2: .L1
    x = 3: .L1
    x = 4: .L2
    x = 5: .L0
Steps: 1. init jump table at .L3 2. get address at .L3+4*x 3. jump to that address
Example
    int switch_eg(int x)
    {
        int result = x;
        switch (x) {
        case 100:
            result *= 13;
            break;
        case 102:
            result += 10;
            /* Fall through */
        case 103:
            result += 11;
            break;
        case 104:
        case 106:
            result *= result;
            break;
        default:
            result = 0;
        }
        return result;
    }
Example: assembly for switch_eg • Key is jump table at .L10 • Array of pointers to jump locations
        leal  -100(%edx),%eax       # eax = x - 100
        cmpl  $6,%eax
        ja    .L9                   # unsigned compare: outside 100..106 goes to default
        jmp   *.L10(,%eax,4)        # jump to the address stored at .L10 + 4*eax
        .p2align 4,,7
        .section .rodata
        .align 4
        .align 4
    .L10:
        .long .L4                   # case 100
        .long .L9                   # 101: default
        .long .L5                   # case 102
        .long .L6                   # case 103
        .long .L8                   # case 104
        .long .L9                   # 105: default
        .long .L8                   # case 106
        .text
        .p2align 4,,7
    .L4:
        leal  (%edx,%edx,2),%eax    # eax = 3*x
        leal  (%edx,%eax,4),%edx    # result = x + 4*(3*x) = 13*x
        jmp   .L3
        .p2align 4,,7
    .L5:
        addl  $10,%edx              # result += 10, then fall through
    .L6:
        addl  $11,%edx              # result += 11
        jmp   .L3
        .p2align 4,,7
    .L8:
        imull %edx,%edx             # result *= result
        jmp   .L3
        .p2align 4,,7
    .L9:
        xorl  %edx,%edx             # default: result = 0
    .L3:
        movl  %edx,%eax             # return value = result
        leave
        ret
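The jump table can be imitated in C with an array of function pointers indexed by x - 100, which mirrors the bounds check and indirect jump of the assembly. This is a sketch of the idea only: the handler names and `switch_eg_table` are hypothetical, and the fall-through from case 102 into case 103 is modeled by one handler calling the next.

```c
typedef int (*case_fn)(int);

int case_100(int r) { return r * 13; }
int case_103(int r) { return r + 11; }
int case_102(int r) { return case_103(r + 10); }   /* falls through into case 103 */
int case_104_106(int r) { return r * r; }
int case_default(int r) { (void)r; return 0; }

/* One entry per value 100..106, mirroring the seven .long entries at .L10. */
static const case_fn jump_table[7] = {
    case_100, case_default, case_102, case_103,
    case_104_106, case_default, case_104_106
};

int switch_eg_table(int x)
{
    unsigned idx = (unsigned)x - 100u;      /* leal -100(%edx),%eax */
    if (idx > 6)                            /* cmpl $6,%eax ; ja .L9 */
        return case_default(x);
    return jump_table[idx](x);              /* jmp *.L10(,%eax,4) */
}
```

Doing the subtraction in unsigned arithmetic lets a single comparison reject both values below 100 and values above 106, which is why the compiler uses `ja` rather than a signed branch.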
Avoiding conditional branches • Modern CPUs with deep pipelines • Instructions fetched far in advance of execution • Mask the latency going to memory • Problem: What if you hit a conditional branch? • Must predict which branch to take and speculatively fetch/execute! • Branch prediction in CPUs well-studied, fairly effective (except when it's not… ) (1/2018) • But, best to avoid conditional branching altogether
x86 REP prefixes • Loops require decrement, comparison, and conditional branch for each iteration • Incur branch prediction penalty and overhead even for trivial loops • Repeat instruction prefixes (REP, REPE, REPNE) • Inserted just before some instructions (movsb, movsw, movsd, cmpsb, cmpsw, cmpsd) • REP (repeat for fixed count) • Direction flag (DF) cleared via cld (forward) and set via std (backward) • esi and edi contain pointers to arguments • ecx contains the count • REPE (repeat while equal, ZF = 1), REPNE (repeat while not equal, ZF = 0); both also stop when ecx reaches 0 • Used in conjunction with cmpsb, cmpsw, cmpsd
x86 REP example
    .data
        source DWORD 20 DUP (?)
        target DWORD 20 DUP (?)
    .code
        cld                         ; clear direction flag = forward
        mov ecx, LENGTHOF source
        mov esi, OFFSET source
        mov edi, OFFSET target
        rep movsd
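What `cld` followed by `rep movsd` does can be written out as an ordinary C loop. The sketch below is an emulation for understanding the listing, with `rep_movsd` as a hypothetical function whose parameters are named after the registers the instruction uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulates "cld; rep movsd": copies ecx doublewords from esi to edi,
   moving forward through memory. Returns how many were copied. */
size_t rep_movsd(uint32_t *edi, const uint32_t *esi, size_t ecx)
{
    size_t copied = 0;

    while (ecx != 0) {      /* REP stops when the count reaches zero */
        *edi = *esi;        /* movsd: copy one doubleword */
        edi++;              /* DF = 0, so both pointers advance by 4 bytes */
        esi++;
        ecx--;
        copied++;
    }
    return copied;
}
```

The hardware performs the count check, the copy and both pointer updates inside one instruction, so no separate compare-and-branch is fetched per element.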
x86 SCAS • Repeat a search until a condition is met • SCASB, SCASW, SCASD • Search for a specific element in an array • Search for the first element that does not match a given value
x86 SCAS
    .data
        alpha BYTE "ABCDEFGH",0
    .code
        mov edi,OFFSET alpha
        mov al,'F'                  ; search for 'F'
        mov ecx,LENGTHOF alpha
        cld
        repne scasb                 ; repeat while not equal
        jnz quit
        dec edi                     ; EDI points to 'F'
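The `repne scasb` search, including the final `dec edi`, can be emulated step by step in C. This is a sketch for following the listing: `repne_scasb_index` is a hypothetical function, its local variables are named after the registers, and it returns an index (or -1 for the `jnz quit` path) instead of leaving a pointer in EDI.

```c
#include <stddef.h>

/* Emulates: cld; repne scasb; jnz quit; dec edi.
   Returns the index of the first byte equal to al, or -1 if none is found. */
long repne_scasb_index(const unsigned char *alpha, size_t length, unsigned char al)
{
    const unsigned char *edi = alpha;
    size_t ecx = length;
    int zf = 0;

    while (ecx != 0) {          /* REPNE stops when the count is exhausted */
        ecx--;
        zf = (*edi == al);      /* scasb compares AL with the byte at EDI */
        edi++;                  /* and always advances EDI (DF = 0) */
        if (zf)                 /* REPNE also stops as soon as a match sets ZF */
            break;
    }
    if (!zf)
        return -1;              /* jnz quit: no match */
    edi--;                      /* dec edi: step back onto the matching byte */
    return (long)(edi - alpha);
}
```

Because `scasb` advances EDI even on the matching iteration, the pointer ends one past the match, which is why the listing needs the `dec edi`.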
x86-64 Conditionals • Conditional instruction execution • cmovXX src, dest • Move value from src to dest if condition XX holds • No branching • Conditional handled as operation within Execution Unit • Added with P6 microarchitecture (PentiumPro onward) • Must ensure gcc compiles with proper target to use • Example: (y < x) ? (x) : (y) • Performance • 14 cycles on all data • More efficient than conditional branching (simple control flow) • But overhead: both branches are evaluated
    movl   8(%ebp),%edx     # Get x
    movl   12(%ebp),%eax    # rval=y
    cmpl   %edx, %eax       # rval:x
    cmovll %edx,%eax        # If <, rval=x
x86-64 conditional example
    int absdiff(int x, int y)
    {
        int result;
        if (x > y) {
            result = x-y;
        } else {
            result = y-x;
        }
        return result;
    }

    # x in %edi, y in %esi
    absdiff:
        movl   %edi, %eax   # eax = x
        movl   %esi, %edx   # edx = y
        subl   %esi, %eax   # eax = x-y
        subl   %edi, %edx   # edx = y-x
        cmpl   %esi, %edi   # x:y
        cmovle %edx, %eax   # eax=edx if <=
        ret
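The "compute both results, then select" pattern behind `cmovle` can be mimicked in C with a bit mask instead of a branch. This is an illustrative sketch only: `absdiff_select` is a hypothetical name, a compiler is free to emit a branch or a `cmov` for either version, and like the original it assumes x - y and y - x do not overflow.

```c
#include <stdint.h>

/* Branch-free selection: both candidates are computed, then one is chosen. */
int32_t absdiff_select(int32_t x, int32_t y)
{
    int32_t a = x - y;                                  /* subl %esi, %eax */
    int32_t b = y - x;                                  /* subl %edi, %edx */
    /* mask is all ones when x <= y, otherwise all zeros */
    uint32_t mask = (uint32_t)0 - (uint32_t)(x <= y);
    /* keep b where the mask is set and a elsewhere: the role cmovle plays */
    return (int32_t)(((uint32_t)a & ~mask) | ((uint32_t)b & mask));
}
```

The cost noted on the earlier slide is visible here: both subtractions always execute, even though only one result is used.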
IA32 function calls • Handled based on calling convention used by the processor and compiler for each language • First, some data structures
IA32 Stack • Region of memory managed with stack discipline • Grows toward lower addresses • Register %esp indicates lowest stack address • address of top element
[Figure: the stack "bottom" is at the highest address and the stack "top" at the lowest; addresses increase toward the bottom, the stack grows down, and the stack pointer %esp points at the top]
IA32 Stack Pushing • Pushing • pushl Src • Decrement %esp by 4 • Fetch operand at Src • Write operand at address given by %esp • e.g. pushl %eax is equivalent to subl $4, %esp followed by movl %eax,(%esp)
[Figure: the stack grows down; a push moves the stack pointer %esp, and with it the stack "top", 4 bytes toward lower addresses]
IA32 Stack Popping • Popping • popl Dest • Read operand at address given by %esp • Write to Dest • Increment %esp by 4 • e.g. popl %eax is equivalent to movl (%esp),%eax followed by addl $4,%esp
[Figure: a pop moves the stack pointer %esp, and with it the stack "top", 4 bytes toward higher addresses]
Stack Operation Examples
                     Initially    pushl %eax    popl %edx
    Mem[0x108]       123          123           123
    Mem[0x104]                    213           213
    %eax             213          213           213
    %edx             555          555           213
    %esp             0x108        0x104         0x108
    Stack top at     0x108        0x104         0x108
(Addresses 0x110 and 0x10c are shown in the figure without values.)
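The push and pop sequence from this example can be replayed on a toy model of memory and registers. Everything here is a hypothetical construction for this sketch (`struct machine`, `store32`, `load32`, `pushl`, `popl`); it models only the three registers and the few addresses the example uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a small byte-addressed memory and three registers. */
struct machine {
    uint8_t  mem[0x200];
    uint32_t eax, edx, esp;
};

static void store32(struct machine *m, uint32_t addr, uint32_t v)
{
    memcpy(&m->mem[addr], &v, 4);
}

static uint32_t load32(const struct machine *m, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

/* pushl Src: decrement %esp by 4, then write the operand at the new top. */
void pushl(struct machine *m, uint32_t src)
{
    m->esp -= 4;
    store32(m, m->esp, src);
}

/* popl Dest: read the operand at the top, then increment %esp by 4. */
uint32_t popl(struct machine *m)
{
    uint32_t v = load32(m, m->esp);
    m->esp += 4;
    return v;
}
```

Starting from `%esp = 0x108`, `%eax = 213`, `%edx = 555`, calling `pushl(&m, m.eax)` and then `m.edx = popl(&m)` leaves `%edx = 213` and `%esp = 0x108`; the value 213 is still present at address 0x104, it is simply no longer part of the stack.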
Procedure Control Flow • Procedure call: • call label • Push address of next instruction (after the call) on stack • Jump to label • Procedure return: • ret • Pop address from stack into eip register