
RAMSES (Regeneration And iMmunity SErviceS): A Cognitive Immune System


Presentation Transcript


  1. RAMSES (Regeneration And iMmunity SErviceS): A Cognitive Immune System • Mark Cornwell, James Just, Nathan Li, Robert Schrag (Global Infotek) • R. Sekar (Stony Brook University)

  2. Outline • Goals • Approach Overview • Status and Next Steps • Memory Errors • Taint Recognition • Questions

  3. Project Goals • Prevent most attacks from causing damage • Cover a wide range of attacks • Work on blackbox COTS applications with modest performance overheads • Refine response to preserve availability • Reduce performance impact of unsuccessful attacks by filtering out future attack instances • Filters (“signatures”) should be deployable across different instances of an application • Input filters (network layer or close) • Output filters (at well-known APIs) • Don’t require “deep” instrumentation

  4. Attack Coverage • (Stack-smashing, heap overflow, integer overflow, data attacks) • Generalized Injection Attacks • [Chart: CVE Vulnerabilities (Ver. 20040901)]

  5. RAMSES Project Schedule • Baseline Tasks • 1. Refine RAMSES Requirements • 2. Design RAMSES • 3. Develop Components • 4. Integrate System • 5. Analyze & Test RAMSES • 6. Coordinate & Rept • Optional Tasks • O.3 Cross-Area Exper • [Gantt chart: timeline by quarter from CY06 Q3 to CY09 Q3, with Prototypes 1, 2 and 3 marked; current date 21 June 2007]

  6. Outline • Goals • Approach Overview • Status and Next Steps • Memory Errors • Taint Recognition • Questions

  7. RAMSES Overview • RAMSES Components • Event Collector • parse/decode/normalize HTTP requests, parameters, cookies, … • Attack Detector • Address-space randomization • Taint-based policies, anomalies • Filter Generator • Output filter • Input filter • Key research problems • Learn taint propagation • Identify tainted components in output, generate filtering criteria • Learn input/output transformation • Use transformation to project output filters to input • [Figure labels: Internet; Network/App Firewall (e.g. mod_security); Protected System: Web Server (IIS/Apache), Web App (PHP/ASP), SQL Database (MySQL); RAMSES Interceptors; Network DLLs, OS DLLs, Application DLLs]

  8. Instrumentation • Instrument important APIs • Uses the Microsoft Detours framework to intercept DLL calls • No need for source code or for the semantics of application-specific data structures • No need for complex analyses or transformations on binaries • Instrumentation will support • Logging of relevant operations • Including calling context, parameters and return values • Interposition of filter functions • Including injection of failure returns at appropriate points to ensure error recovery

  9. Event Collection • Apply further processing for widely used, standardized APIs (e.g., HTTP) • Parse into components • Request type, URL, form parameters, cookies, … • Exposes more of protocol semantics to learning and filtering algorithms • Normalize formats to avoid effect of various encoding schemes • To cope with evasion techniques • To ensure accuracy of taint-learning
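
The normalization step on the slide above can be illustrated with a minimal Python sketch. It assumes that "normalize" means undoing percent-encoding until the value stops changing, so that double-encoded input is compared in one canonical form. The function name `normalize` and the `max_rounds` limit are hypothetical and do not come from the RAMSES implementation.

```python
from urllib.parse import unquote_plus


def normalize(value, max_rounds=3):
    """Undo percent-encoding repeatedly so double-encoded input has one canonical form."""
    for _ in range(max_rounds):
        decoded = unquote_plus(value)
        if decoded == value:
            # Nothing left to decode: the value is in its canonical form.
            break
        value = decoded
    return value


if __name__ == "__main__":
    # "%25" decodes to "%", so "%2553" hides an encoded "S" behind two rounds.
    print(normalize("%2553ELECT%20*"))
```

Bounding the number of rounds keeps the cost predictable; a real collector would also have to handle other encodings (UTF-8 variants, HTML entities) that this sketch ignores.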

  10. Memory Error Detection • Base technology (ASR) developed in DAWSON project (SRS Phase I) for Windows XP • Enhanced with: • Multiple sensors for earlier detection • Analysis of memory for vulnerability characterization • Integrated with signature generation

  11. Taint-based Detection • Taint-tracking: • Identify parts of output “directly controlled” by untrusted input • Taint-enhanced policy enforcement: • Policies based on output value as well as taint, e.g., “No tainted semicolons in SQL query argument” • Taint-enhanced anomaly detection: • Detect anomalous structure/content of output, e.g., “tainted component too long, contains too many non-alphabetic chars” • [Figure: Input Interface: $name=$_GET['name'] (attacker injects malicious input data) → Program: $query = "SELECT price FROM products WHERE name='" . $name . "'" (attacker-provided data propagated in program) → Security-Sensitive Operations: sql_query($query) (attacker-provided data used as argument to corrupt system/data)]
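
The example policy "No tainted semicolons in SQL query argument" can be sketched in Python as follows. This is only an illustration: taint is inferred here by exact substring search, which is a simplification of the approximate matching discussed later in the deck, and the names `tainted_spans` and `violates_policy` are hypothetical.

```python
def tainted_spans(query, inputs):
    """Return (start, end) spans of query that are verbatim copies of an input value."""
    spans = []
    for value in inputs:
        start = query.find(value) if value else -1
        while start != -1:
            spans.append((start, start + len(value)))
            start = query.find(value, start + 1)
    return spans


def violates_policy(query, inputs, forbidden=";"):
    """True if any forbidden character lies inside a tainted span of the query."""
    return any(
        ch in forbidden
        for start, end in tainted_spans(query, inputs)
        for ch in query[start:end]
    )


if __name__ == "__main__":
    attack = "z'; UPDATE products SET price=0 WHERE name='abcd"
    query = "SELECT price FROM products WHERE name='" + attack + "'"
    print(violates_policy(query, [attack]))
```

Note that a semicolon written by the application itself is not flagged; only semicolons that arrive through untrusted input violate the policy.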

  12. Learning-based Taint Inference • Off-line learning from logs • Identify dataflow from input to output • Compare each output with recent inputs • Search can be narrowed if related operations are performed by the same thread • Learn an FSA for taint-marking • Given an output, can quickly identify which portions are tainted • Online • Match outputs using learned FSA • Double-check with input to verify (optional)

  13. Identifying Dataflow • In most applications, parts of input directly copied into output, with some minor modifications • deletion, e.g., spaces • modification, e.g., upper to lower case • insertion, e.g., escape special chars • more complex transformations aren’t easily handled by a learning technique • Given strings I and O, we need to answer one of: • Problem P0: Does I equal O, possibly with some changes? • Problem P1: Does I occur as a substring of O, possibly with minor modifications? • Problem P2: Do substrings of I occur within O, possibly with minor modifications?

  14. Algorithms for P0 to P2 • P0 is the approximate string matching problem • Dynamic programming technique yields O(|I|*|O|) algorithm for weighted approximate matching • Closely related problems: • Longest common subsequence • Global alignment • Weighted edit distance
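
For reference, the O(|I|*|O|) dynamic program for P0 can be written as a weighted edit distance. The sketch below is a textbook formulation, not code from RAMSES; the default weights of 1 are an assumption, and a deployment would tune them (for example, making insertion of an escape character cheap).

```python
def edit_distance(i_str, o_str, ins=1, dele=1, sub=1):
    """Weighted edit distance between i_str and o_str, keeping one DP row at a time."""
    # Row 0: turning the empty prefix of i_str into o_str[:j] costs j insertions.
    prev = [j * ins for j in range(len(o_str) + 1)]
    for a in range(1, len(i_str) + 1):
        cur = [a * dele]  # turning i_str[:a] into the empty string costs a deletions
        for b in range(1, len(o_str) + 1):
            cost = 0 if i_str[a - 1] == o_str[b - 1] else sub
            cur.append(min(prev[b] + dele,       # drop a character of i_str
                           cur[b - 1] + ins,     # add a character of o_str
                           prev[b - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```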

  15. Algorithms for P0 to P2 • P1 is the approximate substring matching problem • A very simple change to the algorithm for P0 yields a solution for P1 • So, the problem still has O(|I|*|O|) runtime • P2 is the common approximate substring problem • Again, solvable with a minor tweak on the algorithm for P1 • Local alignment [Smith-Waterman '81] • Difficulty: results very sensitive to weights • So, we focus on P1, since we have already parsed inputs into parameters, cookies, etc.
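
One standard way to realize the "very simple change" for P1 is to initialize the first row of the dynamic program to zero, so that a match may start at any position of the output at no cost, and then take the minimum over the last row. The sketch below assumes unit weights, and the function name `best_substring_match` is hypothetical.

```python
def best_substring_match(pattern, text):
    """Return (distance, end): the smallest edit distance between pattern and any
    substring of text, and the index in text where the best match ends."""
    # Row 0 is all zeros: the empty pattern matches anywhere, so a match can
    # start at any offset of text without paying for the skipped prefix.
    prev = [0] * (len(text) + 1)
    for a in range(1, len(pattern) + 1):
        cur = [a]
        for b in range(1, len(text) + 1):
            cost = 0 if pattern[a - 1] == text[b - 1] else 1
            cur.append(min(prev[b] + 1, cur[b - 1] + 1, prev[b - 1] + cost))
        prev = cur
    # The match may also end anywhere, so pick the cheapest cell of the last row.
    end = min(range(len(text) + 1), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    # The application escaped the quote, so the input appears with one insertion.
    print(best_substring_match("O'Brien", "name='O\\'Brien'"))
```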

  16. Practical Issues with Algorithms for P1 • Some outputs can be very long • E.g., HTML outputs of server • Need faster algorithms than O(|I|*|O|) • Idea: use O(|O|) algorithm to find likely places for start of matching substring • Some inputs can be too short • Leads to too many matches • Need to compute some measures of statistical significance of a match
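
The idea of an O(|O|) pass that finds likely starting places can be illustrated with a q-gram filter: index the q-grams of the input once, scan the output once, and run the quadratic matcher only on windows around the hits. This is one possible prefilter chosen for illustration, not necessarily the one used in RAMSES; `q`, `slack`, and the function name are assumptions.

```python
def candidate_windows(inp, out, q=3, slack=2):
    """Scan out once and return merged (start, end) windows that share a q-gram with inp."""
    # Map each q-gram of the input to an offset where it occurs in the input.
    grams = {inp[k:k + q]: k for k in range(len(inp) - q + 1)}
    windows = []
    for pos in range(len(out) - q + 1):
        k = grams.get(out[pos:pos + q])
        if k is None:
            continue
        # If this q-gram is offset k of the input, the match would start near pos - k.
        start = max(0, pos - k - slack)
        end = min(len(out), pos - k + len(inp) + slack)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    page = "x" * 50 + "abcdef" + "y" * 50
    print(candidate_windows("abcdef", page))
```

Inputs shorter than `q` produce no q-grams and therefore no windows, which loosely mirrors the concern on the slide that very short inputs carry too little evidence for a meaningful match.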

  17. Taint-marking FSA • Given a set O_C of possible outputs in a context C, learn an FSA F_C such that • F_C accepts all strings in O_C • edges of F_C are annotated to indicate whether the corresponding input symbol is tainted • Limit the size of O_C using as much execution context as possible • capture calling context of output function (includes all activation records on stack) • simplifies learning: often there is just one element in O_C for a given set of inputs

  18. Learning Taint-marking FSA • Suitable algorithms depend on lexical structure of output language • SQL, shell commands, … • This info could be specified externally • Our initial design based on general characteristics of command languages • Character classes: alphabetic, numeric, upper/lower case, separators, special chars,… • Leverage properties of tainted/untainted parts of output • Tainted components are from input, hence highly variable: generalize quickly from examples • Untainted components are from web app, likely static: generalize only if necessary
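
As an illustration of generalizing tainted components by character class, the sketch below maps every character seen in the tainted samples to a class and emits a regular-expression character set that accepts any string drawn from those classes. The three classes and the names `char_class` and `generalize` are assumptions made for the example.

```python
import re


def char_class(ch):
    """Map a character to a coarse class, or to itself (escaped) if it fits none."""
    if "a" <= ch <= "z":
        return "a-z"
    if "A" <= ch <= "Z":
        return "A-Z"
    if "0" <= ch <= "9":
        return "0-9"
    return re.escape(ch)


def generalize(samples):
    """Return a regex accepting any string built from the classes seen in samples."""
    classes = sorted({char_class(ch) for sample in samples for ch in sample})
    return "[" + "".join(classes) + "]*"


if __name__ == "__main__":
    print(generalize(["xyz", "yyz", "uvw"]))
```

Generalizing eagerly like this suits tainted components, which vary with every request; untainted text would instead be kept literal unless the samples force a change.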

  19. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • [Figure: FSA accepting the fixed query text, with tainted edges labeled abcd (name) and xyz (brand)]

  20. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • [Figure: same FSA as on the previous slide, before the new sample is merged]

  21. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • [Figure: the brand edge now branches on x or y, followed by yz]

  22. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • SELECT price FROM products WHERE name='abcd' AND brand='uvw' • [Figure: same FSA as on the previous slide, before the new sample is merged]

  23. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • SELECT price FROM products WHERE name='abcd' AND brand='uvw' • [Figure: the brand edge is generalized to a…z; the name edge is still abcd]

  24. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • SELECT price FROM products WHERE name='abcd' AND brand='uvw' • SELECT price FROM products WHERE name='defg' AND brand='uvw' • [Figure: same FSA as on the previous slide, before the new sample is merged]

  25. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • SELECT price FROM products WHERE name='abcd' AND brand='uvw' • SELECT price FROM products WHERE name='defg' AND brand='uvw' • [Figure: both the name edge and the brand edge are generalized to a…z]

  26. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • SELECT price FROM products WHERE name='abcd' AND brand='uvw' • SELECT price FROM products WHERE name='defg' AND brand='uvw' • SELECT price FROM products WHERE name='ijkl' • [Figure: same FSA as on the previous slide, before the new sample is merged]

  27. Learning Taint-marking FSA • SELECT price FROM products WHERE name='abcd' AND brand='xyz' • SELECT price FROM products WHERE name='abcd' AND brand='yyz' • SELECT price FROM products WHERE name='abcd' AND brand='uvw' • SELECT price FROM products WHERE name='defg' AND brand='uvw' • SELECT price FROM products WHERE name='ijkl' • [Figure: FSA extended so that it also accepts the query without the AND brand='…' clause]
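
The walkthrough on slides 19 to 27 can be approximated with a small Python sketch. It is a simplification made for illustration: instead of a single merged FSA it keeps one regular expression per distinct untainted "skeleton", and it assumes the tainted values of each sample are already known and appear in order. The names `split` and `learn` are hypothetical.

```python
import re
from collections import defaultdict


def split(output, values):
    """Cut output at the tainted values (in order); return the untainted literal pieces."""
    literals, cursor = [], 0
    for value in values:
        at = output.index(value, cursor)
        literals.append(output[cursor:at])
        cursor = at + len(value)
    literals.append(output[cursor:])
    return tuple(literals)


def learn(samples):
    """samples: list of (output, tainted_values). Returns one regex per skeleton."""
    by_skeleton = defaultdict(list)
    for output, values in samples:
        by_skeleton[split(output, values)].append(values)
    patterns = []
    for skeleton, rows in by_skeleton.items():
        parts = [re.escape(skeleton[0])]
        for k, literal in enumerate(skeleton[1:]):
            seen = {ch for row in rows for ch in row[k]}
            # Generalize a tainted slot to a…z when every sample fits, else to anything.
            slot = "[a-z]" if all("a" <= ch <= "z" for ch in seen) else "."
            parts.append(slot + "*" + re.escape(literal))
        patterns.append("".join(parts))
    return patterns


if __name__ == "__main__":
    q = "SELECT price FROM products WHERE name='{}' AND brand='{}'"
    data = [(q.format(n, b), [n, b])
            for n, b in [("abcd", "xyz"), ("abcd", "yyz"), ("abcd", "uvw"), ("defg", "uvw")]]
    data.append(("SELECT price FROM products WHERE name='ijkl'", ["ijkl"]))
    for pattern in learn(data):
        print(pattern)
```

Unlike the slides, this sketch generalizes a slot to a…z after a single sample; the slides generalize only once enough distinct values have been seen.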

  28. Taint Inference Vs Taint-tracking • Disadvantages of learning • False negatives if inputs transformed before use • Low likelihood for most web apps • False positives due to coincidence • Mitigated using statistical information • Plan to evaluate these experimentally • Benefits of learning • Low performance overhead • Some significant implicit flows handled without incurring high false positives • Can address multi-step attacks where tainted data is first stored in a file/database before use • More generally, in dealing with information flow that crosses module boundaries

  29. Filter Types • Filter Criteria • Correlative filters • Equality-based filter • Structure-based filter • Statistical filter • Causal filters • Filtering criteria derived from attack detection criteria (policy or anomaly) • Filter Location • Input filter • Easier to deploy but harder to synthesize • Output filter (precedes sensitive operation) • Easier to synthesize than input filter, but deployment needs deeper instrumentation • May be too late for some attacks (memory corruption) • Note: All filters evaluated using a large number of benign samples and 1 attack sample

  30. Output Filters • Use taint-marking FSA to identify tainted components of output • Attack-independent signature component • Structural filter: FSA match failure • Due to structural changes associated with SQL and command injection • Example: taint-marking regular expression SELECT price FROM products WHERE name=\'[a-z]*\' doesn’t match attack output SELECT … WHERE name=\'z\'; UPDATE products SET price=0 WHERE name=\'abcd\'
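
The structural filter example above can be checked directly with Python's `re` module. The sketch uses the regular expression from the slide; the function name `structural_filter` and the convention that `True` means "block" are assumptions for the example.

```python
import re

# Taint-marking regular expression from the slide: fixed query text plus one
# tainted slot that has so far only contained lowercase letters.
TEMPLATE = re.compile(r"SELECT price FROM products WHERE name='[a-z]*'")


def structural_filter(query):
    """Return True (block) when the output no longer has the learned structure."""
    return TEMPLATE.fullmatch(query) is None


if __name__ == "__main__":
    attack = ("SELECT price FROM products WHERE name='z'; "
              "UPDATE products SET price=0 WHERE name='abcd'")
    print(structural_filter("SELECT price FROM products WHERE name='abcd'"))
    print(structural_filter(attack))
```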

  31. Output Filters (continued …) • Equality-based filter • tainted parts same as attack-causing output • Statistical filter • statistics matching those of the attack output but not of benign outputs • length of tainted data > threshold • tainted data contains '<script>' • tainted data contains too many non-alpha characters • Causal filter: just apply attack detection criteria • Note: filter independent of attack sample
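
A statistical filter over the tainted part of an output might look like the sketch below. The three checks follow the bullets on the slide; the thresholds `max_len` and `max_nonalpha_ratio` are invented defaults that would in practice be derived from benign samples and the attack sample.

```python
def suspicious(tainted, max_len=64, max_nonalpha_ratio=0.3):
    """Return the list of statistical criteria that the tainted data triggers."""
    reasons = []
    if len(tainted) > max_len:
        reasons.append("too long")
    if "<script>" in tainted.lower():
        reasons.append("script tag")
    nonalpha = sum(not ch.isalpha() for ch in tainted)
    if tainted and nonalpha / len(tainted) > max_nonalpha_ratio:
        reasons.append("non-alpha")
    return reasons


if __name__ == "__main__":
    print(suspicious("alice"))
    print(suspicious("Alice<script>alert(1)</script>"))
```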

  32. Input Filters • Taint-marking FSA only indicates which parts of output are tainted • Don’t know which parts relate to which inputs • Need this info to generate input filters • We need to capture the relationship between inputs and outputs • We will represent this relation using an FSA as well, but its transitions will be on pairs (input char/output char) • Note: i may be from one of n input parameters; the transition will specify which. • The term “finite-state transducer” (FST) is commonly used to refer to such FSA
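
To make the input/output relation concrete, the sketch below models a very restricted transducer: a straight-line sequence of steps that either emit literal text or copy an input parameter unchanged, one (input char / output char) pair at a time. Running it yields the output together with the origin of every output character. A real FST would also need branching and transformations such as escaping; the step encoding and the name `run` are assumptions.

```python
# ("lit", text): emit text while consuming no input.
# ("copy", i):   emit parameter i unchanged, character by character.
FST = [
    ("lit", "SELECT price FROM products WHERE name='"),
    ("copy", 1),
    ("lit", "' AND brand='"),
    ("copy", 2),
    ("lit", "'"),
]


def run(fst, params):
    """Return (output, origin); origin[k] is (param, offset) or None if untainted."""
    out, origin = [], []
    for kind, arg in fst:
        if kind == "lit":
            out.extend(arg)
            origin.extend([None] * len(arg))
        else:
            value = params[arg]
            out.extend(value)
            origin.extend((arg, offset) for offset in range(len(value)))
    return "".join(out), origin


if __name__ == "__main__":
    output, origin = run(FST, {1: "abcd", 2: "xyz"})
    print(output)
    print(origin[output.index("xyz")])
```

Because each output position records which parameter produced it, a condition on a tainted part of the output can be restated as a condition on that specific parameter.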

  33. $4=/  /’| gpg -h $2@/$2 $3/$3 /.g -r $1/$1 /echo ' / -r $4/$4 ' ' $2/ I/O Transformation FST Example from SquirrelMail

  34. Generating Causal Filters from FST • Consider a command injection on SquirrelMail • Use I/O FST to compute output • Violates policy: no “;” (or other command separators) in tainted parts • Can be projected into a corresponding condition on input: no command separator in input $3 • Can be generalized to state that if $1 has no unmatched quotes, then neither $3 nor $4 can contain unescaped “;” • [Figure: the FST from the previous slide run on an attack, with input values x@yz.com;touch /tmp/t, sekar and ab]

  35. Outline • Goals • Approach Overview • Status and Next Steps • Memory Errors • Taint Recognition • Questions

  36. RAMSES Functional Architecture (Spiral 1) • Sense: Interceptors observe and control inputs to applications at the function call level • Monitor: New information from sensors is analyzed in context of retained history • Respond: Responses in the form of learned attack signatures and specific interventions (block, filter) are fed to interceptors to provide an immune response • Offline: Dataflow / user-taint identification rules learned offline from logs. Rule-based monitoring of inputs performed by Interceptors (also alerting and response) • [Figure labels: User Inputs; Application(s); Function Interceptors at Application DLL, Win32 API, Windows SysCall and Windows Kernel; Function Interceptor Manager; Memory Error Attack Detector; Dataflow Anomaly Detector; Signature & Rule Manager; Signature Generator; RAMSES Initiator; Offline Dataflow Learning; Crash Dump; Alerts, Buffered Inputs; Sensor Config; LOG (Parameters, Input, Output, Context); Taint Mark Rules; Load]

  37. RAMSES Implementation Status • Architecture supports generalized injection attack defenses • Vulnerability-based signatures and policy-based regular expression signatures • Works on multiple applications with application-specific configuration • Memory error detector integrated • Zero-day attack detector for unknown vulnerabilities • DAWSON enhanced and integrated on Windows XP • Function interceptors have been tested and stabilized over numerous applications on Windows XP/Vista/Linux • IIS v4 - v7, SQL Server, MySQL • IE browser, Windows Explorer • Also implemented for Linux Apache and PHP • Test beds and synthetic applications developed • Performed first set of end-to-end experiments on synthetic application based attacks • Memory error • SQL injection • XSS attacks • Completed Spiral 1 Functional Architecture

  38. Function Interceptor • The Infrastructure • Based on the Detours package from Microsoft Research • Same component used by RAMSES in online and offline mode • Works the same way for all generalized injection attack defenses • Function Interceptors • To intercept/monitor/alter application behavior • Can intercept any exported function • Internal functions can be intercepted with Microsoft online debug information • Function name, parameters, return result and calling context are logged as name:value pairs • Data buffer content is logged as a printable ASCII string • Current Implementation • Currently supports the following types of APIs: • Windows Socket APIs: 40+ functions • Two sets of socket APIs on Windows, both are instrumented • Process/Thread APIs: 10+ functions • Windows COM APIs: 60+ functions • File IO APIs: 40+ functions • HTTP APIs: 10+ functions (on Vista)

  39. Function Interceptor Sample Log Function Name Timestamp Parameters Return result Calling context • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, send(SOCKET:274|Buf:1edc08|Len:413|Flags:0|RETURN:19d|DUMPBUFFER:1) • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, DUMPBUFFER:BEGIN • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, GET / HTTP/1.1\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, Accept: */*\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, Accept-Language: en-us\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, UA-CPU: x86\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, Accept-Encoding: gzip, deflate\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506 • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, Host: www.google.com\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, Connection: Keep-Alive\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, Cookie: PREF=ID=c559b127b50b7436:TM=1176843792:LM=1180046298:L=0jXnI1ToXFxWWo5LbIcrLh7t8ID0Fd1HW3eXHVTozZwA:S=clcSnwJfHl • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, rjWQTm; testcookie=\r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, \r\n • 20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, DUMPBUFFER:END
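
A log line in the format shown above can be parsed for offline learning with a short Python sketch. The regular expressions are inferred from this one sample and may not cover every record type the interceptors emit; the field names in the returned dictionary are hypothetical.

```python
import re

# Prefix: 17-digit timestamp, then the calling-context fields, then the record body.
LINE = re.compile(
    r"(?P<ts>\d{17}): FuncIndex=(?P<func>0x[0-9a-f]+),PID=(?P<pid>0x[0-9a-f]+),"
    r"ThreadId=(?P<tid>0x[0-9a-f]+),SPOffset=(?P<sp>0x[0-9a-f]+), (?P<body>.*)"
)
# A call record looks like name(key:value|key:value|...).
CALL = re.compile(r"(?P<name>\w+)\((?P<args>.*)\)$")


def parse_line(line):
    """Parse one log line into a dict, or return None if the prefix does not match."""
    m = LINE.match(line)
    if not m:
        return None
    record = {"ts": m["ts"], "body": m["body"]}
    for key in ("func", "pid", "tid", "sp"):
        record[key] = int(m[key], 16)
    call = CALL.match(m["body"])
    if call:
        record["call"] = call["name"]
        record["args"] = dict(pair.split(":", 1) for pair in call["args"].split("|"))
    return record


if __name__ == "__main__":
    sample = ("20070613175843825: FuncIndex=0x8e,PID=0xd58,ThreadId=0xeb0,SPOffset=0x290, "
              "send(SOCKET:274|Buf:1edc08|Len:413|Flags:0|RETURN:19d|DUMPBUFFER:1)")
    print(parse_line(sample))
```

In the sample, `RETURN:19d` read as hexadecimal equals the decimal `Len:413`, which is consistent with `send` reporting that the whole buffer was written. Grouping parsed records by `tid` is one way to narrow the input/output comparison to operations made by the same thread.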

  40. Key Steps • Attack detection • Process crash • Useful information can be extracted from a failed attack • Usually in the form of access violation exception • Full crash dump can be analyzed offline • Policy violation • Trace back to input • Correlate attack effect to some part of input • Vulnerability/exploit analyzer • Analyze attack in the context of recent input history • Correlate the exploit/vulnerability to input without fine grained execution trace • Signature generator • Generate generic signature and response for the underlying vulnerability

  41. Memory Error Example • Multi-threaded vulnerable server • Server receives input from port V • Server handles socket error • Server handles thread level exception • [Figure labels: Attacker; Port V; RAMSES Protected Server with Listening Thread, three WorkerThreads and Exception Handler; RAMSES Attack Detector; ASR; Function Interceptor; Recent Input History; Vulnerability Exploit Analyzer; Signature Generator]

  42. Memory Error Attack • A traditional stack buffer overflow exploit uses a jmp esp at 0x77fb59cc in NTDLL • Brute-force attack enumerates all possibilities, 0x000059cc to 0xffff59cc • Without protection, the attack succeeds or service is denied • [Figure labels: Attacker sending 0x77fb59cc, 0xABCD59cc, 0xFFFF59cc to Port V; RAMSES Protected Server with Listening Thread, WorkerThreads and Exception Handler; RAMSES Attack Detector; ASR; Function Interceptor; Recent Input History; Vulnerability Exploit Analyzer (faulting address, instruction, stack content); Signature Generator]

  43. Next Steps • Attack Detection • Enhance detection to trigger closer to, or even before, the point where the vulnerability is exploited • Save attack dumps for offline analysis to increase confidence and enhance signatures • Signature Generation • Create new probes to characterize the underlying vulnerability accurately (for example, the minimum size required to overflow a buffer and overwrite the return address) by customizing/randomizing certain parts of the payload, such as target address and buffer length • Measure and improve false positive/false negative rates

  44. Outline • Goals • Approach Overview • Status and Next Steps • Memory Errors • Taint Recognition • Questions

  45. Taint Recognition Implementation Overview • Script Vulnerability Scenario • Taint Flow ~ Script Vulnerability • Exploits: SQL, Plant XSS attack • Taint Recognition and Policy • Example of a policy signature • Algorithm for generating a simple recognizer • Limits of effectiveness • Solutions are not unique • Multiple tainted segments cause problems • Better signatures • Character distribution invariants • More things we can learn • In which calls/contexts tainted inputs appear • Measure effectiveness of taint templates

  46. Script Vulnerability Scenario • [Figure callouts: values from the database; user-supplied inputs]

  47. Taint Flow ~ Script Vulnerability • Web app code: $v = $_REQUEST['vote']; $n = $_REQUEST['name']; $sql = "INSERT INTO namevote VALUES('$n', $v);"; multi_query($conn,$sql) • Resulting query: INSERT INTO namevote VALUES('alice',345); • [Figure labels: Web Server (IIS); Web App (PHP/ASP); SQL Database (MySQL)]

  48. Exploit: SQL Capture • Attacker can execute arbitrary SQL commands via crafted strings intended for data values. • Web app code: $v = $_REQUEST['vote']; $n = $_REQUEST['name']; $sql = "INSERT INTO namevote VALUES('$n', $v);"; multi_query($conn,$sql) • Form inputs shown: Trudy and 445); DROP TABLE FOO; INSE • Resulting query: INSERT INTO namevote VALUES('trudy',445); DROP TABLE FOO; INSERT INTO namevote VALUES('bob',345); • [Figure labels: Web Server (IIS); Web App (PHP/ASP); SQL Database (MySQL)]
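
The SQL capture on this slide can be reproduced without a database by mirroring the PHP string interpolation in Python. This is a demonstration only: `build_query` copies the unsafe construction from the slide, and `statements` is a deliberately naive splitter on semicolons (hypothetical, and not a SQL parser) used to show that one request turns into three statements.

```python
def build_query(name, vote):
    """Mirror the PHP on the slide: values are spliced into the SQL text unescaped."""
    return f"INSERT INTO namevote VALUES('{name}',{vote});"


def statements(sql):
    """Naively split on ';' to count statements (ignores quoting; illustration only)."""
    return [part.strip() for part in sql.split(";") if part.strip()]


if __name__ == "__main__":
    crafted_vote = "445); DROP TABLE FOO; INSERT INTO namevote VALUES('bob',345"
    for stmt in statements(build_query("trudy", crafted_vote)):
        print(stmt)
```

The fix on the application side is parameterized queries, but the premise of the deck is that the application cannot be changed, which is why the defense is a filter placed in front of the database call.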

  49. Exploit: Plant XSS Attack • Imperfect filtering offers the possibility of planting XSS attacks even without subverting the SQL syntax. • Web app code: $v = $_REQUEST['vote']; $n = $_REQUEST['name']; $sql = "INSERT INTO namevote VALUES('$n', $v);"; multi_query($conn,$sql) • Form inputs shown: Alice<script>alert(String.fr and 345 • Resulting query: INSERT INTO namevote VALUES('Alice<script>alert(String.fromCharCode(88,83,83)</script>',345); • Tainted text later executes in browsers of future visitors to site. • Observe: No variation in server control flow! • [Figure labels: Web Server (IIS); Web App (PHP/ASP); SQL Database (MySQL)]

  50. Observations on Exploits • Can launch from ordinary web browser. • SQL capture can inject persistent data into database, that may propagate to 3rd party victims. • Text the application programmer intended for limited use as data is crafted to influence the command interpretation stream in unintended ways. • Command shell, filename, format string, all share a similar structure to this canonical example. • These “capture” vulnerabilities stem from flaws in application programs. Perfect filtering by the application could have eliminated the flaw. Alas, Perfection is not always achieved. Can we craft our application to learn from experience?
