
Automated Signature Extraction for High Volume Attacks

Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish. This work is part of the Kabarnit-Cyber Consortium (2012-2014) under the Magnet program, funded by the Chief Scientist in the Israeli Ministry of Industry, Trade and Labor.


Presentation Transcript


  1. Automated Signature Extraction for High Volume Attacks • Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish • This work is part of the Kabarnit-Cyber Consortium (2012-2014) under the Magnet program, funded by the Chief Scientist in the Israeli Ministry of Industry, Trade and Labor. • This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.

  2. Current DDoS Attack • Zombies on innocent computers • Infrastructure-level DDoS attacks • Server-level DDoS attacks • Bandwidth-level DDoS attacks

  3. High volume attacks - Current Defense • Many different types of attackers • Defense lines (Defense Line 1, 2, 3, …, n): SYN cookies, challenge-response, access control list filtering, behavioral analysis • Remaining attacks: botnets (millions of computers); hard to identify behaviorally, under the radar; zero-day, with no known signatures • Call for HELP!!

  4. Signature-based DDoS Attack Detection • Unknown (zero-day) attacks: some hope, because attack tools usually leave a unique footprint (a repeating pattern) • Example in a packet: Connection: KEEP-ALIVE • Today: signatures are found manually (human eye) • Our goal: find them automatically • Signatures are used by anti-DDoS devices and firewalls to stop the attack • Mitigation in minutes, which is good enough for these types of attacks

  5. Signatures also used in • NIDS/IPS (Snort, Bro, etc.) • Worm detection (automated extraction) • Previous work: worm behavior (address dispersion, suspicious code, etc.); fixed-length signatures; non-scalable • Notable works: Kephart et al. '94; Honeycomb [Kreibich et al. '04]; Earlybird [Singh et al. '04]; Autograph [Kim et al. '04]; Hancock [Griffin et al. '09]

  6. System Overview • Our challenge: automatically find signatures that appear frequently only during an attack • Where (input collection): in the mitigation box (DDoS Guard/firewall/anti-DDoS, etc.), or in the cloud, collecting data from several collectors • Diagram: peace-time traffic sample + attack-time traffic sample → Signature Extraction → attack signatures (e.g. Connection: KEEP-ALIVE)

  7. Signature Extraction - High Level • Peace-time traffic sample → find frequent strings in peace-time traffic • Attack-time traffic sample → find frequent strings in attack-time traffic • Take only strings found in attack and not in peace → attack signatures (e.g. Connection: KEEP-ALIVE)
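
The flow on this slide can be sketched with exact counting, before any space-saving machinery is introduced. This is only a naive baseline under stated assumptions, not the authors' method: strings are fixed-length k-grams, "frequent" means appearing in at least `min_fraction` of the packets, and the function names and sample payloads are hypothetical.

```python
from collections import Counter

def frequent_strings(packets, k, min_fraction):
    """Return the k-grams that occur in at least min_fraction of the packets."""
    seen_in = Counter()
    for payload in packets:
        # A set, so each k-gram is counted once per packet.
        seen_in.update({payload[i:i + k] for i in range(len(payload) - k + 1)})
    return {g for g, n in seen_in.items() if n / len(packets) >= min_fraction}

def naive_signatures(peace, attack, k=4, min_fraction=1.0):
    """Frequent strings in attack traffic that are not frequent in peace traffic."""
    return frequent_strings(attack, k, min_fraction) - frequent_strings(peace, k, min_fraction)

# Hypothetical payloads, for illustration only.
peace = ["GET /a", "GET /b"]
attack = ["GET /XYZW", "GET /XYZWq"]
sigs = naive_signatures(peace, attack)
```

Exact counting keeps one counter per distinct k-gram, which is the space cost the heavy-hitters approach in the following slides is meant to avoid.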

  8. Our Goal • Automatically find signatures that appear frequently only during an attack • Requirements: • Find a minimal set of signatures (some filtering devices have limited capacity) • Allow signatures of varying lengths • Don't include signatures found in legitimate traffic (minimum false positives) • Minimize space and time usage (large amounts of data, quick response)

  9. Finding Frequent Strings in Traffic • Input: sequence of packets • Output: strings that appear frequently in packets • Common stringology solution: use suffix trees/arrays (too much space) • Our solution uses heavy hitters

  10. Heavy Hitters (Frequent Items) • Input: N values, integer v • Output: the (at most v) values that each appear at least N/v times • Approximate solution: uses O(v) space and one pass over the input • Known counter-based HH algorithms: Misra & Gries 1982; Lossy Counting (Manku and Motwani 2002); Space Saving (Metwally et al. 2005), currently used

  11. Space saving Heavy Hitters [Metwally et al 2005] • Algorithm: • Maintain v values, and their counters.

  12. Space saving Heavy Hitters [Metwally et al 2005] • Algorithm: • Maintain v values, and their counters. • If next value x is one of the v, increment its counter.

  13. Space saving Heavy Hitters [Metwally et al 2005] • Algorithm: • Maintain v values, and their counters. • If next value x is one of the v, increment its counter. • Else take the item with the minimal counter c: replace its value with x and set the new counter to c+1 • Error: counts are overestimated by at most N/v
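
The three steps above can be sketched as follows. This is a minimal illustration, assuming a plain dictionary and a linear scan for the minimum counter (the original Space Saving paper uses a more efficient structure); the function name and sample stream are hypothetical.

```python
def space_saving(stream, v):
    """Approximate heavy hitters using at most v counters (Space Saving)."""
    counters = {}
    for x in stream:
        if x in counters:
            # x is one of the v tracked values: increment its counter.
            counters[x] += 1
        elif len(counters) < v:
            # Still room: start tracking x.
            counters[x] = 1
        else:
            # Evict the item with the minimal counter c; x inherits c + 1.
            victim = min(counters, key=counters.get)
            c = counters.pop(victim)
            counters[x] = c + 1
    return counters

stream = list("aaabacadaeaf")   # 'a' occurs 7 times out of N = 12
result = space_saving(stream, v=3)
```

Because every input item adds exactly one to some counter, the counters always sum to N, so the minimal counter (and hence any overestimate) is at most N/v.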

  14. Our Solution • Heavy hitters are usually run on numbers; how do we use them for text? • k-grams: strings of length exactly k • Trivial idea: for each packet, take all k-grams (sliding window) and run heavy hitters on them • Example: payload abcabcadefgfsdghjghnfdghfgsdhfjs gives k-grams b1 = abca, b2 = bcab, b3 = cabc, … • Fixed length is not good enough: either too short (cuts up longer signatures; substring pollution, i.e. too many heavy hitters for one signature) or too long (noisy signatures)
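
The sliding-window step can be illustrated in a few lines. The helper name `kgrams` is hypothetical, and k = 4 is inferred from the example k-grams on the slide.

```python
def kgrams(payload, k):
    """Return all substrings of length exactly k, in sliding-window order."""
    return [payload[i:i + k] for i in range(len(payload) - k + 1)]

grams = kgrams("abcabcadefg", 4)
```

A payload of length n yields n - k + 1 k-grams, so a long signature is split into many overlapping pieces, which is the substring pollution the slide refers to.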

  15. Our Solution: Double Heavy Hitters • Double Heavy Hitters algorithm: two separate instances of heavy hitters • Heavy Hitters 1: find heavy hitters of k-grams • Heavy Hitters 2: find heavy hitters of varying-length strings created during the run of Heavy Hitters 1 • Input to Heavy Hitters 1: k-grams • Input to Heavy Hitters 2: strings • Output is the output of Heavy Hitters 2

  16. Double Heavy Hitters Algorithm • While processing k-grams in Heavy Hitters 1, find the maximal run of k-grams that are already in Heavy Hitters 1 and whose consecutive counters maintain a predefined ratio • Create a string from the run • Insert it into Heavy Hitters 2 • Figure: a row of k-grams (bcab abca cabc dabc bcab abca cabc abcd cdab abca bcda), each marked "Is already in Heavy Hitters 1?" (N N N Y Y Y N N N N Y), with the strings abca and abcabc and a "Check ratio" step

  17. Double Heavy Hitters Algorithm • Example: Input: abcabcabcd

  18. Double Heavy Hitters Algorithm • Example: Input: abcabcabcd String = abca

  19. Double Heavy Hitters Algorithm • Example: Input: abcabcabcd String = abcab

  20. Double Heavy Hitters Algorithm • Example: Input: abcabcabcd String = abcabc

  21. Double Heavy Hitters Algorithm • Example: Input: abcabcabcd String = abcabc
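
The example above can be reproduced with a rough sketch of the two-stage idea. Several details here are assumptions rather than the authors' specification: both stages use Space Saving, runs do not cross packet boundaries, and the "predefined ratio" is read as requiring the smaller of two consecutive counters to be at least `ratio` times the larger. The class and function names are hypothetical.

```python
class SpaceSaving:
    """Space Saving heavy hitters with at most v counters."""
    def __init__(self, v):
        self.v = v
        self.counters = {}

    def add(self, x):
        """Count x; return True if x was already tracked before this update."""
        if x in self.counters:
            self.counters[x] += 1
            return True
        if len(self.counters) < self.v:
            self.counters[x] = 1
        else:
            victim = min(self.counters, key=self.counters.get)
            self.counters[x] = self.counters.pop(victim) + 1
        return False

def double_heavy_hitters(packets, k, v1, v2, ratio=0.5):
    hh1, hh2 = SpaceSaving(v1), SpaceSaving(v2)
    for payload in packets:
        run, prev = "", None          # current run string, last counter in the run
        for i in range(len(payload) - k + 1):
            gram = payload[i:i + k]
            seen = hh1.add(gram)
            count = hh1.counters[gram]
            # Assumed ratio test between consecutive k-gram counters.
            similar = prev is None or min(count, prev) >= ratio * max(count, prev)
            if seen and similar:
                # Extend the run by one character (or start it with this k-gram).
                run = gram if not run else run + gram[-1]
                prev = count
            else:
                if run:
                    hh2.add(run)      # the maximal run ended: feed it to HH2
                run, prev = (gram, count) if seen else ("", None)
        if run:
            hh2.add(run)
    return hh2.counters

strings = double_heavy_hitters(["abcabcabcd"], k=4, v1=10, v2=10)
```

On the slide's input, the first three k-grams are new to Heavy Hitters 1, the next three are already tracked and grow the run from abca to abcab to abcabc, and the unseen k-gram abcd ends the run.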

  22. Heavy Hitters on text - improving the estimation • Problem: substrings of heavy hitters are undercounted, because only the longest run is given as input to HH2 • Correct the count after the algorithm has run: for every string s in Heavy Hitters 2, find the other strings that contain s and add their counters to s's counter
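
The correction step can be sketched as a post-processing pass over the Heavy Hitters 2 output. This is a minimal illustration with a quadratic scan; the function name and sample counters are hypothetical, and the additions use the original (uncorrected) counters so that no count is added twice.

```python
def correct_counts(hh2_counts):
    """Add to each string's counter the counters of other strings that contain it."""
    corrected = {}
    for s, count in hh2_counts.items():
        extra = sum(c for t, c in hh2_counts.items() if t != s and s in t)
        corrected[s] = count + extra
    return corrected

corrected = correct_counts({"abcabc": 5, "abca": 2, "xyz": 1})
```

Here abca is contained in abcabc, so its counter grows from 2 to 7, while the other two strings are unchanged.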

  23. Double Heavy Hitters Algorithm Analysis • Input: N k-grams to HH1; C consecutive grams to HH2 • Error bounds: N/v for HH1 with v items; C/v for HH2 with v items • We prove: C ≤ N/(k + 1) • Overall: Error bound of the Double Heavy Hitters algorithm
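
As a worked reading of the per-stage bounds listed on this slide, the HH2 error can be expressed in terms of N by substituting the stated inequality for C. The numeric values below are illustrative assumptions, and the overall bound of the combined algorithm is not reproduced here.

```latex
\[
\mathrm{err}_{\mathrm{HH1}} \le \frac{N}{v}, \qquad
\mathrm{err}_{\mathrm{HH2}} \le \frac{C}{v} \le \frac{N}{(k+1)\,v}.
\]
% Illustrative numbers (assumed): N = 10^6, v = 1000, k = 9
\[
\frac{N}{v} = \frac{10^{6}}{10^{3}} = 1000, \qquad
\frac{N}{(k+1)\,v} = \frac{10^{6}}{10 \cdot 10^{3}} = 100 .
\]
```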

  24. Signature Extraction - High Level • Peace-time traffic sample → find frequent strings in peace-time traffic • Attack-time traffic sample → find frequent strings in attack-time traffic • Take only strings found in attack and not in peace → attack signatures (e.g. Connection: KEEP-ALIVE) • Formalize with thresholds

  25. Choose Signatures • Create signatures that never appear in legitimate traffic • Thresholds: Attack-high, Peace-low, Peace-high, Delta • Diagram labels: strings in attack with frequency > Attack-high

  26. Choose Signatures • Create signatures that never appear in legitimate traffic • Thresholds: Attack-high, Peace-low, Peace-high, Delta • Diagram labels: strings in attack with frequency > Attack-high; strings in peace time; signatures; false positives

  27. Choose Signatures • Create signatures that rarely appear in legitimate traffic • Thresholds: Attack-high, Peace-low, Peace-high, Delta • Diagram labels: strings in attack with frequency > Attack-high; strings in peace with frequency > Peace-low; signatures; false positives

  28. Choose Signatures • Create signatures that may appear in legitimate traffic, but appear in attack traffic much more • Thresholds: Attack-high, Peace-low, Peace-high, Delta • Diagram labels: strings in attack with frequency > Attack-high; frequency > Peace-high; frequency > Peace-low; signatures; false positives • Signatures only if attack frequency is at least Delta more than peace frequency
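
The threshold rules built up over these slides can be sketched as one filter. The reading below is an assumption, not the authors' definition: strings with peace frequency above Peace-high are never signatures, strings at or below Peace-low always qualify, strings in between qualify only when the attack frequency exceeds the peace frequency by at least Delta (taken as an additive difference), and frequencies are fractions in [0, 1]. The function name, threshold values, and sample strings are hypothetical.

```python
def choose_signatures(attack_freq, peace_freq, attack_high, peace_low, peace_high, delta):
    """Select attack strings as signatures using the four thresholds."""
    signatures = []
    for s, a in attack_freq.items():
        if a <= attack_high:
            continue                      # not frequent enough in the attack
        p = peace_freq.get(s, 0.0)
        if p > peace_high:
            continue                      # common in peace time: never a signature
        if p > peace_low and a - p < delta:
            continue                      # in between, and not Delta more than peace
        signatures.append(s)
    return signatures

# Hypothetical frequencies, for illustration only.
attack = {"KEEP-ALIVE": 0.9, "GET /": 0.95, "Host: ": 0.8, "rare": 0.1, "Accept": 0.7}
peace = {"GET /": 0.9, "Host: ": 0.3, "Accept": 0.4}
sigs = choose_signatures(attack, peace, attack_high=0.5,
                         peace_low=0.05, peace_high=0.5, delta=0.4)
```

Raising Delta or lowering Peace-high trades fewer false positives for fewer signatures.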

  29. Use peace traffic to create filters • Use our Double Heavy Hitters algorithm on peace-time traffic • Input: peace-time traffic packet payloads, e.g. abcabcadefgfsdghjghnfdghfg...... with k-grams b1 = abca, b2 = bcab, b3 = cabc, …… • Output values, on a frequency scale from 0% to 100% (marked at 50%): frequency > Peace-high: white list; frequency > Peace-low, up to Peace-high: maybe white list; frequency up to Peace-low: not white list

  30. Extracting Attack Signatures • Now use the Double Heavy Hitters algorithm on attack-time traffic with filters (modified DHH) • Input: attack traffic packet payloads, e.g. hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas with k-grams b1 = hagd, b2 = agdh, b3 = gdhd, …… • Heavy Hitters 1 → Heavy Hitters 2 → output values with frequency > Attack-high → signatures • Filters: maybe white list; white list (discard a string if it is contained in a white-list string)

  31. Evaluations • Eleven tests overall: • Ten real attack captures (5 captures of peace-time traffic, 5 synthetic peace-time captures) • One synthetic attack in real peace-time traffic • Compare to a human expert

  32. Sample Signatures (could not be identified manually) • Extra newline between header fields • Use of upper-case characters, where usually lower • Use of a rarely used HTTP field • Use of a rare user agent

  33. Results - Accuracy of Double Heavy Hitters estimation • Graph of frequency of signatures • Red: actual count (frequency) in attack traffic • Blue: algorithm (DHH) estimation of frequency of signatures • Axes: Percent, Signatures

  34. Results - Attack Rate Estimation • Chart groups: tests with synthetic peace-time traffic; tests with real peace-time traffic • Axes: Attack rate, Test Number

  35. Results - Recall and Precision Estimation • Precision: relevant packets out of all identified • Recall: identified packets out of all relevant • Average: 99.96 • Worst case: 99.8 • Chart groups: tests with real peace-time traffic; tests with synthetic peace-time traffic • Axes: Percent, Test Number
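
The two definitions on this slide can be written out directly. This is a small sketch, assuming packets are represented by identifiers: "identified" is the set of packets matched by the extracted signatures and "relevant" is the set of true attack packets. The function name and sample sets are hypothetical.

```python
def precision_recall(identified, relevant):
    """Precision: relevant out of all identified. Recall: identified out of all relevant."""
    identified, relevant = set(identified), set(relevant)
    hits = len(identified & relevant)
    precision = hits / len(identified) if identified else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical packet IDs, for illustration only.
precision, recall = precision_recall(identified={1, 2, 3, 4}, relevant={2, 3, 4, 5, 6})
```

Three of the four identified packets are relevant, and three of the five relevant packets are identified.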

  36. Future Work • Identify signatures always found in same packets • Good synthetic peace-time traffic, global white-list • Support regular expression signatures
