
FiG: Automatic Fingerprint Generation

  1. FiG: Automatic Fingerprint Generation Shobha Venkataraman Joint work with Juan Caballero, Pongsin Poosankam, Min Gyung Kang, Dawn Song & Avrim Blum Carnegie Mellon University

  2. Fingerprinting Used to identify: • versions of software on hosts • operating systems of hosts • hosts running versions with vulnerabilities (Figure: a network administrator probing hosts running Linux, Solaris, Windows XP SP1, and Windows XP SP2)

  3. The Fingerprinting Process Fingerprint: • set of queries sent to host + • classification function analyzing queries & responses Well-known fingerprinting tools: nmap, fpdns (Figure: the fingerprinting tool sends queries to a host, receives responses, and outputs the OS, e.g. Linux)

  4. Finding Fingerprints • How do fingerprinting tools get fingerprints? • Existing approach: manual identification • Incomplete, time-consuming • Difficult to keep up-to-date Need automatic, accurate fingerprint generation! (Figure: the fingerprinting tool, with open questions: what queries? what classification function?)

  5. Our Contribution: FiG Demonstrate that automatic fingerprint generation is possible. In particular: • Use machine learning to automatically generate fingerprints • Automatically generate accurate fingerprints: • Distinguishing OS • Distinguishing implementations of DNS servers • Finding new fingerprints

  6. Outline • Fingerprint Generation Problem • Overview of Approach • Automatic Fingerprint Generation • Experimental Results • Conclusion

  7. Fingerprint Generation Problem Goal: find fingerprints, i.e. • Useful queries • Classification function that distinguishes implementations (Figure: the fingerprint generator produces fingerprints for Windows XP, Linux, and Solaris, which feed the fingerprinting tool)

  8. Outline • Fingerprint Generation Problem • Overview of Approach • Automatic Fingerprint Generation • Experimental Results • Conclusion

  9. FiG: Overview of Approach • Query exploration: generate candidate queries • Learning: automatically find fingerprints (Figure: query exploration produces candidate queries, and learning turns them into fingerprints for the fingerprinting tool)

  10. FiG: Overview of Approach (Figure: the same overview diagram, now highlighting the query exploration phase)

  11. Query Exploration • Goal: generate candidate queries • query: specially crafted packet sent to host • Infeasible to generate all possible queries • All queries = all possible byte combinations of packet header • e.g., 40 bytes of TCP & IP header => 2^320 queries! • Instead, use protocol semantics to design queries

  12. Query Exploration • Queries: packets with unusual values in header fields • Explore unusual values for fields independently • Explore fields with rich semantics exhaustively, i.e., all possible values (e.g., TCP flags) • Explore other fields selectively, i.e., some valid and invalid values (e.g., TCP checksum, TCP source port)
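
As a concrete illustration, here is a minimal sketch of this exploration strategy using the Python scapy library; the target host, destination port, and the particular field values are illustrative assumptions, not FiG's actual query set.

```python
# Sketch: generating candidate queries by exploring header fields.
# The target host and the chosen values are placeholders.
from scapy.all import IP, TCP

TARGET = "192.0.2.1"  # placeholder test host (TEST-NET-1)

candidates = []

# Field with rich semantics (TCP flags): explore exhaustively.
# The 8 flag bits (including ECN & CWR) give 256 combinations.
for flags in range(256):
    candidates.append(IP(dst=TARGET) / TCP(dport=80, flags=flags))

# Other fields (e.g., TCP source port): explore selectively,
# with a few valid and unusual values.
for sport in (0, 1, 80, 1024, 65535):
    candidates.append(IP(dst=TARGET) / TCP(sport=sport, dport=80, flags="S"))

print(f"{len(candidates)} candidate queries")
```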

  13. FiG: Overview of Approach (Figure: the learning phase expanded into data collection, a training phase that learns potential fingerprints, and a testing phase that tests the accuracy of those fingerprints)

  14. Data Collection 1. Send candidate queries to hosts 2. Collect responses from hosts 3. Split into training & testing data (Figure: candidate queries and responses flow from data collection into training data and testing data)
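
A minimal sketch of this step in Python; the host-level split and the 70/30 ratio are my assumptions, not details given on the slide.

```python
# Sketch: collect responses and split hosts into training/testing sets.
# `send` is a placeholder for whatever actually transmits a query
# and captures the response.
import random

def collect(hosts, queries, send):
    """Map each host to its list of (query, response) pairs."""
    return {host: [(q, send(host, q)) for q in queries] for host in hosts}

def split_hosts(hosts, train_frac=0.7, seed=0):
    """Split by host (not by response) so no host appears in both sets."""
    hosts = list(hosts)
    random.Random(seed).shuffle(hosts)
    cut = int(len(hosts) * train_frac)
    return hosts[:cut], hosts[cut:]
```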

  15. Training Phase Goal: learn potential fingerprints from the data. Intuition: different implementations differ in the bytes of their responses. Learn which bytes of responses distinguish between implementations!

  16. What We're Learning Outline: • Features • Classification functions • Combining into fingerprints (Figure: from the training data, 1. extract features from the <queries, responses> of each implementation (Solaris, Linux, Windows), then 2. combine features to distinguish implementations)

  17. Features • Analyze only the bytes of the response • Use both value & position of individual bytes in the response • Capture this idea with position-substrings (Figure: a response byte sequence with some example position-substrings highlighted)
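
To make the feature concrete, here is a small sketch in Python; representing a position-substring as a (start offset, byte string) pair is my reading of the slide, not necessarily FiG's exact encoding.

```python
# Sketch: a position-substring is a byte string tied to a fixed offset
# in the response; it matches when those exact bytes appear there.
def matches(response: bytes, start: int, substring: bytes) -> bool:
    """True if `substring` occurs at offset `start` in `response`."""
    return response[start:start + len(substring)] == substring

# Example: do positions 4-5 of this response hold 0x0000?
resp = bytes.fromhex("aabbccdd0000eeff")
print(matches(resp, 4, b"\x00\x00"))  # True
```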

  18. Classification Functions Analyze each query & each implementation separately, e.g. for query q and the Linux implementation: the classification function takes the position-substrings of the response to query q and outputs YES (comes from Linux) or NO (does not come from Linux). Two classes of functions: • Conjunctions • Decision lists

  19. Conjunctions • Capture identical behaviour across all hosts • Require position-substrings distinctive to Linux to appear in responses from ALL Linux hosts if (response[4-5] == 0x0000 && response[34-35] == 0x16d0) then Linux else NotLinux (Figure: a Linux response with 0x0000 at positions 4-5 and 0x16d0 at positions 34-35, next to a NotLinux response with 0x0004 at positions 4-5)
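
A minimal sketch of a conjunction fingerprint in Python, using the slide's Linux example; the learning step described in the comment is a simplified reconstruction, not FiG's exact algorithm.

```python
# Sketch: a conjunction fingerprint fires only if ALL of its
# position-substrings match. One simple way to learn it: keep the
# position-substrings present in every Linux training response and
# absent from all non-Linux responses.
def conjunction_matches(response: bytes, clauses) -> bool:
    return all(response[s:s + len(sub)] == sub for s, sub in clauses)

# The slide's example conjunction for Linux:
linux_clauses = [(4, b"\x00\x00"), (34, b"\x16\xd0")]

def classify_linux(response: bytes) -> str:
    return "Linux" if conjunction_matches(response, linux_clauses) else "NotLinux"
```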

  20. Decision Lists • Need more expressivity than conjunctions • Capture multiple types of behaviour within an implementation • Allow many sets of position-substrings, each distinctive to the implementation (e.g. Windows) if (response[34-35] == 0xffff) then Windows else if (response[34-35] == 0x40e8) then Windows else NotWindows (Figure: two Windows responses, one with 0xffff and one with 0x40e8 at positions 34-35)
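
The same example as runnable Python; the (offset, bytes, label) rule representation is an assumption consistent with the sketch above.

```python
# Sketch: a decision list tries its rules in order and returns the
# label of the first rule whose position-substring matches.
windows_rules = [
    (34, b"\xff\xff", "Windows"),
    (34, b"\x40\xe8", "Windows"),
]

def decision_list(response: bytes, rules=windows_rules, default="NotWindows") -> str:
    for start, sub, label in rules:
        if response[start:start + len(sub)] == sub:
            return label
    return default
```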

  21. What We're Learning Outline: • Features • Classification functions • Combining into fingerprints (Figure: the same diagram as slide 16, now highlighting step 2, combining features to distinguish implementations)

  22. Binary-fingerprints • Learning (so far) finds binary-fingerprints: conjunctions/decision lists of position-substrings • A binary-fingerprint for an implementation (e.g., Linux) is: • a single query + • a classification function, e.g. conjunction or decision list • output is a boolean, e.g. Linux or NotLinux? Windows or NotWindows? • A binary-fingerprint separates ONE implementation from the rest

  23. Multi-class Fingerprint • Combine binary-fingerprints for multiple implementations • A multi-class fingerprint is: • a single query + • classification functions, e.g. conjunctions, decision lists • output is an implementation, e.g. Linux, Windows, Solaris, or unknown? (Figure: the binary-fingerprints for query q, Linux or not? Windows or not? Solaris or not?, combine into a single multi-class fingerprint for query q)
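
A minimal sketch of one way to combine binary-fingerprints in Python; treating "exactly one classifier fires" as the decision rule is my assumption, and FiG's actual combination rule may differ.

```python
# Sketch: combine per-implementation binary fingerprints for the same
# query into one multi-class fingerprint. If exactly one binary
# classifier fires, return its label; otherwise report "unknown".
def multiclass(response: bytes, binary_fps) -> str:
    hits = [label for label, fires in binary_fps if fires(response)]
    return hits[0] if len(hits) == 1 else "unknown"

# Usage (with hypothetical boolean classifiers is_linux, is_windows,
# is_solaris learned in the training phase):
#   fps = [("Linux", is_linux), ("Windows", is_windows), ("Solaris", is_solaris)]
#   multiclass(response, fps)
```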

  24. Training Phase Summary • Analyze responses to all queries, one at a time • Use position-substrings of bytes in response • Generate binary-fingerprints & multi-class fingerprints • Send these to testing phase

  25. Testing Phase • Input: the binary & multi-class fingerprints from the training phase • Evaluate on the testing data: which fingerprints are accurate? • Accurate fingerprints are passed on to the fingerprinting tool
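
A minimal sketch of the evaluation in Python; the exact acceptance criterion (here, perfect accuracy on held-out hosts) is an assumption, not stated on the slide.

```python
# Sketch: test a candidate binary fingerprint on held-out hosts.
# `fingerprint` is a boolean classifier (fires or not); `test_pairs`
# holds (is_target, response) pairs, where is_target says whether the
# response truly came from the fingerprint's implementation.
def accuracy(fingerprint, test_pairs) -> float:
    correct = sum(1 for is_target, resp in test_pairs
                  if fingerprint(resp) == is_target)
    return correct / len(test_pairs)

# Keep only fingerprints that survive testing, e.g.:
#   accepted = [fp for fp in candidates if accuracy(fp, test_pairs) == 1.0]
```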

  26. Outline • Fingerprint Generation Problem • Overview of Approach • Automatic Fingerprint Generation • Query Exploration Phase • Learning Phase • Experimental Results • Experimental Setup & Data • Fingerprinting Results: Binary & Multi-class Fingerprints • Examples of New Fingerprints • Conclusion

  27. Experiment Setup & Data • OS fingerprint generation: • 3 OS: 77 Windows, 29 Linux, 22 Solaris hosts • 305 different queries • DNS fingerprint generation: • 5 DNS server implementations: 10 BIND8, 12 BIND9, 11 Windows Server 2003, 10 MyDNS, 11 TinyDNS hosts • 96 different queries

  28. Multi-class Fingerprints One-query fingerprint distinguishing ALL implementations simultaneously • OS: 66 queries with multi-class fingerprints • DNS: 19 queries with multi-class fingerprints • All these are decision lists! • No multi-class fingerprints with conjunctions found • Decision list has greater discriminatory power

  29. All Fingerprints: OS Binary-fingerprint: a one-query fingerprint distinguishing ONE implementation from the rest • Many more binary-fingerprints than multi-class fingerprints! • Both conjunctions & decision lists appear among the binary-fingerprints • Again, more fingerprints with the more expressive decision lists • Similar results for DNS

  30. Examples of New Fingerprints • Invalid value in data offset field: • Windows & Solaris hosts respond when value < 5 • Linux hosts do not respond • RST+ACK packets in responses: • Linux & Solaris hosts set TCP Ack # to 0 • Windows hosts set TCP Ack # to Ack # of query
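
The first of these fingerprints is easy to probe for; below is a minimal sketch with the Python scapy library (placeholder target host and port; sending raw packets requires root privileges).

```python
# Sketch: probe the data-offset fingerprint from the slide. A TCP
# header's data offset must be at least 5 (5 x 4 = 20 header bytes),
# so 4 is an invalid value. Per the slide, Windows & Solaris hosts
# respond to such a packet while Linux hosts stay silent.
from scapy.all import IP, TCP, sr1

probe = IP(dst="192.0.2.1") / TCP(dport=80, flags="S", dataofs=4)
reply = sr1(probe, timeout=2, verbose=False)
print("responded" if reply is not None else "no response")
```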

  31. Examples of New Fingerprints • Behaviour on ECN & CWR bits: • Linux & Windows hosts ignore ECN & CWR bits in queries • Solaris hosts sometimes do not ignore them • Behaviour of the QdCount field on invalid queries (DNS fingerprinting): • Some servers copy the field value, others don't

  32. Conclusion • Automatic fingerprint generation is possible • Use machine learning to identify fingerprints • Generate fingerprints automatically for 2 applications: • Distinguish OS • Distinguish implementations of DNS servers • Find multi-class fingerprints using decision lists • Discover new fingerprints for fingerprinting tools

  33. Thank You! Questions? shobha@cs.cmu.edu

  34. Binary-fingerprints: DNS One-query fingerprint distinguishing ONE implementation from the rest • Similar results for the DNS binary-fingerprints • More fingerprints with the more expressive decision lists • No binary-fingerprints with conjunctions for BIND8 & BIND9

  35. Related Work • Active fingerprinting: • Comer & Lin ’94: Probing to find differences in TCP • Padhye & Floyd ’01: compliance testing & protocol violations • Passive Fingerprinting • Paxson ’97: TCP implementation with traffic traces • Beverly ’04, Lippman et al ’03: classify OS • Franklin et al ’06: wireless device driver fingerprinting • Tools: • OS fingerprinting: Nmap, queso, Xprobe, Snacktime • Passive fingerprinting: p0f, siphon • Defeating OS fingerprinting: • Smart et al ’00: TCP Fingerprint scrubber • Tools: Morph, IPPersonality
