
Gnort: High Performance Intrusion Detection Using Graphics Processors

Learn how offloading pattern matching to GPUs can significantly boost intrusion detection system performance. Speed up throughput, leverage Aho-Corasick algorithm, and parallelize packet searching efficiently.

cgordon

Presentation Transcript


  1. Gnort: High Performance Intrusion Detection Using Graphics Processors • Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos, Sotiris Ioannidis • Institute of Computer Science, Foundation for Research and Technology Hellas

  2. General Idea • How to speed up the processing throughput of intrusion detection systems by offloading the pattern matching operations to the GPU. Giorgos Vasiliadis ICS-FORTH

  3. Introduction • The problem • Network Intrusion Detection Systems (NIDS) rely on string matching to detect and prevent well-known attacks • String matching accounts for up to 75% of the total CPU processing • String matching algorithms • Aho-Corasick • Specialized hardware devices (NP, FPGAs, ASICs) • Complex to modify and program • Poor flexibility • Graphics cards • Easy to program • Powerful and ubiquitous • Researchers have begun exploring ways to tap their power for non-graphics applications

  4. Why use the GPU? • The GPU is specialized for compute-intensive, highly parallel computation

  5. NVIDIA GeForce SIMD Architecture • Many multiprocessors • Each multiprocessor contains many stream processors • Memory model (access latency) • Shared on-chip memory: 1 cycle • Constant memory: 400-600 cycles; 1 cycle if cached • Texture memory: 400-600 cycles; 1 cycle if cached • Global device memory: 400-600 cycles • The GPU can be used as a general-purpose processor, capable of executing many threads in parallel

  6. The Aho-Corasick Algorithm • Used in most modern NIDSes • Scans for multiple patterns simultaneously • Preprocesses all patterns to build a state machine • The state machine scans the input for all patterns in a single pass, in time linear in the input length • Complexity is independent of the number of patterns • Example: P = {he, she, his, hers}
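
The slide's example set P = {he, she, his, hers} can be run through a small Aho-Corasick matcher. The following is a minimal Python sketch for illustration only, not the Snort or Gnort implementation; the function names `build_automaton` and `search` are made up for this example.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]        # goto[state][char] -> next state
    fail = [0]         # failure link of each state
    out = [set()]      # patterns recognized on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]   # inherit matches that end here
    return goto, fail, out

def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    s = 0
    matches = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in sorted(out[s]):
            matches.append((i - len(p) + 1, p))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
# [(2, 'he'), (1, 'she'), (2, 'hers')]
```

Each input character is consumed once, and the failure links only ever move to shallower states, which is why the scan cost does not grow with the number of patterns.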

  7. Mapping Aho-Corasick on GPU • How to represent the state machine? • Snort represents each state as an array of pointers • It is difficult to map them onto GPU memory • Transform to a 2D array • Can easily bind to texture memory • Texture fetches are cached • Aho-Corasick exhibits strong locality of reference • Random-access memory reads • Using texture memory improves GPU execution time by about 19%
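
One way to picture the 2D-array representation is a dense table with one row per state and one column per byte value, so that scanning becomes a single indexed lookup per input byte. The sketch below builds such a table in Python under that assumption; the exact layout Gnort binds to texture memory is not given on the slide, and `build_table` and `scan` are hypothetical names.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value

def build_table(patterns):
    """Return (table, final): table[state][byte] -> next state."""
    table = [[-1] * ALPHABET]   # -1 marks "no trie edge yet"
    final = [False]             # True if some pattern ends in this state
    for p in patterns:
        s = 0
        for b in p:             # iterating bytes yields ints 0..255
            if table[s][b] == -1:
                table.append([-1] * ALPHABET)
                final.append(False)
                table[s][b] = len(table) - 1
            s = table[s][b]
        final[s] = True
    fail = [0] * len(table)
    queue = deque()
    for b in range(ALPHABET):   # complete the root row first
        if table[0][b] == -1:
            table[0][b] = 0
        else:
            queue.append(table[0][b])
    while queue:                # breadth-first: shallower rows are complete
        s = queue.popleft()
        for b in range(ALPHABET):
            t = table[s][b]
            if t == -1:
                table[s][b] = table[fail[s]][b]   # fold failure link in
            else:
                fail[t] = table[fail[s]][b]
                final[t] = final[t] or final[fail[t]]
                queue.append(t)
    return table, final

def scan(data, table, final):
    """Return the end offsets at which at least one pattern matches."""
    state = 0
    hits = []
    for i, b in enumerate(data):
        state = table[state][b]   # one table read per input byte
        if final[state]:
            hits.append(i)
    return hits

table, final = build_table([b"he", b"she", b"his", b"hers"])
print(len(table), scan(b"ushers", table, final))
# 10 [3, 5]
```

Because the failure transitions are folded into the table, the inner loop has no pointer chasing and no branching on missing edges, which is what makes the structure a natural fit for a read-only, cached memory region.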

  8. Parallelizing Packet Searching (1/2) • Assigning a Single Packet to each Multiprocessor • Each packet is copied to the shared memory of the Multiprocessor • Stream Processors search different parts of the packet concurrently • Overlapping computation • Matching patterns may span consecutive chunks of the packet • Same amount of work per Stream Processor • Stream Processors will be synchronized
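
The overlap between chunks can be sketched as follows. This Python example assumes each worker scans its own chunk plus the next `max_pattern_length - 1` bytes, which is enough to catch any pattern that starts in the chunk and spills into the next one. The workers run in a plain loop here rather than in parallel, and `find_all` is a naive stand-in matcher, not Aho-Corasick; all names are hypothetical.

```python
def find_all(data, patterns, base=0):
    """Naive stand-in matcher: set of (absolute_start, pattern) hits."""
    hits = set()
    for p in patterns:
        start = data.find(p)
        while start != -1:
            hits.add((base + start, p))
            start = data.find(p, start + 1)
    return hits

def chunked_search(packet, patterns, n_workers):
    """Split the packet into n_workers chunks that overlap at the seams."""
    overlap = max(len(p) for p in patterns) - 1
    chunk = -(-len(packet) // n_workers)      # ceiling division
    hits = set()
    for w in range(n_workers):                # each iteration = one worker
        lo = w * chunk
        hi = min(len(packet), lo + chunk + overlap)
        hits |= find_all(packet[lo:hi], patterns, base=lo)
    return hits

patterns = [b"he", b"she", b"his", b"hers"]
packet = b"ushers and his sheep"
# The chunked result agrees with a single pass over the whole packet.
print(chunked_search(packet, patterns, 4) == find_all(packet, patterns))
# True
```

Matches that fall entirely inside an overlap region are found by two workers; collecting hits in a set removes those duplicates.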

  9. Parallelizing Packet Searching (2/2) • Assigning a Single Packet to each Stream Processor • Each packet is processed by a different Stream Processor • No overlapping computation • Different amount of work per Stream Processor • Stream processors of the same Multiprocessor will have to wait until all have finished
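
The cost of uneven work can be estimated with a little arithmetic. The sketch below assumes, purely for illustration, that scan time is proportional to packet length and that a group of stream processors finishes only when its longest packet is done; `group_utilization` is a hypothetical helper and the packet sizes are made up.

```python
def group_utilization(packet_lengths, group_size):
    """Fraction of useful work per group when all wait for the slowest."""
    results = []
    for i in range(0, len(packet_lengths), group_size):
        group = packet_lengths[i:i + group_size]
        busy = sum(group)                 # bytes actually scanned
        wall = max(group) * len(group)    # every lane waits for the longest
        results.append(busy / wall)
    return results

print(group_utilization([100, 100, 100, 100], 4))   # equal-sized packets
# [1.0]
print(group_utilization([1500, 60, 60, 60], 4))     # one large, three small
# [0.28]
```

Under these assumptions a single full-size packet mixed with small ones leaves most lanes idle, which is the trade-off against the overlapping work of the one-packet-per-multiprocessor scheme.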

  10. Software Mapping • Packets are transferred to the GPU in batches • Performs much better than transferring each packet separately • Packets are stored in a buffer that is copied to the GPU when it gets full • Use page-locked memory to store the packets • Higher transfer throughput from host to device • Copies are performed using DMA, without occupying the CPU • CPU and GPU execution can overlap
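
The batching step can be sketched as a fixed-size buffer that is handed off in one piece when the next packet would not fit. This Python example only models the buffering logic: a `bytearray` stands in for the page-locked host buffer, and the `on_full` callback stands in for the host-to-device copy. The class name and callback signature are assumptions for illustration.

```python
class PacketBatcher:
    """Collect packet payloads and hand them off one batch at a time."""

    def __init__(self, capacity, on_full):
        self.buf = bytearray(capacity)   # stand-in for page-locked memory
        self.used = 0
        self.offsets = []                # (offset, length) of each packet
        self.on_full = on_full           # stand-in for the copy to the GPU

    def add(self, payload):
        if len(payload) > len(self.buf):
            raise ValueError("packet larger than batch buffer")
        if self.used + len(payload) > len(self.buf):
            self.flush()                 # no room left: ship current batch
        self.buf[self.used:self.used + len(payload)] = payload
        self.offsets.append((self.used, len(payload)))
        self.used += len(payload)

    def flush(self):
        if self.offsets:
            self.on_full(bytes(self.buf[:self.used]), list(self.offsets))
        self.used = 0
        self.offsets.clear()

batches = []
batcher = PacketBatcher(10, lambda data, offs: batches.append((data, offs)))
for payload in (b"aaaa", b"bbbb", b"cccc"):
    batcher.add(payload)
batcher.flush()                          # ship the partially filled batch
print(batches)
# [(b'aaaabbbb', [(0, 4), (4, 4)]), (b'cccc', [(0, 4)])]
```

Keeping the per-packet offsets alongside the payload bytes lets the receiving side locate each packet inside the batch without any further copies.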

  11. Evaluation (1/2) • Scalability as a function of the number of patterns • We ran Snort using randomly generated patterns • All patterns are matched against every packet • The payload trace contained 800-byte UDP packets with random payloads • Throughput remains constant as the number of patterns increases • 2.4x faster than the CPU

  12. Evaluation (2/2) • Throughput as a function of packet size • Ran Snort using 1000 random patterns • All patterns are matched against every packet • 2.3 Gbit/s for full packets • 3.2x faster than the CPU • The two GPU implementations show no significant difference in performance

  13. Evaluation with real input and rules • Experimental setup • Two PCs connected via a 1 Gbit/s Ethernet switch • To compare directly with prior work [Jacob et al.], we re-implemented the Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms on the GPU

  14. Evaluation with real input and rules • Snort loaded about 8000 patterns • Preprocessors and PCRE were disabled • Original Snort (AC) cannot process all packets at rates higher than 300 Mbit/s • GPU-assisted Snort (AC1, AC2) begins to lose packets at 600 Mbit/s • 200% improvement (2x the rate original Snort sustains) • The KMP and BM algorithms from [Jacob et al.] perform worse in all cases

  15. Conclusion • Graphics cards can be used effectively to speed up Network Intrusion Detection Systems • Low cost • Easy to program • Future work includes • Transferring packets directly from the NIC to the GPU • Utilizing multiple GPUs on multi-slot motherboards

  16. Thank you • Any questions? • Giorgos Vasiliadis, ICS-FORTH, gvasil@ics.forth.gr
