Gnort: High Performance Intrusion Detection Using Graphics Processors

Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos, Sotiris Ioannidis
Institute of Computer Science, Foundation for Research and Technology Hellas

General Idea
• How to speed up the processing throughput of intrusion detection systems by offloading the pattern matching operations to the GPU.
Giorgos Vasiliadis ICS-FORTH

Introduction
• The problem
  • Network Intrusion Detection Systems (NIDS) rely on string matching to detect and prevent well-known attacks
  • String matching accounts for up to 75% of the total CPU processing
• String matching algorithms
  • Aho-Corasick
• Specialized hardware devices (NP, FPGAs, ASICs)
  • Complex to modify and program
  • Poor flexibility
• Graphics cards
  • Easy to program
  • Powerful and ubiquitous
  • Researchers have begun exploring ways to tap their power for non-graphics applications

Why use the GPU?
• The GPU is specialized for compute-intensive, highly parallel computation

NVIDIA GeForce SIMD Architecture
• Many Multiprocessors
  • Each multiprocessor contains many Stream Processors
• Memory model
  • Shared On-Chip Memory: 1 cycle
  • Constant Memory: 400-600 cycles; 1 cycle if cached
  • Texture Memory: 400-600 cycles; 1 cycle if cached
  • Global Device Memory: 400-600 cycles
• The GPU can be used as a general purpose processor, capable of executing many threads in parallel

The Aho-Corasick Algorithm
• Used in most modern NIDSes
• Scans for multiple patterns simultaneously
  • All patterns are preprocessed to build a state machine
  • The state machine scans the input for all patterns at once, in time linear in the input length
  • Scanning complexity is independent of the number of patterns
• Example: P = {he, she, his, hers}
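The preprocessing and scanning steps above can be sketched in a few lines of Python. This is a minimal textbook version of Aho-Corasick, not Snort's or Gnort's implementation; the function names `build_automaton` and `search` are chosen here for illustration only.

```python
from collections import deque

def build_automaton(patterns):
    """Build the goto, failure, and output tables for a set of patterns."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: failure links are computed level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", automaton)))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The input is read exactly once, whatever the size of the pattern set, which is why the scanning cost does not grow with the number of patterns.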

Mapping Aho-Corasick on GPU
• How to represent the state machine?
  • Snort represents each state as an array of pointers
  • Pointers are difficult to map to GPU memory
  • Transform the state machine to a 2D array
  • The 2D array can easily be bound to Texture Memory
• Texture fetches are cached
  • Aho-Corasick exhibits strong locality of reference
  • Random access memory read
• Using Texture Memory improves GPU execution time by about 19%
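To make the 2D-array representation concrete, here is a hedged Python sketch of a pointer-free transition table with one row per state and one column per byte value. It illustrates the idea only; the actual Gnort table layout is not given on the slide, and the names `build_table` and `scan` are hypothetical.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value

def build_table(patterns):
    """Return (table, accepting): table[state][byte] -> next state, and
    accepting[state] -> list of patterns that end in that state."""
    table = [[0] * ALPHABET]
    accepting = [[]]
    # Phase 1: insert the patterns as a trie (0 marks a missing edge for now).
    for pat in patterns:
        state = 0
        for byte in pat:
            if table[state][byte] == 0:
                table.append([0] * ALPHABET)
                accepting.append([])
                table[state][byte] = len(table) - 1
            state = table[state][byte]
        accepting[state].append(pat)
    # Phase 2: breadth-first pass that replaces every missing edge with the
    # transition of the failure state, so scanning needs no failure links.
    fail = [0] * len(table)
    queue = deque(s for s in table[0] if s)
    while queue:
        state = queue.popleft()
        accepting[state] = accepting[state] + accepting[fail[state]]
        for byte in range(ALPHABET):
            nxt = table[state][byte]
            if nxt:
                fail[nxt] = table[fail[state]][byte]
                queue.append(nxt)
            else:
                table[state][byte] = table[fail[state]][byte]
    return table, accepting

def scan(payload, table, accepting):
    """One table lookup per input byte; returns (start_index, pattern) pairs."""
    state, hits = 0, []
    for pos, byte in enumerate(payload):
        state = table[state][byte]
        for pat in accepting[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits

table, accepting = build_table([b"he", b"she", b"his", b"hers"])
print(sorted(scan(b"ushers", table, accepting)))
# [(1, b'she'), (2, b'he'), (2, b'hers')]
```

Because every row has the same width, the table can be stored as one flat block indexed by `state * 256 + byte`, which is the kind of layout that maps naturally onto a 2D texture.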

Parallelizing Packet Searching (1/2)
• Assigning a Single Packet to each Multiprocessor
  • Each packet is copied to the shared memory of the Multiprocessor
  • Stream Processors search different parts of the packet concurrently
• Overlapping computation
  • Matching patterns may span consecutive chunks of the packet
• Same amount of work per Stream Processor
  • Stream Processors will be synchronized
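The need for overlapping computation can be illustrated with a small Python sketch. It assumes each chunk is extended by the longest pattern length minus one byte, and it uses plain `bytes.find` as a stand-in for the Aho-Corasick scan; both choices, and the function names, are assumptions made for this example rather than details taken from the slides.

```python
def split_with_overlap(payload, num_chunks, max_pattern_len):
    """Return (start, chunk) pairs. Each chunk is extended by
    max_pattern_len - 1 bytes so a match spanning a boundary is not lost."""
    size = max(1, -(-len(payload) // num_chunks))  # ceiling division
    overlap = max_pattern_len - 1
    return [(start, payload[start:start + size + overlap])
            for start in range(0, len(payload), size)]

def search_chunked(payload, patterns, num_chunks):
    """Search every chunk independently, as concurrent workers would."""
    max_len = max(len(p) for p in patterns)
    hits = set()  # a set drops duplicates reported from overlapping regions
    for start, chunk in split_with_overlap(payload, num_chunks, max_len):
        for pat in patterns:
            pos = chunk.find(pat)
            while pos != -1:
                hits.add((start + pos, pat))
                pos = chunk.find(pat, pos + 1)
    return sorted(hits)

# "attack" straddles the 3-byte chunk boundaries and is still found.
print(search_chunked(b"xxattackxx", [b"attack"], num_chunks=4))
# [(2, b'attack')]
```

The extra bytes per chunk are the price of splitting: the same part of the packet is scanned by more than one worker.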

Parallelizing Packet Searching (2/2)
• Assigning a Single Packet to each Stream Processor
  • Each packet is processed by a different Stream Processor
• No overlapping computation
• Different amount of work per Stream Processor
  • Stream Processors of the same Multiprocessor will have to wait until all have finished
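The cost of waiting for the slowest Stream Processor can be estimated with a rough model. The sketch below assumes that scanning one byte is one unit of work and that a group of workers is busy until its longest packet is done; `lockstep_cost` and the packet sizes are illustrative, not measurements.

```python
def lockstep_cost(packet_lengths, group_size):
    """Return (useful, issued) work units when each group of `group_size`
    workers has to wait for the longest packet in the group."""
    useful = sum(packet_lengths)
    issued = 0
    for i in range(0, len(packet_lengths), group_size):
        group = packet_lengths[i:i + group_size]
        issued += max(group) * group_size  # every worker waits for the slowest
    return useful, issued

# One full-size packet next to three small ones: most slots sit idle.
useful, issued = lockstep_cost([1500, 64, 64, 64], group_size=4)
print(useful, issued)
# 1692 6000
```

With equal-sized packets the two numbers coincide, which matches the slide's point that the imbalance comes from differing packet lengths.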

Software Mapping
• Packets are transferred to the GPU in batches
  • Performs much better than transferring each packet separately
  • Packets are stored in a buffer that is copied to the GPU when it gets full
• Use page-locked memory to store the packets
  • Higher transfer throughput from host to device
  • Copies are performed using DMA, without occupying the CPU
  • CPU and GPU execution can overlap
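The batching scheme can be sketched as a small buffer class. This is a host-side illustration only: the `transfer` callback is a hypothetical stand-in for the host-to-device copy, page-locked memory is not modeled, and the buffer is flushed when the next packet would not fit, which is one possible reading of "when it gets full".

```python
class PacketBatch:
    """Collects packets in one contiguous buffer and hands the whole
    buffer to `transfer` when the next packet would not fit."""

    def __init__(self, capacity, transfer):
        self.capacity = capacity
        self.transfer = transfer   # hypothetical stand-in for the device copy
        self.buffer = bytearray()
        self.offsets = []          # (start, length) of each packet in the buffer

    def add(self, packet):
        if len(packet) > self.capacity:
            raise ValueError("packet larger than batch buffer")
        if len(self.buffer) + len(packet) > self.capacity:
            self.flush()
        self.offsets.append((len(self.buffer), len(packet)))
        self.buffer += packet

    def flush(self):
        """Send whatever is buffered as a single transfer, then reset."""
        if self.offsets:
            self.transfer(bytes(self.buffer), list(self.offsets))
            self.buffer.clear()
            self.offsets.clear()

transfers = []
batch = PacketBatch(10, lambda buf, offs: transfers.append((buf, offs)))
for pkt in (b"aaaa", b"bbbb", b"cccc"):
    batch.add(pkt)
batch.flush()
print(len(transfers))
# 2
```

Keeping the per-packet offsets alongside the buffer lets the receiving side locate each packet without any per-packet transfer.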

Evaluation (1/2)
• Scalability as a function of the number of patterns
  • We ran Snort using randomly generated patterns
  • All patterns are matched against every packet
  • The payload trace contained 800-byte UDP packets with random payload
• Throughput remains constant as the number of patterns increases
• 2.4x faster than the CPU

Evaluation (2/2)
• Throughput as a function of the packet size
  • We ran Snort using 1000 random patterns
  • All patterns are matched against every packet
• 2.3 Gbit/s for full packets
• 3.2x faster than the CPU
• The two GPU implementations show no significant difference in performance

Evaluation with real input and rules
• Experimental setup
  • Two PCs connected via a 1 Gbit/s Ethernet switch
  • To directly compare with prior work [Jacob et al], we re-implemented the Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms on the GPU.

Evaluation with real input and rules
• Snort loaded about 8000 patterns
  • Preprocessors and PCRE were disabled
• Original Snort (AC) cannot process all packets at rates higher than 300 Mbit/s
• GPU-assisted Snort (AC1, AC2) begins to lose packets at 600 Mbit/s
  • A 2x improvement
• The KMP and BM algorithms used in [Jacob et al] perform worse in all cases

Conclusion
• Graphics cards can be used effectively to speed up Network Intrusion Detection Systems
  • Low cost
  • Easy programming
• Future work includes
  • Transferring the packets directly from the NIC to the GPU
  • Utilizing multiple GPUs on multi-slot motherboards

Thank you
Any questions?
gvasil@ics.forth.gr
