FPGA Based String Matching for Network Processing Applications

Janardhan Singaraju, John A. Chandy • ENGG*3050 RCS, Winter 2014 • March 24, 2014 • Presented by: Justin Riseborough, Albert Tirtariyadi

Content • Introduction • String Lookup Cache • Architectures • System Interaction • Systems comparison • Network Intrusion Detection • Architectures • System Interaction • Implementations • Critique

Keywords • Network processing • String matching • Content Addressable Memory (CAM) & Cache • Bottlenecks • Fixed-Size/Non-Fixed-Size keys • Cascading, propagating • Parallelism

Introduction • String matching is used in search engines and network intrusion detection • Network processing applications require frequent string matching for specific keywords • As networks get faster, it becomes more difficult for general-purpose processors (GPPs) to keep up • Bottlenecks are found in memory and in slow algorithms and implementation methods

Current Implementations • Software algorithms • Rabin-Karp: compares hashes of inputs instead of direct character matching • Knuth-Morris-Pratt: character-by-character matching; skips non-matching positions • Boyer-Moore: uses pre-computed functions to determine shifting distance • Hardware implementations • Finite automata methods: translate finite automata graphs to FPGA circuitry • CAMs • Caches and lookup tables • Cellular automata • Finite state machines
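
As an illustration of the hash-based software approach, here is a minimal Rabin-Karp sketch in Python. It is not taken from the article; the function name, the base of 256 and the modulus are assumptions chosen for the example.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Return every index at which `pattern` occurs in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare hashes first; confirm directly to rule out hash collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

For example, `rabin_karp("AFOOBARCD", "FOO")` returns `[1]`.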

Section I String Lookup Cache

String Lookup Cache • Hardware implementation based on CAMs, cellular automata and caching • Caches retain frequently used values, reducing the need to constantly look up address values • Compatible with parallel processing, prefix sharing and pattern partitioning • Very high throughput with low area overhead • A drawback of CAMs and hardware caches is their reliance on fixed-size keys • Implementations for non-fixed-size keys require additional overhead

System Architecture

Content Addressable Memory • Hardware implementation of 2D [associative] arrays/ADT • In VLSI, the cells are transistors • In an FPGA, storage cells are registers, comparators are XOR gates

CAM as Character Match Array (CMA) • Takes characters from the network processor on successive clock cycles • Each column corresponds to one character of a keyword • The input character is applied simultaneously to all n columns • A column's match signal goes high if all input bits match • A storage cell is used to indicate the end of a keyword
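
The column behaviour described above can be modelled in software. The following Python sketch is only an illustration: `build_cma` and `column_matches` are hypothetical names, and the XOR test stands in for the per-bit comparators of the FPGA version.

```python
def build_cma(keywords):
    """Lay keywords out one character per column.

    Each column is (stored_char, is_end_of_keyword); the flag plays the
    role of the storage cell that marks the end of a keyword.
    """
    columns = []
    for word in keywords:
        for i, ch in enumerate(word):
            columns.append((ch, i == len(word) - 1))
    return columns


def column_matches(columns, input_char):
    """Apply one input character to every column at once.

    A column matches when the XOR of the stored and input character codes
    is zero, i.e. when all bits agree.
    """
    return [(ord(stored) ^ ord(input_char)) == 0 for stored, _ in columns]
```

With `columns = build_cma(["FOO", "BAR"])`, the call `column_matches(columns, "O")` raises the match signal on columns 1 and 2 only.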

Processor Element (PE) Array • An array of finite state machines that carries out the approximate match algorithm • May contain multiple keywords from the CAM • Takes the match signals from the CAM and sets a PE flag, which is forwarded to the subsequent PE • Evaluates the entire input in time linear in the size of the input stream
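
A rough software model of the flag forwarding is sketched below. It is a simplification under stated assumptions: it performs exact matching only (not the approximate match algorithm of the article), it uses one PE per keyword character, and `run_pe_array` is a hypothetical name.

```python
def run_pe_array(keywords, stream):
    """Stream characters through a chain of PEs, one step per character.

    Returns (stream_position, pe_index) for every keyword that completes.
    """
    chars, starts, ends = [], [], []
    for word in keywords:
        for i, ch in enumerate(word):
            chars.append(ch)
            starts.append(i == 0)             # first character of a keyword
            ends.append(i == len(word) - 1)   # last character of a keyword
    flags = [False] * len(chars)
    hits = []
    for pos, c in enumerate(stream):
        match = [c == stored for stored in chars]  # match signals from the CAM
        # A PE sets its flag when its column matches and either it starts a
        # keyword or the previous PE's flag was set on the previous step.
        flags = [match[i] and (starts[i] or flags[i - 1])
                 for i in range(len(chars))]
        hits.extend((pos, i) for i in range(len(chars)) if flags[i] and ends[i])
    return hits
```

`run_pe_array(["FOO", "BAR"], "AFOOBARCD")` returns `[(3, 2), (6, 5)]`: each keyword is reported at the PE holding its last character, after a single pass over the stream.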

CMA and PE Interaction

Map Table and Outputs • The map table takes the PE# and outputs the address of the value, or an indirect pointer to the value object • The map table has as many slots as there are PEs • Long keywords can leave holes (unused slots) in the map table
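
The sketch below illustrates why long keywords leave holes. It assumes, purely for illustration, that a keyword occupies one PE per character and that only the PE holding the last character has a map entry; `build_map_table` and the address values are hypothetical.

```python
def build_map_table(keywords, addresses):
    """Build a table with one slot per PE.

    Only the PE that holds a keyword's last character maps to an address;
    the other slots stay empty (None), which are the holes.
    """
    table = []
    for word, address in zip(keywords, addresses):
        table.extend([None] * (len(word) - 1))  # PEs that never report a match
        table.append(address)
    return table


table = build_map_table(["FOO", "BAR"], [0x1000, 0x2000])
# A match reported by PE 2 resolves to table[2], i.e. 0x1000.
```

Under these assumptions, four of the six slots are unused, and the fraction of holes grows with keyword length.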

System Interaction

Implementations Comparison

Section II Network Intrusion Detection

Network Intrusion Detection • The process of identifying and analyzing packets that may contain threats to the organization’s network • A time-consuming process whose cost grows quickly as the rule set (the set of signatures) grows • String matching is the most computationally intensive part of intrusion detection • Every incoming packet is compared against several pre-defined signatures

Problems in the CAM Architecture • CAM-based designs cannot easily handle regular expressions • NIDS signatures are not of a fixed size • Fixed-size, aligned comparison misses unaligned matches (e.g., the CAM contains FOO and BAR and the input stream is AFOOBARCD. With a 3-character key size, the comparisons are made against AFO, OBA and RCD; none of these match, so both keywords slip right through the detection system) • CAM arrays are very large in area
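
The alignment problem in the FOO/BAR example can be reproduced in a few lines of Python. This is an illustrative sketch; `aligned_lookup` and `sliding_lookup` are hypothetical names, not components of the article's design.

```python
def aligned_lookup(stream, keys, size):
    """Compare only the chunks that start at multiples of `size`."""
    chunks = [stream[i:i + size] for i in range(0, len(stream), size)]
    return chunks, [c for c in chunks if c in keys]


def sliding_lookup(stream, keys, size):
    """Compare a window starting at every character offset."""
    return [(i, stream[i:i + size])
            for i in range(len(stream) - size + 1)
            if stream[i:i + size] in keys]
```

`aligned_lookup("AFOOBARCD", {"FOO", "BAR"}, 3)` produces the chunks `AFO`, `OBA`, `RCD` and no matches, while `sliding_lookup` with the same arguments finds `FOO` at offset 1 and `BAR` at offset 4.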

Proposed Solution • Use discrete comparators instead of CAMs • This sacrifices the ability to update signatures dynamically; a fair tradeoff, as signatures change relatively infrequently • Use p rows of comparators for parallelism, matching several characters in one clock cycle • Remove the aligned keyword approach, as incoming streams may not be aligned to a fixed size boundary
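
A software model of the p-row idea might look like the sketch below. It is an assumption-laden simplification: signatures are fixed at construction time, each row is modelled as one sequential step inside a simulated clock cycle (in hardware the rows work in the same cycle), and `match_parallel` is a hypothetical name.

```python
def match_parallel(signatures, stream, p):
    """Consume `p` characters per simulated clock cycle, at any alignment.

    Returns (hits, cycles), where hits are (stream_position, signature).
    """
    chars, starts, ends, owner = [], [], [], []
    for sig in signatures:
        for i, ch in enumerate(sig):
            chars.append(ch)
            starts.append(i == 0)
            ends.append(i == len(sig) - 1)
            owner.append(sig)
    flags = [False] * len(chars)
    hits, cycles = [], 0
    for base in range(0, len(stream), p):
        cycles += 1
        # Each of the p comparator rows handles one character of this cycle.
        for row, c in enumerate(stream[base:base + p]):
            flags = [c == chars[i] and (starts[i] or flags[i - 1])
                     for i in range(len(chars))]
            hits.extend((base + row, owner[i])
                        for i in range(len(chars)) if flags[i] and ends[i])
    return hits, cycles
```

`match_parallel(["FOO", "BAR"], "AFOOBARCD", 3)` returns `([(3, "FOO"), (6, "BAR")], 3)`: both unaligned signatures are found, in 3 cycles instead of the 9 needed with `p=1`.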

System Architecture

Processor Architecture

Processor Element Flow • Start at the beginning of the signature • Each PE's output depends on the previous PE's signal and the current PE's match • If both the previous signal and the current signal indicate a match, propagate the match signal toward the end of the signature • At the end of the signature, if the entire signature matched, assert the sig_match output

Signature Match Processor Example • Input string ‘144’ is processed over 2 clock cycles • ‘1’ is checked in the first cycle and sets off a match signal into the SMA • ‘4’ is checked in the second cycle and sets off a match signal into the SMA • The match signal for ‘1’ is still present from the previous clock cycle

Signature Match Processor Example • The ‘4’ is duplicated, so it simply propagates the first match signal to the second as a carry • Since this is the end of the signature, the output is a match due to the propagated match signals && sig_end

Address Output Logic • In order for the SMP to be useful, we also need to know which signatures caused the match • This is handled by the word match buffer, which maintains the position of the signature match • When the last character being processed has been reached, the match address output logic begins working on the buffer entries

Address Output Logic • A binary tree is used to encode the matching signatures • Decoding starts, and a signal is sent to the control circuitry stating that there are matches • A pointer then propagates up the tree, generating one bit of the final address at each level based on the matches • Binary trees are fast and efficient; processing takes ~M cycles, where M is the number of matches
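
The tree-based address generation can be approximated in software as follows. This sketch is not the article's circuit: it walks from the root down instead of propagating a pointer upward, it assumes a power-of-two number of signatures, and `encode_matches` is a hypothetical name. It does preserve the two properties named above: one address bit per tree level, and one pass per match.

```python
def encode_matches(match_bits):
    """Return the address of every set bit, using one tree walk per match."""
    n = len(match_bits)
    assert n and n & (n - 1) == 0, "sketch assumes a power-of-two size"
    bits = list(match_bits)
    addresses = []
    while any(bits):                 # loop body runs once per match (M times)
        lo, hi, addr = 0, n, 0
        while hi - lo > 1:           # one tree level, one address bit
            mid = (lo + hi) // 2
            if any(bits[lo:mid]):    # OR of the left subtree
                hi = mid
                addr = addr << 1         # emit address bit 0
            else:
                lo = mid
                addr = (addr << 1) | 1   # emit address bit 1
        addresses.append(addr)
        bits[lo] = False             # clear the reported match
    return addresses
```

For an 8-entry match buffer with bits 2 and 5 set, `encode_matches` returns `[2, 5]` after two walks of three levels each.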

FPGA Implementation • As parallelism increases, throughput increases, while clock frequency decreases due to the added complexity • As the number of characters increases, area increases, and both frequency and throughput decrease

Implementation Comparison

Critique • New terms and unknown works referred to • Difficult to follow in some areas due to inconsistencies and how the topic is presented • Lots of procedure / methodology on implementation • Very detailed works • Good examples to strengthen theoretical explanations • Implementation data given for comparison purposes

Questions?

References • All figures and information used in this presentation were pulled from the article • Janardhan Singaraju, John A. Chandy, FPGA Based String Matching for Network Processing Applications, ScienceDirect, Microprocessors and Microsystems, December 14, 2007
