
Pattern-Based DFA for Memory-Efficient Multiple Regular Expression Matching

Presenter: Junchen Jiang (Tsinghua University); Yang Xu (Polytechnic Institute of NYU); Tian Pan (Tsinghua University); Bin Liu (Tsinghua University). Email: livejc@gmail.com.


Presentation Transcript


  1. Pattern-Based DFA for Memory-Efficient Multiple Regular Expression Matching • Presenter: Junchen Jiang (Tsinghua University) • Yang Xu (Polytechnic Institute of NYU) • Tian Pan (Tsinghua University) • Bin Liu (Tsinghua University) • Email: livejc@gmail.com

  2. Outline • Background • Regular expression • DFA space explosion • Problem statement & Idea of pattern grouping • Pattern-Based DFA • Grouping algorithms • Results • Summary

  3. Background (cont.) • RegEx (pattern) matching is now widely used • Network Intrusion Detection Systems (SNORT) • L7-filter: protocol identification • Example: ^220[\x09-\x0d -~]*ftp • Common Technique • Deterministic Finite Automaton (DFA) • Challenges • High memory requirement • Low processing speed
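
As a small illustration of the example pattern on this slide, the sketch below runs it through Python's standard `re` module. The banner strings and the helper name `looks_like_ftp` are invented for this example. Note that `re` is a backtracking engine, not a DFA, so this only shows what the pattern accepts, not how a DFA-based matcher would process the input.

```python
import re

# The example pattern quoted on the slide.
FTP_PATTERN = re.compile(r"^220[\x09-\x0d -~]*ftp")

def looks_like_ftp(banner: str) -> bool:
    """Return True if the banner matches the slide's example pattern."""
    return FTP_PATTERN.search(banner) is not None

# Hypothetical server banners, invented for illustration.
print(looks_like_ftp("220 example ftp server ready"))  # True
print(looks_like_ftp("221 goodbye"))                   # False
```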

  4. Background (cont.) • Space problem: DFA state explosion • Exponential worst-case space complexity • Solution: pattern grouping • Example [Figure: one big DFA for patterns P1 to P5 held in slow memory; after partitioning the patterns into two groups, two smaller DFAs held in fast memories]
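
To make the exponential worst case concrete, here is a minimal sketch that counts DFA states for the textbook pattern `.*a.{k}` (an "a" followed by exactly k arbitrary characters) using subset construction. The pattern family, the two-letter alphabet, and the function name `subset_dfa_size` are assumptions chosen for illustration; none of them come from the slides.

```python
def subset_dfa_size(k, alphabet="ab"):
    """Count reachable DFA states for the NFA of `.*a.{k}`.

    NFA states: 0 loops on every symbol and moves to 1 on 'a';
    state i (1 <= i <= k) moves to i + 1 on any symbol; k + 1 accepts.
    """
    def step(subset, sym):
        nxt = {0}  # state 0 is always active because of the leading .*
        for s in subset:
            if s == 0 and sym == "a":
                nxt.add(1)
            elif 1 <= s <= k:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        cur = stack.pop()
        for sym in alphabet:
            nxt = step(cur, sym)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)

for k in range(4):
    print(k, subset_dfa_size(k))  # sizes 2, 4, 8, 16: doubling with each extra position
```

The DFA has to remember which of the last k + 1 input characters were "a", which is why the state count doubles each time k grows by one.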

  5. Outline • Background • Problem Statement & Idea • Pattern-Based DFA & Pattern-Based Structure • Grouping Algorithms • Results • Summary

  6. Problem Statement & Idea (cont.) • Minimize the number of groups (speed) while greatly reducing DFA size (space) • Regex Set A • For general-purpose processor architecture • All groups are stored in one shared memory and processed sequentially • For multi-parallel processor architecture • Each group is stored in its own memory and handled by its own parallel processor • Challenge • Quantify the influence of each pattern!

  7. Problem Statement & Idea (cont.) • Traditional approach: group patterns with little interaction together. • Patterns p and q interact iff the DFA built for p and q together is larger than the DFA of p and the DFA of q combined. • In our evaluation, only 23.6% of pattern pairs in L7-filter and only about 5% of pattern pairs have no interaction! • Interaction between patterns is not an accurate measure for grouping patterns! • Our contribution • Extend the DFA structure so that we can quantify the influence of each pattern on the final DFA. • Based on the new DFA structure, give more refined grouping algorithms
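
The interaction criterion on this slide can be restated compactly. The notation below is introduced here for convenience and is not from the slides: |DFA(S)| stands for the size (for example, the number of states) of the DFA built from the pattern set S.

```latex
p \text{ and } q \text{ interact}
\iff
\bigl|\mathrm{DFA}(\{p, q\})\bigr| > \bigl|\mathrm{DFA}(\{p\})\bigr| + \bigl|\mathrm{DFA}(\{q\})\bigr|
```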

  8. Problem Statement & Idea (cont.) • Why is the traditional DFA insufficient? • Observation: no information about individual patterns is preserved in the resulting DFA (renumbered or not) • Pattern-based DFA (P-DFA) • Objective: store information about each pattern in the states

  9. Outline • Background • Problem Statement & Idea • Pattern-Based DFA & Pattern-Based Structure • Grouping Algorithms • Results • Summary

  10. Pattern-Based DFA (P-DFA) (cont.) • Construction [Figure: two construction pipelines compared. Traditional DFA: P1, P2, P3 → NFA → DFA. P-DFA: P1, P2, P3 → NFA → DFA → P-DFA. The two results are marked as equivalent.]

  11. Pattern-Based DFA (P-DFA) (cont.) • Each state in a P-DFA contains several sub-states, each of which is derived from one RegEx pattern. • Example: state 0,3,6 (sub-state 0: P1, 3: P2, 6: P3) • Stored in the Pattern-Based Structure (PBS) [Figure: DFA of P1 (states 0, 1, 2), DFA of P2 (states 3, 4, 5), DFA of P3 (states 6, 7, 8), and the P-DFA of P1, P2, P3 with states 0,3,6; 1,3,7; 2,3,6; 1,3,8; 0,4,6; 1,4,6; 2,4,6; 0,5,6; 1,5,6; 1,4,7; 1,4,8. Transition labels are not recoverable from the transcript.]
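
A minimal sketch of the idea that each P-DFA state is a tuple of per-pattern sub-states. It assumes the P-DFA can be modelled as the reachable part of the product of the individual DFAs; the slides do not spell out the construction, so treat this as one plausible reading. The toy DFAs (searching for "ab" and for "xy"), the alphabet, and the helper names `complete` and `build_pdfa` are invented for illustration and are not the DFAs from the slide.

```python
from collections import deque

def complete(rows, alphabet, default):
    """Fill every missing transition of a hand-written DFA with `default`."""
    return {s: {c: row.get(c, default) for c in alphabet} for s, row in rows.items()}

def build_pdfa(dfas, alphabet):
    """dfas: list of (start_state, delta) with delta[state][symbol] -> state.

    Returns (start_tuple, transitions) over the reachable tuple states.
    """
    start = tuple(s for s, _ in dfas)
    trans = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in trans:
            continue
        trans[state] = {}
        for sym in alphabet:
            # Advance every per-pattern sub-state on the same input symbol.
            nxt = tuple(delta[sub][sym] for sub, (_, delta) in zip(state, dfas))
            trans[state][sym] = nxt
            if nxt not in trans:
                queue.append(nxt)
    return start, trans

ALPHABET = "abxy"
# Toy DFAs (hypothetical): unanchored search for "ab" and for "xy".
dfa_ab = (0, complete({0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}, ALPHABET, 0))
dfa_xy = (3, complete({3: {"x": 4}, 4: {"x": 4, "y": 5}, 5: {"x": 4}}, ALPHABET, 3))

start, trans = build_pdfa([dfa_ab, dfa_xy], ALPHABET)
print(start)          # (0, 3)
print(sorted(trans))  # [(0, 3), (0, 4), (0, 5), (1, 3), (2, 3)]
```

Because every state keeps one component per pattern, the contribution of a single pattern stays visible in the combined automaton.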

  12. Pattern-Based DFA (P-DFA) (cont.) • Adding a pattern to a P-DFA is trivial • Removing one pattern • Remove its sub-states + merge identical states • We can predict the size of the P-DFA when any pattern is removed. [Figure: removing P1 from the P-DFA of P1, P2, P3: remove all red numbers and merge identical states, which gives the P-DFA of P2, P3 with states 3,6; 3,7; 3,8; 4,6; 4,7; 4,8; 5,6.]
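
The removal step can be sketched on state labels alone. The list below copies the P-DFA state labels as they appear in the transcript of the previous slide's figure; the function name `remove_pattern` is hypothetical, and the sketch only counts states (it does not rebuild transitions).

```python
# State labels of the P-DFA of P1, P2, P3, as read from the slide transcript.
PDFA_STATES = [
    (0, 3, 6), (1, 3, 7), (2, 3, 6), (1, 3, 8),
    (0, 4, 6), (1, 4, 6), (2, 4, 6),
    (0, 5, 6), (1, 5, 6), (1, 4, 7), (1, 4, 8),
]

def remove_pattern(states, index):
    """Drop the sub-state at position `index` from every state, then merge duplicates."""
    return {s[:index] + s[index + 1:] for s in states}

without_p1 = remove_pattern(PDFA_STATES, 0)
print(len(PDFA_STATES), "->", len(without_p1))  # 11 -> 7
print(sorted(without_p1))
# [(3, 6), (3, 7), (3, 8), (4, 6), (4, 7), (4, 8), (5, 6)]
```

The seven remaining labels agree with the states listed for the P-DFA of P2, P3 in this slide's figure, which is what makes the size after removal predictable without rebuilding the automaton.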

  13. Outline • Background • Problem Statement & Idea • Pattern-Based DFA • Grouping Algorithms • Results • Summary

  14. Grouping Algorithms • General scheme of pattern grouping using P-DFA. • Core idea: build a P-DFA of all patterns first, then greedily remove the pattern whose removal shrinks the P-DFA the most. [Figure: greedy pattern grouping algorithm. Labels: RegEx Pattern #1 … #k, DFA, P-DFA, PBS, Software Operation (Combine, Delete), P-DFA #1 … #t, Hardware Implementation (Matching).]

  15. Grouping Algorithms • General processor architecture (Group1) • Generate the complete P-DFA • Repeat: split the currently largest group into two smaller groups • Until the total size of all groups is smaller than the given limit L. • Multi-parallel processor architecture (Group2) • For any group • If the size of its P-DFA is larger than the limit L, then • Extract a pattern from the group so that the size of its P-DFA moves closer to the limit L
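
A rough sketch of greedy grouping driven by P-DFA sizes. It is not the Group1 or Group2 algorithm from the slides: it follows only the stated core idea (repeatedly extract the pattern whose removal shrinks the P-DFA the most) and makes several assumptions: size is measured as the number of states, the P-DFA of any subset of patterns is obtained by projecting the full state tuples onto that subset, and the names `project` and `greedy_groups` plus the limit value 8 are invented for illustration. The state labels are the ones that appear in the transcript of slide 11.

```python
# State labels of the P-DFA of P1, P2, P3, as read from the slide transcript.
PDFA_STATES = [
    (0, 3, 6), (1, 3, 7), (2, 3, 6), (1, 3, 8),
    (0, 4, 6), (1, 4, 6), (2, 4, 6),
    (0, 5, 6), (1, 5, 6), (1, 4, 7), (1, 4, 8),
]

def project(states, keep):
    """Keep only the sub-states of the patterns in `keep`, merging duplicates."""
    return {tuple(s[i] for i in keep) for s in states}

def greedy_groups(states, num_patterns, limit):
    """Split pattern indices into groups whose projected P-DFA has at most `limit` states."""
    remaining = list(range(num_patterns))
    groups = []
    while remaining:
        group = list(remaining)
        # Extract patterns until this group's P-DFA fits under the limit.
        while len(group) > 1 and len(project(states, group)) > limit:
            victim = min(
                group,
                key=lambda p: len(project(states, [q for q in group if q != p])),
            )
            group.remove(victim)
        groups.append(group)
        # Extracted patterns are retried as the next group.
        remaining = [p for p in remaining if p not in group]
    return groups

print(greedy_groups(PDFA_STATES, 3, limit=8))  # [[0, 2], [1]]
```

With a limit of 8 states, the sketch pulls out the pattern at index 1 (P2), because dropping it leaves 5 states, fewer than dropping either of the other two.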

  16. Outline • Background • Problem Statement & Idea • Pattern-Based DFA • Grouping Algorithms • Results • Summary

  17. Experimental Result (cont.) • Evaluation database: 300 RegEx patterns randomly selected from Snort’s web pcre ruleset • General processor architecture

  18. Experimental Result (cont.) • Multi-parallel processor architecture

  19. Summary • RegEx pattern matching is challenging • Carefully grouping RegEx patterns eases memory inflation • We present P-DFA, a new method of constructing DFAs • Quantifies the influence of each pattern • Stores information about each pattern in the states • Experiments show that our approach reduces the number of groups by almost half compared with the traditional method.

  20. Questions? Email: livejc@gmail.com
