Packet Classification On Multiple Fields

Pankaj Gupta and Nick McKeown, Computer Systems Laboratory, Stanford University, {pankaj,nickm}@stanford.edu

Why classify packets? To determine which flow they belong to, and so decide what service they should receive. The router needs to identify the flow of every incoming packet and then perform the appropriate special processing.

Special Processing Requires Identification of Flows • All packets of a flow obey a pre-defined rule and are processed similarly by the router • Classification is based on an arbitrary number of fields in the packet header • E.g. a flow = (src-IP-address, dst-IP-address), or a flow = (dst-IP-prefix, protocol) etc.

Network services : • Routing • Access-control in firewalls • Policy-based routing • Provision of differentiated qualities of service • Traffic billing

What to determine? • Forward or filter a packet? • Where to forward it to? • What class of service should it receive? • How much to charge for transporting it?

Packet Classifier [Figure: the header of an incoming packet is passed to the packet classification stage, which consults the classifier (policy database) of rules and actions and hands the resulting action to the forwarding engine]

Need for Differentiated Services [Figure: example network with ISP1, ISP2, ISP3, a NAP, the networks E1 and E2, and points labelled X, Y and Z]

Table 2:
Class                        Relevant Packet Fields
Email and from ISP2          Source Link-layer Address, Source Transport port number
From ISP2                    Source Link-layer Address
From ISP3 and going to E2    Source Link-layer Address, Destination Network-Layer Address
All other packets            ---------

Packet Classification: Problem Definition • Given a classifier C with N rules, Rj, 1 ≤ j ≤ N, where Rj consists of three entities: • A regular expression Rj[i], 1 ≤ i ≤ d, on each of the d header fields, • A number, pri(Rj), indicating the priority of the rule in the classifier, and • An action, referred to as action(Rj). For an incoming packet P with the header considered as a d-tuple of points (P1, P2, …, Pd), the d-dimensional packet classification problem is to find the rule Rm with the highest priority among all the rules Rj matching the d-tuple; i.e., pri(Rm) > pri(Rj), ∀ j ≠ m, 1 ≤ j ≤ N, such that Pi matches Rj[i], 1 ≤ i ≤ d. We call rule Rm the best matching rule for packet P.
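The definition above can be sketched as a direct linear search. This is only an illustration: the `Rule` class, the use of plain Python predicates in place of the per-field regular expressions, and the three example rules are all assumptions made here, not part of the slides.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class Rule:
    name: str
    fields: Sequence[Callable[[int], bool]]  # one predicate per header field (stands in for Rj[i])
    priority: int                            # pri(Rj); a larger number means higher priority
    action: str                              # action(Rj)

def classify(rules: Sequence[Rule], packet: Sequence[int]) -> Optional[Rule]:
    """Return the highest-priority rule whose every field matches the packet."""
    best = None
    for rule in rules:
        if all(match(value) for match, value in zip(rule.fields, packet)):
            if best is None or rule.priority > best.priority:
                best = rule
    return best

# Hypothetical 2-field classifier over (protocol number, destination port).
TCP = 6
rules = [
    Rule("R1", (lambda p: p == TCP, lambda port: port == 80), 3, "deny"),
    Rule("R2", (lambda p: p == TCP, lambda port: port > 1023), 2, "permit"),
    Rule("R3", (lambda p: True, lambda port: True), 1, "permit"),
]
```

Here `classify(rules, (TCP, 80))` selects R1 even though R3 also matches, because R1 has the higher priority.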

Classification is a Generalization of Lookup • Classifier = routing table • One-dimension (destination address) • Rule = routing table entry • Regular expression = prefix • Action = (next-hop-address, port) • Priority = prefix-length
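To make the correspondence concrete, here is a minimal sketch of route lookup written as one-dimensional classification, where the priority of each rule is its prefix length. The routing entries and next-hop values are made up for illustration, and the linear scan is chosen for clarity, not speed.

```python
import ipaddress

# Each rule: (prefix, action) where action = (next-hop address, port). Values are hypothetical.
routes = [
    ("10.0.0.0/8", ("192.0.2.1", 1)),
    ("10.1.0.0/16", ("192.0.2.2", 2)),
    ("0.0.0.0/0", ("192.0.2.254", 0)),
]

def lookup(dst, table):
    """Return the action of the matching rule with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    best = None  # (prefix length, action)
    for prefix, action in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, action)
    return None if best is None else best[1]
```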

Example 4D classifier [Table; surviving row] R6: 152.163.198.4/255.255.255.255, 152.163.36.0/255.255.255.255, tcp, gt 1023, Permit

Example Classification Results

General characteristics of Classifiers • Number of rules: not large. Only 0.7% of classifiers contain more than 1000 rules; the mean is 50 rules • Number of fields: a maximum of 8 fields: src/dst network-layer address, src/dst transport-layer port numbers, type-of-service (TOS) field, protocol field, transport-layer protocol flags • 17% of rules specify 1 field, 23% specify 3 fields, 60% specify 4 fields

General characteristics of Classifiers (contd.) • Transport-layer protocol field: TCP, UDP, ICMP, IGMP, (E)IGRP, GRE, IPINIP or ‘*’ • Transport-layer field specification: 10.2% have range specifications • Rules with non-contiguous masks: found in 14% of classifiers and 10.2% of all rules • Many different rules in the same classifier share a number of field specifications • Redundant rules: 8% of rules in classifiers, of which 4.4% are backward redundant and 3.6% are forward redundant

Goals The algorithm should: • Be fast enough to operate at OC48c line rates and preferably at OC192c line rates • Allow matching on arbitrary fields • Support general classification rules, including prefixes, operators and wildcards • Be suitable for implementation in both software and hardware • Not have expensive memory requirements • Scale in terms of both memory and speed with the size of the classifier

Previous work • Simplest classification algorithm: evaluate the rules sequentially • Simple and efficient in its use of memory • Poor scaling properties: classification time grows linearly with the number of rules

Classification with Ternary-CAMs [Figure: the packet header is compared against the entries of the TCAM memory array; a priority encoder returns the first matching rule] TCAMs are too expensive, too small, and consume too much power for large classifiers
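A software model of ternary matching may help here. It is a sketch only: a real TCAM compares all entries in parallel in hardware, whereas this loop scans them in order, and the 8-bit entries below are invented for illustration.

```python
def tcam_lookup(entries, key):
    """entries: list of (value, mask, action); a mask bit of 0 means "don't care".

    Returns the action of the first (highest-priority) matching entry, or None.
    """
    for value, mask, action in entries:
        if (key & mask) == (value & mask):
            return action
    return None

# Hypothetical 8-bit entries, stored in priority order.
entries = [
    (0b10100000, 0b11110000, "A"),        # matches keys of the form 1010xxxx
    (0b10000000, 0b11000000, "B"),        # matches keys of the form 10xxxxxx
    (0b00000000, 0b00000000, "default"),  # matches every key
]
```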

Structure of the Classifiers [Figure: rules R1, R2 and R3 creating 4 regions] A classification algorithm must keep a record of each region and be able to determine the region to which each newly arriving packet belongs

Structure of the Classifiers [Figure: overlapping rules R1, R2 and R3 creating 7 regions, including the overlaps {R1, R2}, {R2, R3} and {R1, R2, R3}] The more regions the classifier contains, the more storage is required and the longer it takes to classify a packet

Algorithm • Packet classification problem: map S bits in the packet header => T bits of classID, where T = log N and N is the number of classifier rules • A simple and fast way of doing this mapping: pre-compute the value of classID for each of the 2^S different packet headers • This yields the answer in one step (one memory access) • But it requires too much memory

Recursive Flow Classification performs the same mapping but over several stages • One-step: 2^S = 2^128 => 2^T = 2^12 • Multi-step: 2^S = 2^128 => 2^64 => 2^32 => 2^T = 2^12
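A quick calculation shows why the one-step table is impractical. The sketch below assumes the logarithm is base 2 and that each table entry stores exactly T bits; the helper names are made up here, and the 16-bit case is added only as a contrast.

```python
def class_id_bits(num_rules):
    """T = ceil(log2 N): bits needed to name one of N rules (at least 1)."""
    return max(1, (num_rules - 1).bit_length())

def one_step_table_bits(header_bits, class_bits):
    """Size in bits of a table holding one classID per possible header value."""
    return (1 << header_bits) * class_bits

S, T = 128, 12                      # the values shown on the slide
full = one_step_table_bits(S, T)    # 12 * 2^128 bits: far beyond any real memory
toy = one_step_table_bits(16, T)    # a single 16-bit chunk: 786432 bits, i.e. 96 KiB
```

A table indexed by one 16-bit chunk is small, which is the reason for splitting the header into chunks and reducing in stages.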

Recursive Flow Classification • Consists of P phases, each with a set of parallel memory lookups • Each lookup is a reduction: the value returned by the memory lookup is shorter than the index of the memory access

Chunking of a Packet [Figure: the packet header is split into chunks, Chunk #0 to Chunk #7, which are used to index into multiple memories in parallel. Header fields shown: Source L3 Address, Destination L3 Address, L4 protocol and flags, Source L4 port, Destination L4 port, Type of Service]

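Packet Flow [Figure: the header passes through Phase 0, Phase 1, Phase 2 and Phase 3; the eqIDs returned by one phase form the index for the next, and the last phase yields the action]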

In phase 0
chunk #6: 1. {www=80} 2. {20,21} 3. {>1023} 4. {remaining numbers}; can be encoded by the eqIDs 00b to 11b; reduction: 16 to 2 bits
chunk #4: 1. {tcp} 2. {udp} 3. {remaining numbers}; can be encoded by 2 bits; reduction: 8 to 2 bits
In phase 1
the two eqIDs are concatenated (4 bits) and reduced to the chunk equivalence sets (CESs): 1. {({80},{udp})} 2. {({20-21},{udp})} 3. {({80},{tcp})} 4. {({gt 1023},{tcp})} 5. {all remaining crossproducts}; can be encoded by 3 bits; reduction: 4 to 3 bits
total reduction: 24 to 3 bits
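The two-phase reduction in this example can be written out by hand as follows. The particular eqID numbers are an arbitrary assignment made for this sketch (only the number of classes comes from the slide), and the protocol numbers 6 and 17 are the standard IP values for TCP and UDP.

```python
def port_eqid(port):
    """Phase 0, 16-bit port chunk: four equivalence classes, so a 2-bit eqID."""
    if port == 80:
        return 0
    if port in (20, 21):
        return 1
    if port > 1023:
        return 2
    return 3

def proto_eqid(proto):
    """Phase 0, 8-bit protocol chunk: three classes, so a 2-bit eqID."""
    if proto == 6:    # tcp
        return 0
    if proto == 17:   # udp
        return 1
    return 2

# Phase 1: (port eqID, protocol eqID) -> one of five classes, so a 3-bit eqID.
PHASE1 = {(0, 1): 0, (1, 1): 1, (0, 0): 2, (2, 0): 3}

def phase1_eqid(port, proto):
    """Combine the two phase-0 eqIDs; every other crossproduct falls into class 4."""
    return PHASE1.get((port_eqid(port), proto_eqid(proto)), 4)
```

In a real RFC table each function would be a precomputed array lookup; the `if` chains are used here only to keep the classes readable.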

RFC preprocessing for chunk j of phase 0
For each rule rl in the classifier
    project the jth component of rl onto the number line (from 0 to 2^b-1), marking the start and end points of each of its constituent intervals ;
End for ;
bmp := 0 ;
For n in 0…2^b-1
    If (any rule starts or ends at n)
        update bmp ;
    End if ;
    If (bmp not seen earlier)
        eq := new_Equivalence_Class( ) ;
        eq->cbm := bmp ;
    Else
        eq := the equivalence class whose cbm is bmp ;
    End if ;
    table_0_j[n] = eq->ID ;
End for ;
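A runnable sketch of this phase-0 scan is given below. The representation is an assumption made for illustration: each rule's projection onto the chunk is a list of inclusive (lo, hi) intervals, and the class bitmap (cbm) is a Python integer with bit k set when rule k matches.

```python
def build_phase0_table(rule_intervals, bits):
    """Return (table, cbms): table[n] is the eqID of chunk value n, cbms[eqID] its bitmap."""
    size = 1 << bits
    events = {}  # point on the number line -> list of (rule index, +1 start / -1 end)
    for k, ranges in enumerate(rule_intervals):
        for lo, hi in ranges:
            events.setdefault(lo, []).append((k, +1))
            events.setdefault(hi + 1, []).append((k, -1))
    active = [0] * len(rule_intervals)  # how many intervals of rule k cover the current n
    table = [0] * size
    cbms, eq_ids = [], {}
    bmp = 0
    for n in range(size):
        for k, delta in events.get(n, ()):  # a rule starts or ends here: update bmp
            active[k] += delta
            if active[k] > 0:
                bmp |= 1 << k
            else:
                bmp &= ~(1 << k)
        eq = eq_ids.get(bmp)
        if eq is None:                      # bitmap not seen earlier: new equivalence class
            eq = len(cbms)
            eq_ids[bmp] = eq
            cbms.append(bmp)
        table[n] = eq
    return table, cbms

# Hypothetical 16-bit port chunk: rule 0 = port 80, rule 1 = ports 20-21, rule 2 = ports > 1023.
table, cbms = build_phase0_table([[(80, 80)], [(20, 21)], [(1024, 65535)]], 16)
```

The example produces four equivalence classes, so the 16-bit chunk value is reduced to a 2-bit eqID.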

RFC preprocessing for chunk i of phase j (j>0)
index := 0 ;
listEqs := nil ;
For each CES, c1eq, of chunk c1
  For each CES, c2eq, of chunk c2
    …
      For each CES, cmeq, of chunk cm
          intersectedBmp := c1eq->cbm & c2eq->cbm & … & cmeq->cbm ;
          neweq := searchList(listEqs, intersectedBmp) ;
          if (not found in listEqs)
              neweq := new_Equivalence_Class( ) ;
              neweq->cbm := intersectedBmp ;
              add neweq to listEqs ;
          end if ;
          table_j_i[index] := neweq->ID ;
          index++ ;
      End for ;
    …
  End for ;
End for ;
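The nested loops above amount to a crossproduct over the parent chunks' equivalence classes. A minimal sketch follows, assuming each parent is described simply by its list of class bitmaps indexed by eqID; the four-rule example and its bit assignments are invented to mirror the port/protocol classes used earlier in the deck.

```python
from functools import reduce
from itertools import product
from operator import and_

def build_combined_table(parent_cbms):
    """parent_cbms[p][eqID] is the class bitmap of that eqID in parent chunk p.

    Returns (table, cbms). The table is filled in crossproduct order with the
    last parent varying fastest, so index = (e1 * n2 + e2) * n3 + e3 and so on.
    """
    table, cbms, eq_ids = [], [], {}
    for combo in product(*parent_cbms):
        intersected = reduce(and_, combo)   # rules matched by every parent class
        eq = eq_ids.get(intersected)
        if eq is None:                      # not found in the list: new class
            eq = len(cbms)
            eq_ids[intersected] = eq
            cbms.append(intersected)
        table.append(eq)
    return table, cbms

# Hypothetical rules: bit 0 = (80, udp), bit 1 = (20-21, udp), bit 2 = (80, tcp), bit 3 = (>1023, tcp).
port_cbms = [0b0101, 0b0010, 0b1000, 0b0000]   # classes: 80, 20-21, >1023, other
proto_cbms = [0b1100, 0b0011, 0b0000]          # classes: tcp, udp, other
table, cbms = build_combined_table([port_cbms, proto_cbms])
```

The 12 crossproducts collapse into 5 classes, which is why 3 bits suffice after this phase.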

Performance of RFC 1. Number of phases, P: we combine those chunks together which have the most correlation 2. The reduction tree used: we combine as many chunks as we can without causing unreasonable memory consumption

Choice of Reduction Tree [Figure: reduction trees Tree_A and Tree_B, each combining chunks 0 to 5 into a ClassID] Number of phases = P = 3; 10 memory accesses

Choice of Reduction Tree [Figure: reduction trees Tree_A and Tree_B, each combining chunks 0 to 5 into a ClassID] Number of phases = P = 4; 11 memory accesses

RFC lookup in Hardware [Figure: Phase 0, Phase 1 and Phase 2 lookups mapped onto SRAM1, SRAM2, SDRAM1 and SDRAM2; labels: Chks 0-2, Chks 3-5, Chks 0 and 1 replicated] Clk: 125 MHz => 31.25 million packets per second

RFC lookup in software • 30 lines of C code, compiled on a 333 MHz Pentium II PC running Windows NT • The worst-case path for the code took (140 clks + 9 tm) for three phases and (146 clks + 11 tm) for four phases, where tm is the memory access time • With tm = 60 ns => 0.98 us for 3 phases and 1.1 us for 4 phases, i.e. close to one million packets per second • The average lookup time is 50% faster than the worst case

RFC lookup operation
For (each chunk, chkNum, of phase 0)
    eqNums[0][chkNum] = contents of appropriate rfctable at memory address pktFields[chkNum] ;
For (phaseNum = 1…numPhases-1)
    For (each chunk, chkNum, in phase phaseNum)
        chd = parent descriptor of (phaseNum, chkNum) ;
        index = eqNums[phaseNum of chd->chkParents[0]][chkNum of chd->chkParents[0]] ;
        For (i = 1…chd->numChkParents-1)
            index = index * (total #equivIDs of chd->chkParents[i]) + eqNums[phaseNum of chd->chkParents[i]][chkNum of chd->chkParents[i]] ;
        End for
        eqNums[phaseNum][chkNum] = contents of appropriate rfctable at address index ;
    End for
Return eqNums[numPhases-1][0] ;
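The lookup loop can be sketched in Python as below. The data layout is an assumption made here: each chunk is a dict with its lookup `table`, its number of eqIDs `num_eq`, and (for phases after 0) a list of `parents` given as (phase, chunk) pairs. The two-phase tables are toy values, not taken from the slides.

```python
def rfc_lookup(pkt_fields, phases):
    """Walk the reduction tree and return the final eqID (the classID)."""
    # Phase 0: index each chunk's table directly with the packet field.
    eq_nums = [[chunk["table"][pkt_fields[c]] for c, chunk in enumerate(phases[0])]]
    for phase in phases[1:]:
        row = []
        for chunk in phase:
            (p0, c0), *rest = chunk["parents"]
            index = eq_nums[p0][c0]
            for p, c in rest:
                # Mixed-radix combination of the parents' eqIDs.
                index = index * phases[p][c]["num_eq"] + eq_nums[p][c]
            row.append(chunk["table"][index])
        eq_nums.append(row)
    return eq_nums[-1][0]  # the last phase has a single chunk

# Toy reduction tree: a 3-bit field and a 2-bit field combined in one later phase.
phases = [
    [
        {"table": [0, 0, 1, 1, 2, 2, 2, 2], "num_eq": 3},
        {"table": [0, 1, 1, 0], "num_eq": 2},
    ],
    [
        {"table": [0, 1, 2, 2, 3, 0], "num_eq": 4, "parents": [(0, 0), (0, 1)]},
    ],
]
```

For the packet fields (5, 2), the phase-0 eqIDs are 2 and 1, so the phase-1 index is 2 * 2 + 1 = 5.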

Table 6:
Rule #  Src L3 31..16  Src L3 15..0  Dst L3 31..16  Dst L3 15..0  L4 protocol (8 bits)  Dstn L4 (16 bits)  Action
0       0.83/1..       0.77/1…       0.0/0.0        4.6/1…        udp                   *                  permit
1       0.83/1..       1.0/255.0     0.0/0.0        4.6/1…        udp                   20-30              permit
2       0.83/1..       0.77/1…       0.0/0.0        0.0/1…        *                     21                 permit
3       0.0/0.0        0.0/0.0       0.0/0.0        0.0/1…        *                     21                 deny
4       0.0/0.0        0.0/0.0       0.0/0.0        0.0/1…        *                     *                  permit

Variations and improvements of RFC 1. RFC can be extended to process a larger number of fields in each packet header 2. RFC can be sped up by taking advantage of available fast lookup algorithms 3. The “adjacency groups” technique can be employed to reduce the memory requirements when processing large classifiers

Adjacency Groups • The size of the RFC tables grows with the number of CESs • Rules R and S are adjacent in dimension i if: 1. they have the same action 2. all but the ith field have exactly the same specification in the two rules 3. all rules appearing between them in the classifier either have the same action or are disjoint from R • Two rules are simply adjacent if they are adjacent in some dimension • So we merge adjacent rules

Example of adjacency groups
Rules: R(a1,b1,c1,d1) S(a1,b1,c2,d1) T(a2,b1,c2,d1) U(a2,b1,c1,d1) V(a1,b1,c4,d2) W(a1,b1,c3,d2) X(a2,b1,c3,d2) Y(a2,b1,c4,d2)
Merge along Dimension 3: RS(a1,b1,c1+c2,d1) TU(a2,b1,c1+c2,d1) VW(a1,b1,c3+c4,d2) XY(a2,b1,c3+c4,d2)
Merge along Dimension 1: RSTU(a1+a2,b1,c1+c2,d1) VWXY(a1+a2,b1,c3+c4,d2)
Carry out an RFC phase (assume chunks 1 & 2 are combined, and also chunks 3 & 4 are combined): RSTU(m1,n1) VWXY(m1,n2)
Merge: RSTUVWXY(m1,n1+n2)
Continue with RFC …
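The first two merges in this example can be reproduced with a short sketch. It is a simplification under stated assumptions: every rule is given the same action (the slide does not list actions), each field is a set of symbolic values, and the third adjacency condition (about rules lying between the pair) is not checked.

```python
def merge_along(rules, dim):
    """Merge rules that share an action and agree on every field except `dim`.

    rules: list of (name, fields, action), where fields is a tuple of frozensets.
    """
    groups = {}
    for name, fields, action in rules:
        key = (fields[:dim], fields[dim + 1:], action)
        groups.setdefault(key, []).append((name, fields[dim]))
    merged = []
    for (before, after, action), members in groups.items():
        name = "".join(n for n, _ in members)
        union = frozenset().union(*(f for _, f in members))
        merged.append((name, before + (union,) + after, action))
    return merged

def rule(name, *values):
    """Build a rule with one symbolic value per field and a hypothetical common action."""
    return (name, tuple(frozenset([v]) for v in values), "permit")

rules = [
    rule("R", "a1", "b1", "c1", "d1"), rule("S", "a1", "b1", "c2", "d1"),
    rule("T", "a2", "b1", "c2", "d1"), rule("U", "a2", "b1", "c1", "d1"),
    rule("V", "a1", "b1", "c4", "d2"), rule("W", "a1", "b1", "c3", "d2"),
    rule("X", "a2", "b1", "c3", "d2"), rule("Y", "a2", "b1", "c4", "d2"),
]
step1 = merge_along(rules, 2)   # dimension 3 (index 2): RS, TU, VW, XY
step2 = merge_along(step1, 0)   # dimension 1 (index 0): RSTU, VWXY
```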

RFC: Pros and Cons • Advantages: suitable for multiple fields; supports non-contiguous masks; fast accesses • Disadvantages: large pre-processing time; slow incremental updates; large worst-case storage requirements
