Revisiting the Case for a Minimalist Approach for Network Flow Monitoring

Vyas Sekar, Michael K Reiter, Hui Zhang

Many Monitoring Applications
• Traffic Engineering
• Accounting
• Network Forensics
• Worm Detection
• Analyze new user apps
• …….
• Botnet analysis
• Anomaly Detection

Need to estimate different metrics
• Applications: Traffic Engineering, Accounting, Network Forensics, Worm Detection, Analyze new user apps, ……., Botnet analysis, Anomaly Detection
• Metrics: “Heavy-hitters”, “Flow size distribution”, “SuperSpreaders”, “Degree histogram”, “Entropy”, “Changes”

How are these metrics estimated?
• Monitoring (on router): Traffic → Packet Processing → Counter Data Structures
• Computation (off router): Application-Level Metrics

Today’s solution: Packet Sampling
• Monitoring (on router): Traffic → Packet Processing (sample packets uniformly) → Counter Data Structures (FlowId, Pkt/Byte Counts)
• Flow = Packets with same Src/DstAddr and Ports
• Computation (off router): compute metrics on sampled flows → Application-Level Metrics
• Estimation is inaccurate for fine-grained analysis
• Extensive literature on limitations for many tasks!
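
The packet-sampling pipeline on this slide can be sketched as follows. This is a minimal illustration, not router code: the function name `packet_sample`, the `(flow_id, size)` packet representation, and the seeded random generator are all assumptions made for the example.

```python
import random
from collections import defaultdict


def packet_sample(packets, rate, seed=0):
    """Uniformly sample packets and aggregate the sampled ones per flow.

    packets: iterable of (flow_id, size_in_bytes); flow_id is any hashable
             value, e.g. a (src, dst, sport, dport) tuple (assumed format).
    rate:    probability of keeping each packet, independent of its flow.
    """
    rng = random.Random(seed)
    table = defaultdict(lambda: [0, 0])  # flow_id -> [pkt_count, byte_count]
    for flow_id, size in packets:
        if rng.random() < rate:          # per-packet coin flip
            table[flow_id][0] += 1
            table[flow_id][1] += size
    return dict(table)
```

Because the coin is flipped per packet, a flow with few packets is likely to be missed entirely, which is one way to see why fine-grained estimates from sampled flows suffer.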

Trend: Shift to Application-Specific
• Traffic feeds one pipeline per metric (Flow Size Distribution, Entropy, Superspreader, ….)
• Each pipeline: Packet Processing → Counter Data Structures → Application-Level Metric
• Complexity: Need per-metric implementation
• Early commitment: Applications are a moving target

What do we ideally want?
• Monitoring (on router): Traffic → Packet Processing → Counter Data Structures
• Computation (off router): Application-Specific Metrics
• Simple packet processing
• Support many applications
• High accuracy

Outline
• Motivation
• A Minimalist Alternative
• Evaluation
• Summary and discussion

Requirements
1. Simple router implementation
2. General across applications
3. Enable drill-down capabilities
4. Network-wide views
Applications: Botnet, Anomaly, Worm, Accounting

How do we meet these requirements?
1. Simple router implementation
2. General across applications
   → Delay binding to specific applications
3. Enable drill-down capabilities
4. Network-wide views

What does it mean to delay binding?
• Monitoring (on router): Traffic → Packet Processing → Counter Data Structures
• Keep this stage as “generic” as possible
• Instead of splitting resources, aggregate into generic primitives
• Computation (off router): Application-Level Metrics

What Generic Primitives?
Two broad classes of monitoring tasks:
1. Communication structure, e.g., Who talked to whom? → Flow sampling [Hohn, Veitch IMC ’03]
2. Volume structure, e.g., How much traffic? → Sample and Hold [Estan, Varghese SIGCOMM ’02]

Flow Sampling
• Packet Processing: compute Hash(5-tuple); if hash < r, update
• Counter Data Structures: FlowId, Pkt/Byte Counts
• Flow = Packets with same Src/DstAddr and Ports
• Pick flows at random; not biased by flow size
• Good for “communication” patterns
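
A minimal sketch of the hash-based rule on this slide, under stated assumptions: the SHA-256-based `flow_hash` helper and the `(flow_id, size)` packet format are illustrative choices, not what a router would use. The slide only specifies "if hash < r, update".

```python
import hashlib


def flow_hash(flow_id):
    """Map a flow identifier to a float in [0, 1). Hash choice is an assumption."""
    digest = hashlib.sha256(repr(flow_id).encode()).digest()
    # Keep the top 53 bits so the result is an exact double strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def flow_sample(packets, r):
    """Track a flow only if its hash falls below the sampling rate r."""
    table = {}  # flow_id -> (pkt_count, byte_count)
    for flow_id, size in packets:
        if flow_hash(flow_id) < r:
            pkts, nbytes = table.get(flow_id, (0, 0))
            table[flow_id] = (pkts + 1, nbytes + size)
    return table
```

The decision depends only on the flow identifier, so a selected flow has every one of its packets counted, and a one-packet flow is as likely to be selected as a very large one.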

Sample and Hold
• Packet Processing: if flow in table, update; if new, sample with prob p and create entry
• Counter Data Structures: FlowId, Pkt/Byte Counts
• Flow = Packets with same Src/DstAddr and Ports
• Accurate counts of large flows
• Good for “volume” queries
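
The update rule on this slide can be sketched as below. This is an illustrative version only: the function name, the `(flow_id, size)` packet format, and the seeded generator are assumptions for the example.

```python
import random


def sample_and_hold(packets, p, seed=0):
    """Once a flow has an entry, count all of its later packets exactly.

    A packet from a flow without an entry creates one with probability p.
    """
    rng = random.Random(seed)
    table = {}  # flow_id -> (pkt_count, byte_count)
    for flow_id, size in packets:
        if flow_id in table:                 # held: always update
            pkts, nbytes = table[flow_id]
            table[flow_id] = (pkts + 1, nbytes + size)
        elif rng.random() < p:               # new: sample with probability p
            table[flow_id] = (1, size)
    return table
```

A flow with many packets gets many chances to be sampled and is then counted exactly from that point on, which is why large flows end up with accurate counts.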

How do we meet these requirements?
1. Simple router implementation
2. General across applications
   → Delay binding to specific applications; generic primitives = FS, SH
3. Enable drill-down capabilities
   → Retain NetFlow’s operational model
4. Network-wide views

Retain NetFlow operational model
• Application-Specific: per-metric modules (Entropy, FSD, Deg) → Summary Statistics (Entropy, Degree Histogram, FSD). Difficult to do further analysis, e.g., why is X high?
• Minimalist: FS+SH → Flow reports → Entropy, Degree Histogram, FSD, … Can estimate new metrics!
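
To illustrate why keeping flow reports leaves room for new metrics, the sketch below computes three of the named metrics offline from one set of flow records. The record layout (`(src, dst, sport, dport) -> (pkts, bytes)`) and the choice of source-address entropy are assumptions for the example, and these are naive computations on whatever records are reported; they do not correct for sampling.

```python
import math
from collections import Counter, defaultdict


def flow_size_distribution(flows):
    """Histogram: packet count -> number of flows with that count."""
    return Counter(pkts for pkts, _ in flows.values())


def outdegree_histogram(flows):
    """Histogram: number of distinct destinations -> number of sources."""
    dsts = defaultdict(set)
    for (src, dst, *_rest) in flows:
        dsts[src].add(dst)
    return Counter(len(d) for d in dsts.values())


def src_entropy(flows):
    """Shannon entropy (bits) of packet counts aggregated by source address."""
    per_src = Counter()
    for (src, *_rest), (pkts, _) in flows.items():
        per_src[src] += pkts
    total = sum(per_src.values())
    return -sum((c / total) * math.log2(c / total) for c in per_src.values())
```

All three functions read the same records, so adding a fourth metric later is an offline change only.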

How do we meet these requirements?
1. Simple router implementation
2. General across applications
   → Delay binding to specific applications; generic primitives = FS, SH
3. Enable drill-down capabilities
   → Retain NetFlow’s operational model; keep flow reports
4. Network-wide views
   → Network-wide resource management

Network-Wide Sample-and-Hold
• Repeating Sample-and-Hold wastes resources → Do it once per-path
[Figure: network topology with FS+SH at each router; legend: Sample-and-Hold, Flow Sampling]

Network-Wide Flow Sampling
• Use cSamp [NSDI ’08] to configure flow sampling capabilities
• Hash-based coordination → Non-overlapping sets of flows
• Network-wide optimization driven by operator goals, e.g., per-path guarantee
[Figure: network topology with per-router Flow Sampling assignments]
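
The hash-based coordination idea can be sketched as follows: routers on a path are given disjoint slices of the hash space, so no flow is logged twice. This is not cSamp's optimization, which chooses the fractions; here the fractions are simply given, and `flow_hash`, `assign_ranges`, and `responsible_router` are hypothetical helper names.

```python
import hashlib


def flow_hash(flow_id):
    """Map a flow identifier to a float in [0, 1). Hash choice is an assumption."""
    digest = hashlib.sha256(repr(flow_id).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_ranges(fractions):
    """Turn per-router sampling fractions into disjoint [lo, hi) hash ranges."""
    ranges, lo = {}, 0.0
    for router, frac in fractions.items():
        ranges[router] = (lo, lo + frac)
        lo += frac
    if lo > 1.0 + 1e-9:
        raise ValueError("fractions on one path must sum to at most 1")
    return ranges


def responsible_router(flow_id, ranges):
    """Return the single router whose range contains the flow's hash, if any."""
    h = flow_hash(flow_id)
    for router, (lo, hi) in ranges.items():
        if lo <= h < hi:
            return router
    return None
```

Since every router applies the same hash to the same flow identifier, the ranges alone decide who logs a flow; no per-packet communication between routers is needed.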

Putting the pieces together: “Minimalist” Proposal
• Flow Sampling: h ← Hash(flowid); if h in FS_Range(path), Create/Update
• Sample & Hold: if Ingress(path): if flow in table, Update; if new, Create with prob SH_p(path)
• Counter Data Structures: FlowId, Pkt/Byte Counts
• FS_Range(path), SH_p(path) are configuration parameters, e.g., via network-wide optimization using cSamp+
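
A minimal sketch of the combined per-packet logic, under several assumptions: the class name `MinimalistMonitor`, the use of two separate tables for FS and SH, the dictionary-based configuration, and the hash function are all illustrative. Only the control flow follows the slide.

```python
import hashlib
import random


def flow_hash(flow_id):
    """Map a flow identifier to a float in [0, 1). Hash choice is an assumption."""
    digest = hashlib.sha256(repr(flow_id).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class MinimalistMonitor:
    def __init__(self, fs_range, sh_p, seed=0):
        self.fs_range = fs_range   # path -> (lo, hi) hash range, i.e. FS_Range(path)
        self.sh_p = sh_p           # path -> sampling probability, i.e. SH_p(path)
        self.fs_table = {}         # separate tables are an assumption of this sketch
        self.sh_table = {}
        self.rng = random.Random(seed)

    @staticmethod
    def _bump(table, flow_id, size):
        pkts, nbytes = table.get(flow_id, (0, 0))
        table[flow_id] = (pkts + 1, nbytes + size)

    def process(self, flow_id, size, path, is_ingress):
        # Flow sampling: create/update if the hash falls in this router's range.
        lo, hi = self.fs_range.get(path, (0.0, 0.0))
        if lo <= flow_hash(flow_id) < hi:
            self._bump(self.fs_table, flow_id, size)
        # Sample and hold: only at the ingress of the path.
        if is_ingress:
            if flow_id in self.sh_table:
                self._bump(self.sh_table, flow_id, size)
            elif self.rng.random() < self.sh_p.get(path, 0.0):
                self._bump(self.sh_table, flow_id, size)
```

The router itself carries no application-specific logic; everything that varies per deployment lives in the two configuration maps.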

What do we ideally want?
• Monitoring (on router): Traffic → Packet Processing → Counter Data Structures
• Computation (off router): Application-Specific Metrics
• Simple packet processing ✔
• Support many applications ✔
• High accuracy ?

Outline
• Motivation
• A Minimalist Alternative
• Evaluation
  • Compare FS+SH vs. application-specific
• Summary and discussion

Assumptions in resource normalization
• Hardware requirements are similar
  • Both need per-packet array/key-value updates
  • More than pkt sampling, but within router capabilities
• Processing costs
  • Online cost lower for minimalist (don’t need per-app-instance)
  • Offline cost is higher for minimalist (but can be reduced, if necessary)
• Reporting bandwidth
  • Higher for minimalist, but < 1% of network capacity
• Memory for counters
  • Bottleneck is SRAM (Flow headers can be offloaded to DRAM)
  • We conservatively assume 4X more per-counter cost

Head-to-Head Comparison
• Normalize SRAM: FSD + Entropy + Degree (Application-Specific) = FS + SH (Minimalist)
• Application Portfolio: Flow Size Distribution, Outdegree Histogram, …

$$\text{Relative accuracy difference} = \frac{\text{Accuracy(Minimalist)} - \text{Accuracy(AppSpecific)}}{\text{Accuracy(AppSpecific)}}$$
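
As a worked instance of the relative accuracy difference, the sketch below applies it to a small portfolio. The accuracy values in the usage note are made-up numbers for illustration only, not results from the evaluation, and the function name is hypothetical.

```python
def relative_accuracy_difference(acc_minimalist, acc_app_specific):
    """(Accuracy(Minimalist) - Accuracy(AppSpecific)) / Accuracy(AppSpecific)."""
    if acc_app_specific == 0:
        raise ValueError("application-specific accuracy must be non-zero")
    return (acc_minimalist - acc_app_specific) / acc_app_specific


def compare_portfolio(minimalist, app_specific):
    """Apply the metric to every application present in both result maps."""
    return {
        app: relative_accuracy_difference(minimalist[app], app_specific[app])
        for app in app_specific
        if app in minimalist
    }
```

For example, with hypothetical accuracies of 0.9 (minimalist) and 0.75 (application-specific) for one task, the difference is 0.15 / 0.75 = 0.2; a positive value means the minimalist approach did better on that task.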

Resource split between FS and SH
• Run application-specific algorithms with recommended parameters (details in paper)
• Measure memory use; run FS+SH with aggregate, but normalized (1/4X) memory
• Packet trace from CAIDA; consistent over other traces
• + → good, - → bad
• We pick 80-20 split as a good operating point
• Relative difference is positive for most applications!
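
The normalization and split can be sketched as simple arithmetic. Several things here are assumptions: the function name, counting memory in counter units, and in particular giving the 80% share to FS. The slide states only that an 80-20 split was chosen, so `fs_fraction` is a free parameter.

```python
def split_budget(app_memory, per_counter_cost=4, fs_fraction=0.8):
    """Derive FS and SH table sizes from the application-specific memory use.

    app_memory:       application name -> counters used by its algorithm.
    per_counter_cost: minimalist entries are assumed to cost this many
                      times more than an application-specific counter (4X).
    fs_fraction:      share of the normalized budget given to flow sampling
                      (assumed direction of the 80-20 split).
    """
    aggregate = sum(app_memory.values())
    entries = aggregate // per_counter_cost   # normalized (1/4X) budget
    fs_entries = round(entries * fs_fraction)
    sh_entries = entries - fs_entries
    return fs_entries, sh_entries
```

With hypothetical usage of 2000, 1000, and 1000 counters across three applications, the aggregate of 4000 normalizes to 1000 entries, which an 80-20 split divides into 800 and 200.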

Varying the application portfolio
• Packet trace from CAIDA; consistent over other traces
• Minimalist vs. Application-specific under same resources
• + → good, - → bad
[Figure: relative accuracy difference vs. application portfolio]
• More tasks or some resource-intensive → Better across entire portfolio!
• “Sharing” effect across estimation tasks

Network-Wide View
• Flow-level traces from Internet2
• Configure Application-Specific per PoP
• Measure resource consumption, normalize and give to network-wide FS+SH
• Introduces some biases due to duplicates
• Lower → Better
• Configured per-ingress → can’t get network-wide!
1. App-Specific: Difficult to generate different views, e.g., per-OD-pair
2. Coordination: better performance & operational simplicity

Conclusions and discussion
• Even a simple “minimalist” approach might work
• Key: Focus on portfolio rather than individual tasks
• Proposal: FS + SH (complementary); cSamp-like mgmt
• Implications for device vendors and operators
• Late binding, lower complexity
• Quest for feasibility, not optimality
• Better primitives, combination, estimation?
• Is this sufficient?
