Network Traffic Modeling
Mark Crovella, Boston University Computer Science
Outline of Day • 9:00 – 10:45 Lecture 1, including break • 10:45 – 12:00 Exercise Set 1 • 12:00 – 13:30 Lunch • 13:30 – 15:15 Lecture 2, including break • 15:15 – 17:00 Exercise Set 2
The Big Picture • There are two main uses for traffic modeling: • Performance Analysis • Concerned with questions such as delay, throughput, and packet loss. • Network Engineering and Management • Concerned with questions such as capacity planning, traffic engineering, and anomaly detection. • The principal differences between the two are timescale and stationarity.
Relevant Timescales • Performance effects happen on short timescales • from nanoseconds up to an hour • Network Engineering effects happen on long timescales • from an hour to months • [Timeline: 1 µsec, 1 sec, 1 hour, 1 day, 1 week]
Stationarity, informally • “A stationary process has the property that the mean, variance and autocorrelation structure do not change over time.” • Informally: “we mean a flat looking series, without trend, constant variance over time, a constant autocorrelation structure over time and no periodic fluctuations (seasonality).” NIST/SEMATECH e-Handbook of Statistical Methods http://www.itl.nist.gov/div898/handbook/
The 1-Hour / Stationarity Connection • Nonstationarity in traffic is primarily a result of varying human behavior over time • The biggest trend is diurnal • This trend can usually be ignored up to timescales of about an hour, especially in the “busy hour”
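To make this concrete, here is a minimal Python sketch (entirely synthetic data, not from the lecture) that compares windowed means and standard deviations for a flat series versus one with a diurnal-style trend; large drift across windows suggests nonstationarity:

```python
import numpy as np

def rolling_stats(x, window):
    """Mean and std over non-overlapping windows; large drift
    across windows suggests nonstationarity (e.g., a diurnal trend)."""
    n = len(x) // window
    blocks = x[:n * window].reshape(n, window)
    return blocks.mean(axis=1), blocks.std(axis=1)

# Synthetic example: stationary noise vs. noise with a diurnal-like trend
rng = np.random.default_rng(0)
flat = rng.exponential(1.0, 86_400)                      # stationary
t = np.arange(86_400)
diurnal = flat * (1.5 + np.sin(2 * np.pi * t / 86_400))  # slow trend

for name, x in [("flat", flat), ("diurnal", diurnal)]:
    m, s = rolling_stats(x, 3_600)  # hour-long windows
    print(name, "mean drift:", m.max() - m.min(), "std drift:", s.max() - s.min())
```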
Outline • Morning: Performance Evaluation • Part 0: Stationarity assumption • Part 1: Models of fine-timescale behavior • Part 2: Traffic patterns seen in practice • Afternoon: Network Engineering • Models of long-timescale behavior • Part 1: Single Link • Part 2: Multiple Links
Morning Part 1: Traffic Models for Performance Evaluation • Goal: Develop models useful for • Queueing analysis • e.g., G/G/1 queues • Other analysis • e.g., traffic shaping • Simulation • e.g., router or network simulation
A Reasonable Approach • Fully characterizing a stochastic process can be impossible • Potentially infinite set of properties to capture • Some properties can be very hard to estimate • A reasonable approach is to concentrate on two particular properties: marginal distribution and autocorrelation
Marginals and Autocorrelation Characterizing a process in terms of these two properties gives you • a good approximate understanding of the process, • without involving a lot of work, • or requiring complicated models, • or requiring estimation of too many parameters. … Hopefully!
Marginals Given a stochastic process X = {Xi}, we are interested in the distribution of any Xi, i.e., f(x) = P[Xi = x]. Since we assume X is stationary, it doesn’t matter which Xi we pick. The marginal can be estimated using a histogram.
Histograms and CDFs • A Histogram is often a poor estimate of the pdf f(x) because it involves binning the data • The CDF F(x) = P[Xi <= x] will have a point for each distinct data value; can be much more accurate
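As a concrete illustration (the data here is a hypothetical stand-in, not from the lecture), a short Python sketch comparing a binned histogram estimate of f(x) with the empirical CDF, which has one point per distinct data value:

```python
import numpy as np

def ecdf(x):
    """Empirical CDF: one point per observed data value, no binning needed."""
    xs = np.sort(x)
    return xs, np.arange(1, len(xs) + 1) / len(xs)

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=10_000)  # stand-in for measurements

# Histogram estimate of the pdf: sensitive to the choice of bin width
counts, edges = np.histogram(sample, bins=30, density=True)

# ECDF: exact at every observed value
xs, F = ecdf(sample)
i = min(np.searchsorted(xs, 2.0), len(xs) - 1)
print("F(2.0) ~", F[i])  # compare with 1 - exp(-1) = 0.632 for this model
```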
Modeling the Marginals • We can form a compact summary of a pdf f(x) if we find that it is well described by a standard distribution, e.g., • Gaussian (Normal) • Exponential • Poisson • Pareto • etc. • Statistical methods exist for • Asking whether a dataset is well described by a particular distribution • Estimating the relevant parameters • (see the sketch below)
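A hedged sketch of how such a fit might be done with SciPy (the exponential model and the synthetic data are illustrative assumptions; note that testing against parameters fit from the same data biases the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.exponential(scale=2.0, size=5_000)  # hypothetical measurements

# Fit a candidate distribution by maximum likelihood
loc, scale = stats.expon.fit(data)
print("exponential fit: loc=%.3f scale=%.3f" % (loc, scale))

# Kolmogorov-Smirnov test: is the fitted model consistent with the data?
D, p = stats.kstest(data, "expon", args=(loc, scale))
print("KS statistic=%.4f p-value=%.3f" % (D, p))
```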
Distributional Tails • A particularly important part of a distribution is the (upper) tail • P[X>x] • Large values dominate statistics and performance • “Shape” of tail critically important
Light Tails, Heavy Tails • Light: exponential or faster decline, e.g., f1(x) = 2 exp(-2(x-1)) • Heavy: slower than any exponential, e.g., f2(x) = x^(-2)
Examining Tails • Best done using log-log complementary CDFs • Plot log(1-F(x)) vs log(x) • [Plot: 1-F1(x) and 1-F2(x) on log-log axes]
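A minimal sketch of such a plot in Python (matplotlib and the two synthetic samples are our own assumptions): on these axes an exponential tail curves sharply downward, while a power tail appears as a straight line.

```python
import numpy as np
import matplotlib.pyplot as plt

def llcd(x):
    """Log-log complementary CDF: points for log(1 - F(x)) vs log(x)."""
    xs = np.sort(x)
    ccdf = 1.0 - np.arange(1, len(xs) + 1) / len(xs)
    return xs[:-1], ccdf[:-1]  # drop the last point, where the CCDF is 0

rng = np.random.default_rng(3)
light = rng.exponential(1.0, 50_000)
heavy = rng.pareto(1.2, 50_000) + 1.0  # classical Pareto, tail index 1.2

for name, x in [("exponential", light), ("pareto", heavy)]:
    xs, c = llcd(x)
    plt.loglog(xs, c, label=name)
plt.xlabel("x"); plt.ylabel("P[X > x]"); plt.legend(); plt.show()
```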
Heavy Tails Arrive • pre-1985: Scattered measurements note high variability in computer systems workloads • 1985 – 1992: Detailed measurements note “long” distributional tails • File sizes • Process lifetimes • 1993 – 1998: Attention focuses specifically on (approximately) polynomial tail shape: “heavy tails” • post-1998: Heavy tails used in standard models
Power Tails, Mathematically We say that a random variable X is power tailed (with tail index α) if: P[X > x] ~ x^(-α) as x → ∞, where a(x) ~ b(x) means lim x→∞ a(x)/b(x) = 1. Focusing on polynomial shape allows • Parsimonious description • Capture of variability in a single parameter (α)
A Fundamental Shift in Viewpoint • Traditional modeling methods have focused on distributions with “light” tails • Tails that decline exponentially fast (or faster) • Arbitrarily large observations are vanishingly rare • Heavy tailed models behave quite differently • Arbitrarily large observations have non-negligible probability • Large observations, although rare, can dominate a system’s performance characteristics
Heavy Tails are Surprisingly Common • Sizes of data objects in computer systems • Files stored on Web servers • Data objects/flow lengths traveling through the Internet • Files stored in general-purpose Unix filesystems • I/O traces of filesystem, disk, and tape activity • Process/Job lifetimes • Node degree in certain graphs • Inter-domain and router structure of the Internet • Connectivity of WWW pages • Zipf’s Law
Evidence: Web File Sizes Barford et al., World Wide Web, 1999
Evidence: Process Lifetimes Harchol-Balter and Downey, ACM TOCS, 1997
The Bad News • Workload metrics following heavy tailed distributions are extremely variable • For example, for power tails: • When α ≤ 2, the distribution has infinite variance • When α ≤ 1, the distribution has infinite mean • In practice, empirical moments are slow to converge – or nonconvergent • To characterize system performance, either: • Attention must shift to the distribution itself, or • Attention must be paid to the timescale of analysis
Heavy Tails in Practice • Power tails with α = 0.8 • Large observations dominate statistics (e.g., the sample mean)
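One way to see this is to simulate it. The sketch below (our own illustration, assuming a Pareto sample with α = 0.8, hence infinite mean) shows the running sample mean failing to converge:

```python
import numpy as np

# Pareto with alpha = 0.8: infinite mean, so the running sample mean
# never settles; occasional enormous observations dominate it.
rng = np.random.default_rng(4)
alpha = 0.8
x = rng.pareto(alpha, 1_000_000) + 1.0

running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)
for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n={n:>8}: running mean = {running_mean[n - 1]:.1f}")
# The printed means keep growing with n instead of converging.
```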
Autocorrelation • Once we have characterized the marginals, we know a lot about the process. • In fact, if the process consisted of i.i.d. samples, we would be done. • However, most traffic has the property that its measurements are not independent. • Lack of independence usually results in autocorrelation • Autocorrelation is the tendency for two measurements to both be greater than, or less than, the mean at the same time.
Measuring Autocorrelation Autocorrelation Function (ACF) (assumes stationarity): R(k) = Cov(Xn, Xn+k) = E[Xn Xn+k] - (E[Xn])^2
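A small Python sketch of the sample ACF (normalized by R(0), unlike the covariance form above; the AR(1) series is our own illustration of a correlated process):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation R(k)/R(0) for lags 0..max_lag,
    assuming (as the slides do) that the series is stationary."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x)
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(5)
iid = rng.normal(size=10_000)
ar1 = np.zeros(10_000)            # AR(1): builds in positive correlation
for i in range(1, len(ar1)):
    ar1[i] = 0.9 * ar1[i - 1] + rng.normal()

print("iid ACF at lags 1..3:", acf(iid, 3)[1:])   # near zero
print("AR1 ACF at lags 1..3:", acf(ar1, 3)[1:])   # near 0.9, 0.81, 0.73
```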
How Does Autocorrelation Arise? Network traffic is the superposition of flows • [Diagram: a client “click” sends a request across the Internet (TCP/IP) to a server, which returns a response]
Why Flows? Sources appear to be ON/OFF • [Diagram: sources P1, P2, P3 alternate between ON and OFF periods]
Superposition of ON/OFF Sources → Autocorrelation • [Diagram: summing sources P1, P2, P3 yields an autocorrelated aggregate]
Morning Part 2: Traffic Patterns Seen in Practice Traffic patterns on a link are strongly affected by two factors: • amount of multiplexing on the link • Essentially – how many flows are sharing the link? • Where flows are bottlenecked • Is each flow’s bottleneck on, or off the link? • Do all bottlenecks have similar rate?
Low Multiplexed Traffic • Marginals: highly variable • Autocorrelation: low
High Multiplexed, Bottlenecked Traffic • Marginals: tending to Gaussian • Autocorrelation: high
Highly Multiplexed, Mixed-Bottleneck Traffic • dec-pkt-1 (Internet Traffic Archive)
Alpha and Beta Traffic • ON/OFF model revisited: high variability in connection rates (RTTs) • High-rate connections = alpha traffic; low-rate connections = beta traffic • [Diagram: alpha connections sum to stable Lévy noise; beta connections sum to fractional Gaussian noise] • Rice U., SPIN Group
Long Range Dependence • Short-range dependence: R(k) ~ a^(-k), a > 1 (exponential decay) • Long-range dependence: R(k) ~ k^(-α), 0 < α < 1 (power-law decay) • For LRD, the Hurst parameter is H = 1 - α/2
Correlation and Scaling • Long range dependence affects how variability scales with timescale • Take a traffic timeseries Xn, sum it over blocks of size m • This is equivalent to observing the original process on a longer timescale • How do the mean and std dev change? • Mean will always grow in proportion to m • For i.i.d. data, the std dev will grow in proportion to sqrt(m) • So, for i.i.d. data, the process is “smoother” at longer timescale
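A sketch of this aggregation experiment in Python (i.i.d. exponential data chosen purely for illustration): the mean of the block sums grows like m, and the standard deviation like sqrt(m).

```python
import numpy as np

def aggregate(x, m):
    """Sum x over non-overlapping blocks of size m, i.e., view the
    process on a timescale m times longer."""
    n = len(x) // m
    return x[:n * m].reshape(n, m).sum(axis=1)

rng = np.random.default_rng(6)
x = rng.exponential(1.0, 1_000_000)  # i.i.d., so std grows like sqrt(m)
for m in (1, 10, 100, 1000):
    xm = aggregate(x, m)
    print(f"m={m:>5}: mean={xm.mean():8.1f} std={xm.std():8.2f} "
          f"std/sqrt(m)={xm.std() / np.sqrt(m):.3f}")
# For i.i.d. data the last column stays flat; for LRD traffic it would grow.
```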
Self-similarity: unusual scaling of variability • Exact self-similarity of a zero-mean, stationary process Xn: the aggregated process X(m) (the mean over non-overlapping blocks of size m) satisfies X =d m^(1-H) X(m) for all m • H: Hurst parameter, 1/2 < H < 1 • H = 1/2 for i.i.d. Xn • LRD leads to (at least) asymptotic self-similarity
Self-Similarity in Practice • [Plot: traffic with H = 0.95 vs. H = 0.50, viewed at timescales of 10 ms, 1 s, and 100 s]
How Does Self-Similarity Arise? • Flows → autocorrelation → self-similarity • The distribution of flow lengths has a power law tail, so the autocorrelation declines like a power law, which produces self-similarity
Power-Tailed ON/OFF Sources → Self-Similarity • [Diagram: sources P1, P2, P3 with power-tailed ON/OFF periods; their superposition is self-similar]
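A toy simulation of this construction (parameter values and helper names are our own; it follows the standard result that superposing many ON/OFF sources with power-tailed period lengths of tail index α yields asymptotically self-similar traffic with H = (3 - α)/2):

```python
import numpy as np

def on_off_source(n, alpha, rng):
    """0/1 time series whose ON and OFF period lengths are drawn from
    a power-tailed (Pareto) distribution with tail index alpha."""
    out = np.zeros(n)
    t, state = 0, rng.integers(0, 2)
    while t < n:
        dur = int(rng.pareto(alpha) + 1.0)  # period length >= 1, power tail
        if state:
            out[t:t + dur] = 1.0
        t += dur
        state ^= 1
    return out

rng = np.random.default_rng(7)
# Superpose many power-tailed ON/OFF sources; with alpha = 1.2 the
# aggregate should be asymptotically self-similar with H = 0.9.
traffic = sum(on_off_source(100_000, alpha=1.2, rng=rng) for _ in range(50))
print("aggregate mean rate:", traffic.mean())
```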
Measuring Scaling Properties • In principle, one can simply aggregate Xn over varying block sizes m, and plot the resulting variance as a function of m • Linear behavior on a log-log plot gives an estimate of H (or α) • Slope > -1 indicates LRD • WARNING: this method is very sensitive to violation of assumptions!
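A minimal variance-time estimator sketch (our own illustration: it assumes Var of the block means scales as m^(2H-2), so H = 1 + slope/2; the i.i.d. test data should give an estimate near 0.5):

```python
import numpy as np

def variance_time_H(x, block_sizes):
    """Variance-time estimate of the Hurst parameter: regress
    log Var(mean over blocks of size m) on log m; under self-similar
    scaling the slope is 2H - 2."""
    logm, logv = [], []
    for m in block_sizes:
        n = len(x) // m
        means = x[:n * m].reshape(n, m).mean(axis=1)
        logm.append(np.log(m))
        logv.append(np.log(means.var()))
    slope = np.polyfit(logm, logv, 1)[0]
    return 1.0 + slope / 2.0

rng = np.random.default_rng(8)
iid = rng.normal(size=500_000)
print("H estimate for i.i.d. noise:",
      variance_time_H(iid, [10, 30, 100, 300, 1000]))  # expect ~0.5
```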
Better: Wavelet-based estimation Veitch and Abry
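For completeness, a rough stand-in for the Veitch-Abry logscale diagram using the PyWavelets package (our own sketch: it omits the bias correction and confidence intervals of the real estimator, and the wavelet choice is an assumption):

```python
import numpy as np
import pywt  # PyWavelets

def logscale_diagram(x, wavelet="db3", levels=10):
    """Logscale diagram: log2 of the average energy of the wavelet
    detail coefficients at each octave j. For LRD traffic this grows
    roughly linearly in j with slope 2H - 1."""
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    details = coeffs[1:]  # coarsest-to-finest detail coefficients
    energy = [np.mean(d ** 2) for d in reversed(details)]  # finest first
    return np.log2(energy)  # one value per octave j = 1..levels

rng = np.random.default_rng(9)
e = logscale_diagram(rng.normal(size=2**18))
slope = np.polyfit(np.arange(1, len(e) + 1), e, 1)[0]
print("slope:", slope, "-> H estimate:", (slope + 1) / 2)  # ~0.5 for white noise
```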
Optional Material: Performance Implications of Self-Similarity
Performance implications of S.S. • Asymptotic queue length distribution (G/G/1): • For SRD traffic: P[Q > x] ~ e^(-θx), an exponential tail • For LRD traffic: P[Q > x] ~ e^(-θ x^(2-2H)), a Weibull tail that is far heavier for H > 1/2 • Severe – but is it realistic?
Evaluating Self-Similarity • Queueing models like these are open systems • delay does not feed back to the source • TCP dynamics are not being considered • packet losses cause TCP to slow down • Better approach: • Closed network, detailed modeling of TCP dynamics • self-similar traffic generated “naturally”
Simulating Self-Similar Traffic • Simulated network with multiple clients and servers • Clients alternate between requests and idle times • Files drawn from a heavy-tailed distribution • Vary α to vary the self-similarity of the traffic • Each request is simulated at the packet level, including detailed TCP flow control • Compare with UDP (no flow control) as an example of an open system