Distributed Queueing Gigabit Kits (June 2002)

Ken Wong, Jon Turner, and Prashanth Pappu
Washington University
kenw@arl.wustl.edu

Distributed Queueing
[Figure: six ports, each with an input (I) side and an output (O) side, a TI, routing, and a scheduler (Sched.), connected through the switch fabric with a control processor. Labels: periodic queue length reports; queue per output; scheduler paces each queue according to backlog share.]

What is Distributed Queueing?
• Performs like an output queueing system
  • I.e., maximizes output link utilization
  • Without the need for a switch with a speedup of N
• Goals of the DQ Algorithm
  • Avoid switch fabric congestion (and therefore cell loss)
  • Avoid output underflow (maximize output link utilization)
• Topics
  • The STRESS experiment
  • Discrete-event simulation results
  • SPC-only prototype measurements
  • Current work

Stress Test
[Figure: stress test configuration with K = 4 sources and R = 5 phases. Each source sends at 70 Mbps; L = 70 Mbps; S x L = 2 x 70 = 140 Mbps.]
• Can vary the number of inputs and outputs used, and the length of the “phases”

Stress Test Simulation - Backlog

Basic Ideas
• To avoid output underflow:
  • Input-side backlog B(i,j) indicates need for switch bandwidth
  • Output-side backlog B(j) indicates less need for switch bandwidth (back pressure)
• Apportion switch capacity based on relative input-side backlogs to avoid switch congestion and output underflow

Basic DQ Algorithm
• Goal: Avoid switch congestion and output queue underflow.
• Let hi(i,j) be input i’s share of input-side backlog to output j.
• Avoid switch congestion by sending from input i to output j at rate L·S·hi(i,j)
  • where L is the external link rate and S is the switch speedup
• Let lo(i,j) be input i’s share of total backlog for output j.
• Avoid underflow of the queue at output j by sending from input i to output j at rate L·lo(i,j)
  • This works if L·(lo(i,1)+···+lo(i,n)) ≤ L·S for all i
• Let wt(i,j) be the ratio of lo(i,j) to lo(i,1)+···+lo(i,n).
• Let rate(i,j) = L·S·min( wt(i,j), hi(i,j) ).
• Note: The algorithm avoids congestion and output underflow for large enough S.
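The rate computation above can be sketched in a few lines of Python. This is only an illustration, not the SPC code: the function name `dq_rates` and its parameters are hypothetical, and the share formulas are an assumption read off the verbal definitions (hi as B(i,j) over the total input-side backlog for output j, lo as B(i,j) over the input-side backlog plus the output backlog B(j)). Shares are taken as 0 when the corresponding total backlog is 0, which is also an assumption.

```python
def dq_rates(b_in, b_out, link_rate, speedup):
    """Sketch of rate(i,j) = L*S*min(wt(i,j), hi(i,j)).

    b_in[i][j]: backlog at input i destined for output j (B(i,j))
    b_out[j]:   backlog already queued at output j (B(j))
    """
    n_in, n_out = len(b_in), len(b_out)
    # Total input-side backlog destined for each output j.
    in_total = [sum(b_in[k][j] for k in range(n_in)) for j in range(n_out)]
    rates = []
    for i in range(n_in):
        # Assumed definitions of the shares; zero when there is no backlog.
        hi = [b_in[i][j] / in_total[j] if in_total[j] > 0 else 0.0
              for j in range(n_out)]
        lo = [b_in[i][j] / (in_total[j] + b_out[j])
              if in_total[j] + b_out[j] > 0 else 0.0
              for j in range(n_out)]
        lo_sum = sum(lo)
        row = []
        for j in range(n_out):
            wt = lo[j] / lo_sum if lo_sum > 0 else 0.0
            row.append(link_rate * speedup * min(wt, hi[j]))
        rates.append(row)
    return rates
```

For example, with two inputs, L = 70 Mbps and S = 2, `dq_rates([[10, 0], [10, 10]], [0, 20], 70, 2)` gives each input 70 Mbps toward output 0, so the total sent to output 0 equals L·S = 140 Mbps.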

Stress Test Simulation - Backlog

Stress Test Simulation - Min Rates

Stress Test Simulation - Actual Rates

SPC Code
• DQ cell (40 of 48 bytes for N=8 ports)
  • Port number
  • Output backlog B(i)
  • N input backlogs B(i,j)
• SPC kernel at each port (every D = 500 usec)
  • Multicast 1 DQ cell containing its backlogs to all ports using vpi/vci 0/61
  • Read incoming DQ cells from all ports
  • Call dq_set_pace()
• dq_set_pace()
  • Compute rate(i,j) for all j
  • Set APIC pacing rates for each j
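A possible layout for the DQ cell can be sketched as follows. The slide gives only the field list and the 40-byte total for N = 8; the field widths (unsigned 32-bit each, so 4 + 4 + 8 x 4 = 40 bytes), the big-endian byte order, the zero padding to 48 bytes, and the names `pack_dq_cell` and `unpack_dq_cell` are all assumptions made for illustration, not the actual SPC kernel format.

```python
import struct

N_PORTS = 8
CELL_PAYLOAD_BYTES = 48
# Assumed layout: port number, output backlog, then N input backlogs,
# each an unsigned 32-bit big-endian integer (10 x 4 = 40 bytes for N = 8).
DQ_FORMAT = f">II{N_PORTS}I"
DQ_BYTES = struct.calcsize(DQ_FORMAT)


def pack_dq_cell(port, out_backlog, in_backlogs):
    """Build a 48-byte payload carrying one port's backlog report."""
    if len(in_backlogs) != N_PORTS:
        raise ValueError(f"expected {N_PORTS} input backlogs")
    body = struct.pack(DQ_FORMAT, port, out_backlog, *in_backlogs)
    # Pad the unused tail of the cell payload with zeros (assumption).
    return body.ljust(CELL_PAYLOAD_BYTES, b"\x00")


def unpack_dq_cell(payload):
    """Recover (port, output backlog, list of input backlogs) from a payload."""
    fields = struct.unpack(DQ_FORMAT, payload[:DQ_BYTES])
    return fields[0], fields[1], list(fields[2:])
```

Under these assumptions each port would build one such payload every D = 500 usec and, after reading the cells from the other ports, would have the B(i,j) and B(j) values needed to compute its pacing rates.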

Stress Test Measurement Results

Improving Basic Algorithm
• Allocated rates will oscillate when backlogs are near 0
  • Use artificial minimum backlog in hi(i,j) and artificial minimum rate in lo(i,j)
• Does not always make full use of available input bandwidth
  • Does not reallocate bandwidth that is lost when queues are “output limited”
  • Extend algorithm to reallocate “excess” rate
  • Allocate rates in decreasing order of lo(i,j)/hi(i,j)
  • I.e., rate(i,j) can “donate” excess rate to remaining rates

Current/Future Work
• “Design and Evaluation of a High-Performance Dynamically Extensible Router,” DANCE Conference, May 2002.
• FCFS property across input streams
• SPC
• Output-limited rate redistribution algorithm
• Dynamic configuration of DQ parameters and algorithm
• INFOCOM 2003 paper (Pappu, Turner, Wong)
• Handles unequal link speed case
• FPGA implementation
• Extension to fair queueing
• Extension to reserved bandwidth flows

Summary
• Fluid model simulator (C++)
• Discrete-event simulator (C++)
• Study alternative DQ algorithms
• Understand operating characteristics in controlled environment
• Pre-testing of SPC (integer) algorithm
• SPC-only prototype
• Basically works
• Great monitoring tools (Java GUI, DQ-cell capture, other utilities)
• Great traffic generator (AAL5Generator)
• FPX version coming
• Lots of fun

Backlog Shares
• Let hi(i,j) be input i’s share of input-side backlog to output j:
    hi(i,j) = B(i,j) / (B(1,j)+···+B(n,j))
• Let lo(i,j) be input i’s share of total backlog for output j:
    lo(i,j) = B(i,j) / (B(j) + B(1,j)+···+B(n,j))

Allocated Rates r(i,j)
• r(i,j) = L·S·min( wt(i,j), hi(i,j) )
• where wt(i,j) = lo(i,j) / (lo(i,1)+···+lo(i,n))

Artificial Backlogs & Rates
• Let hi(i,j) be input i’s share of input-side backlog to output j:
• Let lo(i,j) be input i’s share of total backlog for output j:

Rate Redistribution Algorithm
• Revised rate allocation at input i:
    R = S·L
    repeat n times
        Let j be the unassigned queue with the largest ratio lo(i,j)/hi(i,j)
        Let wt(i,j) = lo(i,j) / (sum of lo(i,q) for unassigned queues q)
        rate(i,j) = min{ R·wt(i,j), S·L·hi(i,j) }
        R = R - rate(i,j)
        Mark queue j as assigned
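The revised allocation at a single input can be sketched in Python as below. The function name `redistribute_rates` is hypothetical, the shares hi(i,j) and lo(i,j) are passed in as precomputed lists, and two details are assumptions not stated on the slide: a queue with hi(i,j) = 0 is given ratio 0, and ties in the ratio are broken by the lowest queue index.

```python
def redistribute_rates(hi, lo, link_rate, speedup):
    """Sketch of the revised rate allocation at one input i.

    hi[j], lo[j]: input i's shares hi(i,j) and lo(i,j) for each output j.
    Returns the list of rate(i,j) values.
    """
    n = len(hi)
    remaining = speedup * link_rate  # R = S*L
    rates = [0.0] * n
    unassigned = list(range(n))

    def ratio(j):
        # Assumption: a queue with no input-side share sorts last.
        return lo[j] / hi[j] if hi[j] > 0 else 0.0

    while unassigned:
        # max() returns the first maximal element, so ties go to the lowest index.
        j = max(unassigned, key=ratio)
        lo_sum = sum(lo[q] for q in unassigned)
        wt = lo[j] / lo_sum if lo_sum > 0 else 0.0
        rates[j] = min(remaining * wt, speedup * link_rate * hi[j])
        remaining -= rates[j]  # unused share is left for later queues
        unassigned.remove(j)
    return rates
```

With hi = [0.125, 1.0], lo = [0.125, 0.125], L = 70 and S = 2, queue 0 is capped at S·L·hi = 17.5 Mbps and the remaining 122.5 Mbps goes to queue 1, whereas the basic formula L·S·min(wt, hi) would give queue 1 only 70 Mbps.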
