
Quality of Service in Computer Networks


Presentation Transcript


  1. Quality of Service in Computer Networks Alex Shpiner, Mellanox Technologies High-performance communication, BGU, 2017

  2. What is Quality of Service (QoS)?
  • Network configuration that aims to provide optimal end-to-end performance for users, according to the class-of-service of the traffic:
    • High throughput
    • Low latency
    • Fairness
    • Minimal or no packet loss
    • Minimal or no jitter (variability of throughput or latency over time)
  • A specific class-of-service may prioritize one property over the others:
    • High throughput for storage
    • Low latency for control
    • Low jitter for real-time audio

  3. QoS Components
  • Flow control – eliminates packet loss caused by overflowing buffers.
  • Congestion control – prevents or reacts to congestion, reducing or controlling its effect on overall throughput in the network.
  • Service differentiation – applies differential handling to various traffic classes according to service priority.

  4. InfiniBand vs. RoCE (RDMA over Ethernet, or IB over Ethernet)
  • RDMA is natively implemented as part of InfiniBand
    • Requires the end-to-end network to support IB (IB switches)
  • Conventional TCP traffic runs over an Ethernet-based network
  • Consolidating them is desirable
  • Requirement: RDMA over Ethernet that runs over commodity network infrastructure
  • Solution: RoCE
  • RoCE version 2 packet format: Ethernet | IP | UDP | InfiniBand L4 headers | Payload – commodity Ethernet/IP/UDP headers carrying the InfiniBand-specific headers

  5. Network congestion
  [Figure: three flows, each arriving at 100% of the link capacity C, converge on a single output link of capacity C – an offered load of 300%.]

  6. What does perfect congestion control achieve?
  [Figure: each of the three flows is throttled to 33% of C, so the bottleneck link runs at 100% utilization without overload.]

  7. Lossy vs. Lossless Network
  • When a buffer overflows:
    • In a lossy network: the packet is dropped
    • In a lossless network: flow control prevents packets from being dropped (see next slides)
  • High-performance networks are mostly lossless
  • Lossy networks require an end-to-end transport that is able to detect and retransmit lost data – this takes computation resources and adds protocol overhead.
    • If packet drop is not negligible, significant bandwidth may be lost on retransmissions.
    • To avoid drops, large and costly buffers are used, which introduce latency as they fill up.

  8. Flow Control in Ethernet
  • Link-layer protocol (2nd layer in OSI)
  • Operates switch to neighboring switch/NIC, and NIC to switch.
  • When the buffer fills up, the receiver sends a pause frame to the sender.
  • When the buffer empties, the receiver sends an unpause frame to the sender.
  • Pausing granularity is per priority:
    • Called Priority Flow Control (PFC)
    • 8 priorities can be defined
  • A sketch of this pause/unpause logic follows below.
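
Below is a minimal Python sketch of the per-priority pause/unpause decision described on this slide. The XOFF/XON thresholds, the per-priority buffer model, and the send_frame callback are illustrative assumptions, not values taken from the slides or from the PFC standard.

```python
# Minimal sketch of PFC-style pause/unpause decisions at a receiving port.
# The XOFF/XON thresholds and the per-priority buffer model are illustrative.

XOFF_THRESHOLD = 80   # occupancy (KB) at/above which a PAUSE is sent
XON_THRESHOLD = 40    # occupancy (KB) at/below which an UNPAUSE is sent

class PfcReceiverPort:
    def __init__(self, num_priorities=8):
        self.occupancy = [0] * num_priorities   # per-priority buffer usage (KB)
        self.paused = [False] * num_priorities

    def on_enqueue(self, priority, size_kb, send_frame):
        self.occupancy[priority] += size_kb
        if not self.paused[priority] and self.occupancy[priority] >= XOFF_THRESHOLD:
            self.paused[priority] = True
            send_frame(priority, pause=True)     # pause only this priority

    def on_dequeue(self, priority, size_kb, send_frame):
        self.occupancy[priority] -= size_kb
        if self.paused[priority] and self.occupancy[priority] <= XON_THRESHOLD:
            self.paused[priority] = False
            send_frame(priority, pause=False)    # resume this priority

# Example: priority 3 fills up, gets paused, then drains and is unpaused.
def log_frame(priority, pause):
    print(f"priority {priority}: {'PAUSE' if pause else 'UNPAUSE'}")

port = PfcReceiverPort()
for _ in range(10):
    port.on_enqueue(3, 10, log_frame)   # PAUSE once occupancy reaches 80 KB
for _ in range(6):
    port.on_dequeue(3, 10, log_frame)   # UNPAUSE once it falls back to 40 KB
```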

  9. Flow Control in InfiniBand
  • InfiniBand defines a point-to-point credit-based flow-control scheme.
  • The receiving switch continuously informs its neighbor of the amount of free space in its buffer.
    • This contrasts with Ethernet flow control, where a pause frame is sent only when buffer occupancy crosses a threshold.
  • A packet is never sent unless there is room for it; this ensures that packet loss can only result from link-transmission errors.
  • A credit-based transmitter/receiver sketch follows below.
  [Figure: transmitter and receiver each hold per-VL buffers (VL0 … VL15); a MUX/De-MUX connects them over the physical link, with data packets sent on a VL and credit packets for that VL returned.]
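
The sketch below illustrates the credit-based scheme for a single virtual lane, under stated assumptions: the buffer size, the explicit credit-refresh call, and the class names are invented for illustration; real InfiniBand flow control advertises credits continuously and per VL.

```python
# Minimal sketch of credit-based (InfiniBand-style) link-level flow control
# for a single virtual lane. Credit units and buffer size are illustrative.

from collections import deque

class CreditReceiver:
    def __init__(self, buffer_slots=8):
        self.buffer = deque()
        self.free_slots = buffer_slots      # advertised as credits

    def advertise_credits(self):
        return self.free_slots              # "credit packet" sent to the transmitter

    def accept(self, packet):
        self.buffer.append(packet)
        self.free_slots -= 1

    def drain_one(self):
        if self.buffer:
            self.buffer.popleft()
            self.free_slots += 1

class CreditTransmitter:
    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.advertise_credits()

    def try_send(self, packet):
        # A packet is never sent unless the receiver has room for it.
        if self.credits == 0:
            return False
        self.receiver.accept(packet)
        self.credits -= 1
        return True

    def refresh_credits(self):
        self.credits = self.receiver.advertise_credits()

# Example: the transmitter stalls once the 8 buffer slots are consumed,
# and resumes after the receiver drains and credits are refreshed.
rx = CreditReceiver()
tx = CreditTransmitter(rx)
sent = sum(tx.try_send(f"pkt{i}") for i in range(12))
print("sent before stalling:", sent)             # 8
rx.drain_one(); tx.refresh_credits()
print("can send again:", tx.try_send("pkt12"))   # True
```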

  10. Why flow control is not enough: Congestion Spreading
  • PFC stops whole links (per priority), not specific flows.
  [Figure: a flow sharing the paused link is also paused, since the pause control does not distinguish between flows; it gets 33% of the link instead of 67%.]

  11. Congestion Spreading – another example
  • What is the throughput of the victim flow?
  [Figure: several transmitting hosts (A–F) send to the same receiver, creating congested flows through Switch 1 and Switch 2; another host transmits to a different receiver, and its flow crossing the same switches is the victim flow.]

  12. Congestion Spreading
  • Congestion might spread over the whole network.
  [Chart: throughput (Gbps) of hosts H1–H3 (sending to receiver R) and H' (sending to R'); each congested flow gets 40/3 = 13.3 Gbps, while the ideal throughput for H' would be 40 − 13.3 = 26.6 Gbps.]

  13. Credit Loop Deadlock
  • Do not transmit if there is not enough buffer to receive.
  • Forwarding may create a cyclic buffer dependency.
    • This is called a "credit loop".
  • If the buffer heads fill up with packets that stay on the loop, the loop deadlocks.
  [Figure: four flows f1–f4 routed across four switches form a cycle of buffer dependencies.]

  14. Solutions to credit loop deadlock
  • Routing rule constraints over known topologies
    • Prevent turns that might cause a credit loop
    • Not defined for a general topology
  • Emptying the switch buffer
    • Losslessness is not preserved
  • And more…
  • A cycle-detection sketch over the buffer-dependency graph follows below.
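
Checking whether a routing can deadlock amounts to looking for a cycle in the buffer (channel) dependency graph. The sketch below is a generic DFS cycle check; the dependency edges are made up in the spirit of the f1–f4 figure, not the exact topology from the slide.

```python
# Minimal sketch: detect a potential credit loop by looking for a cycle in
# the buffer (channel) dependency graph induced by the routing.

def has_credit_loop(dependencies):
    """dependencies: dict mapping a buffer to the buffers it waits on."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in dependencies}

    def dfs(node):
        color[node] = GRAY
        for nxt in dependencies.get(node, []):
            if color.get(nxt, WHITE) == GRAY:       # back edge -> cycle
                return True
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in dependencies)

# Each switch's input buffer waits on the next buffer along the route.
cyclic = {"sw1": ["sw2"], "sw2": ["sw3"], "sw3": ["sw4"], "sw4": ["sw1"]}
acyclic = {"sw1": ["sw2"], "sw2": ["sw3"], "sw3": ["sw4"], "sw4": []}
print(has_credit_loop(cyclic))    # True  -> this routing may deadlock
print(has_credit_loop(acyclic))   # False -> turn restrictions broke the cycle
```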

  15. Parking Lot Effect
  • Parking Lot Unfairness
  • In the chart below, the receiving sequence at Recv is A, B, A, C, A, B, A, D, … so A gets 1/2 of the bandwidth, B gets 1/4, and C and D get 1/8 each (see the simulation sketch after this slide).
  [Figure: per-link sequences along the chain – last hop: A, B, A, C, A, B, A, D, …; previous hop: B, C, B, D, …; before that: C, D, C, D, …; first hop: D, D, D, ….]
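
The sketch below reproduces the parking-lot shares by merging flows pairwise with fair (50/50) arbitration along a chain of switches. The four-host chain follows the slide's example; the tick-based arbitration model itself is an illustrative simplification.

```python
# Minimal sketch of the parking-lot effect: each switch in a chain merges its
# local host's flow with the aggregate arriving from upstream using fair
# (round-robin, 50/50) arbitration.

from collections import Counter

def parking_lot_shares(hosts, ticks=8000):
    """hosts are ordered from farthest (first hop) to closest to the receiver."""
    upstream = [hosts[0]] * ticks          # the farthest host's traffic
    for host in hosts[1:]:
        merged = []
        up = iter(upstream)
        # Fair arbitration: alternate one packet from upstream, one local.
        for tick in range(ticks):
            merged.append(next(up) if tick % 2 == 0 else host)
        upstream = merged
    return {h: round(c / ticks, 3) for h, c in Counter(upstream).items()}

print(parking_lot_shares(["D", "C", "B", "A"]))
# -> {'A': 0.5, 'B': 0.25, 'C': 0.125, 'D': 0.125}
```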

  16. Congestion Control
  • Congestion control prevents or reacts to congestion, reducing its effect on overall network performance.
  • Kicks in when arriving traffic exceeds the output link capacity at some point in the network.
    • A buffer is used to absorb the excess traffic.
  • Congestion control aims to reduce buffer usage while:
    • preserving bottleneck link utilization
    • preserving fairness
  • Works at end-to-end flow granularity:
    • Throttles the rate of specific flows, hence does not create victims.
    • Unlike link-level flow control, which stops all traffic on the link (per priority).

  17. Congestion Control Design Alternatives
  • How to identify congestion:
    • Packet drop – con: long queues, retransmissions
    • Delay – cons: backward delay, timestamping, unfair stable point
    • ECN (Explicit Congestion Notification) (next slides) – con: requires switch support
    • Acknowledgements (ACKs) – con: false notification in case of reordering
    • Timeout (used with ACKs)
  • Control parameter:
    • Rate (inter-packet delay) – cannot bound the number of packets in the network
    • Window (number of packets in flight) – Rate ≈ window [pkts] / round-trip-time [sec]; RTT-unfair (see the illustration below)
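
A tiny illustration of the window-to-rate relation and the RTT-unfairness it implies: two flows with the same window but different RTTs get very different rates. The RTT values and window size are illustrative.

```python
# Illustration of Rate ~= window [pkts] / RTT [sec] and the resulting
# RTT-unfairness of window-based control. Numbers are made up.

def rate_pkts_per_sec(window_pkts, rtt_sec):
    return window_pkts / rtt_sec

for name, rtt in [("short-RTT flow (100 us)", 0.0001), ("long-RTT flow (1 ms)", 0.001)]:
    print(name, rate_pkts_per_sec(window_pkts=64, rtt_sec=rtt), "pkts/sec")
# With the same 64-packet window, the 100 us flow sends 640,000 pkts/sec
# while the 1 ms flow sends only 64,000 pkts/sec.
```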

  18. TCP Basics (New Reno)
  • Window-based algorithm:
    • Keeps the number of un-ACKed packets below CWND
  • ACK-based algorithm:
    • TCP uses ACKs to notify the sender about successful packet arrival.
    • ACK X acknowledges arrival of packets 0…X−1 (cumulative ACK)
  • CWND fluctuates based on ACK arrivals:
    • Rate increase – upon ACK arrival: CWND += 1/CWND (in slow start: CWND += 1)
    • Rate decrease – 3 duplicate ACKs: CWND = CWND/2; timeout: CWND = 1 MSS
  • A sketch of these window updates follows below.
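
A minimal sketch of the New Reno window updates listed above (slow-start and congestion-avoidance increase, halving on 3 duplicate ACKs, reset on timeout). Byte accounting and fast-recovery details are omitted; the initial ssthresh is an arbitrary example value.

```python
# Minimal sketch of New Reno congestion-window updates, in segments.

MSS = 1  # express CWND in segments for simplicity

class NewRenoWindow:
    def __init__(self, ssthresh=64):
        self.cwnd = 1 * MSS
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1               # slow start: +1 per ACK
        else:
            self.cwnd += 1 / self.cwnd   # congestion avoidance: ~+1 per RTT

    def on_three_dup_acks(self):
        self.ssthresh = max(self.cwnd / 2, 2 * MSS)
        self.cwnd = self.cwnd / 2        # multiplicative decrease

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, 2 * MSS)
        self.cwnd = 1 * MSS              # restart from one segment

w = NewRenoWindow(ssthresh=8)
for _ in range(20):
    w.on_ack()
print(round(w.cwnd, 2))   # grows quickly to 8, then roughly +1 per window of ACKs
w.on_three_dup_acks()
print(round(w.cwnd, 2))   # halved
```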

  19. Explicit Congestion Notification (ECN)
  • Switch-based enhancement that is used by the end hosts.
  • Allows end-to-end congestion notification without dropping packets.
  • Supported by most advanced data center switches.
  • Uses two bits in the IP header (the ECN field of the former TOS/DiffServ byte):
    • 00 – not ECN-capable
    • 01/10 – ECN-capable
    • 11 – congestion encountered
  • Upon congestion, the switch changes 01/10 to 11.
  • ECN marking uses a probabilistic function based on queue length: a longer queue means a higher marking probability (a sketch follows below).
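
A minimal sketch of probabilistic ECN marking at a switch queue. The slide only states that a longer queue gives a higher marking probability; the linear RED-style curve and the K_MIN/K_MAX thresholds below are illustrative assumptions.

```python
# Minimal sketch of probabilistic ECN marking based on queue length.

import random

K_MIN = 20    # queue length (packets) below which nothing is marked
K_MAX = 80    # queue length at/above which every ECN-capable packet is marked

def marking_probability(queue_len):
    if queue_len <= K_MIN:
        return 0.0
    if queue_len >= K_MAX:
        return 1.0
    return (queue_len - K_MIN) / (K_MAX - K_MIN)   # linear in between

def maybe_mark(ecn_bits, queue_len):
    """ecn_bits: '00' non-capable, '01'/'10' capable, '11' congestion seen."""
    if ecn_bits in ("01", "10") and random.random() < marking_probability(queue_len):
        return "11"     # congestion encountered
    return ecn_bits

print(maybe_mark("10", queue_len=10))   # short queue: never marked
print(maybe_mark("10", queue_len=90))   # long queue: always marked
```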

  20. DCTCP – Data Center TCP
  • Suggested and implemented by MSFT
  • Uses ECN marking
  • Smooths the rate and queue usage:
    • Alpha is a moving average of the fraction of ECN-marked packets received in the last window: alpha ← (1 − g) · alpha + g · F
    • F = marked packets / total packets
    • g – moving-average parameter
  • The window is reduced in proportion to alpha: CWND ← CWND · (1 − alpha/2) (a sketch follows below)
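
A minimal sketch of the DCTCP adjustment described above: estimate the fraction F of ECN-marked ACKs per window, smooth it into alpha, and cut the window by alpha/2. The g = 1/16 default and the example numbers are illustrative.

```python
# Minimal sketch of the DCTCP window adjustment.

class DctcpWindow:
    def __init__(self, cwnd=100.0, g=1.0 / 16):
        self.cwnd = cwnd
        self.alpha = 0.0
        self.g = g

    def on_window_end(self, marked_acks, total_acks):
        F = marked_acks / total_acks if total_acks else 0.0
        # alpha <- (1 - g) * alpha + g * F
        self.alpha = (1 - self.g) * self.alpha + self.g * F
        if marked_acks:
            # cwnd <- cwnd * (1 - alpha / 2): gentle cut when few marks,
            # close to a Reno-style halving when almost everything is marked.
            self.cwnd *= (1 - self.alpha / 2)

w = DctcpWindow()
for _ in range(10):
    w.on_window_end(marked_acks=5, total_acks=100)   # 5% marks -> small cuts
print(round(w.alpha, 3), round(w.cwnd, 1))
```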

  21. RDMA over Ethernet (RoCEv2) Packet Format
  • The ECN field in the IP header is used to mark congestion (the same field used for TCP).
  • In a pure InfiniBand packet, the FECN bit in the BTH (Base Transport Header) is used to mark congestion.
  [Figure: RoCEv2 packet – commodity Ethernet/IP/UDP headers followed by the InfiniBand-specific headers and payload.]

  22. IB/RoCE Congestion Control Algorithm: Congestion Point
  • Congestion Point (switch): marks the ECN bits in the packet header based on queue length.
    • Standard functionality supported by all commodity switches (also used for TCP).
  [Figure: Sender NIC = Reaction Point (RP) → Switch = Congestion Point → Receiver NIC = Notification Point (NP); congested traffic leaves the switch ECN-marked.]

  23. IB/RoCE Congestion Control Algorithm: Notification Point
  • Notification Point: when an ECN-marked packet arrives, it sends a CNP (Congestion Notification Packet) back to the sender.
  • The CNP identifies a flow (QP).
  [Figure: same RP–Congestion Point–NP path as the previous slide, with the CNP traveling from the receiver NIC back to the sender NIC.]

  24. IB/RoCE Congestion Control Algorithm: Reaction Point
  • Reaction Point: throttles the sending rate based on the arrival of Congestion Notification Packets (CNPs).
  • A simplified rate-control sketch follows below.
  [Figure: same RP–Congestion Point–NP path as the previous slides.]
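
The slides do not give the exact rate-control law used at the reaction point, so the sketch below is only a simplified illustration: multiplicative decrease on CNP arrival and additive recovery while no CNPs are seen, with made-up constants (not the standard DCQCN parameters).

```python
# Simplified sketch of a reaction-point rate controller: cut the sending rate
# when a CNP arrives for a flow (QP) and recover gradually while no CNPs are
# seen. Constants are illustrative, not from any specification.

LINE_RATE_GBPS = 100.0

class ReactionPoint:
    def __init__(self):
        self.rate = LINE_RATE_GBPS

    def on_cnp(self):
        # Congestion notification for this flow: back off multiplicatively.
        self.rate *= 0.5

    def on_timer(self):
        # No CNP during the last interval: recover additively toward line rate.
        self.rate = min(LINE_RATE_GBPS, self.rate + 5.0)

rp = ReactionPoint()
rp.on_cnp(); rp.on_cnp()          # two CNPs -> 25 Gb/s
for _ in range(3):
    rp.on_timer()                 # quiet period -> recover to 40 Gb/s
print(rp.rate)
```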

  25. Service Differentiation
  • A specific class-of-service may prioritize one property over the others:
    • High throughput for storage
    • Low latency for control
    • Low jitter for real-time audio
  • Service differentiation applies differential handling to various traffic classes according to service priority.
  • In-packet classification:
    • Service Level (SL) field in the InfiniBand packet
  • Parallel queues:
    • In InfiniBand: Virtual Lanes (VLs)
    • Network devices implement SL-to-VL mapping
  • Differential service for every queue:
    • Weighted round-robin, strict service (high or low priority)
    • Bounds on buffer utilization

  26. Service Levels in InfiniBand
  • The SL is a field in the Local Route Header of the packet indicating its service class, enabling the implementation of differentiated services.
  • SL-to-VL mapping: every Service Level is assigned a VL.
  • While the VL used for a specific Service Level may differ along a packet's path (in different switches), the Service Level itself remains constant.
    • E.g., in one switch the packet can have the highest priority, while in another switch it has the lowest.
  • A small mapping sketch follows below.
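
A small sketch of SL-to-VL mapping: the packet's SL never changes, but each switch consults its own table to pick a local VL. Real InfiniBand SL2VL tables are configured per port pair; the single per-switch dictionaries below are a simplification with made-up entries.

```python
# Minimal sketch of SL-to-VL mapping along a path. Table contents are made up.

SL2VL_SWITCH_A = {0: 0, 1: 1, 2: 1, 3: 2}   # SL 3 -> high-priority VL 2 here
SL2VL_SWITCH_B = {0: 0, 1: 0, 2: 1, 3: 0}   # ...but -> low-priority VL 0 here

def output_vl(sl2vl_table, packet_sl):
    return sl2vl_table.get(packet_sl, 0)    # fall back to VL0 if unmapped

packet_sl = 3                               # set once by the sender, never changed
print("Switch A queues it on VL", output_vl(SL2VL_SWITCH_A, packet_sl))
print("Switch B queues it on VL", output_vl(SL2VL_SWITCH_B, packet_sl))
```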

  27. High and Low Priority VLs in InfiniBand
  • InfiniBand defines a two-level hierarchy of VL arbitration:
    • High priority vs. low priority
    • Weighted round-robin inside each priority
  • By assigning a packet to a service level and mapping that service level to a particular virtual lane, packets can be classified with either high or low priority.
  • High-priority traffic will preempt low-priority traffic.
  • To ensure forward progress of low-priority packets, a Limit of High Priority (LHP) is defined: the maximum number of high-priority packets that may be scheduled on high-priority lanes before a packet from a low-priority lane is selected.
  • Arbitration between individual virtual lanes of the same priority uses a weighted fair arbitration scheme:
    • Each virtual lane is scheduled in table order and assigned a weight indicating the number of bytes it is allowed to transmit during its turn.
  • An arbiter sketch follows below.
  [Figure: high-priority and low-priority lane groups feed a MUX; the LHP bounds how many high-priority packets are sent before a low-priority one.]
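
A minimal sketch of the two-level arbitration described above: round-robin inside each priority group (the real scheme weights each VL by a byte count per turn, which is omitted here) with a Limit of High Priority (LHP) that guarantees forward progress for low-priority lanes. The LHP value and queue contents are illustrative.

```python
# Minimal sketch of two-level VL arbitration with a Limit of High Priority.

from collections import deque

class VlArbiter:
    def __init__(self, high_vls, low_vls, lhp=4):
        self.high = deque(high_vls)   # each element is a deque of packets (one VL)
        self.low = deque(low_vls)
        self.lhp = lhp
        self.high_sent = 0            # high-priority packets since the last low one

    def _round_robin(self, group):
        for _ in range(len(group)):
            vl = group[0]
            group.rotate(-1)          # the next VL gets the following turn
            if vl:
                return vl.popleft()
        return None

    def next_packet(self):
        if self.high_sent < self.lhp:
            pkt = self._round_robin(self.high)
            if pkt is not None:
                self.high_sent += 1
                return pkt
        # Either the LHP was reached or no high-priority packet is waiting.
        self.high_sent = 0
        return self._round_robin(self.low) or self._round_robin(self.high)

arb = VlArbiter(high_vls=[deque(f"H{i}" for i in range(8))],
                low_vls=[deque(f"L{i}" for i in range(8))], lhp=4)
print([arb.next_packet() for _ in range(10)])
# -> ['H0', 'H1', 'H2', 'H3', 'L0', 'H4', 'H5', 'H6', 'H7', 'L1']
```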

  28. Thank You
