T-110.5110 Computer Networks II Transport Issues 29.9.2008. Prof. Sasu Tarkoma. Contents. Transport Layer Overview Congestion Control TCP, TCP improvements, TCP and wireless Stream Control Transmission Protocol (SCTP) Datagram Congestion Control Protocol (DCCP) TLS and DTLS.
T-110.5110 Computer Networks II Transport Issues 29.9.2008 Prof. Sasu Tarkoma
Contents • Transport Layer Overview • Congestion Control • TCP, TCP improvements, TCP and wireless • Stream Control Transmission Protocol (SCTP) • Datagram Congestion Control Protocol (DCCP) • TLS and DTLS
Transport Layer Overview • TCP congestion control principles introduced in late 1980s • Not part of the original transport layer functionality • Important design factor in today’s Internet protocol development • Important issues • Preventing congestion collapse • Fairness
TCP and UDP • Transmission Control Protocol • Connection oriented • RFC 793 • User Datagram Protocol (UDP) • Connectionless • RFC 768
Motivation for Congestion Control • UDP used instead of TCP by applications that prefer timeliness over reliability • UDP does not have congestion control • A problem with long-lived flows and traffic intensive flows (streaming video, audio, internet telephony) • Greater use increases risk of congestion collapse • Congestion control refers to techniques and mechanisms that either prevent congestion before it happens or remove it after it has happened • Open-loop congestion control (prevention) • Closed-loop congestion control (removal)
Congestion Prevention • Transmission rate must be reduced when congestion is detected • Responsibility of transport layer, i.e., the sending end host • Packet loss is assumed to be congestion signal • No deployed explicit congestion notification scheme • At most one congestion action / round-trip time • Burst of packet losses can be indication of same congestion situation
Fairness • Transport implementations must be fair to other flows • Transmission rate should be roughly similar to that of TCP • Components of TCP-friendly congestion control • Slow-start • Additive Increase, Multiplicative Decrease (AIMD) • Retransmission timers relative to round-trip time
Solutions • Implement congestion control below UDP: too low • Above UDP: implement congestion control at application level • Reinventing the wheel each time • Complex, might not be done correctly • New protocol more interoperable than a user-level library • In transport layer: modification of TCP, UDP, RTP, SCTP • More complex protocols • Not general enough • Introduces a fundamental change • Current trend: new transport protocols, namely DCCP and SCTP
TCP • Reliable • Cumulative acknowledgements • Fast retransmit / fast recovery • Reno [RFC 2581], NewReno [RFC 3782] • Retransmission timeouts [RFC 2988] • Stream-oriented • no concept of datagram boundaries • ideal for transferring files • transferring series of structured messages more difficult
TCP Services • Reliable communication between pairs of processes • Across variety of reliable and unreliable networks and internets • Two labeling facilities • Data stream push • TCP user can require transmission of all data up to push flag • Receiver will deliver in same manner • Avoids waiting for full buffers • Urgent data signal • Indicates urgent data is upcoming in stream • User decides how to handle it
TCP Header Source: William Stallings, Data and Computer Communications, Chapter 17.
TCP Mechanisms • Connection establishment • Between ports • Three way handshake • Data transfer service • Stream of octets • Octets numbered modulo 2^32 • Flow control by credit allocation of number of octets • Data buffered at sender and receiver • Connection termination • Graceful close • Transport entity sets FIN flag on last segment sent • Abrupt termination by ABORT primitive
Congestion Control • RFC 1122, Requirements for Internet hosts • Retransmission timer management • Estimate round trip delay by observing pattern of delay • Set time to value somewhat greater than estimate • Simple average • Exponential average • RTT Variance Estimation (Jacobson’s algorithm)
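The exponential average and RTT variance estimation listed above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the class name `RttEstimator` is hypothetical, and the gains 1/8 and 1/4 and the variance multiplier 4 are assumed here as the values commonly used with Jacobson's algorithm.

```python
class RttEstimator:
    """Smoothed RTT plus variance, in the style of Jacobson's algorithm."""

    def __init__(self, alpha=0.125, beta=0.25, k=4.0):
        self.alpha = alpha    # gain for the smoothed RTT (assumed 1/8)
        self.beta = beta      # gain for the RTT variance (assumed 1/4)
        self.k = k            # variance multiplier in the RTO (assumed 4)
        self.srtt = None      # smoothed round-trip time
        self.rttvar = None    # smoothed mean deviation of the RTT

    def update(self, sample):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initialises both estimates.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the deviation first, using the old smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.rto()

    def rto(self):
        """Timer value: somewhat greater than the estimate, scaled by variance."""
        return self.srtt + self.k * self.rttvar
```

With steady samples the variance term shrinks and the RTO approaches the smoothed RTT; a sudden delay spike widens it again.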
Exponential RTO Backoff • Since timeout is probably due to congestion (dropped packet or long round trip), constant RTO is not good idea • RTO increased each time a segment is re-transmitted • RTO = q*RTO • Commonly q=2 • Binary exponential backoff
Retransmission Mechanism • TCP receiver acknowledges next sequence number it expects to receive • If receiver gets packet out of order it acknowledges the same sequence number as earlier • When sender receives 3 duplicate acknowledgements it considers the first unacknowledged segment lost • Congestion response: reduce the congestion window by half • Retransmit the first unacknowledged segment • If no acknowledgements arrive for time RTO, sender retransmits the first unacknowledged segment • Reset window to one segment
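The two loss signals on this slide (three duplicate ACKs and an RTO expiry) can be sketched as a small sender-side detector. The class and method names are hypothetical, and the window is counted in whole segments for simplicity.

```python
class LossDetector:
    """Tracks duplicate ACKs and applies the congestion responses from the slide."""

    DUP_THRESHOLD = 3  # duplicate ACKs that trigger a fast retransmit

    def __init__(self, cwnd=10):
        self.cwnd = cwnd        # congestion window, in segments
        self.last_ack = None    # highest cumulative ACK seen so far
        self.dup_acks = 0       # duplicates of last_ack seen in a row

    def on_ack(self, ack_no):
        """Return 'fast_retransmit' when the third duplicate ACK arrives."""
        if ack_no == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_THRESHOLD:
                # First unacknowledged segment is considered lost:
                # halve the window and retransmit that segment.
                self.cwnd = max(self.cwnd // 2, 1)
                return "fast_retransmit"
            return None
        # New data acknowledged: start counting duplicates afresh.
        self.last_ack = ack_no
        self.dup_acks = 0
        return None

    def on_rto(self):
        """No ACK for a full RTO: retransmit and reset window to one segment."""
        self.cwnd = 1
        return "timeout_retransmit"
```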
Karn’s Algorithm • If a segment is re-transmitted, the ACK arriving may be: • For the first copy of the segment • RTT longer than expected • For second copy • No way to tell • Do not measure RTT for re-transmitted segments • Calculate backoff when re-transmission occurs • Use backoff RTO until ACK arrives for segment that has not been re-transmitted
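Karn's algorithm combined with binary exponential backoff can be sketched as below. The names `RetransmitTimer`, `base_rto` and `max_rto` are hypothetical, and resetting to a fixed `base_rto` stands in for recomputing the RTO from the RTT estimator, which is an assumption made to keep the example short.

```python
class RetransmitTimer:
    """Backs off the RTO on timeouts and discards ambiguous RTT samples."""

    def __init__(self, rto=1.0, q=2.0, max_rto=60.0):
        self.base_rto = rto     # RTO derived from the current RTT estimate
        self.rto = rto          # RTO currently in force
        self.q = q              # backoff factor; q=2 gives binary backoff
        self.max_rto = max_rto  # hypothetical upper bound on the timer

    def on_timeout(self):
        """A segment was retransmitted: RTO = q * RTO."""
        self.rto = min(self.rto * self.q, self.max_rto)
        return self.rto

    def on_ack(self, was_retransmitted, rtt_sample=None):
        """Return an RTT sample usable by the estimator, or None."""
        if was_retransmitted:
            # Cannot tell which copy was acknowledged: take no
            # measurement and keep the backed-off RTO.
            return None
        # ACK for a segment sent only once: leave backoff, accept the sample.
        self.rto = self.base_rto
        return rtt_sample
```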
Window Management • Slow start • awnd = MIN[credit, cwnd] • Start connection with cwnd=1 • Increment cwnd at each ACK, to some max • Dynamic window sizing on congestion • When a timeout occurs • Set slow start threshold to half current congestion window • ssthresh=cwnd/2 • Set cwnd = 1 and slow start until cwnd=ssthresh • Increasing cwnd by 1 for every ACK • For cwnd >=ssthresh, increase cwnd by 1 for each RTT
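The window rules above can be traced round by round with a short simulation. This is a sketch under simplifying assumptions: one step is one RTT, every segment in the window is acknowledged (so +1 per ACK doubles cwnd per round in slow start), the receiver credit is ignored, and the floor of 2 on ssthresh is an added assumption. The function name `cwnd_trace` is hypothetical.

```python
def cwnd_trace(rounds, timeout_rounds=(), initial_ssthresh=16):
    """Return cwnd (in segments) at the start of each round-trip time."""
    cwnd, ssthresh = 1, initial_ssthresh
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in timeout_rounds:
            # Timeout: ssthresh = cwnd / 2, then restart slow start from 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: +1 per ACK, i.e. doubling per RTT, capped at ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: +1 segment per RTT.
            cwnd += 1
    return trace


print(cwnd_trace(10, timeout_rounds={6}, initial_ssthresh=8))
```

The printed trace shows exponential growth up to the threshold, linear growth after it, and the collapse to one segment at the timeout.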
Congestion Example Source: http://dpnm.postech.ac.kr/itec522/lecture/Chapter12-3.ppt
Congestion avoidance Source: http://dpnm.postech.ac.kr/itec522/lecture/Chapter12-3.ppt
TCP timers Source: http://dpnm.postech.ac.kr/itec522/lecture/Chapter12-3.ppt
TCP Congestion Summary Source: http://dpnm.postech.ac.kr/itec522/lecture/Chapter12-3.ppt
TCP Summary and Improvements • Concepts: Congestion window, round-trip time, retransmission timeout, duplicate acknowledgement (triggered by out of order segment) • Congestion control • Packet loss as a signal, reduce rate • Fairness • Transport implementations must be fair to other flows • Retransmission mechanism • Selective acknowledgements (SACK), RFC 2018 • Additional information about “holes” in sequence number space • Limited transmit & early retransmit, timestamps
TCP Problems • Minimal information from cumulative acknowledgements • Problems in environments with frequent packet losses (wireless) • Small window and packet retransmissions • May prevent fast retransmit from working • Retransmission ambiguity -- is ACK for original or retransmit? • Hinders the round-trip time measurement • Unnecessary retransmissions • Unnecessary use of bandwidth (sometimes expensive in wireless)
SACK • Additional information about “holes” in sequence number space • TCP option that reports discontinuous blocks of received data • Sender gets better information about which segments are lost • Allows more efficient retransmissions • Without SACK sender can retransmit only one segment in round-trip time • With SACK more retransmissions can be made in a round-trip time • Allows more efficient tracking of number of outstanding segments • SACK option specified in RFC 2018 • SACK-based retransmission algorithm specified in RFC 3517
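How SACK blocks expose "holes" can be sketched with a small helper. The function name and the half-open `(left, right)` byte-range representation are assumptions for illustration; they mirror the left-edge/right-edge idea of the option but not its wire encoding.

```python
def missing_ranges(cum_ack, sack_blocks):
    """Return the [start, end) ranges the sender can infer are missing.

    cum_ack     -- next sequence number the receiver expects (cumulative ACK)
    sack_blocks -- iterable of (left, right) ranges received out of order
    """
    holes = []
    expected = cum_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            # Nothing reported between 'expected' and this block: a hole.
            holes.append((expected, left))
        expected = max(expected, right)
    return holes
```

With only the cumulative ACK the sender learns about the first hole alone; with the blocks it can retransmit several missing ranges within one round-trip time.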
Timestamps • Specified in RFC 1323 • TCP option for sender to include timestamp in every packet • TCP receiver echoes the timestamp back to sender • Retransmissions have different timestamp than original • Allows round-trip time measurement for retransmitted segments • Not allowed without timestamps • Allows detection of spurious retransmissions [Ludwig00] • Allows protection against wrapped sequence numbers
Queue Management • Simple router implementation drops packet when queue is full • Lock-out: Sometimes few flows get to dominate most of queue space • Queue delay: Long packet queues increase transmission delays • Active Queue Management marks packets before queue is full • Random Early Detection (RED) [Floyd93] • Mark a packet at probability P when queue length is more than L • Marks are distributed more evenly between flows
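The RED marking decision can be sketched as below. The slide describes a single threshold L and probability P; this example assumes the usual refinement in which the probability ramps linearly between two thresholds. The names `min_th`, `max_th` and `max_p` and their default values are illustrative assumptions, and the averaging of the queue length is left out.

```python
import random


def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Marking probability for a given (averaged) queue length."""
    if avg_queue < min_th:
        return 0.0          # short queue: never mark
    if avg_queue >= max_th:
        return 1.0          # long queue: mark every packet
    # In between, probability grows linearly from 0 up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_mark(avg_queue, min_th=5, max_th=15, max_p=0.1, rng=random.random):
    """Randomly decide whether to mark (or drop) the arriving packet."""
    return rng() < red_mark_probability(avg_queue, min_th, max_th, max_p)
```

Because each arriving packet is marked independently, flows that send more packets receive proportionally more marks, which is how the marks end up spread more evenly between flows.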
Explicit Congestion Notification • Sender marks a bit in IP header if transport is ECN capable • Routers to indicate congestion with a congestion bit in IP header • Used with Active Queue Management • Reduces the number of packet losses • Transport layer receiver echoes congestion notification to sender • In transport header • When receiving notification, sender reduces its transmission rate • Implemented in many end-hosts, but not too many routers • Problem: Some devices in network drop IP packets with ECN bits
TCP Evolution • 1974 TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm • 1975 Three-way handshake, Raymond Tomlinson, in SIGCOMM 75 • 1982 TCP & IP, RFC 793 & 791 • 1983 BSD Unix 4.2 supports TCP/IP • 1984 Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse • 1986 Congestion collapse observed • 1987 Karn’s algorithm to better estimate round-trip time • 1988 Van Jacobson’s algorithms: congestion avoidance and congestion control (most implemented in BSD Tahoe) • 1993 TCP Vegas (Brakmo et al.), delay-based congestion avoidance • 1994 ECN (Floyd), Explicit Congestion Notification • 1996 SACK TCP (Floyd et al.), Selective Acknowledgement
SYN Cookies • Client • sends SYN packet with its initial sequence number to server • waits for SYN-ACK from server acknowledging that sequence number • Server • responds w/ SYN-ACK packet w/ initial SYN-cookie sequence number • Sequence number is cryptographically generated value based on client address, port, and time • No buffer is allocated at this point • Client • sends ACK to server w/ matching sequence number • Server • If ACK is to an unopened socket, server validates returned sequence number as SYN-cookie • If value is reasonable, a buffer is allocated and socket is opened • [Figure: SYN cookie message sequence: SYN; SYN-ACK with seq-number as SYN-cookie (no buffer allocated); ACK, optionally with data; TCP buffer allocated]
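The stateless validation step can be sketched with a keyed hash. This is a simplified illustration, not the cookie layout of any real TCP stack: `SECRET`, `make_cookie`, `cookie_is_valid` and the coarse `time_slot` counter are hypothetical, and real implementations also pack information such as the MSS into the cookie.

```python
import hashlib
import hmac

SECRET = b"server-local-secret"  # hypothetical key; would be random in practice


def make_cookie(client_ip, client_port, time_slot, secret=SECRET):
    """Derive a 32-bit initial sequence number from address, port and time."""
    msg = f"{client_ip}:{client_port}:{time_slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def cookie_is_valid(ack_no, client_ip, client_port, now_slot,
                    secret=SECRET, max_age=1):
    """Check the client's ACK against recomputed cookies; no state is stored."""
    returned = (ack_no - 1) % 2**32  # the ACK acknowledges cookie + 1
    return any(
        returned == make_cookie(client_ip, client_port, now_slot - age, secret)
        for age in range(max_age + 1)
    )
```

The server keeps nothing between sending the SYN-ACK and receiving the ACK; everything needed to accept the connection is recomputed from the returned number.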
SCTP • Stream Control Transmission Protocol (SCTP) • Specified in RFC 2960 • Additional features to TCP • Preservation of message boundaries • Support for multiple streams • Support for multi-homing • Packets consist of chunks: INIT, SACK, HEARTBEAT, DATA, ABORT, SHUTDOWN, ERROR, and AUTH • Partial reliability • Retransmissions until abort • Extended Socket API (bind(), context data with sendmsg()) • Suitable for signalling traffic • Challenges with middleboxes
Motivation • TCP, UDP do not satisfy all application needs • SCTP evolved from work on IP telephony signaling • Proposed IETF standard (RFC 2960) • Like TCP, it provides reliable, full-duplex connections • Unlike TCP and UDP, it offers new delivery options that are particularly desirable for telephony signaling and multimedia applications • TCP + features • Congestion control similar; some optional mechanisms mandatory • Two basic types of enhancements: • performance • robustness
Comparison
Services/Features                        SCTP  TCP  UDP
Full-duplex data transmission            yes   yes  yes
Connection-oriented                      yes   yes  no
Reliable data transfer                   yes   yes  no
Unreliable data transfer                 yes   no   yes
Partially reliable data transfer         yes   no   no
Ordered data delivery                    yes   yes  no
Unordered data delivery                  yes   no   yes
Flow and Congestion Control              yes   yes  no
ECN support                              yes   yes  no
Selective acks                           yes   yes  no
Preservation of message boundaries       yes   no   yes
Application data fragmentation           yes   yes  no
Multistreaming                           yes   no   no
Multihoming                              yes   no   no
Protection against SYN flooding attack   yes   no   n/a
Half-closed connections                  no    yes  n/a
Packet format • Unlike TCP, SCTP provides message-oriented data delivery service • key enabler for performance enhancements • Common header; three basic functions: • Source and destination ports together with the IP addresses • Verification tag • Checksum: CRC-32c instead of Adler-32 • followed by one or more chunks • chunk header that identifies length, type, and any special flags • concatenated building blocks containing either control or data information • control chunks transfer information needed for association (connection) functionality and data chunks carry application layer data • Current spec: 14 different Control Chunks for association establishment, termination, ACK, destination failure recovery, ECN, and error reporting • Packet can contain several different chunk types
Performance • Decoupling of reliable and ordered delivery • Unordered delivery: eliminate head-of-line blocking delay • [Figure: TCP receiver buffer holds Chunks 2, 3 and 4 while the application waits for Chunk 1] • Application Level Framing • Support for multiple data streams (per-stream ordered delivery) • Stream sequence number (SSN) preserves order within streams • no order preserved between streams • per-stream flow control, per-association congestion control
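Per-stream ordered delivery with stream sequence numbers can be sketched as follows. The class `StreamReassembler` is hypothetical and assumes SSNs start at 0 on every stream; it shows only the ordering logic, not SCTP's chunk handling.

```python
from collections import defaultdict


class StreamReassembler:
    """Delivers messages in SSN order within a stream, independently per stream."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # next SSN expected on each stream
        self.pending = defaultdict(dict)   # stream id -> {ssn: payload}

    def receive(self, stream_id, ssn, payload):
        """Buffer one message and return those that are now deliverable."""
        buf = self.pending[stream_id]
        buf[ssn] = payload
        delivered = []
        # Hand over every consecutive message starting at the expected SSN.
        while self.next_ssn[stream_id] in buf:
            delivered.append((stream_id, buf.pop(self.next_ssn[stream_id])))
            self.next_ssn[stream_id] += 1
        return delivered


r = StreamReassembler()
print(r.receive(1, 1, "b"))   # stream 1 still waits for SSN 0
print(r.receive(2, 0, "x"))   # stream 2 is not blocked by stream 1
print(r.receive(1, 0, "a"))   # gap filled: both stream 1 messages delivered
```

A missing message delays only its own stream, which is the head-of-line blocking that a single ordered TCP byte stream cannot avoid.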
Multiple Data Streams • Application may use multiple logical data streams • e.g. pictures in a web browser • Common solution: multiple TCP connections • separate flow / congestion control, overhead (connection setup/teardown, ..) • [Figure: chunks 1 to 4 of App stream 1 and App stream 2 sent from TCP sender to TCP receiver; App 1 waits]
Multihoming • TCP connection is equivalent to SCTP association • TCP connection: 2 IP addresses, 2 port numbers • SCTP association: 2 sets of IP addresses, 2 port numbers • Goal: robustness • automatically switch to an alternate path upon failure • eliminates effect of long routing reconvergence time • TCP: no guarantee for “keepalive” messages when connection idle • SCTP monitors each destination's reachability via ACKs of data chunks and heartbeat chunks • SCTP uses multihoming for redundancy, not for load balancing
Association phases • Association establishment: 4-way handshake • Host A sends INIT chunk to Host B • Host B returns INIT-ACK containing a cookie • information that only Host B can verify • No memory is allocated at this point (prevents DoS) • Host A replies with COOKIE-ECHO chunk; may contain A's first data • Host B checks validity of cookie and replies with COOKIE-ACK; association is established • Data transfer • SCTP assigns each chunk a unique Transmission Sequence Number (TSN) • SCTP peers exchange starting TSN values during association establishment phase • Message oriented data delivery; fragmented if larger than destination path MTU • Reliability through acks, retransmissions, and end-to-end checksum • Association shutdown: 3-way handshake • SHUTDOWN → SHUTDOWN-ACK → SHUTDOWN-COMPLETE • Does not allow half-closed connections
Motivation • Some apps want unreliable, timely delivery • For example: VoIP • UDP: no congestion control • Unresponsive long-lived applications • endanger others (congestion collapse) • may hinder themselves (queuing delay, loss, ..) • Implementing congestion control is difficult • may require precise timers; should be placed in kernel
DCCP • Datagram Congestion Control Protocol (DCCP) • Unreliable datagram-oriented protocol (RFC 4340) • UDP with congestion control • Connection-oriented, requires connection state machine • Congestion control requires ack mechanism and sequence numbers • Negotiable features and options • Checksums, congestion control parameters • Some features: partial checksums, service codes • Suitable for long-lived non-reliable flows • Challenges with middleboxes
DCCP Requirements • DCCP was designed for time-sensitive applications • Application requirements: • Choice of congestion control mechanism: TFRC vs. TCP-like • Buffering control: do not deliver old data • Low per-packet overhead • Additional features • Explicit Congestion Notification (ECN): mark congested packets • NAT and firewall support: TCP-style explicit connection setup and teardown
DCCP Requirements • Well-known features from TCP and UDP: • Port numbers, checksums, sequence numbers (with difficulty), acks (congestion and ECN info), piggybacked acks • Three-way handshake to set up, two-way with wait to tear down • New features: • Negotiate congestion control mechanism and parameters on setup • Two half-connections (A → B, B → A)
Half connections • Based on observation that traffic is typically asymmetric • It follows that separation is useful • Different routes implies different congestion issues • Each half connection has own congestion control mechanism and parameters • Better than two one-way connections • Works better with firewalls and NAT • Can piggyback acks with data
DCCP Feature Selection • Reliable feature selection: • A: change(f, α) • B: confirm(f, α) / prefer(f, β) • [A: confirm(f, β) ] • Selection for both half-connections done in parallel at startup • Generic, extensible
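The change / confirm / prefer exchange on this slide can be sketched as a message trace. The function names, the feature name `"ccid"` and the value strings are hypothetical, and the sketch assumes A always accepts B's preference; it follows the slide's notation rather than the option encoding of the DCCP specification.

```python
def respond(feature, proposed, acceptable, preferred):
    """B's answer to A's change(f, proposed)."""
    if proposed in acceptable:
        return ("confirm", feature, proposed)
    return ("prefer", feature, preferred)


def negotiate(feature, proposed, acceptable, preferred):
    """Return the message trace and the value both sides end up with."""
    messages = [("A", ("change", feature, proposed))]
    reply = respond(feature, proposed, acceptable, preferred)
    messages.append(("B", reply))
    if reply[0] == "prefer":
        # Optional third step: A confirms B's preferred value.
        messages.append(("A", ("confirm", feature, reply[2])))
    return messages, reply[2]


trace, value = negotiate("ccid", "tfrc", {"tcp-like"}, "tcp-like")
for sender, message in trace:
    print(sender, message)
```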
Issues with Acknowledgements (ACKs) • Acks must be at least partially reliable • TCP-style cumulative acks won’t work, so must ack everything (ack vector) • But ack state at receiver may grow without bound! • So sender occasionally acks the receiver’s acks • Receiver can throw away state for that ack • Acks take up sequence number space • Useful: can be used to detect reverse-path congestion
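Why the sender's "ack of acks" bounds the receiver's state can be sketched as below. The class `AckVectorReceiver` is hypothetical and keeps a plain set of sequence numbers; a real ack vector is a compact encoding with per-packet states, which this example does not attempt to reproduce.

```python
class AckVectorReceiver:
    """Receiver-side ack state that is pruned once the sender acks an ack."""

    def __init__(self):
        self.received = set()   # data seqnos that must still be reported
        self.sent_acks = {}     # ack seqno -> highest data seqno it reported

    def on_data(self, seq):
        self.received.add(seq)

    def build_ack(self, ack_seq):
        """Report everything still held; acks consume sequence numbers too."""
        vector = sorted(self.received)
        if vector:
            self.sent_acks[ack_seq] = vector[-1]
        return vector

    def on_ack_of_ack(self, ack_seq):
        """Sender confirmed it got ack 'ack_seq': drop the state it covered."""
        covered = self.sent_acks.pop(ack_seq, None)
        if covered is None:
            return
        self.received = {s for s in self.received if s > covered}
        self.sent_acks = {a: c for a, c in self.sent_acks.items() if c > covered}
```

Without `on_ack_of_ack` the `received` set would grow for the lifetime of the connection, because an unreliable protocol never knows whether an earlier ack arrived.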
Packet Structure • Basic packet similar to UDP • Small (12 bytes) • Extensible for additional features instead of using a fixed-length flag field
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |           Dest Port           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data Offset  | CCVal | CsCov |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type  |X|# NDP|                Sequence Number                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
“Plug ‘n Play” Congestion Control • CC mechanism and parameters (both ways) chosen during connection setup • Currently two mechanisms: • TFRC (control equation) • TCP-like (TCP with tweaked parameters) • Can add more later
Partial checksums • Checksum covers DCCP header and (optionally) any number of bytes into payload • Allows delivery of some damaged data • May be useful on error-prone links (e.g. wireless) • Drawbacks: • Might conflict with IP-level authentication (e.g. IPsec’s AH)
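The effect of partial coverage can be sketched with the 16-bit one's-complement Internet checksum. This is a simplified illustration: `partial_checksum` and its byte-count coverage argument are hypothetical, and the sketch ignores the pseudo-header and the actual encoding of the CsCov field.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                 # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF


def partial_checksum(header, payload, covered_payload_bytes):
    """Checksum the header plus only the first part of the payload."""
    return internet_checksum(header + payload[:covered_payload_bytes])


header = b"\x12\x34" * 6                # stand-in for a 12-byte header
payload = b"abcdefgh"
reference = partial_checksum(header, payload, 4)
damaged_tail = payload[:4] + b"XXXX"    # corruption outside the coverage
print(partial_checksum(header, damaged_tail, 4) == reference)
```

Corruption beyond the covered bytes leaves the checksum unchanged, so the packet is still delivered; corruption inside the covered region changes it and the packet is discarded.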
When is a packet received? • TCP: acked packets must be delivered to application • DCCP: acked packet might be dropped from application’s queue (apps might favour new data over old) • Ack means received and placed into application queue