End-to-End (Transport) Protocols
Underlying best-effort network • drops messages • re-orders messages • delivers duplicate copies of a given message • limits messages to some finite size • delivers messages after an arbitrarily long delay
Common end-to-end services • guarantee message delivery • deliver messages in the same order they are sent • deliver at most one copy of each message • support arbitrarily large messages • support synchronization • allow the receiver to apply flow control to the sender • support multiple application processes on each host
Simple Demultiplexer (User Datagram Protocol, UDP) • Unreliable and unordered datagram service • Adds multiplexing • No flow control • Endpoints identified by ports • servers have well-known ports • see /etc/services on Unix • Optional checksum • pseudo header + UDP header + data • Header format: source port, destination port, length, checksum (16 bits each), followed by the data
RFC 768 • Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets. • The pseudo header conceptually prefixed to the UDP header contains the source address, the destination address, the protocol, and the UDP length. This information gives protection against misrouted datagrams. This checksum procedure is the same as is used in TCP.
Pseudo Header
 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|          source address           |
+--------+--------+--------+--------+
|        destination address        |
+--------+--------+--------+--------+
|  zero  |protocol|   UDP length    |
+--------+--------+--------+--------+
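The checksum rule quoted from RFC 768 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`ones_complement_sum`, `udp_checksum`, `build_udp`) are hypothetical, IPv4 addresses are assumed to be passed as 4 raw bytes, and the protocol number 17 is the value IP uses for UDP.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """Checksum over pseudo header + UDP header + data.

    udp_segment must carry a zero in its checksum field.
    """
    # Pseudo header: source address, destination address, zero, protocol, UDP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    checksum = ~ones_complement_sum(pseudo + udp_segment) & 0xFFFF
    return checksum or 0xFFFF  # RFC 768: a computed zero is sent as all ones

def build_udp(src_ip: bytes, dst_ip: bytes, src_port: int, dst_port: int,
              payload: bytes) -> bytes:
    """Build a UDP segment (8-byte header + payload) with the checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = udp_checksum(src_ip, dst_ip, header + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload
```

A receiver can verify a segment by summing the pseudo header and the whole segment, checksum included: an undamaged segment sums to 0xFFFF.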
Initiating a Session • Client initiates the exchange and sends its own port in the message header • Server's well-known port is listed in /etc/services • DNS = 53 • talk = 517 • Connectionless • Primary purpose: demultiplexing
Demux • Figure: packets arrive at UDP and are demultiplexed into per-port queues (e.g., ports 2000, 3000, and 3100), each read by its own application process
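The per-port queues in the demux figure can be modelled with a dictionary of queues. This is a toy sketch under stated assumptions: `UdpDemux` and its methods are hypothetical names, and a real UDP implementation lives in the kernel with bounded queues.

```python
from collections import deque

class UdpDemux:
    """Toy model of UDP demultiplexing: one queue per bound port."""

    def __init__(self):
        self.queues = {}  # destination port -> deque of (src_addr, src_port, payload)

    def bind(self, port: int) -> deque:
        """Claim a port for an application process and return its queue."""
        if port in self.queues:
            raise OSError(f"port {port} already in use")
        self.queues[port] = deque()
        return self.queues[port]

    def deliver(self, src_addr: str, src_port: int, dst_port: int,
                payload: bytes) -> bool:
        """Place an arriving datagram on the queue for dst_port.

        Returns False (datagram dropped) when no process has bound the port.
        """
        queue = self.queues.get(dst_port)
        if queue is None:
            return False
        queue.append((src_addr, src_port, payload))
        return True
```

The source port travels with each datagram so the receiving process knows where to send a reply.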
Using it • TCP: s = socket(AF_INET, SOCK_STREAM, 0) • bind, listen, accept (server) • connect (client) • read/write • UDP: s = socket(AF_INET, SOCK_DGRAM, 0) • bind (receiver) • sendto, recvfrom
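The same call sequences can be tried from Python, whose standard `socket` module wraps these system calls. The sketch below assumes a working loopback interface; the helper names `udp_round_trip` and `tcp_round_trip` are hypothetical, and both ends run in one process only to keep the example short.

```python
import socket

def udp_round_trip(payload: bytes) -> bytes:
    """UDP: socket, bind (receiver), sendto, recvfrom."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not hang if the datagram is lost
        sender.sendto(payload, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()

def tcp_round_trip(payload: bytes) -> bytes:
    """TCP: socket, bind, listen, accept on the server; connect on the client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    try:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server.settimeout(2.0)
        client.settimeout(2.0)
        client.connect(server.getsockname())  # completes via the listen backlog
        conn, _client_addr = server.accept()
        with conn:
            conn.settimeout(2.0)
            client.sendall(payload)
            # A byte stream may return fewer bytes per read; a tiny payload on
            # loopback normally arrives in one piece.
            return conn.recv(2048)
    finally:
        client.close()
        server.close()
```

Note that the UDP sender never calls bind or connect: the OS assigns it an ephemeral port on the first sendto.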
Overview • Connection-oriented • Byte-stream • sending process writes some number of bytes • TCP breaks into segments and sends via IP • receiving process reads some number of bytes • Full duplex • Flow control: keep sender from overrunning receiver • Congestion control: keep sender from overrunning network
TCP Stream • Figure: the sending application process writes bytes into the TCP send buffer; TCP transmits them as segments; the receiving TCP places them in its receive buffer, from which the receiving application process reads bytes
End-to-End Issues • Based on the sliding window protocol used at the data link layer, but the situation is very different • Potentially connects many different hosts • need explicit connection establishment and termination • Potentially different RTT • need adaptive timeout mechanism
More Issues • Potentially long delay in network • need to be prepared for arrival of very old packets (limit 60 seconds) • Potentially different capacity at destination • need to accommodate different amounts of buffering (end hosts may have hundreds of applications) • Potentially different network capacity • need to be prepared for network congestion
Segment Format • Header fields, in order: SrcPort (16 bits), DstPort (16), SequenceNum (32), Acknowledgement (32), HdrLen (4), reserved 0 (6), Flags (6), AdvertisedWindow (16), Checksum (16), UrgPtr (16), options (variable), then data • Each connection identified with 4-tuple: • <SrcPort, SrcIPAddr, DstPort, DstIPAddr> • Sliding window + flow control • Acknowledgement, SequenceNum, AdvertisedWindow • Flags: SYN, FIN, RESET, PUSH, URG, ACK • Checksum: pseudo header + TCP header + data
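To make the segment layout concrete, here is a small sketch that unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The names `TcpHeader` and `parse_tcp_header` are hypothetical; the flag names follow the slide, and options are ignored.

```python
import struct
from typing import NamedTuple, Tuple

# Flag bits from least significant (bit 0) to bit 5.
FLAG_NAMES = ["FIN", "SYN", "RESET", "PUSH", "ACK", "URG"]

class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    sequence_num: int
    acknowledgement: int
    hdr_len_bytes: int
    flags: Tuple[str, ...]
    advertised_window: int
    checksum: int
    urg_ptr: int

def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Parse the fixed 20-byte TCP header at the start of segment."""
    (src, dst, seq, ack, len_and_flags,
     window, checksum, urg) = struct.unpack("!HHIIHHHH", segment[:20])
    hdr_len_bytes = (len_and_flags >> 12) * 4  # HdrLen counts 32-bit words
    flags = tuple(name for bit, name in enumerate(FLAG_NAMES)
                  if len_and_flags & (1 << bit))
    return TcpHeader(src, dst, seq, ack, hdr_len_bytes, flags,
                     window, checksum, urg)
```

For example, a header whose 16-bit length-and-flags word is `(5 << 12) | 0x12` parses as a 20-byte header with the SYN and ACK flags set.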
Segment Size • Set to at most MSS (Maximum Segment Size) • MSS is largest segment size that can be sent without IP fragmentation • TCP supports push operation to allow application to explicitly send a segment • Timer sends partial segment
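As an illustration of how a byte stream is cut into segments of at most MSS bytes, consider the sketch below. `segment_stream` is a hypothetical helper; it ignores push, timers, and the window, and only shows how each segment carries the sequence number of its first byte.

```python
def segment_stream(data: bytes, mss: int, first_seq: int = 0):
    """Split data into (SequenceNum, payload) pairs of at most mss bytes each.

    SequenceNum is the sequence number of the first byte in the segment and
    wraps around modulo 2**32, like the 32-bit header field.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [((first_seq + offset) % 2**32, data[offset:offset + mss])
            for offset in range(0, len(data), mss)]
```

With 2500 bytes, an MSS of 1000, and a first sequence number of 100, this yields segments starting at 100, 1100, and 2100, the last one carrying 500 bytes.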
TCP Flow • Figure: sender → receiver: Data (SequenceNum) • receiver → sender: Acknowledgement + AdvertisedWindow
Connection Establishment and Termination • Three-Way Handshake • 1. Active participant → passive participant: SYN, SequenceNum = x • 2. Passive → active: SYN + ACK, SequenceNum = y, Acknowledgement = x + 1 • 3. Active → passive: ACK, Acknowledgement = y + 1 • Initial sequence numbers x and y are random so that packets from consecutive sessions are unique
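The three messages of the handshake can be written out as plain data. This is only a sketch of the numbering rule: `three_way_handshake` is a hypothetical function, messages are dictionaries rather than real segments, and the random initial sequence numbers stand in for whatever generator a real stack uses.

```python
import random

def three_way_handshake(rng=random):
    """Return the three handshake messages as dictionaries.

    rng is any object with a randrange method (the random module by default).
    """
    x = rng.randrange(2**32)  # active participant's initial sequence number
    y = rng.randrange(2**32)  # passive participant's initial sequence number
    syn = {"flags": {"SYN"}, "seq": x}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": (x + 1) % 2**32}
    ack = {"flags": {"ACK"}, "ack": (y + 1) % 2**32}
    return [syn, syn_ack, ack]
```

Each Acknowledgement names the next sequence number expected, which is why it is one more than the SequenceNum it acknowledges.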
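State Transition Diagram (edges labelled event/action; the original figure also marks each edge as taken by the client or the server)
• CLOSED → LISTEN: PassiveOpen
• CLOSED → SYN_SENT: ActiveOpen/SYN
• LISTEN → SYN_RCVD: SYN/SYN+ACK
• LISTEN → SYN_SENT: Send/SYN
• LISTEN → CLOSED: Close/
• SYN_SENT → SYN_RCVD: SYN/SYN+ACK
• SYN_SENT → ESTABLISHED: SYN+ACK/ACK
• SYN_SENT → CLOSED: Close/
• SYN_RCVD → ESTABLISHED: ACK/
• SYN_RCVD → FIN_WAIT_1: Close/FIN
• ESTABLISHED → FIN_WAIT_1: Close/FIN
• ESTABLISHED → CLOSE_WAIT: FIN/ACK
• CLOSE_WAIT → LAST_ACK: Close/FIN
• LAST_ACK → CLOSED: ACK/
• FIN_WAIT_1 → FIN_WAIT_2: ACK/
• FIN_WAIT_1 → CLOSING: FIN/ACK
• FIN_WAIT_1 → TIME_WAIT: FIN+ACK/ACK
• FIN_WAIT_2 → TIME_WAIT: FIN/ACK
• CLOSING → TIME_WAIT: ACK/
• TIME_WAIT → CLOSED: timeout after 2 segment lifetimes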
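A state diagram like this maps naturally onto a lookup table keyed by (state, event). The sketch below encodes the standard TCP transitions using the slide's event/action labels; `TRANSITIONS` and `run` are hypothetical names, and "Timeout" stands for the wait of two segment lifetimes in TIME_WAIT.

```python
# (state, event) -> (next state, segment sent in response or None)
TRANSITIONS = {
    ("CLOSED", "PassiveOpen"): ("LISTEN", None),
    ("CLOSED", "ActiveOpen"): ("SYN_SENT", "SYN"),
    ("LISTEN", "SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("LISTEN", "Send"): ("SYN_SENT", "SYN"),
    ("LISTEN", "Close"): ("CLOSED", None),
    ("SYN_SENT", "SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_SENT", "Close"): ("CLOSED", None),
    ("SYN_RCVD", "ACK"): ("ESTABLISHED", None),
    ("SYN_RCVD", "Close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "Close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "Close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "ACK"): ("CLOSED", None),
    ("FIN_WAIT_1", "ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "FIN"): ("CLOSING", "ACK"),
    ("FIN_WAIT_1", "FIN+ACK"): ("TIME_WAIT", "ACK"),
    ("FIN_WAIT_2", "FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "Timeout"): ("CLOSED", None),
}

def run(events, state="CLOSED"):
    """Feed events through the table; return (final state, segments sent)."""
    sent = []
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event}")
        state, action = TRANSITIONS[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent
```

A client that opens, then closes first, follows `["ActiveOpen", "SYN+ACK", "Close", "ACK", "FIN", "Timeout"]`; a server that is closed by its peer follows `["PassiveOpen", "SYN", "ACK", "FIN", "Close", "ACK"]`. Both end in CLOSED.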
Sliding Window Revisited • Each byte has a sequence number • ACKs are cumulative
Sliding Window • Sending side • LastByteAcked ≤ LastByteSent • LastByteSent ≤ LastByteWritten • bytes between LastByteAcked and LastByteWritten must be buffered • Receiving side • LastByteRead < NextByteExpected • bytes between NextByteRead and LastByteRcvd must be buffered
Flow Control • Sender buffer size: MaxSendBuffer • Receive buffer size: MaxRcvBuffer • Receiving side • LastByteRcvd - NextByteRead ≤ MaxRcvBuffer • AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - NextByteRead)
Flow Control (continued) • Receiving side also maintains NextByteExpected ≤ LastByteRcvd + 1 • Sending side • LastByteSent - LastByteAcked ≤ AdvertisedWindow • EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked) • LastByteWritten - LastByteAcked ≤ MaxSendBuffer • block the sending process if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes it is trying to write • Always send ACK in response to an arriving data segment • Persist when AdvertisedWindow = 0 (send 1-byte packets)
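The flow-control bookkeeping reduces to three small formulas, written out here as a sketch. The function names are hypothetical; the parameters use the slide's variable names, and y is the number of bytes the application is trying to write.

```python
def advertised_window(max_rcv_buffer: int, last_byte_rcvd: int,
                      next_byte_read: int) -> int:
    """Receiver: free buffer space that it reports to the sender."""
    return max_rcv_buffer - (last_byte_rcvd - next_byte_read)

def effective_window(advertised: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    """Sender: how many more bytes may be sent before hearing another ACK."""
    return advertised - (last_byte_sent - last_byte_acked)

def must_block_writer(last_byte_written: int, last_byte_acked: int,
                      y: int, max_send_buffer: int) -> bool:
    """Sender: True if writing y more bytes would overflow the send buffer."""
    return (last_byte_written - last_byte_acked) + y > max_send_buffer
```

For instance, a receiver with a 4096-byte buffer that has received up to byte 3000 while the application has read up to byte 1000 advertises 2096 bytes; a sender with 1000 unacknowledged bytes in flight may then send 1096 more.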
Keeping the Pipe Full • Wrap Around: 32-bit SequenceNum • Bandwidth & Time Until Wrap Around
Bandwidth            Time Until Wrap Around
T1 (1.5Mbps)         6.4 hours
Ethernet (10Mbps)    57 minutes
T3 (45Mbps)          13 minutes
FDDI (100Mbps)       6 minutes
STS-3 (155Mbps)      4 minutes
STS-12 (622Mbps)     55 seconds
STS-24 (1.2Gbps)     28 seconds
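The wrap-around times follow from dividing the size of the sequence space (2^32 bytes) by the link bandwidth. A quick sketch, with the hypothetical helper `seconds_until_wraparound` and the nominal rates from the table as inputs:

```python
def seconds_until_wraparound(bandwidth_bps: float, seq_bits: int = 32) -> float:
    """Time to send 2**seq_bits bytes at bandwidth_bps bits per second."""
    return (2**seq_bits * 8) / bandwidth_bps

# Nominal rates in bits per second, as listed in the table.
t1_hours = seconds_until_wraparound(1.5e6) / 3600
ethernet_minutes = seconds_until_wraparound(10e6) / 60
sts12_seconds = seconds_until_wraparound(622e6)
```

These evaluate to roughly 6.4 hours, 57 minutes, and 55 seconds. Entries computed from rounded nominal rates (such as 1.2 Gbps for STS-24) can differ slightly from the table.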
Delay-Bandwidth Product • Bytes in Transit: 16-bit AdvertisedWindow (64KB max) • Use scaled AdvertisedWindow • Bandwidth & Delay x Bandwidth Product for 100ms RTT
Bandwidth            Delay x Bandwidth Product
T1 (1.5Mbps)         18KB
Ethernet (10Mbps)    122KB
T3 (45Mbps)          549KB
FDDI (100Mbps)       1.2MB
STS-3 (155Mbps)      1.8MB
STS-12 (622Mbps)     7.4MB
STS-24 (1.2Gbps)     14.8MB
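The delay x bandwidth figures, and the reason a 16-bit window needs scaling, can be checked with a short calculation. Both helpers below are hypothetical sketches; the scaling helper assumes the window is scaled by a left shift, as in TCP's window scale option, and finds the smallest shift that covers a given number of bytes.

```python
def delay_bandwidth_bytes(bandwidth_bps: int, rtt_ms: int = 100) -> float:
    """Bytes in flight needed to keep a link full for one round-trip time."""
    return bandwidth_bps * rtt_ms / 8000  # bits/s * ms -> bytes

def window_scale_shift(required_bytes: float, window_bits: int = 16) -> int:
    """Smallest left shift so that the largest window covers required_bytes."""
    max_window = 2**window_bits - 1
    shift = 0
    while (max_window << shift) < required_bytes:
        shift += 1
    return shift

t1_kb = delay_bandwidth_bytes(1_500_000) / 1024         # about 18 KB
ethernet_kb = delay_bandwidth_bytes(10_000_000) / 1024  # about 122 KB
```

A T1 link fits within the unscaled 64KB window, while 10 Mbps Ethernet at a 100ms RTT already needs a shift of 1.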
Adaptive Retransmission • Original Algorithm • Measure SampleRTT for each segment/ACK pair • Compute weighted average of RTT • EstimatedRTT = α x EstimatedRTT + β x SampleRTT • where α + β = 1 • α between 0.8 and 0.9 • β between 0.1 and 0.2 • Set timeout based on EstimatedRTT • TimeOut = 2 x EstimatedRTT
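The original algorithm is an exponentially weighted moving average, sketched below. `RttEstimator` is a hypothetical class; it assumes α = 0.875 (within the slide's 0.8 to 0.9 range), takes β as 1 - α, and seeds the estimate with the first sample, a choice the slide does not specify.

```python
class RttEstimator:
    """EstimatedRTT = alpha * EstimatedRTT + (1 - alpha) * SampleRTT."""

    def __init__(self, alpha: float = 0.875):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must be between 0 and 1")
        self.alpha = alpha
        self.estimated_rtt = None  # no samples seen yet

    def update(self, sample_rtt: float) -> float:
        """Fold one SampleRTT into the running estimate and return it."""
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt  # assumption: seed with first sample
        else:
            self.estimated_rtt = (self.alpha * self.estimated_rtt
                                  + (1 - self.alpha) * sample_rtt)
        return self.estimated_rtt

    @property
    def timeout(self) -> float:
        """TimeOut = 2 x EstimatedRTT."""
        return 2 * self.estimated_rtt
```

With samples of 100 ms and then 180 ms, the estimate moves from 100 to 110 ms and the timeout from 200 to 220 ms: a large α makes the estimate react slowly to a single outlier.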
Karn/Partridge Algorithm • Do not sample RTT when retransmitting • Double timeout after each retransmission • Figure (sender/receiver timelines): (a) the ACK is for the retransmission but is matched to the original transmission, so SampleRTT is too long • (b) the ACK is for the original transmission but is matched to the retransmission, so SampleRTT is too short
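The two Karn/Partridge rules can be layered on the weighted-average estimator in a few lines. This is a sketch under assumptions: `KarnPartridgeTimer` is a hypothetical class, α is taken as 0.875, and the caller is assumed to know whether the acknowledged segment was ever retransmitted.

```python
class KarnPartridgeTimer:
    """Retransmission timer that skips ambiguous RTT samples and backs off."""

    def __init__(self, initial_timeout: float, alpha: float = 0.875):
        self.alpha = alpha
        self.estimated_rtt = None
        self.timeout = initial_timeout

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> float:
        """Process an ACK; return the timeout to use for the next segment."""
        if was_retransmitted:
            # Rule 1: the ACK could match either transmission, so do not sample.
            return self.timeout
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
        else:
            self.estimated_rtt = (self.alpha * self.estimated_rtt
                                  + (1 - self.alpha) * sample_rtt)
        self.timeout = 2 * self.estimated_rtt
        return self.timeout

    def on_retransmit(self) -> float:
        """Rule 2: double the timeout after each retransmission."""
        self.timeout *= 2
        return self.timeout
```

The doubled timeout stays in force until an ACK arrives for a segment that was sent only once, at which point the timeout is recomputed from the estimate.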
Notes • algorithm only as good as granularity of clock (20ms on Unix) • Cross Country RTT=100-200ms • accurate timeout mechanism important to congestion control (later)
Records • "Push" sends a record, preserving its boundary • Urgent data can also be used to signify record boundaries