Transport layer Part III
Principles of congestion control
Learning that congestion is occurring
Adapting to alleviate the congestion
TCP congestion control
Additive-increase, multiplicative-decrease
Slow start and slow-start restart
TCP flow control
application may remove data from TCP socket buffers …
… slower than TCP receiver is delivering (sender is sending)
flow control
receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
[Figure: receiver protocol stack; segments from sender pass through IP code and TCP code into the TCP socket receiver buffers, which the application process reads across the OS boundary]
Transport Layer 3-2
TCP flow control
receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
RcvBuffer size set via socket options (typical default is 4096 bytes)
many operating systems autoadjust RcvBuffer
sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering; RcvBuffer holds buffered data from TCP segment payloads on its way to the application process, and rwnd is the remaining free buffer space]
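The receive-window bookkeeping can be sketched as follows. This is a minimal illustration, not a real TCP implementation; the names last_byte_rcvd, last_byte_read, last_byte_sent and last_byte_acked are assumptions introduced here and do not appear on the slide.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver would advertise as rwnd (bytes)."""
    buffered = last_byte_rcvd - last_byte_read  # received but not yet read by the application
    return rcv_buffer - buffered


def sender_may_send(nbytes, last_byte_sent, last_byte_acked, rwnd):
    """True if sending nbytes more keeps unacked (in-flight) data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


# Hypothetical numbers: 4096-byte RcvBuffer with 1000 bytes still buffered.
rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=2000)
print(rwnd)
print(sender_may_send(1460, last_byte_sent=5000, last_byte_acked=3000, rwnd=rwnd))
```

With these numbers the sender already has 2000 bytes in flight, so one more 1460-byte segment would exceed the advertised window and has to wait.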
Flow Control vs. Congestion Control
Flow control
Keeping one fast sender from overwhelming a slow receiver
Congestion control
Keeping a set of senders from overloading the network
• E.g., persuade hosts to stop sending, or slow down
• Typically has notions of fairness (i.e., sharing the pain)
Different concepts, but similar mechanisms
TCP flow control: receiver window
TCP congestion control: congestion window
TCP window: min{congestion window, receiver window}
Simple Congestion Detection
Packet loss
Packet gets dropped along the way
Packet delay
Packet experiences high delay
How does TCP sender learn this?
Loss
• Timeout
• Triple-duplicate acknowledgment
Delay
• Round-trip time estimate
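The slide does not say how the round-trip time estimate is computed. One common approach, sketched here as an assumption, is the exponentially weighted moving average described in RFC 6298. The class name RttEstimator is hypothetical, and the sketch leaves out the clock-granularity term and the minimum timeout that the RFC also specifies.

```python
class RttEstimator:
    """Smoothed RTT estimate; constants follow RFC 6298 (an assumption, not from the slide)."""

    ALPHA = 1 / 8  # weight of the newest sample in the smoothed RTT
    BETA = 1 / 4   # weight of the newest deviation in the RTT variance

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Fold one RTT measurement (seconds) into the estimate."""
        if self.srtt is None:
            # First measurement initializes both values.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Variance is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt

    def timeout(self):
        """Retransmission timeout: smoothed RTT plus a safety margin."""
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
for rtt in (0.100, 0.200):
    est.update(rtt)
print(est.srtt, est.timeout())
```

A steadily rising smoothed RTT is the delay signal the slide refers to, and the timeout value is what triggers the loss signal when no ACK arrives in time.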
Idea of TCP Congestion Control
Each source determines the available capacity
… so it knows how many packets to have in transit
Congestion window
Maximum # of unacknowledged bytes to have in transit
MaxWindow = min{congestion window, receiver window}
Send at the rate of the slowest component
Adapting the congestion window
Decrease upon losing a packet: backing off
Increase upon success: optimistically exploring
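As a rough sketch of how the sender combines the two windows, the helper below computes how many more bytes may be sent. The function names and the in_flight parameter are illustrative assumptions.

```python
def max_window(cwnd, rwnd):
    """MaxWindow = min{congestion window, receiver window}, in bytes."""
    return min(cwnd, rwnd)


def sendable_bytes(cwnd, rwnd, in_flight):
    """Bytes that may still be sent given the unacknowledged bytes already in transit."""
    return max(0, max_window(cwnd, rwnd) - in_flight)


print(sendable_bytes(cwnd=8000, rwnd=3000, in_flight=1000))  # receiver is the bottleneck
print(sendable_bytes(cwnd=2000, rwnd=3000, in_flight=2500))  # network is the bottleneck
```

Whichever window is smaller sets the limit, which is the sense in which the sender sends at the rate of the slowest component.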
Additive Increase, Multiplicative Decrease
How much to increase and decrease?
Increase linearly, decrease multiplicatively
Consequences of over-sized window are much worse than having
an under-sized window
• Over-sized window: packets dropped and retransmitted
• Under-sized window: somewhat lower throughput
Multiplicative decrease
On loss of packet, divide congestion window in half
Additive increase
On success for last window of data, increase linearly
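The two rules can be tried out in a toy simulation. This is a sketch under simplifying assumptions: the MSS value is picked for illustration, one step equals one round trip, and a loss is assumed to happen whenever the window exceeds a fixed capacity, which real networks do not guarantee.

```python
MSS = 1460  # bytes; an assumed value for illustration


def aimd_step(cwnd, loss):
    """One round trip of AIMD: halve on loss, otherwise add one MSS."""
    if loss:
        return max(MSS, cwnd // 2)  # multiplicative decrease, never below 1 MSS
    return cwnd + MSS               # additive increase


def simulate(capacity, rounds, cwnd=MSS):
    """Return the window per round; loss is assumed whenever cwnd exceeds capacity."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        cwnd = aimd_step(cwnd, loss=cwnd > capacity)
    return trace


trace = simulate(capacity=8 * MSS, rounds=20)
print([round(w / MSS, 1) for w in trace])
```

The printed windows climb by one MSS per round and drop to half once they pass the assumed capacity, then climb again.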
Leads to the TCP “Sawtooth”
[Figure: window over time; the window is halved at each loss]
Practical Details
Congestion window
Represented in bytes, not in packets
Packets have MSS (Maximum Segment Size) bytes
Increasing the congestion window
Increase by MSS on success for last window of data
In practice, increase by a fraction of MSS per received ACK
• # packets per window: CWND / MSS
• Increment per ACK: MSS * (MSS / CWND)
Decreasing the congestion window
Never drop congestion window below 1 MSS
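The per-ACK arithmetic above can be checked with a short sketch. The MSS value and the function names are assumptions for illustration; the window is kept as a float so the fractional increments are visible.

```python
MSS = 1460  # bytes; an assumed value for illustration


def on_ack(cwnd):
    """Additive increase applied per ACK: MSS * (MSS / CWND) bytes."""
    return cwnd + MSS * (MSS / cwnd)


def on_loss(cwnd):
    """Halve the window, but never drop below 1 MSS."""
    return max(MSS, cwnd / 2)


cwnd = 10 * MSS
packets_per_window = int(cwnd // MSS)  # CWND / MSS
for _ in range(packets_per_window):    # one ACK per packet in the window
    cwnd = on_ack(cwnd)
print(cwnd / MSS)
```

After a full window of ACKs the window has grown by slightly less than one MSS, because each increment is computed from a window that is already a little larger than before.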
Getting Started
Need to start with a small CWND to avoid overloading the network.
But, could take a long time to get started!
[Figure: window vs. time t]
“Slow Start” Phase
Start with a small congestion window
Initially, CWND is 1 MSS
So, initial sending rate is MSS/RTT
That could be pretty wasteful
Might be much less than the actual bandwidth
Linear increase takes a long time to accelerate
Slow-start phase (really “fast start”)
Sender starts at a slow rate (hence the name)
… but increases the rate exponentially
… until the first loss event
Slow Start in Action
Double CWND per round-trip time
[Figure: timeline between Src and Dest; data segments (D) and ACKs (A) for windows of 1, 2, 4, 8]
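The doubling can be reproduced with a short sketch. It assumes that every segment in the window is acknowledged separately and that each ACK grows the window by one full MSS, which is the usual way the doubling is realized but is not spelled out on the slide. The MSS value and function name are illustrative.

```python
MSS = 1460  # bytes; an assumed value for illustration


def slow_start_round(cwnd):
    """Process one round trip: every MSS-sized segment in the window is ACKed."""
    acks = cwnd // MSS
    for _ in range(acks):
        cwnd += MSS  # each ACK grows the window by one full MSS
    return cwnd


cwnd = MSS
windows = []
for _ in range(4):
    windows.append(cwnd // MSS)
    cwnd = slow_start_round(cwnd)
print(windows)  # [1, 2, 4, 8]
```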
Slow Start and the TCP Sawtooth
[Figure: window vs. time t; exponential “slow start” up to the first loss, then the sawtooth]
Two Kinds of Loss in TCP
Triple duplicate ACK
Packet n is lost, but packets n+1, n+2, etc. arrive
Receiver sends duplicate acknowledgments
… and the sender retransmits packet n quickly
Do a multiplicative decrease and keep going
Timeout
Packet n is lost and detected via a timeout
• E.g., because all packets in flight were lost
Start over with a low CWND
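The two reactions can be sketched side by side. The class below is a simplified illustration with hypothetical names; it counts duplicate ACKs by comparing acknowledgment numbers and omits the retransmission itself.

```python
MSS = 1460             # bytes; an assumed value for illustration
DUP_ACK_THRESHOLD = 3  # "triple duplicate ACK"


class LossDetector:
    """Tracks duplicate ACKs and adjusts cwnd for the two kinds of loss."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return 'fast_retransmit' when the third duplicate ACK arrives, else None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                self.cwnd = max(MSS, self.cwnd // 2)  # multiplicative decrease, keep going
                return "fast_retransmit"
        else:
            self.last_ack = ack_num
            self.dup_count = 0
        return None

    def on_timeout(self):
        """Timeout: start over with a low CWND."""
        self.cwnd = MSS
        self.dup_count = 0


detector = LossDetector(cwnd=8 * MSS)
events = [detector.on_ack(a) for a in (1000, 2000, 2000, 2000, 2000)]
print(events, detector.cwnd // MSS)
```

The first ACK for 2000 is new; the three repeats that follow are the duplicates that trigger the quick retransmission.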
Repeating Slow Start After Timeout
Slow start in operation until it reaches half of previous cwnd.
[Figure: window vs. time t, with the timeout marked]
Slow-start restart: Go back to CWND of 1, but take advantage
of knowing the previous value of CWND.
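A minimal sketch of the restart, assuming no further loss after the timeout. The threshold name ssthresh is the conventional one but does not appear on the slide, and the function name is hypothetical.

```python
MSS = 1460  # bytes; an assumed value for illustration


def restart_after_timeout(prev_cwnd, rounds):
    """Window per round trip after a timeout, assuming no further loss."""
    ssthresh = max(MSS, prev_cwnd // 2)  # half of the previous CWND
    cwnd = MSS                           # go back to a CWND of 1 MSS
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start: double, capped at the threshold
        else:
            cwnd += MSS                     # then additive increase
    return trace


print([w // MSS for w in restart_after_timeout(16 * MSS, 7)])  # [1, 2, 4, 8, 9, 10, 11]
```

Remembering the previous window lets the sender switch from doubling to linear growth before it reaches the window size that last caused trouble.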
Connection Management
before exchanging data, sender/receiver “handshake”:
agree to establish connection (each knowing the other willing to establish connection)
agree on connection parameters
[Figure: two hosts, each with an application above the network, and each holding
connection state: ESTAB
connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state becomes SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y; ACKbit=1; ACKnum=x+1); server state becomes SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state becomes ESTAB
server: received ACK(y) indicates client is live; server state becomes ESTAB
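The sequence and acknowledgment numbers in the exchange can be traced with a small sketch. It models segments as dictionaries and sends nothing over a network; the function name and field names are illustrative assumptions, and the arithmetic wraps at 2^32 because TCP sequence numbers are 32 bits wide.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit values


def three_way_handshake(rng=random):
    """Return the three segments of the handshake as dictionaries."""
    x = rng.randrange(SEQ_SPACE)  # client chooses init seq num, x
    syn = {"SYN": 1, "seq": x}

    y = rng.randrange(SEQ_SPACE)  # server chooses init seq num, y
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (syn["seq"] + 1) % SEQ_SPACE}

    # Client acknowledges the server's SYN; this segment may also carry data.
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(random.Random(0)):
    print(segment)
```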
TCP: closing a connection
client, server each close their side of connection
send TCP segment with FIN bit = 1
respond to received FIN with ACK
on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
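client state: ESTAB; server state: ESTAB
client: clientSocket.close(); send FINbit=1, seq=x; client state becomes FIN_WAIT_1 (can no longer send but can receive data)
server: on receiving FIN, state becomes CLOSE_WAIT (can still send data); send ACKbit=1; ACKnum=x+1
client: on receiving ACK, state becomes FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; state becomes LAST_ACK (can no longer send data)
client: on receiving FIN, send ACKbit=1; ACKnum=y+1; state becomes TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: on receiving ACK, state becomes CLOSED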
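The state changes on each side can be written as a small transition table. This is a sketch of the client-initiated close shown above only; the event names are hypothetical, the state names follow the slide, and simultaneous FIN exchanges are not modeled.

```python
# (current state, event) -> next state, for the client-initiated close.
CLIENT_TRANSITIONS = {
    ("ESTAB", "close"): "FIN_WAIT_1",          # send FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",  # wait for server close
    ("FIN_WAIT_2", "recv_fin"): "TIMED_WAIT",  # send ACK, start timed wait
    ("TIMED_WAIT", "timer_expired"): "CLOSED",
}
SERVER_TRANSITIONS = {
    ("ESTAB", "recv_fin"): "CLOSE_WAIT",       # send ACK, can still send data
    ("CLOSE_WAIT", "close"): "LAST_ACK",       # send FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(transitions, events, state="ESTAB"):
    """Return the list of states visited while processing the events in order."""
    path = [state]
    for event in events:
        state = transitions[(state, event)]
        path.append(state)
    return path


print(run(CLIENT_TRANSITIONS, ["close", "recv_ack", "recv_fin", "timer_expired"]))
print(run(SERVER_TRANSITIONS, ["recv_fin", "close", "recv_ack"]))
```

An event that is not valid in the current state raises KeyError, which makes out-of-order sequences easy to spot when experimenting.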
Chapter 3: summary
principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control
instantiation, implementation in the Internet
UDP
TCP