Overview of TCP
Connection-oriented, byte-stream service
the sending process writes some number of bytes
TCP breaks the byte stream into segments and sends them via IP
the receiving process reads some number of bytes
Overview of TCP
Full duplex
Implements both flow and congestion controls
Flow control: keep the sender from overrunning the receiver
Congestion control: keep the sender from overrunning the network
Flow control is an end-to-end issue; congestion control is concerned with how hosts and the network interact
Overview of TCP
Based on the sliding window protocol used at the data link level, but the situation is very different:
Potentially connects many different hosts -- need explicit connection establishment and termination
Potentially different RTTs -- need an adaptive timeout mechanism
Potentially long delay in the network -- need to be prepared for the arrival of very old packets
Overview of TCP
Potentially different capacity at the destination -- need to accommodate different amounts of buffering
Potentially different network capacity -- need to be prepared for network congestion
TCP header
TCP data is encapsulated in an IP datagram
The normal size of the TCP header is 20 bytes, unless options are present
TCP services
TCP provides a byte-stream service
no record markers are inserted by TCP
if the sending application writes 10 bytes, 20 bytes, and 50 bytes, the receiving application may read the 80 bytes in four reads of 20 bytes
TCP does not interpret the contents of the bytes -- ASCII and binary data are treated the same
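A minimal sketch of this behavior using Python's socket API (the loopback address and port 50007 are illustrative assumptions, not part of the slides): the sizes of the sender's writes do not determine the sizes of the receiver's reads.

    import socket

    # Sketch: TCP preserves the byte stream, not the write boundaries.
    listener = socket.create_server(("127.0.0.1", 50007))
    sender = socket.create_connection(("127.0.0.1", 50007))
    receiver, _ = listener.accept()

    for chunk in (b"a" * 10, b"b" * 20, b"c" * 50):   # three writes: 10, 20, 50 bytes
        sender.sendall(chunk)
    sender.close()

    reads = []
    while True:
        data = receiver.recv(20)          # read at most 20 bytes at a time
        if not data:
            break
        reads.append(len(data))
    print(reads)   # e.g. [20, 20, 20, 20] -- the write boundaries are not preserved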
TCP Segment format
As mentioned earlier, TCP does not transmit individual bytes over the network -- although it provides a byte-stream service
the source host buffers enough bytes from the sending process to fill a reasonably sized packet
it sends these packets, called segments, to the receiver
TCP Segment format
What causes TCP to send a segment?
the segment grows larger than the maximum segment size (MSS)
an explicit action by the sending application
a timer that periodically fires -- the segment contains as many bytes as are currently buffered
TCP Segment format
Recall that IP discards a packet after the packet's TTL expires
each TCP segment also has a maximum lifetime -- the maximum segment lifetime (MSL) -- the current recommended setting is 120 seconds
This value is not enforced by IP -- it is a conservative estimate that TCP makes of how long a packet might live
TCP Segment format
The SrcPort and DestPort fields, along with the IP source and destination addresses, identify each TCP connection
TCP's demux key is
<SrcPort, SrcIPAddr, DestPort, DestIPAddr>
Because TCP connections come and go, it is possible for a connection to have different incarnations
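A minimal sketch of how such a demux key could be used to look up per-connection state for an arriving segment (the ConnectionState class and field names are illustrative assumptions, not part of the slides):

    # Sketch: demultiplexing incoming segments with the 4-tuple key.
    class ConnectionState:
        def __init__(self):
            self.recv_buffer = bytearray()   # stand-in for whatever state TCP keeps

    connections = {}   # (src_port, src_ip, dst_port, dst_ip) -> ConnectionState

    def demux(src_ip, src_port, dst_ip, dst_port, payload):
        key = (src_port, src_ip, dst_port, dst_ip)
        conn = connections.get(key)
        if conn is None:          # no matching connection (e.g. an old incarnation)
            return None
        conn.recv_buffer.extend(payload)
        return conn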
TCP Segment format
The Acknowledgment, SequenceNumber, and AdvertisedWindow fields are all involved in TCP's sliding window algorithm
Each data byte has a sequence number
The SequenceNumber field contains the sequence number of the first byte of data in the segment
The Acknowledgment and AdvertisedWindow fields carry information about the flow in the opposite direction
TCP Segment format
The 6-bit flag field is used to relay control information between TCP peers
SYN, FIN -- used for establishing and terminating connections
ACK flag set to indicate the Acknowledgment field is valid
URG flag set to indicate the segment contains urgent data
RESET flag -- the receiver wants to abort the connection
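A minimal sketch of parsing the 20-byte fixed header described above with Python's struct module (field names follow the slides; the exact layout is the standard TCP header, and the helper itself is an illustration, not part of the slides):

    import struct

    # Fixed 20-byte TCP header: ports, sequence number, acknowledgment,
    # offset/flags, advertised window, checksum, urgent pointer.
    def parse_tcp_header(segment: bytes):
        (src_port, dst_port, seq_num, ack_num,
         offset_flags, adv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
        flags = offset_flags & 0x3F          # low 6 bits: URG/ACK/PSH/RST/SYN/FIN
        return {
            "SrcPort": src_port, "DestPort": dst_port,
            "SequenceNumber": seq_num, "Acknowledgment": ack_num,
            "AdvertisedWindow": adv_window,
            "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
            "ACK": bool(flags & 0x10), "URG": bool(flags & 0x20),
            "RST": bool(flags & 0x04),
        }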
TCP Connection Establishment
A TCP connection begins with two actions:
the client (caller) does an active open -- the party wanting to initiate a connection
the server (callee) has already done a passive open -- the party willing to accept a connection
Most connection setup is asymmetric
TCP has an explicit connection setup -- both sides should agree on a set of transmission parameters
Three-way handshake
Why is the acknowledged sequence number one larger than the one sent?
It is the next sequence number expected -- this implicitly acknowledges all earlier sequence numbers
Three-way handshake
Why should the client and server exchange starting sequence numbers with each other?
It would be simpler if each side started from 0 -- a well-known sequence number
Reason: to protect against two incarnations of a connection reusing the same sequence numbers too soon
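A minimal sketch of the three-way handshake exchange described above; the initial sequence numbers x and y are arbitrary values chosen for illustration:

    # Sketch: sequence/acknowledgment numbers in the three-way handshake.
    x, y = 4000, 7000   # each side picks its own initial sequence number

    handshake = [
        ("client -> server", {"flags": "SYN",     "seq": x}),
        ("server -> client", {"flags": "SYN+ACK", "seq": y, "ack": x + 1}),
        ("client -> server", {"flags": "ACK",     "seq": x + 1, "ack": y + 1}),
    ]

    for direction, segment in handshake:
        print(direction, segment)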
TCP state-transition diagram
The states above ESTABLISHED are involved in setting up a connection
The states below ESTABLISHED are involved in terminating a connection
The sliding-window algorithm is hidden in the ESTABLISHED state
All connections start in the CLOSED state
Each arc is labeled with a tag of the form event/action
TCP state-transition diagram
Opening a connection:
the server invokes a passive open operation -- causing its TCP to move to the LISTEN state
the client does an active open -- sends a SYN segment to the server and moves to the SYN_SENT state
when the SYN arrives, the server moves to SYN_RCVD and responds with SYN+ACK
arrival of the SYN+ACK at the client moves it to ESTABLISHED; the client's ACK then moves the server to ESTABLISHED -- the three-way handshake
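A minimal sketch of these opening transitions as an event-driven table (state and event names follow the slides; the table covers only the connection-setup path, not the full diagram):

    # Sketch: (state, event) -> (next_state, action) for connection setup only.
    open_transitions = {
        ("CLOSED",   "passive open"): ("LISTEN",      None),
        ("CLOSED",   "active open"):  ("SYN_SENT",    "send SYN"),
        ("LISTEN",   "recv SYN"):     ("SYN_RCVD",    "send SYN+ACK"),
        ("SYN_SENT", "recv SYN+ACK"): ("ESTABLISHED", "send ACK"),
        ("SYN_RCVD", "recv ACK"):     ("ESTABLISHED", None),
    }

    def step(state, event):
        return open_transitions[(state, event)]

    print(step("CLOSED", "active open"))   # ('SYN_SENT', 'send SYN')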
TCP state-transition diagram
Closing a connection:
the application process on each side of the connection must independently close its half of the connection
if one side closes the connection, it means it has no more data to send -- but it is still available to receive data from the other side
TCP state-transition diagram
Three possible combinations to go from ESTABLISHED to CLOSED:
this side closes first: ESTABLISHED -- FIN_WAIT_1 -- FIN_WAIT_2 -- TIME_WAIT -- CLOSED
other side closes first: ESTABLISHED -- CLOSE_WAIT -- LAST_ACK -- CLOSED
both sides close at the same time: ESTABLISHED -- FIN_WAIT_1 -- CLOSING -- TIME_WAIT -- CLOSED
A connection in the TIME_WAIT state cannot move to the CLOSED state until it has waited 2*MSL
TCP state-transition diagram
Reason:
the local side responds with an ACK to a FIN from the remote side
in case the ACK is lost, the remote side will retransmit the FIN after a timeout
if the connection were allowed to move to the CLOSED state immediately -- there might be another incarnation of the connection by the time the retransmitted FIN for the earlier connection arrives at the local side
this FIN would close the new incarnation
TCP sliding window
Serves several purposes:
it guarantees reliable delivery of data
it ensures data is delivered in order
it enforces flow control between sender and receiver
The receiver advertises the size of the sliding window -- using the AdvertisedWindow field in the TCP header
The receiver selects a suitable value so that its buffer is not overflowed by a fast sender
TCP sliding window
The receiver cannot acknowledge a byte that has not been sent
TCP cannot send a byte the application has not written
TCP sliding window
NextByteExpected <= LastByteRcvd + 1
if data has arrived in order, NextByteExpected points to the byte after LastByteRcvd
if data has arrived out of order, NextByteExpected points to the first gap in the data
TCP sliding window
On the receiving side:
LastByteRead < NextByteExpected -- a byte cannot be read by the application until it has been received and all preceding bytes have also been received -- NextByteExpected points to the byte immediately after the latest byte meeting this criterion
TCP sliding window
Flow control:
the sending application fills its local send buffer
the receiving application empties its receive buffer
Both buffers have a finite size: MaxSendBuffer and MaxRcvBuffer
The receiver throttles the sender by advertising a window no larger than the amount of data it can buffer
TCP sliding window
On the receive side, to avoid buffer overflow:
LastByteRcvd - LastByteRead <= MaxRcvBuffer
LastByteRcvd - LastByteAcked <= AdvertisedWindow
AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
This is the amount of free space remaining in the receive buffer
If data arrives faster than it is consumed, this value decreases over time -- eventually AdvertisedWindow will reach 0 if the receiver falls behind the sender
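A minimal sketch of the receive-side bookkeeping above (variable names follow the slides; the buffer size and byte counts are arbitrary examples):

    # Sketch: receive-side window computation.
    MaxRcvBuffer = 65536          # example buffer size (bytes)
    LastByteRead = 1000           # last byte consumed by the application
    LastByteRcvd = 21000          # last byte that has arrived from the network

    def advertised_window(last_byte_rcvd, last_byte_read, max_rcv_buffer):
        # Free space remaining in the receive buffer.
        return max_rcv_buffer - (last_byte_rcvd - last_byte_read)

    print(advertised_window(LastByteRcvd, LastByteRead, MaxRcvBuffer))   # 45536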
On the sending side, TCP must adhere to the advertised window it gets from the receiver
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
EffectiveWindow must be greater than 0 before the sender can send more data
The send side must also make sure the local application does not overflow the send buffer
TCP sliding window
LastByteWritten - LastByteAcked <= MaxSendBuffer
if the sending process tries to write y bytes to TCP and
(LastByteWritten - LastByteAcked) + y > MaxSendBuffer
then TCP blocks the sending process
In this way TCP ensures that a slow receiving process can stop a fast sending process
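A minimal sketch of the send-side checks above, combining the EffectiveWindow and MaxSendBuffer rules (variable names follow the slides; the numbers are illustrative):

    # Sketch: send-side window and buffer checks.
    MaxSendBuffer = 65536
    LastByteAcked   = 5000
    LastByteSent    = 9000
    LastByteWritten = 12000
    AdvertisedWindow = 8000       # most recent value advertised by the receiver

    # How much the sender may still put on the wire right now.
    EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
    can_send = max(EffectiveWindow, 0)                       # 4000 bytes

    # Whether a write of y more bytes from the application would have to block.
    def write_would_block(y):
        return (LastByteWritten - LastByteAcked) + y > MaxSendBuffer

    print(can_send, write_would_block(60000))   # 4000 True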
TCP sliding window
When AdvertisedWindow becomes 0, the sender stops sending data
Because the receiving side sends a segment only in response to a received segment -- how does the sender know when the receiver is ready again?
When the receiver advertises a window size of 0, the sender periodically sends a segment with 1 byte of data
eventually this triggers a response that reports a non-zero window size
this is called the smart sender / dumb receiver technique
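A minimal sketch of that probing behaviour (the 1-byte probe and fixed probe interval here are illustrative; real TCPs drive this from a persist timer with backoff):

    import time

    # Sketch: sender-side zero-window probing.
    def wait_for_open_window(get_advertised_window, send_one_byte, probe_interval=1.0):
        # While the receiver advertises a zero window, keep poking it with a
        # 1-byte segment; the ACK that comes back carries the current window.
        while get_advertised_window() == 0:
            send_one_byte()
            time.sleep(probe_interval)
        return get_advertised_window()   # non-zero: normal sending may resume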
TCP sliding Window Issues
TCP's sequence number field is 32 bits wide
TCP's advertised window is 16 bits wide
This satisfies the sliding window algorithm's requirement that the sequence number space be larger than twice the window: 2^32 >> 2 x 2^16
However, the 32-bit sequence number field can wrap around -- i.e., a packet with sequence number x can be sent, and after a while another packet with sequence number x can be sent
TCP sliding Window Issues
A packet can survive in the network for up to the MSL -- 120 seconds
If the sequence number wraps around within 120 seconds, we have a problem
The sequence numbers will wrap around quickly if they are consumed very fast -- i.e., if data is transmitted very fast
An OC-12 (622 Mbps) link can wrap the sequence number space in about 55 seconds
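A worked check of that figure: the time to consume the 2^32-byte sequence space at a given line rate (the rates below are standard values; only the 622 Mbps case appears in the slides):

    # Sketch: how long until the 32-bit sequence number space wraps.
    def wrap_time_seconds(bandwidth_bps):
        return (2**32 * 8) / bandwidth_bps    # sequence space in bits / line rate

    for name, bps in [("T1 (1.5 Mbps)", 1.5e6),
                      ("OC-3 (155 Mbps)", 155e6),
                      ("OC-12 (622 Mbps)", 622e6)]:
        print(name, round(wrap_time_seconds(bps)), "s")
    # OC-12: ~55 s, already shorter than the 120 s MSL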
Adaptive Retransmissions in TCP
The largest amount of data the sender can have in the pipe is determined by the 16-bit AdvertisedWindow field
The AdvertisedWindow should be large enough to allow a full delay x bandwidth product of data into the network
Assuming an RTT of 100 ms on a T3 (45 Mbps) link, that is about 549 KBytes
but the 16-bit AdvertisedWindow will allow only 64 KBytes!
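A quick check of those figures (45 Mbps for T3 is a standard value; the 100 ms RTT comes from the slides):

    # Sketch: delay x bandwidth product versus the 16-bit window limit.
    bandwidth_bps = 45e6       # T3
    rtt_seconds = 0.100

    pipe_bytes = bandwidth_bps * rtt_seconds / 8
    print(round(pipe_bytes / 1024), "KBytes needed in flight")              # ~549
    print(2**16 // 1024, "KBytes is the most a 16-bit window can advertise")  # 64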
Adaptive Retransmissions in TCP
TCP sets the timeout for retransmission as a function of the estimated RTT
TCP uses an adaptive mechanism to estimate the RTT
Idea: keep a running average of the RTT and compute the timeout as a function of this average
Adaptive Retransmissions in TCP
Original algorithm:
TimeOut = 2 x EstimatedRTT
EstimatedRTT = α x EstimatedRTT + (1 - α) x SampleRTT
α is selected to smooth EstimatedRTT -- the original TCP spec recommends a setting between 0.8 and 0.9
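A minimal sketch of this estimator (alpha = 0.875 is a representative value in the recommended 0.8-0.9 range, not taken from the slides):

    # Sketch: original adaptive timeout -- exponentially weighted moving average.
    alpha = 0.875

    def update(estimated_rtt, sample_rtt):
        estimated_rtt = alpha * estimated_rtt + (1 - alpha) * sample_rtt
        timeout = 2 * estimated_rtt
        return estimated_rtt, timeout

    est = 0.100                                # start from a 100 ms estimate
    for sample in (0.120, 0.110, 0.300):       # measured SampleRTTs in seconds
        est, timeout = update(est, sample)
        print(round(est, 3), round(timeout, 3))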
Adaptive Retransmissions in TCP
Karn/Partridge algorithm:
the RTT estimation process should not use the SampleRTT for a segment that has been retransmitted
precise measurement of SampleRTT is difficult in that case because of the ambiguity in matching the ACK with either the original transmission or the retransmission
Adaptive Retransmissions in TCP
Jacobson/Karels Algorithm
Only the aspect of the algorithm that deals with timeout and retransmission is discussed here
Main problem with the original scheme:
it does not take the variation of the SampleRTTs into account
if the variation is small, then EstimatedRTT can be trusted more
if the variation is large, then the timeout should not be too tightly coupled to EstimatedRTT
Adaptive Retransmissions in TCP
Difference = SampleRTT - EstimatedRTT
EstimatedRTT = EstimatedRTT + (δ x Difference)
Deviation = Deviation + δ x (|Difference| - Deviation)
δ is a fraction between 0 and 1
TimeOut = μ x EstimatedRTT + φ x Deviation
μ is typically set to 1, φ is set to 4
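A minimal sketch of these update rules (delta = 0.125 is a common choice, not specified in the slides; mu and phi follow the values above):

    # Sketch: Jacobson/Karels timeout computation.
    delta, mu, phi = 0.125, 1, 4

    def update(estimated_rtt, deviation, sample_rtt):
        difference = sample_rtt - estimated_rtt
        estimated_rtt = estimated_rtt + delta * difference
        deviation = deviation + delta * (abs(difference) - deviation)
        timeout = mu * estimated_rtt + phi * deviation
        return estimated_rtt, deviation, timeout

    est, dev = 0.100, 0.010
    for sample in (0.120, 0.115, 0.400):     # a spike in SampleRTT widens the timeout
        est, dev, timeout = update(est, dev, sample)
        print(round(est, 3), round(dev, 3), round(timeout, 3))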
TCP Extensions
Better RTT measurement for timeout estimation:
TCP reads the actual system clock and puts the value in the header of the outgoing segment (in the options field)
The receiver echoes the timestamp back to the sender
The sender can then estimate the RTT by subtracting the echoed timestamp from the current time
TCP Extensions
Measuring the RTT, sequence number wrap around, and keeping the pipe full are some of the issues with TCP
Extensions have been proposed to address these issues
TCP Extensions
Sequence number wrap around:
two segments may carry the same sequence number
differentiate the two segments by putting the timestamp value in the options field
timestamps increase monotonically, which helps in distinguishing the segments
TCP Extensions
Larger Pipe:
the AdvertisedWindow may not be sufficient to fully utilize the pipe -- the delay x bandwidth product may be very large compared to the AdvertisedWindow
we can use a scaling factor
the AdvertisedWindow is left-shifted by that many places before its contents are used
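A minimal sketch of that shift (the shift count of 4 is an arbitrary example; in practice the scale factor is agreed on at connection setup):

    # Sketch: window scaling -- the advertised value is shifted left by the
    # negotiated scale factor before the sender uses it.
    def effective_advertised_window(advertised_window_field, window_scale_shift):
        return advertised_window_field << window_scale_shift

    print(effective_advertised_window(0xFFFF, 4))   # 65535 * 16 = 1048560 bytes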