Unit IV
Transport Layer
Introduction
The transport layer is the fourth layer of the OSI model and is the core of the Internet
model.
It responds to service requests from the session layer and issues service requests to
the network layer.
The transport layer provides transparent transfer of data between hosts.
It provides end-to-end control and information transfer with the quality of service
needed by the application program.
It is the first true end-to-end layer, implemented in all End Systems (ES)
Process-to-Process Communication
The Transport Layer is responsible for delivering data to the appropriate application
process on the host computers.
This involves multiplexing of data from different application processes, i.e. forming
data packets, and adding source and destination port numbers in the header of each
Transport Layer data packet.
Together with the source and destination IP addresses, the port numbers constitute a
network socket, i.e. an identification address of the process-to-process communication.
Addressing: Port Numbers
Ports are the essential ways to address multiple entities in the same location.
Using port addressing it is possible to use more than one network-based application
at the same time.
Three types of port numbers are used:
1. Well-known ports - These are permanent port numbers ranging from 0 to 1023. These port
numbers are used by server processes.
2. Registered ports - The ports ranging from 1024 to 49,151 are not assigned or controlled
by IANA; they can only be registered with IANA to prevent duplication.
3. Ephemeral ports (Dynamic ports) - These are temporary port numbers ranging from 49,152
to 65,535. These port numbers are used by client processes.
Encapsulation and Decapsulation
To send a message from one process to another, the transport-layer protocol
encapsulates and decapsulates messages.
Encapsulation happens at the sender site. The transport layer receives the data and adds
the transport-layer header.
Decapsulation happens at the receiver site. When the message arrives at the destination
transport layer, the header is dropped and the transport layer delivers the message to
the process running at the application layer.
Multiplexing and Demultiplexing
Whenever an entity accepts items from more than one source, this is referred to as
multiplexing (many to one).
Whenever an entity delivers items to more than one destination, this is referred to as
demultiplexing (one to many).
The transport layer at the source performs multiplexing.
The transport layer at the destination performs demultiplexing.
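As an illustration only (not part of the syllabus text), the following minimal Python sketch shows how port numbers are used: the server binds an explicit port, the client is given an ephemeral port by the operating system, and the (IP address, port) pairs are what each side uses to direct the datagram to the right process. The loopback address and port 9999 are arbitrary example values.

    # Minimal UDP demo: the pair (IP address, port) on each side identifies which
    # process a datagram is delivered to. Loopback and port 9999 are example values.
    import socket

    SERVER_ADDR = ("127.0.0.1", 9999)   # server-chosen, "well-known"-style port

    # "Server" process: binds an explicit port so the OS can demultiplex to it.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(SERVER_ADDR)

    # "Client" process: the OS assigns an ephemeral source port on first send.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"hello", SERVER_ADDR)
    print("client ephemeral port:", client.getsockname()[1])

    # The server reads the datagram; addr carries the client's IP and port,
    # which the server then uses to address its reply.
    data, addr = server.recvfrom(1024)
    print("received", data, "from", addr)
    server.sendto(b"world", addr)
    print("reply:", client.recvfrom(1024)[0])

    client.close()
    server.close()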
Flow Control
Flow Control is the process of managing the rate of data transmission between two
nodes to prevent a fast sender from overwhelming a slow receiver.
It provides a mechanism for the receiver to control the transmission speed, so that
the receiving node is not overwhelmed with data from transmitting node.
Error Control
Error control at the transport layer is responsible for
1. Detecting and discarding corrupted packets.
2. Keeping track of lost and discarded packets and resending them.
3. Recognizing duplicate packets and discarding them.
4. Buffering out-of-order packets until the missing packets arrive.
Error Control involves Error Detection and Error Correction
Congestion Control
Congestion in a network may occur if the load on the network (the number of packets
sent to the network) is greater than the capacity of the network (the number of packets
a network can handle).
Congestion control refers to the mechanisms and techniques that control the congestion
and keep the load below the capacity.
Congestion Control refers to techniques and mechanisms that can either prevent
congestion, before it happens, or remove congestion, after it has happened
Congestion control mechanisms are divided into two categories,
1. Open loop - prevent the congestion before it happens.
2. Closed loop - remove the congestion after it happens
Demultiplexing
Delivering the received segments at the receiver side to the correct app layer processes is
called demultiplexing.
The destination host receives the IP datagrams; each datagram has a source IP address and a
destination IP address.
Each datagram carries 1 transport layer segment.
Each segment has the source and destination port number.
The destination host uses the IP addresses and port numbers to direct the segment to the
appropriate socket.
Multiplexing and demultiplexing are just concepts that describe the process of the
transmission of data generated by different applications simultaneously. When the data
arrives at the Transport layer, each data segment is independently processed and sent to its
appropriate application in the destination machine.
Connection-oriented Service: gives a guarantee of reliability. Ex: TCP (Transmission Control Protocol)
Connection-less Service: does not give a guarantee of reliability. Ex: UDP (User Datagram Protocol)
UDP Header
The UDP header is a simple, fixed-size 8-byte header, while the TCP header may vary from
20 bytes to 60 bytes. The first 8 bytes contain all necessary header information and the
remaining part consists of data. UDP port number fields are each 16 bits long, so the range
for port numbers is 0 to 65,535; port number 0 is reserved. Port numbers help to
distinguish different user requests or processes.
Source Port: Source Port is a 2 Byte long field used to identify the port number of the source.
Destination Port: It is a 2-byte long field used to identify the port number of the destination application.
Length: Length is the length of UDP including the header and the data. It is a 16-bit field.
Checksum: Checksum is 2 Bytes long field. It is the 16-bit one’s complement of the one’s
complement sum of the UDP header, the pseudo-header of information from the IP header,
and the data, padded with zero octets at the end (if necessary) to make a multiple of two
octets.
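For illustration, the 8-byte header layout can be built directly with Python's struct module; the port numbers, length and payload below are assumed example values, and a checksum of 0 here stands for "not computed" (permitted for UDP over IPv4).

    # Pack an 8-byte UDP header: source port, destination port, length and checksum,
    # each a 16-bit field in network (big-endian) byte order. Values are examples only.
    import struct

    payload = b"hello"
    src_port, dst_port = 50000, 53      # e.g. an ephemeral port talking to DNS
    length = 8 + len(payload)           # header (8 bytes) + data
    checksum = 0                        # 0 means "checksum not computed" (IPv4 only)

    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    datagram = header + payload
    print(len(datagram), header.hex())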
Applications of UDP
Used for simple request-response communication when the size of the data is small and hence
there is less concern about flow and error control.
It is a suitable protocol for multicasting as UDP supports packet switching.
UDP is used for some routing update protocols like RIP(Routing Information Protocol).
Normally used for real-time applications which cannot tolerate uneven delays between
sections of a received message.
VoIP (Voice over Internet Protocol) services, such as Skype and WhatsApp, use UDP for real-
time voice communication. The delay in voice communication can be noticeable if packets
are delayed due to congestion control, so UDP is used to ensure fast and efficient data
transmission.
DNS (Domain Name System) also uses UDP for its query/response messages. DNS queries are
typically small and require a quick response time, making UDP a suitable protocol for this
application.
DHCP (Dynamic Host Configuration Protocol) uses UDP to dynamically assign IP addresses to
devices on a network. DHCP messages are typically small, and the delay caused by packet
loss or retransmission is generally not critical for this application.
The following implementations use UDP as a transport-layer protocol:
o NTP (Network Time Protocol)
o DNS (Domain Name Service)
o BOOTP, DHCP.
o NNP (Network News Protocol)
o Quote of the day protocol
o TFTP, RTSP, RIP.
The application layer can perform some tasks through UDP:
o Trace Route
o Record Route
o Timestamp
UDP simply attaches its small header to the data and passes the datagram on, with no
connection setup or reliability machinery, so it works fast.
UDP Checksum
i. The checksum includes three sections: a pseudo-header, the UDP header, and the data
coming from the application layer.
ii. The pseudo-header is the part of the header of the IP packet in which the user datagram is
to be encapsulated, with some fields filled with 0s (see Figure 1).
iii. If the checksum did not include the pseudo-header, a user datagram might arrive safe and
sound; however, if the IP header were corrupted, it might be delivered to the wrong host.
iv. The protocol field is added to ensure that the packet belongs to UDP, and not to TCP.
v. The value of the protocol field for UDP is 17. If this value is changed during transmission,
the checksum calculation at the receiver will detect it and UDP drops the packet; it is not
delivered to the wrong protocol.
Checksum Calculation:
UDP checksum calculation is similar to TCP checksum computation. It is also the 16-bit one's
complement of the one's complement sum of the pseudo-header plus the UDP datagram.
Sender side:
1. Treat the segment contents (together with the pseudo-header) as a sequence of 16-bit integers.
2. Add all the 16-bit words; call the result sum.
3. Checksum = 1's complement of sum (in 1's complement, all 0s are converted into 1s and all 1s
are converted into 0s).
4. The sender puts this checksum value in the UDP checksum field.
Receiver side:
1. The receiver computes the checksum over the same sections.
2. All the 16-bit words are added, and then the sum is added to the sender's checksum.
3. If the result contains any 0 bit, an error is detected and the receiver discards the packet;
an error-free segment yields all 1s (a worked sketch follows below).
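The following Python sketch illustrates this one's complement calculation over an assumed pseudo-header, UDP header and payload (the IP addresses and ports are made up for the example); the final assertion shows the receiver-side check that an error-free segment sums to all 1s.

    # 16-bit one's complement checksum as used by UDP (and TCP).
    import struct

    def internet_checksum(data: bytes) -> int:
        if len(data) % 2:
            data += b"\x00"                            # pad to a multiple of two octets
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # end-around carry
        return ~total & 0xFFFF                         # one's complement of the sum

    # Assumed pseudo-header (IPv4): source IP, destination IP, zero, protocol 17, UDP length.
    src_ip = bytes([192, 168, 1, 2])
    dst_ip = bytes([192, 168, 1, 3])
    payload = b"hi"
    udp_len = 8 + len(payload)
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)
    udp_header = struct.pack("!HHHH", 50000, 53, udp_len, 0)   # checksum field 0 while computing

    checksum = internet_checksum(pseudo + udp_header + payload)
    print(hex(checksum))

    # Receiver check: with the real checksum filled in, the one's complement sum of the
    # whole thing is all 1s, so internet_checksum() of it returns 0; any 0 bit means error.
    udp_header2 = struct.pack("!HHHH", 50000, 53, udp_len, checksum)
    assert internet_checksum(pseudo + udp_header2 + payload) == 0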
Similarly, over an unreliable channel we have to design both the sending and the receiving side. The
sending side of the protocol is invoked from the layer above by a call to rdt_send(); it is passed the
data that is to be delivered to the application layer at the receiving side (rdt_send() is a function for
sending data, where rdt stands for reliable data transfer protocol and _send() indicates the sending side).
On the receiving side, rdt_rcv() (a function for receiving data, where _rcv() indicates the receiving
side) will be called when a packet arrives from the receiving side of the unreliable channel.
When the rdt protocol wants to deliver data to the application layer, it will do so by calling
deliver_data() (where deliver_data() is a function for delivering data to upper layer).
In the reliable data transfer protocol we consider only the case of unidirectional data transfer, that is,
transfer of data from the sending side to the receiving side (only in one direction). The case of
bidirectional (full-duplex, i.e., data transfer in both directions) data transfer is conceptually more
difficult. Although we consider only unidirectional data transfer, it is important to note that the
sending and receiving sides of our protocol will still need to transmit packets in both directions, as
shown in the figure above.
In order to exchange packets containing the data to be transferred, both the sending and receiving
sides of rdt also need to exchange control packets in both directions (i.e., back and forth). Both sides
of rdt send packets to the other side by a call to udt_send() (udt_send() is a function used for sending
data to the other side, where udt stands for unreliable data transfer protocol).
Receiving Side: On receiving data from the channel, RDT simply accepts data via
the rdt_rcv(data) event. Then it extracts the data from the packet (via the extract(packet, data)) and
sends the data to the application layer using the deliver_data(data) event.
RDT1.0: Receiving Side FSM
No feedback is required by the receiving side as the channel is perfectly reliable, that is, no errors are
possible during the data transmission through the underlying channel.
Another technique is receiver feedback: since the sender and receiver are executing on different end
systems, the only way for the sender to learn of the receiver's situation, i.e., whether or not a packet
was received correctly, is for the receiver to provide explicit feedback to the sender. The positive
(ACK) and negative (NAK) acknowledgement replies in the message-dictation scenario are an
example of such feedback. A value of 0 indicates a NAK and a value of 1 indicates an ACK.
Sending Side
The sending side of RDT 2.0 has two states. In one state, the send-side protocol is waiting for data to
be passed down from the upper layer. In the other state, the sender protocol is waiting for an ACK or
a NAK packet from the receiver (a feedback). If an ACK packet is received, i.e.,
rdt_rcv(rcvpkt) && isACK(rcvpkt), the sender knows that the most recently transmitted packet has
been received correctly and thus the protocol returns to the state of waiting for data from the upper
layer.
If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be
returned by the receiver in response to the retransmitted data packet. It is important to note that
when the sender is in the wait-for-ACK-or-NAK state, it cannot get more data from the upper layer;
that will only happen after the sender receives an ACK and leaves this state. Thus, the sender will not
send a new piece of data until it is sure that the receiver has correctly received the current packet.
Because of this behavior, the protocol is also known as the Stop-and-Wait protocol.
Receiving Side
The receiver side has a single state: as soon as a packet arrives, the receiver replies with either an
ACK or a NAK, depending on whether or not the received packet is corrupted, i.e., by using
rdt_rcv(rcvpkt) && corrupt(rcvpkt) when a packet is received and found to be in error, or
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) when the packet received is correct.
RDT 2.0 may look as if it works, but it has some flaws. It is difficult to know whether the bits of the
ACK/NAK packets themselves have been corrupted, and if they are, how the protocol will recover
from these errors in the ACK or NAK packets. The difficulty is that if an ACK or NAK is corrupted, the
sender has no way of knowing whether or not the receiver has correctly received the last piece of
transmitted data.
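A rough Python sketch of this stop-and-wait behaviour is given below. The function names rdt_send(), udt_send() and deliver_data() follow the notes; the in-memory channel, the CRC-based checksum and the corruption probability are illustrative assumptions, and (as in RDT 2.0 itself) corruption of the ACK/NAK replies is not handled.

    # Stop-and-wait (RDT 2.0 style) over a simulated channel.
    import random
    import zlib

    def make_pkt(data: bytes) -> bytes:
        return data + zlib.crc32(data).to_bytes(4, "big")      # data + checksum

    def corrupt(pkt: bytes) -> bool:
        return zlib.crc32(pkt[:-4]).to_bytes(4, "big") != pkt[-4:]

    def udt_send(pkt: bytes) -> bytes:
        """Unreliable channel: occasionally flips one byte of the packet."""
        if random.random() < 0.3:
            i = random.randrange(len(pkt))
            pkt = pkt[:i] + bytes([pkt[i] ^ 0xFF]) + pkt[i + 1:]
        return pkt

    def deliver_data(data: bytes):
        print("delivered to application layer:", data)

    def receiver(pkt: bytes) -> bytes:
        """Receiver replies NAK for a corrupted packet, otherwise delivers and ACKs."""
        if corrupt(pkt):
            return b"NAK"
        deliver_data(pkt[:-4])
        return b"ACK"

    def rdt_send(data: bytes):
        """Sender stays in the wait-for-ACK-or-NAK state until an ACK arrives."""
        pkt = make_pkt(data)
        while True:
            reply = receiver(udt_send(pkt))     # send and wait for feedback
            if reply == b"ACK":
                return                          # back to waiting for data from above
            # NAK received: retransmit the same packet (stop and wait)

    rdt_send(b"message 1")
    rdt_send(b"message 2")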
Pipelined Reliable Data Transfer Protocols
Pipelined reliable data transfer protocols allow the sender to start sending a second packet before
receiving acknowledgment for the first, which can improve performance. This is possible because the
packets in transit are visualized as filling a pipeline.
In computer networking, pipelining is the method of sending multiple data units without waiting for
an acknowledgment for the first frame sent. Pipelining ensures better utilization of network
resources and also increases the speed of delivery, particularly in situations where a large number of
data units make up a message to be sent.
Go-Back-N
Go-Back-N ARQ is a protocol in the transport layer that uses the sliding window protocol to
exchange data.
The Go-Back-N protocol is a data link layer and transport layer protocol that employs the sliding
window approach to send data frames reliably and sequentially. We'll first look at the sliding window
protocol and then review the Go-Back-N functions. The sliding window protocol enables sending
numerous frames at once.
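A simplified Python sketch of the Go-Back-N sender window is shown below; the window size, the frames, the simulated loss probability and the in-line receiver are all assumptions for illustration, and a real implementation would use an actual retransmission timer rather than the loop shown here.

    # Go-Back-N sender sketch: up to N unacknowledged frames may be outstanding,
    # cumulative ACKs slide the window base, and a "timeout" resends from the base.
    import random

    N = 4                                        # window size
    frames = [f"frame-{i}".encode() for i in range(8)]

    expected = 0                                 # receiver: next in-order sequence number
    def receive(seq: int, data: bytes) -> int:
        """GBN receiver: accepts only in-order frames, returns the cumulative ACK."""
        global expected
        if random.random() < 0.2:                # simulate loss in the channel
            return expected - 1
        if seq == expected:
            print("receiver delivered", data)
            expected += 1
        return expected - 1                      # last in-order frame received

    base = 0
    while base < len(frames):
        last_ack = base - 1
        # Send every frame currently allowed by the window [base, base + N).
        for seq in range(base, min(base + N, len(frames))):
            last_ack = max(last_ack, receive(seq, frames[seq]))
        if last_ack >= base:
            base = last_ack + 1                  # slide the window forward
        else:
            print("timeout: go back to frame", base)   # resend from base on the next pass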
Selective repeat
It is also a sliding window protocol, used for error detection and control in the data link
layer.
In the selective repeat, the sender sends several frames specified by a window size even without the
need to wait for individual acknowledgement from the receiver as in Go-Back-N ARQ. In selective
repeat protocol, the retransmitted frame is received out of sequence.
In Selective Repeat ARQ only the lost or error frames are retransmitted, whereas correct frames are
received and buffered.
The receiver while keeping track of sequence numbers buffers the frames in memory and sends
NACK for only frames which are missing or damaged. The sender will send/retransmit a packet for
which NACK is received.
Refer to Unit II; the explanation is similar.
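For illustration, the following small Python sketch shows the receiver-side buffering that distinguishes Selective Repeat: out-of-order frames are held in a buffer and delivered to the application only when the missing frames arrive (the sequence numbers and frames are made-up examples).

    # Selective Repeat receiver sketch: out-of-order frames are buffered and delivered
    # to the application only once the missing frames arrive.
    buffer = {}           # sequence number -> frame, for frames received out of order
    next_expected = 0     # lowest sequence number not yet delivered

    def sr_receive(seq: int, frame: bytes):
        global next_expected
        buffer[seq] = frame                     # accept any frame within the window
        # Deliver the longest in-order run starting at next_expected.
        while next_expected in buffer:
            print("delivered", buffer.pop(next_expected))
            next_expected += 1

    # Frame 1 arrives before frame 0 (e.g. frame 0 is being retransmitted after a loss).
    sr_receive(1, b"frame-1")   # buffered, nothing delivered yet
    sr_receive(0, b"frame-0")   # delivers frame-0 and then frame-1, in order
    sr_receive(2, b"frame-2")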
TCP connection
TCP is a connection-oriented protocol and every connection-oriented protocol needs to establish a
connection in order to reserve resources at both the communicating ends.
Connection Establishment –
TCP connection establishment involves a three-way handshake to ensure reliable communication
between the two devices. The steps of the handshake are as follows:
1. Sender starts the process with the following:
Sequence number (Seq=521): contains the random initial sequence number generated at
the sender side.
Syn flag (Syn=1): request the receiver to synchronize its sequence number with the above-
provided sequence number.
Maximum segment size (MSS=1460 B): the sender announces its maximum segment size, so that
the receiver sends segments that will not require fragmentation. The MSS field is carried
inside the Options field of the TCP header.
Window size (window=14600 B): the sender advertises its receive-buffer capacity, i.e., how
much data from the receiver it can store.
2. TCP is a full-duplex protocol so both sender and receiver require a window for receiving messages
from one another.
Sequence number (Seq=2000): contains the random initial sequence number generated at
the receiver side.
Syn flag (Syn=1): request the sender to synchronize its sequence number with the above-
provided sequence number.
Maximum segment size (MSS=500 B): the receiver announces its maximum segment size, so that
the sender sends segments that will not require fragmentation. The MSS field is carried
inside the Options field of the TCP header.
Since the receiver's MSS (500 B) is smaller than the sender's MSS (1460 B), both parties agree
on the minimum, i.e., 500 B, to avoid fragmentation of packets at both ends.
Window size (window=10000 B): the receiver advertises its receive-buffer capacity, i.e., how
much data from the sender it can store.
Acknowledgement Number (Ack no.=522): since sequence number 521 was received and the
SYN flag consumes one sequence number, the receiver requests the next sequence number
with Ack no.=522, which is the next byte it expects.
ACK flag (ACK=1): indicates that the acknowledgement number field contains the next
sequence number expected by the receiver.
3. Sender makes the final reply for connection establishment in the following way:
Sequence number (Seq=522): since the sequence number in step 1 was 521 and the SYN flag
consumes one sequence number, the next sequence number will be 522.
Acknowledgement Number (Ack no.=2001): since the sender is acknowledging the SYN=1
packet from the receiver, which carried sequence number 2000, the next sequence number
expected is 2001.
ACK flag (ACK=1): indicates that the acknowledgement number field contains the next
sequence number expected by the sender.
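In practice the three-way handshake is performed by the operating system's TCP implementation. The sketch below (illustrative only; the loopback address and port 6000 are arbitrary choices) shows that connect() on the client triggers the SYN, the listening server's kernel answers with SYN+ACK, and the client's ACK completes the handshake before accept() and connect() return.

    # The handshake itself is done by the kernel's TCP stack: connect() sends the SYN,
    # the listening side answers SYN+ACK, and the final ACK completes the connection.
    import socket
    import threading

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 6000))
    srv.listen(1)                              # passive open: ready to answer a SYN

    def server():
        conn, addr = srv.accept()              # returns once the handshake completes
        print("server: connection from", addr)
        print("server: received", conn.recv(1024))
        conn.sendall(b"pong")
        conn.close()

    t = threading.Thread(target=server)
    t.start()

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(("127.0.0.1", 6000))           # active open: SYN, SYN+ACK, ACK
    cli.sendall(b"ping")                       # afterwards data flows as a byte stream
    print("client: received", cli.recv(1024))
    cli.close()
    t.join()
    srv.close()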
TCP is a byte-oriented protocol, which means that the sender writes bytes into a
TCP connection and the receiver reads bytes out of the TCP connection.
TCP does not, itself, transmit individual bytes over the Internet.
TCP on the source host buffers enough bytes from the sending process to fill a
reasonably sized packet and then sends this packet to its peer on the destination
host.
TCP on the destination host then empties the contents of the packet into a
receive buffer, and the receiving process reads from this buffer at its leisure.
TCP connection supports byte streams flowing in both directions.
The packets exchanged between TCP peers are called segments, since each one
carries a segment of the byte stream
TCP PACKET FORMAT
A TCP segment consists of data bytes to be sent and a header that is added to the
data by TCP as shown:
The header of a TCP segment can range from 20 to 60 bytes; up to 40 bytes are for options.
If there are no options, the header is 20 bytes; otherwise it can be at most 60 bytes.
Header fields:
Source Port Address –
A 16-bit field that holds the port address of the application that is sending the data
segment.
Sequence Number –
A 32-bit field that holds the sequence number, i.e., the byte number of the first
byte sent in that particular segment. It is used to reassemble the message at
the receiving end when segments arrive out of order.
Acknowledgement Number –
A 32-bit field that holds the acknowledgement number, i.e., the byte number that
the receiver expects to receive next. It acknowledges that all previous
bytes have been received successfully.
Control flags –
These are six 1-bit control flags that govern connection establishment, connection
termination, connection abortion, flow control, mode of transfer, etc. Their
functions are:
URG: Urgent pointer is valid
ACK: Acknowledgement number is valid (used in case of cumulative
acknowledgement)
PSH: Request for push
RST: Reset the connection
SYN: Synchronize sequence numbers
FIN: Terminate the connection
Window size –
This field advertises the window size of the sending TCP in bytes, i.e., the amount
of data it is willing to receive.
Checksum –
This field holds the checksum for error control. It is mandatory in TCP as opposed
to UDP.
Urgent pointer –
This field (valid only if the URG control flag is set) points to urgent data that
needs to reach the receiving process at the earliest. The value of this field is
added to the sequence number to get the byte number of the last urgent byte.
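As an aid to reading the header layout, the Python sketch below unpacks the fixed 20-byte portion of a TCP header; the sample segment is fabricated for illustration (a SYN with sequence number 521 and window 14600, matching the earlier example) and is not captured traffic.

    # Unpack the fixed 20-byte part of a TCP header. The sample bytes are fabricated.
    import struct

    def parse_tcp_header(segment: bytes) -> dict:
        (src_port, dst_port, seq, ack,
         offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
        header_len = (offset_flags >> 12) * 4      # data offset is counted in 32-bit words
        flags = offset_flags & 0x3F                # URG, ACK, PSH, RST, SYN, FIN bits
        return {"src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
                "header_len": header_len, "window": window, "checksum": checksum,
                "urg_ptr": urg_ptr, "SYN": bool(flags & 0x02), "ACK": bool(flags & 0x10),
                "FIN": bool(flags & 0x01)}

    # A made-up SYN segment: ports 50000 -> 80, seq 521, ack 0, 20-byte header, window 14600.
    sample = struct.pack("!HHIIHHHH", 50000, 80, 521, 0, (5 << 12) | 0x02, 14600, 0, 0)
    print(parse_tcp_header(sample))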
Measures of RTT
The following three quantities are computed from RTT measurements:
1. Estimated RTT
2. Deviation in RTT
3. Time-out Interval
Estimated RTT
In computer communication networking, the RTTs of different packets can be different. For
example, the first packet takes the round trip time of 1.1ms, the second packet takes 1.3ms,
and the third packet takes 0.98ms so, each sample RTT varies. That is why estimated RTT is
used, as it is the average of recent measurements, not just the current sample RTT.
Formula:
Estimated RTT = (1- α) * Estimated RTT + α * Sample RTT
Where typically, the value of α = 0.125.
For example, the sample RTT is 100ms, we have to compute the estimated RTT using α =
0.125, and we assume the value of the estimated RTT just before the sample RTT
was 110ms. So, by using the formula, we get:
Estimated RTT = (1-0.125) * 110ms + (0.125) * 100ms
Estimated RTT = (0.875) * 110ms + 12.5ms
Estimated RTT = 96.25ms + 12.5ms = 108.75ms
Hence, the required estimated RTT is 108.75ms.
Deviation in RTT
Deviation in RTT, also known as DevRTT, is a measure that indicates how much the sample
RTTs typically deviate from the estimated RTT. It depends on the previous DevRTT and the
current estimated RTT, and it helps to determine the retransmission time-out.
Formula:
DevRTT = (1- β) * DevRTT + β * |Sample RTT - Estimated RTT|
Where typically, the value of β = 0.125.
For example, the sample RTT is 100ms, we have to compute the DevRTT, and we assume the
value of the estimated RTT is 108.75ms and the previous DevRTT was 20ms. So, by using the
formula, we get:
Dev RTT = (1-0.125) * 20ms + (0.125) * |100ms - 108.75ms|
Dev RTT = (0.875) * 20ms + (0.125) * 8.75ms
Dev RTT = 17.5ms + 1.09ms = 18.59ms
Hence, the required DevRTT is 18.59ms
Time-out
The time-out is longer than the RTT, but because the RTT varies we need to add a safety
margin to it. Selecting the time-out value carefully is essential: if it is too long,
retransmission of a lost segment is delayed and we face long delays. Similarly, if it is too
small, the sender may time out before the response or acknowledgment has had time to
arrive, causing unnecessary retransmissions. So, for safety, we
use:
Formula:
Time-out Interval = 4 * DevRTT + Estimated RTT
Where the DevRTT value is used for a safety margin.
So, using the above-computed values of DevRTT = 18.59ms and Estimated RTT = 108.75ms,
we compute the time-out interval as:
Time-out Interval = 4 * 18.59ms + 108.75ms
Time-out Interval = 74.36ms + 108.75ms = 183.11ms
Hence, the required time-out interval is 183.11ms.
We can see that the time-out interval is dependent on both DevRTT and estimated RTT. Also,
DevRTT is dependent on estimated RTT. So, if we need to find DevRTT, we have to compute
the estimated RTT first. To compute the time-out interval, we have to find both DevRTT and
estimated RTT first.
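The three formulas can be combined into a few lines of Python. The sketch below uses the same numbers as the worked example above, with α = 0.125 and β = 0.125 as stated in these notes (when intermediate values are not rounded, the time-out comes out as 183.12 ms rather than 183.11 ms).

    # EWMA-based RTT estimation using the three formulas above.
    ALPHA = 0.125       # weight for Estimated RTT, as given in these notes
    BETA = 0.125        # weight for DevRTT, as given in these notes

    def update_rtt(estimated_rtt: float, dev_rtt: float, sample_rtt: float):
        estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
        dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
        timeout = estimated_rtt + 4 * dev_rtt
        return estimated_rtt, dev_rtt, timeout

    # Previous Estimated RTT = 110 ms, previous DevRTT = 20 ms, new Sample RTT = 100 ms.
    est, dev, to = update_rtt(110.0, 20.0, 100.0)
    print(f"Estimated RTT = {est:.2f} ms")   # 108.75 ms
    print(f"DevRTT        = {dev:.2f} ms")   # 18.59 ms
    print(f"Time-out      = {to:.2f} ms")    # 183.12 ms (183.11 ms when DevRTT is rounded first)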
Connection Establishment
While opening a TCP connection, the two nodes (client and server) want to agree on a set
of parameters.
The parameters are the starting sequence numbers that are to be used for their respective
byte streams.
Connection establishment in TCP is a three-way handshaking.
1. Client sends a SYN segment to the server containing its initial sequence number (Flags =
SYN, SequenceNum = x)
2. Server responds with a segment that acknowledges the client's segment and specifies its
initial sequence number (Flags = SYN + ACK, Ack = x + 1, SequenceNum = y).
3. Finally, client responds with a segment that acknowledges server’s sequence number
(Flags = ACK, ACK = y + 1).
The reason that each side acknowledges a sequence number that is one larger than the
one sent is that the Acknowledgment field actually identifies the “next sequence number
expected,”
A timer is scheduled for each of the first two segments, and if the expected response is not
received, the segment is retransmitted.
Data Transfer
After connection is established, bidirectional data transfer can take place.
The client and server can send data and acknowledgments in both directions.
The data traveling in the same direction as an acknowledgment are carried on the same
segment.
The acknowledgment is piggybacked with the data.
Connection Termination
Connection termination or teardown can be done in two ways:
Three-way Close and Half-Close.
Three-way Close: both client and server close simultaneously.
Half-Close: one end stops sending data while still being able to receive data from the other end.
1. Retransmission Policy :
This policy governs how retransmission of packets is handled. If the sender believes
that a sent packet is lost or corrupted, the packet needs to be retransmitted.
This retransmission may increase the congestion in the network.
To prevent this, retransmission timers must be designed both to avoid
congestion and to optimize efficiency.
2. Window Policy :
The type of window at the sender’s side may also affect the congestion. Several
packets in the Go-back-n window are re-sent, although some packets may be
received successfully at the receiver side. This duplication may increase the
congestion in the network and make it worse.
Therefore, Selective repeat window should be adopted as it sends the specific packet
that may have been lost.
3. Discarding Policy :
A good discarding policy adopted by routers is one that prevents congestion by
discarding corrupted or less sensitive packets while still being able to maintain
the quality of a message.
In case of audio file transmission, routers can discard less sensitive packets to
prevent congestion and also maintain the quality of the audio file.
4. Acknowledgment Policy :
Since acknowledgements are also part of the load on the network, the
acknowledgment policy imposed by the receiver may also affect congestion. Several
approaches can be used to prevent congestion related to acknowledgments.
The receiver should send acknowledgement for N packets rather than sending
acknowledgement for a single packet. The receiver should send an acknowledgment
only if it has to send a packet or a timer expires.
5. Admission Policy :
Under an admission policy, a mechanism is used to prevent congestion: switches in a
flow should first check the resource requirements of a network flow before
transmitting it further. If there is a chance of congestion, or congestion already
exists in the network, the router should refuse to establish a virtual-circuit
connection to prevent further congestion.
All the above policies are adopted to prevent congestion before it happens in the
network.
Congestion control ensures that a network can handle traffic efficiently without data
loss. The mechanisms below are closed-loop: they try to alleviate congestion after it has happened.
1. Backpressure :
Backpressure is a technique in which a congested node stops receiving packets from the
upstream node. This may cause the upstream node or nodes to become congested and to
reject data from the nodes above them. Backpressure is a node-to-node congestion-control
technique that propagates in the direction opposite to the data flow. The backpressure technique
can be applied only to virtual-circuit networks, where each node knows its upstream
node.
In the above diagram, the third node is congested and stops receiving packets; as a result, the
second node may get congested because its output data flow slows down. Similarly, the first
node may get congested and inform the source to slow down.
4. Explicit Signaling :
In explicit signaling, if a node experiences congestion it can explicitly send a packet to the
source or destination to inform about congestion. The difference between choke packet and
explicit signaling is that the signal is included in the packets that carry data rather than
creating a different packet as in case of choke packet technique.
Explicit signaling can occur in either forward or backward direction.
Forward Signaling : In forward signaling, a signal is sent in the direction of the
congestion. The destination is warned about the congestion. The receiver in this case
adopts policies to prevent further congestion.
Backward Signaling : In backward signaling, a signal is sent in the opposite direction
of the congestion. The source is warned about congestion and it needs to slow down.
Figure 3.42: Congestion scenario 1: Throughput and delay as a function of host sending rate
Achieving a per-connection throughput of R/2 might actually appear to be a "good thing," as
the link is fully utilized in delivering packets to their destinations. The right-hand graph in
Figure 3.42, however, shows the consequences of operating near link capacity. As the
sending rate approaches R/2 (from the left), the average delay becomes larger and larger.
When the sending rate exceeds R/2, the average number of queued packets in the router is
unbounded, and the average delay between source and destination becomes infinite
(assuming that the connections operate at these sending rates for an infinite period of time).
Thus, while operating at an aggregate throughput of near R may be ideal from a throughput
standpoint, it is far from ideal from a delay standpoint. Even in this (extremely) idealized
scenario, we've already found one cost of a congested network--large queuing delays are
experienced as the packet-arrival rate nears the link capacity.
Scenario 2: Two Senders, a Router with Finite Buffers
Let us now slightly modify scenario 1 in the following two ways (see Figure 3.43). First, the
amount of router buffering is assumed to be finite. Second, we assume that each connection
is reliable. If a packet containing a transport-level segment is dropped at the router, it will
eventually be retransmitted by the sender. Because packets can be retransmitted, we must
now be more careful with our use of the term "sending rate." Specifically, let us again
denote the rate at which the application sends original data into the socket by λin bytes/sec.
The rate at which the transport layer sends segments (containing original
data or retransmitted data) into the network will be denoted λ'in bytes/sec. λ'in is
sometimes referred to as the offered load to the network.
Figure 3.43: Scenario 2: Two hosts (with retransmissions) and a router with finite buffers
The performance realized under scenario 2 will now depend strongly on how retransmission
is performed. First, consider the unrealistic case that Host A is able to somehow (magically!)
determine whether or not a buffer is free in the router and thus sends a packet only when a
buffer is free. In this case, no loss would occur, λin would be equal to λ'in, and the
throughput of the connection would be equal to λin. This case is shown by the upper curve
in Figure 3.44(a). From a throughput standpoint, performance is ideal--everything that is
sent is received. Note that the average host sending rate cannot exceed R/2 under this
scenario, since packet loss is assumed never to occur.
Figure 3.44: Scenario 2 performance
Consider next the slightly more realistic case that the sender retransmits only when a packet
is known for certain to be lost. (Again, this assumption is a bit of a stretch. However, it is
possible that the sending host might set its timeout large enough to be virtually assured that
a packet that has not been acknowledged has been lost.) In this case, the performance
might look something like that shown in Figure 3.44(b). To appreciate what is happening
here, consider the case that the offered load, λ'in (the rate of original data transmission plus
retransmissions), equals 0.5R. According to Figure 3.44(b), at this value of the offered load,
the rate at which data are delivered to the receiver application is R/3. Thus, out of the
0.5R units of data transmitted, 0.333R bytes/sec (on average) are original data and
0.166R bytes per second (on average) are retransmitted data. We see here another cost of a
congested network--the sender must perform retransmissions in order to compensate for
dropped (lost) packets due to buffer overflow.
Finally, let us consider the case that the sender may timeout prematurely and retransmit a
packet that has been delayed in the queue, but not yet lost. In this case, both the original
data packet and the retransmission may both reach the receiver. Of course, the receiver
needs but one copy of this packet and will discard the retransmission. In this case, the
"work" done by the router in forwarding the retransmitted copy of the original packet was
"wasted," as the receiver will have already received the original copy of this packet. The
router would have better used the link transmission capacity to send a different packet
instead. Here then is yet another cost of a congested network--unneeded retransmissions by
the sender in the face of large delays may cause a router to use its link bandwidth to forward
unneeded copies of a packet. The lower curve in Figure 3.44(a) shows the throughput versus
offered load when each packet is assumed to be forwarded (on average) twice by the router.
Since each packet is forwarded twice, the throughput achieved will be given by the line
segment in Figure 3.44(a) with the asymptotic value of R/4.
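As a rough illustration of the limiting behaviour described above (not the exact curves of Figure 3.44), the short Python sketch below assumes each sender can obtain at most R/2 of the shared link, and that in the premature-timeout case every packet is forwarded twice on average, so only half of what is delivered is useful, giving the R/4 asymptote.

    # Limiting behaviour of scenario 2 under the stated idealizations (illustration only).
    R = 1.0                                    # shared link capacity (normalized)

    def goodput_no_loss(offered: float) -> float:
        # "Magic" sender that only transmits when a router buffer is free: no loss,
        # so goodput equals the offered load, capped at this sender's share R/2.
        return min(offered, R / 2)

    def goodput_double_forwarding(offered: float) -> float:
        # Premature timeouts: on average every packet is forwarded twice, so only
        # half of the delivered bytes are useful -- the curve approaches R/4.
        return min(offered, R / 2) / 2

    for load in (0.1, 0.25, 0.5, 1.0):
        print(f"offered load {load:.2f}R -> ideal {goodput_no_loss(load):.3f}R, "
              f"with duplicate forwarding {goodput_double_forwarding(load):.3f}R")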
Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths
In our final congestion scenario, four hosts transmit packets, each over overlapping two-hop
paths, as shown in Figure 3.45. We again assume that each host uses a timeout/
retransmission mechanism to implement a reliable data transfer service, that all hosts have
the same value of λin, and that all router links have capacity R bytes/sec.
Figure 3.45: Four senders, routers with finite buffers, and multihop paths
Let us consider the connection from Host A to Host C, passing through Routers R1 and R2.
The A-C connection shares router R1 with the D-B connection and shares router R2 with the
B-D connection. For extremely small values of λin, buffer overflows are rare (as in congestion
scenarios 1 and 2), and the throughput approximately equals the offered load. For slightly
larger values of λin, the corresponding throughput is also larger, as more original data is
being transmitted into the network and delivered to the destination, and overflows are still
rare. Thus, for small values of λin, an increase in λin results in an increase in λout.
Having considered the case of extremely low traffic, let us next examine the case that
λin (and hence λ'in) is extremely large. Consider router R2. The A-C traffic arriving at router R2
(which arrives at R2 after being forwarded from R1) can have an arrival rate at R2 that is at
most R, the capacity of the link from R1 to R2, regardless of the value of λin. If λ'in is
extremely large for all connections (including the B-D connection), then the arrival rate of B-
D traffic at R2 can be much larger than that of the A-C traffic. Because the A-C and B-D traffic
must compete at router R2 for the limited amount of buffer space, the amount of A-C traffic
that successfully gets through R2 (that is, is not lost due to buffer overflow) becomes smaller
and smaller as the offered load from B-D gets larger and larger. In the limit, as the offered
load approaches infinity, an empty buffer at R2 is immediately filled by a B-D packet, and the
throughput of the A-C connection at R2 goes to zero. This, in turn, implies that the A-C end-
end throughput goes to zero in the limit of heavy traffic. These considerations give rise to the
offered load versus throughput tradeoff shown in Figure 3.46.
Figure 3.46: Scenario 3 performance with finite buffers and multihop paths
The reason for the eventual decrease in throughput with increasing offered load is evident
when one considers the amount of wasted "work" done by the network. In the high-traffic
scenario outlined above, whenever a packet is dropped at a second-hop router, the "work"
done by the first-hop router in forwarding a packet to the second-hop router ends up being
"wasted." The network would have been equally well off (more accurately, equally bad off) if
the first router had simply discarded that packet and remained idle. More to the point, the
transmission capacity used at the first router to forward the packet to the second router
could have been much more profitably used to transmit a different packet. (For example,
when selecting a packet for transmission, it might be better for a router to give priority to
packets that have already traversed some number of upstream routers.) So here we see yet
another cost of dropping a packet due to congestion--when a packet is dropped along a
path, the transmission capacity that was used at each of the upstream routers to forward
that packet to the point at which it is dropped ends up having been wasted.
3.6.2: Approaches toward Congestion Control
In Section 3.7, we'll examine TCP's specific approach towards congestion control in great
detail. Here, we identify the two broad approaches that are taken in practice toward
congestion control, and discuss specific network architectures and congestion-control
protocols embodying these approaches.
At the broadest level, we can distinguish among congestion-control approaches based on
whether or not the network layer provides any explicit assistance to the transport layer for
congestion-control purposes:
End-end congestion control. In an end-end approach toward congestion control, the
network layer provides no explicit support to the transport layer for congestion-
control purposes. Even the presence of congestion in the network must be inferred
by the end systems based only on observed network behavior (for example, packet
loss and delay). We will see in Section 3.7 that TCP must necessarily take this end-
end approach toward congestion control, since the IP layer provides no feedback to
the end systems regarding network congestion. TCP segment loss (as indicated by a
timeout or a triple duplicate acknowledgment) is taken as an indication of network
congestion and TCP decreases its window size accordingly. We will also see that new
proposals for TCP use increasing round-trip delay values as indicators of increased
network congestion.
Network-assisted congestion control. With network-assisted congestion control,
network-layer components (that is, routers) provide explicit feedback to the sender
regarding the congestion state in the network. This feedback may be as simple as a
single bit indicating congestion at a link. This approach was taken in the early IBM
SNA [Schwartz 1982] and DEC DECnet [Jain 1989; Ramakrishnan 1990] architectures,
was recently proposed for TCP/IP networks [Floyd TCP 1994; RFC 2481], and is used
in ATM available bit-rate (ABR) congestion control as well, as discussed below. More
sophisticated network-feedback is also possible. For example, one form of ATM ABR
congestion control that we will study shortly allows a router to explicitly inform the
sender of the transmission rate it (the router) can support on an outgoing link.
For network-assisted congestion control, congestion information is typically fed back from
the network to the sender in one of two ways, as shown in Figure 3.47. Direct feedback may
be sent from a network router to the sender. This form of notification typically takes the
form of a choke packet (essentially saying, "I'm congested!"). The second form of
notification occurs when a router marks/updates a field in a packet flowing from sender to
receiver to indicate congestion. Upon receipt of a marked packet, the receiver then notifies
the sender of the congestion indication. Note that this latter form of notification takes at
least a full round-trip time.