
CN Merged May 2024 For Print

Devashish Gosain is an Assistant Professor at BITS Goa with a PhD from IIIT-Delhi and postdoctoral experience at Max Planck Institute and KU Leuven. His research interests focus on network security and privacy, and he has been a visiting researcher at several institutions. The course outlined in the document covers traditional and wireless networks, security, and management, requiring knowledge of C programming and operating systems.

Computer Networks (WILP)
Jan to May 2024
Devashish Gosain, BITS Pilani Goa

IC: Devashish Gosain
• Education
  • PhD -- IIIT-Delhi (2015-2020)
• Employment
  • Postdoctoral researcher at Max Planck Institute for Informatics (Germany), hosted by Prof. Anja Feldmann (2020-2022)
  • Postdoctoral researcher at COSIC, KU Leuven (Belgium), hosted by Prof. Claudia Diaz (2022-2023)
  • On August 1, 2023 I joined BITS Goa as Assistant Prof.
• Research Interests
  • Network Security and Privacy
• Visiting Researcher
  • TU Delft, UCSD, UW Madison, Brigham Young University (BYU)

What to expect from this course
• Learn how traditional wired networks work
• Learn how wireless networks work
• Learn about security in networks and network management
• Focus primarily on the internet

Knowledge Required
• C programming knowledge required
• Operating Systems

Text books and other resources
• James F. Kurose and Keith W. Ross, Computer Networking: A top down approach, 6th edition, Pearson In, 2017.
• Andrew S. Tanenbaum and David J. Wetherall, Computer Networks, 5th edition, Pearson In, 2013.
• I will share the slides, video recordings and a few research papers.

Let us check some key terms
• What is a client? What is a server?
• What is TCP/IP? What is UDP?
• What is a firewall?
• What is 802.11?
• What is an IP address? What is a MAC address?
• What is a packet?

Chapter 1
Introduction

A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you see the animations; and can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
• If you use these slides (e.g., in a class) that you mention their source (after all, we’d like people to use our book!)
• If you post any slides on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2012 J.F Kurose and K.W. Ross, All Rights Reserved

Computer Networking: A Top Down Approach, 6th edition, Jim Kurose, Keith Ross, Addison-Wesley, March 2012

Chapter 1: introduction
our goal:
• get “feel” and terminology
• more depth, detail later in course
• approach: use Internet as example
overview:
• what’s the Internet?
• what’s a protocol?
• network edge; hosts, access net, physical media
• network core: packet/circuit switching, Internet structure
• performance: loss, delay, throughput
• security
• protocol layers, service models
• history


Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
  • end systems, access networks, links
1.3 network core
  • packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

What’s the Internet: “nuts and bolts” view
• millions of connected computing devices:
  • hosts = end systems
  • running network apps
• communication links
  • fiber, copper, radio, satellite
  • transmission rate: bandwidth
• packet switches: forward packets (chunks of data)
  • routers and switches
[Figure: PC, server, wireless laptop and smartphone attached over wired and wireless links to a mobile network, home network and institutional network, which connect through routers to regional and global ISPs]

“Fun” internet appliances
• Web-enabled toaster + weather forecaster
• IP picture frame: http://www.ceiva.com/
• Tweet-a-watt: monitor energy use
• Slingbox: watch, control cable TV remotely
• Internet refrigerator
• Internet phones
What’s the Internet: “nuts and bolts” view
• Internet: “network of networks”
  • Interconnected ISPs
• protocols control sending, receiving of msgs
  • e.g., TCP, IP, HTTP, Skype, 802.11
• Internet standards
  • RFC: Request for comments
  • IETF: Internet Engineering Task Force

What’s the Internet: a service view
• Infrastructure that provides services to applications:
  • Web, VoIP, email, games, e-commerce, social nets, …
• provides programming interface to apps
  • hooks that allow sending and receiving app programs to “connect” to Internet
  • provides service options, analogous to postal service
[Figure: mobile network, home network and institutional network connected through regional and global ISPs]

What’s a protocol?
human protocols:
• “what’s the time?”
• “I have a question”
• introductions
… specific msgs sent
… specific actions taken when msgs received, or other events

network protocols:
• machines rather than humans
• all communication activity in Internet governed by protocols

protocols define format, order of msgs sent and received among network entities, and actions taken on msg transmission, receipt

What’s a protocol?
a human protocol and a computer network protocol:
[Figure: two timelines, time running downward. Human: “Hi” / “Hi” / “Got the time?” / “2:00”. Network: TCP connection request / TCP connection response / Get http://www.awl.com/kurose-ross / <file>]
Q: other human protocols?
A closer look at network structure:
• network edge:
  • hosts: clients and servers
  • servers often in data centers
• access networks, physical media: wired, wireless communication links
• network core:
  • interconnected routers
  • network of networks
[Figure: mobile network, home network and institutional network connected through regional and global ISPs]

Access networks and physical media
Q: How to connect end systems to edge router?
• residential access nets
• institutional access networks (school, company)
• mobile access networks
keep in mind:
• bandwidth (bits per second) of access network?
• shared or dedicated?

Access net: digital subscriber line (DSL)
[Figure: DSL modem and splitter in the home; dedicated line to central office; DSLAM (DSL access multiplexer) at the central office, connected to the telephone network and the ISP; voice, data transmitted at different frequencies over dedicated line to central office]
• use existing telephone line to central office DSLAM
  • data over DSL phone line goes to Internet
  • voice over DSL phone line goes to telephone net
• < 2.5 Mbps upstream transmission rate (typically < 1 Mbps)
• < 24 Mbps downstream transmission rate (typically < 10 Mbps)
Access net: cable network
[Figure: cable modems and splitters in homes share a cable to the cable headend; CMTS (cable modem termination system) at the headend connects to the ISP; data, TV transmitted at different frequencies over shared cable distribution network]
• HFC: hybrid fiber coax
  • asymmetric: up to 30 Mbps downstream transmission rate, 2 Mbps upstream transmission rate
• network of cable, fiber attaches homes to ISP router
  • homes share access network to cable headend
  • unlike DSL, which has dedicated access to central office

Access net: home network
[Figure: wireless devices; wireless access point (54 Mbps); wired Ethernet (100 Mbps); router, firewall, NAT; cable or DSL modem; to/from headend or central office; access point, router and modem often combined in single box]

Enterprise access networks (Ethernet)
[Figure: end systems attached to an Ethernet switch; institutional router; institutional link to ISP (Internet); institutional mail, web servers]
• typically used in companies, universities, etc
• 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps transmission rates
• today, end systems typically connect into Ethernet switch

Wireless access networks
• shared wireless access network connects end system to router
  • via base station aka “access point”
wireless LANs:
• within building (100 ft)
• 802.11b/g (WiFi): 11, 54 Mbps transmission rate
wide-area wireless access
• provided by telco (cellular) operator, 10’s km
• between 1 and 10 Mbps
• 3G, 4G: LTE
Physical media
• bit: propagates between transmitter/receiver pairs
• physical link: what lies between transmitter & receiver
• guided media:
  • signals propagate in solid media: copper, fiber, coax
• unguided media:
  • signals propagate freely, e.g., radio
twisted pair (TP)
• two insulated copper wires
  • Category 5: 100 Mbps, 1 Gbps Ethernet
  • Category 6: 10 Gbps

Physical media: coax, fiber
coaxial cable:
• two concentric copper conductors
• bidirectional
• broadband:
  • multiple channels on cable
  • HFC
fiber optic cable:
• glass fiber carrying light pulses, each pulse a bit
• high-speed operation:
  • high-speed point-to-point transmission (e.g., 10’s-100’s Gbps transmission rate)
• low error rate:
  • repeaters spaced far apart
  • immune to electromagnetic noise

Physical media: radio
• signal carried in electromagnetic spectrum
• no physical “wire”
• bidirectional
• propagation environment effects:
  • reflection
  • obstruction by objects
  • interference
radio link types:
• terrestrial microwave
  • e.g. up to 45 Mbps channels
• LAN (e.g., WiFi)
  • 11 Mbps, 54 Mbps
• wide-area (e.g., cellular)
  • 3G cellular: ~ few Mbps
• satellite
  • Kbps to 45 Mbps channel (or multiple smaller channels)
  • 270 msec end-end delay
  • geosynchronous versus low altitude


Host: sends packets of data
host sending function:
• takes application message
• breaks into smaller chunks, known as packets, of length L bits
• transmits packet into access network at transmission rate R
  • link transmission rate, aka link capacity, aka link bandwidth
[Figure: host pushes two packets, L bits each, into a link; R: link transmission rate]

packet transmission delay = time needed to transmit L-bit packet into link = L (bits) / R (bits/sec)

The network core
• mesh of interconnected routers
• packet-switching: hosts break application-layer messages into packets
  • forward packets from one router to the next, across links on path from source to destination
  • each packet transmitted at full link capacity
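The transmission delay relation on this slide (L bits divided by R bits/sec) can be checked with a few lines of Python. This is only a sketch: the function name `transmission_delay` and the sample packet size and link rate are assumptions chosen for illustration, not values from the slides.

```python
def transmission_delay(packet_bits: float, rate_bps: float) -> float:
    """Seconds needed to push an L-bit packet onto a link of rate R bits/sec."""
    if rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return packet_bits / rate_bps


# Hypothetical numbers: a 12,000-bit packet on a 100 Mbps link.
delay = transmission_delay(12_000, 100e6)
print(f"{delay * 1e6:.1f} microseconds")
```

Doubling the link rate halves the delay, and doubling the packet length doubles it, which is all the formula says.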

Packet-switching: store-and-forward
[Figure: source sends packets 3, 2, 1 (L bits per packet) over a link at R bps to a router, then over a second link at R bps to the destination]
• takes L/R seconds to transmit (push out) L-bit packet into link at R bps
• store and forward: entire packet must arrive at router before it can be transmitted on next link
• end-end delay = 2L/R (assuming zero propagation delay)
one-hop numerical example:
• L = 7.5 Mbits
• R = 1.5 Mbps
• one-hop transmission delay = 5 sec
more on delay shortly …

Alternative core: circuit switching
end-end resources allocated to, reserved for “call” between source & dest:
• In diagram, each link has four circuits.
  • call gets 2nd circuit in top link and 1st circuit in right link.
• dedicated resources: no sharing
  • circuit-like (guaranteed) performance
• circuit segment idle if not used by call (no sharing)
• Commonly used in traditional telephone networks
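The store-and-forward numbers on the slide can be reproduced directly. The sketch below assumes, as the slide does, zero propagation delay, and additionally assumes no queueing or processing delay and equal-rate links; the function name `end_to_end_delay` is made up for this example.

```python
def end_to_end_delay(packet_bits: float, rate_bps: float, num_links: int) -> float:
    """Store-and-forward delay for one packet over num_links equal-rate links.

    Each router must receive the whole packet before forwarding it, so the
    packet pays L/R once per link (propagation and queueing are ignored).
    """
    return num_links * packet_bits / rate_bps


L = 7.5e6  # 7.5 Mbits, from the slide
R = 1.5e6  # 1.5 Mbps, from the slide

one_hop = end_to_end_delay(L, R, 1)   # L/R
two_hops = end_to_end_delay(L, R, 2)  # 2L/R: source -> router -> destination
print(one_hop, two_hops)
```

With the slide's values, one hop takes 5 seconds and the two-link path takes 10 seconds.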
Packet switching versus circuit switching
is packet switching a “slam dunk winner?”
• great for bursty data
  • resource sharing
  • simpler, no call setup
• excessive congestion possible: packet delay and loss
  • protocols needed for reliable data transfer, congestion control
• Q: How to provide circuit-like behavior?
  • bandwidth guarantees needed for audio/video apps
  • still an unsolved problem (chapter 7)

Internet structure: network of networks
• End systems connect to Internet via access ISPs (Internet Service Providers)
  • Residential, company and university ISPs
• Access ISPs in turn must be interconnected.
  • So that any two hosts can send packets to each other
• Resulting network of networks is very complex
  • Evolution was driven by economics and national policies
• Let’s take a stepwise approach to describe current Internet structure

Internet structure: network of networks
Question: given millions of access ISPs, how to connect them together?
[Figure: many access nets, not yet interconnected]

Internet structure: network of networks
Option: connect each access ISP to every other access ISP?
• connecting each access ISP to each other directly doesn’t scale.
[Figure: access nets joined by a full mesh of direct links]
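To see why the "connect every access ISP to every other" option does not scale, it helps to count links. A full mesh of n nodes needs n(n-1)/2 links, while attaching every node to a single transit provider needs only n. The sketch below uses hypothetical ISP counts and an invented function name, `full_mesh_links`.

```python
def full_mesh_links(n: int) -> int:
    """Number of direct links needed so that every pair of n networks is connected."""
    return n * (n - 1) // 2


# Hypothetical counts of access ISPs, compared with one link each to a single hub.
for n in (10, 1_000, 1_000_000):
    print(f"{n:>9} access ISPs: mesh={full_mesh_links(n):>15,}  hub={n:>9,}")
```

The mesh count grows quadratically, which is the motivation for the transit-ISP options on the slides that follow.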
Internet structure: network of networks
Option: connect each access ISP to a global transit ISP? Customer and provider ISPs have economic agreement.
[Figure: access nets each connected to a single global ISP]

Internet structure: network of networks
But if one global ISP is viable business, there will be competitors ….
[Figure: access nets connected to competing global ISPs: ISP A, ISP B, ISP C]
Internet structure: network of networks
But if one global ISP is viable business, there will be competitors …. which must be interconnected
[Figure: ISP A, ISP B and ISP C interconnected through Internet exchange points (IXPs) and peering links]

Internet structure: network of networks
… and regional networks may arise to connect access nets to ISPs
[Figure: a regional net connects several access nets to ISP A, ISP B and ISP C, which interconnect through IXPs]
Internet structure: network of networks
… and content provider networks (e.g., Google, Microsoft, Akamai) may run their own network, to bring services, content close to end users
[Figure: a content provider network alongside ISP A, ISP B, IXPs, a regional net and access nets]

Internet structure: network of networks
[Figure: Tier 1 ISPs and Google at the top; IXPs; Regional ISPs; access ISPs at the bottom]
• at center: small # of well-connected large networks
  • “tier-1” commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national & international coverage
  • content provider network (e.g., Google): private network that connects its data centers to Internet, often bypassing tier-1, regional ISPs
CH 2
Application Layer

CH 2: outline
2.1 principles of network applications
2.2 Web and HTTP


CH 2: application layer
our goals:
• conceptual, implementation aspects of network application protocols
  • transport-layer service models
  • client-server paradigm
  • peer-to-peer paradigm
• learn about protocols by examining popular application-level protocols
  • HTTP

Some network apps
• e-mail
• web
• text messaging
• remote login
• P2P file sharing
• multi-user network games
• streaming stored video (YouTube, Hulu, Netflix)
• voice over IP (e.g., Skype)
• real-time video conferencing
• social networking
• search
• …

Creating a network app
write programs that:
• run on (different) end systems
• communicate over network
• e.g., web server software communicates with browser software
no need to write software for network-core devices
• network-core devices do not run user applications
• applications on end systems allows for rapid app development, propagation
[Figure: each end system runs the full stack: application, transport, network, data link, physical]

Application architectures
possible structure of applications:
• client-server
• peer-to-peer (P2P)


Client-server architecture
server:
• always-on host
• permanent IP address
• data centers for scaling
clients:
• communicate with server
• may be intermittently connected
• may have dynamic IP addresses
• do not communicate directly with each other

P2P architecture
• no always-on server
• arbitrary end systems directly communicate
• peers request service from other peers, provide service in return to other peers
  • self scalability – new peers bring new service capacity, as well as new service demands
• peers are intermittently connected and change IP addresses
  • complex management

Processes communicating
process: program running within a host
• within same host, two processes communicate using inter-process communication (defined by OS)
• processes in different hosts communicate by exchanging messages
clients, servers
client process: process that initiates communication
server process: process that waits to be contacted
• aside: applications with P2P architectures have client processes & server processes

Sockets
• process sends/receives messages to/from its socket
• socket analogous to door
  • sending process shoves message out door
  • sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
[Figure: two hosts connected via the Internet, each with an application process, socket, transport, network, link and physical layers; process and socket controlled by app developer; layers below the socket controlled by OS]
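The client process / server process roles and the socket "door" can be illustrated with Python's standard `socket` module. This is a minimal sketch, not a production server: both processes are simulated as threads on one machine, the loopback address and OS-chosen port are assumptions for the demo, and the "echo" reply format is invented.

```python
import socket
import threading


def run_server(listener: socket.socket) -> None:
    """Server process: waits to be contacted, then answers one message."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)
    # Leaving the with-block closes the connection, which the client sees as EOF.


def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=run_server, args=(listener,))
    server.start()

    # Client process: initiates communication through its own socket.
    reply = b""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        while chunk := client.recv(1024):
            reply += chunk

    server.join()
    listener.close()
    return reply


print(demo())
```

Neither side touches anything below the socket: delivery between the two "doors" is left to the transport layer in the OS.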


Addressing processes
• to receive messages, process must have identifier
• host device has unique 32-bit IP address
• Q: does IP address of host on which process runs suffice for identifying the process?
  • A: no, many processes can be running on same host
• identifier includes both IP address and port numbers associated with process on host.
• example port numbers:
  • HTTP server: 80
  • mail server: 25
• to send HTTP message to gaia.cs.umass.edu web server:
  • IP address: 128.119.245.12
  • port number: 80
• more shortly…

App-layer protocol defines
• types of messages exchanged,
  • e.g., request, response
• message syntax:
  • what fields in messages & how fields are delineated
• message semantics
  • meaning of information in fields
• rules for when and how processes send & respond to messages
open protocols:
• defined in RFCs
• allows for interoperability
• e.g., HTTP, SMTP
proprietary protocols:
• e.g., Skype
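The "32-bit IP address plus port number" identifier can be made concrete with Python's standard `ipaddress` module. The address and port below are the ones from the slide; the variable names are illustrative only.

```python
import ipaddress

# Address of gaia.cs.umass.edu as given on the slide.
addr = ipaddress.IPv4Address("128.119.245.12")

# The dotted-decimal form is just a readable spelling of one 32-bit number.
as_int = int(addr)
print(as_int, as_int.bit_length())

# A process is identified by the pair (IP address, port number).
http_endpoint = (str(addr), 80)
print(http_endpoint)
```

Two processes on the same host share the IP address and differ only in the port, which is why the address alone is not enough.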

What transport service does an app need?
data integrity
• some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
• other apps (e.g., audio) can tolerate some loss
timing
• some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
throughput
• some apps (e.g., multimedia) require minimum amount of throughput to be “effective”
• other apps (“elastic apps”) make use of whatever throughput they get
security
• encryption, data integrity, …

Transport service requirements: common apps

application             data loss       throughput             time sensitive
file transfer           no loss         elastic                no
e-mail                  no loss         elastic                no
Web documents           no loss         elastic                no
real-time audio/video   loss-tolerant   audio: 5kbps-1Mbps     yes, 100’s msec
                                        video: 10kbps-5Mbps
stored audio/video      loss-tolerant   same as above          yes, few secs
interactive games       loss-tolerant   few kbps up            yes, 100’s msec
text messaging          no loss         elastic                yes and no
Internet transport protocols services
TCP service:
• reliable transport between sending and receiving process
• flow control: sender won’t overwhelm receiver
• congestion control: throttle sender when network overloaded
• does not provide: timing, minimum throughput guarantee, security
• connection-oriented: setup required between client and server processes
UDP service:
• unreliable data transfer between sending and receiving process
• does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup
Q: why bother? Why is there a UDP?

Internet apps: application, transport protocols

application              application layer protocol            underlying transport protocol
e-mail                   SMTP [RFC 5321]                        TCP
remote terminal access   Telnet [RFC 854]                       TCP
Web                      HTTP 1.1 [RFC 7230]                    TCP
file transfer            FTP [RFC 959]                          TCP
streaming multimedia     HTTP (e.g., YouTube), RTP [RFC 1889]   TCP or UDP
Internet telephony       SIP [RFC 3261], RTP [RFC 3550],        TCP or UDP
                         or proprietary

CH 2: outline
2.1 principles of network applications
  • app architectures
  • app requirements
2.2 Web and HTTP

Web and HTTP
First, a review…
• web page consists of objects
• object can be HTML file, JPEG image, Java applet, audio file,…
• web page consists of base HTML-file which includes several referenced objects
• each object is addressable by a URL, e.g.,
    www.someschool.edu/someDept/pic.gif
    host name: www.someschool.edu
    path name: /someDept/pic.gif
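The host name / path name split of a URL can be reproduced with `urllib.parse.urlsplit` from the Python standard library. One assumption is needed: the slide's URL has no scheme, so `http://` is prepended here so that the parser recognizes the host part.

```python
from urllib.parse import urlsplit

# "http://" is an assumption added for parsing; the slide shows the URL without it.
parts = urlsplit("http://www.someschool.edu/someDept/pic.gif")

host_name = parts.hostname  # which server to contact
path_name = parts.path      # which object to ask that server for
print(host_name)
print(path_name)
```

The host name decides where the TCP connection goes; the path name is what ends up in the HTTP request line.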


HTTP overview
HTTP: hypertext transfer protocol
• Web’s application layer protocol
• client/server model
  • client: browser that requests, receives, (using HTTP protocol) and “displays” Web objects
  • server: Web server sends (using HTTP protocol) objects in response to requests
[Figure: PC running Firefox browser and iPhone running Safari browser exchange HTTP requests and responses with a server running Apache Web server]

HTTP overview (continued)
uses TCP:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
• TCP connection closed
HTTP is “stateless”
• server maintains no information about past client requests
aside: protocols that maintain “state” are complex!
• past history (state) must be maintained
• if server/client crashes, their views of “state” may be inconsistent, must be reconciled

HTTP connections
non-persistent HTTP
• at most one object sent over TCP connection
  • connection then closed
• downloading multiple objects required multiple connections
persistent HTTP
• multiple objects can be sent over single TCP connection between client, server

Non-persistent HTTP
suppose user enters URL: www.someSchool.edu/someDepartment/home.index (contains text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80. “accepts” connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket


Non-persistent HTTP (cont.)
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects

Non-persistent HTTP: response time
RTT (definition): time for a small packet to travel from client to server and back
HTTP response time:
• one RTT to initiate TCP connection
• one RTT for HTTP request and first few bytes of HTTP response to return
• file transmission time
• non-persistent HTTP response time = 2RTT + file transmission time
[Figure: timeline: initiate TCP connection (one RTT), request file (one RTT), time to transmit file, file received]
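The response-time relation (2 RTT plus file transmission time per object) can be turned into a small calculator for the 11-object page in the example (base HTML file plus 10 jpeg images). The sketch assumes the objects are fetched strictly one after another and that every object has the same transmission time; the RTT and transmission values are hypothetical.

```python
def non_persistent_time(rtt: float, transmit: float, num_objects: int) -> float:
    """Total time to fetch num_objects serially, one TCP connection per object.

    Each object costs one RTT for TCP setup, one RTT for request/response,
    plus the time to transmit the file.
    """
    return num_objects * (2 * rtt + transmit)


rtt = 0.100       # hypothetical: 100 ms round trip
transmit = 0.010  # hypothetical: 10 ms to transmit each object
total = non_persistent_time(rtt, transmit, 11)  # base HTML + 10 jpegs
print(f"{total:.2f} s")
```

Browsers that open parallel connections, or servers that keep connections open, reduce this total; the serial case is the simple upper bound.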

Persistent HTTP
non-persistent HTTP issues:
• requires 2 RTTs per object
• OS overhead for each TCP connection
• browsers often open parallel TCP connections to fetch referenced objects
persistent HTTP:
• server leaves connection open after sending response
• subsequent HTTP messages between same client/server sent over open connection
• client sends requests as soon as it encounters a referenced object
• as little as one RTT for all the referenced objects

HTTP request message
• two types of HTTP messages: request, response
• HTTP request message:
  • ASCII (human-readable format)

request line (GET, POST, HEAD commands):
    GET /index.html HTTP/1.1\r\n
header lines:
    Host: www-net.cs.umass.edu\r\n
    User-Agent: Firefox/3.6.10\r\n
    Accept: text/html,application/xhtml+xml\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
    Keep-Alive: 115\r\n
    Connection: keep-alive\r\n
carriage return, line feed at start of line indicates end of header lines:
    \r\n
(\r is the carriage return character, \n is the line-feed character)
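Because the request message is plain ASCII delimited by `\r\n`, it can be taken apart with ordinary string operations. The parser below is a teaching sketch with an invented name, `parse_request`; it ignores corner cases that a real HTTP implementation must handle (repeated headers, folded lines, bodies with a declared length).

```python
def parse_request(raw: str):
    """Split a raw HTTP request into request-line fields, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")  # blank line ends the header lines
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ")  # request line: method sp URL sp version
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, url, version, headers, body


raw = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www-net.cs.umass.edu\r\n"
    "User-Agent: Firefox/3.6.10\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
method, url, version, headers, body = parse_request(raw)
print(method, url, version)
print(headers["Host"])
```

A GET request normally has an empty body, so everything of interest sits in the request line and the header lines.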
HTTP request message: general format
    method sp URL sp version cr lf       (request line)
    header field name value cr lf        (header lines)
    ...
    header field name value cr lf
    cr lf
    entity body                          (body)

Uploading form input
POST method:
• web page often includes form input
• input is uploaded to server in entity body
URL method:
• uses GET method
• input is uploaded in URL field of request line:
    www.somesite.com/animalsearch?monkeys&banana

Method types
HTTP/1.0:
• GET
• POST
• HEAD
  • asks server to leave requested object out of response
HTTP/1.1:
• GET, POST, HEAD
• PUT
  • uploads file in entity body to path specified in URL field
• DELETE
  • deletes file specified in the URL field

HTTP response message
status line (protocol, status code, status phrase):
    HTTP/1.1 200 OK\r\n
header lines:
    Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
    Server: Apache/2.0.52 (CentOS)\r\n
    Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
    ETag: "17dc6-a5c-bf716880"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 2652\r\n
    Keep-Alive: timeout=10, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/html; charset=ISO-8859-1\r\n
    \r\n
data, e.g., requested HTML file:
    data data data data data ...
HTTP response status codes
• status code appears in 1st line in server-to-client response message.
• some sample codes:
200 OK
  • request succeeded, requested object later in this msg
301 Moved Permanently
  • requested object moved, new location specified later in this msg (Location:)
400 Bad Request
  • request msg not understood by server
404 Not Found
  • requested document not found on this server
505 HTTP Version Not Supported

User-server state: cookies [RFC 6265]
Many Web sites use cookies
four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site
example:
• Susan always accesses Internet from PC
• visits specific e-commerce site for first time
• when initial HTTP request arrives at site, site creates:
  • unique ID
  • entry in backend database for ID
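The sample status codes and their phrases are available in Python's standard library as `http.HTTPStatus`, which makes it easy to check a status line against them. The helper name `parse_status_line` is an assumption made for this sketch.

```python
from http import HTTPStatus


def parse_status_line(line: str):
    """Split 'HTTP/1.1 200 OK' into (protocol, status code, status phrase)."""
    protocol, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return protocol, int(code), phrase


print(parse_status_line("HTTP/1.1 200 OK\r\n"))

# The sample codes from the slide, looked up in the standard library.
for code in (200, 301, 400, 404, 505):
    print(code, HTTPStatus(code).phrase)
```

Splitting with a maximum of two splits keeps multi-word phrases such as "Moved Permanently" in one piece.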

Cookies: keeping “state” (cont.)
[Figure: client/server timeline.
 1. Client (cookie file: ebay 8734) sends usual http request msg; Amazon server creates ID 1678 for user and creates an entry in its backend database.
 2. Server sends usual http response with set-cookie: 1678; client cookie file now holds ebay 8734, amazon 1678.
 3. Client sends usual http request msg with cookie: 1678; server takes cookie-specific action (accessing the backend database) and sends usual http response msg.
 4. One week later: client again sends usual http request msg with cookie: 1678; server again takes cookie-specific action and sends usual http response msg.]

Cookies (continued)
what cookies can be used for:
• authorization
• shopping carts
• recommendations
• user session state (Web e-mail)
aside: cookies and privacy:
• cookies permit sites to learn a lot about you
• you may supply name and e-mail to sites
how to keep “state”:
• protocol endpoints: maintain state at sender/receiver over multiple transactions
• cookies: http messages carry state
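The four cookie components (Set-Cookie header in the response, Cookie header in later requests, cookie file at the browser, backend database at the site) can be simulated without any networking. Everything in this sketch is hypothetical: the class `ToySite`, its `handle` method, and the request paths are invented, and only the IDs 8734 and 1678 come from the slide's figure.

```python
import itertools


class ToySite:
    """A stand-in for the web site: issues IDs and keeps a backend database."""

    def __init__(self, first_id: int = 1678):
        self._ids = itertools.count(first_id)
        self.backend = {}  # cookie ID -> list of paths this user requested

    def handle(self, path: str, cookie=None) -> dict:
        """Process one request and return the extra response headers."""
        headers = {}
        if cookie is None or cookie not in self.backend:
            cookie = next(self._ids)          # first visit: create unique ID
            self.backend[cookie] = []         # ... and a backend entry for it
            headers["Set-Cookie"] = str(cookie)
        self.backend[cookie].append(path)     # cookie-specific action
        return headers


site = ToySite()
cookie_file = {"ebay": 8734}  # browser-side cookie file before the first visit

first = site.handle("/")  # no cookie sent yet
cookie_file["amazon"] = int(first["Set-Cookie"])

second = site.handle("/cart", cookie=cookie_file["amazon"])  # cookie sent back
print(cookie_file, site.backend)
```

HTTP itself stays stateless here: the state lives in the cookie file and the backend database, and each message merely carries the ID.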
Chapter 2
Application Layer

FTP: the file transfer protocol
[Figure: user at host works through an FTP user interface and FTP client with a local file system; file transfer runs between the FTP client and the FTP server, which has a remote file system]
• transfer file to/from remote host
• client/server model
  • client: side that initiates transfer (either to/from remote)
  • server: remote host
• ftp: RFC 959
• ftp server: port 21

FTP: separate control, data connections

• FTP client contacts FTP server at port 21, using TCP
• client authorized over control connection
• client browses remote directory, sends commands over control connection
• when server receives file transfer command, server opens 2nd TCP data connection (for file) to client
• after transferring one file, server closes data connection
• server opens another TCP data connection to transfer another file
• control connection: “out of band”
• FTP server maintains “state”: current directory, earlier authentication

[Figure: FTP client and FTP server joined by a TCP control connection (server port 21) and a TCP data connection (server port 20).]

FTP commands, responses

sample commands:
• sent as ASCII text over control channel
• USER username
• PASS password
• LIST return list of file in current directory
• RETR filename retrieves (gets) file
• STOR filename stores (puts) file onto remote host

sample return codes
• status code and phrase (as in HTTP)
• 331 Username OK, password required
• 125 data connection already open; transfer starting
• 425 Can’t open data connection
• 452 Error writing file
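Because every FTP reply is a three-digit status code followed by a phrase, a client can split and classify replies with very little code. The helper names below are hypothetical; the rule that 4xx and 5xx codes signal failure follows the reply-code classes in RFC 959.

```python
def parse_ftp_reply(line):
    """Split one FTP reply line into (status code, phrase)."""
    code, _, phrase = line.strip().partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP reply: {line!r}")
    return int(code), phrase


def is_error(code):
    """4xx and 5xx replies report failures; 1xx, 2xx and 3xx are positive."""
    return code >= 400


code, phrase = parse_ftp_reply("331 Username OK, password required")
```

With the sample codes from the slide, `is_error(425)` and `is_error(452)` are true, while 125 and 331 are positive replies.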


Chapter 2: outline

2.1 principles of network applications
  • app architectures
  • app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
  • SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP

Electronic mail

Three major components:
• user agents
• mail servers
• simple mail transfer protocol: SMTP

User Agent
• a.k.a. “mail reader”
• composing, editing, reading mail messages
• e.g., Outlook, Thunderbird, iPhone mail client
• outgoing, incoming messages stored on server

[Figure: user agents attached to mail servers; each mail server holds user mailboxes and an outgoing message queue, and the mail servers exchange mail with each other over SMTP.]

Electronic mail: mail servers

mail servers:
• mailbox contains incoming messages for user
• message queue of outgoing (to be sent) mail messages
• SMTP protocol between mail servers to send email messages
  • client: sending mail server
  • “server”: receiving mail server

[Figure: user agents, mail servers and the SMTP links between mail servers.]

Electronic Mail: SMTP [RFC 2821,5321]

• uses TCP to reliably transfer email message from client to server, port 25
• direct transfer: sending server to receiving server
• three phases of transfer
  • handshaking (greeting)
  • transfer of messages
  • closure
• command/response interaction (like HTTP, FTP)
  • commands: ASCII text
  • response: status code and phrase
• messages must be in 7-bit ASCII
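The three SMTP phases can be sketched as the list of ASCII command lines a sending mail server would issue. The slide does not list the commands; HELO, MAIL FROM, RCPT TO, DATA and QUIT are the standard ones from RFC 5321, while the function name and addresses here are hypothetical. The sketch only builds the client side of the dialogue and omits server replies and dot-stuffing.

```python
def smtp_client_commands(client_host, sender, recipient, body):
    """Return the client-side command lines for one message, grouped by phase."""
    if not body.isascii():
        # The slide's constraint: messages must be in 7-bit ASCII.
        raise ValueError("message body must be 7-bit ASCII")
    handshaking = [f"HELO {client_host}"]
    transfer = [f"MAIL FROM:<{sender}>", f"RCPT TO:<{recipient}>", "DATA"]
    transfer += body.splitlines() + ["."]  # a lone "." ends the message
    closure = ["QUIT"]
    return handshaking + transfer + closure


commands = smtp_client_commands(
    "client.example", "alice@example.org", "bob@example.net", "Hi Bob\nLunch?"
)
```

Each command would be answered by a status code and phrase from the receiving server before the client sends the next one.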


Scenario: Alice sends message to Bob

1) Alice uses UA to compose message “to” Bob’s e-mail address
2) Alice’s UA sends message to her mail server; message placed in message queue
3) client side of SMTP opens TCP connection with Bob’s mail server
4) SMTP client sends Alice’s message over the TCP connection
5) Bob’s mail server places the message in Bob’s mailbox
6) Bob invokes his user agent to read message

[Figure: steps 1 to 6 traced across Alice’s user agent, Alice’s mail server, Bob’s mail server and Bob’s user agent.]

Mail access protocols

[Figure: sender’s user agent → SMTP → sender’s mail server → SMTP → receiver’s mail server → mail access protocol (e.g., POP, IMAP) → receiver’s user agent.]

• SMTP: delivery/storage to receiver’s server
• mail access protocol: retrieval from server
  • POP: Post Office Protocol [RFC 1939]: authorization, download
  • IMAP: Internet Mail Access Protocol [RFC 1730,3501]: more features, including manipulation of stored msgs on server
  • HTTP: gmail, Hotmail, Yahoo! Mail, etc.

POP3 protocol

authorization phase
• client commands:
  • user: declare username
  • pass: password
• server responses
  • +OK
  • -ERR

transaction phase, client:
• list: list message numbers
• retr: retrieve message by number
• dele: delete
• quit

Example session:
    S: +OK POP3 server ready
    C: user bob
    S: +OK
    C: pass hungry
    S: +OK user successfully logged on
    C: list
    S: 1 498
    S: 2 912
    S: .
    C: retr 1
    S: <message 1 contents>
    S: .
    C: dele 1
    C: retr 2
    S: <message 2 contents>
    S: .
    C: dele 2
    C: quit
    S: +OK POP3 server signing off

POP3 (more) and IMAP

more about POP3
• previous example uses POP3 “download and delete” mode
  • Bob cannot re-read e-mail if he changes client
• POP3 is stateless across sessions

IMAP
• keeps all messages in one place: at server
• allows user to organize messages in folders
• keeps user state across sessions:
  • names of folders and mappings between message IDs and folder name
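The two POP3 phases can be mimicked with a small in-memory state machine. This is a simplified sketch that mirrors the slide's transcript rather than RFC 1939 in full: the class name is hypothetical, message sizes are just string lengths, and `dele` removes a message immediately instead of marking it for removal at `quit`.

```python
class ToyPop3Server:
    """In-memory POP3-like server: authorization phase, then transaction phase."""

    def __init__(self, user, password, messages):
        self.user, self.password = user, password
        self.messages = dict(enumerate(messages, start=1))  # number -> text
        self.pending_user = None
        self.authorized = False

    def handle(self, line):
        """Process one client command line and return the response lines."""
        cmd, _, arg = line.strip().partition(" ")
        cmd = cmd.lower()
        if not self.authorized:  # authorization phase
            if cmd == "user":
                self.pending_user = arg
                return ["+OK"]
            if cmd == "pass":
                if self.pending_user == self.user and arg == self.password:
                    self.authorized = True
                    return ["+OK user successfully logged on"]
                return ["-ERR invalid login"]
            return ["-ERR authorize first"]
        if cmd == "list":  # transaction phase from here on
            sizes = [f"{n} {len(t)}" for n, t in sorted(self.messages.items())]
            return sizes + ["."]
        if cmd in ("retr", "dele"):
            if not arg.isdigit() or int(arg) not in self.messages:
                return ["-ERR no such message"]
            if cmd == "retr":
                return [self.messages[int(arg)], "."]
            del self.messages[int(arg)]
            return ["+OK"]
        if cmd == "quit":
            return ["+OK POP3 server signing off"]
        return ["-ERR unknown command"]
```

Feeding it `user bob`, `pass hungry`, `list`, `retr 1`, `dele 1` reproduces the "download and delete" pattern from the session: once a message is deleted, a later `list` no longer shows it.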
DNS: domain name system

people: many identifiers:
• SSN, name, passport #

Internet hosts, routers:
• IP address (32 bit) - used for addressing datagrams
• “name”, e.g., www.yahoo.com - used by humans

Q: how to map between IP address and name, and vice versa?

Domain Name System:
• distributed database implemented in hierarchy of many name servers
• application-layer protocol: hosts, name servers communicate to resolve names (address/name translation)
  • note: core Internet function, implemented as application-layer protocol
  • complexity at network’s “edge”

DNS: services, structure

DNS services
• hostname to IP address translation
• host aliasing
  • canonical, alias names
• mail server aliasing
• load distribution
  • replicated Web servers: many IP addresses correspond to one name

why not centralize DNS?
• single point of failure
• traffic volume
• distant centralized database
• maintenance

A: doesn’t scale!

DNS: a distributed, hierarchical database

[Figure: three-level tree. Root: Root DNS Servers. Top Level Domain: com DNS servers, org DNS servers, edu DNS servers. Authoritative: yahoo.com and amazon.com DNS servers (under com), pbs.org DNS servers (under org), poly.edu and umass.edu DNS servers (under edu).]

client wants IP for www.amazon.com; 1st approx:
• client queries root server to find com DNS server
• client queries .com DNS server to get amazon.com DNS server
• client queries amazon.com DNS server to get IP address for www.amazon.com
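The three-step lookup for www.amazon.com can be sketched with nested dictionaries standing in for the root, TLD and authoritative servers. All table contents are hypothetical placeholders; in particular the IP address is taken from a documentation-only range and is not Amazon's.

```python
# Hypothetical referral tables; server names and the IP address are placeholders.
ROOT_REFERRALS = {"com": "com TLD server"}
TLD_REFERRALS = {"com TLD server": {"amazon.com": "amazon.com authoritative server"}}
AUTHORITATIVE_RECORDS = {
    "amazon.com authoritative server": {"www.amazon.com": "192.0.2.10"},
}


def resolve(hostname):
    """Walk root -> TLD -> authoritative; return (ip, list of servers queried)."""
    labels = hostname.split(".")
    tld, domain = labels[-1], ".".join(labels[-2:])
    queried = ["root server"]
    tld_server = ROOT_REFERRALS[tld]  # step 1: root names the TLD server
    queried.append(tld_server)
    authoritative = TLD_REFERRALS[tld_server][domain]  # step 2: TLD names the authoritative server
    queried.append(authoritative)
    ip = AUTHORITATIVE_RECORDS[authoritative][hostname]  # step 3: authoritative answers
    return ip, queried


ip, queried = resolve("www.amazon.com")
```

Each level only knows who to ask next, which is what lets the database be split across many servers.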


DNS: root name servers

• contacted by local name server that can not resolve name
• root name server:
  • contacts authoritative name server if name mapping not known
  • gets mapping
  • returns mapping to local name server

13 root name “servers” worldwide:
a. Verisign, Los Angeles CA (5 other sites)
b. USC-ISI Marina del Rey, CA
c. Cogent, Herndon, VA (5 other sites)
d. U Maryland College Park, MD
e. NASA Mt View, CA
f. Internet Software C. Palo Alto, CA (and 48 other sites)
g. US DoD Columbus, OH (5 other sites)
h. ARL Aberdeen, MD
i. Netnod, Stockholm (37 other sites)
j. Verisign, Dulles VA (69 other sites)
k. RIPE London (17 other sites)
l. ICANN Los Angeles, CA (41 other sites)
m. WIDE Tokyo (5 other sites)

TLD, authoritative servers

top-level domain (TLD) servers:
• responsible for com, org, net, edu, aero, jobs, museums, and all top-level country domains, e.g.: uk, fr, ca, jp, in
• Network Solutions maintains servers for .com TLD
• Educause for .edu TLD

authoritative DNS servers:
• organization’s own DNS server(s), providing authoritative hostname to IP mappings for organization’s named hosts
• can be maintained by organization or service provider

Local DNS name server

• does not strictly belong to hierarchy
• each ISP (residential ISP, company, university) has one
  • also called “default name server”
• when host makes DNS query, query is sent to its local DNS server
  • has local cache of recent name-to-address translation pairs (but may be out of date!)
  • acts as proxy, forwards query into hierarchy

DNS name resolution example

• host at cs.goa.bits-pilani.ac.in wants IP address for cs.iitkgp.ac.in

iterated query:
• contacted server replies with name of server to contact
• “I don’t know this name, but ask this server”

[Figure: requesting host cs.goa.bits-pilani.ac.in sends its query (1) to the local DNS server dns.goa.bits-pilani.ac.in. The local server exchanges messages 2 and 3 with the root DNS server, 4 and 5 with the TLD DNS server, and 6 and 7 with the authoritative DNS server dns.iitkgp.ac.in (for cs.iitkgp.ac.in), then returns the answer (8) to the host.]


DNS name resolution example

recursive query:
• puts burden of name resolution on contacted name server
• heavy load at upper levels of hierarchy?

[Figure: requesting host cs.bits-pilani.ac.in sends its query (1) to the local DNS server dns.bits-pilani.ac.in, which asks the root DNS server (2), which asks the TLD DNS server (3), which asks the authoritative DNS server dns.iitkgp.ac.in for cs.iitkgp.ac.in (4). The answer travels back along the same chain: authoritative to TLD (5), TLD to root (6), root to local (7), local to host (8).]

DNS: caching, updating records

• once (any) name server learns mapping, it caches mapping
  • cache entries timeout (disappear) after some time (TTL)
  • TLD servers typically cached in local name servers
    • thus root name servers not often visited
• cached entries may be out-of-date (best effort name-to-address translation!)
  • if a named host changes IP address, the change may not be known Internet-wide until all TTLs expire
• update/notify mechanisms proposed IETF standard
  • RFC 2136
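The TTL behaviour of a name server's cache can be sketched as a dictionary whose entries carry an expiry time. The class and its injectable `clock` parameter are hypothetical conveniences for illustration; real resolvers are considerably more involved.

```python
import time


class DnsCache:
    """Name-to-address cache whose entries disappear after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._entries = {}   # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # entry timed out
            return None
        return value
```

Until the TTL runs out the cache keeps answering with the old address, which is exactly why a changed IP address may not be seen everywhere immediately.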

DNS records

DNS: distributed db storing resource records (RR)
RR format: (name, value, type, ttl)

type=A
• name is hostname
• value is IP address

type=NS
• name is domain (e.g., foo.com)
• value is hostname of authoritative name server for this domain

type=CNAME
• name is alias name for some “canonical” (the real) name
• www.ibm.com is really servereast.backup2.ibm.com
• value is canonical name

type=MX
• value is name of mailserver associated with name

Inserting records into DNS

• example: new startup “Network Utopia”
• register name networkutopia.com at DNS registrar (e.g., Network Solutions)
  • provide names, IP addresses of authoritative name server
  • registrar inserts two RRs into .com TLD server:
    (networkutopia.com, dns1.networkutopia.com, NS)
    (dns1.networkutopia.com, 212.212.212.1, A)
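The two records inserted for Network Utopia can be represented as tuples in the (name, value, type, ttl) format and chained: the NS record names the authoritative server and the A record gives its address. The TTL of 3600 seconds and the helper names are assumptions made for illustration.

```python
from collections import namedtuple

ResourceRecord = namedtuple("ResourceRecord", "name value type ttl")

# The two RRs from the slide; the TTL value is an illustrative assumption.
com_tld_records = [
    ResourceRecord("networkutopia.com", "dns1.networkutopia.com", "NS", 3600),
    ResourceRecord("dns1.networkutopia.com", "212.212.212.1", "A", 3600),
]


def lookup(records, name, rtype):
    """Return the values of all records matching name and type."""
    return [rr.value for rr in records if rr.name == name and rr.type == rtype]


def authoritative_server_address(records, domain):
    """Follow NS then A: return (server hostname, server IP), or None."""
    for ns_host in lookup(records, domain, "NS"):
        addresses = lookup(records, ns_host, "A")
        if addresses:
            return ns_host, addresses[0]
    return None


result = authoritative_server_address(com_tld_records, "networkutopia.com")
```

This is why the registrar needs both records: the NS record alone would name a server without saying how to reach it.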


DNS protocol, messages

• query and reply messages, both with same message format

msg header
• identification: 16 bit # for query, reply to query uses same #
• flags:
  • query or reply
  • recursion desired
  • recursion available
  • reply is authoritative

Message format (each header field is 2 bytes):

    identification                      | flags
    # questions                         | # answer RRs
    # authority RRs                     | # additional RRs
    questions (variable # of questions)      name, type fields for a query
    answers (variable # of RRs)              RRs in response to query
    authority (variable # of RRs)            records for authoritative servers
    additional info (variable # of RRs)      additional “helpful” info that may be used
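The six 2-byte header fields map directly onto a 12-byte structure in network byte order. The sketch below packs and unpacks only the header; the helper names and the identification value are hypothetical, while the flag bit positions (query/response in the top bit, recursion desired at 0x0100) follow the standard DNS header layout.

```python
import struct

QR_RESPONSE = 0x8000        # set in replies, clear in queries
RECURSION_DESIRED = 0x0100  # the "recursion desired" flag


def pack_header(ident, flags, questions, answers=0, authority=0, additional=0):
    """Pack the six 16-bit header fields in network (big-endian) byte order."""
    return struct.pack("!6H", ident, flags, questions, answers, authority, additional)


def unpack_header(data):
    """Return (ident, flags, #questions, #answers, #authority, #additional)."""
    return struct.unpack("!6H", data[:12])


query_header = pack_header(0x1A2B, RECURSION_DESIRED, questions=1)
fields = unpack_header(query_header)
```

A reply would reuse the same identification value with `QR_RESPONSE` set, which is how the sender matches replies to queries.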

Attacking DNS

DDoS attacks
• Bombard root servers with traffic
  • Not successful to date
  • Traffic Filtering
  • Local DNS servers cache IPs of TLD servers, allowing root server bypass
• Bombard TLD servers
  • Potentially more dangerous

Redirect attacks
• Man-in-middle
  • Intercept queries
• DNS poisoning
  • Send bogus replies to DNS server, which caches

Exploit DNS for DDoS
• Send queries with spoofed source address: target IP
• Requires amplification

Socket Programming
Application Layer Protocols
• HTTP
• DNS
• Email (SMTP/IMAP/POP3)

Data exchange between hosts: General
• How do we get two hosts to exchange arbitrary data?
  • Without trying to use HTTP or SMTP or IMAP
  • Your own protocol?

Sockets!

What are sockets?
• Abstract representation of a network connection on application level
• Corresponding API provided by the host’s OS
  • OS responsible for actual data transmission
  • Application responsible for content
• Makes sending data to a connected remote host similar to simply writing data to a file
  • Receiving data is similar to reading from a file
How do Sockets relate to Applications?
• Browsers, Webservers
  • Use sockets to speak HTTP
• Mailservers, Mailclients
  • Use sockets to speak SMTP/IMAP/POP3
• Peer 2 Peer Apps
  • Use sockets to speak, e.g., Bittorrent

Socket programming
Goal: Learn how to build client/server applications that communicate using sockets
Socket: Door between application process and end-end-transport protocol

[Figure: two hosts connected through the Internet, each with an application process, a socket, and the transport, network, link and physical layers. The process is controlled by the app developer; the layers below the socket are controlled by the OS.]

Socket Procedures
1. Socket
2. Bind
3. Listen
4. Accept
5. Connect
6. Send
7. Receive
8. Close

Socket programming
Two socket types for two transport services:
• UDP: Unreliable datagram
• TCP: Reliable, byte stream-oriented

Application Example:
1. Client reads a line of characters (data) from its keyboard and sends data to server
2. Server receives the data and converts characters to uppercase
3. Server sends modified data to client
4. Client receives modified data and displays line on its screen
Socket programming with UDP
• UDP: No “connection” between client & server
  • No handshaking before sending data
  • Sender explicitly attaches IP dst address and port # to each packet
  • Receiver extracts src IP address and port # from received packet
• UDP: Transmitted data may be lost or received out-of-order
• Application viewpoint:
  • UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server

Client-Server Communication (UDP)

Client/server socket interaction: UDP

server (running on serverIP):
1. create socket, port = x: serverSocket = socket(AF_INET,SOCK_DGRAM)
2. read datagram from serverSocket
3. write reply to serverSocket specifying client address, port number

client:
1. create socket: clientSocket = socket(AF_INET,SOCK_DGRAM)
2. create datagram with server IP and port = x; send datagram via clientSocket
3. read datagram from clientSocket
4. close clientSocket

Example app: UDP Client
Python UDPClient

    # include Python's socket library
    from socket import *

    serverName = 'hostname'
    serverPort = 12000

    # create UDP socket for server
    clientSocket = socket(AF_INET, SOCK_DGRAM)

    # get user keyboard input (Python 3: input() replaces raw_input())
    message = input('Input lowercase sentence:')

    # attach server name, port to message; send into socket
    clientSocket.sendto(message.encode(), (serverName, serverPort))

    # read datagram from socket
    modifiedMessage, serverAddress = clientSocket.recvfrom(2048)

    # print out received string and close socket
    print(modifiedMessage.decode())
    clientSocket.close()
Example app: UDP Server
Python UDPServer

    from socket import *
    serverPort = 12000

    # create UDP socket
    serverSocket = socket(AF_INET, SOCK_DGRAM)

    # bind socket to local port number 12000
    serverSocket.bind(('', serverPort))
    print("The server is ready to receive")

    # loop forever
    while True:
        # read from UDP socket into message, getting client's address (client IP and port)
        message, clientAddress = serverSocket.recvfrom(2048)
        modifiedMessage = message.decode().upper()
        # send upper case string back to this client
        serverSocket.sendto(modifiedMessage.encode(), clientAddress)

Socket programming with TCP

Client must contact server
• Server process must first be running
• Server must have created socket (door) that welcomes client’s contact

Client contacts server by:
• Creating TCP socket, specifying IP address, port number of server process
• When client creates socket: Client TCP establishes connection to server TCP
• When contacted by client, server TCP creates new socket for server process to communicate with that particular client
  • Allows server to talk with multiple clients
  • Source port numbers used to distinguish clients

Application viewpoint:
TCP provides reliable, in-order byte-stream transfer (“pipe”) between client and server
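The UDP client and server can also be tried inside a single process over the loopback interface. This is a sketch under a few assumptions: the function names are hypothetical, the server handles one datagram and then stops, and port 0 asks the OS for any free port instead of the fixed port 12000.

```python
import socket
import threading


def serve_one_datagram(server_socket):
    """Receive one datagram and send it back upper-cased to its sender."""
    message, client_address = server_socket.recvfrom(2048)
    server_socket.sendto(message.decode().upper().encode(), client_address)


def udp_round_trip(text, timeout=2.0):
    """Send text to a one-shot loopback UDP server and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server_socket, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client_socket:
        server_socket.bind(("127.0.0.1", 0))  # port 0: let the OS choose
        server_socket.settimeout(timeout)
        client_socket.settimeout(timeout)
        worker = threading.Thread(target=serve_one_datagram, args=(server_socket,))
        worker.start()
        # No connection setup: the destination is attached to each datagram.
        client_socket.sendto(text.encode(), server_socket.getsockname())
        reply, _ = client_socket.recvfrom(2048)
        worker.join()
        return reply.decode()
```

Calling `udp_round_trip("hello")` exercises the same four steps as the application example: send, convert to uppercase, reply, display.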

Client-Server Communication (TCP)

Client/server socket interaction: TCP

server (running on hostid):
1. create socket, port=x, for incoming request: serverSocket = socket()
2. wait for incoming connection request: connectionSocket = serverSocket.accept()
3. read request from connectionSocket
4. write reply to connectionSocket
5. close connectionSocket

client:
1. create socket, connect to hostid, port=x: clientSocket = socket()
2. send request using clientSocket
3. read reply from clientSocket
4. close clientSocket

TCP connection setup takes place between the client's connect (client step 1) and the server's accept (server step 2).
Example app: TCP Client
Python TCPClient

    from socket import *
    serverName = 'servername'
    serverPort = 12000

    # create TCP socket for server, remote port 12000
    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect((serverName, serverPort))

    # Python 3: input() replaces raw_input()
    sentence = input('Input lowercase sentence:')

    # no need to attach server name, port
    clientSocket.send(sentence.encode())
    modifiedSentence = clientSocket.recv(1024)
    print('From Server:', modifiedSentence.decode())
    clientSocket.close()

Example app: TCP Server
Python TCPServer

    from socket import *
    serverPort = 12000

    # create TCP welcoming socket
    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverSocket.bind(('', serverPort))

    # server begins listening for incoming TCP requests
    serverSocket.listen(1)
    print('The server is ready to receive')

    # loop forever
    while True:
        # server waits on accept() for incoming requests, new socket created on return
        connectionSocket, addr = serverSocket.accept()
        # read bytes from socket (but not address as in UDP)
        sentence = connectionSocket.recv(1024).decode()
        capitalizedSentence = sentence.upper()
        connectionSocket.send(capitalizedSentence.encode())
        # close connection to this client (but not welcoming socket)
        connectionSocket.close()
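The TCP pair can likewise be exercised in one process over loopback, which makes the welcoming socket versus connection socket distinction easy to see. The function names are hypothetical, the server accepts a single client, and port 0 is used so the OS picks a free port instead of 12000.

```python
import socket
import threading


def serve_one_client(welcoming_socket):
    """Accept one connection, upper-case what arrives, reply, and close."""
    connection_socket, _ = welcoming_socket.accept()  # new socket per client
    with connection_socket:
        sentence = connection_socket.recv(1024).decode()
        connection_socket.sendall(sentence.upper().encode())


def tcp_round_trip(text, timeout=2.0):
    """Connect to a one-shot loopback TCP server, send text, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as welcoming_socket:
        welcoming_socket.bind(("127.0.0.1", 0))  # port 0: let the OS choose
        welcoming_socket.listen(1)
        welcoming_socket.settimeout(timeout)
        worker = threading.Thread(target=serve_one_client, args=(welcoming_socket,))
        worker.start()
        address = welcoming_socket.getsockname()
        with socket.create_connection(address, timeout=timeout) as client_socket:
            # The connection is already set up, so no address is attached here.
            client_socket.sendall(text.encode())
            reply = client_socket.recv(1024)
        worker.join()
        return reply.decode()
```

The welcoming socket stays open after the connection socket is closed, so a real server would simply loop back to `accept()` for the next client.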

Sockets in C

int sockid = socket(family, type, protocol);

● sockid: socket descriptor
● family: integer, communication domain
  ○ AF_INET, IPv4 protocol
  ○ AF_UNIX, File address, Local communication
● type: communication type (SOCK_STREAM, SOCK_DGRAM)
● protocol: IPPROTO_TCP, IPPROTO_UDP

Sockets create interface

Specifying Address

    struct sockaddr {
        unsigned short sa_family;   /* Address family (e.g. AF_INET) */
        char sa_data[14];           /* Family-specific address information */
    };

    struct in_addr {
        unsigned long s_addr;       /* Internet address (32 bits) */
    };

    struct sockaddr_in {
        unsigned short sin_family;  /* Internet protocol (AF_INET) */
        unsigned short sin_port;    /* Address port (16 bits) */
        struct in_addr sin_addr;    /* Internet address (32 bits) */
        char sin_zero[8];           /* not used */
    };
bind()

int status = bind(sockid, &addrport, size);

● addrport: struct sockaddr, IP address, port
  ○ To accept any incoming connection TCP uses INADDR_ANY
● size: size of addrport structure
● status: -1 is returned on any failure

Example of bind

    int sockid;
    struct sockaddr_in addrport;
    sockid = socket(PF_INET, SOCK_STREAM, 0);

    addrport.sin_family = AF_INET;
    addrport.sin_port = htons(5100);
    addrport.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sockid, (struct sockaddr *) &addrport, sizeof(addrport)) != -1) {
        ...
    }

htons and htonl

htons and htonl take an integer in host byte order and return it in network byte order: htons converts 16-bit values such as port numbers, and htonl converts 32-bit values such as IPv4 addresses.

addrport.sin_port = htons(5100);

Assigning address to sockets: bind() for TCP listening

int status = listen(sockid, queueLimit);

● sockid: socket descriptor (integer)
● queueLimit: maximum number of active participants that can wait for a connection
● Not used for sending and receiving
● Only used by server to get new sockets
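The effect of htons on the port number 5100 can be checked from Python, whose socket module exposes the same conversion functions. The variable names are hypothetical. Port 5100 is 0x13EC, so in network byte order the high byte 0x13 goes first.

```python
import socket
import struct
import sys

PORT = 5100  # 0x13EC, the port used in the bind example

# "!" in struct means network (big-endian) byte order; "H" is an unsigned 16-bit value.
network_bytes = struct.pack("!H", PORT)

# htons returns an integer whose in-memory layout on this host matches network_bytes,
# so its numeric value depends on whether the host is little- or big-endian.
converted = socket.htons(PORT)
same_layout = converted == int.from_bytes(network_bytes, sys.byteorder)

# ntohs undoes the conversion.
round_trip = socket.ntohs(converted)
```

On a big-endian host htons changes nothing; on a little-endian host it swaps the two bytes. Calling it unconditionally keeps the code portable.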
Establishing connection

int status = connect(sockid, &foreignAddr, addrlen);

● connect(): to establish connection between host and server
● foreignAddr: struct sockaddr, address of passive entity
● addrlen: integer, sizeof(foreignAddr)
● It is a blocking function

Accepting incoming connections

int s = accept(sockid, &clientAddr, &addrLen);

● s: integer, representing new socket
● clientAddr: struct sockaddr, address of active entity
● addrLen: sizeof(clientAddr)
● It is a blocking function

Infinite Blocking? How to deal with it?
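One common answer to the blocking question is to put a timeout on the socket so that a blocking call gives up after a bounded wait. Other options include non-blocking sockets and `select`. The sketch below shows only the timeout approach, in Python rather than C; the helper name is hypothetical, and `socket.socketpair()` is used just to have two connected sockets without a network.

```python
import socket


def recv_with_timeout(sock, timeout_seconds, bufsize=1024):
    """Return received bytes, or None if nothing arrives within the timeout."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None  # the call would otherwise have blocked indefinitely


left, right = socket.socketpair()
nothing_yet = recv_with_timeout(left, 0.1)  # peer has sent nothing
right.sendall(b"hi")
received = recv_with_timeout(left, 0.1)
left.close()
right.close()
```

The same idea applies to `accept()` and `connect()`: with a timeout set, the caller regains control and can retry, report an error, or do other work.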

Exchanging data

int count = send(sockid, msg, msgLen, flags);

● msgLen: integer, length of message to be transmitted in bytes
● flags: special options; we use 0

int count = recv(sockid, recvBuf, bufLen, flags);

● recvBuf: void[], stores received bytes
● bufLen: size of recvBuf in bytes, i.e., the maximum number of bytes to receive; the number actually received is returned in count

BOTH are blocking functions

Closing the socket

Once communication is done we close the socket

status = close(sockid);


Sample Quiz Questions:
1. Which protocol is commonly used for DNS communication?
   a) HTTP
   b) FTP
   c) SMTP
   d) UDP
2. Which of the following is NOT a function of SMTP?
   a) Sending emails
   b) Receiving emails (at the mail servers)
   c) Relaying emails between mail servers
   d) Verifying email addresses

Chapter 3
Transport Layer

Computer Networking: A Top Down Approach
6th edition
Jim Kurose, Keith Ross
Addison-Wesley
March 2012

All material copyright 1996-2013
J.F Kurose and K.W. Ross, All Rights Reserved

Chapter 3: Transport Layer

our goals:
• understand principles behind transport layer services:
  • multiplexing, demultiplexing
  • reliable data transfer
  • flow control
  • congestion control
• learn about Internet transport layer protocols:
  • UDP: connectionless transport
  • TCP: connection-oriented reliable transport
  • TCP congestion control

Chapter 3 outline

3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 principles of congestion control
3.7 TCP congestion control


Transport services and protocols

• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  • sender side: breaks app messages into segments, passes to network layer
  • rcvr side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  • Internet: TCP and UDP

[Figure: two end systems, each with the application, transport, network, data link and physical layers, joined by logical end-to-end transport.]

Transport vs. network layer

• network layer: logical communication between hosts
• transport layer: logical communication between processes
  • relies on, enhances, network layer services

household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
• hosts = houses
• processes = kids
• app messages = letters in envelopes
• transport protocol = Ann and Bill who demux to in-house siblings
• network-layer protocol = postal service

Internet transport-layer protocols

• reliable, in-order delivery (TCP)
  • congestion control
  • flow control
  • connection setup
• unreliable, unordered delivery: UDP
  • no-frills extension of “best-effort” IP
• services not available:
  • delay guarantees
  • bandwidth guarantees

[Figure: two end hosts with full application/transport/network/data link/physical stacks, connected through routers that implement only the network, data link and physical layers.]


How does the message reach the correct application?

[Figure: Client 1 and Server 1 each run applications A1 and A2 above the Application, Transport and Network layers. Multiplexing takes place at the sender and de-multiplexing at the receiver.]

Multiplexing/demultiplexing

[Figure, animated over several slides: an HTTP msg leaves the application layer of the HTTP server; the transport layer prepends its header Ht and the network layer prepends its header Hn, so that Hn Ht HTTP msg crosses the link and physical layers. At the client the headers are removed again on the way up to the application.]

Q: how did transport layer know to deliver message to Firefox browser process rather than Netflix process or Skype process?


Multiplexing/demultiplexing

multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)

demultiplexing at receiver: use header info to deliver received segments to correct socket

[Figure: three hosts; processes P1, P2, P3 and P4 are attached to sockets above the transport, network, link and physical layers.]

Connection-oriented demux

• TCP socket identified by 4-tuple:
  • source IP address
  • source port number
  • dest IP address
  • dest port number
• demux: receiver uses all four values to direct segment to appropriate socket
• server host may support many simultaneous TCP sockets:
  • each socket identified by its own 4-tuple
• web servers have different sockets for each connecting client
  • non-persistent HTTP will have different socket for each request
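Connection-oriented demultiplexing amounts to a table lookup keyed by the 4-tuple. The sketch below is a toy model, not how an operating system implements it: the class, the socket labels and the addresses (taken from documentation-only ranges) are all hypothetical.

```python
from collections import namedtuple

FourTuple = namedtuple("FourTuple", "src_ip src_port dst_ip dst_port")


class TcpDemux:
    """Maps each connection's 4-tuple to the socket that should get its segments."""

    def __init__(self):
        self._table = {}

    def open_connection(self, four_tuple, socket_id):
        self._table[four_tuple] = socket_id

    def deliver(self, four_tuple):
        """Return the socket for an arriving segment, or None if no connection matches."""
        return self._table.get(four_tuple)


demux = TcpDemux()
# Two clients use the same source port and talk to the same server port 80.
client_a = FourTuple("198.51.100.1", 9157, "192.0.2.80", 80)
client_b = FourTuple("198.51.100.2", 9157, "192.0.2.80", 80)
demux.open_connection(client_a, "socket-A")
demux.open_connection(client_b, "socket-B")
```

Although both segments are addressed to port 80, the differing source IP addresses make the 4-tuples distinct, so each client's data reaches its own socket.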

Principles of reliable data transfer

• Important in application, transport, link layers
  • top-10 list of important networking topics!
• Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Reliable data transfer: getting started

• rdt_send(): called from above, (e.g., by app.). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper

[Figure: rdt_send() and udt_send() sit on the send side, rdt_rcv() and deliver_data() on the receive side, with the unreliable channel between them.]

Reliable data transfer: getting started

We will:
• Incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• Consider only unidirectional data transfer
  • but control info will flow on both directions!
• Use finite state machines (FSM) to specify sender, receiver

[Figure: FSM notation. An arrow from state 1 to state 2 is labelled with the event causing the state transition above the line and the actions taken on the state transition below it.]

state: when in this “state” next state uniquely determined by next event


rdt1.0: reliable transfer over a reliable channel

• Underlying channel perfectly reliable
  • no bit errors
  • no loss of packets
• Separate FSMs for sender, receiver:
  • sender sends data into underlying channel
  • receiver reads data from underlying channel

sender FSM: single state “Wait for call from above”
    rdt_send(data): packet = make_pkt(data); udt_send(packet)
receiver FSM: single state “Wait for call from below”
    rdt_rcv(packet): extract(packet,data); deliver_data(data)

rdt2.0: channel with bit errors (Stop and Wait)

• Underlying channel may flip bits in packet
  • checksum to detect bit errors
• The question: how to detect and recover from errors:

How do humans recover from “errors” during conversation?

  • acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  • negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  • sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  • error detection
  • receiver feedback: control msgs (ACK,NAK) rcvr->sender

rdt2.0: channel with bit errors

• Underlying channel may flip bits in packet
  • checksum to detect bit errors
• The question: how to detect and recover from errors:
  • Acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  • Negative Acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  • sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  • error detection
  • feedback: control msgs (ACK,NAK) from receiver to sender
  • retransmission

stop and wait: sender sends one packet, then waits for receiver response

rdt2.0: FSM specification

sender
  state “Wait for call from above”
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK”
  state “Wait for ACK or NAK”
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in this state
    rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call from above”

receiver
  state “Wait for call from below”
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
rdt2.0: operation with no errors

[Figure: the rdt2.0 FSM specification with the error-free path highlighted. The sender handles rdt_send(data) with sndpkt = make_pkt(data, checksum) and udt_send(sndpkt); the receiver finds notcorrupt(rcvpkt), extracts and delivers the data, and sends ACK; the sender receives the ACK and returns to “Wait for call from above”.]

rdt2.0: error scenario

[Figure: the rdt2.0 FSM specification with the error path highlighted. The receiver finds corrupt(rcvpkt) and sends NAK; the sender receives the NAK and retransmits with udt_send(sndpkt); once the retransmitted packet arrives uncorrupted, the receiver delivers the data and sends ACK.]
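Both rdt2.0 scenarios can be replayed with a small simulation in which the channel corrupts chosen transmissions and the sender retransmits on NAK. This is a sketch under simplifying assumptions: the checksum is a plain 16-bit byte sum rather than a real checksum algorithm, ACKs and NAKs are never corrupted (the flaw rdt2.1 addresses), every message is non-empty, and the function names other than make_pkt and corrupt are hypothetical.

```python
def checksum(data):
    """16-bit sum of the bytes; a simple stand-in for a real checksum."""
    return sum(data) & 0xFFFF


def make_pkt(data):
    return (data, checksum(data))


def corrupt(pkt):
    data, received_checksum = pkt
    return checksum(data) != received_checksum


def rdt20_transfer(messages, flip_on_attempts=()):
    """Stop-and-wait transfer; returns (delivered data, number of transmissions).

    flip_on_attempts lists the transmission numbers (1-based) on which the
    channel flips one bit of the packet's first data byte.
    """
    delivered = []
    transmissions = 0
    for data in messages:
        sndpkt = make_pkt(data)
        while True:
            transmissions += 1
            rcvpkt = sndpkt
            if transmissions in flip_on_attempts:
                rcvpkt = (bytes([data[0] ^ 0x01]) + data[1:], sndpkt[1])
            if corrupt(rcvpkt):
                continue  # receiver sends NAK; sender retransmits sndpkt
            delivered.append(rcvpkt[0])  # receiver delivers data and sends ACK
            break  # sender got ACK; back to "Wait for call from above"
    return delivered, transmissions


delivered, transmissions = rdt20_transfer([b"hi", b"yo"], flip_on_attempts={2})
```

In this run the second transmission is corrupted, so the second message needs two transmissions and the total is three, while the receiver still delivers each message exactly once.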

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
• Sender doesn’t know what happened at receiver!
• Can’t just retransmit: possible duplicate

Handling duplicates:
• Sender retransmits current pkt if ACK/NAK corrupted
• Sender adds sequence number to each pkt
• Receiver discards (doesn’t deliver up) duplicate pkt

How many sequence numbers do we need?

rdt2.1: sender, handles garbled ACK/NAKs

  state “Wait for call 0 from above”
    rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 0”
  state “Wait for ACK or NAK 0”
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt); stay in this state
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call 1 from above”
  state “Wait for call 1 from above”
    rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 1”
  state “Wait for ACK or NAK 1”
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt); stay in this state
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call 0 from above”


rdt2.1: receiver, handles garbled ACK/NAKs

  state “Wait for 0 from below”
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 1 from below”
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay in this state
    rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay in this state
  state “Wait for 1 from below”
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 0 from below”
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay in this state
    rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay in this state

rdt2.1: discussion

sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  • state must “remember” whether “expected” pkt should have seq # of 0 or 1

receiver:
• must check if received packet is duplicate
  • state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol
❖ same functionality as rdt2.1, using ACKs only
❖ instead of NAK, receiver sends ACK for last pkt received OK
  ▪ receiver must explicitly include seq # of pkt being ACKed
❖ duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments
  sender FSM fragment
    Wait for call 0 from above:
      rdt_send(data) -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to Wait for ACK 0
    Wait for ACK 0:
      rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ) -> udt_send(sndpkt)
      rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) -> Λ
  receiver FSM fragment
    Wait for 0 from below:
      rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) ) -> udt_send(sndpkt)
    transition into Wait for 0 from below:
      rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
        -> extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
rdt3.0: channels with errors and loss
new assumption: underlying channel can also lose packets (data, ACKs)
  ▪ checksum, seq. #, ACKs, retransmissions will be of help … but not enough
approach: sender waits “reasonable” amount of time for ACK
❖ retransmits if no ACK received in this time
❖ if pkt (or ACK) just delayed (not lost):
  ▪ retransmission will be duplicate, but seq. #’s already handles this
  ▪ receiver must specify seq # of pkt being ACKed
❖ requires countdown timer

rdt3.0 sender
  Wait for call 0 from above:
    rdt_send(data) -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to Wait for ACK0
    rdt_rcv(rcvpkt) -> Λ
  Wait for ACK0:
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ) -> Λ
    timeout -> udt_send(sndpkt); start_timer
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) -> stop_timer; go to Wait for call 1 from above
  Wait for call 1 from above:
    rdt_send(data) -> sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to Wait for ACK1
    rdt_rcv(rcvpkt) -> Λ
  Wait for ACK1:
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ) -> Λ
    timeout -> udt_send(sndpkt); start_timer
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1) -> stop_timer; go to Wait for call 0 from above
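The rdt3.0 behaviour described above (retransmit on timeout, 1-bit sequence numbers so the receiver can discard duplicates) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `rdt3_transfer` and the `drop_data` / `drop_ack` loss schedules are hypothetical, corruption is not modelled, and a lost packet or lost ACK is treated as an immediate timeout.

```python
def rdt3_transfer(messages, drop_data=(), drop_ack=()):
    """Simulate stop-and-wait with alternating sequence numbers 0/1.

    drop_data / drop_ack are sets of transmission indices (0-based, counted
    separately for data packets and ACKs) that the channel loses.
    Returns (delivered messages, number of data transmissions).
    """
    delivered = []
    expected = 0   # receiver state: seq # it is waiting for
    seq = 0        # sender state: seq # of the packet being sent
    data_tx = 0    # data packets put on the channel so far
    ack_tx = 0     # ACKs put on the channel so far
    for msg in messages:
        while True:
            # sender: udt_send(sndpkt); start_timer
            lost = data_tx in drop_data
            data_tx += 1
            if lost:
                continue  # no ACK will come: timeout, retransmit
            # receiver: deliver only if this is the expected seq #
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            # receiver ACKs the seq # it received (new or duplicate)
            ack_lost = ack_tx in drop_ack
            ack_tx += 1
            if ack_lost:
                continue  # ACK lost: timeout, retransmit
            break         # ACK for current seq # arrived: stop_timer
        seq ^= 1          # alternate 0 -> 1 -> 0 for the next packet
    return delivered, data_tx
```

With one lost data packet and one lost ACK, the sender transmits five times for three messages, and the receiver still delivers each message exactly once because the retransmission after the lost ACK carries an unexpected sequence number.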

rdt3.0 in action

(a) no loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0

(b) packet loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1 (X loss)
  sender: timeout, resend pkt1
  receiver: rcv pkt1, send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0

(c) ACK loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1 (X loss)
  sender: timeout, resend pkt1
  receiver: rcv pkt1 (detect duplicate), send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0

(d) premature timeout/ delayed ACK
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1 (ACK delayed)
  sender: timeout, resend pkt1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt1 (detect duplicate), send ack1
  receiver: rcv pkt0, send ack0
  sender: rcv ack1 (duplicate), do nothing
Chapter 3
Transport Layer

Computer Networking: A Top Down Approach
6th edition
Jim Kurose, Keith Ross
Addison-Wesley
March 2012
All material copyright 1996-2013
J.F Kurose and K.W. Ross, All Rights Reserved

rdt3.0: stop-and-wait operation
  sender: first packet bit transmitted, t = 0
  sender: last packet bit transmitted, t = L / R
  receiver: first packet bit arrives
  receiver: last packet bit arrives, send ACK
  sender: ACK arrives, send next packet, t = RTT + L / R

Performance of rdt3.0
❖ rdt3.0 is correct, but performance stinks
❖ e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
    D_trans = L / R = 8000 bits / 10^9 bits/sec = 8 microsecs
  ▪ U_sender: utilization, the fraction of time sender busy sending
  ▪ if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec throughput over 1 Gbps link
❖ network protocol limits use of physical resources!
Pipelined protocols
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  ▪ range of sequence numbers must be increased
  ▪ buffering at sender and/or receiver
❖ two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization
  sender: first packet bit transmitted, t = 0
  sender: last bit transmitted, t = L / R
  receiver: first packet bit arrives
  receiver: last packet bit arrives, send ACK
  receiver: last bit of 2nd packet arrives, send ACK
  receiver: last bit of 3rd packet arrives, send ACK
  sender: ACK arrives, send next packet, t = RTT + L / R
3-packet pipelining increases utilization by a factor of 3!
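The stop-and-wait and pipelining figures above follow from one formula: the sender is busy for L/R out of every RTT + L/R seconds, and a window of several in-flight packets multiplies the busy time. The sketch below works the slide's numbers; the function names and the `window` parameter are illustrative assumptions, not part of the slides.

```python
def sender_utilization(L_bits, R_bps, rtt_s, window=1):
    """Fraction of time the sender is busy sending.

    window = 1 is stop-and-wait; window = 3 is the 3-packet pipeline.
    Utilization is capped at 1.0 (the link cannot be more than fully busy).
    """
    d_trans = L_bits / R_bps  # time to push one packet onto the link
    return min(1.0, window * d_trans / (rtt_s + d_trans))


def stop_and_wait_throughput_bps(L_bits, R_bps, rtt_s):
    """One packet of L bits is delivered every RTT + L/R seconds."""
    return L_bits / (rtt_s + L_bits / R_bps)


u1 = sender_utilization(8000, 1e9, 0.030)            # about 0.00027
u3 = sender_utilization(8000, 1e9, 0.030, window=3)  # three times u1
kBps = stop_and_wait_throughput_bps(8000, 1e9, 0.030) / 8 / 1000  # about 33
```

The utilization of roughly 0.00027 means the 1 Gbps link is idle more than 99.9% of the time under stop-and-wait.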

Pipelined protocols: overview

Go-back-N:
❖ sender can have up to N unacked packets in pipeline
❖ receiver only sends cumulative ack
  ▪ doesn’t ack packet if there’s a gap
❖ sender has timer for oldest unacked packet
  ▪ when timer expires, retransmit all unacked packets

Selective Repeat:
❖ sender can have up to N unack’ed packets in pipeline
❖ rcvr sends individual ack for each packet
❖ sender maintains timer for each unacked packet
  ▪ when timer expires, retransmit only that unacked packet

Go-Back-N: sender
❖ k-bit seq # in pkt header
❖ “window” of up to N, consecutive unack’ed pkts allowed
❖ ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  ▪ may receive duplicate ACKs (see receiver)
❖ timer for oldest in-flight pkt
❖ timeout(n): retransmit packet n and all higher seq # pkts in window
GBN: sender extended FSM
  initial transition (Λ):
    base=1
    nextseqnum=1
  Wait:
    rdt_send(data) ->
      if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
          start_timer
        nextseqnum++
      }
      else
        refuse_data(data)
    timeout ->
      start_timer
      udt_send(sndpkt[base])
      udt_send(sndpkt[base+1])
      …
      udt_send(sndpkt[nextseqnum-1])
    rdt_rcv(rcvpkt) && corrupt(rcvpkt) -> Λ
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) ->
      base = getacknum(rcvpkt)+1
      if (base == nextseqnum)
        stop_timer
      else
        start_timer

GBN: receiver extended FSM
  initial transition (Λ):
    expectedseqnum=1
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
  Wait:
    default -> udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum) ->
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(expectedseqnum,ACK,chksum)
      udt_send(sndpkt)
      expectedseqnum++

❖ ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  ▪ may generate duplicate ACKs
  ▪ need only remember expectedseqnum
❖ out-of-order pkt:
  ▪ discard (don’t buffer): no receiver buffering!
  ▪ re-ACK pkt with highest in-order seq #
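The two extended FSMs above translate fairly directly into code. The following is a sketch, not a full implementation: the class names, the `send` callback (standing in for udt_send), and the boolean `timer_running` flag (standing in for a real timer) are assumptions. It also ignores ACKs below `base`, a guard the slide FSM does not show, and the receiver returns 0 as the ACK number before any packet has been delivered.

```python
class GBNSender:
    def __init__(self, N, send):
        self.N = N
        self.base = 1            # oldest unACKed seq #
        self.nextseqnum = 1      # next seq # to use
        self.sndpkt = {}         # seq # -> packet, kept for retransmission
        self.send = send         # hypothetical stand-in for udt_send
        self.timer_running = False

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            pkt = (self.nextseqnum, data)
            self.sndpkt[self.nextseqnum] = pkt
            self.send(pkt)
            if self.base == self.nextseqnum:
                self.timer_running = True   # start_timer
            self.nextseqnum += 1
            return True
        return False                        # refuse_data: window is full

    def on_ack(self, acknum):
        if acknum < self.base:
            return                          # duplicate ACK: ignore (assumption)
        for n in range(self.base, acknum + 1):
            self.sndpkt.pop(n, None)        # cumulative ACK frees these
        self.base = acknum + 1
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self):
        self.timer_running = True           # start_timer
        for n in range(self.base, self.nextseqnum):
            self.send(self.sndpkt[n])       # resend every unACKed packet


class GBNReceiver:
    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []

    def rdt_rcv(self, pkt):
        seq, data = pkt
        if seq == self.expectedseqnum:      # in order: deliver
            self.delivered.append(data)
            self.expectedseqnum += 1
        # out of order: discard; either way ACK highest in-order seq #
        return self.expectedseqnum - 1
```

Keeping only `expectedseqnum` at the receiver is what makes Go-Back-N simple: the cost is that every packet after a loss is discarded and must be resent.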

GBN in action
sender window (N=4)
  sender: send pkt0
  sender: send pkt1
  sender: send pkt2 (X loss)
  sender: send pkt3, (wait)
  receiver: receive pkt0, send ack0
  receiver: receive pkt1, send ack1
  receiver: receive pkt3, discard, (re)send ack1
  sender: rcv ack0, send pkt4
  sender: rcv ack1, send pkt5
  receiver: receive pkt4, discard, (re)send ack1
  receiver: receive pkt5, discard, (re)send ack1
  sender: ignore duplicate ACK
  sender: pkt 2 timeout
  sender: send pkt2, send pkt3, send pkt4, send pkt5
  receiver: rcv pkt2, deliver, send ack2
  receiver: rcv pkt3, deliver, send ack3
  receiver: rcv pkt4, deliver, send ack4
  receiver: rcv pkt5, deliver, send ack5

Selective repeat
❖ receiver individually acknowledges all correctly received pkts
  ▪ buffers pkts, as needed, for eventual in-order delivery to upper layer
❖ sender only resends pkts for which ACK not received
  ▪ sender timer for each unACKed pkt
❖ sender window
  ▪ N consecutive seq #’s
  ▪ limits seq #s of sent, unACKed pkts


Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence number space, each with a window of N consecutive seq #’s]

Selective repeat
sender
  data from above:
  ❖ if next available seq # in window, send pkt
  timeout(n):
  ❖ resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N-1]:
  ❖ mark pkt n as received
  ❖ if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]
  ❖ send ACK(n)
  ❖ out-of-order: buffer
  ❖ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N,rcvbase-1]
  ❖ ACK(n)
  otherwise:
  ❖ ignore
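The receiver rules above can be sketched as a small class. This is an illustration under simplifying assumptions: sequence numbers are plain integers that never wrap around, and the names `SRReceiver`, `rdt_rcv` and the return convention (the ACK number to send, or `None` for "ignore") are hypothetical.

```python
class SRReceiver:
    def __init__(self, N, rcvbase=0):
        self.N = N
        self.rcvbase = rcvbase   # lowest seq # not yet received
        self.buffer = {}         # out-of-order packets awaiting delivery
        self.delivered = []      # data handed to the upper layer, in order

    def rdt_rcv(self, n, data):
        """Handle packet n; return the ACK number to send, or None."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # deliver this packet and any buffered in-order successors
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n             # individual ACK(n)
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n             # already delivered: re-ACK so sender can advance
        return None              # otherwise: ignore
```

The second branch matters: if an ACK was lost, the sender will retransmit a packet the receiver has already delivered, and only a fresh ACK lets the sender's window move on.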

Selective repeat in action
sender window (N=4)
  sender: send pkt0
  sender: send pkt1
  sender: send pkt2 (X loss)
  sender: send pkt3, (wait)
  receiver: receive pkt0, send ack0
  receiver: receive pkt1, send ack1
  receiver: receive pkt3, buffer, send ack3
  sender: rcv ack0, send pkt4
  sender: rcv ack1, send pkt5
  receiver: receive pkt4, buffer, send ack4
  receiver: receive pkt5, buffer, send ack5
  sender: record ack3 arrived
  sender: pkt 2 timeout
  sender: send pkt2
  sender: record ack4 arrived
  sender: record ack5 arrived
  receiver: rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?

Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  ▪ segment structure
  ▪ reliable data transfer
  ▪ flow control
  ▪ connection management
3.6 principles of congestion control
3.7 TCP congestion control


TCP: Overview RFCs: 793,1122,1323, 2018, 2581
❖ point-to-point:
  ▪ one sender, one receiver
❖ reliable, in-order byte stream
❖ pipelined:
  ▪ TCP congestion and flow control set window size
❖ full duplex data:
  ▪ bi-directional data flow in same connection
  ▪ MSS: maximum segment size
❖ connection-oriented:
  ▪ handshaking (exchange of control msgs) inits sender, receiver state before data exchange
❖ flow controlled:
  ▪ sender will not overwhelm receiver

TCP segment structure (each row is 32 bits wide)
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
Notes on fields:
  ▪ sequence number, acknowledgement number: counting by bytes of data (not segments!)
  ▪ URG: urgent data (generally not used)
  ▪ ACK: ACK # valid
  ▪ PSH: push data now (generally not used)
  ▪ RST, SYN, FIN: connection estab (setup, teardown commands)
  ▪ receive window: # bytes rcvr willing to accept
  ▪ checksum: Internet checksum (as in UDP)

TCP seq. numbers, ACKs
sequence numbers:
  ▪ byte stream “number” of first byte in segment’s data
acknowledgements:
  ▪ seq # of next byte expected from other side
  ▪ cumulative ACK
Q: how receiver handles out-of-order segments
  ▪ A: TCP spec doesn’t say, - up to implementor

[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
sender sequence number space (window size N):
  sent ACKed | sent, not-yet ACKed (“in-flight”) | usable but not yet sent | not usable

TCP seq. numbers, ACKs: simple telnet scenario
  Host A: User types ‘C’
  A -> B: Seq=42, ACK=79, data = ‘C’
  Host B: host ACKs receipt of ‘C’, echoes back ‘C’
  B -> A: Seq=79, ACK=43, data = ‘C’
  Host A: host ACKs receipt of echoed ‘C’
  A -> B: Seq=43, ACK=80
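The numbers in the telnet scenario follow from two rules: a segment's ACK field is the sequence number of the next byte expected from the other side, and a reply's sequence number is whatever the other side last said it expects. A minimal sketch, using a hypothetical dictionary representation of a segment and a hypothetical helper `reply`:

```python
def reply(seg, data=b""):
    """Build the segment that answers `seg`, optionally carrying data."""
    return {
        "seq": seg["ack"],                     # the byte the peer expects next
        "ack": seg["seq"] + len(seg["data"]),  # cumulative ACK: next byte wanted
        "data": data,
    }


a_to_b = {"seq": 42, "ack": 79, "data": b"C"}  # user types 'C'
b_to_a = reply(a_to_b, b"C")                   # B ACKs 'C' and echoes it
a_ack = reply(b_to_a)                          # A ACKs the echoed 'C'
```

Because sequence numbers count bytes, a 1-byte payload advances the acknowledgement number by exactly one, which is why 42 becomes 43 and 79 becomes 80.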


TCP round trip time, timeout
Q: how to set TCP timeout value?
❖ longer than RTT
  ▪ but RTT varies
❖ too short: premature timeout, unnecessary retransmissions
❖ too long: slow reaction to segment loss

Q: how to estimate RTT?
❖ SampleRTT: measured time from segment transmission until ACK receipt
  ▪ ignore retransmissions
❖ SampleRTT will vary, want estimated RTT “smoother”
  ▪ average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
EstimatedRTT = (1- α)*EstimatedRTT + α*SampleRTT
❖ exponential weighted moving average
❖ influence of past sample decreases exponentially fast
❖ typical value: α = 0.125
[Figure: RTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and EstimatedRTT]

TCP round trip time, timeout
❖ timeout interval: EstimatedRTT plus “safety margin”
  ▪ large variation in EstimatedRTT -> larger safety margin
❖ estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
    (typically, β = 0.25)

    TimeoutInterval = EstimatedRTT + 4*DevRTT
    (EstimatedRTT is the estimated RTT; 4*DevRTT is the “safety margin”)
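The three formulas above can be combined into a small estimator. This sketch makes two assumptions that the slides do not state: the estimator is seeded from the first sample with DevRTT set to half of it, and DevRTT is updated using the EstimatedRTT value from before the current sample is folded in. The class and attribute names are illustrative.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # deviation first, measured against the previous estimate (assumption)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # exponential weighted moving average of the samples
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(100.0)       # milliseconds
timeout = est.update(200.0)     # one unusually slow sample
```

A single slow sample moves EstimatedRTT only slightly (100 to 112.5 ms) but widens the safety margin a lot, which is the intended behaviour when RTT is fluctuating.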


TCP reliable data transfer
❖ TCP creates rdt service on top of IP’s unreliable service
  ▪ pipelined segments
  ▪ cumulative acks
  ▪ single retransmission timer
❖ retransmissions triggered by:
  ▪ timeout events
  ▪ duplicate acks
let’s initially consider simplified TCP sender:
  ▪ ignore duplicate acks
  ▪ ignore flow control, congestion control

TCP sender events:
data rcvd from app:
❖ create segment with seq #
❖ seq # is byte-stream number of first data byte in segment
❖ start timer if not already running
  ▪ think of timer as for oldest unacked segment
  ▪ expiration interval: TimeOutInterval
timeout:
❖ retransmit segment that caused timeout
❖ restart timer
ack rcvd:
❖ if ack acknowledges previously unacked segments
  ▪ update what is known to be ACKed
  ▪ start timer if there are still unacked segments

TCP: retransmission scenarios

lost ACK scenario
  A -> B: Seq=92, 8 bytes of data
  B -> A: ACK=100 (X, lost)
  A: timeout
  A -> B: Seq=92, 8 bytes of data
  B -> A: ACK=100

premature timeout
  A: SendBase=92
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data
  B -> A: ACK=100
  B -> A: ACK=120
  A: timeout (before the ACKs arrive)
  A -> B: Seq=92, 8 bytes of data
  A: SendBase=100, then SendBase=120
  B -> A: ACK=120
  A: SendBase=120

cumulative ACK
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data
  B -> A: ACK=100 (X, lost)
  B -> A: ACK=120 (arrives before timeout)
  A -> B: Seq=120, 15 bytes of data
TCP ACK generation [RFC 1122, RFC 2581]

event at receiver -> TCP receiver action
❖ arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
    -> delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
❖ arrival of in-order segment with expected seq #. One other segment has ACK pending
    -> immediately send single cumulative ACK, ACKing both in-order segments
❖ arrival of out-of-order segment higher-than-expect seq. # . Gap detected
    -> immediately send duplicate ACK, indicating seq. # of next expected byte
❖ arrival of segment that partially or completely fills gap
    -> immediate send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
❖ time-out period often relatively long:
  ▪ long delay before resending lost packet
❖ detect lost segments via duplicate ACKs.
  ▪ sender often sends many segments back-to-back
  ▪ if segment is lost, there will likely be many duplicate ACKs.

TCP fast retransmit
if sender receives 3 ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
  ▪ likely that unacked segment lost, so don’t wait for timeout
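The fast retransmit rule can be sketched as a duplicate-ACK counter on the sender side. The sketch assumes "triple duplicate" means three duplicates arriving after the first ACK for the same data (four identical ACKs in total), which matches a timeline in which ACK=100 is seen four times before the resend. The class name and return convention are hypothetical.

```python
class FastRetransmit:
    def __init__(self):
        self.last_ack = None   # most recent ACK number seen
        self.dup_count = 0     # duplicates of last_ack seen so far

    def on_ack(self, ack):
        """Return the seq # to resend immediately, or None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                return ack     # resend the segment starting at this byte
        else:
            self.last_ack = ack
            self.dup_count = 0
        return None


fr = FastRetransmit()
decisions = [fr.on_ack(a) for a in (100, 100, 100, 100)]
```

The ACK number itself identifies what to resend: it is the sequence number of the first byte the receiver is still missing, i.e. the unacked segment with the smallest seq #.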

TCP fast retransmit
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data (X, lost)
  B -> A: ACK=100
  B -> A: ACK=100
  B -> A: ACK=100
  B -> A: ACK=100
  A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3: Transport Layer
our goals:
❖ understand principles behind transport layer services:
  ▪ multiplexing, demultiplexing
  ▪ reliable data transfer
  ▪ flow control
  ▪ congestion control
❖ learn about Internet transport layer protocols:
  ▪ UDP: connectionless transport
  ▪ TCP: connection-oriented reliable transport
  ▪ TCP congestion control

Transport services and protocols
❖ provide logical communication between app processes running on different hosts
❖ transport protocols run in end systems
  ▪ sender side: breaks app messages into segments, passes to network layer
  ▪ rcvr side: reassembles segments into messages, passes to app layer
❖ more than one transport protocol available to apps
  ▪ Internet: TCP and UDP

Transport vs. network layer
❖ network layer: logical communication between hosts
❖ transport layer: logical communication between processes
  ▪ relies on, enhances, network layer services
household analogy: 12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
❖ hosts = houses
❖ processes = kids
❖ app messages = letters in envelopes
❖ transport protocol = Ann and Bill who demux to in-house siblings
❖ network-layer protocol = postal service
Internet transport-layer protocols
❖ reliable, in-order delivery (TCP)
  ▪ congestion control
  ▪ flow control
  ▪ connection setup
❖ unreliable, unordered delivery: UDP
  ▪ no-frills extension of “best-effort” IP
❖ services not available:
  ▪ delay guarantees
  ▪ bandwidth guarantees
[Figure: two end systems running application, transport, network, data link and physical layers, connected through routers that run only network, data link and physical layers]

How does the message reach the correct application?
[Figure sequence: an HTTP msg passes from the client's application layer down through the transport layer (header Ht added), the network layer (header Hn added), and the link and physical layers, then back up the stack at the HTTP server. Multiplexing happens at the sending side, de-multiplexing at the receiving side.]

Q: how did transport layer know to deliver message to Firefox browser process rather than Netflix process or Skype process?

Multiplexing/demultiplexing
multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)
demultiplexing at receiver: use header info to deliver received segments to correct socket
[Figure: processes P1 to P4 attached to sockets at the application layer of three hosts, above the transport, network, link and physical layers]

How demultiplexing works
❖ host receives IP datagrams
  ▪ each datagram has source IP address, destination IP address
  ▪ each datagram carries one transport-layer segment
  ▪ each segment has source, destination port number
❖ host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (each row is 32 bits wide)
  source port # | dest port #
  other header fields
  application data (payload)


Connectionless demultiplexing
❖ recall: created socket has host-local port #
❖ recall: when creating datagram to send into UDP socket, must specify
  ▪ destination IP address
  ▪ destination port #
❖ when host receives UDP segment:
  ▪ checks destination port # in segment
  ▪ directs UDP segment to socket with that port #
IP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at dest

Port Numbers
❖ Well known port numbers
  ▪ 0 to 1023
  ▪ RFC 1700
❖ Other port numbers
  ▪ 1024 to 65535

Connectionless demux: example
  process P3 (left host):   DatagramSocket mySocket2 = new DatagramSocket(9157);
  process P1 (server):      DatagramSocket serverSocket = new DatagramSocket(6428);
  process P4 (right host):  DatagramSocket mySocket1 = new DatagramSocket(5775);
  segment from P3 to server:  source port: 9157, dest port: 6428
  segment from server to P3:  source port: 6428, dest port: 9157
  segment from server to P4:  source port: ?, dest port: ?
  segment from P4 to server:  source port: ?, dest port: ?

Connection-oriented demux
❖ TCP socket identified by 4-tuple:
  ▪ source IP address
  ▪ source port number
  ▪ dest IP address
  ▪ dest port number
❖ demux: receiver uses all four values to direct segment to appropriate socket
❖ server host may support many simultaneous TCP sockets:
  ▪ each socket identified by its own 4-tuple
❖ web servers have different sockets for each connecting client
  ▪ non-persistent HTTP will have different socket for each request


Connection-oriented demux: example
  hosts: host with IP address A (process P3), server with IP address B (processes P4, P5, P6), host with IP address C (processes P2, P3)
  segment from A to server:  source IP,port: A,9157; dest IP,port: B,80
  segment from server to A:  source IP,port: B,80; dest IP,port: A,9157
  segment from C to server:  source IP,port: C,5775; dest IP,port: B,80
  segment from C to server:  source IP,port: C,9157; dest IP,port: B,80
three segments, all destined to IP address: B, dest port: 80 are demultiplexed to different sockets

Connection-oriented demux: example (threaded server)
  same segments as above, but the server runs a single process P4 that holds one socket per connection
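The difference between the two demultiplexing rules can be sketched with two lookup tables: a UDP socket is found from the destination port alone, while a TCP socket is found from the full 4-tuple. The table layout, function names and socket labels below are illustrative assumptions; a real host does this inside the operating system.

```python
def demux_udp(udp_table, dst_port):
    """UDP: the destination port alone selects the socket."""
    return udp_table.get(dst_port)


def demux_tcp(tcp_table, src_ip, src_port, dst_ip, dst_port):
    """TCP: all four values together select the socket."""
    return tcp_table.get((src_ip, src_port, dst_ip, dst_port))


# One UDP socket bound to port 6428 on the server.
udp_table = {6428: "serverSocket"}

# Three established TCP connections to server B, port 80.
tcp_table = {
    ("A", 9157, "B", 80): "P4",
    ("C", 5775, "B", 80): "P5",
    ("C", 9157, "B", 80): "P6",
}

# Same dest port 80, same source port 9157, but different source IPs:
sock_a = demux_tcp(tcp_table, "A", 9157, "B", 80)
sock_c = demux_tcp(tcp_table, "C", 9157, "B", 80)
```

Here segments from A and C that share both the destination port and the source port still reach different sockets, because the source IP address is part of the key. With the UDP table, any sender addressing port 6428 lands on the same socket.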

UDP: User Datagram Protocol [RFC 768]
❖ “no frills,” “bare bones” Internet transport protocol
❖ “best effort” service, UDP segments may be:
  ▪ lost
  ▪ delivered out-of-order to app
❖ connectionless:
  ▪ no handshaking between UDP sender, receiver
  ▪ each UDP segment handled independently of others
❖ UDP use:
  ▪ streaming multimedia apps (loss tolerant, rate sensitive)
  ▪ DNS
  ▪ SNMP
❖ reliable transfer over UDP:
  ▪ add reliability at application layer
  ▪ application-specific error recovery!
Why UDP?
❖ No connection establishment (which can add delay)
❖ Simple: no connection state at sender, receiver
❖ Small header size
❖ No congestion control: UDP can blast away as fast as desired

UDP: segment header
UDP segment format (each row is 32 bits wide)
  source port # | dest port #
  length | checksum
  application data (payload)
  ▪ length: in bytes of UDP segment, including header
  ▪ application data: data to/from application layer

UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment

               1st number   2nd number   sum
  Transmitted:     5            6         11
  Received:        4            6         11

receiver-computed checksum (4 + 6 = 10) does not equal sender-computed checksum (11, as received), so an error is detected


UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
sender:
❖ treat segment contents, including header fields, as sequence of 16-bit integers
❖ checksum: addition (one’s complement sum) of segment contents
❖ sender puts checksum value into UDP checksum field
receiver:
❖ compute checksum of received segment
❖ check if computed checksum equals checksum field value:
  ▪ NO - error detected
  ▪ YES - no error detected. But maybe errors nonetheless? More later ….

Internet checksum: example
example: add two 16-bit integers

                 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Why Internet checksum at Transport Layer?
❖ Isn’t link layer error detection enough?
❖ No guarantees
❖ In-memory error
❖ End to end principle

Checksum example
segment as 16-bit words (each row is 32 bits wide):
  1156 | 0050
  000C | checksum
  F234 | FAAA
  application data (payload)
❖ 1156+0050+000C+F234+FAAA = 1 FE90
❖ wraparound 1: Sum = FE91
❖ Checksum = 016E
❖ Insert this into the checksum field
❖ At receiver:
  ▪ Do the whole checksum.
  ▪ If sum is FFFF, there is no error
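Both checksum examples above can be reproduced in a few lines. This is a sketch that works on a list of 16-bit words rather than on raw segment bytes, and the function names are illustrative; it shows the wraparound step and the receiver check against FFFF.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wraparound
    return total


def internet_checksum(words):
    """Checksum is the one's complement (bit inversion) of the sum."""
    return ~ones_complement_sum(words) & 0xFFFF


# Two-integer example: the sum overflows 16 bits and wraps around.
c1 = internet_checksum([0b1110011001100110, 0b1101010101010101])

# Hex example: sender computes the checksum over the five words ...
segment = [0x1156, 0x0050, 0x000C, 0xF234, 0xFAAA]
c2 = internet_checksum(segment)

# ... and the receiver sums everything, including the checksum field.
receiver_sum = ones_complement_sum(segment + [c2])
```

Wrapping the carry after every addition gives the same result as adding everything first and folding the carries at the end, since one's complement addition is associative.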


Principles of reliable data transfer
❖ Important in application, transport, link layers
  ▪ top-10 list of important networking topics!
❖ Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
send side:
  ▪ rdt_send(): called from above, (e.g., by app.). Passed data to deliver to receiver upper layer
  ▪ udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
  ▪ rdt_rcv(): called when packet arrives on rcv-side of channel
  ▪ deliver_data(): called by rdt to deliver data to upper layer

Reliable data transfer: getting started
We will:
❖ Incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
❖ Consider only unidirectional data transfer
  ▪ but control info will flow in both directions!
❖ Use finite state machines (FSM) to specify sender, receiver
  ▪ state: when in this “state” next state uniquely determined by next event
  ▪ each transition from state 1 to state 2 is labelled with the event causing the state transition and the actions taken on the state transition

rdt1.0: reliable transfer over a reliable channel
❖ Underlying channel perfectly reliable
  ▪ no bit errors
  ▪ no loss of packets
❖ Separate FSMs for sender, receiver:
  ▪ sender sends data into underlying channel
  ▪ receiver reads data from underlying channel
  sender, Wait for call from above:
    rdt_send(data) -> packet = make_pkt(data); udt_send(packet)
  receiver, Wait for call from below:
    rdt_rcv(packet) -> extract(packet,data); deliver_data(data)

rdt2.0: channel with bit errors
❖ Underlying channel may flip bits in packet
  ▪ checksum to detect bit errors
❖ The question: how to detect and recover from errors:
  ▪ acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  ▪ negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  ▪ sender retransmits pkt on receipt of NAK
❖ new mechanisms in rdt2.0 (beyond rdt1.0):
  ▪ error detection
  ▪ receiver feedback: control msgs (ACK,NAK) rcvr->sender
How do humans recover from “errors” during conversation?


rdt2.0: channel with bit errors
❖ Underlying channel may flip bits in packet
  ▪ checksum to detect bit errors
❖ The question: how to detect and recover from errors:
  ▪ Acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  ▪ Negative Acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  ▪ sender retransmits pkt on receipt of NAK
❖ new mechanisms in rdt2.0 (beyond rdt1.0):
  ▪ error detection
  ▪ feedback: control msgs (ACK,NAK) from receiver to sender
  ▪ retransmission
stop and wait: sender sends one packet, then waits for receiver response

rdt2.0: FSM specification
  sender
    Wait for call from above:
      rdt_send(data) -> sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
    Wait for ACK or NAK:
      rdt_rcv(rcvpkt) && isNAK(rcvpkt) -> udt_send(sndpkt)
      rdt_rcv(rcvpkt) && isACK(rcvpkt) -> Λ; go to Wait for call from above
  receiver
    Wait for call from below:
      rdt_rcv(rcvpkt) && corrupt(rcvpkt) -> udt_send(NAK)
      rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) -> extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

rdt2.0: operation with no errors rdt2.0: error scenario


[Figure: the rdt2.0 sender and receiver FSMs from the specification slide, shown once per scenario. No errors: sender sends sndpkt, receiver finds it notcorrupt, delivers the data and returns ACK. Error scenario: receiver finds the packet corrupt and returns NAK, sender retransmits sndpkt.]

Transport Layer 3-44 Transport Layer 3-45


rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
❖ Sender doesn’t know what happened at receiver!
❖ Can’t just retransmit: possible duplicate
Handling duplicates:
❖ Sender retransmits current pkt if ACK/NAK corrupted
❖ Sender adds sequence number to each pkt
❖ Receiver discards (doesn’t deliver up) duplicate pkt
How many sequence numbers do we need?

rdt2.1: sender, handles garbled ACK/NAKs
state: Wait for call 0 from above
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    (go to Wait for ACK or NAK 0)
state: Wait for ACK or NAK 0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ
    (go to Wait for call 1 from above)
state: Wait for call 1 from above
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    (go to Wait for ACK or NAK 1)
state: Wait for ACK or NAK 1
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ
    (go to Wait for call 0 from above)

Transport Layer 3-46 Transport Layer 3-47

rdt2.1: receiver, handles garbled ACK/NAKs
state: Wait for 0 from below
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    (go to Wait for 1 from below)
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
state: Wait for 1 from below
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    (go to Wait for 0 from below)
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)

rdt2.1: discussion
sender:
❖ seq # added to pkt
❖ two seq. #’s (0,1) will suffice. Why?
❖ must check if received ACK/NAK corrupted
❖ twice as many states
▪ state must “remember” whether “expected” pkt should have seq # of 0 or 1
receiver:
❖ must check if received packet is duplicate
▪ state indicates whether 0 or 1 is expected pkt seq #
❖ note: receiver can not know if its last ACK/NAK received OK at sender

Transport Layer 3-48 Transport Layer 3-49


rdt2.2: a NAK-free protocol
❖ same functionality as rdt2.1, using ACKs only
❖ instead of NAK, receiver sends ACK for last pkt received OK
▪ receiver must explicitly include seq # of pkt being ACKed
❖ duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments
sender FSM fragment
state: Wait for call 0 from above
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    (go to Wait for ACK 0)
state: Wait for ACK 0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    Λ
receiver FSM fragment
state: Wait for 0 from below
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
    udt_send(sndpkt)
transition into Wait for 0 from below:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK1, chksum)
    udt_send(sndpkt)
Transport Layer 3-50 Transport Layer 3-51

rdt3.0: channels with errors and loss
new assumption: underlying channel can also lose packets (data, ACKs)
▪ checksum, seq. #, ACKs, retransmissions will be of help … but not enough
approach: sender waits “reasonable” amount of time for ACK
❖ retransmits if no ACK received in this time
❖ if pkt (or ACK) just delayed (not lost):
▪ retransmission will be duplicate, but seq. #’s already handle this
▪ receiver must specify seq # of pkt being ACKed
❖ requires countdown timer

rdt3.0 sender
state: Wait for call 0 from above
  rdt_rcv(rcvpkt)
    Λ
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    (go to Wait for ACK0)
state: Wait for ACK0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    Λ
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    (go to Wait for call 1 from above)
state: Wait for call 1 from above
  rdt_rcv(rcvpkt)
    Λ
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    (go to Wait for ACK1)
state: Wait for ACK1
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    Λ
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    (go to Wait for call 0 from above)

Transport Layer 3-52 Transport Layer 3-53


rdt3.0 in action rdt3.0 in action
sender receiver
sender receiver send pkt0 pkt0
sender receiver sender receiver
send pkt0 pkt0 rcv pkt0
send pkt0 send pkt0 ack0 send ack0
pkt0 pkt0 rcv pkt0
rcv pkt0 rcv pkt0 send ack0 rcv ack0
ack0 send pkt1 pkt1
ack0 send ack0 ack0 send ack0 rcv ack0
pkt1 rcv pkt1
rcv ack0 rcv ack0 send pkt1 send ack1
send pkt1 pkt1 send pkt1 pkt1 rcv pkt1
X ack1 ack1
rcv pkt1 send ack1
loss X
ack1 send ack1 loss timeout
rcv ack1 resend pkt1 pkt1
send pkt0 pkt0 timeout rcv pkt1
rcv pkt0 timeout resend pkt1 pkt1 rcv ack1 (detect duplicate)
rcv pkt1 send pkt0 pkt0
ack0 send ack0 resend pkt1 pkt1 (detect duplicate) send ack1
rcv pkt1 ack1 send ack1
ack1 send ack1 rcv ack1 rcv ack1 ack1
rcv pkt0
rcv ack1 send pkt0 pkt0 Do nothing ack0
send pkt0 pkt0 rcv pkt0 send ack0
(a) no loss rcv pkt0 ack0 send ack0
ack0 send ack0

(c) ACK loss (d) premature timeout/ delayed ACK


(b) packet loss
Transport Layer 3-54 Transport Layer 3-55
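The four rdt3.0 scenarios above (no loss, packet loss, ACK loss, duplicate detection) can be replayed with a small simulation. This is a minimal sketch, not the slides' FSM verbatim: the function name `stop_and_wait` and the `lost_data` / `lost_acks` parameters are hypothetical, the timer is abstracted away (a lost packet or lost ACK simply causes the next retransmission), and corruption and premature timeouts are not modelled.

```python
from typing import List, Set, Tuple


def stop_and_wait(messages: List[str], lost_data: Set[int],
                  lost_acks: Set[int]) -> Tuple[List[str], int]:
    """Simulate an rdt3.0-style alternating-bit transfer.

    lost_data holds the indices (0-based, counted over all data
    transmissions) of data packets the channel drops; lost_acks does the
    same for ACKs. Returns (data delivered to the upper layer,
    number of data transmissions used).
    """
    delivered: List[str] = []
    expected_seq = 0    # receiver state: seq # it is waiting for
    transmissions = 0   # every data packet put on the channel
    acks_sent = 0       # every ACK put on the channel

    for i, data in enumerate(messages):
        seq = i % 2     # sender alternates seq # 0, 1, 0, 1, ...
        while True:
            tx_index = transmissions
            transmissions += 1
            if tx_index in lost_data:
                continue            # packet lost: timeout, retransmit
            if seq == expected_seq:
                delivered.append(data)          # new packet: deliver up
                expected_seq = 1 - expected_seq
            # Otherwise it is a duplicate: discard it, but still ACK it.
            ack_index = acks_sent
            acks_sent += 1
            if ack_index in lost_acks:
                continue            # ACK lost: timeout, retransmit
            break                   # ACK for this seq # arrived
    return delivered, transmissions
```

For example, `stop_and_wait(["a", "b", "c"], {1}, {1})` drops the first copy of "b" and then the ACK for its retransmission; the receiver still delivers each item exactly once because the third copy of "b" carries a seq # it no longer expects.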

rdt3.0: stop-and-wait operation
[Figure: timing diagram, sender on the left, receiver on the right]
▪ first packet bit transmitted, t = 0
▪ last packet bit transmitted, t = L / R
▪ first packet bit arrives
▪ last packet bit arrives, send ACK
▪ ACK arrives, send next packet, t = RTT + L / R

Performance of rdt3.0
❖ rdt3.0 is correct, but performance stinks
❖ e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
Dtrans = L / R = 8000 bits / 10^9 bits/sec = 8 microsecs
▪ U sender: utilization – fraction of time sender busy sending
U sender = (L / R) / (RTT + L / R) = .008 / 30.008 = 0.00027
▪ if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec throughput over 1 Gbps link
❖ network protocol limits use of physical resources!
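The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes the usual stop-and-wait timing (one packet of L bits per RTT + L/R); the function names and the `window` parameter, which models sending several packets back to back before waiting, are illustrative and not from the slides.

```python
def sender_utilization(packet_bits: float, rate_bps: float,
                       rtt_s: float, window: int = 1) -> float:
    """Fraction of time the sender is busy sending.

    window = 1 is stop-and-wait; window = n models n packets sent
    back to back before the first ACK returns. Capped at 1.0.
    """
    d_trans = packet_bits / rate_bps          # L / R, in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


def stop_and_wait_throughput_bps(packet_bits: float, rate_bps: float,
                                 rtt_s: float) -> float:
    """Bits delivered per second when one packet is sent per RTT + L/R."""
    return packet_bits / (rtt_s + packet_bits / rate_bps)


if __name__ == "__main__":
    # 1 Gbps link, RTT = 30 ms (15 ms propagation each way), 8000-bit packet
    print(sender_utilization(8000, 1e9, 0.030))
    print(stop_and_wait_throughput_bps(8000, 1e9, 0.030) / 8, "bytes/sec")
    print(sender_utilization(8000, 1e9, 0.030, window=3))
```

With `window=3` the utilization is three times the stop-and-wait value, which is the effect pipelining is meant to achieve.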
Transport Layer 3-56 Transport Layer 3-57
Pipelined protocols Pipelining: increased utilization
pipelining: sender allows multiple, “in-flight”, yet- sender receiver
first packet bit transmitted, t = 0
to-be-acknowledged pkts last bit transmitted, t = L / R
▪ range of sequence numbers must be increased
▪ buffering at sender and/or receiver first packet bit arrives
RTT last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
ACK arrives, send next
packet, t = RTT + L / R
3-packet pipelining increases
utilization by a factor of 3!

❖ two generic forms of pipelined protocols: Go-Back-N, selective repeat
Transport Layer 3-58 Transport Layer 3-59

Pipelined protocols: overview
Go-back-N:
❖ sender can have up to N unacked packets in pipeline
❖ receiver only sends cumulative ack
▪ doesn’t ack packet if there’s a gap
❖ sender has timer for oldest unacked packet
▪ when timer expires, retransmit all unacked packets
Selective Repeat:
❖ sender can have up to N unack’ed packets in pipeline
❖ rcvr sends individual ack for each packet
❖ sender maintains timer for each unacked packet
▪ when timer expires, retransmit only that unacked packet

Go-Back-N: sender
❖ k-bit seq # in pkt header
❖ “window” of up to N, consecutive unack’ed pkts allowed
❖ ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
▪ may receive duplicate ACKs (see receiver)
❖ timer for oldest in-flight pkt
❖ timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-60 Transport Layer 3-61
GBN: sender extended FSM GBN: receiver extended FSM
rdt_send(data)
default
if (nextseqnum < base+N) { udt_send(sndpkt)
sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum) rdt_rcv(rcvpkt)
udt_send(sndpkt[nextseqnum]) && notcorrupt(rcvpkt)
if (base == nextseqnum) Λ && hasseqnum(rcvpkt,expectedseqnum)
start_timer expectedseqnum=1 Wait extract(rcvpkt,data)
nextseqnum++ sndpkt = deliver_data(data)
} make_pkt(expectedseqnum,ACK,chksum) sndpkt = make_pkt(expectedseqnum,ACK,chksum)
Λ else udt_send(sndpkt)
refuse_data(data) expectedseqnum++
base=1
nextseqnum=1
timeout
start_timer ACK-only: always send ACK for correctly-received
Wait
udt_send(sndpkt[base])
udt_send(sndpkt[base+1])
pkt with highest in-order seq #
rdt_rcv(rcvpkt)
&& corrupt(rcvpkt) … ▪ may generate duplicate ACKs
udt_send(sndpkt[nextseqnum-1])
rdt_rcv(rcvpkt) &&
▪ need only remember expectedseqnum
notcorrupt(rcvpkt)
❖ out-of-order pkt:
base = getacknum(rcvpkt)+1
If (base == nextseqnum) ▪ discard (don’t buffer): no receiver buffering!
stop_timer
else ▪ re-ACK pkt with highest in-order seq #
start_timer
Transport Layer 3-62 Transport Layer 3-63
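The two GBN extended FSMs above translate fairly directly into code. The following is a rough sketch under simplifying assumptions: sequence numbers are unbounded integers starting at 1 (no k-bit wraparound), corruption is not modelled, and the timer is reduced to a boolean flag. The class names and method signatures are hypothetical; `max()` in `on_ack` is a small defensive addition so that an old ACK cannot move `base` backwards.

```python
from typing import Dict, List


class GbnSender:
    def __init__(self, window: int) -> None:
        self.N = window
        self.base = 1                 # oldest unacked seq #
        self.nextseqnum = 1           # next seq # to use
        self.sndpkt: Dict[int, str] = {}
        self.timer_running = False

    def rdt_send(self, data: str) -> List[int]:
        """Return the seq #s put on the wire; [] means refuse_data."""
        if self.nextseqnum >= self.base + self.N:
            return []                 # window full
        self.sndpkt[self.nextseqnum] = data
        if self.base == self.nextseqnum:
            self.timer_running = True # start_timer for oldest pkt
        self.nextseqnum += 1
        return [self.nextseqnum - 1]

    def on_ack(self, acknum: int) -> None:
        """Cumulative ACK: everything up to and including acknum is ACKed."""
        self.base = max(self.base, acknum + 1)
        # stop_timer if nothing is in flight, otherwise (re)start it
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self) -> List[int]:
        """Go back N: resend every sent-but-unacked packet."""
        self.timer_running = True
        return list(range(self.base, self.nextseqnum))


class GbnReceiver:
    def __init__(self) -> None:
        self.expectedseqnum = 1
        self.delivered: List[str] = []

    def rdt_rcv(self, seq: int, data: str) -> int:
        """Deliver in-order data, discard the rest; return the ACK number."""
        if seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # ACK the highest in-order seq # (0 if nothing received yet)
        return self.expectedseqnum - 1
```

Note that the receiver keeps only `expectedseqnum` as state, matching the "no receiver buffering" point on the slide.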

GBN in action
Selective repeat
sender window (N=4) sender receiver
012345678 send pkt0 ❖ receiver individually acknowledges all correctly
012345678 send pkt1
012345678 send pkt2 receive pkt0, send ack0 received pkts
012345678 send pkt3 Xloss receive pkt1, send ack1
▪ buffers pkts, as needed, for eventual in-order delivery
(wait) to upper layer
receive pkt3, discard,
012345678 rcv ack0, send pkt4 (re)send ack1 ❖ sender only resends pkts for which ACK not
012345678 rcv ack1, send pkt5 receive pkt4, discard, received
ignore duplicate ACK
(re)send ack1
receive pkt5, discard,
▪ sender timer for each unACKed pkt
(re)send ack1 ❖ sender window
pkt 2 timeout
012345678 send pkt2 ▪ N consecutive seq #’s
012345678
012345678
send
send
pkt3
pkt4 rcv pkt2, deliver, send ack2 ▪ limits seq #s of sent, unACKed pkts
012345678 send pkt5 rcv pkt3, deliver, send ack3
rcv pkt4, deliver, send ack4
rcv pkt5, deliver, send ack5

Transport Layer 3-64 Transport Layer 3-65


Selective repeat: sender, receiver windows Selective repeat
sender
data from above:
❖ if next available seq # in window, send pkt
timeout(n):
❖ resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
❖ mark pkt n as received
❖ if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
❖ send ACK(n)
❖ out-of-order: buffer
❖ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
❖ ACK(n)
otherwise:
❖ ignore

Transport Layer 3-66 Transport Layer 3-67
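The receiver rules for selective repeat can be sketched as follows. This is an illustration only: the class name `SrReceiver` is hypothetical, sequence numbers are treated as unbounded integers (a real protocol uses a finite, wrapping sequence space), and corruption is not modelled.

```python
from typing import Dict, List, Optional


class SrReceiver:
    def __init__(self, window: int) -> None:
        self.N = window
        self.rcvbase = 0                      # next not-yet-received seq #
        self.buffer: Dict[int, str] = {}      # out-of-order packets
        self.delivered: List[str] = []        # data passed to upper layer

    def rdt_rcv(self, n: int, data: str) -> Optional[int]:
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)   # buffer (keeps first copy)
            # Deliver every buffered packet that is now in order.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n                          # already delivered: re-ACK
        return None                           # otherwise: ignore
```

Re-ACKing packets in [rcvbase-N, rcvbase-1] matters because the sender's window cannot advance until it sees an ACK for its oldest unACKed packet, even if the receiver already delivered it.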

Selective repeat in action Chapter 3 outline


sender window (N=4) sender receiver
012345678 send pkt0
012345678 send pkt1
3.1 transport-layer 3.5 connection-oriented
012345678 send pkt2 receive pkt0, send ack0 services transport: TCP
receive pkt1, send ack1
012345678 send pkt3 Xloss
3.2 multiplexing and ▪ segment structure
(wait)
receive pkt3, buffer, demultiplexing ▪ reliable data transfer
012345678 rcv ack0, send pkt4 send ack3
3.3 connectionless ▪ flow control
rcv ack1, send pkt5

012345678 receive pkt4, buffer,
transport: UDP connection management
send ack4
record ack3 arrived receive pkt5, buffer, 3.4 principles of reliable 3.6 principles of congestion
pkt 2 timeout
send ack5
data transfer control
012345678 send pkt2 3.7 TCP congestion control
012345678 record ack4 arrived
012345678 rcv pkt2; deliver pkt2,
record ack5 arrived
012345678 pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?

Transport Layer 3-68 Transport Layer 3-69


TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
❖ point-to-point:
▪ one sender, one receiver
❖ reliable, in-order byte stream
❖ pipelined:
▪ TCP congestion and flow control set window size
❖ full duplex data:
▪ bi-directional data flow in same connection
▪ MSS: maximum segment size
❖ connection-oriented:
▪ handshaking (exchange of control msgs) inits sender, receiver state before data exchange
❖ flow controlled:
▪ sender will not overwhelm receiver

TCP segment structure (each row is 32 bits wide)
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | receive window
checksum | Urg data pointer
options (variable length)
application data (variable length)
field notes:
▪ sequence number, acknowledgement number: counting by bytes of data (not segments!)
▪ URG: urgent data (generally not used)
▪ ACK: ACK # valid
▪ PSH: push data now (generally not used)
▪ RST, SYN, FIN: connection estab (setup, teardown commands)
▪ receive window: # bytes rcvr willing to accept
▪ checksum: Internet checksum (as in UDP)
Transport Layer 3-70 Transport Layer 3-71

TCP seq. numbers, ACKs TCP seq. numbers, ACKs


outgoing segment from sender
sequence numbers: source port # dest port #
Host Host B
▪ byte stream “number” of
sequence number
acknowledgement number
A
first byte in segment’s checksum
rwnd
urg pointer
data User
window size types
acknowledgements: N ‘C’ Seq=42, ACK=79, data = ‘C’
▪ seq # of next byte host ACKs
receipt of
expected from other side sender sequence number space ‘C’, echoes
▪ cumulative ACK host ACKs
Seq=79, ACK=43, data = ‘C’ back ‘C’
Q: how receiver handles sent
ACKed
sent, not- usable not
yet ACKed but not usable
receipt
of echoed
out-of-order segments (“in-flight”) yet sent ‘C’ Seq=43, ACK=80
▪ A: TCP spec doesn’t say, incoming segment to sender
- up to implementor source port # dest port #
simple telnet scenario
sequence number
acknowledgement number
A rwnd
checksum urg pointer

Transport Layer 3-72 Transport Layer 3-73


TCP round trip time, timeout
Q: how to set TCP timeout value?
❖ longer than RTT
▪ but RTT varies
❖ too short: premature timeout, unnecessary retransmissions
❖ too long: slow reaction to segment loss
Q: how to estimate RTT?
❖ SampleRTT: measured time from segment transmission until ACK receipt
▪ ignore retransmissions
❖ SampleRTT will vary, want estimated RTT “smoother”
▪ average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
EstimatedRTT = (1- α)*EstimatedRTT + α*SampleRTT
❖ exponential weighted moving average
❖ influence of past sample decreases exponentially fast
❖ typical value: α = 0.125
[Figure: SampleRTT and EstimatedRTT for the path gaia.cs.umass.edu to fantasia.eurecom.fr; y-axis RTT (milliseconds), x-axis time (seconds)]

Transport Layer 3-74 Transport Layer 3-75

TCP round trip time, timeout
❖ timeout interval: EstimatedRTT plus “safety margin”
▪ large variation in EstimatedRTT -> larger safety margin
❖ estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
(EstimatedRTT: estimated RTT; 4*DevRTT: “safety margin”)
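The EstimatedRTT, DevRTT and TimeoutInterval formulas combine into a small estimator. The sketch below makes two assumptions the slides do not state: the first sample initializes EstimatedRTT to SampleRTT and DevRTT to SampleRTT/2, and DevRTT is updated before EstimatedRTT (both choices follow RFC 6298). The class name `RttEstimator` is hypothetical.

```python
from typing import Optional


class RttEstimator:
    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT; return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # First measurement (assumed initialization, see lead-in).
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the EstimatedRTT from before this sample.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.estimated_rtt + 4 * self.dev_rtt
```

Feeding samples of 100 ms and then 200 ms moves EstimatedRTT only from 100 to 112.5 ms, while the timeout grows from 300 to 362.5 ms because the deviation term reacts to the jump.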

Transport Layer 3-76 Transport Layer 3-77


TCP reliable data transfer TCP sender events:
data rcvd from app: timeout:
❖ TCP creates rdt service ❖ create segment with ❖ retransmit segment
on top of IP’s unreliable seq # that caused timeout
service
❖ seq # is byte-stream ❖ restart timer
▪ pipelined segments
let’s initially consider number of first data ack rcvd:
▪ cumulative acks byte in segment
▪ single retransmission simplified TCP sender: ❖ if ack acknowledges
timer ▪ ignore duplicate acks ❖ start timer if not previously unacked
❖ retransmissions ▪ ignore flow control, already running segments
triggered by: congestion control ▪ think of timer as for ▪ update what is known
oldest unacked to be ACKed
▪ timeout events segment
▪ duplicate acks ▪ start timer if there are
▪ expiration interval: still unacked segments
TimeOutInterval

Transport Layer 3-78 Transport Layer 3-79

TCP: retransmission scenarios TCP: retransmission scenarios


Host A Host B Host A Host B Host A Host B

SendBase=92
Seq=92, 8 bytes of Seq=92, 8 bytes of Seq=92, 8 bytes of
data data data
Seq=100, 20 bytes of data Seq=100, 20 bytes of data
timeo

timeo

ACK=100
ut

ut

X ACK=100

timeo
X

ut
ACK=100
ACK=120 ACK=120

Seq=92, 8 bytes of Seq=92, 8


data SendBase=100 bytes of data
Seq=120, 15 bytes of data
SendBase=120
ACK=100
ACK=120

SendBase=120

lost ACK scenario premature timeout cumulative ACK


Transport Layer 3-81 Transport Layer 3-82
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver -> TCP receiver action
▪ arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
▪ arrival of in-order segment with expected seq #; one other segment has ACK pending -> immediately send single cumulative ACK, ACKing both in-order segments
▪ arrival of out-of-order segment with higher-than-expected seq #; gap detected -> immediately send duplicate ACK, indicating seq # of next expected byte
▪ arrival of segment that partially or completely fills gap -> immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
❖ time-out period often relatively long:
▪ long delay before resending lost packet
❖ detect lost segments via duplicate ACKs.
▪ sender often sends many segments back-to-back
▪ if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit: if sender receives 3 ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
▪ likely that unacked segment lost, so don’t wait for timeout

Transport Layer 3-83 Transport Layer 3-84

TCP fast retransmit Chapter 3 outline


Host A Host B

3.1 transport-layer 3.5 connection-oriented


services transport: TCP
Seq=92, 8 bytes of
Seq=100, data
20 bytes of data 3.2 multiplexing and ▪ segment structure
X demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
ACK=100
transport: UDP ▪ connection management
timeo

ACK=100
3.6 principles of congestion
ut

ACK=100 3.4 principles of reliable


ACK=100 data transfer control
Seq=100, 20 bytes of data 3.7 TCP congestion control

fast retransmit after sender


receipt of triple duplicate ACK
Transport Layer 3-85 Transport Layer 3-86
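The duplicate-ACK counting behind fast retransmit can be sketched in a few lines. This example assumes the common interpretation that the first ACK for a value is the original and the next three identical ACKs are the duplicates, as in the figure above where ACK=100 arrives four times. The function name is hypothetical and everything else a real sender does (timers, window updates) is left out.

```python
from typing import List, Optional, Tuple


def fast_retransmit_events(acks: List[int]) -> List[Tuple[int, int]]:
    """Return (index into acks, seq # to resend) for each fast retransmit."""
    events: List[Tuple[int, int]] = []
    last_ack: Optional[int] = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:
                # Third duplicate: resend the segment starting at this
                # ACK number without waiting for the timeout.
                events.append((i, ack))
        else:
            last_ack = ack      # new ACK: reset the duplicate counter
            dup_count = 0
    return events
```

Only the third duplicate triggers a retransmission; further duplicates of the same ACK number do not resend the segment again in this sketch.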
TCP flow control
applicati
on
application may process
remove data from
❖ Sample mid-sem question TCP socket buffers ….
application

TCP socket OS
receiver buffers
❖ Please explain if the following statements are … slower than TCP
receiver is delivering
True/False. (sender is sending) TCP
A) Cookies are a piece of code that has the potential code

to compromise the security of an Internet user.


B) Cookies are used in conjunction with HTTP. IP
C) Cookies have expiry time. flow code
receiver controls sender, so
D) Cookies can be used to track the browsing pattern control
sender won’t overflow
of a user at a particular site. receiver’s buffer by transmitting from sender
too much, too fast
receiver protocol stack

3-87 Transport Layer 3-88

TCP flow control Chapter 3 outline


❖ receiver “advertises” free 3.1 transport-layer 3.5 connection-oriented
buffer space by including to application process
services transport: TCP
rwnd value in TCP header
of receiver-to-sender 3.2 multiplexing and ▪ segment structure
segments RcvBuffer buffered data demultiplexing ▪ reliable data transfer
▪ RcvBuffer size set via
3.3 connectionless ▪ flow control
socket options (typical default rwnd
is 4096 bytes)
free buffer space
transport: UDP ▪ connection management
▪ many operating systems auto- 3.4 principles of reliable 3.6 principles of congestion
adjust RcvBuffer control
❖ sender limits amount of
TCP segment payloads data transfer
unacked (“in-flight”) data to 3.7 TCP congestion control
receiver-side buffering
receiver’s rwnd value
❖ guarantees receive buffer
will not overflow
Transport Layer 3-89 Transport Layer 3-90
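The flow-control bookkeeping reduces to two subtractions. In this sketch the variable names (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) follow the usual textbook convention and are assumptions here; the slides only name RcvBuffer and rwnd.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int,
                      last_byte_read: int) -> int:
    """rwnd: free space left in the receiver's buffer."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by app
    return rcv_buffer - buffered


def sendable_bytes(rwnd: int, last_byte_sent: int,
                   last_byte_acked: int) -> int:
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)
```

For instance, with a 4096-byte RcvBuffer holding 2000 unread bytes the receiver advertises rwnd = 2096; a sender with 1000 bytes already in flight may send 1096 more.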
Connection Management
before exchanging data, sender/receiver “handshake”:
❖ agree to establish connection (each knowing the other willing to establish connection)
❖ agree on connection parameters
[Figure: client and server applications above the network, each holding connection state: ESTAB and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
Socket clientSocket = new Socket("hostname", "port number");
Socket connectionSocket = welcomeSocket.accept();

Agreeing to establish a connection
2-way handshake:
[Figure: “Let’s talk” / “OK”, after which both sides are ESTAB; in protocol terms the client chooses x and sends req_conn(x), the server enters ESTAB and replies acc_conn(x), then the client enters ESTAB]
Q: will 2-way handshake always work in network?
❖ variable delays
❖ retransmitted messages (e.g. req_conn(x)) due to message loss
❖ message reordering
❖ can’t “see” other side
Transport Layer 3-91 Transport Layer 3-92

Agreeing to establish a connection TCP 3-way handshake


2-way handshake failure scenarios:
client state server state
LISTEN LISTEN
choose x choose x choose init seq num, x
req_conn(x) req_conn(x) send TCP SYN msg
ESTAB ESTAB SYNSENT SYNbit=1, Seq=x
retransmit acc_conn(x) retransmit acc_conn(x) choose init seq num, y
req_conn(x) req_conn(x) send TCP SYNACK
msg, acking SYN SYN RCVD
ESTAB ESTAB SYNbit=1, Seq=y
data(x+1) accept ACKbit=1; ACKnum=x+1
req_conn(x)
retransmit data(x+1) received SYNACK(x)
data(x+1) ESTAB indicates server is live;
connection connection send ACK for SYNACK;
x completes server x completes this segment may contain ACKbit=1, ACKnum=y+1
client client server client-to-server data
terminates forgets x terminates forgets x received ACK(y)
req_conn(x) indicates client is live
ESTAB
ESTAB ESTAB
data(x+1) accept
half open connection! data(x+1)
(no client!)
Transport Layer 3-93 Transport Layer 3-94
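The header fields exchanged in the 3-way handshake follow mechanically from the two initial sequence numbers. The sketch below just builds the three segments as dictionaries using the field names from the slide; the function name is hypothetical, and the modulo reflects the fact that TCP sequence numbers are 32 bits wide and wrap around.

```python
from typing import Dict, List

SEQ_SPACE = 2 ** 32  # 32-bit sequence number space


def three_way_handshake(x: int, y: int) -> List[Dict[str, int]]:
    """Return the SYN, SYNACK and ACK segments for initial seq #s x and y."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y,
              "ACKbit": 1, "ACKnum": (x + 1) % SEQ_SPACE}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```

Because each side must echo the other's freshly chosen number plus one, a delayed duplicate req_conn from an old connection cannot complete the exchange on its own, which is the weakness of the 2-way handshake shown on the left.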
TCP: closing a connection TCP: closing a connection
❖ client, server each close their side of connection client state server state
▪ send TCP segment with FIN bit = 1 ESTAB ESTAB

❖ respond to received FIN with ACK clientSocket.close()


FIN_WAIT_1 can no longer FINbit=1, seq=x
▪ on receiving FIN, ACK can be combined with own FIN send but can
CLOSE_WAIT
receive data
❖ simultaneous FIN exchanges can be handled ACKbit=1; ACKnum=x+1
can still
FIN_WAIT_2 wait for server send data
close

LAST_ACK
FINbit=1,
TIMED_WAIT seq=y can no longer
send data
ACKbit=1;
timed wait ACKnum=y+1
for 2*max CLOSED
segment lifetime

CLOSED

Transport Layer 3-96 Transport Layer 3-97

Chapter 3 outline Principles of congestion control


3.1 transport-layer 3.5 connection-oriented congestion:
services transport: TCP ❖ informally: “too many sources sending too much
3.2 multiplexing and ▪ segment structure data too fast for network to handle”
demultiplexing ▪ reliable data transfer
❖ different from flow control!
3.3 connectionless ▪ flow control
❖ manifestations:
transport: UDP ▪ connection management
3.6 principles of congestion ▪ lost packets (buffer overflow at routers)
3.4 principles of reliable
data transfer control ▪ long delays (queueing in router buffers)
3.7 TCP congestion control

Transport Layer 3-98 Transport Layer 3-99


Causes/costs of congestion: scenario 1 Approaches towards congestion control
original data: λin throughput: λout
❖ two senders, two two broad approaches towards congestion control:
receivers Host A

❖ one router, infinite unlimited shared


buffers end-end congestion
output link buffers
network-assisted
❖ output link capacity: R
❖ no retransmission
control: congestion control:
❖ no explicit feedback ❖ routers provide
Host B
from network feedback to end systems
❖ congestion inferred ▪ single bit indicating
R/2
from end-system congestion (SNA,
observed loss, delay DECbit, TCP/IP ECN,

delay
λout

❖ approach taken by ATM)


TCP ▪ explicit rate for
λin R/2 λin R/2 sender to send at
❖ maximum per-connection ❖ large delays as arrival rate,
throughput: R/2 λin, approaches capacity
Transport Layer 3-100 Transport Layer 3-101

Chapter 3 outline TCP congestion control: additive increase


multiplicative decrease
❖ approach: sender increases transmission rate (window
3.1 transport-layer 3.5 connection-oriented size), probing for usable bandwidth, until loss occurs
services transport: TCP
▪ additive increase: increase cwnd by 1 MSS every
3.2 multiplexing and ▪ segment structure
RTT until loss detected
demultiplexing ▪ reliable data transfer
▪ flow control ▪ multiplicative decrease: cut cwnd in half after loss
3.3 connectionless
transport: UDP ▪ connection management additively increase window size …
…. until loss occurs (then cut window in half)
3.4 principles of reliable 3.6 principles of congestion

congestion window size


cwnd: TCP sender
data transfer control AIMD saw tooth
3.7 TCP congestion control behavior: probing
for bandwidth

time
Transport Layer 3-102 Transport Layer 3-103
TCP Congestion Control: details TCP Slow Start
Host A Host B
sender sequence number space
cwnd TCP sending rate: ❖ when connection begins,
❖ roughly: send cwnd
increase rate
bytes, wait RTT for exponentially until first

RTT
ACKS, then send loss event:
last byte
ACKed sent, not-
last byte
sent more bytes ▪ initially cwnd = 1 MSS
▪ double cwnd every RTT
yet ACKed
(“in-flight”)
cwnd ▪ done by incrementing
❖ sender limits transmission: rate ~
~ bytes/sec
RTT cwnd for every ACK
LastByteSent- < cwnd received

LastByteAcked
summary: initial rate is
❖ cwnd is dynamic, function slow but ramps up
of perceived network exponentially fast time
congestion
Transport Layer 3-104 Transport Layer 3-105
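Incrementing cwnd by one MSS per ACK is what produces the doubling per RTT during slow start. A minimal sketch, assuming cwnd is counted in whole segments, every segment sent in a round is ACKed within that round, and no loss occurs (the function name is hypothetical):

```python
from typing import List


def slow_start_cwnd(rounds: int) -> List[int]:
    """Return cwnd (in MSS) at the start of each RTT round."""
    cwnd = 1                      # initially cwnd = 1 MSS
    history: List[int] = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd               # one ACK per segment sent this round
        for _ in range(acks):
            cwnd += 1             # +1 MSS for every ACK received
    return history
```

Five rounds give 1, 2, 4, 8, 16 segments: slow to start, but exponential growth.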

TCP: detecting, reacting to loss
❖ loss indicated by timeout:
▪ cwnd set to 1 MSS;
▪ window then grows exponentially (as in slow start) to threshold, then grows linearly
❖ loss indicated by 3 duplicate ACKs: TCP RENO
▪ dup ACKs indicate network capable of delivering some segments
▪ cwnd is cut in half, window then grows linearly
❖ TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
❖ variable ssthresh
❖ on loss event, ssthresh is set to 1/2 of cwnd just before loss event

Transport Layer 3-106 Transport Layer 3-107


Summary: TCP Congestion Control ❖ Assuming TCP Reno is the protocol
experiencing the behavior shown.
New
New ACK!
duplicate ACK ACK! new ACK
cwnd = cwnd + MSS * (MSS/cwnd)
dupACKcount++ new ACK dupACKcount = 0
cwnd = cwnd+MSS transmit new segment(s), as allowed
dupACKcount = 0
Λ transmit new segment(s), as allowed
cwnd = 1 MSS
ssthresh = 64 KB cwnd > ssthresh
dupACKcount = 0 slow Λ congestion
start timeout avoidance
ssthresh = cwnd/2
cwnd = 1 MSS duplicate ACK
timeout dupACKcount = 0 dupACKcount++
ssthresh = cwnd/2 retransmit missing segment
cwnd = 1 MSS
dupACKcount = 0
retransmit missing segment New
timeout
ACK!
ssthresh = cwnd/2
cwnd = 1 New ACK
dupACKcount = 0
retransmit missing segment cwnd = ssthresh dupACKcount == 3
dupACKcount == 3 dupACKcount = 0
ssthresh= cwnd/2 ssthresh= cwnd/2
cwnd = ssthresh + 3 cwnd = ssthresh + 3
retransmit missing segment retransmit missing segment
fast
recovery
duplicate ACK
a. Identify the intervals of time when TCP slow start is operating.
cwnd = cwnd + MSS
transmit new segment(s), as allowed
b. Identify the intervals of time when TCP congestion avoidance is operating.
Transport Layer 3-108 Transport Layer 3-109
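The summary FSM (slow start, congestion avoidance, fast recovery) can be written as a small event-driven class. This is a sketch under stated assumptions: cwnd and ssthresh are measured in units of MSS (so "cwnd + MSS*(MSS/cwnd)" becomes `cwnd + 1/cwnd`), the default `ssthresh` of 64 is an arbitrary stand-in for the slide's 64 KB, and the class and method names are hypothetical. Retransmission itself is only signalled by a return value.

```python
class RenoSender:
    """Congestion-window bookkeeping for TCP Reno, in units of MSS."""

    def __init__(self, ssthresh: float = 64.0) -> None:
        self.state = "slow_start"
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.dup_ack_count = 0

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh          # deflate, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                   # +1 MSS per ACK
            if self.cwnd > self.ssthresh:      # test as written on the slide
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd       # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_duplicate_ack(self) -> bool:
        """Return True when the missing segment should be retransmitted."""
        if self.state == "fast_recovery":
            self.cwnd += 1.0                   # inflate per extra dup ACK
            return False
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"
            return True
        return False

    def on_timeout(self) -> None:
        """Timeout in any state: back to slow start with cwnd = 1 MSS."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_ack_count = 0
        self.state = "slow_start"
```

Driving the class with a sequence of `on_new_ack`, `on_duplicate_ack` and `on_timeout` calls reproduces the saw-tooth pattern that the exercise questions on the neighbouring slides ask about.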

❖ Assuming TCP Reno is the protocol experiencing the behavior shown.
c. After the 15th transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?
d. After the 22nd transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

❖ Assuming TCP Reno is the protocol experiencing the behavior shown.
e. What is the initial value of ssthresh at the first transmission round?
f. What is the value of ssthresh at the 18th transmission round?
g. What is the value of ssthresh at the 24th transmission round?
Transport Layer 3-110 Transport Layer 3-111
❖ Assuming TCP Reno is the protocol experiencing the behavior shown.
h. During what transmission round is the 70th segment sent?
i. Assuming a packet loss is detected after the 26th round by the receipt of a triple duplicate ACK, what will be the values of the congestion window size and of ssthresh?

Fairness (more)
Fairness and UDP
❖ multimedia apps often do not use TCP
▪ do not want rate throttled by congestion control
❖ instead use UDP:
▪ send audio/video at constant rate, tolerate packet loss
❖ there is no “Internet police” policing use of congestion control
Fairness, parallel TCP connections
❖ An application can open multiple parallel connections between two hosts
❖ web browsers do this
❖ e.g., link of rate R with 9 existing connections:
▪ new app asks for 1 TCP, gets rate R/10
▪ new app asks for 11 TCPs, gets roughly R/2 (11 of the 20 connections)
Transport Layer 3-112 Transport Layer 3-113
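The parallel-connection example is plain arithmetic if one assumes the bottleneck link is shared equally per TCP connection (an idealization; real shares also depend on RTTs and loss patterns). The function name below is hypothetical.

```python
from fractions import Fraction


def new_app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(new_app_share(9, 1))    # one connection among ten
    print(new_app_share(9, 11))   # eleven connections among twenty
```

With 9 existing connections, one new connection gets R/10, while eleven new connections get 11R/20, slightly more than R/2.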

Chapter 3: summary Causes/costs of congestion: insights


❖ principles behind
transport layer services: ▪ throughput can never exceed capacity
▪ multiplexing,
demultiplexing next: ▪ delay increases as capacity approached
❖ leaving the
▪ reliable data transfer
network “edge” ▪ loss/retransmission decreases effective
▪ flow control (application, throughput
▪ congestion control transport layers) ▪ un-needed duplicates further decreases
❖ instantiation, ❖ into the network effective throughput
implementation in the “core” ▪ upstream transmission capacity /
Internet buffering wasted for packets lost
▪ UDP downstream
▪ TCP
Transport Layer 3-114 Transport Layer 3-115
Chapter 4: network layer
Chapter 4
Network Layer chapter goals:
❖ understand principles behind network layer
services:
▪ network layer service models
Computer ▪ forwarding versus routing
Networking: A ▪ how a router works
Top Down ▪ routing (path selection)
Approach ▪ broadcast, multicast
6th edition
Jim Kurose, Keith Ross ❖ instantiation, implementation in the Internet
Addison-Wesley
March 2012
All material copyright 1996-2013
J.F Kurose and K.W. Ross, All Rights Reserved

Network Layer 4-1 Network Layer 4-2

Chapter 4: outline Network layer



application
transport segment from transport
network
4.1 introduction 4.5 routing algorithms sending to receiving host data link
physical

4.2 virtual circuit and ▪ link state network network

❖ on sending side data link data link


network
datagram networks ▪ distance vector data link
physical physical

▪ hierarchical routing encapsulates segments physical network network

4.3 what’s inside a router data link data link

4.6 routing in the Internet into datagrams physical physical

4.4 IP: Internet Protocol


▪ datagram format
▪ RIP ❖ on receiving side, delivers network
data link
network
data link
▪ OSPF segments to transport physical physical

network
IPv4 addressing
▪ BGP
data link

▪ layer physical
ICMP application
▪ 4.7 broadcast and multicast

network
IPv6 transport

routing network layer protocols network


data link
physical
network
data link
network
data link

in every host, router data link


physical
physical physical

❖ router examines header


fields in all IP datagrams
passing through it
Network Layer 4-3 Network Layer 4-4
Interplay between routing and forwarding
Two key network-layer functions
routing algorithm routing algorithm determines
❖ forwarding: move analogy: end-end-path through network

packets from router’s local forwarding table forwarding table determines


input to appropriate ❖ routing: process of header value output link local forwarding at this router

router output planning trip from source 0100


0101
3
2
to dest 0111
1001
2
1
❖ routing: determine route
taken by packets from ❖ forwarding: process of
source to dest. getting through single value in arriving
packet’s header
interchange 1
▪ routing algorithms 0111

3 2

Network Layer 4-5 Network Layer 4-6
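The local forwarding step in the figure amounts to a table lookup keyed on the header value of the arriving packet. The sketch below uses the four entries shown on the slide; the exact-match dictionary and the function name are simplifying assumptions for illustration, and the routing algorithm that fills the table is out of scope here.

```python
from typing import Dict, Optional

# header value -> output link, as listed in the slide's forwarding table
FORWARDING_TABLE: Dict[str, int] = {
    "0100": 3,
    "0101": 2,
    "0111": 2,
    "1001": 1,
}


def forward(header_value: str) -> Optional[int]:
    """Return the output link for a packet, or None if no entry matches."""
    return FORWARDING_TABLE.get(header_value)
```

A packet arriving with header value 0111, as in the figure, is sent out on link 2.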

Network service model Chapter 4: outline


Q: What service model for “channel” transporting 4.1 introduction 4.5 routing algorithms
datagrams from sender to receiver? 4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
example services for example services for a 4.3 what’s inside a router ▪ hierarchical routing
individual datagrams: flow of datagrams: 4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ RIP
❖ guaranteed delivery ❖ in-order datagram ▪ datagram format
▪ OSPF
❖ guaranteed delivery with delivery ▪ IPv4 addressing
▪ BGP
less than 40 msec delay ❖ guaranteed minimum ▪ ICMP
▪ IPv6 4.7 broadcast and multicast
bandwidth to flow
routing
❖ restrictions on changes in
inter-packet spacing

Network Layer 4-7 Network Layer 4-8


Connection, connection-less service Virtual circuits
❖ virtual-circuit network provides network-layer “source-to-dest path behaves much like telephone
connection service circuit”
❖ datagram network provides network-layer ▪ performance-wise
connectionless service ▪ network actions along source-to-dest path
❖ analogous to TCP/UDP connection-oriented /
connectionless transport-layer services, but: ❖ call setup, teardown for each call before data can flow
▪ service: host-to-host ❖ each packet carries VC identifier (not destination host
address)
▪ no choice: network provides one or the other ❖ every router on source-dest path maintains “state” for
▪ implementation: in network core each passing connection
❖ link, router resources (bandwidth, buffers) may be
allocated to VC (dedicated resources = predictable
service)
Network Layer 4-9 Network Layer 4-10

VC implementation VC forwarding table


a VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along path
3. entries in forwarding tables in routers along path
❖ packet belonging to VC carries VC number (rather than dest address)
❖ VC number can be changed on each link.
  ▪ new VC number comes from forwarding table

[Figure: router with interface numbers 1, 2, 3; VC numbers 12, 22, 32 on successive links of the path]

forwarding table in northwest router:

Incoming interface   Incoming VC #   Outgoing interface   Outgoing VC #
1                    12              3                    22
2                    63              1                    18
3                    7               2                    17
1                    97              3                    87
…                    …               …                    …

VC routers maintain connection state information!

Network Layer 4-11 Network Layer 4-12
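The VC forwarding table above can be sketched as a dictionary lookup. This is only an illustration: the name `forward_vc` and the dictionary layout are assumptions, and the entries are copied from the northwest-router table.

```python
# Hypothetical model of the northwest router's VC forwarding table.
# Key: (incoming interface, incoming VC #)
# Value: (outgoing interface, outgoing VC #)
VC_TABLE = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}


def forward_vc(in_iface, in_vc):
    """Return (out_iface, out_vc), or None if the router holds no state for this VC."""
    return VC_TABLE.get((in_iface, in_vc))


print(forward_vc(1, 12))  # (3, 22): VC number is rewritten from 12 to 22
print(forward_vc(2, 12))  # None: no such VC was set up on interface 2
```

The lookup key includes the incoming interface, so the same VC number can be reused on different links.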
Virtual circuits: signaling protocols Datagram networks
❖ no call setup at network layer
❖ used to set up, maintain, tear down VC ❖ routers: no state about end-to-end connections
❖ used in ATM, frame-relay, X.25 ▪ no network-level concept of “connection”
❖ not used in today’s Internet ❖ packets forwarded using destination host address

application application application application


5. data flow begins 6. receive data
transport transport transport transport
network 4. call connected 3. accept call network 1. send datagrams
1. initiate call network 2. receive datagrams network
data link 2. incoming call data link
data link data link
physical physical physical physical

Network Layer 4-13 Network Layer 4-14

Datagram forwarding table Datagram or VC network: why?


4 billion IP addresses, so
routing algorithm rather than list individual Internet (datagram) ATM (VC)
destination address ❖ data exchange among ❖ evolved from telephony
local forwarding table
list range of addresses computers ❖ human conversation:
(aggregate table entries) ▪ strict timing, reliability
dest address output link ▪ “elastic” service, no strict
address-range 1 3
timing req. requirements
address-range 2 2
▪ need for guaranteed service

address-range 3 2
address-range 4 1 many link types ❖ “dumb” end systems
▪ different characteristics ▪ telephones
▪ uniform service difficult ▪ complexity inside
“smart” end systems
IP destination address in
arriving packet’s header ❖ network
1
(computers)
▪ can adapt, perform control,
3 2 error recovery
▪ simple inside network,
complexity at “edge”

Network Layer 4-15 Network Layer 4-16


Chapter 4: outline Router architecture overview
two key router functions:
❖ run routing algorithms/protocol (RIP, OSPF, BGP)
4.1 introduction 4.5 routing algorithms
▪ link state ❖ forwarding datagrams from incoming to outgoing link
4.2 virtual circuit and
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing forwarding tables computed, routing
routing, management
4.4 IP: Internet Protocol 4.6 routing in the Internet pushed to input ports
processor
control plane (software)
▪ RIP
▪ datagram format
▪ OSPF forwarding data
▪ IPv4 addressing plane (hardware)
▪ BGP
▪ ICMP
▪ IPv6 4.7 broadcast and multicast
routing high-speed
switching
fabric

router input ports router output ports


Network Layer 4-17 Network Layer 4-18

Input port functions Switching fabrics


❖ transfer packet from input buffer to appropriate
lookup,
link
forwarding
output buffer
layer

line switch
termination
protocol
fabric
switching rate: rate at which packets can be
(receive
) queueing
transfer from inputs to outputs
▪ often measured as multiple of input/output line rate
physical layer: ▪ N inputs: switching rate N times line rate desirable
bit-level reception ❖ three types of switching fabrics
data link layer: decentralized switching:
e.g., Ethernet ❖ given datagram dest., lookup output port
see chapter 5 using forwarding table in input port memory
memory (“match plus action”)
❖ goal: complete input port processing at
‘line speed’
memory bus crossbar
❖ queuing: if datagrams arrive faster than
forwarding rate into switch fabric
Network Layer 4-19 Network Layer 4-20
Switching via memory Switching via a bus
first generation routers:
❖ traditional
computers with switching under direct control ❖ datagram from input port memory
of CPU to output port memory via a
❖ packet copied to system’s memory shared bus
❖ speed limited by memory bandwidth (2 bus crossings per
❖ bus contention: switching speed
datagram)
limited by bus bandwidth
❖ 32 Gbps bus, Cisco 5600: sufficient bus
input output speed for access and enterprise
port memory port
(e.g.,
routers
(e.g.,
Ethernet) Ethernet)

system bus

Network Layer 4-21 Network Layer 4-22

Switching via interconnection network Input port queuing


❖ fabric slower than input ports combined -> queueing may
❖ Overcome bus bandwidth limitations occur at input queues
❖ Banyan networks, crossbar, other ▪ queueing delay and loss due to input buffer overflow!
interconnection nets initially developed to
❖ Head-of-the-Line (HOL) blocking: queued datagram at front
connect processors in multiprocessor
of queue prevents others in queue from moving forward
❖ Crossbar is an interconnection network
consisting of 2N buses connecting N input
ports to N output ports
❖ A crossbar switch is non-blocking—a crossbar switch switch
packet being forwarded to an output port fabric
fabric
will not be blocked from reaching that
output port as long as no other packet is
currently being forwarded to that output output port contention: one packet time later:
port only one red datagram can be green packet
❖ Cisco 12000: switches 60 Gbps through transferred. experiences HOL
the interconnection network lower red packet is blocked blocking

Network Layer 4-23 Network Layer 4-24


Output ports

[Figure: output port: switch fabric (rate: NR) → datagram buffer (queueing) → link layer protocol (send) → line termination (rate: R)]

❖ buffering required when datagrams arrive from fabric faster than the transmission rate
  ▪ datagrams (packets) can be lost due to congestion, lack of buffers
❖ scheduling discipline chooses among queued datagrams for transmission
  ▪ priority scheduling: who gets best performance, network neutrality

Network Layer 4-25

How much buffering?
❖ RFC 3439 rule of thumb: average buffering equal to “typical” RTT (say 250 msec) times link capacity C
  ▪ e.g., C = 10 Gbps link: 2.5 Gbit buffer
❖ recent recommendation: with N flows, buffering equal to (RTT · C) / √N
❖ but too much buffering can increase delays (particularly in home routers)
  • long RTTs: poor performance for real-time apps, sluggish TCP response
  • recall delay-based congestion control: “keep bottleneck link just full enough (busy) but no fuller”

Network Layer 4-27
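The two buffer-sizing rules can be checked with a few lines of arithmetic. This is a sketch: the function name `buffer_bits` and the flow count N = 10,000 are illustrative assumptions, while the 250 msec RTT and 10 Gbps link come from the slide.

```python
import math


def buffer_bits(rtt_seconds, capacity_bps, n_flows=1):
    """Buffer size in bits: RTT * C, divided by sqrt(N) when N flows share the link."""
    return rtt_seconds * capacity_bps / math.sqrt(n_flows)


rule_of_thumb = buffer_bits(0.250, 10e9)               # RTT * C
with_flows = buffer_bits(0.250, 10e9, n_flows=10_000)  # RTT * C / sqrt(N), N assumed

print(rule_of_thumb / 1e9, "Gbit")  # 2.5 Gbit
print(with_flows / 1e6, "Mbit")     # 25.0 Mbit
```

With many flows the recommended buffer shrinks by a factor of √N, here from 2.5 Gbit to 25 Mbit.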

Buffer Management Packet Scheduling: FCFS

buffer management: packet scheduling: deciding FCFS: packets transmitted in


switch
datagram
buffer
link
layer
▪ drop: which packet to which packet to send order of arrival to output
fabric
protoc
line
terminati
R
add, drop when buffers are next on link port
▪ ▪ also known as: First-in-first-out
ol on
queueing (send)
full first come, first served
scheduling
▪ priority
• tail drop: drop arriving ▪ round robin
(FIFO)
packet ▪ real world examples?
▪ weighted fair queueing
• priority: drop/remove
Abstraction: queue on priority basis Abstraction: queue
packet packet
R R
▪ marking: which packets
packet departures packet departures
arrivals queue link arrivals queue link
(waiting area) (server) (waiting area) (server)
to mark to signal
congestion (ECN, RED)

Network Layer 4-28 Network Layer 4-29


Scheduling policies: priority Scheduling policies: round robin

Priority scheduling: Round Robin (RR)


scheduling:
❖ arriving traffic classified, ❖ arriving traffic classified,
queued by class high priority queue
queued by class
▪ any header fields can be arrivals ▪ any header fields can be
used for classification used for classification
▪ send packet from highest classify link departures
▪ server cyclically, R
priority queue that has repeatedly scans class classify link departures
low priority queue arrivals
buffered packets queues, sending one
• FCFS within priority class complete packet from
each class (if available) in
turn

Network Layer 4-30 Network Layer 4-31
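The difference between priority and round-robin scheduling can be seen in the order packets leave the queues. A minimal sketch, assuming all packets are already queued (no arrivals during service) and using made-up packet labels; the function names are illustrative.

```python
from collections import deque


def priority_order(queues):
    """queues[0] is the highest priority class; FCFS within each class.
    Always send from the highest-priority queue that has buffered packets."""
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
                break  # rescan from the highest priority class
    return order


def round_robin_order(queues):
    """Cyclically scan the class queues, sending one packet from each class in turn."""
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


print(priority_order([deque(["H1", "H2"]), deque(["L1", "L2", "L3"])]))
# ['H1', 'H2', 'L1', 'L2', 'L3']
print(round_robin_order([deque(["H1", "H2"]), deque(["L1", "L2", "L3"])]))
# ['H1', 'L1', 'H2', 'L2', 'L3']
```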

Chapter 4: outline The Internet network layer


host, router network layer functions:
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state transport layer: TCP, UDP
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing IP protocol
routing protocols
4.6 routing in the Internet • path selection • addressing conventions
4.4 IP: Internet Protocol • datagram format
▪ RIP • RIP, OSPF,
▪ datagram format network • packet handling
▪ OSPF BGP
▪ IPv4 addressing layer SDN controller conventions
▪ BGP forwarding
▪ ICMP table
ICMP protocol
4.7 broadcast and multicast • error reporting
▪ IPv6
routing • router “signaling”

link layer

physical layer

Network Layer 4-32 Network Layer 4-33


IP datagram format

Header fields (each row of the header is 32 bits wide):
▪ ver: IP protocol version number
▪ head. len: header length (stored in 32-bit words; 20 bytes when there are no options)
▪ type of service: “type” of data
▪ length: total datagram length (bytes)
▪ 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
  • flgs: 3 bits (Rsvd, DF, MF)
▪ time to live: max number remaining hops (decremented at each router)
▪ upper layer: upper layer protocol to deliver payload to
▪ header checksum
▪ 32 bit source IP address
▪ 32 bit destination IP address
▪ options (if any): e.g. timestamp, record route taken, specify list of routers to visit
▪ data (variable length, typically a TCP or UDP segment)

how much overhead?
❖ 20 bytes of TCP
❖ 20 bytes of IP
❖ = 40 bytes + app layer overhead

Network Layer 4-34 Network Layer 4-35
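The fixed 20-byte IPv4 header can be unpacked with Python's `struct` module. This is a sketch rather than a full parser: options are ignored, the checksum is not verified, and the sample header (addresses, ID, zero checksum) is made up for illustration.

```python
import socket
import struct

IPV4_FIXED_HEADER = "!BBHHHBBH4s4s"  # network byte order, 20 bytes


def parse_ipv4_header(raw):
    """Decode the fixed part of an IPv4 header into a dict of fields."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IPV4_FIXED_HEADER, raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,  # field counts 32-bit words
        "tos": tos,
        "total_len": total_len,
        "id": ident,
        "df": (flags_frag >> 14) & 1,              # don't fragment
        "mf": (flags_frag >> 13) & 1,              # more fragments
        "frag_offset": flags_frag & 0x1FFF,        # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,                         # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Made-up header: a 1500-byte fragment with MF=1 and offset 185; checksum left at 0.
sample = struct.pack(IPV4_FIXED_HEADER, 0x45, 0, 1500, 0x1234, (1 << 13) | 185,
                     64, 6, 0,
                     socket.inet_aton("223.1.1.1"), socket.inet_aton("223.1.2.2"))
fields = parse_ipv4_header(sample)
print(fields["version"], fields["header_len_bytes"], fields["mf"], fields["frag_offset"])
# 4 20 1 185
```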

IP fragmentation, reassembly IP fragmentation, reassembly


❖ network links have MTU (max. transfer size): largest possible link-level frame
  ▪ different link types, different MTUs
❖ large IP datagram divided (“fragmented”) within net
  ▪ one datagram becomes several datagrams
  ▪ “reassembled” only at final destination
  ▪ IP header bits used to identify, order related fragments

[Figure: fragmentation: in: one large datagram; out: 3 smaller datagrams; reassembly]

example:
❖ 4000 byte datagram
❖ MTU = 1500 bytes
one large datagram becomes several smaller datagrams:

length   ID   fragflag   offset
4000     x    0          0        (original datagram)
1500     x    1          0        (1480 bytes in data field)
1500     x    1          185      (offset = 1480/8)
1040     x    0          370
Network Layer 4-36 Network Layer 4-37
IP fragmentation, reassembly
https://tools.ietf.org/html/rfc791

Original datagram:
  Len=4000; ID=X; fragflag=0; offset=0

After a link with MTU = 1500:
  Len=1500; ID=X; fragflag=1; offset=0
  Len=1500; ID=X; fragflag=1; offset=185
  Len=1040; ID=X; fragflag=0; offset=370

After a link with MTU = 900:
  Len=900; ID=X; FF=1; offset=0
  Len=620; ID=X; FF=1; offset=110
  Len=900; ID=X; FF=1; offset=185
  Len=620; ID=X; FF=1; offset=295
  Len=900; ID=X; FF=1; offset=370
  Len=160; ID=X; FF=0; offset=480

Receiver: fragment data begins at byte positions 0, 880, 1480, 2360, 2960, 3840

Chapter 4: outline

4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
  ▪ datagram format
  ▪ IPv4 addressing
  ▪ ICMP
  ▪ IPv6
4.5 routing algorithms
  ▪ link state
  ▪ distance vector
  ▪ hierarchical routing
4.6 routing in the Internet
  ▪ RIP
  ▪ OSPF
  ▪ BGP
4.7 broadcast and multicast routing

Network Layer 4-38 Network Layer 4-39
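The fragment lengths and offsets in the two examples can be reproduced with a short function. A sketch under stated assumptions: a 20-byte header with no options, and a hypothetical helper named `fragment` that returns (length, more-fragments flag, offset in 8-byte units) tuples.

```python
def fragment(length, mtu, offset=0, more_fragments=0, header_len=20):
    """Split one datagram (or one fragment) to fit a link with the given MTU.

    offset is in units of 8 bytes; more_fragments is the flag of the input datagram.
    """
    if length <= mtu:
        return [(length, more_fragments, offset)]
    payload = length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_payload = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        chunk = min(max_payload, payload - sent)
        is_last = sent + chunk == payload
        flag = more_fragments if is_last else 1
        fragments.append((chunk + header_len, flag, offset + sent // 8))
        sent += chunk
    return fragments


first_hop = fragment(4000, 1500)
print(first_hop)  # [(1500, 1, 0), (1500, 1, 185), (1040, 0, 370)]

# Each fragment is fragmented again on the MTU = 900 link.
second_hop = [f for (length, flag, off) in first_hop
              for f in fragment(length, 900, offset=off, more_fragments=flag)]
print(second_hop)
# [(900, 1, 0), (620, 1, 110), (900, 1, 185), (620, 1, 295), (900, 1, 370), (160, 0, 480)]
```

Only the last piece of the last fragment keeps flag 0; every other piece signals that more fragments follow.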

IP addressing: introduction IP addressing: introduction


223.1.1.1 223.1.1.1
❖ IP address: 32-bit Q: how are interfaces
223.1.2.1 223.1.2.1
identifier for host, router actually connected?
interface 223.1.1.2
223.1.1.4 223.1.2.9 A: we’ll learn about that 223.1.1.2 223.1.1.4 223.1.2.9
❖ interface: connection in chapter 5, 6.
between host/router and 223.1.3.27 223.1.3.27
physical link 223.1.1.3
223.1.2.2
223.1.1.3
223.1.2.2
▪ routers typically have
multiple interfaces A: wired Ethernet interfaces
▪ host typically has one or connected by Ethernet switches
two interfaces (e.g., wired 223.1.3.1 223.1.3.2 223.1.3.1 223.1.3.2

Ethernet, wireless 802.11)


For now: don’t need to worry
❖ IP addresses associated about how one interface is
with each interface!!!! 223.1.1.1 = 11011111 00000001 00000001 00000001 connected to another (with no
A: wireless WiFi interfaces
223 1 1 1 intervening router)
connected by WiFi base station

Network Layer 4-40 Network Layer 4-41


Subnets Subnets
223.1.1.0/24
223.1.2.0/24
❖ What’s a subnet ?
▪ device interfaces that can
223.1.1.1
recipe 223.1.1.1

physically reach each other 223.1.1.2 223.1.2.1 ❖ to determine the 223.1.1.2 223.1.2.1
without passing through an 223.1.1.4 223.1.2.9 subnets, detach each 223.1.1.4 223.1.2.9
intervening router interface from its host
223.1.2.2 223.1.2.2
223.1.1.3 223.1.3.27 or router, creating 223.1.1.3 223.1.3.27

❖ IP address: subnet islands of isolated subnet


▪ subnet part - high order networks
bits are common in subnet ❖
223.1.3.2 223.1.3.2
223.1.3.1 each isolated network 223.1.3.1
▪ host part - low order bits is called a subnet
network consisting of 3 subnets 223.1.3.0/24

subnet mask: /24


Network Layer 4-42 Network Layer 4-43

Subnets 223.1.1.2 IP addressing: CIDR


how many? 223.1.1.1 223.1.1.4
CIDR: Classless InterDomain Routing
223.1.1.3
▪ subnet portion of address of arbitrary length
223.1.9.2 223.1.7.0
▪ address format: a.b.c.d/x, where x is # bits in
subnet portion of address

223.1.9.1 223.1.7.1 subnet host


223.1.8.1 223.1.8.0 part part

223.1.2.6
11001000 00010111 00010000 00000000
223.1.3.27

223.1.2.1 223.1.2.2 223.1.3.1 223.1.3.2


200.23.16.0/23

Network Layer 4-44 Network Layer 4-45
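The a.b.c.d/x notation can be explored with Python's standard `ipaddress` module. A small sketch using the 200.23.16.0/23 block from the slide; the two test addresses are made up.

```python
import ipaddress

net = ipaddress.ip_network("200.23.16.0/23")

print(net.netmask)        # 255.255.254.0 (23 one-bits, then 9 zero-bits)
print(net.num_addresses)  # 512 (2 ** (32 - 23))

# Bit pattern of the network address: the first 23 bits are the subnet part.
print(f"{int(net.network_address):032b}")  # 11001000000101110001000000000000

# Membership tests against the subnet part.
print(ipaddress.ip_address("200.23.17.42") in net)  # True
print(ipaddress.ip_address("200.23.18.1") in net)   # False
```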


IP addresses: how to get one?
Q: How does a host get IP address?
❖ hard-coded by system admin in a file
  ▪ Windows: control-panel->network->configuration->tcp/ip->properties
  ▪ UNIX: /etc/rc.config
❖ DHCP: Dynamic Host Configuration Protocol: dynamically get address from a server
  ▪ “plug-and-play”

DHCP: Dynamic Host Configuration Protocol
goal: allow host to dynamically obtain its IP address from network server when it joins network
  ▪ can renew its lease on address in use
  ▪ allows reuse of addresses (only hold address while connected/“on”)
  ▪ support for mobile users who want to join network (more shortly)
DHCP overview:
  ▪ host broadcasts “DHCP discover” msg [optional]
  ▪ DHCP server responds with “DHCP offer” msg [optional]
  ▪ host requests IP address: “DHCP request” msg
  ▪ DHCP server sends address: “DHCP ack” msg

Network Layer 4-46 Network Layer 4-47

DHCP client-server scenario DHCP client-server scenario


[Figure: network of subnets 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24; DHCP server 223.1.2.5 on 223.1.2.0/24; arriving DHCP client needs address in this network]

DHCP server: 223.1.2.5; arriving client

DHCP discover (client → server)
  Broadcast: is there a DHCP server out there?
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 0.0.0.0
  transaction ID: 654

DHCP offer (server → client)
  Broadcast: I’m a DHCP server! Here’s an IP address you can use
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 654
  lifetime: 3600 secs

DHCP request (client → server)
  Broadcast: OK. I’ll take that IP address!
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 223.1.2.4
  transaction ID: 655
  lifetime: 3600 secs

DHCP ACK (server → client)
  Broadcast: OK. You’ve got that IP address!
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 655
  lifetime: 3600 secs

Network Layer 4-48 Network Layer 4-49
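The four-message exchange can be mimicked with a toy model. This is only a sketch of the message flow, not the DHCP wire format: the class names `DhcpMsg` and `ToyDhcpServer` and the one-address pool are assumptions, while the addresses, ports, transaction IDs and lifetime follow the scenario above.

```python
from dataclasses import dataclass

BROADCAST = "255.255.255.255"


@dataclass
class DhcpMsg:
    kind: str          # "discover", "offer", "request" or "ack"
    src: tuple         # (IP address, UDP port)
    dest: tuple        # (IP address, UDP port)
    yiaddr: str        # address being offered, requested or confirmed
    xid: int           # transaction ID
    lifetime: int = 0  # lease lifetime in seconds


class ToyDhcpServer:
    """Hypothetical server that hands out addresses from a fixed pool."""

    def __init__(self, ip, pool, lifetime=3600):
        self.ip = ip
        self.pool = list(pool)
        self.lifetime = lifetime
        self.leases = {}

    def _reply(self, kind, yiaddr, xid):
        return DhcpMsg(kind, (self.ip, 67), (BROADCAST, 68), yiaddr, xid, self.lifetime)

    def handle(self, msg):
        if msg.kind == "discover" and self.pool:
            return self._reply("offer", self.pool[0], msg.xid)
        if msg.kind == "request" and msg.yiaddr in self.pool:
            self.pool.remove(msg.yiaddr)              # address is now in use
            self.leases[msg.yiaddr] = self.lifetime
            return self._reply("ack", msg.yiaddr, msg.xid)
        return None


server = ToyDhcpServer("223.1.2.5", ["223.1.2.4"])
offer = server.handle(DhcpMsg("discover", ("0.0.0.0", 68), (BROADCAST, 67), "0.0.0.0", 654))
ack = server.handle(DhcpMsg("request", ("0.0.0.0", 68), (BROADCAST, 67), offer.yiaddr, 655, 3600))
print(offer.kind, offer.yiaddr, ack.kind, ack.yiaddr)  # offer 223.1.2.4 ack 223.1.2.4
```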


IP addresses: how to get one? Hierarchical addressing: route aggregation
Q: how does network get subnet part of IP addr? hierarchical addressing allows efficient advertisement of routing
A: gets allocated portion of its provider ISP’s address information:
space
Organization 0
200.23.16.0/23
Organization 1
“Send me anything
200.23.18.0/23 with addresses
ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20
Organization 2 beginning
200.23.20.0/23 . Fly-By-Night-ISP 200.23.16.0/20”
Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23 .
. . Internet
Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23 .
Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23
Organization 7 .
... ….. …. …. 200.23.30.0/23
“Send me anything
Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23 ISPs-R-Us
with addresses
beginning
199.31.0.0/16”

Network Layer 4-50 Network Layer 4-51
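Route aggregation can be checked with the standard `ipaddress` module: splitting the ISP block into /23 blocks gives the eight organization prefixes, and collapsing them gives back the single advertised /20. A sketch; the variable names are illustrative.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Split the ISP's /20 into eight /23 blocks, one per organization.
organizations = list(isp_block.subnets(new_prefix=23))
for index, block in enumerate(organizations):
    print(f"Organization {index}: {block}")
# Organization 0: 200.23.16.0/23 ... Organization 7: 200.23.30.0/23

# Aggregation: the eight contiguous /23 blocks collapse into one /20 advertisement.
print(list(ipaddress.collapse_addresses(organizations)))
# [IPv4Network('200.23.16.0/20')]
```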

Hierarchical addressing: more specific routes IP addressing: the last word...


ISPs-R-Us has a more specific route to Organization 1 Q: how does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned
Organization 0 Names and Numbers http://www.icann.org/
▪ allocates addresses
200.23.16.0/23

“Send me anything
with addresses ▪ manages DNS
Organization 2
200.23.20.0/23 . Fly-By-Night-ISP
beginning
200.23.16.0/20” ▪ assigns domain names, resolves disputes
.
. . Internet
.
Organization 7 .
200.23.30.0/23
“Send me anything
ISPs-R-Us
with addresses
Organization 1 beginning 199.31.0.0/16
or 200.23.18.0/23”
200.23.18.0/23

Network Layer 4-52 Network Layer 4-53
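When both ISPs advertise a prefix covering Organization 1, a router picks the most specific (longest) matching prefix. A minimal sketch, assuming a hypothetical `next_hop` function and a route list that holds just the three advertisements from the slide; the destination addresses are made up.

```python
import ipaddress

# (advertised prefix, who advertised it)
ROUTES = [
    (ipaddress.ip_network("200.23.16.0/20"), "Fly-By-Night-ISP"),
    (ipaddress.ip_network("199.31.0.0/16"), "ISPs-R-Us"),
    (ipaddress.ip_network("200.23.18.0/23"), "ISPs-R-Us"),
]


def next_hop(dest):
    """Longest-prefix match: among matching prefixes, pick the one with most bits."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    return max(matches)[1] if matches else None


print(next_hop("200.23.18.5"))  # ISPs-R-Us (the /23 beats the /20)
print(next_hop("200.23.20.9"))  # Fly-By-Night-ISP (only the /20 matches)
print(next_hop("8.8.8.8"))      # None (no matching route in this toy table)
```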


NAT: network address translation NAT: network address translation
rest of local network motivation: local network uses just one IP address as far
Internet (e.g., home network)
10.0.0.1
as outside world is concerned:
10.0.0/24
▪ range of addresses not needed from ISP: just one
10.0.0.4
10.0.0.2
IP address for all devices
138.76.29.7 ▪ can change addresses of devices in local network
10.0.0.3
without notifying outside world
▪ can change ISP without changing addresses of
devices in local network
all datagrams leaving local datagrams with source or ▪ devices inside local net not explicitly addressable,
network have same single destination in this network
visible by outside world (a security plus)
source NAT IP address: have 10.0.0/24 address for
138.76.29.7,different source source, destination (as usual)
port numbers
Network Layer 4-54 Network Layer 4-55

NAT: network address translation NAT: network address translation


implementation: NAT router must:

▪ outgoing datagrams: replace (source IP address, port #) of every outgoing datagram to (NAT IP address, new port #)
  . . . remote clients/servers will respond using (NAT IP address, new port #) as destination addr
▪ remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
▪ incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table

[Figure: local network hosts 10.0.0.1, 10.0.0.2, 10.0.0.3; NAT router with LAN-side address 10.0.0.4 and WAN-side address 138.76.29.7]

NAT translation table
WAN side addr         LAN side addr
138.76.29.7, 5001     10.0.0.1, 3345
……                    ……

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345
   D: 128.119.40.186, 80
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001
   D: 128.119.40.186, 80
3: reply arrives dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80
   D: 138.76.29.7, 5001
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80
   D: 10.0.0.1, 3345

Network Layer 4-56 Network Layer 4-57
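The translation steps can be sketched with two dictionaries. This is a toy model: the class name `ToyNat` and the policy of handing out WAN ports sequentially from 5001 are assumptions chosen to reproduce the slide's example, not a description of any real NAT implementation.

```python
class ToyNat:
    """Hypothetical NAT router keeping a two-way translation table."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.lan_to_wan = {}  # (LAN IP, port) -> (WAN IP, new port)
        self.wan_to_lan = {}  # (WAN IP, new port) -> (LAN IP, port)

    def outgoing(self, src, dst):
        """Rewrite the source of an outgoing datagram, remembering the pair."""
        if src not in self.lan_to_wan:
            wan = (self.wan_ip, self.next_port)
            self.next_port += 1
            self.lan_to_wan[src] = wan
            self.wan_to_lan[wan] = src
        return self.lan_to_wan[src], dst

    def incoming(self, src, dst):
        """Rewrite the destination of an incoming datagram, or drop it (None)."""
        lan = self.wan_to_lan.get(dst)
        return (src, lan) if lan else None


nat = ToyNat("138.76.29.7")
print(nat.outgoing(("10.0.0.1", 3345), ("128.119.40.186", 80)))
# (('138.76.29.7', 5001), ('128.119.40.186', 80))
print(nat.incoming(("128.119.40.186", 80), ("138.76.29.7", 5001)))
# (('128.119.40.186', 80), ('10.0.0.1', 3345))
```

An incoming datagram with no table entry returns None, which matches the remark that devices inside the local net are not explicitly addressable from outside.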


NAT: network address translation
❖ 16-bit port-number field:
  ▪ over 60,000 simultaneous connections (2^16 = 65,536 port numbers) with a single LAN-side address!
❖ NAT is controversial:
  ▪ routers should only process up to layer 3
  ▪ violates end-to-end argument
    • NAT possibility must be taken into account by app designers, e.g., P2P applications
  ▪ address shortage should instead be solved by IPv6

Network Layer 4-58

Network Layer
Routing Algorithms
Devashish Gosain

Outline Graph Abstraction for Routing


• Graph
• G = (N, E)
• Routing algorithms

• Dijkstra’s algorithm

• Bellman-Ford algorithm

Data Networks Addressing 2 Data Networks Routing Algorithms 3


Graph Abstraction for Routing Graph Abstraction for Routing
• Graph • Graph
• G = (N, E) • G = (N, E)
• N: Set of routers (nodes) • N: Set of routers (nodes) 5
• N = { u, v, w, x, y, z } v w
• N = { u, v, w, x, y, z } v 3 w
• E: Set of links (edges) 2 5
u z u 2 1 z
• E = { (u, v), (u, x), (v, x), (v, w), (x, w), 1
3
x y (x, y), (w, y), (w, z), (y, z) } x y 2
1

Data Networks Routing Algorithms 3 Data Networks Routing Algorithms 3

Graph Abstraction for Routing

• Graph
  • G = (N, E)
  • N: Set of routers (nodes)
    • N = { u, v, w, x, y, z }
  • E: Set of links (edges)
    • E = { (u, v), (u, x), (v, x), (v, w), (x, w), (x, y), (w, y), (w, z), (y, z) }
• Path
  • Sequence of meeting edges (routers)

Remark
Graph abstraction is also useful in other network contexts.
Example: In P2P networks, where N is set of peers and E is set of TCP connections.

[Figure: graph on nodes u, v, w, x, y, z with link costs]

Data Networks Routing Algorithms 3


Graph Abstraction: Costs Graph Abstraction: Costs
• c(x, x’) = cost of link (x, x’)
• e.g., c(w, z) = 5
5 5

v 3 w v 3 w
2 5 2 5
u 2 1 z u 2 1 z
3 3
1 2 1 2
x y x y
1 1

Data Networks Routing Algorithms 4 Data Networks Routing Algorithms 4

Graph Abstraction: Costs Graph Abstraction: Costs


• c(x, x’) = cost of link (x, x’) • c(x, x’) = cost of link (x, x’)
• e.g., c(w, z) = 5 • e.g., c(w, z) = 5
5 5

v 3 w v 3 w
2 5 2 5
Question
Cost can be always 1, or inversely related u 2 1 z Cost can be always 1, or inversely related u 2 1 z
3 What is the least cost path between u and z? 3
to bandwidth, or inversely related to 1 2
to bandwidth, or inversely related to 1 2
Routing algorithm: Algorithm that finds a “good” path (typically,
x y x y
congestion. 1 congestion. 1
the least cost path)

Data Networks Routing Algorithms 4 Data Networks Routing Algorithms 4


Graph Abstraction: Costs Classification of Routing Algorithms
• c(x, x’) = cost of link (x, x’)
• e.g., c(w, z) = 5
5

v 3 w
2 5
Question
Cost can be always 1, or inversely related u 2 1 z
What is the least cost path between u and z? 3
to bandwidth, or inversely related to 1 2
Routing algorithm: Algorithm that finds a “good” path (typically,
x y
congestion. 1
the least cost path)

Data Networks Routing Algorithms 4 Data Networks Routing Algorithms 5

Classification of Routing Algorithms Classification of Routing Algorithms


Global or decentralized information? Global or decentralized information?

Global:
• All routers have complete topology and link
cost information
• “Link state” algorithms

Data Networks Routing Algorithms 5 Data Networks Routing Algorithms 5


Classification of Routing Algorithms Classification of Routing Algorithms
Global or decentralized information? Global or decentralized information? Static or dynamic?

Global: Global:
• All routers have complete topology and link • All routers have complete topology and link
cost information cost information
• “Link state” algorithms • “Link state” algorithms

Decentralized: Decentralized:
• Router knows physically-connected neighbors • Router knows physically-connected neighbors
and link costs to neighbors and link costs to neighbors
• Iterative process of computation, exchange of • Iterative process of computation, exchange of
information with neighbors information with neighbors
• “Distance vector” algorithms • “Distance vector” algorithms

Data Networks Routing Algorithms 5 Data Networks Routing Algorithms 5

Classification of Routing Algorithms

Global or decentralized information?

Global:
• All routers have complete topology and link cost information
• “Link state” algorithms

Decentralized:
• Router knows physically-connected neighbors and link costs to neighbors
• Iterative process of computation, exchange of information with neighbors
• “Distance vector” algorithms

Static or dynamic?

Static:
• Routes change slowly over time

Dynamic:
• Routes change more quickly
  • Periodic update
  • In response to link cost changes

Data Networks Routing Algorithms 5


A Link-state Routing Algorithm A Link-state Routing Algorithm
Dijkstra’s algorithm Dijkstra’s algorithm
• Network topology and link costs are
known to all nodes
• Accomplished via “link state broadcast”
• All nodes have same information

Data Networks Routing Algorithms 6 Data Networks Routing Algorithms 6

A Link-state Routing Algorithm A Link-state Routing Algorithm


Dijkstra’s algorithm Dijkstra’s algorithm
• Network topology and link costs are • Network topology and link costs are
known to all nodes known to all nodes
• Accomplished via “link state broadcast” • Accomplished via “link state broadcast”
• All nodes have same information • All nodes have same information

• Computes least cost paths from one • Computes least cost paths from one
node (“source”) to all other nodes node (“source”) to all other nodes
• Gives routing table for that node • Gives routing table for that node

• Iterative: After k iterations, know


least cost paths to k destinations

Data Networks Routing Algorithms 6 Data Networks Routing Algorithms 6


A Link-state Routing Algorithm

Dijkstra’s algorithm
• Network topology and link costs are known to all nodes
  • Accomplished via “link state broadcast”
  • All nodes have same information
• Computes least cost paths from one node (“source”) to all other nodes
  • Gives routing table for that node
• Iterative: After k iterations, know least cost paths to k destinations

Notation
• c(i, j): Link cost from node i to j (infinite if not direct neighbors)
• D(v): Current value of cost of path from source to v
• p(v): Current predecessor of v along path from source to v
• N’: Set of nodes whose least cost paths are known

Dijkstra’s Algorithm

Initialization for A:
  N’ = {A}
  for all nodes v:
    if v adjacent to A:
      then D(v) = c(A, v)
    else D(v) = ∞

Loop
  find w not in N’ such that D(w) is a minimum
  add w to N’
  update D(v) for all v adjacent to w and not in N’:
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
   * shortest path cost to w plus cost from w to v */
until all nodes in N’

Data Networks Routing Algorithms 6 Data Networks Routing Algorithms 7
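The pseudocode can be turned into a short runnable sketch that follows the same steps (linear scan for the minimum, no priority queue). The link costs in `COST` are an assumption read off the example figure and table for nodes A to F; the function name `dijkstra` and the dictionary layout are illustrative.

```python
INF = float("inf")

# Assumed link costs of the example graph (undirected, so listed in both directions).
COST = {
    "A": {"B": 2, "C": 5, "D": 1},
    "B": {"A": 2, "C": 3, "D": 2},
    "C": {"A": 5, "B": 3, "D": 3, "E": 1, "F": 5},
    "D": {"A": 1, "B": 2, "C": 3, "E": 1},
    "E": {"C": 1, "D": 1, "F": 2},
    "F": {"C": 5, "E": 2},
}


def dijkstra(cost, source):
    """Return (D, p): least path cost and predecessor for every node."""
    nodes = sorted(cost)
    n_prime = {source}
    # Initialization: direct neighbors get the link cost, all others infinity.
    D = {v: cost[source].get(v, INF) for v in nodes}
    p = {v: (source if v in cost[source] else None) for v in nodes}
    D[source] = 0
    while len(n_prime) < len(nodes):
        # find w not in N' such that D(w) is a minimum, then add w to N'
        w = min((v for v in nodes if v not in n_prime), key=lambda v: D[v])
        n_prime.add(w)
        # update D(v) for all v adjacent to w and not in N'
        for v, c in cost[w].items():
            if v not in n_prime and D[w] + c < D[v]:
                D[v] = D[w] + c
                p[v] = w
    return D, p


D, p = dijkstra(COST, "A")
print(D)  # {'A': 0, 'B': 2, 'C': 3, 'D': 1, 'E': 2, 'F': 4}
print(p)  # {'A': None, 'B': 'A', 'C': 'E', 'D': 'A', 'E': 'D', 'F': 'E'}
```

The linear scan keeps the code close to the slide; a heap-based variant would be a different (faster) implementation of the same idea.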

Dijkstra’s Algorithm Dijkstra’s Algorithm: Example


1 Initialization for A:
2 N’ = {A} Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F)
3 for all nodes v: 0 A Infinity Infinity Infinity Infinity Infinity
4 if v adjacent to A:
5 then D(v) = c(A, v)
6 else D(v) = ∞
7
8 Loop
9 find w not in N’ such that D(w) is a minimum
10 add w to N’ 5
11 update D(v) for all v adjacent to w and not in N’: B 3 C
12 D(v) = min( D(v), D(w) + c(w,v) ) 2 5
13 /* new cost to v is either old cost to v or known A 2 F
1
3
14 * shortest path cost to w plus cost from w to v */ 1
D E 2
15 until all nodes in N’ 1

Data Networks Routing Algorithms 7 Data Networks Routing Algorithms 8


Dijkstra’s Algorithm: Example Dijkstra’s Algorithm: Example
Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F) Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F)
0 A 2, A 5, A 1, A Infinity Infinity 0 A 2, A 5, A 1, A Infinity Infinity

5 5

B 3 C B 3 C
2 5 2 5
A 2 F A 2 F
1 1
3 3
1 2 1 2
D E D E
1 1

Data Networks Routing Algorithms 9 Data Networks Routing Algorithms 9

Dijkstra’s Algorithm: Example Dijkstra’s Algorithm: Example


Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F) Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F)
0 A 2, A 5, A 1, A Infinity Infinity 0 A 2, A 5, A 1, A Infinity Infinity
1 AD 1 AD 2, A 4, D 1, A 2, D Infinity

5 5

B 3 C B 3 C
2 5 2 5
A 2 F A 2 F
1 1
3 3
1 2 1 2
D E D E
1 1

Data Networks Routing Algorithms 10 Data Networks Routing Algorithms 11


Dijkstra’s Algorithm: Example Dijkstra’s Algorithm: Example
Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F) Step start N’ D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F)
0 A 2, A 5, A 1, A Infinity Infinity 0 A 2, A 5, A 1, A Infinity Infinity
1 AD 2, A 4, D 1, A 2, D Infinity 1 AD 2, A 4, D 1, A 2, D Infinity
2 ADE

5 5

B 3 C B 3 C
2 5 2 5
A 2 F A 2 F
1 1
3 3
1 2 1 2
D E D E
1 1

Data Networks Routing Algorithms 11 Data Networks Routing Algorithms 12

Dijkstra’s Algorithm: Example Dijkstra’s Algorithm: Example


Step  N’      D(B),p(B)  D(C),p(C)  D(D),p(D)  D(E),p(E)  D(F),p(F)
0     A       2, A       5, A       1, A       Infinity   Infinity
1     AD      2, A       4, D       1, A       2, D       Infinity
2     ADE     2, A       3, E       1, A       2, D       4, E
3     ADEB    2, A       3, E       1, A       2, D       4, E
4     ADEBC   2, A       3, E       1, A       2, D       4, E
5     ADEBCF  2, A       3, E       1, A       2, D       4, E

Next node after step 2: B instead of C?
• Cost of A-B is smaller (2) than cost of A-C (3)

[Figure: six-node graph A to F with link costs]

Dijkstra’s Algorithm: Example

Results from running Dijkstra’s algorithm

[Figure: shortest-path tree from A]

Forwarding table in A:

Destination  Link
B            (A, B)
D            (A, D)
E            (A, D)
C            (A, D)
F            (A, D)
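The table and forwarding entries above can be reproduced with a short sketch. This is a heap-based variant of Dijkstra's algorithm, not necessarily the formulation used in the lecture. The names `EDGES`, `dijkstra`, and `first_hop` are illustrative. The A-B, A-C, A-D, D-C, D-E, E-C, and E-F costs follow from the table; the B-C, B-D, and C-F costs are assumptions read off the figure residue and do not change the result. Ties may be broken in a different order than on the slides.

```python
import heapq

# Undirected link costs (B-C, B-D, C-F are assumed from the figure).
EDGES = [("A", "B", 2), ("A", "C", 5), ("A", "D", 1), ("B", "C", 3),
         ("B", "D", 2), ("C", "D", 3), ("C", "E", 1), ("C", "F", 5),
         ("D", "E", 1), ("E", "F", 2)]

def build_graph(edges):
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def dijkstra(graph, source):
    """Return (dist, prev): least costs D(v) and predecessors p(v)."""
    dist = {source: 0}
    prev = {}
    done = set()              # corresponds to N' on the slides
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue          # stale heap entry
        done.add(u)
        for v, cost in graph[u].items():
            if v not in dist or d + cost < dist[v]:
                dist[v] = d + cost
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

def first_hop(prev, source, dest):
    """Walk predecessors back from dest to find the outgoing link at source."""
    node = dest
    while prev[node] != source:
        node = prev[node]
    return (source, node)

dist, prev = dijkstra(build_graph(EDGES), "A")
for dest in "BCDEF":
    print(dest, dist[dest], first_hop(prev, "A", dest))
```

Walking the predecessor chain back to A is what turns the shortest-path tree into the forwarding table: only the first link matters to A.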

Dijkstra’s Algorithm: Discussion

Algorithm complexity: n nodes
• Each iteration: need to check all nodes, w, not in N’
• Depending on data structure: O(n^2), O(n log n), …

Oscillations possible:
• e.g., link cost = amount of carried traffic
• Fix? Randomized update times

[Figure: four-node ring (A, B, C, D) with traffic-dependent link costs (0, 1, e, 1+e, 2+e) that flip direction on each recomputation: initially … recompute … recompute]

Distance Vector Algorithm

Bellman-Ford Equation (dynamic programming)

Define
• d_x(y) := cost of least-cost path from x to y

Then
• d_x(y) = min_v { c(x, v) + d_v(y) }
  where min is taken over all neighbors v of x

Bellman-Ford: Example

[Figure: six-node graph (u, v, w, x, y, z) with link costs; u's neighbors are v, x, and w]

Clearly, d_v(z) = 5, d_x(z) = 3, d_w(z) = 3

Bellman-Ford equation says:
d_u(z) = min { c(u, v) + d_v(z),
               c(u, x) + d_x(z),
               c(u, w) + d_w(z) }
       = min { 2 + 5,
               1 + 3,
               5 + 3 } = 4

Node that yields minimum is next hop in shortest path ➜ forwarding table

Distance Vector Algorithm

• D_x(y) = estimate of least cost from x to y
• Node x knows cost to each neighbor v: c(x, v)
• Node x maintains its distance vector D_x = [ D_x(y): y ∈ N ]
• Node x also maintains its neighbors’ distance vectors
  • For each neighbor v, x maintains D_v = [ D_v(y): y ∈ N ]
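A minimal sketch of one evaluation of the Bellman-Ford equation, using the numbers from the example for node u and destination z. The function name `bellman_ford_step` and the dictionary layout are illustrative choices, not part of the lecture material.

```python
def bellman_ford_step(link_cost, neighbor_dist):
    """Return (least cost, next hop) given c(u, v) and d_v(z) for each neighbor v."""
    next_hop = min(link_cost, key=lambda v: link_cost[v] + neighbor_dist[v])
    return link_cost[next_hop] + neighbor_dist[next_hop], next_hop

# c(u, v) = 2, c(u, x) = 1, c(u, w) = 5 and d_v(z) = 5, d_x(z) = 3, d_w(z) = 3
cost, hop = bellman_ford_step({"v": 2, "x": 1, "w": 5},
                              {"v": 5, "x": 3, "w": 3})
print(cost, hop)
```

The neighbor that achieves the minimum (here x) is the entry that goes into u's forwarding table for destination z.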


Distance Vector Algorithm

Basic idea:
• Each node periodically sends its own distance vector estimate to neighbors
• When a node x receives new DV estimate from neighbor, it updates its own DV using B-F equation:
  D_x(y) ← min_v { c(x, v) + D_v(y) } for each node y ∈ N
• Under “natural” conditions the estimates of D_x(y) converge to the actual least cost d_x(y)

[Figure: three-node network with link costs c(x, y) = 2, c(y, z) = 1, c(x, z) = 7]

Each node's table holds one row per known distance vector ("from") with one column per destination ("to"). Tables are shown over time, left to right.

Node x table
          initially    after 1st exchange    after 2nd exchange
     to:  x  y  z      x  y  z               x  y  z
from x    0  2  7      0  2  3               0  2  3
from y    ∞  ∞  ∞      2  0  1               2  0  1
from z    ∞  ∞  ∞      7  1  0               3  1  0

Node y table
          initially    after 1st exchange    after 2nd exchange
     to:  x  y  z      x  y  z               x  y  z
from x    ∞  ∞  ∞      0  2  7               0  2  3
from y    2  0  1      2  0  1               2  0  1
from z    ∞  ∞  ∞      7  1  0               3  1  0

Node z table
          initially    after 1st exchange    after 2nd exchange
     to:  x  y  z      x  y  z               x  y  z
from x    ∞  ∞  ∞      0  2  7               0  2  3
from y    ∞  ∞  ∞      2  0  1               2  0  1
from z    7  1  0      3  1  0               3  1  0

Node x's update after the first exchange:
D_x(y) = min { c(x,y) + D_y(y), c(x,z) + D_z(y) } = min { 2+0, 7+1 } = 2
D_x(z) = min { c(x,y) + D_y(z), c(x,z) + D_z(z) } = min { 2+1, 7+0 } = 3
Distance Vector Algorithm

Iterative, asynchronous:
• Each local iteration caused by:
  • Local link cost change
  • DV update message from neighbor

Distributed:
• Each node notifies neighbors only when its Distance Vector changes
• Neighbors then notify their neighbors if necessary

Each node:
  wait for (change in local link cost or msg. from neighbor)
  recompute estimates
  if Distance Vector to any dst. has changed, notify neighbors


Distance Vector Algorithm

At each node x:

Initialization:
  for all destinations y in N:
    D_x(y) = ∞ if y is not a neighbor
    D_x(y) = c(x, y) if y is a neighbor
  for each neighbor w
    D_w(y) = ∞ for all destinations y in N
  for each neighbor w
    send distance vector D_x = [D_x(y): y in N] to w

Loop
  wait (until I see a link cost change to neighbor w
        or until I receive update from neighbor w)

  for each y in N:
    D_x(y) = min_v { c(x, v) + D_v(y) }

  if D_x(y) changed for any destination y
    send DV D_x = [D_x(y): y in N] to all neighbors

forever
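The pseudocode above can be sketched as a small simulation of the three-node example (c(x,y) = 2, c(y,z) = 1, c(x,z) = 7). As a simplifying assumption, all nodes exchange vectors in synchronous rounds, whereas the algorithm on the slides is asynchronous. The names `LINKS` and `distance_vector` are illustrative, and the sketch assumes a connected network.

```python
INF = float("inf")

# Link costs from the three-node example.
LINKS = {("x", "y"): 2, ("y", "z"): 1, ("x", "z"): 7}

def distance_vector(links):
    """Run synchronous DV rounds until no vector changes; return (vectors, rounds)."""
    nodes = sorted({n for pair in links for n in pair})
    cost = {n: {} for n in nodes}
    for (a, b), c in links.items():
        cost[a][b] = c
        cost[b][a] = c
    # Initialization: direct link cost for neighbors, infinity otherwise.
    dv = {x: {y: 0 if y == x else cost[x].get(y, INF) for y in nodes}
          for x in nodes}
    rounds = 0
    while True:
        # Snapshot of what every node advertised at the start of this round.
        received = {x: dict(dv[x]) for x in nodes}
        changed = False
        for x in nodes:
            for y in nodes:
                if y == x:
                    continue
                # Bellman-Ford update over all neighbors v of x.
                best = min(cost[x][v] + received[v][y] for v in cost[x])
                if best != dv[x][y]:
                    dv[x][y] = best
                    changed = True
        if not changed:
            return dv, rounds
        rounds += 1

tables, rounds = distance_vector(LINKS)
print(tables["x"], tables["z"], rounds)
```

With these costs a single round of updates is enough: x learns the cost-3 path to z via y, and z learns the cost-3 path to x via y.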

Distance Vector: Link Cost Changes

Link cost changes
• Node detects local link cost change
• Updates routing info, recalculates distance vector
• If DV changes, notify neighbors

[Figure: three-node network with c(y, z) = 1 and c(x, z) = 50; c(x, y) changes from 4 to 1]

“Good news” travels fast!
• At time t0: y detects the link-cost change, updates its DV, and informs its neighbors.
• At time t1: z receives the update from y and updates its table. It computes a new least cost to x and sends its DV to neighbors.
• At time t2: y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z.


DV Algorithm: Good News Travels Fast!

[Figure: same network; c(x, y) drops from 4 to 1, c(y, z) = 1, c(x, z) = 50]

Node y table (row "from x" left blank on the slides)
          initially    y detects change    after z's update
     to:  x  y  z      x  y  z             x  y  z
from y    4  0  1      1  0  1             1  0  1
from z    5  1  0      5  1  0             2  1  0

Node z table
          initially    after y's update
     to:  x  y  z      x  y  z
from y    4  0  1      1  0  1
from z    5  1  0      2  1  0

Distance Vector: Link Cost Changes

Link cost changes:
• Good news travels fast
• Bad news travels…

DV Algorithm: Bad News Travels Slowly!

[Figure: same network; c(x, y) rises from 4 to 60, c(y, z) = 1, c(x, z) = 50]

After the change, y recomputes using z's old estimate D_z(x) = 5:
D_y(x) = min { c(y,x) + D_x(x), c(y,z) + D_z(x) }
       = min { 60+0, 1+5 } = 6

z receives y's update, raises its own estimate to 7, and y recomputes again:
D_y(x) = min { c(y,x) + D_x(x), c(y,z) + D_z(x) }
       = min { 60+0, 1+7 } = 8

Node y table (row "from x" left blank on the slides)
          initially    y detects change    after z's update
     to:  x  y  z      x  y  z             x  y  z
from y    4  0  1      6  0  1             8  0  1
from z    5  1  0      5  1  0             7  1  0

Node z table
          initially    after y's update
     to:  x  y  z      x  y  z
from y    4  0  1      6  0  1
from z    5  1  0      7  1  0


Distance Vector: Link Cost Changes

Link cost changes:
• Good news travels fast
• Bad news travels slowly

[Figure: three-node network; c(x, y) rises from 4 to 60, c(y, z) = 1, c(x, z) = 50]

Bad news travels slowly: “count to infinity” problem!
• 44 iterations before algorithm stabilizes!

Poisoned reverse:
• If Z routes through Y to get to X: Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)

Will this completely solve count to infinity problem?
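A rough sketch of the y/z exchange after c(x, y) rises from 4 to 60, with and without poisoned reverse. It tracks only the two estimates for destination x and lets y and z take strict turns, which is a simplification of the asynchronous protocol. The function name `simulate` and its update counter are illustrative; the counter tallies table changes in this sketch and is not meant to reproduce the iteration count quoted on the slide.

```python
INF = float("inf")

def simulate(poisoned_reverse, c_yx=60, c_zx=50, c_yz=1):
    """Return (D_y(x), D_z(x), number of table changes) once the exchange settles."""
    direct = {"y": c_yx, "z": c_zx}      # direct link cost to x
    dist = {"y": 4, "z": 5}              # estimates before the cost change
    via_peer = {"y": False, "z": True}   # z initially routes to x through y
    node, peer = "y", "z"                # y detects the change first
    updates = 0
    while True:
        # With poisoned reverse, a peer that routes through us advertises infinity.
        if poisoned_reverse and via_peer[peer]:
            heard = INF
        else:
            heard = dist[peer]
        through_peer = c_yz + heard
        new_dist = min(direct[node], through_peer)
        new_via = through_peer < direct[node]
        if (new_dist, new_via) == (dist[node], via_peer[node]):
            break                        # nothing changed, so no message is sent
        dist[node], via_peer[node] = new_dist, new_via
        updates += 1
        node, peer = peer, node          # the other node reacts to the update
    return dist["y"], dist["z"], updates

plain = simulate(poisoned_reverse=False)
poisoned = simulate(poisoned_reverse=True)
print(plain, poisoned)
```

Both runs end with D_y(x) = 51 and D_z(x) = 50, but the plain run creeps upward in steps of one or two while the poisoned-reverse run settles after a handful of changes. Poisoned reverse only breaks loops between two directly neighboring nodes; this sketch does not model loops involving three or more nodes.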


Comparison of LS and DV Algorithms

• Message complexity?
  • LS: O(nE) messages with n nodes and E links
  • DV: Updates exchanged only between neighbors
• Speed of convergence?
  • LS: O(n^2), O(E + n log n) depending on data structure; may oscillate
  • DV: varies; routing loops and “count to infinity” problem
• Robustness? (what happens when a router malfunctions?)
  • LS: a node may advertise incorrect link cost; each node computes only its own table
  • DV: a node may advertise incorrect path cost; nodes use each other’s tables and error propagates

Recap

• Routing algorithms
• Link-state algorithm
  • Dijkstra’s algorithm
• Distance vector algorithm
  • Bellman-Ford algorithm

Chapter 4
Network Layer

Computer Networking: A Top Down Approach
6th edition
Jim Kurose, Keith Ross
Addison-Wesley
March 2012

All material copyright 1996-2013
J.F Kurose and K.W. Ross, All Rights Reserved

Chapter 4: network layer

chapter goals:
❖ understand principles behind network layer services:
  ▪ network layer service models
  ▪ forwarding versus routing
  ▪ how a router works
  ▪ routing (path selection)
  ▪ broadcast, multicast
❖ instantiation, implementation in the Internet


Chapter 4: outline

4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
  ▪ datagram format
  ▪ IPv4 addressing
  ▪ ICMP
  ▪ IPv6
4.5 routing algorithms
  ▪ link state
  ▪ distance vector
  ▪ hierarchical routing
4.6 routing in the Internet
  ▪ RIP
  ▪ OSPF
  ▪ BGP
4.7 broadcast and multicast routing

Network layer

❖ transport segment from sending to receiving host
❖ on sending side encapsulates segments into datagrams
❖ on receiving side, delivers segments to transport layer
❖ network layer protocols in every host, router
❖ router examines header fields in all IP datagrams passing through it

[Figure: end hosts with full protocol stacks (application, transport, network, data link, physical) connected through routers that implement only network, data link, and physical layers]

Two key network-layer functions

❖ forwarding: move packets from router’s input to appropriate router output
❖ routing: determine route taken by packets from source to dest.
  ▪ routing algorithms

analogy:
❖ routing: process of planning trip from source to dest
❖ forwarding: process of getting through single interchange

Interplay between routing and forwarding

• routing algorithm determines end-end-path through network
• forwarding table determines local forwarding at this router

local forwarding table
header value  output link
0100          3
0101          2
0111          2
1001          1

[Figure: value in arriving packet’s header (0111) is looked up in the local forwarding table to choose among output links 1, 2, 3]


Network service model

Q: What service model for “channel” transporting datagrams from sender to receiver?

example services for individual datagrams:
❖ guaranteed delivery
❖ guaranteed delivery with less than 40 msec delay

example services for a flow of datagrams:
❖ in-order datagram delivery
❖ guaranteed minimum bandwidth to flow
❖ restrictions on changes in inter-packet spacing

Connection, connection-less service

❖ virtual-circuit network provides network-layer connection service
❖ datagram network provides network-layer connectionless service
❖ analogous to TCP/UDP connection-oriented / connectionless transport-layer services, but:
  ▪ service: host-to-host
  ▪ no choice: network provides one or the other
  ▪ implementation: in network core

Virtual circuits

“source-to-dest path behaves much like telephone circuit”
  ▪ performance-wise
  ▪ network actions along source-to-dest path

❖ call setup, teardown for each call before data can flow
❖ each packet carries VC identifier (not destination host address)
❖ every router on source-dest path maintains “state” for each passing connection
❖ link, router resources (bandwidth, buffers) may be allocated to VC (dedicated resources = predictable service)
VC implementation

a VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along path
3. entries in forwarding tables in routers along path

❖ packet belonging to VC carries VC number (rather than dest address)
❖ VC number can be changed on each link.
  ▪ new VC number comes from forwarding table

VC forwarding table

[Figure: path through routers with interface numbers 1, 2, 3 and VC numbers 12, 22, 32 on successive links]

forwarding table in northwest router:

Incoming interface  Incoming VC #  Outgoing interface  Outgoing VC #
1                   12             3                   22
2                   63             1                   18
3                   7              2                   17
1                   97             3                   87
…                   …              …                   …

VC routers maintain connection state information!
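The lookup a VC router performs can be sketched as a dictionary keyed on (incoming interface, incoming VC number), using the four rows from the table above. The names `VC_TABLE` and `forward_vc` are illustrative.

```python
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
VC_TABLE = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}

def forward_vc(in_interface, in_vc):
    """Return the outgoing interface and the VC number to write into the packet."""
    # A missing key means no VC was set up for this packet; KeyError signals that.
    return VC_TABLE[(in_interface, in_vc)]

print(forward_vc(1, 12))
```

The VC number is rewritten at every hop, so each link can choose numbers independently; the price is that every router must hold one table entry per connection passing through it.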

Virtual circuits: signaling protocols

❖ used to set up, maintain, tear down VC
❖ used in ATM, frame-relay, X.25
❖ not used in today’s Internet

[Figure: two hosts with full protocol stacks; 1. initiate call, 2. incoming call, 3. accept call, 4. call connected, 5. data flow begins, 6. receive data]

Datagram networks

❖ no call setup at network layer
❖ routers: no state about end-to-end connections
  ▪ no network-level concept of “connection”
❖ packets forwarded using destination host address

[Figure: two hosts with full protocol stacks; 1. send datagrams, 2. receive datagrams]


Datagram forwarding table

4 billion IP addresses, so rather than list individual destination address, list range of addresses (aggregate table entries)

routing algorithm ➜ local forwarding table

dest address     output link
address-range 1  3
address-range 2  2
address-range 3  2
address-range 4  1

[Figure: IP destination address in arriving packet’s header is matched against the table to choose among output links 1, 2, 3]

Datagram or VC network: why?

Internet (datagram)
❖ data exchange among computers
  ▪ “elastic” service, no strict timing req.
❖ many link types
  ▪ different characteristics
  ▪ uniform service difficult
❖ “smart” end systems (computers)
  ▪ can adapt, perform control, error recovery
  ▪ simple inside network, complexity at “edge”

ATM (VC)
❖ evolved from telephony
❖ human conversation:
  ▪ strict timing, reliability requirements
  ▪ need for guaranteed service
❖ “dumb” end systems
  ▪ telephones
  ▪ complexity inside network

Router architecture overview

two key router functions:
❖ run routing algorithms/protocol (RIP, OSPF, BGP)
❖ forwarding datagrams from incoming to outgoing link

[Figure: router input ports and router output ports joined by a high-speed switching fabric, with a routing processor on top.
 routing, management: control plane (software); forwarding tables computed, pushed to input ports.
 forwarding: data plane (hardware)]
Input port functions

[Figure: line termination ➜ link layer protocol (receive) ➜ lookup, forwarding, queueing ➜ switch fabric]

physical layer: bit-level reception
data link layer: e.g., Ethernet (see chapter 5)
decentralized switching:
❖ given datagram dest., lookup output port using forwarding table in input port memory (“match plus action”)
❖ goal: complete input port processing at ‘line speed’
❖ queuing: if datagrams arrive faster than forwarding rate into switch fabric

Switching fabrics

❖ transfer packet from input buffer to appropriate output buffer
❖ switching rate: rate at which packets can be transferred from inputs to outputs
  ▪ often measured as multiple of input/output line rate
  ▪ N inputs: switching rate N times line rate desirable
❖ three types of switching fabrics: memory, bus, crossbar

Switching via memory

first generation routers:
❖ traditional computers with switching under direct control of CPU
❖ packet copied to system’s memory
❖ speed limited by memory bandwidth (2 bus crossings per datagram)

[Figure: input port (e.g., Ethernet) ➜ memory ➜ output port (e.g., Ethernet), connected by the system bus]

Switching via a bus

❖ datagram from input port memory to output port memory via a shared bus
❖ bus contention: switching speed limited by bus bandwidth
❖ 32 Gbps bus, Cisco 5600: sufficient speed for access and enterprise routers


Switching via interconnection network

❖ Overcome bus bandwidth limitations
❖ Banyan networks, crossbar, other interconnection nets initially developed to connect processors in multiprocessor
❖ Crossbar is an interconnection network consisting of 2N buses connecting N input ports to N output ports
❖ A crossbar switch is non-blocking: a packet being forwarded to an output port will not be blocked from reaching that output port as long as no other packet is currently being forwarded to that output port
❖ Cisco 12000: switches 60 Gbps through the interconnection network

Input port queuing

❖ fabric slower than input ports combined -> queueing may occur at input queues
  ▪ queueing delay and loss due to input buffer overflow!
❖ Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward

[Figure: output port contention: only one red datagram can be transferred; lower red packet is blocked. One packet time later: green packet experiences HOL blocking]

Output ports

[Figure: switch fabric (rate: NR) ➜ datagram buffer, queueing ➜ link layer protocol (send) ➜ line termination (rate: R)]

❖ buffering required when datagrams arrive from fabric faster than the transmission rate
  • Datagrams (packets) can be lost due to congestion, lack of buffers
❖ scheduling discipline chooses among queued datagrams for transmission
  • Priority scheduling: who gets best performance, network neutrality

How much buffering?

❖ RFC 3439 rule of thumb: average buffering equal to “typical” RTT (say 250 msec) times link capacity C
  ▪ e.g., C = 10 Gbps link: 2.5 Gbit buffer
❖ recent recommendation: with N flows, buffering equal to RTT · C / √N
❖ but too much buffering can increase delays (particularly in home routers)
  • long RTTs: poor performance for real-time apps, sluggish TCP response
  • recall delay-based congestion control: “keep bottleneck link just full enough (busy) but no fuller”
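The two buffer-sizing rules above reduce to one formula, sketched here with the slide's numbers (RTT = 250 msec, C = 10 Gbps). The function name `buffer_bits` is illustrative, and the flow count of 10,000 is a hypothetical value chosen only to show the effect of the √N term.

```python
import math

def buffer_bits(rtt_seconds, capacity_bps, n_flows=1):
    """Buffer size in bits: RTT * C / sqrt(N); N = 1 gives the plain RTT * C rule."""
    return rtt_seconds * capacity_bps / math.sqrt(n_flows)

rule_of_thumb = buffer_bits(0.250, 10e9)            # RTT * C
with_many_flows = buffer_bits(0.250, 10e9, 10_000)  # hypothetical N = 10,000
print(rule_of_thumb / 1e9, "Gbit")
print(with_many_flows / 1e6, "Mbit")
```

Under these assumptions the requirement drops from 2.5 Gbit to 25 Mbit, which is why the number of concurrent flows matters so much when dimensioning router memory.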
Buffer Management

[Figure: switch fabric ➜ datagram buffer, queueing, scheduling ➜ link layer protocol (send) ➜ line termination (rate: R)]

buffer management:
▪ drop: which packet to add, drop when buffers are full
  • tail drop: drop arriving packet
  • priority: drop/remove on priority basis
▪ marking: which packets to mark to signal congestion (ECN, RED)

Abstraction: queue
[Figure: packet arrivals ➜ queue (waiting area) ➜ link (server), rate R ➜ packet departures]

Packet Scheduling: FCFS

packet scheduling: deciding which packet to send next on link
▪ first come, first served
▪ priority
▪ round robin
▪ weighted fair queueing

FCFS: packets transmitted in order of arrival to output port
▪ also known as: First-in-first-out (FIFO)
▪ real world examples?

Scheduling policies: priority

Priority scheduling:
❖ arriving traffic classified, queued by class
  ▪ any header fields can be used for classification
❖ send packet from highest priority queue that has buffered packets
  • FCFS within priority class

[Figure: arrivals ➜ classify ➜ high priority queue / low priority queue ➜ link ➜ departures]

Scheduling policies: round robin

Round Robin (RR) scheduling:
❖ arriving traffic classified, queued by class
  ▪ any header fields can be used for classification
❖ server cyclically, repeatedly scans class queues, sending one complete packet from each class (if available) in turn

[Figure: arrivals ➜ classify ➜ per-class queues ➜ link (rate R) ➜ departures]
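The two policies above differ only in which queue the link serves next, as this sketch shows. It assumes all packets are already classified and queued (no arrivals during service) and that every packet takes one transmission slot. The function names and packet labels are illustrative.

```python
from collections import deque

def priority_schedule(queues):
    """queues[0] has highest priority; always serve the first non-empty queue."""
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())   # FCFS within a class
                break                       # rescan from the highest priority
    return order

def round_robin_schedule(queues):
    """Cycle over the class queues, sending one packet from each non-empty queue."""
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order

high = ["h1", "h2"]
low = ["l1", "l2", "l3"]
print(priority_schedule([deque(high), deque(low)]))
print(round_robin_schedule([deque(high), deque(low)]))
```

Under priority scheduling the low-priority class waits until the high-priority queue is empty, while round robin interleaves the classes until one of them runs out.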


The Internet network layer

host, router network layer functions:

transport layer: TCP, UDP

network layer:
  routing protocols
  • path selection
  • RIP, OSPF, BGP
  • SDN controller
  forwarding table
  IP protocol
  • addressing conventions
  • datagram format
  • packet handling conventions
  ICMP protocol
  • error reporting
  • router “signaling”

link layer

physical layer

IP datagram format

header layout (each row is 32 bits):

  ver | head. len | type of service | length
  16-bit identifier | flgs | fragment offset
  time to live | upper layer | header checksum
  32 bit source IP address
  32 bit destination IP address
  options (if any)
  data (variable length, typically a TCP or UDP segment)

field notes:
• ver: IP protocol version number
• head. len: header length (the field counts 32-bit words, so 5 means 20 bytes)
• type of service: “type” of data
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• Flgs: 3 bits (Rsvd, DF, MF)
• time to live: max number remaining hops (decremented at each router)
• upper layer: upper layer protocol to deliver payload to
• options: e.g. timestamp, record route taken, specify list of routers to visit

how much overhead?
❖ 20 bytes of TCP
❖ 20 bytes of IP
❖ = 40 bytes + app layer overhead
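To make the field layout concrete, here is a sketch that unpacks the fixed 20-byte IPv4 header with Python's `struct` module. The function name `parse_ipv4_header` and the sample values (TTL 64, protocol 6, checksum left at 0 rather than computed) are illustrative assumptions; options are ignored.

```python
import struct

def parse_ipv4_header(raw):
    """Decode the fixed 20-byte part of an IPv4 header (network byte order)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,   # field counts 32-bit words
        "type_of_service": tos,
        "total_len": total_len,
        "id": ident,
        "df": bool(flags_frag & 0x4000),            # don't fragment
        "mf": bool(flags_frag & 0x2000),            # more fragments
        "frag_offset": flags_frag & 0x1FFF,         # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Sample header: version 4, 5-word header, 1500-byte datagram, MF set, offset 185.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 777, 0x2000 | 185,
                     64, 6, 0, bytes([223, 1, 1, 1]), bytes([223, 1, 2, 9]))
header = parse_ipv4_header(sample)
print(header)
```

Version and header length share the first byte, and the three flag bits share a 16-bit word with the 13-bit fragment offset, which is why both need masking and shifting.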


IP fragmentation, reassembly
❖ network links have MTU (max. transfer size): largest possible link-level frame
  ▪ different link types, different MTUs
❖ large IP datagram divided (“fragmented”) within net
  ▪ one datagram becomes several datagrams
  ▪ “reassembled” only at final destination
  ▪ IP header bits used to identify, order related fragments
(figure: fragmentation: in: one large datagram, out: 3 smaller datagrams; reassembly at the destination)

IP fragmentation, reassembly
example:
❖ 4000 byte datagram
❖ MTU = 1500 bytes

one large datagram becomes several smaller datagrams:

length   ID   fragflag   offset
4000     x    0          0        (original datagram)
1500     x    1          0        (1480 bytes in data field)
1500     x    1          185      (offset = 1480/8)
1040     x    0          370

IP fragmentation, reassembly
https://tools.ietf.org/html/rfc791

original datagram:
  Len=4000; ID=X; fragflag=0; offset=0

after a link with MTU = 1500:
  Len=1500; ID=X; fragflag=1; offset=0
  Len=1500; ID=X; fragflag=1; offset=185
  Len=1040; ID=X; fragflag=0; offset=370

after a link with MTU = 900:
  Len=900; ID=X; FF=1; offset=0
  Len=620; ID=X; FF=1; offset=110
  Len=900; ID=X; FF=1; offset=185
  Len=620; ID=X; FF=1; offset=295
  Len=900; ID=X; FF=1; offset=370
  Len=160; ID=X; FF=0; offset=480

Receiver: the six fragments start at data bytes 0, 880, 1480, 2360, 2960, 3840
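The fragment lengths, flags, and offsets in the example can be reproduced with a short calculation. This is a sketch assuming a 20-byte header with no options; the helper names `fragment` and `refragment` are hypothetical.

```python
def fragment(total_len, mtu, header_len=20):
    """Split a datagram of total_len bytes for a link with the given MTU.

    Returns (length, more_fragments_flag, offset_in_8_byte_units) tuples.
    """
    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    frags, sent = [], 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = 1 if sent + data < payload else 0
        frags.append((data + header_len, more, sent // 8))
        sent += data
    return frags

def refragment(frag, mtu, header_len=20):
    """Fragment an existing fragment again on a link with a smaller MTU."""
    length, more, base = frag
    out = []
    for sub_len, sub_more, sub_off in fragment(length, mtu, header_len):
        # Offsets stay relative to the original datagram; only the very last
        # piece inherits the flag of the fragment being split.
        out.append((sub_len, 1 if sub_more else more, base + sub_off))
    return out

first_hop = fragment(4000, 1500)
second_hop = [f for frag in first_hop for f in refragment(frag, 900)]
print(first_hop)
print(second_hop)
```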


IP addressing: introduction
❖ IP address: 32-bit identifier for host, router interface
❖ interface: connection between host/router and physical link
  ▪ routers typically have multiple interfaces
  ▪ host typically has one or two interfaces (e.g., wired Ethernet, wireless 802.11)
❖ IP addresses associated with each interface!

223.1.1.1 = 11011111 00000001 00000001 00000001
               223       1        1        1

(figure: one router with interfaces 223.1.1.4, 223.1.2.9, 223.1.3.27; hosts 223.1.1.1, 223.1.1.2, 223.1.1.3; hosts 223.1.2.1, 223.1.2.2; hosts 223.1.3.1, 223.1.3.2)

IP addressing: introduction
Q: how are interfaces actually connected?
A: we’ll learn about that in chapter 5, 6.
A: wired Ethernet interfaces connected by Ethernet switches
A: wireless WiFi interfaces connected by WiFi base station

For now: don’t need to worry about how one interface is connected to another (with no intervening router)

Subnets
❖ IP address:
  ▪ subnet part - high order bits are common in subnet
  ▪ host part - low order bits
❖ What’s a subnet?
  ▪ device interfaces that can physically reach each other without passing through an intervening router
(figure: network consisting of 3 subnets, same addresses as in the previous figure)

Subnets
recipe
❖ to determine the subnets, detach each interface from its host or router, creating islands of isolated networks
❖ each isolated network is called a subnet

subnets in the figure: 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24
subnet mask: /24
Subnets
how many?
(figure: several interconnected routers; addresses shown: 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, 223.1.9.2)

IP addressing: CIDR
CIDR: Classless InterDomain Routing
  ▪ subnet portion of address of arbitrary length
  ▪ address format: a.b.c.d/x, where x is # bits in subnet portion of address

200.23.16.0/23 = 11001000 00010111 00010000 00000000
  subnet part: 11001000 00010111 0001000 (first 23 bits)
  host part:   0 00000000 (last 9 bits)
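The split of 200.23.16.0/23 into subnet and host bits can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch; the two test addresses are arbitrary choices.

```python
import ipaddress

net = ipaddress.ip_network("200.23.16.0/23")

# 32-bit binary form of the network address, grouped per octet.
bits = format(int(net.network_address), "032b")
dotted_binary = " ".join(bits[i:i + 8] for i in range(0, 32, 8))

subnet_bits = bits[:net.prefixlen]   # the x high-order bits of a.b.c.d/x
host_bits = bits[net.prefixlen:]     # remaining low-order bits

print(dotted_binary)
print("subnet part:", subnet_bits, "host part:", host_bits)
print("mask:", net.netmask, "addresses in block:", net.num_addresses)

# Membership test: does an address share the 23-bit subnet part?
inside = ipaddress.ip_address("200.23.17.5") in net
outside = ipaddress.ip_address("200.23.18.1") in net
```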

IP addresses: how to get one?
Q: How does a host get IP address?
❖ hard-coded by system admin in a file
  ▪ Windows: control-panel->network->configuration->tcp/ip->properties
  ▪ UNIX: /etc/rc.config
❖ DHCP: Dynamic Host Configuration Protocol: dynamically get address from a server
  ▪ “plug-and-play”

DHCP: Dynamic Host Configuration Protocol
goal: allow host to dynamically obtain its IP address from network server when it joins network
  ▪ can renew its lease on address in use
  ▪ allows reuse of addresses (only hold address while connected/“on”)
  ▪ support for mobile users who want to join network (more shortly)

DHCP overview:
  ▪ host broadcasts “DHCP discover” msg [optional]
  ▪ DHCP server responds with “DHCP offer” msg [optional]
  ▪ host requests IP address: “DHCP request” msg
  ▪ DHCP server sends address: “DHCP ack” msg


DHCP client-server scenario
(figure: arriving DHCP client needs address in this network; DHCP server 223.1.2.5 on subnet 223.1.2.0/24; other subnets 223.1.1.0/24 and 223.1.3.0/24)

DHCP client-server scenario
DHCP server: 223.1.2.5, arriving client

1. DHCP discover (client): “Broadcast: is there a DHCP server out there?”
   src: 0.0.0.0, 68
   dest: 255.255.255.255, 67
   yiaddr: 0.0.0.0
   transaction ID: 654

2. DHCP offer (server): “Broadcast: I’m a DHCP server! Here’s an IP address you can use”
   src: 223.1.2.5, 67
   dest: 255.255.255.255, 68
   yiaddr: 223.1.2.4
   transaction ID: 654
   lifetime: 3600 secs

3. DHCP request (client): “Broadcast: OK. I’ll take that IP address!”
   src: 0.0.0.0, 68
   dest: 255.255.255.255, 67
   yiaddr: 223.1.2.4
   transaction ID: 655
   lifetime: 3600 secs

4. DHCP ACK (server): “Broadcast: OK. You’ve got that IP address!”
   src: 223.1.2.5, 67
   dest: 255.255.255.255, 68
   yiaddr: 223.1.2.4
   transaction ID: 655
   lifetime: 3600 secs

DHCP: more than IP addresses
DHCP can return more than just allocated IP address on subnet:
  ▪ address of first-hop router for client
  ▪ name and IP address of DNS server
  ▪ network mask (indicating network versus host portion of address)

IP addresses: how to get one?
Q: how does network get subnet part of IP addr?
A: gets allocated portion of its provider ISP’s address space

ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20

Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...              …..                                   ….
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23
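As a quick check of the allocation table, the ISP's /20 block can be carved into eight /23 blocks with the standard `ipaddress` module. A sketch only; the variable names are illustrative.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Extending the prefix from 20 to 23 bits leaves 3 bits to number
# the organizations, so the block splits into 2**3 = 8 equal pieces.
org_blocks = list(isp_block.subnets(new_prefix=23))

for i, block in enumerate(org_blocks):
    print(f"Organization {i}: {block}")
```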


Hierarchical addressing: route aggregation
hierarchical addressing allows efficient advertisement of routing information:

customers of Fly-By-Night-ISP:
  Organization 0   200.23.16.0/23
  Organization 1   200.23.18.0/23
  Organization 2   200.23.20.0/23
  ...
  Organization 7   200.23.30.0/23

Fly-By-Night-ISP to Internet: “Send me anything with addresses beginning 200.23.16.0/20”
ISPs-R-Us to Internet: “Send me anything with addresses beginning 199.31.0.0/16”

Hierarchical addressing: more specific routes
ISPs-R-Us has a more specific route to Organization 1

customers of Fly-By-Night-ISP:
  Organization 0   200.23.16.0/23
  Organization 2   200.23.20.0/23
  ...
  Organization 7   200.23.30.0/23
customer of ISPs-R-Us:
  Organization 1   200.23.18.0/23

Fly-By-Night-ISP to Internet: “Send me anything with addresses beginning 200.23.16.0/20”
ISPs-R-Us to Internet: “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”
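The reason the more specific advertisement wins is longest-prefix matching: a router picks the matching entry with the longest prefix. The sketch below illustrates this with the three advertised prefixes; the `lookup` helper and the probe addresses are assumptions made for the example.

```python
import ipaddress

# Advertisements seen by a router in the rest of the Internet.
routes = [
    (ipaddress.ip_network("200.23.16.0/20"), "Fly-By-Night-ISP"),
    (ipaddress.ip_network("199.31.0.0/16"), "ISPs-R-Us"),
    (ipaddress.ip_network("200.23.18.0/23"), "ISPs-R-Us"),
]

def lookup(dest):
    """Return the ISP whose advertised prefix is the longest match for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, isp) for net, isp in routes if addr in net]
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("200.23.18.7"))   # Organization 1: matches both /20 and /23
print(lookup("200.23.20.1"))   # Organization 2: matches only the /20
```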

IP addressing: the last word...
Q: how does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned Names and Numbers http://www.icann.org/
  ▪ allocates addresses
  ▪ manages DNS
  ▪ assigns domain names, resolves disputes

NAT: network address translation
(figure: rest of Internet on one side of the NAT router, address 138.76.29.7; local network (e.g., home network) 10.0.0/24 on the other side, router address 10.0.0.4, hosts 10.0.0.1, 10.0.0.2, 10.0.0.3)

❖ all datagrams leaving local network have same single source NAT IP address: 138.76.29.7, different source port numbers
❖ datagrams with source or destination in this network have 10.0.0/24 address for source, destination (as usual)
NAT: network address translation
motivation: local network uses just one IP address as far as outside world is concerned:
  ▪ range of addresses not needed from ISP: just one IP address for all devices
  ▪ can change addresses of devices in local network without notifying outside world
  ▪ can change ISP without changing addresses of devices in local network
  ▪ devices inside local net not explicitly addressable, visible by outside world (a security plus)

NAT: network address translation
implementation: NAT router must:
  ▪ outgoing datagrams: replace (source IP address, port #) of every outgoing datagram to (NAT IP address, new port #)
    . . . remote clients/servers will respond using (NAT IP address, new port #) as destination addr
  ▪ remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
  ▪ incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table

NAT: network address translation

NAT translation table
WAN side addr         LAN side addr
138.76.29.7, 5001     10.0.0.1, 3345
……                    ……

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345      D: 128.119.40.186, 80
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001   D: 128.119.40.186, 80
3: reply arrives, dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80  D: 138.76.29.7, 5001
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80  D: 10.0.0.1, 3345

NAT: network address translation
❖ 16-bit port-number field:
  ▪ 60,000 simultaneous connections with a single LAN-side address!
❖ NAT is controversial:
  ▪ routers should only process up to layer 3
  ▪ violates end-to-end argument
    • NAT possibility must be taken into account by app designers, e.g., P2P applications
  ▪ address shortage should instead be solved by IPv6
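The four steps of the translation example can be mimicked with a small table-driven model. This is a toy sketch, not a real NAT implementation: the class name `NatRouter`, the sequential port allocation starting at 5001, and the drop-on-miss behavior are assumptions made for illustration.

```python
class NatRouter:
    """Toy model of the NAT translation table described above."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.lan_to_wan = {}   # (LAN ip, LAN port) -> WAN port
        self.wan_to_lan = {}   # WAN port -> (LAN ip, LAN port)

    def outgoing(self, src, dst):
        """Rewrite the source of a datagram leaving the local network."""
        if src not in self.lan_to_wan:
            # Remember the new translation pair in both directions.
            self.lan_to_wan[src] = self.next_port
            self.wan_to_lan[self.next_port] = src
            self.next_port += 1
        return (self.wan_ip, self.lan_to_wan[src]), dst

    def incoming(self, src, dst):
        """Rewrite the destination of a reply; None if no mapping exists."""
        ip, port = dst
        if ip != self.wan_ip or port not in self.wan_to_lan:
            return None
        return src, self.wan_to_lan[port]

nat = NatRouter("138.76.29.7")
server = ("128.119.40.186", 80)

out = nat.outgoing(("10.0.0.1", 3345), server)       # steps 1 and 2
back = nat.incoming(server, ("138.76.29.7", 5001))   # steps 3 and 4
print(out, back)
```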


ICMP: internet control message protocol
❖ used by hosts & routers to communicate network-level information
  ▪ error reporting: unreachable host, network, port, protocol
  ▪ echo request/reply (used by ping)
❖ network-layer “above” IP:
  ▪ ICMP msgs carried in IP datagrams
❖ ICMP message: type, code plus first 8 bytes of IP datagram causing error

Type  Code  description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header

Traceroute and ICMP
❖ source sends series of UDP segments to dest
  ▪ first set has TTL = 1
  ▪ second set has TTL = 2, etc.
  ▪ unlikely port number
❖ when nth set of datagrams arrives at nth router:
  ▪ router discards datagrams
  ▪ and sends source ICMP messages (type 11, code 0)
  ▪ ICMP messages include name of router & IP address
❖ when ICMP messages arrive, source records RTTs

stopping criteria:
❖ UDP segment eventually arrives at destination host
❖ destination returns ICMP “port unreachable” message (type 3, code 3)
❖ source stops
(figure: 3 probes sent for each TTL value)

IPv6: motivation
❖ initial motivation: 32-bit address space soon to be completely allocated.
❖ additional motivation:
  ▪ header format helps speed processing/forwarding
  ▪ header changes to facilitate QoS

IPv6 datagram format:
  ▪ fixed-length 40 byte header
  ▪ no fragmentation allowed
IPv6 datagram format
priority: identify priority among datagrams in flow
flow label: identify datagrams in same “flow” (concept of “flow” not well defined).
next header: identify upper layer protocol for data

header layout (each row 32 bits wide):
  ver | pri | flow label
  payload len | next hdr | hop limit
  source address (128 bits)
  destination address (128 bits)
  data

Other changes from IPv4
❖ checksum: removed entirely to reduce processing time at each hop
❖ options: allowed, but outside of header, indicated by “Next Header” field
❖ ICMPv6: new version of ICMP
  ▪ additional message types, e.g. “Packet Too Big”
  ▪ multicast group management functions

IPv6 addressing
❖ 128 bit address
  ▪ divided into 8 hextets (16 bits each)
  ▪ hextets separated by “:”
❖ 2001:012a:0000:0000:2567:12dc:0000:0123
  ▪ 0010000000000001:0000000100101010: ….. :0000000100100011
  ▪ one run of consecutive 0000 hextets can be replaced by “::”
    • 2001:012a::2567:12dc:0000:0123
  ▪ leading 0s in each hextet can be removed
    • 2001:12a::2567:12dc:0:123

IPv6 addressing
❖ 128‐bit IPv6 address = 64‐bit prefix + 64‐bit Interface ID (IID)
❖ Supports unicast, anycast, and multicast
❖ The 64‐bit prefix is hierarchical
  ▪ 3 bits address prefix (001 global unicast)
  ▪ 45 bits global routing prefix
  ▪ 16 bits subnet id
❖ The 64‐bit IID identifies the network interface
https://docs.oracle.com/cd/E18752_01/html/816-4554/ipv6-overview-10.html
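The shortening rules can be checked against Python's standard `ipaddress` module, which applies the same zero-compression conventions. A small sketch using the slide's example address; the variable names are illustrative.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:012a:0000:0000:2567:12dc:0000:0123")

full = addr.exploded       # all 8 hextets, 4 hex digits each
short = addr.compressed    # "::" for the zero run, leading zeros dropped

# Binary form of the first and last hextets, as written on the slide.
hextets = full.split(":")
first_bits = format(int(hextets[0], 16), "016b")
last_bits = format(int(hextets[-1], 16), "016b")

# The top 3 bits of the 128-bit address form the address prefix.
top3 = int(addr) >> 125

print(full)
print(short)
print(first_bits, last_bits, format(top3, "03b"))
```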
Transition from IPv4 to IPv6
❖ not all routers can be upgraded simultaneously
  ▪ no “flag days”
  ▪ how will network operate with mixed IPv4 and IPv6 routers?
❖ tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers

IPv4 datagram:
  IPv4 header fields (IPv4 source, dest addr)
  IPv4 payload = IPv6 datagram:
    IPv6 header fields (IPv6 source, dest addr)
    UDP/TCP payload

Tunneling
logical view:  A (IPv6), B (IPv6), IPv4 tunnel connecting IPv6 routers, E (IPv6), F (IPv6)
physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6)

Tunneling
logical view:  A (IPv6), B (IPv6), IPv4 tunnel connecting IPv6 routers, E (IPv6), F (IPv6)
physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6)

A-to-B: IPv6
  flow: X; src: A; dest: F; data
B-to-C, through the tunnel to E: IPv6 inside IPv4
  IPv4 header: src: B; dest: E
  IPv4 payload: IPv6 datagram (Flow: X; Src: A; Dest: F; data)
E-to-F: IPv6
  flow: X; src: A; dest: F; data

IPv6: adoption
▪ Google (1): ~ 30% of clients access services via IPv6
▪ NIST: 1/3 of all US government domains are IPv6 capable
▪ Long (long!) time for deployment, use
  • 25 years and counting!
  • think of application-level changes in last 25 years: WWW, social media, streaming media, gaming, telepresence, …
  • Why?
(1) https://www.google.com/intl/en/ipv6/statistics.html
Interplay between routing, forwarding
❖ routing algorithm determines end-end-path through network
❖ forwarding table determines local forwarding at this router

local forwarding table
dest address       output link
address-range 1    3
address-range 2    2
address-range 3    2
address-range 4    1

(figure: the routing algorithm fills the local forwarding table; the IP destination address in the arriving packet’s header selects output link 1, 2, or 3)

Computing the routing table
(figure: two alternatives)
❖ local computation in each router
❖ SDN based computation (typically) at a central controller

Routing protocols
Routing protocol goal: determine “good” paths (equivalently, routes), from sending hosts to receiving host, through network of routers
❖ path: sequence of routers packets will traverse in going from given initial source host to given final destination host
❖ “good”: least “cost”, “fastest”, “least congested”
❖ routing: a “top-10” networking challenge!
Graph abstraction
(figure: graph with nodes u, v, w, x, y, z and a cost on each link)

graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links = { (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

aside: graph abstraction is useful in other network contexts, e.g., P2P, where N is set of peers and E is set of TCP connections

Graph abstraction: costs
c(x,x’) = cost of link (x,x’)
  e.g., c(w,z) = 5
cost could always be 1, or inversely related to bandwidth, or inversely related to congestion

cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)

key question: what is the least-cost path between u and z?
routing algorithm: algorithm that finds that least cost path

Routing algorithm classification
Q: global or decentralized information?
global:
❖ all routers have complete topology, link cost info
❖ “link state” algorithms
decentralized:
❖ router knows physically-connected neighbors, link costs to neighbors
❖ iterative process of computation, exchange of info with neighbors
❖ “distance vector” algorithms

Q: static or dynamic?
static:
❖ routes change slowly over time
dynamic:
❖ routes change more quickly
  ▪ periodic update
  ▪ in response to link cost changes
A Link-State Routing Algorithm
Dijkstra’s algorithm
❖ net topology, link costs known to all nodes
  ▪ accomplished via “link state broadcast”
  ▪ all nodes have same info
❖ computes least cost paths from one node (“source”) to all other nodes
  ▪ gives forwarding table for that node
❖ iterative: after k iterations, know least cost path to k dest.’s

notation:
❖ c(x,y): link cost from node x to y; = ∞ if not direct neighbors
❖ D(v): current value of cost of path from source to dest. v
❖ p(v): predecessor node along path from source to v
❖ N': set of nodes whose least cost path definitively known

Dijkstra’s Algorithm
Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
    else D(v) = ∞

Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N' :
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'

Dijkstra’s algorithm: example
(figure: graph with nodes u, v, w, x, y, z and a cost on each link)

Step  N'       D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u        7,u        3,u        5,u        ∞          ∞
1     uw       6,w                   5,u        11,w       ∞
2     uwx      6,w                              11,w       14,x
3     uwxv                                      10,v       14,x
4     uwxvy                                                12,y
5     uwxvyz

notes:
❖ construct shortest path tree by tracing predecessor nodes
❖ ties can exist (can be broken arbitrarily)

Dijkstra’s algorithm, discussion
algorithm complexity: n nodes
❖ each iteration: need to check all nodes, w, not in N'
❖ n(n+1)/2 comparisons: O(n^2)
❖ more efficient implementations possible: O(n log n)

oscillations possible:
❖ e.g., suppose link cost equals amount of carried traffic:
(figure: four snapshots of nodes A, B, C, D with link costs drawn from 0, e, 1, 1+e, 2+e. Initially; then three times “given these costs, find new routing…. resulting in new costs”)
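The pseudocode can be turned into a short runnable sketch using a priority queue. The link costs below are an assumption: they are reconstructed so that the run reproduces the D(v),p(v) entries of the example table, since the figure itself did not survive extraction. The function name `dijkstra` and the edge-list layout are illustrative.

```python
import heapq

def dijkstra(graph, source):
    """Return (D, p): least costs from source and predecessor of each node."""
    dist = {source: 0}
    pred = {}
    done = set()                 # N': nodes whose least cost is final
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)
        if w in done:
            continue             # stale queue entry
        done.add(w)
        for v, cost in graph[w].items():
            if v in done:
                continue
            # D(v) = min(D(v), D(w) + c(w,v))
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred

# Assumed link costs, chosen to be consistent with the example table.
edges = [("u", "v", 7), ("u", "w", 3), ("u", "x", 5), ("w", "v", 3),
         ("w", "x", 4), ("w", "y", 8), ("v", "y", 4), ("x", "y", 7),
         ("x", "z", 9), ("y", "z", 2)]
graph = {}
for a, b, cost in edges:
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost   # links are bidirectional

D, p = dijkstra(graph, "u")
print(D)
print(p)
```

Tracing the predecessors back from z (z, y, v, w, u) gives the shortest path tree mentioned in the notes.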
Distance vector (B-F) intuition
(figure: node u has neighbors v and w, each link from u labeled 2; v reports Dv(y)=5 and w reports Dw(y)=3; the paths beyond the neighbors, through x to y, are marked “?”)

What is the least cost to go from u to y?

Distance vector algorithm
Bellman-Ford equation (dynamic programming)

let
  dx(y) := cost of least-cost path from x to y
then
  dx(y) = min_v { c(x,v) + dv(y) }
where
  ▪ c(x,v): cost to neighbor v
  ▪ dv(y): cost from neighbor v to destination y
  ▪ min taken over all neighbors v of x

Bellman-Ford example
(figure: graph with nodes u, v, w, x, y, z and a cost on each link)

clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3

B-F equation says:
  du(z) = min { c(u,v) + dv(z),
                c(u,x) + dx(z),
                c(u,w) + dw(z) }
        = min { 2 + 5,
                1 + 3,
                5 + 3 } = 4

node achieving minimum is next hop in shortest path, used in forwarding table
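The worked example is a one-line minimization once the link costs and the neighbors' distances are written down. A sketch using the slide's numbers; the dictionary names are illustrative.

```python
# Link costs from u to each neighbor, and each neighbor's least cost to z.
link_cost = {"v": 2, "x": 1, "w": 5}           # c(u,v), c(u,x), c(u,w)
neighbor_dist_to_z = {"v": 5, "x": 3, "w": 3}  # dv(z), dx(z), dw(z)

# Bellman-Ford equation: du(z) = min over neighbors v of c(u,v) + dv(z).
next_hop = min(link_cost, key=lambda v: link_cost[v] + neighbor_dist_to_z[v])
d_u_z = link_cost[next_hop] + neighbor_dist_to_z[next_hop]

print(f"du(z) = {d_u_z}, next hop = {next_hop}")
```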
Distance vector algorithm
❖ Dx(y) = estimate of least cost from x to y
  ▪ x maintains distance vector Dx = [Dx(y): y ∈ N]
❖ node x:
  ▪ knows cost to each neighbor v: c(x,v)
  ▪ maintains its neighbors’ distance vectors. For each neighbor v, x maintains Dv = [Dv(y): y ∈ N]

Distance vector algorithm
key idea:
❖ from time-to-time, each node sends its own distance vector estimate to neighbors
❖ when x receives new DV estimate from neighbor, it updates its own DV using B-F equation:
    Dx(y) ← min_v { c(x,v) + Dv(y) } for each node y ∈ N
❖ under minor, natural conditions, the estimate Dx(y) converges to the actual least cost dx(y)

Distance vector algorithm
iterative, asynchronous: each local iteration caused by:
❖ local link cost change
❖ DV update message from neighbor

distributed:
❖ each node notifies neighbors only when its DV changes
  ▪ neighbors then notify their neighbors if necessary

each node:
  wait for (change in local link cost or msg from neighbor)
  recompute estimates
  if DV to any dest has changed, notify neighbors

(figure: three nodes with c(x,y) = 2, c(y,z) = 1, c(x,z) = 7; tables evolve over time)

Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0, 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1, 7+0} = 3

node x table (cost to x y z)
          initially    after first update
from x    0  2  7      0  2  3
from y    ∞  ∞  ∞      2  0  1
from z    ∞  ∞  ∞      7  1  0

node y table (cost to x y z), initially
from x    ∞  ∞  ∞
from y    2  0  1
from z    ∞  ∞  ∞

node z table (cost to x y z), initially
from x    ∞  ∞  ∞
from y    ∞  ∞  ∞
from z    7  1  0
(figure: three nodes with c(x,y) = 2, c(y,z) = 1, c(x,z) = 7; tables evolve over time)

Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0, 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1, 7+0} = 3

node x table (cost to x y z)
          initially    after 1st exchange    after 2nd exchange
from x    0  2  7      0  2  3               0  2  3
from y    ∞  ∞  ∞      2  0  1               2  0  1
from z    ∞  ∞  ∞      7  1  0               3  1  0

node y table (cost to x y z)
          initially    after 1st exchange    after 2nd exchange
from x    ∞  ∞  ∞      0  2  7               0  2  3
from y    2  0  1      2  0  1               2  0  1
from z    ∞  ∞  ∞      7  1  0               3  1  0

node z table (cost to x y z)
          initially    after 1st exchange    after 2nd exchange
from x    ∞  ∞  ∞      0  2  7               0  2  3
from y    ∞  ∞  ∞      2  0  1               2  0  1
from z    7  1  0      3  1  0               3  1  0

Distance vector: link cost changes
link cost changes:
❖ node detects local link cost change
❖ updates routing info, recalculates distance vector
❖ if DV changes, notify neighbors
(figure: c(x,y) changes from 4 to 1; c(y,z) = 1; c(x,z) = 50)

“good news travels fast”
t0 : y detects link-cost change, updates its DV, informs its neighbors.
t1 : z receives update from y, updates its table, computes new least cost to x, sends its neighbors its DV.
t2 : y receives z’s update, updates its distance table. y’s least costs do not change, so y does not send a message to z.
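The table evolution for the three-node example can be reproduced with a small simulation. As a simplifying assumption, the sketch below runs in synchronous rounds (every node recomputes from its neighbors' previous vectors at the same time), whereas the algorithm on the slides is asynchronous; the function name `distance_vector` is hypothetical.

```python
INF = float("inf")

def distance_vector(nodes, cost):
    """Iterate the Bellman-Ford update in synchronous rounds until stable."""
    def c(a, b):
        if a == b:
            return 0
        return cost.get((a, b), cost.get((b, a), INF))

    # Initially each node knows only the cost of its own links.
    dv = {n: {m: c(n, m) for m in nodes} for n in nodes}
    rounds = 0
    while True:
        new = {}
        for x in nodes:
            neighbors = [v for v in nodes if v != x and c(x, v) < INF]
            # Dx(y) <- min over neighbors v of c(x,v) + Dv(y)
            new[x] = {
                y: 0 if y == x else min(c(x, v) + dv[v][y] for v in neighbors)
                for y in nodes
            }
        if new == dv:
            return dv, rounds
        dv, rounds = new, rounds + 1

tables, rounds = distance_vector(
    ["x", "y", "z"], {("x", "y"): 2, ("y", "z"): 1, ("x", "z"): 7}
)
print(tables, "stable after", rounds, "round(s) of changes")
```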

Distance vector: link cost changes
link cost changes:
❖ node detects local link cost change
❖ bad news travels slow - “count to infinity” problem!
❖ 44 iterations before algorithm stabilizes: see text
(figure: c(x,y) changes from 4 to 60; c(y,z) = 1; c(x,z) = 50)

poisoned reverse:
❖ If Z routes through Y to get to X :
  ▪ Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)
❖ will this completely solve count to infinity problem?

Comparison of LS and DV algorithms
message complexity
❖ LS: with n nodes, E links, O(nE) msgs sent
❖ DV: exchange between neighbors only
  ▪ convergence time varies

speed of convergence
❖ LS: O(n^2) algorithm requires O(nE) msgs
  ▪ may have oscillations
❖ DV: convergence time varies
  ▪ may be routing loops
  ▪ count-to-infinity problem

robustness: what happens if router malfunctions?
LS:
  ▪ node can advertise incorrect link cost
  ▪ each node computes only its own table
DV:
  ▪ DV node can advertise incorrect path cost
  ▪ each node’s table used by others
    • error propagate thru network
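The slow convergence after the cost increase, and the effect of poisoned reverse, can be seen in a small simulation of y's and z's estimates of the cost to x. This is a sketch under assumptions: y and z strictly alternate updates starting with y, and a "step" is counted each time an estimate changes, so the step count need not match the iteration count quoted from the text. The function name `simulate` is hypothetical.

```python
INF = float("inf")

def simulate(poisoned_reverse=False, c_xy=60, c_yz=1, c_xz=50):
    """Return (Dy(x), Dz(x), number of estimate changes) after c(x,y): 4 -> 60."""
    dy, dz = 4, 5            # estimates before the change (z reaches x via y)
    y_via_z, z_via_y = False, True
    steps, turn = 0, "y"
    while True:
        if turn == "y":
            # With poisoned reverse, z advertises infinity while routing via y.
            heard = INF if (poisoned_reverse and z_via_y) else dz
            via_z = c_yz + heard
            new = min(c_xy, via_z)
            y_via_z = via_z < c_xy
            changed, dy, turn = (new != dy), new, "z"
        else:
            heard = INF if (poisoned_reverse and y_via_z) else dy
            via_y = c_yz + heard
            new = min(c_xz, via_y)
            z_via_y = via_y < c_xz
            changed, dz, turn = (new != dz), new, "y"
        if not changed:
            return dy, dz, steps   # no change means no further messages
        steps += 1

dy_plain, dz_plain, steps_plain = simulate(poisoned_reverse=False)
dy_pr, dz_pr, steps_pr = simulate(poisoned_reverse=True)
print(dy_plain, dz_plain, steps_plain)
print(dy_pr, dz_pr, steps_pr)
```

Both runs end with the same least costs; the difference lies in how many updates it takes to get there. Poisoned reverse helps in this two-node loop, but it is not claimed here to cover loops through three or more nodes.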


Hierarchical routing
our routing study thus far - idealization
❖ all routers identical
❖ network “flat”
… not true in practice

scale: with 600 million destinations:
❖ can’t store all dest’s in routing tables!
❖ routing table exchange would swamp links!

administrative autonomy
❖ internet = network of networks
❖ each network admin may want to control routing in its own network

Hierarchical routing
❖ aggregate routers into regions, “autonomous systems” (AS)
❖ routers in same AS run same routing protocol
  ▪ “intra-AS” routing protocol
  ▪ routers in different AS can run different intra-AS routing protocol

gateway router:
❖ at “edge” of its own AS
❖ has link to router in another AS

Interconnected ASes
(figure: AS1 with routers 1a, 1b, 1c, 1d; AS2 with routers 2a, 2b, 2c; AS3 with routers 3a, 3b, 3c; inside a router, the intra-AS and inter-AS routing algorithms both feed the forwarding table)
❖ forwarding table configured by both intra- and inter-AS routing algorithm
  ▪ intra-AS sets entries for internal dests
  ▪ inter-AS & intra-AS sets entries for external dests


Inter-AS tasks
❖ suppose router in AS1 receives datagram destined outside of AS1:
  ▪ router should forward packet to gateway router, but which one?

AS1 must:
1. learn which dests are reachable through AS2, which through AS3
2. propagate this reachability info to all routers in AS1
job of inter-AS routing!

Example: setting forwarding table in router 1d
❖ suppose AS1 learns (via inter-AS protocol) that subnet x reachable via AS3 (gateway 1c), but not via AS2
  ▪ inter-AS protocol propagates reachability info to all internal routers
❖ router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c
  ▪ installs forwarding table entry (x,I)

(figure: AS1 with routers 1a, 1b, 1c, 1d; AS2 with 2a, 2b, 2c; AS3 with 3a, 3b, 3c; other networks beyond; subnet x reachable through AS3)

Example: choosing among multiple ASes
❖ now suppose AS1 learns from inter-AS protocol that subnet x is reachable from AS3 and from AS2.
❖ to configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x
  ▪ this is also job of inter-AS routing protocol!
❖ hot potato routing: send packet towards closest of two routers.

(figure: same three ASes; subnet x reachable through both AS3 and AS2)

steps:
1. learn from inter-AS protocol that subnet x is reachable via multiple gateways
2. use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
3. hot potato routing: choose the gateway that has the smallest least cost
4. determine from forwarding table the interface I that leads to least-cost gateway. Enter (x,I) in forwarding table
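The four steps reduce to picking the gateway with the smallest intra-AS cost and installing the interface that leads to it. The sketch below is illustrative only: the costs, interface names, and the helper `hot_potato` are hypothetical, not values from the slides.

```python
def hot_potato(gateway_cost, interface_to):
    """Pick the gateway with the smallest least-cost path and its interface."""
    gateway = min(gateway_cost, key=gateway_cost.get)
    return gateway, interface_to[gateway]

# Step 1: inter-AS protocol says subnet x is reachable via gateways 1c and 1b.
# Step 2: intra-AS routing gives router 1d's least cost to each gateway
#         (hypothetical numbers).
gateway_cost = {"1c": 2, "1b": 1}
# Interface of 1d on the least-cost path to each gateway (hypothetical).
interface_to = {"1c": "I1", "1b": "I2"}

# Steps 3 and 4: choose the closest gateway, then enter (x, I).
gateway, interface = hot_potato(gateway_cost, interface_to)
forwarding_table = {"x": interface}
print(gateway, forwarding_table)
```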
Intra-AS Routing
❖ also known as interior gateway protocols (IGP)
❖ most common intra-AS routing protocols:
  ▪ RIP: Routing Information Protocol
  ▪ OSPF: Open Shortest Path First
  ▪ IGRP: Interior Gateway Routing Protocol (Cisco proprietary)

RIP (Routing Information Protocol)
❖ included in BSD-UNIX distribution in 1982
❖ distance vector algorithm
  ▪ distance metric: # hops (max = 15 hops), each link has cost 1
  ▪ DVs exchanged with neighbors every 30 sec in response message (aka advertisement)
  ▪ each advertisement: list of up to 25 destination subnets (in IP addressing sense)

(figure: routers A, B, C, D interconnecting subnets u, v, w, x, y, z)
from router A to destination subnets:
subnet  hops
u       1
v       2
w       2
x       3
y       3
z       2

OSPF (Open Shortest Path First)
❖ “open”: publicly available
❖ uses link state algorithm
  ▪ LS packet dissemination
  ▪ topology map at each node
  ▪ route computation using Dijkstra’s algorithm
❖ OSPF advertisement carries one entry per neighbor
❖ advertisements flooded to entire AS
  ▪ carried in OSPF messages directly over IP (rather than TCP or UDP)
❖ IS-IS routing protocol: nearly identical to OSPF
OSPF “advanced” features (not in RIP)
❖ security: all OSPF messages authenticated (to prevent malicious intrusion)
❖ multiple same-cost paths allowed (only one path in RIP)
❖ for each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort ToS; high for real time ToS)
❖ integrated uni- and multicast support:
  ▪ Multicast OSPF (MOSPF) uses same topology data base as OSPF
❖ hierarchical OSPF in large domains.

Hierarchical OSPF
(figure: boundary router and backbone routers in the backbone; area border routers connecting the backbone to area 1, area 2, and area 3; internal routers inside each area)

Hierarchical OSPF
❖ two-level hierarchy: local area, backbone.
  ▪ link-state advertisements only in area
  ▪ each node has detailed area topology; only knows direction (shortest path) to nets in other areas.
❖ area border routers: “summarize” distances to nets in own area, advertise to other area border routers.
❖ backbone routers: run OSPF routing limited to backbone.
❖ boundary routers: connect to other AS’s.

Internet inter-AS routing: BGP
❖ BGP (Border Gateway Protocol): the de facto inter-domain routing protocol
  ▪ “glue that holds the Internet together”
❖ allows subnet to advertise its existence to rest of Internet: “I am here”
❖ BGP provides each AS a means to:
  ▪ eBGP: obtain subnet reachability information from neighboring ASs.
  ▪ iBGP: propagate reachability information to all AS-internal routers.
  ▪ determine “good” routes to other networks based on reachability information and policy.
  ▪ advertise neighbor reachability information.
BGP basics
❖ BGP session: two BGP routers (“peers”) exchange BGP messages:
  ▪ advertising paths to different destination network prefixes (“path vector” protocol)
  ▪ exchanged over semi-permanent TCP connections
❖ when AS3 advertises a prefix to AS1:
  ▪ AS3 promises it will forward datagrams towards that prefix
  ▪ AS3 can aggregate prefixes in its advertisement
(figure: BGP message sent from 3a in AS3 to 1c in AS1)

BGP basics: distributing path information
❖ using eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1.
  ▪ 1c can then use iBGP to distribute new prefix info to all routers in AS1
  ▪ 1b can then re-advertise new reachability info to AS2 over 1b-to-2a eBGP session
❖ when router learns of new prefix, it creates entry for prefix in its forwarding table.
(figure: eBGP sessions between ASes, iBGP sessions among the routers inside AS1)

Path attributes and BGP routes
❖ advertised prefix includes BGP attributes
▪ prefix + attributes = “route”
❖ two important attributes:
▪ AS-PATH: contains ASs through which prefix advertisement has passed: e.g., AS 67, AS 17
▪ NEXT-HOP: indicates specific internal-AS router to next-hop AS. (may be multiple links from current AS to next-hop-AS)
❖ gateway router receiving route advertisement uses import policy to accept/decline
▪ e.g., never route through AS x
▪ policy-based routing

BGP route selection
❖ router may learn about more than 1 route to destination AS, selects route based on:
1. local preference value attribute: policy decision
2. shortest AS-PATH
3. closest NEXT-HOP router: hot potato routing
4. additional criteria
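The first three selection criteria can be sketched as a sort key. This is a minimal illustration, not a real BGP implementation: the `Route` class, the default `local_pref` of 100, and the `igp_cost` values are assumptions made for the example; the prefixes, AS-PATHs and NEXT-HOP addresses are the ones used on the following slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple          # e.g. ("AS2", "AS17")
    next_hop: str           # IP address of the next-hop router interface
    local_pref: int = 100   # assumed default; higher value is preferred

def select_route(routes, igp_cost):
    """Pick one route: highest local preference, then shortest AS-PATH,
    then closest NEXT-HOP (hot potato) measured by intra-AS cost."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), igp_cost[r.next_hop]),
    )

routes = [
    Route("138.16.64/22", ("AS3", "AS131", "AS201"), "201.44.13.125"),
    Route("138.16.64/22", ("AS2", "AS17"), "111.99.86.55"),
]
# Made-up intra-AS (e.g. OSPF) costs from this router to each NEXT-HOP.
igp_cost = {"201.44.13.125": 1, "111.99.86.55": 2}

best = select_route(routes, igp_cost)
print(best.as_path)
```

With equal local preference the shorter AS-PATH wins; the intra-AS cost only matters when the earlier criteria tie.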


BGP messages
❖ BGP messages exchanged between peers over TCP connection
❖ BGP messages:
▪ OPEN: opens TCP connection to peer and authenticates sender
▪ UPDATE: advertises new path (or withdraws old)
▪ KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
▪ NOTIFICATION: reports errors in previous msg; also used to close connection

Putting it All Together: How Does an Entry Get Into a Router’s Forwarding Table?
❖ Answer is complicated!
❖ Ties together hierarchical routing (Section 4.5.3) with BGP (4.6.3) and OSPF (4.6.2).
❖ Provides nice overview of BGP!

How does entry get in forwarding table?
Assume prefix is in another AS.
[Figure: routing algorithms install an entry in the router's local forwarding table, which maps the destination IP of an arriving datagram to an output port]
local forwarding table:
prefix          output port
138.16.64/22    3
124.12/16       2
212/8           4
…               …

How does entry get in forwarding table?
High-level overview
1. Router becomes aware of prefix
2. Router determines output port for prefix
3. Router enters prefix-port in forwarding table

Router becomes aware of prefix
❖ BGP message contains “routes”
❖ “route” is a prefix and attributes: AS-PATH, NEXT-HOP,…
❖ Example: route:
▪ Prefix: 138.16.64/22 ; AS-PATH: AS3 AS131 ; NEXT-HOP: 201.44.13.125
[Figure: AS1/AS2/AS3 topology; BGP message sent from 3a]

Router may receive multiple routes
❖ Router may receive multiple routes for same prefix
❖ Has to select one route
[Figure: AS1/AS2/AS3 topology; BGP messages arriving over more than one session]

Select best BGP route to prefix
❖ Router selects route based on shortest AS-PATH
❖ Example:
▪ AS2 AS17 to 138.16.64/22 (selected)
▪ AS3 AS131 AS201 to 138.16.64/22
What if there is a tie? We’ll come back to that!

Find best intra-route to BGP route
❖ Use selected route’s NEXT-HOP attribute
▪ Route’s NEXT-HOP attribute is the IP address of the router interface that begins the AS PATH.
❖ Example:
▪ AS-PATH: AS2 AS17 ; NEXT-HOP: 111.99.86.55
❖ Router uses OSPF to find shortest path from 1c to 111.99.86.55
[Figure: AS1/AS2/AS3 topology with the NEXT-HOP interface 111.99.86.55 marked]

Router identifies port for route
❖ Identifies port along the OSPF shortest path
❖ Adds prefix-port entry to its forwarding table:
▪ (138.16.64/22 , port 4)
[Figure: AS1/AS2/AS3 topology with router ports 1–4 labelled]

Hot Potato Routing
❖ Suppose there are two or more best inter-AS routes.
❖ Then choose route with closest NEXT-HOP
▪ Use OSPF to determine which gateway is closest
▪ Q: From 1c, choose AS3 AS131 or AS2 AS17?
▪ A: route AS3 AS131 since it is closer
[Figure: AS1/AS2/AS3 topology]

How does entry get in forwarding table?
Summary
1. Router becomes aware of prefix
▪ via BGP route advertisements from other routers
2. Determine router output port for prefix
▪ Use BGP route selection to find best inter-AS route
▪ Use OSPF to find best intra-AS route leading to best inter-AS route
▪ Router identifies router port for that best route
3. Enter prefix-port entry in forwarding table

BGP routing policy
[Figure: provider networks A, B, C and customer networks W, X, Y]
❖ A,B,C are provider networks
❖ X,W,Y are customers (of provider networks)
❖ X is dual-homed: attached to two networks
▪ X does not want to route from B via X to C
▪ .. so X will not advertise to B a route to C


BGP routing policy (2)
[Figure: provider networks A, B, C and customer networks W, X, Y]
❖ A advertises path AW to B
❖ B advertises path BAW to X
❖ Should B advertise path BAW to C?
▪ No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers
▪ B wants to force C to route to W via A
▪ B wants to route only to/from its customers!

Why different Intra-, Inter-AS routing?
policy:
❖ inter-AS: admin wants control over how its traffic is routed, who routes through its net.
❖ intra-AS: single admin, so no policy decisions needed
scale:
❖ hierarchical routing saves table size, reduced update traffic
performance:
❖ intra-AS: can focus on performance
❖ inter-AS: policy may dominate over performance
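B's advertisement rule from the policy slide can be written as a one-line predicate. This is a sketch under a stated assumption: the set `CUSTOMERS_OF_B` is read off the slide's figure (X attaches to B as a customer), and the function name is invented for the example.

```python
# Assumption for this sketch: X is the only customer of provider B.
CUSTOMERS_OF_B = {"X"}

def b_should_advertise(as_path, to_neighbor):
    """Return True if B should advertise as_path (e.g. ["B", "A", "W"])
    to to_neighbor. B only carries traffic to or from its own customers."""
    destination = as_path[-1]
    return destination in CUSTOMERS_OF_B or to_neighbor in CUSTOMERS_OF_B

print(b_should_advertise(["B", "A", "W"], "X"))  # advertise BAW to customer X?
print(b_should_advertise(["B", "A", "W"], "C"))  # advertise BAW to C?
```

The rule earns B nothing for transit between two non-customers, which is why the path BAW is withheld from C.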

Chapter 4: done!
4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
▪ datagram format, IPv4 addressing, ICMP, IPv6
4.5 routing algorithms
▪ link state, distance vector, hierarchical routing
4.6 routing in the Internet
▪ RIP, OSPF, BGP
❖ understand principles behind network layer services:
▪ network layer service models, forwarding versus routing, how a router works, routing (path selection), broadcast, multicast
❖ instantiation, implementation in the Internet

Chapter 5
Link Layer
Computer Networking: A Top Down Approach, 6th edition, Jim Kurose, Keith Ross, Addison-Wesley, March 2012
All material copyright 1996-2012 J.F Kurose and K.W. Ross, All Rights Reserved


Chapter 5: Link layer
our goals:
❖ understand principles behind link layer services:
▪ error detection, correction
▪ sharing a broadcast channel: multiple access
▪ link layer addressing
▪ local area networks: Ethernet, VLANs
❖ instantiation, implementation of various link layer technologies

Link layer, LANs: outline
5.1 introduction, services
5.2 error detection, correction
5.3 multiple access protocols
5.4 LANs
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS
5.5 link virtualization: MPLS
5.6 data center networking
5.7 a day in the life of a web request

Link layer: introduction
terminology:
❖ hosts and routers: nodes
❖ communication channels that connect adjacent nodes along communication path: links
▪ wired links
▪ wireless links
▪ LANs
❖ layer-2 packet: frame, encapsulates datagram
data-link layer has responsibility of transferring datagram from one node to physically adjacent node over a link
[Figure: hosts and routers connected through a global ISP]

Link layer: context
❖ datagram transferred by different link protocols over different links:
▪ e.g., Ethernet on first link, frame relay on intermediate links, 802.11 on last link
❖ each link protocol provides different services
▪ e.g., may or may not provide reliable data transfer over link
transportation analogy:
❖ trip from BITS to Agra
▪ limo: BITS to Goa airport
▪ plane: Goa airport to Delhi
▪ train: Delhi to Agra
❖ tourist = datagram
❖ transport segment = communication link
❖ transportation mode = link layer protocol
❖ travel agent = routing algorithm

Link layer services
❖ framing, link access:
▪ encapsulate datagram into frame, adding header, trailer
▪ channel access if shared medium
▪ “MAC” addresses used in frame headers to identify source, dest
• different from IP address!
❖ reliable delivery between adjacent nodes
▪ we learned how to do this already (chapter 3)!
▪ seldom used on low bit-error link (fiber, some twisted pair)
▪ wireless links: high error rates
• Q: why both link-level and end-end reliability?

Link layer services (more)
❖ flow control:
▪ pacing between adjacent sending and receiving nodes
❖ error detection:
▪ errors caused by signal attenuation, noise.
▪ receiver detects presence of errors:
• signals sender for retransmission or drops frame
❖ error correction:
▪ receiver identifies and corrects bit error(s) without resorting to retransmission
❖ half-duplex and full-duplex
▪ with half duplex, nodes at both ends of link can transmit, but not at same time

Where is the link layer implemented?
❖ in each and every host
❖ link layer implemented in “adaptor” (aka network interface card NIC) or on a chip
▪ Ethernet card, 802.11 card; Ethernet chipset
▪ implements link, physical layer
❖ attaches into host’s system buses
❖ combination of hardware, software, firmware
[Figure: host with application/transport/network/link protocol stack, cpu, memory and host bus (e.g., PCI); network adapter card with controller and physical transmission]

Adaptors communicating
[Figure: sending host and receiving host; each adaptor's controller carries the datagram inside a frame]
❖ sending side:
▪ encapsulates datagram in frame
▪ adds error checking bits, rdt, flow control, etc.
❖ receiving side
▪ looks for errors, rdt, flow control, etc
▪ extracts datagram, passes to upper layer at receiving side

Error detection
EDC = Error Detection and Correction bits (redundancy)
D = Data protected by error checking, may include header fields
• Error detection not 100% reliable!
• protocol may miss some errors, but rarely
• larger EDC field yields better detection and correction
[Figure: D and EDC bits sent across the link and checked at the receiver]

Parity checking
single bit parity:
❖ detect single bit errors
[Figure: d data bits 0111000110101011 followed by parity bit 1]
Even parity: set parity bit so there is an even number of 1’s
At receiver:
• Compute parity of d+1 received bits. If not even then error detected
• What does single bit parity detect?
two-dimensional bit parity:
❖ detect and correct single bit errors

Internet checksum (review)
goal: detect “errors” (e.g., flipped bits) in transmitted packet (note: used at transport layer only)
sender:
❖ treat segment contents as sequence of 16-bit integers
❖ checksum: addition (1’s complement sum) of segment contents
❖ sender puts checksum value into UDP checksum field
receiver:
❖ compute checksum of received segment
❖ check if computed checksum equals checksum field value:
▪ NO - error detected
▪ YES - no error detected. But maybe errors nonetheless?

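The sender and receiver steps of the Internet checksum can be sketched as follows. The function names are chosen for this example; odd-length input is padded with a zero byte, and the stored value is the complement of the 1's complement sum, as in UDP.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit Internet checksum: 1's complement of the 1's complement sum
    of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                            # pad to a whole word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # wrap the carry around
    return ~total & 0xFFFF

def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute and compare with the checksum field."""
    return internet_checksum(data) == checksum

segment = b"hello world!"
field = internet_checksum(segment)
print(verify(segment, field))            # intact segment
print(verify(b"hello worle!", field))    # one corrupted byte
```

The "maybe errors nonetheless" case is easy to reproduce: swapping two 16-bit words leaves the sum, and therefore the checksum, unchanged.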
Hamming Code
❖ Hamming codes have both data and parity bits
▪ n = m + r (n is the total block length, m the number of data bits, r the number of parity bits)
❖ Consider the following 10-bit code words:
▪ 1111111111, 1111100000, 0000011111, 0000000000
▪ What is the maximum error that can be detected, corrected?
▪ E.g., 0000000011, 0000000111.
❖ 2^r ≥ m + r + 1.
❖ Parity bits are placed at positions that are powers of two (1, 2, 4, …)
❖ Example of (7,4) Hamming block
P0 P1 D0 P2 D1 D2 D3
❖ Such code has a Hamming distance of 3. Can detect 2 error bits, and correct 1 error bit
❖ Error bit correction done by adding the position values of the parity bits that fail their check
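A small sketch of the (7,4) block above, using the slide's layout P0 P1 D0 P2 D1 D2 D3 at positions 1 to 7. Even parity is an assumption of this example, and the function names are invented. Each parity bit at position 1, 2 or 4 covers the positions whose binary index contains that bit.

```python
def hamming74_encode(d):
    """d = [D0, D1, D2, D3] -> code word [P0, P1, D0, P2, D1, D2, D3]."""
    d0, d1, d2, d3 = d
    p0 = d0 ^ d1 ^ d3      # covers positions 1, 3, 5, 7
    p1 = d0 ^ d2 ^ d3      # covers positions 2, 3, 6, 7
    p2 = d1 ^ d2 ^ d3      # covers positions 4, 5, 6, 7
    return [p0, p1, d0, p2, d1, d2, d3]

def hamming74_correct(word):
    """Return (corrected word, error position); position 0 means no error.
    The error position is the sum of the failing parity positions."""
    syndrome = 0
    for parity_pos in (1, 2, 4):
        covered = [word[i - 1] for i in range(1, 8) if i & parity_pos]
        if sum(covered) % 2:           # even parity violated
            syndrome += parity_pos
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1       # flip the bit at the error position
    return fixed, syndrome

code = hamming74_encode([1, 0, 1, 1])
damaged = list(code)
damaged[4] ^= 1                        # corrupt position 5 (D1)
print(hamming74_correct(damaged))
```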

Multiple access links, protocols
two types of “links”:
❖ point-to-point
▪ PPP for dial-up access
▪ point-to-point link between Ethernet switch, host
❖ broadcast (shared wire or medium)
▪ old-fashioned Ethernet
▪ 802.11 wireless LAN
[Figure: shared wire (e.g., cabled Ethernet); shared RF (e.g., 802.11 WiFi); shared RF (satellite); humans at a cocktail party (shared air, acoustical)]

Multiple access protocols
❖ single shared broadcast channel
❖ two or more simultaneous transmissions by nodes: interference
▪ collision if node receives two or more signals at the same time
multiple access protocol
❖ distributed algorithm that determines how nodes share channel, i.e., determine when node can transmit
❖ communication about channel sharing must use channel itself!
▪ no out-of-band channel for coordination
[Slide annotation: Coordinator]
Q. How do humans solve this problem?

An ideal multiple access protocol
given: broadcast channel of rate R bps
desiderata:
1. when one node wants to transmit, it can send at rate R.
2. when M nodes want to transmit, each can send at average rate R/M.
3. fully decentralized:
• no special node to coordinate transmissions.
• no synchronization of clocks, slots.
4. simple

MAC protocols: taxonomy
three broad classes:
❖ channel partitioning
▪ divide channel into smaller “pieces” (time slots, frequency, code)
▪ allocate piece to node for exclusive use
▪ TDMA, FDMA, CDMA
❖ random access
▪ channel not divided, allow collisions
▪ “recover” from collisions
▪ ALOHA, slotted ALOHA
❖ “taking turns”
▪ nodes take turns, but nodes with more to send can take longer turns
▪ Polling, Token ring

Channel partitioning MAC protocols: TDMA
TDMA: time division multiple access
❖ access to channel in "rounds"
❖ each station gets fixed length slot (length = pkt trans time) in each round
❖ unused slots go idle
❖ example: 6-station LAN, 1,3,4 have pkt, slots 2,5,6 idle
[Figure: two consecutive 6-slot frames; slots 1, 3 and 4 carry packets]
Efficient? Fair? Distributed?

Channel partitioning MAC protocols: FDMA
FDMA: frequency division multiple access
❖ channel spectrum divided into frequency bands
❖ each station assigned fixed frequency band
❖ unused transmission time in frequency bands goes idle
❖ example: 6-station LAN, 1,3,4 have pkt, frequency bands 2,5,6 idle
[Figure: FDM cable divided into frequency bands]

Random access protocols
❖ when node has packet to send
▪ transmit at full channel data rate R.
▪ no a priori coordination among nodes
❖ two or more transmitting nodes ➜ “collision”
❖ random access MAC protocol specifies:
▪ how to detect collisions
▪ how to recover from collisions (e.g., via delayed retransmissions)
❖ examples of random access MAC protocols:
▪ slotted ALOHA
▪ ALOHA
▪ CSMA, CSMA/CD, CSMA/CA
Random Access Protocols:
• Collisions are OK
• Randomize to recover from collisions

Slotted ALOHA
assumptions:
❖ all frames same size
❖ time divided into equal size slots (time to transmit 1 frame)
❖ nodes start to transmit only at slot beginning
❖ nodes are synchronized
❖ if 2 or more nodes transmit in slot, all nodes detect collision
operation:
❖ when node obtains fresh frame, transmits in next slot
▪ if no collision: node can send new frame in next slot
▪ if collision: node retransmits frame in each subsequent slot with prob. p until success
Why? How does the value of p affect the protocol?

Slotted ALOHA
[Figure: nodes 1, 2 and 3 transmitting over nine slots, marked C E C S E C E S S (C = collision, E = empty, S = success)]
Pros:
❖ single active node can continuously transmit at full rate of channel
❖ highly decentralized: only slots in nodes need to be in sync
❖ simple
Cons:
❖ collisions, wasting slots
❖ idle slots
❖ nodes may be able to detect collision in less than time to transmit packet
❖ clock synchronization

Slotted ALOHA: efficiency
efficiency: long-run fraction of successful slots (many nodes, all with many frames to send)
❖ suppose: N nodes with many frames to send, each transmits in slot with probability p
❖ prob that given node has success in a slot = p(1-p)^(N-1)
❖ prob that any node has a success = Np(1-p)^(N-1)
❖ max efficiency: find p* that maximizes Np(1-p)^(N-1)
❖ for many nodes, take limit of Np*(1-p*)^(N-1) as N goes to infinity, gives:
max efficiency = 1/e = .37
at best: channel used for useful transmissions 37% of time!

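The limit can be checked numerically. This sketch evaluates Np(1-p)^(N-1) at p* = 1/N, the value that maximizes it (found by setting the derivative with respect to p to zero); the function names are chosen for this example.

```python
import math

def slotted_aloha_efficiency(n, p):
    """Probability that exactly one of n nodes transmits in a slot."""
    return n * p * (1 - p) ** (n - 1)

def best_efficiency(n):
    """Efficiency at the optimal transmission probability p* = 1/n."""
    return slotted_aloha_efficiency(n, 1 / n)

for n in (2, 10, 100, 10_000):
    print(n, round(best_efficiency(n), 4))
print("1/e =", round(1 / math.e, 4))
```

The printed values fall toward 1/e as the number of nodes grows, matching the 37% figure on the slide.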
Pure (unslotted) ALOHA
❖ unslotted Aloha: simpler, no synchronization
❖ when frame first arrives
▪ transmit immediately
❖ collision probability increases/decreases as compared to slotted ALOHA?
▪ frame sent at t0 collides with other frames sent in [t0-1,t0+1]

Pure ALOHA efficiency
P(success by given node) = P(node transmits) · P(no other node transmits in [t0-1,t0]) · P(no other node transmits in [t0,t0+1])
= p · (1-p)^(N-1) · (1-p)^(N-1)
= p · (1-p)^(2(N-1))
… choosing optimum p and then letting N → ∞
= 1/(2e) = .18
even worse than slotted Aloha!
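The step the slide skips ("choosing optimum p") can be filled in as follows. The symbol S(p) for the overall efficiency with N nodes is introduced only for this derivation.

```latex
\begin{align*}
S(p) &= N\,p\,(1-p)^{2(N-1)} \\
\frac{dS}{dp} &= N(1-p)^{2N-3}\bigl[(1-p) - 2(N-1)\,p\bigr] = 0
  \quad\Longrightarrow\quad p^{*} = \frac{1}{2N-1} \\
S(p^{*}) &= \frac{N}{2N-1}\left(1-\frac{1}{2N-1}\right)^{2(N-1)}
  \;\longrightarrow\; \frac{1}{2}\cdot\frac{1}{e} = \frac{1}{2e} \approx 0.18
  \qquad (N \to \infty)
\end{align*}
```

The vulnerable interval is two frame times long instead of one, which is where the factor of 2 in the exponent, and the halved efficiency, come from.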

CSMA (carrier sense multiple access)
CSMA: LISTEN BEFORE TRANSMIT!!
❖ if channel sensed idle: transmit entire frame
❖ if channel sensed busy, defer transmission
❖ human analogy: don’t interrupt others!
Does CSMA prevent collisions?

CSMA collisions
❖ collisions can still occur due to propagation delay, i.e., two nodes may not hear each other’s transmission in time.
❖ collision: entire packet transmission time wasted
▪ distance & propagation delay play role in determining collision probability
[Figure: spatial layout of nodes]


CSMA/CD (collision detection)
CSMA/CD: carrier sensing, deferral as in CSMA
▪ collisions detected within short time
▪ colliding transmissions aborted, reducing channel wastage
❖ collision detection:
▪ easy in wired LANs: measure signal strengths, compare transmitted, received signals
▪ difficult in wireless LANs: received signal strength overwhelmed by local transmission strength
❖ human analogy: the polite conversationalist
[Figure: CSMA/CD, spatial layout of nodes]

Ethernet CSMA/CD algorithm
1. NIC receives datagram from network layer, creates frame
2. If NIC senses channel idle, starts frame transmission. If NIC senses channel busy, waits until channel idle, then transmits.
3. If NIC transmits entire frame without detecting another transmission, NIC is done with frame!
4. If NIC detects another transmission while transmitting, aborts and sends jam signal
5. After aborting, NIC enters binary (exponential) backoff:
▪ after mth collision, NIC chooses K at random from {0,1,2, …, 2^m-1}. NIC waits K·512 bit times, returns to Step 2
▪ longer backoff interval with more collisions

“Taking turns” MAC protocols
channel partitioning MAC protocols:
▪ share channel efficiently and fairly at high load
▪ inefficient at low load: delay in channel access, 1/N bandwidth allocated even if only 1 active node!
random access MAC protocols
▪ efficient at low load: single node can fully utilize channel
▪ high load: collision overhead
“taking turns” protocols
look for best of both worlds!

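Step 5 of the algorithm can be sketched in a few lines. The function name and the optional `rng` parameter are inventions of this example; the sketch follows the slide literally and does not model any cap on m that real hardware may apply.

```python
import random

BIT_TIMES_PER_UNIT = 512

def backoff_bit_times(m, rng=random):
    """After the m-th collision, choose K uniformly from {0, ..., 2^m - 1}
    and return the wait in bit times (K * 512)."""
    k = rng.randrange(2 ** m)
    return k * BIT_TIMES_PER_UNIT

rng = random.Random(0)              # seeded only to make runs repeatable
for m in range(1, 6):
    print(m, backoff_bit_times(m, rng))
```

The range of K doubles with every collision, so the expected wait grows as the channel gets busier.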
“Taking turns” MAC protocols
polling:
❖ master node “invites” slave nodes to transmit in turn
❖ typically used with “dumb” slave devices
❖ concerns:
▪ polling overhead
▪ latency
▪ single point of failure (master)
[Figure: master sends poll to slaves; polled slave sends data]

“Taking turns” MAC protocols
token passing:
❖ control token passed from one node to next sequentially.
❖ token message
❖ concerns:
▪ token overhead
▪ latency
▪ single point of failure (token)
[Figure: token T circulating among nodes; a node with nothing to send passes it on, a node holding the token sends data]

Summary of MAC protocols
❖ channel partitioning, by time, frequency or code
▪ Time Division, Frequency Division
❖ random access (dynamic),
▪ ALOHA, S-ALOHA, CSMA, CSMA/CD
▪ carrier sensing: easy in some technologies (wire), hard in others (wireless)
▪ CSMA/CD used in Ethernet
▪ CSMA/CA used in 802.11
❖ taking turns
▪ polling from central site, token passing
▪ bluetooth, FDDI, token ring


MAC addresses and ARP
❖ 32-bit IP address:
▪ network-layer address for interface
▪ used for layer 3 (network layer) forwarding
❖ MAC (or LAN or physical or Ethernet) address:
▪ function: used “locally” to get frame from one interface to another physically-connected interface (same network, in IP-addressing sense)
▪ 48 bit MAC address (for most LANs) burned in NIC ROM, also sometimes software settable
▪ e.g.: 1A-2F-BB-76-09-AD
hexadecimal (base 16) notation (each “number” represents 4 bits)

LAN addresses and ARP
each adapter on LAN has unique LAN address
[Figure: LAN (wired or wireless) with four adapters: 1A-2F-BB-76-09-AD, 71-65-F7-2B-08-53, 58-23-D7-FA-20-B0, 0C-C4-11-6F-E3-98]

LAN addresses (more)
❖ MAC address allocation administered by IEEE
❖ manufacturer buys portion of MAC address space (to assure uniqueness)
❖ Analogy:
▪ MAC address: like your Aadhaar Number
▪ IP address: like postal address
❖ Portability:
▪ MAC flat address is portable: can move LAN card from one LAN to another
▪ IP hierarchical address is not portable: address depends on IP subnet to which node is attached

ARP: address resolution protocol
Question: how to determine interface’s MAC address, knowing its IP address?
ARP table: each IP node (host, router) on LAN has table
▪ IP/MAC address mappings for some LAN nodes:
< IP address; MAC address; TTL>
▪ TTL (Time To Live): time after which address mapping will be forgotten (typically 20 min)
[Figure: LAN with IP addresses 137.196.7.78, 137.196.7.23, 137.196.7.14, 137.196.7.88 and MAC addresses 1A-2F-BB-76-09-AD, 71-65-F7-2B-08-53, 58-23-D7-FA-20-B0, 0C-C4-11-6F-E3-98]


ARP protocol: same LAN
❖ A wants to send datagram to B
▪ B’s MAC address not in A’s ARP table.
❖ A broadcasts ARP query packet, containing B's IP address
▪ dest MAC address = FF-FF-FF-FF-FF-FF
▪ all nodes on LAN receive ARP query
❖ B receives ARP packet, replies to A with its (B's) MAC address
▪ frame sent to A’s MAC address (unicast)
❖ A caches (saves) IP-to-MAC address pair in its ARP table until information becomes old (times out)
▪ soft state: information that times out (goes away) unless refreshed
❖ ARP is “plug-and-play”:
▪ nodes create their ARP tables without intervention from net administrator

Addressing: routing to another LAN
walkthrough: send datagram from A to B via R
▪ focus on addressing – at IP (datagram) and MAC layer (frame)
▪ assume A knows B’s IP address
▪ assume A knows IP address of first hop router, R (how?)
▪ assume A knows R’s MAC address (how?)
[Figure: left LAN with A (111.111.111.111, 74-29-9C-E8-FF-55) and a host (111.111.111.112, CC-49-DE-D0-AB-7D); router R with interfaces 111.111.111.110 (E6-E9-00-17-BB-4B) and 222.222.222.220 (1A-23-F9-CD-06-9B); right LAN with B (222.222.222.222, 49-BD-D2-C7-56-2A) and a host (222.222.222.221, 88-B2-2F-54-1A-0F)]
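The ARP table's soft state (entries that vanish unless refreshed) can be sketched as a small cache. The class name, method names and the injectable `clock` parameter are assumptions of this example; the 20-minute TTL is the typical value quoted on the ARP slide.

```python
import time

class ArpTable:
    """IP -> MAC mappings that expire after a TTL (soft state)."""

    def __init__(self, ttl_seconds=20 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}                  # ip -> (mac, expiry time)

    def learn(self, ip, mac):
        """Store or refresh a mapping, e.g. after receiving an ARP reply."""
        self.entries[ip] = (mac, self.clock() + self.ttl)

    def lookup(self, ip):
        """Return the MAC address, or None if unknown or timed out
        (in which case the node would broadcast an ARP query)."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if self.clock() >= expires:
            del self.entries[ip]           # forget the stale mapping
            return None
        return mac

table = ArpTable()
table.learn("222.222.222.222", "49-BD-D2-C7-56-2A")
print(table.lookup("222.222.222.222"))
print(table.lookup("222.222.222.221"))     # never learned
```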

Addressing: routing to another LAN
❖ A creates IP datagram with IP source A, destination B
❖ A creates link-layer frame with R's MAC address as dest, frame contains A-to-B IP datagram
MAC src: 74-29-9C-E8-FF-55
MAC dest: E6-E9-00-17-BB-4B
IP src: 111.111.111.111
IP dest: 222.222.222.222
[Figure: A, R and B topology with IP/Eth/Phy stacks; frame being built at A]

Addressing: routing to another LAN
❖ frame sent from A to R
❖ frame received at R, datagram removed, passed up to IP
MAC src: 74-29-9C-E8-FF-55
MAC dest: E6-E9-00-17-BB-4B
IP src: 111.111.111.111
IP dest: 222.222.222.222
[Figure: A, R and B topology with IP/Eth/Phy stacks; frame arriving at R]


Addressing: routing to another LAN
❖ R forwards datagram with IP source A, destination B
❖ R creates link-layer frame with B's MAC address as dest, frame contains A-to-B IP datagram
MAC src: 1A-23-F9-CD-06-9B
MAC dest: 49-BD-D2-C7-56-2A
IP src: 111.111.111.111
IP dest: 222.222.222.222
[Figure: A, R and B topology with IP/Eth/Phy stacks; frame travelling from R to B]
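The key point of the walkthrough is that R rewrites the MAC addresses while the IP addresses stay fixed end to end. A minimal sketch, with a hypothetical `Frame` record and `forward_at_router` helper, using the addresses from the slides:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    mac_src: str
    mac_dst: str
    ip_src: str     # carried inside the encapsulated datagram
    ip_dst: str

def forward_at_router(frame, out_interface_mac, next_hop_mac):
    """Re-encapsulate the datagram for the next link: new MAC source and
    destination, IP source and destination untouched."""
    return replace(frame, mac_src=out_interface_mac, mac_dst=next_hop_mac)

a_to_r = Frame("74-29-9C-E8-FF-55", "E6-E9-00-17-BB-4B",
               "111.111.111.111", "222.222.222.222")
r_to_b = forward_at_router(a_to_r, "1A-23-F9-CD-06-9B", "49-BD-D2-C7-56-2A")
print(r_to_b)
```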



Ethernet
“dominant” wired LAN technology:
❖ cheap: $20 for NIC
❖ first widely used LAN technology
❖ simpler, cheaper than token LANs and ATM
❖ kept up with speed race: 10 Mbps – 10 Gbps
[Figure: Metcalfe’s Ethernet sketch]

Ethernet: physical topology
❖ bus: popular through mid 90s
▪ all nodes in same collision domain (can collide with each other)
❖ star: prevails today
▪ active switch in center (has replaced hub at center)
▪ each “spoke” runs a (separate) Ethernet protocol (nodes do not collide with each other)
Note on hub: bit-level repeater; dumb forwarder; can have collisions
[Figure: bus: coaxial cable; star: switch at center]

Ethernet frame structure
sending adapter encapsulates IP datagram (or other network layer protocol packet) in Ethernet frame
frame fields, in order: preamble | dest. address | source address | type | data (payload) | CRC
preamble:
❖ 8 bytes (7 × 10101010 followed by 10101011, i.e., AA AA AA AA AA AA AA AB)
❖ used to synchronize receiver, sender clock rates

Ethernet frame structure (more)
❖ addresses: 6 byte source, destination MAC addresses
▪ if adapter receives frame with matching destination address, or with broadcast address (e.g. ARP packet), it passes data in frame to network layer protocol
▪ otherwise, adapter discards frame
❖ type: indicates higher layer protocol (mostly IP but others possible, e.g., Novell IPX, AppleTalk)
❖ CRC: cyclic redundancy check at receiver
▪ error detected: frame is dropped
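The header layout (6-byte destination, 6-byte source, 2-byte type) can be unpacked with the standard `struct` module. This sketch assumes the preamble has already been stripped and ignores the trailing CRC; the function names are chosen for the example, and 0x0800 is the type value for IPv4.

```python
import struct

def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes in the slides' 1A-2F-BB-76-09-AD notation."""
    return "-".join(f"{byte:02X}" for byte in raw)

def parse_ethernet_header(frame: bytes):
    """Split a frame into (dest MAC, source MAC, type, payload)."""
    dest, src, eth_type = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dest), format_mac(src), eth_type, frame[14:]

# Frame from A to R's interface, as in the addressing walkthrough.
frame = bytes.fromhex("E6E90017BB4B" "74299CE8FF55" "0800") + b"datagram"
print(parse_ethernet_header(frame))
```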


Ethernet: unreliable, connectionless
❖ connectionless: no handshaking between sending and receiving NICs
❖ unreliable: receiving NIC doesn't send ACKs or NACKs to sending NIC
▪ data in dropped frames recovered only if initial sender uses higher layer RDT (e.g., TCP), otherwise dropped data lost
❖ Ethernet’s MAC protocol: unslotted CSMA/CD with binary backoff

802.3 Ethernet standards: link & physical layers
❖ many different Ethernet standards
▪ common MAC protocol and frame format
▪ different speeds: 2 Mbps, 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps
▪ different physical layer media: fiber, cable
[Figure: protocol stack (application, transport, network, link, physical); common MAC protocol and frame format over a copper (twisted pair) physical layer (100BASE-TX, 100BASE-T2, 100BASE-T4) and a fiber physical layer (100BASE-FX, 100BASE-SX, 100BASE-BX)]

Ethernet switch
❖ link-layer device: takes an active role
▪ store, forward Ethernet frames
▪ examine incoming frame’s MAC address, selectively forward frame to one-or-more outgoing links when frame is to be forwarded on segment, uses CSMA/CD to access segment
❖ transparent
▪ hosts are unaware of presence of switches
❖ plug-and-play, self-learning
▪ switches do not need to be configured
In short: smart; store and forward; selectively forwards; transparent; self-learning


Switch: multiple simultaneous transmissions
❖ hosts have dedicated, direct connection to switch
❖ switches buffer packets
❖ Ethernet protocol used on each incoming link, but no collisions; full duplex
▪ each link is its own collision domain
❖ switching:
▪ A-to-A’ and B-to-B’ can transmit simultaneously, without collisions
▪ BUT: A-to-A’ and C-to-A’ cannot happen simultaneously
[Figure: switch with six interfaces (1,2,3,4,5,6) connecting hosts A, B, C, A’, B’, C’]

Switch: forwarding and filtering
❖ Two important functions of a switch
▪ Filtering: determining whether a frame should be forwarded to some interface or dropped.
▪ Forwarding: determining the interface to which the frame must be directed.

Switch forwarding table
Q: how does switch know A’ reachable via interface 4, B’ reachable via interface 5?
❖ A: each switch has a switch table, each entry:
▪ (MAC address of host, interface to reach host, time stamp)
▪ looks like a routing table!
Q: how are entries created, maintained in switch table?
▪ something like a routing protocol?
[Figure: switch with six interfaces (1,2,3,4,5,6) connecting hosts A, B, C, A’, B’, C’]

Switch: self-learning
❖ switch learns which hosts can be reached through which interfaces
▪ when frame received, switch “learns” location of sender: incoming LAN segment
▪ records sender/location pair in switch table
[Figure: frame with Source: A, Dest: A’ arriving at interface 1 of the six-interface switch]
Switch table (initially empty):
MAC addr    interface    TTL
A           1            60

Switch: frame filtering/forwarding
when frame received at switch:
1. record incoming link, MAC address of sending host
2. index switch table using MAC destination address
3. if entry found for destination
   then {
      if destination on segment from which frame arrived
         then drop frame                                      /* FILTER */
         else forward frame on interface indicated by entry   /* FORWARD */
   }
   else flood   /* forward on all interfaces except arriving interface */
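The filtering/forwarding pseudocode translates almost line for line into a small class. This is a sketch: the class and method names are made up for the example, and the TTL column (aging out of old entries) is left out to keep it short.

```python
class LearningSwitch:
    """Self-learning switch: learn sender locations, then filter,
    forward, or flood each frame."""

    def __init__(self, num_ports):
        self.ports = list(range(1, num_ports + 1))
        self.table = {}                        # MAC address -> interface

    def receive(self, src, dst, in_port):
        """Return the list of interfaces the frame is sent out on."""
        self.table[src] = in_port              # 1. record sender's location
        out_port = self.table.get(dst)         # 2. index table by destination
        if out_port is None:                   # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:                # same segment: filter (drop)
            return []
        return [out_port]                      # known: forward on one link

switch = LearningSwitch(6)
print(switch.receive("A", "A'", in_port=1))    # A' unknown, so flood
print(switch.receive("A'", "A", in_port=4))    # A known on interface 1
print(switch.table)
```

After these two frames the table holds the same two entries (A on 1, A’ on 4) as the example slide.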

Self-learning, forwarding: example
❖ frame destination, A’, location unknown: flood
❖ destination A location known: selectively send on just one link
[Figure: frame with Source: A, Dest: A’ flooded by the six-interface switch; reply from A’ to A sent only on interface 1]
switch table (initially empty):
MAC addr    interface    TTL
A           1            60
A’          4            60

Institutional network
❖ Switches connect various smaller networks
❖ One router at the edge
❖ Switches
▪ Eliminate collisions
▪ Support heterogeneous links
▪ Ease network management
[Figure: IP subnet with mail server, web server, and router to external network]


Switches vs. routers
both are store-and-forward:
▪ routers: network-layer devices (examine network-layer headers)
▪ switches: link-layer devices (examine link-layer headers)
both have forwarding tables:
▪ routers: compute tables using routing algorithms, IP addresses
▪ switches: learn forwarding table using flooding, learning, MAC addresses

[Figure: protocol stacks along a path; end hosts run application, transport, network, link, physical; the switch handles frames at link and physical; the router handles datagrams at network, link, physical]

VLANs: motivation
❖ Traffic isolation: prevent certain groups of users from receiving broadcasts or frames.
❖ Inefficiency: unused ports in a large switch
❖ Managing users


VLANs: motivation
consider:
❖ CS user moves office to EE, but wants to connect to CS switch?

[Figure: three departmental LANs: Computer Science, Electrical Engineering, Computer Engineering]

VLANs
Virtual Local Area Network: switch(es) supporting VLAN capabilities can be configured to define multiple virtual LANs over single physical LAN infrastructure.

port-based VLAN: switch ports grouped (by switch management software) so that single physical switch operates as multiple virtual switches

[Figure: one 16-port switch partitioned into Electrical Engineering (VLAN ports 1-8) and Computer Science (VLAN ports 9-15), drawn again as two separate virtual switches]
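A port-based VLAN can be pictured as one learning table per VLAN, so that a frame is only ever flooded or forwarded to ports in the VLAN it arrived on. The sketch below assumes the port assignment from the figure (EE on ports 1-8, CS on ports 9-15); the class `VlanSwitch` and the host names are hypothetical.

```python
class VlanSwitch:
    """One physical switch behaving as several virtual switches."""

    def __init__(self, port_vlan):
        self.port_vlan = dict(port_vlan)  # port number -> VLAN name
        self.table = {}                   # (VLAN, MAC address) -> port

    def receive(self, src, dst, in_port):
        """Return the ports the frame leaves on; never crosses VLANs."""
        vlan = self.port_vlan[in_port]
        self.table[(vlan, src)] = in_port          # learn within this VLAN
        out_port = self.table.get((vlan, dst))
        if out_port is None:
            # flood, but only to the other ports of the same VLAN
            return sorted(p for p, v in self.port_vlan.items()
                          if v == vlan and p != in_port)
        return [] if out_port == in_port else [out_port]


ports = {p: "EE" for p in range(1, 9)}
ports.update({p: "CS" for p in range(9, 16)})
sw = VlanSwitch(ports)
print(sw.receive("ee-host", "ff:ff:ff:ff:ff:ff", in_port=2))  # EE ports only
```

Keying the table on the (VLAN, MAC) pair is one simple way to keep the virtual switches independent; forwarding between VLANs would still need a router.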



Port-based VLAN
❖ traffic isolation: frames to/from ports 1-8 can only reach ports 1-8
▪ can also define VLAN based on MAC addresses of endpoints, rather than switch port
❖ dynamic membership: ports can be dynamically assigned among VLANs
❖ forwarding between VLANS: done via routing (just as with separate switches)
▪ in practice vendors sell combined switches plus routers

[Figure: router attached to a switch partitioned into Electrical Engineering (VLAN ports 1-8) and Computer Science (VLAN ports 9-15)]

VLANS spanning multiple switches
[Figure: the 16-port switch (EE on ports 1-8, CS on ports 9-15) connected by a trunk to a second, 8-port switch on which ports 2,3,5 belong to EE VLAN and ports 4,6,7,8 belong to CS VLAN]
❖ trunk port: carries frames between VLANS defined over multiple physical switches
▪ frames forwarded within VLAN between switches can’t be plain Ethernet (802.3) frames (must carry VLAN ID info)
▪ 802.1Q protocol adds/removes additional header fields for frames forwarded between trunk ports
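The header fields that a trunk port adds and removes can be sketched with Python's `struct` module. The example assumes the frame is given without preamble and CRC, uses the Tag Protocol Identifier value 0x8100, and packs the Tag Control Information as a 3-bit priority, one bit left at zero, and a 12-bit VLAN ID. The function names and the sample MAC addresses are hypothetical, and the CRC recomputation is left out.

```python
import struct

TPID = 0x8100  # Tag Protocol Identifier (81-00)


def add_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the 12 bytes of dest + source MAC."""
    if not 0 <= vlan_id < 4096 or not 0 <= priority < 8:
        raise ValueError("VLAN ID is 12 bits, priority is 3 bits")
    tci = (priority << 13) | vlan_id  # bit 12 is left at 0 in this sketch
    return frame[:12] + struct.pack("!HH", TPID, tci) + frame[12:]


def remove_tag(frame: bytes):
    """Return (vlan_id, priority, untagged frame)."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID:
        raise ValueError("frame is not 802.1Q tagged")
    return tci & 0x0FFF, tci >> 13, frame[:12] + frame[16:]


# dest MAC, source MAC, type 0x0800, then payload (addresses are made up)
plain = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
tagged = add_tag(plain, vlan_id=10, priority=5)
print(tagged[12:16].hex())
print(remove_tag(tagged) == (10, 5, plain))
```

A real switch would also recompute the frame CRC after changing the header, since the tag alters the bytes the CRC covers.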


802.1Q VLAN frame format
Ethernet (802.3) frame:  preamble | dest. address | source address | type | data (payload) | CRC
802.1Q frame:            preamble | dest. address | source address | Tag Protocol Identifier | Tag Control Information | type | data (payload) | CRC
▪ 2-byte Tag Protocol Identifier (value: 81-00)
▪ Tag Control Information (12 bit VLAN ID field, 3 bit priority field like IP TOS)
▪ Recomputed CRC

Link layer, LANs: outline
5.1 introduction, services
5.2 error detection, correction
5.3 multiple access protocols
5.4 LANs
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS
5.5 link virtualization: MPLS
5.6 data center networking
5.7 a day in the life of a web request



Multiprotocol label switching (MPLS)
❖ initial goal: high-speed IP forwarding using fixed length label (instead of IP address)
▪ fast lookup using fixed length identifier (rather than longest prefix matching)
▪ borrowing ideas from Virtual Circuit (VC) approach
▪ but IP datagram still keeps IP address!

Frame layout: PPP or Ethernet header | MPLS header | IP header | remainder of link-layer frame
MPLS header fields (bits): label (20) | Exp (3) | S (1) | TTL (8)

MPLS capable routers
❖ a.k.a. label-switched router
❖ forward packets to outgoing interface based only on label value (don’t inspect IP address)
▪ MPLS forwarding table distinct from IP forwarding tables
❖ flexibility: MPLS forwarding decisions can differ from those of IP
▪ use destination and source addresses to route flows to same destination differently (traffic engineering)
▪ re-route flows quickly if link fails: pre-computed backup paths (useful for VoIP)
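Label-based forwarding can be sketched as an exact-match dictionary lookup on a 32-bit MPLS header, in contrast to prefix matching on an IP address. The example assumes the standard layout of a 20-bit label, 3-bit Exp, 1-bit S and 8-bit TTL; the label values and interface numbers in `LABEL_TABLE` are invented for illustration.

```python
import struct


def pack_mpls(label, exp=0, s=1, ttl=64):
    """Build the 4-byte MPLS header: label(20) | Exp(3) | S(1) | TTL(8)."""
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def unpack_mpls(header):
    """Return (label, exp, s, ttl) from the first 4 bytes."""
    (word,) = struct.unpack("!I", header[:4])
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF


# hypothetical table: incoming label -> (outgoing label, outgoing interface)
LABEL_TABLE = {10: (6, 1), 12: (9, 0)}


def forward(packet):
    """Swap the label and pick the interface without reading the IP header."""
    label, exp, s, ttl = unpack_mpls(packet)
    out_label, out_if = LABEL_TABLE[label]  # fixed-length exact match
    # TTL expiry handling is omitted in this sketch
    return out_if, pack_mpls(out_label, exp, s, ttl - 1) + packet[4:]


packet = pack_mpls(10) + b"<IP datagram>"
out_if, out_packet = forward(packet)
print(out_if, unpack_mpls(out_packet))
```

The IP datagram travels untouched behind the header, which matches the point that the datagram still keeps its IP address.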

MPLS versus IP paths
❖ IP routing: path to destination determined by destination address alone
❖ MPLS routing: path to destination can be based on source and dest. address
▪ fast reroute: precompute backup routes in case of link failure

[Figure: routers R2, R3, R4, R5, R6 forwarding toward destinations A and D; the legend distinguishes IP-only routers from MPLS and IP routers; callout: entry router (R4) can use different MPLS routes to A based, e.g., on source address]
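The difference between the two lookups can be sketched with two small tables for the entry router R4. The router names come from the figure, but which next hop each flow uses, and which link counts as primary or backup, are assumptions chosen only to make the example concrete.

```python
# IP routing: next hop depends on the destination alone (assumed values)
IP_NEXT_HOP = {"A": "R3"}

# MPLS at entry router R4: keyed on (source, destination), each entry with
# a precomputed backup (assumed values)
MPLS_PATHS = {
    ("R5", "A"): {"primary": "R3", "backup": "R2"},
    ("R6", "A"): {"primary": "R2", "backup": "R3"},
}


def ip_next_hop(src, dst):
    """Source is ignored: every flow to dst takes the same next hop."""
    return IP_NEXT_HOP[dst]


def mpls_next_hop(src, dst, failed_links=frozenset()):
    """Pick a per-flow path; fall back to the backup if the link is down."""
    entry = MPLS_PATHS[(src, dst)]
    if ("R4", entry["primary"]) in failed_links:
        return entry["backup"]  # fast reroute, no recomputation needed
    return entry["primary"]


print(ip_next_hop("R5", "A"), ip_next_hop("R6", "A"))
print(mpls_next_hop("R5", "A"), mpls_next_hop("R6", "A"))
print(mpls_next_hop("R5", "A", failed_links={("R4", "R3")}))
```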
Data center networks
❖ 10’s to 100’s of thousands of hosts, often closely coupled, in close proximity:
▪ e-business (e.g. Amazon)
▪ content-servers (e.g., YouTube, Akamai, Apple, Microsoft)
▪ search engines, data mining (e.g., Google)
❖ challenges:
▪ multiple applications, each serving massive numbers of clients
▪ managing/balancing load, avoiding processing, networking, data bottlenecks

[Photo: inside a 40-ft Microsoft container, Chicago data center]

Data center networks
load balancer: application-layer routing
▪ receives external client requests
▪ directs workload within data center
▪ returns results to external client (hiding data center internals from client)

[Figure: Internet, border router, access router, load balancers, tier-1 switches, tier-2 switches, TOR switches, server racks 1-8]

Data center networks
❖ rich interconnection among switches, racks:
▪ increased throughput between racks (multiple routing paths possible)
▪ increased reliability via redundancy

[Figure: tier-1 switches, tier-2 switches and TOR switches richly interconnected above server racks 1-8]
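The load balancer's three jobs (receive the request, direct it to a server, return the result without exposing the server) can be sketched as follows. Round-robin selection is an assumption, since the slides do not name a policy, and `LoadBalancer` and `make_server` are hypothetical names.

```python
from itertools import cycle


def make_server(name):
    """Create a toy server that records the requests it handled."""
    log = []

    def serve(request):
        log.append(request)
        return f"response to {request}"  # no server identity in the reply

    serve.name = name
    serve.log = log
    return serve


class LoadBalancer:
    def __init__(self, servers):
        self._servers = cycle(servers)  # assumed policy: round robin

    def handle(self, request):
        server = next(self._servers)    # direct workload inside the data center
        return server(request)          # client sees only the result


s1, s2 = make_server("rack1-host"), make_server("rack2-host")
lb = LoadBalancer([s1, s2])
for r in ["r1", "r2", "r3"]:
    print(lb.handle(r))
print(s1.log, s2.log)
```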



Synthesis: a day in the life of a web request
❖ journey down protocol stack complete!
▪ application, transport, network, link
❖ putting-it-all-together: synthesis!
▪ goal: identify, review, understand protocols (at all layers) involved in seemingly simple scenario: requesting www page
▪ scenario: student attaches laptop to campus network, requests/receives www.google.com


A day in the life: scenario
[Figure: browser on a laptop in the school network 68.80.2.0/24, behind a router that runs DHCP; Jio/Airtel network 68.80.0.0/13 with a DNS server; Google’s network 64.233.160.0/19 with web server 64.233.169.105 holding the web page]

A day in the life… connecting to the Internet
❖ connecting laptop needs to get its own IP address, addr of first-hop router, addr of DNS server: use DHCP
❖ DHCP request encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet
❖ Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN, received at router running DHCP server
❖ Ethernet demuxed to IP, IP demuxed to UDP, UDP demuxed to DHCP

[Figure: DHCP message passing down the laptop’s stack (DHCP, UDP, IP, Eth, Phy) and up the same stack at the router (runs DHCP)]
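The nesting of DHCP inside UDP inside IP inside Ethernet, and the matching demultiplexing at the router, can be sketched with plain data classes. The field values used are the standard ones (UDP ports 68 and 67, IP protocol 17, EtherType 0x0800, source 0.0.0.0 and destination 255.255.255.255 for a client that has no address yet); the class names, the dictionary standing in for the DHCP message, and the sample MAC address are assumptions.

```python
from dataclasses import dataclass

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass
class Udp:
    src_port: int
    dst_port: int
    payload: object


@dataclass
class Ip:
    src: str
    dst: str
    protocol: int
    payload: object


@dataclass
class Eth:
    src: str
    dst: str
    ethertype: int
    payload: object


def encapsulate_dhcp_request(client_mac):
    """Wrap a DHCP request in UDP, then IP, then a broadcast Ethernet frame."""
    dhcp = {"op": "DHCP request", "client_mac": client_mac}
    segment = Udp(68, 67, dhcp)  # client port 68, server port 67
    datagram = Ip("0.0.0.0", "255.255.255.255", 17, segment)  # 17 = UDP
    return Eth(client_mac, BROADCAST_MAC, 0x0800, datagram)   # 0x0800 = IPv4


def demux(frame):
    """Undo the nesting at the DHCP server: Ethernet, IP, UDP, DHCP."""
    if frame.ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    datagram = frame.payload
    if datagram.protocol != 17:
        raise ValueError("not a UDP datagram")
    segment = datagram.payload
    if segment.dst_port != 67:
        raise ValueError("not addressed to the DHCP server port")
    return segment.payload


frame = encapsulate_dhcp_request("00:16:d3:23:68:8a")  # made-up MAC
print(frame.dst, demux(frame)["op"])
```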
A day in the life… connecting to the Internet
❖ DHCP server formulates DHCP ACK containing client’s IP address, IP address of first-hop router for client, name & IP address of DNS server
❖ encapsulation at DHCP server, frame forwarded (switch learning) through LAN, demultiplexing at client
❖ DHCP client receives DHCP ACK reply

Client now has IP address, knows name & addr of DNS server, IP address of its first-hop router

[Figure: DHCP ACK travelling from the router (runs DHCP) back up the client’s stack]

A day in the life… ARP (before DNS, before HTTP)
❖ before sending HTTP request, need IP address of www.google.com: DNS
❖ DNS query created, encapsulated in UDP, encapsulated in IP, encapsulated in Eth. To send frame to router, need MAC address of router interface: ARP
❖ ARP query broadcast, received by router, which replies with ARP reply giving MAC address of router interface
❖ client now knows MAC address of first hop router, so can now send frame containing DNS query

[Figure: ARP query and ARP reply exchanged between the client and the router (runs DHCP)]
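The ARP exchange can be sketched as a broadcast query that only the owner of the target IP address answers, with the result cached by the sender. The IP addresses below are chosen inside the school network 68.80.2.0/24 but are otherwise invented, as are the MAC addresses and the `Host` and `resolve` names.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # IP address -> MAC address

    def handle_arp_query(self, target_ip, sender_ip, sender_mac):
        """Reply with our MAC only if the query asks for our IP."""
        if target_ip != self.ip:
            return None
        self.arp_cache[sender_ip] = sender_mac  # remember who asked
        return self.mac


def resolve(sender, target_ip, lan):
    """Return the MAC for target_ip, broadcasting a query on a cache miss."""
    if target_ip in sender.arp_cache:
        return sender.arp_cache[target_ip]
    for node in lan:  # broadcast: every other node on the LAN sees the query
        if node is sender:
            continue
        mac = node.handle_arp_query(target_ip, sender.ip, sender.mac)
        if mac is not None:
            sender.arp_cache[target_ip] = mac
            return mac
    return None  # nobody on the LAN owns that address


client = Host("68.80.2.101", "00:16:d3:23:68:8a")
router = Host("68.80.2.1", "00:22:6b:45:1f:1b")
other = Host("68.80.2.102", "00:aa:bb:cc:dd:ee")
print(resolve(client, "68.80.2.1", [client, router, other]))
print(client.arp_cache)
```

Once the cache holds the router's MAC address, the client can address the frame that carries its DNS query.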

A day in the life… using DNS
❖ IP datagram containing DNS query forwarded via LAN switch from client to 1st hop router
❖ IP datagram forwarded from campus network into Jio/Airtel network, routed (tables created by RIP, OSPF, IS-IS and/or BGP routing protocols) to DNS server
❖ demux’ed to DNS server
❖ DNS server replies to client with IP address of www.google.com

[Figure: DNS query travelling from the client through the router (runs DHCP) and the Jio/Airtel network 68.80.0.0/13 to the DNS server]

A day in the life… TCP connection carrying HTTP
❖ to send HTTP request, client first opens TCP socket to web server
❖ TCP SYN segment (step 1 in 3-way handshake) inter-domain routed to web server
❖ web server responds with TCP SYNACK (step 2 in 3-way handshake)
❖ TCP connection established!

[Figure: SYN and SYNACK segments exchanged between the client and web server 64.233.169.105]
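From an application's point of view, the DNS lookup and the TCP handshake described above are each a single call in Python's standard `socket` module. The sketch below is an illustration under assumptions: `build_request` and `fetch` are hypothetical helper names, plain HTTP on port 80 is assumed, and `fetch` is only defined here, not called, because it needs a live network.

```python
import socket


def build_request(host, path="/"):
    """Return a minimal HTTP/1.1 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, port=80, timeout=5.0):
    """Resolve host, open a TCP connection, send a GET, return the raw reply."""
    # DNS: the resolver sends the query and returns an IPv4 address
    ip = socket.gethostbyname(host)
    # TCP: create_connection returns once the three-way handshake is done
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_request(host))  # HTTP request sent into TCP socket
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_request("www.google.com").decode("ascii"))
```

Calling `fetch("www.google.com")` on a connected machine would exercise DNS, TCP and HTTP in that order; DHCP and ARP happen below the socket API and are not visible to this code.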
A day in the life… HTTP request/reply
❖ HTTP request sent into TCP socket
❖ IP datagram containing HTTP request routed to www.google.com
❖ web server responds with HTTP reply (containing web page)
❖ IP datagram containing HTTP reply routed back to client
❖ web page finally (!!!) displayed

[Figure: HTTP request and reply exchanged between the client and web server 64.233.169.105 via the router (runs DHCP)]

Chapter 5: Summary
❖ principles behind data link layer services:
▪ error detection, correction
▪ sharing a broadcast channel: multiple access
▪ link layer addressing
❖ instantiation and implementation of various link layer technologies
▪ Ethernet
▪ switched LANS, VLANs
▪ virtualized networks as a link layer: MPLS
❖ synthesis: a day in the life of a web request

Chapter 5: let’s take a breath


❖ journey down protocol stack complete (except PHY)
❖ solid understanding of networking principles, practice
❖ … could stop here … but lots of interesting topics!
▪ wireless
▪ multimedia
▪ security
▪ network management

