Nasser Abouzakhar
23rd, Nov 2017
Contents
Introduction
HTTP
– Request Messages
– Response Messages
– Uniform Resource Identifiers
– TCP Connections
– Caching
Apache
Introduction
The World Wide Web (WWW) made the Internet accessible
– It was originally designed to organise and retrieve information using interlinked hypertext documents
Hypertext means that one document can link to another document
– HTTP and HTML were designed to meet that requirement
URLs (Uniform Resource Locators) provide the information that allows objects on the Web to be located; they are the basis of the hypertext system
– A URL can point to a file located on another machine, which is the core of the hypertext part of HTTP and HTML
Source: Peterson & Davie, 2012 p 708
HTTP (HyperText Transfer Protocol)
HTTP is a request/response protocol
– Web browsers use HTTP to fetch web pages from web servers
– It is a text-oriented protocol running over TCP
Example: if you opened UH's URL http://www.herts.ac.uk/index.html, your web browser would open a TCP connection to the web server www.herts.ac.uk
– Your browser would then retrieve and display the file called index.html
– Web pages often contain text, images and other objects such as audio and video clips, pieces of code, or URLs linking to other pages
Source: Peterson & Davie, 2012 p 709
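As a rough sketch of what the browser sends in this example, the Python function below (build_get_request is a hypothetical name) turns a URL into the host, the TCP port and the request bytes. It assumes plain HTTP on the default port 80, and the Connection: close header is an assumption added for illustration rather than something the example requires.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Return (host, port, request_bytes) for fetching url with HTTP/1.1."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80              # 80 is HTTP's well-known TCP port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",          # START_LINE: operation, page, version
        f"Host: {host}",                 # header line naming the server
        "Connection: close",             # assumption: one request per connection
        "",                              # empty line ends the headers
        "",
    ]
    return host, port, "\r\n".join(lines).encode("ascii")

host, port, request = build_get_request("http://www.herts.ac.uk/index.html")
# To actually send it, a browser-like client could do (needs network access):
#   import socket
#   with socket.create_connection((host, port)) as sock:
#       sock.sendall(request)
#       first_chunk = sock.recv(4096)
```

Because HTTP is text-oriented, the whole request is readable ASCII text ending in an empty line.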
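HTTP Message
Every HTTP message, request or response, has the general format:
   START_LINE <CRLF>
   MESSAGE_HEADER <CRLF>
   <CRLF>
   MESSAGE_BODY <CRLF>
where <CRLF> stands for carriage-return+line-feed.
START_LINE indicates whether this is a request message or a response message. In the case of
a request, it identifies the "remote procedure" to be executed
a response, it identifies the status of the request
MESSAGE_HEADER is a set of zero or more header lines, each ending in <CRLF>; e.g. in a request, the Host header line specifies the server's host name
A blank line (a lone <CRLF>) marks the end of the header lines
MESSAGE_BODY is where a server places the requested page when responding to a request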
Source: Peterson & Davie, 2012 p 710
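A minimal sketch of how a receiver could split a raw message along these boundaries, assuming the whole message is already in memory. split_message is a hypothetical helper; it keeps only the last value of a repeated header field and does not handle chunked bodies.

```python
def split_message(raw):
    """Split a raw HTTP message (bytes) into START_LINE, headers and body."""
    head, sep, body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    if not sep:
        raise ValueError("no blank line: header section is incomplete")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # names are case-insensitive
    return start_line, headers, body
```

For example, `split_message(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")` yields the start line, a one-entry header dict and the body `b"hello"`.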
Request Messages
The first line of an HTTP request message (its START_LINE) specifies three parts:
– The operation to be performed (e.g. GET, HEAD),
– The web page the operation should be performed on, and
– The HTTP version
Source: Peterson & Davie, 2012 p 711
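The three parts can be pulled apart with a simple split on spaces. In this sketch parse_request_line is a hypothetical name, and only the operations defined for HTTP/1.1 in RFC 2616 are accepted, so extension methods such as PATCH would be rejected.

```python
METHODS = {"OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT"}

def parse_request_line(line):
    """Split a request START_LINE into (operation, page, version)."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 space-separated parts, got {len(parts)}")
    method, target, version = parts
    if method not in METHODS:
        raise ValueError(f"unknown operation: {method}")
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad version: {version}")
    return method, target, version
```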
Example (1)
Option 1: the START_LINE
   GET http://www.cs.princeton.edu/index.html HTTP/1.1
indicates that the client wants the server on host www.cs.princeton.edu to return the page named index.html
Option 2: use a relative identifier and specify the host name in one of the MESSAGE_HEADER lines:
   GET /index.html HTTP/1.1
   Host: www.cs.princeton.edu
Host is one of the MESSAGE_HEADER fields
Source: Peterson & Davie, 2012 p 712
Response Messages
A response message starts with a single START_LINE that includes:
– The HTTP version,
– A three-digit code indicating whether or not the request was successful, and
– A text string giving the reason for the response
Source: Peterson & Davie, 2012 p 712
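A response START_LINE can be split in a similar way, except that the reason string may itself contain spaces, so only the first two spaces are treated as separators. parse_status_line is a hypothetical name in this sketch.

```python
def parse_status_line(line):
    """Split a response START_LINE into (version, code, reason)."""
    version, code, reason = line.split(" ", 2)   # reason may contain spaces
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"status code must be three digits: {code!r}")
    return version, int(code), reason
```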
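Example (2)
The START_LINE
   HTTP/1.1 200 OK
indicates that the server managed to satisfy the request, whereas
   HTTP/1.1 301 Moved Permanently
   Location: http://www.princeton.edu/cs/index.html
shows that it was not able to satisfy the request: the Princeton Computer Science Department web page had moved from http://www.cs.princeton.edu/index.html to http://www.princeton.edu/cs/index.html, and the Location header line gives the new address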
Source: Peterson & Davie, 2012 p 712
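The three-digit codes are grouped by their first digit. The sketch below (classify and next_url are hypothetical names) maps a code to its class and, for a redirect such as 301, returns the Location header line's value; it assumes header names have already been lower-cased.

```python
STATUS_CLASSES = {
    1: "Informational",   # request received, continuing process
    2: "Success",         # e.g. 200 OK
    3: "Redirection",     # e.g. 301 Moved Permanently
    4: "Client Error",    # e.g. 404 Not Found
    5: "Server Error",    # e.g. 500 Internal Server Error
}

def classify(code):
    """Return the class of a three-digit HTTP status code."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

def next_url(code, headers):
    """Return the URL to try next if the response is a redirect, else None."""
    if classify(code) == "Redirection" and "location" in headers:
        return headers["location"]
    return None
```

With the responses from the example, `next_url(301, {"location": "http://www.princeton.edu/cs/index.html"})` returns the new address, while a 200 response returns None.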
Response Messages, cont.
If the request succeeds, the response message carries the requested page, typically an HTML document
The requested page may contain nontextual data, e.g. a GIF image, so it is encoded using MIME: the data is labelled with a MIME type (e.g. Content-Type: image/gif), and unlike email, HTTP carries binary data as raw bytes rather than base64-encoding it
The MESSAGE_HEADER lines give attributes of the page contents, including:
– Content-Length: the number of bytes in the contents,
– Expires: the time at which the contents are considered stale, and
– Last-Modified: the time at which the contents were last modified at the server
Source: Peterson & Davie, 2012 p 713
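Because TCP delivers a byte stream, the receiver relies on Content-Length to know where the page ends, which matters when further messages follow on the same connection. A sketch, assuming headers is a dict with lower-cased names and the body is not chunked; read_body is a hypothetical helper, and io.BytesIO stands in for the TCP socket.

```python
import io

def read_body(stream, headers):
    """Read exactly Content-Length bytes of body from a byte stream."""
    length = int(headers["content-length"])   # number of bytes in the contents
    body = b""
    while len(body) < length:
        chunk = stream.read(length - len(body))
        if not chunk:                          # peer closed before sending everything
            raise ConnectionError(f"expected {length} bytes, got {len(body)}")
        body += chunk
    return body

# io.BytesIO stands in for the socket (a real client might use sock.makefile("rb")).
stream = io.BytesIO(b"<html></html>HTTP/1.1 200 OK\r\n")
page = read_body(stream, {"content-length": "13"})
```

After the call, the stream is positioned at the start of the next message, which is exactly what a persistent connection needs.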
Uniform Resource Identifiers (URIs)
A URI is a character string that identifies a resource
– URLs are one type of URI
– A resource can be anything that has identity, such as a document or a video
The format of URIs allows different sorts of resource identifiers to be incorporated into the URI space
– The first part of a URI is a scheme that names a particular way of identifying a certain kind of resource
– The second part, separated from the first part by a colon, is the scheme-specific part, e.g. in http://www.herts.ac.uk/index.html the scheme is http and the scheme-specific part is //www.herts.ac.uk/index.html
Source: Peterson & Davie, 2012 p 714
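Splitting a URI at its first colon separates the scheme from the scheme-specific part. A small sketch in which split_uri and the mailto address are hypothetical; Python's standard urllib.parse.urlsplit is shown for comparison, since for hierarchical schemes such as http it also parses the scheme-specific part.

```python
from urllib.parse import urlsplit

def split_uri(uri):
    """Return (scheme, scheme_specific_part) of a URI."""
    scheme, sep, rest = uri.partition(":")   # the scheme ends at the first colon
    if not sep or not scheme:
        raise ValueError(f"no scheme in {uri!r}")
    return scheme.lower(), rest              # scheme names are case-insensitive

print(split_uri("http://www.herts.ac.uk/index.html"))
# ('http', '//www.herts.ac.uk/index.html')
print(split_uri("mailto:someone@example.com"))
# ('mailto', 'someone@example.com')

parts = urlsplit("http://www.herts.ac.uk/index.html")
print(parts.netloc, parts.path)   # www.herts.ac.uk /index.html
```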
TCP Connection
HTTP version 1.0 established a separate TCP connection for each data item retrieved from the server
[Figure: HTTP 1.0 connection timeline; some of the TCP ACKs are not shown]
Source: Peterson & Davie, 2012 p 715
TCP Connection, cont.
HTTP version 1.1 introduced persistent connections
– The client and server can exchange multiple request/response messages over the same TCP connection
Advantages:
– Eliminates the connection setup overhead for every item after the first
– TCP's congestion window mechanism operates more efficiently, since it is not necessary to go through the slow start phase for each page
Source: Peterson & Davie, 2012 p 715
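A back-of-the-envelope model of the difference, assuming each TCP handshake costs one round-trip time (RTT) and each request/response costs one RTT, and ignoring transmission time, slow start and the parallel connections real browsers open. The function names are hypothetical.

```python
def http10_cost(n_items):
    """(connections, RTTs) when every item gets its own TCP connection."""
    # per item: 1 RTT for the TCP handshake + 1 RTT for request/response
    return n_items, 2 * n_items

def http11_cost(n_items):
    """(connections, RTTs) with one persistent connection, no pipelining."""
    # 1 RTT for the single handshake, then 1 RTT per request/response
    return 1, 1 + n_items

# an HTML page plus 10 embedded images = 11 items
print(http10_cost(11))   # (11, 22)
print(http11_cost(11))   # (1, 12)
```

Under these assumptions the persistent connection saves one handshake RTT per item after the first, which is the setup overhead mentioned above.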
TCP Connection, cont.
Disadvantages:
– Neither the client nor the server knows how long to keep a particular connection open
– A server might be asked to keep connections open on behalf of thousands of clients
Solution:
The server must time out and close a connection if it has received no requests on it for a period of time
Source: Peterson & Davie, 2012 p 716
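One way a server might implement this timeout, sketched in Python with hypothetical names; the clock is injectable so the logic can be checked without waiting. Apache itself exposes the idle period through its KeepAliveTimeout directive.

```python
import time

class IdleConnectionTable:
    """Track when each connection last carried a request; expire idle ones."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout        # seconds of inactivity allowed
        self.clock = clock            # injectable time source
        self.last_request = {}        # connection id -> time of last request

    def on_request(self, conn_id):
        self.last_request[conn_id] = self.clock()

    def expire_idle(self):
        """Forget connections idle for longer than timeout and return their ids."""
        now = self.clock()
        idle = [c for c, t in self.last_request.items() if now - t > self.timeout]
        for conn_id in idle:
            del self.last_request[conn_id]   # a real server would also close the socket
        return idle
```

A server loop would call `expire_idle()` periodically and close the sockets whose ids it returns.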
Caching
Benefits:
– Faster retrieval and display of pages from a nearby cache
– Reduced load on the server
Caching can be implemented in various places, as follows:
– Web browser: can cache recently accessed pages,
– Website or proxy: can support a single site-wide cache that lets users take advantage of pages previously downloaded by others, and
– ISP's router: can peek inside the request message and look at the URL of the requested page; if it has the page in its cache, it returns it without contacting the server
Source: Peterson & Davie, 2012 p 717
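The lookup described for a browser, proxy or ISP cache amounts to a table keyed by URL. This minimal sketch (PageCache and fetch_from_server are hypothetical) deliberately ignores freshness and counts how many requests actually reach the server.

```python
class PageCache:
    """Minimal cache keyed by URL, as a browser, proxy or ISP might keep."""

    def __init__(self, fetch_from_server):
        self.fetch_from_server = fetch_from_server   # function: url -> page
        self.pages = {}
        self.server_requests = 0

    def get(self, url):
        if url in self.pages:            # cache hit: answer locally
            return self.pages[url]
        self.server_requests += 1        # cache miss: go to the origin server
        page = self.fetch_from_server(url)
        self.pages[url] = page
        return page
```

Repeated requests for the same URL reach the server only once, which is the load reduction listed above.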
Caching, cont.
Caching requirement:
– The cache needs to make sure that it is not responding with an out-of-date version of the requested page
Example: the server assigns an expiration date (the Expires header line) to each page it sends back to the client
– The cache remembers this date and does not need to verify the page until that expiration date has passed
– After that time, the cache can use the HEAD operation or a conditional GET (a GET with an If-Modified-Since header line) to verify that it has the most recent copy
Source: Peterson & Davie, 2012 p 717
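A sketch of both checks in Python, assuming dates use the HTTP date format in GMT (e.g. Thu, 23 Nov 2017 12:00:00 GMT); is_fresh and conditional_get are hypothetical names. If the page has not changed, a server answering the conditional GET replies with 304 Not Modified and no body, so the cached copy can be reused.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def is_fresh(expires_header, now=None):
    """True if a cached copy has not yet passed its Expires date."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < parsedate_to_datetime(expires_header)

def conditional_get(path, host, last_modified):
    """Build a GET that asks for the page only if it changed after last_modified.

    last_modified must be a timezone-aware UTC datetime, e.g. the parsed
    Last-Modified header line of the cached copy.
    """
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"If-Modified-Since: {format_datetime(last_modified, usegmt=True)}\r\n"
        "\r\n"
    )
```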
Apache
Introduction
Apache is the most popular HTTP web server; it is free, open-source software
It has a web server market share of > 50%
Advantages:
– Stable, efficient and flexible software
– Operates on a large number of popular OS platforms
Used by major websites such as amazon.com, IBM
Combined with Python, Perl, and PHP, Apache allows you
to develop customised and dynamic Web applications
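As one illustration of a dynamic page written in Python, the sketch below is a minimal WSGI application. It assumes Apache is running the mod_wsgi module, which by default looks for a callable named application; the name query parameter is hypothetical. User input is HTML-escaped before it is placed in the page.

```python
from html import escape
from urllib.parse import parse_qs

def application(environ, start_response):
    """Minimal WSGI app: the page content depends on the request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]
    body = f"<html><body>Hello, {escape(name)}!</body></html>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local testing without Apache, the standard library's wsgiref.simple_server.make_server can serve the same application callable.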
Process Ownership and Security
Each process has an owner, and its rights on the system are limited to those of that owner
Whenever a process is started, it inherits the permissions of its parent process
– e.g. if you are logged in as a root user, the shell in which you're doing your work and any processes started from it will have the same rights as you
Apache starts with root permissions to carry out its initial network functions
– It binds itself to port 80 so that it can listen for client requests (on Unix-like systems, normally only root can bind to ports below 1024)
– Once it has done this, it can give up its root rights and run as a non-root
user, as indicated in its configuration files
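The bind-then-drop pattern can be sketched in Python for a Unix-like system. open_listener and drop_privileges are hypothetical helpers, and www-data (the Apache user on Debian) is an assumption; Apache does the equivalent internally, based on its User and Group directives.

```python
import os
import pwd
import socket

def open_listener(port):
    """Create a listening TCP socket (ports below 1024 normally need root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock

def drop_privileges(user):
    """Permanently switch a root process to an unprivileged user."""
    if os.geteuid() != 0:
        return                          # not root: nothing to give up
    pw = pwd.getpwnam(user)
    os.setgroups([])                    # drop root's supplementary groups
    os.setgid(pw.pw_gid)                # change group first: after setuid we no longer could
    os.setuid(pw.pw_uid)

# Intended order when started as root:
#   listener = open_listener(80)
#   drop_privileges("www-data")
```

The order matters: the privileged bind happens first, and the already-bound socket stays usable after the process has given up root.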
Process Ownership and Security, cont.
By limiting the permissions of the server, you reduce the damage that a malicious request sent to the server can cause
– Input coming from a client over the network should never be able to make a CGI script perform an unacceptable operation
– Improperly configured web servers make some attacks easy to carry out successfully
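One common defence for a CGI script is to validate client input against an allowlist before using it. In this sketch the PAGE_NAME pattern and safe_page_name are hypothetical; rejecting everything outside the pattern blocks inputs such as ../../etc/passwd or shell metacharacters.

```python
import re

# Allowlist: a short page name of letters, digits, '-' or '_', ending in .html
PAGE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}\.html")

def safe_page_name(value):
    """Return value if it is an acceptable page name, else raise ValueError."""
    if not PAGE_NAME.fullmatch(value):   # the whole string must match
        raise ValueError(f"rejected input: {value!r}")
    return value
```

An allowlist is preferred here over a blocklist because it is hard to enumerate every dangerous input in advance.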
You can use APT to install Apache on a Debian-based Linux distribution as follows:
…:~$ sudo apt-get -y install apache2
Reference
Computer Networks: A Systems Approach, by Larry Peterson and Bruce Davie (fifth edition)
Resources
www.apache.org
http://httpd.apache.org/