Module 1 / What are Proxy Servers?
Proxy servers reside between end users and
the internet.
Cache web content to reduce bandwidth
usage.
When a user requests a web page through
the proxy, the cache is checked first. If the
page is present in the cache, it is returned to
the user without the need for the proxy
server to access the internet.
If the page is not present, the proxy server
retrieves it from the internet.
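
The cache-first lookup described above can be sketched in a few lines of Python. This is a toy model only, not how Squid is implemented: the class name `CachingProxy` and the `fake_origin` function are hypothetical, and the sketch ignores expiry, validation and disk storage.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of a caching proxy: check the cache, then fall back to the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # stand-in for "access the internet"
        self._cache: Dict[str, bytes] = {}   # URL -> cached page body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        # Page present in the cache: return it without contacting the origin.
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        # Page not present: retrieve it, store it, then return it.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_origin(url: str) -> bytes:
    # Hypothetical origin fetch used only for this illustration.
    return f"content of {url}".encode()


proxy = CachingProxy(fake_origin)
proxy.get("http://www.example.com/")   # first request: miss, fetched from origin
proxy.get("http://www.example.com/")   # second request: served from the cache
print(proxy.hits, proxy.misses)
```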
© 2003 Campbell Wilson Page: 1 A Monash University Company
Module 1 / Proxy Servers Diagram
[Diagram: several clients connect to the proxy server and its cache, which in turn connects to the internet.]
Module 1 / Why Use Proxy Servers?
Reduce bandwidth usage.
Faster access to cached pages.
Filter access to inappropriate sites.
Hide internal IP addresses.
Creation of an audit trail for internet access.
Can be used in conjunction with firewalling
for increased protection of internal network.
Enable a common and controlled approach
to internet access in an organisation.
Module 2 / What is Squid?
An open source implementation of a proxy server.
Designed specifically to run on UNIX servers
(although Windows versions exist).
Supports (www.squid-cache.org):
▪ proxying and caching of HTTP, FTP, and other URLs
▪ proxying for SSL
▪ cache hierarchies
▪ ICP, HTCP, CARP, Cache Digests
▪ transparent caching
▪ WCCP (Squid v2.3 and above)
▪ extensive access controls
▪ HTTP server acceleration
▪ SNMP
▪ caching of DNS lookups
Module 2 / Why Squid?
Low cost.
Leverage existing open source servers (e.g.
Linux).
Highly configurable.
Constant development.
Mature (>8 years) software.
Probably the most widely deployed proxy
solution (source www.web-cache.com)
Module 3 / Overview
System requirements
Obtaining the source code
Compiling Squid from source
Basic configuration for initial testing
Useful sites :
▪ http://www.squid-cache.org/
▪ http://www.deckle.co.za/squid-users-guide/
Module 3 / Selecting Hardware
In decreasing order of importance:
▪ Disk random seek time
▪ System main memory
▪ Sustained disk throughput
▪ CPU power
Squid is not CPU intensive.
Module 3 / Disk Considerations
Disk seek time normally specified by the
manufacturer as an average seek time for a
random access.
Most important factor since Squid is heavily reliant
on disks for caching.
Maximum number of cache requests per second is
approximately:
rps = (1000 / seek time in ms) * no. of disks
Seek time of much higher importance than
throughput since throughput usually higher than
internal network transfer speed.
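
As a rough worked example of the seek-time estimate, the sketch below assumes that each request costs about one random seek and that requests are spread evenly across the cache disks, so capacity grows with the number of disks. The function name is hypothetical and the result is a theoretical ceiling, not a measured figure.

```python
def max_requests_per_second(seek_time_ms: float, disks: int = 1) -> float:
    """Estimate the upper bound on cache requests per second.

    Assumes one random seek per request and an even spread of
    requests over all disks (an idealisation).
    """
    if seek_time_ms <= 0 or disks < 1:
        raise ValueError("seek time must be positive and disks must be >= 1")
    # 1000 ms in a second divided by the time per seek, scaled by disk count.
    return 1000.0 / seek_time_ms * disks


print(max_requests_per_second(10))      # one disk with a 10 ms average seek
print(max_requests_per_second(10, 2))   # two such disks
```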
Module 3 / Disk Considerations
Disk space required depends on:
▪ projected throughput of data
▪ what proportion of data to be cached
Can use gathered statistics on usage patterns to
estimate projected throughput, or consider the
maximum throughput of your external connection to
the internet.
E.g. caching 100% of 100 KB/sec for 10 hours per
day requires approx.:
100 x 3600 x 10 KB = 3,600,000 KB, or approx. 3.4 GB
But…more disk usage means larger index in
RAM…
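
The disk space estimate can be reproduced with a short calculation. This is a sketch of the arithmetic only; the function name and the `cached_fraction` parameter are hypothetical, and 1 GB is taken as 1024 x 1024 KB.

```python
def daily_cache_kb(rate_kb_per_sec: float, hours_per_day: float,
                   cached_fraction: float = 1.0) -> float:
    """Return the KB of data cached per day for a given sustained rate."""
    seconds = hours_per_day * 3600
    return rate_kb_per_sec * seconds * cached_fraction


kb = daily_cache_kb(100, 10)    # 100 KB/sec for 10 hours, all of it cached
gb = kb / (1024 * 1024)         # convert KB to GB
print(f"{kb:.0f} KB is about {gb:.1f} GB")
```

Lowering `cached_fraction` models caching only a proportion of the traffic.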
Module 3 / Main Memory
Squid keeps an index of all objects in the
cache in RAM.
Each index entry is approx. 75 bytes per object.
Assume average web page size of 50 KB, so:
1 GB of cache storage = ~21,000 web pages
(1024*1024 / 50)
21,000 web pages require approx. 1.5 MB of
RAM for the index (21000*75 / (1024*1024))
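
The index sizing can be checked with the same figures used above (75 bytes per index entry, 50 KB average object). The constant and function names are hypothetical, and real object sizes vary widely, so treat the result as an order-of-magnitude estimate.

```python
BYTES_PER_INDEX_ENTRY = 75   # approximate figure quoted above
AVG_OBJECT_KB = 50           # assumed average web page size


def objects_in_cache(cache_gb: float) -> int:
    """Number of average-sized objects that fit in the given cache size."""
    cache_kb = cache_gb * 1024 * 1024
    return int(cache_kb / AVG_OBJECT_KB)


def index_ram_mb(object_count: int) -> float:
    """RAM needed for the in-memory index, in MB."""
    return object_count * BYTES_PER_INDEX_ENTRY / (1024 * 1024)


count = objects_in_cache(1)
print(count, "objects need about", round(index_ram_mb(count), 1), "MB of index RAM")
```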
Module 3 / Operating System
Squid runs on:
▪ Linux
▪ FreeBSD
▪ Apple Mac OS X
▪ Digital UNIX and OSF/1
▪ Irix
▪ Sun Solaris
▪ SCO Unix
▪ NeXTStep
▪ IBM AIX
▪ HP-UX
▪ Microsoft Windows NT
And more! (generally any recent UNIX O/S will be supported)
Module 3 / Installing Squid
Refer to Session Tasks for this week
Module 4 / Further Configuration
Squid would normally be started from UNIX
startup scripts.
In Linux, startup scripts are commonly placed
in /etc/init.d and symlinked from the
appropriate run level directories (/etc/rcx.d
where x is the run level).
For the lab tasks, we will start Squid from the
command-line.
Module 4 / Browser Configuration
Internet Explorer and Firefox need to be
configured to access the internet through the
proxy server.
Need to set the proxy name or IP and port for
each client browser you wish to use the
proxy server.
Module 4 / Automatic Browser Config
Manual configuration of a large number of
client browsers can be extremely time
consuming.
This overhead is also incurred when changes
are made to the proxy configuration that
require a change in all browser
configurations (e.g. change proxy port)
Can use automatic browser configuration in
these cases, making use of proxy auto-configuration
(PAC) scripts.
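
A PAC script is a small JavaScript file that defines a `FindProxyForURL(url, host)` function and returns a string such as `DIRECT` or `PROXY host:port`. The Python sketch below generates a minimal PAC file so that a change of proxy port only needs to be made in one place. The proxy host name `proxy.example.com` and port 3128 are placeholders for illustration, and the routing policy (plain host names go direct) is an assumption rather than a recommendation.

```python
# Template for a minimal PAC file. Doubled braces are literal braces
# in the generated JavaScript; single braces are format fields.
PAC_TEMPLATE = """function FindProxyForURL(url, host) {{
    // Hosts without a dot are treated as internal and fetched directly.
    if (isPlainHostName(host)) {{
        return "DIRECT";
    }}
    // Everything else goes through the proxy.
    return "PROXY {proxy_host}:{proxy_port}";
}}
"""


def render_pac(proxy_host: str, proxy_port: int) -> str:
    """Return the text of a PAC file pointing at the given proxy."""
    return PAC_TEMPLATE.format(proxy_host=proxy_host, proxy_port=proxy_port)


# Placeholder host and port; substitute the values for your own proxy.
print(render_pac("proxy.example.com", 3128))
```

The generated file would then be published on a web server and each browser pointed at its URL once.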
Module 5 / Access Control
Access control refers to the implementation
of a strategy to prevent unauthorised access
to your web proxy server.
Squid uses access control lists (ACLs).
ACLs are specified in squid.conf.
Very flexible access control options through
ACL specification.
Module 5 / acl tag
The acl tag in squid.conf is used to specify an
access control list.
Usage:
acl aclname acltype type_parameters
type_parameters depends on the acltype
Can use all to match all IP addresses (all is a
predefined acl in squid.conf)
Module 5 / ACL elements (acltype)
src: source (client) IP addresses
dst: destination (server) IP addresses
myip: the local IP address of a client's connection
srcdomain: source (client) domain name
dstdomain: destination (server) domain name
srcdom_regex: source (client) regular expression pattern matching
dstdom_regex: destination (server) regular expression pattern matching
time: time of day, and day of week
url_regex: URL regular expression pattern matching
urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname
port: destination (server) port number
myport: local port number that client connected to
proto: transfer protocol (http, ftp, etc)
method: HTTP request method (get, post, etc)
browser: regular expression pattern matching on the request's User-Agent header
ident: string matching on the user's name
ident_regex: regular expression pattern matching on the user's name
src_as: source (client) Autonomous System number
dst_as: destination (server) Autonomous System number
proxy_auth: user authentication via external processes
proxy_auth_regex: regular expression pattern matching on the user name authenticated via external processes
snmp_community: SNMP community string matching
maxconn: a limit on the maximum number of connections from a single client IP address
req_mime_type: regular expression pattern matching on the request content-type header
arp: Ethernet (MAC) address matching
rep_mime_type: regular expression pattern matching on the reply (downloaded content) content-type header. This is
only usable in the http_reply_access directive, not http_access.
external: lookup via external acl helper defined by external_acl_type
Module 5 / Access control tags
Access control tags can be used in
squid.conf to specify which types of access
to allow/deny for the previously defined
access control lists.
Used in conjunction with either an allow or a
deny keyword to allow or deny access
respectively to construct access rules.
Incoming requests are checked against the
access rules.
Module 5 / Access control tags
http_access: Control HTTP client access to the HTTP port.
http_reply_access: Controls whether HTTP clients (browsers) receive the reply to their
request. This further restricts permissions given by http_access, and is primarily intended
to be used together with the rep_mime_type acl type for blocking different content types.
icp_access: Control neighbor cache access to your cache via ICP.
miss_access: Controls whether certain clients may forward cache misses through your
cache. This further restricts permissions given by http_access, and is primarily intended to
be used for enforcing sibling relations by denying siblings from forwarding cache misses
through your cache.
no_cache: Defines responses that should not be cached.
redirector_access: Controls which requests are sent through the redirector pool.
ident_lookup_access: Controls which requests need an Ident lookup.
always_direct: Controls which requests should always be forwarded directly to origin
servers.
never_direct: Controls which requests should never be forwarded directly to origin
servers.
snmp_access: Controls SNMP client access to the cache.
broken_posts: Defines requests for which squid appends an extra CR/LF after POST
message bodies as required by some broken origin servers.
cache_peer_access: Controls which requests can be forwarded to a given neighbor
(peer).
Module 5 / Processing Access Rules
Multiple values in acltype parameters are
processed using OR logic.
e.g. src 130.194.224.160 130.194.224.161
equivalent to “source IP 130.194.224.160 OR
source IP 130.194.224.161”
Multiple values in access rules are processed using
AND logic.
E.g. http_access allow list1 list2
will only allow access if an IP is in both list1 and
list2.
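
The OR/AND behaviour can be modelled as set membership followed by `all()`. This is a sketch of the logic only, not Squid code: the contents of `list1` and `list2` are made up for illustration, and only `src`-style ACLs are modelled.

```python
from typing import Dict, List, Set

# Each ACL holds several values; matching ANY one of them matches the ACL (OR).
acls: Dict[str, Set[str]] = {
    "list1": {"130.194.224.160", "130.194.224.161"},
    "list2": {"130.194.224.161", "130.194.224.162"},   # illustrative values
}


def acl_matches(acl_name: str, client_ip: str) -> bool:
    """An ACL matches if the client IP equals any of its values (OR logic)."""
    return client_ip in acls[acl_name]


def rule_matches(acl_names: List[str], client_ip: str) -> bool:
    """An access rule matches only if every listed ACL matches (AND logic)."""
    return all(acl_matches(name, client_ip) for name in acl_names)


# Models "http_access allow list1 list2"
print(rule_matches(["list1", "list2"], "130.194.224.161"))   # in both lists
print(rule_matches(["list1", "list2"], "130.194.224.160"))   # in list1 only
```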
Module 5 / Processing Access Rules
Access rules are processed in the order they
appear in the squid.conf file.
The first rule that matches terminates the search of
the rules.
If no rules are matched, the default action taken by
squid is the opposite of the last rule in the list. It is
always best to explicitly include a default rule at the
end of the list which will always be matched, e.g:
http_access deny all
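
The first-match rule and the "opposite of the last rule" default can be sketched as follows. The function and type names are hypothetical, each rule is reduced to an action plus a single predicate on the client IP, and the behaviour with an empty rule list is deliberately left out of the model.

```python
from typing import Callable, List, Tuple

# A rule is an action ("allow" or "deny") plus a predicate on the client IP.
Rule = Tuple[str, Callable[[str], bool]]


def check_access(rules: List[Rule], client_ip: str) -> str:
    """Return the action of the first matching rule.

    If nothing matches, return the opposite of the last rule's action,
    as described above.
    """
    if not rules:
        raise ValueError("this sketch does not model an empty rule list")
    for action, matches in rules:
        if matches(client_ip):
            return action            # first match ends the search
    last_action = rules[-1][0]
    return "allow" if last_action == "deny" else "deny"


allowed = {"192.168.0.10", "192.168.0.20"}
rules: List[Rule] = [
    ("allow", lambda ip: ip in allowed),
    ("deny", lambda ip: True),       # explicit default, like "http_access deny all"
]
print(check_access(rules, "192.168.0.10"))
print(check_access(rules, "10.0.0.1"))
```

Ending the list with a rule that always matches means the implicit default is never reached, which is why the explicit final rule is recommended.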
Module 5 / Access Control Examples
acl allowed_clients src 192.168.0.10 192.168.0.20 192.168.0.30
http_access allow allowed_clients
http_access deny !allowed_clients
(from linuxfocus.org)
Module 5 / Access Control Examples
acl allowed_clients src 192.168.0.0/255.255.255.0
acl regular_days time MTWHF 10:00-16:00
http_access allow allowed_clients regular_days
http_access deny all
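
The time ACL in this example uses Squid's one-letter day codes (S Sunday, M Monday, T Tuesday, W Wednesday, H Thursday, F Friday, A Saturday). The sketch below shows how such a specification could be evaluated. It is an illustration in Python, not Squid's own matching code, and treating both ends of the time range as inclusive is an assumption.

```python
from datetime import datetime, time

# Squid day letters mapped to Python's weekday() numbering (Monday = 0).
DAY_CODES = {"M": 0, "T": 1, "W": 2, "H": 3, "F": 4, "A": 5, "S": 6}


def parse_hhmm(text: str) -> time:
    """Convert 'HH:MM' to a time object."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def time_acl_matches(days: str, span: str, when: datetime) -> bool:
    """Return True if 'when' falls on one of the days and inside the time span."""
    start_text, end_text = span.split("-")
    allowed_days = {DAY_CODES[letter] for letter in days}
    in_day = when.weekday() in allowed_days
    # Inclusive bounds are an assumption made for this sketch.
    in_span = parse_hhmm(start_text) <= when.time() <= parse_hhmm(end_text)
    return in_day and in_span


# 13 October 2003 was a Monday.
print(time_acl_matches("MTWHF", "10:00-16:00", datetime(2003, 10, 13, 11, 0)))
```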
Module 5 / Access Control Examples
acl host1 src 192.168.0.10
acl host2 src 192.168.0.20
acl host3 src 192.168.0.30
acl morning time 10:00-13:00
acl lunch time 13:30-14:30
acl evening time 15:00-18:00
http_access allow host1 morning
http_access allow host1 evening
http_access allow host2 lunch
http_access allow host3 evening
http_access deny all
Module 5 / Access Control Examples
acl allowed_clients src 192.168.0.0/255.255.255.0
acl banned_sites url_regex abc.com *()(*.com
http_access deny banned_sites
http_access allow allowed_clients
Module 5 / Access Control Examples
acl allowed_clients src 192.168.0.0/255.255.255.0
acl banned_sites url_regex dummy fake
http_access deny banned_sites
http_access allow allowed_clients
Module 5 / Access Control Examples
acl allowed_clients src 192.168.0.0/255.255.255.0
acl banned_sites url_regex "/etc/banned.list"
http_access deny banned_sites
http_access allow allowed_clients
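
A url_regex ACL matches when any of its patterns is found anywhere in the requested URL, and a quoted file name supplies the patterns one per line. The sketch below models that with Python's `re` module. The function names are hypothetical, the pattern list stands in for the contents of /etc/banned.list, and Python's regex syntax is not identical to the regex library Squid uses.

```python
import re
from typing import Iterable, List, Pattern


def load_patterns(lines: Iterable[str]) -> List[Pattern[str]]:
    """Compile one pattern per non-blank line, as in a banned-sites list file."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def is_banned(url: str, patterns: List[Pattern[str]]) -> bool:
    """A URL is banned if any pattern is found anywhere in it."""
    return any(pattern.search(url) for pattern in patterns)


# Stand-in for the lines of the banned list file.
banned = load_patterns(["dummy", "fake", ""])

print(is_banned("http://www.dummy.com/index.html", banned))
print(is_banned("http://www.example.org/", banned))
```

Because the match is a substring search, a short pattern such as `fake` also blocks any unrelated URL that happens to contain those letters.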
Module 6 / Squid Log Files
Log files:
▪ squid.out
▪ cache.log
▪ useragent.log
▪ store.log
▪ access.log
Module 6 / access.log
The most important of the Squid log files.
Covered briefly in the first practical exercise.
Format of access.log file entries (native format)
▪ Time of access
▪ Duration in ms
▪ Client IP
▪ Squid specific result code
▪ Bytes transferred
▪ HTTP request method
▪ URL
▪ Ident lookup
▪ Hierarchy code
▪ MIME type
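
The fields listed above appear in that order on each line of a native-format access.log, separated by whitespace, with the result code and HTTP status joined by a slash (for example `TCP_MISS/200`). The parser below is a minimal sketch: the class and function names are hypothetical, the sample line is invented for illustration, and it assumes exactly ten whitespace-separated fields.

```python
from typing import NamedTuple


class AccessLogEntry(NamedTuple):
    timestamp: float      # time of access, seconds since the epoch
    duration_ms: int
    client_ip: str
    result_code: str      # Squid-specific result code, e.g. TCP_MISS
    http_status: int
    size_bytes: int
    method: str
    url: str
    ident: str            # "-" when no ident lookup was done
    hierarchy: str        # hierarchy code and peer, e.g. DIRECT/192.0.2.1
    mime_type: str


def parse_access_line(line: str) -> AccessLogEntry:
    """Split one native-format access.log line into named fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    code, _, status = fields[3].partition("/")
    return AccessLogEntry(float(fields[0]), int(fields[1]), fields[2],
                          code, int(status), int(fields[4]), fields[5],
                          fields[6], fields[7], fields[8], fields[9])


# Invented sample line, for illustration only.
sample = ("1066036250.129 134 192.168.0.10 TCP_MISS/200 1234 "
          "GET http://www.example.com/ - DIRECT/192.0.2.1 text/html")
print(parse_access_line(sample))
```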
Module 6 / access.log
Squid result codes:
▪ TCP_HIT
The requested object was found in the cache and was
valid.
▪ TCP_MISS
The requested object was not found in the cache.
▪ TCP_REFRESH_HIT
The requested object was found but was not valid
because it was stale (needs to be refreshed). An HTTP
304 response (Not modified) was received to an IMS (If-
Modified-Since) HTTP request to the external server.
Module 6 / access.log
Squid result codes:
▪ TCP_REF_FAIL_HIT
The requested object was found in the cache, but it was not
valid because it was stale. IMS query failed (no response
received from the external server) and the stale object was
delivered to the client from the cache.
▪ TCP_REFRESH_MISS
The requested object was found in the cache, but it was not
valid because it was stale. IMS query returned the updated
(refreshed) content.
▪ TCP_CLIENT_REFRESH_MISS
The cache has to fetch the object again from the external
server because the client requested no caching in their
request.
Module 6 / access.log
Squid result codes:
▪ TCP_IMS_HIT
The client issued an IMS request for an object which was in
the cache and fresh.
▪ TCP_SWAPFAIL_MISS
The object was expected to be in the cache (it had been
swapped out of memory to the disk) but was not found
there (may indicate corruption of the cache).
▪ TCP_NEGATIVE_HIT
Responses such as HTTP/404 (not found) can be “negatively
cached” for 5 minutes by default in Squid (this time can be
changed in squid.conf). “If it was not there 5 minutes ago, it is
unlikely to be there now”
Module 6 / access.log
Squid result codes:
▪ TCP_MEM_HIT
Object was located in cache memory (not disk).
▪ TCP_DENIED
Access was denied to the object requested.
▪ TCP_OFFLINE_HIT
The requested object was found in the cache
while Squid was in offline mode.
Module 6 / access.log
Squid result codes:
▪ Also other codes for UDP requests.
▪ UDP_HIT
▪ UDP_MISS
▪ UDP_DENIED
▪ UDP_INVALID
▪ UDP_MISS_NOFETCH
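
One common use of the result codes is a rough cache hit ratio. The sketch below counts every TCP code containing "HIT" as a hit and ignores UDP codes. That classification is a simplification chosen for this example (denied requests, for instance, still count towards the total), and the function name is hypothetical.

```python
from typing import Iterable


def tcp_hit_ratio(result_codes: Iterable[str]) -> float:
    """Return the fraction of TCP requests whose result code contains HIT."""
    tcp_codes = [code for code in result_codes if code.startswith("TCP_")]
    if not tcp_codes:
        return 0.0
    hits = sum(1 for code in tcp_codes if "HIT" in code)
    return hits / len(tcp_codes)


codes = ["TCP_HIT", "TCP_MISS", "TCP_MEM_HIT", "TCP_DENIED", "UDP_HIT"]
print(tcp_hit_ratio(codes))
```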