Chapter 5 Naming
Outline
5.1 Names, Identifiers and Addresses
5.2 Flat Naming
5.3 Structured Naming
5.4 Attribute-based Naming
Objectives of the Chapter
Introduction
Names are used to:
share resources
uniquely identify entities
refer to locations
etc.
An important issue is that a name can be resolved to
the entity it refers to
To resolve names, it is necessary to implement a
naming system
In a distributed system, the implementation of a naming
system is itself often distributed, unlike in non-
distributed systems
Efficiency and scalability of the naming system are the
main issues
5.1 Names, Identifiers, and Addresses
A name in a distributed system is a string of bits or
characters that is used to refer to an entity
An entity is anything; e.g., resources such as hosts,
printers, disks, files, objects, processes, users, web pages,
...
Entities can be operated on;
e.g., a resource such as a printer offers an interface
containing operations for printing a document,
requesting the status of a job, ...
to operate on an entity, it is necessary to access it through
its access point, which is itself a special kind of entity
Access point
the name of an access point is called an address (such as IP
address and port number as used by the transport layer)
the address of an access point of an entity is also simply
called an address of that entity
an entity can offer more than one access point (similar to
accessing an individual through different telephone numbers)
an entity may change its access point in the course of time
(e.g., a mobile computer getting a new IP address as it
moves)
An address is a special kind of name
it refers to an access point of an entity
an entity may be referred to by more than one address, for
example when it is replicated, as Web pages often are
an entity may easily change an access point, or an access
point may be reassigned to a different entity (like
telephone numbers in offices)
keeping the name of an entity separate from its address makes
naming easier and more flexible; such a name is called location
independent
There are also other types of names that uniquely
identify an entity;
A true identifier is a name with the following properties
An identifier refers to at most one entity
Each entity is referred to by at most one identifier
An identifier always refers to the same entity (never
reused)
identifiers allow us to unambiguously refer to an entity
Example: an FTP server (the entity)
Name of the FTP server: its URL
Address of the FTP server: IP address and port number
The address of the FTP server may change; its name need not
There is often a close relationship between name
resolution in distributed systems and message
routing
A naming system maintains a name-to-address
binding - just a table of (name, address) pairs
In a distributed system (DS) there is no centralized table
There are three classes of naming systems:
Flat naming,
Structured naming, and
Attribute-based naming
5.2 Flat Naming
Identifiers are convenient to uniquely represent
entities
A flat (or unstructured) name is simply a random bit
string used as an identifier
An important property of such a name is that it does not
contain any information whatsoever on how to locate
the access point of its associated entity
Flat names are difficult to use in a large system, since
they must be centrally controlled to avoid duplication
How are flat names resolved?
Name resolution: mapping a name to an address
or an address to a name is called name-address
resolution
Possible solutions:
Simple,
Home-based approaches, and
Hierarchical approaches
1. Simple Solutions
two solutions, suitable for LANs:
Broadcasting and
Forwarding Pointers
in a large-scale network, long chains of forwarding pointers
introduce performance problems
a. Broadcasting and Multicasting
a computer that wants to access another computer whose
IP address it knows broadcasts that address
the owner responds by sending its Ethernet address
this is how the Internet Address Resolution Protocol (ARP)
finds the data link address (MAC address) of a machine
given only its IP address
Broadcasting is inefficient when the network grows
wastage of bandwidth and too much interruption to
other machines
Multicasting is better when the network grows
send only to a restricted group of hosts
Multicasting can also be used to locate the nearest replica
- choose the one whose reply comes in first
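The broadcast lookup above can be sketched in a few lines. This is a toy, in-memory model rather than real ARP: the Host class, the function name, and all IP and MAC addresses are made up for illustration. It returns the owner's data link address together with the number of hosts that had to handle the request, which is the cost that grows with the network.

```python
from dataclasses import dataclass


@dataclass
class Host:
    ip: str
    mac: str

    def on_broadcast(self, wanted_ip):
        # Every host is interrupted, but only the owner of the address replies.
        return self.mac if self.ip == wanted_ip else None


def broadcast_resolve(lan, wanted_ip):
    """Return (mac_address_or_None, number_of_hosts_interrupted)."""
    answer, interrupted = None, 0
    for host in lan:  # a broadcast reaches every host on the LAN
        interrupted += 1
        reply = host.on_broadcast(wanted_ip)
        if reply is not None:
            answer = reply
    return answer, interrupted


# Hypothetical three-host LAN.
lan = [
    Host("10.0.0.1", "aa:aa:aa:aa:aa:01"),
    Host("10.0.0.2", "aa:aa:aa:aa:aa:02"),
    Host("10.0.0.3", "aa:aa:aa:aa:aa:03"),
]
mac, cost = broadcast_resolve(lan, "10.0.0.2")
```

Every lookup interrupts all hosts, whether or not an owner exists; multicasting reduces this by passing a smaller list of hosts.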
b. Forwarding Pointers
How to locate mobile entities?
When an entity moves from A to B, it leaves behind a
reference to its new location
Advantage
simple: as soon as the first name is located using
traditional naming service, the chain of forwarding
pointers can be used to find the current address
Drawbacks
the chain can be too long - locating becomes
expensive
all the intermediary locations in a chain have to
maintain their pointers
vulnerability if links are broken
hence, making sure that chains are short and that
forwarding pointers are robust is an important issue
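A minimal sketch of following and shortening a chain of forwarding pointers. The dictionary representation, the location names A to D, and both function names are assumptions made for this example; real systems keep the pointers at the locations themselves.

```python
def follow_chain(pointers, start):
    """Follow forwarding pointers from start; return (current_location, hops)."""
    location, hops, seen = start, 0, {start}
    while location in pointers:
        location = pointers[location]
        if location in seen:
            raise RuntimeError("forwarding pointers form a cycle")
        seen.add(location)
        hops += 1
    return location, hops


def shorten_chain(pointers, start):
    """Redirect every pointer on the chain straight to the current location."""
    current, _ = follow_chain(pointers, start)
    location = start
    while location in pointers:
        next_location = pointers[location]
        pointers[location] = current
        location = next_location
    return current


# Hypothetical history: the entity moved A -> B -> C -> D.
pointers = {"A": "B", "B": "C", "C": "D"}
before = follow_chain(pointers, "A")   # three hops
shorten_chain(pointers, "A")
after = follow_chain(pointers, "A")    # one hop
```

If any entry is lost (a broken link), follow_chain stops early at a stale location, which is the vulnerability listed among the drawbacks.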
2. Home-Based Approaches
broadcasting and multicasting have scalability
problems;
performance problems and broken links are problems
in forwarding pointers
a home location keeps track of the current location of
an entity;
often it is the place where an entity was created
Used as a fall-back mechanism for location services
based on forwarding pointers;
it is a two-tiered approach
an example is Mobile IP [Note 3.6]
each mobile host uses a fixed IP address
all communication to that IP address is initially
sent to the host’s home agent, located on the
LAN corresponding to the network address
contained in the mobile host’s IP address
whenever the mobile host moves to another
network, it requests a temporary address in the new
network (called a care-of address) and registers it
with the home agent
when the home agent receives a message for the
mobile host, it forwards the message to the care-of
address and also tells the sender the host’s current
location, so that later packets can be sent there directly
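The home-agent steps can be sketched as follows. This is an illustration of the principle only, not the Mobile IP protocol: the class names, method names, and both addresses are hypothetical.

```python
class HomeAgent:
    def __init__(self):
        self.care_of = {}  # fixed home address -> current care-of address

    def register(self, home_address, care_of_address):
        # Called by the mobile host after it obtains an address in a new network.
        self.care_of[home_address] = care_of_address

    def deliver(self, home_address, packet):
        """Forward the packet; return (where it was forwarded, hint for sender)."""
        current = self.care_of.get(home_address, home_address)
        return current, current


class Sender:
    def __init__(self, agent):
        self.agent = agent
        self.known = {}    # home address -> last known current address
        self.via_home = 0  # how many packets went through the home agent

    def send(self, home_address, packet):
        if home_address in self.known:
            return self.known[home_address]  # sent directly, home is bypassed
        self.via_home += 1
        delivered_to, hint = self.agent.deliver(home_address, packet)
        self.known[home_address] = hint
        return delivered_to


agent = HomeAgent()
agent.register("130.37.24.11", "192.0.2.77")  # hypothetical addresses
sender = Sender(agent)
first = sender.send("130.37.24.11", "hello")
second = sender.send("130.37.24.11", "again")
```

Only the first packet pays the detour through the home; the sketch ignores what happens when the host moves again and the sender's cached hint becomes stale.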
home-based approach: the principle of Mobile IP
Problems:
creates communication latency, since the home must be
contacted first
the host is unreachable if the home no longer exists
(e.g., the entity has moved permanently);
a solution is to register the home at a traditional name
service
3. Hierarchical Approaches
a generalization of the two-tiered approach into multiple layers
a network is divided into a collection of domains, similar to DNS
a single top-level domain spans the entire network
each domain can be subdivided into multiple, smaller domains
the lowest-level domain is called a leaf domain; typically a LAN
each domain D has an associated directory node dir(D) that keeps
track of the entities in that domain leading to a tree of directory nodes
the root (directory) node knows about all entities
hierarchical organization of a location service into domains, each having an associated directory node
each entity is represented by a location record in the
directory node dir(D) to keep track of its whereabouts
a location record for an entity in a leaf domain contains the
entity’s current address;
the directory node of each higher-level domain holds only a
pointer to the child directory node below it that knows the
entity;
this means the root node has a record for every entity, but
that record contains only pointers
an entity may have multiple addresses,
for instance, if it is replicated;
a higher level domain containing the two subdomains
where the entity has addresses will have two pointers
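The location records described above, and a lookup that uses them, can be sketched as below. The assumptions are stated plainly: the lookup strategy (climb from the client's leaf domain until some directory node has a record, then follow pointers down) is the usual one for such trees, and the domain names S1, S2, D1 to D3, the entity E, and the address strings are all hypothetical.

```python
class DirNode:
    """Directory node dir(D) of one domain."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # entity -> address (leaf domain) or list of child DirNodes (pointers)
        self.records = {}


def insert(leaf, entity, address):
    """Store the address in the leaf and a chain of pointers up to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        pointers = node.records.setdefault(entity, [])
        if child not in pointers:
            pointers.append(child)
        child, node = node, node.parent


def lookup(leaf, entity):
    """Return (address_or_None, number_of_levels_climbed)."""
    node, hops_up = leaf, 0
    while entity not in node.records:
        if node.parent is None:
            return None, hops_up
        node, hops_up = node.parent, hops_up + 1
    record = node.records[entity]
    while isinstance(record, list):  # follow the first pointer downwards
        node = record[0]
        record = node.records[entity]
    return record, hops_up


root = DirNode("root")
s1, s2 = DirNode("S1", root), DirNode("S2", root)
d1, d2, d3 = DirNode("D1", s1), DirNode("D2", s1), DirNode("D3", s2)
insert(d2, "E", "address-in-D2")  # E is replicated in two leaf domains
insert(d3, "E", "address-in-D3")
```

A client in D1 finds the replica in the neighbouring domain D2 after climbing one level, without involving the root; the root still holds two pointers for E.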
an example of storing information of an entity having two addresses in different leaf domains
Example of a lookup operation
a client (in domain D) would like to locate an entity E
a name space is generally organized as a labeled, directed
graph with two types of nodes
leaf node: represents the named entity and stores
information such as its address or the state of that entity
directory node: a special entity that has a number of
outgoing edges, each labeled with a name
each node in a naming graph is considered as another entity
with an identifier
A general naming graph with a single root node, n0
a directory node stores a table in which an outgoing
edge is represented as a pair (node identifier, edge label),
called a directory table
each path in a naming graph can be referred to by the
sequence of labels corresponding to the edges of the
path and the first node in the path, such as
N:<label-1, label-2, ..., label-n>, where N refers to the
first node in the path
such a sequence is called a path name
if the first node is the root of the naming graph, it is
called an absolute path name; otherwise it is a relative
path name
instead of the path name n0:<home, steen, mbox>, we often
use its string representation /home/steen/mbox
there may also be several paths leading to the same node,
e.g., node n5 can be represented as /keys or
/home/steen/keys
although the above naming graph is a directed acyclic graph
(a node can have more than one incoming edge, but cycles
are not permitted),
the common organization is a tree (hierarchy) with a single
root (as is used in file systems)
in a tree structure, each node except the root has exactly
one incoming edge;
the root has no incoming edges
each node also has exactly one associated (absolute)
path name
Name Resolution
given a path name, the process of looking up the information
stored in the node it refers to is called name resolution;
it consists of finding the address when the name is given
(by following the path)
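Name resolution over directory tables can be sketched as a loop over the labels of a path name. Only n0, n5, and the labels home, steen, mbox, and keys come from the slides; the other node identifiers (n1, n4, n6) and the function names are assumptions for this example.

```python
# Directory tables: node identifier -> {edge label: node identifier}.
directory_tables = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"mbox": "n6", "keys": "n5"},
}


def resolve(start, labels):
    """Resolve the path name start:<labels...> to a node identifier."""
    node = start
    for label in labels:
        table = directory_tables.get(node)
        if table is None or label not in table:
            raise KeyError(f"cannot resolve {label!r} at node {node}")
        node = table[label]
    return node


def resolve_absolute(path):
    """Resolve a string such as /home/steen/mbox, starting at the root n0."""
    return resolve("n0", [part for part in path.split("/") if part])


mbox_node = resolve_absolute("/home/steen/mbox")
```

Both /keys and /home/steen/keys resolve to n5, and a relative path name such as n1:<steen, keys> is resolved by passing a different start node.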
Linking and Mounting
Linking: giving another name for the same entity (an alias)
e.g., environment variables in UNIX such as HOME that
refer to the home directory of a user
two types of links (or two ways to implement an alias):
hard link: to allow multiple absolute path names to refer
to the same node in a naming graph
e.g., in the previous graph, there are two different path
names for node n5: /keys and /home/steen/keys
symbolic link: representing an entity by a leaf node and
instead of storing the address or state of the entity, the
node stores an absolute path name
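A symbolic link can be sketched as a leaf node that holds a path name; resolution restarts from the root when such a node is reached. The graph below is a hypothetical variant in which /home/steen/keys is a symbolic link to /keys; the node identifiers, the links_left guard, and the function name are assumptions.

```python
# Directory tables: node identifier -> {edge label: node identifier}.
tables = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"keys": "n6"},
}
# Leaf n6 stores an absolute path name instead of an address or state.
symlinks = {"n6": "/keys"}


def resolve(path, links_left=8):
    """Resolve an absolute path name, following symbolic links."""
    node = "n0"
    for label in [part for part in path.split("/") if part]:
        node = tables[node][label]
        if node in symlinks:
            if links_left == 0:
                raise RuntimeError("too many symbolic links")
            # Restart resolution with the stored absolute path name.
            node = resolve(symlinks[node], links_left - 1)
    return node


target = resolve("/home/steen/keys")
```

The guard on links_left is there because, unlike hard links in an acyclic graph, stored path names can refer back to each other.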
Consider a collection of name spaces distributed across
different machines (each name space implemented by a
different server)
To mount a foreign name space in a DS, the following are
at least required
the name of an access protocol (for communication)
the name of the server
the name of the mounting point in the foreign name space
Each of these names needs to be resolved
to the implementation of the protocol
to an address where the server can be reached
to a node identifier in the foreign name space
The three names can be listed as a URL
Example: Sun’s Network File System (NFS) is a distributed
file system with a protocol that describes how a client can
access a file stored on a (remote) NFS file server
an NFS URL may look like nfs://flits.cs.vu.nl/home/steen
- nfs is an implementation of a protocol
- flits.cs.vu.nl is a server name to be resolved using DNS
- /home/steen is resolved by the server
e.g., the subdirectory /remote includes mount points for
foreign name spaces on the client machine
a directory node named /remote/vu is used to store
nfs://flits.cs.vu.nl/home/steen
consider /remote/vu/mbox
this name is resolved by starting at the root directory on
the client’s machine until node /remote/vu, which returns
the URL nfs://flits.cs.vu.nl/home/steen
this leads the client machine to contact flits.cs.vu.nl
using the NFS protocol
then the file mbox is read in the directory /home/steen
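The resolution of /remote/vu/mbox can be sketched with Python's urllib.parse to split the stored URL into its three names. The mount table and the function resolve_mounted are hypothetical; no NFS traffic takes place, the function only reports which protocol, server, and remote path would be used.

```python
from urllib.parse import urlparse

# Local mount points and the URLs stored in them (from the example above).
mount_points = {"/remote/vu": "nfs://flits.cs.vu.nl/home/steen"}


def resolve_mounted(path):
    """Return (protocol, server, path on that server) for a local path name."""
    for mount_point, url in mount_points.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            parts = urlparse(url)
            remainder = path[len(mount_point):]  # e.g. "/mbox"
            return parts.scheme, parts.netloc, parts.path + remainder
    return "local", "localhost", path  # not under any mount point


where = resolve_mounted("/remote/vu/mbox")
```

The server name would still have to be resolved with DNS, and the remote path is resolved by the server itself.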
mounting remote name spaces through a specific access protocol
Distributed systems that allow mounting a remote file
system also allow ordinary commands to be executed on it
Example commands to access the file system
horton$ cd /remote/vu
horton$ ls -l
the user does not need to worry about the
details of the actual access;
the name space on the local machine and that on the
remote machine appear to form a single name space
The Implementation of a Name Space
a name space forms the heart of a naming service
a naming service allows users and processes to add,
remove, and lookup names
a naming service is implemented by name servers
for a distributed system on a single LAN, a single server
might suffice;
for a large-scale distributed system the implementation of a
name space is distributed over multiple name servers
Name Space Distribution
in large scale distributed systems, it is necessary to
distribute the name service over multiple name servers,
usually organized hierarchically
a name service can be partitioned into logical layers
the following three layers can be distinguished (according to
Cheriton and Mann)
global layer
formed by highest level nodes (root node and nodes close to
it or its children)
nodes on this layer are characterized by their stability, i.e.,
directory tables are rarely changed
they may represent organizations, groups of
organizations, ..., where names are stored in the name space
administrational layer
groups of entities that belong to the same organization or
administrational unit, e.g., departments
relatively stable
managerial layer
nodes that may change regularly, e.g., nodes representing
hosts of a LAN, shared files such as libraries or binaries, …
nodes are managed not only by system administrators, but
also by end users
an example partitioning of the DNS name space, including Internet-accessible files, into three layers
the name space is divided into nonoverlapping parts, called
zones in DNS
a zone is a part of the name space that is implemented by a
separate name server
some requirements of servers at different layers
performance (responsiveness to lookups), availability (failure
rate), etc.
high availability is critical for the global layer, since name
resolution cannot proceed beyond a failing server; it is also
important at the administrational layer for clients in the same
organization
performance is most critical in the lowest (managerial) layer,
where users expect immediate responses; in the higher layers,
results of lookups can be cached and reused because those
layers are relatively stable
availability and performance may be enhanced by client-side
caching (effective for the global and administrational layers,
since names there do not change often) and by replication; both
complicate the implementation since they may introduce
inconsistency problems (see Chapter 7)
Item Global Administrational Managerial
a comparison between name servers for implementing nodes from a large-scale name
space partitioned into a global layer, an administrational layer, and a managerial
layer
Implementation of Name Resolution
recall that name resolution consists of finding the address
when the name is given
assume that name servers are not replicated and that no
client-side caches are allowed
each client has access to a local name resolver, responsible
for ensuring that the name resolution process is carried out
e.g., assume the path name
root:<nl, vu, cs, ftp, pub, globe, index.txt>
is to be resolved
or using a URL notation, this path name would correspond
to ftp://ftp.cs.vu.nl/pub/globe/index.txt
Resolution
mapping a name to an address or an address to a name is called name-
address resolution
Resolver
a host that needs to map an address to a name or a name to an address
calls a DNS client named a resolver
the resolver accesses the closest DNS server with a mapping request
if the server has the information it satisfies the resolver; otherwise, it
either refers the resolver to other servers (called Iterative Resolution) or
asks other servers to provide the information (called Recursive
Resolution)
Iterative
a name resolver hands over the complete name to the root
name server
the root name server resolves the name as far as it can and
returns the result to the client; at the minimum it can resolve
the first level and send the name of the first-level name server
to the client
the client calls the first level name server, then the second, ...,
until it finds the address of the entity
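Iterative resolution of root:<nl, vu, cs, ftp, pub, globe, index.txt> can be sketched as a loop run by the client's resolver. The server names and the final address string are placeholders, and each server is assumed to resolve exactly one label, the minimum mentioned above.

```python
# What each (hypothetical) name server can resolve: label -> next hop.
servers = {
    "root": {"nl": "server(nl)"},
    "server(nl)": {"vu": "server(vu)"},
    "server(vu)": {"cs": "server(cs)"},
    "server(cs)": {"ftp": "address-of-ftp-host"},
}


def iterative_resolve(labels):
    """Return (result, unresolved labels, list of servers contacted)."""
    server, contacted = "root", []
    remaining = list(labels)
    while remaining and server in servers:
        contacted.append(server)          # the client itself calls this server
        label = remaining.pop(0)
        server = servers[server][label]   # the server returns the next hop
    return server, remaining, contacted


result, rest, contacted = iterative_resolve(
    ["nl", "vu", "cs", "ftp", "pub", "globe", "index.txt"]
)
```

The client contacts four name servers in turn; the remaining labels pub, globe, and index.txt are left for the FTP server to resolve.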
communication costs may be reduced in recursive name
resolution
the comparison between recursive and iterative name resolution with respect to communication
costs; assume the client is in Ethiopia and the name servers in the Netherlands
Summary
Method      Advantage(s)
Recursive   Lower communication cost; caching is more effective
Iterative   Lower performance demand on name servers
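The communication-cost argument can be made concrete with a small calculation. It assumes a simplified model: the client is far from all name servers, the name servers are close to one another, and each exchange costs one round trip. The round-trip times used (200 ms long-distance, 5 ms local) are invented for illustration.

```python
def iterative_cost(n_servers, long_rtt):
    # The distant client contacts every name server itself.
    return n_servers * long_rtt


def recursive_cost(n_servers, long_rtt, local_rtt):
    # One long-distance exchange with the first server; the servers
    # then talk to each other over short local links.
    return long_rtt + (n_servers - 1) * local_rtt


# Hypothetical numbers: 3 name servers, 200 ms long-distance, 5 ms local.
iterative_ms = iterative_cost(3, 200)
recursive_ms = recursive_cost(3, 200, 5)
```

Under these assumptions the iterative lookup costs 600 ms and the recursive one 210 ms; the price is that the name servers do more work per request.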
Example - The Domain Name System (DNS)
one of the largest distributed naming services is the Internet
DNS
it is used for looking up host addresses and mail servers
hierarchical, defined in an inverted tree structure with the root
at the top
the tree can have at most 128 levels (level 0, the root, to
level 127)
Label
each node has a label, a string with a maximum of 63
characters (case insensitive)
the root label is null
children of a node must have different names (to guarantee
uniqueness)
Domain Name
each node has a domain name
a full domain name is a sequence of labels separated by dots;
the last character is a dot, because the final label is the
null label of the root
domain names are read from the node up to the root
full domain names must not exceed 255 characters
Fully Qualified Domain Name (FQDN) or Absolute
terminated by a null string
contains the full name of a host, e.g., cs.aau.edu.et.
usually the last dot is omitted for readability
Partially Qualified Domain Name (PQDN) or Relative
not terminated with a null string
it starts from a node but does not reach the root
used when the name to be resolved belongs to the same site
as the client (the resolver supplies the missing part, called the
suffix to create an FQDN)
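The length limits and the FQDN/PQDN distinction can be sketched as two small functions. They check only the rules stated on these slides (trailing dot, labels of 1 to 63 characters, at most 255 characters in total) and are not a complete DNS name validator; the function names are made up.

```python
def is_valid_fqdn(name):
    """Check the trailing dot and the label and total length limits."""
    if name == ".":  # the root alone: its label is the null string
        return True
    if not name.endswith(".") or len(name) > 255:
        return False
    labels = name[:-1].split(".")
    return all(0 < len(label) <= 63 for label in labels)


def qualify(name, suffix):
    """Turn a PQDN into an FQDN by appending the resolver's suffix."""
    if name.endswith("."):
        return name  # already fully qualified
    return f"{name}.{suffix.strip('.')}."


full_name = qualify("cs", "aau.edu.et")
```

Here qualify plays the role of the resolver that supplies the missing suffix for a name used inside its own site.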
Domain
a domain is a subtree of the domain name space
the name of the domain is the domain name of the node at the top of the subtree
the Internet is divided into over 200 top-level domains; each partitioned into
subdomains, ... ;
the leaves represent domains that have no subdomains;
a leaf domain may contain a single host or represent a company and contain
thousands of hosts
Hierarchy of Name Servers
storing the information contained in the domain name space
in a single computer is inefficient and unreliable
distribute the information among many computers called
DNS servers
there is a hierarchy of name servers as we have a hierarchy
of names
Zone
what a server is responsible for, or has authority over, is
called a zone; zones are nonoverlapping
the server makes a database called a zone file and keeps
all the information for every node under that domain
it can divide its domain into subdomains and delegate part
of its authority to other servers
Root Server
a server whose zone consists of the whole tree
it usually does not store the whole information about domains but
delegates its authority to other servers and keeps references to those
servers
there are 13 root server names (a.root-servers.net to m.root-servers.net),
served by many machines distributed all around the world, each covering
the whole domain name space
Primary and Secondary Servers
a primary server is one that stores a file about the zone for which it is
an authority; it is responsible for creating, maintaining, and updating
the zone file
a secondary server is one that transfers the complete information
about a zone from another server (primary or secondary); it does not
create or update the file
such arrangement is to create redundancy so that if one server fails,
the other can still serve clients
Types of Top-Level Domains
two types: generic domains and country domains; there is a third one called
Inverse Domain (used to map an address to a name; we will not discuss it
further)
Generic Domains
define registered hosts according to their generic behaviour
Label   Description
com     Commercial organizations
edu     Educational institutions
gov     Government institutions
int     International organizations
mil     Military groups
net     Network support centers
org     Nonprofit organizations
newly introduced first-level domains
Label   Description
aero    Airlines and aerospace companies
biz     Businesses or firms (similar to com)
coop    Cooperative business organizations
info    Information service providers
museum  Museums and other nonprofit organizations
name    Personal names (individuals)
pro     Professional individual organizations
Country Domains
include one entry for every country (as defined by ISO), using
two-character abbreviations
the contents of a node is formed by a collection of resource
records; the important ones are the following
Type of record            Associated entity  Description
SOA (start of authority)  Zone               Holds information on the represented zone, such as an e-mail address of the system administrator
A (address)               Host               Contains an IP address of the host this node represents
MX (mail exchange)        Domain             Refers to a mail server to handle mail addressed to this node; it is a symbolic link; e.g. name of a mail server
SRV                       Domain             Refers to a server handling a specific service
NS (name server)          Zone               Refers to a name server that implements the represented zone
CNAME                     Node               Contains the canonical name of a host
PTR (pointer)             Host               Symbolic link with the primary name of the represented node
HINFO (host info)         Host               Holds information on the host this node represents; such as machine type and OS
TXT                       Any kind           Contains any entity-specific information considered useful
Figure annotations (DNS records for cs.vu.nl):
cs.vu.nl represents the domain as well as the zone; it has 3 name servers (star, top, solo) and 3 mail servers
a name server for this zone with 2 network addresses
a mail server, a Web server, and an FTP server
a single machine implementing both the Web server and the FTP server
a laser printer
inverse mapping
How are resources described?
one possibility is to use RDF (Resource Description Framework) that uses
triplets consisting of a subject, a predicate, and an object
e.g., (person, name, Alice) to describe a resource Person whose Name is
Alice
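Attribute-based lookup over RDF-style triples can be sketched as simple pattern matching. This is not an RDF library: the match function is made up, and only the triple (person, name, Alice) comes from the slide; the other triples are invented for illustration.

```python
# Each resource description is a (subject, predicate, object) triple.
triples = [
    ("person", "name", "Alice"),
    ("person", "email", "alice@example.org"),  # hypothetical
    ("printer", "location", "room 12"),        # hypothetical
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with every field that is given."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


named_alice = match(triples, predicate="name", obj="Alice")
```

A query names attribute values rather than a path, so every stored description may have to be examined; this is why such lookups are often combined with a structured directory.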
Hierarchical Implementations: LDAP
distributed directory services are implemented by combining structured
naming with attribute-based naming
e.g., Microsoft’s Active Directory service
such systems rely on the Lightweight Directory Access Protocol (LDAP),
which is derived from OSI’s X.500 directory service
an LDAP directory service consists of a number of records called directory
entries; each entry is a collection of (attribute, value) pairs, similar to
a resource record in DNS;
an attribute can be single- or multiple-valued (e.g., Mail_Servers)
Attribute Abbr. Value
Country C NL
Locality L Amsterdam
Organization O Vrije Universiteit
OrganizationalUnit OU Comp. Sc.
CommonName CN Main server
Mail_Servers -- 137.37.20.3, 130.37.24.6, 137.37.20.10
FTP_Server -- 130.37.20.20
WWW_Server -- 130.37.20.20
a simple example of an LDAP directory entry using LDAP naming conventions
to identify the network addresses of some servers
the collection of all directory entries is called a Directory
Information Base (DIB)
each record is uniquely named so that it can be looked up
each naming attribute is called a Relative Distinguished Name
(RDN); in the entry above, the first five attributes are RDNs
a globally unique name is formed using abbreviations of
naming attributes, e.g.,
/C=NL/O=Vrije Universiteit/OU=Comp. Sc.
this is similar to the DNS name nl.vu.cs
listing RDNs in sequence leads to a hierarchy of the collection
of directory entries, called a Directory Information Tree (DIT)
a DIT forms the naming graph of an LDAP directory service
where each node represents a directory entry
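Building a globally unique name from RDNs and looking it up in a DIT can be sketched with nested dictionaries. This does not use a real LDAP client: the helper names are made up, the slash-separated format is the one used on these slides, and the attribute values come from the directory entry shown earlier.

```python
def distinguished_name(rdns):
    """Join (abbreviation, value) RDNs into a name like /C=NL/O=..."""
    return "".join(f"/{attribute}={value}" for attribute, value in rdns)


def parse_name(name):
    """Split /C=NL/O=... back into a list of (abbreviation, value) RDNs."""
    return [tuple(part.split("=", 1)) for part in name.strip("/").split("/")]


def insert(tree, rdns, attributes):
    """Add a directory entry to the DIT, creating parent nodes as needed."""
    node = tree
    for rdn in rdns:
        node = node.setdefault("children", {}).setdefault(rdn, {})
    node["attributes"] = attributes


def lookup(tree, name):
    """Follow the RDNs of a name down the DIT; return the entry's attributes."""
    node = tree
    for rdn in parse_name(name):
        node = node["children"][rdn]
    return node.get("attributes")


rdns = [("C", "NL"), ("O", "Vrije Universiteit"), ("OU", "Comp. Sc.")]
dit = {}
insert(dit, rdns, {"WWW_Server": "130.37.20.20", "FTP_Server": "130.37.20.20"})
name = distinguished_name(rdns)
entry = lookup(dit, name)
```

Each RDN selects one edge of the tree, so lookup is ordinary path-name resolution; intermediate nodes such as /C=NL exist in the tree but carry no attributes in this sketch.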
node N corresponds to the directory entry shown earlier; it also acts as
a parent of other directory entries that have an additional attribute,
Host_Name; such entries may be used to represent hosts
Attribute Value Attribute Value
Country NL Country NL
Locality Amsterdam Locality Amsterdam
Organization Vrije Universiteit Organization Vrije Universiteit