DSs CH 05 - Naming

Chapter Five discusses the importance of naming in distributed systems, highlighting how names are essential for identifying resources and facilitating communication. It explains various types of names, including addresses and identifiers, and introduces concepts like flat naming, broadcasting, multicasting, and hierarchical approaches for locating entities. The chapter emphasizes the need for location transparency and the challenges associated with mobile entities in large-scale networks.

DSs Lecture Notes

Chapter Five
Naming
5.1. Naming Entities
In a distributed system, names are used to refer to a wide variety of resources such as computers,
services, remote objects, and files, as well as to users. Names facilitate communication and resource
sharing. A name is needed to request a computer system to act upon a specific resource chosen
out of many; for example, a name in the form of a URL is needed to access a specific web page.
Processes cannot share particular resources managed by a computer system unless they can
name them consistently. Likewise, users cannot communicate in a distributed system unless they
are able to refer to each other, as they do with e-mail addresses.
Names are not the only useful means of identification: descriptive attributes are another.
Sometimes clients do not know the name of the particular entity that they seek, but they do
have some information that describes it. Or they may require a service and know some
of its characteristics but not which entity implements it. The naming facility of a distributed
operating system enables users and programs to assign character-string names to objects and
subsequently use these names to refer to those objects. The locating facility, which is an integral
part of the naming facility, maps an object's name to the object's location in a distributed system.
The naming and locating facilities jointly form a naming system that provides users with an
abstraction of an object that hides the details of how and where the object is actually located in
the network. It provides a further level of abstraction when dealing with object replicas: given
an object name, it returns the set of locations of the object's replicas. The naming system plays
a very important role in achieving location transparency, in facilitating transparent
migration and replication of objects, and in object sharing.
The naming system is one of the most important components of a distributed operating system
(DOS) because it enables other services and objects to be identified and accessed in a uniform
manner. Now let’s look at some fundamental terminology and concepts associated with naming in
distributed systems.
5.1.1. Names
A name in a distributed system is a string of bits or characters that is used to refer to an entity.
Names can be file names like “/uog/cs/CH05_Naming.pdf”, URLs like “http://www.google.com/”
and Internet domain names like “www.google.com”.
Entities can be resources (such as hosts, printers, disks, and files) as well as explicitly named
processes, users, mailboxes, newsgroups, web pages, graphical windows, messages, and network
connections.

BireZman ([email protected]) UoG, Computer Science Department 1


To operate on an entity, it is necessary to access it, for which we need an access point. An access
point is yet another, but special, kind of entity in a distributed system. The name of an access
point is called an address. The address of an access point of an entity is also simply called an
address of that entity.
An entity can offer more than one access point. As a comparison, a telephone can be viewed as
an access point of a person, whereas the telephone number corresponds to an address. Indeed,
many people nowadays have several telephone numbers, each number corresponding to a point
where they can be reached. In a distributed system, a typical example of an access point is a host
running a specific server, with its address formed by the combination of, for example, an IP
address and port number (i.e., the server's transport-level address).
An entity may change its access points in the course of time. For example, when a mobile
computer moves to another location, it is often assigned a different IP address than the one it
had before. Likewise, when a person moves to another city or country, it is often necessary to
change telephone numbers as well. In a similar fashion, changing jobs or Internet Service
Providers often means changing your e-mail address.
5.1.2. Addresses
An address is thus just a special kind of name: it refers to an access point of an entity. Because
an access point is tightly associated with an entity, it would seem convenient to use the address
of an access point as a regular name for the associated entity. Nevertheless, this is hardly ever
done as such naming is generally very inflexible and often human unfriendly.
For example, it is not uncommon to regularly reorganize a distributed system, so that a specific
server is now running on a different host than previously. The old machine on which the server
used to be running may be reassigned to a completely different server. In other words, an entity
may easily change an access point, or an access point may be reassigned to a different entity. If
an address is used to refer to an entity, we will have an invalid reference the instant the access
point changes or is reassigned to another entity. Therefore, it is much better to let a service be
known by a separate name independent of the address of the associated server.
Likewise, if an entity offers more than one access point, it is not clear which address to use as a
reference. For instance, many organizations distribute their Web service across several servers.
If we would use the addresses of those servers as a reference for the Web service, it is not obvious
which address should be chosen as the best one. Again, a much better solution is to have a single
name for the Web service independent from the addresses of the different Web servers.
These examples illustrate that a name for an entity that is independent from its addresses is often
much easier and more flexible to use. Such a name is called location independent.
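
To make the idea of a location-independent name concrete, the following is a minimal sketch in Python. The binding table, the name "web-service", and the addresses are all made up for illustration; a real naming service would keep such bindings in a dedicated server rather than in a local dictionary.

```python
# Hypothetical binding table: location-independent name -> access-point addresses.
# The name "web-service" and the addresses (documentation range 192.0.2.0/24) are made up.
bindings = {
    "web-service": [("192.0.2.10", 80), ("192.0.2.11", 80)],
}

def resolve(name):
    """Return a copy of the addresses currently bound to a name (empty if unknown)."""
    return list(bindings.get(name, []))

def rebind(name, addresses):
    """Point an existing name at new access points; clients keep using the same name."""
    bindings[name] = list(addresses)

before = resolve("web-service")
rebind("web-service", [("192.0.2.20", 80)])   # the service moved to another host
after = resolve("web-service")
```

Clients that hold only the name are unaffected by the move, whereas a client that had stored one of the old addresses would now hold an invalid reference.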

5.1.3. Identifiers
Besides addresses, there are other names that deserve special treatment, such as names that
are used to uniquely identify an entity, called true identifiers. A true identifier has the
following properties:
i) An identifier refers to at most one entity;
ii) Each entity is referred to by at most one identifier; and
iii) An identifier always refers to the same entity, that is, it is never reused.
By using identifiers, it becomes much easier to unambiguously refer to an entity. For example,
assume two processes each refer to an entity by means of an identifier. To check if the processes
are referring to the same entity, it is sufficient to test if the two identifiers are equal. Such a test
would not be sufficient if the two processes were using regular, non-unique, non-identifying
names. For example, the name "Amare" cannot be taken as a unique reference to just a single
person in our community.
Likewise, if an address can be reassigned to a different entity, we cannot use an address as an
identifier. Consider the use of telephone numbers, which are reasonably stable in the sense that
a telephone number for some time refers to the same person or organization. However, using a
telephone number as an identifier will not work as it can be reassigned in the course of time.
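
The equality test on identifiers can be illustrated with a small Python sketch. It assumes that a randomly generated UUID is an acceptable stand-in for a true identifier (it is unique only with overwhelming probability, not by construction); the class name Entity and its fields are hypothetical.

```python
import uuid

class Entity:
    def __init__(self, label):
        self.label = label          # human-friendly name, not necessarily unique
        self.ident = uuid.uuid4()   # assumed stand-in for a true identifier

first = Entity("Amare")
second = Entity("Amare")            # a different person with the same name

# Two processes each hold a reference to an entity by identifier.
ref_p1 = first.ident
ref_p2 = first.ident

same_by_identifier = (ref_p1 == ref_p2)          # both refer to `first`
same_by_label = (first.label == second.label)    # labels match although entities differ
distinct_entities = (first.ident != second.ident)
```

Comparing labels wrongly suggests that the two entities are the same, while comparing identifiers does not.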
Addresses and identifiers are two important types of names that are each used for very different
purposes. In many computer systems, addresses and identifiers are represented in machine-
readable form only, that is, in the form of bit strings. For example, an Ethernet address is
essentially a random string of 48 bits. Likewise, memory addresses are typically represented as
32-bit or 64-bit strings.
Another important type of name is that which is tailored to be used by humans, also referred to
as human-friendly names. In contrast to addresses and identifiers, a human-friendly name is
generally represented as a character string. These names appear in many different forms; for
example, files in UNIX systems have character-string names that can be as long as 255 characters, and
which are defined entirely by the user. Similarly, DNS names are represented as relatively simple
case-insensitive character strings.
5.2. Flat Naming
Above, we explained that identifiers are convenient to uniquely represent entities. In many cases,
identifiers are simply random bit strings, which we conveniently refer to as unstructured or flat
names. An important property of such a name is that it does not contain any information
whatsoever on how to locate the access point of its associated entity. In the following, we will
take a look at how flat names can be resolved, or, equivalently, how we can locate an entity when
given only its identifier.

5.2.1. Simple Solutions


We first consider two simple solutions for locating an entity. Both solutions are applicable only
to local-area networks. Nevertheless, in that environment, they often do the job well, making
their simplicity particularly attractive.
Broadcasting and Multicasting
Consider a distributed system built on a computer network that offers efficient broadcasting
facilities. Typically, such facilities are offered by local-area networks in which all machines are
connected to a single cable or the logical equivalent thereof. Also, local-area wireless networks
fall into this category.
Locating an entity in such an environment is simple: a message containing the identifier of the
entity is broadcast to each machine and each machine is requested to check whether it has that
entity. Only the machines that can offer an access point for the entity send a reply message
containing the address of that access point. This principle is used in the Internet Address
Resolution Protocol (ARP) to find the data-link address of a machine when given only an IP
address. In essence, a machine broadcasts a packet on the local network asking who the owner
of a given IP address is. When the message arrives at a machine, the receiver checks whether it
should listen to the requested IP address. If so, it sends a reply packet containing, for example,
its Ethernet address.
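
The broadcast principle can be simulated in a few lines of Python. This is only an in-memory sketch of the idea behind ARP, not an implementation of the protocol; the Machine class and all IP and Ethernet addresses are invented for illustration.

```python
class Machine:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac

    def on_broadcast(self, wanted_ip):
        """Reply with the data-link address only if the request concerns this machine."""
        return self.mac if wanted_ip == self.ip else None

def broadcast_lookup(network, wanted_ip):
    """Deliver the request to every machine on the network and collect the replies."""
    replies = [machine.on_broadcast(wanted_ip) for machine in network]
    return [reply for reply in replies if reply is not None]

# Hypothetical local-area network with three machines.
lan = [
    Machine("10.0.0.1", "aa:aa:aa:aa:aa:01"),
    Machine("10.0.0.2", "aa:aa:aa:aa:aa:02"),
    Machine("10.0.0.3", "aa:aa:aa:aa:aa:03"),
]

owner = broadcast_lookup(lan, "10.0.0.2")
nobody = broadcast_lookup(lan, "10.0.0.9")
```

Note that every machine has to inspect the request even though at most one can answer it, which is exactly the cost that grows with the size of the network.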
Broadcasting becomes inefficient when the network grows. Not only is network bandwidth
wasted by request messages, but, more seriously, too many hosts may be interrupted by requests
they cannot answer. One possible solution is to switch to multicasting, by which only a restricted
group of hosts receives the request. For example, Ethernet networks support data-link level
multicasting directly in hardware.
Multicasting can also be used to locate entities in point-to-point networks. For example, the
Internet supports network-level multicasting by allowing hosts to join a specific multicast group.
Such groups are identified by a multicast address.
When a host sends a message to a multicast address, the network layer provides a best-effort
service to deliver that message to all group members.
A multicast address can be used as a general location service for multiple entities. For example,
consider an organization where each employee has his or her own mobile computer. When such
a computer connects to the locally available network, it is dynamically assigned an IP address. In
addition, it joins a specific multicast group. When a process wants to locate computer A, it sends
a "where is A?" request to the multicast group. If A is connected, it responds with its current IP
address.
Another way to use a multicast address is to associate it with a replicated entity, and to use
multicasting to locate the nearest replica. When sending a request to the multicast address, each
replica responds with its current (normal) IP address. A crude way to select the nearest replica is
to choose the one whose reply comes in first.
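
Both uses of a multicast address can be sketched as follows. The sketch does not use real network multicast: group membership is a dictionary and the reply delay of each member is a simulated number. The class MulticastGroup, the member names, and the addresses are assumptions made for illustration.

```python
class MulticastGroup:
    def __init__(self):
        # member name -> (current IP address, simulated reply delay in milliseconds)
        self.members = {}

    def join(self, name, ip, delay_ms):
        """A host joins the group after being assigned an IP address."""
        self.members[name] = (ip, delay_ms)

    def where_is(self, name):
        """Only the named member answers a 'where is X?' request."""
        entry = self.members.get(name)
        return entry[0] if entry else None

    def nearest(self):
        """Every replica answers; pick the one whose reply would arrive first."""
        if not self.members:
            return None
        return min(self.members.values(), key=lambda member: member[1])[0]

group = MulticastGroup()
group.join("A", "10.1.0.7", delay_ms=40)
group.join("B", "10.2.0.9", delay_ms=5)

address_of_a = group.where_is("A")
closest = group.nearest()
```

Choosing the first reply is only a rough estimate of proximity, since reply time also depends on the load of the replica and of the network.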
5.2.2. Home Based Approach
Broadcasting and multicasting scale poorly because they are difficult to implement efficiently
in large-scale networks. A popular approach to supporting mobile entities in large-scale networks
is to introduce a home location, which keeps track of the current location of an entity.
Special techniques may be applied to safeguard against network or process failures.
In practice, the home location is often chosen to be the place where an entity was created.
For example, Mobile IP follows the home-based approach. Each mobile host uses a fixed IP address.
All communication to that IP address is initially directed to the mobile host's home agent. This
home agent is located on the local-area network corresponding to the network address
contained in the mobile host's IP address. In the case of IPv6, it is realized as a network-layer
component. Whenever the mobile host moves to another network, it requests a temporary
address that it can use for communication. This care-of address is registered at the home agent.
When the home agent receives a packet for the mobile host, it looks up the host's current
location. If the host is on the current local network, the packet is simply forwarded. Otherwise, it
is tunneled to the host's current location, that is, wrapped as data in an IP packet and sent to the
care-of address. At the same time, the sender of the packet is informed of the host's current
location. Note that the IP address is effectively used as an identifier for the mobile host.
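
The behavior of the home agent described above can be sketched as follows. This is a simplified model, not Mobile IP itself: packets are plain dictionaries, tunneling is modeled by wrapping the original packet as the payload of a new one, and the step of informing the sender is left out. The class HomeAgent and all addresses are hypothetical.

```python
class HomeAgent:
    def __init__(self):
        self.care_of = {}   # fixed home address -> currently registered care-of address

    def register(self, home_addr, care_of_addr):
        """Called when the mobile host obtains a temporary address on a foreign network."""
        self.care_of[home_addr] = care_of_addr

    def deregister(self, home_addr):
        """Called when the mobile host returns to its home network."""
        self.care_of.pop(home_addr, None)

    def deliver(self, packet):
        """Return (action, destination, packet to send) for a packet addressed to a home address."""
        home_addr = packet["dst"]
        care_of_addr = self.care_of.get(home_addr)
        if care_of_addr is None:
            return ("forward", home_addr, packet)            # host is on the home network
        tunneled = {"dst": care_of_addr, "payload": packet}  # original packet wrapped as data
        return ("tunnel", care_of_addr, tunneled)

agent = HomeAgent()
packet = {"dst": "198.51.100.7", "payload": "hello"}

at_home = agent.deliver(packet)
agent.register("198.51.100.7", "203.0.113.9")   # host moved to another network
away = agent.deliver(packet)
```

The sender keeps using the fixed address 198.51.100.7 throughout, which is why that address effectively serves as an identifier.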

Figure 5.1: The Principle of Mobile IP

Drawbacks of the home-based approach are:

• In large-scale networks, to communicate with a mobile entity a client first has to contact
the home, which may be at a completely different location than the entity itself. The
result is an increase in communication latency.
• Because a fixed home location is used, it must be ensured that the home
location always exists. Otherwise, contacting the entity will become impossible.
• Problems are aggravated when a long-lived entity decides to move permanently to a
completely different part of the network than where its home is located. In that case, it
would have been better if the home could have moved along with the host.
A solution to this last problem is to register the home at a traditional naming service and to let a
client first look up the location of the home. Because the home location can be assumed to be relatively
stable, that location can be effectively cached after it has been looked up.
In summary, the issues with home-based approaches are: the home address has to be supported as
long as the entity lives; the home address is fixed, which is an unnecessary burden if the entity moves
permanently; and geographical scalability is poor, since the client must contact the home even when
the entity is right next to the client.
5.2.3. Hierarchical Approach
In a hierarchical scheme, a network is divided into a collection of domains. There is a single top-
level domain that spans the entire network. Each domain can be subdivided into multiple, smaller
sub domains. A lowest-level domain, called a leaf domain, typically corresponds to a local-area
network in a computer network or a cell in a mobile telephone network.
Each domain D has an associated directory node dir(D) that keeps track of the entities in that
domain. This leads to a tree of directory nodes. The directory node of the top-level domain, called
the root (directory) node, knows about all entities.

Figure 5.2: Hierarchical organization of a location service into domains, each having an
associated directory node.

To keep track of the whereabouts of an entity, each entity currently located in a domain D is
represented by a location record in the directory node dir(D). A location record for entity E in
the directory node N for a leaf domain D contains the entity's current address in that domain. In
contrast, the directory node N' for the next higher-level domain D' that contains D, will have a
location record for E containing only a pointer to N. Likewise, the parent node of N' will store a
location record for E containing only a pointer to N'. Consequently, the root node will have a
location record for each entity, where each location record stores a pointer to the directory node
of the next lower-level sub domain where that record's associated entity is currently located.
An entity may have multiple addresses, for example if it is replicated. If an entity has an address
in each of the leaf domains D1 and D2, then the directory node of the smallest domain containing
both D1 and D2 will have two pointers, one for each sub domain containing an address.

Figure 5.3: An example of storing information of an entity having two addresses in
different leaf domains
In the hierarchical approach, a lookup operation proceeds as follows. A client wishing to locate an
entity E, issues a lookup request to the directory node of the leaf domain D in which the client
resides. If the directory node does not store a location record for the entity, then the entity is
currently not located in D. Consequently, the node forwards the request to its parent. Note that
the parent node represents a larger domain than its child. If the parent also has no location record
for E, the lookup request is forwarded to the next higher level, and so on.

Figure 5.4: Looking up a location in a hierarchically organized location service.

As soon as the request reaches a directory node M that stores a location record for entity E, we
know that E is somewhere in the domain dom(M) represented by node M. As shown in
Figure 5.4, M stores a location record containing a pointer to one of its sub domains.
The lookup request is then forwarded to the directory node of that sub domain, which in turn
forwards it further down the tree, until the request finally reaches a leaf node. The location
record stored in the leaf node will contain the address of E in that leaf domain. This address can
then be returned to the client that initially requested the lookup to take place.
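
The location records and the lookup procedure can be sketched in Python as follows. The sketch is a simplification under stated assumptions: each entity has a single address (no replication), moving or deleting an entity is not handled, and all directory nodes live in one process. The names DirNode, insert, and lookup, as well as the domain names, are hypothetical.

```python
class DirNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # entity -> address (in a leaf node) or child DirNode (in a higher-level node)
        self.records = {}

def insert(leaf, entity, address):
    """Store the address in the leaf node and a chain of pointers up to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None and entity not in node.records:
        node.records[entity] = child
        child, node = node, node.parent

def lookup(start_leaf, entity):
    """Go up until some node has a record for the entity, then follow pointers down."""
    node = start_leaf
    while node is not None and entity not in node.records:
        node = node.parent
    if node is None:
        return None                       # not even the root knows the entity
    record = node.records[entity]
    while isinstance(record, DirNode):    # a pointer to a lower-level directory node
        record = record.records[entity]
    return record                         # the address stored in the leaf node

# A small tree: the root spans two domains, each with one leaf domain.
root = DirNode("root")
dom_a, dom_b = DirNode("A", root), DirNode("B", root)
leaf_a1, leaf_b1 = DirNode("A1", dom_a), DirNode("B1", dom_b)

insert(leaf_b1, "E", "address-of-E-in-B1")
found = lookup(leaf_a1, "E")              # the client resides in leaf domain A1
```

A lookup that starts in the leaf domain where the entity resides never leaves that domain, which is how the scheme exploits locality.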
5.3. Structured Naming
Flat names are good for machines, but are generally not very convenient for humans to use. As
an alternative, naming systems generally support structured names that are composed from
simple, human-readable names. Not only file naming, but also host naming on the Internet follows
this approach.
5.3.1. Name Space
Names are commonly organized into what is called a name space. Name spaces for structured
names can be represented as a labeled, directed graph with two types of nodes.
• A leaf node represents a named entity and has the property that it has no outgoing edges.
A leaf node generally stores information on the entity it is representing (for example, its
address) so that a client can access it. Alternatively, it can store the state of that entity,
such as in the case of file systems in which a leaf node actually contains the complete file
it is representing.
• In contrast to a leaf node, a directory node has a number of outgoing edges, each labeled
with a name, as shown below in Figure 5.5. Each node in a naming graph is considered as
yet another entity in a distributed system, and, in particular, has an associated identifier.
A directory node stores a table in which an outgoing edge is represented as a pair (edge
label, node identifier). Such a table is called a directory table.

Figure 5.5: A general naming graph with a single root node

The naming graph shown in Figure 5.5 has one node, namely n0, which has only outgoing and no
incoming edges. Such a node is called the root (node) of the naming graph. Although it is possible
for a naming graph to have several root nodes, for simplicity, many naming systems have only
one. Each path in a naming graph can be referred to by the sequence of labels corresponding to
the edges in that path, as in the following example:

N:<label-1, label-2, …, label-n> (where N refers to the first node in the path)

Such a sequence is called a path name. If the first node in a path name is the root of the naming
graph, it is called an absolute path name. Otherwise, it is called a relative path name.
It is important to realize that names are always organized in a name space. As a consequence, a
name is always defined relative only to a directory node. In this sense, the term "absolute name"
is somewhat misleading. Likewise, the difference between global and local names can often be
confusing. A global name is a name that denotes the same entity, no matter where that name is
used in a system.
In other words, a global name is always interpreted with respect to the same directory node. In
contrast, a local name is a name whose interpretation depends on where that name is being
used. Put differently, a local name is essentially a relative name whose directory in which it is
contained is (implicitly) known.
This description of a naming graph comes close to what is implemented in many file systems.
However, instead of writing the sequence of edge labels to represent a path name, path names
in file systems are generally represented as a single string in which the labels are separated by a
special separator character, such as a slash (“/”). This character is also used to indicate whether a
path name is absolute. For example, in Figure 5.5, instead of using n0:<home, steen, mbox>
that is, the actual path name, it is common practice to use its string representation
/home/steen/mbox. Note also that when there are several paths that lead to the same node,
that node can be represented by different path names. For example, node n5 in Figure 5.5 can
be referred to by /home/steen/keys as well as /keys.
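
Path name resolution over directory tables can be sketched as follows. Only the nodes n0 and n5 and the labels home, steen, mbox, and keys are taken from the text; the remaining node identifiers (n1, n4, n6) are assumed for illustration, and the function names are hypothetical.

```python
# Directory tables: node identifier -> {edge label: node identifier}.
# Node identifiers other than n0 and n5 are assumptions, not taken from Figure 5.5.
directory_tables = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"mbox": "n6", "keys": "n5"},
}

def resolve(start, labels):
    """Follow the edge labels one by one, starting at node `start`."""
    node = start
    for label in labels:
        table = directory_tables.get(node)
        if table is None or label not in table:
            raise KeyError(f"cannot resolve {label!r} at node {node}")
        node = table[label]
    return node

def resolve_path(path, root="n0", cwd="n0"):
    """Resolve a slash-separated path; a leading '/' marks it as absolute."""
    start = root if path.startswith("/") else cwd
    labels = [part for part in path.split("/") if part]
    return resolve(start, labels)

mbox = resolve("n0", ["home", "steen", "mbox"])      # n0:<home, steen, mbox>
keys_long = resolve_path("/home/steen/keys")
keys_short = resolve_path("/keys")
relative = resolve_path("steen/mbox", cwd="n1")      # relative to the node named /home
```

Both /home/steen/keys and /keys end at the same node, which shows that this graph is not a tree.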
There are many different ways to organize a name space. As we mentioned, most name spaces
have only a single root node. In many cases, a name space is also strictly hierarchical in the sense
that the naming graph is organized as a tree.
This means that each node except the root has exactly one incoming edge; the root has no incoming
edges. As a consequence, each node also has exactly one associated (absolute) path name.

5.3.2. Name Space Distribution


Name spaces for a large-scale, possibly worldwide distributed system, are usually organized
hierarchically. As before, assume such a name space has only a single root node. To effectively
implement such a name space, it is convenient to partition it into three logical layers.
I) The Global Layer
It is formed by highest-level nodes, that is, the root node and other directory nodes logically
close to the root, namely its children. Nodes in the global layer are often characterized by
their stability, in the sense that directory tables are rarely changed. Such nodes may represent
organizations, or groups of organizations, for which names are stored in the name space.
II) The Administrational Layer
It is formed by directory nodes that together are managed within a single organization. A
characteristic feature of the directory nodes in the administrational layer is that they
represent groups of entities that belong to the same organization or administrational unit.
For example, there may be a directory node for each department in an organization, or a directory
node from which all hosts can be found. Another directory node may be used as the starting
point for naming all users, and so forth. The nodes in the administrational layer are relatively
stable, although changes generally occur more frequently than to nodes in the global layer.
III) The Managerial Layer
It consists of nodes that typically change regularly. For example, nodes representing
hosts in the local network belong to this layer. For the same reason, the layer includes nodes
representing shared files such as those for libraries or binaries. Another important class of
nodes includes those that represent user-defined directories and files. In contrast to the
global and administrational layers, the nodes in the managerial layer are maintained not only
by system administrators, but also by individual end users of a distributed system.

Figure 5.6: An Example of partitioning DNS name space into three layers.

5.3.3. Name Resolution and its Implementation


Name resolution is the process of mapping an object's name to the object's properties, such as
its location. Since an object's properties are stored and maintained by the authoritative name
servers of that object, name resolution is basically the process of mapping an object's name to
the authoritative name servers of that object.
Once an authoritative name server of the object has been located, operations can be invoked to
read or update the object's properties. Each name agent in a distributed system knows about at
least one name server a priori.
To get a name resolved, a client first contacts its name agent, which in turn contacts a known
name server, which may in turn contact other name servers.
The distribution of a name space across multiple name servers affects the implementation of
name resolution. To explain the implementation of name resolution in large-scale name services,
we assume for the moment that name servers are not replicated and that no client-side caches
are used. Each client has access to a local name resolver, which is responsible for ensuring that
the name resolution process is carried out.
Referring to Figure 5.6, assume the (absolute) path name root:<nl, vu, cs, ftp, pub,
globe, index.html> is to be resolved. Using a URL notation, this path name would
correspond to ftp://ftp.cs.vu.nl/pub/globe/index.html. There are two ways to
implement name resolution: iterative and recursive.
A. Iterative Name Resolution
In iterative name resolution, a name resolver hands over the complete name to the root
name server. It is assumed that the address where the root server can be contacted is well
known. The root server will resolve the path name as far as it can, and return the result to
the client. In our example, the root server can resolve only the label nl, for which it will return
the address of the associated name server.
At that point, the client passes the remaining path name, i.e., nl:<vu, cs, ftp, pub,
globe, index.html>, to that name server. This server can resolve only the label vu, and
returns the address of the associated name server, along with the remaining path name
vu:<cs, ftp, pub, globe, index.html>.

Figure 5.7: Principle of Iterative Name Resolution


The client's name resolver will then contact this next name server, which responds by
resolving the label cs, and subsequently also ftp, returning the address of the FTP server
along with the path name ftp:<pub, globe, index.html>. The client then contacts the
FTP server, requesting it to resolve the last part of the original path name. The FTP server will
subsequently resolve the labels pub, globe, and index.html, and transfer the requested
file (in this case using FTP). This process of iterative name resolution is shown in Figure 5.7.
(The notation #<cs> is used to indicate the address of the server responsible for handling
the node referred to by <cs>.)
In practice, the last step, namely contacting the FTP server and requesting it to transfer the
file with path name ftp:<pub, globe, index.html>, is carried out separately by the
client process. In other words, the client would normally hand only the path name
root:<nl, vu, cs, ftp> to the name resolver, from which it would expect the address where
it can contact the FTP server, as is also shown in Figure 5.7.
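
Iterative resolution can be sketched in Python as follows. The sketch simplifies the scenario by letting each name server resolve exactly one label, and it models servers as entries in a dictionary instead of remote machines. The addresses use the #<label> notation loosely as plain strings; the table SERVERS and the function names are assumptions.

```python
# Hypothetical server tables: server address -> {label: address of the next server}.
SERVERS = {
    "#root": {"nl": "#nl"},
    "#nl": {"vu": "#vu"},
    "#vu": {"cs": "#cs"},
    "#cs": {"ftp": "#ftp"},
}

def server_resolve(server, labels):
    """One name server: resolve the first label and hand back the rest unresolved."""
    next_address = SERVERS[server][labels[0]]
    return next_address, labels[1:]

def iterative_resolve(labels, root="#root"):
    """The client's resolver contacts each name server in turn."""
    server, remaining = root, list(labels)
    contacted = []
    while remaining and server in SERVERS:
        contacted.append(server)
        server, remaining = server_resolve(server, remaining)
    return server, remaining, contacted

address, rest, contacted = iterative_resolve(["nl", "vu", "cs", "ftp"])
full = iterative_resolve(["nl", "vu", "cs", "ftp", "pub", "globe", "index.html"])
```

The list of contacted servers shows that the client itself exchanges messages with every name server along the path.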
B. Recursive Name Resolution
An alternative to iterative name resolution is to use recursion during name resolution. Instead of returning
each intermediate result back to the client's name resolver, with recursive name resolution,
a name server passes the result to the next name server it finds.
So, for example, when the root name server finds the address of the name server
implementing the node named nl, it requests that name server to resolve the path name
nl:<vu, cs, ftp, pub, globe, index.html>. Using recursive name resolution as
well, this next server will resolve the complete path and eventually return the file
index.html to the root server, which, in turn, will pass that file to the client's name resolver.
Recursive name resolution is shown in Figure 5.8. As in iterative name resolution, the last
resolution step (contacting the FTP server and asking it to transfer the indicated file) is
generally carried out as a separate process by the client.
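
For comparison, recursive resolution can be sketched with the same simplified model: one label per name server, servers represented by a dictionary, and hypothetical names throughout. Here each server forwards the remaining path to the next server itself, and the result travels back along the chain of calls.

```python
# Hypothetical server tables: server address -> {label: address of the next server}.
SERVERS = {
    "#root": {"nl": "#nl"},
    "#nl": {"vu": "#vu"},
    "#vu": {"cs": "#cs"},
    "#cs": {"ftp": "#ftp"},
}

def recursive_resolve(server, labels, trace):
    """Name server `server` resolves its own label, then asks the next server itself."""
    trace.append(server)
    next_server = SERVERS[server][labels[0]]
    rest = labels[1:]
    if not rest or next_server not in SERVERS:
        return next_server                     # final answer, returned up the chain
    return recursive_resolve(next_server, rest, trace)

trace = []
# The client's resolver sends a single request, to the root server only.
address = recursive_resolve("#root", ["nl", "vu", "cs", "ftp"], trace)
```

The client sends one request and receives one reply, while the work of contacting the other servers is shifted onto the name servers themselves.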

Figure 5.8: Principle of Recursive Name Resolution


The main drawback of recursive name resolution is that it puts a higher performance demand on
each name server. Basically, a name server is required to handle the complete resolution of a
path name, although it may do so in cooperation with other name servers. This additional burden
is generally so high that name servers in the global layer of a name space support only iterative
name resolution.
There are two important advantages to recursive name resolution.
• The first advantage is that caching results is more effective compared to iterative name
resolution.
• The second advantage is that communication costs may be reduced.
5.4. Attribute-Based Naming
Flat and structured names generally provide a unique and location-independent way of referring
to entities. Moreover, structured names have been partly designed to provide a human-friendly
way to name entities so that they can be conveniently accessed. In most cases, it is assumed that
the name refers to only a single entity. However, location independence and human friendliness
are not the only criteria for naming entities. In particular, as more information is being made
available it becomes important to effectively search for entities. This approach requires that a
user can provide merely a description of what he is looking for.
There are many ways in which descriptions can be provided, but a popular one in distributed
systems is to describe an entity in terms of (attribute, value) pairs, generally referred to as
attribute-based naming. In this approach, an entity is assumed to have an associated collection
of attributes. Each attribute says something about that entity. By specifying which values a
specific attribute should have, a user essentially constrains the set of entities that he is interested
in. It is up to the naming system to return one or more entities that meet the user's description.
In this section we take a closer look at attribute-based naming systems.
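
A lookup by (attribute, value) pairs can be sketched as follows. The directory contents, the attribute names, and the function search are invented for illustration; real directory services add indexing and richer query languages on top of this idea.

```python
# Hypothetical directory: each entity is described by a collection of attributes.
directory = [
    {"name": "lp1", "type": "printer", "color": "yes", "building": "A"},
    {"name": "lp2", "type": "printer", "color": "no", "building": "A"},
    {"name": "fs1", "type": "file-server", "building": "B"},
]

def search(entries, **wanted):
    """Return every entry whose attributes match all requested (attribute, value) pairs."""
    return [entry for entry in entries
            if all(entry.get(attr) == value for attr, value in wanted.items())]

color_printers = search(directory, type="printer", color="yes")
in_building_a = search(directory, building="A")
```

Each additional pair narrows the result set, and a query may return several entities or none at all.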
