Chapter 3 - Processes
Introduction
In this chapter, we take a closer look at:
the role of processes and threads in distributed systems
why they are so important
how they can be used to build applications
what is actually meant by code migration
the types of code migration
Introduction (cont.)
In the previous chapter, we concentrated on communication in
distributed systems.
Communication takes place between processes.
The concept of a process originates from the field of operating
systems where it is generally defined as a program in execution.
From an OS perspective, the management and scheduling of processes are
important.
However, when it comes to distributed systems, other issues
turn out to matter as well:
Multithreading can help us in enhancing
performance
How are clients and servers organized
Process or code migration can help in achieving
scalability and can also help to dynamically configure
clients and servers.
Introduction (cont.)
To understand the role of threads in distributed systems, it is important to
understand how an OS executes a program.
To execute programs, an operating system creates:
a number of virtual processors,
each virtual processor running a different program.
To keep track of these virtual processors, the operating system has a
process table, containing:
CPU register values,
memory maps,
open files,
privileges,
etc.
Introduction (cont.)
So, what is a process?
A process is often defined as a program in execution.
That is, a program that is currently being executed on one of the
virtual processors is called a process.
Note that multiple processes may concurrently
share the same CPU.
3.1 Threads
What is a thread?
A thread is the smallest unit of execution within a process.
Put differently, a thread is a segment of a process.
Threads allow for parallelism within a process.
They suit applications that perform concurrent tasks.
A process can have multiple threads.
3.1 Threads (cont.)
With respect to threads, there are two types of process:
Single-threaded process
Multithreaded process
A traditional (single-threaded) process has an address space (containing program text and data) and a
single thread of control, as well as other resources such as open files, child
processes, etc.
[Figure: (a) three processes, each with one thread; (b) one process with three threads]
3.1 Threads (cont.)
Each thread has its own program counter, registers, stack, and
state.
However, all threads of a process share its address space, global
variables, and other resources such as open files.
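The split between per-thread and shared state can be illustrated with a minimal Python sketch. The names `counter`, `worker`, and `run` are hypothetical and chosen only for illustration: each thread keeps a private local variable on its own stack, while all threads update one global variable in the shared address space.

```python
import threading

counter = 0                  # global variable: shared by all threads of the process
lock = threading.Lock()      # protects the shared counter against lost updates


def worker(n_increments, results, index):
    """Each thread counts privately and also updates the shared counter."""
    global counter
    local_total = 0          # local variable: lives on this thread's own stack
    for _ in range(n_increments):
        local_total += 1
        with lock:
            counter += 1
    results[index] = local_total


def run(n_threads=4, n_increments=1000):
    """Start the threads, wait for them, and return (shared counter, per-thread totals)."""
    global counter
    counter = 0
    results = [0] * n_threads
    threads = [
        threading.Thread(target=worker, args=(n_increments, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, results
```

Because the counter is shared, access to it has to be synchronized (here with a lock); the per-thread totals need no such protection.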
3.1 Threads (cont.)
Threads can be used in both nondistributed and
distributed systems, so we look at their usage in two parts:
1. Thread usage in nondistributed systems
2. Thread usage in distributed systems
3.1 Threads (cont.)
1. Thread Usage in Nondistributed Systems:
Before discussing the role of threads in distributed systems, let us first consider their usage in nondistributed
systems.
In nondistributed systems, a process can be single-threaded or multithreaded.
In a single-threaded process, a blocking system call blocks the whole process.
Why do we need multithreaded processes?
To allow multiple executions to take place in the same process environment. For example, a word processor
has different parts for:
interacting with the user
formatting the page as soon as changes are made
saving the document at timed intervals (for auto recovery)
spelling and grammar checking, etc.
3.1 Threads (cont.)
Advantages of multithreaded processes:
1. A simpler programming model, since many activities are
going on at once and each can be written as its own thread
2. Threads are easier to create and destroy than processes, since
they do not have any resources attached to them
3. Performance improves by overlapping activities when there is a
lot of I/O; i.e., one thread can wait for input while another does
calculations, say in a spreadsheet
4. Real parallelism is possible in a multiprocessor system
5. Efficient utilization of system resources
3.1 Threads (cont.)
2. Thread Usage in Distributed Systems:
An important property of threads is that they can provide:
non-blocking behavior: a blocking system call blocks only the calling thread, not the whole process
multiple logical connections at the same time
This property makes threads particularly attractive to use in
distributed systems.
We illustrate this point by taking a closer look at multithreaded
clients and servers, respectively.
In distributed systems, multithreading is used in two places:
1. Multithreaded clients
2. Multithreaded servers
3.1.1. Threads in Distributed Systems (cont.)
1. Multithreaded Clients
A typical example is a Web browser, in which:
fetching each part of a page can be implemented as a
separate thread
each thread sets up a separate connection to the server
that is, each part being fetched has its own TCP/IP connection to
the server or to a replicated server
each thread can display its results as soon as it gets its part of the page
Summary: in multithreaded Web browsers, several
connections can be opened simultaneously.
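A minimal Python sketch of the multithreaded client idea follows. It makes no real network calls: `fetch_part` is a hypothetical stand-in that sleeps to simulate the latency of a separate connection, and `fetch_page` is an assumed helper that fetches all parts in parallel, one thread per part.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_part(name, delay=0.1):
    """Stand-in for fetching one part of a page over its own connection."""
    time.sleep(delay)                 # simulates network latency (a blocking call)
    return f"<content of {name}>"


def fetch_page(parts, delay=0.1):
    """Fetch all parts concurrently, one thread per part (parts must be non-empty)."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # map() returns the results in the same order as the input parts
        return list(pool.map(lambda p: fetch_part(p, delay), parts))
```

With three parts and a delay of 0.2 seconds each, the total time should be close to 0.2 seconds rather than 0.6, because the blocking waits overlap.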
3.1.1. Threads in Distributed Systems (cont.)
2. Multithreaded Servers
Multithreading in distributed systems is also found at the
server side.
In a distributed system, a server can be constructed in three
ways:
1. Single-threaded server
2. Multithreaded server
3. Finite-state machine
3.1.1. Threads in Distributed Systems (cont.)
1. Single-threaded server: Steps
It gets a request,
examines it,
and carries it out to completion before getting the next request.
The server is idle while waiting for a disk read, i.e., system calls
are blocking.
Consequently, requests from other clients cannot be handled in the meantime.
3.1.1. Threads in Distributed Systems (cont.)
2. Multithreaded server:
Threads matter even more for implementing servers,
e.g., a file server. Steps:
The dispatcher thread reads an incoming request
and examines it.
It then chooses an idle worker thread and hands it the
request.
[Figure: a multithreaded server organized in a dispatcher/worker model]
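The dispatcher/worker organization can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `handle` is a hypothetical request handler, requests are plain strings, and the dispatcher places work on a shared queue from which whichever worker is idle picks it up, rather than selecting a specific worker itself.

```python
import queue
import threading


def handle(request):
    """Hypothetical request handler; a real file server would read from disk here."""
    return request.upper()


def worker(requests, replies):
    """Worker thread: repeatedly take a request, process it, and post the reply."""
    while True:
        item = requests.get()         # blocks until the dispatcher hands over work
        if item is None:              # sentinel value: shut this worker down
            break
        req_id, request = item
        replies.put((req_id, handle(request)))


def serve(incoming, n_workers=3):
    """Dispatcher: hand each incoming request to the pool of worker threads."""
    requests, replies = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(requests, replies))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for req_id, request in enumerate(incoming):   # the dispatcher loop
        requests.put((req_id, request))
    for _ in workers:                             # one sentinel per worker
        requests.put(None)
    for w in workers:
        w.join()
    results = {}
    while not replies.empty():
        req_id, reply = replies.get()
        results[req_id] = reply
    return [results[i] for i in range(len(results))]
```

A worker that blocks inside `handle` does not stop the dispatcher or the other workers, which is the point of this design.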
3.1.1. Threads in Distributed Systems (cont.)
3. Finite-state machine
Note that the first two designs rely on
blocking system calls.
A third possibility, for when threads are not available, is to run the server as a big
finite-state machine:
it gets a request,
examines it,
tries to fulfill the request from the cache, or else sends a request
to the file system;
instead of blocking, it records the state of the current request
in a table and then proceeds to the next request.
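The finite-state machine design can be sketched in Python as a single-threaded event loop. The sketch rests on several assumptions: events arrive as tuples, disk replies are delivered as later events instead of through real nonblocking I/O, and the names `fsm_server`, `pending`, and `cache` are illustrative only.

```python
def fsm_server(events, cache):
    """Single-threaded event loop that never blocks on the file system.

    events: list of ("request", req_id, filename) or ("disk_reply", req_id, data)
    cache:  dict mapping filename -> data already held in memory
    """
    pending = {}      # state table: req_id -> filename still waiting for the disk
    replies = []      # (req_id, data) in the order in which replies are sent
    for event in events:
        kind = event[0]
        if kind == "request":
            _, req_id, filename = event
            if filename in cache:
                replies.append((req_id, cache[filename]))   # served from the cache
            else:
                # A real server would send a nonblocking disk request here.
                pending[req_id] = filename                   # record state, move on
        elif kind == "disk_reply":
            _, req_id, data = event
            filename = pending.pop(req_id)                   # restore the saved state
            cache[filename] = data
            replies.append((req_id, data))
    return replies, pending
```

Request 2 below is answered before request 1, because the server moves on while request 1 waits for the disk.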
3.1.1. Threads in Distributed Systems (cont.)
Summary of possible server designs:
Model                                  Characteristics
Single-threaded server                 No parallelism, blocking system calls
Multithreaded server (threads only)    Parallelism, blocking system calls
Finite-state machine                   Parallelism, nonblocking system calls
Table: three ways to construct a server
Note that nonblocking calls are hard to program.
3.2 Clients: General Design Issues
Reading Assignment
3.3 Servers: General Design Issues
Issues
1. How to organize servers?
2. Where do clients contact a server?
3. Whether and how a server can be interrupted
4. Whether or not the server is stateless
3.3 Servers: General Design Issues (cont.)
1. How to organize servers?
The two primary types of server architecture are iterative servers and concurrent
servers.
Iterative server
the server itself handles the request and returns the result
How it works:
The server waits for a client request.
It accepts a single request from a client.
It processes the request.
It sends the response back to the client.
It moves on to the next request.
Disadvantages: blocking, poor scalability, and not suitable for high
loads
Concurrent server
it passes a request to a separate process or thread and immediately waits for the next incoming
request
it can serve several clients simultaneously
e.g., a multithreaded server
3.3 Servers: General Design Issues (cont.)
2. Where do clients contact a server?
using endpoints or ports at the machine where the server is
running; each server listens to a specific endpoint
how do clients know the endpoint of a service?
globally assign endpoints for well-known services; e.g., FTP is
on TCP port 21, HTTP is on TCP port 80
for services that do not require preassigned endpoints, an endpoint can
be dynamically assigned by the local OS
how can the client know this endpoint? two approaches
a. have a daemon running and listening to a well-known
endpoint, as in DCE; it keeps track of all endpoints of
services on the collocated server
the client will first contact the daemon, which provides it with
the endpoint, and then the client contacts the specific server
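Dynamic endpoint assignment and the daemon lookup can be sketched in Python. This is a simplified illustration: `EndpointDaemon` is a hypothetical in-process table, not a real DCE daemon listening on its own well-known port, and `start_service` is an assumed helper. Binding to port 0 asks the local OS to pick a free port.

```python
import socket


class EndpointDaemon:
    """Hypothetical stand-in for a DCE-style daemon: maps service names to ports."""

    def __init__(self):
        self._table = {}

    def register(self, service, port):
        self._table[service] = port

    def lookup(self, service):
        """Return the registered port, or None if the service is unknown."""
        return self._table.get(service)


def start_service(daemon, name):
    """Create a listening socket on an OS-assigned port and register it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))       # port 0: let the local OS choose a free port
    sock.listen()
    port = sock.getsockname()[1]      # find out which port the OS assigned
    daemon.register(name, port)
    return sock
```

A client would first call `daemon.lookup("time")` to obtain the port and then connect to `("127.0.0.1", port)`.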
3.3 Servers: General Design Issues (cont.)
[Figure: client-to-server binding using a daemon, as in DCE]
3.3 Servers: General Design Issues (cont.)
3. Whether and how a server can be interrupted
for instance, a user may want to interrupt a file
transfer, maybe because it was the wrong file
let the client exit the client application; this will break
the connection to the server;
the server will tear down the connection, assuming that
the client has crashed
a server may also be interrupted by:
hardware/network failures
programming errors or crashes in the server
intentional interruption for updates,
upgrades, or repairs
3.3 Servers: General Design Issues (cont.)
4. Whether or not the server is stateless
a stateless server does not keep information on the
state of its clients;
for instance, a Web server
a stateful server maintains information about its
clients;
for instance, a file server that allows a client to keep a
local copy of a file and perform update operations on it
3.4 Code Migration
so far, communication has been concerned with passing data
we may also pass programs, even while they are running and in
heterogeneous systems
code migration involves moving data as well: when a
program migrates while running, its status, pending signals,
and other parts of its environment, such as the stack and the
program counter, also have to be moved
3.4.1 Reasons for Migrating Code
Traditionally, code migration in distributed systems took place in the
form of process migration in which an entire process was moved
from one machine to another.
Problem: performance (moving an entire process is costly)
Solution: code migration, i.e.,
move only code between machines
Reasons for migrating code are:
to improve performance: move processes from heavily loaded to
lightly loaded machines (load balancing)
to reduce communication: move a client application that performs many
database operations to the server on which the database resides;
then send only the results to the client
to exploit parallelism (for nonparallel programs): e.g., copies of a
mobile program moving from site to site searching the Web
3.4.2. Models for Migrating Code
To get a better understanding of the different models for
code migration, we have to understand segments in a
process.
A process consists of three segments:
Code segment
Contains the set of instructions
Resource segment
Contains references to external resources such as files,
printers, devices, and so on.
Execution segment
Is used to store the current execution state of a process
consisting of private data, the stack, and the program
counter
3.4.2. Models for Migrating Code (cont.)
There are two models for migrating code between machines:
1. Weak Mobility
2. Strong Mobility
3.4.2. Models for Migrating Code (cont.)
1. Weak Mobility:
transfer only the code segment and maybe some
initialization data; in this case a program always starts
from its initial state, e.g., Java applets
execution can be by the target process (in its own
address space, as with Java applets) or by a separate
process
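Weak mobility can be illustrated with a minimal Python sketch in which only source text and some initialization data are handed over, and the target runs the code from its start in its own address space. The names `migrate_and_run`, `CODE`, `init_data`, and `result` are hypothetical, and no real network transfer takes place.

```python
def migrate_and_run(code_text, init_data):
    """Target side: execute the received code from its beginning.

    No execution state (stack, program counter) is transferred; the code
    starts fresh, with only the initialization data available to it.
    """
    namespace = {"init_data": init_data}
    exec(code_text, namespace)        # only safe for code from a trusted sender
    return namespace.get("result")


# The "code segment" as the sender would ship it (plain source text here).
CODE = """
result = sum(x * x for x in init_data)
"""
```

Running arbitrary received code is a security risk, which is one reason real systems sandbox migrated code.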
3.4.2. Models for Migrating Code (cont.)
2. Strong Mobility:
transfer the code and execution segments; this makes it possible to migrate a
process in execution
can also be supported by remote cloning: an exact copy
of the original process is created on a different machine and
executed in parallel with the original process; UNIX does this by
forking a child process
3.4.2. Models for Migrating Code (cont.)
More on models for migration
migration can be
sender-initiated: started by the machine where the code resides
or is currently running; e.g., uploading programs to a
server; may require authentication or that the client is a
registered one
receiver-initiated: started by the target machine; e.g., Java
applets; easier to implement
3.4.2. Models for Migrating Code (cont.)
Summary of models of code migration
[Figure: alternatives for code migration]
3.4.3. Migration and Local Resources
How to migrate the resource segment
it is not always possible to move a resource; e.g., a reference to a TCP port
held by a process to communicate with other processes
Types of Process-to-Resource Bindings
Binding by identifier (the strongest): a resource is referred to by its
identifier; e.g., a URL to refer to a Web page, or an FTP server
referred to by its Internet address
Binding by value (weaker): when only the value of a resource is
needed; in this case another resource can provide the same value;
e.g., standard libraries of programming languages such as C or Java
which are normally locally available, but their location in the file
system may vary from site to site
Binding by type (weakest): a process needs a resource of a specific
type; reference to local devices, such as monitors, printers, ...
3.4.3. Migration and Local Resources (cont.)
in migrating code, the above bindings cannot change, but the
references to resources can
how can a reference be changed? It depends on whether the
resource can be moved along with the code, i.e., on the
resource-to-machine binding
Types of Resource-to-Machine Bindings
Unattached Resources: can be easily moved with the
migrating program (such as data files associated with the
program)
Fastened Resources: such as local databases and complete
Web sites; moving or copying may be possible, but very
costly
Fixed Resources: intimately bound to a specific machine or
environment such as local devices and cannot be moved
we have nine combinations to consider
3.4.3. Migration and Local Resources (cont.)
                                 Resource-to-machine binding
Process-to-resource binding      Unattached        Fastened          Fixed
By identifier                    MV (or GR)        GR (or MV)        GR
By value                         CP (or MV, GR)    GR (or CP)        GR
By type                          RB (or GR, CP)    RB (or GR, CP)    RB (or GR)
Table: actions to be taken with respect to the references to local resources when migrating code
to another machine
GR: Establish a global system wide reference
MV: Move the resource
CP: Copy the value of the resource
RB: Rebind process to a locally available resource
Exercise: for each of the nine combinations, give example resources
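The nine combinations can also be written down as a lookup table, sketched here in Python. The dictionary `ACTIONS` copies the entries of the table above; the function names and the lowercase key strings are assumptions made for illustration.

```python
# (process-to-resource binding, resource-to-machine binding) -> actions,
# with the preferred action first and the alternatives after it.
ACTIONS = {
    ("identifier", "unattached"): ["MV", "GR"],
    ("identifier", "fastened"):   ["GR", "MV"],
    ("identifier", "fixed"):      ["GR"],
    ("value", "unattached"):      ["CP", "MV", "GR"],
    ("value", "fastened"):        ["GR", "CP"],
    ("value", "fixed"):           ["GR"],
    ("type", "unattached"):       ["RB", "GR", "CP"],
    ("type", "fastened"):         ["RB", "GR", "CP"],
    ("type", "fixed"):            ["RB", "GR"],
}

MEANING = {
    "GR": "establish a global system-wide reference",
    "MV": "move the resource",
    "CP": "copy the value of the resource",
    "RB": "rebind process to a locally available resource",
}


def preferred_action(process_binding, machine_binding):
    """Return the first-choice action code for one of the nine combinations."""
    return ACTIONS[(process_binding, machine_binding)][0]


def describe(process_binding, machine_binding):
    """Return the preferred action spelled out in words."""
    return MEANING[preferred_action(process_binding, machine_binding)]
```

For example, `describe("value", "unattached")` returns "copy the value of the resource".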