
CLOUD COMPUTING

UNIT - I
DISTRIBUTED SYSTEM MODELS AND ENABLING
TECHNOLOGIES
This chapter presents the evolutionary changes that
have occurred in parallel, distributed, and cloud computing
over the past 30 years, driven by applications with variable
workloads and large data sets.

Ø We study both high-performance and high-throughput computing systems in parallel computers appearing as computer clusters, service-oriented architecture, computational grids, peer-to-peer networks, Internet clouds, and the Internet of Things.

Ø These systems are distinguished by their hardware architectures, OS platforms, processing algorithms, communication protocols, and service models applied. We also introduce essential issues of scalability, performance, availability, security, and energy efficiency in distributed systems.

1.1 SCALABLE COMPUTING OVER THE INTERNET


Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. For example, it can refer to the capability of a system to increase its total output under an increased load when resources (typically hardware) are added.
1.1.1 The Age of Internet Computing

Billions of people use the Internet every day. As a result, supercomputer sites and large data centers must provide high-performance computing services to huge numbers of Internet users concurrently. Because of this high demand, high-performance computing (HPC) applications are no longer optimal for measuring system performance. The emergence of computing clouds instead demands high-throughput computing (HTC) systems built with parallel and distributed computing technologies.

Ø Internet computing is the foundation on which e-business runs.
Ø It is the only architecture that can run all facets of
business, from supplier collaboration and merchandise
purchasing, to distribution and store operations, to
customer sales and service.

The Platform Evolution


Ø Computer technology has gone through five generations of development, with each generation lasting from 10 to 20 years. Successive generations overlap by about 10 years.
Ø For instance, from 1950 to 1970, a handful of mainframes, including the IBM 360 and CDC 6400, were built to satisfy the demands of large businesses and government organizations.

Ø From 1960 to 1980, lower-cost minicomputers such as the DEC PDP 11 and VAX Series became popular among small businesses and on college campuses.
Ø From 1970 to 1990, we saw widespread use of
personal computers built with VLSI microprocessors.
Ø From 1980 to 2000, massive numbers of portable
computers and pervasive devices appeared in both wired
and wireless applications.

Explanation
Ø On the HPC side, supercomputers (massively parallel processors, or MPPs) are gradually being replaced by clusters of cooperative computers out of a desire to share computing resources. A cluster is often a collection of homogeneous compute nodes that are physically connected in close range to one another.
Ø On the HTC side, peer-to-peer (P2P) networks are formed for distributed file sharing and content delivery applications. A P2P system is built over many client machines. Peer machines are globally distributed in nature. P2P, cloud computing, and web service platforms are more focused on HTC applications than on HPC applications.
High-Performance and High-Throughput Computing
1. The development of market-oriented high-end computing systems is undergoing a strategic change from an HPC paradigm to an HTC paradigm. This HTC paradigm pays more attention to high-flux computing. The main application for high-flux computing is in Internet searches and web services by millions or more users simultaneously.

2. The performance goal thus shifts to measuring high throughput, or the number of tasks completed per unit of time. HTC technology needs not only to improve batch processing speed, but also to address the acute problems of cost, energy savings, security, and reliability at many data and enterprise computing centers.
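To make "the number of tasks completed per unit of time" concrete, here is a minimal Python timing sketch; the sleep-based task and the worker counts are illustrative assumptions, not any particular HTC workload.

import time
from concurrent.futures import ThreadPoolExecutor

def task(i):
    # Stand-in for one short, independent unit of work (e.g., one web request).
    time.sleep(0.01)
    return i

def measure_throughput(num_tasks=200, workers=8):
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(num_tasks)))
    elapsed = time.time() - start
    return num_tasks / elapsed  # tasks completed per second

if __name__ == "__main__":
    for w in (1, 4, 8):
        print(f"{w} workers -> {measure_throughput(workers=w):.1f} tasks/sec")

On a real HTC platform the same ratio, completed tasks divided by elapsed wall-clock time, is the figure of merit, measured across an entire cluster rather than a single process.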
Three New Computing Paradigms
Ø The maturity of radio-frequency identification (RFID), Global Positioning System (GPS), and sensor technologies has triggered the development of the Internet of Things (IoT).
Computing Paradigm Distinctions
v In general, distributed computing is the opposite of centralized computing. The field of parallel computing overlaps with distributed computing to a great extent, and cloud computing overlaps with distributed, centralized, and parallel computing.
Ø Centralized computing This is a computing paradigm
by which all computer resources are centralized in one
physical system. All resources (processors, memory, and
storage) are fully shared and tightly coupled within one
integrated OS. Many data centers and supercomputers are
centralized systems, but they are used in parallel, distributed,
and cloud computing applications.
Ø Parallel computing In parallel computing, all
processors are either tightly coupled with centralized
shared memory or loosely coupled with distributed
memory. Interprocessor communication is accomplished
through shared memory or via message passing.
Ø Distributed computing This is a field of computer science/engineering that studies distributed systems. A distributed system consists of multiple autonomous computers, each having its own private memory, communicating through a computer network. Information exchange in a distributed system is accomplished through message passing (a minimal message-passing sketch follows this list).
Ø Cloud computing An Internet cloud of resources can
be either a centralized or a distributed computing system.
The cloud applies parallel or distributed computing, or
both. Clouds can be built with physical or virtualized resources over large data centers that are centralized or distributed.
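As noted in the distributed computing paradigm above, information exchange happens through message passing. The following sketch uses Python's standard multiprocessing module as a single-machine analogy: two processes with private memory exchange messages over a pipe, where a real distributed system would use a network transport such as sockets or MPI (discussed later in this unit).

from multiprocessing import Process, Pipe

def worker(conn):
    # The worker has its own private memory; it learns about the task
    # only through the message it receives.
    task = conn.recv()
    conn.send("processed: " + task)
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send("task-42")   # information exchange via message passing
    print(parent_end.recv())     # -> processed: task-42
    p.join()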

Scalable Computing Trends and New Paradigms


Topics include:
Ø Degrees of Parallelism
Ø Innovative Applications
Ø The Trend toward Utility Computing
Ø The Hype Cycle of New Technologies
Degrees of Parallelism
Ø Fifty years ago, when hardware was bulky and expensive, most computers were designed in a bit-serial fashion.
Ø Data-level parallelism (DLP) was made popular through
SIMD (single instruction, multiple data) and vector
machines using vector or array types of instructions. DLP
requires even more hardware support and compiler
assistance to work properly.
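A short NumPy sketch (the only assumption is that NumPy is installed) makes the DLP idea concrete: the same multiplication is written once per element in a loop, and once as a single array expression that the hardware and compiler can map onto SIMD/vector instructions.

import numpy as np

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Element-by-element (serial in spirit): one multiplication at a time.
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] * b[i]

# Data-level parallel: one vector expression over all elements.
c_vec = a * b

assert np.allclose(c_loop, c_vec)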
Innovative Applications
Ø Both HPC and HTC systems desire transparency in
many application aspects. For example, data access,
resource allocation, process location, concurrency in
execution, job replication, and failure recovery should be
made transparent to both users and system management.


Ø For example, distributed transaction processing is often practiced in the banking and finance industry.
Transactions represent 90 percent of the existing market
for reliable banking systems. Users must deal with multiple
database servers in distributed transactions.
The Trend toward Utility Computing
v Utility computing focuses on a business model in which
customers receive computing resources from a paid service
provider. All grid/cloud platforms are regarded as utility
service providers.


Ø Figure 1.2 identifies major computing paradigms to facilitate the study of distributed systems and their applications. These paradigms share some common characteristics.
Ø First, they are all ubiquitous in daily life. Reliability
and scalability are two major design objectives in these
computing models.
Ø Second, they are aimed at autonomic operations that can be self-organized to support dynamic discovery.
The Hype Cycle of New Technologies
Ø Any new and emerging computing and information technology may go through a hype cycle, as generally illustrated in Figure 1.3. This cycle shows the expectations for the technology at five different stages.


Ø Also as shown in Figure 1.3, cloud technology had just crossed the peak of the expectations stage in 2010, and it was expected to take two to five more years to reach the productivity stage.
1.1.3 The Internet of Things and Cyber-Physical
Systems
v Two Internet development trends:
v The Internet of Things
v Cyber-Physical Systems.
These evolutionary trends emphasize the extension of
the Internet to everyday objects.
The Internet of Things
Ø The concept of the IoT was introduced in 1999 at MIT.
The IoT refers to the networked interconnection of
everyday objects, tools, devices, or computers. One can
view the IoT as a wireless network of sensors that
interconnect all things in our daily life.
Ø The IoT needs to be designed to track 100 trillion static
or moving objects simultaneously. The IoT demands
universal addressability of all of the objects or things. To
reduce the complexity of identification, search, and storage,
one can set the threshold to filter out fine-grain objects.

Cyber-Physical Systems:


Ø A cyber-physical system (CPS) is the result of interaction between computational processes and the physical world. A CPS integrates “cyber” (heterogeneous, asynchronous) with “physical” (concurrent and information-dense) objects. A CPS merges the “3C” technologies of computation, communication, and control into an intelligent closed feedback system between the physical world and the information world.

Ø The IoT emphasizes various networking connections among physical objects, while the CPS emphasizes exploration of virtual reality (VR) applications in the physical world.

1.2 TECHNOLOGIES FOR NETWORK-BASED SYSTEMS


Ø Multicore CPUs and Multithreading Technologies
Ø Advances in CPU Processors
Ø Multicore CPU and Many-Core GPU Architectures
Ø Multithreading Technology
Ø GPU Computing to Exascale and Beyond
Ø How GPUs Work
Ø GPU Programming Model
Ø Power Efficiency of the GPU
Ø Memory, Storage, and Wide-Area Networking
Ø Memory Technology
Ø Disks and Storage Technology

Ø System-Area Interconnects
Ø Wide-Area Networking
Ø Virtual Machines and Virtualization Middleware
Ø Virtual Machines
Ø VM Primitive Operations
Ø Virtual Infrastructures
Ø Data Center Virtualization for Cloud Computing
Ø Data Center Growth and Cost Breakdown
Ø Low-Cost Design Philosophy
Ø Convergence of Technologies

1.2.1 Multicore CPUs and Multithreading Technologies


Advances in CPU Processors:
Ø Today, advanced CPUs or microprocessor chips assume a multicore architecture with dual, quad, six, or more processing cores. These processors exploit parallelism at the instruction level (ILP) and task level (TLP).


Ø Both multi-core CPU and many-core GPU processors can handle multiple instruction threads at different magnitudes today. Figure 1.5 shows the architecture of a typical multicore processor. Each core is essentially a processor with its own private cache (L1 cache). Multiple cores are housed in the same chip with an L2 cache that is shared by all cores.

2. Multicore CPU and Many-Core GPU Architectures:
Ø Multicore CPUs may increase from the tens of cores to hundreds or more in the future. But the CPU has reached its limit in terms of exploiting massive DLP due to the memory wall problem (the widening gap between processor speed and memory access speed). This has triggered the development of many-core GPUs with hundreds or more thin cores. Both IA-32 and IA-64 instruction set architectures are built into commercial CPUs. Now, x86 processors have been extended to serve HPC and HTC systems in some high-end server processors.

3. Multithreading Technology:
Ø Consider in Figure 1.6 the dispatch of five independent threads of instructions to four pipelined data paths (functional units) in each of the following five processor categories, from left to right: a four-issue superscalar processor, a fine-grain multithreaded processor, a coarse-grain multithreaded processor, a two-core chip multiprocessor (CMP), and a simultaneous multithreaded (SMT) processor. The superscalar processor is single-threaded with four functional units. Each of the three multithreaded processors is four-way multithreaded over four functional data paths.
1.2.2 GPU Computing to Exascale and Beyond
1. How GPUs Work:
Ø Early GPUs functioned as coprocessors attached to the
CPU. Today, the NVIDIA GPU has been upgraded to 128
cores on a single chip. Furthermore, each core on a GPU
can handle eight threads of instructions. This translates
to having up to 1,024 threads executed concurrently on a
single GPU. This is true massive parallelism, compared to
only a few threads that can be handled by a conventional
CPU.
2. GPU Programming Model:
Ø Consider the interaction between a CPU and a GPU in performing parallel execution of floating-point operations concurrently. The CPU is the conventional multicore processor with limited parallelism to exploit. The GPU has a many-core architecture with hundreds of simple processing cores organized as multiprocessors. Each core can have one or more threads. Essentially, the CPU's floating-point kernel computation role is largely offloaded to the many-core GPU. The CPU instructs the GPU to perform massive data processing. The bandwidth must be matched between the on-board main memory and the on-chip GPU memory.
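This offload pattern can be sketched in Python with the CuPy library; CuPy and a CUDA-capable GPU are assumptions here, and the kernel is just an illustrative element-wise floating-point computation.

import numpy as np
import cupy as cp   # assumption: CuPy installed and a CUDA-capable GPU present

# CPU (host) side: prepare input data in main memory.
x_host = np.random.rand(10_000_000).astype(np.float32)

# Transfer across the CPU-GPU boundary into GPU memory.
x_gpu = cp.asarray(x_host)

# Floating-point kernel executed by the GPU's many thin cores.
y_gpu = cp.sqrt(x_gpu) * 2.0 + 1.0

# Copy the result back to host memory for further CPU-side work.
y_host = cp.asnumpy(y_gpu)
print(y_host[:5])

The two explicit transfers (cp.asarray and cp.asnumpy) are exactly where the bandwidth match between on-board main memory and on-chip GPU memory matters.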


3. Power Efficiency of the GPU:


Ø Bill Dally of Stanford University considers power and
massive parallelism as the major benefits of GPUs over
CPUs for the future.


1.2.3 Memory, Storage, and Wide-Area Networking


1. Memory Technology:
Ø The upper curve in Figure 1.10 plots the growth of
DRAM chip capacity from 16 KB in 1976 to 64 GB in
2011. This shows that memory chips have experienced a
4x increase in capacity every three years.
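A quick back-of-the-envelope check of that rate, using only the 16 KB and 64 GB endpoints quoted above:

import math

start = 16 * 1024        # 16 KB in 1976
end   = 64 * 1024**3     # 64 GB in 2011
years = 2011 - 1976

growth = end / start                 # total growth factor (~4.2 million, about 2**22)
quadruplings = math.log(growth, 4)   # number of 4x steps that represents (~11)
print(f"years per 4x step: {years / quadruplings:.1f}")   # about 3.2 years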


2. Disks and Storage Technology:


Ø Beyond 2011, disks or disk arrays have exceeded 3 TB in capacity. The lower curve in Figure 1.10 shows the growth of disk storage by 7 orders of magnitude over 33 years.
Ø The rapid growth of flash memory and solid-state
drives (SSDs) also impacts the future of HPC and HTC
systems.


1.2.4 Virtual Machines and Virtualization Middleware
Ø A conventional computer has a single OS image. This
offers a rigid architecture that tightly couples
application software to a specific hardware platform.
Ø Some software running well on one machine may not
be executable on another platform with a different
instruction set under a fixed OS.


1. Virtual Machines:
Ø The VM can be provisioned for any hardware system.
The VM is built with virtual resources managed by a
guest OS to run a specific application. Between the VMs and
the host platform, one needs to deploy a middleware
layer called a virtual machine monitor (VMM).
Ø For example, the guest OS could be a Linux system and the hypervisor the XEN system developed at Cambridge University. This hypervisor approach is also called a bare-metal VM, because the hypervisor handles the bare hardware (CPU, memory, and I/O) directly.
2. VM Primitive Operations:
Ø The VMM provides the VM abstraction to the guest OS.
Ø With full virtualization, the VMM exports a VM abstraction identical to the physical machine so that a standard OS such as Windows 2000 or Linux can run just as it would on the physical hardware.

Ø First, the VMs can be multiplexed between hardware machines, as shown in Figure 1.13(a).
Ø Second, a VM can be suspended and stored in stable
storage, as shown in Figure 1.13(b).
Ø Third, a suspended VM can be resumed or provisioned to a new hardware platform, as shown in Figure 1.13(c).
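These primitive operations can also be driven programmatically. The sketch below uses the libvirt Python bindings against a QEMU/KVM host; the connection URI, the domain name guest1, and the save-file path are hypothetical, and it assumes such a VM is already defined on the host.

import libvirt

conn = libvirt.open("qemu:///system")   # hypothetical local hypervisor URI
dom = conn.lookupByName("guest1")       # hypothetical existing VM

dom.suspend()                        # pause the VM's virtual CPUs
dom.resume()                         # resume execution where it left off

dom.save("/var/tmp/guest1.img")      # suspend the VM and store its state in stable storage
conn.restore("/var/tmp/guest1.img")  # resume the saved VM on this host

conn.close()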

3. Virtual Infrastructures:
Ø Physical resources for compute, storage, and networking at the bottom of Figure 1.14 are mapped to the needy applications embedded in various VMs at the top.


Ø Hardware and software are then separated. Virtual infrastructure is what connects resources to distributed applications.

1.2.5 Data Center Virtualization for Cloud Computing
Ø Almost all cloud platforms choose the popular x86
processors. Low-cost terabyte disks and Gigabit Ethernet
are used to build data centers.

1. Data Center Growth and Cost Breakdown:


A large data center may be built with thousands of
servers. Smaller data centers are typically built with
hundreds of servers.

2. Low-Cost Design Philosophy:


Ø High-end switches or routers may be cost-prohibitive for building data centers. Thus, using high-bandwidth networks may not fit the economics of cloud computing.
Ø Recent advances in SOA, Web 2.0, and mashups of
platforms are pushing the cloud another step forward.
Ø Finally, achievements in autonomic computing and
automated data center operations contribute to the rise of
cloud computing.

1.4 SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS
1.4.1 Service-Oriented Architecture (SOA):
1.4.1.1 Layered Architecture for Web Services and Grids:
Ø The entity interfaces in web services, Java, and CORBA are linked with customized, high-level communication systems: SOAP, RMI, and IIOP, respectively.


Ø These communication systems support features including particular message patterns (such as Remote Procedure Call, or RPC), fault recovery, and specialized routing.
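As a small illustration of the RPC message pattern, Python's standard-library XML-RPC modules can stand in for SOAP, RMI, or IIOP (a deliberate simplification): the server registers a procedure and a client invokes it as if it were a local call.

# server side: expose one remotely callable procedure
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
print("RPC server listening on port 8000...")
server.serve_forever()

# client side (run as a separate process): the call is marshalled into a request message
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # -> 5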

1.4.1.2 Web Services and Tools:


Ø Loose coupling and support of heterogeneous
implementations make services more attractive than
distributed objects.
1.4.2 Trends toward Distributed Operating Systems:

1.4.2.1 Distributed Operating Systems:


Ø Tanenbaum identifies three approaches for distributing resource management functions in a distributed computer system.

1.4.2.2 Amoeba versus DCE:


Ø DCE is a middleware-based system for distributed computing environments. Amoeba was developed academically at Free University in the Netherlands.
1.4.2.3 MOSIX2 for Linux Clusters:
Ø MOSIX2 is a distributed OS [3], which runs with a
virtualization layer in the Linux environment.
1.4.2.4 Transparency in Programming Environments:


Ø The user data, applications, OS, and hardware are separated into four levels. Data is owned by users, independent of the applications.
Ø The OS provides clear interfaces, standard
programming interfaces, or system calls to application
programmers.

1.4.3 Parallel and Distributed Programming Models
Ø We will explore four programming models for distributed computing with expected scalable performance and application flexibility.


1.4.3.1 Message-Passing Interface (MPI):


Ø This is the primary programming standard used to
develop parallel and concurrent programs to run on a
distributed system.
Ø Besides MPI, distributed programming can also be supported with low-level primitives such as the Parallel Virtual Machine (PVM).
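A minimal point-to-point MPI example in Python, assuming the mpi4py package and an MPI runtime are installed; launch it with two processes, e.g. mpiexec -n 2 python mpi_demo.py (the file name is illustrative).

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Process 0 sends a Python object to process 1 via message passing.
    comm.send({"payload": [1, 2, 3]}, dest=1, tag=11)
    print("rank 0 sent the message")
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print("rank 1 received:", data)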
1.4.3.2 MapReduce:
Ø This is a web programming model for scalable
data processing on large clusters over large data
sets.

Ø A typical MapReduce computation process can handle terabytes of data on tens of thousands or more client machines.
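The model has two user-written phases: a map function that emits intermediate key-value pairs and a reduce function that aggregates all values sharing a key. A toy single-machine word count in plain Python shows the shape of the computation; frameworks such as Hadoop distribute exactly these phases across a cluster.

from collections import defaultdict

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: each document independently emits (word, 1) pairs,
# so this step can run in parallel on many machines.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group intermediate pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: aggregate the values for each key.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)   # e.g. {'the': 3, 'quick': 1, 'fox': 2, ...}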
1.4.3.3 Hadoop Library:
Ø Hadoop offers a software platform that was
originally developed by a Yahoo! group. The package
enables users to write and run applications over vast
amounts of distributed data.
1.4.3.4 Open Grid Services Architecture (OGSA):
Ø The development of grid infrastructure is driven
by large-scale distributed computing applications.


1.4.3.5 Globus Toolkits and Extensions:


Ø Globus is a middleware library jointly developed by the U.S. Argonne National Laboratory and the USC Information Sciences Institute over the past decade.
1.5 PERFORMANCE, SECURITY, AND ENERGY EFFICIENCY
1.5.1 Performance Metrics and Scalability Analysis:
Ø Performance metrics are needed to measure various
distributed systems. In this section, we will discuss
various dimensions of scalability and performance laws.
Then we will examine system scalability against OS images
and the limiting factors encountered.
1.5.1.1 Performance Metrics:
