Unit - 2 - Data Storage and Cloud Computing
[7Hrs]
https://youtu.be/aOg1SXXp0JE
Data Storage
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Storage
● File storage
● In file storage, data is stored in files, the files are organized in
folders, and the folders are organized under a hierarchy of directories and
subdirectories.
● To locate a file, all you or your computer system need is the path—from
directory to subdirectory to folder to file.
● If you need to store very large or unstructured data volumes, you should
consider block-based or object-based storage
● Example: a local hard drive, Google Drive, etc.
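The path-based lookup described above can be illustrated with a minimal Python sketch. The directory names (`reports`, `2024`) and the file name are hypothetical, and a temporary directory stands in for a real drive.

```python
import tempfile
from pathlib import Path


def store_and_locate(root: Path) -> str:
    """Create a small directory hierarchy, then read a file back by its path."""
    folder = root / "reports" / "2024"        # directory -> subdirectory
    folder.mkdir(parents=True)
    file_path = folder / "summary.txt"        # folder -> file
    file_path.write_text("quarterly summary")
    # The path alone is enough to locate the file again.
    return (root / "reports" / "2024" / "summary.txt").read_text()


with tempfile.TemporaryDirectory() as tmp:
    content = store_and_locate(Path(tmp))
```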
Data Storage
● Block Storage:
● Block storage breaks a file into equally-sized chunks (or blocks) of data and
stores each block separately under a unique address.
● Rather than conforming to a rigid directory/subdirectory/folder structure,
blocks can be stored anywhere in the system.
● To access any file, the server's operating system uses the unique address to
pull the blocks back together into the file, which takes less time than
navigating through directories and file hierarchies to access a file.
● Examples of block storage: SAN, iSCSI, and local disks.
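The idea of splitting a file into equally sized, individually addressed blocks can be sketched as follows. This is a toy in-memory model, not how any real SAN or disk driver works; `BlockStore` and the 4-byte `BLOCK_SIZE` are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # bytes per block; unrealistically small, for illustration only


class BlockStore:
    """Toy block store: every block lives under its own numeric address."""

    def __init__(self) -> None:
        self._blocks: Dict[int, bytes] = {}
        self._next_address = 0

    def write(self, data: bytes) -> Tuple[List[int], int]:
        """Split data into fixed-size blocks; return their addresses and the length."""
        addresses = []
        for offset in range(0, len(data), BLOCK_SIZE):
            # Pad the last block so that all blocks have the same size.
            block = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
            self._blocks[self._next_address] = block
            addresses.append(self._next_address)
            self._next_address += 1
        return addresses, len(data)

    def read(self, addresses: List[int], length: int) -> bytes:
        """Pull the blocks back together by address and drop the padding."""
        return b"".join(self._blocks[a] for a in addresses)[:length]


store = BlockStore()
addresses, length = store.write(b"hello block storage")
restored = store.read(addresses, length)
```

The caller keeps only the address list and the original length, which plays the role of the lookup information the operating system maintains.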
Data Storage
● Object Storage:
● Object storage is used for unstructured media and web content such as email,
videos, image files, web pages, and sensor data produced by the Internet of
Things (IoT).
● An object is a simple, self-contained repository that includes the data,
metadata (descriptive information associated with the object), and a
unique identifier.
● This information enables an application to locate and access the
object.
● Example: storing objects like videos and photos on Facebook, songs
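A minimal sketch of the object model described above: each object bundles data, metadata, and a unique ID, and is fetched by that ID from a flat namespace. `StoredObject`, `ObjectStore`, and the metadata keys are hypothetical names, not any provider's API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str]
    # A random hex string stands in for the unique identifier.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ObjectStore:
    """Toy flat namespace: objects are looked up by ID, not by directory path."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, **metadata: str) -> str:
        obj = StoredObject(data=data, metadata=dict(metadata))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]


store = ObjectStore()
photo_id = store.put(b"raw image bytes", content_type="image/png", owner="alice")
photo = store.get(photo_id)
```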
Different Storage Types
File Storage Block Storage Object Storage
Data Storage Challenges
● Some challenges are:
● Massive data demand
● Performance barrier
● Power consumption and cost
Data Storage Challenges
● Massive Data Demand
● An industry survey estimated that the digital world would increase by 45
zettabytes by 2020. For scale: one terabyte is equal to 1024 gigabytes,
one petabyte is equal to 1024 terabytes, one exabyte is equal to
1024 petabytes, and one zettabyte is equal to 1024 exabytes.
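The unit ladder above can be checked with a few lines of Python. The helper name `gigabytes_in` is an assumption; the factor of 1024 per step is taken from the text.

```python
UNITS = ["GB", "TB", "PB", "EB", "ZB"]  # each unit is 1024 times the previous one


def gigabytes_in(unit: str) -> int:
    """Number of gigabytes in one of the given unit."""
    return 1024 ** UNITS.index(unit)


# Express 45 zettabytes in terabytes.
zb_45_in_tb = 45 * gigabytes_in("ZB") // gigabytes_in("TB")
```

One zettabyte is therefore 1024^4 gigabytes, and 45 zettabytes is 45 x 1024^3 terabytes.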
Data Storage Challenges
● Performance Barrier
● Rapid growth in data has caused a parallel increase in the size of
databases.
● With traditional storage methods, query response time is slow and
needs to be reduced.
● Be it a social networking site, an enterprise database or a web
application, all require faster disk access to read and write data.
Data Storage Challenges
● Power Consumption and Cost
● Because storage demand keeps increasing, IT organizations and data
centres need larger storage at minimal cost.
● Low-cost storage tends to lag in performance, and it still carries other
expenses such as licensing and maintenance.
● Apart from this, other factors such as the power consumed by storage
devices, cooling systems, staff for managing the storage and space for data
centres must be considered.
Introduction to Enterprise Data Storage
● The size of data that businesses can store depends on the storage type
they use.
Direct Attached Storage(DAS)
● Introduction of DAS
● Advantage of DAS
● Disadvantage of DAS
Direct Attached Storage(DAS)
● Introduction of DAS
● Direct-attached storage (DAS) is a type of storage that is attached directly to
a computer without going through a network.
● The storage might be connected internally or externally.
● Only the host computer can access the data directly.
● Most servers, desktops and laptops contain an internal hard disk drive
(HDD) or solid-state drive (SSD).
● Some computers also use external DAS devices.
Direct Attached Storage(DAS)
● Introduction of DAS
● In some cases, an enterprise server might connect directly to drives
that are shared by other servers.
● A direct-attached storage device is not networked.
● An external DAS device connects directly to a computer through an
interface such as Small Computer System Interface (SCSI), Serial Advanced
Technology Attachment (SATA), Serial-Attached SCSI (SAS), Fibre Channel (FC)
or Internet SCSI (iSCSI).
Direct Attached Storage(DAS)
● Advantage of DAS
● DAS can provide users with better performance than networked storage
because the server does not have to traverse a network to read and
write data, which is why many organizations turn to DAS for applications
that require high performance.
● DAS is also less complex than network-based storage systems, making
it easier to implement and maintain, and it is cheaper.
Direct Attached Storage(DAS)
● Disadvantage of DAS
Storage Area Network(SAN)
● Introduction of SAN
● Advantage of SAN
● Disadvantage of SAN
Storage Area Network(SAN)
● Introduction of SAN
● A Storage Area Network (SAN) is a specialized, high-speed
network that provides network access to storage devices.
● SANs are typically composed of hosts, switches, storage
elements, and storage devices that are interconnected using a
variety of technologies, topologies, and protocols.
Storage Area Network(SAN)
● Introduction of SAN
● Traditionally, only a limited number of storage devices could attach
to a server, limiting a network's storage capacity.
● But a SAN introduces networking flexibility enabling one server, or
many heterogeneous servers across multiple data centers, to share
a common storage utility.
Storage Area Network(SAN)
● Advantage of SAN
● Simplified storage administration
● Disk mirroring
● Low cost of storage management
● Instant and real-time information
● Ability to boot itself and expand the storage capacity
● Because a SAN is not directly attached to any particular server or network,
it can be shared by all
Storage Area Network(SAN)
● Disadvantage of SAN
● If client computers need intensive data transfer, then a SAN is not the right
choice. A SAN is good for low data traffic.
● More expensive
● It is very hard to maintain
● Because all client computers share the same set of storage devices, sensitive
data can be leaked. It is preferable not to store confidential information on
this network.
● Poor implementation results in a performance bottleneck
● Not affordable for small businesses
● Requires highly skilled technical staff
Network Attached Storage(NAS)
● Introduction of NAS
● Advantage of NAS
● Disadvantage of NAS
Network Attached Storage(NAS)
● Introduction of NAS
● A NAS device is a storage device connected to a network that allows
storage and retrieval of data from a central location for authorised
network users and varied clients.
● A NAS is a centralized file server that allows multiple users to store and
share files over a TCP/IP network via Wi-Fi or an Ethernet cable.
● It is also commonly known as a NAS box, NAS unit, NAS server, or NAS head.
Network Attached Storage(NAS)
● Introduction of NAS
● Network Protocols: the TCP/IP protocols, i.e. Transmission Control Protocol (TCP)
and Internet Protocol (IP), are used for data transfer, but the network
protocols for data sharing can vary based on the type of client (for example,
NFS for Linux/UNIX clients and SMB for Windows clients).
Network Attached Storage(NAS)
● Advantage of NAS
● Simple to operate; a dedicated IT professional is often not required
● Lower cost
● Easy data backup, so it’s always accessible when you need it
● Good at centralising data storage in a safe, reliable way
● Disadvantage of NAS
● Out-of-sync data
● Reliability and accessibility issues if storage goes down
Data Storage management
● Data storage management tools must rely on policies that govern the
use of storage devices.
● Data storage management refers to the software and processes that
improve the performance of data storage resources.
● It may include network virtualization, replication, mirroring, security,
compression, deduplication, traffic analysis, process automation, storage
provisioning and memory management.
Google Cloud :File system
● Google Cloud Storage (GCS): Google Cloud Storage is an object
storage service that offers scalable, durable, and highly available
storage for large volumes of unstructured data. It is designed to
store and serve a wide variety of data types, including images,
videos, backups, and archives. Google Cloud Storage offers different
storage classes, including Standard, Nearline, Coldline, and Archive,
allowing users to optimize storage costs based on their data access
requirements.
● Google Cloud Filestore: Google Cloud Filestore provides fully
managed Network Attached Storage (NAS) for applications that
require shared file storage.
Cloud File System
● Introduction
● Ghost File System:
● Gluster File System:
● Hadoop File System:
● XtreemFS: A Distributed and Replicated File System:
● Kosmos File System:
● CloudFS:
● Google File system(GFS):
Cloud File System :Introduction
FAT:
● FAT was designed for systems with very little RAM and small disks. It requires
far fewer system resources than other file systems, such as those used on UNIX.
NTFS
● NTFS is considerably more complex than FAT.
● While files are in use, the system areas can be customized, enlarged, or moved
as required. NTFS has much more security incorporated.
● NTFS is not suited to small disks.
Cloud File System
● For cloud file systems, the considerations are:
● It must sustain basic file system functionality.
● It should be open source.
● It should be mature enough that users will at least consider trusting
their data to it.
● It should be shared, i.e., available over a network.
● It should scale in parallel.
● It should provide genuine data protection, even on commodity hardware
with only internal storage.
Cloud File System :Ghost File System
Cloud File System :Gluster File System
● Users are no longer locked into legacy storage platforms, which are costly and
monolithic.
● GlusterFS gives users the ability to deploy scale-out, virtualized storage
as a centrally managed pool of storage.
● Attributes of GlusterFS include scalability and performance, high
availability, a global namespace, an elastic hash algorithm, an elastic volume
manager, the Gluster console manager, and a standards-based design.
Cloud File System :Hadoop File System
Cloud File System :XtreemFS
Cloud File System :CloudFS
Cloud File System:Google file System
Cloud Data Stores
Popular Cloud Data Stores
● Amazon Web Services (AWS) - Amazon DynamoDB, Amazon
RDS (Relational Database Service), Amazon Redshift, Amazon
Aurora, Amazon S3 (Simple Storage Service)
● Microsoft Azure - Azure Cosmos DB, Azure SQL Database, Azure
Blob Storage, Azure Data Lake Storage
● Google Cloud Platform (GCP) - Google Cloud Bigtable, Google
Cloud SQL, Google Cloud Storage, Google BigQuery
● IBM Cloud - IBM Db2 on Cloud, IBM Cloud Object Storage, IBM
Cloudant
● Oracle Cloud - Oracle Database Cloud Service, Oracle NoSQL
Database Cloud Service, Oracle Object Storage
Cloud Data Stores
● A data store is a data repository where data are stored as objects.
● Data stores include data repositories and flat files that can store data.
● Data stores can be of different types:
● Relational databases (examples: MySQL, PostgreSQL, Microsoft SQL Server, Oracle
Database)
● Object-oriented databases
● Operational data stores
● Schema-less data stores, e.g. Apache Cassandra or Dynamo
● Paper files
● Data files (spreadsheets, flat files, etc)
Cloud Data Stores:Distributed Data Store
Cloud Data Stores:Types of Data Stores:BigTable
Bigtable storage model
Bigtable architecture
Cloud Data Stores:Types of Data Stores:Dynamo
Using Grids for Data Storage
● What is a grid?
● Grid Storage for Grid Computing
● Grid Oriented Storage (GOS)
Example of Grid Computing
X = ((5 x 7) + (6 x 3) + (4 x 5) /2)*41
Example of Grid Computing
In grid computing, each task is broken into small fragments and distributed across computing nodes for efficient execution.
Each fragment is processed in parallel, and, as a result, a complex task is accomplished in less time. Let’s consider this
equation:
X = (5 x 7) + (6 x 3) + (4 x 5)
Typically, on a desktop computer, the steps needed here to calculate the value of X may look like this:
● Step 1: X = 35 + (6 x 3) + (4 x 5)
● Step 2: X = 35 + 18 + (4 x 5)
● Step 3: X = 35 + 18 + 20
● Step 4: X = 73
However, the steps in a grid computing setup differ: three processors or computers calculate different pieces of the equation
separately and combine them later. This implies fewer sequential steps and a shorter overall time.
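The grid-style evaluation of X = (5 x 7) + (6 x 3) + (4 x 5) can be sketched in Python. Three worker threads stand in for three grid nodes here; this is an assumption made to keep the example self-contained, whereas a real grid would dispatch the fragments to separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def multiply(pair: Tuple[int, int]) -> int:
    """The work done by one node: compute a single product."""
    a, b = pair
    return a * b


# Each fragment of the equation is handed to a different worker.
fragments = [(5, 7), (6, 3), (4, 5)]
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(multiply, fragments))

# The partial results are combined in one final step.
x = sum(partial_results)
```

The three products are computed independently, so only the final addition has to wait for all of them.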
Using Grids for Data Storage
● Grid Oriented Storage (GOS)
● GOS is a dedicated data storage architecture connected directly to a computational
grid.
● It acts as a data bank and a large data reservoir that
can be shared among multiple grid clients.
● GOS is a successor of Network-Attached Storage (NAS) products in the grid
computing era.
● GOS accelerates all kinds of applications in terms of performance and
transparency.
● A GOS system contains multiple hard disks, arranged into logical, redundant storage
Using Grids for Data Storage
●Introduction
●Cloud Data Management Interface (CDMI)
●Cloud Storage Requirements
Data Management for Cloud Storage
● Introduction
● Cloud storage should incorporate new services as requirements change over
time.
● For cloud storage, the Storage Networking Industry Association (SNIA) has
published a standard document, the Storage Industry Resource Domain
Model (SIRDM).
● The figure shows the SIRDM model, which uses the CDMI (Cloud Data
Management Interface) standard.
● The SIRDM model adopts three kinds of metadata:
Data Management for Cloud Storage
● User metadata is used by the cloud to find the data objects and
containers.
● Storage system metadata is used by the cloud to offer basic storage
functions like assigning, modifying and access control.
● Data system metadata is used by the cloud to offer data as a service
based on user requirements, and it controls operations based on that
data.
Data Management for Cloud Storage
● Cloud Data Management Interface (CDMI)
● The Cloud Data Management Interface (CDMI) is used to create, retrieve,
update and delete objects in a cloud.
● The functions in CDMI are:
● Discovery of cloud storage offerings by clients
● Management of containers and the data
● Sync metadata with containers and objects
Data Management for Cloud Storage
● Cloud Data Management Interface (CDMI)
● CDMI is also used to manage containers, domains, security access
and billing information.
● The CDMI standard is also used as a protocol for accessing storage.
● CDMI defines how to manage data and also ways of storing and
retrieving it.
● ‘Data path’ means how data is stored and retrieved.
● ‘Control path’ means how data is managed.
● CDMI standard supports both data path and control path interface.
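The split between a data path (create, retrieve, update, delete) and a control path (managing containers and their metadata) can be sketched with a toy in-memory store. This is not the CDMI protocol or any SNIA API; `ToyCloudStore`, its method names, and the `retention_days` metadata key are hypothetical.

```python
from typing import Dict, List


class ToyCloudStore:
    """In-memory stand-in for a cloud storage service (illustrative only)."""

    def __init__(self) -> None:
        # container name -> {"metadata": {...}, "objects": {...}}
        self._containers: Dict[str, dict] = {}

    # ---- control path: how data is managed ----
    def create_container(self, name: str, **metadata) -> None:
        self._containers[name] = {"metadata": dict(metadata), "objects": {}}

    def container_metadata(self, name: str) -> dict:
        return dict(self._containers[name]["metadata"])

    # ---- data path: how data is stored and retrieved ----
    def create(self, container: str, key: str, value: bytes) -> None:
        objects = self._containers[container]["objects"]
        if key in objects:
            raise KeyError(f"{key!r} already exists")
        objects[key] = value

    def retrieve(self, container: str, key: str) -> bytes:
        return self._containers[container]["objects"][key]

    def update(self, container: str, key: str, value: bytes) -> None:
        objects = self._containers[container]["objects"]
        if key not in objects:
            raise KeyError(f"{key!r} does not exist")
        objects[key] = value

    def delete(self, container: str, key: str) -> None:
        del self._containers[container]["objects"][key]

    def list_objects(self, container: str) -> List[str]:
        return sorted(self._containers[container]["objects"])


store = ToyCloudStore()
store.create_container("photos", retention_days=30)   # control path
store.create("photos", "cat.jpg", b"v1")              # data path
first_read = store.retrieve("photos", "cat.jpg")
store.update("photos", "cat.jpg", b"v2")
second_read = store.retrieve("photos", "cat.jpg")
store.delete("photos", "cat.jpg")
remaining = store.list_objects("photos")
```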
Provisioning Cloud Storage
● Cloud means sharing third-party resources via the Internet.
● This sharing can be done on a need basis, and there is no need to
invest in any infrastructure at the consumer's end.
● Storage capacity can be increased on a need basis, and this can be done
using multi-tenancy methods.
Provisioning Cloud Storage
● By adopting the Cloud Data Management Interface (CDMI) standard,
service providers can implement a method for metering the
storage and data usage of consumers.
● This interface also helps providers bill IT
organizations based on their usage.
● An advantage of this interface is that IT organizations need not write or use
a different adapter for each service provider.
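Usage-based metering and billing can be sketched as below. The tenant names, the usage records, and the price of 2 cents per GB are all hypothetical; the sketch only shows the aggregate-then-bill pattern, not a CDMI feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

PRICE_CENTS_PER_GB = 2  # hypothetical rate, kept in integer cents


def meter(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the gigabytes stored by each tenant from (tenant, gb) usage records."""
    totals: Dict[str, int] = defaultdict(int)
    for tenant, gigabytes in records:
        totals[tenant] += gigabytes
    return dict(totals)


def bill(usage: Dict[str, int]) -> Dict[str, int]:
    """Turn metered usage into an amount owed, in cents, per tenant."""
    return {tenant: gb * PRICE_CENTS_PER_GB for tenant, gb in usage.items()}


records = [("acme", 100), ("globex", 250), ("acme", 50)]
usage = meter(records)
invoices = bill(usage)
```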
Data-intensive Technologies for Cloud
Computing
●Introduction
●Processing Approach
●System Architecture
Data-intensive Technologies for Cloud
Computing
●Introduction
● Data-intensive computing is a class of computing that uses
parallelism to process large volumes of data, called big data.
● Parallel processing approaches are divided into two types:
compute-intensive and data-intensive.
Data-intensive Technologies for Cloud
Computing
●Introduction
● Compute-intensive: applications that spend most of their
execution time on computation
● Data-intensive: applications that must handle large
volumes of data and spend most of their time processing that data
Data-intensive Technologies for Cloud
Computing
●Processing Approach
● Data-intensive computing platforms use a parallel computing
approach.
● This approach combines multiple processors and disks into
computing clusters connected via a high-speed network.
● The data to be processed is partitioned, and each partition is processed
independently by the computing resources available in the cluster.
Data-intensive Technologies for Cloud
Computing
●System Architecture
● For data-intensive computing an array of system architectures
have been implemented.
● Architecture for data-intensive computing
1. MapReduce
2. HPCC
Data-intensive Technologies for Cloud
Computing
●System Architecture
● 1. MapReduce
● MapReduce is a concept developed by Google; an open-source
implementation is available as Hadoop.
● Hadoop is used by Yahoo, Facebook and others.
● The MapReduce architecture uses a functional programming style: a map
function processes input key-value pairs and emits intermediate key-value pairs.
● The reduce function merges all intermediate values that share the same
intermediate key.
● Hence programmers who do not have experience in parallel programming can
use a large distributed processing environment without difficulty.
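The map and reduce roles described above can be sketched with a single-process word count. This is a minimal illustration of the programming model, not Hadoop's API; `map_fn`, `reduce_fn`, `map_reduce`, and the sample documents are hypothetical, and a real framework would run the map and reduce calls on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(_doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in the document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def map_reduce(documents: Dict[str, str]) -> Dict[str, int]:
    # Shuffle step: group intermediate values by their intermediate key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            intermediate[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


documents = {"d1": "the cloud stores data", "d2": "the grid stores data too"}
counts = map_reduce(documents)
```

The programmer writes only `map_fn` and `reduce_fn`; grouping and distribution are the framework's job, which is why no parallel programming experience is needed.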
Data-intensive Technologies for Cloud
Computing
●System Architecture
● 2. HPCC (High-Performance Computing Cluster)
● LexisNexis Risk Solutions independently developed and
implemented a solution for data-intensive computing called
HPCC.
● The LexisNexis approach builds clusters from commodity hardware
running the Linux OS.
Cloud Storage from LANs to WANs
❏ Cloud Characteristics
❏ Distributed Data Storage.
❏ Application Utilizing Cloud storage
Cloud Storage from LANs to WANs
★ Cloud Characteristics
★ Three characteristics of cloud computing should be considered before
choosing storage in the cloud.
Cloud Storage from LANs to WANs
ThruDB is an open-source database built on Apache's Thrift framework. It
provides a set of simple services, such as scaling, indexing and storage, that
are used for building and scaling websites.
Free Cloud Storage
Terabox:1024
Mpeg box
Telegram
AWS:File system
● Amazon S3 (Simple Storage Service): Amazon S3 is an object storage service that offers industry-
leading scalability, data availability, security, and performance. It is suitable for a wide variety of use
cases, including backup and restore, data archiving, data lakes, and big data analytics. S3 is not a
traditional file system but rather an object storage system, where data is stored as objects within
buckets.
● Amazon EFS (Elastic File System): Amazon EFS provides scalable file storage for use with Amazon
EC2 instances in the AWS Cloud. It is designed to provide scalable, elastic, and shared file storage
that is compatible with the NFSv4 protocol. Amazon EFS can be used to support a wide range of file-
based workloads and applications, including content repositories, development environments, and
data analytics workloads.
Both Amazon S3 and Amazon EFS have their own advantages and use cases. Amazon S3 is ideal for
storing large amounts of unstructured data, while Amazon EFS is suitable for applications that require
shared file storage and compatibility with the NFSv4 protocol. Depending on your specific requirements,
Azure:File system
Azure offers Azure Blob Storage and Azure Files, which are commonly used for
storing files in the cloud.
Azure Blob Storage: Azure Blob Storage is Microsoft's object storage solution for the
cloud. It is designed to store and serve large amounts of unstructured data,
whether text or binary, such as documents, images, videos, and backups. Blob
storage offers various tiers for different access patterns, including hot, cool, and
archive tiers, allowing users to optimize storage costs based on their data access
requirements.
Azure Files: Azure Files offers fully managed file shares in the cloud using the Server
Message Block (SMB) protocol. It provides the ability to create file shares that can
be accessed from multiple Azure virtual machines or on-premises systems over
standard SMB protocols. Azure Files is suitable for scenarios requiring shared file
● ACID stands for atomicity, consistency, isolation, and durability.
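Atomicity, the first of the ACID properties, can be demonstrated with Python's built-in `sqlite3` module. The `accounts` table, the names, and the balances are hypothetical; the sketch shows a transfer that either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(db: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; both updates succeed or neither is kept."""
    try:
        with db:  # commits on success, rolls back if an exception is raised
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


too_large = transfer(conn, "alice", "bob", 500)   # violates the CHECK constraint
small = transfer(conn, "alice", "bob", 30)        # succeeds
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to `bob` has already been executed when the debit fails, yet the rollback discards it, so no partial result is visible.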
Extra:Chubby Lock Service
● Chubby is used extensively inside Google in various systems such
as GFS, BigTable.
● Chubby is used to elect a master, allow the master to discover the
servers it controls, and allow clients to find the master.
● It is also used to store metadata. Chubby is the root of these systems'
distributed data structures.
● Google Chubby is a highly available and persistent distributed lock
service and configuration manager for large-scale distributed systems.
● It was first introduced in 2006 to manage locks for resources and store
configuration information for various distributed services throughout the
Google cluster environment.
● Since then, it has become an important component of many
Google services, including the Google File System, Bigtable and MapReduce.
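Master election through a lock, as described above, can be sketched with a toy single-process lock table. This is not Chubby's interface; `ToyLockService`, the lock name `master`, and the server names are hypothetical, and everything that makes the real service highly available (replication, sessions, failure handling) is left out.

```python
from typing import Dict, Optional


class ToyLockService:
    """Toy lock table: at most one client holds each named lock."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}

    def try_acquire(self, lock_name: str, client: str) -> bool:
        if lock_name in self._holders:
            return False            # someone else already holds the lock
        self._holders[lock_name] = client
        return True

    def holder(self, lock_name: str) -> Optional[str]:
        """Lets clients find out who currently holds the lock."""
        return self._holders.get(lock_name)

    def release(self, lock_name: str, client: str) -> None:
        if self._holders.get(lock_name) == client:
            del self._holders[lock_name]


locks = ToyLockService()
candidates = ["server-a", "server-b", "server-c"]
# Every candidate tries to take the same lock; whoever gets it is the master.
results = {name: locks.try_acquire("master", name) for name in candidates}
master = locks.holder("master")

# If the master gives up the lock, another candidate can take over.
locks.release("master", "server-a")
took_over = locks.try_acquire("master", "server-b")
new_master = locks.holder("master")
```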
Basic Concept
● A Sorted Strings Table (SSTable) is a persistent file format used by ScyllaDB,
Apache Cassandra, and other NoSQL databases to take the in-memory data stored
in memtables, order it for fast access, and store it on disk in a persistent,
ordered, immutable set of files. Immutable means SSTables are never modified.
● An SSTable provides a persistent, ordered immutable map from keys to
values, where both keys and values are arbitrary byte strings.
● Colossus is Google's cluster-level file system, the successor to the Google File
System (GFS).
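The SSTable idea of an ordered, immutable map from byte-string keys to byte-string values can be sketched in memory. This toy class is an assumption for illustration: it keeps the sorted data in tuples instead of writing files, and it does not reflect the on-disk layout used by Bigtable, Cassandra, or ScyllaDB.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SSTable:
    """Toy sorted, immutable key-value table built once from a memtable."""

    def __init__(self, memtable: Dict[bytes, bytes]) -> None:
        items = sorted(memtable.items())            # order the keys once
        self._keys = tuple(key for key, _ in items)
        self._values = tuple(value for _, value in items)

    def get(self, key: bytes) -> Optional[bytes]:
        """Binary search works because the keys are sorted."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> List[Tuple[bytes, bytes]]:
        """Return all pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


memtable = {b"cherry": b"3", b"apple": b"1", b"banana": b"2"}
table = SSTable(memtable)
found = table.get(b"banana")
missing = table.get(b"durian")
prefix_range = table.scan(b"apple", b"c")
```

Because the table is never modified after it is built, updates go to a new memtable and later to a new SSTable rather than changing this one.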