Module 1 - Data Center Environment
EMC Proven Professional. Copyright © 2012 EMC Corporation. All Rights Reserved. Module 2: Data Center Environment 1
Contents
▪ Application
▪ Database Management System (DBMS)
▪ Host (Compute)
▪ Connectivity
▪ Storage
▪ Disk Drive Components
▪ Disk Drive Performance
▪ Host Access to Data
▪ Direct-Attached Storage
▪ Storage Design Based on Application Requirements and Disk Drive Performance
Application
Database Management System (DBMS)
Host (Compute)
(I/O) devices
• Software components
─ Include the OS, device drivers, file system,
volume manager, and so on
Operating Systems and Device Driver
• In a traditional environment, the OS resides between the applications and
the hardware
─ Responsible for controlling the environment
• In a virtualized environment, the virtualization layer works between the OS
and the hardware
─ The virtualization layer controls the environment
─ The OS works as a guest and controls only the application environment
─ In some implementations, the OS is modified to communicate with the
virtualization layer
• A device driver is software that enables the OS to recognize a
specific device
• Device drivers are hardware-dependent and operating-system-specific
Memory Virtualization
• Memory is an expensive component of a host
• It determines both the size and the number of applications that can run on a
host
• Memory virtualization enables multiple applications to run on a
host without impacting each other
− It is an operating system feature that virtualizes the physical memory (RAM)
of a host.
• The virtual memory manager (VMM) manages the virtual memory
− The VMM manages the virtual-to-physical memory mapping and fetches
data from the disk storage
• An OS feature that presents larger memory to
the application than is physically available
• In a virtual memory implementation, the memory of a system is divided
into contiguous blocks of fixed-size pages.
• Paging moves inactive physical memory pages onto the swap file and
brings them back to the physical memory when required.
− This enables efficient use of the available physical memory among different
applications
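The paging behavior described above can be illustrated with a small sketch. This is a toy model only, not how any real OS implements it: the class name `TinyVMM`, the 4 KB page size, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed value)


class TinyVMM:
    """Toy virtual memory manager: a few physical frames backed by a swap file."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page number -> data, least recently used first
        self.swap = {}                 # page number -> data paged out to "disk"
        self.page_faults = 0

    def access(self, virtual_address):
        """Touch an address; return (page number, offset within the page)."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as recently used
        else:
            self.page_faults += 1
            if len(self.resident) >= self.num_frames:
                # Page out the least recently used page (assumed policy).
                victim, data = self.resident.popitem(last=False)
                self.swap[victim] = data
            # Page in from swap if it was paged out earlier, else start empty.
            self.resident[page] = self.swap.pop(page, bytearray(PAGE_SIZE))
        return page, offset


vmm = TinyVMM(num_frames=2)
for address in (0, 4096, 8192, 100):
    vmm.access(address)
print("page faults:", vmm.page_faults, "paged out:", sorted(vmm.swap))
```

With only two frames, touching a third page forces an inactive page out to the swap area, and touching it again brings it back, which is the behavior the slide describes.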
Logical Volume Manager (LVM)
• Software that runs on the compute system and manages logical and physical
storage
• Intermediate layer between the file system and the physical disk
─ Partitioning: divides a larger-capacity disk into virtual, smaller-capacity volumes
─ Concatenation: aggregates several smaller disks to form a larger virtual volume
• The LVM provides optimized storage access and simplifies storage resource
management.
─ Hides details about the physical disk and the location of data on the disk
─ Enables administrators to change the storage allocation even when the
application is running.
LVM Example: Partitioning and Concatenation
[Figure: hosts, logical volumes, and physical volumes, shown for partitioning and for concatenation]
• The basic LVM components are physical volumes, volume groups, and logical
volumes
‒ Each physical disk connected to the host system is a physical volume (PV)
‒ A unique physical volume identifier (PVID) is assigned to each physical
volume when it is initialized for use by the LVM.
• A logical volume appears as a physical device to the operating system
‒ It is made up of noncontiguous physical extents
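A minimal sketch of how a logical volume built from noncontiguous physical extents might be resolved to a location on a physical volume. The 4 MB extent size, the function names, and the volume names `pv0` and `pv1` are hypothetical and chosen only for illustration; real volume managers keep this mapping in their own metadata.

```python
EXTENT_SIZE = 4 * 1024 * 1024  # bytes per extent (assumed value)


def build_logical_volume(extent_map):
    """extent_map: list of (pv_id, physical_extent_index), one entry per logical extent."""
    return list(extent_map)


def resolve(logical_volume, logical_offset):
    """Translate a byte offset in the logical volume to (pv_id, byte offset on that PV)."""
    logical_extent, within_extent = divmod(logical_offset, EXTENT_SIZE)
    pv_id, physical_extent = logical_volume[logical_extent]
    return pv_id, physical_extent * EXTENT_SIZE + within_extent


# A logical volume concatenated from extents on two physical volumes.
# The physical extents are deliberately not contiguous.
lv = build_logical_volume([("pv0", 7), ("pv0", 2), ("pv1", 0)])

print(resolve(lv, 0))                 # first byte of the logical volume
print(resolve(lv, EXTENT_SIZE + 10))  # a byte inside the second logical extent
print(resolve(lv, 2 * EXTENT_SIZE))   # first byte stored on the second physical volume
```

The operating system sees one linear device, while the table decides which physical volume and extent actually hold each piece.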
Compute Virtualization
File System
• A file is a collection of related records or data stored as a unit with a name
• A file system is a hierarchical structure of files
─ provides users with the functionality to create, modify, delete, and access
files.
─ enables easy access to data files residing within a disk drive, a disk partition, or a
logical volume
─ consists of logical structures and software routines that control access to
files
─ Access to files on the disks is controlled by the permissions assigned to the file
by the owner, which are also maintained by the file system
• A file system organizes data in a structured hierarchical
manner via the use of directories, which are containers for
storing pointers to multiple files.
• Examples of common file systems are:
─ FAT32 (File Allocation Table) for Microsoft Windows
─ NT File System (NTFS) for Microsoft Windows
─ UNIX File System (UFS) for UNIX
─ Extended File System (EXT2/3) for Linux
• Process of mapping user files to the disk storage subsystem with an LVM
1. Files are created and managed by users and applications.
[Figure: users create files in the file system; files are mapped to file system blocks, then to LVM logical extents, then to disk physical extents, then to disk sectors]
• A file system can be nonjournaling
▪ If the system crashes during the write process, the metadata or data might be
lost or corrupted
▪ When the system reboots, the file system attempts to update the metadata
structures by examining and repairing them, which takes a long time on large file
systems
Compute Virtualization
hardware resources
Need for Compute Virtualization
[Figure: host hardware resources: CPU, NIC card, memory, and hard disk]
Advantages
Desktop Virtualization
Connectivity
• Interconnection between hosts or between a host
and peripheral devices, such as storage
[Figure: host, host adapter, and cable]
• A host interface device or host adapter connects a host to other
hosts and storage devices.
• Examples of host interface devices are host bus adapter (HBA) and
network interface card (NIC).
• An HBA may contain one or more ports to connect the host to the
storage device.
• Protocol
IDE/ATA and Serial ATA
SCSI and Serial SCSI
• Parallel Small Computer System Interface (SCSI)
─ Popular standard for connecting host and peripheral devices
▪ Commonly used for storage connectivity in servers
─ Higher cost than IDE/ATA, therefore not popular in PC environments
─ Available in a wide variety of related technologies and standards
─ Supports multiple simultaneous data access
─ Used primarily in “higher end” environments
─ Supports up to 16 devices on a single bus
─ Ultra-640 version provides data transfer speeds up to 640 MB/s
• Serial Attached SCSI (SAS)
─ Point-to-point serial protocol replacing parallel SCSI
─ Supports data transfer rates up to 6 Gb/s (SAS 2.0)
• Fibre Channel (FC)
─ Widely used protocol for high-speed communication to the storage
device
─ Provides gigabit network speed
─ Provides serial data transmission that operates over copper wire
and/or optical fiber
─ Latest version of the FC interface, ‘16FC’, allows transmission of data at up
to 16 Gb/s
• Internet Protocol (IP)
─ Traditionally used to transfer host-to-host traffic
─ Successful option for host-to-storage communication
─ Offers advantages in terms of cost and maturity
─ Provides an opportunity to leverage an existing IP-based network for storage
communication
▪ Examples: iSCSI and FCIP protocols
Storage
• Core component in a data center
• Storage options
• Magnetic tape
─ Low-cost solution for long-term data storage
▪ Preferred option for backup destination in the past
─ Limitations
▪ Sequential data access
▪ Single application access at a time
▪ Physical wear and tear
▪ Storage/retrieval overheads
• Optical discs
─ Popularly used as a distribution medium in small, single-user
computing environments
─ Limited in capacity and speed
─ Write once and read many (WORM): CD-ROM, DVD-ROM
─ Other variations: CD-RW, Blu-ray discs
• Disk drive
─ Most popular storage medium
─ Large storage capacity
─ Random read/write access
• Flash drives
─ Use semiconductor media
─ Provide high performance and low power consumption
Disk Drive Components
• The key components of a hard disk drive are platter, spindle, read-write
head, actuator arm assembly, and controller board
• I/O operations in an HDD are performed by rapidly moving the arm across
the rotating flat platters coated with magnetic particles.
• Data is transferred between the disk controller and magnetic platters
through the read-write (R/W) head which is attached to the arm.
• Data can be recorded and erased on magnetic platters any number of
times.
Platter
• One or more flat circular disks.
• Data is recorded in binary codes.
• Platters are sealed in a case called the Head Disk Assembly (HDA).
• Data is encoded by polarizing magnetic areas of the disk surface.
• The number of platters and the storage capacity of each platter determine the total
storage capacity.
Spindle
• Connects all platters.
• The spindle is connected to a motor.
• Common spindle speeds are 7,200 rpm, 10,000 rpm, and 15,000 rpm.
• A common platter diameter is 3.5” (90 mm).
Read/Write Head
• Reads data from or writes data to a platter.
• The R/W head changes the magnetic polarization on the surface of the platter when
writing data.
• While reading, the head detects the magnetic polarization.
• The head never touches the surface of the platter during reads and writes.
• The microscopic air gap between the R/W head and the platter surface is known as the head
flying height.
• When the spindle stops, the head rests on a special area called the landing zone.
• A head crash leads to data loss.
Actuator Arm Assembly
Controller
• It is mounted on PCB at the bottom of disk drive
• It has microprocessor, internal memory, circuitry and firmware
• Firmware controls power to spindle motor and speed of motor
• It also manages the communication between the drive and the host.
• In addition, it controls the R/W operations by moving the actuator arm
and switching between different R/W heads, and performs the
optimization of data access.
Physical Disk Structure
[Figure: physical disk structure showing the spindle, platters, tracks, sectors, and a cylinder]
• Data is recorded on tracks
• Tracks are numbered starting from zero at the outer edge of the platter
• The number of tracks per inch (TPI) measures track density
• Each track is divided into the smallest addressable units, called sectors
• The track and sector structure is written by the manufacturer
• There are thousands of tracks on a platter, depending on its recording density
and dimensions
• Example: a disk with 500 GB of unformatted capacity holds only 465.7 GB of user
data; the remaining 34.3 GB is used for metadata.
Zone Bit Recording
• Platters are made of concentric rings; outer tracks can hold more data than
inner tracks.
• On older disk drives, data density was low on outer tracks.
• This led to inefficient use of the available space.
• Zone bit recording uses the disk efficiently.
• Tracks are grouped into zones based on their distance from the center of the disk.
• Zones are numbered 0, 1, 2…, starting from the outermost zone.
• An appropriate number of sectors per track is assigned to each zone, so that
data density is uniform.
Logical Block Addressing
• Earlier drives used physical addresses consisting of
the cylinder, head, and sector (CHS) numbers to refer
to specific locations on the disk (as shown in fig. (a))
• The host OS had to be aware of the geometry of each
disk used
• Logical block addressing (LBA) simplifies addressing
by using a linear address to access physical blocks of
data
• The disk controller translates the LBA to a CHS address,
and the host needs to know only the size of the disk
drive in terms of the number of blocks
• The logical blocks are mapped to physical sectors on
a 1:1 basis
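The translation between a linear block address and a CHS address can be sketched as below. This uses the conventional formula in which cylinders and heads count from 0 and sectors count from 1; the function names are illustrative, and a real disk controller performs this mapping in firmware, typically with zone-dependent geometry.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Linear block address for a CHS address (sector numbering starts at 1)."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse mapping: return (cylinder, head, sector) for a linear block address."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# Example geometry (assumed for illustration): 8 heads, 8 sectors per track.
HEADS, SECTORS_PER_TRACK = 8, 8

print(chs_to_lba(0, 0, 1, HEADS, SECTORS_PER_TRACK))  # first block on the disk
print(lba_to_chs(64, HEADS, SECTORS_PER_TRACK))       # first block of the next cylinder
```

The host only ever passes the linear number; everything to the right of the equals sign is hidden inside the drive.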
Example:
• The previous drive has
─ 8 sectors per track, 8 heads,
─ and 4 cylinders (here tracks are referred to by cylinders).
• This means 8 × 8 × 4 = 256 blocks can be formed, numbered 0 to 255.
• Assuming 512 bytes are stored in a block,
• a 500 GB drive gives access to about 976,000,000 blocks (500 × 10^9 / 512 = 976,562,500).
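The block counts in this example can be checked with a few lines of arithmetic. The sketch assumes decimal gigabytes (1 GB = 10^9 bytes) and 512-byte blocks; the function names are illustrative.

```python
BLOCK_SIZE = 512  # bytes per block, as assumed in the example


def blocks_from_geometry(cylinders, heads, sectors_per_track):
    """Number of addressable blocks for a simple fixed geometry."""
    return cylinders * heads * sectors_per_track


def blocks_from_capacity(capacity_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks that fit in the given capacity."""
    return capacity_bytes // block_size


print(blocks_from_geometry(cylinders=4, heads=8, sectors_per_track=8))
print(blocks_from_capacity(500 * 10**9))
```

The second figure is what the host needs to know about the drive under LBA: the number of blocks, not the geometry.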
Disk Drive Performance
• Electromechanical device
─ Impacts the overall performance of the storage system
• Disk service time
─ Time taken by a disk to complete an I/O request; depends on:
▪ Seek time
▪ Rotational latency
▪ Data transfer rate
Disk service time = seek time + rotational latency + data transfer time
Seek Time
• Each of these specifications is measured in milliseconds
• The seek time of a disk is typically specified by the drive
manufacturer
• The average seek time on a modern disk is typically in the
range of 3 to 15 milliseconds
• Seek time has more impact on the read operation of random
tracks rather than adjacent tracks
• To minimize the seek time, data can be written to only a
subset of the available cylinders
• This results in lower usable capacity than the actual capacity
of the drive.
• For example, a 500 GB disk drive is set up to use only the first
40 percent of the cylinders and is effectively treated as a 200
GB drive. This is known as short-stroking the drive.
Rotational Latency
• Average rotational latency for a 15,000 rpm (250 rps) drive is
0.5/250 s = 2 milliseconds.
• Average rotational latency is approximately 5.6 ms (0.5/90 s) for a 5,400 rpm drive
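The same calculation works for any spindle speed: the average rotational latency is the time for half a revolution. A short sketch, with an illustrative function name:

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: time for half a revolution."""
    revolutions_per_second = rpm / 60
    return 0.5 / revolutions_per_second * 1000


for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>6} rpm: {average_rotational_latency_ms(rpm):.2f} ms")
```

Faster spindles shorten the wait, which is one reason 15,000 rpm drives are used for latency-sensitive workloads.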
Data Transfer Rate
• Average amount of data per unit time that the drive can deliver to the HBA
• Read operation:
─ Data first moves from disk platters to R/W heads, then it moves to the drive’s
internal buffer
─ Data moves from the buffer through the interface to the host HBA
• Write operation:
─ Data moves from the HBA to the internal buffer of the disk drive through the
drive’s interface
─ The data then moves from the buffer to the R/W heads
─ Finally, it moves from R/W heads to the platters
• The data transfer rates during the R/W operations are measured in terms of
internal and external transfer rates
─ Internal transfer rate: speed at which data moves from a platter’s surface to the
internal buffer of the disk
─ External transfer rate: rate at which data moves through the interface to the HBA
[Figure: data path between the HBA and the disk drive: HBA, interface, buffer, head disk assembly]
Disk I/O Controller Utilization
• The I/O requests arrive at the controller at the rate generated by the application.
This rate is also called the arrival rate.
• These requests are held in the I/O queue, and the I/O controller processes them
one by one
• The I/O arrival rate, the queue length, and the time taken by the I/O controller
to process each request determine the I/O response time.
• If the controller is busy or heavily utilized, the queue size will be large and the
response time will be high.
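• Based on the fundamental laws of disk drive performance, the relationship
between controller utilization and average response time is given as
Average response time (TR) = Service time (TS) / (1 − Utilization)
where TS is the time taken by the controller to serve an I/O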
• The graph indicates that the
response time changes are
nonlinear as the utilization
increases.
• When the average queue sizes
are low, the response time
remains low.
• The response time increases
slowly with added load on the
queue and increases
exponentially when the
utilization exceeds 70 percent.
Fig: Utilization versus response time
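The nonlinear growth can be tabulated with a short sketch. It assumes the single-queue relation TR = TS / (1 − U), where TS is the service time and U the controller utilization; the 8 ms service time is a hypothetical value chosen only for illustration.

```python
def average_response_time_ms(service_time_ms, utilization):
    """Average response time for a controller at the given utilization (0 <= U < 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1 - utilization)


SERVICE_TIME_MS = 8.0  # hypothetical service time, for illustration only

for u in (0.1, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95):
    response = average_response_time_ms(SERVICE_TIME_MS, u)
    print(f"utilization {u:4.0%}  response time {response:6.1f} ms")
```

Under this relation, going from 50 to 70 percent utilization adds less response time than going from 80 to 90 percent, which is why designs often cap planned utilization near 70 percent.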
Host Access to Data
• Understanding access to data over a network is important because it lays
the foundation for storage networking technologies
Block-level access
File-level access
Object-level access
Direct-Attached Storage (DAS)
Internal DAS architectures
• The storage device is internally connected to the host by a serial or parallel bus
• The physical bus has distance limitations and can sustain high-speed
connectivity only over a short distance.
• In addition, most internal buses can support only a limited number of devices,
and they occupy a large amount of space inside the host, making maintenance
of other components difficult.
External DAS architectures
• The host connects directly to the external storage device, and data is
accessed at the block level
• In most cases, communication between the host and the storage device
takes place over a SCSI or FC protocol
• Compared to internal DAS, an external DAS overcomes the distance and
device count limitations and provides centralized management of
storage devices.
DAS Benefits and Limitations
Benefits
• DAS requires a relatively lower initial investment than storage networking architectures
• The DAS configuration is simple and can be deployed easily and rapidly
• The setup is managed using host-based tools, such as the host OS, which makes
storage management tasks easy for small environments
• Because DAS has a simple architecture, it requires fewer management tasks and fewer
hardware and software elements to set up and operate.
Limitations
• DAS does not scale well. A storage array has a limited number of ports, which restricts
the number of hosts that can directly connect to the storage.
Storage Design Based on Application Requirements and Disk Drive Performance
Example: Consider the following specifications provided for a disk
• The average seek time is 5 ms in a random I/O environment;
therefore, T = 5 ms.
• The disk rotation speed is 250 revolutions per second, from which the
rotational latency (L) can be determined as one-half of the
time taken for a full rotation: L = 0.5/250 s = 2 ms
• The internal data transfer rate is 40 MB/s, from which the internal transfer
time (X) is derived based on the block size of the I/O. For example, for
an I/O with a block size of 32 KB, X = 32 KB / 40 MB/s = 0.8 ms
• The time taken by the I/O controller to serve an I/O of block size 32
KB is TS = T + L + X = 5 ms + 2 ms + 0.8 ms = 7.8 ms
• Therefore, the maximum number of I/Os serviced per second (IOPS)
is
1/TS = 1/(7.8 × 10^-3 s) ≈ 128 IOPS
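The worked example above can be reproduced with a short sketch. The function and parameter names are illustrative, and the code assumes 1 MB = 1000 KB, which matches the 0.8 ms transfer time used in the example.

```python
def disk_service_time_ms(seek_ms, revolutions_per_second, io_size_kb, transfer_rate_mb_s):
    """Disk service time TS = seek time + rotational latency + transfer time, in ms."""
    rotational_latency_ms = 0.5 / revolutions_per_second * 1000
    # Assumption: 1 MB = 1000 KB, which reproduces the 0.8 ms in the example.
    transfer_time_ms = io_size_kb / (transfer_rate_mb_s * 1000) * 1000
    return seek_ms + rotational_latency_ms + transfer_time_ms


def max_iops(service_time_ms):
    """Upper bound on I/Os per second for one disk: 1 / TS."""
    return 1000 / service_time_ms


ts = disk_service_time_ms(seek_ms=5, revolutions_per_second=250,
                          io_size_kb=32, transfer_rate_mb_s=40)
print(f"TS = {ts:.1f} ms, max IOPS = {max_iops(ts):.0f}")
```

Changing `io_size_kb` shows how little the transfer time matters for small random I/Os: seek time and rotational latency dominate the service time.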
• Disks required to meet an application’s capacity need (DC):
DC = total capacity required / capacity of a single disk
• IOPS serviced by a disk (S) depends upon disk service time (TS):
S = 1/TS
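A rough sizing sketch that combines the capacity and performance views. It assumes the number of disks is the larger of the count needed for capacity and the count needed for peak IOPS, and that each disk is planned at no more than 70 percent utilization to keep response time acceptable. The function names and the workload figures are hypothetical, chosen only for illustration.

```python
import math


def disks_for_capacity(required_gb, disk_capacity_gb):
    """Disks needed to hold the data (DC), rounded up to whole disks."""
    return math.ceil(required_gb / disk_capacity_gb)


def disks_for_iops(peak_iops, service_time_ms, utilization=0.7):
    """Disks needed to serve the peak workload at the planned utilization."""
    iops_per_disk = utilization * 1000 / service_time_ms  # S scaled by utilization
    return math.ceil(peak_iops / iops_per_disk)


def disks_required(required_gb, disk_capacity_gb, peak_iops, service_time_ms,
                   utilization=0.7):
    """Overall requirement: whichever of the two constraints needs more disks."""
    return max(disks_for_capacity(required_gb, disk_capacity_gb),
               disks_for_iops(peak_iops, service_time_ms, utilization))


# Hypothetical workload: 2,000 GB and 1,500 peak IOPS on 300 GB disks with TS = 7.8 ms.
print(disks_for_capacity(2000, 300))
print(disks_for_iops(1500, 7.8))
print(disks_required(2000, 300, 1500, 7.8))
```

In this hypothetical case performance, not capacity, drives the disk count, which is common for random I/O workloads.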
Module 2: Summary
Key points covered in this module:
• Key data center elements
• Application and compute virtualization
• Disk drive components and performance
• Host access to storage
Exercise: Design Storage Solution for New
Application
• Scenario
─ Characteristics of the new application:
▪ Requires 1 TB of storage capacity
▪ Peak I/O workload of 4,900 IOPS
▪ Typical I/O size is 4 KB
─ Specifications of the available disk drives:
▪ 15K rpm drive with storage capacity = 100 GB
▪ Average seek time = 5 ms
▪ Data transfer rate = 40 MB/s
─ As it is a business-critical application, response time must be within
an acceptable range
• Task
─ Calculate the number of disks required for the application