Automatic Storage Management
Julian Dyke
Independent Consultant
Web Version - December 2008
1 © 2008 Julian Dyke juliandyke.com
Objectives
1. Understand how Oracle database files are stored in ASM
2. Calculate how long ASM rebalance operations will take
Agenda
ASM Instances
ASM Disk Groups
Metadata
Extent Distribution
Rebalancing
Redundancy
ASM Instances
ASM Single Instance Architecture
[Diagram: a single server running the OCSSD daemon (Oracle Clusterware only), an ASM instance and an RDBMS instance, connected to dedicated storage]
ASM Single Instance Background Processes
Oracle 11.1
Background processes: DBW0, LGWR, CKPT, SMON, PMON, VKTM, PSP0, MMAN, RBAL, GMON, X000, DIAG, DIA0
Memory areas: Fixed Area, Variable Area, ASM Cache
ASM RAC Architecture
[Diagram: four-node cluster (Node 1 to Node 4). Each node runs Oracle Clusterware, an ASM instance and an RDBMS instance. The nodes are connected by a public network and a private network, and reach shared storage over a storage network]
ASM RAC Architecture
[Diagram: two nodes running Clusterware with ASM instances +ASM1 and +ASM2. RDBMS instances PROD1 and TEST1 run alongside +ASM1, and PROD2 and TEST2 alongside +ASM2. The PROD and TEST database files are stored in ASM]
ASM RAC Instance Background Processes
Oracle 11.1
Background processes: LMON, LMS0, LMD0, LCK0, DBW0, LGWR, CKPT, SMON, PMON, VKTM, PSP0, MMAN, RBAL, GMON, X000, DIAG, DIA0, MARK, KATE
Memory areas: Fixed Area, Variable Area, ASM Cache
ASM Disk Groups
ASM Disk Groups and Disks
[Diagram: seven disks (Disk 1 to Disk 7) divided among Disk Group 1, Disk Group 2 and Disk Group 3]
ASM Disk Groups, Disks and Database Files
[Diagram: database files File 1 to File 6 stored in Disk Group 1, Disk Group 2 and Disk Group 3, which are built from Disk 1 to Disk 7]
File Extents versus Allocation Units
File Extent
Logical unit of ASM file
Maps to allocation units
One to many mapping
Allocation Unit
Physical unit of ASM disk
Oracle 10.2 and below
Always 1MB
Can be increased using _asm_ausize
Oracle 11.1 and above
Variable size
1MB, 2MB, 4MB, 8MB, 16MB, 32MB, 64MB
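As a rough illustration of how the allocation unit size affects the number of allocation units behind a file, the sketch below counts the AUs needed for one copy of a file. The function name allocation_units_needed and the 1GB example file are assumptions made for illustration; mirroring and metadata are ignored.

```python
MIB = 1024 * 1024

# AU sizes listed above for Oracle 11.1 and above, in MB
AU_SIZES_MIB = [1, 2, 4, 8, 16, 32, 64]


def allocation_units_needed(file_bytes, au_mib):
    """Return the number of AUs needed to hold one copy of file_bytes."""
    au_bytes = au_mib * MIB
    # Integer ceiling division: a partly filled AU still counts as one AU
    return -(-file_bytes // au_bytes)


one_gib = 1024 * MIB  # hypothetical 1GB datafile
for au_mib in AU_SIZES_MIB:
    print(au_mib, "MB AU:", allocation_units_needed(one_gib, au_mib), "AUs")
```

A larger allocation unit means fewer AUs (and fewer extent map entries) for the same file.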
X$KFFXP
Maps file extents to allocation units
Only populated in ASM instance
Columns include
Column Name Description
GROUP_KFFXP Disk Group Number
NUMBER_KFFXP File Number
COMPOUND_KFFXP Disk Group Number || File Number
INCARN_KFFXP Incarnation Number
PXN_KFFXP Physical Extent Number (within file)
XNUM_KFFXP Logical Extent Number (within file)
LXN_KFFXP 0=primary, 1=first mirror, 2=second mirror
DISK_KFFXP Disk Number
AU_KFFXP Allocation Unit Number (within disk)
SIZE_KFFXP Size (# allocation units)
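The table describes COMPOUND_KFFXP as the disk group number combined with the file number. The sketch below shows one plausible encoding, borrowing the layout Oracle documents for V$ASM_FILE.COMPOUND_INDEX (disk group number in the high-order 8 bits, file number in the low-order 24 bits). That X$KFFXP uses exactly this layout is an assumption, not something stated on the slide, and the function names are hypothetical.

```python
GROUP_SHIFT = 24                      # assumed: file number uses the low 24 bits
FILE_MASK = (1 << GROUP_SHIFT) - 1    # mask selecting the file number bits


def make_compound(group_number, file_number):
    """Combine a disk group number and a file number into one integer."""
    return (group_number << GROUP_SHIFT) | file_number


def split_compound(compound):
    """Split a compound value back into (group_number, file_number)."""
    return compound >> GROUP_SHIFT, compound & FILE_MASK


# Disk group 1, file 6 (the alias directory)
print(make_compound(1, 6))
print(split_compound(make_compound(1, 6)))
```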
ASM Metadata
Metadata is stored in first 256 files in ASM disk group
Space is initially allocated when disk group is created
Can be subsequently extended
Metadata allocation units are divided into blocks
Each block is 4096 bytes
Block size specified using _asm_blksize
Metadata files include
File# Description
0 Metadata Header
1 File Directory
2 Disk Directory
3 Active Change Directory
4 Continuing Operations Directory
5 Template Directory
6 Alias Directory
9 Attribute directory (optional)
12 Staleness registry (optional)
ASM Metadata
[Diagram: layout of ASM metadata. The metadata header comprises the disk header, partner status table, free space table and allocation table. It is followed by the file directory, disk directory, active change directory, continuing operations directory, template directory and alias directory]
ASM Metadata
Initial Allocation (Single Instance)
File#  AU  Description                                      # AUs
0      0   Disk Header, Free Space Table, Allocation Table  1
       1   Partner Status Table                             1
1          File Directory                                   1
2          Disk Directory                                   1
3          Active Change Directory                          42
4          Continuing Operations Directory                  2
5          Template Directory                               1
6          Alias Directory                                  1
Active Change Directory
Records changes to metadata
Used during recovery of instance or operation failures
Continuing Operations Directory
Maintains state of active operations
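The AU counts in the initial allocation table can be accumulated to see where each metadata file would start. This is a minimal sketch that assumes the files are laid out contiguously in file-number order on a single disk, which the slide does not state explicitly; the names are illustrative.

```python
# (file number, description, number of AUs) taken from the table above.
# File 0 covers two AUs: the disk header AU and the partner status table AU.
INITIAL_ALLOCATION = [
    (0, "Metadata Header", 2),
    (1, "File Directory", 1),
    (2, "Disk Directory", 1),
    (3, "Active Change Directory", 42),
    (4, "Continuing Operations Directory", 2),
    (5, "Template Directory", 1),
    (6, "Alias Directory", 1),
]


def first_au_by_file(allocation):
    """Return ({file number: first AU}, total AUs), assuming contiguous layout."""
    first_au = {}
    next_au = 0
    for file_number, _description, au_count in allocation:
        first_au[file_number] = next_au
        next_au += au_count
    return first_au, next_au


offsets, total_aus = first_au_by_file(INITIAL_ALLOCATION)
print(offsets)
print(total_aus)
```

Under that assumption the initial metadata occupies 50 AUs and the alias directory (file 6) lands in AU 49, which agrees with the allocation unit returned by the X$KFFXP query in the KFED example.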
ASM Metadata Block Types
Type Description Type Description
1 KFBTYP_DISKHEAD 13 KFBTYP_PST_NONE
2 KFBTYP_FREESPC 14 KFBTYP_HASHNODE
3 KFBTYP_ALLOCTBL 15 KFBTYP_COD_RBO
4 KFBTYP_FILEDIR 16 KFBTYP_COD_DATA
5 KFBTYP_LISTHEAD 17 KFBTYP_PST_META
6 KFBTYP_DISKDIR 18 KFBTYP_PST_DTA
7 KFBTYP_ACDC 19 KFBTYP_HBEAT
8 KFBTYP_CHNGDIR 20 KFBTYP_SR
9 KFBTYP_CODBGO 21 KFBTYP_STALEDIR
10 KFBTYP_TMPLTDIR 22 KFBTYP_VOLUMEDIR
11 KFBTYP_ALIASDIR 23 KFBTYP_ATTRDIR
12 KFBTYP_INDIRECT
KFED Utility
In Oracle 10.2 and above the kfed utility can be used to inspect and edit the
contents of ASM blocks
[oracle@server3 ~]$ $ORACLE_HOME/bin/kfed -h
as/mlib ASM Library [asmlib='lib']
aun/um AU number to examine or update [AUNUM=number]
aus/z Allocation Unit size in bytes [AUSZ=number]
blkn/um Block number to examine or update [BLKNUM=number]
blks/z Metadata block size in bytes [BLKSZ=number]
ch/ksum Update checksum before each write [CHKSUM=YES/NO]
cn/t Count of AUs to process [CNT=number]
d/ev ASM device to examine or update [DEV=string]
o/p KFED operation type
[OP=READ/WRITE/MERGE/NEW/FORM/FIND/STRUCT]
p/rovnm Name for provisioning purposes [PROVNM=string]
s/eek AU number to seek to [SEEK=number]
te/xt File name for translated block text [TEXT=string]
ty/pe ASM metadata block type number [TYPE=number]
This utility should only be used under the guidance of Oracle Support
KFED Utility
For example, to dump the blocks of the alias directory in DISKGROUP1:
Find group number
SELECT group_number FROM v$asm_diskgroup
WHERE name = 'DISKGROUP1';
Alias directory is stored in file number 6
SELECT disk_kffxp, au_kffxp FROM x$kffxp
WHERE group_kffxp = 1
AND number_kffxp = 6
AND lxn_kffxp = 0;
Disk Allocation Unit
0 49
Find disk name
SELECT path FROM v$asm_disk
WHERE group_number = 1
AND disk_number = 0;
Path
/dev/oracleasm/disks/VOL1
KFED Utility
Example (continued)
Allocation unit is 1MB
Block size is 4096
Therefore there are 256 blocks per allocation unit
Starting block offset = 256 * 49 = 12544
for (( f = 12544 ; f < 12544 + 256 ; f++ ))
do
kfed op=read blkn=$f dev='/dev/oracleasm/disks/VOL1' > blk${f}
done
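The block arithmetic in this example can be checked with a few lines of Python. The function name block_range is hypothetical; the 1MB allocation unit and 4096-byte block are the values used in the example.

```python
AU_SIZE = 1024 * 1024   # allocation unit size in bytes (1MB)
BLOCK_SIZE = 4096       # metadata block size in bytes


def block_range(au_number, au_size=AU_SIZE, block_size=BLOCK_SIZE):
    """Return (first block, last block) of an AU, counted from the start of the disk."""
    blocks_per_au = au_size // block_size
    first_block = au_number * blocks_per_au
    return first_block, first_block + blocks_per_au - 1


print(AU_SIZE // BLOCK_SIZE)   # blocks per allocation unit
print(block_range(49))         # blocks covered by allocation unit 49
```

For allocation unit 49 this gives blocks 12544 to 12799, the same range that the shell loop walks.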
Extent Distribution
Creating a disk group:
CREATE DISKGROUP diskgroup1
EXTERNAL REDUNDANCY
DISK '/dev/oracleasm/disks/VOL1';
Dropping a disk group:
DROP DISKGROUP diskgroup1
INCLUDING CONTENTS;
Extent Distribution
1 disk
[Diagram: Disk 0 holds the metadata followed by data extents 0 to 7]
Extent Distribution
2 disks
[Diagram: metadata on Disk 0. Disk 0 holds the even-numbered data extents 0 to 14 and Disk 1 holds the odd-numbered data extents 1 to 15]
Extent Distribution
4 disks
[Diagram: metadata on Disk 0. Data extents 0 to 31 are spread evenly, eight per disk: Disk 0 holds 0, 4, 8, ..., 28; Disk 1 holds 2, 6, 10, ..., 30; Disk 2 holds 1, 5, 9, ..., 29; Disk 3 holds 3, 7, 11, ..., 31]
Extent Distribution
1 large disk - 1 small disk
[Diagram: metadata on Disk 0. The large disk (Disk 0) holds data extents 0, 1, 3, 4, 6, 7, 9, 10 and the small disk (Disk 1) holds 2, 5, 8, 11]
Extent Distribution
1 large disk - 3 small disks
[Diagram: metadata on the large disk. The large disk holds data extents 0, 3, 5, 8, 10, 13, 15, 18; the three small disks hold 2, 7, 12, 17; 1, 6, 11, 16; and 4, 9, 14, 19]
Extent Distribution
2 large disks - 2 small disks
[Diagram: metadata on the first large disk. The two large disks hold data extents 0, 2, 6, 8, 12, 14, 18, 20 and 1, 4, 7, 10, 13, 16, 19, 22; the two small disks hold 3, 9, 15, 21 and 5, 11, 17, 23]
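The extent distribution slides suggest that each disk receives a share of extents proportional to its size. The sketch below is one simple way to model that behaviour, not Oracle's actual allocation algorithm: each extent goes to the disk that would be least full, relative to its capacity, after receiving it. The function distribute and the relative capacities are assumptions made for illustration.

```python
from fractions import Fraction


def distribute(extent_count, capacities):
    """Assign extents 0..extent_count-1 to disks in proportion to capacity.

    capacities holds relative disk sizes, e.g. [2, 1] for one large and
    one small disk. Returns one list of extent numbers per disk.
    """
    placement = [[] for _ in capacities]
    for extent in range(extent_count):
        # Fill ratio each disk would have after taking this extent;
        # ties go to the lowest-numbered disk.
        target = min(
            range(len(capacities)),
            key=lambda d: Fraction(len(placement[d]) + 1, capacities[d]),
        )
        placement[target].append(extent)
    return placement


print(distribute(12, [2, 1]))                          # 1 large, 1 small
print([len(d) for d in distribute(20, [2, 1, 1, 1])])  # 1 large, 3 small
```

With capacities 2:1 the model gives Disk 0 extents 0, 1, 3, 4, 6, 7, 9, 10 and Disk 1 extents 2, 5, 8, 11, as in the 1 large disk, 1 small disk case. For other configurations only the per-disk counts, not the exact ordering, should be expected to match.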
Rebalancing
Extent Distribution
Adding a disk:
ALTER DISKGROUP diskgroup1
ADD DISK '/dev/oracleasm/disks/VOL2'
REBALANCE POWER 0;
Dropping a disk:
ALTER DISKGROUP diskgroup1
DROP DISK 'DISKGROUP1_0002'
REBALANCE POWER 0;
Rebalancing a disk group:
ALTER DISKGROUP diskgroup1
REBALANCE POWER 1;
Rebalancing
Adding disks - 1 disk to 2 disks
[Diagram: data extents 0 to 7 start on Disk 0. After Disk 1 is added, extents 1, 3, 5 and 7 move to Disk 1]
Rebalancing
Adding disks - 1 disk to 4 disks
[Diagram: data extents 0 to 7 start on Disk 0. After Disk 1, Disk 2 and Disk 3 are added, the extents are redistributed across all four disks]
Rebalancing
Adding disks - 2 disks to 3 disks
[Diagram: data extents 0 to 17 start on Disk 0 and Disk 1. After Disk 2 is added, the extents are redistributed so that each of the three disks holds a third of them]
Rebalancing
Adding disks - 2 disks to 4 disks
[Diagram: data extents 0 to 15 start on Disk 0 and Disk 1. After Disk 2 and Disk 3 are added, the extents are redistributed across all four disks]
Rebalancing
Dropping disks - 3 disks to 1 disk
[Diagram: data extents 0 to 8 start spread across Disk 0, Disk 1 and Disk 2. After Disk 1 and Disk 2 are dropped, all extents reside on Disk 0]
Rebalancing
Moving disks - 2 disks to 2 disks
[Diagram: data extents 0 to 15 start on Disk 0 and Disk 1. Disk 1 is replaced by Disk 2, and the extents held on Disk 1 move to Disk 2]
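For disks of equal size, the add and drop examples above are consistent with a simple lower bound on the work a rebalance must do: only the share of extents that belongs on the added disks, or that is held on the dropped disks, has to move. The sketch below computes that fraction. It is an idealised estimate that does not cover replacing one disk with another, and the function names are assumptions made for illustration.

```python
import math
from fractions import Fraction


def min_fraction_moved(disks_before, disks_after):
    """Smallest fraction of extents that must move, for equal-sized disks."""
    if disks_after >= disks_before:
        # Adding disks: the new disks must end up with their even share
        return Fraction(disks_after - disks_before, disks_after)
    # Dropping disks: everything held on the dropped disks must move
    return Fraction(disks_before - disks_after, disks_before)


def min_extents_moved(extent_count, disks_before, disks_after):
    """Lower bound on the number of extents relocated by the rebalance."""
    return math.ceil(extent_count * min_fraction_moved(disks_before, disks_after))


print(min_extents_moved(8, 1, 2))    # 1 disk to 2 disks, 8 extents
print(min_extents_moved(18, 2, 3))   # 2 disks to 3 disks, 18 extents
print(min_extents_moved(9, 3, 1))    # 3 disks to 1 disk, 9 extents
```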
Rebalancing
V$ASM_OPERATION
Contains details of ongoing rebalance operations
Column Name Data Type
GROUP_NUMBER NUMBER
OPERATION CHAR(5)
STATE VARCHAR2(4)
POWER NUMBER
ACTUAL NUMBER
SOFAR NUMBER
EST_WORK NUMBER
EST_RATE NUMBER
EST_MINUTES NUMBER (estimate of remaining time)
ERROR_CODE VARCHAR2(44)
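The relationship between the progress columns can be sketched as follows, assuming SOFAR and EST_WORK are counted in allocation units and EST_RATE in allocation units per minute, as the Oracle reference describes for this view. Oracle computes EST_MINUTES itself; the function remaining_minutes is only an illustration of the arithmetic, and the sample values are made up.

```python
def remaining_minutes(sofar, est_work, est_rate):
    """Estimate minutes left in a rebalance from V$ASM_OPERATION-style values.

    sofar    - allocation units moved so far
    est_work - estimated allocation units to move in total
    est_rate - estimated allocation units moved per minute
    Returns None when no rate is available yet.
    """
    if est_rate <= 0:
        return None
    remaining_aus = max(est_work - sofar, 0)
    # Round up: a partly used minute still counts as a minute
    return -(-remaining_aus // est_rate)


# Hypothetical snapshot: 2500 of 10000 AUs moved at 500 AUs per minute
print(remaining_minutes(2500, 10000, 500))
```

Sampling the view several times and recomputing the estimate shows how stable the rate is once caches have warmed up.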
Rebalancing
Power Limit
Power limit can be 0 to 11
0 disables rebalance operation
1 to 11 specifies number of ARBn background processes used for rebalance
In Oracle 10.2
RBAL manages rebalance operation
Each ARBn background process is allocated a range of 128 allocation units to rebalance
When complete another range is requested
AD lock is taken while an allocation unit is being rebalanced
Rebalance operations take much longer than theoretically necessary.
Possible reasons include:
Locking
GES updates with other ASM instances
Updates to RDBMS instance
Rebalancing
Summary
EST_MINUTES column of V$ASM_OPERATION is reasonably accurate
Allow a few minutes for SAN cache to stabilize
Check regularly for changes to estimate
ASM rebalance operations do not affect workload
Locks are only taken briefly
Lock mechanism has changed in Oracle 11.1
SAN cache and I/O performance will be affected
In Oracle 10.2 rebalancing is fastest if
Other ASM instances are shut down
RDBMS instance is shut down
Estimated completion time will be affected by:
Use of SAN cache and I/O by rest of workload
Rate of change by applications to blocks in ASM files being rebalanced
Redundancy
ASM supports three levels of redundancy
External Redundancy
Implemented externally using storage layer
Most common configuration in production
Normal Redundancy
Two copies of each extent maintained in separate failure groups
Used with extended clusters
Used occasionally in production e.g. CERN
Increases CPU overhead on servers
High Redundancy
Three copies of each extent maintained in separate failure groups
Very rare in production
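The space cost of each redundancy level follows from the number of copies kept of each extent. A minimal sketch, assuming equal-sized disks and ignoring metadata and any free space reserved for re-mirroring after a failure; usable_capacity_mb is a hypothetical helper, not an Oracle function.

```python
# Copies of each extent kept at each redundancy level
COPIES = {"EXTERNAL": 1, "NORMAL": 2, "HIGH": 3}


def usable_capacity_mb(raw_mb, redundancy):
    """Approximate space available for data, given raw disk group capacity."""
    return raw_mb // COPIES[redundancy.upper()]


# Hypothetical disk group built from six 100GB disks
raw_mb = 6 * 100 * 1024
for level in ("EXTERNAL", "NORMAL", "HIGH"):
    print(level, usable_capacity_mb(raw_mb, level), "MB")
```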
ASM Failure Groups - External Redundancy
[Diagram: a single disk group containing Disk 1, Disk 2 and Disk 3]
ASM Failure Groups - Normal Redundancy
[Diagram: a disk group with two failure groups. Failure Group 1 contains Disk 1, Disk 2 and Disk 3; Failure Group 2 contains Disk 4, Disk 5 and Disk 6]
ASM Failure Groups - High Redundancy
[Diagram: a disk group with three failure groups (Failure Group 1, Failure Group 2, Failure Group 3), each containing two disks labelled Disk 1 and Disk 2]
Normal Redundancy
1 Disk Per Failure Group
[Diagram: Failure Group 1 (Disk 0) and Failure Group 2 (Disk 1). Every metadata extent (0 to 3) and every data extent (0 to 11) appears on both disks, as a primary copy on one and a secondary copy on the other]
Normal Redundancy
2 Disks per Failure Group
[Diagram: Failure Group 1 (Disk 0, Disk 1) and Failure Group 2 (Disk 2, Disk 3). Metadata extents 0 to 7 and data extents 0 to 23 are spread across the two disks of each failure group. Each extent has a primary copy in one failure group and a secondary copy in the other]
High Redundancy
1 Disk per Failure Group
[Diagram: Failure Group 1 (Disk 0), Failure Group 2 (Disk 1) and Failure Group 3 (Disk 2). Every metadata extent (0 to 5) and every data extent (0 to 9) appears on all three disks, as primary, secondary and tertiary copies]
References
Oracle Automatic Storage Management (Oracle Press) - Nitin Vengurlekar, Murali Vallath, Rich Long
What ASM and ZFS Can Do For You - Jason Arneil, Nominet
A Closer Look Inside Oracle ASM - Luca Canali, CERN
Implementing ASM Without HW Raid - Luca Canali, CERN
Thank you for listening