Recovery System:
Failure classification, Storage structure, Data access,
Recovery and atomicity, Checkpoints
Introduces the structures that
are used during database
recovery and describes the
Recovery Manager utility, which
simplifies backup and recovery
operations.
An Introduction to Database Recovery
Structures Used for Database Recovery
Rolling Forward and Rolling Back
Improving Recovery Performance
Recovery Manager
Database Archiving Modes
Control Files
Database Backups
Survivability
An Introduction to Database Recovery
A major responsibility of the database administrator is to prepare for the
possibility of hardware, software, network, process, or system failure. If
such a failure affects the operation of a database system, you must usually
recover the database and return to normal operation as quickly as
possible.
Recovery should protect the database and associated users from
unnecessary problems and avoid or reduce the possibility of having to
duplicate work manually.
Recovery processes vary depending on the type of failure that occurred,
the structures affected, and the type of recovery that you perform. If no
files are lost or damaged, recovery may amount to no more than restarting
an instance. If data has been lost, recovery requires additional steps.
Errors and Failures
Several problems can halt the normal operation of an Oracle database or affect database I/O to disk. The
following sections describe the most common types. For some of these problems, recovery is automatic and
requires little or no action on the part of the database user or database administrator.
User Error
A database administrator can do little to prevent user errors (for example, accidentally dropping a table). Usually,
user error can be reduced by increased training on database and application principles. Furthermore, by
planning an effective recovery scheme ahead of time, the administrator can ease the work necessary to recover
from many types of user errors.
Statement Failure
Statement failure occurs when there is a logical failure in the handling of a statement in an Oracle program. For
example, assume all extents of a table (in other words, the number of extents specified in the MAXEXTENTS
parameter of the CREATE TABLE statement) are allocated, and are completely filled with data; the table is
absolutely full. A valid INSERT statement cannot insert a row because there is no space available. Therefore, if
issued, the statement fails.
If a statement failure occurs, the Oracle software or operating system returns an error code or message. A
statement failure usually requires no action or recovery steps; Oracle automatically corrects for statement failure
by rolling back the effects (if any) of the statement and returning control to the application. The user can simply
re-execute the statement after correcting the problem indicated by the error message.
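The statement-level rollback described above can be illustrated with a small sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in, not an Oracle database, and the table emp and its rows are hypothetical. A multi-row INSERT fails on its second row; the database backs out that statement only, the earlier work in the transaction survives, and the corrected statement can then be re-executed.

```python
import sqlite3


def statement_failure_demo():
    # isolation_level=None: the sqlite3 module opens no implicit transactions,
    # so the BEGIN and COMMIT below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO emp VALUES (1, 'alice')")

    error = None
    try:
        # The second row duplicates primary key 1, so the whole statement fails.
        conn.execute("INSERT INTO emp VALUES (2, 'bob'), (1, 'duplicate')")
    except sqlite3.IntegrityError as exc:
        error = str(exc)

    # Only the failed statement was undone; the first insert is still present.
    rows_after_failure = conn.execute(
        "SELECT id, name FROM emp ORDER BY id").fetchall()

    # Correct the problem and re-execute within the same transaction.
    conn.execute("INSERT INTO emp VALUES (2, 'bob')")
    conn.execute("COMMIT")
    rows_final = conn.execute(
        "SELECT id, name FROM emp ORDER BY id").fetchall()
    conn.close()
    return error, rows_after_failure, rows_final
```

Calling statement_failure_demo() returns the error text, the rows visible right after the failure, and the rows after the corrected statement commits.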
Process Failure
A process failure is a failure in a user, server, or background process of a database instance (for example, an abnormal
disconnect or process termination). When a process failure occurs, the failed subordinate process cannot continue
work, although the other processes of the database instance can continue.
The Oracle background process PMON detects aborted Oracle processes. If the aborted process is a user or server
process, PMON resolves the failure by rolling back the current transaction of the aborted process and releasing any
resources that this process was using. Recovery of the failed user or server process is automatic. If the aborted
process is a background process, the instance usually cannot continue to function correctly. Therefore, you must shut
down and restart the instance.
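The cleanup that PMON performs for an aborted user or server process can be sketched as a toy model. Everything here is an assumption made for illustration: the class names ToyInstance and Session, the in-memory dictionary standing in for data blocks, and the monitor_pass method are hypothetical and do not correspond to Oracle internals. The sketch shows only the two actions named above: rolling back the dead process's current transaction and releasing its resources.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "key did not exist before the update"


@dataclass
class Session:
    pid: int
    alive: bool = True
    locks: set = field(default_factory=set)
    undo: list = field(default_factory=list)  # (key, old_value), oldest first


class ToyInstance:
    def __init__(self):
        self.data = {}        # stand-in for database contents
        self.lock_owner = {}  # key -> pid holding the row lock
        self.sessions = {}

    def connect(self, pid):
        self.sessions[pid] = Session(pid)
        return self.sessions[pid]

    def update(self, pid, key, value):
        session = self.sessions[pid]
        owner = self.lock_owner.get(key)
        if owner is not None and owner != pid:
            raise RuntimeError(f"{key!r} is locked by process {owner}")
        self.lock_owner[key] = pid
        session.locks.add(key)
        session.undo.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def _release(self, session):
        for key in session.locks:
            self.lock_owner.pop(key, None)
        session.locks.clear()

    def monitor_pass(self):
        """Roll back and clean up every session whose process has died."""
        cleaned = []
        for pid, session in list(self.sessions.items()):
            if session.alive:
                continue
            # Undo the uncommitted changes, newest first.
            for key, old in reversed(session.undo):
                if old is _MISSING:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
            self._release(session)
            del self.sessions[pid]
            cleaned.append(pid)
        return cleaned
```

After a session is marked dead (alive = False), one call to monitor_pass() restores the old values and frees the locks, so other sessions can update the same keys again.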
Network Failure
When your system uses networks (such as local area networks or phone lines) to connect client
workstations to database servers, or to connect several database servers to form a distributed database system,
network failures (such as aborted phone connections or network communication software failures) can interrupt the
normal operation of a database system. For example:
A network failure might interrupt normal execution of a client application and cause a process failure to occur. In this
case, the Oracle background process PMON detects and resolves the aborted server process for the disconnected
user process, as described in the previous section.
A network failure might interrupt the two-phase commit of a distributed transaction. Once the network problem is
corrected, the Oracle background process RECO of each involved database server automatically resolves any
distributed transactions not yet resolved at all nodes of the distributed database system.
Database Instance Failure
Database instance failure occurs when a problem arises that prevents an
Oracle database instance (SGA and background processes) from
continuing to work.
An instance failure can result from a hardware problem, such as a power
outage, or a software problem, such as an operating system crash.
Instance failure also results when you issue a SHUTDOWN ABORT or
STARTUP FORCE command.
Recovery from Instance Failure
Crash or instance recovery recovers a database to its transaction-
consistent state just before instance failure. Crash recovery recovers a
database in a single-instance configuration and instance recovery recovers
a database in an Oracle Parallel Server configuration.
Recovery from instance failure is automatic. For example, when using the
Oracle Parallel Server, another instance performs instance recovery for the
failed instance. In single-instance configurations, Oracle performs crash
recovery for a database when the database is restarted (mounted and
opened to a new instance). The transition from a mounted state to an
open state automatically triggers crash recovery, if necessary.
Crash or instance recovery consists of the following steps:
1. Rolling forward to recover data that has not been recorded in the datafiles, yet has been recorded
in the online redo log, including the contents of rollback segments. This is called cache recovery.
2. Opening the database. Instead of waiting for all transactions to be rolled back before making the
database available, Oracle allows the database to be opened as soon as cache recovery is
complete. Any data that is not locked by unrecovered transactions is immediately available.
3. Marking all transactions system-wide that were active at the time of failure as DEAD and marking
the rollback segments containing these transactions as PARTLY AVAILABLE.
4. Rolling back dead transactions as part of SMON recovery. This is called transaction recovery.
5. Resolving any pending distributed transactions undergoing a two-phase commit at the time of the
instance failure.
As new transactions encounter rows locked by dead transactions, they can automatically roll back
the dead transaction to release the locks. If you are using Fast-Start Recovery, just the data block is
immediately rolled back, as opposed to the entire transaction.
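The roll-forward and roll-back phases can be sketched with a toy redo log. This is a simplified model under stated assumptions, not Oracle's actual algorithm: the record layout ("update", txn, key, old, new) and ("commit", txn), the function name crash_recover, and the dictionary standing in for the datafiles are all hypothetical. The sketch also assumes row locking, so two transactions that were active at the same time never changed the same key.

```python
def crash_recover(datafiles, redo_log):
    """Return (recovered_data, dead_transactions) for a toy database.

    datafiles: dict of key -> value as found on disk after the crash.
    redo_log:  list of ("update", txn, key, old, new) or ("commit", txn).
    """
    db = dict(datafiles)

    # Cache recovery (roll forward): reapply every logged change in order,
    # committed or not. This also rebuilds the undo (old value) information.
    undo = {}
    committed = set()
    for record in redo_log:
        if record[0] == "update":
            _, txn, key, old, new = record
            db[key] = new
            undo.setdefault(txn, []).append((key, old))
        elif record[0] == "commit":
            committed.add(record[1])

    # Transactions with changes but no commit record were active at the
    # time of failure; mark them as dead.
    dead = [txn for txn in undo if txn not in committed]

    # Transaction recovery (roll back): undo each dead transaction's
    # changes, newest first.
    for txn in dead:
        for key, old in reversed(undo[txn]):
            db[key] = old

    return db, dead


# On-disk state at the crash: T1's committed change to "a" was not yet
# written, while T2's uncommitted change to "b" already was.
on_disk = {"a": 1, "b": 20}
log = [
    ("update", "T1", "a", 1, 10),
    ("commit", "T1"),
    ("update", "T2", "b", 2, 20),
]
recovered, dead = crash_recover(on_disk, log)
```

In this example the committed change to "a" is restored by rolling forward, and the uncommitted change to "b" is removed by rolling back the dead transaction T2. The sketch runs both phases back to back; it does not model opening the database between them or resolving distributed transactions.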
Media (Disk) Failure
An error can arise when trying to write or read a file that is required to
operate an Oracle database. This occurrence is called media failure
because there is a physical problem reading or writing to files on the
storage medium.
A common example of media failure is a disk head crash, which causes the
loss of all files on a disk drive. All files associated with a database are
vulnerable to a disk crash, including datafiles, online redo log files, and
control files.
The appropriate recovery from a media failure depends on the files
affected.
How Media Failures Affect Database Operation
Media failures can affect one or all types of files necessary for the operation of an Oracle
database, including datafiles, online redo log files, and control files.
Database operation after a media failure of online redo log files or control files depends on
whether the online redo log or control file is multiplexed, as recommended. A multiplexed online
redo log or control file simply means that a second copy of the file is maintained. If a media
failure damages a single disk, and you have a multiplexed online redo log, the database can
usually continue to operate without significant interruption. Damage to a non-multiplexed online
redo log causes database operation to halt and may cause permanent loss of data. Damage to
any control file, whether it is multiplexed or non-multiplexed, halts database operation once
Oracle attempts to read or write the damaged control file (which happens frequently, for example
at every checkpoint and log switch).
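The benefit of a multiplexed online redo log can be sketched as follows. This is a toy model, not how Oracle writes its log: the class MultiplexedLog, its members and failed attributes, and the plain text files used as log members are assumptions for illustration. Each record is written to every surviving member; work continues as long as at least one copy succeeds and halts only when none does. The sketch covers the redo log case only, since damage to a control file halts operation even when it is multiplexed.

```python
import os
import tempfile


class MultiplexedLog:
    """Toy log that keeps identical copies (members) on separate paths."""

    def __init__(self, paths):
        self.members = list(paths)
        self.failed = set()  # members that have hit an I/O error

    def append(self, record):
        written = 0
        for path in self.members:
            if path in self.failed:
                continue
            try:
                with open(path, "a", encoding="utf-8") as handle:
                    handle.write(record + "\n")
                written += 1
            except OSError:
                # Simulated media failure on this member: stop using it.
                self.failed.add(path)
        if written == 0:
            raise RuntimeError("all log members failed; operation must halt")
        return written


def multiplex_demo():
    with tempfile.TemporaryDirectory() as root:
        good = os.path.join(root, "redo_a.log")
        # A path in a directory that does not exist stands in for a lost disk.
        bad = os.path.join(root, "missing_disk", "redo_b.log")

        log = MultiplexedLog([good, bad])
        copies = log.append("change 1")
        with open(good, encoding="utf-8") as handle:
            surviving = handle.read()

        halted = False
        try:
            MultiplexedLog([bad]).append("change 2")
        except RuntimeError:
            halted = True
        return copies, surviving, halted
```

multiplex_demo() writes one record with one member lost, then shows that a log whose only member is lost cannot accept the write.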
Media failures that affect datafiles can be divided into two categories: read errors and write
errors. In a read error, Oracle discovers it cannot read a datafile and an operating system error is
returned to the application, along with an Oracle error indicating that the file cannot be found,
cannot be opened, or cannot be read. Oracle continues to run, but the error is returned each
time an unsuccessful read occurs.
Structures Used for Database Recovery
Several structures of an Oracle database safeguard data against
possible failures.
This section introduces each of these structures and its role in
database recovery.
Database Backups
A database backup consists of backups of the physical files (all datafiles and
a control file) that constitute an Oracle database. To begin media recovery
after a media failure, Oracle uses file backups to restore damaged datafiles
or control files. Replacing a current, possibly damaged, copy of a datafile,
tablespace, or database with a backup copy is called restoring that portion
of the database.
Oracle offers several options in performing database backups, including:
Recovery Manager
Operating system utilities
Export utility
Enterprise Backup Utility
Five Most Common Database Challenges
1. Managing Scalability with Growing Data Volumes
2. Maintaining Database Performance
3. Database Access Concerns
4. Misconfigured or Incomplete Security
5. Data Integration and Quality Problems
What are the security challenges in a DBMS?
They include issues such as data quality, intellectual property rights, and
database survivability.