All technological notes.
The database administrator (DBA) is typically responsible for ensuring that the database is open and available when users need it. To achieve that goal, the DBA (working with the system administrator):
Increase the mean time between failures (MTBF).
Real Application ClustersOracle Data GuardDecrease the mean time to recover (MTTR).
Separation of DBA Duties
The SYSBACKUP administrative privilege:
SELECT ANY TABLESYSBACKUP user that is created during database installationRMAN connections by a SYSBACKUP privileged user# Explicitly connect with the SYSDBA role:
$ rman target "'/ as sysbackup'"
# connected to target database: ORCL (DBID=1297344416)
# For backward compatibility
rman target / connects as SYSDBA.
Identify and prioritize critical data. 需求+错误类型
Base recovery requirements on data criticality.(first determine how critical a database that is lost or unavailable is to your business.)
Recovery Point Objective (RPO):
data lossdata warehousing reporting system that can tolerate 12 hours worth of lost data, because batch loads can be rerun with a few hours of downtime that is acceptable to the user community. A disk and tape backup strategy may be appropriate for this type of system.Recovery Time Objective (RTO):
Oracle Data Guard.backup retention policy for on-site, off-site, and long-term backups.Assess data protection requirements.
Failures can generally be divided into the following categories:
| Typical Problems | Possible Solutions |
|---|---|
| Attempts to enter invalid data into a table | Work with users to validate and correct data. |
| Attempts to perform operations with insufficient privileges | Provide the appropriate object or system privileges. |
| Attempts to allocate space that fails | Enable resumable space allocation. Increase owner quota. Add space to the tablespace. |
| Logic errors in applications | Work with developers to correct program errors. |
| Typical Problems | Possible Solutions |
|---|---|
| A user performs an abnormal disconnect. | A DBA’s action is not usually needed to resolve user process failures. |
| A user’s session is abnormally terminated. | Instance background processes roll back uncommitted changes and release locks. |
| A user experiences a program error that terminates the session. | The DBA should watch for trends. |
User processes that abnormally disconnect from the instance may have uncommitted work in progress that needs to be rolled back.The Process Monitor (PMON) background process periodically polls server processes to ensure that their sessions are still connected.
A DBA’s intervention should not be required to recover from user process failure, but the administrator must watch for trends.
| Typical Problems | Possible Solutions |
|---|---|
| Listener fails | Configure a backup listener and connect-time failover. |
| Network interface card (NIC) fails | Configure multiple network cards. |
| Network connection fails | Configure a backup network connection. |
listeners, network connections, and network interface cards reduce the chance that network failures will affect system availability.| Typical Problems | Possible Solutions |
|---|---|
| User inadvertently(疏忽地) deletes or modifies data | Roll back a transaction and dependent transactions or rewind the table |
| User drops a table | Recover the table from recycle bin; Recover the table from a backup |
Oracle LogMiner to query your online redo logs and archived redo logs through an Enterprise Manager or SQL interface.online redo logs longer than it persists in undo segments; if you have configured archiving of redo information, redo persists until you delete the archived files.Users who drop a table can recover it from the recycle bin by flashing back the table to before the drop. 回收垃圾箱
PURGE option, the dropped table can still be recovered by using point-in-time recovery (PITR) if the database has been properly configured. 也可以用 PITRRMAN enables you to recover one or more tables or table partitions to a specified point in time
without affecting the remaining database objects.LogMinerInstance failure
Typical Causes
SHUTDOWN ABORT and STARTUP FORCE)Possible Solutions
STARTUP command. Recovering from instance failure is automatic, including rolling forward changes in the redo logs and then rolling back any uncommitted transactions.alert log, trace files, and Enterprise Manager.Administrator involvement in recovering from instance failure is rarely required if Oracle Restart is enabled and is monitoring your database.
media failure
Typical Causes:
Possible Solution
redo information (if necessary).checkpoint (CKPT) procossRodo log files and the Log Writer (LGWR) process
CKPT is responsible for:
data file headers with checkpoint informationcontrol files with checkpoint informationDBWn at full checkpointsincremental checkpoint:
CKPT process stores data in a control file to document the modified data blocks that DBWn has written from the SGA to disk.The purpose of a checkpoint is to identify that place in the online redo log file where instance recovery is to begin(checkpoint position).
log switch, the CKPT process also writes this checkpoint information to the headers of data files.Checkpoints exist for the following reasons:
online redo log file entries following the last checkpoint need to be processed for recovery)The checkpoint information written by the CKPT process includes:
system change number (SCN),online redo log file to begin recovery,Note: The CKPT process does not write data blocks to the disk or redo blocks to the online redo log files.
Redo log files
transactions and internal Oracle server actions.
transaction: a logical unit of work consisting of one or more SQL statements.Log Writer (LGWR) writes:
DBWn writesredo log consists of groups of redo log files.
redo log file and its multiplexed copies.
Log Writer (LGWR) process
redo log buffer to all members of a redo log group until the files are filled or a log switch operation is requested.一个组一个组地写入, 写入一个组中的所有文件Best practice tip: If possible, multiplexed redo log files should reside on different disks.

As modifications are made to data in the database, the “old” data is stored in the undo tablespace
and the new change details in online redo log files (as shown in the graphic).
The database maintains online redo log files to protect against data loss.
Because the online redo log files are reused in a circular fashion, there is a protocol for controlling when one is allowed to be reused.
In ARCHIVELOG mode, the database only begins writing to an online redo log file if it has been archived.This ensures that every redo log file has a chance to be archived.
NOARCHIVELOG mode-if, of course, the database is not open.NOARCHIVELOG mode, you must restore all data files before executing a recovery operation.
ARCHIVELOG mode.Use cases
| NOARCHIVELOG mode | ARCHIVELOG mode |
|---|---|
| Closed database | Open database |
| Recovery to the last backup | Recovery to last committed transaction |
| Suitable for training and test environments, for data warehouses with infrequent loads | Suitable for production environments |

Involves two distinct operations:
control files and then attempts to open the data files.When it discovers that the data files have not been synchronized during shutdown, the instance uses information contained in the redo log groups to roll the data files forward to the time of shutdown.
Crash recovery
online redo records to a database after either a single instance database crashes or all instances of an Oracle Real Applications Cluster configuration crash.redo from the online logs; 只需要 online redoarchived redo logs are not required.Instance recovery:
Oracle RAC configuration, the application of redo data to an open database by an instance when this instance discovers that another instance has crashed
For an instance to open a data file, the system change number (SCN) contained in the data file’s header must match the current SCN that is stored in the database’s control files.
online redo logs, sequentially “redoing” transactions until the data files are up-to-date.
data files have been synchronized with the control files, the database is opened, and users can log in.
unmatch -> redo -> open -> rollback -> committed data
checkpoint position and the end of the redo log must be applied to data files.You tune instance recovery by controlling the difference between the checkpoint position and the end of the redo log.
Transaction information
redo log groups before the instance returns “commit complete” for a transaction.先于返回信息redo log groups guarantees that the transaction can be recovered in case of a failure.在 redo log 中的信息保证可以恢复must also be written to the data file.
redo log groups
data file write process is much slower than the redo writes.checkpoint process records information in the control file about the checkpoint position in the redo log.

The time required for instance recovery is the time required to bring data files from their last checkpoint to the latest SCN recorded in the control file.
MTTR target (in seconds) and through the sizing of redo log groups.
0 (disabled).The maximum value is 3,600 seconds (one hour).
FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery
time from instance or system failure.The MTTR Advisor converts the FAST_START_MTTR_TARGET value into several parameters to enable instance recovery in the desired time (or as close to it as possible).
FAST_START_MTTR_TARGET parameter to 0 disables the MTTR Advisor.FAST_START_MTTR_TARGET parameter must be set to a value that supports the service level agreement for your system.
Enterprise Manager Cloud Control, navigate as follows:
recovery unit includes two major types of activities: restoring and recovering.
Restoring a file:
RMAN uses the records of available backup sets or image copies in the RMAN repository to select the best available backups. - If two backups are from the same point in time, then RMAN prefers image copies over backup sets because RMAN can restore them more quickly (similar for disk versus tape). -RMAN automatically uses restore failover to skip corrupted or inaccessible backups and look for usable backups.RMAN skips restoring a data file if the file is present in the correct location and its header contains the expected information, and so on.Recovering the file entails applying redo such that the state of the file is brought forward in time, to whatever point you want.

Recovery can have two kinds of scope:
Complete recovery:
database or tablespace up to the present, including all committed data changes made to the point in time when the recovery was requestedIncomplete or point-in-time recovery (PITR):
database or tablespace up to a specified point in time in the past, before the recovery operation was requested
The following steps describe what takes place during complete recovery:
incremental backups, archived redo log files, and online redo log files are applied as necessary.
redo log changes are applied to the data files until the current online log is reached and the most recent changes have been reapplied.Undo blocks are generated during this entire process.rolling forward or cache recovery.data files may now contain committed and uncommitted changes.undo blocks are used to roll back any uncommitted changes.
transaction recovery.data files are now in a recovered state and are consistent with the other data files in the database.Incomplete recovery, or database point-in-time recovery (DBPITR)
redo records generated after the most recent backup.To perform point-in-time recovery, you need:
backup of all the data files made before the recovery pointarchived logs from the time of the backup until the specified time of recovery
The steps to perform a point-in-time recovery are as follows:
Restore the data files from backup:
OS commands or using the RMAN RESTORE command.Use the RECOVER command:
redo from the archived redo log files, including as many as necessary to reach the restore point destination.State of over-recovery:
Use the ALTER DATABASE OPEN command:
Apply undo data:
Process complete:
Oracle Flashback Database is the most efficient alternative to DBPITR.
data files to their contents at a past timeDBPIIR, including the OPEN RESETLOGS,Flashback Database is typically faster
data files and requires only limited application of redo compared to media recovery.| Backup and Recovery Objective | Recovery Time Objective (RTO) | Oracle Solution |
|---|---|---|
| Physical data protection | Hours/Days | Recovery Manager; Oracle Secure Backup |
| Logical data protection | Minutes/Hours | Flashback Technologies |
| Recovery analysis | Minimize time for problem; identification and recovery planning | Data Recovery Advisor |
| Disaster Recovery Objective | Recovery Time Objective (RTO) | Oracle Solution |
|---|---|---|
| Physical data protection | Seconds/Minutes | Data Guard;Active Data Guard |
RTO:
Oracle Recovery Manager (RMAN)
Oracle Secure Backup (OSB):
Oracle Database Flashback technologies:
Data Recovery Advisor:
Data Guard and Active Data Guard:
standby databases to be open for read access while being kept synchronized with the production database through media recovery.Oracle Flashback technology:
Use Flashback technology for:
For error analysis:
For users who have committed erroneous changes, use the following to analyze the errors:
Flashback Query:
SELECT command with the AS OF clause references a time in the past through a time stamp or system change number (SCN).Flashback Version Query:
VERSIONS BETWEEN clause of the SELECT command (for performance reasons with existing indexes).Flashback Transaction Query:
For error recovery:
Possible solutions to recover from user error:
Flashback Transaction Backout:
Flashback Table: