Backup and recovery tuning requires a good understanding of the hardware and
software in use, such as disk speed, I/O, buffering, and the media management
layer (MML) used for tape backups. Many factors can affect backup performance,
and finding the solution to a slow backup is often a process of trial and
error. To get the best performance for a backup, follow these suggested steps:
Step 1: Remove RATE Parameters from Configured
and Allocated Channels
The RATE parameter on
a channel is intended to reduce, rather than increase, backup throughput,
so that more disk bandwidth is available for other database operations.
If your backup is not streaming to tape, make sure that the RATE parameter is
not set on the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands.
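For example, the following RMAN sketch (the sbt device type and the channel name t1 are illustrative) shows how to review the configuration and remove any RATE setting:

  SHOW ALL;                                   # review configured channel settings for a RATE clause
  CONFIGURE CHANNEL DEVICE TYPE sbt CLEAR;    # reset a configured channel that carries a RATE limit
  RUN {
    ALLOCATE CHANNEL t1 DEVICE TYPE sbt;      # channel allocated without a RATE clause
    BACKUP DATABASE;
  }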
Step 2: Consider Using I/O Slaves
- If You
Use Synchronous Disk I/O, Set DBWR_IO_SLAVES
If and only if your disk does not support asynchronous I/O, then try setting
the DBWR_IO_SLAVES initialization parameter to a nonzero value. Any nonzero
value for DBWR_IO_SLAVES causes a fixed number (four) of disk I/O slaves to be
used for backup and restore, which simulates asynchronous I/O.
If I/O slaves are
used, I/O buffers are obtained from the SGA. The large pool is used, if
configured. Otherwise, the shared pool is used.
Note: Setting DBWR_IO_SLAVES causes the database writer processes to use I/O
slaves as well, so you may need to increase the value of the PROCESSES
initialization parameter.
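As an illustrative sketch (the values are assumptions to adapt to your system; both parameters are static, so they are set in the spfile and take effect after a restart):

  -- set in SQL*Plus as SYSDBA; restart the instance afterwards
  ALTER SYSTEM SET dbwr_io_slaves = 4 SCOPE=SPFILE;
  ALTER SYSTEM SET processes = 300 SCOPE=SPFILE;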
- Use Tape I/O Slaves to Keep the Tape Streaming (Continually Moving) by Simulating Asynchronous I/O
Set the "init.ora" parameter:
BACKUP_TAPE_IO_SLAVES = true
This causes one tape I/O slave to be assigned to each channel server process.
In 8i/9i/10g, if the DUPLEX option is specified, then tape I/O slaves must be
enabled. In this case, for DUPLEX=<n>, there are <n> tape slaves per channel.
These <n> slaves all operate on the same four output buffers, so a buffer is
not freed until all <n> slaves have finished writing it to tape.
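A minimal sketch of enabling tape I/O slaves and taking a duplexed backup (the sbt channel and the number of copies are illustrative; DEFERRED makes the change apply to sessions started after it):

In SQL*Plus (as SYSDBA):
  -- enable tape I/O slaves for sessions started after this change
  ALTER SYSTEM SET backup_tape_io_slaves = TRUE DEFERRED;

In RMAN:
  # duplexed backup: with tape I/O slaves enabled, one slave per copy is used on each channel
  BACKUP DEVICE TYPE sbt COPIES 2 DATABASE;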
Step 3: If You Fail to Allocate Shared Memory,
Set LARGE_POOL_SIZE
Set this
initialization parameter if the database reports an error in the alert.log
stating that it does not have enough memory and that it will not start I/O
slaves.
The message should resemble the following:
ksfqxcre: failure to
allocate shared memory means sync I/O will be used whenever async I/O to file
not supported native
When attempting to get shared buffers for I/O slaves, the database does the
following:
* If LARGE_POOL_SIZE is set, then the database attempts to get memory from the
large pool. If this value is not large enough, then an error is recorded in the
alert log, the database does not try to get buffers from the shared pool, and
asynchronous I/O is not used.
* If LARGE_POOL_SIZE
is not set, then the database attempts to get memory from the shared pool.
* If the database
cannot get enough memory, then it obtains I/O buffer memory from the PGA and
writes a message to the alert.log file indicating that synchronous I/O is used
for this backup.
The memory from the large pool is used for many features, including the shared
server (formerly called multi-threaded server), parallel query, and RMAN I/O
slave buffers. Configuring the large pool prevents RMAN from competing with
other subsystems for the same memory.
Requests for contiguous memory allocations from the shared pool are usually
small (under 5 KB) in size. However, it is possible that a request for a large
contiguous memory allocation can either fail or require significant memory
housekeeping to release the required amount of contiguous memory. Although the shared pool may
be unable to satisfy this memory request, the large pool is able to do so. The
large pool does not have a least recently used (LRU) list; the database does
not attempt to age memory out of the large pool.
Use the LARGE_POOL_SIZE initialization parameter to configure the large pool.
To see in which pool (shared pool or large pool) the memory for an object
resides, query V$SGASTAT.POOL.
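For example, a quick look at what currently resides in the large pool (a sketch; run it as a privileged user):

  SELECT pool, name, bytes
    FROM v$sgastat
   WHERE pool = 'large pool'
   ORDER BY bytes DESC;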
The formula for setting LARGE_POOL_SIZE is as follows:
LARGE_POOL_SIZE = number_of_allocated_channels * (16 MB + (4 * size_of_tape_buffer))
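For instance, assuming two allocated tape channels and a 256 KB tape buffer (both values are illustrative), the calculation and the corresponding setting would be:

  LARGE_POOL_SIZE = 2 * (16 MB + (4 * 256 KB))
                  = 2 * 17 MB
                  = 34 MB

  -- set the computed value; SCOPE=SPFILE applies it at the next restart
  ALTER SYSTEM SET large_pool_size = 34M SCOPE=SPFILE;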
Step 4: Tune RMAN Tape Streaming Performance
Bottlenecks
There are several
tasks you can perform to identify and remedy bottlenecks that affect RMAN's
performance on tape backups:
Using BACKUP...
VALIDATE To Distinguish Between Tape and Disk Bottlenecks
One reliable way to determine whether the tape streaming or disk I/O is the
bottleneck in a given backup job is to compare the time required to run backup
tasks with the time required to run BACKUP VALIDATE of the same tasks.
BACKUP VALIDATE of a
backup to tape performs the same disk reads as a real backup but performs no
tape I/O. If the time required for the BACKUP VALIDATE to tape is significantly
less than the time required for a real backup to tape, then writing to tape is
the likely bottleneck.
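A minimal comparison sketch in RMAN (assumes tape channels are already configured; compare the elapsed times RMAN reports for the two runs):

  # reads the same blocks from disk but writes nothing to tape
  BACKUP VALIDATE DATABASE;
  # real backup to the configured tape channels
  BACKUP DEVICE TYPE sbt DATABASE;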
Using Multiplexing to Improve Tape Streaming with Disk Bottlenecks
In some situations when performing a backup to tape, RMAN may not be able to
send data blocks to the tape drive fast enough to support streaming.
For example, during an incremental backup, RMAN only backs up blocks changed
since a previous data file backup as part of the same strategy. If you do not
turn on change tracking, RMAN must scan entire data files for changed blocks,
and fill output buffers as it finds such blocks. If there are not many changed
blocks, RMAN may not fill output buffers fast enough to keep the tape drive
streaming.
You can improve performance by increasing the degree of multiplexing used for
backing up. This increases the rate at which RMAN fills tape buffers, which
makes it more likely that buffers are sent to the media manager fast enough to
maintain streaming.
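As an illustration (the values are assumptions to tune for your environment), the level of multiplexing is governed by FILESPERSET on the BACKUP command and MAXOPENFILES on the channel, and is effectively the lesser of the two:

  # allow each channel to read up to 12 data files concurrently
  CONFIGURE CHANNEL DEVICE TYPE sbt MAXOPENFILES 12;
  # pack up to 12 data files into each backup set
  BACKUP DEVICE TYPE sbt FILESPERSET 12 DATABASE;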
Using Incremental Backups to Improve Backup Performance With Tape Bottlenecks
If writing to tape is the source of a bottleneck for your backups, consider
using incremental backups as part of your backup strategy. Incremental level 1
backups write only the changed blocks from datafiles to tape, so that any
bottleneck on writing to tape has less impact on your overall backup strategy.
In particular, if tape drives are not locally attached to the node running the
database being backed up, then incremental backups can be faster.
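For example, a simple strategy might pair a periodic level 0 backup with more frequent level 1 backups, and enable block change tracking so RMAN does not have to scan whole data files for changed blocks (the tracking file path is hypothetical):

In SQL*Plus (as SYSDBA):
  ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
    USING FILE '/u01/app/oracle/bct/change_tracking.f';

In RMAN:
  # periodic full baseline
  BACKUP DEVICE TYPE sbt INCREMENTAL LEVEL 0 DATABASE;
  # frequent backups of changed blocks only
  BACKUP DEVICE TYPE sbt INCREMENTAL LEVEL 1 DATABASE;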
Step 5: Query V$ Views to Identify Bottlenecks
If none of the
previous steps improves backup performance, then try to determine the exact
source of the bottleneck. Use the V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO views
to determine the source of backup or restore bottlenecks and to see detailed
progress of backup jobs.
V$BACKUP_SYNC_IO contains rows when the I/O is synchronous to the process (or
thread on some platforms) performing the backup.
V$BACKUP_ASYNC_IO
contains rows when the I/O is asynchronous.
Asynchronous I/O is
obtained either with I/O processes or because it is supported by the underlying
operating system.
To determine whether your tape is streaming when the I/O is synchronous, query
the EFFECTIVE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO or
V$BACKUP_ASYNC_IO view.
If
EFFECTIVE_BYTES_PER_SECOND is less than the raw capacity of the hardware, then
the tape is not streaming. If EFFECTIVE_BYTES_PER_SECOND is greater than the
raw capacity of the hardware, the tape may or may not be streaming.
Compression may cause the EFFECTIVE_BYTES_PER_SECOND to be greater than the
speed of real I/O.
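For example, a sketch of the check against the asynchronous view (the TYPE = 'OUTPUT' filter restricts the rows to output files such as the backup pieces; use V$BACKUP_SYNC_IO instead when the I/O is synchronous):

  SELECT device_type, type, filename, effective_bytes_per_second
    FROM v$backup_async_io
   WHERE type = 'OUTPUT';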
Identifying
Bottlenecks with Synchronous I/O
With synchronous I/O, it is difficult to identify specific bottlenecks because
all synchronous I/O is a bottleneck to the process. The only way to tune
synchronous I/O is to compare the rate (in bytes/second) with the device's
maximum throughput rate. If the rate is lower than the rate that the device
specifies, then consider tuning this aspect of the backup and restore process.
The DISCRETE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO view displays the
I/O rate. If you see data in V$BACKUP_SYNC_IO, then the problem is that you
have not enabled asynchronous I/O or you are not using disk I/O slaves.
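A sketch of the corresponding check for synchronous I/O, comparing the observed rate with the device's rated throughput:

  SELECT filename, type, discrete_bytes_per_second
    FROM v$backup_sync_io
   WHERE type = 'OUTPUT';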
Identifying
Bottlenecks with Asynchronous I/O
Long waits are the number of times the backup or restore process told the
operating system to wait until an I/O was complete. Short waits are the number
of times the backup or restore process made an operating system call to poll
for I/O completion in a nonblocking mode. Ready indicates the number of times
the I/O was already ready for use, so there was no need to make an operating
system call to poll for I/O completion.
The simplest way to identify the bottleneck is to query V$BACKUP_ASYNC_IO for
the datafile that has the largest ratio for LONG_WAITS divided by IO_COUNT.
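For example (a sketch; rows with an IO_COUNT of zero are excluded to avoid division by zero):

  SELECT filename, long_waits, io_count,
         ROUND(long_waits / io_count, 3) AS long_wait_ratio
    FROM v$backup_async_io
   WHERE io_count > 0
   ORDER BY long_wait_ratio DESC;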
Note:
If you have
synchronous I/O but you have set BACKUP_DISK_IO_SLAVES, then the I/O will be
displayed in V$BACKUP_ASYNC_IO.
The following is also recommended for improving RMAN performance on AIX 5L
based systems. IBM suggests the following AIX-related settings:
1. Set AIXTHREAD_SCOPE=S in /etc/environment.
2. Run "ioo -o maxpgahead=256" to set the maxpgahead parameter.
   (Initial settings were: Min/Maxpgahead 2 16.)
3. Run "vmo -o minfree=360 -o maxfree=1128" to set minfree and maxfree.
   (Initial settings were: Min/Maxfree 240 256.)
These changes yielded 15-20% improvements in RMAN backup performance on AIX 5L
based systems.
Reference:
Advise On How To
Improve Rman Performance (Doc ID 579158.1)