Oracle RMAN 11g Backup and Recovery- P12


518 Part IV: RMAN in the Oracle Ecosystem

Sync and split technology is an example of an innovative (and challenging) solution for storage recovery that complements or duplicates many of the features RMAN can accomplish independently. Over the past five years, sync and split has become a widely used technology to provide immediate and very fast system recovery at the storage hardware level.

In this chapter, we will provide an overview of what sync and split technology refers to. We won’t be discussing any single implementation in particular, but rather discussing the implications for RMAN and database backups. After the overview, we go into the specific steps required to integrate sync and split solutions into an RMAN backup strategy.

Sync and Split: Broken Mirror Backups

In the beginning, doing sync and split backups involved nothing more complicated than extending the functionality of hardware mirroring. The best way to explain this statement is through an example. Suppose we have a disk controller that has two hard drives. For redundancy, we set the RAID level to 0 + 1 so that we are mirroring everything on disk A to disk B. This gives us immediate protection against any kind of hardware failure on either disk A or disk B.

The next step, then, is to try to leverage the hardware mirror to provide logical fault tolerance. That is the goal of sync and split technology: to provide a fallback position in case of some failure that has occurred on both copies in the mirror. For example, suppose that a user has deleted the entire oracle software tree or the oradata directory. Such a deletion would immediately occur at both copies in our mirror, so having a mirrored copy would do us no good.

So, what is the solution? The innovation is that any mirrored disk group may have two mirror groups, but may only ever have one mirror currently writing the identical bits as the primary disk group. Let’s build an example with three logical volumes, A, B, and C, all dedicated to the same data. Volumes A, B, and C are all mirrored copies of each other. However, at 2 P.M., volume A is split away from the mirror, leaving its bits “stuck” at the split time. Volumes B and C continue to be bit-for-bit copies. After four hours, at 6 P.M., volume C is split from volume B so that it no longer gets writes of data. At this point, there are three different copies of the data on the volume: a copy at 2 P.M., a copy at 6 P.M., and a current copy. There is also no redundancy to protect against a disk failure.

Where Are We in RAID?

Need a superfast, overly simplistic primer on RAID? We’re here for you. There are hundreds of theories, from the radical to the traditional, that outline the best possible solution for disk failure protection. Typically, the Oracle “technorati” have long taken the position that nothing beats RAID 0 + 1, in which you have two disk groups, group 1 and group 2, both of which have two disks. The two disks on group 1 are striped, so that data is evenly spread across both disks. Group 2 is an exact copy, bit for bit, of group 1. This configuration gives us both performance, by striping across disks to avoid hot spots, and redundancy, by writing every bit twice.

Recently, we were reviewing the specs for a RAID 1 + 0 configuration, which is slightly different from 0 + 1. Instead of striping and then mirroring, a RAID 1 + 0 configuration mirrors and then stripes. The difference is best represented visually, as shown in the following illustration. Here, we mirror each disk separately so that we end up with four disk groups. After mirroring each disk, we then stripe across the four mirrors.

Chapter 22: RMAN in Sync and Split Technology 519

To get back to our RAID 0 + 1 configuration, disk volume A will be “resilvered” up to disk B, which runs at the current point in time. This sync up is based on the fact that the volumes have a journaling mechanism in place that records all data changes. This journaling is more I/O on top of the multiple writes to each volume. Volume A will get access to the journals of changes on volume B and will apply all the changes until it is getting live writes at the same time as volume B. At this point, then, you have volumes A and B in redundant mode, and volume C is your fallback position, at 6 P.M. Figure 22-1 illustrates this process.

It might seem like a small difference, but RAID 1 + 0 has greater fault tolerance, because the failure of any one disk does not take down the other mirrored disks. In RAID 0 + 1, if any disk in group 2 fails, the whole group goes offline. So, RAID 1 + 0 provides greater tolerance than RAID 0 + 1 for multiple disk failure, instead of single disk failure.

This sync and split cycle goes on and on, ad infinitum. Every four hours, a volume is synced up to the primary volume, and another volume is split away to provide a fallback position in case of a logical failure. What happens at the time you actually encounter the logical failure? In our example, let’s assume that it is now 8 P.M. Volumes A and B are getting concurrent writes, and volume C is waiting idle at 6 P.M. At 8 P.M., a DBA is doing some system maintenance and deletes the system datafile from the production database. This is when the worrying begins.

Luckily, no unrecoverable data has been added to the database since the end of the day at 5 P.M. However, the nightly batch loads start in about 15 minutes. The DBA has a small window to get the production database back up and running.

With the database running entirely on the mirrored disk volumes A and B, the sync and split architecture has given our DBA an immediate solution. He immediately configures volume C, which was stuck at 6 P.M., as the primary volume and starts up the database. When the database looks for its datafiles, it finds all the files as they appear on volume C, at 6 P.M., and no deletes have taken place. By the time the DBA is finished, it is only 8:05 P.M. The batch processes will kick off on time. Figure 22-2 shows the process.

Oracle Databases on Sync and Split Volumes

The Oracle software files can reside on a sync and split volume and thus can help protect against logical corruption that occurs in the binaries themselves. No additional configuration is needed, from an Oracle perspective. The files associated with an Oracle database, on the other hand, come with some very specific caveats and disclaimers when you start putting them on sync and split volumes. These caveats and disclaimers relate to the fact that Oracle files are always open and always have active writes taking place (this being the primary importance of a good relational database). So, if you are actively writing to your database and it is mirrored on two drives, there will be consequences if you suddenly break the mirror, unbeknownst to the database.

FIGURE 22-2 Sync and split in action

Each vendor-specific solution is a bit different, but at some point, a volume that is getting active writes must turn off the writes to that volume while continuing to allow writes to another volume. And regardless of how a salesperson might pitch it, the process of breaking a mirror is not instantaneous. Breaking a mirror is more like peeling a banana: you start at the top and separate the peel from the fruit until you get to the bottom. Suppose your Oracle datafile is the fruit, and the mirrored copy of the datafile is the peel. If you peel away the mirror copy, you are starting at the beginning of the datafile, and the break is complete when you reach the end of the datafile. However, it is possible (likely) that Oracle will attempt to write to a block while the mirror is in the middle of peeling away. So, on the primary volume, nothing is wrong: the file header knows that an SCN has been advanced in the file and knows which block it was. On the split mirror, however, the datafile header knows nothing about the written block. So, after the mirror break is complete, what do we have on the split mirror volume? One fuzzy datafile that is unrecoverable. Check out Figure 22-3 to see this.

Fear not, for there are ways to ensure that the split mirror is a healthy copy of the database. It just takes a bit of work first. How you configure Oracle database files in a sync and split environment depends on what type of files you are configuring: datafiles, control files, redo log files, or archive logs. The following sections address each in turn.

Datafiles

The previous section explained what happens to Oracle datafiles if a mirror split takes place without any preparation: the split volume copies of the files are left in a fuzzy, unusable state. This is precisely the same predicament you run into if you simply take a copy of an online datafile without first putting it into hot backup mode. So, before you break the mirror, you must put all datafiles into hot backup mode. This is not an optional step, regardless of which vendor product you are using. Because the split generally takes a very short time, the amount of time in hot backup mode is much shorter than it would be if you were doing a copy against the same datafiles. And the I/O hit of running in backup mode (and producing more archive logs) will be relatively small, as well.

FIGURE 22-3 Unrecoverable fuzzy datafile


To alleviate the headaches of hot backup mode for those implementing sync and split architectures, Oracle has added syntax that allows you to put an entire database into hot backup mode with a single command:

alter database begin backup;

Previously, you had to put each tablespace into hot backup mode. If there is something preventing the file from going into backup mode, a warning is generated in the alert log, but the begin backup command proceeds anyway.

After the split is complete, you pull the database out of hot backup mode with the following command:

alter database end backup;
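Before and after the split, it can be worth confirming which datafiles are actually in backup mode. The query below is a minimal sketch, assuming a SQL*Plus session with privileges to read the V$ views; it is one way to check, not a required step.

```sql
-- List every datafile with its current hot backup status.
-- STATUS shows ACTIVE while a file is in backup mode and NOT ACTIVE otherwise.
select d.file#, d.name, b.status, b.change#, b.time
  from v$datafile d
  join v$backup b on b.file# = d.file#
 order by d.file#;
```

Running it once after begin backup and again after end backup gives a quick sanity check that no file was skipped or left behind in backup mode.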

Control Files

A split mirror copy of a control file is in an unusable state immediately after the split mirror operation completes. The control file, in general, is up-to-date on the current state of all the datafiles. However, based on the total duration of the split itself, and the overall activity on the database at the time of the split, the control file at the split volume may not reflect much accurate data about the state of the datafiles.

Putting the database into hot backup mode cures most of these ills. With the database in hot backup mode, the control file is aware of a starting point at which recovery will be required, and from which it will be feasible. However, the control file is still at odds with reality: it thinks of itself as a current control file of an active database. This is hardly the case.

We’ve seen some implementations where a DBA insists on trying to keep the current control file available as such on the split volume, particularly if the split volume will be used for reporting purposes. However, when the time comes to put this control file into service for the sake of recovery, you have to use the using backup controlfile clause so that the control file understands that some of its checkpoint and SCN information may not reflect reality:

recover database using backup controlfile until cancel;

If you will be mounting the Oracle database on the split mirror volume for reporting purposes, you may want to use the using backup controlfile clause, even if you will not be applying any archive logs, just so the control file is flagged as a backup. We discuss this later in the section “Benefits of the Split Mirror Backup.”

Redo Log Files

Split mirror copies of the online redo logs are useless in every way, shape, and form. If possible, don’t even bother putting them on the volume that is going through the sync and split. There is no mechanism in the online redo logs to account for writes to the file during the split operation.

Archive Logs

Archive logs are an excellent candidate to be put on a sync and split volume. Doing so gives you a backup of existing archive logs on disk in a second location. Of course, if you split the archive log volume at the same time as the datafile volume, you do not get all the redo that you need to properly recover your database from the split volume. We suggest that you keep your archive logs on a separate set of sync and split volumes from the set on which you keep your datafiles and control files. That way, you can split the datafiles, take the database out of hot backup mode, force a log switch, and then split the archive log volumes. Then the split mirror volume with the archive logs contains all of the redo required to start the split mirror copy of the database.
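The ordering described above can be sketched as a short SQL*Plus sequence. This is an illustration under stated assumptions rather than a vendor procedure: the volume splits themselves happen in the storage vendor's tooling and are shown only as comments, and the use of archive log current (instead of a plain switch logfile) is our assumption, chosen because it waits for the log to finish archiving.

```sql
-- 1. Put every datafile into hot backup mode.
alter database begin backup;

-- 2. Split the datafile and control file volumes here, using the
--    storage vendor's own tooling (outside Oracle, not shown).

-- 3. Take the database out of hot backup mode.
alter database end backup;

-- 4. Force a log switch and wait for archiving to complete, so the redo
--    generated during the split reaches the archive log volume.
alter system archive log current;

-- 5. Split the archive log volumes (again with the vendor tooling).
```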

One last note on archive logs on split mirror volumes. When the database begins to create an archive log on disk, the split operation may leave behind an unfinished archive log on the split mirror volume. This archive log would be unusable during any recovery operation. This poses a problem only for human-managed backup and recovery operations, where it is unknown if the archive log that is on-disk is complete or only half-written. Here’s why it doesn’t pose a problem for RMAN: When an archive log is being generated, the control file is not updated with a record that such an archive log exists until the archive log is complete. Therefore, in a split mirror scenario, if half of an archive log is generated on the split volume, the control file on the split volume has no record of that archive log. During an RMAN operation, then, the control file would be consulted for archive log records, and the half-written file would not exist in the metadata. To RMAN, the half-written file doesn’t really exist.

Benefits of the Split Mirror Backup

We’ve discussed briefly the primary benefit of using the sync and split architecture: a nearly instantaneous fallback recovery point for all files on a particular set of disks. This benefit expands beyond the scope of this book (the Oracle database) to include a fallback point for all files that exist on the volume. There are also other primary benefits of the sync and split, which are discussed next.

Fast Point-In-Time Recovery

From the database perspective, sync and split provides a point-in-time recovery option that can take minutes instead of hours. You simply change the primary disk group to the split mirror, and the datafiles are ready. Then, apply archive logs up to the point where the failure occurred, and you can open the database.
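As a rough illustration of the roll-forward step, the SQL*Plus sequence below recovers to a point just before a failure. It is a sketch only: the timestamp is hypothetical, and depending on how the split control file was handled, the using backup controlfile clause described earlier may also be needed.

```sql
-- Mount the database from the split mirror volume (now acting as primary).
startup mount;

-- Roll forward to just before the failure. The timestamp is hypothetical;
-- the expected format is 'YYYY-MM-DD:HH24:MI:SS'.
recover database until time '2009-06-01:19:55:00';

-- A point-in-time (incomplete) recovery must be opened with resetlogs.
alter database open resetlogs;
```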

Speedy-Looking Backups

Another benefit of the sync and split architecture is the relative speed of the backup operation itself. Properly generating copies of the database files at the split mirror side takes only a few moments with the database in hot backup mode. After that, a backup is ready to be pressed into service very quickly. Of course, there’s no magic involved with sync and split. I/O is I/O is I/O. It might look like the backup is taking no time at all, but in reality the backup is being taken all the time at the hardware level, because prior to the split operation, the files are being written to simultaneously. However, handing the backups over to the hardware architecture can prove to be extremely powerful in many organizations, where the hardware can be responsible for backing up more than just the database.

Mounting a Split Mirror Volume on Another Server

Beyond the simplistic restore and recovery features, much of the true power of sync and split solutions currently in the marketplace comes from what you can do with the split copy of the database. Because the underlying hardware is likely to be a storage array with many computers connected to it, any volume on that storage array can theoretically be associated with any computer connected to it.

For example, let’s take a database, PROD. PROD resides on disks in volume A, which is mirrored on volume B. Both volume A and B are connected to server Dex. Volumes A and B both exist on storage array Newton. At 2 P.M., volume B is split from volume A and disassociated from server Dex. Immediately after this, volume B is mounted on a different server, Proto, which is also connected to storage array Newton. After volume B is mounted on Proto, a copy of the database PROD that resided on Dex now resides on Proto, with almost real-time amounts of data. The database copy that is on volume B, and mounted by server Proto, can be recovered and then opened for testing, development, or reporting. Later, at 6 P.M., when it is time to resilver volume B with volume A, Proto can dismount volume B, and then it can be remounted by Dex. The sync operation takes place, overwriting any changes that occurred on volume B after the split at 2 P.M.

Note that before you can open a split mirror copy of the database on a different node, a new backup control file should be taken and used. When you resilver volume B with volume A, this new copy will be overwritten by the correct file on A.

Taking Backups from the Split Mirror

Another benefit of sync and split backups, within the framework of this book about RMAN, is the ability to mount the split volume on a different server and, from there, back up the database to tape for long-term backup storage. This allows you to offload the memory, CPU, and I/O operations of the RMAN backup to a completely different server and ensure that there is no impact to your production database.

RMAN and Sync and Split

There are a few different contact points that RMAN has with a sync and split implementation:

■ If you use RMAN for recovery, you must make RMAN aware of the datafile copies that are created by the split operation.

■ You can use RMAN to take backups from the split mirror volume instead of from the production database itself.

Registering Split Mirror Copies with RMAN

If you are a dedicated RMAN user, then you probably understand the benefits that come from executing all recovery statements from within RMAN, instead of from SQL*Plus or elsewhere. RMAN recovery provides access to the information in the control file so that you are not scrambling to uncover which backups exist where and trying to ensure that you are not missing any files. The control file also aids in archive log management during recovery. When a sync and split system is in place, RMAN doesn’t know about everything. The act of splitting the mirror volumes effectively gives you a full datafile copy of every datafile in the database that can be used during a restore/recovery operation, but RMAN has no idea these copies exist.

So, you have to make RMAN aware. You do this by registering the datafile copies with RMAN via the catalog command. The catalog command can be used against a single datafile copy:

catalog datafilecopy '/volumeA/oradata/system01.dbf';

Or, starting with 10g, you can catalog an entire directory by the directory name:

catalog start with '/volumeA/oradata';


By using the catalog command, you take the split mirror copies and make them part of any future restore or recovery operation that might be required.
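After cataloging, one way to confirm that the copies were registered is to query the control file from SQL*Plus. The query below is a sketch, assuming a session on the instance whose control file RMAN updated.

```sql
-- Datafile copies recorded in the control file after cataloging.
-- STATUS 'A' marks a copy as available for restore and recovery.
select file#, name, checkpoint_change#, completion_time, status
  from v$datafile_copy
 where status = 'A'
 order by file#;
```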

You might be asking yourself, “Why do I need to make RMAN aware of the split mirror copies when I can just remount the entire volume as the primary volume and be up and running without RMAN’s help?” A valid question. But what if it makes more sense to switch to only a single copy of the file? Perhaps doing a full database point-in-time recovery would be too expensive, but you still want to leverage the split mirror copy of a subset of files. Beyond that, RMAN also greatly simplifies the recovery stage of any operation, so it makes sense to make RMAN aware of the copies of the archive logs, as well.

Taking RMAN Backups from the Split Mirror

With increasing frequency, DBAs are realizing that with split mirror investments, an additional layer of protection is required, in the form of RMAN backups of the database. The split mirror backup is by definition a short-lived copy: sooner or later, it will be lost when the volume is resilvered with the primary database volume. But what about restoring from last night? Or last week? As you can see, a full-fledged media backup is still required.

With an idle copy of the database simmering on the back burner of the split mirror, a light bulb appears above the DBA’s head: “I should just mount the split mirror drive onto a different server, and take the RMAN backup from the split mirror directly to tape (or to a different disk volume that can be mounted on the primary).” Great idea! Sounds simple enough, right? Well, a few tricky points need to get worked out first; otherwise, you will have the case of the mysteriously disappearing backups.

Here’s the problem: RMAN accesses the control file to determine what to back up, and after the backup is complete, it updates the control file with the details of the backup. If you are connected to a split mirror copy of the control file, that copy gets updated with the details about the backup. So then, of course, when you go to resilver the split volume with the primary, the control file is overwritten with the data in the primary control file, and the backup data is lost forever.

The solution, you figure, is to use a recovery catalog when you back up at the split mirror. That is a sound, logical decision: after the backup is complete, the split volume control file is updated with the backup records, which are then synchronized to the catalog. Then, it’s simply a matter of syncing the catalog with the primary volume so that the backups can be used. Too cool!

So, suppose that you back up from the secondary volume, you sync the backup records to your recovery catalog, and then you connect RMAN to the primary volume database and to the catalog. You perform a resync. This is where things get really, really weird. Sometimes, when you try to perform an operation, you get this error:

RMAN-20035: invalid high recid

Other times, things work just fine, it seems, but the backups you took at the split mirror database have disappeared from the recovery catalog.

The problem, now, has become the internal mechanism of how RMAN handles record building in the control file and the recovery catalog. Every record that is generated gets a record ID (RECID), which is generated at the control file. When the backup occurs at the split mirror database, the control file gets its high RECID value updated, and this information gets passed to the catalog. But the RECID at the primary database control file has not necessarily been updated. So, when you connect to the catalog and the primary database, if the catalog’s high RECID is higher than the one in the control file, you get the “invalid high recid” error. If the RECID in the catalog is lower than the RECID of the primary database control file, RMAN initiates an update of the catalog that effectively eliminates all the records since the last sync operation with the primary control file. Poof! Backup records from the split volume are gone.

The solution to this problem is to set the control file at the split mirror to become a backup control file. If RMAN detects that it is backing up from a noncurrent control file (backup or standby), it does not increment the RECID in the catalog, so that the records are available after a resync with the current control file at the primary database.

You cannot use the control file autobackup feature if you will be taking backups from the split mirror volume. Because the control file in use is a backup control file, autobackup is disallowed.
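To check how a mounted control file currently sees itself, and to look at the record IDs discussed above, queries along these lines can help. This is a sketch for inspection only, assuming a SQL*Plus session on the mounted split mirror (or primary) instance; it does not change anything.

```sql
-- CURRENT for a live control file; BACKUP once it is flagged as a backup.
select controlfile_type from v$database;

-- Highest record IDs currently held in the control file for backup records.
select type, records_used, last_recid
  from v$controlfile_record_section
 where type in ('BACKUP SET', 'BACKUP PIECE', 'BACKUP DATAFILE');
```

Comparing last_recid on the split mirror and on the primary after a backup makes the mismatch behind the “invalid high recid” error easier to see.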

RMAN Workshop: Configure RMAN to Back Up from the Split Mirror

Workshop Notes

This workshop assumes that you put all the tablespaces into hot backup mode (a requirement) during the period of the split. After the split, you connect the split volume to a new server that has 10g installed, and you now want to take an RMAN backup. Because RMAN will give an error if files are in backup mode, you need to manually end backup for every file, as described in this workshop. It’s best to write a script for this. This workshop also assumes that you split the archive log destination and bring it across to the clone at the same time for archive log backup.

Step 1 Mount the database on the clone server, and prepare the control file for RMAN backup:

startup mount;

alter database end backup;

recover database using backup controlfile until cancel;

cancel

exit

Step 2 Connect RMAN to the clone instance (as the target) and the recovery catalog, and run the datafile backup:

rman target /

rman> connect catalog rman/password@rman_cat_db

rman> backup database plus archivelog not backed up 2 times;

Step 3 Connect RMAN to the production database (as the target) and the catalog, perform a sync operation and archive log cleanup, and then back up the control file:

rman target /

rman> connect catalog rman/password@rman_cat_db

rman> delete archivelog completed before sysdate -7;

rman> backup controlfile;

rman> resync catalog;


Getting Sync and Split Functionality from Oracle Software

There is considerable upside to having a hardware solution provide the architecture described in this chapter. Typically, any operation that can be done purely at the hardware level will have performance increases over the same operation done by software. By the same token, a hardware solution is always going to cost you more than a software solution. Sync and split solutions are no different: the more work that is being done at the storage array, the faster it will go, and the more it will cost.

Starting with Oracle Database 10g Release 2, Oracle includes a full solution to provide sync and split functionality without paying for any third-party hardware or software solutions. All you need is Oracle Database 10g Enterprise Edition, two servers (with the same OS), and a storage array.

Using a Standby Database, Flashback Database, and Incremental Apply for Sync and Split

To implement a sync and split solution using only Oracle software, you need to employ a different feature set within the RDBMS: a standby database, Flashback Database, and RMAN incremental backup and incremental apply. All of these features have already been discussed to some extent in previous chapters.

Here’s how it works. First, you create a standby database of your production database (see the workshops in Chapter 20). Once you have the standby database fully operational as a disaster recovery solution, you need to implement Flashback Database on both production and standby databases:

alter database flashback on;
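Flashback Database depends on the database running in archivelog mode with a flash recovery area configured, so it can be useful to check those prerequisites and the resulting state on both databases. The following is a small sketch of such a check from SQL*Plus.

```sql
-- LOG_MODE should be ARCHIVELOG; FLASHBACK_ON shows YES once enabled.
select log_mode, flashback_on from v$database;

-- SQL*Plus command: the flash recovery area location must be set.
show parameter db_recovery_file_dest
```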

With Flashback Database enabled, you can set a restore point on the primary server:

create restore point chapter_20;

alter system switch logfile;

Apply changes through the restore point to the standby database. At this point, the standby database can be opened with reset logs for testing or reporting:

alter database activate standby database;

To resilver your standby database with the primary database, you need to take an incremental backup by using the from scn keywords to specify the SCN of the restore point. Once this backup is complete, move it to the standby database site:

backup incremental from scn 120000 database;
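The SCN given to from scn is the one recorded for the restore point. One way to look it up, sketched here under the assumption that you are connected to the primary in SQL*Plus, is to query the restore point view:

```sql
-- Look up the SCN recorded for each restore point on the primary.
select name, scn, time, guarantee_flashback_database
  from v$restore_point
 order by time;
```

The SCN values shown in this chapter's commands are illustrative; substitute the value returned for your own restore point.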

At the standby database, shut down and then remount the database again. Perform a flashback database to the restore point specified before the standby database was opened:

flashback database to restore point chapter_20;

Once the flashback completes, apply the incremental backup from the production database to the standby database, bringing it up to the point of the backup:

recover database until scn 1521321;


Then, the standby database can go back into managed standby mode and catch up to the production database. Or, it can simply be opened again for reporting, now with all of the latest data imported from the incremental backup. Figure 22-4 illustrates how this process might work.

Benefits of the Oracle Sync and Split Solution

Being less expensive isn’t the only thing going for the Oracle sync and split solution. While most likely there are performance drop-offs related to using the standby database/Flashback Database/incremental apply solution, those drop-offs might be less dramatic than you think. This depends entirely on whether you are already using flashback logs for the inherent functionality provided by them. If you are, then you already have two journals of database changes: the flashback logs and the redo logs. Any more journaling at the file system level only adds redundant journaling and can be eliminated.

FIGURE 22-4 Using sync and split with a standby database and Flashback Database

In addition, you now have a standby database, which you can use for disaster recovery. Although disaster recovery is inherent in the hardware sync and split model as well, having a standby database at your disposal means that much of the manual footwork involved in failing over during an actual disaster is automated and simplified.

Ultimately, deciding between a fully Oracle solution and a hardware solution will come down to other factors, as well. Is the sync and split architecture needed for things other than the Oracle databases? Do you have licensing for the additional Enterprise Edition database? Do you have the expertise to use one solution over the other? You would need to address these questions, obviously. More than anything else, though, you would want to test the solutions. The good news about the Oracle solution is that you probably already have all the requirements to test it right now.

Oracle-Integrated Shadow Copy Services for Windows

An interesting example of the direction of sync/split type of hardware/OS integration can be seen in the integration Oracle 11g has done with the Volume Shadow Copy Service (VSS) functionality on the Windows platform. VSS is a capability that allows for background journaling, much like other vendors’ mirroring functions, which can then be split off as a separate volume and moved to a different location on a storage array. VSS as a component of the Windows OS offers the ability to coordinate activities between storage writers (the Oracle database) and storage providers (the storage array technologies). It can coordinate component-based shadow copies, meaning that it doesn’t have to understand the world only as a set of volumes; VSS can be informed of the components on the volume and act accordingly.

Oracle created a plug-in for VSS called the Oracle VSS Writer, a separate Windows service that runs independently from the Oracle Database service. The Oracle VSS Writer coordinates the specific activities required to take a VSS copy of the database.

Oracle VSS Writer is capable of making either component-level backups (i.e., file by file, such as datafiles and control files) or full volume backups. When making component-level backups of datafiles, the VSS Writer keeps track of redo generated separately from existing mechanisms, and then, during restore, it applies the redo automatically to the components that were backed up. When VSS is making a full volume backup, nothing magical is occurring here. A database’s data blocks can still be caught in mid-write, and therefore fuzzy, by the VSS Writer. So the Oracle VSS Writer still does the same things we’ve discussed so far in this chapter: it puts datafiles into hot backup mode for the duration of the datafile backup, so that the archive logs will have full copies of changed blocks to overwrite any fuzzy blocks.

The difference is the level of integration that we are starting to see. As the sync/split technologies offer better interface points for their technologies, as Microsoft has done, Oracle can provide better automation of tasks that otherwise would have to be scripted separately by the system administrator or DBA.

Summary

In this chapter, we covered how a hardware sync and split architecture would impact your backup and recovery solutions We discussed how to implement sync and split with the Oracle database and how to take RMAN backups from a split mirror copy of the database Finally, we discussed how to use an existing Oracle RDBMS to implement a software-based sync and split environment


Chapter 23: RMAN in the Workplace: Case Studies

We have covered a number of different topics in this book, and we are sure you have figured out that you might face an almost infinite number of recovery combinations. In this chapter, we provide various case studies to help you review your knowledge of backup and recovery (see if you can figure out the solution before you read it). When you do come across these situations, these case studies may well help you avoid some mistakes that you might otherwise make when trying to recover your database. You can even use these case studies to practice performing recoveries so that you become an RMAN backup and recovery expert.

Before we get into the case studies, though, the following section provides a quick overview of facing the ultimate disaster: a real-life failure of your database.

Before the Recovery

Disaster strikes. Often, when you are in a recovery situation, everyone is in a big rush to recover the database. Customers are calling, management is panicking, and your boss is looking at you for answers, all of which is making you nervous and wondering if your résumé is up to date. When the real recovery situation occurs, stop. Take a few moments to collect yourself and ask these questions:

1. What is the exact nature of the failure?

2. What are the recovery options available to me?

3. Might I need Oracle support?

4. Is there anyone who can act as a second pair of eyes for me during this recovery?

Let’s address each of these questions in detail.

What Is the Exact Nature of the Failure?

Here’s some firsthand experience from one of the authors. Back in the days when I was contracting, I was paged one night (on Halloween, no less!) because a server had failed, and once they got the server back up, none of the databases would come up. Before I received the page, the DBAs at this site had spent upward of eight hours trying to restart the 25 databases on that box. Most of the databases would not start. The DBAs had recovered a couple of the seemingly lost databases, yet even those databases still would not open. The DBAs called Oracle, and Oracle seemed unsure as to what the problem was. Finally, the DBAs paged me (while I was out trick-or-treating with my kids).

Within about 20 minutes after arriving at the office, I knew what the answer was. I didn’t find the answer because I was smarter than all the other DBAs there (I wasn’t, in fact). I found the answer for a couple of reasons. First, I approached the problem from a fresh perspective (after eight hours of problem solving, one’s eyes tend to become burned and red!). Second, I looked to find the nature of the failure rather than just assuming the nature of the failure was a corrupted database.

What ended up being the problem, pretty clearly to a fresh pair of eyes, was a set of corrupted Oracle libraries. Once we recovered those libraries, all the databases came up quickly, without a problem. The moral of the story is that when you have a database that has crashed, or that will not open, do not assume that the cause is a corrupted datafile or a bad disk drive. Find out for sure what the problem is by investigative analysis. Good analysis may take a little longer to begin with, but, generally, it will prove valuable in the long run.


What Recovery Options Are Available?

Recovery situations can offer a number of solutions. Again, back when I was a consultant, I had a customer who had a disk controller fail over a weekend, and the result was the loss of file systems on the box, including files belonging to an Oracle database in ARCHIVELOG mode. The DBA at the customer site went ahead and recovered the entire database (about 150GB), which took, as I recall, a couple of hours.

The following Monday, the DBA and I had a discussion about the recovery method he selected. The corrupted file systems actually impacted only about five database datafiles (the other file systems contained web server files that we were not concerned with). The total size of the impacted database datafiles was no more than 8 or 10GB. The DBA was pretty upset about having to come into the office and spend several hours recovering the database. When I asked the DBA why he hadn’t just recovered the five datafiles instead of the entire database, he replied that it just had not occurred to him.

The moral of this story is that it’s important to consider your recovery options. The type of recovery you do may make a big difference in how long it takes you to recover your database. Another moral of this story is to really become a backup and recovery expert. Part of the reason the DBA in this case had not considered datafile recovery, I think, is that he had never done such a recovery. When facing a stressful situation, people tend not to consider options they are not familiar with. So, we strongly suggest you set up a backup and recovery lab and practice recoveries until you can do them in your sleep.
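To put rough numbers on the story above, here is a back-of-the-envelope sketch in Python. It assumes that "a couple of hours" means two hours and that restore throughput stays constant no matter how many files are restored. Both are assumptions made purely for illustration, not figures from the actual incident.

```python
def estimated_restore_minutes(size_gb, gb_per_hour):
    """Estimate restore time in minutes at a constant throughput."""
    return size_gb / gb_per_hour * 60

FULL_DB_GB = 150          # size of the whole database, from the story
FULL_RESTORE_HOURS = 2.0  # assumption: "a couple of hours" taken as two
DAMAGED_GB = 10           # upper estimate for the five damaged datafiles

# Throughput implied by the full restore (assumed constant).
rate_gb_per_hour = FULL_DB_GB / FULL_RESTORE_HOURS

full_minutes = estimated_restore_minutes(FULL_DB_GB, rate_gb_per_hour)
partial_minutes = estimated_restore_minutes(DAMAGED_GB, rate_gb_per_hour)
print(f"full database: about {full_minutes:.0f} minutes")
print(f"five datafiles: about {partial_minutes:.0f} minutes")
```

Under these assumptions, restoring only the damaged datafiles takes a small fraction of the time needed for the whole database, which is the point of the story.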

Might Oracle Support Be Needed?

You might well be a backup and recovery expert, but even the experts need help from time to time. This is what Oracle support is there for. Even though I feel like I know something about backup and recovery, I ask myself if the failure looks to be something that I might need Oracle support for. Generally, if the failure is something odd, even if I think I can solve it on my own, I “prime” support by opening a service request on the problem. That way, if I need help, I have already provided Oracle with the information they need (or at least some initial information) and have them primed to support me should I need it. If you are paying for Oracle support, use it now; don’t wait until later.

Who Can Act as a Second Pair of Eyes During Recovery?

When I’m in a stressful situation, first of all it’s nice to have someone to share the stress with. Somehow I feel a bit more comfortable when someone is there just to talk things out with. Further, when you are working on a critical problem, mistakes can be costly. Having a second, experienced pair of eyes there to support you as you recover your database is a great idea!

Recovery Case Studies

Now to the meat of the chapter: the recovery case studies. In this section, we provide you with a number of case studies, listed next in the order they appear:

1. Recovering from complete database loss in NOARCHIVELOG mode with a recovery catalog

2. Recovering from complete database loss in NOARCHIVELOG mode without a recovery catalog

3. Recovering from complete database loss in ARCHIVELOG mode without a recovery catalog

4. Recovering from complete database loss in ARCHIVELOG mode with a recovery catalog

5. Recovering from the loss of the SYSTEM tablespace

6. Recovering online from the loss of a datafile or tablespace

7. Recovering from loss of an unarchived online redo log

8. Recovering through resetlogs

9. Completing a failed duplication manually

10. Using RMAN duplication to create a historical subset of the target database

11. Recovering from a lost datafile in ARCHIVELOG mode using an image copy in the flash recovery area

12. Recovering from running the production datafile out of the flash recovery area

13. Using Flashback Database and media recovery to pinpoint the exact moment to open the database with resetlogs

In each of these case studies, we provide you with the following information:

The Scenario: Outlines the environment for you.

The Problem: Defines a problem that needs to be solved.

The Solution: Outlines the solution for you, including RMAN output solving the problem.

Now, let’s look at our case studies!

Case #1: Recovering from Complete Database Loss (NOARCHIVELOG Mode) with a Recovery Catalog

The Scenario

Thom is a new DBA at Unfortunate Company. Upon arriving at his new job, he finds that his databases are not backed up at all, and that they are all in NOARCHIVELOG mode. Because Thom’s manager will not shell out the money for additional disk space for archived redo logs, Thom is forced to do offline backups, which he begins doing the first night he is on the job. Thom also has turned on autobackups of his control file and has converted the database so that it is using an SPFILE. Finally, Thom has created a recovery catalog schema in a different database that is on a different database server.

The Problem

Unfortunate Company’s cheap buying practices catch up to it in the few days following Thom’s initial work, when the off-brand (cheap) disks that it has purchased all become corrupted due to a bad controller card. Thom’s database is lost.

Thom’s offline database backup strategy includes tape backups to a local tape drive. Once the hardware problems are solved, the system administrator quickly rebuilds the lost file systems, and Thom quickly gets the Oracle software installed. Now, Thom needs to get the database back up and running immediately.


The Solution Revealed: Based on the preceding considerations, Thom devises and implements the following recovery plan:

1. Restore a copy of the SPFILE. While in many cases you can nomount the Oracle instance without a parameter file at all, to properly recover the database, Thom has to restore the correct SPFILE from backup. Because he doesn’t have a control file yet, he cannot configure channels permanently. In this case, Thom has configured his control file autobackups to go to the default disk locations. Thus, when Thom restored his Oracle software backups, he also restored the backup pieces of the control file autobackups. This makes the recovery of the SPFILE simple:

rman target sys/password catalog rcat_user/rcat_password@catalogdb
startup force nomount;
restore spfile from autobackup;

2. Restore a copy of the control file. Using the same RMAN session as in Step 1, Thom can do this quite simply. After the restore operation, he mounts the database using the restored control file:

restore controlfile from autobackup;
alter database mount;

3. Configure permanent channel parameters. Now that Thom has a control file restored, he can update the persistent parameters for channel allocation to include the name of the tape device his backup sets are on. This will allow him to proceed to restore the backup from tape and recover the database.

configure default device type to sbt;
configure channel 1 device type sbt parms "env=(nb_ora_serv=mgtserv, nb_ora_client=cervantes)";

4. Perform the restore and recovery:

restore database;
recover database noredo;
alter database open resetlogs;


NOTE

Thom used the alter database open resetlogs command. He could have used the SQL command (sql "alter database open resetlogs"), too. However, one benefit of using the RMAN alter command is that the catalog and the database will both be reset. Using the SQL version, only the database is reset.

Case #2: Recovering from Complete Database Loss (NOARCHIVELOG Mode) Without a Recovery Catalog

The Scenario

Charles is the DBA of a development OLTP system. Because it is a development system, the decision was made to do RMAN offline backups and to leave the database in NOARCHIVELOG mode. Charles decided not to use a recovery catalog when doing his backups. Further, Charles has configured RMAN to write the control file backups to disk by default, rather than to tape.

The Problem

Sevi, a developer, developed a piece of PL/SQL code designed to truncate specific tables in the database. However, due to a logic bug, the code managed to truncate all the tables in the schema, wiping out all test data.

The Solution

If there were a logical backup of the database, this would be the perfect time to use it. Unfortunately, there is no logical backup of the database, so Charles (the DBA) is left with performing an RMAN recovery. Since his database is in NOARCHIVELOG mode, Charles has only one recovery option in this case, which is to restore from the last offline backup. Because all the pieces to do recovery are in place (the RMAN disk backups, the Oracle software, and the file systems), all that needs to be done is to fire up RMAN and recover the database.

The Solution Revealed: Based on the preceding considerations, Charles devises and implements the following recovery plan:

1. Restore the control file. When doing a recovery from a cold backup, it is always a good idea to recover the control file associated with that backup (this prevents odd things from happening). In this case, Charles will be using the latest control file backup (since he doesn’t back up the control file at other times). Since Charles writes control file backup sets to the default location, he doesn’t need to allocate any channels. If Charles is using neither the Oracle flash recovery area (FRA) nor a recovery catalog, he needs to set the DBID of the database before he can restore the control file. If Charles is using a recovery catalog or the FRA, setting the DBID is not required. Once Charles restores the control file, he mounts the database:

rman target sys/password
startup nomount;
set dbid 2540040039;
restore controlfile from autobackup;
sql 'alter database mount';


NOTE

If you are using the FRA, you will not need to set the database DBID.

2. The control file that Charles restored has the correct default persistent parameters already configured in it, so all he needs to do is perform the restore and recovery:

restore database;
recover database noredo;
sql "alter database open resetlogs";
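The DBID rule described in step 1 can be summarized as a small decision sketch. The Python below is only an illustration: the function name and its arguments are hypothetical, and the command strings are the ones shown in this case study.

```python
def controlfile_restore_commands(dbid, uses_catalog, uses_fra):
    """Build the RMAN command sequence for restoring a control file autobackup.

    Per the rule in this case study, SET DBID is needed only when neither a
    recovery catalog nor the flash recovery area (FRA) is in use.
    """
    commands = ["startup nomount;"]
    if not uses_catalog and not uses_fra:
        commands.append(f"set dbid {dbid};")
    commands.append("restore controlfile from autobackup;")
    commands.append("sql 'alter database mount';")
    return commands

# Charles's situation: no recovery catalog and no FRA.
for line in controlfile_restore_commands(2540040039, uses_catalog=False, uses_fra=False):
    print(line)
```

Calling the same function with uses_catalog=True or uses_fra=True simply omits the set dbid line.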

Case #3: Recovering from Complete Database Loss (ARCHIVELOG Mode) Without a Recovery Catalog

The Scenario

We meet Thom from Case #1 again. Thom’s company finally has decided that putting the database in ARCHIVELOG mode seems like a good idea (Thom’s boss thought it was his idea!). Unfortunately for Thom, due to budget restrictions, he was forced to use the space that was allocated to the recovery catalog to store archived redo logs. Thus, Thom no longer has a recovery catalog at his disposal.

The Problem

As if things have not been hard enough on Thom, we also find that Unfortunate Company is an unfortunately located company. His server room, located in the basement as so many server rooms are, suffered the fate of a broken water main nearby. The entire room was flooded, and the server on which his database resides has been completely destroyed.

Thom’s backup strategy has improved. It now includes tape backups to an offsite media management server. Also, he’s sending his automated control file/SPFILE backups to tape rather than to disk. Again, he’s salvaged a smaller server from the wreckage, which already has Oracle installed on the system, and now he needs to get the database back up and running immediately.

The Solution

Again, Thom has lost the current control file and the online redo logs for his database, so it’s time to employ his point-in-time recovery skills. Thom still has control file autobackups turned on, so he can use them to get recovery started. In addition, he’s restoring to a new server, so he wants to be aware of the challenges that restoring to a new server brings; there are media management, file system layout, and memory utilization considerations.

Media Management Considerations: Because he’s restoring files to a new server, Thom must first make sure that the MML file has been properly set up for use on his emergency server. This means having the media management client software and Oracle Plug-In installed prior to using RMAN for restore/recovery. Thom uses the sbttest utility, which is a good way to check that the media manager is accessible.

Next, Thom needs to configure his tape channels to specify the client name of the server that has been destroyed, that is, the name of the client from which the backups were taken. In addition, he needs to ensure that the media management server has been configured to allow backups to be restored from a different client to his emergency server.


File System Layout Considerations: Thom’s new system has a different file system structure from his original server. The production database had files manually striped over six mount points: /u02, /u03, /u04, /u05, /u06, and /u07. His new server has only two mount points: /u02 and /u03. Fortunately, Thom employed directory structure standards across his enterprise, and all data directories are /oradata/prod/ on all mount points. In addition, he has a standard that always puts the ORACLE_HOME on the same mount point and directory structure on every server.
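The consolidation from six mount points to two amounts to a mapping on file names, with /u04 and /u05 folding into /u02, and /u06 and /u07 folding into /u03, as in the conversion pairs used later in the recovery plan. The following Python sketch models that mapping in a simplified way: it treats each pattern as a leading path prefix, whereas Oracle’s own matching rules for the filename conversion parameters are more general. The sample file names are hypothetical.

```python
# Pattern/replacement pairs: /u04 and /u05 fold into /u02, /u06 and /u07 into /u03.
CONVERT_PAIRS = [
    ("/u04", "/u02"),
    ("/u05", "/u02"),
    ("/u06", "/u03"),
    ("/u07", "/u03"),
]

def convert_path(path, pairs=CONVERT_PAIRS):
    """Return the path on the new server; the first matching prefix wins."""
    for old, new in pairs:
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    # /u02 and /u03 exist on the new server, so those paths are unchanged.
    return path

# Hypothetical datafile names following the /oradata/prod/ standard.
for name in ["/u04/oradata/prod/users01.dbf",
             "/u07/oradata/prod/index01.dbf",
             "/u02/oradata/prod/system01.dbf"]:
    print(name, "->", convert_path(name))
```

Because every server uses the same /oradata/prod/ directory standard, only the mount point needs to change, which is what keeps the list of pairs this short.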

Memory Utilization Considerations: Thom’s emergency server has less physical memory than his lost production server. This means he will have to significantly scale back the memory utilization for the time being in order to at least get the database up and operational.

The Solution Revealed: Based on the preceding considerations, Thom devises and implements the following recovery plan:

1. Determine the DBID of the target database. Thom can do this by looking at the file handle for his control file autobackup. He needs to be able to view the media management catalog to do so. Even easier, Thom has the DBID of every one of his databases recorded in a log somewhere: a notebook, a PDA, whatever. Whatever you decide to use, just make sure it’s accessible in an emergency.

2. Restore a copy of the SPFILE. As you may remember, Thom will have to force an instance to start using a dummy SPFILE, and then restore the correct SPFILE from backup. Because Thom changed the default location for his control file/SPFILE autobackups to tape, and because he doesn’t have a control file yet, he cannot configure channels permanently. Instead, he has to embed channel allocation commands in a run block, and then restart the instance so that it uses the correct SPFILE.

rman target /
set dbid 204062491;
startup force nomount;
run {
allocate channel tape_1 type sbt parms 'env=(nb_ora_serv=rmsrv, nb_ora_client=cervantes)';
restore spfile from autobackup;
}
shutdown immediate;
startup nomount;

3. Make changes to the SPFILE. Thom must modify his SPFILE to take into account the new server configuration. This means changing memory utilization parameters and setting filename conversion parameters. He must connect to the newly started instance from SQL*Plus and make the necessary changes.

alter system set control_files='/u02/oradata/prod/control01.dbf', '/u03/oradata/prod/control02.dbf' scope=spfile;
alter system set db_file_name_convert=('/u04' , '/u02' , '/u05' , '/u02' , '/u06' , '/u03' , '/u07' , '/u03') scope=spfile;
alter system set log_file_name_convert=('/u04' , '/u02' , '/u05' , '/u02' , '/u06' , '/u03' , '/u07' , '/u03') scope=spfile;
alter system set log_archive_dest_1='location=/u02/oradata/prod/arch' scope=spfile;
alter system set db_cache_size=300m scope=spfile;
alter system set shared_pool_size=200m scope=spfile;
shutdown immediate;
startup nomount;

NOTE

You could also choose to use the set newname option here.

4. Restore a copy of the control file. Using the same RMAN session as in the preceding steps, Thom can do this quite simply (he’s already set the DBID). Then, he mounts the database using the restored control file.

run {
allocate channel tape_1 type sbt parms 'env=(nb_ora_serv=rmsrv, nb_ora_client=cervantes)';
restore controlfile from autobackup;
}
alter database mount;

5. Configure permanent channel parameters. Now that Thom has a control file restored, he can update the persistent parameters for channel allocation to include the name of the lost server as the media management client. This serves two purposes: it allows RMAN to access the backups that were taken from the lost server, and RMAN will pass this client name to the media management server when any backups are taken from the new server. That way, when the lost server is rebuilt, any backups taken from this stopgap system will be accessible at the newly reconstructed production server.

configure device type sbt parallelism 2;
configure channel 1 device type sbt parms "env=(nb_ora_serv=mgtserv, nb_ora_client=cervantes)";
configure channel 2 device type sbt parms "env=(nb_ora_serv=mgtserv, nb_ora_client=cervantes)";

6. Determine the last archive log for which there is a copy. Because Thom lost the entire server, he also lost any archive logs that had not yet been backed up by RMAN. So, he must query RMAN to determine the last archive log for which a backup exists.

list backup of archivelog from time 'sysdate-7';

7. With the last log sequence number in hand, Thom performs his restore and recovery and opens the database:

restore database;
recover database until sequence <number>;
alter database open resetlogs;
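One detail worth spelling out is how the log sequence number found in step 6 feeds into step 7. RMAN treats the value given to UNTIL SEQUENCE as an exclusive upper bound, so the number to supply is one higher than the last archived log that is available. The Python sketch below illustrates that arithmetic. The function name and the sample sequence numbers are hypothetical, and the sketch assumes a single redo thread and a list that starts at the first log needed after the database backup.

```python
def until_sequence(backed_up_sequences):
    """Return the value to use with RECOVER DATABASE UNTIL SEQUENCE.

    Walks the backed-up archived log sequences in order and stops at the
    first gap, because recovery cannot proceed past a missing log. The
    result is the last usable sequence plus one, since the bound is exclusive.
    """
    sequences = sorted(set(backed_up_sequences))
    if not sequences:
        raise ValueError("no archived log backups found")
    last_usable = sequences[0]
    for seq in sequences[1:]:
        if seq != last_usable + 1:
            break  # gap: logs after this point cannot be applied
        last_usable = seq
    return last_usable + 1

print(until_sequence([101, 102, 103, 104]))  # 105
print(until_sequence([101, 102, 104]))       # 103
```

In the second call, sequence 103 is missing, so recovery can only run through sequence 102 and the value to supply is 103.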


Case #4: Recovering from Complete Database Loss (ARCHIVELOG Mode) with a Recovery Catalog

The Scenario

Charles is taking over for Thom, because management recognized that Thom was a hero of a DBA and thus sent him and his wife to Hawaii for two weeks of R and R. Before he left, Thom’s company added additional disk storage and decided that using the RMAN recovery catalog was probably a good idea.

Unfortunately for Charles, disaster seems to follow him around. At his last company, a huge electrical fire caused all sorts of mayhem, and this time, it’s gophers. Yes, gophers. Somewhere outside the computer room, a lone gopher ate through the power cable leading to the computer room. This resulted in an electrical fire and a halon release into the computer room. As a result of the electrical fire, the server and disks on which his database resides have been completely destroyed…again.

The Problem

Charles reviews Thom’s backup strategy. Again, Charles has salvaged a smaller server that survived the fiasco, which already has Oracle installed on the system, and now he needs to get the database back up and running immediately. Fortunately, the recovery catalog server is intact, so Charles can use it during the recovery.

The Solution

Again, Charles has lost the current control file and the online redo logs for his database, so it’s time to employ his point-in-time recovery skills. The backup strategy still has control file autobackups turned on, so Charles can use them to get recovery started. In addition, he’s restoring to a new server, so he wants to be aware of the challenges that restoring to a new server brings; there are media management, file system layout, and memory utilization considerations.

Media Management Considerations: Because Charles is restoring files to a new server, he must first make sure that the MML file has been properly set up for use on his emergency server. This means having the media management client software and Oracle Plug-In installed prior to using RMAN for restore/recovery. Charles uses sbttest to check that the media manager is accessible.

Next, Charles needs to configure his tape channels to specify the client name of the server that has been destroyed, that is, the name of the client from which the backups were taken. In addition, he needs to ensure that the media management server has been configured to allow backups to be restored from a different client to his emergency server.

File System Layout Considerations: On Charles’s new system, the file system structure is different from that on his original server. The production database had files manually striped over six mount points: /u02, /u03, /u04, /u05, /u06, and /u07. His new server has only two mount points: /u02 and /u03. Luckily, directory structure standards exist across his enterprise, and all data directories are /oradata/prod/ on all mount points. In addition, he has a standard that always puts the ORACLE_HOME on the same mount point and directory structure on every server.

Memory Considerations: Charles’s emergency server has less physical memory than his lost production server. This means he has to significantly scale back the memory utilization for the time being in order to at least get the database up and operational.


The Solution Revealed: Based on the preceding considerations, Charles devises and implements the following recovery plan:

1. Get a copy of the SPFILE restored. First, Charles will nomount the database instance without a parameter file, since Oracle supports this. Then, he will restore the correct SPFILE from backup. Because he doesn’t have a control file yet, he cannot configure channels permanently. Instead, he has to embed channel allocation commands in a run block, and then issue the startup command to start the database with the correct SPFILE. Since he has a recovery catalog, he doesn’t need to set the DBID, as was necessary in Case #3.

rman target / catalog rcat_user/rcat_password@catalog
startup force nomount;
run {
allocate channel tape_1 type sbt parms 'env=(nb_ora_serv=rmsrv, nb_ora_client=cervantes)';
restore spfile from autobackup;
}
shutdown immediate;
startup nomount;

2. Make changes to the SPFILE. Charles must modify his SPFILE to take into account the new server configuration. This means changing memory utilization parameters and setting filename conversion parameters. He must connect to the newly started instance from SQL*Plus and make the necessary changes.

alter system set control_files='/u02/oradata/prod/control01.dbf', '/u03/oradata/prod/control02.dbf' scope=spfile;
alter system set db_file_name_convert="('/u04' , '/u02' , '/u05' , '/u02' , '/u06' , '/u03' , '/u07' , '/u03')" scope=spfile;
alter system set log_file_name_convert="('/u04' , '/u02' , '/u05' , '/u02' , '/u06' , '/u03' , '/u07' , '/u03')" scope=spfile;
alter system set log_archive_dest_1='location=/u02/oradata/prod/arch' scope=spfile;
alter system set db_cache_size=300m scope=spfile;
alter system set shared_pool_size=200m scope=spfile;
shutdown immediate;
startup nomount;

3. Restore a copy of the control file. Using the same RMAN session, Charles can do this quite simply (with the recovery catalog available, he does not need to set the DBID). Then, he must mount the database using the restored control file.

run {
allocate channel tape_1 type sbt parms 'env=(nb_ora_serv=rmsrv, nb_ora_client=cervantes)';
restore controlfile from autobackup;
}
sql 'alter database mount';


4. Configure permanent channel parameters. Now that Charles has a control file restored, he can update the persistent parameters for channel allocation to include the name of the lost server as the media management client. This serves two purposes: it allows RMAN to access the backups that were taken from the lost server, and RMAN will pass this client name to the media management server when any backups are taken from the new server. That way, when the lost server is rebuilt, any backups taken from this stopgap system will be accessible at the newly reconstructed production server.

configure device type sbt parallelism 2;
configure channel 1 device type sbt parms "env=(nb_ora_serv=mgtserv, nb_ora_client=cervantes)";
configure channel 2 device type sbt parms "env=(nb_ora_serv=mgtserv, nb_ora_client=cervantes)";

5. Determine the last archive log for which there is a copy. Because Charles lost the entire server, he also lost any archive logs that had not yet been backed up by RMAN. So, he must query RMAN to determine the last archive log for which a backup exists:

list backup of archivelog from time 'sysdate-7';

6. With the last log sequence number in hand, Charles performs his restore and recovery and opens the database:

restore database;
recover database until sequence <number>;
sql "alter database open resetlogs";

Case #5: Recovering from the Loss of the SYSTEM Tablespace

The Solution

Fortunately for Nancy, this is not a complete loss of her system. Her online redo logs and control file are all intact. Because she has to recover the SYSTEM tablespace, she has to do her recovery with the database closed, not open. Otherwise, the recovery is a pretty easy one.

The Solution Revealed: Based on the preceding considerations, the recovery plan that Nancy devises and implements simply requires her to restore and recover the affected tablespaces with the database mounted, as follows:

rman target / catalog rcat_user/rcat_password@catalog
startup force mount;
restore tablespace users, system, index;
recover tablespace users, system, index;
alter database open;
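The constraint that drives this plan can be written down as a simple rule. The Python sketch below is an illustration only. The function name is hypothetical, and it encodes just the rule stated in this case study: recovering the SYSTEM tablespace requires the database to be closed (mounted but not open). Treating every other tablespace as recoverable with the database open is a simplifying assumption.

```python
def required_database_state(tablespaces):
    """Return 'mount' if SYSTEM is among the tablespaces to recover, else 'open'."""
    names = {name.upper() for name in tablespaces}
    return "mount" if "SYSTEM" in names else "open"

# Nancy's recovery includes SYSTEM, so the database must stay mounted.
print(required_database_state(["users", "system", "index"]))
# A recovery that does not touch SYSTEM is assumed to be possible online.
print(required_database_state(["users"]))
```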
