
Document: Disaster Recovery: Backing Up and Restoring (docx)


DOCUMENT INFORMATION

Basic information

Title: Disaster Recovery: Backing Up and Restoring
School: University of Technology
Major: Information Technology
Type: Textbook
Year of publication: 2000
City: Unknown
Format:
Pages: 46
File size: 241.23 KB


Disaster Recovery:

Backing Up and Restoring

Every MIS or network administrator has a horror story to tell about backing up and restoring systems or data.

One organization, where we now manage more than a dozen backup servers, has data processing centers spread all over the United States, and all are interconnected via a large private wide area network. In mid-1999, a valuable remote Microsoft SQL Server machine just dropped dead. The IT doctor said it had died of exhaustion: five years of faithful service and never a day’s vacation. After trying everything to revive it, we instructed the data center’s staff to ship the server back to HQ for repairs.

The first thing we asked the IT people at the remote office was: “You’ve been doing your backups every day, right?” “Sure thing,” they replied. “Every day for the past five years.” They sounded so proud, we were overjoyed. “Good, we will have to rebuild your server from those tapes, so send them all to us with the server.” To cut a frustrating story short: The five years’ worth of tapes had nada on them, not a bit nor a byte. Zilch. We spent two weeks trying to make sense of what was on that SQL Server computer and to rebuild it. We refuse to even guess the cost of that loss.

We have another horror story we will later relate, but this example should make it clear to you that backup administration, a function of disaster recovery, is one of the most important IT functions you will have the fortune to be charged with.

Backup administrators need to be trained, responsible, and cool people. They need to be constantly revising and refining their practice and strategy; their companies depend on them.

CHAPTER 17

In This Chapter

✦ Understanding Backup Practice and Procedure

✦ Removable Storage and Media Pools

✦ Using the Backup Tools that Come with Windows 2000


This chapter serves as an introduction to disaster recovery/backup-restore procedures on Windows 2000 networks, the Backup-Restore utility that ships with the Windows 2000 operating system, and the Windows 2000 Removable Storage Manager. Before we get into this chapter, we should consider several angles on the backup/restore functions expected of administrators.

Why Back Up Data?

You back up for two reasons, and even Windows 2000, with its fancy tools, rarely highlights the differences:

✦ Record-keeping (such as annual backups performed every month)

✦ Disaster Recovery (DR) or System Recovery

You should make an effort to decide when a file is no longer valuable to the disaster recovery period, and then it should be archived out for record-keeping. Depending on your company’s needs, this may vary from a week to a couple of weeks, or from a month to a couple of months, and even years. There is no point buying media for annual backups for a site you know is due to close in six months.

What To Back Up

Often, administrators back up every file on a machine or network and dump the whole pile into a single backup strategy. Instead, they should be splitting up their files into two distinct groups: System and Data.

✦ System files comprise files that do not change between versions of the applications and operating systems.

✦ Data files comprise all the files that change every day, such as word-processing files, database files, spreadsheet files, media files, graphics files, and configuration files (like the registry, DHCP, WINS, DNS, and the Active Directory databases). Depending on your business, data files can change from 2 percent a day on the low side to 80 percent a day on the high side. The average in many of the businesses for which we have consulted is around 20 percent of the files changing every day. And, you must also consider the new files that arrive.

Understanding the requirements will make your life in the admin seat easier, because this is one of the most critical of all IT or network admin jobs. One person’s slip-up can cause millions of dollars in data loss. How often have you backed up an entire system that was lost for some reason, only to find that to restore it, you had to reinstall from scratch? “So why was I backing up the system?” you might have asked yourself. And how often have you restored a file for a user who then complained he or she lost five days’ worth of work on the file because the restore was so outdated? It’s happened to us on many occasions and is very disheartening if you are trying so hard to keep your people productive.

There is nothing worse than trying to recover lost data, knowing that all on Mahogany Row are sitting idle, with the IT director standing behind you in the server room, and discovering you cannot recover. The thought of your employment record being pulled should make you realize how important it is to pay attention to this function.

We will delve into these two subjects in more depth in this chapter and explore how Windows 2000 helps us better manage our recovery and record-keeping processes.

We will start by focusing on the data side of the backup equation and finally lead this discussion into system backup/restore.

The archive bit is a flag, or a unit of data, indicating that the file has been modified. When we refer to the setting of the archive bit, we mean that we have turned it on, or we have set it to “1.” Turning it off means we set it to zero or “0.” If the archive bit is on, it means that the file has been modified since it was last backed up.

Trusting the state of the archive bit, however, is not an exact science by any means, because it is not unusual for other applications (and developers) and processes to mess with the archive bit. This is the reason we recommend that a full backup be performed on all data at least once a week.
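The on/off convention described above can be sketched with plain bit operations. This is only an illustration in Python: the helper names are hypothetical, and the value 0x20 is the Windows archive attribute flag (FILE_ATTRIBUTE_ARCHIVE). The functions work on an attribute mask held in memory and do not touch real files.

```python
# Value of the Windows FILE_ATTRIBUTE_ARCHIVE flag.
ARCHIVE = 0x20


def set_archive_bit(attributes: int) -> int:
    """Turn the archive bit on ("1"): the file changed since the last backup."""
    return attributes | ARCHIVE


def clear_archive_bit(attributes: int) -> int:
    """Turn the archive bit off ("0"): the file has been backed up."""
    return attributes & ~ARCHIVE


def needs_backup(attributes: int) -> bool:
    """Report whether the archive bit is currently on."""
    return bool(attributes & ARCHIVE)
```

Because any process can flip the bit in the same way, a scheme that relies on it alone can miss files, which is the reason for the weekly full backup recommended above.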

What Is a Backup?

A backup is an exact copy of a file (including documentation) that is stored on a storage medium (usually in a compressed state) and kept in a safe place (usually at a remote location) for use in the event the working copy is destroyed. Notice that we placed emphasis on “including documentation,” because with every medium holding backups, you need to maintain a history or documentation of the files on the media. This is usually in the form of labels and identification data on the media itself, on the outside casing, and in spreadsheets, hard catalogs, or data ledgers in some form or another. Without history data, the restore software will be unable to locate your files and the backup will be useless. This is why it is possible to prepare a tape for overwriting by merely formatting the label so that the media appears blank.

There are various types of backups, depending on what you back up and how often you back it up:

✦ Archived backup: A backup that documents (in header files, labels, and backup records) the state of the archive bit at the time of copy. The state (on-off) of the bit indicates to the backup software whether the file has been changed since the last backup. When Windows 2000 Backup does an archived backup, it sets the archive bit accordingly.

✦ Copy backup: An ad-hoc “raw” copy that ignores the archive bit state. It also does not change the archive bit after the copy. A copy backup is useful for quick copies between DR processes and rotations, or to pull an “annual” during the monthly rotation (we discuss this later).

✦ Daily backup: This does not form part of any rotation scheme (in our book anyway). It is just a backup of files that have been changed on the day of the backup. We question the usefulness of the daily backup in Backup, because mission-critical DR practice dictates the deployment of a manual or automated rotation scheme (described later). Also, Backup does not offer a summary or history of the files that have changed during the day. If you were responsible for backing up a couple of million files a day, well, this just would not fly.

✦ Normal backup: A complete backup of all files (that can be backed up), period. The term “normal” is more a Windows 2000 term, because this backup is more commonly called a “full” backup in DR circles. The full backup copies all files and then clears the archive bit (turns it off) to indicate (to Backup) that the files have been backed up. You would do a full backup at the start of any backup scheme. You would also have to do a full backup after making changes to any scheme. A full backup, and documentation or history drawn from it, is the only means of performing later incremental backups. Otherwise, the system would not know what has or has not changed since the last backup.

✦ Incremental backup: A backup of all files that have changed since the last full or incremental backup. The backup software clears the archive bit (turns it off), which thereby denotes that the files have been backed up. Under a rotation scheme, a full restore would require you to have all the incremental media used in the media pool, all the way back to the first media, which contains the full backup. You would then have the media containing all the files that have changed (and versions thereof) at the time of the last backup.


✦ Differential backup: This selects files just as the incremental does, except that it does not do anything to the archive bit. In other words, it does not mark the files as having been backed up. Because the bit stays on, each differential backup again picks up every file that has changed since the last full or incremental backup. Differential backups are best done on a weekly basis, along with a full or normal backup, so as to keep differentials comparing against recently backed up files.
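The way the backup types in this list treat the archive bit can be summarized in a small simulation. This is a minimal sketch rather than how Backup is implemented: `FileEntry` and `run_backup` are hypothetical names, the archive bit is modeled as a boolean that is `True` when the file has been modified, and the "archived backup" type is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileEntry:
    name: str
    archive_bit: bool   # True = modified since the last normal/incremental backup
    modified_on: date


def run_backup(files, kind, today):
    """Return the names copied by a backup of the given kind, updating archive bits."""
    if kind in ("normal", "copy"):
        # Both copy every file, regardless of the archive bit.
        selected = list(files)
    elif kind in ("incremental", "differential"):
        # Both copy only files whose archive bit is on.
        selected = [f for f in files if f.archive_bit]
    elif kind == "daily":
        # Copies files modified today; the archive bit is not consulted.
        selected = [f for f in files if f.modified_on == today]
    else:
        raise ValueError(f"unknown backup kind: {kind}")

    # Only normal and incremental backups mark files as backed up.
    if kind in ("normal", "incremental"):
        for f in selected:
            f.archive_bit = False
    return [f.name for f in selected]
```

Running a differential twice in a row copies the same changed files both times, while a second incremental finds nothing left to copy; that difference is what drives the number of media needed at restore time.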

What Is a Restore?

A restore is the procedure you perform to replace a working copy of a file or collection of files on a computer’s hard disks in the event they are lost or destroyed. You will often perform a restore for no reason other than to return files to a former state (such as when a file gets mangled, truncated, corrupted, or infected with a virus).
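Which media a restore needs depends on the backup types used in the rotation. The sketch below, with the hypothetical function `media_needed_for_restore`, assumes a rotation that follows its full backup with either incrementals or differentials, but not a mix of both, and works on an in-memory list of media labels.

```python
def media_needed_for_restore(history):
    """Return the media labels needed to restore to the latest backed-up state.

    history is a chronological list of (label, kind) pairs, where kind is
    'normal', 'incremental', or 'differential'.
    """
    full_indexes = [i for i, (_, kind) in enumerate(history) if kind == "normal"]
    if not full_indexes:
        raise ValueError("a restore chain must start from a normal (full) backup")

    start = full_indexes[-1]
    later = history[start + 1:]
    incrementals = [label for label, kind in later if kind == "incremental"]
    differentials = [label for label, kind in later if kind == "differential"]
    if incrementals and differentials:
        raise ValueError("this sketch assumes one scheme per rotation, not a mix")

    if differentials:
        # The most recent differential holds every change since the full backup.
        return [history[start][0], differentials[-1]]
    # Every incremental since the full backup is required, in order.
    return [history[start][0]] + incrementals
```

With a full backup on tape T1 followed by incrementals on T2 and T3, all three tapes are needed; with differentials on T2 and T3, only T1 and T3 are.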

Restore management is crucial in the DR process. If you lose a hard disk or the entire machine (for example, it is trashed, stolen, lost, or fried in a fire), you will need to rebuild the machine and have it running in almost the same state (if not exactly) as its predecessor was in at the time of the loss. How you manage your DR process will determine how much downtime you experience or the missing generation of information between the last backup and the disaster, a period we call void recovery time.
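Void recovery time is simple arithmetic once the backup times are known. A minimal sketch, assuming the backup history is available as a list of timestamps (the function name is hypothetical):

```python
from datetime import datetime, timedelta


def void_recovery_time(backup_times, disaster_time):
    """Return the gap between the last backup before the disaster and the disaster."""
    usable = [t for t in backup_times if t <= disaster_time]
    if not usable:
        raise ValueError("no backup predates the disaster")
    return disaster_time - max(usable)
```

For nightly backups at 23:00 and a failure at 15:00 the next afternoon, the function returns a gap of 16 hours, which is the work that cannot be restored.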

Understanding How Backup Works

A collection of media, such as tapes or disks, is known as a backup set (this is different from a media pool, which we will discuss in a bit). The backup set is the backup media containing all the files that were backed up during the backup operation. Backup uses the name and date of the backup set as the default set name. Backup allows you to either append to a backup set in future operations or replace or overwrite the files in the media set. It allows you to name your backup set according to your scheme or regimen.

Backup also completes a summary or histories catalog of the backed-up files, which is called a backup set catalog. If your backup set contains several media, then the catalog is stored on the last medium in the set, at the end of the file backup. The backup catalog is loaded when you begin a restore operation. You will be able to select the files and folders you need to restore from the backup catalog.
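The relationship between a backup set, its media, and the catalog on the last medium might be modeled as follows. This is an illustrative data structure only; `Medium`, `write_backup_set`, and `locate_file` are hypothetical names, and the catalog here is a simple mapping from file name to medium label rather than the on-tape format Backup uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Medium:
    label: str
    files: List[str] = field(default_factory=list)
    # Only the last medium in a set carries the backup set catalog.
    catalog: Optional[Dict[str, str]] = None


def write_backup_set(set_name, files, files_per_medium):
    """Spread files over media and store the catalog on the last medium."""
    media = []
    catalog = {}
    for start in range(0, len(files), files_per_medium):
        label = f"{set_name}-{len(media) + 1}"
        chunk = files[start:start + files_per_medium]
        media.append(Medium(label, chunk))
        for name in chunk:
            catalog[name] = label
    if media:
        media[-1].catalog = catalog
    return media


def locate_file(media, name):
    """Load the catalog from the last medium and return the label holding the file."""
    catalog = media[-1].catalog
    return catalog[name]
```

A restore in this model starts by reading the last medium, which is why losing that one tape makes the rest of the set much harder to use.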

Removable Storage and Media Pools

Removable Storage (RS) is a new service in Windows 2000 that takes away a lot of the complexity of managing backup systems. This service also brings network support to Windows for a wider range of backup and storage devices.


Microsoft took the responsibility of setting up backup devices and management of media away from the old Backup application and created a central authority for such tasks. This central authority is known as Removable Storage and is one of the largest and most sophisticated additions to the operating system, worth the price of the OS license alone, and a welcome member on any network. If you are not ready to convert to a Windows 2000 network, you might consider raising a Windows 2000 “Backup” server just to obtain the services of Removable Storage.

But Removable Storage is like an iceberg. In this chapter and in other parts of the book, we can only show you the tip. Exposing the rest of this monster service, and everything you can do with it, is beyond the scope of this treatise, and a full treatment of the subject would run into several chapters. To fully appreciate this service, and if you need to get into some serious disaster recovery strategies, possibly even custom backup and media handling algorithms, you should refer to the Microsoft documentation covering both the Removable Storage Service and its API and the Tape/Disk API. A good starting place is the Windows 2000 Server Operations Guide, which is part of the Resource Kit, discussed in Appendix B. We will, however, provide you with an introduction to the service, coming up next.

The Removable Storage Service

Removable Storage comprises several components. But the central nervous system of this technology is the Removable Storage Service and the Win32 Tape/Disk API. These two components, respectively, expose two application programming interfaces (APIs) that any third party can access to obtain removable storage functionality and gain access to removable storage media and devices. The Backup program that ships with the OS makes use of both APIs to provide a usable, but not too sophisticated, backup service.

By using the two services, applications do not need to concern themselves with the specifics of media management, such as identifying cartridges, changing them in backup devices, cataloging, numbering, and so on. This is all left to the Removable Storage Service. All the application requires is access to a media pool created and managed by Removable Storage. The backup application’s responsibility is identifying what needs to be backed up or restored, and the source and destination of data; Removable Storage’s responsibility deals with where to store it, what to store it on, and how to retrieve it. Essentially, the marriage of backup-restore applications and Removable Storage has been consummated along client/server principles.

The Removable Storage Service can be accessed directly by programming against the API. You can also work with it interactively (albeit not as completely as programming against the API) in the Removable Storage node found in the Computer Management snap-in (compmgmt.msc). The Removable Storage node is also present in the Remote Storage snap-in discussed in Chapter 21. Before we begin with any hard-core backup practice, let’s look at Removable Storage and how it relates to backup and disaster recovery. Removable Storage is also briefly discussed in Chapter 16.


Figure 17-1: The Removable Storage Snap-in

The service provides the following functionality to backup applications, also known as backup or data moving and fetching clients:

✦ Management of hardware, such as drive operations, drive health and status, and drive head cleaning

✦ Mounting and dismounting of cartridges and disks (media)

Like the dynamic disk management technology discussed in Chapter 16, Removable Storage hides the physical media from the clients. Instead, media is presented as a logical unit, which is assigned a logical identifier or ID. When a client needs to store or retrieve data from media, it does not deal with the physical media, but rather with that media’s logical ID. The logical ID can thus encapsulate any physical media, the format of which is of no concern to the client application.


Although the client need not be concerned about the actual media, you, the backup administrator, have the power, by configuring media pools, to dictate onto which format or media type your backups should be placed. If this is confusing to you, it will become clearer when you understand media pools, discussed shortly. The benefit of the logical ID is patent, but a good example of its application is that the service is able to move data, represented by its logical ID, from one physical medium to another. This would be desirable if media is approaching the end of its life and the data needs to be moved to new cartridges.
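The indirection a logical ID provides can be pictured as a lookup table that the client never sees. The class below is a conceptual sketch, not the Removable Storage API: `MediaRegistry` and its methods are hypothetical, and logical IDs are generated here with Python’s `uuid` module purely for illustration.

```python
import uuid


class MediaRegistry:
    """Maps logical media IDs to the physical medium currently holding the data."""

    def __init__(self):
        self._physical_by_logical = {}

    def register(self, physical_label):
        """Assign a new logical ID to a physical medium and return the ID."""
        logical_id = str(uuid.uuid4())
        self._physical_by_logical[logical_id] = physical_label
        return logical_id

    def resolve(self, logical_id):
        """Return the physical medium behind a logical ID."""
        return self._physical_by_logical[logical_id]

    def migrate(self, logical_id, new_physical_label):
        """Point the logical ID at a new medium, for example when a tape wears out."""
        if logical_id not in self._physical_by_logical:
            raise KeyError(logical_id)
        self._physical_by_logical[logical_id] = new_physical_label
```

A client that stored its data under a logical ID keeps using the same ID after `migrate`, even though the cartridge underneath has changed.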

Media formats can be extremely complex. Some media allow you to write and read to both sides; others only allow access to one side. How media is written to and read from differs from format to format. Removable Storage handles all those peculiarities for you. Just like the Print Spooler service, which can expose the various features of thousands of different print devices, so can Removable Storage identify many storage devices and expose their capabilities to you and the application. (The pros and cons of each of the popular backup media formats are discussed at the end of this chapter.)

Finally, and most important from the cost/benefit aspect, Removable Storage allows media to be shared by various applications. This ensures maximum use of your media asset.

The Removable Storage Database

Removable Storage stores all the information it needs about the hardware, media pools, work lists, and more in its own database. This database is not accessible to clients and is not a catalog of which files have been backed up and when. Everything that Removable Storage is asked to do, or does, is automatically saved in this database.

Physical Locations

Removable Storage also completely handles the burden of managing media location, a chore once shared between the client applications and the administrator. But the physical location service deals with more than knowing in which cupboard, shoebox, vault, or offsite dungeon you prefer to store your media; it is also responsible for the physical attributes of the hardware devices used for backing up and restoring data. It is worthwhile to understand this section, because you will need such knowledge to perform high-end backup services that protect a company’s data.

Removable Storage splits the location services into two tiers: libraries and offline locations. If a medium is online, then it is inside a tape device of some kind that can at any time be fired up to allow data to be accessed or backed up. If media is offline, then it means that you have taken it out of its drive or slot and sent it somewhere.

Note: As soon as you remove media from a device, Removable Storage makes a note in its database that the media is offline.

Libraries can be single tape drives or highly sophisticated and very expensive robotic storage silos comprising hundreds of drive bays. A CD-R/W tower, with 12 drives, is also an example of a library. Media in these devices or so-called libraries are always considered online, and are marked as such in the database. Removable Storage also understands the physical components that make up these devices.

Library components comprise the following:

✦ Drives: All backup devices are equipped with drives. The drive machinery consists of the recording heads, drums, motors, and other electronics. To qualify as a library, a device requires at least one drive.

✦ Slots: Slots are pigeonholes, pits, or holding pens in which online media is placed, in an online state. When media is needed for a backup, a restore, or a read, the cartridge or disk is pulled out of the slot and inserted into the drive. When the media is no longer needed, the cartridge is removed from the drive and returned to its slot. The average tape drive does not come equipped with a slot, but all high-end, multi-drive robotic systems do. The basic slot-equipped machine typically comes equipped with two drives and 15 slots. Slots are typically grouped into collections called magazines. Each magazine holds about five cartridges, and one magazine maintains a cleaning cartridge in one of the slots. You typically have access to magazines so that you can populate them with the cartridges you fetched from offline locations.

✦ Transports: These are the robotic machines in high-end libraries that move cartridges and disks from slots to drives and back again.

✦ Bar Code Readers: Bar coding is discussed later in this chapter. It is a means by which the cartridges can be identified in their slots. You do not require a bar code reader-equipped system to use a multi-drive or multi-slot system, because media identifiers can also be written to the media. But bar code reading allows for much faster access to the cartridges, because the system does not need to read information off the actual media, which requires every cartridge to be pulled from a slot and inserted into a drive, a process that could take as long as five minutes for every cartridge.

✦ Doors: Doors differ from device to device and from library system to library system. In some cases, the door looks like the door to a safe, which is released by Removable Storage when you need to gain access to slots or magazines. Many systems have doors that only authorized users can access. Some doors are built so strong that you would need a blowtorch to open them. On many cheaper devices, especially single-drive, no-slot hardware, the door is a small lever that Removable Storage will release so that you can extract the cartridge. Other devices have no doors at all, but when Removable Storage sends an “open sesame” command to the “door,” the cartridge is ejected out of the drive bay.


✦ Insert/Eject Ports: The IE ports are not supported on all devices. IE ports provide a high degree of controlled access to the unit in a multi-slot library system. In other words, you insert media into the port, and the transport goes and finds a free slot for it. Another way to comprehend the IE port function is to compare it to a valet service. You hand your car keys to the valet, and he or she goes and finds a free parking space for you.

If the hardware you attach supports any or all of these sophisticated features, Removable Storage will be able to “discover it” and use it appropriately.
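The interplay of drives, slots, and the transport listed above can be sketched as a small model. This is a simplified illustration with hypothetical names (`Library`, `mount`, `dismount`); it covers only drives and slots and leaves out magazines, doors, bar code readers, and IE ports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Library:
    # Each entry is the label of the cartridge in that drive or slot, or None if empty.
    drives: List[Optional[str]]
    slots: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self):
        if not self.drives:
            raise ValueError("a library requires at least one drive")

    def mount(self, label):
        """Act as the transport: move a cartridge from its slot into a free drive."""
        slot = self.slots.index(label)    # ValueError if the cartridge is not in a slot
        drive = self.drives.index(None)   # ValueError if every drive is busy
        self.slots[slot] = None
        self.drives[drive] = label
        return drive

    def dismount(self, drive):
        """Return the cartridge in the given drive to the first free slot."""
        label = self.drives[drive]
        if label is None:
            raise ValueError("drive is empty")
        slot = self.slots.index(None)
        self.drives[drive] = None
        self.slots[slot] = label
        return slot
```

A single tape drive with no slots is still a valid `Library` in this model (one drive, empty slot list), which matches the definition that a library needs at least one drive.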

There are dozens, if not hundreds, of devices from which to choose for backing up and storing data. Removable Storage, as we discussed, can handle not only traditional tape backup systems, but also CD silos, changers, and huge multi-disk readers. If you wish to check if Removable Storage supports a particular device, follow the steps to create a media pool discussed in the section “Performing a Backup” later in this chapter.

Media Pools

A new term in the Windows operating system is the media pool. If you are planning to do a lot of backing up or have been delegated the job of backup operator or administrator, you will have a lot to do with media pools in your future backup-restore career.

A media pool in the general sense of the term is a collection of media organized as a logical unit. Conceptually speaking, the media pool contains media that belong to any defined storage or backup device, format, or technology assigned to your hardware, be it a server in the office or one located out on the WAN somewhere, 15,000 miles away. However, each media pool can only represent media of one type. You cannot have a media pool that combines DVD, DAT, and ZIP technology. But you can back up your data to multiple media pools of different types if the client application or function so requires it.
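The one-type-per-pool rule can be expressed as a simple check when media is added. The `MediaPool` class below is a hypothetical sketch, not how Removable Storage represents pools internally.

```python
class MediaPool:
    """A named collection of media that all share one media type."""

    def __init__(self, name, media_type):
        self.name = name
        self.media_type = media_type
        self.media = []

    def add(self, label, media_type):
        """Add a medium, rejecting any whose type differs from the pool's type."""
        if media_type != self.media_type:
            raise ValueError(
                f"pool {self.name!r} holds {self.media_type} media, not {media_type}"
            )
        self.media.append(label)
```

Backing up to several media types therefore means creating several pools, one per type, rather than mixing types in one pool.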

It may be easier to think of the media pool in terms of the hardware devices that are available to your system (such as a CD-R/W or a DLT tape drive). You should strive not to work with media pools from dissimilar devices, especially when backing up zillions of files. For example, you should stay away from creating media pools that consist of Zip drives, DLT tape drives, and a CD-R/W changer. It would make managing your media, such as offsite storage, boxing, and labeling, very difficult, much like wearing a sneaker on one foot and a hiking boot on the other and then justifying walking with both at the same time because they both represent “pools” of walking attire.

Removable Storage separates media pools into two classes: system pools and application pools. The Removable Storage Service creates system pools when it is first installed. By default, the Removable Storage Service is enabled and starts up when you boot your system. If you disable it or remove it from installation, any devices installed in your servers, or attached on external busses, will be ignored by Windows 2000, as if they did not exist. When Removable Storage is activated, it will detect your equipment, and if the devices are compliant, they will be used in media pools automatically created by the service or applications.

System pools

✦ Free pools: Free pools allow any application to access the media pools in this group. In other words, these media pools can be made available to any application requiring free media. Applications can draw on these media pools when they need to back up data. When media pools are no longer required, they can be returned to this group.

✦ Unrecognized pools: Media in these pools are not known to Removable Storage. If the service cannot read information on a cartridge, or if the cartridge is blank, the media pool supporting it is placed into this grouping.

✦ Import pools: This group is for media pools that were used in other Removable Storage systems, on other servers, or by applications that are compatible with Removable Storage or that can be read by Removable Storage. Media written in the Microsoft Tape Format (MTF) can thus be imported into the local Removable Storage system.

Application pools

When an application is given access to a free media pool, either it will create a special pool into which the media can be placed or you can create pools manually for the application using the Removable Storage snap-in, illustrated in Figure 17-1.

A very useful and highly sought after feature of Windows 2000 media pools is that permissions can be assigned to pools to allow other applications to use the pools or to protect the pools in their own sets.

Multi-level media pools

It might astonish you to find out that media pools can be organized into hierarchies or nests. In other words, you can create media pools that hold several other media pools. An application can then use the root media pool and gain access to the different data storage formats in the nested media pools. Expect to see sophisticated document storage, backup, and management applications using such media pools.
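A nested pool hierarchy is essentially a tree, and an application holding the root can walk it to reach media in every nested pool. The sketch below uses a hypothetical `PoolNode` class and is meant only to show the shape of such a hierarchy.

```python
class PoolNode:
    """A media pool that may hold media directly and may contain nested pools."""

    def __init__(self, name, media_type=None):
        self.name = name
        self.media_type = media_type   # None for a pool that only groups other pools
        self.media = []
        self.children = []

    def add_child(self, pool):
        """Nest another pool under this one and return it for chaining."""
        self.children.append(pool)
        return pool

    def all_media(self):
        """Collect (media_type, label) pairs from this pool and every nested pool."""
        found = [(self.media_type, label) for label in self.media]
        for child in self.children:
            found.extend(child.all_media())
        return found
```

Each leaf pool still holds a single media type; only the grouping pool at the root spans several types.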


An example of using such a hierarchy of media pools can be drawn from a near disaster that was averted during the writing of this chapter. One of our 15-tape DLT changers went nuts and began reporting that our tapes were not really DLT tapes but alien devices it was unable to identify. The only way to continue backing up our server farm was to enlist every SCSI tape and disk device on the network into one large pool. Once the DLT library recovered, we could go back to business as usual.

Work Queue and Operator Requests

You will notice nodes for both Work Queue and Operator Requests in the Removable Storage tree. These services provide a communications and information exchange function between the operator (the backup operator or administrator or the backup operator group) and Removable Storage.

Work queue

Working backup applications and the HRS/RSS service post work requests to the Removable Storage service. To manage the multitude of requests that can come from applications and services, each request for work from the Removable Storage service is placed into the work queue. The work queue is very similar in concept to a print queue, discussed in Chapter 23.

The work queue provides information on queue states on a continual basis, and these are reported to the details pane in the Work Queue node. For example, if an application is busy backing up data, an “In Process” state will be posted to the details pane identifying the work request and the state it is in. Table 17-1 describes the work queue states reported to the Work Queue details pane.

Table 17-1

Work Queue States

State Explanation

In Process RS is working on the work item.

Waiting The request is waiting for a resource, currently being used by another service, before work on the item can continue.

Completed RS has handled the work item successfully. The request has been satisfied.

Failed RS has failed to complete the work item. The request did not obtain the desired service.
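The four work queue states can be modeled with an enumeration and a first-in, first-out queue. This is a simplified sketch with hypothetical names (`WorkState`, `WorkQueue`): it treats every queued request as “Waiting,” whereas the service reserves that state for requests blocked on a busy resource.

```python
from collections import deque
from enum import Enum


class WorkState(Enum):
    IN_PROCESS = "In Process"
    WAITING = "Waiting"
    COMPLETED = "Completed"
    FAILED = "Failed"


class WorkQueue:
    def __init__(self):
        self._pending = deque()
        self.states = {}   # request -> WorkState, as a details pane might show it

    def post(self, request):
        """Queue a work request; it waits until it reaches the front."""
        self._pending.append(request)
        self.states[request] = WorkState.WAITING

    def process_next(self, handler):
        """Run the oldest request through handler and record the outcome."""
        request = self._pending.popleft()
        self.states[request] = WorkState.IN_PROCESS
        try:
            handler(request)
        except Exception:
            self.states[request] = WorkState.FAILED
        else:
            self.states[request] = WorkState.COMPLETED
        return self.states[request]
```

As with a print queue, requests are handled in the order they were posted, and the recorded state is what an operator would check to see where a job stands.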


Operator requests

No matter how sophisticated Removable Storage is, there are some things it just will not do. These items will be marked for the “human” work queue. For example, Removable Storage will not go and fetch cartridges from the cabinet or the storeroom. This is something you have to do. The details pane in the Operator Requests node is where Removable Storage posts its request states for you, the operator.

Removable Storage can also send you a message via the messenger service or the system tray, just in case you have the habit of pretending the Operator Requests node does not exist. Table 17-2 lists the possible Operator Request States.

Table 17-2

Operator Request States

State Explanation

Submitted The described request has been submitted, and the system is waiting for the operator’s input.

Refused The operator has refused to perform the described request.

Completed The operator has complied and has completed the described request.

Labeling Media

Removable Storage can read data written to the labels on the actual tape or magnetic disk as well as external information supplied in bar code format. The identification service is robust and highly sophisticated and will ensure that your media does not get overwritten or modified by other applications.

You need to provide names for your media pools, and you should also, if you can afford a bar code reader, organize them according to serial numbers (represented as bar codes) for more accurate handling. If you are planning to install a library system, make sure you get one that can read the bar codes from the physical labels on the cartridge casing. This information will be critical when it comes to locating a few files that need restoring from five million files stored on 120 30GB tapes (the bigger the enterprise, the more complex the backup and restore regimen and management).

Another reason we prefer a numbering or bar code scheme for identifying media, as opposed to labeling it according to the day of the week, is that often a cartridge can get inadvertently written to on the wrong day. If that happens, you may have a cart named Wednesday, but with Tuesday data on it, which can get confusing and create unnecessary concern. With a bar code or serial number, you can simply make sure that the cart gets put back into the Wednesday box without having to scratch out or change the label.


Practicing Scratch and Save

Although Windows 2000 does not cater to the concept of scratch and save sets, it is worth a mention because you should understand the terms for more advanced backup procedures. Simply put, a save set is a set of media in the media pool that cannot be overwritten for a certain period of time. A scratch set is a set of media that is safe to overwrite. A backup set should be stored and cataloged in a save set for any period of time during which the media should not be used for backup. You can create your own spreadsheet or table of media rotating in and out of scratch and save sets.

The principle behind scratch and save is to protect data from being overwritten for predetermined periods. We have included a scratch and save utility on the CD accompanying this book; it is called Scratch n’ Save and can be found in the SNS folder. Although this little utility does not prevent you from overwriting data, it will assist you in organizing your media pools.

For example, a monthly save set is saved for a month, while a yearly save set is saved for a year. After a “safe” period of time has elapsed, you can move the save set to the scratch set. In other words, once a set is moved out of the save status into the scratch status, you are tacitly allowing the files on it to be destroyed. A save set becomes a scratch set when you are sure, through proper media pool management, that other media in the pool contain both full and modified, and current and past files of your data, and that it is safe to destroy the data on the scratch media.

It is important to fully understand the concept of save and scratch sets because it is the only way you will be able to ensure your media can be safely recycled. The alternative is to make every set a save set, which means you never recycle the tapes, making your DR project a very costly and risky venture because tapes that are being constantly used will stretch and wear out sooner.
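The spreadsheet or table of save and scratch sets described above can also be kept as a small script. What follows is only a minimal sketch, not the Scratch n’ Save utility from the CD: the retention periods, the field names, and the labels are all assumptions made for illustration. It checks nothing more than whether the save period has elapsed; confirming that other media in the pool still hold the data remains a manual step.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods for illustration; choose your own.
RETENTION = {
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


@dataclass
class MediaSet:
    label: str        # serial number or bar code of the set
    kind: str         # key into RETENTION
    written_on: date  # date the backup set was written


def status(media: MediaSet, today: date) -> str:
    """Return 'save' while the retention period runs, then 'scratch'."""
    expires = media.written_on + RETENTION[media.kind]
    return "save" if today < expires else "scratch"


def scratch_candidates(pool, today):
    """Labels of the sets whose save period has elapsed."""
    return [m.label for m in pool if status(m, today) == "scratch"]
```

Running scratch_candidates over the whole pool before each rotation gives the list of cartridges that may be reused, provided you have separately confirmed that the pool still covers the data they hold.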

Establishing Quality of Support Baselines for Data Backup/Restore

Windows 2000 provides the administrator with backup and recovery tools seen before only on midrange and mainframe technology (such as the ability to mark files for archiving). For the first time, Windows network administrators are in a much better position to commit to service level agreements and quality of service or support levels than before. Unfortunately, the new tools and technologies result in a higher and more critical administrative burden (the service level shifts to the Windows administrator as opposed to being usually the domain of the midrange, UNIX, or mainframe administrative team). Let’s consider some of the abstract issues related to backups before we get into procedures.

No matter how regularly you back up the data on your network, you can only restore up to the point of your last complete backup. Unless you are backing up every second of the day, which is highly unlikely and impractical, you can never fully recover the latest data up to the point of meltdown (unless you had a crash immediately after you backed up). You need to decide whether your business can afford to lose even one hour of data. For many companies, any loss could mean a serious setback and costly recovery, often lasting long after the disaster occurs.

It is important, therefore, that you consider the numerous alternatives for backup procedures and various strategies if out-of-date data is considered inadequate recovery. You need to decide on a baseline for backup/restores: What is the least acceptable recovery situation? You will also need to take into account the quality of support promised to staff and the departments and divisions that depend on your systems, and the service level agreements (SLAs) in place with the customers.

Service level and quality of support are discussed fully in Chapters 1, 4, and 5.

First, before we consider other factors, let’s decide what we would consider adequate in terms of the currency of backed-up data. Then, once we have established our tolerance level, we need to work out how to cater to it, and at what cost.

Starting with currency, consider this list:

1. Data restored is one month or more old.
2. Data restored is between one and four weeks old.
3. Data restored is between four and seven days old.
4. Data restored is between one and three days old.
5. Data restored is between six and twelve hours old.
6. Data restored is between two and five hours old.
7. Data restored is between one and sixty minutes old.

Now, depending on how the backups were done and the nature of your backup technology, just starting up the recovery process (such as reading the catalog) could take anywhere up to ten minutes. So, level 7 would be out of the picture for you as a tape backup proposition. In cases where backup media is off-site, you would need to take into consideration how long it takes after placing a call to the backup bank for the media to arrive at the data center. This could be anything from 30 minutes to 6 hours. And you may be charged for “rush” delivery.

Now look back at the list and consider your options. How important (mission-critical) is it that data is restored, if not in real-time, almost in real-time? There are many situations requiring immediate restoration of data. Many applications in banking, finance, business, science, engineering, medicine, and so on require real-time recovery of data in the event of a crash, corruption of data, deleted data, and so on.

You could and should be exploring or installing clustered systems, mirrors, replication sets, and RAID-5 level and higher storage arrays, as described in the previous chapter. But these so-called fault-tolerant and redundant systems typically share a common hard-disk array or a central storage facility. Loss of data is thus system-wide and mirrored across the entire array. A mirror is a reflection: no more, no less.

This brings us to another factor to consider: the flawed backup. You bring this factor into consideration if your data is continuously changing. The question to ask is, “How soon after the update of data should I make a backup?” You may decide, based on the previous list, that data even five minutes out of date is damaging to system integrity or the business objectives. A good example is online real-time order or delivery tracking. But backing up data with such narrow intervals between versions brings us to the subject of quality and integrity of backed-up data. (Later in this chapter, we will discuss versioning and how new technology in Windows 2000 facilitates it.) What if the file that just got hit by a killer virus is quarantined and you go to the backup only to find it is also infected or corrupt? What if all the previous files are infected, and now just opening the file renders it useless? It’s something to think about.

Earlier this year, we rushed to the aid of our main SQL Server group, which had lost a valuable database on the customer ordering system (on our extranet). Every hour offline was costing the company six figures as customers went elsewhere to place their orders. Four-letter words were flying around the server room. We had to go back three days to find a clean backup of the database that showed no evidence of corrupt metadata.

Figure 17-2 illustrates data backed up on a daily basis, and in this case, bad data is backed up for three days in a row. You may consider some of the gray area as safe, where backup data is bound to have all the flaws of its source (corruption, viruses, lack of integrity, and so forth), if you have other means of assuring quality or data integrity. Such assurances may be provided by means of highly sophisticated anti-virus software, quality of data routines and algorithms, versioning, and just making sure people check their data themselves. Backing up bad data every ten minutes may be a futile exercise depending on the tools you have to recover or rebuild the integrity of the data.

Most companies back up data to a tape drive (we discuss the formats later). The initial cost is really insignificant in relation to the benefit: the ability to back up and recover large amounts of data. A good tape drive can run anywhere from $500 for good Quarter-inch Cartridge (QIC) systems to $3,000 to $4,000 for the high-speed, high-capacity Digital Linear Tape (DLT) systems, and a robotic library system can cost as much as $30,000.

Let’s now consider minimum restore levels, keeping the quality of backup factors described earlier in mind:

1. Restore is required in real-time (now) or close to it. Data must be no more than a few seconds old and immediately accessible by users and systems even in the event the primary source is offline. In the case of industrial or medical systems, the secondary source of data must be up-to-date, and the latency might be measured in milliseconds and not seconds. Your SLAs may dictate that 24-7 customers can fine you if data is offline longer than x seconds or minutes. Let’s call this the critical restore level.
2. Restore is required within ten minutes of the primary source going offline. Let’s call this emergency restore.
3. Restore is required within one hour of the primary source going offline. Let’s call this urgent restore.
4. Restore is required within one to four hours of the primary source going offline. Let’s call this important restore.
5. All other restores that can occur later than the previous can be considered casual restores.

Figure 17-2: The narrower the interval between backups, the greater the chance that backed-up data is also corrupted, infected, or lacks integrity


Figure 17-3: The data restoration pyramid

The pyramid in Figure 17-3 illustrates that the faster the response to a restore or recall of data request, the higher the chance of retrieving poor data. Each layer of the pyramid covers the critical level of the restore request. This does not mean that critical restores are always going to be a risk and that the restored data is flawed. It means that the data backed up closest to the point of failure is more likely to be at risk compared to data that was backed up hours or even days before the failure. If a hard disk crashes, the data on the backup tapes is probably sound, but if the crash is due to corrupt data or virus infection, the likelihood of recent data being infected is high.

Another factor to consider is that often you’ll find that the “cleanest” backup data is the furthest away from the point of restoration, or the most out-of-date.
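The five restore levels can be written down as a small lookup, which makes it easier to check a service level agreement against what tape can deliver. This is a rough sketch under stated assumptions: the five-second bound for the critical level is an invented figure (the text says only "a few seconds"), the level names follow the pyramid labels, and the function names are hypothetical. The ten-minute catalog read is the upper figure quoted earlier in the chapter.

```python
from datetime import timedelta

# Upper bounds on tolerated downtime for each restore level.
# The 5-second bound for "critical" is an assumed figure for illustration.
LEVELS = [
    (timedelta(seconds=5), "critical"),
    (timedelta(minutes=10), "emergency"),
    (timedelta(hours=1), "urgent"),
    (timedelta(hours=4), "important"),
]


def restore_level(max_downtime: timedelta) -> str:
    """Name the restore level implied by the downtime you can tolerate."""
    for limit, name in LEVELS:
        if max_downtime <= limit:
            return name
    return "casual"


def tape_is_viable(max_downtime: timedelta,
                   catalog_read: timedelta = timedelta(minutes=10),
                   media_delivery: timedelta = timedelta(0)) -> bool:
    """True if a tape restore can at least begin within the tolerated downtime.

    catalog_read is the time needed to start the recovery process;
    media_delivery is the wait for off-site media (zero if tapes are on-site).
    """
    return catalog_read + media_delivery <= max_downtime
```

For example, a system that tolerates only five minutes of downtime falls in the emergency level, and tape_is_viable returns False for it even with the media on-site, which matches the earlier observation that the most current recovery levels are out of reach for tape.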

If the level of restore you need is not as critical or the quality of the backup not too important, you could consider a tape drive system, attached either to a backup server or local to the hosting machine. You could then set up a scheme of continuous or hourly backup routines. In the event data is lost (usually because someone deletes a file or folder), you would be able to restore the file. The worst-case scenario is that the data restored is one hour out of date, and at such a wide interval, a replacement of a corrupt file with another corrupt file is unlikely.

Consider the following anecdote: We recently lost a very important Exchange-based e-mail system. Many accounts on the server could be considered extremely mission critical. Thousands of dollars were lost every minute the server was down. (The fallout from downed systems compounds damages at an incredible rate. The longer a system is down, the worse it becomes.)

[Figure 17-3 labels: Critical, Emergency, Urgent, Important, Casual; legend: data integrity]

The last full backup of the server was performed on the weekend. The system went down on Wednesday. Since we were backing up only the files that changed on Monday and Tuesday, we would be able to restore the e-mail server to the state it was in the night before. This was good news to the MIS director, but not very good news to people who felt that losing six to eight hours of e-mail was unacceptable (for many that would mean losing an entire day of work and a lot of wasted time rewriting and resending e-mail).

But the good news was short-lived when we discovered that the transaction logs covering the Monday and Tuesday backups were corrupt on the system and on the tapes. The result was that we were able to restore the entire system only to the state it was in on Friday, essentially losing everything between Friday night and Wednesday afternoon. For backup administrators, this was an unacceptable event. Later in the chapter, we discuss how to prevent this from happening.
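The anecdote turns on a dependency that is easy to overlook: assuming an incremental scheme, each nightly backup is only usable if the full backup and every increment before it are also readable. The sketch below illustrates that rule; the record layout (day, type, ok) and the function name are hypothetical, and how you establish that a backup is "ok" (a verify pass, a test restore) is left open.

```python
def restorable_chain(backups):
    """Return the days that can actually be restored, oldest first.

    backups is a chronological list of dicts with the keys
    'day', 'type' ('full' or 'incremental'), and 'ok' (verified readable).
    The chain starts at the latest good full backup and stops at the
    first unreadable backup that follows it.
    """
    start = None
    for index, backup in enumerate(backups):
        if backup["type"] == "full" and backup["ok"]:
            start = index
    if start is None:
        return []  # no usable full backup at all

    chain = [backups[start]["day"]]
    for backup in backups[start + 1:]:
        if backup["type"] != "incremental" or not backup["ok"]:
            break  # a bad link makes everything after it unusable
        chain.append(backup["day"])
    return chain
```

Fed the situation in the anecdote (a good weekend full followed by two corrupt increments), the function returns only the full backup, which is exactly the state the server ended up being restored to.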

If you have several servers that need this level of protection, you will have to install some expensive backup equipment and advanced third-party software. Having a hot “clone” mirroring the entire system would be the way to go. Both disk and system mirroring, striping, and redundancy are discussed in Chapters 16 and 21. Full-blown redundant systems are required if applications need to continue oblivious of the switch to alternative media and hardware.

To summarize: Looking back at our checklists and matrices, for a restore service level of five and up, you would be looking at regular tape backup systems. Anything more critical would require online libraries and a hierarchical storage management system, and yes, we will look into this new service, provided by Remote Storage Services (RSS), in Chapter 21.

Establishing Quality of Capture

In planning backup procedures and establishing quality of support levels for backups, and considering what we have discussed previously, it is vital you consider the quality of your backups before you begin designing rotation schedules and schemes and backup/restore procedures. Every business is different. Even businesses in like industries do things differently, so what you work out may work for you, but not for anyone else. What we suggest here are guidelines for establishing procedures.

Before you get stuck in here, remember this: Devise a plan, and if it works (after tests work under strict analysis), stick to it. When backup media gets out of sync or gets lost or damaged, you may have a disaster when trying to restore critical data.

Best Backup Time-of-Day

Let’s say that you decide to back up your data every night. One of the first items to consider is when you start your backups. If staff work late or your systems are accessed late into the night, you might wait until the early hours of the morning to begin backing up. In other words, the best time to start doing backups is when the files are least likely to be open and changing, or when you feel you are getting the last possible version change before people go home for the night and systems become idle again.

You may run into problems backing up earlier in the evening or even late at night when, for example, a process or department swings around at near midnight and updates 20 percent of the critical data you need to back up (like night order processing). It can be especially tough to decide when to start backing up e-mail systems and database management systems, because they typically are in use around the clock, especially if your organization is a national or global entity.

Some organizations restrict access to systems at certain times to ensure that the best backups are achieved at that certain time. This would naturally have to be coordinated with other departments and change control because making a system unavailable could crash other processes that may be running at the same time, or they may need access to the data. We believe systems should never be taken offline, even for backups. Also, in the age of the Internet, who would want to restrict access to systems? That’s tantamount to closing shop in the middle of the day for international Web sites that view “after hours” as an obsolete term in the Information Age.

Length of Backup

You should also work out how long your backups take. It may be prudent to start your backups at one minute to midnight, but if morning swings around and your backups are still churning away, you will have hardly performed a backup, and files may become locked or substantially changed when systems and people log in and seize control again.
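Working out how long a backup takes is simple arithmetic once you know the volume of data and the sustained throughput of the drive. The following is a back-of-the-envelope sketch; the function names are hypothetical, and the throughput has to be a figure you measure on your own hardware rather than a rated speed.

```python
from datetime import datetime, timedelta


def backup_finish(start: datetime, size_gb: float,
                  rate_gb_per_hour: float) -> datetime:
    """Estimate when a backup that begins at `start` will complete."""
    if rate_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return start + timedelta(hours=size_gb / rate_gb_per_hour)


def fits_window(start: datetime, size_gb: float,
                rate_gb_per_hour: float, deadline: datetime) -> bool:
    """True if the backup is estimated to finish before people log in again."""
    return backup_finish(start, size_gb, rate_gb_per_hour) <= deadline
```

As an illustration, 60GB at a measured 10GB per hour started at one minute to midnight finishes just before 6 a.m., while 100GB at the same rate is still running at 9 a.m., well after systems and people have seized control again.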

If your backup devices are backing up multiple servers, you may not get to the last machines until the next day. There’s not much sense in a Thursday incremental backup that is part of a rotation scheme but that only takes place on Saturday.

There are a number of options to consider when striving to ensure that the best quality backups take place in as quick a time as possible.

✦ Files that do not change: Repeatedly backing up system and application files is a waste of time. Many administrators, from either lack of time to plan their backups or ignorance, waste an incredible amount of time and resources backing up files that seldom change. System files are a good example, as are temp files and non-critical log files. You could consider dividing your backups into the categories described next.

✦ Long-term system and system state files: These files include program files and system state files that never change or change very seldom. As explained later, incremental and differential backup functions ignore these files once a full backup has occurred, but it still makes no sense tying up time and media even on a weekly or monthly full routine that can often run into two or more days of continuous backup.

✦ Short-term state files: These files include system or application state files that do change often. Such files include configuration files, registry files, the Active Directory files, and so on. On servers, both registry and Active Directory files can change every day, when new users or resources are added or changed. So if short-term state files change daily on your servers, then they will need to be included in backups. Non-critical short-term state files, including paging files (pagefile.sys), event log files, and temp files (.tmp), are not needed to restore downed systems, nor are they critical or useful to data.

✦ Data and resource files: These files include word-processing files, graphics-related files, database files, transaction logs, e-mails and other communications files, spreadsheets, voice and audio recordings, and so on. These files (and they can often be listed or categorized by their extensions) change often, are almost always critical, and should always be backed up or included in all backup routines.

If you intelligently include or exclude certain groups of files, you can control and keep backup times to a minimum. You will also save on media (at $30 to $50 a pop for DLTs and not much less for small packs of DAT cartridges); you can save a lot of money and wear and tear on systems, media, and backup devices.
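Because the categories above can often be recognized by file name or extension, the include/exclude decision can be sketched as a simple filter. This is an illustration only: the exclusion lists and the function name are assumptions, not the selection mechanism of the Windows 2000 backup tool, and a real list would be built from an inventory of your own servers.

```python
from pathlib import PureWindowsPath

# Hypothetical exclusion lists for illustration; extend them to suit your systems.
EXCLUDE_EXTENSIONS = {".tmp"}      # temp files
EXCLUDE_NAMES = {"pagefile.sys"}   # the paging file


def select_for_backup(paths):
    """Return the paths worth backing up, dropping non-critical files."""
    selected = []
    for path in paths:
        parsed = PureWindowsPath(path)  # understands both \ and / separators
        if parsed.name.lower() in EXCLUDE_NAMES:
            continue
        if parsed.suffix.lower() in EXCLUDE_EXTENSIONS:
            continue
        selected.append(path)
    return selected
```

The comparison is done in lower case because Windows file names are not case-sensitive, so ~x1.TMP is excluded just as ~x1.tmp would be.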

Redundant systems that use replication services in products like Active Directory, SQL Server 2000, Exchange 2000, and so on, are more effective, in many cases, than fancy backup technology for high-availability initiatives.

Backup of Servers and Workstations

If you have not by now separated your backup procedures into backup of systems and backup of data, now is the time to do it. Often, system administrators repeatedly back up Windows servers and workstations in their entirety for absolutely no reason. We cannot count how many full versions or backups of our systems we have in storage. This has a lot to do with the lack of thought that goes into backup practice and little to do with the inflexible backup technology of earlier versions of the Windows server platform.

In some cases, we have several years of system backups where the only files on the media that are different are the data files. From the get-go, you could probably recover 10 to 20 cartridges and put them back into the rotation without impacting your quality of service and backup integrity levels (that could be worth a lot of money to you in media costs and time).

How do you then deal with the backup of systems? If you have not already done so, you should consider taking an “image” of the system and saving it either on tape media, compact disk, DVD, or on a remote storage volume. We recommend against storing images on any remote storage volume or disk because the hardware could fail or someone else might delete the file, even if you secure it (although Windows 2000 security provides more protection than Windows NT 4.0).


Instead, burn the system image onto a CD, or use a product that specializes in so-called “bare metal” capture of all data. There are several popular products that specialize in bare metal recovery. The Stac Replica system, for example, boasts the ability to back up a server and then restore it to any other machine with zero reinstallation required.

Workstations are viable candidates for image storage because they usually never get backed up. Most system administrators tell their users to stick their data into server share-points where (1) they are accessible to the groups that have interests in the files, and (2) the data gets backed up every day when the rotation sweeps around. Windows 2000 now offers such advanced control over the user’s workspace that policy dictating the storage of users’ files on a server share is entirely enforceable. See Chapter 11 for information on how to redirect users’ data folders to backup share-points.

Many users lose a considerable amount of computing time, and suffer considerable inconvenience, when they lose a workstation and there is no backup for it. Getting such a system back to what it was before a hard disk crash, fire, or theft can take more than a day. Many critical processes also take place from workstations.

Restoring a system from an image is relatively simple, and in many cases, recovery can take place in a morning. Images can also be kept in a safe place at work for quick access.

The upshot of this method is that if a system is blown away, you need only to set up identical or very similar hardware and restore from the image to get back a machine that is at the same state as when the image was burned. You would then restore the data and any files that have changed since the image was burned.

Naturally, you need to ensure that you install the necessary service packs that were installed on the system since the time of the last image burning. Or you should re-burn the image after a new service pack, application software, or new system libraries have been applied.

The best candidates for the image burning and bare metal backup techniques are servers where the majority of files on the system are static system files. A print server is a good example, and the Windows 2000 Resource Kit includes a utility (printmig) to back up logical printer shares. It may not be much of a savings to burn an image of a server where 89 percent of the storage space is dedicated to databases or e-mail files. On the other hand, a Remote Access Server, one of a group of WINS servers, and volumes that have no changing data on them are ideal candidates for image burns.


The Open Files Dilemma

Open files have always been the backup administrator’s nightmare on Windows NT Server, and this is still very much the case on Windows 2000 volumes. What are these open files? Any resource file on a system needs to be opened for exclusive or shared use by a user or device that is exploiting or updating its contents. Backup software, backup schemes and rotations, and backup administrators hate open files because:

✦ Open files cannot be backed up.

✦ Open files trash automated backup jobs.

✦ Open files cause the backup schedules to slow down, even grind to a halt.

✦ Forcing open files closed or shutting down services and systems causes headaches, inconveniences, missed deadlines, crashes and, worse, the Blue Screen of Death (although the latter is the least likely to occur).

Many relational database applications, for example, place “locks” on files while they are in use. The system also places locks on files. These files can range from simple configuration files, the registry, and Active Directory files (their databases, for example) to the files of SQL servers, WINS servers, DHCP servers, and so on. E-mail applications are a good example of an open-files nightmare. These files are often huge and are almost always open and in use by the applications. Microsoft Exchange is a good case in point.

If a file is open or there is an exclusive lock on the file, your backups are in trouble. On a mail server like Exchange, the result of the open files problem could be catastrophic for you. The information stores, the registry, the Exchange directory, the Active Directory, WINS, DNS, DHCP, and so on, are always open. If the backup fails because these huge files could not be backed up, you might be talking about hundreds if not thousands of users inconvenienced, at incredible cost.

Let’s suppose a disaster: You do a full backup of Microsoft Exchange every weekend. Then one day your silent pager vibrates your hip joints with the message that the Exchange server crashed. When you try to revive the system, guess what, it does not want to be revived. But that’s okay because you have been diligently making full backups of Exchange every weekend. Only, when you go and do your restore, you find that the backup software was skipping exactly those files you need to do the restore from. Career killer?

Database servers can cause even bigger headaches. Many, such as SQL Server, are self-contained domains of users and login mechanisms. From the outside world, you only see a huge database blob. In the case of SQL Server, it’s the files with the .dat extension, such as msdb.dat. In fact, any huge file that has a .dat or a ?db extension is likely to be a database.
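The Exchange scenario above, where the backup software quietly skipped the open files, suggests comparing what each job actually captured against what you expected it to capture. The sketch below assumes you can obtain both lists as plain file names (for instance from the backup log or catalog, in whatever form your software provides); the function names and the sample file names other than msdb.dat are hypothetical. The patterns mirror the .dat and ?db extensions mentioned in the text.

```python
from fnmatch import fnmatchcase

# Patterns for files that are likely to be databases, per the text.
DATABASE_PATTERNS = ["*.dat", "*.?db"]


def looks_like_database(name: str) -> bool:
    """True if the file name matches one of the database-style patterns."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in DATABASE_PATTERNS)


def skipped_files(expected, backed_up):
    """Sorted names that should have been backed up but were not."""
    done = {name.lower() for name in backed_up}
    return sorted(name for name in expected if name.lower() not in done)


def critical_skips(expected, backed_up):
    """The skipped files that look like databases and deserve an alert."""
    return [name for name in skipped_files(expected, backed_up)
            if looks_like_database(name)]
```

Running critical_skips after every job, rather than discovering the gap on the day of the restore, turns a potential career killer into a routine warning.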

