A few things may cause issues for this pattern, such as the need to make sure that no
application keeps “state” from one transaction to the other. Additionally, the application
and/or the web tier needs to be able to route user connections (the load) to either site in
some type of balanced or round-robin method. This is often done with big IP routers that
use round-robin routing algorithms, for example, to determine which site to direct
connections to. Active/active configurations can be created using peer-to-peer continuous
data replication as well as other multi-updating subscriber replication topologies.
A slight twist to having two primary sites is to have one primary site and a secondary site that
doesn’t process transactions but is actively used for reporting, testing, and other tasks (just
no processing that changes anything). In the event of a primary failure, the secondary site
can take over full primary site responsibilities quickly. This is sort of active/passive, with
active “secondary usage” on the passive site (following the first active/passive DR pattern
described previously). This type of configuration can take advantage of database mirroring
and database snapshots (for the reporting). There are plenty of advantages to this
variation, which greatly distributes the workload and moves up the DR pyramid.
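The round-robin routing described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular router implements it; the site names and the `make_router` function are hypothetical, and a real deployment would rely on the routing hardware or web tier rather than application code.

```python
from itertools import cycle

# Hypothetical site names used only for illustration.
SITES = ["site-1", "site-2"]

def make_router(sites):
    """Return a function that hands out sites in rotation, skipping down sites."""
    rotation = cycle(sites)

    def next_site(available):
        # Try each site at most once per call, in rotation order.
        for _ in range(len(sites)):
            site = next(rotation)
            if site in available:
                return site
        raise RuntimeError("no site available")

    return next_site

route = make_router(SITES)
both_up = {"site-1", "site-2"}
normal = [route(both_up) for _ in range(4)]        # alternates between sites
after_failure = [route({"site-2"}) for _ in range(2)]  # site-1 is down
print(normal, after_failure)
```

Because no application keeps state between transactions, consecutive connections from the same user can land on different sites without any hand-off logic.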
Active Multisite DR Pattern
An active multisite DR configuration contains three or more active sites, with the
intention of using any one of them as the DR site for the other (as shown in Figure 56.4). This
pattern allows you to distribute your applications redundantly between any pair of sites,
but not to all three (or more). For instance, you could have half of Primary Site 1’s
applications on Primary Site 2 and the other half on Primary Site 3. This way, you spread out
the risk further and increase your odds of uninterrupted processing.
Again, having “stateless” applications is critical here, as is some smart routing of all
connections to the right sites. Using continuous data replication and the database
mirroring options allows you to easily create such a DR topology. And, again, you also have the
secondary usage variation available to you if one or more alternative sites are passive
(with secondary usage supporting reporting, for example).
Choosing a Disaster Recovery Pattern
We reduce these to patterns because, at the foundational level, they represent what you
need to do to support the level of business continuity your company demands. Some
companies can tolerate different levels of loss because of the nature of their business;
others cannot. At the highest levels, it is fairly easy to match these patterns to what your
business requires. In this chapter, we look at what SQL Server capabilities are available to
help you implement these patterns.
Often, global companies devise a DR configuration that reserves each major data center
site in their regions as the active or passive DR site for another region. Figure 56.5 shows
one large high-tech company’s global data center locations. Its Alexandria, Virginia, site is
also the passive DR site for its Phoenix, Arizona, site. Its Paris, France, regional site is also
the DR site for its Alexandria, Virginia, site, and so on.
FIGURE 56.4 Active multisite DR pattern (Primary Site 1, Primary Site 2, and Primary Site n are all active; each has a web and application tier, a SQL Server database tier, and a physical storage tier kept in sync through bi-directional synchronization, with database snapshots; applications A, B, and C each run on two of the sites)
For companies that have multiple data center sites but only need to support the
active/passive DR pattern, a very popular variation can be used. This variation is called
reciprocal DR. As you can see in Figure 56.6, there are two sites (Site 1 and Site 2). Each is
active for some applications (Applications 1, 3, and 5 on Site 1 and Applications 2, 4, and
6 on Site 2). Site 1’s applications are passively supported on Site 2, and Site 2’s applications
are passively supported on Site 1. Rolling out the configuration this way eliminates the
“stateless” application issue completely and is fairly easy to implement. It is also possible
to make the passive applications’ data available via database snapshots at the other
reciprocal site (for free!), further distributing the workload geographically.
This configuration also spreads out the risk of losing all applications if one site ever
happens to be lost (as in a disaster). Again, the Microsoft options that help you achieve this
DR pattern variation are data replication to the DR site, log shipping, and even
asynchronous database mirroring, with database snapshots available to help with some
distributed reporting. As we noted previously, third-party products such as Symantec’s Veritas
Volume Replicator can be used to push physical byte-level changes to the passive (hot) DR
site at the physical tier level.
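The reciprocal arrangement can be pictured as a simple lookup table. The sketch below is illustrative only: the application-to-site assignments follow the description above, while the dictionary layout and the `active_site` function are assumptions made for the example and are not part of any Microsoft product.

```python
# Assignments taken from the reciprocal DR description; names are illustrative.
PRIMARY_SITE = {
    "App1": "Site 1", "App3": "Site 1", "App5": "Site 1",
    "App2": "Site 2", "App4": "Site 2", "App6": "Site 2",
}
# Each site is the passive DR site for the other.
RECIPROCAL = {"Site 1": "Site 2", "Site 2": "Site 1"}

def active_site(app, failed_sites=frozenset()):
    """Return the site that should currently serve the application."""
    primary = PRIMARY_SITE[app]
    if primary not in failed_sites:
        return primary
    standby = RECIPROCAL[primary]
    if standby in failed_sites:
        raise RuntimeError(f"both sites for {app} are down")
    return standby

print(active_site("App1"))                 # normal operation
print(active_site("App1", {"Site 1"}))     # Site 1 lost, served from Site 2
```

Each application is active at exactly one site at a time, which is why the “stateless” requirement of the active/active patterns does not arise here.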
FIGURE 56.5 Using active regional sites for passive DR (Phoenix, AZ; Alexandria, VA; Paris, France; Mumbai, India)
FIGURE 56.6 Reciprocal DR (Primary Site 1 is primary for Applications 1, 3, and 5 and passive for Applications 2, 4, and 6; Primary Site 2 is the reverse; snapshots at each site)
Recovery Objectives
You need to understand two main recovery objectives: the point in time to which data
must be restored to be able to successfully resume processing (called the recovery point
objective) and the acceptable amount of downtime that is tolerable (called the recovery time
objective). The recovery point objective (RPO) is often thought of as the time between the
last backup and the point when the outage occurred. It indicates the amount of data that
will be lost. The recovery time objective (RTO) is determined based on the acceptable
downtime in case of a disruption of operations. It indicates the latest point in time at
which the business operations must resume after a disaster (that is, how much time can
elapse).
The RPO and RTO form the basis on which a data protection strategy is developed. Together,
they provide a picture of the total time that a business may lose due to a disaster, and
both are very important requirements when designing a solution. Let’s
put these terms in the form of formulas:
RTO = Time operational (up) - Time disaster occurred (down)
(the difference between the time of the disaster and the time the system is operational)
RPO = Time disaster occurred (down) - Time of last usable data backup
(the time since the last backup of complete transactions, representing data that must
be re-acquired or re-entered)
Therefore:
Total lost business time = RTO + RPO = Time operational (up) - Time of
the last usable data backup
Knowing your RPO and RTO requirements is essential in determining what DR pattern to
use and what Microsoft options to utilize.
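As a worked example of these two objectives, the sketch below measures the data-loss window and the downtime for one hypothetical outage, treating each as a difference between timestamps. The function name and the timestamps are invented for illustration; the results are measured values that you would compare against the RPO and RTO your business has set.

```python
from datetime import datetime

def recovery_metrics(last_backup, disaster, operational):
    """Return the data-loss window, the downtime, and their sum for one outage."""
    if not (last_backup <= disaster <= operational):
        raise ValueError("expected last_backup <= disaster <= operational")
    rpo = disaster - last_backup      # data that must be re-acquired or re-entered
    rto = operational - disaster      # time the system was down
    return {"rpo": rpo, "rto": rto, "total_lost": rpo + rto}

# Hypothetical outage: last usable backup at 11:20, disaster at 11:50,
# system operational again at 14:50 on the same day.
metrics = recovery_metrics(
    last_backup=datetime(2009, 5, 14, 11, 20),
    disaster=datetime(2009, 5, 14, 11, 50),
    operational=datetime(2009, 5, 14, 14, 50),
)
for name, value in metrics.items():
    print(name, value)
```

In this example 30 minutes of data is lost and the system is down for 3 hours, so the business loses 3.5 hours in total.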
A Data-Centric Approach to Disaster Recovery
Disaster recovery is a complex undertaking unto itself. However, it isn’t really necessary to
recover every system or application in the event of a disaster. Priorities must be set on
determining exactly which systems or applications must be recovered. These are typically
the revenue-generating applications (such as order entry, order fulfillment, and invoicing)
that your business relies on to do basic business with its customers. Therefore, you set the
highest priorities for DR with those revenue-generating systems. Then the next level of
recovery is for the second-priority applications (such as HR systems).
After you prioritize which applications should be part of your DR plans, you need to fully
understand what must be included in recovery to ensure that these priority applications
are fully functional. The best way is to take a data-centric approach, which focuses on
what data is needed to bring up the application. Data comes in many flavors, as Figure
56.7 shows:
- Metadata: ... applications, middleware, or back end needs
- Configuration data: ... do, or the middleware needs to execute with, and so on
- Application data: ... the transactional data in your systems
FIGURE 56.7 Types of data and where the data resides (types of data: metadata, configuration data, and application data values; location tiers: applications such as ERP, HR, and SFA; middleware such as EAI, ETL, and WS; back end such as SQL Server, files, and other; systems such as HW, OS, and network; components are tightly or loosely coupled)
As just mentioned, you first identify which applications you must include in your DR
plans, and then you must make sure you back up and are able to recover that application’s
data (metadata, configuration data, and application data). As part of this exercise, you
must determine how tightly or loosely coupled the data is to other applications. In other
words (as you can also see in Figure 56.7), if on the back-end tier, Database A has the
orders transactions and Database B has the invoicing data, both must be included in the
DR plans (because they are tightly coupled). In addition, you must also know how tightly
or loosely coupled the application stack components are with each layer. In other words
(again looking at Figure 56.7), if the ERP application (in the application tier) requires some
type of middleware to be present to handle all its messaging, that middleware tier
component is tightly coupled with the ERP application, and so on.
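One way to picture this exercise is as a walk over a graph of tight couplings: start from the priority applications and keep following tightly coupled links until nothing new turns up. The sketch below is only an illustration. The component names, the link from the ERP application to Database A, and the `dr_scope` function are assumptions made for the example; loosely coupled links are simply left out of the table.

```python
from collections import deque

# Hypothetical inventory: each component lists what it is tightly coupled to.
TIGHTLY_COUPLED = {
    "ERP application": ["Messaging middleware", "Database A"],
    "Messaging middleware": [],
    "Database A": ["Database B"],      # orders and invoicing belong together
    "Database B": [],
    "HR application": ["HR database"],
    "HR database": [],
}

def dr_scope(priority_apps, coupling):
    """Return every component reachable from the priority apps via tight coupling."""
    scope = set()
    queue = deque(priority_apps)
    while queue:
        component = queue.popleft()
        if component in scope:
            continue                   # already visited
        scope.add(component)
        queue.extend(coupling.get(component, []))
    return scope

scope = dr_scope(["ERP application"], TIGHTLY_COUPLED)
print(sorted(scope))
```

Here the HR application and its database stay out of scope because nothing in the ERP stack is tightly coupled to them; they would be picked up in a later, lower-priority pass.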
Microsoft SQL Server Options for Disaster Recovery
You have seen the fundamental DR patterns you will be targeting and also recognize how
to identify the highest priority applications and their tightly coupled components for DR.
Now let’s look again at the specific Microsoft options available to implement various DR
solutions. These options include data replication, log shipping, database mirroring, and
database snapshots.
Data Replication
One of the strongest and most stable Microsoft options that can be leveraged for disaster
recovery is data replication. Not all variations of data replication fit this bill, though. In
particular, the central publisher configuration using either continuous or very frequently scheduled
distribution is very good for creating a hot spare of a SQL Server database across almost
any geographical distance, as shown in Figure 56.8. The primary site is the only one
actively processing transactions (updates, inserts, deletes) in this configuration, with all
transactions being replicated to the subscriber, usually in a continuous replication mode.
FIGURE 56.8 Central publisher data replication configuration for active/passive DR (SQL Server 2008 publisher at the active primary site, remote distribution server, and subscriber at the passive DR site used as an active read-only “hot spare,” with continuous transactional distribution of the Adventure Works DB)
The subscriber at the DR site is as up-to-date as the last distributed (replicated) transaction
from the publisher, which is usually near real time. The subscriber can be used for read-only
processing, provided that the access is controlled properly and does not hinder the
replication processing and put your DR pattern at risk.
The newer peer-to-peer replication option provides a viable active/active capability that
keeps both primaries in sync as transactions flow into each server’s database, as shown in
Figure 56.9. Both sites contain a full copy of the database, with transactions being
consumed and then replicated simultaneously between them.
The complete setup of these data replication configurations is covered in Chapter 19,
“Replication.”
Log Shipping
As you can see in Figure 56.10, log shipping is readily usable for the active/passive DR
pattern. You must understand that log shipping is only as good as the last successful
transaction log shipment. The frequency of these log shipments is critical to the RTO and RPO aspects of
DR. This is not a real-time solution. Even if you are using continuous log shipping
mode, there is a lag of some duration due to the file movement and log application on the
destination.
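A rough feel for that lag comes from adding up the steps each log has to go through. The sketch below uses a deliberately simplified model that is an assumption of this example, not a description of how SQL Server schedules its jobs: a log is backed up every so often, then copied, then restored, so the secondary can trail the primary by up to the sum of the three durations. The function name and the figures are invented.

```python
from datetime import timedelta

def worst_case_lag(backup_interval, copy_time, restore_time):
    """Upper bound on how far the secondary trails the primary (simplified model)."""
    return backup_interval + copy_time + restore_time

# Hypothetical durations for the copy and restore steps.
copy_time = timedelta(minutes=2)
restore_time = timedelta(minutes=3)

# Compare two log backup frequencies.
lag_15 = worst_case_lag(timedelta(minutes=15), copy_time, restore_time)
lag_5 = worst_case_lag(timedelta(minutes=5), copy_time, restore_time)
print(lag_15, lag_5)
```

Shortening the backup interval from 15 to 5 minutes cuts the bound from 20 to 10 minutes in this model, but the copy and restore steps keep it from ever reaching zero.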
FIGURE 56.9 Peer-to-peer data replication configuration for active/active DR (North American active site and Asia active site, each a SQL Server 2008 publication and distribution server for the Adventure Works DB)
FIGURE 56.10 Log shipping configuration for active/passive DR (primary server at the active primary site, secondary server at the passive DR site, and a monitor server; transaction log backups of the CallOne DB are copied and restored, for example from \Backup\CallOne_tlog_200905141120.TRN to \LogShare\CallOne_tlog_200905141120.TRN)
FIGURE 56.11 Database mirroring and database snapshots for active/passive DR (principal server at the active primary site, mirror server at the passive DR site with a database snapshot for reporting users, and a witness server)
Remember, log shipping is destined to be deprecated by Microsoft (unofficially
announced). So it is perhaps not a good idea to plan a future DR
implementation around a feature that will go away.
Database Mirroring and Snapshots
Database mirroring is rapidly becoming the new, viable DR option from Microsoft. In
either a high-availability mode (synchronous) or performance mode (asynchronous), this
capability can help minimize data loss and time to recover (RPO and RTO). As you can see
in Figure 56.11, database mirroring can be used across any reasonable network connection
that may exist from one site to another. It effectively creates a mirror image that is
completely intact for failover purposes if a site is lost. It is viable in both an active/passive
pattern and in an active/active pattern (where a database snapshot is created from the
unavailable mirror database and is used for active reporting).
NOTE
It is likely Microsoft will rapidly enhance database mirroring to support all DR
patterns over time.
Setup and configuration of database mirroring are covered in Chapter 20, “Database
Mirroring,” along with full details of database snapshots in Chapter 32, “Database
Snapshots.”
Now, to complete the DR planning for your SQL Server platform, you must do much more
homework and preparation. The next section explains a great overall disaster approach
that includes pulling all the right information available and executing on a DR plan (and
testing it thoroughly).
The Overall Disaster Recovery Process
In general, a handful of things need to be put together (that is, defined and executed
upon) as the basis for an overall disaster recovery process or plan. The following list
clearly identifies where you need to start:
1. Create a disaster recovery execution tasks/run book. This should include all steps to
take to recover from a disaster and cover all system components that need to be
recovered.
2. Arrange for or procure a server/site to recover to. This should be a configuration that
can house what is needed to get you back online.
3. Guarantee that a complete database backup/recovery mechanism is in place
(including offsite/alternate site archive and retrieval of databases).
4. Guarantee that an application backup/recovery mechanism is in place (for example,
COM+ applications, .NET applications, web services, other application components,
and so on).
5. Make sure you can completely re-create and resynchronize your security (Microsoft
Active Directory, domain accounts, SQL Server logins/passwords, and so on). We call
this “security resynchronization readiness.”
6. Make sure you can completely configure and open up network/communication
lines. This also includes ensuring that routers are configured properly, IP addresses
are made available, and so on.
7. Train your support personnel on all elements of recovery. You can never know
enough ways to recover a system. And it seems that a system never recovers the
same way twice.
8. Plan and execute an annual or bi-annual disaster recovery simulation. The one or
two days that you do this will pay you back a hundred times over if a disaster
actually occurs. And, remember, disasters come in many flavors.
Many organizations have adopted the concept of having hot alternate sites available via
stretch clustering or log shipping techniques. Costs can be high for some of these
advanced and highly redundant solutions.
The Focus of Disaster Recovery
If you create some very solid, time-tested mechanisms for re-creating your SQL Server
environment, they will serve you well when you need them most. Following are the things to
focus on for disaster recovery:
- Always generate scripts for as much of your work as possible (anything created using
  a wizard, SSMS, and so on). These scripts will save your hide. They should include
  the following:
  - Complete replication buildup/breakdown scripts
  - Complete database creation scripts (DB, tables, indexes, views, and so on)
  - Complete SQL login, database user ID, and password scripts (including roles
    and other grants)
  - Linked/remote server setup (linked servers, remote logins)
  - Log shipping setup (source, target, and monitor servers)
  - Any custom SQL Agent tasks
  - Backup/restore scripts
  - Potentially other scripts, depending on what you have built on SQL Server
- Make sure you document all aspects of SQL database maintenance plans being used.
  This includes frequencies, alerts, email addresses being notified when errors occur,
  backup file/device locations, and so on.
- Document all hardware/software configurations used:
  - Leverage sqldiag.exe for this (as described in the next section).
  - Record what accounts were used to start up the SQL Agent service for an
    instance and the MS Distributed Transaction Coordinator (MS DTC) service. This
    step is especially important if you’re using distributed transactions and data
    replication.
  - The favorite SQL Server implementation characteristics that we script and
    record for a SQL Server instance are
Server and instance
Microsoft SQL Server is running
the current installation of Microsoft SQL Server
name; the server’s replication status; and the server’s identification
number, collation name, and time-out values for connecting to, or
queries against, linked servers
associated users in each database