Nielsen c35.tex V4 - 07/21/2009 2:10pm Page 812
Service Broker can handle complex message groups, such as multiple line items of an order that may not
appear consecutively in the queue because other messages are received simultaneously. The
conversation group can be used to select the related messages.
Monitoring Service Broker
While Management Studio has no visibility into the activity of a queue, nor summary page reports for
the queue object, you can select directly from the queue or select a COUNT(*) from it. In addition, there are
dynamic management views that shed light on the queue:
* sys.dm_broker_activated_tasks
* sys.dm_broker_connections
* sys.dm_broker_forwarded_messages
* sys.dm_broker_queue_monitors
SQL Trace/Profiler has a Broker event class with 10 Service Broker–related events that can be traced.
Summary
Service Broker is one of those technologies that provides no benefit “out of the box.” Unless you make
the effort to architect the database using Service Broker, it offers no advantage. However, if you do
take the time to design the database using Service Broker, you’ll see significant scalability benefits, as
Service Broker queues buffer the workload.
The next chapter continues the progression through SQL Server technologies and discusses ADO.NET
2.0 and its powerful methods for connectivity.
Replicating Data
IN THIS CHAPTER
Replication concepts
Configuring replication
Replication is an optional native SQL Server 2008 component that is used
to copy data and other database objects from one database or server to
another.
Replication is used for many purposes, listed here in order from most popular to
rarely used:
■ Offloading reporting from an OLTP server to a reporting server
■ Data consolidation — for example, consolidating branch office data to a
central server
■ Data distribution — for example, distributing data from a central server
to a set of member servers to improve read performance
■ Disaster recovery — replication can be used to keep a DR (disaster
recovery) server synchronized with the main server, and clients can be
manually redirected to the DR with minimal interruption
■ Synchronizing data with a central server and a mobile sales force
■ Synchronizing data with handheld devices (such as PDAs and
smartphones)
Replication processes can be made highly scalable, and typically can
synchronize data between servers/databases with acceptable latency. Latency is
the lag between when data is sent (replicated) from the source server and when it is
received at the destination server.
Replication is not the only way to move data between servers. There are several
alternatives, each with its own pros and cons:
■ bcp utility
■ SSIS
■ Distributed transactions
■ Triggers
■ Copy Database Wizard
■ Backup and restore
■ Log shipping and database mirroring
Bulk copy program (bcp) is a command-line tool that can be used to send tabular data to the file
system, and from there to a remote server. While it can be scripted, it is slower than replication processes,
requires significant work to set up, and the DBA/developer needs to ensure that all objects are in place
on the destination server. For example, all tables, views, stored procedures, and functions need to be on
the destination server. There is no provision for change tracking. In other words, bcp can’t tell what has
changed in the data and send only the changes to the destination server. Sending only the changes requires change
tracking: a way to determine what has been inserted, updated, or deleted on the source server. This may
involve using the Change Data Capture or Change Tracking features in SQL Server 2008.
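The idea of change tracking can be sketched in a few lines. The following Python example is only an illustration under stated assumptions, not how bcp, Change Data Capture, or Change Tracking work internally: it assumes two keyed copies of a table are available (the hypothetical last_export and current dictionaries, keyed by primary key) and classifies the differences as inserts, updates, and deletes.

```python
from typing import Dict, Tuple

Row = Tuple            # a row is a tuple of column values
Table = Dict[int, Row]  # primary key -> row


def detect_changes(old: Table, new: Table):
    """Compare two keyed copies of a table and classify the differences."""
    inserted = {pk: row for pk, row in new.items() if pk not in old}
    deleted = {pk: row for pk, row in old.items() if pk not in new}
    updated = {
        pk: row
        for pk, row in new.items()
        if pk in old and old[pk] != row
    }
    return inserted, updated, deleted


# Hypothetical data: the last exported copy versus the current source table.
last_export = {1: ("Ann", "CA"), 2: ("Bob", "NY"), 3: ("Cy", "TX")}
current = {1: ("Ann", "CA"), 2: ("Bob", "WA"), 4: ("Di", "OR")}

inserted, updated, deleted = detect_changes(last_export, current)
print("inserted:", inserted)
print("updated: ", updated)
print("deleted: ", deleted)
```

Only the three classified sets would need to travel to the destination server; a plain bulk export would resend all rows every time.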
SSIS can be thought of as a programmatic interface to a high-performance bcp utility. It can be faster
than bcp. As with bcp, it requires that the DBA/developer place all objects on the destination server, and
there is no provision for change tracking.
Distributed transactions normally involve using MS DTC (Microsoft Distributed Transaction
Coordinator). With a distributed transaction, the transaction is committed on the source server, then on the
destination server, and then the application can do the next unit of work. (This is sometimes called
a split write.) The application has to be configured to use distributed transactions, and the network
connection must be stable and have ample bandwidth; otherwise, the transactions will fail. With
distributed transactions, only changes are “replicated.” The DBA/developer needs to place all tables
(along with the initial data), stored procedures, views, and functions on the destination server.
Triggers are very similar to distributed transactions. With distributed transactions, all application code
(for example, stored procedures, and sometimes ADO.NET code) must be rewritten to use distributed
transactions. With triggers, the “replication” logic is incorporated in the trigger. As with
distributed transactions, only changes are “replicated.” The DBA/developer needs to place all tables (along
with the initial data), stored procedures, views, and functions on the destination server. There is also
overhead with using triggers, especially over a network.
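To make the trigger approach concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server. The table names SalesStaff and SalesStaffCopy and the trigger names are hypothetical, and the copy table lives in the same in-memory database, standing in for a table on a destination server. Note that the copy table has to be created up front, just as the text describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SalesStaff (StaffID INTEGER PRIMARY KEY, Name TEXT, State TEXT);
    -- Stand-in for the destination table; it must exist before any change arrives.
    CREATE TABLE SalesStaffCopy (StaffID INTEGER PRIMARY KEY, Name TEXT, State TEXT);

    CREATE TRIGGER trg_staff_insert AFTER INSERT ON SalesStaff
    BEGIN
        INSERT INTO SalesStaffCopy VALUES (NEW.StaffID, NEW.Name, NEW.State);
    END;

    CREATE TRIGGER trg_staff_update AFTER UPDATE ON SalesStaff
    BEGIN
        UPDATE SalesStaffCopy
           SET Name = NEW.Name, State = NEW.State
         WHERE StaffID = OLD.StaffID;
    END;

    CREATE TRIGGER trg_staff_delete AFTER DELETE ON SalesStaff
    BEGIN
        DELETE FROM SalesStaffCopy WHERE StaffID = OLD.StaffID;
    END;
    """
)

# Changes made to the source table are mirrored by the triggers.
conn.execute("INSERT INTO SalesStaff VALUES (1, 'Ann', 'CA')")
conn.execute("INSERT INTO SalesStaff VALUES (2, 'Bob', 'NY')")
conn.execute("UPDATE SalesStaff SET State = 'WA' WHERE StaffID = 1")
conn.execute("DELETE FROM SalesStaff WHERE StaffID = 2")

source_rows = conn.execute("SELECT * FROM SalesStaff ORDER BY StaffID").fetchall()
copy_rows = conn.execute("SELECT * FROM SalesStaffCopy ORDER BY StaffID").fetchall()
print(source_rows)
print(copy_rows)
```

Every statement against the source table now does extra work inside the trigger, which is the overhead mentioned above; against a remote server that cost would include a network round trip.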
The Copy Database Wizard will move or copy a database from one server to another. It is intended for
a single-use move or copy. In the move mode, you can only move the database one time. In the copy
mode, the database can be copied multiple times if you specify the options to delete the database and
the database files that might exist on the destination server.
Backup and restore will copy the entire database to the destination server. The level of granularity
possible for the preceding options is tables (bcp, SSIS) and transactions (triggers, distributed transactions).
Backup and restore, log shipping, database mirroring, and the Copy Database Wizard “replicate” entire
databases. As the name suggests, backup and restore involves backing up the database on the source
server and restoring it on the destination server. This option is not scalable for large databases, and the
database must go offline while it is being restored. It is not a good option in environments
with real-time data requirements, as the data becomes progressively out of date until the latest backup is
restored on the destination server.
Log shipping is continuous backup and restore. The log is backed up on the source server and applied
to a previously restored database backup on the destination server. Log shipping is not considered to
be scalable, especially for large databases or large numbers of databases. The database on the
destination server is not accessible with log shipping. There are options to make it accessible, but it will be in
read-only mode, and users need to be kicked off when the next log is ready to be applied.
Database mirroring is continuous log shipping. Changes to the database transaction log are continually
shipped from the source server to the destination server. The database on the destination server will be
inaccessible while being mirrored. There are two modes of database mirroring: high performance and
high safety. With high safety, application writes on the source server are not committed on the source
server until they are also committed on the destination server. This can cause increased latency for all
writes on the source server, which may make database mirroring not a good fit for your particular
requirements. High-performance mode does not have this problem, as changes occurring on the source
server are applied to the destination server asynchronously. However, the high-performance option is
only available on the Enterprise Edition of SQL Server 2005 and SQL Server 2008.
What’s New with Replication?
There are several new features in SQL Server 2008 replication:
■ Integration with database mirroring. If you have a remote distributor, you can
configure your publisher on your principal to fail over to your mirror without having to
reinitialize your subscribers.
■ Much faster snapshot delivery on Windows 2008 servers
■ A new wizard for deploying nodes in a peer-to-peer topology. For more information,
consult http://msdn.microsoft.com/en-us/library/dd263442.aspx
■ Conflict detection in peer-to-peer replication
■ Ability to make schema changes in peer-to-peer replication without having to stop all
users from using the topology while changes are deployed
Replication Concepts
SQL Server replication operates according to a publishing metaphor. There can be three types of servers
in a replication topology:
■ Publisher: The source server
■ Distributor: For transactional replication and peer-to-peer replication, the distributor is where
the changes are stored until they are replicated to the destination server. For merge replication,
the distributor is merely a repository for replication process history. Changes and historical
information are stored in a database called the distribution database.
■ Subscriber: The destination server
Types of replication
Based on the publishing metaphor, SQL Server 2008 offers five basic types of replication, each serving a
different purpose:
■ Snapshot replication: A point-in-time image of database objects (a snapshot) is copied from
the source server to the destination server. This image generation and deployment can
be scheduled at whatever interval makes sense for your requirements; however, it is best used
when the majority of your data seldom changes, and when it does, it changes at the same time.
■ Transactional replication: Transactions occurring on the source server are asynchronously
captured and stored in a repository (called a distribution database) and then applied, again
asynchronously, on the destination server.
■ Oracle publishing: This is a variant of transactional replication. Instead of SQL Server being
the source server, an Oracle server is the source server, and changes are replicated from the
Oracle server to SQL Server. This SQL Server can be the final destination for the Oracle server’s
data, or it can act as a gateway, and changes can be replicated downstream to other SQL Servers
or other RDBMSs. Oracle publishing is only available on SQL Server Enterprise Edition and above.
■ Peer-to-peer replication: Another variant of transactional replication that is used to replicate
data to one or more nodes. Each node can publish data to member nodes in a peer-to-peer
replication topology. Should one node go offline, changes occurring on the offline node and the
other member servers will be synchronized when that node comes back online. Changes are
replicated bi-directionally, so a change occurring on Node A will be replicated to Node B, and
changes occurring on Node B will be replicated to Node A. Peer-to-peer replication
is an Enterprise Edition–only feature that is scalable to approximately 10 nodes, but your results
may vary depending on your replicated workload, your hardware, and your available bandwidth.
■ Merge replication: As the name indicates, merge replication is used to merge changes
occurring on the destination server with changes occurring on the source server, and vice versa. It
is highly scalable to hundreds if not thousands of destination servers. With merge replication,
there is a central clearinghouse for changes that determines which changes go where. With
peer-to-peer replication, any member node in the topology can assume the clearinghouse role.
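The store-and-forward behavior of transactional replication described in the list above can be pictured with a small Python sketch. This is a conceptual model only: the DistributionStore class and its capture and deliver methods are hypothetical names invented for the illustration, and plain dictionaries stand in for the source and destination databases.

```python
from collections import deque


class DistributionStore:
    """Stand-in for the distribution database: holds commands until delivered."""

    def __init__(self):
        self.pending = deque()

    def capture(self, key, value):
        # Called after a transaction commits on the source server.
        self.pending.append((key, value))

    def deliver(self, destination):
        # Apply every pending command to the destination, oldest first.
        delivered = 0
        while self.pending:
            key, value = self.pending.popleft()
            destination[key] = value
            delivered += 1
        return delivered


source = {}
destination = {}
store = DistributionStore()

# Two transactions commit on the source; the destination has not seen them yet.
for key, value in [(1, "order placed"), (1, "order shipped")]:
    source[key] = value
    store.capture(key, value)

destination_before = dict(destination)  # still empty: delivery is asynchronous
waiting = len(store.pending)            # commands held in the repository
delivered = store.deliver(destination)
print(destination_before, waiting, delivered, destination == source)
```

The gap between capture and deliver is the latency discussed earlier in the chapter; the source never waits for the destination.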
Replication agents
As you might imagine, a lot of work is required to move data between the various servers in the
publishing metaphor. To do so, SQL Server replication makes use of three agents:
■ Snapshot agent: Generates the tabular and schema data or schemas for the objects you wish
to replicate. The tabular and schema data, and related replication metadata, are frequently
referred to as the snapshot. The snapshot agent is used by all replication types. The snapshot
agent writes the tabular/schema data to the file system.
■ Distribution agent: Used by snapshot replication to apply the snapshot on the subscribers,
and used by transactional replication to apply the snapshot on the subscriber and to replicate
subsequent changes occurring on the publisher to the subscriber.
■ Merge agent: Detects changes that have occurred on the publisher and the subscriber
since the last time the agent ran and merges them together to form a consistent set on
both the publisher and the subscriber. In some cases, the same primary key value will be
assigned on the publisher and one or more subscribers between runs of the merge agent
(called a sync). When the merge agent runs, it detects this conflict and logs it to conflict
tables that can be viewed using the conflict viewer. With merge replication, the data that
is in conflict will persist on the publisher and the subscriber by default. For example, if
a primary key value of 1,000 for a table is assigned on the publisher, and then the same
value is assigned on the same table on the subscriber, when the merge agent runs it will log
the conflict, but keep the publisher’s values for the row with a PK (primary key) of 1,000
on the publisher, and keep the subscriber’s values for the row with a PK of 1,000 on the
subscriber.
Merge replication has a rich set of features to handle conflicts, including one that does not
treat changes to different columns of the same row on the publisher and subscriber as a conflict.
This is termed column-level conflict tracking. For example, a change to John Smith’s home
phone number occurring on the publisher and a change to his cell phone number occurring on the
subscriber would be merged so that both changes persist on both the publisher and the
subscriber. By default, merge replication uses row-level conflict tracking, which might result in the
change to John Smith’s home phone number being applied on both the publisher and the
subscriber, but his cell phone change being rolled back, with this conflict and the conflicting
values being logged to the conflict tables.
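The difference between row-level and column-level conflict tracking can be sketched as follows. This Python example is a simplified model, not the merge agent's actual algorithm: the merge_row function is hypothetical, rows are plain dictionaries, and it assumes the publisher wins every conflict.

```python
def merge_row(base, publisher, subscriber, column_level):
    """Merge one row that may have changed on both sides since the last sync.

    base is the row as of the last sync. Returns (merged_row, conflicts),
    where conflicts lists the subscriber values that lost.
    Assumption for this sketch: the publisher wins any conflict.
    """
    pub_changed = {col for col in base if publisher[col] != base[col]}
    sub_changed = {col for col in base if subscriber[col] != base[col]}
    conflicts = []

    if column_level:
        merged = dict(base)
        # Start with the subscriber's changes, then lay the publisher's on top.
        for col in sub_changed:
            merged[col] = subscriber[col]
        for col in sorted(pub_changed):
            if col in sub_changed and publisher[col] != subscriber[col]:
                conflicts.append((col, subscriber[col]))
            merged[col] = publisher[col]
        return merged, conflicts

    # Row-level tracking: a change on both sides is a conflict for the whole row.
    if pub_changed and sub_changed:
        conflicts = [(col, subscriber[col]) for col in sorted(sub_changed)]
        return dict(publisher), conflicts
    return dict(subscriber if sub_changed else publisher), conflicts


# The John Smith example: home phone changes on the publisher,
# cell phone changes on the subscriber (phone numbers are made up).
base = {"name": "John Smith", "home": "555-0100", "cell": "555-0200"}
publisher = {**base, "home": "555-0111"}
subscriber = {**base, "cell": "555-0222"}

column_result = merge_row(base, publisher, subscriber, column_level=True)
row_result = merge_row(base, publisher, subscriber, column_level=False)
print(column_result)
print(row_result)
```

With column-level tracking both phone changes survive and nothing is logged; with row-level tracking the publisher's row wins and the subscriber's cell phone value ends up in the conflict list.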
Best Practice
A single server can serve as both the publisher and the distributor and even as the subscriber. An excellent
configuration for experimenting with replication is a server with multiple SQL Server instances. However,
when performance is an issue, a dedicated distributor server is the best plan. This remote distributor can act
as a distributor for multiple publishers; in fact, you can configure this remote distributor to have a separate
distribution database for each publisher.
The publisher server organizes multiple articles (an article is a data source: a single table, view, function,
or stored procedure) into a publication. You may find that you get better performance by grouping large
articles (tables) into their own publication. The distributor server manages the replication process. The
publisher can initiate the subscription and push data to the subscriber server, or the subscriber can set
up the subscription and pull the subscription from the publisher.
Transactional consistency
The measure of transactional consistency is the degree of synchronization between two replicated
servers. As the lag time between synchronizations increases, transactional consistency decreases. If
the data is identical on both servers most of the time, transactional consistency is said to be high.
Conversely, a replication system that passes changes every two weeks by e-mail has low transactional
consistency.
Configuring Replication
Using wizards is the simplest way to implement replication. Developers and DBAs generally avoid
wizards because they have limited features, but implementing replication without wizards requires
numerous calls to arcane stored procedures and is a tedious and painful process prone to user errors.
Before configuring replication, it is important to understand the limitations of the various SQL Server
editions. For example, SQL Server Express can act only as a subscriber, and the number of subscribers
each edition can have is limited: the Standard Edition can have only five subscribers, and the
Web Edition 25 subscribers. Merge replication can only replicate to subscribers of the same version or
lower. For example, you can’t have a SQL Server 2005 publisher merge replicating to SQL Server 2008
subscribers; however, a SQL Server 2008 publisher can replicate to a SQL Server 2005 subscriber. Merge
replication is the only replication type that can replicate to SQL Server CE subscribers.
Creating a publisher and distributor
To enable a server as a publisher, you must first configure a distributor for it. While you can configure
the publisher with a local or remote distributor, it is recommended that you configure the
distributor first, before creating your first publication. This way, if there is a problem, it will be easier to
troubleshoot.
The following steps walk you through the process of creating your first distributor:
1. Connect to the server that will be acting as your publisher/distributor or remote distributor
using SQL Server Management Studio. You need to use the SQL Server 2008 version of SQL Server
Management Studio for this.
2. Once you have connected, right-click on the Replication folder and select the menu option
Configure Distribution.
If you do not see the Configure Distribution option, either your SQL Server edition is SQL Server Express or you do not have the replication components installed. To install the replication components, you need to run Setup again.
3. After clicking through the initial splash screen, you will have the option to select which
server you should use as your distributor: either the local server or a remote server. If you are
using a remote server, you need to ensure that the remote server is already configured as a
distributor. Because this is a local distributor, select the default option and click Next.
4. You will be prompted for a folder to serve as the default location where the snapshot agent
deposits the snapshot. Select a different location if the default folder does not have adequate
space for your snapshots, or if you want to minimize I/O contention. Snapshot generation is an
I/O-intensive process. You do have the option to select a snapshot folder or share for each
publication when you create it, so the snapshot folder location is not of critical importance.
5. Once you have selected the location for your snapshot folder or snapshot share, click Next.
The Distribution Database dialog enables you to name your distribution database and select
folders where the database data and log files will reside.
Best Practice
If you have a large number of subscribers, or you are replicating over a WAN, you should use a share
for your snapshot folder, or use FTP along with pull subscribers (you will configure FTP server details
when you create your publication). With pull subscribers, the merge or distribution agent runs on
the subscriber. With push subscribers, the distribution and merge agents run on the publisher/distributor or
distributor. If you are using push subscribers with a remote distributor, your snapshot folder must also be
configured as a snapshot share. Using the Admin shares (for example, C$) is not a good security practice;
instead, use a share name that hides the path of the actual physical snapshot folder location and does not
require the distribution or merge agents to run under an account that has administrative rights.
Best Practice
The optimal configuration for a distributor or a distribution server is a 64-bit server with ample RAM and
RAID 10 drives. The distributor server will be I/O and network bound, so the more RAM available for
caching and the greater the available network bandwidth, the greater the throughput of your transactional
replication solution. Merge replication is CPU and network bound, so these best practices do not apply to it.
6. Click Next to enable the publishers that you wish to use this distributor. If this is a local
publisher/distributor, your publisher will already be selected. If not, you need to click the Add
button if you want to enable other publishers to use this distributor.
7. Click Next to assign a distributor password. This allows remote publishers to use this
distributor as their distributor.
8. Click Next, Next again, and Finish to complete the creation of your distributor.
To configure a publisher to use a remote distributor, follow these steps:
1. Connect to the publisher using SQL Server Management Studio, right-click on the Replication
folder, and select Configure Distribution.
2. When you get to the option to select which server you wish to use as your distributor, select
the “Use the following server as the Distributor” option.
3. Click the Add button and enter the connection information to connect to the remote
distributor. You will be prompted for the password you configured to access the remote distributor.
4. Click Next, Next again, and Finish.
Your remote distributor is now ready to use.
Creating a snapshot/transactional publication
Once a distributor is set up for your server, you can create your publications. A publication is defined
as a collection of articles, where an article is an item to be published. An article in SQL Server can be a
table, a view, an indexed view, a user-defined function, or even a stored procedure or its execution. If
you choose to replicate the execution of a stored procedure, the stored procedure call will be executed
on the subscriber.
For example, if you execute a stored procedure that updates 10,000 rows in a replicated table, and the
execution of the stored procedure is replicated, only the stored procedure call is sent to the subscriber
and executed there. If the execution of the stored procedure were not replicated, 10,000 update statements
would have to be replicated from the publisher, through the distributor, to the subscriber. As you can imagine,
there are considerable benefits to replicating the execution instead.
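The saving can be made tangible with a toy model. In the Python sketch below, the hypothetical adjust_stock function stands in for a stored procedure that touches every row, and publish returns the list of commands that would have to travel through the distributor; none of these names come from SQL Server.

```python
def adjust_stock(table, delta):
    """Stand-in for a stored procedure that updates every row of a table."""
    for key in table:
        table[key] += delta


def publish(table, delta, replicate_execution):
    """Run the procedure on the publisher; return the commands to distribute."""
    adjust_stock(table, delta)
    if replicate_execution:
        # One command: the procedure call itself.
        return [("exec", delta)]
    # Otherwise one command per row (every row changed in this example).
    return [("update", key, value) for key, value in table.items()]


def apply_commands(table, commands):
    """Apply distributed commands on the subscriber."""
    for command in commands:
        if command[0] == "exec":
            adjust_stock(table, command[1])
        else:
            _, key, value = command
            table[key] = value


def run(replicate_execution, rows=10_000):
    publisher = {key: 100 for key in range(rows)}
    subscriber = dict(publisher)  # the subscriber starts in sync
    commands = publish(publisher, 5, replicate_execution)
    apply_commands(subscriber, commands)
    return len(commands), subscriber == publisher


print(run(replicate_execution=True))   # (command count, in sync?)
print(run(replicate_execution=False))
```

Both runs leave the subscriber identical to the publisher; the difference is how many commands had to be stored and delivered to get there.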
Typically tables are published, but views can also be published. You just need to ensure that the base
tables referenced by the views are also published.
To create a publication, execute the following steps:
1. Connect to your publisher using SQL Server Management Studio, and expand the Replication
folder, then right-click on the Local Publications folder and select New Publication.
2. After clicking through the initial splash screen, select the database you wish to replicate from
the Publication Databases section.
3. Click Next. In the Replication Types dialog that appears, select the replication type you wish
to use. You will then get a dialog entitled Articles, from which you can select the type of objects
you wish to replicate.
4. Expand each object type tree and select the articles you wish to replicate. For example, if
you wish to replicate tables, expand the table tree and select the individual tables you wish
to replicate. You can elect to replicate all tables by selecting the check box next to the table tree.
You also have the option to replicate only a subset of the columns in tables you are replicating.
If you see a table with what appears to be a red circle with a slash through it next to the table, this table does not have a primary key and you will be unable to replicate
it in a transactional publication. Snapshot and merge replication allow you to replicate tables without
primary keys.
If you highlight a table and then click the article properties drop-down box, you can configure options regarding how the table will be replicated to the subscriber. For example, you can replicate user triggers, include foreign key dependencies, and determine what will happen if a table with the same name already exists on the subscriber. The options are as follows:
■ Drop the subscriber table
■ Do nothing
■ Keep the table, but delete all of its data
■ Keep the table, but delete all of the data that meets your filtering criteria (covered in the next step)
5. Once you have selected the objects you wish to replicate, click Next. The Filter Table Rows
dialog will appear. From here, you can configure filtering criteria that send only a subset of the
rows to the subscriber. For example, if you were replicating a table with a state column,
you may decide that the subscriber should have only rows from California. To enable that, you
would click the Add button, select the table in the drop-down box in the “Select table to filter”
option, click the State column in the “Complete the filter statement” section, and then add the
state value. In code, it might look like this:
SELECT <published_columns> FROM [dbo].[SalesStaff] WHERE [State] = 'CA'
This would ensure that the subscriber only receives data and changes from sales staff when the
value of State is CA.
6. Once you have enabled your filters, click Next. The next dialog is Snapshot Agent, which
controls two snapshot options:
■ Create a snapshot immediately and keep the snapshot available to initialize subscriptions
■ Schedule the Snapshot Agent to run at the following times
The first option generates the snapshot immediately, and every replicated change that occurs
in the publication is not only replicated to the subscriber, but also added to the snapshot files.
This is a great option when you have to deploy a lot of snapshots frequently, but it does add
a constant load to your publisher. The second option, scheduling the snapshot agent, generates
a snapshot on a schedule, so the snapshot files are updated each time you run the snapshot
agent. Changes not in the snapshot have to be stored in the distribution database, which may
mean extra storage requirements on the distributor. For most DBAs/developers, it is not a good
practice to enable these options.
7. Click Next to configure Agent Security. This option allows you to select the security context
you wish your replication agents to run under. By default, SQL Server runs the replication
agents under the same account under which SQL Server Agent runs.
This is not considered to be a good security practice, as buffer overflow, worm attacks, or
Trojan attacks might be able to hijack the replication agent and run commands with the
same security context as the SQL Server Agent on the publisher, distributor, or subscriber. This dialog
enables you to control which account the replication agent is going to run under; ideally, this will be an
account with as few rights as possible on the publisher, distributor, or subscriber. Figure 36-1 illustrates
this dialog.
8. Click the Security Settings button to display the Snapshot Agent Security dialog shown in
Figure 36-2. From here, you can enter the Windows or SQL Server Agent account under
which you wish the snapshot agent to run. If you choose a Windows account, it needs to be
added using the following syntax: DomainName\AccountName.
9. Once you have selected the agents you wish to use, click OK, and then Next to exit the Agent
Security dialog.
10. The next dialog is the Wizard Actions dialog. This enables you to create the publication
immediately, create a script to create the publication, or both. Once you have made your selection,
click Next. In the Complete the Wizard dialog that appears, you can name your publication.
11. Once you have given your publication a name, click Finish to create it.
After you have created your publication, you can now create one or more subscriptions to it.
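The row filter configured in step 5 can be tried out without a replication topology. The sketch below uses Python's built-in sqlite3 module, not SQL Server, with a hypothetical SalesStaff table and made-up rows; it simply shows which rows a State = 'CA' filter would let through to the subscriber.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesStaff (StaffID INTEGER PRIMARY KEY, Name TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO SalesStaff VALUES (?, ?, ?)",
    [(1, "Ann", "CA"), (2, "Bob", "NY"), (3, "Cy", "CA")],
)

# Equivalent of the filter statement from step 5: only California rows qualify.
filtered_rows = conn.execute(
    "SELECT StaffID, Name, State FROM SalesStaff WHERE State = ? ORDER BY StaffID",
    ("CA",),
).fetchall()
print(filtered_rows)
```

Rows that do not satisfy the filter never leave the publisher, which reduces both the snapshot size and the ongoing volume of replicated changes.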