
SQL Server MVP Deep Dives - P14



Chapter 36: Understated changes in SQL Server 2005 replication

Reading the text of hidden replication stored procedures

Consider the case when you need to examine the text of a replication system stored procedure. You might want to see this text for several reasons. Perhaps you want to perform the same actions as the SQL Server Management Studio (SSMS) GUI, but hope to gain programmatic control of the process for yourself. Perhaps you want to get a better understanding of what happens under the hood in order to increase your replication expertise and troubleshooting ability. Whatever the reason, usually you will need to know the name of the system stored procedure and then use sp_helptext or the OBJECT_DEFINITION function to see the whole procedure definition. For some of the replication stored procedures, though, you will find that the text is hidden and these two methods will not work. For example, if you try the following code in a normal query window, you will have NULL returned:

SELECT OBJECT_DEFINITION(OBJECT_ID('sys.sp_MSrepl_helparticlecolumns'))

On the other hand, if you use the dedicated administrator connection (DAC), you will be able to access the underlying text of the procedure. The process is pretty straightforward and is shown here:

1. Enable remote access to the DAC:

sp_configure 'remote admin connections', 1;
GO
RECONFIGURE;
GO

2. Connect to the server using the DAC. Use a query window to connect to your server by using ADMIN:yourservername in the server name section (or use the sqlcmd command-prompt utility with the -A switch).

3. Execute the script:

SELECT OBJECT_DEFINITION(OBJECT_ID('sys.sp_MSrepl_helparticlecolumns'))

You should find the procedure text returned as expected, and if you are on a production system, don't forget to close the DAC connection when you are done with it!

Creating snapshots without any data—only the schema

When we look in BOL at the definition of a replication stored procedure or a replication agent, we find that the permitted values for the parameters are all clearly listed. But it occasionally becomes apparent that there are other acceptable values that have never been documented. The exact number of these hidden parameters is something we'll never know, and in all cases they will be unsupported for the general public. Even so, sometimes they start being used and recommended prior to documentation, usually in order to fix a bug. A case in point is the sp_addpublication procedure, in which there is now the acceptable value of database snapshot for the @sync_method parameter.



This value was for some time known about, undocumented and yet used, but it now exists in fully documented (and supported) form in BOL. The usual caveats apply if you decide to use any such workaround; you must take full responsibility, and any such modifications are completely unsupported.

Another example that exists in the public domain but is not yet in BOL is also available. If your distributor is SQL Server 2005, the Snapshot Agent has an undocumented /NoBcpData switch that will allow you to generate a snapshot without any BCP data. This can be useful when you need to (quickly) debug schema problems generated on initialization.

You can access the command line for running the Snapshot Agent from SSMS as follows:

1. Expand the SQL Server Agent node.

2. Expand the Jobs node.

3. Double-click on the Snapshot Agent job, which typically has a name of the form <Publisher>-<PublisherDB>-<Publication>-<number> (for example, Paul-PC-TestPub-TestPublication-1). You'll know if this is the correct job because the category will be listed as REPL-Snapshot.

4. Select Steps from the left pane.

5. Select the second Run Agent step, and click the Edit button to open it. You should see the command line in the Command text box.

Once you have added the /NoBcpData parameter to the command line, as shown in figure 1, click OK in the Job Step dialog box and click OK again in the Job dialog box to make sure that the change is committed. The /NoBcpData switch tells the Snapshot Agent to create empty BCP files instead of bulk-copying data out from the published tables.
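For reference, a Snapshot Agent job step command line with the switch appended might look like the following sketch. The server, database, and publication names are the placeholders used in the example job name above; the other switches are standard documented Snapshot Agent parameters, and only the trailing /NoBcpData is the undocumented addition.

-Publisher [Paul-PC] -PublisherDB [TestPub] -Distributor [Paul-PC]
-Publication [TestPublication] -DistributorSecurityMode 1 /NoBcpData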



Some changed replication defaults

Many replication defaults changed between SQL Server 2000 and SQL Server 2005—far too many to cover in this section. Most of the new defaults are obvious and self-explanatory, but occasionally some of the following changes catch people out.

ROW-LEVEL CONFLICT DETECTION

In SQL Server 2000, column-level conflict detection was the default for merge replication, and this has changed to row-level conflict detection in SQL Server 2005. Which one is correct for your business is something only you can determine, but if you previously left this setting alone and intend to do the same now, you might find an unexpected set of records in the conflict viewer.

NEW MERGE IDENTITY RANGE MANAGEMENT

The following changes to identity range management for merge publications have been introduced in SQL Server 2005:

• Range allocation is automatic. In SQL Server 2005 merge publications, the article identity range management is set to automatic by default. In SQL Server 2000, the default identity range management was manual. What is the difference? Automatic range management ensures that each subscriber is reseeded with its own identity ranges without any extra configuration, whereas manual means that you will need to change either the seed or the increment of the identity range on each subscriber to avoid conflicts with the publisher. If you previously relied on leaving this article property alone and chose to manually administer the identity range, beware, because a range of 1,000 values will have already been allocated to each of your subscribers.

• Default range sizes have increased. The publisher range has changed from 100 to 10,000, and the subscriber range size has increased from 100 to 1,000.

• Overflow range is allocated. The merge trigger code on the published article implements an overflow range that is the same size as the normal range. This means that by default you will have two ranges of 1,000 values allocated to a subscriber. The clever part is that the overflow range is automatically allocated by the merge insert trigger and therefore doesn't require a connection to the publisher. However, the reseeding performed in the trigger is restricted to those cases where a member of the db_owner role does the insert.

• The threshold parameter is no longer used. Although it appears in the article properties dialog box much the same as in SQL Server 2000, the threshold parameter only applies to subscribers running SQL Server Mobile or previous versions of SQL Server.

“NOT FOR REPLICATION” TREATMENT OF IDENTITY COLUMNS

Identity columns have a new default behavior. These columns are automatically marked as Not for Replication (NFR) on the publisher and are transferred with the identity NFR property intact at the subscriber. This retention of the NFR property applies to both transactional and merge replication.



Why might this be a useful change? First, it means that you don't need to wade through all the tables before creating the publication in order to manually set each identity column as NFR. This is a huge improvement because the method used in SQL Server 2000 by Enterprise Manager to set the NFR attribute involved making whole (time-consuming) copies of the table data. It also means that if you are using transactional replication as a disaster recovery solution, there is now one less hoop you will need to jump through on failover, because you don't have to change this setting on each table at the subscriber. That particular part of your process can now be removed.

(If you are now thinking that it is not possible in T-SQL to directly add the NFR attribute to an existing identity column, please take a look inside the sp_identitycolumnforreplication system stored procedure, because this is the procedure that marks the identity column as NFR.)
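As a quick sanity check after a snapshot has been applied, you can confirm that the property really did survive the transfer. The following query is a minimal sketch using the sys.identity_columns catalog view (available in SQL Server 2005 and later), run in the subscription database:

SELECT OBJECT_NAME(object_id) AS table_name,
       name AS column_name,
       is_not_for_replication       -- 1 = column is marked Not for Replication
FROM sys.identity_columns
ORDER BY table_name;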

DEFERRED UPDATE TRACE FLAGS

For transactional replication, you might be using deferred update trace flags unnecessarily. In SQL Server 2000, updates to columns that do not participate in a unique key constraint are replicated as updates to the subscriber unless trace flag 8202 is enabled, after which they are treated as deferred updates (paired insert/deletes). On the other hand, updates to columns that do participate in unique constraints are always treated as deferred updates (paired insert/deletes) unless trace flag 8207 is enabled. In SQL Server 2005, all such changes are replicated as updates on the subscriber regardless of whether the columns being updated participate in a unique constraint or not.
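If you suspect these trace flags were carried over from a SQL Server 2000 era configuration and may no longer be needed, a quick check is shown below; DBCC TRACESTATUS reports whether the flags are enabled, and the -1 argument checks them at the global level.

DBCC TRACESTATUS (8202, 8207, -1);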

PARTITIONING OF SNAPSHOT FILES

The following change to a replication default is more complicated to explain, but it deals with a significant improvement that has been made to the initial snapshot process. In SQL Server 2000, when an article is BCP'd to the filesystem (the distribution working folder) during the snapshot generation, there is always one file created that contains the table's data. In SQL Server 2005, when you look in the distribution working folder after creating a snapshot, you might be surprised to find many such files for each article, each containing a separate part of the table data, as shown in figure 2.

Clearly there has been a big change in the processing rules. I'll refer to this overall process of splitting data files as BCP partitioning, borrowing the term from a Microsoft developer who once pointed this out in a posting in the Microsoft Replication Newsgroup (microsoft.public.sqlserver.replication). This section explains why BCP partitioning exists, what the expected behavior is, and how to troubleshoot if it all goes wrong.

BCP partitioning has several benefits. First, it helps in those cases where there has been a network outage when the snapshot is being applied to the subscriber. In SQL Server 2000, this would mean that the complete snapshot would have to be reapplied, and in the case of concurrent snapshots, this would all have to be done in one transaction. In contrast, if you have a SQL Server 2005 distributor and SQL Server 2005 subscribers, there is now much greater granularity in the process. The article rows are partitioned into the separate text files, and each partition is applied in a separate transaction, meaning that after an outage, the snapshot distribution is able to continue with the partition where it left off and complete the remaining partitions. For a table containing a lot of rows, this could lead to a huge saving in time.

Other useful side effects are that this can cause less expansion of the transaction log (assuming that the migration crosses a backup schedule or the subscriber uses the simple recovery model), and it can lead to paths of parallel execution of the BCP process for those machines having more than one processor. (It is true that parallel execution existed in SQL Server 2000, but this was only for the processing of several articles concurrently and not for a single table.)

Similarly, the same benefits apply when creating the initial snapshot using the Snapshot Agent. Note that the -BcpBatchSize parameter of the Snapshot and Distribution Agents governs how often progress messages are logged and has no bearing at all on the number of partitions.

Figure 2  The data of a source table is now partitioned across several text files, such as Articlename#1.bcp and Articlename#2.bcp.



To disable BCP partitioning, you can add the unofficial Partitioning 0 switch to the Snapshot Agent and a single data file will be produced, just like in SQL Server 2000. Why would you want to turn off such a useful feature? Well, anecdotally, things may get worse for folks who don't start off with empty tables (archiving or roll-up scenarios) or if the CPU, disk I/O, or network bandwidth is the bottleneck in the attempt to extract more snapshot processing throughput when using BCP partitioning.

Finally, for those tables that expand the transaction log, some DBAs like to enable the bulk-logged recovery mode to try to minimize logging, but this will not always work when dealing with multiple partitions. To ensure that there is a maximum chance of going down the bulk-logged path, you should use -MaxBcpThreads X (where X > 1) for the Distribution Agent and ensure that the target table doesn't have any indexes on it before the Distribution Agent delivers the snapshot.

More efficient methodologies

In the previous section, we looked at several undocumented techniques that can be used to enhance the replication behavior. We'll now look at some capabilities that are fully documented, but that are not always understood to be replacements for less-efficient methodologies.

Remove redundant pre-snapshot and post-snapshot scripts

In SQL Server 2000 publications, we sometimes use pre-snapshot and post-snapshot scripts. The pre-snapshot scripts are T-SQL scripts that run before the snapshot files are applied, whereas the post-snapshot scripts apply once the snapshot has completed. Their use is often to overcome DRI (declarative referential integrity) issues on the subscriber.

Remember that the initialization process starts by dropping tables on the subscriber. If all the tables on the subscriber originate from one publication, this is not an issue, but if there is more than one publication involved, we might have a scenario where the dropping of tables at the subscriber during initialization would be invalid because of relationships between articles originating from different publications. There might also be other tables on the subscriber that are related to replicated articles and that are not themselves part of any publication. Either way, we find the same DRI problem when initialization tries to drop the subscriber's table. In such cases, the pre-snapshot and post-snapshot scripts are needed—a pre-snapshot script would drop the foreign keys to allow the tables to be dropped, and a post-snapshot script would then add the foreign keys back in. Such scripts are not difficult to write, but each needs to be manually created and maintained, causing (another!) maintenance headache for the DBA.

In SQL Server 2005 there is a new, automatic way of achieving this on initialization at the subscriber. Initially, there is a call to the sys.sp_MSdropfkreferencingarticle system stored procedure, which saves the relevant DRI information to the following three metadata tables:

• dbo.MSsavedforeignkeys
• dbo.MSsavedforeignkeycolumns
• dbo.MSsavedforeignkeyextendedproperties

Once the information is safely hived away, the foreign keys are dropped. To re-add the foreign keys, the Distribution Agent calls the new sp_MSrestoresavedforeignkeys system stored procedure once the snapshot has been applied. Note that all this happens automatically and requires no manual scripts to be created.
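If you want to watch this mechanism at work, you can query the three metadata tables in the subscription database while a snapshot is being delivered; the sketch below simply lists what has been saved (the tables are emptied again once the foreign keys have been restored).

SELECT * FROM dbo.MSsavedforeignkeys;
SELECT * FROM dbo.MSsavedforeignkeycolumns;
SELECT * FROM dbo.MSsavedforeignkeyextendedproperties;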

Take a look at your existing pre-snapshot and post-snapshot scripts. If they deal with the maintenance of foreign keys, there's a good chance they are doing work that is already done by default, in which case you'll be able to drop the scripts entirely and remove the maintenance issue.

Replace merge -EXCHANGETYPE parameters

In SQL Server 2005 merge replication, we can now mark articles as download-only, meaning that changes to the table are only allowed at the publisher and not at the subscriber. Previously, in SQL Server 2000, we would use the -EXCHANGETYPE value to set the direction of merge replication changes. This was implemented by manually editing the Merge Agent's job step and adding -EXCHANGETYPE 1|2|3 as text.

When using SQL Server 2000, entering a value of -EXCHANGETYPE 2 means that changes to a replicated article at the subscriber are not prohibited, are recorded in the merge metadata tables via merge triggers, and are subsequently filtered out when the Merge Agent synchronizes. This means there may be a huge amount of unnecessary metadata being recorded, which slows down both the data changes made to the table and the subsequent synchronization process.

This -EXCHANGETYPE setting is not reflected directly in the GUI and is hidden away in the text of the Merge Agent's job. Despite being a maintenance headache and causing an unnecessary slowing down of synchronization, it was the only way of achieving this end, and judging by the newsgroups, its use was commonplace.

In SQL Server 2005, when adding an article, there is an option to define the subscriber_upload_options either using the article properties screen in the GUI or in code, like this:

sp_addmergearticle @subscriber_upload_options = 1
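For context, a fuller call might look like the following sketch; the publication, article, and table names are placeholders, and only @subscriber_upload_options relates to the point being made here.

EXEC sp_addmergearticle
    @publication = N'TestPublication',
    @article = N'tableName',
    @source_object = N'tableName',
    @subscriber_upload_options = 1;  -- 1 = download only, but allow subscriber changes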

This parameter defines restrictions on updates made at a subscriber. The parameter value of 1 is described as “download only, but allow subscriber changes” and seems equivalent to the -EXCHANGETYPE = 2 setting mentioned previously, but in the SQL Server 2005 case there are no triggers at all on the subscriber table. Another distinction is that this setting is made at the more granular article level rather than set for the entire publication. This means that although the -EXCHANGETYPE and sp_addmergearticle methods are logically equivalent, the implementation has become much more sophisticated in SQL Server 2005. Triggers that unnecessarily log data at the subscriber are no longer fired; therefore both subscriber data changes and the subsequent synchronization are significantly faster.

Put simply, you should replace the use of EXCHANGETYPE with download-only articles!

Incidentally, this setting is also implemented by a separate check box in SSMS, as shown in figure 3. This check box does a similar job but sets the value of @subscriber_upload_options to 2, which again makes the changes download-only, but in this case any subscriber changes are prohibited and rolled back.

Summary

We have looked at many of the lesser-known replication techniques useful in SQL Server 2005. Some of these involve using parameters or procedures that are partially documented but that might help solve a particular issue. Other methods are fully documented, but we have looked at how these methods can be used to replace replication techniques used in SQL Server 2000, improve our replication implementation, reduce administration, and reduce metadata.



About the author

Paul Ibison is a contractor SQL Server DBA in London. He runs the website www.replicationanswers.com—the only site dedicated to SQL Server replication—and has answered over 6,000 questions on the Microsoft SQL Server Replication newsgroup. When not working, he likes spending time with his wife and son, Ewa and Thomas, going fell-walking in the Lake District, and learning Ba Gua, a Chinese martial art.




Chapter 37: High-performance transactional replication

Hilary Cotter

The purpose of this chapter is to educate DBAs on how to get maximum performance from their high-performance transactional replication topology across all versions and editions. Most DBAs are concerned with latency—in other words, how old the transactions are when they are applied on the Subscriber.

To set expectations, you should know that the minimum latency of any transactional replication solution will be several seconds (lower limits are between 1 and 2 seconds). Should you need replication solutions which require lower latencies, you should look at products like Golden Gate, which is an IP application that piggybacks off the Log Reader Agent.

Focusing solely on latency will not give you a good indication of replication performance. The nature of your workload can itself contribute to larger latencies. For example, transactions consisting of single insert statements can be replicated with small latencies (that is, several seconds), but large batch operations can have large latencies (that is, many minutes or even hours). Large latencies in themselves are not necessarily indicative of poor replication performance, insufficient network bandwidth, or inadequate hardware to support your workload.

Consequently, in this study we'll be looking at the following:

• Throughput—How many transactions and commands SQL Server can replicate per second. These can be measured by the performance monitor counters SQLServer:Replication Dist:Dist:Delivered Trans/sec and SQLServer:Replication Dist:Dist:Delivered Cmds/sec.

• Worker time—How long it takes for SQL Server to replicate a fixed number of transactions and commands. This statistic is logged when the replication agents are run from the command line.

Latency can be measured using the performance monitor counter SQLServer:Replication Dist:Dist:Delivery Latency.



Although these performance monitor counters are the best way to get a handle on your current throughput and latency in your production environments, in this study we'll be focusing primarily on throughput. We'll focus mainly on worker time, or how long the distribution agent has to work to replicate a given set of commands. We'll focus on the Distribution Agent metrics, as the Log Reader is rarely the bottleneck in a replication topology. Additionally, the Log Reader Agent operates asynchronously from the Distribution Agent; therefore, the Log Reader Agent can keep current with reading the log, while the Distribution Agent can be experiencing high latencies. By studying the output of the replication agents themselves, when you replay your workloads through them (or measure your workloads as they are replicated by the agents), you can determine the optimal configuration of profile settings for your workloads, and determine how to group articles into different publications for the maximum throughput.

This chapter assumes that you have a good understanding of replication concepts. Should you be unfamiliar with replication concepts, I advise you to study the section Replication Administrator InfoCenter in Books Online, accessible online at http://msdn.microsoft.com/en-us/library/ms151314(SQL.90).aspx.

Before we begin it is important to look at factors that are the performance kiss of death to any replication solution. After we look at these factors and possible ways to mitigate them, we'll look at tuning the replication agents themselves for maximum performance.

Performance kiss of death factors

Transactional replication replicates transactions within a transactional context—hence the name transactional. This means that if I do a batch update, insert, or delete, the batch is written in the log as singleton commands. Singletons are data manipulation language (DML) commands that affect at most one row. For example, the following are all singletons:

Trang 13

486 C 37 High-performance transactional replication

insert into tableName (Col1, Col2) values(1,2)
update tableName set Col1=1, Col2=2 where pk=1
delete from tableName where pk=1

Each singleton is wrapped in a transaction. Contrast this with the following batch updates (the term update refers to any DML—an insert, update, or delete):

insert into tableName select * from tableName1
update tableName set col1=1 where pk<=20
delete from tableName where pk<=20

In the insert statement, the batch update will insert as many rows as there are in tableName1 into tableName (as a transaction). Assuming there were 20 rows with a pk less than or equal to 20 in tableName, 20 rows would be affected by the batch update and batch delete.

If you use any transaction log analysis tool, you'll see that the batch updates are decomposed into singleton commands. The following update command

update tableName set col1=1 where pk<=20

would be written in the log as 20 singleton commands, that is:

update tableName set col1=1 where pk=1
update tableName set col1=1 where pk=2
update tableName set col1=1 where pk=3
...
update tableName set col1=1 where pk=20

The Log Reader Agent reads committed transactions and their constituent singleton commands in the log and writes them to the distribution database as the constituent commands.

Details about the transaction are written to MSrepl_transactions, along with details about the constituent commands (which are stored in MSrepl_commands).

The Distribution Agent wakes up (if scheduled) or polls (if running continuously) and reads the last applied transaction on the subscription database for that publication. It then reads MSrepl_transactions on the distribution database and applies the corresponding commands for that transaction it finds in MSrepl_commands one by one on the Subscriber.

Transactions are committed to the database depending on the settings of the CommitBatchSize and CommitBatchThreshold parameters for the Distribution Agent. We'll talk about these settings later.

Key to understanding the performance impact of this architecture is realizing that replicating large transactions means that a transaction will be held on the Subscriber while all the singleton commands are being applied on the Subscriber. Then a commit is issued. This allows the Distribution Agent to roll back the entire transaction, should there be a primary key violation, foreign key violation, lack of transactional consistency (no rows affected), or some other event that causes the DML to fail (for example, the subscription database transaction log filling up). This can mean that a lengthy period of time is required to apply large batch updates. While these batch updates are being applied, the Distribution Agent will wrap them in a transaction so that it can roll them back on errors, and Subscriber resources are consumed to hold this transaction open. Latencies that were previously several seconds can quickly grow to many minutes, and occasionally to hours (for large numbers of modified rows).

SQL Server will get bogged down when replicating transactions that affect large numbers of rows—typically in the tens or hundreds of thousands of rows. Strategies for improving performance in this regard are presented in the sections that follow.

REPLICATING THE EXECUTION OF STORED PROCEDURES

This technique involves doing your batch DML through a stored procedure and then replicating the execution of the stored procedure. If you choose to replicate the execution of a stored procedure, every time you execute that stored procedure its name and its parameters will be written to the log, and the Log Reader Agent will pick it up and write it to the distribution database, where the Distribution Agent will pick it up and apply it on the Subscriber. The performance improvements are due to two reasons:

• Instead of 100,000 commands (for example) being replicated, only one stored procedure statement would be replicated.

• The Log Reader Agent has to read only the stored procedure execution statement from the log, and not the 100,000 constituent singleton commands.

Naturally this will only work if you have a small number of parameters to pass. For example, if you're doing a batch insert of 100,000 rows, it will be difficult to pass the 100,000 rows to your stored procedure.
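Replicating the execution of a stored procedure is enabled by publishing the procedure itself as an article. A minimal sketch is shown below; the publication and procedure names are placeholders, while sp_addarticle and the 'proc exec' article type are the documented mechanism for this.

EXEC sp_addarticle
    @publication = N'TestPublication',
    @article = N'usp_BatchUpdate',
    @source_object = N'usp_BatchUpdate',
    @type = N'proc exec';  -- replicate the procedure call, not the rows it touches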

SP_SETSUBSCRIPTIONXACTSEQNO

Another trick is to stop your Distribution Agent before you begin your batch update. Use the sp_browsereplcmds stored procedure to extract commands that have not been applied to the Subscriber and issue them on the Subscriber to bring it up to date with the Publisher. Then perform the batch update on your Publisher and Subscriber. The Log Reader Agent will pull all the commands from the Publisher into the distribution database, but make sure that they are not replicated (and hence applied twice at the Subscriber). Use sp_browsereplcmds to determine the transaction identifier (xact_seqno) for the last batch update command that the Log Reader Agent writes into the distribution database. Note that you can select where to stop and start sp_browsereplcmds, as it will take a long time to issue unqualified calls to sp_browsereplcmds.

You may have to wait awhile before the Log Reader Agent reads all the commands from the Publisher's log and writes them to the distribution database.

When you detect the end of the batch update using sp_browsereplcmds, note the value of the last transaction identifier (xact_seqno) and then use sp_setsubscriptionxactseqno to tell the subscription database that all the batch updates have arrived on the Subscriber. Then restart your Distribution Agent, and the agent will apply only transactions that occurred after the batch update.
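The two procedures involved are sketched below under stated assumptions: the xact_seqno values, server, database, and publication names are placeholders that you would substitute with values noted from your own output, and the distribution database is assumed to use the default name. sp_browsereplcmds runs in the distribution database, and sp_setsubscriptionxactseqno runs in the subscription database.

-- At the Distributor: qualify the range so the call doesn't scan the whole distribution database
EXEC distribution.dbo.sp_browsereplcmds
    @xact_seqno_start = '0x000000140000002B0001',   -- placeholder starting LSN
    @xact_seqno_end   = '0x000000140000009F0004';   -- placeholder ending LSN

-- At the Subscriber: declare that everything up to the noted xact_seqno has already been applied
EXEC sp_setsubscriptionxactseqno
    @publisher    = N'PublisherServer',
    @publisher_db = N'PublisherDB',
    @publication  = N'TestPublication',
    @xact_seqno   = 0x000000140000009F0004;         -- last batch update command's xact_seqno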



Take care to note any transactions that may be in the distribution database and occurred after the batch update started and before it stopped. You'll need to ensure that these commands are also applied on the subscription database.

The problem with this approach is that the Log Reader Agent still has to process all of the batch update commands that are written to the log. This approach will eliminate the lengthy time required for the Distribution Agent to apply the commands to the Subscriber, but will not address the time that it takes for the Log Reader Agent to read the batch commands from the log and write them to the distribution database.

MAXCMDSINTRAN

MaxCmdsInTran is a Log Reader Agent parameter, which will break a large transaction into small batches. For example, if you set this to 1,000 and do a batch insert of 10,000 rows, as the Log Reader Agent reads 1,000 commands in the log it will write them to the distribution database, even before that batch insert has completed. This allows them to be replicated to the Subscriber. If this batch insert was wrapped in a transaction on the Publisher and the batch insert failed before the transaction was committed, the commands read by the Log Reader Agent and written in the distribution database would not be rolled back. For example, if the batch insert failed on the 9,999th row, the entire 10,000-row transaction would be rolled back on the Publisher. The log reader would already have read the 9,000 rows out of the transaction log and written them to the distribution database, and then they would be written in 1,000-row batches to the Subscriber.

The advantage of this method is reduced latency, because the Log Reader Agent can start reading these commands out of the Publisher's log before the transaction is committed, which will mean faster replication of commands. The disadvantage is that consistency may be lacking between your Publisher and Subscriber. Appropriate use cases are situations in which being up to date is more important than having a complete historical record or transactional record. For example, a media giant used this method with the understanding that their peak usage would occur during a disaster. For example, during 9/11 everyone used the news media resources for the latest news, and if a story was lost, its rewrite swiftly came down the wire. A large book seller also used this method when they wrote off a few lost orders, knowing that the bulk of them would be delivered to the subscribers on time.
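MaxCmdsInTran is set on the Log Reader Agent's command line (or in a custom agent profile). A representative job step is sketched below; the server and database names are placeholders, and the other switches are standard documented Log Reader Agent parameters.

-Publisher [PublisherServer] -PublisherDB [PublisherDB] -Distributor [DistributorServer]
-DistributorSecurityMode 1 -Continuous -MaxCmdsInTran 1000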

Replicating text

Text in this context refers to any of the large-value data types—text, ntext, image, nvarchar(max), varchar(max), varbinary(max), varbinary(max) with filestream enabled, and XML.

Like a batch update, when you replicate text, the constituent commands may be spread over multiple rows in MSrepl_commands.

For example, this statement,

insert into tableName (col1) values(replicate('x',8000))



is spread over eight rows in MSrepl_commands. When the text is being replicated, there is overhead not only when the command is read out of the Publisher's log and broken into eight commands in MSrepl_commands, but there is also overhead for the Distribution Agent in assembling these eight commands into one insert statement to apply on the Subscriber.

Unfortunately there is no easy way of getting around the overhead, other than using vertical filtering to avoid replicating text columns. On the newsgroups I frequently encounter misconceptions about the max text repl size (B) option, the value of which can be set using sp_configure. The misconception is that this setting somehow helps when replicating text. Some people think that if you are replicating text values larger than the setting of max text repl size, then the value is not replicated; others think that special optimizations kick in. In actuality your insert or update statement will fail with this message: “Length of text, ntext, or image data (x) to be replicated exceeds configured maximum 65536.”

Although you can use this option to avoid the expense of replicating large text values, ensure that your application can handle the error that is raised.
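If you do want to rely on that error as a guard against accidentally replicating huge values, the option is checked and changed with sp_configure in the usual way; the sketch below assumes the default value of 65,536 bytes and uses 128 KB purely as an illustrative new limit.

EXEC sp_configure 'max text repl size';          -- show the current value (default 65536)
EXEC sp_configure 'max text repl size', 131072;  -- example: raise the limit to 128 KB
RECONFIGURE;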

Logging

This is a catch-22. The agents need to log minimal replication activity so that the replication subsystem can detect hung agents. However, logging itself will degrade replication performance. Figure 1 illustrates the impact of various settings of HistoryVerboseLevel when replicating 10,000 singleton insert statements. The y axis is worker time (ms), and the x axis is OutputVerboseLevel. Notice how a setting for HistoryVerboseLevel of 0 and using the default for OutputVerboseLevel (1) will give you the best performance and replicate 20 percent faster than its nearest competitor; 20 percent faster meant a total of 18,356 transactions per second. The characteristics are completely different for 100 transactions of 100 singleton inserts, as displayed in figure 2. There, too, the y axis is worker time, and the x axis is OutputVerboseLevel.

Figure 1  HistoryVerboseLevel and OutputVerboseLevel settings on a workload of 10,000 singleton inserts (worker time in ms on the y axis; OutputVerboseLevel values 1, 2, and 3 on the x axis; one series each for HistoryVerboseLevel = 0, 1, and 2).



Notice how the default settings (HistoryVerboseLevel = 1, and OutputVerboseLevel = 1) are optimal for this workload. You'll need to create custom profiles and adjust them for your workload. A setting for HistoryVerboseLevel of 0 means no history, 1 means only the current history is retained, and 2 means that all history will be retained. An OutputVerboseLevel setting of 1 means minimal output, 2 means detailed output, and 3 means debugging information is included in the output.

The above examples occur when you run the Distribution Agent from the command line. For comparative purposes, when you run in SQL Server Management Studio using the defaults, the maximum number of commands per second on the same test system is 7,892. We get a performance boost of 25 percent when we run the agents from the command line; this is because of the overhead in running and monitoring the agents from within SQL Server. If you run your agents from the command line, you'll have to roll your own monitoring agent to ensure that the agents are not hung and there are no errors.

As the performance improvements are significant, this is definitely an option to use in environments requiring minimal latency.

Network latency

If you are replicating from one server to another, the impact of network latency can cripple high-performance replication solutions. You should examine two factors—network bandwidth and speed. Although network latency is typically not a factor when replicating between servers on the same network segment or switch, it can be painful when replicating over a WAN.

For example, a financial client of mine was replicating between their New York City office and offices in Europe. The ping response time between local servers was 0 ms, and between New York and the European offices was 7 ms. Replication latency was typically several seconds during the day; however, while replicating batch commands at night the latency would shoot up to several hours. Contrast this with local subscribers, which would apply the batches within minutes. Sniffer analysis revealed minimal network congestion at the time. The best response to minimizing the effect of network latency is to increase the network packet size.

Figure 2  HistoryVerboseLevel and OutputVerboseLevel settings on a workload of 100 transactions of 100 singleton inserts (worker time in ms on the y axis; OutputVerboseLevel values 1, 2, and 3 on the x axis; one series each for HistoryVerboseLevel = 0, 1, and 2).

You can increase the network packet size to 32 KB by issuing the following command:

sp_configure 'Show Advanced Options', 1
reconfigure with override
go
sp_configure 'network packet size (B)', 14000
reconfigure with override
go

Note that the maximum network packet size is 32,767 bytes. Prior to SQL 2005 SP3 and SQL 2008 SP1 there is a bug that causes exceptions using the maximum value for network packet size. The threshold appears to be somewhere around 14,000 bytes. You can experiment with binary elimination to fine-tune this limit further—unfortunately there isn't a knowledge base (KB) article that describes this.

Increasing the network packet size allows SQL Server to encapsulate more replication commands within a single packet. Should an error occur, the network stack will request this packet be retransmitted. Lossy or congested networks will benefit from lower packet sizes; otherwise you should be using a larger value.

Although this client found the effects of network latency particularly painful with batch updates, each time you replicate over WANs or long distances you can benefit from increased packet sizes.

Anecdotally, I heard that to circumvent this problem one client batched all the update statements into a text file, copied them over manually to the subscribers, and then issued them on the subscribers. The client was able to live with the synchronization issues that the solution presented. The batch update was wrapped in a stored procedure on the Publisher, the execution of which was replicated, and the code for the stored procedure on the subscribers was replaced with a no-op.

Subscriber hardware

Subscriber hardware impacts replication performance. For most non-trivial workloads, the Subscriber's hardware does not have to be the same as the Publisher's. You should be running a multi-proc box, with ample memory. Preferably you should be running SQL Server 2005/2008 64-bit. Should your hardware be inadequate, you'll run into a problem called hardware impedance mismatch, where the Subscriber will degrade the performance of your Publisher.

Subscriber indexes and triggers

Subscriber indexes and triggers will add to the latency of every replicated command being applied. Ensure that the indexes on the Subscriber are as lightweight as possible. Unfortunately, transactional replication is most frequently used to replicate to a reporting environment, which usually requires a different set of indexes on the Subscriber than on the Publisher, and consequently this is not an easy task. However, the effort spent in careful elimination of marginally effective indexes and triggers will pay dividends. If at all possible, trigger logic should be incorporated into the replication stored procedures, and sometimes can be incorporated into a different article with careful use of custom sync objects.

Distributor hardware

As replication impacts the performance of the Publisher, using a clustered remote Distributor is necessary when you have any appreciable throughput. Ensure that your Distributor is running RAID 10 and has ample memory—64 GB is recommended. Please refer to this blog post for an analysis of distributor hardware: http://blogs.technet.com/lzhang/archive/2006/05/12/428178.aspx.

Large numbers of push subscriptions

If you have a large number of push subscriptions, you should migrate to pull subscribers to transfer the processing requirements from the Distributor to the Subscriber. Also, using pull subscribers over WANs has performance advantages.

In the remainder of this chapter we'll look at optimal replication settings. Most of these settings can be configured using custom profiles.

Optimal settings for replication

Most of the tuning should be done on the Distribution Agents; Log Reader Agents are rarely bottlenecks. Naturally, you should separate log files on I/O paths different from the data drives to prevent I/O contention; this will help log reader performance. The ReadBatchSize and ReadBatchThreshold settings give the most significant improvements for the Log Reader Agent.

ReadBatchSize determines when the log reader will distribute the transactions it has read from the log to the distribution database. The default is 500, which means that the log reader will batch up 500 transactions before writing them to the distribution database. ReadBatchThreshold determines when the log reader will distribute the commands it has read from the log to the distribution database. The default for ReadBatchThreshold is 0.

This concept is key to understanding how these parameters work: the parameters honor transactional boundaries. For example, if I had ReadBatchSize and ReadBatchThreshold settings of 500 and 100, respectively, and I issued a batch update which affected 1,000 rows, neither of these parameters would trigger a message telling the Log Reader Agent to write the buffered data to the distribution database. If I issued a singleton insert, both transactions would fly, as the ReadBatchThreshold is reached (or exceeded). The PollingInterval determines how long transactions will remain in the buffer before being sent to the distribution database, even if ReadBatchSize or ReadBatchThreshold have not been exceeded.

Figure 3 illustrates the impact of various settings for ReadBatchSize and ReadBatchThreshold.



The y axis is worker time in ms, and the x axis is ReadBatchThreshold. As you can see, the higher values produce the lowest worker times for this particular workload. In this case, the workload is 10,000 singleton inserts.

CommitBatchSize and CommitBatchThreshold

CommitBatchSize and CommitBatchThreshold settings are analogous to ReadBatchSize and ReadBatchThreshold. CommitBatchSize refers to the number of transactions that will be sent in a batch to the Subscriber, and CommitBatchThreshold refers to the number of commands. Similar to ReadBatchSize and ReadBatchThreshold, higher numbers are better, but they will consume more resources on the Subscriber. In some cases the snapshot isolation model can be used to minimize the effect of higher CommitBatchSize and CommitBatchThreshold settings, keeping in mind the added pressure put on tempdb and possible performance problems associated with it. Figure 4 shows the results of 10,000 singleton inserts with varying values of CommitBatchSize and CommitBatchThreshold settings.

In figure 4 the y axis is worker time in ms, and the x axis is CommitBatchThreshold. The lower values of worker time are better. As you can see, the best values for CommitBatchSize and CommitBatchThreshold are 1,000 and 1,000. The defaults are CommitBatchSize=100 and CommitBatchThreshold=1,000.
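As mentioned earlier, these agent parameters can be packaged in a custom agent profile rather than edited into each job step. The following is a minimal sketch using the documented sp_add_agent_profile and sp_add_agent_parameter procedures, run at the Distributor; the profile name and description are placeholders, and the values are the ones that tested best above.

DECLARE @profile_id int;

-- Create a custom Distribution Agent profile (agent_type 3 = Distribution Agent)
EXEC sp_add_agent_profile
    @profile_id = @profile_id OUTPUT,
    @profile_name = N'High throughput distribution',
    @agent_type = 3,
    @description = N'Larger commit batches for high-volume workloads';

EXEC sp_add_agent_parameter @profile_id, N'CommitBatchSize', N'1000';
EXEC sp_add_agent_parameter @profile_id, N'CommitBatchThreshold', N'1000';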

Figure 3  ReadBatchSize and ReadBatchThreshold for a workload of 10,000 singleton inserts on the Log Reader Agent (worker time in ms on the y axis; ReadBatchThreshold values of 10, 100, 1,000, and 10,000 on the x axis; one series each for ReadBatchSize = 10, 100, 500 (the default), 1,000, and 10,000).

Figure 4  CommitBatchSize and CommitBatchThreshold for a workload of 10,000 singleton inserts on the Distribution Agent (worker time in ms on the y axis; CommitBatchThreshold values of 10, 100, 1,000, and 10,000 on the x axis; one series each for CommitBatchSize = 10, 100, 1,000, and 10,000).

