Charles Bell, Mats Kindahl, and Lars Thalmann
SECOND EDITION
MySQL High Availability
MySQL High Availability, Second Edition
by Charles Bell, Mats Kindahl, and Lars Thalmann
Copyright © 2014 Charles Bell, Mats Kindahl, Lars Thalmann. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Andy Oram
Production Editor: Nicole Shelby
Copyeditor: Jasmine Kwityn
Proofreader: Linley Dolby
Indexer: Lucie Haskins
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest
June 2010: First Edition
April 2014: Second Edition
Revision History for the Second Edition:
2014-04-09: First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449339586 for release details.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. MySQL High Availability, the image of an American robin, and related trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
ISBN: 978-1-449-33958-6
[LSI]
Table of Contents
Foreword for the Second Edition xv
Foreword for the First Edition xix
Preface xxi
Part I High Availability and Scalability
1 Introduction 3
What’s This Replication Stuff, Anyway? 5
So, Backups Are Not Needed Then? 7
What’s With All the Monitoring? 7
Is There Anything Else I Can Read? 8
Conclusion 9
2 MySQL Replicant Library 11
Basic Classes and Functions 15
Supporting Different Operating Systems 16
Servers 17
Server Roles 19
Conclusion 21
3 MySQL Replication Fundamentals 23
Basic Steps in Replication 24
Configuring the Master 25
Configuring the Slave 27
Connecting the Master and Slave 28
A Brief Introduction to the Binary Log 29
What’s Recorded in the Binary Log 30
Watching Replication in Action 30
The Binary Log’s Structure and Content 33
Adding Slaves 35
Cloning the Master 37
Cloning a Slave 39
Scripting the Clone Operation 41
Performing Common Tasks with Replication 42
Reporting 43
Conclusion 49
4 The Binary Log 51
Structure of the Binary Log 52
Binlog Event Structure 54
Event Checksums 56
Logging Statements 58
Logging Data Manipulation Language Statements 58
Logging Data Definition Language Statements 59
Logging Queries 59
LOAD DATA INFILE Statements 65
Binary Log Filters 67
Triggers, Events, and Stored Routines 70
Stored Procedures 75
Stored Functions 78
Events 81
Special Constructions 82
Nontransactional Changes and Error Handling 83
Logging Transactions 86
Transaction Cache 87
Distributed Transaction Processing Using XA 91
Binary Log Group Commit 94
Row-Based Replication 97
Enabling Row-based Replication 98
Using Mixed Mode 99
Binary Log Management 100
The Binary Log and Crash Safety 100
Binlog File Rotation 101
Incidents 103
Purging the Binlog File 104
The mysqlbinlog Utility 105
Basic Usage 106
Interpreting Events 113
Binary Log Options and Variables 118
Options for Row-Based Replication 120
Conclusion 121
5 Replication for High Availability 123
Redundancy 124
Planning 126
Slave Failures 127
Master Failures 127
Relay Failures 127
Disaster Recovery 127
Procedures 128
Hot Standby 130
Dual Masters 135
Slave Promotion 144
Circular Replication 149
Conclusion 151
6 MySQL Replication for Scale-Out 153
Scaling Out Reads, Not Writes 155
The Value of Asynchronous Replication 156
Managing the Replication Topology 158
Application-Level Load Balancing 162
Hierarchical Replication 170
Setting Up a Relay Server 171
Adding a Relay in Python 172
Specialized Slaves 173
Filtering Replication Events 174
Using Filtering to Partition Events to Slaves 176
Managing Consistency of Data 177
Consistency in a Nonhierarchical Deployment 178
Consistency in a Hierarchical Deployment 180
Conclusion 187
7 Data Sharding 189
What Is Sharding? 190
Why Should You Shard? 191
Limitations of Sharding 192
Elements of a Sharding Solution 194
High-Level Sharding Architecture 196
Partitioning the Data 197
Shard Allocation 202
Mapping the Sharding Key 206
Sharding Scheme 206
Shard Mapping Functions 210
Processing Queries and Dispatching Transactions 215
Handling Transactions 216
Dispatching Queries 218
Shard Management 220
Moving a Shard to a Different Node 220
Splitting Shards 225
Conclusion 225
8 Replication Deep Dive 227
Replication Architecture Basics 228
The Structure of the Relay Log 229
The Replication Threads 233
Starting and Stopping the Slave Threads 234
Running Replication over the Internet 235
Setting Up Secure Replication Using Built-in Support 237
Setting Up Secure Replication Using Stunnel 238
Finer-Grained Control Over Replication 239
Information About Replication Status 239
Options for Handling Broken Connections 248
How the Slave Processes Events 249
Housekeeping in the I/O Thread 249
SQL Thread Processing 250
Semisynchronous Replication 257
Configuring Semisynchronous Replication 258
Monitoring Semisynchronous Replication 259
Global Transaction Identifiers 260
Setting Up Replication Using GTIDs 261
Failover Using GTIDs 263
Slave Promotion Using GTIDs 264
Replication of GTIDs 266
Slave Safety and Recovery 268
Syncing, Transactions, and Problems with Database Crashes 268
Transactional Replication 270
Rules for Protecting Nontransactional Statements 274
Multisource Replication 275
Details of Row-Based Replication 278
Table_map Events 280
The Structure of Row Events 282
Execution of Row Event 283
Events and Triggers 284
Filtering in Row-Based Replication 286
Partial Row Replication 288
Conclusion 289
9 MySQL Cluster 291
What Is MySQL Cluster? 292
Terminology and Components 292
How Does MySQL Cluster Differ from MySQL? 293
Typical Configuration 293
Features of MySQL Cluster 294
Local and Global Redundancy 296
Log Handling 297
Redundancy and Distributed Data 297
Architecture of MySQL Cluster 298
How Data Is Stored 300
Partitioning 303
Transaction Management 304
Online Operations 304
Example Configuration 306
Getting Started 306
Starting a MySQL Cluster 308
Testing the Cluster 313
Shutting Down the Cluster 314
Achieving High Availability 314
System Recovery 317
Node Recovery 318
Replication 319
Achieving High Performance 324
Considerations for High Performance 325
High Performance Best Practices 326
Conclusion 328
Part II Monitoring and Managing
10 Getting Started with Monitoring 333
Ways of Monitoring 334
Benefits of Monitoring 335
System Components to Monitor 335
Processor 336
Memory 337
Disk 338
Network Subsystem 339
Monitoring Solutions 340
Linux and Unix Monitoring 341
Process Activity 342
Memory Usage 347
Disk Usage 350
Network Activity 353
General System Statistics 355
Automated Monitoring with cron 356
Mac OS X Monitoring 356
System Profiler 357
Console 359
Activity Monitor 361
Microsoft Windows Monitoring 365
The Windows Experience 366
The System Health Report 367
The Event Viewer 369
The Reliability Monitor 372
The Task Manager 374
The Performance Monitor 375
Monitoring as Preventive Maintenance 377
Conclusion 377
11 Monitoring MySQL 379
What Is Performance? 380
MySQL Server Monitoring 381
How MySQL Communicates Performance 381
Performance Monitoring 382
SQL Commands 383
The mysqladmin Utility 389
MySQL Workbench 391
Third-Party Tools 402
The MySQL Benchmark Suite 405
Server Logs 407
Performance Schema 409
Concepts 410
Getting Started 412
Using Performance Schema to Diagnose Performance Problems 420
MySQL Monitoring Taxonomy 421
Database Performance 423
Measuring Database Performance 423
Best Practices for Database Optimization 435
Best Practices for Improving Performance 444
Everything Is Slow 444
Slow Queries 444
Slow Applications 445
Slow Replication 445
Conclusion 446
12 Storage Engine Monitoring 447
InnoDB 448
Using the SHOW ENGINE Command 450
Using InnoDB Monitors 453
Monitoring Logfiles 457
Monitoring the Buffer Pool 458
Monitoring Tablespaces 460
Using INFORMATION_SCHEMA Tables 461
Using PERFORMANCE_SCHEMA Tables 462
Other Parameters to Consider 463
Troubleshooting Tips for InnoDB 464
MyISAM 467
Optimizing Disk Storage 467
Repairing Your Tables 468
Using the MyISAM Utilities 468
Storing a Table in Index Order 470
Compressing Tables 471
Defragmenting Tables 471
Monitoring the Key Cache 471
Preloading Key Caches 472
Using Multiple Key Caches 473
Other Parameters to Consider 474
Conclusion 475
13 Replication Monitoring 477
Getting Started 477
Server Setup 478
Inclusive and Exclusive Replication 478
Replication Threads 481
Monitoring the Master 483
Monitoring Commands for the Master 483
Master Status Variables 487
Monitoring Slaves 487
Monitoring Commands for the Slave 487
Slave Status Variables 492
Replication Monitoring with MySQL Workbench 493
Other Items to Consider 495
Networking 495
Monitor and Manage Slave Lag 496
Causes and Cures for Slave Lag 497
Working with GTIDs 498
Conclusion 499
14 Replication Troubleshooting 501
What Can Go Wrong 502
Problems on the Master 503
Master Crashed and Memory Tables Are in Use 503
Master Crashed and Binary Log Events Are Missing 503
Query Runs Fine on the Master but Not on the Slave 505
Table Corruption After a Crash 505
Binary Log Is Corrupt on the Master 506
Killing Long-Running Queries for Nontransactional Tables 507
Unsafe Statements 507
Problems on the Slave 509
Slave Server Crashed and Replication Won’t Start 510
Slave Connection Times Out and Reconnects Frequently 510
Query Results Are Different on the Slave than on the Master 511
Slave Issues Errors when Attempting to Restart with SSL 512
Memory Table Data Goes Missing 513
Temporary Tables Are Missing After a Slave Crash 513
Slave Is Slow and Is Not Synced with the Master 513
Data Loss After a Slave Crash 514
Table Corruption After a Crash 514
Relay Log Is Corrupt on the Slave 515
Multiple Errors During Slave Restart 515
Consequences of a Failed Transaction on the Slave 515
I/O Thread Problems 515
SQL Thread Problems: Inconsistencies 516
Different Errors on the Slave 517
Advanced Replication Problems 517
A Change Is Not Replicated Among the Topology 517
Circular Replication Issues 518
Multimaster Issues 518
The HA_ERR_KEY_NOT_FOUND Error 519
GTID Problems 519
Tools for Troubleshooting Replication 520
Best Practices 521
Know Your Topology 521
Check the Status of All of Your Servers 523
Check Your Logs 523
Check Your Configuration 524
Conduct Orderly Shutdowns 525
Conduct Orderly Restarts After a Failure 525
Manually Execute Failed Queries 526
Don’t Mix Transactional and Nontransactional Tables 526
Common Procedures 526
Reporting Replication Bugs 528
Conclusion 529
15 Protecting Your Investment 531
What Is Information Assurance? 532
The Three Practices of Information Assurance 532
Why Is Information Assurance Important? 533
Information Integrity, Disaster Recovery, and the Role of Backups 533
High Availability Versus Disaster Recovery 534
Disaster Recovery 535
The Importance of Data Recovery 541
Backup and Restore 542
Backup Tools and OS-Level Solutions 547
MySQL Enterprise Backup 548
MySQL Utilities Database Export and Import 559
The mysqldump Utility 560
Physical File Copy 562
Logical Volume Manager Snapshots 564
XtraBackup 569
Comparison of Backup Methods 569
Backup and MySQL Replication 570
Backup and Recovery with Replication 571
PITR 571
Automating Backups 579
Conclusion 581
16 MySQL Enterprise Monitor 583
Getting Started with MySQL Enterprise Monitor 584
Commercial Offerings 585
Anatomy of MySQL Enterprise Monitor 585
Installation Overview 586
MySQL Enterprise Monitor Components 590
Dashboard 591
Monitoring Agent 594
Advisors 594
Query Analyzer 595
MySQL Production Support 597
Using MySQL Enterprise Monitor 597
Monitoring 599
Query Analyzer 605
Further Information 608
Conclusion 609
17 Managing MySQL Replication with MySQL Utilities 611
Common MySQL Replication Tasks 612
Checking Status 612
Stopping Replication 615
Adding Slaves 617
MySQL Utilities 618
Getting Started 618
Using the Utilities Without Workbench 619
Using the Utilities via Workbench 619
General Utilities 621
Comparing Databases for Consistency: mysqldbcompare 621
Copying Databases: mysqldbcopy 624
Exporting Databases: mysqldbexport 625
Importing Databases: mysqldbimport 628
Discovering Differences: mysqldiff 629
Showing Disk Usage: mysqldiskusage 632
Checking Tables Indexes: mysqlindexcheck 635
Searching Metadata: mysqlmetagrep 636
Searching for Processes: mysqlprocgrep 637
Cloning Servers: mysqlserverclone 639
Showing Server Information: mysqlserverinfo 641
Cloning Users: mysqluserclone 642
Utilities Client: mysqluc 643
Replication Utilities 644
Setting Up Replication: mysqlreplicate 644
Checking Replication Setup: mysqlrplcheck 646
Showing Topologies: mysqlrplshow 648
High Availability Utilities 650
Concepts 650
mysqlrpladmin 651
mysqlfailover 655
Creating Your Own Utilities 663
Architecture of MySQL Utilities 663
Custom Utility Example 664
Conclusion 673
A Replication Tips and Tricks 675
B A GTID Implementation 693
Index 705
Foreword for the Second Edition
In 2011, Pinterest started growing. Some say we grew faster than any other startup to date. In the earliest days, we were up against a new scalability bottleneck every day that could slow down the site or bring it down altogether. We remember having our laptops with us everywhere. We slept with them, we ate with them, we went on vacation with them. We even named them. We have the sound of the SMS outage alerts imprinted in our brains.
When the infrastructure is constantly being pushed to its limits, you can’t help but wish for an easy way out. During our growth, we tried no fewer than five well-known database technologies that claimed to solve all our problems, but each failed catastrophically. Except MySQL. The time came around September 2011 to throw all the cards in the air and let them resettle. We re-architected everything around MySQL, Memcache, and Redis with just three engineers.
MySQL? Why MySQL? We laid out our biggest concerns with any technology and started asking the same questions for each. Here’s how MySQL shaped up:
• Does it address our storage needs? Yes, we needed mappings, indexes, sorting, and blob storage, all available in MySQL.
• Is it commonly used? Can you hire somebody for it? MySQL is one of the most common database choices in production today. It’s so easy to hire people who have used MySQL that we could walk outside in Palo Alto and yell out for a MySQL engineer and a few would come up. Not kidding.
• Is the community active? Very active. There are great books available and a strong online community.
• How robust is it to failure? Very robust! We’ve never lost any data even in the most dire of situations.
• How well does it scale? By itself, it does not scale beyond a single box. We’d need a sharding solution layered on top. (That’s a whole other discussion!)
• Will you be the biggest user? Nope, not by far. Bigger users included Facebook, Twitter, and Google. You don’t want to be the biggest user of a technology if you can help it. If you are, you’ll trip over new scalability problems that nobody has had a chance to debug yet.
• How mature is it? Maturity became the real differentiator. Maturity to us is a measure of the blood, sweat, and tears that have gone into a program divided by its complexity. MySQL is reasonably complex, but not nearly so compared to some of the magic autoclustering NoSQL solutions available. Additionally, MySQL has had 28 years of the best and the brightest contributing back to it from such companies as Facebook and Google, who use it at massive scale. Of all the technologies we looked at, by our definition of maturity, MySQL was a clear choice.
• Does it have good debugging tools? As a product matures, you naturally get great debugging and profiling tools since people are more likely to have been in a similar sticky situation. You’ll find yourself in trouble at 3 A.M. (multiple times). Being able to root cause an issue and get back to bed is better than rewriting for another technology by 6 A.M.
Based on our survey of 10 or so database technologies, MySQL was the clear choice. MySQL is great, but it kinda drops you off at your destination with no baggage and you have to fend for yourself. It works very well and you can connect to it, but as soon as you start using it and scaling, the questions start flying:
• My query is running slow, now what?
• Should I enable compression? How do I do it?
• What are ways of scaling beyond one box?
• How do I get replication working? How about master-master replication?
• REPLICATION STOPPED! NOW WHAT?!
• What are options for durability (fsync speeds)?
• How big should my buffers be?
• There are a billion fields in mysql.ini What are they? What should they be set to?
• I just accidentally wrote to my slave! How do I prevent that from happening again?
• How do I prevent running an UPDATE with no WHERE clause?
• What debugging and profiling tools should I be using?
• Should I use InnoDB, MyISAM, or one of several other flavors of storage engine?
The online community is helpful for answering specific questions, finding examples, bug fixes, and workarounds, but often lacks a strong cohesive story, and deeper discussions about architecture are few and far between. We knew how to use MySQL at small scale, but this scale and pace were insane. MySQL High Availability provided insights that allowed us to squeeze more out of MySQL.
One new feature in MySQL 5.6, Global Transaction Identifiers (GTIDs), adds a unique identifier to every transaction in a replication tree. This new feature makes failover and slave promotion far easier. We’ve been waiting for this for a long time and it’s well covered in this new edition.
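To illustrate why a per-transaction identifier simplifies promotion, here is a minimal Python sketch. It assumes the textual GTID set form used by MySQL 5.6 (`source_uuid:first-last`, with entries separated by commas); the function names and the short placeholder UUIDs are hypothetical and are not part of MySQL or of the library used in this book.

```python
from collections import defaultdict


def parse_gtid_set(text):
    """Parse a GTID set such as 'uuid1:1-5:7,uuid2:1-3'.

    Returns a dict mapping each source UUID to the set of
    transaction numbers executed from that source.
    """
    result = defaultdict(set)
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        for interval in intervals:
            first, _, last = interval.partition("-")
            last = last or first  # a single number is a one-element interval
            result[uuid.lower()].update(range(int(first), int(last) + 1))
    return dict(result)


def missing_transactions(candidate, other):
    """Return the transactions that `other` has executed but `candidate` has not."""
    missing = {}
    for uuid, txns in other.items():
        diff = txns - candidate.get(uuid, set())
        if diff:
            missing[uuid] = sorted(diff)
    return missing


if __name__ == "__main__":
    # Placeholder UUIDs; real ones are 36-character strings.
    candidate = parse_gtid_set("aaaa:1-5")
    other = parse_gtid_set("aaaa:1-7,bbbb:1-2")
    print(missing_transactions(candidate, other))
```

Because every transaction carries its own identifier, a promotion candidate can work out what it still lacks by comparing sets instead of translating binary log file names and positions between servers.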
During our grand re-architecture to a sharded solution, we referred to this book for architectural decisions, such as replication techniques and topologies, data sharding alternatives, monitoring options, tuning, and concerns in the cloud. It gave us a deeper understanding of how MySQL works underneath the hood, which allowed us to make better informed choices around the high-level queries, access patterns, and structures we’d be using, as well as iterate on our design afterward. The resulting MySQL architecture still serves Pinterest’s core data needs today.
—Yashwanth Nelapati and Marty Weiner
Pinterest
February 2014
Foreword for the First Edition
A lot of research has been done on replication, but most of the resulting concepts are never put into production. In contrast, MySQL replication is widely deployed but has never been adequately explained. This book changes that. Things are explained here that were previously limited to people willing to read a lot of source code and spend a lot of time (including a few late-night sessions) debugging it in production.
Replication enables you to provide highly available data services while enduring the inevitable failures. There are an amazing number of ways for things to fail, including the loss of a disk, server, or data center. Even when hardware is perfect or fully redundant, people are not. Database tables will be dropped by mistake. Applications will write incorrect data. Occasional failure is assured. But with reasonable preparation, recovery from failure can also be assured. The keys to survival are redundancy and backups. Replication in MySQL supports both.
But MySQL replication is not limited to supporting failure recovery. It is frequently used to support read scale-out. MySQL can efficiently replicate to a large number of servers. For applications that are read-mostly, this is a cost-effective strategy for supporting a large number of queries on commodity hardware.
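As a rough illustration of read scale-out, the sketch below routes writes to a single master and spreads reads across slaves in round-robin order. The class name, method name, and server labels are hypothetical, and the statement classification is deliberately naive; it is a sketch under those assumptions, not how any particular connector or load balancer works.

```python
import itertools


class ReadWriteRouter:
    """Pick a server for each statement: the master for writes, a slave for reads."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle through the slaves forever; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def server_for(self, statement):
        words = statement.split(None, 1)
        # Naive classification: only plain SELECT and SHOW count as reads.
        is_read = bool(words) and words[0].upper() in ("SELECT", "SHOW")
        if not is_read or self._slaves is None:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["slave1", "slave2"])
    for stmt in ("SELECT * FROM pins", "INSERT INTO pins VALUES (1)", "SELECT 1"):
        print(stmt, "->", router.server_for(stmt))
```

Because replication is asynchronous, a read sent to a slave may not yet see the most recent write on the master, so an application using a scheme like this has to tolerate slightly stale reads or send consistency-sensitive reads to the master.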
And there are other interesting uses for MySQL replication. Online data definition language (DDL) is a very complex feature to implement in a relational database management system. MySQL does not support online DDL, but through the use of replication, you can implement something that is frequently good enough. You can get a lot done with replication if you are willing to be creative.
Replication is one of the features that made MySQL wildly popular. It is also the feature that allows you to convert a popular MySQL prototype into a successful business-critical deployment. Like most of MySQL, replication favors simplicity and ease of use. As a consequence, it is occasionally less than perfect when running in production. This book explains what you need to know to successfully use MySQL replication. It will help you to understand how replication has been implemented, what can go wrong, how to prevent problems, and how to fix them when, despite your best attempts at prevention, they crop up.
MySQL replication is also a work in progress. Change, like failure, is also assured. MySQL is responding to that change, and replication continues to get more efficient, more robust, and more interesting. For instance, row-based replication is new in MySQL 5.1.
While MySQL deployments come in all shapes and sizes, I care most about data services for Internet applications and am excited about the potential to replicate from MySQL to distributed storage systems like HBase and Hadoop. This will make MySQL better at sharing the data center.
I have been on teams that support important MySQL deployments at Facebook and Google. I’ve encountered many of the problems covered in this book and have had the opportunity and time to learn solutions. The authors of this book are also experts on MySQL replication, and by reading this book you can share their expertise.
—Mark Callaghan
Preface
The authors of this book have been creating parts of MySQL and working with it for many years. Dr. Charles Bell is a senior developer leading the MySQL Utilities team. He has also worked on replication and backup. His interests include all things MySQL, database theory, software engineering, microcontrollers, and three-dimensional printing. Dr. Mats Kindahl is a principal senior software developer currently leading the MySQL High Availability and Scalability team. He is architect and implementor of several MySQL features. Dr. Lars Thalmann is the development director and technical lead of the MySQL Replication, Backup, Connectors, and Utilities teams, and has designed many of the replication and backup features. He has worked on the development of MySQL clustering, replication, and backup technologies.
We wrote this book to fill a gap we noticed among the many books on MySQL. There are many excellent books on MySQL, but few that concentrate on its advanced features and applications, such as high availability, reliability, and maintainability. In this book, you will find all of these topics and more.
We also wanted to make the reading a bit more interesting by including a running narrative about a MySQL professional who encounters common requests made by his boss. In the narrative, you will meet Joel Thomas, who recently decided to take a job working for a company that has just started using MySQL. You will observe Joel as he learns his way around MySQL and tackles some of the toughest problems facing MySQL professionals. We hope you find this aspect of the book entertaining.
Who This Book Is For
This book is for MySQL professionals. We expect you to have basic knowledge of SQL, MySQL administration, and the operating system you are running. We provide introductory information about replication, disaster recovery, system monitoring, and other key topics of high availability. See Chapter 1 for other books that offer useful background information.
How This Book Is Organized
This book is divided into two parts. Part I encompasses MySQL high availability and scale-out. Because these depend a great deal on replication, a lot of this part focuses on that topic. Part II examines monitoring and performance concerns for building robust data centers.
Part I, High Availability and Scalability
Chapter 1, Introduction, explains how this book can help you and gives you a context for reading it.
Chapter 2, MySQL Replicant Library, introduces a Python library for working with sets of servers that is used throughout the book.
Chapter 3, MySQL Replication Fundamentals, discusses both manual and automated procedures for setting up basic replication.
Chapter 4, The Binary Log, explains the critical file that ties together replication and helps in disaster recovery, troubleshooting, and other administrative tasks.
Chapter 5, Replication for High Availability, shows a number of ways to recover from server failure, including the use of automated scripts.
Chapter 6, MySQL Replication for Scale-Out, shows a number of techniques and topologies for improving the read scalability of large data sets.
Chapter 7, Data Sharding, shows techniques for handling very large databases and/or improving the write scalability of a database through sharding.
Chapter 8, Replication Deep Dive, addresses a number of topics, such as secure data transfer and row-based replication.
Chapter 9, MySQL Cluster, shows how to use this tool to achieve high availability.
Part II, Monitoring and Managing
Chapter 10, Getting Started with Monitoring, presents the main operating system parameters you have to be aware of, and tools for monitoring them.
Chapter 11, Monitoring MySQL, presents several tools for monitoring database activity and performance.
Chapter 12, Storage Engine Monitoring, explains some of the parameters you need to monitor on a more detailed level, focusing on issues specific to MyISAM or InnoDB.
Chapter 13, Replication Monitoring, offers details about how to keep track of what masters and slaves are doing.
Chapter 14, Replication Troubleshooting, shows how to deal with failures and restarts, corruption, and other incidents.
Chapter 15, Protecting Your Investment, explains the use of backups and disaster recovery techniques.
Chapter 16, MySQL Enterprise Monitor, introduces a suite of tools that simplifies many of the tasks presented in earlier chapters.
Chapter 17, Managing MySQL Replication with MySQL Utilities, introduces the MySQL Utilities, which are a new set of tools for managing MySQL Servers.
Conventions Used in This Book
The following typographical conventions are used in this book:
Constant width
Indicates command-line options, variables and other code elements, the contents of files, and the output from commands.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “MySQL High Availability, by Charles Bell, Mats Kindahl, and Lars Thalmann. Copyright 2014 Charles Bell, Mats Kindahl, and Lars Thalmann, 978-1-44933-958-6.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
Safari® Books Online
Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.
Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.
Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us
We also want to thank our extremely talented colleagues on the MySQL team and in the MySQL community who have provided comments, including Alfranio Correia, Andrei Elkin, Zhen-Xing He, Serge Kozlov, Sven Sandberg, Luis Soares, Rafal Somla, Li-Bing Song, Ingo Strüwing, Dao-Gang Qu, Giuseppe Maxia, and Narayanan Venkateswaran for their tireless dedication to making MySQL the robust and powerful tool it is today.
We especially would like to thank our MySQL customer support professionals, who help us bridge the gap between our customers’ needs and our own desires to improve the product. We would also like to thank the many community members who so selflessly devote time and effort to improve MySQL for everyone.
Finally, and most important, we would like to thank our editor, Andy Oram, who helped us shape this work, for putting up with our sometimes cerebral and sometimes over-the-top enthusiasm for all things MySQL. A most sincere thanks goes out to the entire O’Reilly team and especially our editor for their patience as we struggled to fit so many new topics into what was already a very large book.
Charles would like to thank his loving wife, Annette, for her patience and understanding when he was spending time away from family priorities to work on this book. Charles would also like to thank his many colleagues on the MySQL team at Oracle who contribute their wisdom freely to everyone on a daily basis. Finally, Charles would like to thank all of his brothers and sisters in Christ who both challenge and support him daily.
Mats would like to thank his wife, Lill, and two sons, Jon and Hannes, for their unconditional love and understanding in difficult times. You are the loves of his life and he cannot imagine a life without you. Mats would also like to thank his MySQL colleagues inside and outside Oracle for all the interesting, amusing, and inspiring times together; you are truly some of the sharpest minds in the trade.
Lars would like to thank his amazing girlfriend Claudia; he loves her beyond words. He would also like to thank all of his colleagues, current and past, who have made MySQL such an interesting place to work. In fact, it is not even a place. The distributed nature of the MySQL development team and the open-mindedness of its many dedicated developers are truly extraordinary. The MySQL community has a special spirit that makes working with MySQL an honorable task. What we have created together is remarkable. It is amazing that it started with such a small group of people and managed to build a product that services so many of the Fortune 500 companies today.
PART I
High Availability and Scalability
One of the key database features that supports both high availability and scalability in an application is replication. Replication is used to create redundancy in the database layer as well as to make copies of the database available for scaling the reads. Part I covers how you can use replication to ensure high availability and how you can scale your system.
Trang 31CHAPTER 1
Introduction
Joel looked through the classified ads for a new job. His current job was a good one, and the company had been very accommodating to him while he attended college. But it had been several years since he graduated, and he wanted to do more with his career.
“This looks promising,” he said, circling an advertisement for a computer science specialist working with MySQL. He had experience with MySQL and certainly met the academic requirements for the job. After reading through several other ads, he decided to call about the MySQL job. After a brief set of cursory questions, the human resources manager granted him an interview in two days’ time.
Two days and three interviews later, he was introduced to the company’s president and chief executive officer, Robert Summerson, for his final technical interview. He waited while Mr. Summerson paused during the questions and referred to his notes. So far, they were mostly mundane questions about information technology, but Joel knew the hard questions about MySQL were coming next.
Finally, the interviewer said, “I am impressed with your answers, Mr. Thomas. May I call you Joel?”
“Yes, sir,” Joel said as he endured another uncomfortable period while the interviewer read over his notes for the third time.
“Tell me what you know about MySQL,” Mr. Summerson said before placing his hands on his desk and giving Joel a very penetrating stare.
Joel began explaining what he knew about MySQL, tossing in a generous amount of the material he had read the night before. After about 10 minutes, he ran out of things to talk about.
Mr. Summerson waited a couple of minutes, then stood and offered Joel his hand. As Joel rose and shook Mr. Summerson’s hand, Summerson said, “That’s all I need to hear, Joel. The job is yours.”
“Thank you, sir.”
Mr. Summerson motioned for Joel to follow him out of his office. “I’ll take you back to the HR people so we can get you on the payroll. Can you start two weeks from Monday?”
Joel was elated and couldn’t help but smile. “Yes, sir.”
“Excellent.” Mr. Summerson shook Joel’s hand again and said, “I want you to come prepared to evaluate the configuration of our MySQL servers. I want a complete report on their configuration and health.”
Joel’s elation waned as he drove out of the parking lot. He didn’t go home right away. Instead, he drove to the nearest bookstore. “I’m going to need a good book on MySQL,”
• Provide plans for recovery of business-essential data in the event of a disaster. It is also likely that you will have to execute the procedure at least once.
• Provide plans for handling a large customer/user base and monitoring the load of each node in the site in order to optimize it.
• Plan for rapid scale-out in the event the user base grows rapidly.
For all these cases, it is critical to plan for the events in advance and be prepared to act quickly when necessary.
Because not all applications using big sets of servers are websites, we prefer to use the term deployment (rather than the term site or website) to refer to the server that you are using to support some kind of application. This could be a website, but could just as well be a customer relationship management (CRM) system or an online game. The book focuses on the database layer of such a system, but there are some examples that demonstrate how the application layer and the database layer integrate.
You need three things to keep a site responsive and available: backups of data, redundancy in the system, and responsiveness. The backups can restore a node to the state it was in before a crash, redundancy allows the site to continue to operate even if one or more of the nodes stops functioning, and the responsiveness makes the system usable in practice.
There are many ways to perform backups, and the method you choose will depend on your needs.[1] Do you need to recover to an exact point in time? In that case, you have to ensure that you have all that is necessary for performing a point-in-time recovery (PITR). Do you want to keep the servers up while making a backup? If so, you need to ensure that you are using some form of backup that does not disturb the running server, such as an online backup.
[1] You are not restricted to using a single backup method; you can just as well use a mix of different methods depending on your needs. For each case, however, you have to make a choice of the most appropriate method to do the backup.
Redundancy is handled by duplicating hardware, keeping several instances running in parallel, and using replication to keep multiple copies of the same data available on several machines. If one of the machines fails, it is possible to switch over to another machine that has a copy of the same data.
Together with replication, backup also plays an important role in scaling your system and adding new nodes when needed. If done right, it is even possible to automatically add new slaves at the press of a button, at least figuratively.
What’s This Replication Stuff, Anyway?
If you’re reading this book, you probably have a pretty good idea of what replication is about. It is nevertheless a good idea to review the concepts and ideas.
Replication is used to clone all changes made on a server, called the master server or just master, to another server, which is called the slave server or just slave. This is normally used to create a faithful copy of the master server, but replication can be used for other purposes as well.
The two most common uses of replication are to create a backup of the main server to avoid losing any data if the master crashes and to have a copy of the main server to perform reporting and analysis work without disturbing the rest of the business.
For a small business, this makes a lot of things simpler, but it is possible to do a lot more with replication, including the following:
Support several offices
It is possible to maintain servers at each location and replicate changes to the other offices so that the information is available everywhere. This may be necessary to protect data and also to satisfy legal requirements to keep information about the business available for auditing purposes.
Ensure the business stays operational even if one of the servers goes down
An extra server can be used to handle all the traffic if the original server goes down.
Ensure the business can operate even in the presence of a disaster
Replication can be used to send changes to an alternative data center at a different geographic location.
Protect against mistakes (“oopses”)
It is possible to create a delayed slave by connecting a slave to a master such that the slave is always a fixed period (for example, an hour) behind the master. If a mistake is made on the master, it is possible to find the offending statement and remove it before it is executed by the slave.
One of the two most important uses of replication in many modern applications is that of scaling out. Modern applications are typically very read-intensive; they have a high proportion of reads compared to writes. To reduce the load on the master, you can set up a slave with the sole purpose of answering read queries. By connecting a load balancer, it is possible to direct read queries to a suitable slave, while write queries go to the master.
When using replication in a scale-out scenario, it is important to understand that MySQL replication traditionally has been asynchronous[2] in the sense that transactions are committed at the master server first, then replicated to the slave and applied there. This means that the master and slave may not be consistent, and if replication is running continuously, the slave will lag behind the master.
[2] There is an extension called semisynchronous replication as well (see “Semisynchronous Replication” on page 257), but that is a relatively new addition. Until MySQL 5.7.2 DMR, it externalized the transaction before it was replicated, allowing it to be read before it had been replicated and acknowledged, requiring some care when being used for high availability.
The advantage of using asynchronous replication is that it is faster and scales better than synchronous replication, but in cases where it is important to have current data, the asynchrony must be handled to ensure the information is actually up-to-date.
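The read/write split described above can be sketched in a few lines of Python. This is only an illustration of the routing decision, not a real load balancer: the class name `ReadWriteRouter`, the `need_current_data` flag, and the server names are all hypothetical, and a statement is treated as a read simply because it starts with `SELECT`.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and reads to slaves in round-robin order.

    Hypothetical helper for illustration; servers are plain names here.
    """

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = itertools.cycle(self.slaves)

    def route(self, statement, need_current_data=False):
        """Return the name of the server that should run the statement."""
        # Naive classification: anything starting with SELECT is a read.
        is_read = statement.lstrip().upper().startswith("SELECT")
        # Slaves may lag behind the master, so a read that must see the
        # latest committed data is sent to the master as well.
        if is_read and self.slaves and not need_current_data:
            return next(self._next_slave)
        return self.master


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
print(router.route("SELECT * FROM orders"))
print(router.route("INSERT INTO orders VALUES (1)"))
print(router.route("SELECT balance FROM accounts", need_current_data=True))
```

Sending a read to the master when current data is required is the simplest way to handle the asynchrony; it trades away some of the scale-out benefit for that query.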
Scaling out reads is, however, not sufficient to scale all applications. With growing demands on larger databases and higher write load, it is necessary to scale more than just reads. Managing larger databases and improving performance of large database systems can be accomplished using techniques such as sharding. With sharding, the database is split into manageable chunks, allowing you to increase the size of the database by distributing it over as many servers as you need as well as scaling writes efficiently.
Another important application of replication is ensuring high availability by adding redundancy. The most common technique is to use a dual-master setup (i.e., using replication to keep a pair of masters available all the time, where each master mirrors the other). If one of the masters goes down, the other one is ready to take over immediately.
In addition to the dual-master setup, there are other techniques for achieving high availability that do not involve replication, such as using shared or replicated disks. Although they are not specifically tied to MySQL, these techniques are important tools for ensuring high availability.
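To make the idea of sharding mentioned above concrete, here is a minimal Python sketch of one possible way to map a row key to a shard. It assumes a simple hash-and-modulo scheme; the function name `shard_for` and the shard names are hypothetical, and real deployments often use other mappings.

```python
import zlib


def shard_for(key, shards):
    """Pick the shard that stores the row identified by key."""
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, so a key always maps to the same shard.
    checksum = zlib.crc32(str(key).encode("utf-8"))
    return shards[checksum % len(shards)]


shards = ["shard-0", "shard-1", "shard-2"]
for user_id in (17, 42, 1001):
    print(user_id, "->", shard_for(user_id, shards))
```

Because both reads and writes for a key go to a single shard, adding shards spreads the write load as well. One known drawback of plain modulo mapping is that changing the number of shards moves most keys to a different shard.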
So, Backups Are Not Needed Then?
A backup strategy is a critical component of keeping a system available. Regular backups of the servers provide safety against crashes and disasters, which, to some extent, can be handled by replication. Even when replication is used correctly and efficiently, however, there are some things that it cannot handle. You’ll need to have a working backup strategy for the following cases:
Protection against mistakes
If a mistake is discovered, potentially a long time after it actually occurred, replication will not help. In this case, it is necessary to roll back the system to a time before the mistake was introduced and fix the problem. This requires a working backup schedule.
Replication provides some protection against mistakes if you are using a time-delayed slave, but if the mistake is discovered after the delay period, the change will have already taken effect on the slave as well. So, in general, it is not possible to protect against mistakes using replication only; backups are required as well.
Creating new servers
When creating new servers (either slaves for scale-out purposes or new masters to act as standbys), it is necessary to make a backup of an existing server and restore that backup image on the new server. This requires a quick and efficient backup method to minimize the downtime and keep the load on the system at an acceptable level.
Legal reasons
In addition to pure business reasons for data preservation, you may have legal requirements to keep data safe, even in the event of a disaster. Not complying with these requirements can pose significant problems to operating the business.
In short, a backup strategy is necessary for operating the business, regardless of any other precautions you have in place to ensure that the data is safe.
What’s With All the Monitoring?
Even if you have replication set up correctly, it is necessary to understand the load on your system and to keep a keen eye on any problems that surface. As business requirements shift due to changing customer usage patterns, it is necessary to balance the system to use resources as efficiently as possible and to reduce the risk of losing availability due to sudden changes in resource utilization.
There are a number of different things that you can monitor, measure, and plan for to handle these types of changes. Here are some examples:
• You can add indexes to tables that are frequently read.
• You can rewrite queries or change the structure of databases to speed up execution time.
• If locks are held for a long time, it is an indication that several connections are using the same table. It might pay off to switch storage engines.
• If some of your scale-out slaves are hot (processing a disproportionate number of queries), the system might require some rebalancing to ensure that all the scale-out slaves are hit evenly.
• To handle sudden changes in resource usage, it is necessary to determine the normal load of each server and understand when the system will start to respond slowly because of a sudden increase in load.
Without monitoring, you have no way of spotting problematic queries, hot slaves, or improperly utilized tables.
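As a small illustration of spotting hot slaves, the following Python sketch flags any slave whose query count is well above the average for the group. The function name `find_hot_slaves`, the `tolerance` threshold, and the sample numbers are assumptions made for this example; in practice the counts might be sampled from a server status counter such as `Questions`.

```python
def find_hot_slaves(query_counts, tolerance=1.5):
    """Return the slaves whose query count exceeds tolerance times the mean.

    query_counts maps a slave name to the number of queries it processed
    during the sampling interval.
    """
    if not query_counts:
        return []
    mean = sum(query_counts.values()) / len(query_counts)
    return sorted(
        name for name, count in query_counts.items()
        if count > tolerance * mean
    )


# Made-up sample: slave-3 handles far more queries than its peers.
counts = {"slave-1": 1200, "slave-2": 1100, "slave-3": 4300}
print(find_hot_slaves(counts))
```

A check like this only tells you that the load is uneven; deciding how to rebalance still depends on why the slave is hot.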
Is There Anything Else I Can Read?
There is plenty of literature on using MySQL for various jobs, and also a lot of literature about high-availability systems. Here is a list of books that we strongly recommend if you are going to work with MySQL:
This is the reference to MySQL and consists of 1,200 pages (really!) packed with
everything you want to know about MySQL (and probably a lot that you don’t want
Written by one of the most prominent thinkers in the industry, this is a must foranybody working with systems of scale
The book uses a Python library developed by the authors (called the MySQL Python
on Launchpad
In the next chapter, we will start with the basics of setting up replication, so get a comfortable chair, open your computer, and we’ll get started.
Joel was adjusting his chair when a knock sounded from his door.
“Settling in, Joel?” Mr. Summerson asked.
Joel didn’t know what to say. He had been tasked to set up a replication slave on his first day on the job and while it took him longer than he had expected, he had yet to hear his boss’s feedback about the job. Joel spoke the first thing on his mind: “Yes, sir, I’m still trying to figure out this chair.”
“Nice job with the documentation, Joel. I’d like you to write a report explaining what you think we should do to improve our management of the database server.”
Joel nodded. “I can do that.”
“Good. I’ll give you another day to get your office in order. I expect the report by Wednesday, close of business.”
Before Joel could reply, Mr. Summerson walked away.
Joel sat down and flipped another lever on his chair. He heard a distinct click as the back gave way, forcing him to fling his arms wide. “Whoa!” He looked toward his door as he clumsily picked up his chair, thankful no one saw his impromptu gymnastics. “OK, that lever is now off limits,” he said.
CHAPTER 2
MySQL Replicant Library
Joel opened his handy text file full of common commands and tasks and copied them into another editor, changing the values for his current need. It was a series of commands involving a number of tools and utilities. “Ah, this is for the birds!” he thought. “There has got to be a better way.”
Frustrated, he flipped open his handy MySQL High Availability tome and examined the table of contents. “Aha! A chapter on a library of replication procedures. Now, this is what I need!”
Automating administrative procedures is critical to handling large deployments, so you might be asking, “Wouldn’t it be neat if we could automate the procedures in this book?” In many cases, you’ll be happy to hear that you can. This chapter introduces the MySQL Replicant library, a simple library written by the authors for managing replication. We describe the basic principles and classes, and will extend the library with new functionality in the coming chapters.
The code is available at Launchpad, where you can find more information and download the source code and documentation.
The Replicant library is based around the idea of creating a model of the connections between servers on a computer (any computer, such as your laptop), like the model in Figure 2-1. The library is designed so you can manage the connections by changing the model. For example, to reconnect a slave to another master, just reconnect the slave in the model, and the library will send the appropriate commands for doing the job.
Figure 2-1. A replication topology reflected in a model
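The idea of managing connections by changing a model can be sketched as follows. This is not the Replicant API: the class `ModelServer` and its methods are hypothetical stand-ins, and instead of talking to a real server the sketch records the statements that would be sent. The `CHANGE MASTER TO` statement is deliberately incomplete; it leaves out the user, password, and binary log position that a real reconnection needs.

```python
class ModelServer:
    """A server in an in-memory model of the topology (hypothetical)."""

    def __init__(self, name, host, port=3306):
        self.name = name
        self.host = host
        self.port = port
        self.master = None  # the server this one replicates from, if any
        self.sent = []      # statements that would be sent to the real server

    def execute(self, statement):
        # A real implementation would send the statement over a client
        # connection; this sketch only records it.
        self.sent.append(statement)

    def connect_to(self, new_master):
        """Repoint this slave at new_master, in the model and on the server."""
        self.execute("STOP SLAVE")
        self.execute(
            "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d"
            % (new_master.host, new_master.port)
        )
        self.execute("START SLAVE")
        self.master = new_master


old_master = ModelServer("master-1", "db1.example.com")
new_master = ModelServer("master-2", "db2.example.com")
slave = ModelServer("slave-1", "db3.example.com")

slave.connect_to(old_master)
slave.connect_to(new_master)  # reconnecting only requires changing the model
print(slave.master.name)
print(slave.sent[-3:])
```

The point of the design is that the caller states the desired topology, and the object representing the slave decides which commands achieve it.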
Besides the simple replication topology shown in Figure 2-1, two other basic topologies include tree topologies and dual masters (used for providing high availability). Topologies will be covered in more depth in Chapter 6.
To make the library useful on a wide variety of platforms and for a wide variety of deployments, it has been constructed with the following in mind:
• The servers are likely to run on a variety of operating systems, such as Windows, Linux, and flavors of Unix such as Solaris or Mac OS X. Procedures for starting and stopping servers, as well as the names of configuration files, differ depending on the operating system. The library should therefore support different operating systems and it should be possible to extend it with new operating systems that are not in the library.
• The deployment is likely to consist of servers running different versions of MySQL. For example, while you are upgrading a deployment to use new versions of the server, it will consist of a mixture of old and new versions. The library should be able to handle such a deployment.
• A deployment consists of servers with many different roles, so it should be possible to specify different roles for the servers. In addition, it should be possible to create new roles that weren’t anticipated at the beginning. Also, servers should be able to change roles.
• It is necessary to be able to execute SQL queries on each server. This functionality is needed for configuration as well as for extracting information necessary to manage the deployment. This support is also used by other parts of the system to implement their jobs (for example, to implement a slave promotion).