
DOCUMENT INFORMATION

Title: PostgreSQL 9.0 High Performance
Author: Gregory Smith
Publisher location: Birmingham - Mumbai
Subject: Database Performance Tuning
Type: Book
Year published: 2010
City: Birmingham
Pages: 468
File size: 11.93 MB

Contents



PostgreSQL 9.0 High Performance

Copyright © 2010 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2010


About the Author

Gregory Smith is a Principal Consultant for international database professional services firm 2ndQuadrant, and founder of the company's first United States office.

Writing about PostgreSQL represents his second foray into teaching database performance tuning. Greg wrote a small, free e-book titled Progress Performance FAQ in 1995, covering the basics of how to make the Progress 4GL and its associated database run faster. In 2001, he converted exclusively to using PostgreSQL 7.0 for projects, and has been watching the complexity of problems the database is capable of solving increase with every release since.

Greg has contributed feature additions to every PostgreSQL version since 8.3. He's also the creator of a growing set of add-on tools for the database, currently including pgtune, pgbench-tools, peg, and 2warm.

I was able to focus on the material in this book well enough to do it justice only through the support provided by Simon Riggs and the rest of the 2ndQuadrant staff around the world. The exposure to interesting problems to solve, and resources to solve them, has made working with 2ndQuadrant staff and clients a fertile source for PostgreSQL performance ideas over the last year.

The writing schedule pace needed to deliver a current book covering a major new database release just after it ships is grueling. I'd never have made it through so many weeks of working all seven days without the support of my family: Judy, Jerry, and Amanda.

Finally, the material in this book only exists because of the hundreds of contributors to the PostgreSQL project. And without the free sharing of ideas on mailing lists like pgsql-performance and pgsql-hackers the last few years, I'd never have been able to collect up such a wide survey of common performance issues. Whether it was having my own questions answered, or carefully considering how to answer someone else's, the interaction on those mailing lists has been vital to forming the ideas of this book.


About the Reviewers

Kevin Grittner has worked in the computer industry since 1972. While he has filled many roles during decades of consulting, working with databases has been a major focus, particularly in terms of optimization and providing frameworks for efficient application development against a database. In the mid 80s, he was the architect and primary author of the PROBER Database and Development Environment, which was never released commercially but enjoyed widespread use in certain vertical markets, such as fire departments, hospitals, and probation and parole agencies.

Jim Mlodgenski is Chief Architect at EnterpriseDB. He is one of EnterpriseDB's first employees, having joined the company in May 2005. Over several years, Jim has been responsible for key activities such as sales engineering, professional services, strategic technology solutions delivery, and customer education.

Prior to joining EnterpriseDB, Jim was a partner and architect at Fusion Technologies, a technology services company. For nearly a decade, Jim developed early designs and concepts for Fusion's consulting projects and specialized in Oracle application development, web development, and open source information architectures.

I want to thank my wonderful wife Stacie and awesome son Paul for supporting me.


Scott Marlowe has over 25 years of experience in software development, system administration, and database development. His first program was a gradebook program for the Air Force, and he's been hooked ever since. Scott works for Edline/Schoolfusion as a systems administrator and DBA.

I'd like to thank my two sons for being the greatest kids in the world, and my best friend Darren for all the expertise and knowledge we've shared in the last decade or so.


Table of Contents

Chapter 1: PostgreSQL Versions 7

Performance of historical PostgreSQL releases 8

Choosing a version to deploy 9
Upgrading to a newer major version 9

Upgrades to PostgreSQL 8.3+ from earlier ones 10

Additional PostgreSQL-related software 16

Chapter 2: Database Hardware 21


Performance impact of write-through caching 38

Chapter 3: Database Hardware Benchmarking 41

STREAM memory testing 42

Sources of slow memory and processors 45

Random access and I/Os Per Second 47
Sequential access and ZCAV 48

Complicated disk benchmarks 62

Disk performance expectations 65


General Linux filesystem tuning 79

Database directory tree 92

Disk arrays, RAID, and disk layout 94

Chapter 5: Memory for Database Caching 99

Memory units in the postgresql.conf 99
Increasing UNIX shared memory parameters for larger buffer sizes 100

Installing pg_buffercache into a database 105
Database disk layout 106


Creating a new block in a database 108
Writing dirty blocks to disk 109

Checkpoint processing basics 110
Write-ahead log and recovery processing 110

Database block lifecycle 113

Database buffer cache versus operating system cache 114

Starting size guidelines 116

Platform, version, and workload limitations 117

Inspection of the buffer cache queries 118

Using buffer cache inspection for sizing feedback 123

Chapter 6: Server Configuration Tuning 125

Defaults and reset values 126
Allowed change context 126
Reloading the configuration file 127


Chapter 7: Routine Maintenance 149

Transaction visibility with multiversion concurrency control 149

Visibility computation internals 149


Common vacuum and autovacuum problems 167

autovacuum is running even though it was turned off 167

Measuring index bloat 172

Basic PostgreSQL log setup 175

Chapter 8: Database Benchmarking 189

Query script definition 191
Configuring the database server for pgbench 193


Worker threads and pgbench program limitations 204

Transaction Processing Performance Council benchmarks 207

Chapter 9: Database Indexing 209

Measuring query disk and index block statistics 210

Simple index lookups 213

Lookup with an inefficient index 216

Switching from indexed to sequential scans 218

Clustering against an index 219
Explain with buffer counts 221


Expression-based indexes 229
Indexing for full-text search 230

Chapter 10: Query Optimization 233

Hot and cold cache behavior 237

Basic cost computation 240

Machine readable explain output 243

Subquery Scan and Subplan 258


Difficult areas to estimate 276

effective_cache_size 277

constraint_exclusion 279
cursor_tuple_fraction 279

Optimizing for fully cached data sets 281
Testing for query equivalence 281
Disabling optimizer features 282
Working around optimizer bugs 287
Avoiding plan restructuring with OFFSET 287
External trouble spots 290

Numbering rows in SQL 291
Using Window functions for numbering 292
Using Window functions for cumulatives 293


Virtual transactions 307
Decoding lock information 309
Transaction lock waits 312

Logging lock information 314

Buffer, background writer, and checkpoint activity 318

Saving pg_stat_bgwriter snapshots 319
Tuning using background writer statistics 322

Chapter 12: Monitoring and Trending 325

Enabling sysstat and its optional features 342

Windows System Monitor 344

Types of monitoring and trending software 346


Read scaling with replication queue software 369

Chapter 15: Partitioning Data 375

Determining a key field to partition over 376
Sizing the partitions 377


Tuning for bulk loads 401
Skipping WAL acceleration 402
Recreating indexes and adding constraints 402

Slow function and prepared statement execution 406
PL/pgSQL benchmarking 407
High foreign key overhead 408

Heavy statistics collector overhead 409


Aggressive PostgreSQL version upgrades 413


Preface

PostgreSQL has become an increasingly viable database platform to serve as storage for applications, from classic corporate database use to the latest web apps. But getting the best performance from it has not been an easy subject to learn. You need just the right combination of rules of thumb to get started, solid monitoring and maintenance to keep your system running well, suggestions for troubleshooting, and hints for add-on tools to add the features the core database doesn't try to handle on its own.

What this book covers

Chapter 1, PostgreSQL Versions introduces how PostgreSQL performance has improved in the most recent versions of the database. It makes a case for using the most recent version feasible, in contrast to the common presumption that newer versions of any software are buggier and slower than their predecessors.

Chapter 2, Database Hardware discusses how the main components in server hardware, including processors, memory, and disks, need to be carefully selected for reliable database storage and a balanced budget. In particular, accidentally using volatile write-back caching in disk controllers and drives can easily introduce database corruption.

Chapter 3, Database Hardware Benchmarking moves on to quantifying the different performance aspects of database hardware. Just how fast are the memory and raw drives in your system? Does performance scale properly as more drives are added?

Chapter 4, Disk Setup looks at popular filesystem choices and suggests the trade-offs of various ways to lay out your database on disk. Some common, effective filesystem tuning tweaks are also discussed.


Chapter 5, Memory for Database Caching digs into how the database is stored on disk and in memory, and how the checkpoint process serves to reconcile the two safely. It also suggests how you can actually look at the data being cached by the database, to confirm whether what's being stored in memory matches what you'd expect to be there.

Chapter 6, Server Configuration Tuning covers the most important settings in the postgresql.conf file, what they mean, and how you should set them. The settings you can cause trouble by changing are pointed out, too.

Chapter 7, Routine Maintenance starts by explaining how PostgreSQL determines which rows are visible to which clients. The way visibility information is stored requires a cleanup process named VACUUM to reuse leftover space properly. Common issues and general tuning suggestions for it and the always-running autovacuum are covered. Finally, there's a look at adjusting the amount of data logged by the database, and using a query log analyzer on the result to help find query bottlenecks.

Chapter 8, Database Benchmarking investigates how to get useful benchmark results from the built-in pgbench testing program included with PostgreSQL.

Chapter 9, Database Indexing introduces indexes in terms of how they can reduce the number of data blocks read to answer a query. That approach allows for thoroughly investigating common questions, like why a query is using a sequential scan instead of an index, in a robust way.

Chapter 10, Query Optimization is a guided tour of the PostgreSQL optimizer, exposed by showing the way sample queries are executed differently based on what they are asking for and how the database parameters are set.

Chapter 11, Database Activity and Statistics looks at the statistics collected inside the database, and which of them are useful to find problems. The views that let you watch query activity and locking behavior are also explored.

Chapter 12, Monitoring and Trending starts with how to use basic operating system monitoring tools to determine what the database is doing. Then it moves on to suggestions for trending software that can be used to graph this information over time.

Chapter 13, Pooling and Caching explains the difficulties you can encounter when large numbers of connections are made to the database at once. Two types of software packages are suggested to help: connection poolers, to better queue incoming requests, and caches that can answer user requests without connecting to the database.


Chapter 14, Scaling with Replication covers approaches for handling heavier system loads by replicating the data across multiple nodes, typically a set of read-only nodes synchronized to a single writeable master.

Chapter 15, Partitioning Data explores how data might be partitioned into subsets usefully, such that queries can execute against a smaller portion of the database. Approaches discussed include the standard single-node database table partitioning, and using PL/Proxy with its associated toolset to build sharded databases across multiple nodes.

Chapter 16, Avoiding Common Problems discusses parts of PostgreSQL that regularly seem to frustrate newcomers to the database. Bulk loading, counting records, and foreign key handling are examples. This chapter ends with a detailed review of what performance-related features changed between each version of PostgreSQL from 8.1 to 9.0. Sometimes, the best way to avoid a common problem is to upgrade to a version where it doesn't happen anymore.

What you need for this book

In order for this book to be useful, you need at least access to a PostgreSQL client that is allowed to execute queries on a server. Ideally, you'll also be the server administrator. Full client and server packages for PostgreSQL are available for most popular operating systems at http://www.postgresql.org/download/.

All of the examples here are executed at a command prompt, usually running the psql program. This makes them applicable to most platforms. It's straightforward to do many of these operations instead using a GUI tool for PostgreSQL, such as the pgAdmin III program.

There are some scripts provided that are written in the bash scripting language. If you're on Windows, the cygwin software suite available from http://www.cygwin.com/ provides a way to get common UNIX tools such as bash onto your system.

Who this book is for

This book is aimed at intermediate to advanced database administrators using or planning to use PostgreSQL. Portions will also interest systems administrators looking to build or monitor a PostgreSQL installation, as well as developers interested in advanced database internals that impact application design.

Trang 25

Code words in text are shown as follows: "If you are sorting data, work_mem determines when those sorts are allowed to execute in memory."

A block of code is set as follows:

time sh -c "dd if=/dev/zero of=bigfile bs=8k count=blocks && sync"

time dd if=bigfile of=/dev/null bs=8k

Any command-line input or output is written as follows:

$ psql -e -f indextest.sql > indextest.out

New terms and important words are shown in bold.

Warnings or important notes appear in a box like this.

Tips and tricks appear like this.


To send us general feedback, simply send an e-mail to feedback@packtpub.com, and mention the book title via the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the SUGGEST A TITLE form on www.packtpub.com or e-mail suggest@packtpub.com.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.


Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.


PostgreSQL Versions

PostgreSQL certainly has a reputation. It's known for having a rich feature set and very stable software releases. The secure stance which its default configuration takes is simultaneously praised by security fans and criticized for its learning curve. The SQL-specification conformance and data integrity features allow only the strictest ways to interact with the database, which is surprising to those who come from a background working with looser desktop database software. All of these points have an element of truth to them.

Another part of PostgreSQL's reputation is that it's slow. This too has some truth to it, even today. There are many database operations where "the right thing" takes longer to do than the alternative. As the simplest example of this, consider the date "February 29, 2009". With no leap year in 2009, that date is only valid as an abstract one. It's not possible for this to be the real date of something that happened. If you ask the database to store this value into a standard date field, it can just do that, the fast approach. Alternatively, it can check whether that date is valid to store into the destination field, note that there is no such date in a regular calendar, and reject your change. That's always going to be slower. PostgreSQL is designed by and for the sort of people who don't like cutting corners just to make things faster or easier, and in cases where the only way you can properly handle something takes a while, that may be the only option available.
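You can see that validation happen from psql; a hypothetical session (assuming a running server you can connect to) might look like this:

```
$ psql -c "SELECT '2009-02-29'::date"
ERROR:  date/time field value out of range: "2009-02-29"
```

That check runs on every date input, which is part of the cost of doing the right thing.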

However, once you have a correct implementation of something, you can then go back and optimize it That's the mode PostgreSQL has been in for the last few years PostgreSQL usually rises above these smaller issues to give excellent database performance Parts of it have the sort of great design that outperforms simpler approaches, even after paying the overhead that complexity can introduce This is

a fairly recent phenomenon though, which explains quite a bit about the perception that PostgreSQL is a slower database than its competitors


a major improvement in the ability of the database to scale upwards to handle a heavy load. Benchmarks on modern hardware really highlight just how far that version leapfrogged earlier ones. You can find an excellent performance comparison of versions 8.0 through 8.4 from György Vilmos at http://suckit.blog.hu/2009/09/29/postgresql_history. This shows exactly how dramatic these improvements have been. These tests use the Online Transaction Processing (OLTP) test of the sysbench benchmarking software, available at http://sysbench.sourceforge.net/.

This test gives a transactions per second (TPS) figure that measures the total system speed, and you can run it in either a read-only mode or one that includes writes. The read-only performance improved by over four times from 8.0 to 8.1, and more than doubled again by 8.3:

[Table: Version / Peak Read-Only TPS / # of clients at peak — figures not preserved in this copy]

[Table: Version / Peak Write TPS / # of clients at peak — figures not preserved in this copy]


Chapter 10, Query Optimization.

These improvements have been confirmed by other benchmarking results, albeit normally not covering such a wide range of versions. It's easy to see that any conclusion about PostgreSQL performance reached before late 2005, when 8.1 shipped, is completely out of date at this point. The speed improvement in 2008's 8.3 release was an additional large leap. Versions before 8.3 are not representative of the current performance, and there are other reasons to prefer using that one or a later one too.

Choosing a version to deploy

Because of these dramatic gains, if you have an older PostgreSQL system you'd like to make faster, the very first thing you should ask yourself is not how to tweak its settings, but instead, if it's possible to upgrade to a newer version. If you're starting a new project, 8.3 is the earliest version you should consider. In addition to the performance improvements, there were some changes to that version that impact application coding that you'd be better off to start with, to avoid needing to retrofit later.

Chapter 16, Avoiding Common Problems includes a reference guide to what performance-related features were added to each major version of PostgreSQL, from 8.1 through 9.0. You might discover that one of the features only available in a very recent version is compelling to you, and therefore have a strong preference to use that one. Many of these version-specific changes are also highlighted throughout the book.

Upgrading to a newer major version

Until very recently, the only way to upgrade an existing PostgreSQL version to a newer major version, such as going from 8.1.X to 8.2.X, was to dump and reload. The pg_dump and/or pg_dumpall programs are used to write the entire contents of the database to a file, using the newer versions of those programs. That way, if any changes need to be made to upgrade, the newer dumping program can try to handle them. Not all upgrade changes will happen automatically though. Then, depending on the format you dumped in, you can either restore that just by running the script it generates, or use the pg_restore program to handle that task. pg_restore can be a much better alternative in newer PostgreSQL versions that include a version with parallel restore capabilities.
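As a sketch of that procedure, with the old server on port 5432 and the new one on port 5433 (the ports, file names, and database name here are placeholders, not from the book):

```
# Dump using the NEWER version's client programs, restore into the new server.
$ pg_dumpall -p 5432 > everything.sql
$ psql -p 5433 -f everything.sql postgres

# Or, per database, using the custom format so pg_restore can run in parallel:
$ pg_dump -Fc -p 5432 mydb > mydb.dump
$ pg_restore -j 4 -p 5433 -d mydb mydb.dump
```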


Dumping can take a while, and restoring can take even longer. While this is going on, your database likely needs to be down, so that you don't allow any changes that won't then be migrated over by the dump. For large databases, this downtime can be both large and unacceptable.

The most demanding sites prefer near-zero downtime, to run 24/7. There, a dump and reload is never an acceptable option. Until recently, the only real approach available for doing PostgreSQL upgrades in those environments has been using statement replication to do so. Slony is the most popular tool for that, and more information about it is in Chapter 14, Scaling with Replication. One of Slony's features is that you don't have to be running the same version of PostgreSQL on all the nodes you are replicating to. You can bring up a new node running a newer PostgreSQL version, wait for replication to complete, and then switch over once it matches the original.

Now, there is another way available that works without needing any replication software. A program originally called pg_migrator, at http://pgfoundry.org/projects/pg-migrator/, is capable of upgrading from 8.3 to 8.4 without the dump and reload. This process is called in-place upgrading. You need to test this carefully, and there are both known limitations and likely still unknown ones related to less popular PostgreSQL features. Be sure to read the documentation of the upgrade tool very carefully. Starting in PostgreSQL 9.0, this module is included with the core database, with the name changed to pg_upgrade. While all in-place upgrades have some risk and need careful testing, in many cases, pg_upgrade will take you from 8.3 or 8.4 to 9.0, and hopefully beyond.
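A pg_upgrade run might be sketched as follows; every directory here is a placeholder, and the tool's documentation for your exact versions is the authority:

```
# Hypothetical in-place upgrade from 8.4 to 9.0; all paths are examples.
$ pg_upgrade \
    -b /usr/lib/postgresql/8.4/bin  -B /usr/lib/postgresql/9.0/bin \
    -d /var/lib/postgresql/8.4/data -D /var/lib/postgresql/9.0/data
```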

The PostgreSQL development community is now committed to allowing in-place upgrades to future versions. Now that terabyte and larger PostgreSQL installs are common, upgrading only using dump and reload just isn't always practical.

Upgrades to PostgreSQL 8.3+ from earlier ones

The major internal changes of 8.3 make it impossible to upgrade from any earlier version past it without dumping the entire database and reloading it into the later one This makes 8.3 a doubly important version milestone to cross Not only is it much faster than 8.2, once your data is in 8.3, you can perform in-place upgrades from there


version will need to be updated to work against 8.3 or later. It is possible to work around this issue by manually adding back the automatic typecasting features that were removed; http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html provides a sample. However, fixing the behavior in your application instead is a more robust and sustainable solution to the problem. The old behavior was eliminated because it caused subtle application issues. If you just add it back, you'll both be exposed to those and need to continue doing this extra cast addition step with every new PostgreSQL release. There is more information available at http://blog.endpoint.com/2010/01/postgres-upgrades-ten-problems-and.html on this topic, and on the general challenges of doing a major PostgreSQL upgrade.

Minor version upgrades

A dump/reload, or the use of tools like pg_upgrade, is not needed for minor version updates, for example, going from 8.4.1 to 8.4.2. These simply require stopping the server, installing the new version, and then running the newer database binary against the existing server data files. Some people avoid ever doing such upgrades once their application is running, for fear that a change in the database will cause a problem. This should never be the case for PostgreSQL. The policy of the PostgreSQL project, described at http://www.postgresql.org/support/versioning, states very clearly:

While upgrades always have some risk, PostgreSQL minor releases fix only frequently-encountered security and data corruption bugs to reduce the risk of upgrading. The community considers not upgrading to be riskier than upgrading.

You should never find an unexpected change that breaks an application in a minor PostgreSQL upgrade. Bug, security, and corruption fixes are always done in a way that minimizes the odds of introducing an externally visible behavior change, and if that's not possible, the reason why and the suggested workarounds will be detailed in the release notes. What you will find is that some subtle problems, resulting from resolved bugs, can clear up even after a minor version update. It's not uncommon to discover that a report of a problem to one of the PostgreSQL mailing lists is resolved in the latest minor version update compatible with that installation, and upgrading to that version is all that's needed to make the issue go away.
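The stop/install/restart sequence for a minor update can be sketched as follows; the package manager and data directory are examples that vary by platform and packaging:

```
# Hypothetical minor version update on a packaged Linux system.
$ pg_ctl -D /var/lib/postgresql/data stop
$ yum update postgresql-server          # install the new minor release binaries
$ pg_ctl -D /var/lib/postgresql/data start
```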


PostgreSQL or another database?

There are certainly situations where other database solutions will perform better. For example, PostgreSQL is missing features needed to perform well on some of the more difficult queries in the TPC-H test suite (see Chapter 8, Database Benchmarking for more details). It's correspondingly less suitable for running large data warehouse applications than many of the commercial databases. If you need queries like some of the very heavy ones TPC-H includes, you may find that databases such as Oracle, DB2, and SQL Server still have a performance advantage worth paying for. There are also several PostgreSQL-derived databases that include features making them more appropriate for data warehouses and similar larger systems. Examples include Greenplum, Aster Data, and Netezza.

For some types of web applications, you can only get acceptable performance by cutting corners on the data integrity features in ways that PostgreSQL just won't allow. These applications might be better served by a less strict database, such as MySQL, or even a really minimal one, like SQLite. Unlike the fairly mature data warehouse market, the design of this type of application is still moving around quite a bit. Work on approaches using the key-value-based NoSQL approach, including CouchDB, MongoDB, and Cassandra, is becoming more popular at the time of writing this. All of them can easily outperform a traditional database, if you have no need to run the sort of advanced queries that key/value stores are slower at handling.

But for many "normal" database use cases, in the middle ground between those two extremes, PostgreSQL performance in 8.3 reached a point where it's more likely you'll run into the limitations of your hardware or application design before the database is your limiting factor. Moreover, some of PostgreSQL's traditional strengths, like its ability to handle complicated queries well and its heavy programmability, are all still there.

associated utilities that can only be developed as a part of the database itself. When new features are proposed, if it's possible for them to be built and distributed "out of core", this is the preferred way to do things. This approach keeps the database core as streamlined as possible, as well as allowing those external projects to release their own updates without needing to synchronize them against the main database's release schedule.


Successful PostgreSQL deployments should recognize that a number of additional tools, each with their own specialized purpose, will need to be integrated with the database core server to build a complete system.

PostgreSQL contrib

One part of the PostgreSQL core that you may not necessarily have installed is what's called the contrib modules (named after the contrib directory they are stored in). These are optional utilities shipped with the standard package, but they aren't necessarily installed by default on your system. The contrib code is maintained and distributed as part of the PostgreSQL core, but not required for the server to operate.

From a code quality perspective, the contrib modules aren't held to quite as high a standard, primarily by how they're tested. The main server includes heavy regression tests for every feature, run across a large build farm of systems that look for errors. The optional contrib modules don't get that same level of testing coverage. However, the code itself is maintained by the same development team, and some of the modules are extremely popular and well tested by users.

A list of all the contrib modules available is at http://www.postgresql.org/docs/current/static/contrib.html

Finding contrib modules on your system

One good way to check if you have contrib modules installed is to see if the pgbench program is available. That's one of the few contrib components that installs a full program, rather than just the scripts you can use. Here's a UNIX example of checking for pgbench:

$ pgbench -V

pgbench (PostgreSQL) 9.0

If you're using an RPM or DEB packaged version of PostgreSQL, as would be the case on many Linux systems, the optional postgresql-contrib package contains all of the contrib modules and their associated installer scripts. You may have to add that package using yum, apt-get, or a similar mechanism if it wasn't installed already. On Solaris, the package is named SUNWpostgr-contrib.
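As an illustration, adding the package is a one-line operation on systems using those package managers (the package names shown are the common defaults; they can differ between distributions and PostgreSQL versions):

```shell
# RHEL/CentOS and other RPM-based systems
yum install postgresql-contrib

# Debian and Ubuntu systems
apt-get install postgresql-contrib
```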

PostgreSQL Versions

If you're not sure where your system PostgreSQL contrib modules are installed, you can use a filesystem utility to search; locate works well for this purpose on many UNIX-like systems, as does the find command. The file search utilities, available on the Windows Start menu, will work. A sample file you could look for is pg_buffercache.sql, which will be used in the upcoming chapter on memory allocation. Here's where that might be on some of the platforms that PostgreSQL supports:


- RHEL and CentOS Linux systems will put the main file you need into /usr/share/pgsql/contrib/pg_buffercache.sql
- Debian or Ubuntu Linux systems will install the file at /usr/share/postgresql/version/contrib/pg_buffercache.sql
- Solaris installs it into /usr/share/pgsql/contrib/pg_buffercache.sql
- The standard Windows one-click installer with the default options will always include the contrib modules, and this one will be in C:\Program Files\PostgreSQL\version\share\contrib\pg_buffercache.sql
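A sketch of such a search on a UNIX-like system, using the utilities just mentioned (the path searched here is only an example; adjust it for your platform):

```shell
# Fast lookup from the locate database, if it's installed and current
locate pg_buffercache.sql

# Direct search of a likely installation area, ignoring permission errors
find /usr -name pg_buffercache.sql 2>/dev/null
```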

Installing a contrib module from source

Building your own PostgreSQL from source code can be a straightforward exercise on some platforms, if you have the appropriate requirements already installed on the server. Details are documented at http://www.postgresql.org/docs/current/static/install-procedure.html

After building the main server code, you'll also need to compile contrib modules like pg_buffercache yourself. Here's an example of how that would work, presuming that your PostgreSQL destination is /usr/local/postgresql and that there's a directory there named source you put the source code into (this is not intended to be a typical or recommended structure you should use):
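A minimal sketch of those steps, under the /usr/local/postgresql/source layout assumed above, might look like the following:

```shell
# Move into the module's directory within the source tree
cd /usr/local/postgresql/source/contrib/pg_buffercache
# Compile the module against the server built from this tree
make
# Copy the compiled library and installer script into the server's directories
make install
```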


It's also possible to build and install all the contrib modules at once by running make followed by make install from the contrib directory. Note that some of these have more extensive source code build requirements; the uuid-ossp module is an example of a more challenging one to compile yourself.
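Under the same assumed source layout, building everything at once is just:

```shell
cd /usr/local/postgresql/source/contrib
make
make install
```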

Using a contrib module

While some contrib programs like pgbench are directly executable, most are utilities that you install into a database in order to add extra features to it.

As an example, to install the pg_buffercache module into a database named abc, the following command line would work (assuming the RedHat location of the file):

$ psql -d abc -f /usr/share/pgsql/contrib/pg_buffercache.sql

You could use the pgAdmin III GUI management utility, which is bundled with the Windows installer for PostgreSQL, instead of the command line:

1. Navigate to the database you want to install the module into.
2. Click on the SQL icon in the toolbar to bring up the command editor.
3. Choose File/Open. Navigate to C:\Program Files\PostgreSQL\version\share\contrib\pg_buffercache.sql and open that file.
4. Execute using either the green arrow or Query/Execute.

You can do a quick test of the module installed on any type of system by running the following quick query:

SELECT * FROM pg_buffercache;

If any results come back, the module was installed. Note that pg_buffercache will only be installable and usable by database superusers.

Projects hosted on pgFoundry include:

- Windows software allowing access to PostgreSQL through .Net and OLE
- Connection poolers like pgpool and pgBouncer
- Database management utilities like pgFouine, SkyTools, and pgtune


While sometimes maintained by the same people who work on the PostgreSQL core, pgFoundry code varies significantly in quality. One way to help spot the healthier projects is to note how regularly and recently new versions have been released.

Additional PostgreSQL-related software

Beyond what comes with the PostgreSQL core, the contrib modules, and software available on pgFoundry, there are plenty of other programs that will make PostgreSQL easier and more powerful. These are available from sources all over the Internet. There are actually so many available that choosing the right package for a requirement can itself be overwhelming.

Some of the best programs will be highlighted throughout the book, to help provide a short list of the ones you should consider early. This approach, where you get a basic system running and then add additional components as needed, is the standard way large open-source projects are built.

It can be difficult for some corporate cultures to adapt to that style, such as the ones where any software installation requires everything from approval to a QA cycle. In order to improve the odds of your PostgreSQL installation being successful in such environments, it's important to introduce this concept early on. Additional programs that add components building on the intentionally slim database core will be needed later, and not all of what's needed will be obvious at the beginning.

PostgreSQL application scaling lifecycle

While every application has unique growth aspects, there are many common techniques that you'll find necessary as an application using a PostgreSQL database becomes used more heavily. The chapters of this book each focus on one of the common aspects of this process. The general path that database servers follow includes:

1. Select hardware to run the server on. Ideally, you'll test that hardware to make sure it performs as expected too.
2. Set up all the parts of database disk layout: RAID level, filesystem, and possibly table/index layout on disk.
3. Optimize the server configuration.
4. Monitor server performance and how well queries are executing.
5. Improve queries to execute more efficiently, or add indexes to help accelerate them.


6. As it gets more difficult to just tune the server to do more work, instead reduce the amount it has to worry about by introducing connection pooling and caching.
7. Replicate the data onto multiple servers and distribute reads among them.
8. Partition larger tables into sections. Eventually, really large ones may need to be split so that they're written to multiple servers simultaneously.

This process is by no means linear. You can expect to make multiple passes over optimizing the server parameters. It may be the case that you decide to buy newer hardware first, rather than launching into replication or partitioning work that requires application redesign. Some designs might integrate caching from the very beginning. The important thing is to be aware of the various options available and to collect enough data about what limits the system is reaching to decide which of the potential changes is most likely to help.

Performance tuning as a practice

Work on improving database performance has its own terminology, just like any other field Here are some terms or phrases that will be used throughout the book:

- Bottleneck or limiting factor: Both of these terms will be used to refer to the current limitation that is keeping the performance from getting better.
- Benchmarking: Running a test to determine how fast a particular operation can run. This is often done to figure out where the bottleneck of a program or system is.
- Profiling: Monitoring what parts of a program are using the most resources when running a difficult operation such as a benchmark. This is typically done to help prove where the bottleneck is, and whether it's been removed as expected after a change. Profiling a database application usually starts with monitoring tools such as vmstat and iostat. Popular profiling tools at the code level include gprof, oprofile, and dtrace.
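For example, a first monitoring pass with those operating system tools might look like this on Linux (the flags shown are common but not universal; iostat in particular may require installing a sysstat package):

```shell
# System-wide CPU, memory, and swap activity: one sample per second, five samples
vmstat 1 5

# Extended per-device disk utilization statistics at the same interval
iostat -x 1 5
```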

One of the interesting principles of performance tuning work is that, in general, you cannot figure out what the next bottleneck an application will run into is until you remove the current one. When presented with a system that's not as fast as someone would expect it to be, you'll often see people guessing what the current bottleneck is, or what the next one will be. That's generally a waste of time. You're always better off measuring performance, profiling the parts of the system that are slow, and using that to guess at causes and guide changes.

Trang 39

PostgreSQL Versions

[ 18 ]

Let's say what you've looked at suggests that you should significantly increase shared_buffers, the primary tunable for memory used to cache database reads and writes. This normally has some positive impact, but there are potential negative things you could encounter instead. The information needed to figure out which category a new application will fall into, whether this change will increase or decrease performance, cannot be predicted from watching the server running with the smaller setting. This falls into the category of chaos theory: even a tiny change in the starting conditions can end up rippling out to a very different end condition, as the server makes millions of decisions and they can be impacted to a small degree by that change. Similarly, if shared_buffers is set too small, there are several other parameters that won't work as expected at all, such as those governing database checkpoints.
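For reference, checking the active value before experimenting with a change like that is a one-liner (the database name abc matches the earlier installation example; any database on the server will do):

```shell
psql -d abc -c "SHOW shared_buffers;"
```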

Since you can't predict what's going to happen most of the time, the mindset you need to adopt is one of heavy monitoring and change control. Monitor as much as possible—from application to database server to hardware. Introduce a small targeted change. Try to quantify what's different, and be aware that some changes you have rejected as not positive won't always stay that way forever. Move the bottleneck to somewhere else, and you may discover that some parameter that didn't matter before is now suddenly the next limiting factor.

There's a popular expression on the mailing list devoted to PostgreSQL performance, used when people speculate about root causes without doing profiling to prove their theories: "less talk, more gprof". While gprof may not be the tool of choice for every performance issue, given it's more of a code profiling tool than a general monitoring one, the idea that you should measure as much as possible before speculating as to the root causes is always a sound one. You should also measure again afterward, to validate that your change did what you expected.

Another principle that you'll find is a recurring theme of this book is that you must be systematic about investigating performance issues. Do not assume your server is fast because you bought it from a reputable vendor; benchmark the individual components yourself. Don't start your database performance testing with application-level tests; run synthetic database performance tests that you can compare against other people's first. That way, when you run into the inevitable application slowdown, you'll already know your hardware is operating as expected and that the database itself is running well. Once your system goes into production, some of the basic things you might need to do in order to find a performance problem, such as testing hardware speed, become impossible, because you can't take the system down to run them.


You'll be in much better shape if every server you deploy is tested with a common methodology, which is exactly what later chapters here lead you through. Just because you're not a "hardware guy", it doesn't mean you should skip over the parts here that cover things like testing your disk performance. You need to perform work like that as often as possible when exposed to new systems—that's the only way to get a basic feel for whether something is operating within the standard range of behavior or if instead there's something wrong.

Summary

PostgreSQL has come a long way in the last five years. After building solid database fundamentals, the many developers adding features across the globe have made significant strides in adding both new features and performance improvements in recent releases. The features added to the latest PostgreSQL 9.0, making replication and read scaling easier than ever before, are expected to further accelerate the types of applications the database is appropriate for.

- The extensive performance improvements in PostgreSQL 8.1 and 8.3 in particular shatter some earlier notions that the database server was slower than its main competitors.
- There are still some situations where PostgreSQL's feature set results in slower query processing than some of the commercial databases it might otherwise displace.
- If you're starting a new project using PostgreSQL, use the latest version possible (and strongly prefer to deploy 8.3 or later).
- PostgreSQL works well in many common database applications, but there are certainly applications it's not the best choice for.
- Not everything you need to manage and optimize a PostgreSQL server will be included in a basic install. Be prepared to include some additional number of utilities that add features outside of what the core database aims to provide.
- Performance tuning is best approached as a systematic, carefully measured practice.
