PostgreSQL: The comprehensive guide to building, programming, and administering PostgreSQL databases, Second Edition
By Korry Douglas, Susan Douglas
Publisher: Sams Publishing
Pub Date: July 26, 2005
ISBN: 0-672-32756-2
Pages: 1032
The second edition of the best-selling PostgreSQL has been updated to completely cover the new features and capabilities of the 8.0 version of PostgreSQL. You will be led through the internals of the powerful PostgreSQL open-source database with an easy-to-read, code-based approach that makes it easy to understand how each feature is implemented, how best to use each feature, and how to get more performance from database applications. This definitive guide to building, programming, and administering the PostgreSQL open-source database system will help you harness one of the most widely used open-source, enterprise-level database systems.
Copyright © 2006 by Sams Publishing
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.
regarded as affecting the validity of any trademark or service mark.
Warning and Disclaimer
as accurate as possible, but no warranty or fitness is implied. The information provided is on an "as is" basis. The authors and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
Bulk Sales
Sams Publishing offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For
These days, it seems that most discussion of open-source software centers around the idea that you should not have to tie your future to the whim of some giant corporation. People say that open-source software is better than proprietary software because it is developed and maintained by the users instead of a faceless company out to lighten your wallet.
I think that the real value in free software is education. I have never learned anything by reading my own code[1]. On the other hand, it's a rare occasion when I've looked at code written by someone else and haven't come away with another tool in my toolkit. People don't think alike. I don't mean that people disagree with each other; I mean that people solve problems in different ways. Each person brings a unique set of experiences to the table. Each person has his own set of goals and biases. Each person has his own interests. All of these things will shape the way you think about a problem. Often, I'll find myself in a heated disagreement with a colleague only to realize that we are each correct in our approach. Just because I'm right doesn't mean that my colleague can't be right as well.
[1] Maybe I should say that I have never learned anything new by reading my own code. I've certainly looked at code that I've written and wondered what I was thinking at the time, learning that I'm not nearly as clever as I had remembered. Oddly enough, those who have read my code have reached a similar conclusion.
Open-source software is a great way to learn. You can learn about programming. You can learn about design. You can learn about debugging. Sometimes, you'll learn how not to design, code, or debug; but that's a valuable lesson, too. You can learn small things, like how to cache file descriptors on systems where file descriptors are a scarce and expensive resource, or how to use the select() function to implement fine-grained timers. You can learn big things, like how a query optimizer works or how to write a parser, or how to develop a good
PostgreSQL is a great example. I've been using databases for the last two decades. I've used most of the major commercial databases: Oracle, Sybase, DB2, and MS SQL Server. With each commercial database, there is a wall of knowledge between my needs and the vendor's need to protect his intellectual property. Until I started exploring open-source databases, I had an incomplete understanding of how a database works. Why was this particular feature implemented that way? Why am I getting poor performance when I try this? That's a neat feature; I wonder how they did that? Every commercial database tries to expose a small piece of its inner workings. The explain statement will show you why the database makes its optimization decisions. But you only get to see what the vendor wants you to see. The vendor isn't trying to hide things from you (in most cases), but without complete access to the source code, they have to pick and choose how to expose information in a meaningful way. With open-source software, you can dive deep into the source code and pull out all the information you need. While writing this book, I've spent a lot of time reading through the PostgreSQL source code. I've added a lot of my own code to reveal more information so that I could explain things more clearly. I can't do that with a commercial database.
There are gems of brilliance in most open-source projects. In a well-designed, well-factored project, you will find designs and code that you can use in your own projects. Many open-source projects are starting to split their code into reusable libraries. The Apache Portable Runtime is a good example. The Apache Web server runs on many diverse platforms. The Apache development team saw the need for a layer of abstraction that would provide a portable interface to system functions such as shared memory and network access. They decided to factor the portability layer into a library separate from their main project. The result is the Apache Portable Runtime, a library of code that can be used in other open-source projects (such as
Some developers hate to work on someone else's code. I love working on code written by another developer; I always learn something from the experience. I strongly encourage you to dive into the PostgreSQL source code. You will learn from it. You might even decide to contribute to the project.
Korry Douglas
Korry Douglas is the director of research and development for Appx Software. Over the last two decades, he has worked on the design and implementation of a number of high-level, high-productivity languages and development environments. His products interface with many relational (and non-relational) databases. Working with so many different database products (Oracle, Sybase, SQL Server, DB2, PostgreSQL, MySQL, MSQL) has given him a broad understanding of the commonalities of, and differences between, databases.
Susan Douglas is the president and CEO of Conjectrix, Inc., a software company specializing in database technologies and security tools. Consulting to the end-user community has given her widespread database experience and a real appreciation for high-quality programs and flexible tools powerful enough to handle data well and intuitive enough to actually use.
Korry and his wife (and best friend) Susan raise horses in rural Virginia. Both are natives of the Pacific Northwest, but prefer the sunshine and open spaces offered by Virginia. They both telecommute, preferring to spend as much time as possible with their 200 or so animal friends (who never complain about buggy code, inelegant design, or poor performance). Susan is an avid equestrienne; Korry gets to clean the barn.
Thank you to our technical reviewer Vince Vielhaber. We appreciate his many hours spent poring over manuscripts exposing technical inaccuracies. His knowledge and expertise have been invaluable. We'd also like to thank Peter Eisentraut and Barry Stinson for reviewing the first edition of this book and Paul DuBois (of MySQL fame) for his guidance while we struggled for clarity in the first edition.
We would especially like to thank the developers of PostgreSQL for the years of development spent producing an excellent database. Without their devotion to the project, it wouldn't have evolved into the masterpiece we all know today.
Most of the books that we read are dedicated to various household members for the long hours devoted to their writing project rather than to family life. Instead, we have enjoyed the long hours of R&D spent together, interspersed with screaming (during breaks, on the roller coasters at King's Dominion, not at each other).
As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we're doing right, what we could do better, what areas you'd like to see us publish in, and any other words of wisdom you're willing to pass our way.
You can email or write me directly to let me know what you did or didn't like about this book, as well as what we can do to make our books stronger.
Please note that I cannot help you with technical problems related to the topic of this book, and that due to the high volume of mail I receive, I might not be able to reply to every message.
When you write, please be sure to include this book's title and author as well as your name and phone or email address. I will carefully review your comments and share them with the author and editors who worked on the book.
Email: opensource@samspublishing.com
Mail: Mark Taber
Associate Publisher
Sams Publishing
800 East 96th Street
Indianapolis, IN 46240 USA
For more information about this book or another Sams Publishing title, visit our website at www.samspublishing.com. Type the ISBN (excluding hyphens) or the title of a book in the Search field to find the page you're looking for.
PostgreSQL is a relational database with a long history. In the late 1970s, the University of California at Berkeley began development of PostgreSQL's ancestor, a relational database known as Ingres. Relational Technologies turned Ingres into a commercial product. Relational Technologies became Ingres Corporation and was later acquired by Computer Associates. Around 1986, Michael Stonebraker from UC Berkeley led a team that added object-oriented features to the core of Ingres; the new version became known as Postgres. Postgres was again commercialized, this time by a company named Illustra, which became part of the Informix Corporation. Andrew Yu and Jolly Chen added SQL support to Postgres in the mid-'90s. Prior versions had used a different, Postgres-specific query language known as Postquel. In 1996, many new features were added, including the MVCC transaction model, more adherence to the SQL92 standard, and many performance improvements. Postgres once again took on a new name: PostgreSQL.
Today, PostgreSQL is developed by an international group of open-source software proponents known as the PostgreSQL Global Development Group. PostgreSQL is an open-source product; it is not proprietary in any way. Red Hat has recently commercialized PostgreSQL, creating the Red Hat Database, but PostgreSQL itself will remain free and open source.
PostgreSQL has benefited well from its long history. Today, PostgreSQL is one of the most advanced database servers available. Here are a few of the features found in a standard PostgreSQL distribution:
Object-relational. In PostgreSQL, every table defines a class. PostgreSQL implements inheritance between tables (or, if you like, between classes). Functions and operators are polymorphic.
Standards compliant. PostgreSQL syntax implements most of the SQL92 standard and many features of SQL99. Where differences in syntax occur, they are most often related to features unique to PostgreSQL.
Open source. An international team of developers maintains PostgreSQL. Team members come and go, but the core members have been enhancing PostgreSQL's performance and feature set since at least 1996. One advantage to PostgreSQL's open-source nature is that talent and knowledge can be recruited as needed. The fact that this team is international ensures that PostgreSQL is a product
users through table-, page-, or row-level locking
referential integrity by supporting foreign and primary key relationships as well as triggers. Business rules can be
PL/SQL. You can also develop server-side code in Tcl, Perl, even bash (the open-source Linux/Unix shell).
addresses
Extensibility. One of the most important features of PostgreSQL is that it can be extended. If you don't find something that you need, you can usually add it yourself. For example, you can add new data types, new functions and operators, and even new procedural and client languages. There are many contributed packages available on the Internet. For example, Refractions Research, Inc. has developed a set of geographic data types that can be used to efficiently model spatial (GIS) data.
The first edition of this book covered versions 7.1 through 7.3. In this edition, we've updated the basics and added coverage for the new features introduced in versions 7.4 and 8.0. Throughout the book, I'll be sure to let you know which features work only in new releases, and, in a few cases, I'll explain features that have been deprecated (that is, features that are obsolete). You can use this book to install, configure, tune, program, and manage PostgreSQL versions 7.1 through 8.0.
Fortunately, the PostgreSQL developers try very hard to maintain forward compatibility: new features tend not to break existing applications. This means that all the features discussed in this book should still be available and substantially similar in later versions of PostgreSQL. I have tried to avoid talking about features that have not been released at the time of writing; where I have mentioned future developments, I will point them out.
Who Is This Book For?
If you are already using PostgreSQL, you should find this book a useful guide to some of the features that you might be less
programming in a variety of languages
PostgreSQL will fit your needs
discusses the sample database we'll be using throughout the book. Chapter 2, "Working with Data in PostgreSQL," describes the many data types supported by a standard PostgreSQL distribution; you'll learn how to enter values (literals) for each data type, what kind of data you can store with each type, and how those data types are combined into expressions. Chapter 3, "PostgreSQL SQL Syntax and Use," fills in some of the details we glossed over in the first two chapters. You'll learn how to create new databases, new tables and indexes, and how PostgreSQL keeps your data safe through the use of transactions. Chapter 4, "Performance," describes the PostgreSQL optimizer. I'll show you how to get information about the decisions made by the optimizer, how to decipher that information, and how to influence those decisions.
Part II, "Programming with PostgreSQL," is all about PostgreSQL programming. In Chapter 5, "Introduction to PostgreSQL Programming," we start off by describing the options you have when developing a database application that works with
has very fast access to data. Each chapter in the remainder of the programming section deals with a client-based API. You can connect to a PostgreSQL server using a number of languages. I show you how to interface to PostgreSQL using C, C++, ecpg, ODBC, JDBC, Perl, PHP, Tcl/Tk, Python, and Microsoft's .NET.
Chapters 8 through 18 all follow the same pattern: you develop a series of client applications in a given language. The first client application shows you how to establish a connection to the database (and how that connection is represented by the language in question). The next client adds error checking so that you can intercept and react to unusual conditions. The third client in each chapter demonstrates how to process SQL commands from within the client. The final client wraps everything together and shows you how to build an interactive query processor using the language being discussed. Even if you program in only one or two languages, I would encourage you to study the other chapters in this section. I think you'll find that looking at the same application written in a variety of languages will help you understand the philosophy followed by the PostgreSQL development team, and it's a great way to start learning a new language. Chapter 19, "Other Useful Programming Tools," introduces you to a few programming tools (and interfaces) that you might find useful: PL/Java and PL/Perl. I'll also show you how to use PostgreSQL inside of bash shell scripts.
The final part of this book (Part III, "PostgreSQL Administration") deals with administrative issues. The final six chapters of this book show you how to perform the occasional duties required of a PostgreSQL administrator. In the first two chapters, Chapter 20, "Introduction to PostgreSQL Administration," and Chapter 21, "PostgreSQL Administration," you'll learn how to start up, shut down, back up, and restore a server. In Chapter 22, "Internationalization and Localization," you will learn how PostgreSQL supports internationalization and localization. PostgreSQL understands how to store and process
Chapter 24, "Replicating PostgreSQL with Slony," you'll learn how to replicate data with PostgreSQL's Slony replication system. Chapter 25, "Contributed Modules," introduces a few open-source projects that work well with PostgreSQL. I'll show you how to query a PostgreSQL database using XML, how to configure and use TSEARCH2 (a full-text indexing and search system), and how to install and use PgAdmin III, a graphical user interface specifically designed for PostgreSQL.
The first edition of this book hit the shelves in February 2003; at that time, the PostgreSQL developers had just released version 7.3.2. Release 7.4 was unleashed in November 2003. In January 2005, the PostgreSQL developers released version 8.0, a major release full of new features. We timed the second edition of this book to coincide with the release of version 8.0 (the book will appear in bookstores a few months after 8.0 hits the streets). In this edition, we've added coverage for all of the (major) new features in 7.3, 7.4, and 8.0, including
Installing, securing, and managing PostgreSQL on Windows hosts
Other useful programming tools (PL/Java, pgpash, pgcurl, etc.)
Chapter 1. Introduction to PostgreSQL and SQL
company. It is developed, maintained, broken, and fixed by a group of volunteer developers around the world. You don't have to buy PostgreSQL; it's free. You won't have to pay any maintenance fees (although you can certainly find commercial sources for technical support).
PostgreSQL offers all the usual features of a relational database plus quite a few unique features. PostgreSQL offers inheritance (for you object-oriented readers). You can add your own data types to PostgreSQL. (I know, some of you are probably thinking that you can do that in your favorite database.) Most database systems allow you to give a new name to an existing type. Some systems allow you to define composite types. With
including C, C++, Java, Python, Perl, Tcl/Tk, and others. On the server side, PostgreSQL sports a powerful procedural language, PL/pgSQL (okay, the language is sportier than the name). You can add procedural languages to the server. You will find procedural languages supporting Perl, Tcl/Tk, and even the bash shell.
Throughout this book, I'll use a simple example database to help explain some of the more complex concepts. The sample database represents some of the data storage and retrieval requirements that you might encounter when running a video rental store. I won't pretend that the sample database is useful for any real-world scenarios; instead, this database will help us explore how PostgreSQL works and should illustrate many PostgreSQL features.
To begin with, the sample database (which is called movies) contains three kinds of records: customers, tapes, and rentals.
Whenever a customer walks into our imaginary video store, you will consult your database to determine whether you already know this customer. If not, you'll add a new record. What items of information should you store for each customer? At the very least, you will want to record the customer's name. You will want to ensure that each customer has a unique identifier: you might have two customers named "Danny Johnson," and you'll want to keep them straight. A name is a poor choice for a unique identifier: names might not be unique, and they can often be spelled in different ways ("Was that Danny, Dan, or Daniel?"). You'll assign each customer a unique customer ID. You might also want to store the customer's birth date so that you know whether he should be allowed to rent certain movies. If you find that a customer has an overdue tape rental, you'll probably want to phone him, so you had better store the customer's phone number. In a real-world business, you would probably want to know much more information about each customer (such as his home address), but for these purposes, you'll keep your storage requirements to a minimum.
Next, you will need to keep track of the videos that you stock. Each video has a title and a duration; you'll store those. You certainly have many movies with the same duration, so you can't use either one for a unique identifier. Instead, you'll assign a unique ID to each video.
Finally, you will need to track rentals. When a customer rents a tape, you will store the customer ID, tape ID, and rental date. Notice that you won't store the customer name with each rental. As long as you store the customer ID, you can always retrieve the customer name. You won't store the movie title with each rental, either; you can find the movie title by its unique identifier.
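The three kinds of records described above can be sketched as three tables. What follows is only a minimal illustration: it uses Python's built-in sqlite3 module as a stand-in for a PostgreSQL server so that it runs without any setup, and the column names (customer_id, customer_name, phone, birth_date, tape_id, title, duration, rental_date) and sample rows are assumptions made for this example, not the definitions used later in the book.

```python
import sqlite3

# In-memory SQLite database standing in for the "movies" database
# (an assumption for this sketch; the book itself uses PostgreSQL).
conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,  -- unique ID (names may repeat)
        customer_name TEXT NOT NULL,
        phone         TEXT,
        birth_date    TEXT
    );
    CREATE TABLE tapes (
        tape_id  TEXT PRIMARY KEY,          -- unique ID (titles may repeat)
        title    TEXT NOT NULL,
        duration TEXT
    );
    CREATE TABLE rentals (
        tape_id     TEXT    NOT NULL REFERENCES tapes (tape_id),
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        rental_date TEXT    NOT NULL
    );
""")

# Two customers share a name, so only the customer ID tells them apart.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Danny Johnson", "555-1111", "1970-01-15"),
        (2, "Danny Johnson", "555-2222", "1992-06-30"),
    ],
)
conn.execute("INSERT INTO tapes VALUES ('T-001', 'Example Movie', '2:00')")
conn.execute("INSERT INTO rentals VALUES ('T-001', 2, '2005-07-26')")


def rental_details(connection):
    """Recover names and titles through the IDs stored in each rental."""
    return connection.execute(
        """
        SELECT c.customer_name, c.phone, t.title, r.rental_date
          FROM rentals r
          JOIN customers c ON c.customer_id = r.customer_id
          JOIN tapes t     ON t.tape_id = r.tape_id
        """
    ).fetchall()


rows = rental_details(conn)
print(rows)
```

Because each rental row carries only the two IDs and a date, the customer's name and phone number and the movie title are each stored once and looked up with a join when they are needed.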
At a few points in this book, we might make changes to the layout of the sample database, but the basic shape will remain the same.
Before we get into the interesting stuff, it might be useful to get acquainted with a few of the terms that you will encounter in your PostgreSQL life. PostgreSQL has a long history: you can trace its history back to 1977 and a program known as Ingres. A lot has changed in the relational database world since 1977. When you are breaking ground with a new product (as the Ingres developers were), you don't have the luxury of using standard, well-understood, and well-accepted terminology; you have to make it up as you go along. Many of the terms used by PostgreSQL have synonyms (or at least close analogies) in today's relational marketplace. In this section, I'll show you a few of the terms that you'll encounter in this book and try to explain how they relate to similar concepts in other database products.
Schema
A schema is a named collection of tables (see table). A schema can also contain views, indexes, sequences, data types, operators, and functions. Other relational database products use the term catalog.
Database
A database is a named collection of schemas. When a client application connects to a PostgreSQL server, it specifies the name of the database that it wants to access. A client cannot interact with more than one database per connection, but it can open any number of connections in order to access multiple databases simultaneously.
Command
column
Figure 1.1 A column (highlighted).
engine, and so on). In Figure 1.2, the shaded area depicts a row.
Figure 1.2 A row (highlighted).
equivalent to a row
Composite type
Starting with PostgreSQL version 8, you can create new data types that are composed of multiple values. For example, you could create a composite type named address that holds a street address, city, state/province, and postal code. When you create a table that contains a column of type address, you can store all four components in a single field. We discuss composite types in more detail in Chapter 2, "Working with Data in PostgreSQL."
Domain
A domain defines a named specialization of another data type. Domains are useful when you need to ensure that a single data type is used in several tables. For example, you might define a domain named accountNumber that contains a
View
A view is an alternative way to present a table (or tables). You might think of a view as a "virtual" table. A view is (usually) defined in terms of one or more tables. When you create a view, you are not storing more data; you are instead creating a different way of looking at existing data. A view is a useful way to give a name to a complex query that you may have to use repeatedly.
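As a rough illustration of this idea, the sketch below gives a name to a join so that it can be queried like a table. It uses Python's built-in sqlite3 module as a stand-in for a PostgreSQL server so that it runs without any setup; the table layout, the view name rental_summary, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database (a stand-in for PostgreSQL in this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE rentals (tape_id TEXT, customer_id INTEGER, rental_date TEXT);

    INSERT INTO customers VALUES (1, 'Danny Johnson');
    INSERT INTO rentals VALUES ('T-001', 1, '2005-07-26');

    -- The view stores no rows of its own, it only names the query.
    CREATE VIEW rental_summary AS
        SELECT c.customer_name, r.tape_id, r.rental_date
          FROM rentals r
          JOIN customers c ON c.customer_id = r.customer_id;
""")


def summary(connection):
    """Query the view exactly as if it were a table."""
    return connection.execute(
        "SELECT customer_name, tape_id, rental_date "
        "FROM rental_summary ORDER BY rental_date"
    ).fetchall()


before = summary(conn)

# A new rental goes into the underlying table; the view is not touched.
conn.execute("INSERT INTO rentals VALUES ('T-002', 1, '2005-07-27')")
after = summary(conn)

print(before)
print(after)
```

The second query returns the new rental even though nothing was ever inserted into rental_summary, which is what "not storing more data" means in practice: the view is evaluated against the underlying tables each time it is queried.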
Client/server
PostgreSQL is built around a client/server architecture. In a client/server product, there are at least two programs involved. One is a client and the other is a server. These programs may exist on the same host or on different hosts that are connected by some sort of network. The server offers a service; in the case of PostgreSQL, the server offers to store, retrieve, and change data. The client asks a server to perform work; a PostgreSQL client asks a PostgreSQL server to serve up relational data.
Client
A client is an application that makes requests of the PostgreSQL server. Before a client application can talk to a server, it must connect to a postmaster (see postmaster) and establish its identity. Client applications provide a user interface and can be written in many languages. Chapters 8 through 19 will show you how to write a client application.
Server
commands coming from client applications. The PostgreSQL server has no user interface: you can't talk to the server directly; you must use a client application.
Postmaster
Because PostgreSQL is a client/server database, something has to listen for connection requests coming from a client application. That's what the postmaster does. When a connection request arrives, the postmaster creates a new
Rollback
A rollback marks the unsuccessful end of a transaction.
When you roll back a transaction, you are telling
PostgreSQL to discard any changes that you have made to
Tablespace
A tablespace defines an alternative storage location where you can create tables and indexes. When you create a table (or index), you can specify the name of a tablespace; if you don't specify a tablespace, PostgreSQL creates all objects in the same directory tree. You can use tablespaces to