Prepared exclusively for Jacob Hochstetler
Beta Book
Agile publishing for agile developers
The book you’re reading is still under development. As an experiment, we’re releasing this copy well before we normally would. That way you’ll be able to get this content a couple of months before it’s available in finished form, and we’ll get feedback to make the book even better. The idea is that everyone wins!
Be warned. The book has not had a full technical edit, so it will contain errors. It has not been copyedited, so it will be full of typos. And there’s been no effort spent doing layout, so you’ll find bad page breaks, over-long lines, incorrect hyphenations, and all the other ugly things that you wouldn’t expect to see in a finished book. We can’t be held liable if you use this book to try to create a spiffy application and you somehow end up with a strangely shaped farm implement instead. Despite all this, we think you’ll enjoy it!
Throughout this process you’ll be able to download updated PDFs from http://books.pragprog.com/titles/fr_eir/reorder. When the book is finally ready, you’ll get the final version (and subsequent updates) from the same address. In the meantime, we’d appreciate you sending us your feedback on this book at http://books.pragprog.com/titles/fr_eir/errata.
Thank you for taking part in this experiment.
Dave Thomas
Enterprise Integration with Ruby
A Pragmatic Guide
Maik Schmidt
The Pragmatic Bookshelf
Raleigh, North Carolina Dallas, Texas
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf and the linking g device are trademarks of The Pragmatic Programmers, LLC.
Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein.
Our Pragmatic courses, workshops, and other products can help you and your team create better software and have more fun. For more information, as well as the latest Pragmatic titles, please visit us at
http://www.pragmaticprogrammer.com
Copyright © 2006 The Pragmatic Programmers LLC.
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher.
Printed in the United States of America.
ISBN 0-9766940-6-9
Printed on acid-free paper with 85% recycled, 30% post-consumer content.
B1.2 printing, January 2006
Version: 2006-1-24
Contents
1 Introduction
1.1 What Is Enterprise Software?
1.2 What Is Enterprise Integration?
1.3 Why Ruby?
1.4 Who Should Read This Book?
1.5 PragBouquet
1.6 Acknowledgments
2 Databases
2.1 The Coupon Application
2.2 Database Interface (DBI)
2.3 Object-Relational Mappers
2.4 Lightweight Directory Access Protocol (LDAP)
3 Processing XML
3.1 A Short XML reminder
3.2 Generating XML documents
3.3 Processing XML Documents
3.4 Validating XML Documents
3.5 Are There Alternatives to XML?
4 Low Ceremony Distributed Applications
4.1 “I’d Rather Use a Socket”
4.2 Remote Procedure Calls Using HTTP
5 Distributed Applications with RPC
5.1 Another Day, Another Protocol
5.2 We Will Take No REST, Will We?
5.3 SOAP
5.4 CORBA, RMI, and Friends
6.1 Internationalization and Localization
6.2 Logging
6.3 Creating Daemons and Services
6.4 Build and Deployment Process
6.5 Project Automation with Rake
6.6 Testing Legacy Applications
There are two types of complex systems: those that have
grown out of simpler systems and those that do not work.
Unknown
Chapter 1
Introduction
Have you ever worked for a big enterprise? Do you remember your expectations as you walked into work on that first day? Whistling as the sun shone brightly, you might have been thinking “It will be great to work for <company name here>. They will have a professional environment, where coffee is free, where every system has been specified accurately, implemented carefully, and tested thoroughly. Hmmmm. I wonder which database and programming language they use.”
After your fifth cup of free coffee (around 9:07) you came to realize that the real world looks completely different from your expectations. Typical enterprises use dozens, hundreds, and sometimes even thousands of applications, components, services, and databases. Many of them were custom-built in-house or by third parties, some were bought, others are based on Open Source projects, and the origin of a few (usually the most critical ones) is completely unknown. A lot of applications are very old, some are fairly new, and seemingly no two of them were written using the same tools. They run on heterogeneous operating systems and hardware, they use databases and messaging systems from various vendors, and they were written in different programming languages.
The reasons for this are manifold. You can find countless books explaining why the situation is so bad. You can even find books claiming that they help you to prevent such chaos. This book uses another approach. We will not help you to clean up this mess, but we will help you to deal with the problems pragmatically. Instead of complaining that valuable data is spread across different database schemas or across databases from several vendors, we will write code that integrates it. We will take it even a step further and write new applications which aggregate all your existing resources. It doesn’t matter if we have to use relational databases, LDAP repositories, XML files, or web services based on different protocol standards. We will blend data from multiple, disparate databases to create new business knowledge.
Along the way we’ll show you how to solve all the small day-to-day problems. These are the things that occur over and over again, especially when developing enterprise software. We will access relational databases such as Oracle and MySQL and we will work with LDAP repositories. We’ll show you how to do application logging, how to deploy your software, how to automate tedious and error-prone tasks, and how to survive in an international environment. Oh, and as you might have guessed already from the book’s title, we will use Ruby to accomplish all these things.
1.1 What Is Enterprise Software?
In Patterns of Enterprise Application Architecture [?], Martin Fowler writes:
“Enterprise applications are about the display, manipulation, and storage of large amounts of often complex data and the support or automation of business processes with that data.”
That’s a concise but nevertheless abstract definition, because every non-trivial piece of software has to store, manipulate, and display data. Video games do nothing else (and modern video games also need huge amounts of data that often can get complex). The key point in the definition above is the second part: that the data in enterprise applications is used for business processes and not for rendering alien space ships.
Unsurprisingly, there are more differences between enterprise applications and other types of software. For example, enterprise applications are often created only for a small user group that is in close contact with the development team, implying the developers know their customers very well. In extreme cases programs are written for only a single person (special report generators for the CEO, for example).
Enterprise software demands a certain set of tools. Large amounts of data, complex or not, have to be stored somehow and somewhere. Often it is stored in relational databases, but it can also be in plain text files or LDAP repositories. In addition, modern enterprise software is often based on distributed architectures consisting of many small to mid-size components that perform specialized tasks and that are connected by some kind of middleware such as CORBA, RMI, SOAP, or XML-RPC.
Obviously, as an enterprise software developer you’re better off if you know how to deal with such technologies. You shouldn’t be troubled by the details of reading from a relational database or accessing an LDAP repository. Mastering skills such as these helps you to concentrate on the fun stuff: the application itself.
1.2 What Is Enterprise Integration?
Enterprise integration is a rather vague term and cannot be defined in a strict mathematical sense. Simply put, it happens whenever you use an existing enterprise resource to achieve some results. If you use an existing database or web service in your application, you’re performing enterprise integration. If you build a new component that is used by other pieces of your existing architecture, you’re doing enterprise integration, too.
Integration needn’t just happen inside a single enterprise. It’s possible, and not too unusual, that the software or data of two different enterprises has to be integrated. If you’re using a payment gateway to bill your customers, for example, you’re effectively integrating enterprise software.
You might ask yourself if every development activity in an enterprise environment is some kind of enterprise integration. There are a few exceptions. Enterprise integration does not happen when you build a completely new piece of software from scratch, for example. In reality this case is rare, but from a theoretical point of view this is the only clear exception.
Enterprise integration often means integration with standard software such as databases, LDAP repositories, message queues, ERM systems, and so on. If you’re using one of these technologies, chances are good that you’re doing some enterprise integration.
1.3 Why Ruby?
Most enterprise software running today was written in languages such as COBOL, C/C++, and Java. Because of its distributed nature, enterprise software often makes it easy to use new tools and programming languages. When you have to create a small standalone application, one that only relies upon an existing database, SOAP service, or LDAP repository, it almost doesn’t seem to matter if you were to write it in C++, Java, or Ruby. But if you look into it more deeply, dynamic languages such as Perl, Python, and Ruby have many advantages, especially in enterprise environments:
• They are interpreted and do not need a compile phase, which increases development speed tremendously. After editing your program you can see the results of your changes immediately.
• Enterprise software is about munging data. Dynamic languages are designed to handle data, and include high-level data types such as hashes.
• Memory management is dealt with by the language. This is a great advantage over languages such as C++ where you have to specify the length of each string you read from a database. Dynamic languages prevent waste and result in more concise, more robust, and more secure software.
• Software written in dynamic languages is installed as source code, so you always know exactly which version is currently running on your production system. Gone are the days when you had to guess if a certain binary executable is the right one.
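To make the point about high-level data types a bit more tangible, here is a minimal sketch of typical data munging with a hash. The order records and their field names are invented for this illustration and are not taken from any real schema.

```ruby
require 'date'

# Invented sample records; in practice they would come from a database.
orders = [
  { :customer_id => 1, :created => Date.new(2005, 1, 10) },
  { :customer_id => 2, :created => Date.new(2005, 5, 2) },
  { :customer_id => 1, :created => Date.new(2005, 3, 7) }
]

# Map every customer id to the date of that customer's most recent order.
last_order = {}
orders.each do |order|
  id = order[:customer_id]
  known = last_order[id]
  last_order[id] = order[:created] if known.nil? || order[:created] > known
end

last_order.sort.each { |id, date| puts "#{id}: #{date}" }
```

No memory has to be reserved up front and no record type has to be declared; the hash simply grows as new customer ids show up.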
We will show you Ruby’s strengths and how it helps you to accomplish many tasks much faster, more elegantly, and with more fun than with any other programming language available today. But, even more important, we will also tell you about Ruby’s weaknesses. Ruby is comparatively young and although the core of the language is mature and lots of excellent libraries are available, many things are still missing or incomplete.
Although there is no industry standard for enterprise programming with Ruby (as there is with J2EE or .NET), everything you need is readily available. The most important libraries come with every Ruby distribution and the standard distribution has grown rapidly over the last years. All the other stuff can be found in public places such as RubyForge (http://www.rubyforge.org) or the Ruby Application Archive (http://raa.ruby-lang.org).
1.4 Who Should Read This Book?
This book was written for experienced enterprise developers who know Java, C#, or C++, but don’t know much Ruby (although you should probably have read Programming Ruby [?]). We assume you are familiar with relational databases and have at least an idea what LDAP is. Maybe you do not know RELAX NG, but you understand the concepts of XML and you know what well-formed, SAX2, and DOM mean.
You’ve probably used tools such as object-relational mappers. Maybe you’re familiar with Enterprise Java Beans (EJB), Java Data Objects (JDO) and so on. Maybe you’re fed up with editing configuration files instead of coding. You are looking for better ways to integrate the existing resources in your company and you are looking for better ways to quickly create new and fancy applications based on all the wonderful stuff you already have.
Depending on the tools you’ve used to build your architecture, different choices are available for the integration process. If you’re using message queues you have a lot of freedom and flexibility for integrating your services and software with others. The same holds true for all kinds of web service protocols. It’s slightly different with databases, because they usually do not offer interfaces as clean as message-based systems do. Sometimes you have to access tables directly, sometimes you have to use a set of stored procedures written in a proprietary database programming language.
In this book we do not talk about sophisticated messaging patterns. Instead, we cover the basics. We show you how to use databases, web services, XML files, and all the other legacy stuff you want to combine for building new applications.
1.5 PragBouquet
To make things more interesting and tangible we’ve founded an imaginary company called PragBouquet. It sells flowers from a web shop. Customers from all over the world can order flowers and send them to people living in the United States.
PragBouquet’s business demands a lot of components and services. It depends on several partners, too. Its current infrastructure is shown in Figure 1.1. Customers place orders in the web shop. The shop communicates with the central order system. Because PragBouquet has no billing system, the order system uses an external payment gateway to charge orders. In parallel the production system is informed of new orders and busy florists create wonderful bunches of flowers. Eventually, the floral goods are picked up by a parcel service and are delivered to the happy recipient.
Figure 1.1: PragBouquet Infrastructure
This is only a rough overview. We’ll show single components in more detail when necessary.
1.6 Acknowledgments
First, I’d like to thank Dave Thomas and Andy Hunt for giving me the opportunity to write this book for The Pragmatic Bookshelf. Working with them has been both an honor and a pleasure. I couldn’t imagine better or more professional working conditions.
It would be impossible to write a book about software for enterprise integration without the software itself. The following gentlemen kindly made their ingenious work public for free, and have always responded quickly and accurately to all my questions: Yukihiro “matz” Matsumoto, Will Drewry, arton (the author of Rjb), Sean Russel, Ian Macdonald, Takaaki Tateishi, Thomas Uehlinger, Jim Weirich, Nikolai Lugovoi, Daniel Berger, why the lucky stiff, Minero Aoki, Michael Neumann, Kubo Takehiro, Tomita Masahiro, Matt Mower, David Heinemeier Hansson, Hiroshi Nakamura, John W. Small, Takahashi Masayoshi, Gotou Yuuzou, Yoshida Masato, and Grant McLean.
Please, stand up with me and give my reviewers a round of applause: Frank Tewissen, Matthias “Matze” Klame, Uwe Simon, and Kaan Karaca did an awesome job! Without their corrections and suggestions this book wouldn’t be half as good.
A loud “Thank you very much!!!” goes to all the people who sent errata and suggestions during the beta book process: Lee Grey, Hoang Uong, Ola Bini, Ron Lusk, John Athayde, Blair Zajac, Jim Weirich, Pat Podenski, Gregory Brown, Lachlan Dowding, Sean, Eldon, Stuart Halloway, Raymond Brigleb, Ken Barker, Peter Morelli, Eric-Olivier Lamey, and Jim Kimball.
Perhaps there are authors who write books in isolation under a rock or on a lonesome island. Fortunately, I didn’t and got invaluable support from a lot of wonderful people. I am deeply grateful to my parents (this one’s for you), my sister Yvonne Janka (yet another book you won’t read?), my brother Andrè Schmidt (for relaxing shopping/running tours and even more relaxing evenings with “the boys”), Christian & Agnieszka Rattat (for being true friends when I needed them most), Frank Tewissen (for listening patiently and for advising carefully), Manu (for being “die Manu”! Heja BVB!), AleX Reinartz (I’m looking forward to the next decades), Bettina Hamidian & Corinna Lorscheid (for insightful talks and lots of fun), Katja Wevelsiep (let’s have a coffee tomorrow, OK?), Frank Möcke (for giving me the opportunity to publish texts in my mother tongue), Dr. Andreas Kötz (for your appreciation), and to the “Gleis drei” staff (for providing a perfect proof-reading environment).
Chapter 2
Databases
Database management systems are one of the oldest and most widely used applications in information technology, and they are indispensable to enterprises. It’s nearly impossible to do some serious enterprise integration without touching some kind of database directly or indirectly. Various types exist (relational databases, object-oriented databases, directory services, XML databases, and hash databases such as Berkeley DB). They differ mainly in the way data is organized and accessed internally. Under the hood, though, they are all similar: data is stored in some kind of file system and is accessed through a special layer, often over a network. You can find one or more of the different types in every company, but relational databases are by far the most popular ones in use today.
Although it’s often tedious, repetitive, and error-prone work, accessing databases is, in principle, easy. You open a connection, create and execute some statements, read and process some data, and finally free all resources occupied. At least, that’s how the Gods Of Persistence wanted it to be. But real life in our sinful world looks different. Information and business logic are often spread across different schemas and databases. To make things even worse, many companies use products from multiple vendors. This happens for various reasons: they want to prevent vendor lock-in, the company is the product of a corporate merger, different departments prefer different tools, and so on.
Unfortunately, PragBouquet is no exception. Its data is stored in both Oracle and MySQL databases. In this chapter we will show you not only how to directly manipulate different types of databases, but also how to access them using more advanced tools such as object-relational mappers and database abstraction layers.
2.1 The Coupon Application
PragBouquet’s business has been doing well, but business can always be better, can’t it? To boost sales, the marketing department wants to send a coupon to every customer who has used the online store, but hasn’t used it in the last 6 months. People who have asked not to be e-mailed should not get an e-mail.
That does not sound too difficult. PragBouquet already has a mass mailing program that expects a CSV (Comma Separated Values) file containing e-mail addresses, customer names, and a text to be sent. The problem becomes selecting names and e-mail addresses of all customers who did not place an order in the last 6 months, filtering out those who do not want to be e-mailed, and writing the rest to the CSV file.
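Before any database enters the picture, the three steps (select, filter, write) can be sketched with in-memory data. Everything specific in this sketch is an assumption made for illustration: the sample customers, the opt-in list, the coupon text, the fixed "today", and the column order of the CSV file, which the real mailing program may define differently.

```ruby
require 'csv'
require 'date'
require 'set'

Customer = Struct.new(:name, :surname, :email, :last_order)

# Made-up sample data standing in for the database contents.
customers = [
  Customer.new('Jane', 'Doe',   'jane@example.com', Date.new(2004, 11, 1)),
  Customer.new('John', 'Roe',   'john@example.com', Date.new(2005, 5, 20)),
  Customer.new('Mary', 'Major', 'mary@example.com', Date.new(2004, 9, 15))
]
allowed = Set.new(['jane@example.com', 'john@example.com'])
text    = 'Here is your coupon!'

today  = Date.new(2005, 6, 4) # fixed so the example is reproducible
cutoff = today - 180          # Date minus Integer subtracts days

# Step 1 and 2: no order since the cutoff, and opted in to e-mail.
recipients = customers.select do |c|
  c.last_order < cutoff && allowed.include?(c.email)
end

# Step 3: write one CSV row per recipient (assumed column order).
csv = CSV.generate do |out|
  recipients.each { |c| out << [c.email, "#{c.name} #{c.surname}", text] }
end
puts csv
```

Only Jane ends up in the file: John ordered recently, and Mary is not on the opt-in list. Using the CSV library instead of joining strings by hand takes care of quoting fields that contain commas.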
Instantly you’ve fired up your favorite text editor thinking that this is a great opportunity to strengthen your Ruby skills. Creating CSV files is a breeze and selecting some data sets from a database should not be a problem, either. So you ask your database administrator where you can find the information you need and he takes you down a peg or two. He tells you that for historical reasons (a euphemism for “Nobody knows why”) the information you need is spread across two databases. Customer data and order data are stored in an Oracle database, but the white list containing the e-mail addresses of all customers who want to receive e-mail from PragBouquet is stored in the web shop’s MySQL database. You scribble a bit on your notepad and realize that the system architecture has to look like Figure 2.1.
Exploring The Environment
You decide to start with the Oracle part. Before moving on you want to have a closer look at the structure of the order database. Your database administrator told you that the relevant tables are called customers and orders. He gave you plenty of Microsoft Word documents describing every single table in the order database. Despite this you have a look at the current state of affairs yourself using SQL*Plus, Oracle’s SQL shell.
C:\> sqlplus scott
SQL*Plus: Release 9.2.0.1.0 - Production on Sat Jun 4 16:00:04 2005
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Enter password: XXXX
Connected to:
Personal Oracle9i Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL> describe customers
SQL> describe orders
No big surprises here. Obviously, customers are characterized mainly by their address data and we guess that the tables are connected using column customer_id in table orders.
Figure 2.1: Coupon Application Workflow
Why didn’t we use a standard product?
You might be asking yourself whether it’s a good idea for PragBouquet to have created its own customer and order databases. Wouldn’t it be much easier to buy a solution off the shelf? Customer data is at the core of every enterprise and many processes rely upon it. It’s needed for billing, for statistics, for troubleshooting, and so on. Although many big companies offer software for customer relationship management, it’s never a bad idea to think about building your own customer database. No product will fit your needs better than your own and no product will ever be as flexible as yours.
Determine the Winners
If we’re going to use a Ruby program to extract information from an Oracle database, we’ll need a library that connects our code to the underlying Oracle API. There are currently three Ruby modules for Oracle:
• Oracle by Yoshida Masato
• Ruby/OCI8 by Kubo Takehiro
• Ruby9i by Jim Kain
Storing Addresses—A Plea from the Rest of the World
Even though addresses are critical for many purposes, their data representation is often performed carelessly and without foresight. In particular, aspects of internationalization are often forgotten, because designers and developers normally do not know a lot about the administrative characteristics of their neighbors.
For example, Germany is a federal country divided into 16 states, but to the Germans the different states do not mean a lot. They aren’t part of an address, they do not occur on envelopes, and you do not have to put them into a web form when ordering something from an internet shop. It’s not surprising that German customers get annoyed by web forms insisting on a state. When working in an international environment, it’s better to make the state optional.
There is no international standard for the representation of an address. In Germany, for example, a street address is the street name followed by a blank followed by the house number. In Italy, there’s a comma between the street name and the house number. Other countries put the number before the name. It’s nearly impossible to automatically separate street names and house numbers afterwards, because house numbers can contain nearly arbitrary characters.
Another aspect of addresses that is forgotten surprisingly often in this context is that addresses represent geographical objects. Geographical objects have coordinates, locations that are becoming increasingly important as we move into a world using location-based services. If you want to offer location-based services to your customers some day you’ll have to determine the geographical position of their addresses. For many cities it’s possible to locate an object down to the individual house number.
Please, don’t misunderstand me: you should not try to come up with a solution that will work with every possible address format in the world (I think that would probably be impossible), but you should at least have a closer look at the countries you’re potentially working in.
The main difference between these libraries is their support (or lack thereof) for new data types. Gone are the days when you could only store small strings and numbers in your database. Nowadays you can store complete books or MP3 files in CLOB (Character Large Object) or BLOB (Binary Large Object) columns. Major versions of the Oracle Call Interface (OCI) also differ in other areas, such as security, performance, etc.
In this book we’ll use Kubo Takehiro’s Ruby/OCI8 driver: it’s actively maintained, runs on many platforms, and provides a lot of functionality. It comes in two flavors: a low-level and a high-level API. The low-level API directly reflects the Oracle C library and we will not show its usage, as the high-level API is probably more convenient to use.
Let’s dive into Ruby now and see how we can identify the customers who should get a coupon.
connection = OCI8.new('maik', 'maik')
cursor = connection.exec(<<-SQL)
  select a.id, a.name, a.surname, a.email
  from customers a, orders b
  where a.id = b.customer_id
  and b.created < sysdate - 180
Database code like this looks similar in every modern programming language. First we establish a database connection by calling the new( ) method of class OCI8 (connect( ) would have been a much better name, but for the moment we have to live with it). The new( ) method returns a connection object that can be used to communicate with the database server and to create other database objects, such as statements and cursors.
The SQL statement joins the tables customers and orders and returns only those customers whose last order is older than 180 days. The sub-select identifies the most current entry for each customer and makes sure that every customer is returned only once.
As you can see, SQL statements can be executed directly by calling the exec( ) method of an OCI8 connection. For SELECT statements, exec( ) returns a so-called cursor representing a result set on the database server. Clients can move through a result set by calling fetch( ) on the cursor object. After the last row has been read from the cursor, fetch( ) returns nil.
Eventually, we close our cursor to free valuable resources on the database server. Cursors are resources like file handles, and are in limited supply. If you’re a bad citizen and fail to free these resources, Oracle will raise an exception sooner or later.
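The fetch loop and the obligation to close the cursor can be sketched as follows. To keep the example runnable without an Oracle server, FakeConnection and FakeCursor are hypothetical stand-ins that imitate only the calls discussed here (exec( ), fetch( ), and close( )); with Ruby/OCI8 the connection would come from OCI8.new( ) instead. The helper name fetch_emails and the sample rows are invented as well.

```ruby
# Hypothetical stand-ins that imitate only the calls discussed above.
class FakeCursor
  attr_reader :closed

  def initialize(rows)
    @rows = rows.dup
    @closed = false
  end

  # Returns the next row, or nil once all rows have been read.
  def fetch
    @rows.shift
  end

  def close
    @closed = true
  end
end

class FakeConnection
  attr_reader :last_cursor

  def initialize(rows)
    @rows = rows
  end

  # Ignores the SQL text and hands out a cursor over the canned rows.
  def exec(sql)
    @last_cursor = FakeCursor.new(@rows)
  end
end

# Collects the e-mail column (index 3) of every row in the result set.
def fetch_emails(connection, sql)
  cursor = connection.exec(sql)
  emails = []
  begin
    # fetch returns nil after the last row, which ends the loop.
    while (row = cursor.fetch)
      emails << row[3]
    end
  ensure
    # Runs even if processing a row raises, so the cursor is always freed.
    cursor.close
  end
  emails
end

connection = FakeConnection.new([
  [1, 'Jane', 'Doe', 'jane@example.com'],
  [2, 'John', 'Roe', 'john@example.com']
])
emails = fetch_emails(connection, 'select id, name, surname, email from customers')
puts emails
```

Putting close( ) into an ensure clause is a design choice of this sketch: it keeps a failing row from leaking the cursor.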
Admittedly, our example is concise and expressive, but using Ruby’s iterators automatically leads to a more elegant solution with less explicit resource management:
  select a.id, a.name, a.surname, a.email
  from customers a, orders b
  where a.id = b.customer_id
  and b.created < sysdate - 180
seymour@example.com
Found 2 coupon recipients.
When exec( ) is called as an iterator (with a code block) it returns the number of rows selected. The code block automatically gets each row fetched as a parameter and you no longer have to close the cursor explicitly. Actually, you don’t even notice that you’re working with a cursor.
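The block form might be used roughly as in the sketch below. The method print_recipients and its out parameter are invented for this illustration, and FakeConnection is a hypothetical stand-in whose exec( ) imitates only the behavior just described (yield every row, return the row count), so that the example runs without Oracle. The query is limited to the lines shown above.

```ruby
require 'stringio'

# Hypothetical stand-in; it imitates only the block form of exec.
class FakeConnection
  def initialize(rows)
    @rows = rows
  end

  # Yields every row to the block and returns the number of rows.
  def exec(sql)
    @rows.each { |row| yield row }
    @rows.size
  end
end

# Prints the e-mail column of every row, followed by a summary line.
def print_recipients(connection, out = $stdout)
  count = connection.exec(<<-SQL) { |row| out.puts row[3] }
    select a.id, a.name, a.surname, a.email
    from customers a, orders b
    where a.id = b.customer_id
    and b.created < sysdate - 180
  SQL
  out.puts "Found #{count} coupon recipients."
  count
end

connection = FakeConnection.new([
  [1, 'Jane', 'Doe', 'jane@example.com'],
  [2, 'John', 'Roe', 'john@example.com']
])
print_recipients(connection)
```

There is no cursor variable and no close( ) call left in the calling code; the iterator owns the whole life cycle of the result set.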
Enhancing Flexibility
OK, our first example works. We know where to get the data from and we know how to get it, so let’s turn our little script into software. First of all, we have to replace the constant 180 days with something more dynamic. To do this, we could create the string containing the SQL statement on the fly, substituting in the time value, but this approach has some serious drawbacks.
As we already know, the SQL statement gets transferred over the network to the database server whenever we call exec( ). Then it gets parsed, analyzed, optimized, executed, and eventually the result is sent back to the client.
Actually, modern database servers try to optimize a lot. Part of this process is the creation of a so-called query execution plan for every statement they receive. Current Oracle versions even try to compress the result sets before sending them back to the client to decrease bandwidth and processing time. For SQL statements that are executed often this means that we could gain a lot if the statement could be parsed, analyzed, and optimized only once.
Furthermore, building SQL statements on the fly often creates dangerous security holes. What if someone uses a web form to pass us the following string for the number of days?
'180; delete from customers; commit;'
In the worst case the database server will happily execute the malicious statement, giving you an excellent opportunity to check if your backup system is working properly. This common kind of attack is called SQL injection.
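The danger is easy to reproduce with nothing but string handling. The following sketch shows how interpolation copies the attack into the statement text, and one possible defensive step: insisting that the value really is an integer. The shortened query is made up for this illustration, and the validation is only a supplement, not a replacement for the approach described next.

```ruby
# Untrusted input, as it might arrive from a web form.
days = '180; delete from customers; commit;'

# Building the statement on the fly copies the attack into the SQL text.
sql = "select id from orders where created < sysdate - #{days}"
puts sql

# Integer() raises ArgumentError for anything that is not a plain number.
begin
  safe_days = Integer(days)
rescue ArgumentError
  safe_days = nil
  puts 'Rejected: the number of days is not a number'
end
```

Note that Integer( ) is stricter than to_i( ), which would silently turn the malicious string into 180 and hide the fact that something was wrong with the input.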
Fortunately, it is possible to circumvent all these disadvantages by using so-called prepared statements. We transmit a statement template to the server, where it is parsed, analyzed, and optimized. The server then sends back a statement handle. All the dynamic portions of our statement are replaced by placeholders. Whenever we want to execute our statement, we only send the server the handle and the actual values for our placeholders.
  Customer = Struct.new(:id, :name, :surname, :email)
  def initialize(connection)
    @find_stmt = connection.parse(<<-SQL)
      select a.id, a.name, a.surname, a.email
      and b.created < sysdate - :days
First of all, we have inserted a placeholder (:days) into the SELECT statement. Then we create a prepared statement by calling parse(sql) on our connection. This method returns a handle identifying our statement on the server.
Calling bind_param( ) binds the :days placeholder to its actual value, and then we finally execute the SELECT statement @find_stmt is referring to. The rest is business as usual. Using the CustomerFinder looks like this:
ora_connection = OCI8.new('maik', 'maik')
finder = CustomerFinder.new(ora_connection)
customers = finder.find(180)
customers.each { |c| puts c.email }
ora_connection.logoff
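Putting the pieces from this section together, the finder class might look roughly like the sketch below. It relies only on the calls discussed above (parse( ), bind_param( ), exec( ), and fetch( )), and its query is shortened to the lines shown earlier, so it leaves out the sub-select. FakeStatement and FakeConnection are hypothetical stand-ins, added only so that the sketch runs without an Oracle server; they are not part of Ruby/OCI8.

```ruby
class CustomerFinder
  Customer = Struct.new(:id, :name, :surname, :email)

  # Prepares the statement once; :days is filled in on every call to find.
  def initialize(connection)
    @find_stmt = connection.parse(<<-SQL)
      select a.id, a.name, a.surname, a.email
      from customers a, orders b
      where a.id = b.customer_id
      and b.created < sysdate - :days
    SQL
  end

  # Binds the placeholder, executes the statement, and wraps each row.
  def find(days)
    @find_stmt.bind_param(':days', days)
    @find_stmt.exec
    customers = []
    while (row = @find_stmt.fetch)
      customers << Customer.new(*row)
    end
    customers
  end
end

# Hypothetical stand-ins so the sketch runs without a database.
class FakeStatement
  attr_reader :binds

  def initialize(rows)
    @rows = rows
    @binds = {}
    @pending = []
  end

  def bind_param(key, value)
    @binds[key] = value
  end

  def exec
    @pending = @rows.dup
  end

  def fetch
    @pending.shift
  end
end

class FakeConnection
  attr_reader :statement

  def initialize(rows)
    @statement = FakeStatement.new(rows)
  end

  def parse(sql)
    @statement
  end
end

connection = FakeConnection.new([[1, 'Jane', 'Doe', 'jane@example.com']])
finder = CustomerFinder.new(connection)
customers = finder.find(180)
customers.each { |c| puts c.email }
```

Because the statement is prepared in the constructor, calling find( ) repeatedly with different values reuses the same handle, which is the point of the prepared statement approach.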
Respecting Customer Privacy
So far, so good. We can create a list of all customers that should potentially get a coupon, but we still have to sort out those who do not want to receive e-mails from PragBouquet. As we’ve already learned, this information is stored in the web shop’s MySQL database. There we can find a table called whitelist containing a list of all e-mail addresses that we are allowed to use.
MySQL, created by Monty Widenius, is one of the most popular Open
Source databases at the moment. It started as a thin wrapper for the
mSQL database and has grown over the years into a full-blown
transactional database management system. MySQL support in Ruby was
made possible by the great work of Tomita Masahiro. He has developed
both a C library binding called MySQL/Ruby and a pure Ruby
binding called Ruby/MySQL. Thanks to a patch written by Matt Mower,
Ruby/MySQL now also works with MySQL version 4.1.1 and later.
In this book we’ll use the pure Ruby implementation (for no special
reason). As with our order database, we first examine the webshop
database using the MySQL shell:
C:\>mysql webshop
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3 to server version: 4.0.22-nt
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> describe whitelist;
+ -+ -+ -+ -+ -+ -+
+ -+ -+ -+ -+ -+ -+
connection = Mysql.new('localhost', '', '', 'webshop')
whitelist = connection.query('select * from whitelist')
whitelist.each_hash { |h| puts h['email'] }
connection.close
Here we have a textbook example of database use: create a
connection, execute a query, print its result, and finally close the
connection. What more could we say that hasn’t already been expressed in
the code? All right, we have some details for you. Calling the query(sql)
method returns an object of class Mysql::Result that represents a
complete result set. You can read the single rows of a result set using
various methods; here we chose each_hash( ). It yields a Hash for
every row, where the column names are the hash keys with the data as
the corresponding values.
Printing the whole whitelist was not exactly what we wanted. Instead,
we have to check whether a certain e-mail address is contained in the
whitelist. That means we have to execute a statement such as
select count(*)
from whitelist
where email = 'email@example.com'
and see if it returns 1. Obviously, the e-mail address in the where
clause of our statement is variable, and from what we’ve learned in
Section 2.1, Enhancing Flexibility, on page 15, you might assume it would
be a good idea to use a prepared statement for this purpose. You are
absolutely right: it would be a good idea, but unfortunately support for
prepared statements in MySQL is a rather new feature. It was introduced
in version 4.1 and the current Ruby drivers do not support it.
Obviously, num_rows( ) returns the number of rows in a result set (which
is what we wanted to determine). In use, our Whitelist class looks as
follows:
whitelist = Whitelist.new(connection)
puts whitelist.contains?('homer@example.com')
puts whitelist.contains?('unknown_address')
connection.close
produces:
true
false
We’ve created our SQL statement using strings. Does it make you feel
comfortable? Although the coupon application is an internal project,
the e-mail addresses come from an external source and so you should
never trust them. In addition, it’s really wasteful to execute an SQL
statement for every single e-mail address. So, we will trade some space
for time and read all e-mail addresses into a Hash initially.
class Whitelist
  def initialize(connection)
    # The Hash has to exist before we can fill it.
    @whitelist = {}
    result = connection.query('select email from whitelist')
    result.each_hash { |h| @whitelist[h['email']] = true }
  end
end
That’s a really good compromise. Even if we have to read several
thousand e-mail addresses into memory, it’s still a low price for the
performance and security we get.
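The trade of space for time can be tried out without a MySQL server. The following is only a sketch: FakeResult, FakeConnection, and CachedWhitelist are hypothetical names, and the two fake classes merely imitate the query(sql) and each_hash( ) calls described above.

```ruby
# Stand-in for a result set; imitates the each_hash( ) iteration.
class FakeResult
  def initialize(rows)
    @rows = rows
  end

  def each_hash(&block)
    @rows.each(&block)
  end
end

# Stand-in for a database connection that counts its queries.
class FakeConnection
  attr_reader :query_count

  def initialize(emails)
    @emails = emails
    @query_count = 0
  end

  def query(_sql)
    @query_count += 1
    FakeResult.new(@emails.map { |e| { 'email' => e } })
  end
end

# Reads all addresses once and answers every lookup from memory.
class CachedWhitelist
  def initialize(connection)
    @whitelist = {}
    result = connection.query('select email from whitelist')
    result.each_hash { |h| @whitelist[h['email']] = true }
  end

  # A pure Hash lookup: no SQL is built from the address at all.
  def contains?(email)
    @whitelist.key?(email)
  end
end

connection = FakeConnection.new(['homer@example.com'])
whitelist = CachedWhitelist.new(connection)
puts whitelist.contains?('homer@example.com')
puts whitelist.contains?('unknown_address')
puts connection.query_count
```

No matter how many addresses we check, the connection sees exactly one query, and the untrusted address never becomes part of an SQL string.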
Joining Forces
We have everything available now to create the list of our lucky coupon
recipients: we can read all potential customers from the Oracle order
database and can look them up on the whitelist stored in the MySQL
webshop database. Because the mailing program expects data as CSV,
we reopen the Customer class and add an appropriate method (see
Section 3.5, Comma-Separated Values (CSV), on page 129, to learn more
about Ruby’s CSV library).
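Reopening the class could look roughly like the following sketch. The method name to_csv, the column order, and the sample customer are assumptions; only the Struct definition is taken from earlier in the chapter.

```ruby
require 'csv'

# The same Struct definition as used for the CustomerFinder.
Customer = Struct.new(:id, :name, :surname, :email)

# Reopen the class and add a CSV export method (name and column
# order are assumptions made for this sketch).
class Customer
  def to_csv
    # generate_line takes care of quoting and appends a newline.
    CSV.generate_line([id, name, surname, email])
  end
end

homer = Customer.new(1, 'Homer', 'Simpson', 'homer@example.com')
print homer.to_csv
```

Leaving the quoting to the CSV library pays off as soon as a field contains a comma or a quote character itself.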
The following program then prints CSV data to the console so it can be
easily redirected to the mass mailing program.
require 'whitelist'

# Read all potential customers
ora_connection = OCI8.new('maik', 'maik')
finder = CustomerFinder.new(ora_connection)
customers = finder.find(180)
ora_connection.logoff

# Sort out customers not in whitelist
mysql_connection = Mysql.new('localhost', '', '', 'webshop')
That’s it. We could happily move to the next project. But wouldn’t
it be interesting to know how many customers actually convert their
coupon? To do this, we have to store at least the customer ids of all
coupon recipients somewhere. Let’s put them into the order database in
a new table called coupon_recipients. This will let us check how
many of the customers on this list placed an order after the coupon
mailing.
create table coupon_recipients (
  customer_id int not null,
  created timestamp default sysdate
);
For the first time in this chapter we’re going to write data into the
database. It’s nearly the same as reading information, but there are
a few subtleties we have to take care of.
Here, we’ve used another form of bind variable, numbering them instead
of naming them explicitly. It’s more or less a matter of taste whether
you bind parameters by name or by number, but you have to be
consistent. If you’ve used numbers as placeholders for the parameters in
the SQL statement, you have to bind them by number later. That’s
especially important for output parameters:
cursor = connection.parse("begin :now := sysdate; end;")
cursor.bind_param(':now', Time.mktime(1972, 9, 30), Date)
puts cursor[':now']
There’s something even more critical hidden in our Recipient class.
connection = OCI8.new('maik', 'maik')
connection.autocommit = true
See that we’ve enabled the auto-commit feature of the connection object.
This makes sure that every SQL statement gets committed
immediately, saving any changes to the database when the statement
is executed. That’s what we’d normally expect to happen.
Oracle is a transactional database: you can group several SQL
statements as if they were one. If any of the statements fail, all the
statements will be ignored and the database content will not be changed.
The current transaction can be committed by executing the COMMIT
command or it can be rolled back by calling ROLLBACK. Setting autocommit
to true is like calling COMMIT after every single SQL statement. Without
it, nothing would ever get written to the database. You wouldn’t even
notice it, because from the database’s point of view it’s not an error.
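The effect of a forgotten commit is easy to model in a few lines. What follows is a toy, in-memory sketch of the semantics just described; ToyConnection is a hypothetical class and not the OCI8 API, and its behavior at logoff (uncommitted work is discarded) is an assumption of the model.

```ruby
# Toy model of commit/rollback semantics (not the OCI8 driver).
class ToyConnection
  attr_accessor :autocommit
  attr_reader :committed

  def initialize
    @committed = []      # rows that are durably stored
    @pending = []        # rows written in the current transaction
    @autocommit = false
  end

  def insert(row)
    @pending << row
    commit if @autocommit
  end

  def commit
    @committed.concat(@pending)
    @pending.clear
  end

  def rollback
    @pending.clear
  end

  # Assumption of this model: pending rows are dropped at logoff.
  def logoff
    rollback
  end
end

manual = ToyConnection.new
manual.insert(42)
manual.logoff
p manual.committed   # the row was never committed

auto = ToyConnection.new
auto.autocommit = true
auto.insert(42)
auto.logoff
p auto.committed     # the row survived
```

Neither run raises an error, which is exactly why a missing commit is so easy to overlook.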
Our final version of the coupon application differs only slightly from our
# Read all potential customers
ora_connection = OCI8.new('maik', 'maik')
# Sort out customers not in whitelist
mysql_connection = Mysql.new('localhost', '', '', 'webshop')
set its autocommit feature to true. We also defer closing the connection
until the end of the program, as it’s needed during the whole runtime.
The Fruits of Our Labor
Two weeks ago the coupons were sent to their lucky recipients. Today
started like any other: you switched on your PC and went into the kitchen
to get a (free) cup of coffee. As you came back to your desk to create
yet more extraordinary code, one of the marketing guys was waiting for
you. “You’re the techie that sent out the coupons two weeks ago, aren’t
you?” he asks. Before you can say a word he proceeds: “Although
we worked several weeks on the functional specification of the coupon
application, we somehow forgot to define some statistics requirements.
Now we’re afraid that we can’t find out how successful our marvelous
and groundbreaking coupon idea was. Is there any way you could create
some statistics, anyhow?”
Mostly, you’re surprised that something like a functional specification
exists; it’s the first you’ve heard of it. But when you recover, you
remember the coupon_recipients table and open an SQL*Plus shell:
SQL> select count(*) from coupon_recipients;
  COUNT(*)
----------
      3145
SQL> select count(*) from orders where customer_id in (
2 select customer_id from coupon_recipients
3 ) and created > sysdate - 14;
Turning around to the marketing guy you say: “29.16% of the coupon
recipients placed an order during the last two weeks. Do you need
anything else?” He is obviously impressed: “No, thank you very much!
Managing Database Resources
So far, our examples have been simple and we didn’t care
about performance and optimization. But opening a new
database connection is expensive and should not be
performed unnecessarily. If you only need a single connection,
the database can be represented as a singleton object. A
singleton object is available everywhere in your program and can
be created only once. Thanks to the Ruby standard library it’s
a piece of cake to create a singleton encapsulating our OCI8
driver:
require 'oci8'
require 'singleton'

class Database
  include Singleton
  attr_reader :connection

  def initialize
    @connection = nil
  end

  def connect(usr, pwd, dbname = nil)
    @connection = OCI8.new(usr, pwd, dbname)
    @connection.autocommit = true
    @connection
  end

  def disconnect
    if !@connection.nil?
      @connection.logoff
      @connection = nil
    end
  end
end
Class Database makes a connection to our database available
wherever we need it, and we get access to the one and only
instance by calling Database.instance( ). At program start we
have to call Database.instance.connect(usr, pwd) once and from
then on Database.instance.connection contains our connection.
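What the Singleton module guarantees can be checked without an Oracle installation. In this sketch, Registry is a hypothetical stand-in for the Database class above; it stores a plain string instead of a real connection.

```ruby
require 'singleton'

# Hypothetical stand-in for the Database class; no driver needed.
class Registry
  include Singleton
  attr_reader :connection

  def connect(name)
    @connection = "connection to #{name}"
  end
end

Registry.instance.connect('orders')

# Every call to instance returns the very same object ...
puts Registry.instance.equal?(Registry.instance)
puts Registry.instance.connection

# ... and a second object cannot be created, because including
# Singleton makes new private.
begin
  Registry.new
rescue NoMethodError => e
  puts e.class
end
```

Keep in mind that a singleton is global state in disguise, so it fits best when the program really needs just one connection.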
You did an awesome job and I wouldn’t be surprised if you get a corner
office soon.” You lean back and take a sip of your coffee. It’s still hot.
2.2 Database Interface (DBI)
It’s a bit annoying that the information we needed for our coupon
application is spread across two databases; it might be a good idea
to change this situation someday. Anticipating this change, it might
be advantageous to make our application more independent of the
underlying drivers. As we’ve seen in the previous sections,
accessing databases using native drivers in principle differs only
slightly from vendor to vendor: you have to obtain a connection, create
or prepare statements, execute statements, and eventually retrieve
results. Technically, though, there are many subtle (and sometimes not
so subtle) differences. Countless attempts have been made to standardize
this interface. For example, on the Microsoft Windows platform there are
ODBC, OLE DB, and ADO.NET, to name just a few. Java has its JDBC, and
dynamic languages such as Perl, Python, and Ruby use an approach
All database abstraction layers work in a similar fashion: they define
an abstract interface to the database, and a concrete implementation,
called a database driver, is implemented for each specific database.
For the Ruby DBI library, these drivers are known as DBD modules.
These drivers are accessed by your program through a standard
interface, so you do not have to remember if the method to get a new
connection was called new( ), connect( ), create_connection( ), or
whatever. In DBI it’s called
connect(driver_url, user=nil, auth=nil, params=nil) for every
database supported and it always expects the same parameters in the
same order.
Compared to other database abstraction layers, DBI is extremely
simple. To use it you only have to know two classes, DatabaseHandle and
StatementHandle. A database handle represents a connection to the
database, while a statement handle represents an active SQL statement.
To examine whether we can benefit from using DBI in our PragBouquet
application, we’ll change the Whitelist class to use it.
require 'dbi'
DBI.connect('DBI:Mysql:webshop', '', '') do |conn|
  conn.select_all('select * from whitelist') { |row| p row }
end
7 This list proves the old adage: the good thing about standards is that there’re so
many to choose from.
Because of the block syntax supported by the DBI methods, our
demonstration program became extremely compact. DBI.connect( )
returns a database handle that gets passed into the block. When the
program reaches the end of the block, the connection is closed
automatically. Within the block we call select_all( ), which executes a
SELECT statement and calls a code block for every row that was returned.
Again, we do not have to care about resource management: the
statement will be released at the end of the block. The only thing left
to do is to integrate the code into the Whitelist class.
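The automatic cleanup at the end of a block is a general Ruby idiom built on yield and ensure. The sketch below shows the idiom only; DemoHandle and DemoDB are hypothetical names, and this is not a description of how DBI itself is implemented.

```ruby
# Hypothetical handle that just remembers whether it was closed.
class DemoHandle
  attr_reader :closed

  def initialize
    @closed = false
  end

  def disconnect
    @closed = true
  end
end

module DemoDB
  # Yields a fresh handle and guarantees disconnect, even if the
  # block raises an exception.
  def self.connect
    handle = DemoHandle.new
    begin
      yield handle
    ensure
      handle.disconnect
    end
  end
end

seen = nil
DemoDB.connect { |conn| seen = conn }
puts seen.closed

begin
  DemoDB.connect do |conn|
    seen = conn
    raise 'query failed'
  end
rescue RuntimeError
  puts seen.closed
end
```

Because the cleanup lives in one place, callers cannot forget it, not even on an error path.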
We did not change the interface and only the connection object has to
be instantiated differently to use the Whitelist class:
whitelist = Whitelist.new(connection)
puts whitelist.contains?('homer@example.com')
connection.disconnect
Should we move the whitelist table from MySQL to our Oracle database,
we only have to change the string “Mysql” to “Oracle” and the program
will still work.
Encouraged by our success, we’ll change the Oracle stuff in our
CustomerFinder class to use DBI too.
def initialize(connection)
  @find_stmt = connection.prepare(<<-SQL)
    select a.id, a.name, a.surname, a.email
    and b.created < sysdate - :days
As with the previous example, we did not have to change a lot. Instead
of calling parse( ) on our connection object, we have to call
prepare( ) now. Similarly, exec( ) becomes execute( ). We have
to pass a DBI connection object now:
finder = CustomerFinder.new(connection)
customers = finder.find(180)
customers.each { |c| puts c.email }
connection.disconnect
Despite all this, the benefits of a database abstraction layer aren’t as
big as you might think. It’s convenient to work with DBI when you
have to access a database product that you haven’t worked with before,
but you shouldn’t assume that you can easily replace your existing
database by a completely different one only because you’re using an
abstraction layer. Moving from one database to another is one of the
most complicated things in developing enterprise software.
Because there are so many proprietary additions to SQL in every
vendor’s implementation, writing portable statements is nearly
impossible. Often such statements look quite harmless. For example, look
at the SELECT statement in our CustomerFinder class. It contains at
least three potential problems:
• Not all databases support sub-selects.
• sysdate is specific to Oracle. In MySQL you’d have to use now( )
and in DB2 it’d be current timestamp.
• The syntax of arithmetic expressions for dates (such as sysdate - 180)
differs from vendor to vendor.
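If you really have to support several vendors, differences like the ones listed above have to live somewhere in your own code. A minimal sketch, assuming a lookup table is enough: the three expressions are the ones named in the list, while the constant and the helper current_timestamp_sql are hypothetical names.

```ruby
# Current-timestamp expressions per vendor, as named in the text.
CURRENT_TIMESTAMP = {
  oracle: 'sysdate',
  mysql:  'now()',
  db2:    'current timestamp'
}.freeze

# Hypothetical helper: look up the expression for a vendor and fail
# loudly for vendors the table does not know.
def current_timestamp_sql(vendor)
  CURRENT_TIMESTAMP.fetch(vendor) do
    raise ArgumentError, "no timestamp expression known for #{vendor}"
  end
end

puts "select #{current_timestamp_sql(:oracle)} from dual"
puts "select #{current_timestamp_sql(:mysql)}"
```

A table like this covers only the simplest case. Date arithmetic and sub-selects need more than a string substitution, which is one reason why portable SQL is so hard to write.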
Sometimes the problems aren’t directly related to an SQL statement, but
are caused by side effects such as auto-generated identifiers, which
aren’t available in every database. To support such database-specific
functions, the drivers used by DBI allow for some extensions, but if
you want to write portable software, it’s certainly not a good idea to
use them. For example, to read the last auto-generated identifier from
a MySQL database, you call the last_insert_id( ) method. This method
is not available for Oracle databases and it’s not easy to simulate the
auto-generation feature in Oracle.
A last problem with DBI could be performance: the extra layers and
the need to map features can decrease performance significantly. For
example, accessing MySQL using the native driver is twice as fast as
using the DBI layer.
There are much more important (and tricky) issues that might prevent
you from easily changing your database. Consider, for example, C/C++
programs that contain Embedded SQL. Even if you’re lucky and have
access to the source code of all programs running in your environment,
it still will be a lot of work to adjust them all.
So, if you know up front that you have to support multiple databases,
you can gain a lot by using an abstraction layer, but you have to plan
for it carefully.
2.3 Object-Relational Mappers
A lot of people working in the software development department of
PragBouquet have been thinking about reorganizing the current database
landscape for quite a long time. The design of many databases has
become a bit messy over the years and it’s a big problem that logic and
data are spread across Oracle and MySQL databases. To save license
costs, all the Oracle databases should be migrated to a MySQL database
in the future and all new stuff should be implemented in the MySQL
database right from the beginning.
The first thing that has to be added is an automatic management
system for ordering flowers. Today flowers are ordered from a big
wholesaler more or less manually by the buying department. The clerks get
daily order reports and they can see how many flowers are still in stock.
Then they do some simple calculations using a spreadsheet application
and place new orders accordingly. It’s your task now to automate this
process as far as possible, i.e. to create a database for the flowers in
Generating Unique Ids
It’s really strange: mankind is talking about going to Mars, but
creating artificial primary keys in databases is still a problem in
the 21st century, because there’s no standard.
From a design point of view, there are a lot of advantages
to creating an artificial unique (numeric) primary key for every
table in the database, even if a natural primary key does exist.
Numeric values only need a small amount of space and can
be indexed efficiently.
Although there’s a need for unique ids in every database, all
vendors come up with their own ideas and concepts to
generate them. It’s easy to generate them more or less portably by
creating a table containing only two columns:
create table sequences (
  value int default 1 not null,
  table_name varchar(64)
);
To create a sequence for our customers table, we insert a new
row into the sequences table:
insert into sequences (table_name) values ('customers');
Generating a new sequence value is straightforward, then:
begin
  update sequences set value = value + 1
    where table_name = 'customers';
  select value from sequences
    where table_name = 'customers';
end;
Unfortunately, this solution is not particularly efficient, because it
has to be executed in a transaction, which can slow things down
a bit. Oh, and did I mention that not all databases support
transactions?
Whenever your program relies upon auto-generated identifiers
you should encapsulate this process carefully to prevent bad
surprises when you have to migrate to another database.
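Encapsulating id generation could start with something as small as the following sketch. InMemorySequence and new_customer are hypothetical names; the in-memory class merely stands in for whatever the database offers, be it a sequences table like the one above or a vendor-specific feature.

```ruby
# Hypothetical stand-in for a database-backed id generator.
class InMemorySequence
  def initialize
    # Every sequence starts at 1, like the default in the table above.
    @values = Hash.new(1)
  end

  # Increments the counter for the table and returns the new value,
  # mirroring the update/select pair shown above.
  def next_id(table_name)
    @values[table_name] += 1
  end
end

# Application code depends only on next_id, not on how ids are made.
def new_customer(name, ids)
  { id: ids.next_id('customers'), name: name }
end

ids = InMemorySequence.new
p new_customer('Homer', ids)
p new_customer('Marge', ids)
```

When the database changes, only a class with the same next_id method has to be written; the callers stay untouched.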
stock and to remove flowers from stock whenever a new bouquet leaves
PragBouquet.
Before opening your text editor you take a day off to think about the
new database structure, and after 24 hours of constant thinking you
finally have this revolutionary idea: we need a table that represents
flowers:
create table flowers (
  id int unsigned not null auto_increment primary key,
  name varchar(64) not null,
  price double not null
);
That should be sufficient for a first version: flowers have a name, a
price, and an artificial primary key that is created by the database
automatically. The “only” thing left to do is mapping the flowers table
to a Flower class and mapping all its columns to the corresponding
attributes. You have read Martin Fowler’s Patterns of Enterprise
Application Architecture and you still remember his Active Record
pattern and its definition:
“An object that wraps a row in a database table or view, encapsulates
the database access, and adds domain logic on that data.”
Before programming an Active Record for the flowers table, we first
encapsulate access to the MySQL database in a singleton:
def connect(host, usr, pwd, db = nil)
  @connection = Mysql.new(host, usr, pwd, db)
Fine, after calling Database.instance.connect( ) once, we can access the
database connection by calling Database.instance.connection( ) from
anywhere we want. So, let’s use it to create new flowers:
insert into flowers (name, price)
values ('#{name}', #{price.to_f})
def initialize(id, name, price)
  @id, @name, @price = id, name, price
Virtually planting a rose looks like this:
rose = Flower.create('rose', 1.99)
puts rose
and produces:
A rose (1) costs $1.99.
The first version of the Flower class allows for creating new objects
by calling create(name, price). This method inserts a new row into the
database, reads the id that has been generated by MySQL, and returns
a new Flower object. To make sure that no conflicts happen in the
database because of duplicate id values, we have declared the
initialize( ) method private. Hence, only methods of the Flower class
can create new objects.
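One common way to get this effect in Ruby is to hide new itself with private_class_method, because initialize( ) is private in every Ruby class anyway. The following sketch makes that assumption; Plant is a hypothetical class, and a simple counter stands in for the id that MySQL would generate.

```ruby
class Plant
  attr_reader :id, :name, :price

  # Stands in for the id generated by the database.
  @next_id = 0

  # The only way to obtain a Plant from outside the class.
  def self.create(name, price)
    @next_id += 1
    new(@next_id, name, price)
  end

  def initialize(id, name, price)
    @id, @name, @price = id, name, price
  end

  # From now on only class methods such as create may call new.
  private_class_method :new

  def to_s
    "A #{name} (#{id}) costs $#{price}."
  end
end

puts Plant.create('rose', 1.99)

begin
  Plant.new(1, 'rose', 1.99)
rescue NoMethodError
  puts 'Plant.new is private'
end
```

Since ids are handed out in exactly one place, two objects can never claim the same id by accident.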
For the sake of completeness we add the remaining methods needed
to be fully CRUD compliant (CRUD stands for Create, Retrieve, Update,
Delete):
Now we can retrieve, update, and delete Flower objects in the database:
It took less than an hour to create the Active Record and it works fine,
but despite all this you still think that sometimes life isn’t fair: all your
friends are hanging around at the beach having fun and you’re writing
tons of boring SQL statements only to read and save Flower objects.
Enough is enough and hence you decide to look for a tool that will do
all this tedious stuff for you.
Object-Relational Mappers for Ruby
Because of its dynamic nature, Ruby is a perfect language
for creating tools like object-relational mappers: you can
easily create classes and methods on the fly, and determining
the structure of a database is not a big problem with most
database systems, either.
Unsurprisingly, several projects have been initiated to
implement an object-relational mapper∗, but ActiveRecord is by far
the most popular and most advanced. It’s much more than a
simple mapper: it’s fast, it supports nearly every database
available, and it is constantly enhanced by a big community.
interesting one, for example.
ActiveRecord Basics
ActiveRecord is an enhanced implementation of Martin Fowler’s
Active Record object-relational mapping pattern. ActiveRecord was
created by David Heinemeier Hansson because he needed it for the
famous Ruby on Rails project. ActiveRecord now supports nearly
every database system currently in use (MySQL, PostgreSQL, SQLite,
Microsoft SQL Server, Oracle, and DB2).
Code always trumps prose, so instead of explaining academic
persistence strategies, let’s start by telling ActiveRecord to connect to our
These statements load the ActiveRecord gem (see Section 6.4, RubyGems,
on page 288 to learn more about RubyGems), then establish a
connection to the webshop database running on localhost.
Now we have to map the flowers table to a Ruby class called Flower:
That’s it! All we had to do is derive our class from ActiveRecord::Base.
Every instance of class Flower represents a single row of the flowers
table. ActiveRecord derives the name of the database table by taking the
class name, turning it into lowercase, and pluralizing it. For example,
Flower becomes flowers and PragmaticProgrammer becomes
pragmatic_programmers. If necessary, you can also set the table name
explicitly, either because the built-in pluralization rules don’t work
for you or because you want to map to an existing table whose name
doesn’t meet ActiveRecord’s expectations.
class LegacyTable < ActiveRecord::Base
set_table_name 'xy12aj'
end
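The naming convention described above can be approximated in a few lines of plain Ruby. This is only a naive sketch to illustrate the rule: naive_table_name is a hypothetical helper, not ActiveRecord’s own inflection code, which also knows about irregular plurals.

```ruby
# Naive approximation of the class-name-to-table-name rule.
def naive_table_name(class_name)
  # Put an underscore at every lower-to-upper case boundary, then
  # turn everything into lowercase.
  underscored = class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  # Pluralize by appending an "s"; real inflection rules do far more.
  "#{underscored}s"
end

puts naive_table_name('Flower')
puts naive_table_name('PragmaticProgrammer')
```

A class called Person shows the limits of this approach: the sketch yields persons, which is exactly the kind of case where real pluralization rules, or an explicit table name, are needed.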
All Flower objects automatically have accessors for all the columns of
the flowers table, so there’ll be accessors named name( ) and price( ):
flower.name = 'primrose'
flower.price = 0.99
ActiveRecord stores all columns internally in a hash called attributes,
but using this knowledge is dangerous, as it links us to ActiveRecord’s
implementation. Instead, we should access column values using just
the accessor methods. For example, we could add a to_s( ) method to our
class:
"A #{self.name} (#{self.id}) costs $#{self.price}."
In addition, ActiveRecord creates methods for reading, updating, and
deleting rows in the database. To initialize the flowers table with some
lovely plants, we can do the following:
].each do |name, price|
  flower = Flower.new(:name => name, :price => price)
  flower.save