XML is intended to be human-readable and self-describing. XML is
human-readable because it is a text format, and it is self-describing
because data is described by elements such as <user>, <username>,
and <homepage> in the preceding example. Another option for
representing usernames and home pages would be XML attributes:
<user username="stu" homepage="http://blogs.relevancellc.com"></user>
The attribute syntax is obviously more terse. It also implies
semantic differences. Attributes are unordered, while elements are ordered.
Attributes are also limited in the values they may contain: some
characters are illegal, and attributes cannot contain nested data (elements,
on the other hand, can nest arbitrarily deep).
There is one last wrinkle to consider with this simple XML document.
What happens when it travels in the wide world and encounters other
elements named <user>? To prevent confusion, XML allows
namespaces. These serve the same role as Java packages or Ruby modules,
but the syntax is different:
<rel:user xmlns:rel="http://www.relevancellc.com/sample"
          username="stu"
          homepage="http://blogs.relevancellc.com">
</rel:user>
The namespace is http://www.relevancellc.com/sample. That would be a
lot to type in front of an element name, so xmlns:rel establishes rel as a
prefix. Reading the previous document, an XML wonk would say that
<user> is in the http://www.relevancellc.com/sample namespace.
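To see the prefix and the namespace as two separate things, here is a minimal sketch that parses the namespaced document with REXML, the XML library that ships with Ruby. The variable names and the summary hash are ours, chosen only for illustration.

```ruby
require 'rexml/document'

# The namespaced document from the text, as a string.
xml = <<~XML
  <rel:user xmlns:rel="http://www.relevancellc.com/sample"
            username="stu"
            homepage="http://blogs.relevancellc.com">
  </rel:user>
XML

root = REXML::Document.new(xml).root

# Hypothetical summary hash, built only for this illustration.
user_info = {
  :local_name => root.name,      # element name without the prefix
  :prefix     => root.prefix,    # the shorthand declared by xmlns:rel
  :namespace  => root.namespace  # the URI that the prefix stands for
}

puts user_info.inspect
```

The prefix is only local shorthand. Two documents that pick different prefixes for the same URI are talking about the same <user>.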
YAML is a response to the complexity of XML (YAML stands for YAML
Ain’t Markup Language). YAML has many things in common with XML.
Most important, both YAML and XML can be used to represent and
serialize complex, nested data structures. What special advantages does
YAML offer?
The YAML criticism of XML boils down to a single sentence: XML has
two concepts too many.
• There is no need for two different forms of nested data. Elements
are enough.
• There is no need for a distinct namespace concept; scoping is
sufficient for namespacing.
To see why attributes and namespaces are superfluous in YAML, here
are three YAML variants of the same configuration file:
Download code/rails_xt/samples/why_yaml.rb
user:
  username: stu
  homepage: http://blogs.relevancellc.com
As you can see, YAML uses indentation for nesting. This is more terse
than XML’s approach, which requires a closing tag.
The second XML example used attributes to shorten the document to a
single line. Here’s the one-line YAML version:
Download code/rails_xt/samples/why_yaml.rb
user: {username: stu, homepage: http://blogs.relevancellc.com}
The one-line syntax introduces {} as delimiters, but there is no semantic
distinction in the actual data. Name/value data, called a simple
mapping in YAML, is identical in the multiline and one-line documents.
Here’s a YAML “namespace”:
Download code/rails_xt/samples/why_yaml.rb
http://www.relevancellc.com/sample:
  user: {username: stu, homepage: http://blogs.relevancellc.com}
There is no special namespace construct in YAML, because scope
provides a sufficient mechanism. In the previous document, user belongs
to http://www.relevancellc.com/sample. Replacing the words “belongs to”
with “is in the namespace” is a matter of taste.
It is easy to convert from YAML to a Ruby object:
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> YAML.load("{username: stu}")
=> {"username"=>"stu"}
Or from a Ruby object to YAML:
irb(main):003:0> YAML.dump 'username'=>'stu'
=> "--- \nusername: stu"
The leading --- \n is a YAML document separator. This is optional, and
we won’t be using it in Rails configuration files. See the sidebar on the
next page for pointers to YAML’s constructs not covered here.
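The claim that the three variants carry the same data can be checked directly. The following is a minimal sketch, assuming the YAML library bundled with a current Ruby; the variable names are ours. Because the exact text produced by YAML.dump can differ between library versions, the sketch compares loaded data rather than strings.

```ruby
require 'yaml'

multiline = <<~EOS
  user:
    username: stu
    homepage: http://blogs.relevancellc.com
EOS

one_line = "user: {username: stu, homepage: http://blogs.relevancellc.com}"

namespaced = <<~EOS
  http://www.relevancellc.com/sample:
    user: {username: stu, homepage: http://blogs.relevancellc.com}
EOS

multi_data      = YAML.load(multiline)
one_line_data   = YAML.load(one_line)
namespaced_data = YAML.load(namespaced)

# The multiline and one-line forms describe the same nested hash.
puts multi_data == one_line_data

# The "namespace" is just an outer key wrapping the same data.
puts namespaced_data['http://www.relevancellc.com/sample'] == one_line_data

# Dumping and reloading gets back an equal object.
round_trip = YAML.load(YAML.dump(multi_data))
puts round_trip == multi_data
```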
Items in a YAML sequence are prefixed with '- ':
- one
- two
- three
Data Formats: More Complexity
For Rails configuration, you may never need YAML knowledge
beyond this chapter. But if you delve into YAML as a
general-purpose data language, you will discover quite a bit more
complexity. Here are a few areas of complexity, with XML’s
approach to the same issues included for comparison:
Complexity      YAML Approach         XML Approach
Whitespace      Annoying rules        Annoying rules
Repeated data   Aliases and anchors   Entities, SOAP sect. 5
If you are making architectural decisions about data formats,
you will want to understand these issues. For YAML, a good
place to start is the YAML Cookbook.∗
∗ http://yaml4r.sourceforge.net/cookbook/
There is also a one-line syntax for sequences, which from a Ruby
perspective could hardly be more convenient. A single-line YAML sequence
is also a legal Ruby array:
irb(main):015:0> YAML.load("[1, 2, 3]")
=> [1, 2, 3]
irb(main):016:0> YAML.dump [1,2,3]
=> "--- \n- 1\n- 2\n- 3"
Beware the significant whitespace, though! If you leave it out, you will
be in for a rude surprise:
irb(main):018:0> YAML.load("[1,2,3]")
=> [123]
Without the whitespace after each comma, the elements all got
compacted together. YAML is persnickety about whitespace, out of
deference to the tradition that markup languages must have counterintuitive
whitespace rules. With YAML there are two things to remember:
• Any time you see a single whitespace character that makes the
format prettier, the whitespace is probably significant to YAML.
That’s YAML’s way of encouraging beauty in the world.
• Tabs are illegal. Turn them off in your editor.
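Both rules are easy to try out. The sketch below assumes the Psych parser bundled with current Ruby versions. The irb session above comes from the YAML library of its day, and a newer parser may treat that particular input differently, so the sketch sticks to the space after a colon, the space after a dash, and a tab used as indentation. The variable names are ours.

```ruby
require 'yaml'

# With a space after the colon, this is a mapping from "a" to 1.
with_space = YAML.load("a: 1")

# Without the space, the whole text is read as one plain string.
without_space = YAML.load("a:1")

# A sequence item needs a space after the dash.
item        = YAML.load("- one")
not_an_item = YAML.load("-one")

# A tab used for indentation is rejected by the parser.
tab_error = begin
  YAML.load("a:\n\tb: 1")
  nil
rescue Psych::SyntaxError => e
  e.class
end

puts with_space.inspect
puts without_space.inspect
puts item.inspect
puts not_an_item.inspect
puts tab_error.inspect
```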
If you are running inside the Rails environment, YAML is even
easier. The YAML library is automatically imported, and all objects get a
In many situations, YAML’s syntax for serialization looks very much
like the literal syntax for creating hashes or arrays in some
(hypothetical) scripting language. This is no accident. YAML’s similarity to script
syntax makes YAML easier to read, write, and parse. Why not take this
similarity to its logical limit and create a data format that is also valid
source code in some language? JSON does exactly that.
9.4 JSON and Rails
The JavaScript Object Notation (JSON) is a lightweight
data-interchange format developed by Douglas Crockford. JSON has several
relevant advantages for a web programmer. JSON is a subset of legal
JavaScript code, which means that JSON can be evaluated in any
JavaScript-enabled web browser. Here are a few examples of JSON.
First, an array:
authors = ['Stu', 'Justin']
And here is a collection of name/value pairs:
prices = {lemonade: 0.50, cookie: 0.75}
Unless you are severely sleep deprived, you are probably saying “This
looks almost exactly like YAML.” Right. JSON is a legal subset of
JavaScript and also a legal subset of YAML (almost). JSON is much simpler
than even YAML; don’t expect to find anything like YAML’s anchors
and aliases. In fact, the entire JSON format is documented in one short
web page at http://www.json.org.
JSON is useful as a data format for web services that will be
consumed by a JavaScript-enabled client and is particularly popular for
Ajax applications.
Rails extends Ruby’s core classes to provide a to_json method:
If you need to convert from JSON into Ruby objects, you can parse
them as YAML, as described in Section 9.3, YAML and XML Compared,
on page 261. There are some corner cases where you need to be careful
that your YAML is legal JSON; see _why’s blog post (footnote 4) for details.
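As a rough illustration of the round trip, here is a minimal sketch. It makes one assumption that matters: it uses the json library from Ruby’s standard distribution rather than the to_json that Rails adds, so the exact output formatting may differ from what Rails produces. The variable names are ours.

```ruby
require 'json'
require 'yaml'

authors = ['Stu', 'Justin']
prices  = {'lemonade' => 0.50, 'cookie' => 0.75}

# Ruby objects to JSON text.
authors_json = authors.to_json
prices_json  = prices.to_json

# JSON text back to Ruby objects with a JSON parser...
from_json = JSON.parse(prices_json)

# ...or, as the text suggests, with a YAML parser.
from_yaml = YAML.load('{"lemonade": 0.5, "cookie": 0.75}')

puts authors_json
puts prices_json
puts from_json == from_yaml
```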
JSON and YAML are great for green-field projects, but many developers
are committed to an existing XML architecture. Since XML does not look
like program source code, converting between XML and programming
language structures is an interesting challenge.
It is to this challenge, XML parsing, that we turn next.
9.5 XML Parsing
To use XML from an application, you need to process an XML
document, converting it into some kind of runtime object model. This
process is called XML parsing. Both Java and Ruby provide several
different parsing APIs.
Ruby’s standard library includes REXML, an XML parser that was
originally based on a Java implementation called Electric XML. REXML is
feature-rich and includes XPath 1.0 support plus tree, stream, SAX2,
pull, and lightweight APIs. This section presents several examples using
REXML to read and write XML.
Rails programs also have another choice for writing XML. Builder is a
special-purpose library for writing XML and is covered in Section 9.7,
Creating XML with Builder, on page 276.
4 http://redhanded.hobix.com/inspect/jsonCloserToYamlButNoCigarThanksAlotWhitespace.html
The next several examples will parse this simple Ant build file:
Download code/Rake/simple_ant/build.xml
<project name="simple-ant" default="compile">
  <target name="clean">
    <delete dir="classes"/>
  </target>
  <target name="prepare">
    <mkdir dir="classes"/>
  </target>
  <target name="compile" depends="prepare">
    <javac srcdir="src" destdir="classes"/>
  </target>
</project>
Each example will demonstrate a different approach to a simple task:
extracting a Target object with name and depends properties.
Push Parsing
First, we’ll look at a Java SAX (Simple API for XML) implementation.
SAX parsers are “push” parsers; you provide a callback object, and
the parser pushes the data through various callback methods on that
object:
Download code/java_xt/src/xml/SAXDemo.java
throws ParserConfigurationException, SAXException, IOException {
SAXParserFactory f = SAXParserFactory.newInstance();
SAXParser sp = f.newSAXParser();
sp.parse(file, new DefaultHandler() {
String qname, Attributes attributes)
An REXML SAX approach looks like this:
Download code/rails_xt/samples/xml/sax_demo.rb
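require 'rexml/parsers/sax2parser'

# Collect the name and depends attributes of every <target> element.
def get_targets(file)
  targets = []
  parser = REXML::Parsers::SAX2Parser.new(file)
  # The block runs once per <target> start tag; atts holds its attributes.
  parser.listen(:start_element, %w{target}) do |u,l,q,atts|
    targets << {:name=>atts['name'], :depends=>atts['depends']}
  end
  parser.parse
  targets
end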
Even though they are implementing the same API, the Ruby and Java
approaches have two significant differences. Where the Java
implementation uses a factory, the Ruby implementation instantiates the parser
directly. And where the Java version uses an anonymous inner class,
the Ruby version uses a block.
These language issues are discussed in the Joe Asks on page 272
and in Section 3.9, Functions, on page 92, respectively. These
differences will recur with the other XML parsers as well, but we won’t bring
them up again.
There is also a smaller difference. The Ruby version takes advantage
of one of Ruby’s many shortcut notations. The %w shortcut provides a
simple syntax for creating an array of words. For example:
irb(main):001:0> %w{these are words}
=> ["these", "are", "words"]
The %w syntax makes it convenient for Ruby’s start_element to take a
second argument, the elements in which we are interested. Instead of
listening for all elements, the Ruby version looks only for the <target>
element that we care about:
Download code/rails_xt/samples/xml/sax_demo.rb
parser.listen(:start_element, %w{target}) do |u,l,q,atts|
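The effect of that second argument is easiest to see by running a listener with and without it. The following is a minimal sketch; the started_elements helper and its filter parameter are hypothetical names of ours, and the build file is inlined as a string so the example stands alone.

```ruby
require 'rexml/parsers/sax2parser'

build_xml = <<~XML
  <project name="simple-ant" default="compile">
    <target name="clean">
      <delete dir="classes"/>
    </target>
    <target name="prepare">
      <mkdir dir="classes"/>
    </target>
    <target name="compile" depends="prepare">
      <javac srcdir="src" destdir="classes"/>
    </target>
  </project>
XML

# Hypothetical helper: collect the names of the start tags the parser reports.
# When filter is given, only the listed element names trigger the block.
def started_elements(xml, filter = nil)
  names = []
  parser = REXML::Parsers::SAX2Parser.new(xml)
  listen_args = filter ? [:start_element, filter] : [:start_element]
  parser.listen(*listen_args) { |uri, local, qname, atts| names << qname }
  parser.parse
  names
end

all_names    = started_elements(build_xml)
target_names = started_elements(build_xml, %w{target})

puts all_names.inspect
puts target_names.inspect
```

Without the filter, the block sees every element in the build file; with %w{target}, the parser does the filtering and the block stays free of if statements.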
Pull Parsing
A pull parser is the opposite of a push parser. Instead of implementing
a callback API, you explicitly walk forward through an XML document.
As you visit each node, you can call accessor methods to get more
information about that node.
In Java, the pull parser is called the Streaming API for XML (StAX).
StAX is not part of the J2SE, but you can download it from the Java
Community Process website (footnote 5). Here is a StAX implementation of
getTarget( ):
Download code/java_xt/src/xml/StAXDemo.java
- XMLInputFactory xif= XMLInputFactory.newInstance();
- XMLStreamReader xsr = xif.createXMLStreamReader( new FileInputStream(f));
- for ( int event = xsr.next();
Unlike the SAX example, the StAX version explicitly iterates over the
document by calling next( ) (line 6). Then, we detect whether we care
about the parser event in question by comparing the event value to one
or more well-known constants (line 9).
Here’s the REXML pull version of get_targets( ):
5 if event.start_element? and event[0] == 'target'
- targets << {:name=>event[1][ 'name' ], :depends=>event[1][ 'depends' ]}
As with the StAX example, the REXML version explicitly iterates over
the document nodes. Of course, the REXML version takes advantage
of Ruby’s each( ) (line 4). Where StAX provided an event number and
well-known constants to compare with, the REXML version provides an
actual event object, with boolean accessors such as start_element? for
the different event types (line 5).
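Only a fragment of the REXML pull listing is visible above, so here is a sketch of what a complete version might look like. It is assembled from the two visible lines and the description in the text; the surrounding method body is our assumption, and the build file is inlined as a string so the example stands alone.

```ruby
require 'rexml/parsers/pullparser'

build_xml = <<~XML
  <project name="simple-ant" default="compile">
    <target name="clean">
      <delete dir="classes"/>
    </target>
    <target name="prepare">
      <mkdir dir="classes"/>
    </target>
    <target name="compile" depends="prepare">
      <javac srcdir="src" destdir="classes"/>
    </target>
  </project>
XML

# Walk forward through the document, one event at a time.
def get_targets(xml)
  targets = []
  parser = REXML::Parsers::PullParser.new(xml)
  parser.each do |event|
    # For a start tag, event[0] is the element name and
    # event[1] is a hash of its attributes.
    if event.start_element? and event[0] == 'target'
      targets << {:name=>event[1]['name'], :depends=>event[1]['depends']}
    end
  end
  targets
end

targets = get_targets(build_xml)
puts targets.inspect
```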
Despite their API differences, push and pull parsers have a lot in
common. They both move in one direction, forward through the document.
This can be efficient if you can process nodes one at a time, without
needing content or state from elsewhere in the document. If you need
random access to document nodes, you will probably want to use a tree
parser, discussed next.
Tree Parsing
Tree parsers represent an XML document as a tree in memory,
typically loading in the entire document. Tree parsers allow more
powerful navigation than push parsers, because you have random access to
the entire document. On the other hand, tree parsers tend to be more
expensive and may be overkill for simple operations.
Tree parser APIs come in two flavors: the DOM and everything else. The
Document Object Model (DOM) is a W3C specification and aspires to
be programming language neutral. Many programming languages also
offer a tree parsing API that takes better advantage of specific language
features. Here is the build.xml example implemented with Java’s built-in
DOM support:
- Document doc = db.parse(file);
5 NodeList nl = doc.getElementsByTagName( "target" );
- Target[] targets = new Target[nl.getLength()];
- for ( int n=0; n<nl.getLength(); n++) {
- Element e = (Element) nl.item(n);
The Java version finds the <target> elements with getElementsByTagName( ) in line 5. The
value returned is a NodeList, which is a DOM-specific class. Since the
DOM is language-neutral, it does not support Java’s iterators, and
looping over the nodes requires a for loop (line 7).
Next, using REXML’s tree API, here is the code:
Download code/rails_xt/samples/xml/dom_demo.rb
- Document.new(file).elements.each( "//target" ) do |e|
- targets << {:name=>e.attributes[ "name" ],
5 :depends=>e.attributes[ "depends" ]}
REXML does not adhere to the DOM. Instead, the elements( ) method
returns an object that supports XPath. In XPath, the expression //target
matches all elements named target. Building atop XPath, iteration can
then be performed in normal Ruby style with each( ) (line 3).
Of course, Java supports XPath too, as you will see in the following
section.
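The REXML tree listing above shows only its middle lines. A complete version might look like the sketch below, which wraps those lines in a method of our own and inlines the build file as a string. The last two lines are an addition of ours to show the random access that tree parsers allow.

```ruby
require 'rexml/document'

build_xml = <<~XML
  <project name="simple-ant" default="compile">
    <target name="clean">
      <delete dir="classes"/>
    </target>
    <target name="prepare">
      <mkdir dir="classes"/>
    </target>
    <target name="compile" depends="prepare">
      <javac srcdir="src" destdir="classes"/>
    </target>
  </project>
XML

# Load the whole document into memory, then select with XPath.
def get_targets(xml)
  targets = []
  REXML::Document.new(xml).elements.each("//target") do |e|
    targets << {:name=>e.attributes["name"],
                :depends=>e.attributes["depends"]}
  end
  targets
end

targets = get_targets(build_xml)
puts targets.inspect

# Random access: jump straight to the third child element of the root.
# (REXML element indexes are 1-based, following XPath.)
doc = REXML::Document.new(build_xml)
third_target_name = doc.root.elements[3].attributes["name"]
puts third_target_name
```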
XPath
XML documents have a hierarchical structure, much like the file
system on a computer. File systems have a standard notation for
addressing specific files. For example, path/to/foo refers to the file foo, in the
to directory, in the path directory. Better yet, shell programs use wildcards to
address multiple files at once: path/* refers to all files contained in the
path directory.
The XML Path Language (XPath) brings path addressing to XML. XPath
is a W3C Recommendation for addressing parts of an XML document
(see http://www.w3.org/TR/xpath.html).
The previous section showed a trivial XPath example, using //target to
select all <target> elements. Our purpose here is to show how to access
the XPath API using Java and Ruby, not to learn the XPath language
itself. Nevertheless, we feel compelled to pick a slightly more interesting
example.
Joe Asks...
Why Are the Java XML Examples So Verbose?
The Ruby XML examples are so tight that you have to expect there’s a
catch. Are the Ruby XML APIs missing something important?
What the Java versions have, and the Ruby versions lack utterly,
is abstract factories. Many Java APIs expose their key objects via
abstract factories. Instead of saying new Document, we say
DocumentBuilderFactory.someFactoryMethod(). The purpose of factory methods in
this context is to keep our options open. If we want to switch
implementations later, to a different parser, we can reconfigure the factory
without changing a line of code. On the other hand, calling new limits your
options. Saying new Foo() gives you a Foo, period. You can’t change
your mind and get a subclass of Foo or a mock object for testing.
The Ruby language is designed so that abstract factories are generally
unnecessary, for three reasons:
• In Ruby, the new method can return anything you want. Most
important, new can return instances of a different class, so
choosing new now does not limit your options.
• Ruby objects are duck-typed (see Section 3.7, Duck Typing, on
page 89). Since objects are defined by what they can do, rather
than what they are named, it is easier to change your mind and
have one kind of object stand in for another.
• Ruby classes are open. Choosing Foo now doesn’t limit your
options later, because you can always reopen Foo and tweak
its behavior.
In Java, having to choose between abstract factories and new
undermines agility. A central agile theme is “Build what you need now, in
a way that can easily evolve to what you discover you need next
week.” For every new class, we have to make a Big Up-Front
Decision (BUFD, often also BFUD): “Will it need pluggable implementations
later?” If yes, use a factory. If no, call new. The more BUFDs a language
avoids, the easier it is to be agile. In Java’s defense, you can avoid
the dilemma posed by abstract factories in several ways. You can skip
factories and use delegation behind the scenes to select alternate
implementations. A great example is JDOM (http://www.jdom.org),
which is much easier to use than the J2SE APIs. With Aspect-Oriented
Programming (AOP), you can unmake past decisions by weaving in
new decisions. With Dependency Injection (DI), you can pull
configuration choices out of your code entirely. Pointers to more reading on
all this are in the references section at the end of the chapter.
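The three Ruby properties listed in the sidebar can be seen together in a few lines. This is only a sketch: Parser, MockParser, and the implementation switch are hypothetical names of ours, not part of REXML or Rails.

```ruby
# A stand-in for testing. It shares no superclass with Parser; it only
# needs to respond to parse (duck typing).
class MockParser
  def parse(text)
    [:mock, text]
  end
end

class Parser
  class << self
    # Hypothetical switch naming the class that Parser.new should build.
    attr_accessor :implementation
  end

  # new is an ordinary class method, so it can return any object at all.
  def self.new(*args)
    return implementation.new(*args) if implementation
    super
  end

  def parse(text)
    [:real, text]
  end
end

default_result = Parser.new.parse('<a/>')

# Change our mind later, without touching any caller of Parser.new.
Parser.implementation = MockParser
swapped        = Parser.new
swapped_result = swapped.parse('<a/>')

# Open classes: reopen MockParser and tweak its behavior.
class MockParser
  def parse(text)
    [:reopened, text]
  end
end
reopened_result = swapped.parse('<a/>')

puts default_result.inspect
puts swapped_result.inspect
puts reopened_result.inspect
```

Callers keep writing Parser.new throughout, which is the point: the decision about which class to build was never baked into their code.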
The following Java program finds the name of all <target> elements
whose depends attribute is prepare:
String[] results = new String[nl.getLength()];
- for ( int n=0; n<nl.getLength(); n++) {
15 results[n] = nl.item(n).getNodeValue();
Java’s XPath support builds on top of its DOM support, so most of
this code should look familiar. Starting on line 4 you will see several
lines of factory code to create the relevant DOM and XPath objects. The
actual business of the method is conducted on line 10 when the XPath
expression is evaluated. The results are in the form of a NodeList, so the
iteration beginning on line 13 is nothing new either.
Ruby’s XPath code also builds on top of the tree API you have already
That’s it. Just one line of code. The XPath API in Ruby is all business,
no boilerplate. In fact, the syntax can be made even tighter, as shown
in the sidebar on the next page.
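The one-line Ruby listing is not visible in the text above. Judging from the variant shown in the sidebar, it presumably used an ordinary block, as in the sketch below. The wrapping method name is ours, and the build file is inlined as a string so the example stands alone.

```ruby
require 'rexml/document'

build_xml = <<~XML
  <project name="simple-ant" default="compile">
    <target name="clean">
      <delete dir="classes"/>
    </target>
    <target name="prepare">
      <mkdir dir="classes"/>
    </target>
    <target name="compile" depends="prepare">
      <javac srcdir="src" destdir="classes"/>
    </target>
  </project>
XML

# Hypothetical wrapper: names of all targets that depend on "prepare".
def names_depending_on_prepare(xml)
  doc = REXML::Document.new(xml)
  # The XPath selects name attributes; map turns each into its string value.
  REXML::XPath.match(doc, "//target[@depends='prepare']/@name").map { |attr| attr.value }
end

names = names_depending_on_prepare(build_xml)
puts names.inspect
```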
The Symbol#to_proc Trick
You may be thinking that this Ruby XPath example is a bit too
The Rails team thought so and provided another syntax to be
used when invoking blocks:
XPath.match(Document.new(file),
            "//target[@depends='prepare']/@name").map(&:value)
The new syntax &:value takes advantage of Ruby’s alternate
syntax for passing blocks, by passing an explicit Proc object. (A
Proc is a block instantiated as a class so you can manipulate
it in normal Ruby ways.) Of course, :value is not a Proc; it’s a
Symbol! Rails finesses this by defining an implicit conversion from
The Symbol#to_proc trick is interesting because it demonstrates
an important facet of Ruby. The Ruby language encourages
modifications to its syntax. Framework designers such as the
Rails team do not have to accept Ruby “as is.” They can bend
the language to meet their needs.
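The trick is easy to try outside XML. In current Ruby releases Symbol#to_proc is part of the core language, so the sketch below runs without Rails. The symbol_to_proc helper is a hypothetical name of ours; it shows one way such a conversion can be written and is not the Rails source.

```ruby
words = %w{these are words}

# The explicit block form and the &:symbol form produce the same result.
with_block  = words.map { |w| w.upcase }
with_symbol = words.map(&:upcase)

# Behind the ampersand, Ruby asks the symbol for a Proc.
as_proc = :upcase.to_proc

# Hypothetical helper: build an equivalent Proc by hand. It sends the
# named message to whatever object the Proc is called with.
def symbol_to_proc(sym)
  Proc.new { |obj, *args| obj.__send__(sym, *args) }
end

by_hand = words.map(&symbol_to_proc(:upcase))

puts with_symbol.inspect
puts as_proc.call('stu')
puts by_hand.inspect
```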