CHAPTER 9 RSS and Syndication
In this chapter, we describe how portlets can aggregate links to content on external web sites using the group of standards known as RSS. We also discuss how the content of your own portal could be syndicated for convenient inclusion in external sites using the same mechanism.
Overview of RSS
RSS is not a single standard. It is several standards, some closely related and others more loosely so.
The versions of RSS that are most commonly used are 0.9 and 0.91, both of which were released by Netscape to allow content from external web sites to be aggregated into its My Netscape portal.
Since 0.91, two groups have produced new versions of RSS with varying degrees of backward compatibility. The company UserLand Software carried out early development of RSS for Netscape and has subsequently released versions 0.92, 0.93, 0.94, and 2.0. The RSS-DEV working group (an independent group of developers) released the 1.0 version of RSS stemming from the 0.91 version.
NOTE Some but not all of these versions are based on the Resource Description Framework (RDF) format. This rather more consistently managed standard from the World Wide Web Consortium (W3C) provides a standard for presenting metadata. A syndication feed is a set of metadata; it does not (generally) provide the articles themselves, but it will provide their titles, some associated links, and abstracts of the articles.
In this respect RDF is ideal; however, it is quite a complex standard. RSS pragmatically provides a reasonable subset of this information, oriented specifically toward syndication, at the cost of a somewhat fragmented standard.
Even the naming of the standard reflects the version confusion. Correctly or otherwise, you may see any of these versions referred to as one of “Really Simple Syndication,” “Rich Site Summary,” or “RDF Site Summary.” In practice, it is simplest to refer to RSS by its acronym alone, and use a version number if you feel the need to be specific.
The good news is that amid this riot of colorful standards for RSS, the RSS Portlet that we use to acquire and present syndicated content is quite agnostic. You can import an RSS feed in formats 0.90 through to 2.0. The only thing that you cannot import is invalid XML.
RSS is not the only game in town: there are various other standards for making this type of meta-information available and for syndicating content. Although we won’t be covering them any further, you should be familiar with RDF (which we’ve already mentioned) and the up-and-coming “Atom” standard (in development at www.atomenabled.org), which aims to be a more “standard” standard!
Walking Through an Example RSS File
Let’s now take a look at some concrete examples of RSS feeds in the most commonly encountered 0.9 and 0.91 formats. Both formats provide a number of optional elements, but for the most part we will ignore these in favor of those most commonly encountered “in the wild.”
Version 0.9
The following is a correct RSS 0.9 feed describing the authors’ web site, including the compulsory elements and some of the optional ones:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://my.netscape.com/rdf/simple/0.9/">
  <channel>
    <title>PortalBook Technical Notes</title>
    <link>http://portalbook.com/</link>
    <description>
      Discourse and exposition on Java and developing Portlets
    </description>
  </channel>
  <item>
    <title>New version of Jetspeed released</title>
    <link>http://portalbook.com/notes/005.html</link>
  </item>
  <item>
    <title>Collections and iterations</title>
    <link>http://portalbook.com/notes/004.html</link>
  </item>
  <item>
    <title>Deprecated techniques</title>
    <link>http://portalbook.com/notes/003.html</link>
  </item>
</rdf:RDF>
The format is so simple it barely needs explanation, which is indubitably one of the reasons for the enthusiastic early take-up.
The first version of RSS was a valid RDF document. As such, its root element falls within the RDF namespace defined by the W3C, while the simple elements required by Netscape’s format are specified in the default namespace:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://my.netscape.com/rdf/simple/0.9/">
The <channel> element contains the metadata for the feed: its title, the site from which it can be obtained, and a human-readable description of its content.
One limitation of the 0.9 format is that a feed is restricted to a single channel, so a web site proffering diverse subject matter must provide multiple distinct feeds rather than a single RSS feed with multiple channels:
<channel>
  <title>PortalBook Technical Notes</title>
  <link>http://portalbook.com/</link>
  <description>
    Discourse and exposition on Java and developing Portlets
  </description>
</channel>
The <item> element repeats multiple times, once for each article or item of interest that is being publicized in the feed. There is a hard limit of 15 items per feed. Each item includes a title describing the data to be propagated and a link to the data in question. This extremely sparse information is all that is permitted:
<item>
  <title>Deprecated techniques</title>
  <link>http://portalbook.com/notes/003.html</link>
</item>
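To make the structure concrete, here is a minimal sketch of reading a 0.9 feed with the JDK's standard DOM parser. It is not code from the RSS Portlet; the class name Rss09Reader and its "title | link" output format are hypothetical. It assumes the two namespace URIs from the example above and, because 0.9 places <item> beside <channel> rather than inside it, searches the whole rdf:RDF document for items.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the items of an RSS 0.9 (rdf:RDF) feed.
public class Rss09Reader {
    // Namespace URIs taken from the root element of the 0.9 example.
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS09_NS = "http://my.netscape.com/rdf/simple/0.9/";

    /** Returns one "title | link" string per <item> in the feed. */
    public static List<String> readItems(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Element root = doc.getDocumentElement();
        if (!RDF_NS.equals(root.getNamespaceURI()) || !"RDF".equals(root.getLocalName())) {
            throw new IllegalArgumentException("Not an rdf:RDF document");
        }

        List<String> items = new ArrayList<>();
        // In 0.9 the <item> elements are siblings of <channel>, in the default namespace.
        NodeList nodes = root.getElementsByTagNameNS(RSS09_NS, "item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(childText(item, "title") + " | " + childText(item, "link"));
        }
        return items;
    }

    // Trimmed text of the first descendant with this local name, or "" if absent.
    static String childText(Element parent, String localName) {
        NodeList list = parent.getElementsByTagNameNS(RSS09_NS, localName);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }
}
```

Namespace awareness matters here: without it, the parser would see the root as a plain element named "rdf:RDF" and the namespace checks would fail.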
Version 0.91
The following is a correct RSS 0.91 feed describing the authors’ web site, including the compulsory elements and some of the optional ones:
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>PortalBook Technical Notes</title>
    <link>http://portalbook.com/</link>
    <description>
      Discourse and exposition on Java and developing Portlets
    </description>
    <language>en-us</language>
    <copyright>
      Copyright: (C) 2003 Dave Minter and Jeff Linwood
    </copyright>
    <item>
      <title>New version of Jetspeed released</title>
      <link>http://portalbook.com/notes/005.html</link>
      <description>
        We let you know the latest changes and improvements to the Jetspeed portlet server in the new version.
      </description>
    </item>
    <item>
      <title>Collections and iterations</title>
      <link>http://portalbook.com/notes/004.html</link>
      <description>
        Misuse of Collections can result in hidden nested iterations that rapidly become a serious performance drag. We discuss how to avoid this and similar pitfalls.
      </description>
    </item>
    <item>
      <title>Deprecated techniques</title>
      <link>http://portalbook.com/notes/003.html</link>
      <description>
        Bad habits die hard. We discuss some of the techniques that were legitimate in older versions of Jetspeed and the approaches that should replace them.
      </description>
    </item>
  </channel>
</rss>
This format is not quite as simple as that of version 0.9 but does contain some compensatory features.
This version declares its RSS version number explicitly, making it a little easier to keep track of what kind of data is incoming:
<rss version="0.91">
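As a minimal sketch of how a consumer might exploit this (the class name RssVersionSniffer is hypothetical, not part of any RSS library), the code below reads the version attribute when the root element is <rss>. For rdf:RDF documents, which carry no version attribute, it falls back to the namespace of <channel>; the RSS 1.0 namespace URI is an addition not shown in this chapter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: guesses which RSS version a document uses.
public class RssVersionSniffer {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS09_NS = "http://my.netscape.com/rdf/simple/0.9/";
    static final String RSS10_NS = "http://purl.org/rss/1.0/";

    /** Returns the RSS version string, or "unknown" if it cannot be determined. */
    public static String detect(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        // 0.91 and later: <rss version="...">
        if ("rss".equals(root.getLocalName())) {
            String version = root.getAttribute("version"); // "" when absent
            return version.isEmpty() ? "unknown" : version;
        }

        // RDF-based formats: the namespace of <channel> tells 0.9 from 1.0.
        if (RDF_NS.equals(root.getNamespaceURI()) && "RDF".equals(root.getLocalName())) {
            NodeList channels = root.getElementsByTagNameNS("*", "channel");
            String ns = channels.getLength() == 0 ? null : channels.item(0).getNamespaceURI();
            if (RSS09_NS.equals(ns)) {
                return "0.9";
            }
            if (RSS10_NS.equals(ns)) {
                return "1.0";
            }
        }
        return "unknown";
    }
}
```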
Again, only one channel is permitted by the standard. In this version, however, the <channel> element encompasses all of the subsequent items along with the channel’s metadata:
<channel>
Rather more information about the channel is available in a 0.91 feed. As well as the title, link, and description, we are provided with an associated language and copyright information:
<title>PortalBook Technical Notes</title>
<link>http://portalbook.com/</link>
<description>
  Discourse and exposition on Java and developing Portlets
</description>
<language>en-us</language>
<copyright>
  Copyright: (C) 2003 Dave Minter and Jeff Linwood
</copyright>
The <item> elements are also rather better equipped. In addition to the <title> and <link> elements, we have a <description>. This is usually populated with an abstract of the content that is covered at the associated link:
<item>
  <title>New version of Jetspeed released</title>
  <link>http://portalbook.com/notes/005.html</link>
  <description>
    We let you know the latest changes and improvements to the Jetspeed portlet server in the new version.
  </description>
</item>
This version of the standard is not limited to 15 <item> elements, but enough software makes this assumption that we consider it safer to observe the limit anyway.
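A minimal sketch of a reader that follows this advice is shown below. The class name Rss091Reader and its MAX_ITEMS constant are hypothetical; the cap of 15 simply mirrors the conservative limit suggested above. Because the items are nested inside <channel> in 0.91 (and, as the next section shows, in 2.0), it walks the channel's direct children, and it treats a missing <description> as an empty string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads <rss version="0.91"> (or 2.0) items, capped at a limit.
public class Rss091Reader {
    /** Conservative cap discussed in the text. */
    public static final int MAX_ITEMS = 15;

    /** Returns {title, link, description} for at most maxItems items. */
    public static List<String[]> readItems(String xml, int maxItems) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Items live inside the single <channel> element.
        Element channel = (Element) doc.getDocumentElement()
                .getElementsByTagName("channel").item(0);
        List<String[]> result = new ArrayList<>();
        if (channel == null) {
            return result;
        }
        NodeList children = channel.getChildNodes();
        for (int i = 0; i < children.getLength() && result.size() < maxItems; i++) {
            Node node = children.item(i);
            if (node instanceof Element && "item".equals(node.getNodeName())) {
                Element item = (Element) node;
                result.add(new String[] {
                    text(item, "title"), text(item, "link"), text(item, "description")
                });
            }
        }
        return result;
    }

    // Trimmed text of the first descendant with this tag name, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }
}
```

Trimming the text matters because, as in the listings above, <description> content is often laid out on its own indented line.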
Version 2.0
The following is a correct RSS 2.0 feed describing the authors’ web site, including the compulsory elements and some of the optional ones:
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PortalBook Technical Notes</title>
    <link>http://portalbook.com/</link>
    <description>
      Discourse and exposition on Java and developing Portlets
    </description>
    <language>en-us</language>
    <copyright>
      Copyright: (C) 2003 Dave Minter and Jeff Linwood
    </copyright>
    <item>
      <title>New version of Jetspeed released</title>
      <link>http://portalbook.com/notes/005.html</link>
      <description>
        We let you know the latest changes and improvements to the Jetspeed portlet server in the new version.
      </description>
    </item>
    <item>
      <title>Collections and iterations</title>
      <link>http://portalbook.com/notes/004.html</link>
      <description>
        Misuse of Collections can result in hidden nested iterations that rapidly become a serious performance drag. We discuss how to avoid this and similar pitfalls.
      </description>
    </item>
    <item>
      <title>Deprecated techniques</title>
      <link>http://portalbook.com/notes/003.html</link>
      <description>
        Bad habits die hard. We discuss some of the techniques that were legitimate in older versions of Jetspeed and the approaches that should replace them.
      </description>
    </item>
  </channel>
</rss>
If you compare this feed with the one demonstrated in the 0.91 version of RSS, you’ll see a striking similarity. In fact, they’re identical aside from the version number. So what’s the point?
Figure 9-1. NetNewsWire Lite presenting a set of RSS feeds
RSS 2.0 provides a much larger set of optional elements that can be included in your feed. However, the later the version of RSS that you select for your implementation, the less likely it is that client software will be compatible with it. Therefore, you need to weigh this disadvantage against the richer variety of optional metadata (publication dates, unique identifiers, and so forth; for the full list, see the current specification for RSS 2.0 at http://blogs.law.harvard.edu/tech/rss) that you can include in a 2.0 feed.
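As an illustration of two of those optional elements, here is a minimal sketch that builds a one-item 2.0 feed with <pubDate> and <guid> using the JDK's DOM and Transformer APIs. The class name Rss20Writer is hypothetical, and reusing the item's link as its guid is a simplifying assumption, not a requirement of the specification. RSS 2.0 dates follow RFC 822 syntax; the RFC 1123 formatter used here produces that form with a four-digit year.

```java
import java.io.StringWriter;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: emits a single-item RSS 2.0 feed for this chapter's example site.
public class Rss20Writer {
    public static String build(String title, String link, String description,
                               ZonedDateTime published) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element rss = doc.createElement("rss");
        rss.setAttribute("version", "2.0");
        doc.appendChild(rss);

        Element channel = add(doc, rss, "channel", null);
        add(doc, channel, "title", "PortalBook Technical Notes");
        add(doc, channel, "link", "http://portalbook.com/");
        add(doc, channel, "description", "Discourse and exposition on Java and developing Portlets");

        Element item = add(doc, channel, "item", null);
        add(doc, item, "title", title);
        add(doc, item, "link", link);
        add(doc, item, "description", description);
        // Optional 2.0 elements: an RFC 822-style date and a unique identifier.
        add(doc, item, "pubDate", published.format(DateTimeFormatter.RFC_1123_DATE_TIME));
        add(doc, item, "guid", link); // assumption: the permalink doubles as the guid

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Appends a child element with optional text; the serializer escapes &, <, etc.
    private static Element add(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        if (text != null) {
            e.setTextContent(text);
        }
        parent.appendChild(e);
        return e;
    }
}
```

Building the feed through a DOM rather than by string concatenation means characters such as & in titles or abstracts are escaped automatically.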
RSS Browsers
As we have discussed, the original purpose of RSS was to allow headlines to be imported into other web pages. However, a number of specialized browsers have appeared that provide a convenient user interface for browsing through these content summaries.
The example shown in Figure 9-1 is for NetNewsWire Lite running on a Macintosh and illustrates the basic functionality you can expect to see in an RSS browser.
The browser shown provides a list of sites from which you may choose an RSS feed. Selecting a site lists the article titles available on the site. Selecting a title displays the abstract of the article, and a link is provided that will launch a web browser with the article in question.
Most of the rest of the example screen shots in this chapter will be taken from a Java Swing-based RSS browser called RSS Viewer, which you can download from http://sourceforge.net/projects/rssview/.
RSS Viewer lacks some of the finesse of NetNewsWire Lite, but being Java-based, it will run on any platform with a suitable Java runtime. A list of other RSS resources, including RSS browsers for various platforms, is available at www.lights.com/weblogs/rss.html.
Displaying Syndicated Information in Portlets
Your portal may already supply a portlet for displaying RSS streams; failing that, a number of third-party portlets exist that provide this service.
We will discuss a portlet available from the Portlet Open Source Trading (POST) site at http://portlet-opensrc.sourceforge.net/.
NOTE The Portlet Open Source Trading (POST) site provides a set of open source portlets that conform to the Java portlet API or the Web Services for Remote Portlets (WSRP) standard. As of this writing, it has also released a Google portlet, an e-mail portlet, a wizard portlet, and an upload portlet.
The portlet application we are using is called RSS Portlet and is provided as a WAR file to be deployed in your portal. The open source license for RSS Portlet is a BSD-style license, so you can use it for free as-is, or make any changes you like to it (although under this license, if you do so, you’re not allowed to call your derivation “RSS Portlet”).
The RSS Portlet uses XSLT style sheets to translate the incoming RSS feeds into HTML. A style sheet called html.xsl converts 0.9x RSS feeds, and a style sheet called Rss20.xsl converts 2.0 RSS feeds. Both of these files are stored in the WEB-INF directory of the portlet.
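The general mechanism can be sketched with the standard javax.xml.transform API. This is not the RSS Portlet's actual code: the class FeedToHtml, the stylesheetFor method (which assumes anything other than a 2.x version goes to html.xsl), and the toy style sheet standing in for Rss20.xsl are all hypothetical, kept small enough to run without the portlet's files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: chooses and applies an XSLT style sheet to an RSS feed.
public class FeedToHtml {
    // Toy stand-in for Rss20.xsl: one HTML link per <item>.
    public static final String TOY_RSS20_XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/\">"
        + "<ul><xsl:for-each select=\"rss/channel/item\">"
        + "<li><a href=\"{link}\"><xsl:value-of select=\"title\"/></a></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Mirrors the split described above; non-2.x versions fall back to html.xsl (assumption). */
    public static String stylesheetFor(String rssVersion) {
        return rssVersion.startsWith("2.") ? "Rss20.xsl" : "html.xsl";
    }

    /** Compiles the XSLT source and applies it to the RSS document. */
    public static String transform(String xsl, String rss) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(rss)), new StreamResult(out));
        return out.toString();
    }
}
```

In a real deployment the style sheet would be loaded from the file under WEB-INF rather than from a string constant, and a compiled Templates object could be cached instead of recompiling the style sheet for every request.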
TIP At the time of writing, there is a bug in the html.xsl file. If your portlet fails to load and leaves errors like the following:
"Can not resolve namespace prefix: im"
in your log files, you will need to remove the “im” and “rss-sample” entries from the line beginning “exclude-result-prefixes” so that it now reads:
exclude-result-prefixes="rdf dc dcterms rss content annotate admin image cc reqv"
The removed entries refer to XML namespace prefixes that are never declared in the style sheet. Earlier versions of the XSLT processor tolerated this error.
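If you maintain your own style sheets, a small diagnostic along these lines can catch the same mistake before deployment. It is a hypothetical sketch (the class ExcludePrefixCheck is not part of any tool mentioned here): it parses the style sheet with a namespace-aware DOM and reports each prefix in exclude-result-prefixes that has no namespace declaration in scope on the stylesheet element, which is the condition behind the error in the tip above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical diagnostic: finds undeclared prefixes in exclude-result-prefixes.
public class ExcludePrefixCheck {
    public static List<String> undeclaredPrefixes(String xsl) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for lookupNamespaceURI to see xmlns:*
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsl)))
                .getDocumentElement();

        List<String> missing = new ArrayList<>();
        String attr = root.getAttribute("exclude-result-prefixes"); // "" if absent
        for (String prefix : attr.trim().split("\\s+")) {
            if (prefix.isEmpty() || prefix.equals("#default")) {
                continue; // "#default" names the default namespace, not a prefix
            }
            if (root.lookupNamespaceURI(prefix) == null) {
                missing.add(prefix);
            }
        }
        return missing;
    }
}
```

Run against the unpatched html.xsl, a check like this would be expected to report "im" and "rss-sample", the two entries the tip tells you to remove.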
Figure 9-2. Browsing 0.9-style RSS feeds
This portlet makes use of the Xalan XSLT processor to read and translate the RSS streams. Although the J2SE 1.4 runtime is provided with a version of Xalan, it lacks some of the more up-to-date features required by the RSS Portlet. You may, therefore, need to install the latest Xalan JAR files in your portal.
TIP If you are using a portlet server based on the Tomcat application server, such as Pluto or Jetspeed, and you are using the 1.4 version of the J2SDK, you will need to take additional measures to have your new Xalan JAR files override the JAR files provided with the SDK. To do this, place the JAR files in Tomcat’s common/endorsed/ directory.
An error message like “The output format must have a '{http://xml.apache.org/xslt}content-handler' property!” is indicative of this particular problem.
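To confirm which XSLT implementation is actually being picked up, a quick check such as the following can help. It is a hypothetical sketch written for a current JDK (ReflectiveOperationException did not exist in 1.4): it prints the class of the TransformerFactory that JAXP resolves and, if Apache Xalan's org.apache.xalan.Version class is visible, its version string. The lookup is reflective so that the code compiles and runs even when Xalan is absent; interpreting the class name (bundled copy versus your endorsed JARs) depends on your JDK.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic: reports the XSLT implementation visible to this JVM.
public class XsltImplCheck {
    /** Class name of the TransformerFactory that JAXP resolves. */
    public static String factoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Apache Xalan's version string if its Version class is visible, else a note. */
    public static String xalanVersion() {
        try {
            Class<?> version = Class.forName("org.apache.xalan.Version");
            return (String) version.getMethod("getVersion").invoke(null);
        } catch (ReflectiveOperationException e) {
            return "org.apache.xalan.Version not found";
        }
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + factoryClass());
        System.out.println("Xalan: " + xalanVersion());
    }
}
```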
The RSS feeds that will appear in the portlet are configured by the “RssXml” preference. The default set of preferences configured in portlet.xml is as follows:
<portlet-preferences>
  <preference>
    <name>RssXml</name>
    <value>http://www.theserverside.com/rss/theserverside-0.9.rdf</value>
    <value>http://rss.com.com/2547-12-0-20.xml</value>
    <value>http://headlines.internet.com/internetnews/top-news/news.rss</value>
    <value>http://headlines.internet.com/internetnews/fina-news/news.rss</value>
    <value>http://www.sciencedaily.com/newsfeed.xml</value>
  </preference>
</portlet-preferences>
Naturally, you will want to customize the available feeds to suit the audience of your portal.
The default list includes some 0.9-style feeds, as shown in Figure 9-2, along with some 2.0-style feeds, as shown in Figure 9-3.