CHAPTER 15
Content Management Systems
One of the most common uses of a portal is to provide an interface to content management systems (CMSs). Some users may need to get information from the CMS, while others may need to create content. Many portals integrate with a CMS from the same vendor; sometimes the portal ships with the CMS, and in other cases it is a separate product. If you do not have a vendor-supplied integrated solution, you will probably need to develop one using the portlet API and a CMS API.
In this chapter, we discuss the Java Content Repository API (JSR 170) and the WebDAV protocol. We also build a portlet that uses WebDAV to connect to a content store, in this case the open source CMS Apache Slide (http://jakarta.apache.org/slide). Our portlet should work with any WebDAV server, so you can use your own CMS if it supports WebDAV.
Overview of Content Management Systems
Content management is a broad field that encompasses a wide range of software applications. Document management, imaging, product data management, digital media and asset management, knowledge management, and web content management are some of the different types of content management systems. Usually, all of these different systems are grouped together into a field called enterprise content management.
From a technical perspective, many of these systems share a common base of functionality and features. All of them have a content repository, where content is stored in a database or on a file system. Most systems use some kind of hierarchical organization for the content, although you will certainly find CMS applications where all of the content is at the same level. Most CMS packages with a hierarchical view actually store all of the content in a single database table or directory. The relationships for the hierarchy are stored in the database. This provides advantages for access and retrieval, and allows the same piece of content to appear in two or more different locations.
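As a rough illustration of that storage model, the sketch below keeps every content item in one flat map and records the hierarchy separately as parent-to-child links, so a single item can be listed under two folders. The class and method names (FlatContentStore, put, link, childrenOf) are hypothetical and are not taken from any particular CMS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: flat content storage with the hierarchy kept as separate relations. */
final class FlatContentStore {
    // All content lives at one level, keyed by ID (the "single table").
    private final Map<String, String> contentById = new HashMap<>();
    // The hierarchy is only a set of parent -> child links (the "relationships").
    private final Map<String, Set<String>> childrenByParent = new HashMap<>();

    void put(String id, String body) {
        contentById.put(id, body);
    }

    /** Makes an existing item appear under a folder; an item may have many parents. */
    void link(String parentFolder, String childId) {
        if (!contentById.containsKey(childId)) {
            throw new IllegalArgumentException("Unknown content ID: " + childId);
        }
        childrenByParent.computeIfAbsent(parentFolder, k -> new LinkedHashSet<>()).add(childId);
    }

    /** Returns the IDs listed under a folder, in the order they were linked. */
    List<String> childrenOf(String parentFolder) {
        return new ArrayList<>(childrenByParent.getOrDefault(parentFolder, Set.of()));
    }

    String get(String id) {
        return contentById.get(id);
    }

    public static void main(String[] args) {
        FlatContentStore store = new FlatContentStore();
        store.put("doc-1", "Quarterly report");
        store.link("/Engineering/Reports", "doc-1");
        store.link("/Finance", "doc-1"); // same item, second location
        System.out.println(store.childrenOf("/Finance"));
    }
}
```

Because the folder structure is only a set of links, moving or cross-listing content never copies the stored item itself.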
The next piece of the CMS puzzle is content delivery. Most web content management packages are optimized for content delivery, and can easily be plugged into a web-based application. In some cases, part of the vendor-provided content delivery is a display portlet that can save you a lot of development effort. One disadvantage of CMS tools with content delivery is that they often include page assembly features, for delivering a web page with navigation, headers, and footers. This is not very useful from a portal perspective, where the portal page provides the interface. Usually there is a way to access the raw content directly, without any of the page assembly.
A common use of content management systems is to introduce workflow into the content production process. A classic business use case for a CMS project involves a content producer creating a Microsoft Word file locally and then uploading it into the CMS. The producer's manager gets a review notice through e-mail and logs into the CMS. The manager approves the content inside the CMS, and the content is ready for delivery. Most enterprise CMS applications will have workflow or an approval process for publishing built in. The level of automation and custom development varies from CMS to CMS. Sometimes it can be very easy to create a complex content review process that turns out to be unwieldy for the end user in practice. Bottlenecks will start to appear, especially with different levels of approval. Creating a “ready for review” portlet for a CMS is usually a straightforward development project involving a proprietary API.
Content production and authoring is a newer technology that has become more popular with the availability of rich text or HTML authoring controls for web pages. These content-creation tools could be Java applets, ActiveX controls, DHTML, Flash, or any other client-side technology. Some CMS applications come with these as part of an authoring workspace. The rapid adoption of WebDAV in desktop applications means that these controls may not be the best solution for your users, especially if they are already familiar with tools like Macromedia Dreamweaver MX. It is easy enough to embed one of the client-side HTML authoring tools into a portlet; how you save the HTML onto the server will depend on the CMS.
Personalization is a feature that somewhat overlaps with portals. Your content management system may support varying levels of personalization, some of which may coincide with your portal vendor’s personalization product. If you have to choose between the two, the portal’s personalization will work for other applications running on the portal, but the CMS personalization will be portable across multiple portals. Ideally, future versions of the portlet API will standardize personalization, so this will become less of a problem.
Almost every CMS includes some level of search support, whether it is a simple SQL query interface or an integrated search engine like Verity or Lucene. The Java Content Repository API defines a standard for queries and query languages that should gain support from Java-based CMS vendors. The trickiest part of external search engine integration with a CMS and a portal will be indexing the CMS properly. If your site includes multiple user groups with different access to content, you should consider a federated search approach, as we described in Chapter 10 for Lucene. Commercial search vendors will have their own recommendations, and will probably offer ready-made JSR 168 portlets either now or in the near future.
Integration with a Content Management System
Most portal deployments require integration with at least one content management system; often, integration with several different vendors’ systems is necessary. From a project management perspective, bringing content into a portal requires several steps. The first is to identify which content should be available and where the content is coming from. The next step is to determine which sets of users should see which content. The third step involves identifying which functionality in the content management system belongs in a portlet. After these business process steps are completed, you can start planning the technical architecture of the integration, beginning with whether the vendor already provides a JSR 168 portlet. Many vendors write portlets for their content management systems, which can make your job much easier. Two commercial vendors with JSR 168 portlets at the time of writing are Stellent and Documentum; other vendors likely have products on the way. If you do not have a ready-made portlet application to roll into your application, you are going to need to look into the integration APIs for the content management system.
There are two major standards for CMS APIs: WebDAV and the new Java Content Repository API (JSR 170). WebDAV is a set of extensions to the HTTP protocol for versioning, accessing metadata, making directories, locking files, and checking files in and out, among other things. WebDAV is not tied to a single platform or architecture, although the CMS must specifically implement a WebDAV layer. The Java Content Repository API (JCR API) is a new standard for Java content management systems. The JCR API defines a standard set of interfaces and classes that CMS clients can use to connect to a CMS and access content and metadata.
We discuss both WebDAV and the JCR API in this chapter.
Neither of these APIs covers all of the possible functionality for a CMS. In addition, not every CMS implements one of these APIs; most will have a separate proprietary API, and you will have to write the integration against it yourself. If the vendor supplies any servlet/JSP example applications, they should be easy to adapt to a portlet application.
You can pull content out of almost any content management system through its database or file system store, but that should be a last-ditch integration step.
Of course, if your CMS is 10 years old, running on a legacy platform, and does not have an open API, this may be your only choice. It is probably better at that point to migrate the legacy CMS to something newer, but for lots of reasons that may not make business sense.
Common Problems with CMS and Portals
Some of the most common technical problems with CMS integration with portals are authentication, access control, link rewriting, and content delivery. We can manage authentication with Single Sign-On (SSO) functionality, which we discussed in Chapter 8. If the application is not suitable for SSO, you can collect the correct CMS credentials from the user once, and then store them in the user’s portlet preferences.
Access control partly comes from SSO, especially if you have an enterprise-wide set of permissions for your portal and your CMS. If all of your access control is maintained in one directory, you can cut down on technical support, but your software development costs for integration will be huge. If you do not have an enterprise access control system, most content management systems will only display content that the user has access to. You will have to manage the permissions yourself, either programmatically or through an administrative GUI.
Link rewriting is another common problem. The links in your CMS content will not stay in the portal. You could write a set of content display adapters that rewrites your HTML content with the appropriate portlet URLs for links. Another approach would be to standardize on an enterprise-wide XML format for content. Each content delivery or content authoring system would be responsible for rendering the XML correctly for display in that system, but creating the correct links would be easy.
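A content display adapter of the first kind could be sketched roughly as follows. This is a minimal illustration and not a complete solution: it only handles double-quoted href attributes with a regular expression (a production adapter would more likely use an HTML parser), the LinkRewriter name is hypothetical, and the portletUrlPrefix argument is a stand-in for a URL that a real portlet would obtain from the portlet API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a display adapter that points CMS links back into the portlet. */
final class LinkRewriter {
    // Matches double-quoted href attributes only.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Rewrites site-internal links to portletUrlPrefix + encoded target; leaves other links alone. */
    static String rewrite(String html, String portletUrlPrefix) {
        Matcher m = HREF.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String target = m.group(1);
            String replacement = isExternal(target)
                    ? m.group() // keep the attribute exactly as written
                    : "href=\"" + portletUrlPrefix
                            + URLEncoder.encode(target, StandardCharsets.UTF_8) + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Absolute web links, mail links, and in-page anchors are not CMS paths.
    private static boolean isExternal(String target) {
        String t = target.toLowerCase(Locale.ROOT);
        return t.startsWith("http://") || t.startsWith("https://")
                || t.startsWith("mailto:") || t.startsWith("#");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<a href=\"/docs/intro.html\">Intro</a>", "/portal?PATH="));
    }
}
```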
Any content that relies on JavaScript will probably not work, unless the JavaScript is completely contained in the piece of content. Because you may want to use the content in more than one location, you probably do not want JavaScript embedded in your content. Convince your content producers that they do not need to use scripts; one way to encourage this is to provide support for custom HTML or XML tags, such as <PrinterFriendly>, <PopupWindow>, <DynamicMenu>, or similar tags. Your content delivery applications would render these tags in the appropriate manner for display, or ignore them altogether. This puts more control on the systems side, and takes control away from the content creators. Portals especially need this type of control over content because the content needs to appear in a portlet.
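One way a delivery application might handle such tags is sketched below. We assume, purely for illustration, that the tags appear as empty elements such as <PrinterFriendly/>; the CustomTagRenderer class and the idea of a per-system markup map are hypothetical. Tags with no entry in the map are dropped, which corresponds to ignoring them.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: each delivery system supplies its own markup for the custom tags. */
final class CustomTagRenderer {
    // Assumes the custom tags are written as empty elements, e.g. <PrinterFriendly/>.
    private static final Pattern CUSTOM_TAG =
            Pattern.compile("<(PrinterFriendly|PopupWindow|DynamicMenu)\\s*/>");

    /** Replaces each custom tag with this system's markup, or removes it if none is defined. */
    static String render(String content, Map<String, String> markupByTag) {
        Matcher m = CUSTOM_TAG.matcher(content);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String markup = markupByTag.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(markup));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> portletMarkup =
                Map.of("PrinterFriendly", "<a href=\"?print=1\">Print</a>");
        System.out.println(render("<p>Hi</p><PrinterFriendly/><DynamicMenu />", portletMarkup));
    }
}
```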
You will have to determine how content delivery through a portal will work. You could display all of your content in new browser windows that open up outside of the portlet window. If you take this path, your CMS portlets would open links to web applications that display the content correctly, with working links, styles, and images. Another approach is to display HTML or XML content inside the portlet, and rewrite the links to any binary data such as PDF files or images to use a servlet for access. Your portlet cannot stream binary data to the user’s web browser directly, so an approach like this is necessary. You could also look into ActiveX controls for PDF files, Microsoft Office files, and the like.
Java Content Repository API (JSR 170)
The Java Content Repository API (JCR API, www.jcp.org/en/jsr/detail?id=170) is a common interface to content management systems, just like the portlet API is a common interface for portals. The JCR API is Java Specification Request 170 (JSR 170), and at the time of this writing, it was in public review. Similar to the portlet API, the motivation for the JSR 170 standard was that each CMS vendor used a different API. Writing applications on top of these proprietary APIs was difficult because the application ran only on one CMS or because porting and maintaining compatibility required lots of development resources. Imagine trying to build an application (for instance, a search engine) that ran on a number of portals and used several different content management systems. Then imagine supporting that application for all the combinations of systems your customers might have.
The advantage of the JCR API is that more applications can take advantage of content management systems: the barrier to entry is lower, and there is less worry about proprietary lock-in. A client application does not have to know the details of how the JSR 170 implementation works on the content management system. The JCR API does not specify a client/server protocol. Because some CMSs organize content in a hierarchy of folders and content items, and others organize content in a flat set, the JCR API can use either type of structure.
The JCR API does not cover all of the possible functions of a content management system. The standard covers the most common functionality for a content repository, but does not include such areas as personalization, publishing, workflow, or taxonomies. There are two levels of the JCR API. The first level is Level 1, and it includes basic content repository functionality. The main features it includes are
• Changing and retrieving different content types
The more advanced functionality is grouped into Level 2. Level 2 is not required because not every CMS needs that level of complexity. The advanced features in Level 2 are
• Transactions
• Versions
• Observation
of the APIs and key concepts might be implemented differently in the CMS. You should understand how the concepts described next map onto your CMS, especially noting which functionality is unavailable through the JCR API.
Repository
The javax.jcr.Repository interface models a Java content repository. The repository represents all of the content, relationships, and metadata in the content management system. The content repository contains content workspaces. Your portlets will ask this interface for a ticket that represents access to a workspace for an authenticated user. The repository will need a valid set of credentials for the user.
Ticket
Tickets map authenticated users to workspaces. Ticket objects implement the javax.jcr.Ticket interface. A ticket maps to a single workspace. Each ticket provides access to the repository for the user, but the ticket will keep any changes queued until the portlet either reverts or saves the changes.
Credentials
The javax.jcr.Credentials interface represents the authentication information for the user. If the credentials are valid, the repository will return a valid ticket that grants access to a workspace. Your CMS will implement the interface with whatever information it needs to grant access; this will usually include a username and password, and could include a group or a domain, or other custom authentication attributes.
Workspace
Use the javax.jcr.Workspace interface to get access to a content workspace. The repository holds one or more workspaces. Each workspace has a tree of items, which are organized under a root node.
Each Workspace object for an authenticated user maps to a Ticket object.
Nodes can have zero or more properties.
Each node has one primary node type, but can also have multiple mixin node types. Mixin types describe additional information about a node, beyond its primary node type. Each primary node type inherits from the nt:base node type, which must be supported. The CMS may define its own node types below it in the hierarchy. Some predefined (but optional) node types are nt:file, nt:folder, nt:version, and nt:query. Certain primary node types require mixin types, and others allow only certain mixin types. Nodes can have versions, although the node must have the mixin node type mix:versionable.
Property
Properties are children of nodes, and have only one parent node. The property interface is javax.jcr.Property. Properties represent pieces of metadata about nodes. The values of properties must conform to allowed property types, which include strings, binary data, dates, longs, doubles, and booleans. Properties may also be soft links or references. Soft links are links to paths in the content repository. These are soft references; the linked content may be moved or deleted, or may not even exist. The soft link’s path can be absolute or relative. References are hard links to nodes. They link by the node ID (UUID), and the target node must exist. If a reference exists to a node, that reference must be deleted before the node may be moved or deleted. Some properties may have multiple values.
Path
A path points to an item in the repository. Paths may be either relative or absolute. /Engineering/Reports/11222.doc is an example of an absolute path in the repository. A path without the leading slash, such as Reports/11222.doc, is a relative path that is resolved against the current node, just like on a file system. Your portlet may get a node through the ticket by its absolute path. If two or more nodes under the same parent node have the same name, the path can be tricky. You will have to use array-based notation (starting at 1, not 0) to reference the node you want.
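To make the indexing rule concrete, here is a small sketch that splits a path into named segments with a 1-based position among same-named siblings. We assume the index is written in square brackets after the name (for example, 11222.doc[2]) and that a missing index means the first sibling; the IndexedPath and Segment names are hypothetical and are not part of the JCR API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parsing path segments that may carry a 1-based sibling index. */
final class IndexedPath {
    /** One step in a path: a node name plus its position among same-named siblings. */
    static final class Segment {
        final String name;
        final int index;

        Segment(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    // A name, optionally followed by [n]; the bracket syntax is an assumption of this sketch.
    private static final Pattern SEGMENT = Pattern.compile("([^/\\[\\]]+)(?:\\[(\\d+)\\])?");

    static List<Segment> parse(String path) {
        List<Segment> segments = new ArrayList<>();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skips the empty piece before a leading slash
            }
            Matcher m = SEGMENT.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("Bad path segment: " + part);
            }
            // No explicit index means the first sibling with that name.
            int index = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
            if (index < 1) {
                throw new IllegalArgumentException("Indexes start at 1: " + part);
            }
            segments.add(new Segment(m.group(1), index));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : parse("/Engineering/Reports/11222.doc[2]")) {
            System.out.println(s.name + " #" + s.index);
        }
    }
}
```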
a standard set of fields. The difficulty with Lucene is writing classes that synchronize the contents of the CMS with the Lucene index, especially if you are integrating multiple systems. We expect that many JCR API implementations will use Lucene to provide search capabilities.
The JCR API defines two query languages for the search function:
• JCRQL (with SSES): Java Content Repository Query Language (with Simple Search Engine Syntax) is similar to SQL, but has extensions for the hierarchical content model and also supports standard search query terms.
• XPath: XPath 2.0 is an XML technology for searching through a hierarchical XML document and extracting elements that match an XPath expression. The JCR API XPath query language supports a subset of the XPath 2.0 functionality plus some extensions needed for the JCR API.
Each content repository has to support at least one of these query languages. Each CMS can also support additional languages, for instance a Google-style query language, or a Lucene query language with named fields. This means your application will need to know which query language the CMS supports. The javax.jcr.query.QueryManager class has a getSupportedQueryLanguages() method that will return the supported languages. If you are building a general-purpose application, you will probably need to support both of the standard query languages. This way, your application will run on any JCR API–compliant CMS. Your support may amount to little more than separate help files for each search syntax, because the QueryManager class also parses the query from the user’s statement.
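The selection logic in a general-purpose application could look something like the sketch below. It assumes the supported languages are available as an array of names, which is how we would expect to consume the result of getSupportedQueryLanguages(); the QueryLanguageChooser class and the language name strings used in the example are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: pick the first query language that both sides understand. */
final class QueryLanguageChooser {
    /**
     * @param supported names reported by the repository (assumed to arrive as an array)
     * @param preferred names the application can generate, best choice first
     */
    static Optional<String> choose(String[] supported, List<String> preferred) {
        List<String> available = Arrays.asList(supported);
        return preferred.stream().filter(available::contains).findFirst();
    }

    public static void main(String[] args) {
        // The name strings are placeholders, not constants defined by the JCR API.
        String[] reportedByCms = {"xpath"};
        System.out.println(choose(reportedByCms, List.of("jcrql", "xpath")));
    }
}
```

An empty result means the CMS supports neither language the application can produce, so the search feature should be disabled rather than sending a query the repository cannot parse.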
Development with the JCR API
The JCR API classes belong to the javax.jcr package and its subpackages. To start developing with the JCR API, you will need to select and install a server that implements the standard. The standard is still quite new, so we expect that a reference implementation of the JCR API will be released around the time that this book is published. Some of the details of the API may have changed since the public review, but all of the major concepts should be the same.
The first step with the JCR API is to obtain a javax.jcr.Repository object. Your content management system should include directions for getting an instance of Repository, because this is one area of the API that is not standardized. The authors of the specification expect that a JNDI lookup will be a common approach. Repository is an interface with one method, login():
public Ticket login(Credentials credentials, String workspaceName)
    throws LoginException, NoSuchWorkspaceException
The login() method takes a set of credentials and a workspace name. The javax.jcr.Credentials interface consists of a getUserId() method; a getPassword() method; and several methods for storing, setting, and removing attributes on the credentials. The JCR API provides a basic implementation of the Credentials interface with the javax.jcr.SimpleCredentials class. You can create a new instance of SimpleCredentials by calling its constructor and passing a user ID and password as arguments. Upon successful authentication, the login() method returns a Ticket object.
The javax.jcr.Ticket interface is the main gateway for your client to access the content repository. From the ticket, you can get the root node of the workspace, or you can get a node by its absolute path. You can also import an XML document that represents new items.
Once you have a node, you can continue traversing the tree by relative paths. The Node interface has methods for retrieving and setting the node’s properties. You can also create new nodes or add existing nodes as children. After you make any changes, you will have to commit your changes by saving the node. You can also save all of your changes for the workspace by calling the save() method on the ticket.
Retrieving a document out of a content repository with the JCR API is simple. When you have a node with the primary type nt:file, that node will have a child node called jcr:content. The jcr:content node holds the content in one of its properties, which could be called data. You could get the value of the data property, and then pass it back through to the portlet.
WebDAV
WebDAV (www.webdav.org) is a commonly implemented protocol for connecting to content management systems and other content stores. The WebDAV specification (RFC 2518) can be found at www.webdav.org/specs/rfc2518.htm. Many applications and operating systems are WebDAV compatible. A non-exhaustive list of compatible client applications follows:
WebDAV is an extension of the HTTP 1.1 protocol, so it is relatively easy to implement.
WebDAV Methods
If you are already familiar with the GET and POST HTTP methods, the WebDAV methods will look very similar. WebDAV adds many new methods beyond GET, POST, HEAD, and the other standard HTTP methods. The biggest difference is that WebDAV supports, and for some methods requires, an XML message body.
We explain some of the most commonly used WebDAV methods in this section. Other methods in the WebDAV specification are MOVE, COPY, LOCK, and UNLOCK. The versioning extensions to WebDAV (RFC 3253) also define the VERSION-CONTROL, REPORT, CHECKOUT, CHECKIN, and UNCHECKOUT methods. Other related specifications are WebDAV Ordered Collections (RFC 3648) and WebDAV Access Control (RFC 3744).
PROPFIND
Use the PROPFIND method to access the properties of a resource. The client will ask for a set of properties by name for a given WebDAV resource. The server can also return all of the properties for the resource.
The client may also ask for all of the resource’s properties along with all of its children’s properties up to a given depth. The depth may be 0, 1, or infinity. An infinity depth returns the properties of all resources under the named resource.
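A request of the kind just described might be assembled as in the following sketch, which uses the JDK's java.net.http classes (Java 11 or later) rather than the Slide client library discussed later in the chapter. It only builds the request and does not send it. The PropfindRequests class, its method names, and the server URL in the example are hypothetical; the Depth header and the propfind/prop XML body follow RFC 2518.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: building (not sending) a PROPFIND request for named DAV: properties. */
final class PropfindRequests {
    /** Builds the XML body that asks for specific properties in the DAV: namespace. */
    static String propBody(String... davPropertyNames) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        xml.append("<D:propfind xmlns:D=\"DAV:\"><D:prop>");
        for (String name : davPropertyNames) {
            xml.append("<D:").append(name).append("/>");
        }
        xml.append("</D:prop></D:propfind>");
        return xml.toString();
    }

    /** depth must be "0" (resource only), "1" (plus direct children), or "infinity". */
    static HttpRequest propfind(URI resource, String depth, String... davPropertyNames) {
        if (!depth.equals("0") && !depth.equals("1") && !depth.equals("infinity")) {
            throw new IllegalArgumentException("Depth must be 0, 1, or infinity");
        }
        return HttpRequest.newBuilder(resource)
                .method("PROPFIND", HttpRequest.BodyPublishers.ofString(propBody(davPropertyNames)))
                .header("Depth", depth)
                .header("Content-Type", "text/xml; charset=utf-8")
                .build();
    }

    public static void main(String[] args) {
        // The URL is a placeholder for wherever your WebDAV server is mounted.
        HttpRequest request = propfind(URI.create("http://localhost:8080/slide/files/"),
                "1", "displayname", "getcontentlength");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A depth of 1 is usually enough for a directory listing, since it returns the collection and its immediate children in one round trip.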
If your WebDAV client needs to browse through content in the server’s repository, the PROPFIND method is useful for determining what the current resource’s properties are and the names and types of resources under the current resource. You can easily create a directory listing style interface. If you use a depth of 1 or infinity, your application could cache the properties to improve performance.
PROPPATCH
Client applications use the PROPPATCH method to create, modify, or remove properties on a resource. You can both set and remove one or more properties in a single request. The PROPPATCH method is an all or nothing proposition: if any of the requests to set or modify a property fail, none of the requests will be permanent. Any changes before the failed request will be set back to the way they were. If a PROPPATCH call fails, you will get an error message back explaining what the problem was.
MKCOL
The MKCOL method creates a new collection resource. The request’s path should not already exist. If the path does exist, the MKCOL method will not work. Another condition to consider is that the specified path’s parent collections must already exist, because only one collection will be created.
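Because each MKCOL call creates a single level, a client that wants a nested path has to issue one call per level, starting from the top. The sketch below only computes that order; it does not talk to a server, and the CollectionPaths class and mkcolOrder() method are hypothetical names. A real client would presumably skip any level that already exists, since MKCOL fails on an existing path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the order in which nested collections would have to be created. */
final class CollectionPaths {
    /** Returns every ancestor collection of the path, topmost first, ending with the path itself. */
    static List<String> mkcolOrder(String path) {
        List<String> order = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // ignore the leading slash and any doubled slashes
            }
            current.append('/').append(part);
            order.add(current.toString());
        }
        return order;
    }

    public static void main(String[] args) {
        // Each printed path corresponds to one MKCOL request, in this order.
        for (String collection : mkcolOrder("/docs/2004/reports/")) {
            System.out.println("MKCOL " + collection);
        }
    }
}
```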
DELETE
The DELETE method removes the non-collection resource at the specified path. If the resource is a collection, the DELETE method will remove the collection and all resources under the collection. This is a very powerful method.
PUT
The PUT method creates a new resource at the given path, or it replaces the contents of the existing resource. The PUT method works only for non-collections. If you need to create a new collection, use the MKCOL method.
Slide WebDAV Client Library
The client distribution for Apache Slide includes a command-line client and a WebDAV client library. We will use both to build a portlet that communicates with a WebDAV server. Both the WebDAV client library and the command-line client are open source projects licensed with the Apache Software License. The Java WebDAV client API is straightforward, once you understand the basics of the WebDAV protocol.
The WebDAV client library is in the jakarta-slide-webdavlib-2.0.jar file. You need to copy that file into your WEB-INF/lib folder. The source distribution of the Slide client includes the source code of the command-line client. The command-line client is the best source for information on how to use the WebDAV client library, so we used its source code as a model for our portlet.
The WebDAV client library classes are in the org.apache.webdav.lib, org.apache.webdav.lib.methods, and org.apache.webdav.lib.properties packages. Your application will create an org.apache.webdav.lib.WebdavResource object that represents a resource on the remote server. You will need the URL of the remote resource, along with the username and password (if necessary). If the resource does not exist, you can create it once you have the object.
Once you have an object that represents a resource, you can execute the WebDAV methods we discussed in the previous section. The WebDAV methods are available directly on the WebdavResource object (mkcolMethod(), moveMethod(), deleteMethod(), etc.). Other methods on the WebdavResource object use WebDAV indirectly, or return information that already exists on the object. These include list(), listBasic(), and listWebdavResources(). Each of these methods returns information about the child resources of the current resource if the current resource is a collection. The list() method returns a String array of pathnames to the children. The listBasic() method returns the child resources’ path names, content length, either collection or a content type, and the last modified date. This information is stored in an array for each resource, and then each array is stored in a Vector. If you would like to get an array of WebdavResource objects that represents each of the child resources in the collection, use the listWebdavResources() method.
Use your operating system’s built-in WebDAV support to add some files and folders to the Slide content repository. Documentation for both Windows and Mac OS X is available on the Slide web site. You can also use the command-line Slide WebDAV client. Our portlet only allows content browsing and viewing, although we could certainly add more file management support.
We are using several open source libraries for our WebDAV portlet. The first is the WebDAV client library we discussed previously. That library is packaged in the jakarta-slide-webdavlib-2.0.jar file. It also requires the Jakarta Commons HTTP client library and the Jakarta Commons logging libraries. The correct versions are in the Slide client binary distribution. The other libraries we will use are the JSP Standard Tag libraries, for our JSP files. We use the Jakarta Taglibs Standard Tag Library, version 1.0.4 (the same as Chapter 5). We will need the jstl.jar and the standard.jar files. All of these libraries should go in your WEB-INF/lib directory.
Our portlet will display the available resources for a collection, or it will display the contents of a noncollection resource. If you select a collection, the portlet will update its internal pointer to a WebDAV resource, and then display the resources in the collection. If you select a file, the portlet will retrieve its contents as a string, and then display them in the portlet window. You may also navigate back up the hierarchy with the parent folder link at the bottom of the page.
We created one portlet class, CMSPortlet.java. It responds to action requests and render requests. The WebDAVHelper class encapsulates the WebDAV functionality. WebDAVHelper is a bridge between the portlet and the client library, and it includes some utility methods. Our JSP file, ListFiles.jsp, uses the portlet and standard JSP tag libraries to display the resources for a collection.
The CMSPortlet class initializes itself from the initialization parameters in the portlet deployment descriptor. Three parameters, URL, username, and password, contain the connection information for the WebDAV server.
The doView() method looks at the current WebDAV resource to determine if it is a collection. If it is a collection, it dispatches the request to the ListFiles.jsp page. If it is not a collection, it asks for the contents of the resource as a String, and displays them in the portlet output. We could also have created links in the JSP file that would show the contents of the resources in a new window if the resource was an image, PDF file, or another binary file.
The processAction() method looks at the COMMAND parameter and then performs an action based on the command. All of our commands change the current WebDAV resource.
package com.portalbook.portlets;

import java.io.IOException;
import java.io.Writer;

import javax.portlet.*;

import org.apache.webdav.lib.WebdavException;
import org.apache.webdav.lib.WebdavResource;

public class CMSPortlet extends GenericPortlet
{
    // Request parameter names and command values shared with ListFiles.jsp
    public static final String COMMAND = "COMMAND";
    public static final String CHANGE_COLL = "CHANGE_COLLECTION";
    public static final String DISPLAY_CONTENT = "DISPLAY_CONTENT";
    public static final String DISPLAY_PARENT = "DISPLAY_PARENT";
    public static final String PATH = "PATH";

    WebDAVHelper helper;

    // Opens the WebDAV connection from the portlet's init parameters
    public void init(PortletConfig config) throws PortletException
    {
        super.init(config);
        helper = new WebDAVHelper();
        try
        {
            String url = config.getInitParameter("URL");
            String username = config.getInitParameter("username");
            String password = config.getInitParameter("password");
            helper.openURL(url, username, password);
        }
        catch (IOException e)
        {
            System.out.println(e.getMessage());
            e.printStackTrace();
            throw new UnavailableException(e.getMessage());
        }
    }

    protected void doView(RenderRequest request, RenderResponse response)
        throws PortletException, IOException
    {
        response.setContentType("text/html");
        Writer writer = response.getWriter();
        PortletContext portletContext = getPortletContext();
        WebdavResource resource = helper.getResource();
        request.setAttribute("resource", resource);
        System.out.println("name: " + resource.getName());
        if (resource.isCollection())
        {
            // Collections are rendered as a listing by the JSP
            PortletRequestDispatcher prd =
                portletContext.getRequestDispatcher(
                    "/WEB-INF/jsp/ListFiles.jsp");
            prd.include(request, response);
        }
        // The non-collection branch, which writes the resource contents
        // to the portlet output, is not shown in this listing.
    }

    // Every command moves the helper's current resource to a new path
    public void processAction(ActionRequest request, ActionResponse response)
        throws PortletException, IOException
    {
        String cmd = request.getParameter(COMMAND);
        System.out.println("Command: " + cmd);
        if (CHANGE_COLL.equals(cmd))
        {
            String path = request.getParameter(PATH);
            if (path != null)
            {
                System.out.println("path: " + path);
                try
                {
                    helper.setPath(path);
                }
                catch (WebdavException e)
                {
                    System.out.println(e.getMessage());
                    e.printStackTrace();
                }
            }
        }
        else if (DISPLAY_CONTENT.equals(cmd))
        {
            String path = request.getParameter(PATH);
            if (path != null)
            {
                System.out.println("path: " + path);
                try
                {
                    helper.setPath(path);
                }
                catch (WebdavException e)
                {
                    e.printStackTrace();
                }
            }
        }
        else if (DISPLAY_PARENT.equals(cmd))
        {
            String path = request.getParameter(PATH);
            if (path != null)
            {
                System.out.println("path: " + path);
                try
                {
                    helper.setPath(helper.getParentPath(path));
                }
                catch (WebdavException e)
                {
                    System.out.println(e.getMessage());
                    e.printStackTrace();
                }
            }
        }
    }
}
WebDAVHelper.java
The WebDAVHelper class is a utility class we created to simplify our interactions with the WebDAV client library. We have several methods to make working with paths easier because the slashes can become tricky.
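To show the kind of slash handling we mean, here is a small standalone sketch. It is not the code of the helper's own path methods; the PathUtil class and its normalize() and parent() methods are hypothetical, and the rules they apply (one leading slash, no doubled slashes, no trailing slash except for the root) are our assumptions about what a tidy path looks like.

```java
/** Hypothetical sketch of slash handling for repository paths. */
final class PathUtil {
    /** Ensures one leading slash, collapses repeated slashes, and drops a trailing slash. */
    static String normalize(String path) {
        String p = ("/" + path).replaceAll("/+", "/");
        if (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /** Returns the parent collection of a path; the root is treated as its own parent. */
    static String parent(String path) {
        String p = normalize(path);
        int lastSlash = p.lastIndexOf('/');
        return lastSlash <= 0 ? "/" : p.substring(0, lastSlash);
    }

    public static void main(String[] args) {
        System.out.println(normalize("slide//files/"));
        System.out.println(parent("/slide/files/docs/"));
    }
}
```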
The openURL() method creates a new HttpURL object that represents the URL to the WebDAV server. Because our WebDAV server is protected with HTTP authentication, we set the user info on the HttpURL object with the user’s username and password.
package com.portalbook.portlets;
import java.io.*;

import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpURL;
import org.apache.webdav.lib.WebdavException;
import org.apache.webdav.lib.WebdavResource;

public class WebDAVHelper
{
    private WebdavResource resource = null;

    protected void openURL(String uri, String username, String password)
        throws HttpException, IOException
    {
        HttpURL url = new HttpURL(uri);
        if (resource == null)
        {
            url.setUserinfo(username, password);
            resource = new WebdavResource(url);
        }
        else
        {
            resource.close();
            resource.setHttpURL(url);
        }
    }

    protected void setPath(String path) throws WebdavException
    {
        try
        {
            String collPath = fixPath(path);
            resource.setPath(collPath);
            if (!resource.exists())
            {
                throw new WebdavException("Path does not exist.");
            }
        }
        catch (Exception e)
        {
            throw new WebdavException(e.getMessage());
        }
    }

    protected String getParentPath(String path)
    {
        path = fixPath(path);