Cocoon: Building XML Applications
Friday, 13 July 2007
Carsten Ziegeler, Matthew Langham

Cocoon: Building XML Applications is the guide to the Apache Cocoon project. The book contains the much-needed documentation on the Cocoon project, but it does not limit itself to just being a developer's handbook. The book motivates the use of XML and XML software (in particular open source software). It contains everything a beginner needs to get going with Cocoon as well as the detailed information a developer needs to develop new and exciting components to extend the XML publishing framework. Although each chapter builds upon the previous ones, the book is designed so that the chapters can also be read as individual guides to the topics they discuss. Varied "hands-on" examples are used to make the underlying concepts and technologies absolutely clear to anyone starting out with Cocoon. Chapters that detail the authors' experience in building Internet applications are used to embed Cocoon into the "real world" and complete the picture.

Also see: Chapter 6: A User’s Look at the Cocoon Architecture  (864KB, .PDF)

Copyright New Riders. Used with permission.

Chapter 11: Designing Cocoon Applications

The previous chapters discussed how Cocoon provides a complete XML platform for building applications. We looked at how solutions developed with Cocoon can meet the challenges facing today's modern application architectures. We also presented some examples for small applications and built a personalized news portal using Cocoon concepts and technologies.

Cocoon is not a platform specifically aimed at only one application area, such as a portal. Cocoon can be used to build a variety of applications and solutions. Because we have been using Cocoon as a base for the paid work we do, we have built web sites and portals and have also used Cocoon to build front ends for databases, XML processing systems, and integration systems for different hosting environments, such as those used for Application Service Providing (ASP).

It is our experience that learning to use Cocoon to build these types of applications takes time, because the philosophy behind the solution is different from the way Internet applications are commonly built: with server-side scripting technologies such as Active Server Pages and JSP, or with dedicated software solutions built using servlets or other components.

We have included this chapter to provide additional background information and tips that we hope will help you if you want to develop a more advanced Cocoon application, such as an Internet portal. A lot of this information will not be completely new if you have read through the book. However, we have often heard people say, "There is so much in Cocoon. What do I actually need if I want to build a certain type of application?" The aim of this chapter is to provide this information in a different context so that you can then go back to where we originally explained it for the full details.

Before getting into the different types of applications you can build with Cocoon, we will start with some general points that are important when designing any type of software solution. Although this might seem to be a long list of things to think about, remember that you will probably only need to look at individual points when you start building real applications, such as a new Internet portal for a major client. That being said, it is always a good idea to start with a concept of what you will do.

The Application Concept

Few people can cook exotic meals without a recipe. The recipe gives you an idea of what the result will look like, what ingredients you need, and how you should prepare the dish. Using a recipe as a concept for your meal is common sense.

When you build an application with Cocoon, a concept that includes the points discussed in the following sections helps you plan your solution and prevents you from making some of the more common mistakes. The following sections define the system functionality, the application architecture, and aspects such as performance and presentation design.

While thinking about these points, you can also try to work out which of the described Cocoon technologies will be important for what you want to do and whether you perhaps need to write additional components. We will also provide some guidance for the times when, even after you've done all this, things still don't work as you expected. We will start with probably the most common question asked of any application: "What's it supposed to do?"

General Functionality

The first step is to define the functionality of the system you will build. Most systems built with Cocoon publish data in some way. In addition, there might be functions that allow the user to interact with the application in some form. Depending on the type of application, it might be necessary to define areas of information that are then combined into the complete application.

As an example, imagine that you are building a web site application for an imaginary company that produces Rewinders (don't ask us what these are; it's imaginary). Obviously you need functions that allow general information about your firm to be published. However, you have several different areas of information you want to publish, so you need to structure the application. Here are a few areas you might want to define:

  • General information about Rewinders

  • News about the company (Rewinder Inc.)

  • Industry news

  • Products offered

  • Jobs

  • Information for employees only

If you check out some company web sites, you will see that most have this sort of structure. Each area in your web site will have subareas that provide more detailed information. An area such as "Products" will contain all the different types of Rewinders that are offered. The section called "Employees Only" will provide special information about upcoming Rewinders. This information should be available only to someone who has logged on to the system.

After you have designed the application's structure, it is time to think about any interactive components you might need. Perhaps you will need an application form in the "Jobs" area or a feedback form in the "Products" area. In addition, you will need some form of login page for the "Employees Only" area. You also want to know when someone looks at the new "Cool Blue Rewinder," so you specify that you want an email to be sent when that document is viewed.

Depending on the type of application you are building, you might need only publication functions. If your solution is aimed more at processing information, you need more functions that allow interaction with your system.

After you have set up the application's structure, you must work out how navigation through the system is possible. After someone enters the "Products" area, what other areas can he access from there? What happens if he accesses the "Employees Only" area? Working out the navigation and flow can be one of the most time-consuming jobs when designing the application.

A typical application will be a combination of published data and data that flows from the user to the application. After you have defined the site's structure, it is time to think about the content.

The data you want to publish must come from somewhere. Either it is already stored in a file or database, or it will be obtained from external sources at runtime. An area such as "Jobs" will access the current job openings from a database. An application area such as "Industry News" will probably access a news provider to obtain news about the current state of the Rewinder industry. The authentication data is also probably contained in a database. You will need access to it to check such things as the password when a user wants to access the "Employees Only" area.

As soon as you know where the data you want to publish comes from, you need to determine what format it is available in. Of course, it is ideal if the data is supplied in an XML format.

Next you need to define your output formats. Your imaginary company wants to publish its web site in HTML first. In addition, some of the documents are to be in PDF, and you want to offer product descriptions in WML.

Notice that we have not yet talked about a specific technology. Indeed, a first concept does not require any knowledge of how you will realize your application. As soon as you have the concept in place, you can decide which technology to use (in this case, Cocoon) and then move on to defining the actual system architecture.

Application Architecture

After you have defined and documented the points just discussed, you can start building the actual architecture for your application using Cocoon. You need to define the various documents you want to publish through the web site and work out what sort of pipelines you need in order to generate the different formats.

Here are some of the types you need for Rewinders Inc.:

  • Pipelines that obtain data from a file and format that data in HTML or WML (depending on the browser)

  • An additional pipeline that sends an email if a particular document is chosen (such as the Cool Blue Rewinder)

  • Pipelines that access data from a database and format it in PDF (for online product handbooks)

  • A pipeline that accesses online industry data and formats it in HTML

  • A pipeline that receives the incoming forms data from the feedback form and saves it to a database

As soon as you have laid out the types of pipelines you need and have decided how many of each you require, you might need to think about splitting them between sub-sitemaps to ease maintenance of the complete site. Another alternative might be to use content aggregation to combine separate pipelines into a single pipeline that is then formatted for output.

Because a complete application architecture is seldom confined to just one area, such as what you build with Cocoon, you also need to think in advance about things such as bottlenecks that might occur when you roll out your solution:

  • What would happen if all 30,000 employees accessed the "Employees Only" page at the same time?

  • How will the system react when all the customers hit the "Cool Blue Rewinder" page at exactly the same time?

  • Will the email system be able to cope with all the emails?

These are the sorts of questions you should ask yourself while designing the application architecture. This brings up one of the most important aspects of such a system: performance.

Performance and System Environment

We have been building Internet applications for quite a few years, and it is our experience that one of the most common problems is that after it is installed, the solution is always too slow. This is something that all applications suffer from, as you can see in the various discussion forums of any software product.

This does not always mean that the programming is sloppy (although perhaps often it is). There is often a great difference between the speed an application can actually achieve and the performance that the end user perceives. Also, a system that performs well when only one user accesses it might collapse if several users send requests at the same time.

A system that integrates many different data sources might suffer from bad performance even though the actual portal application might be fast enough. A portal's speed is defined to a great extent by the speed at which data from external systems is delivered. So a portal will be slow if one of the data sources is slow. Unfortunately, no one will care that it is not your fault if it takes minutes for the portal to appear in the browser.

When designing a complex software solution, it is always best to define performance expectations beforehand and to test for performance as early as possible. This sounds simple, but this point is often forgotten until it is too late. If nobody takes the time at the beginning of the project to define the expected performance, the system will always be too slow. It is a lot more difficult to correct performance problems after the solution is in a production environment.

When we installed our first online Internet banking solution, very few people accessed their accounts via the Internet. The application worked well and delivered the web pages quickly enough. However, no real stress testing was performed at the beginning, so we did not really know how many requests our system could handle. Over the months that the application was in use, the number of requests grew slowly but steadily. Still, no stress testing was done. After all, the system ran OK, didn't it? Then, for some strange reason, the number of people using the system suddenly exploded overnight! Needless to say, the whole system collapsed under the load. It was far worse having a nonfunctional system in this situation than it would have been when Internet banking was still an exotic application.

How do you define a system's expected performance? It depends on what the system is supposed to do. The first thing you can do is check out the data sources and decide what sort of performance you can expect from them. If you are integrating standard data sources (such as a standard database), you can often obtain performance data from the vendor. Get that data, but take in the information with a grain of salt. To really check, you need to run your own isolated tests against the single system if you can. It is much more difficult to find bottlenecks after everything is integrated.

Before you start evaluating the performance of individual systems, make sure you also define your computing environment. What's the good of testing the system on some high-powered system if it will actually be running on a low-end box? Also make sure you test on the same operating system and using the same hosting software (such as a servlet engine). The servlet API might be standardized, but in reality you will find that life is not so simple. And it is a lame excuse to say, "We didn't test on that system" when a complaint comes in.

Another way to find out what to expect from your system is to check out other solutions that might do the same thing you are planning on doing. See how fast they run, and try to obtain some information on how they work. Check out case studies, often published on web sites, to find out the architecture used to build the application. You might also be able to find out by asking whoever built the system.

As soon as you are satisfied that you know what to expect of your application, here are some tips on what you can use in Cocoon to achieve the fastest possible application:

  • Use the built-in Cocoon caching whenever possible when building your pipelines.

  • If you need to write your own components, make sure they support the caching interfaces in Cocoon if possible.

  • Stress-test your application using an available tool, and observe how the performance changes if you adjust the pooling of Cocoon components.

  • Make sure you are running your application with the lowest level of trace, where only errors are logged.
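To make the stress-testing tip more concrete, here is a minimal sketch of what such a test measures. It is not a replacement for a dedicated stress tool, and it does not use any Cocoon API. The names `stress_test` and `percentile` are our own, and `handler` is a stand-in for whatever issues one request to your application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stress_test(handler, total_requests, concurrency):
    """Call handler() total_requests times using `concurrency` worker threads.

    Returns the per-request latencies in seconds, sorted ascending.
    """
    def timed_call(_):
        start = time.perf_counter()
        handler()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    return sorted(latencies)


def percentile(sorted_values, fraction):
    """Simple index-based percentile of an ascending list (fraction in 0..1)."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]
```

Running the same test with different `concurrency` values, before and after a change to the component pooling, shows whether the change helps under load. Looking at a high percentile rather than the average makes it easier to spot the requests that users will actually complain about.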

Another piece of advice when writing components that connect to a specific data source (especially if it is not your data source) is to make sure you add a time trace. In other words, trace when you connect to the external data source, and trace when the data is returned. That is the time someone else has to worry about.
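The idea of a time trace can be sketched as follows. This is an illustration only, written independently of Cocoon's logging facilities. The names `time_trace` and `fetch_headlines` are hypothetical, and the list used as a log is a stand-in for a real logger.

```python
import time
from contextlib import contextmanager


@contextmanager
def time_trace(source_name, log, clock=time.perf_counter):
    """Record how long the enclosed call to an external data source takes."""
    start = clock()
    try:
        yield
    finally:
        # Logged even when the call fails, so slow failures are visible too.
        log.append((source_name, clock() - start))


def fetch_headlines():
    """Stand-in for a request to an external news provider (hypothetical)."""
    return ["Cool Blue Rewinder launched"]


trace_log = []
with time_trace("news-feed", trace_log):
    headlines = fetch_headlines()
```

Each entry in `trace_log` pairs the name of the data source with the elapsed time, which is exactly the figure you need when you have to show where the time went.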

If, after testing with stress tools, you find that your system performance is not good enough, you will want to look into what else you can do to improve the response time. Obviously it is a good idea to make sure the system has enough memory and the processor is fast enough. If you are running in a servlet environment, you might want to try an alternative servlet engine to see if you can get better performance.

You might also want to look into front-side and back-side caching. A front-side cache is placed between Cocoon and the Internet. Any client program requesting a particular document receives it from the cache, not from Cocoon itself. The cache can store the complete document and request it from Cocoon only if it has expired. Cocoon then generates the new document and serves it to the cache to be stored. Look into how you can control the expiration of generated documents using the appropriate HTTP headers in your documents. There are several ways of doing this. For example, the Cocoon reader component allows you to set HTTP headers. Another way is to write your own component, such as an action that sets headers when used in a pipeline.
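As a rough illustration of the HTTP headers involved, the following sketch builds the two standard headers a front-side cache looks at when deciding how long to keep a document. It does not show how a Cocoon component sets them. The function name `expiration_headers` and the choice of a one-hour lifetime in the usage note are assumptions made for the example.

```python
from email.utils import formatdate


def expiration_headers(max_age_seconds, now_epoch):
    """Build response headers that let a front-side cache keep a document.

    now_epoch is the generation time as seconds since the Unix epoch.
    """
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
    }
```

Calling `expiration_headers(3600, time.time())` would mark a document as valid for one hour. A short lifetime suits pages such as industry news, while product handbooks can usually be cached for much longer.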

If you are accessing an external data source that is too slow, you might need to implement a back-side cache. This type of cache sits between Cocoon and the external data source. The pipeline requests the data from the cache, not from the data source itself. There are various ways of implementing the cache. You can look at the description of how Cocoon caches pipelines to get some ideas on how to implement your own.
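One of the simplest ways to build such a cache is to keep each result for a fixed time. The sketch below shows that idea only. It is not how Cocoon caches pipelines, and the class name `BackSideCache` and its parameters are hypothetical.

```python
import time


class BackSideCache:
    """Keeps results from a slow data source for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the real source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the data source
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The component that used to query the data source calls `get` instead. The sketch is not thread-safe and never evicts old keys, so a production version would need locking and a size limit.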

It is a good idea to provide the user with some visual feedback to show what is going on. If the user cannot see anything happening on the screen, he will perceive system performance as being too slow, even though it might not be. One way of doing this is to load an intermediate page that says something like "Please wait; your data is being fetched" and then let this page call the function on the server that fetches the data. Presenting the user with something to read while the work goes on in the background means that by the time the user has finished reading, part or all of the data will have been retrieved. Look into redirects and meta tags to do this if you are building a site in HTML.

When designing HTML web sites, one of the mechanisms used most often is frames. Although this is not a book on magical HTML design, here's a piece of advice: Remember that each frame in a frameset causes a new request to be sent to the server. So if you have a page containing four different parts (header, footer, navigation, and actual content), that is a total of five requests to the server (one for the frameset document and one for each of the four frames) and five pipeline calls in Cocoon. Try to reduce the use of frames if possible. One way is by using Cocoon's content aggregation to aggregate the different parts of a page and then use a stylesheet to format the output.
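The effect of aggregation can be illustrated outside Cocoon with a few lines of code. The sketch below is not Cocoon's aggregation mechanism. It only shows the principle of wrapping separately produced XML fragments in one document, so that a single request and a single stylesheet can produce the whole page. The function name `aggregate` and the element names are assumptions.

```python
import xml.etree.ElementTree as ET


def aggregate(root_name, parts):
    """Wrap separately produced XML fragments in a single root element."""
    root = ET.Element(root_name)
    for xml_text in parts:
        root.append(ET.fromstring(xml_text))
    return root


page = aggregate("page", [
    "<header>Rewinder Inc.</header>",
    "<navigation><link>Products</link></navigation>",
    "<content>Cool Blue Rewinder</content>",
    "<footer>Contact us</footer>",
])
```

The resulting `page` element contains all four parts in order, ready to be formatted in one step instead of being fetched through five separate requests.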

In addition to the tips just discussed, there are further areas you will want to check when you design the output format, which brings us to presentation.

Presentation

Most applications have some form of presentation. Because presentation in Cocoon is done using XSL stylesheets, you need a working knowledge of this technology to be able to author your presentation. You will also want to look at tools that help you author stylesheets.

One of the major steps is deciding what presentation format you need. Of course, the advantage of Cocoon is that you can add further types of presentations by adding stylesheets as you need them. However, this should not keep you from planning your presentation carefully.

Decide whether you want to support each client application (such as the different browsers) individually or whether you want to go for a format that suits all of them. Be aware that by the time you have finished your application, a yet-unknown browser might be the market leader.

Design your presentation for speed. This point is not necessarily limited to Cocoon applications, but it is worth stressing. If you plan on presenting your data in HTML, make sure you follow the guidelines as to how you should construct HTML pages for maximum speed when you author your stylesheets. This can depend on the browser type, so refer to available information on this subject.

Make sure you follow the Cocoon paradigm of separating concerns. Even though Cocoon offers you ways of splitting layout and content, it does not force you to. We have seen Cocoon applications built where XHTML was used as the format for the data. This might seem like a good idea to start with (after all, XHTML is an XML format), but imagine trying to then provide a presentation layer in WML. As mentioned in Chapter 2, "Building the Machine Web with XML," extracting the actual data from a format like XHTML is quite difficult.

Decide whether your presentation is static or whether it offers personalization of some sort. Check out the later section "Portals" for more information on using personalization to influence the output of your application.

Think about seasonal changes to your presentation. Make your application interesting by making small changes to the web site's appearance, depending on the current season. For example, you could give your site a Christmas feeling during November and December. Write a component such as a selector that provides you with this information.

If you already have HTML pages that you want to reuse in your Cocoon application, this is also possible. You would use the HTML generator to read the HTML and then have a stylesheet format the XHTML into the format you require. This is a way of easing the migration path to a complete XML/XSL-based solution. Another way of migrating is to have the Cocoon solution run in parallel to the application you already have. Cocoon can then generate parts of your site for you. Any new HTML pages can be authored using stylesheets, and the existing site can be served as before.

Even though you might have authored your HTML documents using stylesheets, there will be times when you need to include technologies such as JavaScript in your pages. Another technology that is often used with HTML is Cascading Style Sheets (CSS). CSS is often used to achieve dynamic look-and-feel changes on HTML pages. All of this can be used (or reused) in a Cocoon environment. The sitemap must be configured to allow the JavaScript (.js) files and the CSS (.css) files to be served through Cocoon. Look into using a reader to do this. Alternatively, these files can be served directly from the web server.

It's possible to use other technologies inside your web pages in the same way. You can use Java applets inside web pages by using the appropriate tags to include them inside the generated HTML pages. Just make sure your .jar file can be served either through Cocoon or directly.

While someone is working on the presentation side of the application, someone else can be defining the content.

Know Your Content

We have already mentioned that the XML parser used in Cocoon can validate the XML data it parses. It can do this using DTDs or XML Schemas. When building the application, you will probably not yet have a DTD for all your data. This means that you cannot use XML validation in Cocoon, because you can only activate it for all the documents, not for an individual one. Even if you do not use the parser to validate the data, you should document your XML using either a DTD or an XML Schema before moving the application into a production environment. (Of course, the earlier the data's format is documented, the better.)

As more and more XML tools come onto the market, they begin to offer advanced features such as automatically validating the data you enter into, say, an editor. Now, suppose you have a Cocoon-based system and have authors who are writing content for that system. Often, they will use third-party tools to do this and then upload the content to the system or deploy it through some other means (perhaps saving it to a database). Obviously it is ideal if you can provide these authors with a DTD of the data. They can then use the DTD inside their editing program, and you know that the data they submit will be in a format you expect and have written stylesheets for.

While the designers are working on developing the stylesheets that will present the data, that data also needs to be defined and documented.

Document Your Data Sources

We talked briefly about external data sources when we discussed application performance. However, other factors also need to be taken into account when data is obtained from an external provider, such as a news feed.

Obviously, the most important fact is that you know exactly what format the data will be in. The best way to achieve this is if the data's format is documented in some way, such as in a DTD. You read about the various ways of documenting XML data in Chapter 2. It is an enormous advantage if your provider can send you the data in a standardized format. This becomes a great time-saver if you have to integrate several sources and they all can provide the data in the same format. It will then be possible to reuse the stylesheets. This is true of the news providers we looked at when building the Cocoon news portal in this book. Because the news is provided in RSS format, you could use the same stylesheet for several different feeds.

When designing the flow of data through your application, you need to consider two important points. The first point is the internal data definition. As shown in Figure 11.1, this is the format of the news data in your application. Every external data format needs to be converted into this format, so you need a stylesheet for every data source that delivers a different format. Obviously it makes sense to choose a standardized format as your own internal format. This reduces the number of transformations you need, because any external source that already supports your internal format does not need a stylesheet transformation.

The next step is to define a logical layout format. News data is not normally structured for presentation, so you need to think about defining a format that allows transformations into the end format, such as HTML or PDF. If your application is not limited to publishing just news data, but it also publishes other types of information, you will want to look into defining a logical layout format that is not data-specific. This lets you easily publish different types of data using the same stylesheets.

If you opt to use a standard format such as WML or XHTML as your logical layout format, make sure you will still be able to convert this format into a different layout, as shown on the right side of Figure 11.1.

This concept leaves you with three different transition areas:

  • Incoming data must be transformed into your news data format.

  • The news format must then be transformed into the logical layout format.

  • The last area of transformation is into the regular output format.

Figure 11.1 Format transitions using stylesheets.
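The three transition areas can be sketched as three small functions chained together. In a Cocoon application each step would be a stylesheet. Here plain dictionaries and function names of our own choosing (`rss_to_news`, `news_to_layout`, `layout_to_html`, `publish`) stand in for the XML formats and transformations, and the field names are assumptions.

```python
from html import escape


def rss_to_news(item):
    """Incoming RSS-like item -> internal news format."""
    return {"headline": item["title"], "body": item["description"]}


def news_to_layout(news):
    """Internal news format -> data-independent logical layout."""
    return {"title": news["headline"], "paragraphs": [news["body"]]}


def layout_to_html(layout):
    """Logical layout -> HTML output format."""
    paragraphs = "".join(f"<p>{escape(text)}</p>" for text in layout["paragraphs"])
    return f"<h1>{escape(layout['title'])}</h1>{paragraphs}"


def publish(item, to_news, to_output):
    """Run one item through all three transition areas."""
    return to_output(news_to_layout(to_news(item)))
```

Supporting a new data source means writing only another `to_news` step, and supporting a new output format means writing only another `to_output` step. The middle step stays the same, which is the benefit of a logical layout format that is not data-specific.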

Check to see whether your data source is always online. Nothing is more embarrassing than finding out that your news provider is online only during the day when your news portal crashes the first night. Use appropriate selectors in the pipeline to ensure that you access the online server during the day and perhaps a database repository at night.
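The decision such a selector has to make is simple, as the following sketch shows. It illustrates only the logic, not Cocoon's selector interface. The function name, the two return values, and the provider hours of 8:00 to 18:00 are assumptions made for the example.

```python
from datetime import time as clock_time


def select_news_source(now, online_from=clock_time(8, 0), online_until=clock_time(18, 0)):
    """Pick the live feed during provider hours, the stored copy otherwise."""
    if online_from <= now < online_until:
        return "live-feed"
    return "database-repository"
```

A fixed schedule only helps if the provider keeps to it, so it is worth combining this with a fallback to the repository whenever a request to the live feed fails.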

Make sure you can obtain the data you need with the least number of requests possible. We have seen a Cocoon-based application built to present stock information in which one block of information (such as an overview page) required the middleware solution to perform more than 20 requests against the data provider. Even worse, most of these requests had to be sent in order, because they were dependent on each other. The problem isn't that this can't be done with Cocoon; it can. But if you remember the earlier tips on performance, perhaps you will see why this point is worth stressing.
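A little arithmetic shows why dependent requests hurt so much. The figure of 150 milliseconds per request below is an assumption chosen for illustration, not a measurement from the application described, and the function name `total_wait_ms` is our own.

```python
def total_wait_ms(latencies_ms, dependent):
    """Time spent waiting on the data provider for one page, in milliseconds.

    Dependent requests must run one after another, so their latencies add up.
    Independent requests could ideally be issued together, so the slowest
    one determines the wait.
    """
    return sum(latencies_ms) if dependent else max(latencies_ms)


overview_page = [150] * 20   # 20 requests at an assumed 150 ms each
```

With these numbers, `total_wait_ms(overview_page, dependent=True)` gives 3000 milliseconds before the first stylesheet can even start, whereas the same requests issued independently would cost about 150 milliseconds in the ideal case.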

After you've defined the functions your system should have, the layout you want to present to the user, and the data format that is to be the core of your application, you need to look at the Cocoon components you can use to do all this.

Different Technologies

As mentioned at the beginning of this chapter, Cocoon provides many ways of solving certain problems. People new to Cocoon are sometimes overwhelmed by the many possibilities. Often, only one type of component is used to solve a problem when perhaps a different solution would have been better. As an example, when starting out with Cocoon, we often found ourselves writing new transformers when it would have been better to use an action or selector instead.

Here are some tips on when to use what:

  • Using a given component is better than writing your own.

  • Use generators when you have an identifiable data source that can be used as the starting point for your pipeline.

  • Use transformers when you need to manipulate the XML data flowing through the pipeline.

  • Use actions and selectors to influence the pipeline if their results do not need to manipulate the output document.

  • Use an action if you want to execute a task that does not influence the XML processing pipeline.

  • Use a selector if you want to choose between different processing pipelines.

  • Use XSP for rapid development of a custom generator, and transform it later into a real generator.

This section has looked at a few aspects that are important when you design your Cocoon application. Performance is probably the key factor when the application is actually finished and installed. A well-thought-out concept is a necessary starting point for good design. "Program now; think later" is, in our opinion, not the way to build Cocoon applications. Unfortunately, even writing a great concept beforehand still might not prevent problems from occurring.

Solving Problems

So, you've written the concept, designed the architecture, written any needed components, and built the pipelines, and things still don't work as you expected. Here is a two-sentence answer to this problem:

  • Someone else has already solved your problem! All you need to do is find that person and solution.

Sounds simple, doesn't it? But for many cases, this is true. Problem solving has become easier with the Internet. When we first started using Usenet newsgroups (which were exchanged using UUCP back in those days), we could post our problems not just to our colleagues in Paderborn, Germany, but to the whole world! And the Internet has expanded this "knowledge base" so that now it is very probable that someone out there has already had the same problem you are trying to solve.

The Cocoon web site is a good starting place for finding information and help. There you can find mailing lists and archives of past list discussions. Chances are your question is there somewhere. Subscribe to the mailing lists and join the Cocoon community. Appendix C, "Links on the Web," lists links for the Cocoon web site.

Search engines are also a good choice when you are looking for a solution to your problem. However, if you query a search engine, you probably will be swamped with thousands of answers that don't really help. If you already know roughly the area your question applies to, perhaps checking one of the newsgroups is a better way to go. There are newsgroups for most of the subjects in this book, such as XML and XSL. However, there is as yet no newsgroup for Cocoon. Hopefully, you will be able to solve any problem that might arise using one of the listed methods.

Using the information discussed so far should allow you to complete your application concept and design the architecture of your solution, complete with the required Cocoon technologies. Even though most people who look at Cocoon and read this book will already have an exact idea of the type of application they want to build, it is always a good idea to see how other people are using the technology. The following examples might provide some additional ideas for the types of applications you can build with Cocoon.

Different Types of Applications

Cocoon lends itself to being used to build a variety of solutions. Although Cocoon is aimed primarily at the XML publishing sector, adding your own components lets you expand Cocoon into a complete middleware architecture.

In the past we have worked on building a commercial solution that provides additional (and sometimes customer-specific) components needed to provide a complete solution. We added components and functionality to Cocoon without throwing away a single Cocoon concept. This shows the extensibility of the architecture.

To give you some idea of what perhaps you can do to solve a specific problem, here are some of the extensions we have written to provide the various solutions we have built with Cocoon:

  • Components for authentication and user administration

  • Portal framework components

  • A complete XML/XSL-based content management system

  • Integration components for a commercial XML database

  • System management components

Although these components were not written as part of the Cocoon project, some of them will find their way back into Cocoon and hopefully will be available in the not-too-distant future.

Using Cocoon and the additional components allows you to build applications such as portals, flexible publishing systems, and web sites. Because Cocoon can process XML data, you can also build solutions that can receive complete XML documents as input and process them using pipelines.

Let's look at some of these application types in more detail. The most common Internet application is the web site, where information is published as HTML. This type of application becomes more complex to develop when the information is stored in external systems such as databases and when additional formats such as PDF are required. The web site needs to be extended into a network publishing application to provide these advanced capabilities. When several different types of users are accessing the system, some form of personalization is called for. The term portal is often used to describe this type of application. This chapter concludes with a look at how to use Cocoon to build portals.

Using Cocoon to Build Web Sites

One of the most common uses of Cocoon is as a system for building web sites. After all, that is its main function. Many web sites already use Cocoon; they are listed on the Cocoon web site. We discussed a web site example earlier in this chapter. Now we will add to the information that was discussed there.

Remember that Cocoon organizes a web site's content using a sitemap. Although it is possible to define a pipeline for each document your web site will serve, this would result in a sitemap that becomes very hard to maintain. Therefore, you need to define pipelines that can handle similar types of content, perhaps split into different areas. Look into how you can use wildcards in the sitemap as a method of combining several documents into one pipeline.
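To make the wildcard idea concrete, here is a minimal sketch in plain Python (not Cocoon code) of how one wildcard pattern can stand in for many per-document pipelines. It assumes the sitemap convention that `*` matches within a single path segment, `**` matches across segments, and `{1}`, `{2}`, and so on refer to the matched parts. The patterns, paths, and function names are hypothetical and chosen only for illustration.

```python
import re

def compile_pattern(pattern):
    """Translate a sitemap-style wildcard pattern into a compiled regex.

    Assumed convention: '*' matches within one path segment,
    '**' also matches across '/' separators.
    """
    parts = []
    # Split on the wildcards but keep them, so they are translated in order.
    for token in re.split(r"(\*\*|\*)", pattern):
        if token == "**":
            parts.append("(.*)")
        elif token == "*":
            parts.append("([^/]*)")
        else:
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")

def resolve(pattern, source_template, request_path):
    """Return the source path for request_path, or None if it does not match."""
    match = compile_pattern(pattern).match(request_path)
    if match is None:
        return None
    source = source_template
    # Substitute {1}, {2}, ... with the text matched by each wildcard.
    for index, value in enumerate(match.groups(), start=1):
        source = source.replace("{%d}" % index, value)
    return source

# One pattern serves every document in the (hypothetical) news area.
print(resolve("news/*.html", "content/news/{1}.xml", "news/today.html"))
print(resolve("news/*.html", "content/news/{1}.xml", "sports/today.html"))
```

The first call resolves to a source file, and the second returns `None` because the request lies outside the area the pattern covers. A second pattern per area is usually enough to keep the sitemap short.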

Make sure the layout developer (the author of the stylesheets) uses a tool that can perform XSL transformations on some sample data for that format. You should provide the author with sample data to use. It will be easier for him to test individual stylesheets this way instead of having to use Cocoon each time.

Another important point is to make sure the layout developers use a tool that either already uses the Xalan XSLT component or that lets you plug it in. If the tool allows a version of Xalan to be used, make sure you use the same version as the one in the Cocoon you will be running. Which tool is best suited for the job depends largely on exactly who will be using it and for what purpose. We have provided a list of relevant links to tools in Appendix C.

Although your first-version web site might only read its content from XML files and publish to a single format such as HTML, one day you will want to use something more advanced to store your data, such as a database. You might also need to integrate external systems such as mainframes into your application. In addition, there might be demand for additional formats as users use devices such as mobile phones to access your solution. The web site must therefore be extended into a network publishing application.

Network Publishing Applications

Although this is only a different way of defining something, we use the term publishing application to emphasize that the data you want to display is actually stored somewhere, and we don't mean in a file. A publishing system might generate reports from data that is obtained from a database, for example. It then might manipulate the data in some way, perhaps to generate different views and then publish that data in one or more formats.

Areas you will want to look into include the Cocoon components that allow you to access data from a database or external systems such as a remote XML server via HTTP. You will also want to learn more about standards such as XSL-FO. After it is formatted this way, your data can be laid out in different output formats, such as PDF or PostScript.

Publishing systems might be the first time you need to publish data that is dependent on the type of end device. For example, you could allow mobile phone users to access only the most important information while allowing browser users to access the full beauty of your web site.

In our experience, using Cocoon as a publishing system for specific data is an ideal way to introduce the technology into a new area. Applications such as a report generator, which reads data from a database, consolidates it, and then presents that information in HTML and PDF, can be built in an isolated fashion that does not intrude on given software structures. The first little application we built with Cocoon was a front end to an internal database we had at that time containing work reports. The solution read the data from the database dependent on a query parameter and then presented an overview of the data in the various formats. As a prototype showing what could be done with Cocoon and how flexible it was, this was an ideal solution.

Publishing systems might be the first time you also need to integrate something like user authentication and personalization, allowing only certain people to access the data. This brings us to the next application form: the portal.


Although you probably think of something like myYahoo or myAOL when the term portal is used, portals can actually be a lot simpler. We refer to this type of application whenever some form of user authentication is necessary to access information or when information can be individually personalized. This personalization can range from changing the color of a single document to configuring external news sources in a news portal.

In our portal example, built over several chapters, we have already seen how it is possible to build a portal using Cocoon. Nevertheless, and because we know that some readers might jump right to this section, we will go over some of the main points again and in a more general context.

In order for personalization to be possible, we need to be able to recognize the user when he accesses the portal. Most portals require some form of authentication, such as entering a user ID and password. This data is then matched against a repository, such as a database, and the user is rejected if there is no match. Each user therefore requires an entry in the database, and the application perhaps also needs to cater to an anonymous user (a user without a login). After the user is authenticated, the application will want to allow the user to access the different areas in the portal without having to log in again. Look into ways of creating a session when running inside a servlet engine in order to do this. It will also be necessary to recognize a returning portal user so that he does not have to log in each time he accesses some part of the portal. An appropriate action component can solve this problem.
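The authentication flow described above can be sketched as follows. This is a plain Python illustration under stated assumptions, not a Cocoon action: the `users` table, the salted PBKDF2 password hashes, the in-memory session dictionary, and all function names are hypothetical, since the text does not prescribe a storage scheme.

```python
import hashlib
import hmac
import os
import secrets
import sqlite3

def hash_password(password, salt):
    # Assumption: passwords are stored as salted PBKDF2 hashes, never as plain text.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def create_user(db, user_id, password):
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO users (user_id, salt, pw_hash) VALUES (?, ?, ?)",
        (user_id, salt, hash_password(password, salt)),
    )

def authenticate(db, user_id, password):
    """Match the submitted credentials against the repository."""
    row = db.execute(
        "SELECT salt, pw_hash FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return False  # no entry in the database: reject
    salt, pw_hash = row
    return hmac.compare_digest(pw_hash, hash_password(password, salt))

# Hypothetical session store: token -> user ID.
sessions = {}

def login(db, user_id, password):
    """Return a session token on success, or None if the user is rejected."""
    if not authenticate(db, user_id, password):
        return None
    token = secrets.token_hex(16)
    sessions[token] = user_id
    return token

def current_user(token):
    # Requests without a valid session are treated as the anonymous user.
    return sessions.get(token, "anonymous")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")
create_user(db, "alice", "s3cret")
```

Once `login` has returned a token, later requests only need `current_user(token)`, which is the role a servlet session plays when the portal runs inside a servlet engine.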

Another important step is to define the portal structure. What information will be available to the user after he has logged in? Will each user have an individual profile, or will the portal cater to only specific groups of users? As soon as this has been decided, a suitable XML format for the profiles can be defined. The profile should then contain information relevant to the personalization (such as colors) or to the individual preferences in regard to the types of information to be displayed.

Therefore, the first step of building the portal is to define where the user data and the portal profile are to be stored. Then the application needs to define and set up a pipeline in Cocoon for the authentication. One way of doing this is to have an HTML form send the user ID and password to Cocoon and then use the sql_transformer to select the user and profile from the database.

If the portal profile contains data on the types of information that are to be displayed, this information must be fetched and integrated into the profile so that it is complete before it reaches the stylesheet. Look into using content aggregation as a way of doing this. Each different data source will then return information that is added to the user's profile, so that the end result will be a complete portal in XML.
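As a rough illustration of this aggregation step, the following plain Python sketch fills a profile with content from each data source it names. The profile format, the `source` element, and the two fetch functions are hypothetical stand-ins; in a real portal the sources would be pipelines or external systems rather than hard-coded strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: layout preferences plus the sources the user wants.
PROFILE = """<profile user="alice">
  <layout color="blue"/>
  <source name="news"/>
  <source name="weather"/>
</profile>"""

def fetch_news():
    # Stand-in for a real news feed.
    return "<news><item>Cocoon released</item></news>"

def fetch_weather():
    # Stand-in for a real weather service.
    return "<weather city='Paderborn'>rain</weather>"

FETCHERS = {"news": fetch_news, "weather": fetch_weather}

def aggregate(profile_xml, fetchers):
    """Return the profile with each known source filled with its fetched XML."""
    profile = ET.fromstring(profile_xml)
    for source in profile.findall("source"):
        fetch = fetchers.get(source.get("name"))
        if fetch is not None:
            source.append(ET.fromstring(fetch()))
    return profile

portal = aggregate(PROFILE, FETCHERS)
print(ET.tostring(portal, encoding="unicode"))
```

The result is a single XML document that carries both the personalization data and the content, so one stylesheet can render the whole portal page.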

After the profile has been selected and all the data fetched from the various sources, the complete profile can then be transformed into a specific look and feel using a stylesheet. The stylesheet can access specific details contained in the individual profile and format the output as necessary.

If the personalization is based on the user who accesses the site, you need to define what types of information the user can change and how the presentation should be affected by, say, his age. If you will be providing a different layout for teenagers than for middle-aged people, you will need to define the criteria by which this can be decided. Writing a new component such as a selector is an ideal way of doing this.
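The decision logic behind such a selector can be very small. The sketch below is plain Python, not a Cocoon component, and the age ranges and stylesheet names are invented for illustration; the point is only that the criteria are written down explicitly, tested in order, and backed by a default.

```python
# Hypothetical criteria: (lowest age, highest age, stylesheet), tested in order.
LAYOUT_RULES = [
    (13, 19, "teen.xsl"),
    (40, 60, "classic.xsl"),
]
DEFAULT_LAYOUT = "default.xsl"

def select_layout(profile):
    """Pick a stylesheet from the user's profile; fall back to the default."""
    age = profile.get("age")
    if age is None:
        return DEFAULT_LAYOUT  # anonymous users or profiles without an age
    for lowest, highest, stylesheet in LAYOUT_RULES:
        if lowest <= age <= highest:
            return stylesheet
    return DEFAULT_LAYOUT

print(select_layout({"user": "alice", "age": 16}))
print(select_layout({"user": "anonymous"}))
```

Keeping the rules in a table rather than in nested conditions makes it easier to change the criteria later without touching the selection code.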

Think about whether you want to change the presentation dependent on other factors, such as the time of day or the weather. Say you are building a stock-quote portal and you present the current market chart (say NASDAQ) on your front page. After the NASDAQ closes for the day, it might be a good idea to present a different chart, such as from Asia. So if you want to switch content and presentation dependent on the time of day, look into the Cocoon selector component as a way of doing this.

If you are thinking about building a late-night portal, in which the presentation changes after a certain hour, remember that your user might be living in a different time zone, so it might be the middle of the day for him when you select the late-night presentation.
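Both time-dependent decisions can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not Cocoon code: the market is assumed to trade from 9:30 to 16:00 at a fixed offset of UTC-5 (daylight saving time and holidays are ignored), the late-night window of 22:00 to 05:00 is invented, and the user's offset is assumed to come from the profile.

```python
from datetime import datetime, time, timedelta, timezone

MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)

def select_chart(now_utc, market_offset_hours=-5):
    """Choose the front-page chart from the market's local time."""
    # Simplification: fixed UTC offset, so daylight saving time is ignored.
    market_now = now_utc.astimezone(timezone(timedelta(hours=market_offset_hours)))
    is_weekday = market_now.weekday() < 5
    if is_weekday and MARKET_OPEN <= market_now.time() < MARKET_CLOSE:
        return "nasdaq"
    return "asia"

def is_late_night(now_utc, user_offset_hours):
    """Decide on the late-night presentation from the user's local time."""
    local = now_utc.astimezone(timezone(timedelta(hours=user_offset_hours)))
    return local.hour >= 22 or local.hour < 5

now = datetime(2024, 1, 10, 22, 0, tzinfo=timezone.utc)  # a Wednesday
print(select_chart(now))
print(is_late_night(now, 0), is_late_night(now, -8))
```

The same instant counts as late night for a user at UTC but as early afternoon for a user at UTC-8, which is why the check uses the user's offset rather than the server clock.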


This completes this chapter on Cocoon application design. As we said at the beginning, you can build many different types of applications with the current version of Cocoon. Although Cocoon's main focus currently is on web sites, as more components are built that integrate into the Cocoon architecture, it will expand and become a platform for other types of applications as well.

This is one of the great advantages of using Cocoon as a base for XML applications. Because of the way new components can be easily added, there is really no limit as to how you can use Cocoon as the platform for your solution. As an open-source project, it has much support from individuals and companies. Several firms have donated components to the Cocoon project and in so doing have helped the software become better suited for application scenarios such as the network publishing system and portal described in this chapter. The next chapter outlines some of the directions Cocoon might go in as XML and XML applications become more widespread. It also provides some additional ideas as to where Cocoon can be used, perhaps in your particular environment.
