The legacy document-generation subsystem used Word to create the document: a template was created to get the layout and formatting right, with simple placeholders marking where the values taken from the XML data would be substituted. This approach actually loads Word into memory (on the server!) to do the work – it’s slow, memory-hungry and just generally clumsy and ugly. Plus you end up with a Word document, so the clinician needs Word (or the reader) to view it.
PDF is a more acceptable format, in my view. We can secure and digitally sign the document when we generate it to prevent subsequent changes. Recipients can view PDF on any platform with a free viewer. The problem for me was how to generate the PDF programmatically, from the XML data. There are probably several ways to do this, but I chose XSL-FO and the Apache FOP project mainly because I wanted to avoid using a proprietary PDF generator product (there are lots out there), but also since XSL-FO can do more than just generate PDF.
First problem: how do you create the FO that defines the ‘shape’ of the generated document? Of course, you could simply read through all the manuals and write one from scratch. Well, I’m just plain lazy, you see, and I don’t want to do all that. I want to take my nice OpenOffice document, or even Word document, and have a tool create the XSL-FO for that document. And I’d like the tool to be free (there are commercial tools of course, but I’m cheap). Does such a thing exist?
It does! Amazingly, precisely this facility exists in AbiWord: not my favourite word-processor by a long, long way, but a good solution for this particular problem. OpenOffice should be really good at doing this, as it stores documents natively as XML and already uses FO internally for some style information. But, despite some promising hints, there is no mature support for this. This is a real shame: this is just the sort of thing OOo should be capable of, especially as it’s apparently halfway there already.
Here’s what I managed to find on the OOo site:
- Some OpenOffice transforms and things.
- And this one talks about ooo2xslfo, but that project looks as if it’s just an SVN repository – not sure I have time to play with that.
- The OOo XML site has a page on XML filters too.
It’s also worth mentioning that Microsoft has an XSLT which you can apply to Word documents to generate XSL-FO. It’s freely downloadable from this download page. I tried this, and it works, but the resulting FO is much messier than AbiWord’s.
Once you have the FO, the obvious step is to embed it in an XSL stylesheet, add xsl:value-of elements in the appropriate places and use a transform to populate the template. This is the approach I took for the proof-of-concept and it worked well. The resulting PDF looks almost right – with a small amount of FO-tweaking, we should have something very usable.
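To make the shape of that stylesheet concrete, here is a minimal sketch. The FO is a hand-written stand-in for what the word processor would export, and the /letter/... paths are invented for illustration; none of it comes from the real project. Python's standard library has no XSLT processor, so the script does not run the transform: it only parses the stylesheet and lists the points where data would be substituted.

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet: exported FO wrapped in one xsl:template, with
# xsl:value-of elements where the data belongs. All paths are assumptions.
STYLESHEET = """<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="A4"
            page-height="297mm" page-width="210mm" margin="20mm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="A4">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="12pt">
            Dear <xsl:value-of select="/letter/patient/name"/>,
          </fo:block>
          <fo:block>
            Your appointment is on
            <xsl:value-of select="/letter/appointment/date"/>.
          </fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
"""

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"


def value_of_paths(stylesheet_text):
    """Return the select expression of every xsl:value-of, in document order."""
    root = ET.fromstring(stylesheet_text.encode("utf-8"))
    return [el.get("select") for el in root.iter(XSL_NS + "value-of")]


if __name__ == "__main__":
    # Parsing also confirms the stylesheet is well-formed XML.
    for path in value_of_paths(STYLESHEET):
        print(path)
```

Everything outside the two xsl:value-of elements is plain FO, which is why so little editing is needed once the FO has been generated.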
But using XSL means loading up and running the (trivial) transform, which I think may be very inefficient for such a simple case, and it also requires the FO to be edited. I’ve decided to use a simpler approach (using StringTemplate) which I hope will be more efficient and will require less FO editing (just the addition of $fieldName$ placeholders). All we need to drive this template is a list of (fieldname, XPath) pairs for our XML message. Of course, most other applications will need the power of XSL (e.g. to deal with tables of entries): I’m only avoiding it here because the data is so simple.
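The placeholder idea can be sketched in a few lines. This is not StringTemplate itself: a regular expression stands in for it, and the field names, paths and message below are all invented for illustration. The paths are relative to the root element because ElementTree supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical FO fragment carrying $fieldName$ placeholders.
FO_TEMPLATE = (
    '<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">'
    "Dear $patientName$, your appointment is on $appointmentDate$."
    "</fo:block>"
)

# Hypothetical XML message; the element names are assumptions.
MESSAGE = """<letter>
  <patient><name>Jane Smith</name></patient>
  <appointment><date>2 March</date></appointment>
</letter>"""

# The (fieldname, XPath) pairs that drive the template.
FIELDS = [
    ("patientName", "patient/name"),
    ("appointmentDate", "appointment/date"),
]

PLACEHOLDER = re.compile(r"\$(\w+)\$")


def extract_fields(message_xml, fields):
    """Look up each path in the message; missing elements yield ''."""
    root = ET.fromstring(message_xml)
    return {name: root.findtext(path, default="") for name, path in fields}


def fill_template(template, values):
    """Replace every $name$ with its XML-escaped value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for placeholder: " + name)
        # Escape &, < and > so the result is still well-formed FO.
        return escape(values[name])
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(fill_template(FO_TEMPLATE, extract_fields(MESSAGE, FIELDS)))
```

The one detail worth keeping whatever the template engine is the escaping: a value containing & or < would otherwise produce FO that the formatter cannot parse.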
This is something we’re bound to want to do again, in different contexts, so I’m using this project and the prototype to build a tool-chain and utilities for this, so we can use the same approach more easily next time.

