OSS4U, LLC

OOpyRep

Jan 29, 2003

The basic idea for this project comes from the need to do large-volume printing from a program written in Python. I made several attempts to do this using XSL-FO as well as the Python-UNO bindings, with OpenOffice.org as the formatting engine. All attempts basically worked, but not at high volume. The Python interfacing is also either incomplete or simply too slow to deploy for server-based printing.

The only page formatting engine I could find that suits these needs is ReportLab's formidable engine of the same name. It's pretty quick and has a decently documented API. ReportLab is also commercial grade (whatever that means :-)), since the company that created it actually sells it for large-scale printing. The only problem I came across is that, being a library / engine, ReportLab lacks usability from a user's point of view. If you can embed it with a fixed set of page templates, it can be used right out of the box. However, if a user ever has to change, add or delete templates, they need a developer to do so.

The solution: use OpenOffice.org as a template designer. This office suite is easy for anyone to use. Take the resulting file, which is a zip-packed set of XML files, and run it through a parser that generates Python code representing the document. You can then use the Python representation together with ReportLab to generate the actual document.
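As a rough illustration of the first half of that pipeline, here is a minimal sketch using only the Python standard library. It assumes the paragraphs live in `text:p` elements under the OpenOffice.org 1.x text namespace. The helper names (`paragraphs_from_archive`, `generate_python`, `make_sample_archive`) and the tiny sample document are made up for this example and are not part of OOpyRep. The ReportLab step is left out on purpose.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: namespace URIs of the OpenOffice.org 1.x file format.
OFFICE_NS = "http://openoffice.org/2000/office"
TEXT_NS = "http://openoffice.org/2000/text"


def paragraphs_from_archive(archive):
    """Return the plain text of every text:p element in content.xml.

    `archive` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


def generate_python(paragraphs):
    """Emit Python source code that recreates the paragraph list."""
    lines = ["PARAGRAPHS = ["]
    lines += [f"    {text!r}," for text in paragraphs]
    lines.append("]")
    return "\n".join(lines) + "\n"


def make_sample_archive():
    """Build a tiny hand-written stand-in for a Writer file, in memory."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}"><office:body>'
        "<text:p>Dear customer,</text:p>"
        "<text:p>Invoice <text:span>42</text:span> is due.</text:p>"
        "</office:body></office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", content)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(generate_python(paragraphs_from_archive(make_sample_archive())))
```

The generated source is an ordinary Python module, so a printing program could import it and hand the data to ReportLab without touching the XML again.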

The OpenOffice.org file is a zip archive. A normal Writer file is made up of several components.
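The components of any given file can be listed with Python's standard `zipfile` module. The following is a small sketch. `list_components` is a name chosen for this example, and the demo archive is a stand-in that contains only the two files discussed in this text, not a real Writer document.

```python
import io
import zipfile


def list_components(archive):
    """Return (member name, uncompressed size) pairs in archive order.

    `archive` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


if __name__ == "__main__":
    # Stand-in archive for demonstration; pass a real file name instead.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as zf:
        zf.writestr("styles.xml", "<styles/>")
        zf.writestr("content.xml", "<content/>")
    demo.seek(0)
    for name, size in list_components(demo):
        print(f"{name:20} {size:8d} bytes")
```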

The two really interesting files are styles.xml and content.xml. Their contents are pretty obvious from the names, although content.xml also contains styles. If you look at the files with an XML tree viewer (such as mlview), the structure is nicely sorted. It is also worth mentioning that the tags are ordered the way the document is constructed, so the styles come first and the actual content comes last.
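The ordering can be checked without a tree viewer by walking the top-level children of the root element. The sketch below uses a hand-written, heavily simplified stand-in for content.xml. The element names and the namespace URI are assumptions based on the OpenOffice.org 1.x format, and `top_level_sections` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for content.xml (assumed OpenOffice.org 1.x layout).
SAMPLE = (
    '<office:document-content xmlns:office="http://openoffice.org/2000/office">'
    "<office:font-decls/>"
    "<office:automatic-styles/>"
    "<office:body/>"
    "</office:document-content>"
)


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def top_level_sections(xml_text):
    """Return the names of the root's direct children in document order."""
    root = ET.fromstring(xml_text)
    return [local_name(child.tag) for child in root]


if __name__ == "__main__":
    print(top_level_sections(SAMPLE))
```

Because the style sections come before the body, a single pass over the file can collect the styles first and resolve them while reading the content.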

To represent the structure in Python, a class for each of the different tag groups seems a viable option. It therefore seems best to create classes for styles, fonts, tables, paragraphs and the document itself. For completeness, images and other advanced document components should find a place as well.
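One possible shape for such classes is sketched below with dataclasses. Every class name, field and default here is an assumption made for illustration. Nothing in this sketch is the project's actual design, and images are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Font:
    name: str
    size: float = 12.0  # point size; the default is arbitrary


@dataclass
class Style:
    name: str
    font: Font
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Paragraph:
    text: str
    style_name: str = "Standard"  # refers to a Style by name


@dataclass
class Table:
    # Each row is a list of cells; each cell holds one paragraph here.
    rows: List[List[Paragraph]] = field(default_factory=list)


@dataclass
class Document:
    styles: Dict[str, Style] = field(default_factory=dict)
    body: List[Union[Paragraph, Table]] = field(default_factory=list)

    def add_style(self, style: Style) -> None:
        self.styles[style.name] = style

    def style_for(self, paragraph: Paragraph) -> Style:
        """Look up the style a paragraph refers to."""
        return self.styles[paragraph.style_name]


if __name__ == "__main__":
    doc = Document()
    doc.add_style(Style("Standard", Font("Helvetica", 10.0)))
    doc.body.append(Paragraph("Hello, world"))
    print(doc.style_for(doc.body[0]).font.name)
```

Keeping styles in a dictionary keyed by name mirrors the way paragraphs point at their style by name in the XML, so the parser can store references as it finds them.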

Disclaimer: This is just a quickly hacked OO document saved as HTML. Nothing is worse than an SF project without anything in it. More is to come shortly.