Draft of end-to-end publishing solution

= Use cases =

To start, it would be good if users described their use cases, as Matt Donnelly does here:

I work for a wholesale pet supply company that publishes print catalogs in addition to several websites. We have no workflow, no content management system (CMS), no wikis, and the editors use Word while the designers use InDesign. We're a growing company that struggles to communicate well with each other as well as with our customers.

I'd think it would be terrific to find some way to glue together Scribus, OpenOffice, Gimp, a CMS app, a wiki, etc. to create an end-to-end publishing solution that takes companies through the entire catalog creation lifecycle and gets new info on/from the websites fast. Fold in Web 2.0, and it's a goal.

The closest we've come is the idea of using InCopy for editorial and InDesign for layout. This would at least get editors and designers on the same (virtual) page. But it doesn't solve the problem of effectively sharing information, workflow, modular content that can be reused across catalogs/websites, etc. Microsoft is trying to do some of this with SharePoint/Office integration...

Use Case #1 (Matt Donnelly)
The idea is to get all the information in one place and massaged as quickly and efficiently as possible. There would be workflow engines/CMS in the process to make sure materials are routed properly, quickly, and efficiently. This would also identify bottlenecks, send alerts, etc.

Now our system works this way:

1. We get product information (on paper) from merchandisers, i.e., those who buy the products. These sheets include photos, product ID numbers, data like color and sizing, the buyer's name, etc.

2. The copywriters (in a different department) take that information and (sometimes using old copy as quasi-templates) create different copy for that sku for different audiences and locations: best buys, insection, dealer, label, etc. This is all written in Word and a copy is saved to the server in a folder.

3. Then the Word documents are printed out, put in folders, and routed to fact checkers and proofreaders.

4. Meanwhile, designers are laying out pages in InDesign with product images and placeholders for copy.

5. The edited copy sheets are then brought back to the original copywriter, who keys in the changes, then marks the finished docs with the color green and writes "OK TO PLACE" on each. Sometimes the copy has to be routed a second time if the changes were extensive.

6. The designers periodically check the server for "green" files and flow them into the placeholders (this is where some thought that giving InCopy-like products to copywriters would shorten the process).

7. Sometimes the text is too long or too short, so the designer goes back to the copywriters for cuts.

8. When the page proofs are done (text/images and other assets like bursts/promotions), they are printed out and routed in clear plastic folders (!) so they can be proofed on paper by 4-5 people.

9. Once the proofs are done, the designers make the (hopefully) final changes.

10. Usually in each production cycle, products are added at the last minute, others are dropped (or colors, sizes are dropped), and some copy is missing, to be written in haste at the 11th hour.

This is a rough sketch. Some obvious needs come out of this:

1A. Tracking documents throughout their lifecycle (from docs to text on a page, etc.)

1B. Version control

2. Making sure all assets for each product stay with it (metadata?)

3. Finding a way (a la InCopy and InDesign) for editors to edit copy even in layout, while designers stick to design

4. A workflow tool to track edits, comments, etc.

5. A good reporting tool on the back end to measure efficiency and uncover bottlenecks
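Needs 1A, 4 and 5 (tracking, a workflow tool, reporting) could begin with very little machinery. The following is a minimal Python sketch, not a proposed design: the state names are hypothetical and only loosely follow steps 2 to 6 of the current process, and "bottleneck" simply means the state in which documents have spent the most total time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical routing states, loosely modeled on steps 2-6 of the process above.
STATES = ("draft", "fact_check", "proofread", "revision", "ok_to_place", "placed")


@dataclass
class CopyDoc:
    sku: str
    # Each entry is (state, time the document entered that state).
    history: list = field(default_factory=list)

    @property
    def state(self):
        return self.history[-1][0] if self.history else None

    def move_to(self, state: str, when: datetime) -> None:
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        self.history.append((state, when))


def time_in_state(doc: CopyDoc, now: datetime) -> dict:
    """Total time spent in each state; the current state counts up to `now`."""
    totals = {}
    ends = [when for _, when in doc.history[1:]] + [now]
    for (state, start), end in zip(doc.history, ends):
        totals[state] = totals.get(state, timedelta()) + (end - start)
    return totals


def bottleneck(docs, now: datetime) -> str:
    """The state in which all documents together have spent the most time."""
    totals = {}
    for doc in docs:
        for state, spent in time_in_state(doc, now).items():
            totals[state] = totals.get(state, timedelta()) + spent
    return max(totals, key=totals.get)
```

Because each transition is timestamped, a reporting back end only has to aggregate history entries; alerts could be raised when a document stays in one state longer than some agreed limit.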

Then there are internal issues with any company that we should address in the design:

1. How can we create a system that even technology-fearful editors and designers will use?

2. What commercial products are out there -- and why would a company choose an open source solution? Why would it make good business sense?

3. Would it be worthwhile to produce an open source framework with "adapters" (APIs) flexible enough to interface with existing commercial products like InDesign? (Put another way: How could companies use their existing software/hardware investments with such a system -- without feeling like we'd be asking them to throw it away?)

More could be said, but that's the gist ;-)

--Matt from Boston

Use Case #2 (Ludwig M. Solzen)
Some preliminary ideas on how this might work

1. Setting up the workflow. — The CMS administrator (for each workflow) designs the data structure according to the particular needs of the organisation or project. Generally, this amounts to defining the proper entries, fields and attributes, their assets, and their validation rules. He could do this by means of an intuitive graphical application, from whose design the XML schemas, DTDs and database layout are generated. The entries, fields etc. are collected and depicted visually as reusable or generic elements, which are available to every user in the project, in every application, perhaps through some sort of widget.


 * COMMENTS (Matt Donnelly): What would the graphical application look like? Would it make sense to post some sketches for review? I like the idea of "generic elements" -- like assets or modules. (In SharePoint, they use the image of books in a library.) A lot of copywriting would be easier if there could be things like template libraries...


 * COMMENTS (Timo Stollenwerk): Why do you want to design further abstraction layers, like a graphical application and a database schema, to define your data structure? A CMS can provide interfaces to generate a schema and then import and export it as XML or whatever you like. All you have to do is define which content types can contain other content types (magazines can contain articles; articles can contain headlines, paragraphs, images, etc.).
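Timo's suggestion, declaring only which content types may contain which, can be sketched in a few lines of Python. The type names come from his example; the tuple-based tree, the function names and the XML export format are hypothetical illustrations, not the API of Plone or any other CMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical containment rules built from the example in the comment above.
ALLOWED_CHILDREN = {
    "magazine": {"article"},
    "article": {"headline", "paragraph", "image"},
    "headline": set(),
    "paragraph": set(),
    "image": set(),
}


def validate(node):
    """node is (type, [children]); return a list of rule violations."""
    kind, children = node
    if kind not in ALLOWED_CHILDREN:
        return [f"unknown content type: {kind}"]
    errors = []
    for child in children:
        if child[0] not in ALLOWED_CHILDREN[kind]:
            errors.append(f"{kind} may not contain {child[0]}")
        errors.extend(validate(child))
    return errors


def rules_to_xml(rules):
    """Export the rules as XML, e.g. for exchange with other tools."""
    root = ET.Element("contenttypes")
    for kind in sorted(rules):
        el = ET.SubElement(root, "type", name=kind)
        for child in sorted(rules[kind]):
            ET.SubElement(el, "allows", type=child)
    return ET.tostring(root, encoding="unicode")
```

The rules table is the only thing an administrator would edit; validation and export are derived from it.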

2. Designing the forms. — Designers use the application of their choice to create web forms, PDF forms (or yet another form-filling tool). They pick up the available elements from the widget and drag and drop them into the application, after which each element is shaped graphically. E.g. the administrator has created an entry "product", which has the attributes "product name", "price", "color" and "description". The designer drags the "product" icon into Scribus and finds that he has to shape the elements "product name", "price", "color" and "description". He thus determines font, position, color, etc. Since he is not creating the final catalogue template but a PDF form, the application knows (through the CMS) that the designer also has to shape form fields, so those are presented to him as well.


 * COMMENTS (Matt Donnelly): Could OpenOffice Base (or the KDE equivalent) be used to create these forms? The key is to keep everything in one place and constantly enrich an item's associated metadata (which catalogs it was in, page numbers, version history, etc.). I think InfoPath is what Micro$oft uses??

3. Collecting the data. — Users (data providers, copywriters) could input the data by several means: PDF forms, web forms, stand-alone text editors or online editing applets, or even a custom-made stand-alone GUI application. The data is extracted (from PDF forms) or routed directly to the databases, using an XML schema. The XML validator tells the user whether his data is compliant (minimum/maximum number of characters, image quality, etc.); if not, it rejects the submission and makes the user re-edit the content.


 * COMMENTS (Matt Donnelly): XML would work nicely. For the end user, this could be invisible. That's key for editors and designers who have an inordinate fondness for paper ;-)
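The validation in step 3 can be illustrated with a minimal Python sketch that checks character counts in a submitted XML record. The element names and limits are hypothetical; a real system would more likely validate against an XML Schema or RELAX NG file (for example with lxml), which Python's standard library cannot do, and image-quality checks are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-field limits (min, max characters) supplied by the schema.
LIMITS = {"name": (3, 40), "price": (1, 10), "description": (20, 400)}


def validate_submission(xml_text):
    """Return a list of problems; an empty list means the submission is accepted."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for tag, (lo, hi) in LIMITS.items():
        el = root.find(tag)
        if el is None:
            problems.append(f"missing field: {tag}")
            continue
        text = (el.text or "").strip()
        if not lo <= len(text) <= hi:
            problems.append(f"{tag}: {len(text)} characters, expected {lo}-{hi}")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the form show the user all fields that need re-editing in one pass.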

4. Designing the templates. — As in step 2, the designer creates the final template; the XML schema tells him about the features of each element (length; minimum, maximum and average number of characters; etc.) so that he can take this into account while designing and positioning the elements.


 * COMMENTS (Matt Donnelly): Good points. A template library -- and, within that, even a copy library -- would be ideal. No sense starting from scratch every time, especially since some text is repeated across catalogs with only minor variations.

5. Proofing the data. — At every stage the CMS provides all users and applications involved with the most up-to-date content and templates on the server. Users have either read-only or edit access to the data, according to the rights granted to their account. Proofreaders may edit the data using a web application that draws on the data from the server, or by using editable PDF forms. Designers could have Scribus generate a draft of the final document by applying the current data to the template; Scribus collects the XML and flows the data into the design elements, adding extra pages when necessary.


 * COMMENTS (Matt Donnelly): I like the image of a library where someone checks out/checks in a book. Is there any way to replicate the interaction between InCopy and InDesign, where designers and editors can work in tandem on the same page at the same time? Or is there a better way?


 * Anyway, thanks, Ludwig, for enriching the discussion. Anyone else? -- Matt

Ludwig

P.S. This is my first contribution and I'm not familiar with wikis. Please excuse me if I should have put this elsewhere.


 * [EDIT: I've seen that there are some publishers using Scribus: http://www.tuxmachines.org/node/11855. Would any of them be interested in working with us on this project? --Matt]
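Step 5 of Ludwig's outline, in which Scribus flows the current data into a template and adds pages when necessary, comes down to logic like the following. This is only a Python sketch under simplifying assumptions: a fixed, hypothetical number of product blocks per page, plain-text placeholders via `string.Template`, and no text fitting, which Scribus itself would have to handle.

```python
from string import Template

# Hypothetical copy block for one product; $-names are the CMS field names.
BLOCK = Template("$name ... $price\n$description")


def flow_into_pages(records, blocks_per_page):
    """Fill one block per record, starting a new page whenever the current one is full.

    Raises KeyError if a record lacks a field, i.e. some copy is still missing."""
    if blocks_per_page < 1:
        raise ValueError("blocks_per_page must be at least 1")
    pages = []
    for i, record in enumerate(records):
        if i % blocks_per_page == 0:
            pages.append([])  # add an extra page, as in step 5
        pages[-1].append(BLOCK.substitute(record))
    return pages
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing field stops the draft instead of silently printing a placeholder, which matches the use case's problem of copy "to be written in haste at the 11th hour".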

General Idea
Developing a publishing workflow solution for the production of a newspaper or magazine. Please also have a look at the Draft for the GSoC Application Abstract (Timo Stollenwerk).

CMS Component
All required content (text, high-resolution images, etc.) is stored in the Content Management System. The CMS provides all the content types needed to model a print publication. The user can add a magazine issue, categories, articles and images through web interfaces. Magazine articles can be sorted and placed into predefined categories. Article objects can contain headlines, subheadlines, subheadings, text flows, tables and images in any number or order.

Features

 * workflow support (CMS, e.g. Plone)
 * versioning (CMS, e.g. Plone)
 * locking of articles that are currently being edited by a content editor (CMS) or a designer (Scribus) (WebDAV, or a CMS such as Plone/Zope)
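The locking feature maps onto WebDAV's LOCK method (RFC 4918). A rough standard-library sketch of requesting an exclusive write lock follows; the owner string, the timeout and the function names are illustrative, and it has not been tested against Apache or Zope/Plone. The server returns the lock token in a `Lock-Token` response header.

```python
import http.client
import xml.etree.ElementTree as ET

DAV = "DAV:"


def lock_body(owner):
    """Build an RFC 4918 LOCK request body asking for an exclusive write lock."""
    ET.register_namespace("D", DAV)
    info = ET.Element(f"{{{DAV}}}lockinfo")
    scope = ET.SubElement(info, f"{{{DAV}}}lockscope")
    ET.SubElement(scope, f"{{{DAV}}}exclusive")
    ltype = ET.SubElement(info, f"{{{DAV}}}locktype")
    ET.SubElement(ltype, f"{{{DAV}}}write")
    ET.SubElement(info, f"{{{DAV}}}owner").text = owner
    return ET.tostring(info, encoding="utf-8", xml_declaration=True)


def lock(host, path, owner, timeout_s=600):
    """Send LOCK and return the Lock-Token header, or None if the lock was refused."""
    conn = http.client.HTTPConnection(host)
    conn.request("LOCK", path, body=lock_body(owner),
                 headers={"Content-Type": 'application/xml; charset="utf-8"',
                          "Depth": "0", "Timeout": f"Second-{timeout_s}"})
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.getheader("Lock-Token") if resp.status in (200, 201) else None
```

An exclusive lock is what the bullet asks for: while a designer holds it, the CMS editor (or another Scribus user) cannot overwrite the article.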

Possible Features

 * choose predefined layout templates for articles on the initial checkout to Scribus
 * adaptive layout templates
 * import/export magazines or articles in XML (e.g. DocBook)
 * support for storing fonts in the CMS

Scribus
A magazine issue or article can be fetched either through a web browser or by a WebDAV plug-in inside Scribus.

Through the Web
CMSs such as Plone have built-in support for through-the-web editing. Files are locked while they are open in Scribus. After working on a file, it is written back into the CMS, either by uploading it manually or by a Python script within Scribus.
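Writing a file back "by a Python script" could look roughly like this standalone Python 3 sketch. It assumes the CMS accepts an HTTP PUT at the document's URL and that a lock token obtained earlier is submitted in an `If` header, as RFC 4918 specifies; the URL and the content type are placeholders, not Plone's documented interface.

```python
import urllib.request


def build_put(url, sla_bytes, lock_token=None):
    """Prepare an HTTP PUT that writes a .sla file back to the CMS.

    lock_token is the Lock-Token value returned by an earlier LOCK,
    e.g. '<opaquelocktoken:...>' (angle brackets included)."""
    req = urllib.request.Request(url, data=sla_bytes, method="PUT")
    # Assumed content type; a given CMS may expect a different one.
    req.add_header("Content-Type", "application/xml")
    if lock_token:
        # RFC 4918: a locked resource is written with the token in an If header.
        req.add_header("If", f"({lock_token})")
    return req


def upload(req):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Separating request construction from sending keeps the part that depends on a live server small, so the header logic can be checked offline.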

WebDAV Plug-in
Enhance Scribus to operate as a WebDAV client so that files on a WebDAV server can be opened and saved through the Scribus interface. WebDAV client libraries are available in several programming languages (including C++ and Python).

The use of WebDAV has many benefits for Scribus that go beyond this proposed solution. WebDAV can be used in any collaborative environment. Most modern operating systems (Windows, Linux, Mac OS X) provide built-in support for WebDAV. Commercial products such as Photoshop and Dreamweaver support WebDAV (the Inkscape folks are also thinking about it). Apache and Subversion can act as WebDAV servers, as can several CMSs.

WebDAV Features:


 * locking (overwrite prevention)
 * properties (creation, removal, and querying of information about author, modified date, etc.)
 * name space management (ability to copy and move Web pages within a server's namespace)
 * collections (creation, removal, and listing of resources)
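The properties and collections features are both used through PROPFIND, which returns a multistatus XML document. A minimal sketch, assuming the server reports `getlastmodified` as RFC 4918 defines it, that lists the .sla files in a collection; the function name is hypothetical and transmitting the request (with a `Depth: 1` header) is not shown.

```python
import xml.etree.ElementTree as ET

NS = {"D": "DAV:"}

# Request body for PROPFIND, to be sent with a "Depth: 1" header.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop><D:getlastmodified/><D:creationdate/></D:prop>
</D:propfind>"""


def list_sla(multistatus_xml):
    """Return (href, last-modified) for every .sla resource in a PROPFIND response."""
    root = ET.fromstring(multistatus_xml)
    result = []
    for resp in root.findall("D:response", NS):
        href = resp.findtext("D:href", default="", namespaces=NS)
        if href.endswith(".sla"):
            modified = resp.findtext(".//D:getlastmodified", default=None, namespaces=NS)
            result.append((href, modified))
    return result
```

The collection itself appears as its own response entry, which the `.sla` filter skips.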

As a proof of concept I wrote two Python scripts (WebDAVOpenDocument.py, WebDAVSaveDocument.py) using the Scribus Python API and Davclient to fetch and save Scribus files from a WebDAV server (tested with Apache and Zope/Plone).

Maybe Scribus will soon support WebDAV out of the box; see bugs 1693 and 1924.

Example Workflow
The first step in the publishing workflow is to define which content elements a publication should contain. This can be done by uploading a DocBook file (with content) or by hand through the CMS interface. After that, an editor can fill in the content:



A designer can now create a template with copyholes for the content defined in the CMS. (If no such template is defined, the CMS generates a plain Scribus file containing only the content elements.)



If the content is in place and a template is uploaded to the cms, the output could look like this:



After manually adding more content (an image and an image description) and making some layout changes, the end product could look like this:



Every content change in Scribus is written back into the CMS. If content elements are added, the CMS automatically generates a corresponding content field (e.g. if an image is added to an article, it is uploaded to the CMS and an image field is added to the article content type).
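The write-back rule above, creating a content field whenever Scribus reports an element the CMS does not know yet, might be sketched as follows. The dictionaries stand in for a CMS content type and record; their layout and the function name are hypothetical.

```python
def sync_back(content_type, record, frames):
    """Write frame contents back into a CMS record.

    content_type maps field name -> kind ("text" or "image"); record maps
    field name -> value; frames maps frame name -> (kind, value) as read
    from the Scribus document. Unknown frames get a new field.
    Returns the names of the fields that were added."""
    added = []
    for name, (kind, value) in frames.items():
        if name not in content_type:
            content_type[name] = kind  # e.g. a new image field on the article type
            added.append(name)
        elif content_type[name] != kind:
            raise ValueError(f"{name}: expected {content_type[name]}, got {kind}")
        record[name] = value
    return added
```

Rejecting a kind mismatch (a text frame arriving for an image field) keeps the content type consistent instead of silently changing it.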

Unnamed Styles
How to display and keep unnamed styles in the CMS?


 * tweaking the space between two letters
 * tweaking line breaks
 * fit text into a given space
 * sometimes users want to change formatting without defining a new named style
 * changing the tracking in order to fit a sequence of words on one line
 * use a different vertical space before or after a paragraph

  Hello World! This is a test document for the new Scribus file format. This block is aligned to the left and this one is justified but inherits all other properties of the same paragraph style.
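One possible way to keep unnamed styles in the CMS, offered only as a suggestion, is to store each paragraph as a named style plus a small set of local overrides, the way the example text describes a left-aligned block that otherwise inherits its paragraph style. The property names below are hypothetical, not Scribus's internal attributes.

```python
# Hypothetical named paragraph style as it might be stored in the CMS.
NAMED_STYLES = {
    "body": {"font": "Serif", "size": 10.0, "align": "justified", "space_before": 0.0},
}


def effective_style(style_name, overrides=None):
    """Named style properties with any unnamed (local) overrides applied on top."""
    style = dict(NAMED_STYLES[style_name])
    style.update(overrides or {})
    return style


def local_overrides(style_name, actual):
    """The minimal set of properties that differ from the named style."""
    base = NAMED_STYLES[style_name]
    return {key: value for key, value in actual.items() if base.get(key) != value}
```

Storing only the differences means a later change to the named style still reaches every paragraph, except for the properties that were deliberately tweaked (tracking, spacing, alignment).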

Linking of Elements between the CMS and Scribus
Use item names and/or item attributes to link SLA elements to CMS content.
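A minimal Python sketch of linking by item name. It assumes the Scribus 1.3.x .sla layout (PAGEOBJECT elements with the item name in an `ANNAME` attribute and text runs in `ITEXT` elements with a `CH` attribute) and a hypothetical naming convention `cms:<object id>/<field>`; frames without the prefix are treated as pure layout.

```python
import xml.etree.ElementTree as ET


def linked_frames(sla_text, prefix="cms:"):
    """Map (object id, field) -> frame text for every frame linked to the CMS.

    Assumptions: item name in PAGEOBJECT/@ANNAME, text runs in ITEXT/@CH,
    and the hypothetical convention "cms:<object id>/<field>" for linked frames."""
    root = ET.fromstring(sla_text)
    links = {}
    for obj in root.iter("PAGEOBJECT"):
        name = obj.get("ANNAME", "")
        if not name.startswith(prefix):
            continue  # pure layout element, not linked to CMS content
        object_id, _, field_name = name[len(prefix):].partition("/")
        links[(object_id, field_name)] = "".join(
            run.get("CH", "") for run in obj.iter("ITEXT"))
    return links
```

Item attributes could carry the same information without touching the visible item name; the prefix convention is just the simplest option to illustrate.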

Deliverables/ToDo

 * Python script or C++ plugin to enhance Scribus to act as a WebDAV client. It should be possible to open and save Scribus documents (with external images) on the server.
 * XML processing engine to read, write and process the Scribus file format. All content elements should be accessible while predefined styles and the layout are left untouched.
 * CMS extension for Plone to create, edit and manage Scribus document content. Make it possible to create and manage a magazine structure. Implement simple workflow and locking of documents.
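The core of the XML processing engine, replacing the text of one frame while leaving styles and layout alone, could be sketched with ElementTree. It assumes the 1.3.x .sla layout (item name in `ANNAME`, text runs as `ITEXT` children with a `CH` attribute). This simplified version keeps only the first text run's character attributes, and ElementTree does not reproduce the original file's XML declaration or whitespace exactly, which a real engine would need to handle.

```python
import xml.etree.ElementTree as ET


def replace_frame_text(sla_text, item_name, new_text):
    """Return the .sla text with one named frame's text replaced.

    Only CH changes: position, size and style attributes stay as they were.
    Simplification: the first run keeps its character attributes and any
    further runs are removed."""
    root = ET.fromstring(sla_text)
    for obj in root.iter("PAGEOBJECT"):
        if obj.get("ANNAME") != item_name:
            continue
        runs = obj.findall("ITEXT")
        if not runs:
            raise ValueError(f"frame {item_name!r} has no text runs")
        runs[0].set("CH", new_text)
        for extra in runs[1:]:
            obj.remove(extra)
        return ET.tostring(root, encoding="unicode")
    raise KeyError(item_name)
```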

Wishlist

 * make it possible to import/export XML-structured documents like DocBook

Transforming the Scribus File Format (outdated)
The CMS can read and generate XML-based Scribus files and can change the content of these files through XSLT stylesheets. The general idea is to transform a Scribus .sla file (scribus-out.sla) into an XML representation of the content (cms-in.xml) that can easily be accessed by the CMS component. After this representation of the content has been edited within the CMS (cms-out.xml), the edited content is written back into the original Scribus file (scribus-out.sla + cms-out.xml -> scribus-in.sla), leaving the whole Scribus layout untouched:


 * This is quickly becoming outdated as the new format takes hold with the new text engine. Maybe using the native XML would already be the best way. Malex 23:13, 16 March 2007 (CET)

 +-------------------------------------+                   +----------------------------------------+
 | SCRIBUS FILE: scribus-out.sla       |  sla2docbook.xsl  | CONTENT REPRESENTATION: cms-in.xml     |
 |                                     |------------------>|   My Title                             |
 |                                     |         |         |   My Text                              |
 +-------------------------------------+         |         +----------------------------------------+
                                                 |                             |
                                                 |                             V
                                                 |                 +-----------------------+
                                                 |                 |  Edit content within  |
                                                 |                 |      the Content      |
                                                 |                 |   Management System   |
                                                 |                 +-----------------------+
                                                 |                             |
                                                 |                             V
 +-------------------------------------+         V         +----------------------------------------+
 | SCRIBUS FILE: scribus-in.sla        |  docbook2sla.xsl  | CONTENT REPRESENTATION: cms-out.xml    |
 |   ...                               |<------------------|   New Title                            |
 |      <ITEXT CH="New Text"/>         |                   |   New Text                             |
 |    </PAGEOBJECT>                    |                   |                                        |
 |  </DOCUMENT>                        |                   |                                        |
 | </SCRIBUSUTF8NEW>                   |                   |                                        |
 +-------------------------------------+                   +----------------------------------------+

As a proof of concept I wrote the XSL stylesheets (sla2docbook.xsl and docbook2sla.xsl) for a magazine article (using the DocBook representation). I'm currently working on the implementation of a simple CMS component to edit the XML files.
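For readers without an XSLT processor at hand, the sla2docbook direction can be approximated in plain Python; this is a swapped-in technique, not the stylesheet described above. The sketch assumes two frames named "title" and "text" (hypothetical names) and that the .sla stores item names in `ANNAME` and text in `ITEXT CH` attributes.

```python
import xml.etree.ElementTree as ET


def sla_to_docbook(sla_text, title_item="title", body_item="text"):
    """Build a minimal DocBook <article> from two named text frames.

    Assumes item names in ANNAME and text runs in ITEXT CH attributes;
    title_item and body_item are hypothetical frame names."""
    root = ET.fromstring(sla_text)
    texts = {}
    for obj in root.iter("PAGEOBJECT"):
        name = obj.get("ANNAME")
        if name:
            texts[name] = "".join(run.get("CH", "") for run in obj.iter("ITEXT"))
    article = ET.Element("article")
    ET.SubElement(article, "title").text = texts.get(title_item, "")
    ET.SubElement(article, "para").text = texts.get(body_item, "")
    return ET.tostring(article, encoding="unicode")
```

The reverse direction (docbook2sla) would read title and para back and write them into the same frames, leaving every other attribute of the original file alone.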

= Implementation =

I propose separating the work into four areas:

CMS / Database
Do we want to choose one system, e.g. eXist, MySQL, Zope or whatever, or do we want to be able to use different CMS/database systems? Database connections are already well standardized, but CMS APIs would need some adapter work.
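If we want to support several backends, the CMS-specific work could be confined to a small adapter interface. A hypothetical Python sketch: the `ContentStore` class and its two methods are illustrations, and the in-memory backend only shows where versioning would live; a Plone/Zope, eXist or MySQL adapter would implement the same methods.

```python
from abc import ABC, abstractmethod


class ContentStore(ABC):
    """Hypothetical adapter interface; one subclass per CMS or database backend."""

    @abstractmethod
    def get(self, object_id: str) -> dict:
        """Return the latest version of an object's fields."""

    @abstractmethod
    def put(self, object_id: str, fields: dict) -> int:
        """Store fields as a new version and return the version number."""


class MemoryStore(ContentStore):
    """Toy backend that keeps every version in memory."""

    def __init__(self):
        self._versions = {}

    def get(self, object_id):
        return dict(self._versions[object_id][-1])

    def put(self, object_id, fields):
        history = self._versions.setdefault(object_id, [])
        history.append(dict(fields))
        return len(history)
```

Scribus-side code would then talk only to `ContentStore`, so swapping the backend does not touch the plugin.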

What do we want to store in the CMS? Just text and images? Or do we also want workflow support? What about versioning?


 * [Based on the use cases given, I would say that workflow and versioning support are an absolute must --BenjaminGreen]


 * [EDIT: Here are a few commercial products that might be useful as reference: http://www.apsiva.com/, http://www.technicon.com/solutions_catalog.html


 * These raise an interesting question about the relationship between print and web catalogs. It seems that the print catalogs can be generated as a subsection of the web site/web catalog. --Matt]

Scribus
No real choice here! :-)

The work would be to allow external linking to text and maybe provide a mechanism to pull content from the CMS automatically.

Another nice idea would be if Scribus could save its docs, templates, scrapbook and styles in an XML database like eXist.

Content Editors
Plain text, OOo, LyX, ...

Scribus already supports import from plain text, html, OOo and others. It would be nice to have an import plugin for XML+CSS, with an option to substitute CSS fonts and colors automatically.
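The font and colour substitution option for an XML+CSS import could work along these lines. This Python sketch handles only simple `property: value;` declarations, not full CSS; the mapping tables and the function name are hypothetical.

```python
import re

# Hypothetical substitution tables the user would fill in at import time.
FONT_MAP = {"arial": "Liberation Sans", "georgia": "DejaVu Serif"}
COLOR_MAP = {"#ff0000": "Catalog Red"}  # CSS value -> Scribus colour name

DECL = re.compile(r"\s*([\w-]+)\s*:\s*([^;]+?)\s*(?:;|$)")


def substitute(css_declarations):
    """Parse 'prop: value; ...' and swap fonts/colours for document equivalents."""
    out = {}
    for prop, value in DECL.findall(css_declarations):
        prop = prop.lower()
        if prop == "font-family":
            # Use the first family in the CSS fallback list.
            first = value.split(",")[0].strip().strip("'\"")
            value = FONT_MAP.get(first.lower(), first)
        elif prop == "color":
            value = COLOR_MAP.get(value.lower(), value)
        out[prop] = value
    return out
```

Unmapped fonts and colours pass through unchanged, so the importer could list them and ask the user for a substitute.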


 * [EDIT: I guess the main asset at stake here is having Scribus manage XML data sources. It really shouldn't matter which input tool was used, or how and where the data was saved, as long as it is provided through a validated XML feed. Image paths should be included in this XML feed as well, rather than linked separately. Scribus would be used as a template design application, with XML-tag placeholders. A source path (either local or on a server) is applied to the template, and Scribus generates the final document by flowing the validated data into the template. I think it would be useful to look at ConTeXt. Perhaps Scribus and the ConTeXt/TeX developers could collaborate, so that Scribus might come to generate TeX document classes (templates) and, vice versa, ConTeXt XML data could be used natively within Scribus text frames. — Ludwig]


 * I'm not familiar with ConTeXt/XML. Could you elaborate? - avox


 * [ConTeXt is probably the most advanced TeX distribution, natively supports 16-bit Unicode encoding, and integrates with XML workflows. ConTeXt uses XML feeds to import database content, while automating the typesetting according to predefined templates. There is a ConTeXt wiki, here. XeTeX, on the other hand, also supports Unicode and has a fully operational OpenType engine, so that OT features are applied. The ConTeXt and XeTeX teams are planning to work together on a new full-featured TeX. I think it would be a major leap forward if these TeXers and the Scribus team would collaborate, or at least exchange thoughts and strategies. Scribus could benefit from the code already written in ConTeXt for XML import and batch processing, while ConTeXt could, in a way, use the WYSIWYG GUI that Scribus offers. I'm sure Hans Hagen, ConTeXt's main developer, would be interested in collaboration of any kind. Here is his website. — Ludwig]


 * [Osnews:
 * "Quite some time ago I was searching for the possibility to have a server side storage for documents, and to work on them together with others. Sure, in principal you can realize that with KOffice and kparts where you access and store all data over ssh on a server. But first of all that is not supported by OpenOffice, and it would be cool to have this also with some kind of web interface to check the versions of the documents, etc. So I continued my search - and found PengYou."


 * http://www.osnews.com/story.php?news_id=17135
 * http://liquidat.wordpress.com/2007/02/01/collaborative-work-with-openoffice-pengyou/
 * http://www.pengyou-project.info/en/index.php


 * Ludi]


 * [PengYou really seems nice. It uses the WebDAV protocol, so if Scribus can act as a WebDAV client, integrating PengYou should not be a problem. See my proposal. --Timo]

Image Editors
Inkscape, Gimp, Cinepaint, Xara, ...

Once those *all* support colour management (and spot colours/CMYK, depending on the workflow), it's merely a question of linking those apps to the CMS, providing sensible warnings about changes to images (e.g. changes in size), and offering update options accordingly (e.g. keep the current scale or keep the current size in Scribus).
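The two update options for a resized image reduce to simple arithmetic. A sketch under a hypothetical convention that the frame's scale is given in document units per image pixel; when the aspect ratio changes, "keep size" fits the new image proportionally inside the old displayed box.

```python
def resize_options(old_px, new_px, scale):
    """Offer the two update choices when a linked image's pixel size changes.

    old_px/new_px are (width, height) in pixels; scale is the frame's current
    image scale in document units per pixel (hypothetical convention).
    Returns None if the size did not change."""
    (old_w, old_h), (new_w, new_h) = old_px, new_px
    if (old_w, old_h) == (new_w, new_h):
        return None  # nothing to warn about
    shown_w, shown_h = old_w * scale, old_h * scale
    fit = min(shown_w / new_w, shown_h / new_h)
    return {
        # Keep the scale: the image now covers a different area on the page.
        "keep_scale": {"scale": scale, "size": (new_w * scale, new_h * scale)},
        # Keep the size: rescale so the new image fits the old displayed box.
        "keep_size": {"scale": fit, "size": (new_w * fit, new_h * fit)},
    }
```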

Related Feature Requests

 * #3100: Enhancements to integrate Scribus as an Engine for Content Management Systems

Links

 * An example for a FOP/Batik based publishing solution
 * Open Source Geospatial Foundation offers interesting software for data based production of maps. Maybe some of it can be used in this context.