Dynamic Layout Support Discussion

Structured Layout Styles for Scribus
This is the discussion page for the Google Summer of Code project I've applied for on supporting dynamic layout styles. Please add any comments, ideas, suggestions, or additional use cases that you might think of for discussion. Thanks! --Michael Koren

Background
I'm currently a PhD student doing my dissertation research on a range of computer science topics, centered around abstract data representation, processing, and interfaces. I started using Scribus to replace WordPerfect shortly after I switched to Linux a few years ago, and immediately became intrigued by its potential for exploring some of my ideas in the areas of both visual (layout) and content organization.

My main feature wish in using Scribus has been to integrate its precise, WYSIWYG layout capabilities with the flexibility to rapidly modify, replace, or replicate an entire layout according to saved rules or constraints--in other words, to have dynamic layout styles for structured content. I have wanted to implement this capability for a year now, and have been browsing the Scribus code and planning an implementation based on needs from documents I've created. So far I haven't had dedicated time for it, so I would love the opportunity to work on it through Google Summer of Code.

Overview
Creating complex, one-time document layouts in Scribus is easy thanks to its precise yet easy-to-use frame-based interface. But often it would be desirable to create a carefully constructed layout style that can be reused for many documents with the same general structure and that adapts to small variations in content. This would also mean that the same content can be dynamically reformatted by applying a different layout style.

Due to the many different usage possibilities offered by this concept and the many specific features that could be useful, I plan to implement a general framework to support structured layout that can be adapted and extended to many different purposes. I will then implement a few specific applications of it in Scribus. The core of the framework will be a self-contained C++ class library for manipulating abstract structured objects and the relationships between them, designed to be usable by multiple applications in the future.

The system will be based on sequences of objects (in this case, text or image frames in a layout). Each object in a sequence can have its attributes defined relative to the previous object or the parent object (the sequence itself). Thus, frames can be kept equally spaced as their size changes, or distributed equally across the page. Other attributes, such as color, size, and even content, can also be set to be the same for an entire sequence, so that linked object "clones" will be a possibility as well.
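The relative positioning described above can be illustrated with a minimal sketch. It assumes a one-dimensional layout and hypothetical `Frame` and `Sequence` types; none of these names come from Scribus or from the planned library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a Scribus frame: horizontal geometry only.
struct Frame {
    double x;      // absolute left edge, computed by the layout pass
    double width;  // set by the user; may change at any time
};

// A sequence stores its members plus one rule applied between neighbours:
// each frame starts `gap` units after the previous frame ends.
struct Sequence {
    double originX = 0.0;  // position of the first frame relative to the parent
    double gap = 0.0;      // spacing rule shared by the whole sequence
    std::vector<Frame> frames;

    // Recompute every absolute position from the relative rule.
    void resolve() {
        assert(gap >= 0.0);
        double cursor = originX;
        for (Frame& f : frames) {
            f.x = cursor;
            cursor += f.width + gap;
        }
    }
};
```

Because only widths and the gap are stored as user input, resizing one frame and calling `resolve()` again shifts all later frames while keeping the spacing equal.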

Finally, sequences can be nested or even overlap, and sequence styles can be defined and applied to any sequence with a compatible structure. Sequence styles can both define the layout (relative positions) of objects and apply specific formatting styles to each object in the sequence, so the entire look of a document can be changed by changing its style.

Initially, I will provide support in Scribus for sequences of frames in a document, by adapting the existing layout code to call my library to find the objects' positions and attributes. Eventually, it could be integrated into the text layout engine as well to provide finer control over text layouts, even allowing for specific needs such as rendering equations from LaTeX or user-defined sources.

Dynamic layout of objects/frames (to be done in this project)

 * layout templates for specific content or entire documents, usable directly from Scribus or through a content management system interface if developed (see the other Scribus proposal on the subject)


 * structured relative positioning of frames--persistent align and distribute; easily modifying complex charts of many individual frames where many symmetry relationships should be maintained (see my real example at http://bugs.scribus.net/file_download.php?file_id=1919&type=bug). The idea is the whole layout could be determined by a few good layout styles, and then the whole thing could be reformatted to a new look by changing the styles and leaving the content alone. This would be useful for a content management/external content library system as well.


 * object clones with shared attributes, as well as possible variations (a sequence of colors, sizes, etc.), as on the ideas page. I think that this proposal would be a superset of that functionality.


 * making borders around frames using repeated graphic elements (see request at http://www.nabble.com/Text-frame-border-from-clipart-tf3493594.html)

Potential integration with text layout engine (long-term, not for this project)

 * Integration of this framework with the text layout engine, which is a specific case of a structured layout engine, would allow finer control over details of columns, sections, and intelligent spacing around headings and inline objects. I'm thinking section styles here--content logically identified as a section with a heading and a body can have a style applied to it to determine the spacing between headings and sections, and apply specific text styles to each as well.


 * native rendering of equations or LaTeX content by applying layout styles to structured input content (see http://bugs.scribus.net/file_download.php?file_id=1924&type=bug at the top for an example idea). This would allow easily reformatting an equation with a new style, etc. [This would also require an import parser to turn structured input (XML, LaTeX, or even just "(3x+2y)^n") into Scribus groups and apply the appropriate styles.]


 * Update: This may be possible even by the end of this project, without modifying the text layouter, by dynamically creating multiple frames in a sequence. See content-determined layout section below. --Michael


 * a formalized representation of the Scribus layout algorithm. While the core text rendering engine will be optimized for efficient rendering and supporting the most common needs, an abstract layout framework would allow formally defining its behavior (in terms of sequences), so that it could be easily extended for special-case needs, and Scribus layouts could be supported by other programs or implementations. Both the built-in text layouter and user layout styles for frames could be described in the same language to an external program, facilitating interoperability and the long-term reliability of layouts. Also very relevant to content-sharing systems, which could use multiple client interfaces to the same data depending on the function.

Deliverables

 * a general class library for manipulating structured layout elements, ideally independent of Scribus and Qt so it could be used by any Scribus version as well as other applications. It would include:


 * 1) "spacers" - this is my name for objects that define relationships between the properties of other objects, most commonly distance offsets, but also size, rotation, text and color properties, even content. They would be represented as arrows or such on the canvas.
   * between sibling objects
   * to a parent object, e.g. anchoring one corner of an object to a place on the page
 * 2) "selectors" - choose the anchor point on an object for a spacer to connect to, e.g., top left corner, center, or (long-term) named points on the object. "Update: generally, a selector will take an input object and generate output of a particular type based on some rule, such as choosing an anchor point on a frame. Another use is taking an input text block and outputting a sequence for use by a layout style, as discussed in the section on content-determined layout styles below. --Michael"
 * 3) sequences - these are ordered groups of objects which allow for simply defining sequential relationships between their members. A sequence would consist of objects plus a spacer style which would be applied between successive member objects. E.g., persistent align and distribute: a sequence of frames which remain equally spaced by the gap between them and centered on the page when one of the objects is resized. Sequences could be nested to form arrays or generalized tables, etc., as in my chart example (http://bugs.scribus.net/file_download.php?file_id=1919&type=bug).
 * 4) sequence styles - the big picture. Take any given logically structured content--a simple sequence, a structured equation, a section with a heading, subheading, and body--and define sequence styles for it that can be swapped in and out and reused. Would handle both layout per se and applying formatting styles to each object in the sequence.
   * styles that can handle any sequence, like align and distribute
   * styles that depend on a certain logical structure, like a section


 * Scribus GUI integration for parts 3 and 4 above. I would modify or duplicate the linker tool to allow creating sequences on the canvas, and extend the properties dialog to support setting relative properties. The first two use cases named above should be supported.


 * integration of sequences with the Scribus group structure, allowing proper display in the outline, etc.


 * adding support for sequence styles to the existing Scribus style system


 * developing Scribus file format support for sequences and sequence styles

I would not plan to support free, individual spacers (not part of a sequence) or text-layout integration yet in this project, but the framework would support them.
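To make the spacer and selector concepts more concrete, here is a minimal sketch. All names (`Box`, `Anchor`, `anchorPoint`, `Spacer`) are hypothetical and chosen only for illustration; the real classes would need to cover more attributes than position.

```cpp
#include <cassert>

struct Point { double x; double y; };

// Hypothetical frame geometry: top-left corner plus size.
struct Box { double x; double y; double w; double h; };

// A selector maps an input object to an output value by a fixed rule;
// here the rule is "pick a named anchor point on the box".
enum class Anchor { TopLeft, Center, BottomRight };

Point anchorPoint(const Box& b, Anchor a) {
    switch (a) {
        case Anchor::TopLeft:     return {b.x, b.y};
        case Anchor::Center:      return {b.x + b.w / 2, b.y + b.h / 2};
        case Anchor::BottomRight: return {b.x + b.w, b.y + b.h};
    }
    assert(false && "unknown anchor");
    return {0.0, 0.0};
}

// A spacer relates an anchor on a reference box (a sibling or the parent)
// to an anchor on a target box by a fixed offset. Applying it moves the
// target; the reference is left untouched.
struct Spacer {
    Anchor from;   // anchor selected on the reference
    Anchor to;     // anchor selected on the target
    Point offset;  // required distance from `from` to `to`

    void apply(const Box& reference, Box& target) const {
        Point src = anchorPoint(reference, from);
        Point cur = anchorPoint(target, to);
        target.x += src.x + offset.x - cur.x;
        target.y += src.y + offset.y - cur.y;
    }
};
```

With a page as the reference, a center-to-center spacer with zero offset keeps a frame centered on the page; with a sibling as the reference, the same type expresses a fixed gap between neighbours.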

Deficiencies in the original proposal and proposed solutions
The method for creating sequences I described above, using a linker-like tool, does not support what is probably the most common application and most requested feature of layout styles: handling structured text import, where the desired structure and layout should be inferred from the structure or metadata in the content. An example would be importing an XML document and creating heading and body frames as appropriate, to which a layout style can then be applied.

Eventually, it would be nice to support structured text directly in the text engine, but that is outside the scope of this project. Besides changing the engine itself and the internal representation of text, there would be other issues, such as displaying structured content in the story editor, that I can't go into this summer. Here are two methods that I think will work based on what I am planning to implement:


 * 1) Add to the text importer the ability to parse imported text, split it into separate frames within a sequence, and apply layout styles. This would likely only require a bit more control over regular expression matching from the frontend side than what's currently available, for instance to match and strip closing markup tags. Ideally there would then be an exporter as well to re-export the split-up content as a single text file.
 * 2) Make the process in (1) dynamic, i.e. store the text in its original form and apply the filter dynamically to create a sequence and then apply the layout style. This has the significant advantage that the full content can be edited in the story editor or reimported any time, but adds the implementation complexity that frames and sequence structure can be dynamically created and destroyed based on the content. I would probably treat such dynamic filters as an extension of the planned selector concept (see above).
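The splitting step in (1) can be sketched with `std::regex`. This is only an illustration under strong assumptions: the markup is flat (no nesting, no attributes), and the `ImportedFrame`, `splitIntoFrames`, and `joinFrames` names are hypothetical rather than part of the Scribus importer.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One frame produced by the import filter (hypothetical type).
struct ImportedFrame {
    std::string role;  // semantic label taken from the tag name
    std::string text;  // content with opening and closing tags stripped
};

// Split flat markup such as "<heading>Intro</heading>" into one frame per
// element. The back-reference \1 makes the closing tag match the opening one.
std::vector<ImportedFrame> splitIntoFrames(const std::string& input) {
    static const std::regex element(R"(<(\w+)>([^<]*)</\1>)");
    std::vector<ImportedFrame> frames;
    for (std::sregex_iterator it(input.begin(), input.end(), element), end;
         it != end; ++it) {
        frames.push_back({(*it)[1].str(), (*it)[2].str()});
    }
    return frames;
}

// Inverse operation: re-export the split-up content as a single text.
std::string joinFrames(const std::vector<ImportedFrame>& frames) {
    std::string out;
    for (const ImportedFrame& f : frames) {
        assert(!f.role.empty());
        out += "<" + f.role + ">" + f.text + "</" + f.role + ">";
    }
    return out;
}
```

For the dynamic variant in (2), the same filter would be re-run whenever the stored source text changes, and the resulting frames would be created or destroyed to match.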

Structural versus semantic layout styles
The simplest kind of layout style would create a layout based on the structure of its contents, e.g. an equally spaced array of frames or a chart style which expects heading and cell frames in a particular structure within the sequence. But it should also be possible for a semantic style, like "section" or "chart," applied to a part of the sequence content to be interpreted by the layout style to support semantically labelled input formats.

Questions
My main question with this at the moment is how to handle long content, like most stories, that must flow across pages and, hence, have a variable number of frames. In other words, a section consisting of a heading and a body would result in a heading frame and a body frame handled by a layout style, but actually it would have to create n linked body frames across pages as appropriate.

Import/export of external content and styling languages
Please list any existing content or stylesheet languages for which you would be interested in seeing import/export support. I won't necessarily get to writing filters for (any/all?) of them this summer, but I will look at the features they support to try and make sure my implementation is compatible if possible. I've listed a few mentioned on the mailing list to start with.

Content formats

 * Docbook
 * (La)TeX source/equations (see below)
 * plain XML
 * MathML
 * others...?

Stylesheet languages/engines

 * XSL
 * CSS
 * TeX layout engine/LaTeX styles, etc. :)
 * FrameMaker styles
 * others...?

Why this will allow native equation formatting after my project
I listed this as a long-term use case above, but it should actually be possible even without modifying the text engine using this approach. Procedure:


 * 1) Enter equation content as text or markup, e.g. "Integral((x^2+1)/x*dx)"
 * 2) Define a filter (selector) that splits it into a sequence and applies semantic styles:  {  {  {  {  {  x,2 }, 1 }, x }, dx } } (the <> denote semantic styles that will be used by the layout style to determine layout).
 * 3) Apply a layout style to the equation content that uses the selector to dynamically determine the layout (each piece of the equation ends up in its own frame).
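Step 2 of the procedure can be sketched as a small recursive-descent parser that turns the example input into a nested sequence. The style labels used here (`sum`, `product`, `fraction`, `power`, and the function name itself) are assumptions made for this sketch, as are the `Node`, `EquationParser`, and `render` names; whitespace and error recovery are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A node in the nested sequence. `style` is a hypothetical semantic label;
// leaves carry `text` and have no children.
struct Node {
    std::string style;
    std::string text;
    std::vector<Node> children;
};

class EquationParser {
public:
    explicit EquationParser(const std::string& s) : src_(s), pos_(0) {}

    Node parse() {
        Node n = sum();
        assert(pos_ == src_.size() && "trailing input");
        return n;
    }

private:
    std::string src_;
    std::size_t pos_;

    bool eat(char c) {
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    static Node group(const std::string& style, Node a, Node b) {
        Node n;
        n.style = style;
        n.children.push_back(std::move(a));
        n.children.push_back(std::move(b));
        return n;
    }
    Node sum() {  // sum := product ('+' product)*
        Node left = product();
        while (eat('+')) left = group("sum", std::move(left), product());
        return left;
    }
    Node product() {  // product := power (('*' | '/') power)*
        Node left = power();
        for (;;) {
            if (eat('*')) left = group("product", std::move(left), power());
            else if (eat('/')) left = group("fraction", std::move(left), power());
            else return left;
        }
    }
    Node power() {  // power := primary ('^' primary)?
        Node base = primary();
        if (eat('^')) return group("power", std::move(base), primary());
        return base;
    }
    Node primary() {  // primary := '(' sum ')' | name '(' sum ')' | name | number
        if (eat('(')) return closeParen(sum());
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               std::isalnum(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        assert(pos_ > start && "expected a name or number");
        std::string word = src_.substr(start, pos_ - start);
        Node n;
        if (eat('(')) {  // function-like form such as Integral(...)
            n.style = word;
            n.children.push_back(closeParen(sum()));
        } else {
            n.text = word;
        }
        return n;
    }
    Node closeParen(Node inner) {
        bool closed = eat(')');
        assert(closed && "expected ')'");
        (void)closed;
        return inner;
    }
};

// Print the nested sequence with each semantic style in angle brackets.
std::string render(const Node& n) {
    if (n.children.empty()) return n.text;
    std::string out = "<" + n.style + ">{";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i > 0) out += ",";
        out += render(n.children[i]);
    }
    return out + "}";
}
```

A layout style for equations would then walk this tree and choose a frame arrangement per label, for example stacking the two children of a fraction vertically.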

Implementation
All code would be C++. Outline of class library functionality:


 * classes for spacers, selectors, sequences, and sequence styles
 * methods to return objects referenced by a spacer or selector object, specified elements in a sequence, etc.
 * methods to return the absolute position and attributes of objects determined by spacers or sequences, so that they can be rendered using the existing rendering code with just a hook to call my code
 * methods to validate a style against content for styles that require a particular content structure
 * all references to individual objects and styles would be generic (through templates/subclassing/other...?) so that the library need not know the details of the attributes supported by a Scribus object (I would write the glue code for the specific applications in Scribus outlined above)
 * ideally, sequences would be able to intersect, so that different properties (x and y position, size, rotation, etc.) could be controlled by different sequences. However, I would probably also support constraints to disallow this or other more complex features to support use in simpler apps. I would likely not implement the more complex cases for this project, but design the API to allow them.
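As one possible shape for the generic references and style validation listed above, here is a minimal template-based sketch. The names `GenericSequence`, `StructuredStyle`, and `DemoFrame` are hypothetical, and templates are only one of the options mentioned; subclassing would work as well.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Generic sequence: the library only sees opaque objects of type T.
template <typename T>
struct GenericSequence {
    std::vector<T> members;
};

// A style that requires a particular logical structure. It learns each
// member's role through a caller-supplied accessor (the "glue code"), so it
// never touches application-specific attributes directly.
template <typename T>
class StructuredStyle {
public:
    using RoleOf = std::function<std::string(const T&)>;

    StructuredStyle(std::vector<std::string> requiredRoles, RoleOf roleOf)
        : required_(std::move(requiredRoles)), roleOf_(std::move(roleOf)) {
        assert(roleOf_ && "an accessor is required");
    }

    // True when the content has exactly the expected roles, in order.
    bool validate(const GenericSequence<T>& content) const {
        if (content.members.size() != required_.size()) return false;
        for (std::size_t i = 0; i < required_.size(); ++i) {
            if (roleOf_(content.members[i]) != required_[i]) return false;
        }
        return true;
    }

private:
    std::vector<std::string> required_;
    RoleOf roleOf_;
};

// Hypothetical application-side object; a Scribus frame would take its place.
struct DemoFrame {
    std::string role;
    double width;
};
```

A "section" style could be declared as `StructuredStyle<DemoFrame>` with the roles `heading` and `body`, and would reject content that lacks a heading before any layout is attempted.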