Dynamic Layout Support Discussion

Structured Layout Styles for Scribus
This is the discussion page for the Google Summer of Code project I've applied for on supporting dynamic layout styles. Please add any comments, ideas, suggestions, or additional use cases that you might think of for discussion. Thanks! --Michael Koren

Background
I'm currently a PhD student doing my dissertation research on a range of computer science topics, centered around abstract data representation, processing, and interfaces. I started using Scribus to replace WordPerfect shortly after I switched to Linux a few years ago, and immediately became intrigued by its potential for exploring some of my ideas in the areas of both visual (layout) and content organization.

My main feature wish in using Scribus has been to integrate its precise, WYSIWYG layout capabilities with the flexibility to rapidly modify, replace, or replicate an entire layout according to saved rules or constraints--in other words, to have dynamic layout styles for structured content. I have wanted to implement this capability for a year now, and have been browsing the Scribus code and planning my implementation based on the needs of documents I've created. So far I haven't had dedicated time to devote to it, so I would love the opportunity to work on it through Google Summer of Code.

Overview
Creating complex, one-time document layouts in Scribus is easy thanks to its precise yet easy-to-use frame-based interface. But often it would be desirable to create a carefully constructed layout style that can be reused for many documents with the same general structure and that can adapt to small variations in content. This would also mean that the same content can be dynamically reformatted by applying a different layout style.

Due to the many different usage possibilities offered by this concept and the many specific features that could be useful, I plan to implement a general framework to support structured layout that can be adapted and extended to many different purposes. I will then implement a few specific applications of it in Scribus. The core of the framework will be a self-contained C++ class library for manipulating abstract structured objects and the relationships between them, designed to be usable by multiple applications in the future.

The system will be based on sequences of objects (in this case, text or image frames in a layout). Each object in a sequence can have its attributes defined relative to the previous object or the parent object (the sequence itself). Thus, frames can be kept equally spaced as their size changes, or distributed equally across the page. Other attributes, such as color, size, and even content, can be set to be the same for an entire sequence also, so that linked object "clones" will also be a possibility.
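To make the idea of positions defined relative to the previous object concrete, here is a minimal sketch. It is only an illustration under stated assumptions: `Frame`, `Sequence`, and `relayout` are hypothetical names, not existing Scribus classes, and only the horizontal extent is modelled.

```cpp
#include <vector>

// Hypothetical stand-in for a layout frame; only the horizontal extent is modelled.
struct Frame {
    double x;      // left edge, recomputed by the sequence
    double width;  // set by the user
};

// Hypothetical sequence: each member sits a fixed gap after the previous one.
struct Sequence {
    double originX = 0.0;  // position of the first member, relative to the parent
    double gap = 0.0;      // spacing applied between successive members
    std::vector<Frame> members;

    // Recompute every position from its predecessor.
    void relayout() {
        double cursor = originX;
        for (Frame& f : members) {
            f.x = cursor;
            cursor = f.x + f.width + gap;
        }
    }
};
```

Resizing one member and calling `relayout()` again shifts every later member, so the gaps stay equal without the user touching the other frames.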

Finally, sequences can be nested or even overlap, and sequence styles can be defined and applied to any sequence with a compatible structure. Sequence styles can both define the layout (relative positions) of objects and apply specific formatting styles to each object in the sequence, so the entire look of a document can be changed by changing its style.
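One way to picture swappable sequence styles is as interchangeable functions over the same content. The sketch below assumes a style only sets horizontal positions; `SequenceStyle`, `fixedGap`, and `distribute` are hypothetical names chosen for illustration.

```cpp
#include <functional>
#include <vector>

struct Frame { double x; double width; };  // hypothetical stand-in for a frame

// A style positions the members of a sequence inside a parent of a given width.
using SequenceStyle = std::function<void(std::vector<Frame>&, double parentWidth)>;

// Style A: pack members from the left with a fixed gap.
SequenceStyle fixedGap(double gap) {
    return [gap](std::vector<Frame>& frames, double) {
        double cursor = 0.0;
        for (Frame& f : frames) { f.x = cursor; cursor += f.width + gap; }
    };
}

// Style B: spread the leftover parent width equally between members.
SequenceStyle distribute() {
    return [](std::vector<Frame>& frames, double parentWidth) {
        if (frames.empty()) return;
        double used = 0.0;
        for (const Frame& f : frames) used += f.width;
        double gap = frames.size() > 1
            ? (parentWidth - used) / (frames.size() - 1)
            : 0.0;
        double cursor = 0.0;
        for (Frame& f : frames) { f.x = cursor; cursor += f.width + gap; }
    };
}
```

Applying `fixedGap(5.0)` and then `distribute()` to the same vector of frames changes the layout while leaving the content (here, the widths) alone.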

Initially, I will provide support in Scribus for sequences of frames in a document, by adapting the existing layout code to call my library to find the objects' positions and attributes. Eventually, it could be integrated into the text layout engine as well to provide finer control over text layouts, even allowing for specific needs such as rendering equations from LaTeX or user-defined sources.

Dynamic layout of objects/frames (to be done in this project)

 * layout templates for specific content or entire documents, usable directly from Scribus or through a content management system interface if developed (see the other Scribus proposal on the subject)


 * structured relative positioning of frames--persistent align and distribute; easily modifying complex charts of many individual frames where many symmetry relationships should be maintained (see my real example at http://bugs.scribus.net/file_download.php?file_id=1919&type=bug). The idea is the whole layout could be determined by a few good layout styles, and then the whole thing could be reformatted to a new look by changing the styles and leaving the content alone. This would be useful for a content management/external content library system as well.


 * object clones with shared attributes, as well as possible variations (a sequence of colors, sizes, etc.), as on the ideas page. I think that this proposal would be a superset of that functionality.


 * making borders around frames using repeated graphic elements (see request at http://www.nabble.com/Text-frame-border-from-clipart-tf3493594.html)

Potential integration with text layout engine (long-term, not for this project)

 * Integration of this framework with the text layout engine, which is a specific case of a structured layout engine, would allow finer control over details of columns, sections, and intelligent spacing around headings and inline objects. I'm thinking section styles here--content logically identified as a section with a heading and a body can have a style applied to it to determine the spacing between headings and sections, and apply specific text styles to each as well.
 * Hmm, I can do that anyway now, it seems; see below --Michael


 * native rendering of equations or LaTeX/MathML content by applying layout styles to structured input content (see http://bugs.scribus.net/file_download.php?file_id=1924&type=bug at the top for an example idea). This would allow easily reformatting an equation with a new style, etc. [This would also require an import parser to turn structured input (XML, LaTeX, or even just "(3x+2y)^n") into Scribus groups and apply the appropriate styles, i.e., "Level 1.5" functionality (see below).]


 * Update: This may be possible (to a very limited extent, with user-created translation filters) even by the end of this project, without modifying the text layouter, by dynamically creating multiple frames in a sequence. See the content-determined layout section below. --Michael


 * a formalized representation of the Scribus layout algorithm. While the core text rendering engine will be optimized for efficient rendering and supporting the most common needs, an abstract layout framework would allow formally defining its behavior (in terms of sequences), so that it could be easily extended for special-case needs, and Scribus layouts could be supported by other programs or implementations. Both the built-in text layouter and user layout styles for frames could be described in the same language to an external program, facilitating interoperability and the long-term reliability of layouts. Also very relevant to content-sharing systems, which could use multiple client interfaces to the same data depending on the function.

Deliverables

 * a general class library for manipulating structured layout elements, ideally independent of Scribus and Qt so it could be used by any Scribus version as well as other applications. It would include:


 * 1) "spacers" - this is my name for objects that define relationships between the properties of other objects, most commonly distance offsets, but also size, rotation, text and color properties, even content. They would be represented as arrows or such on the canvas.
   * between sibling objects
   * to a parent object, e.g. anchoring one corner of an object to a place on the page
 * 2) "selectors" - choose the anchor point on an object for a spacer to connect to, e.g., top left corner, center, or (long-term) named points on the object. Update: generally, a selector will take an input object and generate output of a particular type based on some rule, such as choosing an anchor point on a frame. Another use is taking an input text block and outputting a sequence for use by a layout style, as discussed in the section on content-determined layout styles below. --Michael
 * 3) sequences - these are ordered groups of objects which allow for simply defining sequential relationships between their members. A sequence would consist of objects plus a spacer style which would be applied between successive member objects. E.g., persistent align and distribute: a sequence of frames which remain equally spaced by the gap between them and centered on the page when one of the objects is resized. Sequences could be nested to form arrays or generalized tables, etc., as in my chart example (http://bugs.scribus.net/file_download.php?file_id=1919&type=bug).
 * 4) sequence styles - the big picture. Take any given logically structured content--a simple sequence, a structured equation, a section with a heading, subheading, and body--and define sequence styles for it that can be swapped in and out and reused. Would handle both layout per se and applying formatting styles to each object in the sequence.
   * styles that can handle any sequence, like align and distribute
   * styles that depend on a certain logical structure, like a section


 * Scribus GUI integration for parts 3 and 4 above. I would modify or duplicate the linker tool to allow creating sequences on the canvas, and extend the properties dialog to support setting relative properties. The first two use cases named above should be supported.


 * integration of sequences with the Scribus group structure, allowing proper display in the outline, etc.


 * adding support for sequence styles to the existing Scribus style system


 * developing Scribus file format support for sequences and sequence styles

I would not plan to support free, individual spacers (not part of a sequence) or text-layout integration yet in this project, but the framework would support them.
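A minimal sketch of how a selector and a spacer might fit together, assuming the simplest case of a fixed distance offset between two anchor points. The names `Anchor`, `anchorPoint`, and `Spacer` are hypothetical and only illustrate the division of labor described in the deliverables above.

```cpp
struct Point { double x; double y; };
struct Frame { double x; double y; double width; double height; };  // hypothetical

// Selector: chooses an anchor point on a frame.
enum class Anchor { TopLeft, Center, BottomRight };

Point anchorPoint(const Frame& f, Anchor a) {
    switch (a) {
        case Anchor::TopLeft:     return Point{f.x, f.y};
        case Anchor::Center:      return Point{f.x + f.width / 2, f.y + f.height / 2};
        case Anchor::BottomRight: return Point{f.x + f.width, f.y + f.height};
    }
    return Point{f.x, f.y};  // not reached for valid enum values
}

// Spacer: keeps the anchor of `to` at a fixed offset from the anchor of `from`.
struct Spacer {
    Anchor fromAnchor;
    Anchor toAnchor;
    Point offset;

    // Move `to` so that the relationship holds again.
    void apply(const Frame& from, Frame& to) const {
        Point src = anchorPoint(from, fromAnchor);
        Point cur = anchorPoint(to, toAnchor);
        to.x += src.x + offset.x - cur.x;
        to.y += src.y + offset.y - cur.y;
    }
};
```

With a spacer from the bottom-right corner of one frame to the top-left corner of another, widening the first frame and reapplying the spacer moves the second frame along with it.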

More detailed examples
One of the main strengths of the proposed system is the wide range of applications which it would support. Below are listed some more detailed examples and explanations of some of the use cases that would be possible through this approach, ordered from least to most technically complex to implement.

Level 1: Explicit sequences and content-independent layout styles

 * structured layout relationships applied to preexisting objects; does not involve auto-generating linked text frames
 * object dependencies (sequences) are applied manually by the user on the canvas via a link tool
 * the types of relationships are either entered manually or determined by a style which matches the applied sequence structure
 * common structures can be copied and reused for multiple layouts (maybe saved as templates)

Examples:


 * charts and diagrams


 * Example chart from the implementation section below:


 * [[Image:Layout_schematic.png]]


 * View showing sample sequence and spacer relationships for such a chart:


 * [[Image:Schematic_spacers.png]]


 * (Not all relationships are shown.)


 * Spacers (blue arrows) dynamically preserve relationships between individual objects or sequences of objects with the same relationships (dotted blue lines)
 * Sequences and spacers can be used to:
 * preserve a fixed spacing between frames, even if frame size changes
 * equally distribute spacing among frames in a sequence
 * preserve a given margin between a sequence and an enclosing border (the border frame can snap to fit around the content frames)
 * apply consistent layout and formatting styles to sub-elements in a sequence, to automatically keep consistent margin, spacing, and color schemes for all subsequences
 * preserve equal width or height of all frames in a sequence, or even nested subsequences, so that resizing one automatically updates the rest
 * The applied sequences and relationships in a layout can be saved as a layout style and applied to anything with a compatible structure, so that the same color scheme, margins, and inter-frame spacing values can be used for many charts, or a given chart can be reformatted by applying a new style


 * object clones


 * A spacer can be used to duplicate any or all attributes, including content, of one frame or sequence in another, allowing clones to be made, which can further be aligned in arrays using sequences if desired


 * Any attributes can be left out of the relationship, allowing clones with one or more properties altered
 * The two similar boxes on the left side of the above chart could be clones with the heading frame content and line color changed
 * Attributes can depend on the item number in a sequence, allowing sequentially morphing clones:


 * [[Image:Clone_morph.png]]


 * Note that this is dynamic, like all sequences, so if the source object is changed, the whole sequence updates
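The morphing-clone idea can be sketched as a function of the source object and the item number. This is an illustration only: `Morph` and `makeClones` are hypothetical names, and a real implementation would presumably update clones in place rather than rebuild them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical frame with a few attributes; lineColor is an opaque color id.
struct Frame { double width; double rotation; int lineColor; };

// Hypothetical per-step change applied to each successive clone.
struct Morph { double widthStep; double rotationStep; };

// Rebuild the clones from the source; call again whenever the source changes.
std::vector<Frame> makeClones(const Frame& source, const Morph& m, std::size_t count) {
    std::vector<Frame> clones;
    clones.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        Frame c = source;                   // shared attributes are copied as-is
        c.width += m.widthStep * i;         // varied attributes depend on the index
        c.rotation += m.rotationStep * i;
        clones.push_back(c);
    }
    return clones;
}
```

Because the clones are derived from the source each time, changing the source and calling `makeClones` again updates the whole sequence, while attributes left out of `Morph` (here `lineColor`) stay identical across clones.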

Level 1.5: Content-determined layout with fixed text elements (simple text import without soft frame breaks)

 * same as before, only sequence structure, formatting, and layout style application are determined by applying user-defined filters to structured input text
 * engine is not required to create multiple text frames for single content elements based on soft frame breaks
 * good for charts where the content has a natural order, such as numerical data
 * the user creates import filters consisting of selector styles which identify group divisions and apply logical and formatting styles
 * a simple interface would modify the text import dialog and match markup using regular expressions
 * could potentially call external processors to parse existing structured text formats?


 * The selector process could be dynamic, meaning the original content could be changed and sequence structure and layout would change accordingly
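As a rough sketch of a regular-expression import filter, the function below splits tagged input text into a flat sequence of styled elements. The bracketed tag syntax, `Element`, and `applyFilter` are assumptions made for this example; the actual markup would be whatever the user's filter defines.

```cpp
#include <regex>
#include <string>
#include <vector>

// One member of the generated sequence: a logical style name plus its content.
struct Element { std::string style; std::string text; };

// Apply a user-defined rule whose first capture group is the style name
// and whose second capture group is the content.
std::vector<Element> applyFilter(const std::string& input, const std::regex& rule) {
    std::vector<Element> sequence;
    for (std::sregex_iterator it(input.begin(), input.end(), rule), end; it != end; ++it) {
        sequence.push_back(Element{(*it)[1].str(), (*it)[2].str()});
    }
    return sequence;
}
```

For instance, with the rule `\[(\w+)\]([^\[]*)`, the input `[head]Sales[cell]12[cell]15` yields one `head` element and two `cell` elements, to which a chart layout style could then be applied. Rerunning the filter after the text changes gives the dynamic behavior mentioned above.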

Structural versus semantic layout styles

The simplest kind of layout style would create a layout based on the structure of its contents, e.g. an equally spaced array of frames, or a chart style which expects heading and cell frames in a particular structure within the sequence. But it should also be possible for a semantic style, like "section" or "chart," applied to a part of the sequence content to be interpreted by the layout style to support semantically labelled input formats.

Examples:


 * Data tables with a specific format; layout could be easily reused for multiple data sets


 * Simple standalone equation rendering according to user-defined layout rules


 * I listed this as a long-term use case above, but it should actually be possible even without modifying the text engine using this approach. Procedure:


 * Enter equation content as text or markup, e.g. "Integral((x^2+1)/x*dx)"
 * Define a filter (selector) that splits it into a sequence and applies semantic styles: {  {  {  {  {  x,2 }, 1 }, x }, dx } } (the <> denote semantic styles that will be used by the layout style to determine layout).
 * Apply a layout style to the equation content that uses the selector to dynamically determine the layout (each piece of the equation ends up in its own frame).


 * Note that this does not mean a graphical equation editor, nor does it work well for inline equations without text layout integration, but it is a potentially useful possibility --Michael


 * additional possibility--content editing/filtering by import filters, to support alternate text or adding standard content automatically by a style

Level 2: Content-determined layout styles and structured text import with automatic frame breaks (whole document import)

 * this is probably the most requested feature and application of layout styles, namely handling structured whole-document import, where the desired structure and layout should be inferred from the structure or metadata in the content. For example, importing an XML document and creating heading and body frames as appropriate, to which layout styles can be applied.
 * again uses user-defined filters to determine structure and apply styles
 * engine must support dynamically creating enough frames to hold flowed text content, broken across page or other specified boundaries
 * (this will probably require repeatedly calling the text layout engine to fill successive frames and finding out where soft breaks occur; see the orange arrow in the block diagram at bottom. This may require added support from the text layout engine, though not much)
 * when the text engine supports fit-frame-to-content, this method will allow within-page structured styling of headings for more control than normal heading styles (see example below). Until then, it will allow handling of sections that start on their own page


 * this level of functionality from my layout engine would allow a lot of simple but nice text placement features, like full-page-height sidebars with separate stories, without requiring the text layout engine to support (hierarchically) structured content at all*

(*) Eventually, it would be nice to support structured text directly in the text engine, but that is outside of the scope of this project (even besides changing the engine itself and the internal representation of text, there would be other issues, such as displaying structured content in the story editor, that I can't go into this summer).
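The frame-generation loop for flowed text could look roughly like the sketch below. The text layout engine is replaced by a stand-in callback that reports how many characters fit in a frame; `TextFrame`, `FitFunction`, `flowIntoFrames`, and `fitByCount` are all hypothetical, and a real engine would measure laid-out lines rather than count characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct TextFrame { std::size_t capacity; std::string text; };  // hypothetical

// Stand-in for the text layout engine: given the remaining text and a frame,
// return how many characters fit before the soft break.
using FitFunction = std::function<std::size_t(const std::string&, const TextFrame&)>;

// Create as many frames as needed to hold the whole story.
std::vector<TextFrame> flowIntoFrames(const std::string& story,
                                      std::size_t frameCapacity,
                                      const FitFunction& fit) {
    std::vector<TextFrame> frames;
    std::size_t pos = 0;
    while (pos < story.size()) {
        TextFrame frame{frameCapacity, ""};
        std::size_t taken = fit(story.substr(pos), frame);
        if (taken == 0) break;  // nothing fits: stop rather than loop forever
        frame.text = story.substr(pos, taken);
        frames.push_back(frame);
        pos += taken;
    }
    return frames;
}

// Simplest possible engine: fit by character count.
std::size_t fitByCount(const std::string& remaining, const TextFrame& frame) {
    return std::min(remaining.size(), frame.capacity);
}
```

The number of frames is not known in advance; it falls out of repeatedly asking the engine where the next break occurs, which is the extra support from the text layout engine mentioned above.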

How it will be implemented

Two methods, the first easier to implement but less powerful. (This section is a little outdated; I plan to support both.)


 * 1) Add to the text importer the ability to parse imported text, split it into separate frames within a sequence, and apply layout styles. This would likely require only a bit more frontend control over regular expression matching than is currently available, for instance to match and strip closing markup tags. Ideally there would then be an exporter as well to reexport the split-up content as a single text file.
 * 2) Make the process in (1) dynamic, i.e. store the text in its original form and apply the filter dynamically to create a sequence and then apply the layout style. This has the significant advantage that the full content can be edited in the story editor or reimported any time, but adds the implementation complexity that frames and sequence structure can be dynamically created and destroyed based on the content. I would probably treat such dynamic filters as an extension of the planned selector concept (see above).

Examples:


 * allow supporting paragraph, character, and section logical styling in input files, translated to user-definable Scribus styles by the parse filter (sequence selector)


 * Note: I can look into specific support for DocBook or other popular formats (enter your favorites in the section below). Most likely only a subset will be directly supportable without Level 3 support from my engine (not this summer :), but the rest may be usable via XSL prefiltering


 * multiple stories in input text, placed into separate panes or otherwise automatically (and dynamically) according to layout styles


 * more intelligent support for setting space before and after paragraph styles and headings in a single story, depending on section context (will require features of the new text layout system). At least for me, the current system is extremely clumsy due to lack of context information.

Level 3: Advanced features--positionally linked stories, footnotes, "intelligent" adaptive layout, integration with the text layout engine
There are more possibilities to explore, but not for now. ;-)

Discussion: external content and styling languages to support (import/export)
Please list any existing content or stylesheet languages for which you would be interested in seeing import/export support. I won't necessarily get to writing filters for (any/all?) of them this summer, but I will look at the features they support to try and make sure my implementation is compatible if possible. I've listed a few mentioned on the mailing list to start with.

Content formats

 * DocBook
 * (La)TeX source/equations (see below)
 * generic XML
 * MathML source
 * XSL-FO? (see comments on discussion page)
 * others...?

Stylesheet languages/engines

 * XSLT
 * CSS
 * TeX layout engine/LaTeX styles, etc. :)
 * FrameMaker styles
 * MathML processor
 * GUI widget layout systems (for ideas...)
 * others...?

Implementation
All code would be C++. Outline of class library functionality:


 * classes for spacers, selectors, sequences, and sequence styles
 * methods to return objects referenced by a spacer or selector object, specified elements in a sequence, etc.
 * methods to return the absolute position and attributes of objects determined by spacers or sequences, so that they can be rendered using the existing rendering code with just a hook to call my code
 * methods to validate a style against content for styles that require a particular content structure
 * all references to individual objects and styles would be generic (through templates/subclassing/other...?) so that the library need not know the details of the attributes supported by a Scribus object (I would write the glue code for the specific applications in Scribus outlined above; see schematic diagram and discussion below)
 * ideally, sequences would be able to intersect, so that different properties (x and y position, size, rotation, etc.) could be controlled by different sequences. However, I would probably also support constraints to disallow this or other more complex features to support use in simpler apps. I would likely not implement the more complex cases for this project, but design the API to allow them.
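As one possible reading of "generic through templates," the sketch below validates a style against content without the library knowing the object type. `StructuredStyle`, `DemoItem`, and `roleOfDemo` are hypothetical; the role accessor stands for the glue code an application such as Scribus would supply.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A style that expects members with particular logical roles, in order.
struct StructuredStyle {
    std::vector<std::string> requiredRoles;

    // Object can be any type; roleOf is supplied by the application's glue code.
    template <typename Object, typename RoleOf>
    bool validate(const std::vector<Object>& content, RoleOf roleOf) const {
        if (content.size() != requiredRoles.size()) return false;
        for (std::size_t i = 0; i < content.size(); ++i) {
            if (roleOf(content[i]) != requiredRoles[i]) return false;
        }
        return true;
    }
};

// Hypothetical application-side object and its glue function.
struct DemoItem { std::string role; };

std::string roleOfDemo(const DemoItem& item) { return item.role; }
```

A "section" style with required roles `heading` and `body` would accept a two-member sequence with those roles and reject anything else, which is the kind of check needed before applying a structure-dependent style.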

Architectural schematic
Here is a high-level conceptual block diagram of the proposed system.




 * items in red indicate what I will add
 * arrows denote flow of dependent functionality, e.g., UI elements that update objects in sequences will call the sequence manager to process the changes
 * the dashed orange arrow from the text layout engine is only needed for Level 2 layouts with multiple generated text frames for a single text passage, where the sequence manager will repeatedly call the text engine to determine the needed number of frames*

(*) Long content, like most stories, must flow across pages and, hence, will have a variable number of frames. In other words, a section consisting logically of a heading and a body would generate a heading frame and a body frame handled by a layout style, but actually the style would have to create n linked body frames across pages as appropriate.

Layout engine attribute independence and application-side attribute dependency rules
In order to be most generic for use with many projects and for ease of updating, I plan to keep the dependency-tracking logic that handles the relationships between objects and attributes independent of the actual attributes supported by Scribus, such as frame position and size, color, content, etc. That means that a table of available attributes and their possible relationships must be maintained and provided by the application, as shown by the left red box in the diagram.

Each attribute needs at least two properties: its data type (integer [e.g., position], name [e.g., color], string [text content], etc.) and its non-independence from other attributes; e.g., left frame edge, right frame edge, and width are not independent.
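One simple shape such an application-supplied attribute table could take is sketched below, as an assumption rather than a settled design. `AttrType`, `AttributeRule`, `frameAttributes`, and `independent` are hypothetical names, and the table is assumed to list dependencies symmetrically.

```cpp
#include <map>
#include <string>
#include <vector>

enum class AttrType { Number, Name, Text };

// One row of the application-supplied attribute table.
struct AttributeRule {
    AttrType type;
    std::vector<std::string> dependsWith;  // attributes this one is not independent of
};

using AttributeTable = std::map<std::string, AttributeRule>;

// Example table for a frame; the entries are illustrative only.
AttributeTable frameAttributes() {
    return {
        {"left",    {AttrType::Number, {"right", "width"}}},
        {"right",   {AttrType::Number, {"left", "width"}}},
        {"width",   {AttrType::Number, {"left", "right"}}},
        {"color",   {AttrType::Name,   {}}},
        {"content", {AttrType::Text,   {}}},
    };
}

// True if a and b are distinct and a does not list b as a dependency.
// The table is assumed symmetric, so checking a's list is enough.
bool independent(const AttributeTable& table, const std::string& a, const std::string& b) {
    auto it = table.find(a);
    if (it == table.end()) return false;
    for (const std::string& other : it->second.dependsWith) {
        if (other == b) return false;
    }
    return a != b;
}
```

The dependency engine would only consult this table, for example to refuse two constraints that together over-determine `left`, `right`, and `width`, without knowing what those attributes mean.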


 * Considerations:


 * attributes with many possible views, such as position, which is a vector with different coordinates in rotated directions, or color, which could be specified either by name or by a numerical representation in a color space
 * should it be up to the developer to specify allowed attribute relationships seen by the layout engine, or should it be automatic based on the data types and inherent relationships?
 * some relationships may need to be between more than two objects at once; support plugin external relationship handlers for special functions (multi-frame optimization placement ...)?
 * - what is the best data structure to use for attribute rules to support all this?
 * - these attribute rules are closely related to spacers, which define user-specified relationships between objects/attributes; maybe attribute rules then just define data types and built-in "spacer" relationships (and external plugins could define fancy spacer types)...
 * - it would be nice if the spacer/attribute rule implementation could be decoupled as much as possible from the dependency engine, so the latter wouldn't even need to know details of their structure