Scribus XML using Scripter

One of the issues that became noticeable very quickly when using a modified SLA file, as in Scribus files as XML, was that PAGEOBJECT elements in an SLA file are appended to the end of the file as they are created. There is an attribute, OwnPage, that tells which page the object is placed on. Even though objects tend to be added in page order, it isn't unusual, sometime near the end of the layout process, to add something near the front. This can most likely be overcome, but it requires a lot more logic in the XSL file.
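To make the ordering problem concrete, here is a minimal Python 3 sketch that uses the standard xml.etree.ElementTree module to regroup PAGEOBJECT elements by their OwnPage attribute. The SLA fragment is invented and heavily trimmed (real files carry many more attributes), and ANNAME is used here only as a readable label. This is an illustration of the problem, not part of the Scripter approach described on this page.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented, trimmed SLA-like fragment: objects appear in creation order,
# so a late addition to the first page sits after an object on the second page.
SAMPLE_SLA = """<SCRIBUSUTF8NEW>
  <DOCUMENT>
    <PAGEOBJECT OwnPage="0" ANNAME="Title"/>
    <PAGEOBJECT OwnPage="1" ANNAME="Body"/>
    <PAGEOBJECT OwnPage="0" ANNAME="LateAddition"/>
  </DOCUMENT>
</SCRIBUSUTF8NEW>"""

def objects_by_page(sla_text):
    """Return {OwnPage value: [PAGEOBJECT elements]} in ascending page order."""
    root = ET.fromstring(sla_text)
    pages = defaultdict(list)
    for obj in root.iter("PAGEOBJECT"):
        pages[int(obj.get("OwnPage"))].append(obj)
    return dict(sorted(pages.items()))

for page, objs in objects_by_page(SAMPLE_SLA).items():
    print(page, [o.get("ANNAME") for o in objs])
# 0 ['Title', 'LateAddition']
# 1 ['Body']
```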

Scripter, of course, has a LOT of access to document and object information, and so this idea was born. You might also be interested in Scribus xhtml using Scripter.

export2xml.py
This is a work-in-progress, so expect it to change as new features are added.

2013-11-4: I have now incorporated position attributes, but am not using them yet in the XSL below.
    #!/usr/bin/env python
    # File: export2xml.py - Extracts the content from a document, saving to an xml file
    # 2013.10.29 Gregory Pittman
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.

    import scribus

    def exportText(textfile):
        page = 1
        pagenum = scribus.pageCount()
        T = []          # output lines
        content = []    # raw text of stories already written
        T.append('\n')
        T.append('\n')
        T.append(' \n')
        T.append(' Scribus ' + str(scribus.scribus_version_info[0]) + '.' + str(scribus.scribus_version_info[1]) + '.' + str(scribus.scribus_version_info[2]) + str(scribus.scribus_version_info[3]) + ' \n')
        while (page <= pagenum):
            scribus.gotoPage(page)
            pagesize = scribus.getPageSize()
            d = scribus.getPageItems()    # (name, type, order) for each item on this page
            strpage = str(page)
            T.append('\n')
            for item in d:
                Xpos, Ypos = scribus.getPosition(item[0])
                Xpos = str(int(Xpos))
                Ypos = str(int(Ypos))
                if (item[1] == 4):        # text frame
                    filtered = ' '
                    contents = scribus.getAllText(item[0])
                    if (contents in content):
                        # story already written (linked frames can return the same text)
                        contents = ''
                    else:
                        content.append(contents)    # remember the raw text, before filtering
                        for char in contents:
                            if (ord(char) == 28):
                                char = ' '
                            elif (ord(char) == 27):
                                char = chr(10)
                            elif (ord(char) == 38):   # '&' must be escaped in XML
                                char = '&amp;'
                            elif ((ord(char) == 13) or (ord(char) == 10)):
                                char = " "
                            filtered = filtered + char
                        contents = filtered
                    T.append('' + contents + ' \n')
                    T.append(' \n')
                elif (item[1] == 2):      # image frame
                    imgname = scribus.getImageFile(item[0])
                    imgsize = scribus.getSize(item[0])
                    # frame width as it would appear on a page rendered 700px wide
                    picwidth = int(700 * imgsize[0]/pagesize[0])
                    T.append('' + imgname + ' \n')
            T.append(' \n')
            page += 1
        T.append(' \n')
        output_file = open(textfile, 'w')
        output_file.writelines(T)
        output_file.close()
        endmessage = textfile + ' was created'
        scribus.messageBox("Finished", endmessage, icon=0, button1=1)

    if scribus.haveDoc():
        # issave=True makes this a Save dialog, so a new file name can be typed in
        textfile = scribus.fileDialog('Enter name of file to save to',
                                      filter='Text Files (*.xml);;All Files (*)', issave=True)
        try:
            if textfile == '':        # dialog was cancelled
                raise Exception
            exportText(textfile)
        except Exception as e:
            print(e)
    else:
        scribus.messageBox('Export Error', 'You need a Document open.',
                           icon=0, button1=1)

And here is an example of the XML you get from this:

    Scribus 1.4.3
    /home/gregp/.scribus/scrapbook/main/Image1/galapagos1.jpeg
    /home/gregp/.scribus/scrapbook/main/Image2/galapagos2.jpeg
    3 May. Bistritz.--Left Munich at 8:35 P.M., on 1st May, arriving at Vienna early next morning; should have arrived at 6:46, but train was an hour late. Buda-Pesth seems a wonderful place, from the glimpse which I got of it from the train and the little I could walk through the streets. I feared to go very far from the station, as we had arrived late and would start as near the correct time as possible. I found my smattering of German very useful here, indeed, I don't know how I should be able to get on without it.
    /home/gregp/.scribus/scrapbook/main/Image4/galapagos3.jpeg
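The 2013-11-4 note above mentions position attributes. As an alternative to assembling the markup by string concatenation, Python's standard xml.etree.ElementTree can build the same kind of structure and escapes &, < and > when it serializes. Below is a minimal Python 3 sketch, fed with plain tuples so it runs outside Scribus. Only version, page and the type attribute come from the XSL on this page; document, frame, number, xpos and ypos are hypothetical names that would have to match whatever the stylesheet expects.

```python
import xml.etree.ElementTree as ET

def build_document(version, pages):
    """pages: one list per page of (kind, x, y, payload) tuples, where kind is
    'text' or 'image' and payload is the frame text or the image path."""
    root = ET.Element("document")                   # hypothetical root name
    ET.SubElement(root, "version").text = version
    for number, items in enumerate(pages, start=1):
        page = ET.SubElement(root, "page", number=str(number))
        for kind, x, y, payload in items:
            # positions truncated to whole units, as the script does with int()
            frame = ET.SubElement(page, "frame", type=kind,
                                  xpos=str(int(x)), ypos=str(int(y)))
            frame.text = payload                    # escaped on serialization
    return ET.tostring(root, encoding="unicode")

xml_text = build_document("Scribus 1.4.3", [
    [("text", 40.5, 60.2, "Buda-Pesth & Vienna"),
     ("image", 40, 300, "galapagos1.jpeg")],
])
print(xml_text)
```

Inside a Scribus script the tuples would come from scribus.getPageItems(), scribus.getPosition() and scribus.getAllText() or scribus.getImageFile(), page by page, much as in export2xml.py.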

The starting point was to generate pseudo-pages (the result is really just one big webpage) by using table tags to map out a space for each page. Using the page dimensions, a proportionally sized area can be made based on an arbitrary width of 900px, with the plan to nest additional tables inside it for the objects.
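The proportional sizing is simple arithmetic: with the width fixed at 900px, the height is 900 × page height / page width. A minimal Python sketch follows; the helper names and the inline white background are illustrative assumptions, and in a script the page dimensions would come from scribus.getPageSize().

```python
PAGE_PX_WIDTH = 900  # the arbitrary on-screen width for every pseudo-page

def pseudo_page_size(page_width, page_height, px_width=PAGE_PX_WIDTH):
    """Scale a page (any unit) to a fixed pixel width, keeping its aspect ratio."""
    return px_width, round(px_width * page_height / page_width)

def pseudo_page_table(page_width, page_height):
    """Return an empty white table standing in for one document page."""
    w, h = pseudo_page_size(page_width, page_height)
    return ('<table width="%d" height="%d" style="background:white">'
            '<tr><td></td></tr></table>' % (w, h))

# An A4 page given in millimetres (210 x 297)
print(pseudo_page_size(210, 297))   # (900, 1273)
```

Because only the ratio matters, the page size can be passed in whatever unit the document uses.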

Some initial issues
Similar to the overall object-ordering problem in an SLA file, the frames on each page are numbered in the order you add them, so a list of a page's objects comes out in that order. This is at least a smaller problem than ordering objects from many different pages. My plan is to use the getPosition data to sort the objects; after that, the trick is to pick a positioning method. Presently, I still haven't decided whether to use CSS or some other method to position objects semi-accurately in the table page space.
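Sorting by position could look like the sketch below, which orders items top to bottom, then left to right. It works on plain (name, x, y) tuples so it runs outside Scribus; the function name is hypothetical, and the commented line shows how the tuples might be gathered inside a Scribus script.

```python
def reading_order(items):
    """Sort (name, x, y) tuples top-to-bottom, then left-to-right."""
    return sorted(items, key=lambda item: (item[2], item[1]))

# Inside Scribus (not run here), one page's items could be collected with:
# items = [(name,) + scribus.getPosition(name) for name, kind, order in scribus.getPageItems()]

frames = [("Image4", 300.0, 500.0), ("Text1", 40.0, 120.0), ("Image1", 300.0, 120.0)]
print([name for name, x, y in reading_order(frames)])
# ['Text1', 'Image1', 'Image4']
```

A strict (y, x) sort treats frames whose tops differ by a fraction of a unit as different rows, so a real layout may need some tolerance when comparing y values.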

In contrast to an SLA file, the output from Scripter for text frames may not be XML-compliant. For example, the '&' character comes out as a literal '&'. The answer is to write it to the XML as &amp; instead. I suspect I will discover other characters that need similar treatment.
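Rather than testing characters one at a time, Python's standard library provides xml.sax.saxutils.escape, which replaces &, < and >, and accepts an extra mapping, for example for double quotes inside attributes. The sketch below swaps it in; it is an alternative to, not a description of, what the script currently does, and the two helper names are hypothetical.

```python
from xml.sax.saxutils import escape

def xml_text(s):
    """Escape a string for use as XML element content (&, < and >)."""
    return escape(s)  # '&' is replaced first, so the new entities are not re-escaped

def xml_attr(s):
    """Escape a string for use inside a double-quoted XML attribute."""
    return escape(s, {'"': "&quot;"})

print(xml_text("Fish & Chips <at> Buda-Pesth"))
# Fish &amp; Chips &lt;at&gt; Buda-Pesth
```

Since escape is also available in Python 2, calling it once on the filtered text would cover '<' and '>' as well as '&'.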

Another issue is tabs, which come out as a Ctrl-something-inscrutable character. So far, I have simply deleted these.
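Because it isn't clear exactly which control code the tab turns into, one broad way to delete such characters is to drop everything below ASCII 32 except line feed (10) and carriage return (13), so that paragraph breaks survive. A minimal sketch with a hypothetical helper name; note that a filter this broad would also remove the chr(27) and chr(28) characters the script currently converts, so those conversions would have to run first.

```python
# Keep line feed (10) and carriage return (13) so paragraph breaks survive;
# drop every other character below ASCII 32 (tabs and similar control codes).
KEEP = {10, 13}

def strip_controls(text):
    """Remove control characters other than CR and LF."""
    return "".join(ch for ch in text if ord(ch) >= 32 or ord(ch) in KEEP)

sample = "Munich\tVienna\x1dBuda-Pesth\rBistritz"
print(repr(strip_controls(sample)))   # 'MunichViennaBuda-Pesth\rBistritz'
```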

Next is the issue of carriage-return/newline characters, which come out as ASCII 13 or 10. Taking a cue from the SLA files, I instead create a para tag for these, which the XSL then turns into p tags.
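A sketch of that paragraph handling: split the frame text on CR or LF and wrap each piece in a para element. The para name follows the SLA convention mentioned above. Escaping with xml.sax.saxutils.escape, collapsing runs of CR/LF into a single break, and skipping empty or whitespace-only pieces are all design choices of this sketch, not necessarily what the script does.

```python
import re
from xml.sax.saxutils import escape

def to_paras(text):
    """Wrap each CR- or LF-separated paragraph of text in <para> tags."""
    pieces = re.split(r"[\r\n]+", text)   # CR+LF pairs and blank lines count as one break
    return "".join("<para>%s</para>" % escape(p) for p in pieces if p.strip())

print(to_paras("3 May. Bistritz.\rLeft Munich at 8:35 P.M.\r\n"))
# <para>3 May. Bistritz.</para><para>Left Munich at 8:35 P.M.</para>
```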

At this point, the script works, but it only handles text and image frames, and it does not use any style information yet. It does work with both Scribus 1.4.3 and 1.5.0svn.

scribus.xsl
This is not the same stylesheet as the one used to convert the modified SLA/XML files, so watch the naming, both here and in the script above, and change it as needed. Note that the background of the webpage is grey, while the background of the document "pages" is white.

    This document was created using <xsl:value-of select="version" />
    <xsl:for-each select="page">
    </xsl:if>
    <xsl:if test="@type='image'">
    </xsl:if>
    </xsl:for-each>
    </xsl:for-each>
    </xsl:for-each>
    </xsl:template>
    </xsl:stylesheet>