Extracting All Text from a Document

This script extracts the text from every text frame in a Scribus document. It also lists the file pathname of the image in each image frame.

When you start the script, it asks for the name of the text file to create. Once you have entered it, the script scans your document and writes the file.

The original version relied on text frame names starting with 'Text', which is the default, but this revision tests the frame type instead. Frames other than text and image frames are ignored.
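The frame-type test can be tried outside Scribus. The following is only a sketch: it assumes each page item is a (name, type, order) tuple of the kind scribus.getPageItems() returns, and that type 4 marks a text frame and type 2 an image frame, which are the codes the script on this page checks for. The helper split_by_frame_type and the sample item names are made up for illustration.

```python
# Type codes as used by the script on this page (taken from its item[1] checks).
IMAGE_FRAME = 2
TEXT_FRAME = 4

def split_by_frame_type(items):
    """Return (text_names, image_names) from (name, type, order) tuples."""
    text_names = [name for name, kind, _order in items if kind == TEXT_FRAME]
    image_names = [name for name, kind, _order in items if kind == IMAGE_FRAME]
    return text_names, image_names

# Hypothetical page contents; inside Scribus this list would come from
# scribus.getPageItems().
sample_items = [
    ("Text1", 4, 0),
    ("Headline", 4, 1),   # renamed text frame: still found, because type is tested
    ("Image3", 2, 2),
    ("Shape4", 6, 3),     # some other item type: ignored
]

text_names, image_names = split_by_frame_type(sample_items)
print(text_names)
print(image_names)
```

Because the test is on the type code rather than the name, a text frame renamed to something like "Headline" is still picked up.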

Here is the script:
#!/usr/bin/env python
# File: extract_text.py - Extracts the text from a document, saving to a text file
# also lists image files with pathnames
# 2006.03.04 Gregory Pittman
# 2008.02.28 Petr Vanek - fileDialog replaces valueDialog
# this version 2008.02.28
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

import scribus

def exportText(textfile):
    page = 1
    pagenum = scribus.pageCount()
    T = []          # lines to write to the output file
    content = []    # text already seen, used to spot duplicates
    while (page <= pagenum):
        scribus.gotoPage(page)
        d = scribus.getPageItems()
        strpage = str(page)
        T.append('Page ' + strpage + '\n\n')
        for item in d:
            if (item[1] == 4):      # text frame
                contents = scribus.getAllText(item[0])
                if (contents in content):
                    contents = 'Duplication, perhaps linked-to frame'
                T.append(item[0] + ': ' + contents + '\n\n')
                content.append(contents)
            elif (item[1] == 2):    # image frame
                imgname = scribus.getImageFile(item[0])
                T.append(item[0] + ': ' + imgname + '\n')
        page += 1
        T.append('\n')
    output_file = open(textfile, 'w')
    output_file.writelines(T)
    output_file.close()
    endmessage = textfile + ' was created'
    scribus.messageBox("Finished", endmessage, icon=0, button1=1)

if scribus.haveDoc():
    textfile = scribus.fileDialog('Enter name of file to save to',
                                  filter='Text Files (*.txt);;All Files (*)')
    try:
        if textfile == '':
            raise Exception
        exportText(textfile)
    except Exception as e:
        print(e)
else:
    scribus.messageBox('Export Error', 'You need a Document open, and a frame selected.',
                       icon=0, button1=1)
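To see what the output file looks like without opening Scribus, the report-building loop can be imitated on plain data. This is a sketch under stated assumptions, not the script itself: build_report is a hypothetical helper, and the third element of each tuple stands in for what getAllText or getImageFile would return for that frame. The sample text and image path are invented.

```python
def build_report(pages):
    """Build the output lines from a list of pages.

    Each page is a list of (name, type, payload) tuples, where payload
    stands in for the frame's text (type 4) or image pathname (type 2).
    """
    lines = []
    seen = []   # text already reported, used to flag repeats
    for number, items in enumerate(pages, start=1):
        lines.append('Page ' + str(number) + '\n\n')
        for name, kind, payload in items:
            if kind == 4:       # text frame
                if payload in seen:
                    payload = 'Duplication, perhaps linked-to frame'
                lines.append(name + ': ' + payload + '\n\n')
                seen.append(payload)
            elif kind == 2:     # image frame
                lines.append(name + ': ' + payload + '\n')
        lines.append('\n')
    return lines

# Hypothetical two-page document in which Text3 repeats the text of Text1.
pages = [
    [("Text1", 4, "Hello world"), ("Image2", 2, "/path/to/logo.png")],
    [("Text3", 4, "Hello world")],
]

report = build_report(pages)
print(''.join(report))
```

The second frame that reports the same text is written with the placeholder instead of a repeat of the text, which keeps the output file short when several frames return identical content.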

Script Variant
For those who would rather not get to grips with Python and modify the script themselves, here is a version that looks only for image frames and writes their pathnames to a file:


#!/usr/bin/env python
# File: extract_imageframes.py - Extracts image frame pathnames from a document, saving to a text file
# 2010.06.15 minor mod to eliminate checks for text frames, only image frames
# 2006.03.04 Gregory Pittman
# 2008.02.28 Petr Vanek - fileDialog replaces valueDialog
# this version 2008.02.28
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

import scribus

def exportText(textfile):
    page = 1
    pagenum = scribus.pageCount()
    T = []          # lines to write to the output file
    content = []
    while (page <= pagenum):
        scribus.gotoPage(page)
        d = scribus.getPageItems()
        strpage = str(page)
        T.append('Page ' + strpage + '\n\n')
        for item in d:
            if (item[1] == 2):      # image frame
                imgname = scribus.getImageFile(item[0])
                T.append(item[0] + ': ' + imgname + '\n')
        page += 1
        T.append('\n')
    output_file = open(textfile, 'w')
    output_file.writelines(T)
    output_file.close()
    endmessage = textfile + ' was created'
    scribus.messageBox("Finished", endmessage, icon=0, button1=1)

if scribus.haveDoc():
    textfile = scribus.fileDialog('Enter name of file to save to',
                                  filter='Text Files (*.txt);;All Files (*)')
    try:
        if textfile == '':
            raise Exception
        exportText(textfile)
    except Exception as e:
        print(e)
else:
    scribus.messageBox('Export Error', 'You need a Document open, and a frame selected.',
                       icon=0, button1=1)