Extracting All Text from a Document

This script, run from within Scribus, extracts the text from every text frame in the current document and saves it to a text file. It also lists the pathname of the image in each image frame.

When you start the script, it asks for a name for the text file to create; ".txt" is appended to whatever you enter. As soon as you enter the name, the script scans your document and creates the file.
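The filename step can be sketched outside Scribus as a small helper. This is only an illustration: `build_output_name` is a hypothetical function, not part of the script or of the Scribus API, and it assumes that an empty or missing answer from the dialog should be treated as "nothing to save".

```python
def build_output_name(raw_name):
    """Return the file name to write, or None if nothing usable was entered.

    Hypothetical helper: the script itself simply appends '.txt'
    to whatever the dialog returns.
    """
    name = (raw_name or '').strip()
    if not name:
        # Assumption: an empty answer means the user gave no name.
        return None
    return name + '.txt'


print(build_output_name('mydoc'))
print(build_output_name('   '))
```

Inside Scribus, the value passed in would be the string returned by `scribus.valueDialog`, and a `None` result could be used to skip the scan altogether.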

The original version relied on text frame names starting with 'Text', which is the default, but this revision tests the frame type instead. Frames other than text and image frames are ignored.
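The type test can be tried without Scribus on hand-made data. The sketch below assumes that each page item is a tuple whose first element is the frame name and whose second is a numeric type code, with 4 for text frames and 2 for image frames, as the script uses them. `split_frames` and the sample list are made up for illustration.

```python
TEXT_FRAME = 4   # type code the script tests for text frames
IMAGE_FRAME = 2  # type code the script tests for image frames


def split_frames(page_items):
    """Sort item names into text frames and image frames by type code."""
    text_names = []
    image_names = []
    for item in page_items:
        name, item_type = item[0], item[1]
        if item_type == TEXT_FRAME:
            text_names.append(name)
        elif item_type == IMAGE_FRAME:
            image_names.append(name)
        # Any other type code is ignored.
    return text_names, image_names


# Made-up sample in the assumed (name, type, order) shape.
sample = [('Text1', 4, 0), ('Image2', 2, 1), ('Shape3', 6, 2), ('Intro', 4, 3)]
text_names, image_names = split_frames(sample)
print(text_names)
print(image_names)
```

Note that the frame named 'Intro' is picked up even though its name does not start with 'Text', which is the point of testing the type rather than the name.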

Here is the script:
#!/usr/bin/env python
# File: extract_text.py - Extracts the text from a document, saving to a text file
# also lists image files with pathnames
# 2006.03.04  Gregory Pittman
# this version 2007.10.27
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

import scribus

if scribus.haveDoc():
    textfile = scribus.valueDialog('Filename','Enter name of file to save to\n".txt" will be appended')
    textfile = textfile + '.txt'

    page = 1
    pagenum = scribus.pageCount()
    T = []          # lines to write to the output file
    content = []    # text seen so far, used to spot duplicates
    while (page <= pagenum):
        scribus.gotoPage(page)
        d = scribus.getPageItems()
        strpage = str(page)
        T.append('Page '+ strpage + '\n\n')
        for item in d:
            if (item[1] == 4):      # text frame
                contents = scribus.getAllText(item[0])
                if (contents in content):
                    contents = 'Duplication, perhaps linked-to frame'
                T.append(item[0]+': '+ contents + '\n\n')
                content.append(contents)
            elif (item[1] == 2):    # image frame
                imgname = scribus.getImageFile(item[0])
                T.append(item[0]+': ' + imgname + '\n')
        page += 1
        T.append('\n')
    output_file = open(textfile,'w')
    output_file.writelines(T)
    output_file.close()
    endmessage = textfile + ' was created'
    scribus.messageBox("Finished",endmessage,icon=0,button1=1)
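The duplicate check in the script is there because a chain of linked text frames tends to report the same text for every frame in the chain. The logic can be sketched on its own, without Scribus, as below. `mark_duplicates` and the sample frames are hypothetical; the placeholder string and the output line format are copied from the script.

```python
DUPLICATE_NOTE = 'Duplication, perhaps linked-to frame'


def mark_duplicates(frames):
    """Build output lines, replacing text already seen with a short note.

    `frames` is a list of (frame_name, text) pairs in document order.
    """
    seen = []
    lines = []
    for name, text in frames:
        if text in seen:
            text = DUPLICATE_NOTE
        lines.append(name + ': ' + text + '\n\n')
        seen.append(text)
    return lines


# Made-up sample: Text2 repeats the text of Text1.
frames = [('Text1', 'Hello'), ('Text2', 'Hello'), ('Text3', 'World')]
lines = mark_duplicates(frames)
print(''.join(lines))
```

As in the script, the comparison is on the whole text of a frame, so two unrelated frames with identical text (two empty frames, for example) are also reported as duplicates.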