Extracting All Text from a Document

This script, run from within Scribus, extracts the text from every text frame in the current document. It also lists the file path of the image in each image frame.

When you start the script, it asks for a name for the text file to create; the extension ".txt" is appended automatically. As soon as you enter the name, the script scans your document and creates the file.

The original version relied on text frame names starting with 'Text', which is the default, but this revision tests the frame type instead. Frames other than text and image frames are ignored.

Here is the script:
#!/usr/bin/env python
# File: frameslist.py
# 2006.03.04  Gregory Pittman
# this version 2007.10.27

import scribus

if scribus.haveDoc():
    textfile = scribus.valueDialog('Filename','Enter name of file to save to\n".txt" will be appended')
    textfile = textfile + '.txt'

    page = 1
    pagenum = scribus.pageCount()
    T = []          # lines to write to the output file
    content = []    # text already seen, used to spot duplicates
    while (page <= pagenum):
        scribus.gotoPage(page)
        d = scribus.getPageItems()
        strpage = str(page)
        T.append('Page '+ strpage + '\n\n')
        for item in d:
            if (item[1] == 4):      # item type 4 is a text frame
                contents = scribus.getAllText(item[0])
                if (contents in content):
                    contents = 'Duplication, perhaps linked-to frame'
                T.append(item[0]+': '+ contents + '\n\n')
                content.append(contents)
            elif (item[1] == 2):    # item type 2 is an image frame
                imgname = scribus.getImageFile(item[0])
                T.append(item[0]+': ' + imgname + '\n')
        page += 1
        T.append('\n')
    output_file = open(textfile,'w')
    output_file.writelines(T)
    output_file.close()
    endmessage = textfile + ' was created'
    scribus.messageBox("Finished",endmessage,icon=0,button1=1)
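The frame-type test and the duplicate check can also be tried outside Scribus. The following is only a minimal sketch: describe_items, its get_text and get_image parameters, and the sample data are hypothetical stand-ins introduced for illustration (they are not part of the Scribus API), and the item tuples merely mimic what the script reads, with the name first and the type code second.

```python
# Item type codes as used in the script: 2 = image frame, 4 = text frame.
IMAGE_FRAME = 2
TEXT_FRAME = 4


def describe_items(items, get_text, get_image, seen):
    """Build the report lines for one page.

    items     -- tuples with the frame name first and the type code second
    get_text  -- hypothetical stand-in for scribus.getAllText
    get_image -- hypothetical stand-in for scribus.getImageFile
    seen      -- list of text already reported, shared across pages
    """
    lines = []
    for item in items:
        name, item_type = item[0], item[1]
        if item_type == TEXT_FRAME:
            text = get_text(name)
            if text in seen:
                # Same text as an earlier frame: flag it instead of repeating it.
                text = 'Duplication, perhaps linked-to frame'
            lines.append(name + ': ' + text + '\n\n')
            seen.append(text)
        elif item_type == IMAGE_FRAME:
            lines.append(name + ': ' + get_image(name) + '\n')
        # Any other frame type is ignored.
    return lines


# Made-up sample data standing in for one page of a document.
sample_items = [('Text1', 4, 0), ('Image2', 2, 1), ('Polygon3', 6, 2), ('Text4', 4, 3)]
sample_text = {'Text1': 'Hello', 'Text4': 'Hello'}
sample_images = {'Image2': '/home/user/logo.png'}

report = describe_items(sample_items, sample_text.get, sample_images.get, [])
print(''.join(report))
```

Passing the lookup functions in as parameters keeps the sketch independent of the scribus module, so the branching can be checked with ordinary Python before running the real script inside Scribus.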