I am in the process of writing a proxy web site that would mirror gutenberg.org, but would add an option to download each book in LRF format (the e-book format used by Sony's Librie and Reader devices), which I would construct on the fly from the text files.
Today I finished the proxy part and started experimenting with the converter. Luckily, someone has already written a program to convert text into LRF, which is available here: http://www.sven.de/librie_files/makelrf3.zip.
So all I need to do is preprocess the text file to remove the unnecessary line breaks. Gutenberg text files are hard-wrapped, and without this step the converted book ends up with a jagged appearance on the smaller screen, because lines get broken both at the edge of the screen and at every hard line break in the file.
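To experiment with the joining rule on its own, here it is pulled out into a standalone function. This is a minimal sketch (the name `unwrap_paragraphs` is my own; it should run under both Python 2 and 3) that works on a list of lines instead of writing to a temp file. It follows the same heuristic as the script below: a blank line ends a paragraph, and a line shorter than 50 characters ends its output line, since it is probably poetry, a heading, or the last line of a paragraph.

```python
def unwrap_paragraphs(lines, short_line=50):
    """Join hard-wrapped lines into one output line per paragraph.

    A blank input line ends the current paragraph and is kept as a blank
    line. A line shorter than `short_line` characters also ends the
    current output line, so short lines (e.g. poetry) stay separate.
    """
    out = []
    current = []  # pieces of the paragraph being built
    for raw in lines:
        line = raw.rstrip('\r\n')  # accept both CRLF and LF endings
        if not line:
            # Blank line: finish any paragraph in progress, then keep the gap
            if current:
                out.append(' '.join(current))
                current = []
            out.append('')
            continue
        current.append(line)
        if len(line) < short_line:
            # Short line: treat it as the end of an output line
            out.append(' '.join(current))
            current = []
    if current:
        out.append(' '.join(current))
    return out
```

For example, two wrapped lines of prose followed by a blank line and two short lines come out as one joined paragraph, a blank line, and the two short lines unchanged.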
The simplest way to experiment is to just write a script, and the simplest way to write a script is in Python. So here it is; you can copy it from here. It assumes that all the makelrf files are in c:\bin, and it writes the books into c:\books.
It takes the book either as a number (140) or as a full URL to the text file (http://www.gutenberg.org/files/140/140.txt).
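If you want to be a bit stricter about that argument than the script is, the mapping can live in its own function. This is a hypothetical sketch (`book_url` is my name for it, not part of the script): it passes http or https URLs through unchanged, builds the same files/<n>/<n>.txt URL the script uses for a plain number, and rejects anything else instead of requesting a URL that cannot exist.

```python
def book_url(arg):
    """Map a Gutenberg book number or a full URL to the text file's URL.

    Hypothetical stricter variant of the script's argument handling.
    """
    if arg.startswith(('http://', 'https://')):
        return arg  # already a URL, use it as-is
    if not arg.isdigit():
        raise ValueError('expected a book number or a URL: %r' % arg)
    # Same files/<n>/<n>.txt layout that the script assumes
    return 'http://www.gutenberg.org/files/%s/%s.txt' % (arg, arg)
```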
```python
import os
import sys
import urllib
import tempfile

DESCR_TEXT = 'The Project Gutenberg EBook of '
AUTHOR_TEXT = 'Author: '
AUTHOR_LEN = len(AUTHOR_TEXT)
TITLE_TEXT = 'Title: '
TITLE_LEN = len(TITLE_TEXT)

def main(argv):
    if len(argv) != 2:
        print 'Usage: gutenbergtolrf.py number or URL'
        # Double backslash: a plain '\b' would be a backspace character
        print 'The output goes into c:\\books'
        return

    url = argv[1]
    if not url.startswith('http://'):
        url = ('http://www.gutenberg.org/files/%s/%s.txt'
               % (argv[1], argv[1]))

    # Binary mode, so the '\r\n' endings written below reach the file
    # unchanged (text mode on Windows turns each '\n' into '\r\n',
    # which would produce '\r\r\n')
    (fd, temp_file_name) = tempfile.mkstemp(
        suffix = '.txt', text = False)
    url_file = urllib.urlopen(url)

    title = None
    author = None
    description = None

    # Read URL, and convert everything into
    # single-line paragraphs. Also parse out
    # title, author and description
    first_line = True
    for l in url_file:
        # Drop the line ending, whether it is '\r\n' or a bare '\n'
        l = l.rstrip('\r\n')
        if l:
            if not description and l.startswith(DESCR_TEXT):
                description = l
            if not author and l.startswith(AUTHOR_TEXT):
                author = l[AUTHOR_LEN:]
            if not title and l.startswith(TITLE_TEXT):
                title = l[TITLE_LEN:]
            if first_line:
                first_line = False
            else:
                os.write(fd, ' ')
            os.write(fd, l)
            # This could be a poetry stanza,
            # treat short lines differently
            if len(l) < 50:
                os.write(fd, '\r\n')
                first_line = True
        else:
            # Blank line: end the current paragraph (if any)
            # and keep an empty line between paragraphs
            if first_line:
                os.write(fd, '\r\n')
            else:
                os.write(fd, '\r\n\r\n')
                first_line = True
    os.close(fd)

    if not (author and title and description):
        print 'Could not parse the file!'
        os.remove(temp_file_name)
        return

    os.chdir('c:\\bin')
    target = 'c:\\books\\%s.lrf' % title
    if os.path.exists(target):
        os.remove(target)

    print temp_file_name
    print 'Title: ' + title
    print 'Author: ' + author
    print 'Description: ' + description
    print 'Converting to LRF: ' + target
    # On Windows, spawnv does not quote its arguments, so quote
    # everything that may contain spaces (the temp directory can, too)
    os.spawnv(os.P_WAIT, 'c:\\bin\\makelrf.exe',
              ['c:\\bin\\makelrf.exe',
               '-d', '"%s"' % description,
               '-a', '"%s"' % author,
               '-t', '"%s"' % title,
               '-o', '"%s"' % target,
               '"%s"' % temp_file_name])
    os.remove(temp_file_name)

if __name__ == '__main__':
    main(sys.argv)
```
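One thing to watch out for: the title goes straight into the output path, and some Gutenberg titles contain characters that Windows does not accept in a file name (a question mark, for example). A hypothetical `safe_filename` helper like this one, a sketch rather than part of the script above, could be applied to the title before building `target`:

```python
import re

# Characters Windows does not allow in file names
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')

def safe_filename(title, replacement='_'):
    """Make a book title usable as a Windows file name (hypothetical helper)."""
    name = _INVALID_CHARS.sub(replacement, title)
    # Windows also drops trailing spaces and dots from file names
    name = name.rstrip(' .')
    return name or 'untitled'
```

In the script, that would mean `target = 'c:\\books\\%s.lrf' % safe_filename(title)`, while the unmodified title is still passed to makelrf with `-t`.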
2 comments:
Hi Sergey,
I invite you to join our Sony Reader community where you can find plenty of additional information regarding how to convert content to LRF.
Keep up the great work!
Cheers,
Alex
You should consider using libprs500 for LRF generation; it's much, much more powerful than makelrf. It's in Python to boot :)
https://libprs500.kovidgoyal.net/