Does anybody here have experience or suggestions related to creating eBooks?
I have a few books I wish to convert from old paperbacks to eBooks suitable for reading on a Kindle.
Here's how I have been doing it:
1) I take my jigsaw and cut the binding off the paperback.
2) I run the loose pages through my scanner: 30 sheets per minute (double-sided!) at 300 dpi.
3) I save the scanned images into a single PDF file.
4) I run the PDF file through Adobe Acrobat's OCR module.
5) I save the OCR'd PDF file as an MS-Word editable RTF file.
6) I use MS-Word to strip out headers and footers, i.e., page numbers and Author/Title on each page using global search and replace, and then format chapter headings to my liking by putting in forced page breaks and bold type.
7) I save the file, and import it into Calibre where I can convert it into *.mobi format for the Kindle.
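For what it's worth, the header and footer cleanup in step 6 could probably be scripted instead of done by hand in MS-Word. Here is a minimal sketch in Python, assuming the OCR'd text has been exported as plain text and that page numbers and the Author/Title running heads each sit alone on their own line. The function name `strip_running_lines` and its parameters are made up for illustration; they are not part of Acrobat, Word, or Calibre.

```python
import re


def strip_running_lines(text, running_heads):
    """Drop lines that hold only a page number or a known running head.

    Assumption: headers and footers occupy whole lines by themselves.
    A body line that is nothing but a short number would also be dropped.
    """
    heads = {head.strip().lower() for head in running_heads}
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # A line of one to four digits is treated as a page number.
        if re.fullmatch(r"\d{1,4}", stripped):
            continue
        # Case-insensitive match against the author/title strings.
        if stripped.lower() in heads:
            continue
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "JANE DOE\nIt was a dark night.\n12\nA BOOK TITLE\nThe cat slept."
    print(strip_running_lines(sample, ["Jane Doe", "A Book Title"]))
```

The author and title strings here are placeholders; real scans may mangle the running heads slightly, in which case an exact match like this would miss them.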
While this seems very complicated and involved, the whole process, including the "carpentry", takes little more than an hour.
However… (you knew there'd be a 'however', didn't you?)
The OCR and RTF export steps are less than robust. In particular, Adobe's OCR does an inadequate job of dealing with ligatures ("fi" may be read as "h", for example) and does not do well parsing dialog. For instance:
"If you set the cat on fire, he won't like it."
"Well, I don't have any matches so I can't do it."
will very likely come out as:
"If you set the cat on fire, he won't like it." "Well, I don't have any matches so I can't do it."
I can fix these with a global search for quote-space-quote, replaced with quote-carriagereturn-tab-quote, but many other deficiencies are harder to deal with. Thoroughly proofreading and correcting the scanning + OCR errors (not to mention the errors in the original document!) can take six to ten hours.
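The quote-space-quote fix can also be written as a regular expression, sketched here in Python. It assumes the text is available as a plain string, and the function name `split_run_together_dialog` is just an illustrative choice. To cut down on false matches, this version only splits where the closing quote follows sentence punctuation, which is a slightly stricter rule than the plain search and replace described above.

```python
import re

# Closing punctuation plus a closing quote, one or more spaces, then an
# opening quote. Straight and curly double quotes are both accepted.
RUN_TOGETHER = re.compile(r'([.!?,]["\u201d]) +(["\u201c])')


def split_run_together_dialog(text):
    """Put each run-together line of dialog on its own tab-indented line."""
    # \1 and \2 restore the captured quotes; \n\t is the line break and indent.
    return RUN_TOGETHER.sub(r"\1\n\t\2", text)


if __name__ == "__main__":
    line = ('"If you set the cat on fire, he won\'t like it." '
            '"Well, I don\'t have any matches so I can\'t do it."')
    print(split_run_together_dialog(line))
```

This will still split two quoted passages spoken by the same person in one paragraph, so the result needs a quick look afterwards.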
I only have a few books I wish to convert, and don't wish to spend money on a powerful but hard-to-learn dedicated bookmaking program. Instead, I am looking for suggestions that might make what I am doing more efficient and transparent.
Ideas?
tanstaafl.