OCR with ABBYY | Off Topic | unofficial empeg BBS


#357766 - 04/03/2013 18:30 OCR with ABBYY
tanstaafl. carpal tunnel Registered: 08/07/1999 Posts: 5552 Loc: Ajijic, Mexico

It has been more than 20 years since the last time I played seriously with OCR, and back then neither the hardware nor the software was up to the task.

Back then, OCR was done letter by letter. "Let's see... that's a circle with a bar sticking out of it. Is the bar on the left or the right? On the right, hmm, I'll bet it's a lower case d. Now let's try the next letter. It looks kinda like another circle. Maybe a lower case o. Or an e. Or a c. One chance in three. Go for it."

Now OCR does it word by word, with built-in dictionaries for hundreds of different languages. And it's f a s t! Three or four seconds a page.

I just spent $30 on the ABBYY program for OCR (half-price sale) and compared to my previous experiences the results are staggering. On clean text, it is nearly 100% perfect. On dirty text... well, see for yourself.

The sample below is a screenshot from a PDF file that was fed to ABBYY. (As bad as it is, it is still better than the original paper document which I massaged intensely with my Fujitsu scanner!) I ran the PDF by ABBYY as a joke, just to see if it could parse out any of the text, and it ran at least 99% accurate.

[OK, in the interests of accuracy, there were about 30 OCR errors out of 2,603 characters, so it was only 98.8% accurate.]

Color me impressed,

tanstaafl.

Attachments
_________________________
"There Ain't No Such Thing As A Free Lunch"
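For anyone who wants to check the accuracy arithmetic in the post, here is a minimal sketch in Python. The function name `char_accuracy` is made up for illustration and has nothing to do with ABBYY; it simply treats accuracy as the share of characters that were not OCR errors, which is an assumption about how the 98.8% figure was worked out.

```python
def char_accuracy(errors: int, total_chars: int) -> float:
    """Return character-level accuracy as a percentage.

    Assumes accuracy = (total - errors) / total, i.e. every OCR error
    counts against exactly one character of the page.
    """
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * (total_chars - errors) / total_chars


if __name__ == "__main__":
    # Figures quoted in the post: about 30 errors in 2,603 characters.
    print(f"{char_accuracy(30, 2603):.1f}%")
```

With the figures from the post this comes out just under 99%, which matches the correction in the brackets.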