Getting text from scans?

sponge008

Senior member
Jan 28, 2005
325
0
0
So, I have a ~1000-page scanned PDF document which consists mainly of text, but some pictures as well. Is there any way to extract the text, and ideally the pictures, to HTML/.doc/.txt in a reasonable amount of time? I tried Able2Extract, but it froze even if I fed it as little as a paragraph. I have an Athlon 64 3200+ with 2 gigs of RAM, so that's (hopefully) not the problem.

Edit:
I figured it out: Acrobat can do it, Foxit can't.
 

xtknight

Elite Member
Oct 15, 2004
12,974
0
71
That would be called Optical Character Recognition (OCR). There are several programs that can do that, but I'm not sure if any are free. I think something that comes with the MS Office tools can do it.

Edit: never mind...I misunderstood you. The content of the PDF is actually text? (you can select the characters in Acrobat). Just looking for a batch PDF->TXT converter?
 

WildHorse

Diamond Member
Jun 29, 2003
5,006
0
0
I've got an app named ABBYY FineReader that does that.

It came bundled with a Lexmark printer, and it's the only such app I have any experience with, so I don't know how it compares against others.

It's not free, but you could test it out with their free trial.

ABBYY FineReader


It will scan and convert to text, and if you already have saved image files like your .pdf, it will also open the saved file, read it, and convert it to text. It saves the text to MS Word or to a text file.
 

shortylickens

No Lifer
Jul 15, 2003
80,287
17,082
136
Yup, most good OCR programs will also take an already scanned document (pdf or mdi) and do an image-to-text conversion on it.

I think OmniPage is great. You get a cheap version with many scanners, but the full one works really well. I have this fantasy of scanning all my Rifts books and organizing them into something that makes sense and doesn't repeat itself.

I am of the opinion that the scanning software included with MS Office was thrown in just to make it look complete. It's not good at all.