Is there software that can build a database by reading info from PDF files?

Mizugori

Senior member
Is there software that can read through a number of PDF files, look for specific pieces of text (such as "Policy Number: 123456789"), and use that information to create entries in a database?

I'm trying to find a simpler way to build a database from thousands of PDF documents made from scanned client files. Typing them all into Access would be quite a project, and error-prone.

Help? Please? Thanks!
 
You say scanned. Are they OCRed, or just images inside a PDF? If they're not already OCRed, you're going to have problems. If they are, and there's text data inside the PDF, just use a search engine that can read PDFs.
 
No software that I know of, but this is where Windows Scripting (http://msdn2.microsoft.com/en-us/library/ms950396.aspx) can come in handy.

I would use a tool like http://www.pdfpdf.com/pdfconverter.html to extract the text, then write a script that loops through the text of each document and inserts the pieces you need into a database. It's not for the faint of heart, but scripting it would be a great learning experience for you.

Here's how to open a file in script. The site above has tons of code samples to help you.

Const ForReading = 1, ForWriting = 2, ForAppending = 8

Dim fso, f, firstline

Set fso = CreateObject("Scripting.FileSystemObject")

' Open an existing file for reading; False means do not create it if it is missing
Set f = fso.OpenTextFile("c:\testfile.txt", ForReading, False)

firstline = f.ReadLine() 'read the first line

f.Close
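
The loop described above might look roughly like this. It is only a sketch. The folder c:\extracted, the database c:\clients.mdb, and the Policies table with its PolicyNumber and SourceFile columns are made-up names. It also assumes the PDFs have already been converted to .txt files and that the Jet OLE DB provider is available (on 64-bit Windows that generally means running it with the 32-bit cscript.exe).

```vbscript
Option Explicit

Const ForReading = 1
' Hypothetical paths; change them to match your setup.
Const TEXT_FOLDER = "c:\extracted"   ' .txt files produced by the PDF-to-text step
Const DB_PATH = "c:\clients.mdb"     ' existing Access database containing a Policies table

Dim fso, re, conn, file, stream, text, matches, policyNumber

Set fso = CreateObject("Scripting.FileSystemObject")

' Pattern for the example in the question: "Policy Number: 123456789"
Set re = New RegExp
re.Pattern = "Policy Number:\s*(\d+)"
re.IgnoreCase = True

Set conn = CreateObject("ADODB.Connection")
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & DB_PATH

For Each file In fso.GetFolder(TEXT_FOLDER).Files
    ' Skip non-text files and empty files (ReadAll fails on an empty file)
    If LCase(fso.GetExtensionName(file.Name)) = "txt" And file.Size > 0 Then
        Set stream = fso.OpenTextFile(file.Path, ForReading)
        text = stream.ReadAll
        stream.Close

        Set matches = re.Execute(text)
        If matches.Count > 0 Then
            ' The captured group is digits only, so it is safe to embed in the SQL
            policyNumber = matches(0).SubMatches(0)
            conn.Execute "INSERT INTO Policies (PolicyNumber, SourceFile) VALUES ('" & _
                policyNumber & "', '" & Replace(file.Name, "'", "''") & "')"
        Else
            WScript.Echo "No policy number found in " & file.Name
        End If
    End If
Next

conn.Close
```

Save it as something like extract.vbs and run it with cscript extract.vbs. Files where the pattern is not found are only reported, so you can check those by hand; OCR mistakes in the number itself will still need spot checking.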
 
boberfett, what do you mean by "use a search engine"? Can you be more descriptive? Assuming they are OCRed, how could I use, say, Google to do this? And how would the information get entered into the database?
 