Converting a medium-sized book collection to PDF

Chaotic42

Aside from Aquaman, who I don't think posts here anymore, I don't know of anyone who has tried to convert a medium-sized book collection into PDFs by cutting and scanning the books. I have approximately fourteen 28-inch shelves of books that I estimate contain around 215,000 pages of content.

1DollarScan will scan the books destructively at 600 dpi for $2 per 100 pages, so I'd pay about $4300 for the service. Using a rough figure (http://www.lugaru.com/bookweight.html) of 1500 grams per 1000 pages, the books weigh about 322.5 kg. I'm figuring around $1000 to ship them to California, where 1DollarScan is.

So we're looking at $5300 or so to get this done.
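The estimate above is simple arithmetic; here is a small Python sketch that reproduces it, so it is easy to redo with a different page count or rate. The rate, weight figure and shipping guess are the ones quoted in the post, not current prices.

```python
# Figures quoted in the post; the rate and shipping cost are estimates, not current prices
PAGES = 215_000
PRICE_PER_100_PAGES = 2.0    # USD, 1DollarScan destructive scanning at 600 dpi
GRAMS_PER_1000_PAGES = 1500  # rough figure from the lugaru.com book-weight page
SHIPPING_USD = 1000.0        # the poster's guess for shipping the books to California


def scan_cost(pages: int) -> float:
    """Scanning cost in USD at a flat rate per 100 pages."""
    return pages / 100 * PRICE_PER_100_PAGES


def weight_kg(pages: int) -> float:
    """Approximate weight of the books in kilograms."""
    return pages / 1000 * GRAMS_PER_1000_PAGES / 1000


if __name__ == "__main__":
    scan = scan_cost(PAGES)
    print(f"scanning: ${scan:,.0f}")
    print(f"weight:   {weight_kg(PAGES):.1f} kg")
    print(f"total:    ${scan + SHIPPING_USD:,.0f}")
```

Run as-is, it prints $4,300 for scanning, 322.5 kg, and a $5,300 total, matching the figures in the post.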

Has anyone ever tried doing something like this at home? Obviously, $5300 would buy me a great scanner and cutting tool, but doing it myself would be really time-consuming, plus I'd have a lot of waste to deal with. I know this is kind of crazy, but is it *crazy*, or just something interesting that no one really wants to try at home? :p
 

lxskllr

If it were me, I'd look for the books on some site or another, download those, then donate the hard copies to a library.
 

Chaotic42

That would be ideal, but I have a lot of older, cheap mathematics books that aren't available in any digital format. I could go through and retype them all in LaTeX, but at maybe 5 minutes per page, that's two years solid. :p
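A quick check of the retyping estimate, assuming the 215,000-page count from the first post. "Two years solid" means typing around the clock; the 40-hour, 52-week working year used for comparison is an assumption, not something from the thread.

```python
PAGES = 215_000                # page estimate from the first post
MINUTES_PER_PAGE = 5           # the poster's guess for retyping one page in LaTeX
WORK_HOURS_PER_YEAR = 40 * 52  # assumption: 40-hour weeks, 52 weeks a year


def retyping_hours(pages: int, minutes_per_page: float = MINUTES_PER_PAGE) -> float:
    """Total hours needed to retype the given number of pages."""
    return pages * minutes_per_page / 60


if __name__ == "__main__":
    hours = retyping_hours(PAGES)
    print(f"{hours:,.0f} hours")
    print(f"{hours / (24 * 365):.1f} years typing around the clock")
    print(f"{hours / WORK_HOURS_PER_YEAR:.1f} years of 40-hour weeks")
```

That comes to about 17,917 hours: roughly 2.0 years nonstop, which matches the post, or about 8.6 years at a full-time pace.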
 

lxskllr

You could make a copying jig, and take a photo of each page with a camera. Having a jig would take out some of the tedium, but it would still suck.

I started a project at work taking pictures of the job book. I was doing it 100% manually and intended to make a PDF out of the results. The results sucked, so I manually entered everything into a spreadsheet instead. Between that and importing/cleaning up some previous efforts, I got it done, but it was a long, boring job.

All this is to say, there's no good solution, imo. I'd try to find electronic copies of the useful info, if not the exact books, get rid of the stuff that isn't unique, make PDFs of the semi-good books, and keep hard copies of the very good books.
 

IronWing

With PDF you have a couple of options.

1) You can scan each page as an image and keep it as an image. This method maintains accuracy, but nothing is searchable and file sizes are large.

2) You can scan each page as an image and use OCR to convert the images to text, either letting the copier/scanner do this or doing it in software later. This method produces searchable results but may cost you in accuracy. With math texts, I can't see going this route, since accuracy matters.
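For option 1, if the pages already exist as image files (from a scanner or a camera jig), a minimal sketch of bundling them into one image-only PDF with Pillow could look like this. It assumes Pillow is installed and that files carry page numbers in their names (e.g. `page2.jpg`, `page10.jpg`); the folder layout and function names are hypothetical. Names are sorted numerically so `page10` does not land before `page2`.

```python
import re
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def sorted_pages(names):
    """Sort file names by their embedded numbers, so page2 comes before page10."""
    def natural_key(name):
        # re.split with a capture group alternates text (even indices) and digit runs (odd indices)
        parts = re.split(r"(\d+)", name)
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
    return sorted(names, key=natural_key)


def images_to_pdf(image_dir: Path, out_pdf: Path, dpi: float = 600.0) -> int:
    """Bundle every page image in image_dir into one image-only PDF; returns the page count."""
    from PIL import Image  # assumption: Pillow is installed (imported here so sorting works without it)

    names = sorted_pages(p.name for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    if not names:
        raise ValueError(f"no page images found in {image_dir}")

    pages = []
    for name in names:
        with Image.open(image_dir / name) as im:
            # copy()/convert() load the pixels so the page survives the file being closed;
            # keep 1-bit, grayscale and RGB as-is and convert any other mode to RGB
            pages.append(im.copy() if im.mode in ("1", "L", "RGB") else im.convert("RGB"))

    # resolution tells the PDF writer the scan DPI so pages get their real physical size
    pages[0].save(out_pdf, "PDF", resolution=dpi, save_all=True, append_images=pages[1:])
    return len(pages)


if __name__ == "__main__" and len(sys.argv) == 3:
    count = images_to_pdf(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"wrote {count} pages")
```

Pillow holds every page in memory before writing and re-encodes the images, which gets heavy for a whole book at 600 dpi; img2pdf is one alternative that embeds JPEG files without re-encoding them.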
 

tynopik

Or you can do what Open Library does and create searchable image PDFs: keep the page image but put a text layer underneath, so you can search but still see the actual page.
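One way to get that kind of searchable image PDF from already-scanned pages is OCRmyPDF, an open-source tool built on Tesseract that keeps each page image and adds an invisible OCR text layer. The sketch below only wraps its command line from Python; it assumes `ocrmypdf` is installed and on PATH, and the helper names are hypothetical. Because the page image stays visible, a garbled equation in the text layer only hurts searching, not reading.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng", deskew: bool = True) -> list[str]:
    """Build an ocrmypdf command that adds a hidden text layer under each page image."""
    cmd = ["ocrmypdf", "-l", language]
    if deskew:
        cmd.append("--deskew")  # straighten slightly rotated scans before OCR
    cmd += [str(src), str(dst)]
    return cmd


def make_searchable(src: Path, dst: Path, language: str = "eng") -> None:
    """Run ocrmypdf on src and write the searchable copy to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(build_ocr_command(src, dst, language), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    make_searchable(Path(sys.argv[1]), Path(sys.argv[2]))
```

`--deskew` rotates the stored page image slightly, so drop it if you want the scans left exactly as they are. If the input PDF already has a text layer, OCRmyPDF stops with an error unless told how to handle it (for example with `--skip-text`).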
 

Chaotic42

Yeah, definitely not going with standard OCR. I might be interested in writing more math-friendly OCR software, but I don't have anything like that on hand.
 

IronWing

Cool, I like the Open Library approach of a searchable image PDF. I wonder if Adobe Pro lets you do this?