Converting a medium-sized book collection to PDF

Chaotic42 · Apr 15, 2017

Aside from Aquaman, who I don't think posts here any more, I don't know of anyone who has tried to convert a medium-sized book collection into PDFs by cutting the books apart and scanning them. I have approximately fourteen 28-inch shelves of books, which I estimate to contain around 215,000 pages.

1DollarScan will scan the books at 600 dpi, destructively, for $2 per 100 pages, so I'd pay about $4300 for the service. A rough calculation (http://www.lugaru.com/bookweight.html) at 1500 grams per 1000 pages gives 322.5 kg. I'm figuring around $1000 to ship it all to California, where 1DollarScan is.

So we're looking at $5300 or so to get this done.
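
A quick sketch of that arithmetic, using only the figures quoted above. The function and variable names are made up for illustration, and the $1000 shipping figure is a rough guess rather than a carrier quote.

```python
# Figures taken from the post; names are illustrative only.
PAGES = 215_000
SCAN_PRICE_PER_100_PAGES = 2.00   # USD, the 1DollarScan rate quoted above
GRAMS_PER_1000_PAGES = 1500       # rough figure from the linked weight calculator
SHIPPING_ESTIMATE = 1000.00       # USD, rough guess, not a real quote


def scanning_cost(pages, price_per_100=SCAN_PRICE_PER_100_PAGES):
    """Cost in USD to scan `pages` pages at a per-100-page rate."""
    return pages / 100 * price_per_100


def shipping_weight_kg(pages, grams_per_1000=GRAMS_PER_1000_PAGES):
    """Approximate weight in kilograms of `pages` pages of books."""
    return pages / 1000 * grams_per_1000 / 1000


scan = scanning_cost(PAGES)
weight = shipping_weight_kg(PAGES)
total = scan + SHIPPING_ESTIMATE

print(f"Scanning: ${scan:,.0f}")
print(f"Weight:   {weight:.1f} kg")
print(f"Total:    ${total:,.0f}")
```

Changing PAGES or the shipping guess shows how sensitive the total is to either estimate.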

Has anyone ever tried doing something like this at home? Obviously, $5300 would buy me a great scanner and cutting tool, but it would be really time-consuming, plus I'd have a lot of waste paper to deal with. I know this is kind of crazy, but is it *crazy*, or just something interesting that no one really wants to try at home?

lxskllr · Apr 15, 2017

If it were me, I'd look for the books on some site or another, download those, then donate the hard copies to a library.

Chaotic42 · Apr 15, 2017

lxskllr said:
If it were me, I'd look for the books on some site or another, download those, then donate the hard copies to a library.

That would be ideal, but I have a lot of older, cheap mathematics books, which aren't in any sort of digital format. I could go through and type them all in via LaTeX, but that would take maybe 5 minutes per page. That's two years solid.
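
A rough check of the "two years solid" figure, assuming the 5 minutes per page applies to the whole 215,000-page estimate. The 8-hour working day is an added assumption for comparison, not something stated in the thread.

```python
# Figures from the thread, except WORKDAY_HOURS, which is an assumption.
PAGES = 215_000
MINUTES_PER_PAGE = 5
WORKDAY_HOURS = 8  # assumed length of a working day


def typing_hours(pages, minutes_per_page):
    """Total hours needed to retype `pages` pages."""
    return pages * minutes_per_page / 60


total_hours = typing_hours(PAGES, MINUTES_PER_PAGE)
years_nonstop = total_hours / 24 / 365     # typing around the clock
workdays = total_hours / WORKDAY_HOURS     # typing 8 hours a day

print(f"Total hours:    {total_hours:,.0f}")
print(f"Years non-stop: {years_nonstop:.2f}")
print(f"8-hour days:    {workdays:,.0f}")
```

Around the clock this comes out to roughly two years; at 8 hours a day it would take about three times as long.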

lxskllr · Apr 15, 2017

You could make a copying jig, and take a photo of each page with a camera. Having a jig would take out some of the tedium, but it would still suck.

I started a project at work taking pictures of the job book. I was doing it 100% manually, and intended to make a PDF out of the results. The results sucked, so I manually entered everything in a spreadsheet. Between that and importing/cleaning up some previous efforts, I got it done, but it was a long, boring job.

All this is to say, there's no good solution imo. I'd try to find electronic copies of the useful info, if not the exact books, get rid of the stuff that isn't unique, make PDFs of the semi-good books, and keep hard copies of the very good books.

tynopik · Apr 15, 2017

Talk to Open Library/Internet Archive and see if they will take your books.

If so, they'll scan them for free and make them freely available:
https://openlibrary.org/bookdrive

IronWing · Apr 15, 2017

With pdf you have a couple options.

1) You can scan each page as an image and keep it as an image. This method maintains accuracy but nothing is searchable and file sizes are large.

2) You can scan each page as an image and use OCR to convert the images to text, either letting the copier/scanner do this or doing it in software later. This method produces searchable results but may cost you in accuracy. With math texts, I can't see going this route as accuracy matters.
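
To put a rough number on "file sizes are large" for option 1, here is a sketch of the raw, uncompressed size of one scanned page. The 600 dpi figure comes from earlier in the thread; the 6 x 9 inch page size and the three bit depths are assumptions, and real scanned PDFs are compressed, so actual files will be considerably smaller.

```python
# Assumed page size; 600 dpi is the resolution mentioned earlier in the thread.
PAGE_WIDTH_IN = 6
PAGE_HEIGHT_IN = 9
DPI = 600


def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Bit depths: 1 = black and white, 8 = grayscale, 24 = color.
for label, bits in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
    size_mb = raw_page_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, DPI, bits) / 1_000_000
    print(f"{label:>15}: {size_mb:.2f} MB per page before compression")
```

Multiplying the per-page figure by the page count gives an upper bound on storage for an image-only archive.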

tynopik · Apr 15, 2017

IronWing said:
With pdf you have a couple options.

1) You can scan each page as an image and keep it as an image. This method maintains accuracy but nothing is searchable and file sizes are large.

2) You can scan each page as an image and use OCR to convert the images to text, either letting the copier/scanner do this or doing it in software later. This method produces searchable results but may cost you in accuracy. With math texts, I can't see going this route as accuracy matters.

Or you can do what Open Library does and create searchable image PDFs: they keep the page image but put a text layer underneath, so you can search the text while still seeing the actual page.

Chaotic42 · Apr 15, 2017

IronWing said:
With pdf you have a couple options.

1) You can scan each page as an image and keep it as an image. This method maintains accuracy but nothing is searchable and file sizes are large.

2) You can scan each page as an image and use OCR to convert the images to text, either letting the copier/scanner do this or doing it in software later. This method produces searchable results but may cost you in accuracy. With math texts, I can't see going this route as accuracy matters.

Yeah, definitely not going with standard OCR. I might be interested in writing more math-friendly OCR software, but I don't have any on hand.

IronWing · Apr 15, 2017

tynopik said:
Or you can do what Open Library does and create searchable image PDFs: they keep the page image but put a text layer underneath, so you can search the text while still seeing the actual page.

Cool, I wonder if Adobe Acrobat Pro lets you do this?

tynopik · Apr 15, 2017

IronWing said:
Cool, I wonder if Adobe Acrobat Pro lets you do this?

I'm pretty sure it does. I know FineReader does.

SKORPI0 · Apr 15, 2017

How we built a DIY book scanner with speeds of 150 pages per minute
Related link: http://diybookscanner.org/
