Scanning and compression

LordGorzul

Member
Mar 21, 2000
156
0
0
We are trying to scan a lot of pages for online publishing. We're using an A3 scanner, and as much as we try to compress the scans in black and white (as long as they're still readable), we cannot get them down to a good size. The smallest we've achieved is about 200k per page, and we need to get it down to about 50-60k if possible. Do you guys have any advice?
Not advice such as "take the pictures out" etc., but maybe you could tell me about a program that would compress the images further or something. We compressed to JPG or PDF and the results are too large, especially PDF: we got files that were over 800k apiece (per page). When we're talking about 200-300 pages on the website, that's too big. We need to put them in 10-page files that wouldn't exceed, say, 700-800k in total.

Thanks for the advice.


LG
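
For reference, the size targets in the post above work out as follows. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the 300 dpi figure is an assumption, since the post does not say what resolution the scans use. The A3 dimensions (297 x 420 mm) are the standard ones.

```python
A3_MM = (297.0, 420.0)  # standard A3 sheet, width x height in millimetres
MM_PER_INCH = 25.4


def per_page_budget_kb(total_kb, pages):
    """Average size each page may take if `pages` pages share `total_kb`."""
    return total_kb / pages


def required_ratio(current_kb, target_kb):
    """How many times smaller a page must become to hit the target."""
    return current_kb / target_kb


def raw_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size of a scanned page, ignoring any per-row padding."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    assumed_dpi = 300  # assumption: not stated in the post
    for total_kb in (700, 800):
        budget = per_page_budget_kb(total_kb, 10)
        print(f"{total_kb}k / 10 pages = {budget:.0f}k per page, "
              f"{required_ratio(200, budget):.1f}x smaller than 200k")
    for bits in (1, 8):
        size = raw_page_bytes(*A3_MM, assumed_dpi, bits)
        print(f"A3 at {assumed_dpi} dpi, {bits}-bit: {size / 1024:.0f} KB raw")
```

So a 10-page file capped at 700-800k leaves 70-80k per page, which means the 200k pages need to shrink by roughly 2.5x to 2.9x. Hitting the 50-60k goal would take about 3.3x to 4x.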
 

LordGorzul

Member
Mar 21, 2000
156
0
0
Or you could tell me what would be the best scanning software to use. I tried Adobe Acrobat, Photoshop, and the scanning software that came with the Mustek scanner.
 

UlricT

Golden Member
Jul 21, 2002
1,966
0
0
Try GIF... because that format compresses MUCH better when the pic has to have readable text... but I hear that there are some licensing issues nowadays(?)
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
You could try variable JPEG compression, but you're not going to get much of a size reduction over what you're already getting. Try http://www.xat.com/ for variable JPEG compression. You could also try JPEG 2000, but many people can't view JPEG 2000. You should also try a lossless format such as PNG if your docs aren't very graphical. Don't use GIF, it's outdated.
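
To give a rough feel for the lossless route, here is a minimal sketch using only Python's standard library. It thresholds a page to pure black and white, packs it to 1 bit per pixel, and runs it through zlib, which implements the same DEFLATE compression that PNG uses. Everything here is an assumption made for illustration: the page is synthetic (clean stripes standing in for text), the 800x1000 size is far smaller than a real A3 scan, and a real PNG encoder also applies row filters and adds file framing, so the numbers it prints are not a prediction for actual scans.

```python
import zlib

WIDTH, HEIGHT = 800, 1000  # hypothetical page size in pixels (kept small)


def synthetic_gray_page(width, height):
    """Fake 8-bit grayscale page: white (255) with bands of dark strokes."""
    rows = []
    for y in range(height):
        row = bytearray([255]) * width
        if 5 <= y % 20 < 13:  # rows that belong to a "text line"
            for x in range(50, width - 50):  # leave left/right margins
                if (x // 3) % 2 == 0:
                    row[x] = 0  # black stroke
        rows.append(bytes(row))
    return rows


def threshold_and_pack(rows, cutoff=128):
    """Threshold to black/white and pack 8 pixels per byte (1 = black)."""
    packed = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for value in chunk:
                byte = (byte << 1) | (1 if value < cutoff else 0)
            byte <<= 8 - len(chunk)  # left-align a short final chunk
            packed.append(byte)
    return bytes(packed)


rows = synthetic_gray_page(WIDTH, HEIGHT)
gray = b"".join(rows)              # 8 bits per pixel
bilevel = threshold_and_pack(rows)  # 1 bit per pixel
gray_z = zlib.compress(gray, 9)
bilevel_z = zlib.compress(bilevel, 9)

print("8-bit raw:       ", len(gray), "bytes")
print("1-bit raw:       ", len(bilevel), "bytes")
print("8-bit + DEFLATE: ", len(gray_z), "bytes")
print("1-bit + DEFLATE: ", len(bilevel_z), "bytes")
```

Before any compression, the 1-bit buffer is exactly one eighth the size of the 8-bit one. How much DEFLATE saves on top of that depends heavily on the page: scanner noise and speckle in a real scan compress much worse than the clean stripes used here.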

Also, JPG is a form of compression but PDF is NOT. PDF is just a sort of PostScript-based document format, and PostScript is not known for its efficiency. The size of a scanned PDF depends on how the page images inside it are compressed.

But none of the things I've mentioned will get you what you want. It's just not possible to do what you want with image compressors.

The only way that you are going to get the sort of small file size you desire is to use OCR. I know that using OCR isn't really what you're looking for, but it's simply the only way. OCR would translate your documents into actual text, not pixels. A good OCR package will support automatic insertion of images into your document too. The OCR program would produce MS Word files or something along those lines. You'll need a good OCR package that supports formatting and graphics. Your scanner probably came with an OCR program, but it might not be good enough for what you'll need to do.
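
To illustrate why text is so much smaller than pixels, here is a small sketch that measures how many kilobytes a page of plain text takes and checks it against a per-page budget. It does not perform any OCR. The helper names, the sample text, and the 50 KB default budget are all assumptions for illustration, and a real OCR export (a Word file with formatting and embedded pictures) would be larger than the bare text measured here.

```python
def text_size_kb(page_text):
    """Size of a page of text in KB when stored as UTF-8."""
    return len(page_text.encode("utf-8")) / 1024


def fits_budget(page_text, budget_kb=50):
    """True if the bare text of a page fits in the given budget."""
    return text_size_kb(page_text) <= budget_kb


# Hypothetical stand-in for one densely printed page of OCR output.
sample_page = "The quick brown fox jumps over the lazy dog. " * 70

print(f"{len(sample_page)} characters = {text_size_kb(sample_page):.1f} KB")
print("fits a 50 KB page budget:", fits_budget(sample_page))
```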

Since you're doing online publishing, do you have access to the original digital versions of the documents?