OCR could explain some things, but it doesn't begin to explain everything that's wrong with the PDF posted by the White House. OCR doesn't explain a layer of white corrections where someone went through 'erasing' things but didn't know enough to realize that painting with white ADDS pixels to a document rather than just erasing whatever you're whiting out. In a flattened document it would of course be undetectable, but with the layers left intact, it's there for all to see. OCR doesn't account for that, so WHY did someone paint with white on what should be a simple scan of an existing document? For what purpose?
It's not just OCR that could have been applied, but also other post-processing filters meant to make a document more legible, such as removing sensor noise or dust specks. And yes, potentially in separate layers.
OCR also doesn't really account for the pixel-duplicated objects. And again, you'll notice NO ONE has actually been able to reproduce this and show, "See? When you scan something and OCR it, it does exactly the same thing," and then release their results for anyone to examine. I was hoping for something like this in the Snopes article, or anywhere else, but I still haven't seen it.
Actually, it explains it exactly. I know you already understand how OCR works, so I'm not sure why you're contesting pixel duplication in fields that were rendered as text.
And yes, if the OCR recognizes only some characters and not others, you could end up with only some of them appearing anti-aliased.
Since we don't know what software they used, it would be difficult to reproduce the result exactly.
So again, WHY the need for all of this nonsense? Why would you even run OCR on this in the first place, if that's what people want to believe explains away all the oddities? All you'd ever need to do is scan the document with any garden-variety scanner and release a simple image of it. Any of us knows that's all we'd do if asked to scan a document we had; we wouldn't run it through OCR. And why release something so incompetent?
OCR is something you'd normally want when scanning documents to PDF, and it was most likely part of whatever scanning software they used. You don't want it just so you can change the text, but so you can more easily copy the text into other programs. It's generally a useful feature, particularly if whoever scanned it was used to converting documents for some other kind of electronic archiving. If you've never looked at a PDF scan of an old document and resented being unable to copy its content for use somewhere else, then you haven't dealt with enough PDFs of old documents.
They could have disabled it, and maybe they should have released an updated version with the post-processing disabled, along with an explanation that it had been on and why. There's a chance that whoever made the scan didn't really know what they were doing in this regard.
You yourself admit that there's no sensible reason for them to forge the document, and that it makes no sense for a forgery to look like this. Isn't software post-processing a much more plausible explanation than a deliberate concoction of what we see?