
OCR Workstation Advice Sought

Pesca

Junior Member
I would be grateful for advice on a new OCR workstation build. The machine will be used nearly 24/7 for OCR with ABBYY FineReader. My goal is to maximize OCR rate. I would consider multiple machines, if that would result in greater total OCR rate. Documents being OCR'd vary widely in size, 10 MB - 2 GB, median is probably about 100 MB. Thank you!

1. System will be used 98% for OCR (ABBYY FineReader), some file transfer (local and Internet), and archiving/storage of OCR documents.

2. Budget: $5,000.

3. Parts from USA.

5. Brand preferences: none.

6. Re-used parts: none.

7. Overclocking: not unless stable for 24/7 100% CPU load.

8. Monitor: 1600x900 min resolution, just so I have enough workspace. Color accuracy, refresh, etc. not very important.

9. Purchase date: Oct - Dec 2013.

X. Additional software: Windows.

OCR speed is top priority, but a quiet system would be great.

Thanks for your advice!
 
Is this something that you use for work?

Also, do you know if FineReader has any built-in distributed computing features? It doesn't look like it from a quick glance, but you know more about it than I.
 
Mfenn, thank you for your quick reply! In answer to your questions:

No, this is not for work. The system will be at home, which is why I wrote that quiet would be nice (but not an absolute requirement).

There is a version of FineReader that will distribute a recognition task to multiple machines, FineReader Recognition Server, but the licensing is very different than FineReader Professional or Corporate, which will not distribute to multiple machines (but will use all available threads). In my case, distributing one recognition task across multiple machines is not much better than having each machine work on a separate task, so, if I were to go with multiple machines, I would probably get a separate single-machine license for each.
 
http://www.tomshardware.com/reviews/high-end-mini-itx-overclocking,3506-17.html
HT helps

http://www.tomshardware.com/reviews/core-i7-4770k-haswell-review,3521-15.html
More cores help, but I have a feeling that's getting into diminishing returns with the LGA2011 option. Going Intel with HT seems to offer the most gains.

Given the same PC, assuming the OCR processing is basically automatic, you could get near double the speed with two computers. Get a KVM, and they could both be at the same desk, without duplicating the peripherals.

I was really thinking of an OptiPlex, TBH, but they don't seem to have any sufficiently suitable preconfigured options, nor any real options at all (also, I can't get the customize page up for a Lenovo M92 tower). So, instead, I'm thinking more along the lines of:
i7-4770
B85/H87 mobo (ASRock? Asus? GB?)
2x8GB RAM
Fractal Design Define R4 case
Big heatsink (I'm fond of the TR Macho, but it's hardly the only option)
Seasonic G series PSU, maybe a 550W (overkill in power, but mainly so the fan can stay off most of the time)
Nice 1080P or 1920x1200 monitor (Acer or Dell IPS?)

1. So, then, how much total data will the PC(s) likely have stored on them at any given time?
1A. How big is the archival portion likely to be?
1B. Do you have a NAS (PC copy + NAS copy = singly-redundant live backup)?

2. How often, if ever, are you stuck waiting on the hard disks to finish reading or writing your files?

3. Do you have any plans or preferences for peripherals?
 
Cerb, thanks for your very thoughtful response.

About the benchmarks, I wish I knew precisely how they were done, but, yes, it seems like the 4770 gives quite good performance for its price. I've also thought about using the Xeon E3-1230 V2 in two or three computers for the same reason, and wonder whether it would be any more stable for 24/7 100% operation.

I've never used a KVM, but that looks like an attractive option. Any special considerations? Thanks!

In answer to your questions:

1. I'll have less than 100GB of "live" data at any time, probably 2TB/year added to the archive. I do not have a NAS, but there will be remote backups, and I won't need super quick access to backups.

2. Read and write time aren't very significant for me now, but they could be relatively more important with a faster processor. I'm thinking about a 256GB Samsung 840 Pro SSD just in case. More capacity than I need, but only using it at 50% capacity should improve its lifetime, right?

3. No plans or preferences for peripherals.

Thank you! Also, I'd be interested to hear more about your heatsink recommendations. Since I'll be running nearly 24/7 at 100% CPU, I'd rather not skimp in that department.
 
This might be one of the few cases where an 8350 is the right machine for the job. It gets comparable performance to the Intel 4770 but is £100 less. A KVM is a good idea, as the budget should allow for about 3 machines if aimed around the mid to high end mark.
 
Thanks, BrightCandle. It looks like I should indeed consider the 8350. Perhaps, though, the price difference with 4770 might be offset by higher electricity cost, if I'm running 24/7 x 3 machines.
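The electricity trade-off can be roughed out as below. This is a minimal sketch, not a measurement: it uses the published TDPs (125 W for the FX-8350, 84 W for the i7-4770) as a stand-in for real draw under load, and the $0.12/kWh rate is an assumed figure. Substitute measured wall power and your own tariff.

```python
HOURS_PER_YEAR = 24 * 365

def annual_energy_cost(watts, machines, usd_per_kwh):
    """Yearly electricity cost for `machines` PCs drawing `watts` each, 24/7."""
    kwh = watts / 1000 * HOURS_PER_YEAR * machines
    return kwh * usd_per_kwh

# Assumptions: TDP used as a proxy for CPU power under load; rate is hypothetical.
FX_8350_TDP_W = 125
I7_4770_TDP_W = 84
RATE_USD_PER_KWH = 0.12
MACHINES = 3

extra = (annual_energy_cost(FX_8350_TDP_W, MACHINES, RATE_USD_PER_KWH)
         - annual_energy_cost(I7_4770_TDP_W, MACHINES, RATE_USD_PER_KWH))
print(f"Extra cost per year for {MACHINES} FX-8350 machines: ${extra:.2f}")
```

Under these assumptions the gap is roughly $130 a year across three machines, so the up-front CPU saving would be eaten over a few years of 24/7 use.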
 
About the benchmarks, I wish I knew precisely how they were done, but, yes, it seems like the 4770 gives quite good performance for its price. I've also thought about using the Xeon E3-1230 V2 in two or three computers for the same reason, and wonder whether it would be any more stable for 24/7 100% operation.
With a Xeon E3 V3 (might as well go w/ current-gen), a C-chipset motherboard, and ECC RAM, you would be able to find logs of any RAM errors. If RAM errors aren't occurring anyway (common, but unknowable with non-ECC RAM), of course, there'd be no difference there, either.

With a desktop chipset and regular RAM, they are interchangeable. They are not going to be more or less stable because of being branded Xeon.

I've never used a KVM, but that looks like an attractive option. Any special considerations? Thanks!
USB, and whatever you hook up your monitor with (such as DVI), need to be supported. USB+VGA ones are cheaper, but you may get artifacting, usually as crawlies, fuzziness, text looking like it's been through a sharpness filter, etc. (personally, I would get a Dell monitor for something like that, as they tend to be better than average about handling so-so VGA inputs, IME). Going DVI, it looks like Belkin's is crap, but I haven't tried any you can buy today (I have a 2-port Zonet w/ DVI, but it's long since been discontinued). For just VGA, I've found Trendnet's to be of good overall quality, though.

2. Read and write time aren't very significant for me now, but they could be relatively more important with a faster processor. I'm thinking about a 256GB Samsung 840 Pro SSD just in case. More capacity than I need, but only using it at 50% capacity should improve its lifetime, right?
A little, but it will also improve its performance, having free space. A bit cheaper SSD would be fine, too. The practical differences in performance are nil, unless you're commonly stuck waiting for the PC or your application to unfreeze from random disk access. There are worse ones than others, but a $180 Toshiba Q Series or Samsung 840 Evo (brand new, so maybe let the early adopters check them out, first 🙂), or a $190 or so Crucial M500, Corsair Neutron, or Plextor M5S, will still be tens of times faster at random IO than any HDD, 2-4x faster at sequential work, and have been good drives.

Also, I'd be interested to hear more about your heatsink recommendations. Since I'll be running nearly 24/7 at 100% CPU, I'd rather not skimp in that department.
Above the Core i3s, Intel's stock heatsinks don't stay anywhere near quiet when under load. As the chips have gotten more efficient, and the heatsinks themselves more efficient, the heatsinks have gotten smaller, to save costs. That's pretty much the why in a nutshell.

Bigger heatsinks with more fin area can cool to lower temperatures, but often need more powerful fans to do that, resulting in a lot of noise. Heatsinks with lots of mass and wider fin spacing (so fewer fins and less total area) present less airflow resistance and give better results with lower RPMs, fewer fans, etc. That allows noise to be greatly reduced, sometimes to the point of not needing a fan on the CPU HSF at all. A handful of makers have models catering to low noise along with high performance, rather than just really high performance.

Keeping it relatively quiet and comfortable is also why I'd not go for the AMD. They use significantly more average power, meaning more heat to exhaust from the PCs, and more to exhaust from the room, and a warmer room. They're cheaper for good reason, sadly (it would be nice to have better competition). $5K should be able to fit around 4 i7-based desktops, a KVM, monitor, keyboard, and mouse. If not (depending on how high-end you want to go for each), it can definitely fit 3 of them, with some budget left over.
 
Cerb, really appreciate your detailed response. The KVM option is very appealing, and I doubt I would have considered it without your input. Thanks. Will check out the SSDs you mentioned, too. Power consumption of the 8350 seems like an issue, so I'm leaning towards the 4770. I understand the Ivy Bridge-E launch is coming this month, so I'll at least wait to see what's offered and what happens to other prices.

Snoturtle, those are definitely deals. Thanks for looking for them. I don't need the discrete graphics, though, so it's possible my own build might be better. Will consider them, though.
 
Your own build would surely be better. But, by how much, and how much, in terms of your money, time, and interest, is that worth? I mean, this forum being what it is, wanting to make your own custom PC is a perfectly valid reason to spend $300 more.

Those are good deals, though, even if you just take one and remove the graphics card. They're very much worth considering, and the mini/mid towers are pretty upgradable. Take the 8700, FI, download a start menu replacement, and you would be in business, with a computer 80-90% as good as a custom build for $900-1000. If you found the HDDs to be limiting, you could add SSDs to them, and re-install the OS (with the custom build, you'd have to install from scratch, anyway).

It's really a question of availability when you make a decision to purchase, and your priorities. Those deals offer better bang/buck, without question.
 
Building one custom PC is fun. Building 3 at a time starts to feel a lot like work. :awe: That XPS 8700 with a 4770 for $570 looks like a pretty awesome deal. The processor alone is $300.

OP, I'm betting that FineReader doesn't use any kind of fancy 3D graphics, so it seems to me that using Remote Desktop would be another way to access the secondary PCs without having to spend money on a KVM and cables.
 
I would think that a 4770 would tear through OCR. I worked for a legal company and we scanned thousands of pages of exhibits or about 500 depositions a day on run of the mill HP desktops with huge tray fed scanners.
 
Mfenn, a deal like the XPS 8700 is attractive, but I'd like to wait until the Ivy Bridge-E launch. Thanks for the Remote Desktop tip. I've been looking into it and reading more about KVMs. There's also Microsoft's Mouse Without Borders and Synergy+ to consider. The question is how much I want or need to see all displays simultaneously vs. check them periodically, since KVMs that support simultaneous viewing of 4 screens in one display seem rather expensive. On the other hand, even if I don't need 4 monitors, they aren't very expensive and I suppose there's value in having them around for the future.

KentState, you're right that a 4770 would be great for OCR. For typical home usage, and probably for many office environments, it would be sufficient, but I could make use of four.
 
OCR is such an iffy proposition to begin with i see no advantage to ECC

SSD isn't going to gain you anything

mind if I ask what you're doing? OCR is pretty fast, I can't imagine needing more than even a basic machine 24/7
 
Tynopik, I agree there's no reason for ECC in terms of the data being generated by the OCR process. It would only be to prevent crashes, but I've been doing a ton of OCR and never had that problem, so I'm not considering ECC.

I'd like to OCR about 10 million pages per year. Speed depends on more properties of the documents than just page count, so it's difficult to judge from some online benchmarks. If you print a 12pt Word file on your laser printer, scan it at 300dpi, and OCR it, yes that's fast, but not all documents are like that. For my usage, I think three 4770s would be sufficient, two not quite. A fourth would be useful for expansion, since the 10 million figure isn't firm.
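To put the 10-million-page target in per-machine terms, here is a rough sketch. It assumes the load splits evenly and that the machines run around the clock (the `uptime` parameter is there to relax that); the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_per_page_budget(pages_per_year, machines, uptime=1.0):
    """Average time each machine may spend per page to hit the yearly target."""
    pages_per_machine = pages_per_year / machines
    return SECONDS_PER_YEAR * uptime / pages_per_machine

for n in (1, 2, 3, 4):
    budget = seconds_per_page_budget(10_000_000, n)
    print(f"{n} machine(s): {budget:.1f} s per page")
```

With three machines, each one has about 9.5 seconds per page on average; with two, about 6.3 seconds. Comparing those budgets with a measured per-page time on real documents is what decides how many machines are needed.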
 
how much is it worth to you to minimize the number of machines? in terms of pure bang for buck, 5 low-end machines might be best vs 2 high-end machines

otoh the hassle of dealing with 5 separate machines might negate the savings vs dealing with just 2 high-end machines
 
I'm trying to resolve this in my own mind, and I'm not sure. I've never had more than a two-machine setup. Also, each machine will need a $300-$400 software license, which reduces the cost benefit of multiple machines. I think I'm going to have to price out different options assuming 1, 2, 3, 4 machines at the same total performance level.
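Pricing out the options could start from something like the sketch below. The $5,000 budget, the $300-$400 license, and the $570 XPS 8700 deal come from this thread; the $1,000 custom-build price is a hypothetical placeholder, and the monitor, KVM, and any SSD upgrades are left out.

```python
def max_machines(budget, hardware_each, license_each):
    """Largest number of identical machines (hardware + license) within budget."""
    return int(budget // (hardware_each + license_each))

BUDGET = 5000
LICENSE_HIGH = 400  # upper end of the $300-$400 license range

options = {
    "Dell XPS 8700 deal": 570,            # price quoted in this thread
    "custom build (hypothetical)": 1000,  # assumed, not a real quote
}
for name, hardware in options.items():
    n = max_machines(BUDGET, hardware, LICENSE_HIGH)
    total = n * (hardware + LICENSE_HIGH)
    print(f"{name}: {n} machines, ${total} total")
```

Because the license is a fixed cost per machine, it weighs most heavily on the cheapest hardware: at $570 per box, the license is a large share of each machine's cost.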
 
Well, for that, you'd have to weigh the savings, but the Dells w/ coupons save quite a bit up front. LGA2011 saves some compared to an LGA1150 build, but how much? It typically costs $300 or more extra per PC compared to a regular white-box, so most of the savings are in your power bill and floor space. But, that's $500+ more compared to deals like the linked Dells, possibly even as much as $700, by the time they're fully priced out.
 
Did you mean to reverse the order of the bolded part?
 
No. Those deeply discounted Dells do skew things quite a bit. But, comparing LGA2011 to LGA1150, without said deals, the up front cost differences are all in the CPU, motherboard, and software.

How much are 4 sets of case, PSU, RAM, HDD, OS, and FR, compared to 3?
Then, compare that to the added cost of using LGA2011 for those 3.
 
a quick and dirty look at a super-high-end machine that gives you 16 real cores

not guaranteeing compatibility or that all component choices are optimal, but it should be ballpark

(and of course the Ivy versions should be out shortly)

$1935 x 2 Intel Xeon E5-2687W 8-core 3.1GHz (3.8GHz turbo)
$364 SuperMicro MBD-X9DRD-IF-O
$102 Kingston 4x2GB ECC Registered
$37 x 2 Dynatron R17 HSF
$80 Seagate Barracuda 2TB
$140 Windows 8 Pro (for 2 processors)
$110 Fractal Design Define XL R2
$157 Corsair AX760
$110 Acer 21.5" 1920x1080
$13 MSI Radeon HD 5450

Total: $5020
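For anyone swapping parts in and out of the list, a small sketch that re-adds it. Prices and quantities are copied from the list above; nothing else is assumed.

```python
# (unit price in USD, quantity, item) as listed above
parts = [
    (1935, 2, "Intel Xeon E5-2687W"),
    (364, 1, "SuperMicro MBD-X9DRD-IF-O"),
    (102, 1, "Kingston 4x2GB ECC Registered"),
    (37, 2, "Dynatron R17 HSF"),
    (80, 1, "Seagate Barracuda 2TB"),
    (140, 1, "Windows 8 Pro"),
    (110, 1, "Fractal Design Define XL R2"),
    (157, 1, "Corsair AX760"),
    (110, 1, 'Acer 21.5" 1920x1080'),
    (13, 1, "MSI Radeon HD 5450"),
]

total = sum(price * qty for price, qty, _ in parts)
cpu_share = 1935 * 2 / total
print(f"Total: ${total}")
print(f"CPUs as share of total: {cpu_share:.0%}")
```

The two CPUs account for about 77% of the build, so the CPU choice is where nearly all of the price movement will be.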
 
ABBYY's page on multi-core performance makes a couple key points

1. To get the most benefit from multiple cores, you need lots of pages in a document
2. Multiple cores don't help with saving. So if saving a document takes a considerable amount of time, you might be better off splitting the jobs between multiple machines
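Point 2 is essentially Amdahl's law. A minimal sketch of it, assuming recognition time divides evenly across cores and saving is fully serial; the 600 s and 60 s timings are hypothetical, not FineReader measurements.

```python
def job_time(recognize_s, save_s, cores):
    """Wall time when recognition scales with cores and saving does not."""
    return recognize_s / cores + save_s

def speedup(recognize_s, save_s, cores):
    """Speedup over a single core for the same document."""
    return job_time(recognize_s, save_s, 1) / job_time(recognize_s, save_s, cores)

# Hypothetical document: 600 s of recognition on one core, 60 s to save.
for cores in (4, 8, 16):
    print(f"{cores} cores: {speedup(600, 60, cores):.2f}x")
```

In this made-up case 16 cores give less than a 7x speedup, whereas separate machines each pay their own save time in parallel. The smaller the save time relative to recognition, the less this matters.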
 
Tynopik, thanks very much for the example build, and for checking the FineReader page (I do have documents with many pages, and save time is not very important).

Based on everyone's input (thanks!), I'm thinking about that Dell deal with the 4770s if I am comfortable with 3+ machines. If I want to stick to 1 or 2 machines, the decision seems more complicated.

Since FineReader benchmarks aren't available for all processors, I'm looking at PassMark scores. For a dual-CPU machine, I'm not sure if I should look at the dual-CPU PassMark scores or 2x the single-CPU scores, since FineReader makes good use of all threads. If the latter, then a strong candidate might be two machines, each with 2x E5-2630 (or E5-2640 v2, maybe?).

If one machine, I wonder if 4x E5-4607 could ever make sense? Or, should I be looking at Tynopik's suggestion of 2x E5-2687W or maybe 2x E5-2660 v2?

Three or four machines with 4770s would be great, if I could manage them, and that might be what my decision comes down to, ultimately.
 