free OCR software
-
A nonprofit is looking for something that can take image-based .pdf files and turn them into text-based .pdfs (OCR). Any suggestions?
-
-
I use PDF Split and Merge all the time to extract pages and put PDFs together. The paid version is $30 and has OCR. I've never used the OCR part, but if it works as well as the split and merge part, it would be great.
-
@scottalanmiller uhh, can I get a Windows .exe of that?
-
@Mike-Davis said:
@scottalanmiller uhh, can I get a Windows .exe of that?
You want cheap, but only on an expensive platform?
-
Let's just call it a commodity platform that my end user recognizes. Since it's been a long time since I have compiled anything, $30 seems very reasonable.
-
Don't most scanners come with OCR software?
-
@Mike-Davis said:
Let's just call it a commodity platform that my end user recognizes. Since it's been a long time since I have compiled anything, $30 seems very reasonable.
Where did you get the idea that you would be compiling something?
And yes, it does come on Windows.
-
@scottalanmiller said:
@Mike-Davis said:
Let's just call it a commodity platform that my end user recognizes. Since it's been a long time since I have compiled anything, $30 seems very reasonable.
Where did you get the idea that you would be compiling something?
And yes, it does come on Windows.
Probably from the same place I did. Though I guess, from your comment, that GitHub has the precompiled files for xyz OS right there for download?
-
Guess you guys didn't read the site: https://github.com/tesseract-ocr/tesseract/wiki/Downloads
-
@Dashrender said:
Probably from the same place I did. Though I guess, from your comment, that GitHub has the precompiled files for xyz OS right there for download?
Yes, the binaries for all major OSes are right there on the download link.
-
OK, I learned that GitHub doesn't only have code on it. It also has executables. Good to know.
-
@Dashrender said:
OK, I learned that GitHub doesn't only have code on it. It also has executables. Good to know.
Quite commonly, yes. And much of the code that they have is ready for deployment. NodeBB, for example, comes from there and you just deploy it, no compiling.
Lots of projects have downloadable files on GitHub. GitHub also has a full wiki and documentation system.
-
From a TechSoup article:
Optical Character Recognition
If you've got lots of paper notes or forms that you've collected on-site and you need to get the information into your case management database, try optical character recognition (OCR) software. OCR tools allow handwritten or printed text to be scanned using an external scanner; that image is then converted to machine-readable text that can be searched, analyzed, and imported into the system you use. OCR improves the accuracy of data collected and reduces the time it takes staff to enter the data. The technology isn't infallible, though — it's best if staff members take the time to check over the scans and correct them if needed. If you're on a tight budget, consider freeware OCR software such as OCRFeeder, FreeOCR, Tesseract GUI, or TextRipper.
-
Thanks for all the suggestions. I found that Tesseract doesn't support .pdf files as a source. FreeOCR and others that use Tesseract therefore won't work for what I'm trying to do (take an image-based .pdf and OCR it). The other programs listed in the TechSoup article don't have a Windows version.
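For anyone who still wants to try the free route, one possible workaround is to render each PDF page to an image first and then hand the images to Tesseract. The sketch below is only that, a sketch under assumptions: it assumes the `pdftoppm` tool from Poppler and a `tesseract` build that includes the `pdf` output config are both installed and on the PATH. The function names and the `page` file prefix are made up for illustration.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm command that renders each PDF page to a PNG."""
    # Assumption: pdftoppm (Poppler) is installed; -r sets the resolution in DPI.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_cmd(image_path, out_base):
    """Build the tesseract command that writes <out_base>.pdf with a text layer."""
    # Assumption: this tesseract build ships the "pdf" output config.
    return ["tesseract", str(image_path), str(out_base), "pdf"]


def ocr_pdf(pdf_path, work_dir):
    """Rasterize pdf_path, OCR each page image, and return the per-page PDF paths."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: image-based PDF -> one PNG per page (page-1.png, page-2.png, ...).
    subprocess.run(rasterize_cmd(pdf_path, work_dir / "page"), check=True)

    # Step 2: each PNG -> a single-page PDF with searchable text.
    results = []
    for image in sorted(work_dir.glob("page-*.png")):
        out_base = image.with_suffix("")
        subprocess.run(ocr_cmd(image, out_base), check=True)
        results.append(out_base.with_suffix(".pdf"))
    return results
```

Calling `ocr_pdf("scan.pdf", "ocr_work")` would leave one text-based PDF per page in `ocr_work`. Those pages still need to be merged back into a single file, which the free part of PDF Split and Merge mentioned above should be able to handle.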
-
Not free, but you could look into PaperPort, which is available from Amazon fairly inexpensively.
-
Looks like for a non-profit you could get Acrobat pretty cheap: http://www.techsoup.org/products/acrobat-xi-pro-win-esd--G-40959--
-
@Jason Acrobat Pro for $55 from techsoup.org for the win. I thought the only option was the Creative Cloud suite subscription. I missed the Acrobat Pro option.