These days, almost everything (e.g. photos, music, videos) has gone digital, and that makes sense, as digital content can be conveniently managed, edited, and shared. So why should textual documents stay behind? Thanks to advancements in Optical Character Recognition (OCR) techniques, it’s now easier than ever to digitize the text in printed or handwritten documents, thus making it editable in word processing programs.
Now, to do that, you need some really good OCR software, and that’s exactly what this article is all about. These programs can either acquire printed documents as images from scanning devices, or take document images that you supply yourself, and convert them into editable text. Intrigued? Well then, let’s not beat around the bush, and get to the 5 best OCR software applications.
1. ABBYY FineReader
When it comes to Optical Character Recognition, there’s hardly anything that comes even close to ABBYY FineReader. Loaded to the brim with an insane amount of powerhouse features, ABBYY FineReader makes extracting text from all kinds of images a breeze.
Despite toting an extensive list of features, ABBYY FineReader is super simple to use. It can extract text from almost all popular image formats, such as PNG, JPG, BMP, and TIFF. And that’s not all. ABBYY FineReader can also extract text from PDF and DJVU files. Once the source file or image (which should preferably have a resolution of at least 300 dpi, for optimal scanning) is loaded up, the program analyzes it and automatically determines which sections of the file contain extractable text. You can either have all of the text extracted, or choose only some specific sections. After that, all you need to do is use the Save option to choose the output format, and ABBYY FineReader will take care of the rest. Numerous output formats are supported, such as TXT, PDF, RTF, and even EPUB.
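Not sure whether a scan clears that 300 dpi mark? The resolution is simply the image's pixel width divided by the physical width of the page in inches. Here's a rough sketch of that arithmetic in Python. The function names and the US Letter example (8.5 inches wide) are our own illustrative assumptions and have nothing to do with FineReader itself.

```python
def effective_dpi(pixel_width, physical_width_inches):
    """Dots per inch implied by a scan's pixel width and the page's real width."""
    if physical_width_inches <= 0:
        raise ValueError("physical_width_inches must be positive")
    return pixel_width / physical_width_inches


def meets_ocr_minimum(pixel_width, physical_width_inches, minimum_dpi=300):
    """True if the scan reaches the recommended minimum resolution."""
    return effective_dpi(pixel_width, physical_width_inches) >= minimum_dpi


# A US Letter page (8.5 inches wide) scanned to an image 2550 pixels wide.
print(effective_dpi(2550, 8.5))      # 300.0
print(meets_ocr_minimum(1275, 8.5))  # False, that scan is only 150 dpi
```

If the check fails, rescanning at a higher resolution is usually a better idea than enlarging the image afterwards, since upscaling doesn't add any real detail.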
The output text is perfectly editable, and text from even the most content intensive documents (e.g. those having multiple columns and complex layouts) is extracted flawlessly. Other features include extensive language support, numerous font styles/sizes, and image correction tools for files sourced from scanners and cameras.
In a nutshell, if you want the absolute best OCR software out there, complete with extensive input/output format and processing support, go for ABBYY FineReader.
Platform Availability: Windows 10, 8, 7, Vista, and XP; Mac OS X 10.6 and later
Price: Paid versions start from $169.99, 30-day free trial available
2. Readiris
On the hunt for an extremely powerful OCR program that’s heavy on features, but doesn’t take a whole lot of effort to get started with? Take a look at Readiris, as it just might be what you need.
A professional grade application, Readiris has an extensive feature set that’s largely identical to that of the previously discussed ABBYY FineReader. From BMP to PNG, and from PCX to TIFF, Readiris supports quite a few image formats. Other than that, PDF and DJVU files can be processed just as well. Images can be sourced from scanner devices, and the application also lets you set custom processing parameters for source files/images, such as smoothing and DPI adjustment, before analyzing them. Although Readiris can process lower resolution images just fine, the resolution should ideally be at least 300 dpi. Once analysis is done, Readiris determines text sections (or zones), and the text can be extracted from either specific zones, or the entire file. The extracted text is editable, and can be saved in numerous formats, such as PDF, DOCX, TXT, CSV, and HTM.
What’s more, Readiris Pro’s cloud saving feature lets you save your extracted text directly to cloud storage services like Dropbox, OneDrive, Google Drive, and then some more. There is also a healthy number of text editing/processing features, and even barcodes can be scanned.
All in all, you should use Readiris if you want robust text extraction/editing features in a simple to use package, complete with extensive input/output format support. However, Readiris does falter a little bit when it comes to processing documents with complex layouts like multiple columns, tables, etc.
Platform Availability: Windows 10, 8, 7, Vista, and XP; Mac OS X 10.7 and later
Price: Paid versions start from $99, 10-day free trial available
3. FreeOCR
If you’re looking for a simple, no-fuss OCR program with decent text recognition capabilities, look no further than FreeOCR. While it may not be overloaded with all kinds of fancy features, it still works extremely well for what it is.
Based on the extremely popular, Google backed Tesseract OCR engine, FreeOCR is extremely easy to use. It can acquire printed documents directly from scanners, and also lets you load images having textual content. Not only that, it can also extract text from heavily formatted multi-page documents. You can have the application extract either all of the text from the input PDF/image, or just a specific chunk that you define. Conversion speeds are pretty good, and the converted text can be either saved in formats like TXT and RTF, or exported directly to Microsoft Word. FreeOCR supports all major image formats, like PNG, JPG, and TIFF.
That being said, FreeOCR does have some shortcomings. It’s a bit too basic, and doesn’t have any text post-processing functions. Moreover, the layout of the extracted text often gets messed up, with overlapping lines and columns. Use it only if you require some basic OCR functionality for occasional usage.
Platform Availability: Windows 10, 8, 7, Vista, and XP
4. Microsoft OneNote
OneNote is an impressively feature rich note-taking application that’s easy to get started with as well. However, note-taking isn’t the only thing it’s good at. If you use OneNote as part of your workflow, you can use it to do some basic text extraction, thanks to the OCR goodness built into it.
Using OneNote to extract text from images is ridiculously simple. If you use the desktop application, all you have to do is use the Insert option to insert the image into any of the notebooks or sections. Once that’s done, simply right click on the image, and select the Copy Text from Picture option. The entire textual content of the image is copied to the clipboard, and can be pasted (and hence, edited) anywhere, as per requirement. Whether it’s PNG, JPG, BMP, or TIFF, OneNote supports almost all major image formats.
However, OneNote’s text extraction capabilities are quite limited, and it can’t deal with images having complex textual content layouts such as tables and sub-sections. So that’s something you should bear in mind.
Platform Availability: Windows 10, 8, 7, and Vista; Mac OS X 10.10 and later
5. GOCR
Note: Before getting started, it’s important to know that even though GOCR supports regular image formats such as PNG and JPG, it failed to recognize them during our testing (performed on a PC running Windows 10). It’s very much possible that it might work with those formats on Linux machines, but if you’re using Windows, you’ll need to convert the source image(s) to the PNM format. This can be done with any of the numerous online file conversion tools.
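For the curious, PNM is a family of very simple, uncompressed image formats. The sketch below shows how little there is to its binary greyscale flavour (PGM, magic number P5): a short text header followed by one byte per pixel. The function name and the sample pixels are made up for illustration, and this is in no way a replacement for a real PNG/JPG converter.

```python
def encode_pgm(rows, max_value=255):
    """Encode rows of greyscale values (0-255) as a binary PGM (P5) image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if width == 0 or any(len(row) != width for row in rows):
        raise ValueError("rows must be non-empty and all the same length")
    # Header: magic number, then width and height, then the maximum grey value.
    header = f"P5\n{width} {height}\n{max_value}\n".encode("ascii")
    # Pixel data: one byte per pixel, row by row, starting at the top.
    pixels = bytes(value for row in rows for value in row)
    return header + pixels


# A tiny 2x2 checkerboard: black, white / white, black.
data = encode_pgm([[0, 255], [255, 0]])
print(data)  # b'P5\n2 2\n255\n\x00\xff\xff\x00'
```

Writing those bytes to a file opened in binary mode would give you an image in the PGM flavour of PNM.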
What sets GOCR apart from the lot is that it doesn’t really have a graphical user interface (GUI) front-end. It’s a command line based tool and as such, isn’t really the easiest to use. But once you’re comfortable with the basics, GOCR can prove really helpful in text extraction from images. It’s also worth noting that for GOCR to work properly, the source images should have clearly visible textual content, and preferably a white background, as the utility doesn’t really work with complex source files. GOCR extracts the text from images and saves it in the TXT format. While it supports quite a few arguments and functions, only a few need to be known to get started. For example, to extract text from a sample PNM image, you should enter the following at the command prompt.
"X:\sample folder\gocr049" -i file.pnm -o file.txt
Here, X:\sample folder is the location of GOCR’s command line tool, and file.pnm and file.txt are the input and output files, respectively (both in the same location as GOCR; if the location is different, the complete path should be specified). If a path contains spaces, as this one does, wrap it in double quotes. Also, if you want to change the greyscale level for the image, you can specify a numerical value as an argument, along with -l. GOCR’s documentation covers the usage in detail.
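If you’d rather script the call than type it out each time, the command above can be wrapped in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, it assumes GOCR is already installed at the path you pass in, and it uses only the -i, -o, and -l arguments described above. The 0 to 255 range check on the grey level is our assumption.

```python
import subprocess
from pathlib import Path


def build_gocr_command(gocr_exe, input_pnm, output_txt, grey_level=None):
    """Return the argument list for a GOCR run (nothing is executed here)."""
    command = [str(gocr_exe), "-i", str(input_pnm), "-o", str(output_txt)]
    if grey_level is not None:
        # Assumed valid range for the -l grey level argument.
        if not 0 <= grey_level <= 255:
            raise ValueError("grey_level must be between 0 and 255")
        command += ["-l", str(grey_level)]
    return command


def run_gocr(gocr_exe, input_pnm, output_txt, grey_level=None):
    """Run GOCR, then return the recognized text read back from output_txt."""
    command = build_gocr_command(gocr_exe, input_pnm, output_txt, grey_level)
    # Passing a list (not one long string) sidesteps quoting issues
    # with paths that contain spaces.
    subprocess.run(command, check=True)
    return Path(output_txt).read_text(errors="replace")
```

Calling run_gocr(r"X:\sample folder\gocr049", "file.pnm", "file.txt") would then do the same thing as the command line example, and hand back the extracted text as a string.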
To sum it up, GOCR is a fairly good OCR utility, and when it comes to text extraction from simple images, it works exceptionally well. However, it’s severely limited in features, and requires a fair amount of effort to get working.
Platform Availability: Windows 10, 8, 7, Vista, and XP; Linux; OS/2
All set to convert images to text?
Digitizing printed (and handwritten) text is extremely useful, as it makes storing, editing, and sharing that text much easier. And the OCR software discussed above makes quick work of just that, no matter how basic or advanced your text extraction needs are. Need professional-level text extraction features with the best post-processing tools? Go for ABBYY FineReader or Readiris. Prefer a simpler OCR program that just gets the basics done? Use OneNote or FreeOCR. Try them out, and see how they work out for you. Know of any other OCR software that could’ve been included in the listing above? Shout out in the comments below.