Can Deep Learning and AI Help in Preprocessing Images for OCR?
by Mark Clark, on February 27, 2020 10:00:00 AM PST
Deep learning is an emerging subset of artificial intelligence in which systems are developed to learn from observation and make decisions without human intervention.
Deep learning systems seek logical patterns in unstructured and uncategorized data sets and, as we saw in How Artificial Intelligence And Deep Learning Algorithms Deliver OCR Accuracy for Business, have powerful applications in optical character recognition (OCR) conversions.
*Image caption: Hey OCR, which one of these is a "3"? An OCR engine alone would fail this quiz.*
By mimicking the human brain’s recognition and recall faculties, deep learning technology enhances the accuracy of OCR conversions and helps generate insights from the extracted text.
Let’s now look at the conversion process more closely, and specifically see how deep learning helps in the preprocessing of images.
OCR vs. OCR+ML vs. IDP
OCR has inherent limitations when used to extract data from unstructured documents and images. The addition of ML as a preprocessing step helps, but is not sufficient. A better approach is Intelligent Data Processing (IDP): a template-free, AI-driven method that is well suited to extracting data from complex documents.
OCR Technology
OCR technology has long been used to extract text from scanned documents and images into an editable, machine-readable form. The output is a file you can edit in Word, not a static JPG of the page.
This text recognition delivers significant savings of time, effort, and cost over manual reading and data entry. Many OCR applications today include add-on functionality to recognize some page attributes such as simple tabular structures, fonts, and paragraph styles.
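As a minimal illustration of this basic capability, the sketch below assumes the open-source Tesseract engine and its pytesseract Python wrapper are installed; the file name is a placeholder.

```python
# Minimal OCR sketch: scanned image in, editable text out.
# Assumes Tesseract, pytesseract, and Pillow are installed;
# "scanned_page.png" is a hypothetical file name.
from PIL import Image
import pytesseract

page = Image.open("scanned_page.png")
text = pytesseract.image_to_string(page)   # plain machine-readable text

with open("scanned_page.txt", "w", encoding="utf-8") as f:
    f.write(text)

print(text[:200])  # preview the first few lines of extracted text
```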
How does OCR work?
The OCR conversion process takes the image through a short sequence of steps, ending with extraction.
The image to be extracted is presented to that final step along with a well-defined template. The template tells the OCR engine to pull the data from a particular, fixed location on the document. Through this process, OCR converts an image into text.
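To make the "well-defined template" idea concrete, here is a hedged sketch of template-driven extraction. The field names and pixel coordinates are hypothetical, and the example again assumes Pillow and pytesseract.

```python
# Template-driven extraction sketch: each field lives at a fixed,
# pre-defined location on the page. Coordinates are hypothetical.
from PIL import Image
import pytesseract

# A "template": field name -> (left, top, right, bottom) pixel box.
INVOICE_TEMPLATE = {
    "invoice_number": (900, 80, 1180, 130),
    "invoice_date":   (900, 140, 1180, 190),
    "total_amount":   (950, 1450, 1180, 1500),
}

def extract_with_template(image_path, template):
    page = Image.open(image_path)
    fields = {}
    for name, box in template.items():
        crop = page.crop(box)                        # cut out the fixed region
        fields[name] = pytesseract.image_to_string(crop).strip()
    return fields

print(extract_with_template("invoice_scan.png", INVOICE_TEMPLATE))
```

Notice that nothing in this sketch adapts to layout changes: if a vendor moves the total a few hundred pixels, someone has to edit the template by hand, which is exactly the weakness discussed later in this post.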
The Importance of Image Quality in OCR Extraction
The success and accuracy of OCR conversion systems are a function of the algorithm used, the quality of the image, and other considerations. While OCR engines are very mature and stable technology, we also need technologies that can improve the quality of images to yield better OCR results.
Images we want to process may have a range of problems such as:
- Image is blurred
- Image lacks contrast
- Scan is skewed
- Scan is warped
- Scan is fuzzy
- The original document itself could have text that is not properly aligned
- Page background may be smudged or muddled
- And more
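Some of these defects can be flagged automatically before any OCR is attempted. As one small, hedged illustration, the variance of the Laplacian is a common heuristic for spotting a blurred or fuzzy scan; the threshold below is an assumption you would tune on your own documents.

```python
# Blur check sketch using OpenCV: low Laplacian variance suggests a blurry scan.
import cv2

def is_blurry(image_path, threshold=100.0):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    variance = cv2.Laplacian(gray, cv2.CV_64F).var()   # rough edge-sharpness measure
    return variance < threshold, variance

blurry, score = is_blurry("scanned_page.png")   # hypothetical input file
print(f"blurry={blurry}, sharpness score={score:.1f}")
```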
For this reason, image preprocessing is an extremely important part of the process.
Traditionally, some preprocessing was done manually, prior to attempting the OCR step. Then rules-based software applications that had the ability to do some preprocessing became available.
Now we see deep learning applications being used to handle image preprocessing.
What is OCR Preprocessing?
Image preprocessing works to normalize the content and exclude variations that would reduce the likelihood of text recognition. The steps in preprocessing could include:
| Extraction preprocessing step | What it does |
| --- | --- |
| Cleaning up | Adjusts brightness and contrast, removes dirt and noise from the white background, deletes borders, and calibrates the detection threshold. |
| Deskewing the image | On a regular page, the characters sit in straight lines. This may not be true of a less-than-perfect scan: a book's binding can lift the page above the scanner glass, skewing or curving the image of the text. Deskewing straightens such images. |
| Sharpening and shake reduction | The input may be captured on a smartphone rather than a flat-bed scanner, where camera shake and soft focus are distinct possibilities. Sharpening and shake reduction compensate for them. |
| Normalization | Equalizes the histogram to normalize an image captured under varying lighting conditions, then smooths the image to reduce the noise introduced by the equalization process. |
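The sketch below strings a few of these steps together with OpenCV: histogram equalization for normalization, denoising and adaptive thresholding for clean-up, and a contour-based deskew. It is one reasonable sequence under the assumptions noted in the comments, not a definitive pipeline.

```python
# Preprocessing sketch with OpenCV: normalization, clean-up, and deskew.
import cv2
import numpy as np

def preprocess(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # Normalization: equalize the histogram, then smooth the noise it amplifies.
    gray = cv2.equalizeHist(gray)
    gray = cv2.medianBlur(gray, 3)

    # Clean-up: binarize with an adaptive threshold to handle uneven lighting.
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 31, 15)

    # Deskew: estimate the dominant angle of the dark (text) pixels and rotate.
    # Note: OpenCV's minAreaRect angle convention differs between versions,
    # so the correction below may need adjusting for your installation.
    coords = np.column_stack(np.where(binary == 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle

    h, w = binary.shape
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(binary, matrix, (w, h),
                          flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

cv2.imwrite("preprocessed_page.png", preprocess("scanned_page.png"))
```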
OCR vs IDP Preprocessing
Although on the surface the preprocessing steps look similar between OCR and IDP, what occurs within each step is significantly different. IDP uses an adaptive, multi-layered approach, whereas the OCR approach is static and simple.
| OCR Preprocessing | IDP Preprocessing |
| --- | --- |
| One OCR process for all documents | Multi-layered: sees the document layout and type, and adapts the process for each type |
| Classification using word density (the easiest method) | Reads the document, understands it, and makes corrections |
| Limitations on document variation | Handles unlimited document variation |
| Low accuracy | Best-in-class accuracy |
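To put the "classification using word density" idea from the table above into code, a bag-of-words classifier over the OCR output is about as simple as it gets. The sketch below uses scikit-learn; the training texts and labels are hypothetical placeholders for OCR'd pages you have already labelled.

```python
# Word-density document classification sketch with scikit-learn.
# Training texts and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "invoice number total amount due payment terms",
    "purchase order quantity unit price ship to",
    "passport surname given names date of birth nationality",
]
train_labels = ["invoice", "purchase_order", "passport"]

# TF-IDF turns each OCR'd page into a word-density vector;
# logistic regression then separates the document types.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_texts, train_labels)

new_page_text = "invoice number 1234 amount due 500"
print(classifier.predict([new_page_text])[0])   # e.g. "invoice"
```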
OCR Processing Steps
Once better image quality has been achieved, the OCR process continues with the following operations. A reminder: these steps are all based on a template processing approach that assumes a static document with no variation.
| OCR process step | What it does |
| --- | --- |
| Segmentation | The software attempts to find text-block structures, separate paragraphs, build lines, and finally recognize characters. This operation matters because character recognition works one line at a time. Some text may be rendered as stylized bitmaps, which makes detection even harder; recognizing that kind of text requires a deep learning neural network. |
| Line removal | Required when working on a filled form that may contain typed or even handwritten text. The operation detects long lines, removes them, and fills in the gaps they leave behind. This helps eliminate recognition errors caused by characters that cross those long lines. |
| Extracting the features | The OCR engine tries to detect geometric features such as curves, straight lines, holes, and loops. More modern OCR systems replace hand-crafted feature extraction with deep learning, in which neural networks learn to recognize features on their own, without human input. |
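As a concrete illustration of the line-removal step, the morphological sketch below is one common OpenCV approach, not necessarily what any particular OCR engine does internally: it finds long horizontal rules on a binarized form and erases them before recognition.

```python
# Line-removal sketch with OpenCV morphology: detect long horizontal rules
# on a binarized form image and paint them out before character recognition.
import cv2

def remove_horizontal_lines(image_path, min_line_length=40):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Invert so the ink is white on black, which the morphology below expects.
    inverted = cv2.threshold(gray, 0, 255,
                             cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

    # A wide, 1-pixel-tall kernel keeps only long horizontal strokes.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (min_line_length, 1))
    lines = cv2.morphologyEx(inverted, cv2.MORPH_OPEN, kernel, iterations=1)

    # Subtract the detected lines, then flip back to black-on-white.
    cleaned = cv2.subtract(inverted, lines)
    return cv2.bitwise_not(cleaned)

cv2.imwrite("form_without_lines.png", remove_horizontal_lines("filled_form.png"))
```

A production system would also repair the character strokes that the erased lines crossed; that step is omitted here for brevity.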
Deep Learning in OCR
Most OCR extractions use templates to guide the process. Templates define where in the scanned document each field is to be extracted. A template-based approach works well enough when there is little variation in structure among documents, but it hits a wall when change happens. And trust me, change happens more often than you want it to.
For example, any time the document structure changes more than a little, you’ll have to adjust the template. Some back offices have multiple full-time employees just managing and maintaining templates.
The IDP approach does not rely on templates, and as a result, it’s much more capable of dealing with change and getting better results. In fact, when template-free IDP models have been trained using a robust set of samples with sufficient variation in the data set, IDP can often read document types it has never seen before.
Read: Templates vs Machine Learning OCR
Deep learning in OCR is intended to help a template-based OCR system become a bit more tolerant of change. It uses ML to help fit documents into templates and then manage those templates. While this is treating a symptom instead of curing the disease (in the age of AI, why use templates at all?), deep learning can help OCR work better under some conditions.
If you do choose to use templates, deep learning can help to sort and classify documents and to match sections of a document with the correct template (a minimal classification sketch follows the table below). It can also help with the following preprocessing tasks:
| Machine learning preprocessing | What it does |
| --- | --- |
| Recognizing stamps | At times, the OCR engine needs to extract text from a stamp placed on a document such as an invoice, payment receipt, or passport. ML can help determine whether the page has a stamp and recognize the type of stamp. This remains a major challenge for OCR even with help from DL. |
| Validating signatures | Deep learning can help determine whether a page has a signature and, if required, match the signature using image recognition techniques. Remember that handwriting performance varies significantly between vendors. (Want to know the best OCR engine for handwriting? Give us a call.) |
| Detecting logos | Documents may contain brand images and logos. Deep learning can be used to detect, recognize, and match logos. Again, OCR systems, even with ML, are very poor at this step. |
| Extracting tabular data | Extracting data from a table is a big challenge for OCR; most OCR systems can handle only the simplest tables. IDP solves this problem using computer vision (a deep learning method) rather than plain OCR techniques. The result is accurate extraction of tabular data through a combination of deep learning, image recognition, and narrow use of OCR extraction. |
| Extracting graphical data | Extracting data from graphs is even harder than pulling data from a table, and OCR falls short again. IDP uses deep learning to preprocess the document and narrow the scope of the extraction, then extracts the relevant data from that selected section. |
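For a feel of how a deep learning model might sort and classify document images before they ever reach a template, the PyTorch sketch below defines a deliberately tiny convolutional classifier. The class names, input size, and architecture are all illustrative assumptions; a real system would fine-tune a pretrained vision model on labelled scans.

```python
# Tiny document-image classifier sketch in PyTorch.
# Document classes, input size, and architecture are illustrative assumptions.
import torch
import torch.nn as nn

DOC_CLASSES = ["invoice", "purchase_order", "receipt", "passport"]

class DocClassifier(nn.Module):
    def __init__(self, num_classes=len(DOC_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                       # x: (batch, 1, H, W) grayscale scans
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = DocClassifier()
dummy_scan = torch.randn(1, 1, 256, 256)        # stand-in for a preprocessed page
logits = model(dummy_scan)
print(DOC_CLASSES[logits.argmax(dim=1).item()])
```

In practice, the classifier's prediction would decide which template (or which IDP extraction model) each page is routed to.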
A note of caution: deep learning is not the best option for functions such as deskewing, perspective correction, and image clean-up. Dedicated techniques work better for these objectives.
The use of deep learning in OCR systems can help with some OCR challenges but DL cannot overcome the fundamental limitations of OCR. To break through that wall of limitations you need IDP, which is a completely different approach designed to achieve the maximum benefit from ML and DL.
Can Deep Learning and Artificial Intelligence Help in Preprocessing Images for OCR?
Yes, it can. But these technologies are much more effective when applied in an AI-native IDP framework. This post described the application of DL and ML technology to extract information from unstructured documents. We found that Intelligent Data Processing (an AI-native approach) can make full use of these technologies, whereas OCR cannot achieve similar benefits because of its older, template-based framework.
Adding AI to a mature OCR technology can only take it so far before it hits the wall of performance limitations as documents get too complex for it to handle.