How AI & Deep Learning Algorithms Deliver OCR Accuracy
by Mark Clark, on March 5, 2020 11:30:00 AM PST
The technology that enables us to extract information from scanned images of printed or handwritten documents is called Optical Character Recognition or OCR.
The technology has applications for the processing of business contracts, receipts, utility bills, passports, product sheets, handwritten forms, bank statements, and a variety of PDF documents generated by businesses.
Accurate and reliable data extraction plays an important role in finance, accounting, electronic health record updates, insurance claims processing, personal and national security, logistics documentation, and legal documentation.
The extraction accuracy is extremely important because OCR is often used to convert documents such as passports, invoices, or receipts, where every bit of information is vital and errors can prove costly.
But OCR systems can perform poorly (or not at all) with unstructured documents.
An entirely different approach from OCR, called Intelligent Data Processing (IDP), can be used to provide higher accuracy across a much broader set of document types.
To remedy these OCR shortcomings, IDP uses multiple AI technologies to preprocess, extract and post-process the data.
Limitations with Conventional OCR Processes
Traditional OCR engines have a number of limitations and are unable to extract all the relevant information from documents accurately.
One major limitation is the use of templates. OCR engines use templates as the processing framework to identify entities on the document.
A template framework says “Hey, documents are fixed. There will not be variations in the documents we need to process.”
But there are variations in your documents, and their structure can change over time. Templates are not able to handle possible complexities such as a combination of printed and handwritten text, tabular data, and variations in formats. With templates, there is, by design, almost no flexibility to accommodate variations in documents.
IDP, on the other hand, assumes documents are not fixed. And this one assumption changes everything about how extraction is performed. The extraction process is template-free and AI-driven.
The weaknesses of conventional OCR conversion can become a burden on your business process. Errors may need to be corrected manually, increasing the time, effort, and cost of the conversion. You will also need to measure and monitor extraction results manually.
Variation among documents is a challenge for OCR, but images can cause OCR performance to crash into a brick wall.
For example, a financial services company that we worked with was achieving only 60% accuracy when converting long payment receipts using OCR. The receipts had numerous handwritten details as well as stamps (images) on the document. The firm’s OCR process simply could not accurately and clearly extract the needed information because OCR isn’t designed for that challenge. So the firm’s receipt process struggled.
Artificial Intelligence (AI) for Extraction Delivers Breakthrough Results
AI is an extremely powerful way to overcome the challenges associated with traditional OCR methods and achieve substantially more accurate results.
One approach to overcoming OCR challenges is to use machine learning (ML) to preprocess documents before they are passed to a template. ML can improve OCR performance in these scenarios. However, OCR is still limited by its template approach and its lack of robust image handling.
In contrast, IDP integrates AI capabilities that handle variations and “understand” what it is reading from both documents and images.
For challenges like data extraction from unstructured documents, we recommend that you take a big problem and break it into smaller, more solvable parts. Then apply AI to solve each specific problem.
Infrrd has developed deep learning algorithms that dramatically improve the accuracy of extraction by learning from existing business records. Data is extracted in a categorical context, which allows the algorithms to fill in gaps (e.g., ML-based data cleaning). Additionally, variations in document formats do not hinder the extraction of information.
For example, AI can be used in preprocessing to identify relevant sections for extraction and to classify documents before extraction. This helps to know what to expect from extraction. Over time, AI models can also be trained to analyze historical data and flag possible errors, exceptions, and fraudulent activity.
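As a minimal sketch of classifying a document before extraction, the snippet below picks a document type and then looks up which fields to expect. The keyword scoring is only a stand-in for a trained classifier, and the names `KEYWORDS`, `EXPECTED_FIELDS`, `classify`, and `fields_to_extract` are hypothetical, not part of any Infrrd API.

```python
from collections import Counter

# Hypothetical keyword lists; a real system would use a trained classifier.
KEYWORDS = {
    "invoice": {"invoice", "due", "bill", "terms"},
    "receipt": {"receipt", "cash", "change", "thank"},
}

# Fields the downstream extractor should expect for each document type.
EXPECTED_FIELDS = {
    "invoice": ["invoice_number", "due_date", "total"],
    "receipt": ["merchant", "date", "total"],
}


def classify(text: str) -> str:
    """Return the document type whose keywords appear most often in the text."""
    words = Counter(text.lower().split())
    scores = {
        doc_type: sum(words[k] for k in keys)
        for doc_type, keys in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to "unknown" when no keyword matched at all.
    return best if scores[best] > 0 else "unknown"


def fields_to_extract(text: str) -> list[str]:
    """Classify first, then look up which fields to extract."""
    return EXPECTED_FIELDS.get(classify(text), [])
```

Knowing the document type up front is what tells the extraction step which fields to look for, and an "unknown" result can be routed to manual review instead of being forced through the wrong extractor.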
Receipt Data Extraction
Back to the financial services company with 60% accuracy for receipt extraction.
So how did we handle the conversion of these receipts?
The receipt extraction process improved from 60% to 95% accuracy. That massive improvement in performance reduced extraction errors by 88% and, as you might expect, resulted in dramatic improvements to their business process.
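The 88% figure follows from comparing error rates rather than accuracy rates. The short calculation below is a sketch of that arithmetic; the function name `error_reduction` is ours, introduced only for illustration.

```python
def error_reduction(accuracy_before_pct: int, accuracy_after_pct: int) -> float:
    """Relative reduction in error rate, expressed as a percentage."""
    error_before = 100 - accuracy_before_pct  # errors before the change
    error_after = 100 - accuracy_after_pct    # errors after the change
    return 100 * (error_before - error_after) / error_before


# 60% accuracy means 40% errors; 95% accuracy means 5% errors.
print(error_reduction(60, 95))  # 87.5, which rounds to 88%
```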
The application we developed uses a multi-step process (i.e., smaller problems) in which the right ML model is used at each step. For this client, the extraction application was designed to classify documents before extraction, automatically identifying whether they are invoices or receipts.
The processing steps were:
- Send multiple images of receipts (as URLs) to the application API.
- Stitch images together using an AI algorithm.
- Extract key fields and line items from the stitched images using an extraction module.
- Extract field coordinates from handwritten text and images using an ML module.
- Combine the extracted OCR text, handwritten text, and stamp details into one JSON response for the client’s use. (OCR is used here for a specific purpose and is not constrained by templates.)
This process resulted in a very high level of accuracy delivered within a very short time.
We’ve found that using ML in this way can deliver accuracy between 90% and 98% for a variety of use cases across verticals.
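The processing steps listed above can be sketched as a small pipeline. This is an illustration of the flow only, not the actual application: every function name, the `ReceiptResult` fields, and the example URLs are hypothetical, and each stage is a placeholder where a real ML model or OCR call would go.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReceiptResult:
    page_count: int = 0
    key_fields: dict = field(default_factory=dict)
    line_items: list = field(default_factory=list)
    handwritten: list = field(default_factory=list)
    stamps: list = field(default_factory=list)


def stitch_images(image_urls):
    # Placeholder: a real step would download the images and merge them.
    return {"pages": list(image_urls)}


def extract_fields(stitched):
    # Placeholder for the extraction module (key fields and line items).
    return {}, []


def extract_handwriting_and_stamps(stitched):
    # Placeholder for the ML module that returns text with coordinates.
    return [], []


def process_receipt(image_urls):
    """Run each step in order and combine everything into one JSON string."""
    stitched = stitch_images(image_urls)
    key_fields, line_items = extract_fields(stitched)
    handwritten, stamps = extract_handwriting_and_stamps(stitched)
    result = ReceiptResult(
        page_count=len(stitched["pages"]),
        key_fields=key_fields,
        line_items=line_items,
        handwritten=handwritten,
        stamps=stamps,
    )
    return json.dumps(asdict(result))


# Hypothetical input: two photos of one long receipt.
response = process_receipt(
    ["https://example.com/receipt-top.png", "https://example.com/receipt-bottom.png"]
)
```

Keeping each step behind its own function is what lets a different model be swapped in for a single step without touching the rest of the flow.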
Freedom from Templates
An AI-based OCR solution does not depend on templates and is able to identify fields such as names, bank details, amounts, and dates, irrespective of where they appear in the document.
The IDP approach defines the extracted fields needed and trains an ML module to find these fields. This eliminates the need to manually enter data in fields that the application fails to identify - a common problem with conventional OCR conversion.
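To make the contrast concrete, here is a small sketch that compares a position-based template with a content-based search on two layouts of the same made-up receipt. Regular expressions stand in for the trained ML module, which is a deliberate simplification; the sample lines, `FIELD_PATTERNS`, and both function names are hypothetical.

```python
import re

# Two layouts of the same hypothetical receipt; the date moves between lines.
LAYOUT_A = ["ACME STORE", "Date: 2020-03-05", "Total: 41.50"]
LAYOUT_B = ["ACME STORE", "Total: 41.50", "Thank you!", "Date: 2020-03-05"]

# Regexes stand in for a trained field-detection model.
FIELD_PATTERNS = {
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "total": re.compile(r"Total:\s*(\d+\.\d{2})"),
}


def template_extract(lines):
    """Template style: assume the date is on line 2 and the total on line 3."""
    return {
        "date": lines[1].split(": ")[-1],
        "total": lines[2].split(": ")[-1],
    }


def content_extract(lines):
    """Template-free style: search every line for each required field."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        for line in lines:
            match = pattern.search(line)
            if match:
                found[name] = match.group(1)
                break
    return found
```

On `LAYOUT_A` both functions agree. On `LAYOUT_B` the template version reads the total where it expects the date, while the content-based version still returns the right values because it does not care where a field appears.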
Multiple OCR Engines for Better Results
We have also successfully used OCR for data extraction from complex documents such as engineering diagrams, electrical layout charts, and tables.
But we use OCR in a very different way.
OCR converts an image of text to readable text. That is its job. That is all we use it for. AI does the rest.
In addition to using OCR only for converting images to text, we have found that no single OCR engine is good for all jobs. Each OCR engine has its own strengths: one may work particularly well with scanned documents while another handles images from mobile devices well. So vendor A’s OCR engine may perform better than vendor B’s OCR engine for certain jobs. This means applying a single OCR engine across multiple use cases almost certainly reduces your overall accuracy and drives up manual labor costs.
Experience has taught us that combining machine learning technologies with multiple OCR engines delivers the best results. Choose the best tool for each job. And you can combine OCR engines to give better performance.
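One simple way to combine engines is to run several and keep the most confident result. The sketch below assumes each engine can be wrapped as a function that returns text plus a confidence score between 0 and 1. The `OcrResult` type, `best_of`, and the two stub engines are hypothetical and do not correspond to any vendor's API.

```python
from typing import Callable, NamedTuple


class OcrResult(NamedTuple):
    text: str
    confidence: float  # assumed to be in [0.0, 1.0]


OcrEngine = Callable[[bytes], OcrResult]


def best_of(engines: dict[str, OcrEngine], image: bytes) -> tuple[str, OcrResult]:
    """Run every engine and keep the result with the highest confidence."""
    results = {name: engine(image) for name, engine in engines.items()}
    winner = max(results, key=lambda name: results[name].confidence)
    return winner, results[winner]


# Stub engines standing in for two vendors' OCR products.
def engine_a(image: bytes) -> OcrResult:
    return OcrResult("T0tal 41.50", 0.71)


def engine_b(image: bytes) -> OcrResult:
    return OcrResult("Total 41.50", 0.93)


name, result = best_of({"vendor_a": engine_a, "vendor_b": engine_b}, b"")
```

Confidence scores from different vendors are not necessarily on the same scale, so in practice they would likely need calibration, or the choice of engine could be made up front based on the document source (scanner versus mobile camera).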
Deep Learning to Garner Insights
Let’s talk for a minute about post-processing the extracted data.
Document conversion is of limited value if it yields just electronic versions of documents.
For example, consider an insurance company that converts contracts. Simply having the contracts in electronic format has limited value, but if the firm can analyze those contracts and gain insights about its risk exposure, then there is a clear business benefit.
This is where deep learning has a role to play.
Deep learning is the term used for multi-layered neural networks that are loosely modeled on the functioning of the human brain. These algorithms don’t rely on hand-coded patterns to produce accurate results; they learn the relevant patterns from the data themselves. The application can now go beyond recognizing text to actually deriving meaning from it.
An insurance company can utilize deep learning to gain insights from large numbers of contracts, a daunting task on its own, and derive business benefits from them.
Deep learning algorithms learn from continuous feedback, in the form of corrections made to the extracted data, and deliver better and better results over time.
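A feedback loop of this kind can be sketched as a queue of reviewer corrections that triggers retraining once enough have accumulated. This is an assumption about how such a loop might be wired, not a description of Infrrd's system; `FeedbackLoop`, its fields, and the threshold of 3 are hypothetical, and the retraining step is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    retrain_threshold: int = 3  # hypothetical batch size
    pending: list = field(default_factory=list)
    retrain_count: int = 0

    def record(self, field_name: str, extracted: str, corrected: str) -> None:
        """Store a reviewer's correction; ignore values that were already right."""
        if extracted == corrected:
            return
        self.pending.append(
            {"field": field_name, "extracted": extracted, "label": corrected}
        )
        if len(self.pending) >= self.retrain_threshold:
            self.retrain()

    def retrain(self) -> None:
        # Placeholder: a real system would fine-tune the model on self.pending.
        self.retrain_count += 1
        self.pending.clear()
```

Each correction becomes a labeled training example, which is how manual fixes made today can reduce the number of fixes needed later.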
So if you want to add more value to your business, do this:
• Reframe the organizing principle that drives your solution
• Move away from solving for structured documents that fit into a template
• Focus on how AI can extract data and insights from unstructured documents
Assuming a document has variance leads you to a multi-step methodology where ML runs the process, not OCR. And the reward is a solution that can handle much more complex documents, is highly tolerant of variation and changes in document structure, and is capable of increasing the level of automation in your business processes and the level of value you can create as a result of the data and insights delivered.
This is the difference between viewing something as a classic OCR problem and viewing it as an AI-driven IDP problem that can deliver considerably more value to your business.