How AI Is Disruptive Innovation For OCR
by Mark Clark, on March 12, 2020 11:00:00 AM PDT
How AI can Solve your OCR Problem
The success of your Intelligent Automation strategy depends on your ability to extract necessary data from paper-based documents such as contracts, manuscripts, books, invoices, receipts, etc., and convert it into machine-readable text. Unless you want a manual data entry bottleneck in front of your whole automation process, step one of your automation strategy should include the automation of that data extraction process.
For years this has been done by a technology called Optical Character Recognition (OCR). OCR has been used to help automate business processes such as Procure-to-Pay in enterprises throughout the world. While OCR has improved over the years, it’s still not perfect. And guess what? That means even with OCR you probably still have that manual data entry bottleneck in front of your automation process as humans make manual corrections where OCR fails.
OCR has some inherent weaknesses, and one of the most significant is its dependence on templates. Fortunately, Artificial Intelligence (AI) technology can solve this problem by using a completely different approach that’s a powerful alternative to conventional template-based OCR conversion.
Let’s take a look at why template-based conversion creates some operational challenges and how a native AI-based technology can solve them.
What are Templates in OCR Conversion?
Have you ever played those hidden picture games where you have to find an object hidden somewhere in a giant image filled with tons of random things? It’s easier when you get a hint, like “The hidden turtle is somewhere in the lower-left quadrant just below the bookshelf near the chair.”
OCR needs hints like those to do its job better. If you tell it where to look, OCR can more accurately extract the data you want, and that’s what templates do for OCR.
Templates specify the location of information that needs to be extracted from the document that is being converted. The user marks coordinates and the software tool extracts text located at that specific location in the scanned image. This method of using coordinates is called zone OCR, and this helps OCR extract the right data so it can be stored in a structured database.
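To make the idea concrete, here is a minimal sketch of zone OCR in Python. It assumes an OCR engine has already returned each word with a pixel bounding box; the `Word` and `Zone` classes, the `extract_fields` function, and all coordinates are hypothetical names and values chosen for illustration, not part of any particular OCR product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and its bounding box (hypothetical OCR output)."""
    text: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width
    h: float  # height


@dataclass
class Zone:
    """One template entry: a named rectangle where a value is expected."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if the center of its box lies inside it.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def extract_fields(words, template):
    """Return {zone name: text} by joining the words found inside each zone."""
    result = {}
    for zone in template:
        hits = [w for w in words if zone.contains(w)]
        hits.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
        result[zone.name] = " ".join(w.text for w in hits)
    return result


# Illustrative data: four words from a scanned invoice and a two-zone template.
words = [
    Word("Invoice", 50, 40, 80, 20),
    Word("INV-1042", 400, 40, 90, 20),
    Word("Total", 300, 500, 50, 20),
    Word("$1,250.00", 400, 500, 100, 20),
]
template = [
    Zone("invoice_number", 380, 30, 520, 70),
    Zone("total", 380, 490, 520, 530),
]

fields = extract_fields(words, template)
print(fields)
```

The extraction logic never looks at what the words say. It only checks where they are, which is why the approach works well on fixed layouts and poorly on anything else.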
Zone OCR is effective when documents *reliably* have a structure similar to the template. As long as there is little or no variation among documents, you’re good to go.
But if something changes or there’s too much variation, watch out.
For example, even if we set up a template for a standard invoice format, the total amount payable may appear in different positions because the number of items in the invoice varies and the total at the bottom gets pushed up or down. This variation causes all kinds of problems for your template. It’s as though that hidden turtle is crawling around and its location no longer matches the hint.
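The drifting total can be illustrated with a few lines of Python. This is a toy model under stated assumptions: the row height, the starting position, and the zone boundaries are made-up values, and the total is assumed to print directly below the last line item.

```python
ROW_HEIGHT = 30            # assumed pixel height of one line-item row
FIRST_ROW_Y = 200          # assumed y position of the first line item
TOTAL_ZONE_Y = (280, 310)  # template zone, calibrated on a 3-item invoice


def total_row_y(num_items: int) -> int:
    """y position of the total line, printed directly below the last item."""
    return FIRST_ROW_Y + num_items * ROW_HEIGHT


def zone_finds_total(num_items: int) -> bool:
    """True if the total line still falls inside the template's zone."""
    low, high = TOTAL_ZONE_Y
    return low <= total_row_y(num_items) <= high


for n in (3, 4, 6):
    print(f"{n} items: total at y={total_row_y(n)}, found={zone_finds_total(n)}")
```

In this model the template only finds the total on invoices with exactly three line items. One extra item is enough to push the value out of the zone.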
The Problem with OCR Templates
In fact, templates have a lot of limitations that impact accuracy and eat away at both your ROI and your ability to increase process automation. Just for fun, we made a list. See how many of these OCR template issues wreak havoc on your business processes:
• Template-based OCR offers no flexibility to accommodate variations in document formats. Each time the format changes a new template has to be configured. Do you control the format of all your documents? I bet you don’t.
• The template’s inability to handle a wide range of document formats reduces accuracy. And that impacts your manual data entry bottleneck.
• The time and effort invested in template configuration take away from the advantage of conversion and increase the cost of the project. Let’s say you have 200 different document types and variations and it takes an average of 3 hours each for template setup. That’s 600 hours, or 15 weeks of full-time worker time - PLUS you get to maintain hundreds of templates every time something changes. We worked with an organization that spent $2 million on a team that did nothing but create and maintain templates!
• When documents don’t follow a standard format, templates are next to useless. Think about documents like annual reports or contracts with tables and images.
• Templates cannot handle complex elements, such as tables, stamps, images, or diagrams.
• With templates, text is captured purely by location. The extraction takes no account of context, meaning, language, or any other factor needed to pull out meaningful information.
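The setup estimate in the list above (200 document types at 3 hours each) works out as follows. The 40-hour work week is an assumption added here to convert hours into weeks, and the variable names are illustrative.

```python
document_types = 200    # from the example in the list above
hours_per_template = 3  # from the example in the list above
hours_per_week = 40     # assumption: one full-time worker

setup_hours = document_types * hours_per_template
setup_weeks = setup_hours / hours_per_week

print(f"{setup_hours} hours = {setup_weeks:g} weeks of setup")
```

This covers initial setup only. Ongoing maintenance whenever a format changes comes on top of it.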
Artificial Intelligence (AI) Finds Meaning in Text and Extracts it
As AI capabilities evolved and developed, a new approach to OCR emerged. Using AI takes an extraction solution well beyond what OCR is capable of.
Machine learning has enabled us to create algorithms, trained on large volumes of data, that can extract data more efficiently. Now you can handle document variations with ease.
Natural language processing methods can be applied to understand the text and its context. Text analytics allows you to turn raw data into information.
Computer Vision (Deep Learning) systems are able to extract data from non-textual information such as stamps, diagrams, images, or tables.
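The difference between locating a value by position and locating it by its surrounding text can be sketched in Python. To be clear, this is a hand-written regular expression, not a trained model, and it is far simpler than the machine learning and NLP methods described above. It only illustrates the principle of finding the total by the label next to it, wherever it lands on the page. The pattern, the `find_total` function, and the sample invoices are all hypothetical.

```python
import re

# Matches a label such as "Total", "Total due" or "Amount payable",
# then captures the money-like number that follows it.
# The leading \b keeps "Subtotal" from being mistaken for "Total".
TOTAL_PATTERN = re.compile(
    r"\b(?:total(?:\s+amount)?(?:\s+(?:due|payable))?|amount\s+(?:due|payable))"
    r"\s*:?\s*\$?\s*([\d,]+\.\d{2})",
    re.IGNORECASE,
)


def find_total(text: str):
    """Return the first labeled total in the text as a float, or None."""
    match = TOTAL_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Two invoices with different lengths and different labels for the total.
short_invoice = "Widget 2 x 50.00\nTotal: $100.00\n"
long_invoice = (
    "Widget 2 x 50.00\n"
    "Gadget 1 x 25.00\n"
    "Cable 4 x 5.00\n"
    "Subtotal: $145.00\n"
    "Amount payable $145.00\n"
)

print(find_total(short_invoice))
print(find_total(long_invoice))
```

Because the lookup keys on the label instead of on coordinates, the number of line items no longer matters. A rule like this still breaks on wording it has not seen, which is the gap that models trained on large volumes of documents are meant to close.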
A Bank Moves Beyond OCR
A large global bank offers a business loan service to companies in 15 countries. As part of the debt servicing process, the bank analyzes financial data and statements provided by the borrower. The analysis ensures that the borrower can repay the loan and is not at risk of default.
The bank relied on skilled staff to process the documents manually because the documents:
• Varied from borrower to borrower.
• Varied over time. One year’s annual report may be laid out differently than the next year’s report.
• Had important context within the documents that needed to be maintained. Footnotes, for example, make materially important changes to the figures in a table.
• Had data that was presented in tables and, sometimes, in nested tables.
This manual process worked, but it was slow, expensive, and often had accuracy problems.
OCR, no matter how smart, could not handle the extraction job. The bank needed to move beyond OCR.
We worked with the bank to develop an AI-based solution that could extract all the data and context, and account for variations and changes.
(We also built an intelligent application for this bank that used natural language generation to generate a summary report from the extracted data.)
The outcome reduced operating expenses, decreased processing time, and improved accuracy. And the staff used for document extraction could be assigned to higher-value tasks such as analysis of the data.
Next-Gen OCR
Data extraction solutions now exist that are beyond OCR and beyond OCR templates.
These Intelligent Data Processing solutions now process what only humans could process a short time ago. And these solutions can do what OCR could never do: process unstructured, complex documents.