Templates Vs Machine Learning OCR

By
Amit Jnagal
Founder & CEO

Over the past 15 years, I have had the chance to work with many OCR tools and one thing I can say with certainty is that the text extraction quality of these tools has steadily improved with ongoing improvements in artificial intelligence and machine learning OCR techniques.

More than ever businesses are trying to derive useful insights and meaning from scanned images and documents. For example, banks are wanting to extract intelligence such as parties involved and contract expiry dates from scanned contracts, insurance companies are wanting to detect fraudulent receipts submitted during the claims process and many more. Use cases like these require unstructured text be converted into structured meaningful data during OCR or post OCR.

OCR tools inherently lack the intelligence to parse or understand extracted text beyond just extracting it. To assign meaning and structure to the content, another system needs to process the extracted text and extract entities and entity types from it.

In this example, the OCR system does an accurate extraction of text however it does not have the intelligence to identify the specifics of the merchant name, merchant address or other important details such as tax, total and individual line items.

In this article, I want to compare two post-OCR text enrichment techniques. One is the conventional technique of using templates while other is the modern approach of applying machine learning.

Let’s dive into templates first. Templates are what the name says, templates. In this method, the user manually marks co-ordinates for the text of interest on the image and subsequently uses the output of OCR engine to locate and extract text. This approach works well and is highly accurate if the text layout within the scanned image matches the layout coded in the template.

However, this approach starts to fail for systems which deal with a large number of document layouts and for systems which frequently encounter new types of documents. An invoice processing system which receives new types of invoices from different suppliers is a good example. For an invoice processing system, a template approach may work fine initially but will soon become unmanageable as the number of suppliers grows and change.

Now let’s consider the alternative machine learning approach. A  machine learning OCR uses a trained model which encodes thousands of rules for determining the meaning of the content. This model is generally trained using a combination of supervised and unsupervised learning methods. For example, one approach for training could be to use feature data set as follows to predict if a line in the text contains a merchant name.

Line Font Size Website Proximity Language Key Words Entity Match Merchant Name
1 Larger 1 Address English Restaurant 1 1
1 Normal 0 Logo English Cafe 0 1
1 Small 0 Logo English 97229 0 0
1 Larger 0 Logo English Ristorante 1 ?

A trained model can fine tune itself as more training data is collected and ingested into the training process. Machine learning approach is much more scalable across languages and across different types of documents even if they are not processed by the system. Although this approach requires that initial effort to build high-quality training models and entity recognition models, but once built, this approach scales faster and better than the templates approach.

At Infrrd we are researching and experimenting with various techniques involving machine learning to improve content enrichment post-OCR text extraction from different document types such as receipts, invoices, contracts and shipping labels.

Frequently asked questions

What does your pricing model look like?

We price based on the annual volume of pages and complexity of document type.  We can get you preliminary pricing once we outlined a solution.  Let's do this.

To know more, book a 15-min session with an IDP expert

How can I try Infrrd before I commit to a full deployment?

Sure.  The first step is to schedule a guided demo where you get to jump into the thick of it.  After you explore our solution you can try a proof of concept. When you're ready, you can deploy the system to one use case.  Then more use cases.  Then across your enterprise.

To know more, book a 15-min session with an IDP expert

How does your system integrate with others in my enterprise?

We play nice.  Our solutions are API-based.  Your documents are feed into the solution using APIs. And extracted data is sent out through APIs.  We use REST APIs.

To know more, book a 15-min session with an IDP expert

Does your solution run in the cloud or on premise?

Our solution is cloud-native but is also design for premise deployments.  Your choice on how you want to deploy it.

To know more, book a 15-min session with an IDP expert

Does Infrrd run on mobile or desktop device?

Glad you asked.  Our data extraction process runs on servers.  We have found performance and accuracy decline when running on a desktop or mobile device. (Remember Infrrd is running a powerful AI stack).

To know more, book a 15-min session with an IDP expert

Does your system work out of the box or does it require training?

Common documents and use cases work out of the box.  The cool thing is your solution will improve as the system learns from your documents upfront and over time.

To know more, book a 15-min session with an IDP expert

How does your solution handle corrections?

Did you know no system is 100% accurate all the time?  When extraction errors occur you want to correct them.  We provide a simple UI that your business analyst will use to make corrections.

To know more, book a 15-min session with an IDP expert

Does your solution work with handwriting?

Our solution excels at data extraction from handwriting.  We've got proprietary methods and techniques that do the trick.  It's pretty cool.  See for yourself.

To know more, book a 15-min session with an IDP expert