How to Improve OCR Accuracy

Amit Jnagal
Founder & CEO

OCR technology has become widely popular. Existing workflows and business processes have improved considerably since companies started adopting it, and some companies have even built their own versions of it to get better productivity. Improving OCR accuracy can't be done overnight, but it can be done steadily over time.

So how can someone gradually fine-tune an Optical Character Recognition (OCR) engine? There are different ways to reach this goal. We at Infrrd start from the fact that accuracy can be measured at two levels:

  • Accuracy at the character level.
  • Accuracy at the word level.

Character-level accuracy measures how often an OCR engine recognizes the right character, rather than how often it picks a wrong one. Similarly, word-level accuracy measures how often the engine identifies the right word. Infrrd OCR reaches different accuracy levels for different kinds of scanned documents, but we make it a point to achieve at least 70% accuracy.
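As a rough illustration of the two levels, here is a minimal Python sketch. It assumes accuracy is defined as 1 minus the edit distance divided by the length of the reference text, which is one common convention and not necessarily the formula Infrrd uses. The function names are made up for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop an item from the reference
                            curr[j - 1] + 1,      # add an extra item from the OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / len(reference), floored at 0."""
    if len(ref) == 0:
        return 1.0 if len(hyp) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(ref_text, ocr_text):
    """Compare the texts character by character."""
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    """Compare the texts word by word (split on whitespace)."""
    return accuracy(ref_text.split(), ocr_text.split())


reference = "Invoice total is 1500 dollars"
ocr_output = "Invoice tota1 is 1500 dollars"  # "l" misread as "1"
print(round(char_accuracy(reference, ocr_output), 3))  # 0.966
print(round(word_accuracy(reference, ocr_output), 3))  # 0.8
```

A single misread character barely moves the character-level score but costs a whole word at the word level, which is why the two numbers are usually reported separately.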

To increase the accuracy of our OCR engine, we follow the steps below:

1. Checking the Source Image Quality:

Our experts make sure that the original source image is clear enough to give good OCR results. There's no point in scanning a hazy image in the first place. The image should give the OCR engine high contrast, clear character borders, low pixel noise, and well-aligned characters.

2. Choosing the Best OCR Engine:

Since an OCR engine's main job is to understand the text in a given image, it's necessary to choose one that preprocesses images well. Our software does a good job at that. Still, we keep updating it regularly to make the results more accurate.

3. Scaling Image to the Right Size:

We try to scale images to a standard resolution of around 300 dpi. Images below this resolution tend to give unclear results, while images above 600 dpi make the output file bigger without much gain in quality.
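The arithmetic behind this step can be sketched as follows. The 300 and 600 dpi figures come from the text; the function name and the rule of leaving images between those two values untouched are assumptions made for illustration.

```python
TARGET_DPI = 300       # standard resolution named in the text
MAX_USEFUL_DPI = 600   # above this, files grow without much quality gain


def rescale_plan(width_px, height_px, source_dpi, target_dpi=TARGET_DPI):
    """Return (new_width, new_height, factor) for bringing an image to target_dpi.

    Assumption: images already between target_dpi and MAX_USEFUL_DPI are left alone.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    if target_dpi <= source_dpi <= MAX_USEFUL_DPI:
        return width_px, height_px, 1.0
    factor = target_dpi / source_dpi
    return round(width_px * factor), round(height_px * factor), factor


# A letter-size page scanned at 150 dpi needs to be doubled in each dimension.
print(rescale_plan(1275, 1650, 150))     # (2550, 3300, 2.0)
# The same page scanned at 1200 dpi can be shrunk to a quarter.
print(rescale_plan(10200, 13200, 1200))  # (2550, 3300, 0.25)
```

Keep in mind that upscaling a low-resolution scan only interpolates pixels and cannot restore detail that was never captured, so rescanning at a higher resolution is preferable when the original document is available.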

4. Enhancing the Contrast of Images:

Contrast and density are vital factors to consider before running an image through OCR. We process the image to enhance these factors and get clearer outputs.
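One simple way to enhance contrast is linear contrast stretching, sketched below on a grayscale image stored as a list of rows of 0-255 values. This is only an illustrative technique chosen for the example; the text does not say which enhancement method Infrrd applies.

```python
def stretch_contrast(image, low=0, high=255):
    """Linearly map the darkest pixel to `low` and the brightest to `high`."""
    flat = [p for row in image for p in row]
    if not flat:
        return []
    darkest, brightest = min(flat), max(flat)
    if darkest == brightest:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    scale = (high - low) / (brightest - darkest)
    return [[round(low + (p - darkest) * scale) for p in row] for row in image]


# A washed-out scan whose values only span 100..160.
faded = [[100, 120],
         [140, 160]]
print(stretch_contrast(faded))  # [[0, 85], [170, 255]]
```

After stretching, text and background sit further apart in brightness, which makes a later thresholding step more reliable.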

5. Removing Noise from the Images:

If an image has background or foreground noise, we make it a point to remove it so that we get high-quality data extraction.
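A common way to remove isolated specks is a 3x3 median filter. The sketch below works on a grayscale image stored as a list of rows and is meant as an example of the general idea, not as the filter Infrrd uses.

```python
from statistics import median_low


def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Windows are clipped at the image borders, so edge pixels use fewer neighbours.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [image[ny][nx]
                      for ny in range(max(0, y - 1), min(height, y + 2))
                      for nx in range(max(0, x - 1), min(width, x + 2))]
            result[y][x] = median_low(window)
    return result


# A light background (200) with a single dark speck (0) in the middle.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = median_filter_3x3(page)
print(cleaned[2][2])  # 200
```

A median filter can also erode very thin strokes, so the window size should stay small relative to the character thickness.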

6. Preparing and Handling the Document Properly:

We make sure that documents of any size can be loaded into the scanners. Also, our capture software reduces the document preparation time once the documents have been fed into these scanners.

7. Deskewing and Analyzing Page Layout:

In the preprocessing stage, it's important to deskew the pages so that the lines of text are horizontal. We try to reduce the complexity of the page layout to help the OCR engine identify text boundaries more accurately.
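To make the deskewing idea concrete, here is a small sketch of a projection-profile skew estimate. It takes the coordinates of dark (text) pixels, tries a range of candidate angles, and keeps the angle at which the pixels pile up into the fewest, densest rows. The function names, the angle range, and the synthetic test data are all assumptions for illustration; production systems typically use more refined methods.

```python
import math


def synthetic_lines(angle_deg, baselines=(20, 60, 100), width=400):
    """Hypothetical test data: pixel coordinates of text baselines skewed by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(x * slope + y0)) for y0 in baselines for x in range(width)]


def estimate_skew_degrees(points, max_angle=5.0, step=0.5):
    """Estimate the skew angle from (x, y) coordinates of dark pixels.

    For each candidate angle, every point is projected onto the axis
    perpendicular to that angle. When the candidate matches the real skew,
    each text line collapses into very few rows, which maximises the sum of
    squared row counts.
    """
    points = list(points)
    if not points:
        return 0.0
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        rows = {}
        for x, y in points:
            row = round(y * cos_t - x * sin_t)
            rows[row] = rows.get(row, 0) + 1
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


skewed = synthetic_lines(2.0)
print(estimate_skew_degrees(skewed))  # 2.0
```

The angle is expressed in the image's own coordinate system (y grows downward here), and rotating the page by the negative of the estimate brings the lines back to horizontal.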

8. Analyzing the Character Edge:

The capture tool and the OCR software must be able to optimize the character edges so that minimal manual labor is required while extracting results.

9. Using Filters, Databases, and a Thesaurus:

Extra care should be taken to reduce errors. That's why we use language filters, databases, and a thesaurus so that the extracted results make sense and don't need further inspection.
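A very small version of this kind of post-processing can be sketched with Python's standard `difflib` module: each OCR token that is not in a known vocabulary is replaced by its closest vocabulary entry, if one is similar enough. The vocabulary, the cutoff value, and the function name are assumptions for the example and do not describe Infrrd's actual filters.

```python
import difflib


def correct_tokens(tokens, vocabulary, cutoff=0.75):
    """Replace tokens that are close to a vocabulary word; leave the rest untouched."""
    lookup = {word.lower(): word for word in vocabulary}
    candidates = list(lookup)
    corrected = []
    for token in tokens:
        key = token.lower()
        if key in lookup:
            corrected.append(token)  # already a known word
            continue
        matches = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
        corrected.append(lookup[matches[0]] if matches else token)
    return corrected


vocabulary = ["invoice", "total", "amount", "date"]  # illustrative word list
ocr_tokens = ["lnvoice", "tota1", "amount", "2024"]
print(correct_tokens(ocr_tokens, vocabulary))
# ['invoice', 'total', 'amount', '2024']
```

Tokens with no close match, such as numbers, pass through unchanged, so a filter like this should be paired with separate checks for fields like amounts and dates.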

We keep trying and testing new ways to achieve more accurate results after extraction. However, it's not an overnight process; it takes a thorough understanding of the preprocessing steps to gain momentum. First, it's very important to know the defects of the document that has to be scanned. Only then can one take the necessary actions to improve OCR accuracy.


Frequently asked questions

What does your pricing model look like?

We price based on the annual volume of pages and the complexity of the document type. We can get you preliminary pricing once we've outlined a solution. Let's do this.

To know more, book a 15-min session with an IDP expert

How can I try Infrrd before I commit to a full deployment?

Sure.  The first step is to schedule a guided demo where you get to jump into the thick of it.  After you explore our solution you can try a proof of concept. When you're ready, you can deploy the system to one use case.  Then more use cases.  Then across your enterprise.


How does your system integrate with others in my enterprise?

We play nice. Our solutions are API-based. Your documents are fed into the solution using APIs, and extracted data is sent out through APIs. We use REST APIs.


Does your solution run in the cloud or on-premise?

Our solution is cloud-native but is also designed for on-premise deployments. It's your choice how you want to deploy it.


Does Infrrd run on mobile or desktop devices?

Glad you asked. Our data extraction process runs on servers. We have found that performance and accuracy decline when running on a desktop or mobile device. (Remember, Infrrd is running a powerful AI stack.)


Does your system work out of the box or does it require training?

Common documents and use cases work out of the box.  The cool thing is your solution will improve as the system learns from your documents upfront and over time.


How does your solution handle corrections?

Did you know no system is 100% accurate all the time? When extraction errors occur, you want to correct them. We provide a simple UI that your business analysts can use to make corrections.


Does your solution work with handwriting?

Our solution excels at data extraction from handwriting.  We've got proprietary methods and techniques that do the trick.  It's pretty cool.  See for yourself.
