Do You Really Know Your Accuracy?
by Amit Jnagal, on April 17, 2019 10:45:00 AM PDT
If data is the new oil, then accuracy is its octane number. Companies may have hundreds of disparate data sources (spreadsheets, documents, receipts, invoices, RFPs, financial reports, and so on), and if that data is not scraped accurately, it becomes a major showstopper. Manual scraping is time-consuming and laborious, while unintelligent data extraction tools require human intervention to monitor results and make corrections. The need of the hour is an Intelligent Data Capture platform that delivers contextual accuracy like a human. Accuracy is one of the most important aspects of data extraction.
Many data extraction tools quote accuracy figures, and those figures are not all of the same kind. It is imperative that buyers understand the different kinds of accuracy so that they can choose the right platform for their needs.
Let’s take a look at different categories of accuracy.
Accuracy can be measured at the word level. Just as character-level accuracy judges an OCR engine on how often it recognizes the right character, rather than how often it identifies a wrong one, word-level accuracy measures how frequently the OCR identifies the right word. Here the data extraction tool processes a document word by word, which matters especially when a business needs to search for a name in a file or in registration documents. It gives you an idea of how often a word has been identified correctly. However, there is no single standard way to measure word-level accuracy. This is because a document contains many relevant words, such as city names, person names, and store names, as well as many irrelevant words such as “the”, “and”, and “a”. In most cases, the extracted words have to be compared with a dictionary word list or a database to obtain an accuracy figure.
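To make the dictionary comparison concrete, here is a minimal sketch in Python. It assumes one possible definition: the share of relevant extracted words that appear in a reference word list, with irrelevant words skipped. The function name, the stop-word list, and the sample words are illustrative assumptions, not part of any particular OCR product.

```python
# Hypothetical list of irrelevant words to skip; a real list would be longer.
STOP_WORDS = frozenset({"the", "and", "a"})


def word_level_accuracy(extracted_words, lexicon, stop_words=STOP_WORDS):
    """Return the fraction of relevant extracted words found in the lexicon.

    This is one possible definition, chosen for illustration only.
    Comparison is case-insensitive.
    """
    known = {word.lower() for word in lexicon}
    # Drop irrelevant words so they do not inflate the score.
    relevant = [w.lower() for w in extracted_words if w.lower() not in stop_words]
    if not relevant:
        return 0.0
    hits = sum(1 for w in relevant if w in known)
    return hits / len(relevant)


# Made-up OCR output: the last word has a digit "1" in place of the letter "l".
words = ["The", "store", "in", "Bangalore", "and", "Banga1ore"]
lexicon = ["store", "in", "Bangalore"]
score = word_level_accuracy(words, lexicon)
print(f"word-level accuracy: {score:.2f}")
```

In this made-up example, three of the four relevant words are found in the word list. A different word list or a different set of skipped words would give a different figure for the same OCR output, which is why quoted word-level numbers are hard to compare.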
At the character level, the data extraction tool tells you how often a particular character is recognized correctly rather than how often a character is identified incorrectly. If the tool offers 99% accuracy, that means 1 out of every 100 characters is uncertain. The uncertain characters may or may not be correct. The tool alone cannot reach a final conclusion in such a case, even after applying built-in classifiers and internal voting algorithms. It takes human involvement to determine which characters are wrong and which are right.
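As a rough illustration, character-level accuracy can be estimated when a hand-checked reference text is available. The sketch below assumes such a reference exists and uses Python's standard `difflib.SequenceMatcher` to count the characters the OCR output shares with it. The function name and sample strings are hypothetical, and real evaluations may use other alignment methods.

```python
from difflib import SequenceMatcher


def character_level_accuracy(reference, recognized):
    """Return the fraction of reference characters matched in the OCR output.

    A simple estimate based on difflib's matching blocks; it is a sketch,
    not a standard benchmark metric.
    """
    if not reference:
        return 0.0
    matcher = SequenceMatcher(None, reference, recognized, autojunk=False)
    # Each matching block is a run of characters found in both strings.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Made-up example: the letter "O" was read as the digit "0".
reference_text = "INVOICE 10023"
ocr_text = "INV0ICE 10023"
accuracy = character_level_accuracy(reference_text, ocr_text)
print(f"character-level accuracy: {accuracy:.3f}")
```

Here 12 of the 13 reference characters match, so a single misread character in a short string already pulls the figure well below 99%.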
While processing documents, the data extraction solution has to capture values for many fields, such as merchant name, product, price, tax, date, and total amount. If a data extraction tool offers you 90% field-level accuracy, it means that about 9 out of every 10 values extracted for these fields are correct.
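Field-level accuracy can be sketched as the share of expected fields whose extracted value exactly matches a verified value. The field names and values below are invented for illustration, and exact string matching is an assumption; a real evaluation might normalize dates or amounts before comparing.

```python
def field_level_accuracy(extracted, expected):
    """Return the fraction of expected fields extracted with the exact value.

    A field that is missing from `extracted` counts as incorrect.
    """
    if not expected:
        return 0.0
    correct = sum(
        1 for name, value in expected.items() if extracted.get(name) == value
    )
    return correct / len(expected)


# Hypothetical verified values for one receipt.
expected_fields = {
    "merchant_name": "Corner Grocery",
    "product": "Coffee beans",
    "price": "12.00",
    "tax": "0.96",
    "total_amount": "12.96",
}

# Hypothetical tool output: the tax value was misread.
extracted_fields = {
    "merchant_name": "Corner Grocery",
    "product": "Coffee beans",
    "price": "12.00",
    "tax": "0.98",
    "total_amount": "12.96",
}

field_accuracy = field_level_accuracy(extracted_fields, expected_fields)
print(f"field-level accuracy: {field_accuracy:.0%}")
```

In this example four of the five fields are correct, which gives 80% field-level accuracy.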
Accuracy for a whole document is mostly a combination of character-level and field-level accuracy. A data extraction tool produces confidence scores after it has extracted a character or a field. The software has no way of knowing on its own whether a character or field is actually accurate, so it can only be confident or not confident that it is correct. If the confidence scores for the individual characters and fields in a particular document are high, the data extraction solution has most likely extracted the data from it accurately.
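One way confidence scores might be used in practice is to flag low-confidence fields for human review and to treat a document as confidently extracted only when every field clears a threshold. The sketch below assumes scores between 0 and 1 and a threshold of 0.9; both the threshold and the function names are illustrative assumptions, not how any specific tool works.

```python
def fields_needing_review(field_confidences, threshold=0.9):
    """Return the names of fields whose confidence falls below the threshold."""
    return sorted(
        name for name, score in field_confidences.items() if score < threshold
    )


def document_is_confident(field_confidences, threshold=0.9):
    """Return True only if every field in the document clears the threshold."""
    if not field_confidences:
        return False
    return not fields_needing_review(field_confidences, threshold)


# Hypothetical confidence scores reported for one document.
confidences = {
    "merchant_name": 0.98,
    "date": 0.95,
    "tax": 0.62,
    "total_amount": 0.97,
}

to_review = fields_needing_review(confidences)
confident = document_is_confident(confidences)
print("fields to review:", to_review)
print("document confident:", confident)
```

A high confidence score is not the same as a correct value, so a threshold like this only decides where human attention goes first.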
When choosing an OCR solution, a buyer must bear in mind the contextual accuracy they want to achieve and ask exactly which kind of accuracy a data extraction tool's quoted figure refers to. A simple comparison of accuracy percentages without that context can result in a disastrous choice of supplier. Hence the need for an Intelligent Data Capture platform that combines cognitive capabilities with data extraction and thereby drives contextual accuracy with intrinsic self-learning abilities. Infrrd’s Intelligent Data Capture platform has delivered an unprecedented level of accuracy by using an AI platform that brings machine learning and natural language processing capabilities to the enterprise.