Understanding IDP Categories: Document Classification
by Sujith Parakkunnath, on November 18, 2021 9:00:00 AM PST
According to Gartner, "The market for document capture, extraction and processing is highly fragmented. Data and analytics leaders should use this research to understand the process flow and differentiated capabilities offered by intelligent document processing solutions." Gartner's recently released "Infographic: Understand Intelligent Document Processing" covers six critical flows in IDP.
Source: Gartner, Infographic: Understand Intelligent Document Processing, Shubhangi Vashisth et al., 22 September 2021
In this second post in our "Understanding IDP" series, we explore Document Classification. (Check out our first post featuring Capture or Ingestion and Document Preprocessing.)
IDP is fast becoming essential for businesses that want to automate and scale competitively. The key to IDP is how efficiently and accurately data is extracted from your legacy, semi-structured, unstructured, or multi-variation documents. Before the data can be extracted, a key but complex activity is document classification: indexing, detecting, and classifying the different document types.
In today’s digital world, businesses are transforming rapidly with technology to stay competitive. This means that large volumes of data and documents must be processed and classified, and unstructured document data amplifies the challenge.
Before touching upon Infrrd’s deep learning-based and industry-leading classification features, let us look at the business use cases or challenges in document classification and how an IDP solution can be a game-changer in this space.
Let’s consider a prospective customer applying for a loan with your mortgage company. A lot of information is exchanged between the borrower and the company, such as W-2 forms, bank statements, and ID cards. There are several ways to collect this information: the borrower may be required to send an email or to upload the documents to a web portal. Because your mortgage company receives many types of documents, most of them not routed in any defined way, the first step is to identify what each document is. If you are in the mortgage industry, you know how hard it is for a loan officer to organize these documents accurately and efficiently, not to mention the errors that can creep into the process. This is where an IDP solution offers excellent ROI: intelligent automation classifies document types automatically, with a substantial gain in speed and accuracy.
Another challenge for the mortgage industry is the loan closing package. When the loan is approved, the company sends a loan closing package, a set of documents such as the completed loan application, home title, and other mortgage documents that borrowers sign to finalize the loan. The volume of documents to process in loan closing packages can run to hundreds or even thousands of pages. So, you can imagine the complexity and the time spent by the loan servicers involved in this process.
Similar to the mortgage industry, any sector where a large volume of documents is processed is a perfect domain for IDP solutions.
Because these challenges are complex, let us see what Infrrd’s IDP systems offer. Infrrd’s classification features are based on a combination of AI technologies, such as deep learning and NLP, and proprietary machine learning algorithms. We call it Intelligent Classification. Using Infrrd’s IDP, you can create your own classification models and map each document type to specific extraction models.
In today’s IDP space, classification does more than detect the type of content in a document and categorize it; it aims at intelligent classification. What does that mean? Let’s say a borrower who applied for a loan submits W-2 forms for two previous years. What you need are W-2 forms not for just any two years but for the two immediately preceding years. This is where Intelligent Classification plays a major role. It goes deeper and enables you to classify the documents based on the dates, or any other data, in the document.
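To make the W-2 example concrete, here is a minimal Python sketch of a date-aware check. It is not Infrrd’s implementation; the function names are hypothetical, and we assume the tax year has already been read from each submitted form.

```python
from datetime import date


def preceding_tax_years(application_date: date, count: int = 2) -> set:
    """Return the `count` tax years immediately before the application year."""
    return {application_date.year - offset for offset in range(1, count + 1)}


def missing_w2_years(submitted_years, application_date: date) -> set:
    """Return the required tax years that no submitted W-2 covers."""
    return preceding_tax_years(application_date) - set(submitted_years)


# A borrower applying in November 2021 submits W-2 forms for 2018 and 2020.
# The required years are 2019 and 2020, so the 2019 form is still missing.
print(missing_w2_years([2018, 2020], date(2021, 11, 18)))  # prints {2019}
```

A check like this only works once classification has gone past "this is a W-2" and captured a date from the document, which is the point of the paragraph above.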
Our classification models support multi-language processing and address diverse business scenarios, including document classification and page classification.
1. Document Classification
Infrrd has a built-in, out-of-the-box, computer vision-based Document Classification model to classify various types of documents. Suppose you have 100 documents, 60 of which are invoices and 40 of which are receipts. All you have to do is zip those documents and upload them to our Document Classification model. The Infrrd system will recognize the various document types and categorize them for you sooner than you think.
2. Page Classification
Page Classification is Infrrd’s answer to a challenge that many businesses face. In practice, a single file often contains several different documents, and the file may have to be split page by page according to document type. This requires a different approach to classifying document types. For example, suppose you have a 100-page unstructured document in which legacy invoices and receipts are scattered throughout, making it a daunting task to make sense of it. You just have to upload the document to our Intelligent Page Classification model, and the rest is taken care of for you.
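As a rough illustration of the splitting step, the Python sketch below groups consecutive pages that carry the same label into page ranges. It assumes some classifier has already assigned one label per page; the labels and the function name are hypothetical and do not describe Infrrd’s model.

```python
from itertools import groupby


def split_by_page_type(page_labels):
    """Group consecutive same-label pages into (label, first_page, last_page)."""
    segments = []
    page = 1  # pages are numbered from 1
    for label, run in groupby(page_labels):
        length = len(list(run))
        segments.append((label, page, page + length - 1))
        page += length
    return segments


# Hypothetical labels for a six-page file with invoices and receipts mixed together.
labels = ["invoice", "invoice", "receipt", "invoice", "receipt", "receipt"]
print(split_by_page_type(labels))
# prints [('invoice', 1, 2), ('receipt', 3, 3), ('invoice', 4, 4), ('receipt', 5, 6)]
```

Each resulting range can then be handled as its own document.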
Infrrd’s Patent-Pending Page Continuation
Before we conclude, let me touch upon the Page Continuation feature, which should bring a paradigm shift in document classification. Page Continuation, a patent-pending Infrrd feature, is a unique capability of the Page Classification model in which Infrrd’s proprietary machine learning algorithms distinguish similar documents stacked together. For example, suppose pages 12 to 15 of your 100-page document hold three monthly bank statements from a specific bank, say Bank of America. You may need to verify whether the bank statements are recent, or you may want to distinguish them based on other parameters. Our Page Continuation feature has proprietary logic that separates the bank statement for each month even though the document type is the same.
The Page Continuation feature can drastically reduce manual effort, cutting the hundreds or even thousands of hours that you may otherwise have had to invest in detailed analysis of classified documents. That makes this IDP feature a high-value proposition for your business.
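The bank statement example can be sketched in a few lines of Python. This is only an illustration of the idea, not the patent-pending logic itself: we assume each page already carries a document type and a statement month, and the field layout and function name are hypothetical.

```python
from itertools import groupby

# Hypothetical per-page output: (page_number, document_type, statement_month).
pages = [
    (12, "bank_statement", "2021-08"),
    (13, "bank_statement", "2021-08"),
    (14, "bank_statement", "2021-09"),
    (15, "bank_statement", "2021-10"),
]


def split_statements(pages):
    """Start a new document whenever the type or the statement month changes."""
    documents = []
    for (doc_type, month), run in groupby(pages, key=lambda p: (p[1], p[2])):
        documents.append(
            {"type": doc_type, "month": month, "pages": [p[0] for p in run]}
        )
    return documents


for document in split_statements(pages):
    print(document["month"], document["pages"])
# prints:
# 2021-08 [12, 13]
# 2021-09 [14]
# 2021-10 [15]
```

All four pages share one document type, yet they come out as three separate statements because the month is part of the grouping key.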
Now, let’s take a look at a common pitfall when choosing an IDP solution. We have heard from our customers that they initially chose vendors that provide 50% to 60% classification accuracy because it brings some level of automation. However, they quickly realized that this partial solution limited their productivity. To remain competitive, it makes sense to choose an IDP solution that provides Intelligent Classification with an accuracy of 90% or more.
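If you want to check an accuracy figure against your own documents during an evaluation, the arithmetic is simple. The Python sketch below is one way to do it; the function name and the sample labels are made up for illustration, and it assumes you have a set of documents whose correct types you already know.

```python
def classification_accuracy(predicted, expected):
    """Return the fraction of documents whose predicted type matches the known type."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one document is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical test set: 60 invoices and 40 receipts with known labels.
expected = ["invoice"] * 60 + ["receipt"] * 40
# Hypothetical vendor output: five invoices were mislabeled as receipts.
predicted = ["invoice"] * 55 + ["receipt"] * 45

print(classification_accuracy(predicted, expected))  # prints 0.95
```

Running the same labeled set through each candidate solution gives you a like-for-like number to compare against the 90% threshold.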
The reality is that your business may have to evolve constantly to stay competitive, which means frequent changes to your document processing workflow. Infrrd’s classification approach helps here because our classification models recognize and integrate easily with trained extraction models, i.e., trained document types. You need to train or supervise only the new data sets. Let’s say you want to classify two document types: invoices and loan documents. If you have already trained an extraction model for invoices, additional training or supervision may not be required during classification; you need to train a data set only for the new document type, the loan document.
Moreover, Infrrd’s ML-first, API-driven IDP solution enables you to group multiple classification and extraction models to create a new model. In a nutshell, Infrrd’s classification models are tightly integrated with existing extraction models to offer you flexibility, accuracy, and versatility in managing rapid, constant changes in your business or document-processing workflows.
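The invoice and loan document example boils down to a lookup: which document types already have a trained extraction model, and which do not. Here is a minimal Python sketch of that bookkeeping. The registry, the model name, and the function are hypothetical and are not part of Infrrd’s API.

```python
# Hypothetical registry: document type -> name of an already trained extraction model.
trained_extraction_models = {"invoice": "invoice-extraction-model"}


def types_needing_training(document_types, trained_models):
    """Return the document types that have no trained extraction model yet."""
    return [doc_type for doc_type in document_types if doc_type not in trained_models]


print(types_needing_training(["invoice", "loan_document"], trained_extraction_models))
# prints ['loan_document']
```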
Choosing the right IDP partner keeps you competitive and eliminates a myriad of pitfalls. During your IDP selection process, we recommend you add Intelligent Classification to your evaluation checkpoints.
Be sure to check out our next post, where we explore Gartner’s description of the fourth critical flow, Data Extraction, and see how Infrrd stacks up.