Valuable Data Insights Demand Smarter Data Intake
by Mark Clark, on August 27, 2020 10:30:00 AM PDT
Intelligent Document Processing (IDP) can help healthcare and insurance firms solve document processing challenges. In this blog post, we will walk through an example of how one firm used IDP to overcome manual processing bottlenecks.
Manual Data Extraction From Documents
“Clear Health” (fictitious name) faced a serious challenge: It had a platform that could uncover valuable insights from medical form data, but to feed the platform, data entry staff had to manually key in all of the data from the forms. The forms were too complex for the system to read automatically. This manual processing step was slow and costly.
Could Clear Health automate to resolve this manual processing bottleneck?
Clear Health processed multiple medical forms and document types.

The Automation Challenge
Clear Health is an outsourcer that processes Medicare and ACA medical forms for payers and providers. A typical Clear Health customer would send the firm a single PDF containing a set of 40 medical forms and unstructured documents that need to be processed.
Data extracted from the forms is fed into the firm’s proprietary platform, which generates insights used to improve care and to ensure complete reimbursement.
Clear Health’s Manual Process Flow
Manual document processing costs the firm $7M a year. The team sorts the documents into categories, extracts the target data, and verifies that the data is correct.
Clear Health wanted to automate document processing to improve its cycle time and to reduce costs. The firm was aiming to drop its costs to $1.7M, for more than $5M in savings a year.
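The savings figure follows directly from the two cost numbers above. As a quick check of the arithmetic (the variable names are illustrative, and the percentage is derived here rather than quoted from Clear Health):

```python
# Annual cost figures quoted in the text, in US dollars.
current_cost = 7_000_000
target_cost = 1_700_000

# Savings and relative reduction implied by those two figures.
savings = current_cost - target_cost
reduction_pct = 100 * savings / current_cost

print(f"Savings: ${savings:,} ({reduction_pct:.1f}% reduction)")
```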
Template-Based OCR Ruled Out
The source documents were too varied to fit templates, which immediately ruled out a template-based OCR approach. The firm was also seeking a business solution, as opposed to a technical data science platform, to address these requirements.
Extraction Functional Requirements
Clear Health divided the project into phases. Phase one was a preliminary step and a milestone to ensure the project delivered the desired business outcomes.
Phase one required the automated system to perform these functions:
- Read the PDF file and extract the Date of Service.
- Split the PDF file and group pages based on the Date of Service. The pages grouped should all belong to services provided on that Date of Service.
- Generate a CSV file that contains these data elements:
  - Patient Account Number
  - Patient Name
  - Member ID/HIC
  - Claim Number/Control Number
  - Service Date
  - CPT Code
  - Revenue Code
  - Modifier
  - Service Units
  - Billed Amount
  - Allowed Amount
  - Non-Covered Amount
  - Denied Amount
  - Reason Code
  - Payment Amount
  - Co-pay Amount
  - Co-insurance Amount
  - Deductible Amount
  - Primary Payer Payment
  - Sequestration Amount
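To make the phase-one output concrete, here is a minimal Python sketch of the grouping and CSV steps. It is an illustration under stated assumptions, not Infrrd's implementation: the `page_dates` input stands in for an upstream extraction step that is not shown, and the function names and sample values are hypothetical. Only the column names come from the requirements list above.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the phase-one requirements list.
CSV_COLUMNS = [
    "Patient Account Number", "Patient Name", "Member ID/HIC",
    "Claim Number/Control Number", "Service Date", "CPT Code",
    "Revenue Code", "Modifier", "Service Units", "Billed Amount",
    "Allowed Amount", "Non-Covered Amount", "Denied Amount", "Reason Code",
    "Payment Amount", "Co-pay Amount", "Co-insurance Amount",
    "Deductible Amount", "Primary Payer Payment", "Sequestration Amount",
]


def group_pages_by_service_date(page_dates):
    """Map each Date of Service to the page numbers that carry it.

    page_dates is a hypothetical list of (page_number, date_string) pairs
    produced by an extraction step that is not shown here.
    """
    groups = defaultdict(list)
    for page_number, service_date in page_dates:
        groups[service_date].append(page_number)
    return dict(groups)


def write_csv(records):
    """Render extracted records as CSV text; missing fields stay blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample data, for illustration only.
page_dates = [(1, "2020-03-02"), (2, "2020-03-02"), (3, "2020-04-15")]
groups = group_pages_by_service_date(page_dates)
csv_text = write_csv([{"Patient Name": "Jane Doe", "Service Date": "2020-03-02"}])
```

Here `groups` maps each service date to its pages, which is the information a splitter would need to cut the source PDF into one file per Date of Service.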
An accuracy score for the automated system was then derived based on the Date of Service and page splits.
The Infrrd Solution
Working with Clear Health, Infrrd configured an Intelligent Document Processing (IDP) solution to meet the firm’s requirements. IDP is a business solution focused on automating extraction from complex, unstructured documents, such as healthcare forms.
Infrrd’s platform is designed and built to address business and technical challenges that clients like Clear Health have. Key features of the IDP platform that support this use case are:
Key Features
AI Native Platform
IDP is a platform with expertly built models developed to address challenges in healthcare and other verticals. Clients get the best performance by leveraging industry domain knowledge plus insights gained from cross-industry models and solutions. Extraction models are then fine-tuned based on a specific client’s requirements and documents.
Layered AI Approach
The IDP platform uses a broad set of AI tools to process the data. Computer Vision, NLP, DNN, and other ML methods are designed and built for specific processing tasks. It is the combination of these tools that allows Infrrd to work on complex, unstructured documents.
Multi-Step Process
Infrrd’s solution uses a multiple-step process to extract value from documents. Each step is designed to perform a task with the highest performance, using our unique and proprietary technologies.
Template Free
Infrrd’s solution does not use the templates found in OCR systems. Instead of using rule-based templates to define extraction fields, Infrrd uses ML methods to find and extract data fields. This allows the solution to adapt to changes across documents without manual intervention.
Learns and Adapts
IDP learns, improves, and adapts as the solution sees more documents and document variations.
Built for the Business User
IDP is designed to be a solution for business users and subject-matter experts. Instead of needing data experts, a business user performs corrections, adds or modifies documents, administers the system, and monitors performance.
IDP Model Training and Retraining
Infrrd’s solutions have two stages: Model Training and Operations with Re-Training. Model Training tunes the IDP platform’s models to fit each client’s specific documents and then deploys that solution into operations.
When IDP is in operation, it processes the documents and feeds the extracted data into the Clear Health platform. IDP sees more documents and document variations once deployed. With more data, IDP learns and improves its performance over time.
IDP Model Training
The first step in the pre-deployment phase is training the page splitting and extraction models. This is where existing models are fine-tuned, using a data set provided by the client, and where we set a performance baseline. The solution is then deployed into production and starts processing documents. Its accuracy increases over time as the system sees more documents.
Model Training & Deployment
Operations and Model Retraining
The operational flow includes page splitting, classification, and data extraction. If the extracted data passes an accuracy confidence level check, then it is sent to Clear Health’s platform. If the data does not pass the test, then it is sent to the correction workflow.
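The confidence check described above amounts to a simple routing decision. A minimal sketch follows. The `Extraction` class, the `route` function, and the 0.90 threshold are hypothetical stand-ins for illustration, not values or names from Infrrd's platform.

```python
from dataclasses import dataclass, field

# Illustrative threshold only; the article does not state the real value.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class Extraction:
    """Hypothetical container for one document's extracted fields."""
    document_id: str
    confidence: float
    fields: dict = field(default_factory=dict)


def route(extraction, threshold=CONFIDENCE_THRESHOLD):
    """Send confident extractions downstream and the rest to correction."""
    if extraction.confidence >= threshold:
        return "deliver_to_platform"
    return "correction_workflow"


print(route(Extraction("doc-001", 0.97)))  # confident result
print(route(Extraction("doc-002", 0.62)))  # needs human review
```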
Operational Process Flow
Infrrd provides an easy-to-use UI and workflow to manage corrections. Corrected data is sent back into the system, delivered to the Clear Health platform, and used to improve the data processing model.
When the system collects sufficient corrected data, the models will be re-trained. Re-training occurs more frequently in the first months of deployment.
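One simple way to picture "sufficient corrected data" is a buffer that signals when it has collected enough corrections for a re-training run. The sketch below assumes a fixed count as the trigger. The article does not say how the real system decides, so the `CorrectionBuffer` class and its `retrain_after` parameter are hypothetical.

```python
class CorrectionBuffer:
    """Collects corrected records until there are enough to re-train."""

    def __init__(self, retrain_after):
        # retrain_after is an assumed tuning knob, not a documented setting.
        self.retrain_after = retrain_after
        self.pending = []

    def add(self, corrected_record):
        """Store a correction; return True once a re-training run is due."""
        self.pending.append(corrected_record)
        return len(self.pending) >= self.retrain_after

    def drain(self):
        """Hand over the collected batch and start a fresh one."""
        batch, self.pending = self.pending, []
        return batch


buffer = CorrectionBuffer(retrain_after=3)
signals = [buffer.add({"doc": n}) for n in range(3)]
batch = buffer.drain()
```

Under this assumption, a smaller `retrain_after` value early in deployment would produce the more frequent re-training described above.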
This iterative approach allows Infrrd’s solution to start with a smaller data set, and to learn and improve over time. It also allows the solution to adapt to variations in the documents.
Good Initial Accuracy
After seeing 10k documents in an initial phase, the IDP solution delivered 76% data extraction accuracy. To get even better results, Infrrd laid out a learning roadmap to raise the accuracy rate.
Learning Roadmap Improves Accuracy
Infrrd believed it could get to 93% accuracy within six months, given the client’s document volume.
The roadmap focused on four areas of improvement, most of which could be achieved by processing more documents and exposing the system to more document variations.
Date Model Accuracy Improvements
- Manual tagging of date of service on the extracted text from split documents
- The date model was trained on a limited data set. A larger set of training data would yield better accuracy.
Page Split Model Enhancement
- Use a larger training set for page splits to train the model on variations
Confidence Score Thresholds
- Add another layer of models to predict the right confidence score
- Identify correct thresholds to allow straight-through processing of documents
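As an illustration of threshold selection, the sketch below picks the lowest confidence cut-off at which the documents at or above it still meet a target accuracy, using a labeled validation set. This is one common approach and an assumption on our part; the article does not describe how Infrrd chooses its thresholds. The function name and sample scores are hypothetical.

```python
def pick_threshold(scored, target_accuracy):
    """Return the lowest cut-off whose passing items meet the target.

    scored is a list of (confidence, is_correct) pairs from a labeled
    validation set. Returns None if no cut-off reaches the target.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for count, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Items sharing a confidence value pass or fail together,
        # so evaluate only after the last item of each tie group.
        if count < len(ranked) and ranked[count][0] == confidence:
            continue
        if correct / count >= target_accuracy:
            best = confidence
    return best


# Made-up validation scores, for illustration only.
scored = [(0.99, True), (0.95, True), (0.90, True),
          (0.80, False), (0.70, True), (0.60, False)]
threshold = pick_threshold(scored, target_accuracy=0.95)
```

Documents scoring at or above the chosen threshold could then skip the correction workflow, while the rest go to human review.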
Business Rules Processing
- Understand and identify business processing rules around Date of Service (DOS) processing to help the model train on variations
Infrrd Solves Healthcare’s Document Challenges
Infrrd’s platform is designed to handle complexity and variations across healthcare documents and forms. Solutions are built on expert models developed cross-industry, then fine-tuned for specific use cases. IDP learns and improves over time, which ensures compelling business outcomes for our healthcare clients.