Automated Data Extraction From Annual Reports
by Mark Clark, on June 24, 2020 11:54:34 AM PDT
Is Intelligent Document Processing a good fit for use cases that require automated data extraction from financial reports with complex tables, unstructured layouts, non-English languages, and contextual relationships?
Can Intelligent Document Processing Solve This Client’s Challenge?
In this post, we walk through a use case in which an investment advisory firm needed to automate data extraction from complex, unstructured financial reports. The firm looked at solutions like OCR and various AI-based systems, but none could meet its accuracy requirements.
The firm found manual data extraction was the only way to deal with complex, unstructured documents. But this method was costly, slow, prone to bias, and prone to errors.
Could Intelligent Document Processing (IDP) solve this automation challenge?
The Annual Report Use Case
|The Use Case||An investment advisory firm uses data extracted from complex unstructured financial documents to develop research reports. It has valuable data stuck in those documents that it could use to make not only better reports, but also smarter business decisions.|
|Manual data extraction was the only way to deal with complex, unstructured documents. This method was costly, slow, prone to bias, and prone to errors.|
|Source Documents||Annual financial reports with layout variations, complex tables, context, and multiple languages|
|Solution||Intelligent Document Processing (IDP) automates the data extraction process|
|Impact||63% reduction in process cost, reduced time to analyze a report, and more efficient use of labor|
An Investment Advisory Firm Runs On Data Insights
A large independent investment advisory firm (we’ll call the firm “Golden”) offers an extensive line of products and services to retail investors, financial advisors, and institutional investors. The quality and timeliness of its research, analysis, and advice are what differentiate Golden from its competitors.
Golden is known for its in-depth, thorough research and analysis of public companies. That research requires analysts to dig through annual reports and other financial documents to find data that reveals how firms are performing and helps infer how they are likely to perform.
Extracting Data From Annual Reports
Data had to be extracted from annual reports that were complex and unstructured. The source documents had the following characteristics:
|A Profile:||Golden’s Annual Reports|
|Golden worked with annual reports in 36 languages. The solution needed to extract data from these reports and present the extracted data in English without using a translation service.|
|Unstructured and Variations||The source documents were unstructured and did not follow a fixed format. The solution needed to provide accurate data extraction of a large volume of documents with high variability -- a challenge even for humans.|
|Layout Provides Context||The extracted data had to be in the same layout and position as in the source document. The layout contained important context.|
|Data in Tables||Much of the financial data was in tables, and tables present tricky extraction challenges. The solution needed to extract data from nested tables -- where a table is within a table -- and retain the tabular layout. The solution also needed to distinguish table elements like columns, rows, and cells from one another.|
|PDF Format||The solution needed to turn the PDF source document into a searchable HTML file.|
Can Data Extraction Be Automated?
Golden needed a way to automate data extraction from these documents and improve the overall data processing system. Once this automation was in place, investment insights could be generated faster and with greater accuracy.
The current manual data extraction process was:
- Prone to error and bias
- Costly
- Limited to English documents
OCR Failed To Process Financial Reports
Processing documents like these annual reports proved to be too difficult for OCR-based solutions, and while the manual process worked, it was slow and inefficient.
This manual data extraction step was a major bottleneck in an otherwise efficient insight generation process. It was a pain point worth solving.
Is Intelligent Document Processing (IDP) a Fit For This Use Case?
After hitting a wall with other solutions, Golden reached out to Infrrd to see if its IDP solution could solve their problem. Working with Golden, Infrrd developed a solution architecture that included the following elements:
The IDP Solution
After understanding Golden’s requirements, Infrrd designed a solution that would help Golden remove the bottlenecks and help it achieve its business goals. The solution was built on Infrrd’s IDP platform and configured for Golden’s specific use case.
The IDP platform is an AI-native approach to document processing that combines machine learning, natural language processing, computer vision, OCR, and other technologies necessary to extract data from unstructured, complex documents such as financial reports.
Golden’s IDP solution was able to:
1. Preprocess the documents to improve accuracy
A preprocessing step prepares the annual report for extraction. The platform uses computer vision and machine learning methods to correct image orientation and skew. The images are then enhanced, and background noise is removed. The solution also uses image processing and ML algorithms to segment, analyze, understand, and preserve individual table layouts and structures.
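As a rough illustration of what skew correction involves (this is not Infrrd's implementation, which the post does not describe), the sketch below estimates a page's skew angle with a projection-profile search. It tries candidate angles and keeps the one that packs the dark pixels into the fewest rows. The function names, the angle range, and the synthetic test page are all assumptions made for this example.

```python
import math

def profile_score(points, angle_deg):
    """Score how sharply the points fall into rows after undoing a rotation."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    counts = {}
    for x, y in points:
        # Row index of the point after rotating the page back by angle_deg.
        row = round(y * cos_t - x * sin_t)
        counts[row] = counts.get(row, 0) + 1
    # The sum of squared row counts is largest when text sits on few rows.
    return sum(c * c for c in counts.values())

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) with the sharpest row profile."""
    steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(steps + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))

def synthetic_page(skew_deg, lines=5, width=200, spacing=20):
    """Hypothetical test data: straight text lines rotated by skew_deg."""
    theta = math.radians(skew_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    points = []
    for k in range(lines):
        y = k * spacing
        for x in range(width):
            points.append((x * cos_t - y * sin_t, x * sin_t + y * cos_t))
    return points

page = synthetic_page(3.0)
angle = estimate_skew(page)
```

Once the angle is known, the page image can be rotated back by that amount before OCR runs. A production system would work on real pixel data and likely use a finer angle step.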
2. Extract data from the annual reports
Infrrd’s IDP platform uses a multiple-step process to extract data and contextual information from the source document. In addition to advanced preprocessing, the solution uses multiple AI techniques plus specialized OCR engines to extract the target data. Once extracted, the data is passed through additional AI processes to validate, clean, enrich, and integrate the data.
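To make the "validate and clean" stage more concrete, here is a minimal sketch of one such check. It assumes US-style number formatting (comma for thousands, period for decimals, parentheses for negatives) and a hypothetical validation rule that line items should add up to a reported total. Neither the function names nor the rule come from Infrrd's platform, and reports in other locales would need different parsing.

```python
import re

def parse_amount(raw):
    """Convert a reported figure such as '(1,234.5)' to a float, or None."""
    text = raw.strip()
    if text in {"", "-", "n/a", "N/A"}:
        return None
    # Accounting notation: parentheses mark a negative value.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Drop currency symbols, thousands separators, and spaces.
    text = re.sub(r"[^\d.\-]", "", text)
    try:
        value = float(text)
    except ValueError:
        return None
    return -value if negative else value

def totals_match(items, reported_total, tolerance=0.01):
    """Check that parsed line items add up to the reported total."""
    return abs(sum(items) - reported_total) <= tolerance

cells = ["$ 1,200.00", "(200.50)"]
amounts = [parse_amount(c) for c in cells]
is_consistent = totals_match(amounts, parse_amount("999.50"))
```

A failed check like this is a useful signal that a cell was misread and should be routed for review rather than passed downstream.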
3. Translate any of the 36 languages into English
Infrrd’s IDP platform uses proprietary language translation capabilities based on deep neural network technologies. These capabilities can learn from the new documents and languages they see. IDP can also learn patterns from a document in one language and apply what it learned to a document in another language.
4. Adapt and Learn
Companies change their annual reports from year to year. Layout and designs are different, and the desired data can move around on a page. Infrrd’s IDP solution is constantly learning and improving as it sees new documents. The result is that extraction accuracy improves over time.
5. Convert Source PDFs Into Searchable HTML, Keeping The Layout
Using advanced AI methods, the platform is able to extract the data in the PDF and transform it into searchable HTML while preserving the original layout. This searchable HTML is sent to Golden’s analytics platform, which develops insights from the extracted data.
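The layout-preserving step can be pictured with a small sketch. It assumes the extracted table arrives as a list of rows, where each cell is either text or another table (a nested list of rows). That representation and the function names are hypothetical and chosen only for illustration; the post does not describe the platform's internal format.

```python
from html import escape

def render_cell(cell):
    """Render a cell: nested tables recurse, plain values are escaped."""
    if isinstance(cell, list):
        return render_table(cell)
    return escape(str(cell))

def render_table(rows):
    """Render a list of rows as an HTML table, keeping nested tables in place."""
    parts = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{render_cell(c)}</td>" for c in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)

# A row whose second cell is itself a small table.
extracted = [
    ["R&D spend", [["2019", "2020"], ["10", "12"]]],
]
html_output = render_table(extracted)
```

Because the output is plain HTML text, the figures remain searchable, and the nested `<table>` elements keep each value in the row and column it occupied in the source.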
IDP Removed The Manual Data Extraction Bottleneck
Golden’s pain point could finally be resolved using Infrrd’s advanced IDP platform. With the manual bottleneck removed, Golden could transform its financial report analytics process into a faster, more efficient one. With this solution in place, Golden expected to reduce both cost and processing time by more than 50%.
5 Items That Make This A Good Fit For IDP
This use case highlighted what makes a good fit for using an Intelligent Document Processing solution approach:
- The target back-office process uses manual efforts to extract data from documents.
- Source documents are complex and unstructured. Documents similar to the financial reports Golden processed are a very good fit.
- The manual step is costly, slow, inefficient, error-prone, and will not scale.
- The manual step blocks the ability to execute a digital operating model.
- There is a sufficient volume of documents to process that automation makes sense.
"But Our Use Case Is Impossible To Automate"
Many of our clients come to us with data extraction use cases similar to Golden’s. They tried to solve the problem with OCR or other technical approaches. None worked. They considered their use case impossible to automate.
But IDP was able to resolve the bottleneck.
Even if you have an “impossible” use case, Intelligent Document Processing is worth exploring. You might be surprised what's possible with the latest AI and ML-based IDP technology.