Financial Data Extraction From Annual Reports | Infrrd

by Mark Clark, on June 24, 2020 11:54:34 AM PDT

Is Intelligent Document Processing a good fit for use cases that require automatically extracting data from financial reports with complex tables, unstructured layouts, non-English languages, and contextual relationships?

Can Intelligent Document Processing Solve This Client's Challenge?

In this post, we walk through a use case in which an investment advisory firm needed to automate data extraction from complex, unstructured financial reports, often delivered as PDF documents. The firm evaluated data extraction tools such as Optical Character Recognition (OCR) and various AI-based systems, but none could meet its accuracy requirements.

The firm found manual data extraction was the only way to deal with complex, unstructured documents. But this method was costly, slow, prone to bias, and prone to errors.

Could Intelligent Document Processing (IDP) solve this automation challenge?

The Annual Report Use Case 

The Use Case

An investment advisory firm uses data extracted from complex unstructured financial documents to develop research reports. It has valuable data stuck in those documents that it could use to make not only better reports, but also smarter business decisions.

The Challenge

Manual data extraction was the only way to deal with complex, unstructured documents. This method was costly, slow, prone to bias, and prone to errors.

Source Documents

Annual financial reports, financial statements, and balance sheets with varied document formats and layouts, complex tabular data, context-dependent content, filings from multiple companies, and, in some cases, multiple languages


Intelligent Document Processing (IDP) automates the data extraction process


63% reduction in process cost, reduced time to analyze a report, and more efficient use of labor

An Investment Advisory Firm Runs On Data Insights

A large independent investment advisory firm (we'll call the firm “Golden”) offers an extensive line of products and services to retail investors, financial advisors, and institutional investors. The quality and timeliness of its research, analysis, and advice are what differentiate Golden from its competitors.

Golden is known for its in-depth, thorough research and its analysis of public companies. That research requires analysts to dig through annual reports and other financial documents to find data that reveals how firms are performing and helps infer how they are likely to perform. As a result, Golden processes a wide variety of data structures to get the job done.

Extracting Data From Annual Reports

Data had to be extracted from complex, unstructured annual reports. The source documents had the following characteristics:

A Profile: Golden's Annual Reports

Multiple Languages

Golden worked with annual reports in 36 languages. The solution needed to extract data from these reports and present the extracted data in English without using a translation service.

Unstructured Data and Variations

The source documents were unstructured and did not follow a fixed format. The solution needed to provide accurate data extraction of a large volume of documents with high variability -- a challenge even for humans.

Layout Provides Context

The extracted data had to be in the same layout and position as in the source document.  The layout contained important context. 

Data in Tables

Much of the financial data was in tables, and tables present tricky extraction challenges. The solution needed to extract data from nested tables, where one table sits inside another, and retain the tabular layout. The solution also needed to distinguish table elements such as columns, rows, and cells from one another.

PDF Format

The solution needed to turn the PDF source document into a searchable HTML file.
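To make the nested-table requirement in this profile concrete, here is a minimal Python sketch of a data structure in which any cell can hold either text or another table. The `Table` and `Cell` classes, the `iter_text` helper, and the sample figures are all hypothetical and chosen for illustration; they are not taken from Infrrd's platform or from Golden's documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cell:
    # A cell holds plain text or a whole nested table.
    content: str | Table


@dataclass
class Table:
    # Rows are kept in reading order so the tabular layout is retained.
    rows: list[list[Cell]] = field(default_factory=list)


def iter_text(table: Table, depth: int = 0):
    """Yield (depth, row, col, text) for each text cell, descending into nested tables."""
    for r, row in enumerate(table.rows):
        for c, cell in enumerate(row):
            if isinstance(cell.content, Table):
                yield from iter_text(cell.content, depth + 1)
            else:
                yield depth, r, c, cell.content


# Illustrative data only: a segment breakdown nested inside a summary table.
segment = Table(rows=[
    [Cell("Segment"), Cell("2019"), Cell("2018")],
    [Cell("Retail"), Cell("120"), Cell("110")],
])
report = Table(rows=[
    [Cell("Revenue by segment"), Cell(segment)],
    [Cell("Total revenue"), Cell("310")],
])

cells = list(iter_text(report))
for depth, r, c, text in cells:
    print("  " * depth + f"[{r},{c}] {text}")
```

Keeping the row and column indices alongside the nesting depth is one way to preserve the position of each value, which matters here because the layout itself carries context.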

Can Data Extraction Be Automated?

Golden needed a way to automate data extraction from these documents and improve the overall data processing system.  Once this automation was in place, investment insights could be generated faster and with greater accuracy.

The current manual data extraction process was:

  • Slow
  • Prone to errors and bias
  • Costly
  • Limited to English documents

OCR Failed To Process Financial Reports

Processing documents like these annual reports proved to be too difficult for OCR-based solutions, and while the manual process worked, it was slow and inefficient.  

This manual data extraction step was a major bottleneck in an otherwise efficient insight generation process. It was a pain point worth solving, so the firm was on the lookout for more sophisticated extraction tools that could resolve it.

Is Intelligent Document Processing (IDP) a Fit For This Use Case?

After hitting a wall with other solutions, Golden reached out to Infrrd to see if its IDP solution could solve their problem.  Working with Golden, Infrrd developed a solution architecture that included the following elements:

The IDP Solution

After understanding Golden's requirements, Infrrd designed a solution that would help Golden remove the bottlenecks and help it achieve its business goals. The solution was built on Infrrd's IDP platform and configured for Golden's specific use case.

The IDP platform is an AI-native approach to document processing that combines machine learning, natural language processing, computer vision, OCR, and other technologies necessary to extract data from unstructured, complex documents such as financial reports. 

Golden's IDP solution was able to:

1. Preprocess the documents to improve accuracy

A preprocessing step prepares the annual report for extraction. The platform uses computer vision and machine learning methods to correct image orientation and skew. The images are then enhanced, and background noise is removed. The solution also uses image processing and ML algorithms to segment, analyze, understand, and preserve individual table layouts and structures.
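As a rough illustration of one such cleanup step, the sketch below removes isolated speckles from a scanned page with a 3x3 median filter. This is a textbook technique standing in for the platform's own methods, which the post does not describe; the `median_filter` function and the tiny sample page are assumptions made for the example. A production system would normally rely on an image-processing library rather than nested Python lists.

```python
from statistics import median_low


def median_filter(img):
    """Return a new 2D grayscale image with a 3x3 median filter applied.

    Border pixels use only the neighbours that exist. The input is not modified.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            # median_low always returns a value that occurs in the window.
            out[y][x] = median_low(window)
    return out


# Illustrative data: a white 5x5 patch (255) with one black speckle (0).
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0

cleaned = median_filter(page)
print(cleaned[2][2])
```

A filter this simple would also erase very thin strokes, so real pipelines tune the window size or use more selective denoising before handing the image to OCR.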


2. Extract data from the annual reports

Infrrd's IDP platform uses a multistep process to extract data and contextual information from the source document, which may be a PDF file or another document format. In addition to advanced preprocessing, the solution uses multiple AI techniques plus specialized OCR engines to extract the target data. Once extracted, the data is passed through additional AI processes to validate, clean, enrich, and integrate the data.
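The validate-and-clean stage can be pictured with a small rule-based check. The sketch below is an assumption-laden example, not the platform's logic: it parses printed amounts, reading parentheses as a negative sign (a common accounting convention), and then tests the accounting identity that total assets equal total liabilities plus equity. The function names, field names, tolerance, and figures are hypothetical, and the parser assumes US-style number formatting.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Convert a printed amount such as '1,234.50' or '(987)' to a Decimal.

    Assumes commas are thousands separators and parentheses mean negative.
    """
    s = text.strip()
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    cleaned = re.sub(r"[,\s$]", "", s)  # drop separators, spaces, dollar signs
    value = Decimal(cleaned)
    return -value if negative else value


def balance_sheet_is_consistent(fields, tolerance=Decimal("0.5")):
    """Check that assets = liabilities + equity, within a rounding tolerance."""
    assets = parse_amount(fields["total_assets"])
    liabilities = parse_amount(fields["total_liabilities"])
    equity = parse_amount(fields["total_equity"])
    return abs(assets - (liabilities + equity)) <= tolerance


# Illustrative values only.
extracted = {
    "total_assets": "1,500",
    "total_liabilities": "900",
    "total_equity": "600",
}
misread = dict(extracted, total_equity="(600)")  # sign error in extraction

ok = balance_sheet_is_consistent(extracted)
flagged = not balance_sheet_is_consistent(misread)
print(ok, flagged)
```

Checks like this do not fix an extraction error, but they can flag a record for review before it reaches downstream analytics.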

3. Translate any of the 36 languages into English

Infrrd's IDP platform uses proprietary language translation capabilities based on deep neural network technologies. This functionality can learn from the new documents and languages it sees. IDP can also learn patterns from a document in one language and apply them to a document in another language.

4. Adapt and Learn

Companies change their annual reports from year to year. Layout and designs are different, and the desired data can move around on a page. Infrrd's IDP solution is constantly learning and improving as it sees new documents. The result is that extraction accuracy improves over time. 
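One reason layout changes break simple extractors is that they look for a value at a fixed position. The sketch below shows the alternative idea of locating a value by its label, using fuzzy string matching from Python's standard `difflib` module. It is a plain heuristic used here as a stand-in, not a learning system and not Infrrd's method; `find_value`, the 0.8 threshold, and the sample rows are assumptions for illustration.

```python
from difflib import SequenceMatcher


def find_value(rows, target_label, min_score=0.8):
    """Return the value whose label best matches target_label.

    rows is a list of (label, value) pairs in whatever order the page presents
    them. Returns None when no label reaches min_score.
    """
    best_score, best_value = 0.0, None
    for label, value in rows:
        score = SequenceMatcher(None, label.lower(), target_label.lower()).ratio()
        if score > best_score:
            best_score, best_value = score, value
    return best_value if best_score >= min_score else None


# Illustrative data: the same field moves and is reworded between two years.
layout_2019 = [("Total revenue", "310"), ("Net income", "42")]
layout_2020 = [("Net income", "47"), ("Operating costs", "250"), ("Total revenues", "335")]

rev_2019 = find_value(layout_2019, "total revenue")
rev_2020 = find_value(layout_2020, "total revenue")
missing = find_value(layout_2020, "total equity")
print(rev_2019, rev_2020, missing)
```

Returning `None` below the threshold, rather than the closest available label, keeps a missing field from silently being filled with the wrong number.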

5. Convert Source PDFs Into Searchable HTML, Keeping The Layout

Using advanced AI methods, the platform extracts the data in the PDF and transforms it into searchable HTML while preserving the original layout. This searchable HTML is sent to Golden's analytics platform, which develops insights from the extracted data.
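A common way to get HTML that is both searchable and faithful to the page is to emit each extracted word as real text and position it absolutely at its original coordinates. The sketch below shows that idea under stated assumptions: the `words_to_html` function, the `(text, x, y)` input format, and the sample coordinates are hypothetical, and the post does not say how Infrrd's platform actually builds its HTML.

```python
from html import escape


def words_to_html(words, page_width, page_height):
    """Render (text, x, y) words as absolutely positioned HTML spans.

    Coordinates are in points from the top-left corner of the page. Each word
    stays real text (searchable) and keeps its position (layout preserved).
    """
    spans = [
        f'<span style="position:absolute;left:{x}pt;top:{y}pt">{escape(text)}</span>'
        for text, x, y in words
    ]
    return (
        f'<div style="position:relative;width:{page_width}pt;height:{page_height}pt">'
        + "".join(spans)
        + "</div>"
    )


# Illustrative words on a US Letter page (612 x 792 points).
words = [("Revenue", 72, 100), ("310", 300, 100), ("R&D", 72, 120)]
html_page = words_to_html(words, 612, 792)
print(html_page)
```

Escaping the text with `html.escape` matters in financial documents, where labels such as "R&D" or "<1%" would otherwise produce invalid markup.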

IDP Removed The Manual Data Extraction Bottleneck 

Golden's pain point could finally be resolved using Infrrd's advanced IDP platform. With the manual bottleneck removed, Golden could transform its financial report analytics process into a faster, more efficient one. With this solution in place, Golden expected to reduce its costs and processing time by more than 50%.

5 Items That Make This A Good Fit For IDP

This use case highlighted what makes a good fit for using an Intelligent Document Processing solution approach:

• The target back-office process uses manual efforts to extract data from documents.
• Source documents are complex and unstructured. Documents similar to the financial reports Golden processed are a very good fit.
• The manual step is costly, slow, inefficient, error-prone, and will not scale.
• The manual step blocks the ability to execute a digital operating model.
• There is a sufficient volume of documents to process that automation makes sense.

"But Our Use Case Is Impossible To Automate"

Many of our clients come to us with data extraction use cases similar to Golden's. They tried to solve the problem with OCR or other technical approaches. None worked. They considered their use case impossible to automate.

But IDP was able to resolve the bottleneck. 

Even if you have an “impossible” use case, Intelligent Document Processing is worth exploring. You might be surprised by what's possible with the latest AI and ML-based IDP technology.

FAQs on Financial Data Extraction

What are the benefits of financial data extraction?

There are many benefits to financial data extraction, including the ability to quickly and easily access large amounts of data, the ability to process and analyze data more efficiently, and the ability to share data with others more easily. Financial data extraction can also help businesses and individuals save time and money by automating tasks that would otherwise be time-consuming and expensive.

How does the Data Extraction Process Work?

The data extraction process begins with the collection of data from various sources. The data is then cleaned and processed to extract the relevant information. The extracted data is then stored in a database for further analysis.
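The collect, clean, and store steps can be sketched in a few lines of Python with the standard `sqlite3` module. Everything in the example is hypothetical: the company names, the field names, the `facts` table, and the `clean` function are placeholders, and a real pipeline would collect its records from an extraction system rather than a hard-coded list.

```python
import sqlite3


def clean(record):
    """Normalise one raw (company, field, value) tuple; return None if unusable."""
    company, field, value = (part.strip() for part in record)
    if not company or not value:
        return None
    try:
        amount = float(value.replace(",", ""))
    except ValueError:
        return None  # value is not a number, e.g. "n/a"
    return company, field.lower().replace(" ", "_"), amount


# Step 1: collect (illustrative records only).
raw = [
    ("Acme Corp ", "Total Revenue", "1,200"),
    ("Acme Corp", "Net Income", "n/a"),
    ("Beta Ltd", "Total Revenue", " 950 "),
]

# Step 2: clean, dropping records that cannot be parsed.
rows = [r for r in map(clean, raw) if r is not None]

# Step 3: store in a database for further analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (company TEXT, field TEXT, amount REAL)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)

total = conn.execute(
    "SELECT SUM(amount) FROM facts WHERE field = 'total_revenue'"
).fetchone()[0]
print(total)
```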

What kind of data can be collected via financial data extraction?

Several different types of data can be extracted from financial documents. This includes information on income, expenses, assets, liabilities, and more. This data can be used to help individuals and businesses make better financial decisions. It can also be used to track trends and monitor financial performance.

What are the use cases of data extraction in finance?

There are many use cases for data extraction in finance. For example, data can be extracted to perform financial analysis, track financial performance, monitor financial risks, and support financial decision-making. Additionally, data extraction can be used to create financial reports, support auditing and compliance activities, and detect and prevent financial fraud.
