Process Automation using Machine Learning and OCR

Paper forms are becoming less useful as the world becomes more digitized, but some organizations, especially in the healthcare industry, continue to deal with paper documents. Digitizing these forms by hand can be tedious and time-consuming. Streamlining the process through automation saves both time and money.

Client’s Need

A current client tasked Data-Core Healthcare with automating its payment posting process. This healthcare organization receives a large volume of document batches for daily processing. Batches can include paper checks, Explanation of Benefits (EOB) forms, Remittance Advice, and a variety of correspondence. Not all documents received are related to payment posting, and those that are can cover one patient or many. Previously, the client manually separated the EOBs, identified regions of interest within each document, and posted payments by entering data from these regions. The long processing time and high labor cost created a need for a process automation solution.

Automating the conversion of paper EOBs to a clean electronic data stream conforming to ANSI standards posed several challenges. First, the client receives EOB documents mixed with other types of documents from the payers. Second, because EOB forms are not standardized and the healthcare industry has a multitude of payers, the necessary patient and claim information is not found in a consistent place on each EOB. Finally, the quality of the documents is not uniform: some are easily legible, while others are less clear for reasons such as ink color and printing quality.

Our Solution

Data-Core’s solution includes a pre-processing element to improve the quality of the scanned images. This step uses standard Optical Character Recognition (OCR) techniques to make the documents readable. Machine Learning techniques, such as Convolutional Autoencoders and Object Detection CNNs, are used to (1) identify the EOBs, (2) classify them by type, (3) map out the regions of interest within each EOB, and (4) perform contextual data extraction from these regions. Custom-built validation algorithms then check the accuracy of the extracted data through a pass/fail labeling method. Our team of experts reviews, corrects, and resubmits all claims that fail validation. After the claims pass validation, the solution produces a customized data file (EDI-835). The file is then imported into the client’s system for automatic payment posting.
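To make the pass/fail labeling step more concrete, here is a minimal Python sketch. It is not Data-Core's actual validation logic, which the article does not describe. The field names (check_number, check_amount, paid_amount, and so on) and the two rules shown (required fields are present, and claim payments add up to the check amount) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class ExtractedClaim:
    # Hypothetical fields; the real extraction schema is not given in the article.
    patient_name: str
    claim_number: str
    paid_amount: Decimal


@dataclass
class ExtractedEob:
    check_number: str
    check_amount: Decimal
    claims: list[ExtractedClaim] = field(default_factory=list)


def validate_eob(eob: ExtractedEob) -> tuple[str, list[str]]:
    """Label one extracted EOB "pass" or "fail" and list the reasons for a fail."""
    reasons = []

    # Assumed rule 1: required fields must not be empty.
    if not eob.check_number.strip():
        reasons.append("missing check number")
    if not eob.claims:
        reasons.append("no claims extracted")
    for i, claim in enumerate(eob.claims):
        if not claim.patient_name.strip():
            reasons.append(f"claim {i}: missing patient name")
        if not claim.claim_number.strip():
            reasons.append(f"claim {i}: missing claim number")

    # Assumed rule 2: the claim payments must add up to the check amount.
    total = sum((c.paid_amount for c in eob.claims), Decimal("0"))
    if total != eob.check_amount:
        reasons.append(f"claim total {total} != check amount {eob.check_amount}")

    return ("fail" if reasons else "pass", reasons)


# Made-up example data: two claims that add up to the check amount.
example = ExtractedEob(
    check_number="10045",
    check_amount=Decimal("150.00"),
    claims=[
        ExtractedClaim("Jane Doe", "C-1", Decimal("100.00")),
        ExtractedClaim("John Roe", "C-2", Decimal("50.00")),
    ],
)
status, reasons = validate_eob(example)
print(status, reasons)
```

In a setup like this, an EOB labeled "fail" would be routed to the review team together with its list of reasons, while an EOB labeled "pass" would continue on to EDI-835 generation.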

The various Machine Learning modules utilized in the solution have a recall of over 98% and a precision of over 95%, resulting in high confidence in the conversion process [1]. With an average of 1 million scanned documents per month, this solution produced an overall 80% automation rate. Automation rate refers to the percentage of EOBs successfully converted into EDI-835 files without any manual intervention.
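The three metrics in this paragraph can be computed from simple counts. The short Python sketch below uses the standard definitions of precision and recall and the definition of automation rate given above. The function names and the counts in the example are made up for illustration and are not figures from the project.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of items the model flagged that were actually relevant."""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of actually relevant items that the model found."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


def automation_rate(converted_without_manual: int, total_eobs: int) -> float:
    """Share of EOBs converted to EDI-835 with no manual intervention."""
    return converted_without_manual / total_eobs if total_eobs else 0.0


# Illustrative counts only: 980 EOBs correctly identified, 20 missed,
# and 40 non-EOB documents wrongly flagged as EOBs.
print(f"recall:    {recall(980, 20):.1%}")
print(f"precision: {precision(980, 40):.1%}")

# Illustrative counts only: 800 of 1,000 EOBs converted without manual work.
print(f"automation rate: {automation_rate(800, 1000):.0%}")
```

Note that the automation rate is measured against EOBs, not against every scanned document, since batches also contain checks and correspondence that are never converted.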

As a result of Data-Core’s work, our client’s turnaround time has drastically improved. It now takes significantly less time for payments to be posted into patient accounts.

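  1. Recall is the ability of a model to find all the relevant cases within a dataset (true positives divided by the sum of true positives and false negatives). Precision is the ability of a model to identify only the relevant data points (true positives divided by the sum of true positives and false positives).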