OCR: Outdated, Confined, Unreliable?


Older technology like OCR was good in its time, but the future of data capture and processing lies in automated, line-level capture from the data layer, without the need for manual correction. New and proven technologies largely relegate OCR to the bottom of the list whenever you can extract data directly from the digital document. This approach enables businesses to cut manual processes, eliminate human error and drive efficiency savings.

Optical Character Recognition (OCR) has been around for decades and became mainstream when LexisNexis began using the technology in the late 1970s.

OCR is based on image scanning: documents are scanned to create an image, and data is then created from the information discerned in that image. The technology dramatically reduced staff costs by taking away some manual processing, and it reduced the risk of duplicate payments, overpayments and underpayments.

With its limitations around processing speed, processing errors, human costs, and compliance, OCR is not the technology of the future. These systems have largely failed to deliver because they rely on scanning documents to understand the data instead of working directly with the document's data layer. That is where CloudTrade comes in.

CloudTrade technology is built to extract data from the document itself rather than read or interpret an image of it. The extracted data is validated and then passed straight into back-office systems for immediate processing. And it happens with 100% accurate data capture, every time.
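To make the "validated and then processed" step concrete, here is a minimal sketch of the kind of arithmetic checks that could run on extracted invoice data before it reaches a back-office system. The `Invoice` structure, its field names and the specific checks are assumptions made for illustration; they are not a description of how CloudTrade validates data.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    # Hypothetical structure; field names are illustrative only.
    number: str
    line_amounts: list[Decimal]
    net_total: Decimal
    tax: Decimal
    gross_total: Decimal


def validate(invoice: Invoice) -> list[str]:
    """Return a list of problems; an empty list means the invoice passed."""
    problems = []
    if not invoice.number.strip():
        problems.append("missing invoice number")
    # Decimal avoids the rounding surprises of binary floats in money sums.
    if sum(invoice.line_amounts, Decimal("0")) != invoice.net_total:
        problems.append("line amounts do not sum to net total")
    if invoice.net_total + invoice.tax != invoice.gross_total:
        problems.append("net plus tax does not equal gross total")
    return problems
```

An invoice that returns an empty list could be posted automatically, while anything else would be routed to a person for review.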

Human processors can only process invoices without error up to a certain speed. And while humans will always play a vital role in accounts payable, there are limits to the value humans and OCR can provide. OCR deals with what we like to call “human perception” whereas CloudTrade is focused on “human understanding” as it relates to documents and the data those documents hold.

OCR Helps with Limited Document Options


We think of OCR as the process of scanning an image to gather information. This process carries an increased likelihood of error, because the technology is forced to make an informed guess about what data the image holds. The approach can be valuable when an image is the only document type available, but that value is only realized when OCR is combined with human review, along with the costs that review brings.

OCR Can’t “Read” the Data it Provides


The basis of OCR is that it scans image-based documents, deciphers the information it has scanned, and presents raw data. But errors are more likely to occur with this method. OCR “perceives” the information whereas CloudTrade’s approach is to “understand” the data in data-rich PDFs. Perhaps you are receiving PDF invoices which you are printing out to scan into your OCR system? Printing and scanning strips out the actual data that could otherwise be imported directly into your systems, an import that would reduce both technical and human error.

What it Means to Understand the Data


Perfect data is required for efficient and touchless automated downstream processing. It is possible to acquire perfect data when rules are set to analyze PDFs and emails, which is not possible when relying solely on OCR-based technology. With Natural-Language Processing (NLP) rules set up at supplier level, CloudTrade is unable to process fewer than 0.01% of digital PDF documents. This level of data understanding provides a human intervention rate of less than 1%.
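As a rough illustration of supplier-level rules, the sketch below applies a small set of per-supplier patterns to text that has already been taken from a PDF's text layer. It uses plain regular expressions as a stand-in for NLP rules, and the supplier name, rule table and `extract_fields` function are all hypothetical; it shows the general idea only, not CloudTrade's rule engine.

```python
import re

# Hypothetical rule table: supplier -> field name -> pattern for that supplier's layout.
SUPPLIER_RULES = {
    "acme": {
        "invoice_number": re.compile(r"Invoice No[.:]?\s*(\S+)"),
        "total": re.compile(r"Total Due:?\s*([\d,]+\.\d{2})"),
    },
}


def extract_fields(supplier, text):
    """Apply one supplier's rules to text-layer text.

    Returns (fields, missing): the captured values and the names of
    any fields whose rule did not match.
    """
    fields = {}
    missing = []
    for name, pattern in SUPPLIER_RULES[supplier].items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
        else:
            missing.append(name)
    return fields, missing


text = "ACME Ltd\nInvoice No: A-1001\nTotal Due: 1,250.00\n"
fields, missing = extract_fields("acme", text)
```

Because the rules read exact characters from the text layer rather than guessing at pixels, a document either matches its supplier's rules or is flagged through `missing` for a person to look at.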

Our technology seamlessly processes millions of electronic documents each month – with no data capture mistakes, and no time wasted. CloudTrade automatically converts PDFs into data structures that can be uploaded directly into your finance systems, without having to deal with the errors and uncertainty produced by OCR technology.


Click here to download an overview of our patented e-document technology.