Why OCR invoices when the data is in front of you?



An interesting question. The likely answer is that you don’t know the data is in front of you.

The paperless office has been promised for almost 20 years – but unfortunately it is yet to become a reality. Originally we had document scanning and archiving; then OCR technology allowed us to extract data from structured and semi-structured documents. Capture evolved into intelligent capture with the introduction of so-called ‘learning algorithms’, whilst vendors started to evaluate themselves and the competition in terms of recognition and straight-through processing rates.

Yet even with various technological advances and fierce competition among industry leaders, scanning and OCR of invoices is – and will always be – a flawed approach to the conversion of 'human readable documents' into 'machine readable structures'.


Scanning: Firstly, paper needs to be converted into a format that can be processed by an OCR platform. A manually intensive process in itself: mail is received, opened and sorted. Staples are removed and batches prepared. Paper documents are scanned and original documents either archived or destroyed.

OCR: The scanning function feeds the OCR platform, which reads the scanned image of the document and attempts to convert the black and white pixels into meaningful characters. OCR companies often boast about high recognition rates – but in reality the variables affecting success are often outside their control: poor-quality paper used by the supplier, the way data is laid out on the document, the quality of the scanner used and even the way the paper has been folded in the envelope can all affect OCR results and make the most powerful platforms close to useless.

Quality control: Irrespective of how good the image is, you still need an operator to check OCR results and either correct what’s been captured or fill in what’s missing.

So where is the data?

Find out about PDF invoicing.

Most organisations now send and receive PDF documents via email. It is the easiest and most efficient way to send documents in the P2P process, such as invoices and orders, as the functionality is ‘out of the box’ with modern billing and procurement applications.

There is no question that email and PDF are ubiquitous. However, what many may not be aware of is that when an application generates a PDF, in almost all instances the data – such as invoice number, line quantity and amounts – will be embedded within the PDF, put there by the generating application.

Methods such as this guarantee data quality and remove the manual activities and risks associated with scanning and OCR, moving us into the realms of true P2P automation.
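To make the idea of 'embedded data' concrete, here is a minimal Python sketch. It assumes a hand-written, simplified PDF content stream with invented invoice values; real PDFs are more varied (different text operators, font encodings, escaped characters), so treat this as an illustration rather than a working PDF parser. It shows that text placed by a generating application is stored as character data, commonly inside a Flate-compressed stream, rather than as pixels.

```python
import re
import zlib

# A simplified content stream of the kind a billing application might emit.
# The invoice values are invented for illustration.
raw_stream = (
    b"BT /F1 12 Tf 72 720 Td (Invoice Number: INV-1001) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Total: 250.00) Tj ET\n"
)

# Content streams are commonly stored compressed with the Flate (zlib) filter.
compressed = zlib.compress(raw_stream)


def extract_text_strings(stream_bytes):
    """Decompress a Flate-compressed stream and return the literal strings
    shown by Tj operators. Escaped parentheses are not handled here."""
    decoded = zlib.decompress(stream_bytes).decode("latin-1")
    return re.findall(r"\((.*?)\)\s*Tj", decoded)


strings = extract_text_strings(compressed)
for s in strings:
    print(s)
```

No image recognition is involved at any point: the characters are recovered exactly as the originating system wrote them.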

Now that you know where the data is stored, you can automatically map this data to an e-document structure that’s compatible with your processing application...


Our patented technology enables documents, in particular supplier invoices and customer orders, to be received as PDFs and automatically converted into an e-document structure. We map the data embedded within the PDF as created by the originating system and convert this into a format accepted by the recipient.
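As a rough illustration of what mapping embedded data to a structure can look like, the Python sketch below takes text lines that are assumed to have already been read from a PDF's text layer and maps them to a simple dictionary. The field names, line layout, regular expressions and the `map_to_edocument` helper are all hypothetical and chosen for this example; this is not the patented method described above, only a sketch of the general idea.

```python
import re
from decimal import Decimal

# Hypothetical text lines as they might be read from a PDF's text layer.
extracted_lines = [
    "Invoice Number: INV-1001",
    "Invoice Date: 2024-03-15",
    "Widget A 2 x 50.00",
    "Widget B 3 x 50.00",
    "Total: 250.00",
]

# Assumed layout rules for one supplier's invoice template.
HEADER_FIELDS = {
    "invoice_number": re.compile(r"^Invoice Number:\s*(\S+)$"),
    "invoice_date": re.compile(r"^Invoice Date:\s*(\S+)$"),
    "total": re.compile(r"^Total:\s*([\d.]+)$"),
}
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<quantity>\d+)\s+x\s+(?P<unit_price>[\d.]+)$"
)


def map_to_edocument(lines):
    """Map extracted text lines to a simple structured invoice dictionary."""
    doc = {"lines": []}
    for text in lines:
        for field, pattern in HEADER_FIELDS.items():
            match = pattern.match(text)
            if match:
                doc[field] = match.group(1)
                break
        else:
            # Not a header field, so try to read it as a line item.
            match = LINE_ITEM.match(text)
            if match:
                doc["lines"].append({
                    "description": match["description"],
                    "quantity": int(match["quantity"]),
                    "unit_price": Decimal(match["unit_price"]),
                })
    if "total" in doc:
        doc["total"] = Decimal(doc["total"])
    return doc


edoc = map_to_edocument(extracted_lines)
# Cross-check: the line items should add up to the stated total.
lines_total = sum(item["quantity"] * item["unit_price"] for item in edoc["lines"])
print(edoc["invoice_number"], lines_total == edoc["total"])
```

Because the values come straight from the text the originating system wrote, checks such as comparing the line totals with the invoice total can run without an operator correcting recognition errors first.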

As this approach is so simple and non-disruptive to any supply chain, adoption rates are extremely high when an organisation promotes this method of e-invoicing. So again… why OCR?

Of course it would be foolhardy to predict that there will ever be a truly paperless office. Some paper will likely remain – at least in the short term. But as most billing applications can generate and send PDF invoices via email, it is the easiest and quickest way to move closer to a paperless office.
