Posts

OCR and Perception

Is it OCR?

Reading Time: 5 minutes
OCR and Perception
How data is perceived is crucial for it to be understood for automation

When looking at data capture solutions, the term OCR often pops up. A technology that has been around for years, it is often the ‘go-to’ for companies looking to automate their data capture. In this blog post, the first in a trilogy explaining the CloudTrade data-capture solution, Richard Develyn, CloudTrade CTO, looks at how although OCR may capture some of the data needed, it cannot provide the understanding required to know what to do with that data or what the data means. When it comes to the future of data capture and enabling automation, we need to look at data perception and understanding…

I am often asked to explain the difference between the service that we provide here at CloudTrade and those services which are sold under the banner of “Optical Character Recognition” (OCR).

There is almost a straight answer to this, which is that OCR deals with what we might call “human perception” whereas CloudTrade is more about “human understanding”.

I say “almost” because the waters get muddied on a couple of counts. I shall come to these later; but let me first define exactly what I mean by “perception” and “understanding”.

What do we mean by data perception and data understanding?

Perception is all about recognition in its most basic form. It’s the bit in our brains which translates swirly lines and dots and circles into meaningful letters in the English language. It’s also the bit that has to struggle with differentiating between “i” and “j” or “b” and “h” so that we don’t end up wishing people “bappy hjrthdays” or catching a “fjshes” on a “fjshjng book”.

Understanding, however, is all about meaning. It’s the bit that comes in after perception has done its job (assuming that it gets it right!) and figures out, say, that the word “fishing” in “fishing for compliments” has nothing to do with the word “fishing” when you’re fishing in the sea.

Where the difference in perception and understanding starts to get muddied is that both the providers of OCR based solutions and we, ourselves, at CloudTrade, offer services which are based on a combination of both of these technologies.

You can’t have one without the other

You can’t, after all, have understanding without perception (unless you’re some sort of yogi floating over a mat in the Himalayas), or perception without understanding (imagine trying to find your way around the Tokyo underground system when you don’t speak Japanese). CloudTrade and OCR-based solutions need to use both of these elements because providing this service means not only extracting the right numbers and letters from those documents that are sent to us but also understanding them well enough to explain that, for example, “quantity 1” in an order line next to “car mats” is probably referring to a pack of 4 whereas the same phrase next to “Lamborghini Veneno Roadster” is unlikely to be referring to a pack of 4 of them at all.

Traditionally, OCR-based solutions have focussed on the perception side of the problem because that is where they have invested the bulk of their R&D, leaving the understanding part to be provided mostly by humans.

The value is in the understanding

CloudTrade, on the other hand, has invested all of its R&D efforts on understanding, succeeding in bypassing the perception part completely by focusing on “data” documents such as “data” PDFs (where, for example, the letter “s” is unambiguously stored as the letter “s” rather than as a set of drawing instructions resulting in something which could look like the letter “s” to the human eye).

Data PDFs do not need OCR and can therefore be thought of as producing a “perception” result which is 100% accurate. 100% perception is the key enabler for the process of understanding, as it allows a natural language analysis to take place with high levels of sophistication as there is no fear that all of the logical steps taking place within it will be broken by some stray spanner in the works which changes the word “battery” to a “hattery” or omits a very important decimal point in the phrase “don’t exceed the recommended dose of 1.234 ml every 24 hours”.

Providing the fuel for automation

Sophisticated systems of understanding remove the need for human operators and allow services to operate in a fully automated manner. At the time of writing, CloudTrade is processing ten million documents a year in this fashion. As soon as errors in perception are introduced, such as by using OCR, failures start to occur in the grammatical rules which underpin the process of understanding, and more and more human intervention is needed resulting in less and less automation.

Alternatively, OCR solutions operate in this field because they embrace the human element of document processing. The advantage is that they are not limited to only processing data PDFs. Their disadvantage is that they cannot fully automate.

To assume is to…

The second way in which the difference between perception and understanding has been muddled is in the technology behind OCR, which has now made inroads into the world of understanding. To quote Douglas Hofstadter from his seminal paper on OCR and AI called “on seeing A’s and seeing As”:

“A tacit assumption is thus that the components of sentences–individual words, or the concepts lying beneath them–are not deeply problematical aspects of intelligence, but rather that the mystery of thought is how these small, elemental, “trivial” items work together in large, complex (and perforce nontrivial) structures.”

Douglas Hofstadter

This assumption is certainly true with data PDFs, and that “mystery of thought” is clearly where CloudTrade has put in all of its R&D efforts. However, should the need for OCR not disappear completely, as might happen if all interactions become electronic and “data” based documents become the norm, then the most promising future for OCR is likely to come out of a hybridisation of perception and understanding.

Variety is the spice of life? Not for data.

Although as I said earlier, OCR makes mistakes such as reading “fish” for “fjsh”, what it actually does is identify lists of variations rather than hard and fast answers and then present those variations with their individual certainty values to a user for arbitration (i.e. it could be “fjsh” (60%) or perhaps it’s “fish” (50%)). OCR vendors can then use dictionaries to automatically strip out nonsense words like “fjsh” and perhaps narrow down the possibilities to arrive at the right answer. This doesn’t work, however, when the OCR mistakes still result in words present in the dictionary, or when a word being considered is not necessarily an English word at all (like a part number in a catalogue).

A far more sophisticated solution would be to bring in all these variations in perception straight into the “understanding” engine and then allow the latter to crunch through all of the grammatical options.

This is something that we have experimented with at CloudTrade, since it is possible for us to connect to OCR as the “perception” part of our solution. In doing so we have, indeed, found that with a bit of patience and tailoring we can deliver an OCR based service which is just about acceptable and automatic for header-level capture, but it’s too painful and slow to be feasible on complex or not “near-perfect” scanned images.

Dictionary lookups have been a standard feature with OCR vendors for some time. Advances in Machine Learning may well improve matters further in the future. I doubt very much that any improvements will happen with things like invoices and purchase orders, where a lot of the key information doesn’t have very much context to draw upon to allow significant automatic corrections to be made, but there could be mileage in using this technology with historical documents written in proper flowing prose.

OCR may well have an interesting future when it comes to scanning documents that were written in the past, but it’s more than likely to now be a past technology when it comes to documents that are to be written in the future.


CloudTrade vs OCR

Want to know more about how CloudTrade differs from OCR Technology?

Download our guide outlining the key features of both types of solution and the differences between them.

 

OCR and Perception

How is CloudTrade technology different to OCR?

Reading Time: 5 minutes
OCR and Perception
How data is perceived is crucial for it to be understood for automation

When looking at data capture solutions, the term OCR often pops up. A technology that has been around for years, it is often the ‘go-to’ for companies looking to automate their data capture. In this blog post, Richard Develyn, CloudTrade CTO, looks at how although OCR may capture some of the data needed, it cannot provide the understanding required to know what to do with that data or what the data means. When it comes to the future of data capture and enabling automation, we need to look at data perception and understanding…

I am often asked to explain the difference between the service that we provide here at CloudTrade and those services which are sold under the banner of “Optical Character Recognition” (OCR).

There is almost a straight answer to this, which is that OCR deals with what we might call “human perception” whereas CloudTrade is more about “human understanding”.

I say “almost” because the waters get muddied on a couple of counts. I shall come to these later; but let me first define exactly what I mean by “perception” and “understanding”.

What do we mean by data perception and data understanding?

Perception is all about recognition in its most basic form. It’s the bit in our brains which translates swirly lines and dots and circles into meaningful letters in the English language. It’s also the bit that has to struggle with differentiating between “i” and “j” or “b” and “h” so that we don’t end up wishing people “bappy hjrthdays” or catching a “fjshes” on a “fjshjng book”.

Understanding, however, is all about meaning. It’s the bit that comes in after perception has done its job (assuming that it gets it right!) and figures out, say, that the word “fishing” in “fishing for compliments” has nothing to do with the word “fishing” when you’re fishing in the sea.

Where the difference in perception and understanding starts to get muddied is that both the providers of OCR based solutions and we, ourselves, at CloudTrade, offer services which are based on a combination of both of these technologies.

You can’t have one without the other

You can’t, after all, have understanding without perception (unless you’re some sort of yogi floating over a mat in the Himalayas), or perception without understanding (imagine trying to find your way around the Tokyo underground system when you don’t speak Japanese). CloudTrade and OCR-based solutions need to use both of these elements because providing this service means not only extracting the right numbers and letters from those documents that are sent to us but also understanding them well enough to explain that, for example, “quantity 1” in an order line next to “car mats” is probably referring to a pack of 4 whereas the same phrase next to “Lamborghini Veneno Roadster” is unlikely to be referring to a pack of 4 of them at all.

Traditionally, OCR-based solutions have focussed on the perception side of the problem because that is where they have invested the bulk of their R&D, leaving the understanding part to be provided mostly by humans.

The value is in the understanding

CloudTrade, on the other hand, has invested all of its R&D efforts on understanding, succeeding in bypassing the perception part completely by focusing on “data” documents such as “data” PDFs (where, for example, the letter “s” is unambiguously stored as the letter “s” rather than as a set of drawing instructions resulting in something which could look like the letter “s” to the human eye).

Data PDFs do not need OCR and can therefore be thought of as producing a “perception” result which is 100% accurate. 100% perception is the key enabler for the process of understanding, as it allows a natural language analysis to take place with high levels of sophistication as there is no fear that all of the logical steps taking place within it will be broken by some stray spanner in the works which changes the word “battery” to a “hattery” or omits a very important decimal point in the phrase “don’t exceed the recommended dose of 1.234 ml every 24 hours”.

Providing the fuel for automation

Sophisticated systems of understanding remove the need for human operators and allow services to operate in a fully automated manner. At the time of writing, CloudTrade is processing ten million documents a year in this fashion. As soon as errors in perception are introduced, such as by using OCR, failures start to occur in the grammatical rules which underpin the process of understanding, and more and more human intervention is needed resulting in less and less automation.

Alternatively, OCR solutions operate in this field because they embrace the human element of document processing. The advantage is that they are not limited to only processing data PDFs. Their disadvantage is that they cannot fully automate.

To assume is to…

The second way in which the difference between perception and understanding has been muddled is in the technology behind OCR, which has now made inroads into the world of understanding. To quote Douglas Hofstadter from his seminal paper on OCR and AI called “on seeing A’s and seeing As”:

“A tacit assumption is thus that the components of sentences–individual words, or the concepts lying beneath them–are not deeply problematical aspects of intelligence, but rather that the mystery of thought is how these small, elemental, “trivial” items work together in large, complex (and perforce nontrivial) structures.”

Douglas Hofstadter

This assumption is certainly true with data PDFs, and that “mystery of thought” is clearly where CloudTrade has put in all of its R&D efforts. However, should the need for OCR not disappear completely, as might happen if all interactions become electronic and “data” based documents become the norm, then the most promising future for OCR is likely to come out of a hybridisation of perception and understanding.

Variety is the spice of life? Not for data.

Although as I said earlier, OCR makes mistakes such as reading “fish” for “fjsh”, what it actually does is identify lists of variations rather than hard and fast answers and then present those variations with their individual certainty values to a user for arbitration (i.e. it could be “fjsh” (60%) or perhaps it’s “fish” (50%)). OCR vendors can then use dictionaries to automatically strip out nonsense words like “fjsh” and perhaps narrow down the possibilities to arrive at the right answer. This doesn’t work, however, when the OCR mistakes still result in words present in the dictionary, or when a word being considered is not necessarily an English word at all (like a part number in a catalogue).

A far more sophisticated solution would be to bring in all these variations in perception straight into the “understanding” engine and then allow the latter to crunch through all of the grammatical options.

This is something that we have experimented with at CloudTrade, since it is possible for us to connect to OCR as the “perception” part of our solution. In doing so we have, indeed, found that with a bit of patience and tailoring we can deliver an OCR based service which is just about acceptable and automatic for header-level capture, but it’s too painful and slow to be feasible on complex or not “near-perfect” scanned images.

Dictionary lookups have been a standard feature with OCR vendors for some time. Advances in Machine Learning may well improve matters further in the future. I doubt very much that any improvements will happen with things like invoices and purchase orders, where a lot of the key information doesn’t have very much context to draw upon to allow significant automatic corrections to be made, but there could be mileage in using this technology with historical documents written in proper flowing prose.

OCR may well have an interesting future when it comes to scanning documents that were written in the past, but it’s more than likely to now be a past technology when it comes to documents that are to be written in the future.


CloudTrade vs OCR

Want to know more about how CloudTrade differs from OCR Technology?

Download our guide outlining the key features of both types of solution and the differences between them.

 

A Brief History Of OCR

A Brief History Of OCR

Reading Time: 3 minutes
A Brief History Of OCR 1

Optical Character Recognition (OCR) used to be seen as the unsung hero of the business world.

As you probably know, OCR is the process by which text images are converted into machine-encoded text. Digitising text means it can be easily presented, edited, stored and searched, optimising key administrative tasks such as invoicing and sales processing.

But how was this technology developed? And in the era of digitisation, can OCR remain relevant?

The genius of Emanuel Goldberg

OCR traces its roots back to telegraphy. On the eve of the First World War, physicist Emanuel Goldberg invented a machine that could read characters and convert them into telegraph code. In the 1920s, he went a step further and created the first electronic document retrieval system.

At this time, businesses were microfilming financial records – great in principle, but quickly retrieving specific records from spools of film was nigh on impossible. To overcome this, Goldberg used a photoelectric cell to do pattern recognition with the help of a movie projector. By repurposing existing technologies, he took the first steps towards the automation of record keeping. The US patent for his “Statistical Machine” was later acquired by IBM.

Since then, OCR technology has proliferated, with businesses all over the world relying on it to help reduce overheads when it comes to converting extracting data from paper documents.

The problem with OCR

Early versions of OCR had to be trained with images of each character and were limited to recognising one font at a time. In the 1970s, inventor Ray Kurzweil commercialised “omni-font OCR”, which could process text printed in almost any font. In the early 2000s, OCR became available online as a cloud-based service, accessible via desktop and mobile applications.

Today, there’s a host of OCR service providers offering technology (often accessible via APIs) capable of recognising most characters and fonts to a high level of accuracy. Even though the technology continues to improve, there is always scope for errors. And that means costly human intervention to validate information and ensure it’s ready to be used by the wider organisation – not to mention the economic and environmental cost of persisting with paper.

What’s changed?

For decades, OCR was the only way to turn print outs into data that could be processed by computers, and it remains the tool of choice (outside of EDI and invoice portals) for converting paper invoices into extractable data that can be integrated into finance systems as an example.

But e-document submission now offers businesses a far superior approach to areas such as invoicing and sales processing, cutting costs and freeing up staff to focus on higher value tasks. Going forward, we expect advances in AI and Machine Learning to further accelerate the demise of data extraction.

The future of OCR

OCR will continue to be a valuable tool for filling in gaps whereby an application-generated electronic document cannot be generated. Ultimately, the truly “paperless business” doesn’t (yet) exist and data extraction is still a useful tool that can augment e-document processing. That’s why CloudTrade has teamed up with partners to help clients who are still grappling with paper. These partnerships bridge the gap by bringing together paper and electronic invoicing into the same processing platform, enabling customers to target savings of up to 80%.

We accept that a complete mail-room service may well require OCR until the world becomes completely digitised. Recognising demand from business to go digital, CloudTrade was formed in 2009 to help accelerate a paradigm shift in the data acquisition process. We recognised that many solutions presented barriers to entry, often requiring IT projects that once completed, still required human intervention in the process.

CloudTrade was formed to remove all barriers for the sending party. We make it easy for trading parties to transact and provide receiving party data that’s 100% accurate and meets their specific business rules. By doing this we negate the need for human intervention, saving businesses time and money.


Learn what benefits modern data capture can give your business