OCR and Perception

Is it OCR?

How data is perceived is crucial for it to be understood for automation

When looking at data capture solutions, the term OCR often pops up. A technology that has been around for years, it is often the ‘go-to’ for companies looking to automate their data capture. In this blog post, the first in a trilogy explaining the CloudTrade data-capture solution, Richard Develyn, CloudTrade CTO, explains why OCR, although it may capture some of the data needed, cannot provide the understanding required to know what that data means or what to do with it. When it comes to the future of data capture and enabling automation, we need to look at data perception and understanding…

I am often asked to explain the difference between the service that we provide here at CloudTrade and those services which are sold under the banner of “Optical Character Recognition” (OCR).

There is almost a straight answer to this, which is that OCR deals with what we might call “human perception” whereas CloudTrade is more about “human understanding”.

I say “almost” because the waters get muddied on a couple of counts. I shall come to these later; but let me first define exactly what I mean by “perception” and “understanding”.

What do we mean by data perception and data understanding?

Perception is all about recognition in its most basic form. It’s the bit in our brains which translates swirly lines and dots and circles into meaningful letters in the English language. It’s also the bit that has to struggle with differentiating between “i” and “j” or “b” and “h” so that we don’t end up wishing people “bappy hjrthdays” or catching “fjshes” on a “fjshjng book”.

Understanding, however, is all about meaning. It’s the bit that comes in after perception has done its job (assuming that it gets it right!) and figures out, say, that the word “fishing” in “fishing for compliments” has nothing to do with the word “fishing” when you’re fishing in the sea.

Where the difference between perception and understanding starts to get muddied is that both the providers of OCR-based solutions and we ourselves at CloudTrade offer services which are based on a combination of the two.

You can’t have one without the other

You can’t, after all, have understanding without perception (unless you’re some sort of yogi floating over a mat in the Himalayas), or perception without understanding (imagine trying to find your way around the Tokyo underground system when you don’t speak Japanese). CloudTrade and OCR-based solutions both need these two elements, because providing this service means not only extracting the right numbers and letters from the documents that are sent to us but also understanding them well enough to recognise that, for example, “quantity 1” in an order line next to “car mats” is probably referring to a pack of 4, whereas the same phrase next to “Lamborghini Veneno Roadster” is unlikely to be referring to a pack of 4 of them at all.

Traditionally, OCR-based solutions have focussed on the perception side of the problem because that is where they have invested the bulk of their R&D, leaving the understanding part to be provided mostly by humans.

The value is in the understanding

CloudTrade, on the other hand, has invested all of its R&D efforts in understanding, and has succeeded in bypassing the perception part completely by focusing on “data” documents such as “data” PDFs (where, for example, the letter “s” is unambiguously stored as the letter “s” rather than as a set of drawing instructions resulting in something which could look like the letter “s” to the human eye).

Data PDFs do not need OCR and can therefore be thought of as producing a “perception” result which is 100% accurate. 100% perception is the key enabler for the process of understanding, because it allows a highly sophisticated natural language analysis to take place. There is no fear that the logical steps within that analysis will be broken by some stray spanner in the works which changes the word “battery” to “hattery” or omits a very important decimal point in the phrase “don’t exceed the recommended dose of 1.234 ml every 24 hours”.

Providing the fuel for automation

Sophisticated systems of understanding remove the need for human operators and allow services to operate in a fully automated manner. At the time of writing, CloudTrade is processing ten million documents a year in this fashion. As soon as errors in perception are introduced, such as by using OCR, failures start to occur in the grammatical rules which underpin the process of understanding, and more and more human intervention is needed resulting in less and less automation.

OCR solutions, by contrast, operate in this field by embracing the human element of document processing. Their advantage is that they are not limited to processing data PDFs. Their disadvantage is that they cannot fully automate.

To assume is to…

The second way in which the difference between perception and understanding has been muddied is in the technology behind OCR, which has now made inroads into the world of understanding. To quote Douglas Hofstadter from his seminal paper on OCR and AI called “On Seeing A’s and Seeing As”:

“A tacit assumption is thus that the components of sentences–individual words, or the concepts lying beneath them–are not deeply problematical aspects of intelligence, but rather that the mystery of thought is how these small, elemental, “trivial” items work together in large, complex (and perforce nontrivial) structures.”

Douglas Hofstadter

This assumption is certainly true with data PDFs, and that “mystery of thought” is clearly where CloudTrade has put in all of its R&D efforts. The need for OCR might disappear completely if all interactions become electronic and “data” based documents become the norm. If it does not, however, then the most promising future for OCR is likely to come out of a hybridisation of perception and understanding.

Variety is the spice of life? Not for data.

Although, as I said earlier, OCR makes mistakes such as reading “fjsh” for “fish”, what it actually does is identify lists of variations rather than hard-and-fast answers and then present those variations, with their individual certainty values, to a user for arbitration (e.g. it could be “fjsh” (60%) or perhaps it’s “fish” (50%)). OCR vendors can then use dictionaries to automatically strip out nonsense words like “fjsh” and perhaps narrow down the possibilities to arrive at the right answer. This doesn’t work, however, when the OCR mistakes still result in words present in the dictionary, or when a word being considered is not necessarily an English word at all (like a part number in a catalogue).
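To make the dictionary step concrete, here is a minimal sketch in Python. It is not any vendor’s actual implementation: the function name, the tiny dictionary and the certainty values are all made up for illustration. It shows the filter working for a nonsense word and then failing in the two ways described above.

```python
# Illustrative sketch only: candidate words and certainty values are invented.
# Each candidate is a (text, certainty) pair, as an OCR engine might report
# several alternative readings for a single word.

def filter_by_dictionary(candidates, dictionary):
    """Keep only candidates found in the dictionary, highest certainty first."""
    known = [(word, score) for word, score in candidates
             if word.lower() in dictionary]
    return sorted(known, key=lambda pair: pair[1], reverse=True)


# Hypothetical, deliberately tiny dictionary.
dictionary = {"fish", "dish", "wish"}

# Case 1: the nonsense reading is stripped out, leaving the right answer.
nonsense = filter_by_dictionary([("fjsh", 0.6), ("fish", 0.5)], dictionary)

# Case 2: both readings are real words, so the dictionary cannot arbitrate
# and the (possibly wrong) higher-certainty reading still comes first.
both_real = filter_by_dictionary([("dish", 0.6), ("fish", 0.5)], dictionary)

# Case 3: a part number is not in any dictionary, so nothing survives.
part_number = filter_by_dictionary([("AB-1O4", 0.6), ("AB-104", 0.5)], dictionary)

print(nonsense)
print(both_real)
print(part_number)
```

In the second and third cases the dictionary adds nothing, which is why a human usually ends up making the final call.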

A far more sophisticated solution would be to feed all these variations in perception straight into the “understanding” engine and then allow the latter to crunch through all of the grammatical options.
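As a rough illustration of that idea (and emphatically not CloudTrade’s actual engine, which works with grammatical rules), the sketch below stands in a single arithmetic rule for “understanding”: on an order line, quantity times unit price should equal the line total. The function name, the candidate readings and their certainty values are all assumptions made for the example.

```python
from itertools import product

# Illustrative sketch only: a single arithmetic rule stands in for a real
# "understanding" engine. Candidates are (text, certainty) pairs.

def consistent_readings(qty_cands, price_cands, total_cands, tolerance=0.005):
    """Try every combination of readings and keep those where
    quantity * unit price matches the line total, best combination first."""
    results = []
    for (q, qc), (p, pc), (t, tc) in product(qty_cands, price_cands, total_cands):
        try:
            matches = abs(float(q) * float(p) - float(t)) <= tolerance
        except ValueError:
            continue  # a reading that is not a number cannot satisfy the rule
        if matches:
            # Combined certainty: product of the individual certainties.
            results.append(((q, p, t), qc * pc * tc))
    return sorted(results, key=lambda item: item[1], reverse=True)


readings = consistent_readings(
    qty_cands=[("2", 0.9)],
    # The most "certain" price reading has lost its decimal point.
    price_cands=[("1234", 0.6), ("12.34", 0.4)],
    total_cands=[("24.68", 0.8), ("24.63", 0.2)],
)

for values, certainty in readings:
    print(values, round(certainty, 3))
```

Here the reading with the missing decimal point is rejected even though perception rated it highest, because no combination containing it satisfies the rule. The cost is that every extra list of candidates multiplies the number of combinations to check.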

This is something that we have experimented with at CloudTrade, since it is possible for us to connect to OCR as the “perception” part of our solution. In doing so we have, indeed, found that with a bit of patience and tailoring we can deliver an OCR-based service which is just about acceptable and automatic for header-level capture, but it’s too painful and slow to be feasible on complex documents or on scanned images that are less than “near-perfect”.

Dictionary lookups have been a standard feature with OCR vendors for some time. Advances in Machine Learning may well improve matters further in the future. I doubt very much that any improvements will happen with things like invoices and purchase orders, where a lot of the key information doesn’t have very much context to draw upon to allow significant automatic corrections to be made, but there could be mileage in using this technology with historical documents written in proper flowing prose.

OCR may well have an interesting future when it comes to scanning documents that were written in the past, but it’s more than likely to now be a past technology when it comes to documents that are to be written in the future.



Women in Technology

CloudTrade – a woman’s world?

Gender Equality in Technology

Well, not quite yet… But we’re getting there! Here Amee Patel, Operations Manager, discusses some of the challenges of being a woman in a male-dominated industry and how CloudTrade has changed our practices to encourage female applicants for job openings.

Being a woman in a typically male-dominated environment, like tech, can be a tough gig, especially if you are the first female to join the technical team, as I was at CloudTrade a few years ago. However, I can safely say that I have survived (and now thrive!) in an environment with roles mainly filled by men, and I am continuing to show women that tech is not a scary place and that they can fit in here!

I began my career in 2013 on an IT helpdesk, where I was affectionately known as “Helpdesk Girl” (the name wasn’t quite that nice, but you get the idea), which accurately summed up my duties. After two years of carrying this mantle, the glitz and glamour of commuting into London became too much of a temptation and I started to apply to join tech companies in the big city. When I interviewed at CloudTrade, one of the first questions I was asked was: “You will be the only woman in the company. Is that okay?” I was taken aback. I come from a technical background and I spent three years studying a male-dominated field at university! Of course it was okay, I thought, and ultimately it had to be okay!

Yet during my first week at CloudTrade, the imposter syndrome set in. What am I doing here?! This is far too technical for me! The men in the team are much better than me… I will never be successful here, and so on… But I stuck at it and I worked hard. I made it my business to become a master of my trade. I was supported by management and I felt like I had found somewhere that I could succeed and was not made to feel inferior to my male colleagues.

As CloudTrade is a small tech company that continues to grow, we often recruit to fill new positions. Six months into my employment, CloudTrade employed its second woman to fill a marketing position. Eighteen months after that, we employed our third into a technical role. Within another six months, I was lucky enough to move into a management role within our Operations team, and within that time we recruited another three women. CloudTrade went from no women to six, but this took almost two years – recruiting women into tech roles is just not that easy, and whenever we tried to recruit, we saw a huge imbalance in the gender of those applying.

This gender disparity did not surprise me. Being a woman in a tech company is daunting. Imposter syndrome is real. Feeling like you need to work hard to prove your credibility and gain recognition is extremely common, and it doesn’t feel very fair. These sentiments are borne out by the facts. Women are less likely than men to study STEM subjects, and even less likely to pursue careers in tech.

While CloudTrade actively tried to recruit women to balance the gender divide, the wider sector also saw more global initiatives to support women, which were gaining momentum. We recognised this wasn’t just happening at CloudTrade: it was happening everywhere, and was being brought to the forefront of people’s attention. Here at CloudTrade, to help redress the balance, we reviewed our recruitment process across the various departments – the adverts we were producing, how potential employees perceived the company, and the profile of the people we were targeting. This piece of work changed the candidates we were getting. Suddenly, we were getting applications from women – talented, qualified, ambitious women!

Today, as we approach International Women’s Day, I write this blog post as one of twelve women in a company of forty-four. I am fortunate enough to work closely with these women, who all bring something different to the table in their various roles. It is never easy as a woman to walk into a tech company and not feel a sense of “I don’t belong here”, which is why I feel so proud of CloudTrade’s journey over the last five years, and so grateful for the personal and professional growth I, as a woman, am offered here.