Optical Character Recognition

Did you hear the long tale about the long tail?

Reading Time: 4 minutes
A tail up to 2.4m is pretty long

National Geographic tells us that the giraffe has the longest tail of any land mammal – a giraffe's tail can measure up to 2.4m (apparently).

I asked my children what other animals have a long tail (it should be noted that during the months of lockdown we've been asking more and more abstract questions, so this seemed quite normal to them!). Responses included: the ring-tailed lemur, monkeys (no particular sub-species offered), rodents and our (long-suffering) dog. However, the longest tail relative to body length belongs to the Asian grass lizard, according to National Geographic. Although its tail is only 25cm long, that is over three times its body length.

I've been fortunate to have travelled the world quite a bit for work and fun, and I've had the opportunity to see giraffes, lizards and monkeys up close. I've also had the opportunity to see rodents up close (very close!) recently during a house renovation.

Each of these animals seems quite comfortable with its body and its long tail. In fact, our dog can entertain herself for hours chasing her tail (but that’s another blog altogether).

The only animal that seems to suffer from a long tail is Homo sapiens.

So what do we mean by the ‘long tail’?

Simply put, long-tail documents in the business world are the low volumes of documents received from a high number of senders. This is typically seen in the Accounts Payable and Finance departments, but the premise can also be applied to any type of business transaction requiring documentation.

Having worked with ERP systems for 20 years, I've seen the challenges, pitfalls and benefits of getting to grips with the data trail. So, drawing on those years, I've pulled together my thoughts on the different solutions available to address these challenges.

Optical Character Recognition (OCR) – Recognising the limitations

Optical Character Recognition (OCR) was a game changer for the business world. With its earliest inventions originating in 1870, and in widespread application use since the 1960s, it has helped streamline business processes and, to a point, support automation. Incorporating OCR into business processes was a good addition, but its limitations always meant that other solutions were needed to support the process. Misread characters, changing document structures and the manual intervention needed to ensure high levels of data accuracy have pushed organisations to look for more sophisticated technologies to improve automation.

Also, when you think about it, barcoding, loading scanners and correcting mistakes doesn't really support the digital transformation organisations are looking for or need. It seems like OCR tech needs more manual processes to solve a manual process – odd, huh?

Electronic Data Interchange (EDI) – Interchangeable but inflexible

Many organisations turned to Electronic Data Interchange (EDI). This is a much more reliable method of capturing data accurately and at speed. Large files transmit data in an agreed format and allow seamless integration between sender and receiver. Ok, sounds great! But the challenge here is that it needs both sides to commit to a technical and operational strategy and often requires a high financial commitment (relative to the value of the document processed) to set up and maintain. So, EDI is ideal for the highest volume senders, but for the long tail? Most definitely not.

Purchase Order (PO) Flip – PO Flip or PO Flop?

The emergence of portals, offering the supplier the 'opportunity' to do a PO Flip to create an invoice, seems like a perfect option to reduce the Accounts Payable long tail – in theory, at least. The major challenge here is that suppliers don't want to re-key information or have to manage multiple portals to raise their invoices. So this idea might just be a PO Flop.

Although these technologies have helped reduce the long tail slightly, they do not provide the coverage needed. Long tail? Still a problem.

The new tech on the block?

Emerging technologies like Robotic Process Automation (RPA) and neural networks will, I believe, offer some assistance in this area in the future. Although not new tech, their increasing involvement in the document automation domain has been notable.

More organisations are exploring RPA to great effect in areas such as sharing data (critical at the moment for the response to Covid-19) and accelerating tasks such as the onboarding of staff. However, for processing documents the projects are falling short of expectations. Many are proving costly or just unreliable, as the systems are still reliant on dated technology at the source (OCR) to capture the data on entry. Using RPA to process inbound documents is arguably a problem for the entire supply chain, not just the long tail, and many projects still require manual correction of data. It's still a problem either way.

Natural Language Processing (NLP) is the technology on which CloudTrade's service is based. This is a proven subset of Artificial Intelligence, having been around for many years, that enables our service to understand the logic and meaning of a document. Once that logic is understood, the data is available with 100% accuracy, regardless of 'problems' like data moving on the page.

Additional benefits, such as no change in process for the supply chain, deployment within weeks and no manual processing, mean it's clear why there is such high demand for the service, now more than ever.

Did you know?

The tufted ground squirrel (nicknamed the ‘vampire squirrel’) has a tail that is 130% of its body volume. This is to confuse its predators.

The long tail seems to cause confusion for a lot of solution and service providers, but there is a way to manage it. Does CloudTrade solve the challenge of the long tail? Simply put, yes. Our core solution, Universal Capture, processes long-tail documents and automates much of the processing with great accuracy. Perhaps not quite as well as the short tail (these can be truly automated with perfect accuracy), but since many of our clients have elected to close their post rooms, get rid of their scanners for documents such as invoices and use our solution instead, I think it's the closest the world has seen yet.


Our journey from Monolith to Microservices

Reading Time: 5 minutes

Richard Hooper, Head of Systems, explains how CloudTrade upgraded its software environment to cope with increased demand and some of the problems solved along the way.

Just over a year ago at CloudTrade, we made the jump and decided that containers (using Kubernetes) were the answer to all our application issues. In this article I will examine why we chose to jump on the container bandwagon, which could be termed the 'latest tech craze', as well as how we solved some of the issues along the way. But firstly, a little about me.

About me

I'm Richard Hooper, Head of Systems and a Microsoft MVP in Azure. I started with CloudTrade back in March 2018 as a Systems Architect. As CloudTrade grew, so did my responsibilities, and I now manage a team that looks after the internal servers as well as the desktop, Azure estate, and the whole production estate.

My passion lies in all things technology, especially Microsoft Azure. In my spare time I blog about Azure at Https:// and can be found hosting the North East Azure User Group.

Was a container system the right thing to do?

It's a question I ask myself often. With the rate of change in the cloud world you have to keep questioning and evaluating, as a new technology comes out almost monthly – well, it seems to anyway. Every time I ask, I come to the same conclusion: yes. And as we have become more familiar with microservices and what we need from our application, I know we made the right choice.

Why microservices?

The application that powers CloudTrade's unique data acquisition technology, Gramatica, started life as a sort of desktop application. It needed the user to be logged in, and it wrote a lot of files onto the server or desktop. One good thing is that when the application was first created, it was built in steps, and each step had a sort of handover using files. When I found out about this it was a relief, as it would make the move to microservices easier.

Why change then, I hear you ask! Well, for a start, the management of the server and application became difficult, especially if you wanted to do any kind of automated patching, and I certainly did not want to keep patching servers out of hours. But the main driving force for the move was scalability – the dream for a software business.

With the way the application was created, and all the file access, scaling at the time was a right pain! First you had to run more copies of the application per user if there were enough free resources on the server, or spin up a new server and migrate the user and application to it. Sometimes we would also hit disk issues: capacity and IOPS.

With the move to Kubernetes, an open-source container-orchestration system – and more specifically Azure Kubernetes Service (AKS) – this headache has gone away. Our AKS cluster utilises Virtual Machine Scale Sets (VMSS), which allow the cluster to autoscale its nodes when resources become constrained, all done automatically. Another great feature of Kubernetes is the way it can automatically scale your deployments (a deployment is a collection of pods; a pod is a wrapper for containers in Kubernetes). How awesome is that?! But all this awesomeness still came with issues – issues we had to get over to make this journey a true success.

Oh no, not issues!

Yes, with any journey you are always going to have hurdles along the way, and this one is no different. One of our main issues is that part of our new microservices application needs to run in Windows containers. This was the problem we tried to fix first – some may say that was a mistake, as Kubernetes did not support Windows containers at the time, but Docker did!

To get round this issue, we are currently running the microservice on Windows Server 2019 in a VMSS using a custom hardened image. We currently run six containers per node: one for configuration and five for actual processing.

Scaling became a bit of an issue as we moved more onto this new microservice. As we are now using RabbitMQ instead of the file system, we came up with a solution: an Azure Logic App queries the RabbitMQ cluster, which runs inside our AKS cluster, every 15 minutes. It checks the queue size and how many containers are consuming the queue, and then scales the VMSS nodes up or down. Unfortunately, we had to settle on 15 minutes for the check, as the nodes can take a while to come up.
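As a rough illustration of the decision that check makes, here is a minimal Python sketch. The per-node figure of five matches the five processing containers mentioned above, but the function name, node limits and rounding are my own illustrative assumptions, not our actual Logic App (which reads the queue depth from the RabbitMQ management API rather than taking it as an argument).

```python
import math

def desired_nodes(queue_depth: int,
                  jobs_per_node: int = 5,
                  min_nodes: int = 1,
                  max_nodes: int = 10) -> int:
    """Turn a RabbitMQ queue depth into a VMSS node count.

    Each node runs five processing containers, so one node works off
    roughly five jobs at a time. The min/max limits are illustrative.
    """
    if queue_depth <= 0:
        return min_nodes  # nothing queued: shrink back to the floor
    needed = math.ceil(queue_depth / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

A backlog of 12 messages would ask for 3 nodes, while an empty queue drops the scale set back to its minimum – the same up/down behaviour the Logic App drives every 15 minutes.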

We are currently rewriting this application to run in Linux, so my tip is: if you can get away with not running Windows containers, then do it!

As we are using RabbitMQ to scale our microservices that run inside the AKS cluster, we were unable to utilise any of the basic container autoscaling that comes with Kubernetes. After some research we came across KEDA, an open-source project by Microsoft and Red Hat. KEDA extends the basic container autoscaling and allows us to scale based on RabbitMQ queue size, and more quickly than the Logic App approach described above. We were quite lucky that KEDA went GA just in time for us to release the second phase of containers.

What’s next?

We are continuing our journey, with the next phases being worked on. We hope to get the release into production by the second half of this year. Once each step has been finished, we will be left with what we are calling a skeleton of our old application, which will still be running on the servers. Some time will need to be spent removing these to complete our journey, as we envisage there being no need for any servers apart from the AKS nodes.

We will also continue with another journey. This one is to utilise tools like GitHub Actions and Azure DevOps which will help to automatically build and release each microservice to our test and then production AKS cluster. This will enable us to fully embrace the ‘DevOps mentality’ by not only improving internal processes, but also improving the application.

Feel free to reach out if you would like to discuss any of the above – thanks for reading!


CloudTrade specialises in converting documents (with 100% accuracy) so computers can read them.

Learn more about CloudTrade and our technology here.


It may not be rocket science, but it can be complex

Reading Time: 3 minutes

Reading documents may not be rocket science, but computers struggle to do what humans find simple. Is technology finally able to read documents in the same way as humans?

CloudTrade are in the business of extracting and interpreting information from documents which have been written to be understood by people, not computers.

This is probably one of the most frustrating problems in the history of IT.

Reading stuff out of documents feels easy to us, as people. Nowadays anything to do with people communicating with other people feels easy, and we ultimately think that since computers are cleverer than we are (in many ways), if a person finds a task easy then a computer should find it no trouble at all.

The problem is: we tend to forget just how clever people are. Even if you struggle with long division, that brain of yours, which controls everything from getting out of bed in the morning, to washing, driving to work, eating lunch, watching TV and so on, leaves the most powerful computers in the world floundering at the starting post like electronic tortoises.

Communicating with other people, in speech or in writing, falls into that category of stuff that your brain is very good at but computers struggle to do. People get a lot of practice at it. No computer in the world could have read what you’ve read so far and have any idea of what I’m talking about, but you’ve understood me completely (well I hope so!).

CloudTrade aren't in the world of building robots, of course – not even robotic tortoises. Neither are we trying to write a full natural language processor which could understand everything that a human being might want to say to it. Those sorts of achievements remain firmly within the realms of science fiction. However, what we have built at CloudTrade is a natural language processing engine which can understand those documents we have programmed it to understand. This is much more sophisticated than the approaches otherwise prevalent in the marketplace.

For example, just hoping that a particular bit of information on a document (for instance, a VAT number) might always be found in the same place on a page just isn't going to work. Neither will the idea that you might be able to go hunting for some unique piece of text and then look a predetermined distance and direction away to find what you're after. These sorts of techniques work occasionally, but most of the time pages jiggle around, and the chances of finding something which is not only guaranteed to be unique, but also always in the same location relative to what you're looking for, are tiny.
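To see why, here is a toy sketch of that "anchor plus offset" technique, modelling a page as simple (x, y, text) tokens. The layouts, coordinates and tolerance are invented for illustration – the point is that the same rule stops working the moment the layout shifts.

```python
def find_relative(tokens, anchor, dx, dy, tol=5):
    """Find the anchor token, then read whatever sits a fixed offset away."""
    for x, y, text in tokens:
        if text == anchor:
            for x2, y2, text2 in tokens:
                if abs(x2 - (x + dx)) <= tol and abs(y2 - (y + dy)) <= tol:
                    return text2
    return None  # anchor missing, or nothing at the expected offset

# Works while the layout holds still...
page_v1 = [(10, 50, "VAT No:"), (80, 50, "GB123456789")]
# ...but the identical rule fails once the page "jiggles around".
page_v2 = [(10, 50, "VAT No:"), (80, 95, "GB123456789")]
```

On `page_v1` the rule happily returns the VAT number; on `page_v2`, where the value has merely moved down the page, it returns nothing at all – which is exactly the fragility described above.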

We frequently get people coming to us after they've tried these sorts of solutions and given up in frustration, and I sympathise with this scenario. Often they thought the problem they had was an easy one, so they bought into an easy solution, more often than not wrapped up with some sort of neural network element, which then proved unhelpful. They then discovered that this easy solution didn't work, and that they had to spend all of their time filling in for its mistakes, or being told that they had "yet another" special case which would require costly scripting or programming.

CloudTrade are simply not like this.

Ok, I know anyone can make that sort of claim, but I like to think that we put our money where our mouth is by offering our solution as a full service, rather than as a software licence where you may be left to find out for yourself whether the solution works effectively or not. We configure it to fit your requirements and when it’s up and running we correct its mistakes and maintain it for as long as you stay with us. Furthermore, we’ll charge you the same price for every document we handle, no matter how awkward or complicated it may be.

We’re the only company prepared to do this because we know, ultimately, that we’ve built the right solution. It may not be rocket science, but it’s actually pretty clever, and it turns out that you need to be pretty clever if you want to solve this problem.



Self isolation? No problem – keep your business running 24/7

Reading Time: 4 minutes

With the world slowly putting itself into self-isolation, never has it been more evident that digitising your business practices enables increased flexibility in where your team works and helps to keep your business 'open as usual'. Here, David Cocks, CloudTrade CEO, discusses how the move towards digitisation and automation can not only help keep your business running during times of crisis such as these, but also benefit your business in the long run.

In the modern world, customers have come to expect services 24/7. Companies like Amazon have set the bar very high when it feels like that drone delivering the next brown parcel is already overhead before you have even clicked “proceed to checkout”. However, not every company has the luxury of scale to ensure they always have staff ready to complete the onsite business processes, whatever the peaks in demand.

Some companies now incorporate flexible working strategies to encourage home working where it fits with the business needs, but many tasks remain dependent on the physical presence of staff within the business office. This often includes the need for access to paper documents or the use of on-premise technology, be that computers, printers or even scanners.

In these difficult times, we are also forced to accept the real possibility of large numbers of staff not being able to travel to the physical business office through actual illness or community health protection necessities. The more we modernise our business so our staff can work from anywhere, the more we are resilient to the threats of massive disruption.

The state-of-the-art way to ensure your business meets scalability and business continuity demands is to fully automate the critical processes. Persisting with manual, repetitive tasks in the spirit of "if it ain't broke, why fix it" will not work when faced with a global crisis that is emptying streets and offices and leaving commuter trains ghost-like. Also, your customers will not tolerate excuses as to why you're unable to deliver the goods and services promised, especially if these are business critical – they will look to others who have identified the risks of reliance on outdated manual processes and have taken action to mitigate these issues.

What do I mean by ‘automating the critical processes’?

Firstly, look at your internal processes and understand why things are done manually. This is often because of external factors, notably data not being available in a form or quality consistent with the requirements for straight-through processing (STP).

Just think of a sales order arriving by email. The customer service team have to open the email, save the attachment, scan the PDF, correct the OCR (scanned data) and finally upload the data into the fulfilment system. One person can process maybe 10 orders an hour – max! If staff are not available or not on-site, then there are no orders in your system, unhappy customers and lost revenue. All you really need is the correct data in a machine-readable form, without relying on scanning equipment, and you can go straight through to fulfilment. Your order is shipped in minutes, not days.
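To illustrate the alternative, here is a minimal sketch of the first automated step: lifting the PDF order straight out of the inbound email, with no printing, scanning or OCR involved. It uses Python's standard `email` module; the function name, and the assumption that orders arrive as PDF attachments, are mine for illustration.

```python
import email

def extract_order_attachments(raw_message: bytes) -> dict:
    """Pull PDF attachments out of a raw order email.

    Returns {filename: payload_bytes}. In an automated pipeline these
    bytes go straight to the data-extraction step, never near a scanner.
    """
    msg = email.message_from_bytes(raw_message)
    attachments = {}
    for part in msg.walk():
        if part.get_content_type() == "application/pdf":
            attachments[part.get_filename()] = part.get_payload(decode=True)
    return attachments
```

From there the extracted data can be posted directly into the fulfilment system – the sketch simply shows that the "open, save, scan, correct" steps above are replaceable by a few lines of code once the data is machine-readable.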

Second is to use the right technology, designed to automate and deliver touch-free processing. I don't mean systems designed to make a manual process more efficient, where you still need elusive staff to complete routine tasks. Systems that are designed to achieve guaranteed high levels of automation and accuracy are essential for a true, scalable, high-volume straight-through process.

A system that can guarantee quality must be deterministic. That is, it must be clear how it works, why it works and (probably just as important) when it fails, why it has failed. If your business systems are non-deterministic, the output cannot be predicted and you don't know how they work, then you can't drive for full automation. Perhaps you already have an invoice scanning/OCR service – think how annoying it is when it sometimes gets the data correct, and other times corrupts or misses information that appears clearly on the original document. The result is that you need to check each document manually.

It is only when the mechanisms of automation are transparent that you can achieve continuous improvement. A best-endeavours system that (maybe) improves in an obscure way can never deliver the straight-through processes you should strive for.

So, business leaders: think automation and think STP. Modernising your business is not vanity, or even just a way of improving margins – it is a necessity for survival.

I wouldn’t want you to be the next Kodak, Blockbuster, or more recently, Thomas Cook.

CloudTrade specialises in automating traditionally manual business processes

such as invoice processing.

Want to know more about electronic invoicing and the different methods available?


HI, the new AI – What the Terminator got right.

Reading Time: 2 minutes

Ever since Schwarzenegger told a desk sergeant that he'd be back, and then was back five minutes later crashing through the wall of the police station, the world has been in love with the idea of AI.

And why not? The prospect of machines taking away the mundane tasks of the day-to-day, freeing civilisation to live a decadent and carefree life is a dream to aspire to – right?  And if some of those machines turn out to be human hating cyborgs then surely that’s a price worth paying…

Generations have been working away to create that first version of Skynet (the fictional superintelligence system). But rather than teaching the system how to co-ordinate a nuclear strike (hopefully we have learnt something from the film War Games), companies have instead focussed on the more mundane but ultimately monetisable day-to-day tasks that occupy the humble office worker.

While not something that generally lends itself to a big budget movie, it’s obviously a subject that people are hoping will create a big budget company.

One area on which companies have focussed is data extraction from documents. Around the world, millions of documents are having their information extracted and placed into a target system. Sometimes new technology is used to perform the task, but often (more than you would think) data extraction is carried out by people just keying in the data.

A prime target for termination, you might say? But this is where we begin to see more parallels with a movie script than you would expect, as marketing teams in these companies polish the reality of AI into something Oscar-worthy.

A quick Google for AI data extraction will fill your screen with companies using buzzwords like they are going out of fashion: Neural Networks, Deep Learning, Powered by AI, Machine Learning, Big Data, Document Understanding Platform, Set and Forget, Pre-trained AI models… The list goes on and on.

But the reality is that AI hasn't quite lived up to the dream that Hollywood has sold us. Despite the claims out there, AI is still in the early stages: nobody has created a truly work-killing system. Either AI is only involved in a small part of the end-to-end process, or it needs a large amount of human interaction to train and review the output, simply moving the human costs of processing from one area of a business to another.

So while we wait for AI to catch up with its own hype – and that might be 2030 or even 2060 – we need to look for solutions that harness current technology to solve today's problems. And perhaps we need to harness HI (Human Intelligence) to do this while the machines catch up with us.

Computers can already do amazing things when given the proper guidance. That is how we approach problems here at CloudTrade: using our patented data extraction and interpretation software, coupled with our in-house expertise, to quickly and efficiently teach our systems without the trial and error that AI needs.

So maybe Terminator did get something right. You need something part-man and part-machine to deal with difficult problems…