Can a technical service really be 100% reliable?

100% reliable, 100% accurate?

Can a technical service really be 100% reliable? In this blog post, Richard Develyn, CloudTrade CTO, with a nod to Heisenberg, looks at what is meant by 100% accuracy in the world of computers. He discusses the shortcomings of OCR technology, and describes how CloudTrade has devised a systematic methodology to reach the goal of offering a service that is 100% accurate.

How is it that one can claim to offer a service which is 100% accurate? One hundred per cent seems too absolute; too certain; too devoid of margin of error. Nothing is 100% accurate after all, is it?

Perhaps you think it isn’t because you’ve heard of something called Heisenberg’s Uncertainty Principle. Even if you don’t exactly know what that is, the fact that it has the word “uncertainty” in the middle of it and is clearly written by a very clever German physicist probably suggests to you that no one should ever make the claim of being 100% certain about anything.

Well, ok, but Heisenberg’s Uncertainty Principle applies at the quantum level, and if it’s going to affect anything it’ll probably kick in when processors become so small that computers start to become “quantumly” unreliable (unless, of course, quantum computers somehow or another prevent it).

But I doubt you came to this blog to read about quantum computers.

Are computers really 100% accurate?

Normal computers are, in fact, 100% reliable, in much the same way that gravity is 100% reliable. The last time that we witnessed a mistake being made with a computer processor was in 1994 with the now infamous (if you happen to move in those circles) Pentium FDIV floating point hardware bug that apparently resulted in 1 in every 9 billion floating point divisions coming out with the wrong answer. That might not seem too terrible, but it was a pretty serious problem at the time, causing Intel to recall the chip at the cost of just under half a billion dollars, which would be three quarters of a billion dollars in today’s money.

You can’t really blame the Pentium FDIV processor for the mistake, though. The poor bit of silicon was only doing what it was told. The problem happened because human engineers made a mistake when they built it. This hasn’t happened since, not because human engineers no longer make mistakes, but because human engineers have improved the way in which they test for and correct the mistakes that they make.

How can unreliable people build reliable systems?

The reason people are sceptical about a claim for 100% accuracy is that they imagine that everything must map down to some sort of human process in the end, and human beings are notoriously not 100% reliable. The people who produce computer processors, however, seem to have bucked that trend.

About ten years earlier, in the world of software, Donald Knuth, an eminent computer scientist, offered a reward of $2.56, doubling every year, to anyone who could find a bug in TeX, the very sophisticated typesetting application that he had written. Bugs were found at first, admittedly, and Knuth duly paid out, but even though the software remains in widespread use to this day, no one has reported a new bug in it for about the last 12 years.

So, it would seem that it’s possible to produce 100% reliable software as well as 100% reliable hardware. Plenty of software is, actually. Sure, plenty isn’t, but the majority of the stuff that controls the critical aspects of our lives is 100% reliable. We’d soon know if it wasn’t. It might not be 100% reliable when it first comes out but, rather like Donald Knuth’s TeX program, it gets to 100% accuracy with a bit of time and effort.

Here’s the key question: how is it possible for computers and computer programs to be 100% reliable when they are built by people who clearly aren’t?

It really comes down to three things:

  1. Having a well-understood and limited problem to solve
  2. Having the right tools to solve the problem
  3. Ironing out the bugs

The problem with OCR

Optical Character Recognition (OCR), for example, as a technology, fails to achieve 100% reliability because of point (1) – the problem is not well understood or limited in its scope. There are many ways of representing letters to the human eye – some estimates give the number of fonts in existence today at about half a million. These fonts don’t adhere to some sort of hard-line rule about what differentiates a “b” from an “h” or an “i” from a “j”. They’re “artistic”, and we all know what that means: they don’t adhere to rules. Add to that the imperfections that you get from scanning paper images and you can see how OCR is really trying to solve an impossible problem.

Identifying the problem

We don’t try to solve the OCR problem; we avoid it entirely by extracting raw data from documents which already have that data unambiguously present within them. Data dumps are of no use to anyone, of course, so where we, at CloudTrade, concentrate all of our efforts is on solving the problems of data understanding, allowing us to properly identify and label data before we pass it on to our clients. These problems are well defined, allowing us to achieve 100% accuracy by following the next two steps on the list.

Choosing the right tools

First of all, we use the right tools. “Let’s get to the moon” might be a very well-defined problem but you’re not going to get there with a hammer and chisel. The better the tools you have, of course, the easier the problem is to solve. We have invested many years of development at CloudTrade in producing a rules-writing engine which allows people, without an IT background, after about a month of training, to write the necessary rules to extract and interpret data from a document in about fifteen minutes to one hour depending on its complexity. This is the key to our success as a company.

Ironing out the bugs

Of course, sometimes the rules writers get it wrong, because they’re only human, and that’s where point (3) comes in. When we make mistakes, we fix them, and fixed mistakes stay fixed.
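One way to picture “fixed mistakes stay fixed” is a deterministic rule paired with a growing set of regression cases that is re-run after every correction. The sketch below is illustrative only: the `extract_invoice_total` rule, its regular expression and the sample texts are hypothetical, and this is not CloudTrade’s rules engine.

```python
import re
from decimal import Decimal
from typing import List, Optional, Tuple

# Hypothetical rule: read the amount that follows a "Total" label at the
# start of a line. The optional "Invoice " prefix, the currency sign and the
# thousands separator stand for fixes made after earlier mistakes.
TOTAL_RULE = re.compile(
    r"^(?:Invoice\s+)?Total\s*:?\s*£?([\d,]+\.\d{2})\s*$",
    re.MULTILINE | re.IGNORECASE,
)


def extract_invoice_total(text: str) -> Optional[Decimal]:
    """Apply the rule; the same input always gives the same output."""
    match = TOTAL_RULE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


# Every corrected mistake is recorded as a case, so it cannot quietly return.
REGRESSION_CASES: List[Tuple[str, Optional[Decimal]]] = [
    ("Subtotal: 90.00\nTotal: 108.00", Decimal("108.00")),  # must skip "Subtotal"
    ("Invoice Total £1,250.50", Decimal("1250.50")),        # prefix, sign, comma
    ("No amounts here", None),                               # nothing to find
]


def run_regression() -> List[str]:
    """Return a description of every case the current rule gets wrong."""
    failures = []
    for text, expected in REGRESSION_CASES:
        actual = extract_invoice_total(text)
        if actual != expected:
            failures.append(f"{text!r}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    print(run_regression() or "all regression cases pass")
```

Because the rule is a pure function of the document text, a case that passes today keeps passing until someone changes the rule, and the regression run catches that change before it reaches a client.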

The trick for getting to 100% accuracy is in having a repeatable process that you can correct as you go along. You can’t get to this sort of reliability if you use people in your processes because people can’t be fixed in the same way that computers can (even the most ambitious psychoanalyst in the world wouldn’t try to make that claim). Neural Networks, which sort of simulate human thinking, can’t be fixed either because we don’t actually understand how they reason any more than we understand how human beings reason. This is an area of considerable research these days because our inability to discuss reasoning with neural networks greatly limits our ability to use them. Perhaps one day we’ll have neural network psychoanalysts. I wonder whether they’ll also be neural networks. The mind boggles.

So in conclusion, the reason that we can justifiably claim to deliver a 100% accurate service is because of these three key facts:

  1. We limit the problem that we’re trying to solve to something well scoped and understood, because we avoid OCR
  2. We have a system in place purpose-built to solve our particular problem
  3. We have processes in place to correct any mistakes that we might make when we use it.

No human errors, no OCR errors, no neural network errors. Just a repeatable programmable system. That’s how we get to 100%.

Document Processing: Time for a gear change?

Document processing
Is it time for a gear change?

When faced with today’s significant challenges, resilience is key. In this blog post, Steve Britton, CloudTrade Director of EMEA Sales, shares his passion for motorsport and explains how the verve and determination seen on the racetrack is just what shared service centres and global business services currently require. Not only do business leaders need to show resolve, but they must also embrace innovation to survive. He explains how the CloudTrade document-processing service enables businesses to move on from the considerable limitations of using OCR systems. The time for disruptive change is now. Do you have process resilience?

We live in challenging times: Covid-19, lockdowns, presidential elections and a faltering economy. What a year, and on top of all this we need to run our businesses to pay bills and our employees, and to deliver a return to our shareholders.

I want to talk about the challenges we face with the collection of data required to run our businesses, and how critical this is to remain competitive. Without data we can trust, how are we meant to run our businesses, pay bills accurately, process orders efficiently, and satisfy our auditors and regulators?

Why challenge the status quo?

Is this a time to batten down the hatches and weather the storm, or should we rise to the challenge and take this opportunity to embrace change and reap the rewards? In business we must innovate to remain competitive. Even more so now. We need to be resilient to the demands we face; we can’t slam the brakes on change and hope that will be enough. Now is the time not only to face the issues head on but also to challenge the status quo if we are to succeed. Our competitors will seize the initiative if we don’t.

I am a petrol head and drove a rally car semi-professionally for a while, but children, work demands, and grey hairs have put an end to that. To satisfy my passion now I watch Formula 1 and the WRC whenever I can. Motor sport is hugely competitive and constantly challenged with new FIA regulations; the successful teams embrace change and use this to push the boundaries; they constantly innovate and hone their machines to be world class. There is no “I” in team, and indeed without all the team members working together in a coordinated but agile way, success would not happen.

What does Lewis Hamilton have in common with business leaders? 

However, as with every great team there is the ‘one’ person that brings them together: in the case of motor sport, it’s the one who pilots the car. This is the sharp end, where the rubber meets the road. The driver, confident in the design and build of the car, takes the machine to its limits, pushing the vehicle and the team to achieve that world-class status. We have all been amazed at the tenacity and sheer determination that has elevated Lewis Hamilton to the position of holding the greatest number of F1 wins ever. Many will argue that the likes of Fangio, Moss, Stewart and Senna were in a different league, and indeed their era of motor sport and car development was very different – all heroes in my book – but one thing they had was an unquenchable desire to succeed, and this required constant development of man and machine and a sheer determination to win against all odds.

This necessity for innovation and the continued drive to excellence is the same for business leaders, especially when tackling the challenges we face today. We all benefit from motorsport innovation. Consider how carbon fibres, turbo technology, electric motors and battery technology impact on our daily lives.

The document-processing story 

Shared service centres and global business services have innovated and evolved from cost centres to profit contributors. The early focus on accounts payable has expanded to embrace all aspects of purchase-to-pay (PTP) and order-to-cash (OTC). It was simply a case of innovate and add value to survive. I remember in the early 2000s how scan-to-fiche migrated to scan-to-tape and computer-output-to-laser-disk like the HP Surestore and NAS storage. These on-premise solutions rapidly migrated to the ‘Cloud’ with all the associated benefits of free text search, unlimited storage and instant access to data. This created the ‘Big-Data’ era and the need to control data exchange and storage, supported by legislation such as Sarbanes-Oxley, which followed the Enron scandal, and more recently by GDPR requirements, to protect how we use the data we collect. Our businesses are managed and controlled by data. Incorrect or false data can have a devastating impact. Most data that businesses rely on is created by third parties and then has to be passed to us for processing, action and analysis.

Document processing: cost versus accuracy  

My focus for the last 21 years has been business-process automation, and specifically the digital transformation of inbound documents. These documents create the very data we rely on to manage our businesses. When I started in this industry OCR had a purpose as most documents were received as images or on paper, but OCR could never deliver 100% data accuracy and always required manual intervention. To keep the costs down, the industry moved the capture offshore to low-cost centres. Very often, however, low cost meant low accuracy. There was a choice between basic, low-cost capture with re-keying and error correction, or maintaining the FTE count onshore to ensure accuracy.

The OCR industry had to improve the extraction quality to support the offshore demand, so it bolted on artificial intelligence and machine learning to try to compensate for these errors. After years of refinement, the core technology still relies on converting an image file using OCR. The problem is that OCR systems have not been able to reach the accuracy levels required to enable end-to-end automation. With re-keying and exception management the actual cost of OCR varies enormously and can be more than $10/invoice in some circumstances. We must not forget that PO and vendor compliance are very important factors when looking to automate a process like accounts payable, but the fundamental challenge is still the need to accurately extract the required data from the inbound document.

New-wave digital transformation 

In recent years the market has changed. With the migration of billing and finance applications to the cloud and the rapid move to digital submission (accelerated by Covid-19), demand for OCR is diminishing and it is being replaced. We are now welcoming the next wave of digital transformation for human-readable documents.

Here at CloudTrade we lead the way in converting human-readable documents to an actionable machine-readable output with 100% data accuracy, supporting true touchless processing. We do not apply OCR to application-generated (digital) documents. Instead, we have developed patent-protected technology delivered via a managed SaaS model to extract and analyse the technical elements of a document. We do this without error and guarantee 100% data accuracy. The extraction is the first part of our process; the second, equally important part is to understand the content and its context, applying logic and rules to ensure the output will meet the receiving application’s specific requirements to enable touchless processing.

So, if you need to process a supplier’s invoice, customer order, shipping document, application, contract, claim etc, and if the document is electronically generated, we can process within minutes of receipt and provide an end-to-end touchless process, with guaranteed accuracy you can trust.

Business rules, data validations, content enrichment and workflow actions are all part of the service. Sender format updates and rule changes are automatically accommodated, and the service runs 24/7. We support clients across the globe and interface to multiple back-end systems (FMS/ERP/DMS etc).

CloudTrade’s document-processing service: fast, precise, different 

Our implementation and outreach programs ensure your time-to-value can be measured in days and not months. In most cases, there is no change for the sending party as existing email boxes and file-transfer protocols can be accommodated. The sender bears no extra costs (e.g. printing, scans, postage, stationery), therefore adoption is high.

We provide an innovative, rapid ROI, low-touch, fully managed SaaS service. We push the boundaries in process automation and innovation, delivering tangible value to our customers and partners. Like an F1 team, CloudTrade has a passion for success and customer excellence. What we do is different – we are a disrupter. Thanks to our hugely experienced management team, you can have confidence in the service we deliver in the race to the top.

We welcome enquiries and will be happy to quote for any document type, language or volume. The more complex the requirement, the better – and remember you get 100% data accuracy that you can trust.

For a quick informal chat about how we can solve your data capture requirements, book 15 mins in with me.

OCR and Perception

How is CloudTrade technology different to OCR?

How data is perceived is crucial for it to be understood for automation

When looking at data capture solutions, the term OCR often pops up. A technology that has been around for years, it is often the ‘go-to’ for companies looking to automate their data capture. In this blog post, Richard Develyn, CloudTrade CTO, looks at how although OCR may capture some of the data needed, it cannot provide the understanding required to know what to do with that data or what the data means. When it comes to the future of data capture and enabling automation, we need to look at data perception and understanding…

I am often asked to explain the difference between the service that we provide here at CloudTrade and those services which are sold under the banner of “Optical Character Recognition” (OCR).

There is almost a straight answer to this, which is that OCR deals with what we might call “human perception” whereas CloudTrade is more about “human understanding”.

I say “almost” because the waters get muddied on a couple of counts. I shall come to these later; but let me first define exactly what I mean by “perception” and “understanding”.

What do we mean by data perception and data understanding?

Perception is all about recognition in its most basic form. It’s the bit in our brains which translates swirly lines and dots and circles into meaningful letters in the English language. It’s also the bit that has to struggle with differentiating between “i” and “j” or “b” and “h” so that we don’t end up wishing people “bappy hjrthdays” or catching “fjshes” on a “fjshjng book”.

Understanding, however, is all about meaning. It’s the bit that comes in after perception has done its job (assuming that it gets it right!) and figures out, say, that the word “fishing” in “fishing for compliments” has nothing to do with the word “fishing” when you’re fishing in the sea.

Where the difference in perception and understanding starts to get muddied is that both the providers of OCR based solutions and we, ourselves, at CloudTrade, offer services which are based on a combination of both of these technologies.

You can’t have one without the other

You can’t, after all, have understanding without perception (unless you’re some sort of yogi floating over a mat in the Himalayas), or perception without understanding (imagine trying to find your way around the Tokyo underground system when you don’t speak Japanese). CloudTrade and OCR-based solutions need to use both of these elements because providing this service means not only extracting the right numbers and letters from those documents that are sent to us but also understanding them well enough to explain that, for example, “quantity 1” in an order line next to “car mats” is probably referring to a pack of 4 whereas the same phrase next to “Lamborghini Veneno Roadster” is unlikely to be referring to a pack of 4 of them at all.

Traditionally, OCR-based solutions have focussed on the perception side of the problem because that is where they have invested the bulk of their R&D, leaving the understanding part to be provided mostly by humans.

The value is in the understanding

CloudTrade, on the other hand, has invested all of its R&D efforts on understanding, succeeding in bypassing the perception part completely by focusing on “data” documents such as “data” PDFs (where, for example, the letter “s” is unambiguously stored as the letter “s” rather than as a set of drawing instructions resulting in something which could look like the letter “s” to the human eye).

Data PDFs do not need OCR and can therefore be thought of as producing a “perception” result which is 100% accurate. 100% perception is the key enabler for the process of understanding, as it allows a natural language analysis to take place with high levels of sophistication as there is no fear that all of the logical steps taking place within it will be broken by some stray spanner in the works which changes the word “battery” to a “hattery” or omits a very important decimal point in the phrase “don’t exceed the recommended dose of 1.234 ml every 24 hours”.

Providing the fuel for automation

Sophisticated systems of understanding remove the need for human operators and allow services to operate in a fully automated manner. At the time of writing, CloudTrade is processing ten million documents a year in this fashion. As soon as errors in perception are introduced, such as by using OCR, failures start to occur in the grammatical rules which underpin the process of understanding, and more and more human intervention is needed resulting in less and less automation.

Alternatively, OCR solutions operate in this field because they embrace the human element of document processing. The advantage is that they are not limited to only processing data PDFs. Their disadvantage is that they cannot fully automate.

To assume is to…

The second way in which the difference between perception and understanding has been muddied is in the technology behind OCR, which has now made inroads into the world of understanding. To quote Douglas Hofstadter from his seminal paper on OCR and AI, “On Seeing A’s and Seeing As”:

“A tacit assumption is thus that the components of sentences–individual words, or the concepts lying beneath them–are not deeply problematical aspects of intelligence, but rather that the mystery of thought is how these small, elemental, “trivial” items work together in large, complex (and perforce nontrivial) structures.”

Douglas Hofstadter

This assumption is certainly true with data PDFs, and that “mystery of thought” is clearly where CloudTrade has put in all of its R&D efforts. However, should the need for OCR not disappear completely, as might happen if all interactions become electronic and “data” based documents become the norm, then the most promising future for OCR is likely to come out of a hybridisation of perception and understanding.

Variety is the spice of life? Not for data.

Although, as I said earlier, OCR makes mistakes such as reading “fish” as “fjsh”, what it actually does is identify lists of variations rather than hard and fast answers, and then present those variations with their individual certainty values to a user for arbitration (e.g. it could be “fjsh” (60%) or perhaps it’s “fish” (50%)). OCR vendors can then use dictionaries to automatically strip out nonsense words like “fjsh” and perhaps narrow down the possibilities to arrive at the right answer. This doesn’t work, however, when the OCR mistakes still result in words present in the dictionary, or when a word being considered is not necessarily an English word at all (like a part number in a catalogue).
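The dictionary step, and the two ways it breaks down, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `arbitrate` function, the tiny `DICTIONARY` and the confidence figures are made up for the example and do not describe any particular OCR product.

```python
from typing import List, Optional, Set, Tuple

# One OCR reading of a word together with its individual certainty value.
Candidate = Tuple[str, float]

# A toy dictionary, standing in for a real word list.
DICTIONARY: Set[str] = {"fish", "hat", "bat", "battery"}


def arbitrate(candidates: List[Candidate], dictionary: Set[str]) -> Optional[str]:
    """Return the only candidate that is a dictionary word.

    Returns None when no candidate or more than one candidate is a known
    word, which is the point at which a human has to be asked.
    """
    known = [word for word, _confidence in candidates if word.lower() in dictionary]
    if len(known) == 1:
        return known[0]
    return None


if __name__ == "__main__":
    # The nonsense word is stripped out even though it scored higher.
    print(arbitrate([("fjsh", 0.6), ("fish", 0.5)], DICTIONARY))
    # Both readings are real words, so the dictionary cannot decide.
    print(arbitrate([("hat", 0.6), ("bat", 0.5)], DICTIONARY))
    # A part number is in no dictionary, so nothing survives the filter.
    print(arbitrate([("AB-1234", 0.6), ("A8-1234", 0.5)], DICTIONARY))
```

Only the first call yields an answer; the other two return `None` and fall back to a person, which is exactly the loss of automation described above.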

A far more sophisticated solution would be to bring in all these variations in perception straight into the “understanding” engine and then allow the latter to crunch through all of the grammatical options.
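As a rough sketch of that idea, the example below feeds every combination of candidate readings for an invoice line into a single consistency rule and keeps only the combinations that make sense. The rule used here (quantity times unit price must equal the line total) is an assumption chosen for illustration; it stands in for the grammatical rules of a real “understanding” engine and is not CloudTrade’s implementation.

```python
from decimal import Decimal, InvalidOperation
from itertools import product
from typing import List, Tuple

Reading = Tuple[str, str, str]  # (quantity, unit price, line total) as read


def consistent_readings(
    quantity_options: List[str],
    price_options: List[str],
    total_options: List[str],
) -> List[Reading]:
    """Try every combination of candidates and keep those that add up."""
    survivors = []
    for qty, price, total in product(quantity_options, price_options, total_options):
        try:
            q, p, t = Decimal(qty), Decimal(price), Decimal(total)
        except InvalidOperation:
            # A candidate such as "Z" is not a number at all; discard it.
            continue
        if q * p == t:
            survivors.append((qty, price, total))
    return survivors


if __name__ == "__main__":
    # Four combinations are possible, but only one is arithmetically sound.
    print(consistent_readings(["2", "Z"], ["15.00", "16.00"], ["30.00"]))
    # Two combinations survive here, so a person would still have to choose.
    print(consistent_readings(["1", "7"], ["10.00"], ["10.00", "70.00"]))
```

The number of combinations grows multiplicatively with every uncertain field, which gives a feel for why this approach becomes slow on complex or poor-quality scans.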

This is something that we have experimented with at CloudTrade, since it is possible for us to connect to OCR as the “perception” part of our solution. In doing so we have, indeed, found that with a bit of patience and tailoring we can deliver an OCR based service which is just about acceptable and automatic for header-level capture, but it’s too painful and slow to be feasible on complex or not “near-perfect” scanned images.

Dictionary lookups have been a standard feature with OCR vendors for some time. Advances in Machine Learning may well improve matters further in the future. I doubt very much that any improvements will happen with things like invoices and purchase orders, where a lot of the key information doesn’t have very much context to draw upon to allow significant automatic corrections to be made, but there could be mileage in using this technology with historical documents written in proper flowing prose.

OCR may well have an interesting future when it comes to scanning documents that were written in the past, but it’s more than likely to now be a past technology when it comes to documents that are to be written in the future.

CloudTrade vs OCR

Want to know more about how CloudTrade differs from OCR Technology?

Download our guide outlining the key features of both types of solution and the differences between them.


Machine Learning and Hybrid Solutions

‘Project Grandalf’, Machine Learning and Hybrid Solutions


Making the right choice when looking at the technological solutions to solve your business problems has always been an issue. Too much commitment, too great a system change, perhaps too great a cost – but what if you could find the best of both worlds? Richard Develyn, Chief Technical Officer, discusses the new and exciting developments in the CloudTrade solution suite to bring about the best of both worlds when it comes to document content recognition – Project Grandalf.

Implementing a Machine Learning solution in your organisation is a bit of a risky undertaking: you’re never entirely sure how long you’re going to have to wait before you get to see anything useful come out of it, or even whether something useful is ever going to come out of it at all.

In the meantime, however, you have to find a way of delivering your core business in an efficient and dependable way, which these days means using IT: not the strange, new, neural-network style IT being hyped about so much now, but rather the traditional IT which has been holding the world together for the last 50 years or so.

Machine Learning, exciting though it is, is still a very long way from being able to take over from IT altogether. There is a class of problem where traditional IT struggles, such as those requiring human assistance, and that is where Machine Learning can usefully participate, but most IT solutions are still delivered in the traditional “programming” way.

Hybrid solutions, however, can combine all of these approaches in order to get the best out of all worlds, as long as they’re created with care. Human judgement is slow to make and Machine Learning slow to learn, so processing still needs to go through the traditional IT route as much as possible if IT-speed levels of automation are going to be achieved. Humans and Neural Networks, however, can configure the system to make it run more accurately or efficiently, without trying to take over the job completely.

Project Grandalf, as it is affectionately known by the CloudTrade Development Team, is CloudTrade’s hybrid solution to the problem of document content recognition.

Extracting properly identified data from human readable documents is a complicated problem to solve. CloudTrade’s flagship product, Gramatica, does so by implementing a rules engine which allows rules to be written specifically for each document format to be processed. It is extraordinarily powerful, and Gramatica can deal with any requirement and complexity as long as it has the right rules written to do so.

Some documents, however, are not sufficiently complicated, or processed in sufficient quantities, to justify the rules-writing effort that Gramatica requires.

This is where ‘Grandalf’ comes in.

Grandalf’s Machine Learning Engine, which runs every night on a comprehensive data sample set, powers its knowledge database of data extraction algorithms. This collection of algorithms is then applied to every new document which arrives at the service, with an operator asked to clarify which algorithm has produced the right answers.

The operator’s responses are persisted in a database so that documents subsequently received from the same sender are automatically addressed by the right algorithm without further need for operator intervention. Should there be a variation in a document so that the right answers cannot be found, then the knowledge-base and operator process can be re-invoked so that the alternatives can be accurately handled.
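The flow described above (apply every algorithm, ask an operator once, remember the choice per sender, and go back to the operator only when the remembered algorithm stops finding a value) can be sketched as follows. Everything here is hypothetical: the two toy extraction algorithms, the `HybridExtractor` class and the in-memory dictionary standing in for the database are assumptions made for illustration, not Grandalf’s actual design.

```python
import re
from typing import Callable, Dict, Optional

Extractor = Callable[[str], Optional[str]]
Operator = Callable[[str, Dict[str, str]], str]  # returns the chosen algorithm name


def by_label(text: str) -> Optional[str]:
    """Toy algorithm: the value follows an 'Invoice No' label."""
    match = re.search(r"Invoice No[.:]?\s*(\S+)", text)
    return match.group(1) if match else None


def by_hash(text: str) -> Optional[str]:
    """Toy algorithm: the value is the digits after a '#' sign."""
    match = re.search(r"#(\d+)", text)
    return match.group(1) if match else None


# Stands in for the knowledge base that the nightly learning run maintains.
KNOWLEDGE_BASE: Dict[str, Extractor] = {"label": by_label, "hash": by_hash}


class HybridExtractor:
    def __init__(self, knowledge_base: Dict[str, Extractor], ask_operator: Operator):
        self.knowledge_base = knowledge_base
        self.ask_operator = ask_operator
        self.chosen: Dict[str, str] = {}  # sender -> algorithm name (the "database")

    def process(self, sender: str, text: str) -> Optional[str]:
        # Fast path: a known sender is handled deterministically, no operator.
        name = self.chosen.get(sender)
        if name is not None:
            value = self.knowledge_base[name](text)
            if value is not None:
                return value
        # New sender, or the document format has varied: run every algorithm
        # and let the operator say which one produced the right answer.
        proposals = {}
        for algorithm_name, algorithm in self.knowledge_base.items():
            value = algorithm(text)
            if value is not None:
                proposals[algorithm_name] = value
        if not proposals:
            return None
        name = self.ask_operator(sender, proposals)
        self.chosen[sender] = name
        return proposals[name]
```

In this sketch the operator is just a callable, so it can be a question-and-answer form in production and a simple stub in a test. The point of the design is that the operator and the learning step only choose which rule to run; the extraction itself stays deterministic.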

It is this combination of Machine Learning, traditional IT and human intervention which provides the key benefit that Grandalf brings against the competition. It also illustrates the advantage that hybrid systems have against the more common “one technology only” approach.

There has always been a tendency in the marketplace to look for silver bullets. Silver bullets are easy to sell (i.e. “you have this problem, you use this silver bullet; you have that problem, you use that one”). Even if you were shooting at werewolves, however, you would be silly to make your bullets entirely out of silver – just put enough silver in them to make them toxic to the creature you’re shooting at then build the rest from good old fashioned lead-antimony and steel (that’s probably how they did it – back in the day).

We, at CloudTrade, don’t believe in silver bullets (or werewolves). We believe in solutions which are crafted from the best that the different technologies relevant to the problem can offer, especially when they are made to work in harmony. As a result we are firmly convinced that Grandalf’s hybrid solution is the best way to approach the problem of document content recognition, beyond those documents which are so complicated that they require specific rules to be written to understand them (i.e. Gramatica). It is this hybrid combination of approaches that allows Grandalf to hit the sweet spot that the solution demands: Machine Learning and human assistance supporting traditional, deterministic, IT.

Grandalf is characterised by the following features:

  • It learns from one example only
    • Grandalf has a huge knowledge base of data capture rules which are applied to every document, with an operator then asked to help via a simple question and answer form
  • It’s 100% accurate
    • Once chosen, rules don’t exercise judgement or refer back to some Machine Learning database to get possible values and confidence levels; Grandalf’s rules are completely deterministic
  • It’s fast
    • Once an operator has helped Grandalf determine which rules should be used, documents fly through it at the speed of IT
  • It handles document variations
    • Grandalf returns to an operator if the rules for a given document fail to find a value, re-running its knowledge base to offer more alternatives
  • It continually learns and improves
    • CloudTrade’s selected data set feeds into the Machine Learning algorithm which every night updates Grandalf’s rules knowledge base by adding further data capture possibilities
  • It’s expandable
    • Grandalf can easily be expanded to cater for additional customer capture requirements

Project Grandalf is set to be released and available to CloudTrade customers, under its official name, in January 2021. If you’d like to know more about how CloudTrade can help your business automate its data and documents, irrespective of volume or document type, please arrange a short meeting with us here.

Tour de France 2020

Hello from CloudTrade: introducing the team’s first French member, Rose Massie


Read this blog post in English at the bottom of the page.

Comme CloudTrade continue d’élargir sa clientèle dans le monde entier, nous avons décidé d’agrandir notre équipe et avons fait notre première embauche en France. Parallèlement, au lancement de notre site web en français, nous cherchons à faire connaître les produits et services de CloudTrade à un public plus large en France et dans les régions francophones. Dans son premier billet de blog pour CloudTrade, nous présentons Rose Massie qui a eu l’extraordinaire privilège de voir le Tour de France dans sa ville la semaine dernière.

Allez CloudTrade!

Bonjour à tous de Charente-Maritime. Moi, je m’appelle Rose et je suis ravie d’avoir été nommée la nouvelle Directrice de Marketing en France. CloudTrade a déjà une forte présence en France et à partir de maintenant nous offrirons plus de contenu pour nos francophones, y compris le lancement prochain de notre site web en langue française – restez à l’écoute pour le lancement. Je suis extrêmement fière de faire partie de cette entreprise dynamique et je me réjouis de pouvoir aider davantage d’entreprises en France à automatiser leur saisie de données. Même dans la circonstance actuelle qui représente un défi pour nous tous, CloudTrade garde son optimisme et a de plus en plus de clients qui ont besoin de ses services. C’était avec cet esprit de confiance que je me suis trouvée avec le PDG de CloudTrade, David Cocks, à Saint-Palais-sur-Mer en Charente Maritime pour quelques jours cet été. Il est de retour en Angleterre et, bien sûr il est maintenant obligé de travailler de chez lui à cause des nouvelles restrictions. Toutefois, David a eu la chance ultime de voir le Tour de France passer dans ma ville, à cinq minutes de la maison !

C’était un honneur énorme pour CloudTrade de soutenir le Tour de France et, le Jour J arrivé, moi j’ai choisi la robe la plus jaune dans mon armoire. David portait une casquette bleue CloudTrade pour diffuser la présence de l’entreprise d’une manière colorée. Nous avons regardé le peloton de tout près dans l’Avenue de la République à Saint-Palais-sur-Mer avec les fans de cyclisme palaisiens et d’autres touristes de la ville, pour la plupart français – tout le monde, sauf les coureurs, bien masqués comme il faut.

Le tracé de cette dixième étape du Tour est allé de l’Ile d’Oléron à l’Ile de Ré via mon petit coin des Charentes. Il y avait une grande caravane publicitaire qui a pris une avance de 1 h 40 sur le peloton. Alors, il fallait attendre avec patience en attendant l’événement lui-même. Heureusement, la foule n’était pas dense et nous pouvions voir les coureurs en gros plan. C’était vraiment comme un rêve. Le peloton est arrivé en fanfare et avec un bourdonnement palpitant. Le ciel était d’un bleu-azure et la grande chaleur était modérée par un vent léger de l’ouest agréable. J’ai fait une vidéo de vingt-deux secondes, puis les cyclistes ont disparu à toute allure, direction Ile de Ré. Ça a été vingt-deux seconds de splendeur – et, hourra, CloudTrade était là où s’est trouvé l’événement du jour.

 Veuillez consulter le site web de CloudTrade, qui sera bientôt lancé en langue française avec une multitude de blogs et plus d’informations sur nos produits. Vous pouvez me trouver sur LinkedIn et je vous invite à me contacter si vous souhaiter en savoir plus sur nos produits et services.

Rose Massie

Après avoir étudié le français et l’allemand à l’université, Rose a travaillé dans le domaine de l’éducation et de la traduction pendant de nombreuses années avant de se lancer dans le monde des affaires et du marketing.

Elle partage son temps entre Saint-Palais-sur-Mer et le Royaume-Uni. Elle est mariée, a quatre enfants adultes, et passe ses week-ends à marcher, jardiner et lire.

English translation

Bonjour from CloudTrade – Introducing the first French team member Rose Massie

As CloudTrade continues to expand its customer base worldwide, we have decided to grow our team and made our first hire in France. Alongside the launch of our French language website, we look to bring the products and services of CloudTrade to a wider audience in France and the French-speaking regions. In her first blog post for CloudTrade, we introduce Rose Massie, who had the extraordinary privilege of seeing the Tour de France in her home town, just last week.

Allez CloudTrade!

Bonjour to everyone from Charente-Maritime. My name is Rose, and I am delighted to have been appointed the Marketing Manager in France. CloudTrade already has a strong presence in France and from now on we will offer more content for our French speakers, including the upcoming launch of our French language website – stay tuned for the launch. I am extremely proud to be part of this dynamic company and look forward to helping more businesses in France to automate their data capture.   

Even in the current challenging situation, CloudTrade is optimistic and moving forward with growth plans. It was in this confident frame of mind that I spent a few days this summer with CloudTrade’s CEO, David Cocks, who joined me in Saint-Palais-sur-Mer in Charente-Maritime on the west coast of France. Having returned to England, he must of course now work in isolation for two weeks due to the new restrictions. For David this is definitely a price worth paying, as he had the great good fortune to see the Tour de France pass through my town – only five minutes’ walk from the house.

It was a great honour for CloudTrade to support the Tour de France! On the day I picked a yellow dress from the wardrobe and David sported a blue CloudTrade cap to add a little colourful publicity to the day. We were able to watch the peloton from right up close, at the very edge of the local Avenue de la République, standing with local cycling fans and some other (mostly French) tourists, not too close together and all of us appropriately masked.

The route of this 10th stage of the Tour went from Ile d’Oléron to Ile de Ré via my neck of the woods in Charente-Maritime. There was a raucous one-and-a-half-hour procession of advertising floats that noisily prepared the ground for the arrival of the cyclists, so we had to exercise some patience while waiting for the event itself. However, we were delighted with our wonderful roadside position and when the competitors came into view it was like entering into a fabulous dream. The peloton was given a spectacular welcome and there was a thrilling and intense whirring of very fine wheels. The sky was azure blue and the unusual September heat was moderated by a welcome westerly breeze. I busied myself making a 22-second video and then they were gone, at unbelievable speed, off to Ile de Ré. It had been a magnificent 22 seconds – and, hurrah, CloudTrade had been present at the action!

Please look out for the CloudTrade website, launching in the French language soon, with a host of blogs and more information about our products and services. You can find me on LinkedIn and I invite you to contact me as I would very much appreciate discussing your company’s data capture and document automation requirements.

About Rose Massie –

Having specialised in French and German at University, Rose worked within education and translation for many years before moving into business and software sales.

She divides her time between Saint-Palais-sur-Mer and the UK, is married and has four grown-up children. She spends her weekends walking, gardening and reading.

Video of the Tour de France –

The Tour de France en Charente-Maritime

Optical Character Recognition

Did you hear the long tale about the long tail?

Reading Time: 4 minutes
A tail up to 2.4m is pretty long

National Geographic tells us that the giraffe has the longest tail of any land mammal – a giraffe’s tail can measure up to 2.4m (apparently).

I asked my children what other animals have a long tail (it should be noted that during the months of lockdown we’ve been asking more and more abstract questions, so this seemed quite normal to them!). Responses included: the ring-tailed lemur, monkeys (no particular sub-species offered), rodents and our (long-suffering) dog. However, according to National Geographic, the animal with the longest tail in proportion to its body is the Asian grass lizard. Although its tail is only 25cm long, that is over three times its body length.

I’ve been fortunate to have travelled the world quite a bit for work and fun and I’ve had the opportunity to see giraffes, lizards and monkeys up close. I’ve also had the opportunity to see rodents up close (very close!) recently during a house renovation.

Each of these animals seems quite comfortable with its body and its long tail. In fact, our dog can entertain herself for hours chasing her tail (but that’s another blog altogether).

The only animal that seems to suffer from a long tail is Homo sapiens.

So what do we mean by the ‘long tail’?

Simply put, long tail documents in the business world are the low volumes of documents that arrive from a large number of senders. This is typically seen in Accounts Payable and Finance departments, but the premise can be applied to any type of business transaction requiring documentation.

Having worked with ERP systems for 20 years, I’ve seen the challenges, pitfalls and benefits of getting to grips with the data trail. So, drawing on those wise old years, I’ve pulled together my thoughts on the different solutions available to address the challenges.

Optical Character Recognition (OCR) – Recognising the limitations

Optical Character Recognition (OCR) was a game changer for the business world. With its earliest inventions dating from 1870, and in widespread use since the 1960s, it has helped streamline business processes and, to a point, support automation. Incorporating OCR tech into business processes was a good addition; however, its limitations always meant that other solutions were needed to support the process. Misread characters, changing document structures and the manual intervention needed to ensure high levels of data accuracy have pushed organisations to look for more sophisticated technologies to help automation.

Also, when you think about it, bar coding, loading scanners and correcting mistakes don’t really support the digital transformation organisations are looking for or need. It seems like OCR tech needs more manual processes to solve a manual process – odd, huh?

Electronic Data Interchange (EDI) – Interchangeable but inflexible

Many organisations turned to Electronic Data Interchange (EDI). This is a much more reliable method of capturing data accurately and at speed. Large files transmit data in an agreed format and allow seamless integration between sender and receiver. Ok, sounds great! But the challenge here is that it needs both sides to commit to a technical and operational strategy and often requires a high financial commitment (relative to the value of the document processed) to set up and maintain. So, EDI is ideal for the highest volume senders, but for the long tail? Most definitely not.

Purchase Order (PO) Flip – PO Flip or PO Flop?

The emergence of portals, which offer the supplier the ‘opportunity’ to do a PO Flip to create an invoice, seems like a perfect way to reduce the Accounts Payable long tail. Well, in theory. The major challenge here is that suppliers don’t want to re-key information or manage multiple portals to raise their invoices. OK, this idea might just be a PO Flop.

Although these technologies have helped reduce the long tail slightly, they do not provide the coverage needed. Long tail? Still a problem.

The new tech on the block?

Emerging technologies like Robotic Process Automation (RPA) and neural networks are ones that I believe will offer some assistance in this area in the future. Although they are not new tech, their increasing involvement in the document automation domain has been noted.

More organisations are exploring RPA to great effect in areas such as sharing data (critical at the moment for the response to Covid-19) and accelerating tasks such as the onboarding of staff. However, for processing documents, the projects are falling short of expectations. Many are proving costly or just unreliable, as the systems still rely on dated technology (OCR) at the source to capture the data on entry. Using RPA to process inbound documents is arguably a problem for the entire supply chain, not just the long tail, and many projects still require manual correction of data. It’s still a problem either way.

Natural Language Processing (NLP) is the technology on which CloudTrade’s service is based. This proven subset of Artificial Intelligence, which has been around for many years, enables our service to understand the logic and meaning of a document. Once the document is understood, the data is available with 100% accuracy regardless of ‘problems’ like data moving on the page.

Additional benefits, such as no change in process for the supply chain, deployment within weeks and no manual processing, mean it’s clear why there is such high demand for the service, now more than ever.

Did you know?

The tufted ground squirrel (nicknamed the ‘vampire squirrel’) has a tail that is 130% of its body volume. This is to confuse its predators.

The long tail seems to cause confusion for a lot of solution and service providers, but there is a way to manage it. Does CloudTrade solve the challenge of the long tail? Well, simply put, yes. Our core solution, Universal Capture, does process long tail documents and automates much of the processing with great accuracy. Perhaps not quite as well as the short tail (these can be truly automated with perfect accuracy), but since many of our clients have elected to close their post rooms, get rid of their scanners for documents such as invoices and use our solution instead, I think it’s the closest the world has seen yet.

David Cocks, CloudTrade CEO

CEO David Cocks – ‘CloudTrade continues on the path to growth success’

Reading Time: 4 minutes
Lockdown has got the family and me out cycling!

Despite challenging economic circumstances and the impact on businesses globally, CloudTrade has maintained support levels to meet customer demand and is still set to meet growth targets for FY21. David Cocks, CloudTrade CEO, discusses the challenges, successes and unexpected benefits to come from the global pandemic and subsequent lockdown.

David Cocks, CloudTrade CEO –

The Team

It has now been four months since the UK went into lockdown and we temporarily moved out of our London and Newcastle offices into the atypical situation of complete home working. Fortunately, due to our cloud set-up, all teams were able to quickly settle into home working, with very imaginative work set ups, including bedrooms, kitchens, garden sheds and, for Michael Thomson (Head of Engineering) his under-the-stairs cupboard.

Our team remaining healthy, physically and mentally, has been of the utmost priority and I’m pleased to say that, thank goodness, the team and their families have largely avoided the virus and remained fit and well.

For our customers and partners, the experience of CloudTrade day-to-day has remained unchanged, with ‘business-as-usual’ being the overwhelming phrase. That said, our online calls have become a bit more lively and included backgrounds to provoke a conversation (“nice wallpaper” is one I’ve heard) or perhaps the interruption of a pet/child/spouse – which I’d like to think has in fact added variety and the human element to business conversations.

I’m very proud of how our teams have transitioned so smoothly and coped well with the upheaval and unpredictability of the lockdown. As the saying goes, a business is only as good as its people, and that couldn’t be truer than at CloudTrade.     

Transactional volumes – we’re breaking records!

With the well-reported economic slump seen during the early spring, it was only natural that CloudTrade saw a speedy decline in the volume of business documents processed each day, as trading in general declined. However, these volumes quickly bounced back, with May seeing a steady increase and June back to pre-lockdown levels. The dynamic, adaptable and sustainable nature of CloudTrade’s data capture and extraction software means it responds readily to changes in volume without compromising on the speed of processing or quality of the data capture. The CloudTrade service continues to deliver the data you trust, whatever the document type and however many are processed.

Big news for CloudTrade – July is a record breaker! From our preliminary reports, July 2020 is set to be a record month for CloudTrade in terms of the volume of documents processed and the number of new customers going live. This is an exciting step in the growth plans for CloudTrade. We are still on course to hit our business growth targets despite a global pandemic, at a time when many businesses in our sector have struggled. This is testament to the hard work of our teams, the fantastic customers we work with and a great product.

The future looks green and blue

In spite of the challenges seen in the last few months, the Development and Operations teams have successfully continued the rollout of our new, Azure-hosted, auto-scalable-on-demand, containerised architecture. Although processing times have always been speedy, this new rollout has reduced processing times in some cases by over 90%. The inherent fault tolerance and resilience in the design of the containerised architecture now boosts our service when running 365 days a year 24/7 and guarantees processing times even during the absolute peak loads.

Lockdown has also given us opportunities to refocus on our product development. Very excitingly, we are developing the latest version of our machine learning algorithms to create ever more sophisticated capture heuristics from large data sets of historical documents, increasing the knowledge learnt from documents and therefore enabling speedier and more sophisticated auto-rules writing. We will be demonstrating the beta release with a new user interface around Christmas 2020 – one to watch out for.

Lockdown priorities changed

And finally, as we start to cautiously move back into our offices and make these COVID safe, it is a chance to look back and reflect on what lockdown has meant for our customers, CloudTrade and ourselves. For me, overwhelmingly the lockdown has re-prioritised the need for automated business document processing without the ties to manual processing for many businesses. In times of crisis, the phrase ‘all hands on deck’ springs to mind and everyone rallies to keep a company afloat. What you don’t need is valuable team members being tied to manual processing of invoices etc. or relying on needing a physical OCR scanner in the office (which is closed). We’ve seen some customers appreciate the need for our tech and we’ve then onboarded them quickly (sometimes in two weeks), to support them during the pandemic.

Furthermore, it is a chance to reflect personally on what I’ve experienced during lockdown. Without the commute, my bike rides have certainly been more frequent. I thought you’d all enjoy the picture of my family and me on a recent bike ride around the West Sussex countryside – a beautiful place to enjoy, which I’d thoroughly recommend!

Interested in learning more?

Look out for my upcoming report on the impact of COVID on different industries, looking at the impact on trade in general and the estimated recovery.

David Cocks, CloudTrade CEO and cycling enthusiast

Logistics Technology

That’s enough – the logistics industry needs to change.

Reading Time: 5 minutes

I know, it’s contentious, but I’ve said it. Logistics as an industry needs to change. I’ve met far too many businesses in this industry that are still using the processes and systems that were put in 20 years ago and that should have been replaced by something less manual. It’s not just the workers or revenue that suffer from repetitive and clunky fulfillment; the customer expects better too, especially in an age where you can order something today and track it all the way to your front door by tomorrow.

I don’t think it’s the big bucks

The reason for the lack of change? Well, there are a few, but it’s certainly not a lack of spend in the market. In 2018, US companies spent $1.5 trillion on logistical expenses! That’s 8% of the entire US GDP*. In my opinion, the main reason for the lack of change is concern about implementing modern systems, integrating them with back office processes and disrupting the supply chain. It’s not like a business can stop for a few days to change systems and then deal with the consequences for months on end (I’ve seen this too). But what if there was a solution that could automate your processes to meet and exceed customer expectations, with no change to your processes, no interruption to your supply chain, no corrections needed downstream and with a cost of a few cents a document?

Here’s how you can automate and accelerate your freight invoice processing – a relatively simple but dramatic improvement to processes, using light-touch technology, without business interruption.

Freight invoice processing can be tedious and challenging, and is often exacerbated by dependence on OCR systems or on manually keying data from an invoice. The struggle is real – it’s a labor-intensive process and is prone to high error rates when capturing data. Everyone accepts this and recognizes that automating these processes is the key to improving processing times, maintaining customer satisfaction and removing resource-heavy processes from employees’ daily tasks.

Accurate, efficient and saving your time and money

If my comments in the paragraph above resonate then you should consider Universal Capture from CloudTrade.

We’ve expanded our technology stack and now provide a one-stop solution that automates the capture process for all types of freight invoices – digital files and image files and high or low volumes, with optimized accuracy.

It’s the only solution on the market, with a comprehensive rules engine, that can be tailored to your business’s exact requirements and it meshes perfectly with any ERP or TMS.

Send an email – is that it?

CloudTrade Universal Capture is the leading choice when it comes to capturing data as it removes the barriers to entry and engagement – we guarantee high supplier adoption. All your customers need to do is send their freight invoice to a specific email address which CloudTrade sets up. Then, in the background, we work our magic (well, it’s science really, but that’s not quite as exciting) by capturing the data and validating it against your bespoke requirements using our world class rules engine, then augmenting the data as required and submitting it into your systems automatically. If there is something amiss, it’s flagged before it enters the system, ensuring your business gets the right data every time, 24 hours per day, 365 days per year – often processed and returned to you in minutes.

We know that accuracy is so important to our freight logistics customers because of the cost of rectifying mistakes downstream through the payment or audit process. Delivery failures, returns and correcting volume or load board order errors are costly and, these days, an unnecessary expense when the data is easy to capture.

All invoices are welcome!

The struggle for most businesses is trying to apply a ‘one-size-fits-all’ solution to their systems or processes, even though invoices are received in different document types and volumes per supplier. With Universal Capture, all invoice types and other documents are welcome, and the solution can be configured to handle a wide range of file types and sizes. When it comes to processing the data, we manage single-page and multi-page document sets containing both data and images, and separate these into single document sets to minimize human intervention and maximize data accuracy.

Universal Capture offers one inbound channel for all types of documents and can validate, sort and process those documents synchronously for organizations to upload into their TMS. Other types of documents are welcome too, from invoices to orders to carrier receipts to the large PDF invoice sets sent out by the larger carriers (we regularly process a 1500-page document into separate shipments for one of our customers).
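As an illustration of what separating a large document set into individual shipments can involve, here is a minimal Python sketch. It assumes the text of each page has already been extracted and that every shipment's first page carries a reference label. The `Shipment Ref:` label and the function name are hypothetical, and this is not how Universal Capture is implemented.

```python
import re
from typing import List, Tuple

# Hypothetical label; real carrier documents will each have their own layout.
SHIPMENT_REF = re.compile(r"Shipment Ref:\s*(\S+)")


def split_by_shipment(pages: List[str]) -> List[Tuple[str, List[int]]]:
    """Group consecutive page indexes under the shipment reference they carry.

    A page without a reference is treated as a continuation of the
    previous shipment.
    """
    groups: List[Tuple[str, List[int]]] = []
    for index, text in enumerate(pages):
        match = SHIPMENT_REF.search(text)
        ref = match.group(1) if match else None
        if ref is not None and (not groups or groups[-1][0] != ref):
            groups.append((ref, [index]))    # a new shipment starts here
        elif groups:
            groups[-1][1].append(index)      # continuation page
        else:
            groups.append(("UNKNOWN", [index]))  # leading page with no reference
    return groups
```

Each resulting group can then be captured and validated as a document in its own right, so one problem page does not hold up the rest of the set.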

But how is that actually possible?

So, we’ll let you in on our secret box of tricks. To get the job done, Universal Capture is versatile in how it processes data from different files. For suppliers with high invoice volumes, CloudTrade can guarantee 100% data accuracy by setting up a standard set of logic rules for data capture for that supplier. The data capture then follows the same rules for each invoice transmitted through CloudTrade. These rules are managed by CloudTrade support to maintain completely accurate data capture, even if the invoice design were to change.

Image files, while less commonly used, are still an important part of data capture and something that Universal Capture can process. Despite image files not containing a data layer (so characters are interpreted rather than extracted), CloudTrade can still offer accuracy levels of around 90%, as we use the same rules technology to check what we have captured – at the time of capture. If we can’t validate the captured data against your requirements, we will send it to our intervention portal where you can fix the problem and resubmit it for processing. Just to be clear, some manual correction may be required with image files, but we can help with that too. We can set up automated messaging encouraging suppliers to submit digital files to help improve the accuracy of their data capture, and we have found that most will do this if they can get their invoice paid faster!
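A rough idea of what checking the data at the time of capture can look like is sketched below in Python. The field names, the `validate` and `route` functions and the single arithmetic check are all assumptions made for illustration. A real set of requirements would be specific to each customer.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List

# Hypothetical set of fields a customer might require on every invoice.
REQUIRED_FIELDS = ("invoice_number", "net", "tax", "total")


def validate(captured: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the data passed."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not captured.get(name)]
    if problems:
        return problems
    try:
        net, tax, total = (Decimal(captured[key])
                           for key in ("net", "tax", "total"))
    except InvalidOperation:
        # e.g. OCR read the letter 'O' where a zero was printed
        return ["amount is not a number"]
    if net + tax != total:
        problems.append("net + tax does not equal total")
    return problems


def route(captured: Dict[str, str]) -> str:
    """Send clean data onwards and anything doubtful to manual intervention."""
    return "intervention" if validate(captured) else "submit"
```

A misread character in an image file tends to break either the number format or the arithmetic, so even a simple check like this stops many errors before they reach downstream systems.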

Finally, for suppliers with low volumes of inbound invoices and documents, CloudTrade Universal Capture can use its learned experience from other invoice types to extract a generic set of the data. Once again, this will eliminate most of the human intervention that is typically required when it comes to processing documents.

Document automation during COVID

We know that these are uncertain times for businesses, but one thing is clear, the trucks will keep on rolling and goods still need to be moved quickly and reliably. In the midst of the current pandemic, human intervention in these processes has, for some organizations, been a challenge and in some cases, it was just not possible with a home-based workforce. The good news is that we were able to help. Our technology is delivered as SaaS, accessed via the web and our portal is accessible from all well-known browsers. We have been able to offer an enhanced service to our clients for their home based workers that guaranteed high capture rates and access to our intervention screens for any data capture or mapping issues – now if only you had known that 3 months ago!

With Universal Capture from CloudTrade, you could use just one solution for all inbound freight and other documents, with implementation in as little as two weeks. Within 15 mins, you can find out how we can help your business and we’ll prove our solution works.

Go on, book in 15 mins with us, it could be the first step to an automation reality.

The blog post was originally published on our Logistics specialist website,

*Flock Freight (2018) For the Love of Logistics

RPA Technology

Robots don’t make mistakes – but data does!

Reading Time: 3 minutes
RPA bots don’t make mistakes if the instructions are correct.

There has been a huge amount written about the benefits of Robotic Process Automation (RPA) and probably as many column inches dedicated to the challenges and pitfalls. In this article and our upcoming webinar, we explore the role that data plays in all RPA projects and the impact that bad data has on the robots and the desired business outcome.

Whatever industry you work in, and whatever your interests, you will almost certainly have come across a story about how “data” is changing the face of our world, particularly “big data”. You may have heard the term in connection with a study helping to cure a disease, boosting a company’s revenue, improving customer service, making a building more efficient or being responsible for those targeted ads we keep seeing.

But we don’t mean THAT “data”!

Whatever term is commonly used, data is simply another word for information. But in computing and business, data refers to information that is machine-readable as opposed to human-readable.

In business, we receive masses of data in human-readable form, such as contracts, invoices, orders or HR records. These documents need to be converted to a machine-readable form so that technology like RPA can be used to automate the process end-to-end.

The first challenge is to have the creator of the document produce it in a digital format that is also human-readable, so that further downstream it can be read and the data extracted and passed to a robotic process for automation. Data extraction can be achieved at 100% accuracy if the document is produced in a digital format that contains a text layer.

Images causing havoc

But where the sender chooses to create an image file, you must rely on Optical Character Recognition (OCR) to convert the text to a machine-readable format. The problem with OCR is that, as the receiver has no control over the image quality or how data is presented, you can never guarantee accuracy, and it’s these data errors that cause havoc with the RPA process.

Ensuring the best data for your robots

To make sure your bots do not go awry, the first challenge is getting the sender to create a digital document. To do this, we need to remove any barriers, ensure there is no cost or resource requirement and ideally no process change for the sender. The second challenge is to remove paper or image files that require OCR.

Bad data, big problems

Let’s consider the consequences of bad data for a minute. The impact of misreading a measurement or value could mean an engine part is manufactured to the incorrect size or an order gets processed with the wrong amount, a -10 becomes 100 and so on. Data without context delivers a second layer of complexity, as ‘ea’ could be read as ‘each box’ and not ‘each unit’ etc. There is a clear and obvious need to not only read data accurately but also to understand the context of a data element.
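Both kinds of error can be caught before a bot ever sees the data. The Python sketch below is an illustration under assumptions: the supplier names, the unit table and both function names are hypothetical. It shows a unit code being resolved only in the context of a known supplier, and a line being cross-checked arithmetically so that a -10 misread as 100 is caught.

```python
from decimal import Decimal
from typing import Dict

# Hypothetical per-supplier meanings: the same code can mean different things.
UNIT_MEANINGS: Dict[str, Dict[str, str]] = {
    "supplier-a": {"ea": "each unit"},
    "supplier-b": {"ea": "each box"},
}


def resolve_unit(supplier: str, code: str) -> str:
    """Interpret a unit code in the context of the supplier that sent it."""
    try:
        return UNIT_MEANINGS[supplier][code.lower()]
    except KeyError:
        # Refuse to guess: unknown context is an error, not a default.
        raise ValueError(f"no agreed meaning for {code!r} from {supplier!r}") from None


def line_is_consistent(quantity: str, unit_price: str, line_total: str) -> bool:
    """Cross-check a line: quantity x unit price must equal the line total."""
    return Decimal(quantity) * Decimal(unit_price) == Decimal(line_total)
```

A credit line of -10 units at 2.50 totals -25.00. If the quantity is misread as 100, the check fails and the line can be held back rather than passed to a robot.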

Now consider these challenges at scale and the impact of such errors on ‘big data’. As more of the world’s business processes become digital and move online, the need to process data accurately at scale has never been more important.

RPA for business process automation

In the world of shared services, we have looked to deploy RPA in areas such as invoice and order processing to increase automation and drive efficiencies. Through the implementation of innovative technologies such as RPA, the human task is rapidly moving from the mundane and repetitive to quality control and cognitive value creation. The theory is great, but the reality is that unless the right technology and business process is deployed to convert human-readable documents into a machine-readable format, the data for the RPA bots will always contain errors. You can read more about RPA integration and CloudTrade here.

Technology for data perfection

There is a solution to read digital documents and process that data into a format a machine can read to give bots the right tools for the job.

We’re running a webinar focusing on this integration for RPA; sign up is available here. It will address how this proven approach works for RPA, provide a live demonstration of delivering 100% accurate data, and show how to automate business processes to eliminate human intervention.


Our journey from Monolith to Microservices

Reading Time: 5 minutes

Richard Hooper, Head of Systems, explains how CloudTrade upgraded its software environment to cope with increased demand and some of the problems solved along the way.

Just over a year ago at CloudTrade, we made the jump and decided that containers (using Kubernetes) were the answer to all our application issues. In this article I will examine why we chose to jump on the container bandwagon, which could be termed the ‘latest tech craze’, and how we solved some of the issues along the way. But first, a little about me.

About me

I’m Richard Hooper, Head of Systems and a Microsoft MVP in Azure. I started with CloudTrade back in March 2018 as a Systems Architect. As CloudTrade grew so did my responsibilities, and now I manage a team that look after the internal servers as well as the desktop, Azure estate, and the whole production estate.

My passion lies in all things technology, and especially Microsoft Azure. In my spare time I blog about Azure at Https:// and can be found hosting the North East Azure User Group.

Was a container system the right thing to do?

It’s a question I ask myself often. With the rate of change in the cloud world you have to keep questioning and evaluating, as a new technology seems to come out almost monthly. Every time I ask myself, I come to the same conclusion: yes. And as we have become more familiar with microservices and with what we need from our application, I know we made the right choice.

Why microservices?

The application that powers CloudTrade’s unique data acquisition technology, Gramatica, started life as a sort of desktop application. It needed the user to be logged in and wrote a lot of files onto the server or desktop. One good thing is that when the application was first created, it was built as a series of steps, with each step handing over to the next using files. When I found out about this, it was a relief, as it should make the move to microservices easier.

Why change then, I hear you ask! Well, for a start, managing the server and application became difficult, especially if you wanted to do any kind of automated patching, and I certainly did not want to keep patching servers out of hours. But the main driving force for the move was scalability, the dream for a software business.

With the way the application was built, and all its file access at the time, scaling was a right pain! You either had to run more copies of the application per user, if there were enough free resources on the server, or spin up a new server and migrate the user and application to it. Sometimes we would also hit disk issues, with both capacity and IOPS.

With the move to Kubernetes, an open-source container-orchestration system, and more specifically Azure Kubernetes Service (AKS), this headache has gone away. Our AKS cluster uses Virtual Machine Scale Sets (VMSS), which allow the cluster to scale its nodes automatically when resources become constrained. Another great feature of Kubernetes is the way it can automatically scale your deployments (a deployment manages a set of pods, and a pod is a wrapper for one or more containers). How awesome is that?! But all this awesomeness still came with issues, issues that we had to get over to make this journey a true success.

Oh no, not issues!

Yes, with any journey you are always going to have hurdles along the way, and this one is no different. One of our main issues is that part of our new microservices application needs to run in Windows containers. This was the problem we tried to fix first. Some may say that was a mistake, as Kubernetes did not support Windows containers at the time, but Docker did!

To get round this issue, we are currently running the microservice on Windows Server 2019 in a VMSS using a custom hardened image. We currently run 6 containers per node: 1 for configuration and 5 for actual processing.

Scaling became a bit of an issue as we moved more onto this new microservice. As we are now using RabbitMQ instead of the file system, we came up with a brilliant solution of using an Azure Logic App to query the RabbitMQ cluster, which is running inside our AKS cluster, every 15 minutes. It checks the queue size and how many containers are consuming the queue and will then either scale up or down the VMSS nodes. Unfortunately, we had to choose 15 minutes for the check as the nodes can take a while to come up.
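The decision made at each 15-minute check can be sketched roughly as follows. This is a minimal illustration rather than the actual Logic App: the function name, the `messages_per_container` threshold and the node limits are all assumptions, and only the figure of 5 processing containers per node comes from the description above.

```python
import math

# From the article: each node runs 6 containers, 5 of which do processing.
PROCESSING_CONTAINERS_PER_NODE = 5

def scaling_decision(queue_size, consumers, messages_per_container=10,
                     min_nodes=1, max_nodes=10):
    """Return (action, target_nodes) for one scheduled check.

    messages_per_container, min_nodes and max_nodes are hypothetical knobs.
    """
    # Estimate how many nodes are currently serving the queue.
    current_nodes = math.ceil(consumers / PROCESSING_CONTAINERS_PER_NODE)
    # Work out how many processing containers the backlog calls for.
    containers_needed = math.ceil(queue_size / messages_per_container)
    wanted = math.ceil(containers_needed / PROCESSING_CONTAINERS_PER_NODE)
    # Keep the target within the allowed node range.
    target_nodes = max(min_nodes, min(max_nodes, wanted))
    if target_nodes > current_nodes:
        return ("scale_up", target_nodes)
    if target_nodes < current_nodes:
        return ("scale_down", target_nodes)
    return ("hold", target_nodes)

print(scaling_decision(queue_size=120, consumers=5))  # ('scale_up', 3)
print(scaling_decision(queue_size=0, consumers=15))   # ('scale_down', 1)
print(scaling_decision(queue_size=50, consumers=5))   # ('hold', 1)
```

Because new nodes take a while to come up, a check like this only makes sense at a fairly long interval, which is why the backlog can grow for some time before extra capacity arrives.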

We are currently rewriting this application to run on Linux, so my tip is: if you can get away with not running Windows containers, then avoid them!

As we are using RabbitMQ, we were unable to use any of the basic container autoscaling that comes with Kubernetes to scale the microservices that run inside the AKS cluster. After some research we came across KEDA, an open source project by Microsoft and Red Hat. KEDA extends the basic container autoscaling and allows us to scale based on RabbitMQ queue size, and more quickly than the Logic App approach described above. We were quite lucky that KEDA went GA just in time for us to release the second phase of containers.
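Scaling pods on queue size comes down to simple arithmetic: pick a target number of messages per replica and divide the queue length by it. The sketch below shows that calculation only. It is not KEDA's code, and the function name and the values used are assumptions for illustration.

```python
import math

def desired_replicas(queue_length, target_per_replica, min_replicas, max_replicas):
    """Replica count implied by a queue-length target, clamped to a range."""
    # One replica per target_per_replica messages, rounded up.
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(95, 20, 0, 10))    # 5
print(desired_replicas(0, 20, 0, 10))     # 0
print(desired_replicas(1000, 20, 0, 10))  # 10
```

With a minimum of zero, an empty queue means no pods are running at all, and a large backlog is capped at the maximum instead of growing without limit.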

What’s next?

We are continuing our journey, and the next phases are being worked on. We hope to get the release into production by the second half of this year. Once each step has been finished, we will be left with what we are calling a skeleton of our old application, still running on the servers. Some time will be needed to remove it and complete our journey, as we envision having no need for any servers apart from the AKS nodes.

We will also continue with another journey. This one is to utilise tools like GitHub Actions and Azure DevOps which will help to automatically build and release each microservice to our test and then production AKS cluster. This will enable us to fully embrace the ‘DevOps mentality’ by not only improving internal processes, but also improving the application.

Feel free to reach out if you would like to discuss any of the above – thanks for reading!

CloudTrade specialises in converting documents (with 100% accuracy) so humans can read them.

Learn more about CloudTrade and our technology here.