Our journey from monolith to microservices
Richard Hooper, Head of Systems, explains how CloudTrade upgraded its software environment to cope with increased demand, and describes some of the problems solved along the way.
Just over a year ago at CloudTrade, we made the jump and decided that containers (using Kubernetes) were the answer to all our application issues. In this article I will examine why we chose to jump on the container bandwagon, which could be termed the ‘latest tech craze’, as well as how we solved some of the issues along the way. But first, a little about me.
I’m Richard Hooper, Head of Systems and a Microsoft MVP in Azure. I started with CloudTrade back in March 2018 as a Systems Architect. As CloudTrade grew so did my responsibilities, and now I manage a team that look after the internal servers as well as the desktop, Azure estate, and the whole production estate.
My passion lies in all things technology, especially Microsoft Azure. In my spare time I blog about Azure at https://pixelrobots.co.uk and can be found hosting the North East Azure User Group.
Was a container system the right thing to do?
It’s a question I ask myself often. With the rate of change in the cloud world you have to keep questioning and evaluating, as a new technology seems to come out almost monthly. Every time I ask myself, I come to the same conclusion: yes. And as we have become more familiar with microservices and with what we need from our application, I am even more sure we made the right choice.
The application that powers CloudTrade’s unique data acquisition technology, Gramatica, started life as a sort of desktop application. It needed the user to be logged in and wrote a lot of files to the server or desktop. One good thing is that the application was originally built as a series of steps, with each step handing over to the next using files. When I found out about this it was a relief, as it should make the move to microservices easier.
Why change then, I hear you ask! Well, for a start, managing the servers and the application had become difficult, especially if you wanted any kind of automated patching, and I certainly did not want to keep patching servers out of hours. But the main driving force for the move was scalability – the dream for a software business.
With the way the application was built, and all the file access, scaling at the time was a right pain! You either had to run more copies of the application per user, if there were enough free resources on the server, or spin up a new server and migrate the user and application to it. Sometimes we would also hit disk issues, both capacity and IOPS.
With the move to Kubernetes, an open-source container-orchestration system, and more specifically Azure Kubernetes Service (AKS), this headache has gone away. Our AKS cluster uses Virtual Machine Scale Sets (VMSS), which allow the cluster to scale its nodes automatically when resources become constrained. Another great feature of Kubernetes is the way it can automatically scale your deployments (a deployment manages a set of pods, and a pod is a wrapper for one or more containers in Kubernetes). How awesome is that?! But all this awesomeness still came with issues, issues that we had to get over to make this journey a true success.
Oh no, not issues!
Yes, with any journey you are always going to have hurdles along the way, and this one is no different. One of our main issues is that part of our new microservices application needs to run in Windows containers. This was the problem we tried to fix first – some may say that was a mistake, as Kubernetes did not support Windows containers at the time, but Docker did!
To get round this issue, we are currently running the microservice on Windows Server 2019 in a VMSS using a custom hardened image. We currently run six containers per node: one for configuration and five for actual processing.
Scaling became a bit of an issue as we moved more onto this new microservice. As we are now using RabbitMQ instead of the file system, we came up with a brilliant solution of using an Azure Logic App to query the RabbitMQ cluster, which is running inside our AKS cluster, every 15 minutes. It checks the queue size and how many containers are consuming from the queue, and then scales the VMSS nodes either up or down. Unfortunately, we had to choose 15 minutes for the check as the nodes can take a while to come up.
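To make the decision the Logic App takes a little more concrete, here is a minimal Python sketch of that kind of check. It is not our actual Logic App: the function names, the 100-messages-per-container figure and the node limits are all assumptions made up for illustration. The only number taken from above is the five processing containers per node.

```python
import math

# From the article: each node runs five processing containers
# (the sixth container on the node is for configuration).
CONTAINERS_PER_NODE = 5


def desired_nodes(queue_size, messages_per_container, min_nodes=1, max_nodes=10):
    """Estimate how many nodes are needed for the current queue size.

    messages_per_container, min_nodes and max_nodes are hypothetical
    tuning values, not figures from our production setup.
    """
    containers_needed = math.ceil(queue_size / messages_per_container)
    nodes_needed = math.ceil(containers_needed / CONTAINERS_PER_NODE)
    # Keep the result inside the allowed range of nodes.
    return max(min_nodes, min(max_nodes, nodes_needed))


def scaling_action(queue_size, consumers, messages_per_container=100):
    """Compare the nodes implied by the consumer count with the nodes needed."""
    current_nodes = math.ceil(consumers / CONTAINERS_PER_NODE)
    target = desired_nodes(queue_size, messages_per_container)
    if target > current_nodes:
        return ("scale_out", target)
    if target < current_nodes:
        return ("scale_in", target)
    return ("hold", target)
```

With these made-up numbers, a queue of 1,200 messages and 5 consumers would ask for three nodes, while an empty queue with 10 consumers would scale back in to the minimum of one node.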
We are currently rewriting this application to run on Linux, so my tip is: if you can get away with not running Windows containers, then do!
As we are using RabbitMQ, we were unable to use any of the basic autoscaling that comes with Kubernetes to scale the microservices that run inside the AKS cluster. After some research we came across KEDA (Kubernetes Event-driven Autoscaling), an open-source project started by Microsoft and Red Hat. KEDA extends the basic autoscaling and allows us to scale based on RabbitMQ queue size, and to do so more quickly than the Logic App approach described above. We were quite lucky that KEDA went GA just in time for us to release the second phase of containers.
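The arithmetic behind queue-based scaling is roughly "one replica for every N messages waiting". The Python sketch below only illustrates that idea; it is not KEDA's code or our configuration, and the function name, target of 50 messages per replica and replica limits are assumptions chosen for the example.

```python
import math


def desired_replicas(queue_length, target_per_replica, min_replicas=0, max_replicas=20):
    """Estimate the replica count for a queue-length based autoscaler.

    All parameter values here are hypothetical. An empty queue falls
    back to min_replicas; any waiting work needs at least one replica.
    """
    if queue_length <= 0:
        return min_replicas
    wanted = math.ceil(queue_length / target_per_replica)
    lower_bound = max(min_replicas, 1)
    return max(lower_bound, min(max_replicas, wanted))
```

For example, with a target of 50 messages per replica, 120 queued messages would ask for three replicas, and a backlog of 5,000 would be capped at the maximum of 20.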
We are continuing our journey, with the next phases being worked on now. We hope to get the release into production by the second half of this year. Once each step has been finished, we will end up with what we are calling a skeleton of our old application, which will still be running on the servers. Some time will need to be spent removing these to complete our journey, as we envision that there will be no need for any servers apart from the AKS nodes.
We will also continue with another journey. This one is to use tools like GitHub Actions and Azure DevOps, which will help to automatically build and release each microservice to our test and then production AKS clusters. This will enable us to fully embrace the ‘DevOps mentality’ by not only improving internal processes, but also improving the application.
Feel free to reach out if you would like to discuss any of the above – thanks for reading!