Migrating From Jenkins To Argo At Sendible

Originally written on 17 May 2021

This was originally written by me and published on the Argo blog in May 2021.

Here at Sendible, we are embarking on a program to make our application and development stacks more cloud-native, and we soon found that our existing CI solution wasn’t up to the job. We set about finding an alternative and thought that documenting our process might help others in a similar situation.

Why?

Jenkins is arguably still the de facto CI tool. It’s mature and there are a wealth of knowledgeable people out there on the Internet who can help you get the best out of it. However, with maturity can come challenges.

The main pinch points were…

Plugin spaghetti

Jenkins has an abundance of plugins. The downside is, Jenkins has an abundance of plugins! Finding the right one to suit your needs, assessing the security impact of the plugin, and then keeping on top of updates/maintenance can start to become a real headache.

Not cloud-native

It is, of course, possible to run Jenkins in Kubernetes, and equally possible to spin up dynamic pods as jobs are triggered. However, Jenkins wasn’t originally designed to work this way, and after using it for a while it becomes clear that it doesn’t interoperate fully with Kubernetes. An obvious example is that the main Jenkins installation can only run in one pod, so there is no highly available deployment to fall back on if that pod is evicted or crashes.

Similarly, Jenkins’ natural approach to running a job is to deploy all required containers into one pod. This means starting up all required containers at the beginning of the run, and not releasing them until the end. As everything is in one pod, and pods cannot span multiple nodes, there is a limitation on how nodes can be used to accommodate the workload.

There are of course ways around this — for a while, we had cascading Jenkins jobs to trick it into providing us with dynamically provisioned pods… but after a while, we realized that we were just fighting a tool into doing something it wasn’t designed to do. The pipeline code soon became hard to maintain as a result, and debugging jobs became complex.

Cost efficiency

At Sendible, we found ourselves putting more and more workarounds in place to try and balance running our CI in a tool we knew, using Kubernetes, and keeping costs down. In reality, we were losing more time and money in maintenance costs than we would ever save.

There are other cost considerations. A well-used Jenkins controller can consume a large number of system resources, and the aforementioned single pod-per-job concern means you may need to provision large servers. If you’re running Jenkins outside of Kubernetes, and you don’t have an auto-scaling system in place, you might have agent nodes running all the time, which can increase your costs.

So why Argo?

We were already using Argo CD for GitOps, and had completed a POC on Argo Rollouts to manage future releases. So it made sense to at least investigate their siblings, Argo Workflows and Argo Events.

It was immediately apparent how much faster Argo Workflows was compared to our existing CI solution, and thanks to its retry options we could make use of the Cluster Autoscaler and AWS Spot Instances, which immediately brought our CI/CD costs down by up to 90%! Further savings came from the fact that pods are only created when they are needed, which lets us provision smaller nodes for the same jobs.
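
To give a feel for how little is needed, below is a minimal sketch of the kind of retryStrategy we attach to steps (the template name, image and command are placeholders); if a pod is lost because its spot node is reclaimed, the controller simply reschedules the step elsewhere:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: spot-friendly-
    spec:
      entrypoint: run-tests
      templates:
        - name: run-tests
          # Retry the step if its pod is lost, e.g. when a spot node is reclaimed
          retryStrategy:
            limit: 3
            retryPolicy: OnError   # retry pod-level errors (node gone) rather than test failures
          container:
            image: alpine:3.13     # placeholder image and command
            command: [sh, -c, "echo running tests"]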

We also wanted something that had the potential to expand beyond CI. Ultimately we were after a flexible “thing-doer” that we could use in multiple situations. As well as regular CI jobs, we already use Argo Workflows and Argo Events for:

  • Alert remediation (receive an alert from Alertmanager and trigger a workflow to remediate the issue; see the sketch after this list).
  • Test environment provisioning from Slack.
  • Automatically testing our backup restores and alerting when there’s an issue.
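
For the alert remediation case, the wiring looks roughly like the Sensor below. This is only a sketch: it assumes a webhook EventSource named alertmanager-webhook that Alertmanager posts to, and a WorkflowTemplate called remediation-steps, both of which are invented names for the example (the real Sensor also maps the alert payload into workflow parameters):

    apiVersion: argoproj.io/v1alpha1
    kind: Sensor
    metadata:
      name: alert-remediation
    spec:
      dependencies:
        - name: alert
          eventSourceName: alertmanager-webhook   # hypothetical webhook EventSource
          eventName: alerts
      triggers:
        - template:
            name: remediate
            argoWorkflow:
              operation: submit                   # submit a Workflow for every matching event
              source:
                resource:
                  apiVersion: argoproj.io/v1alpha1
                  kind: Workflow
                  metadata:
                    generateName: remediate-
                  spec:
                    workflowTemplateRef:
                      name: remediation-steps     # hypothetical WorkflowTemplate holding the fix steps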

How long did it take?

As with all things DevOps, the process is ongoing, but with one person on the initial project, armed with some Kubernetes knowledge and no prior Argo Workflows or Events experience, we had a basic proof of concept up and running within a day. This was then refined over two weeks, during which we productionised Workflows (making it HA, adding SSO, etc.) to the point where we were comfortable for it to be adopted by the wider team.

Some things we learned along the way

As with all tool implementations, the process wasn’t without its challenges. Hopefully, this short list below might help others when they embark on a similar journey:

Un-learn “The Jenkins Way”

If you have spent years using Jenkins Pipelines, a cloud-native pipeline solution probably won’t come to your mind naturally. Try to avoid just re-writing a Jenkins pipeline in a different tool. Instead, take the time to understand what the pipeline is designed to achieve, and improve on it.

The dynamic pod provisioning of Argo Workflows means you will have to re-approach how you persist data during your job. The official approach is to use an artifact repository backed by external storage such as S3, but for more transient data you could consider a ReadWriteMany (RWX) PVC to share a volume between a few pods.
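
Here is a minimal sketch of the shared-volume approach, assuming your cluster has a storage class that supports ReadWriteMany (for example EFS or NFS); the images, commands and sizes are placeholders:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: shared-workspace-
    spec:
      entrypoint: main
      # A PVC created for the lifetime of the workflow and mounted into each step that needs it
      volumeClaimTemplates:
        - metadata:
            name: workdir
          spec:
            accessModes: ["ReadWriteMany"]
            resources:
              requests:
                storage: 1Gi
      templates:
        - name: main
          steps:
            - - name: produce
                template: produce
            - - name: consume
                template: consume
        - name: produce
          container:
            image: alpine:3.13
            command: [sh, -c, "echo hello > /work/message"]
            volumeMounts:
              - {name: workdir, mountPath: /work}
        - name: consume
          container:
            image: alpine:3.13
            command: [sh, -c, "cat /work/message"]
            volumeMounts:
              - {name: workdir, mountPath: /work}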

Equally, you can use this migration as an opportunity to re-think parallelism and task ordering. Jenkins pipelines of course offer parallel running of steps, but it’s something one has to consciously choose. Argo Workflows’ approach is to run steps in parallel by default, allowing you to simply define dependencies between tasks. You can write your workflow in any order and just tweak the dependencies afterward. We recommend you keep refining these dependencies to find the best fit for you.
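
A small, illustrative DAG (the images and commands are placeholders) shows the effect: lint and unit-tests both depend only on clone, so they run in parallel with no extra configuration, and build-image waits for both:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: ci-dag-
    spec:
      entrypoint: ci
      templates:
        - name: ci
          dag:
            tasks:
              - name: clone
                template: echo
              - name: lint                      # lint and unit-tests share a single dependency,
                template: echo                  # so they are scheduled in parallel
                dependencies: [clone]
              - name: unit-tests
                template: echo
                dependencies: [clone]
              - name: build-image
                template: echo
                dependencies: [lint, unit-tests]
        - name: echo                            # stand-in for the real step templates
          container:
            image: alpine:3.13
            command: [sh, -c, "echo step done"]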

Make use of workflow templates

Where possible, try to treat each step in a workflow as its own function. You’ll likely find that your various CI jobs have a lot of common functions. For example:

  • Cloning from Git
  • Building container(s)
  • Updating a ticket management system or Slack with a status

Write each of these process steps as an individual workflow template. This allows you to build a new CI process relatively quickly by piecing these templates together in a DAG and passing the appropriate parameters to them. With time, writing a new CI process becomes primarily an exercise in putting the building blocks together.
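
As a rough sketch of the pattern (the template, repository and image names are invented for the example), a shared clone step can live in a WorkflowTemplate and be pieced into a CI workflow with templateRef:

    # Reusable building block
    apiVersion: argoproj.io/v1alpha1
    kind: WorkflowTemplate
    metadata:
      name: git-clone
    spec:
      templates:
        - name: clone
          inputs:
            parameters:
              - name: repo
          container:
            image: alpine/git          # placeholder image
            command: [git, clone, "{{inputs.parameters.repo}}", /src]
    ---
    # CI workflow assembled from building blocks
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: app-ci-
    spec:
      entrypoint: ci
      templates:
        - name: ci
          dag:
            tasks:
              - name: clone
                templateRef:
                  name: git-clone
                  template: clone
                arguments:
                  parameters:
                    - name: repo
                      value: https://github.com/example/app.git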

You don’t have to ‘Big Bang’ it

The word “Migration” is scary, and has the potential to be filled with dollar signs. It doesn’t have to be.

If you have Jenkins in place already, resist the urge to just rip it out or think you have to replace everything in one go. You can slowly run Workflows alongside Jenkins — you can even get Jenkins to trigger Workflows. When we started, we moved our automated integration tests across, before then moving on to the more complex CI jobs.

Make use of the Argo Slack channels and the GitHub Discussions pages

The Argo docs are good, as is the GitHub repo itself (especially the GitHub Discussions pages), but there is also a great group of knowledgeable people using Argo in weird and wonderful ways, and they mostly hang out in the Slack channels.

What next?

We are still learning new things about Argo Workflows with each day we use it, and we are still in the phase of continually refactoring our Workflows to get the absolute best out of them.

Version 3.1 of Argo Workflows isn’t too far away and we are looking forward to the upcoming features. Of particular note, Conditional Parameters will enable us to remove a number of script steps and Container Sets will allow us to speed up certain steps in our CI.
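
For instance, based on the pre-release documentation, a container set template looks roughly like the sketch below (images and commands are placeholders): several containers run in a single pod with dependencies between them, so closely related steps no longer pay the cost of scheduling a fresh pod for every step:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: containerset-demo-
    spec:
      entrypoint: build-and-test
      templates:
        - name: build-and-test
          containerSet:
            containers:
              - name: build
                image: alpine:3.13                 # placeholder image and commands
                command: [sh, -c, "echo building"]
              - name: test
                image: alpine:3.13
                command: [sh, -c, "echo testing"]
                dependencies: [build]              # runs after build, inside the same pod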

If you have any questions about my experience with Argo Workflows and Argo Events, you’ll probably find me in the CNCF Slack workspace, or you can contact me through the Sendible website.