Clojure: Unlocking incremental automation

We've started slowly introducing Clojure at Nosco, and I'm very excited to share some first impressions of what it feels like to take a fairly legacy Node.js codebase and sprinkle some Clojure on top.

It's true that when re-architecting a system, most of the benefits you receive (assuming you're successful) stem from all the hindsight you have, a clean slate, clearer requirements and so on. So I won't go into all the cool stuff we've managed to get into our main system.

What I do want to talk about is something that I believe Clojure really excels at, and that is "incremental automation" of our infrastructure via a REPL.

The problem

We run a multi-tenant SaaS platform, but we try to be as white-label as possible, since our clients are big enterprises with complex IT and security policies. Clients can bring their own domains and their own mail servers, they need to adjust their spam and phishing filters so that our emails go through, and there are other very boring but necessary tweaks.

For a very long time, onboarding a new client was a manual process that involved a lot of email back-and-forth, a lot of clicking around in the AWS console, the database provider console and the mail service console, and ssh-ing into a bastion host with access to the prod environment to run database commands and other maintenance scripts.

At the same time, we had a lot of ad-hoc requests from our Customer Success people: this client wants a data dump, that client deleted something by accident, could we please automate a very tedious process for them, etc. etc. We did some of this as best we could, but it was tedious and annoying: copying bash and Node scripts around to production servers gets old really fast, plus we were always scared that a fat-finger would do bad things to the production database.

Enter Clojure

When we managed to get Clojure set up on a set of servers, one of the first things I did was to figure out how to run a remote REPL that I could SSH-forward to my local machine and connect my editor to. While there is still a risk of fat-fingering and the general danger of developing in production, at least I could write and inspect a whole function in a familiar environment before sending the code across the wire.
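
For readers who haven't set this up before, here's a minimal sketch of that kind of setup, assuming the nREPL library; the port number and hostnames are placeholders, not our actual configuration.

    ;; On the server: embed an nREPL server in the running application,
    ;; bound to localhost only (port 7888 is an arbitrary choice).
    (require '[nrepl.server :as nrepl])

    (defonce repl-server
      (nrepl/start-server :port 7888 :bind "127.0.0.1"))

    ;; On the laptop: forward the port over SSH, then point the editor's
    ;; nREPL client at localhost:7888.
    ;;   ssh -N -L 7888:localhost:7888 you@prod-host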

What then started to happen is that we wrote a lot of tiny one-off functions, which we committed to a catch-all repository. The tasks ranged from generating a one-off report, to updating some DNS entries, to filtering out some bad data, or even business analytics for our sales team to use; all of them transformed from tedious half-day work that you had to do from scratch every.single.time into a fun and very fast experience.

And at some point, when more and more of our onboarding tasks were automated, we took the plunge, set some time aside, and connected everything to a web form that our non-technical colleagues can now use to do a lot of these tasks themselves. This consolidation took only a handful of days, since it was mostly glue code. I don't believe this would ever have gotten done if we had tried to do it with Node.js or any other "conventional" language, since even figuring out the requirements would have been a monumental task on its own.

In my opinion, this work was enabled by three very important pillars of Clojure:

a) Interactive development
b) Small, composable functions
c) Access to a rich library ecosystem

Interactive Development

With Node.js, we'd have to either edit a one-off script right on the server and then go back to the shell to run it, or edit it locally and scp it across (and then go back to the shell). Sometimes you might fire up a Mongo or a Node shell, but then you'd lose your editor affordances – and nobody wants to type for long into a bare REPL anyway.

This also meant that every time you wanted to change something, you'd have to re-run your script, losing all your state. Sometimes this was OK, but when you had already done an expensive calculation or fetched a lot of data, making a mistake was very slow and annoying.

With Clojure, once the REPL was started and connected, we could just start trying things out right in our editor. That function blew up because of something? Just tweak it, re-evaluate it, run it again with a single keystroke. Not only does this process have literally sub-second overhead, it also allows you to explore the problem, the data and the solution incrementally, building upon your previous calculations.
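
As a made-up illustration of that loop: the expensive part is evaluated once and its result stays in the REPL session, while the cheap part gets tweaked and re-evaluated as often as needed.

    ;; Stand-in for an expensive fetch (in reality this would hit the
    ;; database); evaluated once, the result lives on in the session.
    (def raw-events
      (vec (for [i (range 10000)]
             {:client-id (mod i 7) :deleted? (zero? (mod i 13))})))

    ;; The cheap part: tweak, re-evaluate, run again – without ever
    ;; re-running the fetch above.
    (defn events-per-client [events]
      (->> events
           (remove :deleted?)
           (group-by :client-id)
           (map (fn [[client evs]] [client (count evs)]))
           (sort-by second >)))

    (events-per-client raw-events)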

Also, we could use all of Clojure's data-exploration functions, our IDEs would show a nice visualisation of the returned data, and sometimes we'd even fire up a local REPL to use a visualisation GUI like the Clojure Inspector, Inspector Jay or REBL.
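
For instance, clojure.inspector ships with Clojure itself, and tap-aware tools such as REBL pick up values sent with tap>; the data below is made up.

    (require 'clojure.inspector)

    ;; Pop up a Swing tree view of a nested result.
    (clojure.inspector/inspect-tree
      {:client "acme" :onboarding {:dns :done :mail :pending}})

    ;; Or, with a tap-aware tool attached, just tap the value.
    (tap> {:client "acme" :onboarding {:dns :done :mail :pending}})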

Small, composable functions

As mentioned before, with Node we'd have to write one-off scripts that had to contain "the universe" – for some reason we never created a proper Node project to collect useful scripts. My hypothesis is that since Node doesn't have a comparable interactive mode and encourages an imperative style for this kind of thing, after doing all the work of actually generating the report or finishing the task, you'd then have to refactor the reusable bits out into their own modules: copy-pasting them from the remote server to your local editor, making sure they don't rely on or mutate any global state, and finally making sure they are actually usable. That is just too much work to do after you have already handed over your one-off report.

With Clojure, on the other hand, you are encouraged to write small functions – which are usually pure and side-effect free. In practice this meant that we'd start with a threading macro, (->> something (map #(...)) (filter #(...)) ...), which of course became unwieldy after a few stages. Then we'd extract the small "domain" predicate functions, like is-draft?, truncate-to-day or created-between, which would be pure and reusable in a very obvious way. Or perhaps we'd manage to figure out a tricky AWS API call that we could then extract for future use.
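
Here's a rough, made-up sketch of what such a pipeline looks like once the predicates have been pulled out; the field names and the use of java.time.Instant are illustrative assumptions, not our actual schema.

    ;; Small, pure "domain" predicates and helpers.
    (defn is-draft? [doc]
      (= :draft (:status doc)))

    (defn truncate-to-day [^java.time.Instant inst]
      (.truncatedTo inst java.time.temporal.ChronoUnit/DAYS))

    (defn created-between [from to doc]
      (let [created (:created-at doc)]
        (and (not (.isBefore created from))
             (.isBefore created to))))

    ;; The pipeline itself stays short and readable.
    (defn drafts-in-range [docs from to]
      (->> docs
           (filter is-draft?)
           (filter (partial created-between from to))
           (map :id)))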

And of course, when you are developing in your editor, writing small pure functions and wrapping all your explorations in a "rich comment", you can just git commit -am "fiddle with the REPL" so that your teammates can not only use these functions in the future, but also benefit from all your exploration.
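
Such a rich comment might look like the following, reusing the sketch functions from above; the scratch data inside it is made up, and nothing in it runs when the namespace is loaded.

    (comment
      ;; scratch data standing in for a real query
      (def docs [{:id 1 :status :draft
                  :created-at (java.time.Instant/parse "2020-01-10T00:00:00Z")}
                 {:id 2 :status :published
                  :created-at (java.time.Instant/parse "2020-01-20T00:00:00Z")}])

      (count (filter is-draft? docs))

      (drafts-in-range docs
                       (java.time.Instant/parse "2020-01-01T00:00:00Z")
                       (java.time.Instant/parse "2020-02-01T00:00:00Z")))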

Access to a rich library ecosystem

Now of course, most of the work we've done relies on slicing and dicing data using the Clojure core functions. But when we actually needed to go above and beyond that, we were able to leverage some very robust Clojure and Java libraries: e.g. Hickory to convert HTML into Clojure data for further analysis, Apache Tika to detect file MIME types, Apache POI to generate Excel reports, etc.
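
To give a flavour of the Hickory case, here's a minimal, self-contained example of turning an HTML snippet into plain Clojure data and slicing it with core functions; the HTML is just a stub.

    (require '[hickory.core :as h]
             '[hickory.select :as s])

    (def page
      (-> "<ul><li><a href=\"/a\">A</a></li><li><a href=\"/b\">B</a></li></ul>"
          h/parse
          h/as-hickory))

    ;; Pull out all the links as ordinary Clojure maps, then keep the hrefs.
    (->> (s/select (s/tag :a) page)
         (map (comp :href :attrs)))
    ;; => ("/a" "/b")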

While all of this is also available in many other ecosystems, including Node.js, the ability to quickly add a Java library (or its Clojure wrapper) to your project and start experimenting with its API against your actual data takes the whole experience to the next level.

Clojure: Low friction instead of paper cuts

To sum up our experience so far: we've found that Clojure is the inverse of the "death by a thousand paper cuts" phenomenon. Instead of being slowed down by numerous little inconveniences, we found that Clojure's very, very low friction enables you to do things you'd otherwise never even consider. We're extremely happy with our decision so far, so a big thank you to all the fantastic people in the Clojure community who make all this possible!