Sebastian Bensusan

Decomplecting Clojure

Clojure is a programming language that combines several concepts in a unique way. From the outside it is hard to understand the value or importance of these concepts, and to make matters worse, they are often blurred together. As Rich Hickey, the creator of the language, suggests, taking a system apart is difficult but worth it. After "decomplecting" you end up with smaller pieces that are easier to reason about and can even be used to build new things.

Though aimed at beginners, this is not an introduction to the language but an attempt to explain it. In the process I steal from many sources and leave out technologies like core.async and core.match [1] that, though powerful and important, don't characterize Clojure.

Let's break Clojure apart and see what each piece brings to the table.

The JVM

Clojure's runtime is implemented in Java. Code can be dynamically loaded into a running JVM or compiled into Java bytecode for later use. Some of the benefits of using the JVM are:

  • Clojure can leverage Java libraries. Through Java interop, Clojure had an HTTP server, a crypto library, and a package repository from day one [2]. This is especially useful for huge projects (e.g., the Apache family) that target the JVM. Not all Java projects are easy to use from Clojure: some APIs can be comfortably used through Java interop, others can be wrapped with Clojure to provide a more familiar interface, and a few are so far away from Clojure's semantics that no amount of wrapping can make them palatable.
  • Clojure can be deployed wherever Java can and every infrastructure supplier supports Java.
  • Clojure implements many abstractions on top of Java and they are not without cost. The upside is that whenever extra performance is needed, you can go one level down and use Java.
  • The JVM ecosystem has armies working on optimization, profiling, monitoring, and debugging. According to Colin Fleming, the man behind Cursive, it is great to work with such mature tooling. He also mentioned that the implementation details of Clojure are unavoidable when dealing with bytecode, but luckily for us he has chosen to do it.
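
To make the interop point concrete, here is a minimal sketch that calls classes from the Java standard library directly from Clojure. The function names shout and sha-256-hex are made up for illustration; the Java classes and methods they call ship with the JDK.

```clojure
(import '(java.security MessageDigest))

;; Call an instance method on a java.lang.String.
(defn shout [s]
  (.toUpperCase s))

;; Use a JDK crypto class directly, without a wrapper library.
(defn sha-256-hex [s]
  (let [digest (MessageDigest/getInstance "SHA-256")
        hashed (.digest digest (.getBytes s "UTF-8"))]
    ;; Render each byte as two lowercase hex digits.
    (apply str (map #(format "%02x" %) hashed))))

(shout "clojure")              ;; => "CLOJURE"
(Math/sqrt 16)                 ;; => 4.0 (static method call)
(count (sha-256-hex "hello"))  ;; => 64
```

The dot forms, (.method object args) and (Class/staticMethod args), are all the syntax needed for this kind of comfortable interop.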

The JVM makes Clojure practical.

Every decision comes with trade-offs. Firstly, the JVM doesn't know about Clojure, which leads to:

  • Slow start times: when the JVM is started, it must first load Clojure itself and then proceed to run your code. This makes the current implementation of Clojure unsuitable for Android development and command line tools where an extra second of loading time is unacceptable. That being said, the community is working to improve on all fronts.
  • Poor stack traces: stack traces contain all the underlying Java classes and methods involved. Even if there is a clear mapping between Clojure code and Java classes, they are noisy and hard to decipher. Experience and tools help but Clojure stack traces are very daunting for beginners.

Secondly, you inherit all the JVM's limitations:

  • No tail call optimization. Clojure works around it by introducing loop and recur, which make self-recursive tail calls explicit.
  • Code Reloading is hard. The JVM is not designed with code reloading in mind which makes an interactive workflow harder. The community has overcome this limitation with amazing tools and discipline but if you are heavily redefining functions you'll sometimes need to restart the JVM.
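
As a small sketch of the loop and recur workaround mentioned above (sum-to is a hypothetical function, not part of the core library):

```clojure
;; Sums the integers 1..n without growing the stack.
(defn sum-to [n]
  (loop [i 1
         acc 0]
    (if (> i n)
      acc
      ;; recur must be in tail position; it rebinds i and acc and jumps
      ;; back to loop instead of pushing a new stack frame.
      (recur (inc i) (+ acc i)))))

(sum-to 10)      ;; => 55
(sum-to 1000000) ;; => 500000500000
```

A plain self-call in place of recur would consume one stack frame per iteration and could overflow the stack for large n.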

Functional Style

The Clojure standard library draws a lot of inspiration from the ML family of languages and heavily features lazy collections. There are plenty of blog posts describing how Functional Programming will solve all your problems and even help you lose weight with this weird trick but let's limit the hype.

I find the "collection + lambdas" approach more powerful than "loops + iterators". Higher-order functions and currying describe your intent more succinctly and allow for more reuse. It is the difference between seeing the forest and only seeing trees:

(filter even? (map inc some-array))

var B = [];
var x, i;
for (i = 0; i < someArray.length; i++) {
  x = someArray[i] + 1;
  if (x % 2 === 0) {
    B.push(x);
  }
}

With that said, I object to statements like this made-up quote:

Functional programming is declarative: you tell the computer what you want and not how to get it.

Whenever you specify the steps of a computation you are telling the computer how to get something (i.e., increment all elements by one, and then give me only the even ones). You might be describing less and doing more, but it is far from declarative as such statements imply. This is different from logic programming where you state truths about your domain and then query it.

A functional style makes Clojure elegant.

Lisp

Lisp is many things to many people, a religion to some and an antique to others. To me it represents two distinct things: a syntax and an approach to designing and developing code.

Lisp programs are defined in terms of s-expressions, which have a consistent syntax. An s-expression is a list of symbols where the first symbol is the operator and the rest are its arguments; when evaluated, it returns a value:

(op arg1 arg2 ... arg-n)

(+ 1 1) ;; => 2

Since the expression is a list (not a string!), it can easily be manipulated by other programs. For example, if you wanted to find the operator of any s-expression:

(first '(+ 1 1)) ;; => + 
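
Going one step further, here is a minimal sketch of one piece of code building another. The names expr and product-expr are only for illustration:

```clojure
;; A quoted s-expression is an ordinary list.
(def expr '(+ 1 2 4))

(first expr) ;; => +        (the operator)
(rest expr)  ;; => (1 2 4)  (the arguments)

;; Build a new expression by swapping the operator.
(def product-expr (cons '* (rest expr)))

(eval expr)         ;; => 7
(eval product-expr) ;; => 8
```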

A consistent syntax is crucial when writing macros, programs that write/modify other programs. Macros are powerful because they allow you to write language extensions. If your Lisp is missing feature X, you could probably write a set of macros to implement it. A good example is core.async: the team behind Clojure added Go's concurrency model to the language with a library using macros and without modifying the language itself.
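
As a rough sketch of such a language extension, the hypothetical unless macro below adds a new control structure in four lines. Clojure already ships the equivalent when-not, so this is purely illustrative:

```clojure
;; Evaluates body only when test is falsey.
(defmacro unless [test & body]
  `(if ~test
     nil
     (do ~@body)))

(unless (> 1 2) :ran) ;; => :ran
(unless (< 1 2) :ran) ;; => nil

;; macroexpand-1 shows the code the macro writes.
(macroexpand-1 '(unless ready? (launch)))
;; => (if ready? nil (do (launch)))
```

Because the macro receives its arguments as unevaluated lists, the body is only evaluated if the generated if decides so, something an ordinary function cannot do.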

s-expressions come with some trade-offs:

  • Parenthesis whining: beginners and non-Lispers dislike them and complain loudly. It took me a couple of weeks to naturally read s-expressions but it was definitely worth it. Also, some operators like < are always hard to read at the beginning of the expression.

  • Macro abuse: some hackers like to experiment with macro DSLs, making their code harder to read and use consistently. It is less of a problem when using popular Clojure libraries since it is regarded as good practice to write macros only when you have to [3].

s-expressions can be written and tested individually which leads to an interactive development style. You type expressions in a REPL (most modern languages come with REPLs, shells, or consoles) and modify them until they return the desired value. Any tool that makes development interactive is valuable to the Clojure community.

Building s-expressions from the ground up is not only fun, it also leads you to compose small abstractions into larger ones. The strategy is called bottom-up programming and allows for tremendous reuse. If your abstractions have well-defined interfaces it can also be called layered programming, as presented in Chapter 2 of SICP. SICP and Clojure go one step further by guiding you into giving those components and abstractions the Closure property. An operation has the Closure property if it takes and returns elements of the same type X, allowing you to combine them in some fashion:

;; integers are closed under +
(+ 40 2) ;; => 42 - an integer

;; sets can be merged into a bigger set
(require '[clojure.set :as set])
(set/union #{1 3} #{1 2}) ;; => #{1 2 3}

;; functions can be *comp*osed into other functions
(comp zero? mod) ;; => divisible-by? - a function

True to its name, Clojure data has closure under most of the core library. This is such a big thing that it deserves an entire section.
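
A short sketch of what that closure buys you in practice. The name divisible-by? comes from the comment in the example above; the pipeline is made up for illustration:

```clojure
;; Composing two functions yields another function.
(def divisible-by? (comp zero? mod))

(divisible-by? 10 5) ;; => true
(divisible-by? 10 3) ;; => false

;; filter and map take a collection and return a sequence,
;; so the output of one step is valid input for the next.
(->> (range 1 11)
     (filter #(divisible-by? % 2)) ;; (2 4 6 8 10)
     (map inc)                     ;; (3 5 7 9 11)
     (reduce +))                   ;; => 35
```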

Lisp makes Clojure powerful.

Data as Data

Clojure comes with four main data structures and nudges you into modeling your domain with them. In most cases you don't need to create your own types. Instead, you use the available data structures and get the whole core library to work with them.
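
For reference, a quick sketch assuming the four data structures meant here are lists, vectors, maps, and sets. The var names are made up for illustration:

```clojure
(def a-list   '(1 2 3))
(def a-vector [1 2 3])
(def a-map    {:rank 3 :suit :spades})
(def a-set    #{:spades :hearts})

;; The same core functions work across them.
(map inc a-list)          ;; => (2 3 4)
(map inc a-vector)        ;; => (2 3 4)
(count a-map)             ;; => 2
(:suit a-map)             ;; => :spades
(contains? a-set :hearts) ;; => true

;; "Modifying" returns a new value and leaves the original untouched.
(conj a-vector 4)         ;; => [1 2 3 4]
(assoc a-map :rank 4)     ;; => {:rank 4, :suit :spades}
a-vector                  ;; => [1 2 3]
```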

The opposite approach is to define your own types and classes, and then add methods to them. It is hard to explain how much simpler the end result is when using Clojure's data structures. To make my point, I will steal this example since it perfectly captures what I mean. Let's say you need to model a deck of cards. Choose your favorite language and picture what you would do. Would you write something like this?

class Card {
  rank: Int
  suit: Enum
}

class Deck {
  deck: List<Card>
}

In Clojure you would just use a vector for the Card, [3 :spades], and a vector of cards for the deck, [[3 :spades] [1 :hearts]]. Now you can leverage all of Clojure's functions. For example, "get me a poker hand out of a deck of cards":

(->> (mapcat #(map vector (range 13) (repeat %))
              [:spades :hearts :diamonds :clubs])
      shuffle
      (take 2)) ;; We are playing Hold'em

I don't expect you to read that but to understand that we are only using data structures and functions provided by Clojure. We are not adding any abstractions to model and manipulate our domain and the resulting code is simple and straightforward.

Now think about adding the shuffle and sort methods to the Deck class (without forgetting compare for Card!). Since you've wrapped your data, you now have to dispatch each function you need from your class to the underlying implementation, e.g., deck.shuffle() to innerList.shuffle(). All that extra work is needed to reduce the distance between the custom types and the language. Is it necessary? Type safety and speed can justify it; that is not for me to decide. The point is that when you treat Data as Data, code flows from your hands because you have the full power of the language at your disposal.
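
By contrast, here is a sketch of the same operations on the plain-vector deck. The four-card deck is made up for illustration, and no methods had to be written:

```clojure
(def deck [[3 :spades] [1 :hearts] [12 :clubs] [1 :diamonds]])

;; Vectors compare element by element, so cards sort by rank, then suit.
(sort deck)
;; => ([1 :diamonds] [1 :hearts] [3 :spades] [12 :clubs])

;; Highest rank first; sort-by is stable, so ties keep their order.
(sort-by first > deck)
;; => ([12 :clubs] [3 :spades] [1 :hearts] [1 :diamonds])

;; Only the hearts.
(filter #(= :hearts (second %)) deck)
;; => ([1 :hearts])

;; Shuffling returns a new vector with the same cards.
(count (shuffle deck)) ;; => 4
```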

Data as Data makes Clojure simple.

Immutability

Clojure data structures are immutable: once defined they can't be changed. If there is something that changes as time moves forward (state), you need to explicitly define it as mutable and change it through a well-defined interface. Immutability helps you write concurrent code. To understand how, let's analyze the opposite: how mutability makes concurrency hard.

For example, your server needs to know how many clients are connected at any time. In most OO languages you would define some object to hold the count:

// On each connection
if client.isNew() {
    clientCount = clientCounter.getCount();
    // What if somebody connects right *now*?
    clientCounter.setCount(clientCount + 1);
}

You are first retrieving the value, then adding one to it, and then setting it. There is a window of time in which you are holding a version of clientCount that might not represent the real count. What if a new client connects in between those statements? Concurrency is hard, and Clojure offers many tools to deal with it. I'll present one of them.

In Clojure, you would define one mutable reference client-count and then mutate it through an atomic update:

(def client-count (atom 0))

(when (new-client? client-id)
  (swap! client-count inc))

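swap! means: apply the function inc to the current value of client-count and atomically replace the old value with the result. If another thread changes the value in the meantime, swap! retries with the fresh value, so no update is lost. There are no unsafe variables involved.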
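
To see that no updates get lost, here is a small experiment that hammers the atom from many threads at once. The thread and iteration counts are arbitrary, and plain Java threads stand in for client connections:

```clojure
(def client-count (atom 0))

;; 100 threads, each registering 1000 "connections".
(let [threads (doall
               (repeatedly 100
                           #(Thread. (fn []
                                       (dotimes [_ 1000]
                                         (swap! client-count inc))))))]
  (run! #(.start ^Thread %) threads)
  ;; Wait for every thread to finish before reading the result.
  (run! #(.join ^Thread %) threads))

@client-count ;; => 100000
```

The get-then-set version from the OO example, run the same way without locking, could end up with a lower count because concurrent updates overwrite each other.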

State management improves something that I consider more important than concurrency: design. Having to define your state forces you to examine the system and model it clearly. You don't have objects that have important state and objects that are just stateful. You minimize the moving parts and just like in mechanical engineering, your design improves.

Immutability makes Clojure programs easy to reason about.

Principles & Community

You may have noticed that I've used words like "simple" and "design" more than usual. They hint at the language's underlying philosophy. Talks by Rich Hickey and members of the community (notably Zach Tellman and David Nolen) show good taste [4]. The presented ideas come from thought and analysis, not improvisation or convenience.

Clojure starts by carefully stating problems and recognizing that good solutions come after discarding bad ones, which explains why "Not everything is awesome" is a mantra in the community. The community is reluctant to adopt libraries/frameworks that are not the right solution to the right problem.

On the quest for the perfect design, Clojure acknowledges that there is no such thing. Every decision involves trade-offs, even those you make by default. For example, big web frameworks are not popular since most users would rather weigh their options for their particular problem than scaffold a template with the "standard settings".

What's the problem with standard settings? It depends. In many cases, there is no problem. The standard settings apply to your system and are easy to set up since they come with modules that fit together. What happens after a while, when your business evolves and requirements change? If said modules assume one another, it will be hard to replace any of them. The system was easy to build but it is not simple to modify, and Simple is better than Easy.

Clojure takes the Single Responsibility Principle to a new level. When choosing a module you ask the following questions:

  • Does it do one thing very well?
  • Does it have few dependencies?
  • Can I replace it with a similar module in the future?

If the answer to any of those is "No", you discard it. Clojure modules should do one thing well and swap dependencies easily. A great example is Datomic: a database that can use several storage services.

Clojure's unifying objective is to help you manage the complexity of your software by reducing the incidental complexity created by your tools, allowing you to focus on the essential complexity of your domain.

In my eyes, these principles make Clojure, well, Clojure.


[1] core.match is a library that provides pattern matching to the language and core.async provides utilities to write async code following the CSP model, like Go. Since macros are what made it possible to write them as libraries, they are a testament to the power of macros.

[2] From a historical perspective, targeting the JVM helped Clojure overcome the chicken-and-egg problem all new languages face: nobody wants to program in a language with no libraries, and no one wants to develop libraries for a language with no users.

[3] Zach Tellman has a great talk on how/when to use macros.

[4] As in Kant's Aesthetics.