Decomplecting Clojure

Clojure is a programming language that combines several concepts in a unique way. It is hard to understand the value of each concept, and they often blur together. As Rich Hickey, the creator of the language, suggests, taking a system apart is difficult but worth it. After "decomplecting" you end up with smaller pieces which are easier to reason about and use.

This is not an introduction to the language but an attempt to explain its underpinnings. Let's break Clojure apart and see what each piece brings to the table.

The JVM

Clojure's runtime is implemented in Java. Some of the benefits of using the JVM are:

  • Clojure can use Java libraries. Through Java interop, Clojure had an HTTP server, a cryptography library, and a package repository from day one [1].
  • Clojure can be deployed wherever Java is supported, and that is virtually everywhere.
  • Whenever extra performance is needed, you can seamlessly use Java.
  • There are armies working on optimizations, profiling, and monitoring for the JVM.
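
To make the interop point concrete, here is a minimal sketch that calls the JDK's java.security.MessageDigest straight from Clojure, with no wrapper library. The helper name sha256-hex is my own and exists only for illustration; the Java classes are the real ones that ship with the JDK.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn sha256-hex
  "Hypothetical helper: returns the hex-encoded SHA-256 digest of string s."
  [^String s]
  (let [digest     (MessageDigest/getInstance "SHA-256")            ; static Java method
        hash-bytes (.digest digest (.getBytes s StandardCharsets/UTF_8))] ; instance methods
    ;; Format every byte as two lowercase hex digits and join them.
    (apply str (map #(format "%02x" %) hash-bytes))))

(sha256-hex "hello") ;; a 64-character hex string
```

Static methods use the Class/method form and instance methods use a leading dot, so most Java APIs can be called without any glue code.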

The JVM makes Clojure practical.

Every decision comes with trade-offs. The JVM doesn't know about Clojure, which leads to:

  • Slow start times: when the JVM is started, it must first load Clojure itself and then proceed to run your code. This makes Clojure a poor fit for Android apps and CLI tools.
  • Poor stack traces: they contain all the underlying Java classes and methods involved and they are hard to decipher.
  • No tail call optimization [2].
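
The last point deserves a small illustration. The sketch below uses loop and recur, Clojure's explicit workaround for the missing tail call optimization. The function name sum-to is hypothetical and chosen only for this example.

```clojure
(defn sum-to
  "Sums the integers from 1 to n without growing the call stack."
  [n]
  (loop [i 1
         acc 0]
    (if (> i n)
      acc
      ;; recur jumps back to the enclosing loop, reusing the current stack frame.
      ;; It must be in tail position, otherwise the compiler rejects it.
      (recur (inc i) (+ acc i)))))

(sum-to 100000) ;; => 5000050000
```

A naive self-recursive version of the same function would add one stack frame per step and could overflow the stack for large n, which is why the tail call has to be spelled out.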

Functional Style

The Clojure standard library is based on the ML family and heavily features lazy collections. There are plenty of blog posts describing functional programming and how it will solve all your problems but let's limit the hype.

I find the "collection + lambdas" approach more elegant than "loops + iterators". Higher-order functions describe intent more succinctly and allow for more reuse. To me, it is the difference between seeing the forest and only seeing trees:

(filter even? (map inc some-array))

var someOtherArray = [];
var x, i;
for (i = 0; i < someArray.length; i++) {
  x = someArray[i] + 1;
  if (isEven(x)) {
    someOtherArray.push(x);
  }
}
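
Laziness is easier to appreciate with a small sketch. Here the same "inc, then keep the evens" pipeline is applied to an infinite sequence; the name incremented-evens is made up for this example.

```clojure
(defn incremented-evens
  "Increments every element of coll and keeps the even results. Lazy."
  [coll]
  (filter even? (map inc coll)))

;; (range) with no arguments is an infinite lazy sequence: 0, 1, 2, ...
;; Nothing is computed until `take` asks for elements.
(take 5 (incremented-evens (range))) ;; => (2 4 6 8 10)

;; The same function works unchanged on a finite vector.
(incremented-evens [1 2 3 4]) ;; => (2 4)
```

Because map and filter return lazy sequences, only as many elements are realized as the consumer requests, which is what lets the pipeline run over an endless input.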

A functional style makes Clojure elegant.

Lisp

Lisp is many things to many people, a religion to some and an antique to others. To me it represents two distinct things: a syntax and an approach to designing and developing code.

Lisp programs are defined in terms of "s-expressions", lists with a consistent syntax. An s-expression is a list of symbols where the first symbol is the operator, the rest are its arguments, and when evaluated it returns a value:

(op arg1 arg2 ... arg-n)

(+ 1 1) ;; => 2

Since the expression is a list (not a string!), it can easily be manipulated by other programs. For example, if you wanted to find the operator of an s-expression:

(first '(+ 1 1)) ;; => +

(first '(first (+ 1 1))) ;; => first

A consistent syntax is crucial when writing macros, programs that write other programs. Macros give you the power to write language extensions. If your Lisp is missing feature X, you could probably write a set of macros to implement it. A good example is core.async: the team behind Clojure used macros to add CSP, the concurrency model used by Go, as a library without the need to modify the language itself.
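
As a tiny taste of what "programs that write other programs" means, here is a sketch of an unless macro. It is a toy written for this example (clojure.core already ships the equivalent when-not), but it shows a macro receiving code as a list and returning a new list.

```clojure
(defmacro unless
  "Toy macro: evaluates body only when condition is falsey."
  [condition & body]
  ;; The syntax quote builds a list, ~ inserts one form, ~@ splices many forms.
  `(if ~condition nil (do ~@body)))

(unless (> 1 2) :ran)                  ;; => :ran
(macroexpand-1 '(unless false :ok))    ;; => (if false nil (do :ok))
```

The macro never evaluates its arguments itself. It only rearranges them into an if expression, which the compiler then treats like any hand-written code.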

s-expressions come with some trade-offs:

  • Parenthesis whining: beginners and non-Lispers dislike them and complain loudly. Some operators like < are always hard to read at the beginning of the expression.
  • Macro abuse: some people like to experiment with macro DSLs making their code harder to read. It is less of a problem when using popular Clojure libraries since it is regarded as good practice to write macros only when you have to [3].

s-expressions can be written and tested individually which leads to an interactive workflow. You type expressions in a console/REPL and modify them until they return the desired value. This leads you to compose small abstractions into larger ones. This strategy, bottom-up programming, allows for tremendous reuse.
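
A rough sketch of how such a session might go: each function below is small enough to try at the REPL on its own, and each one builds on the previous. The names words, word-counts, and top-words are invented for this example.

```clojure
(require '[clojure.string :as str])

;; Step 1: get the words out of a string. Try it at the REPL first.
(defn words [s]
  (str/split (str/trim s) #"\s+"))

;; Step 2: build on `words` to count occurrences, ignoring case.
(defn word-counts [s]
  (frequencies (words (str/lower-case s))))

;; Step 3: build on `word-counts` to find the n most frequent words.
(defn top-words [n s]
  (->> (word-counts s)
       (sort-by val >)
       (take n)
       (map key)))

(top-words 2 "the cat and the hat and the bat") ;; => ("the" "and")
```

Each layer was testable before the next one existed, and words and word-counts remain useful on their own.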

Lisp makes Clojure powerful.

Data as Data

Clojure comes with four main data structures (lists, vectors, hash-maps, and sets) and nudges you into modeling your domain with them. In most cases you don't need to create your own types. Instead, you use the existing ones and the core library to manipulate them.

Most functions in the core library, when used on the standard data structures, have the "closure property". An operation on a type X has the closure property if it takes and returns values of that same type X, allowing you to combine them in some fashion:

;; integers are closed under +
(+ 40 2) ;; => 42 - an integer

;; sets can be merged into a bigger set
(require '[clojure.set :as set])
(set/union #{1 2} #{1 3}) ;; => #{1 2 3}

;; functions can be *comp*osed into other functions
(comp zero? mod) ;; => divisible-by? - a function

The opposite approach is to define your own types and then add methods which are specific to them. To see the two approaches I'll steal this example: you need to draw a poker hand out of a deck of cards. The OO encapsulation style leads you to:

class Card {
  rank: Int
  suit: Enum
}

class Deck {
  deck: List<Card>
}

In Clojure you would use a vector for the card, [3 :spades], and a vector of cards for the deck, [[3 :spades] [1 :hearts]]. Now you can leverage all of Clojure's functions to manipulate your cards and decks. For example, "get me a poker hand out of a deck of cards":

(->> [:spades :hearts :diamonds :clubs]
     (mapcat (fn [color]
               (map vector (range 13) (repeat color))))
     shuffle
     (take 2))

;; => ([7 :hearts] [5 :spades])

We don't need any custom data or functions to model and manipulate our domain and the resulting code is straightforward.

Now think about adding the shuffle and sort methods to the Deck class. Since the data is wrapped, we now have to dispatch each function from the class to the underlying implementation, e.g., deck.shuffle() to innerList.shuffle(). All that extra work is needed to bridge the distance between the custom types and the language.
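
For contrast, here is a sketch of the same operations on the plain-vector deck. Nothing below is deck-specific: shuffle, sort, sort-by, group-by, and filter are ordinary core functions, and the names deck and face-cards are only illustrative.

```clojure
;; A deck is just a vector of [rank suit] vectors.
(def deck
  (vec (for [suit [:spades :hearts :diamonds :clubs]
             rank (range 13)]
         [rank suit])))

(count deck)                       ;; => 52
(take 3 (shuffle deck))            ;; three random cards
(sort-by first (shuffle deck))     ;; cards ordered by rank
(keys (group-by second deck))      ;; the four suits

;; Assumption for this sketch: ranks 10, 11, and 12 stand for the face cards.
(def face-cards
  (filter (fn [[rank _suit]] (> rank 9)) deck))

(count face-cards)                 ;; => 12
```

There is no delegation layer to write or maintain, because the deck never stopped being a vector.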

Data as data makes Clojure simple.

Immutability

Clojure data structures are immutable: once defined they can't be changed. If there is something that changes as time moves forward (state), you need to explicitly define it as mutable and change it through a well-defined interface. Immutability helps you write concurrent code. To understand how, let's analyze the opposite: how mutability makes concurrency hard.
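
Before getting to mutability, a quick sketch of what "can't be changed" looks like in practice. The names hand, bigger-hand, and card are just for illustration: "modifying" functions such as conj and assoc return a new value and leave the original untouched.

```clojure
(def hand [[3 :spades] [1 :hearts]])

;; conj returns a NEW vector with the extra card.
(def bigger-hand (conj hand [12 :clubs]))

hand        ;; => [[3 :spades] [1 :hearts]]
bigger-hand ;; => [[3 :spades] [1 :hearts] [12 :clubs]]

(def card {:rank 3 :suit :spades})

(assoc card :rank 4) ;; => {:rank 4, :suit :spades}
card                 ;; => {:rank 3, :suit :spades}
```

Any thread holding hand or card can keep reading it safely, since no other code can alter the value underneath it.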

For example, your server needs to know how many clients are connected at any time. In most OO languages you would define some object to hold the count:

// On each connection
if (client.isNew()) {
    int clientCount = clientCounter.getCount();
    // What if somebody connects right *now*?
    clientCounter.setCount(clientCount + 1);
}

You are first retrieving the value, then adding one to it, and then setting it. There is a window of time in which you are holding a version of clientCount that might not represent the real count. What if a new client connects in between those statements? Clojure's alternative is an explicit, atomically updated reference:

;; define a mutable reference with `atom`
(def client-count (atom 0))

(when (new-client? client-id)
  ;; atomically apply `inc` to the value in `client-count`
  ;; and then swap the old one for the new one
  (swap! client-count inc))
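
To check that no update gets lost, here is a small simulation, assuming each "connection" is a future running on its own thread. The function simulate-connections is hypothetical and exists only for this sketch.

```clojure
(defn simulate-connections
  "Starts n futures that each increment a shared atom once,
  waits for all of them, and returns the final count."
  [n]
  (let [counter (atom 0)
        ;; doall forces the lazy sequence so every future starts right away.
        workers (doall (repeatedly n #(future (swap! counter inc))))]
    ;; Block until every future has finished.
    (run! deref workers)
    @counter))

(simulate-connections 1000) ;; => 1000

;; In a standalone script, call (shutdown-agents) at the end so the
;; thread pool used by futures does not delay JVM exit.
```

swap! retries the update if another thread changed the atom in the meantime, so the read-increment-write window from the previous example never loses a count.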

Making state explicit forces you to model the system clearly. You don't have objects that have important state and objects that are just stateful. You minimize the moving parts and, just like in mechanical engineering, your design improves.

Immutability makes Clojure programs easy to reason about.


  1. From a historical perspective, targeting the JVM helped Clojure overcome the chicken-and-egg problem all new languages face: nobody wants to program in a language with no libraries, and no one wants to develop libraries for a language with no users.
  2. Clojure works around it by introducing loop and recur which make tail calls explicit.
  3. Zach Tellman has a great talk on how/when to use macros.