The Book of Why

Not recommended. While there are clearly genius ideas here, the tone and half of the content get in the way.

The book makes the following main points:

  1. Statistics as a field has a dogmatic tendency to avoid talking about causation.
  2. For any given study, the most important question is what causes what.
  3. The author's do-calculus is the best tool we have to study causation.

I don't need to be convinced of 2, I don't care at all about 1, and the book convinced me I should understand 3.

Sadly, Pearl holds enough of a grudge against the statistics community that most of the book takes the following form:

  • Look at experiment A.
  • Look how wrong the statistical results for A are.
  • Look how well the do-calculus solves a simplified version of A.

I guess it is my fault for not reading the math textbook[1] directly, which probably avoids criticizing statistics so much.

Also, although it stays polite, the overall tone of the book is too self-congratulatory for my taste.

Takeaways

Ladder of Causation

  1. Association: X and Y are associated if they have a tendency to change together. In statistics, association is captured by the correlation coefficient.
  2. Intervention: If I change X, what will happen to Y? Intervention tries to predict the future and allows for experiments. According to Pearl, statistical tools for this are error-prone.
  3. Counterfactuals: If X had been different in the past, what would have happened to Y? Counterfactuals allow us to generalize from limited experiences.
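
The gap between the first two rungs can be sketched with a small simulation. This is a toy model of my own, not an example from the book: a hidden variable Z drives both X and Y, X has no effect on Y at all, and every probability below is an arbitrary assumption.

```python
import random

def sample(rng, do_x=None):
    """Draw one (x, y) pair from a toy model where Z causes both X and Y."""
    z = rng.random() < 0.5                      # hidden common cause (assumed 50/50)
    if do_x is None:
        x = rng.random() < (0.9 if z else 0.1)  # observation: X mostly follows Z
    else:
        x = do_x                                # intervention: X is set by hand
    y = rng.random() < (0.8 if z else 0.2)      # Y depends on Z only, never on X
    return x, y

def prob_y_given_x1(rng, n, do_x=None):
    """Estimate P(Y=1 | X=1) from n draws, observed or under intervention."""
    draws = [sample(rng, do_x) for _ in range(n)]
    ys = [y for x, y in draws if x]
    return sum(ys) / len(ys)

rng = random.Random(0)
seeing = prob_y_given_x1(rng, 200_000)             # rung 1: association
doing = prob_y_given_x1(rng, 200_000, do_x=True)   # rung 2: intervention
print(f"seeing: {seeing:.2f}, doing: {doing:.2f}")
```

Under these assumptions, observing X=1 should make Y=1 look likely (around 0.74), while forcing X=1 should leave Y at its base rate (around 0.50), because the intervention cuts the link from Z to X.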

It is not clear whether Pearl sees the Ladder of Causation as a "true" taxonomy[2] or whether these are simply the things that fall out of the do-calculus, so he presents them as such.

Bayesian networks

With Bayesian networks we can study many random variables that are somehow correlated to each other. We can draw a graph that connects each variable to the ones it is correlated with. Then, using Bayes' Theorem and the measured values of some of the random variables, we can derive the likely values of the rest of the variables.
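
As a rough illustration of that last step, here is a minimal sketch of exact inference by enumeration on a three-variable network (rain and sprinkler both influence wet grass). The network and all its probabilities are hypothetical, picked only to make the arithmetic easy to follow.

```python
from itertools import product

# Hypothetical priors and conditional table, not taken from the book.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(wet = True | rain, sprinkler)
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet), factored along the edges of the network."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)

def posterior_rain_given_wet():
    """Bayes' Theorem: P(rain | wet) = P(rain, wet) / P(wet)."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence

print(round(posterior_rain_given_wet(), 3))
```

With these numbers, seeing wet grass raises the probability of rain from the prior of 0.2 to roughly 0.74.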

do-calculus

If we can add the causal relationships as arrows in a Bayesian network, we can augment probability and Bayes' Theorem with a new operator, do(X), which allows us to study causation. It is called the do-calculus because it also provides rules to manipulate the algebraic expressions involving do(X).
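
One well-known result of this kind is the back-door adjustment formula. Assuming Z is a set of observed variables that blocks every back-door path from X to Y (and contains no descendant of X), the do-expression reduces to ordinary conditional probabilities. The symbols x, y, and z stand for particular values of X, Y, and Z.

```latex
P(Y = y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z) \, P(Z = z)
```

The left-hand side describes an experiment, while everything on the right-hand side can be estimated from observational data.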

Highlights

Explicit models

Also, notice that the omitted arrows actually convey more significant assumptions than those that are present.

What we leave out of a model is as important as what we put in by assuming it doesn't matter.

In fact, one of the major accomplishments of causal diagrams is to make the assumptions transparent so that they can be discussed and debated by experts and policy makers.

This is very useful when discussing a problem: being extra explicit about the underlying model resolves half of the disagreements I have.[3]

Progressive intelligence

We probably will not succeed in creating humanlike intelligence until we can create childlike intelligence[4], and a key component of this intelligence is the mastery of causation.

Data & experiments without theory

This point is repeated often to show how Pearl's work deviates from traditional statistical methods:

Data are objective; opinions are subjective. ...

The struggle for objectivity - the idea of reasoning exclusively from data and experiment - has been part of the way that science has defined itself ever since Galileo.

... causal analysis requires the user to make a subjective commitment. She must draw a causal diagram that reflects her qualitative belief - or better yet, the consensus belief of researchers in her field of expertise - about the topology of the causal process at work.

They are data driven, not model driven.

Pearl insists that there is always an explanatory theory at work and that data is never "just data". Putting theories before observation is an idea first attributed to Karl Popper and I was very surprised that I didn't find a single Popper reference in the book.[5]

DAGs

Before you can use the do-calculus, you need to define a Directed Acyclic Graph (DAG) with the causal structure of the problem. Keeping that DAG in mind, you can then manipulate the equations. General purpose computation can also be very well expressed as DAGs. The fact that DAGs show up everywhere leads me to believe that they are not out in the world but rather in our minds.
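
To illustrate the remark about computation, here is a minimal sketch that evaluates a tiny dependency graph in topological order with Python's standard `graphlib` module (Python 3.9+). The node names and operations are hypothetical.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (a made-up toy computation).
deps = {
    "a": set(),
    "b": set(),
    "sum": {"a", "b"},
    "result": {"sum", "a"},
}

# How to compute each node from the values already computed.
ops = {
    "a": lambda v: 2,
    "b": lambda v: 3,
    "sum": lambda v: v["a"] + v["b"],
    "result": lambda v: v["sum"] * v["a"],
}

def evaluate(deps, ops):
    """Evaluate every node after its dependencies; a cycle raises graphlib.CycleError."""
    values = {}
    for node in TopologicalSorter(deps).static_order():
        values[node] = ops[node](values)
    return values

values = evaluate(deps, ops)
print(values["result"])  # (2 + 3) * 2 = 10
```

The acyclic requirement is what guarantees that such an evaluation order exists at all.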

The edges of a plain Bayesian network are directed, but the direction only encodes conditional dependence and carries no causal meaning, so such a network captures correlation, not causation.

Evolutionary Psychology

While I have no problems with the field of Evolutionary Psychology per se, I do see a tendency of unrelated scholars to invent plausible ad-hoc explanations of how evolution would have allowed their theory of choice. "Ancestral humans" shouldn't make it into every non-fiction book, let alone math ones.

We should ban "what makes us human" from our scientific discourse.

Representation Problem

How do humans represent "possible worlds" in their minds and compute the closest one, when the number of possibilities is far beyond the capacity of the human brain? ... We must have some extremely economical code to manage that many worlds. Could structural models be the actual shortcut that we use?

By "structural" he means the DAGs used by the do-calculus. Sounds a lot like the properties of immutable data structures: allow for cheap variation on a big initial structure without losing the original one.
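
The analogy can be sketched in a few lines. This is only an illustration of the "cheap variation on a shared base" idea, using an overlay from the standard library rather than a true persistent data structure. The world contents and the `variant` helper are hypothetical.

```python
from collections import ChainMap
from types import MappingProxyType

# The "actual world": a read-only view, so nobody can mutate it by accident.
base = MappingProxyType({"rain": True, "sprinkler": False, "season": "spring"})

def variant(world, **changes):
    """Return a world that differs only in `changes`; `world` is neither copied nor modified."""
    return ChainMap(changes, world)

dry = variant(base, rain=False)          # one counterfactual world
dry_wet = variant(dry, sprinkler=True)   # a variation of a variation

print(dry_wet["rain"], dry_wet["sprinkler"], dry_wet["season"])  # False True spring
print(base["rain"])                                              # True
```

Each variant stores only its differences, so the cost of a new "possible world" is proportional to what changed, not to the size of the original.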


  1. Causality: Models, Reasoning, and Inference
  2. "True" as derived from experience or from reality, not from mathematical artifacts.
  3. See Aumann's agreement theorem and Rationally Speaking episode on it.
  4. He traces the idea of child intelligences to Turing's original imitation game paper.
  5. If you do find Popper references, please contact me.