The Data Detective by Tim Harford

Intellectually tepid but harmless. A discussion of the value of statistics in explaining and understanding the modern world, with advice in the form of simple--and sometimes vacuous--rules. The author tries for a Malcolm Gladwell style but he's not quite the writer Gladwell is, and the result is a book that offers readers less insight and less enjoyment than it might.

Notes: 
1) The author opens with a strange critique of the short (and in my opinion very useful) 1950s-era book How To Lie With Statistics, claiming it made people collectively suspicious of statistics. Interestingly, he makes this claim with no evidence given, which is an unfortunate way to open a book about finding the truth with statistics.

On one level this probably seems a stupid thing to nitpick, but on another level it is never nitpicking to expect a writer to back up his claims with at least some evidence. If an author makes the specific claim that a book written in 1954 produced broad, collective cynicism about statistics, and does not cite a single example of that cynicism, a reader is left vaguely appalled to see such a transparently unproved assertion made so confidently in the very introduction of a book. But the author just moves on.

2) I say "harmless" above, but the author appears unaware of the *harm* sometimes caused by statistics, particularly the Gaussian/normal distribution statistics that he celebrates in the many studies that he refers to in his examples and arguments. It is a critical error the author could have easily cured if he had read any of Nassim Taleb's work (Fooled By Randomness would be a good start for him). The category error here is to assume that the world (especially domains with extreme complexity or unusual distributions) conforms to a Gaussian/normal distribution and can be analyzed as such. Certainly many domains are normally distributed and can be described very helpfully using Gaussian statistical techniques, but it is a grave error to assume therefore that all domains can be modeled or described effectively this way. 

Sadly, the author cites Taleb twice in the book, but it is plain that he hasn't yet integrated any of Taleb's central ideas. 
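To make the point in note 2 concrete, here is a minimal sketch of my own (it is not from the book). It compares how fast the tail probability shrinks under a standard normal distribution versus a Pareto distribution, one common stand-in for a fat-tailed domain. The function names and the Pareto parameters (`alpha=2.0`, `x_min=1.0`) are illustrative assumptions, and the two thresholds are not on the same scale (standard deviations for the normal, multiples of the minimum for the Pareto), so read it only as a picture of how differently the two tails decay.

```python
import math


def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail(x: float, alpha: float = 2.0, x_min: float = 1.0) -> float:
    """P(X > x) for a Pareto variable; alpha and x_min are illustrative assumptions."""
    if x <= x_min:
        return 1.0
    return (x_min / x) ** alpha


if __name__ == "__main__":
    for k in (3, 5, 10):
        print(f"threshold {k}: normal {normal_tail(k):.3e}, Pareto {pareto_tail(k):.3e}")
```

Under the normal curve an event at 10 is, for all practical purposes, impossible; under this Pareto it happens about once in a hundred draws. A model built on the first assumption will be badly surprised in a world that behaves like the second.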

3) Another oddity, in light of the author's criticism of How To Lie With Statistics: See Chapter 9, where the author cites--without any criticism--a book called How Charts Lie in a chapter that basically describes how to lie with statistics using charts. Wait a minute: if a generation of readers collectively became cynical about statistics from reading the first book, won't a new generation of readers become cynical about charts and lose their trust in charts because of this book? Seeing inconsistent thinking like this from an author makes a reader wonder: is there a coherent message in this book or is it just a salad of words?

4) Harford writes this book with the enthusiastic tone of a man who has discovered an incredibly useful tool (statistics) and wants to share his appreciation for it. Imagine a man who's discovered a hammer for the first time, and is enthusiastically explaining all the uses for it. The problem is: to the man with a hammer all the world is a nail. This is a journey all of us go through in our discovery of statistics: it is a journey of epistemic humility, where we discover where statistics can and should be used, and where it shouldn't and can't. There is thus a kind of enthusiastic naive empiricism about this author, which on one hand is endearing, but on the other hand would be quickly dispelled if he would just read Fooled By Randomness.

5) Another interesting structural problem to think about: How do you write a book that encourages readers to take pleasure in statistical analysis (and to enjoy the benefits of the many scientific studies produced using these statistical methods), while at the same time admitting one of the gravest crises in "studies show" science, the reproducibility crisis? The author first addresses the reproducibility crisis some 100 pages into the book, yet the book is filled front to back with citations of studies--almost none of which have actually been reproduced! This is a very interesting structural problem and I have no idea how I would handle it either.

Rule 1: Search Your Feelings
This chapter is actually somewhat helpful in giving an attentive reader two very useful metaquestions to ask when news or information stirs up feelings: first ask "how does this make me feel?" Then ask "why does it make me feel this way?" You will be impervious to unethical rhetoric if you remember to ask these questions of yourself.

Rule 2: Ponder Your Personal Experience
This chapter contains excellent examples of Goodhart's law, with which everyone should be familiar: "when a measure becomes a target, it ceases to be a good measure."

Rule 3: Avoid Premature Enumeration
Interesting discussion of arbitrary cutoff dates for declaring live births as a driver of widely disparate statistics on infant mortality. The author could have done a lot of interesting things discussing our reactions to this information, which would have tied together all the chapters so far in the book. He missed the opportunity.

Rule 4: Step Back and Enjoy the View
Genuinely intriguing idea to consider a newspaper that was delivered every 25, 50 or 100 years rather than every day. What would it cover? How would it cover the news? What actually would be news looked at from such a long-term perspective?

Thinking of news delivered at a slower rhythm helps broaden your perspective and helps you see what is more likely to be noise rather than signal. There are definitely applications here for investing to think about.
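As a rough illustration of the investing angle, here is a small sketch of my own (not from the book) that asks: if you check a portfolio at different intervals, how often will you see a gain? The numbers (15% annual return, 10% annual volatility, 252 trading days) and the function name are hypothetical, and the calculation assumes independent, normally distributed returns, which is exactly the kind of assumption note 2 above warns about. Treat it as a toy, not a forecast.

```python
from math import sqrt
from statistics import NormalDist


def prob_positive(mu: float, sigma: float, years: float) -> float:
    """Probability of a positive return over `years`, assuming independent,
    normally distributed returns with annual mean `mu` and annual volatility `sigma`."""
    return NormalDist().cdf((mu / sigma) * sqrt(years))


if __name__ == "__main__":
    # Hypothetical portfolio: 15% expected annual return, 10% annual volatility.
    horizons = (("one day", 1 / 252), ("one month", 1 / 12), ("one year", 1.0))
    for label, years in horizons:
        print(f"{label}: {prob_positive(0.15, 0.10, years):.1%} chance of seeing a gain")
```

With these made-up numbers, a yearly check shows a gain about 93% of the time, while a daily check shows one only about 54% of the time. The same portfolio looks like a coin flip when you watch it every day.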

It struck me as rather weird that this author cites Nassim Taleb and the disgraced Rolf Dobelli in the same short paragraph and yet fails to acknowledge (more likely fails even to know about) the embarrassing plagiarism Dobelli committed against Taleb. One strong possibility is this author doesn't really know as much about his sources and subject as he should. Which takes me to a heuristic: the more glibly written the book, the further outside his circle of competence the author is.

Rule 5: Get the Backstory
How many books have I read that contain the infamous jam tasting study, along with meta-analysis of what it means? Even this thick-headed food blogger wrote about it years ago. :))

Note however that this chapter is a good starter text for people to learn about the various catastrophic problems in "studies show" science. The author could have and probably should have gone further and been more aggressive in his criticisms: there's plenty to criticize and plenty of atrocious examples, such as foundational psychological studies found to contain fake data, p-hacking techniques, the decline effect, etc.
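For readers new to p-hacking, a back-of-the-envelope sketch (mine, not the book's) shows why running many tests and reporting only the "significant" one is so corrosive. It assumes every tested effect is truly absent and that the tests are independent; the function name is hypothetical.

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `n_tests` independent tests comes out
    "significant" at level `alpha` when no real effect exists in any of them."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(f"{n} tests: {prob_any_false_positive(n):.0%} chance of a spurious finding")
```

Under those assumptions, twenty tries at the usual 5% threshold give roughly a 64% chance of at least one publishable-looking fluke.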

Rule 6: Ask Who Is Missing
The author doesn't appear to be conversant in many of the controversies surrounding design flaws in Milgram's famous obedience experiments.

Rule 7: Demand Transparency When the Computer Says No
Google flu trends: its early success and its embarrassing later failure.
Useful chapter to avoid overly credulous belief in the value of big data analytics.

Note several examples here where the author thinks private companies should be forced to release not only their data but their algorithms to the public for "scientific value." I hate big tech companies as much as the next person, but this idea is rife with disastrous second order consequences (of which the author seems blissfully unaware).

I'm not sure if the author realizes his many inchoate conclusions often seem contradictory: we're supposed to trust statistics, but we're NOT supposed to trust anyone who uses statistics in the privacy of their own corporate headquarters. Or: sure, we should have privacy, but if our data happens to be captured by and used by some corporate algorithm it should be released to the public to be "assessed rigorously," and evaluated by "independent experts" for "accountability" and for "scientific value."

Rule 8: Don't Take Statistical Bedrock for Granted

Rule 9: Remember That Misinformation Can Be Beautiful Too 
Note the quote from and positive mention of the book How Charts Lie here in light of the earlier negative criticism of How to Lie with Statistics. I'm beginning to have grave concerns--truly grave concerns--that a generation of readers will become cynical about charts and lose their trust in charts because of this book. The author appears totally blind to this grave, grave risk. 

Nice to hear the story of Florence Nightingale, but I think this chapter's message on not being tricked by charts could be replaced with a simple one-sentence heuristic, borrowed from The Last Psychiatrist: "what do they want you to believe?"

Rule 10: Keep An Open Mind
Interesting condescension towards Irving Fisher here: he's often used as a punching bag in many books thanks to his unfortunate quotes about the stock market in 1929. 

Once again, it's rather odd to have the author finally address the reproducibility crisis in his book, but then, throughout the book, go on to cite various studies to make his points... without ever addressing the reproducibility of the specific studies he's citing. Kind of a strange structural (and circular) problem.

Conclusion: Be curious
When I mentioned the author's "sometimes vacuous" rules, these last two were front of mind. 
