The Data Detective by Tim Harford

Intellectually tepid but harmless. A discussion of the value of statistics in explaining and understanding the modern world, with advice in the form of simple--and sometimes vacuous--rules. The author tries for a Malcolm Gladwell style but he's not quite the writer Gladwell is, and the result is a book that offers readers less insight and less enjoyment than it might.

Notes:

1) The author opens with a strange critique of the short (and in my opinion very useful) 1950s-era book How To Lie With Statistics, claiming it made people collectively suspicious of statistics. Oddly, he offers no evidence for this claim, which is an unfortunate way to open a book about finding the truth with statistics.

This may seem a petty thing to nitpick, but it is never nitpicking to expect a writer to back up any and all claims with at least some evidence. When an author asserts that a book written in 1954 produced broad, collective cynicism about statistics, yet cites not a single example of that cynicism, a reader is left vaguely appalled to see such a transparently unproven assertion made so confidently in the very introduction of the book. But the author just moves on.

2) I say "harmless" above, but the author appears unaware of the *harm* sometimes caused by statistics, particularly the Gaussian/normal-distribution statistics underlying the many studies he celebrates in his examples and arguments. It is a critical error the author could easily have avoided if he had read any of Nassim Taleb's work (Fooled By Randomness would be a good start for him). The category error here is to assume that the world (especially domains with extreme complexity or unusual distributions) conforms to a Gaussian/normal distribution and can be analyzed as such. Certainly many phenomena are normally distributed and can be described very helpfully using Gaussian statistical techniques, but it is a grave error to assume that therefore all domains can be modeled or described effectively this way.

Sadly, the author cites Taleb twice in the book, but it is plain that he hasn't yet integrated any of Taleb's central ideas.

3) Another oddity, in light of the author's criticism of How To Lie With Statistics: in Chapter 9 the author cites--without any criticism--a book called How Charts Lie, in a chapter that basically describes how to lie with statistics using charts. Wait a minute: if a generation of readers collectively became cynical about statistics from reading the first book, won't a new generation of readers become cynical about charts and lose their trust in them because of How Charts Lie? Inconsistent thinking like this makes a reader wonder: is there a coherent message in this book, or is it just a salad of words?

4) Harford writes this book in the enthusiastic tone of a man who has discovered an incredibly useful tool (statistics) and wants to share his appreciation for it. Imagine a man who has discovered a hammer for the first time and is enthusiastically explaining all its uses. The problem is that to a man with a hammer, all the world is a nail. This is a journey all of us go through in our discovery of statistics: a journey of epistemic humility, in which we discover where statistics can and should be used, and where they shouldn't and can't. There is thus a kind of enthusiastic naive empiricism about this author, which on the one hand is endearing, but on the other would be quickly dispelled if he would just read Fooled By Randomness.

5) Another interesting structural problem to think about: how do you write a book that encourages readers to take pleasure in the joys of statistical analysis (and in the benefits of the many scientific studies produced using these statistical methods), while at the same time admitting one of the gravest crises in "studies show" science, the reproducibility crisis? The author first addresses the reproducibility crisis some 100 pages into the book, yet the book is filled front to back with citations of studies--almost none of which have actually been reproduced! This is a very interesting structural problem, and I have no idea how I would handle it either.

Rule 1: Search Your Feelings

This chapter is actually somewhat helpful: it gives an attentive reader two very useful metaquestions to ask whenever news or information stirs up feelings. First ask, "How does this make me feel?" Then ask, "Why does it make me feel this way?" You will be impervious to unethical rhetoric if you remember to ask yourself these questions.

Rule 2: Ponder Your Personal Experience

This chapter contains excellent examples of Goodhart's law, with which everyone should be familiar: "when a measure becomes a target, it ceases to be a good measure."

Rule 3: Avoid Premature Enumeration

Interesting discussion of arbitrary cutoff dates for declaring live births as a driver of widely disparate statistics on infant mortality. The author could have done a lot of interesting things by discussing our reactions to this information, which would have tied together all the chapters in the book so far. He missed the opportunity.

Rule 4: Step Back and Enjoy the View

Genuinely intriguing idea: consider a newspaper delivered every 25, 50 or 100 years rather than every day. What would it cover? How would it cover the news? What would actually count as news from such a long-term perspective?

Thinking of news delivered at a slower rhythm broadens your perspective and helps you see what is more likely to be noise rather than signal. There are definitely applications here worth thinking about for investing.

It struck me as rather weird that this author cites Nassim Taleb and the disgraced Rolf Dobelli in the same short paragraph, yet fails to acknowledge (more likely, fails even to know about) the embarrassing plagiarism Dobelli committed against Taleb. One strong possibility is that this author doesn't really know as much about his sources and subject as he should. Which brings me to a heuristic: the more glibly written the book, the further outside his circle of competence the author is.

Rule 5: Get the Backstory

How many books have I read that contain the infamous jam-tasting study, along with a meta-analysis of what it means? Even this thick-headed food blogger wrote about it years ago. :))

Note, however, that this chapter is a good starter text for people learning about the various catastrophic problems in "studies show" science. The author could have, and probably should have, gone further and been more aggressive in his criticisms: there's plenty to criticize and plenty of atrocious examples, from foundational psychological studies found to contain fake data to p-hacking techniques, the decline effect, and so on.

Rule 6: Ask Who Is Missing

The author doesn't appear to be conversant with many of the controversies surrounding design flaws in Milgram's famous obedience experiments.

Rule 7: Demand Transparency When the Computer Says No

Google Flu Trends: its early success and its embarrassing later failure.

Useful chapter to avoid overly credulous belief in the value of big data analytics.

Note several examples here in which the author argues that private companies should be forced to release not only their data but also their algorithms to the public for "scientific value." I hate big tech companies as much as the next person, but this idea is rife with disastrous second-order consequences (of which the author seems blissfully unaware).

I'm not sure the author realizes that his many inchoate conclusions often seem contradictory: we're supposed to trust statistics, but we're NOT supposed to trust anyone who uses statistics in the privacy of their own corporate headquarters. Or: sure, we should have privacy, but if our data happens to be captured and used by some corporate algorithm, it should be released to the public to be "assessed rigorously" and evaluated by "independent experts" for "accountability" and "scientific value."

Rule 8: Don't Take Statistical Bedrock for Granted

Rule 9: Remember That Misinformation Can Be Beautiful Too

Note the quote from and positive mention of the book How Charts Lie here in light of the earlier negative criticism of How to Lie with Statistics. I'm beginning to have grave concerns--truly grave concerns--that a generation of readers will become cynical about charts and lose their trust in charts because of this book. The author appears totally blind to this grave, grave risk.

Nice to hear the story of Florence Nightingale, but I think this chapter's message on not being tricked by charts could be replaced with a simple one-sentence heuristic, borrowed from The Last Psychiatrist: "what do they want you to believe?"

Rule 10: Keep An Open Mind

Interesting condescension towards Irving Fisher here: he's used as a punching bag in many books thanks to his unfortunate quotes about the stock market in 1929.

Once again, it's rather odd for the author to finally address the reproducibility crisis, yet throughout the book cite various studies to make his points... without ever addressing the reproducibility of the specific studies he's citing. Kind of a strange structural (and circular) problem.

Conclusion: Be curious

When I mentioned the author's "sometimes vacuous" rules, these last two were front of mind.

