The Data Detective by Tim Harford

Intellectually tepid but harmless. A discussion of the value of statistics in explaining and understanding the modern world, with advice in the form of simple--and sometimes vacuous--rules. The author tries for a Malcolm Gladwell style but he's not quite the writer Gladwell is, and the result is a book that offers readers less insight and less enjoyment than it might.

Notes: 
1) The author opens with a strange critique of the short (and in my opinion very useful) 1950s-era book How To Lie With Statistics, claiming it made people collectively suspicious of statistics. Interestingly, he makes this claim with no evidence given, which is an unfortunate way to open a book about finding the truth with statistics.

On one level this probably seems a stupid thing to nitpick, but on another level it is never nitpicking to expect a writer to back up his claims with at least some evidence. If an author makes the specific claim that a book written in 1954 produced broad, collective cynicism about statistics, and does not cite a single example of that cynicism, the reader is left vaguely appalled to see such a transparently unproved assertion made so confidently in the very introduction of a book. But the author just moves on.

2) I say "harmless" above, but the author appears unaware of the *harm* sometimes caused by statistics, particularly the Gaussian/normal distribution statistics that he celebrates in the many studies that he refers to in his examples and arguments. It is a critical error the author could have easily cured if he had read any of Nassim Taleb's work (Fooled By Randomness would be a good start for him). The category error here is to assume that the world (especially domains with extreme complexity or unusual distributions) conforms to a Gaussian/normal distribution and can be analyzed as such. Certainly many domains are normally distributed and can be described very helpfully using Gaussian statistical techniques, but it is a grave error to assume therefore that all domains can be modeled or described effectively this way. 

Sadly, the author cites Taleb twice in the book, but it is plain that he hasn't yet integrated any of Taleb's central ideas. 
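
To make the point in note 2 concrete, here is a minimal sketch (not taken from the book or from Taleb) comparing how often an extreme value turns up under a normal distribution versus a fat-tailed one. The fat-tailed stand-in is a Student-t distribution with 3 degrees of freedom, rescaled to unit variance. That choice, and all the function names, are assumptions made purely for illustration.

```python
import math


def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def student_t3_tail(t: float) -> float:
    """P(T > t) for a Student-t variable with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3)
    return 0.5 - (u / (1 + u * u) + math.atan(u)) / math.pi


def unit_variance_t3_tail(k: float) -> float:
    """P(X > k) where X is a t(3) variable rescaled to variance 1.

    A t(3) variable has variance 3, so X = T / sqrt(3).
    """
    return student_t3_tail(k * math.sqrt(3))


if __name__ == "__main__":
    # Illustrative thresholds, measured in standard deviations.
    for k in (2, 3, 5, 10):
        gauss = normal_tail(k)
        fat = unit_variance_t3_tail(k)
        print(f"k={k}: normal={gauss:.3e}  fat-tailed={fat:.3e}  ratio={fat / gauss:.1f}")
```

In this toy comparison the two distributions look similar at modest thresholds, and the gap opens up as the threshold moves further out. That far tail is where treating a fat-tailed domain as Gaussian would understate the odds of an extreme event.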

3) Another oddity, in light of the author's criticism of How To Lie With Statistics: see Chapter 9, where the author cites--without any criticism--a book called How Charts Lie in a chapter that basically describes how to lie with statistics using charts. Wait a minute: if a generation of readers collectively became cynical about statistics from reading the first book, won't a new generation of readers become cynical about charts and lose their trust in charts because of this book? Seeing incontinent thinking like this from an author makes a reader wonder: is there a coherent message in this book or is it just a salad of words?

4) Harford writes this book with the enthusiastic tone of a man who has discovered an incredibly useful tool (statistics) and wants to share his appreciation for it. Imagine a man who's discovered a hammer for the first time, and is enthusiastically explaining all the uses for it. The problem is: to the man with a hammer all the world is a nail. This is a journey all of us go through in our discovery of statistics: it is a journey of epistemic humility, where we discover where statistics can and should be used, and where it shouldn't and can't. There is thus a kind of enthusiastic naive empiricism about this author, which on one hand is endearing, but on the other hand would be quickly dispelled if he would just read Fooled By Randomness.

5) Another interesting structural problem to think about: How do you write a book that encourages readers to take pleasure in the joys of statistical analysis (and to enjoy the benefits of the many scientific studies produced using these statistical methods), while at the same time admitting one of the gravest crises in "studies show" science, the reproducibility crisis? The author first addresses the reproducibility crisis some 100 pages into the book, yet the book is filled front to back with citations of studies--almost none of which have actually been reproduced! This is a very interesting structural problem and I have no idea how I would handle it either.

Rule 1: Search Your Feelings
This chapter is actually somewhat helpful in giving an attentive reader two very useful metaquestions to ask whenever news or information stirs up strong feelings: first ask "how does this make me feel?" Then ask "why does it make me feel this way?" You will be impervious to unethical rhetoric if you remember to ask these questions of yourself.

Rule 2: Ponder Your Personal Experience
This chapter contains excellent examples of Goodhart's law, with which everyone should be familiar: "when a measure becomes a target, it ceases to be a good measure."

Rule 3: Avoid Premature Enumeration
Interesting discussion on arbitrary cutoff dates for declaring live births as a driver of widely disparate statistics on infant mortality. The author could have done a lot of interesting things discussing our reactions to this information, which would have tied together all the chapters so far in the book. He missed the opportunity.

Rule 4: Step Back and Enjoy the View
Genuinely intriguing idea to consider a newspaper that was delivered every 25, 50 or 100 years rather than every day. What would it cover? How would it cover the news? What actually would be news looked at from such a long-term perspective?

Thinking of news delivered at a slower rhythm helps broaden your perspective and helps you see what is more likely to be noise than signal. There are definitely applications here for investors to think about.

It struck me as rather weird that this author cites Nassim Taleb and the disgraced Rolf Dobelli in the same short paragraph and yet fails to acknowledge (more likely fails even to know about) the embarrassing plagiarism Dobelli committed against Taleb. One strong possibility is this author doesn't really know as much about his sources and subject as he should. Which takes me to a heuristic: the more glibly written the book, the further outside his circle of competence the author is.

Rule 5: Get the Backstory
How many books have I read that contain the infamous jam tasting study, along with meta-analysis of what it means? Even this thick-headed food blogger wrote about it years ago. :))

Note, however, that this chapter is a good starter text for people to learn about the various catastrophic problems in "studies show" science. The author could have and probably should have gone further and been more aggressive in his criticisms. There's plenty to criticize and plenty of atrocious examples: foundational psychological studies found to contain fake data, p-hacking techniques, the decline effect, etc.
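
For readers new to p-hacking, here is a minimal sketch of why it works. It is an illustration only, not something from the book: it assumes the tests are independent, that every null hypothesis is true (the data is pure noise), and that each p-value is therefore a uniform draw on [0, 1]. The function names and the choice of 20 tests are hypothetical.

```python
import random


def familywise_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** num_tests


def simulate_noise_studies(num_tests: int, alpha: float = 0.05,
                           trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated pure-noise 'studies' with at least one p < alpha.

    Under a true null hypothesis, a p-value from a continuous test is uniform
    on [0, 1], so each test is modeled here as a single uniform draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    tests_tried = 20  # hypothetical number of outcomes a researcher checks
    print("analytic: ", familywise_false_positive_rate(tests_tried))
    print("simulated:", simulate_noise_studies(tests_tried))
```

Under these assumptions, a researcher who tries enough outcomes on pure noise and reports only the "significant" one will find something publishable most of the time, which is one reason a single unreplicated study deserves caution.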

Rule 6: Ask Who Is Missing
The author doesn't appear to be conversant in many of the controversies surrounding design flaws in Milgram's famous obedience experiments.

Rule 7: Demand Transparency When the Computer Says No
Google Flu Trends: its early success and its embarrassing later failure.
A useful chapter for avoiding overly credulous belief in the value of big data analytics.

Note several examples here where the author thinks private companies should be forced to release not only their data but their algorithms to the public for "scientific value." I hate big tech companies as much as the next person, but this idea is rife with disastrous second order consequences (of which the author seems blissfully unaware).

I'm not sure if the author realizes his many inchoate conclusions often seem contradictory: we're supposed to trust statistics, but we're NOT supposed to trust anyone who uses statistics in the privacy of their own corporate headquarters. Or: sure, we should have privacy, but if our data happens to be captured by and used by some corporate algorithm it should be released to the public to be "assessed rigorously," and evaluated by "independent experts" for "accountability" and for "scientific value."

Rule 8: Don't Take Statistical Bedrock for Granted

Rule 9: Remember That Misinformation Can Be Beautiful Too 
Note the quote from and positive mention of the book How Charts Lie here in light of the earlier negative criticism of How to Lie with Statistics. I'm beginning to have grave concerns--truly grave concerns--that a generation of readers will become cynical about charts and lose their trust in charts because of this book. The author appears totally blind to this grave, grave risk. 

Nice to hear the story of Florence Nightingale, but I think this chapter's message on not being tricked by charts could be replaced with a simple one-sentence heuristic, borrowed from The Last Psychiatrist: "what do they want you to believe?"

Rule 10: Keep An Open Mind
Interesting condescension towards Irving Fisher here: he's often used as a punching bag in many books thanks to his unfortunate quotes about the stock market in 1929. 

Once again, it's rather odd to have the author finally address the reproducibility crisis in his book, and yet, throughout the book, cite various studies to make the points he's trying to make... without ever addressing the reproducibility of those specific studies. Kind of a strange structural (and circular) problem.

Conclusion: Be curious
When I mentioned the author's "sometimes vacuous" rules, these last two were front of mind. 
