David Spiegelhalter is a world-renowned statistician. He does
research in statistics, he teaches statistics, and he also applies his
statistical knowledge to real-world problems. This makes him uniquely qualified
to write a book that promises so much. The Art of Statistics: Learning from
Data certainly promises a lot to the reader; it is an immense undertaking, to
say the least. It is a thick book, but not as thick as I thought it would be,
mainly because I hadn't counted on the book being so densely packed with knowledge
and insight.
I listened to a podcast interview with Spiegelhalter while I was reading the
book. He says on the podcast that this book is the culmination of a lifelong
effort to best communicate the knowledge that he has accumulated in his time as
a statistician, and that he wanted to write a book from which he could teach
the material when he teaches statistics. He took the initiative to organize
the knowledge that he had accrued, put it in order, and explain the
material in the best way he felt was possible.
Early on in the book he talks about avoiding the mathematical
tools and the grind of doing statistics, because there are plenty
of other books that are focused on turning the crank and applying the
mathematical tools, and he didn't want to
fill the book with too many technical details. As concise, precise, and logical as
he is in his writing, I think more mathematical examples would have helped.
It is not that he is bad at explaining the concepts; I just felt that it would have been
better to illustrate the ideas with some numerical examples. To be fair, he
does have a copious number of case studies in the book to help illustrate the
concepts. They are all extremely interesting cases that apply statistics.
Maybe it is just me, but I was always slightly confused, and I had to go back
and reread to understand the main points. It is an extremely ambitious
undertaking on his part to try to explain the art of statistics to a general
audience, as most people are not well schooled in statistical thinking and
mathematical computation. I am technically trained, but I was not formally trained
in probability and statistics. Whatever probability and statistics background I
have, I picked up in my days as a practicing engineer, so my ability to
understand what the author is trying to say is uneven; the shortfall is entirely mine.
Even though it was a struggle for me, I feel that it was a worthy struggle
because he illuminated many of the intricate and
complex concepts of statistics that I had not been able to understand previously. I
am using this book, along with Ian Stewart's Do Dice Play God?, as
the foundation of my own autodidactic attempt at learning probability and
statistics. I am counting on these two books to give me a solid framework from
which I can tackle the numerically intensive books that I have on probability
and statistics. First a solid foundation, and then the nitty-gritty details of
the computation. I hope this works.
Spiegelhalter’s other purpose is to lay out a way
to do statistics better, to explain
statistics better to the general public, and to eliminate many of the myths and
sources of misunderstanding in the way people use statistics and the way people
teach statistics. This is the Holy Grail that Spiegelhalter is trying to reach
through this book. The examples in the book demonstrate just how easily people
can be misled when applying statistics and come to the wrong conclusions.
He splits the book up into 14 chapters. The first three
chapters talk about data: what data mean and how data are summarized and
communicated. He delves deeply into the ideas of the distribution, the mean,
and the standard deviation, as well as the importance of data. Once he has
explained the meaning and importance of data, he jumps straight into causality:
what causality is and why it is so important. Causality is extremely important because this
is where human nature immediately goes when we use statistics. We naturally and automatically jump to
conclusions based upon our previous knowledge, and we base our decisions
on what we think we see in the statistics without examining
our logical fallacies or noticing that the statistics probably do not
support our conclusions. Regression
is next, and then he moves on to algorithms and analytics, the tools we use to make
predictions.
Right after the tools we use to make predictions,
he naturally examines the role of uncertainty: how certain we are about the
statistics that we have gathered, how we can determine that certainty, and
how much confidence we can have in the numbers. Spiegelhalter also
goes into great detail about the importance of data collection: what
the collected data tell us and how careless data collection can lead us to wrong
conclusions. Probability is next. He calls it the language of uncertainty and
variability, and this is where many get bogged down, because the nuances of
probability are subtle and convoluted. All this is
used to set up the chapter where he puts probability and statistics together in
context with one another. This is the big payoff: by putting the two very
difficult ideas together, the resulting concept is much greater than the
sum of its parts, but the complications also increase exponentially.
Spiegelhalter concedes that this is probably the most difficult chapter in the
book to understand.
After having set that basis for the art of statistics, he answers
the huge number of questions that his explanations have generated. One of the later chapters is of particular
interest because humans are very adept Bayesian thinkers; that is, we tend to
recalibrate our personal probabilities by taking into account any new
information that we have obtained since we came to a previous decision.
Spiegelhalter jumps right into that with great gusto, and the chapter is
fantastic reading. He distinguishes the way the Bayesians approach probability
from the way the frequentists do. As a matter of fact, that battle is still being fought
today. He goes through the arguments for both cogently, and his final assessment
is that there are no absolutes in this regard: sometimes the Bayesian approach is
better and sometimes the frequentist approach is better, with better meaning more accurate, giving
us better estimates while lowering the amount of uncertainty that we're dealing
with. The trick then is the ability to discern which of the two ways of
calculating probability is better.
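As a minimal sketch of that Bayesian recalibration (my own illustration, not an
example taken from the book), Bayes' rule fits in a few lines of Python. The
function name bayes_update and every number below are assumptions chosen only
to show the mechanics of updating a personal probability with new information.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the updated (posterior) probability of a hypothesis.

    prior               -- probability of the hypothesis before the new evidence
    p_evidence_if_true  -- probability of seeing the evidence if the hypothesis holds
    p_evidence_if_false -- probability of seeing the evidence if it does not hold
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Hypothetical numbers, for illustration only.
prior = 0.01          # initial belief that the hypothesis is true
p_if_true = 0.90      # chance of the evidence when the hypothesis is true
p_if_false = 0.05     # chance of the same evidence when it is false

# First piece of evidence: the prior is recalibrated into a posterior.
after_first = bayes_update(prior, p_if_true, p_if_false)

# Second, independent piece of the same kind of evidence:
# yesterday's posterior becomes today's prior.
after_second = bayes_update(after_first, p_if_true, p_if_false)

print(f"after first piece of evidence:  {after_first:.3f}")
print(f"after second piece of evidence: {after_second:.3f}")
```

With these made-up inputs the belief moves from 1% to roughly 15% after the
first piece of evidence and to roughly 77% after the second, which is the
step-by-step recalibration described above.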
In the end, Spiegelhalter makes a case for how we can
practice the art of statistics better. This is the chapter where he lays out
his manifesto on how statistics should be taught and how statistics should be used.
You could feel the passion rise as you read the argument because this is
obviously something that's very important to him, and he saved his best for
last.
In addition, Spiegelhalter does a great service for the
reader by adding, at the end of each
chapter, a summary page of the
conclusions that he draws from that chapter. He lays them out in bullet form, and the page is identified by a
black border because he clearly wants us to understand each and
every point that he made in the body of the chapter. After having gone
through all of the dense reading, it is helpful to have the bulleted list to help
us recall what we have just read and try to make that learning permanent in our
minds.
I would be lying if I said that this was an easy
undertaking. It was not, but it was a very worthwhile hardship that I gladly
undertook because of the amount of insight and the potent arguments that Spiegelhalter
offers to my novice mind. The way he not
only communicates his knowledge but also exposes the weak
arguments that he has found in his years of practicing statistics is masterful
and convincing. The added bonus is his valuable insight into what would make the
practice of statistics better and more useful to us. It is a
very hard read, but also a rewarding read.