Saturday, May 16, 2020

Book Review: The Art of Statistics: Learning from Data by David Spiegelhalter

David Spiegelhalter is a world-renowned statistician. He does research in statistics, he teaches statistics, and he also applies his statistical knowledge to real-world problems. This makes him uniquely qualified to write a book that promises so much. The Art of Statistics: Learning from Data certainly promises a lot to the reader; it is an immense undertaking, to say the least. It is a thick book, but not as thick as I thought it would be, mainly because I hadn't counted on the book being so densely packed with knowledge and insight.

I listened to a podcast interview with Spiegelhalter while I was reading the book. He says on the podcast that this book is the culmination of a lifelong effort to communicate, as well as he can, the knowledge he has accumulated in his time as a statistician, and that he wanted to write a book from which he could teach the material when he teaches statistics. He took the initiative to organize the knowledge he had accrued, put it in order, and explain the material in the best way he felt was possible.

Early on in the book he talks about avoiding the mathematical tools and the grind of doing statistics, because there are plenty of other books focused on turning the crank and applying the mathematical tools, so he didn't want to fill this book with too many technical details. As concise, precise, and logical as his writing is, I think more mathematical examples would have helped. It is not that he is bad at explaining the concepts; I just felt it would have been better to illustrate the ideas with some numerical examples. To be fair, he does include a copious number of case studies to help illustrate the concepts, and they are all extremely interesting cases of applied statistics. Maybe it is just me, but I was always slightly confused, and I had to go back, reread, and try to understand the main points. It is an extremely ambitious undertaking on his part to try to explain the art of statistics to a general audience, as most people are not well schooled in statistical thinking and mathematical computation.

I am trained technically, but I was not formally trained in probability and statistics. Whatever probability and statistics background I have, I picked up in my days as a practicing engineer, so my ability to understand what the author is trying to say is uneven, and that shortfall is entirely mine. Even though it was a struggle for me, I feel that it was a worthy struggle, because he illuminated many of the intricate and complex concepts of statistics that I had not been able to understand previously. I am using this book, along with Ian Stewart's Do Dice Play God?, as the foundation of my own autodidactic attempt at learning probability and statistics. I am counting on these two books to give me a solid framework from which I can tackle the numerically intensive books that I have on probability and statistics. First a solid foundation, and then the nitty-gritty details of the computation. I hope this works.

Spiegelhalter’s other purpose is to lay out a way to do statistics better, to explain statistics better to the general public, and to eliminate many of the myths and sources of misunderstanding in the way people use statistics and the way people teach statistics. This is the Holy Grail that Spiegelhalter is trying to achieve through this book. The examples in the book demonstrate just how easily people can be misled when applying statistics and come to the wrong conclusions.

He splits the book up into 14 chapters. The first three chapters talk about data: what data mean and how data are summarized and communicated. He delves deeply into the ideas of the distribution, the mean, and the standard deviation, as well as the importance of data. Once he has explained the meaning and importance of data, he jumps straight into causality: what causality is and why it is so important. Causality is extremely important because this is where our human nature immediately goes when we use statistics. We naturally and automatically jump to conclusions based upon our previous knowledge, and we base our decisions and conclusions on what we think we see in the statistics, without examining the logical fallacies and pitfalls of relying on statistics that probably do not support those conclusions. Regression is next, and then he moves on to algorithms and analytics, the tools we use to make predictions.

Right after the tools we use to make predictions, he naturally examines the role of uncertainty: how certain we are about the statistics we have taken, how we can determine that certainty, and how much confidence we can have in the numbers. Spiegelhalter also goes into great detail about data collection and its importance: what the way we collect data tells us, and how careless data collection can lead us to wrong conclusions. Probability is next. He calls it the language of uncertainty and variability, and this is where many readers bog down, because the nuances of probability are subtle and convoluted. All of this sets up the chapter where he puts probability and statistics together in context with one another. This is the big payoff, because when the two very difficult ideas are put together, the resulting concept is much greater than the sum of its parts, and the complications of the result increase exponentially. Spiegelhalter concedes that this is probably the most difficult chapter in the book to understand.

Having set that basis for the art of statistics, he answers the huge number of questions that his explanations have generated. One of the later chapters is of particular interest because humans are very adept Bayesian thinkers; that is, we tend to recalibrate our personal probabilities by taking into account any new information we have obtained since we came to a previous decision. Spiegelhalter jumps right into that with great gusto, and the chapter is fantastic reading. He distinguishes the way the Bayesians approach probability from the way the frequentists do. As a matter of fact, that battle is still being fought today. He goes through the arguments for both cogently, and his final assessment is that there are no absolutes in this regard: sometimes the Bayesian approach is better and sometimes the frequentist approach is better, where better means more accurate, giving us better estimates while lowering the amount of uncertainty we are dealing with. The trick, then, is to discern which of the two ways of calculating probability is better in a given case.
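Since I wished for more numerical examples, here is a minimal sketch of what recalibrating a personal probability looks like in numbers. It is not taken from the book: the function name `bayes_update` and all of the probabilities below are hypothetical, chosen only to show how a belief shifts each time the same kind of evidence comes in.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) from Bayes' theorem."""
    numerator = prior * p_evidence_if_true
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    total_evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / total_evidence


# Hypothetical numbers, for illustration only:
# we start out believing the hypothesis with probability 1%,
# the evidence shows up 90% of the time when the hypothesis is true,
# and 5% of the time when it is false.
belief = 0.01
beliefs = [belief]
for observation in range(1, 4):
    # Each new piece of evidence turns the old posterior into the new prior.
    belief = bayes_update(belief, 0.90, 0.05)
    beliefs.append(belief)
    print(f"after observation {observation}: belief = {belief:.3f}")
```

Each pass through the loop treats the previous answer as the new starting belief, which is the recalibration described above. The sketch assumes the observations are independent of one another, which real evidence often is not.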

In the end, Spiegelhalter makes a case for how we can practice the art of statistics better. This is the chapter where he lays out his manifesto on how statistics should be taught and how statistics should be used. You can feel the passion rise as you read the argument, because this is obviously something that is very important to him, and he saved his best for last.

In addition, Spiegelhalter does a great service for his readers by adding, at the end of each chapter, a summary page of the conclusions he draws from that chapter. He lays them out in bullet form, and the page is marked by a black border, because he clearly wants us to understand each and every point he made in the body of the chapter. After going through all of the dense reading, it is helpful to have the bulleted list to help us recall what we have just read and to make that learning permanent in our minds.

I would be lying if I said that this was an easy undertaking. It is not, but it is a very worthwhile hardship that I gladly undertook because of the insight and the potent arguments that Spiegelhalter offers to my novice mind. The way he communicates his knowledge, and the way he exposes the weak arguments he has encountered in his years of practicing statistics, is masterful and convincing. The added bonus is his valuable insight into what would make the practice of statistics better and more useful to us. It is a very hard read, but also a rewarding one.

 

