Showing posts with label Statistics. Show all posts

Sunday, May 23, 2021

Book Review-The Data Detective by Tim Harford

The Data Detective is yet another fine book from the economist Tim Harford. The premise of this book is to give the layman some sense of how to examine and interrogate the statistics that are thrown at us in the media, government reports, and research reports.

Harford is a radio host, a popularizer of economics, and a renowned economist. He is very well practiced in explaining the many points of confusion that come with statistically oriented reportage. He is also an excellent writer. In this book he dives into the rarefied world of statistics. As an economist, he is quite well versed in the area, but it is one thing to be well versed and quite another to be well spoken in the arcana of statistics, especially as those who produce the statistics do not practice disciplined statistical data gathering and analysis and are sometimes confused about what they are trying to do. This is not to say that they are naïve, or that they are deliberately obfuscating the discussion by introducing unnecessary complexity. Even though some are guilty of obfuscation, most confusion in statistics comes from unconscious biases, which is the thrust of Harford’s book, as well as David Spiegelhalter’s The Art of Statistics (Spiegelhalter 2019) and Ian Stewart’s Do Dice Play God (Stewart 2019). This is the third book in my reference library that explains how opinions, policy, and lives are affected by subconscious biases. Lay people are often misled by statistics. Some of us were trained in or exposed to a certain amount of statistics in our undergraduate days or at work, but that only touches on the bare essentials. That lack of basic statistical knowledge, together with our unawareness of how statistics can be misconstrued and misinterpreted, is what confuses us, and it allows the people disseminating statistical information to mislead us, whether the misleading is intentional or not.

Researchers, governments, advertisers, and people who have malfeasance in their hearts will often confuse us intentionally with statistics. Statistics are often so subtle that the interpretations given to us seem to make logical sense, even when those interpretations can be skewed in many ways. This book seeks to explain some of the nuances and gives us something to work with when we read the popular press or social media, or when we are dealing with very complex issues that cannot be explained with simple statistics alone. The complexity of some of these illicit statistics is quite challenging.

Harford uses his introduction to lay out the case for why he is tackling this problem, citing a well-known book written by Darrell Huff in 1954 titled How to Lie with Statistics. He relates the story of Huff and his book and declares that this book is not trying to cover the same ground as Huff’s book. Indeed, Harford is trying to undo the damage that Huff inflicted on the credibility of statistics in the minds of the public.

Harford neatly lays out his 10 rules for making sense of statistics. Each rule is a chapter in the book, and each chapter explains why the rule is necessary. Harford digs into past research and past events that serve as examples of where the confusion originates. He then lays out the landscape for the reader. Harford is exceptional at this particular phase of explaining the problem because he is well practiced in explaining complex ideas to the general public. The best part of the book is that he is very clear on what he wants to say, he is very clear in saying it, and he is clear on his opinion about all of these rules. The rules are quite nuanced, but they are also quite useful in guiding us through similarly challenging issues that involve statistics. His cerebral agility with the subject is helpful because it lets him communicate the topic clearly.

The problem with most books that seek to explain statistics is that sometimes the authors over-explain, relying on the assumption that the reader has a well-grounded background in statistics, so the technical jargon flows unabated; at other times the authors under-explain, assuming that the reader does not have any common sense. To be fair, it is very difficult to get the level right because a mass audience has varying levels of expertise, but Harford seems to have found a way to avoid condescending to the reader while still effectively educating the reader on the essentials of statistics and statistical concepts. It is quite remarkable how he does it, and it is a bravura performance. He makes it easy for us to understand these rules while also giving us enough material to explain the subtleties of each rule and its importance. The act of invoking 10 rules is somewhat gimmicky, but it seems to work for Harford because the material sucked me in.

The most interesting chapter is the very last one, which invokes his Golden Rule: Be curious. Harford wrote this chapter to address the polarization of opinions that is rampant in present-day society. This polarization is derived from a number of factors and is exacerbated by social media’s penchant for encouraging being right over learning. What Harford found through various research studies is that the best way to ease that tension and decrease the polarization is to appeal to the curiosity of your opponent. By appealing to their curiosity, we extend them an olive branch, meet them halfway, and offer to open our minds to a civilized discussion of the issue that seemingly divides us.

We have all experienced the aftereffects of trying to go head to head against someone who has an opposing viewpoint: inevitably, both sides dig in even deeper, and the need to be right supersedes the need to understand the issue further. The nuance in the different shades of grey that exist is painfully lost and forgotten.

I quite enjoyed this book. It was recommended to me by a friend who saw that I was struggling with some of the statistical issues that had saturated the airwaves during the COVID-19 pandemic. Initially, I looked upon this book with a certain amount of suspicion, but since it is Tim Harford, and since he wrote one of my favorite books, Messy, I took a chance. I was glad that I did because this is a superb book. I think, however, that Harford's book and Spiegelhalter's book are complementary, so they should be read, if not concurrently, then one closely followed by the other.

Harford also references many other authors in the fields of psychology and economics, people like Tetlock, Kahneman, and so on. The salient strength of Harford’s effort is that he helps us suss out the essence of many of these ideas and makes them understandable to an educated but not expert audience.

Works Cited

Spiegelhalter, David. The Art of Statistics: Learning from Data. London: Pelican Books, 2019.

Stewart, Ian. Do Dice Play God: The Mathematics of Uncertainty. New York: Profile Books, 2019.

Tuesday, July 28, 2020

Book Review-Do Dice Play God: The Mathematics of Uncertainty

By Ian Stewart

This is the book that I was very eager to read because of the subject: the mathematics of uncertainty. I read it in parallel with The Art of Statistics by David Spiegelhalter because I felt the combination of the two books on similar topics from two different directions would make the reading experience more complete as the two books should complement one another. It was mentally challenging to read both books together, but I am glad I did because my goal was accomplished: they were indeed complementary. There were certain areas where the books overlapped but it was good retrieval practice to go over some of those areas at spaced intervals.

The book comprises 18 chapters. The first two chapters set the tone. After defining the six ages of uncertainty in Chapter One, Stewart discusses some of the things that humankind has used to deal with uncertainty and to predict the future. He follows that initial setting of the stage with a qualitative discussion of the ideas of probability and statistics. It is a difficult task because it is easier to discuss probability and statistics in terms of equations. Even with that caveat, Stewart did an excellent job of explaining quantitative concepts qualitatively. It takes someone who deeply understands the ideas, in all their glory, to be able to pull it off, and Stewart did so. This is not to say that the book is completely devoid of numbers and figures, but it was enlightening to read about these concepts without equations and mathematics.

The book then proceeds into many topics about uncertainty and randomness. He shares an abundance of examples and evidence that demonstrate the ideas. It sometimes feels like an unrelenting onslaught of different cases in different areas, regarding different problems. The examples come from mathematics, biology, medicine, physics, numerical systems, and many more fields, which gives the reader proper perspective as well as an understanding of the universality of uncertainty in our reality. The main topic that I struggled with, and that is true of Spiegelhalter's book as well, is Bayesian probability, even though Stewart did a masterful job of explaining it. I understand Bayesian ideas after having read both books, but I am still easily confused when trying to apply them.

Stewart lost me with his explanation of quantum mechanics and its counterintuitive ideas. It was a difficult section to read, even though I was exposed to the ideas when I was a young engineering student. On the other hand, when Stewart expounded on the ideas of dynamical systems, he made perfect sense, as I had thought about Lorenz attractors when I was studying dynamical systems as a graduate student. Since I had understood those equations as equations, it was not much of a leap for me to understand them as applications, which made the mathematics more sensible.

As readers work their way through the book, they will find themselves doing many mental gymnastics with the mathematics that he does present, but he does an excellent job of explaining why these concepts are so important to us.

The last chapter is the magnum opus chapter that Professor Stewart uses as his platform to summarize his intention with the book. His key intent is to make the general audience aware of, and comfortable with, the fact that uncertainty is a normal part of life. Professor Stewart has worked diligently throughout the book to chip away at our enduring and grossly erroneous belief that our lives are deterministic. Any uncertainty that we admit or accept is not something that we can easily overcome or disregard, because uncertainty plays a very large role in how our lives turn out.

A quick summary of all the topics that had been discussed ends the book. In returning to these topics over these few short pages, the reader realizes how many topics Professor Stewart has discussed; more importantly, the reader finally understands the lessons that Professor Stewart is trying to teach us. He started with the basic ideas of how human beings dealt with uncertainty. As humanity progressed along the timeline, we got better at rationalizing some of the uncertainties, and we thought we were able to minimize them. We invented tools like statistics and probability; we deliberately tested and experimented to arrive at what we thought was the truth. Even though this book is not the definitive history of uncertainty in our world, it does very well in filling some of the obvious gaps in our thinking and dispels enough biases to make readers at least accept the fact that life itself is uncertain and full of mystery.

I thought the book was a marvelous read even though it was particularly challenging. Professor Stewart explained many different concepts very well, some better than others, but the overall effect is that the reader can gain a much better understanding of how little and how much we know about our world and appreciate how much guessing we are doing on a daily basis.


Saturday, May 16, 2020

Book Review-The Art of Statistics-Learning from Data By David Spiegelhalter

David Spiegelhalter is a world-renowned statistician. He does research in statistics, he teaches statistics, and he also applies his statistical knowledge to real-world problems. This makes him uniquely qualified to write a book that promises so much. The Art of Statistics: Learning from Data certainly promises a lot to the reader; it is an immense undertaking, to say the least. It is a thick book but not as thick as I thought it would be, mainly because I hadn't counted on the book being so densely packed with knowledge and insight.

I listened to a podcast interview with Spiegelhalter while I was reading the book. He says on this podcast that this book is the culmination of a lifelong effort to best communicate the knowledge that he has accumulated in his time as a statistician, and that he wanted to write a book out of which he could teach the material when he teaches statistics. He took the initiative of organizing the knowledge that he had accrued, putting it in order, and explaining the material in the best way he felt was possible.

Early on in the book he talks about avoiding the mathematical tools and the grind of doing statistics because there are plenty of other books that are focused on turning the crank and applying the mathematical tools, so he didn't want to fill the book with too much technical detail. As concise, precise, and logical as he is in his writing, I think more mathematical examples would have helped. It is not that he was bad at explaining the concepts; I just felt that it would have been better to illustrate the ideas with some numerical examples. To be fair, he does have a copious number of case studies in the book to help illustrate the concepts. They were all extremely interesting cases that apply statistics. Maybe it is just me, but I was always just slightly confused, and I had to go back, reread, and try to understand the main points. It is an extremely ambitious undertaking on his part to try to explain the art of statistics to a general audience, as most people are not well schooled in statistical thinking and mathematical computation. I am trained technically, but I was not formally trained in probability and statistics. Whatever probability and statistics background I have was picked up in my days as a practicing engineer, so my ability to understand what the author is trying to say is uneven, a shortfall that is altogether mine. Even though it was a struggle for me, I feel that it was a worthy struggle because he enlightened me and illuminated many of the intricate and complex concepts of statistics that I was not able to understand previously. I am using this book along with Ian Stewart's Do Dice Play God as the fundamental basis of my own autodidactic attempt at learning probability and statistics. I am counting on these two books to give me a solid framework from which I can tackle the numerically intensive books that I have on probability and statistics. First a solid foundation, and then the nitty-gritty details of the computation. I hope this works.

Spiegelhalter’s other purpose is to lay out a way to do statistics better, to explain statistics better to the general public, and to eliminate many of the myths and sources of misunderstanding in the way people use statistics and the way people teach statistics. This is the Holy Grail that Spiegelhalter is trying to achieve through this book. The examples in the book demonstrate just how easily people can be misled when applying statistics and then come to the wrong conclusions.

He splits the book up into 14 chapters. The first three chapters talk about data: what the data mean and how the data are summarized and communicated. He delves deeply into the ideas of the distribution, the mean, and the standard deviation, as well as the importance of data. Once he has explained the meaning and importance of data, he jumps straight into causality: what causality is and why it is so important. Causality is extremely important because this is where our human nature immediately goes when we use statistics. We naturally and automatically jump to conclusions based upon our previous knowledge, and we base our decisions and draw conclusions on what we think we see in the statistics without actually examining our logical fallacies or the pitfalls of using statistics that probably do not support our conclusions. Regression is next, and then he moves on to algorithms and analytics, the tools we use to do prediction.

Following right behind the tools we use to make predictions, he naturally examines the role of uncertainty: how certain we are about the statistics that we have taken, how we can determine that certainty, and how much confidence we have in the numbers. Spiegelhalter also goes into great detail about data taking and its importance: what the data taking tells us and how careless data taking can lead us to wrong conclusions. Probability is next. He calls it the language of uncertainty and variability, and this is where many get bogged down, because the nuances of probability are very subtle and convoluted. All this is used to set up the chapter where he puts probability and statistics together in context with one another. This is the big payoff because, by putting the two very difficult ideas together, the resulting concept is much greater than the sum of its parts, and the complications increase exponentially. Spiegelhalter concedes that this is probably the most difficult chapter in the book to understand.

After having set that basis for the art of statistics, he answers the huge number of questions that he generated in his explanations. One of the later chapters is of particular interest because humans are very adept Bayesian thinkers; that is, we tend to recalibrate our personal probabilities by taking into account any new information that we have obtained since we came to a previous decision. Spiegelhalter jumps right into that with great gusto, and the chapter is fantastic reading. He distinguishes the way the Bayesians approach probability from the way the frequentists do. As a matter of fact, that battle is still being fought today. He goes through the arguments for both cogently, and his final assessment is that there are no absolutes in this regard: sometimes the Bayesian approach is better and sometimes the frequentist approach is better, with better meaning more accurate, giving us better estimates while lowering the amount of uncertainty that we're dealing with. The trick then is the ability to discern which of the two ways of calculating probability is better.
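
The recalibration of personal probabilities described above can be sketched as a small Bayesian update. This is only an illustration and is not an example taken from Spiegelhalter's book: the function name bayes_update and all of the numbers (a 1% prior belief, evidence that shows up 90% of the time when the hypothesis is true and 5% of the time when it is false) are hypothetical assumptions chosen to show how a belief shifts as new information arrives.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the updated probability of a hypothesis after seeing evidence.

    Applies Bayes' rule:
    P(H | E) = P(E | H) * P(H) / [P(E | H) * P(H) + P(E | not H) * P(not H)]
    """
    numerator = p_evidence_if_true * prior
    total_evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / total_evidence


# Hypothetical numbers, assumed purely for illustration.
prior = 0.01                 # initial personal probability of the hypothesis
p_evidence_if_true = 0.90    # chance of seeing the evidence if the hypothesis holds
p_evidence_if_false = 0.05   # chance of seeing the same evidence if it does not

# First piece of evidence: the prior is recalibrated into a posterior.
posterior_one = bayes_update(prior, p_evidence_if_true, p_evidence_if_false)

# Second, independent piece of the same kind of evidence:
# yesterday's posterior becomes today's prior.
posterior_two = bayes_update(posterior_one, p_evidence_if_true, p_evidence_if_false)

print(f"after one observation:  {posterior_one:.3f}")
print(f"after two observations: {posterior_two:.3f}")
```

With these assumed numbers, one observation moves the belief from 1% to roughly 15%, and a second independent observation moves it to roughly 77%. The point of the sketch is the mechanism: each new piece of information reuses the previous conclusion as its starting point, which is the Bayesian habit of thought the chapter describes.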

In the end, Spiegelhalter makes a case for how we can practice the art of statistics better. This is the chapter where he lays out his manifesto on how statistics should be taught and how statistics should be used. You could feel the passion rise as you read the argument because this is obviously something that's very important to him, and he saved his best for last.

In addition, Spiegelhalter does a great service for the readers by adding, at the end of each chapter, a summary page of the conclusions that he draws from that chapter. He lays them out in bullet form, and the page is identified by a black border because he clearly wants us to understand each and every point that he made in the body of the chapter. After having gone through all of the dense reading, it is helpful to have the bulleted list to help us recall what we have just read and try to make that learning permanent in our minds.

I would be lying if I said that this was an easy undertaking; it is not. But it is a very worthwhile hardship that I gladly undertook because of the amount of insight and potent argument that Spiegelhalter offers to my novice mind. The way he not only communicates his knowledge but also exposes the weak arguments that he has found in his years of practicing statistics is masterful and convincing. The added bonus is his valuable insight into what would make the practice of statistics better and more useful to us. It is a very hard read, but also a rewarding read.

 


Saturday, September 1, 2018

Book Review-The Lady Tasting Tea By David Salsburg


This was not a book I had envisioned as being something that I would read, let alone grow to love. My experience with statistics had been limited to some courses I took in graduate school and then to what I encountered in my first job, where we were all exposed to statistical process control (SPC) and Six Sigma. My background in statistics only went so far as knowing some of the SPC tools. As I grew more mature, I began to appreciate the usefulness of statistics, but I had a hard time connecting the SPC tools I was exposed to with the mathematics-heavy statistics that are taught in the textbooks. As I tried to parse through the dense formal statistical curriculum, I grew frustrated with my own inability to get through to the kernel of the topic. As I struggled, I kept seeing this particular book being recommended by a number of people, so I bought it and prepared for the worst: yet another dense explanation of rudimentary statistics that had very little to do with what I wanted.

To my surprise and amazement, this book was so different, different from any other book that I had ever read. It was a paean to the study of statistics; it was a gossipy and information-laden history of the evolution of the art of probability and statistics; it was a summary of the important developments in statistics; it was an invaluable primer in the methods used in the practical application of statistics; and finally, it was a hefty philosophical discussion of the problems and issues that are still plaguing researchers in statistics. I think you get the idea that I kind of liked reading this book.

David Salsburg is a practitioner of the art of statistics. He has the ability to explain the very dense concepts in statistics, both the applied tools and the mathematical conundrums, with adept ease. Most importantly, he did this without employing any mathematics, which in some ways was very impressive and at other times was frustrating because it would have been more enlightening to resort to the bare-bones mathematics, but no matter.

Prof. Salsburg clearly has a great love for the story as well as for the subject. He has a great sense of history as well as a deft touch for the internecine nastiness that occurred among the giants of statistics. His descriptions of the relationship, or lack thereof, between Pearson and Fisher kept me riveted to the narrative. His description of some of the great mathematicians who were caught in the destructive totalitarian regimes during and after World War II added the human dimension to these stories. I don’t know which aspect of the book I appreciated more, the historical perspective or the unraveling of the mystery of the functional relationship between statistical tools and ideas.

There is a clear devotion in his writing to giving credit where credit is due. Even though he apologized for his inability to give credit to all who had contributed, the breadth and depth of the book was astounding and gratifying to someone who appreciates a truly “Big Picture” look at the statistical landscape from the 10,000-foot view. I particularly enjoyed the discussions regarding the contributions of Deming and Shewhart to the SPC branch of the vast tree of statistical evolution. I was able to make the connections from those chapters to untie the knot that was in my mind.

The pièce de résistance was the final chapter, where he discusses his own views on the unexplained philosophical contradictions still existing in statistics. It felt like I was in the midst of the discussion even though I am a dilettante in the art of statistics.

This is a book comprised of some very dense concepts, and it was difficult to focus at times, but it was well worth the effort in my mind.