Sunday, May 23, 2021

Book Review-The Data Detective by Tim Harford

The Data Detective is yet another fine book from the economist Tim Harford. The premise of the book is to give the layman some sense of how to examine and interrogate the statistics that are thrown at us in the media, government reports, and research reports.

Harford is a radio host, a popularizer of economics, and a renowned economist. He is well practiced in explaining the many points of confusion that come with statistically oriented reportage, and he is an excellent writer. In this book he dives into the rarefied world of statistics. As an economist he is well versed in the area, but it is one thing to be well versed and quite another to be well spoken in the arcana of statistics, especially when those who produce the statistics do not practice disciplined data gathering and analysis and are sometimes confused about what they are trying to do. This is not to say that they are naïve, or that they are deliberately obfuscating the discussion by introducing unnecessary complexity. Although some are guilty of obfuscation, most confusion in statistics comes from unconscious biases, which is the thrust of Harford’s book, as well as of David Spiegelhalter’s The Art of Statistics (Spiegelhalter 2019) and Ian Stewart’s Do Dice Play God? (Stewart 2019). This is the third book in my reference library that explains how opinions, policy, and lives are affected by subconscious biases. Lay people are often misled by statistics. Some of us were trained in or exposed to a certain amount of statistics in our undergraduate days or at work, but that touches only on the bare essentials. It is this lack of basic statistical knowledge, together with our unawareness of how statistics can be misconstrued and misinterpreted, that confuses us and allows people disseminating statistical information to mislead us, whether intentionally or not.

Researchers, governments, advertisers, and people who have malfeasance in their hearts will often confuse us intentionally with statistics. Statistics are often so subtle that the interpretations given to us seem to make logical sense, even when those interpretations can be skewed in many ways. This book seeks to explain some of the nuances and gives us something to work with when we read the popular press or social media outlets, or when we are dealing with very complex issues that cannot be explained with simple statistics alone. The complexity of some of these statistics can be quite challenging.

Harford opens the book with an introduction that lays out why he is tackling this problem, citing a well-known book written by Darrell Huff in 1954 titled How to Lie with Statistics. He relates the story of Huff and his book and declares that this book is not trying to cover the same ground as Huff’s. Indeed, Harford is trying to undo the damage that Huff inflicted on the credibility of statistics in the minds of the public.

Harford neatly lays out his 10 rules for making sense of statistics. Each rule is a chapter in the book, and each chapter explains why the rule is necessary. Harford digs into the past research and past events that serve as examples of where the confusion originates. He then lays out the landscape for the reader. Harford is exceptional at this phase of explaining the problem because he is well practiced in explaining complex ideas to the general public. The best part of the book is that he is very clear on what he wants to say, very clear in saying it, and clear in his opinion about all of these rules. The rules are quite nuanced, but they are also quite useful in guiding us through similarly challenging issues that use statistics. His cerebral agility with the subject is helpful because it lets him communicate the topic.

The problem with most books that seek to explain statistics is that sometimes the authors overexplain, relying on the assumption that the reader has a well-grounded background in statistics, so the technical jargon flows unabated; other times the authors underexplain, assuming that the reader does not have any common sense. To be fair, it is very difficult to get the level right, because a mass audience has varying levels of expertise, but Harford seems to have found a way to avoid condescending to the reader while effectively educating the reader on the essentials of statistics and statistical concepts. It is quite remarkable how he does it, and it is a bravura performance. He makes it easy for us to understand these rules while also giving us enough material to explain the subtleties of each rule and its importance. The act of invoking 10 rules is somewhat gimmicky, but it seems to work for Harford because the material sucked me in.

The most interesting chapter is the very last one; it invokes his Golden Rule: Be curious. Harford wrote this chapter to address the polarization of opinions that is rampant in present-day society. This polarization derives from a number of factors and is exacerbated by social media’s penchant for encouraging being right over learning. What Harford has found through various research is that the best way to ease that tension and decrease the polarization is to appeal to the curiosity of your opponent. By appealing to their curiosity, we are extending them an olive branch, meeting them halfway, and offering to open up our minds to a civilized discussion of the issue that seemingly divides us.

We have all experienced the aftereffects of trying to go head to head against someone who has an opposing viewpoint: inevitably, both sides dig in even deeper, and the need to be right supersedes the need to understand the issue further. The nuances of the different shades of grey that exist are painfully lost and forgotten.

I quite enjoyed this book. It was recommended to me by a friend who saw that I was struggling with some of the statistical issues that had saturated the airwaves during the COVID-19 pandemic. Initially, I looked upon this book with a certain amount of suspicion, but since it is Tim Harford, and since he wrote one of my favorite books, Messy, I took a chance. I am glad that I did, because this is a superb book. I think, however, that Harford’s book and Spiegelhalter’s book are complementary, so they should be read, if not concurrently, then one closely followed by the other.

Harford also references many other authors in the fields of psychology and economics, people like Tetlock, Kahneman, and so on. The salient feature of Harford’s effort is that he helps us suss out the essence of many of these ideas and makes them understandable to an educated audience that is not an expert audience.

Works Cited

Spiegelhalter, David. The Art of Statistics: Learning from Data. London: Pelican Books, 2019.

Stewart, Ian. Do Dice Play God? The Mathematics of Uncertainty. New York: Profile Books, 2019.

Thursday, April 29, 2021

Stats for Spikes- It's a Statistical Trap!

Sports can be viewed as a continuous flow of actions. We define discrete stages within the flow so that we can observe and analyze the reality of sports, because we humans need to slow time down to a point where we can process what we are seeing in our minds. The stages that we define are used to develop an understanding of the flow; the stages do not reflect the reality of the game. A natural stage marker in a rebound sport such as volleyball is the termination point, that is, when the ball is whistled dead. Most of the statistics that we do keep (kills, assists, aces, blocks, block assists, and all the associated errors) result from a dead ball. There are some statistics that we take that don’t directly happen at the stoppage of play, but they are statistics that lead to the dead ball: assists, passes, and digs are some that come to mind. We also count the number of attempts as a way to calculate our efficiency numbers; those are statistics that do not fall into the dead ball/point scored category.
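The efficiency numbers mentioned above are built from the attempt counts. As a minimal sketch, assuming the commonly used hitting efficiency formula (kills minus errors, divided by total attempts), it might look like the following. The function name and the stat line are hypothetical and are not taken from any real match.

```python
def hitting_efficiency(kills: int, errors: int, attempts: int) -> float:
    """Return (kills - errors) / attempts, the usual hitting efficiency.

    Assumption: this is the formula meant by "efficiency numbers";
    other skills (serving, passing) use different rating schemes.
    """
    if attempts <= 0:
        raise ValueError("attempts must be a positive number")
    return (kills - errors) / attempts


if __name__ == "__main__":
    # Made-up stat line for illustration: 12 kills, 4 errors, 30 swings.
    print(f"{hitting_efficiency(12, 4, 30):.3f}")
```

Note that only the kills and errors are dead ball events; the attempts that kept the ball in play appear in the denominator and nowhere else in the record.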

Taking statistics of a volleyball match gives us a simple picture of the match, but because most statistics that we can take are dead ball statistics, the picture gives us only the endings of a flurry of action. These simple statistics allow us to capture the facts as we know them according to the points scored. What is left unrecorded is most of the match. Just as Mozart is said to have observed about music, “The music is not in the notes, but in the silence between,” volleyball is in the movements between touches, and we are unable to take complete statistics on the space in between. Videos are often used today to capture those moments that are missing from the statistics, but not many coaches in the club and high school ranks have the access or the staff to completely analyze videos.

As Dr. W. Edwards Deming so famously observed: there are many things that are unmeasurable and there are many things that are unknowable. In the realm of sports, those moments between touches are unmeasurable. The reason for the movements of the individuals moving in a complicated and coordinated team dance with their teammates is unknowable. The way to capture the magic of the game between touches is as elusive as capturing the silence between the notes.

While it is critical for coaches to look at those scoring statistics and understand how they, or their opponents, are scoring, we need to recognize that those statistics are but a minimal record of what took place. The scoring-based statistics ignore all the interaction between the individuals playing in the game: the individual decisions made by each player and how those decisions are acted and reacted upon by their teammates and opponents. They also ignore the cumulative actions of the team as it reacts to an action and, more importantly, whether the players are acting and reacting according to how they had been trained to play.

The scoring-based statistics also ignore the effect of how the teams respond to each other. This point was made after the final match of the 2020 NCAA Division I championships between Kentucky and Texas. My friend and I were discussing the stellar play between these two teams. He made the observation that he was surprised at how seemingly porous the Texas defense was, especially for a team playing in the national championship match. My response was incredulity. I believe that the reason Texas was losing on the defensive front was the potency of the Kentucky offense; its effectiveness made the Texas defense look overwhelmed, which it was. The point is that sports is an activity based on dualities that act as a whole. Tough serving forces passing errors. Great passing makes great servers look like they were serving lollipops. Great blocking can make a porous backrow defense look like world beaters. A poor block can make the best defenders look hapless. Great setting can make a mediocre hitter look like an All-American. A great hitter can make a poor setter look phenomenal. Great offenses can make good defenses look overwhelmed. Coaches know and understand these symbiotic relationships inherently.

Why is this so concerning? It is concerning if you are a coach and you don’t understand the back-and-forth flow of the game. It is concerning if you don’t understand that the two teams are coupled as participants in the game, that they cannot perform their intricate, sport-defined dances without the other, that they are connected through this pursuit we call the game of volleyball.

Most coaches understand this implicitly; most who are new to the game do not understand the implications of the interconnectedness of the two opposing sides.

Even experienced coaches who understand the game well can fall into a trap set by the statistics. Recent studies revealed that our minds will easily and naturally adapt to new ways of working, giving up old habits as they create new habits in reaction to new cognitive challenges. In The Shallows, Nicholas Carr explores the changes in cognitive behavior wrought by the internet: decreases in our attention span, our growing difficulty in focusing on a single task, and our frustration at being unable to read for an extended period because we have adapted so easily to reading short and simple articles rather than hefty and complex books. Most pernicious is our waning ability to think in complicated and conceptual ways, because we have adopted the habit of simplifying concepts down to their base essentials. Note that I am not a Luddite advocating a return to adopting overcomplicated concepts to explain our games just for the sake of exercising our cognition. A quote most often attributed to Albert Einstein states, “Everything should be made as simple as possible, but not simpler,” which is a variant of Occam’s Razor, or the law of parsimony. It is the “not simpler” part of the quote that applies here. Instead of overcomplicating our explanations for why the game moves the way it does, we are subconsciously oversimplifying our explanations in order to make them fit the statistics we have collected.

Using volleyball statistics that are taken only for scoring points narrows a person’s frame of reference, so that their vision of the game flow comes only through the statistics. It changes the way a person’s brain operates: it emphasizes the singular and discrete actions dictated by the dead ball rather than the flow of a multitude of continuous actions. Indeed, if coaches allow the statistical mindset to dominate their internal vision of the game, the focus on statistics forces them to ignore the connections between the actions.

This focus on the recordable statistics encourages resulting: (https://polymathtobe.blogspot.com/2018/12/volleyball-coaching-life-resulting.html)

Resulting can be defined as our propensity to confuse the quality of our decisions with the outcomes of those decisions; that is, we let the result determine how we judge the decision.

Instead of following their global view of how the game is played, coaches who are resulting will excuse what they would usually see as bad play or bad decisions, assuming that their team is playing well because it is winning or scoring.
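One way to keep the decision separate from the outcome is to record, for each rally, whether the team executed as trained independently of whether it won the point. The following is only a sketch under that assumption; the field names and the rally notes are hypothetical and hand-made for illustration.

```python
from collections import Counter


def tally_rallies(rallies):
    """Count rallies by the pair (executed_well, won_point)."""
    return Counter((r["executed_well"], r["won_point"]) for r in rallies)


# Hypothetical rally notes; a coach or assistant would judge execution by eye.
rallies = [
    {"executed_well": True, "won_point": True},
    {"executed_well": False, "won_point": True},
    {"executed_well": True, "won_point": False},
    {"executed_well": False, "won_point": True},
]

counts = tally_rallies(rallies)
# Points won despite poor execution: the rallies that resulting would excuse.
won_despite_poor_execution = counts[(False, True)]
# Points lost despite good execution: the rallies that resulting would blame.
lost_despite_good_execution = counts[(True, False)]
print(won_despite_poor_execution, lost_despite_good_execution)
```

The two off-diagonal counts are the ones the scoreboard hides. The execution judgment is still an observation made by a person, so it carries the same biases as any other eye test.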

Our emphasis on using statistics comes from a natural reaction against coaches depending excessively on “gut feel” or passing the “eye test”. Those heuristics are more often than not fraught with biases that are subconscious as well. Statistics become extremely useful when coaches use them to determine whether their “gut feel” stands up to the challenges of reality. But if a coach’s understanding of the match is filtered through statistics that are derived from just the points scored, then the coach’s focus is so narrowed that the reality he/she sees is distorted. Their understanding of what is happening in the match is skewed, which affects their decision making and ultimately impacts their coaching.

This kind of distortion can roughly be interpreted as an application of  Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” (https://polymathtobe.blogspot.com/2021/03/stats-for-spikes-use-of-statistics-as.html)

This is not to say that this habit has overwhelmed the ranks of all coaches; while some experienced coaches may fall into this trap occasionally, I believe that their experience will come to the fore so that they catch themselves. My concern is with those coaches who are not experienced in seeing the game in all its multifaceted glory. Every coach has to start somewhere, and if the coach in question did not have the advantage of having played the game at a high level, has not studied the game and its pedagogy thoroughly, or has not thought through the game extensively, they will not have an internal vision of the game at its most competitive level. Those are the coaches who are most susceptible to falling into the habit of viewing the game through just the statistics.

Every beginning coach is looking for an edge, and statistics are an edge to be had, a very potent one. But statistics are also just one tool in the toolbox; one needs to use all the tools that are available. By adopting statistics-based goggles, coaches deprive themselves of a deeper understanding of the game, and they do a disservice to their profession and their players by limiting themselves and their vision of the game to just a tiny part of the greater whole.

While experienced coaches can self-correct when they fall into the habit, the inexperienced coach will more than likely fall into the habit and not realize that they are in a trap.

So what to do?

·       Be aware: use the statistics, but catch yourself when you get too focused on the surface level of the statistics.
·       Avoid extrapolating or making inferences based on the surface level statistics.
·       Double check the statistics against your own observations: do the two pictures mesh?
·       Be aware of resulting. Question whether your team executed, whether you won or lost the point.
·       Trace the logical sequence of the game action.
·       Understand which questions you are asking. We will often substitute a question that has an answer for the question that we really want to ask, because we don’t have the data to answer the original question.
·       Understand and accept that there are data that cannot be measured and knowledge that cannot be known.
·       When in doubt, actively evoke Admiral Ackbar during your systematic examination of information to make decisions.