
Tuesday, July 12, 2022

Stats for Spikes-Goodhart’s Law and Correlation

“When a measure becomes a target, it ceases to be a good measure.”

-Goodhart’s Law as Summarized by Anthropologist Marilyn Strathern

Ever since I read Marilyn Strathern’s summary of Goodhart’s Law, I have been intrigued by the truth it reveals across all human activities.

The short sentence packs a punch. We have become ultra-focused on measuring results, partly to CYA and partly to gain an understanding of how our activities are progressing. We are convinced of the dire need to measure every variable at every step along the way, whether it is measurable or not; we can't seem to do anything without checking the results. Ever since the business world became aware of the importance of statistics for decision making, our society has decided to measure everything that can be measured, even when the information is of dubious importance. The idea is to take the data now and figure out what the numbers mean later.

The act of making the measurement the target also violates the statistical adage “correlation does not mean causation”; most of us who have been exposed to Statistical Process Control (SPC) have had that adage drilled into our heads. We take measurements because they give us a good estimate of how the thing we are measuring is performing; the measurement serves as a performance monitor, telling us whether the process is behaving as we want. We have determined that the variable or variables we are measuring have a certain statistical correlation to our desired results, so they indicate whether we are on the right trajectory. The mistake we make, which is where we violate Goodhart’s Law, is to assume that the correlation we observe between measurement and result automatically implies causation; making the threshold measurement a target is then a simple and natural leap in logic, one we take without critical thought.

It is not a large leap, but in many instances it is a fatal one.

In leaping to treating a measurement, which is only an estimate of the process, as a target, we allow human nature to take over. In our eagerness to hit the target, a target we believe to be causally related to accomplishing our final goal, we make the fatal error of assuming the correlation is causation. We are stuck in the “if A then B” line of thinking when we equate correlation to causation, even as we ignore all the other variables. People assume that the complex system being measured is linear, that it is directly related to the desired result, and that the relationship is one-to-one. So much so that we put blinders on: we have our eyes on the prize and our “laser” focus engaged.

Reality, however, is multivariable, more so than we can fathom or remember. Each variable could potentially be related to all the others, and some are more correlated than others.
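The core of Goodhart’s Law can be sketched in a toy simulation. This is purely illustrative, with invented numbers: an underlying “quality” we actually care about, a metric that honestly tracks it, and then the same metric after everyone learns to manage to the threshold.

```python
import random

random.seed(42)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n = 5000
quality = [random.gauss(0, 1) for _ in range(n)]  # the thing we actually care about

# Before gaming: the metric is an honest, noisy estimate of quality.
honest_metric = [q + random.gauss(0, 0.5) for q in quality]

# After gaming: everyone manages to the threshold, so the reported number
# no longer varies with the underlying quality at all.
gamed_metric = [2.0 + random.gauss(0, 0.1) for _ in quality]

print(round(pearson(honest_metric, quality), 2))  # strong positive correlation
print(round(pearson(gamed_metric, quality), 2))   # roughly zero
```

Once the measure becomes the target, its correlation with the thing it was supposed to estimate evaporates, which is exactly Strathern’s point.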

In my engineering career I saw numerous instances of people making decisions that violated this very principle, even though they had been trained in Statistical Process Control. They had been warned that “correlation does not equal causation,” yet they either had not internalized the lesson or ignored it completely out of wishful thinking.

The managers at a company I worked for realized that they were going to miss their June delivery goal. Since the end of June straddled the weekend leading into the Fourth of July, they decided to pay the workers extra to work through the holiday and booked all the products delivered over that weekend as June deliveries, which made their July delivery data abysmal. A production goal is based on an estimate of the capability of the manufacturing facility, the number of workers, the difficulty of the manufacturing task, and the consistency of the supply chain. Production goals are estimates, not hard targets, because there are too many variables, and natural variance comes with any estimate; over the long term, results will regress to the mean. By tweaking the number of days in the reporting period to buttress the production numbers, the managers guaranteed that the later estimates would not be met. In this case the regression happened immediately, although not all tweaking has such an immediate result.

In another case: most large manufacturing companies have adopted project management and project management tools to monitor progress. We were made aware of our manufacturing process status through weekly updates, mainly with regard to two variables, schedule and expenditures, via the Schedule and Cost Performance Indices (SPI and CPI). These are quantitative measures tracked and reported by the project managers. They measure the current status against estimates that were created, open loop, at the beginning of the project. Revising the SPI and CPI baselines to reflect evolving challenges in real time is not a rare event, and it is done with the permission of the customers. Yet at every weekly meeting, senior management would inevitably succumb to the temptation of cheating their schedules, forcing overtime, and taking shortcuts to meet the amorphous indices: indices determined many months in advance, without knowledge of the evolving challenges. They did this in their eagerness to show the higher-ups their ability to “make things happen,” and then to be rewarded for their “get it done” attitude. The good project managers would counter these short-sighted whims, refusing to compromise the quality of the process.

There are many other instances where both the “correlation does not equal causation” adage and Goodhart’s Law are ignored.

·       The U.S. News and World Report college rankings, where universities treat the rankings as targets and go to great lengths to game the data (measurements) that feed those rankings, even though they know the factual meaning of the ranking is ambiguous.

·       In athletics, people have tried to quantify measures that identify athletic talent. At the NFL combine, athletes run specific drills and their performance in each drill is measured. Those drills do not measure a potential draftee’s prowess at playing the game; they give the coaches and team management a skewed set of measurements, because trainers and strength coaches have mined the accumulated data over time and created customized workouts that train athletes to hit the thresholds believed to be ideal in those specific drills. Those who come closest to the pre-determined thresholds are more likely to be drafted and to sign for large bonuses. These measurements do not guarantee success as a professional, yet the focus on that set of targets persists. The prime example is Tom Brady, who was the 199th pick, in the sixth round of the 2000 draft.

Yet the NFL was able to change its thought process when it stopped using the Wonderlic test as a measure of sports intelligence; maybe that is because the test proved abysmal at predicting professional success.

It is also in sports that we see a steadfast refusal to fall into the trap of ignoring Goodhart’s Law and the “correlation does not equal causation” adage.

There are important statistics and measurements in all sports, but the successful coaches view them as partial measures of total team performance. They understand that sports are messy, complicated, and interrelated, and that trying to infer performance from such pristine, single-dimensional data is foolish. Focusing on only one statistic, or on statistics from one aspect of the total game, gives only a partial picture. On-base percentage in baseball, ace-to-error ratio in volleyball, points in the paint in basketball, total rushing yards in football: none of these alone indicates that the team will win; each is just one statistic out of many. It is up to the coach to understand the interaction of all the changing data in the context of the game.

Nature also plays tricks on the measurements, because there will always be unmeasurable factors that skew the total picture of each game.

The point of all this is not that we should not make measurements, or that data is inherently skewed. On the contrary, measurements are critical to our understanding of the unknown; they give us a means of judging whether the complex system we are working with is behaving as we hope. The key to gaining knowledge and making good decisions lies in asking questions about our assumptions and beliefs before we start drawing inferences. Data and information do not, by themselves, give humans the tools to predict outcomes; humans infer the predictions based on their experience, and left alone, humans will draw wrong inferences more often than correct ones.

Questions need to be asked, and often. More importantly, those questions must address our proclivity to assume that correlation is causation and our desire to make measurements a target.


Saturday, January 4, 2020

Stats For Spikes-Correlation and Causation


This article (Paine 2016) caught my attention recently. It talks about the case of Charles Reep, a former Royal Air Force Wing Commander who was tracking play-by-play data for matches and serving as a quantitative consultant for Football League teams as early as the 1950s.


https://fivethirtyeight.com/features/how-one-mans-bad-math-helped-ruin-decades-of-english-soccer/amp/?__twitter_impression=true&fbclid=IwAR0MNCiSu4nJIcGYvW5dRoTif1mzNc6MJzo8c-AFLU-mDWqZgWOCnT75tIw


The article recalls how Reep’s analytics led him to conclude that the number of passes made in soccer is directly correlated to scoring. His admonition was that shots taken after three passes or fewer have a higher probability of scoring a goal.

But Reep was making a huge mistake. Put simply, Reep started with each goal scored and looked back at how many passes were made prior to scoring; his starting point was goals scored. The problem is that most goals in soccer do come after three passes or fewer, because that is the nature of the game: it is sporadic, and passing sequences are frequently disrupted by the defense. What he did not count were the short possessions that failed to produce a goal; that block of data is missing because of his focus on the goals alone.
In a previous article, Neil Paine of the website FiveThirtyEight refuted that bit of wisdom gleaned from Reep’s agglomeration of soccer data.

https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/

But subsequent analysis has discredited this way of thinking. Reep’s mistake was to fixate on the percentage of goals generated by passing sequences of various lengths. Instead, he should have flipped things around, focusing on the probability that a given sequence would produce a goal. Yes, a large proportion of goals are generated on short possessions, but soccer is also fundamentally a game of short possessions and frequent turnovers. If you account for how often each sequence-length occurs during the flow of play, of course more goals are going to come off of smaller sequences — after all, they’re easily the most common type of sequence. But that doesn’t mean a small sequence has a higher probability of leading to a goal.

To the contrary, a team’s probability of scoring goes up as it strings together more successful passes. The implication of this statistical about-face is that maintaining possession is important in soccer. There’s a good relationship between a team’s time spent in control of the ball and its ability to generate shots on target, which in turn is hugely predictive of a team’s scoring rate and, consequently, its placement in the league table. While there’s less rhyme or reason to the rate at which teams convert those scoring chances into goals, modern analysis has ascertained that possession plays a big role in creating offensive opportunities, and that effective short passing — fueled largely by having pass targets move to soft spots in the defense before ever receiving the ball — is strongly associated with building and maintaining possession. (Paine 2014)

To reiterate: he should have focused on tracking the number of possessions and whether those possessions turned into goals. Given the complexity of the game, it was perhaps understandable that Reep made this mistake, and given the rudimentary state of the art of statistical analysis in sports, it was perhaps predictable. The unfortunate thing is that Reep was able to convince an entire nation’s soccer establishment, and not just any nation but the nation where the game was born, the nation whose excellence in the game was globally recognized, to go off on a wild goose chase. People should have known better. Maybe.
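Reep’s inversion of the conditional probability is easy to reproduce in a toy model. All the numbers here are invented for illustration: each pass has a 50% chance of surviving the defense, and the chance a possession ends in a goal grows with its length. Even so, most goals come from short possessions, simply because short possessions dominate.

```python
import random
from collections import Counter

random.seed(1)

goals_by_length = Counter()
poss_by_length = Counter()

for _ in range(100_000):
    # Possession length: most possessions are broken up early.
    length = 1
    while random.random() < 0.5 and length < 10:  # each pass survives 50% of the time
        length += 1
    poss_by_length[length] += 1
    # Invented conversion rate: longer possessions score more often.
    if random.random() < 0.005 * length:
        goals_by_length[length] += 1

total_goals = sum(goals_by_length.values())
short_goals = sum(v for k, v in goals_by_length.items() if k <= 3)

# Reep's view: what share of goals came from short (<=3 pass) possessions?
print(f"share of goals from short possessions: {short_goals / total_goals:.0%}")

# The right view: how likely is a possession of a given length to score?
for k in sorted(poss_by_length):
    print(f"P(goal | {k} passes) = {goals_by_length[k] / poss_by_length[k]:.3f}")
```

In this toy model the majority of goals do come off short possessions, even though the per-possession scoring probability rises steadily with length, which is precisely the distinction Reep missed.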
This brings us to an oft-repeated but rarely observed tenet of applied statistics: correlation does not equal causation. The saying may sound glib, but it is remarkably dead on. If we find some kind of correlation between two events, our habit and inclination is to jump to the conclusion that the two events have a causal relationship; that is, that one event caused the other to occur, or that we can deterministically and reasonably predict that the latter event will follow from the occurrence of the first. Unfortunately for us, that is rarely the case. Establishing causality takes formal mathematical checking; just because the statistics show that some correlation exists between two events, however minimal, does not necessarily mean they have a causal relationship.

In order to establish causality, a lot of number crunching needs to happen, and a lot of statistical metrics need to meet certain established thresholds before we can declare causality. That is a completely different arm of statistical science, called inferential statistics, and it is far too involved for me to try to explain here and now, even assuming I can explain it. A rather large and dodgy assumption.
Another thing that Reep’s error illustrates is the Survivorship bias. The story of Abraham Wald and the US warplanes is a favorite on social media and business writers because it perfectly demonstrates the linear and direct thinking most people employ when they see data, or results without taking into account the underlying situation.

Abraham Wald was born in 1902 in the then Austria-Hungarian empire. After graduating in Mathematics he lectured in Economics in Vienna. As a Jew following the Anschluss between Nazi Germany and Austria in 1938 Wald and his family faced persecution and so they emigrated to the USA after he was offered a university position at Yale. During World War Two Wald was a member of the Statistical Research Group (SRG) as the US tried to approach military problems with research methodology.
One problem the US military faced was how to reduce aircraft casualties. They researched the damage received to their planes returning from conflict. By mapping out damage they found their planes were receiving most bullet holes to the wings and tail. The engine was spared.


The US military’s conclusion was simple: the wings and tail are obviously vulnerable to receiving bullets. We need to increase armour to these areas. Wald stepped in. His conclusion was surprising: don’t armour the wings and tail. Armour the engine.

Wald’s insight and reasoning were based on understanding what we now call survivorship bias. Bias is any factor in the research process which skews the results. Survivorship bias describes the error of looking only at subjects who’ve reached a certain point without considering the (often invisible) subjects who haven’t. In the case of the US military they were only studying the planes which had returned to base following conflict i.e. the survivors. In other words what their diagram of bullet holes actually showed was the areas their planes could sustain damage and still be able to fly and bring their pilots home. (Thomas 2019)

What Reep saw was goals, he was fixated on them rather than the big picture, he fell into the trap of reaching the first and most obvious conclusion rather than try to explore the structure of the game. Sometimes prior experience is very useful and not everything new is golden.

Works Cited

Paine, Neil. 2016. "How One Man’s Bad Math Helped Ruin Decades Of English Soccer." http://www.fivethirtyeight.com. October 27. Accessed December 24, 2019. https://fivethirtyeight.com/features/how-one-mans-bad-math-helped-ruin-decades-of-english-soccer/amp/?__twitter_impression=true&fbclid=IwAR0MNCiSu4nJIcGYvW5dRoTif1mzNc6MJzo8c-AFLU-mDWqZgWOCnT75tIw.
—. 2014. "What Analytics Can Teach Us About the Beautiful Game." http://www.fivethirtyeight.com. June 12. Accessed December 24, 2019. https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/.

Thomas, James. 2019. "Survivorship BIas." McDreeamie Musings. April 1. Accessed December 28, 2019. https://mcdreeamiemusings.com/blog/2019/4/1/survivorship-bias-how-lessons-from-world-war-two-affect-clinical-research-today.