Tuesday, July 12, 2022

Stats for Spikes-Goodhart’s Law and Correlation

“When a measure becomes a target, it ceases to be a good measure.”

- Goodhart’s Law, as summarized by anthropologist Marilyn Strathern

Ever since I read Marilyn Strathern’s summary of Goodhart’s Law, I have been intrigued by the truth it reveals across all human activities.

The short sentence packs a punch. We have become ultra-focused on measuring results, partly to CYA, partly to gain an understanding of how our activities are progressing. We are convinced of the dire need to measure every variable at every step along the way, whether it is truly measurable or not. We can’t seem to do anything without checking the results. Ever since the business world became aware of the importance of statistics for decision making, our society has decided to measure everything that can be measured, even if the information is of dubious importance. The idea is to take the data now and figure out what the numbers mean later.

Making the measurement the target also violates the statistical adage: “correlation does not mean causation.” Most of us who have been exposed to Statistical Process Control (SPC) have had that adage drilled into our heads. We take measurements because they give us a good estimate of how the thing we are measuring is performing; the measurement serves as a performance monitor, telling us whether the process is behaving as we want. We have determined that the variable or variables we measure have a certain statistical correlation to our desired results, so they indicate whether we are on the right trajectory. The mistake we make, which is where we violate Goodhart’s Law, is assuming that the correlation we observe between measurement and result implies causation; making the threshold measurement a target is then a simple and natural leap in logic, something we do without critical thought.
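
To make the distinction concrete, here is a minimal sketch, using toy numbers of my own rather than any real process data, of how a hidden common factor can make a measurement and an outcome move together without one causing the other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: a hidden "process health" factor drives both
# the metric we track and the result we actually care about.
process_health = rng.normal(size=1000)
metric = process_health + rng.normal(scale=0.5, size=1000)    # what we measure
outcome = process_health + rng.normal(scale=0.5, size=1000)   # what we want

# Strong correlation (about 0.8), yet the metric does not cause the outcome.
print(np.corrcoef(metric, outcome)[0, 1])

# "Hitting the target" by inflating the metric directly, without improving
# process health, moves the measurement but leaves the outcome untouched.
gamed_metric = metric + 2.0
print(gamed_metric.mean() - metric.mean())   # the metric jumps by 2.0 ...
print(outcome.mean())                        # ... the outcome does not move
```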

It is not a large leap, although it is a fatal leap in many instances.

When we leap from treating a measurement as an estimate of the process to treating it as a target, we allow human nature to take over. In our eagerness to hit the target, a target we believe to be causally related to accomplishing our final goal, we make the fatal error of assuming that the correlation is causation. We are stuck in the “if A then B” line of thinking when we equate correlation with causation, even as we ignore the other variables. We assume that the complex system being measured is linear, directly related to the desired result, and that the relationship is one-to-one. So much so that we put blinders on: we have our eyes on the prize, and we have “laser” focus.

Reality, however, is multivariable, involving more variables than we can fathom or remember. Each variable could potentially be related to all the other variables, and some are more strongly correlated than others.

In my engineering career I saw numerous instances of people making decisions that violated this very principle, even though they had been trained in Statistical Process Control. They had been warned that “correlation does not equal causation,” yet they either had not internalized the lesson or they ignored it completely out of wishful thinking.

The managers at a company I was working at realized that they were going to miss their June delivery goal. Since the end of June straddled the weekend leading into the Fourth of July, they decided to pay the workers extra to work through the Fourth of July holiday and counted all the products delivered over that weekend as June deliveries, which made their July delivery numbers abysmal. A production goal is based on an estimate of the capability of the manufacturing facility, the number of workers, the difficulty of the manufacturing task, and the consistency of the supply chain. Production goals are estimates, not hard targets, because there are too many variables and natural variances that come with those estimates; over the long term, the estimates will regress to the mean. By tweaking the number of days in the reporting period to buttress the production numbers, the managers guaranteed that the later goals would not be met. In this case the regression happened immediately, although not all tweaking produces such an immediate result.
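
The arithmetic behind that move is simple. Here is a minimal sketch, with made-up figures of my own (not the company’s actual numbers), of how borrowing output from the next reporting period inflates one month and guarantees a miss in the next:

```python
# Hypothetical figures only -- a toy model of "tweaking the reporting period".
monthly_capacity = 100           # roughly what the line can actually produce
june_goal = july_goal = 110      # goals set above what will really be delivered

june_actual = monthly_capacity   # what June would have shipped on its own
july_actual = monthly_capacity   # what July would have shipped on its own

# Pay for the holiday weekend and book that output as June delivery.
borrowed_from_july = 12
june_reported = june_actual + borrowed_from_july   # 112 -> June goal "met"
july_reported = july_actual - borrowed_from_july   #  88 -> July miss guaranteed

print(june_reported >= june_goal, july_reported >= july_goal)   # True False
```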

In another case: most large manufacturing companies have adopted project management and project management tools to monitor progress. We were made aware, through weekly updates, of our manufacturing process status, mainly with regard to two variables, schedule and expenditure, through the Schedule Performance Index and Cost Performance Index (SPI and CPI). These are quantitative measures that are tracked and reported by the project managers. They measure the current status against estimates that were created in an open-loop fashion at the beginning of the project. Revising the SPI and CPI baselines to reflect evolving challenges in real time is not a rare event, and it is done with the permission of the customers. Yet at every weekly meeting, senior management would inevitably succumb to the temptation of cheating their schedules, forcing overtime, and taking shortcuts to meet those amorphous indices: indices determined many months in advance, without knowledge of the evolving challenges. They did this in their eagerness to show the higher-ups their ability to “make things happen,” and to be rewarded for their “get it done” attitude. The good project managers would counter these short-sighted whims, refusing to compromise the quality of the process.
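
For readers who have not worked with earned-value reporting: SPI and CPI are ratios of earned value against the baseline plan, which is exactly why forcing them back toward 1.0 with overtime and shortcuts games the measure rather than fixing the plan. A minimal sketch, with hypothetical project figures of my own:

```python
# Standard earned-value ratios (hypothetical figures below):
#   SPI = EV / PV   earned value / planned value; below 1.0 means behind schedule
#   CPI = EV / AC   earned value / actual cost;   below 1.0 means over budget

def spi(earned_value: float, planned_value: float) -> float:
    return earned_value / planned_value

def cpi(earned_value: float, actual_cost: float) -> float:
    return earned_value / actual_cost

# Hypothetical snapshot: $400k of work earned against a $500k baseline plan,
# at an actual spend of $450k.
ev, pv, ac = 400_000, 500_000, 450_000
print(f"SPI = {spi(ev, pv):.2f}")   # 0.80 -> behind the baseline schedule
print(f"CPI = {cpi(ev, ac):.2f}")   # 0.89 -> over the baseline budget

# Overtime and shortcuts can drag SPI back toward 1.0 against a baseline fixed
# months ago, without addressing the challenges that actually moved it.
```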

There are many other instances where both the “correlation does not equal causation” adage and Goodhart’s Law are ignored:

·       The US News and World Report college rankings: universities treat the rankings as targets and go to great lengths to game the data (the measurements) that affect their rankings, even though they know that the factual meaning of the ranking is ambiguous.

·       In athletics, people have tried to quantify measures to identify athletic talent. At the NFL combine, potential draftees run specific drills and each athlete’s performance in those drills is measured. Those drills do not measure a draftee’s prowess at playing the game; they just give the coaches and team management a skewed set of measurements, because trainers and strength coaches have combed through the accumulated data over time and created customized workouts that train athletes to meet the thresholds believed to be the ideal physical measurements for those specific drills. Those who come closest to the pre-determined thresholds are more likely to be drafted and to sign for large bonuses. These measurements do not guarantee success as a professional, yet the focus on that set of targets persists. The prime example is Tom Brady, who was the 199th pick, in the sixth round, of the 2000 draft.

Yet the NFL was able to change its thought process when it stopped using the Wonderlic test as a measure of sports intelligence; maybe that is because the test is abysmal at predicting professional success.

It is also in sports that we see a steadfast refusal to fall into the trap of ignoring Goodhart’s Law and the “correlation does not equal causation” adage.

There are important statistics and measurements in all sports, and the successful coaches view them as partial measures of total team performance. The successful coaches understand that sports are messy, complicated, and interrelated, and that trying to infer performance from such pristine, single-dimensional data is foolish. They understand that focusing on only one statistic, or on statistics from one aspect of the total game, gives them only a partial picture. Relying only on on-base percentage in baseball, ace-to-error ratio in volleyball, points in the paint in basketball, or total rushing yards in football does not indicate that the team will win; each is just one statistic out of many. It is up to the coach to understand the interaction of all the changing data in the context of the game.

Nature also plays tricks on the measurements, because there will always be unmeasurable factors that skew the total picture of each game.

The point of all this is not that we should not make measurements, or that data is inherently skewed. On the contrary, measurements are critical to our understanding of the unknown; they give us a means of judging whether the complex system we are working with is behaving as we hope. The key to gaining knowledge and making good decisions has more to do with asking questions about our assumptions and beliefs before we start drawing inferences. Data and information do not, by themselves, give humans the tools to predict outcomes; humans infer predictions from their experiences, and left alone, they will draw wrong inferences more often than correct ones.

Questions need to be asked, and often. More importantly, those questions must address our proclivity to assume that correlation is causation, and our desire to make measurements into targets.