“When a measure becomes a target, it ceases to be a good measure.”
- Goodhart’s Law, as summarized by anthropologist Marilyn Strathern
Ever since I read Marilyn Strathern’s summary of Goodhart’s Law, I have been intrigued by the truth it reveals across all human activities. The short sentence packs a punch. We have become ultra-focused on measuring results, partly to CYA, partly to gain an understanding of how our activities are progressing. We are convinced of the dire need to measure every variable at every step along the way, whether it is measurable or not. We can’t seem to do anything without checking the results. Ever since the business world became aware of the importance of statistics for decision making, our society has decided to measure everything that can be measured, even if the information is of dubious importance. The idea is to collect the data now and figure out what the numbers mean later.
Making the measurement the target also violates the statistical adage “correlation does not mean causation.” Most of us who have been exposed to Statistical Process Control (SPC) have had that adage drilled into our heads. We take measurements because they give us a good estimate of how the thing we are measuring is performing; a measurement serves as a performance monitor, telling us whether the process is behaving as we want. We have determined that the variable or variables we are measuring have a certain statistical correlation to our desired results, so they indicate whether we are on the right trajectory. The mistake we make, which is where we violate Goodhart’s Law, is assuming that the correlation we observe between measurement and result automatically implies causation; making the threshold measurement a target is then a simple and natural leap in logic, something we do without critical thought.
It is not a large leap, although it is a fatal leap in many
instances.
When we leap from treating a measurement, an estimate of the process, as an indicator to treating it as a target, we allow human nature to take over. In our eagerness to hit the target, a target that we believe to be causally related to accomplishing our final goal, we make that fatal error of assuming the correlation is causation. We are stuck in if-A-then-B thinking when we equate correlation with causation, even as we ignore the other variables. We assume that the complex system being measured is linear, directly related to the desired result, and that the relationship is one-to-one. So much so that we put blinders on: we have our eyes on the prize, and we have “laser” focus.
Reality, however, is multivariable, involving more variables than we can fathom or remember. Each variable could potentially be related to all the others, and some are more correlated than others.
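A toy simulation, with entirely made-up numbers, illustrates the point: a metric can track an outcome closely right up until people start optimizing the metric itself. The variable names and parameters below are my own illustration, not data from any of the examples that follow.

```python
# Minimal sketch: a proxy metric correlated with an outcome stops tracking it
# once the proxy is gamed. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# An underlying driver (e.g., real effort or capability) we cannot observe directly.
driver = rng.normal(size=n)

# Both the outcome we care about and the proxy we measure depend on that driver.
outcome = driver + rng.normal(scale=0.5, size=n)
proxy = driver + rng.normal(scale=0.5, size=n)
print("correlation before gaming:", np.corrcoef(proxy, outcome)[0, 1])  # ~0.8

# Once the proxy becomes the target, it gets inflated by gaming effort that has
# nothing to do with the driver, so the proxy and the outcome drift apart.
gamed_proxy = proxy + rng.normal(loc=2.0, scale=2.0, size=n)
print("correlation after gaming: ", np.corrcoef(gamed_proxy, outcome)[0, 1])  # ~0.4
```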
In my engineering career I saw numerous instances of people making decisions that violated this very principle, even though they had been trained in Statistical Process Control. They were warned that “correlation does not equal causation,” yet they either had not internalized the lesson or ignored it completely out of wishful thinking.
The managers at a company I worked at realized that they were going to miss their June delivery goal. Since the end of June straddled the weekend leading into the Fourth of July, they decided to pay the workers extra to work through the Fourth of July holiday and booked all the products delivered over that weekend as June deliveries, which made their July delivery numbers abysmal. A production goal is based on estimates of the capability of the manufacturing facility, the number of workers, the difficulty of the manufacturing task, and the consistency of the supply chain. Production goals are estimates, not hard targets, because there are too many variables and too much natural variance baked into the estimates. Over the long term, actual performance will regress to the mean. By tweaking the number of days in the reporting period to buttress the June numbers, the managers guaranteed that the later goals would not be met. In this case the regression happened immediately, although not all tweaking would have the same result.
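A back-of-the-envelope sketch, with hypothetical numbers of my own choosing, shows why the trick cannot work twice: whatever is borrowed from July to dress up June comes straight out of July’s ledger, and the two-month total does not move.

```python
# Toy arithmetic: shifting deliveries between reporting periods (made-up numbers).
monthly_target = 100
june_actual, july_actual = 92, 100        # what the factory really produced each month
borrowed_from_july = 8                    # holiday-weekend units booked as June delivery

june_reported = june_actual + borrowed_from_july   # 100 -> June "meets" its goal
july_reported = july_actual - borrowed_from_july   # 92  -> July now misses by the same amount

print(june_reported, july_reported, june_reported + july_reported)  # 100 92 192
```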
In another case: most large manufacturing companies have adopted project management and project management tools to monitor progress. We were made aware, through weekly updates, of our manufacturing process status, mainly with regard to two variables, schedule and expenditure, via the Schedule Performance Index and Cost Performance Index (SPI and CPI). These are quantitative measures that are tracked and reported by the project managers. They measure the current status and are compared to the estimates that were created, in an open-loop fashion, at the beginning of the project. Revising the SPI and CPI baselines to reflect evolving challenges in real time is not a rare event, and it is done with the permission of the customers. Yet at every weekly meeting, senior management would inevitably succumb to the temptation of cheating the schedules, forcing overtime, and taking shortcuts to meet the amorphous indices; indices that were determined many months earlier, without knowledge of the evolving challenges. They did this in their eagerness to show the higher-ups their ability to “make things happen,” and to be rewarded for their “get it done” attitude. The good project managers would counter these short-sighted whims, refusing to compromise the quality of the process.
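For readers unfamiliar with these indices, here is a minimal sketch of the standard earned-value formulas behind SPI and CPI. The dollar figures are hypothetical, not numbers from the project described above.

```python
# Standard earned-value definitions of SPI and CPI (example figures are made up).
def spi(earned_value: float, planned_value: float) -> float:
    """Schedule Performance Index: earned value / planned value (< 1.0 means behind schedule)."""
    return earned_value / planned_value

def cpi(earned_value: float, actual_cost: float) -> float:
    """Cost Performance Index: earned value / actual cost (< 1.0 means over budget)."""
    return earned_value / actual_cost

# Example status: $450k of work completed against a $500k plan, at an actual cost of $520k.
print(f"SPI = {spi(450_000, 500_000):.2f}")  # 0.90 -> behind schedule
print(f"CPI = {cpi(450_000, 520_000):.2f}")  # 0.87 -> over budget
```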
There are many other instances where both the “correlation does not equal causation” adage and Goodhart’s Law are ignored.
· The US News and World Report college rankings, where universities treat the rankings as targets: they go to great lengths to game the data (the measurements) that feed their rankings, even though they know that the factual meaning of the ranking is ambiguous.
· In athletics, people have tried to quantify measures to identify athletic talent. At the NFL Combine, prospects run specific drills and each athlete’s performance in those drills is measured. Those drills do not measure a potential draftee’s prowess for playing the game; they just give the coaches and team management a skewed set of measurements, because trainers and strength coaches have culled the accumulated data over time and created customized workouts that train athletes to hit the thresholds believed to be the ideal physical measurements for those specific drills. Those who come closest to the pre-determined thresholds are more likely to be drafted and to sign for large bonuses. These measurements do not guarantee success as a professional, yet the focus on that set of targets persists. The prime example is Tom Brady, who was the 199th pick, in the sixth round, of the 2000 draft.
Yet the NFL was able to change its thought process when it stopped using the Wonderlic test as a measure of sports intelligence; maybe that is because the test is abysmal at predicting professional success.
It is also in sports that we see a steadfast refusal to fall into the trap of ignoring Goodhart’s Law and the “correlation does not equal causation” adage.
There are important statistics and measurements in all sports, but the successful coaches view them as partial measures of total team performance. The successful coaches understand that sports are messy, complicated, and interrelated; trying to infer performance from such pristine, single-dimensional data is foolish. They understand that focusing on only one statistic, or on statistics from one aspect of the total game, gives them only a partial picture. Using only on-base percentage in baseball, ace-to-error ratio in volleyball, points in the paint in basketball, total rushing yards in football, and so on, does not really indicate whether the team will win; each is just one statistic out of many. It is up to the coach to understand the interaction of all the changing data in the context of the game.
Nature also plays tricks on the measurements, because there will always be unmeasurable factors that skew the total picture of each game.
The point of all this is not that we should not make measurements, or that data is inherently skewed. On the contrary, measurements are critical to our understanding of the unknown; they give us a means of judging whether the complex system we are working with is behaving as we hope. The key to gaining knowledge and making good decisions has more to do with asking questions about our assumptions and beliefs before we start drawing inferences. Data and information do not, by themselves, give humans the tools to predict outcomes; humans infer the predictions from their experiences, and left alone, they will draw wrong inferences more often than correct ones.
Questions need to be asked, and often. More importantly, those questions must address our proclivity to assume that correlation is causation, and our desire to make measurements into targets.