Showing posts with label Stats For Spikes.

Tuesday, July 12, 2022

Stats for Spikes-Goodhart’s Law and Correlation

“When a measure becomes a target, it ceases to be a good measure.”

-Goodhart’s Law as Summarized by Anthropologist Marilyn Strathern

Ever since I read Marilyn Strathern's summary of Goodhart's Law, I have been intrigued by the truth it reveals across all human activities.

The short sentence packs a punch. We have become ultra-focused on measuring results, partly to CYA, partly to gain an understanding of how our activities are progressing. We are convinced of the dire need to measure every variable at every step along the way, whether it is measurable or not. We can't seem to do anything without checking the results. Ever since the business world became aware of the importance of statistics for decision making, our society has decided to measure everything we can, even when the information is of dubious importance. The idea is to take the data now and figure out what the numbers mean later.

Making the measurement the target also violates the statistical adage "correlation does not mean causation." Most of us who have been exposed to Statistical Process Control (SPC) have had that adage drilled into our heads. We take measurements because they give us a good estimate of how the thing we are measuring is performing; a measurement serves as a performance monitor, telling us whether the process is behaving as we want. We have determined that the variable or variables we are measuring have a certain statistical correlation to our desired results, so they indicate whether we are on the right trajectory. The mistake we make, the point where we violate Goodhart's Law, is assuming that the correlation we observe between measurement and result automatically implies causation; making the threshold measurement a target is then a simple and natural leap in logic, one we make without critical thought.

It is not a large leap, although it is a fatal leap in many instances.

When we leap from treating a measurement as an estimate of the process to treating it as a target, we allow human nature to take over. In our eagerness to hit the target, a target we believe to be causally related to our final goal, we make the fatal error of assuming the correlation is causation. We are stuck in if-A-then-B thinking when we equate correlation with causation, even as we ignore the other variables. People assume that the complex system being measured is linear, directly related to the desired result, and that the relationship is one-to-one. So much so that we put blinders on: we have our eyes on the prize, and we have "laser" focus.

Reality, however, is multivariable, more so than we can fathom or remember. Each variable could potentially be related to all the others, and some are more correlated than others.

In my engineering career I saw numerous instances of people making decisions that violated this very principle, even though they had been trained in Statistical Process Control. They were warned that "correlation does not equal causation," yet they either had not internalized the lesson or ignored it completely out of wishful thinking.

The managers at a company I worked at realized that they were going to miss their June delivery goal. Since the end of June straddled the weekend leading into the Fourth of July, they decided to pay the workers extra to work through the holiday and counted all the products delivered over that weekend as June deliveries, which made their July delivery data abysmal. A production goal is based on an estimate of the capability of the manufacturing facility, the number of workers, the difficulty of the manufacturing task, and the consistency of the supply chain. Production goals are estimates, not hard targets, because there are too many variables and too much natural variance in the estimates, and over the long term the results will regress to the mean. By tweaking the number of days in the reporting period to buttress the production estimates, the managers guaranteed that the later estimates would not be met. In this case the regression happened immediately, although not all tweaking has the same result.

In another case: most large manufacturing companies have adopted project management tools to monitor progress. We were made aware, through weekly updates, of our manufacturing process status, mainly with regard to two variables, schedule and expenditures, through the Schedule and Cost Performance Indices (SPI and CPI). These are quantitative measures tracked and reported by the project managers. They measure the current status against estimates that were created, open loop, at the beginning of the project. Revising the SPI and CPI baselines to reflect evolving challenges in real time is not a rare event, and it is done with the permission of the customers. Yet at every weekly meeting, senior management would inevitably succumb to the temptation of cheating the schedule, forcing overtime, and taking shortcuts to meet the amorphous indices; indices that were determined many months in advance, without knowledge of the evolving challenges. They did this in their eagerness to show the higher-ups their ability to "make things happen," and to be rewarded for their "get it done" attitude. The good project managers would counter these short-sighted whims, refusing to compromise the quality of the process.

There are many other instances where both the "correlation does not equal causation" adage and Goodhart's Law are ignored.

·       The US News and World Report college rankings, where universities treat the rankings as targets. They go to great lengths to game the data (measurements) that feed their rankings, even though they know that the factual meaning of the rankings is ambiguous.

·       In athletics, people have tried to quantify measures that identify athletic talent. At the NFL combine, draftees run specific drills and each athlete's performance in those drills is measured. Those drills do not measure a potential draftee's prowess at playing the game; they give the coaches and team management a skewed set of measurements, because trainers and strength coaches have studied the accumulated data over time and created customized workouts that train athletes to meet the thresholds of what are believed to be the ideal physical measurements for those specific drills. Those who come closest to the pre-determined thresholds are more likely to be drafted and sign for large bonuses. These measurements do not guarantee success as a professional, yet the focus on that set of targets persists. The prime example is Tom Brady, the 199th pick, in the sixth round, of the 2000 draft.

Yet the NFL was able to change its thought process when it stopped using the Wonderlic test as a measure of sports intelligence; maybe that is because the test is abysmal at predicting professional success.

It is also in sports that we see a steadfast refusal, by some, to fall into the trap of ignoring Goodhart's Law and the "correlation does not equal causation" adage.

There are important statistics and measurements in all sports, but successful coaches view them as partial measures of total team performance. Successful coaches understand that sports are messy, complicated, and interrelated; trying to infer performance from such pristine, single-dimensional data is foolish. They understand that focusing on one statistic, or on statistics from one aspect of the total game, gives them only a partial picture. On-base percentage in baseball, ace-to-error ratio in volleyball, points in the paint in basketball, total rushing yards in football, etc., do not by themselves indicate that the team will win; each is just one statistic out of many. It is up to the coach to understand the interaction of all the changing data in the context of the game.

Nature also plays tricks on the measurements, because there will always be unmeasurable factors that skew the total picture of each game.

The point of all this is not that we should not take measurements, or that data is inherently skewed. On the contrary, measurements are critical to our understanding of the unknown; they give us a means of judging whether the complex system we are working with is behaving as we hope. The key to gaining knowledge and making good decisions is asking questions about our assumptions and beliefs before we start drawing inferences. Data and information do not give humans the tools to predict outcomes; humans infer the predictions based on their experiences, and left alone, humans will draw wrong inferences more often than correct ones.

Questions need to be asked, and often. More importantly, those questions must address our proclivity to assume that correlation is causation, and our desire to make measurements into targets.


Friday, April 1, 2022

Stats for Spikes-Serve Effectiveness

Service errors have become a heated topic in the debate over whether to grip-it-and-rip-it in our service games, especially in the men's game. Girls' and women's coaches, and coaches of beginning players, have long complained about service errors and questioned the grip-it-and-rip-it philosophy. Boys' and men's coaches maintain that theirs is a different game from the girls', women's, and beginners' game, and that those who complain just don't understand that an easy serve almost always ends up as an easy serve-receive point for the receiving team.

This discussion came back today in one of the postings on VCT. I offhandedly gave a pseudo-statistical comparison. I thought about it for a while and came up with a calculated metric that coaches can use to evaluate their team's service game, rather than relying on errors and aces alone.

I'm pretty sure this is not a universally original idea, but it is original to me. I do think this might be an effective metric for teams to track so that they can see where their service game stacks up statistically.

The idea is simple: it uses the points-scored statistic, which is ubiquitous. But we would also need to count the points that weren't scored: the null results from a serve.

First, we need to count the negative point scoring on our serve:

  •       Opponent's first-ball serve-receive points.
  •       Our service-error points given to the opponent.

Second, we count the positive point scoring on our serve:

  •       Our first-ball transition attack points after the opponent's serve-receive attempt.
  •       Opponent's serve-receive attacking-error points that we gain.
  •       This element is where it gets a little amorphous. We need to count the negative points avoided because the opponent was not able to score on the first-ball serve-receive attempt. It could be thought of as a neutral play, because the opponent did not score off of the serve receive. But since the argument is that a less aggressive serve means a sure serve-receive point for the opponent, we should be credited with a positive, because the serve played a decisive role in disrupting the serve-receive attack. It should be a plus for us: we avoided losing a sure point, regardless of who eventually won the rally. (These points can be weighted as less than a full point if necessary.)

We can use simple percentages, with the sum of the negative, positive, and neutral points as the denominator. They should sum to all the serves we executed.

We can use something similar to the kill-percentage formula by putting the positive points we gained, minus the negative points we gave to the opponent, in the numerator. If the percentage is low, or if it is negative, we know our service game is not effective. If it is overwhelmingly positive, then we know that our service game is effective.

Or we can just look at the positive point percentage versus the negative point percentage.

We might call it the service effectiveness percentage. Coaches can calculate this from their team's historical statistics, if they have the data for the neutral points, and assess their service effectiveness for a season and/or for each game or match. Indeed, you can use this for the general service effectiveness of the team; it doesn't have to be just about grip-it-and-rip-it.
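This bookkeeping can be sketched in a few lines of code. Everything here is my own sketch of the proposed metric, not an established statistic; the function name and the optional weight on neutral points are illustrative assumptions.

```python
def service_effectiveness(positive, negative, neutral, neutral_weight=1.0):
    """Kill-percentage-style index: (positive - negative) / total serves.

    positive: points we scored off our serve (first-ball transition
              attack points, opponent serve-receive attack errors)
    negative: points the opponent scored off our serve (first-ball
              serve-receive points, our service errors)
    neutral:  serves where the opponent failed to score on the first
              ball but we did not score directly either
    neutral_weight: optional discount if neutral points should count
                    as less than a full point (see text)
    """
    total = positive + negative + neutral  # every serve lands in one bucket
    if total == 0:
        return 0.0
    return (positive + neutral_weight * neutral - negative) / total

# Hypothetical match: 10 positive, 8 negative, 5 neutral serves
rating = service_effectiveness(10, 8, 5)   # (10 + 5 - 8) / 23, about 0.30
```

A clearly negative value would flag an ineffective service game; a strongly positive one, the opposite.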

Just thinking out loud.

Tuesday, July 27, 2021

Stats for Spikes-Let ‘er rip. Or not.

This topic comes up every quadrennial, or in this case every five years, when we are treated to the spectacle of volleyball in the Olympics, the showcase event for our sport. Why are these top-level athletes missing so many serves? They are the best of the best; why can't they serve a ball over?

The philosophy is this: at the highest level of our game, a controlled serve does not put enough pressure on the passer to force an Out-of-System play. To put pressure on the passer, the serve must be as difficult to handle as possible; the intent is not necessarily to ace the passer, but to force an Out-of-System pass.


As the coach, you have to give the servers the green light to rip it, because human nature is such that players respond to the reward/consequence cues the coaches feed back to their play; i.e., if you want them to rip it at full speed, you have to be willing to take off all the constraints and learn to live with service errors. Coach Speraw's philosophy is that if the team does not make a certain number of service errors every set, the team is not serving aggressively, which automatically translates to points for the opponent.


There have been a number of responses to this idea on Volleyball Coaches and Trainers. A sampling:

·       It is aesthetically unpleasing to see all the missed serves at this level, especially those serves into the middle of the net. Or: it is shameful how our best players can't serve a ball over the net. I don't think the players or the coaches care about the aesthetics. They want to win, and to win they must score points; to score points, they must serve tough, all the time. The aesthetically-pleasing mindset also reveals a latent vein of thought: failure is undesirable and must be avoided. Even as coaches come around to the mindset that errors are the natural way of testing a player's skills in the competitive environment, the best way to learn, and the only feedback mode that matters, we, in our weakest decision-making moments, reach back for the safest and most certain decision-making bias, because we are afraid of the unknown.

The other part of this reaction is that many responders assume that volleyball as we teach it to our players is played at the same speed, force, and intensity as volleyball played at the highest level of the game; that is definitely not the case. Most of us understand that the speed and athleticism displayed at every level of play is tightly coupled to the athletic and cognitive abilities of the players. As the players and games evolve, the strategies and tactics must also evolve to resolve the challenges presented at each level. We also forget that the speed, force, and intensity rise steeply as the level of the game progresses.

·       Situational: when the score is close, or you are approaching the end of the set, you should just make sure that the serve is in. Or: we need to know when to throw in a changeup serve to catch the other team unaware. This statement will be addressed after I introduce some probabilistic definitions.

Probability and Volleyball Statistics

From Wikipedia: https://en.wikipedia.org/wiki/Probability

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty.

How do we come up with the probabilities? In most cases, we use descriptive statistics, i.e., the game statistics that are taken regularly in volleyball. The definitions that follow come from (Spiegelhalter 2019).

To be clear, probability is a measure of uncertainty, whereas statistics are usually presented without any measure of uncertainty attached.

There are numerous kinds of probability; the ones we usually employ are as follows (this is not a complete list):

·       Classical probability: the number of outcomes favoring the event divided by the total number of possible outcomes, assuming the outcomes are all equally likely.

It is easy to see that we can readily use this when we are compiling counting statistics in volleyball: hitting percentage, kill percentage, service efficiency, etc.
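As a minimal sketch (function names and numbers are my own, not standard notation), the familiar counting stats are exactly this ratio of favorable outcomes to total outcomes:

```python
def kill_percentage(kills, attempts):
    # favorable outcomes (kills) over total outcomes (attempts)
    return kills / attempts if attempts else 0.0

def hitting_percentage(kills, errors, attempts):
    # the familiar (K - E) / TA efficiency; an index, not a true probability
    return (kills - errors) / attempts if attempts else 0.0

kp = kill_percentage(12, 30)           # 0.4
eff = hitting_percentage(12, 5, 30)    # (12 - 5) / 30, about 0.233
```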

·       Long run frequency probability: based on the proportion of times an event occurs in an infinite sequence of identical experiments.

This is the kind of probability that we tend to lock onto as THE objective probability, i.e., the foundation of our beliefs when it comes to making decisions. There are problems with this kind of probability. It is hard to compile, because its robustness depends on the law of large numbers: a big number in the denominator means the probability is more likely to hold universally, but the statistics we take in sports are subject to countless correlations and factors. If we choose to ignore those factors, we are also choosing to include their effects, unrecognized, while we accumulate that large denominator.

·       Propensity or chance: the idea that there is some objective tendency of the situation to produce an event.

This is what we choose to believe; it is the basis of our faith in the numbers.

·       Subjective or personal probability: a specific person's judgment about a specific occasion, based on their current knowledge; it is roughly interpreted in terms of betting odds.

This is the probability we employ when we go by gut feel or intuition. It has generally been dismissed as a decision-making tool because of the amount of bias that can be subconsciously baked into the internal calculation of a personal probability. Yet we ignore this kind of probability at our own peril, because experienced coaches and scouts carry personal probabilities that are not only valid and calibrated but also rich in insights that are neither measurable nor consciously identifiable.

 

While professional sports have embraced the Moneyball religion, many professional teams are working with the greybeards of their sport to integrate the statistics-driven approach with the personal probabilities of experienced coaches. They are finding that Moneyball alone is not enough to help them make decisions, that those intangible and unexplained gut feels might be valuable, and that they need to take advantage of both, in close coordination.

The most common way to generate probabilities is to take the game statistics and turn them into classical probabilities. The next step is to take as many of the same type of statistics as possible and combine them to create a long-run frequency probability. The belief is that this is the propensity or chance, the underlying objective probability, of the process. For example, if I took my team's hitting percentage over the course of a season, I would have a reference probability against which I could compare each match. The problem is that the playing parameters of each match are different: easy-serving team versus tough-serving team, big-blocking team versus smaller team, teams that run fast versus teams that set high and slow, ad infinitum.
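One way to see how the long-run number hides those parameters: pooling a season's raw counts is not the same as averaging the per-match percentages, because matches with more attempts silently dominate. A sketch with invented match data:

```python
# (kills, errors, attempts) per match; all numbers are hypothetical
matches = [
    (15, 6, 40),   # vs a big-blocking team
    (22, 3, 45),   # vs a smaller team
    (10, 9, 38),   # vs a tough-serving team
]

# season-long ("long-run frequency") hitting percentage from pooled counts
pooled = sum(k - e for k, e, a in matches) / sum(a for k, e, a in matches)

# per-match hitting percentages and their simple average
per_match = [(k - e) / a for k, e, a in matches]
mean_of_matches = sum(per_match) / len(per_match)

# pooled and mean_of_matches disagree: the single season number has
# already blended away the match-to-match context.
```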

Now for the interpretation of the probability. Most people take a probability to mean that, given the same conditions, the results will mirror the probability. But the optimistic bias we all carry subconsciously makes us think that a 99% probability of success means success is a foregone conclusion; we predict that we are going to succeed 99% of the time and neglect the 1% where we will not. We usually take that gamble because the odds are good, but then we blame bad luck when we fail. Luck had nothing to do with it; the probability told us it was going to happen. The other part of the consideration is that the long-run frequency probability is not as representative as we think, because we are ignoring all the other factors.

Let ‘er rip

Back to the original discussion.

There are essentially two probabilities that come into play in this decision.

The first probability is the success rate of a particular server. The second probability is the success rate of the opposing offense in siding out, given the kind of serve they receive.

The probability of success of a particular server could, given enough data, be rewritten as a probability distribution function: the classic bell curve. We assume a bell curve, i.e. a normal or Gaussian distribution, because it is mostly true, and because the bell curve lets us use the classical probability data to extrapolate and draw inferences. Of course, as any good coach knows, human factors play a large role in a server's success probability. A ten-year-old serving in a game for the first time and a player serving when their team is behind in the Olympics feel very different psychological pressures, even though the consequence may be the same: blow the serve and you suffer immense embarrassment. Environmental factors play into the calculation as well: serving in a large gym with bad background contrast for the server's depth perception, strange airflow patterns in an unfamiliar gym, among many other things.
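As a sketch of that bell-curve assumption, we can simulate a hypothetical server whose match-to-match in-serve rate is normally distributed. The 85% mean and 5% spread are invented, and real serving data may not be Gaussian at all; the point is only that a fitted distribution lets us ask questions like "how often will this server have a genuinely bad night?"

```python
import random
import statistics

random.seed(42)  # reproducible sketch

MEAN, SD = 0.85, 0.05  # hypothetical per-match in-serve rate and spread

# simulate 1000 matches, clipping rates to the valid [0, 1] range
season = [min(1.0, max(0.0, random.gauss(MEAN, SD))) for _ in range(1000)]

est_mean = statistics.mean(season)
est_sd = statistics.stdev(season)

# estimated chance of a bad serving night (under 75% of serves in)
p_bad = sum(rate < 0.75 for rate in season) / len(season)
```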

In the case of our national teams, the coaches have been effusive in their praise of the statistical staff, so I am sure the coaches have good data on each server and their probabilities of success under many different conditions. I am also sure that the coaches make their own adjustments to those probabilities based on what they know of the psychological makeup of their players; I know that both teams employ sports psychologists. But that is just the one side of the equation that everyone seems to focus on.

The criticism is that they err too often. Many bring up the fact that if club coaches require their players to get the ball in no matter what, why can't the national team do the same? The difference is that the jump topspin is a high-risk, high-reward serve, a chance the national team coaches see as an acceptable risk, even if it translates to higher error rates. The national team coaches understand this and are willing to take the chance because, when compared with the probabilities from the other side of the equation, the sideout rate of the other team when faced with different serves, the payoff outweighs the risk.

It is curious to me why this part of the decision-making equation is so often ignored. If there is a serve there must be a pass; they are continuous actions, in essence one event. We tend to split the event into serve and pass components so that we can understand the action in terms of the skills, which sometimes leads us to erroneously view them as independent actions. Serving statistics should always be spoken of as a conditional probability: we have this probability of success given that we are serving this specific serve against a team that passes that specific serve with a given probability of success. Passing should likewise be spoken of as a conditional probability: we have this probability of success given that we are passing this specific serve against a team that serves it with a given probability of success. It is at the confluence of the two conditional probabilities that we can make good decisions.

In this instance, the sideout rate of almost all teams when served a less-than-full-speed jump topspin has been assigned to be 100%; that is, anything less than full aggression essentially gives the opponent a free ball. Indeed, as we watch the Olympic teams, the sideout rates for serves delivered with full aggression are still high, but not 100%.

The question is: how realistic is this assumption? Are the coaches being intellectually lazy in assuming that the opponents are that proficient at siding out under these circumstances? I don't have access to the data, but I would assume that all the national teams in the Olympic tournament have extensive statistics on the top teams, at a minimum. What would the real sideout rate have to be for the national team coaches to back off the Let 'er Rip standard? I am not sure, but having seen the serving game in Tokyo, I am going to go with the coaches.
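The trade-off behind that judgment can be sketched with a deliberately simplified first-ball model: assume the serving team wins the point only when the serve lands in and the opponent fails to side out on the first ball. Real rallies continue past the first ball, and every rate below is invented, but the break-even arithmetic is the point:

```python
def p_point_on_serve(error_rate, sideout_rate):
    # serve lands in AND the opponent fails to side out on the first ball
    return (1 - error_rate) * (1 - sideout_rate)

# tough jump topspin: 20% error rate, but opponents side out only 60%
tough = p_point_on_serve(0.20, 0.60)   # 0.8 * 0.4 = 0.32
# safe serve: 2% error rate, near-automatic sideout (the "free ball" claim)
safe = p_point_on_serve(0.02, 0.95)    # 0.98 * 0.05 = 0.049

def breakeven_sideout(error_rate, alternative_value):
    # opponent sideout rate at which the tough serve stops paying off
    return 1 - alternative_value / (1 - error_rate)

threshold = breakeven_sideout(0.20, safe)  # about 0.94
```

Under these invented numbers the tough serve pays off unless opponents side out on roughly 94% of the in-play jump topspins, which is the shape of the bet the coaches are making.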

Which brings us back to the serving abilities of the servers themselves. First, given the coaches' steadfast belief in the sideout proficiencies of the opponents and the high-risk nature of the jump topspin, no player with a horrible probability of success on the jump topspin would be allowed on the court; either that, or they would have been trained to improve that probability. The determining factor is then the execution of the serve: how the players react to each situation, whether they hurry their serving ritual, whether they get distracted by the situation, whether they can adjust to the environmental conditions. This makes the decision not one of strategy and tactics but of execution.

Someone suggested that a team could employ a surgical approach: go full out until the score gets close, or close to 25, then just get the ball in, whether by serving short or by going to a jump float. One problem is that if a team shows this kind of tendency, the opponents will undoubtedly know it, and it is difficult to trick your opponents at this level. The other problem is that players are human, and mostly risk-averse by nature; it is a mental struggle to go 100% on a swing, because they will subconsciously pull back a little when executing, just to be safe. If they were told to pull back consciously, the edict would contradict the team philosophy of active aggression, and chances are the resulting serve would not be effective, or the opponent would handle the changeup easily. This is not to say that it is a bad strategy, but making that decision requires that the player and coach know how well the opponent knows the team and whether the opponent can be caught by surprise: more variables that feed the decision and complicate the calculation.

As I said in the beginning, this discussion pops up all the time, mostly during the men's college season and during the Olympics. It is a healthy discussion to have, and I doubt this essay will change many minds one way or the other, though I hope it introduces some probabilistic concepts and provides some other points of view to the coaches who have so passionately debated this topic.

Works Cited

Spiegelhalter, David. The Art of Statistics: Learning from Data. London: Pelican Books, 2019.


Thursday, April 29, 2021

Stats for Spikes- It's a Statistical Trap!

Sports can be viewed as a continuous flow of actions. We define discrete stages within the flow so that we can observe and analyze the reality of sports, because we humans need to slow time down to a point where we can process what we are seeing. The stages we define are used to develop an understanding of the flow; they do not reflect the reality of the game. A natural stage marker in a rebound sport such as volleyball is the termination point, when the ball is whistled dead. Most of the statistics we keep (kills, assists, aces, blocks, block assists, and all the associated errors) result from a dead ball. Some statistics don't happen directly at the stoppage of play but lead to the dead ball: assists, passes, and digs are some that come to mind. We also count the number of attempts to compute our efficiency numbers; those are statistics that do not fall into the dead-ball/point-scored category.

Taking statistics of a volleyball match gives us a simple picture of the match, but because most statistics we can take are dead-ball statistics, they give us only the endings of a flurry of action. These simple statistics capture the facts as we know them according to the points scored. What is left unrecorded is most of the match. As the quote often attributed to Mozart proposes: "The music is not in the notes, but in the silence between." Volleyball is in the movements between touches, and we are unable to take complete statistics on the space in between. Videos are often used today to capture those moments missing from the statistics, but not many coaches in the club and high school ranks have the access or the staff to analyze video completely.

As Dr. W. Edwards Deming so famously observed, there are many things that are unmeasurable and many things that are unknowable. In the realm of sports, those moments between touches are unmeasurable. The reasons for the movements of the individuals, moving in a complicated and coordinated team dance with their teammates, are unknowable. The way to capture the magic of the game between touches is as elusive as capturing the silence between the notes.

While it is critical for coaches to look at scoring statistics and understand how they, or their opponents, are scoring, we need to recognize that those statistics are a minimal record of what took place. Scoring-based statistics ignore all the interactions between the individuals playing the game: the individual decisions made by each player, and how those decisions are acted and reacted upon by teammates and opponents. They also ignore the cumulative actions of the team as it reacts to an action and, more importantly, whether the players are acting and reacting the way they have been trained to play.

Scoring-based statistics also ignore how the teams respond to each other. This point was made after the final match of the 2020 NCAA Division I championship between Kentucky and Texas. A friend and I were discussing the stellar play between these two teams. He observed that he was surprised at how seemingly porous the Texas defense was, especially for a team playing in the national championship match. My response was incredulity. I believe Texas was losing on the defensive front because of the potency of the Kentucky offense; the effectiveness of the Kentucky offense made the Texas defense look overwhelmed, which it was. The point is that sports is an activity based on dualities that act as a whole. Tough serving forces passing errors. Great passing makes great serving look like lollipops. Great blocking can make a porous backrow defense look like world beaters. A poor block can make the best defenders look hapless. Great setting can make a mediocre hitter look like an All-American. A great hitter can make a poor setter look phenomenal. Great offenses can make good defenses look overwhelmed. Coaches know and understand these symbiotic relationships inherently.

Why is this so concerning? It is concerning if you are a coach and you don't understand the back-and-forth flow of the game; if you don't understand that the two teams are coupled as participants in the game, that they cannot perform their intricate, sport-defined dances without each other, and that they are connected through this pursuit we call the game of volleyball.

Most coaches understand this implicitly; most who are new to the game do not understand the implications of the interconnectedness of the two opposing sides.

Even experienced coaches who understand the game well can fall into a trap set by the statistics. Recent studies have revealed that our minds easily and naturally adapt to new ways of working, giving up old habits as we form new ones in reaction to new cognitive challenges. In The Shallows, Nicholas Carr explores the changes in cognitive behavior wrought by the internet: decreases in our attention span, our growing difficulty in focusing on a single task, and our frustration at being unable to read for an extended period because we have adapted to reading short and simple articles rather than hefty and complex books. Most pernicious is our waning ability to think in complicated and conceptual ways because we have adopted the habit of simplifying concepts down to their base essentials. Note that I am not a luddite advocating a return to overcomplicated explanations of our games just for the sake of exercising our cognition. A quote most often attributed to Albert Einstein states: everything should be made as simple as possible, but not simpler. This is a variant of Occam's razor, the law of parsimony, and it is the "not simpler" part that applies here. Instead of overcomplicating our explanations for why the game moves the way it does, we are subconsciously oversimplifying our explanations in order to make them fit the statistics we have collected.

The act of using volleyball statistics that are taken only when points are scored narrows a person's frame of reference, filtering their vision of the game's flow through the statistics alone. It changes the way a person's brain operates: it emphasizes the singular, discrete, dead-ball-delimited actions rather than the flow of a multitude of continuous actions. Indeed, if coaches allow the statistical mindset to dominate their internal vision of the game, the focus on statistics forces them to ignore the connections between the actions.

This focus on the recordable statistics encourages resulting: (https://polymathtobe.blogspot.com/2018/12/volleyball-coaching-life-resulting.html)

Resulting can be defined as our propensity to conflate the quality of our decisions with the outcomes of those decisions; that is, we let the result determine how we judge the decision.

Instead of following their global view of how the game is played, a coach will excuse what they would usually judge to be bad playing or bad decision making by resulting: assuming that their team is playing well because it is winning or scoring.

Our emphasis on statistics comes from a natural reaction against coaches depending excessively on "gut feel" or the "eye test"; those heuristics are more often than not fraught with subconscious biases. Statistics become extremely useful when coaches use them to determine whether their "gut feel" stands up to the challenge of reality. But if a coach's understanding of the match is filtered through statistics derived from just the points scored, then the coach's focus is so narrowed that the reality they see is distorted. Their understanding of what is happening in the match is skewed, which affects their decision making and ultimately impacts their coaching.

This kind of distortion can roughly be interpreted as an application of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." (https://polymathtobe.blogspot.com/2021/03/stats-for-spikes-use-of-statistics-as.html)

This is not to say that this habit has overwhelmed the ranks of all coaches; while some experienced coaches may fall into this trap occasionally, I believe their experience will come to the fore and they will catch themselves. My concern is with those coaches who are not experienced in seeing the game in all its multifaceted glory. Every coach has to start somewhere, and if the coach in question did not have the advantage of having played the game at a high level, has not studied the game and its pedagogy thoroughly, or has not thought through the game extensively, they will not have an internal vision of the game at its most competitive level. Those are the coaches most susceptible to falling into the habit of viewing the game through just the statistics.

Every beginning coach is looking for an edge, and statistics is a very potent edge to be had, but it is also just one tool in the toolbox; one needs to use all the tools that are available. By adopting statistics-based goggles, coaches deprive themselves of a deeper understanding of the game, and they do a disservice to their profession and their players by limiting themselves and their vision of the game to just a tiny part of the greater whole.

While experienced coaches can self-correct when they fall into the habit, inexperienced coaches will more than likely fall into the habit and not realize that they are in a trap.

So what to do?

·       Be aware: use the statistics, but catch yourself when you get too focused on the surface level of the statistics.
·       Avoid extrapolating or making inferences based on surface-level statistics.
·       Double-check the statistics against your own observations: do the two pictures mesh?
·       Be aware of resulting. Question whether your team executed, not whether you won or lost the point.
·       Trace the logical sequence of the game action.
·       Understand which questions you are asking; we often substitute a question that we have the data to answer in place of the question we really want to ask.
·       Understand and accept that there are data that cannot be measured and knowledge that cannot be known.
·       When in doubt, actively invoke Admiral Ackbar ("It's a trap!") during your systematic examination of the information used to make decisions.

Sunday, March 21, 2021

Stats for Spikes-Markov Chains

People have used Markov chains to model volleyball for a while now. The presentation by Albert (Albert 2018) shows that Markov chains were used to:

·       Determine how long a game will last under each scheme, rally scoring or sideout scoring.

·       Determine the probability that a team wins in each scheme, rally scoring or sideout scoring.

·       Determine the value of serving first in sideout scoring.

The Albert presentation conceptually shows how the volleyball probability tree is built using Markov chains. She demonstrates that Markov chains are useful in simulating the flow of a volleyball match so that the length of matches can be accurately estimated, as the chains use probability distribution functions to model the uncertainties in a match. If I recall correctly, Markov chains were used to help the FIVB model the differences between sideout and rally scoring when it was considering changing the scoring system.
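As a toy illustration of how such a set-length simulation might work (this is my own sketch, not Albert's actual model), the code below uses a single hypothetical parameter, the probability that the serving team wins any rally, and ignores the win-by-two rule for simplicity:

```python
import random

def rally_set_length(p_serve_win=0.45, target=25, rng=random.Random(1)):
    """Rally scoring: every rally awards a point to someone.
    Returns the number of rallies until a team reaches `target`."""
    scores = [0, 0]
    serving = 0
    rallies = 0
    while max(scores) < target:
        rallies += 1
        if rng.random() < p_serve_win:   # serving team wins the rally and keeps serve
            scores[serving] += 1
        else:                            # receiving team scores and serves next
            scores[1 - serving] += 1
            serving = 1 - serving
    return rallies

def sideout_set_length(p_serve_win=0.45, target=15, rng=random.Random(1)):
    """Sideout scoring: only the serving team can score."""
    scores = [0, 0]
    serving = 0
    rallies = 0
    while max(scores) < target:
        rallies += 1
        if rng.random() < p_serve_win:
            scores[serving] += 1
        else:
            serving = 1 - serving        # sideout, serve changes, no point awarded
    return rallies

n = 2000
avg_rally = sum(rally_set_length(rng=random.Random(i)) for i in range(n)) / n
avg_sideout = sum(sideout_set_length(rng=random.Random(i)) for i in range(n)) / n
print(f"rally-scoring set to 25: ~{avg_rally:.0f} rallies")
print(f"sideout-scoring set to 15: ~{avg_sideout:.0f} rallies")
```

Even with the lower target score, the sideout sets run longer on average, because rallies that end in a sideout award no point; this is the kind of comparison the FIVB would have needed.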

This presentation piqued my interest in the Markov chain. I became curious about using Markov chains to model the number of decision points in a match, how those decisions depend on the probabilities that describe each action, and how many different probabilities are necessary to improve the accuracy of the simulations.

What I have done as an exercise is to model just the sequence of actions in a serve-receive situation. Separate probability trees and flowcharts can be generated for different situations: my team serving, and continuous rallies. The flowchart for my team serving would be identical to the one I created for my team receiving, but with the roles and point winner reversed. A separate flowchart would also be needed for when the team on defense reverts to offense for a counterattack; the probabilities used in that flowchart might be slightly different because the counterattack may come from a more chaotic set of conditions. This is just a partial deep dive into the flow of the game and the possible implications of the uncertainties coming from twelve individuals playing with a net. I never intended to build a simulation based on the Markov chain; I leave that to others.

I used flowcharting diagrams to map out the Markov chains; any errors in the assumptions and the flowcharting are entirely mine, as I created the flowchart for my own edification. This is not a traditional way of representing Markov chains, but it made sense to me when I started looking into them. It also demonstrates the complexity arising from the geometric concatenation of each succeeding action as the actions accumulate.

Purpose

There are two purposes:

·       Examine the number of decision points and all the probabilities that feed into each decision.

·       Count the number of probabilities that are necessary for just one rally.

Probability

A simple definition of probability is: the number of instances that an event A happens in a total of N attempts or opportunities where A could have happened, as N grows without bound: Pr(A) = (occurrences of A) / N, taken in the limit as N goes to infinity.

The reason I let the number N go to infinity is to show that the law of large numbers is at work: it is best to get as many samples as possible so that the calculated probability is as representative of event A happening as possible.

Note that the probability is NOT a prediction; it is just a way to give the user a sense of the chances that A happens. This also means that Not(A) can happen as well, with probability Pr(Not A) = 1 − Pr(A). On any given attempt the outcome can be either one; this is a critical concept to absorb.
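A quick sketch of this definition and of the law of large numbers at work, using an invented "true" probability of 0.6 for event A (say, a good pass):

```python
import random

rng = random.Random(42)
p_true = 0.6  # hypothetical "true" chance that event A happens

# Estimate Pr(A) = (occurrences of A) / N for growing N; the estimate
# wanders for small N and settles toward p_true as N grows.
for n in (10, 100, 10_000, 1_000_000):
    occurrences = sum(rng.random() < p_true for _ in range(n))
    print(f"N={n:>9,}: Pr(A) estimate = {occurrences / n:.4f}")
```

With only 10 trials the estimate can be badly off; by a million trials it is reliably within a fraction of a percent of the true value, which is why small sample sizes make for untrustworthy statistics.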

Conditional Probabilities

I have used conditional probabilities to gain some granularity, to show the dependence that the outcome of each action has on the immediately preceding action. Markov chains model a specific event that is composed of many complex interactions of many previous events. Whether we like it or not, even as the play moves further away from the initial point of contact, the serve, each level of action in the continued play is still historically dependent on that first contact, although the effect decreases dramatically as the play evolves away from it. The conditional probability is the memory that is hardwired into the computations: because the effect of each previous action is already contained in the conditional probabilities, it does not need to be explicitly reiterated at every step.

The expression Pr(A|B) is read as the probability of event A occurring given that event B has occurred. In other words, the probability of A depends on whether B is true. This is how each level of action is linked to the previous level of action.

The probability of each end result, whether it is a point for the serving team or for the receiving team, can be calculated by following each action through the flowchart and multiplying together the conditional probabilities at each level of action until an end point is reached. This will become obvious as the process is explained.
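For example, the running product along one hypothetical path through the flowchart might look like the following; every number below is invented purely for illustration:

```python
# One hypothetical path: float serve -> 3-pass -> set to the outside ->
# attackable set -> kill. Each entry is a conditional probability given
# the previous level's outcome.
path = [
    ("serve: float",                          0.40),
    ("pass: 3 | float serve",                 0.35),
    ("set: outside | 3-pass",                 0.50),
    ("result: attackable | outside set",      0.60),
    ("attack: kill | attackable set",         0.45),
]

p = 1.0
for action, pr in path:
    p *= pr  # chain rule: multiply conditionals down the path
    print(f"{action:<40} running Pr = {p:.4f}")

print(f"\nPr(this exact path ends in a kill) = {p:.4f}")
```

Notice how quickly the product shrinks: five perfectly plausible conditional probabilities multiply out to under a 2% chance for this one specific path, which is why so many paths (and so many probabilities) are needed to account for a whole rally.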

Starting

The bubbles on the right notate the actions. The red oval indicates a point for the opponent; the blue oval indicates a point for us. The purple diamond indicates a decision point. The black parallelogram indicates an action and the kinds of conditional probabilities associated with it. The green parallelogram indicates a transition to another phase of the game, which follows another flowchart.

The first decision is which serve to execute; that decision is made by either the server or the coach of the serving team. What is left unmodelled is the decision process in the server's and the coach's minds: which passers to target, which zones to attack with the serve, and so on. Those decisions are left out for simplicity and brevity, mainly because this is not a rigorous exposition of every single consideration that goes into a decision. The probability of success of the chosen serve is based on the successes each individual server has with each kind of serve, depending on whether enough data has been collected on the individual server to get a good probability distribution function. For this initial action, the five probabilities listed must sum to 1: Pr(ST)+Pr(SF)+Pr(JT)+Pr(JF)+Pr(S)=1


The next level of action involves the passing team's response. I listed five possibilities: a shank, passes graded 1-3, and a service error. The action ends with a point for the opponent (the serving team) if the pass is shanked or the passer is aced, and with a point for the receiving team if the server commits a service error; the action continues with any numbered passing value. Note that there are five conditional probabilities associated with each of the five serving choices, so there are 25 conditional probabilities that need to be collected.
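Those 25 conditional probabilities can be pictured as a five-by-five table, one distribution per serve type (using the post's Pr(ST), Pr(SF), Pr(JT), Pr(JF), Pr(S) labels). The values below are invented placeholders, but each row must still sum to 1:

```python
# Receive-outcome distribution conditioned on serve type: 5 x 5 = 25
# conditional probabilities. All numbers are placeholders for illustration.
receive_given_serve = {
    "ST": {"shank": 0.10, "1": 0.20, "2": 0.30, "3": 0.35, "service_error": 0.05},
    "SF": {"shank": 0.08, "1": 0.17, "2": 0.30, "3": 0.40, "service_error": 0.05},
    "JT": {"shank": 0.18, "1": 0.22, "2": 0.25, "3": 0.20, "service_error": 0.15},
    "JF": {"shank": 0.12, "1": 0.20, "2": 0.28, "3": 0.30, "service_error": 0.10},
    "S":  {"shank": 0.05, "1": 0.15, "2": 0.30, "3": 0.48, "service_error": 0.02},
}

# Sanity check: every conditional distribution must sum to 1.
for serve, dist in receive_given_serve.items():
    total = sum(dist.values())
    assert abs(total - 1.0) < 1e-9, (serve, total)

print(len(receive_given_serve) * len(receive_given_serve["ST"]),
      "conditional probabilities to estimate at this level alone")
```

Getting trustworthy numbers for every cell of even this one table requires charting a lot of serves of each type, which previews the data-collection burden totaled up below.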

The action now shifts to a decision by the setter. Out-of-system plays and non-setters setting have been left out for the sake of brevity. Even with that simplification, an adequate amount of data to calculate these probabilities is difficult to accumulate. The setter and/or the coach will decide on the target of the set, conditional on the quality of the pass the setter must work with. Buried in these conditional probabilities are the training and implicit biases that the setter has, rules such as: only set middles on a 3 pass, only set outsides on a 2 pass, or only set backrow attackers on a 1 pass. Whatever the prescribed solutions to the passing action, they are embedded in the conditional probabilities just as they are ingrained in the setter's decision-making system. There are 15 conditional probabilities concatenated upon the conditional probabilities of the numbered passes, which are themselves the result of the receiving team's reaction to the original five serving choices.

Once the setter has made their decision, the next level of results and associated conditional probabilities are given above. There are five results; one of them, the setter error, results in an opponent point. A setter error can be a mishandled ball, an attack error if the setter decides to attack, or an errant set. That leaves four possible results: a good set, an attackable set, a set the hitter needs to adjust to, or a down ball or free ball the hitter must send over. There are now 16 conditional probabilities based on whom the setter decides to set.


The next level of action comes from the hitter's decision. There are seven results. I modelled this level while avoiding individualizing each of the hitters, by creating an "average" hitter that smears together the statistics of all the hitters on the roster, a practice which, as I have warned, can lead to errant decisions (Wung 2021).

One of the results is the hitter error, which results in an opponent point. Six conditional probabilities are left to carry on to the other side of the net.



The next action is the cumulative effect of the opponent's defense, combining the reactions of the blockers and backrow defenders. There are three results: a stuff block, which results in the opponent getting a point; a kill, where the attacking team gains a point; and the dug ball, where all the non-terminating possibilities are combined: ricochets off the block that turn into a dug ball, or any combination of actions resulting in the opponent mounting a counterattack. This third option reverses the flow of the gameplay; the opposing team now becomes the offense, and a new Markov chain flowchart needs to be created. There are 12 possible conditional probabilities.

The Point

There are 71 conditional probabilities that need to be accumulated just for this Markov chain simulation of the serve-receive game action. There are four decision points in the model, though there could be many more if more granularity were pursued.

The point of this exercise is:

·       If we desire to predict the outcome of a set, or even something as basic as a point, a massive amount of statistics needs to be built up to create the conditional probability database, especially since the law of large numbers tells us that we need many individual data points if we wish to have accurate conditional probabilities. To simulate the length of matches, however, these probabilities do not have to be extremely accurate; that application of the Markov chain is not meant to accurately predict results but to predict the amount of time it takes to play a set or a match, and extreme accuracy and precision do not matter there.

·       More interesting is seeing just how many conditional probabilities affect the decision making at each decision point. Reflecting on this, the immediate reaction is: of course, the swing depends on the quality of the set, which depends on the quality of the pass, which depends on the quality of the passer and the serve, and so on. But it is sobering to realize how many probabilities feed into one decision, and that the decision maker performs that calculation instantaneously.

·       Many of the conditional probabilities can be eliminated from consideration because of human bias and the strategic implications of the action, which does pare down the list; but even with a pared-down list, the number of probabilities, conditional or otherwise, is very large.
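To make the chain concrete, here is a deliberately pared-down Monte Carlo sketch of the serve-receive sequence. The stages and every probability are invented and far coarser than the flowchart above (the counterattack branch is truncated), but each stage conditions only on the previous outcome, which is the Markov property at work:

```python
import random

# Minimal serve-receive chain with invented probabilities. Each stage is a
# distribution conditioned only on the previous outcome.
chain = {
    "serve":     {"ace": 0.07, "error": 0.10, "in_play": 0.83},
    "in_play":   {"good_pass": 0.55, "poor_pass": 0.45},
    "good_pass": {"kill": 0.50, "attack_error": 0.12, "dug": 0.38},
    "poor_pass": {"kill": 0.30, "attack_error": 0.18, "dug": 0.52},
}

def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} distribution."""
    r, cum = rng.random(), 0.0
    for outcome, p in dist.items():
        cum += p
        if r < cum:
            return outcome
    return outcome  # guard against floating-point round-off

def serve_receive_point(rng):
    """Return 'serving' or 'receiving' depending on who wins the point."""
    s = sample(chain["serve"], rng)
    if s == "ace":
        return "serving"
    if s == "error":
        return "receiving"
    pass_quality = sample(chain["in_play"], rng)
    attack = sample(chain[pass_quality], rng)
    if attack == "kill":
        return "receiving"
    if attack == "attack_error":
        return "serving"
    # A dug ball would start the counterattack flowchart; that chain is
    # truncated here, so the rally is (arbitrarily) awarded to the defense.
    return "serving"

rng = random.Random(7)
n = 20_000
wins = sum(serve_receive_point(rng) == "receiving" for _ in range(n))
print(f"receiving team wins ~{wins / n:.1%} of serve-receive points")
```

Even this caricature needs eleven probabilities; growing it to the fidelity of the flowchart above is exactly where the 71 conditional probabilities come from.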

As I look at my very rudimentary model, I think of all the uncertainties that are not modelled because I was trying to simplify the model, and how much each of them could have affected the outcome of the play. This is the problem of unmodelled dynamics in control systems, which does affect the predictive ability of the model. But then I am reminded of one of the seven deadly management diseases cited by Dr. W. Edwards Deming: management by use of only visible figures (Deming 1982). Dr. Deming's point is that there are many unknowns and many uncertainties in any endeavor that involves many humans making many very human decisions. His admonition is that it is foolhardy for decision makers to expect absolute accuracy from any system, because there are many things that are unmeasurable and many things that are unknowable. Once we realize this, we understand that even as uncertainties and randomness affect our daily decision making, our need to absolutely eliminate them from our daily lives, and from our daily sports, is misguided. To not recognize this fact and to willingly pour more time and resources into eliminating or minimizing uncertainties and randomness in statistics is a fool's errand. This is not to say that working on statistics will not give us more insight; we must always seek to learn more from the descriptive statistics that we have, creating statistical categories that help us understand WHAT our team is doing. But we must never adopt the mindset that our final goal is to eliminate uncertainties and randomness from our statistical ponderings; we need to understand our limitations.

Works Cited

Albert, Laura. "Volleyball analytics: Modeling volleyball using Markov chains." Slideshare.net. October 26, 2018. https://www.slideshare.net/lamclay/volleyball-analytics-modeling-volleyball-using-markov-chains (accessed March 19, 2021).

Deming, W. Edwards. Out of the Crisis. Cambridge, MA: The MIT Press, 1982.

Wung, Peter. "Stats For Spikes-Use of Statistics as Goals." Musings and Ruminations. March 6, 2021. https://polymathtobe.blogspot.com/2021/03/stats-for-spikes-use-of-statistics-as.html (accessed March 6, 2021).

Saturday, March 6, 2021

Stats For Spikes-Use of Statistics as Goals

In this era of Moneyball, almost all sports are delving into how to coax wisdom from the numbers naturally generated by taking statistics during games. The USA National Team has been active in this discovery process: there are coaches integrated into the coaching staff who are specifically dedicated to the creation, calculation, and analysis of meaningful statistics based on basic playing statistics. They engage in descriptive statistics, capturing the details of game action, what happened with each act of playing the ball, by assigning values to each action on the ball. This is particularly important in the fast-paced, continuously changing, and competitive environment of international volleyball. The statistics staff continuously keep the coaches apprised of the game action as seen through the filter of statistics, cutting through the cloud of human biases and perceptions.

Those of us who reside in the less rarified air of high school and club volleyball are also interested in using statistics for our purposes. We cannot possibly accrue that level of descriptive statistics in our matches because of our lack of resources, both human and technical, so we sometimes take the statistics we do have and apply inferential statistics to help us decide how to plan our training, as well as to measure our team's progress throughout a season. If we wish to measure improvement, we first need to measure our base level of performance, whether for individual players and individual skills or for team performance during match play. Regardless of the parameters of the performance measures, we need to take those measurements before and after making any changes so that we can compare.

What is not a given is the vast difference between descriptive statistics and inferential statistics. Inferential statistics is based on assumptions made about the processes under measurement: whether they all operate under the same conditions, whether the processes are under statistical control, and whether the measurement process is repeatable and reproducible.

We see the same need for measurement and improvement in other sports, and in any other human endeavor that wishes to transform observations into corrective action. Statistical Process Control (SPC), and especially Six Sigma processes, have become ubiquitous in our vocabulary. Indeed, using statistical measures is the key to creating consistent manufacturing processes, minimizing process errors, and increasing process throughput. Unfortunately, there are critical differences between the manufacturing environment and the sporting environment. In the manufacturing environment, the variability of the machines is measurably minimal because the machines are inanimate and, by and large, controllable. This is not to say that it is easy to control those variables; the controllability problem in manufacturing can be difficult because the threshold of error is small and the required signal-to-noise ratio is large.

In the sporting environment, human actions and responses can be random in the extreme, which drives the uncertainty in the sporting process. To make matters worse, the uncertainties associated with each individual are coupled, so that the impact of one person's randomness is not limited to that person's actions but affects every other person taking part: every player on both teams, the officials, the coaching staff, and so on all contribute to the aggregation of uncertainties in all statistical measures. The coupling effect may be miniscule, so that much of the coupling can be ignored, but not all couplings can be dismissed so easily. This is true of the instantaneous descriptive statistics taken during matches as well, but the averaging in descriptive statistics is minimal compared to the accrual of the larger statistics that are used to draw inferences. For example, a good server influences not only the passer but also the setter and the hitter; the interaction can have secondary and tertiary effects on how the serving team plays as it reacts to the actions of the passing team. Each action in volleyball, as in most sports, depends on prior actions.

So why talk about this? Because there are many coaches who ask the following form of question: "I have a [name a level and age] team; what statistical threshold should my team be performing at when we are performing [name a volleyball action]?"

The intent of the question is clear: the coach is trying to determine a reference level of performance against which to compare what they can measure of their own team. The question is a loaded one. Since sports actions are dependent upon prior actions, there is no way to separate and isolate a specific game action from all that led up to it; the statistics taken are conditional upon the prior actions. But the measures we take are single-dimensioned, so they never truly reflect the deep coupling of the actions.

To further compound the uncertainties, many assumptions are tacitly made. The usual practice in statistics is to take many different sets of one kind of data and aggregate them into one representative set by averaging the datasets together. Averaging, as with all things, has its advantages and disadvantages. The advantage is that many datasets can be combined into a uniform and representative set of data that gives the user a good idea of the general trend for specific variables: how well the team is performing and how well each player is performing in each of the measured skills. Rather than diving through massive amounts of data, the aggregate data is used; the assumption is that the aggregation is an accurate representation of your team. Which brings us to the disadvantage of taking averages. When an average is taken, the salient contributing factors are smeared; that is, the highs and lows of all the datasets are nullified in deference to the average. The result is that we have erased the unique contributions and variations of each individual opponent as well as of the team of interest. In essence, averaging your own team's statistics creates a fictional "average" representation of your team. More subtly, the statistics generated are presumed to be against yet another fictional "average" team, that of the opponents. This is problematic because the opposing team's actions are what elicit the response from your team, so the weaknesses inherent in your players, and in your team in aggregate, are disguised by the average of the opponents, which negates any insight you might gain about your team and what you need to correct in training. Another issue is that by pitting an "average" representation of your team against an "average" opponent, you are obscuring the specifics of how your team plays: problem rotations that you may have against a good team or a good player. You are also erasing the problems that you may have in certain situations, like passing in the seams or hitting line. You are averaging out your best player's statistics along with your worst player's, so you are unable to identify the problem area.
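A tiny numerical illustration of this smearing effect, with invented per-rotation sideout percentages:

```python
# Hypothetical sideout percentage by rotation for one team. The aggregate
# "average" looks healthy, but rotation 4 is a real problem that the
# single averaged number completely hides.
sideout_pct = {1: 0.68, 2: 0.65, 3: 0.62, 4: 0.38, 5: 0.66, 6: 0.63}

average = sum(sideout_pct.values()) / len(sideout_pct)
worst_rotation = min(sideout_pct, key=sideout_pct.get)

print(f"team 'average' sideout: {average:.1%}")
print(f"worst rotation: {worst_rotation} at {sideout_pct[worst_rotation]:.1%}")
```

The team average of roughly 60% suggests nothing is wrong, yet any opponent who charts by rotation will serve at rotation 4 all night; that is the insight the average erases.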

Note that all of the aforementioned situational information is easily available from the descriptive statistics taken during the game. It is when we try to infer our team's future performance by comparing our team's general "average" performance against the performance of our general "average" opponent that the inferential value of the exercise disappears.

Another, more subtle, logical fallacy is also tacitly made in addition to the averaging problem.

“When a measure becomes a target, it ceases to be a good measure.”

Goodhart’s Law as articulated by Marilyn Strathern, Anthropologist

What does the above statement mean?

It means that we selectively choose meaningful measures to help us determine the truth of what we experience, observing specific variables that will validate or refute the pictures in our minds. The measurement should be performed unobtrusively so as not to affect the outcome of what we are trying to observe. But if we take a shortcut in our observations by making reality conform to what we perceive we need to observe, i.e., if we make the team aim at the expected measures as targets, we skew our players' minds to perform according to the artificial horizons set by the measure/target rather than toward what we hope to achieve: maximizing performance across all of the variables and, more importantly, winning. A good statistical lesson to remember is that correlation does not equal causation. Just because two sets of data correlate does not mean that one result follows from the other.

Using fictitious volleyball truisms as targets for a team can actually hurt the team's chances. I was once a firm believer in many truisms: the ace-to-error ratio must be at least one if you want to succeed; teams passing an average of 2.4 on a 3-point scale will always win; set distributions, in order, must go the most sets to the left side, followed by the middle, the right side, and finally the backrow.

Once again, the problem with the truisms is that correlation does not equal causation. When coaches practice and train using artificially determined goals as the target, the measures stop being clues to the secret of team performance; they become the target and the end goal. People will focus on the target and work towards achieving it, ignoring the fact that the purpose of playing the game is to be the winner when the last ball drops. When players get preoccupied with the artificial horizons set by the coaching staff, they treat winning and losing, and their overall game performance, as secondary. All coaches have stories about how their teams did everything perfectly according to the statistics and still lost, and vice versa. Statistics should not be the goal; they should be a way to augment the picture everyone has of the reality they are experiencing.

A Digression.

The use of averaging happens in real life, and it is ubiquitous, even in the way we assess players in general. The avcaVPI measure was instituted ostensibly to help college-bound athletes determine whether they can play in college, as well as to help college coaches find players to recruit based on physical measures. The idea is to use the avcaVPI score to give players and coaches an idea of how a player would fit into given college divisions and programs by comparing their physical attributes, as measured in non-competitive environments, to those of players already playing in college. When the initial VPI measure came out, I remember that it was simply a single score aggregating the various physical measures taken at the testing sites. The initial criticism was that players were not compared against players playing their positions; to the AVCA's credit, it looks like they have corrected that oversight, although the avcaVPI data does not seem to be further segregated into NCAA, NJCAA, or NAIA divisions (I could be wrong). The avcaVPI scores are now categorized and ranked by position, with the percentile where each player fits in each test category compared to players already in college. This is much more useful than before, but it is still misleading. What is left unsaid, again, is that correlation does not equal causation: a player's physical measures falling within the percentile ranks of existing college players does not mean that the player is going to get recruited to play in college.

Talent evaluation is a very tricky and uncertain process; just ask any NFL team about their ability to identify a quality quarterback, and then point out that Tom Brady was drafted by the New England Patriots in the sixth round of the 2000 NFL Draft, 199th overall, the seventh quarterback taken.

The avcaVPI really does very little to clear up the collegiate volleyball recruiting picture for all involved.