Sunday, March 21, 2021

Stats for Spikes-Markov Chains

People have used the Markov chain to model volleyball for a while now. The presentation by Albert (Albert 2018) shows that Markov chains were used to:

·       Determine how long a game will last under each scheme, rally scoring or sideout scoring.

·       Determine the probability that a team wins in each scheme, rally scoring or sideout scoring.

·       Determine the value of serving first in sideout scoring.

The Albert presentation conceptually shows how the volleyball probability tree is built using Markov chains. She demonstrates that Markov chains are useful in simulating the flow of a volleyball match, using probability distribution functions to model the uncertainties in a match so that the length of matches can be estimated accurately. If I recall correctly, Markov chains were used to help the FIVB model the differences between sideout and rally scoring when it was considering the change in scoring.

This presentation piqued my interest in the Markov chain. I became curious about using Markov chains to model the number of decision points in a match, how those decisions depend on the probabilities that describe each action, and how many different probabilities are necessary to improve the accuracy of the simulations.

What I have done as an exercise is to model just the sequence of actions in a serve receive situation. Separate probability trees and flowcharts can be generated for other situations: my team serving, and continuous rallies. The flowchart for my team serving would be identical to the one I created for my team receiving, but with the roles and point winner reversed. A separate flowchart would also be needed for when the team on defense reverts to offense for a counterattack; the probabilities used in that flowchart might differ slightly because the counterattack may come from a more chaotic set of conditions. This is just a partial deep dive into the flow of the game and the possible implications of the uncertainties coming from twelve individuals playing with a net. I never intended to build a simulation based on the Markov chain; I leave that to others.

I used flowcharting diagrams to map out the Markov chains; any errors in the assumptions and in the flowcharting are entirely mine, as I created the flowchart for my own edification. This is not a traditional way of representing Markov chains, but it made sense to me when I started looking into them. It also demonstrates the complexity that comes from the geometric concatenation of each succeeding action as the actions accumulate.

Purpose

There are two purposes:

·       Examine the number of decision points and all the probabilities that feed into each decision.

·       Count the number of probabilities that are necessary for just one rally.

Probability

A simple definition of a probability is: the number of instances that an event A happens out of a total of N attempts or opportunities where A could have happened, as N grows toward infinity. Pr(A) = (occurrences of A)/N.

The reason that I let the number N go to infinity is to show that the law of large numbers is at work: it is best to get as many samples as possible so that the calculated probability is as representative as possible of how often event A actually happens.

Note that the probability is NOT a prediction; it simply gives the user a sense of the chances that A can happen. This also means that Not(A) can happen as well, with probability Pr(Not A) = 1 - Pr(A). Either one can occur; this is a critical concept to absorb.
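As a quick illustration of why N matters, here is a minimal Python sketch using an assumed, made-up underlying rate; it shows the estimated Pr(A) settling toward that rate as the number of attempts grows, which is the law of large numbers at work:

```python
# Minimal sketch (hypothetical numbers): estimating Pr(A) from repeated trials.
# As the number of attempts N grows, the running estimate settles toward the
# assumed underlying rate.
import random

TRUE_RATE = 0.62          # assumed "true" chance that event A (say, a good pass) happens
random.seed(1)

for n in (100, 1_000, 10_000, 100_000):
    occurrences = sum(random.random() < TRUE_RATE for _ in range(n))
    pr_a = occurrences / n                      # Pr(A) = occurrences of A / N
    print(f"N = {n:>7,}  Pr(A) ~ {pr_a:.3f}  Pr(not A) ~ {1 - pr_a:.3f}")
```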

Conditional Probabilities

I have used conditional probabilities to gain some granularity and to show the dependence of each action's outcome on the immediately preceding action. Markov chains model a specific event that is composed of the complex interactions of many previous events. Whether we like it or not, even as the play moves further away from the initial point of contact, the serve, each level of action in the continued play is still historically dependent on that first contact, although the effect decreases dramatically as the play evolves away from it. The conditional probability is the memory hardwired into the computations as the play flows away from each past action: because the effect of each previous action is already contained in the conditional probabilities, it does not have to be explicitly reiterated with every step.

The notation Pr(A|B) is read as the probability of event A being true given that event B is known to be true. In other words, the probability of A depends on whether B has occurred. This is how each level of action is linked to the previous level of action.

The probability of each end result, whether it is a point for the serving team or for the receiving team, can be calculated by following each action through the flowchart and multiplying together the conditional probabilities at each level of action until an end point is reached. This will become obvious as the process is explained.
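As a small illustration of that multiplication, here is a minimal Python sketch for one path through the flowchart; the action labels and every number in it are made up for the example, not measured values:

```python
# Minimal sketch (made-up numbers): the probability of one complete path through
# the flowchart is the product of the conditional probabilities at each level of
# action (serve -> pass -> set choice -> set quality -> swing result).
path = [
    ("jump float serve chosen",          0.30),   # Pr(serve type)
    ("3-pass given that serve",          0.40),   # Pr(pass | serve)
    ("set to the outside given 3-pass",  0.50),   # Pr(set choice | pass)
    ("attackable set given that choice", 0.60),   # Pr(set quality | set choice)
    ("kill given that set",              0.45),   # Pr(kill | set quality)
]

pr_path = 1.0
for action, pr in path:
    pr_path *= pr
print(f"Pr(this exact rally unfolds) ~ {pr_path:.4f}")   # ~0.0162 with these numbers
```

Each branch of the flowchart gets its own product like this, and the products over all branches that end in the same outcome add up to the probability of that outcome.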

Starting

The bubbles on the right denote the actions. The red oval indicates a point for the opponent. The blue oval indicates a point for us. The purple diamond indicates a decision point. The black parallelogram indicates an action and the kind of conditional probabilities associated with it. The green parallelogram indicates a transition to another phase of the game, which follows another flowchart.

The first decision is which serve to execute; that decision is made by either the server or the coach of the serving team. Left unmodelled is the decision process in the server's and the coach's minds: which passers to target, which zones to attack with the serve, and so on. Those considerations are left out for simplicity and brevity, mainly because this is not a rigorous exposition of every single consideration that goes into a decision. The probability of success of the chosen serve is based on the success each individual server has with each kind of serve, provided enough data has been collected on the individual server to get a good probability distribution function. For this initial action, the five probabilities listed must sum to 1: Pr(ST)+Pr(SF)+Pr(JT)+Pr(JF)+Pr(S)=1
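A minimal sketch of that constraint, with illustrative numbers of my own choosing (the serve labels simply follow the equation above; the values are not real data):

```python
# Minimal sketch (illustrative numbers): the five serve choices for one server
# exhaust the possibilities, so their probabilities must sum to 1.
serve_choice = {
    "ST": 0.15,   # labels follow the Pr(ST)+Pr(SF)+Pr(JT)+Pr(JF)+Pr(S)=1 formula above
    "SF": 0.20,
    "JT": 0.25,
    "JF": 0.30,
    "S":  0.10,
}

assert abs(sum(serve_choice.values()) - 1.0) < 1e-9, "serve probabilities must sum to 1"
```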


The next level of action involves the passing team's response. I listed five possibilities: a shank, passes graded 1 through 3, and a service error. The action ends with an opponent (serving team) point if the pass is shanked or the passer is aced, and with a receiving team point if the server commits a service error, whereas the action continues with any numbered passing value. Note that there are five conditional probabilities associated with each serve choice, so there are 25 conditional probabilities that need to be collected.
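To make the bookkeeping concrete, here is a small sketch with made-up numbers: one row of five conditional probabilities for each serve choice, 25 values in all, with each row summing to 1 because the five receiving outcomes exhaust the possibilities for that serve:

```python
# Minimal sketch (made-up numbers): Pr(pass outcome | serve type), the 25 values
# the post counts for this level. Each row covers all outcomes of one serve type.
outcomes = ["shank/ace", "1-pass", "2-pass", "3-pass", "service error"]
pr_pass_given_serve = {
    "ST": [0.10, 0.20, 0.30, 0.30, 0.10],
    "SF": [0.08, 0.22, 0.30, 0.32, 0.08],
    "JT": [0.15, 0.25, 0.28, 0.20, 0.12],
    "JF": [0.12, 0.24, 0.30, 0.26, 0.08],
    "S":  [0.05, 0.15, 0.30, 0.45, 0.05],
}

for serve, row in pr_pass_given_serve.items():
    assert abs(sum(row) - 1.0) < 1e-9, f"outcomes of a {serve} serve must sum to 1"
print(f"{len(pr_pass_given_serve) * len(outcomes)} conditional probabilities at this level")  # 25
```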

The action now shifts to a decision by the setter. Out-of-system plays and non-setters setting have been left out for the sake of brevity. Even with that simplification, an adequate amount of data to calculate these probabilities is difficult to accumulate. The setter and/or the coach decide on the target of the set, conditional on the quality of the pass that the setter must work with. Buried in these conditional probabilities are the training and the implicit biases of the setter, rules such as: only set middles on a 3 pass, only set outsides on a 2 pass, or only set back row attackers on a 1 pass. Whatever the prescribed solutions to the passing action, they are embedded in the conditional probabilities just as they are ingrained in the setter's decision-making system. There are 15 conditional probabilities concatenated upon the 15 conditional probabilities of the numbered passes, which are themselves the result of the receiving team's reaction to the original five serving choices.

Once the setter has made their decision, the next level of results and their associated conditional probabilities are given above. There are five results. One of them, the setter error, results in an opponent point; a setter error can be a mishandled ball, an attack error if the setter decides to attack, or an errant set. That leaves four possible results: a good set, an attackable set, a set the hitter needs to adjust to, or a down ball or free ball the hitter must send over. There are now 16 conditional probabilities based on whom the setter decides to set.


The next level of action comes from the hitter's decision. There are seven results. I modelled this level while avoiding individualizing each of the hitters, by creating an "average" hitter from a smear of the statistics of all the hitters on the roster, a practice which I have warned can lead to errant decisions (Wung 2021).

One of the results is the hitter error, which results in an opponent point. Six  conditional probabilities are left to carry on to the other side of the net.


 

The next action is the cumulative effect of the opponent's defense, combining the reactions of the blockers and the backrow defenders. There are three results: a stuff block, which gives the opponents a point; a kill, where the attacking team gains a point; and the dug ball, which combines all the non-terminating possibilities: ricochets off the block that turn into a dug ball, or any combination of actions resulting in the opponent mounting a counterattack. This third option reverses the flow of the gameplay; the opponent team now becomes the offense, and a new Markov chain flowchart needs to be created. There are 12 possible conditional probabilities.

The Point

There are 71 conditional probabilities that need to be accumulated just for this Markov chain simulation of the serve receive game action. There are four decision points in the model, even though there could be many more if more granularity were pursued.

The point of this exercise is:

·       If we desire to predict the outcome of a set, or even something as basic as a point, a massive amount of statistics needs to be built up to create the conditional probability database, especially since the law of large numbers tells us that we need many individual data points if we wish to have accurate conditional probabilities. To simulate the length of matches, however, these probabilities do not have to be extremely accurate; the application of the Markov chain there is not to predict results accurately but to predict the amount of time it takes to play a set or a match, and accuracy and precision matter far less for that purpose (a minimal sketch of such a length-of-set estimate follows this list).

·       More interesting is seeing just how many conditional probabilities feed into the decision making at each decision point. Reflecting on this, the immediate reaction is: of course, the swing depends on the quality of the set, which depends on the quality of the pass, which depends on the quality of the passer and the serve, and so on. But it is sobering to realize how many probabilities feed into one decision, and that the decision maker performs that calculation instantaneously.

·       Many of the conditional probabilities can be eliminated from consideration because of human bias and the strategic implications of the action, which does pare down the list of probabilities; but even with a pared-down list, the number of probabilities, conditional or otherwise, is very large.
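Here is the length-of-set sketch promised above. It is deliberately crude: it assumes a single made-up probability that the serving team wins any given rally and plays rally scoring to 25, win by two, just to show that a rough estimate of set length does not need the full conditional-probability chain:

```python
# Minimal sketch (illustrative number): estimating how many rallies a rally-scoring
# set to 25 takes, using only one coarse probability, Pr(serving team wins the rally).
import random

PR_SERVING_TEAM_WINS = 0.45   # assumed, sideout-heavy rally scoring
random.seed(2)

def rallies_in_one_set():
    score, serving = [0, 0], 0
    # Play until one side reaches 25 with at least a two-point lead.
    while max(score) < 25 or abs(score[0] - score[1]) < 2:
        winner = serving if random.random() < PR_SERVING_TEAM_WINS else 1 - serving
        score[winner] += 1
        serving = winner            # rally scoring: the rally winner serves next
    return sum(score)               # total rallies = total points played

sets = [rallies_in_one_set() for _ in range(10_000)]
print(f"average rallies per set ~ {sum(sets) / len(sets):.1f}")
```

Swapping in even a roughly right rally-winning probability is enough to answer "how long does a set take," which is the kind of question the Albert presentation describes.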

As I look at my very rudimentary model, I think of all the uncertainties that went unmodelled as I simplified it, and how much each of them could have affected the outcome of the play. This is the problem of unmodelled dynamics in control systems, and it does affect the predictive ability of the model. But then I am reminded of one of the seven deadly management diseases cited by Dr. W. Edwards Deming: management by use of visible figures only (Deming 1982). Dr. Deming's point is that there are many unknowns and many uncertainties in any endeavor that involves many humans making many very human decisions. His admonition is that it is foolhardy for decision makers to expect absolute accuracy from any system, because many things are unmeasurable and many things are unknowable.

Once we realize this, we understand that even as uncertainties and randomness affect our daily decision making, our need to absolutely eliminate uncertainties and randomness from our daily lives, and from our daily sports, is misguided. To ignore this fact and willingly pour more time and resources into eliminating or minimizing uncertainty and randomness in statistics is a fool's errand. This is not to say that working on statistics will not give us more insight; we must always seek to learn more from the descriptive statistics we have, creating statistical categories that help us understand WHAT our team is doing. But we must never adopt the mindset that our final goal is to eliminate uncertainty and randomness from our statistical ponderings; we need to understand our limitations.

Works Cited

Albert, Laura. "Volleyball analytics: Modeling volleyball using Markov chains." Slideshare.net. October 26, 2018. https://www.slideshare.net/lamclay/volleyball-analytics-modeling-volleyball-using-markov-chains (accessed March 19, 2021).

Deming, W. Edwards. Out of the Crisis. Cambridge, MA: The MIT Press, 1982.

Wung, Peter. "Stats For Spikes-Use of Statistics as Goals." Musings and Ruminations. March 6, 2021. https://polymathtobe.blogspot.com/2021/03/stats-for-spikes-use-of-statistics-as.html (accessed March 6, 2021).

 






Saturday, March 6, 2021

Stats For Spikes-Use of Statistics as Goals

In this era of Moneyball, almost all sports are delving into how to coax wisdom from the numbers naturally generated by taking statistics on the game. The USA National Team has been active in this discovery process: there are coaches integrated into the coaching staff who are specifically dedicated to the creation, calculation, and analysis of meaningful statistics based on basic playing statistics. They engage in descriptive statistics, the act of capturing the details of game action, what happened with each act of playing the ball, by assigning values to each action on the ball. This is particularly important in the fast-paced, continuously changing, and competitive environment of international volleyball. The statistics staff continuously keep the coaches apprised of the game action as seen through the filter of statistics, to cut through the cloud of human biases and perceptions.

Those of us who reside in the less rarified air of high school and club volleyball are also interested in using statistics for our own purposes. We cannot possibly accrue that level of descriptive statistics in our matches because of our lack of resources, both human and technical, so we sometimes take the statistics that we do have and try to use inferential statistics to help us decide how to plan our training, as well as to measure our team's progress throughout a season. If we wish to measure improvement, we first need to measure our base level of performance, whether for individual players and individual skills or for team performance during match play. Regardless of the parameters of the performance measures, we need to take those measurements before and after making any changes so that we can compare.

What is not a given is the vast difference between descriptive statistics and inferential statistics. Inferential statistics rest on assumptions made about the processes under measurement: whether they all occur under the same conditions, whether the processes are under statistical control, and whether the measurement process is repeatable and reproducible.

We see the same need for measurement and improvement in other sports, and in any human endeavor outside of sports that wishes to transform observations into corrective action. Statistical Process Control (SPC), and especially Six Sigma processes, have become ubiquitous in our vocabulary. Indeed, statistical measures are the key to creating consistent manufacturing processes, minimizing process errors, and increasing process throughput. Unfortunately, there are critical differences between the manufacturing environment and the sporting environment. In the manufacturing environment, the variability of the machines is measurably minimal because the machines are inanimate and, by and large, controllable. This is not to say that it is easy to control those variables; the control problem in manufacturing can be difficult because the threshold of error is small and the required signal-to-noise ratio is large.

In the sporting environment, human actions and responses can be random in the extreme, which drives the uncertainty in the sporting process. To make matters worse, the uncertainties associated with each individual are coupled, so that the impact of one person's randomness is not limited to that person's actions but affects every other participant: every player on both teams, the officials, the coaching staff, and so on all contribute to the aggregation of uncertainties in every statistical measure. The coupling effect may be minuscule, so that much of the coupling can be ignored, but not all couplings can be easily ignored. This is true of the instantaneous descriptive statistics taken during matches as well, but the averaging in descriptive statistics is minimal compared with the accrual of the larger statistics used to draw inferences. For example, a good server influences not only the passer but also the setter and the hitter; the interaction can have secondary and tertiary effects on how the serving team plays as it reacts to the actions of the passing team. Each action in volleyball, as in most sports, depends on prior actions.

So why talk about this? Because many coaches ask the following form of question: “I have a [name a level and age] team; what statistical threshold should my team be performing at when we are performing [name a volleyball action]?”

The intent of the question is clear: the coach is trying to establish a reference level of performance against which to compare what they can measure of their own team. But the question is a loaded one. Since actions in sports depend on prior actions, there is no way to separate and isolate a specific game action from everything that led up to it; the statistics taken are conditional on the prior actions, yet the measures we take are single-dimensioned, and they never truly reflect the deep coupling of the actions.

To further compound the uncertainties, many assumptions are tacitly made. The usual practice in statistics is to take many different sets of one kind of data and aggregate them into one representative set by averaging the datasets together. Averaging, as with all things, has its advantages and disadvantages. The advantage is that many datasets can be combined into a uniform, representative set that gives the user a good idea of the general trend for specific variables: how well the team is performing and how well each player is performing in each of the measured skills. Rather than diving through massive amounts of data, the aggregate data is used. The assumption is that the aggregation is an accurate representation of your team. Which brings us to the disadvantage of taking averages. When an average is taken, the salient contributing factors are smeared; that is, the highs and lows of all the datasets are nullified in deference to the average. The result is that we have erased the unique contributions and variations of individual opponents as well as of the team of interest. In essence, averaging your own team's statistics creates a fictional “average” representation of your team. More subtly, the statistics generated are presumed to be against yet another fictional “average” team, that of the opponents. This is problematic because the opposing team's actions are what elicit the response from your team, so the weaknesses inherent to your players and your team in aggregate are disguised by the average of the opponents, which negates any insight you might gain about your team and what you would need to correct in training. Another issue is that by using an “average” representation of your team against the “average” opponent, you are obscuring the specifics of how your team plays: problem rotations that you may have against a good team or a good player. You are also erasing the problems that you may have in certain situations, like passing in the seams or hitting line. You are averaging your best player's statistics together with your worst player's, so you are unable to identify the problem areas.
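A tiny, made-up example of the smearing: three hypothetical passers averaged into one team number. The team average looks respectable while hiding exactly the passer an opponent would serve at.

```python
# Minimal sketch (made-up numbers): averaging passer ratings into one "team average"
# hides the weak passer that a targeted serving strategy would actually exploit.
passer_ratings = {"P1": 2.6, "P2": 2.4, "P3": 1.6}   # hypothetical 3-point-scale averages

team_average = sum(passer_ratings.values()) / len(passer_ratings)
print(f"team average pass rating: {team_average:.2f}")        # 2.20, looks respectable
print(f"weakest passer: {min(passer_ratings, key=passer_ratings.get)} "
      f"at {min(passer_ratings.values()):.1f}")               # the problem the average hides
```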

Note that all of the aforementioned situational information is readily available from the descriptive statistics taken during the game. It is when we try to infer our team's future performance by comparing our team's general “average” performance against the performance of a general “average” opponent that the inferential value of the exercise disappears.

Another, more subtle, logical fallacy is also made in addition to the averaging problem.

“When a measure becomes a target, it ceases to be a good measure.”

Goodhart’s Law as articulated by Marilyn Strathern, Anthropologist

What does the above statement mean?

It means that we selectively choose meaningful measures to help us determine the truth of what we experience, by observing specific variables that will validate or refute the pictures in our minds. The measurement should be performed unobtrusively so as not to affect the outcome of what we are trying to observe. But if we take a shortcut in our observations by making reality conform to what we perceive we need to observe, that is, if we make the team aim at the expected measures as targets, we skew our players' minds to perform according to the artificial horizons set by the measure/target rather than toward what we actually hope to achieve: maximizing performance over all of the variables and, more importantly, winning. A good statistical lesson to remember is that correlation does not equal causation: just because two sets of data correlate does not mean that one result follows from the other.

Using fictitious volleyball truisms as targets for a team can actually hurt the team's chances. I was once a firm believer in many truisms: the ace-to-error ratio must be at least one if you want to succeed; teams passing an average of 2.4 on a 3-point scale will always win; set distribution must go, in order, most sets to the left side, followed by the middle, the right side, and finally the backrow, in order to win.

Once again, the problem with the truisms is that correlation does not equal causation. When coaches practice and train using artificially determined goals as the target, the measures stop being clues to the secret of team performance; they become the target and the end goal. People will focus on the target, work toward achieving that goal, and ignore the fact that the purpose of playing the game is to be the winner when the last ball drops. When players are preoccupied with the artificial horizons set by the coaching staff, they make winning and losing, and their overall game performance, secondary. All coaches have stories about how their teams did everything perfectly according to the statistics and still lost, and vice versa. Statistics should not be the goal; they should be a way to augment the picture everyone has of the reality they are experiencing.

A Digression.

The use of averaging happens in real life, and it is ubiquitous, even in the way we assess players in general. The avcaVPI measure was instituted ostensibly to help the aspiring college athlete determine whether they can play in college, as well as to help college coaches find players to recruit based on physical measures. The idea is to use the avcaVPI score to give players and coaches an idea of how a player would fit into given college divisions and programs by comparing their physical attributes, measured in non-competitive environments, with those of players already playing in college. When the initial VPI measure came out, I remember that it was simply a single score aggregating the various physical measures taken at the testing sites. The initial criticism was that players were not compared against players playing their positions; to the AVCA's credit, it looks like they have corrected that oversight, although the avcaVPI data does not seem to be further segregated into NCAA, NJCAA, or NAIA divisions (although I could be wrong). The avcaVPI scores are now categorized and ranked according to position, along with the percentile where each player falls in each test category compared with players already playing in college, which is much more useful than before, but it is still misleading. What is left unsaid, again, is that correlation does not equal causation: just because a player's physical measures fall within the percentile range of existing college players does not mean that the player is going to be recruited to play in college.

Talent evaluation is a very tricky and uncertain process; just ask any NFL team about their ability to identify a quality quarterback, and then point out that Tom Brady was drafted by the New England Patriots in the sixth round of the 2000 NFL Draft, 199th overall, the seventh quarterback taken.

The avcaVPI really does very little to clear up the collegiate volleyball recruiting picture for all involved.