People have been using Markov chains to model volleyball for a while now. The presentation by Albert uses them to:
·
Determine how long a game will last under each
scheme, rally scoring or sideout scoring.
·
Determine the probability that a team wins in
each scheme, rally scoring or sideout scoring.
·
Determine the value of serving first in sideout scoring.
The Albert presentation conceptually shows how the volleyball probability tree is built using Markov chains. She demonstrates that Markov chains are useful for simulating the flow of a volleyball match, so that the length of matches can be accurately estimated, because they use probability distributions to model the uncertainties in a match. If I recall correctly, Markov chains were used to help the FIVB model the differences between sideout and rally scoring when it was considering changing the scoring system.
This presentation piqued my interest in Markov chains. I became curious about using them to model the number of decision points in a match, how those decisions depend on the probabilities that describe each action, and how many different probabilities are needed to improve the accuracy of the simulations.
What I have done as an exercise is to model just the sequence of actions in a serve-receive situation. Separate probability trees and flowcharts can be generated for the other situations: my team serving, and continuous rallies. The flowchart for my team serving would be identical to the one I created for my team receiving, but with the roles and point winner reversed. A separate flowchart would also be needed for when the team on defense transitions to offense for a counterattack; the probabilities used in that flowchart might be slightly different because the counterattack may come from a more chaotic set of conditions. This is just a partial deep dive into the flow of the game and the possible implications of the uncertainties that come from twelve individuals playing across a net. I never intended to build a simulation based on the Markov chain; I leave that to others.
I used flowchart diagrams to map out the Markov chains. I created this flowchart for my own edification, so any errors in the assumptions or in the flowcharting are entirely mine. This is not a traditional way of representing Markov chains, but it made sense to me when I started looking into them. It also shows how the complexity grows geometrically as each succeeding action is chained onto the ones before it.
Purpose
There are two purposes:
·
Examine the number of decision points and all the probabilities that feed into each decision.
·
Count the number of probabilities that are necessary for just one rally.
Probability
A simple definition of probability is the number of times an event A happens out of a total of N attempts or opportunities in which A could have happened:

$$\Pr(A) = \lim_{N \to \infty} \frac{N_A}{N}$$

where $N_A$ is the number of occurrences of A in those N opportunities.
I let N go to infinity to show that the law of large numbers is at work: it is best to collect as many samples as possible so that the calculated probability is as representative as possible of how often event A actually happens.
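To see the law of large numbers at work, here is a minimal Python sketch that simulates N opportunities for an event whose "true" probability is assumed to be 0.3 (a made-up value, not volleyball data) and reports the relative frequency $N_A / N$. The helper name `relative_frequency` is hypothetical. As N grows the estimate settles toward the assumed value, although any single finite run can still be off.

```python
import random


def relative_frequency(p_true: float, n_trials: int, seed: int = 1) -> float:
    """Estimate Pr(A) as N_A / N from n_trials simulated opportunities.

    p_true is a hypothetical "true" probability used only to drive the simulation.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    n_a = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return n_a / n_trials


if __name__ == "__main__":
    p_true = 0.3  # assumed value for illustration only
    for n in (10, 100, 1_000, 10_000, 100_000):
        estimate = relative_frequency(p_true, n)
        print(f"N = {n:>7}: N_A/N = {estimate:.4f}  (error {abs(estimate - p_true):.4f})")
```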
Note that a probability is NOT a prediction; it only gives the user a sense of the chances that A will happen. This also means that Not A can happen as well, with probability $\Pr(\text{Not } A) = 1 - \Pr(A)$. Either outcome can occur, and this is a critical concept to absorb.
Conditional Probabilities
I have used conditional probabilities to gain some granularity and to show how the outcome of each action depends on the immediately previous action. The Markov chains here model a specific event that is composed of many complex interactions among many previous events. Whether we like it or not, even as the play moves further away from the initial point of contact (the serve), each action level in the continued play still depends historically on that first contact, although the effect decreases dramatically as the play evolves away from it. The conditional probability is the memory that is hardwired into the computations as they flow away from each past action. Because the effect of each previous action is already contained in the conditional probabilities, it does not have to be restated explicitly at every step.
The expression $\Pr(A \mid B)$ is read as the probability of event A being true given that we know event B is true. In other words, it is the probability of A computed only over the cases in which B has already happened, $\Pr(A \mid B) = \Pr(A \cap B) / \Pr(B)$. This is how each level of action is linked to the previous level of action.
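A conditional probability such as Pr(good set | pass rated 3) can be estimated from tallies by counting only the rallies in which the conditioning event happened. The sketch below uses made-up tallies of (pass rating, set result) pairs; the two-way split into "good set" and "other set" and the function names are hypothetical simplifications, not the flowchart's categories.

```python
# Hypothetical tallies of (pass rating, set result) pairs; illustrative only.
observations = (
    [(3, "good set")] * 45 + [(3, "other set")] * 15
    + [(2, "good set")] * 20 + [(2, "other set")] * 30
    + [(1, "good set")] * 5 + [(1, "other set")] * 35
)


def conditional_probability(pairs, a, b):
    """Estimate Pr(A | B) = count(A and B) / count(B) from (B, A) pairs."""
    count_b = sum(1 for given, _ in pairs if given == b)
    if count_b == 0:
        raise ValueError(f"no observations of B = {b!r}")
    count_a_and_b = sum(1 for given, outcome in pairs if given == b and outcome == a)
    return count_a_and_b / count_b


def marginal_probability(pairs, a):
    """Estimate Pr(A) while ignoring the conditioning event."""
    return sum(1 for _, outcome in pairs if outcome == a) / len(pairs)


if __name__ == "__main__":
    for rating in (3, 2, 1):
        p = conditional_probability(observations, "good set", rating)
        print(f"Pr(good set | pass {rating}) = {p:.3f}")
    print(f"Pr(good set), ignoring the pass = {marginal_probability(observations, 'good set'):.3f}")
```

With these invented numbers, knowing the pass rating moves the chance of a good set anywhere from 0.125 to 0.75, while the unconditional figure of about 0.467 hides that dependence.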
The probability of each result, whether it is a point for the serving team or for the receiving team, can be calculated by following each action through the flowchart and multiplying together the conditional probabilities at each level of action until an end point is reached, then adding up the probabilities of all the paths that end in that same result. It will become obvious as the process is explained.
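Multiplying the conditional probabilities along one path is the chain rule of probability. Here is a minimal sketch with invented numbers for a single path (serve type, pass, set, attack, defense); the labels are simplified placeholders rather than the exact flowchart boxes, and the function computes the probability of that one path only.

```python
from math import prod

# One hypothetical path through a serve-receive chain; every number is made up.
path = [
    ("Pr(JF)", 0.40),                         # serve chosen: JF
    ("Pr(pass 3 | JF)", 0.35),                # pass rated 3 given that serve
    ("Pr(good set | pass 3)", 0.75),          # setter result given the pass
    ("Pr(attack in play | good set)", 0.90),  # hitter does not err
    ("Pr(kill | attack in play)", 0.45),      # opponent defense fails to stop it
]


def path_probability(steps):
    """Chain rule: multiply the conditional probabilities along one path."""
    return prod(p for _, p in steps)


if __name__ == "__main__":
    print(f"Pr(this path ends in a point for us) = {path_probability(path):.6f}")
```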
Starting
The bubbles on the right denote the actions. The red oval
indicates a point for the opponent. The blue oval indicates a point for us. The
purple diamond indicates a decision point. The black parallelogram indicates an
action and the kind of conditional probabilities that are associated with the
action. The green parallelogram indicates a transition to another phase of the
game which will follow another flowchart.
The first decision is which serve to execute; that decision is made by either the server or the coach of the serving team. What is left unmodelled is the decision process that goes on in the server's and the coach's minds: which passers to target, which zones to attack with the serve, and so on. Those decisions are left out for simplicity and brevity, mainly because this is not a rigorous exposition of every single consideration that goes into a decision. The probability of success of the chosen serve is based on the success each individual server has with each kind of serve, provided that enough data have been collected on that server to produce a good probability distribution. For this initial action, the five probabilities listed must sum to 1:

$$\Pr(ST) + \Pr(SF) + \Pr(JT) + \Pr(JF) + \Pr(S) = 1$$
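Estimating the five serve-choice probabilities as relative frequencies from one server's history satisfies the sum-to-1 constraint automatically. This sketch assumes we simply have counts of how often a server used each serve type; the counts are invented, the labels reuse the flowchart's abbreviations, and the helpers `to_probabilities` and `choose_serve` are hypothetical.

```python
import random

# Hypothetical counts of one server's serves by type; invented for illustration.
serve_counts = {"ST": 12, "SF": 30, "JT": 8, "JF": 45, "S": 5}


def to_probabilities(counts):
    """Turn raw counts into relative frequencies, which sum to 1 by construction."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def choose_serve(probabilities, rng):
    """Sample one serve type according to the estimated probabilities."""
    labels = list(probabilities)
    return rng.choices(labels, weights=[probabilities[k] for k in labels], k=1)[0]


if __name__ == "__main__":
    probs = to_probabilities(serve_counts)
    print(probs, "sum =", sum(probs.values()))
    print("sampled serve:", choose_serve(probs, random.Random(7)))
```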
The next level of action is the passing team's response. I listed five possibilities: a shank, passes rated 1 to 3, and a service error. The action ends with a point for the opponent (the serving team) if the pass is shanked or the passer is aced, and with a point for the receiving team if the server commits a service error; the action continues if the pass receives a numbered rating. Each serve type therefore has five conditional probabilities associated with it, one for each serve-receive outcome, so 5 × 5 = 25 conditional probabilities need to be collected.
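The 25 conditional probabilities at this level can be held as a 5 × 5 table whose rows are Pr(receive outcome | serve type). A minimal sketch, with every number invented: it checks that each row sums to 1 and then uses the law of total probability to get the overall chance of each receive outcome. The names `SERVE_PROBS`, `RECEIVE_GIVEN_SERVE`, `check_rows` and `total_probability` are hypothetical.

```python
# Hypothetical serve-choice probabilities and a 5 x 5 table of
# Pr(receive outcome | serve type); all numbers are invented.
SERVE_PROBS = {"ST": 0.12, "SF": 0.30, "JT": 0.08, "JF": 0.45, "S": 0.05}
OUTCOMES = ("shank", "pass 1", "pass 2", "pass 3", "service error")
RECEIVE_GIVEN_SERVE = {
    "ST": (0.05, 0.15, 0.35, 0.40, 0.05),
    "SF": (0.08, 0.20, 0.35, 0.30, 0.07),
    "JT": (0.12, 0.20, 0.28, 0.22, 0.18),
    "JF": (0.10, 0.22, 0.33, 0.25, 0.10),
    "S": (0.02, 0.10, 0.38, 0.48, 0.02),
}


def check_rows(table, tol=1e-9):
    """Each row is a conditional distribution, so it must sum to 1."""
    for serve, row in table.items():
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {serve} sums to {sum(row)}")


def total_probability(outcome):
    """Law of total probability: sum of Pr(serve) * Pr(outcome | serve)."""
    i = OUTCOMES.index(outcome)
    return sum(SERVE_PROBS[s] * row[i] for s, row in RECEIVE_GIVEN_SERVE.items())


if __name__ == "__main__":
    check_rows(RECEIVE_GIVEN_SERVE)
    n_entries = sum(len(row) for row in RECEIVE_GIVEN_SERVE.values())
    print("conditional probabilities to collect:", n_entries)
    for o in OUTCOMES:
        print(f"Pr({o}) = {total_probability(o):.4f}")
```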
Once the setter has made their decision, the next level of results and the associated conditional probabilities are given above. There are five results. One of them, the setter error, results in an opponent point; a setter error can be a mishandled ball, an attack error if the setter decides to attack, or an errant set. That leaves four possible results: a good set, an attackable set, a set the hitter needs to adjust to, or a set on which the hitter must hit a down ball or free ball over. There are now 16 conditional probabilities based on whom the setter decides to set.
The next level of action comes from the hitter's decision. There are seven results. I modelled this level without individualizing each hitter, creating an “average” hitter by smearing together the statistics of all the hitters on the roster, which, as I have warned before, can lead to errant decisions.
One of the results is the hitter error, which results in an opponent point. Six conditional probabilities are left to carry on to the other side of the net.
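Why smearing the roster into an “average” hitter can mislead is easy to show with a few invented tallies. In this sketch the pooled rate is simply total kills over total attempts, which weights heavily used hitters more; the hitter names, numbers and helper functions are hypothetical.

```python
# Hypothetical (kills, attempts) for each hitter; positions used as placeholder names.
HITTERS = {"OH1": (40, 100), "OH2": (15, 60), "MB": (30, 50), "OPP": (25, 90)}


def kill_rate(kills, attempts):
    return kills / attempts


def pooled_kill_rate(tallies):
    """The "average" hitter: pool every hitter's tallies into one rate.

    This is attempt-weighted, so heavily used hitters dominate it.
    """
    kills = sum(k for k, _ in tallies.values())
    attempts = sum(a for _, a in tallies.values())
    return kills / attempts


if __name__ == "__main__":
    for name, (k, a) in HITTERS.items():
        print(f"{name}: {kill_rate(k, a):.3f}")
    print(f"pooled 'average' hitter: {pooled_kill_rate(HITTERS):.3f}")
```

With these invented tallies the pooled hitter kills about 36.7% of attempts while the individual rates run from 25% to 60%, so a model fed only the pooled figure cannot see that one option is far better than another.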
The next action is the cumulative effect of the opponent's defense, combining the reactions of the blockers and back-row defenders. There are three results: a stuff block, which gives the opponent a point; a kill, where the attacking team gains a point; and the dug ball, which combines all the non-terminating possibilities, such as ricochets off the block that turn into a dug ball, or any combination of actions that lets the opponent mount a counterattack. This third option reverses the flow of the gameplay: the opponent now becomes the offense, and a new Markov chain flowchart is needed. There are 12 possible conditional probabilities.
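The whole serve-receive flow can be sketched as a small absorbing Markov chain. This is not the model described above: it collapses serve types, pass ratings, set categories and hitter results into a handful of hypothetical states with invented probabilities, and it treats the dug ball as a hand-off to a separate counterattack chain. It computes each end point's probability exactly (summing path products) and also by Monte Carlo simulation, so the two can be compared.

```python
import random
from collections import Counter, defaultdict

# A heavily simplified serve-receive chain with invented probabilities.
# Set quality is folded into the attack states so the defense level can
# "remember" it; that is how the chain carries history forward.
TRANSITIONS = {
    "serve receive": [("opp point", 0.10), ("our point", 0.08),
                      ("pass 1", 0.20), ("pass 2", 0.32), ("pass 3", 0.30)],
    "pass 1": [("opp point", 0.10), ("good set", 0.30), ("poor set", 0.60)],
    "pass 2": [("opp point", 0.05), ("good set", 0.55), ("poor set", 0.40)],
    "pass 3": [("opp point", 0.03), ("good set", 0.75), ("poor set", 0.22)],
    "good set": [("opp point", 0.08), ("attack after good set", 0.92)],
    "poor set": [("opp point", 0.12), ("attack after poor set", 0.88)],
    "attack after good set": [("opp point", 0.07), ("our point", 0.50), ("counterattack", 0.43)],
    "attack after poor set": [("opp point", 0.12), ("our point", 0.30), ("counterattack", 0.58)],
}
# End points: the two point ovals plus the hand-off to the counterattack flowchart.
TERMINAL = {"opp point", "our point", "counterattack"}


def exact_outcomes(state="serve receive"):
    """Sum the product of conditional probabilities over every path to each end point."""
    if state in TERMINAL:
        return {state: 1.0}
    totals = defaultdict(float)
    for next_state, p in TRANSITIONS[state]:
        for end, q in exact_outcomes(next_state).items():
            totals[end] += p * q
    return dict(totals)


def simulate_rally(rng, state="serve receive"):
    """Walk the chain, sampling each transition, until an end point is reached."""
    while state not in TERMINAL:
        next_states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(next_states, weights=weights, k=1)[0]
    return state


def simulated_outcomes(n_rallies, seed=2021):
    """Relative frequency of each end point over n_rallies simulated rallies."""
    rng = random.Random(seed)
    counts = Counter(simulate_rally(rng) for _ in range(n_rallies))
    return {end: counts[end] / n_rallies for end in TERMINAL}


if __name__ == "__main__":
    exact = exact_outcomes()
    sim = simulated_outcomes(100_000)
    for end in sorted(TERMINAL):
        print(f"{end:>13}: exact {exact[end]:.4f}  simulated {sim[end]:.4f}")
```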
The Point
There are 71 conditional probabilities that need to be accumulated just for this Markov chain model of the serve-receive game action. There are four decision points in the model, even though there could be many more if more granularity were pursued.
The point of this exercise is:
·
If we wanted to predict the outcome of a set, or even something as basic as a point, a massive amount of statistics would need to be built up to create the conditional probability database, especially since the law of large numbers tells us that we need many individual data points if we wish to have accurate conditional probabilities. To simulate the length of matches, these probabilities do not have to be extremely accurate; after all, that application of the Markov chain is not meant to predict results accurately but to predict the amount of time it takes to play a set or a match, and accuracy and precision do not matter there.
·
More interesting is seeing just how many conditional probabilities affect the decision making at each decision point. Reflecting on this, the immediate reaction is: of course the swing depends on the quality of the set, which depends on the quality of the pass, which depends on the quality of the passer and the serve, and so on. But it is sobering to realize how many probabilities feed into a single decision, and that the decision maker does this calculation instantaneously.
·
Many of the conditional probabilities can be eliminated from consideration because of human bias and the strategic implications of the action, which does pare down the list, but even with a pared-down list the number of probabilities, conditional or otherwise, is very large.
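To put a rough number on the "massive amount of statistics" mentioned in the first point above, one common rule of thumb (the normal approximation for estimating a proportion, which is my choice of technique and not something from the original analysis) gives the number of observations needed for about a ±5 percentage point margin at roughly 95% confidence. Because each conditional probability is estimated only from rallies that reach its branch, rarely reached branches need proportionally more rallies. The branch probabilities and function names below are hypothetical.

```python
import math


def samples_for_margin(p=0.5, margin=0.05, z=1.96):
    """Observations needed so a normal-approximation ~95% interval for a
    proportion near p has half-width at most `margin`:
    n = z^2 * p * (1 - p) / margin^2, with p = 0.5 as the worst case."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


def rallies_needed(branch_probability, **kwargs):
    """Rough number of rallies needed, on average, to observe enough cases of a
    conditioning event that occurs with `branch_probability` per rally."""
    return math.ceil(samples_for_margin(**kwargs) / branch_probability)


if __name__ == "__main__":
    print("observations per conditional probability:", samples_for_margin())
    for branch in (0.30, 0.10, 0.03):
        print(f"branch reached on {branch:.0%} of rallies -> ~{rallies_needed(branch)} rallies")
```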
As I look at my very rudimentary model, I think of all the uncertainties I left unmodelled while trying to simplify it, and about how much each of them could have affected the outcome of the play. This is the problem of unmodelled dynamics in control systems, which does affect the predictive ability of a model. But then I am reminded of one of the seven deadly diseases of management cited by Dr. W. Edwards Deming: management by use only of visible figures.
Works Cited
Albert, Laura. "Volleyball analytics: Modeling
volleyball using Markov chains." Slideshare.net. October 26, 2018.
https://www.slideshare.net/lamclay/volleyball-analytics-modeling-volleyball-using-markov-chains
(accessed March 19, 2021).
Deming, W. Edwards. Out of the Crisis.
Cambridge, MA: The MIT Press, 1982.
Wung, Peter. "Stats For Spikes-Use of Statistics
as Goals." Musings and Ruminations. March 6, 2021.
https://polymathtobe.blogspot.com/2021/03/stats-for-spikes-use-of-statistics-as.html
(accessed March 6, 2021).