In this era of Moneyball, almost every sport is delving into how to coax wisdom from the numbers that are naturally generated by playing the game. The USA National Team has been active in this discovery process: there are coaches integrated into the coaching staff who are specifically dedicated to the creation, calculation, and analysis of meaningful statistics built on basic playing statistics. They engage in descriptive statistics, the act of capturing the details of game action, what happened with each play of the ball, by assigning values to each action on the ball. This is particularly important in the fast-paced, continuously changing, and competitive environment of international volleyball. The statistics staff continuously keep the coaches apprised of the game action as seen through the filter of statistics, cutting through the cloud of human biases and perceptions.
Those of us who reside in the less rarefied air of high school and club volleyball are also interested in using statistics for our own purposes. We cannot possibly accrue that level of descriptive statistics in our matches because we lack the resources, both human and technical, but we sometimes take the statistics that we do have and apply inferential statistics to help us decide how to plan our training, as well as to measure our team’s progress throughout a season. If we wish to measure improvement, we first need to measure our base level of performance, whether for individual players and individual skills or for team performance during match play. Regardless of the parameters of the performance measures, we need to take those measures both before and after making any changes so that we can compare.
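To make that concrete, here is a minimal sketch of a before-and-after comparison. The pass ratings are invented, and the paired t-test (via scipy) is just one illustrative way to ask whether an observed change is bigger than noise, not a prescribed method:

```python
# Hypothetical mean pass ratings (0-3 scale) for eight players, measured
# before and after a training block; every number here is invented.
from statistics import mean
from scipy.stats import ttest_rel  # paired test: same players, two time points

before = [1.9, 2.1, 1.7, 2.3, 2.0, 1.8, 2.2, 1.6]
after = [2.1, 2.2, 1.9, 2.4, 2.1, 2.0, 2.3, 1.8]

print(f"baseline mean: {mean(before):.2f}, post-training mean: {mean(after):.2f}")

# The inferential question: is the improvement larger than what ordinary
# match-to-match noise would produce? A paired t-test is one way to ask.
t_stat, p_value = ttest_rel(after, before)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The particular test matters less than the discipline: the before measurement has to exist, and it has to be taken under conditions comparable to the after measurement.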
What is not widely appreciated is the vast difference between descriptive statistics and inferential statistics. Inferential statistics rests on assumptions made about the processes under measure: whether they all operate under the same conditions, whether the processes are under statistical control, and whether the measurement process is repeatable and reproducible.
We have seen the same need for measurement and improvement in other sports, and in human endeavors outside of sports, wherever people wish to transform observations into corrective action. Statistical Process Control (SPC), and especially Six Sigma processes, have become ubiquitous in our vocabulary. Indeed, using statistical measures is the key to creating consistent manufacturing processes, minimizing process errors, and increasing process throughput. Unfortunately,
there are critical differences between the manufacturing environment and the
sporting environment. In the manufacturing environment, the variability of the
machines is measurably minimal because the machines are inanimate and they are,
by and large, controllable. This is not to say that it is easy to control those
variables; the controllability problem in manufacturing can be difficult
because the threshold of error is small and the required signal-to-noise ratio
is large.
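For a concrete sense of what SPC means by “statistical control,” here is a minimal sketch of the classic X-bar-style control limits. The part measurements are invented, and real SPC implementations use subgrouped data and published control-chart constants rather than this shortcut:

```python
# Minimal X-bar-style control limits: a process is "in statistical control"
# when its measurements stay within roughly mean +/- 3 standard deviations.
# The measurements are invented; real SPC uses subgroups and published
# control-chart constants rather than a single-pass standard deviation.
from statistics import mean, stdev

measurements = [10.02, 9.98, 10.01, 9.99, 10.03, 10.00, 9.97, 10.02]

center = mean(measurements)
sigma = stdev(measurements)
ucl, lcl = center + 3 * sigma, center - 3 * sigma  # upper/lower control limits

print(f"center: {center:.3f}, UCL: {ucl:.3f}, LCL: {lcl:.3f}")
out_of_control = [x for x in measurements if not (lcl <= x <= ucl)]
print(f"out-of-control samples: {out_of_control}")
```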
In the sporting environment, human actions and responses can
be random to the extreme, which drives the uncertainty in the sporting process;
to make matters worse, the uncertainties associated with each individual are
coupled so that the impact of one person’s randomness is not just limited to
the actions of that person, but affects every other person taking part: every
player on both teams, the officials, the coaching staff, and so on all contribute to the aggregation of uncertainty in every statistical measure. The coupling effect may be minuscule, so much of the coupling can be ignored, but not all couplings can be. This is true of the instantaneous descriptive statistics taken during matches as well, but the averaging in descriptive statistics is minimal compared to the accrual of the larger statistics used to draw inferences. For example, a good server influences not only the passer but also the setter and the hitter; the interaction can have secondary and tertiary effects on how the serving team plays as it reacts to the actions of the passing team. Each action in volleyball, as with most sports, depends on prior actions.
So why talk about this? Because many coaches ask questions of the following form: “I have a [level and age] team; what statistical threshold should my team be performing at when we perform [a given volleyball action]?”
The intent of the question is clear: the coach is trying to determine a reference level of performance for comparison against what they can measure of their own team. But the question is a loaded one. Since sports actions depend on prior actions, there is no way to separate and isolate a specific game action from everything that led up to it, so any statistic we take is conditional on those prior actions; yet the measures we record are single-dimensioned, and they never truly reflect the deep coupling of the actions.
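To make the single-dimensioned-measure point concrete, here is a small sketch, using invented play-by-play records, of how one aggregate hitting number hides its conditioning on the pass that preceded it:

```python
# Invented play-by-play records: (pass_quality, attack_killed), where pass
# quality is on the usual 0-3 scale and True means the ensuing attack scored.
from collections import defaultdict

plays = [
    (3, True), (3, True), (3, False), (3, True),    # in-system attacks
    (2, True), (2, False), (2, True), (2, False),
    (1, False), (1, False), (1, True),
    (0, False),                                     # out-of-system attack
]

overall = sum(killed for _, killed in plays) / len(plays)
print(f"overall kill rate: {overall:.0%}")  # the single-dimensioned number

# Conditioning on the prior action (the pass) tells a different story.
by_pass = defaultdict(list)
for quality, killed in plays:
    by_pass[quality].append(killed)
for quality in sorted(by_pass, reverse=True):
    outcomes = by_pass[quality]
    print(f"kill rate given pass = {quality}: {sum(outcomes) / len(outcomes):.0%}")
```

The single number is an average over whatever mix of pass qualities happened to occur; change the mix and the hitting number changes without the hitter changing at all.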
To further compound the uncertainties, many assumptions are tacitly made. The usual practice in statistics is to take many different sets of one kind of data and aggregate them into one representative set by averaging the datasets together. Averaging, as with all things, has its advantages and disadvantages. The advantage is that many datasets can be combined into a uniform and representative set that gives the user a good idea of the general trend for specific variables: how well the team is performing and how well each player is performing in each of the measured skills. Rather than diving through massive amounts of data, the aggregate data is used, on the assumption that the aggregation is an accurate representation of your team. Which brings us to the disadvantage of taking averages. When an average is taken, the salient contributing factors are smeared; that is, the highs and lows of all the datasets are nullified in deference to the average. The result is that we have erased the unique contributions and variations of individual opponents as well as those of the team of interest. In essence, averaging your own team’s statistics creates a fictional “average” representation of your team. More subtly, the statistics generated are presumed to be against yet another fictional “average” team: the opponents. This is problematic because the opposing team’s actions are what elicit the response from your team, so the weaknesses inherent in your players, and in your team in aggregate, are disguised by the average of the opponents, which negates any insight you might gain about your team and what you would need to correct in training. Another issue is that by pitting an “average” representation of your team against the “average” opponent, you obscure the specifics of how your team plays: problem rotations you may have against a good team or a good player. You also erase the problems you may have in certain situations, like passing in the seams or hitting line. You are averaging out your best player’s statistics along with your worst player’s, so you are unable to identify the problem areas.
Note that all the aforementioned situational information is easily available from the descriptive statistics taken during the game; it is when we try to infer our team’s future performance by comparing our general “average” performance against that of a general “average” opponent that the inferential value of the exercise disappears.
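A minimal sketch of the smearing effect, with invented per-match sideout percentages: the season average looks respectable while the numbers against the good teams, which are the ones that matter, are poor.

```python
# Invented per-match sideout percentages, broken out by opponent strength.
from statistics import mean

sideout_pct = {
    "weak opponents": [72, 70, 75, 68],
    "average opponents": [61, 58, 63],
    "strong opponents": [44, 41, 46],
}

all_matches = [p for results in sideout_pct.values() for p in results]
print(f"season-average sideout: {mean(all_matches):.1f}%")  # looks fine

# The breakdown the average erased, which is what training needs to target:
for opponent, results in sideout_pct.items():
    print(f"  vs {opponent}: {mean(results):.1f}%")
```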
In addition to the averaging problem, another, more subtle, logical fallacy is tacitly committed.
“When a measure becomes a target, it ceases to
be a good measure.”
Goodhart’s Law, as articulated by the anthropologist Marilyn Strathern
What does the above statement mean? We selectively choose meaningful measures to help us determine the truth of what we experience, observing specific variables that will validate or refute the pictures in our minds. The measurement should be performed unobtrusively so as not to affect the outcome of what we are trying to observe. But if we take a shortcut in our observations by making reality conform to what we perceive we need to observe, that is, if we make the team aim at the expected measures as targets, then we skew our players’ minds toward performing to the artificial horizons set by the measure-turned-target rather than toward what we actually hope to achieve: maximizing performance across all of the variables and, more importantly, winning. A good statistical lesson to remember is that correlation does not equal causation. Just because two sets of data correlate does not mean that one result follows from the other.
Using fictitious volleyball truisms as targets for a team can actually hurt the team’s chances. I was once a firm believer in many truisms: the ace-to-error ratio must be at least one if you want to succeed; teams passing an average of 2.4 on a 3-point scale will always win; to win, the set distribution must be, in order, the most sets to the left side, followed by the middle, then the right side, and finally the back row.
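For concreteness, here is a minimal sketch computing two of those truism measures from an invented box score; the formulas (aces divided by service errors, mean pass rating on a 0-3 scale) are the commonly used ones, and every number is made up:

```python
# Invented single-match box score, for illustration only.
serves = {"aces": 5, "errors": 6}
pass_ratings = [3, 2, 2, 3, 1, 2, 3, 0, 2, 3, 2, 2]  # 0-3 passing scale

ace_to_error = serves["aces"] / serves["errors"]   # truism wants >= 1.0
avg_pass = sum(pass_ratings) / len(pass_ratings)   # truism wants >= 2.4

print(f"ace-to-error ratio: {ace_to_error:.2f}")
print(f"average pass rating: {avg_pass:.2f}")
```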
Once again, the problem with the truisms is that correlation does not equal causation.
When coaches practice and train using artificially determined goals as the target, the measures stop being clues to the secret of team performance; they become the target and the end goal. People will focus on the target and work toward achieving it, ignoring the fact that the purpose of playing the game is to be the winner when the last ball drops. When players become preoccupied with the artificial horizons set by the coaching staff, they make winning and losing and their overall game performance secondary. All coaches have stories about how their teams did everything perfectly according to the statistics and still lost, and vice versa. Statistics should not be the goal; they should be a way to augment the picture that everyone has of the reality they are experiencing.
A Digression.
The use of averaging is ubiquitous in real life, even in the way we assess players in general. The avcaVPI measure was instituted ostensibly to help the college-bound athlete determine whether they can play in college, as well as to help college coaches find players to recruit based on physical measures. The idea is to use the avcaVPI score to give players and coaches an idea of how a player would fit into given college divisions and programs by comparing their physical attributes, as measured in non-competitive environments, to those of players already competing in college. When the initial VPI measure came out, I remember that it was simply a single score aggregating the various physical measures taken at the testing sites. The initial criticism was that players were not compared against players playing their positions; to AVCA’s credit, it looks like they have corrected that oversight, although the avcaVPI data does not seem to be further segregated into NCAA, NJCAA, or NAIA divisions (I could be wrong). The avcaVPI scores are now categorized and ranked by position, with the percentile where each player fits in each test category compared against players already playing in college. This is much more useful than before, but it is still misleading. What is left unsaid, again, is that correlation does not equal causation: a player’s physical measures falling within the percentile range of existing college players does not mean that the player is going to get recruited to play in college.
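As a sketch of the percentile-rank idea only (this is not AVCA’s actual methodology, which I do not have access to), assuming a hypothetical reference list of measurements from current college players at one position:

```python
# Percentile rank of a recruit's measurement against a reference group.
# The reference numbers are invented, and this is NOT the actual avcaVPI
# calculation, just the generic percentile-rank idea.
from bisect import bisect_right

# Hypothetical attack-touch measurements (inches) for current college
# players at one position, sorted ascending.
college_players = sorted([112, 114, 115, 116, 117, 118, 118, 119, 120, 122])

def percentile_rank(value, reference):
    """Percent of the reference group at or below the given value."""
    return 100.0 * bisect_right(reference, value) / len(reference)

recruit = 117
rank = percentile_rank(recruit, college_players)
print(f"recruit sits at the {rank:.0f}th percentile of this reference group")
```

The rank tells you where a player sits relative to those already playing; it does not tell you that those measurements are what got those players recruited.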
Talent evaluation is a very tricky and uncertain process; just ask any NFL team about its ability to identify a quality quarterback, and then point out that Tom Brady was drafted by the New England Patriots in the sixth round of the 2000 NFL Draft, 199th overall, and was the seventh quarterback taken.
The avcaVPI really does very little to clear up the collegiate volleyball
recruiting picture for all involved.