
Saturday, January 4, 2020

Stats For Spikes-Correlation and Causation


This article (Paine 2016) caught my attention recently. It talks about the case of Charles Reep, a former Royal Air Force Wing Commander who was tracking play-by-play data for matches and serving as a quantitative consultant for Football League teams as early as the 1950s.


https://fivethirtyeight.com/features/how-one-mans-bad-math-helped-ruin-decades-of-english-soccer/amp/?__twitter_impression=true&fbclid=IwAR0MNCiSu4nJIcGYvW5dRoTif1mzNc6MJzo8c-AFLU-mDWqZgWOCnT75tIw


The article recalls how Reep’s analytics led him to conclude that the number of passes made in soccer is directly correlated with scoring. His admonition was that shots taken after three passes or fewer have a higher probability of producing a goal.

But Reep was making a huge mistake. Put simply, he started with each goal scored and looked back at how many passes were made prior to the goal. His starting point was goals scored. The problem is that most goals in soccer do come after three passes or fewer, because that is the nature of the game: it is sporadic, and passing sequences get disrupted frequently by the defense. What he did not count were the short possessions that did not produce a goal; that block of data is missing because of his focus on the goals alone.
In a previous article, Neil Paine of the website FiveThirtyEight refuted that bit of wisdom gleaned from Reep’s agglomeration of soccer data.

https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/

But subsequent analysis has discredited this way of thinking. Reep’s mistake was to fixate on the percentage of goals generated by passing sequences of various lengths. Instead, he should have flipped things around, focusing on the probability that a given sequence would produce a goal. Yes, a large proportion of goals are generated on short possessions, but soccer is also fundamentally a game of short possessions and frequent turnovers. If you account for how often each sequence-length occurs during the flow of play, of course more goals are going to come off of smaller sequences — after all, they’re easily the most common type of sequence. But that doesn’t mean a small sequence has a higher probability of leading to a goal.

To the contrary, a team’s probability of scoring goes up as it strings together more successful passes. The implication of this statistical about-face is that maintaining possession is important in soccer. There’s a good relationship between a team’s time spent in control of the ball and its ability to generate shots on target, which in turn is hugely predictive of a team’s scoring rate and, consequently, its placement in the league table. While there’s less rhyme or reason to the rate at which teams convert those scoring chances into goals, modern analysis has ascertained that possession plays a big role in creating offensive opportunities, and that effective short passing — fueled largely by having pass targets move to soft spots in the defense before ever receiving the ball — is strongly associated with building and maintaining possession. (Paine 2014)

To reiterate, he should have focused on tracking the number of possessions and whether those possessions turned into goals. Given the complexity of the game, it was perhaps understandable that Reep made this mistake, and given that the state of the art of statistical analysis in sports was still rudimentary, it was perhaps predictable. The unfortunate thing is that Reep was able to convince an entire nation’s soccer establishment, and not just any nation, but the nation where the game was born, the nation whose excellence in the game was globally recognized, to go off on a wild goose chase. People should have known better. Maybe.
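
A toy example makes the trap concrete. The numbers below are invented purely for illustration (not Reep’s actual data): suppose a season produced 1,000 short possessions of three passes or fewer and 100 long ones.

# Toy numbers, invented for illustration (not Reep's actual data).
short_possessions, long_possessions = 1000, 100
goals_from_short, goals_from_long = 20, 5

# Reep's view: what share of all goals came after short possessions?
share_short = goals_from_short / (goals_from_short + goals_from_long)
print(f"share of goals from short possessions: {share_short:.0%}")  # 80%

# The right question: how likely is each possession type to score?
p_goal_short = goals_from_short / short_possessions  # 2.0%
p_goal_long = goals_from_long / long_possessions     # 5.0%
print(f"P(goal | short) = {p_goal_short:.1%}, P(goal | long) = {p_goal_long:.1%}")

Eighty percent of the goals come off short possessions, and yet a long possession is two and a half times more likely to produce a goal. Counting only the goals hides the denominator.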
This brings us to an oft-repeated but rarely observed tenet of applied statistics: correlation does not equal causation. The saying may sound glib, but it is remarkably dead on. If we find some kind of correlation between two events, our habit and inclination is to jump to the conclusion that the two events have a causal relationship; that is, that one event caused the other to occur, or that we can deterministically and reasonably predict that the latter event will result from the occurrence of the first. Unfortunately for us, that is rarely the case. Establishing causality takes formal mathematical checking; just because the statistics show that some kind of correlation, however minimal, exists between two events doesn’t necessarily mean that they have a causal relationship.

In order to establish causality, a lot of number crunching needs to happen, and a lot of statistical metrics need to meet certain established thresholds before we can declare causality. That is a completely different arm of the statistical sciences called inferential statistics, far too involved for me to try to explain here and now, even assuming I can explain it. A rather large and dodgy assumption.
Another thing that Reep’s error illustrates is survivorship bias. The story of Abraham Wald and the US warplanes is a favorite of social media posts and business writers because it perfectly demonstrates the linear, direct thinking most people employ when they see data or results without taking the underlying situation into account.

Abraham Wald was born in 1902 in the then Austria-Hungarian empire. After graduating in Mathematics he lectured in Economics in Vienna. As a Jew following the Anschluss between Nazi Germany and Austria in 1938 Wald and his family faced persecution and so they emigrated to the USA after he was offered a university position at Yale. During World War Two Wald was a member of the Statistical Research Group (SRG) as the US tried to approach military problems with research methodology.
One problem the US military faced was how to reduce aircraft casualties. They researched the damage received to their planes returning from conflict. By mapping out damage they found their planes were receiving most bullet holes to the wings and tail. The engine was spared.


The US military’s conclusion was simple: the wings and tail are obviously vulnerable to receiving bullets. We need to increase armour to these areas. Wald stepped in. His conclusion was surprising: don’t armour the wings and tail. Armour the engine.

Wald’s insight and reasoning were based on understanding what we now call survivorship bias. Bias is any factor in the research process which skews the results. Survivorship bias describes the error of looking only at subjects who’ve reached a certain point without considering the (often invisible) subjects who haven’t. In the case of the US military they were only studying the planes which had returned to base following conflict i.e. the survivors. In other words what their diagram of bullet holes actually showed was the areas their planes could sustain damage and still be able to fly and bring their pilots home. (Thomas 2019)

What Reep saw was goals. He was fixated on them rather than on the big picture, and he fell into the trap of reaching for the first and most obvious conclusion rather than trying to explore the structure of the game. Sometimes prior experience is very useful, and not everything new is golden.

Works Cited

Paine, Neil. 2016. "How One Man’s Bad Math Helped Ruin Decades Of English Soccer." http://www.fivethirtyeight.com. October 27. Accessed December 24, 2019. https://fivethirtyeight.com/features/how-one-mans-bad-math-helped-ruin-decades-of-english-soccer/amp/?__twitter_impression=true&fbclid=IwAR0MNCiSu4nJIcGYvW5dRoTif1mzNc6MJzo8c-AFLU-mDWqZgWOCnT75tIw.
—. 2014. "What Analytics Can Teach Us About the Beautiful Game." http://www.fivethirtyeight.com. June 12. Accessed December 24, 2019. https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/.

Thomas, James. 2019. "Survivorship Bias." McDreeamie Musings. April 1. Accessed December 28, 2019. https://mcdreeamiemusings.com/blog/2019/4/1/survivorship-bias-how-lessons-from-world-war-two-affect-clinical-research-today.

Monday, December 16, 2019

Stats For Spikes-Variance

I had wanted to do a little bit of explaining about probability and statistics tools. One of them is the concept of variance, so it was with much delight that I saw Coach Jim Stone write the articles below about his observations regarding variance.

Definitions
First some definitions.
The mean is the average of the same performance measurement taken over a long time and sampled at regular intervals. Comparing that mean to the expected value of the measurements, which is calculated prior to measurement, determines the accuracy of the measurement. In manufacturing or any engineering-related activity, the mean of the measurements is compared to what the designer intended and designed to achieve; that is a reference value, a goal to measure against. The formula for the mean is just the numerical average of all the measurements of the metric: hitting percentage, conversion percentage, and so on; in the article the focus is on hitting efficiency. The article compares the average hitting efficiency of various players in 2018 to their hitting efficiency in 2019.

In Coach Stone’s articles, his use of the term variance refers to a comparison of the 2018 and 2019 numbers: he uses the 2018 hitting efficiency as the reference and compares the 2019 hitting efficiency against it. The variance that he talks about is the difference for each player from 2018 to 2019.

In the statistical sciences, however, variance is defined as the average of the squared differences of the measurements from their mean. In statistical language, the variance of a sample of n measurements x1, x2, … xn with mean x̄ can be calculated as

s² = Σ(xᵢ − x̄)² / (n − 1)

Figure 1 Formula to calculate Variance. (Staff, WikiHow 2019)

Standard deviation is a measure of how spread out the measurements are from the mean; it is the square root of the variance. The calculation is simple, and if you don’t want to do it by hand, most spreadsheet programs have built-in functions. In Excel the mean is =AVERAGE(x1, x2, … xn) and the standard deviation is =STDEV(x1, x2, … xn). Depending on your setup you may need to enable the statistical add-in to make them work, but they are very simple to use.
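
For those who prefer a script to a spreadsheet, here is a minimal Python sketch of the same calculations; the hitting-efficiency numbers are made up for illustration.

import statistics

# Ten hypothetical match hitting efficiencies (made-up numbers).
efficiencies = [0.310, 0.250, 0.280, 0.190, 0.340,
                0.270, 0.220, 0.300, 0.260, 0.290]

mean = statistics.mean(efficiencies)          # the numerical average
variance = statistics.variance(efficiencies)  # sample variance (divides by n - 1)
stdev = statistics.stdev(efficiencies)        # square root of the variance

print(f"mean = {mean:.3f}, variance = {variance:.5f}, stdev = {stdev:.3f}")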

Statisticians use the mean and the standard deviation to decide just how precise and accurate the thing being measured is, whether it is a player or a team.

The standard deviation tells us the precision of the process that we are trying to measure. An illustration is better at getting that point across. The first illustration shows the Normal, or Gaussian, probability distribution. The mean of the measurements, compared to the reference value, tells us how inaccurate the measured performance of the process/team/players is; the width of the spread of the normal distribution tells us how scattered the measurements are, which is a measure of how imprecise the performance of the process/team/players is.
Figure 2: Illustration of the meaning of accuracy of precision. (Medcalc Staff 2019)
Another illustration uses a picture of the bull’s eye to better show the relationships.
Figure 3 Bull's eye explanation of the differences in interpretation of accuracy and precision. (Circuit Globe 2019)
In the world of athletic performance, it is next to impossible to use accuracy intelligently, because people and teams perform to the best of their ability at that time and in that place; there are too many extraneous variables to account for and no hypothetical performance standard to uphold. Many coaches on VCT ask for reference values as a goal for their teams to achieve rather than as a means of assessing where their team’s performance stands relative to a generic measure. The difference is subtle but important. Using a reference measure as a comparison is normal practice: you want to know what the “average” standard is for a team of a certain age and gender. The problem is that each team is unique and each player is unique. The performances of each unique member of a unique team can be averaged into a measurable “team average,” but comparing your team to a generic measure is unrealistic, and depending on how you use that reference value, it can cause more problems than it solves; there are too many complicating factors for the reference measure to be meaningful. A better way to measure your team’s performance is to do what Coach Stone did: compare your present performance with your past performance, assuming you have a past performance record. Making the comparison relative to previous performance gives the coach a more concrete measure of the team’s progress.

It is also good to keep in mind that even though strong measured performance increases the probability of success for teams and players, it is not the determining factor for success; that is, a great hitting efficiency tells us that the chances of winning are better, but it does not guarantee a win. Having good measures is not predictive; in this case, correlation definitely does not equal causality.

The standard deviation, or precision, is something important for coaches to examine, as Coach Stone has said in his articles. The statistics that we gather on the bench give us a performance measure of the player and the team for that set and that match. The measures taken during a set or match (hitting percentage, blocks, assists, digs, passing efficiency, etc.) are all a function of the opponent’s strengths and weaknesses; whether the match is home, neutral, or away; the temperature and humidity in the gym; and so on. When we average the same performance measures across matches played against different opponents, in different locations, and under different conditions, we are making an assumption: the variations inherent in playing under different circumstances affect the performance measures, but we can still get the information we want about our team and players by taking the mean of the measures collected under those different conditions.

In fact, taking the average of many performances is the preferred way to isolate the actual team performance, because the primary performance characteristics of our players, good or bad, show themselves in the average more than in the data from any individual match. The effects of different opponents, locations, atmospheric conditions, etc. are all accounted for in the variations that we see in the performance measures, and we take for granted that those variations are part of the performance capabilities of the team and players. Indeed, by averaging the performance measures, we are in effect smoothing out the transient performance of each individual match, allowing the variations from the environment and the opponent to average themselves out; we hope that the averaging filters out the extraneous effects and leaves us with the team’s actual capabilities over a designated time span. This is the Law of Large Numbers, which states that as we take more and more measurements or samples, the mean that we calculate from all the measurements will come ever closer to the real mean of the player or team that we are measuring.
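
A quick simulation makes the Law of Large Numbers concrete. The numbers below are invented for illustration: a hitter whose long-run efficiency is assumed to be 0.250, with match-to-match noise.

import random

random.seed(42)  # fixed seed so the demonstration is repeatable

# Assume a hitter whose "true" long-run efficiency is 0.250 with
# match-to-match noise of 0.080 (invented numbers, not a real player).
TRUE_MEAN, NOISE = 0.250, 0.080

def simulate_match():
    return random.gauss(TRUE_MEAN, NOISE)

# The running average wanders at first, then settles toward the true
# mean as more matches accumulate.
for n in (5, 20, 100, 1000):
    avg = sum(simulate_match() for _ in range(n)) / n
    print(f"after {n:4d} matches: running average = {avg:.3f}")

With only a handful of matches the average can sit well away from 0.250; by a thousand simulated matches it hugs the true mean.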

In Coach Stone’s article on Variance and Lineup, his measure of volatility is what I take to be the standard deviation. In his case, a player’s volatility is important because we are assuming that all the extenuating circumstances have been smoothed away uniformly, so that the remaining volatility, the standard deviation, reflects the player’s actual ability to execute precisely.

Coach Stone was making the point that when coaches make decisions on starting lineups, the volatility (standard deviation) of a player’s statistic should be considered in conjunction with the mean. I heartily agree, except that I would warn that these are probabilistic descriptions of performance, not deterministic ones; that is, there are randomness and uncertainty embedded in the numbers.

In Coach Stone’s article, he said that he would trade higher efficiency for less volatility: a prudent decision that reflects his personal preference for stability. In his personal probability he saw that there was safety in lower volatility. There is a high probability that his decision is a sound one, and the results may bear out the decision, but it is also possible that the decision works out to the contrary: events such as a volleyball match are probabilistic in nature, and a player’s performance on a given night may not follow her personal performance curve. He compares the hitting efficiencies of Kathryn Plummer and Jazz Sweet and states that:
If you look one standard deviation from her average efficiency, you can see that Plummer will hit between .200 and .370 almost 70% of her outings.  This is what the team and coach can generally expect on any given night.  Conversely, one of the more volatile players would be Jazz Sweet from Nebraska.  Her volatility is high (relative to Plummer) so her range of performance will be broader.  One could expect that 70% of the matches Sweet will hit between .000%-.320%. 

While I agree with the sentiment, a better wording is that Plummer’s performance would be somewhere between 0.200 and 0.370 about 70% of the time. The turn of phrase is not merely playing with semantics; it turns the argument toward the information that the Normal curve actually demonstrates: rather than saying that Plummer hits between 0.200 and 0.370 70% of the time, it says that the probability is 70% that she will hit between 0.200 and 0.370. We are putting probabilistic thinking into play for the decision maker; the term “probability” gives the decision maker food for thought while introducing the reality that there is a 30% chance that she will hit below 0.200 or above 0.370. Instead of thinking that hitting in that range is a sure thing, the thought becomes that there is a 15% chance that she will hit below that range and a 15% chance that she will hit above it. Thinking in probability terms, because we have the data already available to us, means that we can contemplate the possibility of failure, which helps temper our expectations. This is why we play the game out rather than run simulations on laptops. It is a matter of what your personal probability tells you.
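
To make that concrete, here is a small sketch using Python’s built-in normal distribution. The mean (0.285) and standard deviation (0.085) are back-calculated from the quoted 0.200–0.370 range, an assumption for illustration rather than Plummer’s actual numbers.

from statistics import NormalDist

# Mean and stdev back-calculated from the quoted one-standard-deviation
# range of 0.200 to 0.370 (an assumption, not her actual match data).
plummer = NormalDist(mu=0.285, sigma=0.085)

p_in_range = plummer.cdf(0.370) - plummer.cdf(0.200)  # within one stdev
p_below = plummer.cdf(0.200)                          # lower tail
p_above = 1 - plummer.cdf(0.370)                      # upper tail

print(f"P(0.200 < efficiency < 0.370) = {p_in_range:.0%}")  # ~68%
print(f"P(efficiency < 0.200) = {p_below:.0%}")             # ~16%
print(f"P(efficiency > 0.370) = {p_above:.0%}")             # ~16%

The roughly 68/16/16 split is exactly the probabilistic framing argued for above: a healthy chance of landing in the expected range, with real tails on both sides.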

One way to refine our decision making would be to have prior data on the players’ performance in the exact situations of interest; in that case, one could use Bayes’ Theorem to recompute the probabilities.
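
As a sketch of how such an update might work, suppose (purely hypothetically) we want to revise our belief that a hitter is in top form tonight after watching her first few swings; all the probabilities below are invented for illustration.

# A minimal Bayes' Theorem sketch with invented numbers: updating the
# belief that a hitter is in top form tonight after her first three
# swings all score.
p_hot = 0.30           # prior belief that she is in top form
p_kills_if_hot = 0.60  # P(first three swings all score | top form)
p_kills_if_not = 0.20  # P(first three swings all score | not top form)

# Bayes' Theorem: P(hot | kills) = P(kills | hot) * P(hot) / P(kills)
p_kills = p_kills_if_hot * p_hot + p_kills_if_not * (1 - p_hot)
p_hot_given_kills = p_kills_if_hot * p_hot / p_kills

print(f"posterior P(top form | three early kills) = {p_hot_given_kills:.0%}")  # ~56%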
One other myth regarding the mean and standard deviation that I would like to dispel is the following. I heard an anecdote that a coach would, as a matter of habit, pull a player out of the lineup when the player’s statistics indicated that they had been performing well above their season mean for a significant amount of time. The rationale for this move relies on the mistaken belief that since the player is outperforming their season numbers, they are due for a low performance, and that by pulling them out of the lineup at that moment the team can avoid or bypass the low performance, so that the player will resume their high performance in their next start. Statistics doesn’t work that way. While the standard deviation says that there should be instances of lower performance to balance the higher performances, there is nothing in statistics that says a low performance must happen symmetrically, or immediately after a series of high performances. In fact, the lower performance may come when you least expect it. The thing to remember is that the measures are cumulative over a long period of time. Once again, the law of large numbers tells us that the balancing of the highs and lows comes after a large number of measurements, not instantaneously. In the world of Statistical Process Control, a series of measurements that continuously oscillates above and below the mean is actually an indication that something is wrong with the process.
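
A simulation of this myth is easy to run. Below, a player’s matches are independent draws from one assumed distribution (mean 0.250, stdev 0.080, invented numbers); the average performance immediately after a hot streak is no worse than the overall average, so there is nothing to dodge by benching the player.

import random

random.seed(7)  # fixed seed for a repeatable run

# 100,000 simulated matches drawn independently from one distribution.
performances = [random.gauss(0.250, 0.080) for _ in range(100_000)]

# Collect the performance that immediately follows three straight
# above-average matches (a "hot streak").
after_streak = [performances[i] for i in range(3, len(performances))
                if all(p > 0.250 for p in performances[i - 3:i])]

overall = sum(performances) / len(performances)
post_streak = sum(after_streak) / len(after_streak)
print(f"overall average: {overall:.3f}")
print(f"average right after a hot streak: {post_streak:.3f}")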

Next: Statistical Process Control tools; Six Sigma, its significance, and whether we should apply that criterion.

Works Cited

Circuit Globe. 2019. Circuit Globe /Accuracy and Precision. May. Accessed August 17, 2019. https://circuitglobe.com/accuracy-and-precision.html.
Medcalc Staff. 2019. Medcalc.org/Accuracy and Precision. August. Accessed August 20, 2019. https://www.medcalc.org/manual/accuracy_precision.php.
Staff, WikiHow. 2019. WikiHow. October 23. Accessed December 15, 2019. https://www.wikihow.com/Calculate-Variance.