Saturday, January 4, 2020

Stats For Spikes-Correlation and Causation


This article (Paine 2016) caught my attention recently. It talks about the case of Charles Reep, a former Royal Air Force Wing Commander who was tracking play-by-play data for matches and serving as a quantitative consultant for Football League teams as early as the 1950s.


https://fivethirtyeight.com/features/how-one-mans-bad-math-helped-ruin-decades-of-english-soccer/amp/?__twitter_impression=true&fbclid=IwAR0MNCiSu4nJIcGYvW5dRoTif1mzNc6MJzo8c-AFLU-mDWqZgWOCnT75tIw


The article recalls how Reep’s analytics led him to conclude that the number of passes in a move is correlated with scoring. His admonition was that shooting after three passes or fewer gives a higher probability of scoring a goal.

But Reep was making a huge mistake. Put simply, his starting point was the goals themselves: he took each goal scored and looked at how many passes came before it. The problem is that most goals in soccer do come after three passes or fewer, because that is the nature of the game. Play is sporadic, and the defense disrupts the passing game frequently. What he did not count were all the possessions that ended without a goal, including the many short ones; that block of data is missing because he focused only on the goals that were scored.
In an earlier article, Neil Paine of FiveThirtyEight refuted that bit of wisdom gleaned from Reep’s agglomeration of soccer data.

https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/

But subsequent analysis has discredited this way of thinking. Reep’s mistake was to fixate on the percentage of goals generated by passing sequences of various lengths. Instead, he should have flipped things around, focusing on the probability that a given sequence would produce a goal. Yes, a large proportion of goals are generated on short possessions, but soccer is also fundamentally a game of short possessions and frequent turnovers. If you account for how often each sequence-length occurs during the flow of play, of course more goals are going to come off of smaller sequences — after all, they’re easily the most common type of sequence. But that doesn’t mean a small sequence has a higher probability of leading to a goal.

To the contrary, a team’s probability of scoring goes up as it strings together more successful passes. The implication of this statistical about-face is that maintaining possession is important in soccer. There’s a good relationship between a team’s time spent in control of the ball and its ability to generate shots on target, which in turn is hugely predictive of a team’s scoring rate and, consequently, its placement in the league table. While there’s less rhyme or reason to the rate at which teams convert those scoring chances into goals, modern analysis has ascertained that possession plays a big role in creating offensive opportunities, and that effective short passing — fueled largely by having pass targets move to soft spots in the defense before ever receiving the ball — is strongly associated with building and maintaining possession. (Paine 2014)
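To make that flip concrete, here is a minimal Python sketch using made-up possession counts (hypothetical numbers chosen only for illustration, not real match data). It computes both Reep’s measure, each sequence length’s share of all goals, and the measure Paine describes, the probability that a possession of a given length ends in a goal. The bucket labels and the `summarize` helper are my own inventions.

```python
# Hypothetical, made-up counts for illustration only (NOT real match data).
# passing-sequence length -> (number of possessions, goals scored from them)
possessions = {
    "0-3 passes": (9000, 90),
    "4-6 passes": (1500, 30),
    "7+ passes": (500, 15),
}


def summarize(possessions):
    """Return {label: (share_of_all_goals, goals_per_possession)}."""
    total_goals = sum(goals for _, goals in possessions.values())
    return {
        label: (
            goals / total_goals,  # Reep's view: where do the goals come from?
            goals / count,        # Paine's view: how often does this length score?
        )
        for label, (count, goals) in possessions.items()
    }


for label, (share, rate) in summarize(possessions).items():
    print(f"{label}: {share:.0%} of all goals, "
          f"{rate:.1%} of possessions end in a goal")
```

With these invented numbers, short sequences account for about two-thirds of the goals yet have the lowest scoring rate per possession, which is exactly the pattern Reep misread. Dividing by the number of possessions, not by the number of goals, is what restores the missing block of data.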

To reiterate, Reep should have tracked the number of possessions and whether those possessions turned into goals. Given the complexity of the game, his mistake is perhaps understandable, and given how rudimentary statistical analysis in sports still was, it was perhaps predictable. The unfortunate thing is that Reep convinced an entire nation’s soccer establishment to go off on a wild goose chase, and not just any nation, but the nation where the game was born and whose excellence in the game was globally recognized. People should have known better. Maybe.
This brings us to an oft-repeated but rarely heeded tenet of applied statistics: correlation does not equal causation. The saying may sound glib, but it is remarkably on point. When we find a correlation between two events, our habit and inclination is to jump to the conclusion that they have a causal relationship; that is, that one event caused the other, or that we can reliably predict the second event from the occurrence of the first. Unfortunately for us, that is rarely the case. Establishing causality takes formal mathematical checking. Just because the statistics show some correlation between two events, however small, does not mean that they are causally related.

Establishing causality requires a lot of number crunching, and a number of statistical measures have to meet established thresholds before we can declare it. That is a different branch of the statistical sciences, inferential statistics, and within it the field of causal inference. It is far too involved for me to try to explain here and now, even assuming I can explain it. A rather large and dodgy assumption.
Reep’s error also illustrates survivorship bias. The story of Abraham Wald and the US warplanes is a favorite of social media and business writers because it perfectly demonstrates the linear, direct thinking most people apply to data or results without taking the underlying situation into account.
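The quickest way to see correlation without causation is to build it on purpose. The Python sketch below is a toy simulation, not a formal causal analysis: a hidden variable `z` (think of it as some unmeasured common cause) drives both `x` and `y`, while `x` has no effect at all on `y`. The variable names, noise levels, sample size, and seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(2020)  # fixed seed so the run is reproducible
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]    # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]     # depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]     # depends on z only; x plays no role

raw = pearson(x, y)

# We generated the data, so we know z adds exactly +1 * z to each variable.
# Subtracting that out leaves only the independent noise terms.
controlled = pearson([xi - zi for xi, zi in zip(x, z)],
                     [yi - zi for yi, zi in zip(y, z)])

print(f"raw correlation of x and y: {raw:.2f}")
print(f"after removing z:           {controlled:.2f}")
```

The raw correlation comes out around 0.5 even though `x` never influences `y`; once the known contribution of `z` is removed, it drops to roughly zero. In real data the common cause is usually unknown and unmeasured, which is why the formal machinery of causal inference is needed.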

Abraham Wald was born in 1902 in the then Austria-Hungarian empire. After graduating in Mathematics he lectured in Economics in Vienna. As a Jew following the Anschluss between Nazi Germany and Austria in 1938 Wald and his family faced persecution and so they emigrated to the USA after he was offered a university position at Yale. During World War Two Wald was a member of the Statistical Research Group (SRG) as the US tried to approach military problems with research methodology.
One problem the US military faced was how to reduce aircraft casualties. They researched the damage received to their planes returning from conflict. By mapping out damage they found their planes were receiving most bullet holes to the wings and tail. The engine was spared.


The US military’s conclusion was simple: the wings and tail are obviously vulnerable to receiving bullets. We need to increase armour to these areas. Wald stepped in. His conclusion was surprising: don’t armour the wings and tail. Armour the engine.

Wald’s insight and reasoning were based on understanding what we now call survivorship bias. Bias is any factor in the research process which skews the results. Survivorship bias describes the error of looking only at subjects who’ve reached a certain point without considering the (often invisible) subjects who haven’t. In the case of the US military they were only studying the planes which had returned to base following conflict i.e. the survivors. In other words what their diagram of bullet holes actually showed was the areas their planes could sustain damage and still be able to fly and bring their pilots home. (Thomas 2019)
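Wald’s reasoning can be mimicked with a small Python simulation. This is a hypothetical sketch, not a model of real wartime data: hits land uniformly at random on four made-up sections, and an engine hit is assumed to be far more likely to bring a plane down than a hit elsewhere. The section names, loss probabilities, and the `fly_missions` helper are all my own assumptions. Comparing where all planes were hit with where the returning planes were hit shows the survivors underreporting engine damage.

```python
import random
from collections import Counter

SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Hypothetical chance that a single hit to each section brings the plane down.
LOSS_PROB = {"engine": 0.6, "fuselage": 0.05, "wings": 0.05, "tail": 0.05}


def fly_missions(n_planes, hits_per_plane, seed=1943):
    """Return (hit counts on all planes, hit counts on planes that returned)."""
    rng = random.Random(seed)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Assumption: every hit is equally likely to land on any section.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        shot_down = any(rng.random() < LOSS_PROB[h] for h in hits)
        if not shot_down:
            survivor_hits.update(hits)  # only these planes get inspected
    return all_hits, survivor_hits


def shares(counter):
    """Fraction of hits that landed on each section."""
    total = sum(counter.values())
    return {s: counter[s] / total for s in SECTIONS}


all_hits, survivor_hits = fly_missions(n_planes=10_000, hits_per_plane=3)
all_share, surv_share = shares(all_hits), shares(survivor_hits)
for section in SECTIONS:
    print(f"{section:9s} all planes: {all_share[section]:.0%}   "
          f"returning planes: {surv_share[section]:.0%}")
```

Although every section is hit about equally often across all planes, engine hits make up only about an eighth of the damage on the planes that come home, because the planes hit in the engine are disproportionately the ones that never return. Reading the survivors’ damage map at face value would point the armour at exactly the wrong places.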

What Reep saw was goals. He fixated on them rather than on the big picture and fell into the trap of reaching the first and most obvious conclusion instead of exploring the structure of the game. Sometimes prior experience is very useful, and not everything new is golden.

Works Cited

Paine, Neil. 2016. "How One Man’s Bad Math Helped Ruin Decades Of English Soccer." FiveThirtyEight. October 27. Accessed December 24, 2019. https://fivethirtyeight.com/features/how-one-mans-bad-math-helped-ruin-decades-of-english-soccer/amp/?__twitter_impression=true&fbclid=IwAR0MNCiSu4nJIcGYvW5dRoTif1mzNc6MJzo8c-AFLU-mDWqZgWOCnT75tIw.
Paine, Neil. 2014. "What Analytics Can Teach Us About the Beautiful Game." FiveThirtyEight. June 12. Accessed December 24, 2019. https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/.

Thomas, James. 2019. "Survivorship Bias." McDreeamie Musings. April 1. Accessed December 28, 2019. https://mcdreeamiemusings.com/blog/2019/4/1/survivorship-bias-how-lessons-from-world-war-two-affect-clinical-research-today.

Friday, December 27, 2019

Volleyball Coaching Life-The Last Match

As I watch Stanford Volleyball’s recording of the women’s team celebrating its national championship win, I am struck by the emotions etched on the faces of the players and coach Kevin Hambly. For some it was unadulterated joy; for others, particularly libero Morgan Hentz, it was a look of desperate sadness. In the audio, almost all the players commented on the sadness of seeing their four years end, a sadness that came with the recognition that this was the last time this team, this particular alchemy of people, would ever play together. Ever. The finality of the thought is brutal but honest.

However, it is Morgan’s demeanor, her human response to that finality, that captured my thoughts about the reasons for coaching, or at least the most important reason. Her tear-streaked face paired with her big, broad smile captures the mix of emotions that had enveloped her. Her absolute honesty and integrity made me reflect on this moment, fraught as it is with conflicting feelings.

This scene plays out all over the country at the end of the fall, as high school and college teams end their seasons. At least half of them end in defeat, so they don’t experience the euphoria the Stanford team felt at that moment in PPG Paints Arena in Pittsburgh; only a very few get to do that. Johns Hopkins in Division III, Cal State San Bernardino in Division II, and Marian in the NAIA all get to do the celebratory dance, as do the junior college champions. No doubt their celebrations are joyous and over the top.

But the sense of sadness, the finality of the last match, hits every team regardless of winning or losing; we just get to see Morgan express her loss publicly. No doubt there are heartfelt expressions of love and loss in the locker rooms, both the winning and the losing ones. No doubt there are coaching staffs sitting stoically in the arena seats, processing the meaning of the last match and the sense of loss that has finally hit them now that the adrenaline of the match has worn off. No doubt players, staff, and coaches are feeling the weight of regret for things left unsaid, acts of friendship left unperformed, love unexpressed, hugs unhugged. For those lucky enough to win their last match together, it is a mixture of happiness, gratitude, sadness, and regret. For those who lose their last match together, it is pangs of goals unmet and missions unaccomplished, mixed with the sadness and regret. The common denominator is the sadness and regret. From the team that did not win a match all season to the team that did not lose a match all season, the common denominator is the team, with all the adjectives that inadequately describe the meaning of the word team.

Coaches try to build teams from day one. They preach about family, they urge the players to have each other’s backs, they cajole them to be vulnerable with each other, and they think up ridiculous exercises to motivate the mélange of players to bond into a team. All to capture that magical alchemy called a cohesive team. Some think that team chemistry is a formula, a recipe: if we give them an opportunity to do this or that, then at the end we will have a team. I am much more romantic than that. Each team is much greater than the sum of its parts, but the parts are important. There are as many disparate personalities, temperaments, cultures, logic, and mindsets as there are players, and the job of melding them all into a strong, bonded collective seems next to impossible. Team-building tactics and activities do help the team progress toward its goals, but there is an element of magic, unpredictable and undetected, in all the interpersonal interactions that happen in a team. That magic must happen serendipitously; there are catalysts, but their effects are also uncertain. There is no way to replicate the magic year after year, and no way to capture it if you don’t have it. You sow the ground the best you can and then you hope for the best. Prepare the ground, make sure it is fecund, and then let it happen. Or not.

For coaches, watching the end of a chapter in your team or program is the ultimate test of your coaching philosophy. John Kessel used to ask beginning coaches what they were coaching, and he would play gotcha with them if they answered “volleyball.” “NO!” he would bellow, scaring the dickens out of the group, “you don’t coach volleyball, you coach people!” It is because we coach people that we value, indeed treasure, a true team.

It is because we coach people that we volleyball coaches are so touched and moved by the elation and sadness of the scene in PPG Paints Arena. We don’t do this to win matches. The extrinsic rewards are obviously fantastic, but we do it for the intrinsic rewards, rewards we enjoy in the privacy of our minds and hearts, rewards that are inexpressible to those who have not been where we have been. We do it for so many human and emotional reasons, and the real rewards come from witnessing and experiencing our teams become one and revel in one another’s presence. You don’t need to win the national championship to experience that euphoria and love.