2020 Research Paper Finalists & Posters

THE 2020 RESEARCH PAPER COMPETITION IS PRESENTED BY MLB

We are proud to announce the 9 Research Paper Finalists and 13 Posters – including two by the winners of our first-ever high school competition – selected for the 2020 MIT Sloan Sports Analytics Conference:

2020 MIT SLOAN SPORTS ANALYTICS CONFERENCE Research Paper

Description

Download Full Paper Here

Authors:

Evan Munro, PhD Student, Stanford University
Martino Banchio, PhD Student, Stanford University

Abstract:

Canonical tournament theory structures prizes that are decreasing in rank. In many practical settings, however, the lowest-ranked individuals receive a prize that is higher than that of middle-ranked individuals. In the major U.S. sports leagues, the most valuable new eligible players are allocated through a draft. To ensure long-term competitiveness of sports franchises, draft picks are allocated with a higher probability to the worst teams in a league. This causes some teams to exert less effort later in the season to secure a valuable draft pick with higher probability. We derive a theoretical model of team decision-making, and we prove that any lottery based on end-of-season rankings that does not treat all non-playoff teams equally will provide incentives for some teams to lose intentionally. We relax the constraint that the rule depends on the final rankings only, and design a lottery that eliminates tanking, favors the worst teams in a season and is optimal in a restricted class of mechanisms. We show the benefits of this new mechanism in simulations and empirically using data from the National Basketball Association.

Description

Download Full Paper Here

Author:

Neil Johnson, Sports Analytics Developer, ESPN

Abstract:

This research paper looks at the feasibility of parsing player tracking data from a single non-stationary camera like the one typically used in sports broadcasts. Parsing player tracking data from broadcast video opens up a plethora of applications that allow the technology to be scaled downwards and across sports, in addition to capturing events that could otherwise not be captured. This approach uses an array of open source computer vision applications including pose estimation and template matching. Early tests show that this new method places players within a foot of their true location with 94.5% accuracy. Making player tracking data more accessible lowers the barrier to entry and increases the timespan for which advanced methods of analysis can be practiced. Additionally, the pose estimation data itself provides a new frontier of data analysis that can increase the fidelity of analysis that relies on player tracking data.

Description

Download Full Paper

Author:

Michael Horton, Machine Learning Researcher, Sportlogiq

Abstract:

The introduction of player tracking systems to professional sports has generated large datasets capturing player movement and critical events that occur during games.  The availability of such data, along with a growing appreciation of the role of analytics in evaluation and decision-making in sports, has created the conditions where analysis of player and team performance—as realised in the individual and group movement of the players—is now possible.

Concurrently, advances in machine learning methods and systems, in particular deep learning, have demonstrated success in a broad range of tasks. However, there is a fundamental mismatch between player tracking data and the implied input format for standard deep-learning architectures. To date, this issue has been dealt with by feature engineering, where raw tracking data is preprocessed into a format suitable for the deep-learning model. However, such feature engineering is time-consuming, requires significant domain knowledge, and inevitably results in information inherent in the tracking data being discarded.

In this paper, we present a flexible neural network framework that accepts raw trajectory data as input, without the need for any feature engineering or preprocessing, and can be used for a wide variety of sports analytic tasks.  We show that the framework works well on several football prediction tasks when using the player tracking data from the 2019 NFL Big Data Bowl as input: predicting the success rate, location and air-yards of passing plays; and predicting the tackler, tackle location and yards gained on running and passing plays.

Description

Download Full Paper Here

Authors:

James Zhan, Undergraduate Student, Goergen Institute of Data Science, University of Rochester
John Polimeni, Undergraduate Student, Goergen Institute of Data Science, University of Rochester
Luke Gerstner, Undergraduate Student, Goergen Institute of Data Science, University of Rochester

Abstract:

On July 10th, 2019 the Atlantic League had the first game ever called by a robotic umpire. While commissioner Rob Manfred said that there was no timeline for when the technology will be used in the majors, many people think the change will happen eventually. The purpose of our paper is to measure the impact that robotic umpires will have on the outcomes of games and individual at-bats. This question is important to the industry because it is necessary to see the possible impact that this new way of umpiring will have on the game. This is useful for teams and the rules committee. The results of this paper help the rules committee see how this possible change may impact the game. They are also valuable to teams because if robotic umpires impact one aspect of the game more than another, then teams may be able to gain an advantage over other teams when constructing their rosters.

We present a state-of-the-art metric which we call RE12, based on the widely known RE24, to effectively quantify the effect of umpire missed calls. RE12 measures the run expectancy conditional on the 12 unique pitch counts. We then provide analysis based on our numerical experiments and discuss the impact that missed calls have on key areas such as hitters, pitchers, and catchers.
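As a rough illustration of the idea behind a count-based run expectancy, the sketch below averages the runs scored after each of the 12 ball-strike counts and prices a missed call as the gap between the two counts the pitch could have led to. The input format, the function names, and the definition of the cost of a missed call are assumptions made for this example; the abstract does not specify how the authors compute RE12, and the toy numbers are invented.

```python
from collections import defaultdict


def run_expectancy_by_count(pitches):
    """Average the runs scored after each (balls, strikes) count.

    `pitches` is a hypothetical input: an iterable of
    (balls, strikes, runs_after) tuples, one per pitch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # count -> [run total, observations]
    for balls, strikes, runs_after in pitches:
        if not (0 <= balls <= 3 and 0 <= strikes <= 2):
            raise ValueError("count must be one of the 12 ball-strike states")
        entry = totals[(balls, strikes)]
        entry[0] += runs_after
        entry[1] += 1
    return {count: total / n for count, (total, n) in totals.items()}


def missed_call_cost(re12, balls, strikes):
    """Run value lost by the batting side when a true ball is called a strike.

    Assumed definition: RE of the count that should have followed minus
    RE of the count that did follow. Counts that would end the plate
    appearance (3 balls or 2 strikes) are left out of this sketch.
    """
    if balls >= 3 or strikes >= 2:
        raise ValueError("terminal transitions are not handled here")
    return re12[(balls + 1, strikes)] - re12[(balls, strikes + 1)]


# Invented toy observations, only to exercise the functions.
toy_pitches = [
    (1, 1, 0.0), (1, 1, 1.0),
    (2, 1, 1.0), (2, 1, 0.0), (2, 1, 2.0),
    (1, 2, 0.0), (1, 2, 1.0),
]
re12 = run_expectancy_by_count(toy_pitches)
cost_at_1_1 = missed_call_cost(re12, balls=1, strikes=1)
```

A fuller version would also need run values for walks and strikeouts so that calls made with three balls or two strikes could be priced.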

Description

Download Full Paper Here

Authors:

Samuel Kalman, Purdue University Undergraduate
Jonathan Bosch, Syracuse University Undergraduate

Abstract:

Basketball has recently been considered more of a “position-less” sport. Most NBA players have skills, styles, and tendencies that cannot be defined by a single traditional position. With NBA player data that accounts for a player’s efficiency, opportunity, and tendencies, we were able to implement unsupervised machine learning techniques to create a framework for how NBA playing styles can be clustered on the court and used for strategic decision making when building rosters and creating lineup rotations. These player clusters are considered new positions, and they give more accurate and detailed insight into what role a player possesses when on the court and how effective he could be in that role. Our unique methods give players a soft assignment for all clusters, that is, a probabilistic weighting onto each of the clusters indicating their likelihood of specific cluster fit. After analyzing the distribution of the various player stats within each new position, we were able to generate a player role for each cluster.

We present a more specific way to consider player types and positions in the NBA, while providing insight into which combinations of player types yield the most effective basketball performance. Our models also contain a predictive component where we can predict the net rating of a potential lineup. As we have recently witnessed a massive change in playing style by most of the NBA, we offer a more accurate approach for analyzing and understanding the roles, responsibilities, and combinations of specific groups of players in the NBA.
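To make the notion of a soft assignment concrete, here is a minimal sketch that turns a player's distance to each cluster centre into weights that sum to one. It uses a softmax over negative squared distances, which is one common way to obtain such weights and is not claimed to be the authors' method; the function name, the `temperature` parameter, and the two-feature example are all assumptions.

```python
import math


def soft_assignments(point, centroids, temperature=1.0):
    """Return one weight per centroid; the weights are positive and sum to 1.

    Closer centroids receive larger weights. `temperature` (hypothetical
    parameter) controls how sharply the weights concentrate on the
    nearest centroid.
    """
    scores = [
        -sum((p - c) ** 2 for p, c in zip(point, centroid)) / temperature
        for centroid in centroids
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Invented example: a player described by two standardized features,
# scored against two hypothetical cluster centres.
balanced = soft_assignments([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
leaning = soft_assignments([0.8, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
```

A player equidistant from two centres is split evenly between them, while a player near one centre is weighted mostly toward it.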

Description

Download Full Paper Here

Authors:

Lotte Bransen, Lead Data Scientist at SciSports, Netherlands
Jan Van Haaren, Chief Product & Technology Officer at SciSports and Research Fellow at KU Leuven, Belgium

Abstract:

Soccer scouts typically ignore the team balance and team chemistry when evaluating potential signings for their teams. Instead, they focus on the individual qualities of the players in isolation. To overcome this limitation of their recruitment process, this paper takes a first step towards objectively providing insight into the question: How well does a team of soccer players gel? We address that question in both an observational and a predictive setting. In the former setting, we observe the chemistry between players who have actually played together, which is relevant when selecting the best possible line-up for a match. In the latter setting, we predict the chemistry between players who have never played together before, which is particularly relevant to assess the fit of a potential signing with the players who are already on the team.

We introduce two chemistry metrics that measure the offensive and defensive chemistry for a pair of players, respectively. The offensive chemistry metric measures the pair’s joint performance in terms of scoring goals, whereas the defensive chemistry metric measures their joint performance in preventing their opponents from scoring goals. We compute our metrics for 361 seasons in 106 different competitions and present a number of concrete use cases. For instance, we show that the partnership between Mohamed Salah and Roberto Firmino in Liverpool’s 2017/2018 Champions League campaign exhibited the highest mutual chemistry between two players. Furthermore, we show that Mesut Özil’s chemistry started declining rapidly following Alexis Sánchez’s departure to Manchester United in 2018.

Description

Download Full Paper Here

Authors:

Daniel F. Stone, Associate Professor of Economics, Bowdoin College
Brian Mills, Associate Professor, Department of Kinesiology & Health Education, University of Texas at Austin
Duncan Finigan, Associate Product Manager, Toast / Bowdoin College, 2018

Abstract:

We address a fundamental baseball decision: when to make the “call to the bullpen” and pull the starting pitcher. The limited prior literature on this topic found that pulling starters earlier tends to reduce runs allowed in the current inning. We use a simple theoretical model to show that this result is consistent with win maximization and does not necessarily imply managerial bias. 

We then use data from the 2008-2017 seasons to estimate the effects of pulling the starter on both runs allowed in the current inning and win probability. We argue that the decision to pull the starter is plausibly quasi-random conditional on the large set of included covariates, but we acknowledge the lack of true randomization.

Our estimated effect of pulling the starter on runs allowed in the inning is indeed negative, but the effect on win probability is a precise zero. We examine how these effects vary by game situation, including a measure of lucky hitting performance, and use an alternative measure of managerial quality (a moving average of “Manager of the Year” votes), and find only scattered and highly limited evidence of biases. 

We interpret the results to imply that call-to-the-bullpen decisions are approximately Bayesian-optimal. However, there was a steady downward trend in the mean inning in which starters were pulled over a period of decades prior to our sample time-frame. Thus, managers appear to have learned to optimize, but at a very slow pace.

Description

Download Full Paper Here

Author:

Michael J. Hasday, Lecturer, DeSales University

Abstract:

MLB uses final offer arbitration (FOA) to set the salaries of certain players. In FOA, the team and the player each submit a salary number, and the arbitrator (in MLB’s case, a panel of three arbitrators) is required to select one of the numbers as the award. The rationale for FOA is that it incentivizes each party to submit a reasonable number so that it will be selected by the arbitrator, and if the submitted numbers are closer, settlement is more likely.

Although FOA has historically worked well in MLB, players have been critical of the process in recent years as teams have gained the upper hand.  The problem is that FOA results in too much variance in the awards.  This high variance disadvantages players, who are less willing than teams to take risk.  

I propose variations of FOA, modeled after the “Running it Twice” poker procedure, that help level the playing field by greatly decreasing the variance in the awards.  Double-Header Baseball Arbitration plays out like regular FOA, except that two arbitrators independently decide which of the parties’ numbers to award.  If both agree on a number, then that is the award.  If they disagree, then the award is the midway point between the two parties’ numbers.  In another variation, Triple-Header Baseball Arbitration, three arbitrators independently decide.  If all three agree on a number, then that is the award; but, if the arbitrators split 2-1, the award is set at the applicable two-thirds point between the parties’ numbers. 
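The award rules described above can be sketched in a few lines. The helper below places the award between the two offers in proportion to the arbitrators' votes, which reproduces the midpoint rule for two arbitrators and the two-thirds rule for three. Reading "the applicable two-thirds point" as the point closer to the majority's number is an assumption, as are the function names and the example figures. The second helper estimates the variance of the award under the further assumption that each arbitrator picks the player's number independently with the same probability, which is a simplification and not a claim from the paper.

```python
from math import comb


def multi_header_award(team_offer, player_offer, votes_for_player, n_arbitrators):
    """Award placed between the offers in proportion to the votes cast.

    n_arbitrators=1 is regular FOA, 2 is the Double-Header variant and
    3 is the Triple-Header variant as read from the abstract.
    """
    if not 0 <= votes_for_player <= n_arbitrators:
        raise ValueError("votes_for_player must be between 0 and n_arbitrators")
    share = votes_for_player / n_arbitrators
    return team_offer + (player_offer - team_offer) * share


def award_variance(team_offer, player_offer, p_player, n_arbitrators):
    """Variance of the award if each arbitrator independently sides with
    the player with probability p_player (a modelling assumption)."""
    outcomes = []
    for k in range(n_arbitrators + 1):
        prob = comb(n_arbitrators, k) * p_player**k * (1 - p_player) ** (n_arbitrators - k)
        award = multi_header_award(team_offer, player_offer, k, n_arbitrators)
        outcomes.append((prob, award))
    mean = sum(prob * award for prob, award in outcomes)
    return sum(prob * (award - mean) ** 2 for prob, award in outcomes)


# Invented offers in millions of dollars: team 4.0, player 6.0.
split_double = multi_header_award(4.0, 6.0, votes_for_player=1, n_arbitrators=2)
split_triple = multi_header_award(4.0, 6.0, votes_for_player=2, n_arbitrators=3)
variances = [award_variance(4.0, 6.0, 0.5, n) for n in (1, 2, 3)]
```

Under these assumptions the variance shrinks in proportion to the number of arbitrators, which is the effect the proposal aims for. Correlated arbitrator decisions would reduce the variance by less.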

Description

Download Full Paper Here

Authors:

Dr Will Gürpınar-Morgan, Senior Data Scientist, Stats Perform
Dr Daniel Dinsdale, Data Scientist, Stats Perform
Dr Joe Gallagher, Data Scientist, Stats Perform
Aditya Cherukumudi, Artificial Intelligence Scientist, Stats Perform
Dr Patrick Lucey, Chief Scientist, Stats Perform

Abstract: 

The ability to predict what shot a batsman will attempt given the type of ball and match situation is both one of the most challenging and strategically important tasks in cricket.

The goal of the batsman is to score as many runs as possible without being dismissed, whilst the goal of the bowler is to stem the flow of runs and ideally to dismiss the opponent. Getting the best batsman vs bowler match-up is of paramount importance. For example, for the fielding team, the choice of bowler against the opposition star batsman could be the key difference between winning and losing. Therefore, the ability to have a predefined playbook (as in the NFL) which would allow a team to predict how best to set their fielders given the context of the game, the batsman they are bowling to and the bowlers at their disposal would give them a significant strategic advantage.

To this end, we present a personalized deep neural network approach which can predict the probabilities of where a specific batsman will hit a specific bowler and ball type, in a specific game scenario. We demonstrate how our personalized predictions provide vital information to inform the decision-making of coaches and captains, both in terms of pre-match and in-game tactical choices, using the 2019 World Cup final between England and New Zealand as a case study.

2020 Research Paper Posters

The 2020 Research Paper posters selected for the Conference are listed below.

Description

Download Full Paper Here

Author:

Brian Lehman, Co-Founder & VP, Visionist, Inc.

Abstract:

Everyone attempts to determine which college football players will succeed at the next level. This paper takes a novel approach to projecting true NFL potential. Our unique foundation comes from an adaptation of the proven Elo rating system. We create player ratings that evaluate game performance while also accounting for strength of opponent. Over the last ten draft years, our college level player Elo ratings alone identified players whose performance value in the NFL was roughly equivalent to that of those drafted. We use our player performance curves as the basis for projecting NFL potential.

Nowhere is a draft pick more salient than when looking for a franchise player. A great franchise quarterback is indispensable. Conversely, a poor first round choice can be catastrophic to an organization. Early picks are costly. Even with the high stakes and considerable resources invested in getting it right, the NFL draft is still a gamble. Skill levels vary widely across college football and the sample size to evaluate talent is small. This makes predicting future NFL performance difficult.

This paper introduces a player rating system that helps level the skill-diverse college football landscape. Our normalized player performance metrics enable quantitative player comparisons previously unavailable. Our approach generates player performance curves from these game-by-game metrics, providing a visual representation of a player’s career progression. We identify players with a similar development experience and use them as a model to project future performance.
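For readers unfamiliar with Elo, the sketch below shows the standard two-competitor update on which such adaptations are built: a rating moves in proportion to the gap between the actual result and the result expected from the rating difference, so beating a strong opponent earns more than beating a weak one. This is the textbook chess formulation, not the paper's player-level adaptation; the scale of 400 and the K-factor of 32 are conventional defaults chosen here for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability-like expected result for A against B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one contest.

    score_a is 1.0 for a win by A, 0.5 for a draw and 0.0 for a loss.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two evenly rated competitors; A wins.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0)

# An underdog gains more from a win than a favourite does.
underdog_gain = update_ratings(1400.0, 1600.0, 1.0)[0] - 1400.0
favourite_gain = update_ratings(1600.0, 1400.0, 1.0)[0] - 1600.0
```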

Description

Download Full Paper Here

Authors:

Gabin Rolland, Research Assistant, LIRIS – École Centrale de Lyon
Romain Vuillemot, Assistant Professor, LIRIS – École Centrale de Lyon
Wouter Bos, PhD, CNRS – École Centrale de Lyon
Nathan Rivière, Student, École Centrale de Lyon

Abstract:

Understanding the characteristics of 3-point shots is paramount for modern basketball success, as 3-point shots have become more prevalent in the NBA in recent decades. They accounted for 33.6% of total shots in 2017-2018, compared to only 3% in 1979-1980. In this paper, we aim to better understand the connections between the type of 3-point shooting (catch-and-shoots and pull-ups) and the timing of shooting, using two distinct space-time models of player motion. Those models allow us to identify individual behavior as a function of specific defensive settings, e.g. shot behavior when a player is guarded closely. We assess our models using SportVU data for specific NBA players. Our code is open-source to support further research and application of those models.

Description

Download Full Paper Here

Authors:

Vincent Dumoulin, Institute of Statistics, Université de Neuchâtel
Hugues Mercier, Institutes of Computer Science and Mathematics, Université de Neuchâtel

Abstract:

Figure skating has had its share of judging controversies in the last twenty years. The most recent is the suspension of two Chinese judges suspected by the International Skating Union (ISU) of preferential marking in favor of Chinese skaters during the 2018 PyeongChang Olympic Winter Games. In this work we develop novel mathematical techniques to monitor the accuracy and nationalistic bias of figure skating judges. This is fundamental to guarantee a level playing field in this sport. Our analysis reveals systemic nationalistic bias, and although both suspended Chinese judges were undoubtedly biased, they were far from the only ones, nor were they the worst offenders. We also shed light on the current ISU monitoring practices and propose recommendations moving forward.

Description

Download Full Paper Here

Authors:

Clay Graham, Chief Analytical Architect, Advantage Analytics LLC
Candace Graham, President/Owner, Epic Win Group LLC
Sola Talabi, Lead Nuclear Engineer, Pittsburgh-Technical

Abstract: 

With the repeal of the “Professional and Amateur Sports Protection Act”, betting on athletic competition in the United States is now taking on a modicum of legitimacy. Our focus is investing in baseball games. At the heart of sports betting is calculating the probability of winning the game in general and the bet in particular. Accurately quantifying matchups is essential for prediction consistency and reproducibility. Concurrently, sizing the stake (bets) is the foundation for enhancing profitability. Incorporating OPS and multiple methods of calculating the probability of winning generates very accurate assessments of the outcome of the games. Effective sizing of the investments (bets), by combining these probabilities with game characteristics, generates prediction accuracy of over 62% while growing profits per bet from 9% to in excess of 15%.

Description

Download Full Paper Here

Author:

Meredith J. Wills, Contributor, The Athletic

Abstract:

The baseball is the one piece of equipment that is used in every game by every player. Since the 1960s, it has generally been consistent…until now. Recently, we have seen a veritable epidemic of changes to the Major League baseball, breaking records and introducing an unprecedented level of unpredictability. Beginning in the second half of the 2015 season, home run rates began to rise, hitting an all-time high in 2017. After a small correction in 2018, a new ball was introduced in 2019 that produced even more offense, topping the 2017 home run record by 11%. Then, at the start of the 2019 MLB postseason, the ball inexplicably deadened. Three changes in five seasons is, to say the least, uncharacteristic. Here, I examine baseballs from four time periods (pre-2014, 2016-2018, the 2019 regular season, and the 2019 postseason) and find that each shows construction differences.

  • 2016-2018: Increased lace thickness
  • 2019 regular season:
    • Smoother leather
    • Flatter seams
    • Greater spherical symmetry
    • Decreased lace thickness
  • 2019 postseason: reintroduction of pre-2019 inventory

In this poster, I offer hypotheses for the sources of these differences and how they may have impacted offense. I also consider what to expect for the 2020 season.

Description

Download Full Paper Here

Authors:

Eric Eager, Pro Football Focus
George Chahrouri, Pro Football Focus

Abstract:

Player valuation is one of the most important problems in all of team sports.  In this paper we use Pro Football Focus (PFF) player grades and participation data to develop a wins above replacement (WAR) model for the National Football League.  We find that PFF WAR at the player level is more stable than traditional, or even advanced, measures of performance, and yields dramatic insights into the value of different positions.  The number of wins a team accrues through its players’ WAR is more stable than other means of measuring team success, such as Pythagorean win totals.  We finish the paper with a discussion of the value of each position in the NFL draft and add nuance to the research suggesting that trading up in the draft is a negative-expected-value move.

Description

Download Full Paper Here

Authors:

Ambra Mazzelli, MIT Sloan and Asia School of Business
Robert S. Nason, Concordia University

Abstract:

We draw on a rich behavioral science tradition to infuse theoretical grounding to NBA player benchmarking by examining how performance feedback impacts player outcomes. We contend that choice of benchmark (i.e. self, team average, or peer rivals) not only impacts assessment (i.e. is a player under- or over-performing), but also how performance feedback is interpreted by players and thus manifests into subsequent player outcomes (i.e. risk taking, errors, and +/-). Performance relative to a player’s own past performance (self) is likely to be attributed to effort while performance relative to team average is attributed to social standing, and performance vis-a-vis rival peers to ability. As a result, responses to performance feedback will depend on the valence of performance feedback (i.e. over or underperforming). In particular, we suggest that while performance feedback from self comparison increases motivation when underperforming and decreases motivation when overperforming, the opposite is true for performance feedback involving social comparisons – players will feel demotivated when underperforming and motivated when outperforming their social referents. In support of our arguments, we find empirical support that self underperformance increases subsequent risk-taking, errors, and +/-, while self overperformance feedback has no effect on risk-taking, errors, and +/-. In contrast, outperforming team average increases subsequent risk-taking, errors, and +/- while underperforming team average reduces subsequent risk-taking, errors, and +/-.  These findings have important implications for when and how to share analytics with players. In particular, we develop a practical tool for providing performance feedback selectively, depending on referent and desired performance change.

Description

Download Full Paper Here

Authors:

Ovunc Yilmaz, Assistant Professor of IT, Analytics, and Operations, Mendoza College of Business, University of Notre Dame
Hayri Alper Arslan, Weatherall Postdoctoral Fellow, Department of Economics, Queen’s University
Necati Tereyagoglu, Assistant Professor of Management Science, Darla Moore School of Business, University of South Carolina

Abstract:

Event organizers are moving from fixed to variable pricing. Although this is theoretically shown to enable organizers to respond to changing demand across events, reports point to somewhat limited implementation due to the unpredictable nature of the popularity of an event and to the unaccounted-for dynamics of the resale market. In this paper, we study the implications of a switch to variable pricing using quasi-experimental data from the National Football League. Applying a difference-in-differences technique with propensity-score weighting, we find that teams that switched to variable pricing sold 2.95% more tickets per game through the primary market. We provide suggestive evidence that this positive effect is due to the quality-signaling nature of variable pricing for price-sensitive customers. Specifically, we find that variable pricing resulted in higher primary market sales at (i) games in hometowns with lower income levels and higher income diversity, and (ii) unattractive games. We also explore whether variable pricing led to any negative effects through the resale market. With variable pricing, although the number of ticket listings in the resale market went up for unattractive games, customers did not list their tickets at lower prices. This indicates that variable pricing did not lead to cannibalization from resale markets. For attractive games, the minimum listing price in the resale market increased. This shows that the display of popularity through teams’ higher prices increased the option-value for these games, and explains why the primary market ticket sales remained steady for attractive games, even after the increase in prices.
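The core of a difference-in-differences estimate can be shown with a two-group, two-period sketch: the change observed among teams that switched pricing, minus the change among teams that did not. The version below accepts optional weights for the comparison group, standing in for the propensity-score weights mentioned above. Estimating those weights is outside this sketch, and the function names and ticket figures are invented for illustration; the authors' actual specification is not described in the abstract.

```python
def weighted_mean(values, weights=None):
    """Mean of `values`, optionally weighted; equal weights by default."""
    if not values:
        raise ValueError("values must not be empty")
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def diff_in_diff(treated_pre, treated_post, control_pre, control_post,
                 control_weights=None):
    """Two-by-two difference-in-differences estimate.

    `control_weights` (hypothetical input) reweights comparison units,
    for example with weights derived from estimated propensity scores.
    """
    treated_change = weighted_mean(treated_post) - weighted_mean(treated_pre)
    control_change = (weighted_mean(control_post, control_weights)
                      - weighted_mean(control_pre, control_weights))
    return treated_change - control_change


# Invented per-game ticket sales (in thousands) for two teams per group.
effect = diff_in_diff(
    treated_pre=[100.0, 102.0], treated_post=[106.0, 108.0],
    control_pre=[90.0, 92.0], control_post=[93.0, 95.0],
)
```

The estimate only has a causal reading if the two groups would have followed parallel trends without the pricing change, which is the assumption the weighting is meant to make more plausible.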

Description

Download Full Paper Here

Authors:

Francisco Peralta Alguacil, Football Data Scientist, Hammarby IF
Pablo Piñones Arce, Hammarby IF
David Sumpter, Professor of Applied Mathematics, Uppsala University
Javier Fernández, Head of Sports Analytics, FC Barcelona

Abstract:

Soccer has some of the most complex team movement patterns of any team sport. Recently, several measurements have been proposed for evaluating the value of dribbles, passes or shots. The next step is to automatically identify the alternative actions available to players both on and off the ball. 

We address this challenge by building a ‘self-propelled player’ model, simulating attacking roles by maximizing three criteria: pass probability, pitch impact and pitch control. The model assumes that players can anticipate the movement of the other players on the pitch a few seconds into the future and maximize the future value of their position. We compared these simulations to player decisions during matches by the top-flight men’s teams of Hammarby IF and FC Barcelona. In simulations, we found that the two or three players nearest to the ball tended to optimize the product of pass probability and pitch impact.

In a first-team coaching intervention at Hammarby, players re-watched attacking situations in which they had been involved, and were asked to discuss their own actions in comparison with the model. The players often agreed that the model captured complex game patterns, including off-ball actions. The model also recommended runs that the players hadn’t taken, which the players found realistic and which aided discussions. Despite the novelty of these discussions, the players showed a high willingness to engage with them. We further explored how these techniques can be used to provide automated feedback to players within the match cycle.

Description

Download Full Paper Here

Authors:

John M. Harris, Professor of Mathematics, Furman University
Elizabeth L. Bouzarth, Associate Professor of Mathematics, Furman University
Benjamin C. Grannan, Assistant Professor of Business and Accounting, Furman University
Andrew J. Hartley, Senior Mathematics Major, Furman University
Kevin R. Hutson, Professor of Mathematics, Furman University
Ella M. Morton, Senior Mathematics Major, Furman University

Abstract:

Defensive repositioning strategies (shifts) have become more prevalent in Major League Baseball in recent years. In 2018, batters faced some form of the shift in 34% of their plate appearances. Most teams employ a shift that overloads one side of the infield and adjusts the positioning of the outfield. In this work we describe an integer-programming approach to the positioning of players over the entire field of play. The model uses historical data for individual batters, and it leaves open the possibility of fewer than four infielders. The model also incorporates risk penalties for positioning players too far from areas of the field in which extra-base hits are more likely. Our simulations show that an optimal positioning with three infielders lowered predicted batting average on balls in play (BABIP) by 5.9% for right-handers and by 10.3% for left-handers on average when compared to a standard four-infielder placement of players.

Description

Download Full Paper Here

Authors:

Sergio Llana, Data Scientist, FC Barcelona
Pau Madrero, Data Scientist, FC Barcelona
Javier Fernández, Head of Sports Analytics, FC Barcelona

Abstract:

What should we do to win the next match? This is the most important question a coach can ask a game analyst, and its answer is much harder to find than the question is to formulate. In recent years, new approaches have been presented to address isolated aspects of the game (e.g. quality of shots, space control…), but we lacked tools to perform in-depth opponent scouting. The concepts we present make it possible to identify an opponent’s defensive weaknesses and to discover how to exploit them to gain a competitive advantage in the game.

First, we introduce the concept of off-ball advantage, which identifies when a player controls a valuable space in the field in such a way that passing the ball to that player would create a considerable increase in the long-term outcome of the possession. Then, diving deeper into the idea of exploiting spaces of value, we introduce a novel method to assign dynamic defensive areas to each defender. By doing so, we can relate the opponent’s spatial weaknesses to specific players. Finally, we attribute the long-term contribution of both on-ball and off-ball actions. We compute the effective value added of each action in order to highlight those players that contribute the most to the offensive contribution of a team, and those defenders that are responsible for significant defensive failures.

Description

Author:

Henry Serrano-Wu

Abstract:

Squash is played by over 20 million players in 185 different countries around the world. According to the International Health, Racquet & Sportsclub Association (IHRSA), squash is one of the three fastest-growing sports in the U.S., with participation increasing by 82% between 2007 and 2011. Despite this explosion in popularity, there has been scant exploration of squash from the sports analytics perspective. Only recently (in 2018) has the Professional Squash Association (PSA) partnered with Sports Data Labs, and the initial focus of this collaboration is to collect in-game biometric data of squash players to track player location. This poster analyzes the men’s final of the 2018 British Open between Miguel Rodriguez and Mohamed El Shorbagy. Akin to how shot charts are compiled for basketball, I track shot-by-shot placement to model each player’s tendency to hit straight drives, cross-court shots, and drop shots. By recording these sequentially, I acquire contextual data for shot selection, thus providing new insight into how players move each other “off the T” for strategic advantage. The resulting method, called S.Q.U.A.B.L. (Sequence- and QUadrant-Based Learning), provides compelling data to explain how Rodriguez (a.k.a. The Colombian Cannonball) finally defeated the world #1.

There are several applications of this research. Player-specific models can identify tendencies to improve individual performance or inform winning strategies for opponents. With additional data, these models could be extended to quantitatively describe the different styles used by squash players across the world.

Description

Author:

Brian Xu, The Kinkaid School, Student

Abstract:

What happens if two perfect volleyball teams face each other? Both teams would hit every serve in, and they would always side out against the other team, causing the set to continue forever as no team could get a two-point lead. By thinking about volleyball in this way, it becomes clear that a team must side out more efficiently than its opponent in order to win.

While side out efficiency is the most important metric in volleyball, how good a team is at passing, setting, and hitting and how good the opposing team is at serving, blocking, and digging affect how likely it is for a team to side out. We seek to determine how important each skill is to winning. To do so, we created multivariable regression models using independent variables that represented each skill and discarded the variables with low coefficient t values until only statistically significant predictors remained. We then picked the model with the lowest RMSE value and highest R-squared value.

We present a regression model that explains 95% of the variation in wins and differs from a team’s actual set win percentage by 2.84 percentage points on average. By performing min-max normalization on the data, we can compare coefficient confidence intervals to determine how important each skill is to winning. These insights enable coaches to develop more focused training sessions that will improve the skills most relevant to winning and to recruit players who possess the skills most relevant to winning.
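The min-max normalization step mentioned above rescales each predictor to the range from 0 to 1, so that regression coefficients fitted on different skills are expressed on a common scale and can be compared. A minimal sketch follows; the function name and the sample values are assumptions, and the regression itself is not reproduced here.

```python
def min_max_normalize(values):
    """Rescale a list of numbers so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant column")
    return [(v - low) / (high - low) for v in values]


# Invented hitting percentages for three teams.
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

Because every normalized predictor spans the same range, a coefficient can be read as the change in the outcome between the weakest and strongest team on that skill.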