2019 Research Paper Finalists & Posters

We are proud to announce our 8 Research Paper Finalists and 11 Posters selected for the 2019 Sloan Sports Analytics Conference:

2019 MIT SLOAN SPORTS ANALYTICS CONFERENCE Research Paper


Abstract: In 2011, SportVU fundamentally changed the way basketball can be analyzed. STATS SportVU used a six-camera system installed in basketball arenas to track the real-time positions of players at 25 frames per second. In this paper, we demonstrate how deep learning techniques can be applied to this data to produce a queryable database of basketball possessions.

We trained an unsupervised machine learning pipeline that generates a representation, called a trajectory embedding, of how an individual player moves on offense. The representation is a 32-dimensional vector of floating-point numbers that captures the semantics of a single player’s movement, such as the locations of its endpoints, screen actions, court coverage, and other spatial features. We generated nearly 3 million trajectory embeddings from three seasons of data (2013-2014, 2014-2015, and 2015-2016).

We found that the Euclidean distance between trajectory embeddings is an excellent indicator of the visual similarity of the movements they encode. For example, two different post-ups on the right block will have nearby embeddings, while a post-up on the right block and a screen action above the left wing will have distant embeddings. This result led to the Similar Possessions Finder, a queryable database of basketball possessions.

The Similar Possessions Finder can be used to quickly answer queries such as “How much more frequently did Andre Drummond establish position on the right block than on the left block during the 2015-2016 regular season?” and “Find all possessions from the 2014 playoffs in which Chris Paul ran a screen action in the high post that ended with DeAndre Jordan scoring.”
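
Because similarity is measured by Euclidean distance between embeddings, the core lookup behind a "find similar possessions" query can be sketched as a nearest-neighbor search. This minimal sketch assumes the 32-dimensional embeddings are already computed and stored in a dictionary keyed by a hypothetical possession ID; it uses brute-force search and random stand-in vectors, not the authors' pipeline or data.

```python
import math
import random

EMBEDDING_DIM = 32  # dimensionality stated in the abstract

def most_similar(query, embeddings, k=5):
    """Return the k possession IDs whose embeddings are closest to `query`.

    `embeddings` maps a hypothetical possession ID to a 32-float vector.
    Brute-force Euclidean search over every stored vector.
    """
    scored = [(math.dist(query, vec), pid) for pid, vec in embeddings.items()]
    scored.sort()
    return [pid for _, pid in scored[:k]]

# Usage with random stand-in vectors (not real SportVU data).
random.seed(0)
db = {f"poss_{i}": [random.gauss(0, 1) for _ in range(EMBEDDING_DIM)] for i in range(1000)}
print(most_similar(db["poss_42"], db, k=3))  # poss_42 itself is first (distance 0)
```

At the scale of roughly 3 million vectors, an approximate nearest-neighbor index is a common substitute for the linear scan shown here.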


Abstract: DeepQB is a proposed application of deep neural networks to player tracking data from over two full seasons of American professional football. This novel approach can model complex aspects of the passing game, most notably quarterback decision-making. It assesses and compares individual quarterbacks’ pass target selection based on the snapshot of receivers and defenders presented to the passer. Quarterback decision-making is assessed by comparing actual target selection to the target predicted by our model. The model performs well, correctly identifying the targeted receiver in 60% of cross-validated cases. When passers target the predicted receiver, passes are completed 74% of the time, compared with 55% when the QB targets any other receiver. This performance is surprisingly strong, given that the offense often conceals its intent by design, while the defense tries not to leave any single receiver open. Further, a quarterback’s passing skill, separate from that of his receivers and the defense, is isolated and assessed by comparing metrics of actual play success with the success predicted from the situation presented to the passer. This approach represents a new way for teams, media, and fans to understand and quantitatively assess quarterback decision-making, an aspect of the sport that has previously been opaque and inaccessible.
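
The agreement and completion-rate comparison described above can be sketched as follows. The sketch assumes a hypothetical per-play record holding the model's predicted target probability for each receiver, the receiver actually targeted, and whether the pass was completed; the neural network that produces the probabilities is not shown.

```python
from dataclasses import dataclass

@dataclass
class PassPlay:
    # Hypothetical record: model probability per receiver ID, actual target, outcome.
    target_probs: dict
    actual_target: str
    completed: bool

def predicted_target(play):
    """The receiver the model considers most likely to be targeted."""
    return max(play.target_probs, key=play.target_probs.get)

def summarize(plays):
    """Agreement rate plus completion rates split by agreement with the model."""
    agree = [p for p in plays if p.actual_target == predicted_target(p)]
    other = [p for p in plays if p.actual_target != predicted_target(p)]

    def rate(group):
        return sum(p.completed for p in group) / len(group) if group else float("nan")

    return {
        "agreement": len(agree) / len(plays),
        "completion_when_agree": rate(agree),
        "completion_otherwise": rate(other),
    }

plays = [
    PassPlay({"WR1": 0.6, "TE": 0.3, "RB": 0.1}, "WR1", True),
    PassPlay({"WR1": 0.2, "TE": 0.7, "RB": 0.1}, "WR1", False),
    PassPlay({"WR1": 0.5, "TE": 0.4, "RB": 0.1}, "WR1", True),
    PassPlay({"WR1": 0.1, "TE": 0.2, "RB": 0.7}, "RB", False),
]
print(summarize(plays))
```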


Abstract: What is the right way to think about analytics in soccer? Is the sport about measured events such as passes and goals, possession percentages, and distance traveled? While analytical work to date has focused primarily on these isolated aspects of the sport, coaches tend to focus on the broader tactical interplay of all 22 players on the pitch. Specifically, soccer analytics lacks a comprehensive approach that can address performance-related questions closer to the language of the game. This language poses questions such as “who adds more value?”, “how and where is this value added?”, “are teammates creating valuable space?”, “when and how should a backward pass be taken?”, “how risky is a team’s attacking strategy?”, and “what is a player’s decision-making profile?” These questions are currently unanswered in the soccer analytics literature.

We present a model that quantifies the expected outcome of a soccer possession at any time during the possession, driven by a fine-grained evaluation of the full spatiotemporal characteristics of the 22 players and the ball. The model is designed in a decoupled way that provides great interpretive power for both visual and quantitative analysis of game situations, allowing one to inspect the potential value of ball drives, shots, or passes to any location. Deep learning-based component models are built to capture the complex intricacies of spatiotemporal tactics, while a high-level stochastic process model fuses the component models together in a cohesive, interpretable way.

Throughout this paper, we present a wide range of practical applications that showcase the interpretive capacity of this model.
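
One way to read the decoupled design is as an expectation over the ball carrier's next action: the possession value is the probability of each action type (pass, ball drive, shot) times the expected outcome given that action. The sketch below assumes exactly that decomposition, with hypothetical numbers standing in for the deep learning component models, which are not reproduced here.

```python
ACTIONS = ("pass", "drive", "shot")

def expected_possession_value(action_probs, action_values):
    """Combine component models by the law of total expectation.

    action_probs: P(action | game state) for each action type (must sum to 1).
    action_values: E[outcome | action, game state] for each action type.
    Both are hypothetical stand-ins for learned component models.
    """
    total_p = sum(action_probs[a] for a in ACTIONS)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"action probabilities sum to {total_p}, not 1")
    return sum(action_probs[a] * action_values[a] for a in ACTIONS)

probs = {"pass": 0.70, "drive": 0.25, "shot": 0.05}
values = {"pass": 0.03, "drive": 0.02, "shot": 0.12}
print(expected_possession_value(probs, values))
```

Because each term is computed separately, one can inspect, say, the shot term on its own, which is the kind of interpretability the abstract emphasizes.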


Abstract: This paper takes a different approach to evaluating face-offs. Instead of looking at win percentage, the de facto measure of successful face-off takers for decades, it focuses on the game events that follow the face-off and on how directionality, clean wins, and player handedness play a significant role in creating value. The results show that not all face-off wins are created equal: some players consistently create post-face-off value through clean wins and by directing the puck to high-value areas of the ice. As a result, we propose an expected events face-off model, as well as a wins above expected model, that account for the value added on a face-off by directing the puck to specific areas of the ice in various contexts, and for the impact this has on subsequent game events.
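
The idea that two players with the same win percentage can create different value can be sketched with a simple value-above-expected tally. The sketch assumes each face-off is tagged with a hypothetical outcome category and that a league-wide value for each category has already been estimated from subsequent game events; the categories and values are placeholders, not figures from the paper.

```python
from collections import defaultdict

# Hypothetical league-average value of each post-face-off outcome category
# (e.g. expected shot-attempt differential over the following seconds).
OUTCOME_VALUE = {
    "clean_win_to_point": 0.12,
    "clean_win_to_wing": 0.08,
    "scrum_win": 0.03,
    "loss": -0.07,
}

def value_above_expected(faceoffs, baseline):
    """Per-player total of (outcome value - baseline) over their face-offs.

    faceoffs: iterable of (player, outcome_category) pairs.
    baseline: expected value of a face-off for an average taker.
    """
    totals = defaultdict(float)
    for player, outcome in faceoffs:
        totals[player] += OUTCOME_VALUE[outcome] - baseline
    return dict(totals)

log = [
    ("A", "clean_win_to_point"), ("A", "loss"), ("A", "clean_win_to_point"),
    ("B", "scrum_win"), ("B", "scrum_win"), ("B", "loss"),
]
league_baseline = sum(OUTCOME_VALUE[o] for _, o in log) / len(log)
print(value_above_expected(log, league_baseline))
```

Both hypothetical players win two of three draws, yet A's directed clean wins produce more value than B's scrums.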


Abstract: Baseball enthusiasts have debated how individual pitch characteristics, such as fastball velocity and curveball movement, affect pitching results. Studying these elements in isolation, however, fails to account for relationships among a pitcher’s different pitches. The relationships among all pitches in a pitcher’s arsenal, rather than the characteristics of a single pitch, are a larger determinant of success.

Model-based clustering techniques are applied to PITCHf/x data for over 2.5 million MLB pitches thrown by 402 pitchers from 2012 through 2017 to group each pitcher’s pitches by velocity, horizontal movement, and vertical movement. The range of these measurements, the distance between clusters, and the distribution of pitch types are used in several machine learning models to predict annual pitcher strikeout percentages. The best-performing model has a median absolute error of 2.47 percentage points relative to a pitcher’s actual strikeout percentage.

Maximum pitch velocity and a pitcher’s strike rate are the most important predictors of strikeout percentage.  Both the maximum amount of vertical movement and the range of vertical movement among a pitcher’s pitches are the next most important predictors of strikeout percentage—more so than the ability to change speeds or a pitcher’s number of pitch types.  These insights may enable players to develop pitch characteristics most likely to increase strikeouts, assist coaches in identifying promising young pitchers, and ultimately help determine when pitchers should be removed from games.
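
The arsenal-level features described above can be computed from clustered pitch data. This sketch assumes each pitch already carries a cluster label (the model-based clustering step itself is not reproduced) and uses hypothetical field names; it also shows the median absolute error metric used to compare models.

```python
import itertools
import math
import statistics

def arsenal_features(pitches):
    """pitches: list of dicts with 'velo' (mph), 'hmov', 'vmov' (inches), 'cluster'."""
    velos = [p["velo"] for p in pitches]
    vmovs = [p["vmov"] for p in pitches]

    # Centroid of each cluster in (velocity, horizontal, vertical) space.
    # Units are mixed and unscaled here; standardizing first is a reasonable option.
    by_cluster = {}
    for p in pitches:
        by_cluster.setdefault(p["cluster"], []).append((p["velo"], p["hmov"], p["vmov"]))
    centroids = [tuple(statistics.fmean(axis) for axis in zip(*pts)) for pts in by_cluster.values()]

    pair_dists = [math.dist(a, b) for a, b in itertools.combinations(centroids, 2)]
    return {
        "max_velo": max(velos),
        "vmov_max": max(vmovs),
        "vmov_range": max(vmovs) - min(vmovs),
        "n_pitch_types": len(centroids),
        "max_cluster_dist": max(pair_dists, default=0.0),
    }

def median_absolute_error(actual, predicted):
    """Median of |actual - predicted|, e.g. in strikeout-percentage points."""
    return statistics.median(abs(a - p) for a, p in zip(actual, predicted))

pitches = [
    {"velo": 95.1, "hmov": -6.0, "vmov": 10.2, "cluster": 0},
    {"velo": 94.7, "hmov": -5.5, "vmov": 9.8, "cluster": 0},
    {"velo": 82.3, "hmov": 4.0, "vmov": -5.1, "cluster": 1},
]
print(arsenal_features(pitches))
print(median_absolute_error([25.0, 18.0, 30.0], [23.0, 19.5, 33.0]))  # → 2.0
```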


Abstract: In the past decade, significant efforts have been made to understand injury risk in sport using subjective (e.g., rating of perceived exertion) and objective (e.g., inertial sensor outputs) player-monitoring strategies. Particular focus has been placed on the acute:chronic workload ratio (ACWR), defined as the ratio of average acute (1-week) to chronic (4-week) training loads. In the past 5 years, numerous academic papers across multiple sports have concluded that the ACWR is predictive of injury risk, and as a result, monitoring the ACWR has become standard practice in professional sports for managing player workloads.

In this paper, we demonstrate that causal conclusions about the ACWR-injury relationship are prone to confounding by schedule. We use Monte Carlo methods combined with training load data from two sports to illustrate the effect that the yearly training calendar has on the ACWR-injury relationship. We then propose options to mitigate this confounding. Our study contributes to the academic discourse around the ACWR and gives practitioners a more realistic expectation of its value in predicting injury.
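
The ACWR definition above translates directly into code. This sketch assumes daily load values and the common "coupled" variant, in which the 28-day chronic window includes the most recent 7 days; some studies instead use non-overlapping ("uncoupled") windows.

```python
def acwr_series(daily_loads, acute_days=7, chronic_days=28):
    """Acute:chronic workload ratio for each day with a full chronic window.

    Acute load = mean of the last `acute_days` daily loads; chronic load =
    mean of the last `chronic_days` (coupled: the acute week is included).
    Entry i corresponds to day i; days without enough history, or with
    zero chronic load, are None.
    """
    ratios = []
    for day in range(len(daily_loads)):
        if day + 1 < chronic_days:
            ratios.append(None)
            continue
        acute = sum(daily_loads[day + 1 - acute_days: day + 1]) / acute_days
        chronic = sum(daily_loads[day + 1 - chronic_days: day + 1]) / chronic_days
        ratios.append(acute / chronic if chronic > 0 else None)
    return ratios

# Three steady weeks followed by a week of doubled load.
loads = [100] * 21 + [200] * 7
print(acwr_series(loads)[-1])  # 200 / 125 = 1.6
```

Because both windows follow the training calendar, high ratios tend to occur at predictable points in the season (for example, after a preseason ramp-up), which is one plausible route by which schedule can confound the ACWR-injury relationship.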


Abstract: Coaches of professional sports teams are often credited or blamed for the success or failure of their teams, and they are compensated as if they are one of the most important features of a franchise. Although we have anecdotal evidence that coaches matter, the sports analytics literature has generally concluded that they do not. We present a new method for estimating coach effects, which we call Randomization Inference for Leader Effects, or RIFLE. We apply RIFLE to the MLB, NBA, NHL, NFL, college football, and college basketball. We detect coaching effects in all sports. Our estimates generally imply that coaches explain about 20-30 percent of the variation in a team’s success, although coaching effects vary notably across settings and across outcomes. For example, baseball managers affect runs allowed more than runs scored. Coaches matter more in college football than in the NFL, but do not meaningfully differ in their use of rushing vs. passing. In addition to estimating average coaching effects, we also discuss the difficult task of assessing the quality of an individual coach.
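
The general logic of randomization inference can be illustrated with a permutation test: measure how much of the variation in team outcomes lines up with coach identity, then compare that with the same statistic after the coach labels are randomly reshuffled. This is a generic sketch under simplifying assumptions (one outcome per team-season, hypothetical data, no team or year controls), not the RIFLE procedure itself.

```python
import random
import statistics

def share_explained(outcomes, coaches):
    """Between-coach sum of squares as a share of the total sum of squares."""
    grand = statistics.fmean(outcomes)
    total_ss = sum((y - grand) ** 2 for y in outcomes)
    groups = {}
    for y, c in zip(outcomes, coaches):
        groups.setdefault(c, []).append(y)
    between_ss = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss

def permutation_p_value(outcomes, coaches, n_perm=2000, seed=0):
    """Observed share and the fraction of label shuffles that match or exceed it."""
    rng = random.Random(seed)
    observed = share_explained(outcomes, coaches)
    shuffled = list(coaches)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if share_explained(outcomes, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical win percentages for team-seasons under three coaches.
wins = [0.62, 0.60, 0.65, 0.41, 0.45, 0.39, 0.50, 0.52, 0.48]
coach = ["X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"]
print(permutation_p_value(wins, coach))
```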


Abstract: Most rules governing trades in the NBA effectively reduce the likelihood of transactions by limiting the assets that qualify for exchange. Pick protections, which allow a single asset to take on thousands of values, theoretically do the opposite by enabling teams to change the value of a pick to match specific trade conditions. Evidence suggests, though, that pick protections are not yet being used to their full potential. Despite the freedom to place protections at any position, they have historically been concentrated on numbers that are strong psychological anchors, like 5, 10, and 14. This clustering suggests that improved valuation of pick protections would be useful information in the NBA trade market. This paper develops a system for valuing protected NBA draft picks. We adapt a financial asset pricing method to generate a risk-adjusted “basketball price,” in terms of on-court player contributions, for any draft pick with any protection scheme. The model is used to evaluate two recently traded picks: the Lakers’ protected 2015 first-round pick that was traded from the Suns to the 76ers in February 2015, and the Bucks’ 2018 first-round pick that was traded to the Suns in November 2017. This system could be valuable to teams seeking to value protected picks more precisely in prospective trades, or to observers seeking to understand how well teams have used protections in the past. It could also be used to provide a league-wide valuation of picks, potentially reducing the transaction costs of trading protected picks.
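
The basic mechanics of a protected pick can be shown with a plain expected-value calculation: a top-N-protected pick conveys only if it lands outside the top N. This sketch uses hypothetical draft-position probabilities and per-slot values, and it omits both the risk adjustment of the paper's asset-pricing approach and any roll-over of protections into later years.

```python
def protected_pick_value(position_probs, slot_value, protected_through):
    """Expected value to the receiving team of a top-`protected_through` protected pick.

    position_probs: {draft_position: probability} for the pick's final slot.
    slot_value: {draft_position: value of a player drafted there} (hypothetical units).
    The pick conveys only when its position is greater than `protected_through`.
    """
    if abs(sum(position_probs.values()) - 1.0) > 1e-9:
        raise ValueError("position probabilities must sum to 1")
    return sum(
        prob * slot_value[pos]
        for pos, prob in position_probs.items()
        if pos > protected_through
    )

# Hypothetical: a pick equally likely to land anywhere from 1 to 10.
probs = {pos: 0.1 for pos in range(1, 11)}
values = {pos: 20 - pos for pos in range(1, 11)}  # earlier picks worth more
for n in (0, 3, 5):
    print(n, protected_pick_value(probs, values, n))
```

Sweeping the protection threshold shows how each choice shifts what the receiving team can expect, which is the lever the abstract suggests teams underuse.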


2019 Research Paper Posters

The 2019 Research Paper posters selected for the Conference are listed below.

Abstract: Over the last couple of decades, large TV contracts and increased availability for media consumption have made up for this potential loss in in-stadium revenue. However, as linear TV is increasingly threatened by streaming and over-the-top (OTT) service providers, sports properties have begun investing large amounts of resources into finding new audiences in the digital environment, namely Fantasy Sports, e-Sports, and e-Gambling. This research analyzes the effectiveness of this outreach and tests the hypothesis that these three emerging fields are good investments for sports teams seeking to expand their current pool of in-stadium ticket buyers.

We present a forward-looking methodology for deciding how teams should spend their investment resources, and we expand the understanding of how a new market can be targeted with precise and effective messaging. An in-depth understanding of a potential acquisition audience, before money and resources are allocated, will only increase the likelihood of success. We believe our experiment is a first step in shedding light on an area of investment that is difficult to quantify. Finally, we envision applications beyond the study of audiences. We’ve demonstrated the model’s ability to measure the interests of a subpopulation without the need for first-party data sources. Adding further data sources, such as purchase data, would only increase the effectiveness of this methodology and allow us to provide greater specificity in the analysis of results.


Abstract: Recent play in Major League Baseball has showcased many attempts to gain an advantage through smart selection of pitcher-batter matchups. Ideally, each pitcher-batter matchup would be analyzed in isolation from other matchup data to estimate probabilities for the outcomes of specific at-bats; however, most matchups have not occurred frequently enough to yield statistically meaningful estimates. In this paper, we move beyond simple lefty-righty matchup rules and present a technique for classifying pitcher-batter matchups within a set of cluster types. Training the clusters and verifying our techniques require statistically significant matchup data and the presence of large bicliques in bipartite graphs of high-count pitcher-batter matchups. We present several of these bicliques and simulate thousands of innings of baseball to show the accuracy of our clustering algorithm compared to the “truth” (estimates obtained with the maximum-likelihood rule). We use this approach to verify the utility of selecting 15 cluster types within the matchup data. We then show how these cluster types can be used to simulate potential matchups where data is sparse or nonexistent in the real world. The result is a set of algorithms that can identify optimal pitcher-batter matchup strategies both when matchup data is abundant and when it is sparse. The 15-cluster approach is shown to produce significantly better estimates of matchup probabilities than estimates obtained using only traditional lefty-righty rules. By our estimates, the best pitcher-batter matchups are roughly half a run per inning better than the worst possible matchups.
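
To make the biclique idea concrete: treat pitchers and batters as the two sides of a bipartite graph, with an edge wherever their matchup count meets a threshold. A biclique is then a group of pitchers who have each faced every batter in a group of batters often enough. The greedy search below is an illustrative heuristic under that assumption; the threshold, data, and search strategy are hypothetical, not the paper's.

```python
def matchup_graph(counts, min_pa):
    """Adjacency: pitcher -> set of batters faced at least `min_pa` times."""
    graph = {}
    for (pitcher, batter), n in counts.items():
        if n >= min_pa:
            graph.setdefault(pitcher, set()).add(batter)
    return graph

def greedy_biclique(graph, min_batters):
    """Greedily add pitchers while their common batters number at least `min_batters`.

    Returns (pitchers, batters) such that every listed pitcher has an edge to
    every listed batter. Not guaranteed to find the largest biclique.
    """
    order = sorted(graph, key=lambda p: len(graph[p]), reverse=True)
    chosen, common = [], None
    for pitcher in order:
        candidate = graph[pitcher] if common is None else common & graph[pitcher]
        if len(candidate) >= min_batters:
            chosen.append(pitcher)
            common = candidate
    return chosen, (common or set())

counts = {
    ("P1", "B1"): 30, ("P1", "B2"): 25, ("P1", "B3"): 22,
    ("P2", "B1"): 28, ("P2", "B2"): 31, ("P2", "B3"): 20,
    ("P3", "B1"): 26, ("P3", "B4"): 40,
}
g = matchup_graph(counts, min_pa=20)
print(greedy_biclique(g, min_batters=3))
```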


Abstract: This paper uses Major League Baseball data to examine the relationship between years remaining on player contracts and player performance. Moral hazard can arise in this principal-agent relationship, as a player with many guaranteed years remaining may choose a lower level of effort than is optimal from the team’s perspective. This is referred to as shirking. The key challenge in identifying shirking in this setting is positive selection into multi-year contracts: only the best players are signed to them. To address this selection, a player fixed-effects estimation strategy is employed, which finds a negative, significant relationship between years remaining and performance. The primary contribution of this work is to show that this relationship is due to shirking. Alternative explanations for this relationship are also addressed: that teams sign improving players to multi-year contracts, or that players face an adjustment process when joining a new team. Sources of player heterogeneity where shirking is most or least likely are identified. Additional evidence shows that shirking occurs on offense rather than defense, and among position players rather than pitchers.
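
The player fixed-effects strategy can be illustrated with the within estimator: subtract each player's own mean from both performance and years remaining, then regress demeaned performance on demeaned years remaining. This removes any time-invariant player quality, which is how the approach handles the selection of better players into multi-year contracts. The sketch assumes a single regressor and hypothetical data; the paper's controls are omitted.

```python
from collections import defaultdict

def within_estimator(rows):
    """Fixed-effects slope of performance on years remaining.

    rows: iterable of (player_id, years_remaining, performance).
    Demeaning by player removes each player's time-invariant ability.
    """
    rows = list(rows)
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # sum_x, sum_y, count per player
    for pid, x, y in rows:
        s = sums[pid]
        s[0] += x
        s[1] += y
        s[2] += 1
    num = den = 0.0
    for pid, x, y in rows:
        sx, sy, n = sums[pid]
        dx, dy = x - sx / n, y - sy / n
        num += dx * dy
        den += dx * dx
    return num / den

# Hypothetical seasons: a strong and a weak player, both slightly worse
# when more guaranteed years remain.
data = [
    ("star", 4, 5.6), ("star", 3, 5.8), ("star", 2, 6.0), ("star", 1, 6.2),
    ("bench", 2, 1.1), ("bench", 1, 1.3), ("bench", 0, 1.5),
]
print(within_estimator(data))
```

A pooled regression on the same rows would show a positive relationship, because the better player holds more guaranteed years; the within estimator recovers the negative one.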


Abstract: With the rise of optical tracking data, the ability to accurately model player movement has become a key competitive advantage in many sports. Analysis of this data presents a substantial challenge, due both to the scale of the data, which can consist of more than 100 million rows for a given season, and to the sophisticated methods required to make sense of it. Current approaches generally fall into two categories, black-box methods and Markov models, and both have clear shortcomings. Black-box methods have no clear interpretation, while traditional Markov-based methods rely on restrictive assumptions about the nature of movement, limiting their capacity to represent complex movement patterns. In this work, we combine elements of traditional Markov approaches with tools from spatial statistics to develop a flexible nonparametric method that allows for complicated patterns of movement and incorporates meaningful spatial features (such as the three-point line), while remaining completely interpretable. Our key insight is that Markov transition densities can be estimated using a Poisson point process. In this paper, we provide a brief overview of the connection between these two mathematical concepts and demonstrate how this relationship is useful in a variety of NBA applications.
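
A heavily simplified illustration of the stated insight: treat the observed next positions of players who start in a given court region as a realization of a spatial Poisson point process, estimate a piecewise-constant intensity by counting points per grid cell and dividing by cell area (the maximum-likelihood estimate for that model), then normalize the intensity into a transition density. The grid, court dimensions, and data are assumptions; the paper's nonparametric method is far more flexible than this histogram.

```python
from collections import Counter

COURT_X, COURT_Y = 47.0, 50.0  # half-court in feet (assumed coordinate system)

def transition_density(next_positions, nx=4, ny=4):
    """Histogram estimate of the density of next positions on an nx-by-ny grid.

    Step 1: intensity per cell = point count / cell area (Poisson MLE for a
    piecewise-constant intensity). Step 2: divide by the integrated
    intensity (the total count) so the result integrates to 1.
    """
    cell_w, cell_h = COURT_X / nx, COURT_Y / ny
    area = cell_w * cell_h
    counts = Counter(
        (min(int(x // cell_w), nx - 1), min(int(y // cell_h), ny - 1))
        for x, y in next_positions
    )
    intensity = {cell: n / area for cell, n in counts.items()}
    total = sum(lam * area for lam in intensity.values())  # equals len(next_positions)
    return {cell: lam / total for cell, lam in intensity.items()}

# Hypothetical next positions for players who started near one wing.
pts = [(40.0, 5.0), (42.0, 8.0), (30.0, 20.0), (41.5, 6.5)]
dens = transition_density(pts)
print(sum(d * (47.0 / 4) * (50.0 / 4) for d in dens.values()))  # ≈ 1 up to rounding
```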


Abstract: Pace of play is an important characteristic in hockey as well as other team sports. We provide the first comprehensive study of pace within the sport of hockey, focusing on how teams and players impact pace in different regions of the ice, and the resultant effect on other aspects of the game.

First, we examined how pace of play varies across the surface of the rink, in different periods, in different manpower situations, between different professional leagues, and across seasons. Our analysis of pace by zone helps to explain some of the counterintuitive results reported in prior studies. For instance, we show that the negative correlation between attacking speed and shots/goals is likely due to a large decline in attacking speed in the offensive zone (OZ).

We also studied how pace affects the outcomes of various events. We found that pace is positively correlated with both high-danger zone entries (e.g., odd-man rushes) and higher shot quality. However, we also found that passes with failed receptions occur at higher speeds than successful receptions. These findings suggest that increased pace is beneficial, but perhaps only up to a point: higher pace can create breakdowns in defensive structure and lead to better scoring chances, but it can also lead to more turnovers.

Finally, we analyzed team and player-level pace in the NHL, highlighting the considerable variability in how teams and players attack and defend against pace. Taken together, our results demonstrate that measures of team-level pace derived from spatio-temporal data are informative metrics in hockey and should prove useful in other team sports.
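
Pace can be operationalized in several ways. One simple version, assumed here, is the straight-line distance the puck travels between consecutive tracked events divided by the elapsed time, averaged by the zone where each segment starts. The event format, the center-ice coordinate system, and the blue-line position used for zone boundaries are assumptions and may differ from the paper's definitions.

```python
import math
from collections import defaultdict

BLUE_LINE_X = 25.0  # feet from center ice; assumes the attacking team moves toward +x

def zone(x):
    if x > BLUE_LINE_X:
        return "OZ"
    if x < -BLUE_LINE_X:
        return "DZ"
    return "NZ"

def pace_by_zone(events):
    """Mean puck speed (ft/s) between consecutive events, keyed by starting zone.

    events: time-ordered list of (t_seconds, x_ft, y_ft) for one possession.
    """
    speeds = defaultdict(list)
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        if t1 > t0:
            speeds[zone(x0)].append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return {z: sum(v) / len(v) for z, v in speeds.items()}

possession = [(0.0, -60.0, 0.0), (1.0, -30.0, 10.0), (2.0, 10.0, 10.0),
              (3.5, 40.0, -5.0), (5.0, 55.0, -5.0)]
print(pace_by_zone(possession))
```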


Abstract: The NBA is widely regarded as a “superstar-driven league.” However, superstars may miss games due to injury or be purposefully “rested” by teams. A superstar’s absence has detrimental effects on the quality of games, especially with respect to the fan experience. This paper uses econometric methods to quantify the per-game value associated with the NBA’s top players by evaluating ticket price changes on a secondary marketplace when such players are announced to miss a game. We collect high-frequency microdata from an online secondary ticket marketplace, along with the exact timing of player absence announcements, to determine the reduction in willingness-to-pay associated with a superstar absence. We find that absences of several superstars, including popular players like Stephen Curry, Kyrie Irving, and Anthony Davis, have a statistically significant and economically meaningful impact, ranging from a 7-25% ($9-$25) reduction in the average ticket price for the matchups they miss. Over a season, this can lead to millions of dollars in welfare losses. We conduct additional heterogeneity tests, finding that certain players, like Stephen Curry and Kevin Durant, exhibit much larger away-game absence effects, while others, like Anthony Davis and Kristaps Porzingis, exhibit much larger home-game absence effects. Furthermore, the negative impact of a superstar absence is much smaller for games played in larger markets. Our findings have significant ramifications for the NBA and individual franchises, including NBA policies on resting players, the implications of suspensions for fan welfare, and franchise decisions about dynamic pricing schemes in the primary marketplace.
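
The core comparison can be sketched as a before/after contrast around the announcement time. This simplified version assumes hypothetical listing snapshots for a single game and computes the percentage change in the mean listed price; the paper's econometric specification, with its controls, is not reproduced.

```python
from statistics import fmean

def announcement_effect(listings, announce_time):
    """Percent change in mean listed price after an absence announcement.

    listings: iterable of (timestamp, price) snapshots for one game,
    with timestamps in any consistent unit (hypothetical data).
    """
    before = [p for t, p in listings if t < announce_time]
    after = [p for t, p in listings if t >= announce_time]
    if not before or not after:
        raise ValueError("need listings on both sides of the announcement")
    pre, post = fmean(before), fmean(after)
    return 100.0 * (post - pre) / pre

snapshots = [(1, 120.0), (2, 118.0), (3, 122.0), (4, 101.0), (5, 99.0), (6, 100.0)]
print(announcement_effect(snapshots, announce_time=4))
```

A raw before/after difference also picks up ordinary price drift as game day approaches, which is why a fuller analysis needs comparison games or time controls.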


Abstract: Consumption of sports is trending towards fragmented content on mobile devices. To quote a recent McKinsey & Co study, “We aren’t losing fans, we are fighting shorter attention spans.” In contrast to live television, highlights are visual and short: perfect content for social media. We describe a framework for automatic highlight reel extraction based on game event data. The framework can be applied to most team sports given the output of any in-game state valuation model, of which there are many. We adjust the valuation of events based on their impact on the game’s result, a highly relevant factor in terms of fan interest. We demonstrate the usefulness of our approach for extracting both highlights and lowlights, the latter of which has proven difficult with previous approaches. We apply a minimum entropy threshold to avoid monotony in highlight reels. We introduce Event Interest, treating the interest of an event as a continuous function across time rather than a discrete instant. We introduce Cumulative Event Interest, the output of which provides a simple means of extracting game highlights. We demonstrate that this output can be easily modified, depending on the type of highlights required, by adjusting a small number of intuitive parameters of the Cumulative Event Interest function. We provide examples of the output in the form of graphical game summaries and their corresponding highlight videos.
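
The notion of Event Interest as a continuous function of time can be sketched by spreading each event's value over a Gaussian window and summing across events. Intervals where the cumulative curve exceeds a threshold become highlight candidates, and the Shannon entropy of event types in the selection serves as a monotony check. The kernel shape, width, threshold, and event values are assumptions, not the paper's exact functions.

```python
import math
from collections import Counter

def cumulative_interest(events, t, width=10.0):
    """Sum of Gaussian-shaped interest curves at time t (seconds).

    events: list of (time_s, event_type, value), where `value` is the change in a
    hypothetical in-game valuation model; its absolute value is used so that
    lowlights (negative swings) also register as interesting.
    """
    return sum(abs(v) * math.exp(-((t - te) ** 2) / (2 * width ** 2)) for te, _, v in events)

def highlight_windows(events, duration, threshold, step=1.0, width=10.0):
    """Return (start, end) intervals where cumulative interest exceeds `threshold`."""
    windows, start = [], None
    t = 0.0
    while t <= duration:
        above = cumulative_interest(events, t, width) > threshold
        if above and start is None:
            start = t
        elif not above and start is not None:
            windows.append((start, t))
            start = None
        t += step
    if start is not None:
        windows.append((start, duration))
    return windows

def type_entropy(events):
    """Shannon entropy (bits) of event types; low values suggest a monotonous reel."""
    counts = Counter(kind for _, kind, _ in events)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

game = [(300.0, "goal", 0.25), (1500.0, "save", 0.10), (2700.0, "goal", 0.30)]
print(highlight_windows(game, duration=3600.0, threshold=0.05))
print(type_entropy(game))
```

Widening the kernel or lowering the threshold lengthens the extracted clips, which mirrors the abstract's point that a few intuitive parameters control the reel.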


Abstract: In professional soccer, transfer fees grow ever larger, with sums of over $100m frequently changing hands between top clubs. However, one key position remains undervalued: the goalkeeper. Of the fifty most expensive transfers in history, just two are goalkeepers. Why might this be? Perhaps stopping goals intrinsically lacks the allure of scoring them, but it’s also possible that there is a perceived parity among goalkeepers’ abilities, that their actions are relatively infrequent, and that descriptive goalkeeper event data has historically been sparse. It’s been hard to get a quantitative handle on a keeper’s value. Using data from StatsBomb, we outline a framework to evaluate goalkeepers on four key responsibilities: Shot Stopping, Cross Collection, Defensive Activity, and Distribution.

Probabilistic models have been trained and calibrated to estimate individual goalkeeper shot-stopping skill and cross-collecting aggression. Positional deviations are estimated via a K-Nearest Neighbor algorithm, while distribution is assessed through the lens of attacking contribution and the player’s reaction to opponent pressure. This work, and the resulting metrics, enable a flexible matching algorithm to identify prospects that match specific goalkeeper profiles. For example, preliminary results from this framework rated Manchester United’s David De Gea as the highest-performing goalkeeper of the 2017-18 Premier League season. We match him to a rising young star in England’s third tier, Dean Henderson, who performed well while on loan at Shrewsbury Town. Henderson’s parent club is the same as De Gea’s: Manchester United. Maybe they already own his eventual replacement?
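
A flexible matching step of this kind can be sketched as a nearest-neighbor search in a standardized metric space: z-score each goalkeeper metric across the pool, optionally weight the metrics a club cares about, and rank candidates by distance to a target keeper. The metric names, weights, and numbers below are hypothetical and are not StatsBomb outputs.

```python
import math
import statistics

METRICS = ("shot_stopping", "cross_collection", "defensive_activity", "distribution")

def standardize(pool):
    """Z-score each metric across the pool of goalkeepers."""
    stats = {m: (statistics.fmean(k[m] for k in pool.values()),
                 statistics.pstdev([k[m] for k in pool.values()])) for m in METRICS}
    return {
        name: {m: (vals[m] - stats[m][0]) / stats[m][1] if stats[m][1] else 0.0 for m in METRICS}
        for name, vals in pool.items()
    }

def closest_matches(pool, target, weights=None, k=3):
    """Rank other keepers by weighted Euclidean distance to `target` in z-score space."""
    weights = weights or {m: 1.0 for m in METRICS}
    z = standardize(pool)

    def distance(name):
        return math.sqrt(sum(weights[m] * (z[name][m] - z[target][m]) ** 2 for m in METRICS))

    others = [name for name in pool if name != target]
    return sorted(others, key=distance)[:k]

pool = {
    "Keeper A": {"shot_stopping": 0.9, "cross_collection": 0.6, "defensive_activity": 0.4, "distribution": 0.7},
    "Keeper B": {"shot_stopping": 0.8, "cross_collection": 0.5, "defensive_activity": 0.5, "distribution": 0.6},
    "Keeper C": {"shot_stopping": 0.2, "cross_collection": 0.9, "defensive_activity": 0.9, "distribution": 0.1},
    "Keeper D": {"shot_stopping": 0.5, "cross_collection": 0.3, "defensive_activity": 0.2, "distribution": 0.9},
}
print(closest_matches(pool, "Keeper A", k=2))
```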


Abstract: Like innings in baseball, curling takes place over “ends” (8 or 10). Each end consists of 16 shots, and each shot is a decision point based on several factors that can be examined with numbers.

This paper reviews the datasets currently available for curling, the statistics and analysis that have been implemented to date, and how they can be applied to scouting and in-game strategy. We consider how the scoreboard differential affects tactics and can influence the probability of scoring outcomes in each end. Win Probability and Efficiency Statistics are explained, with comparisons of top-ranked teams from the past season. We provide examples of common game situations and demonstrate how teams can make better decisions using analytics. We share sample scouting reports and explain how to evaluate a team’s performance and assess its style of play. We conclude with considerations on possible next stages for curling analytics and what is needed to achieve these goals.

Analytics within curling has taken time to gain acceptance, but players and coaches have begun to understand its value and are starting to push for more data and information. The window for the adoption of analytics in curling is only starting to open, but in a sport that has been compared to chess, analytics will become an essential piece of future success as teams improve and skill levels equalize.
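
A win probability model of the kind mentioned above can be sketched as a recursion over ends, score differential, and hammer (last-rock) possession. The sketch assumes each end's outcome is drawn independently from a fixed, hypothetical distribution seen from the hammer team's side; it follows the rules that a blank end keeps the hammer with the same team and that otherwise the team that did not score gets the hammer; and it treats a tie after regulation as 50/50 instead of playing an extra end.

```python
from functools import lru_cache

# Hypothetical end-outcome distribution from the hammer team's perspective:
# +k = hammer team scores k, 0 = blank end, -k = opponent steals k.
END_DIST = {3: 0.05, 2: 0.30, 1: 0.25, 0: 0.15, -1: 0.20, -2: 0.05}

def win_probability(ends_left, diff, has_hammer, dist=END_DIST):
    """P(team A wins), where `diff` is A's score minus B's and `has_hammer` means A throws last."""
    items = tuple(sorted(dist.items()))

    @lru_cache(maxsize=None)
    def wp(n, d, hammer):
        if n == 0:
            if d > 0:
                return 1.0
            if d < 0:
                return 0.0
            return 0.5  # assumption: extra end treated as a coin flip
        total = 0.0
        for outcome, p in items:
            if outcome == 0:
                total += p * wp(n - 1, d, hammer)  # blank end: hammer stays
            else:
                points = outcome if hammer else -outcome  # convert to A's perspective
                # The hammer goes to the team that did not score.
                total += p * wp(n - 1, d + points, not points > 0)
        return total

    return wp(ends_left, diff, has_hammer)

print(win_probability(8, 0, True))   # tied game, A has hammer in end 1 of 8
print(win_probability(1, -1, True))  # A down one with hammer in the last end
```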


Abstract: Allocative efficiency is fundamentally a spatial problem: the distribution of shot attempts within a lineup is highly dependent on court location. Despite the importance of spatial context, very few allocative efficiency analyses have explicitly accounted for this critical factor. Our unique contribution with this work is a method to analyze allocative efficiency spatially.

The main idea behind our approach is to compare a player’s field goal percentage (FG%) to his field goal attempt (FGA) rate in the context of his four teammates in any given lineup. To this end, we build Bayesian hierarchical models to estimate player FG% and FGA rates at every location on the floor using publicly available NBA shot location data. We next determine the rank of each player’s FG% and FGA rate relative to his four teammates at every location in the half-court. Finally, by pairing each player’s lineup-specific FGA rankings with their corresponding FG% rankings, we can explore the relationship between FG% rank and FGA rank and detect areas where the lineup exhibits inefficient allocation of shots.

We further analyze the impact that deviations from optimality have on a lineup’s overall efficiency.  We develop a measure called lineup points lost (LPL), which we define as the difference in expected points between the observed allocation of shot attempts and the optimal redistribution.  Using these metrics, we can quantify how many points are being lost through inefficient spatial lineup shot allocation, visualize where they are being lost, and identify which players are responsible.
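
The lineup points lost (LPL) idea can be sketched for a single court location. This sketch assumes one specific notion of "optimal redistribution": keep the observed attempt counts but reassign them so that FGA rank matches FG% rank (most attempts to the most accurate shooter), which by the rearrangement inequality maximizes expected points. The FG% values are hypothetical stand-ins for the Bayesian model estimates.

```python
def expected_points(fga, fg_pct, point_value):
    """Expected points from attempts `fga` made at rates `fg_pct`."""
    return point_value * sum(a * p for a, p in zip(fga, fg_pct))

def lineup_points_lost(fga, fg_pct, point_value=2):
    """Points lost at one court location from mismatched shot allocation.

    fga: attempts by each of the five players at this location.
    fg_pct: each player's estimated FG% at this location (same order).
    """
    optimal_fga = [0] * len(fga)
    by_pct = sorted(range(len(fg_pct)), key=lambda i: fg_pct[i], reverse=True)
    for player, attempts in zip(by_pct, sorted(fga, reverse=True)):
        optimal_fga[player] = attempts
    return expected_points(optimal_fga, fg_pct, point_value) - expected_points(fga, fg_pct, point_value)

# Hypothetical corner-three cell: the highest-volume shooter is not the most accurate.
fga = [10, 6, 3, 1, 0]
fg_pct = [0.34, 0.41, 0.38, 0.30, 0.36]
print(lineup_points_lost(fga, fg_pct, point_value=3))
```

Summing this quantity over all court locations gives a lineup-level total, and the per-location values indicate where points are being lost.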


Abstract: While most existing soccer performance metrics focus on players’ technical and physical performances, they typically ignore the mental pressure under which these performances were delivered. Yet mental pressure is a recurrent concept in the analysis of players’ or teams’ performances. Hence, this paper takes a first step toward objectively understanding how high-pressure situations affect the performance and behavior of soccer players.

We introduce an approach that compares soccer players’ performances across different levels of mental pressure. For each game situation, our approach uses a machine-learned model to estimate how much mental pressure the player in possession of the ball experiences, using a combination of match context features and the current game state. Similarly, our approach uses machine-learned models to evaluate three aspects of each action performed by the player: the choice of action, the execution of the chosen action, and the action’s expected contribution to the scoreline.

We demonstrate the ability of our approach to provide actionable insights for soccer clubs in four relevant use cases: player acquisition, training, tactical decisions, and lineups and substitutions. For example, we identify Houssem Aouar and Xherdan Shaqiri as suitable replacements for Leicester City’s former star Riyad Mahrez. We also identify a large number of needless fouls under pressure as a fixable weakness of Orlando City striker Dom Dwyer. Since soccer players are often confronted with high-pressure situations, our metric provides insights into the link between pressure and performance that can give soccer clubs a competitive advantage.
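
Comparing performance across pressure levels can be sketched by splitting each player's actions into high- and low-pressure bins and contrasting mean action ratings. The sketch assumes the pressure estimates and action ratings come from models like those described above; the threshold, minimum sample size, and record format are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def pressure_gap(actions, high_threshold=0.7, min_actions=2):
    """Per-player mean rating under high pressure minus under low pressure.

    actions: iterable of (player, pressure, rating), where `pressure` is a
    model estimate in [0, 1] and `rating` scores the action (e.g. execution quality).
    Players with fewer than `min_actions` in either bin are skipped.
    """
    high, low = defaultdict(list), defaultdict(list)
    for player, pressure, rating in actions:
        (high if pressure >= high_threshold else low)[player].append(rating)
    return {
        p: fmean(high[p]) - fmean(low[p])
        for p in set(high) & set(low)
        if len(high[p]) >= min_actions and len(low[p]) >= min_actions
    }

log = [
    ("P1", 0.9, 0.40), ("P1", 0.8, 0.50), ("P1", 0.2, 0.60), ("P1", 0.1, 0.70),
    ("P2", 0.9, 0.65), ("P2", 0.75, 0.75), ("P2", 0.3, 0.70), ("P2", 0.2, 0.70),
]
print(pressure_gap(log))
```

A strongly negative gap, as for the hypothetical P1, flags a player whose actions deteriorate under pressure.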
