2018 Research Paper Finalists & Posters

We are proud to announce our 8 Research Paper Finalists & 12 Research Paper Posters for the 2018 Sloan Sports Analytics Conference:

2018 MIT SLOAN SPORTS ANALYTICS CONFERENCE Research Paper

Abstract: The task of extracting informative measures of talent for Major League Baseball (MLB) players has a surprising parallel in the field of natural language processing — the task of constructing useful word embeddings. Words, like MLB players, can be considered distinct elements in a set, and one common way to represent such categorical data in machine learning algorithms is as one-hot encodings. However, one drawback of one-hot encodings is that every element in the set is equally similar (or dissimilar) to every other element in the set (due to their mutual orthogonality). But words (and players) do exhibit varying degrees of similarity. By modeling how words behave in different contexts, word embedding algorithms (like word2vec) learn to mathematically encode such similarities as geometric relationships between vectors (e.g., cosine similarity or Euclidean distance). This paper introduces (batter|pitcher)2vec, a neural network algorithm that adapts these representation learning concepts to a baseball setting, modeling player talent by learning to predict the outcome of an at-bat given the context of a specific batter and pitcher. The learned representations qualitatively appear to better reflect baseball intuition than traditional baseball statistics, for example, by grouping together pitchers who rely primarily on pitches with dramatic movement. Further, like word2vec, the representations possess intriguing algebraic properties, for example, capturing the fact that Bryce Harper might be considered Mike Trout’s left-handed doppelgänger. Lastly, (batter|pitcher)2vec is significantly more accurate at modeling future at-bat outcomes for previously unseen matchups than simpler approaches.
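The orthogonality argument can be made concrete with a short sketch. This is not code from the paper. It computes cosine similarity for one-hot vectors and then for a few dense embeddings. The player keys and embedding values are hypothetical placeholders, not learned (batter|pitcher)2vec vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One-hot encodings for a hypothetical three-player vocabulary.
players = ["player_a", "player_b", "player_c"]
one_hot = {p: [1.0 if i == j else 0.0 for j in range(len(players))]
           for i, p in enumerate(players)}

# Distinct one-hot vectors are mutually orthogonal, so every pair scores 0.
print(cosine_similarity(one_hot["player_a"], one_hot["player_b"]))  # 0.0

# Made-up dense embeddings: similarity can now differ from pair to pair.
embedding = {
    "player_a": [0.9, 0.1, 0.3],
    "player_b": [0.8, 0.2, 0.35],
    "player_c": [-0.5, 0.7, 0.1],
}
print(cosine_similarity(embedding["player_a"], embedding["player_b"]))
print(cosine_similarity(embedding["player_a"], embedding["player_c"]))
```

In the dense case, `player_a` ends up much closer to `player_b` than to `player_c`. One-hot encodings cannot express this kind of graded similarity.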

Abstract: Baseball players must be able to see and react in an instant, yet it is hotly debated whether superior performance is associated with superior sensorimotor abilities. In this research, we compare sensorimotor abilities, measured through 8 psychomotor tasks comprising the Nike Sensory Station assessment battery, and game statistics in a sample of 252 professional baseball players to evaluate the links between sensorimotor skills and on-field performance. For this purpose, we develop a series of Bayesian hierarchical latent variable models enabling us to compare statistics across professional baseball leagues. Within this framework, we find that sensorimotor abilities are significant predictors of on-base percentage, walk rate and strikeout rate, accounting for age, position, and league. We find no such relationship for either slugging percentage or fielder-independent pitching. The pattern of results suggests performance contributions from both visual-sensory and visual-motor abilities and indicates that sensorimotor screenings may be useful for player scouting. Moreover, these findings point to specific domains of visual perception and psychomotor control that may offer useful targets for training interventions to accelerate learning and improve on-field performance.

Abstract: Sketching plays is a universal way for coaches to communicate what they want their players to do. What if a coach didn’t have to rely solely on intuition, but could instead foresee how the defending team is likely to respond to the intended play? Such a tool would enable spontaneous creativity while providing real-time objective analysis explicitly tailored to the current game state. In this work, we consider play sketching from a data-driven perspective. We combine a powerful analytics framework built on deep imitation learning with an intelligent and highly intuitive user interface. Users freehand sketch plays or modify existing tracking data. Our software then infers the equivalent animation and synthesizes realistic “ghost” defenders. Users can test their plays against different teams and game contexts, and fine-tune sketches to maximize the expected points in a given situation. Until now, insights extracted from player tracking data were only available post-game, primarily because of the complexity of the algorithms and the domain-specific knowledge required to use them. Our software, on the other hand, uses a familiar, intuitive interface and operates in real time on a tablet. Analytics are no longer constrained to the back office and can instead operate courtside for in-game decisions. Fans can also play Monday-morning quarterback, simulating alternate offensive decisions using real game data and discovering whether these “what if” scenarios can bust the ghosted defenses.

Abstract: Evaluating shooting ability is a critical component of player comparison and player development. However, players are often evaluated on a limited number of shots, which exposes the assessment to high variation and leads to inaccurate, anecdotal conclusions. The aim of this paper is to explore the potential of high-resolution shot data to improve shooter evaluation. Using over 22 million shots captured in high resolution by the Noah Shooting System, we reveal previously hidden systematic biases in entry left-right and entry depth from all positions on the court. Then, we focus on the high-resolution shot data from 509 NBA, college, and high school players to train a machine-learning algorithm that predicts shooting ability from 25-shot sessions. The algorithm outperforms conventional methods and better ranks players by skill level. We conclude by encouraging coaches and players to re-evaluate their largely anecdotal assessment methods and implement more effective, data-driven methods to enhance shooter development and shooter ranking.

Abstract: Professional sports teams incur a large financial burden due to injuries. One aim of training load monitoring is to identify factors related to injury so that appropriate training interventions can be planned. While the relationship between training load and injury has been explored in sports such as rugby and Australian Rules football, little is known about this relationship in American football. Therefore, our study aimed to identify the relationship between training load and non-contact injury in NFL athletes during training. Eleven inertial sensor variables, divided into three sub-groups, were used to quantify training load: Player Load (PLTotal, PLLow, PLMed, PLHigh, PLVH), IMA (IMALow, IMAMed, IMAHigh), and Impacts (ImpactsLow, ImpactsMed, ImpactsHigh). Four models were iteratively fit: one for each inertial sensor sub-group and a joint model using all variables. Models were compared using the Bayesian Information Criterion (BIC) and out-of-sample log likelihood. Both criteria indicated that the joint model outperformed the sub-group models and should be accepted as the preferred model for explaining the relationship between non-contact injury and training in NFL athletes. This model suggests that a one-unit increase in PLTotal and ImpactsHigh leads to a 6.48x and 2.01x greater risk of injury, respectively. Conversely, a one-unit increase in PLLow decreases the odds of injury by 0.31. These findings may aid practitioners in understanding risk factors for injury in American football training and assist with the planning of subsequent training sessions.
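The model comparison above rests on two standard quantities. The sketch below is not the authors' code. It computes BIC as k ln n − 2 ln L, where lower values are better. It then converts a coefficient into a multiplicative change in odds. That conversion assumes a logistic-regression model, which the abstract does not name explicitly. All log-likelihoods, parameter counts, and the observation count are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion, k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def odds_ratio(coefficient):
    """Multiplicative change in the odds per one-unit increase in a predictor,
    assuming the coefficient comes from a logistic regression."""
    return math.exp(coefficient)


# Hypothetical fits: the joint model uses 11 sensor variables plus an intercept,
# the Player Load sub-group model uses 5 variables plus an intercept.
n_obs = 1000
bic_by_model = {
    "player_load": bic(log_likelihood=-310.0, n_params=6, n_obs=n_obs),
    "joint": bic(log_likelihood=-280.0, n_params=12, n_obs=n_obs),
}
best = min(bic_by_model, key=bic_by_model.get)
print(best)  # joint

# A coefficient of ln(6.48) corresponds to a 6.48x multiplier on the odds.
print(round(odds_ratio(math.log(6.48)), 2))  # 6.48
```

The joint model has twice as many parameters, so BIC charges it a larger complexity penalty. In this made-up example it still wins because its likelihood gain is larger than that penalty.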

Abstract: Soccer analytics has long focused on the outcomes of discrete, on-ball events; however, in a game where each player has the ball only 3% of the time on average, much of the sport’s complexity resides in off-ball events. A recurrent subject in observation-based tactical analysis is the creation and closure of spaces, yet it remains largely unexplored from a quantitative perspective. We present a method for quantifying spatial value occupation and generation during open play. Our approach proposes a novel pitch control model that incorporates motion information, relative distance to the ball, and player position in order to provide a smooth surface of potential ball control. We also provide a model, using feed-forward neural networks, for the relative value of any field location based on the position of the ball. This quantification of space creation allows us to observe Sergio Busquets’ high relevance during positional attacks through his pivoting skills, the dragging power of Luis Suarez to generate spaces for his teammates, and the capacity of Lionel Messi to occupy spaces of value with smooth movements along the field, among many other characteristics. Evaluating space occupation and generation opens the door for new research on off-ball dynamics that can be applied to specific matches and situations and directly integrated into coaches’ analysis. This information can be used not only to better evaluate players’ contributions to their teams but also to improve their positioning and movement through coaching, providing a key competitive advantage in a complex and dynamic sport.
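A deliberately simplified stand-in can illustrate the idea of a smooth control surface. In this sketch, each player exerts a Gaussian influence that decays with distance. A location's control value is the attacking team's share of the total influence at that point. This is not the authors' model, which also uses motion information and distance to the ball. The `sigma` width, the pitch dimensions, and the player coordinates are all hypothetical.

```python
import math


def influence(player_xy, point_xy, sigma=8.0):
    """Gaussian influence of one player at a pitch location (metres)."""
    dx = point_xy[0] - player_xy[0]
    dy = point_xy[1] - player_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def pitch_control(point_xy, attackers, defenders, sigma=8.0):
    """Share of total influence at a point held by the attacking team (0..1)."""
    att = sum(influence(p, point_xy, sigma) for p in attackers)
    dfn = sum(influence(p, point_xy, sigma) for p in defenders)
    return att / (att + dfn)


# Hypothetical positions on a 105 x 68 m pitch.
attackers = [(60.0, 30.0), (70.0, 40.0)]
defenders = [(80.0, 35.0), (75.0, 20.0)]

# Evaluate on a coarse grid (rows = y, columns = x) to get a smooth surface.
surface = [[pitch_control((x, y), attackers, defenders)
            for x in range(0, 105, 15)]
           for y in range(0, 68, 17)]
for row in surface:
    print(" ".join(f"{v:.2f}" for v in row))
```

Values near 1 mark space the attackers control, and values near 0 mark space held by defenders. Weighting each cell by a location-value model, as the abstract describes, would turn this surface into a measure of valuable space.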

Abstract: Esports should have vast data, but often do not. League of Legends only has box-score-equivalent statistics: items, gold, champion and turret deaths, etc. All current LoL analytics, from amateurs to pros, rely on this rudimentary data. We present a new way to capture the far more abundant and useful underlying data. We track every champion’s location multiple times every second. We also track every ability cast, every attack made, and all damage caused and avoided. We track the amount of vision granted and denied in the fog of war. We track every player’s health, mana, and cooldowns. We track everything continuously, invisibly, remotely, and live, without any impact on the user. Using a combination of computer vision, dynamic client hooks, machine learning, visualization, logistic regression, large-scale cloud computing, and fast and frugal trees, we generate this new high-frequency, optical-equivalent data on millions of ranked League of Legends games, calibrate an in-game win probability model, develop enhanced definitions for standard metrics, introduce dozens more advanced metrics, automate player improvement analysis, and apply a new player-evaluation framework to the basic and advanced stats. How much does an individual contribute to a team’s performance? We find that individual actions filtered by win probability correlate almost perfectly with team performance: regular kills and deaths explain far less than smart kills and worthless deaths. Our approach offers immediate applications for other esports as well as traditional sports. All the code for our entire process will soon be open-sourced.

Abstract: Daily Fantasy Sports (DFS) is a multi-billion-dollar industry with millions of annual users and widespread appeal among sports fans across a broad range of popular sports. Building on the recent work of Hunter, Vielma and Zaman (2016), we provide a coherent framework for constructing DFS portfolios in which we explicitly model the behavior of other DFS players. We formulate an optimization problem that accurately describes the DFS problem for a risk-neutral decision-maker in both double-up and top-heavy payoff settings. Our formulation maximizes the expected reward subject to portfolio feasibility constraints. We relate this formulation to the finance literature on mean-variance optimization and, in particular, to the literature on outperforming stochastic benchmarks. Using this connection, we show how our problems can be reduced (via some simple assumptions and approximations) to the problem of solving binary quadratic programs. One of the contributions of our work is the introduction of a Dirichlet-multinomial data generating process for modeling opponents’ team selections. We estimate the parameters of this model via Dirichlet regressions. A benefit of modeling opponents’ team selections is that it enables us to estimate the value of “insider trading”, where an insider, e.g., an employee of the DFS contest organizers, gets to see information on opponents’ portfolio choices before making his own team selections. We demonstrate the value of our framework by applying it to both double-up and top-heavy DFS contests in the 2017-2018 NFL season.
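A Dirichlet-multinomial process works in two steps. It first draws selection probabilities from a Dirichlet distribution. It then draws opponents' picks from a multinomial with those probabilities. The sketch below simulates this for a single roster slot with hand-picked concentration parameters. In the paper, the parameters are estimated via Dirichlet regression and the model covers full team selections. The player labels and `alpha` values here are purely hypothetical.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_opponent_picks(players, alpha, n_opponents, rng):
    """Dirichlet-multinomial: draw selection probabilities once, then let each
    opponent independently pick one player for a single roster slot."""
    probs = sample_dirichlet(alpha, rng)
    picks = rng.choices(players, weights=probs, k=n_opponents)
    return Counter(picks)


rng = random.Random(0)
quarterbacks = ["QB1", "QB2", "QB3", "QB4"]  # hypothetical player pool
alpha = [8.0, 4.0, 2.0, 1.0]                 # hypothetical concentrations
counts = sample_opponent_picks(quarterbacks, alpha, n_opponents=1000, rng=rng)
print(counts.most_common())
```

The shared Dirichlet draw is what separates this from a plain multinomial. All opponents in one contest share the same underlying popularity vector, and that vector itself varies from contest to contest. This extra variability is what the opponent model has to capture.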

2018 Research Paper Posters

The 2018 Research Paper posters selected for the Conference are listed below.

Authors:
Omar Ajmeri, Ali Shah

Abstract: NFL coaches spend countless hours tagging and mining game film, searching for tendencies and patterns to exploit in upcoming matchups. Film tagging alone is a tedious and error-prone task: coaches need to label formations, personnel, and routes. Depending on the play, there can be up to 6 designed routes to account for, generating a wealth of data for each play. While self-scouting is fairly simple given a coach’s understanding of his own playbook, scouting every other team in the league on a week-to-week basis is extremely time-consuming. Player tracking data can be used as an effective tool to efficiently label formations and plays; however, NFL teams currently can only access their own team’s data. To address this, we developed an algorithm that can capture player tracking data for all teams. Through a series of computer vision techniques based on pixel density and weighted spatial reasoning, we have automated the classification of NFL All-22 game film from start (offensive formation labeling) to finish (video player tracking coordinates throughout the life of a play). This includes not only formations but also player routes and player speeds. This player tracking system has implications for game planning, scouting, and better evaluation of individual players and coaches. The ability to analyze player location data on a mass scale in a short period of time will fundamentally change how football coaches scout and analyze players and opposing coaches throughout the league.

Abstract: In this paper, we highlight three current issues with win-probability models: i) lack of context, ii) no measure of prediction uncertainty, and iii) no publicly available datasets or models against which to compare. To remedy the last issue, we are releasing our NBA play-by-play dataset and base win-probability model to the research community (see https://www.stats.com/data-science/). To address the issue of context, we developed a neural network architecture that uses team rosters and game states to encode the on-court lineup for a given matchup. Adding the lineup encoding substantially improves model accuracy over existing methods (88% vs. 75%). To capture the uncertainty of our predictions, we moved from match-outcome prediction to final score difference prediction, providing a measure of uncertainty by estimating the likelihood of all possible scores. In addition to capturing the uncertainty of a given match outcome prediction, this approach enables interactive storytelling applications through the exploration of “alternative outcomes”, for example, what if Kawhi Leonard had not been injured during Game 1 of the Western Conference Finals. In the future, by complementing score prediction with a recurrent architecture, we expect to see the score distributions collapse, allowing us to determine points of no return in a match and identify which decisions are irreversible or lead to uncertainty growth in the inferred outcome prediction.
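Predicting a distribution over final score differences gives two things. The win probability falls out as the total mass on positive margins. The spread of the distribution expresses how uncertain the prediction is. The sketch below uses a hypothetical discretized normal in place of the paper's neural network output. It shows how the same expected margin yields different win probabilities as the distribution narrows.

```python
import math


def score_diff_distribution(mean, sd, max_margin=40):
    """Hypothetical discretized normal over final margins (home minus away),
    excluding 0 because NBA games cannot end tied."""
    margins = [m for m in range(-max_margin, max_margin + 1) if m != 0]
    weights = [math.exp(-0.5 * ((m - mean) / sd) ** 2) for m in margins]
    total = sum(weights)
    return {m: w / total for m, w in zip(margins, weights)}


def win_probability(dist):
    """P(home win) is the total probability of positive final margins."""
    return sum(p for m, p in dist.items() if m > 0)


def margin_std(dist):
    """Standard deviation of the margin: a simple uncertainty measure."""
    mean = sum(m * p for m, p in dist.items())
    return math.sqrt(sum(p * (m - mean) ** 2 for m, p in dist.items()))


early = score_diff_distribution(mean=3.0, sd=12.0)  # wide: much game left
late = score_diff_distribution(mean=3.0, sd=3.0)    # narrow: few possessions left
for label, dist in [("early", early), ("late", late)]:
    print(label, round(win_probability(dist), 3), round(margin_std(dist), 1))
```

A plain match-outcome classifier would report only the win probability. The full margin distribution also keeps the spread, which is what supports "alternative outcome" exploration.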

Abstract: During the 2017 NBA playoffs, Celtics coach Brad Stevens faced a difficult decision when defending against the Cleveland Cavaliers: “Do you double and risk giving up easy shots, or stay at home and do the best you can?” It’s a tough call, but finding a good defensive strategy that effectively incorporates doubling can make all the difference in the NBA. In this paper, we analyze double teaming in the NBA, quantifying the trade-off between risk and reward. Using player trajectory data from over 643,000 possessions, we identified when the ball-handler was double teamed. Given these data and whether the defense was successful, we used deep reinforcement learning to estimate the quality of the defensive actions. We present qualitative and quantitative results summarizing our learned defensive strategy. In particular, when double teaming Kyrie Irving on the 3-point line, the learned policy suggests leaving a man on the opposite wing open upon an attack from the left, and leaving a man in the paint open upon an attack from the right. Based on data from past seasons, when doubling against the Cavs, we estimate that the Indiana Pacers and the Atlanta Hawks had the most room for improvement, while the Chicago Bulls and the Golden State Warriors were playing closest to the learned strategy. Overall, the proposed framework represents a step toward a more comprehensive understanding of defensive strategies in the NBA.

Authors:
Nate Sandholtz, Luke Bornn

Abstract: In basketball, the shot clock makes evaluating shot selection a difficult task. For example, a mid-range jump shot is a relatively inefficient decision early in the shot clock, but it gradually becomes more efficient relative to expectation as the shot clock winds down. These subtle dynamics are often overlooked when evaluating shot selection, which in turn leads to slightly misguided conclusions. Though mid-range jumpers are on average the least efficient shot in the NBA, we cannot simply conclude that teams should take fewer of them across the board; we must consider when, and by whom, fewer should be taken, and how these changes would affect the team’s overall production. This is the key question of this project. To answer it, we developed statistical methods to simulate, or “re-play”, a team’s regular season plays under different shot probabilities. This entails simulating plays not simply by outcome but at the sub-second level, incorporating every intermediate and terminal on-ball event over the course of a play. We do this with respect to time, player, court region, and defensive pressure, allowing us to explore very nuanced changes to team shot policies. To this end, we model possessions from the 2015-2016 NBA regular season as Markov chains realized from team-specific non-stationary Markov decision processes. Using the estimated decision processes and the initial states of regular season plays, we simulate seasons for each NBA team and forecast the consequences under two alternate mid-range shot policies.
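The re-play idea can be sketched as a toy Markov chain. Possessions move between on-ball states until they end in a shot or a turnover. An alternate policy then down-weights the transitions into mid-range shots. This sketch differs from the paper's team-specific, non-stationary decision processes, which are conditioned on time, player, court region, and defensive pressure. Here the chain is time-homogeneous, and every state name, transition probability, and make rate is hypothetical.

```python
import random

# Hypothetical, time-homogeneous transition probabilities between on-ball states.
TRANSITIONS = {
    "start": {"drive": 0.4, "perimeter": 0.5, "turnover": 0.1},
    "drive": {"rim": 0.6, "mid": 0.3, "perimeter": 0.1},
    "perimeter": {"three": 0.5, "mid": 0.3, "drive": 0.2},
}
# Terminal states: (make probability, points if made). Hypothetical values.
TERMINAL = {"rim": (0.60, 2), "mid": (0.40, 2), "three": (0.35, 3), "turnover": (0.0, 0)}


def shift_away_from_mid(transitions, factor):
    """Alternate policy: scale every mid-range transition by `factor` and spread
    the removed probability over the state's other options in proportion."""
    shifted = {}
    for state, probs in transitions.items():
        mid = probs.get("mid", 0.0)
        others = {s: p for s, p in probs.items() if s != "mid"}
        removed = mid * (1.0 - factor)
        total_others = sum(others.values())
        row = {s: p + removed * p / total_others for s, p in others.items()}
        if "mid" in probs:
            row["mid"] = mid * factor
        shifted[state] = row
    return shifted


def simulate_possession(transitions, rng):
    """Walk the chain from 'start' until a terminal state, then score it."""
    state = "start"
    while state not in TERMINAL:
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
    make_prob, points = TERMINAL[state]
    return points if rng.random() < make_prob else 0


def points_per_possession(transitions, n, seed=0):
    rng = random.Random(seed)
    return sum(simulate_possession(transitions, rng) for _ in range(n)) / n


baseline = points_per_possession(TRANSITIONS, 50_000)
fewer_mid = points_per_possession(shift_away_from_mid(TRANSITIONS, 0.5), 50_000)
print(round(baseline, 3), round(fewer_mid, 3))
```

In this toy setup, mid-range shots are assumed to be the least efficient. The policy change therefore raises points per possession by construction. The paper's point is that, once time and context enter the transition probabilities, the answer is no longer obvious.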

Abstract: In recent years, the “counter-press”, “gegenpress” and “counter-attack” employed by Pep Guardiola’s, Jurgen Klopp’s and Jose Mourinho’s teams, respectively, have been in vogue due to their ability to create good scoring chances by effectively overloading the team that has just lost the ball. These fast-paced, aggressive, and direct transitions provide some of the best opportunities for scoring and are a potent strategy when executed effectively. Despite the importance of transitions in soccer, no quantitative measures of them have emerged in analytics. There are two main reasons for this: i) obtaining the precise onset and offset timestamps of a counter-attack is extremely challenging, as the task is subjective and fine-grained, and ii) measuring the structural patterns and movements of a team is equally subjective and challenging for a human to annotate. Here we leverage supervised and unsupervised machine learning techniques to automatically and objectively detect these transition situations. First, we learn the formation and playbook of a team via hierarchical clustering. Next, from event sequences and team information, we construct a measure of “offensive threat” to further classify threatening and non-threatening plays. Finally, our game-state-specific templates enable us to quantify the “defensive disorder” of a team as it transitions from offense to defense. From this analysis, we are able to detect counter-attacks directly from the player-tracking data, without any human labels, and then use this to quantify the value and impact of execution on the counter-attack, both offensively and defensively, and to identify teams with similar transition styles.

Abstract: The gap in resources between the richest and poorest teams in world football grows wider each season, as demonstrated by Paris Saint-Germain spending a world-record 222 million euros on Neymar, who scored 15 goals in the 17/18 Ligue 1 season. A small-market team’s ability to replicate the same goal output for the price of an effective set-piece strategy is a clear market inefficiency that can be exploited. Given the large amount of data now available, it should be possible to measure this quantitatively. In this paper, we employ a mythbuster’s approach: we state a commonly held belief and then test whether it is true. To do this, we present an attribute-driven approach to set-piece analysis, which utilizes a hybrid of deep-learning methods to detect complex attributes such as defensive marking schemes, and hand-crafted features to enable interpretability. Specifically, we employ a Convolutional Neural Network (CNN), which adequately captures the defensive structure of a team around set-pieces. Additionally, we use expected metrics such as expected goal value (xG) to value the quality of chances that a team creates based on the location and quality of delivery in addition to the defensive attributes. Our research demonstrates which types of delivery are the most dangerous and how this varies by team. As a result, we are now able to provide recommendations to coaches and analysts about how a team may play against them and how to prepare for and exploit the opposition’s strengths and weaknesses.

Abstract: The growing availability of player tracking data has led to a number of data-driven ghosting models that aim to imitate players’ behaviors in various sports. However, models trained on such tracking data typically assume that the future behavior of the players depends only on their (x,y) locations on the court. Such an assumption makes these models overly simplistic and prevents them from learning subtle behavior patterns of real players. To address this issue, we present an egocentric basketball ghosting model. Our model predicts a player’s future behavior from an egocentric image, which we obtain from a wearable GoPro camera on the player’s head. In contrast to prior methods that use tracking data or third-person cameras, our approach of using first-person cameras allows us to capture exactly what the players see during a game, making it easier to understand and imitate their behavior. Our model uses a single egocentric image to generate a plausible behavior sequence in the form of 12D egocentric camera configurations, which encode a player’s 3D location and 3D head orientation. We accomplish this via two deep convolutional networks, both trained in an unsupervised fashion without manual human annotations. In our experimental section, we demonstrate that our egocentric ghosting model generates realistic basketball sequences that can be used to predict a player’s future behavior. Furthermore, we show that by inspecting intermediate neuron activations in our trained networks, we can better understand how the model decides what the player will do next.

Abstract: The mental side of the game has been one of the most elusive aspects of performance analysis in tennis. We present a framework for predicting seven emotional states relevant to sport (‘anxiety’, ‘anger’, ‘annoyance’, ‘dejection’, ‘elation’, ‘focus’, and ‘fired up’) from the observed facial expressions of players in match broadcasts. Our methodology applies pre-trained models to extract two feature sets: predicted emotions in the Facial Action Coding System and 17 facial action units. Multiple prediction approaches were trained and tested using these features and a labeled dataset of 1,700 facial images of professional male and female tennis players extracted from 505 match videos. We applied the prediction models to establish emotional profiles for the ‘Big 4’ (Roger Federer, Rafael Nadal, Andy Murray, and Novak Djokovic) at the 2017 Australian Open. Rafael Nadal exhibited the most ‘anxiety’ of the four players (32%, 95% CI 29 to 35%), while Roger Federer was the only player whose predominant state was ‘neutral’ (24%, 95% CI 21 to 27%). When the predicted emotions were associated with point outcomes, we found that all of the Big 4 except Roger Federer showed significant emotional reactions to the outcomes of points. Further, several emotional states of Rafael Nadal and Novak Djokovic were significantly predictive of their chances of winning the next point. Our framework for extracting emotional data from single-camera video in professional tennis shows the feasibility of bringing the quantitative study of the inner game into sports performance analysis.

Abstract: Even with the recent influx of data regarding NCAA basketball players and professional teams investing more resources into scouting than ever before, NBA decision makers continue to struggle to consistently select productive players. In this study, I determine the NCAA statistics and pre-draft player characteristics that predict draft position and NBA performance for all NCAA players drafted between 2006 and 2013. Based on the factors that are under- and overemphasized by NBA decision makers, I then examine how these choices relate to general decision-making theory. Linear regression models are specified for both draft position and NBA performance, with percentage-based metrics used in place of traditional box score counting statistics (e.g., assist percentage instead of assists per game). All factors are adjusted for the position of the player, classified as a Big, Wing, or Point Guard rather than into one of the five traditional positions. A Heckman (1971) sample selection correction is also applied to correct for the non-randomly selected NBA performance sample, which necessarily excludes players who have not played a sufficient number of NBA minutes (at least 500). NBA decision makers continue to base their draft selections on statistics and characteristics that do not actually predict future NBA success. Overemphasized factors include scoring, size, and college conference, while ball control and offensive efficiency are generally underrated. NBA draft strategy at large can also be connected to certain decision-making theories, such as Heath and Tversky’s (1991) competency hypothesis, Samuelson and Zeckhauser’s (1988) status quo bias, and Kahneman and Tversky’s (1979) theory of risk aversion when facing possible gains.

Abstract: Team sports such as hockey and basketball involve complex player interactions. Modeling how players interact presents a great challenge to researchers in the field of sports analysis. The most common source of data available for this type of analysis is player trajectory data, which encode vital information about the motion, action, and intention of players. At an individual level, each player exhibits a characteristic trajectory style that can distinguish him from other players. At a team level, a set of player trajectories forms unique dynamics that differentiate the team from others. We believe both players and teams possess their own particular spatio-temporal patterns hidden in the trajectory data and we propose a generic deep learning model that learns powerful representations from player trajectories. In brief, we use layers of 1D convolutions to learn discriminative feature representations from player tracking data while also resolving the permutation problem inherent in player tracking data. With the learned representations, our model can automatically recognize events, identify players, and classify teams. We show that, on the Sportlogiq hockey dataset, our model with only trajectories as input outperforms a deep neural network that takes videos as input, on the task of event recognition. Our model achieves even better performance when used in combination with videos. We also demonstrate, on a basketball dataset, how our model excels at team classification using only player trajectories. We believe these deep learning trajectory representations have potential for varied applications in understanding and predicting player and team activities in sports analytics.

Abstract: Many models have been constructed to quantify the quality of shots in soccer. In this paper, we evaluate the quality of the off-ball positioning that precedes shots and could lead to goals. For example, consider a tall, unmarked center forward positioned at the far post during a corner kick. Sometimes the cross comes in and the center forward heads it in effortlessly; other times the cross flies over his head. Another example is a winger, played onside, making a run past the defensive line. Sometimes the through-ball arrives; other times the winger must break off their run because a teammate has failed to deliver a timely pass. In both circumstances, the attacking player has created an opportunity even if they never received the ball. We construct a probabilistic physics-based model that uses spatio-temporal player tracking data to quantify such off-ball scoring opportunities (OBSO). This model can be used to highlight which players, if any, are likely to score at any point during the match and where on the pitch their scoring is likely to come from. We show how this model can be used in three key ways: 1) to identify and analyze important opportunities during a match; 2) to assist opposition analysis by highlighting the regions of the pitch where specific players or teams are more likely to create off-ball scoring opportunities; and 3) to automate talent identification by finding the players across an entire league who are most proficient at creating off-ball scoring opportunities.

Abstract: Injuries are among the largest determinants of success in sport, and as such, injury risk management should be at the forefront of all major sporting organizations. Many risk management platforms and algorithms attempt to use the broad range of available data to help identify players who are likely to encounter issues. This study presents the results of one such platform, designed to address many common pitfalls of the problem. Sophisticated data preprocessing coupled with artificial neural networks is applied to a dataset of players from two Australian Rules Football teams to determine whether lower-body soft-tissue injury risk is related to training patterns and player load. Player risk on game day and in training sessions is discussed with reference to the inherently increased risk on game day. The influence of previous injury and recent injury history on the resulting injury risk is also analyzed. With multiple teams’ worth of data available, the use of both teams’ data in AI training and validation is discussed, to determine whether different training patterns can be combined under one model and whether a model can be pre-trained on one team’s data to yield better results. Overall, models achieving ROC-AUC scores of 0.78 can be obtained, which organizations could use to reduce injury rates by enabling targeted interventions in training plans and player load.
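ROC-AUC has a simple interpretation. It is the probability that a randomly chosen injured case receives a higher risk score than a randomly chosen uninjured one. A score of 0.5 means chance-level ranking, and 1.0 means perfect ranking. The sketch below is a minimal pairwise implementation, not the platform's evaluation code, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive (injured) example
    gets a higher risk score than a random negative one; ties count as half.
    O(P*N) pairwise comparison, fine for small illustrative inputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six player-sessions (1 = soft-tissue injury).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.8, 0.3, 0.4, 0.5, 0.2, 0.9]
print(roc_auc(labels, scores))  # 8/9, about 0.889
```

Under this reading, an AUC of 0.78 means that roughly 78% of injured/uninjured pairs are ranked correctly by the model's risk score.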
