Every year, the MIT Sloan Sports Analytics Conference Research Paper Competition brings exciting and innovative insight and changes to the way we analyze sports. With submissions on topics ranging from the spelling bee to rugby, basketball, and more, we represent the largest forum for groundbreaking research in sports. The Research Paper Competition is an incredible opportunity to reach a diverse audience while still contributing to the advancement of analytics in sports.
Previous years’ top papers were featured on Numbers Never Lie, mentioned and re-run in publications across the world (including TIME, ESPN the Magazine, and The New York Times), and captured the attention of representatives from numerous professional sports teams. We are looking forward to your submission for the 2015 Sloan Sports Analytics Conference Research Paper Competition.
Sports Tracks: This year we will be introducing a track system to the research paper competition. All submissions will be entered into one of four tracks based on the content of the abstract. The tracks are:
Basketball – All submissions related to the sport of basketball
Baseball – All submissions related to the sport of baseball
Other Sports – All submissions related to other sports
Business of Sports – Submissions related to the business of owning, managing, or marketing a sport, or to new technology or ideas which could change the sports industry
2015 Research Paper Finalists
The 2015 Research Papers selected to present at the Conference are listed below.
The finals of the Research Paper Competition at the 9th MIT Sloan Sports Analytics Conference, February 27-28, 2015, Boston, MA. The finalists, representing the best research in the four tracks of the 2015 research paper competition (sponsored by Ticketmaster), give summaries of their research to an all-star panel of judges: FiveThirtyEight founder Nate Silver, Yale University professor emeritus Edward Tufte, Houston Rockets basketball operations analyst Jonathan Hennessy, and Ticketmaster executive John Forese.
Abstract: Due to the ease of recording points, assists, and related goal-scoring statistics, the vast majority of advanced basketball metrics developed to date have focused on offensive production. It is straightforward to see who scored the most points in the 1985/86 season (Alex English, with 2414) or took the most 3-point shots in 1991/92 (Vernon Maxwell, with 473). However, try to look up who had the most points against in 2013/14, or who prevented the most shots from being taken that year, and the history books are, remarkably, empty. Thus we stand in a muddled state where offensive ability is naturally quantified with numerous directly-measured numbers, yet we attempt to explain defensive ability through statistics only loosely related to overall defensive ability, such as blocks and steals. Alternatively, we quote regression-based metrics such as adjusted plus/minus which give no insight into how or why a player is effective defensively. This paper bridges this gap, introducing a new suite of defensive metrics that aim to progress the field of basketball analytics by enriching the measurement of defensive play. Our results demonstrate that the combination of player tracking, statistical modeling, and visualization enables a far richer characterization of defense than has previously been possible. Our method, when combined with more traditional offensive statistics, provides a well-rounded summary of a player’s contribution to the final outcome of a game.
Abstract: Conventional approaches to simulating matches have ignored that in basketball the dynamics of ball movement are very sensitive to the lineups on the court and to the unique identities of players on both offense and defense. In this paper, we propose a simulation infrastructure that can bridge the gap between player identity and the team-level network. We model the progression of a basketball match using a probabilistic graphical model. We model every touch and event in a game as a sequence of transitions between discrete states. We treat the progression of a match as a graph, where each node is a network structure of players on the court, their actions, events, etc., and edges denote possible moves in the game flow. Our results show that changes in either the team lineup or the opposing team's lineup significantly affect the dynamics of match progression. Evaluation on the match data for the 2013-14 NBA season suggests that the graphical model approach is appropriate for modeling a basketball match.
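As a rough illustration of what "a sequence of transitions between discrete states" can look like, the sketch below simulates a single possession as a first-order Markov chain. This is a deliberate simplification and not the paper's graphical model: the state names, the transition probabilities, and the function `simulate_possession` are all hypothetical and chosen only for the example.

```python
import random

# Hypothetical states and transition probabilities (illustrative only,
# not taken from the paper or from real NBA data).
TRANSITIONS = {
    "guard_has_ball": {
        "forward_has_ball": 0.5,
        "shot_made": 0.2,
        "shot_missed": 0.2,
        "turnover": 0.1,
    },
    "forward_has_ball": {
        "guard_has_ball": 0.4,
        "shot_made": 0.25,
        "shot_missed": 0.25,
        "turnover": 0.1,
    },
}

# States with no outgoing transitions end the possession.
TERMINAL = {"shot_made", "shot_missed", "turnover"}


def simulate_possession(transitions, start, rng, max_steps=100):
    """Walk the chain from `start` until a terminal state is reached.

    Returns the list of visited states, including the start state.
    `max_steps` is a safety cap on the number of transitions.
    """
    path = [start]
    state = start
    while state in transitions and len(path) <= max_steps:
        next_states = list(transitions[state])
        weights = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(simulate_possession(TRANSITIONS, "guard_has_ball", rng))
```

In a lineup-sensitive model of the kind the abstract describes, the transition probabilities would presumably depend on which players are on the court; here a single fixed table stands in for that dependence.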
Abstract: In this paper, we present a method which accurately estimates the likelihood of chances in soccer using strategic features from an entire season of player and ball tracking data taken from a professional league. From the data, we analyzed the spatiotemporal patterns of the ten-second window of play before a shot for nearly 10,000 shots. From our analysis, we found that not only is the game phase important (i.e., corner, free-kick, open-play, counter attack etc.), but strategic features such as defender proximity, interaction of surrounding players, and speed of play, coupled with the shot location, also affect the likelihood of a team scoring a goal. Using our spatiotemporal strategic features, we can accurately measure the likelihood of each shot. We use this analysis to quantify the efficiency of each team and their strategy.
Abstract: The NFL uses numerous complex rules in scheduling regular season games to maintain fairness, attractiveness and its wide appeal to all fans and franchises. While these rules balance a majority of the features, they are not robust in spacing games to avoid competitive imbalance. We consider the scheduling of NFL regular season games and formulate a mixed-integer linear program (MILP) to alleviate competitive disadvantages originating from the assignment of bye weeks, Thursday games and streaks of home-away games among various other sources. We propose a two-phase heuristic approach to seek solutions to the resulting large-scale MILP and conduct computational experiments to illustrate how past NFL schedules could have been improved for fairness. We also demonstrate the efficiency and stability of our approach by creating balanced schedules on an extensive set of simulated possible future NFL seasons. Our experiments show that the heuristic can quickly create a large pool of schedules that are completely free of disadvantages due to scheduling of bye weeks and well-balanced in preparation time differences due to Thursday games.
Abstract: With over a trillion dollars being risked on worldwide sports gambling every year, interest in modeling game performance in general, and baseball in particular, has grown. Integrating baseball game modeling with analytically based gambling allows these two elements to be exploited with a single objective: profiting from the marketplace inequities between the game (production) and betting markets (price and lines). Two questions will be addressed: First, can an accurate baseball gaming model be derived and used to calculate the probability of winning and the economic consequence predicated upon the betting line? Second, what is the optimal bet size based upon the risk tolerances (operational constraints) of the investor? Included will be the derivation of a production function which can be used to calculate the probability of a winning team. Defining the implication of the betting line will address cost, payoffs, and the implied probabilities of winning. Expected Return on Investment and Betting Edge will provide an economics perspective.
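To make the link between a betting line and an implied probability concrete, here is a minimal sketch using the standard American moneyline convention. It is not the paper's model: the function names are hypothetical, and the definitions of expected return and edge used here (expected profit per unit staked, and model probability minus implied probability) are common conventions assumed for illustration; the paper may define them differently.

```python
def implied_probability(moneyline: int) -> float:
    """Break-even win probability implied by an American moneyline.

    -150 means risk 150 to win 100; +130 means risk 100 to win 130.
    """
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def profit_per_unit(moneyline: int) -> float:
    """Profit earned per unit staked if the bet wins."""
    if moneyline < 0:
        return 100 / -moneyline
    return moneyline / 100


def expected_roi(model_prob: float, moneyline: int) -> float:
    """Expected profit per unit staked, given a model's win probability.

    Assumed definition: win the payout with probability p,
    lose the stake with probability 1 - p.
    """
    return model_prob * profit_per_unit(moneyline) - (1.0 - model_prob)


def betting_edge(model_prob: float, moneyline: int) -> float:
    """Assumed definition: model probability minus the line's implied probability."""
    return model_prob - implied_probability(moneyline)


if __name__ == "__main__":
    line = -150        # hypothetical line on the favorite
    p_model = 0.65     # hypothetical model estimate
    print(implied_probability(line))
    print(expected_roi(p_model, line))
    print(betting_edge(p_model, line))
```

Under these assumed definitions, a bet has positive expected return exactly when the model probability exceeds the implied probability. Note that implied probabilities for the two sides of a real line typically sum to more than one because of the bookmaker's margin, which this sketch does not remove.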
Abstract: The purpose of this paper is to introduce a new methodology for quantifying what is commonly referred to as “pitch framing,” in which we attempt to divide the credit for whether a pitch is called a ball or strike among the catcher, the pitcher, the batter, and the umpire involved. We call our system Strike Zone Plus/Minus, and it is unique from other pitch framing methodologies in two ways. First, we treat pitchers, batters, and umpires as independent actors in the system rather than treating them as variables to adjust the catcher's performance by. Second, we use Baseball Info Solutions data on where the catcher sets his target for the pitch, allowing us to incorporate the pitcher's command (how close he comes to hitting the target) into our system. Our results show that we are successfully measuring the abilities of each participant independently of each other and that we are reliably measuring a consistent pitch framing skill. Strike Zone Plus/Minus produces results that are more comprehensive and in some cases radically different than publicly available framing methodologies, which will have many implications. The most direct one will be the valuation of catchers in the free agent market.
Abstract: The sports industry has seen a rapid adoption of dynamic pricing practices in recent years. However, there is still limited understanding of the effect of dynamic pricing on revenue in sport event settings and of how to execute effective dynamic pricing strategies. In this paper, we address these issues by developing a comprehensive demand model for single-game ticket sales that can be used to predict the revenue associated with a particular pricing strategy over the course of a sport season. We apply the model to actual ticket sales and pricing data from an anonymous Major League Baseball franchise during a recent MLB season, and evaluate the effectiveness of the dynamic pricing policy applied by our partner franchise during that period. We find that the actual dynamic pricing strategy used by this franchise resulted in a revenue decrease of 0.79% compared to a pricing policy where prices are fixed over time. We propose alternative pricing policies to help improve revenue and find that an optimized dynamic pricing policy can improve revenue by 14.3% compared to a pricing policy where prices are fixed over time.
Abstract: Hockey journalists and statisticians currently lack many of the empirical tools available in other sports. In this paper I introduce a win probability metric for the NHL and use it to develop a new statistic, Added Goal Value, which evaluates player offensive productivity. The metric is the first of its kind to incorporate power play information and is the only NHL in-game win probability metric currently available. I show how win probabilities can enhance the narrative around an individual game and can also be used to evaluate playoff series win probabilities. I then introduce Added Goal Value, which improves upon traditional offensive player statistics by accounting for game context. A player's AGV has a strong positive correlation between seasons, making it a useful statistic for predicting future offensive productivity. By accounting for the context in which goals are scored, AGV also allows for comparisons to be made between players who have identical goal-scoring rates. The work in this paper provides several advances in hockey analytics and also provides a framework for unifying current and future work on Corsi, Fenwick and other NHL analytics.