Research Papers

2017 Research Paper Competition

Sponsored by Ticketmaster

Overview

Every year, the MIT Sloan Sports Analytics Conference Research Paper Competition brings exciting and innovative insight and changes to the way we analyze sports. With submissions on topics ranging from the spelling bee to rugby, basketball, and more, we represent the largest forum for groundbreaking research in sports. The Research Paper Competition is an incredible opportunity to reach a diverse audience while still contributing to the advancement of analytics in sports.

Previous years’ top papers were featured in top media outlets throughout the world and captured the attention of representatives from numerous professional sports teams. Abstract submission for the 2017 Sloan Sports Analytics Conference Research Paper Competition is now open.

Timeline

Wed. 9/28, 11:59pm – Abstract submission due
Mon. 10/17, 11:59pm – Full paper requests submitted
Mon. 12/5, 11:59pm – Full paper submissions due (if selected)
Mon. 1/30, 11:59pm – Finalists and posters announced
Fri. 2/17, 11:59pm – Submission of poster (if selected)
Fri. 2/24, 11:59pm – Submission of presentation (if selected)
3/3 - 3/4 – Conference presentations (if selected)

Sports Tracks

Based on abstract content, all submissions will be entered into one of the following Sports Tracks:

  1. Basketball – All submissions related to the sport of basketball
  2. Baseball – All submissions related to the sport of baseball
  3. Other Sports – All submissions related to the playing of a sport that is not basketball or baseball
  4. Business of Sports – Submissions related to the business of owning, managing, or marketing a sport, or to new technology or ideas which could change the face of sports

Competition Format

The competition consists of the following phases:
1. Abstract Phase
Authors submit abstracts. Based on the judged merits of their abstract submissions, a select group of authors will be invited to submit full manuscripts.

2. Full Manuscript Phase
Invited authors submit full manuscripts. Referees will evaluate every manuscript, and authors of the best submissions will be invited to present their findings at the conference. The referees will also select a set of authors who will be invited to present their work during a poster session.

3. Conference Phase I
a. Poster Competition
All posters selected for the conference will be entered into a competition for Best Poster, determined by a fan vote during the weekend of the conference. Note: this competition is independent of the presentation finals, and none of the posters will advance to the presentation finals.
b. Presentations
Invited authors will present their findings during the first day of the conference. Based on the quality of the presentation and manuscript, one submission per Sports Track will be selected as the track winner, and will advance to the finals.

4. Conference Phase II – Finals
Finalists will present again during day two of the conference in front of an industry panel of experts.

Abstract Guidelines

Abstract submissions should be submitted online, and must use the following guidelines:

  • Abstracts must contain fewer than 500 words, including title and body.
  • Abstracts may include up to two tables or figures combined (e.g. 1 figure and 1 table, or 2 tables)
  • Each abstract should contain the following sections:
    • Introduction – What question is this research trying to answer? Why is it an important question for the industry?
    • Methods – Description of relevant statistical methods used, including data sources or data collection procedures
    • Results – Description of actual (not promised) results along with relevant statistics
    • Conclusion – The overall takeaway from the study, including how the results will impact the sports industry
  • All abstracts must be submitted in one PDF through the 2017 Abstract Submission online submission page

Evaluation of Submissions

The conference seeks submissions that report research pertaining to the use of analytics in the sports industry. We are open to contributions ranging from evaluating players and game strategies, to examining the success factors for sports business.
In the abstract and full paper submission process, research will be evaluated on but not necessarily limited to the following criteria:

  • Novelty of research – Does the research provide interesting insight into new models or challenge existing beliefs?
  • Academic rigor / validity of model – Are the methodologies of the model and results fundamentally sound and appropriate?
  • Reproducibility – Can the model and results be replicated independently?

In evaluating presentation finalists at the 2017 SSAC, the above factors will be supplemented by the following criteria, as judged by a panel of academics and industry executives from team management and sports business operations:

  • Application – What are the applications or potential applications of the insights from the research?
  • Interest / impact – Is there significant interest in the proposed question in the field of study or the community at large? What are the benefits or impact of the model or application?

The Research Paper team will review all abstracts. The Review Committee will evaluate all manuscript submissions. The Review Committee consists of the Research Paper team, as well as academic professors and experts from top universities in fields including Statistics, Information Sciences, and Economics. The industry panel that makes the final winner selection will make its decision on the basis of the paper and the presentation at the 2017 Conference.
NOTE: Papers that have been previously published elsewhere are not eligible for the competition.

Conflict of Interest Policy

Our objective is to ensure an unbiased evaluation of submissions throughout the process. We are aware that members of the evaluation committee may have had relationships with authors who have submitted papers. When possible, potential conflicts of interest are avoided by minimizing the review of research by the following:

  • Authors who have collaborated with the reviewer on previous submissions
  • Current or former students who worked with the reviewer
  • Colleagues from the same organization
  • Any other previous relationships with the author that may prevent an unbiased evaluation of the paper

All potential conflicts of interest will be managed as best as possible while still maintaining the quality of the review process. Final reviews will occur without knowledge of the names of the authors.

Rights and Permissions

All authors retain ownership rights to the research and the right to publish the research after the conference. Upon submission, authors grant access to 42 Analytics to make their research available for public viewing online, in print, and for conference use for the Sloan Sports Analytics Conference.
Authors are responsible for obtaining permission from third parties to reprint copyrighted information such as data, tables, or figures that may be protected by copyright.

2016 Research Paper Finalists

The 2016 Research Papers selected to present at the Conference are listed below.


Abstract: Soccer, the most watched sport in the world, is a dynamic game where a team’s success relies on both team strategy and individual player contributions. Passing is a cardinal soccer skill and a key factor in strategy development; it helps the team to keep the ball in its possession, move it across the field, and outmaneuver the opposing team in order to score a goal. From a defensive perspective, however, it is just as important to stop passes from happening, thereby disrupting the opposing team’s flow of play. Our main contribution utilizes this fundamental observation to define and learn a spatial map of each team’s defensive weaknesses and strengths. Moreover, as a byproduct of this approach we also obtain a team specific offensive control surface, which describes a team’s ability to retain possession in different regions of the field. Our results can be used to distinguish between different defensive strategies, such as pressing high up the field or sitting back, as well as specific player contributions and the impact of a manager.


Abstract: The aim of this paper is to discover patterns of player movement and ball striking (short- and long-term shots, and shot combinations) in tennis using HawkEye data which are indicative of changing the probability of winning a point. This is a challenging task because: i) behavior can be unpredictable, ii) the environment is dynamic and the output state-space is large and iii) examples of specific interactions between agents may be limited or non-existent (player A may not have interacted with player B). However, by using a dictionary of discriminative patterns of player behavior, we can form a representation of a player’s style as interpretable latent factors, which allows us to personalize interactions between players based on the match context (opponent, match score). This approach allows us to perform superior point predictions, and to understand how points are won by systematically creating and exploiting spatiotemporal dominance.


Abstract: This research tests the hypothesis that digital data signals gathered from MLB stadium visitors can provide significant insight on the value of brand sponsorships and also inform on optimization of brand placement. The digital data signals analyzed in this study include mobile device location data collected from all MLB stadiums during the 2015 year and the online browsing behavior associated with these devices. The audience research is unique as it marries offline behavior with online behavior by using online behavioral data to inform on visitors to a physical location. Results from our study show that brand sponsorships do have an impact on fan engagement, and in some cases increase in value as the season progresses. As an example, Colorado Rockies fans at Coors Field are 13.6 times more likely to visit the airline sponsor’s website than the general population and 4.1 times more likely than all other MLB team fans. If we look at monthly trends, the propensity for fans to visit the airline sponsor site versus all other MLB fans increased from 4.1 to 6.8. This research has far reaching implications for matching teams with prospective sponsors while at the same time providing a comprehensive perspective on how the audience at each ballpark is interacting with current brands. Furthermore, the methods described allow one to quantify the impact of corporate sponsorship in a way that has never been explored before. Lastly, the techniques described here are not limited to MLB parks or to web site visit behavior.


Abstract: Despite considerable advances in the application of data analytics across the sport industry, sponsorship revenue forecasting still largely relies on a decades-old methodology, the renewal rate. This paper marks the first application of survival analysis approaches to analyze the duration of sponsorships, utilizing a dataset of 69 global sponsorships of the Olympic Games and FIFA World Cup. Past and present sponsors of these two global events include some of the world’s most valuable brands, including Adidas, Coca-Cola, FedEx, IBM, McDonald’s, Panasonic, Samsung, Sony, Visa, and Xerox. The utilization of methods other than standard measures of central tendency (i.e., the renewal rate) allows sport organizations who depend on revenue from sponsorship for their survival to determine not just the aggregated percentage of sponsors who historically renew, but when sponsorships are most likely to continue, when the probability of a sponsorship ending is highest, and the sponsorship’s historical median lifetime. Further, these more advanced methods properly account for censored observations, or sponsorships that are currently ongoing. Consistent with prior applications of exchange theory to the sponsorship business-to-business relationship, results found sponsorships were most susceptible to dissolution within the first two renewal periods, and sponsorship durations differ significantly based on which methodology is applied. For example, sponsorship revenue projections varied by as much as $100 million depending on the approach, demonstrating the importance of providing sport managers with advanced data analysis tools to assist in the organization’s sponsorship revenue forecasting activities.
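To illustrate how censored observations can be handled, here is a minimal sketch of the Kaplan-Meier estimator, a standard survival analysis technique. The abstract does not say which estimators the authors used, so this is an assumption made purely for illustration. The function names and the sponsorship data are hypothetical.

```python
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[int], ended: List[bool]) -> List[Tuple[int, float]]:
    """Return (time, survival probability) pairs at each time a sponsorship ended.

    durations[i] is the number of renewal periods sponsorship i was observed.
    ended[i] is False when the sponsorship is still ongoing (right-censored).
    """
    curve = []
    survival = 1.0
    # Only times at which at least one sponsorship actually ended change the curve.
    event_times = sorted({d for d, e in zip(durations, ended) if e})
    for t in event_times:
        # Sponsorships observed for at least t periods are still "at risk" at t,
        # including censored ones, which is how ongoing deals contribute information.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, ended) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def median_lifetime(curve: List[Tuple[int, float]]) -> Optional[int]:
    """First time at which estimated survival drops to 0.5 or below, if any."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented example: 10 sponsorships, 5 of them still ongoing (censored).
durations = [1, 1, 2, 2, 3, 4, 5, 5, 6, 8]
ended = [True, True, True, False, True, False, True, False, False, False]

curve = kaplan_meier(durations, ended)
for t, s in curve:
    print(f"after period {t}: estimated survival {s:.3f}")
print("estimated median lifetime:", median_lifetime(curve))
```

Unlike a single renewal rate, the resulting curve shows when a sponsorship is most likely to end, and ongoing sponsorships stay in the at-risk count until they are no longer observed rather than being dropped or treated as ended.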


Abstract: Finding an effective counter to the ball screen is a high priority for NBA coaching staffs. In this paper, we present construction and application of a tool for automatically recognizing common defensive counters to ball screens. Using player tracking data and supervised machine learning techniques, we learn a classifier that labels ball screens by how they were defended. Applied to a selection of games over four seasons, our classifier identified and labeled 270,823 attempts to defend a ball screen. At the team level, we identify outliers who favored a particular defensive scheme on the way to successful seasons. For example, the ’12-13’ Bulls went “over” 7% more often than the average team that year. For players, we examine both offense and defense. Offensively, we report how often players face a given defense and their effectiveness in creating points from those situations. Notably, Damian Lillard sees defenders go over in ⅔ of his screens, but with 0.84 pts/poss he’s among the league’s best at capitalizing on these opportunities. Defensively, we examine pairs of players and their ability to stifle opponent scoring. In ’13-14’, Dwight Howard and Jeremy Lin were particularly effective when Lin went over screens, holding the offense to just 0.27 pts/poss. This fully automated tool opens the door to analysis of defensive tactics at an unprecedented scale.


Abstract: We test for a “hot hand” (i.e., short-term predictability in performance) in Major League Baseball using panel data.  We find strong evidence for its existence in all ten statistical categories we consider. The magnitudes are significant; being “hot” corresponds to between one-half and one standard deviation in the distribution of player abilities. Our results are in notable contrast to the majority of the hot-hand literature, which has found little to no evidence for a hot hand in sports, often employing basketball shooting data.  We argue that this difference is attributable to endogenous defensive responses: basketball presents sufficient opportunity for transferring defensive resources to equate shooting probabilities across players whereas baseball does not.  We then document that baseball teams do respond to recent success in their opponents’ batting performance.  Our results suggest that teams use recent performance in a manner that is roughly consistent with drawing a correct inference about the magnitude of the hot-hand. However, there is a tendency for teams to overreact to very recent performance (i.e., the last 5 attempts).


Abstract: Pitcher performance projection is a fundamental area in baseball analysis. Traditional projection systems are based on either ERA (or RA/9), which cannot separate pitcher performance from fielder defense well; or K%, BB%, HR% and batted ball classification, from which crucial information like BABIP is difficult to derive. These systems leave a lot to be desired. Fortunately, PitchF/X was introduced to baseball, and by utilizing it we can analyze pitchers with much better accuracy. This paper introduces a new and improved method for pitcher projection, which we call “Arsenal/Zone rating”, using PitchF/X data. The idea is that pitcher performance can be mostly judged and predicted from two aspects: arsenal rating, which corresponds to the speed and movement of the pitch, and zone rating, which is related to the location of the pitch with regard to the strike zone. Our combined projection stat, which linearly combines our rating with basic pitching stats (K%, BB%, HR%, etc.), is much better than all mainstream projection systems. Our system scores around 25.3% R² on 2012-2014 qualified starting pitchers when combined with xFIP, while baseline systems (also combined with xFIP) are around 22–23% R².


2016 Research Paper Posters

The 2016 Research Paper posters selected for the Conference are listed below.

Jeremy Hochstedler, CEO and Co-Founder, Telemetry Sports

Abstract: Within football and media organizations, significant resources are consumed through the analysis of player performance and decision-making, especially at the quarterback (QB) position. Since 2014, radio-frequency identification (RFID) tracking technology has been used to continuously monitor the on-field locations of NFL players. Using geospatial American football data, this research quantitatively evaluates receiver openness, player elusiveness, and QB decision-making. In addition to enhancing win probability, using NFL injury data, this research also uncovers how QB decisions and passing ability impact the likelihood of receiver concussions. By making better decisions and finding the open receiver, QBs can put their receivers and their teams in better positions to succeed. The NFL has lagged behind other major sports when it comes to analytical player and team analysis; however, geospatial tracking allows for significant opportunities as the data is analyzed both privately and publicly. This research displays the types of analysis possible with geospatial data available in the NFL. Once the data becomes available to teams and other media organizations, the development of insights into players, strategy, and decision-making will change how the game is evaluated and consumed.

Stephanie Kovalchik, Tennis Data Scientist, Tennis Australia and Victoria University

Martin Ingram, Quantitative Researcher, Stratagem Technologies

Abstract: It is often said that winning in tennis is as much a mental game as a physical one, yet there has been little quantitative study into the mental side of tennis. We present an approach to identify mentalities in tennis with dynamic response patterns that quantify how a player’s probability of winning a point varies in response to the changing situations of a match. Using 3 million points played by professional male and female tennis players between 2011 and 2015, we found that, on average, players were affected by the state of the score and a variety of pressure situations: exhibiting hot hand effects when ahead, defeatist effects when down, and performing less effectively in clutch situations. Player-specific performance patterns suggested a diversity of player mentalities at the elite level, with subgroups of players responding more or less effectively to pressure, score history, and other match situations. One of the patterns found on the men’s tour included four of the most decorated players in the current game, the ‘Big Four’, suggesting a champion’s mentality that was characterized by cool-headedness on serve and adaptability on return. Accounting for player mentalities improved predicted outcomes of matches, substantiating the importance of the mental game for success in tennis.

Matthias Schubert, Assistant Professor, LMU Munich

Tobias Mahlmann, Postdoctoral Researcher, University of Lund

Anders Drachen, Associate Professor, Aalborg University

Abstract:

Esports is computer games played in a competitive environment. As for any other type of sports competition, players and teams seek to improve their behavior to optimize their results. Thus, esports analytics is a new area identifying successful strategies and evaluating game play for computer games. Besides helping the players, esports analytics is also directed to help the game provider to ensure a fair and exciting gaming experience.

Multiplayer Online Battle Arena (MOBA) games are among the most played digital games in the world. In these games, teams of players fight against each other in enclosed arena environments, with a complex gameplay focused on tactical combat. The particular MOBA being examined in this paper, DOTA, had already more than 7.86 million active players monthly in 2013.

To win a match, a team has to develop its heroes by killing hostile units and buildings. This mostly happens during encounters involving players from both teams. Several encounters might occur simultaneously on different locations of the map. Thus, to evaluate game play, each fight has to be analyzed separately.

We present a technique for segmenting matches into spatio-temporally defined components representing these encounters which enables us to analyze player performance on a detailed level. We apply encounter-based analysis to match data from the popular esport game DOTA, and present win probability predictions based on encounters. Finally, metrics for evaluating team performance during match runtime are proposed.

Michael Schuckers, Professor, St. Lawrence University

Abstract: One of the most important tasks for the general manager of any sports team is the efficient acquisition of player talent. Often, one relatively inexpensive way to accomplish this is through a league draft. In this paper we use historical data available when players were eligible to be selected in the National Hockey League (NHL) Player Entry Draft to build a statistical prediction model for their performance in the NHL. The data that we use is demographic (e.g., heights and weights), pre-draft performance (e.g., points per game and goals against average) and scouting (rankings from the NHL’s own Central Scouting Service (CSS)). We focused on two cohorts of players: those drafted in the 1998 to 2002 drafts and those eligible to be taken in the 2004 to 2008 drafts. In both cohorts, we train our model on the first three draft years and test our model’s performance on the remaining (out of sample) two years. We find that in both cohorts our statistical model consistently orders players for selection in a way that is more highly correlated with how they eventually perform in the NHL. Simply stated, our statistical model is better at ordering players for the NHL draft than NHL teams using only data available when players were selected.

Laszlo Gyarmati, Senior Software Engineer, Qatar Computing Research Institute

Mohamed Hefeeda, Principal Scientist, Qatar Computing Research Institute

Abstract: It is challenging to get access to datasets related to the physical performance of soccer players. The teams consider such information highly confidential, especially if it covers in-game performance. Despite the fact that numerous teams deployed player tracking systems in their stadiums, datasets of this nature are not available for research or for public usage. Hence, most of the analysis and evaluation of the players’ performance do not contain much information on the physical aspect of the game, creating a blindspot in performance analysis. We propose a novel method to solve this issue by deriving movement characteristics of soccer players. We use event-based datasets from data provider companies covering 50+ soccer leagues allowing us to analyze the movement profiles of potentially tens of thousands of players without any major investment. Our methodology does not require an expensive, dedicated player tracking system deployed in the stadium. Instead, if the game is broadcast, our methodology can be used. We also compute the similarity of the players based on their movement characteristics and as such identify potential candidates who may be able to replace a given player. Finally, we quantify the uniqueness and consistency of players in terms of their in-game movements. Our study is the first of its kind that focuses on the movements of soccer players at scale, while it derives novel, actionable insights for the soccer industry from event-based datasets.

Jim Pagels, Economics Research Assistant, Johns Hopkins University

Abstract: In an age where live events are the only television programs that can garner collective mass audiences anymore, media rights deals in the four major sports (NBA, MLB, NHL, and NFL) continue to escalate by huge rates every time they are up for renewal. However, games frequently overlap with each other on the calendar. It is often discussed how NFL games allegedly crush ratings for the World Series or NBA playoff games devour the audience for their NHL playoff counterparts. If ratings are so critical to bottom lines and competition does in fact hurt ratings, though, why then do so many sports willingly overlap while other parts of the calendar are left empty? In an industry where teams hire armies of statisticians, coaches, trainers, and scouts to claw at every last inch of competitive edge and where leagues squeeze out every last drop of revenue, one would think someone would notice if that were the case—or does programming competition from other sports simply have little effect on ratings? This paper attempts to isolate the effects of overlap from each sport, examine how that competition affects viewership in each league, and quantify the value lost due to that overlap. We find that competition can have very damaging effects on TV viewership for every sport, most notably the NHL, and these losses can significantly diminish the value of programming rights. In most cases, this overlap is entirely avoidable with some relatively unobtrusive season calendar shifts.

Scott Brave, Policy Economist, Federal Reserve Bank of Chicago

Abstract: For a profit maximizing sports league, performance enhancing drug (PED) testing is a way in which to avoid some of the costs of player PED use while maintaining competitive balance. Profit maximizing teams have a countervailing incentive to want to employ PED-using players that arises from competition. Using a modified version of the competitive talent market model and estimates of MLB team financials from Forbes, I show that PED testing increased the competitive balance of MLB by altering the risk-return trade-off faced by teams for employing PED-using players. For a typical MLB team, I estimate that testing reduced franchise value by an average of $3 million in 2005 dollars over the last ten years. This result reflects significant impacts on teams’ non-gate revenues, player costs, and future profit growth from PED suspensions. Using variation over time in MLB’s testing policy, I also estimate the number of minor and major league suspensions per season (~20 minor and ~4 major league) characteristic of a policy which balances the costs and benefits of player PED use at the league and team levels. At times over the last ten years MLB has come close to achieving such a policy, falling short most often because of an overabundance of minor league suspensions.

Ben Singer, MBA Candidate, Yale School of Management

Abstract: The Great Recession in the United States not only incurred widespread economic turmoil, but also was one of the few times during the last decade when National Football League (NFL) attendance figures fell below almost complete attendance. However, NFL ticket prices never decreased on average during this time. This paper takes advantage of this natural experiment to analyze the macroeconomic, team performance, and other factors that influence the market for NFL tickets before, during, and after the Great Recession. Data for 30 teams from the years 2004-2014 are used in a regression model that accounts both for team-based fixed effects and autocorrelation in the errors. The results show that the unemployment rate in a team’s metropolitan area is a strong predictor of that team’s attendance; that the mean annual wage in a team’s metropolitan area and the opening of a new stadium are strongly predictive of the price of non-premium tickets; and that the relative ticket price increase associated with a new stadium is greater for non-premium tickets than for premium tickets. Additionally, there is no evidence that team performance affects either component of ticket demand, after controlling for team fixed effects. These findings lend insight into how the Great Recession impacted demand for NFL tickets and which factors can be used to predict the market for NFL tickets in the future. Furthermore, this paper lays the foundation for future work that investigates to what degree NFL teams’ ticket pricing strategies were economically rational during this time period.

Kuan-Chieh Wang, PhD Candidate in Computer Science, University of Toronto

Abstract: The amount of raw information available for basketball analytics has been given a great boost with the availability of player tracking data. This facilitates detailed analyses of player movement patterns. In this paper, we focus on the difficult problem of offensive playcall classification. While outstanding individual players are crucial for the success of a team, the strategies that a team can execute and their understanding of the opposing team’s strategies also greatly influence game outcomes. These strategies often involve complex interactions between players. We apply techniques from machine learning to directly process SportVU tracking data, specifically variants of neural networks. Given roughly double the data a human expert needs, our system can label as many sequences with the correct playcall, with high precision, in only a fraction of the time. We also show that the system can achieve good recognition rates when trained on one season and tested on the next.

Matthew van Bommel, Graduate Student, Simon Fraser University

Abstract: Box score statistics in the National Basketball Association are used to measure and evaluate player performance. Some of these statistics are subjective in nature and since box score statistics are recorded by scorekeepers hired by the home team for each game, there exists potential for inconsistency and bias. These inconsistencies can have far reaching consequences, particularly with the rise in popularity of daily fantasy sports. Using box score data, we estimate models able to quantify both the bias and the generosity of each scorekeeper for two of the most subjective statistics: assists and blocks. We also use optical player tracking data for the 2014-2015 season to improve the assist model by including other contextual spatio-temporal variables such as time of possession, player locations, and distance traveled. From this model, we present results measuring the impact of the scorekeeper and of the other contextual variables on the probability of a pass being recorded as an assist. Results for adjusting season assist totals to remove scorekeeper influence as well as a discussion of the impact on daily fantasy sports are also presented.

Description

Hisham Talukder, Data Scientist, Dow Jones

Thomas Vincent, Data Scientist Engineer, DigitalOcean

READ FULL PAPER

Abstract: Player injuries have long been a cause of concern to NBA team management and fans, as they can significantly affect the overall performance of a team. Over the past 2 years, many high-caliber players, such as Kobe Bryant, Kevin Durant and Derrick Rose, have missed significant amounts of playing time as a result of injuries. In this work we present a model that offers a quantitative and systematic approach to injury prevention by allowing teams to predict the likelihood that any given player will succumb to an injury event during the course of an upcoming game.
We apply advanced machine learning techniques to predict the probability of injury for a player. Our model is based on play-by-play game data, SportVU data, player workload and measurements, and team schedules from the last 2 years. Our results demonstrate strong accuracy in predicting whether a player will get injured in an upcoming week. By combining these results with information on team schedules and rest days, our approach enables team management and decision-makers to identify the best time for a team to rest their star players and reduce the risk of long-term injuries, while optimizing team strategies.
Finally, we show the effect of injuries on NBA teams as well as on the fans’ experience. By accounting for the amount of money invested in each player, we can rank player injuries based on the financial cost of missed games associated with these injuries. Our model can be used as a valuable asset for NBA team management.

Description

 

Dan Cervone, Moore-Sloan Data Science Fellow, New York University

READ FULL PAPER

Abstract: Continuously throughout a basketball possession, offensive and defensive players control different regions of the basketball court. An efficient offense creates opportunities for ball control in high value regions (for the appropriate players), whereas good defense suppresses such opportunities. From this competitive dynamic alone, we can infer the implicit value of positioning and spacing among NBA players and teams using hierarchical, spatially regularized paired comparison models.
This allows us to quantify the NBA court “real estate market”, enabling new insights and metrics for both offense and defense. For instance, we can measure players’ off-ball impact on offense by calculating the value of the space freed up for their teammates to control. For analyzing defense, we can quantify how effectively different teams (and different lineups within teams) contain the offense within low-value regions of the court.

Description

John Salmon, Assistant Professor, Brigham Young University

Willie Harrison, Assistant Professor, University of Colorado, Colorado Springs

READ FULL PAPER

Abstract: The earned-run average (ERA) is a common statistic used to evaluate performance on the mound. However, ERA can be decomposed further and is often neither presented nor calculated dynamically over time. The Component ERA statistic, or CERA, enhances the traditional ERA by providing a more detailed metric to analyze pitching performance. By appropriately aggregating the results of each individual batter a pitcher faces throughout the season, an “Instantaneous” CERA value can also be calculated. When these values are plotted over time into a profile, the performance trends of an individual pitcher, and the comparison between pitchers, are readily available through various data visualizations. When moving averages (MAs) are applied to the CERA profile, crossovers between MAs over different “time frames” (i.e., numbers of batters faced) can be used as trigger points for the identification of potential trends, suggesting issues such as pitcher fatigue or discomfort. In general, these crossover points from an MA approach, similar to some investing strategies, can suggest when a pitcher is trending up or down, providing additional evidence for making more informed pitching decisions. Furthermore, since pitchers have different characteristic CERA profiles and crossover points, the additional data could suggest different ways to properly manage pitchers. CERA statistic profiles could be added to the arsenal of tools available to a manager making decisions about pulling pitchers from a game or about the team’s overall pitching rotation and schedule.
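The moving-average crossover idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window sizes, and the sample values are all assumptions made for illustration, and the input is simply a list of per-batter "Instantaneous" CERA values whose calculation is left to the paper.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until `window` values exist."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossovers(values, short_window, long_window):
    """Return (index, direction) pairs where the short MA crosses the long MA.

    direction is "above" when the short MA moves from below to above the
    long MA, and "below" for the opposite move. For an ERA-like statistic,
    where lower is better, "above" hints at recent decline (an assumption
    about interpretation, not a claim from the paper).
    """
    short_ma = moving_average(values, short_window)
    long_ma = moving_average(values, long_window)
    events = []
    prev_sign = 0
    for i, (s, l) in enumerate(zip(short_ma, long_ma)):
        if s is None or l is None:
            continue  # not enough batters faced yet for both windows
        diff = s - l
        sign = (diff > 0) - (diff < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            events.append((i, "above" if sign > 0 else "below"))
        if sign != 0:
            prev_sign = sign
    return events


if __name__ == "__main__":
    # Made-up per-batter values, for illustration only.
    profile = [3, 3, 3, 3, 6, 6, 6, 6, 1, 1, 1, 1]
    print(crossovers(profile, short_window=2, long_window=4))  # [(8, 'below')]
```

Here the windows are counted in batters faced, matching the abstract's notion of "time frames"; in practice the window lengths would need to be tuned per pitcher.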

Description

Walter King, Undergraduate Student, The Ohio State University

READ FULL PAPER

Abstract: zWins is a new methodology for Wins Above Replacement (WAR) in baseball, which outperforms and overcomes weaknesses of existing WAR measures by introducing major innovations and providing a flexible system of calculations with complete disclosure of all formulas and methods. WAR is the most widely used measure of player productivity in baseball, and estimates how many additional wins each player produces relative to their presumed replacement. WAR is an attractive concept; it considers offense, defense, and pitching to give each player one number that represents their productivity measured by their direct effect on team success.
Improved measures of performance serve to better inform management decisions regarding player compensation, trades, and other aspects of player personnel. In Major League Baseball, where players regularly command economic investments on the order of nine figures, it is necessary for teams and owners to have accurate measures of player productivity to properly gauge the monetary value of players.
zWins improves upon not only WAR, but also the subcomponents that make up offensive and defensive estimates of performance. In particular, the run production model used in zWins is consistently more accurate than the production models used in various WAR measures, and provides a cohesive strategy for estimating the performance of pitchers and fielders. As a result, zWins shows superior accuracy over competing WAR measures when tested against both actual team performance and Pythagorean Expectation.
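For readers unfamiliar with Pythagorean Expectation, the benchmark mentioned above, the following Python sketch shows the classic form of the formula, which estimates winning percentage from runs scored and runs allowed. The exponent of 2 is the traditional choice and is an assumption here; the abstract does not say which exponent the author uses. The function names and the 162-game default are likewise illustrative.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^x / (RS^x + RA^x).

    exponent=2.0 is the classic value; other values are sometimes used,
    and the paper's choice is not stated in the abstract.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Scale the expected winning percentage to a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Made-up run totals, for illustration only.
    print(expected_wins(800, 700))
```

A team that scores exactly as many runs as it allows comes out at .500 under this formula, regardless of the exponent.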

Description

Peter Fadde, Professor and Director of Learning Systems Design and Technology, Southern Illinois University

Sean Müller, Senior Lecturer, Motor Learning and Control in the School of Psychology and Exercise Science, Murdoch University

READ FULL PAPER

Abstract: Pitch recognition is highly valued in modern baseball, with statistics such as “eye” (walk-to-strikeout ratio) now appearing in prospect profiles. But statistics are after-the-fact and noisy. Adding systematic testing of Pitch Recognition (PR) can therefore improve talent identification, player development, and – at the MLB level – player preparation and utilization. Using the simple and validated video-occlusion method (videos showing batters’ view of pitches are cut off at or shortly after release), we have tested the PR skills of more than 200 minor league, Cape Cod, and college batters. We score batters’ ability to identify pitch type (Fastball, Curveball, Changeup) and predict location (Ball/Strike) on a 20-80 scale. Mean scores of MiLB batters are 60 on both dimensions. Scores under 55 are in the bottom 25 percent and scores over 65 are in the top 25 percent. Cape Cod batters averaged 61 on Type and 63 on Location.
PR testing of batters on a Midwest League team revealed insights for player development. One player, drafted high out of college because of exceptional plate discipline, was hitting under .200 while coaches retooled his swing. His 64 (Type)/71 (Location) PR scores showed he still had his good eye. Imagine testing a high school or international prospect’s PR before signing him. Imagine testing an advancing hitter’s PR for the next level. Imagine testing hitters’ ability to “see” pitchers they have faced few, if any, times. A baseball team or organization that adopts PR testing and learns what PR scores mean in its system can substantially improve talent identification, player development, and opponent preparation.