How can we evaluate an NBA player’s decision making? Certainly, we can consider whether he takes high-percentage shots or turns the ball over, but how do we evaluate his drive that collapsed the defense, or his decision to shoot a contested shot instead of making the extra pass? The vast majority of traditional and advanced NBA metrics rely on only a small subset of the actions that comprise a basketball possession: shots, assists, rebounds, and turnovers. In doing so, many players’ contributions are overlooked (e.g. passes that aren’t assists, or dribble penetrations that don’t end in layups/dunks) or overvalued (sinking a contested mid-range shot when a teammate was open for a corner three).
Pointwise introduces a new foundation for quantitative decision analysis in the NBA based on “expected possession value”, or EPV. By analyzing players’ tendencies using optical tracking data, we assign a point value to every instant of every NBA possession by computing the number of points the offense is expected to score by the end of that possession. EPV acts like a stock ticker, responding continuously to players’ movement and decision-making. Using EPV, we can assess the quality of every on-ball decision a player makes by how much he increased or decreased the value of the possession for his team. EPV thus opens up new avenues of basketball analysis that focus on decision-making, opportunity creation and prevention, and optimal responses that were not possible before—in short, a new microeconomics for the NBA.
In basketball, the player with the “Hot Hand” is supposedly more likely to make his next shot if he has made several of his previous shots. Among academics, this notion of streak shooting has been disproven enough times that it is referred to not as the “Hot Hand,” but rather as the “Hot Hand Fallacy.”
However, prior research hinges on the assumption that player shot selection is random, independent of whether a player perceives himself as hot or cold. Said differently, it assumes that players will take the same types of shots, with the same level of defensive coverage, regardless of whether they have just made or missed three shots in a row. We find this assumption difficult to believe – if players have been shooting well, it seems logical that they would begin to attempt more difficult shots and opposing defenses would begin to cover them more tightly. This would potentially counteract the Hot Hand effect.
We challenge the belief that the Hot Hand is a fallacy using a dataset of over 83,000 shots from the 2012-2013 NBA season, combined with optical tracking data of both the players and the ball. We show that players who have outperformed over recent shots shoot from significantly further away, face tighter defense, and are more likely to take their team’s next shot. We then turn to the Hot Hand itself and show that players who are outperforming will continue to do so by a small but significant amount, once we control for the difficulty of the present shot.
The recent spread of tracking technology in sports is bringing about a new era in analytics where it is now possible to deconstruct events we previously understood as one item or one statistic. We consider rebounding in basketball. Until recently we would get at most one piece of information after a missed shot: the name of a player that got the rebound. In this paper, we (1) describe the full timeline of a rebound, from the moment the ball leaves the shooter’s hands, to the repositioning of players beneath the hoop, to the actual rebound opportunity, (2) develop metrics for the various dimensions of this timeline using novel techniques, and (3) apply these metrics to calculate individual player abilities in each of these dimensions. This analysis reveals different players’ particular strengths in ways that were previously not quantifiable.
Former coach Stan Van Gundy once said of his Magic team, “[The pick and roll is] what we’re going to be in when the game’s on the line. [...] I don’t care how good you are, you can’t take away everything”. A staple in the modern NBA, the pick and roll is used dozens of times every game. A deep quantitative analysis of the pick and roll could have a dramatic impact on how it is used in offenses as well as how it is defended. However, with over 50,000 pick and rolls occurring in a single season, such analysis is labor intensive. Using machine learning techniques, we leverage the massive amount of data collected by the new player tracking systems – installed across all 30 NBA arenas – to develop a system for automatically identifying when a pick and roll occurs. The system we present is both a tool that will allow us to more easily identify statistical trends with respect to the pick and roll, and a tool that can help us determine which teams run the pick and roll similarly. More generally, it is the basis of a framework for recognizing other patterns of play in the NBA.
In terms of analyzing soccer matches, two of the most important factors to consider are: 1) the formation the team played (e.g., 4-4-2, 4-2-3-1 etc.), and 2) the manner in which they executed it (e.g., conservative – sitting deep, or aggressive – pressing high). Despite the existence of ball and player tracking data, no current methods exist which can automatically detect and visualize formations. Using an entire season of Prozone data which consists of ball and player tracking information from a recent top-tier professional league, we showcase an automatic formation detection method by investigating the “home advantage”.
In a paper we published recently, using an entire season of ball tracking data we showed that home teams had significantly more possession in the forward third which correlated with more shots and goals while the shooting and passing proficiencies were the same. Using our automatic formation analysis, we extend this analysis, and show that teams tend to play the same formation at home as they do away, but the manner in which they execute it is significantly different. Specifically, we show that the formation of teams at home is significantly higher up the field compared to when they play away. This conservative approach at away games suggests that coaches aim to win their home games and draw their away games. Additionally, we also show that our method can visually summarize a game which gives an indication of dominance and tactics. While enabling new discoveries of team behavior which can enhance analysis, it is also worth mentioning that our automatic formation detection method is the first to be developed.
Do Major League Baseball umpires call balls and strikes solely in response to pitch location? No. Analyzing over one million pitches, we find that the strike zone contracts in 2-strike counts and expands in 3-ball counts, and that umpires are reluctant to call two strikes in a row. Effect sizes can be dramatic: for the average umpire, the probability of a called strike in 2-strike counts drops by as much as 19 percentage points in the corners of the strike zone; for some umpires, the chance of a called strike drops from a coin flip to almost zero. We structurally estimate each umpire’s aversions to miscalling balls and strikes in different game states. If an umpire is unbiased, he would only need to be 50% sure that a pitch is a strike in order to call a strike half the time. In fact, the average umpire needs to be 64% sure of a strike in order to call strike three half the time. Moreover, the least biased umpire still needs to be 55% sure of a strike in order to call strike three half the time. In other words, every umpire is biased. Because the biases are strongest at the top and bottom of the strike zone, pitchers should shift their pitches towards the top or bottom in 3-ball counts and towards the left or right in 2-strike counts.
Does money buy wins in baseball? Conventional wisdom says yes, but the conclusions from this paper make us question that ironclad assumption.
A cross-sports comparison finds that MLB, along with the NBA and NFL, has a very weak relationship between payroll and wins. In one striking example, baseball’s model was not able to predict with statistical certainty that the 2013 Yankees (payroll: $229 million) would win more games than the 2013 Astros (payroll: $22 million). By contrast, studies of the English Premier League have found a nearly perfect payroll-wins relationship.
Switching to a longitudinal lens, the paper examines baseball’s historical relationship between payroll and winning. Contrary to popular belief, payroll’s explanatory value on wins is currently at a near-all-time low, in spite of rising payroll inequality. How is this possible? In a word: youth. Pre free-agency-eligible players continue to outperform their elder, more expensive peers at a staggering rate. With an increasingly large percentage of the best players not eligible for purchase on the open market, the payroll-wins relationship continues to erode.
It’s impossible to know for certain whether this trend of weakening “win-buying” ability will continue, but increasingly stringent penalties for performance-enhancing drugs—widely assumed to offer greater benefits to older players—could help maintain the youth dominance effect for the foreseeable future. While today’s headlines may speak of baseball’s “haves” and “have nots” in terms of financial clout, they may need some revision for a future in which the reigning currency is not money but youth.
In this work, we show how machine learning can be applied to generate a model that could lead to better on-field decisions by predicting a pitcher’s performance in the next inning. Specifically we show how to use multi-task machine learning to build pitcher-specific predictive models that can be used to estimate whether a starting pitcher will surrender a run if allowed to start the next inning.
The results suggest that using our model would frequently lead to different decisions late in games than those made by major league managers. From the 5th inning on in close games, in those innings where a manager left in a pitcher that our model would have removed, the pitcher surrendered at least one run in that inning 60% of the time (compared to 43% overall).
We look at the predictions for the Red Sox in the 2013 postseason. There were 96 innings pitched by Red Sox starters, of which 33 were beyond the 4th inning. In 24 of those innings our model would have agreed with the manager to keep the starter in. The starter ended up giving up a run in 3 (12.5%) of those innings. There were 9 innings where the manager kept the starter in, but our model wouldn’t have, and the starter ended up giving up a run in 5 (55%) of those innings.
I filter the 25-frames-per-second STATS/SportVu optical tracking data of 233 regular and post season 2011-2012 NBA games for half-court situations that begin when the last player crosses half-court and end when possession changes, resulting in a universe of more than 30,000 basketball plays, or about 130 per game. To categorize the plays algorithmically, I describe the requirements a suitable dynamic language must have to be both more concise and more precise than standard X’s and O’s chalk diagrams. The language specifies for each player their initial starting spots, trajectories, and timing, with iteration as needed. A key component is acceleration. To determine optimal starting spots, I compute burst locations on the court where players tend to accelerate or decelerate more than usual. Cluster analysis on those burst points compared to all points reveals a difference in which areas of the court see more intense action. The primary burst clusters appear to be the paint, the top of the key, and the extended elbow and wing area. I document the most frequently accelerating players, positions, and teams, as well as the likelihoods of acceleration and co-acceleration during a set play and other components intended to collectively lead to an algorithmic taxonomy.
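The burst-location idea can be illustrated with a minimal sketch that estimates acceleration from per-frame positions by finite differences and flags frames above a cutoff. Only the 25 frames-per-second rate comes from the abstract; the function names, the sample path, the units (feet), and the threshold are hypothetical, and the abstract does not say how bursts were actually computed.

```python
import math

FRAME_RATE_HZ = 25           # tracking rate stated in the abstract
DT = 1.0 / FRAME_RATE_HZ     # seconds between consecutive frames


def acceleration_magnitudes(positions):
    """Finite-difference acceleration magnitudes for a list of (x, y) positions.

    Entry i of the result is the acceleration estimated around frame i + 1.
    """
    velocities = [
        ((x1 - x0) / DT, (y1 - y0) / DT)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    return [
        math.hypot((vx1 - vx0) / DT, (vy1 - vy0) / DT)
        for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:])
    ]


def burst_frames(positions, threshold):
    """Frame indices whose estimated acceleration magnitude exceeds `threshold`."""
    accels = acceleration_magnitudes(positions)
    return [i + 1 for i, a in enumerate(accels) if a > threshold]


# Hypothetical path in feet: steady 5 ft/s, then a jump to 6 ft/s.
path = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.64, 0.0), (0.88, 0.0)]
print(burst_frames(path, threshold=15.0))
```

Frames flagged this way could then be fed to any clustering routine to find the court regions where bursts concentrate; the cutoff for "more than usual" would have to be calibrated against the data.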
The field goal is a critical scoring play in the National Football League. Coaches and fans alike are interested in the probability that a field goal attempt will be made or missed. Traditional analyses assume that the attempt distance is the primary factor determining success; however, we believe that other environmental and situational factors cannot be ignored. We constructed a binary logistic regression model based on data from the 2000-2011 NFL seasons to identify factors that have a significant effect on the likelihood of field goal success. Distance and most environmental factors were significant. Altitude and artificial turf improved the likelihood of a make, while cold temperatures, wind, and precipitation reduced it. Contrary to popular belief, not one situational factor (regular season vs. postseason, home vs. away, whether a timeout was called before the attempt, and situational pressure) was significant. We used our comprehensive model to evaluate kicker careers, seasons, and stadiums from 2000 to 2011. This evaluation is superior to pure make percentage, which ignores the difficulty of a kicker’s field goal attempts. By more accurately predicting the outcome of field goal attempts, coaches can make better in-game decisions and fans can gain a greater understanding of kicker ability.
Immediately following a missed shot an offensive player can choose to crash the boards for an offensive rebound, get back on defense, or hold their current position. In this paper, we use optical tracking data to develop novel metrics to summarize a team’s strategy immediately following a shot. We evaluate each metric using data from the 2011-2012 NBA season. Our results confirm that getting back on defense and neutralizing threats early in the possession contribute to defensive success. However, tendencies to get back early on defense after a missed shot can reduce a team’s probability of getting an offensive rebound by more than half.
Hockey is a fast and fluid sport with players frequently coming on and off the ice without a stoppage of play. It is also a relatively low scoring sport compared to other sports such as basketball. Both of these features make evaluation of player performance difficult. Recently, there have been some attempts to get at the value of National Hockey League (NHL) players, including Macdonald, Ferrari, and Awad. Here we present a new comprehensive rating that accounts for the other players on the ice with a given player as well as the impact of where a shift starts and of every non-shooting event, such as turnovers and hits, that occurs when a player is on the ice. The impact of each play is determined by the probability that it leads to a goal for a player’s team (or their opponent) in the subsequent 20 seconds. The primary outcome of this work is a reliable methodology that can quantify the impact of players in creating and preventing goals for both forwards and defensemen. We present ratings for the top forwards and defensemen based on all events from the 2010-11 and 2011-12 NHL regular seasons.
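One rough way to picture the 20-second rule is an empirical estimate: for every event of a given type, check whether a goal for or against the acting team follows within 20 seconds, and take the difference of the two frequencies. The sketch below does only that. The data layout, the function name, and the sample events are hypothetical, and the abstract's actual rating also adjusts for teammates, opponents, and shift starts, which this sketch ignores.

```python
def event_value(events, goals, event_type, window_s=20):
    """Empirical P(goal for) minus P(goal against) within `window_s` seconds.

    events: list of (time_s, event_type, team) tuples
    goals:  list of (time_s, scoring_team) tuples
    """
    relevant = [(t, team) for t, kind, team in events if kind == event_type]
    if not relevant:
        return 0.0
    goals_for = goals_against = 0
    for t, team in relevant:
        # Teams that scored in the window that opens right after the event.
        scorers = [g_team for g_t, g_team in goals if t < g_t <= t + window_s]
        if team in scorers:
            goals_for += 1
        if any(g_team != team for g_team in scorers):
            goals_against += 1
    return (goals_for - goals_against) / len(relevant)


# Hypothetical event log (times in seconds of game clock elapsed).
events = [
    (100, "hit", "A"),
    (300, "hit", "A"),
    (500, "hit", "B"),
    (700, "hit", "A"),
    (720, "turnover", "B"),
]
goals = [(110, "A"), (515, "A"), (710, "A")]
print(event_value(events, goals, "hit"))
```

In this made-up log, two of the four hits are followed by a goal for the hitting team and one by a goal against, giving a net value of 0.25 per hit.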
An important problem facing a basketball team is determining the right proportion of 2 and 3 point shots to take. With many possessions remaining, a team should maximize points: a 3-pointer is simply worth 1.5 2-pointers. 3-point attempts have roughly double the per-shot variance of 2-point attempts, but a team should be “risk neutral.” As time remaining decreases, the trailing team should place an increasingly positive value on risk; the opposite holds for the leading team. Our game theoretic analysis yields a testable optimality condition: 3-point success rate must fall relative to 2-point success rate when a team’s preference for risk increases. Using four years of play-by-play data, we find strong evidence this condition holds for the trailing team only. As a lead decreases, the leading team should become more risk-neutral, but teams in this circumstance actually tighten up and become more risk averse, contrary to what their risk preferences ought to be to maximize the chance of winning the game. We also show that if the offense shoots more 3’s as it becomes risk-loving this implies the attack can be varied more readily than the defensive adjustment. 3-point usage does increase with the trailing team’s preference for risk, but actually falls for the leading team. Teams get it right when losing and wrong when winning. We also find a strong motivating effect of losing: the trailing team displays an overall boost in efficiency for both shot types.
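The claim about per-shot variance can be checked with a little arithmetic. The sketch below treats each shot as an independent make-or-miss trial; the success rates (one in two on 2-pointers, one in three on 3-pointers) are illustrative assumptions chosen so that both shots have the same expected value, not figures from the paper.

```python
from fractions import Fraction


def shot_mean_and_variance(points, p_make):
    """Mean and variance of the points from one shot worth `points`
    that goes in with probability `p_make`."""
    mean = points * p_make
    variance = points ** 2 * p_make * (1 - p_make)
    return mean, variance


# Assumed, illustrative success rates (not taken from the paper).
mean2, var2 = shot_mean_and_variance(2, Fraction(1, 2))
mean3, var3 = shot_mean_and_variance(3, Fraction(1, 3))

print(mean2, mean3)   # expected points per shot for each shot type
print(var3 / var2)    # ratio of per-shot variances
```

Under these assumed rates both shots are worth one point on average, while the 3-pointer carries twice the variance. A risk-neutral team is indifferent between them, and only a team that places positive value on risk, such as one trailing late in a game, has a reason to prefer the 3-pointer.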
Drawing inspiration from the theory of production flexibility in manufacturing networks, we provide the first optimization-based analysis of the value of positional flexibility (the ability of a player to play multiple positions) for a major league baseball team in the presence of injury risk. First, we develop novel statistical models to estimate (1) the likelihood and duration of player injuries during the regular season, and (2) fielding abilities at secondary fielding positions. Next, we develop a robust optimization model to calculate the degradation in team performance due to injuries. Finally, we apply this model to measure the difference in performance between a team with players who have positional flexibility and a team that does not. We find that using 2012 rosters, flexibility was expected to create from 3% (White Sox) to 15% (Cubs) in value for each team, measured in runs above replacement. In analyzing the results, we find that platoon advantages (e.g., having left-handed batters face right-handed pitchers) form an important component of flexibility. As a secondary finding, based on our statistical analysis of injuries, we find that the likelihood of injury increases with age, but the duration of injury does not.
Professional team sports are extremely information rich, dynamic and complex, which may give players with fast and accurate field vision a decisive competitive advantage. What then are the underlying processes that make some professional players appear to have better field vision than others? The purpose of this study was to learn more about the ways that some of the best professional soccer players in the world use visual exploratory behaviors (body and head movements initiated to better see their surroundings) in real-world games and to test the relationships between these behaviors and performance. Close-up video images of individual players were obtained from Sky Sports’ split screen PlayerCam broadcasts of 1279 game situations with 118 players (midfielders and forwards) in English Premier League (EPL) soccer games. The results show a clear positive relationship between visual exploratory behaviors that are initiated before receiving the ball and performance with the ball. The best players explore more frequently than others and there is a positive relationship between exploratory behavior frequency and pass completion. The impact of exploratory behaviors is the largest for midfielders performing forward passes. These behaviors may have been off the radar for coaches, scouts and fans, and practical implications are offered.
Basketball is a dualistic sport: all players compete on both offense and defense, and the core strategies of basketball revolve around scoring points on offense and preventing points on defense. However, conventional basketball statistics emphasize offensive performance much more than defensive performance. In the basketball analytics community, we do not have enough metrics and analytical frameworks to effectively characterize defensive play. Although measuring defense has traditionally been difficult, new player tracking data are presenting new opportunities to understand defensive basketball. This paper introduces new spatial and visual analytics capable of assessing and characterizing the nature of interior defense in the NBA. We present two case studies that each focus on a different component of defensive play. Our results suggest that the integration of spatial approaches and player tracking data not only promises to improve the status quo of defensive analytics, but also reveals some important challenges associated with evaluating defense.
In this paper we use state-of-the-art multimodal neuroimaging to tease apart the spatio-temporal sequence of neural activity that “goes through a hitter’s mind” when they recognize a baseball pitch. Specifically we utilize electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) to investigate the neural networks activated for correct and incorrect pitch classifications. Our previous analysis has shown where in the trajectory of a pitch the hitter’s neural activity correctly discriminates a pitch type (e.g. fastball, curveball or slider). Here, we show that correct classifications correlate with a neural network including both visual and sub-cortical motor areas, likely demonstrating a link between visual identification and the required rapid motor response. Conversely, we find that not only is this activity lacking in incorrect classifications, but that it is instead replaced by prefrontal cortex activity, which has been shown to be responsible for more deliberative conflict resolution. Synthesizing these and other results, we hypothesize the potential uses of this technology in the form of a brain computer interface (BCI) to measure and enhance baseball player performance.
This paper introduces the “coattail effect” in the Major League Baseball amateur draft, in which top college baseball prospects draw substantial attention from professional scouts, who become more likely to see the star player’s teammates and more likely to recommend selecting them with later draft picks. Using a sample of 11,540 college players drafted in 1984-2003, strong evidence is found that more players are drafted from a given college team when there is a star teammate present. Estimated effects of the coattail effect on the value players provide to major-league teams are mixed and not very conclusive.
In this paper, we use ball and player tracking data from “Hawk-Eye” to discover unique player styles and predict within-point events. We move beyond current analysis that only incorporates coarse match statistics (i.e. serves, winners, number of shots, volleys) and use spatial and temporal information which better characterizes the tactics and tendencies of each player. Using a probabilistic graphical model, we are able to model player behaviors which enables us to: 1) find the factors such as location and speed of the incoming shot which are most conducive to a player hitting a winner (i.e. “sweet-spot”) or cause an error, and 2) do “live in-point” prediction – based on the shots being played during a rally we estimate the probability of the outcome of the next shot (e.g. winner, continuation or error). As player behavior depends on the opponent, we use model adaptation to enhance our prediction. We show the utility of our approach by analyzing the play of Djokovic, Nadal and Federer at the 2012 Australian Tennis Open.
Separating a hockey player’s offensive and defensive contributions is quite difficult. Offensive skill can lead to increased puck possession and therefore improve statistics aimed at measuring defensive performance such as goals or shots allowed. This challenge can be overcome by measuring goals or shots per possession rather than per game, provided a reasonable estimate of possessions is available. Recording when the puck is brought across the blue line makes this transformation possible, enabling a true assessment of performance in the offensive or defensive zone. Surprisingly, a season of data shows no clear separation between players in shot production or suppression; if offensive stars generate more shots per offensive zone possession than fourth line grinders, the difference is small enough to not show up in a single season’s data. Instead, the team’s shot differential – which has been shown to be a strong predictor of wins – is determined almost entirely in the much less-heralded neutral zone. Neutral zone success involves more than getting extra zone entries; since carrying the puck across the blue line generates more than twice as many shots, scoring chances, and goals as dumping the puck in, gaining the zone with possession is a major driver of success.
When commenting on the ability of NBA teams it is commonplace to cite a young team’s inexperience as a negative and the experience of a veteran-laden team as a positive. However, there is a lack of empirical investigation into the effects of player or coach experience on team performance. In this paper I analyze the effects of player, coach, and team experience levels on franchise postseason wins. This study uses hand gathered panel data detailing the 804 NBA seasons played by 30 NBA franchises between 1979 and 2008. I find that increased postseason player experience increases a team’s ability to make the playoffs while not increasing their ability to win in the playoffs. A coach’s postseason experience does contribute to a team’s ability to win in the playoffs. I also find that teammate experience, a proxy variable for team chemistry, significantly increases a team’s postseason success. I also offer plausible explanations for these effects. These results should be of interest to team executives, league analysts, and NBA commentators as they provide quantitative insight into an issue that has previously been based almost entirely on conjecture.
We introduce a novel Skills Plus Minus (“SPM”) framework to measure on-court chemistry in basketball. First, we evaluate each player’s offense and defense in the SPM framework based on three basic categories of skills: scoring, rebounding, and ball-handling. We then simulate games using the skill ratings of the ten players on the court. The results of the simulations measure the effectiveness of individual players as well as the 5-player lineup, so we can then calculate the synergies of each NBA team by comparing their 5-player lineup’s effectiveness to the “sum-of-the-parts.” We find that these synergies can be large and meaningful. Because skills have different synergies with other skills, our framework predicts that a player’s value is dependent on the other nine players on the court. Therefore, the desirability of a free agent depends on the players currently on the roster. Indeed, our framework is able to generate mutually beneficial trades between teams. Other ratings systems cannot generate ex-ante mutually beneficial trades since one player is always rated above another. We find more than two hundred mutually beneficial trades between NBA teams, situations where the skills of the traded players fit better on their trading partner’s team.
Traditional ranking methods such as Average Scoring Margin (ASM) or the Ratings Percentage Index (RPI) are limited in their accuracy and usefulness because they focus on just the final score of the game, not how the game arrived at the final score. In this paper I propose a new method that looks at cumulative win probabilities over the duration of a game to measure both team and individual player performance. Using five years of game-play data to generate a Win Probability Index for NCAA basketball, I am able to create an open system that allows anyone to measure the impact, in terms of win probability added, of each play. This method is more accurate than either the ASM or RPI while also providing a more detailed level of player and play specific detail. My initial design includes input adjustments for conference play, home/away/neutral site games, and a team’s strength of schedule. Outputs of this study include player rankings, team rankings, and strength of schedule rankings. Detailed explanations of my methodology and the simulations used to build the model, a comparison to existing methods, and an exploration of future uses of this data are included.
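To make "win probability added" concrete, the sketch below credits each play with the change in the team's win probability across that play and sums the changes per player. The record layout, the player labels, the probabilities, and the rule of crediting one player per play are all assumptions for illustration; the abstract does not specify how plays are attributed or how the underlying win probabilities are modeled.

```python
from collections import defaultdict


def win_probability_added(plays):
    """Sum the change in win probability per player.

    plays: list of (player, wp_before, wp_after) tuples, where the
    probabilities are the team's chance of winning before and after the play.
    """
    totals = defaultdict(float)
    for player, wp_before, wp_after in plays:
        totals[player] += wp_after - wp_before
    return dict(totals)


# Hypothetical sequence of plays for one team.
plays = [
    ("guard", 0.50, 0.56),   # made shot
    ("center", 0.56, 0.53),  # turnover
    ("guard", 0.53, 0.60),   # steal and layup
]
totals = win_probability_added(plays)
for player, wpa in sorted(totals.items()):
    print(player, round(wpa, 2))
```

Because each play's value depends on the game state in which it happens, the same basket adds more win probability in a close game than in a blowout, which is information a final score alone cannot carry.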
This paper investigates spatial and visual analytics as means to enhance basketball expertise. We introduce CourtVision, a new ensemble of analytical techniques designed to quantify, visualize, and communicate spatial aspects of NBA performance with unprecedented precision and clarity. We propose a new way to quantify the shooting range of NBA players and present original methods that measure, chart, and reveal differences in NBA players’ shooting abilities. We conduct a case study, which applies these methods to 1) inspect spatially aware shot site performances for every player in the NBA, and 2) to determine which players exhibit the most potent spatial shooting behaviors. We present evidence that Steve Nash and Ray Allen have the best shooting range in the NBA. We conclude by suggesting that visual and spatial analysis represent vital new methodologies for NBA analysts.
One difficulty with analyzing performance in hockey is the relatively low scoring rates compared to sports like basketball. Fenwick rating (shots plus missed shots) and Corsi rating (shots, missed shots, blocked shots) have been used to analyze players and teams because they have been shown to be better than goals as a predictor of future goals. In this paper, we use variables like faceoffs, hits, and other statistics as predictor variables in addition to goals, shots, missed shots, and blocked shots, to predict goals. Our models outperform previous models with regard to mean squared error of actual goals and predicted goals. The results can be interpreted as expected goals and can be used in adjusted plus-minus models instead of goals. We use ridge regression to estimate a player’s contribution to his team’s expected goals per 60 minutes, independent of his teammates, opponents, and the zone in which his shifts begin. We also give adjusted plus-minus estimates based on goals, shots, Fenwick rating, and Corsi rating and use these results alongside the results for expected goals to provide an additional means by which NHL analysts, decision-makers, and fans can measure how valuable a player is to his team.
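The two shot-based ratings follow directly from the parenthetical definitions above, and a per-60-minutes rate is a simple rescaling. The sketch below shows only these counts; the function names and the sample totals are hypothetical, and the ridge regression and expected-goals models are not reproduced here.

```python
def fenwick(shots, missed_shots):
    """Fenwick count as defined above: shots plus missed shots."""
    return shots + missed_shots


def corsi(shots, missed_shots, blocked_shots):
    """Corsi count as defined above: shots, missed shots, and blocked shots."""
    return shots + missed_shots + blocked_shots


def per_60(count, minutes_played):
    """Scale a raw count to a per-60-minutes rate."""
    return count * 60.0 / minutes_played


# Hypothetical totals for one player over 120 minutes of ice time.
shots, missed, blocked, minutes = 30, 12, 8, 120
print(fenwick(shots, missed))                          # 42
print(corsi(shots, missed, blocked))                   # 50
print(per_60(corsi(shots, missed, blocked), minutes))  # 25.0
```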
If a batter can correctly anticipate the next pitch type, he is in a better position to attack it. That is why batteries worry about having signs stolen or becoming too predictable in their pitch selection. In this paper, we present a machine-learning based predictor of the next pitch type. This predictor incorporates information that is available to a batter such as the count, the current game state, the pitcher’s tendency to throw a particular type of pitch, etc. We use a linear support vector machine with soft-margin to build a separate predictor for each pitcher, and use the weights of the linear classifier to interpret the importance of each feature. We evaluated our method using the STATS Inc. pitch dataset, which contains a record of each pitch thrown in both the regular and post seasons. Our classifiers predict the next pitch more accurately than a naïve classifier that always predicts the pitch most commonly thrown by that pitcher. When our classifiers were trained on data from 2008 and tested on data from 2009, they provided a mean improvement on predicting fastballs of 12.5% and a maximum improvement of 50%. The most useful features in predicting the next pitch were Pitcher/Batter prior, Pitcher/Count prior, the previous pitch, and the score of the game.
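The naïve classifier used as the comparison point is easy to sketch: for each pitcher, always predict the pitch type he threw most often in the training data. The code below implements only that baseline, not the support vector machine; the pitcher labels and pitch sequences are made up for illustration and do not come from the STATS Inc. dataset.

```python
from collections import Counter


def fit_naive_baseline(training_pitches):
    """Map each pitcher to the pitch type he threw most often in training.

    training_pitches: dict of pitcher -> list of pitch-type strings
    """
    return {
        pitcher: Counter(pitches).most_common(1)[0][0]
        for pitcher, pitches in training_pitches.items()
        if pitches
    }


def baseline_accuracy(model, pitcher, test_pitches):
    """Fraction of test pitches matching the pitcher's most common type."""
    prediction = model[pitcher]
    hits = sum(1 for pitch in test_pitches if pitch == prediction)
    return hits / len(test_pitches)


# Hypothetical training and test data (one season each).
train = {
    "pitcher_a": ["fastball", "fastball", "slider", "curveball"],
    "pitcher_b": ["slider", "slider", "fastball"],
}
test_a = ["fastball", "slider", "fastball", "fastball"]

model = fit_naive_baseline(train)
print(model["pitcher_a"], model["pitcher_b"])
print(baseline_accuracy(model, "pitcher_a", test_a))
```

Any per-pitcher classifier has to beat this accuracy to be useful, which is why the abstract reports its results as an improvement over it.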
One of the most important aspects of team construction is identifying and acquiring the most talented and productive players on your team, the players on whom a team’s fortunes most rely. Teams must decide which player-types, when combined, yield the best fit. As an example, suppose there is a team, whose current best player is a scoring, shoot-first point guard. Suppose this team is looking to bring in a top-flight free agent. What type of player should this team target? Should they bring in a defense-oriented big man? Should they acquire a multi-faceted, jack-of-all-trades wing? This paper aims to answer these questions. Analyzing player data and team season data from 1977, this paper first uses clustering techniques to group players into appropriate groups, then regression to determine the degree to which the composition of a team’s top 2 and top 3 players affect that team’s win total, while accounting for team quality and coaching ability. This paper shows that the composition of a team’s top 2 and top 3 players is a strongly statistically significant factor in the success of a team, and shows which combinations yield over-performance, and which combinations yield underperformance, relative to the team’s talent and coaching quality.
We examine the role of official NFL sponsorships in five primary categories to determine the relative effects of an official sponsorship on each element of the BAV brand equity model (Differentiation, Relevance, Esteem, and Knowledge) among NFL fans versus non-fans over the 2008-2010 time period. Importantly, we compare the effects against primary competitors within each category targeting the same NFL fan audience. Results show the benefits of sponsorships over and above the brands’ national campaigns. Category-specific results show the importance of longitudinal participation and analysis including competitors and fans of the property versus non-fans otherwise exposed to the brand’s marketing strategy.
This paper leverages STATS’ SportVU Optical Tracking data to deconstruct several previously hidden aspects of rebounding. We are able to move beyond the outcome of who got the rebound to discover the non-linear relationship between shot location and its impact on offensive rebound rates, implications of the height of where rebounds are obtained, and estimates of where players should move in order to improve rebounding rates. We also leverage machine-learning methods to estimate the predictability of rebounding.
Abstract: In sports literature and reporting the ‘contract-year effect’ has been treated as anything ranging from an old wives’ tale to a well-established fact. We note many instances of players’ play falling off after landing a huge contract. In this paper we analyze the effect of being in the final year of the contract on player performance (as measured by the NBA efficiency index, with the PER rating used as a robustness check). Using the set of all NBA players from 1999 onwards, we use a fixed effects regression model to determine that being in the last year of a contract causes a player to perform significantly better than in the year prior and that this effect is non-linear over the duration of the contract. We find that this effect is reduced for more experienced players: as players get further into their careers, the change in performance level tends to flatten. We postulate a simple game-theoretic model that forms the basis for and is consistent with the empirical results. This paper makes a contribution to the economics literature on career concerns and long-term contracts and should be of interest to sports agents, teams and athletes.
Abstract: The “Sabermetric Revolution” has brought rigorous analytics to the fore in operating and managing baseball teams. Among sports, baseball is unique in the data that are available: the isolated pitcher-versus-hitter moments allow the game to be effectively analyzed through the statistics of individuals. Over time, front offices have increasingly relied on developing these statistics and sought to maximize expected performance per dollar spent. However, the over-abundance of data and well-established metrics has prevented organizations from most effectively utilizing analytics to manage their primary workforce: the players. This paper will provide a framework for the application of statistical tools to roster management and development, allowing a team to leverage the power of analytics and effectively incorporate Sabermetric tools and techniques to improve on-field performance. We will first analyze performance through the lens of Workforce Analytics, a statistical modeling framework which can be applied to any industry. Next we will define an appropriate variable to incorporate future value and cost: Expected Performance Efficiency. Lastly, we will provide a case study of Workforce Analytics as applied to the World Champion San Francisco Giants entering the 2011 season.
Abstract: Optimizing an NBA team’s approach to signing free agents can be viewed as a knapsack problem for which the available free agents comprise the items that can be placed into the knapsack, and each team’s cap space corresponds to the size of the knapsack. The salary cap presents teams with the problem of how to maximize the value of a fixed expenditure in a dynamic market. As free agents are signed, the talent pool available to fill a team’s needs decreases. However, as the amount of available money in the market decreases, the amount required to sign a given player also decreases. Thus, teams are faced with the dilemma of when to offer a contract and how much money to offer. The optimal strategy for solving this problem must consider the trade-off between losing a player to a competitor and acquiring the player at a price that preserves as much cap space as possible for additional players. This trade-off can be solved by a dynamic program based on the multiple-choice knapsack problem, which optimizes the benefit of signing a player at any particular point in the free agency period in comparison to the opportunity costs of the cap space required to do so.
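The knapsack framing can be illustrated with a minimal sketch of the static multiple-choice knapsack problem. It assumes each roster need is a class of candidate signings, each with an integer salary and an estimated value, and that at most one candidate is signed per class. The timing and market dynamics described in the abstract are left out, and the function name, classes, and numbers are hypothetical.

```python
def multiple_choice_knapsack(classes, cap):
    """Pick at most one (cost, value) option per class to maximize total
    value with total cost <= cap. Costs must be non-negative integers."""
    # best[c] = best total value achievable with total cost at most c
    best = [0.0] * (cap + 1)
    # choice[k][c] = option index taken from class k at budget c, or None
    choice = [[None] * (cap + 1) for _ in classes]
    for k, options in enumerate(classes):
        new_best = best[:]  # skipping this class is always allowed
        for c in range(cap + 1):
            for idx, (cost, value) in enumerate(options):
                if cost <= c and best[c - cost] + value > new_best[c]:
                    new_best[c] = best[c - cost] + value
                    choice[k][c] = idx
        best = new_best
    # Walk back through the classes to recover which options were taken.
    picks, c = [], cap
    for k in range(len(classes) - 1, -1, -1):
        idx = choice[k][c]
        if idx is not None:
            picks.append((k, idx))
            c -= classes[k][idx][0]
    return best[cap], picks[::-1]


# Hypothetical market: (salary in $M, estimated value) for each candidate.
guards = [(4, 5.0), (6, 8.0)]
wings = [(3, 4.0), (5, 6.5)]
bigs = [(2, 2.0), (7, 9.0)]
total_value, picks = multiple_choice_knapsack([guards, wings, bigs], cap=10)
print(total_value, picks)
```

With these made-up numbers the best use of 10 units of cap space is the cheaper wing plus the more expensive big man, leaving the guard slot unfilled.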
Abstract: I present a combination of player evaluation techniques that significantly outperforms regularized adjusted +/- in prediction accuracy. This approach models a basketball game as a finite state machine, which is a behavior model composed of a finite number of states, transitions between those states, and actions. This state machine consists of four states: a ‘normal’ state and an ‘extra possession’ state for each of the two teams. The ‘extra possession’ state can be reached by forcing opponent turnovers or by offensive rebounds. Players are assigned positive and negative credit for executing certain actions or failing to do so. Variables that determine how credit is split between players are optimized by minimizing the sum of squared residuals in the training data. NBA games from October through February were used to train each model. All models were evaluated in terms of their ability to predict the outcome of the remaining regular season games, which are not included in each model’s training set. Squared error of predicted vs. actual margin of victory was used as a measure of accuracy.
Abstract: Over the past two decades, NBA executives have faced a challenge evaluating talent as the number of international players entering the NBA has skyrocketed. Notable early draft selections, such as the Pistons’ pick of Darko Milicic, indicate that the league may still have much to learn about what makes an international player compatible with the NBA and successful in the league. This paper examines this problem by analyzing which skills translate to the NBA game, dissecting what NBA executives value when selecting international players, and analyzing which international statistics and characteristics are significant determinants of success in the NBA. This paper also makes it possible to predict the success of upcoming international prospects. The results of this study indicate that there are differences between how to identify superstars and role players; however, NBA teams are evaluating talent almost to their maximum ability with the available information.
Abstract: The plus-minus statistic for NHL players is meant to be a measure of a player’s offensive and defensive abilities. However, a player’s plus-minus is highly dependent on the team he plays for, the opponents he faces, and other variables out of his control, so it’s not always a good measure of that player’s individual contribution to his team. In this paper we develop an adjusted plus-minus statistic that attempts to isolate a player’s individual contribution. Using data from the detailed shift reports on NHL.com, we develop two weighted least squares regression models to estimate an NHL player’s effect on his team’s success in scoring and preventing goals, independent of that player’s teammates and opponents. Our initial work focused on even strength situations, excluding situations in which one team had pulled their goalie. In our current work, we have modeled power play and shorthanded situations, and we are able to estimate a player’s offensive and defensive contributions during those situations. Also, for those shifts that begin with a faceoff, we have accounted for the zone on the ice in which a shift begins.
Abstract: We interpret the Adjusted Plus/Minus (APM) model as a special case of a general penalized regression problem indexed by a parameter r. We provide a fast technique for solving this problem for general values of r. We then use cross-validation to select the parameter r and demonstrate that this choice yields substantially better prediction performance than APM.
Optimizing the performance of a basketball offense can be viewed as a network problem, wherein each play represents a “pathway” through which the ball and players move from origin (the in-bounds pass) to goal (the basket). Inspired by recent discussions of the “price of anarchy”, this talk makes a formal analogy between a basketball offense and a traffic network. The analysis suggests a significant difference between taking the highest-percentage shot each possession and playing the most efficient possible game. There may also be an analogue of Braess’s Paradox in basketball, wherein a team’s offense improves after losing a key player.
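For readers unfamiliar with Braess’s Paradox, the sketch below works through the textbook traffic version, not the basketball model from the talk. It assumes 4000 drivers, two routes that each combine one congestible road (travel time n/100 minutes for n drivers) with one fixed 45-minute road, and a free shortcut joining the two congestible roads. All names and numbers are illustrative assumptions.

```python
DRIVERS = 4000
FIXED = 45.0  # minutes on a road whose travel time ignores traffic


def congested(n):
    """Minutes on a congestible road carrying n drivers."""
    return n / 100.0


def times_without_shortcut(n_top):
    """Travel times of the two routes when n_top drivers take the top one."""
    n_bottom = DRIVERS - n_top
    return congested(n_top) + FIXED, FIXED + congested(n_bottom)


# Equilibrium without the shortcut: no driver gains by switching routes,
# so the split is the one that equalizes the two travel times.
split = min(
    range(DRIVERS + 1),
    key=lambda n: abs(times_without_shortcut(n)[0] - times_without_shortcut(n)[1]),
)
before = times_without_shortcut(split)[0]

# With the free shortcut, a congestible road never takes more than
# 4000/100 = 40 minutes, which beats the 45-minute fixed road, so every
# driver uses both congestible roads.
after = congested(DRIVERS) + congested(DRIVERS)

print(split, before, after)
```

Every driver ends up with an 80-minute trip instead of a 65-minute one, so adding the shortcut makes everyone worse off. In the analogy suggested by the talk, removing an option that every possession gravitates toward could play the role of closing the shortcut.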
In head-to-head fantasy sports leagues, it is a common belief that managers should try to do their best in all statistics categories. In this paper, we work to turn this notion on its head and investigate the strategy of effectively forfeiting certain categories while focusing on a subset of the categories. Through millions of draft and match-up simulations based on 2008-2009 NBA statistics, we found that approximately one-quarter of all possible subsets yielded strategies that defeated the “all statistics” strategy in a head-to-head match-up, and that the “all statistics” strategy is not the overall best one.
Adjusted plus/minus has grown in popularity as an NBA player evaluation technique but remains controversial and can yield results which many basketball experts find counterintuitive. We present a framework for evaluating adjusted plus/minus and an enhancement to the technique which nearly doubles its accuracy. Conventional adjusted plus/minus is shown to do a poor job of predicting the outcome of future games, particularly when fit on less than one season of data. Adding regularization greatly improves accuracy, and some player ratings change dramatically. Broader lessons for the sports analytics community regarding model evaluation and the use of Bayesian techniques are discussed.
Although the Pythagorean expectation formula does well at predicting win percentage, the shape of the run distribution can also be a factor. Given two baseball teams with the same average runs per game, the team with the narrower run distribution tends to win more games. Modified formulas that take into account both the runs per game and the shape of the run distributions are presented. Also, slugging percentage has an inverse correlation with the width of the run distribution. A team slugging percentage .080 above average is worth about one extra win compared to the simple prediction using only runs scored and runs allowed.
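As a reference point, the baseline Pythagorean expectation can be written in a few lines. This sketch uses the classic exponent of 2 and does not attempt the modified formulas from the paper, whose exact form is not given in the abstract. The function name and the sample season totals are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected win fraction from runs scored and allowed (classic form)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 allowed over a 162-game season.
expected_pct = pythagorean_win_pct(800, 700)  # about 0.566
expected_wins = 162 * expected_pct
print(round(expected_pct, 3), round(expected_wins, 1))
```

Two teams with identical run totals get identical predictions from this formula, which is exactly the limitation the abstract addresses by also considering the shape of the run distribution.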
Abstract: Soccer teams regularly compete at altitudes above 2,000 meters (6,562 feet) with World Cup qualification or other honors on the line. Media, fans, and players often question the fairness of playing at high altitudes, and FIFA temporarily banned international matches above 2,500 meters (8,200 feet) in 2007. Researchers agree that traveling to higher or lower altitude can harm athletic performance, but the effects on professional athletes may be too small to influence match outcomes. Additionally, many teams try to limit altitude effects by allowing players extra time to acclimatize before a match. To identify the causal impact of altitude change, I compare South American international match outcomes between the same teams but played at different altitudes within the same country. This approach controls for influences such as differences in travel distance for high and low altitude countries. I find that traveling to lower altitude does not affect performance but traveling to higher altitude has negative effects. In particular, away teams perform poorly in Quito, Ecuador (2,800 meters), and La Paz, Bolivia (3,600 meters). However, away teams do relatively well in Bogotá, Colombia (2,550 meters). I conclude that stadium altitudes should not be restricted under 3,000 meters without further justification.
Abstract: The major difficulty in evaluating individual player performance in basketball is adjusting for interaction effects by teammates. With the advent of play-by-play data, the plus-minus statistic was created to address this issue. While variations on this statistic (e.g., adjusted plus-minus) do correct for some existing confounders, they struggle to gauge two aspects: the importance of a player’s contribution to his units or squads, and whether that contribution came as unexpected (i.e. over- or under-performed) as determined by a statistical model. We quantify both in this paper by adapting a network-based algorithm to estimate centrality scores and their corresponding statistical significances. Using four seasons of data, we construct a single network where the nodes are players and an edge exists between two players if they played in the same five-man unit. These edges are assigned weights that correspond to an aggregate sum of the two players’ performance during the time they played together. We determine the statistical contribution of a player in this network by the frequency with which that player is visited in a random walk on the network, and we implement bootstrap techniques on these original weights to produce reference distributions for testing significance.
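The random-walk step can be sketched as a power iteration over a small weighted network. This is an illustration under stated assumptions, not the authors’ algorithm: edge weights are assumed positive, the walk is made “lazy” (it stays put half the time) purely to guarantee convergence, and the bootstrap significance test is omitted. Player labels and weights are made up.

```python
def random_walk_centrality(weights, n_iter=1000, tol=1e-12):
    """Long-run visit frequency of each node in a weighted random walk.

    weights: dict mapping (player_a, player_b) -> positive edge weight.
    """
    # Build a symmetric adjacency map from the undirected edge list.
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    players = sorted(adj)
    visit = {p: 1.0 / len(players) for p in players}
    for _ in range(n_iter):
        nxt = {p: 0.0 for p in players}
        for p in players:
            total = sum(adj[p].values())
            nxt[p] += 0.5 * visit[p]  # lazy step: stay in place
            for q, w in adj[p].items():
                # Otherwise move to a neighbor in proportion to edge weight.
                nxt[q] += 0.5 * visit[p] * w / total
        delta = max(abs(nxt[p] - visit[p]) for p in players)
        visit = nxt
        if delta < tol:
            break
    return visit


# Hypothetical pairwise weights for four players.
edges = {("A", "B"): 3.0, ("A", "C"): 1.0, ("B", "C"): 2.0, ("C", "D"): 2.0}
scores = random_walk_centrality(edges)
print({p: round(s, 4) for p, s in scores.items()})
```

On an undirected network like this one, the visit frequencies converge to each player’s share of the total edge weight, so a more elaborate walk (directed edges, restarts) would be needed to get rankings that differ from weighted degree.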
Abstract: In January 2010, grooves on the heads of golf clubs were mandated to have less volume and rounder edges. The intention of the controversial new grooves design was to make hitting from the rough harder, thereby making driving accuracy more important. We analyze data from 2009 and 2010 to determine the impact of the new rule on golfers on the PGA TOUR. In the 1980s, those golfers who were ranked most accurate in their driving were also ranked highest on the money list. However, this correlation has steadily decreased, to the point where it is now nearly zero. We find that for 2010, the correlation between these two variables is higher, but not statistically significantly so. We then examine whether it was harder in 2010 to hit from the rough, both visually and statistically. Both approaches show that it was no more difficult to hit from the rough in 2010 than in 2009, and perhaps even easier. Lastly, we look into players’ strategies to determine whether or not they are playing differently in 2010 to adjust for the new rule. We find no evidence – either visual or statistical – to suggest that players have significantly changed their styles in 2010.
Abstract: For every successive time a pitcher faces a batter in a game, that pitcher is more likely to allow runs to that batter. An analysis of covariance confirms this observation and suggests that by limiting the number of times a pitcher faces a batter in a game, the pitching team will prevent more runs from being scored. In order to prevent multiple plate appearances against a pitcher, a strategic shift in pitching staff construction must be made. This paper proposes the Paired Pitching system, in which four pairs of average pitchers are responsible for innings one through eight, with each member of a pair taking exactly four innings of work. Four bullpen pitchers would be responsible for all other innings. Through a careful analysis of the data, this paper shows that the Paired Pitching system would significantly increase wins. Furthermore, MLB teams spend a majority of their player payroll on a five-man rotation. This analysis shows that the Paired Pitching system would significantly decrease the cost of achieving the desired pitching production. Lastly, this paper quantifies additional benefits from the Paired Pitching system, suggests additional research topics, and provides a suggested implementation technique for this new pitching model.
Abstract: We propose a quantitative metric to evaluate an MLB player’s offensive ability against a given pitch. Our swing quality metric should be used in concert with scouting reports and existing metrics to provide player evaluators with a more complete view of a hitter’s ability. Existing advanced baseball metrics do not combine in one statistic (1) the type of pitch, (2) the velocity of the pitch, and (3) the location of the pitch inside or outside of the strike zone. Our metric addresses this shortcoming. An explanation of our methodology is provided and an evaluation against existing advanced metrics is performed.
Abstract: Much has been made through the years in the media, literature and academia of Major League Baseball’s infamous antitrust exemption, mostly through the prism of free agency, franchise relocation and television rights. But perhaps the most lasting and damning impact of the exemption resides in the annual Rule 4 Draft, in which MLB’s 30 franchises alternate selections for the exclusive rights to the top amateur players from Canada and the United States during the first week in June. Yet as calls for reform of the draft have grown in recent years, whether to curb the growing bonuses spurred by two decades of savvy player agents or to include the league’s growing talent sources in the Caribbean, the most pertinent questions typically are left unaddressed. First, are drafted players overpaid? (As a general rule, no.) Second, does the draft distribute talent more evenly than the amateur free agent market? (Probably.) Third, how do we fix it to reward effective management, rather than lucking into talent via consistent ineptitude or simply having the biggest purse strings? (Shorten the draft, force teams to carry high-bonus players and drafted players on their expanded rosters, and limit the duration of teams’ control of minor leaguers.)
Abstract: Though sports teams have a general intuition about the factors that influence ticket sales, little is understood about the decision-making process that underlies consumers’ consideration / purchase activities. We developed a two-stage model to better understand this process. In the first stage, consumers are faced with a “universal” set of options (i.e., all game / ticket-tier combinations) from which they construct a smaller consideration set. In the second stage, consumers choose one option from this reduced set. We consider a variety of factors in each stage of the model, such as: game attractiveness (which is allowed to vary over time as the strength of each opponent changes throughout the season), seating tier, days until game, and ticket prices. We estimate our model for a U.S. professional sports franchise, and our empirical results can allow teams to run plausible scenarios about the impact that price changes (for current and future games) will have on ticket sales.
Abstract: When facing a heavily-favored opponent, an underdog must be willing to assume greater-than-average risk. In statistical language, one would say that an underdog must be willing to adopt a strategy whose outcome has a larger-than-average variance. The difficult question is how much risk a team should be willing to accept. This is equivalent to asking how much the team should be willing to sacrifice from its mean score in order to increase the score’s variance. In this paper a general analytical method is developed for addressing this question quantitatively. Under the assumption that every play in a game is statistically independent, both the mean and the variance of a team’s offensive output can be described using the binomial distribution. This allows for direct calculations of the winning probability when a particular strategy is employed, and therefore allows one to calculate optimal offensive strategies. This paper develops this method for calculating optimal strategies exactly and then presents a simple heuristic for determining whether a given strategy should be adopted. A number of interesting and counterintuitive examples are then explored, including the merits of stalling for time, the run/pass/Hail Mary choice in football, and the correct use of Hack-a-Shaq.
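The binomial calculation described above can be sketched directly. The sketch assumes each team gets the same number of independent possessions, that every possession of a team succeeds with one fixed probability and is worth one fixed number of points, and that a tie does not count as a win. The function names and the sample strategies are hypothetical, and this is not the paper’s exact method or its heuristic.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of 0..n successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def win_probability(n, p_us, pts_us, p_them, pts_them):
    """P(our score > their score) with n possessions per team."""
    ours = binomial_pmf(n, p_us)
    theirs = binomial_pmf(n, p_them)
    win = 0.0
    for i, prob_i in enumerate(ours):
        for j, prob_j in enumerate(theirs):
            if i * pts_us > j * pts_them:
                win += prob_i * prob_j
    return win


# Hypothetical underdog choosing between a safer two-point strategy and a
# lower-mean, higher-variance three-point strategy against a favorite.
safe = win_probability(10, 0.45, 2, 0.55, 2)
risky = win_probability(10, 0.28, 3, 0.55, 2)
print(round(safe, 3), round(risky, 3))
```

Comparing `safe` and `risky` over a range of possession counts and success rates is one way to see how much mean score an underdog could reasonably trade for extra variance.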
Abstract: This paper examines the optimality of the shooting decisions of National Basketball Association (NBA) players using a rich data set of shot outcomes. The decision to shoot is a complex problem that involves weighing the continuation value of the possession and the outside option of a teammate shooting. We model this as a dynamic mixed-strategy equilibrium. At each second of the shot clock, dynamic efficiency requires that marginal shot value exceeds the continuation value of the possession. Allocative efficiency is the additional requirement that at that “moment”, each player in the line-up has equal marginal efficiency. To apply our abstract model to the data we make assumptions about the distribution of potential shots. We first assume nothing about the opportunity distribution and establish a strict necessary condition for optimality. Adding distributional assumptions, we establish sufficient conditions for optimality. Our results show that the “cut threshold” declines monotonically with time remaining on the shot clock and is roughly in line with dynamic efficiency. Overshooting is found to be rare, while undershooting is frequently observed among elite players. We relate our work to the usage curve literature, showing that interior players face a generally steeper efficiency trade-off when creating shots.
Abstract: We introduce a win probability approach to modeling basketball performance and employ it to determine the effect that early foul trouble by a team’s starters has on its future performance. We find that each of the three seasons from 2006-2007 through 2008-2009 displays negative team performance when starters are in early foul trouble, defined as having committed at least one more foul than the number of the current quarter (“Q+1”). Most players in early foul trouble should be yanked until they are no longer in foul trouble. Our approach can be extended to other state variables.
Abstract: Evaluation of NHL goalies is often done by comparing their save percentages. These save percentages depend highly upon the defense in front of each goalie and the difficulty of shots that each goalie faces. In this paper we introduce a new methodology for evaluating NHL goalies that does not depend upon the distribution of shots that any individual goalie faced. To achieve this new metric we create smoothed nonlinear spatial maps of goalie performance based upon the shots they did face and then evaluate these goalies on the league average distribution of shots. These maps show the probability of a goalie giving up a goal from across the playing surface. We derive a general mathematical framework for the evaluation of a goalie’s save percentage. Using data from the 2009-10 NHL regular season, we apply this new methodology and calculate our new defense-independent goalie rating (DIGR) for each goalie that faced more than 600 shots. Results of this evaluation are given and possible extensions of the methodology are discussed.
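The re-weighting idea behind a defense-independent rating can be shown with a minimal sketch. It assumes a handful of discrete shot zones in place of the paper’s smoothed spatial maps, and it assumes the goalie has faced at least one shot in every zone. The function name, zone labels, and counts are hypothetical.

```python
def defense_independent_rating(goalie_saves, goalie_shots, league_shots):
    """Save rate the goalie would post against the league-wide shot mix.

    Each argument maps a zone label to a count of saves or shots.
    """
    league_total = sum(league_shots.values())
    rating = 0.0
    for zone, n_league in league_shots.items():
        zone_save_rate = goalie_saves[zone] / goalie_shots[zone]
        # Weight the goalie's zone save rate by the league's share of
        # shots from that zone, not by the goalie's own shot mix.
        rating += (n_league / league_total) * zone_save_rate
    return rating


# Hypothetical goalie whose defense allows mostly low-danger point shots.
shots = {"slot": 100, "point": 300}
saves = {"slot": 80, "point": 285}
league = {"slot": 500, "point": 500}

raw_save_pct = sum(saves.values()) / sum(shots.values())     # 0.9125
rating = defense_independent_rating(saves, shots, league)    # about 0.875
print(raw_save_pct, round(rating, 4))
```

In this made-up case the raw save percentage flatters the goalie because his team allows few shots from the slot. Evaluating him on the league-average mix removes that advantage.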
How and why does performance change under pressure? Psychologists have argued that pressure can distract, motivate, and generate too much self-focus (thinking about the details of how one should accomplish a goal, as opposed to “just doing it”). Studies have implicated self-focus as the key factor in pressure-associated performance declines. To understand if these results extend to highly trained experts, we examine two fundamentally different actions within the context of the same professional sport, basketball. The first action, free throw shooting, requires quiet concentration, while the second, offensive rebounding, is based on effort exerted in the heat of the moment. Home vs. away variation allows us to understand how a supportive audience moderates the impact of pressure. We find that home free throw shooters do significantly worse in clutch situations, with the effect being larger for poor shooters. Road players show no change in behavior under pressure, indicating distraction plays a limited role in this task. In stark contrast, the home team gets significantly better at offensive rebounding in pressure-packed moments, while again the road team shows no relationship between performance and pressure. The results show a clear asymmetric impact of a supportive audience: it can both inspire effort and lead to detrimental self-focus, even for experienced agents. From a sports perspective, this shows how the traditional notion of home-court advantage is not inconsistent with some pressure-related disadvantages (“home choke”).