A switching dynamic generalized linear model to detect abnormal performances in Major League Baseball


Chris Glynn
Surya Tokdar


Abstract: This paper develops a novel statistical method to detect abnormal performances in Major League Baseball. Abnormally high levels of performance may be caused by myriad factors, including performance-enhancing drugs (PEDs), banned equipment that offers unfair advantages, and illegal surveillance of opponents. The career trajectory of each player’s yearly home run total is modeled as a dynamic process that randomly steps through a sequence of natural ability classes as the player ages. Performance levels associated with the ability classes are also modeled as dynamic processes that evolve with age. The resulting switching dynamic generalized linear model (sDGLM) captures each player’s natural career trajectory by borrowing information over time across a player’s career and locally in time across all professional players under study. Potential structural breaks from the natural trajectory are indexed by a dynamically evolving binary status variable that flags unnaturally large changes to natural ability, possibly due to unnatural causes such as PED abuse. We develop an efficient Markov chain Monte Carlo algorithm for Bayesian parameter estimation by augmenting a forward filtering backward sampling (FFBS) algorithm, commonly used in dynamic linear models, with a novel Pólya-Gamma parameter expansion technique. We validate the model by examining the career trajectories of several known PED users and by predicting home run totals for the 2006 season. The method identifies both Barry Bonds and Mark McGwire as players whose performance increased abnormally, and its predictive performance is competitive with a Bayesian method developed by Jensen et al. (2009) and two other widely used forecasting systems.
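
For readers unfamiliar with the FFBS step named in the abstract, the following is a minimal sketch of forward filtering backward sampling for the simplest Gaussian dynamic linear model (a local-level model). It is not the sDGLM of the paper: it has no ability classes, no binary status variable, and no Pólya-Gamma augmentation, and it assumes known variances. The function name `ffbs_local_level` and the parameters `obs_var`, `state_var`, `m0`, and `c0` are hypothetical names chosen for illustration.

```python
import math
import random


def ffbs_local_level(y, obs_var, state_var, m0=0.0, c0=1e6, rng=None):
    """Draw one joint posterior sample of the latent states of a local-level DLM.

    Assumed model (illustrative only, not the paper's sDGLM):
        y[t]     = theta[t] + v[t],      v[t] ~ N(0, obs_var)
        theta[t] = theta[t-1] + w[t],    w[t] ~ N(0, state_var)
        theta before the first observation ~ N(m0, c0)
    """
    rng = rng or random.Random()
    n = len(y)
    a = [0.0] * n  # one-step-ahead prior means
    r = [0.0] * n  # one-step-ahead prior variances
    m = [0.0] * n  # filtered means
    c = [0.0] * n  # filtered variances

    # Forward filtering: standard Kalman recursions.
    m_prev, c_prev = m0, c0
    for t in range(n):
        a[t] = m_prev
        r[t] = c_prev + state_var
        q = r[t] + obs_var           # one-step forecast variance
        gain = r[t] / q              # adaptive coefficient (Kalman gain)
        m[t] = a[t] + gain * (y[t] - a[t])
        c[t] = r[t] * (1.0 - gain)
        m_prev, c_prev = m[t], c[t]

    # Backward sampling: draw the last state, then condition backwards.
    theta = [0.0] * n
    theta[-1] = rng.gauss(m[-1], math.sqrt(c[-1]))
    for t in range(n - 2, -1, -1):
        b = c[t] / r[t + 1]
        mean = m[t] + b * (theta[t + 1] - a[t + 1])
        var = c[t] - b * b * r[t + 1]
        theta[t] = rng.gauss(mean, math.sqrt(max(var, 0.0)))

    return theta, m, c
```

Inside a Gibbs sampler, a draw like this would be repeated at every iteration, conditional on the current values of the other parameters. In a setting like the one the abstract describes, the Gaussian observation equation would presumably arise only after conditioning on the Pólya-Gamma auxiliary variables.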