(batter|pitcher)2vec: Statistic-Free Talent Modeling With Neural Player Embeddings


Michael A. Alcorn


Abstract: This paper introduces (batter|pitcher)2vec, a neural network algorithm inspired by word2vec that learns distributed representations of Major League Baseball players. The representations are discovered through a supervised learning task that attempts to predict the outcome of an at-bat (e.g., strike out, home run) given the context of a specific batter and pitcher. The learned representations qualitatively appear to better reflect baseball intuition than traditional baseball statistics, for example, by grouping together pitchers who rely primarily on pitches with dramatic movement. Further, like word2vec, the representations possess intriguing algebraic properties, for example, capturing the fact that Bryce Harper might be considered Mike Trout's left-handed doppelgänger. Lastly, (batter|pitcher)2vec is significantly more accurate at modeling future at-bat outcomes for previously unseen matchups than simpler approaches.
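
To make the supervised task described in the abstract concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes the simplest possible setup: one embedding vector per batter and per pitcher, concatenated and fed to a single softmax layer over at-bat outcomes, trained by stochastic gradient descent on cross-entropy loss. The class name `MatchupModel`, the four-outcome list, the embedding size, the learning rate, and the toy at-bats are all hypothetical choices made for illustration.

```python
import math
import random

# Illustrative subset of at-bat outcomes (an assumption, not the paper's list).
OUTCOMES = ["strikeout", "walk", "single", "home_run"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MatchupModel:
    """Toy model: batter and pitcher embeddings -> softmax over outcomes."""

    def __init__(self, n_batters, n_pitchers, dim=3, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.dim = dim
        self.batter_vecs = init(n_batters, dim)
        self.pitcher_vecs = init(n_pitchers, dim)
        self.weights = init(len(OUTCOMES), 2 * dim)
        self.biases = [0.0] * len(OUTCOMES)

    def predict(self, batter, pitcher):
        """Return a probability for each outcome of this matchup."""
        # List concatenation builds a fresh vector [batter_vec, pitcher_vec].
        x = self.batter_vecs[batter] + self.pitcher_vecs[pitcher]
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.biases)]
        return softmax(logits)

    def sgd_step(self, batter, pitcher, outcome, lr=0.1):
        """One gradient step on the cross-entropy loss for a single at-bat."""
        x = self.batter_vecs[batter] + self.pitcher_vecs[pitcher]
        probs = self.predict(batter, pitcher)
        # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(outcome).
        g_logits = [p - (1.0 if k == outcome else 0.0)
                    for k, p in enumerate(probs)]
        # Gradient w.r.t. the concatenated input, using pre-update weights.
        g_x = [sum(g_logits[k] * self.weights[k][j]
                   for k in range(len(OUTCOMES)))
               for j in range(2 * self.dim)]
        for k, g in enumerate(g_logits):
            for j in range(2 * self.dim):
                self.weights[k][j] -= lr * g * x[j]
            self.biases[k] -= lr * g
        # The embeddings themselves are trained, which is the point of the task.
        for j in range(self.dim):
            self.batter_vecs[batter][j] -= lr * g_x[j]
            self.pitcher_vecs[pitcher][j] -= lr * g_x[self.dim + j]


def mean_loss(model, at_bats):
    """Average negative log-likelihood of the observed outcomes."""
    return sum(-math.log(model.predict(b, p)[o])
               for b, p, o in at_bats) / len(at_bats)


# Made-up toy data: (batter index, pitcher index, outcome index).
at_bats = [(0, 0, 0), (1, 0, 3), (0, 1, 2), (1, 1, 0)]

model = MatchupModel(n_batters=2, n_pitchers=2)
loss_before = mean_loss(model, at_bats)
for _ in range(200):
    for batter, pitcher, outcome in at_bats:
        model.sgd_step(batter, pitcher, outcome)
loss_after = mean_loss(model, at_bats)
print(loss_before, loss_after)
```

After training, the rows of `batter_vecs` and `pitcher_vecs` play the role of the learned player representations: players whose matchups lead to similar outcome distributions are pushed toward similar vectors, which is what would allow nearest-neighbor comparisons or vector arithmetic of the kind the abstract mentions.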