Abstract: The task of extracting informative measures of talent for Major League Baseball (MLB) players has a surprising parallel in the field of natural language processing: the task of constructing useful word embeddings. Words, like MLB players, can be considered distinct elements in a set, and one common way to represent such categorical data in machine learning algorithms is as one-hot encodings. However, one drawback of one-hot encodings is that every element in the set is equally similar (or dissimilar) to every other element in the set, because the encodings are mutually orthogonal. But words (and players) do exhibit varying degrees of similarity. By modeling how words behave in different contexts, word embedding algorithms (like word2vec) learn to encode such similarities as geometric relationships between vectors, measured, for example, by cosine similarity or Euclidean distance. This paper introduces (batter|pitcher)2vec, a neural network algorithm that adapts these representation learning concepts to a baseball setting, modeling player talent by learning to predict the outcome of an at-bat given the context of a specific batter and pitcher. The learned representations qualitatively appear to reflect baseball intuition better than traditional baseball statistics do, for example, by grouping together pitchers who rely primarily on pitches with dramatic movement. Further, like word2vec's embeddings, the representations possess intriguing algebraic properties, capturing, for example, the fact that Bryce Harper might be considered Mike Trout's left-handed doppelgänger. Lastly, (batter|pitcher)2vec is significantly more accurate at modeling future at-bat outcomes for previously unseen matchups than simpler approaches.
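To make the orthogonality point concrete, the following is a toy illustration (not from the paper) using NumPy; the player names are used only as variable names, and the embedding values and dimensionality are invented for demonstration. Under one-hot encodings, every pair of distinct players has a cosine similarity of exactly zero, whereas learned dense embeddings can express graded similarity.

    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two vectors."""
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # One-hot encodings: every pair of distinct players is mutually
    # orthogonal, so all pairwise similarities are identical (zero).
    trout, harper, judge = np.eye(3)
    print(cosine(trout, harper), cosine(trout, judge))  # 0.0 0.0

    # Learned embeddings (toy values, not real model output) can
    # encode graded similarity between players.
    trout_emb = np.array([0.9, 0.2, -0.4])
    harper_emb = np.array([0.8, 0.3, -0.5])
    judge_emb = np.array([-0.6, 0.9, 0.1])
    print(cosine(trout_emb, harper_emb))  # near 1.0: similar players
    print(cosine(trout_emb, judge_emb))   # much lower: less similar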
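And as a minimal sketch of the kind of model the abstract describes: map each batter ID and pitcher ID to a learnable dense vector, concatenate the two vectors to form the at-bat context, and predict a probability distribution over at-bat outcomes. This is not the paper's exact architecture; the library choice (Keras), the player counts, the embedding dimensionality, and the number of outcome classes below are all assumptions for illustration.

    from tensorflow.keras.layers import Input, Embedding, Flatten, Concatenate, Dense
    from tensorflow.keras.models import Model

    N_BATTERS = 1000   # hypothetical number of batters in the data
    N_PITCHERS = 800   # hypothetical number of pitchers in the data
    EMBED_DIM = 9      # hypothetical embedding dimensionality
    N_OUTCOMES = 52    # hypothetical number of at-bat outcome classes

    batter_in = Input(shape=(1,), name="batter_id")
    pitcher_in = Input(shape=(1,), name="pitcher_id")

    # Each player ID is mapped to a dense, learnable embedding vector.
    batter_vec = Flatten()(Embedding(N_BATTERS, EMBED_DIM)(batter_in))
    pitcher_vec = Flatten()(Embedding(N_PITCHERS, EMBED_DIM)(pitcher_in))

    # The batter and pitcher vectors together form the at-bat "context".
    context = Concatenate()([batter_vec, pitcher_vec])

    # Softmax over the possible at-bat outcomes (single, strikeout, etc.).
    outcome = Dense(N_OUTCOMES, activation="softmax")(context)

    model = Model([batter_in, pitcher_in], outcome)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Training would pair (batter ID, pitcher ID) inputs with observed
    # outcome labels, e.g.:
    #   model.fit([batter_ids, pitcher_ids], outcome_ids)

Because the outcome loss backpropagates through both embedding tables, players who produce similar outcome distributions in similar contexts end up with nearby vectors, which is what gives the representations the geometric and algebraic properties discussed above.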