Decoding MLB Pitch Sequencing Strategies via Directed Graph Embeddings

Arnav Prasad


This paper presents a novel analysis of pitch sequencing in Major League Baseball (MLB). By leveraging high-resolution pitch tracking data from approximately 3.5 million pitches across the 2015-2019 seasons, this work introduces graph embeddings that map both short- and long-term patterns in pitch sequences. This quantitative approach to pitch sequencing captures the intuition that pitchers exhibit a sense of memory on the mound that is not adequately represented by individual pitch selection or simple pitch-to-pitch correlation. Since each graph embedding corresponds to a forward dependency between a set of pitches, this method enables a pitcher’s sequencing decisions to be represented by a directed graph or network. Model-based clustering (Gaussian Mixture Models fit via the Expectation-Maximization algorithm) suggests that MLB at-bats can be grouped into a finite collection of universal patterns with respect to both pitch-type and zone selection. Exploratory data analysis of these sequence clusters indicates that a pitcher’s sequencing strategies are distinct (though not inseparable) from their available pitch arsenal; that is, in addition to pitch selection, pitch ordering is a significant component of pitcher decision-making. While MLB pitchers display season- and career-level sequencing behaviors, they also respond dynamically to in-game matchups and events by altering their usage of certain sequences when given new information. By interpreting the graph embeddings as forward dependencies, this paper also finds compelling quantitative evidence of “setup” and “knockout” pitches in various sequence clusters. Ultimately, this paper introduces an analytical framework to study and visualize MLB pitch sequencing, with potential applications in matchup preparation, player evaluation, and player development.
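
To make the idea of representing an at-bat as a directed graph more concrete, the following is a minimal sketch and not the paper's actual embedding method. It assumes a hypothetical four-label pitch vocabulary (`PITCH_TYPES`), treats each ordered pair of pitches at a given lag as a directed edge (a forward dependency), and flattens the normalized edge weights into a fixed-length vector. The function names `transition_graph` and `embed`, the `max_lag` parameter, and the example at-bat are all illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Hypothetical pitch-type labels; the paper's actual vocabulary may differ.
PITCH_TYPES = ["FF", "SL", "CH", "CU"]


def transition_graph(sequence, lag=1):
    """Count directed edges (earlier pitch -> later pitch) at a given lag."""
    edges = Counter()
    for earlier, later in zip(sequence, sequence[lag:]):
        edges[(earlier, later)] += 1
    return edges


def embed(sequence, pitch_types=PITCH_TYPES, max_lag=2):
    """Flatten normalized edge weights for lags 1..max_lag into one vector.

    Lag 1 captures pitch-to-pitch transitions; larger lags capture
    longer-range forward dependencies within the same at-bat.
    """
    vector = []
    for lag in range(1, max_lag + 1):
        edges = transition_graph(sequence, lag)
        total = sum(edges.values())
        for pair in product(pitch_types, repeat=2):
            vector.append(edges[pair] / total if total else 0.0)
    return vector


# Illustrative at-bat (made-up data, not taken from the paper).
at_bat = ["FF", "FF", "SL", "CH", "SL"]
graph = transition_graph(at_bat)
vec = embed(at_bat)
```

Vectors of this kind, computed for many at-bats, could then be passed to a Gaussian mixture model (for example, scikit-learn's `GaussianMixture`) to group at-bats into sequence clusters. How the paper actually constructs and clusters its embeddings is described in the full text, not in this sketch.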