Abstract: Recent play in Major League Baseball has showcased many attempts to achieve an advantage through smart selection of pitcher-batter matchups. Ideally, each pitcher-batter matchup should be analyzed in isolation of other matchup data to estimate probabilities for outcomes of specific at-bats; however, most matchups have not occurred frequently enough to yield statistically meaningful estimates. In this paper, we move beyond simple lefty-righty matchup rules, and present a technique for classifying pitcher-batter matchups within a set of cluster types. The training of clusters and the verification of our techniques, require statistically significant matchup data and the presence of large bicliques in high pitcher-batter matchup-count bipartite graphs. We present several of these bicliques, and simulate thousands of innings of baseball to show the accuracy of our clustering algorithm compared to the “truth” (estimates obtained with the maximum-likelihood rule). We use this approach to verify the utility of selecting 15 cluster types within the matchup data. We then show how these cluster types can be used to simulate potential matchups where data is sparse or nonexistent in the real world. The result is a set of algorithms that can be used to identify optimal pitcher-batter matchup strategies in cases when matchup data is abundant and when it is sparse. The 15-cluster approach is shown to produce significantly better estimates of matchup probabilities compared to estimates obtained using only traditional lefty-righty rules. By our estimates, the best pitcher-batter matchups are better by roughly half a run per inning over the worst possible matchups.