A Data-Driven Goalkeeper Evaluation Framework


Derrick Yam


Abstract: In professional soccer, transfer fees keep growing, with sums of over $100m frequently changing hands between top clubs. One key position, however, remains undervalued: the goalkeeper. Of the fifty most expensive transfers in history, just two are goalkeepers. Why might this be? Perhaps stopping goals intrinsically lacks the allure of scoring them. It is also possible that there is a perceived parity among goalkeepers’ abilities, that their actions are relatively infrequent, and that descriptive goalkeeper event data has historically been sparse. Together, these factors have made it hard to get a quantitative handle on a keeper’s value. Using data from StatsBomb, we outline a framework that evaluates goalkeepers on four key responsibilities: Shot Stopping, Cross Collection, Defensive Activity, and Distribution.

Probabilistic models have been trained and calibrated to estimate each goalkeeper’s shot-stopping skill and cross-collecting aggression. Positional deviations are estimated with a k-nearest neighbors algorithm, while distribution is assessed through the lens of attacking contribution and how the player reacts to opponent pressure. The resulting metrics enable a flexible matching algorithm that identifies prospects fitting specific goalkeeper profiles. For example, preliminary results from this framework rated Manchester United’s David De Gea as the highest-performing goalkeeper of the 2017-18 Premier League season. We match him to a rising young star in England’s third tier, Dean Henderson, who performed well while on loan at Shrewsbury Town. Henderson’s parent club is De Gea’s own, Manchester United. Maybe they already own his eventual replacement?
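The abstract does not say how the matching algorithm works. One plausible reading, sketched below purely under our own assumptions, is a nearest-neighbour search over standardized per-keeper metrics: each metric is z-scored across the pool so no single scale dominates, optional weights let a scout emphasize particular responsibilities, and candidates are ranked by weighted Euclidean distance to a target keeper. The metric names, the `match_prospects` and `zscore_pool` functions, the weighting scheme and the toy numbers are all hypothetical; this is not the paper's implementation and uses no StatsBomb data.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical metric names, one per responsibility in the framework;
# the paper's real metric set and scales may differ.
METRICS = ["shot_stopping", "cross_collection", "defensive_activity", "distribution"]

Profile = Dict[str, float]


def zscore_pool(pool: Dict[str, Profile]) -> Dict[str, Profile]:
    """Standardize every metric across the pool to zero mean and unit population std."""
    scaled: Dict[str, Profile] = {name: {} for name in pool}
    for m in METRICS:
        values = [p[m] for p in pool.values()]
        mean = sum(values) / len(values)
        # Fall back to 1.0 when every keeper has the same value (std of 0).
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        for name, p in pool.items():
            scaled[name][m] = (p[m] - mean) / std
    return scaled


def match_prospects(
    target: str,
    pool: Dict[str, Profile],
    weights: Optional[Dict[str, float]] = None,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k keepers nearest to `target` by weighted Euclidean distance."""
    weights = weights or {}
    scaled = zscore_pool(pool)
    t = scaled[target]
    distances = []
    for name, p in scaled.items():
        if name == target:
            continue  # never match a keeper to himself
        # Metrics missing from `weights` default to a weight of 1.0.
        d = math.sqrt(sum(weights.get(m, 1.0) * (p[m] - t[m]) ** 2 for m in METRICS))
        distances.append((name, d))
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


# Made-up numbers for illustration only (not StatsBomb data or paper results).
toy_pool = {
    "Keeper A": {"shot_stopping": 2.0, "cross_collection": 1.0, "defensive_activity": 1.0, "distribution": 1.0},
    "Prospect X": {"shot_stopping": 1.9, "cross_collection": 1.0, "defensive_activity": 1.1, "distribution": 0.9},
    "Prospect Y": {"shot_stopping": -1.0, "cross_collection": 0.0, "defensive_activity": 0.0, "distribution": 0.0},
    "Prospect Z": {"shot_stopping": 0.0, "cross_collection": 2.0, "defensive_activity": 2.0, "distribution": 2.0},
}

print(match_prospects("Keeper A", toy_pool, k=2))
```

Passing, say, `weights={"shot_stopping": 3.0}` makes the ranking favour keepers who resemble the target mainly as shot stoppers, which is one way a matching step could stay "flexible" across different scouting profiles.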