Hockey Analytics: Derived Statistics are the Beginning not the End
The following article was written by Kevin Mongeon, a principal owner of The Sports Analytics Institute, which provides consulting advice to a number of professional hockey teams.
A large portion of the efforts of sports analytics specialists is focused on determining the value of a player, or more specifically, a player’s contribution to winning. To this end, analysts have created a number of derived statistics that potentially determine a player’s contribution to winning more accurately than traditional statistics.
For example, in baseball, the derived statistic of on-base percentage has been shown to determine a hitter’s value, in terms of winning, more accurately than the more traditional batting average. Before this knowledge became broadly available, baseball managers who made personnel decisions using on-base percentage rather than batting average were able to achieve more wins per dollar spent on payroll.
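To make the contrast concrete, the sketch below computes both statistics using their standard definitions: the traditional hits-per-at-bat measure (batting average) and on-base percentage. The two hitters and their season totals are hypothetical and chosen only for illustration.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; walks and hit-by-pitches are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Times on base divided by the plate appearances that count toward OBP."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season totals: same hits and at-bats, very different walk counts.
patient = dict(hits=150, walks=90, hit_by_pitch=5, at_bats=500, sacrifice_flies=5)
free_swinger = dict(hits=150, walks=20, hit_by_pitch=5, at_bats=500, sacrifice_flies=5)

for name, line in (("patient", patient), ("free swinger", free_swinger)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

Both hypothetical hitters bat .300, yet their on-base percentages are roughly .408 and .330, a difference the traditional statistic cannot see.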
Hockey analytics is beginning to emerge as a competency sought by hockey managers to value players and make personnel decisions. As a result, a number of derived statistics have been created to provide new ways of evaluating player performance. Corsi and Fenwick, which count the number of shots directed at the net while a player is on the ice, were two of the first derived statistics in hockey. More recently, we (The Sports Analytics Institute) created PGS™ (Predicted Goals Scored™), which calculates the predicted number of goals scored for and against while a player is on the ice. PGS™ accounts for shot characteristics including, but not limited to, shot location and type.
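As a rough illustration of how on-ice shot counts of this kind are tallied, the sketch below uses the commonly cited definitions: Corsi counts all shot attempts (goals, saved shots, misses, and blocked shots), while Fenwick leaves out blocked shots. The event records, field names, team labels, and player labels are hypothetical and do not reflect any particular data feed; PGS™ is not reproduced here because its details are not given in this article.

```python
from collections import Counter

# Assumed event types: every shot attempt counts toward Corsi,
# and Fenwick is Corsi without blocked shots.
CORSI_EVENTS = {"goal", "shot", "miss", "block"}
FENWICK_EVENTS = CORSI_EVENTS - {"block"}


def shot_attempt_counts(events, player, team):
    """Tally shot attempts for and against while `player` is on the ice.

    Each event is a dict whose "team" field names the team that attempted
    the shot (an assumption of this sketch, including for blocked shots).
    """
    counts = Counter()
    for ev in events:
        if player not in ev["on_ice"]:
            continue
        side = "for" if ev["team"] == team else "against"
        if ev["type"] in CORSI_EVENTS:
            counts["corsi_" + side] += 1
        if ev["type"] in FENWICK_EVENTS:
            counts["fenwick_" + side] += 1
    return counts


# Hypothetical play-by-play fragment.
events = [
    {"type": "shot", "team": "HOME", "on_ice": {"A", "B", "X"}},
    {"type": "block", "team": "HOME", "on_ice": {"A", "X"}},
    {"type": "miss", "team": "AWAY", "on_ice": {"A", "B", "X"}},
    {"type": "goal", "team": "AWAY", "on_ice": {"B", "X"}},
    {"type": "shot", "team": "HOME", "on_ice": {"B", "X"}},
]

counts = shot_attempt_counts(events, "A", "HOME")
print(dict(counts))
```

Note that the tally credits every skater on the ice equally for each attempt, which is exactly the kind of aggregation the rest of this article questions.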
Can Derived Hockey Statistics Accurately Value Players?
To answer the question “Can derived hockey statistics accurately value players?”, it is useful to examine why derived baseball statistics do reasonably well at determining the value of baseball players. Baseball is played in a discrete manner, with each team separately playing offense and defense. Each discrete event (e.g., a pitch or a hit) generates observable data available for analysis. The manner in which baseball is played creates an environment where basic statistical assumptions are not violated when data are aggregated. Therefore, derived statistics aggregated over many events can potentially provide accurate estimates of a baseball player’s contribution to winning.
Hockey is played in a continuous manner, with constant flow and continuous player interactions, and with each team simultaneously playing offense and defense. Only some events (e.g., shots, goals) generate data that can be made available for analysis. As a result, the manner in which hockey is played creates an environment where basic statistical assumptions are violated when data are aggregated. Therefore, derived statistics are not likely to provide accurate estimates of a hockey player’s contribution to winning. Hockey managers are aware of the complex manner in which hockey data are generated and are among the first to dismiss player valuations based solely on derived statistics.
More Hockey Data?
It is common to hear that the solution to obtaining accurate hockey player valuations is the collection of additional data. Additional data will allow for more sophisticated derived statistics, but these statistics will still be generated from the complex manner in which hockey is played. Therefore, the aggregation of these new derived statistics will most likely not provide accurate player valuations. The primary benefit of additional data is not in the derivation of accurate player valuations. Rather, it can be leveraged to determine optimal strategies and tactics for winning.
Determining the Value of Hockey Players
Determining accurate player valuations is possible using statistical models specifically designed to account for the manner in which hockey data are generated. Using a well-formed statistical model, a player’s contribution to winning can be determined while accounting for team, opponent, line-mates, opposing players, score of the game, as well as a number of other effects. Unlike player valuations obtained from derived statistics, statistical models can predict a player’s offensive and defensive production under different scenarios, such as playing with different line-mates, on a different team, or under different within-game situations, to name a few.
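The idea of separating a player’s contribution from those of line-mates and opponents can be sketched with a small regression. The example below is not the model described in this article, whose specification is not given here; it is a generic ridge regression on on-ice lineups, with made-up players, made-up "true" effects, and noise-free shot differentials constructed from those effects. Player B shares most shifts with the strong player A, so B’s raw on-ice average looks better than C’s even though both are assumed to contribute nothing.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def raw_on_ice_average(shifts, player):
    """Average shot differential over the shifts a player took part in."""
    diffs = [d for for_side, _, d in shifts if player in for_side]
    return sum(diffs) / len(diffs)


def ridge_player_effects(shifts, lam=0.01):
    """Estimate one effect per player by ridge regression.

    Players on our side are coded +1 and opponents -1, so a positive
    coefficient for an opponent means that opponent lowers our differential.
    """
    players = sorted({p for f, a, _ in shifts for p in f | a})
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    rows = []
    for for_side, against_side, _ in shifts:
        row = [0.0] * n
        for p in for_side:
            row[idx[p]] = 1.0
        for p in against_side:
            row[idx[p]] = -1.0
        rows.append(row)
    y = [d for _, _, d in shifts]
    # Normal equations with a ridge penalty: (X'X + lam I) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    return dict(zip(players, solve(xtx, xty)))


# Hypothetical true effects, used only to build the toy data.
TRUE_EFFECT = {"A": 2.0, "B": 0.0, "C": 0.0, "X": 0.0, "Y": 1.0}
lineups = [
    ({"A", "B"}, {"X"}), ({"A", "B"}, {"Y"}), ({"A", "C"}, {"X"}),
    ({"B", "C"}, {"X"}), ({"B", "C"}, {"Y"}), ({"A", "B"}, {"X"}),
]
shifts = [
    (f, a, sum(TRUE_EFFECT[p] for p in f) - sum(TRUE_EFFECT[p] for p in a))
    for f, a in lineups
]

estimates = ridge_player_effects(shifts)
for p in ("A", "B", "C"):
    print(p, "raw:", round(raw_on_ice_average(shifts, p), 2),
          "model:", round(estimates[p], 2))
```

In this toy setup only differences between players are identified, so the estimated levels are shifted relative to the assumed effects, but the model places B and C on roughly equal footing and A about two shots ahead of both, whereas the raw averages favor B over C. Because the model has one coefficient per player, it can also be asked what a lineup that never appeared in the data would be expected to produce.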
As more data are collected, additional derived hockey statistics are likely to be created. These statistics are not likely to provide information leading to accurate player valuations in hockey. For hockey analytics to become a fundamental part of hockey operations across the National Hockey League, accurate player valuations are required. Given that hockey is played in a continuous manner, player valuations based on statistical models rather than derived statistics are more important in hockey than in other sports.
About the Author
Kevin is a principal owner of The Sports Analytics Institute, which provides consulting advice to a number of professional hockey teams. Kevin is also an Assistant Professor of Economics at the University of New Haven. Kevin has a Ph.D. in economics from Washington State University (WSU), an MBA from the University of Windsor, and a mathematics degree from Lakehead University. Kevin can be contacted at firstname.lastname@example.org or (509)-432-6230.
Editor’s note: The views expressed in each post are those of the author(s) only and not those of the conference organizing team or blog sponsor.