By Zach Slaton, Author of A Beautiful Numbers Game, Contributing Writer at Forbes
It’s 5:00 AM on Sunday, March 3rd. I am sitting in Boston Logan International Airport waiting to board my flight to Minneapolis, Minnesota, where I’ll connect to Seattle, and all I can think of, in no particular order, is:
1. Do I really have to go back to work on Monday? Why can’t I just keep writing about sports analytics for a living? (Hint: writing is not my day job)
2. I have to make time this week to transcribe two hours’ worth of soccer-specific analytics interviews.
3. Last night was a bad one for my Seattle Sounders FC. I hope Arsenal doesn’t make it a big goose egg of a soccer weekend for me. Knowing they’ve got a less than 28% chance of finishing in the EPL’s Top 4 come Monday may just be too much for me to handle right now.
4. Speaking of Arsenal, do you think a random Minneapolis airport bar/restaurant will be willing to show the North London Derby at 9 AM?
5. That paper by Dr. Jordet on field vision in soccer was amazing, and the presentation was even better! (more on that later)
6. I met so many cool people this weekend, many of whom I only knew from their Twitter handles and the 140-character bursts of conversation I’d had with them over the last year.
7. Finally, has it really been 60 hours since I touched down in Boston for the 2013 MIT Sloan Sports Analytics Conference with my biggest worry at the time being how many different T lines I would have to take to get to McGreevy’s 3rd Base Saloon?
That final point perhaps best sums up the experience. Yes, the conference is exhausting and often feels like one’s brain is hooked up to a fire hose of analytical information that never stops, but it’s a good sort of exhaustion akin to the kind you get after a really satisfying workout. If you don’t leave Boston feeling like you wanted to speak to far more people and attend a far greater number of panels than you did, it means you haven’t planned well enough to know there are more options for interactions than there is time in the day. The Sloan conference is a lot like the connected cities concept explored by Richard Florida in this October 2011 Atlantic Monthly article, in that it fosters so much creative thought because it brings together the brightest minds inside and outside of mainstream sporting culture and allows the conversation to simply go where it may. Knowing now, less than twelve hours after the completion of this year’s conference, that another bigger and better one will be held next year only serves to motivate this year’s attendees to take all the new knowledge, contacts, and interactions they’ve acquired and redouble their efforts in the intervening year. It’s a virtuous, positive feedback loop that the conference fosters during the 363 days it’s not in session.
Consistent Themes From Day One
Day Two of the conference had a few consistent themes that carried over from day one that included data organization, presentation, and visualization. Yet again the emphasis was on ensuring that data visualization was in tune with the intended audience’s expectations and experience with the data being presented. Sometimes visualization can be looked down upon in the analytics community as it often doesn’t involve fancy equations better suited for academic research papers and there’s a bit of judgment involved because the audience may be less numerically sophisticated than the average Sloan Conference attendee. This view of data visualization has two significant problems:
1. It ignores the reality that the vast majority of the sports-consuming population, whether it’s spectators or practitioners, doesn’t have an advanced degree in mathematics or science, nor should they be expected to have one to engage in sports discussion.
2. If analytics are to continue to penetrate the wider sporting world’s consciousness, we must bring everyone along at a level of discussion with which they’re comfortable. As Edward Tufte would argue, any presentation of data must let your audience consume the data in the way they will intuitively consume it and not the way you want them to consume it.
I said in my Day One recap that the battle for analytics’ acceptance in the boardroom and within coaching staffs has largely been won. A large number of owners, managers, scouting staff, and support staff seem to be accepting the fact that numbers can help them improve their contribution to the games we play, but I did qualify that observation by saying that there’s still the issue of presenting the data to them in a meaningful and understandable way. I don’t know if I could make the same qualified statement of success when it comes to the wider public’s view of analytics, which makes how we present models and data as an analytics community all the more important. Albert Larcada of ESPN’s Stats and Information department provided perhaps the most robust defense of the concept of data-visualization-as-analytics by saying the following on the Soccer Analytics panel:
“Analytics doesn’t just mean stats. It can mean data visualization. It can mean video capture and editing. Just because you have a stats or math background doesn’t mean you’re the only one who can do analytics.”
Larcada then proceeded to demonstrate the various depths of visualization ESPN will use depending upon the assumed viewers’ soccer IQ. In one visualization, a comparison between Cristiano Ronaldo’s and Lionel Messi’s position on the pitch was made via heat maps, and these heat maps quickly demonstrated Messi’s role as a “False 9” central player to the viewer while Ronaldo’s role of coming in from a more forward deployed left wing position was also easily seen. Such information is valuable to the understanding of the game by those who only know Barcelona and Madrid are two big Spanish clubs who they may want to watch in the upcoming match. Digging just a bit deeper, Larcada showed another heat map later in the panel, this time using Messi’s touches in Barcelona’s loss to AC Milan in the Champions League. Even the most casual observer of soccer knows Messi is a goal scoring machine who rips defenses apart, yet the heat map showed just how effective AC Milan was at denying him that opportunity via a single red dot on the map indicating the one touch Messi had within Milan’s box. This simple visualization then opens up an array of questions as to how AC Milan was able to do this and perhaps how many other teams did it. The answers to such questions would help explain what kind of defensive tactics might be employed when viewing matches involving Barcelona. Answering such questions would involve a relatively simple query within ESPN’s TruMedia tool they showed off in their booth at the conference, which is able to comb through millions of data points from multiple soccer competitions over multiple years. The key is in providing the simple-yet-effective visualization initially that leads to a whole host of questions that can be quickly answered via a comprehensive database with an easy-to-use front end. That’s how analytics progression happens, even when starting from a relatively simple heat map.
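For readers curious how a touch heat map like the ones described above comes together, here is a minimal sketch. It is not ESPN’s method or anything from the TruMedia tool; the pitch dimensions (105 by 68 meters), the grid size, and the sample touches are all assumptions made up for illustration. The idea is simply to bin each touch location into a grid cell and count.

```python
from collections import Counter

def touch_heat_map(touches, pitch_length=105.0, pitch_width=68.0, cols=6, rows=4):
    """Count touches per grid cell.

    touches: iterable of (x, y) positions in meters, assumed to lie
    within the pitch (0 <= x <= pitch_length, 0 <= y <= pitch_width).
    Returns a rows x cols list of lists of counts.
    """
    counts = Counter()
    for x, y in touches:
        # Scale the position to a cell index; clamp so a touch exactly on
        # the far boundary falls in the last cell rather than off the grid.
        col = min(int(x / pitch_length * cols), cols - 1)
        row = min(int(y / pitch_width * rows), rows - 1)
        counts[(row, col)] += 1
    return [[counts[(r, c)] for c in range(cols)] for r in range(rows)]

# Hypothetical touches, not real match data.
sample_touches = [(10, 10), (100, 34), (105, 68), (52, 30)]
for grid_row in touch_heat_map(sample_touches):
    print(grid_row)
```

A single nonzero cell near the opponent’s goal, like the lone red dot in the Milan example, stands out immediately in a grid like this, which is exactly why the visualization prompts follow-up questions.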
New Insights from Day Two
The topic of organized-and-accessible data was a hot one in the Injury Analytics panel as well. Stan Conte, VP of Medical Services for the Los Angeles Dodgers, talked about how it took 2.5 years to compile a complete season’s worth of injury data in 1996. With advanced use of databases within Major League Baseball (MLB) and more standardized descriptive data being captured with each injury (read as “moving beyond the standard disabled list report”), MLB has been able to determine a few interesting things:
1. In 2012 alone the league lost 29,000 player days to injury at a total cost of $600M in lost player productivity as judged by the value of player contracts.
2. Over $2B was lost due to injury in the last four years.
3. Baseball got serious about its steroids problem in 2006, and subsequently saw a spike in injuries after enforcement of the new policies began.
4. Starting pitchers, the most valuable commodities to a team both in contract valuations and perhaps runs, wins, and losses, are at the highest risk of injury with a more than 50% injury rate per season.
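To put the 2012 figures in per-day terms, a quick back-of-the-envelope calculation helps. This is only my own arithmetic on the two numbers quoted above (29,000 player days and $600M), not a figure presented by the panel, and the function name is just for illustration.

```python
def cost_per_lost_day(total_cost_usd: float, player_days_lost: int) -> float:
    """Average cost of one player day lost to injury."""
    if player_days_lost <= 0:
        raise ValueError("player_days_lost must be positive")
    return total_cost_usd / player_days_lost

# 2012 figures as quoted in the panel: $600M over 29,000 player days.
per_day = cost_per_lost_day(600_000_000, 29_000)
print(f"${per_day:,.0f} per lost player day")  # prints "$20,690 per lost player day"
```

That works out to roughly $20,700 for every day a player spends injured, averaged across the league.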
Knowing the size of the problem and the types of injuries allows people like Conte to then turn to the advanced data within the PitchFX database to look for patterns in pitchers who are more or less successful at beating the 50% injury rate. Insights gained from such work will then lead to more sophisticated predictive tools that can be used to judge the likelihood of future injury and thus modify teams’ approaches to how they utilize pitchers (see the Washington Nationals’ shelving of Stephen Strasburg this last baseball season). My hope is that injury analytics reprises its role in next year’s conference, as it certainly will be an important and growing field within the wider sports analytics community. Perhaps in a year’s time we’ll be able to discuss the latest revisions to predictive models like this one developed for soccer.
Two of the other highlights from Saturday’s conference events happened outside the panels. The first was from invited speaker Michael Mauboussin, who gave a presentation entitled Why You Don’t Understand Luck. It served as a great reminder that luck will play a bigger role in sport outcomes as athletes grow more skilled and the distribution of skills becomes narrower. Mauboussin encouraged listeners to recognize not only this reality but also the biological reality that our brains are actually wired to not understand luck. Throughout our evolutionary history the brain has worked to assign cause to effect as a survival instinct via what Mauboussin calls the brain’s “interpreter”. The interpreter knows nothing of luck, only causality, which means it is constantly attempting to build a narrative around what are often random events. This also helps explain why most people are not inclined to statistical analysis, and why even those who are so inclined often mistake noise in a data set for signal. Mauboussin provided several clinical and research examples to demonstrate how this ultimately leads to risk-averse behavior, and cautioned all the analysts in the room that what we should all really be interested in is relative skill and not absolute skill. It’s for this reason that he argues that we should stop arguing about who will be the next .400 hitter in baseball and instead focus on the distribution of batting averages. I’ll let you watch his presentation when it’s posted to the conference website to understand why.
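The point about luck mattering more as skill levels converge can be illustrated with a simple variance decomposition. This is my own sketch, not something taken from Mauboussin’s slides: it assumes an observed outcome is the sum of independent skill and luck components, so the variances add, and the standard deviations used are hypothetical.

```python
def luck_share(skill_sd: float, luck_sd: float) -> float:
    """Fraction of outcome variance due to luck.

    Assumes outcome = skill + luck with the two independent, so that
    var(outcome) = var(skill) + var(luck).
    """
    skill_var = skill_sd ** 2
    luck_var = luck_sd ** 2
    return luck_var / (skill_var + luck_var)

# Hold luck fixed and shrink the spread of skill (hypothetical values).
fixed_luck_sd = 0.025
for skill_sd in (0.040, 0.020, 0.010):
    share = luck_share(skill_sd, fixed_luck_sd)
    print(f"skill sd {skill_sd:.3f} -> luck explains {share:.0%} of the variance")
```

Nothing about luck changes between the three cases; only the spread of skill shrinks. Yet luck accounts for a growing share of the differences we observe, which is why the distribution of skill is the thing to watch.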
The last research paper presented on Saturday was also perhaps the most interesting one to me given my soccer analytics background. The Hidden Foundation of Field Vision in EPL Soccer Players was written and presented by Dr. Geir Jordet. His work was perhaps one of the better examples of taking an intuitive idea (better field vision should improve one’s passing), breaking it down into some very good, high-quality video examples (Pirlo’s goal in the 2006 World Cup vs. Germany starting at the 6:30 mark, Robin van Persie’s goal against Liverpool, and Iniesta’s World Cup winning goal against the Dutch starting at the 0:35 mark), demonstrating how advanced video technology, aided by the hard work of categorizing it, generated the core data for the study, quantifying the actual impact on pass completion rates (a statistically significant gain for midfielders, less so for forwards), and even tying it to training regimens that could be used to improve field vision in younger players. Some may look at his paper’s overall conclusion (field vision = better passing) and respond, “Duh!”, but to do so is to engage in analytics snobbery more obsessed with making the rare-but-famous all-in-one breakthrough than with the multitude of smaller insights that lead to more questions and build towards real impacts via evolutionary change. Barcelona’s current possession-based, death-by-a-thousand-passes culture didn’t come about from some system imposed overnight upon the club by some über-smart analyst. It was developed through continual insight into, and refinement of, the training of not only senior players but, more importantly, the youth at the La Masia academy. The most dominant professional soccer team in the history of the game was more than a decade in the making. Why should we expect any less development time of models and theories within the wider analytics movement? Dr. Jordet’s paper and presentation are indicative of just such an incremental-yet-highly-insightful approach.
The Role of the Conference as an Analytics Catalyst
Books and movies about events like those captured in Moneyball certainly have helped focus the sporting world’s attention on analytics, but the attention is also a curse because everyone outside of the core group of analysts moving the field forward seems to be looking for the next huge breakthrough worthy of the Hollywood treatment. We all must remember that before Michael Lewis and Billy Beane there was Bill James – a guy with his own knowledge of the game, the best numbers he had available at the time, and a calculator. It took two decades of work by him and a core group of outsiders before they built a critical mass that was finally noticed and used, and even then it required a bit of luck to take off the way it did. The MIT Sloan Sports Analytics Conference has an opportunity to encourage such an approach, and refocus everyone on the reality that not all sports have as mature an analytics movement as baseball does now. We can’t all leap to that same maturity level overnight, and it took baseball decades to get there.
No one knows where the next great breakthrough may come from, and like other research fields we may not know it was a breakthrough until many years later when we see many other people using the model or theory. We do know that the next great analytical metric or model won’t be developed in isolation. It will be developed through continual discussion and refinement, and today’s analysts have a leg up on Bill James because they have a worldwide audience via the Internet and an annual “Super Bowl”-style event where everyone congregates at the MIT Sloan Sports Analytics Conference. I can’t wait to see what small or big breakthroughs are made in the next 363 days when we converge yet again on Boston for the 2014 conference.
In the meantime, keep modeling, testing, visualizing, and communicating. As was said throughout the conference, we don’t want to judge our success or failure based upon outcomes. We want to judge it based upon the process.
Editor’s note: The views expressed in each post are those of the author(s) only and not those of the conference organizing team or blog sponsor.