By Zach Slaton, Author of A Beautiful Numbers Game, Contributing Writer at Forbes
Five stages. 39 panels. 2700 attendees. 21,600 person-hours. All in the pursuit of the next great sports analytics breakthrough, and that is just the first of two days at the MIT Sloan Sports Analytics Conference. I described the conference as “the Super Bowl of sports analytics” to draw an analogy to both its scale and its role as the culmination of an academic calendar’s worth of research. Day one lived up to those expectations and more.
Like the traditional Super Bowl experience, my weekend started before the actual event with the Soccer Analysts meetup on Thursday night. Around thirty self-confessed soccer nerds met at McGreevy’s 3rd Base Saloon in Boston. The bar is part-owned by Dropkick Murphys bassist Ken Casey, which means it is steeped in Boston sports history and the sound system plays a steady stream of alternative and punk rock. The menu is minimal yet tasty, the place is loud, and the staff is largely tattooed. I felt at home, even though I was three thousand miles away from Seattle. There is perhaps no better place to kick off a sports-filled weekend than this quintessentially Boston sports bar, and over the next four hours attendees traded the latest soccer statistical theories, shared a few personal stories, and consumed their fair share of beers. The nerdy socialization was perfect preparation for the first ten hours of a Sloan conference that constantly alternates between panel discussions, research presentations, and networking.
After so much fun on Thursday night, the alarm clock admittedly went off a bit earlier than I would have preferred on Friday morning. Nevertheless, getting to the conference early is part of a larger recommendation: get to panels of interest early. Popular panels fill up fast, leaving standing room only for those who show up just before a discussion begins. It was no different on the first day of the conference, with any panel involving Nate Silver easily filled beyond seating capacity. While Silver and a few other panelists deservedly received the lion’s share of attention, that focus opened up opportunities for greater interaction in panels or presentations with fewer attendees. The beauty of the Sloan conference is that attendees get out of it what they personally desire and consequently prioritize: the insight that comes from better-known guests, or the more intimate and potentially counter-intuitive insights of lesser-known yet extremely intelligent panelists. The key to successfully navigating the conference’s multitude of options is being deliberate and planned in one’s approach, as there is literally no way for a single person to see all of the content on all five stages. Thus, the highlights presented below are but a subset of the complete conference experience on day one.
1. The maturity of sports analytics continues to grow with each passing year. Yes, this can be marked by the conference’s growth in attendees and the number of panels it has, but the real indicator is the number of booths set up by commercial entities and the number of business-oriented panels at the conference. The entirety of the conference atrium, which is easily over a football field in length, was filled on the first day with vendor booths displaying products ranging from smartphone and tablet applications for consumers to software packages that would be found in a team’s front office. ESPN’s Stats and Information department took up an entire lounge to display technologies like their TrueMedia application for data visualization. Analytics will always be fan-driven, but the business commitments being made by various commercial interests show that the sporting world sees a big enough impact from analytics that the field can be monetized.
2. One of the consistent themes from day one, and perhaps the whole conference, is that analytics is moving beyond the data analysis phase. The commercial and non-commercial sides of the sports world largely accept the value of analytics. That battle seems to have been won by the data nerds. What is needed now, as Kirk Goldsberry said in the XY Data: The Revolution in Visual Tracking panel, is “[not] more people good at running the stats. We need more people better at communicating the stats.” Yes, the leagues, teams, media, and sponsors will continue needing data analysts to crunch the numbers, but the organizational impact that could come from analytics will not be realized if analysts are unable to translate the data into a format that is easily understood and digested by those who are less numerically inclined. It’s no longer an issue of resistance to a numerical approach; it’s now an issue of making the data understandable.
3. Another consistent theme from day one was that one of the best ways to communicate data better is to embrace improved data visualization. The human mind, no matter how advanced it becomes, has primal animal tendencies, and one of its most primal instincts is to grasp insights more quickly from visual information than from the written word or tabulated data. The Data Visualization panel provided some of the best examples of such an approach. The New York Times’ Joe Ward, Google’s Martin Wattenberg, and Fathom’s Ben Fry walked the audience through a number of effective visualizations that took thousands of data points and boiled them down to a story various target audiences would be able to understand and find valuable. Examples included live demonstrations of the popularity of the name “Brady” within the wider Baby Name Wizard’s Voyager tool (highly useful to expectant parents), as well as the simple-yet-effective three-color presentation of how quickly Derek Jeter broke through the 3,000-hit barrier (looking at the graph, who was the only player to get to 3,000 hits faster than Jeter?). Perhaps one of the more interesting examples came in Martin Wattenberg’s plot of wind in the United States. Wattenberg found the visualization aesthetically pleasing, but didn’t find a use for the information until he published the visualization and started getting feedback from the likes of surfers, bicyclists, and even butterfly researchers. This example demonstrated how an analyst might not see immediate value in the data, but by quickly generating rough visualizations for public feedback they may identify experts or users in other fields who might find it valuable. Overall, the Edward Tufte principle of “don’t ask how much I can put in a graphic, but rather how much I can take out” was on display for sixty minutes.
The panel challenged analysts who play with gigabytes of data to find simpler, graphical ways to tell a story to decision makers who may care far less than the analyst about the complex data behind the visualization.
4. To that end, this year’s conference appears to have added a good bit more audio/visual support to panels that in years past were driven mostly by panelist discussion. This enhancement provides immediate examples of good data visualization in conjunction with many of the panels, as well as a richer experience for the attendees. The Data Visualization panel certainly took advantage of this feature, but so did many other panels on day one. Indications are that this trend should continue on day two. For a conference based upon analytics and visualization, it seems a bit surprising that this is the first year A/V elements have been so widely used. The enhancement is a welcome addition to the conference’s presentation approach.
5. Both the casual observer and the most dedicated analyst can often make the mistake of seeing certainty behind what are actually models making approximations with errors. This year the conference debuted a panel entitled True Performance and the Science of Randomness that attempted to rein in such certainty and encouraged listeners to think probabilistically. Panelists like blackjack legend Jeff Ma and poker savant Nate Silver spoke of how the probabilistic nature of card games was the perfect analogy for how analysts and executives should look at sports analytics models. These models will sometimes fail to deliver the desired result, as they inevitably have flaws, but those who properly manage the risk inherent in a model’s forecasts should still come out ahead in the long term of those not using analytics in their decision-making processes. Ma pointed out that the ultimate confidence in such risk mitigation comes in dusting oneself off after a particularly poor outcome and then going back and using the model to inform the next play, whether one is down $100,000 in a game of cards or has just seen an investment in a player go horribly wrong.
6. The final panel of the day on XY Data also brought about one of the more poignant instructions for analysts: innovation doesn’t require a full data set, and you will hamstring yourself if you wait for a perfect one. Panelist Harry Pavlidis contrasted his experience of using PitchFX data, which is widely available, with the more limited HitFX data he was able to access. While the public release of HitFX data only contained a month’s worth of games, it was more than enough for Pavlidis and others to demonstrate the value of advanced batting visualizations and analytics. This demonstrated value drove further studies within Sportvision and Major League Baseball teams that had access to the full data sets, and those subsequent studies would never have taken place without the initial demonstration and the thought-provoking conclusions it provided. Sometimes we data analysts don’t realize just how spoiled we are in today’s database-friendly, Google-driven instant information world, and we should instead recall the simple-yet-effective statistics derived by Bill James, his calculator, and the best publicly available data at the time. James listened to Voltaire, and didn’t let the wait for the perfect data set get in the way of the good one. Analysts today would do well to remember Pavlidis’ more recent example of Voltaire’s adage and perform the best analyses we can with the data we’ve got today, which will then justify getting our hands on even more data tomorrow.
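To make the fifth point above concrete, here is a minimal Python sketch of the long-run argument made by the randomness panel. It is not something shown at the conference: the 52% win probability, the even-money payout, the run lengths, and the function name `losing_fraction` are all illustrative assumptions of mine. The sketch simulates many short and long runs of bets made with a small edge and compares how often each kind of run ends in the red.

```python
import random


def losing_fraction(win_prob, bets_per_run, runs, rng):
    """Return the fraction of simulated runs that end with a net loss.

    Each bet is even money: +1 unit on a win, -1 unit on a loss.
    """
    losing = 0
    for _ in range(runs):
        net = 0
        for _ in range(bets_per_run):
            net += 1 if rng.random() < win_prob else -1
        if net < 0:
            losing += 1
    return losing / runs


# Hypothetical numbers chosen only for illustration.
WIN_PROB = 0.52              # a small edge over a coin flip
rng = random.Random(2013)    # fixed seed so the run is repeatable

short_run = losing_fraction(WIN_PROB, bets_per_run=20, runs=1000, rng=rng)
long_run = losing_fraction(WIN_PROB, bets_per_run=2000, runs=1000, rng=rng)

print(f"Share of 20-bet runs ending in a net loss:   {short_run:.1%}")
print(f"Share of 2000-bet runs ending in a net loss: {long_run:.1%}")
```

Under these assumptions, a sizable share of the 20-bet runs should end with a net loss while far fewer of the 2,000-bet runs do. That is the sense in which a sound model can fail on any given play and still come out ahead over time.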
As was pointed out earlier in this post, the items above are only a small sampling of the experiences I had from day one of the conference and represent less than one-fifth of the available options over the first day. I’d encourage readers to seek out other attendees’ coverage of the event if they hope to get a complete view of all that the conference had to offer on Friday.
The conference’s second and final day is on Saturday. Key panels include ones on injury analytics, big data lessons for sports, ESPN’s use of analytics in telling sports stories, and soccer analytics. I’ll also be spending some time at the research paper competition, where papers on whether or not to crash the boards in basketball and on the effects of field vision in soccer are of primary interest to me. I’ll return early next week with my thoughts on day two of the conference as well as an overall wrap-up. Until then, be sure to follow the MIT Sloan Sports Analytics Conference Twitter account to get the latest news, views, and insights from day two.
Editor’s note: The views expressed in each post are those of the author(s) only and not those of the conference organizing team or blog sponsor.