Recently, we sat down with Bill James, the legendary author of The Bill James Historical Baseball Abstract. Read on for insights into Bill’s passion for baseball analytics:
Bill, you incorporated analytics into sports long before it was hip or trendy. In fact, the term data analytics wouldn’t appear for decades. We all know you were an obsessive baseball fan and follower of the game, but what initially drew you to a data-based approach?
A recognition of my limitations. “Data” is incidental, and it isn’t a data-based approach. It’s a fact-based approach. Sportswriting was and is based primarily on authority. The game broadcaster says implicitly that “I have the right to say this because I played the game, and therefore I know what I am talking about.” When I started writing about baseball I was very aware that I had no standing, no authority. I had no license. Therefore, whatever I said had to be something that the reader could check out for himself, for herself, and confirm. Every word was supposed to be something that could independently be validated. Incidentally, that drives one toward data in many cases. But what I was trying to do was to create a fact-based analysis to challenge the authority-based analysis.
When you were first working on The Bill James Baseball Abstract, did you do it out of your passion for the game or as more of a career progression tool (or both)?
Well, I wanted to make a living as a writer. Baseball was what I cared about, what I knew about. I can’t say that it was one or the other.
Your first publication was met with some hesitation from editors and you ended up self-publishing in 1977. Eventually publications like Esquire and Sports Illustrated jumped on board and your career took off. Looking back, would you have done anything differently in the early stages?
I’ve never looked back enough to know. I know I was very fortunate in my early years as a writer. I was prepared to struggle, as a writer, for 10-15 years before I made a living. But it only took about 4 to 5. The first article I ever sent to an editor was accepted and published.
If the same level of data had been available when you were young, would you have first tackled baseball or basketball? Why?
Baseball. I mean... I love basketball, but it is not the same. I dream about baseball every night. You could wake me up at some random moment of the night and ask me what I was dreaming about, and there’s a good chance it would be in some way connected to baseball. Baseball in my subconscious gets mashed up with personal relationships, with politics, with whatever I have been thinking about. I can’t recall that I have ever had a dream about basketball; I probably have, but I have no awareness of it.
The static nature of baseball makes it easier to analyze. In football, basketball, soccer, rugby, hockey, a possession can begin with one team having the ball and end with the other team scoring points. Possession is a fluid concept. Fluid concepts are difficult to analyze. Baseball moves from state to state in a more predictable manner, which makes it more suitable for introductory-level analysis.
The movement towards analytics has gotten more and more attention over time, but we still often hear people conflate “sabermetrics” with “statistics.” How are those two terms different, and is it important to distinguish between the two?
Well, research is a lighted pathway between a question and an answer. All sabermetric discussions begin with a question having nothing at all to do with statistics. The essential question is “Why do teams win?”, but the central question has millions of subsidiary questions. What is the value of speed to an offense? What is the value of velocity to a pitcher? Which of these two players is more valuable to his team? Who was better, Vladimir Guerrero or David Ortiz? If we’re making a trade, who should we trade for? Who will have a longer career, a 22-year-old defensive wizard at shortstop who can’t hit or a 22-year-old left fielder who can hit?
There are no interesting questions about baseball statistics. And the answers to these types of questions cannot be different if examined through the statistics than if examined in some other way. If you get one answer one way and a different answer the other way, then one answer or the other is just wrong.
But in searching for the answers to these questions, we PRODUCE statistics as a by-product. To go back to the beginning, one of the first things I did was to count stolen bases allowed by catchers. In the mid-1970s there were no published stats about stolen bases allowed by different catchers, different pitchers. Pete Palmer had done some work on the issue, but it was spotty. There had been published stats about it in the 1920s, counts of stolen bases against pitchers and against catchers, but that line was dropped because it was too much trouble to keep track of that stuff.
In the mid-1970s Johnny Bench’s defense was commonly believed to be phenomenal and he won the Gold Glove every year, but I remember somebody saying that Steve Yeager of the Dodgers actually threw better than Bench but didn’t get the love from the media because he didn’t hit. So, I thought, “Well, why don’t we just count how many stolen bases there are against Bench, and how many against Yeager?”, not knowing that this had actually been done before. Anyway, when you do those counts, then you are creating a statistic, stolen bases allowed. It’s a by-product of the study, the purpose of which is to compare Bench to Yeager to the rest of the league. So, what I found was, (1) Bench was in fact extremely good at limiting the running game, (2) Yeager was equally good, and (3) nobody else was on the same level. Well, Jim Sundberg came up in ’74; he was on the same level.
Now that sabermetric analysis has become the status quo in professional sports, do you see any limitations in its applicability to certain sports, organizations, or specific aspects of the game being analyzed?
No. I’m sure there are limitations, but you asked me if I see them. No, I don’t see them.
What I often say is that all we have done is to pull a few buckets full of knowledge out of an ocean of ignorance. What we DON’T know is so much larger than what we do know that the limitations are essentially not there. Is there some limit to the size of the ocean? Yes, of course; there is a shoreline somewhere. But when you’re in the middle of the sea it is so far away that it’s not relevant.
Our limitation is our arrogance. To the extent that we think we know things, to the extent that we are convinced we understand things, we lose the ability to “see” what we don’t know, and this limits us. But as long as you accept that the world is billions of times more complicated than your mind, so you can never understand the world or even come close to doing so, then there will always be new things to study.
Does the advent of “big data” and rapidly growing data collection capabilities across sports create bigger differences amongst organizations, or does it lead to more parity?
That’s an innovation and structure question. Structure always tries to level the playing field; innovation always disrupts. Innovation always leads to short-term inequalities. The powers that control the leagues are always trying to restore order, and also the natural processes of competitive imitation spread innovation across the league, so that in the longer term all things that used to be innovations become community possessions.
What do you think might be the next big breakthrough in sports technology?
With DNA testing we should soon be able to identify superstar athletes the moment they are born, don’t you think?
I hope nobody thinks I was serious about that.
What’s your response to those who think analytics has made baseball “boring”?
We have to worry about the positive or negative impact of what we do. It’s a legitimate concern. We have to be careful that we’re not accidentally harming the game.