Thursday, October 16, 2008

The limits of baseball statistics

I love baseball, I love baseball writing and I love baseball statistics, so I’ve been mulling a subject that has bothered me for a long time: the assault on some baseball terminology as propagated by Joe Sheehan at Baseball Prospectus.

Baseball Prospectus has revolutionized baseball’s record-keeping over the last few years — or at least popularized a long-simmering underground revolution. Put simply, baseball’s most cherished statistics — batting average, runs batted in, pitcher wins and losses — tell us a lot more about what happened in a game than they do about a player’s actual contribution. It sounds tricky, but the best way to look at it is that the statistics that Baseball Prospectus compiles from games have a much better track record of predicting what will happen in the future (hence, “Prospectus”) for any given player than the “traditional” stats, which are measures of things that are often beyond a player’s control. That is, the total number of runs batted in that a player will accrue during the season relies heavily on the quality of that player’s teammates, and a pitcher may win a game in which he gave up 10 runs and lose one in which he gave up 1. It doesn’t even take a baseball fan to divine the dubious value of such a statistic.

As the BP crew has grown in stature and number over the past few years — primarily since the publication of the 2002 book Moneyball, which highlighted the “newfangled” methods — it has been under nearly constant attack from baseball lifers and “purists,” who argue, basically, that the number-crunchers are a bunch of dweebs who long to make passionate love to their computers. It was unfair even before the BP numbers turned out, on a macro level, to help teams to such a degree that it’s pretty much accepted that they were right; the criticism from the old guard now is fairly passive-aggressive and limited to veteran announcers and writers who make claims that the “stats don’t always tell the story” and that there are intangibles involved with winning baseball games. The numbers guys don’t deal with intangibles. If it can’t be measured, it is not important.

This dichotomy rears its ugly head, as it were, every October during the baseball playoffs. Inevitably an announcer will make a comment that a team’s “veteran leadership” will prove decisive, or that their “heart” will lead them to victory. Just as inevitably, Sheehan will write a column excoriating the mouthpiece. Here is an excerpt from this year’s column:
Post-season baseball is just baseball with more media credentials and fewer games between flights. Pressure? There may be more, but is it any more than that faced when you're trying to get drafted? Make a team? Win a playoff spot? Does this week really feel more pressure-packed for the Brewers or White Sox than last week, every game a must-win game, did?

The stock storylines don't add anything to our enjoyment of the game. Whether it's "post-season experience" or "veteran leadership" or "pitching and defense" or "small ball," all these attempts to fit the postseason into boxes limit our knowledge rather than expand it. If we're going to break down these games, and figure out why players do well and poorly, why teams win and lose, let's wipe the slate clean and focus on what's happening on the field.
After years of struggling to figure out exactly what my problem is with this line of reasoning, I think I’ve finally found it: it renders words meaningless. Sheehan’s problem is not that these terms are misapplied but that they are applied at all. This gets to the heart of what Baseball Prospectus is all about: predicting the next set of numbers. There is one set of numbers, a game happens, and then there is a new set of numbers, both for the game itself and one that incorporates all previous games. The numbers do a fairly good job of predicting what the results will be on a macro level, but as Sheehan notes above in reference to the playoffs, post-season baseball is fundamentally no different than regular-season baseball; that is to say, and I’m sort of quoting from memory from hundreds of other articles that he’s written, there is nothing about post-season baseball that makes the numbers any less capricious than they are in May. That is to say, October baseball is subject to the same forces as any other game, with respect to creating new sets of data. You can predict what might happen, and be correct a good percentage of the time, but the game itself — the number of outs, the rules, etc. — is no different in the playoffs than it is in September, in that it’s damn near impossible to predict anything with certainty. The otherwise dogshit Cardinals won the World Series two years ago due to a strong October run. Here’s what Joe wrote to crown them:
Fans, and the less-critical corners of the media, are welcome to embrace the Cardinals and create storylines about raising their level of play and coming up big when it counted and grit and guts and what have you. It might ring more true if it wasn’t the standard storyline for every single team that wins a championship: they’re better people than the guys who lost.
I think he means “better baseball players,” but that’s not my point. Look at the sentence fragment that talks about how writers will “create storylines about raising their level of play and coming up big when it counted and grit and guts and what have you.”

Do you notice anything wrong?

If you do, awesome.

If you don’t, let’s start from the beginning. Of baseball. Baseball is a human construct. Or at least we assume it is (ha!), not knowing its precise origins. (The Cooperstown moment is a myth, but one that will do. Like a lot of history.) But let’s just be clear: there’s nothing inherently special about baseball any more than there is anything inherently special about anything: any meaning it has is what we give it. The pre-season, 162-game schedule and post-season are completely arbitrary, save for the meaning we give them. The “championship of baseball” is a construct that, like the sport itself, has no inherent meaning whatsoever. I suspect Joe would agree with me on this, and it’s why the numbers don’t play any differently in the post-season than they do in the regular season. The numbers don’t know it’s the playoffs.

But the numbers don’t play the game.

This is an incredibly important distinction that has been made many times, by many people, the only difference between them and myself being that they are usually trying to discredit BP’s stat-heavy mission. I am doing no such thing. I love the numbers. I play in a fantasy baseball league that is entirely situation-neutral numbers heavy — that is, the numbers which are BP’s bread-and-butter — and wouldn’t trade the numbers for anything. But there’s a reason that the numbers can only predict what will happen in a given game, series, or season, I dunno, 60 percent of the time (to randomly choose a fairly generous number) — the game is played by people. Or, as Billy Beane, master of the numbers, said in Moneyball, “My shit doesn’t work in the playoffs.” People play the game, and sometimes the favorites win, and sometimes they don’t, like the Cardinals in 2006. And the people, unlike the numbers, know it’s the championship. While Joe is perfectly fine with making his own, completely subjective value judgments on how much “pressure” playing in the playoffs actually brings (despite his distaste for subjectivity, remember: “There may be more, but is it any more than that faced when you're trying to get drafted? Make a team? Win a playoff spot?”), he a) intentionally overlooks the fact that the World Series is, by acclamation if not definition, the most important baseball played each year and thus likely subject to the most pressure; and b) follows it up with, “If we're going to break down these games, and figure out why players do well and poorly, why teams win and lose, let's wipe the slate clean and focus on what's happening on the field,” which has nothing to do with his anti-“veteran leadership” et al. screed. When people are talking about “postseason experience” and “veteran leadership,” this is exactly what they are trying to do.

The numbers, with their gap in accuracy between predicted results and actual results, don’t do the trick. Observation closes the gap. In the Cardinals/Tigers series, Sheehan talked about how the Cardinals got lucky that the Tigers made so many errors, and that the Tigers lost the title more than the Cardinals won it. This is likely because the Tigers made such a shockingly high number of errors (seven, I believe) in a short series, and errors are largely unpredictable, so the random sequence of events — the errors — tilted the series toward the Cardinals. Then there is all this talk of randomness and capriciousness, which crops up every year, viz:
I keep coming back to the central theme of any baseball postseason. The champion isn’t necessarily the best team, but it is almost always the team that plays the best in the short series of October. The Rays aren’t getting "lucky" in any sense other than they’re playing well when playing well has some excellent rewards. The Red Sox aren’t getting "unlucky," other than that they’re playing poorly at the same time. The Rays are playing better baseball, and thanks to that, they’re one win away from something that would have seemed preposterous to all but one man and his trusty CPU seven months ago.
Just for a quick side-trip, let’s look up “champion” in Webster’s. It’ll be important:
2. One who by defeating all rivals, has obtained an acknowledged supremacy in any branch of athletics or game of skill, and is ready to contend with any rival; as, the champion of England.
Getting back to the Tigers/Cardinals, the question Joe posed is not how the Tigers lost but why they lost, for the how is obvious — it was the errors. Neither Joe’s statistics nor his observations begin to provide the why he seeks. Isn’t that something? In fact, the only two sources of why are the injury report and the work of writers, who try to use the tools at their disposal (words) to describe why what happens, you know, happens. Words like “heart” and “passion” are perfectly applicable in baseball because if they were not, they would cease to exist. They would be meaningless. As a quick exercise, think about your day right now for one second. How many things are going through your mind? Now imagine a sport that takes, at the least, 18 people to complete one game. How many processes, spoken or unspoken, would contribute to the outcome? It would have to be infinite, right?

I think it’s time that Joe and the other hardcore number-crunchers realize that we have created baseball, but the numbers merely describe the numbers, and nothing else. The words we use have meaning, so when we call a team a “champion,” they are the best team because we say they are. Everyone knew what they were playing for when the season began, and only one team achieved it. It happens because of great players, veteran leadership, tactical decisions, experience, and features from across the spectrum of what it means to be human, some of which are quantifiable, some of which are not. Sometimes the words are wrong (Your best bet would be to ask for examples of veteran leadership). But sometimes the numbers are wrong, too. We’re trying to describe what makes our champions our champions. If the “champion” is merely a construct and doesn’t mean anything, you shouldn’t care. If it does mean something, then you’ve admitted defeat. The words don’t predict or describe as well or as precisely as the numbers, but that doesn’t mean they’re less important. Baseball is one of the most dynamic games ever created, but it's not one one-hundredth as dynamic as the human brain and human emotion. Champions are champions for a reason; we made up the word the same way we made up the game. Let us tell the stories of why the champions became who they are. We'll use the numbers and use words. It's an imperfect exercise. But we're trying.

UPDATE: Wow. This now exists. It's all there.

Also: Follow-up emails for those that are interested.


cannatar said...

I think we all agree that the team that wins the 7-game series played better during those 7 games (unless the difference was a blown call or a ball hitting a pebble, etc). They hit better, pitched better, and/or fielded better. Sheehan's position is that the outcome of the 7-game series doesn't tell us anything more than what happened in those 7 games. Many writers/announcers make the assumption that the winner of the series is inherently superior in some manner (talent, heart, clutchness, etc.). I think you can still "tell the stories" without assuming that there's something inherently superior about the winners - tell the story of the plays, the athleticism, the hustle, the wicked sliders, the blazing fastballs, the perfect bunts, the pitch selection, the many battles between pitchers and batters. There are a lot of stories to tell about every baseball game if you're a skilled observer and a good writer.

Bryan said...

I agree, but my beef is that he basically has argued for years that such terms are explicitly off limits, when they, as traits, do in fact exist. The traits being misapplied does not mean they do not exist.

The winners still won the championship. Something made them win it. The numbers are on one side of it. If the other side wins, they could have won due to a large number of factors. Not all those factors are measurable by numbers and observation. The point of these terms, the point of words, is to try and describe why this team won this particular series. That a lot of people do it poorly does not mean it cannot be done, or is not worth trying. And if there are terms to describe why one team won the seven game series on which we place the most importance, we should use them. That one team may not have anything that makes them inherently superior, but they won for a reason. Humans play games. Humans are susceptible to things such as veteran influence, heart, and passion. If those are the right terms in our divination, they deserve to be used. That is why the words exist.

Bryan said...

Basically I don't feel like the numbers guys have a monopoly on baseball journalism, just like the old-school guys don't, and the lecturing doesn't advance his cause in any way. Not that what I say matters. But I'd like to believe that there's a way to meld numbers and prose without constantly picking on what everyone else is doing, as it seems like this was the numbers guys' problem with the baseball establishment when they broke in. It's just mean, basically.

Though anyone wishing to speak ill of Chip Caray forever has the floor in mine home.

Daniel said...

Very well written - I appreciate your viewpoint. A couple of points I'd like to make and I'll try not to ramble.

1. I think that part of the reason guys like Sheehan get so worked up is because so many of the "old school" baseball media attribute success to generalized, cliched intangibles. Someone like Chip Caray may attribute Ortiz's homerun to a mysterious "clutch" ability when actually it was mostly due to the fact that Ortiz is statistically very good at hitting homeruns, relatively speaking.

2. The problem I have with some of the more intense stat guys (like Sheehan) is that there is this sense that there is no place in baseball for anything like "clutch" or "veteran leadership" or "chemistry." The two things aren't necessarily mutually exclusive. Ortiz has been in this type of situation before, more than the average guy. Maybe Ortiz has a way to calm himself in these pressure-filled situations and that is why he's more likely to hit a homerun when faced with that pressure than other guys. I don't know. Sheehan doesn't know. But I don't think there's anything wrong with using some of these things to try to explain what happened in a baseball game.

I too enjoy statistics and I understand their usefulness in both predicting future events and understanding past events. But I don't agree with the criticism of people who apply analysis outside of the realm of statistics to explain baseball.

As an example, it would be silly to say that Grant Balfour's excellent season was a product of Troy Percival's veteran leadership. But it would be almost as silly to dismiss that leadership as having no effect on Balfour at all. And I have no problem with someone bringing that up during a broadcast or in an article. I do have a problem with people who dismiss it out of hand as being entirely irrelevant.

Bryan said...

Thanks for the response, Daniel. It sounds like we're pretty much in agreement here. Yes, it might seem odd that I'm excoriating Chip Caray and Sheehan (who is more or less excoriating Caray) at the same time, but I feel like his sustained campaign against this sort of nonsense has drifted into an indictment of all things non numbers- and observation-related. It might be nitpicking (and commenter #1, cannatar, would almost certainly say so), but I've refrained from getting too worked up about it because I thought at some point it would simply be understood, and Sheehan would move on. I think the key here is to demand evidence of all this stuff, the same way I'd ask Sheehan if he has any evidence about passion, and he'd probably say "no." These things are not tangible but when we speculate on them it's to attempt to expand our knowledge of the game.

Basically, I'm also mystified at the premise that the "better" team doesn't always win the championship, when in fact the point of the championship is to determine the best team. It's a very Parcells-ian argument, but one that, for me at least, enhances my enjoyment and understanding of the game.

birtelcom said...

"Basically, I'm also mystified at the premise that the 'better' team doesn't always win the championship, when in fact the point of the championship is to determine the best team."

If we define the "better" team as the one who will win any particular game more than 50% of the time (because of any tangible talent advantage the team may have as well as any intangible advantages it may have, such as smarts, tenacity, poise under pressure, etc.) it remains true that that "better" team will lose a best-of-seven game series quite often, if the teams were to play, say, 100 such series. Based on one such best-of-seven series, we simply cannot accurately conclude that the outcome of that series truly tells us anything about which team is "better" in the sense I started with. When we begin to make up stories about which team is "better" in this sense, based solely on the evidence of a single best of seven series, we are merely telling "just-so stories" that sound cute and make superficial sense but have nothing to do with evidence. We might just as well say the sun orbits around a stationary earth because it fits the evidence of what we see in the sky. That makes a satisfying story but you don't publish it in newspapers because it doesn't fit the full evidence that we have.
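The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration under loud assumptions that are not in the comment above: every game is treated as independent, the "better" team's chance of winning a single game is one fixed number, and the 0.55 and 0.60 figures are hypothetical values picked for the example. Home field, pitching matchups and anything else that shifts the odds from game to game are ignored. The function name `series_win_probability` is likewise made up for this sketch.

```python
from math import comb


def series_win_probability(p_game: float, wins_needed: int = 4) -> float:
    """Chance that a team wins a first-to-`wins_needed` series.

    Assumes (hypothetically) that games are independent and that the
    team wins each game with the same probability `p_game`.
    """
    if not 0.0 <= p_game <= 1.0:
        raise ValueError("p_game must be between 0 and 1")
    q_game = 1.0 - p_game
    # The team clinches after losing k games (k = 0..wins_needed - 1).
    # The clinching game is always a win, so the k losses are spread over
    # the first wins_needed - 1 + k games: comb(wins_needed - 1 + k, k) ways.
    return sum(
        comb(wins_needed - 1 + k, k) * p_game**wins_needed * q_game**k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    # Illustrative single-game edges only; none come from real teams.
    for p in (0.50, 0.55, 0.60):
        win = series_win_probability(p)
        print(f"single-game edge {p:.2f}: wins series {win:.1%}, loses {1 - win:.1%}")
```

Under these assumptions a team that wins 55 percent of its individual games takes a best-of-seven about 61 percent of the time, which means it loses roughly 39 of every 100 such series. Passing `wins_needed=3` gives the same calculation for a best-of-five.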

Bryan said...

I don't think the earth/sun analogy is valid. If we knew the earth orbited the sun 60% of the time and it didn't 40% of the time, then I think it might be appropriate. As it is, we're stuck around 100%. Even Manny's not hitting home runs every time up; until the player who hits a home run every time up comes, I'm not going to equate these forms of "evidence." Plus, there are things like gravity and such that explain why the Earth does what it does; these being inanimate objects, we'll know why things happen when they go wrong (or, let's hope not).

This may not be a good example of why the analogy is bad but I think there are dozens of others.

My point is the statistics in baseball are not really "evidence" of anything, given that they are fluid, but even if we accept they are, things happen which counteract the "evidence" which most statistically-minded observers call "randomness." I think "randomness" is a lazy example and until we get this "randomness" under control it's best to avoid outlawing terms like "veteran leadership" et al. In this case, I don't think I'm arguing anything much different than Bill James was in Underestimating the Fog, in which he cautioned against making assumptions based on incomplete knowledge. That is what I'm saying. The better/best argument proceeds along those lines, with my arguing that until we can conclusively determine who the best team is via some other method (and here's where I'd have to disagree that the stats, fluid as they are, are "evidence" of anything inherent), I'll go with the team that wins the final series of the year, which is every team's goal. MLB teams are created for the goal of winning the World Series, so I find it hard to argue that the team that wins the World Series isn't the best team. You say one series can't tell us who is "better" in some fundamental sense, whereas I say that since team X won this particular series, team X was the best team in baseball in a given year, because they won the series that every team tried to win. I'm not talking about what would or could happen, I'm talking about what did happen. I realize that much of this is semantics and metaphysics etc. so we're not actually all that far off from each other, it's just a distinction that I make that, really, keeps baseball fun for me. I'd feel better if there weren't a best-of-5 in the postseason, but I still feel good enough to state my case.