With the recent hire of Paul DePodesta of Moneyball fame by the Cleveland Browns, I figured it’s time to put into words something I’ve been thinking about for a while now.
Moneyball cannot be translated to the NFL.
Of course analytics do matter in the NFL, and they’ve mattered for quite some time now. Chicago’s Phil Emery and Atlanta’s Thomas Dimitroff are both well known for embracing “next gen stats”. It’s worth noting that Emery has since been fired and Dimitroff is on the hot seat himself. My argument is more that analytics don’t matter nearly as much in football as they do in baseball.
What is Moneyball?
We can’t have a discussion on something until we agree on what exactly it is. When people say “Moneyball” they’re of course referring to the 2003 book by Michael Lewis about how the Oakland Athletics used data to drive their personnel decisions. The general idea is to go beyond the basic statistics (e.g. strikeouts, batting average, ERA) and develop a set of statistical tools that can be used as a primary decision factor for selecting players. This allows you to find very good players and pay them low salaries.
Ever since the A’s found success with their system, the world of sabermetrics has exploded. You now have stats like RAA, WAA, RAR, WAR, oWAR, dWAR, oRAR, RC, AIR, BABIP, OPS, OPS+, and on and on. People have been talking about bringing a system like this to the NFL for a while, but nobody in the NFL has been as successful with it as teams in MLB have.
What’s Different About Football?
In my view there are four main differences between the two sports that make analytics far less valuable in football than in baseball.
- Individual vs. Team Play
- Coaching
- Sample Size
- Salary Cap
Baseball is fundamentally an individual sport. A batter stands at the plate and the pitcher has to throw the ball at him. It’s a one-on-one matchup where nobody else has an impact on what happens other than the two players. Yes, there are a few things like having runners on base that will make each player play differently, but those relationships aren’t numerous and you can often ignore them without losing much information.
Contrast that with football, where any given player’s performance during a particular play is dependent on every other player on the field. The only positions that have true one-on-one matchups are linemen, and even that gets murky a lot of the time. The real problem is that there is heavy dependence between successive plays. A fundamental assumption in much of statistical modeling is the independence of your observations. This assumption holds in baseball much more than in football.
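To give a rough feel for why independence matters, here is a minimal sketch. It assumes a deliberately simplified model in which every pair of observations shares the same correlation `rho`; real play-by-play data is not that tidy, and the `rho` values below are made up purely for illustration. Under that assumption, the average of `n` correlated observations is only as precise as the average of `n / (1 + (n - 1) * rho)` independent ones.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Number of independent observations that carry the same information
    as n observations sharing a common pairwise correlation rho.

    Assumes an equal-correlation model with 0 <= rho <= 1 (a simplification).
    """
    return n / (1 + (n - 1) * rho)


# Hypothetical correlation levels, chosen only to show the trend.
for rho in (0.0, 0.05, 0.2):
    n_eff = effective_sample_size(1000, rho)
    print(f"rho={rho:.2f}: 1000 plays behave like {n_eff:.1f} independent plays")
```

Even a small amount of shared dependence shrinks the usable information dramatically, which is the gist of the problem with treating football plays like plate appearances.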
What do coaches even do in baseball? They set the lineup, tell guys when to bunt or steal, and decide who pitches when. In football coaches orchestrate the entire team from the sideline. They tell every player what to do on every single play. If a coach wants to prioritize stopping the run then he will tell his DE to not sell out on the pass so much. There is no way to tell the difference between that and a player who is trying to rush the passer but just fails.
There are 162 games per year in MLB. There are 16 games per year in the NFL. This pretty much speaks for itself. Combine that with the fact that you’re trying to take into account all of these relationships between players, teams, and coaches and you quickly run into sample size problems. You can’t take everything into account when you only have 16 games to work with per year.
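To put a rough number on the sample size gap, here is a back-of-the-envelope sketch. It assumes every game is an independent coin flip for a true .500 team, which is a simplification (the function name and the .500 default are my own choices for illustration), and computes how far an observed win rate can drift from the true one through luck alone.

```python
import math


def win_rate_standard_error(games: int, true_win_rate: float = 0.5) -> float:
    """Standard error of an observed win rate over `games` independent games.

    Assumes each game is an independent trial with the same win probability.
    """
    return math.sqrt(true_win_rate * (1 - true_win_rate) / games)


nfl_se = win_rate_standard_error(16)    # NFL season length cited above
mlb_se = win_rate_standard_error(162)   # MLB season length cited above

print(f"NFL (16 games):  +/- {nfl_se:.3f}")
print(f"MLB (162 games): +/- {mlb_se:.3f}")
print(f"NFL noise is about {nfl_se / mlb_se:.1f}x larger")
```

Under these assumptions the noise in a single NFL season’s win rate is roughly three times that of an MLB season, and that is before you try to split the credit among players, teammates, and coaches.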
Baseball has no salary cap. The NFL has a salary cap. If you’re a mid/small market team in baseball then it is vitally important that you pay less money for more talent than the big boys. In the NFL everybody has the same amount of money to spend, so getting the most bang for your buck isn’t as important. It’s still very important (in many ways it’s what separates the great teams from the mediocre teams); it’s just not as important as it is in baseball.
So We Should Give Up Using Analytics in Football?
No, but we need a lot more data before we can get to the level of utility that analytics provides in baseball. We’re going to need better data collection technology (likely in the form of RFID chips everywhere) and better modeling methods. We can’t solve the analytics problem in football by adding things up in different ways to come up with new stats to track. We need fundamental advances in statistical modeling methodology. We’re not there yet, but I think we can get there eventually.