Sunday, September 8, 2013

The Algorithm

Unfortunately, I must state that I am not publishing the details of the algorithm, for two reasons. First, while it isn't too complex, it is a hundred lines of Python code, so I couldn't share it with you completely without turning this post into a code dump. Second, I think it's more fun to keep it to myself. So while you're going to get the story and the basics, don't expect the (not so) magic formula.

I came up with part of this "algorithm" before last NFL season, but at that point it was all still in the trial phase. I had the idea to strip football down to its very basics (see "The Baseball Analogy"), so I wrote a little metric that gave each team a value. Then, using previous seasons' data, I adjusted the inputs so that the teams' values correlated as highly as possible with Pythagorean wins. You might ask why my metric is any better than Pythagorean wins, then. It all comes down to sample sizes. Pythagorean wins, in the long run, will perfectly align with how good a team is (as will actual wins). However, over a shorter run, such as a single 16-game Broncos season, there can still be quite a lot of error. I wanted my metric to match up with how good teams actually were (hence the large sample: many seasons' worth of Pythagorean wins data), but I also wanted, and managed, a metric far less susceptible to the variance any given NFL team sees over a season. I chose to fit against Pythagorean wins rather than actual wins (even though both are perfect in the long run) because there is so much error in actual wins that even over multiple seasons, the luck won't "even out." Now it was time to see my formula work in practice.
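To give you a flavor without giving anything away, here's a minimal Python sketch of the idea. The Pythagorean wins formula itself is public knowledge (2.37 is the exponent commonly used for the NFL), and the tuning step is just a search over candidate weights until the team values correlate as strongly as possible with Pythagorean wins. The inputs and weights below are placeholders, not mine:

import numpy as np

def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    # Expected wins from points scored and allowed; 2.37 is the
    # exponent commonly used for the NFL.
    ratio = points_for ** exponent
    return games * ratio / (ratio + points_against ** exponent)

def fit_weights(stats, pyth_wins, candidate_weights):
    # stats: (teams x inputs) array of per-team inputs (placeholders here).
    # Try each candidate weight vector and keep the one whose team values
    # correlate most strongly with Pythagorean wins.
    best_w, best_r = None, -1.0
    for w in candidate_weights:
        values = stats @ w                      # one value per team
        r = np.corrcoef(values, pyth_wins)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r

The real thing obviously uses particular inputs and a smarter search, but that's the shape of it: the fit target is long-run Pythagorean wins, not any one season's record.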

I updated this metric each week, and midway through the season I still had some weird outliers: a few teams that hadn't been much better than average, but which my metric ranked among the best in the NFL. In fact, my metric had a 4-3 Broncos team as the best team in the NFL. We all know how that turned out. Maybe more surprisingly, it had a 6-5 Seahawks team in the top 3. Again, we all know how that turned out. And finally, it had a 3-6 Redskins team in the top 5. In case you don't remember how these turned out: none of those teams lost again in the regular season. They did lose in the postseason, but as I alluded to before, the W/L result of an individual game involves so much variance that it isn't worth much consideration. And while the metric did have its failures (for example, a high ranking for the Steelers and a low one for the Colts, though it was pretty obvious throughout the season that the Colts were actually just a bad team that got lucky (no pun intended)), it also made some other reasonably good predictions, such as the late-season success of the Panthers. I was sold on it. All I had to do was fix its biggest failing.

This offseason, I wanted to make a more legitimate ranking of the teams, so I went back to my old metric. The problem was that while the metric could discount the high-variance events that don't even out, it could not counter strength-of-schedule disparities. Accordingly, I wrote a program (this is when my metric grew up and became an algorithm!) that adjusts for the opponent's skill on each play, as sketched below. If the Giants dominated offensively against the Saints, the algorithm takes into account that they ran 62 plays against one of the worst defenses of all time. And with that, I created this blog. This Sunday, Week 1, it begins.
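Again, just to sketch the idea (this is not my actual code, and "value" stands in for whatever per-play measure you like), an opponent adjustment can be as simple as re-centering each play's value by what the opposing defense allows on average:

from collections import defaultdict

def opponent_adjust(plays):
    # plays: list of (offense, defense, value) tuples, where value is
    # whatever per-play measure the metric uses (a placeholder here).
    allowed = defaultdict(list)
    for _, defense, value in plays:
        allowed[defense].append(value)
    # How much each defense gives up on an average play.
    avg_allowed = {d: sum(v) / len(v) for d, v in allowed.items()}

    # Credit each offense with how much it beat that defense's average,
    # so running up numbers on a terrible defense counts for less.
    adjusted = defaultdict(float)
    for offense, defense, value in plays:
        adjusted[offense] += value - avg_allowed[defense]
    return dict(adjusted)

A fuller version would iterate this until the ratings settle down, since a defense's average is itself skewed by the offenses it happened to face, but the one-pass version shows the gist.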
