This is our old blog. It hasn't been active since 2011. Please see the link above for our current blog or click the logo above to see all of the great data and content on this site.

Archive for May, 2006

How often does the best team win?

31st May 2006

In the 1989 Baseball Abstract --- yes, there was a 1989 Baseball Abstract; I'll bet I am one of no more than ten people on the planet who has it on his bookshelf right now --- Bill James wrote an essay called How Often Does the Best Team Actually Win? Here is a passage from the introduction:

Yes, we know that the luck evens out in a 162-game schedule, but how consistently? Does the best team win the division, in a 162-game schedule, 90% of the time? 75%? How often? Does the best team in baseball win the World Championship nine years in ten, or two? Is it possible for a team which is in reality just average --- a .500 team --- to win its division (and therefore possibly even the World Series) by sheer luck?

Note that he was not asking how often the team with the best record wins the World Series, or how often a team with a .500 record would win. He was asking how often the team that really and truly was the best wins the World Series, and how often a team that was morally a .500 team would win the World Series (most likely lucking into a better-than-.500 record in the process).

Questions like the former can't be answered by looking at real life results, but only because we don't have enough of a sample size. Questions like the latter, though, cannot be answered using real life results even if we live to see a million seasons. We don't know how often the best team wins the World Series or the Super Bowl because we don't know --- we can't know --- who the best team is. Pittsburgh may have been the best team in the NFL last year, or they may have been the 3rd best or the 14th best. We don't know how often a .500 team wins the Super Bowl because we don't know who the .500 teams are.

If you want to know how often the best team wins the title, you have to build a model. In that model, you can create teams whose strengths you know, because you defined them. James did just that, and he concluded that in Major League Baseball, structured as it was in the late 1980s, the best team wins the World Series 29% of the time. The best team in a division wins that division about 53% of the time. The best team in all of baseball misses the playoffs about 29% of the time.

These results seemed to make him a little uneasy. He closed the essay with this:

The belief that in a 162-game schedule the luck will even out is certainly unfounded --- but that unfounded belief may also be essential to the health of the game. Would people lose interest in baseball if they realized that the best team doesn't win nearly half the time? Would it damage the perception of the World Series if people realized that the best team in baseball only emerges with the crown about 30% of the time?

For me, no. It would not damage my interest, and for most of you also, I suspect. I am afraid that for some people, the answer would be the other one. I've learned a lot of surprising things in running these simulations, and I'm happy to have that knowledge....But I don't think it's something I'm going to talk about a whole lot.

I think he's got it backwards. I think it's the stat geeks who are concerned about the best team winning. The rest of the public, in my experience, doesn't give much thought at all to the notion of "the best team," or is content to define the best team to be the one that wins and/or to appreciate the unpredictability for unpredictability's sake. Furthermore, I don't think that, in a 26-team league, 29% is all that low. If the best team in baseball is morally a .600 team, say, then most years there are probably two or three more teams pretty close to that. If a third-best team that is within a few percentage points of the best team happens to win a title because of luck, I don't think anyone considers that a travesty.

In any event, I --- like James --- find the topic fascinating, and have for years been meaning to replicate this study for the NFL. Yesterday's post was not exactly like the James study, but was in some ways similar. And it prompted me to roll up the sleeves and get the simulator built. So I did. And I'm going to spend the next post or five discussing what kinds of things it spits out. Discussion will include, but not be limited to, the following:

  • I'll answer the same questions James did. How often does the best team in football win the Super Bowl? How often does the best team in football fail to make the playoffs? How often does a sub-.500 team win the Super Bowl? It's not clear how the answers will differ from MLB circa 1989. On one hand, baseball plays ten times more games, which gives the luck more of a chance to even out. On the other hand, football simply doesn't have as much luck built into it as baseball does. If the worst team in baseball beats the best team, it barely raises an eyebrow. In football, that almost never happens.
  • I want to examine various playoff configurations and see how much the answers to the above questions change. For example, what if we eliminated the wildcard and simply let the eight division winners play a standard tournament? Would that increase or decrease the chances of the best team winning? It's not clear, not to me anyway. Sometimes the wildcard lets weak teams in, sometimes it lets strong teams in. What if we had four divisions of eight instead of eight divisions of four? How would that change things? What if, as a friend of mine advocates, we have two conferences of 16 teams each and no divisions at all?
  • I also want to briefly investigate questions along the lines of, how often does a sub-.500 team win its division? Unlike the first bullet, here I'm not talking about teams that were morally sub-.500. I'm talking about teams whose record was under .500. Similarly, we can investigate questions like, how often should we see an undefeated team? How often should we see a winless team? What are the chances of a four-way tie in a division?
  • James didn't do this, but I think it will be fun to take a look at some specific teams in specific years. In the previous post, I talked about what would happen if we switched the 2004 Colts and Falcons prior to the playoffs. Now I'll talk about what would have happened if we had switched them before the season started. This will require an extra step (i.e. leap of faith) which I'll explain when the time comes. As another example, I talked last week about the Chargers having a rough schedule last year. What if they had played the Panthers' schedule last year and the Panthers had played theirs?

Many of these ideas were touched upon in the comments to yesterday's post. If you have more suggestions of questions to ponder, bung them down in the comments.

Posted in Statgeekery | 27 Comments »

Conference imbalance and playoff fairness

30th May 2006

Last week I posted some quick lists of bad teams that made the playoffs and good teams that didn't.

In the comments of the former appeared this:

2004 really was a bad year for the NFC! I can see at least 4 teams on the list [of below average teams that made the playoffs], and the Falcons are 16th, despite IMO being clearly the second best team in the conference that season.

Four of the six playoff teams in the NFC that year were indeed below average according to the simple rating system. In fact, according to that system, 14 of the 16 teams in the NFC were below average. The average rating of all AFC teams was +7.8, which means the average rating of all NFC teams was -7.8, which means that an average AFC team was 16 points better than an average NFC team in 2004. I'll do a full post (or more) on conference imbalance someday, but for now I'll just say that that differential is the highest since the merger. The NFL was an absurdly imbalanced league in 2004.

This is probably the place to remind everyone, self included, that the ratings are just rough estimates and we should be attaching some mental error bars to them. In particular here, I think the Eagles' rating is likely an understatement of their strength because they mailed in their last three games. This would have a ripple effect on the rest of the NFC, which might mean that, really and truly, only 11 of 16 teams were below average instead of the 14 we're estimating above. Or something like that. Anyway, it doesn't change the fact that the NFL was an absurdly imbalanced league in 2004.

Consider the Colts and the Falcons, for example. In order to reach the Super Bowl, the Colts would have had to first beat a Denver team that was arguably better than any team in the NFC. Then they would have had to beat a 14-2 team and a 15-1 team --- both of which compiled their records against tougher-than-average schedules, I might add --- on the road. That's rough. All the Falcons had to do was win two games, one of them against a below-average opponent. If you believe that teams who accomplished more in the regular season should be rewarded with an easier postseason road, something which is implicitly assumed in the postseason structure of every sports league I'm aware of, then you have to consider this unfair.

I decided to investigate just how unfair it was. The basic idea is this: estimate the Colts' chances of reaching and/or winning the Super Bowl, and compare it to what their chances would have been had they been in the other bracket.

The first thing we need to do is find a formula that relates two teams' ratings to their chances of winning a game between the two of them. I'll skip the details, but here is the formula I used:

Home team prob. of winning =~ 1 / (1 + e^(-.438 - .0826*diff))

where diff is the home team's rating minus the visiting team's rating. If the home team is 7 points better than the road team, this model gives the home team a 73% chance of winning. If the home team is 7 points worse, this model gives the home team a 46% chance of winning. I wouldn't go to war with any bookies using this alone, but it should serve our purpose here, which is to give us the rough estimates needed to simulate the playoff tournament a few bazillion times. That will then give us a rough estimate of each team's probability of winning the Super Bowl.
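For anyone who wants to play along at home, here is a minimal Python sketch of the formula above. It is only an illustration of the arithmetic, not the program that produced the numbers in this post, and the function name `home_win_prob` is made up for the example.

```python
import math

def home_win_prob(diff):
    """Home team's estimated chance of winning.

    diff is the home team's rating minus the visiting team's rating,
    and the two constants are the ones quoted in the formula above.
    """
    return 1.0 / (1.0 + math.exp(-0.438 - 0.0826 * diff))

# Print the estimate for a home favorite, an even matchup, and a home underdog.
for diff in (7, 0, -7):
    print(f"diff {diff:+d}: {home_win_prob(diff):.3f}")
```

The values for +7 and -7 should land close to the 73% and 46% quoted above. Note that the constant term gives the home team an edge even when the two ratings are equal.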

Here were each team's estimated chances of reaching and winning the Super Bowl at the beginning of the playoffs in 2004:

          ReachSB%  WinSB%
1. pit      35.4     22.1
2. nwe      35.7     24.6
3. ind      13.5      9.2
4. sdg       9.2      5.9
5. nyj       3.4      1.8
6. den       2.8      1.8

1. phi      56.3     22.4
2. atl      19.5      5.4
3. gnb      11.5      3.5
4. sea       6.4      1.6
5. stl       2.4      0.6
6. min       3.9      1.1
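A rough sketch of how a simulation like this might be put together is below. It is not the code behind the table. The ratings in it are made-up placeholders (the actual 2004 ratings are not listed in this post), the bracket follows the usual six-seed format (seeds 1 and 2 get byes, the better seed hosts, the top seed draws the worst surviving seed), and dropping the home-field term for the neutral-site Super Bowl is an assumption.

```python
import math
import random
from collections import Counter

HOME_EDGE = 0.438   # constant term from the win-probability formula
SLOPE = 0.0826      # per-point coefficient from the same formula

def win_prob(diff, neutral=False):
    """Chance the 'home' side wins; diff = its rating minus the opponent's."""
    edge = 0.0 if neutral else HOME_EDGE  # assumption: no home edge at a neutral site
    return 1.0 / (1.0 + math.exp(-edge - SLOPE * diff))

def play(home, away, ratings, rng):
    """Play one game between two seeds of one conference; return the winning seed."""
    return home if rng.random() < win_prob(ratings[home] - ratings[away]) else away

def conference_champ(ratings, rng):
    """Six-seed bracket: seeds 1-2 get byes and the better seed always hosts."""
    better, worse = sorted([play(3, 6, ratings, rng), play(4, 5, ratings, rng)])
    semi_a = play(1, worse, ratings, rng)   # top seed draws the worst surviving seed
    semi_b = play(2, better, ratings, rng)
    home, away = sorted([semi_a, semi_b])
    return play(home, away, ratings, rng)

def simulate(afc, nfc, trials=20000, seed=0):
    """Return (reach, win) dicts of percentages keyed by (conference, seed)."""
    rng = random.Random(seed)
    reach, win = Counter(), Counter()
    for _ in range(trials):
        a, n = conference_champ(afc, rng), conference_champ(nfc, rng)
        reach[("AFC", a)] += 1
        reach[("NFC", n)] += 1
        afc_wins = rng.random() < win_prob(afc[a] - nfc[n], neutral=True)
        win[("AFC", a) if afc_wins else ("NFC", n)] += 1
    to_pct = lambda counts: {k: 100.0 * v / trials for k, v in counts.items()}
    return to_pct(reach), to_pct(win)

# Hypothetical ratings by seed, NOT the real 2004 numbers.
afc = {1: 10.0, 2: 11.0, 3: 9.0, 4: 7.0, 5: 4.0, 6: 6.0}
nfc = {1: 6.0, 2: -2.0, 3: 0.0, 4: -3.0, 5: -6.0, 6: -2.0}

reach, win = simulate(afc, nfc)
for key in sorted(reach):
    print(key, f"reach {reach[key]:.1f}%", f"win {win.get(key, 0.0):.1f}%")
```

Swapping two teams between conferences, as in the experiment that follows, amounts to swapping two entries between the `afc` and `nfc` dictionaries and running it again.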

Anyway, let's see what happens if you switch the Colts and Falcons, giving the Colts the two seed in the NFC and the Falcons the three seed in the AFC:

          ReachSB%  WinSB%
1. pit      37.7     19.8
2. nwe      41.7     24.9
3. atl       2.0      0.6
4. sdg       9.9      5.3
5. nyj       3.8      1.8
6. den       5.0      2.6

1. phi      41.2     17.1
2. ind      47.1     24.8
3. gnb       5.2      1.6
4. sea       3.3      0.8
5. stl       1.2      0.2
6. min       2.1      0.5

The Colts' chances of reaching the Super Bowl would have been about three to four times greater had they been in the other conference. The Falcons' chances would have decreased by a factor of 10 had they been in the other conference. The Bills missed the playoffs in the AFC. Had they been the #6 seed in the NFC, they would have had a 15% chance of getting to the Super Bowl.

Finally, this comes from the comments of the "best non-playoff team" post:

Don’t forget, that 1991 San Francisco team lost to the Falcons on a Hail Mary pass (Tolliver to Haynes, I believe for 44 yards). If that pass is incomplete, SF goes 11-5 and wins the division, NO is a wildcard team and Atlanta misses the playoffs entirely.

Had things played out that way, San Francisco would have had an estimated 16% chance at reaching the Super Bowl and a 10% chance of winning it, and those numbers would be quite a bit higher had the 1991 Redskins not been such a juggernaut.

Yes, yes, I know. That's the way the ball bounces, that's why they play the games, great teams will find a way to overcome bad breaks, and so on and so forth. Anyone with the urge to post, "the Patriots won the 2004 title on the field and that's all that matters" will not be telling me anything I don't know. I get that. I am aware that it's meaningless to say that being in the AFC cost Indianapolis .156 Super Bowl titles in 2004.

For some reason, it's something I wanted to know anyway.

Posted in History, Statgeekery | 9 Comments »

Tough schedules, lucky teams, and Simpson’s paradox

29th May 2006

These two posts gave me occasion to whip up a program that tells me what every team's record was against playoff teams and against non-playoff teams. Lots of interesting tidbits in there.

For example, the 1998 Cardinals were one of 16 teams since the merger to make the playoffs without having any wins over playoff teams. They were 0-2 against playoff teams. But perhaps more shameful are the 2004 Vikings, who made the playoffs despite going 0-5 against playoff teams. Like the Cardinals, though, they somehow managed to win an actual playoff game.

The 1998 Cardinals and the 1998 Saints provide a very nice example of Simpson's paradox. Check out this table:

                       Cards   Saints   Better Record
vs playoff teams        0-2     1-9     Saints
vs non-playoff teams    9-5     5-1     Saints
Total                   9-7     6-10    Cards

The Saints had a better record (percentage-wise) than the Cardinals against playoff teams. The Saints had a better record than the Cardinals against non-playoff teams. But the Cardinals had an overall record that was three games better. In case you're curious, there have been 42 instances like this since 1970. Where else are you going to get info like that?
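If you want to check a pair of teams for this pattern yourself, a small Python sketch might look like the following. The records are the ones from the table above. The function names are made up for the example, and ties are ignored for simplicity.

```python
def win_pct(record):
    """Winning percentage for a (wins, losses) pair; ties are ignored."""
    wins, losses = record
    return wins / (wins + losses)

def total(splits):
    """Add up the (wins, losses) pairs across all splits."""
    return tuple(sum(column) for column in zip(*splits.values()))

def is_simpson_pair(a, b):
    """True if team b beats team a in every split yet trails a overall."""
    better_in_each = all(win_pct(b[k]) > win_pct(a[k]) for k in a)
    worse_overall = win_pct(total(b)) < win_pct(total(a))
    return better_in_each and worse_overall

cards = {"playoff": (0, 2), "non_playoff": (9, 5)}
saints = {"playoff": (1, 9), "non_playoff": (5, 1)}

print(total(cards), total(saints))     # overall records
print(is_simpson_pair(cards, saints))  # does the paradox hold for this pair?
```

The trick is in the weights: the Saints played ten of their sixteen games against playoff teams, while the Cardinals played only two.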

More trivia:

  • The 1999 Jaguars were winless (0-2) against playoff teams and undefeated (14-0) against non-playoff teams. They are the only team to hold that distinction.
  • The 1989 Browns were 5-1 against playoff teams and 4-5 against non-playoff teams.
  • Ten teams have been undefeated against playoff teams, with the 2003 Patriots having the most wins (5) among those teams.
  • The 1993 Bucs are the only team to play against 11 playoff teams. Four years later, the 1997 Bucs became one of only two teams to make the playoffs despite playing against 10 playoff teams. The 1994 Lions were the other.
  • With the schedule set up as it currently is, every team is guaranteed to play at least two playoff teams each year. Back in the old days, four teams played against just one: the 1970 Cowboys, the 1974 Steelers, the 1976 Rams, and the 1987 Redskins. I don't know enough about historical scheduling practices to know if it was ever theoretically possible to get through a season without playing a playoff team. In any case, no team ever did. [CORRECTION: the 1976 Rams, 1975 Vikings, and 1972 Dolphins did.]
  • The 1997 Packers and 1998 Jets were 7-1 against playoff teams.
  • Two Super Bowl winners, the 1974 Steelers and 1999 Rams, did not beat a playoff team during the regular season. [CORRECTION: add the 1972 Dolphins to the list.]
  • Among Super Bowl winners, the worst record against non-playoff teams belongs to the 1988 San Francisco 49ers, who were 7-4.

Posted in History, Statgeekery | 11 Comments »

The worst playoff team in history

26th May 2006

This is a companion post to yesterday's best non-playoff team in history.

The worst playoff team in history is, as some of you guessed, the 1998 Arizona Cardinals. They played the weakest schedule in the league and were still outscored by 53 points. They played only one team with a winning record (Dallas, whom they did play twice, losing both times). According to the simple power ranking scheme, the Cardinals were the 26th-best team (out of 30) in the league in 1998. Here are their nine wins:

  • Week 3 - they beat the 3-13 Eagles
  • Week 4 - they beat the 4-12 Rams
  • Week 6 - they beat the 4-12 Bears
  • Week 9 - they beat the 5-11 Lions
  • Week 10 - they beat the 6-10 Redskins
  • Week 12 - Redskins again
  • Week 15 - Eagles again
  • Week 16 - they beat the 6-10 Saints
  • Week 17 - they beat the 5-11 Chargers

Cardinal fans might point out that, by definition, you can't be the worst playoff team in history if you actually won a playoff game. And this team somehow did manage to do that, beating Dallas in Dallas before losing in Minnesota. I don't care. This team was so bad that they take the title anyway.

I'll close with a list of all playoff teams since the merger that the simple rating system says were below average. For obvious reasons, there are a lot of 1982 teams here. For reasons that are not obvious (to me, anyway), the Vikings appear eight times on the list.

TM YR Rating
ari 1998 -7.4
ram 2004 -6.0
atl 1978 -4.6
cle 1982 -3.7
pit 1989 -3.7
chi 1977 -3.6
atl 1982 -3.6
sea 2004 -2.9
tam 1979 -2.8
den 1983 -2.8
stl 1982 -2.7
nwe 1982 -2.7
hou 1989 -2.5
min 1987 -2.5
nyj 1991 -2.4
atl 2004 -2.2
chi 1994 -1.9
phi 1995 -1.7
min 2004 -1.7
ind 1996 -1.6
min 1977 -1.6
jax 1996 -1.5
nyj 1986 -1.4
ind 1995 -1.3
mia 1970 -1.3
nor 1990 -1.3
cle 1985 -1.3
cin 1990 -1.1
tam 2005 -1.0
car 2003 -0.9
buf 1995 -0.9
sea 1988 -0.7
rai 1993 -0.7
ram 1979 -0.6
dal 2003 -0.5
hou 1978 -0.4
det 1993 -0.3
min 1980 -0.3
min 1978 -0.2
min 1996 -0.1
min 1993 -0.1
min 1997 -0.1
pit 1983 -0.1
chi 1979 -0.0

Posted in General, History | 16 Comments »

The best non-playoff team in history

25th May 2006

It might just be the 2005 San Diego Chargers.

If you go by the basic power rating system, the Chargers were the third best team in the NFL last year with a rating of +9.9, which means that, if you adjust for the schedule they played, they were about 9.9 points better than an average team. According to that metric, the Chargers were the third-best team since the merger to be watching the postseason on TV:

TM YR Rating
sfo 1991 10.4
cin 1976 10.0
sdg 2005 9.9
ram 1970 9.3
mia 1975 8.9
buf 2004 8.1
mia 1977 7.4
stl 1970 7.3
den 1976 7.2
kan 2005 7.0
buf 1975 6.6
sea 1986 6.5
cin 1989 6.5
kan 1999 6.4
ram 1971 6.3
oak 1999 6.2
hou 1975 6.1
min 1986 6.1
bal 2004 6.1
kan 2002 6.1
mia 2002 6.1
hou 1977 6.0
nwe 1980 6.0

Using this rating system to compare across years requires a bit of interpretation. This doesn't say the 2005 Chargers were a better team (or a worse team) than, say, the 1977 Oilers. It says that the 2005 Chargers were better, relative to their league, than the 1977 Oilers were, relative to theirs. It seems to me that's an appropriate metric by which to judge meaningless trivia like "best non-playoff team in history."

If you click on the 1991 49ers and the 1976 Bengals, you'll see that each of them has a pretty strong claim to this title as well. The 49ers were third in the NFL in points scored and fourth in points allowed. The Bengals ranked sixth and seventh in those two categories. The Bengals were 10-4, with all four losses coming against playoff teams, including two to the eventual Super Bowl champion Steelers.

But I think the simple rating system I'm using actually understates the Chargers' strength. If memory serves (correct me if I'm incorrect), the loss to Denver in week 17 was essentially an exhibition game, as both teams' postseason destinations were already sealed. Further, the Chargers' loss to Philadelphia looks like a bad loss to the computer, but at the time, the Eagles still had Owens and McNabb and were among the best teams in the NFC. Likewise, there is little shame in their loss to the Dolphins, who were in the middle of a six-game win streak when they beat San Diego.

On the flip side, who is the worst playoff team of all time? Unlike the above, where you could reasonably argue for a few different teams, this one is not debatable. I knew who it was before running the numbers, but the numbers confirmed it. I'll write about them in a future post.

Posted in General | 16 Comments »

Why is 3rd-and-2 a passing down?

24th May 2006

Let's start with just theory. No data.

On 3rd-and-anything, shouldn't the leaguewide success rate on runs and the success rate on passes be roughly equal? On, say, 3rd-and-3, if running plays succeeded 70% of the time and passes succeeded only 53% of the time (these are made-up numbers), then wouldn't teams start to run the ball more often on 3rd-and-3? Then the fact that teams were running more than they used to in that situation would cause defenses to expect more runs and fewer passes, which would cause them to gear their defenses more toward stopping the run, which would cause the success rate on runs to go down and the success rate on passes to go up.

How far would the success rate on running plays go down? It seems to me it would go down --- and the success rate on passes would go up --- just until the point where they are the same. If the success rates on the two kinds of plays are the same, then offenses don't have any incentive to shift their play-calling mixture. Thus, defenses don't have any reason to shift their expectations, and the success rates should stay the same. Equilibrium.

And we should stay in that equilibrium until the system gets a shock from outside. A rule change that favors the run over the pass (or vice versa) could throw it out of balance. A new play-calling innovation in one aspect of the game could do it.

But even then, we should rather quickly settle down into a new equilibrium. Imagine, for example, that everything is cruising along with a leaguewide 3rd-and-3 success rate of 58% on both runs and passes. Now some hotshot defensive coordinator creates a new scheme that allows defenses to achieve standard 3rd-and-3 pass coverage while at the same time making it very difficult to run the ball. Such an innovation would spread throughout the league, and would generally make running on 3rd-and-3 much less attractive than it used to be. Say the passing success rate stays at 58% while the running success rate drops to 40%.

If you're an offensive coordinator and you can succeed 40% of the time with runs and 58% of the time with passes, what are you going to do? You're going to pass more. The defense will notice this and will start to play the pass a bit more, which will open the run back up a little. But as long as the pass is more profitable, offenses should shift from running to passing. As long as offenses are shifting from running to passing, defenses will adjust accordingly and make the pass less profitable. As soon as the pass is no longer more profitable than the run, all this shifting stops. Equilibrium.
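Here is a toy Python sketch of that adjustment process. Everything in it is hypothetical: the two straight-line success functions and the step size are invented purely to show the mechanism, and real defenses certainly don't respond this neatly.

```python
def run_success(run_share):
    # Hypothetical: runs work less well the more often defenses see them.
    return 0.75 - 0.30 * run_share

def pass_success(run_share):
    # Hypothetical: passes work better when defenses are geared toward the run.
    return 0.40 + 0.30 * run_share

def find_equilibrium(run_share=0.2, step=0.5, rounds=200):
    """Shift the play-calling mix toward whichever play type is succeeding more."""
    for _ in range(rounds):
        gap = run_success(run_share) - pass_success(run_share)
        # Run more when runs are ahead, less when passes are ahead; stay in [0, 1].
        run_share = min(1.0, max(0.0, run_share + step * gap))
    return run_share

share = find_equilibrium()
print(f"run share {share:.3f}, "
      f"run success {run_success(share):.3f}, "
      f"pass success {pass_success(share):.3f}")
```

With these made-up functions the mix stops moving once the two success rates match, and it settles on the same mix regardless of where it starts. The mix itself need not be 50/50; only the success rates are equal.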

I was using 3rd-and-3 as an example, but the same reasoning should apply on 3rd-and-anything. On 3rd-and-5, for example, the run has a chance to succeed largely because of the surprise factor, so runs will only succeed if they aren't tried too often. Therefore, it may be an 80/20 pass/run mix that achieves equal success rates. On 3rd-and-1, it may be a 30/70 mix. I am not saying that the number of runs and passes should be equal in any given situation, just that the success rate on runs and passes should be equal.

[I should probably stop short of saying 3rd-and-anything. On 3rd-and-16, for instance, teams use the run as a sort of pre-punt to try to improve their field position rather than as an instrument to pick up the first, and my theory isn't applicable.]

Anyway, that's how it ought to be in some sort of idealized world with perfect information and rational choices and homogeneous teams whose only goals are to get the immediate first down or to stop the other team from doing so. In the real NFL, however, this is how it plays out. This is 2003--2005 data.

                      Rush % (success rate)   Pass % (success rate)
(3rd-or-4th)-and-1    76.5% (71.5%)           23.5% (53.9%)
(3rd-or-4th)-and-2    41.6% (57.1%)           58.4% (48.9%)
(3rd-or-4th)-and-3    25.5% (55.9%)           74.5% (51.8%)
(3rd-or-4th)-and-4    19.2% (50.2%)           80.8% (47.1%)
(3rd-or-4th)-and-5    15.8% (38.1%)           84.2% (42.1%)

To make sure we're clear, the second line says that on 3rd-and-2 (or 4th-and-2) during the past three seasons, teams have passed the ball 58.4% of the time and run it only 41.6% of the time. When they've passed, they've picked up the first 48.9% of the time. When they've run, they've picked it up 57.1% of the time.

The data fits the theory fairly well on 3rd-and-3, 4, and 5. But it's not even close on 3rd-and-1 and 3rd-and-2. That brings us to the title of the post: why is 3rd-and-2 a passing down? An alternate title might be, why does my theory stink so bad?
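To put a number on "not even close," a quick Python sketch can compute the gap between the run and pass success rates at each distance, along with the overall conversion rate implied by the play-calling mix. The figures are the ones from the table above; the variable names are just for illustration.

```python
# (yards to go, run share, run success, pass share, pass success), from the table.
rows = [
    (1, 0.765, 0.715, 0.235, 0.539),
    (2, 0.416, 0.571, 0.584, 0.489),
    (3, 0.255, 0.559, 0.745, 0.518),
    (4, 0.192, 0.502, 0.808, 0.471),
    (5, 0.158, 0.381, 0.842, 0.421),
]

gaps = {}     # run success minus pass success
overall = {}  # conversion rate weighted by how often each play type is called
for to_go, run_share, run_succ, pass_share, pass_succ in rows:
    gaps[to_go] = run_succ - pass_succ
    overall[to_go] = run_share * run_succ + pass_share * pass_succ
    print(f"and-{to_go}: gap {100 * gaps[to_go]:+.1f} points, "
          f"overall {100 * overall[to_go]:.1f}%")
```

A positive gap means the run is converting more often than the pass, which under the theory should push teams to run more until the gap closes.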

Maybe my theory doesn't stink, but the pace of innovation causes the equilibria to be so short-lived that they can't be captured with a 3-year snapshot.

Maybe my theory doesn't stink, but the equilibria show up at the team level and not at the league level. This would be tough to verify, as the sample sizes get pretty smallish if you look at things one team at a time. For the record, here are the three-year average run/pass rates and overall success rates on 3rd-and-2:

TM R/P ratio ConvRate
dal 62 / 38 44.2
den 60 / 40 52.3
buf 59 / 41 44.4
car 54 / 46 50.0
sdg 52 / 48 52.1
nwe 52 / 48 65.5
sea 48 / 52 66.0
stl 46 / 54 38.0
pit 46 / 54 47.5
atl 46 / 54 49.1
jax 45 / 55 54.7
sfo 44 / 56 40.0
nyj 44 / 56 50.0
min 44 / 56 58.0
bal 43 / 57 50.8
mia 42 / 58 54.2
nor 40 / 60 67.3
chi 40 / 60 52.7
hou 40 / 60 49.1
phi 40 / 60 53.5
ari 39 / 61 57.1
ind 39 / 61 52.9
oak 39 / 61 63.3
cin 39 / 61 70.5
was 38 / 62 55.1
kan 37 / 63 58.1
nyg 36 / 64 53.3
cle 35 / 65 40.0
gnb 35 / 65 57.8
ten 28 / 72 44.0
tam 20 / 80 43.1
det 11 / 89 44.3

Are you kidding me, Steve Mariucci? You're passing 89% of the time on 3rd-and-2?

I always thought running was standard procedure in short yardage, and I always thought 2 is short yardage. Yet teams pass more than they run on 3rd-and-2. This would make sense if teams were having more success with the pass on 3rd-and-2. But they're not. Why are teams passing in a running situation even though it's not working?

It is possible that the success rates are skewed in favor of the run because of quarterback run-pass option plays or unplanned scrambles. That is, quarterbacks on such plays tend to run only if they know they can get the first, and throw it away otherwise. This would cause the successes to get counted as runs and the failures to get counted as passes, even though it's the same play. But there can't be enough of those plays to make too much difference.

Maybe teams are interested in more than just the immediate first down. On 3rd-and-(1-or-2), 7.3% of pass plays went for 20 or more yards, while only 2.4% of running plays did. Maybe, contrary to David Romer's findings, teams will trade in a surer chance at a 2-yard first down pickup for a chance at a long gainer.

Posted in General | 13 Comments »

More on rating systems: margin of victory loss

23rd May 2006

Back in this post, I described a simple iterative ranking scheme. Like all rating systems, that one has its strengths and weaknesses.

That system is not one of the systems actually included in the BCS selection process, because a few years ago the BCS mandated that all their computer ranking algorithms must completely ignore margin of victory. This is a controversial topic among aficionados of ranking algorithms.

On one hand, the margin of victory contains extra information. If you know that Team A beat Team B, that tells you something about the relative strengths of the two teams. But if you know that Team A beat Team B by 31 points --- or by one point --- you know more about the relative strengths. You don't know everything, of course, but you know more, and it just makes good sense to include more data rather than less. On the other hand, using margin of victory is in some abstract sense contrary to the point of almost all sports, and football in particular. The only purpose of the score is to determine a winner. The team that wins by 31 may have looked more impressive than the team that wins by a single point, but The Institution Of Sport does not recognize them as having accomplished anything different. A win is a win.

In general, I can see the merits of both sides of the debate. But in the particular case of using mathematical algorithms to help determine which teams play in the official national championship game, as with the BCS, it certainly does make sense to remove margin of victory from consideration. Otherwise, teams would have incentive to run up the score needlessly in a game that is already essentially over, which is almost universally considered poor sportsmanship. (I don't necessarily agree with that almost-universally-held view, by the way, but that's another post.) But whether it's bad sportsmanship or not, incentives change behavior. And at the very least, including margin of victory gives teams incentive to attempt to inject false information into the equations.

Anyway, a reader named Vince posted this in the comments to the above-linked post:

My dad and I used to argue that teams should be measured on 1) W-L %, 2) Strength of schedule, and 3) Margin of loss. This came about in one college season where two teams who played similar schedules each had one loss, but one team lost by seven and the other by a huge margin.

Would it be possible to do a ranking that puts a margin of one point on all wins, but has no such limits on losses?

Vince and his dad are a couple of sharp dudes. By treating all wins --- but not all losses --- equally, we can capture some of the information contained in the score without giving teams any incentive to run it up. So I started playing around to see if I could figure out a way to make it mathematically feasible. And I think I did. Here is the plan:

  1. Figure out the average margin of victory in all games during the course of the season. In the 2005 NFL, it was about 11.7. That is, the winning team scored, on average, 11.7 more points than the losing team.
  2. Count every win as +11.7 points, and every loss as -N points, where N was the actual margin of the game. So a one-point loss is -1, and a 20-point loss is -20. A one-point win is +11.7, and a 20-point win is +11.7.
  3. Compute each team's average point margin using the strange accounting system described above. For example, the Chargers were 9-7 last year. Their seven losses were by a total of 43 points, so their average point margin would be (11.7 * 9 - 43) / 16, which is about +3.9. A team that went 16-0 would have a margin of +11.7, while a team that went 0-16 might have a margin anywhere from -1 to -40 depending on how lopsided their losses were.
  4. Now you've got a collection of average point margins that sum to exactly zero, so you can plug into the same system we used previously. Simply adjust the ratings repeatedly until they stabilize.
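The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used for the ratings below: the four-team round robin and its scores are invented, the function name is made up, and the update rule (rating = adjusted margin + average rating of opponents) is inferred from the fact that each Rating in the table below equals the team's adjusted margin plus its StrOfSched.

```python
from collections import defaultdict

# Hypothetical results: (winner, loser, winner_points, loser_points).
games = [
    ("A", "B", 24, 21),
    ("A", "C", 31, 10),
    ("A", "D", 20, 17),
    ("B", "C", 17, 14),
    ("D", "B", 27, 13),
    ("C", "D", 23, 20),
]

def margin_of_loss_ratings(games, iterations=100):
    # Step 1: average margin of victory across all games.
    avg_win_margin = sum(wp - lp for _, _, wp, lp in games) / len(games)

    # Step 2: every win counts as +average, every loss as minus the actual margin.
    totals = defaultdict(float)
    opponents = defaultdict(list)
    for winner, loser, wp, lp in games:
        totals[winner] += avg_win_margin
        totals[loser] -= wp - lp
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    # Step 3: each team's average adjusted margin (these sum to zero when
    # every team plays the same number of games).
    margins = {t: totals[t] / len(opponents[t]) for t in opponents}

    # Step 4: repeatedly set rating = margin + average opponent rating.
    ratings = dict(margins)
    for _ in range(iterations):
        ratings = {
            t: margins[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in opponents
        }
    return avg_win_margin, margins, ratings

avg, margins, ratings = margin_of_loss_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{team}: margin {margins[team]:+.2f}, rating {ratings[team]:+.2f}")
```

A fixed number of passes is used here for simplicity. A more careful version would stop once the ratings change by less than some small tolerance, and the simple back-and-forth update is not guaranteed to settle down for every possible schedule.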

Here are the ratings for the 2005 NFL using this scheme:

TM Rating StrOfSched
1. den 9.6 1.9
2. ind 7.2 -1.5
3. sdg 6.8 2.9
4. sea 6.3 -1.8
5. nyg 6.1 0.9
6. was 5.8 1.9
7. kan 5.5 1.8
8. jax 5.2 -1.6
9. pit 4.8 -0.8
10. dal 4.7 1.7
11. car 4.3 -1.9
12. nwe 3.4 0.7
13. cin 2.6 -1.2
14. tam 2.5 -1.9
15. chi 2.1 -1.9
16. mia 1.3 -0.5
17. atl -0.4 -1.0
18. phi -1.8 2.3
19. min -2.6 -0.9
20. oak -2.7 2.6
21. bal -2.9 -0.2
22. ram -3.2 -1.1
23. cle -3.4 -0.5
24. ari -4.5 -0.3
25. gnb -4.7 -0.6
26. buf -5.1 0.5
27. nyj -5.1 1.1
28. det -5.5 -0.7
29. ten -7.8 -0.4
30. nor -9.1 -0.0
31. sfo -9.1 0.7
32. hou -10.1 0.0

Remember I said there is no incentive to run up the score? In fact there is a disincentive to do so. If you run up the score, you do nothing to your average point margin (because all wins are counted the same), but you do hurt your opponent's point margin. This weakens your strength of schedule, which actually lowers your rating. Here is some "proof." The Colts beat the Cardinals 17-13 in the last game of the season last year. If we change that score to 57-13, here are the new ratings:

TM Rating StrOfSched
1. den 9.9 2.0
2. ind 7.1 -1.7
3. sdg 7.0 3.0
4. sea 6.0 -2.3
5. nyg 6.0 0.7
6. was 5.7 1.8
7. kan 5.7 1.9
8. jax 5.1 -1.8
9. pit 5.1 -0.7
10. dal 4.6 1.5
11. car 4.4 -1.8
12. nwe 3.7 0.9
13. tam 2.8 -1.7
14. cin 2.8 -1.0
15. chi 2.3 -1.8
16. mia 1.6 -0.3
17. atl -0.2 -0.8
18. phi -1.9 2.1
19. min -2.4 -0.8
20. oak -2.5 2.8
21. bal -2.7 -0.1
22. cle -3.2 -0.4
23. ram -3.6 -1.5
24. gnb -4.5 -0.4
25. buf -4.8 0.7
26. nyj -4.9 1.4
27. det -5.5 -0.7
28. ari -7.1 -0.4
29. ten -7.9 -0.6
30. nor -8.9 0.1
31. sfo -9.5 0.3
32. hou -10.3 -0.2

The Cards drop four spots, as Vince and his dad think they should, but the Colts' rating also dropped just a hair. Instead of giving teams incentive to score, score, score in the closing moments of an already-decided contest, this system would actually give teams incentive to let the other team score. It's a kinder, gentler rating system. Hooray for everyone!

In all seriousness, though, I can't envision that becoming a practical problem if a system like this were installed as part of the BCS formula. One question I have is whether this system would produce college football ratings that look reasonable to most people. I think it would, but we'll have to run the numbers to find out for sure. I'll put that on the to-do list.

Posted in BCS, Statgeekery | 6 Comments »

More home cookin’

22nd May 2006

Last week I posted this breakdown of success rates at home versus on the road on third (or fourth) and one when the play was close. I'm just going to throw out a few similar breakdowns here. As I said in the previous post, I don't think it's possible to determine from the stats whether there is an officiating-related home field advantage, so I'm going to refrain from commenting much. I just thought you might be interested in the numbers.

First, here is the dual breakdown to the one I presented last week. The first column is the exact data I presented in the last post. The second column contains the conversion rates on plays that were not close (and hence where a spot couldn't have made a difference).

Success rates on rushing plays on (3rd-or-4th)-and-1

             When the play is close   When not close
home team    48.7%                    87.2%
road team    40.8%                    86.7%

And here is some further detail:

All (3rd-or-4th)-and-1 rushing attempts

             Gain<0   Gain=0   Gain=1   Gain>1
home team    8.3%     17.7%    16.9%    57.0%
road team    8.2%     22.8%    15.7%    53.3%

Finally, these data should give us an idea of what the overall home field advantage is on 3rd down (and 4th down) plays.

Success rates on all (3rd-or-4th)-and-N plays

N Home Road
1 68.6% 66.0%
2 53.3% 51.2%
3 52.7% 53.0%
4 48.1% 47.3%
5 42.6% 40.4%
6 44.9% 40.8%
7 36.2% 39.0%
8 32.1% 31.5%
9 33.2% 29.2%
10+ 21.0% 19.2%

Posted in Home Field Advantage | 3 Comments »

Home cookin?

19th May 2006

You know what I hate?

The quarterback sneak.

I acknowledge that it's generally a pretty effective play if you need to pick up two inches. But it's really ugly. And besides that, it puts the refs in a tough spot. On most quarterback sneaks, it's impossible to get a decent spot because no one --- not the refs, the fans, or even the TV cameras --- can see through the pile of bodies well enough to pinpoint the exact spot of the ball (which you can't see) at the time that the knee (which you can't see) touches the ground (which you can't see) or figure out when the forward momentum of the ball carrier was stopped. It just can't be done. And the result is that the ref has to arbitrarily decide whether to award a first down or not.

That makes me wonder whether the arbitrary spots that the home team gets might be different from the arbitrary spots that the road team gets. I decided to take a very incomplete preliminary look at some data to see if anything interesting would turn up. And, though I started this post talking about quarterback sneaks, I'm going to open up the data to the broader topic of short-yardage situations.

So here is what I did. I looked at all 3rd-and-1 and 4th-and-1 situations during the past three seasons in which a rush was attempted and where the rush gained either zero or one yard. Inasmuch as we can tell from the play-by-play data, those would be the plays where a spot could make the difference. Here is the data:

             attempts   successes   success rate
home team    357        174         48.7%
road team    390        159         40.8%
TOTAL        747        333         44.6%

Given the sample sizes involved, it's very unlikely that such a split would happen by chance if the true success rates were equal. So we have pretty good evidence that the success rates are not the same. It's pretty likely that something is going on here.
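If you want to put a number on "very unlikely," one standard way is a two-proportion z-test. The post doesn't say which test (if any) was actually used, so treat the choice of test and the function name below as my assumptions; this is a sketch using only the counts in the table above.

```python
import math


def two_proportion_z(successes_a, attempts_a, successes_b, attempts_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = successes_a / attempts_a
    rate_b = successes_b / attempts_b
    # Under the null hypothesis both groups share one true success rate.
    pooled = (successes_a + successes_b) / (attempts_a + attempts_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (rate_a - rate_b) / std_err
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Home: 174 of 357. Road: 159 of 390.
z, p = two_proportion_z(174, 357, 159, 390)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With these counts the z-score comes out a bit above 2 and the two-sided p-value is roughly 0.03, so a gap this large would turn up by chance only about three times in a hundred if the true rates were equal.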

I need to state clearly that this does not necessarily say anything about the refs and whether their spotting guesses are influenced by the home crowd. The refs are just one of many possibilities for the something that is going on. Teams in all sports do all sorts of things better at home than on the road, so this could be just another non-officiating-related manifestation of that slippery character named home field advantage.

Or maybe it's not. This data isn't merely saying that the home team converts more often on 3rd-and-1. It's saying that they convert more often on 3rd-and-1 when the play is close. I don't think it's possible to statistically separate the officiating-related home field advantage (if any) from the non-officiating-related home field advantage, so we'll never know. But this looks a bit suspicious to my paranoid eye.

I think sports rooting is a good outlet for me to release all my irrationality. Most people who know me consider me pretty logical and level-headed, and generally I am. As I sit here typing this, I truly believe that "this does not necessarily say anything about the refs and whether their spotting guesses are influenced by the home crowd. The refs are just one of many possibilities for the something that is going on." But the first time a team I'm rooting for gets a bad spot on the road, this data will become iron-clad evidence of widespread conspiracy.

That's healthy, right?

Posted in Home Field Advantage | 10 Comments »

Team quarterbacking through the years

18th May 2006

The end of the Joey Harrington era in Detroit gave sports economist Dave Berri a chance to observe that no Lion quarterback has made the Pro Bowl in 35 years. That's pretty bad, but according to what I'm about to show you there are a few other teams that can make a case as having had worse overall quarterback play than Detroit over the course of the "modern era for passing" (1978--present).

Here is the plan:

  1. Compute each team's passer rating (just for quarterbacks --- passing attempts by others have been discarded) for each season since 1978. I'm not a fan of the NFL's passer rating formula and I'm not sure what possessed me to use it here, but you'll get similar results if you use yards-per-attempt or, I suspect, any other reasonable metric.
  2. Compare it to league average and get a Passing Effectiveness Index for each team for each year. For example, Detroit's quarterbacks posted a 69.1 passer rating last season. League average was 80.0. Dividing the Lions by the League gives you about .863, which I'll multiply by 100 to make it more easily digestible. The Lions' Passing Effectiveness Index for 2005 was 86.3.
  3. Average each team's Passing Effectiveness Index over all the years they've been in the league (since 1978). When you do that for the Lions, you get 92.9. This means that Detroit's quarterbacks have been, on average, about 7.1% less effective than average (according to passer rating) over the past 28 years.

You get no prize for guessing what franchise has the highest average Passing Effectiveness Index. It's the 49ers, who boast an extremely impressive 115.4. The second best franchise has a 105.8. In fact, the difference between San Francisco and #2 is bigger than the difference between #2 and #24.

As foreshadowed above, last place belongs not to the Lions, but to their divisional rivals: the Chicago Bears. This is a grim list indeed:

1978 72.6 Bob Avellini
1979 96.2 Mike Phipps
1980 78.9 Vince Evans
1981 73.0 Vince Evans
1982 93.2 Jim McMahon
1983 98.6 Jim McMahon
1984 98.6 Jim McMahon
1985 103.1 Jim McMahon
1986 79.9 Mike Tomczak
1987 99.4 Jim McMahon
1988 98.0 Jim McMahon
1989 91.1 Mike Tomczak
1990 94.3 Jim Harbaugh
1991 97.2 Jim Harbaugh
1992 91.8 Jim Harbaugh
1993 85.0 Jim Harbaugh
1994 100.0 Steve Walsh
1995 117.9 Erik Kramer
1996 93.7 Dave Krieg
1997 85.8 Erik Kramer
1998 97.5 Erik Kramer
1999 99.5 Shane Matthews
2000 85.7 Cade McNown
2001 94.6 Jim Miller
2002 91.5 Jim Miller
2003 77.8 Kordell Stewart
2004 75.7 Chad Hutchinson
2005 76.7 Kyle Orton

Only thrice in the last 28 years have the Bears even been above average in the passing game. Also below the Lions are the Texans, the Buccaneers, and the Cardinals, but all those teams are very close.
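Here is a small Python sketch of the Passing Effectiveness Index arithmetic, run on the Bears list above. The function and variable names are mine. Two caveats: the rounded inputs 69.1 and 80.0 give the Lions about 86.4 rather than 86.3, presumably because the published ratings were rounded before I divided them, and the 1994 Bears show as exactly 100.0 after rounding, so the sketch counts seasons at or above 100 (my assumption) to match "thrice."

```python
def passing_effectiveness_index(team_rating, league_rating):
    """Team passer rating as a percentage of the league-average rating."""
    return 100.0 * team_rating / league_rating


lions_2005 = passing_effectiveness_index(69.1, 80.0)

# Bears PEI by season, 1978 through 2005, copied from the list above.
bears_pei = [
    72.6, 96.2, 78.9, 73.0, 93.2, 98.6, 98.6, 103.1, 79.9, 99.4,
    98.0, 91.1, 94.3, 97.2, 91.8, 85.0, 100.0, 117.9, 93.7, 85.8,
    97.5, 99.5, 85.7, 94.6, 91.5, 77.8, 75.7, 76.7,
]

bears_average = sum(bears_pei) / len(bears_pei)
# ">=" rather than ">" so the rounded 100.0 in 1994 is counted (an assumption).
bears_seasons_over = sum(1 for pei in bears_pei if pei >= 100.0)
bears_pct_over = 100.0 * bears_seasons_over / len(bears_pei)

print(f"Lions 2005 PEI: {lions_2005:.1f}")
print(f"Bears average PEI: {bears_average:.1f}")
print(f"Bears seasons at or above average: {bears_seasons_over} ({bears_pct_over:.1f}%)")
```

The Bears' 28 seasons average out to about 91.0, with three seasons (about 10.7%) at or above league average.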

I'll post the full list later today, but I wanted to first give you the opportunity to impress me by guessing who is #2 on the list after the 49ers.

Addendum: Good guessing by monkeytime. Here is the list. PEI is the franchise's average Passing Effectiveness Index; PctOverAvg is the percentage of that franchise's seasons in which they've been above average in PEI.

Franchise PEI PctOverAvg
49ers 115.4 85.7
Jaguars 105.8 72.7
Dolphins 105.2 64.3
Vikings 104.4 57.1
Cowboys 103.3 64.3
Broncos 103.1 57.1
Bengals 102.2 53.6
Packers 102.2 50.0
Chiefs 102.1 57.1
Oilers/Titans 101.1 57.1
Redskins 101.0 50.0
Jets 100.9 42.9
Rams 100.9 50.0
Raiders 100.9 46.4
Bills 100.9 60.7
Eagles 100.4 57.1
Browns 100.3 40.0
Colts 99.8 39.3
Falcons 99.7 46.4
Seahawks 99.6 50.0
Patriots 98.6 42.9
Chargers 97.6 35.7
Panthers 97.4 54.5
Steelers 97.2 46.4
Saints 95.6 46.4
Giants 95.2 42.9
Ravens 94.2 20.0
Lions 92.9 21.4
Cardinals 92.8 28.6
Buccaneers 91.9 21.4
Texans 91.1 25.0
Bears 91.0 10.7

Posted in General, History | 7 Comments »

Michael Vick

17th May 2006

I devoted a few posts to Matt Schaub (I, II) a while ago, so I may as well write a little about the other Atlanta quarterback.

Back in 2002, I was a serious Vick-backer. I no longer am. But that's not because I changed my mind. It's because Vick regressed. At least I think that's the way it happened.

He is incredibly fun to watch, so that ensures a lot of TV hype. Because of that, there will be haters. Peyton Manning collects haters for the same reason. But Vick generates extra animosity because he doesn't do things the way quarterbacks are supposed to do things. Johnny Unitas he's not. But for those who were able to get past that, it was easy to recognize that he was --- the occasional really ugly pass aside --- a great quarterback in 2002. He was confident, he was decisive, and his scrambling ability made him virtually impossible to defend.

In 2004, he did less but his team was more successful, so it's hard to complain about that.

Last year, he was not only not a great quarterback. He was not even a good quarterback. In fact, he wasn't even fun to watch, and I thought his running ability was noticeably diminished. Here's what Vick says about it:

My knee was bothering me all year. I never cried about it. I never complained about it. I just tried to do the best I can for the team. Now, I'm 100 percent healthy. I'm where I used to be.

That's a common refrain in the offseason, so I'm skeptical. But I want to believe him because I sure do enjoy watching him run.

I said earlier that it was easy to recognize that Vick was a great quarterback in 2002 and 2004, and I believe that. In particular, I saw with my own eyes that on every running play the opponent had to keep its outside linebackers and/or ends home --- on both sides --- for fear of the bootleg. This opened up the middle for Dunn and Duckett to run wild, which led to all the ball-control-and-defense wins that Atlanta racked up during 2004.

Or so it appeared to me.

I went in search of stats to corroborate my impression and came up empty. Here are Dunn and Duckett's numbers from 2002--2005 with and without Vick in the game.

With Vick
Player          Att   Yds    Yds/Att   Yds/G
Warrick Dunn    689   3048   4.4       72.6
T.J. Duckett    407   1591   3.9       41.9

Without Vick
Player          Att   Yds    Yds/Att   Yds/G
Warrick Dunn    211   1073   5.1       67.1
T.J. Duckett    144   584    4.1       38.9

[Fine print: a game was defined to be "with Vick" if Vick attempted 10 or more passes in that game. In other words, I'm not going down to the partial game level. Every game was either a "with Vick" game or a "without Vick" game.]

I'm really not sure what to make of that, except that it wasn't what I was expecting.

Posted in General | 18 Comments »

David Romer’s paper: postscript

16th May 2006

For easy reference, here are the previous posts in the sequence: I, II, III.

Romer's paper cites two academic papers by former NFL quarterback Virgil Carter and Robert Machol. The first was written in 1971. I haven't read it, but it is described in The Hidden Game of Football, which was originally written in 1988 and then updated and re-released in 1998.

Carter and Machol also created a function that converts situations to point values. They did it slightly differently from Romer. What they did was to dig through the play-by-play and look at all the times a team had a first-and-10 on a given yard line. Then they recorded who scored next and how many points, and took the average. Here is how Carter and Machol's method was described in The Hidden Game:

In our study of 240 games in 1997, we found 783 first down plays from the 50 yard line (plus or minus 2 yards) . . . Of these 783 first downs, the offense scored next in 482 cases totalling 2473 points. The team on defense was the next to score 194 times for 1088 points. And 146 times neither team scored. Subtract 1088 from 2473, and you leave the offense with a plus of 1385 points. Divide by 783. With a first-and-10 at midfield, the offense has a point potential of 1.77.
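The arithmetic in that passage is simple enough to sketch in Python. The function names and the four-play sample list are hypothetical; the only real numbers are the point totals and the first-down count quoted above.

```python
def point_potential(offense_points, defense_points, first_downs):
    """Net points on the next score, averaged over all first downs at a spot."""
    return (offense_points - defense_points) / first_downs


def point_potential_from_plays(next_scores):
    """Same idea, play by play.

    next_scores holds one signed number per first down: positive if the
    offense scored next, negative if the defense did, zero if nobody scored.
    """
    return sum(next_scores) / len(next_scores)


# Totals quoted from The Hidden Game for first-and-10 near midfield.
midfield = point_potential(offense_points=2473, defense_points=1088, first_downs=783)
print(f"{midfield:.2f}")

# A made-up handful of plays, just to show the per-play version.
sample = point_potential_from_plays([7, -3, 0, 3])
```

The midfield figure comes out to about 1.77, matching the book.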

Romer writes that this method is "considerably cruder" than his own. I found that a bit off-putting, but it's essentially true. In any case, the conclusions reached by Palmer, Thorn, and Carroll in The Hidden Game using Carter and Machol's method are generally similar to Romer's.

After reading the previous posts in this sequence, a friend of mine sent me this link, which is David Sklansky's analysis of the situation. I had never heard of the guy, but because he is a poker guru of some repute, I'm probably the last man in the country who can say that. My friend described him as, "somewhat of a poker theorist-genius whose thoughts on other topics are often interesting as well." His analysis is admittedly incomplete and he seems to be unaware of the work done by Romer et al., but the linked article is a concise and well-written summary of the relevant issues.

Finally, this one seems a bit out there but it's definitely worth thinking about. I stumbled across this article by Jason Scheib. I don't know anything at all about Jason Scheib, but he appears to be a pretty sharp guy and he has put a lot of thought into this idea. The idea is . . ., well, I'd better let him tell you what the idea is:

About a year and a half ago I took a pretty simple idea (A punt is a turnover) and began exploring it as far and in as many different directions as it would take me. Over that time it has grown into a turnover theory that gives a different perspective on the game of football. It is a theory based on redefining what turnovers are and using this new definition to see what a team can do to improve their net turnovers in an effort to win more games. This theory presents two significant implications: 1) a team would win more games if they never punt, and 2) a team that never punts would not just be employing a different strategy but would approach the game in a fundamentally different way, which would further add to their success.

This is not about taking more risks and punting less often. That could cost you games depending on when you decide to punt and when you decide not to. The key is to never punt. Never punting takes away the risk because it allows the averages to work in your favor. It also opens you up to different play calling opportunities, primarily on third down. The two go together and are dependent on each other in order to make this work.

Before you laugh, go read it. I have skimmed it a few times and honestly can't say I completely understand yet why going for it on 4th-and-14 is a good plan, but I am definitely intrigued by the idea of making your offense more efficient by playing in a four-down mindset at all times. Romer's analysis shows that you can get a slight advantage by making different decisions on fourth down. What I get out of Scheib's idea is that maybe you can get an even bigger advantage by looking at the third-down-fourth-down sequence as a whole, and making nontraditional decisions there.

Posted in Statgeekery | 3 Comments »

David Romer’s paper III

15th May 2006

As you may have guessed, this is a continuation of David Romer's paper II which is a continuation of David Romer's paper I. The first one is optional I suppose, but to understand this one you need to read the second.

The question is: how did Romer arrive at his function that associates a numerical point value to a first-and-10 on any yard line?

First, he took three years' worth of NFL game logs. Then he threw away the last three quarters of each game and worked with only the first quarters. He did this so he could assume that teams were in point maximization mode, and also to avoid the effects of end-of-half and end-of-game maneuvering. Next, he distilled the data down to only 101 situations. Situations 1 through 99 are first-and-10 at the given yard line (1 means your own 1 and 99 means your opponent's 1). Situation 100 is a kickoff from the 30. Situation 101 is a free kick from the 20. So a game log that looks like this:

Patriots KICKOFF to Jets 3, returned to Jets 20.
1st-and-10 at Jets 20 - Martin rushes for 8 yards
2nd-and-2 at Jets 28 - Pennington to Coles for 5 yards
1st-and-10 at Jets 33 - Pennington pass incomplete
2nd-and-10 at Jets 33 - Martin rushes for 2 yards
3rd-and-8 at Jets 33 - Pennington sacked for -4 yards
4th-and-12 at Jets 29 - Jets punt to Patriots 42. Fair catch.
1st-and-10 at Pats 42 - Brady to Branch for 22 yards.
1st-and-10 at Jets 36 - Dillon rushes for no gain.
2nd-and-10 at Jets 36 - Dillon runs for 36 yard TD.
Extra point good.
Patriots KICKOFF to Jets 3. Returned to Jets 37.
1st-and-10 at Jets 37 - Pennington pass intercepted by Bruschi, returned for TD.
Extra point good.
Patriots KICKOFF to Jets 2. Returned to Jets 25.
. . .

Would now look like this:

Patriots ball, situation 100
Jets ball, situation 20
Jets ball, situation 33
Patriots ball, situation 42
Patriots ball, situation 64
[Patriots score 7 points]
Patriots ball, situation 100
Jets ball, situation 37
[Patriots score 7 points]
Patriots ball, situation 100
. . .

Now, let's look at which situations led to which other situations, and how many points were scored in between. We'll look at this just from the Jets' standpoint, which means that we'll really think of there being 202 situations, which we'll call situations 1 through 101 and -1 through -101. We'll define Situation 20, for example, to mean it's the Jets' ball on their 20 whereas Situation -20 means it's the Patriots' ball on the Patriots' 20. Here is the data again:

Situation -100 leads to situation 20 (no points scored)
Situation 20 leads to situation 33 (no points scored)
Situation 33 leads to situation -42 (no points scored)
Situation -42 leads to situation -64 (no points scored)
Situation -64 leads to situation -100 (-7 net points scored)
Situation -100 leads to situation 37 (no points scored)
Situation 37 leads to situation -100 (-7 net points scored)
. . .
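That bookkeeping is mechanical, so here is a minimal Python sketch of it. The event format and the function name are my own inventions for illustration; the data is the sample drive log from this post.

```python
def signed_transitions(events, our_team):
    """Turn a situation log into (from, to, net_points) rows for one team.

    events: tuples of ("ball", team, situation) or ("score", team, points).
    Situations and points are positive for our_team, negative for the opponent.
    """
    transitions = []
    previous = None      # signed situation still waiting for its successor
    pending_points = 0   # net points scored since `previous`
    for kind, team, value in events:
        sign = 1 if team == our_team else -1
        if kind == "score":
            pending_points += sign * value
            continue
        current = sign * value
        if previous is not None:
            transitions.append((previous, current, pending_points))
        previous, pending_points = current, 0
    return transitions


log = [
    ("ball", "Patriots", 100),
    ("ball", "Jets", 20),
    ("ball", "Jets", 33),
    ("ball", "Patriots", 42),
    ("ball", "Patriots", 64),
    ("score", "Patriots", 7),
    ("ball", "Patriots", 100),
    ("ball", "Jets", 37),
    ("score", "Patriots", 7),
    ("ball", "Patriots", 100),
]

for start, end, points in signed_transitions(log, "Jets"):
    print(f"Situation {start} leads to situation {end} ({points} net points)")
```

Run from the Jets' standpoint, this reproduces the seven rows listed above.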

Now imagine you have 800 games' worth of logs that look like that. Let's define V_i to be the value of Situation i. Our goal is to find V_i for all 202 situations. How to do that?

Well, first of all, we declare that V_-i = -V_i. That is, if any given situation is worth, say, 3 points to the offense, then it must by definition be worth -3 points to the defense. So now we just have to find the values for the positive situations.

Now, the value of Situation i is the average net points that all instances of Situation i led to immediately, plus the average value of the situations that resulted from them.

Look at the log above. The Jets went from a situation 20 to a situation 33 and scored no points in between. The value of that particular instance of situation 20 to the Jets was the points they got (zero) plus the value of the next situation (situation 33). Mathematically:

V_20 = 0 + V_33

Now if we scoured the data for all the situation 20s that occurred for all teams in the data set, then we could average together the resulting values to get an overall value for situation 20. The equation would be:

V_20 = (average immediate net points from all situation 20s)
       + (average value of the resulting situations)

So V_20 is going to be defined in terms of V_33 and probably all the other Vs too. Likewise, each of those Vs is going to be defined in terms of all the other Vs. We have 101 values we want to find and we want them, collectively, to solve 101 equations. Does this sound familiar? Careful readers of this blog will notice that it's the exact same setup we used to put a point value on teams in this post. We're using it to put a point value on situations here.

In the team context described in the above-linked post, that mathematical method takes into account point margin, strength of opponents, strength of opponents' opponents, strength of opponents' opponents' opponents, and so on. In this context, the same method takes into account, for each situation, the net points scored from that situation, the net points scored from the situations that it leads to, and from the situations those situations lead to, and so on.
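To make the "101 equations in 101 unknowns" idea concrete, here is a toy Python sketch that builds the averaging equations from a list of signed transitions and solves them directly. It illustrates only the setup described in this post, not Romer's actual estimation procedure. The three-situation data set is made up, and all function names are mine.

```python
from collections import defaultdict


def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting (no singularity check)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def situation_values(transitions):
    """Solve V_i = avg(points + V_next) for every i, using V_-i = -V_i."""
    outcomes = defaultdict(list)
    for start, end, points in transitions:
        if start < 0:  # flip the row so it is seen by the team with the ball
            start, end, points = -start, -end, -points
        outcomes[start].append((end, points))
    situations = sorted(outcomes)
    index = {s: k for k, s in enumerate(situations)}
    n = len(situations)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, results in outcomes.items():
        i = index[s]
        a[i][i] += 1.0
        weight = 1.0 / len(results)
        for end, points in results:
            b[i] += weight * points
            # Move the V_next term to the left side; a negative `end`
            # contributes -V_|end| because of the V_-i = -V_i rule.
            a[i][index[abs(end)]] -= weight * (1.0 if end > 0 else -1.0)
    return dict(zip(situations, solve_linear(a, b)))


# Made-up data: own 20, opponent's 40 (situation 60), and kicking off (100).
toy = [
    (20, 60, 0), (20, -20, 0),
    (60, 100, 7), (60, 100, 3), (60, -20, 0),
    (-100, 20, 0),
]
values = situation_values(toy)
```

In this toy world a first-and-10 at your own 20 is worth about +0.83, a first-and-10 at the opponent's 40 is worth +2.5, and kicking off is worth about -0.83. The numbers mean nothing; the point is that every value depends on every other one, and they all get solved at once.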

I'm currently reading a book called The Wisdom of Crowds, by James Surowiecki. I'm only a few chapters in, but I can already recommend it with confidence. However, Surowiecki summarizes Romer's paper in Chapter Three, and he gets this part wrong:

When [Romer] was done, he had figured out the value of a first down at every single point on the field. A first-and-ten on a team's own twenty yard line was worth a little bit less than half a point --- in other words, if a team started from its own twenty yard line fourteen times, on average it scored just one touchdown.

I know that he is trying to simplify things, but this is a very important point if you want to understand the paper. The half-a-point value of a first at the 20 includes not only the points that you might score on that drive, but also the points your opponent might score with the field position you're likely to give them if you don't score, and the points you're likely to score with the field position they give you after they do or don't score, and so on.

Tomorrow, I'll wrap up this discussion with a quick summary of some other work that's been done on point values, fourth downs, and punting.

Posted in Statgeekery | 1 Comment »

My dad can beat up your sister

13th May 2006

I'm not going to make a habit of posting on the weekends, but I couldn't pass up this opportunity to point out a textbook example of what Bill James used to call the "my dad can beat up your sister" argument.

Here is former NFL quarterback Doug Williams commenting on the Vikings' 2nd round pick, quarterback Tarvaris Jackson of Alabama State:

He's faster than Matt Leinart, and he can throw the ball better than Vince Young.

I'll bet he's also stronger than Ken Dorsey, more motivated than Joey Harrington, and soberer than Brian Griese.

Posted in General | 1 Comment »

Jimmy Smith

12th May 2006

Since Jimmy Smith decided to retire yesterday, let's take a day (and a weekend) off from the discussion of the Romer paper and spend it putting Smith's career stats in perspective.

We've already had a couple of recent posts on ranking wide receivers. In this system, which is based on percentage of team receiving yards, Smith ranked behind only Michael Irvin among all receivers whose careers started after 1977. I don't think anyone believes Smith has had a better career than, say, Jerry Rice. But still, it's notable that Smith has been able to account for an extremely high percentage of his teams' passing production over the years. In this one, Smith ranks 23rd among all receivers whose careers started in 1970 or later. That system probably punishes him unfairly for his zero-catch-zero-yard seasons in Dallas at the beginning of his career.

Let's just focus on putting Smith's career numbers in perspective with his contemporaries. I'll define someone to be a contemporary of Smith if his debut year was within four years of Smith's 1992 debut. So all receivers who debuted between 1988 and 1996 will be included. That gets us from Tim Brown to Terrell Owens, but doesn't include geezers like Rice or whippersnappers like Randy Moss. Here are the basic stats, sorted by receiving yards:

Player REC YD TD
Tim Brown 1094 14934 100
*Marvin Harrison 927 12331 110
Jimmy Smith 862 12287 67
*Isaac Bruce 813 12278 77
Michael Irvin 750 11904 65
*Rod Smith 797 10877 65
*Keenan McCardell 825 10680 62
*Terrell Owens 716 10535 101
Andre Rison 743 10205 84
*Keyshawn Johnson 744 9756 60
Rob Moore 628 9368 49
Herman Moore 670 9174 62
Anthony Miller 595 9148 63
*Eric Moulds 675 9091 48
Tony Martin 593 9065 56
*Ricky Proehl 666 8848 54
Terance Mathis 689 8809 63
*Johnnie Morton 624 8719 43
*Muhsin Muhammad 642 8501 48
*Joey Galloway 550 8501 64
Curtis Conway 594 8230 52
Jeff Graham 542 8172 30
Sterling Sharpe 595 8134 65
*Joe Horn 539 7822 53
*Amani Toomer 529 7797 44
*Terry Glenn 523 7776 38

It's instructive to compare Jimmy Smith's career with Tim Brown's. Brown has 2700 yards on Smith, but Brown has a lot of junk yards and Smith has almost none. Tim Brown had seasons of 265, 554, 693, 567, and 200 yards. That's about 2300 yards that his teams likely could have gotten out of any old waiver wire receiver.

One of my favorite arbitrary-but-interesting stats is Yards Over 1000. We compute it simply by starting to count the yards only after they reach the 1000 mark. So Jimmy Smith, who had 1023 yards last season, gets credit for 23 yards. In 2003 Smith had 805 yards, so he gets credit for zero. The idea is that you get credit only for doing something above and beyond the ordinary. Among the same group of receivers, here are the leaders in Yards Over 1000:

Player YdOv1000
*Marvin Harrison 2853
Michael Irvin 2330
*Isaac Bruce 2228
Jimmy Smith 2194
Tim Brown 1992
*Rod Smith 1643
*Terrell Owens 1562
Herman Moore 1448
Sterling Sharpe 1382
*Joe Horn 1316
*Eric Moulds 1024
*Muhsin Muhammad 841
Jake Reed 800
*Keenan McCardell 784
Andre Rison 749
Antonio Freeman 741
*Amani Toomer 731
Yancey Thigpen 705
Anthony Miller 660
*Keyshawn Johnson 655
Tony Martin 613
Rob Moore 610
Carl Pickens 564
Terance Mathis 533
Brett Perriman 509
Robert Brooks 507
Derrick Alexander 499
*Joey Galloway 422
*Terry Glenn 415
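Yards Over 1000 is a one-liner in Python. This sketch uses only the season totals mentioned in this post; the function name and the threshold argument are mine, and the second argument makes the Yards Over 800 and Yards Over 1200 variants trivial.

```python
def yards_over(season_yards, threshold=1000):
    """Sum only the yards gained beyond `threshold` in each season."""
    return sum(max(0, yards - threshold) for yards in season_yards)


# Jimmy Smith's 2005 (1023 yards) and 2003 (805 yards) seasons.
smith_two_seasons = yards_over([1023, 805])

# Tim Brown's five low-yardage seasons contribute nothing at all.
brown_junk_seasons = [265, 554, 693, 567, 200]
brown_junk_total = sum(brown_junk_seasons)
brown_junk_over_1000 = yards_over(brown_junk_seasons)

# The same 1023-yard season under an 800-yard threshold.
smith_2005_over_800 = yards_over([1023], threshold=800)
```

Smith's two seasons are worth 23 yards combined, while Brown's 2279 junk yards are worth zero.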

I don't think Rod Smith is going to catch him on this list. Owens might, but it's not a given. If you look at Yards Over 1200, Smith is fourth behind the same three guys. If you prefer Yards Over 800, he's second behind only Harrison.

All things --- well, all numbers --- considered, I think the only guys on this list that can make a case as being better than Jimmy Smith are:

  1. Marvin Harrison - almost certainly will end up being the top guy in this cohort.
  2. Tim Brown - even throwing away his junk yards, Brown is close to Smith yardagewise. And he's got a big touchdown advantage, if you're into that kind of thing.
  3. Michael Irvin - very similar profile to Smith: huge piece of a small receiving pie, and relatively low touchdown totals. Irvin played on better teams, which some people would use as evidence in favor of Irvin and others would use as evidence in favor of Smith.
  4. Terrell Owens - Smith currently has better looking overall career numbers. But that could change in a hurry.
  5. Isaac Bruce - three years younger than Smith. Right now I'd give Smith the nod over Bruce based on the systems they played in. But a couple more good seasons by Bruce could change that.
  6. Rod Smith - statistically, Jimmy looks slightly but clearly better. To flip that, Rod would need a couple of seasons better than I expect him to have. But it's close enough that someone more knowledgeable than I might be able to make a case for him based on blocking and/or other things.

I don't think any of the other active receivers --- Horn, Moulds, Muhammad, McCardell, Keyshawn, Galloway --- can reach Smith. So out of this nine-year slice of history, Smith's numbers will end up somewhere between the 2nd-best and the 7th-best. I have to admit that Smith's name doesn't immediately leap to my mind when I think of the great receivers of the era, but he put together a very impressive career.

Posted in General | 4 Comments »

David Romer’s paper II

11th May 2006

This is a continuation of yesterday's post, so you might want to read that one real quick if you haven't.

Assume you are coaching an average team against another average team and assume it's very early in the game.

  1. Would you give the other team half a point for the right to receive the kickoff in both halves? A full point?
  2. Would you rather have the ball, first-and-10, on your own 14 yard line, or would you rather your opponent have the ball, first-and-10, on their 14 yard line?
  3. If you had first-and-10 at your opponent's 42 yard line, would you trade it, on the spot, for a guaranteed field goal?
  4. Would you rather (A) be leading by three points and have the ball, first-and-10, at midfield, or (B) be leading by seven points, but your opponent has the ball, first-and-10, at midfield?
  5. If you had first-and-10 at your own one yard line and you knew with 100% certainty that you could execute a 56-yard quick kick (with no return), would you do it?

If you want to answer these questions, then you need a way to translate situations into point values. Is 50 yards worth three points? Four points? How many points is possession of the ball worth? And do the answers to these questions depend upon where the ball is? The cornerstone of Romer's paper is putting a point value on every situation, and I'll explain later exactly how he does it. If you are trying to maximize your point differential, which, early in the game, is essentially equivalent to maximizing your probability of winning, then the paper suggests that each of the above decisions is between two equally attractive options. If you found them to be difficult decisions, then your intuition matches up with Romer's model.

Here is a picture that summarizes his model:

[Figure: The point value of a first-and-ten at any given yard line]

Starting at the left, note that if the yard line (on the x-axis) is 1, the associated point value (on the y-axis) is -1.6. That says: if it's a tie game and you have first-and-10 at your own one yard line, then it's really not a tie game. You're morally trailing by 1.6 points. At about your 15 yard line, the point value is zero. That says that you should be indifferent between having the ball at your own 15 and having your opponent have the ball at his 15. At midfield, the point value is about +2.

What doesn't show up on the chart is the kickoff situation. If your team is lining up to kick off, you are in a -.6 point situation. This jibes with the fact that +.6 is the value of having first-and-10 on your own 27, which is roughly the average starting field position after a kickoff. The fact that a kickoff is worth -.6 points is crucial, because scores are always followed by kickoffs. So a field goal isn't really worth 3 points. It's worth only 2.4 points. Likewise, a touchdown is worth 6.4 points.

Let's look at those questions again:

  1. Would you give the other team half a point for the right to receive the kickoff in both halves? A full point? - as discussed above, the right to receive a kickoff is worth about .6 points, so either of those trades would be a close call.
  2. Would you rather have the ball, first-and-10, on your own 14 yard line, or would you rather your opponent have the ball, first-and-10, on their 14 yard line? - as discussed above, the 14 or 15 yard line is the place where the point value is zero, so this is an even trade either way.
  3. If you had first-and-10 at your opponent's 42 yard line, would you trade it, on the spot, for a guaranteed field goal? - the point value of a first-and-10 at your opponent's 42 is around +2.4, the same as the value of a field goal. Again, even trade.
  4. Would you rather (A) be leading by three points and have the ball, first-and-10, at midfield, or (B) be leading by seven points, but your opponent has the ball, first-and-10, at midfield? - a first-and-10 at midfield is worth about +2 points, so the difference between having the ball and not having the ball is 4 points. Even trade.
  5. If you had first-and-10 at your own one yard line and you knew with 100% certainty that you could execute a 56-yard quick kick (with no return), would you do it? - first-and-10 at the 1 is worth -1.6. First-and-10 at your own 44 is worth about +1.6. So if your opponent has first-and-10 at his 44, that's worth about -1.6 to you. Same as having the ball at your own 1. Even trade.
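The trades above are easy to check with a few lines of arithmetic. The sketch below hard-codes the point values quoted in this post (they are read off the chart, so they are approximate). The names FIRST_AND_10, KICKOFF, and net_score_value are made up for illustration, and the touchdown figure assumes the extra point is good.

```python
# Approximate point values of a first-and-10, as quoted in the post.
# Keys are yards from your own goal line, so 58 is the opponent's 42.
FIRST_AND_10 = {1: -1.6, 15: 0.0, 27: 0.6, 44: 1.6, 50: 2.0, 58: 2.4}
KICKOFF = -0.6  # value of lining up to kick off


def net_score_value(points):
    """Points on the scoreboard, less the cost of the kickoff that follows."""
    return points + KICKOFF


field_goal = net_score_value(3)
touchdown = net_score_value(7)  # assumes the extra point is made

# Question 3: first-and-10 at the opponent's 42 versus a sure field goal.
q3 = (FIRST_AND_10[58], field_goal)

# Question 4: lead by 3 with the ball at midfield, or lead by 7 without it.
q4 = (3 + FIRST_AND_10[50], 7 - FIRST_AND_10[50])

# Question 5: keep the ball at your own 1, or hand the opponent a
# first-and-10 at his own 44 (worth the negative of its value to you).
q5 = (FIRST_AND_10[1], -FIRST_AND_10[44])

for name, (a, b) in [("Q3", q3), ("Q4", q4), ("Q5", q5)]:
    print(f"{name}: {a:+.1f} vs {b:+.1f}")
```

In each pair the two options come out to the same number, which is all "even trade" means here.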

Before I talk further about it, I need to point out that the model is built from play-by-play data taken from the first quarters of games only. It therefore does not take into account end-of-half or end-of-game situations where a particular number of points are crucial. For example, consider the third question above. That question starts to look a lot different if there is one minute left in the game and you're trailing by 2. Or by 4. Or leading by 6. Romer's model serves as a guide only for teams that are trying to maximize points and that are not worried about end-of-half or end-of-game maneuvering. For that reason, you should assume that all the strategy questions above and below are taking place in the early stages of a generic game.

A very similar system of equating situations with point values was described (more than twenty years ago, it's worth noting) in The Hidden Game of Football, by Pete Palmer, John Thorn, and Bob Carroll. When I read that, it drastically changed the way I watch football. I'm a bit of a freak, I'll give you that, but even for a normal and well-adjusted football fan, this chart has all sorts of interesting implications.

First let's talk about what we're supposed to be talking about: fourth down strategy. Suppose you have fourth-and-one at your own 20 yard line. No one ever even considers going for it in this situation because everyone focuses on what happens if you fail. Indeed, if you don't get the first, you have given your opponent the gift of a +3.6 point situation. That's bad. But what people fail to see is that the punt --- let's assume it nets 40 yards --- puts them in a good situation too: +1.4 points. So you're not gambling 3.6 points, you're gambling 2.2 points.

Let's say your probability of picking up the first is p. If you get it, you'll be in a +.4 situation. If you don't, you'll be in a -3.6 situation. So if you decide to go, your expected situation is p * (.4) + (1-p) * (-3.6). If you punt, your expected situation is -1.4. Setting those two equal and solving for p yields a breakeven point of about p = .55. In other words, if you think you have a 55% or better chance of making the first, you are better off going for it. As any football fan knows, but Romer demonstrates anyway, there is good reason to believe that teams often have a better than 55% chance of making a first down, but punt anyway. (What Romer does is actually a bit more complex, and takes into account the possibilities of blocked punts, fumbled punts, longer gains on the 4th down attempt, and essentially anything else that might happen.)
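For anyone who wants to try other situations, here is a minimal sketch of that breakeven calculation. It covers only the simplified three-outcome version described above, not Romer's fuller treatment, and the function and parameter names are my own.

```python
def breakeven_probability(value_if_made, value_if_failed, value_if_kick):
    """Conversion probability at which going for it equals kicking.

    Solves p * value_if_made + (1 - p) * value_if_failed = value_if_kick.
    """
    return (value_if_kick - value_if_failed) / (value_if_made - value_if_failed)


# Fourth-and-one at your own 20, using the values from the text.
p = breakeven_probability(value_if_made=0.4, value_if_failed=-3.6, value_if_kick=-1.4)
print(f"breakeven p = {p:.2f}")
```

The numerator is the 2.2 points you are gambling and the denominator is the 4-point gap between making it and not making it, which is where the .55 comes from.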

Let's examine a few other interesting implications of the model:

  • What is a successful inside-the-5 punt worth? According to the model, the difference between a first down at the 20 and a first down at the five yard line is about 1.25 points.
  • Take a look at the symmetry of the graph. A consequence of that symmetry is that the cost of a turnover is virtually independent of where it takes place on the field. If you turn the ball over on your own 10, you go from a -.3 situation to a -4.3 situation, so it costs you about 4 points. If you turn it over at midfield, you go from about a +2 situation to a -2 situation. Again, 4 points. Likewise, a turnover at your opponent's 20 moves you from a +3.6 to a -.4.
  • The chart quantifies what we all know: that yardage between the 20s is cheaper than red zone yardage. Moving 10 yards from your own 1 to your own 11 is worth the same number of points as moving 23 yards from your 11 to your 34. I think this is part of why punting isn't that great of a deal. Unless you're backed way up, the yardage that you gain by punting is cheap yardage. The slope of that curve in the non-red zone is about 1/18, which means that 18 yards is worth a point. So most punts gain you about two points worth of yardage. You lose the ball though, which is a four-point swing, so a typical punt is a -2 point play. A failed fourth down attempt, obviously, is worse than that, but it's not that much worse.
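The turnover and punt arithmetic in the last two bullets can be sketched as follows. The point values are the approximate ones quoted in this post, the 40-yard net punt is an assumption borrowed from the fourth-and-one example, and the names VALUE and turnover_cost are hypothetical.

```python
# Approximate first-and-10 values quoted in the post, keyed by yards
# from your own goal line (80 is the opponent's 20).
VALUE = {10: -0.3, 20: 0.4, 50: 2.0, 80: 3.6, 90: 4.3}


def turnover_cost(yard_line):
    """Points lost when the ball changes hands at the same spot.

    Before: you hold VALUE[yard_line]. After: the opponent has the ball
    100 - yard_line yards from his own goal line, which is worth
    -VALUE[100 - yard_line] to you.
    """
    return VALUE[yard_line] + VALUE[100 - yard_line]


costs = {spot: turnover_cost(spot) for spot in (10, 50, 80)}

# A punt treated as a turnover plus field position, at 18 yards per point.
YARDS_PER_POINT = 18
NET_PUNT_YARDS = 40  # assumption, taken from the fourth-and-one example
punt_value = NET_PUNT_YARDS / YARDS_PER_POINT - costs[50]

print({spot: round(cost, 1) for spot, cost in costs.items()})
print(f"typical punt: {punt_value:+.1f} points")
```

All three turnover costs land at about 4 points, and the punt comes out a little better than -2, in line with the rough figure above.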

Tomorrow I will describe the particulars of how Romer arrived at the all-important chart pictured above.

Posted in Statgeekery | 16 Comments »

David Romer’s paper

10th May 2006

I have an intuitive "proof" that coaches are generally too conservative on fourth downs. It goes like this. The next time you're watching a football game, pay attention for a situation where your team is on defense and it's fourth-and-1-or-2 somewhere around midfield. While the replay is on your screen, the announcers will spend a second or two discussing the possibility that they might go for it. You're not sure what's going to happen. When the TV cuts back to live action, you see the punter trotting onto the field. If you're like me, this is a huge relief. I am always nervous when I think they might go for it and relieved when they don't. That's my gut telling me that going for it is the right move there. Does your gut tell you the same thing?

I'm going to devote the next few (or several, depending on how it goes) posts to the topic of fourth down strategy. Readers of this blog are probably familiar with various studies indicating that coaches ought to go for it more often. The first such study I encountered was in a book called The Hidden Game of Football which I'm sure many of you have read.

A few years ago, this paper by Berkeley economist David Romer got a lot of publicity. It used to be titled It's Fourth Down and What Does the Bellman Equation Say? A Dynamic-Programming Analysis of Football Strategy, but he has changed the name to Do Firms Maximize? Evidence from Professional Football. It was just published last month in The Journal of Political Economy. The abstract says:

Examination of teams' actual decisions shows systematic, clear-cut, and overwhelmingly statistically significant departures from the decisions that would maximize teams' chances of winning.

The decisions he's talking about are whether to go for it or kick on fourth downs. As you can guess from the abstract, Romer concludes that coaches kick too much --- both field goals and punts --- and go for it too little on fourth down. The point of his paper, at least from the standpoint of a reader of the Journal of Political Economy, is to test whether firms truly do exhibit profit-maximizing behavior as is routinely assumed in economic theory. Romer starts by equating profits with wins:

the problem of maximizing profits [in football] plausibly reduces to the much simpler problem of maximizing the probability of winning

He then demonstrates rather convincingly that teams' fourth down decisions are not optimal from the standpoint of maximizing their probability of winning, and spends the last few pages of the paper wondering why. He points to several studies which indicate that people will under certain circumstances prefer options with less risk even when the expected payout is greater for the riskier option (and I mean the expected payout after taking into account the risk). In other words, people often value conservatism for conservatism's sake. Romer says:

previous work provides little evidence about the strength of the forces pushing decision-makers toward conservatism. The results of this paper suggest that the forces may be shockingly strong.

In my opinion, it's painfully obvious why coaches make these decisions. This is not an indictment of the paper --- I'm sure it's written in a style that's appropriate for the journal in which it appears --- but I didn't have the patience to fight through all the jargon in the last section of the paper. So Romer may have alluded to this, but I wasn't sure. The reason NFL coaches behave so conservatively in this situation is because they are behaving in such a way as to maximize not their probability of winning but their quality of life. For an NFL coach, quality of life certainly is largely determined by winning percentage and is also highly dependent on job security which is in turn largely determined by winning percentage. But the payoff of straying from "the book" is nowhere near worth the cost. Romer states:

This evidence suggests that a rough estimate of the potential gains from going for it more often on fourth downs is . . . an increase of about 2.1 percentage points in the probability of winning. Since an NFL season is 16 games long, this corresponds to slightly more than one additional win every three seasons.

Imagine you're an NFL coach. You have the option of winning an expected 6 games this year or winning an expected 6.33 games and fielding approximately 1,846,344 questions per day about your decision to go for it on fourth-and-one from your own 22 on your first drive. Those .33 wins aren't going to save your job. But unless your owner understands what you're doing and is also willing to ignore the legions of fans and writers who don't, your nonstandard decisions could cost you your job. Romer is obviously not claiming that you'll always make it if you go for it more often on fourth down; he's saying that, in the long run, the benefits you'll get when you do make it exceed the costs you incur when you don't. A coach employing the strategies suggested in this paper would frequently make the right choice and have it not work out. Ask Barry Switzer how fun that is.

Of course this brings up the question of how the non-optimal default fourth down strategies got into "the book" in the first place. I suspect that, given the game conditions in the early days of football, punting on fourth down was almost always the optimal decision. The conditions of the game changed slowly enough that no one noticed when some critical threshold was reached that should have caused the default decisions to change. But I'm really not sure about that.

I'll spend the next day (or more) describing the mathematical details of Romer's method and its implications. [EDIT: here is the link to the next post in the sequence.]

Posted in Statgeekery | 13 Comments »

Another ranking system

9th May 2006

I'm essentially writing these down for my own benefit, so that if I forget how some of these things work I'll have a document to refer to. If you enjoy reading along, sit a spell. If not, I should be on to different topics tomorrow.

There is a particular style of argument, rarely used in NFL discussions but a staple for college football fans, that is tempting to use because it is based on a very reasonable premise but that is always doomed to lose. You might call it the argument by transitivity. Notre Dame is better than LSU because Notre Dame beat Tennessee and Tennessee beat LSU. Oregon is better than Notre Dame because Oregon killed Stanford and Notre Dame barely beat them. Arizona State is better than Auburn because they beat Northwestern who beat Wisconsin who beat Auburn.

As you know, this argument can't be taken seriously because it can be used to prove that just about any team is better than just about any other team. If you want to have a little fun with it, this page will let you do just that. Now indulge me briefly while I break down the mathematics of this argument.

The scoreboard says:

Tennessee beat LSU by 3

It's not much of a stretch from there to:

Tennessee is 3 points better than LSU

If you wanted to construct a mathematical model out of that bit of information, you might do this:

R_ten - R_lsu = 3

where R_ten is Tennessee's rating and R_lsu is LSU's. Put that with the rest of your data, though, and your mathematical model is shot. It looks like this:

R_ten - R_lsu = 3
R_lsu - R_vandy = 28
R_vandy - R_ten = 4
[. . . about 800 more equations . . ]

You've got about 800 equations and about 120 unknowns, but you can already tell that there will be no solution. Tennessee's rating has to be bigger than LSU's, LSU's has to be bigger than Vanderbilt's, and Vanderbilt's has to be bigger than Tennessee's. Impossible. Mathematically speaking, there is simply no way to assign a number to every team in such a way that all the results match up with the numbers exactly. That's why the argument by transitivity fails.
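To see the contradiction in numbers, add the three equations together. Every rating appears once with a plus sign and once with a minus sign, so the left sides always cancel to zero, while the margins on the right add up to 35. A small sketch (the ratings plugged in are arbitrary, and cycle_sums is a name I made up):

```python
# The three results from the text: (winner, loser, margin).
games = [("ten", "lsu", 3), ("lsu", "vandy", 28), ("vandy", "ten", 4)]


def cycle_sums(games, ratings):
    """Return (sum of rating differences, sum of margins) over the games."""
    lhs = sum(ratings[winner] - ratings[loser] for winner, loser, _ in games)
    rhs = sum(margin for _, _, margin in games)
    return lhs, rhs


# Any ratings at all will do; these numbers are arbitrary.
lhs, rhs = cycle_sums(games, {"ten": 10.0, "lsu": 7.0, "vandy": -21.0})
print(lhs, rhs)
```

No choice of ratings can make zero equal 35, so the three equations can never all hold at once.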

At this point, you probably think I'm insulting your intelligence. You understood all that without me having to get all mathy on you. But I needed to get all mathy to describe what happens next. We know the argument by transitivity doesn't work. But it's still popular, and the reason is that its premise is reasonable. So let's add some extra stuff to give the argument a bit of wiggle room. When Tennessee beats LSU by 3, instead of saying:

R_ten - R_lsu = 3

I'll say

R_ten - R_lsu = 3 + e1

The extra e1 is a fudge factor. The above equation says, "The difference between Tennessee and LSU is 3 points plus or minus some other stuff that didn't show up on the scoreboard." So our collection of equations now looks like this:

R_ten - R_lsu = 3 + e1
R_lsu - R_vandy = 28 + e2
R_vandy - R_ten = 4 + e3
[. . . about 800 more equations . . ]

Remember that the es represent the stuff that didn't show up on the scoreboard. Since we want our ranking system to be objective, we take the viewpoint that the scoreboard is what matters and the es are there only because they have to be. So what we want to do is make the combined size of the es as small as possible. (For technical reasons that aren't important to the argument, we will want to minimize the sum of the squares of the es rather than the es themselves, but don't worry about that.)

Imagine that you have three dials --- one marked Tennessee, one marked LSU, and one marked Vanderbilt --- on a control panel. You can increase or decrease a team's rating by turning their dial. Now imagine that the total (squared) e is the volume. The object is to make the volume as low as possible. If you tune the Tennessee dial higher, then the volume from e1 goes down, but the volume from e3 goes up. As you tune LSU's dial, it affects the volume of e1 and e2, and Vandy's affects e2 and e3. The idea is to tune all three dials to a place that achieves the lowest possible volume. Now add 117 dials, each of which affects 11 or 12 es, tune to the lowest possible volume and you've got yourself a rating for all Division I college football teams.

The lower the volume, the lower the sum of the squared es and hence the better that set of ratings matches up with the actual game results. What we want to do is to find the lowest possible total, out of all possible sets of ratings. That would be the set of ratings that is the best match for the actual data. A computer, properly programmed, can find this collection of ratings.
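Here is one way such a program might look for the three-team example. This is a sketch of plain gradient descent, which is just the dial-tuning picture taken literally; it is not necessarily how any real ranking system does it, and the step size and number of passes are assumptions that happen to work for this tiny example (a bigger schedule would need a smaller step).

```python
games = [("ten", "lsu", 3), ("lsu", "vandy", 28), ("vandy", "ten", 4)]


def fit_ratings(games, step=0.1, passes=500):
    """Minimize the sum of squared fudge factors by nudging each dial."""
    ratings = {team: 0.0 for game in games for team in game[:2]}
    for _ in range(passes):
        nudges = {team: 0.0 for team in ratings}
        for winner, loser, margin in games:
            # e is the fudge factor in R_winner - R_loser = margin + e
            e = ratings[winner] - ratings[loser] - margin
            nudges[winner] -= 2 * e  # downhill direction for the winner's dial
            nudges[loser] += 2 * e   # and for the loser's dial
        for team in ratings:
            ratings[team] += step * nudges[team]
    return ratings


ratings = fit_ratings(games)
volume = sum((ratings[w] - ratings[l] - m) ** 2 for w, l, m in games)

for team, rating in ratings.items():
    print(f"{team} {rating:+.2f}")
print(f"sum of squared es: {volume:.1f}")
```

With only these three games the dials settle near Tennessee -0.3, LSU +8.3, and Vanderbilt -8.0, with each e at about -11.7. Starting every dial at zero keeps the ratings centered on zero; adding the same constant to all three would fit the games equally well.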

To summarize: if you want to play the transitivity game with any set of ratings, you're going to run into some contradictions. It's unavoidable. This system is designed to run into as few contradictions as possible. Or, more precisely, to minimize the total magnitude of all the contradictions.

OK, now here's the neat thing: the ranking system described above turns out to be the same as the one described yesterday. The descriptions are different and the mathematical tools used to get the answer are different, but you end up in the same place.

Have you ever, in your life, seen anything cooler than that?

Posted in BCS, Statgeekery | 2 Comments »

A very simple ranking system

8th May 2006

My friend Joe Bryant says that the BCS bowl matchups are like getting a shrimp cocktail at Morton's Steakhouse. Sure, it's better than what you normally eat, but at the same time it's frustrating and disappointing because you can see a bunch of far preferable alternatives right there in front of your eyes. I tend to agree. Nonetheless, it is not in any way an exaggeration to say that the BCS revived my interest in college football. Not because of the matchups the system has produced, but because it gave me an excuse to learn some very interesting mathematics.

As you probably know, the participants in the BCS championship game are determined in part by a collection of computer rankings. Those computer rankings are implementing algorithms that "work" because of various mathematical theorems. At some point, I'm going to use this blog to write down everything I know about the topic (which by the way is a drop in the bucket compared to what many other people know; I am not an expert, just a fan) in language that a sufficiently interested and patient non-mathematician can understand.

I'll start that off today by describing one of the most basic ranking algorithms.

The idea is to define a system of 32 equations in 32 unknowns. The solution to that system will be a collection of 32 numbers and those numbers will serve as the ratings of the 32 NFL teams. Define R_ind as Indianapolis' rating, R_pit as Pittsburgh's rating, and so on. Those are the unknowns. The equations are:

R_ind = 12.0 + (1/16) (R_bal + R_jax + R_cle + . . . . + R_ari)
R_pit = 8.2 + (1/16) (R_ten + R_hou + R_nwe + . . . . + R_det)
R_stl = -4.1 + (1/16) (R_sfo + R_ari + R_ten + . . . . + R_dal)

One equation for each team. The number just after the equal sign is that team's average point margin. In plain English, the first equation says:

The Colts' rating should equal their average point margin (which was +12), plus the average of their opponents' ratings

So every team's rating is their average point margin, adjusted up or down depending on the strength of their opponents. Thus an average team would have a rating of zero. Suppose a team plays a schedule that is, overall, exactly average. Then the sum of the terms in parentheses would be zero and the team's rating would be its average point margin. If a team played a tougher-than-average schedule, the sum of the terms in parentheses would be positive and so a team's rating would be bigger than its average point margin.

It would be easy to find the Colts' rating if we knew all their opponents' ratings. But we can't figure those out until we've figured out their opponents' ratings, and we can't figure those out until. . ., you get the idea. Everyone's rating essentially depends on everyone else's rating.

So how do you actually find the set of values that solves this system of equations? In high school you probably learned how to solve 2-by-2 and maybe 3-by-3 systems of equations by putting some numbers into a matrix, doing some complicated operations on that matrix, and then reading the solutions off the new matrix. Same thing here, except you've got a 32-by-32 matrix instead of a 2-by-2 matrix. If you wanted college football rankings, it'd be 120-by-120. I recommend using a computer.

It's more instructive, though, to solve it a different way. We'll start by giving everyone an initial rating, which is just their average point margin. I'll use the Colts as an example. Their initial rating is +12.0. Now look at the average of their opponents' initial ratings:

Opp Rating
ari -4.75
bal -2.12
cin 4.44
cle -4.31
hou -10.69
hou -10.69
jax 5.75
jax 5.75
nwe 2.56
pit 8.19
ram -4.12
sdg 6.62
sea 11.31
sfo -11.81
ten -7.62
ten -7.62

Those average -1.2, so the Colts' new rating will be 12.0 - 1.2, which is 10.8. So after this calculation the Colts' rating changed from +12 to +10.8. But meanwhile, every other team's rating changed as well, so we have to do the whole thing over again with the new ratings. On the second pass, the Colts' schedule looks a bit different:

Opp Rating
ari -4.76
bal -1.49
cin 4.09
cle -3.85
hou -9.69
hou -9.69
jax 4.85
jax 4.85
nwe 3.09
pit 8.02
ram -5.16
sdg 8.62
sea 8.99
sfo -10.77
ten -7.30
ten -7.30

The average of these is -1.1, so the Colts' opponents aren't quite as bad as they looked at first. Indy's new rating is 12.0 - 1.1, which is 10.9. Uh oh! Everyone else's ratings just changed again, so we've got to run through the same procedure again. And again. And again. And eventually the numbers stop changing. When that happens, you know you've arrived at the solution. Take a look at the Colts' schedule with the final ratings and you'll be able to convince yourself that this method works:

WK OPP Margin OppRating AdjMargin
1 bal 17 -1.83 15.17
2 jax 7 4.76 11.76
3 cle 7 -4.22 2.78
4 ten 21 -7.57 13.43
5 sfo 25 -11.15 13.85
6 ram 17 -5.15 11.85
7 hou 18 -10.03 7.97
9 nwe 19 3.14 22.14
10 hou 14 -10.03 3.97
11 cin 8 3.82 11.82
12 pit 19 7.81 26.81
13 ten 32 -7.57 24.43
14 jax 8 4.76 12.76
15 sdg -9 9.94 0.94
16 sea -15 9.11 -5.89
17 ari 4 -4.98 -0.98
AVERAGE 12.0 -1.20 10.80

How to read this table: in week 1, the Colts beat the Ravens by 17. The Ravens were, all things considered, 1.83 points worse than average, so the Colts got a "score" of 17 - 1.83, or 15.17 for that game. In week 2, the Colts beat the Jaguars by 7. Jacksonville was 4.76 points better than average, so the Colts get an 11.76 for that game. Average their scores for each game and you've got their rating. The bottom line says:

The Colts won their games by an average of 12 points each. Their opponents were, on average, 1.2 points worse than average. Thus the Colts were 10.8 points better than average.
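The repeat-until-the-numbers-stop-changing procedure is short enough to sketch in code. The four-team round-robin league below is hypothetical (I made up the teams and scores), and this is an illustration of the idea rather than the exact program behind the tables in this post.

```python
from collections import defaultdict

# Hypothetical four-team round robin: (team, opponent, team's point margin).
games = [
    ("A", "B", 7), ("A", "C", 3), ("A", "D", 10),
    ("B", "C", -4), ("B", "D", 6), ("C", "D", 1),
]

# Record every game from both teams' points of view.
schedule = defaultdict(list)
for team, opp, margin in games:
    schedule[team].append((opp, margin))
    schedule[opp].append((team, -margin))

avg_margin = {
    team: sum(m for _, m in played) / len(played)
    for team, played in schedule.items()
}


def simple_ratings(schedule, avg_margin, tolerance=1e-9, max_passes=10_000):
    """Repeat rating = average margin + average opponent rating until stable."""
    ratings = dict(avg_margin)  # initial rating is the average point margin
    for _ in range(max_passes):
        updated = {
            team: avg_margin[team]
            + sum(ratings[opp] for opp, _ in played) / len(played)
            for team, played in schedule.items()
        }
        biggest_change = max(abs(updated[t] - ratings[t]) for t in ratings)
        ratings = updated
        if biggest_change < tolerance:
            break
    return ratings


ratings = simple_ratings(schedule, avg_margin)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{team} {ratings[team]:+.2f}")
```

Team A's average margin is about +6.7, but its opponents average out to below zero, so its rating settles at +5.0. Once the loop stops, every team's rating equals its average margin plus the average of its opponents' ratings, which is exactly what the equations ask for.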

Let's examine some of the features of this system:

  • The numbers it spits out are easy to interpret - if Team A's rating is 3 bigger than Team B's, this means that the system thinks Team A is 3 points better than Team B. With most ranking algorithms, the numbers that come out have no real meaning that can be translated into an English sentence. With this system, the units are easy to understand.
  • It is a predictive system rather than a retrodictive system - this is a very important distinction. You can use these ratings to answer the question: which team is stronger? I.e. which team is more likely to win a game tomorrow? Or you can use them to answer the question: which of these teams accomplished more in the past? Some systems answer the first question more accurately; they are called predictive systems. Others answer the latter question more accurately; they are called retrodictive systems. As it turns out, this is a pretty good predictive system. For the reasons described below, it is not a good retrodictive system.
  • It weights all games equally - every football fan knows that the Colts' week 17 game against Arizona was a meaningless exhibition, but the algorithm gives it the same weight as all the rest of the games.
  • It weights all points equally, and therefore ignores wins and losses - take a look at the Colts season chart above. If you take away 10 points in week 3 and give them back 10 points in week 4, you've just changed their record, but you haven't changed their rating at all. If you take away 10 points in week 3 and give back 20 points in week 4, you have made their record worse but their rating better. Most football fans put a high premium on the few points that move you from a 3-point loss to a 3-point win and almost no weight on the many points that move you from a 20-point win to a 50-point win.
  • It is easily impressed by blowout victories - this system thinks a 50-point win and a 10-point loss is preferable to two 14-point wins. Most fans would disagree with that assessment.
  • It is slightly biased toward offensive-minded teams - because it considers point margins instead of point ratios, it treats a 50-30 win as more impressive than a 17-0 win. Again, this is an assessment that most fans would disagree with.
  • This should go without saying, but - I'll say it anyway. The system does not take into account injuries, weather conditions, yardage gained, the importance of the game, whether it was a Monday Night game or not, whether the quarterback's grandmother was sick, or anything else besides points scored and points allowed.
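The point about wins and losses is easy to check against the Colts' actual margins from the table above. The sketch below tracks only the average point margin, which is the part of the rating that a team's own scores feed directly; the helper names are made up.

```python
# The Colts' 16 game margins, in schedule order, from the table above.
margins = [17, 7, 7, 21, 25, 17, 18, 19, 14, 8, 19, 32, 8, -9, -15, 4]


def record_and_average(margins):
    """Return ((wins, losses), average point margin)."""
    wins = sum(1 for m in margins if m > 0)
    losses = sum(1 for m in margins if m < 0)
    return (wins, losses), sum(margins) / len(margins)


def shift_points(margins, take_from, taken, give_to, given):
    """Copy the margins, removing points from one game and adding to another."""
    shifted = list(margins)
    shifted[take_from] -= taken
    shifted[give_to] += given
    return shifted


WEEK_3, WEEK_4 = 2, 3  # list positions of the Cleveland and Tennessee games

actual = record_and_average(margins)
swapped = record_and_average(shift_points(margins, WEEK_3, 10, WEEK_4, 10))
lopsided = record_and_average(shift_points(margins, WEEK_3, 10, WEEK_4, 20))

for label, (record, average) in [
    ("actual", actual), ("take 10, give 10", swapped), ("take 10, give 20", lopsided),
]:
    print(f"{label}: {record[0]}-{record[1]}, average margin {average:+.3f}")
```

Moving 10 points from week 3 to week 4 turns 14-2 into 13-3 without touching the +12.0 average, and giving back 20 instead of 10 leaves the record at 13-3 while pushing the average up.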

This system, like all systems, has some drawbacks, but it has the virtue of simplicity. It is easy to understand and it produces numbers that are easy to interpret. That is not to be sneezed at.

Furthermore, most of its drawbacks have easy fixes. For example, when computing a team's initial rating --- i.e. their average point margin --- you can tweak the individual game margins to make the initial rating "smarter." One way to do that is to cap the margin of victory at 21 points, or 14 points or whatever you want. You can explicitly incorporate wins and losses by giving the winning team a bonus of 3 points or 10 points or however many you want. To take it to the extreme, you could simply define all wins to be one-point wins and all losses to be one-point losses. This removes margin of victory from the scene completely. As usual, when you tweak the method to strengthen its weaknesses, you also weaken its strengths. In particular, if you use a modified margin of victory, the numbers don't have as nice an interpretation.
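Those tweaks all amount to running each game's margin through a small function before averaging. Here is a sketch; the function and its parameter names are hypothetical, and I am assuming that whatever the winner gains (a bonus, a minimum margin) the loser gives up, so that the adjusted margins still cancel out across the league.

```python
def adjust_margin(margin, cap=None, min_win=None, win_bonus=0, ignore_margin=False):
    """Tweak one game's point margin before it is averaged into a rating."""
    if margin == 0:
        return 0  # a tie stays a tie under every tweak
    sign = 1 if margin > 0 else -1
    if ignore_margin:
        return sign  # every win is +1, every loss is -1
    size = abs(margin)
    if min_win is not None:
        size = max(size, min_win)  # narrow results count as at least min_win
    if cap is not None:
        size = min(size, cap)  # blowouts count as at most cap
    return sign * (size + win_bonus)


# A few of the Colts' margins under three different tweaks.
sample = [32, 7, 4, -9, -15]
capped = [adjust_margin(m, cap=21, min_win=7) for m in sample]
bonus = [adjust_margin(m, win_bonus=3) for m in sample]
wins_only = [adjust_margin(m, ignore_margin=True) for m in sample]

print(capped)
print(bonus)
print(wins_only)
```

The rest of the method is unchanged: average the adjusted margins to get each team's initial rating, then adjust for the schedule as before.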

I'll close with some rankings. Here are the NFL's 2005 regular season rankings according to the original method:

Team Rating StrOfSched
1. ind 10.8 -1.2
2. den 10.8 2.2
3. sdg 9.9 3.3
4. sea 9.1 -2.2
5. pit 7.8 -0.4
6. nyg 7.5 0.7
7. kan 7.0 2.1
8. was 6.0 1.9
9. car 5.1 -3.2
10. jax 4.8 -1.0
11. cin 3.8 -0.6
12. dal 3.2 2.1
13. nwe 3.1 0.6
14. chi 1.4 -2.2
15. mia -0.8 -0.8
16. tam -1.0 -2.6
17. atl -1.2 -1.9
18. bal -1.8 0.3
19. phi -2.3 2.6
20. oak -2.8 3.0
21. min -3.5 -1.1
22. gnb -3.7 -0.8
23. cle -4.2 0.1
24. ari -5.0 -0.2
25. ram -5.1 -1.0
26. buf -5.8 0.2
27. nyj -6.4 0.8
28. det -6.7 -1.0
29. ten -7.6 0.1
30. hou -10.0 0.7
31. nor -11.1 -0.9
32. sfo -11.1 0.7

Here they are if every win of less than 7 points is counted as a 7-point win and if the margin of victory is capped at 21.

Team Rating StrOfSched
1. den 10.1 1.6
2. ind 9.9 -1.4
3. sea 7.1 -1.9
4. sdg 6.9 2.9
5. nyg 6.3 0.7
6. pit 6.1 -0.6
7. was 5.5 1.6
8. kan 5.4 1.7
9. car 4.8 -2.3
10. jax 4.8 -1.1
11. cin 3.8 -0.9
12. dal 3.6 1.6
13. nwe 2.8 0.7
14. chi 1.5 -1.8
15. tam 0.9 -1.9
16. mia 0.6 -0.7
17. atl -0.4 -1.3
18. min -1.8 -1.1
19. phi -1.9 2.1
20. cle -3.2 -0.2
21. bal -3.4 0.3
22. oak -3.6 2.6
23. gnb -4.9 -0.5
24. buf -5.1 0.3
25. ram -5.1 -0.7
26. ari -5.1 -0.1
27. nyj -5.8 0.8
28. det -6.0 -0.8
29. ten -6.7 -0.1
30. sfo -8.1 0.5
31. nor -9.2 -0.4
32. hou -9.8 0.5

Here they are with margin of victory removed altogether:

Team Rating StrOfSched
1. den 0.69 0.07
2. ind 0.66 -0.09
3. sea 0.50 -0.12
4. jax 0.42 -0.08
5. nyg 0.42 0.04
6. was 0.37 0.12
7. pit 0.34 -0.03
8. kan 0.33 0.08
9. cin 0.31 -0.07
10. sdg 0.29 0.17
11. nwe 0.26 0.01
12. chi 0.26 -0.11
13. tam 0.25 -0.13
14. car 0.24 -0.13
15. dal 0.22 0.09
16. mia 0.06 -0.07
17. min 0.05 -0.07
18. atl -0.06 -0.06
19. phi -0.14 0.11
20. bal -0.23 0.02
21. cle -0.26 -0.01
22. ram -0.28 -0.03
23. oak -0.36 0.14
24. ari -0.37 0.01
25. buf -0.37 0.01
26. det -0.41 -0.03
27. sfo -0.44 0.06
28. nyj -0.45 0.05
29. gnb -0.49 0.01
30. ten -0.50 0.00
31. nor -0.63 -0.01
32. hou -0.71 0.04

ADDENDUM: I need to clarify one thing about the simple rating system: it’s not my system. I didn’t invent it. In fact, it’s one of those systems that has been around for so long that no one in particular is credited with having developed it (as far as I know anyway). People were almost certainly using it before I was born. I like the system and use it a lot because it’s fairly easy to interpret and understand, and because the math behind it is nifty. But I just realized that I had never been clear enough about the fact that it’s not my system. I just use it.

Posted in BCS, Statgeekery | 79 Comments »

Reggie Bush: punter

5th May 2006

Friday is the day for meaningless banter. With that in mind, I offer this old story, which indicates that Reggie Bush wants to petition to be able to wear number 5 in the NFL.

NFL rules, however, don't allow for running backs to take that number -- 33 years ago the league adopted a numbering system to make it easier for officials to differentiate players by position.

Under the rule, quarterbacks, punters and placekickers wear numbers 1 through 19. Running backs and defensive backs are assigned 20 through 49, while wide receivers and tight ends are given numbers 80 through 89.

Here is my question: are "quarterbacks, punters and placekickers, . . . running backs and defensive backs, . . . wide receivers and tight ends" actually defined in the rules of football or the rules of the NFL? As far as I know, on any given play there are players who are eligible receivers and players who are not eligible receivers (and are therefore not eligible to be downfield on pass plays). I can understand wanting to have different sets of numbers for those two groups. But is there anything in the rules that distinguishes a running back from a wide receiver or a quarterback?

What if some team decides to get creative with someone like Vince Young or Matt Jones, line him up all over the field, have him take the snap some of the time but not at other times, hand off some of the time and get handed off to at other times, pass the ball sometimes and receive it sometimes? What number is he allowed to wear? What if they let Reggie take a snap and kneel down at the end of a preseason game? Or kick an extra point or do a quick-kick? Or what if they just list him as the fourth-string kicker? Can he wear #5 then?

I'm with you, Reggie.

As an aside, why doesn't some team --- I'm looking in your direction, Jim Mora, Jr. --- get creative with someone like Vince Young or Matt Jones or Michael Vick, line him up all over the field, have him take the snap some of the time but not at other times, hand off some of the time and get handed off to at other times, pass the ball sometimes and receive it sometimes? Assuming Matt Schaub really is a good quarterback, I wonder what Bill Cowher would do if he had Michael Vick and Matt Schaub on his team...

Posted in General | 13 Comments »