Is less more?
A few weeks ago, Football Outsiders linked to my ten thousand seasons series. A general theme among the comments was that the simulation was inaccurate because it was based on season-long power ratings instead of last-few-weeks power ratings. Because teams' true strengths vary so much during the course of a season, I should have used a smaller but more recent sample instead of using all the data. Less is more. I thought about that for a while and pondered the possibility that those folks might have a good point.
That caused me to try to build a power rating system based on at-the-time strength of schedule, which I was unable to do. But a by-product of the effort was this post about at-the-time strength of schedule. Interestingly, the majority of the respondents to it felt that taking a five-week slice of data introduced too much variability into the numbers. Use all the data. More is more. I thought about that for a while and pondered the possibility that those folks might have a good point too.
So I decided to do a quick check. I looked at all games in weeks 10--13 during the years 1990--2005. For each game, I recorded the following information:
- the difference between the two teams' full-season at-the-time ratings according to the simple rating system.
- the difference between the two teams' last-5-weeks at-the-time ratings according to the same system.
So if it's week 12 of 2005 and San Francisco is playing Tennessee, we look at their week 1--11 ratings (which rate the Titans as about 5 points better) and their week 7--11 ratings (which rate the 49ers a couple of points better). I chose to look only at weeks 10 through 13 because week 10 is late enough to show some differentiation between the full-season and last-5-weeks ratings, and week 13 is early enough that most teams haven't given up or started to rest their regulars or whatever.
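As a rough illustration of the two inputs, here is a minimal sketch of how a rating gap over a window of weeks could be computed. It assumes the simple rating system in its usual form (each team's rating is its average point margin plus the average rating of its opponents) and solves it by damped iteration; the function names, the `(week, team_a, team_b, margin_for_a)` game format, and the damping are assumptions of this sketch, not the code behind the post.

```python
from collections import defaultdict

def simple_ratings(games, iterations=200):
    """games: iterable of (team_a, team_b, margin_for_a). Returns {team: rating}."""
    margins = defaultdict(list)    # team -> point margins, one per game
    opponents = defaultdict(list)  # team -> opponents faced, one per game
    for a, b, margin in games:
        margins[a].append(margin)
        margins[b].append(-margin)
        opponents[a].append(b)
        opponents[b].append(a)
    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # start from raw average margins
    for _ in range(iterations):
        updated = {}
        for t in ratings:
            sos = sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            # Damped update (an assumption of this sketch): move halfway toward
            # "average margin + average opponent rating" to avoid oscillation.
            updated[t] = 0.5 * ratings[t] + 0.5 * (avg_margin[t] + sos)
        mean = sum(updated.values()) / len(updated)
        ratings = {t: r - mean for t, r in updated.items()}  # keep the average at 0
    return ratings

def rating_gap(games, team_a, team_b, first_week, last_week):
    """Rating of team_a minus team_b using only games in [first_week, last_week].

    games: iterable of (week, team_a, team_b, margin_for_a).
    Raises KeyError if either team did not play inside the window.
    """
    window = [(a, b, m) for week, a, b, m in games
              if first_week <= week <= last_week]
    ratings = simple_ratings(window)
    return ratings[team_a] - ratings[team_b]
```

With a full list of 2005 games in that format, the week 12 example would call `rating_gap(games, "SF", "TEN", 1, 11)` for the full-season gap and `rating_gap(games, "SF", "TEN", 7, 11)` for the last-5-weeks gap (the team codes are placeholders).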
Now that we've got all the data collected, we run a logit regression to build a formula that will predict the winner of each game. Result: the last-5-weeks rating was not significant (in the official statistical sense). That means: if you know the full-season ratings, then there is not sufficient evidence to conclude that knowing the last-5-weeks ratings helps you predict the winners of this week's games.
If you build a formula that uses just last-5-weeks ratings, it will predict about 62% of the games correctly. If you build a formula that uses just full-season ratings, it will predict about 66.4% of the games correctly. If you build a formula that incorporates both, it will predict about 66.6% of the games correctly.
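The "percent of games predicted correctly" figures can be read as follows: take a fitted formula, pick whichever team it gives better than a 50% chance, and count how often that pick won. Here is a minimal sketch of that bookkeeping. The coefficients and the four games are hypothetical placeholders chosen for illustration, not values from the study.

```python
import math

def win_probability(full_gap, last5_gap, coef_full, coef_last5):
    """Logit formula: probability that the first team wins."""
    score = coef_full * full_gap + coef_last5 * last5_gap
    return 1.0 / (1.0 + math.exp(-score))

def pick_accuracy(games, coef_full, coef_last5):
    """Fraction of games where the formula's pick (probability > 0.5) won.

    games: list of (full_season_gap, last5_gap, won) with won in {0, 1}.
    """
    hits = 0
    for full_gap, last5_gap, won in games:
        picked = win_probability(full_gap, last5_gap, coef_full, coef_last5) > 0.5
        hits += int(picked == bool(won))
    return hits / len(games)

# HYPOTHETICAL games: (full-season gap, last-5-weeks gap, did first team win?)
HYPOTHETICAL_GAMES = [
    (5.0, -2.0, 1),
    (3.0, 4.0, 1),
    (-1.0, 2.0, 0),
    (2.0, -3.0, 0),
]

# A formula that ignores one rating simply has a zero coefficient on it.
print(pick_accuracy(HYPOTHETICAL_GAMES, 0.1, 0.0))  # full-season only
print(pick_accuracy(HYPOTHETICAL_GAMES, 0.0, 0.1))  # last-5-weeks only
```

Note that a one-rating formula with no intercept and a positive coefficient always picks the higher-rated team, so its accuracy is just the share of games won by the team that rating favors.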
Interesting.
One problem here is that the simple rating system does not take home field advantage into account. It could be modified to do so, but I've never bothered because NFL teams always play the same number of home and road games during the course of a season. But that's not true in a 5-week stretch, so the last-5-weeks ratings have a bit of noise included in them. I'm not sure how much of a difference that makes, but it might make some.
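To get a feel for the size of that noise, here is a back-of-the-envelope sketch. It assumes home field is worth a fixed number of points relative to a neutral site (the default of 3.0 is a placeholder assumption, not a figure from this post) and looks only at a team's own average margin, ignoring the knock-on effect through its opponents' ratings.

```python
def home_field_bias(home_games, road_games, home_edge=3.0):
    """Points by which a team's average margin is inflated (or deflated)
    by an uneven home/road split, assuming each home game adds home_edge
    and each road game subtracts it relative to a neutral field."""
    total = home_games + road_games
    return home_edge * (home_games - road_games) / total

print(home_field_bias(8, 8))  # balanced full season
print(home_field_bias(4, 1))  # lopsided five-week stretch
print(home_field_bias(2, 3))  # typical five-week stretch
```

Over a full season the split is even and the bias cancels; over five weeks the split can never be even, so some bias is always present under these assumptions.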
Assuming the above paragraph doesn't invalidate the study, this looks like pretty clear evidence that, in this case, less is not more.
This entry was posted on Wednesday, July 5th, 2006 at 6:13 am and is filed under General, Statgeekery.

Bravo! I found that debate on Football Outsiders to be very thought provoking. I kind of saw three models emerging out of that discussion:
1) closely spaced intrinsic strength ratings with small single game variance
2) more widely spaced intrinsic strength ratings with large single game variance
3) model 1, except the intrinsic strength ratings are allowed to vary over the course of a season.
The 3rd model seemed to emerge due to inadequacies of model 1.
My personal subjective opinion was that model 2 was the most appropriate, but model 3 was not easy to dismiss out of hand. Evidence suggesting that a particular team has weakened or strengthened over the course of a season I file under "splits happen". My analysis in the comments of Monday's column was my first attempt to put some meat to this argument, but suffered from insufficient data. Great job, especially since it supports my internal biases.
Hey Doug,
Couldn't you invent a weighting function which takes the full season into account, yet diminishes the contribution of games far away from the current game? Perhaps a suitable modification of e^{-x^2} (i.e., normal)?
The answer is, of course you can (though the programming might be cumbersome). Isn't that the direction this is moving?
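A minimal sketch of the kind of weighting this comment describes, assuming a Gaussian-shaped decay in the number of weeks since each game. The `scale` parameter (how many weeks back before a game's weight drops to about 37%) is a hypothetical knob with an arbitrary default, and the example applies the weights to raw point margins rather than to a full rating system.

```python
import math

def gaussian_weight(weeks_ago, scale=5.0):
    """Weight exp(-(x / scale)^2) for a game played `weeks_ago` weeks back."""
    return math.exp(-((weeks_ago / scale) ** 2))

def weighted_margin(margins_by_week, current_week, scale=5.0):
    """Weighted average point margin; recent games count for more.

    margins_by_week: list of (week, margin) for games before current_week.
    """
    num = den = 0.0
    for week, margin in margins_by_week:
        w = gaussian_weight(current_week - week, scale)
        num += w * margin
        den += w
    return num / den

# A team that won big in week 1 and lost big in week 11, viewed from week 12.
print(weighted_margin([(1, 10.0), (11, -10.0)], current_week=12))
```

A very large `scale` recovers the plain full-season average, and a small one approaches a last-few-weeks rating, so the two approaches compared in the post sit at opposite ends of this one dial.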
This is fascinating stuff. No wonder you're getting linked.
Well done, Doug! 5 game streaks aren't statistically significant? Good to finally get that issue resolved.
Very, very interesting stuff Doug. Good work! I think our brain logically remembers teams that turn it on late, but forgets about teams that turn it on late and then stink again. It's easy to remember the 2002 Jets (2-5, then 7-2) since they made the playoffs and won a game; but the 2002 Rams (0-5, 5-0, 2-4) are much easier to forget. But I never would have guessed the results as you showed them.
Doug,
I'd be curious to see some analysis of when the full-season formula succeeds and fails. Is it more accurate in the beginning of the season or towards the end? Is it better at predicting certain teams than others? Or better at predicting certain match-ups (two highly ranked teams, two lowly ranked teams, etc.)?
wthii, that sounds like what weighted DVOA does. IIRC, the weights are such that the most recent 8 weeks are at or near full weight while earlier weeks decline in weight until the first 3 have negligible impact.
"I think our brain logically remembers teams that turn it on late, but forgets about teams that turn it on late and then stink again."
There's a fair amount of evidence that teams do trend over the course of a season, and logically, you'd expect them to. This doesn't really say that doesn't happen. It just says that you don't have enough data (just using the SRS) to determine those trends accurately.
Logically it's the same thing as measuring someone's speed once an hour, with 10 mph precision. If someone decelerates by somewhere between 0 and 10 mph per hour, it'll take a long while to statistically verify the decrease. Doesn't mean it's not happening, just that you can't see it easily.
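The analogy can be put into numbers with a small sketch. It assumes "10 mph precision" means independent measurement noise with a standard deviation of 10 mph, that the trend is fitted with an ordinary least-squares line through equally spaced hourly readings, and that "verified" means the fitted slope is at least two standard errors from zero. All three are assumptions of this sketch, not something the comment specifies.

```python
import math

def slope_standard_error(n_readings, noise_sd):
    """Standard error of a least-squares slope through n equally spaced
    readings (one per hour) with independent noise of the given sd."""
    # Sum of squared deviations of 0, 1, ..., n-1 from their mean.
    spread = n_readings * (n_readings ** 2 - 1) / 12.0
    return noise_sd / math.sqrt(spread)

def readings_needed(true_slope, noise_sd, z=2.0):
    """Smallest number of hourly readings at which a trend of true_slope
    (mph per hour) is expected to sit z standard errors away from zero."""
    if true_slope == 0:
        raise ValueError("a zero trend can never be detected")
    n = 2
    while abs(true_slope) / slope_standard_error(n, noise_sd) < z:
        n += 1
    return n

print(readings_needed(5.0, 10.0))  # a fairly sharp deceleration
print(readings_needed(1.0, 10.0))  # a gentle one
```

The gentler the trend relative to the noise, the more readings it takes, which is the commenter's point about a five-week window of game results.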