For reference, here is maximum likelihood, part I. This post won't make much sense unless you've read that one.
Remember that I likened the method of maximum likelihood to trying to twist a bunch of dials (one for each team) so that a particular quantity is as big as possible. If you're looking at a season of 1A college football, you've got 119 dials, the thing you're trying to maximize has about 800 parts to it, and each of the dials directly controls about 12 of those parts.
Suppose you're twiddling with the Florida dial. In that mess of 800 factors, you see (R_flo / (R_flo + R_vandy)). Turning up the Florida dial increases that piece. So turn it up. Likewise, cranking the Florida dial increases the (R_flo / (R_flo + R_arkansas)) bit, so you turn it up some more. But then you notice there's a (R_auburn / (R_auburn + R_flo)) piece in there. Turning up the Florida dial decreases this part. You could counteract that by turning up the Auburn dial, but you know you're going to have to pay a price for that eventually because of the (R_georgia / (R_auburn + R_georgia)) piece, among others.
The point is, there is a place which is "just right" for the Florida dial. They won a lot of games, many of them against good teams (this creates big denominators), so you want to turn their dial up. But you can't turn it up too much, or else it will turn down that Auburn/Florida piece, to the detriment of the entire product.
Now consider Ohio State's dial. Turn it up. Now turn it up some more. Now turn it up some more. Keep turning it up and, because the Buckeyes never lost a game, you'll never run into any problem. There's nothing stopping you from turning Ohio State's dial up to infinity. You can always make that product bigger by turning Ohio State's dial up. Their rating has to be infinite.
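If you want to see the difference between the Florida dial and the Ohio State dial for yourself, here is a rough Python sketch. The handful of games, the team keys, and the choice to hold every other dial at 1.0 are all made up for illustration; this is not the real 2006 schedule.

```python
import math

# Hypothetical toy results as (winner, loser) pairs, not real 2006 data.
GAMES = [
    ("flo", "vandy"), ("flo", "arkansas"), ("auburn", "flo"),
    ("osu", "mich"), ("osu", "texas"),
]

def log_likelihood(ratings, games=GAMES):
    """Log of the product of R_winner / (R_winner + R_loser) over all games."""
    return sum(math.log(ratings[w] / (ratings[w] + ratings[l])) for w, l in games)

def sweep(team, values, games=GAMES):
    """Turn one team's dial through `values`, holding every other dial at 1.0."""
    teams = {t for g in games for t in g}
    results = []
    for v in values:
        ratings = {t: 1.0 for t in teams}
        ratings[team] = v
        results.append(log_likelihood(ratings, games))
    return results

DIALS = [0.5, 1.0, 2.0, 4.0, 8.0, 1000.0]
florida = sweep("flo", DIALS)     # rises, peaks, then falls (Florida has a loss)
ohio_state = sweep("osu", DIALS)  # keeps rising (Ohio State has no loss)

print("Florida peaks at dial =", DIALS[florida.index(max(florida))])
print("Ohio State always rising:", all(a < b for a, b in zip(ohio_state, ohio_state[1:])))
```

In this toy setup the Florida dial has a "just right" spot in the middle of the range, while the Ohio State dial never stops paying off, which is the infinite-rating problem in miniature.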
That's OK, you say. Ohio State was undefeated and should be ranked first, right? Right, but then note that the same thinking applies to Boise State. They must, in a sense, necessarily be tied with Ohio State with an infinite rating. Is that what we want? Maybe, and maybe not, but I'm pretty sure most people don't want a system that mandates that undefeated teams always rank at the top no matter what.
But the plot thickens. Michigan's only loss was to Ohio State. So the only way it hurts you to turn up Michigan's dial is because of this term: (R_osu / (R_osu + R_mich)). But if Ohio State's rating is infinite, then you can turn up Michigan's dial without penalty. And since they won all the rest of their games, turning up the Michigan dial helps increase the product. So Michigan, it turns out, needs an infinite rating as well, though not quite as big of an infinite rating as Ohio State's [yes, I'm getting sloppy with the infinities here --- my goal is to give an impression of the way things work, not to be mathematically precise].
Now who else needs an infinite rating? Wisconsin, whose only loss was to Michigan. Once Michigan's dial is jacked up to a gazillion, it doesn't hurt you much to jack Wisconsin's up to a few million.
Rather than start talking about the technicalities of this infinity business, let's just summarize with this: the method of maximum likelihood, in its purest form, mandates that, no matter what the schedules look like, the top ranked teams must be those that have never lost, or have only lost to teams that have never lost, or have only lost to teams that have only lost to teams that have never lost, or ....
In many situations --- basketball, baseball, NFL --- this isn't generally a problem. For college football, it's a huge problem. It's certainly defensible to have Michigan ranked ahead of Florida. But even setting aside Boise State, I don't know too many people who think Wisconsin should be ranked ahead of Florida. Further, if you wanted to rank all 706 college football teams, then any undefeated Division III or NAIA team would have to rank ahead of Florida too.
In my opinion, maximum likelihood is one of the best rating systems around: it has a sound theoretical basis, is relatively easy to understand, and produces what most people consider to be sensible results in most cases. But all models break in some situations and this one unfortunately happens to break right when and where it's needed most: at the top of the standings of a typical college football season.
But there are some ways to fix it.
One way is simply to count a win as a 99% win and 1% loss. How do you do that? Well, the easiest way to think about it is to pretend that every game is 100 games, 99 of which were won by the winner and one of which was won by the loser. Now Ohio State isn't 12-0; they're 1188-12. But the point is that they are now in the denominator of a few terms for which they are not also in the numerator. So their rating won't be infinite. If you do this with the pre-bowl 2006 college football data, you knock Wisconsin down to #9.
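Here is a small Python sketch of that bookkeeping, and of why it puts a ceiling on an undefeated team's dial. The function names are mine, and the assumption that all twelve opponents sit at a rating of 1.0 is purely for illustration, so the particular peak it finds means nothing for the real season.

```python
import math

def split_record(wins, losses, win_share=0.99, copies=100):
    """Pretend each game was `copies` games, with `win_share` of them
    going to the actual winner and the rest to the actual loser."""
    won = round(copies * (wins * win_share + losses * (1 - win_share)))
    lost = copies * (wins + losses) - won
    return won, lost

def undefeated_log_likelihood(r_team, opponent_ratings, win_share=0.99):
    """Log-likelihood contribution of a team that won every game,
    when each win is counted as `win_share` win and the rest loss."""
    total = 0.0
    for r_opp in opponent_ratings:
        p_win = r_team / (r_team + r_opp)
        total += win_share * math.log(p_win) + (1 - win_share) * math.log(1 - p_win)
    return total

print(split_record(12, 0))  # (1188, 12)

# Toy assumption: twelve opponents, all rated 1.0.
opponents = [1.0] * 12
candidates = [10.0, 50.0, 99.0, 200.0, 1e6]
best = max(candidates, key=lambda r: undefeated_log_likelihood(r, opponents))
print(best)  # 99.0: cranking the dial past this point now costs you
```

The small loss share is what does the work: each game now contributes a piece that shrinks as the dial goes up, so there is a finite "just right" spot even for a team that never lost.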
This practicality, however, is gained at the expense of elegance. In particular, why 99%? Why not count a win as 94% of a win, or 63%, or 99.99%? The higher that number is, the more your rating system will depend on wins and losses. The lower it is, the more it will depend on strength of schedule. As soon as it gets below 94%, for example, Florida starts to rank ahead of Ohio State. [Astute observers will at this point suggest varying that percentage according to the margin of victory: a 1-point win could count as 60% of a win, for example, while a 28-point win could count as 99% of a win. This indeed can be done --- and I'll do it in a future post --- but for now I'm playing by BCS rules: only Ws and Ls.]
An arbitrary parameter just jars my sensibilities. It might "work" (depending on what you mean by "work"), but it ruins the nice clean description of this method. I have seen a couple of academic papers that employ more complicated fixes, but they also have a parameter and no objective basis for determining what that parameter ought to be.
What I prefer is the simple fix proposed by David Mease. He simply introduces a dummy team and gives every team a win and a loss against that dummy team. Problem solved; now no team is undefeated and no team will have an infinite rating. If you find this a kludgy or arbitrary solution that ruins the theoretical beauty of the method, then you can read Mease's paper, where he explains how the introduction of the dummy team can serve as a set of Bayesian priors. If you're into that kind of thing.
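To make the dummy-team idea concrete, here is a rough Python sketch that bolts it onto the R_a / (R_a + R_b) likelihood used in this post. This is not Mease's implementation. The solver is a standard fixed-point iteration for this kind of model (my choice, not something taken from his paper), the name `__dummy__` is made up, and the four-game schedule is a toy.

```python
def fit_with_dummy(games, iterations=2000):
    """Maximum likelihood ratings after giving every team one win and
    one loss against a dummy team.

    Solver (an assumption of this sketch): repeat
        R_i <- W_i / sum over i's games of 1 / (R_i + R_opponent)
    and rescale so the dummy team's rating stays at 1.0.
    """
    dummy = "__dummy__"  # hypothetical name for the extra team
    teams = sorted({t for g in games for t in g})
    all_games = list(games)
    for t in teams:
        all_games.append((t, dummy))  # every team beats the dummy once
        all_games.append((dummy, t))  # and loses to it once
    everyone = teams + [dummy]

    wins = {t: 0 for t in everyone}
    for winner, _ in all_games:
        wins[winner] += 1

    ratings = {t: 1.0 for t in everyone}
    for _ in range(iterations):
        new = {}
        for t in everyone:
            denom = 0.0
            for winner, loser in all_games:
                if winner == t:
                    denom += 1.0 / (ratings[t] + ratings[loser])
                elif loser == t:
                    denom += 1.0 / (ratings[t] + ratings[winner])
            new[t] = wins[t] / denom
        scale = new[dummy]  # pin the dummy team at 1.0
        ratings = {t: r / scale for t, r in new.items()}

    del ratings[dummy]
    return ratings

# Toy chain: an unbeaten team, a team that lost only to it, and so on.
toy = [("osu", "mich"), ("mich", "wisc"), ("wisc", "iowa")]
ratings = fit_with_dummy(toy)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 3))
```

Because every team now has a loss on its record, every dial has a finite best setting, and the unbeaten team at the top of the chain still comes out first.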
Mease's ratings are among my favorites and, if I were running the BCS, they'd be a part of it. Now back to Peter Wolfe, whose ratings are included in the BCS and who uses something he describes as a maximum likelihood method. He does not specify exactly how he fixes the infinite rating problem. I keep meaning to email and ask him, but for some reason I only remember to do so every year around early December, and I figure he's probably got enough emails to deal with in early December.
I have tried putting in a dummy team. I've tried counting wins as P percent wins for various values of P. But I can't replicate the order of Wolfe's rankings. That might have to do with the fact that Wolfe ranks all 706 college football teams, whereas I'm only ranking the D1 teams (with an additional "generic 1AA team" included to soak up the games against 1AA teams). Or he might have some elegant fix that I'm not aware of. Maybe in February or March I'll remember to email him and ask.
This entry was posted on Friday, December 15th, 2006 at 5:37 am and is filed under BCS, Statgeekery.