## Another ranking system

I'm essentially writing these down for my own benefit, so that if I forget how some of these things work I'll have a document to refer to. If you enjoy reading along, sit a spell. If not, I should be on to different topics tomorrow.

There is a particular style of argument, rarely used in NFL discussions but a staple for college football fans, that is tempting to use because it is based on a very reasonable premise but that is always doomed to lose. You might call it the argument by transitivity. Notre Dame is better than LSU because Notre Dame beat Tennessee and Tennessee beat LSU. Oregon is better than Notre Dame because Oregon killed Stanford and Notre Dame barely beat them. Arizona State is better than Auburn because they beat Northwestern who beat Wisconsin who beat Auburn.

As you know, this argument can't be taken seriously because it can be used to prove that just about any team is better than just about any other team. If you want to have a little fun with it, this page will let you do just that. Now indulge me briefly while I break down the mathematics of this argument.
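The "prove any team is better than any other" game is just a path search through the graph of who beat whom. Here is a toy sketch of how such a page might work; the game results in the dictionary are made up for illustration, and the function name is mine:

```python
from collections import deque

# Made-up results forming a cycle, which is what dooms the argument.
beats = {
    "Notre Dame": ["Tennessee"],
    "Tennessee": ["LSU"],
    "LSU": ["Vanderbilt"],
    "Vanderbilt": ["Notre Dame"],
}

def chain(start, goal):
    """Breadth-first search for a chain of wins start -> ... -> goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in beats.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(chain("Notre Dame", "Vanderbilt"))
print(chain("Vanderbilt", "Notre Dame"))  # the reverse chain exists too
```

Because real schedules are full of cycles like this one, a chain usually exists in both directions, which is exactly why the argument proves too much.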

The scoreboard says:

Tennessee beat LSU by 3

It's not much of a stretch from there to:

Tennessee is 3 points better than LSU

If you wanted to construct a mathematical model out of that bit of information, you might do this:

R_ten - R_lsu = 3

where *R_ten* is Tennessee's rating and *R_lsu* is LSU's. Put that with the rest of your data, though, and your mathematical model is shot. It looks like this:

R_ten - R_lsu = 3

R_lsu - R_vandy = 28

R_vandy - R_ten = 4

[. . . about 800 more equations . . .]

You've got about 800 equations and about 120 unknowns, but you can already tell that there will be no solution. Tennessee's rating has to be bigger than LSU's, LSU's has to be bigger than Vanderbilt's, and Vanderbilt's has to be bigger than Tennessee's. Impossible. Mathematically speaking, there is simply no way to assign a number to every team in such a way that all the results match up with the numbers exactly. That's why the argument by transitivity fails.
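You can check the impossibility on just the three equations above: adding them cancels every rating on the left side, but the right sides add to 35. A few lines of linear algebra confirm it (the matrix encoding is my own sketch of the post's equations):

```python
import numpy as np

# Rows encode R_ten - R_lsu = 3, R_lsu - R_vandy = 28, R_vandy - R_ten = 4,
# with columns ordered [Tennessee, LSU, Vanderbilt].
A = np.array([[ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0]])
b = np.array([3.0, 28.0, 4.0])

# The rows of A sum to the zero row, but the right-hand sides sum to 35,
# so the augmented matrix [A | b] has higher rank than A: no solution.
print(np.linalg.matrix_rank(np.column_stack([A, b]))
      > np.linalg.matrix_rank(A))  # True: the system is inconsistent
```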

At this point, you probably think I'm insulting your intelligence. You understood all that without me having to get all mathy on you. But I needed to get all mathy to describe what happens next. We know the argument by transitivity doesn't work. But it's still popular, and the reason is that its premise is reasonable. So let's add some extra stuff to give the argument a bit of wiggle room. When Tennessee beats LSU by 3, instead of saying:

R_ten - R_lsu = 3

I'll say

R_ten - R_lsu = 3 + e1

The extra *e1* is a fudge factor. The above equation says, "The difference between Tennessee and LSU is 3 points plus or minus some other stuff that didn't show up on the scoreboard." So our collection of equations now looks like this:

R_ten - R_lsu = 3 + e1

R_lsu - R_vandy = 28 + e2

R_vandy - R_ten = 4 + e3

[. . . about 800 more equations . . .]

Remember that the *e*s represent the stuff that didn't show up on the scoreboard. Since we want our ranking system to be objective, we take the viewpoint that the scoreboard is what matters and the *e*s are there only because they have to be. So what we want to do is make the combined size of the *e*s as small as possible. (For technical reasons that aren't important to the argument, we will want to minimize the sum of the squares of the *e*s rather than the *e*s themselves, but don't worry about that.)

Imagine that you have three dials --- one marked Tennessee, one marked LSU, and one marked Vanderbilt --- on a control panel. You can increase or decrease a team's rating by turning their dial. Now imagine that the total (squared) *e* is the volume. The object is to make the volume as low as possible. If you tune the Tennessee dial higher, then the volume from *e1* goes down, but the volume from *e3* goes up. As you tune LSU's dial, it affects the volume of *e1* and *e2*, and Vandy's affects *e2* and *e3*. The idea is to tune all three dials to a place that achieves the lowest possible volume. Now add 117 dials, each of which affects 11 or 12 *e*s, tune to the lowest possible volume, and you've got yourself a rating for all Division I college football teams.

The lower the volume, the lower the sum of the squared *e*s and hence the better that set of ratings matches up with the actual game results. What we want to do is to find the lowest possible total, out of all possible sets of ratings. That would be the set of ratings that is the best match for the actual data. A computer, properly programmed, can find this collection of ratings.
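Under the hood this is an ordinary least-squares problem, and a solver will turn all the dials at once. A minimal sketch on the toy three-team schedule (the solver call and variable names are mine, not the author's code):

```python
import numpy as np

# Rows encode the three games: R_ten - R_lsu = 3, R_lsu - R_vandy = 28,
# R_vandy - R_ten = 4; columns are [Tennessee, LSU, Vanderbilt].
A = np.array([[ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0]])
b = np.array([3.0, 28.0, 4.0])

# lstsq minimizes the sum of the squared e's.  Ratings are only
# determined up to an additive constant, so center them on zero.
r, *_ = np.linalg.lstsq(A, b, rcond=None)
r -= r.mean()

ratings = dict(zip(["Tennessee", "LSU", "Vanderbilt"], r))
e = A @ r - b   # the leftover fudge factors that no dial-turning removes
```

On this three-game cycle, symmetry leaves every fudge factor the same size no matter how you tune the dials; with a real 800-game schedule the computer is doing the same thing with 119 dials at once.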

To summarize: if you want to play the transitivity game with *any* set of ratings, you're going to run into some contradictions. It's unavoidable. This system is designed to run into as few contradictions as possible. Or, more precisely, to minimize the total magnitude of all the contradictions.

OK, now here's the neat thing: the ranking system described above turns out to be the same as the one described yesterday. The descriptions are different and the mathematical tools used to get the answer are different, but you end up in the same place.

Have you ever, in your life, seen anything cooler than that?

This entry was posted on Tuesday, May 9th, 2006 at 4:15 am and is filed under BCS, Statgeekery. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

Interesting stuff, but the fatal flaw of all approximations is that performance isn't static. Team X might have been better than Team Y at a certain point in the season, but theoretically worse at a later point. Moreover, Team X simply might match up particularly well with Team Y despite being an inferior team in most respects. All these things can be subjectively evaluated by someone well versed in football who has seen the teams in question play enough to have a general grasp of their performance level, but those people never vote in the polls anyway. If someone thinks that more than a handful of sportswriters know football, they don't know many sportswriters, and coaches don't have enough time to study all the teams in the nation even if they go to the trouble of submitting their own votes. So in the absence of a truly qualified panel of "experts," I put more of my faith in the computers and formulas such as the ones you described than in self-important scribes.

JDB: Which are the voters meant to reward, current form or the whole "body of work"? That's something that's never been clear to me.

For current form, here is a wordy explanation of my method:

Let's say two teams, Auburn and BYU, meet up. Take the difference between their current ratings to predict the result.

So if rating A = 22 and rating B = 18, we expect Auburn to win by 4.

Now let's say Auburn ends up winning by 16. That's 12 off from the predicted result. We adjust each rating by some fraction of that error. About 1/6 usually works best.

We adjust rating A up by 12/6 = 2, and rating B down by 2. Not because Auburn won, but because they did better than expected and BYU did worse than expected.

So rating A becomes 22+2=24, and rating B becomes 18-2 = 16.

There are some extra complications - I do have a function to round off blowouts, and a home field advantage. But that's basically it. You can start with old data and give teams any old rating to start with, and they converge to something that makes sense pretty quickly.
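The worked Auburn/BYU numbers above can be sketched as a tiny update function (the function name and signature are mine; only the 1/6 factor and the example numbers come from the comment, and the blowout-rounding and home-field pieces are omitted):

```python
def update(rating_a, rating_b, actual_margin, k=1/6):
    """Nudge both ratings toward the observed result by a fraction k
    of the prediction error (positive margin means team A won)."""
    predicted = rating_a - rating_b
    error = actual_margin - predicted
    adjust = k * error
    return rating_a + adjust, rating_b - adjust

# Auburn (22) vs. BYU (18): predicted margin 4, actual 16, error 12.
a, b = update(22.0, 18.0, 16.0)
# nudges Auburn up to 24 and BYU down to 16, matching the example above
```

Running this over every game of a season, repeatedly, is what makes arbitrary starting ratings converge to something sensible.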

Alternatively, there's this method of finding the best team in college football:

http://www.covehurst.net/ddyte/football/silver.htm