A very simple ranking system
Posted by Doug on Monday, May 8, 2006
My friend Joe Bryant says that the BCS bowl matchups are like getting a shrimp cocktail at Morton's Steakhouse. Sure, it's better than what you normally eat, but at the same time it's frustrating and disappointing because you can see a bunch of far preferable alternatives right there in front of your eyes. I tend to agree. Nonetheless, it is not in any way an exaggeration to say that the BCS revived my interest in college football. Not because of the matchups the system has produced, but because it gave me an excuse to learn some very interesting mathematics.
As you probably know, the participants in the BCS championship game are determined in part by a collection of computer rankings. Those computer rankings are implementing algorithms that "work" because of various mathematical theorems. At some point, I'm going to use this blog to write down everything I know about the topic (which by the way is a drop in the bucket compared to what many other people know; I am not an expert, just a fan) in language that a sufficiently interested and patient non-mathematician can understand.
I'll start that off today by describing one of the most basic ranking algorithms.
The idea is to define a system of 32 equations in 32 unknowns. The solution to that system will be a collection of 32 numbers, and those numbers will serve as the ratings of the 32 NFL teams. Define R_ind as Indianapolis' rating, R_pit as Pittsburgh's rating, and so on. Those are the unknowns. The equations are:
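$$
\begin{aligned}
R_{ind} &= 12.0 + \tfrac{1}{16}\left(R_{bal} + R_{jax} + R_{cle} + \cdots + R_{ari}\right) \\
R_{pit} &= 8.2 + \tfrac{1}{16}\left(R_{ten} + R_{hou} + R_{nwe} + \cdots + R_{det}\right) \\
&\;\;\vdots \\
R_{stl} &= -4.1 + \tfrac{1}{16}\left(R_{sfo} + R_{ari} + R_{ten} + \cdots + R_{dal}\right)
\end{aligned}
$$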
One equation for each team. The number just after the equal sign is that team's average point margin. In plain English, the first equation says:
The Colts' rating should equal their average point margin (which was +12), plus the average of their opponents' ratings
So every team's rating is its average point margin, adjusted up or down depending on the strength of its opponents. An average team has a rating of zero. Suppose a team plays a schedule that is, overall, exactly average. Then the sum of the terms in parentheses would be zero, and the team's rating would simply be its average point margin. If a team played a tougher-than-average schedule, the sum of the terms in parentheses would be positive, so its rating would be bigger than its average point margin.
It would be easy to find the Colts' rating if we knew all their opponents' ratings. But we can't figure those out until we've figured out their opponents' ratings, and we can't figure those out until . . . you get the idea. Everyone's rating depends on everyone else's rating.
So how do you actually find the set of values that solves this system of equations? In high school you probably learned how to solve 2-by-2 and maybe 3-by-3 systems of equations by putting some numbers into a matrix, doing some complicated operations on that matrix, and then reading the solutions off the new matrix. Same thing here, except you've got a 32-by-32 matrix instead of a 2-by-2 matrix. If you wanted college football rankings, it'd be 120-by-120. I recommend using a computer.
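For the computer route, here is a minimal sketch in plain Python (no libraries needed). It assumes a hypothetical game format of (team_a, team_b, margin) tuples, where margin is team_a's points minus team_b's, and it uses a made-up four-team league rather than real NFL data. One wrinkle the equations hide: adding the same constant to every rating leaves every equation true, so the 32-by-32 system is singular and has no unique solution by itself. The sketch swaps the last equation for an assumed normalization, namely that the games-weighted sum of ratings is zero (with equal schedules, that just says an average team rates 0).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def srs_direct(games):
    """Ratings from hypothetical (team_a, team_b, margin_for_a) tuples via one solve."""
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    margin_sum = [0.0] * n
    n_games = [0] * n
    meetings = [[0] * n for _ in range(n)]  # meetings[i][j] = games between i and j
    for a, b, margin in games:
        i, j = idx[a], idx[b]
        margin_sum[i] += margin
        margin_sum[j] -= margin
        n_games[i] += 1
        n_games[j] += 1
        meetings[i][j] += 1
        meetings[j][i] += 1
    # Row i: R_i - (1/n_i) * (sum of opponents' ratings) = average margin of team i
    a = [[(1.0 if i == j else 0.0) - meetings[i][j] / n_games[i] for j in range(n)]
         for i in range(n)]
    b = [margin_sum[i] / n_games[i] for i in range(n)]
    # Those rows fix ratings only up to a shared constant, so replace the last one
    # with an assumed normalization: games-weighted sum of ratings = 0.
    a[-1] = [float(g) for g in n_games]
    b[-1] = 0.0
    return dict(zip(teams, solve_linear(a, b)))


# Made-up league: every team plays every other team once
toy_games = [
    ("A", "B", 10), ("A", "C", 3), ("D", "A", 7),
    ("B", "C", 14), ("D", "B", 1), ("C", "D", 6),
]
print(srs_direct(toy_games))
```

For this toy league the solve works out to A = 1.5, B = 0.75, C = -2.75 and D = 0.5, up to floating-point rounding. In practice a library routine such as numpy.linalg.solve would replace the hand-rolled elimination.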
It's more instructive, though, to solve it a different way. We'll start by giving everyone an initial rating, which is just their average point margin. I'll use the Colts as an example. Their initial rating is +12.0. Now look at their opponents' initial ratings:
Opp Rating
===========
ari -4.75
bal -2.12
cin 4.44
cle -4.31
hou -10.69
hou -10.69
jax 5.75
jax 5.75
nwe 2.56
pit 8.19
ram -4.12
sdg 6.62
sea 11.31
sfo -11.81
ten -7.62
ten -7.62
Those average out to -1.2, so the Colts' new rating will be 12.0 - 1.2, which is 10.8. So after this calculation the Colts' rating changed from +12 to +10.8. But meanwhile, every other team's rating changed as well, so we have to do the whole thing over again with the new ratings. On the second pass, the Colts' schedule looks a bit different:
Opp Rating
===========
ari -4.76
bal -1.49
cin 4.09
cle -3.85
hou -9.69
hou -9.69
jax 4.85
jax 4.85
nwe 3.09
pit 8.02
ram -5.16
sdg 8.62
sea 8.99
sfo -10.77
ten -7.30
ten -7.30
The average of these is -1.1, so the Colts' opponents aren't quite as bad as they looked at first. Indy's new rating is 12.0 - 1.1, which is 10.9. Uh oh! Everyone else's ratings just changed again, so we've got to run through the same procedure again. And again. And again. Eventually the numbers stop changing. When that happens, you know you've arrived at the solution. Take a look at the Colts' schedule with the final ratings and you'll be able to convince yourself that this method works:
WK OPP Margin OppRating AdjMargin
==============================
1 bal 17 -1.83 15.17
2 jax 7 4.76 11.76
3 cle 7 -4.22 2.78
4 ten 21 -7.57 13.43
5 sfo 25 -11.15 13.85
6 ram 17 -5.15 11.85
7 hou 18 -10.03 7.97
9 nwe 19 3.14 22.14
10 hou 14 -10.03 3.97
11 cin 8 3.82 11.82
12 pit 19 7.81 26.81
13 ten 32 -7.57 24.43
14 jax 8 4.76 12.76
15 sdg -9 9.94 0.94
16 sea -15 9.11 -5.89
17 ari 4 -4.98 -0.98
==============================
AVERAGE 12.0 -1.20 10.80
==============================
How to read this table: in week 1, the Colts beat the Ravens by 17. The Ravens were, all things considered, 1.83 points worse than average, so the Colts got a "score" of 17 - 1.83, or 15.17 for that game. In week 2, the Colts beat the Jaguars by 7. Jacksonville was 4.76 points better than average, so the Colts get an 11.76 for that game. Average their scores for each game and you've got their rating. The bottom line says:
The Colts won their games by an average of 12 points each. Their opponents were, on average, 1.2 points worse than average. Thus the Colts were 10.8 points better than average.
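The repeat-until-the-numbers-stop-changing procedure walked through above fits in a short loop. A minimal Python sketch, assuming the same kind of hypothetical (team_a, team_b, margin) tuples and a made-up four-team league: each pass adds the opponents' current average rating to the team's original average margin (not to its previous rating), and the loop stops once no rating moves by more than a small tolerance. It assumes a connected schedule that is not split into two groups that only ever play each other; in that bipartite case this plain iteration can oscillate instead of settling.

```python
from collections import defaultdict


def srs_iterative(games, tol=1e-9, max_iter=10_000):
    """Iterate 'rating = average margin + average opponent rating' to a fixed point.

    games: (team_a, team_b, margin) tuples, margin = a's points minus b's points.
    Assumes a connected, non-bipartite schedule so the iteration settles.
    """
    margins = defaultdict(list)    # team -> its point margin in each game
    opponents = defaultdict(list)  # team -> its opponent in each game
    for a, b, margin in games:
        margins[a].append(margin)
        margins[b].append(-margin)
        opponents[a].append(b)
        opponents[b].append(a)
    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # initial rating = average point margin
    for _ in range(max_iter):
        sos = {t: sum(ratings[o] for o in opponents[t]) / len(opponents[t])
               for t in ratings}
        new = {t: avg_margin[t] + sos[t] for t in ratings}  # always restart from the margin
        change = max(abs(new[t] - ratings[t]) for t in ratings)
        ratings = new
        if change < tol:  # the numbers have stopped changing
            return ratings, avg_margin
    raise RuntimeError("ratings did not settle; check the schedule")


toy_games = [
    ("A", "B", 10), ("A", "C", 3), ("D", "A", 7),
    ("B", "C", 14), ("D", "B", 1), ("C", "D", 6),
]
ratings, avg_margin = srs_iterative(toy_games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    sos = ratings[team] - avg_margin[team]  # rating = margin + strength of schedule
    print(f"{team} rating {ratings[team]:6.2f}  margin {avg_margin[team]:6.2f}  SOS {sos:6.2f}")
```

For the toy league it settles at about A = 1.5, B = 0.75, D = 0.5 and C = -2.75, and each printed line splits the rating into average margin plus strength of schedule, the same way the Colts' bottom line does.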
Let's examine some of the features of this system:
- The numbers it spits out are easy to interpret - if Team A's rating is 3 bigger than Team B's, this means that the system thinks Team A is 3 points better than Team B. With most ranking algorithms, the numbers that come out have no real meaning that can be translated into an English sentence. With this system, the units are easy to understand.
- It is a predictive system rather than a retrodictive system - this is a very important distinction. You can use these ratings to answer the question: which team is stronger? I.e., which team is more likely to win a game tomorrow? Or you can use them to answer the question: which of these teams accomplished more in the past? Some systems answer the first question more accurately; they are called predictive systems. Others answer the latter question more accurately; they are called retrodictive systems. As it turns out, this is a pretty good predictive system. For the reasons described below, it is not a good retrodictive system.
- It weights all games equally - every football fan knows that the Colts' week 17 game against Arizona was a meaningless exhibition, but the algorithm gives it the same weight as all the rest of the games.
- It weights all points equally, and therefore ignores wins and losses - take a look at the Colts season chart above. If you take away 10 points in week 3 and give them back 10 points in week 4, you've just changed their record, but you haven't changed their rating at all. If you take away 10 points in week 3 and give back 20 points in week 4, you have made their record worse but their rating better. Most football fans put a high premium on the few points that move you from a 3-point loss to a 3-point win and almost no weight on the many points that move you from a 20-point win to a 50-point win.
- It is easily impressed by blowout victories - this system thinks a 50-point win and a 10-point loss is preferable to two 14-point wins. Most fans would disagree with that assessment.
- It is slightly biased toward offensive-minded teams - because it considers point margins instead of point ratios, it treats a 50-30 win as more impressive than a 17-0 win. Again, this is an assessment that most fans would disagree with.
- This should go without saying, but - I'll say it anyway. The system does not take into account injuries, weather conditions, yardage gained, the importance of the game, whether it was a Monday Night game or not, whether the quarterback's grandmother was sick, or anything else besides points scored and points allowed.
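The blowout and win/loss points can be checked with simple arithmetic. A tiny sketch, assuming two hypothetical teams whose opponents were all exactly average (rated 0), so each rating reduces to the average point margin:

```python
def rating_vs_average_schedule(margins):
    """Assumes every opponent is rated 0, so the rating is just the average margin."""
    return sum(margins) / len(margins)


split_with_blowout = rating_vs_average_schedule([50, -10])  # 1-1 record
two_solid_wins = rating_vs_average_schedule([14, 14])       # 2-0 record
print(split_with_blowout, two_solid_wins)                   # 20.0 14.0
print(split_with_blowout - two_solid_wins)                  # 6.0
```

Read as a point spread, the system would call the 1-1 team 6 points better than the 2-0 team.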
This system, like all systems, has some drawbacks, but it has the virtue of simplicity. It is easy to understand and it produces numbers that are easy to interpret. That is not to be sneezed at.
Furthermore, most of its drawbacks have easy fixes. For example, when computing a team's initial rating (i.e., its average point margin), you can tweak the individual game margins to make the initial rating "smarter." One way to do that is to cap the margin of victory at 21 points, or 14 points, or whatever you want. You can explicitly incorporate wins and losses by giving the winning team a bonus of 3 points or 10 points or however many you want. To take it to the extreme, you could simply define all wins to be one-point wins and all losses to be one-point losses. This removes margin of victory from the scene completely. As usual, when you tweak the method to shore up its weaknesses, you also weaken its strengths. In particular, if you use a modified margin of victory, the numbers don't have as nice an interpretation.
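The tweaks in the paragraph above all act on one game's margin before anything else happens, so they fit in one small function. This is a hypothetical helper, not the post's code. It assumes each adjustment is applied symmetrically, so the loser's margin is always the exact negative of the winner's, which keeps the league-wide point margin at zero, a balance the system relies on. Ties are left at 0. The 7-point floor with a 21-point cap is the variant used for the second set of rankings below.

```python
def adjusted_margin(margin, cap=None, min_win=None, win_bonus=0.0, wins_only=False):
    """Return a tweaked margin for one game, from one team's point of view.

    Hypothetical helper; every option is applied symmetrically (loser = -winner).
    cap:       largest margin counted (e.g. 21)
    min_win:   smallest margin counted for a decided game (e.g. 7)
    win_bonus: extra points added to the size of every decided game
    wins_only: count every win as +1 and every loss as -1
    """
    if margin == 0:
        return 0.0                 # ties are left alone
    sign = 1 if margin > 0 else -1
    if wins_only:
        return float(sign)
    size = float(abs(margin))
    if min_win is not None:
        size = max(size, min_win)  # e.g. a 3-point win counts as a 7-point win
    if cap is not None:
        size = min(size, cap)      # e.g. a 35-point win counts as a 21-point win
    return sign * (size + win_bonus)  # loser gets the exact negative


print(adjusted_margin(3, cap=21, min_win=7))    # 7.0
print(adjusted_margin(-35, cap=21, min_win=7))  # -21.0
print(adjusted_margin(10, win_bonus=3))         # 13.0
print(adjusted_margin(-10, wins_only=True))     # -1.0
```

The adjusted margins would then replace the raw ones as the constant terms in the equations.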
I'll close with some rankings. Here are the NFL's 2005 regular season rankings according to the original method:
Team Rating StrOfSched
=============================
1. ind 10.8 -1.2
2. den 10.8 2.2
3. sdg 9.9 3.3
4. sea 9.1 -2.2
5. pit 7.8 -0.4
6. nyg 7.5 0.7
7. kan 7.0 2.1
8. was 6.0 1.9
9. car 5.1 -3.2
10. jax 4.8 -1.0
11. cin 3.8 -0.6
12. dal 3.2 2.1
13. nwe 3.1 0.6
14. chi 1.4 -2.2
15. mia -0.8 -0.8
16. tam -1.0 -2.6
17. atl -1.2 -1.9
18. bal -1.8 0.3
19. phi -2.3 2.6
20. oak -2.8 3.0
21. min -3.5 -1.1
22. gnb -3.7 -0.8
23. cle -4.2 0.1
24. ari -5.0 -0.2
25. ram -5.1 -1.0
26. buf -5.8 0.2
27. nyj -6.4 0.8
28. det -6.7 -1.0
29. ten -7.6 0.1
30. hou -10.0 0.7
31. nor -11.1 -0.9
32. sfo -11.1 0.7
Here they are if every win of less than 7 points is counted as a 7-point win and if the margin of victory is capped at 21.
Team Rating StrOfSched
=============================
1. den 10.1 1.6
2. ind 9.9 -1.4
3. sea 7.1 -1.9
4. sdg 6.9 2.9
5. nyg 6.3 0.7
6. pit 6.1 -0.6
7. was 5.5 1.6
8. kan 5.4 1.7
9. car 4.8 -2.3
10. jax 4.8 -1.1
11. cin 3.8 -0.9
12. dal 3.6 1.6
13. nwe 2.8 0.7
14. chi 1.5 -1.8
15. tam 0.9 -1.9
16. mia 0.6 -0.7
17. atl -0.4 -1.3
18. min -1.8 -1.1
19. phi -1.9 2.1
20. cle -3.2 -0.2
21. bal -3.4 0.3
22. oak -3.6 2.6
23. gnb -4.9 -0.5
24. buf -5.1 0.3
25. ram -5.1 -0.7
26. ari -5.1 -0.1
27. nyj -5.8 0.8
28. det -6.0 -0.8
29. ten -6.7 -0.1
30. sfo -8.1 0.5
31. nor -9.2 -0.4
32. hou -9.8 0.5
Here they are with margin of victory removed altogether:
Team Rating StrOfSched
=============================
1. den 0.69 0.07
2. ind 0.66 -0.09
3. sea 0.50 -0.12
4. jax 0.42 -0.08
5. nyg 0.42 0.04
6. was 0.37 0.12
7. pit 0.34 -0.03
8. kan 0.33 0.08
9. cin 0.31 -0.07
10. sdg 0.29 0.17
11. nwe 0.26 0.01
12. chi 0.26 -0.11
13. tam 0.25 -0.13
14. car 0.24 -0.13
15. dal 0.22 0.09
16. mia 0.06 -0.07
17. min 0.05 -0.07
18. atl -0.06 -0.06
19. phi -0.14 0.11
20. bal -0.23 0.02
21. cle -0.26 -0.01
22. ram -0.28 -0.03
23. oak -0.36 0.14
24. ari -0.37 0.01
25. buf -0.37 0.01
26. det -0.41 -0.03
27. sfo -0.44 0.06
28. nyj -0.45 0.05
29. gnb -0.49 0.01
30. ten -0.50 0.00
31. nor -0.63 -0.01
32. hou -0.71 0.04
ADDENDUM: I need to clarify one thing about the simple rating system: it’s not my system. I didn’t invent it. In fact, it’s one of those systems that has been around for so long that no one in particular is credited with having developed it (as far as I know anyway). People were almost certainly using it before I was born. I like the system and use it a lot because it’s fairly easy to interpret and understand, and because the math behind it is nifty. But I just realized that I had never been clear enough about the fact that it’s not my system. I just use it.
This entry was posted on Monday, May 8th, 2006 at 4:06 AM and filed under BCS, Statgeekery.

i would never sneeze at that system! the vibron (which is often sneezed at) does something similar for fantasy points allowed per fantasy position. It sounds like I need to adjust for strength of schedule for my strength of schedule adjustment. i need a tissue. gls.
Posted on 08-May-06 at 10:01 am
Don't you need to figure out team A's opponents' margin of victory in games they didn't play against team A in order to get their true strength?
Posted on 08-May-06 at 4:23 pm
This is pretty cool. I'm not sure what the best thing to do with it is though.
Posted on 08-May-06 at 5:18 pm
Dave, that's essentially what the iteration does for you. Note how the Colts' SOS looked a little bit better the second time around than it did the first. That happened, in part, because the first pass adjusted those teams' strengths to account for the fact that they played the Colts.
Posted on 08-May-06 at 5:42 pm
Chase, I will show you a SUPER DUPER cool application of this system later in the week.
Posted on 08-May-06 at 7:38 pm
Ah, yes, that's right. I did change to the iteration method not too long ago in my power rankings.
Posted on 09-May-06 at 2:43 am
[...] Support pro-football-reference.com « A very simple ranking system [...]
Posted on 09-May-06 at 5:05 am
This is very cool.
I was wondering - would it make some sense to add a weighting to the rankings somehow? Say we're in week 8 of the season, and we're looking ahead to week 9 - I think you could argue that the performance of teams in week 8 would have better predictive ability than their performances in week 1.
I would think doing a weighted margin wouldn't be too hard - but I'm not sure how (or if you even should) go about weighting OPP Ranking.
And to complicate things more, Home/Away differences should probably be factored in as well.
Very cool stuff.
Posted on 09-May-06 at 9:14 am
MattyP,
One easy way to conceptualize different weights for different games is to just count more recent games as more games. In other words, say it's week 9 and you want to weight week 8's games as 3/11, week 7's games as 2/11, and the games of each of weeks 1-6 as 1/11. You could just put an extra copy of every week 7 game and two extra copies of every week 8 game into the data. Now, instead of 8 games, every team would have 11 "games," three of which would be duplicates. Then plug right into the same system and voila: weighted rankings.
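The duplication trick is easy to do in code: repeat each game according to its weight and hand the longer list to the same solver. A minimal sketch, with a hypothetical helper and a hypothetical {week: [games]} layout in which each game is a (team_a, team_b, margin) tuple:

```python
def weight_by_duplication(games_by_week, extra_copies):
    """Hypothetical helper: flatten {week: [game, ...]} into one list, repeating weeks.

    extra_copies maps week -> number of additional copies (weeks not listed get none).
    """
    weighted = []
    for week in sorted(games_by_week):
        copies = 1 + extra_copies.get(week, 0)
        weighted.extend(games_by_week[week] * copies)
    return weighted


# One made-up game per week for weeks 1-8; weights 1,1,1,1,1,1,2,3 out of 11
schedule = {week: [("A", "B", week)] for week in range(1, 9)}
weighted_games = weight_by_duplication(schedule, {7: 1, 8: 2})
print(len(weighted_games))  # 11
```

Only whole-number weights can be expressed this way; fractional weights would need a solver that multiplies each game by its weight instead.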
[Now the hard thing would be to construct a system of this type that takes into account the teams rating at the time the game was played. Playing the Eagles at the beginning of 2005 was much different from playing them at the end of 2005 but this model doesn't capture that, and I don't see how to easily modify it to do so.]
Home/away can be added to the model fairly easily as well. You add one more unknown (HFA) and one more equation (which translates to something like "HFA is the average point margin of the home team over all games minus the average ratings difference of the home team vs the visiting team.") You also modify each team's equation a bit by taking the home and road games into account. So instead of 32 equations and 32 unknowns, you've got 33 and 33. But you can still solve it the same way.
For college ratings, it's worthwhile to add this because some teams play more home games than road games, and the stronger team is more often at home. In the NFL, where every team plays the same number of home and road games, it won't change anyone's rating at all. You could get a value for home field advantage, but it would just be the average point margin of the home team.
I suppose for in-season NFL ratings, it might be worth doing. I guess it makes sense to do it for the 2005 NFL season in particular, too, because of the Giants-Saints game.
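A rough Python sketch of the 33-unknown version described above, assuming a hypothetical (home, away, home_margin) game format. It alternates between the team equations (each margin corrected by the current home-field estimate) and the HFA equation as stated in the comment. The check below uses only a balanced, made-up schedule in which every team has as many home games as road games; for lopsided schedules, treat convergence as an assumption or solve the 33-by-33 system directly.

```python
from collections import defaultdict


def srs_with_hfa(games, tol=1e-9, max_iter=10_000):
    """Ratings plus a home-field term from hypothetical (home, away, home_margin) tuples.

    Convergence is only assumed here (checked on a balanced schedule), not proven.
    """
    results = defaultdict(list)    # team -> (margin, +1 at home / -1 on the road)
    opponents = defaultdict(list)
    for home, away, margin in games:
        results[home].append((margin, 1))
        results[away].append((-margin, -1))
        opponents[home].append(away)
        opponents[away].append(home)
    ratings = {t: sum(m for m, _ in r) / len(r) for t, r in results.items()}
    hfa = 0.0
    for _ in range(max_iter):
        new = {}
        for t, r in results.items():
            # Margin with the home-field edge removed, plus the usual schedule term
            neutral = sum(m - hfa * side for m, side in r) / len(r)
            new[t] = neutral + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
        # HFA = average home margin minus average (home rating - away rating)
        new_hfa = sum(m - (new[h] - new[a]) for h, a, m in games) / len(games)
        change = max(abs(new_hfa - hfa), *(abs(new[t] - ratings[t]) for t in new))
        ratings, hfa = new, new_hfa
        if change < tol:
            return ratings, hfa
    raise RuntimeError("did not converge")


# Made-up double round robin generated from known strengths and a 2.5-point HFA
true_strength = {"A": 3.0, "B": 1.0, "C": -1.0, "D": -3.0}
games = [(h, a, true_strength[h] - true_strength[a] + 2.5)
         for h in true_strength for a in true_strength if h != a]
ratings, hfa = srs_with_hfa(games)
print({t: round(r, 2) for t, r in ratings.items()}, round(hfa, 2))
```

On this synthetic schedule it should recover strengths close to 3, 1, -1 and -3 and a home-field value close to 2.5, which also illustrates the point that with balanced home and road games the HFA term leaves the ratings unchanged.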
Posted on 09-May-06 at 9:46 am
Doug,
I think I was subconsciously making the jump to use the rankings as a predictive tool - that's where the weighting and home/away factoring came in.
I'm a big fan of Jeff Sagarin's rankings in USA Today (and on his website), so it's cool to look behind the curtain and see how ranking algorithms work.
Posted on 10-May-06 at 9:02 am
I know I'm late in replying to this, but I'm fascinated with the concept, particularly in adjusting the point values to measure different things.
My dad and I used to argue that teams should be measured on 1) W-L %, 2) Strength of schedule, and 3) Margin of loss. This came about in one college season where two teams who played similar schedules each had one loss, but one team lost by seven and the other by a huge margin.
Would it be possible, Doug, to do a ranking that puts a margin of one point on all wins, but has no such limits on losses?
And if you're ever out of blog ideas, I for one would love a step-by-step instruction method on how to set up rankings on our own, using Excel or whatever.
Posted on 12-May-06 at 8:42 pm
Interesting idea. It's not immediately clear how to tweak this system to do that because this system requires that everything balances just right. The total point margin for the entire league has to be zero.
I guess you could try this: say Texas beats OU by 37 points. Give +1 to Texas, and -37 to OU, and put the other +36 into a common pool that then gets redistributed at the end of the year among all wins by all teams. Then you've got your initial ratings that would average zero over all teams. I'm not 100% sure that this would work (I mean mathematically work) because I haven't thought completely through what would have to happen at the next steps.
And even if it works mathematically, I'm not sure it would "work" the way you and your old man want it to. I think it might push a 10-1 team with a 20-point loss down beneath some 7-4 teams with close losses, and I'm not sure you want to do that. Of course, I guess you could cap the loss margin at 14 or something.
Interesting idea. I'll throw it on the list.
Posted on 13-May-06 at 7:09 am
[...] Back in this post, I described a simple iterative ranking scheme. Like all rating systems, that one has its strengths and weaknesses. [...]
Posted on 23-May-06 at 4:11 am
[...] If you go by the basic power rating system, the Chargers were the third best team in the NFL last year with a rating of +9.9, which means that, if you adjust for the schedule they played, they were about 9.9 points better than an average team. According to that metric, the Chargers were the third-best team since the merger to be watching the postseason on TV: [...]
Posted on 25-May-06 at 4:07 am
"It is easily impressed by blowout victories"
Couldn't that be fixed by changing from straight margin of victory to something that gave most of the credit for the win/loss and only some for margin?
For example, what if we took the log of the margin and added one point for a win or loss (no point for a tie)? Assuming a binary log, this would make a 32-point win or loss a margin of 6 (1 + log base 2 of 32 = 1 + 5), which is twice a four-point win (1 + log base 2 of 4 = 1 + 2 = 3), three times a two-point win (1 + log base 2 of 2 = 1 + 1 = 2) and six times a single-point win (1 + log base 2 of 1 = 1 + 0 = 1). Increasing the base of the log increases the value of winning versus margin.
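A minimal sketch of the commenter's log proposal as a margin transform (a hypothetical helper, not an established method). With base 2, a 32-point win maps to 6, a 4-point win to 3, a 2-point win to 2 and a 1-point win to 1. The loser gets the negative of the winner's value, so the transformed margins still balance to zero and could be fed to the same solver in place of raw margins; scoring a tie as 0 is an assumption. It uses the standard library's two-argument math.log.

```python
import math


def log_margin(margin, base=2):
    """Hypothetical helper: 1 + log_base(|margin|) for a win, negated for a loss."""
    if margin == 0:
        return 0.0  # assumed: a tie gets no win/loss point and no margin credit
    value = 1 + math.log(abs(margin), base)
    return value if margin > 0 else -value


for m in (32, 4, 2, 1, -32):
    print(m, round(log_margin(m), 3))
```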
Another possibility would be to mark *all* wins as worth the average point margin. For example, if games had an average point margin of four (meaning that the winning team out scored the losing team by an average of four points), then each win would be worth four points. Then mark the losses by the actual margin. In this system, a 10-1 team would have to be blown out by a *lot* to be worse than a 7-4 team. I think that this might make Vince's suggestion work.
Posted on 27-May-06 at 1:22 pm
I guess I needed to read more first... Doug already did my second suggestion in his May 23rd post.
I still think that using logarithms rather than actual margins would be interesting.
Posted on 27-May-06 at 1:57 pm
[...] For each simulated season, I will assign each team a true strength which is a random number from a normal distribution with mean 0 and standard deviation 6. This means that the teams’ true strengths are mostly somewhat close to zero. In particular, roughly two-thirds of all teams will have true strengths between -6 and +6, about 95% of all teams will have true strengths between -12 and +12. As you probably guessed, these numbers were rigged so that they generally agree with the values that the simple rating system produces for real NFL seasons in this decade. [...]
Posted on 01-Jun-06 at 6:16 am
I do power rankings just based on yds and ppg differential. I would like an easier way to adjust my strength of schedule. This looks like something I'd like to try. How do I set it up on a spreadsheet? I've never done log. My numbers are very close to the top numbers for NCAA football, Sagarin and other top rankings. They were tracked here http://tbeck.freeshell.org/fb/results.txt. I started being tracked in the second half of the season, and did very well. Thanks
Posted on 25-Mar-07 at 2:35 pm
Dear Doug, thanks for the simple point differential power ranking system. Could you combine point differential with win/loss percent giving them equal weight? Or perhaps giving wins about two-thirds weight and point differential one-third? Could you apply your simple rankings to the 1998 Vikings and the 73 Rams, please. Two of my biggest heartaches. Thank you. Quinton.
Posted on 06-Sep-07 at 2:16 pm
Doug - I'm just reading this "A Very Simple Ranking System" and I placed it in an Excel spreadsheet. When I continually run the figures they grow instead of eventually settling. I am obviously doing something wrong.
Posted on 05-Nov-07 at 1:55 am
The Colts change from 12.00 to 10.80. Then using the 10.80, I run it again and the Colts change from 12.00 to 10.90. Then using the 10.90 I run it again and the Colts change from 12.00 to 10.84 and so on and so on but the figures never stop changing.
Those average -1.2, so the Colts’ new rating will be 12.0 - 1.2, which is 10.8. So after this calculation the Colts’ rating changed from +12 to +10.8. But meanwhile, every other team’s rating changed as well, so we have to do the whole thing over again with the new ratings. On the second pass, the Colts schedule looks a bit different:
The average of these is -1.1, so the Colts’ opponents aren’t quite as bad as they looked at first. Indy’s new rating is 12.0 - 1.1, which is 10.9.
I don't get it. You use the average MOV for the starting rating of each team, then you adjusted it by the average MOV for opponents to create the first rating. But when you calculate the second rating, shouldn't it be the adjusted rating (10.8) plus the average of the opponents' adjusted ratings? You keep using the MOV for the team and adjusting it by the new ratings average. That doesn't seem right.
Posted on 24-Dec-07 at 12:12 pm
Joe,
If we did what you suggest, then the algorithm couldn't possibly converge unless or until every team had an SOS of zero (do you see why?)
The rating system described in the above post is, of course, just one of many, many ways to derive a set of ratings. And I'm not necessarily claiming it's better than any other particular scheme. But it does "work" in the sense that every team's rating will always equal its nominal point margin plus (or minus) its SOS.
Posted on 24-Dec-07 at 1:37 pm
I don't have the programming chops to accomplish this feat, but wouldn't it be better to calculate this table by using point ratios?
Posted on 13-Apr-08 at 10:38 am
Doug,
Taking a second look at this recently, it occurred to me what you are doing. The Colts won by 12 points per game. You are trying to determine how to ration those 12 points between the Colts and their opponents. So the first time, you determine that their opponents were 1.2 points worse than average, so the Colts accounted for 10.8. Then the second go-round has the opponents at -1.1, so the Colts account for 10.9. Always the two should add up to 12. This is a long way of saying I get it.
Posted on 21-Apr-08 at 9:46 am
Doug,
Posted on 21-Jun-08 at 1:20 am
Very interesting information. If anyone has an Excel sheet they would not mind sharing, and a little time, I would like to learn on the fly to help me pass the time in Iraq.
[...] PFR’s: Explanation of SRS [...]
Posted on 26-Sep-08 at 12:34 pm
Doug,
I am a little confused. First you say that the way to figure out a team's rating is to solve a system of equations in which every team's rating depends on their opponents' ratings which depend on their opponents' ratings and so on. That makes complete sense to me. Set up a 32 x 32 matrix and solve it, done.
Then you go through this Colts example with all these iterations. Is the Colts example meant to be an alternate method of calculating the ratings, instead of solving the system of equations? Or is the Colts example an illustration of what is actually happening when you solve the system of equations?
P.S. I apologize for commenting on a post that is 2 and a half years old, but I just recently found the site.
Posted on 20-Nov-08 at 12:03 pm
Well, it's still solving the same equations. It's just doing it via iteration. But basically, yes. It's an alternate way of thinking about and deriving the same set of rankings.
Posted on 21-Nov-08 at 7:03 pm
So, in theory, if I wanted to determine who, statistically, had the best defense, run defense, or pass defense, etc., over the past 50 years, I could use this type of matrix to arrive at a pre-determined value to pit the '85 Bears against the '78 Steelers?
Posted on 30-Nov-08 at 4:25 pm
[...] the quality of team based upon point differential, strength of schedule, and quality of wins (an explanation of SRS can be found here). There are other variations on this system (John Hollinger has a system that isn’t [...]
Posted on 20-Sep-09 at 8:34 pm