
More on rating systems: margin of victory loss

Posted by Doug on May 23, 2006

Back in this post, I described a simple iterative ranking scheme. Like all rating systems, that one has its strengths and weaknesses.

That system is not one of the systems actually included in the BCS selection process, because a few years ago the BCS mandated that all their computer ranking algorithms must completely ignore margin of victory. This is a controversial topic among aficionados of ranking algorithms.

On one hand, the margin of victory contains extra information. If you know that Team A beat Team B, that tells you something about the relative strengths of the two teams. But if you know that Team A beat Team B by 31 points --- or by one point --- you know more about the relative strengths. You don't know everything, of course, but you know more, and it just makes good sense to include more data rather than less. On the other hand, using margin of victory is in some abstract sense contrary to the point of almost all sports, and football in particular. The only purpose of the score is to determine a winner. The team that wins by 31 may have looked more impressive than the team that wins by a single point, but The Institution Of Sport does not recognize them as having accomplished anything different. A win is a win.

In general, I can see the merits of both sides of the debate. But in the particular case of using mathematical algorithms to help determine which teams play in the official national championship game, as with the BCS, it certainly does make sense to remove margin of victory from consideration. Otherwise, teams would have an incentive to run up the score needlessly in a game that is already essentially over, which is almost universally considered poor sportsmanship (I don't necessarily agree with that almost-universally-held view, by the way, but that's another post). But whether it's bad sportsmanship or not, incentives change behavior. And at the very least, including margin of victory gives teams an incentive to attempt to inject false information into the equations.

Anyway, a reader named Vince posted this in the comments to the above-linked post:

My dad and I used to argue that teams should be measured on 1) W-L %, 2) Strength of schedule, and 3) Margin of loss. This came about in one college season where two teams who played similar schedules each had one loss, but one team lost by seven and the other by a huge margin.

Would it be possible to do a ranking that puts a margin of one point on all wins, but has no such limits on losses?

Vince and his dad are a couple of sharp dudes. By treating all wins --- but not all losses --- equally, we can capture some of the information contained in the score without giving teams any incentive to run it up. So I started playing around to see if I could figure out a way to make it mathematically feasible. And I think I did. Here is the plan:

  1. Figure out the average margin of victory in all games during the course of the season. In the 2005 NFL, it was about 11.7. That is, the winning team scored, on average, 11.7 more points than the losing team.
  2. Count every win as +11.7 points, and every loss as -N points, where N is the actual margin of the game. So a one-point loss is -1, and a 20-point loss is -20. A one-point win is +11.7, and a 20-point win is +11.7.
  3. Compute each team's average point margin using the strange accounting system described above. For example, the Chargers were 9-7 last year. Their seven losses were by a total of 43 points, so their average point margin would be (11.7 * 9 - 43) / 16, which is about +3.9. A team that went 16-0 would have a margin of +11.7, while a team that went 0-16 might have a margin anywhere from -1 to -40 depending on how lopsided their losses were.
  4. Now you've got a collection of average point margins that sum to exactly zero, so you can plug them into the same system we used previously. Simply adjust the ratings repeatedly until they stabilize.
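The four steps above can be sketched in a few lines of Python. To be clear, this is my own illustrative sketch, not the exact code behind the tables below: the function name mov_loss_ratings and the tiny three-team "season" are invented for illustration, and a real run would read in a full schedule of results.

```python
# Sketch of the margin-of-victory-loss rating scheme described above.
from collections import defaultdict

def mov_loss_ratings(games, iterations=1000):
    """games: list of (winner, loser, margin_of_victory) tuples."""
    # Step 1: league-wide average margin of victory.
    avg_win = sum(m for _, _, m in games) / len(games)

    # Steps 2 and 3: every win is worth +avg_win points, every loss is
    # worth minus the actual margin; average over each team's games.
    totals = defaultdict(float)
    schedule = defaultdict(list)
    for winner, loser, margin in games:
        totals[winner] += avg_win
        totals[loser] -= margin
        schedule[winner].append(loser)
        schedule[loser].append(winner)
    margins = {t: totals[t] / len(schedule[t]) for t in schedule}

    # Step 4: rating = own margin + average opponent rating; repeat
    # until the numbers stabilize.
    ratings = dict(margins)
    for _ in range(iterations):
        ratings = {t: margins[t] +
                      sum(ratings[opp] for opp in schedule[t]) / len(schedule[t])
                   for t in ratings}
    return ratings

# A made-up three-team round robin: A beat B by 10, B beat C by 2,
# A beat C by 6. The average win margin is 6, so each win counts +6.
games = [("A", "B", 10), ("B", "C", 2), ("A", "C", 6)]
r = mov_loss_ratings(games)
```

Note that the adjusted margins sum to zero by construction (every game adds +avg_win for the winner and the winners' margins subtract out in total), which is what lets the iteration settle down.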

Here are the ratings for the 2005 NFL using this scheme:

TM Rating StrOfSched
1. den 9.6 1.9
2. ind 7.2 -1.5
3. sdg 6.8 2.9
4. sea 6.3 -1.8
5. nyg 6.1 0.9
6. was 5.8 1.9
7. kan 5.5 1.8
8. jax 5.2 -1.6
9. pit 4.8 -0.8
10. dal 4.7 1.7
11. car 4.3 -1.9
12. nwe 3.4 0.7
13. cin 2.6 -1.2
14. tam 2.5 -1.9
15. chi 2.1 -1.9
16. mia 1.3 -0.5
17. atl -0.4 -1.0
18. phi -1.8 2.3
19. min -2.6 -0.9
20. oak -2.7 2.6
21. bal -2.9 -0.2
22. ram -3.2 -1.1
23. cle -3.4 -0.5
24. ari -4.5 -0.3
25. gnb -4.7 -0.6
26. buf -5.1 0.5
27. nyj -5.1 1.1
28. det -5.5 -0.7
29. ten -7.8 -0.4
30. nor -9.1 -0.0
31. sfo -9.1 0.7
32. hou -10.1 0.0

Remember I said there is no incentive to run up the score? In fact there is a disincentive to do so. If you run up the score, you do nothing to your average point margin (because all wins are counted the same), but you do hurt your opponent's point margin. This weakens your strength of schedule, which actually lowers your rating. Here is some "proof." The Colts beat the Cardinals 17-13 in the last game of the season last year. If we change that score to 57-13, here are the new ratings:
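The mechanism can be seen in isolation with a little arithmetic. One extra blowout barely moves a 256-game league average, so here I hold the win credit fixed at 11.7 and just compare adjusted margins; the helper adjusted_margin and the 8-2 team are hypothetical, while the Chargers numbers come from step 3 above.

```python
# The winner's credit for a win is always the league-average margin
# (11.7 in the 2005 NFL), so padding the score does nothing for the
# winner's own margin; it only drags down the loser's margin, and with
# it the winner's strength of schedule.
AVG_WIN = 11.7  # 2005 NFL average margin of victory, from the post

def adjusted_margin(wins, losses, total_loss_margin, avg_win=AVG_WIN):
    """Average adjusted point margin: each win +avg_win, each loss -actual."""
    return (wins * avg_win - total_loss_margin) / (wins + losses)

# The Chargers example from step 3: 9-7, losses totaling 43 points.
sdg = adjusted_margin(9, 7, 43)           # about +3.9

# A hypothetical 8-2 team whose second loss is either close or a blowout:
close   = adjusted_margin(8, 2, 10 + 4)   # lost by 10 and by 4
blowout = adjusted_margin(8, 2, 10 + 44)  # lost by 10 and by 44
# blowout < close: the padded score hurts only the loser's margin,
# which feeds back as a weaker schedule for the winner.
```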

TM Rating StrOfSched
1. den 9.9 2.0
2. ind 7.1 -1.7
3. sdg 7.0 3.0
4. sea 6.0 -2.3
5. nyg 6.0 0.7
6. was 5.7 1.8
7. kan 5.7 1.9
8. jax 5.1 -1.8
9. pit 5.1 -0.7
10. dal 4.6 1.5
11. car 4.4 -1.8
12. nwe 3.7 0.9
13. tam 2.8 -1.7
14. cin 2.8 -1.0
15. chi 2.3 -1.8
16. mia 1.6 -0.3
17. atl -0.2 -0.8
18. phi -1.9 2.1
19. min -2.4 -0.8
20. oak -2.5 2.8
21. bal -2.7 -0.1
22. cle -3.2 -0.4
23. ram -3.6 -1.5
24. gnb -4.5 -0.4
25. buf -4.8 0.7
26. nyj -4.9 1.4
27. det -5.5 -0.7
28. ari -7.1 -0.4
29. ten -7.9 -0.6
30. nor -8.9 0.1
31. sfo -9.5 0.3
32. hou -10.3 -0.2

The Cards drop four spots, as Vince and his dad think they should, but the Colts' rating also dropped just a hair. Instead of giving teams incentive to score, score, score in the closing moments of an already-decided contest, this system would actually give teams incentive to let the other team score. It's a kinder, gentler rating system. Hooray for everyone!

In all seriousness, though, I can't envision that becoming a practical problem if a system like this were installed as part of the BCS formula. One question I have is whether this system would produce college football ratings that look reasonable to most people. I think it would, but we'll have to run the numbers to find out for sure. I'll put that on the to-do list.