Maximum likelihood with home field and margin of victory

Posted by Doug on December 19, 2006

Before reading this entry, make sure you've read part I and part II in the maximum likelihood series.

How to incorporate home field into a maximum likelihood model

In the basic model, we are trying to maximize the product of all R_i / (R_i + R_j), where each factor represents a game in which team i beat team j. In order to build home field advantage into the model, we need one additional parameter. Let's call it h. Think of it as a multiplier that affects the home team's rating.

Let's look at the same simple "season" we looked at last time:

A beat B
B beat C
C beat A
A beat C

In the basic model, we chose ratings A, B, and C so as to maximize:

P = A/(A+B) * B/(B+C) * C/(A+C) * A/(A+C)

Now let's assume that the home teams in those games were A, C, C, and A. If h is a multiplier that alters the home team's rating, then A's probability of winning that first game isn't A/(A+B), it's hA/(hA+B). And so the quantity to be maximized is:

P = hA/(hA+B) * B/(B+hC) * hC/(A+hC) * hA/(hA+C)
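
To make the formula concrete, here is a minimal Python sketch that evaluates P for any setting of the ratings and h. The names GAMES and likelihood are illustrative, not part of the original model; the game list is just the toy season above with home teams A, C, C, and A.

```python
from math import prod

# (winner, loser, home team) for the four-game toy season.
GAMES = [("A", "B", "A"), ("B", "C", "C"), ("C", "A", "C"), ("A", "C", "A")]

def likelihood(ratings, h, games=GAMES):
    """Product over games of the winner's win probability.

    The home team's rating is multiplied by h before the usual
    R_winner / (R_winner + R_loser) ratio is taken.
    """
    factors = []
    for winner, loser, home in games:
        w = ratings[winner] * (h if home == winner else 1.0)
        l = ratings[loser] * (h if home == loser else 1.0)
        factors.append(w / (w + l))
    return prod(factors)

even = {"A": 1.0, "B": 1.0, "C": 1.0}
print(likelihood(even, h=1.0))  # every game a coin flip
print(likelihood(even, h=2.0))  # same ratings, home teams favored
```

With every rating and h set to 1, each game is a coin flip and P = (1/2)^4 = 0.0625. Raising h to 2 while leaving the ratings at 1 lifts P to 8/81, about 0.099, because the home team won three of the four games.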

Now instead of having three dials (A, B, and C) to twiddle, we have four dials: A, B, C, and h. But it's the same game: set them all so as to maximize P. Here are the home-field-included rankings through week 14:


TM Rating Record
======================
sdg 5.600 11- 2- 0
ind 4.326 10- 3- 0
chi 4.141 11- 2- 0
bal 3.769 10- 3- 0
nwe 2.451 9- 4- 0
nor 1.688 9- 4- 0
cin 1.678 8- 5- 0
jax 1.545 8- 5- 0
dal 1.322 8- 5- 0
den 1.278 7- 6- 0
nyj 1.216 7- 6- 0
nyg 1.197 7- 6- 0
ten 1.078 6- 7- 0
buf 0.981 6- 7- 0
kan 0.861 7- 6- 0
phi 0.775 7- 6- 0
atl 0.720 7- 6- 0
sea 0.719 8- 5- 0
mia 0.688 6- 7- 0
pit 0.679 6- 7- 0
car 0.550 6- 7- 0
min 0.474 6- 7- 0
cle 0.405 4- 9- 0
gnb 0.404 5- 8- 0
hou 0.364 4- 9- 0
was 0.324 4- 9- 0
stl 0.292 5- 8- 0
sfo 0.273 5- 8- 0
tam 0.247 3-10- 0
ari 0.180 4- 9- 0
oak 0.114 2-11- 0
det 0.088 2-11- 0

HFA = 1.556

If you take two averagish teams, say the Bills and Titans, and plug in the numbers, you get a 63% probability of Tennessee beating Buffalo in Nashville and a 59% probability of the Bills winning that same matchup in Buffalo. If, on the other hand, you have two mismatched teams like the Colts and Lions, then home field means very little and you get 99% Colts in Indy and 97% Colts in Detroit.
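
The "plug in the numbers" step is just the hA/(hA+B) ratio applied to the table. A small sketch, using the four ratings and the HFA value from the table above (the function name home_win_prob is illustrative):

```python
# Ratings and HFA copied from the week-14 table above.
RATINGS = {"ten": 1.078, "buf": 0.981, "ind": 4.326, "det": 0.088}
HFA = 1.556

def home_win_prob(home_rating, away_rating, h=HFA):
    """Probability that the home team wins: h*R_home / (h*R_home + R_away)."""
    boosted = h * home_rating
    return boosted / (boosted + away_rating)

ten_at_home = home_win_prob(RATINGS["ten"], RATINGS["buf"])
buf_at_home = home_win_prob(RATINGS["buf"], RATINGS["ten"])
ind_at_home = home_win_prob(RATINGS["ind"], RATINGS["det"])
ind_on_road = 1.0 - home_win_prob(RATINGS["det"], RATINGS["ind"])

for label, p in [("Titans in Nashville", ten_at_home),
                 ("Bills in Buffalo", buf_at_home),
                 ("Colts in Indy", ind_at_home),
                 ("Colts in Detroit", ind_on_road)]:
    print(f"{label}: {p:.0%}")
```

Rounded to whole percentages, this reproduces the 63%, 59%, 99%, and 97% figures quoted above.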

How to incorporate margin of victory into a maximum likelihood model

Several months ago I told you about what I call the very simple rating system. That was a rating system that included only points scored and points allowed (and schedule). It doesn't directly consider wins and losses at all. However, by tinkering just a bit, you can turn it into a system that does consider wins and losses. In fact, you can turn it into a system that only considers wins and losses (and schedule). In doing so, you lose the theoretical elegance of the method, but you might get a system that "works" better. And most of the time that's what you want.

The situation here is similar. Maximum likelihood is a method that only considers wins and losses and doesn't consider margin of victory at all. But with a little tweaking you can turn it into a system that does exactly the opposite, or you can set it somewhere in between. Just as is the case with the simple rating system, tweaking the system in this way strips it of some of its abstract beauty. But if the tweak turns it into a tool that is better for the purpose you have in mind, then that's OK.

To incorporate margin of victory, all you have to do is (conceptually) pretend the game is 100 games and then decide based on the final score how you want to divvy up those hundred games between the two teams. Or, to put it another way, you want to award each team some percentage of a win and some percentage of a loss.
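
One way to read the 100-games idea in code: if a game is split into 100 pseudo-games and the home team is credited with a share s of them, the game's factor becomes p^(100s) * (1-p)^(100(1-s)), whose log is just 100 times s*log(p) + (1-s)*log(1-p). The factor of 100 doesn't move the maximum, so the sketch below drops it. Everything here is an assumption for illustration rather than the code behind the ratings in this post: the function names, the plain gradient-ascent fitter, its step size, and especially the shares in GAMES, which are invented for the toy season.

```python
import math

# (home, away, home team's share of the game). The shares are made up
# for illustration; they are NOT derived from real scores.
GAMES = [("A", "B", 0.63), ("C", "B", 0.45), ("C", "A", 0.52), ("A", "C", 0.79)]

def home_prob(log_r, log_h, home, away):
    """Model probability h*R_home / (h*R_home + R_away), computed from logs."""
    return 1.0 / (1.0 + math.exp(log_r[away] - log_r[home] - log_h))

def log_likelihood(games, log_r, log_h):
    """Sum over games of s*log(p) + (1-s)*log(1-p), s = home team's share."""
    total = 0.0
    for home, away, share in games:
        p = home_prob(log_r, log_h, home, away)
        total += share * math.log(p) + (1.0 - share) * math.log(1.0 - p)
    return total

def fit(games, steps=5000, lr=0.1):
    """Plain gradient ascent on log-ratings and log(h); returns (ratings, h).

    The fixed step size is an assumption that suits tiny examples only.
    """
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    log_r = {t: 0.0 for t in teams}
    log_h = 0.0
    for _ in range(steps):
        grad_r = {t: 0.0 for t in teams}
        grad_h = 0.0
        for home, away, share in games:
            resid = share - home_prob(log_r, log_h, home, away)
            grad_r[home] += resid  # derivative w.r.t. log R_home
            grad_r[away] -= resid  # derivative w.r.t. log R_away
            grad_h += resid        # derivative w.r.t. log h
        for t in teams:
            log_r[t] += lr * grad_r[t]
        log_h += lr * grad_h
    return {t: math.exp(v) for t, v in log_r.items()}, math.exp(log_h)

ratings, h = fit(GAMES)
print(ratings, h)
```

Ratings are only pinned down up to a common multiplier. Because the team gradients cancel at every step, this sketch ends with the log-ratings summing to zero, so the ratings have a geometric mean of 1; a different normalization would give the same predictions.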

The easiest way to do it is to award the entire game to the winner. That's just the basic margin-not-included system we've been talking about.

The other extreme would be to award something like

(1/2) * ( 1 + (WinnerPoints - LoserPoints)/(WinnerPoints + LoserPoints) )

to the winner. So for example a 17-10 win would be worth about .63 wins, while a 37-30 win would be worth about .55 and a 37-10 win would be worth .79. A shutout would always be worth one full win. Using this system, the NFL ratings through week 14 look like this:


TM Rating Record
======================
chi 1.940 11- 2- 0
jax 1.777 8- 5- 0
sdg 1.589 11- 2- 0
bal 1.551 10- 3- 0
dal 1.420 8- 5- 0
cin 1.370 8- 5- 0
nwe 1.360 9- 4- 0
nyg 1.348 7- 6- 0
ind 1.305 10- 3- 0
nor 1.231 9- 4- 0
den 1.211 7- 6- 0
mia 1.074 6- 7- 0
phi 1.071 7- 6- 0
buf 1.057 6- 7- 0
ten 0.946 6- 7- 0
kan 0.937 7- 6- 0
pit 0.924 6- 7- 0
car 0.918 6- 7- 0
atl 0.898 7- 6- 0
nyj 0.861 7- 6- 0
sea 0.854 8- 5- 0
hou 0.848 4- 9- 0
min 0.801 6- 7- 0
was 0.738 4- 9- 0
cle 0.699 4- 9- 0
stl 0.639 5- 8- 0
ari 0.617 4- 9- 0
sfo 0.616 5- 8- 0
det 0.604 2-11- 0
gnb 0.589 5- 8- 0
oak 0.521 2-11- 0
tam 0.483 3-10- 0

HFA = 1.158
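
The margin-only share formula used for this table is easy to check in code. A minimal sketch follows; the function name win_share is illustrative, and the guard for a 0-0 game is an added assumption, since the formula divides by total points.

```python
def win_share(winner_points, loser_points):
    """Winner's fraction of the game: (1/2) * (1 + margin / total points)."""
    total = winner_points + loser_points
    if total == 0:
        # Assumption: the formula is undefined when nobody scores.
        raise ValueError("share is undefined for a 0-0 game")
    return 0.5 * (1 + (winner_points - loser_points) / total)

for score in [(17, 10), (37, 30), (37, 10), (20, 0), (17, 16)]:
    print(score, round(win_share(*score), 3))
```

This gives about 0.630 for 17-10, 0.552 for 37-30, 0.787 for 37-10, exactly 1 for any shutout, and 0.515 for a 17-16 game.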

The "predictions" now look much more intuitive. Colts over Lions, instead of being a 95+% walkover for the Colts, is now seen as a 71% chance of a Colts win in Indy and a 65% chance of a Colts win in Detroit.

It makes sense that this change in the algorithm would result in much more conservative (and more realistic) predictions of future games. By treating a win as only a partial win, we're allowing the algorithm to use information that our brains are already using when we make a quick top-of-the-head guess. For instance, when the Colts beat Buffalo 17-16 in week 10, it goes down in the standings as one win for the Colts and one loss for the Bills, and the basic maximum likelihood model likewise counts it as a 100% win for the Colts. But the modified model instead sees it as a close game that really could have gone either way but that the Colts happened to win.

The model above treats that Colts win as a 52% win for Indy and a 48% win for Buffalo. Some people might think that goes a bit too far, that winning should count for something extra beyond the point margin. Those folks might award the winner a share like this:

.6 + .4 * ( (WinnerPoints - LoserPoints)/(WinnerPoints + LoserPoints) )

This guarantees winners at least 60% of the win. Not surprisingly, it will have the effect of making the rankings look more like the standings (but still not as much as the original margin-not-included model):


TM Rating Record
======================
chi 2.169 11- 2- 0
sdg 1.905 11- 2- 0
bal 1.795 10- 3- 0
jax 1.755 8- 5- 0
ind 1.600 10- 3- 0
nwe 1.508 9- 4- 0
cin 1.424 8- 5- 0
dal 1.418 8- 5- 0
nyg 1.329 7- 6- 0
nor 1.323 9- 4- 0
den 1.229 7- 6- 0
buf 1.054 6- 7- 0
phi 1.039 7- 6- 0
mia 1.011 6- 7- 0
ten 0.983 6- 7- 0
kan 0.938 7- 6- 0
nyj 0.919 7- 6- 0
pit 0.897 6- 7- 0
atl 0.883 7- 6- 0
car 0.858 6- 7- 0
sea 0.850 8- 5- 0
hou 0.753 4- 9- 0
min 0.748 6- 7- 0
was 0.656 4- 9- 0
cle 0.649 4- 9- 0
stl 0.576 5- 8- 0
gnb 0.564 5- 8- 0
sfo 0.557 5- 8- 0
ari 0.516 4- 9- 0
det 0.461 2-11- 0
tam 0.441 3-10- 0
oak 0.431 2-11- 0

HFA = 1.204
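
The two share formulas in this post differ only in the floor they guarantee the winner: 0.5 in the margin-only version and 0.6 in the last one. A hedged generalization makes that explicit. The floor parameter and the name blended_share are illustrative, not something defined in the model above, and the 0-0 guard is an added assumption.

```python
def blended_share(winner_points, loser_points, floor=0.6):
    """Winner's share: floor + (1 - floor) * margin / total points.

    floor=0.5 gives the margin-only formula, floor=0.6 the 60/40 split,
    and floor=1.0 the basic model that awards the whole game to the winner.
    """
    total = winner_points + loser_points
    if total == 0:
        # Assumption: the formula is undefined when nobody scores.
        raise ValueError("share is undefined for a 0-0 game")
    margin_ratio = (winner_points - loser_points) / total
    return floor + (1 - floor) * margin_ratio

# The 17-16 Colts win over Buffalo under each setting of the floor.
for floor in (0.5, 0.6, 1.0):
    print(floor, round(blended_share(17, 16, floor), 3))
```

For that 17-16 game the winner's share moves from about 0.515 with a floor of 0.5 to about 0.612 with a floor of 0.6, and to a full win with a floor of 1.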

This entry was posted on Tuesday, December 19th, 2006 at 6:56 am and is filed under BCS, Statgeekery.