Game similarity scores

Posted by Doug on September 22, 2006

Back in this post, I took a quick ad hoc look at teams that started the season similarly to this season's Atlanta Falcons. As promised, I decided to do a more systematic analysis of all 32 teams. Here it is.

STEP 1: For every pair of team-games since 1978, build a similarity score according to the following formula:

  1. In traditional Bill James fashion, we start with 1000

  2. if one team won and the other lost, subtract 1000.

  3. if one team was on the road and the other was at home, subtract 200.

  4. from whatever is left, subtract (Team1Score - Team2Score)^2 + (Team1OppScore - Team2OppScore)^2 + (Team1Margin - Team2Margin)^2, where each team's Margin is its Score minus its OppScore.
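The four rules above can be sketched in Python as follows. This is a minimal illustration, not the code behind this post: the `Game` container and `game_similarity` function are hypothetical names, and the handling of ties (a tie never counts as "one team won and the other lost") is an assumption, since the rules don't say. Plugging in the Falcons' and 1981 Eagles' week 1 games reproduces the per-game arithmetic behind the Falcons table below.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """One team's result in one game (hypothetical container)."""
    home: bool      # True if the team played at home
    score: int      # points the team scored
    opp_score: int  # points the team allowed

    @property
    def margin(self) -> int:
        return self.score - self.opp_score


def game_similarity(g1: Game, g2: Game) -> int:
    """Similarity between two team-games, following rules 1-4."""
    sim = 1000                                   # 1. start at 1000
    if g1.margin * g2.margin < 0:                # 2. one won, the other lost
        sim -= 1000                              #    (ties never trigger this: an assumption)
    if g1.home != g2.home:                       # 3. one home game, one road game
        sim -= 200
    sim -= (g1.score - g2.score) ** 2            # 4. squared differences in
    sim -= (g1.opp_score - g2.opp_score) ** 2    #    points scored, points allowed,
    sim -= (g1.margin - g2.margin) ** 2          #    and margin
    return sim


atl_2006_week1 = Game(home=False, score=20, opp_score=6)
phi_1981_week1 = Game(home=False, score=24, opp_score=10)
print(game_similarity(atl_2006_week1, phi_1981_week1))  # 968
```

The 968 is 1000 - 4^2 - 4^2 - 0^2. The week 2 games (H 14-3 vs. H 13-3) score 998, and the two add up to the 1966 listed for the 1981 Eagles.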

Yes, I know there is some redundancy there, and the formula might be amenable to some algebraic tinkering that would make it look prettier. I basically just messed around until I came up with something that looked generally OK. Why squared? No good reason. Why 200 for home/road? Because it seemed sort of OK. That's about the level of rigor I was using.

STEP 2: For each team in the 2006 NFL, compare them with every other team since 1978 and construct a first-two-weeks similarity score with that team, according to the formula: Similarity = Week1Similarity + Week2Similarity. Why compare week 1 to week 1 and week 2 to week 2, instead of just comparing the first two weeks as a whole? I'm not sure I can defend that.
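Step 2 is then just a sum of two Step 1 scores. The sketch below encodes each game as a hypothetical `(at_home, points_for, points_against)` tuple and includes a compact version of the Step 1 rules so it runs on its own; the tie handling is my assumption. The three historical teams and their scores come from the Falcons comparables table below, and the printed totals should match its SIM column.

```python
from typing import Sequence, Tuple

Game = Tuple[bool, int, int]  # (at_home, points_for, points_against): hypothetical encoding


def game_similarity(a: Game, b: Game) -> int:
    """Step 1 score for one pair of games."""
    (home_a, pf_a, pa_a), (home_b, pf_b, pa_b) = a, b
    margin_a, margin_b = pf_a - pa_a, pf_b - pa_b
    sim = 1000
    if margin_a * margin_b < 0:   # one team won and the other lost (ties exempt: assumption)
        sim -= 1000
    if home_a != home_b:          # one home game, one road game
        sim -= 200
    return sim - (pf_a - pf_b) ** 2 - (pa_a - pa_b) ** 2 - (margin_a - margin_b) ** 2


def two_week_similarity(team: Sequence[Game], other: Sequence[Game]) -> int:
    """Step 2: week 1 vs. week 1 plus week 2 vs. week 2."""
    return sum(game_similarity(a, b) for a, b in zip(team[:2], other[:2]))


atl_2006 = [(False, 20, 6), (True, 14, 3)]
history = {
    "phi 1981": [(False, 24, 10), (True, 13, 3)],
    "ind 2005": [(False, 24, 7), (True, 10, 3)],
    "jax 2001": [(True, 21, 3), (True, 13, 6)],
}
for name, games in history.items():
    # expected, per the table: phi 1981 1966, ind 2005 1942, jax 2001 1748
    print(name, two_week_similarity(atl_2006, games))
```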

STEP 3: For each team in the 2006 NFL, look at the twenty most comparable teams and compute their average eventual record, weighted according to how strong the similarity is.
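Step 3 might look something like this. The post doesn't spell out the weighting, so this sketch simply assumes each comparable's weight is its raw similarity score, and that win totals have already been put on a 16-game basis; `project_wins` and its arguments are hypothetical. Raw scores only make sense as weights because the top comparables all score well above zero.

```python
import heapq
from typing import Dict


def project_wins(similarity: Dict[str, float],
                 wins16: Dict[str, float],
                 n: int = 20) -> float:
    """Similarity-weighted mean wins of the n most similar teams.

    ASSUMPTION: weight = raw similarity score; wins16 holds each team's
    win total already scaled to a 16-game season.
    """
    top = heapq.nlargest(n, similarity, key=similarity.get)  # n best matches
    total_weight = sum(similarity[t] for t in top)
    return sum(similarity[t] * wins16[t] for t in top) / total_weight


# Toy example using four rows from the Falcons table; keep only the top 2.
sims = {"phi 1981": 1966, "ind 2005": 1942, "pit 1998": 1776, "jax 2001": 1748}
wins = {"phi 1981": 10, "ind 2005": 14, "pit 1998": 7, "jax 2001": 6}
print(round(project_wins(sims, wins, n=2), 2))  # 11.99
```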

Here are the top twenty comparables to the 2006 Atlanta Falcons:


TM Year Week 1 Week 2 SIM Record
===============================================
atl 2006 R 20- 6 H 14- 3
phi 1981 R 24-10 H 13- 3 1966 10- 6-0
ind 2005 R 24- 7 H 10- 3 1942 14- 2-0
rai 1983 R 20-10 H 20- 6 1914 12- 4-0
tam 2005 R 24-13 H 19- 3 1876 11- 5-0
mia 1998 R 24-15 H 13- 7 1836 10- 6-0
pit 1998 R 20-13 H 17-12 1776 7- 9-0
det 2000 R 14-10 H 15-10 1762 9- 7-0
ram 1978 R 16-14 H 10- 0 1750 12- 4-0
jax 2001 H 21- 3 H 13- 6 1748 6-10-0
kan 1992 R 24-10 H 26- 7 1744 10- 6-0
ram 1986 R 16-10 H 16-13 1736 10- 6-0
atl 1998 R 19-14 H 17-12 1728 14- 2-0
sea 2004 R 21- 7 R 10- 6 1724 9- 7-0
tam 1980 R 17-12 H 10- 9 1722 5-10-1
pit 1978 R 28-17 H 21-10 1708 14- 2-0
buf 1980 H 17- 7 H 20-10 1688 11- 5-0
rai 1987 R 20- 0 H 27- 7 1662 5-10-0
cle 1978 H 24- 7 H 13-10 1660 8- 8-0
jax 2004 R 13-10 H 7- 6 1656 9- 7-0
mia 1979 R 9- 7 H 19-10 1656 10- 6-0

The weighted (and scaled to 16 games) average of those teams' win totals was 9.9 wins, and we might view that as a reasonable over/under for Atlanta's 2006 win total. According to this method, here are the projected win totals for each team this season:
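The 16-game scaling matters for strike seasons such as 1982 and 1987 (the 1987 Raiders went 5-10 in 15 games). The sketch below runs the Falcons' twenty comparables through that arithmetic, assuming each comparable is weighted by its raw similarity score and that a tie counts as half a win; both are assumptions, and `wins_per_16` is a hypothetical helper. Under those assumptions the result is about 9.89, which rounds to 9.9, while a plain unweighted mean of the same twenty records would be about 9.84. That agreement is consistent with similarity weighting but doesn't prove it was the exact scheme used.

```python
def wins_per_16(wins: int, losses: int, ties: int) -> float:
    """Scale a record to a 16-game season; a tie counts as half a win (ASSUMPTION)."""
    games = wins + losses + ties
    return 16 * (wins + 0.5 * ties) / games


# (similarity, wins, losses, ties) for the 20 Falcons comparables in the table above
ATL_2006_COMPS = [
    (1966, 10, 6, 0), (1942, 14, 2, 0), (1914, 12, 4, 0), (1876, 11, 5, 0),
    (1836, 10, 6, 0), (1776, 7, 9, 0), (1762, 9, 7, 0), (1750, 12, 4, 0),
    (1748, 6, 10, 0), (1744, 10, 6, 0), (1736, 10, 6, 0), (1728, 14, 2, 0),
    (1724, 9, 7, 0), (1722, 5, 10, 1), (1708, 14, 2, 0), (1688, 11, 5, 0),
    (1662, 5, 10, 0), (1660, 8, 8, 0), (1656, 9, 7, 0), (1656, 10, 6, 0),
]

# Weight each comparable's scaled win total by its similarity score (ASSUMPTION).
weighted = (sum(s * wins_per_16(w, l, t) for s, w, l, t in ATL_2006_COMPS)
            / sum(s for s, _, _, _ in ATL_2006_COMPS))
print(f"{weighted:.1f}")  # 9.9
```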


TM Proj Wins
==============
ind 10.1
atl 9.9
chi 9.8
cin 9.8
nor 9.8
bal 9.8
jax 9.6
sea 9.6
sdg 9.6
nwe 9.6
min 9.1
ari 8.9
dal 8.5
pit 8.5
den 8.1
buf 8.0
phi 7.9
nyg 7.7
stl 7.6
nyj 7.5
sfo 7.0
mia 6.4
hou 6.3
kan 6.2
was 5.9
det 5.8
ten 5.6
car 5.6
gnb 5.6
cle 5.4
tam 5.3
oak 5.3

The main lesson from this exercise is the same as in this post: don't get too excited or too upset about early-season results. The Bears have been destroying teams, and this method projects them to win only 7.8 of their remaining 14 games. The Raiders have looked as bad as an NFL team can look, and the method says they'll win 5.3 games.

In some sense, this exercise is a whole lot of work to get what I assume are very nearly the same results you'd get by running a simple regression of season wins on first-two-weeks record and scoring margin. But I like this method better, because it's not a black box.

You say the Bears should expect to win X games this year. Your friend calls BS: haven't you seen how dominant they've looked? If regression is what you've got, it's tough to give a decent counterargument unless he understands regression. But this method lays the reasoning right out there in a crystal clear way: the 1986 Falcons won their first two games by a combined 41 points and they ended up winning 7 games. The 1994 Seahawks won their first two by scores of 28-7 and 38-9, and they finished at 6-10. That is, of course, the same kind of information that your regression was taking into account, but it's just so much more transparent here.

Also, while I doubt it's actually happening here, this method is theoretically capable of picking out subtle combinations of things that regression wouldn't tell you, because you wouldn't think to ask it. For example, I don't know if this means anything or not, but it's intriguing that the Colts, who have scored a lot of points and also given up a lot, project better than the Chargers, Ravens, and Bears, who have scored a lot and given up almost none.

For those who want to investigate, here are those four teams' comparable lists:


TM Year Week 1 Week 2 SIM Record
===============================================
ind 2006 R 26-21 H 43-24
stl 1985 R 27-24 H 41-27 1948 5-11-0
nyj 1987 R 31-28 H 43-24 1922 6- 9-0
nor 2002 R 26-20 H 35-20 1902 9- 7-0
den 1993 R 26-20 H 34-17 1864 9- 7-0
atl 2004 R 21-19 H 34-17 1828 11- 5-0
den 1998 H 27-21 H 42-23 1796 14- 2-0
jax 1997 R 28-27 H 40-13 1750 11- 5-0
ram 1989 R 31-21 H 31-17 1732 11- 5-0
sfo 1984 R 30-27 H 37-31 1690 15- 1-0
kan 2003 H 27-14 H 41-20 1662 13- 3-0
sfo 1995 R 24-22 H 41-10 1642 11- 5-0
rai 1982 R 23-17 R 38-14 1624 8- 1-0
sea 1985 R 28-24 R 49-35 1604 8- 8-0
det 2004 R 20-16 H 28-16 1600 6-10-0
sea 1988 R 21-14 H 31-10 1578 9- 7-0
was 1978 R 16-14 H 35-30 1546 8- 8-0
dal 1983 R 31-30 R 34-17 1544 12- 4-0
pit 1992 R 29-24 H 27-10 1526 11- 5-0
mia 1990 R 27-24 H 30- 7 1512 12- 4-0
nwe 1999 R 30-28 H 31-28 1510 8- 8-0


TM Year Week 1 Week 2 SIM Record
===============================================
sdg 2006 R 27- 0 H 40- 7
gnb 1996 R 34- 3 H 39-13 1840 13- 3-0
sea 1994 R 28- 7 R 38- 9 1690 6-10-0
gnb 2001 H 28- 6 H 37- 0 1664 12- 4-0
cin 2005 R 27-13 H 37- 8 1636 11- 5-0
atl 1986 R 31-10 H 33-13 1594 7- 8-1
rai 1987 R 20- 0 H 27- 7 1564 5-10-0
buf 1981 H 31- 0 R 35- 3 1526 10- 6-0
phi 1980 H 27- 6 R 42- 7 1520 12- 4-0
den 2003 R 30-10 R 37-13 1516 10- 6-0
sdg 2002 R 34- 6 H 24- 3 1498 8- 8-0
tam 1992 H 23- 7 H 31- 3 1492 5-11-0
sfo 1996 H 27-11 H 34- 0 1472 12- 4-0
sea 1998 R 38- 0 H 33-14 1464 8- 8-0
mia 1981 R 20- 7 H 30-10 1428 11- 4-1
kan 1992 R 24-10 H 26- 7 1330 10- 6-0
sdg 1979 R 33-16 H 30-10 1330 12- 4-0
sea 2003 H 27-10 R 38- 0 1322 10- 6-0
buf 2003 H 31- 0 R 38-17 1320 6-10-0
mia 1996 H 24-10 R 38-10 1284 8- 8-0
mia 1984 R 35-17 H 28- 7 1278 14- 2-0


TM Year Week 1 Week 2 SIM Record
===============================================
bal 2006 R 27- 0 H 28- 6
rai 1987 R 20- 0 H 27- 7 1896 5-10-0
sdg 2002 R 34- 6 H 24- 3 1888 8- 8-0
atl 1986 R 31-10 H 33-13 1770 7- 8-1
gnb 1996 R 34- 3 H 39-13 1740 13- 3-0
kan 1992 R 24-10 H 26- 7 1708 10- 6-0
mia 1981 R 20- 7 H 30-10 1682 11- 4-1
sea 1998 R 38- 0 H 33-14 1660 8- 8-0
sdg 1979 R 33-16 H 30-10 1584 12- 4-0
sdg 1996 H 29- 7 H 27-14 1576 8- 8-0
dal 1981 R 26-10 H 30-17 1572 12- 4-0
mia 1984 R 35-17 H 28- 7 1564 14- 2-0
tam 1992 H 23- 7 H 31- 3 1560 5-11-0
sea 1994 R 28- 7 R 38- 9 1556 6-10-0
ram 1988 R 34- 7 H 17-10 1540 10- 6-0
sea 1984 H 33- 0 H 31-17 1534 12- 4-0
cin 2005 R 27-13 H 37- 8 1528 11- 5-0
den 2003 R 30-10 R 37-13 1508 10- 6-0
rai 1984 R 24-14 H 28- 7 1504 11- 5-0
pit 2005 H 34- 7 R 27- 7 1496 11- 5-0
dal 1995 R 35- 0 H 31-21 1494 12- 4-0


TM Year Week 1 Week 2 SIM Record
===============================================
chi 2006 R 26- 0 H 34- 7
gnb 1996 R 34- 3 H 39-13 1840 13- 3-0
rai 1987 R 20- 0 H 27- 7 1830 5-10-0
atl 1986 R 31-10 H 33-13 1764 7- 8-1
sdg 2002 R 34- 6 H 24- 3 1744 8- 8-0
sea 1994 R 28- 7 R 38- 9 1698 6-10-0
cin 2005 R 27-13 H 37- 8 1672 11- 5-0
mia 1981 R 20- 7 H 30-10 1672 11- 4-1
kan 1992 R 24-10 H 26- 7 1624 10- 6-0
tam 1992 H 23- 7 H 31- 3 1616 5-11-0
sea 1998 R 38- 0 H 33-14 1598 8- 8-0
den 2003 R 30-10 R 37-13 1594 10- 6-0
gnb 2001 H 28- 6 H 37- 0 1586 12- 4-0
sdg 1979 R 33-16 H 30-10 1540 12- 4-0
buf 1981 H 31- 0 R 35- 3 1508 10- 6-0
mia 1984 R 35-17 H 28- 7 1494 14- 2-0
dal 1981 R 26-10 H 30-17 1488 12- 4-0
sfo 1996 H 27-11 H 34- 0 1480 12- 4-0
rai 1984 R 24-14 H 28- 7 1472 11- 5-0
sdg 1996 H 29- 7 H 27-14 1432 8- 8-0
sea 1984 H 33- 0 H 31-17 1424 12- 4-0

The main objection I'd anticipate to this method is that it doesn't account for strength of schedule. The Chargers have only played the Raiders and Titans, who are terrible. The Falcons have played the Panthers and Bucs, who are, well, we don't really know what they are. As JKL pointed out in the comments to a previous post, it's very difficult to put a strength of schedule number on any of these teams right now (except for the teams that have played the Patriots, who have Tom Brady and are therefore known to be great). What do we use? Last year's record? This year's record? Some sort of power ranking scheme applied to the first two weeks? I'm not sure. And that uncertainty makes me reluctant to spend much effort trying to account for it.

This entry was posted on Friday, September 22nd, 2006 at 4:17 am and is filed under General, History, Statgeekery.