# Pro Football Reference Blog

## Game similarity scores

Posted by Doug on September 22, 2006

Back in this post, I took a quick ad hoc look at teams that started the season similarly to this season's Atlanta Falcons. As promised, I decided to do a more systematic analysis of all 32 teams. Here it is.

STEP 1: for every pair of team games since 1978, build a similarity score according to the following formula:

1. start with 1000.

2. if one team won and the other lost, subtract 1000.

3. if one team was on the road and the other was at home, subtract 200.

4. from whatever is left, subtract (Team1Score - Team2Score)^2 + (Team1OppScore - Team2OppScore)^2 + (Team1Margin - Team2Margin)^2

Yes, I know there is some redundancy there, and the formula might be amenable to some algebraic tinkering that would make it look prettier. I basically just messed around until I came up with something that looked generally OK. Why squared? No good reason. Why 200 for home/road? Because it seemed sort of OK. That's about the level of rigor I was using.
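For anyone who wants to play along, here is a minimal Python sketch of the single-game formula. The `Game` class, the `game_similarity` name, and the treatment of ties as "not a win" are my own assumptions for illustration. The 1000-point starting value is the one that reproduces the SIM column in the tables below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One team's result in one game (hypothetical container)."""
    home: bool        # True if the team played at home
    points: int       # points scored by the team
    opp_points: int   # points scored by the opponent

    @property
    def margin(self) -> int:
        return self.points - self.opp_points

    @property
    def won(self) -> bool:
        # Assumption: a tie counts as "did not win".
        return self.margin > 0


BASE = 1000  # starting value per game, inferred from the SIM column


def game_similarity(a: Game, b: Game) -> int:
    """Similarity of two team games under the formula in STEP 1."""
    score = BASE
    if a.won != b.won:
        score -= 1000   # one team won, the other did not
    if a.home != b.home:
        score -= 200    # one at home, the other on the road
    score -= ((a.points - b.points) ** 2
              + (a.opp_points - b.opp_points) ** 2
              + (a.margin - b.margin) ** 2)
    return score


# Week 1: 2006 Falcons (road, 20-6) vs. 1981 Eagles (road, 24-10)
atl_2006_wk1 = Game(home=False, points=20, opp_points=6)
phi_1981_wk1 = Game(home=False, points=24, opp_points=10)
print(game_similarity(atl_2006_wk1, phi_1981_wk1))  # 968
```

As for the redundancy: the margin difference is just the score difference minus the opponent-score difference, so if a and b are those two differences, the amount subtracted works out to 2a^2 + 2b^2 - 2ab.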

STEP 2: For each team in the 2006 NFL, compare them with every other team since 1978 and construct a first-two-weeks similarity score with that team, according to the formula: Similarity = Week1Similarity + Week2Similarity. Why compare week 1 to week 1 and week 2 to week 2, instead of just comparing the first two weeks as a whole? I'm not sure I can defend that.
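Here is a rough sketch of the two-week score, checked against two rows of the Falcons table below. The tuple layout and function names are made up for this example, and the per-game starting value of 1000 is inferred from the published SIM numbers.

```python
from typing import Sequence, Tuple

# (site, points for, points against); site is "H" or "R" as in the tables
Game = Tuple[str, int, int]


def game_similarity(a: Game, b: Game, base: int = 1000) -> int:
    """Single-game similarity; base=1000 is inferred from the SIM column."""
    site_a, pts_a, opp_a = a
    site_b, pts_b, opp_b = b
    score = base
    if (pts_a > opp_a) != (pts_b > opp_b):
        score -= 1000   # one team won, the other did not
    if site_a != site_b:
        score -= 200    # home vs. road mismatch
    margin_diff = (pts_a - opp_a) - (pts_b - opp_b)
    score -= (pts_a - pts_b) ** 2 + (opp_a - opp_b) ** 2 + margin_diff ** 2
    return score


def start_similarity(team_a: Sequence[Game], team_b: Sequence[Game]) -> int:
    """Week 1 vs. week 1 plus week 2 vs. week 2, as described in STEP 2."""
    return sum(game_similarity(x, y) for x, y in zip(team_a, team_b))


atl_2006 = [("R", 20, 6), ("H", 14, 3)]
phi_1981 = [("R", 24, 10), ("H", 13, 3)]
jax_2001 = [("H", 21, 3), ("H", 13, 6)]

print(start_similarity(atl_2006, phi_1981))  # 1966
print(start_similarity(atl_2006, jax_2001))  # 1748
```

Because the weeks are paired up by position, a team that blew out its week 1 opponent and squeaked by in week 2 will not look much like a team that did it in the opposite order.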

STEP 3: For each team in the 2006 NFL, look at the twenty most comparable teams and compute their average eventual record, weighted according to how strong the similarity is.
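One way STEP 3 could be sketched is below. The exact weighting is not spelled out above, so this version makes two loud assumptions: each comparable is weighted by its raw similarity score, and each record is scaled to 16 games with a tie counted as half a win. The function names and the toy numbers are hypothetical.

```python
from typing import List, Tuple

# (similarity, wins, losses, ties) for one comparable team
Comparable = Tuple[int, int, int, int]


def wins_per_16(wins: int, losses: int, ties: int) -> float:
    """Scale a record to a 16-game season; a tie counts as half a win (assumed)."""
    games = wins + losses + ties
    return 16 * (wins + 0.5 * ties) / games


def projected_wins(comparables: List[Comparable], top_n: int = 20) -> float:
    """Similarity-weighted average of the top_n comparables' scaled win totals.

    Assumes the weight is the raw similarity score and that the top_n
    scores are positive, which holds for every list in this post.
    """
    top = sorted(comparables, key=lambda c: c[0], reverse=True)[:top_n]
    total_weight = sum(c[0] for c in top)
    weighted = sum(c[0] * wins_per_16(c[1], c[2], c[3]) for c in top)
    return weighted / total_weight


# Toy example: a very similar 12-4 team and a less similar 8-8 team.
toy = [(3000, 12, 4, 0), (1000, 8, 8, 0)]
print(projected_wins(toy))  # 11.0
```

The more similar team pulls the projection toward its own record, which is the whole point of weighting. With weights this close together (every SIM in the lists below falls between roughly 1300 and 2000), the result will not stray far from a plain average.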

Here are the top twenty comparables to the 2006 Atlanta Falcons:

```
TM Year    Week 1   Week 2     SIM    Record
===============================================
atl 2006   R 20- 6  H 14- 3
phi 1981   R 24-10  H 13- 3    1966    10- 6-0
ind 2005   R 24- 7  H 10- 3    1942    14- 2-0
rai 1983   R 20-10  H 20- 6    1914    12- 4-0
tam 2005   R 24-13  H 19- 3    1876    11- 5-0
mia 1998   R 24-15  H 13- 7    1836    10- 6-0
pit 1998   R 20-13  H 17-12    1776     7- 9-0
det 2000   R 14-10  H 15-10    1762     9- 7-0
ram 1978   R 16-14  H 10- 0    1750    12- 4-0
jax 2001   H 21- 3  H 13- 6    1748     6-10-0
kan 1992   R 24-10  H 26- 7    1744    10- 6-0
ram 1986   R 16-10  H 16-13    1736    10- 6-0
atl 1998   R 19-14  H 17-12    1728    14- 2-0
sea 2004   R 21- 7  R 10- 6    1724     9- 7-0
tam 1980   R 17-12  H 10- 9    1722     5-10-1
pit 1978   R 28-17  H 21-10    1708    14- 2-0
buf 1980   H 17- 7  H 20-10    1688    11- 5-0
rai 1987   R 20- 0  H 27- 7    1662     5-10-0
cle 1978   H 24- 7  H 13-10    1660     8- 8-0
jax 2004   R 13-10  H  7- 6    1656     9- 7-0
mia 1979   R  9- 7  H 19-10    1656    10- 6-0
```

The weighted (and scaled to 16 games) average of those teams' win totals was 9.9 wins, and we might view that as a reasonable over/under for Atlanta's 2006 win total. According to this method, here are the projected win totals for each team this season:

```
TM   Proj Wins
==============
ind    10.1
atl     9.9
chi     9.8
cin     9.8
nor     9.8
bal     9.8
jax     9.6
sea     9.6
sdg     9.6
nwe     9.6
min     9.1
ari     8.9
dal     8.5
pit     8.5
den     8.1
buf     8.0
phi     7.9
nyg     7.7
stl     7.6
nyj     7.5
sfo     7.0
mia     6.4
hou     6.3
kan     6.2
was     5.9
det     5.8
ten     5.6
car     5.6
gnb     5.6
cle     5.4
tam     5.3
oak     5.3
```

The main takeaway lesson from this exercise is the same as in this post: don't get too excited or too upset about early season results. The Bears have been destroying teams, and this method projects them to win only 7.8 of their remaining 14 games. The Raiders have looked as bad as an NFL team can look, and the method says they'll win 5.3 games.

In some sense, this exercise is just a whole lot of work to get (I'm assuming something very close to) the same results you'd get by running a simple regression of wins versus first-two-weeks record and scoring margin. But I like this method better, because it's not a black box.
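For comparison, here is roughly what the regression route looks like, as a bare-bones sketch. It is simplified to a single predictor (combined scoring margin over the first two games), and the five data points are just 16-game seasons lifted from the Chargers' comparable list below, so the fitted numbers mean nothing by themselves. The `fit_line` helper is hypothetical; a real version would use every team since 1978 and include first-two-weeks record as well.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# (combined two-game margin, final wins) taken from rows in this post:
# gnb 1996, sea 1994, gnb 2001, cin 2005, sdg 2002
margins = [57, 50, 59, 43, 49]
wins = [13, 6, 12, 11, 8]

slope, intercept = fit_line(margins, wins)
chargers_2006_margin = (27 - 0) + (40 - 7)
print(round(slope * chargers_2006_margin + intercept, 1))
```

All you get back is a slope and an intercept. The 1994 Seahawks are in there, but you can't see them.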

You say the Bears should expect to win X games this year. Your friend calls BS: haven't you seen how dominant they've looked? If regression is what you've got, it's tough to give a decent counterargument unless he understands regression. But this method lays the reasoning right out there in a crystal clear way: the 1986 Falcons won their first two games by a combined 41 points and they ended up winning 7 games. The 1994 Seahawks won their first two by scores of 28-7 and 38-9, and they finished at 6-10. That is, of course, the same kind of information that your regression was taking into account, but it's just so much more transparent here.

Also, while I doubt it's actually happening here, this method is theoretically capable of picking out subtle combinations of things that regression wouldn't tell you, because you wouldn't think to ask it. For example, I don't know if this means anything or not, but it's intriguing that the Colts, who have scored a lot of points and also given up a lot, project better than the Chargers, Ravens, and Bears, who have scored a lot and given up almost none.

For those who want to investigate, here are those four teams' comparable lists:

```
TM Year    Week 1   Week 2     SIM    Record
===============================================
ind 2006   R 26-21  H 43-24
stl 1985   R 27-24  H 41-27    1948     5-11-0
nyj 1987   R 31-28  H 43-24    1922     6- 9-0
nor 2002   R 26-20  H 35-20    1902     9- 7-0
den 1993   R 26-20  H 34-17    1864     9- 7-0
atl 2004   R 21-19  H 34-17    1828    11- 5-0
den 1998   H 27-21  H 42-23    1796    14- 2-0
jax 1997   R 28-27  H 40-13    1750    11- 5-0
ram 1989   R 31-21  H 31-17    1732    11- 5-0
sfo 1984   R 30-27  H 37-31    1690    15- 1-0
kan 2003   H 27-14  H 41-20    1662    13- 3-0
sfo 1995   R 24-22  H 41-10    1642    11- 5-0
rai 1982   R 23-17  R 38-14    1624     8- 1-0
sea 1985   R 28-24  R 49-35    1604     8- 8-0
det 2004   R 20-16  H 28-16    1600     6-10-0
sea 1988   R 21-14  H 31-10    1578     9- 7-0
was 1978   R 16-14  H 35-30    1546     8- 8-0
dal 1983   R 31-30  R 34-17    1544    12- 4-0
pit 1992   R 29-24  H 27-10    1526    11- 5-0
mia 1990   R 27-24  H 30- 7    1512    12- 4-0
nwe 1999   R 30-28  H 31-28    1510     8- 8-0
```

```
TM Year    Week 1   Week 2     SIM    Record
===============================================
sdg 2006   R 27- 0  H 40- 7
gnb 1996   R 34- 3  H 39-13    1840    13- 3-0
sea 1994   R 28- 7  R 38- 9    1690     6-10-0
gnb 2001   H 28- 6  H 37- 0    1664    12- 4-0
cin 2005   R 27-13  H 37- 8    1636    11- 5-0
atl 1986   R 31-10  H 33-13    1594     7- 8-1
rai 1987   R 20- 0  H 27- 7    1564     5-10-0
buf 1981   H 31- 0  R 35- 3    1526    10- 6-0
phi 1980   H 27- 6  R 42- 7    1520    12- 4-0
den 2003   R 30-10  R 37-13    1516    10- 6-0
sdg 2002   R 34- 6  H 24- 3    1498     8- 8-0
tam 1992   H 23- 7  H 31- 3    1492     5-11-0
sfo 1996   H 27-11  H 34- 0    1472    12- 4-0
sea 1998   R 38- 0  H 33-14    1464     8- 8-0
mia 1981   R 20- 7  H 30-10    1428    11- 4-1
kan 1992   R 24-10  H 26- 7    1330    10- 6-0
sdg 1979   R 33-16  H 30-10    1330    12- 4-0
sea 2003   H 27-10  R 38- 0    1322    10- 6-0
buf 2003   H 31- 0  R 38-17    1320     6-10-0
mia 1996   H 24-10  R 38-10    1284     8- 8-0
mia 1984   R 35-17  H 28- 7    1278    14- 2-0
```

```
TM Year    Week 1   Week 2     SIM    Record
===============================================
bal 2006   R 27- 0  H 28- 6
rai 1987   R 20- 0  H 27- 7    1896     5-10-0
sdg 2002   R 34- 6  H 24- 3    1888     8- 8-0
atl 1986   R 31-10  H 33-13    1770     7- 8-1
gnb 1996   R 34- 3  H 39-13    1740    13- 3-0
kan 1992   R 24-10  H 26- 7    1708    10- 6-0
mia 1981   R 20- 7  H 30-10    1682    11- 4-1
sea 1998   R 38- 0  H 33-14    1660     8- 8-0
sdg 1979   R 33-16  H 30-10    1584    12- 4-0
sdg 1996   H 29- 7  H 27-14    1576     8- 8-0
dal 1981   R 26-10  H 30-17    1572    12- 4-0
mia 1984   R 35-17  H 28- 7    1564    14- 2-0
tam 1992   H 23- 7  H 31- 3    1560     5-11-0
sea 1994   R 28- 7  R 38- 9    1556     6-10-0
ram 1988   R 34- 7  H 17-10    1540    10- 6-0
sea 1984   H 33- 0  H 31-17    1534    12- 4-0
cin 2005   R 27-13  H 37- 8    1528    11- 5-0
den 2003   R 30-10  R 37-13    1508    10- 6-0
rai 1984   R 24-14  H 28- 7    1504    11- 5-0
pit 2005   H 34- 7  R 27- 7    1496    11- 5-0
dal 1995   R 35- 0  H 31-21    1494    12- 4-0
```

```
TM Year    Week 1   Week 2     SIM    Record
===============================================
chi 2006   R 26- 0  H 34- 7
gnb 1996   R 34- 3  H 39-13    1840    13- 3-0
rai 1987   R 20- 0  H 27- 7    1830     5-10-0
atl 1986   R 31-10  H 33-13    1764     7- 8-1
sdg 2002   R 34- 6  H 24- 3    1744     8- 8-0
sea 1994   R 28- 7  R 38- 9    1698     6-10-0
cin 2005   R 27-13  H 37- 8    1672    11- 5-0
mia 1981   R 20- 7  H 30-10    1672    11- 4-1
kan 1992   R 24-10  H 26- 7    1624    10- 6-0
tam 1992   H 23- 7  H 31- 3    1616     5-11-0
sea 1998   R 38- 0  H 33-14    1598     8- 8-0
den 2003   R 30-10  R 37-13    1594    10- 6-0
gnb 2001   H 28- 6  H 37- 0    1586    12- 4-0
sdg 1979   R 33-16  H 30-10    1540    12- 4-0
buf 1981   H 31- 0  R 35- 3    1508    10- 6-0
mia 1984   R 35-17  H 28- 7    1494    14- 2-0
dal 1981   R 26-10  H 30-17    1488    12- 4-0
sfo 1996   H 27-11  H 34- 0    1480    12- 4-0
rai 1984   R 24-14  H 28- 7    1472    11- 5-0
sdg 1996   H 29- 7  H 27-14    1432     8- 8-0
sea 1984   H 33- 0  H 31-17    1424    12- 4-0
```

The main objection I'd anticipate to this method is that it doesn't account for strength of schedule. The Chargers have only played the Raiders and Titans, who are terrible. The Falcons have played the Panthers and Bucs, who are, well, we don't really know what they are. As JKL pointed out in the comments to a previous post, it's very difficult to put a strength of schedule number on any of these teams right now (except for the teams that have played the Patriots, who have Tom Brady and are therefore known to be great). What do we use? Last year's record? This year's record? Some sort of power ranking scheme applied to the first two weeks? I'm not sure. And the uncertainty is enough to make me want not to waste a lot of effort trying to account for it.

This entry was posted on Friday, September 22nd, 2006 at 4:17 am and is filed under General, History, Statgeekery.