Playoff data mining

Posted by Doug on January 5, 2007

Following some good comments from this post, I decided to take a deeper look at the determinants of playoff success using, as JKL suggests, similarity scores.

For every playoff (non-Super Bowl) game since the beginning of the 12-team playoff format in 1990, I recorded the following bits of data for each team:


  1. Their regular season record.

  2. Their record in the last six games of the regular season.

  3. Their regular season strength of schedule (overall W/L percentage of their opponents, with games against the team in question removed).

  4. Their regular season point differential.

  5. Whether or not it was a home game.

  6. Whether or not they had a bye the previous week.
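Item 3 is the only one of these that takes some bookkeeping, so here is a minimal sketch of how it could be computed. The function name, the (winner, loser) input format, and the four-team schedule are all assumptions made for illustration. The sketch also assumes that an opponent faced twice counts twice and that there are no ties, neither of which is spelled out above.

```python
def strength_of_schedule(team, games):
    """Opponents' combined winning percentage, ignoring their games against `team`.

    `games` is a list of (winner, loser) tuples; ties are not handled here.
    """
    # One entry per game played, so an opponent faced twice counts twice (assumption).
    opponents = [loser if winner == team else winner
                 for winner, loser in games if team in (winner, loser)]
    wins = losses = 0
    for opp in opponents:
        for winner, loser in games:
            if team in (winner, loser):
                continue  # drop games against the team in question
            if winner == opp:
                wins += 1
            elif loser == opp:
                losses += 1
    return wins / (wins + losses) if wins + losses else 0.0


# Hypothetical four-team round robin, purely for illustration (not real data).
toy_games = [("A", "B"), ("C", "A"), ("B", "C"), ("B", "D"), ("C", "D"), ("A", "D")]
sos_a = strength_of_schedule("A", toy_games)
print(sos_a)
```

In the toy schedule, A's three opponents went a combined 3-3 in games that did not involve A.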

Then I compared each team to its playoff opponent in each of those categories. So, for example, the Philadelphia Eagles look like this for this weekend's game against the Giants:

Record: two games better than New York's. This gets recorded as +2.

Last six games record: Eagles 5-1, Giants 2-4, so the Eagles are three games better: +3

Strength of schedule: .038 worse than the Giants. This is recorded as -.038.

Point differential: Philly's was +70 and New York's was -7. So this is a +77 for Philly.

Home field: yes

Bye last week: neither team had one (obviously) so this is a zero.
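The bookkeeping for that example can be sketched in a few lines. This is only an illustration: the helper name `games_better` and the dictionary keys are my own labels, the full-season records (Eagles 10-6, Giants 8-8) are filled in from the 2006 standings rather than from the text above, and the strength-of-schedule gap is entered directly because the raw schedule strengths are not given.

```python
def games_better(wins_a, losses_a, wins_b, losses_b):
    """Usual 'games ahead' margin between two win-loss records."""
    return ((wins_a - wins_b) + (losses_b - losses_a)) / 2


# Eagles relative to Giants, 2006 wildcard round.
eagles_vs_giants = {
    "W": games_better(10, 6, 8, 8),   # full-season records (assumed: 10-6 vs 8-8)
    "L6": games_better(5, 1, 2, 4),   # last six games: five wins vs two wins
    "SOS": -0.038,                    # taken as given; raw SOS values are not listed
    "PD": 70 - (-7),                  # point differentials: +70 vs -7
    "H": 1,                           # Eagles are at home
    "B": 0 - 0,                       # neither team had a bye
}
print(eagles_vs_giants)
```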

Now I searched through all 320 historical playoff teams to find the ones that look most like the current Eagles. Here are the top 15:


TM   YR    R  OPP   W  L6     SOS    PD  H  B  SIM  Res
=======================================================
phi                 2   3  -0.038    77  1  0
sdg  2004  w  nyj   2   2  -0.042    61  1  0  880  L
sfo  1996  w  phi   2   2  -0.033   119  1  0  853  W
was  1999  w  det   2   2  -0.075    67  1  0  853  W
gnb  1995  w  atl   2   2   0.013    77  1  0  849  W
gnb  2004  w  min   2   2  -0.017    34  1  0  835  L
ind  2004  w  den   2   2   0.025    94  1  0  819  W
nyg  1997  w  min   1   3  -0.069    47  1  0  789  L
min  1992  w  was   2   1  -0.025    80  1  0  784  L
nwe  2003  c  ind   2   2   0.000    -1  1  0  784  W
phi  2001  w  tam   2   1  -0.046    91  1  0  778  W
nyg  2000  c  min   1   2  -0.042    56  1  0  775  W
sea  2005  c  car   2   1  -0.012    49  1  0  746  W
sdg  1992  w  kan   1   2  -0.029    28  1  0  742  W
pit  1995  c  ind   2   1   0.008    65  1  0  741  W
dal  1993  c  sfo   2   2   0.037   -31  1  0  716  W

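SIM is the similarity score; 1000 is the maximum. In the other columns, R is the playoff round (w = wildcard, d = divisional, c = conference championship), W through PD are the differences described above, H and B are the home-field and bye entries, and Res is the result of the game. You can see that the teams at the top of the list do indeed have very similar profiles to the 2006 Eagles in their matchup against the Giants. Because of the way this is set up, the Giants' list of comparable teams will necessarily be the opponents of the teams listed above.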
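The exact similarity formula is not given here, so the following is only a sketch of one common way to build such a score: start at 1000 and subtract a penalty for each category in proportion to how far apart the two profiles are. The weights below are hypothetical and will not reproduce the SIM values in the table; the point is just the shape of the calculation.

```python
# Hypothetical penalty per unit of difference in each category (NOT the actual weights).
WEIGHTS = {"W": 30, "L6": 30, "SOS": 1000, "PD": 1, "H": 200, "B": 100}


def similarity(a, b, weights=WEIGHTS):
    """1000 for identical profiles, lower as the profiles diverge, floored at 0."""
    penalty = sum(weights[k] * abs(a[k] - b[k]) for k in weights)
    return max(0.0, 1000 - penalty)


# Profiles copied from the table rows above.
eagles = {"W": 2, "L6": 3, "SOS": -0.038, "PD": 77, "H": 1, "B": 0}
sdg_2004 = {"W": 2, "L6": 2, "SOS": -0.042, "PD": 61, "H": 1, "B": 0}
dal_1993 = {"W": 2, "L6": 2, "SOS": 0.037, "PD": -31, "H": 1, "B": 0}

for name, profile in [("sdg 2004", sdg_2004), ("dal 1993", dal_1993)]:
    print(name, round(similarity(eagles, profile)))
```

Even with made-up weights, the 2004 Chargers come out closer to the Eagles than the 1993 Cowboys do, which matches the ordering in the table.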

When you get to the bottom of the list, claiming similarity is a dicey proposition. The 1995 Steelers' and 1993 Cowboys' matchups certainly have some similarity to this Eagles matchup, but there are also some significant differences. For one thing, the '95 Steelers and '93 Cowboys were playing conference championship games while this Eagles game is only in the wildcard round. You might argue, and I won't disagree too strongly, that wildcard games should only be compared to wildcard games, divisional games to divisional games, and so on. That's fine, but it limits an already-small data set.

As usual, we have to make some choices about the tradeoff between sample size and sample relevance. We've got to draw the line somewhere, and I felt that including all playoff rounds and looking at the 15 most comparable matchups achieved about the right balance.

Eleven of these 15 teams won, so we might estimate the Eagles' chances at 11/15, which is about 73.3%. I think it makes sense to weight the more similar teams more heavily. In this case, the more comparable teams didn't do quite as well; a weighted average gives the Eagles a 72.5% chance. A weighted average of the scores of those fifteen games is 26.2 - 15.4, a margin of about 11 points.
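Both estimates are easy to check from the SIM and Res columns of the Eagles table. The weights are not spelled out, so this sketch assumes each comparable team is weighted by its SIM score; under that assumption the numbers come out to the figures quoted.

```python
# (SIM, result) for the 15 comparable teams in the Eagles table, top to bottom.
comps = [
    (880, "L"), (853, "W"), (853, "W"), (849, "W"), (835, "L"),
    (819, "W"), (789, "L"), (784, "L"), (784, "W"), (778, "W"),
    (775, "W"), (746, "W"), (742, "W"), (741, "W"), (716, "W"),
]

# Plain share of comparable teams that won.
unweighted = sum(res == "W" for _, res in comps) / len(comps)

# Share of total similarity held by the winners (assumed weighting: SIM itself).
weighted = sum(sim for sim, res in comps if res == "W") / sum(sim for sim, _ in comps)

print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

The same calculation can be run on any of the other tables by swapping in their SIM and Res columns.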

The books have installed Philly as a 7-point favorite. The money lines vary a bit from place to place, but they all seem to be consistent with an Eagles win probability in the neighborhood of 72.5%.

Here are the other matchups (cover your eyes, Chase):


TM   YR    R  OPP   W  L6     SOS    PD  H  B  SIM  Res
=======================================================
ind                 3   0   0.021    51  1  0
hou  1991  w  nyj   3   0   0.071   114  1  0  887  W
ind  2003  w  den   2   0   0.000    31  1  0  859  W
ind  2005  d  pit   3   0  -0.025    61  1  1  844  L
stl  2001  c  phi   3   1   0.000    95  1  0  835  W
atl  2004  d  stl   3   0  -0.046    76  1  1  808  W
mia  1994  w  kan   1   0   0.025    41  1  0  786  W
nyg  1993  w  min   2   0  -0.050    96  1  0  784  W
gnb  2002  w  atl   3   1  -0.035   -18  1  0  774  L
pit  1995  c  ind   2   1   0.008    65  1  0  773  W
buf  1993  c  kan   1   0  -0.008    50  1  0  769  W
dal  1996  w  min   1   0   0.050    53  1  0  769  W
kan  1991  w  rai   1   0   0.037    69  1  0  765  W
sea  2005  c  car   2   1  -0.012    49  1  0  764  W
min  1994  w  chi   1   0   0.042    78  1  0  752  L
den  2005  c  pit   2   1   0.017     6  1  0  750  L
WEIGHTED AVERAGE: 73.8 pct chance of victory
PROJECTED SCORE: 24.9-17.8

TM   YR    R  OPP   W  L6     SOS    PD  H  B  SIM  Res
=======================================================
nwe                 2   0   0.038   127  1  0
phi  2004  c  atl   2   1   0.029   123  1  0  887  W
nyg  1993  w  min   2   0  -0.050    96  1  0  881  W
nwe  1996  c  jax   2  -1   0.058   115  1  0  867  W
ind  2003  w  den   2   0   0.000    31  1  0  866  W
min  1998  c  atl   1   0   0.021   107  1  0  862  L
buf  1993  d  rai   2   0   0.013   107  1  1  854  W
hou  1991  w  nyj   3   0   0.071   114  1  0  854  W
dal  1995  d  phi   2   0   0.050   164  1  1  851  W
min  1994  w  chi   1   0   0.042    78  1  0  847  L
kan  1991  w  rai   1   0   0.037    69  1  0  841  W
dal  1996  w  min   1   0   0.050    53  1  0  814  W
jax  1999  c  ten   1   0  -0.038   111  1  0  808  L
pit  1995  c  ind   2   1   0.008    65  1  0  808  W
mia  1994  w  kan   1   0   0.025    41  1  0  801  W
oak  2002  d  nyj   2   1   0.040   123  1  1  794  W
WEIGHTED AVERAGE: 80.1 pct chance of victory
PROJECTED SCORE: 24.5-16.1

TM   YR    R  OPP   W  L6     SOS    PD  H  B  SIM  Res
=======================================================
sea                 0   0  -0.004   -81  1  0
cin  2005  w  pit   0   0  -0.017   -60  1  0  966  L
nyg  2005  w  car   0   0   0.046   -24  1  0  893  L
chi  2005  d  car   0   0   0.008   -74  1  1  880  L
cin  1990  w  hou   0  -1  -0.046   -90  1  0  849  W
sfo  1997  c  gnb   0  -1   0.008   -30  1  0  836  L
phi  1990  w  was   0   0  -0.071    17  1  0  835  L
ten  2002  c  oak   0   0  -0.054  -103  0  0  827  L
mia  1992  c  buf   0   1  -0.042   -39  1  0  820  L
buf  1995  w  mia   1   0  -0.054   -51  1  0  819  W
car  2003  w  dal   1   0  -0.013    -8  1  0  818  W
sfo  2002  w  nyg   0  -1   0.023   -25  1  0  817  W
nor  2000  w  stl   0   1  -0.033   -20  1  0  809  W
nyj  2002  w  ind  -1   0   0.019   -13  1  0  809  W
mia  2000  w  ind   1   0  -0.021    -6  1  0  808  W
det  1993  w  gnb   1   0  -0.075   -52  1  0  800  L
WEIGHTED AVERAGE: 45.5 pct chance of victory
PROJECTED SCORE: 23.3-23.6

The projections for the Chiefs/Colts and Patriots/Jets are dead on with the Vegas lines. This method likes the Cowboys more than Vegas does and, though this may just be the paranoid Cowboy-hater in me, I have to agree. The Cowboys have been playing poorly for a couple of weeks; the Seahawks have been playing poorly all season. I don't really understand that line, and I'm tempted to make Dallas my lock of the year (NB: last year's lock of the year was Seattle over Pittsburgh in the Super Bowl).

But we're not here to talk about gambling. We're here to have a little fun with a data-mining exercise.

To my mind, the fact that three out of four matchups align perfectly with the best available probability estimates (which is what the Vegas lines are) is extremely encouraging. It says the method is reasonable, but there is still room for surprises, from which we might learn something.

This entry was posted on Friday, January 5th, 2007 at 5:16 am and is filed under General, Statgeekery.