Adjusting football’s Pythagorean Theorem
Posted by Doug on July 18, 2007
I got an email recently from a guy named Matt who runs a college sports blog called Statistically Speaking. In his email he referred to a recent post of his where he unveiled an interesting modification of the Pythagorean Theorem for football. First let me tell you just a bit about what the theorem is.
In the early 80s, or possibly even before that, Bill James noted that baseball teams' true strengths could generally be measured more accurately by looking at runs scored and runs allowed than by looking at wins and losses. To be more precise, he found that one can predict future win/loss records more accurately using only past runs scored and runs allowed than using only past wins and losses. To put it another way, if a team had a record of 82-80, but their runs scored and allowed totals were more in line with those of a 76-86 team, then that team should be treated as a 76-86 team for the purposes of predicting next year's record.
So what record "should" a team with RS runs scored and RA runs allowed have had? James came up with the formula:
Expected record =~ RS^2 / (RS^2 + RA^2)
Because it has some superficial similarities to the Pythagorean Theorem about right triangles that you learned at some point in your youth, it came to be known by the same name. As it turns out, though, you can replace the 2s in the exponents with 1.82s and get slightly better predictions. For football, people have found that an exponent of 2.37 seems to work best. So the Pythagorean Theorem for football looks like this:
Expected record =~ PF^2.37 / (PF^2.37 + PA^2.37)
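To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function name, the 16-game default, and the sample point totals are my own illustrative assumptions; only the 2.37 exponent comes from the text.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins from season point totals (football exponent by default)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    # Note: this divides by zero if both totals are 0; a real season never is.
    return games * pf / (pf + pa)


# Hypothetical season totals, for illustration only.
even_team = pythagorean_wins(300, 300)   # scores as much as it allows
good_team = pythagorean_wins(400, 300)   # outscores opponents by 100
print(round(even_team, 1), round(good_team, 1))
```

A team that scores exactly as many points as it allows comes out at 8 expected wins in a 16-game season, and the expected wins of a team and its mirror image (points for and against swapped) always add up to 16.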
Here is each team's Pythagorean record for 2006.
Tm    record   PythagRecord
===========================
sdg   14- 2    12.1- 3.9
bal   13- 3    12.7- 3.3
chi   13- 3    12.4- 3.6
nwe   12- 4    12.2- 3.8
ind   12- 4     9.6- 6.4
phi   10- 6     9.8- 6.2
nor   10- 6    10.3- 5.7
nyj   10- 6     8.7- 7.3
sea    9- 7     7.8- 8.2
kan    9- 7     8.5- 7.5
den    9- 7     8.4- 7.6
dal    9- 7     9.8- 6.2
ten    8- 8     6.0-10.0
jax    8- 8    10.8- 5.2
stl    8- 8     7.6- 8.4
nyg    8- 8     7.8- 8.2
car    8- 8     6.9- 9.1
gnb    8- 8     6.2- 9.8
pit    8- 8     9.1- 6.9
cin    8- 8     9.1- 6.9
atl    7- 9     6.9- 9.1
buf    7- 9     7.7- 8.3
sfo    7- 9     5.1-10.9
min    6-10     6.6- 9.4
mia    6-10     7.2- 8.8
hou    6-10     5.1-10.9
was    5-11     6.1- 9.9
ari    5-11     6.0-10.0
cle    4-12     4.4-11.6
tam    4-12     3.6-12.4
det    3-13     5.6-10.4
oak    2-14     2.7-13.3
Here is a quick demonstration of the method's ability to predict the future on the group level. I looked at all teams since 1978 with a record of exactly 10-6. Then I divided them into three groups: [1] those with 9.5 or fewer Pythagorean wins (these were the teams we might say were lucky to be 10-6), [2] those with between 9.5 and 10.5 Pythagorean wins (these teams really were morally 10-6 teams), and [3] those with 10.5 or more Pythagorean wins (these, we would speculate, were actually stronger than 10-6 teams). Here's how they did the next year:
Year N Pythagorean wins    Average next year wins
=================================================
9.5 or fewer PWins         7.9
9.5--10.5 PWins            9.3
10.5 or more PWins         9.9
Perhaps a more rigorous proof of the method's power is this regression of Year N+1 wins on the two variables: Year N Wins, and Year N Pythagorean wins:
Predicted Year N+1 wins =~ 4.07 + .12*(Year N Wins) + .38*(Year N Pythag wins)
The coefficient on Pythagorean wins is much larger than that on actual wins, which says that Pythagorean wins are more closely associated with Year N+1 wins than are actual wins. Further, regression fans will want to know that the coefficient on actual wins is only barely significant if at all (p=.07), while the coefficient on Pythagorean wins is highly significant (p=.0000009).
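As a worked example of what that regression implies, here is a small Python sketch that plugs two hypothetical 10-6 teams into the fitted equation. The coefficients are the ones quoted above; the function name and the two sample teams are made up for illustration.

```python
def predict_next_year_wins(wins, pythag_wins):
    """Regression fit quoted in the post: 4.07 + .12*W + .38*PW."""
    return 4.07 + 0.12 * wins + 0.38 * pythag_wins


# Two hypothetical 10-6 teams that differ only in Pythagorean wins.
lucky = predict_next_year_wins(10, 8.0)     # played like an 8-win team
unlucky = predict_next_year_wins(10, 12.0)  # played like a 12-win team
print(round(lucky, 2), round(unlucky, 2))   # 8.31 9.83
```

Same record, but the four-win gap in Pythagorean wins moves the prediction by about a win and a half, which is the same direction as the 10-6 grouping shown earlier.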
So, we finally get to Matt's new method. I'll quote from his blog post:
blowouts, especially extreme blowouts can artificially inflate or deflate a team's Pythagorean record depending on whether or not they received or doled out the beating. The solution? Compute the Pythagorean winning percentage on a game by game basis, add up the totals, and divide by games played. This way each game is counted the same and the effect of blowouts is lessened.
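Here is a rough Python sketch of how I read that description. Two things are my assumptions rather than anything Matt states: that the same 2.37 exponent is applied to each game, and that a 0-0 game counts as half a win. The four game scores are invented for illustration.

```python
def pythag_pct(pf, pa, exponent=2.37):
    """Pythagorean winning percentage for one game or one season."""
    if pf == 0 and pa == 0:
        return 0.5  # assumption: treat a scoreless tie as a coin flip
    return pf ** exponent / (pf ** exponent + pa ** exponent)


def adjusted_pythag_wins(games, exponent=2.37):
    """Sum of per-game percentages (average percentage times games played)."""
    return sum(pythag_pct(pf, pa, exponent) for pf, pa in games)


# Hypothetical (points for, points against) scores: one blowout win, three close losses.
games = [(45, 3), (17, 20), (13, 16), (21, 24)]
pf_total = sum(pf for pf, _ in games)
pa_total = sum(pa for _, pa in games)

season_level = len(games) * pythag_pct(pf_total, pa_total)
game_level = adjusted_pythag_wins(games)
print(round(season_level, 2), round(game_level, 2))
```

With these made-up scores the season-total version credits the team with noticeably more wins than the game-by-game version, because the one blowout dominates the point totals but can contribute at most one win when each game is scored separately.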
Here are the 2006 records, Pythagorean records, and adjusted (by Matt) Pythagorean records:
Tm    record   Pythag       New Pythag
======================================
sdg   14- 2    12.1- 3.9    11.5- 4.5
bal   13- 3    12.7- 3.3    11.1- 4.9
chi   13- 3    12.4- 3.6    11.1- 4.9
nwe   12- 4    12.2- 3.8    10.5- 5.5
ind   12- 4     9.6- 6.4     9.3- 6.7
phi   10- 6     9.8- 6.2     9.5- 6.5
nor   10- 6    10.3- 5.7     9.8- 6.2
nyj   10- 6     8.7- 7.3     8.7- 7.3
sea    9- 7     7.8- 8.2     8.4- 7.6
kan    9- 7     8.5- 7.5     8.0- 8.0
den    9- 7     8.4- 7.6     9.3- 6.7
dal    9- 7     9.8- 6.2     9.3- 6.7
ten    8- 8     6.0-10.0     7.0- 9.0
jax    8- 8    10.8- 5.2     9.9- 6.1
stl    8- 8     7.6- 8.4     7.7- 8.3
nyg    8- 8     7.8- 8.2     8.4- 7.6
car    8- 8     6.9- 9.1     7.9- 8.1
gnb    8- 8     6.2- 9.8     7.3- 8.7
pit    8- 8     9.1- 6.9     8.1- 7.9
cin    8- 8     9.1- 6.9     9.3- 6.7
atl    7- 9     6.9- 9.1     7.2- 8.8
buf    7- 9     7.7- 8.3     8.2- 7.8
sfo    7- 9     5.1-10.9     7.0- 9.0
min    6-10     6.6- 9.4     6.6- 9.4
mia    6-10     7.2- 8.8     7.2- 8.8
hou    6-10     5.1-10.9     6.5- 9.5
was    5-11     6.1- 9.9     6.5- 9.5
ari    5-11     6.0-10.0     6.2- 9.8
cle    4-12     4.4-11.6     5.0-11.0
tam    4-12     3.6-12.4     4.4-11.6
det    3-13     5.6-10.4     5.5-10.5
oak    2-14     2.7-13.3     3.8-12.2
Now, the question is: if you know a team's wins, a team's Pythagorean wins, and a team's adjusted Pythagorean wins, what is the relative importance of each of those in predicting the team's Year N+1 wins? Here is what regression says:
Year N+1 wins =~ 3.93 + .11*(Year N wins) + .34*(Year N Pythag wins) + .06*(Year N AdjPythag wins)
The Pythag Wins coefficient is significant and the other two are not. All three inputs are, of course, very highly correlated, which can sometimes cause problems in regressions. What I don't know about the fine points of regression analysis could fill a warehouse, but I think we've got enough data here to conclude that the significance of the regular Pythagorean wins coefficient and the lack of significance for the other two means that Pythagorean wins is generally the better predictor in cases where there is some disagreement among the three.
One more try: a regression of Year N+1 wins on Year N wins and Pythagorean wins has an R^2 of .203. A regression of Year N+1 wins on Year N wins and Year N adjusted Pythagorean wins has an R^2 of .199.
So it appears to me that, for NFL games, the good old fashioned Pythagorean Theorem is no worse, and possibly a tiny, tiny, tiny, tiny bit better, than Matt's adjusted version. However, based on some preliminary investigations, Matt found the opposite in college football.
Is one of us wrong? I don't think so. It seems believable that deflating blowouts a bit would create a better gauge of team strength for college football teams and a (very slightly) worse one for NFL squads. Virginia Tech is so much better than Duke that they can essentially beat them by whatever score they want. Whether they decide to beat them 31-3 or 61-3 doesn't tell us anything more than what we already knew: the Hokies are much, much better. But, as bad as the Raiders may be, there are no Dukes in the NFL, and just about any team has a credible shot at beating any other on any given Sunday. If you hang a severe blowout on an NFL team, it apparently says something about the relative strength of the two teams.
I bet you're right that the difference between NFL and college is in the blowouts.
I did a very similar study last winter, but instead of using points scored and allowed, I used a model of efficiency stats to determine "expected wins." I found that last year's expected wins are reliably better than actual wins to predict the following year's record.
The post is here: https://bbnflstats.blogspot.com/2007/06/next-years-wins.html
And 2007 predictions based on the model are here: https://bbnflstats.blogspot.com/2007/06/2007-team-win-predictions.html
The predictions are just a starting point. Obviously some teams need adjusting, such as NE or (as of yesterday) ATL.
One problem with using the Pythag formula in football is the scoring system. It's a non-continuous system awarding 2, 3, 6, 7, or 8 points for various scoring methods. Baseball only has 1-run increments, which lets the Pythag formula fit baseball better than it fits football.
It seems believable that deflating blowouts a bit would create a better gauge of team strength for college football teams and a (very slightly) worse one for NFL squads.
What does this tell you? Schedules are roughly even in the NFL. They are not in college, especially when it comes to the skewness of the distribution of opponents' strengths.
It's really just happening because the Pythagorean theorem isn't strength-of-schedule independent. A team that wins by 30, 20, and 30 against three very bad teams, and then loses by 7, 10, and 7 against good teams, will look like they're better than middle-of-the-pack, when in truth their schedule was just biased. If a different team faces three bad teams and wins by 10, 17, and 10, then faces three good teams and loses by 7, 10, and 7, they're basically identical to the first team.
Think about it this way: when you see Virginia Tech beat Duke by a gazillion points, your first guess is "Duke is a very bad team" - Virginia Tech can be good, but if they can beat Duke that much, Duke can't be a good team (otherwise VT would be stratospherically good). Or you could just look at the fact that Duke's 1-10.
Deflating blowouts is just a first step towards a schedule correction.
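To put rough numbers on the two-team example above, here is a Python sketch. The final scores are invented so that the margins match the ones in the comment (30, 20, 30 versus 10, 17, 10 in the wins; 7, 10, 7 in the losses), and the 2.37 exponent is assumed for single games as well as for season totals.

```python
EXPONENT = 2.37


def pct(pf, pa):
    """Pythagorean percentage from points for and points against."""
    return pf ** EXPONENT / (pf ** EXPONENT + pa ** EXPONENT)


def season_wins(games):
    """Pythagorean wins computed from season point totals."""
    pf = sum(g[0] for g in games)
    pa = sum(g[1] for g in games)
    return len(games) * pct(pf, pa)


def per_game_wins(games):
    """Pythagorean wins computed game by game and summed."""
    return sum(pct(pf, pa) for pf, pa in games)


# Hypothetical scores: same three losses, different winning margins.
losses = [(10, 17), (10, 20), (10, 17)]
team_a = [(37, 7), (27, 7), (37, 7)] + losses    # wins by 30, 20, 30
team_b = [(24, 14), (27, 10), (24, 14)] + losses  # wins by 10, 17, 10

season_gap = season_wins(team_a) - season_wins(team_b)
per_game_gap = per_game_wins(team_a) - per_game_wins(team_b)
print(round(season_gap, 2), round(per_game_gap, 2))
```

With these scores the gap between the two teams shrinks a good deal under the game-by-game version but does not vanish, which fits the idea that deflating blowouts is only a partial schedule correction.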
Interestingly, another Matt, who also sometimes posts at another blog called Statistically Speaking, rolled this out for baseball. We talk about it here:
https://www.insidethebook.com/ee/index.php/site/comments/the_fallacy_of_pythagorean/
His link is in post 3.
Tom
For the 1990-2007 seasons, 2.535 is the best fit exponent.
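The comment doesn't say how that exponent was fit. One simple way to do it, sketched here in Python as an assumption rather than a reconstruction of the commenter's method, is a grid search for the exponent that minimizes the squared error between Pythagorean and actual winning percentages. The data below is synthetic, generated from a known exponent of 2.5, purely to show that the search recovers it.

```python
def pythag_pct(pf, pa, exponent):
    """Pythagorean winning percentage for a given exponent."""
    return pf ** exponent / (pf ** exponent + pa ** exponent)


def best_fit_exponent(seasons, low=1.0, high=4.0, step=0.005):
    """Grid-search the exponent minimizing squared error vs actual win pct.

    seasons: list of (points_for, points_against, actual_win_pct) tuples.
    """
    n_steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(n_steps + 1)]

    def sse(exponent):
        return sum((pythag_pct(pf, pa, exponent) - actual) ** 2
                   for pf, pa, actual in seasons)

    return min(candidates, key=sse)


# Synthetic seasons: point totals are made up, and the "actual" winning
# percentages are generated from an exponent of 2.5, not from real NFL data.
point_totals = [(427, 298), (353, 255), (316, 315), (270, 366), (168, 332)]
seasons = [(pf, pa, pythag_pct(pf, pa, 2.5)) for pf, pa in point_totals]

fitted = best_fit_exponent(seasons)
print(round(fitted, 3))
```

With real team-seasons in place of the synthetic list, the same search would give a best-fit exponent for whatever span of years is fed in.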