Posted by Chase Stuart on November 23, 2009
I don't advocate gambling on football games, and neither does the P-F-R blog. Point-spread data are very useful as historical guides to understanding the perception at any point in time and to measure how the public may improperly value certain teams. The past is never a perfect predictor of the future, and the results of this post are intended for educational purposes only.
About a year ago I wrote a preliminary post on how to grade the best defenses in NFL history. I focused on four categories to rank defenses, as I didn't think there was one best stat to use. Today I'm going to use four "basic" categories to grade each team: rushing yards per carry, net yards per pass attempt, rushing yards per carry allowed, and net yards per pass attempt allowed.
I'm ignoring things like touchdowns, fumbles and interceptions. Why? Interception rates are essentially random, and fumble recovery rates are too. Touchdowns are slightly more predictable, but they don't correlate with future success as well as yards do. Therefore, instead of assigning some arbitrary value to touchdowns scored, I chose to leave them out. I could probably improve the formula by assigning a small weight for touchdowns (and maybe an even smaller weight to turnovers), but I'm trying to use some "basic" stats. On the other hand, I'm leaving in sack and sack yardage data, based on the work done by Jason to show that such numbers are predictable.
So I'm measuring, roughly, each team's ability to run, pass, stop the run and defend the pass. I could have combined rushing and pass defense, but chose not to. I'm not measuring these things as well as I could -- most significantly, I did not adjust for strength of schedule -- but I wanted to keep the data simple. I merely looked at how each team performed, relative to league average, in the four categories through its first ten games of the season.
I'll use the '01 Rams as an example. St. Louis rushed 416 times for 2,027 yards, an average of 4.87 yards per carry. The NFL average through ten games that season was 4.06 YPC, which (rounding errors aside) means the Rams averaged 0.82 more YPC than the league average. The Rams had an astounding 4,663 net passing yards (passing yards minus sack yards lost) on 591 net attempts (passes plus sacks); that average of 7.89 NY/A was 2.02 net yards per "attempts plus sacks" better than average. These Rams were pretty good on defense, too: they allowed 1,374 rushing yards on 366 carries in their first ten games; that 3.75 YPC allowed average was 0.30 YPC better than average. Their pass defense was even better, as the Rams allowed only 5.28 NY/A, 0.59 NY/A better than average.
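For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-category comparison, using the '01 Rams totals quoted above. The function and variable names are made up for illustration. The post only states the league rushing average (4.06 YPC); the 5.87 NY/A league passing average is an assumption, back-derived from 7.89 NY/A being 2.02 better than average. Because these inputs are rounded, the results can differ from the figures in the text by about 0.01.

```python
# League averages through ten games of the 2001 season.
LEAGUE_YPC = 4.06  # stated in the post
LEAGUE_NYA = 5.87  # assumption: back-derived from 7.89 NY/A being +2.02


def per_play(yards, plays):
    """Yards per play: YPC for rushing, NY/A for net passing."""
    return yards / plays


def relative_to_average(off_ypc, off_nya, def_ypc, def_nya):
    """Edge over league average in each category; positive is always good."""
    return {
        "rush_offense": off_ypc - LEAGUE_YPC,
        "pass_offense": off_nya - LEAGUE_NYA,
        "rush_defense": LEAGUE_YPC - def_ypc,  # allowing fewer yards is better
        "pass_defense": LEAGUE_NYA - def_nya,
    }


rams_2001 = relative_to_average(
    off_ypc=per_play(2027, 416),   # rushing yards / carries
    off_nya=per_play(4663, 591),   # net passing yards / (passes + sacks)
    def_ypc=per_play(1374, 366),   # rushing yards allowed / carries faced
    def_nya=5.28,                  # only the rate is given in the post
)

for category, edge in rams_2001.items():
    print(f"{category}: {edge:+.2f}")
```

Defense is flipped so that a positive number always means "better than average," which is what lets the four categories be added together later.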
I then added up each team's performance in each category relative to average to get a "team grade" relative to league average. The '01 Rams' rating of 3.72 yards per play better than average (based on being 0.82 YPC and 2.02 NY/A better than average on offense, and 0.30 YPC allowed and 0.59 NY/A allowed tougher than average) was the best through ten games of any team from 1988 to 2007. I picked those years because I don't yet have 2008 point spread data in my database and because I wanted 20 years' worth of data. In fact, here are the top 20 teams in terms of yards per play better than average through 10 games:
Okay, so now what? The very best teams are about 15-20 points better than league average, so (and this is where it gets really mathematical) I multiplied each team's "yards per play relative to average" score by five. That's it.
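As a rough sketch, the grade-and-scale step amounts to the following Python. The names are hypothetical; the only inputs taken from the post are the four Rams differentials, the 3.72 team grade, and the multiplier of five. Summing the four rounded differentials gives 3.73 rather than 3.72, presumably because the post's figure was computed from unrounded numbers.

```python
POINTS_PER_YARD = 5  # the multiplier chosen in the post


def team_grade(category_edges):
    """Sum the four 'relative to league average' edges into one grade."""
    return sum(category_edges)


def points_over_average(grade):
    """Convert a yards-per-play grade into points per game over average."""
    return grade * POINTS_PER_YARD


# '01 Rams edges as quoted (rounded): rush offense, pass offense,
# rush defense, pass defense.
rams_edges = [0.82, 2.02, 0.30, 0.59]
grade_from_rounded = team_grade(rams_edges)

# Using the grade the post actually reports for the Rams.
rams_points = points_over_average(3.72)

print(f"grade from rounded inputs: {grade_from_rounded:.2f}")
print(f"points over average: {rams_points:.1f}")
```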
So now the Rams go from being 3.72 yards per play better than average to being 18.6 points per game better than average. I did this for all of the 601 team-seasons (through ten games) from '88 to '07. Now here's what you should be thinking: Chase picked four relatively arbitrary stats, combined them in a totally arbitrary way without explaining why, and then multiplied them by a number he picked from his you know what. How could these possibly be useful?
I then looked at the 11th, 12th, 13th and 14th games of the season played by those 601 teams. I set a point spread for each game, where the spread was equal to the difference between the two teams in my "points over average" score plus three points to the home team. So when the '01 Rams (+18.6) played in Atlanta (-9.1) in week twelve, I set the point spread at St. Louis -24.8 points. The actual point spread was St. Louis -8; that difference of 17 points was the largest difference between my projected point spread and the actual point spread in the study. There were 1,202 games played in my data set (games 11-14 of the '88 to '07 seasons); 508 times (42%) the actual point spread was within three points of my projected point spread.
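The spread rule described above can be sketched in a few lines of Python. Function names and the sign convention (positive means the home team is favored) are assumptions made for this illustration. With the rounded ratings quoted in the text, the Rams come out as 24.7-point favorites rather than 24.8, which is just rounding in the inputs.

```python
HOME_FIELD_POINTS = 3.0  # the flat home-field adjustment used in the post


def projected_home_margin(home_rating, away_rating):
    """Projected margin for the home team; negative means the road team is favored."""
    return home_rating - away_rating + HOME_FIELD_POINTS


def disagreement(projected_margin, actual_margin):
    """Absolute gap between the projected and the actual point spread."""
    return abs(projected_margin - actual_margin)


# '01 Rams (+18.6) at Atlanta (-9.1), week twelve.
projected = projected_home_margin(home_rating=-9.1, away_rating=18.6)
actual = -8.0  # actual line was St. Louis -8, i.e. Atlanta +8 at home
gap = disagreement(projected, actual)

print(f"projected Atlanta margin: {projected:+.1f}")
print(f"gap vs. actual spread: {gap:.1f}")
```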
To see how my system did, however, you need to look at the most extreme games. In 462 games, my projected point spread differed from the actual point spread by 5.0 points or more. The team my system would say was underrated by the point spread covered in 297 of those games and failed to cover in 152 of them; thirteen games were a push. A 297-152-13 record translates to a .657 winning percentage against the spread.
If you focus only on games where the projected and actual point spreads differed by 8.0 points or more, the teams heavily underrated by the point spread were 136-63-7, for a .677 winning percentage. Bump the requirement to a 10.0-point differential or more, and the undervalued teams went 74-29-4, an incredible .710 winning percentage.
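The post doesn't spell out how pushes figure into those winning percentages, but counting each push as half a win and half a loss reproduces all three numbers. A small Python sketch under that assumption (the function name is made up for illustration):

```python
def ats_win_pct(wins, losses, pushes):
    """Winning percentage against the spread, counting a push as half a win.

    Assumption: the post never states this convention, but it matches
    the .657, .677 and .710 figures quoted above.
    """
    games = wins + losses + pushes
    return (wins + 0.5 * pushes) / games


# Records for the underrated side at each disagreement threshold.
records = {
    "5.0+ points": (297, 152, 13),
    "8.0+ points": (136, 63, 7),
    "10.0+ points": (74, 29, 4),
}

for threshold, (w, l, p) in records.items():
    print(f"{threshold}: {w}-{l}-{p}, {ats_win_pct(w, l, p):.3f}")
```

Dropping pushes entirely would give a slightly higher figure for the first bucket (297 wins in 449 decided games), so the half-win convention is the one that fits the published numbers.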
What's this mean? Whatever inputs are used to figure out the point spread, two things are clear: some information is being overvalued (likely things such as record, turnovers, red zone efficiency, return touchdowns against teams besides the Steelers) and other things are undervalued (rushing and passing efficiency on both sides of the ball). The formula I created to pick "winners" was not very advanced; in addition to some fuzzy math, I totally ignored important things like strength of schedule (made doubly bad since, no doubt, the point spread takes this into account), injuries, and all of the great advances Jason has made with respect to the intricacies of home field advantage (many of them available here). So what do we do now?
Come up with predicted point spreads for games over the next four weeks, and see how this formula works. We won't have a large enough sample size to feel very confident even if it works -- there might be only 15 games where the numbers say a team is a really good bet -- but we might as well track them starting this season.