The P-F-R Win Probability Model
The roots of our win probability model lie in the theory put forth in chapters 43 and 45 of Wayne Winston’s book Mathletics.
Using previous research by Hal Stern, Winston posited that the final margin of victory for an NFL team in a given game can be approximated as a normal random variable with a mean of the Vegas line and a standard deviation between 13 and 14 points. (Winston and Stern’s exact number, derived from the 1981, 1983, and 1984 regular seasons, was 13.86; I’m using 13.45, based on the overall NFL average from 1978-2012.)
To quote Winston:
“A normal random variable can assume fractional values, but the final margin of victory in a game must be an integer. Therefore we estimate the probability that the home team wins by between a and b points (including a and b, where a < b) is: probability(margin is between a - 0.5 and b + 0.5). The Excel function
=NORMDIST(x,mean,sigma,TRUE)
gives us the probability that a normal random variable with the given mean and sigma is less than or equal to x.
(If) the Colts are a 7-point favorite in Super Bowl XLI, what is the probability that they will win the game?
Here we assume the point spread equals the mean outcome of the game. The Colts can win with a final margin of 1 point or more or win with, say, a 0.5 probability if regulation time ends in a tie. The probability the Colts win by 1 or more:
= 1 - NORMDIST(0.5,7,13.86,TRUE) = 1 - 0.3196 = 0.6804.
The probability regulation ends in tie:
= NORMDIST(0.5,7,13.86,TRUE) - NORMDIST(-0.5,7,13.86,TRUE) = 0.0253.
Therefore, we estimate the Colts’ chance of winning Super Bowl XLI to be 0.6804 + 0.5*(.0253) = 0.693.”
This forms the structural basis of our win probability model. We run the formula above (using 13.45 as the standard deviation of scoring margin instead of 13.86) for every game, plugging in the Vegas line as the mean expected point margin to generate pregame win probabilities for each team.
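As a rough illustration, the pregame calculation can be sketched in Python rather than Excel. The function name `pregame_win_prob` and its parameters are labels invented for this example, not part of Winston's text; `statistics.NormalDist(...).cdf` stands in for NORMDIST with the cumulative flag set to TRUE.

```python
from statistics import NormalDist

def pregame_win_prob(line, sd=13.45):
    """Team's pregame win probability under the normal-margin assumption.

    line: expected margin of victory (positive means the team is favored).
    sd:   standard deviation of the final margin (13.45 here; Winston used 13.86).
    """
    margin = NormalDist(mu=line, sigma=sd)
    win_in_regulation = 1 - margin.cdf(0.5)    # final margin of 1 point or more
    tie = margin.cdf(0.5) - margin.cdf(-0.5)   # margin that rounds to 0
    return win_in_regulation + 0.5 * tie       # a tie is treated as a coin flip

# Winston's Super Bowl XLI example: Colts favored by 7, sd = 13.86
print(round(pregame_win_prob(7, sd=13.86), 3))  # 0.693
```

Calling `pregame_win_prob(7)` with the default 13.45 gives the slightly higher number our model would use for the same line.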
During games, the process gets slightly more complicated. First, we need to modify Winston’s formula to account for the diminishing amount of time remaining in the game. To quote Winston again:
“If we assume that the changes in margins during different parts of the game are independent and follow the same distribution (the technical term is identically distributed), then the standard deviation of the margin during n minutes of [a] game is:
(game standard deviation of margin) / sqrt(fraction of game that n minutes is)”
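Using the 13.45 standard deviation we derived earlier, the formula for NFL games is as follows. Dividing by the square root of 60 / minutes_remaining is the same as multiplying by the square root of the fraction of the game still to be played, so the standard deviation shrinks as the clock runs down: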
STDEV = (13.45 / SQRT((60 / minutes_remaining)))
So after one quarter (45 minutes remaining), the standard deviation of the remaining scoring margin drops from 13.45 at pregame to 11.65, and it keeps shrinking as the game goes on.
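The same scaling can be written as a small Python helper. This is only a sketch: the name `sd_remaining` and the guard for zero time remaining are assumptions added for the example, not something the Excel formula handles.

```python
import math

def sd_remaining(minutes_remaining, game_sd=13.45, game_minutes=60):
    """Standard deviation of the scoring margin over the rest of the game."""
    if minutes_remaining <= 0:
        return 0.0  # assumed convention: no time left, no further change in margin
    # Mirrors STDEV = 13.45 / SQRT(60 / minutes_remaining)
    return game_sd / math.sqrt(game_minutes / minutes_remaining)

# 45 minutes left gives roughly 11.65, matching the figure in the text
for minutes in (60, 45, 30):
    print(minutes, round(sd_remaining(minutes), 2))
```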
In addition to modifying the standard deviation about the mean, we also need to adjust the mean (the Vegas line) itself to account for the reduced amount of time remaining in the game. Though he doesn’t address this issue directly in Mathletics, in an email exchange with P-F-R, Winston suggested scaling the Vegas line down linearly based on how much time had elapsed. For instance, if the pregame mean is +3 for 60 minutes, then after a quarter (for the remaining 45 minutes) it would be 0.75 * 3 = +2.25.
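This means the home team’s probability of winning after a quarter of play -- assuming perfectly neutral possession, down, distance, and field-position conditions -- can be computed with the same NORMDIST formula used for the pregame number, using the current scoring margin plus the scaled-down line (0.75 times the Vegas line) as the mean and 11.65 as the standard deviation.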
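A minimal Python sketch of that neutral-conditions calculation follows. It rests on one assumption of ours: the current scoring margin is added to the time-scaled Vegas line to get the expected final margin. The function and parameter names are hypothetical, and `minutes_remaining` must be greater than zero.

```python
import math
from statistics import NormalDist

GAME_SD = 13.45      # full-game standard deviation of scoring margin
GAME_MINUTES = 60

def neutral_win_prob(current_margin, vegas_line, minutes_remaining):
    """Win probability for a team, ignoring possession and field position.

    current_margin: the team's current lead (negative if trailing).
    vegas_line:     pregame expected margin for the team (positive if favored).
    """
    fraction_left = minutes_remaining / GAME_MINUTES
    # Assumed reading of the text: expected final margin = lead + scaled line
    mean = current_margin + vegas_line * fraction_left
    sd = GAME_SD * math.sqrt(fraction_left)   # same as 13.45 / SQRT(60 / minutes)
    final_margin = NormalDist(mu=mean, sigma=sd)
    win = 1 - final_margin.cdf(0.5)
    tie = final_margin.cdf(0.5) - final_margin.cdf(-0.5)
    return win + 0.5 * tie

# Tied game after one quarter, team favored by 3: mean = 2.25, sd is about 11.65
print(round(neutral_win_prob(0, 3, 45), 3))
```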
Accounting for Down/Distance/Field Position
The equations above work fine for the beginning of each half, since conditions are by definition neutral (possession isn’t carrying over from a previous quarter). However, to compute in-game probabilities for any other situation, we have to make one last modification to account for who currently has the ball, in addition to their down, distance, and field position.
Recall that in 2012, we introduced Expected Points (EP). EP measures the average number of future net points we would expect to be produced on the very next scoring play of the game (regardless of which team does the scoring).
As an example, when a team has the ball on 1st and 10 at their own 20, their Expected Points are 0.28, meaning the next scoring play of the game is likely to net them 0.28 points on average. Gain 3 yards on 1st down (making it 2nd and 7 from the 23), and the EP falls to 0.14, since it accounts for a decreased probability of getting the first down, in addition to what’s likely to happen after a punt. Conversely, gain 10 yards on 1st down (making it 1st & 10 from the 30), and the EP grows to 0.94 -- the product of a new set of downs, as well as better field position.
In other words, EP captures the expected average scoring consequences of the current game situation. This makes it perfect for our win probability calculations, which to this point have accounted for scoring margin, time remaining, and the Vegas line, but also assumed neutral game conditions. EP can help us handle that final missing piece, if we add EP to the current margin of the game (generating a de facto “current expected margin” based on game conditions) and plug that into the formula above instead of the actual point margin.
For instance, from Winston’s formula we would expect a team (say, “Los Angeles”) favored by 3 and leading by 7 with 10 minutes left in the game to win 91.3% of the time. But if the opponent had the ball on 1st & 10 from L.A.’s 20 -- a 4.24 EP situation for the opponent -- Los Angeles’ WP would fall to 72.3%, which is a more accurate snapshot of the current situation. This is the modification we apply to all plays (other than the start of each half) when computing the WP metric you see in the graphs and tweets.
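The Los Angeles example can be reproduced with a short Python sketch. The function name, the sign convention for `ep` (positive when the team in question holds the favorable situation, negative when the opponent does), and the way the current margin is combined with the scaled line are all assumptions made for illustration, chosen because they recover the two percentages quoted above.

```python
import math
from statistics import NormalDist

def win_prob(margin, vegas_line, minutes_remaining, ep=0.0, game_sd=13.45):
    """Win probability with the Expected Points adjustment.

    ep is from the team's point of view: positive when the team holds the
    ball in a favorable spot, negative when the opponent does.
    """
    fraction_left = minutes_remaining / 60
    # "Current expected margin" (margin + EP) plus the time-scaled Vegas line
    mean = (margin + ep) + vegas_line * fraction_left
    sd = game_sd * math.sqrt(fraction_left)
    final_margin = NormalDist(mu=mean, sigma=sd)
    tie = final_margin.cdf(0.5) - final_margin.cdf(-0.5)
    return (1 - final_margin.cdf(0.5)) + 0.5 * tie

print(round(win_prob(7, 3, 10), 3))            # 0.913 (neutral conditions)
print(round(win_prob(7, 3, 10, ep=-4.24), 3))  # 0.723 (opponent at L.A.'s 20)
```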
In addition to the above method of assessing win probability, in the 2016 offseason we introduced a different model for the last 5 minutes of the half and the end of the game. It attempts to more closely match what we think a coach might decide to do in a given game situation - as the game winds down, there is a very big difference between a 2-point lead and a 4-point lead.
For instance, if a team is trailing by 2 at the 20-yard line with 10 seconds to go, we don't want to credit them with the 4.24 EP outlined above, because this would overestimate their probability of winning - the real probability should be more closely tied to the probability of making a field goal from that spot, since that is the most likely next step.
For these calculations, we take inputs similar to those above (score differential, yard line, down, distance, time remaining, and the original Vegas line) and, within the last 5 minutes of the game, plug them into a different model that we've found more closely matches actual outcomes.
Since this is our first foray into the world of NFL win probability (a field that others, most notably Brian Burke, have occupied for a while now), we fully expect there to be questions, comments, and critiques, even though we’re generally pretty happy with the version described above. As always, please email us with any feedback, and we’ll be happy to respond and/or consider changes to the system if appropriate.
Hal Stern, “On the Probability of Winning a Football Game,” American Statistician 45, no. 3 (August 1991): 179–83.