David Romer’s paper III

Posted by Doug on May 15, 2006

As you may have guessed, this is a continuation of David Romer's paper II, which is itself a continuation of David Romer's paper I. The first one is optional, I suppose, but to understand this one you need to read the second.

The question is: how did Romer arrive at his function that associates a numerical point value to a first-and-10 on any yard line?

First, he took three years' worth of NFL game logs. Then he threw away the last three quarters of each game and worked with only the first quarters. He did this so he could assume that teams were in point-maximization mode, and also to avoid the effects of end-of-half and end-of-game maneuvering. Next, he distilled the data down to only 101 situations. Situations 1 through 99 are first-and-10 at the given yard line (1 means your own 1 and 99 means your opponent's 1). Situation 100 is a kickoff from the 30. Situation 101 is a free kick from the 20. So a game log that looks like this:


Patriots KICKOFF to Jets 3, returned to Jets 20.
1st-and-10 at Jets 20 - Martin rushes for 8 yards
2nd-and-2 at Jets 28 - Pennington to Coles for 5 yards
1st-and-10 at Jets 33 - Pennington pass incomplete
2nd-and-10 at Jets 33 - Martin rushes for 2 yards
3rd-and-8 at Jets 33 - Pennington sacked for -4 yards
4th-and-12 at Jets 29 - Jets punt to Patriots 42. Fair catch.
1st-and-10 at Pats 42 - Brady to Branch for 22 yards.
1st-and-10 at Jets 36 - Dillon rushes for no gain.
2nd-and-10 at Jets 36 - Dillon runs for 36 yard TD.
Extra point good.
Patriots KICKOFF to Jets 3. Returned to Jets 37.
1st-and-10 at Jets 37 - Pennington pass intercepted by Bruschi, returned for TD.
Extra point good.
Patriots KICKOFF to Jets 2. Returned to Jets 25.
. . .

would now look like this:


Patriots ball, situation 100
Jets ball, situation 20
Jets ball, situation 33
Patriots ball, situation 42
Patriots ball, situation 64
[Patriots score 7 points]
Patriots ball, situation 100
Jets ball, situation 37
[Patriots score 7 points]
Patriots ball, situation 100
. . .
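
If you wanted to do the yard-line part of that translation by machine, it might look something like the sketch below. The function name and its arguments are made up for illustration; only the numbering scheme (1 through 99 for first-and-10, 100 for a kickoff, 101 for a free kick) comes from the description above.

```python
# Hypothetical helper: names and arguments are assumptions, not from Romer's paper.
KICKOFF = 100    # kickoff from the 30
FREE_KICK = 101  # free kick from the 20


def situation(offense, side_of_field, yard_line):
    """Return the situation number (1..99) for a first-and-10.

    offense       -- team that has the ball
    side_of_field -- team whose half of the field the ball is in
    yard_line     -- yard line as printed in the log (1..50)
    """
    if not 1 <= yard_line <= 50:
        raise ValueError("yard_line must be between 1 and 50")
    # On your own side the number is just the yard line; on the opponent's
    # side it counts up toward 99 (the opponent's 1).
    if offense == side_of_field:
        return yard_line
    return 100 - yard_line


jets_at_own_20 = situation("Jets", "Jets", 20)          # 20
pats_at_own_42 = situation("Patriots", "Patriots", 42)  # 42
pats_at_jets_36 = situation("Patriots", "Jets", 36)     # 64
```

The last call is the Dillon drive from the log: first-and-10 at the Jets 36 with the Patriots on offense becomes situation 64.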

Now, let's look at which situations led to which other situations, and how many points were scored in between. We'll look at this just from the Jets' standpoint, which means that we'll really think of there being 202 situations, which we'll call situations 1 through 101 and -1 through -101. We'll define Situation 20, for example, to mean it's the Jets' ball on their own 20, whereas Situation -20 means it's the Patriots' ball on the Patriots' 20. Here is the data again:


Situation -100 leads to situation 20 (no points scored)
Situation 20 leads to situation 33 (no points scored)
Situation 33 leads to situation -42 (no points scored)
Situation -42 leads to situation -64 (no points scored)
Situation -64 leads to situation -100 (-7 net points scored)
Situation -100 leads to situation 37 (no points scored)
Situation 37 leads to situation -100 (-7 net points scored)
. . .
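
Here is a rough sketch of that bookkeeping in Python. The event format (tuples tagged "ball" and "score") and the function name are assumptions made for illustration; the events themselves are the ones from the log above.

```python
# Hypothetical event format: ("ball", team, situation) or ("score", team, points).
events = [
    ("ball", "Patriots", 100),
    ("ball", "Jets", 20),
    ("ball", "Jets", 33),
    ("ball", "Patriots", 42),
    ("ball", "Patriots", 64),
    ("score", "Patriots", 7),
    ("ball", "Patriots", 100),
    ("ball", "Jets", 37),
    ("score", "Patriots", 7),
    ("ball", "Patriots", 100),
]


def signed_transitions(events, team):
    """List (from_situation, to_situation, net_points) from one team's view.

    Situations are positive when `team` has the ball and negative when the
    opponent does. Net points are positive when `team` scores.
    """
    transitions = []
    previous = None  # signed situation we are coming from
    pending = 0      # net points scored since that situation
    for event in events:
        if event[0] == "score":
            _, scorer, points = event
            pending += points if scorer == team else -points
            continue
        _, holder, number = event
        current = number if holder == team else -number
        if previous is not None:
            transitions.append((previous, current, pending))
        previous, pending = current, 0
    return transitions


jets_view = signed_transitions(events, "Jets")
# [(-100, 20, 0), (20, 33, 0), (33, -42, 0), (-42, -64, 0),
#  (-64, -100, -7), (-100, 37, 0), (37, -100, -7)]
```

Calling the same function with "Patriots" instead of "Jets" flips every sign, which is exactly the symmetry the values are going to have.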

Now imagine you have 800 games' worth of logs that look like that. Let's define V_i to be the value of Situation i. Our goal is to find V_i for all 202 situations. How do we do that?

Well, first of all, we declare that V_-i = -V_i. That is, if any given situation is worth, say, 3 points to the offense, then it must by definition be worth -3 points to the defense. So now we just have to find the values for the positive situations.

Now, the value of Situation i is the average net points that all the situation i's led to immediately, plus the average value of the situations that came next after a situation i.

Look at the log above. The Jets went from a situation 20 to a situation 33 and scored no points in between. The value of that particular instance of situation 20 to the Jets was the points they got (zero) plus the value of the next situation (situation 33). Mathematically:


V_20 = 0 + V_33

Now if we scoured the data for all the situation 20s that occurred for all teams in the data set, then we could average together the resulting values to get an overall value for situation 20. The equation would be:


V_20 = (average immediate net points from all situation 20s)
+
(average value of the resulting situations)

So V_20 is going to be defined in terms of V_33 and probably all the other Vs too. Likewise, each of those Vs is going to be defined in terms of all the other Vs. We have 101 values we want to find and we want them, collectively, to solve 101 equations. Does this sound familiar? Careful readers of this blog will notice that it's the exact same setup we used to put a point value on teams in this post. We're using it to put a point value on situations here.

In the team context described in the above-linked post, that mathematical method takes into account point margin, strength of opponents, strength of opponents' opponents, strength of opponents' opponents' opponents, and so on. In this context, the same method takes into account, for each situation, the net points scored from that situation, the net points scored from the situations that it leads to, and from the situations those situations lead to, and so on.
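
To make the simultaneous-equations idea concrete, here is a minimal sketch that builds one equation per situation and solves them all at once. Everything about it is an assumption for illustration: the function name is made up, the five transitions are invented toy data (not Romer's), and plain Gaussian elimination is just one way to solve such a system, not necessarily what Romer did.

```python
from fractions import Fraction


def solve_values(transitions):
    """Solve V_i = avg(net points) + avg(value of next situation).

    transitions -- (from_situation, to_situation, net_points) tuples, where
    a negative situation means the other team has the ball.
    """
    # Use V_-i = -V_i to rewrite every transition from the offense's view.
    norm = [(f, t, p) if f > 0 else (-f, -t, -p) for f, t, p in transitions]
    sits = sorted({f for f, _, _ in norm})
    index = {s: k for k, s in enumerate(sits)}
    n = len(sits)
    counts = [0] * n
    for f, _, _ in norm:
        counts[index[f]] += 1

    # Augmented matrix for (I - P) V = r; the last column holds r.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for k in range(n):
        A[k][k] = Fraction(1)
    for f, t, p in norm:
        if abs(t) not in index:
            raise ValueError(f"situation {abs(t)} never appears as a start")
        row = index[f]
        weight = Fraction(1, counts[row])
        A[row][n] += weight * p                # average immediate points
        sign = 1 if t > 0 else -1
        A[row][index[abs(t)]] -= sign * weight  # average next value

    # Gauss-Jordan elimination with exact fractions.
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: not enough data")
        A[col], A[pivot] = A[pivot], A[col]
        lead = A[col][col]
        A[col] = [x / lead for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return {s: A[index[s]][n] for s in sits}


# Invented toy data with only three situations: 20, 60, and 100 (kickoff).
toy = [
    (20, 60, 0),    # drive moves the ball to the opponent's 40
    (20, -20, 0),   # punt; opponent takes over at their own 20
    (60, 100, 7),   # touchdown, then we kick off
    (60, -20, 0),   # no score; opponent takes over at their own 20
    (-100, 20, 0),  # opponent kicks off; we start at our own 20
]
values = solve_values(toy)
# V_20 = 7/8, V_60 = 21/8, V_100 = -7/8
```

Notice that in the toy data nobody ever scores directly from situation 20, yet V_20 still comes out positive. That's the whole point of the method: the value picks up the points scored from the situations a 20 leads to. With too few transitions the equations may have no unique solution, in which case this sketch raises an error rather than guessing.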

I'm currently reading a book called The Wisdom of Crowds, by James Surowiecki. I'm only a few chapters in, but I can already recommend it with confidence. However, Surowiecki summarizes Romer's paper in Chapter Three, and he gets this part wrong:

When [Romer] was done, he had figured out the value of a first down at every single point on the field. A first-and-ten on a team's own twenty yard line was worth a little bit less than half a point --- in other words, if a team started from its own twenty yard line fourteen times, on average it scored just one touchdown.

I know that he is trying to simplify things, but this is a very important point if you want to understand the paper. The half-a-point value of a first-and-10 at your own 20 includes not only the points that you might score on that drive, but also the points your opponent might score with the field position you're likely to give them if you don't score, and the points you're likely to score with the field position they give you after they do or don't score, and so on.

Tomorrow, I'll wrap up this discussion with a quick summary of some other work that's been done on point values, fourth downs, and punting.

This entry was posted on Monday, May 15th, 2006 at 4:17 am and is filed under Statgeekery.