# Pro Football Reference Blog

## The Poisson distribution

Posted by Doug on December 6, 2006

One of my original goals when starting this blog was to highlight some of the mathematics in and of the game of football. I didn't have anything groundbreaking in mind; I just thought it might be nice for a football example to show up when people googled Markov chain or Benford's Law or whatever. I was doing a little of that during the offseason, but I've gotten away from it ever since some actual football started getting played. In the comments to last week's Benford's Law post, JKL provides a nice excuse to get back to it:

> Do the distribution of TD’s follow a normal or a poisson distribution, or some other distribution pattern.
>
> For all WR’s who score exactly 5 TD’s in a season, do we have the expected number of single 3 TD games from that population as a whole, or are there more or fewer players with exactly 1 TD in 5 different games than we might otherwise expect?

Let's first imagine a receiver whose true ability level is 8 touchdowns per year. In a given season, he might score nine or six or ten, but over several (imagined) seasons he'd average eight per year. What if you wanted to simulate a year's worth of game logs for this player?

One simple model would be to note that this player should average .5 touchdowns per game and then view each game as a coin flip. Heads he scores in that game, tails he doesn't.

That's not a terrible model. It will give the guy 8 TDs per year in the long run. But it's obviously lacking. In the long run, it will predict that half his games will be 1-TD games and the other half will be 0-TD games. Our years of experience reading box scores tell us that's not realistic.
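As a rough illustration of the coin-flip model just described, here is a minimal Python sketch. The function name, its parameters, and the fixed seed are illustrative choices, not anything from the original analysis.

```python
import random

def simulate_season_coin_flip(games=16, p_score=0.5, rng=None):
    """Return per-game TD counts (each 0 or 1) for one simulated season."""
    rng = rng or random.Random()
    # One coin flip per game: heads (probability p_score) means one TD.
    return [1 if rng.random() < p_score else 0 for _ in range(games)]

rng = random.Random(2006)  # arbitrary fixed seed so the run is repeatable
season = simulate_season_coin_flip(rng=rng)
print(season, "total TDs:", sum(season))

# Averaged over many simulated seasons, the total should land near 8.
seasons = [sum(simulate_season_coin_flip(rng=rng)) for _ in range(10_000)]
print("average TDs per season:", sum(seasons) / len(seasons))
```

Every simulated game log contains only 0s and 1s, which is exactly the shortcoming of this model.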

So why not break it down a bit further? Instead of viewing this guy as a .5-TDs-per-game player and then simulating 16 games each as a coin flip with probability .5, we could view him as a .25-TDs-per-half player and then simulate 32 halves each as a coin flip with probability .25. This idealized receiver will still average 8 TDs per year, but now he will have 2-TD games 6.25% of the time (.25 x .25), 1-TD games 37.5% of the time (2 x .25 x .75), and 0-TD games 56.25% of the time (.75 x .75).

Better.

But why stop there? Let's look at him as a .125 TDs-per-quarter player and simulate 64 quarters. I'll spare you the calculations, but this would result in the following:

```
0-TD games:  58.62% of the time
1-TD games:  33.50% of the time
2-TD games:   7.18% of the time
3-TD games:   0.68% of the time
4-TD games:   0.02% of the time
```

Now that's starting to look relatively realistic.
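The per-game percentages for any number of pieces can be computed with the binomial formula. The sketch below assumes a game is split into `pieces_per_game` equal pieces with at most one TD per piece; the function and parameter names are illustrative.

```python
from math import comb

def td_distribution(pieces_per_game, tds_per_game=0.5):
    """Binomial probabilities of 0, 1, ..., pieces_per_game TDs in one game."""
    p = tds_per_game / pieces_per_game  # chance of a TD in any single piece
    return [
        comb(pieces_per_game, k) * p**k * (1 - p) ** (pieces_per_game - k)
        for k in range(pieces_per_game + 1)
    ]

# 1 piece = whole game, 2 pieces = halves, 4 pieces = quarters.
for pieces in (1, 2, 4):
    probs = td_distribution(pieces)
    print(pieces, [f"{100 * q:.2f}%" for q in probs])
```

With 4 pieces, the output should match the quarter-by-quarter percentages listed above.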

This "coin-flipping" model is called a binomial model, by the way. Let's stop here and consider a couple of the assumptions implicit in the binomial model. In order to compute the above, we have assumed that each quarter (coin flip) is independent of the others. In other words, the above assumes that Chad Johnson's scoring in the first quarter tells us nothing one way or the other about whether he'll score in the second quarter. There are all sorts of reasons why we might doubt that assumption. Scoring in the first quarter might be a clue that he's playing against a weak secondary, which would indicate an increased chance of TDs in future quarters of the same game. On the other hand, scoring in the first quarter might cause the opposing defense to start double- or triple-covering him, thereby leading to a lower probability of future TDs. And that's just the tip of the iceberg of possible ways this model fails to be literally correct.

But you know what they say: all models are imperfect, some are useful anyway. Let's press on and see what happens.

What if we look at him as a .00833-TDs-per-minute player and then simulate 960 minutes each as a coin flip with probability .00833? What if we look at him as a .0001389-TDs-per-second player and then simulate 57600 seconds?

We're getting into some obvious absurdity here, as this model would yield a chance of this receiver scoring a thousand (or many more) TDs in a season. It would be a very, very, very tiny chance, so tiny that for all practical purposes it could never happen, but a chance nonetheless. Furthermore, as we break the season down into more and more pieces, each of which is smaller and smaller, the calculations required get uglier and uglier.

Believe it or not, it turns out that the math can be simplified by breaking the season down into infinitely many pieces, each of infinitesimal length (technically, breaking it down into N pieces and then taking the limit as N goes to infinity). When you do that, what you get is this:

```
Prob. of having an n-touchdown game = e^(-1/2) (1/2)^n / n!
```
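For readers who want the limiting step spelled out, here is a sketch of the standard derivation. The notation is introduced only for this sketch: λ is the average number of TDs per game (1/2 for our receiver), and N counts pieces per game rather than per season, which leads to the same limit.

```latex
\begin{align*}
P(n \text{ TDs in a game})
  &= \lim_{N\to\infty} \binom{N}{n}
     \left(\frac{\lambda}{N}\right)^{n}
     \left(1-\frac{\lambda}{N}\right)^{N-n} \\
  &= \frac{\lambda^{n}}{n!}\,
     \lim_{N\to\infty}
     \frac{N(N-1)\cdots(N-n+1)}{N^{n}}
     \left(1-\frac{\lambda}{N}\right)^{N}
     \left(1-\frac{\lambda}{N}\right)^{-n} \\
  &= \frac{\lambda^{n}}{n!}\cdot 1 \cdot e^{-\lambda}\cdot 1
   \;=\; \frac{e^{-\lambda}\,\lambda^{n}}{n!}
\end{align*}
```

Setting λ = 1/2 gives the per-game formula for our receiver.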

This is called a Poisson distribution with parameter 1/2 (the parameter 1/2 comes from the fact that our guy averages half a TD per game). When you plug that in for various values of n, you get this:

```
0-TD games:  60.65% of the time
1-TD games:  30.33% of the time
2-TD games:   7.58% of the time
3-TD games:   1.26% of the time
4-TD games:   0.16% of the time
5+-TD games:  0.02% of the time
```
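The percentages above can be reproduced in a few lines of Python. This is a minimal sketch with illustrative function names; the second loop also shows the binomial numbers drifting toward the Poisson ones as each game is cut into more pieces (4 quarters, 60 minutes, 3600 seconds).

```python
from math import comb, exp, factorial

def poisson_pmf(n, rate=0.5):
    """Poisson probability of exactly n TDs in a game averaging `rate` TDs."""
    return exp(-rate) * rate**n / factorial(n)

def binomial_pmf(n, pieces, rate=0.5):
    """Same probability when the game is cut into `pieces` coin flips."""
    p = rate / pieces
    return comb(pieces, n) * p**n * (1 - p) ** (pieces - n)

for n in range(5):
    print(f"{n}-TD games: {100 * poisson_pmf(n):6.2f}%")
five_plus = 1 - sum(poisson_pmf(n) for n in range(5))
print(f"5+-TD games: {100 * five_plus:5.2f}%")

# Finer slicing of the game moves the binomial numbers toward the Poisson ones.
for pieces in (4, 60, 3600):
    print(pieces, [round(binomial_pmf(n, pieces), 4) for n in range(4)])
print("Poisson", [round(poisson_pmf(n), 4) for n in range(4)])
```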

Note that all of the above is pure theory. At no point in making the above computations did any of the details of how NFL football is played come into the discussion. A mathematician who has never seen a football game could have built this model. It's not a good model unless it describes what happens in actual football games.

So does it?

If you look at all receivers since 1995 who played in 16 games and scored exactly 8 touchdowns, you'll find 29 such seasons. That's a total of 464 games. If the Poisson model is to be believed, we would expect about 281 zero-TD games, about 141 one-TD games, and so on. Here is a table showing the expected and actual totals:

```
TDs    Prob.   Expected   Actual
================================
0    0.607     281.4      272
1    0.303     140.7      157
2    0.076      35.2       30
3    0.013       5.9        5
4    0.002       0.7        0
5    0.000       0.1        0
```

Whether that's close enough to claim that the Poisson really is a good model in this case is for another post. For now, let's just say it looks pretty close. The actual data shows a few more 1-TD games and somewhat fewer 0- and 2-TD games than expected, but overall it's a remarkably good match.
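The "Expected" column is just the total number of games multiplied by the Poisson probability for each TD count. Here is a minimal sketch of that arithmetic; the function and parameter names are illustrative, and the `actual` list is copied from the table above.

```python
from math import exp, factorial

def expected_game_counts(season_tds, seasons, games_per_season=16, max_tds=5):
    """Expected number of 0-, 1-, ..., max_tds-TD games under a Poisson model."""
    rate = season_tds / games_per_season     # average TDs per game
    total_games = seasons * games_per_season
    return [
        total_games * exp(-rate) * rate**n / factorial(n)
        for n in range(max_tds + 1)
    ]

actual = [272, 157, 30, 5, 0, 0]  # observed counts from the table above
expected = expected_game_counts(season_tds=8, seasons=29)
for n, (e, a) in enumerate(zip(expected, actual)):
    print(f"{n} TDs: expected {e:6.1f}, actual {a}")
```

Passing a different `season_tds` swaps in a different Poisson parameter, for example 4/16 or 12/16 in place of 1/2.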

Of course, there's no reason to limit ourselves to players who scored 8 TDs. We could similarly look at players who scored 4 or 6 or 12 or whatever. All we have to do is plug in 4/16 or 6/16 or 12/16 or whatever into the formula in place of 1/2. Here is the data:

```
4 TDs in 16 games (parameter 4/16)
TDs    Prob.   Expected   Actual
================================
0    0.779     398.7      395
1    0.195      99.7      106
2    0.024      12.5       11
3    0.002       1.0        0
4    0.000       0.1        0
5    0.000       0.0        0

5 TDs in 16 games (parameter 5/16)
TDs    Prob.   Expected   Actual
================================
0    0.732     515.1      506
1    0.229     161.0      177
2    0.036      25.1       20
3    0.004       2.6        1
4    0.000       0.2        0
5    0.000       0.0        0

6 TDs in 16 games (parameter 6/16)
TDs    Prob.   Expected   Actual
================================
0    0.687     417.9      407
1    0.258     156.7      178
2    0.048      29.4       19
3    0.006       3.7        4
4    0.001       0.3        0
5    0.000       0.0        0

7 TDs in 16 games (parameter 7/16)
TDs    Prob.   Expected   Actual
================================
0    0.646     382.2      364
1    0.282     167.2      201
2    0.062      36.6       23
3    0.009       5.3        4
4    0.001       0.6        0
5    0.000       0.1        0

8 TDs in 16 games (parameter 8/16)
TDs    Prob.   Expected   Actual
================================
0    0.607     281.4      272
1    0.303     140.7      157
2    0.076      35.2       30
3    0.013       5.9        5
4    0.002       0.7        0
5    0.000       0.1        0

9 TDs in 16 games (parameter 9/16)
TDs    Prob.   Expected   Actual
================================
0    0.570     282.6      277
1    0.321     159.0      166
2    0.090      44.7       46
3    0.017       8.4        7
4    0.002       1.2        0
5    0.000       0.1        0

10 TDs in 16 games (parameter 10/16)
TDs    Prob.   Expected   Actual
================================
0    0.535     128.5      114
1    0.335      80.3      105
2    0.105      25.1       18
3    0.022       5.2        3
4    0.003       0.8        0
5    0.000       0.1        0

11 TDs in 16 games (parameter 11/16)
TDs    Prob.   Expected   Actual
================================
0    0.503      72.4       68
1    0.346      49.8       56
2    0.119      17.1       18
3    0.027       3.9        1
4    0.005       0.7        1
5    0.001       0.1        0

12 TDs in 16 games (parameter 12/16)
TDs    Prob.   Expected   Actual
================================
0    0.472      52.9       50
1    0.354      39.7       44
2    0.133      14.9       14
3    0.033       3.7        4
4    0.006       0.7        0
5    0.001       0.1        0
```

The patterns are generally the same as what we saw in the 8-TD case: the actual numbers show fewer 0- and 2-TD games than the Poisson model would predict, and more 1-TD games. Just off the top of my head, I'd guess that this is because of a general tendency to spread things around among different receivers on the same team. Whether that's forced by the defense or mandated by the coach I'm not sure. Also, there are just a shade fewer 3+ TD games than the Poisson would predict. This may be because teams that have a receiver who catches 2 TD passes generally have a comfortable lead and don't need to throw anymore, or because defenses that give up two TDs to the same guy try to make darn sure they don't give up a third.

The bottom line is that, if you know that something (calls from telemarketers, flat tires, power outages, touchdown catches) will happen, on average, x times per game, or per month, or per day, or per decade, and you want to know the probability that it will happen n times in a given time period, the Poisson model can often give you a pretty good estimate.

This entry was posted on Wednesday, December 6th, 2006 at 6:10 am and is filed under Statgeekery.