## The Poisson distribution

Posted by Doug on December 6, 2006

One of my original goals when starting this blog was to highlight some of the mathematics in and of the game of football. I didn't have anything groundbreaking in mind; I just thought it might be nice for a football example to show up when people googled Markov chain or Benford's Law or whatever. I was doing a little of that during the offseason, but I've gotten away from it ever since some actual football started getting played. In the comments to last week's Benford's Law post, JKL provides a nice excuse to get back to it:

Do the distribution of TD’s follow a normal or a poisson distribution, or some other distribution pattern.

For all WR’s who score exactly 5 TD’s in a season, do we have the expected number of single 3 TD games from that population as a whole, or are there more or fewer players with exactly 1 TD in 5 different games than we might otherwise expect?

Let's first imagine a receiver whose true ability level is 8 touchdowns per year. In a given season, he might score nine or six or ten, but over several (imagined) seasons he'd average eight per year. What if you wanted to simulate a year's worth of game logs for this player?

One simple model would be to note that this player should average .5 touchdowns per game and then view each game as a coin flip. Heads he scores in that game, tails he doesn't.

That's not a terrible model. It will give the guy 8 TDs per year in the long run. But it's obviously lacking. In the long run, it will predict that half his games will be 1-TD games and the other half will be 0-TD games. Our years of experience reading box scores tell us that's not realistic.

So why not break it down a bit further? Instead of viewing this guy as a .5-TDs-per-game player and then simulating 16 games each as a coin flip with probability .5, we could view him as a .25-TDs-per-half player and then simulate 32 halves each as a coin flip with probability .25. This idealized receiver will still average 8 TDs per year, but now he will have 2-TD games 6.25% of the time, 1-TD games 37.5% of the time, and 0-TD games 56.25% of the time.

Better.

But why stop there? Let's look at him as a .125 TDs-per-quarter player and simulate 64 quarters. I'll spare you the calculations, but this would result in the following:

0-TD games: 58.62% of the time

1-TD games: 33.50% of the time

2-TD games: 7.18% of the time

3-TD games: 0.68% of the time

4-TD games: 0.02% of the time

Now that's starting to look relatively realistic.
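For anyone who wants to check these numbers, here is a minimal Python sketch of the coin-flip model. The function name `td_game_distribution` and its parameters are made up for illustration; the sketch assumes one coin flip per period and that the periods are independent of each other.

```python
from math import comb

def td_game_distribution(tds_per_game, periods_per_game):
    """Binomial model: one coin flip per period, at most one TD per period."""
    p = tds_per_game / periods_per_game  # chance of a TD in any one period
    n = periods_per_game
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Game-by-game, half-by-half, and quarter-by-quarter views of a 0.5-TD-per-game receiver
for periods in (1, 2, 4):
    probs = td_game_distribution(0.5, periods)
    print(periods, [f"{100 * q:.2f}%" for q in probs])
```

Each printed list gives the chance of a 0-TD game, a 1-TD game, and so on, for that way of slicing up the game.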

This "coin-flipping" model is called a *binomial model*, by the way. Let's stop here and consider a couple of the assumptions implicit in the binomial model. In order to compute the above, we have assumed that each quarter (coin flip) is independent of the others. In other words, the above assumes that Chad Johnson's scoring in the first quarter tells us nothing one way or the other about whether he'll score in the second quarter. There are all sorts of reasons why we might doubt that assumption. Scoring in the first quarter might be a clue that he's playing against a weak secondary, which would indicate an increased chance of TDs in future quarters of the same game. On the other hand, scoring in the first quarter might cause the opposing defense to start double- or triple-covering him, thereby leading to a lower probability of future TDs. And that's just the tip of the iceberg of possible ways this model fails to be literally correct.

But you know what they say: all models are imperfect, some are useful anyway. Let's press on and see what happens.

What if we look at him as a .00833-TDs-per-minute player and then simulate 960 minutes each as a coin flip with probability .00833? What if we look at him as a .0001389-TDs-per-second player and then simulate 57600 seconds?

We're getting into some obvious absurdity here, as this model would yield a chance of this receiver scoring a thousand (or much more) TDs in a season. It would be a very, very, very tiny chance --- so tiny that for all practical purposes it could never happen --- but a chance nonetheless. Furthermore, as we break the season down into more and more pieces, each of which is smaller and smaller, the calculations required are getting uglier and uglier.

Believe it or not, it turns out that the math can be simplified by breaking the season down into infinitely many pieces, each of infinitesimal length (technically, breaking it down into N pieces and then taking the limit as N goes to infinity). When you do that, what you get is this:

Prob. of having an n-touchdown game = e^(-1/2) (1/2)^n / n!
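For the curious, here is a sketch of where that limit comes from. The symbols are shorthand introduced just for this aside: λ is the average number of TDs per game (1/2 for our guy), N is the number of pieces each game is chopped into, and each piece is a coin flip with probability λ/N.

$$
\binom{N}{n}\left(\frac{\lambda}{N}\right)^{n}\left(1-\frac{\lambda}{N}\right)^{N-n}
= \frac{\lambda^{n}}{n!}\cdot\frac{N(N-1)\cdots(N-n+1)}{N^{n}}\cdot\left(1-\frac{\lambda}{N}\right)^{N}\cdot\left(1-\frac{\lambda}{N}\right)^{-n}
\;\xrightarrow[N\to\infty]{}\; \frac{\lambda^{n}}{n!}\,e^{-\lambda}
$$

As N grows with n held fixed, the second and fourth factors both go to 1, and (1 - λ/N)^N goes to e^(-λ). With λ = 1/2, what is left is e^(-1/2) (1/2)^n / n!.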

This is called a *Poisson distribution with parameter 1/2* (the parameter 1/2 comes from the fact that our guy averages half a TD per game). When you plug that in for various values of n, you get this:

0-TD games: 60.65% of the time

1-TD games: 30.33% of the time

2-TD games: 7.58% of the time

3-TD games: 1.26% of the time

4-TD games: 0.16% of the time

5+-TD games: 0.02% of the time
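Here is a short Python sketch that puts the coin-flip models and the limiting formula side by side. The helper names `poisson_pmf` and `binomial_pmf` are made up for this example; the piece counts 4, 60, and 3600 correspond to slicing one game into quarters, minutes, and seconds.

```python
from math import comb, exp, factorial

def poisson_pmf(n, rate):
    """Probability of exactly n events when the average is `rate` per period."""
    return exp(-rate) * rate**n / factorial(n)

def binomial_pmf(n, pieces, rate):
    """Same period chopped into `pieces` coin flips, each with probability rate/pieces."""
    p = rate / pieces
    return comb(pieces, n) * p**n * (1 - p)**(pieces - n)

rate = 0.5  # half a TD per game
for n in range(5):
    row = [binomial_pmf(n, pieces, rate) for pieces in (4, 60, 3600)]
    print(n, [f"{100 * q:.2f}%" for q in row], f"Poisson {100 * poisson_pmf(n, rate):.2f}%")
```

The finer the slicing, the closer the coin-flip numbers get to the Poisson column.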

Note that all of the above is pure theory. At no point in making the above computations did any of the details of how NFL football is played come into the discussion. A mathematician who has never seen a football game could have built this model. It's not a good model unless it describes what happens in actual football games.

So does it?

If you look at all receivers since 1995 who played in 16 games and scored exactly 8 touchdowns, you'll find 29 such seasons. That's a total of 464 games. If the Poisson model is to be believed, we would expect about 281 zero-TD games, about 141 one-TD games, and so on. Here is a table showing the expected and actual totals:

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.607 | 281.4 | 272 |
| 1 | 0.303 | 140.7 | 157 |
| 2 | 0.076 | 35.2 | 30 |
| 3 | 0.013 | 5.9 | 5 |
| 4 | 0.002 | 0.7 | 0 |
| 5 | 0.000 | 0.1 | 0 |

Whether that's close enough to claim that the Poisson really is a good model in this case is for another post. For now, let's just say it looks pretty close. The actual data shows a few more one-TD games and a couple fewer 0- and 2-TD games than expected, but overall it's a remarkably good match.
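The expected column is nothing more than the Poisson probability times the 464 games. A minimal sketch of that arithmetic, with the actual counts copied from the table above (the variable names are just for illustration):

```python
from math import exp, factorial

games = 29 * 16                    # 29 receiver-seasons of 16 games each
rate = 8 / 16                      # TDs per game
actual = [272, 157, 30, 5, 0, 0]   # observed counts from the table above

# Expected number of n-TD games = games * P(n TDs in a game)
expected = [games * exp(-rate) * rate**n / factorial(n) for n in range(6)]
for n, (e, a) in enumerate(zip(expected, actual)):
    print(f"{n} TDs: expected {e:6.1f}, actual {a}")
```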

Of course, there's no reason to limit ourselves to players who scored 8 TDs. We could similarly look at players who scored 4 or 6 or 12 or whatever. All we have to do is plug in 4/16 or 6/16 or 12/16 or whatever into the formula in place of 1/2. Here is the data:

32 receivers with 4 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.779 | 398.7 | 395 |
| 1 | 0.195 | 99.7 | 106 |
| 2 | 0.024 | 12.5 | 11 |
| 3 | 0.002 | 1.0 | 0 |
| 4 | 0.000 | 0.1 | 0 |
| 5 | 0.000 | 0.0 | 0 |

44 receivers with 5 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.732 | 515.1 | 506 |
| 1 | 0.229 | 161.0 | 177 |
| 2 | 0.036 | 25.1 | 20 |
| 3 | 0.004 | 2.6 | 1 |
| 4 | 0.000 | 0.2 | 0 |
| 5 | 0.000 | 0.0 | 0 |

39 receivers with 6 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.687 | 417.9 | 407 |
| 1 | 0.258 | 156.7 | 178 |
| 2 | 0.048 | 29.4 | 19 |
| 3 | 0.006 | 3.7 | 4 |
| 4 | 0.001 | 0.3 | 0 |
| 5 | 0.000 | 0.0 | 0 |

37 receivers with 7 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.646 | 382.2 | 364 |
| 1 | 0.282 | 167.2 | 201 |
| 2 | 0.062 | 36.6 | 23 |
| 3 | 0.009 | 5.3 | 4 |
| 4 | 0.001 | 0.6 | 0 |
| 5 | 0.000 | 0.1 | 0 |

29 receivers with 8 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.607 | 281.4 | 272 |
| 1 | 0.303 | 140.7 | 157 |
| 2 | 0.076 | 35.2 | 30 |
| 3 | 0.013 | 5.9 | 5 |
| 4 | 0.002 | 0.7 | 0 |
| 5 | 0.000 | 0.1 | 0 |

31 receivers with 9 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.570 | 282.6 | 277 |
| 1 | 0.321 | 159.0 | 166 |
| 2 | 0.090 | 44.7 | 46 |
| 3 | 0.017 | 8.4 | 7 |
| 4 | 0.002 | 1.2 | 0 |
| 5 | 0.000 | 0.1 | 0 |

15 receivers with 10 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.535 | 128.5 | 114 |
| 1 | 0.335 | 80.3 | 105 |
| 2 | 0.105 | 25.1 | 18 |
| 3 | 0.022 | 5.2 | 3 |
| 4 | 0.003 | 0.8 | 0 |
| 5 | 0.000 | 0.1 | 0 |

9 receivers with 11 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.503 | 72.4 | 68 |
| 1 | 0.346 | 49.8 | 56 |
| 2 | 0.119 | 17.1 | 18 |
| 3 | 0.027 | 3.9 | 1 |
| 4 | 0.005 | 0.7 | 1 |
| 5 | 0.001 | 0.1 | 0 |

7 receivers with 12 TDs

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.472 | 52.9 | 50 |
| 1 | 0.354 | 39.7 | 44 |
| 2 | 0.133 | 14.9 | 14 |
| 3 | 0.033 | 3.7 | 4 |
| 4 | 0.006 | 0.7 | 0 |
| 5 | 0.001 | 0.1 | 0 |

The patterns are generally the same as what we saw in the 8-TD case: the actual numbers show fewer 0- and 2-TD games than the Poisson model would predict, and more 1-TD games. Just off the top of my head, I'd guess that this is because of a general tendency to spread things around among different receivers on the same team. Whether that's forced by the defense or mandated by the coach I'm not sure. Also, there are just a shade fewer 3+ TD games than the Poisson would predict. This may be because teams that have a receiver who catches 2 TD passes generally have a comfortable lead and don't need to throw anymore. Or because defenses that give up two TDs to the same guy try to make darn sure they don't give up a third.

The bottom line is that, if you know that something --- calls from telemarketers, flat tires, power outages, touchdown catches --- will happen, on average, *x* times per game, or per month, or per day, or per decade, and you want to know the probability that it will happen *n* times in a given time period, the Poisson model can often give you a pretty good estimate.

Poisson? Uh oh. That sounds like one of the words that resulted in me flunking out of college.

Doug, I have a question about your charts. Your chart label says "7 receivers with 12 TDs" but goes on to list 50+44+14+4=112 receivers. All of the tables are like this - with more receivers listed than the title suggests. Am I misreading this? Shouldn't there be a total of 7 receivers listed in the last chart?

Richie, 7 receivers * 16 games = 112 receiver-games

Among all those receivers, we would expect to see 52.9 0-TD games, and we actually saw 50. And so on.

Gotcha. Thanks. This is why I flunked out of Stats I guess.

Doug - interesting stuff - but I am a little leery of the initial assumptions (labeling a guy an 8 touchdown guy.) I don't have any specific reasons, but it seems that that is cutting down on the pool of people to the point where it will almost certainly fit the model.

Wouldn't it be a bit more sensible to do something like look at all the touchdown receptions in a season by WR's, and then divide by the number of WR games played (maybe using something like WR's w/ > 1 catch per game to give an estimate, or 3wr's * 32 teams * 16 games)?

we would expect about 281 one-TD games, about 141 two-TD games, and so on. Here is a table showing the expected and actual totals:

| TDs | Prob. | Expected | Actual |
|---|---|---|---|
| 0 | 0.607 | 281.4 | 272 |
| 1 | 0.303 | 140.7 | 157 |
| 2 | 0.076 | 35.2 | 30 |
| 3 | 0.013 | 5.9 | 5 |
| 4 | 0.002 | 0.7 | 0 |
| 5 | 0.000 | 0.1 | 0 |

You say 281 1-TD games above, but I see 281 0-TD games in the chart and so forth. Typo?

Dangit. Yes. Typo. I'll fix it.

Doug, A total of your 4 to 12 TD players gives us:

| TDs | Expected | Actual |
|---|---|---|
| 0 | 2532 | 2453 |
| 1 | 1054 | 1190 |
| 2 | 241 | 199 |
| 3 | 40 | 29 |
| 4 | 5 | 1 |
| 5 | 1 | 0 |

So, it's clear that the Poisson model, to the "nth" second is good but not great.

However, why not just go back to your "per quarter" model? That one seemed to match the 8-TD guys the best. Can you rerun your charts, adding the "per quarter" model? Or, better yet, find the model that fits best (say, TD per 10.67 minutes)?

Great job!

Hi, Doug,

For 8-TD guys, you're finding everyone with exactly 8 TDs. But the guys with exactly 8 TDs aren't exactly the same as the guys *expected* to get 8 TDs.

Wouldn't your method cause the higher categories (3+ TDs, etc.) to be underpopulated and the lower categories to be overpopulated?

Are you sure of this? I would interpret the math as saying that "in 33.50% of all games, you would score touchdown(s) in 1 quarter", "in 7.18% of games you would score touchdown(s) in 2 quarters", etc., rather than "you will have a 1 (or 2 or 3, etc.) TD game". Clearly, you can score multiple TDs in a quarter.

The average of the above gives you 0.5 TD per game. I don't see anything wrong with it.

Bill, in football it is possible to score multiple TDs in a quarter, but the binomial model assumes that, in each period of time, either you get one or you get zero. That means that the TDs-per-quarter binomial model has an assumption that isn't actually true of NFL football. That's one of the reasons for letting the time period shrink towards 0 and replacing the binomial model with the Poisson model.

The assumptions of the Poisson model also don't exactly fit the reality of a football game, though. The Poisson model assumes that you're equally likely to score a TD at any time, whether you haven't scored a TD yet or whether you scored 2 seconds ago. But in real football it is physically impossible to score two TDs two seconds apart (unless, I suppose, your first TD comes just before the end of a quarter and your second comes on the ensuing kickoff). Usually you have to wait a lot longer than two seconds for the other team's possession to end.

The model that seems most appropriate to me is a binomial model where we break the game into possessions. On average, it looks like a team has between 11 and 12 possessions in a game, and it's only possible to score either 0 or 1 TDs in a single possession, so a binomial model that breaks a game into 11-12 units seems like a good fit to a real game. This model assumes 1) that a receiver is equally likely to score a TD on any possession and 2) that a team has the exact same number of possessions every game. (If you wanted to have a fun post on hierarchical modeling, Doug, you could see what happens when you let the number of possessions per game vary.)

I ran the numbers, and this binomial model comes a little bit closer to predicting the actual data than the Poisson model, but not much. Collapsing across all of the receivers that Doug gave us data for, here are the numbers.

First, actual number of games:

0 TD: 2453

1 TD: 1190

2 TD: 199

3+TD: 30

Second, the predictions of the 11 possessions/game binomial model:

0 TD: 2520

1 TD: 1092

2 TD: 238

3+TD: 38

Third, the predictions of the Poisson model:

0 TD: 2543

1 TD: 1058

2 TD: 241

3+TD: 46

So both models predict too few 1 TD games, and too many of the others, probably for reasons like the ones that Doug suggests.
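The per-possession model described in this comment can be sketched in a few lines of Python. This is only an illustration under the comment's stated assumptions (exactly 11 possessions in every game, the same scoring chance on every possession); the function name `possession_model` is invented here, and the sketch covers a single 8-TD receiver rather than reproducing the pooled totals above.

```python
from math import comb

POSSESSIONS_PER_GAME = 11  # the comment's rounded league average, treated as fixed

def possession_model(season_tds, games=16, possessions=POSSESSIONS_PER_GAME):
    """Per-game TD distribution if each possession is an independent coin flip."""
    p = season_tds / (games * possessions)  # chance of a TD catch on any one possession
    return [comb(possessions, k) * p**k * (1 - p)**(possessions - k)
            for k in range(possessions + 1)]

probs = possession_model(8)
print([round(q, 4) for q in probs[:4]])  # chances of 0, 1, 2, 3 TDs in a game
```

With 11 slices per game, this model sits between the quarter-by-quarter coin-flip model (4 slices) and the Poisson limit.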

Ah, thanks, Tango (#8). I think what you've shown is the effect I postulated in #9.

Following up on my previous post (#10), allow me to illustrate my method for calculating the probability for a "2 TD game".

First, we're given that the average TDs per quarter we're dealing with is 0.125. Therefore, let's assume that Pr(1 TD in a qtr) = 0.125

Let's go to 4 decimal places

Pr(1 TD in a qtr) = 0.1250

Pr(2 TD in a qtr) = 0.0156 ( = 0.1250 ^ 2 )

Pr(3 TD in a qtr) = 0.0020 ( = Pr(1) ^ 3 )

Pr(4 TD in a qtr) = 0.0002

That's as far as we can go with 4 decimal places, so adding them up, we get Pr(1+ TDs in a qtr) = .1428

That makes Pr(0 TDs in a qtr) = 1 - 0.1428 = 0.8572.

Now, there are two ways to have a "2 TD game"

Case 1: 1 TD per qtr for any 2 qtrs.

The prob of this is (4C2) * (0.1250)^2 * (0.8572)^2 = 0.0689

Case 2: 2 TD in any one qtr.

The prob of this is (4C1) * (0.0156) * (0.8572)^3 = 0.0393

Adding both cases together, we get:

Pr("2 TD game") = 0.0689 + 0.0393 = 0.1082

Therefore, unless my logic is flawed, I would expect a "2 TD game" 10.82% of the time.

Bill as far as I can tell 4C2 = 6 and 4C1 = 4, but what does that stand for? I thought maybe 4!/2!, but that wouldn't work because 4!/1! isn't 4. What does the 4C1/2 stand for?

With my notation aCb is a combinatoric representation for "a items taken b at a time". 4C2, in this case, means there are 4 quarters and we want any 2 of them. In other words, 4C2 is the number of combinations of any 2 quarters in a 4 quarter game.

4C2 = 4!/(2!2!) = 24/4 = 6. 4C1 = 4!/(1!3!) = 24/6 = 4

And here I thought all you knew was that the Bills are going to get smoked this weekend.

I'm not so sure about that. I've heard McGahee has the Jets' number 🙂

Don't take my word for it... ask Doug's brother!

I think you've got us confused, Bill. It's my brother that mistakenly thinks McGahee owns the Jets. Watch the Jets go 3-0 against the Bills in 2006, this weekend.

Oh, my mistake. Gimme some credit, though. The fact that a Bills fan is able to remember anything about anybody OR their brother has to count for something. Snow affects the memory, you know. Is Gastineau starting this week?

Aha!! I was right (and so was Doug). I whipped out the statistics textbook and reviewed the Poisson distribution (I knew something was "fishy"). The Poisson distribution is designed such that it deals with probabilities of either zero or one "success" per interval (with the assumption that Pr(more than 1 in an interval) = 0 ). Doug's figures were right, as far as they went. However, the Poisson distribution is not the best choice for this problem, as we all can imagine a player getting multiple TDs in one quarter (Cite "Evans's Law of Multiple TDs vs Houston").

Since I made the request, you would think the least I should be able to do is read it before Bill M. posts 10 times.

One random thought that popped into my head is that there are different types of TD's. I would basically subdivide them into run of play TD's, and those occurring in the red zone, where the field is more restricted, and body positioning, working in traffic is a little more important than speed and elusiveness. Perhaps one type follows poisson a little better than the other.

To test this, we could divide all TD's out, but that would require looking at each box score. Less exact but more manageable, we could divide the receivers by YPC groups to see if the high YPC group or low YPC group followed the poisson distribution more.

I did that with 10 TD and 9 TD guys. My caveat is that I used the leader boards, so if a 9 TD guy didn't appear on the leader board, he may not be included and my numbers may not exactly match up with Doug's. I also used the Tight Ends who caught that many TD's, which Doug may not have used.

The dividing line for YPC fell at 14.0. Here are the results. I find them interesting. But then again, splits happen, so it bears further examination.

10 TD's in one season

HIGH YPC GUYS

0- 82

1- 60

2- 14

3- 4

LOW YPC GUYS

0- 76

1- 69

2- 14

3- 1

9 TD'S in one season

HIGH YPC GUYS

0- 146

1- 80

2- 26

3- 4

LOW YPC GUYS

0- 132

1- 82

2- 25

3- 1

The high YPC guys are very close to the Poisson distribution. The low YPC guys are farther away, with more 1-TD games and fewer 3-TD and 0-TD games.

Bill,

The poisson does in fact allow for multiple TDs in a quarter. Assuming you've got an 8-TD per year guy, then he's a 1/8 TD per quarter guy, so you can estimate his distribution of TDs per quarter using a poisson with parameter 1/8.

Prob of N TDs in a quarter would be e^(-1/8) (1/8)^N / N!

What that yields is zero-TD quarters about 88.2% of the time, one-TD quarters about 11.0% of the time, two-TD quarters about 0.7% of the time, and 3 (or more)-TD quarters about 0.03% of the time.

The poisson always allows for multiple occurrences in the same time period, no matter how small that time period is.

The quarter-by-quarter model is a binomial model, not a poisson. The reason I introduced it is because it is a stepping-stone on the way to the poisson. If you do the game-by-game binomial model, then the half-by-half binomial model, then the quarter-by-quarter, then the minute-by-minute, then the second-by-second, and then you take the limit, you get the poisson.
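A small Python check of the point about quarters: if each quarter is Poisson with parameter 1/8 and the four quarters are independent (an assumption of the model, not a fact about football), then adding them up should reproduce the per-game Poisson with parameter 1/2. The helper name `poisson_pmf` and the 20-term cutoff are choices made just for this sketch.

```python
from math import exp, factorial

def poisson_pmf(n, rate):
    """Probability of exactly n events when the average is `rate` per period."""
    return exp(-rate) * rate**n / factorial(n)

quarter = [poisson_pmf(n, 1 / 8) for n in range(20)]  # TDs in one quarter

# TDs in a game = sum of four independent quarters (repeated convolution)
game = [1.0]
for _ in range(4):
    game = [sum(game[i] * quarter[k - i]
                for i in range(len(game)) if 0 <= k - i < len(quarter))
            for k in range(len(game) + len(quarter) - 1)]

for n in range(4):
    print(n, f"four quarters: {game[n]:.4f}", f"Poisson(1/2): {poisson_pmf(n, 0.5):.4f}")
```

Note that `quarter[2]` and beyond are small but not zero, which is exactly the "multiple TDs in one quarter" possibility.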

Tango and Phil,

Thanks for taking a break from the hot-stove action to drop by 🙂 I'll try to address your questions when I get a minute later today.

Now my question is what is the point? So we find out that WR touchdowns might resemble a Poisson distribution, but what does it mean? What can we do with this information? Does this just help us to estimate future TD production?

Richie,

I'm not sure I can convincingly sell you on applications of this particular example. But in general, the Poisson's value is that it takes a single number (assumed to be known): how many occurrences per time period a given event will occur on average. And it gives you back the probability of that event occurring a *specific* number of times.

For example, suppose you're in charge of setting the budget for the state of Oklahoma's disaster relief department. The weather people tell you that, on average, 0.8 major tornadoes per month is what you can expect. Suppose you have the resources to deal with a two-tornado month OK, but a three-tornado month would really cripple the state. If the poisson is a good model for tornado occurrences (which it probably is, if interpreted properly), then you can estimate the probability of a three (or more)-tornado month and make an informed decision on whether you're willing to deal with that risk, or whether you need to increase the budget.
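As a rough sketch of the tornado calculation, assuming the Poisson really does apply with an average of 0.8 per month (the variable names are just for illustration):

```python
from math import exp, factorial

rate = 0.8  # average major tornadoes per month, from the example above

# P(0, 1, or 2 tornadoes), then the complement for 3 or more
p_at_most_two = sum(exp(-rate) * rate**n / factorial(n) for n in range(3))
p_three_or_more = 1 - p_at_most_two
print(f"P(3 or more tornadoes in a month) = {p_three_or_more:.3f}")
```

Under those assumptions the crippling month comes up a bit less than 5% of the time, or roughly one month in twenty.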

Now back to football. Suppose you're in a TD-only fantasy league and you've got a big game this week. You go to your favorite fantasy football site and check out the weekly projections. Your players are projected to score 12 TDs and your opponent's are projected to score 10 TDs. So you're the favorite. But what's your probability of winning? 51%? 60%? 80%? You don't know unless you have some sense of the distribution of TDs.
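Here is one hedged way to put a number on that, in Python. It assumes each side's weekly TD total is Poisson with the projected mean (12 and 10) and that the two totals are independent; both are modeling assumptions, not facts about any real league. The function name `win_tie_loss` and the cutoff of 60 TDs are invented for this sketch.

```python
from math import exp, factorial

def poisson_pmf(n, rate):
    """Probability of exactly n events when the average is `rate` per period."""
    return exp(-rate) * rate**n / factorial(n)

def win_tie_loss(my_rate, opp_rate, max_tds=60):
    """Chances that my TD total beats, ties, or trails the opponent's total."""
    mine = [poisson_pmf(n, my_rate) for n in range(max_tds + 1)]
    theirs = [poisson_pmf(n, opp_rate) for n in range(max_tds + 1)]
    win = sum(mine[a] * theirs[b] for a in range(max_tds + 1) for b in range(a))
    tie = sum(mine[n] * theirs[n] for n in range(max_tds + 1))
    return win, tie, 1 - win - tie

win, tie, loss = win_tie_loss(12, 10)
print(f"win {win:.3f}, tie {tie:.3f}, loss {loss:.3f}")
```

The point is less the specific output than the fact that a projection alone can't answer the question; a distribution can.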

Tango, Phil, and Jacob, let me try to explain a bit better what this post is supposed to be about.

I am starting from the assumption that I *know* the player's true TD-scoring ability level, and also the assumption that that ability level is constant over time.

The question is: given that I *know* he's going to average 8 TDs per year in the long run, how do I expect those TDs to be distributed within the games?

It is fascinating (to me) that a very reasonable --- not perfect, but very reasonable --- model can be built without knowing anything at all about NFL football. And that the same model does a nice job of approximating the distribution of all sorts of other things. Wiki lists the following applications of the poisson:

So we've built a model. Now the question is: does that model describe what we see in the real world?

In order to test it, we need to find a receiver whose true level of ability we *know* will remain a constant 8 TDs per year for a long period of time. No such animal exists, of course, so I had to invent one: the composite of all receivers who actually did score 8 TDs in a year.

Jacob's objection, if I understand it correctly, is that by picking all guys with exactly 8 TDs I'm forcing the data to fit the model. That's partly right. By choosing that set of data, I am forcing the predicted and actual *average* TDs per game to be equal. But, of all the possible distributions that would end up giving us an average of .5 TDs per game, which one actually occurs in practice? That answer is not rigged. The actual data might have turned out radically different from the poisson. But it didn't.

Phil's objection, I think, is that, if we looked at all receivers who we *projected* to have 8 TDs in a year, their actual distribution would not turn out to be the same as the distribution of the guys who actually did score 8 TDs in a year. I suspect you're right about that, Phil, but that's not the question I'm trying to answer here.

Tango, your observation about the per-quarter model producing the best fit is an interesting one. But my intent with this particular post is not to find the best fit. It's to explain the poisson and see if it's a good fit.

Fair enough.

Hi, Doug,

Actually, I'm arguing that if you take the TD distribution of guys who scored exactly 8 TDs, you would NOT expect it to be Poisson. Only the distribution of guys who were *expected* to score 8 would be Poisson.

The guys who actually do score 8 should be similar to Poisson, but with too few extreme values and too many average-ish values. (Which is what you found.)

Here's one way to think about it: suppose TDs are Poisson with mean 8. Some of those 8-mean guys will score 2 TDs their first game, just by luck. Those particular players now have an expectation of 9.5 TDs for the season instead of 8, and thus will be underrepresented in the sample.

Phil is right technically. But, does it matter? Let me switch to baseball.

If we go and get all .300-.320 hitters over the last 10 years, we'll find a certain percentage of 0-hit, 1-hit... 5-hit games, giving us an overall mean of say .310.

But, if we are looking for true .310 hitters, guys who will have produced say seasons of .280-.340, what kind of distribution will we get? First off, you are going to need to weight the guys at the extremes less than at the center, since the chance that a .280 hitter is actually a .310 hitter is less than a .310 hitter actually being a .310 hitter.

Phil is probably right that the true distribution will actually be wider, but probably only slightly wider, than what Doug is showing. In the end, it probably doesn't matter, and simply doing as Doug is doing is fine.

Maybe Doug can teach us about the Central Limit Theorem next?

Hmmm ... maybe Tango is right.

I agree with him on the baseball example, but the difference in the football example is that the success rate is lower. If you score 8 TDs for the season, but get two your first game, you have to be at a 6.4 rate for the rest of the season. If you hit .310 for the season, but go 3-for-5 your first game, you're still at .307 for the rest of the season -- a much smaller difference.

But, still, I'm too lazy to actually figure out the real probabilities, and I suspect Tango may be right, so I concede the point.

Phil makes a good point in posts [9] and [29] and I'm not convinced that the effect is small enough that it can be ignored - but as he points out in the latter post, the net effect would likely be that reality agrees even more closely with the model than your charts illustrate.

As for the (again, largely academic) question of whether it makes sense to use the limiting Poisson distribution, or one of the binomial models with discrete time periods, it seems to me that the sensible choice would be a "per possession" model. Of course, possessions have different lengths, and there are differing numbers of them in a game, but using the league average would give a pretty good estimate of the actual number of distinct chances any given receiver has of scoring (or not scoring) a touchdown.

I couldn't resist running a simulation, and it turns out Tango is indeed right.

In 100,000 simulated seasons of players with 8-TD talent, they scored 2 TDs in 7.55 percent of games. In only those seasons where the player scored *exactly* 8 touchdowns, it was 7.34 percent.

So I was wrong that this effect is what caused the observed differences ... it caused a small part, but not very much. Tango is right when he says it doesn't really matter.
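The same comparison can also be made without simulation, under the assumption that the player's TDs in each of 16 games are independent Poisson with mean 0.5. In that case, given exactly 8 TDs for the season, each TD is equally likely to have landed in any of the 16 games, so a single game's count is Binomial(8, 1/16). A minimal Python sketch (variable names are illustrative):

```python
from math import comb, exp, factorial

# Unconditional: a true 0.5-TD-per-game Poisson player
p_two_unconditional = exp(-0.5) * 0.5**2 / factorial(2)

# Conditional on exactly 8 TDs in 16 games: each TD lands in a given game
# with probability 1/16, so that game's count is Binomial(8, 1/16)
p_two_given_eight = comb(8, 2) * (1 / 16)**2 * (15 / 16)**6

print(f"{100 * p_two_unconditional:.2f}% vs {100 * p_two_given_eight:.2f}%")
```

These exact figures land in the same neighborhood as the simulated ones and point the same way: conditioning on exactly 8 TDs trims the 2-TD rate only slightly. They need not match the simulation to the last digit, since the details of that simulation aren't given here.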

Yes, that was my objection. I do understand that the data fits the curve, and am not that surprised.

What this suggests is that touchdown catches are essentially independent from one another. Meaning that something like Markov chains are a BAD predictor of TD's in a game (your current output at this point in the season has no bearing on your future production).

Can that really be true? For instance, say Bernard Berrian has like 4 TD's through 4 games, does this have any predictive ability for the rest of the season? I suspect that it does. That is why I am having difficulty with this.

Is there 3D Poisson? In that case we could have the likelihood of having a specific # of TD's in a season on one axis, the # of TD's per game on the other axis, and the third would be the probability of a specific thing. With something like that I think you could build a predictive Markov-chain engine to predict the likelihood of the # of touchdowns a receiver is going to score in a given week.

Andy [33], I posted estimates for the "per possession" model in comment 12. Like Phil's suggestion, it explains a part of the discrepancy but not the majority of it.

What if you combine the two suggestions? That is actually pretty easy to model. As before, we can simplify things and assume that each team has exactly 11 possessions per game. So each receiver has 176 possessions per season, and an 8-TD receiver by Doug's selection method (following Phil) is just one who has 8 scoring possessions and 168 nonscoring possessions. So the probability that he scores 1 TD in a game is just the probability that, if we randomly choose 8 of the 176 possessions, we'll select exactly 1 of those 8 scoring possessions (and 10 of the 168 nonscoring possessions). If you know how to do combinations, that isn't hard to calculate (although it's too long to write out the calculation here).
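The calculation described above is a hypergeometric one, and it can be sketched in Python as follows. The function name `hypergeom_game` is invented for this illustration, and the 11-possessions-per-game figure is the simplifying assumption already stated in the comment. The sketch covers a single 8-TD receiver and does not attempt to reproduce the pooled totals.

```python
from math import comb

def hypergeom_game(season_tds, games=16, possessions_per_game=11):
    """Per-game TD distribution when exactly `season_tds` of the season's
    possessions are scoring ones and a game is a random draw of possessions."""
    total = games * possessions_per_game      # 176 possessions in the season
    misses = total - season_tds               # nonscoring possessions
    ways = comb(total, possessions_per_game)  # all ways to pick one game's possessions
    return [comb(season_tds, k) * comb(misses, possessions_per_game - k) / ways
            for k in range(min(season_tds, possessions_per_game) + 1)]

probs = hypergeom_game(8)
print([round(q, 4) for q in probs[:4]])  # chances of 0, 1, 2, 3 TDs in a game
```

For example, `probs[1]` is the chance of picking exactly 1 of the 8 scoring possessions and 10 of the 168 nonscoring ones.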

I ran all the numbers for 4-12 TD receivers, and I ended up with these predictions (for the games of the 243 receivers that Doug gives data for):

0 TD: 2488 games

1 TD: 1145 games

2 TD: 227 games

3+TD: 28 games

As I mentioned in my last post, this is the actual data:

0 TD: 2453 games

1 TD: 1190 games

2 TD: 199 games

3+TD: 30

And these were the predictions of the Poisson model:

0 TD: 2543

1 TD: 1058

2 TD: 241

3+TD: 46

Note that the new predictions are closer to the actual data than to the Poisson model. That means that the majority of the discrepancy between the Poisson model and the actual data goes away once we account for 1) the fact that scoring opportunities only occur within possessions, and there are only about 11 possessions per game, and 2) Doug using the actual season totals to represent receivers' underlying ability.

Great post! This is what I like about this blog, the math and formulas. I can go to any sports blog or newspaper to read observations but there aren't many places where you can get this stuff. At least not for football.

Just a quick question: in the original charts, did players who did not play in a game get 0 TD's for that game? It seems you are counting every WR at 16 games but the truth is that they only average something around 14 per season. Is this accounted for? I would think that scoring 8 TD's in 14 games would increase the likelihood of a multiple TD game or at least a 1 TD game and decrease the likelihood of 0 TD games.

Jacob (35): I think you are missing something important. They are independent and Markov would apply. That you have 3 TD in one game doesn't mean he's forecast for 5 TD for the other 15. We are not modelling each player scoring exactly 8 TD per season. We are starting with each player scoring 8 TD per season, and figuring out how often he scored 0,1,2 per game.

In effect, Doug is doing what Phil (34) is doing, but taking a shortcut. Phil is technically right (i.e., start with a true talent of 0.5 TD per game), and figure out how often he'd get 0,1,2 TD per game. Heck, figure out how often he'd get 6,7,8,9,10 TD per season.

Doug's shortcut, as shown, is acceptable.

Hey MO. Doug's original post says he looked only at players with 16 games played.

The best unit would likely be "per reception" and find their average receptions per year or per game. If you want to calculate something closer to value of the receiver, use the Poisson distribution for "per pass directed to player" or even "per down on the field."

Remember that the Poisson distribution is a limiting form of the binomial distribution as the number of instances reaches infinity while np remains fixed. Is your n large and p small? In other words what gets you your gamma value? If it is true then the Poisson distribution makes sense. Since your selection of n has been arbitrary, you could go from 1/64ths of games to 1/3600ths of a game (seconds). That would be a better check for the utility of the Poisson distribution I would think. I think it would immediately draw into question the utility of a player's impact per second, since the distribution of a player's time on the field is complex and hardly distributed over every second of the game.

I think a per play approach using the binomial distribution would be most interesting.

Has anybody done any research on modeling yards per game?

Given that player A is a starter, can the normal distribution be used to estimate player A's yards per game?

Does anyone know the distribution for a player with a true skill of 70-yards per game? I assume normal distribution and not Poisson since yards is not binomial like TDs.