Benford’s Law in the NFL

Posted by Doug on November 30, 2006

Benford's Law is a fascinating bit of mathematical trivia that has nothing to do with football. Yesterday's post was superficially related to it, so I'm using that as an excuse to introduce it to those of you who haven't seen it before.

Yesterday's post was about the yards that get rounded out of a player's fantasy point total in a lot of leagues. The number of yards a player loses to rounding depends on the last digit of his rushing yardage total for each game. In the comments, someone asked whether the distribution of final digits of rushing totals is uniform leaguewide. Well, it doesn't appear to be exactly uniform, but it's pretty close.


Final digit    Freq     PCT
===========================
0              1521   0.077
1              2334   0.118
2              2472   0.125
3              2223   0.113
4              2123   0.108
5              1994   0.101
6              1883   0.095
7              1838   0.093
8              1703   0.086
9              1644   0.083
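Under one common kind of scoring rule, the link between the final digit and lost yards is simple arithmetic. The Python sketch below assumes one fantasy point per 10 rushing yards with the fraction dropped (a hypothetical rule chosen for illustration; leagues differ), so a non-negative total loses exactly its final digit. It turns the counts in the table into the average number of yards lost per game and compares that with the 4.5 yards a perfectly uniform final digit would give. The function names are illustrative, not from the original analysis.

```python
from collections import Counter

# Final-digit counts copied from the table above (digit -> number of games).
FINAL_DIGIT_COUNTS = {
    0: 1521, 1: 2334, 2: 2472, 3: 2223, 4: 2123,
    5: 1994, 6: 1883, 7: 1838, 8: 1703, 9: 1644,
}


def final_digit_counts(yard_totals):
    """Tally the last digit of each game's rushing total.

    Assumption: negative totals are tallied by absolute value; how the
    original data handled them is not stated.
    """
    return Counter(abs(y) % 10 for y in yard_totals)


def mean_yards_lost(counts):
    """Average yards dropped per game under the assumed rule of
    1 point per 10 yards, truncated: 87 yards scores 8 points and
    loses 7 yards, so the yards lost equal the final digit."""
    games = sum(counts.values())
    return sum(digit * n for digit, n in counts.items()) / games


observed = mean_yards_lost(FINAL_DIGIT_COUNTS)
uniform = sum(range(10)) / 10  # 4.5 if every final digit were equally likely
print(f"observed {observed:.2f} yards/game vs. {uniform:.2f} if uniform")
# prints: observed 4.31 yards/game vs. 4.50 if uniform
```

With these counts the average is a bit under the uniform 4.5, because final digits 1 through 4 show up somewhat more often than 6 through 9.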

Now, a very different thing happens if you take a look at the first digits of rushing totals:


First digit    Freq     PCT
===========================
1              5982   0.303
2              3146   0.159
3              2229   0.113
4              1923   0.097
5              1712   0.087
6              1439   0.073
7              1218   0.062
8              1133   0.057
9               953   0.048

Now that's clearly not uniform; in fact, it's far from it. And it's just what you'd expect if you know about Benford's Law. Here is the Wikipedia description:

Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is 1 almost one-third of the time, and further, larger numbers occur as the leading digit with less and less frequency as they grow in magnitude, to the point that 9 is the leading digit less than one time in twenty.
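That description has a standard closed form: the share of numbers whose leading digit is d should be log10(1 + 1/d), which is about 0.301 for d = 1 and about 0.046 for d = 9. As an illustrative sketch (the names are hypothetical), the Python below lines that formula up against the first-digit counts from the rushing-yardage table.

```python
import math

# First-digit counts of single-game rushing totals, copied from the table above.
FIRST_DIGIT_COUNTS = {
    1: 5982, 2: 3146, 3: 2229, 4: 1923, 5: 1712,
    6: 1439, 7: 1218, 8: 1133, 9: 953,
}


def benford_share(d):
    """Benford's Law: P(leading digit = d) = log10(1 + 1/d), for d = 1..9."""
    return math.log10(1 + 1 / d)


def compare(counts):
    """Return (digit, observed share, Benford share) for each digit 1..9."""
    total = sum(counts.values())
    return [(d, counts.get(d, 0) / total, benford_share(d)) for d in range(1, 10)]


for d, obs, pred in compare(FIRST_DIGIT_COUNTS):
    print(f"{d}  observed {obs:.3f}  Benford {pred:.3f}")
```

Because the terms telescope (log10 2 + log10 3/2 + ... + log10 10/9 = log10 10), the nine predicted shares sum to exactly 1. On these counts the largest gap is at digit 2, where the table shows 0.159 against a predicted 0.176.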

That's almost exactly what we see with the NFL rushing data. Now you may be thinking at this point that the distribution of leading digits in NFL rushing yardage is an artifact of the game itself. People rush for between 100 and 199 yards all the time, but hardly ever between 200 and 299. Maybe that's the explanation. But maybe not. Benford's Law is pretty pervasive. It applies to populations of cities and countries, to lengths of rivers, to stock prices, and even to the collection of numbers (from whatever source) that appear on the front page of the newspaper over a long period of time. In a whole lot of real-life data sets, you'll find numbers with a leading digit of 1 much, much more often than numbers with a leading digit of 9.

You won't find the same pattern in all sets of data. If we did this with yards per rush instead of yardage totals, we would not get a similar distribution. If we looked at the heights of NFL players, the distribution of first digits would not follow Benford's Law. But it is remarkable that it applies to so many data sets including, at least roughly, rushing yardage totals.

Now let's investigate whether the Benford phenomenon, as observed in this case, is merely an artifact of the structure of NFL football games.

What if we measured rushing totals in feet instead of yards? LaDainian Tomlinson gained 327 feet rushing last week, Rudi Johnson notched 192 feet, and so on. Here is what the leading digits look like:


First digit    Freq     PCT
===========================
1              5600   0.284
2              3539   0.179
3              3603   0.183
4              1366   0.069
5              1013   0.051
6              2012   0.102
7               587   0.030
8               547   0.028
9              1468   0.074

Not exactly the same pattern, but still far from uniform and still skewed in essentially the same way. Did you know that Rudi Johnson rushed for 5852 centimeters last week? Here is the distribution of leading digits of rushing yardage totals measured in centimeters:


First digit    Freq     PCT
===========================
1              5780   0.293
2              2901   0.147
3              2193   0.111
4              1886   0.096
5              1602   0.081
6              1348   0.068
7              1211   0.061
8              1012   0.051
9              1802   0.091

Rudi also rushed for 0.0364 miles last week (that counts as a leading digit of 3, not 0). Here is the distribution of leading digits for the "rushing miles" totals of all games played in the NFL since 1995:


First digit    Freq     PCT
===========================
1              5537   0.281
2              3442   0.174
3              2778   0.141
4              1591   0.081
5              2664   0.135
6              1398   0.071
7              1053   0.053
8               534   0.027
9               738   0.037
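Changing units just multiplies every total by a constant, so the experiment is easy to sketch in Python. The one subtlety is the one noted for the miles column: for a value below 1, such as 0.0364, the leading digit is the first nonzero digit, and writing the number in scientific notation handles that. The conversion factors are the exact definitions of the international yard (3 feet, 91.44 cm, 1/1760 mile). How the original analysis treated zero or negative totals isn't stated, so skipping zeros and using absolute values are assumptions here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional

# Exact conversion factors from yards (international yard definition).
YARDS_TO = {"yards": 1.0, "feet": 3.0, "cm": 91.44, "miles": 1 / 1760}


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of |x|, or None for zero.

    Scientific notation skips leading zeros (0.0364 -> '3.640000e-02'),
    and rounding to 7 significant digits hides float noise such as
    5852.159999... for 64 yards in centimeters.
    """
    if x == 0:
        return None
    return int(f"{abs(x):.6e}"[0])


def leading_digit_shares(yard_totals, unit="yards"):
    """Share of games with each leading digit 1..9 after converting units.

    Assumption: zero-yard games have no leading digit and are skipped.
    """
    factor = YARDS_TO[unit]
    digits = Counter(leading_digit(y * factor) for y in yard_totals)
    digits.pop(None, None)  # drop the zero-yard games
    total = sum(digits.values())
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: digits[d] / total for d in range(1, 10)}


# Rudi Johnson's 64 yards (192 feet / 3) expressed in each unit.
for unit in ("yards", "feet", "cm", "miles"):
    value = 64 * YARDS_TO[unit]
    print(f"{unit:>5}: {value:.4g} -> leading digit {leading_digit(value)}")
```

For those 64 yards this gives leading digits 6, 1, 5 and 3 in yards, feet, centimeters and miles, consistent with the 192 feet, 5852 centimeters and 0.0364 miles quoted above. Running `leading_digit_shares` over a full list of game totals with `unit="feet"`, `"cm"` or `"miles"` would be one way to build tables like these, assuming the same set of games.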

So it really doesn't have much to do with the fact that 100-199 yards is a more common total than 200-299, or anything like that. If that were the cause of the distribution of leading digits, then the pattern would likely disappear if we measured in some other units.

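And that's actually the key to why Benford's Law works. For data sets that have units, the distribution of leading digits has to be (subject to a few caveats) one that is invariant to changes of units: nothing about rivers or rushing totals knows whether we measure in yards, feet, or centimeters, so a pattern that showed up in one unit but not in another would be an accident of our bookkeeping. It just so happens that the Benford distribution has that property, and it turns out to be the only leading-digit distribution that does.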
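A quick simulation makes the unit-invariance point concrete. This Python sketch uses synthetic data, not NFL totals, drawn from two hypothetical distributions: one whose logarithm is uniform across exactly four powers of ten, which has Benford leading digits by construction, and one spread uniformly between 1 and 10, which has roughly equal leading digits. It then rescales each by 3 (yards to feet) and by 91.44 (yards to centimeters) and measures how far the leading-digit shares move. The seed, sample size and function names are arbitrary choices for illustration.

```python
import math
import random


def leading_digit(x):
    """First significant digit of a positive number; scientific notation skips leading zeros."""
    return int(f"{x:.6e}"[0])


def shares(values):
    """Fraction of the values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for v in values:
        counts[leading_digit(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]


def max_gap(a, b):
    """Largest absolute difference between two lists of shares."""
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(2006)  # fixed seed so the run is repeatable
n = 100_000
# Synthetic, not NFL data. Log-uniform over exactly four powers of ten
# (1 to 10,000) has Benford leading digits by construction.
log_uniform = [10 ** rng.uniform(0, 4) for _ in range(n)]
# Uniform on [1, 10): every leading digit is about equally likely.
flat = [rng.uniform(1, 10) for _ in range(n)]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
flat_base = shares(flat)
for factor in (3, 91.44):  # yards -> feet, yards -> centimeters
    lu_gap = max_gap(shares([v * factor for v in log_uniform]), benford)
    flat_gap = max_gap(shares([v * factor for v in flat]), flat_base)
    print(f"x{factor}: log-uniform vs Benford {lu_gap:.3f}, flat vs unscaled {flat_gap:.3f}")
```

Rescaling leaves the log-uniform shares within sampling noise of Benford's, while the flat data set's shares shift noticeably. Multiplying by 3, for instance, moves the flat values into the range 3 to 30, where leading digits 1 and 2 each take about 37 percent. That is the sense in which a digit pattern that isn't invariant to scaling would depend on the choice of units.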

If you find this interesting, the Wikipedia write-up cited above has more information. If you want something more hardcore, check out the MathWorld entry.

This entry was posted on Thursday, November 30th, 2006 at 5:05 am and is filed under Statgeekery.