Regression to the Mean

Posted by Chase Stuart on September 5, 2007

Whenever a player has an incredible season, people should expect lower results the following year due to regression to the mean. While most of the readers of this blog are certainly familiar with the concept, I thought I'd spend a few minutes today discussing exactly why it occurs. You also might want to check out how "regression to the mean" plays a part in the Madden Curse.

Let's pretend there are 10 WRs in the league, and break the NFL season up into three parts (ignoring that 16 doesn't divide by 3 evenly). Start with just one of them. Suppose a stud WR will gain 600 yards in a third of a season 1/3 of the time, 500 yards 1/3 of the time, and 400 yards 1/3 of the time. Obviously, his expected number of receiving yards in a given season is 1500. But equally obvious is that occasionally he'll get as many as 1800 yards in a season or as few as 1200 yards, purely based on luck. In fact, he can have 27 different possible seasons, assuming the order in which he gains his yards matters:


Part1 Part2 Part3 Total
600 600 600 1800
500 600 600 1700
600 500 600 1700
600 600 500 1700
400 600 600 1600
500 500 600 1600
500 600 500 1600
600 400 600 1600
600 500 500 1600
600 600 400 1600
400 500 600 1500
400 600 500 1500
500 400 600 1500
500 500 500 1500
500 600 400 1500
600 400 500 1500
600 500 400 1500
400 400 600 1400
400 500 500 1400
400 600 400 1400
500 400 500 1400
500 500 400 1400
600 400 400 1400
400 400 500 1300
400 500 400 1300
500 400 400 1300
400 400 400 1200

So how often will each season-ending total occur (assuming the order in which he gained his yards doesn't matter)?


1800 1/27
1700 3/27
1600 6/27
1500 7/27
1400 6/27
1300 3/27
1200 1/27

The mode, median and mean of this distribution are all 1500 receiving yards. About 70% of the time (19 times out of 27), our fictional WR will end up with between 1400 and 1600 yards. Once every 27 times, though, we'll see a crazy high result, and once every 27 times a crazy low one.
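For anyone who'd rather let a computer do the counting, here's a minimal Python sketch of the same exercise. It assumes exactly what the example assumes (three independent thirds, each equally likely to be 600, 500 or 400 yards); the variable names are mine and purely illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import mean, median

# Equally likely yardage outcomes for one third of a season (from the example).
THIRD_OUTCOMES = (600, 500, 400)

# Every ordered combination of three thirds: 3 * 3 * 3 = 27 equally likely seasons.
season_totals = [sum(parts) for parts in product(THIRD_OUTCOMES, repeat=3)]

# How many of the 27 seasons produce each total.
ways = Counter(season_totals)
for total in sorted(ways, reverse=True):
    print(f"{total} {ways[total]}/27")

# Share of seasons that land between 1400 and 1600 yards, inclusive.
share_1400_1600 = Fraction(
    sum(1 for total in season_totals if 1400 <= total <= 1600),
    len(season_totals),
)
print("mean:", mean(season_totals), "median:", median(season_totals))
print("share between 1400 and 1600:", share_1400_1600)
```

The printed counts should line up with the 1/3/6/7/6/3/1 pattern in the table above.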

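Let's expand our field a bit to 10 WRs. Each WR has three possible outputs for a third of a season (a high, a middle and a low number), and each of the three has a one-in-three chance of occurring in any given third. The last column is his expected season total.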


WR High Mid Low Expected
WR1 600 500 400 1500
WR2 575 475 375 1425
WR3 550 450 350 1350
WR4 525 425 325 1275
WR5 500 400 300 1200
WR6 475 375 275 1125
WR7 450 350 250 1050
WR8 425 325 225 975
WR9 400 300 200 900
WR10 375 275 175 825

What sort of season-ending outputs should we expect from this group?


rate/27 1 3 6 7 6 3 1
WR1 1800 1700 1600 1500 1400 1300 1200
WR2 1725 1625 1525 1425 1325 1225 1125
WR3 1650 1550 1450 1350 1250 1150 1050
WR4 1575 1475 1375 1275 1175 1075 975
WR5 1500 1400 1300 1200 1100 1000 900
WR6 1425 1325 1225 1125 1025 925 825
WR7 1350 1250 1150 1050 950 850 750
WR8 1275 1175 1075 975 875 775 675
WR9 1200 1100 1000 900 800 700 600
WR10 1125 1025 925 825 725 625 525

So WR4 should be expected to get 1375 yards or more 10 out of 27 times (Do you see why?). And WR10 should be expected to get just 525 yards once every 27 seasons, but also 1125 yards once every 27 seasons. And so on.
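If the "10 out of 27" isn't jumping out at you, here's a short Python sketch that checks it by brute force. The helper `season_distribution` is a hypothetical name of my own; it just walks through all 27 equally likely seasons for a WR, given his high, middle and low numbers from the table above.

```python
from fractions import Fraction
from itertools import product


def season_distribution(high, mid, low):
    """Map each possible season total to its probability, assuming three
    independent thirds that are each equally likely to be high, mid or low."""
    dist = {}
    for parts in product((high, mid, low), repeat=3):
        total = sum(parts)
        dist[total] = dist.get(total, 0) + Fraction(1, 27)
    return dist


# WR4's per-third numbers from the table: 525 / 425 / 325.
wr4 = season_distribution(525, 425, 325)
p_wr4_1375_plus = sum(p for total, p in wr4.items() if total >= 1375)
print("WR4 at 1375 or more:", p_wr4_1375_plus)

# WR10's per-third numbers from the table: 375 / 275 / 175.
wr10 = season_distribution(375, 275, 175)
print("WR10 at 525:", wr10[525], "and at 1125:", wr10[1125])
```

The 10 comes from adding the 1 + 3 + 6 ways WR4 can reach 1575, 1475 or 1375 yards.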

What's it all mean? With our 10 WRs, what sort of results should we expect in a given season? How many times will a WR get between 1600 and 1699 yards? How many times between 1000-1099? What range should we see the most of? The table below answers all those questions.


Yards Times in a season
1800+ 0.04
1700-1799 0.15
1600-1699 0.37
1500-1599 0.67
1400-1499 0.96
1300-1399 1.19
1200-1299 1.30
1100-1199 1.33
1000-1099 1.30
900-999 1.15
800-899 0.85
700-799 0.48
600-699 0.19
500-599 0.04

Only once every 27 seasons played should we see a WR get 1800 yards, because there's only one WR that can even reach 1800 (WR1) and he only does it once every 27 years. But we should see 1500 hit a few more times; WR1 will land in the 1500s seven times, WR2 six times, WR3 three times, and WR4 and WR5 one time each.
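Here's a rough Python sketch of how that table can be rebuilt. It assumes the same setup as above: each WR's season lands 300, 200, 100 or 0 yards above or below his expected total, in 1, 3, 6, 7, 6, 3 and 1 of 27 equally likely ways. The names `WAYS`, `EXPECTED_TOTALS` and `expected_count` are just labels I made up for the sketch.

```python
from collections import defaultdict
from fractions import Fraction

# Swing from a WR's expected total -> number of ways (out of 27) it happens.
WAYS = {300: 1, 200: 3, 100: 6, 0: 7, -100: 6, -200: 3, -300: 1}

# Expected season totals for WR1 through WR10: 1500, 1425, ..., 825.
EXPECTED_TOTALS = [1500 - 75 * i for i in range(10)]

# Expected number of WRs per season that land in each 100-yard bin.
expected_count = defaultdict(Fraction)
for expected in EXPECTED_TOTALS:
    for swing, ways in WAYS.items():
        bin_floor = (expected + swing) // 100 * 100
        expected_count[bin_floor] += Fraction(ways, 27)

for bin_floor in sorted(expected_count, reverse=True):
    count = float(expected_count[bin_floor])
    print(f"{bin_floor}-{bin_floor + 99}: {count:.2f}")
```

As a sanity check, the counts across all bins add up to 10, one for each WR in our little league.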

Now we get to the point of today's post. Assume that these players don't age from year to year, and their situations don't change one bit. We'd project the same thing for them every single year.

Well what happens the year we see a WR (WR1, of course) hit 1800 yards? We'd project 1500 yards for him the next year. What about when we see a WR end up with 1500-1599 yards? Well seven times it will be WR1, and we'd project exactly 1500 yards again. Six times it will be WR2 and we'd project 1425 yards the next year, three times it will be WR3 and we'd project 1350 yards, once it will be WR4 and we'd project 1275, and once it will be WR5 and we'd project 1200. In Year N, the 18 WR seasons that landed in the 1500-1599 yard range averaged 1521 yards. In Year N+1, we'd project a weighted average of 1421 yards. Remember, absolutely nothing changed in between the two seasons, yet we've reduced our projection for the WRs by a full 100 yards.
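The weighted averages are easy to check in a few lines of Python. This is only a sketch under the toy assumptions above (the same ten WRs, the same 1/3/6/7/6/3/1 pattern out of 27, and no change at all between seasons); the variable names are mine.

```python
# Swing from a WR's expected total -> number of ways (out of 27) it happens.
WAYS = {300: 1, 200: 3, 100: 6, 0: 7, -100: 6, -200: 3, -300: 1}

# Expected season totals for WR1 through WR10: 1500, 1425, ..., 825.
EXPECTED_TOTALS = [1500 - 75 * i for i in range(10)]

# Collect every (ways, observed total, true expected total) that lands in
# the 1500-1599 range in Year N.
in_range = []
for expected in EXPECTED_TOTALS:
    for swing, ways in WAYS.items():
        observed = expected + swing
        if 1500 <= observed <= 1599:
            in_range.append((ways, observed, expected))

total_ways = sum(ways for ways, _, _ in in_range)

# Year N: what those seasons actually averaged.
year_n_avg = sum(ways * observed for ways, observed, _ in in_range) / total_ways

# Year N+1: the best projection is each WR's true expected total.
year_n1_projection = sum(ways * expected for ways, _, expected in in_range) / total_ways

print(f"{total_ways} seasons, Year N avg {year_n_avg:.0f}, "
      f"Year N+1 projection {year_n1_projection:.0f}")
```

Changing the `1500 <= observed <= 1599` test to another range lets you see how the size of the drop depends on how extreme the Year N result was.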

The reasoning behind "regression to the mean" is iron-clad: when an impressive feat is hit, there's a good bit of luck involved. Sometimes, it's hit by someone who is actually as good as his stats (although this becomes less likely the more impressive the feat is). But other times it's by a player who is a little lucky, and sometimes it's by a player who's really lucky.

Now NFL players aren't computer programs or dice, but the same theory applies. And we see these results every year in the NFL. No one projects LaDainian Tomlinson to rush for 28 TDs again, because we know his true ability isn't 28 TDs per season. To reach such a ridiculous result, a good bit of luck had to be involved. And regression to the mean is even stronger in the NFL than when flipping a coin, because of strength of schedule. Many impressive feats involve general luck, and also luck due to facing an easy schedule. Every year, some team plays the easiest schedule in the league, and as a result, will achieve results they couldn't normally achieve without a ton of luck. But since strength of schedule is incredibly inconsistent from year to year, we see this effect ride on top of regression to the mean to push down the great seasons. Because if you're going to throw for 49 TDs in a season, you've got to: a) be awesome; b) have lots of luck; and c) have a really easy schedule. And only one of those traits is likely to be there the next season.

Enough theory...let's look at some real life results.

The table below includes all WRs since 1960, with the 1960-1961, 1977-1978, 1981-1982, 1982-1983, 1986-1987, 1987-1988 and 2006-2007 season pairs excluded due to changes in the league schedule (and since the 2007 season hasn't been played yet). All players that did not play in Year N+1 were excluded, and all players that played for multiple teams had their yardage from all teams combined.

#WR Year N tier N Avg N+1 Avg
22 1500+ 1625 1164
25 1400-1499 1437 1137
57 1300-1399 1344 997
67 1200-1299 1244 1025
122 1100-1199 1146 900
155 1000-1099 1046 845
168 900-999 948 798
242 800-899 844 724
312 700-799 748 645
300 600-699 650 539
327 500-599 549 479
346 400-499 449 453
354 300-399 351 374
419 200-299 249 331
438 100-199 148 240
932 1-99 33 159

If we look from just 1990 to now, the list doesn't change too much:


#WR Year N tier N Avg N+1 Avg
18 1500+ 1628 1226
18 1400-1499 1439 1132
41 1300-1399 1342 963
42 1200-1299 1244 1083
72 1100-1199 1148 950
89 1000-1099 1045 823
94 900-999 948 788
114 800-899 839 743
144 700-799 750 674
134 600-699 652 537
163 500-599 549 503
172 400-499 449 486
154 300-399 349 412
201 200-299 248 345
220 100-199 148 234
500 1-99 33 144

Obviously some of the decline is due to injury; perhaps even most of the decline. Let's see if we can remove that from the equation. All players that played fewer than 12 games in Year N or Year N+1 were excluded. All players that changed teams were excluded. Finally, we'll only look from 1990 to last season, and we'll use receiving yards per game instead of receiving yards:


#WR Year N tier N Avg N+1 Avg
12 100+ 105.0 81.7
14 90-99.9 94.3 79.1
49 80-89.9 84.4 72.2
81 70-79.9 74.4 69.5
120 60-69.9 64.8 59.7
130 50-59.9 54.7 52.1
138 40-49.9 45.4 48.1
143 30-39.9 34.8 38.8
111 20-29.9 25.3 33.9
133 10-19.9 14.9 24.6
163 0.0-9.9 3.9 11.8

While the N+1 data resembles the Year N data a little more closely, there's still a very large gap. And some, if not most, of that gap can be explained by regression to the mean. Of the 14 players that averaged 98+ receiving yards per game in a season the past 16 years, none of them averaged even 90 yards per game the next year. And only two of the other 12 WRs to average 90+ yards per game in a season hit the 90-yard mark the following year. No one likes to attribute incredible success to luck, but it plays a much bigger role in sports than we tend to remember.
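To make the size of that gap easier to see, here's a small Python sketch that takes the yards-per-game table above and computes the change from Year N to Year N+1 for each tier. The numbers are copied straight from the table; the names (`PER_GAME_TIERS`, `changes`) are my own, and the arithmetic can't tell us how much of each change is luck versus injuries, age or anything else.

```python
# (tier, number of WRs, Year N avg, Year N+1 avg), copied from the table above.
PER_GAME_TIERS = [
    ("100+", 12, 105.0, 81.7),
    ("90-99.9", 14, 94.3, 79.1),
    ("80-89.9", 49, 84.4, 72.2),
    ("70-79.9", 81, 74.4, 69.5),
    ("60-69.9", 120, 64.8, 59.7),
    ("50-59.9", 130, 54.7, 52.1),
    ("40-49.9", 138, 45.4, 48.1),
    ("30-39.9", 143, 34.8, 38.8),
    ("20-29.9", 111, 25.3, 33.9),
    ("10-19.9", 133, 14.9, 24.6),
    ("0.0-9.9", 163, 3.9, 11.8),
]

# Change in yards per game from Year N to Year N+1, by tier.
changes = {tier: avg_n1 - avg_n for tier, _, avg_n, avg_n1 in PER_GAME_TIERS}

for tier, change in changes.items():
    print(f"{tier:>8}: {change:+.1f} yards per game")

# Tiers whose average fell the next year, and tiers whose average rose.
fell = [tier for tier, change in changes.items() if change < 0]
rose = [tier for tier, change in changes.items() if change > 0]
print("fell:", fell)
print("rose:", rose)
```

The top tiers fall the most and the bottom tiers rise, so both ends move toward the middle, which is the pattern we'd expect if luck played a part in the extreme seasons.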

This entry was posted on Wednesday, September 5th, 2007 at 1:16 am and is filed under General.