# Pro Football Reference Blog

## Regression to the Mean

Posted by Chase Stuart on September 5, 2007

Whenever a player has an incredible season, people should expect lower results the following year due to regression to the mean. While most of the readers of this blog are certainly familiar with the concept, I thought I'd spend a few minutes today discussing exactly why it occurs. You also might want to check out how "regression to the mean" plays a part in the Madden Curse.

Let's pretend there are 10 WRs in the league, and break the NFL season up into three parts (ignoring that 16 doesn't divide by 3 evenly). Start with just one of them. Suppose a stud WR will gain 600 yards in a third of a season 1/3 of the time, 500 yards in a third of a season 1/3 of the time, and 400 yards in a third of a season 1/3 of the time. Obviously, his expected number of receiving yards in a given season is 1500. But equally obvious is that occasionally he'll get as many as 1800 yards in a season or as few as 1200 yards, purely based on luck. In fact, there are 27 different ways his season can play out, assuming the order in which he gains his yards matters:

```
Part I	Part II	Part III	Total
600	600	600	1800
500	600	600	1700
600	500	600	1700
600	600	500	1700
400	600	600	1600
500	500	600	1600
500	600	500	1600
600	400	600	1600
600	500	500	1600
600	600	400	1600
400	500	600	1500
400	600	500	1500
500	400	600	1500
500	500	500	1500
500	600	400	1500
600	400	500	1500
600	500	400	1500
400	400	600	1400
400	500	500	1400
400	600	400	1400
500	400	500	1400
500	500	400	1400
600	400	400	1400
400	400	500	1300
400	500	400	1300
500	400	400	1300
400	400	400	1200
```

So how often will each season-ending total occur (assuming the order in which he gained his yards doesn't matter)?

```
1800	1/27
1700	3/27
1600	6/27
1500	7/27
1400	6/27
1300	3/27
1200	1/27
```

The mode, median and mean of this distribution are all 1500 receiving yards. About 70% of the time (19 times out of 27), our fictional WR will end up with between 1400 and 1600 yards. Once every 27 times, though, we'll see a crazy result in each direction.
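If you'd rather let a computer do the counting, here is a minimal Python sketch of the enumeration. The names (`THIRD_OUTCOMES`, `total_counts`) are made up for illustration; the only inputs are the three equally likely outputs from the example above.

```python
from collections import Counter
from itertools import product

# The three equally likely outputs for one third of a season (from the example).
THIRD_OUTCOMES = (600, 500, 400)

# Every ordered (Part I, Part II, Part III) combination: 3 * 3 * 3 = 27 seasons.
seasons = list(product(THIRD_OUTCOMES, repeat=3))

# Count how many of the 27 seasons produce each season-ending total.
total_counts = Counter(sum(season) for season in seasons)

for total in sorted(total_counts, reverse=True):
    print(f"{total}\t{total_counts[total]}/{len(seasons)}")

# How many of the 27 seasons finish between 1400 and 1600 yards, inclusive?
middle = sum(n for total, n in total_counts.items() if 1400 <= total <= 1600)
print(f"1400-1600: {middle}/{len(seasons)}")
```

The middle three totals account for 6 + 7 + 6 = 19 of the 27 seasons.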

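Let's expand our field a bit to 10 WRs. The table below lists each WR's three possible outputs for a third of a season, along with his expected season total. Each WR has a one-in-three chance of any of his three numbers occurring in any of the three thirds of a season.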

```
WR	High	Mid	Low	Expected total
WR1	600	500	400	1500
WR2	575	475	375	1425
WR3	550	450	350	1350
WR4	525	425	325	1275
WR5	500	400	300	1200
WR6	475	375	275	1125
WR7	450	350	250	1050
WR8	425	325	225	 975
WR9	400	300	200	 900
WR10	375	275	175	 825
```

What sort of season-ending outputs should we expect from this group?

```
rate/27   1	  3	  6	  7	  6	  3	  1
WR1	1800	1700	1600	1500	1400	1300	1200
WR2	1725	1625	1525	1425	1325	1225	1125
WR3	1650	1550	1450	1350	1250	1150	1050
WR4	1575	1475	1375	1275	1175	1075	 975
WR5	1500	1400	1300	1200	1100	1000	 900
WR6	1425	1325	1225	1125	1025	 925	 825
WR7	1350	1250	1150	1050	 950	 850	 750
WR8	1275	1175	1075	 975	 875	 775	 675
WR9	1200	1100	1000	 900	 800	 700	 600
WR10	1125	1025	 925	 825	 725	 625	 525
```

So WR4 should be expected to get 1375 yards or more 10 out of 27 times (do you see why?). And WR10 should be expected to get just 525 yards once every 27 seasons, but also 1125 yards once. And so on.
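For anyone who wants to check the "do you see why?", here is a small Python sketch. It assumes what the table shows: WR1's thirds are 600/500/400 and each WR further down the list is 25 yards lower in every third. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def third_outcomes(rank):
    """Three equally likely outputs per third for WR<rank> (WR1 = 600/500/400)."""
    return tuple(base - 25 * (rank - 1) for base in (600, 500, 400))


def season_totals(rank):
    """Map each season total to how many of the 27 equally likely seasons hit it."""
    return Counter(sum(s) for s in product(third_outcomes(rank), repeat=3))


wr4 = season_totals(4)
at_least_1375 = sum(n for total, n in wr4.items() if total >= 1375)
print(f"WR4 at 1375 or more: {at_least_1375}/27")

wr10 = season_totals(10)
print(f"WR10 at 525: {wr10[525]}/27, WR10 at 1125: {wr10[1125]}/27")
```

WR4's three best totals (1575, 1475 and 1375) come up 1, 3 and 6 times out of 27, which is where the 10 comes from.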

What's it all mean? With our 10 WRs, what sort of results should we expect in a given season? How many times will a WR get between 1600 and 1699 yards? How many times between 1000-1099? What range should we see the most of? The table below answers all those questions.

```
Yards		Times in a season
1800+		 0.04
1700-1799	 0.15
1600-1699	 0.37
1500-1599	 0.67
1400-1499	 0.96
1300-1399	 1.19
1200-1299	 1.30
1100-1199	 1.33
1000-1099	 1.30
900-999		 1.15
800-899		 0.85
700-799		 0.48
600-699		 0.19
500-599		 0.04
```

Only once every 27 seasons played should we see a WR get 1800 yards, because there's only one WR that can even reach 1800 (WR1) and he only does it once every 27 years. But we should see the 1500s hit far more often: in every 27 seasons, WR1 will land in the 1500s seven times, WR2 six times, WR3 three times, and WR4 and WR5 one time each. That's 18 times in 27 seasons, or 0.67 times per season.
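The "times in a season" numbers can be rebuilt with a few lines of Python. This is just a sketch under the same assumptions as the tables (ten WRs, each 25 yards per third below the one above him, every third an independent one-in-three draw); `bin_counts` is an illustrative name.

```python
from collections import Counter
from itertools import product

# Lower edge of each 100-yard bin -> number of (WR, season) combinations in it.
# Each WR contributes 27 equally likely seasons, so dividing by 27 gives the
# expected number of WRs landing in that bin in a single season.
bin_counts = Counter()
for rank in range(1, 11):
    outcomes = [base - 25 * (rank - 1) for base in (600, 500, 400)]
    for season in product(outcomes, repeat=3):
        bin_counts[sum(season) // 100 * 100] += 1

for low in sorted(bin_counts, reverse=True):
    # The top bin prints as 1800-1899; the table above labels it "1800+".
    print(f"{low}-{low + 99}\t{bin_counts[low] / 27:.2f}")
```

The 1500-1599 bin, for example, collects 7 + 6 + 3 + 1 + 1 = 18 of the 270 combinations, and 18/27 is about 0.67.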

Now we get to the point of today's post. Assume that these players don't age from year to year, and their situations don't change one bit. We'd project the same thing for them every single year.

Well what happens the year we see a WR (WR1, of course) hit 1800 yards? We'd project 1500 yards for him the next year. What about when we see a WR end up with 1500-1599 yards? Well, out of every 18 such seasons, seven times it will be WR1, and we'd project exactly 1500 yards again. Six times it will be WR2 and we'd project 1425 yards the next year, three times (WR3) we'd project 1350 yards, once (WR4) we'd project 1275 and once (WR5) we'd project 1200. In Year N, the 18 WR seasons that landed in the 1500-1599 yard range averaged 1521 yards. In Year N+1, we'd project a weighted average of 1421 yards. Remember, absolutely nothing changed in between the two seasons, yet we've reduced our projection for the WRs by a full 100 yards.
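Here is the same arithmetic as a short Python sketch, again assuming the ten hypothetical WRs from the tables above. The Year N+1 projection for each WR is simply his expected season total, since nothing about him changes. Variable names are illustrative.

```python
from itertools import product

observed = []   # Year N totals that landed in the 1500-1599 range
projected = []  # the same WR's expected total, i.e. our Year N+1 projection

for rank in range(1, 11):
    outcomes = [base - 25 * (rank - 1) for base in (600, 500, 400)]
    # Three thirds times the average third equals the sum of the three outputs.
    expected_total = sum(outcomes)
    for season in product(outcomes, repeat=3):
        total = sum(season)
        if 1500 <= total <= 1599:
            observed.append(total)
            projected.append(expected_total)

year_n_avg = sum(observed) / len(observed)
year_n1_avg = sum(projected) / len(projected)

print(f"seasons in range: {len(observed)}")
print(f"Year N average:   {year_n_avg:.0f}")
print(f"Year N+1 average: {year_n1_avg:.0f}")
print(f"drop:             {year_n_avg - year_n1_avg:.0f}")
```

Written out by hand, the Year N average is (7 x 1500 + 6 x 1525 + 3 x 1550 + 1575 + 1500) / 18 and the Year N+1 projection is (7 x 1500 + 6 x 1425 + 3 x 1350 + 1275 + 1200) / 18.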

The reasoning behind "regression to the mean" is iron-clad: when someone pulls off an impressive feat, there's usually a good bit of luck involved. Sometimes, it's achieved by someone who is actually as good as his stats (although this becomes less likely the more impressive the feat is). But other times it's by a player who is a little lucky, and sometimes it's by a player who's really lucky.

Now NFL players aren't computer programs or dice, but the same theory applies. And we see these results every year in the NFL. No one projects LaDainian Tomlinson to rush for 28 TDs again, because we know his true ability isn't 28 TDs per season. To reach such a ridiculous result, a good bit of luck had to be involved. And regression to the mean is even more pronounced in the NFL than when flipping a coin, because of strength of schedule. Many impressive feats involve general luck, and also luck due to facing an easy schedule. Every year, some team plays the easiest schedule in the league, and as a result, will achieve results they couldn't normally achieve without a ton of luck. But since strength of schedule is incredibly inconsistent from year to year, we see this effect ride on top of regression to the mean to push down the great seasons. Because if you're going to throw for 49 TDs in a season, you've got to: a) be awesome; b) have lots of luck; and c) have a really easy schedule. And only one of those traits is likely to be there the next season.

Enough theory...let's look at some real life results.

The table below includes all WRs since 1960, with the 1960-1961, 1977-1978, 1981-1982, 1982-1983, 1986-1987, 1987-1988 and 2006-2007 season pairs excluded due to changes in the league schedule (and since the 2007 season hasn't been played yet). All players who did not play in Year N+1 were excluded, and all players who played for multiple teams had their yardage from all teams combined.

```
#WR	Year N tier	N Avg	N+1 Avg
22	1500+		1625	1164
25	1400-1499	1437	1137
57	1300-1399	1344	 997
67	1200-1299	1244	1025
122	1100-1199	1146	 900
155	1000-1099	1046	 845
168	900-999		 948	 798
242	800-899		 844	 724
312	700-799		 748	 645
300	600-699		 650	 539
327	500-599		 549	 479
346	400-499		 449	 453
354	300-399		 351	 374
419	200-299		 249	 331
438	100-199		 148	 240
932	001-99		  33	 159
```

If we look from just 1990 to now, the list doesn't change too much:

```
#WR	Year N tier	N Avg	N+1 Avg
18	1500+		1628	1226
18	1400-1499	1439	1132
41	1300-1399	1342	 963
42	1200-1299	1244	1083
72	1100-1199	1148	 950
89	1000-1099	1045	 823
94	900-999		 948	 788
114	800-899		 839	 743
144	700-799		 750	 674
134	600-699		 652	 537
163	500-599		 549	 503
172	400-499		 449	 486
154	300-399		 349	 412
201	200-299		 248	 345
220	100-199		 148	 234
500	001-99		  33	 144
```

Obviously some of the decline is due to injury; perhaps even most of the decline. Let's see if we can remove that from the equation. All players that played fewer than 12 games in Year N or Year N+1 were excluded. All players that changed teams were excluded. Finally, we'll only look from 1990 to last season, and we'll use receiving yards per game instead of receiving yards:

```
#WR	Year N tier	N Avg	N+1 Avg
12	100+		105.0	81.7
14	90-99.9		 94.3	79.1
49	80-89.9		 84.4	72.2
81	70-79.9		 74.4	69.5
120	60-69.9		 64.8	59.7
130	50-59.9		 54.7	52.1
138	40-49.9		 45.4	48.1
143	30-39.9		 34.8	38.8
111	20-29.9		 25.3	33.9
133	10-19.9		 14.9	24.6
163	0.0-9.9		  3.9	11.8
```

While the N+1 data resembles the Year N data a little more closely, there's still a very large gap. And some, if not most, of that gap can be explained by regression to the mean. Of the 14 players who averaged 98+ receiving yards per game in a season over the past 16 years, none of them averaged even 90 yards per game the next year. And only two of the other 12 WRs to average 90+ yards per game in a season hit the 90-yard mark the following year. No one likes to attribute incredible success to luck, but it plays a much bigger role in sports than we tend to remember.
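To make the size of that gap easier to see, here is a small Python sketch that takes the yards-per-game table above and prints the change from Year N to Year N+1 for each tier. The numbers are copied straight from the table; the variable names are just for illustration.

```python
# (number of WRs, Year N tier, Year N average, Year N+1 average),
# copied from the yards-per-game table above.
rows = [
    (12, "100+", 105.0, 81.7),
    (14, "90-99.9", 94.3, 79.1),
    (49, "80-89.9", 84.4, 72.2),
    (81, "70-79.9", 74.4, 69.5),
    (120, "60-69.9", 64.8, 59.7),
    (130, "50-59.9", 54.7, 52.1),
    (138, "40-49.9", 45.4, 48.1),
    (143, "30-39.9", 34.8, 38.8),
    (111, "20-29.9", 25.3, 33.9),
    (133, "10-19.9", 14.9, 24.6),
    (163, "0.0-9.9", 3.9, 11.8),
]

# Change in yards per game from Year N to Year N+1, keyed by tier.
changes = {tier: avg_n1 - avg_n for _, tier, avg_n, avg_n1 in rows}

for _, tier, avg_n, avg_n1 in rows:
    print(f"{tier:>8}  {avg_n:6.1f} -> {avg_n1:6.1f}  ({changes[tier]:+.1f})")
```

In this table, every tier at 50 yards per game or above drops the next year, and every tier below 50 rises. The biggest moves are at the extremes.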

This entry was posted on Wednesday, September 5th, 2007 at 1:16 am and is filed under General.