## What is a correlation coefficient and what does it have to do with Ty Law?

OK, I'll warn you at the outset: this one is a little longer and a little more involved than my previous offerings. I realize that this isn't mathclass.net, and the last thing many of you want to do in your spare time is to read something technical. More than likely, in fact, you are currently surfing the web on the sly to avoid some technical reading at work.

However, you came here in search of fantasy football tips, and I think you will find some in this article. I'm going to draw some conclusions, and they will be based on statistical ideas. If I just offer the conclusions with no evidence in support of them, then I'm just some clown throwing around crazy opinions, when in actuality I'm some clown throwing around sound theoretical ideas. So in some sense I have to be at least a little technical. Besides, some of you may be interested in the details.

I've come up with a scheme to resolve this dilemma. I tried to take the most essential ideas and put them in bold. I've averaged about one bold sentence per paragraph, and I think you can just read the bold ones to get the main ideas. If one of them intrigues you, you can read the surrounding material for more explanation. This plan also saves me the hassle of writing up a conclusion section. Just glue the bold sentences together and Presto! Instant conclusion.

Please let me know if you find this either incredibly useful or incredibly annoying.

RULE #1 WHEN CHOOSING A DEFENSE FOR YOUR FANTASY TEAM: Throw last season's stats out the window.

Literally. Throw 'em out. They are useless to you, and I mean this in the most precise sense possible. I'll lead off with a few examples.

• In 1998, the Falcons surprised everyone by forcing 43 turnovers and scoring six TDs. The next year: 18 turnovers and one measly TD.
• The 1995 Cardinals forced 42 turnovers and scored five times. In 1996, they were far less impressive: 26 turnovers and one score.
• On the flip side, consider the 1998 Eagles: 17 turnovers and no scores. In 1999: 45 and five.
• 1995 Pack: 16 turnovers and one TD. The following season: 39 and four.
The above examples prove nothing except that it is possible for a team's defense to be a fantasy terror one year and a dud the next (and vice versa). The question before the house now is: how rare are examples like these?

In order to get to the bottom of this, I'm going to introduce a mathematical concept called the correlation coefficient. If you're a veteran of many a statistics course (and can still remember them), you might want to skip or skim the next few paragraphs. If you've never heard of a correlation coefficient, stay with me here, I'll try to make it as painless as possible, mixing in as many relevant examples as possible (by "relevant," I of course mean "pertaining to fantasy football").

Suppose I told you that a particular NFL running back had 250 rushing attempts last year. Would you guess that his total rushing yards were closest to 100, 1000, or 2000? Hopefully, you pulled the 1000 lever. Why? Because you know that, in general, there is a fairly predictable relationship, i.e. a strong correlation, between rushing attempts and rushing yards. The fact that there is such a correlation allows you to estimate rushing yards knowing only rushing attempts. Of course, there is a limit to how accurate your estimate can be. If I asked you to guess whether the 250-carry running back's yards were 919, 1056, or 1102, you probably wouldn't know. The correlation just isn't that strong. But it's certainly strong enough to let you know that 100 and 2000 are completely out of the question.

So anytime I have two sets of data, I can ask whether or not there is a relationship between them, and if so, how strong that relationship is. The correlation coefficient is what the statisticians conjured up to answer these questions. I won't get into specifics here, but the bottom line is that if you feed your data sets into the formula, it'll spit back out a number between -1 and 1.

• If the number is close to (positive) one, that means there is a strong positive correlation. Positive correlation means that both pieces of data should tend to increase or decrease together. The rushing attempts and rushing yards example above would have a strong positive correlation. More attempts generally means more yards. Fewer attempts means fewer yards.
• If the number is close to negative one, that means there is a strong negative correlation. This indicates that the correlation is strong, but one number tends to increase as the other decreases, and vice versa. An example of this would be to compare every NFL player's vertical leap to his weight. Generally speaking, lower weight means higher leap and vice versa. There are exceptions of course, which means the correlation isn't perfect.
• If the number is 0, that means there is no relationship at all between the two sets of data. For example, if I compared each NFL quarterback's touchdown passes with the number of pets he owns, I would expect zero correlation.
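For readers who want to peek under the hood, here is a minimal Python sketch of the standard (Pearson) correlation coefficient. I'm assuming Pearson's version is the one meant here, since it is the usual default; the attempts and yards below are made-up numbers for illustration, not real player data.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # How the two sets move together, and how much each one spreads out
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # always lands between -1 and 1

# Hypothetical backs: rushing attempts and rushing yards
attempts = [120, 180, 250, 300, 340]
yards = [450, 760, 1010, 1190, 1420]
print(round(pearson(attempts, yards), 3))  # close to +1
```

Dividing by the two spread terms is what squeezes the answer into the -1 to 1 range, no matter whether the inputs are yards, carries, or pets owned.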

One fundamental correlation you use all the time, whether you realize it or not, is that NFL players' year 2000 stats will be correlated with their 1999 stats. If you think Peyton Manning will throw for more TDs than Kordell Stewart in 2000, you are basing that prediction, at least partially, on the fact that Manning threw more TDs in 1999. Likewise, Curtis Martin will very likely rush for more yards than Larry Centers does. Why? Because he has every year in the past. Not exactly rocket science here, but the point is that a player's stats from one year correlate with his stats from the next year. The purpose of this article is to see just how strong that correlation is for various statistics and, more importantly, to attempt to use this information to our advantage when projecting players' stats.

Consider the following:

• For all the running backs in my database (complete career data for all players who were active in 1998 or later), the correlation coefficient between rushing yards in year N and rushing yards in year N+1 is .70.
• For all the running backs in the database, the correlation coefficient between receiving TDs in year N and receiving TDs in year N+1 is .23.
Remember, 1 is a very strong correlation, and 0 indicates no correlation. Hence, .70 is fairly strong. This means that running backs who rushed for a lot of yards in 1999 will be likely to rush for a lot of yards in 2000. On the other hand, .23 is a relatively weak correlation, and that means that backs with a lot of TD receptions last year will not necessarily have a lot this year.

Generally, a low year-to-year correlation for a particular statistic means that there is a large luck component involved in compiling that statistic. I'll elaborate with an example. Suppose you live in a universe where every NFL back's receiving TDs are determined by rolling a six-sided die. This would generate a year-to-year correlation of zero. In this universe, if Sammy Scatback caught 6 TD passes last year and Pete Plodder caught 1, what does that mean? It means Sammy got lucky, and Pete was unlucky. It gives us no insight into what will happen this year.

On the other hand, suppose you live in a universe where every back's receiving TD total is etched in stone forevermore. Whatever a back had last year, he is guaranteed to duplicate it this year. This would generate a year-to-year correlation of one. This time, if I tell you Sammy had 6 TD passes last year and Pete had 1, then you would assume, and correctly so, that Sammy was a more talented pass receiver than Pete, and that he would continue to catch more TD passes in the future.

Meanwhile, back in this universe, year-to-year correlation coefficients are never zero or one. They lie somewhere in the middle because both luck and skill play a role. Further, it makes sense that TDs have more to do with luck than yards do. Consider the following example. Eddie George caught four TD passes last year, which is well above average for a running back. He had 1304 rushing yards, which is also well above average. Now answer this question: how many things would have to have gone differently for George to catch a below average number of TD passes? Not many. One holding call, one different decision by the Titans' coaching staff, one different read by McNair, and one time getting pushed out of bounds at the 1 instead of hitting the pylon, and George has no TD receptions. On the other hand, how many things would have to go differently for George to be a below average back in terms of rushing yards? Lots, or one major thing. A major injury would do it, but aside from that, it would take about 50 holding calls, 100 different decisions by the Titans' staff, and so on, to bring George down under 800 or so rushing yards.

Bottom line: backs (like George) who caught a lot of TD passes last season probably did so because of a few fortunate breaks, whereas backs who rushed for a lot of yards did so because they're good. If you're good, you usually stay good, but if you're lucky, you don't usually stay lucky. So we might hypothesize that backs whose value was touchdown-dependent last year will tend to decline more, as a group, than backs whose value was more yardage-dependent.

Let's put that hypothesis to the test with some actual data. Between 1995 and 1998, there were 82 running-back seasons with more than 140 fantasy points (fantasy points being defined as (total yards)/10 + 6*(total TDs)). I chose 140 as the cutoff because that is roughly the borderline for being a legitimate fantasy starter. So I'm only considering the kinds of backs that have some fantasy impact -- Tony Carters and Carwell Gardners need not apply. I divided those 82 backs into two groups, a touchdown-dependent group and a yardage-dependent group, according to what percentage of their fantasy points came from TDs versus yardage. For example, heading the TD-dependent group was Terry Allen 1996: 46% of his fantasy point total came from TDs and 54% from yards. On the other end of the spectrum was Warrick Dunn 1998. Only 8% of his production came from scores; the other 92% came from yards. What happened to these 82 backs the following year?
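The scoring rule and the TD-dependence measure are easy to write down in code. This is a minimal Python sketch; the function names are my own, and the 1,300-yard, 14-TD season is hypothetical.

```python
FANTASY_CUTOFF = 140  # rough borderline for a legitimate fantasy starter

def rb_fantasy_points(total_yards, total_tds):
    """(total yards)/10 + 6*(total TDs), the scoring used in this article."""
    return total_yards / 10 + 6 * total_tds

def td_share(total_yards, total_tds):
    """Fraction of a back's fantasy points that came from touchdowns."""
    pts = rb_fantasy_points(total_yards, total_tds)
    return 6 * total_tds / pts if pts else 0.0  # avoid dividing by zero

def qualifies(total_yards, total_tds):
    """True if the season clears the 'more than 140 points' cutoff."""
    return rb_fantasy_points(total_yards, total_tds) > FANTASY_CUTOFF

# A hypothetical back: 1,300 total yards and 14 total TDs
print(rb_fantasy_points(1300, 14))   # 214.0
print(round(td_share(1300, 14), 2))  # 0.39
print(qualifies(1300, 14))           # True
```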

```
Group                 Improved    Declined
------------------------------------------
Touchdown-dependent      12          29
Yardage-dependent        20          21
```
So only 29% of the TD-dependent backs improved the next year, while 49% of the yardage-dependent backs improved. These numbers support the hypothesis. As usual, I would have more confidence in this result if I had more data, but it looks fairly convincing, especially since it's backed up by theory.
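The split itself can be sketched in a few lines of Python. I'm assuming each back-season is stored as a (TD share, points in year N, points in year N+1) tuple; the helper name is hypothetical, and a back whose points stayed exactly the same is counted as a decline, which is my own choice. The last lines simply redo the percentages from the table.

```python
def split_and_count(backs):
    """Rank back-seasons from most to least TD-dependent, cut the list in half,
    and count how many in each half improved the following year.

    backs: list of (td_share, points_year_n, points_year_n_plus_1) tuples.
    """
    ranked = sorted(backs, key=lambda b: b[0], reverse=True)
    half = len(ranked) // 2
    groups = {
        "Touchdown-dependent": ranked[:half],
        "Yardage-dependent": ranked[half:],
    }
    counts = {}
    for name, group in groups.items():
        improved = sum(1 for _, now, nxt in group if nxt > now)
        counts[name] = (improved, len(group) - improved)  # ties count as declines
    return counts

# Percentages implied by the table (12-29 and 20-21)
for name, (up, down) in {"Touchdown-dependent": (12, 29),
                         "Yardage-dependent": (20, 21)}.items():
    print(f"{name}: {up / (up + down):.0%} improved")  # 29%, then 49%
```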

So what does this mean for 2000? Here is a list of all backs who scored 140 or more fantasy points last year, sorted from most TD-dependent to most yardage-dependent.

```
LastName     FirstName     Pct from Pct from
                            Yards    TDs
--------------------------------------------
STEWART      JAMES           .57     .43
DAVIS        STEPHEN         .60     .40
WHEATLEY     TYRONE          .63     .37
KIRBY        TERRY           .64     .36
ALLEN        TERRY           .66     .34
SMITH        EMMITT          .66     .34
JAMES        EDGERRIN        .68     .32
ALSTOTT      MIKE            .69     .31
GEORGE       EDDIE           .69     .31
RHETT        ERRICT          .71     .29
LEVENS       DORSEY          .73     .27
BETTIS       JEROME          .74     .26
GARY         OLANDIS         .76     .24
FAULK        MARSHALL        .77     .23
WATTERS      RICKY           .79     .21
DILLON       COREY           .81     .19
ENIS         CURTIS          .81     .19
STALEY       DUCE            .81     .19
GARNER       CHARLIE         .83     .17
MARTIN       CURTIS          .85     .15
```
Draw a line between Errict Rhett and Dorsey Levens. If history repeats itself, most of the guys above the line will see their fantasy production decline in 2000, while only about half the guys below the line will decline.

I need to make a few things clear here. First, this is a very general tendency. By no means am I guaranteeing that Stephen Davis will be a bust or that Charlie Garner will have a huge season, and of course many of these guys will improve or decline based on factors unrelated to this issue (it's not hard to imagine a decline for Terry Allen or Olandis Gary, for example, that has nothing to do with TD dependence). However, there's good reason to believe that, as a group, the top guys will decline more than the bottom guys. Second, I'm not telling you not to draft Edgerrin James. Even if his fantasy production declines, he could still be the top running back in the league. This is just something else to consider when trying to decide between two guys you have rated essentially even. It might cause me to lean slightly toward Curtis Martin over James Stewart, for example, or toward Levens over Stephen Davis, or Enis over Wheatley.

Now, a complete rundown of all the correlation coefficients I calculated, followed by a summary of the conclusions we might draw from them.

### Running Backs

```
             Correlation
Stat         coefficient
------------------------
Rush Yards      .70
Rush TDs        .59
Rec. Yards      .53
Rec. TDs        .23
Fantasy Points  .64
```

### Receivers

```
             Correlation
Stat         coefficient
------------------------
Rec. Yards      .53
Rec. TDs        .45
Fantasy Points  .48
```

### Quarterbacks

```
             Correlation
Stat         coefficient
------------------------
Pass Yards      .47
Pass TDs        .46
Rush Yards      .70
Rush TDs        .40
INT             .36
Fantasy Points  .48 (based on 6 points per passing TD, 1 point per 25 passing yards)
```

### Notes:

• The fantasy points coefficient is much higher for running backs than for receivers or quarterbacks. This says that running backs are more likely to retain their value from year to year than QBs or receivers are. That is, backs are more predictable. This is intuitive to most experienced fantasy players and is one of the reasons many feel that a top-flight back should be the cornerstone of a fantasy team.
• For every position, the year-to-year correlation is stronger for yards than it is for touchdowns. This is the phenomenon discussed earlier for RBs. It seems to exist for QBs and receivers as well, but is much less pronounced.
• The correlation is fairly weak for interceptions. If your league counts them, don't let a high INT total from the previous year scare you off too much.
• An interesting fact not listed on the chart above: the correlation is stronger between rushing yards in year N and rushing TDs in year N+1 than it is between rushing TDs in year N and rushing TDs in year N+1 (although only slightly so). In other words, if you want to know how many TDs a back will score and can only have one piece of information, you'd be better off knowing last year's yards than last year's TDs.
If you've made it this far, you've probably forgotten that I started this article by talking about defense. In particular, I began with a list of defenses whose fantasy values changed drastically from one season to the next, and I wondered how rare such examples were. Check out the year-to-year correlations for defensive statistics:
```
                   Correlation
Stat               coefficient
-------------------------------
Turnovers forced      -.10
Touchdowns            -.11
Fantasy points        -.08
```
Here I'm defining fantasy points as 3*turnovers + 10*TDs, which is what one of my leagues uses. I don't have data for sacks.
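In code, this scoring rule is a one-liner. The function name is my own; the checks use the 1998 Seahawks (42 turnovers, 10 TDs) and the 1998 Eagles (17 turnovers, 0 TDs), two defenses from my database.

```python
def defense_points(turnovers, tds):
    """3 points per turnover forced + 10 per defensive TD (one league's rules)."""
    return 3 * turnovers + 10 * tds

print(defense_points(42, 10))  # 226 (1998 Seahawks)
print(defense_points(17, 0))   # 51 (1998 Eagles)
```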

This is remarkable. Not only are the correlations negligibly small, they are negative. That means that, if anything, you would predict that last year's bad defenses will be good this year, and vice versa. To make sure this plays out in practice, I looked at every defense from 1995 to 1998 and ranked them, best to worst, in terms of fantasy points. The best defense of the period was the 1998 Seahawks (42 turnovers, 10 TDs, 226 fantasy points), and the worst was the 1998 Eagles (17 turnovers, 0 TDs, 51 fantasy points). Here are the top 10 and bottom 10 defenses of the period 1995-1998, along with how they did the following year:

```
-------+
TOP 10 |
-------+------------------+--------------
FIRST YEAR                | THE NEXT YEAR
--------------------------+--------------
TM   YEAR   TO  TD   FPT  |  TO  TD   FPT
--------------------------+--------------
sea  1998   42  10   226  |  36   2   128
atl  1998   43   6   189  |  18   1    64
nor  1998   33   9   189  |  34   2   122
ari  1995   42   5   176  |  26   1    88
nyg  1997   44   4   172  |  26   3   108
sfo  1995   34   7   172  |  34   2   122
pit  1996   40   5   170  |  33   2   119
phi  1995   38   5   164  |  32   3   126
den  1997   31   7   163  |  30   2   110
cin  1996   44   3   162  |  23   1    79

----------+
BOTTOM 10 |
----------+---------------+--------------
FIRST YEAR                | THE NEXT YEAR
--------------------------+--------------
TM   YEAR   TO  TD   FPT  |  TO  TD   FPT
--------------------------+--------------
det  1998   21   1    73  |  32   4   136
den  1995   21   1    73  |  32   1   106
atl  1996   23   0    69  |  28   0    84
was  1998   22   0    66  |  38   4   154
buf  1997   22   0    66  |  31   2   113
car  1997   22   0    66  |  33   3   129
nor  1996   21   0    63  |  31   1   103
gnb  1995   16   1    58  |  39   4   157
ind  1998   19   0    57  |  21   2    83
phi  1998   17   0    51  |  45   5   185
```
The 10 killer defenses averaged 106.6 fantasy points the following year, while the 10 pathetic defenses averaged 125 fantasy points the following year. Let me restate that to make sure you've got it: the absolute bottom of the barrel godawful defenses did better the next year -- substantially better -- than the top defenses. In a typical season, a 10th ranked defense (roughly equivalent to the worst one that deserves to be a fantasy "starter") will score around 125-130 points. Only two of the top defenses reached that standard, while five of the bottom defenses did.
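A quick check of that arithmetic, using the next-year FPT column copied from the two tables above. I've taken 125 points as the "starter" line, the low end of the 125-130 range; that exact cutoff is my assumption.

```python
from statistics import fmean

# Next-year fantasy points, copied from the TOP 10 and BOTTOM 10 tables
top10_next = [128, 64, 122, 88, 108, 122, 119, 126, 110, 79]
bottom10_next = [136, 106, 84, 154, 113, 129, 103, 157, 83, 185]

STARTER_LINE = 125  # assumed points for a 10th-ranked ("starter") defense

print(fmean(top10_next), fmean(bottom10_next))        # 106.6 125.0
print(sum(p >= STARTER_LINE for p in top10_next),     # 2
      sum(p >= STARTER_LINE for p in bottom10_next))  # 5
```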

Here is a more complete breakdown of the data:

```
GROUP               AVERAGE FPT
(rank in year N)    IN YEAR N+1
-------------------------------
1-10                   106.6
11-20                  123.3
21-30                  113.6
31-40                  113.5
41-50                  113.7
51-60                  100.7
61-70                  129.3
71-80                  137.5
81-90                  104.0
91-100                 109.9
101-110                117.6
111-120                125.0
```
Let me make sure this is clear. Over the period 1995-1998, there were 120 defenses. I've ranked them 1 (best) to 120 (worst), and then divided them into groups of 10. The rows at the top correspond to the best defenses and the rows at the bottom correspond to the worst. The second column gives the average fantasy points those teams scored the next year. As you can see, there is no discernible pattern. Good defenses from last year are not likely to be better this year than bad defenses from last year.
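The ranking-and-grouping step can be sketched like this in Python, assuming each defense-season is stored as a (year-N points, year-N+1 points) pair; the function name and data layout are hypothetical. Ties in year-N points (1998 Detroit and 1995 Denver both had 73, for instance) keep their input order, because Python's sort is stable.

```python
from statistics import fmean

def decile_table(defenses, group_size=10):
    """Rank defense-seasons best to worst on year-N points, then average
    the year-N+1 points within consecutive groups of `group_size`.

    defenses: list of (fpts_year_n, fpts_year_n_plus_1) pairs.
    """
    ranked = sorted(defenses, key=lambda d: d[0], reverse=True)
    rows = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start:start + group_size]
        label = f"{start + 1}-{start + len(chunk)}"
        rows.append((label, round(fmean(nxt for _, nxt in chunk), 1)))
    return rows

# Made-up example with 20 defenses (not the real 1995-1998 data)
for label, avg in decile_table([(i, 100 + i) for i in range(20)]):
    print(f"{label:<10}{avg:>8}")
```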

This is just one more reason to pick your defense late. Let someone else spend a relatively early pick on last year's top-ranked defense. You can pick from the bottom (several rounds later), and have just as good a chance of coming up with a gem. That's true in theory, and the data shows that it's true in practice as well.

I realize that fantasy defense scoring systems vary widely, and yours may not be anything like the one outlined above. However, the individual coefficients for turnovers and TDs suggest that you're likely to get similar results if your defensive scoring system includes those two factors in a significant way. Throwing in sacks, points allowed, and yards allowed might change things, depending on how all the factors are weighted.