OK, I'll warn you at the outset: this one is a little longer and a little more involved than my previous offerings. I realize that this isn't mathclass.net, and the last thing many of you want to do in your spare time is to read something technical. More than likely, in fact, you are currently surfing the web on the sly to avoid some technical reading at work.

However, you came here in search of fantasy football tips, and I think you
will find some in this article. I'm going to draw some conclusions, and
they will be based on statistical ideas. If I just offer the conclusions
with no evidence in support of them, then I'm just some clown throwing
around crazy opinions, when in actuality I'm some clown throwing around
sound theoretical ideas. So in some sense I *have to* be at least a
little technical. Besides, some of you may be interested in the details.

I've come up with a scheme to resolve this dilemma. I tried to take the most essential ideas and put them in bold. I've averaged about one bold sentence per paragraph, and I think you can just read the bold ones to get the main ideas. If one of them intrigues you, you can read the surrounding material for more explanation. This plan also saves me the hassle of writing up a conclusion section. Just glue the bold sentences together and Presto! Instant conclusion.

Please let me know if you find this either incredibly useful or incredibly annoying.

**RULE #1 WHEN CHOOSING A DEFENSE FOR YOUR FANTASY TEAM**: Throw
last season's stats out the window.

Literally. Throw 'em out. They are useless to you, and I mean this in the most precise sense possible. I'll lead off with a few examples.

- In 1998, the Falcons surprised everyone by forcing 43 turnovers and scoring six TDs. The next year: 18 turnovers and one measly TD.
- The 1995 Cardinals forced 42 turnovers and scored five times. In 1996, they were far less impressive: 26 turnovers and one score.
- On the flip side, consider the 1998 Eagles: 17 turnovers and no scores. In 1999: 45 and five.
- 1995 Pack: 16 turnovers and one TD. The following season: 39 and four.

In order to get to the bottom of this, I'm going to introduce a mathematical
concept called the *correlation coefficient*. If
you're a veteran of many a statistics course (and can still remember them),
you might want to skip or skim the next few paragraphs. If you've never
heard of a correlation coefficient, stay with me here; I'll try to make it
as painless as possible, mixing in as many relevant examples as I can
(by "relevant," I of course mean "pertaining to fantasy football").

Suppose I told you that a particular NFL running back had 250 rushing attempts
last year. Would you guess that his total rushing yards were closest to
100, 1000, or 2000? Hopefully, you pulled the 1000 lever. Why? Because
you know that, in general, there is a fairly predictable relationship, i.e.
a strong *correlation*,
between rushing attempts and rushing yards. **The fact that there is
such a correlation allows you to estimate rushing yards knowing only
rushing attempts**. Of course, there is a limit to how accurate your
estimate can be. If I asked you to guess whether the 250-carry running
back's yards were 919, 1056, or 1102, you probably wouldn't know. The
correlation just isn't that strong. But it's certainly strong enough to
let you know that 100 and 2000 are completely out of the question.

So anytime I have two sets of data, I can ask whether or not there is a relationship between them, and if so, how strong that relationship is. The correlation coefficient is what the statisticians conjured up to answer these questions. I won't get into specifics here, but the bottom line is that if you feed your data sets into the formula, it'll spit back out a number between -1 and 1.
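For readers who do want the specifics: assuming the coefficient used throughout this article is the standard Pearson correlation coefficient (the article doesn't say which variant it uses), it is computed from paired values $x_i$ and $y_i$, with averages $\bar{x}$ and $\bar{y}$, as

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator measures whether the two stats tend to sit above and below their averages together. The denominator scales the result so that it always lands between -1 and 1.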

**If the number is close to (positive) one, that means there is a strong positive correlation**. Positive correlation means that both pieces of data tend to increase or decrease together. The rushing attempts and rushing yards example above would have a strong positive correlation: more attempts generally means more yards, and fewer attempts means fewer yards.

**If the number is close to negative one, that means there is a strong negative correlation**. This indicates that the relationship is strong, but one number tends to increase as the other decreases, and vice versa. An example of this would be to compare every NFL player's vertical leap to his weight. Generally speaking, lower weight means higher leap and vice versa. There are exceptions, of course, which means the correlation isn't perfect.

**If the number is 0, that means there is no relationship at all between the two sets of data** (strictly speaking, no straight-line relationship). For example, if I compared each NFL quarterback's touchdown passes with the number of pets he owns, I would expect zero correlation.
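Here's a minimal Python sketch of the correlation calculation. It assumes the standard Pearson coefficient, since the article doesn't name the variant. `pearson_r` is a hypothetical helper name, and the numbers are made up solely to reproduce the three cases above. They are not real player data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Do the two stats sit above/below their own averages together?
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each stat around its own average
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Made-up numbers, chosen only so the three cases come out clean.
attempts = [100, 150, 200, 250, 300]
yards = [400, 600, 800, 1000, 1200]   # rises with attempts
weights = [180, 200, 220, 240, 260]
leaps = [40, 38, 36, 34, 32]          # falls as weight rises
td_passes = [10, 20, 30, 40]
pets = [1, 3, 3, 1]                   # no trend either way

print(round(pearson_r(attempts, yards), 3))   # 1.0
print(round(pearson_r(weights, leaps), 3))    # -1.0
print(round(pearson_r(td_passes, pets), 3))   # 0.0
```

Real data never lands exactly on 1, -1, or 0. These toy lists were built so that the arithmetic comes out clean.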

One fundamental correlation you use all the time, whether you realize
it or not, is that **NFL players' year 2000 stats will be correlated with
their 1999 stats**. If you think Peyton Manning will throw more
TDs than Kordell Stewart in 2000, you are basing that prediction, at
least partially, on the fact that Manning threw more TDs in 1999.
Likewise, Curtis Martin will very likely rush for more yards
than Larry Centers does. Why? Because he has every year in the past.
Not exactly rocket science here, but the point is that a player's stats
from one year correlate with his stats from the next year.
The purpose of this article is to see just how strong that
correlation is for various statistics and more importantly, to attempt to
use this information to our advantage when projecting players' stats.

Consider the following:

- For all the running backs in my database (complete career data for all players who were active in 1998 or later), the correlation coefficient between rushing yards in year N and rushing yards in year N+1 is .70.
- For all the running backs in the database, the correlation coefficient between receiving TDs in year N and receiving TDs in year N+1 is .23.
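One way to compute a year-to-year coefficient like these is to pair each season a player had with his following season, then feed those pairs into the correlation formula. The sketch below assumes a hypothetical `{player: {year: value}}` layout, because the actual database format isn't described. The helper names and sample numbers are also invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_to_year_pairs(career_stats):
    """Return (year N value, year N+1 value) for every pair of consecutive
    seasons in career_stats, a hypothetical {player: {year: value}} dict.
    A player with a gap contributes no pair across that gap."""
    pairs = []
    for seasons in career_stats.values():
        for year, value in seasons.items():
            if year + 1 in seasons:
                pairs.append((value, seasons[year + 1]))
    return pairs


def year_to_year_r(career_stats):
    """Correlation between a stat in year N and the same stat in year N+1."""
    pairs = year_to_year_pairs(career_stats)
    return pearson_r([a for a, _ in pairs], [b for _, b in pairs])


# Invented rushing-yard totals, for illustration only.
rush_yards = {
    "Back A": {1997: 1200, 1998: 1300, 1999: 1250},
    "Back B": {1998: 600, 1999: 700},
    "Back C": {1996: 900, 1998: 1000},  # missed 1997, so no pair
}
print(sorted(year_to_year_pairs(rush_yards)))
# [(600, 700), (1200, 1300), (1300, 1250)]
print(round(year_to_year_r(rush_yards), 2))
```

Skipping non-consecutive seasons is a design choice. It keeps "year N+1" meaning exactly the next year, at the cost of discarding players who missed a season.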

Generally, **a low year-to-year correlation for a particular statistic
means that there is a large luck component involved in compiling that
statistic**. I'll elaborate with an example.
Suppose you live in a universe where
every NFL back's receiving TDs are determined by rolling a six-sided die.
This would generate a year-to-year correlation of zero.
In this universe, if Sammy Scatback caught 6 TD passes last year and
Pete Plodder caught 1, what does that mean? It means Sammy got lucky, and
Pete was unlucky. It gives us no insight into what will happen this year.

On the other hand, suppose you live in a universe where every back's receiving TD total is etched in stone forevermore. Whatever a back had last year, he is guaranteed to duplicate it this year. This would generate a year-to-year correlation of one. This time, if I tell you Sammy had 6 TD passes last year and Pete had 1, then you would assume, and correctly so, that Sammy was a more talented pass receiver than Pete, and that he would continue to catch more TD passes in the future.
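Both thought experiments are easy to simulate. The sketch below is a toy model, not real data. In the die universe, each back's receiving TDs are a fresh roll every season. In the etched-in-stone universe, each back simply repeats last year's total. With enough simulated backs, the first correlation hovers near zero (no finite sample hits exactly zero), and the second comes out as one. The function names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def die_universe(n_backs, rng):
    """Receiving TDs are a fresh six-sided die roll every season."""
    last_year = [rng.randint(1, 6) for _ in range(n_backs)]
    this_year = [rng.randint(1, 6) for _ in range(n_backs)]
    return last_year, this_year


def etched_universe(n_backs, rng):
    """Each back duplicates last year's total exactly."""
    last_year = [rng.randint(1, 6) for _ in range(n_backs)]
    return last_year, list(last_year)


rng = random.Random(2000)  # fixed seed so the run is repeatable
print(round(pearson_r(*die_universe(100_000, rng)), 2))      # close to 0
print(round(pearson_r(*etched_universe(100_000, rng)), 2))   # 1.0
```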

Meanwhile, back in this universe, year-to-year correlation coefficients are
never zero or one. They lie somewhere in the middle because both luck and
skill play a role. Further, it makes sense that **TDs have more to do with
luck than yards do**. Consider the following example. Eddie George caught
four TD passes last year, which is well above average for a running back.
He had 1304 rushing yards, which
is also well above average. Now answer this question: how many
things would have to have gone differently for George to catch a below
average number of TD passes? Not many. One holding call, one different
decision by the Titans' coaching staff, one different read by McNair, and
one time getting pushed out of bounds at the 1 instead of hitting the pylon,
and George has no TD receptions. On the other hand, how many things would
have to go differently for George to be a below average back in terms of
rushing yards? Lots, or one major thing. A major injury would do it, but
aside from that, it would take about 50 holding calls, 100 different
decisions by the Titans' staff, and so on, to bring George down under 800
or so rushing yards.
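Here's one way to see how the luck component drives the coefficient down. This is a hypothetical toy model. The article's data was not fitted to it. Each player has a fixed skill level, and each season's stat equals that skill plus an independent dose of luck. Under this model, the expected year-to-year correlation works out to var(skill) / (var(skill) + var(luck)), so the bigger the luck share, the lower the coefficient.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_two_seasons(n_players, skill_sd, luck_sd, rng):
    """Toy model: season stat = fixed skill + fresh, independent luck."""
    skills = [rng.gauss(0, skill_sd) for _ in range(n_players)]
    year_n = [s + rng.gauss(0, luck_sd) for s in skills]
    year_n1 = [s + rng.gauss(0, luck_sd) for s in skills]
    return year_n, year_n1


def expected_r(skill_sd, luck_sd):
    """Theoretical year-to-year correlation under the toy model."""
    return skill_sd ** 2 / (skill_sd ** 2 + luck_sd ** 2)


rng = random.Random(1999)
for luck_sd in (0.5, 1.0, 2.0):  # skill spread held at 1.0
    r = pearson_r(*simulate_two_seasons(50_000, 1.0, luck_sd, rng))
    print(f"luck sd {luck_sd}: simulated r = {r:.2f}, "
          f"theory = {expected_r(1.0, luck_sd):.2f}")
```

In this model, a yards-type stat corresponds to a small `luck_sd` and a TD-type stat to a large one. The real coefficients quoted earlier (.70 and .23) come from actual data, not from this model.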

Bottom line: backs (like George) who caught a lot of TD passes last
season *probably* did so because of a few fortunate breaks, whereas
backs who rushed for a lot of yards did so because they're good. If
you're good, you usually stay good, but if you're lucky, you don't
usually stay lucky. So we might hypothesize that **backs whose value was
touchdown-dependent last year will tend to decline more, as a group, than
backs whose value was more yardage-dependent**.

Let's put that hypothesis to the test with some actual data. Between 1995 and 1998, there were 82 instances of a running back scoring more than 140 fantasy points in a season (fantasy points being defined as (total yards)/10 + 6 × (total TDs)). I chose 140 as the cutoff because that is roughly the borderline for being a legitimate fantasy starter. So I'm only considering the kinds of backs that have some fantasy impact; Tony Carters and Carwell Gardners need not apply. I divided those 82 backs into two groups, a touchdown-dependent group and a yardage-dependent group, according to what percentage of their fantasy points came from TDs versus yardage. For example, heading the TD-dependent group was Terry Allen 1996: 46% of his fantasy point total was accumulated via TDs and 54% via yards. On the other end of the spectrum we have Warrick Dunn 1998. Only 8% of his production came from scores; the other 92% came from yards. What happened to these 82 backs the following year?
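The scoring rule and the TD-versus-yardage split can be written out as a short sketch. The helper names and the input layout are hypothetical. Splitting the qualifying backs exactly in half is an assumption, although it is consistent with both groups in the results table containing 41 backs (12 + 29 and 20 + 21).

```python
def fantasy_points(total_yards, total_tds):
    """Running-back scoring used here: 1 point per 10 yards, 6 per TD."""
    return total_yards / 10 + 6 * total_tds


def td_share(total_yards, total_tds):
    """Fraction of a back's fantasy points that came from touchdowns."""
    return 6 * total_tds / fantasy_points(total_yards, total_tds)


def split_by_dependence(backs, cutoff=140):
    """backs: list of (label, total_yards, total_tds) tuples (hypothetical layout).
    Keeps seasons above the cutoff, ranks them from most to least
    TD-dependent, and splits the ranking in half (an odd one out
    lands in the yardage-dependent half)."""
    qualifying = [b for b in backs if fantasy_points(b[1], b[2]) > cutoff]
    ranked = sorted(qualifying, key=lambda b: td_share(b[1], b[2]), reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


print(fantasy_points(1000, 10))  # 160.0
print(td_share(1000, 10))        # 0.375
```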

| Group | Improved | Declined |
|---|---|---|
| Touchdown-dependent | 12 | 29 |
| Yardage-dependent | 20 | 21 |
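Turning those counts into rates makes the gap easier to see. This is just arithmetic on the table's numbers, and `decline_rate` is a hypothetical helper name.

```python
def decline_rate(improved, declined):
    """Share of a group that declined the following year."""
    return declined / (improved + declined)


groups = {"Touchdown-dependent": (12, 29), "Yardage-dependent": (20, 21)}
for name, (improved, declined) in groups.items():
    total = improved + declined
    print(f"{name}: {declined} of {total} declined "
          f"({decline_rate(improved, declined):.0%})")
# Touchdown-dependent: 29 of 41 declined (71%)
# Yardage-dependent: 21 of 41 declined (51%)
```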

So what does this mean for 2000? Here is a list of all backs who scored 140 or more fantasy points last year, sorted from most TD-dependent to most yardage-dependent.

| Last name | First name | Pct from yards | Pct from TDs |
|---|---|---|---|
| STEWART | JAMES | .57 | .43 |
| DAVIS | STEPHEN | .60 | .40 |
| WHEATLEY | TYRONE | .63 | .37 |
| KIRBY | TERRY | .64 | .36 |
| ALLEN | TERRY | .66 | .34 |
| SMITH | EMMITT | .66 | .34 |
| JAMES | EDGERRIN | .68 | .32 |
| ALSTOTT | MIKE | .69 | .31 |
| GEORGE | EDDIE | .69 | .31 |
| RHETT | ERRICT | .71 | .29 |
| LEVENS | DORSEY | .73 | .27 |
| BETTIS | JEROME | .74 | .26 |
| GARY | OLANDIS | .76 | .24 |
| FAULK | MARSHALL | .77 | .23 |
| WATTERS | RICKY | .79 | .21 |
| DILLON | COREY | .81 | .19 |
| ENIS | CURTIS | .81 | .19 |
| STALEY | DUCE | .81 | .19 |
| GARNER | CHARLIE | .83 | .17 |
| MARTIN | CURTIS | .85 | .15 |

Draw a line between Errict Rhett and Dorsey Levens. If history repeats itself, the backs above the line will, as a group, decline more than the backs below it.

I need to make a few things clear here. First, this is a very *general*
tendency. By no means am I guaranteeing that Stephen Davis will be a
bust or that Charlie Garner will have a huge season, and of course many of
these guys will improve or decline based on factors unrelated to this
issue (it's not hard to imagine Terry Allen or Olandis Gary declining for
reasons that have nothing to do with touchdowns, for example). However,
**there's good reason to believe that, as a group, the top guys will
decline more than the bottom guys**. Second,
I'm not telling you not to draft Edgerrin James. Even if his fantasy
production declines, he could still be the top running back
in the league. This is just something else to consider when trying to
decide between two guys you have rated essentially even. It might cause
me to lean slightly toward Curtis Martin over James Stewart, for
example, or
toward Levens over Stephen Davis, or Enis over Wheatley.

Now, a complete rundown of all the correlation coefficients I calculated, followed by a summary of the conclusions we might draw from them.

Running backs:

| Stat | Correlation coefficient |
|---|---|
| Rush Yards | .70 |
| Rush TDs | .59 |
| Rec. Yards | .53 |
| Rec. TDs | .23 |
| Fantasy Points | .64 |

Receivers:

| Stat | Correlation coefficient |
|---|---|
| Rec. Yards | .53 |
| Rec. TDs | .45 |
| Fantasy Points | .48 |

Quarterbacks:

| Stat | Correlation coefficient |
|---|---|
| Pass Yards | .47 |
| Pass TDs | .46 |
| Rush Yards | .70 |
| Rush TDs | .40 |
| INT | .36 |
| Fantasy Points | .48 |

(Quarterback fantasy points are based on 6 points per passing TD and 1 point per 25 passing yards.)

- The fantasy points coefficient is much higher for running backs than for receivers or quarterbacks. **This says that running backs are more likely to retain their value from year to year than QBs or receivers are**. That is, backs are more predictable. This is intuitive to most experienced fantasy players and is one of the reasons many feel that a top-flight back should be the cornerstone of a fantasy team.
- **For every position, the year-to-year correlation is stronger for yards than it is for touchdowns**. This is the phenomenon discussed earlier for RBs. It seems to exist for QBs and receivers as well, but is much less pronounced.
- The correlation is fairly weak for interceptions. If your league counts them, **don't let a high INT total from the previous year scare you off too much**.
- An interesting fact not listed in the charts above: the correlation between rushing yards in year N and rushing TDs in year N+1 is stronger (although only slightly) than the correlation between rushing TDs in year N and rushing TDs in year N+1. In other words, **if you want to know how many TDs a back will score and can only have one piece of information, you'd be better off knowing last year's yards than last year's TDs**.

Defenses:

| Stat | Correlation coefficient |
|---|---|
| Turnovers forced | -.10 |
| Touchdowns | -.11 |
| Fantasy points | -.08 |

Here I'm defining fantasy points as 3 × turnovers + 10 × TDs, which is what one of my leagues uses. I don't have data for sacks.

This is remarkable. Not only are the correlations negligibly small,
they are negative. That means that, **if anything, you would predict
that last year's bad defenses will be good this year, and vice versa**.
To make sure this plays out in practice, I looked at every defense
from 1995 to 1998 and ranked them, best to worst, in terms of fantasy
points. The best defense of the period was the 1998 Seahawks (42
turnovers, 10 TDs, 226 fantasy points), and the worst was the 1998
Eagles (17 turnovers,
0 TDs, 51 fantasy points). Here are the top 10 and bottom 10
defenses of the period 1995-1998, along with how they did the
following year:

Top 10:

| Team | Year | TO | TD | FPT | Next-year TO | Next-year TD | Next-year FPT |
|---|---|---|---|---|---|---|---|
| sea | 1998 | 42 | 10 | 226 | 36 | 2 | 128 |
| atl | 1998 | 43 | 6 | 189 | 18 | 1 | 64 |
| nor | 1998 | 33 | 9 | 189 | 34 | 2 | 122 |
| ari | 1995 | 42 | 5 | 176 | 26 | 1 | 88 |
| nyg | 1997 | 44 | 4 | 172 | 26 | 3 | 108 |
| sfo | 1995 | 34 | 7 | 172 | 34 | 2 | 122 |
| pit | 1996 | 40 | 5 | 170 | 33 | 2 | 119 |
| phi | 1995 | 38 | 5 | 164 | 32 | 3 | 126 |
| den | 1997 | 31 | 7 | 163 | 30 | 2 | 110 |
| cin | 1996 | 44 | 3 | 162 | 23 | 1 | 79 |

Bottom 10:

| Team | Year | TO | TD | FPT | Next-year TO | Next-year TD | Next-year FPT |
|---|---|---|---|---|---|---|---|
| det | 1998 | 21 | 1 | 73 | 32 | 4 | 136 |
| den | 1995 | 21 | 1 | 73 | 32 | 1 | 106 |
| atl | 1996 | 23 | 0 | 69 | 28 | 0 | 84 |
| was | 1998 | 22 | 0 | 66 | 38 | 4 | 154 |
| buf | 1997 | 22 | 0 | 66 | 31 | 2 | 113 |
| car | 1997 | 22 | 0 | 66 | 33 | 3 | 129 |
| nor | 1996 | 21 | 0 | 63 | 31 | 1 | 103 |
| gnb | 1995 | 16 | 1 | 58 | 39 | 4 | 157 |
| ind | 1998 | 19 | 0 | 57 | 21 | 2 | 83 |
| phi | 1998 | 17 | 0 | 51 | 45 | 5 | 185 |

(TO = turnovers forced, TD = defensive touchdowns, FPT = fantasy points.)

The 10 killer defenses averaged 106.6 fantasy points the following year, while the 10 pathetic defenses averaged 125 fantasy points the following year. Let me restate that to make sure you've got it: **the ten worst defenses of the period scored more fantasy points the following year than the ten best.**
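The two averages can be recomputed directly from the turnover and TD columns, using the scoring of 3 points per turnover and 10 per TD. This sketch only redoes that arithmetic. `defense_points` and `next_year_average` are hypothetical helper names.

```python
def defense_points(turnovers, tds):
    """Defensive scoring from one of the author's leagues: 3 per turnover, 10 per TD."""
    return 3 * turnovers + 10 * tds


# (team, year, next-year turnovers, next-year TDs), copied from the tables
top_10 = [
    ("sea", 1998, 36, 2), ("atl", 1998, 18, 1), ("nor", 1998, 34, 2),
    ("ari", 1995, 26, 1), ("nyg", 1997, 26, 3), ("sfo", 1995, 34, 2),
    ("pit", 1996, 33, 2), ("phi", 1995, 32, 3), ("den", 1997, 30, 2),
    ("cin", 1996, 23, 1),
]
bottom_10 = [
    ("det", 1998, 32, 4), ("den", 1995, 32, 1), ("atl", 1996, 28, 0),
    ("was", 1998, 38, 4), ("buf", 1997, 31, 2), ("car", 1997, 33, 3),
    ("nor", 1996, 31, 1), ("gnb", 1995, 39, 4), ("ind", 1998, 21, 2),
    ("phi", 1998, 45, 5),
]


def next_year_average(rows):
    """Average next-year fantasy points for a list of defenses."""
    return sum(defense_points(to, td) for _, _, to, td in rows) / len(rows)


print(defense_points(42, 10))                  # 1998 Seahawks: 226
print(round(next_year_average(top_10), 1))     # 106.6
print(round(next_year_average(bottom_10), 1))  # 125.0
```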

Here is a more complete breakdown of the data:

| Group (rank in year N) | Average in year N+1 |
|---|---|
| 1-10 | 106.6 |
| 11-20 | 123.3 |
| 21-30 | 113.6 |
| 31-40 | 113.5 |
| 41-50 | 113.7 |
| 51-60 | 100.7 |
| 61-70 | 129.3 |
| 71-80 | 137.5 |
| 81-90 | 104.0 |
| 91-100 | 109.9 |
| 101-110 | 117.6 |
| 111-120 | 125.0 |

Let me make sure this is clear. Over the period 1995-1998, there were 120 defenses. I've ranked them 1 (best) to 120 (worst), and then divided them into groups of 10. The rows at the top correspond to the best defenses and the rows at the bottom correspond to the worst. The second column gives the average fantasy points those teams scored the next year. As you can see, there is no discernible pattern.
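The procedure behind that breakdown can be sketched as follows: rank every defense by year-N points, cut the ranking into groups of ten, and average each group's year-N+1 points. The input layout and helper name are hypothetical. Ties are broken by input order because the article doesn't say how ties were handled. The six defenses in the example are invented and are grouped in twos to keep the example short.

```python
def group_averages(defenses, group_size=10):
    """defenses: list of (label, points_year_n, points_year_n_plus_1).
    Ranks best to worst by year-N points (ties keep input order), then
    returns (first rank, last rank, average year-N+1 points) per group."""
    ranked = sorted(defenses, key=lambda d: d[1], reverse=True)
    rows = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        avg_next = sum(d[2] for d in group) / len(group)
        rows.append((start + 1, start + len(group), round(avg_next, 1)))
    return rows


# Invented defenses, grouped in twos only to keep the example small.
sample = [("A", 200, 100), ("B", 180, 120), ("C", 150, 130),
          ("D", 120, 90), ("E", 90, 140), ("F", 60, 110)]
for first, last, avg in group_averages(sample, group_size=2):
    print(f"{first}-{last}: {avg}")
# 1-2: 110.0
# 3-4: 110.0
# 5-6: 125.0
```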

This is just one more reason to pick your defense late. Let someone
else spend a relatively early pick on last year's top-ranked
defense. **You can pick from the bottom (several rounds later), and
have just as good a chance of coming up with a gem**. That's true in
theory, and the data shows that it's true in practice as well.

I realize that fantasy defense scoring systems vary widely, and yours may not be anything like the one outlined above. However, the individual coefficients for turnovers and TDs suggest that you're likely to get similar results if your defensive scoring system includes those two factors in a significant way. Throwing in sacks, points allowed, and yards allowed might change things, depending on how all the factors are weighted.