However, you came here in search of fantasy football tips, and I think you will find some in this article. I'm going to draw some conclusions, and they will be based on statistical ideas. If I just offer the conclusions with no evidence in support of them, then I'm just some clown throwing around crazy opinions, when in actuality I'm some clown throwing around sound theoretical ideas. So in some sense I have to be at least a little technical. Besides, some of you may be interested in the details.
I've come up with a scheme to resolve this dilemma. I tried to take the most essential ideas and put them in bold. I've averaged about one bold sentence per paragraph, and I think you can just read the bold ones to get the main ideas. If one of them intrigues you, you can read the surrounding material for more explanation. This plan also saves me the hassle of writing up a conclusion section. Just glue the bold sentences together and Presto! Instant conclusion.
Please let me know if you find this either incredibly useful or incredibly annoying.
RULE #1 WHEN CHOOSING A DEFENSE FOR YOUR FANTASY TEAM: Throw last season's stats out the window.
Literally. Throw 'em out. They are useless to you, and I mean this in the most precise sense possible. I'll lead off with a few examples.
In order to get to the bottom of this, I'm going to introduce a mathematical concept called the correlation coefficient. If you're a veteran of many a statistics course (and can still remember them), you might want to skip or skim the next few paragraphs. If you've never heard of a correlation coefficient, stay with me here; I'll try to make it as painless as possible, mixing in as many relevant examples as I can (by "relevant," I of course mean "pertaining to fantasy football").
Suppose I told you that a particular NFL running back had 250 rushing attempts last year. Would you guess that his total rushing yards were closest to 100, 1000, or 2000? Hopefully, you pulled the 1000 lever. Why? Because you know that, in general, there is a fairly predictable relationship, i.e. a strong correlation, between rushing attempts and rushing yards. The fact that there is such a correlation allows you to estimate rushing yards knowing only rushing attempts. Of course, there is a limit to how accurate your estimate can be. If I asked you to guess whether the 250-carry running back's yards were 919, 1056, or 1102, you probably wouldn't know. The correlation just isn't that strong. But it's certainly strong enough to let you know that 100 and 2000 are completely out of the question.
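So anytime I have two sets of data, I can ask whether or not there is a relationship between them, and if so, how strong that relationship is. The correlation coefficient is what the statisticians conjured up to answer these questions. I won't get into specifics here, but the bottom line is that if you feed your data sets into the formula, it'll spit back out a number between -1 and 1. A value near 1 means the two sets move together almost perfectly, a value near 0 means there is no relationship at all, and a negative value means that as one goes up, the other tends to go down.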
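For readers who want to see the formula anyway, here is a minimal Python sketch of the standard Pearson correlation coefficient. The article doesn't name the exact variant it uses, so treating it as Pearson's r is an assumption. The function name `pearson_r` and the carries/yards figures are made up for illustration, not real player data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value between -1 and 1: near 1 when the two data sets rise
    together, near -1 when one falls as the other rises, and near 0 when
    there is no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of each pair's deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a sequence with no variation has no correlation")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical backs: rushing attempts vs. rushing yards.
attempts = [250, 310, 180, 330, 210]
yards = [1010, 1190, 760, 1450, 830]
print(round(pearson_r(attempts, yards), 2))
```

With numbers like these, where more carries go with more yards, the result comes out strongly positive but not exactly 1, which is the "fairly predictable, but not perfect" relationship described above.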
One fundamental correlation you use all the time, whether you realize it or not, is that NFL players' year 2000 stats will be correlated with their 1999 stats. If you think Peyton Manning will throw for more TDs than Kordell Stewart in 2000, you are basing that prediction, at least partially, on the fact that Manning threw more TDs in 1999. Likewise, Curtis Martin will very likely rush for more yards than Larry Centers does. Why? Because he has every year in the past. Not exactly rocket science here, but the point is that a player's stats from one year correlate with his stats from the next year. The purpose of this article is to see just how strong that correlation is for various statistics and, more importantly, to attempt to use this information to our advantage when projecting players' stats.
Consider the following:
Generally, a low year-to-year correlation for a particular statistic means that there is a large luck component involved in compiling that statistic. I'll elaborate with an example. Suppose you live in a universe where every NFL back's receiving TDs are determined by rolling a six-sided die. This would generate a year-to-year correlation of zero. In this universe, if Sammy Scatback caught 6 TD passes last year and Pete Plodder caught 1, what does that mean? It means Sammy got lucky, and Pete was unlucky. It gives us no insight into what will happen this year.
On the other hand, suppose you live in a universe where every back's receiving TD total is etched in stone forevermore. Whatever a back had last year, he is guaranteed to duplicate it this year. This would generate a year-to-year correlation of one. This time, if I tell you Sammy caught 6 TD passes last year and Pete caught 1, then you would assume, and correctly so, that Sammy is a more talented pass receiver than Pete and that he will continue to catch more TD passes in the future.
Meanwhile, back in this universe, year-to-year correlation coefficients are never zero or one. They lie somewhere in the middle because both luck and skill play a role. Further, it makes sense that TDs have more to do with luck than yards do. Consider the following example. Eddie George caught four TD passes last year, which is well above average for a running back. He had 1304 rushing yards, which is also well above average. Now answer this question: how many things would have to have gone differently for George to catch a below average number of TD passes? Not many. One holding call, one different decision by the Titans' coaching staff, one different read by McNair, and one time getting pushed out of bounds at the 1 instead of hitting the pylon, and George has no TD receptions. On the other hand, how many things would have to go differently for George to be a below average back in terms of rushing yards? Lots, or one major thing. A major injury would do it, but aside from that, it would take about 50 holding calls, 100 different decisions by the Titans' staff, and so on, to bring George down under 800 or so rushing yards.
Bottom line: backs (like George) who caught a lot of TD passes last season probably did so because of a few fortunate breaks, whereas backs who rushed for a lot of yards did so because they're good. If you're good, you usually stay good, but if you're lucky, you don't usually stay lucky. So we might hypothesize that backs whose value was touchdown-dependent last year will tend to decline more, as a group, than backs whose value was more yardage-dependent.
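The practical upshot is regression toward the mean: if a stat's year-to-year correlation is r, a simple projection keeps only a fraction r of last year's distance from the league average. The sketch below assumes that textbook linear rule, with both seasons sharing the same average and spread; it is an approximation, not something the article spells out, and the league average used is hypothetical. The .23 is the receiving-TD correlation for running backs reported later in the article.

```python
def project(last_year, league_avg, r):
    """Shrink last year's figure toward the league average by the factor r.

    Assumes both seasons have the same average and spread, so the
    best straight-line prediction is avg + r * (last_year - avg).
    """
    return league_avg + r * (last_year - league_avg)


# A back who caught 6 TD passes, if the average starter caught 2 (hypothetical).
print(round(project(6, league_avg=2, r=0.23), 2))  # 2.92
print(project(6, league_avg=2, r=1.0))             # 6.0: etched in stone
print(project(6, league_avg=2, r=0.0))             # 2.0: pure dice
```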
Let's put that hypothesis to the test with some actual data. Between 1995 and 1998, there were 82 running-back seasons of more than 140 fantasy points (fantasy points being defined as (total yards)/10 + 6*(total TDs)). I chose 140 as the cutoff because that is roughly the borderline for being a legitimate fantasy starter. So I'm only considering the kinds of backs that have some fantasy impact -- Tony Carters and Carwell Gardners need not apply. I divided those 82 back-seasons into two groups, a touchdown-dependent group and a yardage-dependent group, according to what percentage of their fantasy points came from TDs versus yardage. For example, heading the TD-dependent group was Terry Allen 1996: 46% of his fantasy point total was accumulated via TDs and 54% via yards. On the other end of the spectrum we have Warrick Dunn 1998. Only 8% of his production came from scores; the other 92% came from yards. What happened to these 82 backs the following year?
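The scoring rule and the TD-versus-yardage split are straightforward to compute. In this sketch the function names and the sample backs are hypothetical. The article doesn't say exactly where it drew the line between the two groups; the counts in the table that follows add up to 41 backs per group, which suggests an even split, so the sketch cuts at the median TD share. That cut is an assumption.

```python
from statistics import median


def fantasy_points(total_yards, total_tds):
    # The article's scoring rule: 1 point per 10 yards, 6 points per TD.
    return total_yards / 10 + 6 * total_tds


def td_share(total_yards, total_tds):
    # Fraction of a back's fantasy points that came from touchdowns.
    return 6 * total_tds / fantasy_points(total_yards, total_tds)


def split_by_td_dependence(backs, cutoff=140):
    """backs: list of (label, total_yards, total_tds).

    Keeps backs scoring more than `cutoff` points, then splits them at the
    median TD share into (TD-dependent, yardage-dependent) label lists.
    """
    qualified = [b for b in backs if fantasy_points(b[1], b[2]) > cutoff]
    mid = median(td_share(yds, tds) for _, yds, tds in qualified)
    td_dep = [lbl for lbl, yds, tds in qualified if td_share(yds, tds) > mid]
    yard_dep = [lbl for lbl, yds, tds in qualified if td_share(yds, tds) <= mid]
    return td_dep, yard_dep


# Hypothetical seasons: (label, total yards, total TDs).
backs = [
    ("Back A", 1000, 14),  # 184 points, 46% from TDs
    ("Back B", 1500, 2),   # 162 points, 7% from TDs
    ("Back C", 600, 3),    # 78 points, below the cutoff
    ("Back D", 1200, 8),   # 168 points, 29% from TDs
    ("Back E", 1100, 10),  # 170 points, 35% from TDs
]
print(split_by_td_dependence(backs))
# (['Back A', 'Back E'], ['Back B', 'Back D'])
```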
Group                  Improved   Declined
------------------------------------------
Touchdown-dependent        12         29
Yardage-dependent          20         21

So only 29% of the TD-dependent backs improved the next year, while 49% of the yardage-dependent backs improved. These numbers confirm the hypothesis. As usual, I would have more confidence in this result if I had more data, but it looks fairly convincing, especially since it's backed up by theory.
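The percentages follow directly from the counts. A quick check in Python (the dictionary and function names are ours):

```python
# (improved, declined) the following season, from the table above.
results = {
    "Touchdown-dependent": (12, 29),
    "Yardage-dependent": (20, 21),
}


def pct_improved(improved, declined):
    return 100 * improved / (improved + declined)


for group, (up, down) in results.items():
    print(f"{group}: {pct_improved(up, down):.0f}% improved ({up} of {up + down})")
# Touchdown-dependent: 29% improved (12 of 41)
# Yardage-dependent: 49% improved (20 of 41)
```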
So what does this mean for 2000? Here is a list of all backs who scored 140 or more fantasy points last year, sorted from most TD-dependent to most yardage-dependent.
LastName    FirstName   Pct from   Pct from
                          Yards       TDs
--------------------------------------------
STEWART     JAMES          .57        .43
DAVIS       STEPHEN        .60        .40
WHEATLEY    TYRONE         .63        .37
KIRBY       TERRY          .64        .36
ALLEN       TERRY          .66        .34
SMITH       EMMITT         .66        .34
JAMES       EDGERRIN       .68        .32
ALSTOTT     MIKE           .69        .31
GEORGE      EDDIE          .69        .31
RHETT       ERRICT         .71        .29
LEVENS      DORSEY         .73        .27
BETTIS      JEROME         .74        .26
GARY        OLANDIS        .76        .24
FAULK       MARSHALL       .77        .23
WATTERS     RICKY          .79        .21
DILLON      COREY          .81        .19
ENIS        CURTIS         .81        .19
STALEY      DUCE           .81        .19
GARNER      CHARLIE        .83        .17
MARTIN      CURTIS         .85        .15

Draw a line between Errict Rhett and Dorsey Levens. If history repeats itself, most of the guys above the line will see their fantasy production decline in 2000, while only about half the guys below the line will decline.
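The Rhett/Levens line splits the 20 backs into two groups of 10, the same half-and-half kind of split as the 41/41 groups in the earlier table. A sketch of that cut using the TD shares from the table (the variable names are ours):

```python
# (last name, share of fantasy points from TDs), copied from the table above,
# already ordered from most TD-dependent to most yardage-dependent.
backs_1999 = [
    ("Stewart", .43), ("Davis", .40), ("Wheatley", .37), ("Kirby", .36),
    ("Allen", .34), ("Smith", .34), ("James", .32), ("Alstott", .31),
    ("George", .31), ("Rhett", .29), ("Levens", .27), ("Bettis", .26),
    ("Gary", .24), ("Faulk", .23), ("Watters", .21), ("Dillon", .19),
    ("Enis", .19), ("Staley", .19), ("Garner", .17), ("Martin", .15),
]

# sorted() is stable (even with reverse=True), so ties keep their table order.
ranked = sorted(backs_1999, key=lambda b: b[1], reverse=True)
half = len(ranked) // 2
above_line = [name for name, _ in ranked[:half]]
below_line = [name for name, _ in ranked[half:]]
print(above_line[-1], "|", below_line[0])  # Rhett | Levens
```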
I need to make a few things clear here. First, this is a very general tendency. By no means am I guaranteeing that Stephen Davis will be a bust or that Charlie Garner will have a huge season, and of course many of these guys will improve or decline for reasons unrelated to this issue (it's not hard to imagine Terry Allen or Olandis Gary declining for other reasons, for example). However, there's good reason to believe that, as a group, the top guys will decline more than the bottom guys. Second, I'm not telling you not to draft Edgerrin James. Even if his fantasy production declines, he could still be the top running back in the league. This is just something else to consider when trying to decide between two guys you have rated essentially even. It might cause me to lean slightly toward Curtis Martin over James Stewart, for example, or toward Levens over Stephen Davis, or Enis over Wheatley.
Now, a complete rundown of all the correlation coefficients I calculated, followed by a summary of the conclusions we might draw from them.
Running backs:

                   Correlation
Stat               coefficient
------------------------------
Rush Yards             .70
Rush TDs               .59
Rec. Yards             .53
Rec. TDs               .23
Fantasy Points         .64
Receivers:

                   Correlation
Stat               coefficient
------------------------------
Rec. Yards             .53
Rec. TDs               .45
Fantasy Points         .48
Quarterbacks:

                   Correlation
Stat               coefficient
------------------------------
Pass Yards             .47
Pass TDs               .46
Rush Yards             .70
Rush TDs               .40
INT                    .36
Fantasy Points         .48

(Fantasy points based on 6 points per passing TD, 1 point per 25 passing yards.)
Defenses:

                   Correlation
Stat               coefficient
------------------------------
Turnovers forced      -.10
Touchdowns            -.11
Fantasy points        -.08

Here I'm defining fantasy points as 3*turnovers + 10*TDs, which is what one of my leagues uses. I don't have data for sacks.
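The defensive scoring rule as a one-line function, checked against the best and worst defenses cited in the next paragraph (the function name is hypothetical):

```python
def defense_points(turnovers, tds):
    # One league's scoring rule from the article: 3 per turnover, 10 per TD.
    return 3 * turnovers + 10 * tds


print(defense_points(42, 10))  # 226: 1998 Seahawks
print(defense_points(17, 0))   # 51: 1998 Eagles
```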
This is remarkable. Not only are the correlations negligibly small, they are negative. That means that, if anything, you would predict that last year's bad defenses will be good this year, and vice versa. To check whether this plays out in practice, I looked at every defense from 1995 to 1998 and ranked them, best to worst, in terms of fantasy points. The best defense of the period was the 1998 Seahawks (42 turnovers, 10 TDs, 226 fantasy points), and the worst was the 1998 Eagles (17 turnovers, 0 TDs, 51 fantasy points). Here are the top 10 and bottom 10 defenses of the period 1995-1998, along with how they did the following year:
TOP 10

-----------------------------+----------------
         FIRST YEAR          |  THE NEXT YEAR
-----------------------------+----------------
TM    YEAR    TO   TD   FPT  |  TO   TD   FPT
-----------------------------+----------------
sea   1998    42   10   226  |  36    2   128
atl   1998    43    6   189  |  18    1    64
nor   1998    33    9   189  |  34    2   122
ari   1995    42    5   176  |  26    1    88
nyg   1997    44    4   172  |  26    3   108
sfo   1995    34    7   172  |  34    2   122
pit   1996    40    5   170  |  33    2   119
phi   1995    38    5   164  |  32    3   126
den   1997    31    7   163  |  30    2   110
cin   1996    44    3   162  |  23    1    79

BOTTOM 10

-----------------------------+----------------
         FIRST YEAR          |  THE NEXT YEAR
-----------------------------+----------------
TM    YEAR    TO   TD   FPT  |  TO   TD   FPT
-----------------------------+----------------
det   1998    21    1    73  |  32    4   136
den   1995    21    1    73  |  32    1   106
atl   1996    23    0    69  |  28    0    84
was   1998    22    0    66  |  38    4   154
buf   1997    22    0    66  |  31    2   113
car   1997    22    0    66  |  33    3   129
nor   1996    21    0    63  |  31    1   103
gnb   1995    16    1    58  |  39    4   157
ind   1998    19    0    57  |  21    2    83
phi   1998    17    0    51  |  45    5   185

(TO = turnovers forced, TD = defensive touchdowns, FPT = fantasy points)

The 10 killer defenses averaged 106.6 fantasy points the following year, while the 10 pathetic defenses averaged 125 fantasy points the following year. Let me restate that to make sure you've got it: the absolute bottom of the barrel godawful defenses did better the next year -- substantially better -- than the top defenses. In a typical season, a 10th ranked defense (roughly equivalent to the worst one that deserves to be a fantasy "starter") will score around 125-130 points. Only two of the top defenses reached that standard, while five of the bottom defenses did.
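The averages and the "starter" counts can be re-derived from the next-year FPT column. A sketch, taking 125 (the low end of the article's 125-130 range) as the threshold, which is our choice:

```python
# Next-year fantasy points, copied from the two tables above.
top10_next = [128, 64, 122, 88, 108, 122, 119, 126, 110, 79]
bottom10_next = [136, 106, 84, 154, 113, 129, 103, 157, 83, 185]
STARTER_LINE = 125  # low end of the 125-130 range for a 10th-ranked defense


def summarize(next_year_points):
    """Average next-year points and how many reached the starter line."""
    avg = sum(next_year_points) / len(next_year_points)
    starters = sum(1 for p in next_year_points if p >= STARTER_LINE)
    return avg, starters


print(summarize(top10_next))     # (106.6, 2)
print(summarize(bottom10_next))  # (125.0, 5)
```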
Here is a more complete breakdown of the data:
GROUP               AVERAGE IN
(rank in year N)    YEAR N+1
-------------------------------
  1-10                106.6
 11-20                123.3
 21-30                113.6
 31-40                113.5
 41-50                113.7
 51-60                100.7
 61-70                129.3
 71-80                137.5
 81-90                104.0
 91-100               109.9
101-110               117.6
111-120               125.0

Let me make sure this is clear. Over the period 1995-1998, there were 120 defenses. I've ranked them 1 (best) to 120 (worst), and then divided them into groups of 10. The rows at the top correspond to the best defenses and the rows at the bottom correspond to the worst. The second column tells you how those teams did the next year. As you can see, there is no discernible pattern. Good defenses from last year are not likely to be better this year than bad defenses from last year.
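The ranking-and-grouping procedure, sketched in Python. The input format (pairs of year-N and year-N+1 fantasy points) and the function name are assumptions; the usage line feeds it four defenses from the top/bottom-10 tables, with groups of 2 instead of 10 to keep it small.

```python
def group_averages(defenses, group_size=10):
    """defenses: list of (points in year N, points in year N+1).

    Ranks by year-N points (best first), cuts the ranking into consecutive
    groups of `group_size`, and returns each group's average year-N+1 points.
    """
    ranked = sorted(defenses, key=lambda d: d[0], reverse=True)
    averages = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        averages.append(sum(nxt for _, nxt in group) / len(group))
    return averages


# 1998 Seahawks, 1998 Eagles, 1998 Falcons, 1998 Lions: (points, next-year points).
sample = [(226, 128), (51, 185), (189, 64), (73, 136)]
print(group_averages(sample, group_size=2))  # [96.0, 160.5]
```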
This is just one more reason to pick your defense late. Let someone else spend a relatively early pick on last year's top-ranked defense. You can pick from the bottom (several rounds later), and have just as good a chance of coming up with a gem. That's true in theory, and the data shows that it's true in practice as well.
I realize that fantasy defense scoring systems vary widely, and yours may not be anything like the one outlined above. However, the individual coefficients for turnovers and TDs suggest that you're likely to get similar results if your defensive scoring system includes those two factors in a significant way. Throwing in sacks, points allowed, and yards allowed might change things, depending on how all the factors are weighted.