## Rebuilding the Favorite Toy II

Posted by Doug on April 4, 2006

Let's start by keeping this as simple as possible. Last year, Clinton Portis was a 24-year-old running back with 1516 rushing yards. We'll ignore those last two digits and place him in the 1500-1599 category, which we'll abbreviate '15.' Clinton Portis was a 24-15 last year.
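Here is a minimal sketch of this labeling. It assumes age is an integer and the yardage bucket is simply the number of completed hundreds of yards. The function name `state_label` is hypothetical, not part of any existing tool:

```python
def state_label(age: int, rush_yards: int) -> str:
    """Encode a season as 'age-bucket', where the bucket drops the last two digits."""
    bucket = rush_yards // 100  # e.g. 1516 -> 15, the 1500-1599 category
    return f"{age}-{bucket}"


# Clinton Portis, 2005: 24 years old, 1516 rushing yards
print(state_label(24, 1516))  # 24-15
```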

The next step is to sift through the historical data to find out what other 24-15s have done. What percentage turned into 25-17s? What percentage turned into 25-9s? And so on. As it turns out, 100% of all 24-15s (yep, all one of them) turned into 25-17s. We need to widen the net a bit, and that introduces the usual problems. As we widen it, we increase the sample (which is good), but we also introduce more runners who are not truly comparable to Portis (which is bad). There is no right answer. We just play around until we get something that appears to pass the eyeball test.

Here are all the runners aged 23-25 with between 1400 and 1699 rushing yards, along with how they did the next year:

| Runner | Year | Rush Yds | Next Yr Rush Yds |
|---|---|---:|---:|
| Thurman Thomas | 1991 | 1407 | 1487 |
| O.J. Simpson | 1972 | 1251 | 2003 |
| Deuce McAllister | 2003 | 1641 | 1074 |
| Terrell Davis | 1996 | 1538 | 1750 |
| LaDainian Tomlinson | 2003 | 1645 | 1335 |
| Franco Harris | 1975 | 1246 | 1128 |
| Wilbert Montgomery | 1979 | 1512 | 778 |
| Walter Payton | 1979 | 1610 | 1460 |
| Earl Campbell | 1979 | 1697 | 1934 |
| Barry Foster | 1992 | 1690 | 711 |
| Gerald Riggs | 1984 | 1486 | 1719 |
| Mark van Eeghen | 1977 | 1273 | 1080 |
| Travis Henry | 2002 | 1438 | 1356 |
| Emmitt Smith | 1994 | 1484 | 1773 |
| Otis Armstrong | 1974 | 1407 | 155 |
| George Rogers | 1981 | 1674 | 535 |
| Earl Campbell | 1978 | 1450 | 1697 |
| Barry Sanders | 1991 | 1548 | 1352 |
| Curt Warner | 1986 | 1481 | 985 |
| Rudi Johnson | 2004 | 1454 | 1458 |
| Jerome Bettis | 1997 | 1665 | 1185 |
| Stephen Davis | 1999 | 1405 | 1318 |
| Emmitt Smith | 1993 | 1486 | 1484 |
| Jerome Bettis | 1996 | 1431 | 1665 |
| LaDainian Tomlinson | 2002 | 1683 | 1645 |

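Note that, e.g., Mark van Eeghen did not fall into the 1400-1699 yard range, but if you pro-rate his season to 16 team games he did. The next-year totals are pro-rated the same way in the probabilities below, which is why, for instance, George Rogers' 535 yards in the nine-game 1982 season count in the 900s rather than the 500s.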
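The comp search, pro-rating included, might look like the sketch below. Each historical season is a hypothetical `Season` record holding age, raw rushing yards, and the number of games on the team's schedule. The default window (age within one year of the target, pro-rated yards from 1400 through 1699) mirrors the criteria listed above. The names are illustrative, not taken from any existing codebase:

```python
from dataclasses import dataclass


@dataclass
class Season:
    """Hypothetical record for one historical season."""
    runner: str
    year: int
    age: int
    rush_yards: int
    team_games: int  # games on the team's schedule that year


def prorate(yards: float, team_games: int, full_season: int = 16) -> float:
    """Scale a season total to a 16-game schedule."""
    return yards * full_season / team_games


def find_comps(seasons, target_age, low=1400, high=1700, age_window=1):
    """Seasons within age_window years of target_age whose pro-rated
    yardage falls in [low, high), i.e. 1400-1699 by default."""
    return [
        s for s in seasons
        if abs(s.age - target_age) <= age_window
        and low <= prorate(s.rush_yards, s.team_games) < high
    ]


# Mark van Eeghen's 1273 yards over a 14-game schedule pro-rate to about 1455.
print(round(prorate(1273, 14)))  # 1455
```

Widening the net means loosening `age_window`, `low`, or `high`. Each step enlarges the returned list at the cost of comparability.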

Which leads to the following probabilities for Portis next year:

| Yardage | Probability (%) |
|---|---:|
| 0-99 | 0.0 |
| 100-199 | 4.0 |
| 200-299 | 0.0 |
| 300-399 | 0.0 |
| 400-499 | 0.0 |
| 500-599 | 0.0 |
| 600-699 | 0.0 |
| 700-799 | 8.0 |
| 800-899 | 0.0 |
| 900-999 | 4.0 |
| 1000-1099 | 12.0 |
| 1100-1199 | 4.0 |
| 1200-1299 | 4.0 |
| 1300-1399 | 16.0 |
| 1400-1499 | 16.0 |
| 1500-1599 | 0.0 |
| 1600-1699 | 12.0 |
| 1700-2200 | 20.0 |
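The table can be rebuilt from the 25 next-year totals in the comp list. This sketch assumes those totals are pro-rated to 16 team games: 14-game schedules in 1973, 1975, and 1976, a 9-game strike schedule in 1982, and a 15-game schedule in 1987. It also lumps everything at 1700+ into one top bucket. Under those assumptions it reproduces the percentages shown.

```python
from collections import Counter

# (next-year rushing yards, games on that next season's schedule),
# in the same order as the comp table above.
NEXT_YEAR = [
    (1487, 16), (2003, 14), (1074, 16), (1750, 16), (1335, 16),
    (1128, 14), (778, 16), (1460, 16), (1934, 16), (711, 16),
    (1719, 16), (1080, 16), (1356, 16), (1773, 16), (155, 14),
    (535, 9), (1697, 16), (1352, 16), (985, 15), (1458, 16),
    (1185, 16), (1318, 16), (1484, 16), (1665, 16), (1645, 16),
]


def prorate(yards, team_games, full_season=16):
    return yards * full_season / team_games


def bucket(yards, top=17):
    """100-yard bucket index (13 means 1300-1399); 1700+ shares the top bucket."""
    return min(int(yards // 100), top)


def next_year_distribution(pairs):
    """Percentage of comps landing in each bucket after pro-rating."""
    counts = Counter(bucket(prorate(y, g)) for y, g in pairs)
    return {b: 100.0 * c / len(pairs) for b, c in sorted(counts.items())}


for b, pct in next_year_distribution(NEXT_YEAR).items():
    print(f"{b * 100:>5}: {pct:4.1f}%")
```

Buckets that no comp lands in (such as 1500-1599) simply do not appear in the result, which corresponds to the 0.0 rows.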

Although it looks choppier than it ought to, this has the right general feel. It sets the over-under for Portis' rushing yards next year at about 1400. It gives him a respectable chance of breaking out for a huge year, a slim chance of a catastrophic injury, and also a chance of a minor injury or a major decline.
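One way to check the over-under is to read it as the median of this distribution, spreading each bucket's probability evenly across its yards. That reading is an interpretation, not something the post spells out. By that measure, 36% of the mass falls below 1300 and the 1300s add 16%, so the median is 1300 + 100 × 14/16 = 1387.5 yards, in line with "about 1400":

```python
# Percentages from the table above for the 100-yard buckets 0-99 through 1600-1699.
PCTS = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0, 0.0, 4.0,
        12.0, 4.0, 4.0, 16.0, 16.0, 0.0, 12.0]
# (bucket start, bucket width, percent); the top bucket runs 1700-2200.
TABLE = [(100 * i, 100, p) for i, p in enumerate(PCTS)] + [(1700, 500, 20.0)]


def interpolated_median(table):
    """Yardage at which cumulative probability reaches 50%, assuming outcomes
    are spread evenly within each bucket."""
    cumulative = 0.0
    for start, width, pct in table:
        if cumulative + pct >= 50.0:
            return start + width * (50.0 - cumulative) / pct
        cumulative += pct
    raise ValueError("probabilities never reach 50%")


print(interpolated_median(TABLE))  # 1387.5
```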

So we roll a die to determine how many yards Portis will have next year. Based on what he gets, we estimate his probabilities for the following year using the same technique, roll another die, and so on.
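The roll-the-die loop is a Monte Carlo simulation of the chain. In the sketch below, `transition` is a hypothetical stand-in for the comp-based lookup. It returns candidate next-season yardages with weights, or `None` to end the career. The `max_age` cutoff of 36 is an arbitrary assumption, since the post does not say how simulated careers end. The `toy_transition` rule is made up purely so the code runs.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: given (age, last season's yards), return the possible
# next-season yardages with their weights, or None if the career should end.
Transition = Callable[[int, int], Optional[Tuple[Sequence[int], Sequence[float]]]]


def simulate_career(age: int, yards: int, career_so_far: int,
                    transition: Transition, rng: random.Random,
                    max_age: int = 36) -> int:
    """Roll the weighted die once per season; return simulated career yards."""
    total = career_so_far
    while age < max_age:  # max_age is an assumed cutoff, not from the post
        options = transition(age, yards)
        if options is None:
            break
        outcomes, weights = options
        # Only last season's yards feed the next roll: the chain is memoryless.
        yards = rng.choices(outcomes, weights=weights, k=1)[0]
        age += 1
        total += yards
    return total


def toy_transition(age: int, yards: int):
    """Made-up rule for illustration only: hold steady, decline, or get hurt."""
    if yards < 300:
        return None  # treat a sub-300-yard season as the end of the career
    return [yards, int(yards * 0.8), 150], [0.6, 0.3, 0.1]


rng = random.Random(2006)
# Future yards only; add the runner's career total to date for a full-career figure.
careers = [simulate_career(24, 1516, 0, toy_transition, rng) for _ in range(10_000)]
print(sum(c >= 10_000 for c in careers) / len(careers))
```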

OK, here we go. According to this method, here is the probability of Portis reaching various career yardage levels.

| Career yardage | Chance |
|---|---:|
| 18000+ | 0.3% |
| 17000+ | 0.6% |
| 16000+ | 1.4% |
| 15000+ | 2.9% |
| 14000+ | 5.8% |
| 13000+ | 10.8% |
| 12000+ | 19.7% |
| 11000+ | 32.3% |
| 10000+ | 48.2% |
| 9000+ | 66.0% |
| 8000+ | 81.7% |
| 7000+ | 94.1% |
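Given a list of simulated career totals, each row of this table is just the fraction of simulations at or above the threshold. Here is a minimal sketch. The `careers` input is assumed to come from whatever simulation produced the runs, and the sample totals below are made up for illustration.

```python
from typing import Dict, Iterable, Sequence


def tail_probabilities(careers: Sequence[int],
                       thresholds: Iterable[int]) -> Dict[int, float]:
    """Percent of simulated careers reaching each yardage threshold."""
    n = len(careers)
    return {t: 100.0 * sum(c >= t for c in careers) / n for t in thresholds}


# Made-up career totals, purely to show the shape of the output.
sample = [6500, 8200, 9100, 10400, 12900]
print(tail_probabilities(sample, range(7000, 19000, 1000)))
```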

The original Favorite Toy said Portis was about a 3-to-1 shot to break Emmitt Smith's career rushing record. This one says he's a 300-to-1 shot. This method depends heavily on real historical data. Records, by their very nature, are historically very rare accomplishments. So we shouldn't be too surprised to see that this method thinks Portis is a long shot. But I am surprised at just how much of a long shot it thinks he is.
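Both figures follow from the usual conversion between a probability p and odds against, (1 - p)/p. This sketch treats the 18000+ row as a stand-in for breaking the record. Smith finished with 18,355 yards, so this slightly flatters Portis. On that basis, 0.3% works out to roughly 332-to-1, and a 3-to-1 shot corresponds to a 25% chance:

```python
def odds_against(p: float) -> float:
    """Convert a probability of success into 'X-to-1' odds against."""
    return (1.0 - p) / p


print(round(odds_against(0.003)))  # 332
print(odds_against(0.25))          # 3.0
```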

The problem is that we're only using one year's worth of data to estimate the following year's production. Go back to that second table at the top of the page. It says that Portis has a 4% chance of gaining between 100 and 200 yards this year. I don't think that's unreasonable. What *is* unreasonable is projecting the rest of Portis' career under the assumption that he is morally a 100-yard-per-year running back. If Portis gains only 150 yards in 2006, it will be because he got hurt. But the mathematical model doesn't know that. It thinks Portis is just another Heath Evans or Shaud Williams who will be out of football shortly.

So when the simulated Portis suffers a major injury, he has almost no chance of coming back. The model needs more information. [Markov chain fans will note at this point that we're up against the "memoryless" assumption of Markov chains that I glossed over in my last post.] I can think of two ways to provide this information:

- We could take into account more than one year's worth of statistics when determining the historical probabilities. In other words, instead of calling a hypothetically injured Portis a 25-year-old back who gained 150 yards last year, we could call him a 25-year-old back who gained 150 yards last year and 1500 the year before. This would certainly allow the model to distinguish between a hypothetical injured Clinton Portis and a healthy Shaud Williams. But it drastically cuts down the pool of available comps.
- We could measure everything in terms of yards per game instead of raw yards, and then independently assess the probability of injury at each age. Under this scheme, our hypothetically injured Portis would be called a 25-year-old back who averaged 78 yards per game and played two games. It's just a guess at this point, but I think this plan, while not without its problems, might actually yield some reasonable probabilities.
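Both fixes amount to changing what goes into the chain's state. The sketch below shows both, with hypothetical names throughout:

- The first-order state cannot tell an injured star from a marginal back.
- A second-order state, which adds the season before, can.
- The per-game state records rate and games played separately, so injury could be modeled on its own.

The 10-yards-per-game bucket width is an assumption. The 78-yards-per-game, two-game season is the post's own hypothetical, and the marginal back's numbers are made up.

```python
from typing import NamedTuple, Tuple


def first_order_state(age: int, yards: int) -> Tuple[int, int]:
    return (age, yards // 100)


def second_order_state(age: int, yards: int, prev_yards: int) -> Tuple[int, int, int]:
    # Remembering the previous season relaxes the memoryless assumption.
    return (age, yards // 100, prev_yards // 100)


class PerGameState(NamedTuple):
    age: int
    ypg_bucket: int  # yards per game, in assumed 10-yard buckets
    games: int       # games played; injury odds would be estimated separately


def per_game_state(age: int, yards: int, games: int) -> PerGameState:
    ypg = yards / games if games else 0.0
    return PerGameState(age, int(ypg // 10), games)


# Hypothetical injured Portis: 156 yards (2 games) after a 1500-yard season.
injured_star = (25, 156, 1500)
# Hypothetical marginal back: 150 yards after a 200-yard season.
marginal_back = (25, 150, 200)

print(first_order_state(*injured_star[:2]) == first_order_state(*marginal_back[:2]))  # True
print(second_order_state(*injured_star) == second_order_state(*marginal_back))        # False
print(per_game_state(25, 156, 2))  # PerGameState(age=25, ypg_bucket=7, games=2)
```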

Either way, it's going to take more programming, which takes more time, which I don't think I have right now. For now we will have to file this under "crazy ideas that don't work and may or may not be salvageable." I will throw it on the to-do list and hope to take another crack at salvaging it sometime.

If a system is worked out to project statistics using the Markov chain...well, you may have created the greatest Fantasy Football tool ever devised.