Posted by Doug on April 20, 2006
I figured that some baseball stathead had probably attempted something similar, so I did some googling to see if they had any luck. I did not find any Markov models, but what I did find was this interesting article at baseballthinkfactory. It was written by a guy named Jesse Frey and it's a neat idea. I'll run through the basic gist of it using --- guess who --- Clinton Portis and the rushing record as an example.
We start by collecting all 25-year-old running backs throughout NFL history (subject to some fine print). We then record how many yards they gained at age 23 and at age 24, and how many yards they gained in the rest of their careers. So we've got a list that looks something like this:
Player          Age23RshYD  Age24RshYD  RestOfCareer
Robert Smith           632         692          4989
Ricky Ervins           680         495           939
Terrell Davis         1117        1538          4952
Barry Foster           488        1690          1562
[... another hundred-or-so guys ...]
There is, of course, no exact formula that tells you the RestOfCareer rushing yards based on the age 23 and age 24 rushing yards, but using a technique called regression we can estimate the formula that works "best."
Given the above data, what we end up with is this:
Rest-of-career yards ~= -943 + 2.64*(age24yards) + 2.39*(age23yards)
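For anyone who wants to see what "estimating the formula that works best" looks like mechanically, here is a rough sketch of an ordinary least-squares fit in plain Python. It only uses the four sample rows shown above, so the coefficients it prints will NOT match the ones in the post, which came from the full data set. The function names (solve_linear, fit_rest_of_career) are made up for this illustration, and in practice you would probably reach for a library routine rather than hand-rolled normal equations.

```python
# Toy data: only the four rows quoted in the post, not the full data set.
# Columns: player, age-23 rushing yards, age-24 rushing yards, rest-of-career yards.
SAMPLE = [
    ("Robert Smith", 632, 692, 4989),
    ("Ricky Ervins", 680, 495, 939),
    ("Terrell Davis", 1117, 1538, 4952),
    ("Barry Foster", 488, 1690, 1562),
]


def solve_linear(a, b):
    """Solve the small square system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_rest_of_career(rows):
    """Least-squares fit of rest ~= b0 + b24*age24 + b23*age23.

    Returns [b0, b24, b23] (hypothetical helper, not the post's actual code).
    """
    # Design matrix columns: constant term, age-24 yards, age-23 yards.
    X = [[1.0, float(a24), float(a23)] for _, a23, a24, _ in rows]
    y = [float(rest) for *_, rest in rows]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(3)]
    return solve_linear(xtx, xty)


b0, b24, b23 = fit_rest_of_career(SAMPLE)
print(f"rest ~= {b0:.0f} + {b24:.2f}*(age24yards) + {b23:.2f}*(age23yards)")
```

With only four players and three coefficients the fit is nearly forced through the points, which is exactly why the real thing needs the hundred-or-so guys.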
Plugging Clinton Portis' 1516 age 24 yards and 1315 age 23 yards into that formula gives an estimate of 6202 yards for the remainder of his career.
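If you want to check the arithmetic, here it is as a couple of lines of Python. The coefficients and Portis' yardage are straight from the post; the function name is just something made up for the example.

```python
def rest_of_career_estimate(age24_yards, age23_yards):
    """Apply the fitted formula from the post (function name is illustrative)."""
    return -943 + 2.64 * age24_yards + 2.39 * age23_yards


portis = rest_of_career_estimate(age24_yards=1516, age23_yards=1315)
print(round(portis))  # 6202
```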
That tells us that we expect Portis to gain about 6202 more yards in the rest of his career. But of course we're not saying he'll end up with exactly that. What we're saying is that we don't know, but our best guess is that it'll be somewhere in the neighborhood of 6202. But how big is that neighborhood? Obviously there is some chance of him exceeding that by a thousand yards. There is some chance of him exceeding that by 5000 yards. How big are those chances? To answer these questions in a mathematically justifiable way is beyond the scope of this post, but we can get pretty close with the data and our intuition.
Of the 106 running backs that make up this data set, 20 (about 19%) at least doubled the rest-of-career rushing yards estimate the formula gave them. Doubling his estimated rest-of-career rushing yards is almost exactly what Portis needs to do to break Emmitt Smith's record. So this calculation suggests that Portis has about a 19% chance of retiring as the rushing king. That's pretty close to the original Favorite Toy estimate and generally agrees with my gut feeling.
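The counting step is simple enough to sketch. Given each back's formula estimate and his actual rest-of-career yards, count how many beat the estimate by a given multiple. The function name and the (estimate, actual) pair layout are assumptions for illustration, and since the full data set isn't reproduced here, the only real numbers used are the 20-of-106 count quoted above.

```python
def share_beating_estimate(pairs, multiple=2.0):
    """Fraction of players whose actual yards reached `multiple` times the estimate.

    pairs: iterable of (estimated_rest_yards, actual_rest_yards).
    Note: the formula can give an estimate at or below zero for low-yardage
    backs, and such a player trivially "beats" it. How those cases were
    handled is not stated in the post; this sketch does not treat them specially.
    """
    pairs = list(pairs)
    hits = sum(1 for estimate, actual in pairs if actual >= multiple * estimate)
    return hits / len(pairs)


# The post's own count: 20 of 106 backs at least doubled their estimate.
print(f"{20 / 106:.1%}")  # 18.9%
```

The same function with multiple=1.0 would tell you how often backs simply met their estimate, which is one quick way to get a feel for how big that "neighborhood" around 6202 is.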
Neat, huh? If I get some time, I'll run this for some other players.