Posted by Doug on June 26, 2006
The general question we want to answer here is: assuming age and talent are equal, does previous workload help us predict future career length?
There is a mathematical technique called regression whose exact purpose is to answer questions like this. Suppose Factors A, B, and C play a role in determining Quantity D. Assuming you've got enough past data and assuming certain technical conditions are met, regression will give you a formula that tells you how to take a known A, B, and C and use them to predict the value of Quantity D.
And that's exactly what we want to do. We want a formula that will predict the future career length of a back given his level of quality and his previous workload. The formula we get will tell us how important previous workload is (if at all).
The big problem here is that we can't just input each running back's "level of quality" into the formula. We have to decide on how to measure this. I'm going to use career-to-date VBD value as my measure of quality. While not perfect, I believe it does a pretty decent job of giving us a rough estimate of a running back's quality.
So I took all running back seasons since 1978 by running backs age 27 or older, and I recorded the following data:
- His VBD value for that year
- His career VBD prior to that year
- His career workload prior to that year
- His age
- The number of career rushes he had after that season
I plugged all that data into the computer and it spit out the following formula:
Future rushes =~ 3203 - 104*age + 2.3*VBDLastYr + .813*PreviousVBD - .13*PreviousRsh
For the purposes of this discussion, the key number is the -.13. It says: all else equal, every rushing attempt you had before last year will cost you .13 predicted future rushes. So if two backs are completely equal in every way, but one of them had an extra 500 rushes when he was young, you would expect the player with the higher workload to have 500*.13 = 65 fewer rushes during the rest of his career. The 104 next to "age" indicates that, all else equal, a player who is one year older will expect to have 104 fewer carries left in the tank. Combining these two numbers, we could infer that it would take about 800 previous rushes to age a back as much as one chronological year does.
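To make that arithmetic concrete, here is a minimal Python sketch of the formula. The coefficients are copied from the formula above. The function name, the argument names, and the two example backs are made up for illustration, so only the difference between the two projections means anything.

```python
# Coefficients copied from the formula in the post; the function and
# argument names are hypothetical, chosen just for this illustration.
def projected_future_rushes(age, vbd_last_year, previous_vbd, previous_rushes):
    return (3203
            - 104 * age
            + 2.3 * vbd_last_year
            + 0.813 * previous_vbd
            - 0.13 * previous_rushes)

# Two invented backs, identical except for 500 extra early-career rushes.
light = projected_future_rushes(28, 100, 400, 1200)
heavy = projected_future_rushes(28, 100, 400, 1700)
workload_cost = light - heavy              # 500 * .13

# How many previous rushes cost as much as one extra year of age.
rushes_per_year_of_age = 104 / 0.13

print(round(workload_cost), round(rushes_per_year_of_age))
```

Run as written, this should print 65 and 800, matching the two numbers in the paragraph above.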
Just for grins, let's see what this formula predicts for some of today's backs. The formula was created using data from backs who had completed their age 27 season, had at least 100 rushes the previous season, and at least 400 rushes prior to that, so we should only apply it to players meeting those conditions. Here they are:
Player            Age   Projected future rushes
Shaun Alexander    29   973
Edgerrin James     28   946
Tiki Barber        31   636
Thomas Jones       28   624
Ricky Williams     29   564
Fred Taylor        30   486
Michael Bennett    28   467
Marcel Shipp       28   466
Warrick Dunn       31   350
Priest Holmes      33   318
Curtis Martin      33   291
Corey Dillon       32   217
Stephen Davis      32   140
Mike Anderson      33   105
You might think that Alexander's projection of 973 future rushing attempts seems a little low, and you might think Edgerrin James' 946 seems even lower. But remember that this isn't supposed to be interpreted as the most likely outcome. Rather, it's an expected value, or a weighted average. The formula is not saying, "I project Shaun Alexander to have 973 more rushes in his career." It's saying something closer to, "there is some chance that Alexander will suffer a catastrophic injury early next year and never play again, there is some chance that he will lose effectiveness and only play for two more unimpressive seasons, there is some chance that he will play five more seasons, and there is some chance that he will play eight more seasons and shatter Emmitt Smith's rushing record. When I average these possible outcomes together, taking into account my best guess at the probabilities of each, I get 973 future rushes."
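A toy version of that weighted average, in Python, might look like the sketch below. Every number in it is invented: the regression does not produce scenarios or probabilities, and these were picked only so that the average lands on 973. The point is that the average need not match the single most likely outcome.

```python
# Made-up scenarios for illustration only. Neither the outcomes nor the
# chances come from the regression; they are chosen so the weighted
# average works out to 973.
scenarios = [
    # (description, chance in percent, future rushes)
    ("catastrophic injury early next year", 10, 0),
    ("two more unimpressive seasons", 45, 500),
    ("five more seasons", 35, 1400),
    ("eight more seasons", 10, 2580),
]

total_percent = sum(pct for _, pct, _ in scenarios)

# Expected value: each outcome weighted by its chance of happening.
expected_rushes = sum(pct * rushes for _, pct, rushes in scenarios) / total_percent

# The single most likely scenario is a different thing entirely.
most_likely = max(scenarios, key=lambda s: s[1])

print(expected_rushes)    # 973.0
print(most_likely[2])     # 500
```

With these invented inputs, the most likely single outcome is 500 rushes even though the expected value is 973.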
In some ways, the formula seems smart. Even though Thomas Jones is three years younger than Tiki Barber, the formula "recognizes" that Barber has a much longer history of excellence than Jones does, and so it projects Barber to get more future carries. Of course, the formula doesn't really recognize anything; it doesn't know Thomas Jones from a hole in the ground (or even from a binary string of 1s and 0s that represents a hole in the ground). All it's doing is attempting to predict the future in the way that best mimics the past. The past data we fed into the computer said that, in general, players who didn't accumulate much value earlier in their career --- like Thomas Jones --- don't have careers as long as those who did (like Barber).
The formula estimates that Tiki Barber has 636 carries left in him right now. It's instructive to look at what Tiki's projection will look like at the beginning of next year. If he gets hurt, let's say after 130 carries and zero VBD, then this time next year the formula will project that he is essentially finished: about 70 carries left. If, on the other hand, he has a year just like 2005, then the formula will project him to have about 500 more carries remaining.
No matter how old you are (within reason), as long as you were productive in your most recent season, the formula thinks you've got something left. But if you're on the north side of 30 and have a bad season, it will turn on you in a hurry. Since the formula was generated in such a way as to best fit the past data, the lesson is clear: age isn't much of a problem --- and neither is workload --- if you're productive. But once you start sliding, it's hard to put the brakes on.
Unfortunately, what I just said amounts to: old-but-productive running backs will continue to be productive right up until the point that they cease being productive. Genius.
But we've gotten off track. This post was supposed to be about age vs. workload, and for the first time we can actually put a number on it. The number is .13. That's how many future rushes each past rush costs you.
Let's talk a bit about that number and the uncertainty associated with it. Regression answers two basic questions:
- What is our best guess at the number?
- Given the sample size and the amount of variation we saw in our input data, how sure are we that the number isn't zero?
We answered the first question above: it's .13. I didn't tell you, though, that the answer to the second question is "not very." [For regression buffs, the p-value is about .22.] The point is: even though we have an estimate of .13, we do not have statistically significant evidence, in the generally agreed-upon sense, that workload has any effect on future career length.
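One rough way to see what a p-value of about .22 means is to back out the standard error it implies and build an interval around the estimate. The Python sketch below does that under a loud assumption: that the test statistic is approximately standard normal, which is a large-sample shortcut and not something the regression output above confirms. The only inputs are the coefficient and the p-value quoted above.

```python
from statistics import NormalDist

estimate = -0.13    # coefficient on previous rushes, from the formula
p_value = 0.22      # two-sided p-value quoted above ("about .22")

# Assumption: the test statistic is roughly standard normal.
# A two-sided p-value of .22 then corresponds to this many standard errors.
z = NormalDist().inv_cdf(1 - p_value / 2)
std_error = abs(estimate) / z

# Approximate 95% interval around the estimate.
z95 = NormalDist().inv_cdf(0.975)
lower = estimate - z95 * std_error
upper = estimate + z95 * std_error

print(f"implied standard error: {std_error:.3f}")
print(f"rough 95% interval: {lower:.2f} to {upper:.2f}")
```

Under that assumption the interval runs from roughly -.34 to +.08. It straddles zero, which is another way of saying the data can't rule out workload having no effect at all.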
Postscript: applied regression is *##$!!*!***##! pretty tricky stuff. I had run this regression earlier and gotten different results. Quite a bit different. But then I realized that my data might be afflicted with a dread disease known as serial correlation, which is but one of many illnesses that can mess with your regression results. Most of these diseases have cures which can be administered simply by typing a few keystrokes into your regression software, but first you've got to recognize the illness.
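For readers who haven't met serial correlation: it means the regression's errors are related to one another from one observation to the next instead of being independent. One standard diagnostic is the Durbin-Watson statistic, sketched below in Python. This is only an illustration of the idea. The post doesn't say which test or cure was actually used, and the residuals here are invented.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squares.

    Values near 2 suggest little serial correlation, values near 0
    suggest positive correlation, and values near 4 suggest negative.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Invented residuals, purely for illustration.
sticky = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0]   # errors drift slowly
choppy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # errors flip sign every step

print(round(durbin_watson(sticky), 3))
print(durbin_watson(choppy))
```

The slowly drifting residuals score close to 0 and the sign-flipping ones score close to 4, the two ends of the scale.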
As a mathematician, I understand these things on some theoretical level, but I sometimes have a hard time seeing them in practice and I have very little experience correcting them. Fortunately, I have a friend who is an economist, and economists are experts at diagnosing these sorts of problems.
The moral of the story: unless you know what you're doing --- or have a friend who does --- be very careful with regression.