## Running back deterioration III

Posted by Doug on June 26, 2006

For reference, Running back deterioration I and Running back deterioration II.

The general question we want to answer here is: assuming age and talent are equal, does previous workload help us predict future career length?

There is a mathematical technique called regression whose exact purpose is to answer questions like this. Suppose Factors A, B, and C play a role in determining Quantity D. Assuming you've got enough past data and assuming certain technical conditions are met, regression will give you a formula that tells you how to take a known A, B, and C and use them to predict the value of Quantity D.

And that's exactly what we want to do. We want a formula that will predict the future career length of a back given his level of quality and his previous workload. The formula we get will tell us how important previous workload is (if at all).
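To make the idea concrete, here is a minimal sketch of what a regression does. It is not the software or the data behind this post: the data are synthetic, generated from a made-up rule (D = 10 + 2A - 3B + 0.5C), and `fit_least_squares` is a hypothetical helper that solves the ordinary least-squares normal equations by hand. Because the fake data contain no noise, the fit should recover the made-up coefficients almost exactly; real data never behave that nicely.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    # Prepend a constant 1.0 to every row so the fit includes an intercept.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic, noise-free data from a made-up rule: D = 10 + 2*A - 3*B + 0.5*C.
true_coefs = [10.0, 2.0, -3.0, 0.5]
factors = [(1, 2, 3), (2, 1, 0), (0, 3, 1), (4, 0, 2), (3, 3, 3), (1, 0, 5)]
quantity_d = [true_coefs[0] + 2.0 * a - 3.0 * b + 0.5 * c for a, b, c in factors]

# Intercept first, then one coefficient per factor.
coefs = fit_least_squares(factors, quantity_d)
print([round(c, 6) for c in coefs])
```

The returned list is the "formula": an intercept followed by one multiplier for each of Factors A, B, and C.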

The big problem here is that we can't just input each running back's "level of quality" into the formula. We have to decide on how to measure this. I'm going to use career-to-date VBD value as my measure of quality. While not perfect, I believe it does a pretty decent job of giving us a rough estimate of a running back's quality.

So I took all running back seasons since 1978 by running backs age 27 or older, and I recorded the following data:

- His VBD value for that year

- His career VBD prior to that year

- His career workload prior to that year

- His age

- The number of career rushes he had after that season

I plugged all that data into the computer and it spit out the following formula:

**Future rushes =~ 3203 - 104*age + 2.3*VBDLastYr + .813*PreviousVBD - .13*PreviousRsh**

For the purposes of this discussion, the key number is the -.13. It says: all else equal, every rushing attempt you had before last year will cost you .13 predicted future rushes. So if two backs are completely equal in every way, but one of them had an extra 500 rushes when he was young, you would expect the player with the higher workload to have 500*.13 = 65 fewer rushes during the rest of his career. The 104 next to "age" indicates that, all else equal, a player who is one year older will expect to have 104 fewer carries left in the tank. Combining these two numbers, we could infer that it would take about 800 previous rushes to age a back as much as one chronological year does.
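Here is that arithmetic as a short Python sketch. The coefficients are the ones in the formula above; the function name `projected_future_rushes` and the baseline back's inputs are made up purely for illustration.

```python
def projected_future_rushes(age, vbd_last_year, previous_vbd, previous_rushes):
    """Expected remaining career rushes, using the coefficients from the post."""
    return (3203
            - 104 * age
            + 2.3 * vbd_last_year
            + 0.813 * previous_vbd
            - 0.13 * previous_rushes)


# Hypothetical baseline back; these inputs are invented, not real player data.
baseline = dict(age=29, vbd_last_year=100, previous_vbd=500, previous_rushes=1500)

base = projected_future_rushes(**baseline)
heavier = projected_future_rushes(**{**baseline, "previous_rushes": 2000})
older = projected_future_rushes(**{**baseline, "age": 30})

workload_cost = base - heavier   # cost of 500 extra prior rushes
age_cost = base - older          # cost of one extra year of age
rushes_per_year = 104 / 0.13     # prior rushes that "age" a back by one year

print(round(workload_cost), round(age_cost), round(rushes_per_year))
# prints: 65 104 800
```

Because the formula is linear, the 65 and the 104 come out the same no matter which baseline back you start from.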

Just for grins, let's see what this formula predicts for some of today's backs. The formula was created using data from backs who had completed their age 27 season, had at least 100 rushes the previous season, and at least 400 rushes prior to that, so we should only apply it to players meeting those conditions. Here they are:

| Player | Age | Projected future rushes |
|---|---|---|
| Shaun Alexander | 29 | 973 |
| Edgerrin James | 28 | 946 |
| Tiki Barber | 31 | 636 |
| Thomas Jones | 28 | 624 |
| Ricky Williams | 29 | 564 |
| Fred Taylor | 30 | 486 |
| Michael Bennett | 28 | 467 |
| Marcel Shipp | 28 | 466 |
| Warrick Dunn | 31 | 350 |
| Priest Holmes | 33 | 318 |
| Curtis Martin | 33 | 291 |
| Corey Dillon | 32 | 217 |
| Stephen Davis | 32 | 140 |
| Mike Anderson | 33 | 105 |

You might think that Alexander's projection of 973 future rushing attempts seems a little low, and you might think Edgerrin James' 946 seems even lower. But remember that this isn't supposed to be interpreted as the most likely outcome. Rather, it's an expected value, or a weighted average. The formula is not saying, "I project Shaun Alexander to have 973 more rushes in his career." It's saying something closer to, "there is some chance that Alexander will suffer a catastrophic injury early next year and never play again, there is some chance that he will lose effectiveness and only play for two more unimpressive seasons, there is some chance that he will play five more seasons, and there is some chance that he will play eight more seasons and shatter Emmitt Smith's rushing record. When I average these possible outcomes together, taking into account my best guess at the probabilities of each, I get 973 future rushes."
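A toy calculation shows how an expected value can sit well away from the single most likely outcome. The scenarios and probabilities below are invented for illustration only; they are not estimates for Alexander or anyone else, and they are not something the regression produces.

```python
# Hypothetical career scenarios as (probability, future rushes). Made-up numbers.
scenarios = [
    (0.10, 0),     # catastrophic injury, never plays again
    (0.35, 500),   # two more unimpressive seasons
    (0.40, 1300),  # about five more seasons
    (0.15, 2200),  # eight more seasons
]

# The probabilities must cover every outcome exactly once.
assert abs(sum(p for p, _ in scenarios) - 1.0) < 1e-9

# Expected value: each outcome weighted by its probability.
expected_rushes = sum(p * rushes for p, rushes in scenarios)

# Most likely single outcome: the scenario with the highest probability.
most_likely_rushes = max(scenarios, key=lambda s: s[0])[1]

print(round(expected_rushes), most_likely_rushes)
# prints: 1025 1300
```

In this made-up example the weighted average (1025) is lower than the most likely outcome (1300), because the small chance of an early exit drags the average down.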

In some ways, the formula seems smart. Even though Thomas Jones is three years younger than Tiki Barber, the formula "recognizes" that Barber has a much longer history of excellence than Jones does, and so it projects Barber to get more future carries. Of course, the formula doesn't really recognize anything; it doesn't know Thomas Jones from a hole in the ground (or even from a binary string of 1s and 0s that represents a hole in the ground). All it's doing is attempting to predict the future in the way that best mimics the past. The past data we fed into the computer said that, in general, players who didn't accumulate much value earlier in their career --- like Thomas Jones --- don't have careers as long as those who did (like Barber).

The formula estimates that Tiki Barber has 636 carries left in him right now. It's instructive to look at what Tiki's projection will look like at the beginning of *next* year. If he gets hurt, let's say after 130 carries and zero VBD, then this time next year the formula will project that he is essentially finished: about 70 carries left. If, on the other hand, he has a year just like 2005, then the formula will project him to have about 500 more carries remaining.

No matter how old you are (within reason), as long as you were productive in your most recent season, the formula thinks you've got something left. But if you're on the north side of 30 and have a bad season, it will turn on you in a hurry. Since the formula was generated in such a way as to best fit the past data, the lesson is clear: age isn't much of a problem --- and neither is workload --- if you're productive. But once you start sliding, it's hard to put the brakes on.

Unfortunately, what I just said amounts to: old-but-productive running backs will continue to be productive right up until the point that they cease being productive. Genius.

But we've gotten off track. This post was supposed to be about age vs. workload and for the first time we can actually put a number on it. The number is .13. That's how many future rushes each past rush costs you.

Let's talk a bit about that number and the uncertainty associated with it. Regression answers two basic questions:

- What is our best guess at the number?
- Given the sample size and the amount of variation we saw in our input data, how sure are we that the number isn't zero?

We answered #1 above. It's .13. I didn't tell you, though, that the answer to #2 is "not very." [For regression buffs, the p-value is about .22.] The point is: even though we have an estimate of .13, we do *not* have statistically significant evidence, in the generally agreed-upon sense, that workload has any effect on future career length.
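For a rough feel of what a p-value of about .22 implies, here is a back-of-the-envelope sketch. It assumes a two-sided test and a sample large enough that the t statistic is approximately normal. Both are my assumptions for this illustration, not numbers read off the regression output, so treat the results as ballpark figures only.

```python
from statistics import NormalDist

coef = -0.13     # estimated effect of one prior rush (from the formula above)
p_value = 0.22   # approximate p-value quoted above, assumed two-sided

# Assumption: large sample, so the t statistic is roughly standard normal.
z = NormalDist().inv_cdf(1 - p_value / 2)   # implied |t| statistic
std_err = abs(coef) / z                     # implied standard error

# Approximate 95% interval for the workload coefficient.
lower = coef - 1.96 * std_err
upper = coef + 1.96 * std_err

print(f"|t| ~ {z:.2f}, SE ~ {std_err:.3f}, 95% interval ~ ({lower:.2f}, {upper:.2f})")
```

Under those assumptions the interval runs from roughly -0.34 to 0.08. It includes zero, which is another way of saying the data can't rule out "no workload effect at all."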

**Postscript:** applied regression is ~~*##$!!*!***##!~~ pretty tricky stuff. I had run this regression earlier and gotten different results. Quite a bit different. But then I realized that my data might be afflicted with a dread disease known as *serial correlation*, which is but one of many illnesses that can mess with your regression results. Most of these diseases have cures which can be administered simply by typing a few keystrokes into your regression software, but first you've got to recognize the illness.

As a mathematician, I understand these things on some theoretical level, but I sometimes have a hard time seeing them in practice and I have very little experience correcting them. Fortunately, I have a friend who is an economist, and economists are experts at diagnosing these sorts of problems.

The moral of the story: unless you know what you're doing --- or have a friend who does --- be very careful with regression.

Don't try this at home, kids. Regression is dangerous stuff.

Yeah, but hey, at least we've learned something exciting: there doesn't seem to be any statistically significant evidence that running backs can't be productive late in their careers, which is very good news for Tiki Barber, among others. Nice work, Doug.

Well it looks like I've still got 915 carries in me.

I understand why you used the 100-carry cutoff for compiling the formula, so as not to skew the data with low-carry guys who may have gotten injured early in the season. However, I'd be interested in knowing if there was a way to "tweak" the formula to get a feel for how guys coming off limited-carry years will fare.

Specifically, I was thinking of the rebound potential of A. Green and D. McAllister.

I'd like to point out to all the jerks in my league who won't give me anything for Fred Taylor that this shows he's still a badass. Can you print color graphics of this for me? I need 11 copies. I also have Mike Anderson, therefore I think this study is totally wrong and needs to be re-worked dramatically. Good Luck.

How many more carries does W. McGahee have left? At least 6,000 I hope 🙂 How many more Super Bowls?

By the way, I seem to have developed a reputation as being a Bills fan on this site.... nothing could be further from the truth!

Chase's comment provides a crystal clear example of why it's not appropriate to apply the formula to anyone who doesn't have 400 previous career rushing attempts.

MattyP, Green and McAllister are indeed interesting cases. I'll look at them in a future post (but not using the regression).

Ouch, I'm a -229 going on -333. I am actually in carry debt hell. Just like the ol' credit cards, but this time it's Doug's fault, not my wife's.

So the career VBD, is that just the sum of the VBD each year?

I must be doing something wrong, because I'm getting like 500 carries left for Marshall Faulk. If Priest Holmes is down to 318 after one bad year, Faulk must be basically at nothing after two bad years.

Interesting Zac. Based on my calculations of Age (33), VBDLast Year (0), Previous VBD (1164) and Previous Rushes (2836) I've got Faulk with 348 carries left in his career. I'm doing something wrong though, because I think we're supposed to input the number of carries from the previous season. I would imagine we also want his career carries as well, but from Doug's formula I only see one variable containing rushes. Maybe Doug made an error when he transposed the data onto here, because he lists 5 variables at the top but there are only four variables in the formula.

The fifth variable is the dependent variable. OUCH!!! Oh crap, I just hurt myself messing around with regression again.

'Future carries' is the dependent variable. After re-reading, it looks like "last year's carries" doesn't seem to be one of the factors. That still gives Marshall Faulk 348 future carries. Now I see why Doug's caveat of needing at least 100 carries in the previous season was important. Since Faulk didn't have 100 carries last year, but the system doesn't know that, it will overrate him.

"'Future carries' is the dependent variable." No argument here. So is the fifth variable.
