## Drinen rambles about something having to do with: Anthony Thomas

If you're old enough to have started thinking about saving for retirement, then you've certainly been lectured about how compound interest works. And you are probably aware that someone who starts saving for retirement at age 25 will have a lot more money at 60 than someone who starts at 35, even though he only put in a little more.

Well, one of the most important things that Bill James discovered is that the same principle applies to the development of baseball players (well, just hitters, actually). If two rookie hitters post identical numbers, but one of them is 21 and the other is 24, then the 21-year-old is likely to end his career with much, much, much better numbers. There are three reasons for this:

1. The young guy, obviously, has three extra years to pile up numbers.
2. The younger guy is simply a better talent. If he's able to do at 21 what the other guy couldn't do until he was 24, he's probably just better.
3. Assuming they both peak at age 27, the young guy has 6 years of improvement left, whereas the older guy only has 3. It's as though his talent has more time to accrue interest.

Anthony Thomas was 24 last year, which is pretty old for a rookie. He's only six months younger than Ricky Williams, for instance, and he's almost a year older than Edgerrin James. This caused me to wonder if debut age is as strong a determining factor for football players as it is for baseball players. My guess before running the study was probably not. In particular, reason #2 above doesn't really apply to football, since football players don't move through a minor league system at variable paces. Reason #3 is questionable as well, particularly for RBs, who tend to arrive in nearly finished form.

But I've been wrong before, so I decided to run the study. I ran a regression with two input variables and one output variable:

```
INPUT VARIABLE #1:  the player's age during his rookie year.
INPUT VARIABLE #2:  the player's "VBD" during his rookie year.

Y VARIABLE:  the player's future career VBD (not counting his rookie year)
```

Technical notes:

• Included in the study were all running backs who debuted in 1970 or later, retired before 2000, and finished above the baseline in at least one season of their careers.
• "Age" is defined to be the player's age on December 31 of the given year.
• For input variable #2 (rookie year VBD), I allowed negative VBD values because I wanted to distinguish between players who were way below the baseline and those who were just barely below the baseline. For the Y variable (career VBD), all seasons below the baseline were counted as zero.
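To make those definitions concrete, here is a minimal sketch of how the age and the Y variable could be computed from a player's record. The function names, the birth date, and the season-by-season VBD numbers are all made up for illustration; they are not from the study's data.

```python
from datetime import date
from typing import Sequence


def age_on_dec_31(birth_date: date, season: int) -> int:
    """Age in whole years on December 31 of the given season."""
    # By December 31 the birthday for that calendar year has already
    # happened, so the age is just the difference between the years.
    return season - birth_date.year


def future_career_vbd(season_vbd: Sequence[float]) -> float:
    """Career VBD after the rookie year, with below-baseline seasons counted as zero."""
    post_rookie = season_vbd[1:]  # the first entry is the rookie season
    return sum(max(0.0, v) for v in post_rookie)


# Hypothetical player: born in mid-1977, rookie season 2001.
rookie_age = age_on_dec_31(date(1977, 6, 15), 2001)

# Hypothetical VBD by season, rookie year first. The rookie value is
# allowed to stay negative when it is used as input variable #2.
seasons = [-12.0, 40.0, -5.0, 75.0]
rookie_vbd = seasons[0]
future_vbd = future_career_vbd(seasons)

print(rookie_age, rookie_vbd, future_vbd)
```

Note the asymmetry: the negative rookie season is kept as-is on the input side, while the negative later season is floored at zero on the output side.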

So here are the results:

```
FUTURE VBD   =   897  +  .95*("VBD" in rookie year) - 32*(age in rookie year)

R^2 = .16

```

First, note that the R^2 is fairly low, which is to be expected. If you're trying to predict an RB's career value based solely on his rookie year and his debut age, you're going to be substantially off in many cases. But the coefficient in front of age is negative, which says that the younger a rookie is, the more career value he is likely to have. Based on our best guess from the data we have, if two rookie RBs post identical numbers, the younger one will likely have a better career. How much better? About 32 VBD points for each year of age difference.
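As a quick worked example, here is the formula applied to two hypothetical rookies with identical production, one 21 and one 24. The rookie VBD of 50 is an arbitrary number picked for illustration, and the coefficients are the rounded ones reported above.

```python
def predicted_future_vbd(rookie_vbd: float, rookie_age: int) -> float:
    """Plug a rookie season into the fitted formula (rounded coefficients)."""
    return 897 + 0.95 * rookie_vbd - 32 * rookie_age


# Two hypothetical rookies with the same rookie-year VBD.
younger = predicted_future_vbd(rookie_vbd=50, rookie_age=21)
older = predicted_future_vbd(rookie_vbd=50, rookie_age=24)

# The rookie VBD term cancels, so the gap depends only on the age difference.
gap = younger - older

print(f"21-year-old: {younger:.1f}")
print(f"24-year-old: {older:.1f}")
print(f"gap: {gap:.1f}")
```

The gap works out to 3 years times 32 points, or 96 VBD points, no matter what rookie VBD you plug in. Given how low the R^2 is, treat that as a rough tilt and not a forecast.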

Frankly, this is a pretty weak model. I think it would only be useful if you were choosing between two rookies who you thought were truly equal otherwise. That never happens, of course. I'm not sold on Thomas at all as a long term prospect, and this certainly doesn't change my mind. If I did like Thomas, though, it wouldn't have changed my mind either.

### What in the world does "ran a regression" mean?

Very simply put, regression is a technique for taking a set of data and obtaining the (linear) formula that "best" fits that data. The actual procedure requires a computer, and is way too complicated to go into here. In this case, our data is a list of the rookie year performance, the rookie year age, and the eventual career VBD of as many RBs as we could get our hands on. Given this info, the computer is able to make a best guess as to how the first two variables (the inputs) relate to the third one (the output, which is what we want to know). R^2 is a measure of how well the formula actually fits the data. R^2 is always between 0 (which means the formula doesn't fit at all) and 1 (which means the formula fits perfectly).
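For the curious, here is a rough sketch of what "the computer" does, using NumPy's least-squares solver. I'm assuming ordinary least squares here, which is the standard meaning of "regression" in this context. The eight rows of data are invented purely for illustration and are NOT the study's data, so the coefficients this prints will not match the formula above.

```python
import numpy as np

# Made-up illustrative numbers, NOT the data from the study.
rookie_vbd = np.array([-20.0, 5.0, 60.0, 110.0, 0.0, 35.0, 80.0, -10.0])
rookie_age = np.array([24.0, 23.0, 22.0, 21.0, 25.0, 23.0, 22.0, 24.0])
future_vbd = np.array([0.0, 40.0, 310.0, 520.0, 15.0, 90.0, 150.0, 30.0])

# Design matrix: a column of ones for the constant term, then the two inputs.
X = np.column_stack([np.ones_like(rookie_vbd), rookie_vbd, rookie_age])

# Least squares picks the coefficients that minimize the sum of squared
# differences between the predicted and actual future VBD.
coefs, *_ = np.linalg.lstsq(X, future_vbd, rcond=None)
intercept, b_vbd, b_age = coefs

# R^2: the share of the variation in future VBD that the formula explains.
predicted = X @ coefs
ss_res = np.sum((future_vbd - predicted) ** 2)
ss_tot = np.sum((future_vbd - future_vbd.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"FUTURE VBD = {intercept:.0f} {b_vbd:+.2f}*(rookie VBD) {b_age:+.0f}*(rookie age)")
print(f"R^2 = {r_squared:.2f}")
```

With only a handful of invented rows the fit will look far tighter than the real thing; the study's R^2 of .16 is what you get when the formula has to cope with a few decades of actual careers.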