## What’s a draft pick worth? (part III)

Posted by Doug on April 4, 2007

[NOTE: this post was edited to correct an error a couple of days after its original posting. The error is explained here.]

This is a continuation of last Wednesday's and Friday's posts about Cade Massey and Richard Thaler's study of the NFL draft.

They claim to have shown that having the first draft slot is a disadvantage. I claim they've shown that all draft slots in the first three rounds are essentially of equal value. That's not because first round players aren't better than third round players --- they are, at least on average --- it's because first round, second round, and third round players will, on average, outperform their contract by roughly equal amounts. In a salary capped world, that makes those picks equally valuable.

But that's all theoretical. What I want to know is, does it translate to wins and losses? If, as Massey and Thaler claim, high first round picks are a liability compared to later first round picks and second round picks, then teams with lots of valuable (according to their study) picks should improve more quickly than similar teams without such picks.

This is a sticky question because so many factors are involved. For example, consider the following table, which consists of actual data:

**All teams with 3 or fewer wins in Year N-1**

| Avg wins in Year: | N | N+1 | N+2 | TOT |
|---|---|---|---|---|
| Low M-T draft value in Year N | 6.0 | 8.0 | 7.0 | 21.0 |
| High M-T draft value in Year N | 6.0 | 8.5 | 7.6 | 22.1 |

"High" and "low" are defined by "above the median" and "below the median." In other words, I looked at all teams that won 3 or fewer games in Year N-1 and sorted them by M-T value in the Year N draft. I cut the list in half and called the top half the High M-T Value group and the bottom half the Low Value group.

This would seem to indicate that M-T value does play some small role in team improvements during the three year period. But the problem is if you do the analogous breakdown by NFL pick value chart value, you'll get similar results. That's to be expected, because chart value and M-T value are positively associated. Teams that have a lot of picks generally have a lot of both kinds of value. In order to put the Massey-Thaler theory to the test, we need to separate the two.
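The median split behind the table above can be sketched in a few lines of Python. The rows here are invented placeholders, not the data behind the table, and the function name and tuple layout are hypothetical.

```python
from statistics import mean

# Hypothetical rows: (team-season label, M-T draft value in Year N, wins in Year N).
# These numbers are invented for illustration; they are NOT the post's data.
teams = [
    ("A", 3.9, 7),
    ("B", 2.1, 5),
    ("C", 3.2, 8),
    ("D", 1.8, 6),
    ("E", 2.7, 4),
    ("F", 3.5, 6),
]

def median_split(rows, key_index=1):
    """Sort rows by the value at key_index and cut the list in half.

    Returns (low_half, high_half). With an odd number of rows the
    middle row lands in the high half, which is an arbitrary choice.
    """
    ordered = sorted(rows, key=lambda row: row[key_index])
    middle = len(ordered) // 2
    return ordered[:middle], ordered[middle:]

low, high = median_split(teams)
avg_wins_low = mean(row[2] for row in low)
avg_wins_high = mean(row[2] for row in high)
print(f"Low M-T value group:  {avg_wins_low:.1f} wins on average")
print(f"High M-T value group: {avg_wins_high:.1f} wins on average")
```

Swapping the sort key to chart value would give the analogous chart-value breakdown, which is where the two measures get tangled up.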

The only way I know to do that is with regression. As I've mentioned before, regression is an unbelievably complex subject. Like most people, I don't fully understand all its intricacies. Also like most people, I'm going to go ahead and use it anyway. The only difference is I'm going to warn you that I'm not completely sure if what I've done is OK. So take it for what it's worth.

I ran a regression with the following input variables:

- Team record in year N-1
- Total Massey-Thaler draft value in the Year N draft (described in Friday's post), in millions of dollars of surplus value
- Total NFL draft pick value chart value in the Year N draft (also described in the above-linked post) in their usual units. The first pick is 3000, the 2nd pick is 2600, and so on down to the last pick, which is worth essentially zero.

The output is team record in Year N. Here are the results:

**Wins in Yr N =~ 5.2 + .32*(Wins in Yr N-1) + .000040*(NFL pick value) + .056*(M-T value)**

The coefficients on the last two inputs were not significant and were not anywhere close to being significant.

Alright, so maybe it takes more than a year for the value of these draft picks to materialize. Here is another regression:

**Wins in Yr N PLUS Yr N+1 =~ 11.4 + .53*(Wins in Yr N-1) + .00029*(NFL pick value) - .017*(M-T value)**

Again neither of the draft-related coefficients is significant.

When you look at teams' records over the next *three* years, you get similar results:

**Wins in Yr N PLUS Yr N+1 PLUS Yr N+2 =~ 17.8 + .65*(Wins in Yr N-1) + .000025*(NFL pick value) + .32*(M-T value)**

Again, neither coefficient is significant. Even if they were statistically significant, they're small enough that it's clear that they have no practical significance, as the following thought experiment shows.

Suppose the Raiders swapped their #1 overall pick straight up for the Colts' #32 overall pick. In giving up #1 and getting back #32, the Raiders gain a net of about .13 million in Massey-Thaler value. Multiplying .13 by the appropriate coefficient (.32) yields about .04 wins. In a three year period.
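For anyone who wants to check the arithmetic, here is a small Python sketch of the three-year equation and the swap. The coefficients and the .13 figure come from the post; the baseline inputs are arbitrary placeholders. Like the thought experiment itself, it moves only the M-T term and ignores the change in chart value. The product .13 × .32 comes to roughly .04 wins over three years, or a bit over .01 per year.

```python
def predicted_three_year_wins(prev_wins, chart_value, mt_value):
    """Fitted three-year equation quoted in the post.

    prev_wins   -- wins in Year N-1
    chart_value -- NFL pick value chart total for the Year N draft
    mt_value    -- Massey-Thaler surplus value, in millions of dollars
    """
    return 17.8 + 0.65 * prev_wins + 0.000025 * chart_value + 0.32 * mt_value

# Change in M-T value from swapping pick #1 for pick #32 (figure from the post).
MT_GAIN_FROM_SWAP = 0.13

# Hold prior record and chart value fixed so only the M-T term moves.
# The baseline inputs (2 wins, 3000 chart points, 3.0 M-T) are arbitrary
# placeholders; the difference does not depend on them because the model is linear.
before = predicted_three_year_wins(2, 3000, 3.0)
after = predicted_three_year_wins(2, 3000, 3.0 + MT_GAIN_FROM_SWAP)

extra_wins_three_years = after - before
extra_wins_per_year = extra_wins_three_years / 3
print(f"{extra_wins_three_years:.3f} wins over three years")
print(f"{extra_wins_per_year:.3f} wins per year")
```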

The R^2 of this regression was about .07 which means that if you want to predict an NFL team's wins during the period 2007--2009 using its 2006 record and the M-T and NFL values of its 2007 draft picks, you're not going to be very successful. But that's pretty obvious.

Assuming the regression is technically OK, these results validate my criticism of the Massey-Thaler paper. Namely, teams do not appear to be able to translate the theoretical surplus value they get from their draft picks into surplus production on the field. That's probably because the difference in theoretical average value between draft picks is so small that it's swamped by other factors. One of those factors, of course, is what the teams actually do with those picks. In other words, it's much, much less important for a team to know that pick #1 is on average less valuable than pick #30 than it is for them to know that Peyton Manning is better than Ryan Leaf (if they're picking #1) or that Chad Johnson is better than Quincy Morgan (if they're picking #30). What Massey and Thaler's paper shows is that the NFL draft is a meritocracy, or maybe a luckocracy, but it's not in its present form a mechanism for promoting parity.

Very cool stuff, Doug. What if we look at just the number of wins in Year N+3? Would that make any difference?

I don't mean to be a dolt, but when I look at the chart I see the difference between Oakland's pick and Indy's as .357. Can someone explain what I'm missing? Here are the two values I'm looking at, maybe I'm looking in the wrong spot:

1 3.723

32 3.366

Deryl, you're not a dolt; I'm just sloppy with language.

3.723 and 3.366 are the total draft value of *all* the Raiders' and Colts' (resp.) picks. The thought experiment in this post was about trading just the first round picks.

Good work, Doug. I agree with your conclusions.

I think there were some beneficial things to come out of the M-T study. For example, the value of the right to choose, and the false consensus, as it relates to teams trading up to take a particular player, versus getting the next player taken at the same position. (My Chiefs trading up two spots to get Ryan Sims, rather than keeping picks and settling for John Henderson or Albert Haynesworth comes to mind). I think their research there should be taken seriously by teams who feel the need to trade up 5 spots to get "their player".

I was guessing that M-T were closer to the truth than the consensus draft value charts. However, I am not confident that the early 2nds are actually more valuable than the early 1sts. I think there are two problems that they are not fully accounting for:

1. QB's. Many early drafted QB's do not start 8+ games right away, and it doesn't necessarily mean they are busts. Most QB's drafted in the first rounds are drafted early, at the point where the M-T curve is lowest in the first round. Are they missing something about how teams value QB play, and how teams are willing to sacrifice year 1 for greater benefits in later years?

2. Their crude categories. They go from 8+ starts, to pro bowl, with nothing in between. Thus, Philip Rivers and Charlie Frye are equals in their methodology. If early first rounders who are 8+ games starters but not pro bowlers tend to be in the upper 25% of their position more than others, then their methodology would undervalue early picks.

They try to address that issue by saying that among the stat skill positions, the production per dollar is higher at the end of the first and early second than in the early first. Fantasy drafters who use VBD should recognize the potential flaw here. Not all yards are created equal. The difference from 800 yards receiving to 1200 yards receiving is worth a heck of a lot more than the difference from 400 to 800. Their arguments do not convince me that they are not undervaluing the higher chance of upper level (just below pro bowl) performance.

That being said, I don't think there is any chance the NFL draft value charts are close to being accurate, and M-T are much closer to the truth.

On your thought experiment, there is no chance a team would actually make that trade, nor would they have to.

As an analogy, if I think Team A has 50% chance to win, I do not have to pay even money if the conventional thought and consensus is that they are a 7 point underdog. I take advantage of that by demanding the 7 points before I take Team A.

A team employing M-T principles could make a trade similar to the NY Giants and San Diego swap from a few years ago, and get the 4th overall, and the 2nd and 3rd rounders, plus a future pick. The net effect here is closer to 1 extra win over a 3 year period, which is not insignificant. And this would be a socially acceptable "fair market" trade.

Even though I don't think M-T is exactly correct, because they are better than the out of whack NFL charts, a team could employ M-T principles and gain a not insignificant advantage until the market corrected itself and recognized the fallacy of the current charts.

Granted. That's why it's a thought experiment. The point is that M and T think the Raiders should make that trade, even if for some reason they couldn't get a better offer.

The funny thing is that the only teams in position to use this information to their advantage are the crummy teams. If Tony Dungy reads the Massey-Thaler paper tonight and buys into it completely, there is really no way he can take advantage of the market inefficiency he has just discovered.

> The funny thing is that the only teams in position to use this information to their advantage are the crummy teams. If Tony Dungy reads the Massey-Thaler paper tonight and buys into it completely, there is really no way he can take advantage of the market inefficiency he has just discovered.

Couldn't he trade pick #32 for pick #43?

I agree with JKL that there are other important findings in the M-T study besides their draft surplus value. The false consensus and overconfidence effects described should encourage teams to think of the available talent in terms of tiers rather than absolute rankings.

BTW, the M-T paper was discussed at the Sabermetric Research blog a few months ago where another flaw was discussed.

One of my favorite parts of the paper is their findings for the high discount rate when trading picks between adjacent years. A pick in round n is worth roughly a pick in next year's round n-1. A team could exploit this by always trading their 7th round pick for a 6th round pick the following year, then the next year trade that 6th rounder for a 5th rounder, etc. Eventually you'd be getting an extra 1st rounder every season!
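As a toy illustration of how that pipeline would fill up, here is a short Python sketch. It assumes a trading partner is always available at exactly the "round n for next year's round n-1" rate, which is a simplification, and the function name is made up for this example.

```python
def extra_first_round_picks(seasons):
    """Simulate always trading a round-n pick for next year's round n-1 pick.

    Returns a list with the number of extra first-round picks the team
    holds in each season (season 1 is the first year of the scheme).
    """
    held = []    # rounds of the acquired picks held this season
    extras = []
    for _ in range(seasons):
        # Acquired first-rounders are used this season rather than traded.
        extras.append(held.count(1))
        # Everything else, plus this year's own 7th, gets traded up a round.
        tradeable = [r for r in held if r > 1] + [7]
        held = [r - 1 for r in tradeable]
    return extras

print(extra_first_round_picks(10))
```

Under those assumptions the first extra first-rounder shows up in the seventh season, and one more arrives every season after that.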

I don't think the regression says very much one way or the other about practical significance. The thought experiment purporting to show the minuscule effects of swapping the #1 and #32 picks is based on the rather small coefficient of the M-T value input, but the small coefficient is the result of statistical insignificance, not (necessarily) practical insignificance.

I'll pause to mention that I don't really know what I'm talking about here. So I welcome corrections wherever my shooting from the hip goes wrong.

Suppose there are five jars (labeled one through five) that each spit out a random number. The first jar spits out a random number between -1000 and +1000 (evenly distributed); the second jar spits out a random number between -999 and +1001, the third between -998 and +1002, and so on.

If you take, say, twenty samples of output from each jar and run a regression analysis trying to predict a jar's output using its jar number as input (finding the correlation between jar number and output), the coefficient for jar number will be pretty small. Because the standard deviation of each jar's output is so huge, it will take a very large sample size before the jar number coefficient becomes statistically significant (such that the regression analysis would "know" that the fifth jar spits out higher numbers than the first jar).

Now consider a second set of five jars, where the first spits out a random number between -1 and +1, the second spits out a random number between 0 and +2, the third between +1 and +3, and so on.

Here, even with just twenty samples of output from each jar, if we did a regression analysis using jar number as input, the jar number coefficient will be quite significant. The regression would "know" very quickly that the fifth jar spits out higher numbers than the first jar (and would even know roughly by how much).

With both sets of jars, jar number has the same practical significance. The fifth jar will produce output that is, on average, four points higher than the first jar. If we are bidding on the right to receive a jar's next numerical output in dollars, in both sets of jars we should be willing to pay $4 more for Jar #5 than for Jar #1.

For a given limited sample of jar outputs, however, a regression analysis will treat the difference between Jar #5 and Jar #1 as being way less significant in the first group of jars than in the second group of jars.

It seems to me that rookie draft picks in the NFL are very much like the first group of jars. The difference between Peyton Manning and Ryan Leaf absolutely dwarfs the difference between the *average* #1 pick and the *average* #2 pick -- which means that a regression analysis isn't going to discover the difference between the average #1 pick and the average #2 pick -- even if there is a very real difference -- without an insanely large sample size.

The fact that the M-T value coefficient is *statistically* insignificant in predicting wins based on a regression analysis over a limited sample of data does not, as I understand it, say anything about whether M-T value is insignificant as a practical matter.

maurile,

This is probably the first and last time I'll ever get a chance to say this to you, but I think you're wrong.

You've got it backwards. The statistical insignificance is a result of the small coefficient (where "small" of course depends on the sample size and the variation within the sample).

Assuming the usual assumptions are in place, regression coefficients are unbiased, which means that their expected value is the same as the true (unknown) value of the coefficient.

If you ran your 100-sample jar experiment 1000 times with the first set of jars and averaged the regression coefficients, you'd get one or something very close to it.

If you ran your 100-sample jar experiment 1000 times with the second set of jars and averaged the regression coefficients, you'd also get one or something close to it.

[NOTE: I actually wrote a quick program with the R statistical package to verify this.]

That's what unbiased means. The difference between the two situations is that you get a wider spread of guesses in the first case than in the second. But the point is that the variability in the data does not systematically make the coefficient smaller.
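Here is a rough Python sketch of that kind of jar simulation. It is not the R program mentioned above; the function names, the fixed seed, and the use of a hand-rolled single-predictor least-squares slope are all choices made for this example.

```python
import random
from statistics import mean, stdev

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (single predictor, with intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def one_experiment(half_width, rng, draws_per_jar=20, jars=5):
    """Draw from each jar and regress output on jar number.

    Jar k is uniform on [(k - 1) - half_width, (k - 1) + half_width],
    so its mean rises by exactly 1 per jar in both setups.
    """
    xs, ys = [], []
    for jar in range(1, jars + 1):
        for _ in range(draws_per_jar):
            xs.append(jar)
            ys.append(rng.uniform(jar - 1 - half_width, jar - 1 + half_width))
    return ols_slope(xs, ys)

def summarize(half_width, trials=1000, seed=0):
    """Mean and spread of the fitted slope across repeated experiments."""
    rng = random.Random(seed)
    slopes = [one_experiment(half_width, rng) for _ in range(trials)]
    return mean(slopes), stdev(slopes)

wide_mean, wide_sd = summarize(half_width=1000)   # first set of jars
narrow_mean, narrow_sd = summarize(half_width=1)  # second set of jars
print(f"wide jars:   mean slope {wide_mean:.2f}, spread {wide_sd:.2f}")
print(f"narrow jars: mean slope {narrow_mean:.3f}, spread {narrow_sd:.3f}")
```

Both averages should hover around 1, while the spread for the wide jars should come out roughly a thousand times larger. With the wide jars a single experiment's slope has a standard error of about 41, so even the average of 1000 runs will typically land within a point or two of 1 rather than right on it.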

Back to the M-T draft stuff....

The estimate of the coefficient on M-T draft value was .32. That might be too big, and it might be too small, but we have no reason to suspect that it'd be one rather than the other. Further, whether it's too big or too small isn't related to the sample size or the variability in the data.

The standard error on that coefficient was about .44, which means that we shouldn't be terribly surprised to learn that the true coefficient is not .32 but is .8 or even 1.2 (or, of course, -.2 or -.7). We should be very, very, very confident that the true coefficient is less than 2. If it were 2, then a #1 for #32 swap would net about .09 wins per year over the next three. That's not nothing, I guess, but bear in mind that that was a very optimistic estimate and that it's just as likely that the true effect is NEGATIVE .05 wins per year.
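The ranges in the paragraph above can be reproduced with a few lines of Python. The coefficient, standard error, and .13 swap figure are from this thread; the choice of a two-standard-error band is an assumption made here for illustration, not a formal confidence interval.

```python
COEFFICIENT = 0.32        # estimated M-T coefficient, three-year regression
STANDARD_ERROR = 0.44     # its reported standard error
MT_GAIN_FROM_SWAP = 0.13  # M-T value gained in the #1-for-#32 swap
YEARS = 3

def wins_per_year(coefficient):
    """Per-year win effect of the swap for a given true coefficient."""
    return coefficient * MT_GAIN_FROM_SWAP / YEARS

# Rough two-standard-error band around the estimate (an illustrative choice).
low_coef = COEFFICIENT - 2 * STANDARD_ERROR
high_coef = COEFFICIENT + 2 * STANDARD_ERROR

for label, coef in [("low end", low_coef), ("estimate", COEFFICIENT),
                    ("high end", high_coef), ("coefficient of 2", 2.0)]:
    print(f"{label:>16}: coef {coef:+.2f} -> {wins_per_year(coef):+.3f} wins/yr")
```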

That, IMO, justifies the "no practical significance" conclusion.

Yes, I had things screwed up conceptually. (I told you I didn't know what I was talking about.) You're right that the expected value of the coefficient is equal to the true coefficient regardless of sample size or variance within the sample.

The coefficient is less likely to be accurate, for a given limited sample size, if the variation within the sample is large however. (Right? I'm thinking of the jars and this point seems obvious, although I haven't done the exercise of running any regressions for them.)

In any case, my instinct is still that we still haven't shown much about the practical significance of M-T value.

We are looking for the significance of *expected* draft value (i.e., M-T value or draft chart value) by measuring its correlation with future wins, knowing that expected draft value doesn't directly affect future wins. Expected draft value affects actual draft value, which -- along with many other factors -- affects future wins.

Some of the problems of doing things this way are that (a) the differences between teams' expected draft values each year are relatively small; (b) the difference between a team's expected draft value in a given year and its actual draft value that year can be huge due to random luck (e.g., Ryan Leaf); (c) even apart from random luck, expected draft value (in the M-T or value chart sense) is only one of several variables that affect actual draft value, and these other variables are hard to control for (e.g., some teams have consistently better scouting departments than others); (d) a team's future wins are subject to random luck (strength of schedule, etc.); and (e) even apart from random luck, a team's future wins are affected by a great many variables other than a team's actual draft value in Year N, and these other variables are hard to control for (e.g., some teams are consistently better than others at signing free agents, some teams are consistently better than others at coaching, and so on).

It occurs to me, however, that I may not be saying anything different from what's in the final (revised) paragraph of your post.

Awesome job, but it reads as if you used the theoretical draft order to compute your surplus value variable. In reality, picks don't go #1...#32. Cleveland, for example, had 2 1st round picks this year. You'd have to compute the real surplus value team by team, year by year for several years to get a good number.

Assuming you did this and the .07 number is valid, this is no small result. It is very large when compared to other stats conventionally accepted as very predictive of a team's success. Regressing previous year's wins on next year wins yields an r-squared of ... .06. Surprising but true, and using 2002-2006 seasons it's significant at the p=0.05 level. To me, this means that the 7 players picked up in a draft account for as much or more variance in the following year's record as the other 45 players already on the team. Or, more precisely, their surplus value accounts for as much variance as all the other players.

Another example: if you regress a team's previous year's pass efficiency onto its following year's wins (CurrentWins vs. LastYrPassEff), you get an r-squared of only .04. Assuming the bulk of most teams' passing offenses remain in place year to year, passing game proficiency only accounts for 4% of the variance in next year's records. Would you rather have the NFL's #1 passing offense or the NFL's best draft class? Your methodology says the draft is more important.

Why is there such a small r-squared for each of these variables? My guess is randomness. The better team doesn't always win in the NFL. Due to the salary cap, there is so much parity in the NFL that it doesn't take much "luck" for an inferior team to win. So there is a large part of noise/luck/randomness in win-loss records. Probably about 1/4 of a team's record is random by my studies. Moreover, the best statistical win-prediction models rarely ever achieve better than a 70% correct score.

No, I did use the value of the actual picks, not the originally-owned picks.

You're misunderstanding (I think). The regression included surplus draft value *and* last year's record as input variables. The last season record was highly significant, but the draft value was not.

I stand corrected. I love what you're doing here by the way. But using both previous records and draft values in the same regression raises a new (fatal) problem with that model.

By using both last year's record AND draft pick values in the same model you are violating one of the most sacred rules in regression. Draft pick values, by definition, are (overwhelmingly) a function of last year's record. The 2 variables are therefore highly collinear.

The same problem exists for M-T surplus values and last year's record. The former is a function of the latter. Using all 3 variables in the same model would be even worse.

It would be like using height and arm length in a regression model that estimates how high someone can do the "vertical" at the combine. Height and arm length are too closely related to each other. It's dividing up the dependent variable's variance among variables that represent the same thing.

Here's perhaps a better idea: Create a new variable "Delta Wins" representing the change in the # of wins from year to year for each team (DeltaW = WINSn - WINSn-1, i.e., Year N wins minus Year N-1 wins). Use that as your dependent variable. Now you can account for the previous year's record without the collinearity problem. Do 3 separate single-variable regressions using conventional pick values, then M-T surplus values, and finally previous year's wins.

DeltaWINSn vs. Conventional draft value

DeltaWINSn vs. M-T surplus value

DeltaWINSn vs. WINSn-1 (for comparison)
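A minimal Python sketch of those three single-variable regressions might look like the following. The team-seasons are invented purely to make the code run, so the printed numbers mean nothing, and the helper name `simple_regression` is hypothetical.

```python
from statistics import mean

def simple_regression(xs, ys):
    """Return (intercept, slope, r_squared) for ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared

# Invented team-seasons for illustration only (NOT real NFL data):
# (wins in Year N-1, wins in Year N, chart value, M-T value in $ millions)
rows = [
    (2, 6, 4100, 3.7),
    (4, 7, 3300, 3.6),
    (6, 5, 2600, 3.5),
    (8, 9, 2100, 3.5),
    (10, 8, 1700, 3.4),
    (12, 10, 1400, 3.4),
    (14, 11, 1200, 3.3),
]

prev_wins = [r[0] for r in rows]
delta_wins = [r[1] - r[0] for r in rows]   # DeltaWINS = Year N wins minus Year N-1 wins
chart = [r[2] for r in rows]
mt = [r[3] for r in rows]

results = {
    "chart value": simple_regression(chart, delta_wins),
    "M-T value": simple_regression(mt, delta_wins),
    "prior wins": simple_regression(prev_wins, delta_wins),
}
for name, (intercept, slope, r2) in results.items():
    print(f"DeltaWINS vs {name}: slope {slope:.4g}, R^2 {r2:.2f}")
```

Each regression uses one predictor at a time, so none of them has to split variance between correlated inputs.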

I'd do it myself but you did the legwork and got the data! You'd probably need several years of data to see significance.

Again, I love your site.