Approximate value in the NFL
Baseball analysis pioneer Bill James had a tool called the Value Approximation Method. In the 1982 Abstract, he introduced it thusly:
The value approximation method is a tool that is used to make judgements not about individual seasons, but about groups of seasons. The key word is approximation, as this is the one tool in our assortment which makes no attempt to measure anything precisely. The purpose of the value approximation method is to render things large and obvious in a mathematical statement, and thus capable of being put to use so as to reach other conclusions.
[The emphasis was in the original.]
James then goes on to describe the method a bit. He used basic stats like batting average, RBI, stolen bases, pitching wins and losses, strikeouts, and so forth, to assign an integer to each player season. A typical MVP would be around 16 or 17, an all-star around 13, an average starter about 10, and so on. He continues:
These approximations are not intended to tell you anything at all about the player that you do not already know. It is not essential that you accept the individual evaluations; there are cases where 10 or 11 points seasons will turn out, under careful scrutiny, to be better than 12 or 13 point seasons. The approximations are intended only to distinguish as quickly and reliably as possible between large contributions, very large contributions, gigantic contributions, medium-sized contributions, small, smaller, and negligible contributions.
...
The value approximation method enables one to set down on paper a simple representation of all of the things that one would otherwise have to hold in mind.
Basketball analysts will occasionally use a simple points-plus-rebounds-plus-assists-plus-some-other-stuff metric to accomplish the same goal. A few circumstances in which such a method might be useful here at this blog:
- Which teams have done the best jobs of drafting? To answer this, we'd need a tool that measures value across positions.
- Likewise, this post about how teams are built could be made more accurate. Instead of simply counting a starter as a starter, we could weight the more important starters more heavily, and we could include the non-starters as well. In other words, instead of saying things like "Team X got 4 of its 22 starters in the first round", we could say more meaningful things like, "Team X got 31% of its contributions from first round picks."
- What is the "real" age of a team? If you simply average the ages of the players, you might have a backup QB or a kicker or a couple of veteran linebackers who only play a few plays a game skewing the average. If you only use the starters, or something like that, you lose information about whether the depth is old or young. It would be nice to weight the average according to how much each player contributed.
- Do players from big name colleges tend to be overdrafted compared to players from smaller schools?
- If, for some reason, you want to know which college award has produced the best pros, you need a tool like this that cuts across eras and across positions.
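To make the weighting idea in the list above concrete, here is a minimal Python sketch. The roster, the ages, and the `value` numbers are all invented for illustration; they stand in for whatever approximate value each player would eventually be assigned.

```python
# Hypothetical roster: (name, age, draft_round, value). All numbers are invented.
roster = [
    ("Starting QB", 27, 1, 14),
    ("Backup QB", 36, 4, 1),
    ("Left tackle", 25, 1, 10),
    ("Kicker", 38, None, 2),  # None means undrafted
    ("Running back", 24, 2, 9),
]

def weighted_age(players):
    """Average age, weighting each player by his contribution (value)."""
    total_value = sum(value for _, _, _, value in players)
    return sum(age * value for _, age, _, value in players) / total_value

def share_from_round(players, draft_round):
    """Fraction of the team's total value supplied by picks from one round."""
    total_value = sum(value for _, _, _, value in players)
    round_value = sum(value for _, _, rnd, value in players if rnd == draft_round)
    return round_value / total_value

simple_age = sum(age for _, age, _, _ in roster) / len(roster)
print(f"simple average age:   {simple_age:.1f}")
print(f"weighted average age: {weighted_age(roster):.1f}")
print(f"share from round 1:   {share_from_round(roster, 1):.0%}")
```

In this made-up roster the old backup quarterback and kicker pull the simple average up, while the weighted average stays close to the ages of the players who actually contribute.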
In the above-linked post, I did whip up an ad hoc value approximation method, and I set it up using language similar to James' language above:
So, with all the players in place, I need a way to measure their NFL success. As we go through it, keep in mind that it is not meant to be a precise metric, but rather an approximate measure of value. Comparing a linebacker who has been in the league for nine years to a running back who is in his second year is very tough to do, so all I’m hoping to do is group guys into broad categories that seem reasonable. I’m going to put a number between 0 and 18 on each player. Peyton Manning is an 18. So are Warren Sapp and Randy Moss. Julius Peppers is a 14, Garrison Hearst a 12, Terrell Suggs a 10, Dominic Raiola an 8, E.J. Henderson a 6, Rashaan Salaam was a 4, Byron Hanspard a 2, and Eric Crouch a zero.
Again remember that the goal here is not to forever put an end to the debate about whether Daniel Graham or Antonio Bryant has had the better career. That’s too ambitious a goal. We simply want to classify them both as being a bit better than Michael Bishop or Travis Dorsch, but not as good as Terry Glenn or Carson Palmer.
Despite the disclaimer, many people took exception to my method, and not unjustifiably. If we want to do this for football, we've got a big problem that James didn't have. Namely, there are only a precious few objective pieces of data that are recorded for all players. James could use homers, RBIs, stolen bases, etc. for position players, give a few bonus points for playing the more important defensive positions, and now he's got all his position players rated. He can use wins, strikeouts, saves, etc. to rate his pitchers. Then all he has to do is find a way to interlace the two lists, and it's not too hard to come up with some intuitive justification for how to do it.
The only objective stats by which we can compare Randy Moss and Mike Singletary are games and games started. Pro Bowls are, I suppose, an objective piece of data in hindsight, but there was obviously a great deal of subjectivity involved in determining that number. Remember, we're trying to very approximately rate all players from all positions in one big list here.
Are these, even approximately, the best 10 players in NFL history?
+-----------------+-------+
| player          | games |
+-----------------+-------+
| Morten Andersen |   382 |
| Gary Anderson   |   353 |
| George Blanda   |   340 |
| Jeff Feagles    |   320 |
| Jerry Rice      |   303 |
| Bruce Matthews  |   296 |
| Darrell Green   |   295 |
| Sean Landeta    |   284 |
| Jim Marshall    |   282 |
| Trey Junkin     |   281 |
+-----------------+-------+
How about these? (Note: games started aren't quite complete in my database, but please play along.)
+-----------------+---------------+
| player          | games_started |
+-----------------+---------------+
| Bruce Matthews  |           292 |
| Jerry Rice      |           284 |
| Jim Marshall    |           282 |
| Bruce Smith     |           267 |
| Darrell Green   |           258 |
| Mike Kenn       |           251 |
| Lomas Brown     |           251 |
| Clay Matthews   |           248 |
| Dan Marino      |           240 |
| Mick Tingelhoff |           240 |
+-----------------+---------------+
This is probably better:
+------------------+---------------+
| player           | pro_bowls     |
+------------------+---------------+
| Merlin Olsen     |            14 |
| Bruce Matthews   |            14 |
| Jerry Rice       |            13 |
| Reggie White     |            13 |
| Jim Otto         |            12 |
| Junior Seau      |            12 |
| Randall McDaniel |            12 |
| Will Shields     |            12 |
| Ken Houston      |            12 |
| Bruce Smith      |            11 |
+------------------+---------------+
A combination of those three stats is what I used in the college award winners post, because that's all we have. I'd like to try to do better. But it's going to get complicated.
The main idea is very similar to another of Bill James' concoctions: Win Shares. The output and the goal of the Win Shares method are in some sense similar to those of the value approximation method: put a number on every player-season so that we can compare across years and across positions. But the method itself is completely different. Approximate values are simple, intuitive, and approximate; Win Shares are complicated and precise (or at least as precise as the available data allows). Here's the main idea, from the Win Shares entry at Wikipedia:
Win shares is a top-down approach which starts with the number of games a team won, and then attempts to assign credit to players, proportionally based on their statistics.
I'm not going to do exactly that, but I'm going to use the same idea. I'm not going to get into too many specifics here because I want to introduce the main ideas without getting too bogged down, but here are the main steps.
The first thing we do is measure each team's offense. Based on how good the team's offense is, we determine how many "points" are to be split among the players on that team's offense. Let's declare that an average team should have 100 points to split. If the 2007 Patriots' offense was, say, 60% better than average according to whatever metric we decide to use, then Brady, Welker, Light, Maroney, etc. would have 160 points to split. Meanwhile, a terrible offensive unit like the 2006 Raiders might only have 45 or 50 points to split.
Now do the same with defenses. The 2002 Bucs defenders will get a lot of points, somewhere around 150, to divvy up. The 2007 Lions defenders will have only 60 or so.
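As a rough sketch of this first step in Python: the only thing taken from the post is the 100-point baseline for an average unit. The `relative_quality` input is an assumption, since the metric that produces it has not been chosen yet.

```python
BASELINE = 100.0  # points an average offense or defense gets to split

def unit_points(relative_quality, baseline=BASELINE):
    """Points for a unit to divide among its players.

    relative_quality is a placeholder for the yet-to-be-chosen team metric:
    1.0 means league average, 1.6 means 60% better than average.
    """
    return baseline * relative_quality

print(unit_points(1.6))  # an offense 60% better than average
print(unit_points(0.5))  # an offense half as good as average
```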
From here on out, I'm going to focus on the offensive side. I'll return to the defenders in a future post.
Now we have to divvy up the points, which is where it gets real dicey. I'm going to lay out a few assumptions that are almost certainly not correct, but whose use I'll try to justify anyway. Here is the first:
Assumption #1: the offensive line is exactly as good as the offense.
I will make no effort to try to determine whether Emmitt, Michael, and Troy made pro bowlers out of Erik, Nate, and Mark or vice versa. I do realize that this will overcredit lines that were fortunate enough to have superstars behind them, and it will overcredit runners and passers who really did have great lines in front of them, but I don't see what choice we have, because we don't even know who the really great lines were. I'm trying to build an objective method here. I can't be adding extra credit to the 90s Cowboys offensive linemen simply because everybody knows they were opening huge holes for Emmitt. Good offenses have good offensive lines. Bad offenses have bad offensive lines. Approximate Value.
Assumption #2: the offensive line is equally important in the running game as it is in the passing game.
I will let you try to convince me otherwise, but I think Assumption #2 is a good null hypothesis unless I see some evidence to the contrary.
Putting these two assumptions together, we declare that every team's offensive line will get the same proportion of its team's offensive points. What proportion is that? We'll talk about that later.
Now how do we award points to the individual offensive linemen? First we award "pre-points" to each lineman based on (1) how many games he played, (2) how many games he started, (3) whether he was a tackle (as opposed to a guard or center), and (4) whether he made the Pro Bowl. Remember, I'm avoiding details in this post, so I'm not going to tell you exactly how I do that. Add up all the pre-points for each team's line and then divvy up the actual points proportionally. If a team's line is given, say, 40 points to distribute among them, then a player who had 25% of the team's pre-points would get 10 points.
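The proportional split can be sketched in a few lines of Python. The pre-point values below are invented (the post does not say how pre-points are computed); only the 40-point line total and the 25% example come from the text.

```python
def split_proportionally(points, pre_points):
    """Divide `points` among players in proportion to their pre-points."""
    total = sum(pre_points.values())
    return {name: points * pp / total for name, pp in pre_points.items()}

# Hypothetical pre-points for one team's five linemen (they sum to 100).
line_pre_points = {"LT": 25, "LG": 20, "C": 20, "RG": 15, "RT": 20}

line_points = split_proportionally(40, line_pre_points)
print(line_points["LT"])  # the left tackle holds 25% of the pre-points
```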
OK, now the offensive linemen are done. Just 17 more positions to do.
The next step is to determine how good each team's offense was in the running game compared to the passing game. Once we've decided that, we take the team's remaining offensive points (the ones that haven't already been given to linemen), and we divide them into two categories: (1) running game points, and (2) passing game points. Exactly how we make that split is subject to some debate, and we will have that debate, but not now. For now, just note that in most cases we'll have to give a lot more passing game points than running game points because there are a lot more people that share in the passing game. In a lot of cases, virtually all of the running game points will go to a single individual (remember, the offensive line has already been credited), whereas the passing points will generally be split among four or five key guys, as well as another half-dozen or so bit players.
On the rushing side, we award the running game points proportionally to the players according to how many rushing yards they had, or some similarly basic metric.
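Here is a minimal Python sketch of the run/pass split and the rushing award. `LINE_SHARE` and `RUN_SHARE` are placeholders, since the post deliberately leaves both fractions for later, and the rushing yardage is invented.

```python
def split_run_pass(offense_points, line_share, run_share):
    """Return (line, run, pass) points for one offense."""
    line_points = offense_points * line_share
    remaining = offense_points - line_points   # what the linemen did not get
    run_points = remaining * run_share
    pass_points = remaining - run_points
    return line_points, run_points, pass_points

def award_by_yards(points, yards):
    """Give each player a share of `points` proportional to his yards."""
    total = sum(yards.values())
    return {name: points * y / total for name, y in yards.items()}

LINE_SHARE = 0.40  # placeholder, not a value from the post
RUN_SHARE = 0.30   # placeholder, not a value from the post

line_pts, run_pts, pass_pts = split_run_pass(100, LINE_SHARE, RUN_SHARE)
rushing_yards = {"RB1": 1200, "RB2": 300, "QB": 100}  # invented totals
rushing_points = award_by_yards(run_pts, rushing_yards)
print(rushing_points)
```

With these placeholder fractions, most of the running game points land on one back, which matches the observation above about how concentrated rushing credit tends to be.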
On the passing side, we have to make another split, and so here comes another assumption:
Assumption #3: the ratio of pass-thrower importance to pass-catcher importance is constant from team to team.
Rather than being merely shaky, this assumption is obviously wrong. As Chase pointed out to me, Ryan Fitzpatrick throwing to Bruce and Holt might produce numbers similar to Tom Brady throwing to Jabar Gaffney and Reche Caldwell, but in the former case everybody knows that Bruce and Holt are responsible while in the latter case it's obviously Brady. Believe me, I'd like nothing more than to be able to separate the contributions of the passers from the catchers on a given team. But that's basically the Holy Grail of football analysis and as far as I know no one is remotely close to knowing how to do it. So rather than go through every team in NFL history and put a flag by the ones where we know that either the QB or the receiver group is better than the other, I'm just going to point to the word "approximate" in the title of the post, hope things will even out in most cases, and use Assumption #3.
In keeping with the spirit of this post, I'll decline to talk about what the constant pass/catch ratio is. But assuming we've got that figured out, we now award the passing points proportionally according to passing yards (or maybe some slightly more interesting metric) and the catching points proportionally according to receiving yards. Note that many running backs will pick up a few points here to add to their rushing points.
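Under Assumption #3, the passing side looks something like the Python sketch below. `THROWER_SHARE` is a placeholder for the constant pass/catch ratio the post declines to name, and every yardage figure is invented.

```python
def split_passing_points(pass_points, thrower_share, passing_yards, receiving_yards):
    """Split passing game points between throwers and catchers, then by yards."""
    throw_points = pass_points * thrower_share
    catch_points = pass_points - throw_points
    pass_total = sum(passing_yards.values())
    rec_total = sum(receiving_yards.values())
    throwers = {n: throw_points * y / pass_total for n, y in passing_yards.items()}
    catchers = {n: catch_points * y / rec_total for n, y in receiving_yards.items()}
    return throwers, catchers

THROWER_SHARE = 0.5  # placeholder: the same constant is used for every team

throwers, catchers = split_passing_points(
    40,
    THROWER_SHARE,
    passing_yards={"QB1": 3600, "QB2": 400},
    receiving_yards={"WR1": 1400, "WR2": 1000, "TE": 800, "RB": 800},
)
print(throwers)
print(catchers)  # the running back picks up catching points here too
```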
That's it. That's all there is to it.
What's left to talk about? Oh yeah, lots of stuff:
- What metric do I use to determine offensive points at the team level?
- What fraction of points should go to the line?
- What is the pass/run split?
- On the passing side, what is the throw/catch split?
- We need to figure a way to give some of those offensive line points to fullbacks and tight ends, many of whose jobs include a lot of blocking.
Aside from that, we're finished. Oh wait.
- We have to go through all this stuff with defense too.
- Kickers, punters, returners?
As you can tell, it's going to take a few posts to get through this, so I'll stop this one here. But be aware that this isn't one of those posts where I conjure up some grand idea and then don't follow through on it. I actually have done most of the programming and even have lists. But I've been doing this long enough to know that, once I post the lists, no one will look at anything but that. I'd like to iron out some of the methodology without that as a distraction. If you have any comments so far, I'd love to hear them.
This entry was posted on Tuesday, January 15th, 2008 at 10:29 am and is filed under Approximate Value, General, Statgeekery.

Here are some suggestions...
Metric for offensive points has to be a combination of yards and points scored. Points scored is not good enough because it does not account for field position bonuses (or penalties) handed to the offense by the defense and special teams. Yards is not enough, because it does not detract enough from teams with bad red zone efficiency.
The line should get around half of the offensive points. A great line makes an average QB good or above average, a horrible line makes all but the top 5 QBs of all-time worthless. Plus purely from a numbers perspective they have 5 of the 11 starters on offense.
Pass/run is difficult. You bring up the point that one player essentially gets all the rush points. Also, if we are using yardage, the passing game has an advantage of what, roughly 2:1 in most games, as far as yard production?
I think throw/catch has to be 50/50. (Unless a huge study is undertaken based on players switching teams which can give us a better idea which of the two is more important).
Is a percentage of plays played available? If so that will be helpful with your fullback issue. If not, going to be extremely difficult to objectively add points to the fullback. No ideas on TE.
I have no thoughts on defense or ST.
Intuitively, I'd posit that the importance of the offensive line to the running game is large and fairly constant, and that its importance to the passing game varies roughly in inverse proportion to the quality of the quarterback (Tom Brady and Peyton Manning get rid of the ball so quickly that their linemen look great regardless). Actually, that suggestion still doesn't quite work: certain good quarterbacks like Roethlisberger, Warner and Culpepper 1.0 are massively affected by their line play, and certain bad ones like Harrington and Josh McCown far less so. I guess what it comes down to is that I'm not sure it's any easier to split credit between a QB and his line than it is between a QB and his receivers. And that that's a big (possibly insurmountable) problem for the project as a whole, much as I like the idea. You'd need advanced game charting statistics like Joyner's to really get anywhere, I suspect.
This could very well be the coolest thing ever. It is going to be a LOT of work, though.
A very important question: How far back in football history will this list go, and what statistics do we have available for each position for those years?
Even once you're done divvying up the points for each player in a season, there's still a question of how to make it a career metric. How much do you value longevity? Does Peyton Manning get more points for 2004 than, say, Jeff Backus gets for his entire career? Can a player be given negative value? If so, how much negative value has Robert Gallery piled up so far?
I would lean toward some combination of net yards per pass and first downs for passing offense, rather than points scored (which is very subject to influence from field position, defense, special teams play). Offensive players on teams with good defenses would get too much credit if points scored was used.
For players since 1996, you could use some form of Football Outsiders DPAR/DVOA stats. (They are working on the 1995 stats) Doesn't help with anyone before that, though.
Re: the assumptions:
Assumptions #1 & 2 are reasonably valid. Compare Edgerrin James' career in Indy vs. Arizona. Also see Shaun Alexander's decline in Seattle. For the recent seasons listed above, FO's Adjusted Line Yards are a GREAT stat. Basically, the line gets credit for a 10 yd run when the back jukes a safety and runs for 60 yds. The OL gets credit for giving him a big hole and the RB gets credit for being really fast and being able to make one guy miss.
My suggestions for the ratios, if you haven't worked them out--1/4 for the OL (they are not responsible for YAC except on screens or most of yds on long runs), 1/4 for running game, and the other 1/2 for the passing game to divvy up--1/3 for QB, 2/3 for receivers. This makes it easier to compare individual players at different positions. Also, a QB with average receivers will still stand out, because no receiver will have a high points total, and a bad QB with great receivers will have 1 or more receivers with almost as many points as his QB.
As for the receivers, I recommend something along these lines: (Receptions x2) + yds + (TD x3)
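For what it's worth, the ratios and the receiver formula suggested in this comment can be sketched in Python as follows. The fractions (1/4, 1/4, 1/2, then 1/3 and 2/3) and the formula are the commenter's; the 120-point offense and the receiver stat lines are invented.

```python
from fractions import Fraction

def budgets(offense_points):
    """Split an offense's points using the commenter's suggested ratios."""
    passing = offense_points * Fraction(1, 2)
    return {
        "line": offense_points * Fraction(1, 4),
        "running": offense_points * Fraction(1, 4),
        "quarterback": passing * Fraction(1, 3),
        "receivers": passing * Fraction(2, 3),
    }

def receiver_score(receptions, yards, touchdowns):
    """The suggested receiver metric: (receptions x 2) + yards + (TD x 3)."""
    return 2 * receptions + yards + 3 * touchdowns

team = budgets(120)  # hypothetical offense with 120 points to split

# Invented stat lines: (receptions, yards, touchdowns).
scores = {
    "WR1": receiver_score(80, 1200, 10),
    "WR2": receiver_score(50, 600, 4),
}
total = sum(scores.values())
receiver_points = {name: team["receivers"] * s / total for name, s in scores.items()}

for name, pts in receiver_points.items():
    print(name, float(pts))
print("QB", float(team["quarterback"]))
```

Exact fractions are used only so the pieces add back up to the team total without rounding drift.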
Re: offensive points as a whole. Total points available for a team's offense equal the team's ratio of points relative to the league average. E.g., if in 2000 the average team scored 24 ppg, and Offense X scored 30 ppg, they get 25% more than your baseline (1000, or whatever--a bigger baseline means fewer decimal places). This will help adjust for era, # of teams in the league that year, etc. For defenses, you could do the opposite. Hope this helps.
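A short Python sketch of this suggestion, using the 24 ppg and 30 ppg figures from the comment. The defensive version is only one possible reading of "do the opposite" (inverting the ratio so that allowing fewer points earns more), and the 20 ppg allowed is an invented figure.

```python
def offense_budget(team_ppg, league_ppg, baseline=1000):
    """Offensive points to split: baseline scaled by points scored vs. league average."""
    return baseline * team_ppg / league_ppg

def defense_budget(allowed_ppg, league_ppg, baseline=1000):
    """Assumed reading of 'the opposite': invert the ratio for points allowed."""
    return baseline * league_ppg / allowed_ppg

print(offense_budget(30, 24))  # 25% above the 1000-point baseline
print(defense_budget(20, 24))  # hypothetical defense allowing 20 ppg
```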
That's a pretty neat idea.
To aid your quest in finding the Holy Grail of football analysis, in terms of the QB/WR conundrum outlined above, is it possible to award similar pre-points (a la the offensive line) for Pro Bowls and things like that?
At first, it's usual that a Pro Bowl QB will, in turn, generate Pro Bowl WRs. But in some cases, this does add a valuable distinction, as in the Brady/Gaffney/Caldwell example above, or Steve McNair's MVP season when he was throwing to...uh, (went to look it up) Derrick Mason and a bunch of dudes, or between Steve Smith and Jake Delhomme and the other Carolina QBs. This helps account for things like Derek Anderson, who wouldn't have had as good of a season in 2007 if not for Braylon Edwards, who had a Pro Bowl year.
Come to think of it, this could, in turn, become a valuable tool to learn things like, who is more valuable to the 'Skins in 1999, Michael Westbrook or Brad Johnson? (er..neither..?) The point I'm trying to make there is, when a player leaves and goes to another team and does nothing, we can begin to say, "Well obviously, a large part of his success was the person on his old team."
I basically started on something like this last week -- great minds truly think alike! However, I didn't break offenses down into passing/rushing, and I tried to look at salaries to see how GMs value each position. For instance, QBs made 17% of the salary cap # devoted to offensive players from 2000-07, so I allocated 17% of "Offensive Wins" to the QB position; RBs were paid 12% so they got 12% of wins, etc. Then I tried to approximate the allocation of snaps between starters and backups at each position, to see what % of the positional value each player earned. But I've since backed off on this because it was not appropriately valuing players individually; that is, the starting left tackle for the best offensive team is going to be seen as "better" than the starting left tackle for an average offensive team even if they are equal in quality, simply because the first LT played with better teammates.
Now, I'm thinking about ignoring team quality and simply breaking players into these 4 categories: backup, starter, pro bowl, and all-pro. My next step is to look at average salaries again and see how GMs (the guys whose job it is to know player value) value the 4 categories at each position. Like, the average guard may be much less valuable than the average QB, but an all-pro guard like Steve Hutchinson is unquestionably more valuable than the typical QB. So I'd ideally like to know the average salary at every position for a backup, starter, pro bowl, and all-pro. Then we'd still apply it on a per-game basis to players' games and games started, but it'd be much more individually-based than attributing Randy Moss' excellence in some way to Ben Watson. Does this make sense to anyone else?
Anyway, glad to see we're on the same page when it comes to devising an approximate value system for all positions. It's an incredible coincidence, and I'm pretty excited to see your results!
The part that interests me is how to handle players who serve more than one function. Like how do you handle a 3-4, where a linebacker may sometimes be acting as the 4th lineman? Linebackers are tricky in general since some are excellent run stoppers, some pass rushers, some are good in coverage, and every combination of those. How do you handle the value of fullbacks when they hardly ever touch the ball? Do you make a similar concession to them as you do with the OL, that part of the halfback's numbers are applied to the FB? How do you value tight ends when they're partially blockers and partially receivers?
Regarding Edge, I'm sure a large reason his numbers have dropped off is the fact he's on a sucky team, but he's also at that age where most RB's numbers start to tail off anyway, so it's hard to say how much to attribute to the Cardinals and how much to attribute to his age. I'm guessing it's largely the team he's on too, but who knows. I'm also fairly convinced that the offensive system is far more important to a RB's success than most people think. That's not to say LT would suck on another team, but I could see him being just one of the better RB's rather than unbelievable if he were on a team that didn't have an offense built around him.
I think the more important thing initially in this project is "what is the pie made of", i.e., what factors determine the overall value of an offense, rather than being more precise in "cutting the pie", how it is divided between line, receivers, quarterbacks and running backs.
If you are averaging across seasons, then eventually, any major errors in how things were divided across groups may appear. I do think there are some additional assumptions (also likely to be wrong due to coaching influence and player development, injuries, etc) that need to be made.
Assumption: players tend to stay relatively similar in their own underlying performance, so that drastic changes are more likely the effect of team mate changes.
I would kick out (for now) the rookie seasons and final seasons. So, if Willie Roaf averages a 15, and RG on 1999 New Orleans averaged a 3, but, because you are dividing up equally (or even weighted, but for my argument, I will stick to equally), they both are 8's for that season, we know that Roaf's is too low. I would set a maximum deviation type number, such as all players (kicking out first and last year starting) will be +/- 2 points of their average.
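One way to read the "maximum deviation" idea, sketched in Python: average a player's seasons excluding his first and last, then pull every middle season to within 2 points of that average. Whether the first and last seasons should themselves be adjusted is not specified, so this sketch leaves them alone, and the season values are invented.

```python
def clamp_to_career_average(season_values, max_dev=2.0):
    """Clamp middle seasons to within max_dev of the player's core average.

    The first and last seasons are excluded from the average and left
    untouched, which is an assumption about what the suggestion intends.
    """
    if len(season_values) < 3:
        return list(season_values)  # no middle seasons to work with
    core = season_values[1:-1]
    avg = sum(core) / len(core)
    low, high = avg - max_dev, avg + max_dev
    clamped = [min(max(v, low), high) for v in core]
    return [season_values[0]] + clamped + [season_values[-1]]

# Invented career: a low season in year 3 caused by weak teammates.
career = [5, 15, 8, 16, 14, 6]
print(clamp_to_career_average(career))
```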
The other assumption, and a highly questionable one on an individual level, is that coaches generally recognize the talent level of their players. Thus, if a starter is replaced on a good offense and never starts again (and does not go directly to another team to start), we should assume his contribution is less than what is otherwise being shown. Perhaps a formula could reduce such a player's value based on the value of his replacement the following season.
I would disagree with #6 on the pro bowl thing. Braylon Edwards is going to the pro bowl because he has almost 1300 yards receiving and 16 touchdowns. His value is already represented by those numbers; we don't need to boost him further based on an award because he did, in fact, put up those numbers.
Another important piece for WRs that can easily be forgotten is their blocking downfield. We remember the fullback and TE, but what about guys like Hines Ward, Wes Welker and others. These guys have a lot of extra value to their teams because of the extra yards their RBs are getting as a direct result of their blocking.
#9--I see what you're saying. Good call.
I like what JKL is getting at. Perhaps some iteration can then be performed. So if we've got Roaf at 8, 11, 8, 20, 20, 8, 16, 20, 10, 19, that averages to 14. If we then give him a 14 each year, that could change what the other people around him get (hence the iteration). So his teammates in the years Roaf got an 8 should really be reduced (Do you see why?).
I think Doug can probably work on this, but it sounds pretty complicated.
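A single step of that iteration might look like the Python sketch below. The ten season values are the ones quoted in the comment; the five-man line with everyone at 8 is a made-up season, and a full version would presumably repeat this for every player until the numbers settle.

```python
roaf_seasons = [8, 11, 8, 20, 20, 8, 16, 20, 10, 19]
career_avg = sum(roaf_seasons) / len(roaf_seasons)

def repin(line_values, player, pinned_value):
    """Fix one player's value and rescale teammates so the line total is unchanged."""
    total = sum(line_values.values())
    others_old = total - line_values[player]
    others_new = total - pinned_value
    scale = others_new / others_old
    return {
        name: (pinned_value if name == player else value * scale)
        for name, value in line_values.items()
    }

# Hypothetical season in which the line's 40 points were split evenly.
line = {"Roaf": 8, "LG": 8, "C": 8, "RG": 8, "RT": 8}
adjusted = repin(line, "Roaf", career_avg)
print(career_avg)
print(adjusted)
```

Because the line's total is fixed, raising Roaf from 8 to his career average necessarily lowers the other four linemen, which is the reduction the comment is hinting at.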
I wonder if different group positions work the same. Is the quality of an offensive line more dependent on how good its weakest members are, while a defensive front may be about the best guy?
Intuitively, I'd say that the value of the o-line is higher in rushing than in passing, since there are two basic parts of the running game (blocking and running) and three basic parts of the passing game (blocking, throwing, and catching). Overall, if you really want them to be the same, I'd make it 40%, and not include blocking by non-linemen in running plays in that amount.
I also don't like total points for the overall offense/defense values, but they're better than total yards. Maybe factor in yards per play... that might be a good way to split the value between running and passing, too, using rushing yards above average and passing yards above average (or below).
You must use game charting statistics to come close to approximating value.
Otherwise you have less of an approximation of individual value and more of a correlation to team success. Without charting, all 5 linemen will be equal (when this is patently absurd). Penalties, % of blocks, mental errors, sacks allowed, etc. There are plenty of statistics out there to do this, you just have to find them.
One more thing, you don't have to use the same statistics at every position. You (the formula deviser) simply need to come up with a logical way to relate the VALUE of a successful ILB to the entire defense and a great Tackle to the entire offense, and ultimately to the team (and then create a win share).
Good luck.
I think it would be hard to find something better than simply adding up Pro-Bowl + All-Pro Seasons. Here is the QB totals. As imperfect as this method is... it sure is simple.
1. Unitas 15
2. Favre 13
3. Graham 12
3. Marino 12
5. Montana 11
6. Tarkenton 10
6. Tittle 10
6. Young 10
6. Baugh 10
10. Dawson 9
10. Moon 9
10. Elway 9
oops. I missed Van Brocklin and Griese who each had a total of 10.
Doug (#16 and 17), that's fine, but where does Randall McDaniel and his 19 fit in? Does Chris Doleman (10) belong in the same tier --- approximate though it is --- as Tarkenton and Steve Young? All the approximate disclaimers in the world wouldn't make me comfortable with Will Shields ahead of Joe Montana.
I would say it only applies for each position.
Here is an incomplete list at Defensive Back:
1. R. Woodson 17
2. R. Lott 16
3. D. Sanders 14
3. K. Houston 14
3. W. Brown 14
3. E. Tunnell 13
7. J. Robinson 13
7. L. Wilson 13
7. W. Wood 13
10. Y. Larry 12
11. A. Williams 11
11. Christiansen 11
11. M. Renfro 11
11. P. Krause 11
11. M. Haynes 11
16. Night Train Lane 10
16. J. Patton 10
16. R. Wehrli 10
16. D. Grayson 10
You could argue AFL counts less and some years the league was too small etc... I'm just giving raw data. And I don't think it is too bad.
(signing as Doug B now to avoid confusion)