Random thoughts on completion percentage
Posted by Jason Lisk on March 11, 2010
This post will be exactly what it says--random thoughts on completion percentage as a measure of quarterback performance, as a statistic used in passer rating, etc.
1. Contrary to what you might think, I do not hate completions. Incompletions are a bad thing. A quarterback who completes 50% of his passes for 7.0 yards per attempt would be more productive if a few of those incompletions were instead turned into short completions. He might be a 55% passer with a 7.3 yards per attempt instead. A few more, and he might be a 60% passer with a 7.6 yards per attempt. Clearly, the latter is much better than the former.
2. The issue that comes up with passer rating, though, is not whether incompletions are bad--they are. It's whether the tradeoff of incompletions for higher yards per completion or attempt is bad. It's whether Jason Campbell's 62.3% completions at 6.4 yards per attempt in 2008 is really as good as Mark Brunell's 57.7% at 6.7 yards per attempt three years earlier for the same franchise. Both of those seasons garnered similar passer ratings.
3. Completion percentage is often cited as a measure of accuracy. Clearly, the more accurate you are, the more passes you will complete, all else being equal. The problem, though, is that all else is not equal. Completion percentage is highly dependent on the offensive system, the game situation, and the quarterback (or offensive) philosophy when it comes to attacking the defense, as well as the quarterback's willingness to throw the ball under pressure rather than take sacks. Let's put it this way. Is Chad Pennington really the most accurate passer in NFL history? To me, citing his completion percentage as evidence of being most accurate is a bit like citing the old lady who drives five miles a week as the safest driver because she has no accidents. We need more information than number of accidents or completion percentage before we can make a value judgment about safest/most accurate.
4. I don't think the relationship between completion percentage and success is linear, where we can simply say that every additional completion is worth x points across the board. Let's consider some extreme hypotheticals. We have two teams that both average a whopping 14 yards per attempt. One team completes 100% of its passes; the other 50% (for 28 yards per completion). If I were to model those two, it seems pretty clear that the team that completes 100% would score more. They would score on virtually every possession, only failing to score in limited cases where their 3 consecutive completions net 9 or fewer yards. The 50% team would also score a lot, but would string together a few more droughts. I suspect my 100% completion team with 14 yards per attempt would average about 60 points a game, while the 50% completion team would average closer to 50.
At the other end of the spectrum, we have two teams that average 3 yards per attempt. If one of those teams completed 100% of its passes, it would struggle to maintain drives or even get them started, while a 25% completion team would occasionally string together first downs and get into scoring range. Neither would score much at all, but if I were forced to watch both teams for 24 hours straight as punishment for all my transgressions, I'd take the team with the higher yards per completion to win in a non-shootout.
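The two extremes above can be checked with a back-of-the-envelope calculation. The sketch below is a toy under loud assumptions, not the model the post has in mind: every completion gains exactly the same yardage (yards per attempt divided by completion rate), an offense gets three passes to gain 10 yards, and sacks, interceptions, runs, and fourth downs are ignored. The function name `series_conversion_prob` is made up for illustration.

```python
from math import ceil, comb

def series_conversion_prob(ypa, comp_rate, downs=3, to_go=10):
    """Chance of gaining `to_go` yards within `downs` passes (toy model)."""
    ypc = ypa / comp_rate              # assumption: every completion gains this much
    needed = ceil(to_go / ypc)         # completions required for a first down
    if needed > downs:
        return 0.0
    miss = 1.0 - comp_rate
    # Binomial probability of at least `needed` completions in `downs` tries.
    return sum(
        comb(downs, k) * comp_rate**k * miss**(downs - k)
        for k in range(needed, downs + 1)
    )

for ypa, rate in [(14, 1.0), (14, 0.5), (3, 1.0), (3, 0.25)]:
    print(ypa, rate, round(series_conversion_prob(ypa, rate), 3))
```

Under these assumptions the 100% team at 14 yards per attempt converts every series and the 100% team at 3 yards per attempt never does, while the 50% and 25% teams convert 87.5% and about 57.8% of their series. That matches the direction of the argument, though real gains vary from play to play.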
Now, the question is, where is the line where it begins to switch? Where does a higher completion percentage at the same yards per attempt matter significantly in terms of points?
5. A team that completes a higher percentage (at the same yards per attempt) will have more first down passes. This is true. The comparable team with the lower completion percentage will nevertheless have a higher percentage of completions that make further first downs unnecessary. What do I mean by this? Consider two quarterbacks who each gain 30 yards. The one who went 2 for 2 may have picked up two first downs. The other, who gained it in one pass, picked up one first down and avoided the need to execute and gain at least one more.
6. This one hit me when I was looking at Sonny Jurgensen's career splits as a result of this post on quarterback schedule adjustments. Jurgensen's yards per attempt, touchdown percentage and interception percentage were all worse against the Western Division, supporting the theoretical adjustment to schedule in his case. However, his completion percentage was actually higher against the West.
People talk about quarterbacks who pad their stats when the team is trailing. They'll talk about things like garbage touchdowns or late yardage to put up what appears to be a big game. You never hear people talk about padding the completion percentage, though. For good quarterbacks, these late situations are precisely when they can put up high completion rates, all else equal (I'll get back to that in a second). Defenses that have a lead are willing to concede yards and first downs in exchange for time and to prevent big plays late in a game. This is the situation when the accurate QB should make hay. I play fantasy football in a league that gives points for first downs. I've watched it happen as someone like Brees or Warner racks up first down after first down late when trailing. On the other hand, defenses that are trailing are incentivized to prevent completions, because they need to stop the clock and force a change in possession; a big play may be only marginally more costly than allowing a couple of first downs. In these situations, the same quarterback may find that completing passes is more difficult, but gaining larger chunks of yards is more likely to occur.
The problem with measuring this effect, though, is that good passers usually don't trail against bad pass defenses to begin with, so if they are trailing, it's usually to a pretty good defense. If we could artificially manipulate it, I suspect we would find that Peyton Manning would complete a higher percentage of passes when trailing against the same Ravens defensive unit if we started two artificial games with 15 minutes left, one with the Colts trailing by 10, the other with them leading by 10.
7. Sacks taken should count against completion percentage; sacks avoided by throwing incomplete or out of bounds do. Or at least we should have a separate statistic that measures completions versus total opportunities to pass. A quarterback who throws two passes away rather than taking a sack will have a lower completion percentage, but has saved his team's chances at points. An 18-for-30 game (60%) becomes an 18-for-32 game (56.25%) with two sack-avoided-type incompletions.
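The arithmetic in point 7 can be sketched as follows. The "per opportunity" statistic is only one possible reading of "completions versus total opportunities to pass" (attempts plus sacks in the denominator), and both function names are made up for illustration.

```python
def completion_pct(completions, attempts):
    """Official completion percentage: completions per pass attempt."""
    return 100.0 * completions / attempts

def completion_pct_per_opportunity(completions, attempts, sacks):
    """Completions per pass opportunity, counting sacks taken as failures."""
    return 100.0 * completions / (attempts + sacks)

# Same 32 dropbacks and 18 completions; only the two pressured plays differ.
took_sacks = {"completions": 18, "attempts": 30, "sacks": 2}
threw_away = {"completions": 18, "attempts": 32, "sacks": 0}

for label, qb in [("took sacks", took_sacks), ("threw away", threw_away)]:
    official = completion_pct(qb["completions"], qb["attempts"])
    per_opp = completion_pct_per_opportunity(**qb)
    print(f"{label}: official {official:.2f}%, per opportunity {per_opp:.2f}%")
```

The official number rewards the quarterback who took the sacks (60% versus 56.25%), while the per-opportunity version treats the two games identically.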
8. Back to the question of where the line is at which a completion percentage increase becomes advantageous for the offense. My prediction/guess is that at roughly 7.5 yards per pass attempt, or roughly the point where a team that completes 65% of its passes is still going to average 11.5 yards per completion and throw for a healthy number of first downs per completion, it becomes very profitable to have the more consistent (i.e., higher completion rate) offense compared to the less consistent one. This is probably about one standard deviation better than the average, and would mean that higher completion percentage (for teams with the same yards per attempt) is either a small factor or a negative factor for about 85% of the teams. It matters greatly for the elite passing teams, but probably not for most.
In other words, it matters at the high extreme and was a big reason why the Niners teams of the 1980's and 1990's were so good. Not so much for guys like David Carr and Charlie Frye.
9. Completion percentage, like sack percentage, is somewhat consistent for individual quarterbacks, even when they change teams. This suggests that part of what we are measuring is a repeatable skill and not just entirely system-driven or teammate related. Going forward, I would tend to take the quarterback with the higher comp% over the one with the lower comp% to be able to maintain his yards per attempt rate. In other words, I think it has some predictive value. This has to be tempered by system, because it is pretty clear that the QB's in a Don Coryell-based offense (Fouts, all of the Redskins QB's of the 1980's and early 1990's, and the
10. After writing paragraph #8, I pulled a list of team-pairs going back to 1970 for teams with the exact same yards per attempt in the same season, but completion percentages that differed by 5% or more. I'll probably post that separately rather than lengthen this one.
March 11th, 2010 at 11:07 am
Just a statistical measure question: YPA would seem to be affected by completion %age, given that an incomplete pass lowers the rate. But YPC is independent of completion %age, and thus is not affected by incompletions. So if the idea is to separate out the effect of a completion (# of yards) from the rate of completing passes, shouldn't YPC be used instead?
March 11th, 2010 at 12:05 pm
Just thinking out loud here. What if we had a weighted completion percentage stat? What if we took actual passing yardage divided by attempted passing yardage? Each attempted pass would be calculated based on how far downfield the intended receiver was. There could be 2 numbers: net weighted completion % and gross weighted completion %. Net would include YAC by the receiver. Gross would just be the yardage gained from the pass itself (excluding YAC).
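A rough sketch of that idea, assuming each pass is recorded as a made-up tuple of intended air yards, whether it was completed, and yards after the catch. The tuple layout, the function name, and the sample data are all hypothetical. Throws at or behind the line of scrimmage would add zero or negative attempted yardage and would need separate handling.

```python
def weighted_completion_pcts(passes):
    """Return (gross, net) weighted completion percentages.

    Each pass is (air_yards, completed, yac): the intended depth of the
    throw, whether it was caught, and yards gained after the catch.
    """
    attempted = sum(air for air, _, _ in passes)
    gross_gained = sum(air for air, done, _ in passes if done)
    net_gained = sum(air + yac for air, done, yac in passes if done)
    return 100.0 * gross_gained / attempted, 100.0 * net_gained / attempted

# Hypothetical four-pass sample, not real data.
sample = [(5, True, 3), (20, False, 0), (10, True, 0), (15, False, 0)]
gross, net = weighted_completion_pcts(sample)
print(f"gross {gross:.1f}%, net {net:.1f}%")
```

In this sample the quarterback completes half his passes, but only 30% of the yardage he attempted through the air (36% once YAC is credited), because both misses were the deeper throws.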
March 11th, 2010 at 12:27 pm
I haven't actually seen anyone present evidence that YAC isn't a reproducible QB skill. I could imagine the facts pointing either way; that either some QBs are better at hitting WRs in stride/reading not only which players are open, but which will have time to make a move before getting pounded; or that YAC is a receiver skill that is reproducible for a given WR with change of QB/team; or even that YAC is purely random.
My own research has, however, indicated the mind-numbingly unsurprising fact that to a statistically significant degree, leaguewide, deeper passes have lower completion %s than shorter passes. Surprise! In addition, pass distance distributions are very different for different QBs. Rationally, a useful completion % number would compare a QB's % to what a league-average QB's % would be with an identical pass distance distribution. This is obvious to an extreme degree.
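The comparison described here could be sketched like this. The depth buckets, league rates, and quarterback totals are all invented for illustration, and `expected_completion_pct` is a hypothetical helper, not anything from the research mentioned.

```python
def expected_completion_pct(attempts_by_depth, league_rate_by_depth):
    """Completion % a league-average QB would post on the same mix of throws."""
    total = sum(attempts_by_depth.values())
    expected = sum(
        n * league_rate_by_depth[depth] for depth, n in attempts_by_depth.items()
    )
    return 100.0 * expected / total

# Hypothetical league completion rates by depth bucket (illustrative only).
league = {"short": 0.70, "medium": 0.55, "deep": 0.35}
# Hypothetical quarterback: attempts and completions per bucket.
attempts = {"short": 100, "medium": 60, "deep": 40}
completions = {"short": 72, "medium": 33, "deep": 15}

actual = 100.0 * sum(completions.values()) / sum(attempts.values())
expected = expected_completion_pct(attempts, league)
print(f"actual {actual:.1f}%, expected {expected:.1f}%, diff {actual - expected:+.1f}")
```

The difference between actual and expected is the number that says something about the passer rather than about how deep he tends to throw.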
March 11th, 2010 at 1:50 pm
I second the feeling that throw-aways actually are a good decision, but (stupidly) hurt completion %. From the data on profootballfocus, I took an "adjusted" completion % for 2009 QBs: comp / (att - throw aways - drops - spikes). (I didn't include passes hit at the line, because IMO that can be considered a negative decision to throw a flat ball and telegraph your passing lane.)
Everyone's number naturally goes up, and generally the order/ranking is the same, but some differences get smaller (the difference between Drew Brees' % and Philip Rivers' % drops from about 5% to 2%, as Rivers had many throw-aways).
I would be really interested in some "study" of the effect on comp% of playing against a D that has the lead (at least of which QBs regularly benefit from this).
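A minimal sketch of the adjustment, reading it as completions divided by attempts with throw-aways, drops, and spikes removed from the denominator. That reading is an assumption, and the season line below is invented rather than taken from the profootballfocus data.

```python
def adjusted_completion_pct(comp, att, throw_aways, drops, spikes):
    """Completions over attempts, after removing throw-aways, drops, and spikes."""
    return 100.0 * comp / (att - throw_aways - drops - spikes)

# Hypothetical season line, not taken from any real quarterback.
raw = 100.0 * 300 / 480
adj = adjusted_completion_pct(comp=300, att=480, throw_aways=20, drops=25, spikes=5)
print(f"raw {raw:.1f}%, adjusted {adj:.1f}%")
```

Because the adjustment only shrinks the denominator, every quarterback's number rises, and it rises most for those with the most throw-aways.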
March 11th, 2010 at 3:41 pm
...Rams and Chiefs recently are undersold by passer rating relative to adjusted net yards per attempt in terms of the value they provided
This is an excellent article, but you really need to remove this sentence since it's just not true.
Kurt Warner's ranks in any/a from 1999-2001: 1,1,1. His ranks in passer rating in those years: 1,3,1
Trent Green's ranks in any/a from 2000 and 2002-2005: 3,3,3,5,5. His passer rating ranks in the same years: 2,4,4,7,8.
Marc Bulger's ranks in any/a in 2004-2006: 10,10,8. His passer rating ranks in those years: 8,5,7.
That means there is virtually no difference in their ranks in the two stats (actually based on these numbers any/a (very) slightly overstates their value).
Again, this was an otherwise great post.
March 11th, 2010 at 3:43 pm
Ugh, sorry about the double post but I just remembered Scott Linehan replaced Mike Martz in St. Louis in 2006 so that number should be thrown out, but my point remains the same.
March 11th, 2010 at 4:07 pm
Frug, thanks for the comment. I certainly wrote this from memory and didn't check it. I would say that ANYA favored Green every year except 2000, when they were equal. Bulger took a lot of sacks, so yes, he is a poor example. Philip Rivers currently would probably be a better example.
March 11th, 2010 at 4:32 pm
In response to point #7, I will mention that we can and should measure yds per DROPBACK. This also needs to include positive rushing yds by the QB if applicable. However, I think this would be difficult to measure for more than ~5 yrs ago, as the play-by-play is not easily available. Don't know if Football Outsiders would have total dropback info, nor if the pbp would show the difference between a scramble and a QB draw.
March 11th, 2010 at 5:08 pm
Jason, hate to be nitpicky but even Rivers isn't a great example. Since Turner took over in 2007, Rivers has posted any/a+ of 102, 128, 130 and passer rating+ of 101, 127, 125. The first two years were virtually identical and even the 5 point difference in 2009 was only the difference between ranking second in any/a and third in passer rating. And for his career, (which includes 1 year before Turner) he ranks 2nd in both any/a and passer rating (min. 1500 att.). (The Fouts comparison though still holds)
What I find really interesting is that when I looked at some of the active West Coast QB's, I found that their career any/a+ and passer rating+ are also all almost the same:
Matt Hasselbeck - 103 any/a+, 103 passer rating+
Jeff Garcia - 112, 111
Brett Favre - 109, 110
Donovan McNabb - 107, 108
This really surprised me because I always assumed that WCO inflated passer rating. And while it's true that Warner, Green, Bulger, Hasselbeck, Garcia, Favre and McNabb are not the best examples to use because they are all really good in general, this makes me wonder if modern offenses have begun blending elements of each other, which has caused them to produce similar numbers from different schemes.
March 11th, 2010 at 5:18 pm
Joseph, I agree with you. A rushing attempt is normally a sack avoided, which is a partial reason why running qb's have higher sack rates (the real rate should take successful scrambles into account in the denominator since they are being dinged for unsuccessful ones where they try to break the pocket and get tackled behind the line).
We've discussed some of this internally. Even without PBP data, we can get an educated estimate for rushes that were scrambles versus kneel downs and sneaks. If you look at the game logs for passing qb's like Marino or Manning, you can see that they have very few runs, so knowing long run, total rushes, and yards, you can reconstruct most of their rushes. If a qb has 3 rushes for -3 with a long of -1, that is likely 3 kneel downs. If Peyton Manning has 3 rushes for -10 with a long of +8, you can probably guess a scramble with two kneel downs--I suppose it could have been a designed draw, but not likely. It's not exact, but we can estimate. We still can't tell scramble versus designed run, but we can surmise that -1 yd runs were not scrambles, because they would have been classified as sacks.
It's far tougher to guess for scramblers like Vick or Young who ran 5-7 times a game though.
March 11th, 2010 at 5:33 pm
Frug, no problem. I think I conflated completion percentage with passer rating for Rivers. He is extremely high on YPA and only above average at comp. However, his high td rate score makes up for it in the passer rating formula, so he ends up at similar rankings.
You know, rather than making statements not central to the main point (not that this post had a "main point"), I should probably just do a post looking at guys who were favored strongly by ANYA index, and those strongly favored by passer rating, to see who's on the list. An incomplete and possibly inaccurate quick look shows that the guys with the highest ANYA index score relative to Pass Rate since 1978 are Grogan, 1979, Blake, 1994, Williams, 1980, and Schroeder, 1986. E. Manning 2005 is the most recent extreme disagreement between ANYA and pass rate.
On the other end, it looks like a couple of Brian Griese seasons, Beuerlein in 1998, and two Aikman non-SB seasons (91 and 96) rate at the top of pass rate > ANYA.
March 11th, 2010 at 6:09 pm
I should add that you are dead on about Pennington. He's dink and dunked his way to the top of the completion% charts because he can't throw the ball with any accuracy further than 15 yards, and has to get a running start to throw it more than 30 yards. That would be okay if he exploited one of the key advantages of a short passing game, the short drops and quick plays, but he still gets sacked at a league average rate. He also doesn't throw TDs at a high enough rate to make up for the fact that he is merely above average at avoiding INTs. Of course his biggest issue is the fact that he is incapable of staying healthy in odd numbered years but that's for another post.
March 11th, 2010 at 11:31 pm
Is Chad Pennington really the most accurate passer in NFL history? To me, citing his completion percentage as evidence of being most accurate is a bit like citing the old lady who drives five miles a week as the safest driver because she has no accidents.
...
... you are dead on about Pennington. He's dink and dunked his way to the top of the completion% charts because he can't throw the ball with any accuracy further than 15 yards, and has to get a running start to throw it more than 30 yards.
Which Pennington are you guys talking about?
In 2002, before he was injured, CP's AYA was 8.2, pct was 69%, and rating was 104.
Peyton's had I think only two better years in his career while running a top team built to his personal order. Brady's had only one, on all those Super Bowl teams.
After CP's first rotator cuff he had an AYA of 7.0.
After *two* rotator cuffs he had an AYA of 7.8 on a Mia team that went 1-15 the year before.
Since he first started in 2002 the NFL's average AYA has been 6.26.
Pennington's career AYA is 6.9. So he's been the #1 pct guy *and* a significantly above-average AYA guy -- on some pretty lousy teams, playing with an arm held on by a string.
That Pennington?
Yeah, as a NY football fan I heard all the complaints and wishes that the Jets get rid of Popgun for a *strong armed* QB like Byron Leftwich, or any other of the big *strong arms* that were drafted from Leaf to Boller to JaMarcus. Clemens was drafted because of his strong arm! Now strong-armed #1 pick Sanchez, on the best team the Jets have had since Tuna left a dozen years ago, has just produced a 4.9 AYA (with a 54% pct).
You know, being able to hurl the ball 70 yards through the air doesn't translate into an AYA >6.26.
March 11th, 2010 at 11:48 pm
After our exchange in the prior thread Jason e-mailed me suggesting I run some numbers. I did and was going to e-mail them back, but his follow-up post was already here, so I'll reply here...
The issue ... is not whether incompletions are bad -- they are. It's whether the tradeoff of incompletions for higher yards per completion or attempt is bad. It's whether Jason Campbell's 62.3% completions at 6.4 yards per attempt in 2008 is really as good as Mark Brunell's 57.7% at 6.7 yards per attempt
Actually the issue is the value of completion percentage holding AYA equal.
Low completion pct for the same AYA means higher variance in yards per individual attempt -- more "boom or bust", as the larger number of 0-yd plays has to be offset by a higher average yards per completion to keep AYA unchanged.
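The "boom or bust" claim can be made concrete with a deliberately crude model: every attempt is either a completion worth the same fixed gain or an incompletion worth zero. That two-outcome assumption is a simplification, and plain yards per attempt stands in for AYA here. The function name is hypothetical.

```python
from math import sqrt

def per_attempt_sd(ypa, comp_rate):
    """Std. dev. of yards on one attempt when every completion gains ypa/comp_rate."""
    ypc = ypa / comp_rate
    variance = comp_rate * ypc**2 - ypa**2   # E[X^2] - (E[X])^2
    return sqrt(variance)

for rate in (0.50, 0.60, 0.70):
    print(f"{rate:.0%} at 7.0 per attempt: sd {per_attempt_sd(7.0, rate):.2f} yards")
```

Holding yards per attempt fixed, the spread of single-play outcomes shrinks as the completion rate rises, which is the sense in which the higher-percentage passer is the more consistent one.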
It is a pretty robust general finding across many fields that, when reaching a *target level* of total return is what matters ($X of investment return achieved by date Y, enough points scored to win a game), lower variance is valuable, and higher variance is costly. The reason is that with higher variance events are more prone to "bunch up" -- bunched-up good events go over the target and are wasted, while the offsetting bunched-up bad events cause losses.
An extreme illustrative example I gave earlier was two baseball players, A who hits six home runs in one game, and B who hits one homer in each of six different games. Player A will get in the record books and receive all the celebrations -- but his six homers will win at most one game, maybe. Player B's six homers could win 1, 2, 3 ... as many as 6 games. To win the most games, you want B's performance. Consistent return has positive value.
As a real-world sports demonstration, a new study shows the "Pythagorean theorem", which as we all know projects W-L pct from total scores for/against, becomes significantly more accurate when team variance in scoring is considered, with low-variance teams winning more games. Quoting the author...
~~~~~
"Bottom line: More consistent teams (narrower run distribution) tend to win more games for the same RPG (runs per game). Teams with higher SLG (slugging percentage) tend to have a narrower run distribution. Given two teams with the same RPG, a team with a SLG .080 higher will on average win one more game a season. If their pitching/defense has the same RPG allowed but a SLG allowed .080 lower, that would add another game."
"The more consistent a team is in scoring runs, game to game, the better the team's winning percentage for the total number of runs scored .... runs alone don't tell the whole story ... Consistency is another factor. You want to score runs, and you want to be consistent."
~~~~
The "improved" formula including the measure of consistency reduces the error of Pythagorean projections by about half.
Why all this would not be as true in football as everywhere else is unexplained.
If it is true in football, then for comp PCT *not* to have a positive value in the general case, or actually have a negative value, the argument would have to be that *reduced* consistency of AYA results in anything from no change to *increased* consistency in team scoring. That would require some explanation, and data to back it up.
I can't check game score data to test that (others can!) but I did look at aggregate data for 2007-9, three seasons, all teams.
The dominant factor in passing-scoring relationship should be AYA, as the goal is to move the ball, not complete a lot of passes that don't. But adjusting for AYA, comp PCT as a measure of consistency should add a modest positive value. (Just as in baseball runs F/A is the dominant factor in W-L, but consistency adds modest positive value.)
I ran a multiple regression for scoring, all 96 teams, to separate the effects of AYA and PCT. The result was: 348 + 3.02 AYAn + 0.49 PCTn, where AYAn and PCTn are a team's AYA and PCT divided by the league average.
In plain English it means 348 points (the league average) +/- 30 points for every 10% by which AYA differs from average, and 5 points for every 10% PCT differs from average. Those look reasonable and as expected to me: AYA dominates while PCT has a modest positive benefit, league-wide.
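To make the plain-English reading concrete, here is a small sketch. It assumes the coefficients apply to each stat's percentage deviation from the league average, which is the interpretation that makes 348 the league-average prediction and gives roughly 30 and 5 points per 10%. The league averages used are invented purely to show the arithmetic, and `predicted_points` is a hypothetical name.

```python
def predicted_points(aya, pct, league_aya, league_pct):
    """Season points implied by the quoted all-team regression (as read here)."""
    aya_dev = 100.0 * (aya / league_aya - 1.0)   # % above/below league AYA
    pct_dev = 100.0 * (pct / league_pct - 1.0)   # % above/below league PCT
    return 348.0 + 3.02 * aya_dev + 0.49 * pct_dev

# Hypothetical league averages, chosen only to make the arithmetic visible.
LEAGUE_AYA, LEAGUE_PCT = 6.0, 60.0
print(round(predicted_points(6.6, 60.0, LEAGUE_AYA, LEAGUE_PCT), 1))  # AYA 10% above
print(round(predicted_points(6.0, 66.0, LEAGUE_AYA, LEAGUE_PCT), 1))  # PCT 10% above
```

On this reading a team 10% above average in AYA is projected about 30 points above the league mean, while the same relative edge in PCT is worth about 5.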
Jason suggested to me that I run separate regressions for "best" and "worst" teams, so I did, for the top 25% of teams by scoring, mid 50%, and bottom 25%.
But before giving the results I'll point out how separate subgroup regressions can change and distort the original numbers in several ways...
[] Passing O is less than half of football. So good teams will "outscore" the predicted regression amount for all teams, and bad teams "underscore" it, due to the quality of D, etc. (the top and bottom 25% of teams by about 30 points on average). That by itself will shift the regression line for a group of only good (or bad) teams to fit this different scoring, and move coefficients up and down.
[] Comparing teams selected to be close to each other in scoring/strength greatly reduces the R-Squared, so the regression explains much less -- the numbers are a lot less meaningful, chance is a much bigger factor in results.
[] The variance for AYA is much higher than for PCT. Compared to average, AYA ran 143% to 59%, PCT only from 114% to 80%. So unless comparison is made by standard deviation (which I didn't -- lack of time) at the extreme there'll be distortion.
[] Game play effects matter too -- "correlation is not causation". E.g., more passing correlates with less scoring not because passing hurts scoring but because lack of scoring drives more passing. Winning teams rush more not because rushing causes winning but because winning causes rushing, etc.
I could go on, but these make the point that reading "subgroup" regressions is tricky, especially at the extremes, for very good/bad teams. Lots of caveats. That said...
* For the middle 50%, 48 teams, the regression found 10% of play differential giving +/- 9 points for AYA and 1.4 points for PCT. The same basic ratio as before for all teams, with smaller coefficients (because these are all near .500 teams compared just to each other, so smaller differences).
This is the bulk of the league. To me this is consistent with the expected general rule applying *generally*.
* For the bottom 25%, 24 teams, the result per 10% was 8 points for AYA and negative 2 points for PCT.
But was the negative 2 because high PCT hurts poor teams? Or because poor teams score fewer points, pushing the coefficient down to start, and then because after they are outscored and way behind they face prevent Ds that permit only short passes? Losing causes the -2 instead of the -2 causing losing?
Also these numbers did not pass the statistical significance test, and had a very low R squared of only 0.22, showing they did not account for much of the scoring results.
* For the top 25%, 24 teams (again, just against each other, not the whole league) the result per 10% was 15 points for AYA and 35 points (!) for PCT. But was this big difference because very high PCT for top QBs is a "killer"? Or was it that top teams run up more points the equation has to capture, and after they run up big leads top teams turn to short safe passes as well as running? Winning causes high PCT passes, as well as rushing, instead of them causing winning? Or a mixture of all the above?
Of course, to the extent that high PCT is a "killer" for top QBs on top teams, it is entirely consistent with the idea that consistent performance maximizes scoring and winning. The very best QBs are the most consistent with the highest AYAs.
BTW, for these "top" teams the result was significant and the Rsquare was .55
It seems plausible that while the value of PCT is generally positive and modest, it scales up as higher PCT matches higher AYA in top QBs, compound benefits producing a non-linear rising result. But I don't know, and can't test that.
For what it's worth.
March 12th, 2010 at 12:26 am
Jim, everyone (or at least everyone who reads this blog) knows that arm strength is overrated as an evaluation tool. While there is a certain minimal level of arm strength necessary to succeed in the league (that's why Danny Wuerffel never made it in the NFL despite having good accuracy in college and a fairly high football IQ), anything on top of that is just icing on the cake.
That said, neither of us ever said Pennington was bad, or even below average, just that his completion% tended to exaggerate his true value. When healthy, Pennington has ranged between above average and good, but I don't really think it's fair to call him the most accurate passer in league history.
(I do however stand by my statements about his inability to stay healthy in odd numbered years)
March 12th, 2010 at 1:01 am
Pennington's wikipedia page used to say he was the most accurate passer in NFL history because of his comp. %, but I changed it some time ago because that's not really what comp. % proves. It hasn't been changed back so maybe people are learning. Comp. % really isn't a stat that gets pimped by the networks during NFL broadcasts, or at least I don't think it does. Sure they'll routinely show you the numbers during the game, but you rarely see them focus on it or come up with special graphics to elaborate on it. It still gets much more attention than YPA, unfortunately.
March 12th, 2010 at 7:16 am
Thanks for the comment, Jim. I agree with a lot of what you said. I was considering the best way to look at it, and have considered running linear regression at various YPA ranges over a larger data set. I will post the paired teams when I get some time, which gives us a larger set over time.
March 12th, 2010 at 3:06 pm
Interesting discussion. I think what you're looking for is some kind of functional maximum, based on completion % and depth of attempts, where a QB's impact on team's performance is maximized. I'll use my EPA/play stat as a measure of performance impact. This takes into account down, distance, and yard-line.
Since I can't post a graphic here, I'll just describe the plot. If you plot completion percentage on the x axis and EPA/play on the y axis, you see a very tight linear relationship. (Imagine a Milky Way galaxy from the side, tilted up and to the right.) There is no point of inflection, or 'U' bend, in the graph where really high completion percentages are net negatives.
There is, however, a scattering of QBs in the upper left quadrant of the plot. These are guys with high completion percentage but who were net negatives for their team (think David Carr). I think I can isolate all the QBs with high completion percentages and look at what kind of YPA or YPC (per completion) is necessary for EPA to at least break even.
March 12th, 2010 at 3:35 pm
Ok. I filtered only QBs with > 63% completion rates. 63% is where the guys in the upper left quadrant "break away" from the pack. In order to be safely assured of having a net positive EPA/play you need to have > 10.0 yds per completion.
Of the 81 high-percentage QBs with net positive EPA, all of them had 10.0 or greater YPC. Of the 19 high-percentage QBs with net negative EPA, only 4 of them had 10.0 or greater YPC. 15 of the 19 "David Carrs" had below 10 yds per completion. I think that's your point of inflection, where high completion percentage hurts more than helps.
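The counts in this comment fit into a two-by-two table, which makes the split easier to see. The sketch below only does arithmetic on the numbers reported above; the labels and the helper name are made up.

```python
# Counts of QB seasons with completion rate above 63%, as reported above.
counts = {
    ("ypc>=10", "positive EPA"): 81,
    ("ypc>=10", "negative EPA"): 4,
    ("ypc<10", "positive EPA"): 0,
    ("ypc<10", "negative EPA"): 15,
}

def positive_share(group):
    """Share of a YPC group whose EPA/play was net positive."""
    pos = counts[(group, "positive EPA")]
    neg = counts[(group, "negative EPA")]
    return pos / (pos + neg)

for group in ("ypc>=10", "ypc<10"):
    print(group, f"{positive_share(group):.1%}")
```

Among the high-percentage passers, 81 of the 85 at 10 or more yards per completion were net positive, and none of the 15 below that mark were.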
Data comprise all QBs from 2000-2009, qualifying with >20% of their team's pass attempts in a given season.
March 12th, 2010 at 4:12 pm
One other point...Jim-I don't think a regression like that is going to provide any answers for a couple of reasons. One, YPA includes comp pct within itself. It's essentially YPC * comp%. So when you do a multivariate regression with YPC*Comp% and Comp%, you're naturally going to collect the bulk of the variance within the YPA variable's coefficient.
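The identity behind that first objection is easy to see with the two season lines quoted in the post. This is just the arithmetic of YPA = YPC * comp%, using plain yards per attempt. The function name is made up.

```python
def yards_per_completion(ypa, comp_rate):
    """Invert YPA = YPC * completion rate."""
    return ypa / comp_rate

# Season lines quoted in the post (yards per attempt, completion rate).
campbell = {"ypa": 6.4, "comp_rate": 0.623}
brunell = {"ypa": 6.7, "comp_rate": 0.577}

for name, line in [("Campbell", campbell), ("Brunell", brunell)]:
    ypc = yards_per_completion(**line)
    # Multiplying back recovers YPA, so YPA already carries comp% inside it.
    assert abs(ypc * line["comp_rate"] - line["ypa"]) < 1e-9
    print(f"{name}: {ypc:.2f} yards per completion")
```

Since completion rate is a factor of YPA, a regression on YPA and completion rate is partly regressing on completion rate twice, which is the concern raised above.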
The second reason is that linear regression assumes just that--a purely linear relationship. If I understand correctly, what you're trying to find is a condition where high comp% is *not* linear with respect to team success.
March 12th, 2010 at 4:19 pm
Ok. I promise this is the last one. In case anyone is interested here are the guys with high comp pct (> 63) but with negative net EPA:
Brian Griese
Charlie Frye
David Carr
JP Losman
Tim Couch
Seneca Wallace
Jesse Martin
Daunte Culpepper (his last yr in MIN)
Steve McNair (his last season injured w/ BAL)
Kurt Warner (his abysmal final season with STL--mostly due to lots of fumbles and ints)
Kelly Holcomb
and yes, Chad Pennington (but for just 1 season with the Jets).
March 13th, 2010 at 12:28 am
Just for the record, the AYA - PCT relation is best measured by standard deviation, especially to avoid problems at the extremes.
I said that before but didn't do that before -- so now I have, for the top and bottom subgroups (measured against themselves, with all the caveats)...
The standard deviation for AYA (in yards) and completion PCT (in percentage points)... pts scored per standard deviation ... change in standard deviation per point.
[] The top 25% of teams by points scored:
....... SD ........... pts ..... per pt.
AYA ... 0.85 yds .... 17.71 .... 0.048 yds
PCT ... 2.95 %pts ... 16.31 .... 0.181 %pts
[] The bottom 25% of teams:
....... SD ........... pts ..... per pt.
AYA ... 0.80 yds .... 14.01 .... 0.057 yds
PCT ... 3.87 %pts ... -1.66 .... 2.331 %pts
These are less extreme than the non-SD numbers I put up before.
The negative number for PCT for the bottom teams is tiny -- increasing completion pct by 2.3 percentage points reduces scoring by one point for the season. That's really nothing, with the RSquare very low and the statistical significance test failed.
Personally I'd guess the negative number is the result of bad teams losing, not being able to score, and their bad QBs not being able to throw anything else but short passes. (Analogous to more throwing correlating with more losing not because it causes losing but losing causes throwing.)
For the top teams the value of PCT is very much higher than average (though not as high as my prior comment put it). How much of this is due to higher PCT rising in value with higher AYA and how much to good teams running up leads then switching to safer high pct passes (much as winning teams rush more) I can't say. Probably a mix of both.
As I mentioned before, good teams score more points than average, and bad teams fewer, which shifts the points allocated by the formula between the two groups.
Also:
[] All 96 teams 2007-9:
....... SD ........... pts ..... per pt.
AYA ... 1.04 yds .... 57.7 ..... 0.018 yds
PCT ... 4.34 %pts ... 3.5 ...... 1.234 %pts
These numbers are larger obviously because of the much larger spread from the top to bottom of all teams.
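The last column in these tables appears to be the standard deviation divided by the points per standard deviation, ignoring sign. That is an inference from the numbers, not something stated outright. A short check under that assumption, with a hypothetical helper name:

```python
def sd_change_per_point(sd, points_per_sd):
    """Change in the stat (in its own units) that goes with one point scored."""
    return sd / abs(points_per_sd)

# (group, stat, SD, points per SD) copied from the tables above.
rows = [
    ("top 25%", "AYA", 0.85, 17.71),
    ("top 25%", "PCT", 2.95, 16.31),
    ("bottom 25%", "AYA", 0.80, 14.01),
    ("bottom 25%", "PCT", 3.87, -1.66),
    ("all 96", "AYA", 1.04, 57.7),
    ("all 96", "PCT", 4.34, 3.5),
]
for group, stat, sd, pts in rows:
    print(f"{group:10s} {stat}  {sd_change_per_point(sd, pts):.3f}")
```

Five of the six rows reproduce the posted figures to three decimals. The all-team PCT row comes out at 1.240 rather than 1.234, presumably because the posted 3.5 points is itself rounded.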
It looks to me like the value of PCT overall is positive and quite modest on average, but rises with rising AYA -- although at the top and bottom extremes the numbers are probably somewhat exaggerated by "reverse causation" effects associated with very good/bad teams.
IMHO, FWIW
March 13th, 2010 at 9:51 am
Jim-I don't mean to insult, but it looks like you're a little confused. Using SD instead of raw variables isn't going to help at the extremes. The outliers would just become standardized outliers. What normalizing the variables would do is allow us to compare the resulting coefficients on the same proportionate scale.
What you might be thinking of is using the log of the variables in the regression. That would mute the effect of outliers, but I don't think that's necessary after looking at the data.
Also, like I mentioned above, comp% is a very large component of AYA (or any other form of YPA). When the regression result heavily favors AYA over comp%, keep in mind that a very large part of AYA depends on comp%. "Inside" the regression, the 2 variables are battling over how much variance they account for. YPA is the same as yards per completion * comp%, so the regression is a battle between a team of YPC & comp% vs comp% alone.
March 13th, 2010 at 4:03 pm
Jim-I don't mean to insult, but it looks like you're a little confused...
Of course I'm confused, I'm calculating and posting these things after midnight in a flying rush. No insult taken.
The point is that calculating the regression by SD is better than by % off average, being that the range by % varies so much more for AYA than PCT. One wants to keep measurement units comparable, and a given % of change from average is a lot more for PCT than AYA. (A 30% better than average AYA produces such-and-such. A 30% better than average PCT doesn't exist -- that's what I meant by distortion at the extremes.)
Jason suggested I run a few regressions and now I've run 'em. That's all there's going to be on them from me.
If anyone wants to push the data analysis further, I welcome all and any to do so.
As to my personal opinion on the main subject -- that positive pct has a modest positive value for any given AYA -- I've explained the reasons for my thinking above, and see in the data no reason to change it.
That's all on this from me. IMHO, FWIW.
March 13th, 2010 at 8:43 pm
I gotcha. I understand now.