People, on the other hand, can and do lie. And they often use statistics as an aid, which is why the "damn lies" quote is so popular. I thoroughly approve of heavy doses of skepticism when reading through statistical material (or anything else, for that matter), but there's no reason to throw out the good with the bad. What I'd like to do in this article is to make you aware of some of the ways people can mislead you -- sometimes intentionally and sometimes not -- using statistics. Armed with this knowledge, you can distinguish the damn liars from the people who (while they may be boring and even a bit geeky) are legitimately attempting to convey useful information through the use of numbers.
"The problem arises when people use statistics like a drunk uses a lamp post: for support instead of illumination."
This is by far the most important thing to keep in mind when reading through a statistical study. Illumination means looking at all available evidence to help answer a question. Support means finding and citing particular statistics that support a point that's already been decided upon by the author.
If, when reading through an article, you get the idea that the author had made up his/her mind on the issue before ever looking at the numbers, you should proceed with extreme skepticism. If statistics are not brought in until after a conclusion has been drawn, there is a good chance that you're getting the truth but not the whole truth.
For example, suppose I was trying to sell you on the theory that RBs who get a lot of carries one season are likely to get hurt the following year. I could rant and rave for a few paragraphs about how the human body simply isn't made to withstand the punishment that workhorse NFL backs get, and that they are therefore more susceptible to future injury. Then, to drive home the point, I'd produce this:
"Over the last two years, 17 RBs have gotten 275 or more carries. 11 of those 17 missed time due to injury the following season. 6 of the 17 suffered serious season-ending and/or career threatening injuries the next year."
Pretty convincing, huh? (It's true, by the way). But I've misled you in several ways:
"In the last 15 years, the only RBs to amass 1000 rushing yards, 700 receiving yards, and 9 TDs in a season are Marshall Faulk and Tiki Barber."
It's true. And it sure makes Tiki look good. Think of all the truly great RBs that have come and gone in 15 years, but none of them (except Faulk) could do what Tiki did last year.
There's really nothing wrong with this comment, as long as you recognize it for what it is: essentially meaningless trivia. Did Barber have a fine season last year? Yes. But this blurb somehow implies that it was a truly special season, which it wasn't.
Again, notice that the cutoffs (15, 1000, 700, 9) are specifically crafted to allow Tiki in while keeping others out.
The important thing to realize is that you can put almost anyone in a class with elite players if you choose just the right categories and just the right cutoffs.
Ed McCaffrey?
"The only WRs with at least 1000 yards and 7 TDs each of the last three seasons are Randy Moss, Cris Carter, and Ed McCaffrey."
Ed McCaffrey is a very good WR, but he's not in a class with Moss and Carter as the above quote implies. Tinker with the cutoffs a little, and you'll get guys that are actually more comparable to McCaffrey. Make it two years, 1000 yards, and 6 TDs, and now you've just added Jimmy Smith, Tim Brown, Isaac Bruce, Marvin Harrison, Amani Toomer, and Muhsin Muhammad to the list. Doesn't sound quite so impressive anymore, but by setting cutoffs so that McCaffrey is in the middle, rather than at the bottom, of the list, we get a more realistic assessment of McCaffrey's achievements.
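To see how much work the cutoffs are doing, here is a minimal Python sketch. The seven stat lines are invented for illustration (they are not real players or seasons), and the names `PLAYERS` and `club` are my own. The only point is that the same player can land at the bottom of a tiny "elite" list or in the middle of a larger, more honest one, depending on where the lines are drawn.

```python
# Hypothetical (receiving yards, TDs) lines, invented for illustration only.
PLAYERS = {
    "Receiver A": (1400, 12),
    "Receiver B": (1250, 10),
    "Receiver C": (1020, 7),
    "Receiver D": (1100, 6),
    "Receiver E": (1010, 6),
    "Receiver F": (990, 9),
    "Receiver G": (1300, 5),
}


def club(min_yards, min_tds):
    """Names of every player at or above both cutoffs, in table order."""
    return [name for name, (yards, tds) in PLAYERS.items()
            if yards >= min_yards and tds >= min_tds]


# Cutoffs set exactly at Receiver C's own numbers: C squeaks in at the bottom.
exclusive = club(1020, 7)

# Slightly looser cutoffs: C now sits in the middle of a larger group.
realistic = club(1000, 6)

print(len(exclusive), exclusive)
print(len(realistic), realistic)
```

With the tight cutoffs, Receiver C shares a three-man list with the two best lines in the table. Loosen them a notch and the list grows to five, with C in the middle, which is probably the fairer picture.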
Keenan McCardell?
"The only players with 60 receptions and 850 yards in each of the last 5 seasons are Jimmy Smith, Tim Brown, Cris Carter, and Keenan McCardell."
Hell, if I get a little creative, I can even make Kevin Faulk look good:
"The only players under 25 years old last year who led their team in rushing and had over 450 yards receiving were Edgerrin James, Ahman Green, and Kevin Faulk."
"Last year, Tyrone Wheatley had more rushing yards than Ricky Williams, more rushing TDs than Robert Smith, and more receiving yards than Emmitt Smith."
The key is to select backs who were better than Wheatley last year, but then pick the weakest part of each of their games before comparing with Wheatley. Robert Smith had a great year last year, but only had seven rushing TDs. That's his weakest link. Ricky Williams' yardage total was suppressed by an injury. Emmitt Smith had only 79 receiving yards.
It also doesn't hurt that Williams and Emmitt have a great deal of name recognition. If you're not paying close attention, you might read that and think, "wow, I didn't realize Wheatley is right up there with all those great backs," which is the intended effect.
"The Cowboys are 63-1 when Emmitt Smith carries the ball 25 or more times."
First, note that you're only getting half the story. What's the Cowboys' record when Emmitt doesn't get 25 carries? But that's not the main issue here.
The author of the (fictitious) quote above is trying to convince you that giving Emmitt a lot of carries helps the Cowboys win. In pictures:
Emmitt gets lots of carries ========> Cowboys win
But isn't it possible that that arrow might be pointing the wrong direction? Maybe what's actually happening is that, whenever the Cowboys have the game wrapped up, they give Emmitt a lot of carries at the end to kill the clock. That is,
Cowboys win =======> Emmitt gets a lot of carries
Which is it? I don't know, and the above quote doesn't give you any information either way. In short, just because two things (like Cowboy wins and big Emmitt games) are related -- even strongly related -- does not necessarily mean that one causes the other.
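If you want to convince yourself that the backwards arrow alone can produce a lopsided record, here is a small simulation sketch in Python. Everything in it is an assumption of mine: the 60% win probability, the baseline of 14 to 26 carries, and the 4 to 10 extra clock-killing carries in wins are made-up numbers, not real Cowboys data. The important design choice is that each game's outcome is decided before carries are even looked at, so carries cannot possibly cause wins in this toy world.

```python
import random


def simulate_games(n_games=10_000, seed=0):
    """Simulate (carries, won) pairs where winning causes carries, not vice versa."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        won = rng.random() < 0.6           # outcome decided with no reference to carries
        carries = rng.randint(14, 26)      # assumed baseline workload
        if won:
            carries += rng.randint(4, 10)  # assumed extra carries to kill the clock
        games.append((carries, won))
    return games


def win_rate(games):
    """Fraction of the given games that were wins."""
    return sum(won for _, won in games) / len(games)


games = simulate_games()
heavy = [g for g in games if g[0] >= 25]  # games with 25 or more carries
light = [g for g in games if g[0] < 25]   # everything else

print(f"overall win rate:     {win_rate(games):.3f}")
print(f"win rate, 25+ carries: {win_rate(heavy):.3f}")
print(f"win rate, <25 carries: {win_rate(light):.3f}")
```

The record in the 25-plus-carry games comes out far better than the overall record, even though handing the ball off more does nothing to help win in this simulation. That doesn't prove the real-world arrow points this way, only that a gaudy record by itself can't tell you which way it points.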
The classic (non-football) example of this is that ice cream sales are correlated with violent crime. It's a fact. In months when ice cream sales are high, violent crime rates are also high. When ice cream sales are down, violent crime is down. Does this mean that ice cream causes crime? Maybe, but probably not. The more likely explanation is that some third factor (like maybe the weather) is driving both.
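Here is a rough sketch of that idea in Python, using simulated numbers rather than real sales or crime figures. I'm assuming a made-up world where temperature pushes both ice cream sales and crime up, and neither one affects the other; the coefficients and noise levels are arbitrary. The helper `partial_corr` uses the standard first-order partial correlation formula to ask how related the two are once temperature is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)
# Simulated data: temperature drives both series; they never touch each other.
temps = [rng.uniform(0, 30) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
crime = [1.5 * t + rng.gauss(0, 5) for t in temps]

raw = pearson(ice_cream, crime)
controlled = partial_corr(ice_cream, crime, temps)

print(f"raw correlation:            {raw:.3f}")
print(f"controlling for temperature: {controlled:.3f}")
```

The raw correlation is strong, but it all but vanishes once temperature is held fixed, which is what you'd expect when a third factor is doing all the work.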
To bring this back to football, suppose I produced irrefutable evidence that players who changed teams were more likely to have their numbers drop than players who didn't (I don't know if this is true or not, but suppose it is). Does this mean that changing teams hurts a player's stats? Maybe, but maybe not. Ask yourself if there might be another factor at work affecting both. Age might be such a factor. Maybe players who switch teams are more likely to be old and players whose numbers drop are also more likely to be old. It's possible that this bias is what's causing the correlation and that the team-switching has absolutely nothing to do with it.
Another example: suppose it were true that players with high salaries are more likely to be injured than players with low salaries (I don't know if this is true or not, but suppose it is). Does this mean that Eric Moulds and his new contract should be avoided? Maybe, but probably not. More plausible, I think, is that quarterbacks are more likely to have high salaries and quarterbacks are more likely to get injured. That's probably where the correlation is coming from.
I wish I could say I'd never perpetrated a damn lie, but I can't. I wish I could say I'll never do it again, but I probably will. Like all human beings, I have biases -- some that I'm aware of and some that I'm not -- and these can creep into the work I do. What I can say is that I've never knowingly told you a damn lie. And the best way to make sure that I never tell you one in the future is to let you know how to spot them and invite you to question everything I do and my reasons for doing it.