
Greatest QB of All-Time III: Career rankings

Posted by Chase Stuart on August 11, 2009

Yesterday, I explained the methodology behind my grading of every quarterback-season in NFL history. Today, I'm going to present the career results. As usual, I'll be using the 100/95/90 approach, where each QB gets 100% of his score in his best season, 95% of his score in his second best season, 90% of his score in his third best season, and so on. This is the key to rewarding guys who played really well for a long time, but without killing guys with really bad rookie years or seasons late in their career. It also helps to prevent the guys who were compilers from dominating the top of the list. The table below shows the top regular season QBs in NFL history, using three different metrics.
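The 100/95/90 weighting described above can be sketched in a few lines. This is a minimal illustration (function name is mine, not from the original post); it assumes seasons with negative scores simply get the smallest weights by virtue of being sorted last:

```python
def career_value(season_scores):
    """Career value under the 100/95/90 approach: a QB's best season
    counts at 100%, his second best at 95%, his third best at 90%, and
    so on, each subsequent season weighted 5 percentage points less
    (floored at zero)."""
    ranked = sorted(season_scores, reverse=True)
    weights = [max(0.0, 1.0 - 0.05 * i) for i in range(len(ranked))]
    return sum(w * s for w, s in zip(weights, ranked))
```

For example, a QB with seasons worth 1000 and 500 converted yards scores 1000 + 0.95 × 500 = 1475, regardless of which season came first chronologically.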

"VALUE" shows each quarterback's converted yards over average, as explained in yesterday's post. "REPL" shows each QB's converted yards over replacement, defined as 75% of league average. I like using the "Value" score as a HOF indicator and to answer the question of who were the best quarterbacks ever. However, after the top 30 or so QBs, I like using replacement value, which rewards guys who were good for a long time. Being average for 10 seasons means you were probably a better QB than someone who was good for two seasons. Using three-fourths of league average as the baseline is probably the best way to judge a large group of quarterbacks, like when we want to separate the Eli Mannings from the Jeff Georges and the Trent Dilfers from the Ryan Leafs. If we have to rank a random 100 QBs in NFL history (or if we're trying to judge how good an average draft pick was), the replacement category is best. Deciding who was the 42nd-best and who was the 43rd-best QB ever? I'd use the replacement value category, but note that this only works well for players in the same era. The replacement value formula, for various reasons, is biased towards modern QBs. For both the "VALUE" and "REPL" metrics, I pro-rated non-16 game seasons in the usual manner, splitting the difference between pro-rating and not pro-rating at all (i.e., a 9-game season is pro-rated to a 12.5 game season).
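The pro-rating rule — splitting the difference between full pro-rating and no pro-rating — amounts to using the midpoint of the actual and full season lengths. A sketch, assuming the season's value is simply scaled by that midpoint ratio (function names are mine):

```python
def prorated_games(games_played, full_season=16):
    """Split the difference between fully pro-rating a short season and
    not pro-rating at all: the effective season length is the midpoint
    of the actual schedule and the full schedule. A 9-game season in a
    16-game era becomes (9 + 16) / 2 = 12.5 effective games."""
    return (games_played + full_season) / 2

def prorate_value(value, games_played, full_season=16):
    """Scale a season's value by the ratio of the effective (midpoint)
    season length to the games actually played."""
    return value * prorated_games(games_played, full_season) / games_played
```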

If you want something that is totally era-independent, you'll want to use the third column, "SEARK." That formula measures each quarterback's rank in each season. If you were the #1 QB in the league in any season, you got 10 points; if you were #2, you received 9 points; #3, 8 points, and so on. This actually helps the older QBs since they played in smaller leagues, and therefore it was easier to accumulate more "SEARK" points; however, since older QBs did not stick around as long as modern QBs, I think this metric is pretty era neutral. Note that I combined the AFL and NFL QBs in the '60s for the purposes of the "SEARK" column, although each player was only compared to the other QBs in his own league for the VALUE and REPL categories. Additionally, all AAFC stats have been excluded (sorry Otto Graham), as the NFL does not officially recognize them the way the league does with AFL stats (although the HOF does consider AAFC performances).
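The "SEARK" scoring can be sketched as follows (assuming, as the 10/9/8... pattern implies, that finishing below #10 in a season scores nothing; function names are mine):

```python
def seark_points(rank):
    """10 points for finishing as the #1 QB in a season, 9 for #2,
    8 for #3, ... 1 for #10, and nothing below that."""
    return max(0, 11 - rank)

def career_seark(season_ranks):
    """Sum a QB's seasonal rank points across his career."""
    return sum(seark_points(r) for r in season_ranks)
```

A QB who finished #1, #2, and #3 in three different seasons would score 10 + 9 + 8 = 27 SEARK points.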

Finally, I showed the main team each QB played for, along with what percentage of his career value came with that team. It is possible (see Daunte Culpepper) to get over 100% of your career value with one team, if you are below average with your other teams.

Here are the top 100 QBs in NFL history according to my formula, sorted by converted yards of value over average. An * means the player is in the HOF, while a + means the QB was active in 2008:

28 Comments | Posted in Best/Worst Ever

Greatest QB of All-Time, Version III (Methodology)

Posted by Chase Stuart on August 10, 2009

In 2006, I devised a system to rank every quarterback in NFL history. Not surprisingly, I found that post hopelessly out of date and imprecise after just two years. I created a new formula in June 2008 that was a big improvement, but still left a bit to be desired. I have some controversial thoughts that I'm considering implementing to improve the formula, but I'm not ready to make those changes right now and the season is fast approaching. On the other hand, there are some relatively noncontroversial tweaks I can make to the '08 system that would improve the results, and there is no reason to wait a full year to make those changes. I like the idea of updating the series every two years (methodology, worst QBs ever, best QBs ever, playoff analysis, best overall QBs ever), so I won't be doing a full update this year. Just a methodology discussion today and a look at the best QBs ever (by career) tomorrow. On Wednesday, Doug is going to break out some exciting new data, and I'll show you how those data affect the top QBs on Thursday. Next summer I'll have a full update, but we'll visit some of the issues relating to grading QBs in the coming months (and we'll need feedback from you guys).

For now, I've made three key updates to the formula that are clear improvements to the '08 version. For those that don't remember, here's a quick summary of that method: We begin by calculating adjusted net yards, which is done by starting with passing yards, adding a 10-yard bonus for all passing touchdowns, subtracting 45 yards for all interceptions, and subtracting out the number of sack yards lost. That number (ANY) is divided by adjusted attempts, calculated as pass attempts plus sacks. Then we compare each QB to the league average (excluding the QB in question) to see how many ANY/A each player was above or below league average. That difference is then multiplied by each QB's adjusted attempts to determine how many adjusted net yards over average he added. Finally, to give credit to rushing quarterbacks (but not too much credit), the last step in the old formula was to add adjusted rushing yards (10*rushing touchdowns plus rushing yards) over 4.0 yards per carry; so 400 yards (and zero touchdowns) on 50 carries would be worth +200.

Here are the three changes I'm making:

  • 1) Increasing the value of a touchdown from 10 to 20 yards. This was thoroughly derived last October. In addition to being more precise from a theoretical standpoint, it also conforms to common perception better than the smaller bonus. From now on, all passing touchdowns are worth 20 yards.
  • I have not done any rigorous analysis on the value of an interception, but it's on the to-do list. For now, I'm sticking with the 45-yard penalty as derived by the writers of The Hidden Game of Football.
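Putting the pieces together with the updated 20-yard touchdown bonus, the scoring sketches out as below. This is illustrative only (function names are mine); note that the real method compares each QB to a league average that excludes the QB in question, which is simplified here to a passed-in parameter:

```python
def passing_value(pass_yds, pass_td, ints, sacks, sack_yds, att, lg_anya):
    """Converted passing yards over average, per the updated formula:
    ANY = pass yards + 20*pass TD - 45*INT - sack yards (the TD bonus
    raised from 10 to 20 yards); adjusted attempts = attempts + sacks;
    value = per-attempt edge over league average times adjusted attempts."""
    adj_att = att + sacks
    anya = (pass_yds + 20 * pass_td - 45 * ints - sack_yds) / adj_att
    return (anya - lg_anya) * adj_att

def rushing_value(rush_yds, rush_td, carries):
    """Adjusted rushing yards (rush yards + 10*rush TD) over a baseline
    of 4.0 yards per carry."""
    return (rush_yds + 10 * rush_td) - 4.0 * carries
```

The rushing example from the methodology checks out: 400 yards and zero touchdowns on 50 carries is 400 − 200 = +200.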

25 Comments | Posted in Best/Worst Ever

AFL versus NFL: introduction

Posted by Jason Lisk on March 3, 2009

Next football season will mark the fiftieth season since the American Football League began play in 1960. With that historic anniversary approaching, I thought it would be interesting to take a look back and do an in-depth comparison of the teams in both leagues during the decade between the start of the AFL and the AFL-NFL merger for the 1970 season.

Ask someone who was around during this time how the AFL and NFL compared to each other, and you are likely to get a variety of answers, primarily dependent on where their allegiances lay. I hope to sort through this and provide a detailed statistical look that tries to bring all the available evidence to the table, put it in context, and develop a best estimate that answers both general and specific questions about the teams and leagues, and how they compared before 1970. This all may very well prove to be a fool’s errand, but some of the types of questions which hopefully can be addressed include:

When did the AFL catch up with the NFL during the 1960s and become at least comparable competitors, if ever? Think of it in terms of kind versus degree. To draw a college football analogy, when did the AFL stop being the MAC to the NFL’s Big Ten—where a few teams may be able to be competitive but the rank and file would have trouble—and instead become the Pac-10, where one league or the other may have a better year at any given moment, but where we consider the talent roughly equal over time?

3 Comments | Posted in AFL versus NFL

Go West! (and then go West again)

Posted by Jason Lisk on October 14, 2008

This season, thanks to the schedule rotation adopted in 2002, both the New England Patriots and New York Jets play four games on the West Coast, against Seattle, San Francisco, Oakland and San Diego. Until this year, neither had played more than two regular season road games in the Mountain and Pacific Time Zones since the merger. Over the previous four seasons combined, the Patriots have played two regular season games and three post-season games (including last year's Super Bowl in Arizona) out West, while the Jets have played three regular season games and one post-season game.

How rare is it for an Eastern Time Zone team to play this many games out West in a single season? As it turns out, pretty rare. My research has found thirteen individual seasons when an Eastern team has played four or more regular season games out West. For my purposes, I'll define West as both the Pacific and Mountain Time Zones, so I will include Denver and thus not have to figure out if Arizona was or was not on the same time schedule as the California teams due to daylight savings. Before the merger of the AFL and NFL, it was theoretically impossible for an Eastern team to play four games in the West (though we'll find out below it did happen once before). Here are the teams that have travelled West four or more times in a single regular season since the AFL-NFL merger:

1979 Atlanta Falcons (1-3 in West, 6-10 overall) 
1981 Cleveland Browns (1-3 in West, 5-11 overall)
1988 Atlanta Falcons (2-2 in West, 5-11 overall)
1989 New York Giants (3-2 in West, 12-4 overall)
1990 Cincinnati Bengals (2-2 in West, 9-7 overall)*
1991 Atlanta Falcons (3-1 in West, 10-6 overall)
1992 New York Giants (0-4 in West, 6-10 overall)
1994 Atlanta Falcons (1-3 in West, 7-9 overall)
1994 Cincinnati Bengals (1-3 in West, 3-13 overall)
1994 Pittsburgh Steelers (1-3 in West, 12-4 overall)
1997 Atlanta Falcons (2-2 in West, 7-9 overall)
1998 New York Giants (2-2 in West, 8-8 overall)
2005 New York Giants (2-2 in West, 11-5 overall)

*also lost playoff game at Los Angeles Raiders in 1990 divisional round

Two other teams, the 1987 Cleveland Browns, who lost in the AFC Championship game in Denver, and the 1990 New York Giants, who won at San Francisco in the championship game before winning the Super Bowl, played three regular season games in the West in addition to the fourth game in the conference championship.

It should be no surprise, if you recall that Atlanta played in the NFC West, that they would appear on this list five times. What is surprising is that the New York Giants did it four times, while two other East Coast teams from the same division, the Philadelphia Eagles and Washington Redskins, have never played that many out West.

The sample size of teams here is so low that there is not much meaningful analysis that I can give you as to whether the cumulative effect of this much additional travel in a single season for an East Coast team matters. I went ahead and looked at the simple rating system numbers for each team in the year before and after the extensive Western travelling, compared to the year in question. In the seasons before and after, our Eastern teams had an average SRS of -1.0. In the season in question, the average SRS was -1.3. The Eastern teams performed about 4.0 points worse than their overall SRS in the Western games, which is not that much different from the generally expected 3 points for home field advantage. (I measured this by taking the end of season SRS for the road Eastern team minus the home team, then comparing the actual results versus the expected results from the SRS differences.) There was no real pattern to performing worse or better as the season went on, as a whole, as the first game played out West showed the worst score (relative to season SRS) and the fourth game was the second worst.
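The parenthetical measurement works out to a one-liner; here is a sketch (the function name and sample inputs are mine, purely for illustration):

```python
def margin_vs_expectation(road_srs, home_srs, actual_road_margin):
    """How the road team did relative to the SRS-based expectation,
    ignoring home-field advantage: the expected road margin is simply
    the difference of the two teams' season-ending SRS ratings."""
    expected = road_srs - home_srs
    return actual_road_margin - expected
```

A road team rated +2.0 visiting a host rated -1.0 is expected to win by 3; if it instead loses by 4, it performed 7 points below expectation.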

Of course, New England isn't just playing four games out West. They just concluded back to back games on the West Coast against San Francisco and San Diego, and stayed at San Jose in between games to practice rather than travel back East. Later this year, they will also play Seattle and Oakland in back to back weeks. This will mark the first time in the history of the NFL that an Eastern team has played consecutive Western games on two separate occasions within the same season.

Extended trips to the West Coast were not unusual for the old AFL teams, the Boston Patriots and the New York Titans/Jets (as well as the Buffalo Bills). In the old AFL, at least prior to Miami and Cincinnati joining, every team played all other league members on a home and home basis. The East Coast teams usually played two, or sometimes even three consecutive games on the road against Oakland, San Diego and Denver. The worst travel start for a team in the history of the AFL/NFL has to belong to the 1967 Boston Patriots, and it was because the team played their home games at Fenway Park. The Patriots opened the season with three consecutive losses on the road at Denver, San Diego, and Oakland. They returned East and won at Buffalo on September 24. The Patriots were scheduled to play their first home game against San Diego. However, the Boston Red Sox won the American League pennant for the first time since 1946, and advanced to play the Saint Louis Cardinals in the World Series. The Patriots lost out to the primary tenants. Even though the Series opened in Boston, and moved to Saint Louis the weekend of October 7th and 8th, the Patriots moved their game with the Chargers back to the West, playing a second game in San Diego. That game ended in a 31-31 tie, which happens to be the last time the Patriots franchise played in a game that ended in a tie.

So how have other teams done when they have played back to back games in the West? I found thirty occasions where an Eastern team has played a game in the West, then returned West to play again a week later (including the post-season). Based on Sunday night's game between the Chargers and Patriots, you might guess it had a big impact. Using the SRS differences to account for the relative strength of opponents, it is a true factual statement that those teams collectively performed worse in the second consecutive game on the West Coast than they did in the first. The Eastern road team was better (relative to their performance in the first game on the road trip) 11 times, worse 18 times, and about the same once.

That said, it's not so much that the Eastern teams played really badly in the second game. It's that they played REALLY well, as a group, in the first. Excluding the New England-San Diego game, since I don't have end of year SRS numbers, the average result in the first game was 0.97 points better than expected, without accounting for home field advantage and the fact the Eastern team was on the road. The average result in the second game was, in contrast, 2.16 points worse than expected without accounting for home field, which is not a bad performance for a road team, regardless of where the game is played. I don't see any strong evidence that the performance in the second game was worse than what should be expected for a road team if we had no knowledge of where they played the week before. So New England may be the first team to play back to back games on the West Coast at two different times in the same season, but I don't see any reason to think this is a competitive disadvantage compared to, say, the way the Jets' trips to the West are spaced this season.

10 Comments | Posted in Home Field Advantage

2007 Standings: Simple Ranking System

Posted by Chase Stuart on June 16, 2008

A couple of years ago, Doug described the Simple Ranking System, which is a basic method of ranking just about anything. You can use it to rank NFL teams, as well as NFL offenses and defenses. Here's a quick description of the system:

To refresh your memory, it’s a system that’s been around forever and is extremely basic compared to some of the other power rating systems out there. But I like it because it’s easy to understand. An average team will have a rating of zero. An above average team will have a positive rating while a below average team will have a negative rating. Every team will have a rating equal to their average point margin plus the average of their opponents’ ratings, so the teams’ ratings are all interdependent: the Colts’ rating depends upon the ratings of all their opponents, which depend upon the ratings of all their opponents (some of which are the Colts), and so on.
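The interdependent equations above can be solved with a small fixed-point iteration. This is a sketch of the idea rather than the site's actual implementation; the damping (averaging old and new ratings) and the re-centering are my choices, there to keep the coupled equations from oscillating and to pin the league average at zero:

```python
def simple_ratings(margins, schedule, iterations=500):
    """Solve the SRS fixed point: each team's rating equals its average
    point margin plus the average rating of its opponents.

    margins:  dict team -> average point margin per game
    schedule: dict team -> list of opponents faced (with repeats)
    """
    ratings = dict(margins)
    for _ in range(iterations):
        new = {
            t: margins[t] + sum(ratings[o] for o in schedule[t]) / len(schedule[t])
            for t in ratings
        }
        # Damping avoids oscillation on near-bipartite schedules...
        ratings = {t: (ratings[t] + new[t]) / 2 for t in ratings}
        # ...and re-centering keeps the league average at zero.
        mean = sum(ratings.values()) / len(ratings)
        ratings = {t: r - mean for t, r in ratings.items()}
    return ratings
```

In a toy three-team round robin where the teams' average margins are +2, 0, and -2, the ratings settle at +4/3, 0, and -4/3, and each team's rating does equal its margin plus its average opponent rating.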

The '07 Eagles outscored their opponents by 36 points, or 2.3 PPG. The Eagles had a really difficult schedule, playing nine games against the Patriots, Seahawks, Packers, Giants, Cowboys and Redskins. The Eagles' average opponent was 3.0 PPG better than average, so we can estimate that the Eagles must have been 5.3 theoretical points better than a league average team.

	Ovr Rat	   SOS
nwe	 20.1	   0.4
ind	 12.0	   0.3
dal	  9.5	   1.3
gnb	  9.0	   0.0
sdg	  8.8	   0.8
jax	  6.8	   0.1
phi	  5.3	   3.0
pit	  5.2	  -2.5
was	  4.5	   3.0
min	  3.8	   0.4
nyg	  3.3	   1.9
sea	  1.8	  -4.6
chi	  1.2	   2.1
tam	  1.2	  -2.8
ten	  0.7	   0.5
hou	  0.0	   0.3
cle	- 1.1	  -2.3
cin	- 2.4	  -2.1
nor	- 2.5	  -2.0
det	- 3.6	   2.6
nyj	- 3.7	   1.7
ari	- 3.9	  -4.3
den	- 3.9	   1.6
buf	- 4.1	   2.3
kan	- 5.5	   1.4
car	- 5.8	  -0.8
oak	- 6.0	   1.2
bal	- 6.7	   0.1
mia	- 8.4	   2.3
atl	-10.6	  -0.9
sfo	-11.9	  -2.9
ram	-13.0	  -2.0

The Patriots' +20.1 is by far the highest of all time. The '91 Redskins were +16.6, the '85 Bears +15.9, and only three other teams were +15.0 or higher. The Patriots had a plus/minus differential of +19.7, which is of course amazing; but incredibly, New England accomplished that against an above average schedule.

The Eagles, Redskins, Lions, Bills, Dolphins and Bears all faced rough schedules in 2007; I suspect that all those teams will be at least somewhat undervalued in 2008 because of that. Conversely, the Seahawks and Cardinals had incredibly easy schedules last year. Is it even going out on a limb anymore to say that Arizona will be overvalued this year?

The SRS has a lot of uses, including some predictive ability for the next season.

16 Comments | Posted in History, Statgeekery

History of the NFL’s structure and formats, part one

Posted by Jason Lisk on May 5, 2008

This post will trace the history of the NFL's structure, in terms of league size, expansion (particularly as it applies to the currently existing franchises), length of schedule and format, and playoff structure. I am not going to focus on the champions or specific on-field results each and every year, though I may discuss a few. If you want to see any particular season, you can go here.

Part one will discuss the league up through the 1959 season. Part two will pick up in 1960, the year that the American Football League started. I would encourage comments from anyone if you think I have omitted something important, or have any personal knowledge or historical info. I am recreating this almost entirely from reviewing the yearly standings, franchise indexes, and specific yearly team pages here at pro-football-reference. Some of the historical references to World War II, as well as date checking were confirmed using this site for the Pacific and European Theatres.

11 Comments | Posted in History

Rushing, passing, and sacking simplicity

Posted by Doug on December 17, 2007

As regular readers know, I frequently refer to a scheme that I call the "simple rating system." There are lots of rating systems out there, and I'm not saying the SRS is necessarily better than any of them in any particular sense. I just think it's, for lack of a better word, neat. It creates a set of rankings that are easily interpreted, that "add up," and that just generally make sense.

Here is the post where I explained all the nuts and bolts of the system. Since then, I (and Chase) have used it many times for various things. In this post I talked about decomposing a team's overall rating into an offensive and defensive component. To be more precise, I should probably say a "points scored" component and a "points allowed" component. As was pointed out in the comments, points scored is not a measure of just offense, and points allowed is not a measure of just defense.

Anyway, the point of this post is to apply that same thinking to other stats. Mathematically, there's no difference between points scored and, say, rushing yards. Or passing yards, or sacks, or turnovers. All you have to do is tell the machine that the "score" of the game was the rushing yards for and against and you've got schedule-adjusted rushing ratings. So I finally got around to doing the programming, and I now have Simple Ratings for rushing yards, passing yards, total yards, turnovers, sacks, and sack yards. (If you want to do Simple Ratings for rate stats, like yards-per-rush or sack percentage, it gets a little more complicated, but it can be done. I'm still working on the programming for that, and I'll show it to you when it's done. For now, we'll limit ourselves to counting stats).
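The "tell the machine the score was rushing yards" idea can also be stated as a least-squares problem: find ratings such that the rating difference best matches the stat margin in every game, with the ratings summing to zero (the normal equations of this problem are exactly the SRS equations). This is an illustrative sketch, not the site's implementation, and it assumes numpy is available:

```python
import numpy as np

def stat_ratings(games, teams):
    """Schedule-adjusted ratings for any counting stat. Each game is
    (team_a, team_b, stat_a, stat_b); stat_a - stat_b is treated as the
    "score" of the game, and we solve rating_a - rating_b = margin for
    every game in a least-squares sense, with ratings summing to zero."""
    idx = {t: i for i, t in enumerate(teams)}
    rows, margins = [], []
    for a, b, sa, sb in games:
        row = np.zeros(len(teams))
        row[idx[a]], row[idx[b]] = 1.0, -1.0
        rows.append(row)
        margins.append(float(sa - sb))
    # Extra equation pinning the league average to zero.
    rows.append(np.ones(len(teams)))
    margins.append(0.0)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(margins), rcond=None)
    return dict(zip(teams, sol))
```

Swapping in passing yards, turnovers, or sacks for the stat columns gives the corresponding schedule-adjusted ratings with no other changes.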

So this post will be mostly a data dump with just a bit of commentary thrown in, but I'll follow up on it tomorrow with some (possibly) more interesting analysis that I'll preview after the dump.

All these rankings are for the period 1970--2006.

Points for

TM   YEAR  RATING  RECORD
MIN  1998    13.2    15-1-0
STL  2000    12.6    10-6-0
WAS  1983    11.7    14-2-0
WAS  1991    11.7    14-2-0
IND  2004    11.7    12-4-0
SDG  1982    10.5     6-3-0
BUF  1975    10.4     8-6-0
STL  2001    10.4    14-2-0
KAN  2004    10.0     7-9-0
SDG  2006    10.0    14-2-0

6 Comments | Posted in General, History, Statgeekery

Home Field Advantage and Team Efficiency Stats

Posted by Jason Lisk on November 13, 2007

I'm going to take a look at home field advantage, and whether a team's offensive and defensive passing or rushing efficiency stats have any relationship to it. When I use the term "home field advantage", or "HFA" here, what I really mean is "the difference between the advantage of playing at home, and the disadvantage of playing on the road." But that does not exactly flow off the tongue, so just know that not everything that creates the difference has to do with the home field or characteristics of the home team.

Also, the team efficiency stats (which you can find on each team's page as well as the yearly team stats pages) are not perfect, but they are much better than looking at raw yardage numbers. For example, if a team is averaging 4.0 yards a carry, does this mean the team is consistently gaining 4 to 5 yards on a lot of attempts, or that the team is more like a 3.5 yards per carry team, but one with a few big runs boosting the numbers? We cannot answer that for any particular team, but it is better than nothing. With that in mind, let's look at what team characteristics might be tied to increasing or decreasing home field advantage.

I looked at all teams that finished between 6-10 and 10-6 since Jacksonville and Carolina joined the league (1995-2006). My choice of those records is partially arbitrary-- I could have just as easily narrowed it to 9-7/7-9, or expanded it to 11-5/5-11. But my goal was to look at the middle class of the NFL, teams that generally have some strengths but also some flaws. I felt this dividing line would accomplish that.

For each team, I then looked at the home/road splits in record, and compared it to the team's offensive yards per rush attempt, offensive yards per pass attempt, defensive yards allowed per rush attempt, and defensive yards allowed per pass attempt.

210 total teams finished between 6-10 and 10-6 during the 12 seasons reviewed, an average of over 17 per season--so slightly more than half the teams in the league on average. The entire population averaged 0.586 win percentage at home and 0.417 win percentage on the road, for a +0.169 difference. This would equate to +1.36 more home wins than road wins over the course of a 16 game schedule for the average team.
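The conversion from the winning-percentage gap to wins over a 16-game schedule (8 home games, 8 road games) is simple arithmetic; a sketch using the rounded percentages quoted above:

```python
# Average home and road winning percentages for the 210-team population.
home_pct, road_pct = 0.586, 0.417
gap = home_pct - road_pct          # +0.169
# Over 8 home games and 8 road games, that gap equates to roughly 1.35
# extra home wins with these rounded percentages (the post's +1.36
# presumably comes from the unrounded figures).
extra_home_wins = gap * 8
```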

Two of the categories showed no correlation with changes in home field advantage. These were offensive yards per rush attempt, and defensive yards allowed per pass attempt. Within this population, as both team offensive yards per rush attempt and defensive yards allowed per pass attempt improved, the team's winning percentage, both home and road, improved. However, the differences between home and road stayed fairly constant.

Which leads to the other two categories. Let's start with the stronger of the two, defensive rush yards allowed per attempt. I divided the 210 teams into five roughly equal tiers based on rush defense: excellent (3.6 ypa or lower), above average (3.7 to 3.9), average (4.0 to 4.1), below average (4.2 to 4.4), and poor (4.5 or higher). Here are the home/road splits in winning percentage:

category    no.     home          road          difference
excellent   43      .606          .395          +.211
above avg   50      .614          .403          +.211
average     37      .622          .416          +.206
below avg   41      .537          .419          +.117
poor        39      .548          .460          +.088

It looks like a direct relationship between rush defense and home field advantage, as the better run defenses show an above average home/road difference, while the below average defenses have small splits.

Here are the pass offense numbers, sorted by excellent (7.5 or more ypa), above average (7.1 to 7.4), average (6.7 to 7.0), below average (6.3 to 6.6) and poor (6.2 or lower).

category    no.     home          road          difference
excellent   34      .614          .471          +.143
above avg   36      .608          .429          +.179
average     45      .574          .424          +.150
below avg   50      .578          .396          +.182
poor        35      .568          .382          +.186

These numbers are not nearly as pronounced as the rush defense. There is some tendency for pass offense to be inversely related to home field advantage, as the excellent group is a little below average in home/road difference, and the below average and poor groups perform above average in that respect. However, when we cross-reference rush defense and pass offense, two types of teams emerge that show significant differences in home field advantage.

Fifty-two teams had both an above average or excellent rush defense (3.9 or fewer yards allowed per rush attempt) and a below average or poor pass offense (6.6 or fewer yards per pass attempt). These run stopping, poor passing teams combined to win .604 at home and only .363 on the road, for a difference of +.241. That equates to almost two more home wins than road wins per season on average.

17 of the 52 (32.7%) had at least 3 more home wins than road wins. Only three of these teams finished a season with more road wins than home wins (and all finished with exactly one more road win).

At the opposite end of the spectrum, there were thirty-five teams that finished with a below average or poor rush defense (4.2 or more yards allowed per rush) and an above average or excellent pass offense (7.1 or more yards per pass attempt). These "good passing, can't stop the run" teams won .561 at home and .473 on the road, for a difference of +.088. That is an average of +0.70 more wins at home a season.

Only 6 of the 35 (17.1%) "good passing, can't stop the run" teams won at least 3 more home games than road games. Of these six, three came from Kansas City and Denver, two of the strongest home field advantages in the league. The other three were from dome teams (Detroit 1995, Minnesota 2003, Saint Louis 2004), two of which play in a division with outdoor cold weather rivals.

Almost half of these teams (17 of 35) finished with at least as many road wins as home wins. The 1997 Bengals and 2000 Saints both finished with 4 more road wins than home wins.

If the strength of rush defense does increase home field advantage, there is a potential explanation. If a team is better at stopping the run, it is conceivable that such a team would be somewhat more likely to place its opponent into more 3rd and long situations. This might translate to a bigger advantage at home, where the offense is subject to crowd noise, than on the road, where the home crowd would presumably be quiet to aid the offense. On the other hand, relatively poor passing offense could increase the road disadvantage, for much the same reasons.

5 Comments | Posted in Home Field Advantage

Every game counts

Posted by Doug on September 27, 2007

As everyone knows, there are lots of reasons to dislike the BCS. But today I'll tell you one reason to like it. Or at least one reason I like it. The fact that the computer ranking algorithms play a real role in the process means that, at least theoretically, every one of the dozens of games played each Saturday has the potential to affect your team's chances of making the title game.

As an example, let's take a look back at 2004, when Oklahoma, USC, and Auburn were all undefeated. You'll remember that USC demolished OU in the championship game while Auburn ended up playing a consolation game against Virginia Tech in the Sugar Bowl. In that particular case, Auburn probably wouldn't have ended up in the title game even if they had ranked higher in the computer polls, but the possibility certainly exists that this year (or any year), a few thousandths of a point on a few of the computer rankings could determine who plays in the big game. My personal margin-not-included ranking algorithm, which is very similar to at least one of the official BCS computer polls, shows the following pre-bowl rankings for that season.

  1. SouthernCalifornia         12-  0       22.53 
  2. Oklahoma                   12-  0       20.46 
  3. Auburn                     12-  0       19.75 

Auburn played a slightly weaker out-of-conference schedule than Oklahoma, and the Pac-10 was stronger than the SEC that year, so that's how Auburn ended up third. They were third in almost all the computer polls if I recall correctly. But the margin between Auburn and OU was close enough that changing the outcome of just a game here or there could flip them. The only SEC / Big 12 matchup of the regular season was a very close Texas win over Arkansas. Had Arkansas won it instead, we would have had this.

  1. SouthernCalifornia         12-  0       22.66 
  2. Auburn                     12-  0       21.54 
  3. Oklahoma                   12-  0       16.88

The point is: every single interconference game, especially those between two BCS conferences, has the potential to make significant changes in the rankings.

That's pretty obvious. What's less obvious is that even intraconference games can make a difference. Let's flip the Arkansas/Texas result back, so that OU outranks Auburn again. Now, if you flip the results of the North Texas / Middle Tennessee State game and the Louisiana Tech / UTEP game, Auburn jumps OU again. Why? Because the Big 12 played three games against North Texas, winning all three. So where North Texas finishes in their conference is relevant to determining the overall strength of the Big 12, which is obviously a key factor in determining how strong Oklahoma is. Likewise, SEC teams played a couple of games against La. Tech, so an extra win by La. Tech raises their stature just enough to prop up the SEC and let Auburn slip ahead of the Sooners.
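To make the connectivity effect concrete, here is a minimal margin-agnostic rating in the same spirit. This is purely an illustrative sketch, not Doug's actual algorithm: every win counts as +1 and every loss as -1 regardless of score, ties are ignored, and each team's rating is its average game "margin" plus the average rating of its opponents, iterated to a fixed point (the damping and re-centering are implementation choices to make the iteration settle):

```python
def win_based_ratings(results, iterations=500):
    """Margin-not-included ratings: wins are +1, losses are -1, and a
    team's rating is its per-game win margin plus the average rating of
    its opponents, so beating teams that beat good teams helps you.

    results: list of (winner, loser) tuples
    """
    teams = {t for game in results for t in game}
    opponents = {t: [] for t in teams}
    points = {t: 0.0 for t in teams}      # +1 per win, -1 per loss
    for w, l in results:
        opponents[w].append(l); opponents[l].append(w)
        points[w] += 1.0; points[l] -= 1.0
    ratings = {t: 0.0 for t in teams}
    for _ in range(iterations):
        new = {
            t: points[t] / len(opponents[t])
               + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in teams
        }
        ratings = {t: (ratings[t] + new[t]) / 2 for t in teams}  # damping
        mean = sum(ratings.values()) / len(teams)
        ratings = {t: r - mean for t, r in ratings.items()}      # zero mean
    return ratings
```

On a toy transitive chain (A beats B, B beats C, C beats D), the ratings come out strictly ordered A > B > C > D, and flipping any one result shifts every team connected to it, which is exactly the "every game counts" effect.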

Once you've got that in mind, you begin to realize that you might have a rooting interest in lots of intra-conference games that you never thought you cared about.

If you're an Ohio State fan, you have to root for Oregon (who beat a Big 10 team) to beat Cal (who beat an SEC team) this weekend. If you're a West Virginia fan (one without any particular animosity toward your conference-mates), you were happy about the South Florida win over Auburn and disappointed about the Louisville loss to Kentucky, obviously, but you were also not pleased about Mississippi State's win over Auburn. If you like West Virginia, in fact, you are now a big fan of Auburn and Kentucky in all their SEC games, and you like Michigan State (who beat Pitt) in the Big 10 and Oregon State (who lost to Cincinnati) in the Pac 10.

So while it's very unlikely that the outcome of the Washington / Arizona State game will be the deciding factor in getting Oklahoma or Texas into the championship game instead of Ohio State or Wisconsin, games are always more fun to follow if you have a rooting interest. And whether you know it or not, you almost always do.

8 Comments | Posted in BCS, College

Trivial observations from the simple rating system

Posted by Doug on June 22, 2007

Wednesday's post about the 49ers forced me to dust off the simple rating system and get it updated with 2006 data.

Last May I wrote a long post about the SRS. Read that if you're interested in the nuts and bolts. For now, you just need to know that I frequently use the SRS as my quick gauge of team strength that allows me to compare teams across divisions and across years.
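(For the curious, the nuts and bolts reduce to this: a team's rating is its average point margin plus the average rating of its opponents, with every team's rating solved simultaneously. Here's a toy sketch of that iteration in Python --- my own illustration with an invented three-team schedule, not the actual SRS code:)

```python
# Toy illustration of the Simple Rating System idea:
#   rating = average point margin + average opponent rating,
# solved by simple iteration. The schedule and margins are invented.

def srs(games, iters=500):
    """games: list of (team_a, team_b, point_margin_for_a) tuples."""
    teams = {t for a, b, _ in games for t in (a, b)}
    ratings = {t: 0.0 for t in teams}
    for _ in range(iters):
        new = {}
        for t in teams:
            margins, opp_sum = [], 0.0
            for a, b, m in games:
                if t == a:
                    margins.append(m); opp_sum += ratings[b]
                elif t == b:
                    margins.append(-m); opp_sum += ratings[a]
            new[t] = sum(margins) / len(margins) + opp_sum / len(margins)
        ratings = new
    return ratings

toy = [("X", "Y", 7), ("Y", "Z", 3), ("X", "Z", 10)]
print(srs(toy))  # X ≈ +5.67, Y ≈ -1.33, Z ≈ -4.33; ratings sum to zero
```

The ratings are in points, just like the tables below, and they come out centered on zero for the league as a whole.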

Given their terrible point differential and weak schedule, it is predictable that the SRS thinks the 2006 49ers were the worst 7-9 team ever. Just for fun, I decided to find the best and worst (according to the SRS) teams at each record.

            BEST               WORST
Record    tm  yr  rating     tm  yr  rating
 3-13    ind 1997  -4.3     ari 2000 -15.2
 4-12    cin 1979  -1.3     ari 2003 -12.6
 5-11    phi 1982  +0.8     stl 1985 -10.1
 6-10    den 1999  +3.4     nor 1973  -8.8
 7- 9    kan 2004  +5.3     sfo 2006  -8.7
 8- 8    jax 2006  +7.5     stl 2004  -6.0
 9- 7    sdg 2005  +9.9     ari 1998  -7.4
10- 6    sfo 1991 +10.9     chi 1977  -3.6
11- 5    sfo 1995 +11.8     atl 2004  -2.2
12- 4    pit 1979 +11.9     det 1991  +1.0
13- 3    gnb 1996 +15.3     bal 1970  +0.4

[NOTE: non-16-game-schedule teams have been mixed in with the closest 16-game record.]

That 1999 Broncos team, by the way, was the first team since the 1931 Frankford Yellow Jackets to play an entire season without facing a team that ended up with a losing record.

Another trivial observation: 2006 saw the smallest spread between the league's best and worst teams in nearly a decade. The Raiders may have looked worse than a typical worst-in-the-league team, but objectively they were actually pretty good for a worst-in-the-league team.

 YR     Best        Worst         Diff
2006  nwe +10.2   oak  -9.6       19.8
2005  ind +10.8   sfo -11.1       21.9
2004  nwe +12.8   sfo -13.6       26.5
2003  kan  +8.3   ari -12.6       20.9
2002  oak +10.6   cin -10.5       21.1
2001  stl +13.4   buf  -9.5       22.9
2000  oak  +9.7   ari -15.2       25.0
1999  stl +11.9   cle -14.1       25.9
1998  min +14.9   phi -12.8       27.6
1997  den +10.7   sdg  -8.9       19.6
1996  gnb +15.3   nyj -10.1       25.4
1995  sfo +11.8   nyj -11.2       22.9
1994  sfo +11.6   hou  -7.3       19.0
1993  sfo  +9.7   ind -11.3       20.9
1992  sfo +11.8   nwe -11.0       22.8
1991  was +16.6   ind -17.3       33.9
1990  buf  +8.6   nwe -14.6       23.2
1989  sfo +10.7   dal -10.4       21.1
1988  min +10.9   sdg  -7.4       18.2
1987  sfo +13.1   atl -13.9       26.9
1986  nyg  +9.0   tam -15.4       24.4
1985  chi +15.9   stl -10.1       26.0
1984  sfo +12.7   buf -12.0       24.8
1983  was +13.9   hou -11.5       25.5
1982  nyj +10.3   hou -10.9       21.2
1981  phi  +8.7   bal -15.8       24.5
1980  phi  +9.7   nor -10.4       20.2
1979  pit +11.9   det -11.0       23.0
1978  dal +11.0   sfo  -9.1       20.1
1977  den +11.3   tam -10.7       22.1
1976  pit +15.3   tam -19.7       35.0
1975  pit +14.2   nor -14.1       28.3
1974  was +10.2   atl -12.3       22.6
1973  ram +13.4   hou -16.7       30.0
1972  mia +11.0   nwe -17.4       28.4
1971  bal +10.4   buf -13.4       23.9
1970  min +15.1   bos -15.9       31.0

20 Comments | Posted in General

Will the Colts be tired this season?

Posted by Doug on June 18, 2007

I was listening to The Audible podcast earlier this week, and two thoughts came to mind.

First, it's been a while since I plugged The Audible. Right now they are, among other things, having fifteen-minute conversations with beat writers from each NFL team. If you've got a commute, or if you wear headphones while you exercise, The Audible makes ideal listening material.

And in particular, I got an idea as I listened to the interview with Colts' beat writer Mike Chappell from the Indianapolis Star. Chappell mentioned that, because they had such a long season last year, the Colts have been taking it easy during the offseason. Coach Tony Dungy has given them more time off and has made practices less intense.

Part of the curse of being me is that I am totally incapable of hearing something like this without trying to set up some sort of study on it. We know NFL teams that win a lot of games in Year N tend, as a group, to decline in Year N+1. Part of that is due to regression to the mean, a phenomenon that transcends football. Part of it might be due to the structure of the NFL: the salary cap making it harder to keep star players, having the last draft slot, and so on. But might part of it also be due to the fact that Super Bowl teams play more games and have a shorter offseason?

From what I can gather, football can be a physical game. I don't think it's unreasonable to suggest that a team that has six months to recover from a 16-game season has an advantage over one that has five months to recover from a 20-game season.

Of course teams that play 19- or 20-game seasons will tend to do better the next year, as a group, than teams that play 16. Here is a meaningless chart that confirms that:

Playoff games played Year N      Av Wins Year N+1
              0                        7.3
              1                        8.8
              2                        9.1
              3+                      10.3

But that's to be expected. It's simply another example of causation and correlation not being synonymous. Year N+1 wins are not caused by playing playoff games in Year N. Rather, both of them are caused by the same unnamed factor: being good.

But the interesting (to me, that's who!) question is: given two teams with the same number of regular season wins in Year N, does the one that played more postseason games in Year N figure to do better than, worse than, or the same as the team that played fewer?

Answer: the same. More specifically, if your intuition tells you that "the same" was the right answer, then there is nothing in the data that should cause you to seriously reconsider it. In particular, here is what I did. I looked at all pairs of seasons starting in 1978 (the first year of the 16-game schedule), not counting pairs that included a strike year. For each number of wins starting at 9, I looked at all teams with that number of regular season wins in Year N and then ran a regression of Year N+1 wins versus Year N postseason games played.

For no group of teams did the input variable appear to be significant. For what it's worth, the coefficient was positive for most groups of teams. For example, here are the results for the 12-win group:

Year N+1 wins =~ 8.14 + .58*(postseason games played in Year N)

So based on what the data shows for 12-win teams, every playoff game played in Year N is associated with .58 more wins in Year N+1. But as I said above, that .58 is probably not big enough to infer a real effect, in the same sense that you probably wouldn't conclude that a coin was biased if it came up heads 56 times out of 100, unless you already had some other reason to believe it was biased.
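The fit itself is garden-variety one-variable least squares. A bare-bones sketch is below; the sample numbers are invented stand-ins, not the actual 12-win-team data:

```python
# Bare-bones one-variable least squares, the kind of fit described above.
# In the real study, x was Year N playoff games played and y was Year N+1
# wins; the sample data here is made up for illustration.

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

playoff_games = [0, 1, 1, 2, 3, 0, 2]      # hypothetical Year N playoff games
next_year_wins = [8, 9, 8, 10, 11, 7, 9]   # hypothetical Year N+1 wins
intercept, slope = ols(playoff_games, next_year_wins)
```

With the real data for 12-win teams, this kind of fit is what produced the 8.14 intercept and .58 slope quoted above.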

Age, offseason movement, schedule strength, and countless other factors obviously play roles here too, and they haven't been accounted for. This was just a quick check and it failed to find evidence for a tiring factor, perhaps because Super Bowl coaches like Dungy adjust their teams' schedules accordingly.

10 Comments | Posted in General

Ranking the historical Super Bowl teams

Posted by Doug on February 1, 2007

If not now, when? Everyone else is posting lists of the best and worst Super Bowl teams of all time, so I may as well pile on. This one will be different --- not better --- than most because it's completely objective. This is simply a list of all 74 Super Bowl teams since the merger, ranked according to this simple rating system. So really, I'm not ranking the teams. I'm ranking the teams' distance away from their competitors in the given year.

Recall that, under the simple rating system, the units on Rating are points. So the 1991 Redskins' rating of 17.3 means that they were 17.3 points better than an average 1991 NFL team.

Here is the list; Super Bowl winners are marked with an asterisk. A few observations follow:

TM YR Rating SOS Record
1. *chi 1985 18.1 0.3 18-1
2. *was 1991 17.3 0.4 17-2
3. *gnb 1996 16.3 0.6 16-3
4. *sfo 1989 15.2 -0.0 17-2
5. *mia 1973 14.8 0.4 15-2
6. *sfo 1984 14.4 -1.6 18-1
7. *pit 1975 14.2 0.1 15-2
8. *nwe 2004 13.8 2.7 17-2
9. was 1983 13.7 1.8 16-3
10. stl 2001 13.6 -0.1 16-3
11. *sfo 1994 13.5 -0.7 16-3
12. *dal 1992 13.3 1.0 16-3
13. *nyg 1986 12.9 1.4 17-2
14. *pit 1979 12.8 2.2 15-4
15. *stl 1999 12.2 -4.1 16-3
16. *tam 2002 12.0 0.5 15-4
17. *den 1998 11.9 -1.5 17-2
18. *den 1997 11.7 0.5 16-4
19. *bal 2000 11.7 -0.3 16-4
20. *dal 1971 11.5 -1.7 14-3
21. *was 1982 11.2 1.6 12-1
22. *dal 1993 11.2 1.1 15-4
23. *oak 1976 11.1 2.2 16-1
24. buf 1990 11.1 -0.6 15-4
25. dal 1978 11.0 0.1 14-5
26. *mia 1972 11.0 -2.6 17-0
27. *pit 1978 10.8 -0.6 17-2
28. mia 1984 10.8 -1.4 16-3
29. *dal 1995 10.8 1.1 15-4
30. *dal 1977 10.6 -1.0 15-2
31. den 1977 10.6 3.2 14-3
32. *rai 1983 10.4 1.1 15-4
33. oak 2002 10.1 1.9 13-6
34. mia 1982 10.0 1.8 10-3
35. *pit 2005 10.0 1.2 15-5
36. phi 1980 9.5 0.4 14-5
37. *nyg 1990 9.2 1.0 16-3
38. sea 2005 9.2 -1.3 15-4
39. min 1973 9.1 1.2 14-3
40. min 1976 9.1 1.0 13-3
41. atl 1998 9.0 1.5 16-3
42. *pit 1974 8.6 -0.5 13-3
43. chi 2006 8.6 -2.5 15-3
44. gnb 1997 8.2 -0.2 15-4
45. was 1972 7.9 -0.8 13-4
46. *sfo 1988 7.9 1.1 13-6
47. *nwe 2003 7.6 1.0 17-2
48. mia 1971 7.6 -0.9 12-4
49. dal 1970 7.4 2.2 12-5
50. ind 2006 7.3 2.3 15-4
51. *sfo 1981 7.3 0.6 16-3
52. dal 1975 6.6 0.1 12-5
53. den 1989 6.6 0.9 13-6
54. *oak 1980 6.6 1.4 15-5
55. phi 2004 6.5 -1.5 15-4
56. min 1974 6.4 -0.9 12-5
57. nwe 1996 6.4 -0.4 13-6
58. cin 1988 6.4 -0.7 14-5
59. cin 1981 6.4 -1.0 14-5
60. *was 1987 6.3 -1.3 14-4
61. nwe 1985 6.1 2.5 14-6
62. *nwe 2001 5.3 -0.6 14-5
63. den 1986 5.0 2.9 13-6
64. buf 1991 5.0 -3.1 15-4
65. buf 1993 4.9 0.0 14-5
66. pit 1995 4.8 -0.1 13-6
67. buf 1992 4.6 -0.7 14-6
68. den 1987 4.4 -0.5 12-5
69. nyg 2000 4.4 -1.2 14-5
70. ten 1999 3.9 -0.6 16-4
71. sdg 1994 3.2 0.2 13-6
72. *bal 1970 2.5 -4.4 14-2
73. car 2003 1.9 -0.8 14-6
74. ram 1979 -0.1 -0.8 11-8


  • As you can see, I put the Bears and Colts in there; they're 43rd and 50th respectively. The winner could conceivably move into the top half of the list with a blowout. But regardless, this year's winner will rank among the weaker Super Bowl champs. The loser, on the other hand, will be fairly strong (for a loser).
  • Students of the basic mathematics of ranking systems and of AFL/NFL history will know why I have not included the pre-merger Super Bowl teams on the list. Had I included them, the 1969 Chiefs would have been #1, largely on the strength of what the system perceived to be an incredibly difficult schedule. Because all the information we have from that season (one game) indicates that the AFL was better than the NFL (by 16 points!), the system essentially starts from the assumption that the AFL is the much stronger league.
  • I was born in 1971. I definitely do not remember the Raiders/Vikings Super Bowl of 1976. I definitely do remember the Cowboys and Steelers in 1978. I think I remember the rather forgettable Broncos/Cowboys game in between, but I may be fabricating those memories. Anyway, I had always filed that Bronco team away with the rest of the bumbling Super Bowl losers, but they were a terrific team. They were 14-3 against arguably the toughest schedule of any Super Bowl team in history. They played only four games against teams with losing records. Of their three losses, two were to the eventual champion Cowboys and the other was against the defending champs: an 11-3 Oakland team.
  • We all know the 1999 Rams had a weak schedule, but every time I see it it seems to get worse. Here are the regular season win totals of their opponents: 8, 5, 4, 4, 5, 2, 13, 8, 8, 4, 3, 8, 3, 7, 6, 5. And their playoff opponents were very weak (by playoff standards) too.

23 Comments | Posted in General, History

Another rating system: maximum likelihood

Posted by Doug on December 14, 2006

Several months ago, I spent two posts (1, 2) talking about mathematical algorithms for ranking teams. All the chatter that comes along with the BCS standings has gotten me inspired to write up another one.

This one does not take into account margin of victory, and it is very similar to one of the BCS computer polls. I'll tell you about that at the end of the post.

Let's start with a 3-team league:

A beat B
B beat C
C beat A
A beat C

So A is 2-1, B is 1-1, and C is 1-2. We want to give each team a rating R_A, R_B, and R_C. And we want all those ratings to satisfy the following property:

Prob. of team i beating team j = R_i / (R_i + R_j)

What if we just arbitrarily picked some numbers? Say R_A = 10, R_B = 5, and R_C = 1. If those are the ratings, then (assuming the games are independent) the probability of seeing the results we actually saw would be:

(Prob of A beating B) * (Prob of B beating C) * (Prob of C beating A) * (Prob of A beating C)

which would be

10/(10+5) * 5/(5+1) * 1/(1+10) * 10/(10+1) =~ .0459

To summarize: if 10, 5, and 1 represented the "true" strengths of the three teams, then there would be a 4.59% chance of seeing the results we actually saw. That number (4.59) is a measure of how well our ratings (10, 5, and 1) explain what actually happened. If we could find a trio of numbers that explained the actual data better, it would be reasonable to say that that trio of numbers is a better estimate of the teams' true strengths. So let's try 10, 6, and 2. That gives the real life data a 6.51% chance of happening, so 10, 6, and 2 is a better set of ratings than 10, 5, and 1.

What we want to do is find the set of ratings that best explains the data. That is, find the set of ratings that produces the maximum likelihood of seeing the results that actually happened. Hence the name: this is called the method of maximum likelihood. Imagine you have three dials you can control: one marked A, one B, and one C. You're trying to maximize this quantity:

(R_A / (R_A + R_B)) * (R_B / (R_B + R_C)) * (R_C / (R_A + R_C)) * (R_A / (R_A + R_C))

One way to increase the product might be to turn up the A dial; that will increase the first and fourth of those numbers. But there are diminishing returns to cranking the A dial. Once it's been turned up pretty high, then turning it up further doesn't increase the first and fourth terms much. Furthermore, turning up the A dial decreases the third number in the product, because A lost that third game. So you want to stop turning when the increases in the first and fourth terms are balanced by the decreases in the third.

The game is to simultaneously set all three dials at the place that maximizes the product. How exactly we find that maximum is a bit math-y, so I'll skip it. If people are interested, I can post it as an appendix in the comments [UPDATE: here it is]. But the point is, it can be done.

If we do it in this simplified example, we get this:

Team A: 8.37
Team B: 5.50
Team C: 3.62

[Of course, if you multiplied or divided all those numbers by the same constant, you'd have an equivalent set of ratings. It's the ratios and the order that matter, not the numbers themselves.]
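If you want to play along at home, the dial-turning can be automated with a standard fixed-point iteration: repeatedly replace each team's rating with its win total divided by the sum, over its games, of 1/(own rating + opponent rating). The sketch below uses that scheme, which is a common way to fit this model, though not necessarily how anyone computes the official rankings:

```python
# A sketch of fitting the maximum-likelihood ratings for the toy 3-team
# example by fixed-point iteration: each pass sets
#   R[t] = (t's wins) / sum over t's games of 1/(R[t] + R[opponent]).
# The final scale is arbitrary; only the ratios matter.

def fit_ratings(games, iters=2000):
    """games: list of (winner, loser) pairs. Returns a dict of ratings."""
    teams = {t for pair in games for t in pair}
    R = {t: 1.0 for t in teams}
    for _ in range(iters):
        R = {t: sum(1 for w, _ in games if w == t) /
                sum(1.0 / (R[t] + R[l if w == t else w])
                    for w, l in games if t in (w, l))
             for t in teams}
    return R

games = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
R = fit_ratings(games)
print(R["A"] / (R["A"] + R["B"]))  # ≈ 0.603, the A-beats-B probability
```

The ratings this produces stand in the same ratios as the 8.37 / 5.50 / 3.62 trio above, just on a different (and equally valid) scale.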

Using these numbers we could estimate, for example, that the probability of A beating B is 8.37/(8.37+5.5), which is approximately 60.3%. I've never seen these predictions actually tested on future games. That is, if you look at all games where this method estimates a 60% chance of one team beating another, does the predicted winner actually win 60% of the time? Maybe I'll test that in a future post, but for now it's beside the point. Perhaps the best way to interpret the 60.3% figure is not: this method predicts that A has a 60.3% chance of beating B tomorrow. Rather it's this: assigning a 60.3% probability to A beating B is most consistent with the past data.

This distinction is reinforced when we look at the rankings produced by this method through week 14 of the 2006 NFL season:

TM Rating Record
sdg 4.790 11- 2- 0
ind 3.716 10- 3- 0
chi 3.617 11- 2- 0
bal 3.469 10- 3- 0
nwe 2.439 9- 4- 0
cin 1.714 8- 5- 0
nor 1.666 9- 4- 0
jax 1.617 8- 5- 0
dal 1.256 8- 5- 0
den 1.232 7- 6- 0
nyj 1.209 7- 6- 0
nyg 1.097 7- 6- 0
ten 1.056 6- 7- 0
buf 0.976 6- 7- 0
kan 0.887 7- 6- 0
phi 0.851 7- 6- 0
pit 0.777 6- 7- 0
mia 0.764 6- 7- 0
atl 0.753 7- 6- 0
sea 0.712 8- 5- 0
car 0.603 6- 7- 0
min 0.469 6- 7- 0
cle 0.448 4- 9- 0
hou 0.395 4- 9- 0
gnb 0.391 5- 8- 0
was 0.362 4- 9- 0
stl 0.312 5- 8- 0
sfo 0.306 5- 8- 0
tam 0.278 3-10- 0
ari 0.192 4- 9- 0
oak 0.134 2-11- 0
det 0.101 2-11- 0

The Colts' probability of beating the Lions, according to this method, is 3.72/(3.72+.101), which is about 97.4%. That's a bit higher than my intuition says it ought to be. Part of that, remember, is that the method doesn't take into account margin of victory and therefore does not know that the Colts have squeaked by in a lot of games and were destroyed by the Jaguars. All it sees is a team that has played a very tough schedule and still has nearly the best record in the league. But the other part is that this isn't designed to predict the future, it's designed to explain the past.

I told you that this method is similar to one of those actually in use by the BCS. That method is Peter Wolfe's, and he describes the method here.

The method we use is called a maximum likelihood estimate. In it, each team i is assigned a rating value R_i that is used in predicting the expected result between it and its opponent j, with the likelihood of i beating j given by:

R_i / (R_i + R_j)

The probability P of all the results happening as they actually did is simply the product of multiplying together all the individual probabilities derived from each game. The rating values are chosen in such a way that the number P is as large as possible.

That is precisely the system we've described above, but if you load up all the games and run the numbers, you won't get numbers that match up with the ones Wolfe publishes. I'll explain why in the next post.

18 Comments | Posted in BCS, Statgeekery

BCS thoughts

Posted by Doug on October 20, 2006

The first set of BCS standings was released last weekend. As usual, nobody is happy with "the BCS," but different people are unhappy with different aspects of it, and very few people actually understand it. "The BCS" has basically become a synonym for "something about the structure of college football that I don't like."

It would make a lot of sense for the powers that be to do away with all the formulas and standings and just have a selection committee announce the bowl pairings on December 10th. They could still use the computers in an advisory capacity, as the college hoops people do.

It is admirable, I suppose, that they are attempting to make the process transparent by stating the formula in advance. But if they really wanted to make it transparent, they'd choose open-source computer algorithms. Of the six computer algorithms, only Wes Colley's is fully open to public inspection (major kudos to Colley for this). Peter Wolfe, Kenneth Massey, and Jeff Sagarin give some information about their methods but not enough to completely reconstruct their rankings. The only information on the Anderson-Hester rankings page is so vague that it is totally useless. Richard Billingsley says an awful lot but ultimately leaves us with no real idea of the nuts and bolts of his ranking system, which incidentally is either the high or the low ranking (and is therefore thrown out) on nine of the top 10 teams in the current BCS standings.

Now I don't blame Jeff Sagarin and the others for not publishing the details of their systems. Not one bit. The algorithms are proprietary and, at least in Sagarin's case, I assume he makes money doing what he does. But I do blame the NCAA for choosing these algorithms when there are some perfectly fine open methods out there. David Mease's, for example, is very good in my opinion, as is Colley's, which they do use.

The fact that they did not select open methods tells me one of two things: (1) no one associated with the NCAA really understands any of these ranking methods or knows about the variety of methods that are available, or (2) they specifically do not want the process to be transparent. I suspect it's probably both, but more (2) than (1). The unveiling of the BCS standings each Sunday loses a bit of its suspense if nerds across the internet are able to compute and post them immediately after Saturday night's games. The human polls would still prevent the nerds from being able to compute them exactly, and there are some nerds that do a pretty good job of it as is, but my suspicion is that the NCAA doesn't want transparency. It wants publicity. And the weekly unveiling of the standings provides that.

Enough of the rant.

Of all the teams with a reasonable shot at the title, I think I'll be pulling for the West Virginia Mountaineers. One thing I've noticed about the computer rankings is that the Big East is not the weakest of the BCS conferences, not by a long shot. In fact, almost every reputable computer algorithm that I've seen has them ahead of the Big XII and ACC.

My margin-of-victory-not-included ranking system of choice is similar to Wolfe's and is nearly identical to Mease's (referenced earlier). Here is the top 25:

Team W-L Rating
1. SouthernCalifornia 6- 0 11.65 0.903
2. Michigan 7- 0 11.09 0.899
3. OhioState 7- 0 9.47 0.885
4. Florida 6- 1 6.38 0.844
5. Auburn 6- 1 5.93 0.836
6. Rutgers 6- 0 5.23 0.820
7. Louisville 6- 0 4.96 0.813
8. Arkansas 5- 1 4.85 0.810
9. BoiseState 6- 0 4.67 0.805
10. NotreDame 5- 1 4.57 0.802
11. California 6- 1 4.46 0.799
12. Tennessee 5- 1 4.21 0.791
13. WestVirginia 6- 0 3.88 0.779
14. Texas 6- 1 3.38 0.759
15. Oregon 5- 1 3.23 0.752
16. BostonCollege 5- 1 2.98 0.739
17. Clemson 6- 1 2.90 0.735
18. Nebraska 6- 1 2.90 0.735
19. Wisconsin 6- 1 2.87 0.733
20. GeorgiaTech 5- 1 2.53 0.712
21. Tulsa 5- 1 2.50 0.710
22. WakeForest 6- 1 2.46 0.708
23. TexasA&M 6- 1 2.46 0.707
24. LouisianaState 5- 2 2.43 0.706
25. Missouri 6- 1 2.38 0.702

When trying to sort out the relative strength of conferences, here is the view I like to look at. Here you'll see all the out-of-conference wins and losses by each conference, and the ranks of those teams.

Out-of-conference wins

Big East Big 10 Big XII ACC
29 Navy 10 NotreDame 33 Washington 34 BrighamYoung
41 Maryland 14 Texas 43 SouthFlorida 59 CentralMichigan
46 Ohio 28 Pittsburgh 46 Ohio 64 Houston
47 Miami(Florida) 56 BowlingGreenSta 54 Louisiana-Lafay 66 Connecticut
60 KansasState 56 BowlingGreenSta 63 Texas-ElPaso 69 Syracuse
62 Kentucky 57 WesternMichigan 70 ArkansasState 72 MiddleTennessee
65 Indiana 58 Kent 72 MiddleTennessee 76 Cincinnati
72 MiddleTennessee 59 CentralMichigan 75 SouthernMethodi 79 Wyoming
79 Wyoming 69 Syracuse 82 Army 86 Mississippi
82 Army 71 Idaho 86 Mississippi 92 Rice
85 Akron 73 Vanderbilt 88 Alabama-Birming 97 FloridaAtlantic
87 EastCarolina 76 Cincinnati 91 NorthTexas 98 Troy
90 CentralFlorida 77 NorthernIllinoi 92 Rice 98 Troy
90 CentralFlorida 78 IowaState 93 NewMexico 100 LouisianaTech
94 MississippiStat 85 Akron 97 FloridaAtlantic 115 FloridaInternat
95 Illinois 107 BallState 97 FloridaAtlantic 115 FloridaInternat
95 Illinois 107 BallState 98 Troy 116 Temple
96 Virginia 108 SanDiegoState 99 Toledo 120 1AAOpponent
99 Toledo 116 Temple 100 LouisianaTech 120 1AAOpponent
101 NorthCarolina 117 Miami(Ohio) 100 LouisianaTech 120 1AAOpponent
101 NorthCarolina 117 Miami(Ohio) 105 Marshall 120 1AAOpponent
105 Marshall 118 EasternMichigan 113 Nevada-LasVegas 120 1AAOpponent
115 FloridaInternat 118 EasternMichigan 114 Louisiana-Monro 120 1AAOpponent
116 Temple 120 1AAOpponent 120 1AAOpponent 120 1AAOpponent
117 Miami(Ohio) 120 1AAOpponent 120 1AAOpponent 120 1AAOpponent
117 Miami(Ohio) 120 1AAOpponent 120 1AAOpponent
120 1AAOpponent 120 1AAOpponent 120 1AAOpponent
120 1AAOpponent 120 1AAOpponent 120 1AAOpponent
120 1AAOpponent 120 1AAOpponent
120 1AAOpponent 120 1AAOpponent
120 1AAOpponent 120 1AAOpponent
120 1AAOpponent 120 1AAOpponent
120 1AAOpponent

Out-of-conference losses

Big East Big 10 Big XII ACC
3 OhioState 6 Rutgers 1 SouthernCalifor 6 Rutgers
22 WakeForest 10 NotreDame 3 OhioState 7 Louisville
22 WakeForest 10 NotreDame 7 Louisville 10 NotreDame
29 Navy 10 NotreDame 15 Oregon 13 WestVirginia
31 Iowa 11 California 30 WashingtonState 26 Alabama
48 VirginiaTech 46 Ohio 31 Iowa 28 Pittsburgh
50 MichiganState 66 Connecticut 37 Georgia 32 SouthernMississ
83 Kansas 69 Syracuse 51 ArizonaState 43 SouthFlorida
84 Nevada 61 TexasChristian 57 WesternMichigan
120 1AAOpponent 61 TexasChristian 85 Akron
120 1AAOpponent 64 Houston 87 EastCarolina
74 ColoradoState 120 1AAOpponent
82 Army
99 Toledo
120 1AAOpponent

The Big East doesn't have the quality wins that the Big 10 has, but it also doesn't have the multiple bad losses that the Big XII and ACC have. You can quibble about which set of wins and losses is best, but the point is that --- at least for 2006 --- the Big East is not a joke compared to the majority of the other BCS conferences (the SEC and Pac10 are a cut above the rest). I'm not going to claim that the Mountaineers (or Louisville or Rutgers) have a tough schedule, but by the time they've gotten through it, I'd have no problem calling it adequate.

7 Comments | Posted in BCS, College

Ten thousand 2005s

Posted by Doug on June 6, 2006

Prerequisite reading material:

How often does the best team win?

Ten thousand seasons

Ten thousand seasons again

In the previous posts, I simulated ten thousand generic NFL seasons. In some of those seasons the "Seattle Seahawks" were great. In some they were terrible. In some they played a tough schedule, in others an easy one. In this post, I'll simulate ten thousand 2005 NFL seasons. The Seattle Seahawks will be a very good team in each of them, and they will play an easy schedule in each of them.

Mechanically, the procedures are similar, but philosophically there is a world of difference. The generic seasons had teams whose strengths I knew, so I could say things like "the best team" and "Chicago was not very good." I knew who the best team was and I knew how good Chicago was or wasn't. Exactly. Only because I knew those team strengths could I assign the proper probabilities to each game.

But if I want to simulate the 2005 season, I've got a problem: I don't know the team strengths. Neither do you. We have to guess. The guess I'm going to use is the team's rating from the simple rating system. I'm not going to spend time here making a case that that's the best guess or even necessarily a good guess. If you don't think the simple rating system is an adequate representation of team strength, that's fine. No hard feelings. But you'd better stop reading now, because that's the foundation this post rests on.

For those still with me, I'll make one more disclaimer. If I happen to say something like:

Seattle was the 4th-best team in football.

What I actually mean is:

According to the measurement of team strength that we have agreed upon --- which we acknowledge is imperfect in some obvious and some non-obvious ways --- Seattle appears to be the 4th-best team in football.

I am not trying to quash discussion of the merits of the various ways of estimating team strength and I am well aware of the weaknesses of the one I have chosen. But we've got to pick something and go with it, and the prose just seems to flow a bit better if you allow me to use the above shorthand notation. As you know, I can use all the help I can get with making the prose flow.

Now let's get to it. I'll just throw this summary out and then we'll discuss it.

Rating is the team's rating, which is my guess as to its true strength. Avg Wins is the average number of wins each team had over the course of the 10,000 seasons. Div is the number of division titles each team won. WC is the number of times each team got into the playoffs as a wildcard. PO = Div + WC; it is the number of times each team made the playoffs. SB is the number of times each team made it to the Super Bowl, and Champ is the number of times they won it.

TM Rating AvgWins Div WC PO SB Champ
ind | 10.8 | 11.2 | 7128 1572 8700 | 2688 1640
sea | 9.1 | 11.1 | 8936 395 9331 | 3461 1780
car | 5.1 | 10.4 | 6304 1818 8122 | 1681 741
den | 10.8 | 10.4 | 4342 2797 7139 | 1825 1092
pit | 7.8 | 10.3 | 5741 1543 7284 | 1469 778
nyg | 7.5 | 10.1 | 5083 2534 7617 | 1785 817
sdg | 9.9 | 9.9 | 3190 2907 6097 | 1343 797
jax | 4.8 | 9.6 | 2727 2951 5678 | 674 321
kan | 7.0 | 9.4 | 2298 2842 5140 | 737 371
cin | 3.8 | 9.3 | 3015 1974 4989 | 516 242
was | 6.0 | 9.2 | 2986 2765 5751 | 989 416
chi | 1.4 | 9.1 | 5653 793 6446 | 721 256
nwe | 3.1 | 8.7 | 5001 476 5477 | 425 194
tam | -1.0 | 8.5 | 1969 2333 4302 | 315 103
dal | 3.2 | 8.3 | 1552 2249 3801 | 409 166
atl | -1.2 | 8.2 | 1652 2122 3774 | 236 73
mia | -0.8 | 8.0 | 3385 481 3866 | 165 52
rav | -1.8 | 7.4 | 773 829 1602 | 61 22
min | -3.5 | 7.3 | 1864 774 2638 | 113 36
gnb | -3.7 | 7.1 | 1616 755 2371 | 93 29
ram | -5.1 | 6.9 | 528 1013 1541 | 59 10
cle | -4.2 | 6.8 | 471 518 989 | 32 9
crd | -5.0 | 6.7 | 481 884 1365 | 46 9
phi | -2.3 | 6.6 | 379 878 1257 | 61 16
rai | -2.8 | 6.3 | 170 427 597 | 22 9
det | -6.7 | 6.3 | 867 417 1284 | 27 6
buf | -5.8 | 6.2 | 889 179 1068 | 20 7
nyj | -6.4 | 6.0 | 725 136 861 | 18 6
oti | -7.6 | 5.8 | 108 256 364 | 4 2
htx | -10.0 | 5.1 | 37 112 149 | 1 0
nor | -11.1 | 4.9 | 75 139 214 | 4 0
sfo | -11.1 | 4.7 | 55 131 186 | 0 0

Indianapolis averaged 11.2 wins per season in the simulation. They won the AFC South 71.2 percent of the time, they made the playoffs 87% of the time, they made it to the Super Bowl about 27% of the time and won it 16.4% of the time.

If you were to translate this into an English sentence, it would not be: at the beginning of the season, we should have estimated that the Colts had a 16.4% chance of winning the Super Bowl. It would be something more like: knowing what we now know in hindsight about how good these teams were in 2005, if we were to play the season again with those strengths remaining the same, the Colts would have a 16.4% chance of winning the Super Bowl. Alright, that's pretty bad English but I hope you get the point.

The probability of winning the Super Bowl depends on two things: the team's strength and its schedule (including the playoff schedule). You can see the effect of both in the table. Denver and Indianapolis were essentially equally strong, but the Colts' chances of winning the Super Bowl were significantly higher. And Seattle's were even higher, despite being a weaker team. Carolina had a title chance that was disproportionately high (compared to their true strength) and San Diego's was disproportionately low. We'll revisit them in a moment.

Also note that the spread on average wins --- from Indy's 10.8 to Houston's 4.7 --- is much smaller than the spread on actual wins in the 2005 season. This makes sense. I think it's safe to say that there is almost never an NFL team that is morally a 14-2 team or a 2-14 team. There are, though, probably three or four teams each year --- maybe more --- that are capable of going 14-2 if things break right for them, and there are another few that might slip to 2-14 if things don't. And the result is that we see 14-2 teams and 2-14 teams with some regularity. This idea might strike some people as controversial, but it's really no different from pointing out that no basketball player truly is a 50-point-per-game player, even though certain players do score 50 from time to time.

OK, time to play god. Let's move the Chargers to the NFC South and the Panthers to the AFC West and see what happens.

TM Rating AvgWins Div WC PO SB Champ
sdg | 9.9 | 11.7 | 8209 1158 9367 | 3344 1790
clt | 10.8 | 11.3 | 7255 1615 8870 | 2881 1610
sea | 9.1 | 11.1 | 8921 398 9319 | 2879 1520
den | 10.8 | 10.6 | 5370 2328 7698 | 2134 1196
pit | 7.8 | 10.4 | 5795 1684 7479 | 1563 787
nyg | 7.5 | 10.1 | 5063 2508 7571 | 1478 727
jax | 4.8 | 9.7 | 2592 3360 5952 | 731 317
kan | 7.0 | 9.6 | 2980 2784 5764 | 902 441
was | 6.0 | 9.3 | 3015 2879 5894 | 827 366
cin | 3.8 | 9.3 | 2979 2222 5201 | 570 256
chi | 1.4 | 9.0 | 5504 754 6258 | 522 184
nwe | 3.1 | 8.7 | 4984 487 5471 | 476 195
car | 5.1 | 8.5 | 1385 2076 3461 | 388 179
tam | -1.0 | 8.3 | 979 2862 3841 | 173 63
dal | 3.2 | 8.3 | 1530 2287 3817 | 306 138
atl | -1.2 | 8.0 | 782 2516 3298 | 147 38
mia | -0.8 | 8.0 | 3298 551 3849 | 180 56
rav | -1.8 | 7.4 | 778 960 1738 | 62 23
min | -3.5 | 7.3 | 1933 661 2594 | 88 19
gnb | -3.7 | 7.0 | 1681 634 2315 | 86 22
ram | -5.1 | 6.9 | 559 1011 1570 | 36 6
cle | -4.2 | 6.8 | 448 593 1041 | 25 6
phi | -2.3 | 6.7 | 392 924 1316 | 49 19
crd | -5.0 | 6.6 | 458 793 1251 | 37 12
rai | -2.8 | 6.5 | 265 524 789 | 33 10
det | -6.7 | 6.3 | 882 369 1251 | 28 6
buf | -5.8 | 6.2 | 935 222 1157 | 32 10
nyj | -6.4 | 6.0 | 783 152 935 | 18 2
oti | -7.6 | 5.9 | 113 315 428 | 4 1
htx | -10.0 | 5.1 | 40 127 167 | 1 1
nor | -11.1 | 4.8 | 30 133 163 | 0 0
sfo | -11.1 | 4.7 | 62 113 175 | 0 0


21 Comments | Posted in Statgeekery

Ten thousand seasons

Posted by Doug on June 1, 2006

You'd better read yesterday's post if you haven't yet.

So the plan is to simulate an NFL season a bazillion times and observe what kind of wacky stuff happens. Here are the particulars.

For each simulated season, I will assign each team a true strength, which is a random number drawn from a normal distribution with mean 0 and standard deviation 6. This means that the teams' true strengths are mostly somewhat close to zero. In particular, roughly two-thirds of all teams will have true strengths between -6 and +6, and about 95% of all teams will have true strengths between -12 and +12. As you probably guessed, these numbers were rigged so that they generally agree with the values that the simple rating system produces for real NFL seasons in this decade.
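The strength draw is easy to sketch in code. This is my own illustration, not the actual simulator; the seed and the sanity-check loop exist only so the two-thirds / 95% claims above can be verified.

```python
import random

rng = random.Random(2006)  # seeded so the sanity check below is repeatable

def draw_strengths(n_teams=32, sd=6.0):
    """One simulated league: each team's true strength is an independent
    draw from a normal distribution with mean 0 and standard deviation 6."""
    return [rng.gauss(0.0, sd) for _ in range(n_teams)]

# Sanity check over many simulated leagues: roughly two-thirds of all
# teams should land between -6 and +6, about 95% between -12 and +12.
draws = [s for _ in range(1000) for s in draw_strengths()]
frac_1sd = sum(-6 <= s <= 6 for s in draws) / len(draws)
frac_2sd = sum(-12 <= s <= 12 for s in draws) / len(draws)
```

Note that nothing forces any single league's strengths to average exactly zero, which matches the point made below.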

You'll note that, even though it would be true for a real NFL season, I am not requiring that the teams' strengths in a given year average out to zero. Even though we can't observe it (at least not easily), there must surely be years when the league is stronger and years when it's weaker. And in any case, since we are primarily interested in questions like "how often does the best team in football (for that year) win the Super Bowl," it doesn't matter much.

Each simulated season had the same league structure and schedule as the 2005 NFL: 32 teams divided into eight divisions of four teams each, playing a schedule just like that of the 2005 NFL.

There is one potential complication here, but I think it's minor. In the simulated world, each season is independent of the previous one, so the two intra-conference games in each team's schedule that are determined by last season's finish are instead essentially against random teams. In the real NFL, the seasons are not independent and good teams probably end up playing very slightly stronger schedules in general than bad teams do. Fortunately, this effect isn't nearly as dramatic now as it was in the 80s and 90s.

Also, I was too lazy to program the tiebreakers. All ties were broken by coin flip. I don't think this will affect anything, but let me know if you think I'm wrong about that.

Finally, the individual games are played by using the same formula we used in this post:

Home team prob. of winning =~ 1 / (1 + e^(-.438 - .0826*diff))

where diff is the home team's true strength minus the visiting team's true strength.
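Here is that game formula as a small Python sketch (my code, not the actual simulator). Note what the intercept does: with diff = 0, the home team wins about 61% of the time, which is the model's built-in home-field advantage.

```python
import math
import random

def home_win_prob(home_strength, away_strength):
    """The post's logistic model: P(home win) = 1 / (1 + e^(-.438 - .0826*diff)),
    where diff is home strength minus away strength."""
    diff = home_strength - away_strength
    return 1.0 / (1.0 + math.exp(-0.438 - 0.0826 * diff))

def play_game(home_strength, away_strength, rng):
    """Simulate one game; returns True if the home team wins."""
    return rng.random() < home_win_prob(home_strength, away_strength)

# Evenly matched teams: the 0.438 intercept alone gives the home
# side roughly a 61% chance of winning.
p_even = home_win_prob(0.0, 0.0)
```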

OK, that's that. Let's get to the question of the day, which is: how often does the best team in the NFL win the Super Bowl?

The answer is roughly 24% of the time.

I simulated 10,000 seasons. The table below shows that the best team won the Super Bowl 2,399 times, the second-best team won it 1,448 times, and so on.

Tm# SBwins
1 2399
2 1448
3 1060
4 846
5 670
6 584
7 464
8 388
9 327
10 285
11 231
12 189
13 188
14 151
15 141
16 122
17 113
18 72
19 70
20 55
21 42
22 35
23 36
24 22
25 22
26 15
27 12
28 4
29 4
30 3
31 1
32 1

[NOTE: if you thought this table looked slightly different earlier, you're not seeing things. I accidentally included the wrong table at first, so I updated it about an hour later.]

Very nearly 50% of the time, the Super Bowl champion was one of the best three teams in football. And let me reiterate that when I say "the best team," I am not necessarily talking about the team with the best record. I am talking about the best team. Remember, we're omniscient here. We know which team really was the best.
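To see how a tally like the one above gets produced, here's a deliberately toy version in Python. It is nothing like the real simulation --- no schedule, no divisions, no home field, just 32 random strengths fed into a straight single-elimination bracket --- but it shows the omniscient-observer bookkeeping: rank the champion by true strength and increment a counter.

```python
import math
import random

def win_prob(a, b):
    """Neutral-site logistic on the strength difference (the home-field
    intercept is dropped, since this toy bracket has no home teams)."""
    return 1.0 / (1.0 + math.exp(-0.0826 * (a - b)))

def knockout_champion(strengths, rng):
    """Straight 32-team single-elimination bracket -- a drastic
    simplification of the full-season simulation in the post."""
    field = list(strengths)
    while len(field) > 1:
        field = [a if rng.random() < win_prob(a, b) else b
                 for a, b in zip(field[::2], field[1::2])]
    return field[0]

rng = random.Random(6605)
titles_by_rank = [0] * 32        # titles_by_rank[0] = titles won by the best team
for _ in range(2000):
    strengths = [rng.gauss(0.0, 6.0) for _ in range(32)]
    champ = knockout_champion(strengths, rng)
    rank = sorted(strengths, reverse=True).index(champ)  # 0 = best team
    titles_by_rank[rank] += 1
```

Don't read the toy's exact numbers as comparable to the table above; the point is only the tallying mechanics.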

I'm sure what caught your eye was that the 32nd-best (i.e. the worst) team in the NFL won the title once. Let me tell you about that season.

It was simulated season #6605. The Seattle Seahawks were truly a great team (true strength +15.1) and they played up to their potential, posting a 15-1 regular season record. The Chicago Bears were the worst team in football, but with a true strength of -9.0, they really weren't that bad, at least by worst-team-in-football standards. The NFC North was relatively weak, and Chicago took the division with an 8-8 record.

The Bears' first round playoff opponent was the Carolina Panthers, who were not great (+2.8) but had posted a 10-6 record to finish second in the NFC South. The game was in Chicago, of course, and it was therefore only a mild upset when Chicago won it. Chicago then beat the Saints in New Orleans and the Seahawks in Seattle to reach the Super Bowl.

The AFC was weak in 6605. The best they had to offer was the Jets (+7.2) who had gone 12-4 in the regular season and had beaten the Colts on the road to reach the Super Bowl. The Bears beat the Jets to win the title.

As James points out in his article, there is no single event here that is too hard to believe. It's not unlikely that there wouldn't be any truly terrible teams in the NFL in a given year. It's not unlikely that an entire division would be weak, and it's not unlikely that the worst team in such a division could win the title with an 8-8 record. In their four playoff games, the Bears' probabilities of victory were 37%, 10%, 8%, and 21%. That they'd win those four games is certainly unlikely, but no more unlikely than, say, an NL team getting four straight hits from the bottom of its batting order, and I'll bet you've seen that.

No one of those things is terribly bizarre. Yet they all come together to create an almost-unbelievable occurrence. Almost unbelievable. Ten thousand years is a long time. Most of you have probably been watching NFL football for 20 or 30 years, and think of all the crazy stuff you've seen in that time. If you lived another 500 lifetimes, you'd see some even crazier stuff.

Do you think you'd ever see a team like the 2005 Jets win a Super Bowl? And I'm not talking about the Jets if Pennington and Curtis Martin had stayed healthy. I'm talking about the Brooks Bollinger Cedric Houston 2005 New York Jets. If you gave that team 10,000 tries, would they win a Super Bowl? Before you say no, think about all the times you've seen a really bad team rattle off three or four unexpected victories; think of the Craig Krenzel-led Bears during that stretch in 2004, for example. Such runs are unlikely, but you've seen lots of them. Don't you think that, in 10,000 years, some team could string a couple of those runs together, get some breaks from the schedule, and then fluke out in the playoffs?

It could happen.

25 Comments | Posted in Statgeekery

How often does the best team win?

Posted by Doug on May 31, 2006

In the 1989 Baseball Abstract --- yes, there was a 1989 Baseball Abstract; I'll bet I am one of no more than ten people on the planet who has it on his bookshelf right now --- Bill James wrote an essay called How Often Does the Best Team Actually Win? Here is a passage from the introduction:

Yes, we know that the luck evens out in a 162-game schedule, but how consistently? Does the best team win the division, in a 162-game schedule, 90% of the time? 75%? How often? Does the best team in baseball win the World Championship nine years in ten, or two? Is it possible for a team which is in reality just average --- a .500 team --- to win its division (and therefore possibly even the World Series) by sheer luck?

Note that he was not asking how often the team with the best record wins the World Series, or how often a team with a .500 record would win. He was asking how often the team that really and truly was the best wins the World Series, and how often a team that was morally a .500 team would win the world series (most likely lucking into a better-than-.500 record in the process).

Questions like the former can't be answered by looking at real life results, but only because we don't have enough of a sample size. Questions like the latter, though, cannot be answered using real life results even if we live to see a million seasons. We don't know how often the best team wins the World Series or the Super Bowl because we don't know --- we can't know --- who the best team is. Pittsburgh may have been the best team in the NFL last year, or they may have been the 3rd best or the 14th best. We don't know how often a .500 team wins the Super Bowl because we don't know who the .500 teams are.

If you want to know how often the best team wins the title, you have to build a model. In that model, you can create teams whose strengths you know, because you defined them. James did just that, and he concluded that in Major League Baseball, structured as it was in the late 1980s, the best team wins the World Series 29% of the time. The best team in a division wins that division about 53% of the time. The best team in all of baseball missed the playoffs about 29% of the time.

These results seemed to make him a little uneasy. He closed the essay with this:

The belief that in a 162-game schedule the luck will even out is certainly unfounded --- but that unfounded belief may also be essential to the health of the game. Would people lose interest in baseball if they realized that the best team doesn't win nearly half the time? Would it damage the perception of the World Series if people realized that the best team in baseball only emerges with the crown about 30% of the time?

For me, no. It would not damage my interest, and for most of you also, I suspect. I am afraid that for some people, the answer would be the other one. I've learned a lot of surprising things in running these simulations, and I'm happy to have that knowledge....But I don't think it's something I'm going to talk about a whole lot.

I think he's got it backwards. I think it's the stat geeks who are concerned about the best team winning. The rest of the public, in my experience, doesn't give much thought at all to the notion of "the best team," or is content to define the best team to be the one that wins and/or to appreciate the unpredictability for unpredictability's sake. Furthermore, I don't think that, in a 26-team league, 29% is all that low. If the best team in baseball is morally a .600 team, say, then most years there are probably two or three more teams pretty close to that. If a third-best team that is within a few percentage points of the best team happens to win a title because of luck, I don't think anyone considers that a travesty.

In any event, I --- like James --- find the topic fascinating, and have for years been meaning to replicate this study for the NFL. Yesterday's post was not exactly like the James study, but was in some ways similar. And it prompted me to roll up my sleeves and get the simulator built. So I did. And I'm going to spend the next post or five discussing what kinds of things it spits out. Discussion will include, but not be limited to, the following:

  • I'll answer the same questions James did. How often does the best team in football win the Super Bowl? How often does the best team in football fail to make the playoffs? How often does a sub-.500 team win the Super Bowl? It's not clear how the answers will differ from MLB circa 1989. On one hand, baseball plays ten times more games, which gives the luck more of a chance to even out. On the other hand, football simply doesn't have as much luck built into it as baseball does. If the worst team in baseball beats the best team, it barely raises an eyebrow. In football, that almost never happens.
  • I want to examine various playoff configurations and see how much the answers to the above questions change. For example, what if we eliminated the wildcard and simply let the eight division winners play a standard tournament? Would that increase or decrease the chances of the best team winning? It's not clear, not to me anyway. Sometimes the wildcard lets weak teams in, sometimes it lets strong teams in. What if we had four divisions of eight instead of eight divisions of four? How would that change things? What if, as a friend of mine advocates, we have two conferences of 16 teams each and no divisions at all?
  • I also want to briefly investigate questions along the lines of, how often does a sub-.500 team win its division? Unlike the first bullet, here I'm not talking about teams that were morally sub-.500. I'm talking about teams whose record was under .500. Similarly, we can investigate questions like, how often should we see an undefeated team? How often should we see a winless team? What are the chances of a four-way tie in a division?
  • James didn't do this, but I think it will be fun to take a look at some specific teams in specific years. In the previous post, I talked about what would happen if we switched the 2004 Colts and Falcons prior to the playoffs. Now I'll talk about what would have happened if we had switched them before the season started. This will require an extra step (i.e. a leap of faith) which I'll explain when the time comes. As another example, I talked last week about the Chargers having a rough schedule last year. What if they had played the Panthers' schedule last year and the Panthers had played theirs?

Many of these ideas were touched upon in the comments to yesterday's post. If you have more suggestions of questions to ponder, bung them down in the comments.

27 Comments | Posted in Statgeekery

Conference imbalance and playoff fairness

Posted by Doug on May 30, 2006

Last week I posted some quick lists of bad teams that made the playoffs and good teams that didn't.

In the comments of the former appeared this:

2004 really was a bad year for the NFC! I can see at least 4 teams on the list [of below average teams that made the playoffs], and the Falcons are 16th, despite IMO being clearly the second best team in the conference that season.

Four of the six playoff teams in the NFC that year were indeed below average according to the simple rating system. In fact, according to that system, 14 of the 16 teams in the NFC were below average. The average rating of all AFC teams was +7.8, which means the average rating of all NFC teams was -7.8, which means that an average AFC team was 16 points better than an average NFC team in 2004. I'll do a full post (or more) on conference imbalance someday, but for now I'll just say that that differential is the highest since the merger. The NFL was an absurdly imbalanced league in 2004.

This is probably the place to remind everyone, self included, that the ratings are just rough estimates and we should be attaching some mental error bars to them. In particular here, I think the Eagles' rating is likely an understatement of their strength, because they mailed in their last three games. This would have a ripple effect on the rest of the NFC, which might mean that, really and truly, only 11 of the 16 teams were below average instead of the 14 we estimated above. Or something like that. Anyway, it doesn't change the fact that the NFL was an absurdly imbalanced league in 2004.

Consider the Colts and the Falcons, for example. In order to reach the Super Bowl, the Colts would have had to first beat a Denver team that was arguably better than any team in the NFC. Then they would have had to beat a 14-2 team and a 15-1 team --- both of which compiled their records against tougher-than-average schedules, I might add --- on the road. That's rough. All the Falcons had to do was win two games, one of them against a below-average opponent. If you believe that teams who accomplished more in the regular season should be rewarded with an easier postseason road, something which is implicitly assumed in the postseason structure of every sports league I'm aware of, then you have to consider this unfair.

I decided to investigate just how unfair it was. The basic idea is this: estimate the Colts' chances of reaching and/or winning the Super Bowl, and compare it to what their chances would have been had they been in the other bracket.

The first thing we need to do is find a formula that relates two teams' ratings to their chances of winning a game between the two of them. I'll skip the details, but here is the formula I used:

Home team prob. of winning =~ 1 / (1 + e^(-.438 - .0826*diff))

where diff is the home team's rating minus the visiting team's rating. If the home team is 7 points better than the road team, this model gives the home team a 73% chance of winning. If the home team is 7 points worse, this model gives the home team a 46% chance of winning. I wouldn't go to war with any bookies using this alone, but it should serve our purpose here, which is to give us the rough estimates needed to simulate the playoff tournament a few bazillion times. That will then give us a rough estimate of each team's probability of winning the Super Bowl.
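A sketch of what that playoff simulation might look like in Python. This is my reconstruction, not the author's code: I'm assuming the standard NFL bracket (top two seeds get byes, reseeding after each round, higher seed hosts) and I'm dropping the 0.438 home-field intercept for the neutral-site Super Bowl, which the post doesn't address. The ratings below are made-up placeholders, since the 2004 ratings aren't listed here.

```python
import math
import random

def win_prob(team_a, team_b, ratings, a_home=True):
    """P(team_a wins). The 0.438 intercept is the home-field term from the
    formula above; dropping it for the neutral-site Super Bowl is my own
    assumption."""
    x = 0.0826 * (ratings[team_a] - ratings[team_b])
    if a_home:
        x += 0.438
    return 1.0 / (1.0 + math.exp(-x))

def conference_champ(seeds, ratings, rng):
    """seeds[0] is the 1 seed. Top two seeds get byes, the better seed
    hosts every game, and teams are reseeded after each round."""
    def game(home, away):
        return home if rng.random() < win_prob(home, away, ratings) else away
    w1 = game(seeds[2], seeds[5])                 # wild card: 3 vs 6
    w2 = game(seeds[3], seeds[4])                 # wild card: 4 vs 5
    better, worse = sorted([w1, w2], key=seeds.index)
    d1 = game(seeds[0], worse)                    # 1 seed draws worst survivor
    d2 = game(seeds[1], better)
    home, away = sorted([d1, d2], key=seeds.index)
    return game(home, away)

# Hypothetical ratings for illustration only -- NOT the 2004 numbers.
ratings = {"pit": 8, "nwe": 9, "ind": 10, "sdg": 6, "nyj": 2, "den": 5,
           "phi": 7, "atl": 2, "gnb": 1, "sea": 0, "stl": -2, "min": -1}
afc = ["pit", "nwe", "ind", "sdg", "nyj", "den"]   # seed order 1..6
nfc = ["phi", "atl", "gnb", "sea", "stl", "min"]

rng = random.Random(2004)
n = 20000
reach = {t: 0 for t in ratings}
wins = {t: 0 for t in ratings}
for _ in range(n):
    a = conference_champ(afc, ratings, rng)
    b = conference_champ(nfc, ratings, rng)
    reach[a] += 1
    reach[b] += 1
    champ = a if rng.random() < win_prob(a, b, ratings, a_home=False) else b
    wins[champ] += 1
```

Dividing each team's `reach` and `wins` counts by `n` gives estimates analogous to the ReachSB and WinSB columns below.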

Here were each team's estimated chances of reaching and winning the Super Bowl at the beginning of the playoffs in 2004:

ReachSB WinSB
1. pit 35.4 22.1
2. nwe 35.7 24.6
3. ind 13.5 9.2
4. sdg 9.2 5.9
5. nyj 3.4 1.8
6. den 2.8 1.8

1. phi 56.3 22.4
2. atl 19.5 5.4
3. gnb 11.5 3.5
4. sea 6.4 1.6
5. stl 2.4 0.6
6. min 3.9 1.1

Anyway, let's see what happens if you switch the Colts and Falcons, giving the Colts the two seed in the NFC and the Falcons the three seed in the AFC:

ReachSB WinSB
1. pit 37.7 19.8
2. nwe 41.7 24.9
3. atl 2.0 0.6
4. sdg 9.9 5.3
5. nyj 3.8 1.8
6. den 5.0 2.6

1. phi 41.2 17.1
2. ind 47.1 24.8
3. gnb 5.2 1.6
4. sea 3.3 0.8
5. stl 1.2 0.2
6. min 2.1 0.5

The Colts' chances of reaching the Super Bowl would have been about three to four times greater had they been in the other conference. The Falcons' chances would have decreased by a factor of 10 had they been in the other conference. The Bills missed the playoffs in the AFC. Had they been the #6 seed in the NFC, they would have had a 15% chance of getting to the Super Bowl.

Finally, this comes from the comments of the "best non-playoff team" post:

Don’t forget, that 1991 San Francisco team lost to the Falcons on a Hail Mary pass (Tolliver to Haynes, I believe for 44 yards). If that pass is incomplete, SF goes 11-5 and wins the division, NO is a wildcard team and Atlanta misses the playoffs entirely.

Had things played out that way, San Francisco would have had an estimated 16% chance at reaching the Super Bowl and a 10% chance of winning it, and those numbers would be quite a bit higher had the 1991 Redskins not been such a juggernaut.

Yes, yes, I know. That's the way the ball bounces, that's why they play the games, great teams will find a way to overcome bad breaks, and so on and so forth. Anyone with the urge to post, "the Patriots won the 2004 title on the field and that's all that matters" will not be telling me anything I don't know. I get that. I am aware that it's meaningless to say that being in the AFC cost Indianapolis .156 Super Bowl titles in 2004.

For some reason, it's something I wanted to know anyway.

9 Comments | Posted in History, Statgeekery

The best non-playoff team in history

Posted by Doug on May 25, 2006

It might just be the 2005 San Diego Chargers.

If you go by the basic power rating system, the Chargers were the third best team in the NFL last year with a rating of +9.9, which means that, if you adjust for the schedule they played, they were about 9.9 points better than an average team. According to that metric, the Chargers were the third-best team since the merger to be watching the postseason on TV:

TM YR Rating
sfo 1991 10.4
cin 1976 10.0
sdg 2005 9.9
ram 1970 9.3
mia 1975 8.9
buf 2004 8.1
mia 1977 7.4
stl 1970 7.3
den 1976 7.2
kan 2005 7.0
buf 1975 6.6
sea 1986 6.5
cin 1989 6.5
kan 1999 6.4
ram 1971 6.3
oak 1999 6.2
hou 1975 6.1
min 1986 6.1
bal 2004 6.1
kan 2002 6.1
mia 2002 6.1
hou 1977 6.0
nwe 1980 6.0

Using this rating system to compare across years requires a bit of interpretation. This doesn't say the 2005 Chargers were a better team (or a worse team) than, say, the 1977 Oilers. It says that the 2005 Chargers were better, relative to their league, than the 1977 Oilers were, relative to theirs. It seems to me that's an appropriate metric by which to judge meaningless trivia like "best non-playoff team in history."

If you click on the 1991 49ers and the 1976 Bengals, you'll see that each of them has a pretty strong claim to this title as well. The 49ers were third in the NFL in points scored and fourth in points allowed. The Bengals ranked sixth and seventh in those two categories. They were 10-4, with all four losses coming against playoff teams, including two to the eventual Super Bowl champion Steelers.

But I think the simple rating system I'm using actually understates the Chargers' strength. If memory serves (correct me if I'm wrong), the loss to Denver in week 17 was essentially an exhibition game, as both teams' postseason destinations were already sealed. Further, the Chargers' loss to Philadelphia looks like a bad loss to the computer, but at the time, the Eagles still had Owens and McNabb and were among the best teams in the NFC. Likewise, there is little shame in their loss to the Dolphins, who were in the middle of a six-game win streak when they beat San Diego.

On the flip side, who is the worst playoff team of all time? Unlike the above, where you could reasonably argue for a few different teams, this one is not debatable. I knew who it was before running the numbers, but the numbers confirmed it. I'll write about them in a future post.

16 Comments | Posted in General

A very simple ranking system

Posted by Doug on May 8, 2006

My friend Joe Bryant says that the BCS bowl matchups are like getting a shrimp cocktail at Morton's Steakhouse. Sure, it's better than what you normally eat, but at the same time it's frustrating and disappointing because you can see a bunch of far preferable alternatives right there in front of your eyes. I tend to agree. Nonetheless, it is not in any way an exaggeration to say that the BCS revived my interest in college football. Not because of the matchups the system has produced, but because it gave me an excuse to learn some very interesting mathematics.

As you probably know, the participants in the BCS championship game are determined in part by a collection of computer rankings. Those computer rankings are implementing algorithms that "work" because of various mathematical theorems. At some point, I'm going to use this blog to write down everything I know about the topic (which by the way is a drop in the bucket compared to what many other people know; I am not an expert, just a fan) in language that a sufficiently interested and patient non-mathematician can understand.

I'll start that off today by describing one of the most basic ranking algorithms.

The idea is to define a system of 32 equations in 32 unknowns. The solution to that system will be a collection of 32 numbers, and those numbers will serve as the ratings of the 32 NFL teams. Define R_ind as Indianapolis' rating, R_pit as Pittsburgh's rating, and so on. Those are the unknowns. The equations are:

R_ind = 12.0 + (1/16) (R_bal + R_jax + R_cle + . . . . + R_ari)
R_pit = 8.2 + (1/16) (R_ten + R_hou + R_nwe + . . . . + R_det)
R_stl = -4.1 + (1/16) (R_sfo + R_ari + R_ten + . . . . + R_dal)

One equation for each team. The number just after the equal sign is that team's average point margin. In plain English, the first equation says:

The Colts' rating should equal their average point margin (which was +12), plus the average of their opponents' ratings

So every team's rating is their average point margin, adjusted up or down depending on the strength of their opponents. Thus an average team would have a rating of zero. Suppose a team plays a schedule that is, overall, exactly average. Then the sum of the terms in parentheses would be zero and the team's rating would be its average point margin. If a team played a tougher-than-average schedule, the sum of the terms in parentheses would be positive and so a team's rating would be bigger than its average point margin.

It would be easy to find the Colts' rating if we knew all their opponents' ratings. But we can't figure those out until we've figured out their opponents' ratings, and we can't figure those out until . . . well, you get the idea. Everyone's rating essentially depends on everyone else's rating.

So how do you actually find the set of values that solves this system of equations? In high school you probably learned how to solve 2-by-2 and maybe 3-by-3 systems of equations by putting some numbers into a matrix, doing some complicated operations on that matrix, and then reading the solutions off the new matrix. Same thing here, except you've got a 32-by-32 matrix instead of a 2-by-2 matrix. If you wanted college football rankings, it'd be 120-by-120. I recommend using a computer.

It's more instructive, though, to solve it a different way. We'll start by giving everyone an initial rating, which is just their average point margin. I'll use the Colts as an example. Their initial rating is +12.0. Now look at the average of their opponents' initial ratings:

Opp Rating
ari -4.75
bal -2.12
cin 4.44
cle -4.31
hou -10.69
hou -10.69
jax 5.75
jax 5.75
nwe 2.56
pit 8.19
ram -4.12
sdg 6.62
sea 11.31
sfo -11.81
ten -7.62
ten -7.62

Those average -1.2, so the Colts' new rating will be 12.0 - 1.2, which is 10.8. So after this calculation the Colts' rating changed from +12 to +10.8. But meanwhile, every other team's rating changed as well, so we have to do the whole thing over again with the new ratings. On the second pass, the Colts schedule looks a bit different:

Opp Rating
ari -4.76
bal -1.49
cin 4.09
cle -3.85
hou -9.69
hou -9.69
jax 4.85
jax 4.85
nwe 3.09
pit 8.02
ram -5.16
sdg 8.62
sea 8.99
sfo -10.77
ten -7.30
ten -7.30

The average of these is -1.1, so the Colts' opponents aren't quite as bad as they looked at first. Indy's new rating is 12.0 - 1.1, which is 10.9. Uh oh! Everyone else's ratings just changed again, so we've got to run through the same procedure again. And again. And again. And eventually the numbers stop changing. When that happens, you know you've arrived at the solution. Take a look at the Colts' schedule with the final ratings and you'll be able to convince yourself that this method works:

WK OPP Margin OppRating Score
1 bal 17 -1.83 15.17
2 jax 7 4.76 11.76
3 cle 7 -4.22 2.78
4 ten 21 -7.57 13.43
5 sfo 25 -11.15 13.85
6 ram 17 -5.15 11.85
7 hou 18 -10.03 7.97
9 nwe 19 3.14 22.14
10 hou 14 -10.03 3.97
11 cin 8 3.82 11.82
12 pit 19 7.81 26.81
13 ten 32 -7.57 24.43
14 jax 8 4.76 12.76
15 sdg -9 9.94 0.94
16 sea -15 9.11 -5.89
17 ari 4 -4.98 -0.98
AVERAGE 12.0 -1.20 10.80

How to read this table: in week 1, the Colts beat the Ravens by 17. The Ravens were, all things considered, 1.83 points worse than average, so the Colts got a "score" of 17 - 1.83, or 15.17 for that game. In week 2, the Colts beat the Jaguars by 7. Jacksonville was 4.76 points better than average, so the Colts get an 11.76 for that game. Average their scores for each game and you've got their rating. The bottom line says:

The Colts won their games by an average of 12 points each. Their opponents were, on average, 1.2 points worse than average. Thus the Colts were 10.8 points better than average.
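The whole iterative procedure fits in a few lines of code. Here's a sketch (mine, not the site's implementation) run on a toy four-team round robin where everything can be checked by hand: with each team playing the other three once and margins summing to zero, the fixed point works out to exactly three-quarters of each team's average margin.

```python
def solve_ratings(margins, schedule, tol=1e-9):
    """Fixed-point iteration described in the post: start every team at its
    average point margin, then repeatedly reset each rating to
    (own average margin) + (average of opponents' current ratings),
    stopping once the numbers stop changing.

    margins:  {team: average point margin}
    schedule: {team: [opponent, ...]}, one entry per game played
    """
    ratings = dict(margins)
    while True:
        new = {t: margins[t] + sum(ratings[o] for o in schedule[t]) / len(schedule[t])
               for t in margins}
        if max(abs(new[t] - ratings[t]) for t in margins) < tol:
            return new
        ratings = new

# Toy league: four teams, each pair plays once.  Results:
# a beat b by 10, a beat c by 6, a beat d by 2,
# b beat c by 4, d beat b by 2, d beat c by 6.
schedule = {t: [o for o in "abcd" if o != t] for t in "abcd"}
margins = {"a": 6.0, "b": -8 / 3, "c": -16 / 3, "d": 2.0}
ratings = solve_ratings(margins, schedule)
# In this toy league the solution is 3/4 of each average margin:
# a = 4.5, b = -2.0, c = -4.0, d = 1.5
```

One subtlety: the equations only pin the ratings down up to an additive constant. Starting from the average margins (which sum to zero across the league) is what anchors the league average at zero.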

Let's examine some of the features of this system:

  • The numbers it spits out are easy to interpret - if Team A's rating is 3 bigger than Team B's, this means that the system thinks Team A is 3 points better than Team B. With most ranking algorithms, the numbers that come out have no real meaning that can be translated into an English sentence. With this system, the units are easy to understand.
  • It is a predictive system rather than a retrodictive system - this is a very important distinction. You can use these ratings to answer the question: which team is stronger? I.e. which team is more likely to win a game tomorrow? Or you can use them to answer the question: which of these teams accomplished more in the past? Some systems answer the first question more accurately; they are called predictive systems. Others answer the latter question more accurately; they are called retrodictive systems. As it turns out, this is a pretty good predictive system. For the reasons described below, it is not a good retrodictive system.
  • It weights all games equally - every football fan knows that the Colts' week 17 game against Arizona was a meaningless exhibition, but the algorithm gives it the same weight as all the rest of the games.
  • It weights all points equally, and therefore ignores wins and losses - take a look at the Colts season chart above. If you take away 10 points in week 3 and give them back 10 points in week 4, you've just changed their record, but you haven't changed their rating at all. If you take away 10 points in week 3 and give back 20 points in week 4, you have made their record worse but their rating better. Most football fans put a high premium on the few points that move you from a 3-point loss to a 3-point win and almost no weight on the many points that move you from a 20-point win to a 50-point win.
  • It is easily impressed by blowout victories - this system thinks that a 50-point win and a 10-point loss are preferable to two 14-point wins. Most fans would disagree with that assessment.
  • It is slightly biased toward offensive-minded teams - because it considers point margins instead of point ratios, it treats a 50-30 win as more impressive than a 17-0 win. Again, this is an assessment that most fans would disagree with.
  • This should go without saying, but - I'll say it anyway. The system does not take into account injuries, weather conditions, yardage gained, the importance of the game, whether it was a Monday Night game or not, whether the quarterback's grandmother was sick, or anything else besides points scored and points allowed.

This system, like all systems, has some drawbacks, but it has the virtue of simplicity. It is easy to understand and it produces numbers that are easy to interpret. That is not to be sneezed at.

Furthermore, most of its drawbacks have easy fixes. For example, when computing a team's initial rating (i.e. their average point margin), you can tweak the individual game margins to make the initial rating "smarter." One way to do that is to cap the margin of victory at 21 points, or 14 points or whatever you want. You can explicitly incorporate wins and losses by giving the winning team a bonus of 3 points or 10 points or however many you want. To take it to the extreme, you could simply define all wins to be one-point wins and all losses to be one-point losses. This removes margin of victory from the scene completely. As usual, when you tweak the method to shore up its weaknesses, you also weaken its strengths. In particular, if you use a modified margin of victory, the numbers don't have as nice an interpretation.
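Those tweaks are straightforward to code. A sketch of one possible adjustment function follows; the default cap of 21 and the minimum win of 7 match the examples used in the second table below, but nothing here is a standard formula:

```python
# Pre-process a raw point margin before feeding it to the rating
# system. The defaults (cap at 21, count every win as at least a
# 7-point win) are just the example values from the text.

def adjusted_margin(margin, cap=21, min_win=7, binary=False):
    if binary:
        # the extreme tweak: all wins are 1-point wins, all losses
        # are 1-point losses, ties stay at zero
        return 1 if margin > 0 else (-1 if margin < 0 else 0)
    if margin > 0:
        return min(max(margin, min_win), cap)   # at least 7, at most 21
    if margin < 0:
        return max(min(margin, -min_win), -cap)  # mirror image for losses
    return 0  # tie
```

So a 3-point win counts as a 7-point win, a 50-point win counts as a 21-point win, and with `binary=True` every game collapses to plus-or-minus one point.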

I'll close with some rankings. Here are the NFL's 2005 regular season rankings according to the original method:

Team Rating StrOfSched
1. ind 10.8 -1.2
2. den 10.8 2.2
3. sdg 9.9 3.3
4. sea 9.1 -2.2
5. pit 7.8 -0.4
6. nyg 7.5 0.7
7. kan 7.0 2.1
8. was 6.0 1.9
9. car 5.1 -3.2
10. jax 4.8 -1.0
11. cin 3.8 -0.6
12. dal 3.2 2.1
13. nwe 3.1 0.6
14. chi 1.4 -2.2
15. mia -0.8 -0.8
16. tam -1.0 -2.6
17. atl -1.2 -1.9
18. bal -1.8 0.3
19. phi -2.3 2.6
20. oak -2.8 3.0
21. min -3.5 -1.1
22. gnb -3.7 -0.8
23. cle -4.2 0.1
24. ari -5.0 -0.2
25. ram -5.1 -1.0
26. buf -5.8 0.2
27. nyj -6.4 0.8
28. det -6.7 -1.0
29. ten -7.6 0.1
30. hou -10.0 0.7
31. nor -11.1 -0.9
32. sfo -11.1 0.7

Here they are if every win of less than 7 points is counted as a 7-point win and the margin of victory is capped at 21.

Team Rating StrOfSched
1. den 10.1 1.6
2. ind 9.9 -1.4
3. sea 7.1 -1.9
4. sdg 6.9 2.9
5. nyg 6.3 0.7
6. pit 6.1 -0.6
7. was 5.5 1.6
8. kan 5.4 1.7
9. car 4.8 -2.3
10. jax 4.8 -1.1
11. cin 3.8 -0.9
12. dal 3.6 1.6
13. nwe 2.8 0.7
14. chi 1.5 -1.8
15. tam 0.9 -1.9
16. mia 0.6 -0.7
17. atl -0.4 -1.3
18. min -1.8 -1.1
19. phi -1.9 2.1
20. cle -3.2 -0.2
21. bal -3.4 0.3
22. oak -3.6 2.6
23. gnb -4.9 -0.5
24. buf -5.1 0.3
25. ram -5.1 -0.7
26. ari -5.1 -0.1
27. nyj -5.8 0.8
28. det -6.0 -0.8
29. ten -6.7 -0.1
30. sfo -8.1 0.5
31. nor -9.2 -0.4
32. hou -9.8 0.5

Here they are with margin of victory removed altogether:

Team Rating StrOfSched
1. den 0.69 0.07
2. ind 0.66 -0.09
3. sea 0.50 -0.12
4. jax 0.42 -0.08
5. nyg 0.42 0.04
6. was 0.37 0.12
7. pit 0.34 -0.03
8. kan 0.33 0.08
9. cin 0.31 -0.07
10. sdg 0.29 0.17
11. nwe 0.26 0.01
12. chi 0.26 -0.11
13. tam 0.25 -0.13
14. car 0.24 -0.13
15. dal 0.22 0.09
16. mia 0.06 -0.07
17. min 0.05 -0.07
18. atl -0.06 -0.06
19. phi -0.14 0.11
20. bal -0.23 0.02
21. cle -0.26 -0.01
22. ram -0.28 -0.03
23. oak -0.36 0.14
24. ari -0.37 0.01
25. buf -0.37 0.01
26. det -0.41 -0.03
27. sfo -0.44 0.06
28. nyj -0.45 0.05
29. gnb -0.49 0.01
30. ten -0.50 0.00
31. nor -0.63 -0.01
32. hou -0.71 0.04

ADDENDUM: I need to clarify one thing about the simple rating system: it’s not my system. I didn’t invent it. In fact, it’s one of those systems that has been around for so long that no one in particular is credited with having developed it (as far as I know anyway). People were almost certainly using it before I was born. I like the system and use it a lot because it’s fairly easy to interpret and understand, and because the math behind it is nifty. But I just realized that I had never been clear enough about the fact that it’s not my system. I just use it.

79 Comments | Posted in BCS, Statgeekery
