How to fill out your brackets
So you're not much of a college basketball fan. You follow your alma mater, and possibly keep loose tabs on the rest of their conference, but that's about as far as it goes. But the tourney is good entertainment and, as is customary, you enter a bracket pool so you can have a rooting interest where none would otherwise exist. How do you maximize your chances of winning the thing?
If you're like me, the first thing you do is head someplace like this smorgasbord of computer ranking algorithms and check out a few of them to get a quick feel for which teams appear to be over- or under-seeded. Some of them even do the work for you by putting a specific probability estimate on each team's chances of advancing to each round.
Whatever the rules of your bracket pool, you probably get some sort of score associated with your entry. And the highest score wins. In most pools, you can use estimates like those above to compute (at least approximately) the expected score of each possible entry. Now simply find the entry with the highest expected score and turn it in.
That's what I used to do. Only recently did I realize that that's wrong. Maximizing your expected score is not the same as maximizing your chance of having the highest score. Your goal is the latter, not the former.
To see why they're not the same, imagine a simple pool where all you have to do is pick the winner of the tournament. Let's say that in the very likely event of a tie, the winner will be selected randomly from among those who correctly picked the champion. You believe these are the probabilities of each team winning the tourney:
Ohio State: 25%
UCLA: 20%
Kansas: 15%
UNC: 15%
Florida: 10%
Texas A&M: 10%
Washington State: 5%
The "score" of your entry in this simple pool is either one or zero, depending on whether you pick the champ correctly or not. So the entry with the highest expected score is Ohio State. But Ohio State might or might not be the entry that maximizes your chance of winning the pool. It depends on who everyone else picked. If you were the only Buckeye-picker, then great. But if 90% of the other pool participants picked Ohio State, then you'd be better off picking Washington State.
So, while Ohio State is the "best" pick in some sense, it's also likely to be a "crowded" pick, and that's the problem. You may be better off going with a "worse" pick, if it's a pick that's less popular. That's a simple example, but the same issues are present in a real pool. Even if there aren't necessarily ties, the best picks are also going to be the most popular picks, and that's going to cause the same kind of crowding. If you pick the entry that you believe is most likely to occur, then there will be lots of other entries that look very similar to yours. This is problematic because you know you're going to miss on a lot of games. And if your entry is too centrist, it's likely that there will be an entry that looks just like it except that it got a few of the games you missed.
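To put rough numbers on the crowding effect, here is a minimal Python sketch of the pick-the-champion pool. The title probabilities are the ones listed above. The breakdown of the other 100 entries (90 on Ohio State, the rest scattered) is a made-up assumption, and so are the names `title_prob`, `other_picks`, and `win_chance`.

```python
# Title probabilities from the example above.
title_prob = {
    "Ohio State": 0.25, "UCLA": 0.20, "Kansas": 0.15, "UNC": 0.15,
    "Florida": 0.10, "Texas A&M": 0.10, "Washington State": 0.05,
}

# ASSUMPTION: how the 100 other entries picked (invented for illustration).
other_picks = {
    "Ohio State": 90, "UCLA": 4, "Kansas": 2, "UNC": 2,
    "Florida": 1, "Texas A&M": 1, "Washington State": 0,
}

def win_chance(team):
    """Chance of winning the pool if you pick `team`.

    Your team must win the title, and then you must win a random
    tiebreak among everyone who picked that team (you included).
    """
    return title_prob[team] / (other_picks[team] + 1)

for team in title_prob:
    print(f"{team:17s} {win_chance(team):.2%}")
```

Under these assumed picks, the lone Washington State entry wins the pool about 5% of the time, while joining the Ohio State crowd gives you well under 1%.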
The other extreme is to pick an entry with Cinderellas and longshots aplenty. This avoids the crowding problem. With a wacky entry, even if you miss a lot of games, there are not likely to be many entries close to yours to capitalize on your mistakes. The problem here is that, if you turn in a wacky entry, you probably won't end up being even close. That's what makes it a wacky entry.
To make this a little more concrete, imagine two extreme strategies:
Strategy #1: pick a final four with two #1 seeds, a #2 seed and a #3 seed.
Strategy #2: pick a final four with two #9 seeds and two #7 seeds.
The upside of Strategy #1 is that you're very likely to hit at least a couple of the final four teams. The downside is that, if your final four hits, you're probably not the only one who has it.
The upside of Strategy #2 is that, even if you just get one or two of the final four teams correct, you're probably still doing better than everyone else. The downside is that you're not likely to hit even one.
And of course you don't have to be at one extreme or the other. There is a continuum of possibilities in between. So where do you want to position yourself? You can't answer that question unless you know what the other entries in your pool look like, and you're probably not going to know that. So you have to make some assumptions.
If it's a big contest with a mixture of hardcore and casual fans, I think it's reasonable to expect that the entries will generally cluster around the most likely outcomes, but that there will be some longshot entries mixed in. With that in mind, I'm going to make the following assumption:
Assume the entries in your pool are distributed the same as the distribution of actual outcomes of the tournament.
Roughly speaking, what this means is that, if you think Ohio State has a 25% chance of winning the tourney, then about 25% of the pool's participants will pick Ohio State to win it. If you think there is a 1% chance of a final four consisting of Florida, UCLA, Texas A&M, and Georgetown, then about 1% of the pool's entries will have that for a Final Four. If you think Virginia Tech has a 59% chance of beating Illinois in the first round, then around 59% of the entries will have Virginia Tech beating Illinois. And so on.
Is this a reasonable assumption? I think it's at least in the ballpark. Yahoo.com publishes the entries in its Tournament pick 'em contest and they match up reasonably well with objectively-generated probabilities (e.g. from Sagarin ratings and the like). Not perfectly, but reasonably. This shouldn't be too surprising. Sports gambling markets are often cited as an example of the wisdom of crowds and are generally believed to be pretty efficient.
So let's go back and apply this assumption to our drastically simplified pool, where we are only picking the champion. If these are the probabilities of each of these teams winning the title:
Ohio State: 25%
UCLA: 20%
Kansas: 15%
UNC: 15%
Florida: 10%
Texas A&M: 10%
Washington State: 5%
Then our assumption would imply that the above is also the distribution of entries. Twenty-five percent of the people would take Ohio State, 20% UCLA, and so on. If that's the case, then what is the best pick?
There is no best pick! Your chances of winning are the same no matter who you pick.
If there are 100 entries for example, then 25 of them took Ohio State. So if you are one of those 25 riding the Buckeyes, your chances of winning are 1%: a 25% chance they'll win, and then a 1-in-25 chance that you'll win the tiebreaker. If you take Washington State, you've also got a 1% chance of winning: 5% chance of the Cougars winning, then a 1-in-5 chance of winning the tiebreaker. Regardless of which team you look at, the analysis will turn out the same: you have a 1% chance of winning. One percent, of course, is one of a hundred, because you are one of a hundred people in the pool.
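Here is the same arithmetic run for every team, as a quick Python sketch. It assumes a 100-entry pool (you included) whose picks split exactly in proportion to the title probabilities. The variable names are just for illustration.

```python
# Title probabilities from the example, as whole percentages.
title_pct = {
    "Ohio State": 25, "UCLA": 20, "Kansas": 15, "UNC": 15,
    "Florida": 10, "Texas A&M": 10, "Washington State": 5,
}
n_entries = 100  # ASSUMPTION: pool size, counting your own entry

chances = {}
for team, pct in title_pct.items():
    # Entries on this team, if picks mirror the title probabilities.
    pickers = pct * n_entries // 100
    # Team wins the title, then you win a 1-in-`pickers` tiebreak.
    chances[team] = (pct / 100) / pickers
    print(f"{team:17s} {pickers:3d} entries  {chances[team]:.2%}")
```

Every line comes out to 1%, which is just 1 divided by the number of entries.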
But that's an oversimplified situation. What happens in more complicated settings?
As many of you know, I teach math for a living. Last summer, I got a student and a colleague interested in investigating this question with me. Some very interesting (to us, anyway) mathematics arose from the investigation.
As an abstract model of the tournament prediction problem, we imagined the following game. Suppose that a random number, called the target, is to be chosen. Millions of participants will guess what the number will be, and whoever guesses closest is the winner. Let's say, just for example, that it is to come from the standard normal distribution. So there is about a 2/3 probability that the target will be between -1 and 1, a 95% chance that it will be between -2 and 2, a 99.7% chance that it will be between -3 and 3, and so on. Your job is to guess closer to the target than any other competitor, and let's assume that their guesses are distributed as independent standard normals as well. In other words, two-thirds of the guesses will be between -1 and 1, 95% between -2 and 2, and so on.
If you guess near zero, then you are likely to be close to the target. But you are also likely to be crowded out by the multitudes of other guesses that are in the same vicinity. If you make a guess far out in the tail, like say 3.4, then there aren't many guesses near yours, but the target isn't likely to be near your guess either. If you picture a standard bell curve, you can picture the choice as being between a tall skinny piece of the distribution (a guess near zero) or a short fat piece (a guess far from zero). Which gives you the better chance of winning?
As it turns out, it doesn't matter. Either is as good as the other. And anywhere in between is also just as good.
Even more interesting is that it does not matter that the distribution is standard normal. No matter what the distribution is (well, there are a few technical caveats, but I don't feel like I'm betraying the spirit of the results to say that it doesn't matter), as long as the distribution of entries is the same as the distribution of possible outcomes, and as long as the pool has a lot of entries, it doesn't matter what you guess.
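For anyone who wants to check this numerically, here is a rough Python sketch for the standard normal case. It is not the analysis from our project, just a direct numerical integration. For a target t and a guess g, one opponent lands closer than you with probability Φ(t+d) − Φ(t−d), where d = |t − g|, so you win when all n opponents land farther away. The function names, the one million opponents, and the narrow integration window around the guess (outside it your chance of winning is negligible) are all assumptions of the sketch.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(guess, n_opponents, half_width=40.0, steps=8000):
    """Approximate P(your guess is closest to the target).

    Midpoint-rule integration over targets near the guess. The window is
    measured in units of 1 / (n * pdf(guess)), the typical gap to the
    nearest rival guess; targets outside it contribute almost nothing.
    """
    scale = 1.0 / (n_opponents * pdf(guess))
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        s = -half_width + (i + 0.5) * h
        target = guess + s * scale
        d = abs(target - guess)
        # Chance that one opponent is closer to the target than you are.
        p_closer = cdf(target + d) - cdf(target - d)
        # Chance that all n opponents are farther away than you.
        p_all_farther = math.exp(n_opponents * math.log1p(-p_closer))
        total += pdf(target) * p_all_farther * scale * h
    return total

n = 1_000_000
results = {g: win_probability(g, n) for g in (0.0, 1.0, 3.4)}
for g, p in results.items():
    print(f"guess {g:3.1f}: n * P(win) = {n * p:.4f}")
```

With a million opponents, n times the win probability should come out very close to 1 for all three guesses, which is the "it doesn't matter" result in miniature.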
So, at least to the extent that you believe our abstract game models your pool reasonably well, any guess is as good as any other. Fill out your bracket based on geography, uniform color, fierceness of mascot, or whatever other criteria you want. Your chances are as good as anyone else's.
If you're a casual follower of college hoops, you might find this liberating. While I haven't given you any actual advice on how to fill out your brackets, at least I've absolved you of any guilt you may have had about entering a contest where you have no idea what you're doing.
This entry was posted on Monday, March 12th, 2007 at 4:02 am and is filed under Non-football, Statgeekery.

Darn it, I was hoping you had some real ideas. I don't follow college basketball (I barely know what is going on with Wisconsin and Marquette, the state teams), but people are asking me to fill out office pools.
Doug, you may enjoy King Yao's comments on this topic: March Madness Office Pools.
The problem, of course, is that while we can determine the percentage of entries that guess a certain way, we can't ever really know the true chances that a team will perform the way we want it to.
To use your simple algorithm, if 25% of the people pick Ohio State and they really do have a 25% chance of winning, you have that 1% chance of winning if you pick OSU. On the other hand, if OSU has a 30% chance of winning, then your odds are increased (to 1.2%). OTOH, if they only have a 15% chance of winning, your odds are 0.6%.
And that's the beauty of it. Nobody can ever nail down an exact percentage that a certain team can win a game, a tournament, whatever. We all have our "secrets" that we think positively or negatively impact a team's chances to win -- "If the other team shuts down Player X, then we're screwed," "Our coach is far better than their coach," etc.
Of course, this is usually subjective and there are so many variables to a team's performance, many of which nobody can ever tell, that actually predicting a team's chances is more art than science. But that's what you have to do to win. If you have information that you believe raises or lowers a team's chances beyond their generally accepted chances of winning, then you adjust your picks accordingly. In theory, at least.
http://www.60feet6.com/research/talks/NCAA2007/NCAA2007.html
This is the output from a million runs using Sagarin's predictor ratings. As Doug indicated, this approach usually gets me into the top 90%, but I don't win my office pool very often. Sagarin LOVES NC, btw.
"Assume the entries in your pool are distributed the same as the distribution of actual outcomes of the tournament."
I thought the whole point of filling out a bracket was that you might be able to identify teams that are overrated/underrated by the masses, and use this to your advantage, right? If you make that assumption, you're basically assuming that this can't be done, because the market is perfectly efficient. I mean, I don't think anyone expects to be able to beat a perfectly efficient market. Everyone that enters those pools either just does it for fun, or they're betting that the market is failing in some way that they can predict. Still, cool post.
I dunno, I just pick the teams that win - seems to work pretty well for me, but maybe I'm missing something.
"I dunno, I just pick the teams that win - seems to work pretty well for me, but maybe I’m missing something."
Well, if you win your pool more than the average member of your pool, it must either be that you've been getting lucky, or you've been able to predict the actual outcomes of the tournament better than the rest of your pool did. The latter would be impossible if Doug's assumption of perfect market efficiency were true.
It's not that you consciously think to yourself, "there must be a market failure in this pool that I can recognize and take advantage of." But if you think you can do a better job of predicting winners than the rest of your pool, you must not think that your pool is perfectly efficient. Look, I'm not trying to discourage anyone from entering these pools, I'm sure they're fun. I just don't see why it's so surprising that you can't beat a perfectly efficient market.
That's not the impression I get. 95% of people are in them just for fun. The other 5%, I think, are in them because they think they can pick winners better than everyone else, not because they think they can read the market better than everyone else.
My impression is that, if some hardcore guy in your pool turns in a bracket with Wisconsin winning the title, then that's because he thinks Wisconsin is most likely to win the title. It's not because he thinks North Carolina or Kansas are more likely to win it but the Badgers will be underrepresented on the other brackets.
Maybe I was the last guy to figure this out, but I always thought the best strategy for filling out a bracket was to fill it with the teams I thought had the best chance of winning.
In a typical sports betting situation, say the futures market on this same tournament, I might bet on Wisconsin even if I think North Carolina or Kansas has a better chance to win. But that's because I can get a better payoff on Wisconsin than I can on UNC or KU. In a bracket pool, the payoff is the same no matter who I pick. So I never saw the advantage to picking any team other than the one I thought was going to win.
Here is some tourney info, and here is a link to the NCAA tournament record book:
http://www.ncaa.org/library/records/basketball/m_final_four_records_book/2007/2007_m_final_four_record.pdf
The NCAA tournament expanded to 64 teams in 1985. Here is the number of games won by each seed in each round (first round through the championship game) since then:
#1 88-76-61-36-19-12
#2 84-55-40-19-10-4
#3 73-41-20-12-8-3
#4 70-39-14-9-2-1
#5 59-31-5-4-2-0
#6 61-34-12-3-2-1
#7 53-16-6-0-0-0
#8 41-9-6-3-1-1
#9 47-3-1-0-0-0
#10 35-17-6-0-0-0
#11 27-11-4-2-0-0
#12 29-14-1-0-0-0
#13 18-4-0-0-0-0
#14 15-2-0-0-0-0
#15 4-0-0-0-0-0
#16 0-0-0-0-0-0
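The table is easier to work with as data. The Python sketch below copies the numbers above, totals the wins in each round, and prints each seed's first-round winning percentage. The figure of 22 tournaments is inferred from the #1 seeds' 88 first-round wins (four #1 seeds per year), not stated in the comment, and the variable names are hypothetical.

```python
# Wins by seed in rounds 1-6, copied from the table above.
wins_by_seed = {
    1: [88, 76, 61, 36, 19, 12],
    2: [84, 55, 40, 19, 10, 4],
    3: [73, 41, 20, 12, 8, 3],
    4: [70, 39, 14, 9, 2, 1],
    5: [59, 31, 5, 4, 2, 0],
    6: [61, 34, 12, 3, 2, 1],
    7: [53, 16, 6, 0, 0, 0],
    8: [41, 9, 6, 3, 1, 1],
    9: [47, 3, 1, 0, 0, 0],
    10: [35, 17, 6, 0, 0, 0],
    11: [27, 11, 4, 2, 0, 0],
    12: [29, 14, 1, 0, 0, 0],
    13: [18, 4, 0, 0, 0, 0],
    14: [15, 2, 0, 0, 0, 0],
    15: [4, 0, 0, 0, 0, 0],
    16: [0, 0, 0, 0, 0, 0],
}

TOURNAMENTS = 22                      # ASSUMPTION: inferred as 88 / 4
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # games per round in a 64-team bracket

# Sanity check: total wins in each round vs. games played in that round.
round_totals = [sum(w[r] for w in wins_by_seed.values()) for r in range(6)]
expected_totals = [g * TOURNAMENTS for g in GAMES_PER_ROUND]
print("wins per round:", round_totals, "expected:", expected_totals)

# Each seed appears four times per tournament, so it plays
# 4 * TOURNAMENTS first-round games in all.
first_round_pct = {s: w[0] / (4 * TOURNAMENTS) for s, w in wins_by_seed.items()}
for seed, pct in first_round_pct.items():
    print(f"#{seed:<2d} first-round win rate: {pct:.1%}")
```

The round totals match the expected number of games, which is a decent sign that the table was transcribed consistently.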
Here are some splits that may or may not have anything behind them:
--The #9 seeds have actually won slightly more games than the #8 seeds in the first round (47-41); however, the #1 seeds are 32-9 vs #8 seeds that advance (78.0%) but are 44-3 vs #9 seeds that advance (93.6%).
--From 1985-1995, the 3, 4, and 5 seeds were upset at about the same rate in the first round (10, 9, and 12 times, respectively). From 1996-2006, the # of 3 seed upsets has been cut in half to 5, while the # of 5-12 upsets has jumped to 17. Since 2001, when the field expanded to 65, the 5 seeds are only 13-11 in the first round. This is either a random split, or the committee has gotten better at seeding the lesser known schools in recent years.
--2 seeds have performed worse in the 2nd round recently. From 1985-1995, the 2's were 31-11 vs. 7/10 in the 2nd round. Since 1996, the 2's are 24-18 in the 2nd round (57.1%). 2 seeds have a losing record versus 10 seeds (9-11) since 1996.
--teams with a losing record in conference play are 14-11 in the first round since 1985.
--3 seeds are 13-15 vs. 6 seeds since 1996, and 25-24 vs 6 seeds since 1985.
--12 seeds are 10-14 against 4 seeds in the second round (plus 4-1 vs. 13 seeds in round 2), so they have advanced to the sweet 16 almost half the time they won in the first round (14-15).
--10 seeds were only 13-31 in the 1st round between 1985-1995. Since 1996, however, they are 22-22 in the 1st round, and 13-9 in the second round. The successful 10 seeds have been a mix of good mid-majors (Gonzaga, Nevada, Kent St, Miami-Ohio) and double digit loss teams from major conferences, many of whom limped into the tourney off poor conference tourney showings or end of season slumps (NC State 05, Auburn 03, Georgetown 01, Seton Hall 00, Purdue 99, Providence 97, Texas 97). So, basically, any of the #10 seeds this year fit the profile (Creighton, Gonzaga, Georgia Tech, and Texas Tech).
This year's #2 seeds seem impressive enough, but are they any more impressive than Ohio State last year, or defending champ UConn or Wake Forest with Chris Paul in 2005?
Doug said:
For individual games within the tournament, I don't think it's a very good assumption, although I could be wrong.
Consider three hypothetical opponents in your tournament pool:
1. Bob always picks the team he thinks will win.
2. Joe always picks the team whose uniform he likes better.
3. Allison checks the Vegas odds, and if Virginia is 59% to win, she rolls a 100-sided die and picks Virginia 59% of the time.
If your pool is populated with Bobs, you should generally pick the underdogs.
If your pool is populated with Joes, you should generally pick the favorites.
If your pool is populated with Allisons, it doesn't matter what you do.
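Here is a small Python sketch of that claim, boiled down to a pool with a single game and a random tiebreak among the correct entries (a simplification, not something stated above). The favorite wins with probability 0.59, there are 50 opponents, and each opponent independently picks the favorite with probability `q_fav`: 1.0 for Bobs, 0.5 for Joes, 0.59 for Allisons. The function names are hypothetical.

```python
def expected_share(q, n):
    """E[1 / (1 + K)] for K ~ Binomial(n, q).

    K is the number of opponents who made the same pick as you, so this
    is your expected share of the prize given that your pick is right.
    """
    if q == 0.0:
        return 1.0  # nobody else made your pick
    return (1.0 - (1.0 - q) ** (n + 1)) / ((n + 1) * q)

def win_chances(p_fav, q_fav, n):
    """Return (chance if you pick the favorite, chance if you pick the dog)."""
    fav = p_fav * expected_share(q_fav, n)
    dog = (1.0 - p_fav) * expected_share(1.0 - q_fav, n)
    return fav, dog

P_FAV = 0.59   # favorite's true chance of winning
N = 50         # ASSUMPTION: number of opponents in the pool

pools = {"Bobs": 1.0, "Joes": 0.5, "Allisons": P_FAV}
outcomes = {name: win_chances(P_FAV, q, N) for name, q in pools.items()}
for name, (fav, dog) in outcomes.items():
    print(f"{name:9s} pick favorite: {fav:.4f}   pick underdog: {dog:.4f}")
```

In this toy setup the underdog is far better against Bobs, the favorite is better against Joes, and against Allisons the two picks are worth essentially the same, about 1 in 51.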
In most real life pools, I suspect that there will be some Bobs and some Joes -- with Bobs predominating, and also showing some minor Allison-like tendencies. That is, most people will generally try to pick winners but will throw in a few upsets here and there to keep things interesting. A few people will pick pretty much randomly.
I don't have any empirical evidence to back up my suspicion that this is how most pools work, but I think it's closer to reality than hypothesizing that most pools are dominated by Allisons.
If my suspicion is correct, picks on the 20-1 dogs are probably overrepresented because of the Joes, while picks on the 8-7 favorites are probably overrepresented because of the Bobs. (A Bob will pick the favorite more than 8/15 of the time, even if he expects the favorite to win only 8/15 of the time. Bob is not Allison.)
If this is the case, the best strategy would generally be to pick the heavy favorites and the mild dogs.
Moreover, most of the Bobs are probably relying on seeds more than Vegas odds to determine which teams are most likely to win (at least, in games involving teams they are unfamiliar with). So, as King Yao suggests, an advantage can perhaps be gained in games where official seeds do not accurately reflect Vegas odds. We should pick Vegas favorites whenever they are the worse seeds, and also pick slight Vegas dogs when they are the worse seeds. These are the teams most likely to be underrepresented in the Bobs' picks, and since they are somewhere close to even money (i.e., not huge dogs), the Joes' picks won't skew things much.
Whatever you do, don't try the diversification method.
A particular Bob will pick more than 8/15 of the 8-7 favorites, but a bunch of independently-acting Bobs might indeed pick 8/15 of the 8-7 favorites, because they will disagree on who the better team is.
I'm not assuming that any individual person will behave like Allison. I'm assuming that the mob will, as a whole, end up acting (somewhat) like Allison because of the variety of different opinions within the mob.
I could be wrong, of course, and there is little doubt that King Yao's advice (thanks for the link, BTW, that's a good blog) is more practical than mine in any case. But I'm not sure that a bunch of Bobs (especially if there are a few Joes and if the Bobs do indeed display some occasional Allisonishness) invalidate the assumption.
This may well be true if people are handicapping the games themselves, but it wouldn't be true if they are going by seeds.
"That’s not the impression I get. 95% of people are in them just for fun. The other 5%, I think, are in them because they think they can pick winners better than everyone else, not because they think they can read the market better than everyone else."
That's not what I was trying to say. What I was trying to say was that, if you think you can pick winners better than everyone else, then it follows that you don't think your pool is perfectly efficient. If your pool were perfectly efficient, then it would "know" who's most likely to win, and how likely they are to win, so you wouldn't be able to pick winners more often than the rest of your pool does. Any deviation from the choices of a perfectly efficient market would be worse, not better, so if you think it's possible to pick winners better than your pool, you must think that your pool isn't perfectly efficient (and btw, I think you'd be right).
I think most people who think they can pick winners better than the rest of their pool just assume that their market is not perfectly efficient from the start, without considering the alternative. And, as this post showed, even if they're wrong, they won't do any worse than anyone else, so why not?
You might be right about that, Maurile. But to counterbalance that, you've got the Joes, and also the tendency of typical Bobs to pull stuff like, "well, since I know that one of the #3 seeds will probably lose, I'd better pick one of the #3 seeds to lose."
I think the market efficiency thought is right in the early rounds, but in thinking back to the pools I have run, in the later rounds, I do think there is a tendency for the higher seeds to be over-represented and for there to be value in identifying the "value" picks. People are not comfortable in taking their upsets to go too far past the round where the initial upset occurred.
Though most people don't put math to it, I would guess that people are willing to take a risk so long as the chances are roughly 25% or better of the event happening. In the first round, this means most people will throw in upsets in the 4/13 and 5/12 games. Not as many people like to take risks beyond the early rounds.
This year, I think #6 seeds Duke and Notre Dame present good value, as well as #10 Ga Tech.
This is a pretty intense breakdown of "how to win". I was writing an article on How to Win the NCAA office pool every time and linked here. I cover a few other interesting strategies people have.
http://www.dkworldwide.com/techlife/archives/2007/03/16/how-to-win-your-ncaa-pool-everytime/trackback/