So you're not much of a college basketball fan. You follow your alma mater, and possibly keep loose tabs on the rest of their conference, but that's about as far as it goes. But the tourney is good entertainment and, as is customary, you enter a bracket pool so you can have a rooting interest where none would otherwise exist. How do you maximize your chances of winning the thing?
If you're like me, the first thing you do is head someplace like this smorgasbord of computer ranking algorithms and check out a few of them to get a quick feel for which teams appear to be over- or under-seeded. Some of them even do the work for you by putting a specific probability estimate on each team's chances of advancing to each round.
Whatever the rules of your bracket pool, you probably get some sort of score associated with your entry. And the highest score wins. In most pools, you can use estimates like those above to compute (at least approximately) the expected score of each possible entry. Now simply find the entry with the highest expected score and turn it in.
That's what I used to do. Only recently did I realize that that's wrong. Maximizing your expected score is not the same as maximizing your chance of having the highest score. Your goal is the latter, not the former.
To see why they're not the same, imagine a simple pool where you are just trying to pick the winner of the tournament. Let's say that in the very likely event of a tie, the winner will be selected randomly from among those who correctly picked the champion. You believe these are the probabilities of each team winning the tourney:
Ohio State: 25%
Texas A&M: 10%
Washington State: 5%
The "score" of your entry in this simple pool is either one or zero, depending on whether you pick the champ correctly or not. So the entry with the highest expected score is Ohio State. But Ohio State might or might not be the entry that maximizes your chance of winning the pool. It depends on who everyone else picked. If you were the only Buckeye-picker, then great. But if 90% of the other pool participants picked Ohio State, then you'd be better off picking Washington State.
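To put rough numbers on that, here is a minimal Python sketch. It assumes a hypothetical pool with 100 other entrants, 90 of them on Ohio State and, as the worst case for the contrarian, all 10 of the rest on Washington State. The function name win_chance and the head counts are illustrative assumptions, not figures from any real pool.

```python
def win_chance(p_champion, others_with_same_pick):
    """Chance of winning a pick-the-champion pool with a random tiebreak.

    You win only if your team takes the title, and then you must survive a
    random draw among everyone who picked that team (the others plus you).
    """
    return p_champion / (others_with_same_pick + 1)


# Hypothetical head counts: 100 other entrants, 90 of them on Ohio State.
ohio_state = win_chance(0.25, others_with_same_pick=90)
# Worst case for the contrarian: all 10 remaining entrants took Washington State.
washington_state = win_chance(0.05, others_with_same_pick=10)

print(f"Ohio State pick:       {ohio_state:.4f}")
print(f"Washington State pick: {washington_state:.4f}")
```

Under these assumed numbers the Washington State pick comes out ahead (about 0.45% versus about 0.27%), even though the Cougars are only one-fifth as likely to win the title.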
So, while Ohio State is the "best" pick in some sense, it's also likely to be a "crowded" pick, and that's the problem. You may be better off going with a "worse" pick, if it's a pick that's less popular. That's a simple example, but the same issues are present in a real pool. Even if there aren't necessarily ties, the best picks are also going to be the most popular picks, and that's going to cause the same kind of crowding. If you pick the entry that you believe is most likely to occur, then there will be lots of other entries that look very similar to yours. This is problematic because you know you're going to miss on a lot of games. And if your entry is too centrist, it's likely that there will be an entry that looks just like it except that it got a few of the games you missed.
The other extreme is to pick an entry with Cinderellas and longshots aplenty. This avoids the crowding problem. With a wacky entry, even if you miss a lot of games, there are not likely to be many entries close to yours to capitalize on your mistakes. The problem here is that, if you turn in a wacky entry, you probably won't end up being even close. That's what makes it a wacky entry.
To make this a little more concrete, imagine two extreme strategies:
Strategy #1: pick a final four with two #1 seeds, a #2 seed and a #3 seed.
Strategy #2: pick a final four with two #9 seeds and two #7 seeds.
The upside of Strategy #1 is that you're very likely to hit at least a couple of the final four teams. The downside is that, if your final four hits, you're probably not the only one who has it.
The upside of Strategy #2 is that, even if you just get one or two of the final four teams correct, you're probably still doing better than everyone else. The downside is that you're not likely to hit even one.
And of course you don't have to be at one extreme or the other. There is a continuum of possibilities in between. So where do you want to position yourself? You can't answer that question unless you know what the other entries in your pool look like, and you're probably not going to know that. So you have to make some assumptions.
If it's a big contest with a mixture of hardcore and casual fans, I think it's reasonable to expect that the entries will generally cluster around the most likely outcomes, but that there will be some longshot entries mixed in. With that in mind, I'm going to make the following assumption:
Assume the entries in your pool are distributed the same as the distribution of actual outcomes of the tournament.
Roughly speaking, what this means is that, if you think Ohio State has a 25% chance of winning the tourney, then about 25% of the pool's participants will pick Ohio State to win it. If you think there is a 1% chance of a final four consisting of Florida, UCLA, Texas A&M, and Georgetown, then about 1% of the pool's entries will have that for a Final Four. If you think Virginia Tech has a 59% chance of beating Illinois in the first round, then around 59% of the entries will have Virginia Tech beating Illinois. And so on.
Is this a reasonable assumption? I think it's at least in the ballpark. Yahoo.com publishes the entries in its Tournament pick 'em contest and they match up reasonably well with objectively-generated probabilities (e.g. from Sagarin ratings and the like). Not perfectly, but reasonably. This shouldn't be too surprising. Sports gambling markets are often cited as an example of the wisdom of crowds and are generally believed to be pretty efficient.
So let's go back and apply this assumption to our drastically simplified pool, where we are only picking the champion. If these are the probabilities of each of these teams winning the title:
Ohio State: 25%
Texas A&M: 10%
Washington State: 5%
Then our assumption would imply that the above is also the distribution of entries. Twenty-five percent of the people would take Ohio State, 10% Texas A&M, and so on. If that's the case, then what is the best pick?
There is no best pick! Your chances of winning are the same no matter who you pick.
If there are 100 entries, for example, then 25 of them took Ohio State. So if you are one of those 25 riding the Buckeyes, your chances of winning are 1%: a 25% chance they'll win, and then a 1-in-25 chance that you'll win the tiebreaker. If you take Washington State, you've also got a 1% chance of winning: 5% chance of the Cougars winning, then a 1-in-5 chance of winning the tiebreaker. Regardless of which team you look at, the analysis will turn out the same: you have a 1% chance of winning. One percent, of course, is one in a hundred, because you are one of a hundred people in the pool.
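The same arithmetic can be checked for every team at once. This is a small sketch using exact fractions; the dictionary title_odds just restates the three probabilities from the simplified pool, and chance_of_winning_pool is a made-up helper name.

```python
from fractions import Fraction

pool_size = 100
title_odds = {
    "Ohio State": Fraction(25, 100),
    "Texas A&M": Fraction(10, 100),
    "Washington State": Fraction(5, 100),
}


def chance_of_winning_pool(p_title, pool_size):
    """Win chance when the share of entries on a team equals its title odds."""
    entries_on_team = p_title * pool_size  # this count includes you
    # Your team must win, then you must win a random draw among its backers.
    return p_title / entries_on_team


chances = {team: chance_of_winning_pool(p, pool_size) for team, p in title_odds.items()}
for team, chance in chances.items():
    print(team, chance)
```

The team's probability cancels out of the calculation, which is why every pick lands on the same 1/100.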
But that's an oversimplified situation. What happens in more complicated settings?
As many of you know, I teach math for a living. Last summer, I got a student and a colleague interested in investigating this question with me. Some very interesting (to us, anyway) mathematics arose from the investigation.
As an abstract model of the tournament prediction problem, we imagined the following game. Suppose that a random number, called the target, is to be chosen. Millions of participants will guess what the number will be, and whoever guesses closest is the winner. Let's say, just for example, that it is to come from the standard normal distribution. So there is about a 2/3 probability that the target will be between -1 and 1, a 95% chance that it will be between -2 and 2, a 99.7% chance that it will be between -3 and 3, and so on. Your job is to guess closer to the target than any other competitor, and let's assume that their guesses are distributed as independent standard normals as well. In other words, two-thirds of the guesses will be between -1 and 1, 95% between -2 and 2, and so on.
If you guess near zero, then you are likely to be close to the target. But you are also likely to be crowded out by the multitudes of other guesses that are in the same vicinity. If you make a guess far out in the tail, like say 3.4, then there aren't many guesses near yours, but the target isn't likely to be near your guess either. If you picture a standard bell curve, you can picture the choice as being between a tall skinny piece of the distribution (a guess near zero) or a short fat piece (a guess far from zero). Which gives you the better chance of winning?
As it turns out, it doesn't matter. Either is as good as the other. And anywhere in between is also just as good.
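One way to check this numerically is sketched below in Python. It is a rough illustration, not the derivation from our investigation. The idea, stated as an assumption in the code comments, is that for a given target you win exactly when no rival lands closer to it than you did. The function names, the pool size of one million rivals, and the integration settings are all choices made for this example.

```python
import math


def density(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(guess, n_rivals, steps=20000, span=40.0):
    """Approximate chance that `guess` is closer to the target than every rival.

    Given a target t, you win when none of the rivals lands within
    |t - guess| of t. That chance is averaged over targets with the
    trapezoid rule. Assumption: only targets within `span` typical
    nearest-rival gaps of the guess are included, since farther targets
    are almost surely claimed by some rival in a large pool.
    """
    gap = 1.0 / (2.0 * n_rivals * density(guess))  # typical gap to the nearest rival
    h = 2.0 * span * gap / steps  # grid spacing in target units
    total = 0.0
    for i in range(steps + 1):
        t = guess - span * gap + i * h
        d = abs(t - guess)
        closer = cdf(t + d) - cdf(t - d)  # chance that one rival beats you
        survive = math.exp(n_rivals * math.log1p(-closer))  # no rival beats you
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * density(t) * survive
    return total * h


n_rivals = 1_000_000
for guess in (0.0, 1.0, 2.0, 3.4):
    ratio = win_probability(guess, n_rivals) * n_rivals
    print(f"guess {guess}: win chance is about {ratio:.3f} in {n_rivals:,}")
```

With these settings each guess should come out very close to 1 in 1,000,000, which is what you'd expect from being one of roughly a million entrants. The shortcut of looking only at targets near the guess is not reliable for small pools, especially for guesses far out in the tail.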
Even more interesting is that it does not matter that the distribution is standard normal. No matter what the distribution is (well, there are a few technical caveats, but I don't feel like I'm betraying the spirit of the results to say that it doesn't matter), as long as the distribution of entries is the same as the distribution of possible outcomes, and as long as the pool has a lot of entries, it doesn't matter what you guess.
So, at least to the extent that you believe our abstract game models your pool reasonably well, any guess is as good as any other. Fill out your bracket based on geography, uniform color, fierceness of mascot, or whatever other criteria you want. Your chances are as good as anyone else's.
If you're a casual follower of college hoops, you might find this liberating. While I haven't given you any actual advice on how to fill out your brackets, at least I've absolved you of any guilt you may have had about entering a contest where you have no idea what you're doing.