Terence Tao has a great blog entry about the mathematics of polling. One thing he points out, which at first seemed unintuitive to me, is that polling accuracy is independent of the total population size given a few basic assumptions (simple random sampling, honest responses, fixed poll size, etc.). I highly recommend reading his post, if only for his intuitive examples of why the total population size is irrelevant.
[Update: some of his examples are so good that I will just share them here. To test the salinity of the ocean accurately, how many water samples would you need to take? Although the ocean contains ~$10^{21}$ liters of water, only a sample of a few milliliters ($10^{-3}$ liters) is necessary to get a good estimate. That's -24 orders of magnitude (compared to -5 orders of magnitude for presidential polls), although I would guess that the entropy (uniformity) of ocean salinity is much higher than that of political opinions. Another example is identifying faces. A face's appearance is determined by millions of cells, yet people can readily identify a face from only a few bytes of data, such as a black-and-white pixelated image.]
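To see the population-size claim in numbers, here is a minimal sketch, not taken from Tao's post. It assumes simple random sampling without replacement, a 50/50 split of opinion (the worst case), and the usual normal approximation for a 95% interval. The function name `margin_of_error`, the poll size of 1,000, and the population sizes are my own illustrative choices.

```python
import math


def margin_of_error(sample_size, population_size=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a polled proportion.

    Assumes simple random sampling. If population_size is given, the
    finite population correction is applied; otherwise the population
    is treated as effectively infinite.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    if population_size is not None:
        # Finite population correction: close to 1 whenever the sample
        # is a small fraction of the population.
        correction = math.sqrt(
            (population_size - sample_size) / (population_size - 1)
        )
        standard_error *= correction
    return z * standard_error


# Same poll size, very different (hypothetical) population sizes.
for population in (100_000, 10_000_000, 300_000_000):
    moe = margin_of_error(1000, population)
    print(f"population {population:>11,}: +/- {100 * moe:.2f} points")
```

With a poll of 1,000 people, the margin of error comes out at roughly three percentage points in all three cases. The population size only enters through the correction factor, which barely differs from 1 once the population is much larger than the sample.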
So what can we do with this polling magic? FiveThirtyEight is a great website where baseball statistician Nate Silver applied his statistical techniques to predicting the presidential election. He predicted, before the polls opened, that Obama would receive 349 electoral votes; he actually received 365. The only states he missed were Indiana and Missouri, which is pretty impressive.
Another great site that deals with predictions is Intrade. This is where people buy contracts on certain future events. The price of a contract reflects how likely the market believes the event is to occur (although, as economist Gregory Mankiw has pointed out, this might not necessarily be true, since a contract is also an effective hedging instrument). I believe Intrade also missed only Indiana and Missouri before the polls opened.
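As a rough illustration of how a contract price can be read as a probability, here is a small sketch. The price and payout below are made-up numbers, not actual Intrade quotes, and `implied_probability` is a hypothetical helper. It also ignores fees and the hedging effect Mankiw describes, so treat the result as a naive reading of the price.

```python
def implied_probability(price, payout):
    """Naive probability implied by a binary contract's price.

    The contract pays `payout` if the event happens and nothing
    otherwise, so a risk-neutral buyer should pay about
    probability * payout. Fees and hedging demand are ignored.
    """
    if not 0 <= price <= payout:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


# Hypothetical contract: costs 6.50, pays 10.00 if the event occurs.
price, payout = 6.50, 10.00
print(f"implied probability: {implied_probability(price, payout):.0%}")
```

In this made-up case the market would be pricing the event at about 65%.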