November 15, 2012 7:00am by David Farrar

Nate Silver

The Sydney Morning Herald reports:

The political emperors have no clothes, stripped bare by a big-data wizard named Nate Silver who showed dispassionate maths was more reliable than pundit intuition and cherry-picked polls.

Silver, 34, a statistician who previously predicted the career trajectories of baseball players, accurately tipped 49 out of 50 US states (with the 50th, Florida, highly likely to be accurate as well, as Obama is ahead with 97 per cent of the votes counted) and most Senate contests.

As right-wing pundits attacked him and his “voodoo statistics” for failing to see that the election was on a knife edge – and in the case of some conservative wingnuts, for being openly gay and “effeminate” – Silver held his nerve and for the entire election cycle maintained that the data always pointed to an easy Obama victory. …

Even after Obama’s dismal first debate performance, Silver’s probability of Obama winning never dipped below 61.1 per cent, rising to more than 90 per cent on election day.

I am a big fan of both Silver's analytic skills and his demeanour while under fire. He deserves a lot of credit.

It is worth pointing out, though, that all the major polling aggregation sites did very well, as reported by CNET:

But Silver wasn’t the only one to do exceptionally well in the prediction department. In fact, each of the five aggregators that CNET surveyed yesterday — FiveThirtyEight, TPM PollTracker, HuffPost Pollster, the RealClearPolitics Average, and the Princeton Election Consortium — successfully called the election for Obama, and save for TPM PollTracker and RealClearPolitics handing Florida to Romney, the aggregators were spot on across the board when it came to picking swing state victors.

So if you listened to the polls rather than the pundits, you were likely to be correct. Why, then, is Silver the new political celebrity rather than, say, Mark Blumenthal, who runs HuffPost Pollster?

I think it is partly because Silver was attacked by several prominent pundits before the election. Those attacks backfired by giving him not just accuracy but vindication.

The other reason is that Silver does a bit more than just aggregate and weight the polls. His extra tweaks may not make a huge difference, but many see them as useful.

In addition to picking the winner in all 50 states — besting his 49 out of 50 slate in 2008 — Silver was also the closest among the aggregators to picking the two candidates’ popular vote percentages. All told, he missed Obama’s total of 50.8 percent by just four-tenths of a percentage point (50.4) and Romney’s 48 percent by just three-tenths of a point (48.3) for an average miss of just 0.35 percentage points. HuffPo Pollster and RealClearPolitics tied for second with an average miss of 0.85 points.
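The "average miss" figure is simple arithmetic on the numbers CNET quotes: take the absolute error for each candidate and average the two. A short Python check, using only the percentages in the quote above:

```python
# Popular-vote percentages as quoted by CNET: Silver's forecast vs. the count at the time.
actual = {"Obama": 50.8, "Romney": 48.0}
forecast = {"Obama": 50.4, "Romney": 48.3}

# Absolute miss for each candidate, in percentage points.
misses = {name: abs(forecast[name] - actual[name]) for name in actual}

# Average miss across the two candidates.
average_miss = sum(misses.values()) / len(misses)

for name, miss in misses.items():
    print(f"{name}: missed by {miss:.1f} points")
print(f"Average miss: {average_miss:.2f} points")
```

A 0.4-point miss and a 0.3-point miss average to 0.35 points, which matches the quoted figure. As the counts change, only the `actual` values need updating.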

This may change a bit as the final votes come in. It is also worth noting that Silver did not call every Senate race correctly. Again, this takes nothing away from his highly deserved reputation; it just shows that even his model is not infallible. The strength of his model, as I see it, is that it learns from the past.

So what does Silver do to predict who wins? His exact methodology is secret (he has said he may reveal more over time), but he has detailed what he does for Senate races. My summary of it is:

1. Average the polls for that state.
2. Give more recent polls a higher weight, using an exponential decay formula.
3. Weight by sample size, so polls with larger samples carry more weight.
4. Assign an accuracy rating to each pollster and give higher weight to those that have historically been more accurate. Exclude polls from very dodgy pollsters and polls released by parties. Note that many other polling aggregators also do steps 1 to 4. What is unique to Silver tends to be the later steps.
5. Adjust the result for the national trend: if one party has dropped, say, 5% nationwide in a week, assume the same drop applies in that state.
6. Adjust the result for observed "house effects". If a pollster consistently has Democrats 2% higher than they actually get, take 2% off that pollster's numbers.
7. Adjust polls of registered voters to approximate likely voters, based on the usual gap between the two (Republicans do better among likely voters).
8. Run a regression analysis for the state based on its partisan voting index, party identification, donations to candidates, incumbency status, approval ratings for incumbents, and the previous offices a candidate has been elected to.
9. Add the result of the regression analysis to the weighted average of polls, as if it were a poll.
10. Calculate the likely error in the forecast.
11. Simulate the election many times and report how often one candidate beats the other.
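The polling-average part of the steps above (recency, sample size, pollster rating, house effects, and folding in a regression estimate as an extra "poll") can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Silver's actual model: the 14-day half-life, the square-root sample-size weight, the pollster names, their ratings and their house effects are all hypothetical, and the regression estimate is simply passed in as a number with an assumed weight.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    days_old: int        # days since the poll left the field
    sample_size: int
    dem_margin: float    # Democrat minus Republican, in percentage points


# Hypothetical ratings and house effects, made up for illustration only.
POLLSTER_RATING = {"Pollster A": 1.0, "Pollster B": 0.8, "Pollster C": 0.5}
HOUSE_EFFECT = {"Pollster A": 0.0, "Pollster B": 2.0, "Pollster C": -1.0}
HALF_LIFE_DAYS = 14.0    # assumed half-life for the exponential decay


def poll_weight(poll: Poll) -> float:
    # Recency: exponential decay, so the weight halves every HALF_LIFE_DAYS.
    recency = 0.5 ** (poll.days_old / HALF_LIFE_DAYS)
    # Sample size: sqrt reflects how sampling error shrinks with larger samples.
    size = math.sqrt(poll.sample_size)
    # Pollster rating: unrated (e.g. dodgy or party) pollsters get 0, i.e. excluded.
    rating = POLLSTER_RATING.get(poll.pollster, 0.0)
    return recency * size * rating


def adjusted_margin(poll: Poll) -> float:
    # House effect: remove the pollster's habitual lean toward one party.
    return poll.dem_margin - HOUSE_EFFECT.get(poll.pollster, 0.0)


def polling_average(polls, regression_margin=None, regression_weight=0.0):
    weights = [poll_weight(p) for p in polls]
    values = [adjusted_margin(p) for p in polls]
    # Treat the regression estimate as one more "poll" with its own weight.
    if regression_margin is not None and regression_weight > 0:
        weights.append(regression_weight)
        values.append(regression_margin)
    total = sum(weights)
    if total == 0:
        raise ValueError("no usable polls")
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    polls = [
        Poll("Pollster A", days_old=2, sample_size=800, dem_margin=3.0),
        Poll("Pollster B", days_old=10, sample_size=1200, dem_margin=5.0),
        Poll("Pollster C", days_old=30, sample_size=500, dem_margin=-1.0),
    ]
    print(f"Weighted Democratic margin: {polling_average(polls):+.1f} points")
```

Multiplying the three weights together is just one simple way to combine them; a real model could combine them differently, and would also apply the national-trend and likely-voter adjustments before averaging.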

So Silver has a very sophisticated model. I think that for presidential elections he also uses economic data such as GDP growth and unemployment rates. Over time, as more data is gathered, his model should remain accurate or become even more accurate.

There will be times when it is wrong, just as the polls are sometimes wrong. No model can compensate if the election is very volatile and large numbers of voters change their minds or remain undecided in the final few days. Events will always matter.
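Why a good model will still sometimes be wrong follows from the final simulation step in the list above: a forecast is a probability, not a certainty. Below is a minimal Monte Carlo sketch for a single race, assuming the forecast error is normally distributed with a hypothetical standard deviation. Silver's real simulations cover every state at once, with errors that move together, and his exact error model is not public.

```python
import random
from statistics import NormalDist


def win_probability(mean_margin, error_sd, n_sims=100_000, seed=1):
    """Share of simulated elections that candidate A wins.

    mean_margin: forecast margin for A over B, in points.
    error_sd: assumed standard deviation of the forecast error, in points.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    wins = 0
    for _ in range(n_sims):
        # Draw one possible election-day margin around the forecast.
        if rng.gauss(mean_margin, error_sd) > 0:
            wins += 1
    return wins / n_sims


def analytic_win_probability(mean_margin, error_sd):
    # Closed-form check under the same normal-error assumption: P(margin > 0).
    return 1 - NormalDist(mean_margin, error_sd).cdf(0)


if __name__ == "__main__":
    print(f"Simulated: {win_probability(5.0, 3.0):.3f}")
    print(f"Analytic:  {analytic_win_probability(5.0, 3.0):.3f}")
```

With an assumed 5-point lead and a 3-point error, the leading candidate wins roughly 95% of simulations, which still means losing about one time in twenty.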