Weighting Surveys
The NY Times reports:
There is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election.
He is sure he is going to vote for Donald J. Trump.
And he has been held up as proof by conservatives — including outlets like Breitbart News and The New York Post — that Mr. Trump is excelling among black voters. He has even played a modest role in shifting entire polling aggregates, like the Real Clear Politics average, toward Mr. Trump.
How? He’s a panelist on the U.S.C. Dornsife/Los Angeles Times Daybreak poll, which has emerged as the biggest polling outlier of the presidential campaign. Despite falling behind by double digits in some national surveys, Mr. Trump has generally led in the U.S.C./LAT poll. He held the lead for a full month until Wednesday, when Hillary Clinton took a nominal lead.
Our Trump-supporting friend in Illinois is a surprisingly big part of the reason. In some polls, he’s weighted as much as 30 times more than the average respondent, and as much as 300 times more than the least-weighted respondent.
That is a huge weighting. I get nervous if a respondent has a weight of more than two; preferably you aim for weights of, say, between 0.75 and 1.5.
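As a rough back-of-the-envelope check, here is a sketch (with assumed numbers, not the actual USC/LAT data) of how one Trump supporter carrying a weight of around 30 can move the weighted margin in a roughly 3,000-person panel by about a point:

```python
# Back-of-the-envelope sketch with assumed numbers (not the actual USC/LAT data):
# a panel of ~3,000 respondents with average weight 1, plus one Trump supporter
# whose weight is ~30. How far can that one person move the weighted margin?

n = 3000          # panel size (assumption)
w_big = 30        # the heavily weighted respondent's weight (assumption)

# Assume the other 2,999 panelists split 45% Clinton / 43% Trump.
clinton = 0.45 * (n - 1)
trump = 0.43 * (n - 1)

def margin(extra_weight):
    """Clinton-minus-Trump margin, in points, after adding one Trump
    supporter who carries the given weight."""
    total = (n - 1) + extra_weight
    return 100 * (clinton - (trump + extra_weight)) / total

print(f"margin if he counted like anyone else: {margin(1):+.2f} points")
print(f"margin with a weight of {w_big}:            {margin(w_big):+.2f} points")
print(f"shift caused by the weighting:         {margin(w_big) - margin(1):+.2f} points")
```

With these assumed numbers the single respondent shifts the margin by roughly a point, which is the order of magnitude the Times describes.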
Alone, he has been enough to put Mr. Trump in double digits of support among black voters. He can improve Mr. Trump’s margin by 1 point in the survey, even though he is one of around 3,000 panelists.
He is also the reason Mrs. Clinton took the lead in the U.S.C./LAT poll for the first time in a month on Wednesday. The poll includes only the last seven days of respondents, and he hasn’t taken the poll since Oct. 4. Mrs. Clinton surged once he was out of the sample for the first time in several weeks.
How has he made such a difference? And why has the poll been such an outlier? It’s because the U.S.C./LAT poll made a number of unusual decisions in designing and weighting its survey. …
Just about every survey is weighted — adjusted to match the demographic characteristics of the population, often by age, race, sex and education, among other variables.
The U.S.C./LAT poll is no exception, but it makes two unusual decisions that combine to produce an odd result.
■ It weights for very tiny groups, which results in big weights.
■ It weights by past vote.
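To see how weighting on very tiny groups produces big weights, here is a minimal sketch of simple cell weighting with invented sample counts and census shares: each respondent's weight is their cell's population share divided by its sample share, so a one-person cell gets an enormous weight.

```python
from collections import Counter

# Minimal sketch of cell weighting, with invented numbers.
# Each respondent falls into a cell (here just age band x race for brevity);
# weight = population share of the cell / sample share of the cell.

sample = (
    ["18-21 black"] * 1 +      # a single respondent in this cell
    ["18-21 white"] * 150 +
    ["65+ white"] * 849
)

# Invented "census" targets for the same cells (population shares).
population_share = {
    "18-21 black": 0.01,
    "18-21 white": 0.15,
    "65+ white":   0.84,
}

n = len(sample)
sample_share = {cell: count / n for cell, count in Counter(sample).items()}
weights = {cell: population_share[cell] / sample_share[cell] for cell in population_share}

for cell, w in weights.items():
    print(f"{cell:12s}  sample share {sample_share[cell]:.3f}  weight {w:.2f}")
# The lone "18-21 black" respondent gets weight 0.01 / 0.001 = 10.
# Cross in a few more variables (education, region, past vote) and a
# one-person cell can easily end up with a weight of 30 or more.
```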
Thomas Lumley comments at Stats Chat:
Even in New Zealand, you often see people claiming, for example, that opinion polls will underestimate the Green Party vote because Green voters are younger and more urban, and so are less likely to have landline phones. As we see from the actual elections, that isn’t true.
In fact the Greens tend to do worse than the polls have them.
Pollers know about these simple forms of bias, and use weighting to fix them — if they poll half as many young voters as they should, each of their votes counts twice. Weighting isn’t as good as actually having a representative sample, but it’s ok — and unlike actually having a representative sample, it’s achievable.
Exactly.
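Lumley's "counts twice" example is just weighting each group by the ratio of its population share to its sample share. A minimal sketch, with invented numbers:

```python
# Minimal sketch of Lumley's example, with invented numbers:
# young voters should be 20% of the sample but only 10% turned up,
# so each young respondent gets weight 0.20 / 0.10 = 2 (counts twice).

target_share = {"young": 0.20, "older": 0.80}   # known population shares
sample_share = {"young": 0.10, "older": 0.90}   # what the poll actually got

weights = {g: target_share[g] / sample_share[g] for g in target_share}
print(weights)   # {'young': 2.0, 'older': 0.888...}

# Invented Green support by group, to show the correction in action:
support = {"young": 0.25, "older": 0.08}
unweighted = sum(sample_share[g] * support[g] for g in support)
weighted = sum(sample_share[g] * weights[g] * support[g] for g in support)
print(f"unweighted estimate: {unweighted:.3f}, weighted estimate: {weighted:.3f}")
# The weighted figure recovers the true population value
# 0.20 * 0.25 + 0.80 * 0.08 = 0.114, instead of the biased 0.097.
```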
One of the tricky parts of weighting is which groups to weight. If you make the groups too broadly defined, you don’t remove enough bias; if you make them too narrowly defined, you end up with a few people getting really extreme weights, making the sampling error much larger than it should be. That’s what happened here: the survey had one person in one of its groups, and that person turned out to be unusual. But it gets worse.
The impact of the weighting was amplified because this is a panel survey, polling the same people repeatedly. Panel surveys are useful because they allow much more accurate estimation of changes in opinions, but an unlucky sample will persist over many surveys.
Worse still, one of the weighting factors used was how people say they voted in 2012. That sounds sensible, but it breaks one of the key assumptions about weighting variables: you need to know the population totals. We know the totals for how the population really voted in 2012, but reported vote isn’t the same thing at all — people are surprisingly unreliable at reporting how they voted in the past.
The NZ Political Polling Code recommends against weighting by previous vote for this exact reason: people are unreliable in reporting it. There is a tendency for more people to say they voted for the winning party and candidate than actually did. And when a party gets into trouble, fewer voters will admit to having voted for it last time. For example, over 4% of people voted Conservative at the last election, but a far smaller percentage will now report doing so because of the Colin Craig issues.
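Here is a small numeric sketch of the mechanism, with all numbers invented: if some 2012 Romney voters now say they voted for Mr. Obama, weighting the reported groups back to the official 2012 totals pushes the estimate toward Mr. Trump even when the raw sample is perfectly representative.

```python
# Sketch of how weighting to *reported* 2012 vote can bias a perfect sample.
# All numbers below are invented for illustration.

obama_2012, romney_2012 = 0.51, 0.49    # rough 2012 two-party shares
misreport = 0.10                        # share of Romney voters who now say they voted Obama
clinton_if_obama, clinton_if_romney = 0.85, 0.10    # invented current preferences

# True Clinton support in this hypothetical population:
truth = obama_2012 * clinton_if_obama + romney_2012 * clinton_if_romney

# In a perfectly representative sample, the *reported* 2012 groups are:
rep_obama = obama_2012 + misreport * romney_2012     # inflated by misreporters
rep_romney = romney_2012 * (1 - misreport)
print(f"reported 2012 vote in sample: Obama {rep_obama:.3f}, Romney {rep_romney:.3f}")

# Clinton support inside each reported group:
clinton_rep_obama = (obama_2012 * clinton_if_obama
                     + misreport * romney_2012 * clinton_if_romney) / rep_obama
clinton_rep_romney = clinton_if_romney

# Weight the reported groups back to the official 2012 totals:
weighted = obama_2012 * clinton_rep_obama + romney_2012 * clinton_rep_romney

print(f"true Clinton support:      {truth:.3f}")
print(f"after past-vote weighting: {weighted:.3f}")
# The weighting drags the estimate several points below the truth, even
# though the unweighted sample was exactly representative of the population.
```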
So weighting is good, but if you do it badly it may make a poll less accurate, not more accurate.
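Lumley's point that extreme weights inflate sampling error can be quantified with the standard Kish approximation: the effective sample size is (sum of weights) squared divided by the sum of squared weights, so a handful of very large weights makes a 3,000-person panel behave like a much smaller one. A minimal sketch with invented weight distributions:

```python
# Kish effective sample size: n_eff = (sum of weights)^2 / (sum of squared weights).
# The weight distributions below are invented for illustration.

def effective_n(weights):
    return sum(weights) ** 2 / sum(w * w for w in weights)

equal   = [1.0] * 3000                      # everyone weighted equally
modest  = [0.75] * 1500 + [1.25] * 1500     # weights kept in a sensible range
extreme = [1.0] * 2990 + [30.0] * 10        # a few huge weights

for name, w in [("equal", equal), ("modest", modest), ("a few weights of 30", extreme)]:
    print(f"{name:>20}: effective n = {effective_n(w):.0f} of {len(w)}")
# Equal weights keep the full 3,000; modest weights cost only a little;
# ten weights of 30 cut the effective sample to well under 1,000, which is
# why badly done weighting can make a poll less accurate, not more.
```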