Weighting and Other Poll Controversies

I have always found opinion polls fascinating, and yet I often mistrust the way polls are built and the way their results are reported. It also occurs to me that most folks do not really understand polling as a science, and so treat it as, well, the political version of a horoscope. When I extrapolate that to the election, it explains quite a lot about how some folks vote, but that's beside the point here. What I want to do today is explain how someone can take the opinions of a few to portray the opinion of the many, and which factors most influence how a poll's results are reported.

Let's say you want to know how a group of a thousand people is likely to vote on an issue: say, shareholders considering a potential merger with another company. If you ask one person, you get one opinion, and that opinion represents 1/1000th, or 0.1%, of the group; you might be interested in it, but you would not take it as a solid indicator of how the whole group feels. At the same time, you really don't want to have to ask all one thousand people just to get an idea of what they would say right now. If you ask a second person, you have still covered only 0.2% of the whole, but your respondent pool has doubled in size, and your estimate is somewhat more reliable as a result. To put it another way, let's say that 665 of the 1,000 people would vote in favor of the proposed merger. It's possible that you might get through all 335 people opposed to the merger before you reach anyone in favor of it, which would falsely indicate strong opposition, but if your queries are truly random, a streak like that is extremely unlikely, and the results start pointing in the right direction by the time you have asked just ten people, even if the estimate is still rough.

Why? The key is partly how many people you ask: if you do it right, each additional respondent shrinks the likely error in your result, with diminishing returns, since sampling error falls roughly in proportion to one over the square root of the number of people asked. The other part is who you ask, which comes down to the characteristics that matter for the question at hand. In the stockholder example, regional location, length of experience with the company, and preference for stock price or dividends might all be relevant to how people vote. That is, if the stockholders who prefer a higher stock value to a higher dividend payment are likely to vote the same way on the merger, then the opinion of a relative few with those characteristics can reasonably represent the opinion of everyone in their group.

Therefore, if the respondent pool includes a proportional representation of the whole population concerned, the small group can statistically be expected to reflect the larger group's opinion in scale. Over the last seventy years, polling groups have found that once a respondent pool reaches eight hundred or more, the margin of error in a national contest, at the customary 95 percent confidence level, is generally below four percent. In a two-candidate race, that means each candidate's true support lies within about four points either way of the polling result: if 'A' and 'B' poll at 42% and 48%, for example, A's true level of support could be anywhere from 38 to 46 percent, while B's could be anywhere from 44 to 52 percent. Frankly, in most elections this margin of error means that no clear message can or should be taken about who is winning or by how much. The poll is, however, a valid tool for measuring changes in support over time, provided the questions and methodology are consistent from one poll to the next and the weighting is consistent with Census norms.
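The arithmetic behind these margins fits in a few lines. This is a minimal sketch assuming simple random sampling and the conventional 95 percent confidence level (z = 1.96); the function names are hypothetical, and real polls carry extra uncertainty from weighting that the sketch ignores. It applies the finite-population correction to the thousand-shareholder example, where asking everyone leaves no sampling error at all, and it also computes the margin on the gap between two candidates, which is larger than the margin on either candidate's share.

```python
import math
from typing import Optional

Z_95 = 1.96  # z-score for a two-sided 95% confidence interval


def margin_of_error(p: float, n: int, population: Optional[int] = None) -> float:
    """95% margin of error (as a fraction) for a share p estimated from n random respondents.

    If a finite population size is given, the finite-population correction is applied,
    so the margin shrinks to zero when everyone in the population has been asked.
    """
    moe = Z_95 * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


def margin_of_error_on_lead(p_a: float, p_b: float, n: int) -> float:
    """95% margin of error on the gap p_b - p_a between two candidates in one poll.

    Both shares come from the same respondents and tend to move in opposite
    directions, so Var(p_b - p_a) = (p_a + p_b - (p_b - p_a) ** 2) / n.
    """
    variance = (p_a + p_b - (p_b - p_a) ** 2) / n
    return Z_95 * math.sqrt(variance)


if __name__ == "__main__":
    # Shareholder example: 665 of 1,000 favor the merger.
    for n in (10, 100, 800, 1000):
        moe = margin_of_error(0.665, n, population=1000)
        print(f"shareholders, n={n:4d}: +/- {100 * moe:.1f} points")

    # National poll of 800, worst case p = 0.5.
    print(f"national, n=800: +/- {100 * margin_of_error(0.5, 800):.1f} points")

    # A at 42%, B at 48%: margin on B's 6-point lead.
    print(f"6-point lead: +/- {100 * margin_of_error_on_lead(0.42, 0.48, 800):.1f} points")
```

Run as a script, it reports roughly ±29 points after ten shareholders, about ±8.8 after a hundred, and zero once all 1,000 have been asked. For the national poll of 800 it gives about ±3.5 points per candidate but about ±6.6 points on the 6-point lead, which is why that lead says little on its own.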


This brings us back to weighting. By now it should be obvious that the weighting of a poll is critical to its results. For example, let's say you have a poll with exactly one thousand respondents: 500 Whites, 150 Asians, 100 Hispanics, and 250 Blacks. The 2000 US Census reports that the racial breakdown is 71.6% White, 12.3% Black, 12.5% Hispanic, and 3.6% Asian. To match these demographic statistics, each group's results are scaled by its percentage of the population divided by its percentage of the sample, so the polling data would be weighted in the following manner:

The results from White respondents would be divided by 50.0 then multiplied by 71.6;
The results from Black respondents would be divided by 25.0 then multiplied by 12.3;
The results from Hispanic respondents would be divided by 10.0 then multiplied by 12.5; and
The results from Asian respondents would be divided by 15.0 then multiplied by 3.6.
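The divide-then-multiply rule amounts to giving every respondent in a group the same weight: the group's population share divided by its share of the sample. The sketch below (hypothetical function names) computes those weights for a 1,000-person sample of 500 White, 250 Black, 100 Hispanic, and 150 Asian respondents, using the Census percentages quoted above. Sample shares are computed from the counts themselves, so they always add up to 100 percent. The per-group support rates are invented purely to show how far weighting can move a topline number; they are not real polling data.

```python
from typing import Dict

# Census 2000 racial breakdown as quoted in the post, in percent.
CENSUS_SHARES = {"White": 71.6, "Black": 12.3, "Hispanic": 12.5, "Asian": 3.6}

# Sample of 1,000 respondents by group.
SAMPLE_COUNTS = {"White": 500, "Black": 250, "Hispanic": 100, "Asian": 150}

# Invented share of each group answering "yes", for illustration only.
HYPOTHETICAL_SUPPORT = {"White": 0.40, "Black": 0.80, "Hispanic": 0.60, "Asian": 0.55}


def group_weights(counts: Dict[str, int], target_pct: Dict[str, float]) -> Dict[str, float]:
    """Weight for each group: population share divided by sample share (both in percent)."""
    total = sum(counts.values())
    return {g: target_pct[g] / (100.0 * c / total) for g, c in counts.items()}


def topline(counts: Dict[str, int], support: Dict[str, float],
            weights: Dict[str, float]) -> float:
    """Share answering 'yes' when each respondent in group g counts weights[g] times."""
    yes = sum(counts[g] * weights[g] * support[g] for g in counts)
    everyone = sum(counts[g] * weights[g] for g in counts)
    return yes / everyone


if __name__ == "__main__":
    weights = group_weights(SAMPLE_COUNTS, CENSUS_SHARES)
    for g, w in weights.items():
        print(f"{g:8s} weight {w:.3f}")
    unweighted = topline(SAMPLE_COUNTS, HYPOTHETICAL_SUPPORT, {g: 1.0 for g in SAMPLE_COUNTS})
    weighted = topline(SAMPLE_COUNTS, HYPOTHETICAL_SUPPORT, weights)
    print(f"unweighted: {100 * unweighted:.1f}%  weighted: {100 * weighted:.1f}%")
```

With these invented support rates, the same raw answers come out at roughly 54 percent unweighted and roughly 48 percent weighted, because the oversampled groups are scaled down and the undersampled group is scaled up.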

This, of course, is only the racial weighting. Similar adjustments would be made so that male and female responses match Census norms, and responses would also be adjusted to match other relevant demographics, such as age, geographic location, education, job category, military experience, and so on. The intent is to make the sample match the national population as closely as possible. The problem, of course, is that every national poll is therefore manipulated to some degree.
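When weights have to match several demographics at once, a common approach (one the post does not name, so treat it as a swapped-in illustration) is raking, also called iterative proportional fitting: adjust the weights to hit one variable's targets, then the next, and cycle until all of them settle. The sketch below uses hypothetical function names; the race-by-sex cell counts and the 49/51 sex split are made up for the example, while the race targets are the Census percentages quoted in the post.

```python
from typing import Dict, List

Respondent = Dict[str, str]  # e.g. {"race": "White", "sex": "F"}


def rake(respondents: List[Respondent],
         targets: Dict[str, Dict[str, float]],
         iterations: int = 50) -> List[float]:
    """Raking (iterative proportional fitting): return one weight per respondent.

    targets maps each variable to {category: target share}, with shares summing to 1.
    Every target category must appear at least once in the sample.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for var, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = {cat: 0.0 for cat in shares}
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            grand_total = sum(totals.values())
            # Rescale so this variable's weighted margins equal the targets;
            # this can disturb the margins fitted for earlier variables,
            # which is why the whole cycle is repeated.
            for i, r in enumerate(respondents):
                weights[i] *= shares[r[var]] * grand_total / totals[r[var]]
    return weights


def weighted_shares(respondents: List[Respondent], weights: List[float],
                    var: str) -> Dict[str, float]:
    """Weighted share of the sample in each category of var."""
    totals: Dict[str, float] = {}
    for w, r in zip(weights, respondents):
        totals[r[var]] = totals.get(r[var], 0.0) + w
    grand_total = sum(totals.values())
    return {cat: t / grand_total for cat, t in totals.items()}


# Hypothetical sample of 1,000 respondents, built from race-by-sex cell counts.
CELLS = {
    ("White", "M"): 300, ("White", "F"): 200,
    ("Black", "M"): 100, ("Black", "F"): 150,
    ("Hispanic", "M"): 60, ("Hispanic", "F"): 40,
    ("Asian", "M"): 70, ("Asian", "F"): 80,
}
RESPONDENTS = [{"race": race, "sex": sex}
               for (race, sex), count in CELLS.items() for _ in range(count)]

TARGETS = {
    # Race targets: the Census 2000 shares quoted in the post.
    "race": {"White": 0.716, "Black": 0.123, "Hispanic": 0.125, "Asian": 0.036},
    # Sex targets: an illustrative split, not an official figure.
    "sex": {"M": 0.49, "F": 0.51},
}

if __name__ == "__main__":
    weights = rake(RESPONDENTS, TARGETS)
    for var in TARGETS:
        shares = weighted_shares(RESPONDENTS, weights, var)
        print(var, {cat: round(share, 3) for cat, share in shares.items()})
```

Each pass fixes one variable exactly and nudges the others, so the loop is simply repeated until every margin sits on its target; adding age, region, or education means adding another entry to the targets.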

There are three key problems with weighting polls. First, polls are driven by budget and time constraints, and as a result the weighting is often generalized and not done the same way in each case. Some political polls, for example, begin their age brackets with a broad "18-34" category, while others use a narrower "18-24" or even "18-22" category to show college-age support. Worse, the ranges sometimes fluctuate even within the same polling group, so that consistent methodology is lost and the poll becomes significantly less valid. Second, some polls have been known to fudge their weighting to match a standard other than the last Census. CBS and the New York Times, for example, have often ignored Census norms in favor of some arbitrary measure, which also violates the standard used in legitimate polling. Third, there are the categories that defy clear definition. Almost no two major polls agree exactly on what proportions of Republican, Democratic, and Independent respondents should be used. Part of this is the fact that many states do not register political affiliation, and the federal Census does not break down the population by party affiliation either. So far, that doesn't really bother me, except that the reader had better be aware that different polling groups use different proportions when they weight political responses: even though there is no official, firm balance of Republicans to Democrats to Independents to Something Else, polls do indeed weight responses according to party affiliation. What's worse, some of them change those proportions from time to time, on no evidence beyond their belief that the mood has changed. This, of course, immediately invalidates the poll as an indicator of growing or lessening strength of support.
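To see why shifting party targets breaks the trend line, consider identical raw responses weighted to two different party mixes. Everything below is hypothetical: the support rates, both sets of party targets, and the function name are invented for illustration. When a poll is weighted on party alone, each party's responses are scaled to its target share, so the topline is just the sum over parties of target share times that party's support rate.

```python
from typing import Dict

# Hypothetical share of each party's respondents who back candidate X.
SUPPORT_FOR_X = {"Rep": 0.10, "Dem": 0.90, "Ind": 0.50}

# Two hypothetical party mixes a pollster might switch between from one wave to the next.
TARGETS_WAVE_1 = {"Rep": 0.33, "Dem": 0.37, "Ind": 0.30}
TARGETS_WAVE_2 = {"Rep": 0.37, "Dem": 0.33, "Ind": 0.30}


def party_weighted_topline(support: Dict[str, float], targets: Dict[str, float]) -> float:
    """Topline after weighting on party alone: sum over parties of target share x support."""
    if abs(sum(targets.values()) - 1.0) > 1e-9:
        raise ValueError("party targets must sum to 1")
    return sum(share * support[party] for party, share in targets.items())


if __name__ == "__main__":
    for label, targets in (("wave 1", TARGETS_WAVE_1), ("wave 2", TARGETS_WAVE_2)):
        print(f"{label}: X at {100 * party_weighted_topline(SUPPORT_FOR_X, targets):.1f}%")
```

Identical answers put X at 51.6 percent in one wave and 48.4 percent in the next, a 3.2-point "swing" that comes entirely from the pollster's choice of party mix rather than from any change in opinion.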

A poll is a useful indicator of trends and of a candidate's development of support, provided the standards, methodology, and weighting remain constant. Otherwise, it is just an opinion, and an absolutely worthless one. Caveat emptor, and then some.


Comments (9)


Good explanation, DJ. The extent to which numbers can change when the sample being polled has been drawn according to correct (Census-based) proportions is pretty underwhelming, though. When I run a study and sample respondents based on age, gender, and region within the country, percentages generally do not change by more than 2 or 3% (and usually less than that if the sample size is robust).

You are right about the difficulty identifying proportions along party lines. The Census doesn't have this information, nor does either political party. Best perhaps to target an equal number and report the figures as based on a hypothetical universe of equal numbers of Republicans and Democrats, with a significant chunk of 'unaffiliateds' or 'independents' as a remainder. I really don't know how this would be done, and am curious. I'm guessing that a random sampling of 1,000 respondents in any given area would fall out pretty close to something reflecting party/ideological affiliation. (Couldn't really be otherwise, could it?)

Another point of curiosity: if the NY Times polls don't weight data to Census figures, what do they weight it to? Surely they have to report that. There are national polling/surveying standards that have to be upheld and are enforced, aren't there? (There are in Canada.) You can't just weight a survey to the whims of an editor-in-chief or whatever.

DJ Drummond:

heh, near as I can tell that's just what they do, hyperbolist. And it's not just the NYT I am ragging on; I have less than total confidence in Rasmussen and Harris for the same reason, namely that there are aspects of their methodology which they refuse to discuss. In all three cases, they have told me (in case you wonder, some years back I harassed all of the major poll services about their methodology and weighting) that those details are proprietary. That would be reasonable, except that Gallup, SurveyUSA, and Pew all had no problem telling me how they built their models. Of course, at their level you just about need a Cray to crunch the numbers, so it's not as if they are worried about me starting my own polling service.

DJ, pardon me, but I'm confused when you say: "Part of this is the fact that many states do not register political affiliation." While this may be true in the case of the census every 10 years, doesn't every state keep records of its current registered voters?

Why wouldn't you use those numbers to do party breakdowns?

DJ Drummond:

You can't do it, Kenny, because not every state registers voters by party affiliation. Texas, for example, registers voters without party affiliation and has no way to track who is a Democrat, who is a Republican, or anything else. I voted in the GOP primary here, but legally I could have voted in the Democratic primary instead if I had wanted to (though I could not vote in both primaries), without having to declare myself a Democrat or anything. So there is no hard number on just how many Democrats or Republicans there are, just estimates.

That's where it gets fun. Some folks would like to use voter support for the parties in prior elections, but in actual fact there are probably more Republicans than voted for the GOP in 2006, and more Democrats than voted for Kerry in 2004. I can't fault anyone for whatever system they want to use, except that I do not agree that a lot of folks change party loyalty very much, so when some yahoo tries to say that the Democrats or Republicans are losing all their members, I call that hooey. Usually it's just that voters for the party in the minority are less noisy about it.

DJ, thanks for the clarification. I wasn't aware that some states don't collect party information when registering voters.

In most poll results that I see or read about, the details of the standards, methodology, and weighting are not disclosed, so how can I make a judgement as to the validity of a poll?




I know how you feel about regulation, but it seems as though an organization purporting to present objective information ought to follow some sort of standard--enforced by their industry, or by the government. That's how it works here, and it works well. I'd write a letter if I was an American citizen concerned about the influence polls have on people prior to elections...

Thanks for the response, DJ.

DJ Drummond:

'morning Kenney, hyperbolist. Well, that's sort of where I came in back in 2004. Like hyperbolist, I was under the impression that somebody had to be making sure the polls were kosher. I was a bit shocked to find out otherwise. There are two major groups which work to make sure polls are built according to standard guidelines: the National Council on Public Polls (NCPP) and the American Association for Public Opinion Research (AAPOR).


The problem is, neither group has deterrent or punitive authority, so if someone puts out a bogus poll, there's no penalty for doing it, especially if folks do not find out.

I have to say that when I dug into polling in 2004, I found that most groups have internal codes of conduct by which they abide. For example, while I still do not know how CBS/NYT decide the party weighting in their polls, I do note and appreciate that their polls always include a demographic breakdown if you drill down far enough, so you can reverse-engineer their polls to see the raw data.

One place I would recommend you check is the NCPP's record of how polls have done in the past. Click on the links in the sidebar there to check their analysis and evaluation of polling for any national election since 1936.

Hyperbolist, I agree with you that people need to know which polls are the most worthy of their attention and trust, and even more important, to understand what a poll can and cannot tell you. That's one reason I do things like this, and an area where I believe blogs can be a valuable asset for the public in general.


I'll continue to read and appreciate posts like this, DJ, for the sake of curiosity and to gain a better understanding of my own line of work (how it's conducted in the United States).

Thanks again!

DJ, thanks for the explanation and the links.







