More Fun With Poll Numbers

The election is coming to a close, or at least we hope so (thank you, Al Gore, for proving that sometimes the nightmare just continues). All along, I have been saying that the poll numbers are invalid by their own standards, and I have now found another reason to repeat that claim: the state polls contradict many of the national polls.

The claim made by those who like the polls has generally run along the lines that they cannot all be wrong, and that a consensus of the polls should be trusted. I hardly agree, because of a factor in statistics known as collinearity. Here’s the formal definition from statistics.com: “In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably.”

Informally, collinearity is a warning to statisticians to make sure that the data they are using is truly independent of other data. When data is redundant or correlated, including it gives invalid additional weight to the data already used, corrupting the results. Tests have been created to detect multicollinearity, such as the Farrar-Glauber test (most commonly used in econometrics), but it does not appear that this kind of testing is commonly practiced in opinion poll analysis.

The math in that line of testing tends to get a bit complex for a casual discussion, so here I will come back to another point of opinion polling: the statistical level of confidence. That is a critical test for an opinion poll, because it gives a quick check on whether the poll is valid. “Valid” does not mean right or wrong; it means the poll’s method is considered trustworthy. “Invalid” means that whatever the poll says, you should not rely on it. Again, I refer the reader to the National Council on Public Polls (NCPP), and their criteria for polling and their principles of disclosure. In short, when a poll will not tell you who paid for it, hides how many people refused to take the poll when contacted, or refuses to release the internal demographics used in the poll and found in the response pool, that poll is in direct violation of NCPP rules and should not be taken seriously, even if you find its results believable. The bad news is that almost none of the publicly released polls are in full compliance with NCPP standards.

Going back to the question of the confidence level, though, it’s a simple test for validity. All of the major polls use, or claim to use, similar methodologies and demographic weighting, with the exception of party affiliation weighting. Some of these groups insist that party affiliation is not a static demographic and therefore should not be weighted at all, so here we will use their logic in applying the numbers. The polls all claim a 95% confidence level. In statistical terms, they are saying that if the same method is used, polls should produce results within the margin of error at least 19 times out of every 20 polls. So, it should not be difficult to test that claim.
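For readers who want to see where a margin of error of about 3 points comes from, here is a minimal sketch. It assumes simple random sampling and uses the textbook 95% multiplier of 1.96; the sample size of 1,000 is my own illustrative assumption, since the sample sizes of these polls are not given here.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a sampled proportion.

    Assumes simple random sampling; z=1.96 is the normal multiplier for 95%.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical sample size, chosen only for illustration.
moe = margin_of_error(1000)
print(f"MOE for n=1000: {moe * 100:.1f} points")
```

A larger sample shrinks the margin, but only with the square root of the sample size, so quadrupling the sample merely halves the margin.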

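Here are the polls listed at Real Clear Politics for the last ten days (where a poll has been done more than once in that period, the most recent results are used). I am listing these in descending order of support for Barack Obama, then of support for John McCain, applying the claimed 3% MOE and noting how many of the other twelve polls agree or disagree with the stated poll. A poll is marked FAIL when two or more of the other polls fall outside that margin, since a 95% confidence level leaves room for roughly one miss in twenty: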

Pew Research – Oct 26 – Obama 53% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – Obama 53% (agree 8, disagree 4) FAIL
ABC News/WaPo – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
CBS News/NYT – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
Rasmussen – Oct 30 – Obama 51% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – Obama 51% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby – Oct 30 – Obama 50% (agree 12, disagree 0)
Gallup (Traditional) – Oct 29 – Obama 50% (agree 12, disagree 0)
Ipsos/McClatchy – Oct 27 – Obama 50% (agree 12, disagree 0)
GWU/Battleground – Oct 30 – Obama 49% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
IBD/TIPP – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
FOX News – Oct 29 – Obama 47% (agree 6, disagree 6) FAIL

Rasmussen – Oct 30 – McCain 47% (agree 7, disagree 5) FAIL
GWU/Battleground – Oct 30 – McCain 45% (agree 9, disagree 3) FAIL
Gallup (Traditional) – Oct 29 – McCain 45% (agree 9, disagree 3) FAIL
Ipsos/McClatchy – Oct 27 – McCain 45% (agree 9, disagree 3) FAIL
FOX News – Oct 29 – McCain 44% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – McCain 44% (agree 11, disagree 1)
ABC News/WaPo – Oct 29 – McCain 44% (agree 11, disagree 1)
IBD/TIPP – Oct 29 – McCain 44% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby – Oct 30 – McCain 43% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – McCain 42% (agree 10, disagree 2) FAIL
CBS News/NYT – Oct 29 – McCain 41% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – McCain 41% (agree 8, disagree 4) FAIL
Pew Research – Oct 26 – McCain 38% (agree 2, disagree 10) FAIL

Note that all but one of these polls fails this validity test on one side or the other. Only Gallup (Expanded) passes on both the Obama and the McCain numbers.
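The agree/disagree tallies in the two lists can be reproduced with a short script. This is a sketch of the test as I read it from the lists, not a standard statistical procedure: each poll is compared against the other twelve using the claimed 3-point margin, and the one-miss tolerance (`MAX_MISSES`) is an assumption inferred from which rows are marked FAIL.

```python
# Figures copied from the two lists: poll -> (Obama %, McCain %).
POLLS = {
    "Pew Research": (53, 38),
    "Newsweek": (53, 41),
    "ABC News/WaPo": (52, 44),
    "CBS News/NYT": (52, 41),
    "Rasmussen": (51, 47),
    "Gallup (Expanded)": (51, 44),
    "Reuters/C-SPAN/Zogby": (50, 43),
    "Gallup (Traditional)": (50, 45),
    "Ipsos/McClatchy": (50, 45),
    "GWU/Battleground": (49, 45),
    "Diageo/Hotline": (48, 42),
    "IBD/TIPP": (48, 44),
    "FOX News": (47, 44),
}
MOE = 3          # claimed margin of error, in points
MAX_MISSES = 1   # assumed tolerance: two or more disagreements means FAIL


def tally(name, side):
    """Count the other polls inside and outside this poll's margin of error.

    side is 0 for the Obama figure and 1 for the McCain figure.
    """
    own = POLLS[name][side]
    others = [vals[side] for other, vals in POLLS.items() if other != name]
    agree = sum(abs(own - value) <= MOE for value in others)
    return agree, len(others) - agree


def fails(name, side):
    """True when more polls disagree than the assumed tolerance allows."""
    _, disagree = tally(name, side)
    return disagree > MAX_MISSES


for name in POLLS:
    for side, candidate in enumerate(("Obama", "McCain")):
        agree, disagree = tally(name, side)
        flag = " FAIL" if fails(name, side) else ""
        print(f"{name}: {candidate} {POLLS[name][side]}% "
              f"(agree {agree}, disagree {disagree}){flag}")
```

Changing `MOE` or `MAX_MISSES` shows how sensitive the pass/fail marks are to those two choices.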

But let’s move on. We can look at the RCP averages from one of two perspectives. The RCP folks take the polls from the last week by polling date (not release date) and average them. That gives a claim that Obama is leading McCain 49.7% to 43.8%, with a 3 point MOE. If we extend that back to polls taken October 20 or later, then it becomes Obama 50.3%, McCain 43.3%. So, RCP’s national polls, if aggregated as they like it, show a 5.9% lead or a 7.0% lead.
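The October 20 figures can be checked against the thirteen polls listed earlier, assuming the RCP average is a plain unweighted mean of those readings. The one-week figures are not reproduced here, because the post does not say which polls fall inside that window.

```python
# Readings copied from the lists of thirteen polls, in the order given there.
obama = [53, 53, 52, 52, 51, 51, 50, 50, 50, 49, 48, 48, 47]
mccain = [47, 45, 45, 45, 44, 44, 44, 44, 43, 42, 41, 41, 38]


def average(values):
    """Plain unweighted mean (assumed to be how the average is formed)."""
    return sum(values) / len(values)


obama_avg = average(obama)
mccain_avg = average(mccain)
lead = obama_avg - mccain_avg

print(f"Obama {obama_avg:.1f}%, McCain {mccain_avg:.1f}%, lead {lead:.1f}")
```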

OK, now let’s take a look at the RCP state polling. There are dozens of polling groups which have put out state polls, and I cannot vouch here for the reliability of all of them. That, of course, is also a problem with some of the national polls, but for consistency we can use the RCP numbers. Now, if each state’s aggregate level of support for Obama or McCain is weighted by that state’s share of the national vote (using 2004 voting statistics), then RCP’s state averages give Obama 46.9% of the popular vote, to 43.9% for McCain. The aggregation of the state polls, if we are going to accept them as valid, shows that the national polls are overstating Obama’s support. Once again, a simple check for validity shows that the confidence level test fails for the national polls.
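The state-to-national calculation described above is a vote-weighted average, and a minimal sketch looks like this. The three states and all of their numbers are made-up placeholders to show the arithmetic; they are not RCP averages or real 2004 turnout figures, and the names `StateAverage` and `national_projection` are my own.

```python
from dataclasses import dataclass


@dataclass
class StateAverage:
    name: str
    votes_2004: int   # total ballots cast in the state in 2004
    obama: float      # state polling average, percent
    mccain: float     # state polling average, percent


def national_projection(states):
    """Weight each state's polling average by its share of the 2004 vote."""
    total_votes = sum(s.votes_2004 for s in states)
    obama = sum(s.obama * s.votes_2004 for s in states) / total_votes
    mccain = sum(s.mccain * s.votes_2004 for s in states) / total_votes
    return obama, mccain


# Hypothetical rows for illustration only; substitute the real state data.
states = [
    StateAverage("State A", 5_000_000, 52.0, 41.0),
    StateAverage("State B", 3_000_000, 44.0, 48.0),
    StateAverage("State C", 2_000_000, 47.0, 45.0),
]

obama, mccain = national_projection(states)
print(f"Projected national vote: Obama {obama:.1f}%, McCain {mccain:.1f}%")
```

Because the weights come from 2004 turnout, the projection quietly assumes each state contributes the same share of the national vote as it did then.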

One last thing. The state polls have assumed a significant shift from 2006 towards increased Democratic participation, but even if that happens, the state polling indicates that Obama will still fail to reach 50% support. If those polls are reweighted according to 2006 turnout proportions and then plugged in to project national numbers, the result becomes Obama 46.3% and McCain 47.1%, with 6.6% undecided. Take from that what you will.
