### More Fun With Poll Numbers

The election is coming to a close, or at least we hope so (thank you, Al Gore, for proving that sometimes the nightmare just continues). All along, I have been saying that the poll numbers are invalid by their own standards, and once again I have found another reason to repeat that claim: the state polls contradict many of the national polls.

The claim made by those who like the polls has generally run along the lines that they cannot all be wrong, and that a consensus of the polls should be trusted. I hardly agree, because of a factor in statistics known as collinearity. Here's the formal definition from statistics.com: "In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably."

Informally, collinearity is a warning to statisticians to make sure that the data they are using is truly independent of other data. When data is redundant or correlated, adding it gives invalid extra weight to the data already used, corrupting the results. Tests have been created to detect multicollinearity, such as the Farrar-Glauber test (most commonly used in econometrics), but it does not appear that vector testing is commonly practiced in opinion poll analysis.

The math in that line of testing tends to get a bit complex for a casual discussion, so here I will come back to another point of opinion polling: the statistical level of confidence. That is a critical test for an opinion poll, and it serves as a quick reference on whether the poll is valid. "Valid" does not mean right or wrong; it means the poll's method is considered trustworthy. "Invalid" means that whatever the poll says, you should not rely on it. Again, I refer the reader to the National Council on Public Polls (NCPP), and their criteria for polling and their principles of disclosure. In short, when a poll will not tell you who paid for the poll, hides how many people refused to take the poll when contacted, or refuses to release the internal demographics used in the poll and from the response pool, that poll is in direct violation of NCPP rules and should not be taken seriously, even if you find its results believable. The bad news there is that almost none of the publicly released polls are in full compliance with NCPP standards.

Going back to the question of the confidence level, though, it's a simple test for validity. All of the major polls use - or claim to use - similar methodologies and demographic weighting, with the exception of party affiliation weighting. Some of these groups insist that party affiliation is not a static demographic, and therefore should not be weighted at all, so for here we will use their logic in applying the numbers. The polls all claim a 95% confidence level. In statistics, they are saying that if the same method is used, polls should produce results within the margin of error 19 times or more out of every 20 polls. So, it should not be difficult to test that claim.
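For readers who want to see where a figure like 3% comes from, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample at 95% confidence. The sample size of 1,000 is an assumption chosen for illustration and is not taken from any poll discussed in this post; real polls also apply weighting, which this sketch ignores.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of the confidence interval for a simple random sample.

    z=1.96 corresponds to a 95% confidence level; proportion=0.5 is the
    worst case (widest interval).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Assumed sample size, for illustration only.
moe = margin_of_error(1000)
print(f"Margin of error: {moe:.1%}")
```

Under those assumptions the result lands close to the 3 points the pollsters claim.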

Here are the polls listed at Real Clear Politics for the last ten days (where a poll has been done more than once in that period, the most recent results are used). I am listing these in descending order of support for Barack Obama, then in descending order of support for John McCain, taking the claimed 3% margin of error (MOE) and noting how many of the other polls agree or disagree with the stated poll within that margin:

Pew Research - Oct 26 - Obama 53% (agree 8, disagree 4) FAIL
Newsweek - Oct 23 - Obama 53% (agree 8, disagree 4) FAIL
ABC News/WaPo - Oct 29 - Obama 52% (agree 9, disagree 3) FAIL
CBS News/NYT - Oct 29 - Obama 52% (agree 9, disagree 3) FAIL
Rasmussen - Oct 30 - Obama 51% (agree 11, disagree 1)
Gallup (Expanded) - Oct 29 - Obama 51% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby - Oct 30 - Obama 50% (agree 12, disagree 0)
Gallup (Traditional) - Oct 29 - Obama 50% (agree 12, disagree 0)
Ipsos/McClatchy - Oct 27 - Obama 50% (agree 12, disagree 0)
GWU/Battleground - Oct 30 - Obama 49% (agree 10, disagree 2) FAIL
Diageo/Hotline - Oct 29 - Obama 48% (agree 8, disagree 4) FAIL
IBD/TIPP - Oct 29 - Obama 48% (agree 8, disagree 4) FAIL
FOX News - Oct 29 - Obama 47% (agree 6, disagree 6) FAIL


Rasmussen - Oct 30 - McCain 47% (agree 7, disagree 5) FAIL
GWU/Battleground - Oct 30 - McCain 45% (agree 9, disagree 3) FAIL
Gallup (Traditional) - Oct 29 - McCain 45% (agree 9, disagree 3) FAIL
Ipsos/McClatchy - Oct 27 - McCain 45% (agree 9, disagree 3) FAIL
FOX News - Oct 29 - McCain 44% (agree 11, disagree 1)
Gallup (Expanded) - Oct 29 - McCain 44% (agree 11, disagree 1)
ABC News/WaPo - Oct 29 - McCain 44% (agree 11, disagree 1)
IBD/TIPP - Oct 29 - McCain 44% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby - Oct 30 - McCain 43% (agree 10, disagree 2) FAIL
Diageo/Hotline - Oct 29 - McCain 42% (agree 10, disagree 2) FAIL
CBS News/NYT - Oct 29 - McCain 41% (agree 8, disagree 4) FAIL
Newsweek - Oct 23 - McCain 41% (agree 8, disagree 4) FAIL
Pew Research - Oct 26 - McCain 38% (agree 2, disagree 10) FAIL

Note that every polling agency fails one side or the other of this validity test. Every one of them.
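The agree/disagree counts can be reproduced with a short script. This is a sketch of the comparison as I read it from the lists: each poll is checked against the other twelve, and two polls "agree" when their numbers are within the 3-point MOE. The pass rule (at most one disagreement) is an assumption inferred from which lines are marked FAIL; the function and variable names are mine.

```python
MOE = 3         # claimed margin of error, in percentage points
MAX_MISSES = 1  # assumption: pass rule inferred from the FAIL marks in the lists

obama = {
    "Pew Research": 53, "Newsweek": 53, "ABC News/WaPo": 52,
    "CBS News/NYT": 52, "Rasmussen": 51, "Gallup (Expanded)": 51,
    "Reuters/C-SPAN/Zogby": 50, "Gallup (Traditional)": 50,
    "Ipsos/McClatchy": 50, "GWU/Battleground": 49,
    "Diageo/Hotline": 48, "IBD/TIPP": 48, "FOX News": 47,
}
mccain = {
    "Rasmussen": 47, "GWU/Battleground": 45, "Gallup (Traditional)": 45,
    "Ipsos/McClatchy": 45, "FOX News": 44, "Gallup (Expanded)": 44,
    "ABC News/WaPo": 44, "IBD/TIPP": 44, "Reuters/C-SPAN/Zogby": 43,
    "Diageo/Hotline": 42, "CBS News/NYT": 41, "Newsweek": 41,
    "Pew Research": 38,
}


def agreement(results, moe=MOE):
    """Map each poll to (agree, disagree) counts against all other polls."""
    table = {}
    for name, value in results.items():
        others = [v for other, v in results.items() if other != name]
        agree = sum(1 for v in others if abs(v - value) <= moe)
        table[name] = (agree, len(others) - agree)
    return table


def failures(results):
    """Polls with more disagreements than the assumed pass rule allows."""
    return {name for name, (_, disagree) in agreement(results).items()
            if disagree > MAX_MISSES}


# Polls that clear the test on both candidates' numbers.
passes_both = set(obama) - failures(obama) - failures(mccain)

for label, results in (("Obama", obama), ("McCain", mccain)):
    for name, (agree, disagree) in agreement(results).items():
        flag = " FAIL" if disagree > MAX_MISSES else ""
        print(f"{name} - {label} {results[name]}% "
              f"(agree {agree}, disagree {disagree}){flag}")
print("Passes both lists:", sorted(passes_both))
```

Run as written, the one individual poll that clears both lists is Gallup (Expanded); the "every agency" count above treats the two Gallup models as a single agency.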

But let's move on. We can look at the RCP averages from one of two perspectives. The RCP folks take the polls from the last week by polling date (not release date) and average them. That gives a claim that Obama is leading McCain 49.7% to 43.8%, with a 3 point MOE. If we extend that back to polls taken October 20 or later, then it becomes Obama 50.3%, McCain 43.3%. So, RCP's national polls, if aggregated as they like it, show a 5.9% lead or a 7.0% lead.

OK, now let's take a look at the RCP state polling. There are dozens of polling groups which have put out state polls, and I cannot speak here to their total authenticity. That, of course, is also a problem with some of the national polls, but for consistency we can use the RCP numbers. Now, take each state's aggregate claimed level of support for Obama or McCain and weight it by that state's proportional share of the national vote (using 2004 voting statistics). If RCP's state averages are right, plugging those numbers in gives Obama 46.9% of the popular vote, to 43.9% for McCain. The aggregation of the state polls, if we are going to accept them as valid, shows that the national polls are overstating Obama's support. Once again, a simple check for validity shows that the confidence level test fails for the national polls.
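The weighting step is a plain weighted average. Here is a minimal sketch of it; the three "states" and all of their numbers are hypothetical placeholders, not RCP averages or 2004 vote shares, and the function name is mine. Substituting the real state averages and each state's share of the 2004 vote would be needed to reproduce the figures quoted above.

```python
# Hypothetical placeholder inputs, for illustration only:
# state -> (share of national vote, Obama %, McCain %)
state_polls = {
    "State A": (0.50, 50.0, 42.0),
    "State B": (0.30, 44.0, 48.0),
    "State C": (0.20, 46.0, 45.0),
}


def national_projection(polls):
    """Weight each state's poll average by its share of the national vote."""
    total_share = sum(share for share, _, _ in polls.values())
    obama = sum(share * o for share, o, _ in polls.values()) / total_share
    mccain = sum(share * m for share, _, m in polls.values()) / total_share
    return obama, mccain


obama_pct, mccain_pct = national_projection(state_polls)
print(f"Obama {obama_pct:.1f}%, McCain {mccain_pct:.1f}%")
```

Dividing by the total share keeps the result sensible even if the shares entered do not sum exactly to one.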

One last thing. The state polls have assumed a significant shift from 2006 towards increased Democratic participation, but even if that happens, the state polling indicates that Obama will still fail to reach 50% support. If those polls are reweighted according to 2006 turnout proportions and then plugged in to project national numbers, it becomes Obama 46.3% and McCain 47.1%, with 6.6% undecided. Take from that what you will.

Randy R:

Great analysis, DJ. One thing I've noticed is that even though the national polls show Obama with around a five point lead, McCain is losing in many of the battleground states that went for Bush in 2004, according to RCP. Even in his home state of Arizona McCain only has a small lead according to state polling. If there really has been that large a shift in key red states from 2004 it seems like Obama would have a much larger national lead. Something doesn't smell right.

NewEnglandDevil:

Gallup Expanded appears to pass both, but with every other poll failing, I'm not sure what exactly you could accurately compare Gallup to (IOW, it doesn't seem right to call one poll "accurate" by comparing it to several polls that are demonstrated to be inaccurate).

Gallup (Expanded) - Oct 29 - Obama 51% (agree 11, disagree 1)
Gallup (Expanded) - Oct 29 - McCain 44% (agree 11, disagree 1)

NED

Alan Orfi:

I have always been suspicious of the "undecided" vote. If one has not been able to make their selection at this point in time, it is evident that they are not likely to vote for Obama. After all, this election is primarily a referendum on Obama because the McCain campaign has been all over the place on many issues. I think Obama has "maxed out" his number of supporters at this time and the primary question is whether they will all turn out. McCain's support is still growing because he is the default pick of the "undecided" who have obviously not been able to stomach Obama. I think McCain wins this 50% to 48%.

Side note: As a Floridian, I cannot fathom Obama taking this state as the polls suggest. I'm out in many communities day after day and I just don't see this supposed majority for Obama. I think McCain voters are simply a lot quieter because we are not doing backflips for him... but we will all turn out to defeat socialism.

Something fun to think abou... (Below threshold)
cirby:

So, we have all of these polls. They base them on (among other things) voter registration levels in different regions.

ACORN shows up, and "registers" a helluva lot of people - 13 million or so. The majority of these "new" registrations are Democrats.

And they don't exist.

How much of that lead is imaginary, due to overestimation of the number of Democratic voters in heavily-canvassed states?

Early voting in Florida is really, REALLY not showing the number of Dem voters they expected. I wonder how many other state voting officials are noticing similar trends, and have warned their unofficial bosses that the election might go the other way...

Aside from a truly staggering and unmissable amount of true vote fraud, they might not be able to catch up in time.

NewEnglandDevil:

Adding, it shouldn't be a surprise that a single poll would be found that hits the mean/median, thus, "passes".

NED


Unfortunately I am acutely allergic to statistics. My eyeballz are so swollen after the first two paragraphs I am having to rely on touch typing for a while. This may be fun 4 U but itz torture 4 me.

DJ Drummond:

The problem for Gallup, NED, is that they are the same agency which produced the Gallup (traditional) numbers, and so they fall out on that count. I call it the 'flat tire' theory of validity - even if the rest of the tires are fine, you cannot ignore the flat one.

JFO:

DJ

I've been reading your posts on the polls for a few weeks now and found them very interesting. I've said before I claim no expertise in the area.

However, it seems to me that all the polls have Obama ahead nationally and almost all the polls in key states such as Ohio and Pa have Obama ahead.

With due respect (no snark) do you think you may be in denial about this? I don't know how you get past the sheer number of polls which have Obama ahead nationally and in some of those key states.

A friend of mine, as passionate a liberal as I am, said a very interesting thing to me a few weeks ago and I think he's right on. His perception is that both sides are fearful that the other is going to win and that it would be a good thing to accept that and to understand it when interacting with one another. So, I just wonder if your position is really being driven by that fear. No need to answer that and maybe I'm off base about it. Just an observation, as I don't see how you get past the unanimity of the polls favoring Obama.

NewEnglandDevil:

JFO,

I suspect that DJ's response would be something along these lines: He doesn't know what is going to happen, all he is pointing out is that the polls are not accurate and do not agree with one another within their own stated margin of error. He is not saying that McCain will win. He is saying that the polls can't tell us what is happening.

Obama could win by a margin that matches (within MOE) one of the polls. While that would certainly disprove all the polls that called the race outside their MOE's, it doesn't necessarily validate the poll that gets it "right" either. It would be like shooting at the election results with a shotgun - one of the BBs might hit it.

NED

JLawson:

Guess I ought to just not bother to vote for McCain then by JFO's reasoning.

Sorry. The only poll that REALLY counts is the one at the ballot box. The results of that one aren't going to be in until very late on the 4th, or early on the 5th.

I am not wild about McCain - but I'm sure as hell not going to vote for someone like Obama who's basically managed to FUBAR his district (Grove Parc), the Annenberg Challenge (\$150 mil spent to no good effect) and the economy (through ACORN's mortgage efforts).

I don't give a damn how well-meaning he might be - he's an incompetent politician who missed his calling as a preacher or used car salesman.

Since McCain's the only candidate who's got a shot at defeating him, that's who I plan on voting for.

Sometimes you might not have anyone you want to vote FOR, but you can usually find someone you want to vote against, and this cycle OBAMA is the ONE!

Lummox JR:

You make a great point about ACORN, cirby. I wonder if all that organized voter fraud can actually pan out to much in the way of additional votes. Obviously double-voting and other forms of fraud do go on, but unless someone is keeping track of the fraudulent registrations that haven't been discovered yet, and distributing lists to people who intend to commit voter fraud directly by voting as a dead guy or 11-year-old or cartoon character, almost nobody is going to be using those bogus registrations for anything but number padding.

It kind of makes me wonder if all this juicy voter fraud might end up stabbing its own perpetrators right in the back. In attempting to bolster support for their party they invented voters out of thin air, which is kind of like telling a General you've mustered an army only to have him find out on the battlefield that they're mostly cardboard cutouts. It might be enough to daunt the enemy, but it won't withstand a serious charge.

RicardoVerde:

Here again I have to say that nationwide polls mean nothing. Obama could easily win millions more votes than McCain and still lose the election.


The field is empty.

The crowd is gone.

It's time to put the pom poms down and go home.

No, Adriane, you're just at... (Below threshold)
DJ Drummond:

Again.

You're missing the contest, the fight, the reason for the conflict.

As usual.

Go ahead and take your pom poms home and drink your kool-aid. The adults will tell you next week how it all came out.

Larry:

I wonder how many of the Liberals here know what is meant by "Kool-aid." Actually, to be correct, it should be "Flavor-Aid."

Anyway, there are several factors that seem to be confusing the pollsters. DJ has identified a number of them from a statistical POV and from the standards that apply to polling which the pollsters either choose to keep to themselves or have ignored.

1. Race Issue; Bradley effect. How many folks are going to admit that they will not vote for a black?

2. Turnout; This is the big one. Obama has put together a really, really GOOD voter turnout machine. The best the Democrats have ever been able to construct, better than 2004 which was the prior high water point for them.

The Republicans were better in 2004.

The question is what will the Republicans do this time? I haven't seen much on this, but I did not expect to see much since those with the resources (read traditional media) have chosen to focus on the positive parts of Obama and ignore the positive for McCain as a point of policy dictated by the Editor class.

To answer one guy, the reason why Obama is so far ahead of McCain in national numbers is simple: California and New York, with help from Oregon and Washington and a couple of other heavily populated Blue states. That doesn't help Obama in the battleground states where the Presidency will be won or lost, exactly as happened in 2000 and 2004 and every other year for that matter.

There will likely be no blowout. It will likely be a long night. Other than that, both Obama and McCain are running as if each voter counted, and for a fact, you do count.

Hillary would have won this year without breaking into a sweat. No matter how much the far right hate her, she would have wrapped it up by now with a 10-15% margin. If the Dems lose, it is because they once again nominated the wrong guy. Mr. Cool's turn will eventually come, if not this time, maybe in 8 years.

If he wins this time, the Liberal cause will be set back by at least a generation. We live in interesting times.

Lummox JR:

I'm not a big believer in the Bradley Effect. I think there may be a handful of people who won't vote for Obama based on his race (not remotely 5% though), but there are plenty of other reasons they could give pollsters for supporting McCain. I do however believe the Bradley Effect will be blamed if McCain wins in spite of what the polls are saying, and that will set up the narrative of how America must still be too darned racist to have a black President. It won't be true, but it's such a gift-wrapped cop-out that pollsters can bail out of any responsibility with it, avoid changing their flawed methodologies, and be every bit as wrong in 2012.

I believe the reason we've heard so much about the Bradley Effect lately is a little bit of early bet-hedging by pundits who have at least enough brains to know not to count all their chickens just yet.

