We need to talk about oversamples.
Correcting some increasingly common misperceptions about polls.
Let’s get this out of the way: polls conducted today, in September of 2023, are not highly predictive of how a presidential election in November of 2024 might turn out. Between now and then, at a minimum, the following things could or will occur:
The entire Republican primary process, which, though it feels like it has been raging for months to those of us who follow this stuff for our jobs, has really not begun in earnest for the vast majority of primary voters (most of whom aren’t casting ballots for many, many more months).
Developments, hearings, perhaps even full-on trials in any of the four cases in which former President Trump is facing criminal charges, to say nothing of the cases facing his associates in various jurisdictions.
An impeachment inquiry led by House Republicans into President Biden over whether he benefited from his son Hunter’s business dealings with foreign entities while he was Vice President.
Two party conventions, billions of dollars spent on television and digital advertising, and an actual presidential campaign.
This is to say nothing of the possible “X Factors” such as “President Biden faces a health scare and decides to step aside,” “Major natural disaster/pandemic/terrorist attack occurs,” and so on.
Despite all of this, the polls you see pop up on TV or on podcasts, the polls that someone like me will produce or cite in my writing, still have value. They give us a benchmark against which we can measure future polling movement. If we treat them as the imperfect snapshots-in-time that they are rather than some predictive crystal ball, they can actually tell us a lot about how Americans are feeling and why they are feeling it, which tells us why campaigns might make certain decisions or focus on particular issues.
None of this, however, has stopped people on the internet from being very upset about the polls out there right now. People tend to get most upset about polls when the polls say X and their own personal experience or reasoning tells them X cannot possibly be true and Y must be true. In recent years, I have grown mostly accustomed to cries that the polls are skewed or are being intentionally framed to disadvantage Republicans. Lately, however, it has been the other side getting agitated, largely due to a string of poll findings that are consistently fairly grim for Joe Biden.
The polls I am here to explain today are not my own polls. They are polls conducted by friends, colleagues and competitors. There are plenty of legitimate things to criticize about polls these days, but there are two things that have come under fire that I think are worth better explaining.
Polling Complaint #1: Oversamples
When a pollster does a survey, they often pick an overall target universe they want to study. For most 2024 surveys right now, that target universe is going to be something like “registered voters” (anyone who is registered to vote and eligible to participate in the 2024 election) or “likely voters” (people predicted to be likely to vote based on a modeled score, people who report themselves to be highly motivated to turn out, and so on).
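If it helps to see that screen in concrete terms, here is a minimal sketch in Python of how a pollster might flag “likely voters” within a sample of registered voters. The field names, the 0.7 score threshold, and the answer wording are all hypothetical, invented for illustration; every pollster builds this differently.

```python
# Hypothetical sketch of a "likely voter" screen. Field names, the 0.7
# threshold, and answer choices are invented for illustration only.
def is_likely_voter(respondent: dict) -> bool:
    """Flag a registered voter as 'likely' via a modeled turnout score OR self-report."""
    modeled_score = respondent.get("turnout_score", 0.0)  # e.g., from a vote-history model
    motivation = respondent.get("motivation", "")         # self-reported intent to vote
    return modeled_score >= 0.7 or motivation == "almost certain to vote"

registered = [
    {"id": 1, "turnout_score": 0.92, "motivation": "probably"},
    {"id": 2, "turnout_score": 0.40, "motivation": "almost certain to vote"},
    {"id": 3, "turnout_score": 0.15, "motivation": "probably not"},
]
likely = [r for r in registered if is_likely_voter(r)]
print([r["id"] for r in likely])  # [1, 2]
```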
Because there are a lot of people who might vote in a general election but will not be allowed to participate in their state’s presidential primary process, or will choose not to, the subgroup of voters in any poll who are “likely primary voters” is going to be much smaller, and tinier still if you are looking at just one party. In 2016, for instance, Pew Research Center estimated that around 15% of eligible voters participated in the Republican primary.
If you did a survey of 1,000 registered voters today and expected 2016-style primary turnout to hold, you would expect that poll to contain a subsample of just north of 150 truly “likely Republican primary voters”. You might get a slightly larger number because people overestimate their own likelihood of voting, or because you count all Republicans in your poll as potential primary voters to intentionally cast a wide net, but nevertheless, let’s say you get 150. There’s a pretty hefty margin of error associated with that number. If you actually want to understand what’s going on in the GOP primary, you need more statistical power.
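Where does that “hefty margin of error” come from? The standard back-of-envelope formula for a simple random sample is z·√(p(1−p)/n). Here it is computed for a few sample sizes, assuming the worst case p = 0.5 and ignoring the design effects that, in real weighted polls, push the error somewhat higher:

```python
import math

# Rough 95%-confidence margin of error for a simple random sample,
# using the worst case p = 0.5. Real polls also carry design effects
# from weighting, which push these numbers somewhat higher.
def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

for n in (150, 800, 1000):
    print(f"n = {n:4d}: +/- {100 * margin_of_error(n):.1f} points")
# n =  150: +/- 8.0 points
# n =  800: +/- 3.5 points
# n = 1000: +/- 3.1 points
```

At 150 interviews you are looking at roughly a plus-or-minus 8 point margin on any single percentage; at 800, you are back down near the precision of a typical full-sample poll.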
That, combined with things like the RNC’s directive that polls must include a whopping 800 likely Republican primary voters in order to “count” for debate qualification, means a lot of polls are really trying to go above and beyond to get a robust look at the primary. They are intentionally surveying far more Republicans than you would naturally find through random sampling alone.
At first blush, that doesn’t sound like it bodes well for your overall results, right? If my poll is only supposed to naturally have 150-200 likely primary voters and the pollster is out here dialing and dialing and dialing until they hit 800, isn’t that a problem?
If that were the end of the story, yes. Thankfully, it is not. Take a poll like CNN’s, which has been the subject of some pushback. (Disclaimer: I am a CNN political contributor, though I do not work on their polling team.)
The initial methodology statement focuses on some basics: how many people were surveyed, when they were surveyed, what the oversample is.
Surveys were obtained August 25-31, 2023 with a representative sample of n=1,503 respondents, including an oversample of Republicans and Republican-leaning independents to reach a total of 898 Republicans and Republican-leaning independents.
If you aren’t familiar with how polling works, you might read that and assume this poll has a sample that is nearly 60% Republican. Yikes! But it also includes the following statement:
For more information about other trend data included in this topline, please see the “More on Methodology” section at the end of this document.
Aha! A clue! There is, in fact, more to this story. What do we find in this More on Methodology, anyway?
Among the entire sample, 32% described themselves as Democrats, 32% described themselves as Republicans, and 35% described themselves as independents or members of another party. The entire sample was weighted to reflect national Census figures for gender, race, age, education, region of country, population density, civic engagement, and Pew Research Center's NPORS figures for religious affiliation and frequency of internet use. Demographics within party identification and the accurate proportion of the Republican oversample were weighted based on prior CNN benchmarking within the overall population.
In short, through the process of weighting, CNN makes each of those Republican interviews “count” in the overall result for the correct proportion. They weight that oversample down significantly when reporting overall results so that the number of Republicans matches benchmarks set in previous, non-oversampled polls. This is a very good practice, and it lets you have your cake AND eat it too: to get a really powerful look at a small subgroup while still preserving the ability to have a representative overall result.
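To make the arithmetic concrete, here is a stylized sketch. Suppose, purely for illustration (this is a made-up benchmark, not CNN’s actual target), that Republicans and Republican-leaners should be about 45% of a representative national sample:

```python
# Stylized down-weighting of an oversample. The 45% benchmark share is a
# made-up number for illustration, not CNN's actual target.
total_interviews = 1503
rep_interviews = 898                                   # oversampled group
other_interviews = total_interviews - rep_interviews   # 605

rep_target_share = 0.45          # hypothetical benchmark proportion
other_target_share = 1 - rep_target_share

# weight = (target count) / (observed count): oversampled Republicans
# count for less than one respondent apiece in the topline numbers,
# everyone else counts for a bit more than one.
rep_weight = rep_target_share * total_interviews / rep_interviews        # ~0.75
other_weight = other_target_share * total_interviews / other_interviews  # ~1.37

print(f"Republican weight ~{rep_weight:.2f}, everyone else ~{other_weight:.2f}")
```

The topline numbers come out as if the sample had the benchmark partisan mix all along, while all 898 raw Republican interviews remain available for the primary questions, where the extra sample size is the whole point.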
This misunderstanding of oversamples is understandable, but it is nevertheless a misunderstanding, and it has even taken hold among experienced political reporters in some places. Vanity Fair’s Molly Jong-Fast and National Review’s Ramesh Ponnuru sparred over this yesterday, and Ponnuru has it right when he cites Pew’s Andrew Mercer, who explains in great detail how an oversample works.
Oversamples are great, they are not a sign a poll is done badly, and you should always read the full methodology statement before you try to throw a poll in the trash bin on methodological grounds.
Polling Complaint #2: Partisan Pollsters Doing Media Polls
The vast majority of polls that Americans see covered in the news are media or academic institution polls. Quinnipiac, Marist, Siena, UNF, Emerson: these are all universities with polling institutes. Sometimes, as in the case of the exceptionally good Siena/New York Times partnership, they buddy up and collaborate.
The vast majority of political polls that Americans don’t see are done by pollsters working directly for candidates or political organizations. Occasionally, these polls will leak their way into the public, but they are generally done in service of helping a candidate gain situational awareness and develop a smart strategy. They test messages, they gauge favorables, they try to figure out where the race stands today and where it could go if X, Y, and Z were to occur.
What seems to be causing some consternation these days is when one or more of these pollsters who typically work for private clients also does a poll on behalf of a nonpartisan media organization. Fox News’ well-regarded polling outfit has worked like this for years, relying on both its own bright internal team of folks like Dana Blanton and outside pollsters Daron Shaw and Chris Anderson. For quite a while, the NBC/Wall Street Journal polling partnership also worked this way, bringing together the Democratic firm Hart Research Associates and the Republican firm Public Opinion Strategies to execute the poll.
After the 2020 election, NBC and WSJ parted ways and now each execute their own polls. WSJ likewise uses a bipartisan team, these days with the firm GBAO representing the Democratic side and Fabrizio Lee on the Republican side. (Another disclosure: the “O” in GBAO is Margie Omero, my dear friend and, for a long, long time, my podcast co-host at The Pollsters.)
Well, the most recent WSJ poll found a lot of tough news for President Biden and a lot of good news for Donald Trump. This caught the attention of a variety of Biden’s fans online, most notably an account called “Mueller, She Wrote”.
A lot to unpack here, but it appears this tweet is engaging in the aforementioned oversample confusion as well as ignoring the GBAO logo that starts the chain of three logos. Steve Kornacki, bless him, went on a crusade on Twitter (X?) to correct the record and defend the honor of the WSJ poll’s use of a bipartisan team, but it seemed to fall on somewhat deaf ears. The FiveThirtyEight podcast broke this all down well. Polling Data Nerd Avengers assembled. That did not, however, end the discussion. “Mueller, She Wrote” responded:
They also imply that I don’t realize a dem pollster called GBAO was also part of the WSJ poll…I am well aware of that. But GBAO didn’t stand out to me because they weren’t in the Mueller report or the Senate Intelligence Committee report on Russian interference.
What? Sorry Margie, you weren’t…in the Mueller Report? I don’t even know what to do with this.
Here’s the somewhat crazy part: I actually think there is a legitimate complaint to make about the makeup of the WSJ poll’s pollster team! I think highly of Tony Fabrizio’s skill as a pollster, and I do not think he or his firm are trying to put a thumb on the scale, nor do I think his counterweights at GBAO would let him do so for an instant even if he wanted to. But I do think that if you are doing the polling for a candidate in a race, you probably shouldn’t also be the pollster a media outlet chooses to poll that same race.
But the benefit of using pollsters who are in the arena, doing polls for active candidates on a live political battlefield, is real: it means you get a peek into what those pollsters, whose work is normally tightly under wraps, might be showing. You get a questionnaire that asks the sorts of questions a campaign pollster would ask, which produces a poll with the potential to be very interesting, full of the sorts of insights a real campaign would be looking at.
The Bottom Line
If you don’t want to look at the polls today, or you think they’re garbage, you definitely don’t have to. But I think it is important to have a little more context for why highly regarded pollsters, particularly media pollsters, do some of the things that they do that may seem perplexing to the untrained eye.
This is a great post, Kristen, thank you. You educate a great many people with simplicity and clarity around the challenges pollsters face. While I'm not a pollster, I've worked with several over the past several decades in some three dozen US House and Senate races, and you ignore the fact that a small number of pollsters manufacture surveys favorable to one party over another to help candidates with fundraising and party support, or to recruit certain people to run. Smart people see through it with a simple look at the crosstabs and demographics. I've personally experienced it. And guess what - the media highlights those "surprising" polls that show unusual support for candidates (especially Democrats). I've seen polls that over/undersample certain party IDs or conduct surveys on Friday and Saturday nights that tend to favor D's. Many Americans sense this and, as a result, reject polling. Woke University polls are among the worst. Elon University smartly gave up their polling when their results consistently favored Democrats and misled the public on the status of elections. If only other schools had such integrity. And then there's the WaPo/ABC poll in Wisconsin just before the 2020 election that had Biden beating Trump by 17 points. The actual margin, a few days later, turned out to be 0.6 percent. Damage done. It would be nice to see the polling industry police its own and get rid of these bad actors.
Assuming the RNC’s 800-respondent minimum requirement exists so there is a large enough sample to trust the poll’s results, why even call registered Democrats at all? If it’s to collect data on how voters who aren’t likely Republican primary voters feel, doesn’t that data need to be sufficiently large as well?
It seems like the weighting technique is just used to circumvent the spirit of the RNC’s minimum because pollsters believe they can get accurate data with smaller sample sizes? If that’s true, then why isn’t the RNC listening to the pollsters who work for them in regards to required sample sizes?
My head’s spinning, but it seems that this technique purposely skews the data and then later on un-skews it. If the goal of the RNC is to have accurate polling data, it seems like skipping those two steps would result in something more accurate.
Very interesting piece as always!