How Our Primary Model Works « Machine Learning Times

How Our Primary Model Works

Originally published in FiveThirtyEight, January 9, 2020

Here at FiveThirtyEight, we’ve never built a complete back-to-front model of the presidential primaries before. Instead, in 2008, 2012 and 2016, we issued forecasts of individual primaries and caucuses on a piecemeal basis, using polls and demographics. We always thought there were too many complexities involved — how the outcome in one state can affect the next one, for example — to build a full-fledged primary model.

But this year, we’re giving it a shot. We’ve built a forecast that plays out the outcome of the 57 delegate-selection contests (50 states, D.C., five territories and Democrats Abroad) that Democrats will contest this year, simulating polling swings, post-primary “bounces” and candidates dropping out, starting with Iowa on Feb. 3 and ending with the Virgin Islands on June 6. We don’t try to anticipate what would happen in the event of a contested convention or if there are other complications in how delegates are chosen after June 6. But this is still a pretty ambitious project.

Why build a fancy primary model when we hadn’t before? Well, for one thing, there’s actually a lot more data available now than when we launched FiveThirtyEight 12 years ago. The Democratic primaries in 2008 and 2016, and the Republican ones in 2012 and 2016, were all long contests that give us more information on how the latter stages of the primary process play out. Since the current presidential nomination system is a relatively new invention — before 1972, voters had little direct say in how candidates were chosen — the data from these recent elections reduce the degree of difficulty in building a primary model. It’s still pretty hard, but it’s no longer an intractable problem.

Also, I suppose we’re feeling frisky these days. If building a full-fledged primary model presents its share of challenges — some of which I’ll describe here — there are also plenty of problems with publishing a half-assed forecasting product. (Meanwhile, trying to navigate our way through the primaries without any sort of forecasting product would present bigger challenges still.)

Before I run through the steps the model takes, here are a few key things to keep in mind. Even if you read nothing else about our model, please do read these. They’ll likely answer a few questions — or complaints — that you might have later on.

Our model is a forecast; it is not an estimation of what would happen in an election held today. Forecasted results in later states reflect “bounces” from earlier states and other contingencies. For example: Upon launch, our model gives former Vice President Joe Biden only about a 60 percent chance of winning Delaware, his home state. Why only 60 percent? Isn’t Biden hugely popular there? Well, yes. And Biden would almost certainly be a massive favorite in Delaware if it were the first state to vote. But in reality, Delaware votes relatively late in the process, on April 28. And there’s the chance that Biden will have dropped out by that time, or that his campaign will otherwise be severely diminished. The model accounts for these possibilities.
Our forecast is probabilistic. The degree of uncertainty in the primaries is high, and the process is path-dependent and nonlinear. The nomination process consists of layers of uncertainty piled on top of one another. Just looking ahead at January and February, for instance, there’s the chance the race could shift in the final few weeks before Iowa. Then Iowa itself is not very easy to forecast. Then whatever happens in Iowa will have uncertain effects on New Hampshire. And so forth.
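
To make the idea of layered, path-dependent uncertainty concrete, here is a minimal sketch of a simulation in which each contest's result feeds into the next one. It is not FiveThirtyEight's model. The function name `simulate_primary`, the size of the bounce, the noise level and the dropout rule are all hypothetical choices made for illustration.

```python
import random

def simulate_primary(start_support, n_contests=5, bounce=0.05,
                     dropout_threshold=0.05, noise_sd=0.08,
                     n_sims=10_000, seed=0):
    """Toy path-dependent simulation. Every parameter is an illustrative assumption."""
    rng = random.Random(seed)
    names = list(start_support)
    # wins[name][k] counts how often `name` wins contest k across simulations.
    wins = {name: [0] * n_contests for name in names}
    for _ in range(n_sims):
        support = dict(start_support)  # fresh copy for each simulated season
        for k in range(n_contests):
            # Layer 1: a noisy result around each remaining candidate's support.
            result = {c: max(s + rng.gauss(0, noise_sd), 0.0)
                      for c, s in support.items()}
            winner = max(result, key=result.get)
            wins[winner][k] += 1
            # Layer 2: the winner gets a bounce heading into the next contest.
            support[winner] += bounce
            # Layer 3: candidates with a very weak result drop out, and their
            # support is redistributed proportionally by renormalizing.
            remaining = {c: s for c, s in support.items()
                         if c == winner or result[c] >= dropout_threshold}
            total = sum(remaining.values())
            support = {c: s / total for c, s in remaining.items()}
    return {name: [w / n_sims for w in wins[name]] for name in names}

# Hypothetical starting support for three unnamed candidates.
win_probs = simulate_primary({"A": 0.30, "B": 0.25, "C": 0.10}, n_sims=2_000)
```

In a sketch like this, a candidate's chance of winning a late contest already includes the simulations in which that candidate has dropped out, which is the same reason a home-state favorite can show a modest win probability in a state that votes late.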

But it’s not as though we’re totally in the dark, either. Candidates who poll well in the run-up to the primaries are much more likely to win the nomination than those who don’t. If you hear things like “the primaries are unpredictable,” what does that mean, exactly? Does it mean that former Rep. John Delaney and author Marianne Williamson are as likely to win the nomination as Biden and Sen. Bernie Sanders? If that’s what you think, you know where to find me for a friendly wager.

In other words — like most things in life — the primaries exist somewhere along the spectrum between predictable and unpredictable. The model’s job is to sort all of this uncertainty out. And we encourage you to take the probabilities we publish quite literally. A 60 percent chance of a candidate winning a particular state means that she’ll win it six out of 10 times over the long run — but fail to do so four out of 10 times. Historically, over 10 years of issuing forecasts, the probabilities that FiveThirtyEight publishes really are quite honest, i.e. our 60 percent probabilities really do occur about 60 percent of the time. With 57 primaries and caucuses to come, there will probably be some big upsets, and it’s likely that a candidate with a 5 percent chance or a 2 percent chance or even a 0.3 percent chance of winning a state will surprise us somewhere along the line.
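
Two of the claims above can be checked with a few lines of arithmetic. The sketch below is illustrative only: `chance_of_at_least_one_upset` assumes the contests are independent, which the real primaries are not, and `calibration_table` is a hypothetical helper for comparing stated probabilities with observed frequencies, not FiveThirtyEight's own calibration code.

```python
def chance_of_at_least_one_upset(p_upset, n_contests):
    """Chance that a long shot wins at least once in n independent contests."""
    return 1.0 - (1.0 - p_upset) ** n_contests

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into probability bins and compare them with what happened.

    Returns {bin_index: (mean forecast, observed win frequency, count)}.
    """
    bins = {}
    for p, won in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins.setdefault(idx, []).append((p, won))
    table = {}
    for idx, rows in sorted(bins.items()):
        mean_p = sum(p for p, _ in rows) / len(rows)
        freq = sum(1 for _, won in rows if won) / len(rows)
        table[idx] = (mean_p, freq, len(rows))
    return table

# One 5 percent long shot in each of 57 contests, treated as independent.
p_any_upset = chance_of_at_least_one_upset(0.05, 57)
```

Under that independence assumption, the chance of at least one 5 percent upset across 57 contests comes out to about 95 percent. A well-calibrated forecaster would see each bin's observed frequency land close to its mean forecast.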

Because of the path-dependent nature of the primaries — events in one state can affect the results in the next ones — the probability distributions our model generates can be pretty weird-looking. For instance, as of Jan. 7, here’s the range of possible outcomes that our model shows for Sanders in Ohio:

What’s going on here? Why the concentration of outcomes near zero percent? Those cases represent the chance — about one in three, our model figures — that Sanders will drop out at some point before Ohio. If he hasn’t dropped out by that point, on the other hand, Sanders figures to do decently well, most likely winning somewhere between 15 percent and 35 percent of the vote. But there’s also the chance that Sanders will be just one of two or three major candidates left by the time Ohio votes. If that’s the case, Sanders could win 50 percent or 60 percent of the vote there, or more. When you see the probabilities in our model, remember that they reflect this variety of possibilities.
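
One way to picture a distribution like this is as a mixture of scenarios. The sketch below is a rough illustration, not the model's actual output. The one-in-three dropout chance and the vote-share ranges come from the description above, while `p_narrow_field`, the uniform draws and the choice to treat a dropout as exactly zero percent are assumptions.

```python
import random

def sample_vote_share(rng, p_dropout=1 / 3, p_narrow_field=0.15):
    """Draw one simulated vote share from a three-scenario mixture."""
    u = rng.random()
    if u < p_dropout:
        # Scenario 1: the candidate dropped out earlier (simplified to zero).
        return 0.0
    if u < p_dropout + p_narrow_field:
        # Scenario 2: only two or three major candidates remain (assumed weight).
        return rng.uniform(0.50, 0.60)
    # Scenario 3: still running in a crowded field.
    return rng.uniform(0.15, 0.35)

rng = random.Random(538)
samples = [sample_vote_share(rng) for _ in range(20_000)]
share_at_zero = sum(s == 0.0 for s in samples) / len(samples)
```

A histogram of `samples` would show a spike at zero, a broad hump between 15 and 35 percent and a smaller cluster above 50 percent, which is why the average of such a distribution describes none of the likely outcomes very well.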

Our model forecasts the chance of winning the plurality and majority of pledged delegates — it’s not a forecast of the nomination per se…

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact
