MN Senate Election Targeting 2020

Note

The majority of this blog post was written before the murder of George Floyd. While this post seeks to explore the electoral impact of demographics in Minnesota broadly, I am not going to try to predict how these events may affect the coming election. If you support data-driven solutions to real-world problems, consider donating to Campaign Zero.

Introduction

This analysis focuses on which districts should see the most investment in a Democratic strategy for the Minnesota State Senate, and presents a first draft of a predictive model for the 2020 Minnesota State Senate elections. Data should certainly inform decision making around elections, but in cases of high uncertainty, like elections, data should not be a stand-in for sound strategy based on reasonable assumptions. This is important for understanding this model and post: the model currently incorporates demographic data, but does not include current estimates of public sentiment or a likely-voter turnout model for the 2020 election.

The code for this project can be found in the following GitHub repository: https://github.com/tajubenv/Minnesota_Election_Analysis.

Data Sources

Election results for this post were pulled from the Minnesota Secretary of State elections website, https://www.sos.state.mn.us/elections-voting/election-results. Demographic data was pulled from the 5-year American Community Survey (ACS) data, https://www.census.gov/programs-surveys/acs/data.html, using the tidycensus package in R. The ACS data has varying levels of depth between the 1-, 3-, and 5-year datasets, and the 5-year data is the only survey that contains State Senate district level data for Minnesota. The 5-year estimates pool data from the five years ending in the year given on the data, so the 2016 ACS estimate includes data from 2012-2016. This poses an obvious issue: ideally we would know the demographics on election day, not an estimate covering the previous 5 years. However, this analysis is focused on strategy and prediction rather than inference, so it is most appropriate to use resources that will be available before the election in November. As a result, the ACS estimates are matched to the election in the year they end, rather than offset in any manner.
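To make the window convention concrete, here is a minimal Python sketch. The project itself is written in R, and the helper name `acs5_window` is hypothetical (it is not part of tidycensus); it only encodes the rule described above, that a 5-year release is labelled by its final year.

```python
from typing import Tuple


def acs5_window(end_year: int) -> Tuple[int, int]:
    """Return the (first, last) years pooled into a 5-year ACS release.

    Hypothetical helper: a release labelled `end_year` covers the five
    years ending in `end_year`, e.g. 2016 -> 2012 through 2016.
    """
    return end_year - 4, end_year


# Each election year in this post is paired with the release ending that year.
for election_year in (2010, 2012, 2016):
    first, last = acs5_window(election_year)
    print(f"{election_year} election: ACS 5-year estimates covering {first}-{last}")
```

The alternative would be to offset the release so the election falls mid-window, but that data would not be available before election day.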

Methods

Data was collected from the ACS 5-year survey for 2010, 2012, and 2016 for all districts that had a Democratic-Farmer-Labor Party (DFL) candidate running for State Senate in those same years. Specific demographic data included median income, median age, and the proportions of the district that were White, Black, Native American, Asian, or Other as defined by the ACS data. The proportion of Hispanic descent was also included. All proportions for this model are proportions of the voting-age population.

After data collection and cleaning, a logistic regression was fit to predict the likelihood of a Democratic victory within a Minnesota Senate district, based on the demographic characteristics of the district. In addition, linear models were used to project the same demographic variables to the 2020 election across the state. The model was then used to identify potential districts for Democratic campaigns to target in 2020.
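The two modeling steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's R code: the function names are hypothetical, the projection is an ordinary least-squares line through the three observed years, and the sample input is the cross-district mean of median income from the summary table in the Results section (the post's projections are made per district).

```python
import math


def inv_logit(linear_predictor: float) -> float:
    """Map a logistic-regression linear predictor to a probability."""
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    # Rearranged form avoids overflow for large negative inputs.
    z = math.exp(linear_predictor)
    return z / (1.0 + z)


def project_linear(years, values, target_year):
    """Fit a least-squares line to (year, value) pairs and extrapolate."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return mean_y + slope * (target_year - mean_x)


# Sample input: mean of district median incomes for 2010, 2012, and 2016.
years = [2010, 2012, 2016]
mean_income = [58263.58, 61079.57, 65153.85]
income_2020 = project_linear(years, mean_income, 2020)
print(f"Projected 2020 value: {income_2020:,.0f}")

# A linear predictor of 0 corresponds to a 50% modeled win probability.
print(inv_logit(0.0))
```

With only three time points per district, a straight line is about the most that can be fit, which is one reason the projections carry a lot of uncertainty.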

Results

Summaries for the demographic data included in the regression model are shown in the table below.

	year: 2010 (N = 65)	year: 2012 (N = 67)	year: 2016 (N = 67)
winner
minimum	0.00	0.00	0.00
median (IQR)	0.00 (0.00, 1.00)	1.00 (0.00, 1.00)	0.00 (0.00, 1.00)
mean (sd)	0.46 ± 0.50	0.58 ± 0.50	0.49 ± 0.50
maximum	1.00	1.00	1.00
median_income
minimum	32,779.00	36,227.00	40,444.00
median (IQR)	57,115.00 (44,799.00, 69,189.00)	59,081.00 (47,921.50, 74,187.50)	63,232.00 (52,185.50, 77,299.50)
mean (sd)	58,263.58 ± 15,159.17	61,079.57 ± 15,698.15	65,153.85 ± 16,239.96
maximum	91,156.00	94,043.00	101,286.00
median_age
minimum	27.30	27.60	27.30
median (IQR)	37.10 (35.10, 40.30)	37.90 (35.10, 41.35)	38.70 (35.95, 41.55)
mean (sd)	37.37 ± 4.48	37.77 ± 4.42	38.37 ± 4.36
maximum	46.70	46.50	47.00
white_prop
minimum	0.47	0.49	0.48
median (IQR)	0.92 (0.86, 0.95)	0.92 (0.85, 0.96)	0.90 (0.84, 0.95)
mean (sd)	0.88 ± 0.11	0.88 ± 0.10	0.87 ± 0.11
maximum	0.98	0.97	0.97
black_prop
minimum	0.00	0.00	0.00
median (IQR)	0.02 (0.01, 0.06)	0.02 (0.01, 0.06)	0.03 (0.01, 0.06)
mean (sd)	0.05 ± 0.07	0.04 ± 0.06	0.05 ± 0.06
maximum	0.35	0.33	0.34
native_prop
minimum	0.00	0.00	0.00
median (IQR)	0.00 (0.00, 0.01)	0.00 (0.00, 0.01)	0.00 (0.00, 0.01)
mean (sd)	0.01 ± 0.02	0.01 ± 0.02	0.01 ± 0.02
maximum	0.11	0.11	0.12
asian_prop
minimum	0.00	0.00	0.00
median (IQR)	0.02 (0.01, 0.05)	0.03 (0.01, 0.06)	0.03 (0.01, 0.06)
mean (sd)	0.04 ± 0.04	0.04 ± 0.04	0.04 ± 0.04
maximum	0.19	0.19	0.25
other_prop
minimum	0.00	0.00	0.00
median (IQR)	0.01 (0.01, 0.02)	0.01 (0.00, 0.01)	0.01 (0.00, 0.02)
mean (sd)	0.01 ± 0.01	0.01 ± 0.01	0.01 ± 0.01
maximum	0.06	0.05	0.10
hispanic_prop
minimum	0.01	0.01	0.01
median (IQR)	0.04 (0.02, 0.06)	0.03 (0.02, 0.06)	0.04 (0.03, 0.06)
mean (sd)	0.05 ± 0.04	0.05 ± 0.04	0.05 ± 0.04
maximum	0.24	0.23	0.23
year
minimum	2,010.00	2,012.00	2,016.00
median (IQR)	2,010.00 (2,010.00, 2,010.00)	2,012.00 (2,012.00, 2,012.00)	2,016.00 (2,016.00, 2,016.00)
mean (sd)	2,010.00 ± 0.00	2,012.00 ± 0.00	2,016.00 ± 0.00
maximum	2,010.00	2,012.00	2,016.00

There are clearly some areas for concern with modeling this data. The demographic proportion variables are highly collinear, which can make the coefficient estimates unreliable or unstable; the very large, nearly offsetting intercept and race coefficients are consistent with this. In this case, removing multiple proportions did not have a large effect on model predictions, so they were all included. Results for the logistic regression are shown in the table below. The only variable without a statistically significant effect is the proportion of Hispanic descent within the district.

term	estimate	std.error	statistic	p.value	exp_estimate	CI_lower	CI_upper
(Intercept)	329.66	75.33	4.38	0.00	1.484003e+143	2.858017e+110	7.705571e+175
median_income	0.00	0.00	-3.83	0.00	1.000000e+00	1.000000e+00	1.000000e+00
median_age	0.13	0.06	2.29	0.02	1.140000e+00	1.080000e+00	1.210000e+00
white_prop	-334.59	75.84	-4.41	0.00	0.000000e+00	0.000000e+00	0.000000e+00
black_prop	-334.56	85.30	-3.92	0.00	0.000000e+00	0.000000e+00	0.000000e+00
native_prop	-369.58	88.29	-4.19	0.00	0.000000e+00	0.000000e+00	0.000000e+00
asian_prop	-305.23	75.00	-4.07	0.00	0.000000e+00	0.000000e+00	0.000000e+00
other_prop	-387.25	91.32	-4.24	0.00	0.000000e+00	0.000000e+00	0.000000e+00
hispanic_prop	15.03	13.69	1.10	0.27	3.377433e+06	3.830000e+00	2.978202e+12
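For readers who want to check the exponentiated columns, the sketch below recomputes them from the rounded estimates and standard errors in the table. The function name is hypothetical. One observation, inferred from the printed numbers and not stated anywhere in the post: the CI_lower and CI_upper columns appear to match exp(estimate ± 1 standard error), not the ± 1.96 standard errors of a conventional 95% interval.

```python
import math


def odds_ratio_row(estimate: float, std_error: float, z: float = 1.0):
    """Return (exp(estimate), exp(estimate - z*SE), exp(estimate + z*SE))."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


# (estimate, std.error) copied from the regression table, already rounded.
rows = {
    "median_age": (0.13, 0.06),
    "hispanic_prop": (15.03, 13.69),
}

for term, (estimate, std_error) in rows.items():
    one_se = odds_ratio_row(estimate, std_error, z=1.0)
    wald_95 = odds_ratio_row(estimate, std_error, z=1.96)
    print(term)
    print("  exp(est), +/- 1 SE   :", ["%.4g" % v for v in one_se])
    print("  exp(est), +/- 1.96 SE:", ["%.4g" % v for v in wald_95])
```

Because the inputs are rounded to two decimals, the recomputed values agree with the table only approximately. If the one-standard-error reading is right, the 95% interval for median_age would be wider than the one shown.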

Strategy

Developing a model to help predict outcomes is useful for putting specific races in context, and it provides a lens for targeting races. The most obvious strategy, however, is simply to examine the close races from 2016.

Vulnerable Democratic Districts

The following 10 districts are those that Democrats won most narrowly in 2016, ranked by the raw vote difference between the top two candidates.

District	Democratic Candidate	Democratic Votes	Percent DFL	Republican Candidate	Republican Votes	Percent R	Margin of Victory
58	Matt Little	22833	50.38	Tim Pitcher	22446	49.53	387
53	Susan Kent	23035	50.38	Sharna Wahlgren	22636	49.51	399
36	John Hoffman	21793	51.00	Jeffrey Lunde	20840	48.77	953
48	Steve Cwodzinski	24303	51.10	David Hann	23205	48.79	1098
37	Jerry Newton	22129	51.41	Brad Sanford	20838	48.41	1291
54	Dan Schoen	22162	53.13	Leilani Holmstadt	19480	46.70	2682
57	Greg Clausen	24519	53.06	Cory Campbell	21633	46.81	2886
11	Tony Lourey	20519	54.50	Michael Cummins	17079	45.36	3440
27	Dan Sparks	20540	54.76	Gene Dornink	16944	45.17	3596
51	Jim Carlson	24358	54.04	Victor Lake	20662	45.84	3696
Total:	-	226191	NA	-	205763	NA	20428

Vulnerable Republican Districts

The following table shows vulnerable Republican districts using the same methodology as the table above.

District	Democratic Candidate	Democratic Votes	Percent DFL	Republican Candidate	Republican Votes	Percent R	Margin of Victory
14	Dan Wolgamott	17378	47.02	Jerry Relph	17519	47.40	141
44	Deb Calvert	25114	49.74	Paul Anderson	25309	50.13	195
5	Tom Saxhaug	19687	49.21	Justin Eichorn	20240	50.59	553
20	Kevin L. Dahle	20577	47.95	Rich Draheim	22274	51.91	1697
21	Matt Schmit	19282	45.67	Mike Goggin	22901	54.24	3619
56	Phillip M. Sterner	19178	44.75	Dan Hall	23602	55.07	4424
26	Rich Wright	18317	43.95	Carla Nelson	23325	55.96	5008
2	Rod Skoe	17002	43.29	Paul Utke	22232	56.60	5230
32	Tim Nelson	18388	43.33	Mark Koran	23992	56.53	5604
17	Lyle Koenen	16713	42.67	Andrew Lang	22421	57.25	5708
Total:	-	191636	NA	-	223815	NA	32179
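The ranking behind both tables can be sketched as follows. This is a Python illustration and not the project's R code: the function and variable names are hypothetical, and the sample rows are a handful of 2016 results copied from the two tables above.

```python
# (district, DFL votes, Republican votes) for a few 2016 races.
results_2016 = [
    (36, 21793, 20840),
    (58, 22833, 22446),
    (53, 23035, 22636),
    (14, 17378, 17519),
    (44, 25114, 25309),
]


def closest_races(results, dfl_won, n=10):
    """Return up to n (district, margin) pairs, narrowest margin first.

    dfl_won=True keeps seats the DFL candidate won (vulnerable Democratic
    districts); dfl_won=False keeps seats the DFL candidate lost.
    """
    subset = [
        (district, abs(dfl - rep))
        for district, dfl, rep in results
        if (dfl > rep) == dfl_won
    ]
    return sorted(subset, key=lambda row: row[1])[:n]


print("Vulnerable Democratic:", closest_races(results_2016, dfl_won=True))
print("Vulnerable Republican:", closest_races(results_2016, dfl_won=False))
```

Ranking on the raw vote margin rather than the percentage margin slightly favors lower-turnout districts, where the same number of votes is a larger share of the electorate.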

Projections

The model was used to create predictions from the projected 2020 demographics. A summary of the projections versus the actual results is shown in the table below. On demographic factors alone, the model indicates that Democrats underperformed in 2016, winning 33 seats against a predicted 40. Interestingly, the model predicts the same number of seats in 2020 as in 2016. There are obviously many electoral factors that will be different in 2020 compared to 2016. Actually predicting the results will clearly not be possible with this dataset, but it can be used to guide decision making.

year	Predicted Democratic Seats	Actual Democratic Seats
2010	28	30
2012	33	39
2016	40	33
2020	40	NA

To use the model for district targeting, the districts that Democratic candidates lost in 2016 but that had the most favorable demographics are shown below:

District	Percent DFL	Percent R	Margin of Victory	Modeled Win Probability (%)	Voter Turnout (%)
14	47.02	47.40	141	87.67	57.42
26	43.95	55.96	5008	65.65	66.50
56	44.75	55.07	4424	63.69	69.06
55	31.24	68.53	15850	57.27	70.63
2	43.29	56.60	5230	54.69	66.44
5	49.21	50.59	553	54.38	63.38
10	35.56	64.31	12483	54.01	69.72
1	38.54	61.41	8607	53.37	62.22
9	28.71	71.19	16558	50.31	65.46
22	29.72	70.20	14859	41.36	62.19

The most important information in this table is the set of districts that did not appear in the table of vulnerable Republican districts above. These are districts that would not have been identified as winnable from vote totals alone: districts 1, 9, 10, 22, and 55. There are likely other factors that make these districts outliers, but they are still areas that are demographically favorable to Democrats.
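That comparison is a simple set difference. The sketch below is in Python, not the project's R, and the variable names are hypothetical; the district numbers and modeled win probabilities are copied from the two tables.

```python
# Districts from the "Vulnerable Republican Districts" table (close 2016 losses).
close_losses = {14, 44, 5, 20, 21, 56, 26, 2, 32, 17}

# District -> modeled win probability (%) from the targeting table.
model_targets = {
    14: 87.67, 26: 65.65, 56: 63.69, 55: 57.27, 2: 54.69,
    5: 54.38, 10: 54.01, 1: 53.37, 9: 50.31, 22: 41.36,
}

# Districts the model flags that the vote margin alone would have missed.
model_only = sorted(d for d in model_targets if d not in close_losses)
print(model_only)

# Districts flagged by both approaches, strongest modeled probability first.
flagged_by_both = sorted(
    (d for d in model_targets if d in close_losses),
    key=lambda d: model_targets[d],
    reverse=True,
)
print(flagged_by_both)
```

The districts flagged by both approaches (14, 26, 56, 2, and 5) are arguably the strongest targets, since both the 2016 margin and the demographics point the same way.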

Visualizations

While working through this data, I thought it was important to visualize the results to challenge my assumptions and ensure the results appeared consistent. I may work this into a full Shiny app in the future, but for now it will remain as the separate visuals below:

[Figure: plot of the results (image not available)]

Limitations

There are a lot of clear limitations with the current data. To begin with, the data does not include incumbency information or information on the presence of third-party candidates. The ACS estimates are for 5-year periods, so they are likely not truly representative of the district on election day, particularly in districts that undergo rapid demographic change. There is also no polling data, which could be used to analyze likely voters and to capture general trends that do not show up in demographic data.

Regarding the modeling, there are several key issues. The projections are simple linear projections from the ACS data, and thus have a high degree of uncertainty. In addition, there is no voter turnout model in the current iteration of this model. Multicollinearity could also be a problem, since all of the demographic proportions are included in the model. Finally, logistic regression may not be the most appropriate model for this problem, and other methods may have better predictive power.
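The multicollinearity concern can be illustrated with the fitted coefficients themselves. The Python sketch below assumes the five race proportions in a district sum to one, which is my reading of the summary table and not something stated outright. The district's proportions are hypothetical, and `race_term` is an illustrative name. Under that assumption, any constant can be moved between the intercept and the race coefficients without changing a prediction, so the individual values are not well determined.

```python
# Intercept and race coefficients copied from the regression table.
INTERCEPT = 329.66
RACE_COEFS = {
    "white": -334.59, "black": -334.56, "native": -369.58,
    "asian": -305.23, "other": -387.25,
}

# Hypothetical district whose race proportions sum to exactly 1.
props = {"white": 0.88, "black": 0.05, "native": 0.01, "asian": 0.04, "other": 0.02}


def race_term(intercept, coefs, proportions):
    """Intercept plus the race part of the logistic linear predictor."""
    return intercept + sum(coefs[k] * proportions[k] for k in coefs)


# Move 300 units out of the intercept and into every race coefficient.
shift = 300.0
shifted_coefs = {k: v + shift for k, v in RACE_COEFS.items()}

original = race_term(INTERCEPT, RACE_COEFS, props)
shifted = race_term(INTERCEPT - shift, shifted_coefs, props)

# The two parameterizations give the same linear predictor.
print(original, shifted)
```

This suggests that only the differences between the race coefficients carry information, and that dropping one category as a reference level is the usual way to remove the redundancy.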

Future Work

There is clearly a lot of room to expand this analysis. To begin with, I plan on branching this project out in two ways. First, I want to expand the analysis to include the Minnesota House of Representatives. Combining the two analyses should allow for a greater level of detail and, in theory, better projections. Second, I would like to spend another post going into greater depth on building the most predictive model possible for Minnesota elections. To do this, I will address as many of the aforementioned limitations as possible and explore other modeling techniques with this data. Finally, once a more complete dataset and model are in place, I would like to develop a Shiny app for interactive visualization.


Tyler Jubenville