How We Analyzed Literacy and Voter Turnout



This story was co-published with Gray TV.

One in five Americans struggles to read English at a basic level, and without the necessary reading and writing skills, everyday tasks can be insurmountable. The routine challenges of low literacy take a toll on individual livelihoods as well as this country’s collective democracy. For people who struggle to read, the electoral process can become its own form of literacy test — creating impenetrable barriers at every step, from registration to casting a ballot.

Our reporting has found that helping people with low literacy skills to read their ballots at the polls enables them to understand what and who they are voting for and ensures that their votes are counted. But decades of voter suppression — particularly in the South — have made this kind of assistance difficult to access in many communities.

We wanted to better understand the relationship between literacy and voter turnout. For decades, academic and political researchers have studied the factors that influence voter participation, including the impact of educational attainment on whether people vote. But literacy skills are less commonly featured in elections research. There are few reliable databases documenting literacy rates in the United States. But in recent years, the National Center for Education Statistics has released more granular information on literacy levels, including estimates of reading skills at the county level. Using this data, we can ask more pointed questions about how literacy skills correlate with voter participation.

Data Sources

To understand voter participation rates across the country, we acquired county-level turnout data for the 2018 midterm election and the 2016 and 2020 general elections from Dave Leip’s Atlas of U.S. Elections. The election data comprises vote counts for more than 3,100 counties across the country. We compared the raw county vote totals with the citizen voting-age population in each county, drawn from U.S. Census Bureau estimates, for each election year. The citizen voting-age population includes all people 18 and older who are native or naturalized citizens, and the number is frequently used as a base figure in turnout calculations.
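The turnout calculation described above can be illustrated with a short Python sketch. This is not the code used in the analysis. The function name, county names and figures are hypothetical, chosen only to show the arithmetic of dividing votes by the citizen voting-age population (CVAP).

```python
def turnout_rate(total_votes: int, cvap: int) -> float:
    """Return votes cast divided by citizen voting-age population (CVAP)."""
    if cvap <= 0:
        raise ValueError("CVAP must be a positive number")
    return total_votes / cvap


# Hypothetical counties for illustration only; these are not real figures.
counties = [
    {"name": "County A", "votes": 12_000, "cvap": 20_000},
    {"name": "County B", "votes": 4_500, "cvap": 10_000},
]

# Map each county name to its turnout rate.
turnout = {c["name"]: turnout_rate(c["votes"], c["cvap"]) for c in counties}

for name, rate in turnout.items():
    print(f"{name}: {rate:.1%}")
```

Swapping in a different denominator, such as the voting-eligible population, would only change the second argument.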

Some researchers prefer a different denominator: the voting-eligible population, which is adjusted to remove those disenfranchised due to felony convictions and, in some cases, those ruled ineligible due to a mental disability. We chose to use a broader denominator so we could include all people who potentially could cast ballots if felony voting restrictions were lifted. Another requirement for voting in the United States is registration: All states, except for North Dakota, require eligible voters to officially register before they vote. We chose not to limit our analysis to registered voters, as the registration process can function as a major barrier for people with low literacy levels. We did not include Alaska or U.S. territories, including Puerto Rico, in our analysis, due to incomplete or unavailable data.

To understand variations in literacy levels across the country, we used modeled survey data from the National Center for Education Statistics, which was collected as part of the Program for the International Assessment of Adult Competencies (PIAAC), also known as the Survey of Adult Skills. The data includes state and county estimates of average literacy levels and is based on the results of surveys collected in 2012, 2014 and 2017. The survey assesses adults for a range of skills, including reading ability and comprehension, numeracy and digital problem-solving. It represents the most comprehensive picture of the nation’s literacy levels today. The county and state literacy estimates are produced using a statistical method called small area estimation, which, in addition to the survey results, incorporates additional covariate data, such as educational attainment and poverty figures, to allow for a better extrapolation of survey data to low-population areas.

There are limitations to using modeled, survey-based estimates to understand larger national trends. More reliable data could be derived from a survey that examined both literacy and voter participation or from a more comprehensive survey that doesn’t require external data points to bolster responses. At this time, those surveys do not exist, so the PIAAC data remains the best option for understanding variations in nationwide literacy rates. This data has regularly been used by both federal researchers and academics.

For our analysis, we compared counties with low estimated literacy rates to those where literacy was estimated to be proficient or better. We defined proficiency or nearing proficiency as people who, according to the National Center for Education Statistics, have, at a minimum, the skills to complete reading and writing tasks, such as comparing and contrasting information, paraphrasing and drawing low-level inferences. People with low literacy skills may be able to read a basic vocabulary and decipher short texts, but their reading comprehension abilities are severely limited. The National Center for Education Statistics defined adults with low literacy skills as those who tested at or below the lowest proficiency level of the national survey, or those who were unable to participate in the survey because of cognitive, physical or language barriers.

About 80% of American adults were assessed as proficient or nearing proficiency in reading, and 20% had difficulty completing literacy tasks, according to the national survey. While adults born outside the country were disproportionately represented at the lowest skill levels, two-thirds of adults with low reading skills were born in the United States. White and Hispanic adults constituted 70% of adults with low literacy, and Black adults were overrepresented, making up 23% of adults with low literacy while accounting for less than 13% of the total population.

Top-line Findings

If more people who live in counties with low literacy voted, especially in tight races, it could potentially sway the outcome of elections. Our analysis found that as the literacy rates in a county decline, voter participation also tends to decrease.

We plotted county-level voter turnout against the percentage of residents in each county with low estimated literacy levels, and again against the share with high estimated literacy levels, and we found that the two measures relate to turnout in opposite directions. For the purposes of our calculations, the low literacy level was defined as the population that is at or below Level 1 in indirect literacy estimates, and high estimated literacy levels were defined as Level 3 or above. As the percentage of people with low literacy in each county increases, voter turnout tends to decrease (2016: r = -0.57, p < 0.0001; 2018: r = -0.57, p < 0.0001; 2020: r = -0.58, p < 0.0001). Conversely, as the percentage of people with higher literacy skills goes up, voter turnout tends to increase (2016: r = 0.60, p < 0.0001; 2018: r = 0.60, p < 0.0001; 2020: r = 0.61, p < 0.0001). This trend appeared consistently for all three election years we analyzed: 2016, 2018 and 2020. The relationship between voter turnout and literacy levels is statistically significant, meaning it is unlikely to be a random pattern.
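For readers unfamiliar with the r values above, the following Python sketch shows how a Pearson correlation coefficient is computed from paired county figures. It is an illustration under stated assumptions, not the analysis code: the function name and the five data points are hypothetical, and the sketch does not compute p-values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical county values for illustration only.
low_literacy_share = [0.10, 0.15, 0.20, 0.30, 0.40]  # share at or below Level 1
turnout = [0.75, 0.70, 0.68, 0.60, 0.52]             # votes / CVAP

r = pearson_r(low_literacy_share, turnout)
print(f"r = {r:.2f}")  # negative: turnout falls as the low-literacy share rises
```

A value near -1 or 1 indicates a strong linear relationship, and a value near 0 indicates little or none. The made-up points here are far more tightly aligned than the real county data.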

Counties With Lower Literacy Levels Often See Lower Voter Turnout

Source: Literacy estimate data from the National Center for Education Statistics; election data from Dave Leip’s Atlas of U.S. Elections (2016 presidential election data) and the U.S. Census Bureau (Citizen Voting-Age Population Estimate, 2016). Note: The low literacy level used in county literacy calculations is defined as the population that is at or below Level 1 in indirect literacy estimates, according to the National Center for Education Statistics. Voter turnout is defined as the total number of votes in each election divided by the citizen voting-age population.

(Graphic by Annie Waldman)

However, as the saying goes, correlation is not causation. While our analysis shows a valid pattern, our findings do not suggest that lower literacy rates cause lower turnout, or that higher literacy rates increase voter participation. We also do not know whether the adults who are not voting are the same adults as those with low literacy skills.

That said, there’s a robust body of research connecting educational attainment to voter turnout: “A person’s level of formal educational attainment is a very strong predictor of whether they vote in elections, especially nonpresidential elections,” said Barry Burden, a professor and the director of the Elections Research Center at the University of Wisconsin, Madison.

Recent data from the federal Current Population Survey supports this long-standing trend. Data on educational attainment and voter turnout from the 2020 Voting and Registration Supplement shows that among Americans with less than a high school diploma or its equivalent, the percentage who reported that they voted is similar to the share who said they didn’t vote. But with each additional level of educational attainment, the percentage of people reporting that they voted increases. For Americans with only a high school diploma who have not attended college, the percentage who said they voted was twice as big as the share saying they did not vote. For adults with only a bachelor’s degree, the group who said they’d voted was about nine times the size of the group that reported that they did not vote. And for adults with a master’s degree, about 17 times as many people reported voting as not voting. Given the deep link between education and turnout, the notion that literacy might have a similar connection is not unreasonable.

Some of the most consequential elections of our time have been determined by narrow margins — just tens of thousands of votes in a country of hundreds of millions of people. For example, in 2016, Donald Trump secured the presidency by winning Pennsylvania, Wisconsin and Michigan with a total margin of just under 80,000 votes. In 2018, Ron DeSantis won Florida’s gubernatorial election by about 32,000 votes. And in 2020, President Joe Biden prevailed by winning Arizona, Georgia and Wisconsin by about 40,000 votes combined.

Given how relatively few people can swing an election, we wanted to consider what the impact might be of people with low literacy skills staying away from the polls. We sorted counties across the country by average literacy level and divided them into three equal groups of about 1,030 counties each, then calculated average turnout for low-, medium- and high-literacy counties. For example, in 2020, across the United States, counties with low literacy levels had an average voter turnout of 58.8%, and those with high literacy levels had turnout of 73.1% on average. We then applied the participation rate of high-literacy counties (73.1%) to the total population of low-literacy counties to estimate how many votes those counties might be “missing.” We found that if counties with lower literacy levels had similar participation rates to high-literacy areas, turnout could increase by up to 7 million votes nationally. Of course, we cannot predict or assume for whom any additional votes would be cast.
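The "missing" votes estimate can be sketched in Python as follows. This is a simplified illustration rather than the analysis code, and it rests on several assumptions: counties are split into three groups by rank, average turnout is an unweighted mean of county rates, and the citizen voting-age population stands in for the population of low-literacy counties. All names and figures are hypothetical.

```python
def tercile_groups(counties):
    """Sort counties by literacy score and split them into three near-equal groups."""
    ranked = sorted(counties, key=lambda c: c["literacy"])
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return ranked[:cut1], ranked[cut1:cut2], ranked[cut2:]


def mean_turnout(group):
    """Unweighted average of county turnout rates (votes / CVAP)."""
    return sum(c["votes"] / c["cvap"] for c in group) / len(group)


def missing_votes(low_group, target_rate):
    """Votes the low group would gain if it matched the target turnout rate."""
    total_cvap = sum(c["cvap"] for c in low_group)
    total_votes = sum(c["votes"] for c in low_group)
    return target_rate * total_cvap - total_votes


# Hypothetical counties for illustration only.
counties = [
    {"name": "A", "literacy": 240, "votes": 5_500, "cvap": 10_000},
    {"name": "B", "literacy": 245, "votes": 12_000, "cvap": 20_000},
    {"name": "C", "literacy": 260, "votes": 6_500, "cvap": 10_000},
    {"name": "D", "literacy": 265, "votes": 20_100, "cvap": 30_000},
    {"name": "E", "literacy": 280, "votes": 7_000, "cvap": 10_000},
    {"name": "F", "literacy": 285, "votes": 32_000, "cvap": 40_000},
]

low, medium, high = tercile_groups(counties)
high_rate = mean_turnout(high)
extra = missing_votes(low, high_rate)
print(f"High-literacy average turnout: {high_rate:.1%}")
print(f"Estimated missing votes in low-literacy counties: {extra:,.0f}")
```

The result is an upper-bound thought experiment. It says nothing about whether those votes would materialize or for whom they would be cast.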


The purpose of this analysis was to gain a better sense of the relationship between turnout and literacy, rather than conduct a causal or inferential analysis. There are several limitations that could affect our understanding.

Our analysis relied on county-level data, which represents groups (i.e., counties) rather than individuals, so we cannot be certain that the people with low literacy in each county were the same individuals who were not voting. One possible reason for the correlation between literacy and turnout is that literacy is acting as a proxy for other factors that influence participation, like lower levels of income or a lack of social capital.

While literacy may impact voter participation, there are many other reasons why some parts of the country may cast fewer ballots. People do not vote for a number of reasons, including difficulty getting time off from work and limited options for transportation to the polls. Some people may not have much interaction with voter mobilization groups and others may feel disengaged from politics. And barriers in the process, like states disenfranchising people with felony convictions, may also impede voter participation. Some of these factors may also be influenced by an individual’s ability to read.

An important limitation of the Current Population Survey is that it relies on self-reporting, and individuals’ responses about whether they voted have not been verified against official voting records. Thus, the data is susceptible to misreporting. Some research has shown that higher socioeconomic or educational attainment levels may be associated with higher misreporting, which could affect the results.

Election participation is often influenced by local policies, and the correlation between literacy and voter turnout varies by state. While the majority of states exhibit moderate to strong relationships between voter participation and literacy, in a handful of states, there are weaker connections, which presents an intriguing path for a more comprehensive future analysis. There may be state-by-state differences in voting accessibility or ballot complexity that may also have varying effects on turnout.

In addition, the literacy data has limitations. As mentioned above, the National Center for Education Statistics developed a predictive model based on the results from its skills survey and a handful of auxiliary data points from the census, used to bolster the model’s predictive precision. These data points include, but are not limited to, high school diploma rates, poverty levels, racial breakdowns, health insurance coverage and the share of the population working in service jobs. Because many of these variables are themselves related to turnout, they might confound the literacy variable’s relationship with turnout, possibly boosting the correlation.

The literacy data for counties with fewer residents may also have greater uncertainty than the data for more populous counties. Counties with fewer than 1,500 people represent about 2% of all 3,100 counties in the data set, and they may affect the results, particularly in state-level analyses in states that have numerous small counties. To assess their influence, we resampled the data, randomly drawing new estimates for each county, and reran the analysis 1,000 times. The findings did not significantly change.
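One way to picture a resampling check of this kind is the Python sketch below. It is not the procedure used in the analysis, whose details are not described here. It assumes each county's literacy estimate comes with a standard error and that new estimates are drawn from a normal distribution around the published value. The function names, standard errors and data are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def resampled_correlations(estimates, std_errors, turnout, n_draws=1000, seed=0):
    """Redraw each county's literacy estimate and recompute r, n_draws times."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(n_draws):
        # Assumption: normal sampling error around each published estimate.
        draws = [rng.gauss(est, se) for est, se in zip(estimates, std_errors)]
        results.append(pearson_r(draws, turnout))
    return results


# Hypothetical counties; the last two mimic small counties with wider uncertainty.
low_literacy_share = [0.10, 0.15, 0.20, 0.30, 0.40, 0.25]
std_errors = [0.01, 0.01, 0.02, 0.02, 0.05, 0.08]
turnout = [0.75, 0.70, 0.68, 0.60, 0.52, 0.63]

rs = sorted(resampled_correlations(low_literacy_share, std_errors, turnout))
print(f"median r: {rs[len(rs) // 2]:.2f}")
print(f"middle 95% of draws: {rs[25]:.2f} to {rs[974]:.2f}")
```

If the spread of resampled correlations stays close to the original value, uncertainty in the small-county estimates is unlikely to be driving the result.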
