Making Sense of Census Data Resources

In my role as Ohio State’s Geospatial Information Librarian, a lot of the work that I do is related to helping researchers – at all levels and across a wide variety of disciplines – think through how they can locate, analyze, and visualize geographic data. And a lot of the time, data products provided by the U.S. Census Bureau will be relevant for addressing the research questions that they are asking.

When we hear the word “census” in 2020, our thoughts likely turn to the decennial census, and for good reason. It is hard to overstate the importance of the 2020 Census in terms of political representation and federal funding allocation, and the ways these will impact our communities over the next decade.

But it’s also important to note that census data products cover a lot more than the decennial census. In fact, the U.S. Census Bureau conducts more than 130 different surveys and programs, including the American Community Survey (ACS), Current Population Survey (CPS), Economic Census, and Longitudinal Employer-Household Dynamics (LEHD) program, to name a few.

More recently, the U.S. Census Bureau has also been releasing a variety of interesting experimental data products, which are described as “innovative statistical products created using new data sources or methodologies that benefit users in the absence of other relevant products.” Two that garnered some attention earlier this year and that have recently gone through a second phase are the Household Pulse Survey and Small Business Pulse Survey, which provide data about the social and economic effects of the COVID-19 pandemic on American households and businesses, respectively.

As mentioned in an earlier post, data products from the U.S. Census Bureau are free and publicly available. Here are a few different ways you can access these data for research, teaching, or class assignments:

U.S. Census Bureau

A lot of census data products are directly accessible in data.census.gov, a new platform that replaced American FactFinder in early 2020. The platform features a new search interface aimed at making it easier for users to locate the data they need, with more datasets planned to be added over time. It’s also possible to browse and download data tables for various programs by topic and year. If you are unable to find the data you are looking for through either of those options, you can always go directly to the website for the specific program you are interested in to see what data access options are available (the U.S. Census Bureau’s website also lists all of its surveys and programs). TIGER data products are also publicly available for working with census data in a GIS.
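For those who prefer to script their data requests, the U.S. Census Bureau also offers a public Data API alongside the web interface described above. The sketch below is only an illustration, not a recommended workflow: it builds a request URL for total population (ACS variable B01003_001E) for every census tract in Franklin County, Ohio (state FIPS 39, county FIPS 049), and converts the API's row-based JSON into records. The function names acs5_tract_url and rows_to_records are hypothetical helpers of my own, the 2019 ACS 5-year dataset is just an example choice, and the sample response is a synthetic placeholder rather than real data. No network request is made here.

```python
from urllib.parse import quote, urlencode

# Base address of the Census Data API.
BASE = "https://api.census.gov/data"


def acs5_tract_url(year, variables, state_fips, county_fips):
    """Build a request URL for tract-level ACS 5-year estimates.

    Hypothetical helper: it only assembles the URL and does not fetch it.
    """
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "tract:*",
        "in": f"state:{state_fips} county:{county_fips}",
    }
    # Keep ':', '*' and ',' readable; encode the space as %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"{BASE}/{year}/acs/acs5?{query}"


def rows_to_records(rows):
    """Turn the API's list-of-lists JSON (header row first) into dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


url = acs5_tract_url(2019, ["B01003_001E"], "39", "049")
print(url)

# Synthetic placeholder response (NOT real data), showing the shape only.
sample = [
    ["NAME", "B01003_001E", "state", "county", "tract"],
    ["Example Tract", "0", "39", "049", "000000"],
]
print(rows_to_records(sample))
```

The API returns every value as a string, so estimates need to be converted to numbers before analysis. Larger or repeated requests may require a free API key, which is passed as an additional key parameter.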

data.census.gov is the U.S. Census Bureau’s new platform for facilitating data access

IPUMS

IPUMS is a great resource for accessing a number of historical and contemporary census data products not readily available elsewhere. For example, NHGIS – the National Historical Geographic Information System – provides access to summary data tables and GIS-compatible boundary files from 1790 to the present and for all levels of U.S. census geography. For those working internationally, IPUMS also recently announced the launch of IHGIS – the International Historical Geographic Information System – with data tables and GIS-compatible boundary files from population, housing, and agricultural censuses from a number of countries, with more to be added over time.

Up to this point, all of the data resources I’ve been discussing have been more focused on providing summary data, presented in aggregate at different levels of U.S. census geography. But various IPUMS products also provide access to historical and contemporary census microdata, that is, individual records containing information collected about persons or households. IPUMS USA, for example, provides access to harmonized microdata from decennial censuses from 1850 to 2010 and American Community Surveys from 2000 to the present, though geographic information for these records is limited compared to summary data. IPUMS also recently announced the release of the Multigenerational Longitudinal Panel (MLP), which links individuals’ records between censuses spanning 1900-1940, with plans to extend back to 1850 in the future.

All IPUMS data products are free and publicly available, though there is a registration process required before gaining access to these data.

IPUMS provides access to various unique historical and contemporary census data products

Licensed Resources

In addition to the public data resources described above, the University Libraries licenses several resources that provide access to census data products in a fairly user-friendly way, especially for beginners. PolicyMap and Social Explorer are two examples, both of which include interactive map viewers that facilitate some geographic exploration of the data without the need to download and import data into a GIS every time. I have worked with instructors in various departments who have incorporated one of these databases into an assignment or recommended them as data sources for student projects. One other important note about Social Explorer is that it includes data tables for the 1970, 1980, 1990, and 2000 decennial censuses normalized to the 2010 census geographies to facilitate longitudinal comparisons, with data available down to the tract level.

Social Explorer has a number of interactive map viewers for exploring census data variables

This list of census data resources is by no means exhaustive, but I hope it will be a good starting point for those looking to use census data products for research, teaching, or class assignments. Have fun exploring these resources, especially if you are new to census data or less familiar with some of the other surveys and programs conducted by the U.S. Census Bureau. And if you are having trouble finding the data you need or have other questions, you can always contact a librarian.

Joshua Sadvari

Assistant Professor, Geospatial Information Librarian

University Libraries

The Ohio State University

Census Data: A Personal Note on Some Challenges and Successes

Census data as we know them today are a symbol of American democracy. The U.S. Constitution states that “the actual enumeration shall be made … within every subsequent term of ten years, in such manner as they shall by Law direct” (Article I, Section 2). From that point on, the census took on a new meaning, beyond serving merely as a means for royalty or the state to pursue economic or political gain. Today, U.S. census data are commonly used for mapping and many other purposes. They are quite literally the textbook example of spatial data and applications in GIS and cartography education. Indeed, the U.S. census has empowered individuals and organizations around the world. A few simple clicks on the interactive map shown below, for example, reveal some stunning patterns of spatial dynamics across America, even at the county level.

An interactive map to explore census data of U.S. counties.

However, as useful and powerful as census data are, they also come with noticeable shortcomings and challenges. Let’s start from a spatial perspective and ask ourselves this question: are census data safe? The following map shows the dark side of using census data. It was made by Nazi Germany circa 1940, before the U.S. formally entered the Second World War. The map details first- and second-generation immigrants from central and western Europe in the United States, based on publicly available data from the 1930 U.S. Census. It also has a label in the top left corner that reads “For official use only!” Its cartographic achievement aside, this map was used by the Nazi propaganda machine to decide where to spend its war money on swaying U.S. public opinion against involvement in the war raging in Europe. Many believed such a campaign was successful to some extent. It is safe to say that, ever since then, the use of census data and maps in public affairs, from political campaigns to social media disinformation to foreign meddling in our elections, has shown anything but a lack of imagination.

“20th Century Through Maps” Courtesy of British Library (permission pending)

Arguably the darkest moment of the U.S. census also came during the Second World War, when census information used by the U.S. government directly led to the internment of Japanese Americans after the attack on Pearl Harbor. So is it really provocative to ask whether census data, mapped or not, will put us in danger? Will history repeat itself in the 21st century? Will another ethnic group become the victim? Although the 2020 census ultimately did not include a citizenship question, this is no time to celebrate. Instead, we should ask whether such questions will ever come back, and in what form. These issues may be beyond the scope of the census, but the census has been the vehicle that carries them.

Also from the spatial perspective, it is well known that census geographies are designed in a hierarchical fashion: blocks are the smallest spatial units, and from there we can aggregate to larger units such as block groups, tracts, and counties. Census tracts have been considered relatively stable units for statistical analysis because, by design, they aim for an ideal population of 4,000. But why should space be delineated in such a fixed and perhaps artificial way? What if we could rearrange the blocks and come up with different kinds of units that are compatible with the official tracts? This is a notoriously difficult task because there is an astronomical number of ways to recombine the blocks. But if we test some algorithms on a manageable number of units, we can see how the world could be different. The two figures below show the result of such an exercise. It is clear that we can actually achieve a better set of spatial units in which the population is more evenly distributed and more closely centered on the ideal size. Also, the new aggregated units show no significant spatial autocorrelation, which makes them more suitable for statistical analysis.

Population of the 284 census tracts in Franklin County, Ohio.

Population of the 284 new units that are aggregated using the 887 census block groups in Franklin County, Ohio.
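To give a flavor of what such an exercise involves, here is a minimal sketch in Python. It is not the algorithm used to produce the figures above. It swaps in a much simpler greedy heuristic that repeatedly merges the least-populated region into its least-populated neighbor, and it runs on a made-up 4 x 4 grid of units with invented populations. The function names (grid_neighbors, greedy_aggregate, morans_i) are my own, and Moran's I is computed with simple binary adjacency weights as one common way to measure spatial autocorrelation.

```python
from statistics import pstdev


def grid_neighbors(n_rows, n_cols):
    """Rook (edge-sharing) adjacency for a toy grid, units numbered row by row."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbrs[i] = set()
            if r > 0:
                nbrs[i].add(i - n_cols)
            if r < n_rows - 1:
                nbrs[i].add(i + n_cols)
            if c > 0:
                nbrs[i].add(i - 1)
            if c < n_cols - 1:
                nbrs[i].add(i + 1)
    return nbrs


def greedy_aggregate(pop, nbrs, n_regions):
    """Merge the least-populated region into its least-populated neighbor
    until n_regions remain. Assumes the adjacency graph is connected."""
    members = {i: {i} for i in pop}              # region id -> unit ids
    rpop = dict(pop)                             # region id -> population
    radj = {i: set(n) for i, n in nbrs.items()}  # region id -> adjacent regions
    while len(members) > n_regions:
        a = min(rpop, key=lambda k: (rpop[k], k))
        b = min(radj[a], key=lambda k: (rpop[k], k))
        members[b] |= members.pop(a)
        rpop[b] += rpop.pop(a)
        # Neighbors of the absorbed region become neighbors of the merged one.
        for k in radj.pop(a):
            radj[k].discard(a)
            if k != b:
                radj[k].add(b)
                radj[b].add(k)
    return members, rpop, radj


def morans_i(values, adj):
    """Global Moran's I with binary adjacency weights."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {i: v - mean for i, v in values.items()}
    cross = sum(dev[i] * dev[j] for i in values for j in adj[i])
    w_total = sum(len(adj[i]) for i in values)
    return (n / w_total) * cross / sum(d * d for d in dev.values())


# Invented populations for a 4 x 4 grid of units (illustrative only).
toy_pop = dict(enumerate([1200, 800, 1500, 600, 900, 1100, 700, 1300,
                          1000, 500, 1400, 950, 1250, 850, 650, 1050]))
adj = grid_neighbors(4, 4)
members, region_pop, region_adj = greedy_aggregate(toy_pop, adj, 4)

print("Regional populations:", sorted(region_pop.values()))
print("Spread (std. dev.):", round(pstdev(list(region_pop.values())), 1))
print("Moran's I:", round(morans_i(region_pop, region_adj), 3))
```

A greedy rule like this is fast but makes no promise of finding the most balanced arrangement, which is exactly why the full problem is so hard. A real study would work from actual block or block group boundaries and would typically search many candidate arrangements rather than settle for a single pass.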

Issues related to spatial units are not new; they have been discussed in statistics and geography for more than half a century. Computational advances have made it possible to explore new and different approaches to spatial organization. The question is: how can we embrace such a new way of thinking about these statistical units? Should we even go down this rabbit hole, where things will be constantly changing?

We can certainly read the history of the use of census data through different lenses. But, however we read it, we will find both bright and dark sides that are full of conflicts, betrayal, conspiracy, struggle, and promises. The world envies the richness of the census data available in the United States, which date back to the nation’s earliest days. From this perspective, I personally see more promise than anything else, as the new century should be the time for us, the research community as well as the general public, to re-imagine what census data could be.

Ningchuan Xiao, Professor

Department of Geography

The Ohio State University