Although we strive to obtain all demographic and housing data from every individual in the census, missing data are part of every census process. Fortunately, we have long-established procedures we’ve used in previous censuses and surveys to fill in these missing pieces.
As this blog explains, the process is complex, but it reflects the extensive, standard statistical methodology we use to account for missing or conflicting data.
We collected demographic and housing data in a few ways:
Once we’ve collected all the data we can, we use statistical techniques, such as edits and characteristic imputation, for the small number of missing, invalid or inconsistent housing or demographic characteristics. With editing, we compare an individual’s responses to those of other household members or the overall group quarters to look for invalid or inconsistent information. With characteristic imputation, we fill in missing information by using a combination of sources, including other information in that individual’s or other family members’ census responses, responses from that individual or family member from another census or survey, or other existing records or information from similar nearby neighbors. More information is available in our blog: How We Complete the Census When Households or Group Quarters Don’t Respond.
It is important to note that edits and characteristic imputation occur after total population counts are finalized — these processes do not affect the number of people counted in the 2020 Census. Also keep in mind that the Census Bureau receives administrative records from many sources, but the data we collect cannot and will not be shared with anyone else, including other government agencies, or used for anything other than statistical purposes. Statistical purposes never include identifying respondents or the data they provided.
Why do we impute missing demographic and housing data? It is a long-standing practice here and an established process at most statistical agencies around the world. The Census Bureau has used characteristic imputation since the 1960 Census to ensure that each person and housing unit has demographic and housing data for every item on the census questionnaire. Imputation has been shown to improve data quality and accuracy compared to leaving these fields blank.
The Public Law (P.L.) 94-171 Redistricting Data Summary File, which will be released by August 16, is the first detailed data file from the 2020 Census. The 2020 Census data are used to distribute hundreds of billions of dollars in federal funding to state and local governments and communities across the country, and states typically use them to redraw congressional and state legislative districts.
Edits and characteristic imputation are part of our quality control and assurance measures and can be divided into three general stages — edits, assignment and allocation.
In the edit phase, we take the responses people reported and run a series of checks:
The ideal situation is that every person counted in the 2020 Census fills out their census questionnaire completely and provides valid and consistent responses. When they do, we call it “as reported” and these responses do not get assigned or allocated.
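To make the edit stage concrete, here is a minimal sketch of the kinds of checks described above: each person's answers are checked for validity, then compared with another household member's answers for consistency, and anything that survives is marked "as reported." The age bound, the allowed sex codes, the parent-child age gap, and the `Person`/`edit_household` names are all hypothetical, chosen for illustration; they are not the Census Bureau's actual edit rules.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical edit rules for illustration only; these are not the
# Census Bureau's actual bounds or codes.
MAX_AGE = 115
VALID_SEX = {"M", "F"}
MIN_PARENT_GAP = 12


@dataclass
class Person:
    person_id: str
    age: Optional[int] = None
    sex: Optional[str] = None
    parent_id: Optional[str] = None  # id of a parent in the same household, if any
    status: dict = field(default_factory=dict)  # item name -> edit outcome


def edit_household(members: list) -> None:
    """Blank out invalid or inconsistent items so later stages can fill them."""
    by_id = {p.person_id: p for p in members}

    # 1. Validity checks on each person's own answers.
    for p in members:
        if p.age is not None and not 0 <= p.age <= MAX_AGE:
            p.age, p.status["age"] = None, "failed edit"
        if p.sex is not None and p.sex not in VALID_SEX:
            p.sex, p.status["sex"] = None, "failed edit"

    # 2. Consistency check against another household member: a parent should
    #    be at least MIN_PARENT_GAP years older. For simplicity this sketch
    #    blanks the child's age rather than deciding which answer is wrong.
    for p in members:
        parent = by_id.get(p.parent_id)
        if parent is not None and parent.age is not None and p.age is not None:
            if parent.age - p.age < MIN_PARENT_GAP:
                p.age, p.status["age"] = None, "failed edit"

    # 3. Any item that survived the checks is kept "as reported".
    for p in members:
        for item in ("age", "sex"):
            if getattr(p, item) is not None:
                p.status.setdefault(item, "as reported")


# Example household with one clean record and two that fail edits.
household = [
    Person("p1", age=40, sex="F"),
    Person("p2", age=38, sex="X", parent_id="p1"),  # invalid code, implausible age gap
    Person("p3", age=200, sex="M", parent_id="p1"),  # out-of-range age
]
edit_household(household)
```

Items blanked here are not discarded for good; in the process the blog describes, they become missing values that the assignment and allocation stages then try to fill.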
Assignment occurs when missing responses can be determined based on other information provided for that same person.
In the 2010 Census, we used responses from the 2000 Census to fill in missing Hispanic origin and race information. A major improvement for the 2020 Census is the expanded use of administrative records to assign demographic and housing characteristics during characteristic imputation.
In 2020, we used 2010 Census responses to fill in missing values for sex, age, Hispanic origin and race. Plus, we used information from the American Community Survey, Social Security Administration (such as records from Social Security card applications), other federal administrative records, and commercial housing tax and deed information to assign missing characteristics.
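Assignment only ever draws on information about the same person or housing unit. A minimal sketch of that idea, assuming a simple fixed priority order across the sources listed above, might look like the following. The source keys, their ranking, and the `assign_item` helper are hypothetical; the blog names the sources but not how they are ranked or linked.

```python
from typing import Optional

# Hypothetical source keys and priority order: the blog names these sources
# but does not say how they are ranked against each other.
SOURCE_PRIORITY = ["census_2010", "acs", "ssa", "other_federal", "commercial_housing"]


def assign_item(reported: Optional[str], records: dict, item: str) -> tuple:
    """Fill one missing item for one person from that person's own records.

    `records` maps a source key to the values found for this person or
    housing unit in that source, e.g. {"ssa": {"sex": "F"}}.
    """
    if reported is not None:
        # A valid self-response always wins; nothing is assigned.
        return reported, "as reported"
    for source in SOURCE_PRIORITY:
        value = records.get(source, {}).get(item)
        if value is not None:
            return value, f"assigned from {source}"
    # Nothing linked to this person: leave it for the allocation stage.
    return None, "needs allocation"


# Example: sex is missing on the form but present in a linked SSA record.
person_records = {
    "census_2010": {"hispanic_origin": "Not Hispanic"},
    "ssa": {"sex": "F"},
}
print(assign_item(None, person_records, "sex"))     # ('F', 'assigned from ssa')
print(assign_item("M", person_records, "sex"))      # ('M', 'as reported')
print(assign_item(None, person_records, "tenure"))  # (None, 'needs allocation')
```

A production system would also have to reconcile values across sources (for example, adjusting an age taken from a 2010 record for the time elapsed since then); this sketch leaves that out.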
Below are specific examples of how we assign each of the key demographic and housing characteristics collected in the 2020 Census:
We turn to allocation when we can’t determine missing responses from other information provided for that same person living in a household or group quarters. The primary method of allocation is to use information from similar nearby households. We use allocation to determine responses for:
There’s one other allocation method we can use for individuals missing Hispanic origin or race before looking at data from nearby neighbors. For people living in households, we fill it in using information from other household members. For example, we use information from a parent if they report their race but do not provide it for their child.
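The two allocation routes above can be sketched together: first look for a value from another member of the same household, and only if none exists, borrow one from the nearest similar household. This is a simplified nearest-donor sketch under stated assumptions: households are assumed to carry a hypothetical `position` along a geographic sort order, "similar" is approximated by a hypothetical `size_group` field, and the race codes "A", "B", "C" are placeholders. None of these reflect the Census Bureau's actual matching variables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    relationship: str            # e.g. "householder", "child" (hypothetical codes)
    race: Optional[str] = None   # placeholder codes such as "A", "B", "C"


@dataclass
class Household:
    position: int                # hypothetical order along a geographic sort
    size_group: str              # hypothetical variable defining "similar" households
    members: list


def allocate_race(person: Person, home: Household, households: list) -> tuple:
    """Return (value, status) for a person's race using a two-step allocation."""
    if person.race is not None:
        return person.race, "as reported"

    # Step 1: another member of the same household (e.g. a parent for a child).
    # This sketch simply takes the first member, in listed order, with a value.
    for other in home.members:
        if other is not person and other.race is not None:
            return other.race, "allocated from household member"

    # Step 2: the nearest similar household that has a reported value.
    donors = [h for h in households if h is not home and h.size_group == home.size_group]
    donors.sort(key=lambda h: abs(h.position - home.position))
    for donor in donors:
        for member in donor.members:
            if member.race is not None:
                return member.race, "allocated from nearby household"
    return None, "unresolved"


households = [
    Household(0, "small", [Person("householder", "A")]),
    Household(1, "large", [Person("householder", "B")]),
    Household(2, "small", [Person("householder"), Person("child")]),
    Household(5, "small", [Person("householder", "C")]),
    Household(6, "small", [Person("householder", "C"), Person("child")]),
]
```

For the child in the household at position 6, the parent's value is used directly; for the household at position 2, where no one reported, the value comes from the closest household in the same size group (position 0). Restricting donors to similar households is what keeps a borrowed value plausible for the household receiving it.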
Once all the data have been processed, missing data imputed, and internal quality checks completed, the next step is to apply differential privacy to prevent unauthorized disclosure of confidential data. Upcoming blogs will provide more details about these files and products.
We plan to provide characteristic imputation rates by key demographic and housing items in 2022.
We emphasize that characteristic imputation is carried out only long after all data collection has ended, once every attempt to obtain a response has been exhausted.
We prefer that information about a household or group quarters facility come directly from the household or from the people at the facility who report their demographic data. When they do not respond, characteristic imputation helps us deliver more complete and accurate statistics and statistical products.