2020_AMC_Replicates

This landing page contains data files derived from the 2020 Census, which can be used to apply the Approximate Monte Carlo Method (AMC method) to construct estimates of uncertainty introduced by the 2020 Census Top Down Algorithm (TDA) as part of the 2020 Disclosure Avoidance System. The AMC approach was designed by U.S. Census Bureau research staff to estimate the amount of uncertainty introduced by the TDA, the formally private mechanism used to protect the confidentiality of individuals' census responses in the 2020 Census Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File, and the Demographic Profile.

The AMC method was inspired by traditional Monte Carlo methods and works as follows. First, a Privacy-Protected Microdata File (PPMF) is generated by executing the TDA with the confidential Census Edited File (CEF) as input. The official location for this PPMF is the United States Census Bureau FTP server, but we also include a copy of it here for convenience and refer to it as the “PPMF0”. Next, U.S. Census Bureau staff generated a large number of replicates by executing the TDA repeatedly, with each run treating the PPMF0 as the ground truth; that is, the PPMF0 was substituted for the CEF as the TDA's input, and the TDA was run repeatedly in this mode. This produced a series of replicates, PPMFi, for i=1,2,...,50, where 50 was a value determined empirically to be reasonable, as discussed in the AMC paper. This README is hosted in the root directory of the external location that holds the 2020-derived PPMF0 and all of the PPMFi files. Comparisons between the PPMF0 and each PPMFi, and the variability in these comparisons, can be used to construct estimates of uncertainty, including intervals that behave like traditional confidence intervals. The AMC paper also discusses appropriate use of the AMC method, using the 2010 Census to examine when the method works well and when its results should be interpreted cautiously.
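The comparison idea above can be made concrete for a single tabulated statistic (for example, the population of one block group). The sketch below is illustrative only and assumes you have already tabulated the statistic from the PPMF0 and from each PPMFi. It summarizes the replicate-minus-PPMF0 differences by their root-mean-square and forms a basic-bootstrap-style interval by reflecting the empirical quantiles of those differences around the PPMF0 value. The function names are hypothetical, and the AMC paper's recommended interval construction may differ from this one.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation (NumPy's default rule); 0 <= q <= 1."""
    if not values:
        raise ValueError("need at least one value")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def amc_style_summary(
    t0: float, replicates: Sequence[float], alpha: float = 0.10
) -> Tuple[float, Tuple[float, float]]:
    """Hypothetical helper, not an official Census Bureau AMC implementation.

    t0:         the statistic tabulated from PPMF0 (treated as "truth" for the replicates)
    replicates: the same statistic tabulated from each PPMFi, i = 1..50
    Returns the RMS replicate error and an approximate (1 - alpha) interval around t0.
    """
    # Replicate error relative to PPMF0, which stands in for the unobservable CEF error.
    diffs = [ti - t0 for ti in replicates]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Basic-bootstrap-style interval: reflect the error quantiles around t0.
    lower = t0 - percentile(diffs, 1 - alpha / 2)
    upper = t0 - percentile(diffs, alpha / 2)
    return rmse, (lower, upper)
```

For instance, with `t0 = 100`, replicate values 98, 99, 100, 101, and 102, and `alpha = 0.5`, the function returns an RMS error of sqrt(2) and the interval (99.0, 101.0). With only 50 replicates, the extreme tails of the error distribution are poorly resolved, so narrow tail probabilities (very small `alpha`) deserve extra caution.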

The original 2020 PPMF, which serves here as the PPMF0, was previously released on the U.S. Census Bureau FTP server on 2024-08-05 at this location. The PPMF person and housing unit files include the geocodes, down to the census block level, used for the published 2020 Census data products referenced above. For the file layout and descriptions of each field, see the 2020 Census Privacy-Protected Microdata File Technical Documentation.

The 50 PPMFi replicate files have a different layout than the 2020 PPMF (PPMF0) file; most importantly, they do not include the additional geocode information contained in the PPMF. The schema information (variable/column names, and valid levels for each column) for each of the PPMF replicates is very similar to the schema information for the 2010 PPMF previously released on the U.S. Census Bureau FTP server on 2023-04-03 at this location, which has a schema file located in the same directory. The 2020 replicate file layout is provided in this S3 bucket as amc_replicates_2024-09-30_record_layout.pdf. This schema includes variables sufficient for calculating all tabulations published in the 2020 Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File, and the Demographic Profile.

Users should note that tabulations on which very little privacy-loss budget (PLB) was expended will generally have larger amounts of statistical noise added to them, and should not be expected to mirror the original CEF-based counts as closely as tabulations that are allocated more PLB. The privacy-loss budget for the 2020 Disclosure Avoidance System (DAS) can be found here. A variety of other factors also affect accuracy, such as the geographic entity type and the size of the population being measured.

Note on how to use the PPMF0 file to apply geographic information to the replicates

The 2020 PPMF includes all the geocodes necessary to produce tabulations for the same geographic areas included in the published 2020 Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File, and the Demographic Profile. For both person and housing unit records, geocodes are provided down to the census block level, the lowest geography for which 2020 Census tables can be produced. In many cases, this allows data to be tabulated for levels of geography not available in the tabular 2020 Census products, including custom geographic areas.

Note that unlike the PPMF0, the 50 PPMFi replicate files do not include the full range of geocodes. We omitted them to:

  • Maintain consistency with the 2010 PPMFi replicate files, which were released on AWS Open Data in April 2024,

  • Reduce the space required to store the files; the uncompressed 2020 PPMF0 file size is 168 GB, while the file size for an uncompressed replicate file is 27 GB,

  • Make downloading the files easier and faster.

The file layout and descriptions of the geographic fields can be found in the 2020 Census Privacy-Protected Microdata File Technical Documentation, starting on page 3-10.

Data users can use the geographic information in the PPMF0 to create their own block-level crosswalk. To do this, follow the steps below.

  1. Using the PPMF0 Unit File, create a new data file containing only the geographic columns you want to apply to the PPMFi replicate files.

    • The fields TABBLKST, TABBLKCOU, TABTRACTCE, and TABBLK must be included among the selected geographies.

    • Do not include non-geographic columns; they would leave multiple records for each block even after duplicate rows are removed.

    • Do use the PPMF0 Unit File rather than the Person File; the PPMF0 Person File does not contain blocks that contained only vacant housing units.

  2. Remove duplicate rows so there is only one row for each unique TABBLKST, TABBLKCOU, TABTRACTCE, and TABBLK combination.
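The two steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a Census Bureau tool: it assumes the PPMF0 Unit File is a delimited text file with a header row whose column names match the technical documentation, and the function names are hypothetical. It keeps the four required block fields plus any extra geographic columns you choose, de-duplicates on the block key, and raises an error if a chosen column takes different values within one block, which is the symptom of accidentally including a non-geographic column.

```python
import csv
from typing import Dict, Iterable, List, Sequence, Tuple

# Block identifier fields that must be included in the crosswalk.
BLOCK_KEY = ("TABBLKST", "TABBLKCOU", "TABTRACTCE", "TABBLK")


def crosswalk_columns(extra_geo_cols: Sequence[str]) -> List[str]:
    """Required block key first, then any additional geographic columns."""
    return list(BLOCK_KEY) + [c for c in extra_geo_cols if c not in BLOCK_KEY]


def build_block_crosswalk(
    unit_rows: Iterable[Dict[str, str]], extra_geo_cols: Sequence[str] = ()
) -> List[Dict[str, str]]:
    """Hypothetical helper: project unit records to geography and keep one row per block."""
    keep = crosswalk_columns(extra_geo_cols)
    by_block: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for row in unit_rows:
        projected = {c: row[c] for c in keep}            # step 1: select columns
        key = tuple(projected[c] for c in BLOCK_KEY)
        seen = by_block.setdefault(key, projected)        # step 2: de-duplicate
        if seen != projected:
            # Differing values within one block mean a selected column
            # is not purely geographic.
            raise ValueError(f"block {key} has conflicting values: {seen} vs {projected}")
    return list(by_block.values())


def crosswalk_from_csv(
    unit_path: str, out_path: str, extra_geo_cols: Sequence[str] = (), delimiter: str = ","
) -> int:
    """Stream the (assumed headered) PPMF0 Unit File and write the crosswalk; returns block count."""
    with open(unit_path, newline="") as f:
        crosswalk = build_block_crosswalk(csv.DictReader(f, delimiter=delimiter), extra_geo_cols)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=crosswalk_columns(extra_geo_cols))
        writer.writeheader()
        writer.writerows(crosswalk)
    return len(crosswalk)
```

For example, `crosswalk_from_csv("ppmf0_unit.csv", "block_crosswalk.csv", ["TABBLKGRPCE"])` would write a crosswalk that also carries a block group code. Both file names are placeholders, and TABBLKGRPCE is used only as an illustrative column name, so check it against the technical documentation. The crosswalk is held in memory as one dictionary entry per block; on smaller machines, a database or dataframe library may be a better fit.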

This block crosswalk should contain one record for each block that contained an occupied or vacant housing unit, or occupied group quarters, in the 2020 Census. It can then be joined with the PPMFi replicate files on the columns TABBLKST, TABBLKCOU, TABTRACTCE, and TABBLK. Every record in every PPMFi replicate file will have a match to this crosswalk.
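The join described above can likewise be sketched with the standard library. Again, this is a hypothetical helper rather than an official tool. It assumes the crosswalk and the replicate records are both read as strings (for example with `csv.DictReader`), so that zero-padded codes such as TABBLKST compare equal; reading them as integers in one file and strings in the other would break the match. Because every replicate record is expected to match, the sketch treats a missing block as an error instead of silently dropping the record.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Join key shared by the crosswalk and the PPMFi replicate files.
BLOCK_KEY = ("TABBLKST", "TABBLKCOU", "TABTRACTCE", "TABBLK")


def index_crosswalk(
    crosswalk_rows: Iterable[Dict[str, str]]
) -> Dict[Tuple[str, ...], Dict[str, str]]:
    """Map each block key to its crosswalk row; a repeated block means step 2 was skipped."""
    index: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for row in crosswalk_rows:
        key = tuple(row[c] for c in BLOCK_KEY)
        if key in index:
            raise ValueError(f"duplicate block in crosswalk: {key}")
        index[key] = row
    return index


def attach_geography(
    replicate_rows: Iterable[Dict[str, str]],
    crosswalk_index: Dict[Tuple[str, ...], Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: lazily join replicate records to the crosswalk on the block key."""
    for row in replicate_rows:
        key = tuple(row[c] for c in BLOCK_KEY)
        geo = crosswalk_index.get(key)
        if geo is None:
            # Every replicate record should match, so a miss signals a data problem.
            raise KeyError(f"no crosswalk entry for block {key}")
        merged = dict(row)
        merged.update(geo)  # adds the extra geographic columns; key values are identical
        yield merged
```

Because `attach_geography` is a generator, a 27 GB replicate file can be streamed record by record from `csv.DictReader` while only the crosswalk index is held in memory. If a replicate column shares a name with one of the extra crosswalk columns, the crosswalk value overwrites it, so rename such columns first.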