2010_AMC_Replicates

This landing page contains data files derived from the 2010 Decennial Census that can be used to apply the Approximate Monte Carlo (AMC) method to construct estimates of the uncertainty introduced by the 2020 Decennial Census Top Down Algorithm (TDA), part of the 2020 Decennial Census Disclosure Avoidance System. In the future, we expect to provide products based on the same AMC method, applied directly to the 2020 Decennial Census. The AMC approach was designed by United States Census Bureau research staff to estimate the amount of uncertainty introduced by the TDA, the formally private mechanism used to protect the confidentiality of individuals' census responses in the 2020 Census Redistricting (P.L. 94-171) and Demographic and Housing Characteristics (DHC) data products.

The AMC method was inspired by traditional Monte Carlo methods. It starts from a Privacy-Protected Microdata File (PPMF0), generated by executing the TDA with the confidential Census Edited File (CEF) as input, and then generates a large number of replicates by executing the TDA repeatedly, with each iteration treating PPMF0 as the ground truth. That is, PPMF0 is substituted for the CEF, and the TDA is run repeatedly in this mode. The files published in this location were produced by carrying out this procedure with the 2010 CEF in place of the 2020 CEF, as a demonstration. This generates a series of iterates, PPMFi, for i = 1, 2, ..., 25, where 25 was a value determined empirically to be reasonable, as discussed in the AMC paper. This README is located in the root directory of the external location that hosts each of these 2010-derived PPMF0 and PPMFi files. Comparisons between PPMF0 and each PPMFi, and the variability in these comparisons, can be used to construct estimates of uncertainty, including intervals that behave like traditional confidence intervals. The AMC paper also discusses appropriate use of the AMC method, using the 2010 Decennial Census to examine when the method works well and when its results should be interpreted cautiously.
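To make the replicate procedure concrete, here is a minimal Python sketch of the AMC idea on a toy problem. Everything in it is an assumption for illustration: `toy_mechanism` is a hypothetical stand-in for the TDA that simply adds rounded Gaussian noise to a list of counts and clips at zero (the real TDA is far more elaborate), the input counts are made up, and `amc_interval` is a simple normal-approximation band built from the root-mean-square deviation of the replicates around PPMF0, not necessarily the interval construction used in the AMC paper.

```python
import math
import random


def toy_mechanism(counts, sigma, rng):
    """HYPOTHETICAL stand-in for the TDA: add rounded Gaussian noise to each
    count and clip at zero. The real TDA is far more complex."""
    return [max(0, c + round(rng.gauss(0.0, sigma))) for c in counts]


def amc_replicates(ppmf0, mechanism, n_reps, rng):
    """Rerun the mechanism n_reps times, treating ppmf0 as the ground truth."""
    return [mechanism(ppmf0, rng) for _ in range(n_reps)]


def amc_interval(ppmf0, replicates, index, z=1.96):
    """ASSUMED interval rule: PPMF0 value +/- z * RMS deviation of replicates."""
    t0 = ppmf0[index]
    diffs = [rep[index] - t0 for rep in replicates]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return t0 - z * rmse, t0 + z * rmse, rmse


if __name__ == "__main__":
    rng = random.Random(2010)
    cef = [120, 3, 0, 57, 980]  # made-up "confidential" counts, 5 toy geographies

    def mech(counts, r):
        return toy_mechanism(counts, sigma=4.0, rng=r)

    ppmf0 = mech(cef, rng)                       # PPMF0 = M(CEF)
    reps = amc_replicates(ppmf0, mech, 25, rng)  # PPMFi = M(PPMF0), i = 1..25
    for g, true_count in enumerate(cef):
        lo, hi, rmse = amc_interval(ppmf0, reps, g)
        print(f"geo {g}: CEF={true_count} PPMF0={ppmf0[g]} "
              f"interval=({lo:.1f}, {hi:.1f}) rmse={rmse:.2f}")
```

In this toy, clipping at zero means that replicates of a PPMF0 count of 0 can only move upward, so a symmetric band fits poorly there; this is one illustration of why AMC-style intervals can call for cautious interpretation in some settings.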

The schema information (variable/column names and valid levels for each column) for each of the PPMF replicates is identical to that of the PPMF previously released on the United States Census Bureau FTP server on 2023-03-23 at this location, with an accompanying schema file, except for minor differences in vintage and release date. For convenience, we have also provided a version of this layout file, with the vintage and release date updated, in the present S3 bucket. This schema includes variables sufficient for calculating 2010 analogues to 2020 Decennial Census tabulations from either the 2020 Redistricting or the 2020 Demographic and Housing Characteristics File products. Users should note that tabulations on which very little privacy-loss budget was expended will generally have larger amounts of statistical noise added to them and should not be expected to closely mirror the original CEF.
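As a sketch of how such tabulations might be computed and compared across files, the Python below counts records per (state, county) in PPMF-style microdata and differences PPMF0 against one replicate. The column names `TABBLKST` and `TABBLKCOU`, the comma-separated format with a header row, and one record per person are all assumptions here; check them against the layout file in this bucket before relying on them.

```python
import csv
import io
from collections import Counter

# ASSUMED column names; confirm against the layout file provided in this bucket.
STATE_COL = "TABBLKST"
COUNTY_COL = "TABBLKCOU"


def tabulate_counties(rows):
    """Count records per (state, county) from an iterable of dict rows."""
    return Counter((row[STATE_COL], row[COUNTY_COL]) for row in rows)


def tabulate_csv_text(text):
    """Tabulate CSV text, ASSUMING a header row and comma delimiters."""
    return tabulate_counties(csv.DictReader(io.StringIO(text)))


def county_differences(counts0, counts_i):
    """Replicate minus PPMF0 count for every county seen in either file."""
    keys = set(counts0) | set(counts_i)
    return {k: counts_i.get(k, 0) - counts0.get(k, 0) for k in keys}
```

For real files, one would pass a `csv.DictReader` over each opened file to `tabulate_counties`, repeat this for every PPMFi, and then summarize the per-county differences across the 25 replicates.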

Errata Note: 2010 AMC Replicates

A recently discovered deviation in the TDA codebase used to generate the 2010 AMC replicate files may limit the utility of the 2010 PPMF0 and replicates for estimating valid confidence intervals in a subset of geographies. The forthcoming 2020 AMC replicate files are unaffected.

The underlying premise of the AMC approach to estimating confidence intervals for data protected by the 2020 TDA is to use the production codebase and parameters (i.e., the exact system code and settings used to produce the official 2020 Census Redistricting Data and DHC) to generate replicate files, using the production PPMF as the input instead of the confidential Census Edited File. In practice, however, it was not feasible for the Census Bureau to use the exact production codebase and platform for this purpose at the time the AMC research was being conducted. In the period between the production runs of the TDA for the 2020 Census and the subsequent research supporting the AMC method, there were numerous software updates, patches, system migrations, and code optimizations that improved processing efficiency and reduced operating costs by an order of magnitude. Unfortunately, amid these system changes, and in spite of our best efforts to ensure that the codebase used to produce the 2010 replicate files was functionally equivalent to the production codebase, a discrepancy in the spine optimization routines was introduced prior to the initial 2010 PPMF0 run of the AMC research.

This code deviation caused some of the geographic units in the “Prim” geographic level of the optimized geographic spine to be defined incorrectly in a subset of states.¹ Within the subset of states without functioning Minor Civil Divisions (MCDs), this resulted in artificially improved accuracy for DHC replicate tabulations for some non-incorporated places and artificially worse accuracy for the corresponding tabulations for some incorporated places. The deviation did not affect spine optimization or accuracy for geographies in MCD states, nor did it affect the accuracy of any tabulations that are estimated in the initial Redistricting Data File TDA evaluations, which are carried out before the subsequent DHC evaluations.² It should also be noted that the same version of the codebase was used to produce the 2010 PPMF0 and the twenty-five 2010 AMC replicates, which preserves the fundamental modeling assumption of the AMC method. Consequently, no rerun or rerelease of the 2010 PPMF0 and AMC replicates is planned: the limited geographic scope of the deviation’s impact and the consistency of the codebase across these 2010 runs still permit evaluation of the effectiveness of the AMC method for estimating confidence intervals.

This discrepancy did not impact the 2020 AMC replicates, since these runs reused the geographic unit definitions included in the 2020 production Noisy Measurement Files, thereby bypassing the implicated spine optimization routines.

The Census Bureau would like to thank Jan Vink of Cornell University for alerting us to the discrepancy and for assisting us in identifying its underlying cause.