Synthetic Data Generation for Small Area Estimation with Application to Large-Scale Surveys


Session: Confidentiality and Related Topics in SAE

Author: Joseph Sakshaug (University of Manchester, UK and Institute for Employment Research, Germany)
Abstract:

Small area statistics are an important source of information for studying local trends in social, health, and economic phenomena. However, most large-scale sample surveys that collect rigorous measures of these phenomena are not designed to produce reliable small area estimates. A further complication is that data disseminators are typically prohibited from releasing small area identifiers in public-use survey data sets because of disclosure risk concerns. In this presentation, I will examine a method of generating synthetic microdata that permits detailed geographical information to be released in public-use data files. The method is based on a hierarchical Bayesian model that accounts for multiple levels of geography and complex sample design features (e.g., stratification, clustering). The model is used to simulate multiple, fully synthetic versions of the observed data. Inferences from these synthetic data files are then obtained using standard combining rules. The method is demonstrated on two large-scale national surveys for which small area estimates are desired: the National Health Interview Survey and the American Community Survey. The analytic properties of the resulting small area inferences are evaluated through direct comparison with the observed data, simulations, and a cross-validation study.
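
The abstract does not spell out the combining rules. As a minimal sketch, assuming they are the rules commonly used for fully synthetic data (Raghunathan, Reiter, and Rubin, 2003), a point estimate and its variance could be pooled across m synthetic data files as shown here. The function name and the example numbers are hypothetical and are not taken from the presentation.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Pool one estimate across m fully synthetic data files.

    Assumption: the combining rules for fully synthetic data
    (Raghunathan, Reiter, and Rubin, 2003). The abstract only says
    "standard combining rules", so this choice is illustrative.

    estimates: point estimate of the same quantity from each file
    variances: the corresponding within-file variance estimates
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two files and one variance per estimate")

    q_bar = mean(estimates)      # pooled point estimate
    b_m = variance(estimates)    # between-file variance (divisor m - 1)
    u_bar = mean(variances)      # average within-file variance

    # Variance estimate for fully synthetic data. Note the minus sign:
    # this quantity can come out negative when m is small.
    t_f = (1 + 1 / m) * b_m - u_bar
    return q_bar, t_f


# Hypothetical estimates of one small area mean from three synthetic files.
q_bar, t_f = combine_fully_synthetic([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

The subtraction of the average within-file variance is what distinguishes these rules from the usual multiple imputation rules for missing data, where it is added. The sketch returns the raw variance estimate and leaves any adjustment for negative values to the analyst.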



Download presentation: Sakshaug_SAE2017_Paris