Small Area Estimation with Data from Multiple Sources

Chair: Akhil K. Vaish

Face-to-face multistage cluster probability surveys are the gold standard for obtaining reliable information at the national level. However, most policy decisions are made at the local level, such as states, counties, or health service areas. Because national surveys rely on sampling procedures such as probability-proportional-to-size selection, no sample data are available for most local areas. To compensate for this missing information, small area estimation (SAE) models use various sources of auxiliary data that provide related information at the local level. In recent years, more and more data have also become available from alternative sources such as nonprobability surveys, telephone surveys, web-panel surveys, social media data, and administrative data. For example, the National Health Interview Survey (NHIS), a face-to-face survey, collects information about diabetes; similar information is collected by the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey; and diabetes information can also be extracted from prescription data. To produce reliable county-level information, these data sources can be pooled: a similar outcome variable from one survey can be used as an auxiliary variable in the SAE model for a related outcome measure, or similar outcome variables from two or more data sources can be concatenated (stacked) to improve coverage across small areas. These efforts require advanced SAE techniques that properly account for all or most of the intricacies associated with each data source (e.g., mode effects, measurement error in covariates, and coverage issues).
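As a minimal sketch of the first pooling strategy, the Python function below fits a standard area-level Fay-Herriot model in which a column of the covariate matrix may hold the direct estimate of a similar outcome from a second survey (for instance, a BRFSS county estimate used as an auxiliary variable for a related NHIS-type outcome). It assumes the sampling variances `psi` are known, estimates the model variance with the Prasad-Rao moment estimator, and treats the auxiliary survey estimate as error-free. The function name and interface are hypothetical, and ignoring the covariate's own sampling error is precisely the simplification that measurement-error approaches such as Ybarra and Lohr (2008) are designed to avoid.

```python
import numpy as np


def fay_herriot_eblup(y, X, psi):
    """Sketch of a Fay-Herriot EBLUP for m small areas (hypothetical helper).

    y   : (m,) direct estimates from survey A
    X   : (m, p) area-level covariates; one column may be the direct estimate
          of a similar outcome from survey B, treated here as fixed (its own
          sampling error is ignored in this sketch)
    psi : (m,) known sampling variances of y
    Returns (EBLUPs, estimated model variance, GLS coefficients).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    m, p = X.shape

    # Prasad-Rao moment estimator of sigma_v^2, based on OLS residuals and leverages
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    resid = y - X @ beta_ols
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # h_ii = x_i' (X'X)^{-1} x_i
    sigma_v2 = max(0.0, (resid @ resid - np.sum(psi * (1.0 - h))) / (m - p))

    # GLS estimate of beta with weights 1 / (sigma_v^2 + psi_i)
    w = 1.0 / (sigma_v2 + psi)
    Xw = X * w[:, None]
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)

    # Shrinkage factor gamma_i = sigma_v^2 / (sigma_v^2 + psi_i)
    gamma = sigma_v2 * w
    synthetic = X @ beta
    return gamma * y + (1.0 - gamma) * synthetic, sigma_v2, beta
```

Each estimate is a convex combination of the direct estimate and the regression-synthetic estimate, so areas with noisy direct estimates (large `psi`) lean more heavily on the auxiliary information borrowed from the other survey.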
Over the past two decades, the availability of high-performance computing has driven major advances in SAE, including several efforts to produce small area estimates by combining data from multiple sources. For example, Ybarra and Lohr (2008) and Lohr and Prasad (2003) developed SAE models for auxiliary data obtained from other surveys; Raghunathan et al. (2007) developed a hierarchical Bayes approach that combined outcome measures from the BRFSS and NHIS and incorporated nonresponse and noncoverage errors in the BRFSS, as well as the complex sample design features of both surveys; and Kim et al. (2015) used a system of structural error models to combine small area estimates obtained from several sources. Merkouris (2010), Manzi (2011), and Kim and Rao (2012) also developed SAE models for combining data from multiple sources. Fitting an SAE model becomes challenging when both sides of the model (outcome variables and auxiliary variables) involve data from multiple sources that are subject to various sampling and nonsampling errors, and when the postulated model is a nonlinear unit-level model. In this era of big data and advanced computing, more research and resources are needed to use as much of the available information as possible for the small areas of interest and to produce reliable small area estimates.


Estimation of small area means using area-level and unit-level covariates based on multiple surveys

Unit-level models are extensively used in small area estimation. These models incorporate both unit-level and area-level covariates to accurately estimate the finite population means of small areas. To borrow information from unit-level covariates that are available only for the sampled units, we propose a multivariate adaptation of the nested error regression model. Information on the […]
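For reference, the univariate nested error regression model of Battese, Harter and Fuller (1988), y_ij = x_ij' beta + u_i + e_ij, which this proposal adapts, can be sketched as follows. This is not the authors' multivariate method: it is a minimal Python sketch that treats the variance components `sigma_u2` and `sigma_e2` as known (in practice they would be estimated, for example by REML), assumes the population covariate means of every area are known, and ignores the finite-population correction (i.e., assumes each area's population is large relative to its sample). The function name is hypothetical.

```python
import numpy as np


def ner_eblup_means(y, X, area, Xbar_pop, sigma_u2, sigma_e2):
    """Sketch of the EBLUP of small area means under the nested error model
    y_ij = x_ij' beta + u_i + e_ij, with sigma_u2 and sigma_e2 assumed known.

    y        : (n,) sampled responses
    X        : (n, p) unit-level covariates of the sampled units
    area     : (n,) integer area labels in 0..m-1
    Xbar_pop : (m, p) known population means of the covariates
    Returns (EBLUPs of the m area means, GLS coefficients).
    """
    y, X, area = np.asarray(y, float), np.asarray(X, float), np.asarray(area)
    Xbar_pop = np.asarray(Xbar_pop, float)
    m, p = Xbar_pop.shape

    A = np.zeros((p, p))
    b = np.zeros(p)
    gamma = np.zeros(m)
    ybar = np.zeros(m)
    xbar = np.zeros((m, p))
    for i in range(m):
        idx = area == i
        n_i = idx.sum()
        if n_i == 0:
            continue  # non-sampled area: gamma_i stays 0, giving a synthetic estimate
        Xi, yi = X[idx], y[idx]
        gamma[i] = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
        ybar[i], xbar[i] = yi.mean(), Xi.mean(axis=0)
        # X_i' V_i^{-1} X_i and X_i' V_i^{-1} y_i, up to the common factor 1/sigma_e2
        A += Xi.T @ Xi - gamma[i] * n_i * np.outer(xbar[i], xbar[i])
        b += Xi.T @ yi - gamma[i] * n_i * xbar[i] * ybar[i]
    beta = np.linalg.solve(A, b)  # GLS estimate; the 1/sigma_e2 factor cancels

    # Regression-synthetic mean plus the shrunken area-level residual
    return Xbar_pop @ beta + gamma * (ybar - xbar @ beta), beta
```

The shrinkage factor gamma_i grows with the area sample size n_i, so well-sampled areas rely more on their own data, while non-sampled areas fall back entirely on the regression-synthetic estimate built from the population covariate means.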

Small Area Estimation by Combining Information from Multiple Data Sources on Correlated Variables at Different Levels of Aggregation

Demand for small area estimates, which are useful for local policy evaluation and implementation, is ever increasing. At the same time, growing concerns about privacy and confidentiality are preventing agencies from providing data at the desired level of geography. This paper develops procedures for combining information from multiple data sources that provide data at different levels of aggregation […]

Small Area Estimation by Mass Imputation: Combining Information from Two Independent Surveys

Combining information from two independent surveys with similar measurements is a promising area of research in small area estimation. To incorporate the survey-specific effect, we use a random effect model at the population level. The sampling design can be informative in the sense that the sample distribution can be different from that of […]

This session was organised by Akhil K. Vaish.