Face-to-face multistage cluster probability surveys are the gold standard for obtaining reliable information at the national level. However, most policy decisions are made at the local level, such as states, counties, or health service areas. In national surveys, sampling procedures such as probability proportional to size selection leave most local areas with no sampled data. To compensate for the missing information, various sources of auxiliary data that provide related information at the local level are used in SAE modeling. In recent years, more and more data have also become available from alternative sources such as nonprobability surveys, telephone surveys, web-panel surveys, social media data, and administrative data. For example, the National Health Interview Survey (NHIS), a face-to-face survey, collects information about diabetes; similar information is collected by the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey; and diabetes information can also be extracted from prescription data. To produce reliable information at the county level, these data sources can be pooled. For example, an outcome variable from one survey can be used as the auxiliary variable in the SAE model for a related outcome measure from another survey, or similar outcome variables from two or more data sources can be concatenated (stacked) to improve coverage across small areas. These efforts require the development of advanced SAE techniques that properly account for all or most of the intricacies associated with each data source (e.g., mode effects, measurement errors in covariates, and coverage issues) in the SAE modeling process.
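As a rough illustration of the first pooling strategy, the sketch below computes an area-level composite (shrinkage) estimate in which the auxiliary variable is itself a survey estimate subject to sampling error. It is a minimal sketch under strong assumptions, not a method taken from any of the surveys named above: a single covariate with an intercept, known regression coefficients and model variance (in practice these would be estimated), and purely hypothetical numbers. The function name `composite_estimates` and all of its arguments are illustrative.

```python
import numpy as np


def composite_estimates(y, psi, x_hat, c, beta, sigma2_v):
    """Area-level composite estimates with an error-prone auxiliary variable.

    y        : direct estimates from the primary survey, one per area
    psi      : sampling variances of the direct estimates
    x_hat    : auxiliary estimates from a second survey, one per area
    c        : mean squared errors of the auxiliary estimates
    beta     : (intercept, slope), assumed known for this sketch
    sigma2_v : variance of the area-level random effect, assumed known
    """
    y = np.asarray(y, dtype=float)
    psi = np.asarray(psi, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    c = np.asarray(c, dtype=float)
    b0, b1 = beta

    # Regression-synthetic prediction based on the auxiliary estimate.
    synthetic = b0 + b1 * x_hat

    # Uncertainty attached to the synthetic part: model variance plus the
    # error carried over from the auxiliary estimate (slope^2 * MSE).
    synthetic_var = sigma2_v + b1 ** 2 * c

    # Weight on the direct estimate; it grows as the synthetic part becomes
    # noisier or as the direct estimate becomes more precise.
    gamma = synthetic_var / (synthetic_var + psi)

    return gamma * y + (1.0 - gamma) * synthetic, gamma


if __name__ == "__main__":
    # Hypothetical prevalence estimates for three small areas.
    est, gamma = composite_estimates(
        y=[0.10, 0.12, 0.08],
        psi=[0.0004, 0.0001, 0.0009],
        x_hat=[0.11, 0.13, 0.09],
        c=[0.0001, 0.0001, 0.0001],
        beta=(0.0, 1.0),
        sigma2_v=0.0002,
    )
    for area, (e, g) in enumerate(zip(est, gamma), start=1):
        print(f"area {area}: composite = {e:.4f}, weight on direct = {g:.3f}")
```

The design point is that the weight on the direct estimate increases when the auxiliary estimate is noisy, so an imprecise covariate from a second survey cannot pull the composite estimate arbitrarily far from the primary survey's own data.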
In the past two decades, the availability of high-performance computing resources has enabled major advances in the field of SAE, and several efforts have been made to produce small area estimates by combining data across multiple sources. For example, Ybarra and Lohr (2008) and Lohr and Prasad (2003) developed SAE models in which the auxiliary data were obtained from other surveys; Raghunathan et al. (2007) developed a hierarchical Bayes approach that combined the outcome measures from BRFSS and NHIS and incorporated nonresponse and noncoverage errors in the BRFSS, as well as the complex sample design features of both surveys, in the modeling; and Kim et al. (2015) used a system of structural error models to combine small area estimates obtained from several sources. Merkouris (2010), Manzi (2011), and Kim and Rao (2012) also developed SAE models for combining data from multiple sources. When both sides of an SAE model (the outcome variables and the auxiliary variables) involve data from multiple sources that are subject to various sampling and nonsampling errors, and the postulated model is a unit-level nonlinear model, fitting such a model becomes challenging. In this era of big data and advanced computing techniques, more research and resources are clearly needed to use as much of the available information as possible at the small area level of interest and to produce reliable small area estimates.