Survey-weighted Unit-Level SAE


Poster Session

Poster no.06

Jan Pablo Burgard (University of Trier, Faculty IV – Economics, Economics and Social Statistics Department) (Speaker)
Patricia Dörr (University of Trier, Faculty IV – Economics, Economics and Social Statistics Department)

Official statistics surveys often aim to provide reliable estimates of nationwide population figures. They typically use complex survey designs to improve the efficiency of the national estimates. For regional estimates, however, the sample sizes within the regions may be too small to apply classical design-based estimators. In this case, small area estimation (SAE) may be applicable. The idea of SAE is to borrow strength across domains by imposing a model on the population, in many cases a regression model. A prominent example is the Battese-Harter-Fuller (BHF) estimator, which assumes a linear mixed model with an area-level random intercept. Under the correct model, the BHF is a consistent estimator if the survey has the self-weighting property. Otherwise, even under the correct model, the sample model does not correspond to the population model and the estimated parameters are not consistent. In turn, the BHF is biased in the small-sample situation when it uses inconsistent parameter estimates.
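For orientation, the unit-level random-intercept model behind the BHF estimator is commonly written as below. The notation (area index $i$, unit index $j$, shrinkage factor $\gamma_i$) is an assumption for illustration and is not taken from the abstract; the finite-population correction is ignored.

```latex
% Unit-level linear mixed model with an area-level random intercept
y_{ij} = x_{ij}^{\top}\beta + u_i + e_{ij}, \qquad
u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2), \quad
j = 1, \dots, n_i .

% EBLUP of the area mean, with \bar{X}_i the known population mean of the covariates
\hat{\bar{Y}}_i = \bar{X}_i^{\top}\hat{\beta} + \hat{u}_i, \qquad
\hat{u}_i = \hat{\gamma}_i \left( \bar{y}_i - \bar{x}_i^{\top}\hat{\beta} \right), \qquad
\hat{\gamma}_i = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_i} .
```

Inconsistent estimates of $\beta$, $\sigma_u^2$ or $\sigma_e^2$ enter both the synthetic part $\bar{X}_i^{\top}\hat{\beta}$ and the shrinkage factor $\hat{\gamma}_i$, which is how they carry over into the area prediction.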
This self-weighting property does not necessarily hold for the complex survey designs employed in official statistics. When there is uncertainty about the regression model or about the informativeness of the sampling design (i.e., a violation of the self-weighting property), survey weights might reduce possible estimation bias. Therefore, we consider their inclusion in the unit-level random-intercept models employed for SAE in order to account for possible model misspecification and complex survey designs.
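The effect of an informative design can be illustrated with a toy example. The sketch below is not part of the study; the population model, the inclusion mechanism (probabilities proportional to `exp(0.5 * e)`), and the function names `wls` and `informative_sampling_demo` are all assumptions chosen for illustration. It draws a Poisson sample whose inclusion probabilities depend on the model error and compares an unweighted regression fit with one weighted by the inverse inclusion probabilities.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy for b."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def informative_sampling_demo(N=100_000, n_expected=2_000, seed=1):
    """Toy population y = 1 + 2x + e, sampled with probabilities depending on e."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    e = rng.normal(size=N)
    y = 1.0 + 2.0 * x + e

    # Assumed informative design: units with large errors are over-sampled.
    size = np.exp(0.5 * e)
    pi = n_expected * size / size.sum()   # inclusion probabilities (all well below 1 here)
    s = rng.random(N) < pi                # Poisson sampling

    X = np.column_stack([np.ones(s.sum()), x[s]])
    b_unweighted = wls(X, y[s], np.ones(s.sum()))
    b_weighted = wls(X, y[s], 1.0 / pi[s])  # survey weights = inverse inclusion probabilities
    return b_unweighted, b_weighted

b_unweighted, b_weighted = informative_sampling_demo()
print("unweighted (intercept, slope):", b_unweighted)
print("weighted   (intercept, slope):", b_weighted)
```

In this setup the unweighted intercept should drift away from its population value of 1, because the sample over-represents units with positive errors, while the weighted fit should stay close to it.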
A direct pseudo-ML approach to multi-level modeling requires known inclusion probabilities at each sampling stage and a random effects structure that reflects the survey design. This is problematic if the domain of interest is not a sampling unit itself but cuts across sampling units, e.g., estimation of totals for age × gender domains where the sample is drawn regionally clustered. As an alternative, we propose to run a Monte Carlo EM (MCEM) algorithm whose complete-data likelihood leads to a survey-weighted single-level modeling problem. For the E-step, the random effects are treated as the missing data needed to complete the likelihood and are obtained by Monte Carlo integration. This makes the regression model flexible in the random effects structure and requires only the final individual inclusion probabilities. The weighted model parameter estimates are then used to form the survey-weighted unit-level pseudo-EB(LU)P.
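The general shape of such an algorithm can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the function name `weighted_mcem`, the starting values, the fixed number of iterations and draws, and in particular the E-step (a normal conditional distribution centred on the shrunken weighted mean residual of each domain) are choices made here for the sake of a short example. The M-step shows the point made in the text: once the random effects are filled in, the update of the regression coefficients is a single-level survey-weighted least-squares problem that needs only the final weights.

```python
import numpy as np

def weighted_mcem(y, X, dom, w, n_iter=50, n_draws=200, seed=0):
    """Sketch of a Monte Carlo EM fit of y = X beta + u_dom + e with survey weights w.

    Assumption: the E-step uses a normal conditional for u_d centred on the
    shrunken *weighted* mean residual of domain d; other choices are possible.
    """
    rng = np.random.default_rng(seed)
    y, X, w = np.asarray(y, float), np.asarray(X, float), np.asarray(w, float)
    labels, d = np.unique(np.asarray(dom), return_inverse=True)
    D = labels.size
    n_d = np.bincount(d, minlength=D)                 # domain sample sizes
    w_d = np.bincount(d, weights=w, minlength=D)      # domain weight sums

    Xw = X * w[:, None]
    XtWX = X.T @ Xw
    beta = np.linalg.solve(XtWX, Xw.T @ y)            # start: weighted least squares
    sigma_e2 = np.sum(w * (y - X @ beta) ** 2) / w.sum()
    sigma_u2 = sigma_e2                               # crude starting value

    for _ in range(n_iter):
        # E-step: draw the "missing" random effects by Monte Carlo.
        resid = y - X @ beta
        rbar = np.bincount(d, weights=w * resid, minlength=D) / w_d
        gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_d)
        post_mean = gamma * rbar
        post_sd = np.sqrt(gamma * sigma_e2 / n_d)
        u = post_mean + post_sd * rng.standard_normal((n_draws, D))

        # M-step: weighted single-level problem given the drawn random effects.
        u_bar = u.mean(axis=0)
        beta = np.linalg.solve(XtWX, Xw.T @ (y - u_bar[d]))
        err = (y - X @ beta)[None, :] - u[:, d]       # shape (n_draws, n)
        sigma_e2 = np.sum(w * err ** 2) / (n_draws * w.sum())
        sigma_u2 = np.mean(u ** 2)

    return beta, sigma_u2, sigma_e2, dict(zip(labels.tolist(), post_mean))
```

A call such as `weighted_mcem(y, X, dom, w)` with `w` equal to the inverse final inclusion probabilities returns the coefficient vector, the two variance components, and the predicted random effect per domain from the last E-step. Because `dom` only labels the domains of interest, it does not have to coincide with the sampling clusters.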
In a simulation study, we apply the procedure to assess the MCEM performance in comparison with unweighted SAE estimators and classical design-based estimators. We find that under a self-weighting sampling design, when the use of survey weights is not necessary, the loss in efficiency is relatively small. Further, under informative sampling, the use of survey weights reduces the BHF estimator’s relatively dominant bias and thus the simulated mean squared error.