Small Area Estimation in Case of Nonresponse: A Cautious Approach


Miscellaneous (Nonignorability, Measurement Error, Errors in Sampling Variance, Multiple-category Outcome)

Julia Plass (Department of Statistics, LMU Munich, Germany)
Aziz Omar (Department of Statistics, LMU Munich, Germany and Helwan University, Egypt) (Speaker)
Thomas Augustin (Department of Statistics, LMU Munich, Germany)

In the context of small area estimation (SAE), nonresponse may seriously reduce the already small sample sizes. Accordingly, addressing both problems jointly is especially challenging. Survey practitioners commonly use weighting and imputation to mitigate nonresponse. Both techniques achieve point identifiability by imposing the missing at random (MAR) assumption, i.e. that, conditional on the available covariates, missingness occurs independently of the true underlying value of the variable of interest. The existing literature on nonresponse in SAE likewise relies on strong assumptions about the missingness process, such as MAR or a specific missing not at random (MNAR) mechanism, combined with strict distributional assumptions. Since, in general, neither MAR nor such MNAR distributional assumptions are testable, and imposing them wrongly may cause substantial bias, the results of such treatments have to be interpreted with caution.
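To make the role of MAR concrete, the following minimal Python sketch (an illustration, not the authors' procedure) computes the single point estimate of $P(Y=1)$ that cell-wise reweighting of respondents yields when MAR holds given one categorical covariate. The cell labels and counts are hypothetical, and the cell shares are taken from the sample itself; known population shares could be substituted.

```python
from typing import Dict, Tuple

# Hypothetical sample counts per covariate cell x:
# (respondents with Y=0, respondents with Y=1, nonrespondents).
Counts = Dict[str, Tuple[int, int, int]]


def mar_point_estimate(counts: Counts) -> float:
    """Point estimate of P(Y=1) under MAR given the covariate cell.

    Under MAR, nonrespondents in a cell are assumed to have the same
    proportion of Y=1 as the respondents in that cell, so each cell's
    respondent proportion is weighted by the cell's share of the sample.
    Assumes every cell contains at least one respondent.
    """
    n_total = sum(sum(cell) for cell in counts.values())
    estimate = 0.0
    for n0, n1, n_na in counts.values():
        cell_share = (n0 + n1 + n_na) / n_total
        estimate += cell_share * n1 / (n0 + n1)
    return estimate


# Hypothetical example with two education cells.
example = {"low": (30, 10, 10), "high": (40, 5, 15)}
print(mar_point_estimate(example))
```

A single number results regardless of how the nonrespondents actually behave, which is exactly the untestable commitment that the cautious approach refrains from.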

According to the methodology of partial identification in the spirit of Manski (2003), strong assumptions on the missingness process are not imperative for obtaining reliable results. Instead, allowing for partially identified parameters makes it possible to incorporate only tenable knowledge. In this way, imprecise, yet credible, results can be obtained and subsequently refined when additional knowledge about the missingness becomes available. Some existing approaches support this point of view by refraining from strong assumptions on the missingness process (cf. e.g. Couso and Dubois (2014); Denœux (2014); and Manski (2015) in the context of official statistics). However, these approaches were not developed with standard SAE situations in mind.

For this reason, the aim of this work is to develop cautious versions of some common small area estimators that reflect the dispensability of strong assumptions regarding the missingness process. In this sense, cautiousness is practiced by forgoing such assumptions (cf. Plass et al., 2017). Specifically, the objective is to estimate the probabilities of the possible outcomes of a binary variable whose values are not completely observed in samples drawn from several small areas. In addition to survey data on the binary variable and certain covariates, auxiliary information on these covariates at the area level is included.

Technically, the proposed approach distinguishes between a latent world containing the binary variable of interest $Y \in \{0,1\}$ and an observed world containing the variable $\mathcal{Y} \in \{0,1,na\}$, in which missing values (denoted by $na$) are treated as a category of their own. As a first step, the maximum likelihood estimator of the resulting multinomial distribution of the observed variable is uniquely determined. Owing to the invariance of the likelihood under parameter transformations, the likelihood based on the observed variable can then be rewritten in terms of the parameters of the latent variable distribution and the missingness parameters. While the MAR assumption, for instance, would point-identify the latent variable distribution, in general the relation between both worlds is not one-to-one: several combinations of latent variable distributions and missingness parameters lead to the same observed variable distribution. Collecting all compatible combinations therefore yields a set of solutions, which is the (first) result of the cautious approach.
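For a single group without covariates, this reparametrization can be written out explicitly. The sketch below assumes the notation $\pi = P(Y=1)$, $q_0 = P(\mathcal{Y}=na \mid Y=0)$ and $q_1 = P(\mathcal{Y}=na \mid Y=1)$ (our labels, not necessarily the authors'), so that $P(\mathcal{Y}=1)=\pi(1-q_1)$, $P(\mathcal{Y}=0)=(1-\pi)(1-q_0)$ and $P(\mathcal{Y}=na)=\pi q_1+(1-\pi)q_0$. Solving for $(q_0, q_1)$ given $\pi$ shows that every $\pi$ in $[p_1, p_1+p_{na}]$, with $p_j$ the observed relative frequencies, is compatible with the ML estimate, which reproduces Manski's worst-case bounds. The counts are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def observed_mle(n0: int, n1: int, n_na: int) -> Tuple[Fraction, Fraction, Fraction]:
    """ML estimate of the observed multinomial over {0, 1, na}: relative frequencies."""
    n = n0 + n1 + n_na
    return Fraction(n0, n), Fraction(n1, n), Fraction(n_na, n)


def missingness_for(pi: Fraction, p0: Fraction, p1: Fraction) -> Optional[Tuple[Fraction, Fraction]]:
    """Return (q0, q1) that, together with pi = P(Y=1), reproduce the observed
    distribution, or None if no valid missingness parameters exist.

    P(obs = na) = pi*q1 + (1-pi)*q0 = 1 - p0 - p1 then holds automatically.
    """
    if not 0 < pi < 1:
        raise ValueError("pi must lie strictly between 0 and 1")
    q1 = 1 - p1 / pi          # from P(obs = 1) = pi * (1 - q1)
    q0 = 1 - p0 / (1 - pi)    # from P(obs = 0) = (1 - pi) * (1 - q0)
    if 0 <= q0 <= 1 and 0 <= q1 <= 1:
        return q0, q1
    return None


def identification_region(p1: Fraction, p_na: Fraction) -> Tuple[Fraction, Fraction]:
    """Set of all pi compatible with the observed distribution: [p1, p1 + p_na]."""
    return p1, p1 + p_na


p0, p1, p_na = observed_mle(50, 30, 20)              # hypothetical counts
print(identification_region(p1, p_na))               # (Fraction(3, 10), Fraction(1, 2))
print(missingness_for(Fraction(2, 5), p0, p1))       # (Fraction(1, 6), Fraction(1, 4))
print(missingness_for(Fraction(1, 5), p0, p1))       # None: pi below p1
```

Exact fractions are used so that the boundary cases ($q_1 = 0$ at the lower end, $q_0 = 0$ at the upper end) are not blurred by rounding.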

Since the obtained set can be rather large, it is attractive from a practical point of view to add weak but tenable assumptions about the missingness process, e.g. “rich people are more likely than poor people to refuse to answer the income question”. Such assumptions are common in many situations, yet they cannot be used if point-identified results are enforced. In the current cautious approach (cf. Plass et al., 2015), such weak assumptions are included by adding restrictions to the observation model relating $Y$ and $\mathcal{Y}$, thereby providing a framework for incorporating auxiliary information on the missingness process.
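As an illustration of how a weak assumption shrinks the set, suppose (hypothetically) that units with $Y=1$ are at least as likely to be missing as units with $Y=0$, i.e. $q_1 \ge q_0$ with $q_y = P(\mathcal{Y}=na \mid Y=y)$ and $\pi = P(Y=1)$. Substituting $q_1 = 1 - p_1/\pi$ and $q_0 = 1 - p_0/(1-\pi)$ turns this restriction into $p_1(1-\pi) \le p_0\pi$, i.e. $\pi \ge p_1/(p_0+p_1)$, so the region becomes $[p_1/(p_0+p_1),\, p_1+p_{na}]$, whose lower end is the value MAR ($q_0=q_1$) would point-identify. This is our own worked instance of such a restriction, not necessarily one used by the authors.

```python
from fractions import Fraction
from typing import Tuple


def restricted_region(n0: int, n1: int, n_na: int) -> Tuple[Fraction, Fraction]:
    """Compatible values of pi = P(Y=1) under the hypothetical restriction
    q1 >= q0, i.e. P(na | Y=1) >= P(na | Y=0).

    The restriction is equivalent to pi >= p1 / (p0 + p1), the value that MAR
    (q0 == q1) would point-identify; the unrestricted upper bound p1 + p_na
    is attained with q0 = 0, which satisfies the restriction.
    Assumes at least one respondent (n0 + n1 > 0).
    """
    n = n0 + n1 + n_na
    p0, p1, p_na = Fraction(n0, n), Fraction(n1, n), Fraction(n_na, n)
    lower = p1 / (p0 + p1)   # never below p1, since p0 + p1 <= 1
    upper = p1 + p_na        # unchanged from the unrestricted region
    return lower, upper


# Hypothetical counts; without the restriction the region would be [3/10, 1/2].
print(restricted_region(50, 30, 20))  # (Fraction(3, 8), Fraction(1, 2))
```

The restricted set is narrower but still an interval rather than a point; the MAR value reappears only as one of its endpoints.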

In this work, building on the proposed likelihood approach, first cautious versions of common estimators such as the synthetic estimator and the LGREG estimator are developed. It turns out that the cautious approach cannot be applied directly to model-based estimators. Nevertheless, first studies investigating the resulting proportions under different missingness scenarios are presented.
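The synthetic estimator combines group-level estimates with area-level population shares, $\hat\theta_d = \sum_g (N_{dg}/N_d)\,\hat\pi_g$. Below is a minimal sketch of a cautious variant under assumptions that are ours rather than the authors': the groups $g$ (e.g. gender by education) are estimated from the sample pooled over areas, the missingness parameters may differ freely between groups so that each $\pi_g$ ranges independently over its interval $[p_{1g}, p_{1g}+p_{na,g}]$, and the population counts $N_{dg}$ are known auxiliary totals. Because the weights are nonnegative, the area-level bounds are the weighted sums of the group-level bounds. Group labels, counts and totals are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple


def group_bounds(n0: int, n1: int, n_na: int) -> Tuple[Fraction, Fraction]:
    """Unrestricted bounds [p1, p1 + p_na] for P(Y=1) in one covariate group,
    computed from the sample pooled over all areas."""
    n = n0 + n1 + n_na
    return Fraction(n1, n), Fraction(n1 + n_na, n)


def cautious_synthetic(
    sample: Dict[str, Tuple[int, int, int]],
    pop_counts: Dict[str, int],
) -> Tuple[Fraction, Fraction]:
    """Bounds for the synthetic estimator sum_g (N_dg / N_d) * pi_g of one area d.

    Assumes each pi_g ranges independently over its own interval, so the
    extremes of the nonnegatively weighted sum are attained at the groupwise bounds.
    """
    n_area = sum(pop_counts.values())
    lower = upper = Fraction(0)
    for g, n_dg in pop_counts.items():
        lo_g, up_g = group_bounds(*sample[g])
        weight = Fraction(n_dg, n_area)
        lower += weight * lo_g
        upper += weight * up_g
    return lower, upper


sample = {"low_edu": (30, 10, 10), "high_edu": (40, 5, 5)}  # hypothetical (n0, n1, n_na)
totals = {"low_edu": 600, "high_edu": 400}                   # hypothetical N_dg for one area
print(cautious_synthetic(sample, totals))  # (Fraction(4, 25), Fraction(8, 25))
```

Adding weak assumptions per group narrows each group interval and, with it, the area-level interval. If the missingness parameters were instead linked across groups, the compatible set would be a subset of this product of intervals, so these bounds could be wider than necessary.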

The results are illustrated using the German General Social Survey, with auxiliary information in the form of totals derived from a data report by the German Federal Statistical Office. The aim of this application is to estimate the proportion of people at risk of poverty in the German federal states (areas), taking into account differences in gender and educational attainment.

Couso, I., and Dubois, D. (2014). Statistical reasoning with set-valued information: Ontic vs. epistemic views. International Journal of Approximate Reasoning, 55, 1502-1518.
Denœux, T. (2014). Likelihood-based belief function: Justification and some extensions to low-quality data. International Journal of Approximate Reasoning, 55, 1535-1547.
Manski, C. (2003). Partial identification of probability distributions. Springer.
Manski, C. (2015). Credible interval estimates for official statistics with survey nonresponse. Journal of Econometrics, 191, 293-301.
Plass, J., Augustin, T., Cattaneo, M., and Schollmeyer, G. (2015). Statistical modelling under epistemic data imprecision: Some results on estimating multinomial distributions and logistic regression for coarse categorical data. In T. Augustin, S. Doria, E. Miranda, and E. Quaeghebeur (Eds.), Proc. 9th International Symposium on Imprecise Probability: Theories and Applications (pp. 247-256). Aracne, Rome.
Plass, J., Omar, A., and Augustin, T. (2017). Towards a cautious modelling of missing data in small area estimation. (Preliminary version of a technical report available at, (Under review)).
Keywords: Small area estimation; LGREG-synthetic estimator; missing data; NMAR; partial identification; logistic regression; logistic mixed model.
