Small area models with misclassified covariates


Bayesian methods (2)

Serena Arima (Sapienza University of Rome, Italy) (Speaker)
Silvia Polettini (Sapienza University of Rome, Italy)

Modern small area estimation methods focus on mixed effects regression models that link the small areas and borrow strength from similar domains.
However, when the auxiliary variables used in these models are measured with error, small area estimators that ignore the error may perform worse than direct estimators ([1], [2]). In regression models, measurement error in covariates is known to bias the estimated model parameters and to reduce the power to detect relationships among variables [3]. We discuss these issues in the context of small area models. Extending the model in [4], we propose a Bayesian unit-level model that accounts for measurement error in both continuous and categorical covariates. Each observed continuous covariate is modeled as a Gaussian variable centered at its true, unobservable value. For the discrete covariates, we model the misclassification probabilities and estimate them jointly with all the other unknown model parameters.
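To make the measurement-error structure concrete, the following is a minimal Python sketch (NumPy assumed) that simulates data from a unit-level model of this general kind: y = β0 + β1·x + β2·z + v_i + e, where the continuous covariate x is observed as w = x + u with Gaussian u, and the binary covariate z is observed through the misclassification probabilities p01 = P(z_obs = 1 | z = 0) and p10 = P(z_obs = 0 | z = 1). All parameter values and the function names `simulate` and `ols` are illustrative assumptions, not the authors' simulation design. Fitting ordinary least squares once on the true covariates and once on the observed ones shows the bias described in [3].

```python
import numpy as np

def simulate(m=50, n_per_area=400, beta=(1.0, 2.0, 2.0), sigma2_u=1.0,
             prev=0.4, p01=0.1, p10=0.2, sigma2_v=0.5, sigma2_e=1.0, seed=0):
    """Hypothetical data generator for a unit-level model with noisy covariates."""
    rng = np.random.default_rng(seed)
    area = np.repeat(np.arange(m), n_per_area)          # area label of each unit
    n = area.size
    x = rng.normal(0.0, 1.0, n)                         # true continuous covariate
    z = rng.binomial(1, prev, n)                        # true binary covariate
    v = rng.normal(0.0, np.sqrt(sigma2_v), m)           # area random effects
    y = (beta[0] + beta[1] * x + beta[2] * z + v[area]
         + rng.normal(0.0, np.sqrt(sigma2_e), n))       # outcome
    w = x + rng.normal(0.0, np.sqrt(sigma2_u), n)       # observed x: Gaussian around true value
    # Flip z with probability p01 (if z = 0) or p10 (if z = 1): misclassification
    flip = np.where(z == 0, rng.random(n) < p01, rng.random(n) < p10)
    z_obs = np.where(flip, 1 - z, z)
    return y, x, z, w, z_obs, area

def ols(y, *cols):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

if __name__ == "__main__":
    y, x, z, w, z_obs, area = simulate()
    print("true covariates:    ", ols(y, x, z))      # approximately (1, 2, 2)
    print("observed covariates:", ols(y, w, z_obs))  # slopes shrunk toward 0
```

With these settings the reliability of w is var(x) / (var(x) + σ_u²) = 0.5, so the naive slope on w is expected to be about half of β1, and the coefficient on z_obs is also attenuated; the fit on the true covariates recovers values close to β. A model that ignores the error inherits exactly this distortion.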
We test our model through a simulation study covering different scenarios. On the simulated data, we also study the model's ability to reconstruct the true value of the perturbed variables for each unit. With the misclassification probabilities treated as unknown, our model not only reduces the estimation bias but also recovers a large fraction of the original scores of the misclassified variables. Other proposals in the literature, most notably MC-SIMEX [5], address misclassification in covariates. Our proposal has the advantage of allowing for unknown misclassification probabilities, which are estimated jointly with all the other unknown model parameters.
The analysis of the body mass index (BMI) of Ethiopian women, based on data from the 2011 Ethiopia Demographic and Health Survey (DHS), provides a clear example of the effect of neglecting measurement error and illustrates the advantages of the proposed procedure. BMI is taken as a measure of women's nutritional status. We fit the proposed model to obtain accurate estimates of women's mean BMI across domains. The model also allows us to assess the effect on BMI of several socio-economic characteristics, such as age, household wealth index, number of children, and educational attainment, while accounting for regional variation. All of these variables are plausible explanatory factors for a woman's nutritional status and have been identified as important determinants of undernutrition in previous studies. However, some of them can reasonably be assumed to be measured with error. Our application shows that, even with large subsamples, small area predictions that ignore measurement error may be misleading and covariate effects may be severely distorted.
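The reconstruction of the original scores of misclassified variables can be illustrated with the kind of step a Gibbs sampler might use for each unit's latent true category, given current draws of the other parameters: P(z = k | z_obs, y, ·) ∝ π_k · P(z_obs | z = k) · N(y; η + δ_k, σ_e²), where π_k is the prior probability of category k, η is the unit's linear predictor excluding the categorical covariate, and δ_k is the effect of category k. This is a minimal sketch assuming a Gaussian unit-level model with additive category effects; the function and argument names are hypothetical and the authors' sampler may be organized differently. Both the misclassification probabilities and the outcome enter the update, which is what lets the model pull a unit away from its observed (possibly wrong) category.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def true_category_posterior(z_obs, y, eta, category_effects, prior, misclass, sigma_e):
    """Full conditional of a unit's true category (hypothetical helper).

    z_obs            observed, possibly misclassified, category index
    y                the unit's outcome
    eta              linear predictor excluding the categorical covariate
    category_effects category_effects[k] = effect of true category k (reference = 0)
    prior            prior[k] = P(z = k)
    misclass         misclass[k][j] = P(z_obs = j | z = k)
    sigma_e          unit-level error standard deviation
    """
    # prior x misclassification likelihood x outcome likelihood, for each category k
    weights = [
        prior[k] * misclass[k][z_obs] * normal_pdf(y, eta + category_effects[k], sigma_e)
        for k in range(len(prior))
    ]
    total = sum(weights)
    return [wk / total for wk in weights]

if __name__ == "__main__":
    prior = [0.6, 0.4]
    misclass = [[0.9, 0.1], [0.2, 0.8]]
    # Equal category effects: the outcome is uninformative, so this is Bayes' rule
    # on the misclassification matrix alone, roughly [0.158, 0.842]
    print(true_category_posterior(1, 0.0, 0.0, [0.0, 0.0], prior, misclass, 1.0))
    # An outcome far from the category-1 mean pulls the posterior toward category 0
    print(true_category_posterior(1, 0.0, 0.0, [0.0, 2.0], prior, misclass, 0.5))
```

Averaging these probabilities (or the draws of z they generate) over MCMC iterations gives, for each unit, a posterior probability for every category; assigning the most probable category is one simple way to turn this into a reconstructed score.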


[1] Ybarra, L.M.R. and Lohr, S.L. (2008) Small area estimation when auxiliary information is measured with error, Biometrika, 95(4), 919–931.
[2] Arima, S., Datta, G.S. and Liseo, B. (2015) Bayesian Estimators for Small Area Models when Auxiliary Information is Measured with Error, Scandinavian Journal of Statistics, 42(2), 518–529.
[3] Carroll, R.J., Ruppert, D., Stefanski, L. and Crainiceanu, C. (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Chapman & Hall/CRC.
[4] Ghosh, M., Sinha, K. and Kim, D. (2006) Empirical and Hierarchical Bayesian estimation in finite population sampling under structural measurement error models, Scandinavian Journal of Statistics, 33(3), 591–608.
[5] Küchenhoff, H., Mwalili, S.M. and Lesaffre, E. (2006) A general method for dealing with misclassification in regression: The misclassification SIMEX, Biometrics, 62(1), 85–96.
