Authors: Setareh Ranjbar (Research Center for Statistics, Geneva School of Economics and Management, University of Geneva) (Speaker); Elvezio Ronchetti (Research Center for Statistics, Geneva School of Economics and Management, University of Geneva)
Today the availability of rich sample surveys allows researchers and policy makers to pursue more ambitious objectives. This information, together with auxiliary data coming through administrative channels, is used for better prediction/estimation of social and economic indices, e.g. inequality or poverty measures, which can help to determine target groups (target domains) more precisely. In this context, the domains (sub-population groups) for which the sample size is not large enough to provide an acceptable direct estimate are referred to as “small areas”, and one needs to “borrow strength” from other existing sources of information, such as other domains and/or time periods, to estimate/predict with plausible accuracy. Where an explicit model is imposed on the data, this approach is called indirect model-based estimation. For a comprehensive review of this subject we refer to Rao and Molina (2015). However, the existence of outliers in the sample data can significantly harm the estimation for the areas in which they occur, especially where the domain sample size is small. Chambers (1986) discussed the robust estimation of finite population totals and means in the presence of outliers. Later, based on this approach, Welsh and Ronchetti (1998) proposed a bias calibration for the robust estimator of the cumulative distribution function of a finite population.
Tzavidis et al. (2010) provide a comprehensive review of bias-calibrated robust small area estimators. Their argument is that if the small area cumulative distributions are estimated/predicted adequately, other functional statistics can be derived in a consistent way. We slightly modify this general approach by using a different way of weighting the observed and predicted outcomes: we assign equal weights to all the points on the predictive distributions of the unobserved as well as the observed units. Our results are consistent with those presented in Chambers et al. (2014). However, while their paper mainly focuses on outliers that are drawn from a distribution with a different mean from the rest of the survey, our study first considers skewed heavy-tailed distributions as a source of contamination to the model. We then propose a new calibration method based on a truncation function that exploits the extra information regarding the source of the contamination. That is to say, we apply a skewed truncation function to correct for the bias of the robust estimator(s), and we propose a simple algorithm to choose the truncation interval for this type of calibration.
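To illustrate the idea of a skewed truncation function, the following is a minimal Python sketch, not the authors' algorithm. It assumes a Chambers (1986)-style correction in which the mean of the truncated standardized residuals from a robust fit is added back to the predictions for the non-sampled units. The function names, the arguments `lower` and `upper`, and the choice of clipping as the truncation function are all hypothetical, and the abstract's algorithm for choosing the truncation interval is not reproduced here.

```python
import numpy as np


def skewed_truncation(residuals, lower, upper):
    """Clip standardized residuals to [-lower, upper].

    Setting upper > lower truncates the right tail less severely than the
    left one, which is one simple way to obtain an asymmetric (skewed)
    truncation. This functional form is an assumption for illustration.
    """
    return np.clip(np.asarray(residuals, dtype=float), -lower, upper)


def bias_calibrated_mean(y_sampled, fitted_sampled, predicted_nonsampled,
                         scale, lower, upper):
    """Area mean from observed values plus bias-corrected robust predictions.

    y_sampled            observed outcomes in the area
    fitted_sampled       robust fitted values for the sampled units
    predicted_nonsampled robust predictions for the non-sampled units
    scale                robust residual scale estimate (assumed given)
    """
    y_sampled = np.asarray(y_sampled, dtype=float)
    fitted_sampled = np.asarray(fitted_sampled, dtype=float)
    predicted_nonsampled = np.asarray(predicted_nonsampled, dtype=float)

    n_sampled = y_sampled.size
    n_nonsampled = predicted_nonsampled.size

    # Standardized residuals of the sampled units under the robust fit.
    residuals = (y_sampled - fitted_sampled) / scale

    # Average truncated residual, mapped back to the outcome scale.
    correction = scale * skewed_truncation(residuals, lower, upper).mean()

    # Observed units enter as they are; each non-sampled prediction is
    # shifted by the same correction term.
    total = (y_sampled.sum() + predicted_nonsampled.sum()
             + n_nonsampled * correction)
    return total / (n_sampled + n_nonsampled)
```

With a right-skewed error distribution, choosing `upper` larger than `lower` lets more of the large positive residuals feed into the correction, whereas a symmetric interval would treat both tails alike.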
Our simulations are based on a realistic setting where the so-called “representative outliers” come from the tail of a skewed heavy-tailed distribution. Our results show that the new bias-calibration method often leads to more efficient estimators of the areas’ finite population distributions in general, and of inequality indices such as the Gini index in particular. However, we point out, based on different simulation scenarios, that the ranking of the small areas based on bias-calibrated robust inequality indices must be treated with more care. We argue that our approach can be particularly valuable for the empirical analysis of income and wealth inequality and/or poverty measures, yet it can be extended to all linear and non-linear population parameters where a skewed heavy-tailed distribution for the random errors/components is plausible. This approach can also be proposed as a way to impute missing income or wage values, a common occurrence in labour force surveys, under the assumption that the values are missing at random.
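As a small illustration of how an inequality index can be derived once an area's outcomes are available, the sketch below computes the standard sample Gini index from a vector of non-negative values. It is an assumption for illustration only: in a small area setting the input might combine observed and predicted outcomes for the area, and the function name `gini_index` is hypothetical rather than taken from the authors' work.

```python
import numpy as np


def gini_index(values):
    """Gini index of non-negative values with equal weights.

    Uses the rank-based formula
        G = 2 * sum_i(i * y_(i)) / (n * sum_i y_(i)) - (n + 1) / n,
    where y_(1) <= ... <= y_(n) are the sorted values.
    """
    y = np.sort(np.asarray(values, dtype=float))
    n = y.size
    if n == 0:
        raise ValueError("values must not be empty")
    if y[0] < 0 or y.sum() == 0:
        raise ValueError("values must be non-negative with a positive sum")

    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * y.sum()) - (n + 1.0) / n
```

For example, `gini_index([1, 1, 1, 1])` gives 0 (perfect equality), while `gini_index([0, 0, 0, 1])` gives 0.75, the maximum attainable with four units under this formula.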
References:
Chambers, R. L. (1986). Outlier robust finite population estimation. Journal of the American Statistical Association, 81(396), 1063-1069.
Chambers, R., Chandra, H., Salvati, N., and Tzavidis, N. (2014). Outlier robust small area estimation. Journal of the Royal Statistical Society: Series B, 76(1), 47-69.
Rao, J. N. K., and Molina, I. (2015). Small area estimation. John Wiley and Sons.
Tzavidis, N., Marchetti, S., and Chambers, R. (2010). Robust estimation of small‐area means and quantiles. Australian and New Zealand Journal of Statistics, 52(2), 167-186.
Welsh, A. H., and Ronchetti, E. (1998). Bias‐calibrated estimation from sample surveys containing outliers. Journal of the Royal Statistical Society: Series B, 60(2), 413-428.