Small area estimation based on quantile regression: a Bayesian approach


Bayesian methods (1)

Enrico Fabrizi (Università Cattolica del S. Cuore)*
Giovanni Riccardi (Università di Bologna, Italy)
Nicola Salvati (Università di Pisa)
Carlo Trivisano (Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Italy)

Quantile and M-quantile regression methods have been applied to small area estimation in several papers (see, for instance, Chambers and Tzavidis, 2006; Chambers et al., 2014). The main idea is to use a semi-parametric regression model for quantiles, thus avoiding parametric distributional assumptions on the regression residuals and random effects. Chambers and Tzavidis (2006) propose measuring heterogeneity among areas using M-quantile coefficients. The M-quantile coefficient of a population unit is defined as the quantile that characterizes the regression plane on which the unit lies. Averages of the estimated M-quantile coefficients of units from the same area define area-specific quantiles, which are used when applying M-quantile regression to the prediction of area parameters.
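The M-quantile coefficient idea of Chambers and Tzavidis (2006) can be illustrated with a short NumPy sketch. It fits linear M-quantile regressions on a grid of q values by iteratively reweighted least squares with a Huber influence function, finds for each unit (by linear interpolation) the q whose fitted plane passes through the observation, and averages these coefficients by area. The tuning constant c = 1.345, the MAD-type scale estimate, the grid of q values, the interpolation step, the simulated data and all function names are assumptions made for illustration; this is neither a reference implementation of the frequentist estimator nor the method proposed in this paper.

```python
import numpy as np


def mquantile_fit(X, y, q, c=1.345, n_iter=200, tol=1e-8):
    """Linear M-quantile regression at level q, fitted by IRLS with a Huber loss."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745  # robust residual scale (assumption)
        u = r / s
        # Huber weights psi(u)/u: 1 inside [-c, c], c/|u| outside
        w = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))
        # Asymmetric tilt: 2q above the plane, 2(1 - q) below it
        w = w * np.where(u > 0, 2.0 * q, 2.0 * (1.0 - q))
        XtW = X.T * w  # X^T W without building the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def mquantile_coefficients(X, y, q_grid):
    """Unit-level M-quantile coefficients, interpolated over a grid of q values."""
    # fits[k, j] is the fitted value of the q_grid[k] plane at unit j
    fits = np.array([X @ mquantile_fit(X, y, q) for q in q_grid])
    # Guard against crossing planes so each column is non-decreasing in q
    fits = np.maximum.accumulate(fits, axis=0)
    # q whose plane passes through (x_j, y_j); values beyond the grid are clamped
    return np.array([np.interp(y[j], fits[:, j], q_grid) for j in range(len(y))])


def area_mq_coefficients(coef, area):
    """Area-specific coefficient: average of the unit coefficients in each area."""
    return {a: coef[area == a].mean() for a in np.unique(area)}


# Hypothetical simulated population: 5 areas with increasing area effects
rng = np.random.default_rng(0)
area = np.repeat(np.arange(5), 40)
effect = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])[area]
x = rng.normal(size=area.size)
y = 1.0 + 2.0 * x + effect + rng.normal(scale=0.5, size=area.size)
X = np.column_stack([np.ones_like(x), x])

coef = mquantile_coefficients(X, y, np.linspace(0.01, 0.99, 99))
theta = area_mq_coefficients(coef, area)
print(theta)
```

Areas with larger simulated effects should receive larger average coefficients, which is how the area-specific quantiles encode between-area heterogeneity without explicit random effects.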
Although the Bayesian literature on quantile regression is vast and growing rapidly along several lines of research, its applications to small area estimation are very limited. The main aim of this contribution is to fill this gap.
Within the Bayesian literature on quantile regression, we restrict our attention to methods based on the joint estimation of conditional quantiles, as this favors borrowing strength in small area estimation. We avoid a parametric specification of the likelihood while keeping the normal as a special case, since it is the standard choice in many small area applications. Finally, we want a method that can be implemented using popular MCMC software such as JAGS. To the best of our knowledge, no existing small area estimation strategy has all these properties.
To meet these goals, we extend the quantile regression method proposed by Reich and Smith (2013) to small area estimation. In the Reich and Smith proposal, the quantile function is represented as a linear combination of basis functions. The basis functions are chosen so that the resulting quantile function is that of the normal (or of another reference distribution) when the coefficients of the linear combination are all equal, while remaining flexible enough to accommodate a wide range of quantile functions. Since normality is characterized by the equality of these coefficients, their posterior distributions can be used to assess the normality of the data being analyzed. We extend the Reich and Smith methodology to include area-specific random effects. Specifically, we assume that both the intercepts and the slopes associated with covariates are random, while the parameters governing the tail behavior of the area-specific distributions are assumed to be constant across areas. The way we include random effects in the model of Reich and Smith (2013) is new. Unlike Chambers and Tzavidis (2006), our method does not make use of quantile or M-quantile coefficients.
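The equal-coefficient property described above can be checked numerically. The sketch below builds one possible piecewise basis on a hypothetical set of knots: on each interval it uses increments of the standard normal quantile function q0 measured from the median, so the basis functions telescope and a common coefficient sigma returns exactly mu + sigma * q0(tau). This is an illustration of the idea under stated assumptions, not necessarily the exact basis of Reich and Smith (2013), and it omits covariates; in a regression, each covariate effect would be a similar combination with its own coefficients. Unequal positive coefficients keep the quantile function increasing while moving away from normality. All names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q0(t):
    """Reference quantile function: standard normal inverse CDF, applied elementwise."""
    return np.array([_STD_NORMAL.inv_cdf(v) for v in np.atleast_1d(t)])


def basis(tau, knots):
    """Columns B_l(tau) = q0(clip(tau)) - q0(clip(0.5)) on [knots[l], knots[l+1]].

    Hypothetical construction: knots run from 0 to 1, and summing the columns
    telescopes to q0(tau) - q0(0.5) = q0(tau).
    """
    tau = np.atleast_1d(tau)
    cols = [q0(np.clip(tau, lo, hi)) - q0(np.clip(0.5, lo, hi))
            for lo, hi in zip(knots[:-1], knots[1:])]
    return np.column_stack(cols)


def quantile_function(tau, mu, theta, knots):
    """Q(tau) = mu + sum_l theta_l B_l(tau); positive theta keeps Q increasing."""
    return mu + basis(tau, knots) @ np.asarray(theta, dtype=float)


knots = np.array([0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0])  # hypothetical knots
tau = np.linspace(0.01, 0.99, 99)

# Equal coefficients: exactly the quantile function of N(mu = 2, sigma = 1.5)
q_equal = quantile_function(tau, mu=2.0, theta=[1.5] * 6, knots=knots)
# Larger upper-interval coefficients: a right-skewed departure from normality
theta_skew = [1.0, 1.0, 1.2, 1.2, 2.0, 3.0]
q_skew = quantile_function(tau, mu=2.0, theta=theta_skew, knots=knots)
```

In this construction, posterior draws of the coefficients that concentrate around a common value point towards normality, which mirrors the diagnostic use of the coefficients mentioned above.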
For the random slopes and intercepts, we specify a Dirichlet process prior with the normal as the base distribution, in line with the non-parametric specification of the likelihood. Simpler parametric alternatives, such as the normal and t distributions, are also considered for comparison.
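A Dirichlet process prior on the random intercepts and slopes can be simulated by truncated stick-breaking. The sketch draws one realization G = sum_k w_k delta(phi_k) from DP(alpha, N(0, Sigma)) and assigns each area an (intercept, slope) pair drawn from G; because G is discrete, several areas may share identical effects, which induces clustering among areas. The concentration alpha, the truncation at 50 atoms, the base covariance Sigma and the function names are hypothetical choices, and truncated stick-breaking is a common approximation that is not claimed to be how the model is coded in JAGS here.

```python
import numpy as np


def dp_stick_breaking(alpha, base_cov, n_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha, N(0, base_cov))."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    # w_k = v_k * prod_{j < k} (1 - v_j)
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    # Atoms: (intercept, slope) pairs drawn from the normal base distribution
    atoms = rng.multivariate_normal(np.zeros(len(base_cov)), base_cov, size=n_atoms)
    return w, atoms


def draw_area_effects(n_areas, alpha, base_cov, n_atoms=50, seed=0):
    """Random (intercept, slope) for each area, all sampled from one realization of G."""
    rng = np.random.default_rng(seed)
    w, atoms = dp_stick_breaking(alpha, base_cov, n_atoms, rng)
    labels = rng.choice(n_atoms, size=n_areas, p=w)  # atom assigned to each area
    return atoms[labels], labels


# Hypothetical settings: alpha = 1, independent normal base with variances 1 and 0.25
effects, labels = draw_area_effects(116, alpha=1.0, base_cov=np.diag([1.0, 0.25]))
print(effects.shape, "areas grouped into", len(np.unique(labels)), "clusters")
```

Replacing the DP draw with independent normal or t draws per area gives the simpler parametric alternatives used for comparison.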
Quantile regression can straightforwardly be applied to the prediction of conditional quantiles by using area-level summaries of auxiliary variables that are accurately known. Joint estimation of quantiles allows the whole quantile function to be estimated at the area level; this can provide the basis for estimating inequality measures and many other parameters, which represents an innovation with respect to most small area estimation methods. The posterior distribution of the predicted area-level population mean is obtained using convex linear combinations of predicted quantiles, for different choices of coefficients and quantiles. These predictors of the mean, which rest on the representation of the expected value as the integral of the quantile function, include the trimean estimator proposed by Tukey (1977).
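The mean predictors described above can be sketched as convex combinations of the values of a quantile function Q. Tukey's trimean puts weights 1/4, 1/2, 1/4 on the quartiles, while an equal-weight midpoint grid approximates the integral of Q over (0, 1). The same grid gives an inequality measure, here the Gini index written as 2 * integral of (tau - 1/2) Q(tau) divided by the mean. A lognormal quantile function stands in for a predicted area-level income distribution; it, the grid size and the function names are assumptions. In a Bayesian fit, these functions would be applied to each posterior draw of the predicted quantile function to obtain posterior draws of the mean or of the Gini index.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def lognormal_quantile(mu, sigma):
    """Quantile function of a lognormal(mu, sigma^2), a stand-in for a predicted one."""
    def Q(tau):
        z = np.array([_STD_NORMAL.inv_cdf(t) for t in np.atleast_1d(tau)])
        return np.exp(mu + sigma * z)
    return Q


def convex_mean(Q, taus, weights):
    """Mean predictor sum_k w_k Q(tau_k) with non-negative weights summing to one."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must define a convex combination")
    return float(w @ Q(np.asarray(taus, dtype=float)))


def trimean(Q):
    # Tukey's trimean: weights 1/4, 1/2, 1/4 on the three quartiles
    return convex_mean(Q, [0.25, 0.5, 0.75], [0.25, 0.5, 0.25])


def grid_mean(Q, n=1000):
    # Midpoint rule for the integral of Q over (0, 1): equal weights 1/n
    tau = (np.arange(n) + 0.5) / n
    return convex_mean(Q, tau, np.full(n, 1.0 / n))


def grid_gini(Q, n=1000):
    # Gini index = 2 * integral of (tau - 1/2) Q(tau) dtau / mean, midpoint rule
    tau = (np.arange(n) + 0.5) / n
    q = Q(tau)
    return 2.0 * np.mean((tau - 0.5) * q) / np.mean(q)


Q = lognormal_quantile(0.0, 0.5)  # hypothetical right-skewed income distribution
exact_mean = np.exp(0.5 ** 2 / 2)
exact_gini = 2 * _STD_NORMAL.cdf(0.5 / np.sqrt(2)) - 1
print(trimean(Q), grid_mean(Q), exact_mean)
print(grid_gini(Q), exact_gini)
```

For this right-skewed example the trimean lies between the median and the mean, whereas for a symmetric distribution it coincides with the mean; the choice of weights and quantiles therefore matters when area-level income distributions are skewed.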
An extensive simulation exercise shows that this latter estimator is as efficient as, or more efficient than, the frequentist small area estimators compared in Chambers et al. (2014).
The paper includes an application to real data. Specifically, we use sample data from the Survey on Income and Living Conditions of Households with Foreign People, a survey carried out for the first time in 2009 by the Italian National Institute of Statistics, when a sample of about 6,000 households with at least one foreign member was drawn from the population of Italian households. The survey is based on the same questionnaire and methods as the Italian section of the EU-SILC survey. The target variable is equivalized income, with equivalization based on the modified OECD scale, while the target areas are the 116 most relevant foreign communities residing in Italy. Direct estimation of mean equivalized income has adequate precision for only a small minority of the communities: in just 13 of the 116 cases is the coefficient of variation less than one third. The auxiliary information we use is taken from the Italian Population Census of 2001, population registers and other administrative archives.
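Equivalization with the modified OECD scale divides total household income by a weighted household size: 1.0 for the first adult, 0.5 for each additional member aged 14 or over and 0.3 for each child under 14. The sketch below assumes that household composition is summarized by these two counts; the function names are hypothetical, and households with no member aged 14 or over are simply rejected as a simplification.

```python
def modified_oecd_scale(n_aged_14_plus, n_under_14):
    """Modified OECD scale: 1.0 for the first adult, 0.5 for each further member
    aged 14 or over, 0.3 for each child under 14."""
    if n_aged_14_plus < 1:
        raise ValueError("this sketch requires at least one member aged 14 or over")
    return 1.0 + 0.5 * (n_aged_14_plus - 1) + 0.3 * n_under_14


def equivalized_income(household_income, n_aged_14_plus, n_under_14):
    """Household income divided by the modified OECD equivalence scale."""
    return household_income / modified_oecd_scale(n_aged_14_plus, n_under_14)


# Two adults and two children under 14 sharing a total income of 42,000
print(equivalized_income(42000.0, n_aged_14_plus=2, n_under_14=2))
```

For example, a household of two adults and two children under 14 has scale 2.1, so a total income of 42,000 corresponds to an equivalized income of about 20,000.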

Chambers, R. and Tzavidis, N. (2006), M-quantile models for small area estimation, Biometrika, 93, 255-268.
Chambers, R., Chandra, H., Salvati, N. and Tzavidis, N. (2014), Outlier robust small area estimation, Journal of the Royal Statistical Society, Series B, 76, 47-69.
Reich, B. and Smith, L. B. (2013), Bayesian quantile regression for censored data, Biometrics, 69, 651-660.
Tukey, J. W. (1977), Exploratory data analysis, Addison-Wesley, 84, 327-344.