Authors: Tomasz Józefowski (Center for Small Area Estimation, Statistical Office in Poznan); Andrzej Młodak (Center for Small Area Estimation, Statistical Office in Poznan) (Speaker); Tomasz Klimanek (Center for Small Area Estimation, Statistical Office in Poznan); Marcin Szymkowiak (Department of Statistics, Poznan University of Economics and Business; Center for Small Area Estimation, Statistical Office in Poznan)
Comprehensive and reliable assessment of the quality of estimates obtained using small area estimation (SAE) methodology is one of the key challenges facing national statistical institutes. Indirect estimation theory provides many criteria for the statistical assessment of results and for model diagnostics. These include assessing relative estimation errors and relative bias, computing goodness-of-fit measures, evaluating model assumptions, and checking whether estimates for small domains sum up to the higher-level aggregates published in official statistics for larger domains (benchmarking). Small area estimates should also be assessed by subject matter experts whose knowledge of and experience in a given field enable them to evaluate their quality.
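Several of the statistical criteria listed above reduce to simple arithmetic. The sketch below is illustrative only: it assumes the relative estimation error is computed as sqrt(MSE) divided by the estimate (the coefficient of variation), relative bias is computed over simulation replicates against a known true value, and benchmarking is checked against counts (e.g. numbers of poor persons) with a simple ratio adjustment. All function names and figures are hypothetical, not taken from the authors' work.

```python
import math


def relative_error(estimate, mse):
    """Relative estimation error (coefficient of variation): sqrt(MSE) / estimate."""
    return math.sqrt(mse) / estimate


def relative_bias(replicate_estimates, true_value):
    """Relative bias over simulation replicates: (mean estimate - true value) / true value."""
    mean_estimate = sum(replicate_estimates) / len(replicate_estimates)
    return (mean_estimate - true_value) / true_value


def benchmark_discrepancy(domain_estimates, published_total):
    """Relative gap between the sum of domain estimates and the published aggregate."""
    return (sum(domain_estimates) - published_total) / published_total


def ratio_benchmark(domain_estimates, published_total):
    """Scale all domain estimates by one common factor so they add up to the published total."""
    factor = published_total / sum(domain_estimates)
    return [factor * e for e in domain_estimates]


if __name__ == "__main__":
    # Hypothetical district-level counts of poor persons and their estimated MSEs.
    estimates = [1200.0, 850.0, 430.0]
    mses = [14400.0, 7225.0, 3600.0]
    published_total = 2600.0  # hypothetical aggregate for the larger domain
    print([round(relative_error(e, v), 3) for e, v in zip(estimates, mses)])
    print(round(benchmark_discrepancy(estimates, published_total), 4))
    print(ratio_benchmark(estimates, published_total))
```

Ratio benchmarking is only one of several possible adjustments; it preserves the relative sizes of the domain estimates while forcing their sum to match the aggregate.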
For the purpose of subject matter assessment, one can also draw on auxiliary sources of information containing variables that are similar (in terms of definitions) to the target variables estimated by means of SAE methods for the domains of interest. For example, when estimating the number of unemployed persons across domains by indirect estimation, one can compare the spatial distribution of the estimates with that of the number of registered unemployed persons taken from the administrative register. Such an auxiliary variable acts as a proxy variable, which can be used to assess the quality of small area estimates.
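The abstract does not say which statistic is used to compare the two spatial distributions. As one simple, hedged choice, Spearman's rank correlation measures whether the SAE estimates and the proxy order the domains in the same way. The sketch uses only the standard library; the district values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical SAE estimates of unemployment and registered unemployment for five districts.
sae_estimates = [5.1, 7.3, 4.2, 9.8, 6.0]
registered = [4.8, 6.9, 4.5, 10.2, 5.5]
print(round(spearman_rho(sae_estimates, registered), 3))
```

A value close to 1 indicates that both variables rank the districts similarly; it says nothing about whether their levels agree, so it complements rather than replaces the statistical criteria.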
Unfortunately, not all variables estimated by means of SAE techniques can be linked to relevant proxy variables. Nevertheless, one can attempt to construct an artificial composite variable from other variables that affect the target variable, using methods of taxonomic analysis. Such analysis is, in some sense, similar to the TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) approach. However, whereas TOPSIS combines distances from both the positive and the negative ideal solution, the composite variable considered here is a special function of distances from the development benchmark (the positive ideal object) only. It also attempts to account as fully as possible for connections between the variables, especially connections other than correlation, including those that cannot be quantified statistically.
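To make the contrast with TOPSIS concrete, the sketch below computes two scores on the same standardised indicators: a Hellwig-type measure that depends only on distances to the positive ideal object, and the TOPSIS relative closeness, which also uses the negative ideal. This is an illustration under stated assumptions, not the authors' construction: z-score standardisation, equal weights and Euclidean distance are our simplifications (classic TOPSIS usually applies vector normalisation and weights), and the sketch does not model the non-correlational connections between variables mentioned above. `stimulant[j]` is a hypothetical flag marking indicators for which higher values mean better development.

```python
import math


def standardise(data):
    """Column-wise z-scores (population standard deviation) of an objects-by-indicators table."""
    n, k = len(data), len(data[0])
    cols = []
    for j in range(k):
        col = [row[j] for row in data]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if sd == 0:
            raise ValueError(f"indicator {j} is constant and cannot be standardised")
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(k)] for i in range(n)]


def ideal_objects(z, stimulant):
    """Positive and negative ideal objects: best and worst standardised value of each indicator."""
    best, worst = [], []
    for j in range(len(z[0])):
        col = [row[j] for row in z]
        hi, lo = max(col), min(col)
        best.append(hi if stimulant[j] else lo)
        worst.append(lo if stimulant[j] else hi)
    return best, worst


def distance_to_ideal_measure(data, stimulant):
    """Hellwig-type measure built only from distances to the positive ideal object.

    Values near 1 mean the object is close to the development benchmark.
    """
    z = standardise(data)
    best, _ = ideal_objects(z, stimulant)
    d_plus = [math.dist(row, best) for row in z]
    mean_d = sum(d_plus) / len(d_plus)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in d_plus) / len(d_plus))
    d0 = mean_d + 2 * sd_d  # Hellwig's normalising constant
    return [1 - d / d0 for d in d_plus]


def topsis_closeness(data, stimulant):
    """TOPSIS relative closeness, which uses both the positive and the negative ideal."""
    z = standardise(data)
    best, worst = ideal_objects(z, stimulant)
    scores = []
    for row in z:
        d_plus, d_minus = math.dist(row, best), math.dist(row, worst)
        scores.append(d_minus / (d_plus + d_minus))
    return scores
```

For example, `distance_to_ideal_measure([[1, 3], [2, 2], [3, 1]], [True, False])` treats the first indicator as a stimulant and the second as a destimulant, so the third object coincides with the positive ideal and receives the highest score.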
In the article, the authors present a way of constructing a proxy variable for the poverty rate estimated by means of SAE for districts (NUTS 4) of Poland. The spatial median (a multivariate generalisation of the median and one of the methods used in taxonomic analysis) is used to construct a complex measure that will serve as a proxy variable for the poverty rate. The complex measure will be constructed from generally available indicators describing demography, the labour market, housing management and living conditions. This synthetic proxy variable will then provide a reference distribution of poverty, which will be compared with the small area estimates of poverty.
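The spatial median is the point that minimises the sum of Euclidean distances to all observations. The abstract does not describe how it enters the complex measure, so the sketch below only computes the spatial median itself, using the Weiszfeld iteration (our choice of algorithm), and illustrates why it is more robust than the coordinate-wise mean. When an iterate coincides with a data point, that point is simply skipped; this is a known simplification of Weiszfeld's method rather than the fully corrected variant. The name `spatial_median` and the example points are hypothetical.

```python
import math


def spatial_median(points, tol=1e-9, max_iter=1000):
    """Approximate the spatial (geometric) median with the Weiszfeld iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    m = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, m)
            if d < 1e-12:
                continue  # iterate sits on a data point: skip it (simplified handling)
            w = 1.0 / d  # inverse-distance weight
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        if den == 0.0:
            return m
        new_m = [v / den for v in num]
        if math.dist(new_m, m) < tol:
            return new_m
        m = new_m
    return m


# Four points on a unit square plus one extreme outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
print(spatial_median(pts))  # stays near the unit square
print([sum(p[k] for p in pts) / len(pts) for k in range(2)])  # the mean is pulled towards the outlier
```

Because each observation pulls on the median with unit force regardless of its distance, a single extreme district cannot drag the reference point far, which is a plausible reason to prefer it over the mean in taxonomic constructions.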
Since the proxy variable is constructed so as to account for all possible (and not always directly identifiable) connections between the original indicators, one can assess the degree to which the SAE procedure reflects these connections. Used in addition to standard statistical criteria, this approach will provide another valuable assessment of the quality of indirect estimates of the poverty rate in Poland. Additionally, the proxy variable may itself prove to be a very efficient auxiliary variable in other SAE models, performing better than the individual indicators used to construct it when these are treated as separate covariates.
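As a hedged illustration of using the proxy as an auxiliary variable, the sketch below fits an area-level Fay-Herriot model with the proxy as the single covariate. The model choice is ours (the abstract does not name the SAE model). It assumes known sampling variances `psi`, estimates the area-effect variance with the Prasad-Rao moment estimator, and returns EBLUPs without their MSE estimates. All names and data are hypothetical.

```python
def _wls(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (beta, inverse of X'WX)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    if det <= 0:
        raise ValueError("the proxy must vary across areas")
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    beta = [inv[0][0] * t0 + inv[0][1] * t1, inv[1][0] * t0 + inv[1][1] * t1]
    return beta, inv


def fay_herriot_eblup(direct, psi, proxy):
    """EBLUPs under direct_i = b0 + b1 * proxy_i + u_i + e_i, with Var(e_i) = psi_i known."""
    m, p = len(direct), 2
    # Step 1: OLS fit and leverages h_ii = x_i' (X'X)^-1 x_i, with x_i = (1, proxy_i).
    beta_ols, inv = _wls(proxy, direct, [1.0] * m)
    resid = [y - (beta_ols[0] + beta_ols[1] * x) for x, y in zip(proxy, direct)]
    lev = [inv[0][0] + 2 * inv[0][1] * x + inv[1][1] * x * x for x in proxy]
    # Step 2: Prasad-Rao moment estimator of the area-effect variance, truncated at zero.
    sigma2_u = max(0.0, (sum(r * r for r in resid)
                         - sum(s * (1 - h) for s, h in zip(psi, lev))) / (m - p))
    # Step 3: WLS with weights 1 / (sigma2_u + psi_i), then shrink direct estimates.
    beta, _ = _wls(proxy, direct, [1.0 / (sigma2_u + s) for s in psi])
    gamma = [sigma2_u / (sigma2_u + s) for s in psi]
    synthetic = [beta[0] + beta[1] * x for x in proxy]
    eblup = [g * y + (1 - g) * s for g, y, s in zip(gamma, direct, synthetic)]
    return eblup, sigma2_u, beta


# Hypothetical direct estimates, sampling variances and proxy values for six districts.
direct = [5.2, 7.9, 11.5, 13.6, 17.8, 19.9]
psi = [0.5] * 6
proxy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
estimates, sigma2_u, beta = fay_herriot_eblup(direct, psi, proxy)
print([round(e, 2) for e in estimates], round(sigma2_u, 3), [round(b, 3) for b in beta])
```

Each EBLUP is a weighted average of the direct estimate and the regression prediction from the proxy; a proxy that explains the direct estimates well shrinks the estimated area-effect variance and lets the model lean more on the synthetic part.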