Social media as a data source for official statistics; the Dutch Consumer Confidence Index


New Challenging Problems in SAE from Real Life

Jan van den Brakel (Statistics Netherlands and Maastricht University School of Business and Economics) (Speaker)
Piet Daas (Statistics Netherlands)
Bart Buelens (Statistics Netherlands)

One way to use big data sources in the production of official statistics is to use them as auxiliary information in models for small area estimation. Marchetti et al. (2015) used mobility data to predict poverty in a Fay-Herriot model, which increases the effective sample size with sample information from other domains. Most national statistical institutes conduct surveys repeatedly. A multivariate structural time series modelling approach is therefore an alternative way to improve the precision of direct estimates, using sample information from previous periods and auxiliary series derived from related (big) data sources. While Marchetti et al. (2015) use big data as auxiliary information to borrow strength over space, this paper follows a time series approach that uses sample information from preceding periods together with a related big data series. This model also makes it possible to exploit the higher frequency of big data to produce more precise estimates for the sample survey in real time, at the moment that statistics from a big data source are available but the sample data are not yet available. The concept of cointegration is applied to address the question of to what extent the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey, which is used to estimate consumer confidence at a monthly frequency at the national level. In this case, small samples arise at the national level due to the short reference period and high nonresponse rates. The auxiliary series is a sentiment index derived from social media.
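The idea of borrowing strength from a related series, including in a month where the survey figure is still missing, can be illustrated with a deliberately simplified sketch. It assumes a bivariate local level model with correlated level disturbances, which is not necessarily the model specification used in the paper; the function name, variance values and simulated data are all hypothetical. In this simplified setting, a correlation of 1 between the level disturbances would mean the two series share a common trend, which is the cointegration case.

```python
import numpy as np


def filter_bivariate_local_level(y, H, Q, a0, P0):
    """Kalman filter for y_t = level_t + noise, level_{t+1} = level_t + eta_t.

    y  : (T, 2) array; column 0 = survey series, column 1 = auxiliary series.
         NaN marks an observation that is not (yet) available.
    H  : (2, 2) measurement noise covariance (assumed known here).
    Q  : (2, 2) level disturbance covariance; off-diagonals link the series.
    a0, P0 : prior mean and covariance of the level at the first period.
    """
    T = y.shape[0]
    a, P = a0.astype(float).copy(), P0.astype(float).copy()
    levels = np.zeros((T, 2))
    covs = np.zeros((T, 2, 2))
    for t in range(T):
        if t > 0:
            P = P + Q  # random-walk prediction: mean unchanged, variance grows
        obs = ~np.isnan(y[t])
        if obs.any():
            Z = np.eye(2)[obs]                     # select the observed rows
            v = y[t, obs] - Z @ a                  # one-step prediction error
            F = Z @ P @ Z.T + H[np.ix_(obs, obs)]  # prediction error variance
            K = P @ Z.T @ np.linalg.inv(F)         # Kalman gain
            a = a + K @ v
            P = P - K @ Z @ P
        levels[t], covs[t] = a, P
    return levels, covs


# Simulated illustration (all numbers are made up).
rng = np.random.default_rng(0)
T, rho = 120, 0.9
Q_corr = np.array([[1.0, rho], [rho, 1.0]])   # correlated level disturbances
Q_indep = np.diag(np.diag(Q_corr))            # same variances, no correlation
H = np.diag([4.0, 1.0])                       # survey is noisier than auxiliary

true_levels = np.cumsum(rng.multivariate_normal(np.zeros(2), Q_corr, size=T), axis=0)
y = true_levels + rng.normal(scale=np.sqrt(np.diag(H)), size=(T, 2))
y[-1, 0] = np.nan  # latest survey figure not yet available; auxiliary is

a0, P0 = np.zeros(2), 1e6 * np.eye(2)  # vague prior on the initial levels
levels_corr, covs_corr = filter_bivariate_local_level(y, H, Q_corr, a0, P0)
levels_indep, covs_indep = filter_bivariate_local_level(y, H, Q_indep, a0, P0)

# Variance of the survey level in the last month, with and without the link.
nowcast_var_corr = covs_corr[-1, 0, 0]
nowcast_var_indep = covs_indep[-1, 0, 0]
```

Comparing `nowcast_var_corr` with `nowcast_var_indep` shows how much the auxiliary series reduces the uncertainty of the survey level in the final month under these assumed variances. In practice the variances and the correlation would be estimated from the data rather than fixed in advance.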

Marchetti, S., Giusti, C., Pratesi, M., Salvati, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., Pappalardo, L., and Gabrielli, L. (2015). Small area model-based estimators using Big data sources. Journal of Official Statistics, 31, 263-281.