Design-Based Approach for SAE with Gradient Boosted Models


Poster Session

Poster No. 13

Hongjian Yu (UCLA) (Speaker)
Carl Ganz (UCLA)
Yueyan Wang (UCLA)
Pan Wang (UCLA)
Ninez Ponce (UCLA)

Weighted model estimates, such as those from pseudo maximum likelihood (PML), are both model-consistent and design-consistent. Design-based jackknife replication is sound in theory and flexible in practice for variance estimation. Together, these properties open a path to applying machine learning to small area estimation (SAE), where a model likelihood function may not be available. A major advantage of machine learners is their ability to include a large number of auxiliary variables in various forms with relatively easy implementation. Boosting is a computationally efficient method that builds strong predictions from an ensemble of weak learners. In this study, we compared small area estimates from gradient boosted models (GBM) with those from the PML method and with direct estimates. The GBM estimates performed as well as the PML estimates while adding flexibility to the regression model. GBM can handle high-dimensional data and makes no assumptions about the predictors. Challenges remain, including the choice of base learners, the tuning of hyperparameters such as the number of iterations, and the integration of other machine learning methods.
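The abstract does not describe an implementation, so the following is only a minimal sketch of how the pieces could fit together, under stated assumptions: a unit-level squared-error model in which the survey weights enter the loss (the same idea as PML, which weights each unit's likelihood contribution), regression stumps as the weak base learners, area estimates formed by averaging predictions over a population frame, and a delete-one-group (JK1) jackknife for variance. All data, group assignments, hyperparameters (`n_iter`, `lr`), and function names (`fit_stump`, `fit_gbm`, `small_area_means`, `jackknife_variance`) are hypothetical and are not the authors' code.

```python
import numpy as np

def fit_stump(X, r, w):
    """Weighted least-squares regression stump: one split on one feature."""
    best = (np.inf, 0, 0.0, 0.0, 0.0)  # (sse, feature, threshold, left_value, right_value)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs, ws = X[order, j], r[order], w[order]
        cw, cwr, cwr2 = np.cumsum(ws), np.cumsum(ws * rs), np.cumsum(ws * rs**2)
        lw, lwr = cw[:-1], cwr[:-1]                   # left side: first i+1 sorted units
        rw, rwr = cw[-1] - lw, cwr[-1] - lwr          # right side: the remaining units
        sse = cwr2[-1] - lwr**2 / lw - rwr**2 / rw    # weighted SSE after the split
        sse = np.where(xs[:-1] < xs[1:], sse, np.inf) # split only between distinct values
        i = int(np.argmin(sse))
        if sse[i] < best[0]:
            best = (sse[i], j, 0.5 * (xs[i] + xs[i + 1]), lwr[i] / lw[i], rwr[i] / rw[i])
    return best[1:]

def fit_gbm(X, y, w, n_iter=100, lr=0.1):
    """Survey-weighted gradient boosting with squared-error loss and stumps."""
    f0 = np.average(y, weights=w)          # weighted mean as the starting fit
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_iter):
        r = y - pred                       # negative gradient of squared-error loss
        j, t, lv, rv = fit_stump(X, r, w)
        pred += lr * np.where(X[:, j] <= t, lv, rv)  # shrunken update
        stumps.append((j, t, lv, rv))
    return f0, lr, stumps

def predict(model, X):
    f0, lr, stumps = model
    out = np.full(X.shape[0], f0)
    for j, t, lv, rv in stumps:
        out += lr * np.where(X[:, j] <= t, lv, rv)
    return out

def small_area_means(model, X_pop, area_pop, n_areas):
    """Average unit-level predictions over each area's population units."""
    pred = predict(model, X_pop)
    return np.array([pred[area_pop == a].mean() for a in range(n_areas)])

def jackknife_variance(X, y, w, groups, X_pop, area_pop, n_areas, **gbm_args):
    """Delete-one-group (JK1) jackknife variance of the small-area estimates."""
    G = groups.max() + 1
    reps = []
    for g in range(G):
        keep = groups != g
        w_rep = w[keep] * G / (G - 1)      # JK1 replicate weights for retained units
        m = fit_gbm(X[keep], y[keep], w_rep, **gbm_args)
        reps.append(small_area_means(m, X_pop, area_pop, n_areas))
    reps = np.array(reps)
    return (G - 1) / G * ((reps - reps.mean(axis=0)) ** 2).sum(axis=0)

# Hypothetical simulated survey sample and population frame.
rng = np.random.default_rng(0)
n, N, n_areas = 300, 2000, 5
X = rng.normal(size=(n, 3))
y = X[:, 0] + np.sin(2 * X[:, 1]) + rng.normal(scale=0.5, size=n)
w = rng.uniform(1, 5, size=n)                 # hypothetical survey weights
groups = rng.integers(0, 10, size=n)          # hypothetical jackknife groups (G = 10)
X_pop = rng.normal(size=(N, 3))               # hypothetical population frame
area_pop = rng.integers(0, n_areas, size=N)   # area label of each population unit

model = fit_gbm(X, y, w, n_iter=100, lr=0.1)
est = small_area_means(model, X_pop, area_pop, n_areas)
var = jackknife_variance(X, y, w, groups, X_pop, area_pop, n_areas, n_iter=100, lr=0.1)
print(np.round(est, 3), np.round(np.sqrt(var), 3))
```

Each jackknife replicate refits the entire boosted model, so the variance step costs roughly G times the original fit. The number of iterations and the learning rate are fixed here rather than tuned; choosing them is one of the open questions the abstract raises.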

Keywords: Small area estimation, Generalized additive models, Gradient boosting.