Maximum likelihood estimation of odds ratios with application to prediction of deduplicated audience under marginal constraints


Multiple Data Sources, Data Linkage

William Waldron (Booz Allen Hamilton) (Speaker)
Daniel Bonnery (University of Maryland)
Neung Soo Ha (Nielsen)

One of the key audience estimates in marketing and media research is the number of unique persons in a subpopulation who viewed a television network, program, or episode on either of two platforms: traditional television or digital media. The latter platform includes a personal computer (PC), mobile device, or electronic notebook. The viewing may also be defined over different time periods: a day, week, month, or even a quarter of a year. We consider a case in television and digital audience measurement with two “Big” datasets that contain detailed information about individual-level viewing exposures from television and from digital media by PC. Our interest is in the audience that viewed an entity on both TV and digital, and only a limited subset of respondents is linked across the two platforms. In this context, the independence assumption needed for standard data fusion techniques is not reasonable: negative correlations between viewing on TV and on PC, for all content types, must be accounted for.
For this study, we use a unit-level model of viewing in which the odds of viewing on TV versus PC depend on individual demographic characteristics, program classification, and period length. The odds ratio can be estimated from the limited linked datasets, and the duplicated audience can then be derived from this estimated odds ratio together with the marginal area-level estimates of the TV and PC audiences. Finer fragmentation of demographic subgroups leads to small sample sizes in many cells (some have 0 observations), and the usual estimators of the odds ratio become highly unreliable. We therefore model the odds ratio in a cell as a function of the cell characteristics, allowing us to borrow strength across margins. Based on simulation results, the methodology outperforms typical odds ratio estimators. We applied our methods to the Nielsen TV and PC data.
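
The step from marginal estimates and an odds ratio to a duplicated audience can be illustrated with the standard identity for a 2×2 table. This is a minimal sketch, not the authors' implementation: the function names `duplicated_share` and `deduplicated_share` and the inputs are hypothetical, and the odds ratio is assumed to be already estimated for the cell. Writing p_tv and p_pc for the marginal viewing shares and psi for the odds ratio, the share p11 viewing on both platforms solves (psi - 1) p11^2 - [1 + (psi - 1)(p_tv + p_pc)] p11 + psi p_tv p_pc = 0, and the smaller root is the one that yields a valid table.

```python
import math


def duplicated_share(p_tv, p_pc, odds_ratio):
    """Share viewing on both platforms, given two marginal shares and an odds ratio.

    Hypothetical helper for illustration: solves the 2x2-table identity
    odds_ratio = p11 * p00 / (p10 * p01) for p11 with fixed margins.
    """
    if not (0.0 < p_tv < 1.0 and 0.0 < p_pc < 1.0):
        raise ValueError("marginal shares must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds ratio must be positive")
    # An odds ratio of 1 is independence: the joint share is the product.
    if math.isclose(odds_ratio, 1.0):
        return p_tv * p_pc
    k = odds_ratio - 1.0
    s = 1.0 + k * (p_tv + p_pc)
    disc = s * s - 4.0 * odds_ratio * k * p_tv * p_pc
    # The root with the minus sign is the one that keeps all four cells in [0, 1].
    return (s - math.sqrt(disc)) / (2.0 * k)


def deduplicated_share(p_tv, p_pc, odds_ratio):
    """Share viewing on at least one platform (the unique audience)."""
    return p_tv + p_pc - duplicated_share(p_tv, p_pc, odds_ratio)
```

For example, with assumed marginal shares of 0.3 (TV) and 0.2 (PC) and an odds ratio of 3, the sketch gives a duplicated share of 0.1 and a deduplicated share of 0.4. An odds ratio below 1, corresponding to the negative association described above, gives a duplicated share smaller than the 0.06 implied by independence.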
