We seek to democratise public-opinion research by providing practitioners with a general methodology for making representative inferences from cheap, high-frequency, highly unrepresentative samples. We focus specifically on samples that are readily available in moderate sizes. To this end, we provide two major contributions: 1) we introduce a general sample-selection process, which we name online selection, and show that it is a special case of selection on the dependent variable. We improve multilevel regression and post-stratification (MrP) for severely biased samples by introducing a bias-correction term, in the style of King and Zeng, into the logistic regression framework. We show that this bias-corrected model outperforms traditional MrP under online selection and achieves performance similar to random sampling across a vast array of scenarios; 2) we present a protocol for using Large Language Models (LLMs) to extract structured, survey-like data from social media. We provide a prompt style that can be easily adapted to a variety of survey designs. We show that LLMs agree with human raters on the demographic, socio-economic, and political characteristics of these online users. The end-to-end implementation takes unrepresentative, unstructured social-media data as input and produces timely, high-quality area-level estimates as output. This is Artificially Intelligent Opinion Polling. We show that our AI polling estimates of the 2020 election are highly accurate, on par with estimates produced by state-level polling aggregators such as FiveThirtyEight, or by MrP models fit to extremely expensive, high-quality samples.
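As a minimal illustration of the kind of correction referenced above, King and Zeng's prior correction adjusts only the maximum-likelihood intercept of a logistic regression when the sample prevalence of the outcome, $\bar{y}$, differs from its population prevalence, $\tau$ (assumed known, for example from a benchmark source); whether this paper applies exactly this form under online selection is an assumption here, the standard correction being
\[
\hat{\beta}_0^{\,\text{corrected}} \;=\; \hat{\beta}_0 \;-\; \ln\!\left[\left(\frac{1-\tau}{\tau}\right)\!\left(\frac{\bar{y}}{1-\bar{y}}\right)\right].
\]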