This paper proposes a new method to predict individual political ideology from digital footprints on one of the world's largest online discussion forums. We compiled a unique data set from the online discussion forum Reddit that contains information on the political ideology of around 91,000 users, as well as records of their comment frequency and the text of their comments in over 190,000 different subforums of interest. Applying a set of statistical learning approaches, we show that information about activity in non-political discussion forums alone can predict a user's political ideology with high accuracy. Depending on the model, we are able to predict the economic dimension of ideology with an accuracy of up to 90.63% and the social dimension with an accuracy of up to 82.02%. In comparison, using the textual features from actual comments does not improve predictive accuracy. Our paper highlights the importance of revealed digital behaviour as a complement to stated preferences from digital communication when analysing human preferences and behaviour using online data.