To study emotions at the macroscopic level, affective scientists have made extensive use of sentiment analysis on social media text. However, this approach can suffer from methodological issues related to sampling bias and measurement error. To date, it has not been validated whether social media sentiment can measure the day-to-day temporal dynamics of emotions aggregated at the macro level of a whole online community. We ran a large-scale survey at an online newspaper to gather daily self-reports of affective states from its users and compared them with the aggregated results of sentiment analysis of user discussions on the same platform. Additionally, we preregistered a replication of our study using Twitter text as a macroscope of emotions for the same community. For both platforms, we find strong correlations between text analysis results and levels of self-reported emotions, as well as between inter-day changes of both measurements. We further show that a combination of supervised and unsupervised text analysis methods is the most accurate approach to measuring emotion aggregates. We illustrate the application of such social media macroscopes by studying the association between the number of new COVID-19 cases and emotions, showing that the associations are of comparable strength whether measured with survey data or with social media data. Our findings indicate that the macro-level dynamics of the affective states of an online platform's users can be tracked with social media text, complementing surveys when self-reported data are unavailable or difficult to gather.
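The two comparisons named in the abstract, correlating daily levels and correlating inter-day changes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the helper names `pearson` and `daily_changes` are hypothetical, the use of the Pearson coefficient is an assumption (the abstract does not name the correlation measure), and the numbers are invented toy values rather than data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def daily_changes(series):
    """First differences: the change from each day to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical toy data, one value per day (NOT taken from the study):
# share of survey respondents reporting a positive affective state ...
survey_share = [0.42, 0.45, 0.40, 0.38, 0.47, 0.50, 0.44]
# ... and the mean sentiment score of that day's posts on the platform.
text_score = [0.10, 0.13, 0.09, 0.06, 0.14, 0.18, 0.11]

# Correlation between daily levels of the two measurements.
level_r = pearson(survey_share, text_score)

# Correlation between inter-day changes of the two measurements.
change_r = pearson(daily_changes(survey_share), daily_changes(text_score))

print(f"levels:  r = {level_r:.3f}")
print(f"changes: r = {change_r:.3f}")
```

Correlating the differenced series in addition to the levels is a stricter check: two series that merely share a slow trend can correlate in levels, whereas agreement in day-over-day changes indicates that the text signal follows short-term movements in the self-reports.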