Addressing machine learning concept drift reveals declining vaccine sentiment during the COVID-19 pandemic

Social media analysis has become a common approach to assessing public opinion on various topics, including health, in near real time. The growing volume of social media posts has led to increased use of modern machine learning methods for natural language processing. While the rapid dynamics of social media can capture underlying trends quickly, they also pose a technical problem: algorithms trained on data annotated in the past may underperform when applied to contemporary data. This phenomenon, known as concept drift, can be particularly problematic when rapid shifts occur either in the topic of interest itself or in the way the topic is discussed. Here, we explore the effect of machine learning concept drift by focussing on vaccine sentiments expressed on Twitter, a topic of central importance, especially during the COVID-19 pandemic. We show that while vaccine sentiment declined considerably during the COVID-19 pandemic in 2020, algorithms trained on pre-pandemic data would have largely missed this decline due to concept drift. Our results suggest that social media analysis systems must address concept drift continuously to avoid the risk of systematically misclassifying data, a risk that is particularly likely during a crisis, when the underlying data can change suddenly and rapidly.
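One minimal way to handle concept drift continuously is to keep annotating a small sample of new posts and compare the deployed classifier's recent accuracy against the accuracy it reached at training time. The sketch below assumes such a stream of freshly annotated tweets is available; the class `DriftMonitor` and its `baseline_accuracy`, `window_size` and `tolerance` parameters are hypothetical illustrations, not the method or settings used in the study.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical sketch: flag possible concept drift when a classifier's
    accuracy on recently annotated samples falls well below its accuracy
    measured at training time. Parameter values are illustrative only."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.10):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # Bounded deque: keeps only the most recent `window_size` outcomes
        # (1 = prediction matched the human label, 0 = it did not).
        self.outcomes = deque(maxlen=window_size)

    def update(self, predicted, annotated):
        """Record whether the model's label agrees with a fresh human annotation."""
        self.outcomes.append(1 if predicted == annotated else 0)

    def recent_accuracy(self):
        """Accuracy over the current window, or NaN if nothing was recorded yet."""
        if not self.outcomes:
            return float("nan")
        return sum(self.outcomes) / len(self.outcomes)

    def drift_detected(self):
        """True once the window is full and accuracy dropped by more than `tolerance`."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.recent_accuracy() < self.baseline_accuracy - self.tolerance


# Usage with made-up labels: the model keeps predicting "positive",
# but newly annotated tweets are mostly "negative".
monitor = DriftMonitor(baseline_accuracy=0.80, window_size=4, tolerance=0.10)
for pred, gold in [("positive", "negative"), ("positive", "positive"),
                   ("neutral", "negative"), ("positive", "negative")]:
    monitor.update(pred, gold)
print(monitor.recent_accuracy())  # 0.25
print(monitor.drift_detected())   # True
```

When the monitor fires, the natural response is to annotate more recent data and retrain. Using a bounded window rather than cumulative accuracy keeps the check sensitive to sudden shifts such as those seen during a crisis.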
