Twitter sentiment analysis, which often focuses on predicting the polarity of tweets, has attracted increasing attention in recent years, in particular with the rise of deep learning (DL). In this paper, we propose a new task: predicting the predominant sentiment among (first-order) replies to a given tweet. To this end, we created RETWEET, a large dataset of tweets and replies manually annotated with sentiment labels. As a strong baseline, we propose a two-stage DL-based method. First, we create automatically labeled training data by applying a standard sentiment classifier to tweet replies and aggregating its predictions for each original tweet; our rationale is that individual errors made by the classifier are likely to cancel out in the aggregation step. Second, we use the automatically labeled data for supervised training of a neural network that predicts reply sentiment from the original tweets. The resulting classifier is evaluated on the new RETWEET dataset and shows promising results, especially considering that it was trained without any manually labeled data. Both the dataset and the baseline implementation are publicly available.
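The first stage of the baseline (classify each reply, then aggregate per original tweet) can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, the label set, or the classifier, so everything here is an assumption for illustration: majority voting as the aggregation, the three labels `positive`, `negative`, and `neutral`, and the hypothetical helpers `toy_classifier`, `aggregate_reply_sentiment`, and `build_weak_labels`. The keyword-based `toy_classifier` is only a stand-in for the standard sentiment classifier mentioned above.

```python
from collections import Counter
from typing import Callable, Dict, List


def toy_classifier(text: str) -> str:
    """Hypothetical placeholder for a standard sentiment classifier."""
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great", "thanks")):
        return "positive"
    if any(word in lowered for word in ("hate", "awful", "terrible")):
        return "negative"
    return "neutral"


def aggregate_reply_sentiment(reply_labels: List[str]) -> str:
    """Majority vote over reply labels (an assumed aggregation rule).

    Ties go to the label that was seen first, which is how
    Counter.most_common orders equal counts in Python 3.7+.
    """
    counts = Counter(reply_labels)
    return counts.most_common(1)[0][0]


def build_weak_labels(
    replies_by_tweet: Dict[str, List[str]],
    classify: Callable[[str], str],
) -> Dict[str, str]:
    """Map each original tweet to the predominant predicted reply sentiment.

    Tweets without replies are skipped because there is nothing to aggregate.
    """
    return {
        tweet: aggregate_reply_sentiment([classify(reply) for reply in replies])
        for tweet, replies in replies_by_tweet.items()
        if replies
    }


# Invented example data, not taken from the RETWEET dataset.
example_replies = {
    "New phone launch today!": ["Love it", "great news", "this is awful"],
    "Service outage update": ["terrible", "I hate this", "ok"],
}
weak_labels = build_weak_labels(example_replies, toy_classifier)
```

The resulting tweet-to-label mapping plays the role of the automatically labeled training data for the second stage, in which a model sees only the original tweet text. A single misclassified reply (such as an ironic one) changes the aggregated label only if it flips the majority, which is the intuition behind the error-cancellation rationale.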