Identifying emotions from text is crucial for a variety of real-world tasks. We consider the two largest currently available corpora for emotion classification: GoEmotions, with 58k messages labelled by readers, and Vent, with 33M writer-labelled messages. We design a benchmark and evaluate several feature spaces and learning algorithms, including two simple yet novel models on top of BERT that outperform previous strong baselines on GoEmotions. Through an experiment with human participants, we also analyze the differences between how writers express emotions and how readers perceive them. Our results suggest that emotions expressed by writers are harder to identify than emotions that readers perceive. We share a public web interface for researchers to explore our models.