Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment in text, but achieving better performance on such subjective tasks requires large sets of hand-annotated training data. Previous corpora annotated for moral sentiment have proven valuable and have generated new insights both within NLP and across the social sciences, but they have been limited to Twitter. To improve our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments curated from 12 distinct subreddits and hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We provide baseline moral-sentiment classification results for this new corpus using a range of methodologies, e.g., cross-domain classification and knowledge transfer.