Content moderation is the process of flagging content based on pre-defined platform rules. There is a growing need for AI moderators, both to safeguard users and to protect the mental health of human moderators from traumatic content. While prior work has focused on identifying hateful or offensive language, it is not adequate for the challenges of content moderation because 1) moderation decisions are based on rule violations, which subsumes the detection of offensive speech, and 2) such rules often differ across communities, which calls for an adaptive solution. We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments spanning 56 subreddits in English, German, Spanish, and French. We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems such as cross-lingual transfer, learning under label noise (human biases), transfer of moderation models, and predicting the violated rule. Our dataset and analysis can help better prepare for the challenges and opportunities of automated moderation.