In light of the unprecedented growth in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators, either by automatically classifying comments or by allowing moderators to prioritize which comments to review first. However, the notion of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL -- a linguistically and culturally aware Croatian abusive language dataset covering phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit and degrade further when language skill and contextual knowledge are required to interpret them.