Content moderation is often performed by a collaboration between humans and machine learning models. However, it is not well understood how to design the collaborative process so as to maximize the combined moderator-model system performance. This work presents a rigorous study of this problem, focusing on an approach that incorporates model uncertainty into the collaborative process. First, we introduce principled metrics to describe the performance of the collaborative system under capacity constraints on the human moderator, quantifying how efficiently the combined system utilizes human decisions. Using these metrics, we conduct a large benchmark study evaluating the performance of state-of-the-art uncertainty models under different collaborative review strategies. We find that an uncertainty-based strategy consistently outperforms the widely used strategy based on toxicity scores, and moreover that the choice of review strategy drastically changes the overall system performance. Our results demonstrate the importance of rigorous metrics for understanding and developing effective moderator-model systems for content moderation, as well as the utility of uncertainty estimation in this domain.
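To make the routing idea concrete, here is a minimal sketch of a capacity-constrained review strategy, not the paper's implementation: the function names (`select_for_review`, `binary_entropy`, `system_accuracy`), the entropy-based uncertainty score, the random stand-in data, and the assumption that human decisions are always correct are all illustrative.

```python
import numpy as np

def select_for_review(priorities, capacity):
    """Return a boolean mask of items routed to the human moderator.

    The `capacity` highest-priority items go to the human; the rest
    keep the model's decision.
    """
    mask = np.zeros(len(priorities), dtype=bool)
    mask[np.argsort(-priorities)[: min(capacity, len(priorities))]] = True
    return mask

def binary_entropy(p, eps=1e-12):
    """Predictive entropy of a toxicity probability: one simple uncertainty score."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def system_accuracy(toxicity_probs, labels, review_mask, threshold=0.5):
    """Accuracy of the combined moderator-model system, under the
    simplifying assumption that human reviews are always correct."""
    model_correct = (toxicity_probs >= threshold).astype(int) == labels
    return np.where(review_mask, True, model_correct).mean()

# Compare the two review strategies under the same human capacity.
rng = np.random.default_rng(0)
probs = rng.uniform(size=1000)          # stand-in model toxicity scores
labels = rng.integers(0, 2, size=1000)  # stand-in ground-truth labels
capacity = 100                          # human can review at most 100 items

uncertainty_mask = select_for_review(binary_entropy(probs), capacity)
toxicity_mask = select_for_review(probs, capacity)  # widely used baseline
print("uncertainty strategy:", system_accuracy(probs, labels, uncertainty_mask))
print("toxicity-score strategy:", system_accuracy(probs, labels, toxicity_mask))
```

Under this framing, the two strategies differ only in the priority score used for routing: the baseline spends the human budget on the highest-scoring items, while the uncertainty-based strategy spends it where the model is least confident.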