Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors such as the context, geography, socio-political climate, and background of the producers and consumers of the posts play a crucial role in determining whether the content can be flagged as toxic. Adopting automated toxicity detection models in production can thus lead to a sidelining of the various groups they aim to help in the first place. This has piqued researchers' interest in examining unintended biases and their mitigation. Due to the nascent and multi-faceted nature of the work, the existing literature is chaotic in its terminologies, techniques, and findings. In this paper, we put together a systematic study of the limitations and challenges of existing methods for mitigating bias in toxicity detection. We look closely at proposed methods for evaluating and mitigating bias in toxic speech detection. To examine the limitations of existing methods, we also conduct a case study to introduce the concept of bias shift due to knowledge-based bias mitigation. The survey concludes with an overview of the critical challenges, research gaps, and future directions. While reducing toxicity on online platforms continues to be an active area of research, a systematic study of various biases and their mitigation strategies will help the research community produce robust and fair models.