Collecting annotations from human raters often involves a trade-off between the quantity of labels one wishes to gather and the quality of those labels. As a result, it is often only possible to gather a small number of high-quality labels. In this paper, we study how different training strategies can leverage a small dataset of human-annotated labels together with a large but noisy dataset of synthetically generated labels (which exhibit bias against identity groups) to predict the toxicity of online comments. We evaluate the accuracy and fairness properties of these approaches, and the trade-offs between the two. We find that initial training on all of the data followed by fine-tuning on the clean data produces models with the highest AUC, but that no single strategy performs best across all fairness metrics.
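The pretrain-then-fine-tune strategy highlighted above can be sketched in a few lines. Below is a minimal, illustrative Python example using scikit-learn's SGDClassifier with partial_fit as a stand-in for the paper's model; the toy data, label sets, and hyperparameters are assumptions for demonstration, not the authors' actual pipeline.

```python
# Sketch: pretrain on the full (noisy + clean) corpus, then fine-tune on
# the small clean set. All data and settings here are illustrative.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical data: a large synthetically labeled set and a small
# human-annotated set (labels: 1 = toxic, 0 = non-toxic).
noisy_texts = ["you are awful", "have a nice day"] * 1000
noisy_labels = [1, 0] * 1000          # large, noisy synthetic labels
clean_texts = ["what a great idea", "this is toxic nonsense"]
clean_labels = [0, 1]                 # small, high-quality human labels

vectorizer = HashingVectorizer(n_features=2**18)
model = SGDClassifier(loss="log_loss", random_state=0)

# Phase 1: initial training on all of the data (noisy + clean).
X_all = vectorizer.transform(noisy_texts + clean_texts)
model.partial_fit(X_all, noisy_labels + clean_labels, classes=[0, 1])

# Phase 2: fine-tune on the clean human-labeled data only.
X_clean = vectorizer.transform(clean_texts)
for _ in range(5):
    model.partial_fit(X_clean, clean_labels)

# Predicted toxicity probability for a new comment; in the paper's setting
# one would evaluate AUC and per-identity-group fairness metrics here.
print(model.predict_proba(vectorizer.transform(["have a nice day"]))[:, 1])
```

This mirrors the strategy in spirit only: the fine-tuning phase simply continues gradient updates on the clean subset, which is one common way to realize "train on everything, then fine-tune on clean data".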