We present a human-and-model-in-the-loop process for dynamically generating datasets and training better-performing, more robust hate detection models. We provide a new dataset of ~40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. It includes ~15,000 challenging perturbations, and each hateful entry has fine-grained labels for the type and target of hate. Hateful entries make up 54% of the dataset, which is substantially higher than in comparable datasets. We show that model performance is substantially improved by this approach: models trained on later rounds of data collection perform better on test sets and are harder for annotators to trick. They also perform better on HateCheck, a suite of functional tests for online hate detection. We provide the code, dataset and annotation guidelines for other researchers to use. Accepted at ACL 2021.
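To make the round-based, human-and-model-in-the-loop process concrete, here is a minimal, runnable sketch of one collection round. Everything in it is a hypothetical illustration under our own assumptions, not the authors' released code or schema: the `Entry` fields, the `toy_model` keyword classifier, and the example `hate_type`/`target` values are placeholders, and the real process also retrains the model between rounds.

```python
# Hypothetical sketch of one round of human-and-model-in-the-loop data
# collection: annotators write adversarial entries, and the ones that
# fool the current model are kept for the next round of training.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    text: str
    label: str                       # "hate" or "nothate"
    round_id: int                    # which of the four rounds produced it
    is_perturbation: bool = False    # one of the challenging perturbations
    hate_type: Optional[str] = None  # fine-grained type (hateful entries only)
    target: Optional[str] = None     # fine-grained target (hateful entries only)

def toy_model(text: str) -> str:
    """Stand-in classifier: naively flags a fixed keyword as hate."""
    return "hate" if "<slur>" in text else "nothate"

def collect_round(round_id: int, attempts: list[Entry]) -> list[Entry]:
    """Keep the entries that fool the current model. In the real process,
    annotators also write challenging perturbations of each entry."""
    return [e for e in attempts if e.round_id == round_id
            and toy_model(e.text) != e.label]

# Placeholder texts only; no real content is reproduced here.
attempts = [
    Entry("<implicitly hateful text, no keyword>", "hate", round_id=1,
          hate_type="derogation", target="women"),
    Entry("<benign text quoting <slur> in a reclaimed sense>", "nothate",
          round_id=1, is_perturbation=True),
]
fooling = collect_round(1, attempts)
print(f"{len(fooling)}/{len(attempts)} entries fooled the model")
```

Both toy entries fool the keyword classifier (implicit hate slips past it; a benign quotation trips it), which is exactly the kind of example the dynamic rounds are designed to surface and add to the training data.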