We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid shot selection for training in low-resource settings via HATELEXICON. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on shots sampled randomly. Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.
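To make the idea of lexicon-guided shot selection concrete, the sketch below shows one way it could be implemented. This is a minimal illustration under our own assumptions: the lexicon entries are placeholders, and the `HATE_LEXICON`, `contains_lexicon_term`, and `select_shots` names, the data fields, and the "prefer lexicon hits, then fill with random examples" heuristic are illustrative, not the paper's released code.

```python
import random

# Sketch only: lexicon entries are placeholders, and this selection heuristic
# is an assumption about how lexicon-guided shot selection could work.
HATE_LEXICON = {
    "de": {"<slur-1>", "<target-group-1>"},   # placeholder German entries
    "hi": {"<slur-2>", "<target-group-2>"},   # placeholder Hindi entries
}


def contains_lexicon_term(text: str, lang: str) -> bool:
    """Return True if any lexicon entry for this language appears in the text."""
    tokens = set(text.lower().split())
    return any(term in tokens for term in HATE_LEXICON[lang])


def select_shots(pool: list[dict], lang: str, k: int, seed: int = 0) -> list[dict]:
    """Pick k training shots, preferring examples that contain a lexicon term."""
    rng = random.Random(seed)
    hits = [ex for ex in pool if contains_lexicon_term(ex["text"], lang)]
    rest = [ex for ex in pool if ex not in hits]
    rng.shuffle(hits)
    rng.shuffle(rest)
    return (hits + rest)[:k]


# Example usage with a toy HASOC-style pool of labelled posts.
pool = [
    {"text": "...", "label": "HOF"},
    {"text": "...", "label": "NOT"},
]
shots = select_shots(pool, lang="de", k=2)
```

The design choice this reflects is the one stated in the abstract: when only a few training examples can be used, examples that mention slurs or target groups carry more of the sociocultural signal the classifier needs than randomly sampled ones.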