We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for Brazil, Germany, India, and Kenya, intended to aid model training and interpretability. We demonstrate how the lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, since the choice of shots is of paramount importance to model performance in few-shot learning, we propose a HATELEXICON-based method for selecting shots when training in low-resource settings. We simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on randomly sampled shots. Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.
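To make the shot-selection idea concrete, below is a minimal sketch of how a lexicon could guide the choice of few-shot training examples. The abstract does not specify the exact selection criterion, so the function name `select_shots`, the token-overlap heuristic, and the toy data are all illustrative assumptions rather than the paper's actual procedure.

```python
import random

def select_shots(examples, lexicon, k, seed=0):
    """Pick k training shots, preferring examples that mention lexicon entries.

    `examples` is a list of (text, label) pairs and `lexicon` a set of
    lower-cased slur/target terms; both are placeholders for whatever
    data structures the real pipeline uses (hypothetical interface).
    """
    rng = random.Random(seed)

    def lexicon_hits(text):
        # Count how many tokens of the example appear in the lexicon.
        tokens = text.lower().split()
        return sum(token in lexicon for token in tokens)

    # Rank the candidate pool by lexicon coverage; break ties randomly so
    # different seeds yield different shot sets among equally scored examples.
    ranked = sorted(
        examples,
        key=lambda ex: (lexicon_hits(ex[0]), rng.random()),
        reverse=True,
    )
    return ranked[:k]

# Toy usage with made-up placeholder data: the two examples containing
# lexicon terms are selected ahead of the neutral ones.
pool = [
    ("<target-group> are ruining this country", 1),
    ("nice weather today", 0),
    ("<slur> should leave", 1),
    ("i watched a film yesterday", 0),
]
shots = select_shots(pool, lexicon={"<slur>", "<target-group>"}, k=2)
print(shots)
```

A baseline for comparison, as in the abstract's random-sampling condition, would simply be `random.sample(pool, k)` with the same pool and budget.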