We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid shot selection for training in low-resource settings via HATELEXICON. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on shots sampled randomly. Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.
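To make the shot-selection idea concrete, below is a minimal sketch of how a lexicon could be used to rank candidate training examples for a few-shot setup. It is illustrative only: the function names, the tiny placeholder lexicon entries, and the simple token-overlap score are assumptions for this sketch, not the paper's actual selection criterion or the real HATELEXICON contents.

```python
import random

# Hypothetical miniature lexicon; the real HATELEXICON covers slurs and
# target-group terms for Brazil, Germany, India and Kenya.
HATE_LEXICON = {
    "de": {"beispielslur", "zielgruppe"},      # placeholder entries, not real lexicon items
    "hi": {"udaharan_gali", "lakshit_samuh"},  # placeholder entries, not real lexicon items
}

def lexicon_score(text: str, lexicon: set[str]) -> int:
    """Count how many whitespace tokens of the lower-cased text appear in the lexicon."""
    return sum(token in lexicon for token in text.lower().split())

def select_shots(examples: list[dict], lexicon: set[str], k: int, seed: int = 0) -> list[dict]:
    """Pick k training shots, preferring examples that contain lexicon terms.

    Examples are shuffled first so that ties in lexicon coverage are
    broken randomly but reproducibly.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    # Sort by descending lexicon coverage; the shuffle above breaks ties.
    ranked = sorted(shuffled, key=lambda ex: lexicon_score(ex["text"], lexicon), reverse=True)
    return ranked[:k]

# Usage: compare lexicon-guided shots with a random-sampling baseline.
train_pool = [
    {"text": "ein satz mit zielgruppe darin", "label": 1},
    {"text": "ein harmloser satz", "label": 0},
    {"text": "noch ein beispielslur satz", "label": 1},
    {"text": "wetterbericht fuer morgen", "label": 0},
]
guided = select_shots(train_pool, HATE_LEXICON["de"], k=2)
baseline = random.Random(0).sample(train_pool, k=2)
print("lexicon-guided shots:", [ex["text"] for ex in guided])
print("random shots:        ", [ex["text"] for ex in baseline])
```

The design choice captured here is the one the abstract contrasts: shots surfaced because they contain socioculturally informative lexicon terms versus shots drawn uniformly at random from the same training pool.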