Abusive language is a pressing problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographics, etc. However, models trained on these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models for better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit to artefacts of those samples. A compelling solution is to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RAFT (Rationale Adaptor for Few-shoT classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training rationale classifiers alone. We then introduce two rationale-integrated BERT-based architectures (the RAFT models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RAFT-based models outperform baseline models by about 7% in macro F1 score and perform competitively with models finetuned on other source domains. Furthermore, RAFT-based models outperform LIME/SHAP-based approaches in terms of plausibility and perform comparably in terms of faithfulness.