Patronizing and condescending language (PCL) is widespread, but its use by the media towards vulnerable communities is rarely the focus of attention. Accurately detecting this form of PCL is difficult because labeled data is limited and the language itself is often subtle. In this paper, we describe the system we submitted to SemEval 2022 Task 4: Patronizing and Condescending Language Detection. Our approach combines an ensemble of pre-trained language models, data augmentation, and optimization of the detection threshold. On the evaluation dataset released by the competition hosts, our system detects PCL with an F1 score of 55.47% on the binary classification task and a macro F1 score of 36.25% on the fine-grained, multi-label detection task.
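The abstract names ensembling and detection-threshold optimization without giving details, so the following is only a minimal sketch of one common way to combine them for the binary task. It assumes each model outputs a positive-class probability per example, that the ensemble is a simple average, and that the threshold is chosen by grid search on held-out data to maximize positive-class F1. The function names (`ensemble_probs`, `best_threshold`) and the candidate grid are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class (label 1); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ensemble_probs(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-example positive-class probabilities across models.

    Assumption: plain unweighted averaging; the paper may combine models differently.
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def best_threshold(
    y_true: Sequence[int],
    probs: Sequence[float],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Grid-search the decision threshold that maximizes positive-class F1.

    Returns (threshold, f1). Ties keep the lowest threshold tried first.
    """
    if candidates is None:
        # Hypothetical grid: 0.05, 0.06, ..., 0.95
        candidates = [i / 100 for i in range(5, 96)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy held-out set with two hypothetical models (illustrative values only).
dev_labels = [1, 0, 1, 0, 0]
model_a = [0.40, 0.10, 0.80, 0.30, 0.20]
model_b = [0.30, 0.20, 0.60, 0.35, 0.10]

avg_probs = ensemble_probs([model_a, model_b])
threshold, dev_f1 = best_threshold(dev_labels, avg_probs)
```

On imbalanced data, where the positive class is rare, the F1-optimal threshold often falls below the default 0.5, which is why tuning it on held-out data can matter. The threshold should be selected on a development split and then applied unchanged to the test set.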