Most hate speech detection research focuses on a single language, generally English, which limits its generalisability to other languages. In this paper we investigate cross-lingual hate speech detection, tackling the problem by adapting hate speech resources from one language to another. We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech (CCNL-Ex). Our model achieves state-of-the-art performance on the AMI@Evalita2018 and AMI@Ibereval2018 benchmark datasets, which cover three languages (English, Spanish and Italian), outperforming state-of-the-art baselines on all six source-target language pairs.
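The abstract does not describe the capsule layer itself. As a rough illustration only, the sketch below shows the squash non-linearity and routing-by-agreement used in the original capsule network formulation by Sabour et al. (2017). Whether CCNL-Ex uses this exact routing procedure, the iteration count, or these vector sizes is an assumption. The cross-lingual and lexical-semantics components are not modelled here, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction of s but maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in s]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement (hypothetical helper, not the paper's code).

    u_hat[i][j] is the prediction vector that lower capsule i makes for upper
    capsule j. Returns the squashed output vector v_j of each upper capsule.
    Assumption: 3 iterations, a common default, not a value from this paper.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule splits its output over upper capsules.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions, then squash, gives each upper capsule's output.
        v = []
        for j in range(n_upper):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees (dot product) with the output.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

For example, if three lower capsules all predict roughly the same vector for the first upper capsule and small, inconsistent vectors for the second, routing concentrates their coupling on the first one, whose output vector is then longer. In a classifier, the output lengths can be read as class scores (for instance, hateful versus not hateful).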