Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks that depend heavily on language-specific features or that span a cultural gap between languages, such as hate speech detection. In this paper, we highlight this limitation for hate speech detection across several domains and languages under strict experimental settings. We then propose training on multilingual auxiliary tasks (sentiment analysis, named entity recognition, and tasks relying on syntactic information) to improve the zero-shot transfer of hate speech detection models across languages. We show how hate speech detection models benefit from the cross-lingual knowledge proxy provided by auxiliary-task fine-tuning, and we highlight the positive impact of these tasks on bridging the linguistic and cultural gap in hate speech between languages.