We propose a multilingual adversarial training model for determining whether a sentence contains an idiomatic expression. Because a key challenge in this task is the limited size of the annotated data, our model relies on two ingredients. The first is pre-trained contextual representations from state-of-the-art multilingual transformer-based language models (multilingual BERT and XLM-RoBERTa). The second is adversarial training, a method that further improves model generalization and robustness. Without relying on any human-crafted features, knowledge bases, or datasets other than the target datasets, our model achieved competitive results, ranking 6th in the zero-shot setting and 15th in the one-shot setting of SubTask A.
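The abstract does not specify which adversarial training variant the model uses. As an illustration only, the sketch below assumes the fast gradient method (FGM), a common choice for text. FGM perturbs the input embeddings along the L2-normalized gradient of the loss and adds the loss on the perturbed input to the clean loss. To keep the example self-contained, a toy mean-pooled logistic classifier stands in for a fine-tuned multilingual BERT or XLM-RoBERTa encoder. The names `bce_grads`, `fgm_step`, `epsilon`, and `lr` are hypothetical and do not come from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(emb, w, b):
    """Toy classifier (assumption): mean-pool token embeddings, then a linear layer."""
    pooled = emb.mean(axis=0)  # emb has shape (seq_len, dim)
    return sigmoid(pooled @ w + b), pooled


def bce_grads(emb, y, w, b):
    """Binary cross-entropy loss and its gradients w.r.t. w, b, and the embeddings."""
    p, pooled = forward(emb, w, b)
    tiny = 1e-12  # guards against log(0)
    loss = -(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    dz = p - y  # derivative of the loss w.r.t. the logit
    grad_w = dz * pooled
    grad_b = dz
    # Each token contributes 1/seq_len of itself to the pooled vector.
    seq_len = emb.shape[0]
    grad_emb = np.tile(dz * w / seq_len, (seq_len, 1))
    return loss, grad_w, grad_b, grad_emb


def fgm_step(emb, y, w, b, epsilon=1.0, lr=0.1):
    """One update: clean loss plus FGM adversarial loss on perturbed embeddings."""
    loss, gw, gb, gemb = bce_grads(emb, y, w, b)
    # FGM perturbation: epsilon times the L2-normalized embedding gradient.
    norm = np.linalg.norm(gemb)
    r_adv = epsilon * gemb / norm if norm > 0 else np.zeros_like(gemb)
    adv_loss, agw, agb, _ = bce_grads(emb + r_adv, y, w, b)
    # Accumulate gradients from the clean and adversarial passes, then update.
    w = w - lr * (gw + agw)
    b = b - lr * (gb + agb)
    return w, b, loss, adv_loss
```

Because the perturbation follows the loss gradient, the adversarial loss is never lower than the clean loss. Training on both terms pushes the classifier to be stable in a small neighborhood around each input embedding. In a real transformer setup, the same perturbation would typically be applied to the word-embedding layer, with gradients obtained by automatic differentiation instead of by hand.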