Multilingual models have demonstrated impressive cross-lingual transfer performance. However, test sets like XNLI are monolingual at the example level. In multilingual communities, it is common for polyglots to code-mix when conversing with each other. Inspired by this phenomenon, we present two strong black-box adversarial attacks (one word-level, one phrase-level) for multilingual models that push their ability to handle code-mixed sentences to the limit. The former uses bilingual dictionaries to propose perturbations, using translations of the clean example for sense disambiguation. The latter directly aligns the clean example with its translations before extracting phrases as perturbations. Our phrase-level attack has a success rate of 89.75% against XLM-R-large, bringing its average accuracy on XNLI down from 79.85% to 8.18%. Finally, we propose an efficient adversarial training scheme that trains in the same number of steps as the original model and show that it improves model accuracy.
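To make the word-level attack concrete, the following is a minimal Python sketch of a greedy black-box variant: a bilingual dictionary proposes translation candidates for each word, and a swap is kept whenever it lowers the model's confidence in the gold label. The toy dictionary, the `predict_proba(text) -> list[float]` interface, and the query budget are illustrative assumptions, not the authors' exact procedure; the translation-based sense-disambiguation step is elided here.

```python
# Toy English -> {Spanish, German} dictionary; the real attack draws
# candidates from bilingual dictionaries for many languages at once.
BILINGUAL_DICT = {
    "movie": ["película", "Film"],
    "good": ["buena", "gut"],
}

def word_level_attack(tokens, gold_label, predict_proba, max_queries=100):
    """Greedily swap words for dictionary translations, keeping each swap
    that lowers the black-box model's confidence in the gold label, and
    stopping as soon as the prediction flips (a successful attack).
    `predict_proba` is an assumed interface: text -> class probabilities."""
    adv = list(tokens)
    best = predict_proba(" ".join(adv))[gold_label]
    queries = 1
    for i, tok in enumerate(tokens):
        for cand in BILINGUAL_DICT.get(tok.lower(), []):
            if queries >= max_queries:
                return adv  # query budget exhausted
            trial = adv[:i] + [cand] + adv[i + 1:]
            probs = predict_proba(" ".join(trial))
            queries += 1
            if probs[gold_label] < best:
                best = probs[gold_label]
                adv = trial
                if max(range(len(probs)), key=probs.__getitem__) != gold_label:
                    return adv  # prediction flipped: attack succeeded
    return adv

# Example usage with a stub two-class classifier (label 0 is the gold label):
stub = lambda text: [0.2, 0.8] if "película" in text else [0.9, 0.1]
print(word_level_attack("a good movie".split(), 0, stub))
# -> ['a', 'good', 'película']
```

Because swaps are one-for-one token replacements, the sentence length is preserved and the greedy search can index into the running adversarial example safely; the phrase-level attack instead substitutes aligned multi-word spans, which requires the alignment step described above.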