Virtual Adversarial Training (VAT) has been effective for learning robust models in supervised and semi-supervised settings for both computer vision and NLP tasks. However, the efficacy of VAT for multilingual and multilabel text classification has not been explored. In this work, we explore VAT for multilabel emotion recognition, with a focus on leveraging unlabelled data from different languages to improve model performance. We perform extensive semi-supervised experiments on the SemEval2018 multilabel and multilingual emotion recognition dataset and show performance gains of 6.2% (Arabic), 3.8% (Spanish) and 1.8% (English) over supervised learning with the same amount of labelled data (10% of the training data). We also improve the existing state of the art by 7%, 4.5% and 1% (Jaccard Index) for Spanish, Arabic and English respectively, and perform probing experiments to understand the impact of different layers of the contextual models.
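For readers unfamiliar with the Jaccard Index used to report the state-of-the-art comparison, the following is a minimal sketch of a sample-averaged multilabel Jaccard score. It assumes that each example carries a set of gold emotion labels and a set of predicted labels, and that the score is the intersection-over-union of the two sets averaged over examples. The function name `multilabel_jaccard`, the example labels, and the convention of scoring 1.0 when both sets are empty are illustrative assumptions, not details taken from this work.

```python
from typing import Sequence, Set


def multilabel_jaccard(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Sample-averaged Jaccard index: mean of |G & P| / |G | P| over examples."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of examples")
    if not gold:
        raise ValueError("at least one example is required")

    total = 0.0
    for g, p in zip(gold, pred):
        union = g | p
        # Assumption: an example with no gold and no predicted labels
        # counts as a perfect match (score 1.0).
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(gold)


# Hypothetical toy data: three tweets with emotion label sets.
gold = [{"joy", "optimism"}, {"anger"}, set()]
pred = [{"joy"}, {"anger", "disgust"}, set()]
score = multilabel_jaccard(gold, pred)
print(f"Jaccard index: {score:.4f}")
```

In this toy example the per-example scores are 1/2, 1/2 and 1, so the averaged score is 2/3.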