In this paper, we present a transfer learning system for technical domain identification on multilingual text data. We submitted two runs: one uses the transformer model BERT, and the other uses XLM-RoBERTa combined with a CNN for text classification. These models allowed us to identify the domain of the given sentences for the ICON 2020 shared task TechDOfication: Technical Domain Identification. Our system ranked best for subtasks 1d and 1g on the given TechDOfication dataset.
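As a rough illustration of the second run's architecture (an XLM-RoBERTa encoder followed by a CNN classification head), the sketch below shows one common way such a model is wired up. This is not the authors' released code; the label count, kernel widths, filter count, and example input are illustrative assumptions.

```python
# Minimal sketch of an XLM-RoBERTa + CNN text classifier.
# num_labels, kernel_sizes, and num_filters are assumed values, not the paper's settings.
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizer

class XLMRCnnClassifier(nn.Module):
    def __init__(self, num_labels=7, kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
        hidden = self.encoder.config.hidden_size  # 768 for the base model
        # One 1-D convolution per kernel width over the contextual token embeddings.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        tokens = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        tokens = tokens.transpose(1, 2)
        # Convolve, apply ReLU, and max-pool over time for each kernel width.
        pooled = [torch.relu(conv(tokens)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRCnnClassifier()
batch = tokenizer(["Example technical sentence for domain identification."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, num_labels)
```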