This paper describes the dma submission to the TempoWiC task, which achieves a macro-F1 score of 77.05% and attains first place in the task. We first explore the impact of different pre-trained language models. We then adopt data cleaning, data augmentation, and adversarial training strategies to enhance the model's generalization and robustness. For further improvement, we integrate POS information and word semantic representations using a Mixture-of-Experts (MoE) approach. The experimental results show that MoE can overcome the feature-overuse issue and effectively combine context, POS, and word semantic features. Additionally, we use a model ensemble method for the final prediction, which has been proven effective in many prior works.
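As a rough illustration of the feature-fusion idea mentioned above, the sketch below shows one plausible way a Mixture-of-Experts gate could weight context, POS, and word semantic vectors; the class, argument names, and per-view expert layout are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class FeatureMoE(nn.Module):
    """Hypothetical MoE-style fusion of context, POS, and word-semantic features."""

    def __init__(self, dim: int, num_views: int = 3):
        super().__init__()
        # one expert per feature view (context, POS, word semantics) -- an assumption
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_views)])
        # gate scores each expert's contribution from the concatenated views
        self.gate = nn.Linear(dim * num_views, num_views)

    def forward(self, context_feat, pos_feat, word_feat):
        feats = [context_feat, pos_feat, word_feat]
        gate_weights = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)
        # (batch, dim, num_views): each expert transforms its own feature view
        expert_outs = torch.stack([e(f) for e, f in zip(self.experts, feats)], dim=-1)
        # weighted sum over experts -> fused representation of shape (batch, dim)
        return (expert_outs * gate_weights.unsqueeze(-2)).sum(dim=-1)
```

In such a setup, letting the gate assign per-example weights is what would keep any single feature view from being overused, which is the property the abstract attributes to the MoE combination.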