Text classification is a classic NLP task, but it suffers from two prominent shortcomings. On the one hand, text classification is strongly domain-dependent: a classifier trained on the corpus of one domain may not perform well in another. On the other hand, text classification models require large amounts of annotated data for training, yet in some domains sufficient annotated data may not exist. It is therefore valuable to investigate how to efficiently exploit text data from different domains to improve model performance across domains. Some multi-domain text classification models use adversarial training to extract the features shared among all domains and the features specific to each domain. We observe that the distinctness of these domain-specific features varies across domains, so in this paper we propose a curriculum learning strategy based on keyword weight ranking to improve the performance of multi-domain text classification models. Experimental results on the Amazon review and FDU-MTL datasets show that our curriculum learning strategy effectively improves the performance of adversarial-learning-based multi-domain text classification models and outperforms state-of-the-art methods.
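The abstract does not specify how keyword weights are computed or how the ranking drives the curriculum. As a toy illustration only, one could score each domain by the TF-IDF weight of its most distinctive keywords and order domains from most to least distinct; the scoring scheme, the top-5 cutoff, and the easy-to-hard assumption below are all illustrative assumptions, not the authors' method:

```python
import math
from collections import Counter

def keyword_weight(domain_docs):
    """Score each domain by the average TF-IDF weight of its top keywords.

    domain_docs: {domain_name: [list of token lists]} (toy corpora).
    Higher score = the domain's vocabulary is more distinctive.
    """
    # document frequency of each term over all domains' pooled documents
    all_docs = [doc for docs in domain_docs.values() for doc in docs]
    n = len(all_docs)
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))
    scores = {}
    for domain, docs in domain_docs.items():
        # weight each term in the domain by tf * idf (smoothed idf)
        tf = Counter(t for doc in docs for t in doc)
        weights = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
        # average the weights of the top-5 keywords (cutoff is arbitrary)
        top = sorted(weights.values(), reverse=True)[:5]
        scores[domain] = sum(top) / len(top)
    return scores

def curriculum(domain_docs):
    """Order domains from most to least distinct keywords.

    Assumes (for illustration) that domains with sharper domain-specific
    vocabulary are 'easier' and should be trained on first.
    """
    scores = keyword_weight(domain_docs)
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch, a domain whose reviews reuse vocabulary shared with other domains receives a low score and is scheduled later, while a domain with sharply distinctive keywords is scheduled earlier.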