Multi-domain text classification (MDTC) aims to leverage all available resources from multiple domains to learn a predictive model that generalizes well across these domains. Recently, many MDTC methods have adopted adversarial learning, the shared-private paradigm, and entropy minimization to yield state-of-the-art results. However, these approaches face three issues: (1) Minimizing domain divergence cannot fully guarantee successful domain alignment; (2) Aligning marginal feature distributions cannot fully guarantee the discriminability of the learned features; (3) Standard entropy minimization may make the predictions on unlabeled data over-confident, deteriorating the discriminability of the learned features. To address these issues, we propose a co-regularized adversarial learning (CRAL) mechanism for MDTC. This approach constructs two diverse shared latent spaces, performs domain alignment in each of them, and penalizes disagreement between the two alignments with respect to their predictions on unlabeled data. Moreover, virtual adversarial training (VAT) with entropy minimization is incorporated to impose consistency regularization on the CRAL method. Experiments show that our model outperforms state-of-the-art methods on two MDTC benchmarks.
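To make the regularizers named above concrete, the following is a minimal PyTorch sketch of the co-regularization (disagreement) penalty, entropy minimization, and a one-step VAT consistency loss. Everything here is an illustrative assumption rather than the paper's implementation: the two-branch toy model, the choice of symmetric KL as the disagreement measure, the averaging of the two branches inside VAT, and the hyper-parameters `xi` and `eps` are all placeholders, and the domain discriminators used for adversarial alignment are omitted.

```python
# A minimal sketch of the losses described in the abstract (illustrative
# assumptions throughout; not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoBranchClassifier(nn.Module):
    """Two diverse shared feature extractors, each with its own classifier
    head, standing in for the two shared latent spaces (a toy stand-in)."""

    def __init__(self, in_dim=300, hid=64, n_classes=2):
        super().__init__()
        self.f1 = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        # Different activation as a crude source of diversity (assumption).
        self.f2 = nn.Sequential(nn.Linear(in_dim, hid), nn.Tanh())
        self.c1 = nn.Linear(hid, n_classes)
        self.c2 = nn.Linear(hid, n_classes)

    def forward(self, x):
        return self.c1(self.f1(x)), self.c2(self.f2(x))


def disagreement_loss(logits1, logits2):
    """Co-regularization: penalize the two branches for predicting different
    class distributions on the same unlabeled batch (symmetric KL is one
    plausible divergence; the paper may use another)."""
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)
    return 0.5 * (F.kl_div(F.log_softmax(logits2, dim=1), p1, reduction="batchmean")
                  + F.kl_div(F.log_softmax(logits1, dim=1), p2, reduction="batchmean"))


def entropy_loss(logits):
    """Entropy minimization on unlabeled predictions."""
    logp = F.log_softmax(logits, dim=1)
    return -(logp.exp() * logp).sum(dim=1).mean()


def vat_loss(model, x, xi=1e-6, eps=2.0):
    """One power-iteration step of virtual adversarial training: find the
    small perturbation that most changes the (branch-averaged) prediction,
    then apply a KL consistency penalty against the clean prediction."""
    with torch.no_grad():
        l1, l2 = model(x)
        p = F.softmax(0.5 * (l1 + l2), dim=1)
    d = (xi * F.normalize(torch.randn_like(x), dim=1)).requires_grad_()
    l1, l2 = model(x + d)
    adv_div = F.kl_div(F.log_softmax(0.5 * (l1 + l2), dim=1), p,
                       reduction="batchmean")
    grad = torch.autograd.grad(adv_div, d)[0]
    r_adv = eps * F.normalize(grad.detach(), dim=1)
    l1, l2 = model(x + r_adv)
    return F.kl_div(F.log_softmax(0.5 * (l1 + l2), dim=1), p,
                    reduction="batchmean")


if __name__ == "__main__":
    model = TwoBranchClassifier()
    x_unlabeled = torch.randn(8, 300)  # toy unlabeled batch
    logits1, logits2 = model(x_unlabeled)
    loss = (disagreement_loss(logits1, logits2)
            + vat_loss(model, x_unlabeled)
            + entropy_loss(logits1) + entropy_loss(logits2))
    loss.backward()
    print(float(loss))
```

In a full training loop these terms would be weighted and added to the supervised classification loss and the adversarial domain-alignment loss; the weights are tuning choices not specified here.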