One of the key problems in multi-label text classification is how to exploit the correlations among labels. However, directly modeling these correlations is very challenging when the label space is complex and unknown. In this paper, we propose a Label Mask multi-label text classification model (LM-MTC), inspired by the cloze-style questions used in language modeling. LM-MTC captures implicit relationships among labels by drawing on the strong capabilities of pre-trained language models. On this basis, we assign a distinct token to each potential label and randomly mask these tokens with a certain probability to build a label-based Masked Language Model (MLM). We train the multi-label text classification (MTC) and MLM objectives jointly, which further improves the generalization ability of the model. Extensive experiments on multiple datasets demonstrate the effectiveness of our method.
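The label-masking idea can be illustrated with a minimal Python sketch. The abstract does not specify the input layout, so everything here is an assumption made for illustration: the function names, the slot format `[name:state]`, the placement of label slots before the text, and the weighted sum used to combine the two losses are all hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"


def build_label_masked_input(text_tokens, label_names, gold_labels, mask_prob, rng):
    """Build one training sequence with a token slot per candidate label.

    Assumed layout (hypothetical): [CLS] slot_1 ... slot_K [SEP] text [SEP].
    An unmasked slot shows the label state, e.g. "[sports:1]" or "[sports:0]".
    A masked slot is replaced by [MASK], and its true state becomes an MLM target.
    Returns (tokens, targets), where targets maps sequence position -> 0 or 1.
    """
    tokens = ["[CLS]"]
    targets = {}
    for name in label_names:
        state = 1 if name in gold_labels else 0
        if rng.random() < mask_prob:
            # Record the hidden state at the position the [MASK] will occupy.
            targets[len(tokens)] = state
            tokens.append(MASK)
        else:
            tokens.append(f"[{name}:{state}]")
    tokens.append("[SEP]")
    tokens.extend(text_tokens)
    tokens.append("[SEP]")
    return tokens, targets


def joint_loss(mtc_loss, mlm_loss, mlm_weight=1.0):
    """Combine the two objectives; the weighted sum is an assumption."""
    return mtc_loss + mlm_weight * mlm_loss


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, targets = build_label_masked_input(
        text_tokens=["the", "team", "won", "the", "election"],
        label_names=["sports", "finance", "politics"],
        gold_labels={"sports", "politics"},
        mask_prob=0.5,
        rng=rng,
    )
    print(tokens)
    print(targets)
```

In this sketch, the unmasked label slots give the model partial label information as context, so predicting a masked slot requires using both the text and the visible labels. That is one plausible way a cloze-style objective could expose label correlations. At inference time, all slots would presumably be masked (`mask_prob=1.0`), which reduces to predicting every label from the text alone.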