Hierarchical multi-label text classification (HMTC) has been gaining popularity in recent years thanks to its applicability to many real-world problems. Existing HMTC algorithms largely focus on the design of classifiers, such as local, global, or combined approaches. However, very few studies have focused on hierarchical feature extraction or explored the association between the hierarchical labels and the text. In this paper, we propose a Label-based Attention for Hierarchical Multi-label Text Classification Neural Network (LA-HCN), in which a novel label-based attention module is designed to hierarchically extract important information from the text based on the labels from different hierarchy levels. In addition, hierarchical information is shared across levels while the label-based information of each level is preserved. Separate local and global document embeddings are obtained and used for the respective local and global classifications. In our experiments, LA-HCN outperforms other state-of-the-art neural network-based HMTC algorithms on four public HMTC datasets. The ablation study also demonstrates the effectiveness of the proposed label-based attention module as well as the novel local and global embeddings and classifications. By visualizing the words that receive high learned attention, we find that LA-HCN is able to extract meaningful information corresponding to the different labels, which provides explainability that may be helpful to human analysts.
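To make the idea of label-based attention concrete, the following is a minimal sketch under stated assumptions, not the LA-HCN formulation itself. It assumes that each label at each hierarchy level has an embedding, that attention scores are plain dot products between label and word embeddings, and that a label-specific document vector is the attention-weighted sum of the word vectors. The function names, the toy embeddings, and the scoring function are all hypothetical, and the cross-level sharing of hierarchical information described in the abstract is not modelled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_attention(word_vecs, label_vecs):
    """For each label, attend over the words of one document.

    Returns a list of (attention_weights, label_specific_doc_vector) pairs,
    one per label. Dot-product scoring is an assumption of this sketch.
    """
    dim = len(word_vecs[0])
    results = []
    for label in label_vecs:
        weights = softmax([dot(label, w) for w in word_vecs])
        doc = [sum(wt * w[d] for wt, w in zip(weights, word_vecs))
               for d in range(dim)]
        results.append((weights, doc))
    return results


# Toy document: three words with 2-d embeddings (hypothetical values).
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Hypothetical label embeddings for two hierarchy levels.
levels = {
    1: [[2.0, 0.0]],              # one coarse label
    2: [[0.0, 2.0], [1.0, 1.0]],  # two finer labels
}

# Attention is computed separately for the labels of each level.
per_level = {lvl: label_attention(words, labels)
             for lvl, labels in levels.items()}
```

Inspecting the weights in `per_level` shows which words each label attends to most, which is the kind of per-label visualization the abstract refers to when discussing explainability.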