Many important real-world classification problems involve a large number of closely related categories organized in a hierarchy or taxonomy. Achieving high accuracy in hierarchical multi-label text classification (HMTC) over such category sets remains a challenging problem. In this paper, we present a hierarchical, fine-tuning approach based on the Ordered Neurons LSTM (ON-LSTM) network, abbreviated as HFT-ONLSTM, for more accurate level-by-level HMTC. First, we present a novel approach to learning joint embeddings from parent category labels and textual data, so as to accurately capture the joint features of category labels and texts. Second, we adopt a fine-tuning technique for training parameters, so that the classification results at an upper level contribute to classification at the next lower level. Finally, we provide a comprehensive analysis based on extensive experiments comparing HFT-ONLSTM with state-of-the-art hierarchical and flat multi-label text classification approaches on two benchmark datasets. The experimental results show that HFT-ONLSTM outperforms these approaches, in particular reducing computational cost while achieving superior performance.
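As a rough illustration of the two ideas summarized above, the following sketch shows one way a level-by-level pipeline could (a) form a joint input from parent category labels and text, and (b) initialize a lower-level model from an upper-level one before fine-tuning. The `[SEP]` separator token, the function names, and the dictionary layout of the parameters are assumptions made for illustration and are not taken from the paper; the ON-LSTM encoder itself is not implemented here.

```python
from typing import Dict, List


def build_joint_input(text_tokens: List[str], parent_labels: List[str]) -> List[str]:
    """Combine parent-category labels with the text tokens.

    Assumption: labels are simply prepended and separated from the text
    by a hypothetical "[SEP]" token, so that one embedding layer sees both.
    """
    if not parent_labels:
        # Top level of the hierarchy: there is no parent label to add.
        return list(text_tokens)
    return list(parent_labels) + ["[SEP]"] + list(text_tokens)


def init_from_upper_level(
    upper_params: Dict[str, List[float]],
    head_name: str = "output_head",
) -> Dict[str, List[float]]:
    """Copy every parameter except the classification head.

    Assumption: the head is level-specific (each level has its own label
    set), so only the shared layers are transferred and then fine-tuned.
    """
    return {
        name: list(values)
        for name, values in upper_params.items()
        if name != head_name
    }


# Hypothetical usage for a two-level hierarchy.
level1_params = {"embedding": [0.1, 0.2], "encoder": [0.3], "output_head": [0.9]}
level2_init = init_from_upper_level(level1_params)
level2_input = build_joint_input(["deep", "learning"], ["Computer Science"])
```

In this sketch the level-2 model would start from the level-1 embedding and encoder weights and receive the level-1 label as part of its input, which is one plausible reading of how upper-level results can inform lower-level classification.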