Hierarchical text classification (HTC) is essential for various real-world applications. However, HTC models are challenging to develop because they often must process a large volume of documents and labels organized in a hierarchical taxonomy. Recent deep-learning-based HTC models have attempted to incorporate hierarchy information into the model structure. Consequently, these models become difficult to implement for large-scale hierarchies because the number of model parameters grows with the hierarchy size. To solve this problem, we formulate HTC as sub-hierarchy sequence generation, incorporating hierarchy information into the target label sequence instead of the model structure. We then propose the Hierarchy DECoder (HiDEC), which decodes a text sequence into a sub-hierarchy sequence via recursive hierarchy decoding, classifying all parents at the same level into their children at once. In addition, HiDEC is trained to use hierarchical path information from the root to each leaf in a sub-hierarchy composed of the target document's labels, via an attention mechanism and hierarchy-aware masking. HiDEC achieves state-of-the-art performance with significantly fewer model parameters than existing models on benchmark datasets such as RCV1-v2, NYT, and EURLEX57K.
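The recursive hierarchy decoding described above can be illustrated with a minimal sketch: starting from the root, all parents at the current level are expanded into their children at once, and only predicted children are expanded further. The toy taxonomy and keyword-based scorer below are illustrative assumptions, not HiDEC's actual model or API.

```python
# Toy label taxonomy (parent -> children); label names follow RCV1-style codes
# but the structure here is an assumption for illustration only.
TAXONOMY = {
    "root": ["CCAT", "ECAT"],
    "CCAT": ["C11", "C12"],
    "ECAT": ["E11"],
}

def score(text, label):
    # Stand-in for the decoder's per-label score; here: simple keyword match.
    return 1.0 if label.lower() in text.lower() else 0.0

def decode(text, threshold=0.5):
    """Level-wise recursive decoding: classify all parents' children at once."""
    predicted, frontier = [], ["root"]
    while frontier:
        # Gather the children of every parent at the current level.
        children = [c for p in frontier for c in TAXONOMY.get(p, [])]
        kept = [c for c in children if score(text, c) >= threshold]
        predicted.extend(kept)
        frontier = kept  # only predicted labels are expanded at the next level
    return predicted

print(decode("document about ccat and c11"))  # -> ['CCAT', 'C11']
```

Because only predicted labels are expanded, the output forms a sub-hierarchy (root-to-label paths) rather than a flat label set, mirroring the sub-hierarchy sequence the abstract describes.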