Recent years have witnessed rapid development of concept map generation techniques, owing to their ability to provide well-structured summaries of knowledge from free text. Traditional unsupervised methods do not generate task-oriented concept maps, whereas deep generative models require large amounts of training data. In this work, we present GT-D2G (Graph Translation-based Document To Graph), an automatic concept map generation framework that leverages generalized NLP pipelines to derive semantically rich initial graphs and translates them into more concise structures under the weak supervision of downstream task labels. The concept maps generated by GT-D2G can provide interpretable summaries of the structured knowledge in the input texts, as demonstrated through human evaluation and case studies on three real-world corpora. Further experiments on the downstream task of document classification show that GT-D2G outperforms other concept map generation methods. Moreover, we validate the labeling efficiency of GT-D2G in the label-efficient learning setting and the flexibility of generated graph sizes in controlled hyper-parameter studies.