When dealing with literary texts such as novels or short stories, the extraction of structured information in the form of a knowledge graph can be hindered by the huge number of possible relations between the entities corresponding to the characters, and by the consequent difficulty of gathering supervised information about them. This issue is addressed here as an unsupervised task powered by transformers: relational sentences in the original text are embedded (with SBERT) and clustered in order to merge semantically similar relations. All the sentences in the same cluster are then summarized (with BART), and a descriptive label is extracted from the summary. Preliminary tests show that such clustering might successfully detect similar relations and provide valuable preprocessing for semi-supervised approaches.
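The embed-then-cluster step described above can be illustrated with a minimal sketch. It is not the authors' implementation: a toy bag-of-words vector stands in for the SBERT sentence embedding, a greedy cosine-threshold rule stands in for whatever clustering algorithm is actually used, and the BART summarization and labeling step is omitted. The names `toy_embed`, `cosine`, `cluster_relations`, the threshold of 0.5, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Assumed stand-in for an SBERT encoder: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_relations(sentences, embed=toy_embed, threshold=0.5):
    """Greedy clustering (an assumption, not the paper's algorithm):
    each sentence joins the first cluster whose seed sentence is
    similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, member_sentences)
    for sentence in sentences:
        vec = embed(sentence)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(sentence)
                break
        else:
            clusters.append((vec, [sentence]))
    return [members for _, members in clusters]


# Hypothetical relational sentences about characters in a novel.
sentences = [
    "Elizabeth loves Darcy",
    "Darcy loves Elizabeth",
    "Wickham deceives Lydia",
    "Wickham deceives Elizabeth",
]
clusters = cluster_relations(sentences)
for members in clusters:
    print(members)
```

In a fuller pipeline, each resulting cluster would then be passed to a summarizer, and a label would be extracted from the summary. With dense sentence embeddings in place of the toy vectors, the similarity function would need to operate on dense arrays instead of sparse counts.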