Keyphrase extraction is one of the essential tasks for document understanding in NLP. While the majority of prior work is dedicated to formal settings, e.g., books, news, or web blogs, informal texts such as video transcripts are less explored. To address this limitation, we present a novel corpus and method for keyphrase extraction from the transcripts of videos streamed on the Behance platform. More specifically, we propose a novel data augmentation technique that enriches the model with background knowledge about the keyphrase extraction task from other domains. Extensive experiments on the proposed dataset show the effectiveness of the introduced method.