Keyword Extraction from Short Texts with a Text-To-Text Transfer Transformer

From arXiv. Accepted to ACIIDS 2022. The proceedings of ACIIDS 2022 will be published by Springer in the series Lecture Notes in Artificial Intelligence (LNAI) and Communications in Computer and Information Science (CCIS).

The paper explores the relevance of the Text-To-Text Transfer Transformer language model (T5) for Polish (plT5) to the task of intrinsic and extrinsic keyword extraction from short text passages. The evaluation is carried out on the new Polish Open Science Metadata Corpus (POSMAC), released with this paper: a collection of 216,214 abstracts of scientific publications compiled in the CURLICAT project. We compare the results obtained by four different methods (plT5kw, extremeText, TermoPL, and KeyBERT) and conclude that the plT5kw model yields particularly promising results for both frequent and sparsely represented keywords. Furthermore, a plT5kw keyword generation model trained on POSMAC also seems to produce highly useful results in cross-domain text labelling scenarios. We discuss the performance of the model on news stories and phone-based dialogue transcripts, which represent text genres and domains extrinsic to the dataset of scientific abstracts. Finally, we attempt to characterize the challenges of evaluating a text-to-text model on both intrinsic and extrinsic keyword extraction.
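To make the intrinsic/extrinsic distinction concrete, here is a minimal Python sketch of one way such an evaluation could be scored. It assumes that a keyword counts as intrinsic when it appears verbatim (case-insensitively) in the source text and as extrinsic otherwise, and it computes exact-match precision, recall and F1 separately for each group. The function names `split_intrinsic`, `prf` and `evaluate` are hypothetical, and this is not the paper's own evaluation protocol.

```python
def split_intrinsic(text: str, keywords: list[str]) -> tuple[set[str], set[str]]:
    """Split keywords into intrinsic (found in the text) and extrinsic (absent).

    Assumption: plain case-insensitive substring matching, no lemmatization,
    so e.g. "transformer" also matches inside "transformers".
    """
    lowered = text.lower()
    normalized = {k.lower() for k in keywords}
    intrinsic = {k for k in normalized if k in lowered}
    return intrinsic, normalized - intrinsic


def prf(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over two keyword sets."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(text: str, predicted: list[str], gold: list[str]) -> dict[str, tuple[float, float, float]]:
    """Score predicted keywords against gold keywords, per intrinsic/extrinsic group."""
    pred_int, pred_ext = split_intrinsic(text, predicted)
    gold_int, gold_ext = split_intrinsic(text, gold)
    return {"intrinsic": prf(pred_int, gold_int), "extrinsic": prf(pred_ext, gold_ext)}


if __name__ == "__main__":
    text = "Transformer models for keyword extraction from scientific abstracts."
    gold = ["keyword extraction", "transformer", "natural language processing"]
    pred = ["keyword extraction", "scientific abstracts", "natural language processing"]
    print(evaluate(text, pred, gold))
```

In this toy example the extrinsic keyword "natural language processing" is matched exactly, while only one of the two predicted intrinsic keywords is in the gold set. Exact matching like this is deliberately strict: a generated extrinsic keyword that is a reasonable paraphrase of a gold label still counts as a miss, which is one reason evaluating text-to-text keyword generators is hard.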

