In this paper, we present an efficient deep-learning-based approach to extract technology-related topics and keywords from scientific literature and to identify corresponding technologies in patent applications. Specifically, we use transformer-based language models tailored to scientific text to detect coherent topics over time and describe them with relevant keywords that are automatically extracted from a large text corpus. We identify these keywords using named entity recognition, distinguishing between keywords that describe methods, applications, and other scientific terminology. From combinations of method and application keywords, we create a large number of search queries, which we use to conduct semantic search and identify related patents. In doing so, we aim to contribute to the growing body of research on text-based technology mapping and forecasting that leverages the latest advances in natural language processing and deep learning. We are able to map technologies identified in scientific literature to patent applications, thereby providing an empirical foundation for the study of science-technology linkages. We illustrate the workflow and the results obtained by mapping publications in the field of neuroscience to related patent applications.
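The query-construction and semantic-search steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`build_queries`, `embed`, `cosine`, `search`), the query template, the keyword lists, and the patent texts are all hypothetical, and a toy bag-of-words vector stands in for the transformer-based embeddings so that the example runs without external dependencies.

```python
from collections import Counter
from itertools import product
import math


def build_queries(methods, applications):
    """Combine every method keyword with every application keyword."""
    # The "<method> for <application>" template is an assumption.
    return [f"{m} for {a}" for m, a in product(methods, applications)]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a transformer embedding."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, patents, top_k=1):
    """Return the ids of the top_k patents most similar to the query."""
    q = embed(query)
    ranked = sorted(
        patents.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return [patent_id for patent_id, _ in ranked[:top_k]]


# Hypothetical keywords and patent abstracts, for illustration only.
methods = ["optogenetics", "calcium imaging"]
applications = ["seizure detection", "memory research"]
patents = {
    "P1": "system using calcium imaging for seizure detection in patients",
    "P2": "battery housing for electric vehicles",
}

queries = build_queries(methods, applications)
hits = {q: search(q, patents) for q in queries}
```

In a realistic setting, `embed` would be replaced by a sentence-embedding model suited to scientific text, and the exhaustive comparison in `search` by an approximate nearest-neighbour index over the patent corpus.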