Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and cannot populate arbitrarily complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts, and to return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of the use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
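The recursive, schema-driven extraction described in the abstract can be illustrated with a minimal sketch. This is not the OntoGPT implementation or its API: the `Slot` and `Schema` classes, the prompt wording, the `fake_llm` stand-in for a GPT-style model, and the `EX:` identifiers are all assumptions made for illustration. The sketch builds a prompt from a schema, parses the `field: value` response, recurses into any field whose value is itself a nested schema, and maps leaf values to identifiers from a toy vocabulary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Slot:
    name: str
    description: str
    multivalued: bool = False
    nested: Optional["Schema"] = None        # nested schema, if the value is an object
    vocab: Optional[Dict[str, str]] = None   # toy label-to-identifier lookup


@dataclass
class Schema:
    name: str
    slots: List[Slot]


def build_prompt(schema: Schema, text: str) -> str:
    """Render one 'field: <hint>' line per slot, followed by the input text."""
    lines = [f"From the text below, extract the following fields for a {schema.name}:", ""]
    for slot in schema.slots:
        hint = slot.description + (" (semicolon-separated list)" if slot.multivalued else "")
        lines.append(f"{slot.name}: <{hint}>")
    lines += ["", "Text:", text]
    return "\n".join(lines)


def parse_response(response: str) -> Dict[str, str]:
    """Parse 'field: value' lines into a dictionary of raw strings."""
    parsed = {}
    for line in response.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            parsed[key.strip()] = value.strip()
    return parsed


def extract(schema: Schema, text: str, llm: Callable[[str], str]) -> dict:
    """Query the model once for this schema, then recurse into nested slots."""
    raw = parse_response(llm(build_prompt(schema, text)))
    result = {}
    for slot in schema.slots:
        value = raw.get(slot.name, "")
        if not value:
            continue
        parts = [p.strip() for p in value.split(";") if p.strip()] if slot.multivalued else [value]
        if slot.nested is not None:
            # Recursive step: each raw value becomes the input text of a new prompt.
            items = [extract(slot.nested, part, llm) for part in parts]
        elif slot.vocab is not None:
            # Grounding step: replace a label with an identifier when one is known.
            items = [slot.vocab.get(part.lower(), part) for part in parts]
        else:
            items = parts
        result[slot.name] = items if slot.multivalued else items[0]
    return result


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (assumption, for illustration only)."""
    text = prompt.split("Text:\n", 1)[1]
    if "for a Recipe" in prompt:
        return "label: pancakes\ningredients: 2 cups flour; 1 egg"
    words = text.split()
    return f"food: {words[-1]}\namount: {' '.join(words[:-1])}"


# Hypothetical schema and placeholder identifiers; a real run would use ontology IDs.
FOOD_VOCAB = {"flour": "EX:flour", "egg": "EX:egg"}
INGREDIENT = Schema("Ingredient", [
    Slot("food", "the food item", vocab=FOOD_VOCAB),
    Slot("amount", "the quantity used"),
])
RECIPE = Schema("Recipe", [
    Slot("label", "the name of the recipe"),
    Slot("ingredients", "the ingredients", multivalued=True, nested=INGREDIENT),
])

result = extract(RECIPE, "Pancakes: mix 2 cups flour with 1 egg.", fake_llm)
print(result)
```

Replacing `fake_llm` with a function that calls an actual model would leave `extract` unchanged, which is the point of the design: the schema, not training data, determines what is asked and how the answer is structured.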