A food composition knowledge base, which stores the essential phyto-, micro-, and macronutrients of foods, is useful for both research and industrial applications. Although many existing knowledge bases attempt to curate such information, they are often limited by time-consuming manual curation processes. Outside the food science domain, natural language processing methods that use pre-trained language models have recently shown promising results for extracting knowledge from unstructured text. In this work, we propose a semi-automated framework for constructing a food composition knowledge base from the scientific literature available online. To this end, we use a pre-trained BioBERT language model in an active learning setup that makes optimal use of limited training data. Our work demonstrates how human-in-the-loop models are a step toward AI-assisted food systems that scale well to the ever-growing volume of big data.
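To make the active learning setup more concrete, here is a minimal sketch of one round of pool-based active learning using least-confidence sampling. The query strategy is an assumption; the abstract does not say which one the framework uses. The model is abstracted as a hypothetical `predict_proba` callable (a fine-tuned BioBERT classifier could fill this role), and a hypothetical `oracle` callable stands in for the human annotator. The names `least_confidence` and `active_learning_round` are illustrative, not the authors' code.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

def least_confidence(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k examples whose top-class probability is lowest."""
    # Uncertainty score: 1 - max class probability (higher = less confident).
    scores = [1.0 - max(p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def active_learning_round(
    pool: List[str],
    labeled: List[Tuple[str, Hashable]],
    predict_proba: Callable[[str], Sequence[float]],  # hypothetical model wrapper
    oracle: Callable[[str], Hashable],                # hypothetical human annotator
    k: int,
) -> Tuple[List[str], List[Tuple[str, Hashable]]]:
    """Score the unlabeled pool, send the k most uncertain examples to the
    oracle, append them to `labeled` (in place), and return the shrunken pool."""
    probs = [predict_proba(x) for x in pool]
    chosen = set(least_confidence(probs, k))
    for i in sorted(chosen):
        labeled.append((pool[i], oracle(pool[i])))
    remaining = [x for i, x in enumerate(pool) if i not in chosen]
    return remaining, labeled
```

In a full loop, the model would be retrained on `labeled` after each round before the pool is scored again; querying only the most uncertain sentences is what lets a small annotation budget go further than random sampling.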