The data holdings of the National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) are growing exponentially, creating new opportunities for discovery. However, making discoveries is challenging and time-consuming because the data catalogs are large and many concepts and datasets are only indirectly connected. This paper proposes a pipeline to generate knowledge graphs (KGs) representing different NASA SMD domains. These KGs can serve as the basis for dataset search engines, saving researchers time and helping them find new connections. We collected textual data and used several modern natural language processing (NLP) methods to create the nodes and edges of the KGs. We explore the cross-domain connections, discuss the challenges we encountered, and outline future directions to inspire researchers working on similar problems.