Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report how they identify the metadata standards and protocols that best support flexibility. This paper reports on an initiative to develop a flexible metadata pipeline for a collection of over 300,000 digital fish specimen images, harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation, and trait extraction. The paper provides contextual background, then presents a four-phase approach: (1) assessment of the problem, (2) investigation of solutions, (3) implementation, and (4) refinement. The work is part of the NSF Harnessing the Data Revolution, Biology Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute. An RDF graph prototype pipeline is presented, followed by a discussion of research implications and a conclusion summarizing the results.