In this decade, astronomy is undergoing a paradigm shift to handle data from next-generation observatories such as the Square Kilometre Array (SKA) and the Vera C. Rubin Observatory (LSST). Producing real-time data streams of up to 10 TB/s and data products on the order of 600 PB/year, the SKA will be the largest civilian data-producing machine in the world, and it demands novel solutions for storing and analysing data volumes of this size. Because this real-time processing relies on complex, automated pipelines, its provenance is key to establishing confidence in the system, its final data products, and ultimately its scientific results. The intention of this paper is to lay the foundation for an automated provenance generation tool for astronomical data-processing pipelines. We therefore present a use case analysis specific to astronomical needs, which addresses the issues of trust and reproducibility as well as further use cases of interest to astronomers. This analysis is subsequently used as the basis for discussing the requirements, challenges, and opportunities involved in designing both the tool and the associated provenance model.