The volume of data stored in telescope archives is constantly increasing owing to the development of, and improvements in, instrumentation. Often the archives need to be stored on a distributed storage architecture provided by independent compute centres. Such a distributed data archive requires overarching data management orchestration. This orchestration comprises tools that handle data storage and cataloguing and that steer transfers across different storage systems and protocols, while being aware of data policies and locality. In addition, it needs a common Authorisation and Authentication Infrastructure (AAI) layer that end users perceive as a single entity and that provides transparent data access. The scientific domain of particle physics also uses complex, distributed data management systems. The experiments at the Large Hadron Collider (LHC) accelerator at CERN generate several hundred petabytes of data per year. This data is distributed globally to partner sites and to users working at national compute facilities. Several innovative tools were developed to address the distributed computing challenges in the context of the Worldwide LHC Computing Grid (WLCG). The work being carried out in the ESCAPE project and in the Data Infrastructure for Open Science (DIOS) work package is to prototype a Scientific Data Lake using the tools developed in the context of the WLCG, bringing together different physics disciplines and addressing FAIR standards and Open Data. We present how the Scientific Data Lake prototype is applied to astronomical data use cases. We introduce the software stack and discuss some of the differences between the domains.