Delivering a reproducible environment together with complex, up-to-date software stacks to thousands of distributed and heterogeneous worker nodes is a critical task. The CernVM File System (CVMFS) was designed to help communities deploy software on worldwide distributed computing infrastructures by decoupling the software from the operating system. However, installing this file system requires the cooperation of the system administrators of the remote resources, as well as HTTP connectivity to fetch dependencies from external sources. Supercomputers, which offer tremendous computing power, generally have more restrictive policies than grid sites and do not readily provide the conditions needed to use CVMFS. Several solutions have been developed to tackle the issue, but they are often specific to one scientific community and do not address the problem as a whole. In this paper, we provide a generic utility to assist any community in installing complex software dependencies on supercomputers with no external connectivity. The approach consists of capturing the dependencies of the applications of interest, building a subset of those dependencies, testing it in a given environment, and deploying it to a remote computing resource. We test this proposal on a real use case by exporting Gauss, a Monte Carlo simulation program from the LHCb experiment, to Mare Nostrum, one of the world's top supercomputers. We provide steps to encapsulate the minimum required files and deliver a light, easy-to-update subset of CVMFS: 12.4 gigabytes instead of 5.2 terabytes for the whole LHCb repository.
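To make the capture, build, and test steps concrete, the following is a minimal sketch, not the utility described in this paper. It assumes that a tracing step has already produced a list of file paths, relative to the repository root, that the application touched at run time. The function names `build_subset` and `missing_files` and the directory layout are hypothetical and chosen only for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def build_subset(source_root: Path, traced_paths: Iterable[str], dest_root: Path) -> int:
    """Copy each traced file from source_root into dest_root.

    traced_paths are assumed to be relative to source_root (hypothetical
    output of a dependency-capture step). The relative layout is preserved
    so the subset can stand in for the original tree. Returns the number
    of bytes copied.
    """
    total = 0
    for rel in sorted(set(traced_paths)):
        src = source_root / rel
        if not src.is_file():
            # Traced path is absent from the source tree: skip it here and
            # let missing_files() report it during the test step.
            continue
        dst = dest_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also keeps timestamps and permissions
        total += dst.stat().st_size
    return total


def missing_files(dest_root: Path, traced_paths: Iterable[str]) -> List[str]:
    """Return the traced paths that are not present in the subset."""
    return sorted(rel for rel in set(traced_paths) if not (dest_root / rel).is_file())
```

In this sketch, an empty result from `missing_files` is the signal that the subset is complete with respect to the trace; the deployment step (transferring `dest_root` to the remote resource) is left out because it depends on the policies of the target site.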