HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. HTCondor's system design makes it well suited to integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other systems. This work presents a solution that enables autonomous, demand-driven provisioning of Kubernetes-managed resources. It gives a high-level overview of the architectures employed, together with a description of the setups used in both on-premises and cloud deployments in support of several Open Science Grid communities. This experience suggests that the described solution should be generally suitable for contributing Kubernetes-based resources to existing HTCondor pools.