Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
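The division of labor described above, in which the user supplies the individual tasks and the logic that decides which task runs next, can be illustrated with a minimal sketch. It deliberately does not use the Colmena or Parsl APIs: it relies only on Python's standard `concurrent.futures`, and every name in it (`simulate`, `train`, `steer`, the toy objective, the nearest-neighbor proxy) is a hypothetical stand-in chosen for illustration, not something taken from the framework.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulate(x: float) -> float:
    """Hypothetical stand-in for an expensive simulation task."""
    return -(x - 0.7) ** 2


def train(results):
    """Toy proxy model (assumption): predict the score of the nearest evaluated point."""
    def predict(x: float) -> float:
        nearest = min(results, key=lambda r: abs(r[0] - x))
        return nearest[1]
    return predict


def steer(candidates, budget: int, workers: int = 2):
    """Steering logic: keep the workers busy and pick each new task with the proxy."""
    pending = list(candidates)
    results = []   # (input, score) pairs collected so far
    running = {}   # future -> input currently being simulated
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(results) < budget:
            # Fill idle worker slots without exceeding the simulation budget.
            while (pending and len(running) < workers
                   and len(running) + len(results) < budget):
                if results:
                    # Retrain the proxy and take the most promising candidate.
                    predict = train(results)
                    pending.sort(key=predict)
                    x = pending.pop()
                else:
                    # No data yet: fall back to the order given by the user.
                    x = pending.pop(0)
                running[pool.submit(simulate, x)] = x
            if not running:
                break  # candidates exhausted before the budget was spent
            # React as soon as any simulation finishes, then collate its result.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results.append((running.pop(future), future.result()))
    return results


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    for x, score in steer(grid, budget=5):
        print(f"x={x:.1f} score={score:.3f}")
```

In this sketch the proxy is retrained each time a worker slot frees up, which is affordable only because the model is trivial. A real campaign would presumably run training and inference as tasks of their own, which is the coordination problem the abstract attributes to Colmena.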