Data-driven interatomic potentials have emerged as a powerful class of surrogate models for {\it ab initio} potential energy surfaces that are able to reliably predict macroscopic properties with experimental accuracy. In generating accurate and transferable potentials, the most time-consuming and arguably most important task is generating the training set, which still requires significant expert user input. To accelerate this process, this work presents {\it hyperactive learning} (HAL), a framework for formulating an accelerated sampling algorithm specifically for the task of training database generation. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics) and add a biasing term that drives the system towards high uncertainty and thus towards unseen training configurations. Building on this framework, general protocols for building training databases for alloys and polymers are presented. For alloys, ACE potentials for AlSi10 are created by fitting to a minimal HAL-generated database containing 88 configurations (32 atoms each), with fast evaluation times below 100 microseconds per atom per CPU core. These potentials are demonstrated to predict the melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE, yielding a model that determines the density of a long polyethylene glycol (PEG) polymer formed of 200 monomer units with experimental accuracy, even though it is fitted only to small isolated PEG polymers with sizes ranging from 2 to 32.
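The biasing idea described above can be illustrated with a minimal toy sketch. It is not the HAL implementation: it assumes, purely for illustration, that the uncertainty is the standard deviation of a small committee of one-dimensional harmonic models, that the biased energy takes the form mean energy minus `tau` times that uncertainty, and that the sampler is overdamped Langevin dynamics. All names (`committee_energies`, `biased_energy`, `sample_until_uncertain`, `tau`, `tol`) are hypothetical.

```python
import numpy as np

def committee_energies(x, stiffness):
    """Energy predicted by each committee member at position x.
    Toy assumption: each member is a 1D harmonic model 0.5 * k * x**2."""
    return 0.5 * stiffness * x**2

def uncertainty(x, stiffness):
    """Assumed uncertainty measure: spread of the committee predictions."""
    return committee_energies(x, stiffness).std()

def biased_energy(x, stiffness, tau):
    """Mean committee energy minus tau times the uncertainty (assumed form).
    tau = 0 recovers the unbiased sampler."""
    return committee_energies(x, stiffness).mean() - tau * uncertainty(x, stiffness)

def biased_force(x, stiffness, tau, h=1e-6):
    """Force on the biased surface via a central finite difference."""
    e_plus = biased_energy(x + h, stiffness, tau)
    e_minus = biased_energy(x - h, stiffness, tau)
    return -(e_plus - e_minus) / (2.0 * h)

def sample_until_uncertain(x0, stiffness, tau, tol,
                           step=1e-2, kT=0.1, max_steps=10000, seed=0):
    """Run overdamped Langevin dynamics on the biased surface and stop as
    soon as the uncertainty exceeds tol. Returns (position, steps taken);
    steps taken equals max_steps if the tolerance was never exceeded."""
    rng = np.random.default_rng(seed)
    x = x0
    for n in range(max_steps):
        if uncertainty(x, stiffness) > tol:
            return x, n  # candidate configuration for reference labelling
        noise = np.sqrt(2.0 * kT * step) * rng.standard_normal()
        x = x + step * biased_force(x, stiffness, tau) + noise
    return x, max_steps

if __name__ == "__main__":
    k = np.array([0.9, 1.0, 1.1])  # hypothetical committee stiffnesses
    x, n = sample_until_uncertain(0.1, k, tau=5.0, tol=0.05)
    print(f"stopped at x = {x:.3f} after {n} steps")
```

In this sketch a configuration returned before `max_steps` would be the one sent for a reference calculation and added to the training set; larger `tau` pushes the walker towards regions where the committee disagrees more quickly, at the cost of straying further from the physical distribution.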