Developing machine-learned interatomic potentials from ab initio electronic structure methods remains a challenging task in computational chemistry and materials science. This work studies the capability of transfer learning, in particular discriminative fine-tuning, for efficiently generating chemically accurate interatomic neural network potentials for organic molecules from the MD17 and ANI data sets. We show that pre-training the network parameters on data obtained from density functional calculations considerably improves the sample efficiency of models trained on more accurate ab initio data. Additionally, we show that, provided a well-designed fine-tuning data set, fine-tuning with energy labels alone can suffice to obtain accurate atomic forces and to run large-scale atomistic simulations. We also investigate possible limitations of transfer learning, especially regarding the design and size of the pre-training and fine-tuning data sets. Finally, we provide GM-NN potentials pre-trained and fine-tuned on the ANI-1x and ANI-1ccx data sets, which can easily be fine-tuned further and applied to organic molecules.
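To make the two key ideas concrete, the following is a minimal PyTorch-style sketch of (i) discriminative fine-tuning, i.e., assigning smaller learning rates to pre-trained lower layers than to the output head, and (ii) recovering atomic forces from an energy-only model via automatic differentiation. This is not the actual GM-NN implementation; the model architecture, the checkpoint name `pretrained_dft.pt`, and the learning rates are illustrative assumptions.

```python
import torch

# Hypothetical stand-in for an atomistic energy model; any torch.nn.Module
# mapping atomic positions to a total energy would do. GM-NN itself is not
# reproduced here.
class EnergyModel(torch.nn.Module):
    def __init__(self, n_features: int = 32):
        super().__init__()
        self.embed = torch.nn.Linear(3, n_features)  # stand-in "lower" layers
        self.head = torch.nn.Linear(n_features, 1)   # stand-in output head

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (n_atoms, 3); sum atomic contributions to a total energy
        return self.head(torch.tanh(self.embed(positions))).sum()

model = EnergyModel()
# Load parameters pre-trained on (cheaper, more abundant) DFT data:
# model.load_state_dict(torch.load("pretrained_dft.pt"))  # hypothetical file

# Discriminative fine-tuning: the pre-trained lower layers get a smaller
# learning rate than the output head.
optimizer = torch.optim.Adam([
    {"params": model.embed.parameters(), "lr": 1e-5},
    {"params": model.head.parameters(), "lr": 1e-3},
])

# Toy fine-tuning step on a single geometry with an energy label only.
positions = torch.randn(5, 3, requires_grad=True)  # stand-in CCSD(T)-labeled geometry
e_ref = torch.tensor(0.0)                          # reference energy label

energy = model(positions)
loss = (energy - e_ref) ** 2                       # energy-only loss, no force labels
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Even without force labels in training, forces follow from the learned
# potential-energy surface as the negative gradient w.r.t. positions.
energy = model(positions)
forces = -torch.autograd.grad(energy, positions)[0]
```

The rationale for the per-group learning rates is the usual one for discriminative fine-tuning: the lower layers carry the representation learned from the larger DFT data set, so a small learning rate there protects the pre-training from being destroyed by the scarce, more accurate fine-tuning data, while the head adapts quickly to the new label level.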