Molecular dynamics (MD) simulation techniques are widely used across the natural sciences. Increasingly, machine learning (ML) force field (FF) models are replacing ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such models are primarily benchmarked by their force/energy prediction errors, even though their practical use case is to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for ML MD simulation. We curate representative MD systems, including water, organic molecules, peptides, and materials, and design evaluation metrics corresponding to the scientific objectives of each system. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, and offer directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate further work.