The computational catalysis and machine learning communities have made considerable progress in developing machine learning models for catalyst discovery and design. Yet a general machine learning potential that spans the chemical space of catalysis is still out of reach. A significant hurdle is obtaining training data across a wide range of materials. One important class of materials for which data is lacking is oxides, which hinders the use of such models to study the Oxygen Evolution Reaction and oxide electrocatalysis more generally. To address this, we developed the Open Catalyst 2022 (OC22) dataset, consisting of 62,521 Density Functional Theory (DFT) relaxations (~9,884,504 single point calculations) across a range of oxide materials, coverages, and adsorbates (*H, *O, *N, *C, *OOH, *OH, *OH2, *O2, *CO). We define generalized tasks for predicting the total system energy that are applicable across catalysis, establish baseline performance for several graph neural networks (SchNet, DimeNet++, ForceNet, SpinConv, PaiNN, GemNet-dT, GemNet-OC), and provide pre-defined dataset splits to establish clear benchmarks for future efforts. For all tasks, we study whether combining datasets leads to better results, even if they contain different materials or adsorbates. Specifically, we jointly train models on the Open Catalyst 2020 (OC20) dataset and OC22, or fine-tune pretrained OC20 models on OC22. In the most general task, GemNet-OC sees a ~32% improvement in energy predictions through fine-tuning and a ~9% improvement in force predictions via joint training. Surprisingly, joint training on both OC20 and the much smaller OC22 dataset also improves total energy predictions on OC20 by ~19%. The dataset and baseline models are open sourced, and a public leaderboard will follow to encourage continued community development on the total energy tasks and data.