The Open Catalyst 2020 (OC20) Dataset and Community Challenges

Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi

From arXiv; 37 pages, 11 figures; submitted to ACS Catalysis

Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than in related fields. To address this we developed the OC20 dataset, consisting of 1,281,040 Density Functional Theory (DFT) relaxations (~264,890,000 single point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with pre-defined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources, as well as a public leaderboard to encourage community contributions to solve these important tasks.
