We present the Open MatSci ML Toolkit: a flexible, self-contained, and scalable Python-based framework for applying deep learning models and methods to scientific data, with a specific focus on materials science and the OpenCatalyst Dataset. Our toolkit provides:
1. A scalable machine learning workflow for materials science built on PyTorch Lightning, which enables seamless scaling across different computation capabilities (laptop, server, cluster) and hardware platforms (CPU, GPU, XPU).
2. Deep Graph Library (DGL) support for rapid graph neural network prototyping and development.
By publishing and sharing this toolkit with the research community via open-source release, we hope to:
1. Lower the entry barrier for new machine learning researchers and practitioners who want to get started with the OpenCatalyst Dataset, which is presently the largest computational materials science dataset.
2. Enable the scientific community to apply advanced machine learning tools to high-impact scientific challenges, such as modeling of materials behavior for clean energy applications.
We demonstrate the capabilities of our framework by enabling three new equivariant neural network models for multiple OpenCatalyst tasks, and we obtain promising results for compute scaling and model performance.