Graph Convolutional Neural Networks (GCNNs) are a popular class of deep learning (DL) models in materials science for predicting material properties from graph representations of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually time-consuming. Recent advances in GPUs and distributed computing open a path to effectively reducing the computational cost of GCNN training. However, efficient utilization of high-performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, which leverages distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework, for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present experimental results with HydraGNN showing (i) a reduction in data loading time of up to 4.2 times compared with a conventional method and (ii) linear scaling of training performance up to 1,024 GPUs on both Summit and Perlmutter.
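To make the two quantities in the abstract concrete (distributed data parallelism and linear scaling), the following is a minimal, library-free sketch. It is not HydraGNN's or PyTorch's implementation. The function names `shard_indices` and `parallel_efficiency` are hypothetical, the sharding scheme is only similar in spirit to what a distributed sampler does (each process trains on a disjoint, equally sized slice of a shared shuffle), and the timings in the usage lines are invented for illustration rather than taken from the reported results.

```python
import random


def shard_indices(num_samples, world_size, rank, seed=0, epoch=0):
    """Return the sample indices that one process (rank) handles in an epoch.

    Assumption: every rank shuffles with the same seed, so all ranks agree
    on the global order and their slices do not overlap (except for padding).
    """
    indices = list(range(num_samples))
    random.Random(seed + epoch).shuffle(indices)

    # Pad by repeating leading samples so every rank gets the same count.
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    indices += indices[: total - num_samples]

    # Strided slice: rank r takes positions r, r + world_size, ...
    return indices[rank:total:world_size]


def parallel_efficiency(t_base, n_base, t_n, n):
    """Scaling efficiency of a run on n GPUs relative to a baseline run.

    A value of 1.0 corresponds to ideal (linear) scaling.
    """
    speedup = t_base / t_n
    return speedup / (n / n_base)


# Hypothetical usage: 10 molecular graphs split across 4 processes.
shards = [shard_indices(10, 4, rank) for rank in range(4)]

# Hypothetical timings: 100 s per epoch on 1 GPU, 25 s on 4 GPUs.
efficiency = parallel_efficiency(100.0, 1, 25.0, 4)
```

In this sketch, linear scaling means `parallel_efficiency` stays close to 1.0 as the GPU count grows. In practice, data loading and gradient communication are the usual reasons it falls below that.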