From AlexNet to Inception, autoencoders to diffusion models, the development of novel and powerful deep learning models and learning algorithms has proceeded at breakneck speed. We believe this is due in part to rapid iteration on model architectures and learning techniques by a large community of researchers working over a common representation of the underlying entities, which has produced transferable deep learning knowledge. As a result, model scale, accuracy, fidelity, and compute performance have increased dramatically in computer vision and natural language processing. In chemistry, by contrast, the lack of a common representation for chemical structure has hampered similar progress. To enable transferable deep learning, we identify the need for a robust 3-dimensional representation of materials such as molecules and crystals. The goal is to enable both materials property prediction and materials generation with 3D structures. While computationally costly, such representations can model a large set of chemical structures. We propose $\textit{ParticleGrid}$, a SIMD-optimized library for 3D structures that is designed for deep learning applications and integrates seamlessly with deep learning frameworks. Our highly optimized grid generation allows grids to be generated on the fly on the CPU, reducing storage, GPU compute, and memory requirements. We demonstrate the efficacy of 3D grids generated via $\textit{ParticleGrid}$ by accurately predicting molecular energy properties using a 3D convolutional neural network. Our model achieves a mean squared error of 0.006 and nearly matches the values calculated using computationally costly density functional theory in a fraction of the time.
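To make the idea of a 3D grid representation concrete, the following is a minimal NumPy sketch of one common way to voxelize a molecule: each atom is spread onto a regular grid as a Gaussian density, with one channel per element type. This is not the $\textit{ParticleGrid}$ API or its SIMD implementation; the function name `voxelize`, its parameters (`grid_size`, `extent`, `sigma`), the Gaussian smearing, and the example coordinates are all assumptions chosen for illustration.

```python
import numpy as np

def voxelize(coords, channels, n_channels, grid_size=32, extent=10.0, sigma=0.5):
    """Hypothetical helper: spread atoms onto a (n_channels, G, G, G) grid.

    coords   : (N, 3) atom positions, assumed to lie inside [0, extent)^3
    channels : (N,) integer channel index per atom (e.g. one per element)
    """
    coords = np.asarray(coords, dtype=np.float64)
    channels = np.asarray(channels, dtype=np.int64)

    # Voxel centers along one axis; the grid is cubic, so all axes share them.
    centers = (np.arange(grid_size) + 0.5) * (extent / grid_size)
    grid = np.zeros((n_channels,) + (grid_size,) * 3, dtype=np.float32)

    for (x, y, z), c in zip(coords, channels):
        # A 3D isotropic Gaussian factorizes into three 1D Gaussians.
        gx = np.exp(-((centers - x) ** 2) / (2.0 * sigma**2))
        gy = np.exp(-((centers - y) ** 2) / (2.0 * sigma**2))
        gz = np.exp(-((centers - z) ** 2) / (2.0 * sigma**2))
        # Outer product via broadcasting, accumulated into the atom's channel.
        grid[c] += gx[:, None, None] * gy[None, :, None] * gz[None, None, :]

    return grid

# Illustrative three-atom input (made-up coordinates): channel 0 = H, channel 1 = O.
atoms = [[5.2, 5.2, 5.2], [6.1, 5.2, 5.2], [4.9, 6.1, 5.2]]
elements = [1, 0, 0]
grid = voxelize(atoms, elements, n_channels=2)
print(grid.shape)  # (2, 32, 32, 32)
```

A tensor of this shape can be fed directly to a 3D convolutional network as a multi-channel volume. Generating it on demand from atom coordinates, rather than storing it, is the trade-off the abstract describes: the coordinates are a few floats per atom, whereas the dense grid holds tens of thousands of values per channel.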