Data-driven machine learning methods have the potential to dramatically accelerate the rate of materials design over conventional human-guided approaches. These methods would help identify or, in the case of generative models, even create novel crystal structures of materials with a set of specified functional properties to then be synthesized or isolated in the laboratory. For crystal structure generation, a key bottleneck lies in developing suitable atomic structure fingerprints or representations for the machine learning model, analogous to the graph-based or SMILES representations used in molecular generation. However, finding data-efficient representations that are invariant to translations, rotations, and permutations, while remaining invertible to the Cartesian atomic coordinates, remains an ongoing challenge. Here, we propose an alternative approach to this problem by taking existing non-invertible representations with the desired invariances and developing an algorithm to reconstruct the atomic coordinates through gradient-based optimization using automatic differentiation. This can then be coupled to a generative machine learning model that generates new materials within the representation space, rather than in the data-inefficient Cartesian space. In this work, we implement this end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model. We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept. Furthermore, this method can be readily extended to any suitable structural representation, thereby providing a powerful, generalizable framework towards structure-based generation.
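The reconstruction step described above can be illustrated with a minimal JAX sketch. It is not the authors' implementation: the fingerprint is a simplified, radial-only stand-in for atom-centered symmetry functions, made permutation invariant by sorting over atoms, and the optimizer is plain gradient descent with a simple backtracking rule. The names `radial_fingerprint` and `reconstruct`, the `etas` widths, the cutoff radius, and the four-atom test structure are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp


def radial_fingerprint(positions, etas, r_cut=6.0):
    """Simplified radial symmetry functions (hypothetical stand-in).

    Returns an array of shape (n_eta, n_atoms) that depends only on
    interatomic distances, so it is invariant to translation and rotation.
    """
    n = positions.shape[0]
    diff = positions[:, None, :] - positions[None, :, :]
    # Add the identity to the squared distances so that the square root
    # stays differentiable on the diagonal (i == j); those entries are
    # masked out below and do not contribute.
    dist = jnp.sqrt(jnp.sum(diff**2, axis=-1) + jnp.eye(n))
    mask = 1.0 - jnp.eye(n)
    # Smooth cosine cutoff that goes to zero at r_cut.
    cutoff = jnp.where(
        dist < r_cut, 0.5 * (jnp.cos(jnp.pi * dist / r_cut) + 1.0), 0.0
    )
    gauss = jnp.exp(-etas[:, None, None] * dist[None] ** 2)
    per_atom = jnp.sum(gauss * (cutoff * mask)[None], axis=-1)
    # Sorting along the atom axis removes the dependence on atom ordering.
    return jnp.sort(per_atom, axis=-1)


def reconstruct(target, init_positions, etas, n_steps=200, step=0.1):
    """Recover coordinates whose fingerprint matches `target`.

    Gradients of the mismatch with respect to the Cartesian coordinates
    come from automatic differentiation; a trial step is accepted only
    if it lowers the loss, otherwise the step size is halved.
    """

    def loss_fn(p):
        return jnp.mean((radial_fingerprint(p, etas) - target) ** 2)

    value_and_grad = jax.jit(jax.value_and_grad(loss_fn))
    pos = init_positions
    loss, grad = value_and_grad(pos)
    for _ in range(n_steps):
        trial = pos - step * grad
        trial_loss, trial_grad = value_and_grad(trial)
        if trial_loss < loss:
            pos, loss, grad = trial, trial_loss, trial_grad
            step *= 1.2
        else:
            step *= 0.5
    return pos, float(loss)


# Usage: fingerprint a made-up reference cluster, then rebuild coordinates
# from a random starting guess using only the fingerprint.
etas = jnp.array([0.05, 0.2, 0.8])
reference = 2.0 * jax.random.normal(jax.random.PRNGKey(0), (4, 3))
target = radial_fingerprint(reference, etas)
start = 2.0 * jax.random.normal(jax.random.PRNGKey(1), (4, 3))
recovered, final_loss = reconstruct(target, start, etas)
print("final fingerprint mismatch:", final_loss)
```

Because the fingerprint is invariant, the recovered coordinates can at best match the reference up to translation, rotation, and atom relabeling. A radial-only fingerprint may also be satisfied by several distinct structures, so a practical version would likely need angular terms as well. In the end-to-end setting described above, `target` would come from the generative model rather than from a known structure.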