Machine learning techniques have been used successfully to extract structural information, such as the crystal space group, from powder X-ray diffractograms. However, training directly on diffractograms simulated from databases such as the ICSD is challenging because these databases are limited in size, imbalanced across classes, and biased toward certain structure types. We propose an alternative approach: generating synthetic crystals with random coordinates by using the symmetry operations of each space group. Based on this approach, we demonstrate online training of deep ResNet-like models on up to a few million unique synthetic diffractograms per hour, generated on the fly. For our chosen task of space group classification, we achieved a test accuracy of 79.9% on unseen ICSD structure types from most space groups. This surpasses the 56.1% accuracy of the current state-of-the-art approach of training on ICSD crystals directly. Our results demonstrate that synthetically generated crystals can be used to extract structural information from ICSD powder diffractograms, which makes it possible to apply very large state-of-the-art machine learning models in the area of powder X-ray diffraction. We further show first steps toward applying our methodology to experimental data, where automated XRD data analysis is crucial, especially in high-throughput settings. While we focused on the prediction of the space group, our approach has the potential to be extended to related tasks in the future.
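The idea of building a synthetic crystal from random coordinates and the symmetry operations of a space group can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `generate_synthetic_crystal`, the choice of space group P2_1/c (No. 14, unique axis b) as the example, and the use of general positions only are assumptions made here for illustration. Lattice parameters, element assignment, special Wyckoff positions, and diffractogram simulation are all left out.

```python
import numpy as np

# General-position symmetry operations of space group P2_1/c (No. 14,
# unique axis b), written as (rotation matrix, translation vector) pairs
# acting on fractional coordinates. Hard-coded here for illustration only.
P21C_OPS = [
    (np.eye(3), np.zeros(3)),                                 # x, y, z
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.5])),  # -x, y+1/2, -z+1/2
    (-np.eye(3), np.zeros(3)),                                # -x, -y, -z
    (np.diag([1.0, -1.0, 1.0]), np.array([0.0, 0.5, 0.5])),   # x, -y+1/2, z+1/2
]


def generate_synthetic_crystal(ops, n_atoms, rng):
    """Hypothetical helper: random asymmetric-unit atoms expanded by symmetry.

    Returns an array of shape (len(ops) * n_atoms, 3) holding fractional
    coordinates wrapped into the unit cell.
    """
    # Random fractional coordinates for the atoms of the asymmetric unit.
    asym = rng.random((n_atoms, 3))
    # Apply each operation x' = R x + t and wrap the result back into the cell.
    images = [(asym @ rot.T + trans) % 1.0 for rot, trans in ops]
    return np.concatenate(images, axis=0)


rng = np.random.default_rng(0)
coords = generate_synthetic_crystal(P21C_OPS, n_atoms=3, rng=rng)
print(coords.shape)  # (12, 3): 4 operations x 3 atoms
```

Because every call draws fresh random coordinates, a generator of this kind can in principle supply an unbounded stream of distinct training structures, which is what makes online training on generated data possible.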