Predicting the structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, with use cases in chemistry, biology, and medicine. In the past decade, machine learning algorithms have influenced molecular simulations across a range of tasks, including property prediction for atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one with a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy states. Our approach relies on (i) a novel hypergraph representation of molecules, which encodes all information relevant to characterizing the potential energy of a conformation, and (ii) novel message passing and pooling layers for processing and making predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the very same transfer learning approach can be used to group, in an unsupervised way, various secondary structures of deca-alanine into clusters with similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed, paving the way to unexplored routes in the prediction of structural and energetic properties of biologically relevant systems.
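To make the idea of a molecular hypergraph more concrete, the following is a minimal sketch and not the layers or representation proposed in the paper. It assumes that hyperedges of order 2, 3, and 4 correspond to bonds, bond angles, and dihedral angles (the terms that typically enter a classical potential energy function), and it uses plain mean aggregation for message passing and pooling. The function names `build_hypergraph`, `message_pass`, and `mean_pool` are hypothetical and chosen only for illustration.

```python
def build_hypergraph(bonds):
    """Derive bond, angle, and dihedral hyperedges from bonded atom pairs.

    Assumption: hyperedge orders 2/3/4 stand for bonds/angles/dihedrals.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Order-2 hyperedges: the bonds themselves, stored as sorted pairs.
    edges = sorted({tuple(sorted(pair)) for pair in bonds})

    # Order-3 hyperedges: two distinct neighbors sharing a central atom j.
    angles = []
    for j, nbrs in neighbors.items():
        ns = sorted(nbrs)
        for x in range(len(ns)):
            for y in range(x + 1, len(ns)):
                angles.append((ns[x], j, ns[y]))

    # Order-4 hyperedges: chains i-j-k-l around each central bond j-k.
    dihedrals = []
    for j, k in edges:
        for i in sorted(neighbors[j] - {k}):
            for l in sorted(neighbors[k] - {j}):
                if i != l:
                    dihedrals.append((i, j, k, l))

    return {2: edges, 3: sorted(angles), 4: sorted(dihedrals)}


def message_pass(node_feats, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation (illustrative)."""
    # Each hyperedge message is the mean feature of its member atoms.
    edge_msgs = [sum(node_feats[v] for v in e) / len(e) for e in hyperedges]

    # Each atom collects the messages of the hyperedges it belongs to.
    incoming = {v: [] for v in node_feats}
    for e, msg in zip(hyperedges, edge_msgs):
        for v in e:
            incoming[v].append(msg)

    # Atoms without any incident hyperedge keep their previous feature.
    return {
        v: (sum(msgs) / len(msgs) if msgs else node_feats[v])
        for v, msgs in incoming.items()
    }


def mean_pool(node_feats):
    """Size-independent readout: average over all atoms."""
    return sum(node_feats.values()) / len(node_feats)


# Usage on a hypothetical four-atom chain 0-1-2-3.
hg = build_hypergraph([(0, 1), (1, 2), (2, 3)])
feats = {0: 0.0, 1: 0.0, 2: 0.0, 3: 4.0}
updated = message_pass(feats, hg[4])
pooled = mean_pool(updated)
```

Because the mean-pooling readout does not depend on the number of atoms, the same function can be applied to a small molecule and to a larger one, which is the basic property a transfer setting such as tri-alanine to deca-alanine relies on. A trained model would replace the plain means with learned transformations.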