In recent years, machine learning (ML) has gained significant popularity in chemical informatics and electronic structure theory. ML techniques often require researchers to engineer abstract "features" that encode chemical concepts in a mathematical form suitable as input to machine-learning models. However, no existing tool connects these abstract features back to the actual chemical system, which makes it difficult to diagnose failures and to build intuition about what the features mean. To address this problem, we present ElectroLens, a new visualization tool for high-dimensional, spatially resolved features. The tool visualizes high-dimensional data sets of atomistic and electron-environment features through a series of linked 3D views and 2D plots. Through interactive selection, it connects different derived features to their corresponding regions in 3D space. It is built to be scalable and to integrate with existing infrastructure.