Atomistic simulations generate large volumes of noisy structural data, yet extracting phase labels, order parameters (OPs), and defect information in a universal, robust, and interpretable way remains challenging. Existing tools such as polyhedral template matching (PTM) and common neighbor analysis (CNA) are restricted to a small set of hand-crafted lattices (e.g.\ FCC/BCC/HCP), degrade under strong thermal disorder or defects, and produce hard, template-based labels without per-atom probabilities or confidence scores. Here we introduce a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. We repurpose the MACE-MP foundation interatomic potential, training it on crystal structures mapped to AFLOW prototypes to predict per-atom, per-phase logits $l_{ac}$ (atom $a$, phase $c$) and to aggregate them into a global log-density $\log \hat{P}_\theta(\boldsymbol{r})$ whose gradient with respect to the atomic positions $\boldsymbol{r}$ defines a conservative score field. Denoising corresponds to gradient ascent on this learned log-density, phase labels follow from $\arg\max_c l_{ac}$, and the logits act as continuous, defect-sensitive, and interpretable OPs that quantify the Euclidean distance to ideal phases. We demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice--water interfaces, and shock-compressed Ti.
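The pipeline described above (logits, aggregated log-density, score, gradient-ascent denoising, argmax labels) can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the MACE-MP-based model: atoms live in 1D, each "phase" is a lattice with a given spacing, the logit is a hand-written Gaussian term in the offset to the nearest lattice site, and the per-atom logits are aggregated with a log-sum-exp over phases. The aggregation rule and all names (`logits`, `log_density`, `score`, `denoise`, `SIGMA`) are hypothetical choices made for illustration only.

```python
import numpy as np

SIGMA = 0.05  # assumed width of the toy Gaussian around each ideal site

def logits(r, spacings, sigma=SIGMA):
    """Toy logits l[a, c] = -d_ac^2 / (2 sigma^2); d_ac is the signed
    offset of atom a from the nearest site of the 1D lattice of phase c."""
    r = np.asarray(r, dtype=float)[:, None]         # shape (n_atoms, 1)
    a = np.asarray(spacings, dtype=float)[None, :]  # shape (1, n_phases)
    d = r - a * np.round(r / a)
    return -d**2 / (2.0 * sigma**2), d

def log_density(r, spacings, sigma=SIGMA):
    """Assumed aggregation: log P(r) = sum_a logsumexp_c l[a, c]."""
    l, _ = logits(r, spacings, sigma)
    m = l.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(l - m).sum(axis=1))))

def score(r, spacings, sigma=SIGMA):
    """Analytic gradient of log_density with respect to the positions."""
    l, d = logits(r, spacings, sigma)
    w = np.exp(l - l.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # softmax over phases
    return np.sum(w * (-d / sigma**2), axis=1)

def denoise(r, spacings, sigma=SIGMA, step=1e-3, n_steps=50):
    """Plain gradient ascent on the toy log-density."""
    r = np.array(r, dtype=float)
    for _ in range(n_steps):
        r = r + step * score(r, spacings, sigma)
    return r

spacings = [1.0, 1.4]                       # two hypothetical phases
ideal = np.array([1.0, 2.0, 3.0])           # sites of the first phase
noisy = ideal + np.array([0.04, -0.03, 0.03])

l, _ = logits(noisy, spacings)
labels = l.argmax(axis=1)                   # phase label per atom
distance = np.sqrt(-2.0 * SIGMA**2 * l)     # logits read back as distances
denoised = denoise(noisy, spacings)
```

In this toy the logit is a negative squared distance by construction, so converting it back to a distance is exact; for a learned model that reading would hold only to the extent that training enforces it.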