AI-driven materials discovery that couples automated experimentation with algorithmic decision-making requires process-aware recipe-to-property predictors that are accurate, calibrated, and physically admissible. We treat this as a reasoning problem and address it with large reasoning models (LRMs). To instill reasoning capability in a language model, we curate reasoning traces from a teacher model and use them to train a student model. However, most training pipelines select reasoning traces using binary correctness or learned preference signals, both of which poorly reflect physical admissibility. We introduce Physics-aware Rejection Sampling (PaRS), a training-time trace selection scheme that favors traces that are consistent with fundamental physics and numerically close to the targets, and that uses lightweight halting to control compute. We instantiate the framework with a large student model fine-tuned on traces synthesized by a larger teacher model, and evaluate it under matched token budgets against several rejection sampling baselines. Relative to these baselines, our method improves accuracy and calibration, reduces physics-violation rates, and lowers sampling cost. These results indicate that modest, domain-aware constraints combined with trace-level selection provide a practical path toward reliable, efficient LRMs for process-aware property prediction and closed-loop materials design.
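The selection idea can be illustrated with a minimal sketch. This is not the PaRS implementation: the names `Trace`, `select_traces`, `is_admissible`, `rel_tol`, `max_accept`, and `max_draws` are hypothetical, the admissibility check is a placeholder for whatever physical constraints apply, and the halting rule shown (stop once enough traces are accepted or a draw budget is spent) is only one simple way to cap sampling cost.

```python
from itertools import islice
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Trace(NamedTuple):
    """A teacher reasoning trace and the numeric property value it predicts."""
    text: str
    prediction: float


def select_traces(
    candidates: Iterable[Trace],
    target: float,
    is_admissible: Callable[[Trace], bool],
    rel_tol: float = 0.1,
    max_accept: int = 2,
    max_draws: int = 16,
) -> Tuple[List[Trace], int]:
    """Keep traces that pass a physics check and land near the target.

    Assumed (illustrative) rules:
      * reject any trace that fails `is_admissible`;
      * reject any trace whose relative error exceeds `rel_tol`;
      * halt once `max_accept` traces are kept or `max_draws` are consumed.
    Returns the accepted traces and the number of candidates drawn.
    """
    accepted: List[Trace] = []
    draws = 0
    for trace in islice(candidates, max_draws):
        draws += 1
        if not is_admissible(trace):
            continue  # physically inadmissible, regardless of numeric error
        rel_err = abs(trace.prediction - target) / max(abs(target), 1e-12)
        if rel_err <= rel_tol:
            accepted.append(trace)
            if len(accepted) >= max_accept:
                break  # lightweight halting: enough good traces collected
    return accepted, draws


if __name__ == "__main__":
    # Toy example: the property is assumed to be non-negative.
    pool = [
        Trace("a", -5.0),
        Trace("b", 104.0),
        Trace("c", 150.0),
        Trace("d", 98.0),
        Trace("e", 101.0),
    ]
    kept, used = select_traces(pool, target=100.0,
                               is_admissible=lambda t: t.prediction >= 0.0)
    print([t.text for t in kept], used)
```

In this toy run the negative prediction is rejected by the admissibility check, the prediction of 150 is rejected for numeric error, and sampling stops before the fifth candidate is drawn, which is the sense in which halting lowers sampling cost.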