Synthesis procedures play a critical role in materials research because they directly affect material properties. As data-driven approaches increasingly accelerate materials discovery, there is growing interest in extracting synthesis procedures from the scientific literature as structured data. However, existing studies often either rely on rigid, domain-specific schemas with predefined fields or assume that synthesis procedures are linear sequences of operations; both choices limit their ability to capture the structural complexity of real-world procedures. To address these limitations, we adopt PROV-DM, an international standard for provenance information that supports flexible, graph-based modeling of procedures. We present MatPROV, a dataset of PROV-DM-compliant synthesis procedures extracted from the scientific literature using large language models. MatPROV represents the structural complexity of procedures and the causal relationships among materials, operations, and conditions as visually intuitive directed graphs. This representation makes synthesis knowledge machine-interpretable, opening opportunities for future research such as automated synthesis planning and optimization.
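To make the contrast between a linear sequence and a PROV-DM graph concrete, the following is a minimal Python sketch, not MatPROV's actual schema or tooling. It uses only the PROV-DM core types *entity* (here, materials) and *activity* (here, operations) and the core relations `used` and `wasGeneratedBy`. The `ProvGraph` class, the `lineage` helper, all material and operation names, and the choice to store conditions such as temperature as activity attributes are hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Hypothetical minimal PROV-DM-style graph: entities (materials),
    activities (operations), and the relations `used` and `wasGeneratedBy`."""
    entities: dict = field(default_factory=dict)    # entity id -> attributes
    activities: dict = field(default_factory=dict)  # activity id -> conditions
    used: list = field(default_factory=list)        # (activity, entity): activity used entity
    generated: list = field(default_factory=list)   # (entity, activity): entity wasGeneratedBy activity

    def entity(self, eid, **attrs):
        self.entities[eid] = attrs

    def activity(self, aid, inputs, outputs, **conditions):
        # Assumption: operating conditions are stored as activity attributes.
        self.activities[aid] = conditions
        for e in inputs:
            self.used.append((aid, e))
        for e in outputs:
            self.generated.append((e, aid))

    def lineage(self, eid):
        """Return all entities and activities upstream of entity `eid`."""
        producers = defaultdict(list)  # entity -> activities that generated it
        for e, a in self.generated:
            producers[e].append(a)
        inputs = defaultdict(list)     # activity -> entities it used
        for a, e in self.used:
            inputs[a].append(e)
        seen, stack = set(), [eid]
        while stack:
            node = stack.pop()
            for a in producers.get(node, []):
                if a not in seen:
                    seen.add(a)
                    for e in inputs[a]:
                        if e not in seen:
                            seen.add(e)
                            stack.append(e)
        return seen


# Hypothetical branching procedure: two precursors are milled separately,
# then mixed and calcined.
g = ProvGraph()
for m in ["precursor_A", "precursor_B", "milled_A", "milled_B", "mixture", "product"]:
    g.entity(m)
g.activity("mill_A", ["precursor_A"], ["milled_A"], duration_h=2)
g.activity("mill_B", ["precursor_B"], ["milled_B"], duration_h=4)
g.activity("mix", ["milled_A", "milled_B"], ["mixture"])
g.activity("calcine", ["mixture"], ["product"], temperature_C=900, atmosphere="air")

print(sorted(g.lineage("milled_B")))  # ['mill_B', 'precursor_B']
```

Because two independent milling steps feed a single mixing step, this procedure branches; flattening it into one linear list of operations would obscure which precursor each milling step acted on. In the graph, a simple upstream traversal such as `lineage` recovers that history directly.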