Multi-omics studies often rely on pathway enrichment (PE) to interpret heterogeneous molecular changes, but PE-based workflows inherit structural limitations of pathway resources, including curation lag, functional redundancy, and limited sensitivity to molecular states and interventions. Although recent work has explored using large language models (LLMs) to improve PE-based interpretation, the lack of a standardized benchmark for end-to-end multi-omics pathway mechanism elucidation has largely confined evaluation to small, manually curated datasets or ad hoc case studies, hindering reproducible progress. To address this gap, we introduce BIOME-Bench, a benchmark constructed via a rigorous four-stage workflow, to evaluate two core capabilities of LLMs in multi-omics analysis: Biomolecular Interaction Inference and end-to-end Multi-Omics Pathway Mechanism Elucidation. We develop evaluation protocols for both tasks and conduct comprehensive experiments across multiple strong contemporary models. The results show that existing models still exhibit substantial deficiencies in multi-omics analysis: they struggle to reliably distinguish fine-grained biomolecular relation types and to generate faithful, robust pathway-level mechanistic explanations.