Representation learning on multi-omics data is challenging due to extreme dimensionality, modality heterogeneity, and cohort-specific batch effects. While pre-trained transformer backbones have shown broad generalization in biological sequence modeling, their application to multi-omics integration remains underexplored. We present MoRE (Multi-Omics Representation Embedding), a framework that repurposes frozen pre-trained transformers to align heterogeneous assays in a shared latent space. Unlike purely generative approaches, MoRE employs a parameter-efficient fine-tuning (PEFT) strategy that prioritizes cross-sample and cross-modality alignment over simple sequence reconstruction. Specifically, MoRE attaches lightweight, modality-specific adapters and a task-adaptive fusion layer to the frozen backbone, and optimizes a masked modeling objective jointly with supervised contrastive and batch-invariant alignment losses, yielding structure-preserving embeddings that generalize to unseen cell types and platforms. We benchmark MoRE against established baselines, including scGPT, scVI, and Harmony combined with scArches, evaluating integration fidelity, rare-population detection, and modality transfer. Our results show that MoRE achieves competitive batch robustness and biological conservation while requiring substantially fewer trainable parameters than fully fine-tuned models. This work positions MoRE as a practical step toward general-purpose omics foundation models.
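To make the recipe concrete, the following is a minimal PyTorch sketch of the design the abstract describes: a frozen backbone, per-modality residual bottleneck adapters, a simple fusion head, and a supervised contrastive term that could enter the joint objective. All names here (`ModalityAdapter`, `MoRESketch`, `d_bottleneck`, the loss weights in the closing comment) are illustrative assumptions for exposition, not MoRE's published interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityAdapter(nn.Module):
    """Residual bottleneck adapter; trainable while the backbone stays frozen."""
    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(F.gelu(self.down(h)))  # residual connection around the bottleneck


class MoRESketch(nn.Module):
    """Frozen pre-trained backbone + modality-specific adapters + a fusion head."""
    def __init__(self, backbone: nn.Module, modalities: list, d_model: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # PEFT: backbone weights are never updated
        self.adapters = nn.ModuleDict({m: ModalityAdapter(d_model) for m in modalities})
        self.fusion = nn.Linear(d_model, d_model)  # stand-in for the task-adaptive fusion layer

    def forward(self, tokens: torch.Tensor, modality: str) -> torch.Tensor:
        h = self.backbone(tokens)          # assumed output shape: (batch, seq_len, d_model)
        h = self.adapters[modality](h)
        return self.fusion(h.mean(dim=1))  # mean-pool to one embedding per cell/sample


def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Supervised contrastive term: samples sharing a label attract in embedding space."""
    z = F.normalize(z, dim=1)
    sim = (z @ z.t()) / tau
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))           # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    counts = pos.sum(dim=1)
    has_pos = counts > 0                                      # skip anchors with no positive pair
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(dim=1)
    return -(per_anchor[has_pos] / counts[has_pos]).mean()


# The joint objective the abstract describes would then combine this term with a
# masked-modeling reconstruction loss and a batch-invariant alignment penalty, e.g.
#   loss = l_masked + lambda_con * supcon_loss(z, cell_types) + lambda_batch * l_align
# where lambda_con and lambda_batch are hypothetical weighting hyperparameters.
```

Under this reading, only the adapter and fusion weights receive gradients, which is the source of the abstract's parameter-efficiency claim relative to fully fine-tuned models.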