Learning disentangled representations is a fundamental task in multi-modal learning. In modern applications such as single-cell multi-omics, both shared and modality-specific features are critical for characterizing cell states and supporting downstream analyses. Ideally, modality-specific features should be independent of the shared ones while still capturing all complementary information within each modality. This tradeoff is naturally expressed through information-theoretic criteria, but mutual-information-based objectives are difficult to estimate reliably, and their variational surrogates often underperform in practice. In this paper, we introduce IndiSeek, a disentangled representation learning approach that addresses this challenge by combining an independence-enforcing objective with a computationally efficient reconstruction loss that bounds conditional mutual information. This formulation explicitly balances independence and completeness, enabling principled extraction of modality-specific features. We demonstrate the effectiveness of IndiSeek on synthetic simulations, a CITE-seq dataset, and multiple real-world multi-modal benchmarks.
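To make the balance between independence and completeness concrete, the following is a minimal sketch of an objective of this general shape, not the IndiSeek implementation. It rests on several assumptions that do not come from the abstract: the independence term is a biased HSIC estimate with a Gaussian kernel, the reconstruction term is the mean-squared error of a linear least-squares decoder, and the names `hsic`, `reconstruction_mse`, `objective`, and the weight `lam` are hypothetical. The actual independence criterion, reconstruction loss, and the bound on conditional mutual information are those defined in the paper.

```python
import numpy as np


def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) kernel Gram matrix over the rows of x."""
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2 (assumed independence proxy)."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k = rbf_gram(x, sigma)
    l = rbf_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / n ** 2


def reconstruction_mse(features, target):
    """MSE of the best linear decoder (with intercept) from features to target."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.mean((target - design @ coef) ** 2))


def objective(shared, specific, modality, lam=1.0):
    """Completeness (reconstruction) term plus lam times the independence term."""
    recon = reconstruction_mse(np.hstack([shared, specific]), modality)
    indep = hsic(shared, specific)
    return recon + lam * indep


# Toy data: one modality generated linearly from shared and private factors.
rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 2))
private = rng.normal(size=(n, 2))
modality = np.hstack([shared, private]) @ rng.normal(size=(4, 6))

# Candidate 1: modality-specific features equal to the true private factors.
score_good = objective(shared, private, modality)
# Candidate 2: modality-specific features that merely copy the shared ones.
score_redundant = objective(shared, shared.copy(), modality)
print(score_good, score_redundant)
```

In this toy setting the redundant candidate is penalized twice: it is strongly dependent on the shared features, and it leaves the private part of the modality unexplained, so its score is higher than that of the candidate built from the true private factors.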