Learning meaningful representations of data that can address challenges such as batch effect correction, data integration and counterfactual inference is a central problem in many domains, including computational biology. Adopting a Conditional VAE framework, we identify the mathematical principle that unites these challenges: learning a representation that is marginally independent of a condition variable. We therefore propose the Contrastive Mixture of Posteriors (CoMP) method, which uses a novel misalignment penalty to enforce this independence. This penalty is defined in terms of mixtures of the variational posteriors themselves, unlike prior work which uses external discrepancy measures such as MMD to ensure independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, especially when there is complex global structure in latent space. We further demonstrate state-of-the-art performance on a number of real-world problems, including the challenging tasks of aligning human tumour samples with cancer cell lines and performing counterfactual inference on single-cell RNA sequencing data. Finally, we note parallels with the fair representation learning literature, and demonstrate that CoMP achieves competitive performance in learning fair yet expressive latent representations.
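To make the idea of a penalty built from mixtures of the variational posteriors concrete, the following is a minimal illustrative sketch, assuming diagonal Gaussian posteriors and a single categorical condition variable. The function and variable names (`comp_penalty`, `log_gaussian`, `cond`) are ours, and the exact estimator used in CoMP may differ; this is only one plausible instantiation of a mixture-of-posteriors misalignment penalty, not the paper's reference implementation.

```python
# Illustrative sketch only: a contrastive mixture-of-posteriors penalty
# for a conditional VAE encoder, assuming diagonal Gaussian posteriors.
import numpy as np

def log_gaussian(z, mu, log_var):
    """Log density of diagonal Gaussians N(mu, exp(log_var)) evaluated at z."""
    return -0.5 * np.sum(log_var + np.log(2 * np.pi)
                         + (z - mu) ** 2 / np.exp(log_var), axis=-1)

def comp_penalty(z, mu, log_var, cond):
    """Contrastive mixture-of-posteriors penalty over a mini-batch.

    z          : (n, d) latent samples, z[i] ~ q(z | x_i)
    mu, log_var: (n, d) parameters of each variational posterior q_i
    cond       : (n,) integer condition label for each sample

    For each sample, the density of z[i] under the mixture of posteriors
    from the *same* condition is compared against the mixture from *other*
    conditions; the penalty is small when the two mixtures agree, i.e. when
    the latent code carries little information about the condition.
    """
    n = z.shape[0]
    # log q_j(z_i) for every pair (i, j)
    log_q = np.stack([log_gaussian(z[i], mu, log_var) for i in range(n)])

    same = cond[None, :] == cond[:, None]   # (n, n) same-condition mask
    penalties = []
    for i in range(n):
        pos = log_q[i, same[i]]             # posteriors sharing condition c_i
        neg = log_q[i, ~same[i]]            # posteriors from other conditions
        if len(pos) == 0 or len(neg) == 0:
            continue
        # log-ratio of mean density under same-condition vs. other-condition mixture
        log_pos = np.logaddexp.reduce(pos) - np.log(len(pos))
        log_neg = np.logaddexp.reduce(neg) - np.log(len(neg))
        penalties.append(log_pos - log_neg)
    return float(np.mean(penalties))
```

In a training loop, this term would be added (with some weight) to the usual conditional-VAE objective; driving it towards zero encourages the per-condition mixtures of posteriors to overlap, i.e. a latent representation marginally independent of the condition.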