Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures $(\mu, \nu)$, a parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many applications, such as predicting cell responses to treatments, pairs of input/output data measures $(\mu, \nu)$ that define optimal transport problems do not arise in isolation but are associated with a context $c$, for instance a treatment when comparing populations of untreated and treated cells. To account for that context in OT estimation, we introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable, using several pairs of measures $\left(\mu_i, \nu_i\right)$ tagged with a context label $c_i$. CondOT learns a global map $\mathcal{T}_\theta$ conditioned on context that is not only expected to fit all labeled pairs in the dataset $\left\{\left(c_i,\left(\mu_i, \nu_i\right)\right)\right\}$, i.e., $\mathcal{T}_\theta\left(c_i\right) \sharp \mu_i \approx \nu_i$, but should also generalize to produce meaningful maps $\mathcal{T}_\theta\left(c_{\text {new }}\right)$ when conditioned on unseen contexts $c_{\text {new }}$. Our approach harnesses, and provides a novel use for, partially input convex neural networks, for which we introduce a robust and efficient initialization strategy inspired by Gaussian approximations. We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells, using only observations of the effects of said perturbations separately.
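To make the reference to Gaussian approximations concrete, the following is a minimal sketch of the classical closed-form OT map between two Gaussians under squared Euclidean cost, $T(x) = m_\nu + A\,(x - m_\mu)$ with $A = \Sigma_\mu^{-1/2}\left(\Sigma_\mu^{1/2} \Sigma_\nu \Sigma_\mu^{1/2}\right)^{1/2} \Sigma_\mu^{-1/2}$, fitted to samples. It is not the initialization procedure of CondOT itself, which the abstract does not detail. The function names `sqrtm_psd` and `gaussian_ot_map`, the regularization `reg`, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def sqrtm_psd(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_ot_map(source, target, reg=1e-6):
    """Fit the affine map x -> A x + b that is the OT map between the
    Gaussian approximations of two sample sets (rows are samples).

    `reg` is a hypothetical ridge term added to both covariances so that
    the inverse square root is well defined.
    """
    dim = source.shape[1]
    mean_s, mean_t = source.mean(axis=0), target.mean(axis=0)
    cov_s = np.cov(source, rowvar=False) + reg * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + reg * np.eye(dim)

    root_s = sqrtm_psd(cov_s)
    inv_root_s = np.linalg.inv(root_s)
    # A is symmetric positive definite, so the map is the gradient of a
    # convex quadratic potential.
    A = inv_root_s @ sqrtm_psd(root_s @ cov_t @ root_s) @ inv_root_s
    b = mean_t - A @ mean_s
    return A, b


# Synthetic example: push standard-normal samples onto a shifted, sheared cloud.
rng = np.random.default_rng(0)
source = rng.normal(size=(500, 2))
target = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]]) + np.array([3.0, -1.0])

A, b = gaussian_ot_map(source, target)
mapped = source @ A.T + b  # first two moments of `mapped` match those of `target`
```

Because $A$ is symmetric positive definite, this affine map is the gradient of a convex quadratic, which is the kind of structure an input convex network can represent; a context-conditioned variant would let $A$ and $b$ depend on $c$.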