We devise a theoretical framework and a numerical method to infer trajectories of a stochastic process from samples of its temporal marginals. This problem arises in the analysis of single-cell RNA-sequencing data, which provide high-dimensional measurements of cell states but cannot track the trajectories of the cells over time. We prove that, for a class of stochastic processes, the ground-truth trajectories can be recovered from limited samples of the temporal marginals at each time-point, and we provide an efficient algorithm to do so in practice. The method we develop, Global Waddington-OT (gWOT), reduces to a smooth convex optimization problem, posed globally over all time-points, that involves entropy-regularized optimal transport. We demonstrate on several synthetic and real datasets that this problem can be solved efficiently and yields good reconstructions.
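To make the term "entropy-regularized optimal transport" concrete, the following is a minimal sketch of the standard Sinkhorn iteration for coupling samples from two consecutive time-points. It is not the gWOT algorithm, which is posed globally over all time-points; it only illustrates the pairwise building block. The function name `sinkhorn_plan`, the uniform sample weights, the squared Euclidean cost, the normalization of the cost by its maximum, and the parameter values are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=0.2, n_iters=1000):
    """Entropy-regularized OT coupling between two point clouds.

    Assumptions (illustrative only): uniform weights on the samples,
    squared Euclidean cost rescaled to [0, 1], fixed iteration count.
    """
    a = np.full(len(x), 1.0 / len(x))  # source marginal (uniform)
    b = np.full(len(y), 1.0 / len(y))  # target marginal (uniform)
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    cost = cost / cost.max()           # rescale so eps is relative
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # match the column marginal
        u = a / (K @ v)                # match the row marginal
    return u[:, None] * K * v[None, :]

# Synthetic samples standing in for two temporal marginals.
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=(30, 2))
x1 = rng.normal(0.5, 1.0, size=(40, 2))
plan = sinkhorn_plan(x0, x1)
```

Each entry `plan[i, j]` can be read as the mass transported from sample `i` at the earlier time-point to sample `j` at the later one. Larger `eps` gives a more diffuse coupling; smaller `eps` approaches unregularized optimal transport but converges more slowly and may need log-domain updates for numerical stability.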