Optimal transport-based machine learning to match specific expression patterns in omics data

We present two algorithms designed to learn a pattern of correspondence between two data sets in situations where it is desirable to match elements whose relationship belongs to a known parametric model. In the motivating case study, the challenge is to better understand micro-RNA (miRNA) regulation in the striatum of Huntington's disease (HD) model mice. The two data sets contain miRNA and messenger-RNA (mRNA) data, respectively, with each data point consisting of a multi-dimensional profile. The biological hypothesis is that if a miRNA induces the degradation of a target mRNA, blocks its translation into proteins, or both, then the profile of the former should be similar to the negative of the profile of the latter (a particular form of affine relationship). The algorithms unfold in two stages. During the first stage, an optimal transport plan P and an optimal affine transformation are learned, using the Sinkhorn-Knopp algorithm and mini-batch gradient descent. During the second stage, P is exploited to derive either several co-clusters or several sets of matched elements. We share code that implements our algorithms. A simulation study illustrates how they work and perform. A brief summary of the real data application in the motivating case study further illustrates the applicability and interest of the algorithms.
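The two stages described above can be illustrated with a small toy sketch. This is not the authors' released code. It assumes uniform weights on both data sets, a squared Euclidean cost, and an affine map fixed to `scale = -1`, `shift = 0` (the "minus the profile" hypothesis) rather than one learned by mini-batch gradient descent. The names `sinkhorn_plan` and `affine_cost` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.02, n_iter=500):
    """Entropy-regularized transport plan between uniform marginals,
    computed with Sinkhorn-Knopp row/column scaling iterations."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform weights on the first data set (assumption)
    b = np.full(m, 1.0 / m)      # uniform weights on the second data set (assumption)
    K = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)        # rescale to match column marginals
        u = a / (K @ v)          # rescale to match row marginals
    return u[:, None] * K * v[None, :]

def affine_cost(X, Y, scale=-1.0, shift=0.0):
    """Squared Euclidean cost between X[i] and the transformed profile scale * Y[j] + shift."""
    Y_t = scale * Y + shift
    diff = X[:, None, :] - Y_t[None, :, :]
    return (diff ** 2).sum(axis=2)

# Toy data: each "mRNA" profile is the mirrored, shuffled, slightly noisy
# copy of one "miRNA" profile.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
perm = rng.permutation(5)
Y = -X[perm] + 0.01 * rng.normal(size=(5, 4))

cost = affine_cost(X, Y)
cost = cost / cost.max()         # normalize so that reg has a comparable scale

# Stage 1 (simplified): transport plan P under the fixed affine map.
P = sinkhorn_plan(cost)

# Stage 2 (simplified): match each row element to its heaviest column in P.
matches = P.argmax(axis=1)
```

In this toy setting, `matches[i]` is the index of the simulated mRNA profile that receives the most mass from miRNA profile `i`. A fuller treatment would also update `scale` and `shift` by gradient steps on the transport cost, and could derive co-clusters from `P` instead of one-to-one matches.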
