This work is motivated by the study of local protein structure, which is described by two variable dihedral angles whose values follow probability distributions on the flat torus. Our goal is to equip the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications caused by changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Because it adapts to the geometry of the underlying space, we focus on the Wasserstein distance as a metric between distributions. We extend existing results from Optimal Transport theory, in particular a Central Limit Theorem, to the $d$-dimensional flat torus $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$. Moreover, we assess different techniques for two-sample goodness-of-fit testing based on the Wasserstein distance in the two-dimensional case. We provide an implementation of these approaches in R. Their performance is illustrated by numerical experiments on synthetic data and protein structure data.