We propose new tools for the geometric exploration of data objects taking values in a general separable metric space $(\Omega, d)$. Given a probability measure on $\Omega$, we introduce depth profiles, where the depth profile of an element $\omega\in\Omega$ refers to the distribution of the distances between $\omega$ and the other elements of $\Omega$. Depth profiles can be harnessed to define transport ranks, which capture the centrality of each element in $\Omega$ with respect to the entire data cloud based on the optimal transport maps between the depth profiles. We study the properties of transport ranks and show that they provide an effective device for detecting and visualizing patterns in samples of random objects. Specifically, we study estimates of depth profiles and transport ranks based on samples of random objects and establish the convergence of the empirical estimates to the population targets using empirical process theory. We demonstrate the usefulness of depth profiles and associated transport ranks and visualizations for distributional data through a sample of age-at-death distributions for various countries, for compositional data through energy usage for U.S. states and for network data through New York taxi trips.
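As a rough illustration of the definition above, the empirical depth profile of a sample point can be represented by the sorted distances from that point to the remaining sample points. The sketch below is a minimal example under stated assumptions, not the authors' estimator: the function names (`depth_profile`, `profile_cdf`, `wasserstein1`), the Euclidean toy sample, and the use of the 1-Wasserstein distance to compare two profiles are all choices made here for illustration. The abstract does not specify how transport ranks are built from the transport maps, so that construction is not attempted.

```python
import math


def depth_profile(points, i, dist):
    """Sorted distances from points[i] to every other sample point.

    The sorted list represents the empirical distribution of distances:
    its empirical CDF jumps by 1/(n-1) at each listed value.
    """
    return sorted(dist(points[i], p) for j, p in enumerate(points) if j != i)


def profile_cdf(profile, t):
    """Empirical CDF of a depth profile evaluated at radius t."""
    return sum(d <= t for d in profile) / len(profile)


def wasserstein1(profile_a, profile_b):
    """1-Wasserstein distance between two equal-size empirical profiles.

    On the real line, the optimal transport map between two equal-size
    samples pairs their order statistics, so the cost is the mean
    absolute difference of the sorted values.
    """
    assert len(profile_a) == len(profile_b)
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / len(profile_a)


def euclidean(x, y):
    # Hypothetical choice of metric d; any metric on the object space works.
    return math.dist(x, y)


# Toy sample (an assumption for illustration): one central point and
# four points placed around it at unit distance.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
profiles = [depth_profile(sample, i, euclidean) for i in range(len(sample))]
```

In this toy sample the central point has all of its distance mass at radius 1, while each outer point also has mass at larger radii, which is the kind of contrast a centrality ranking built on depth profiles would pick up.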