How can we tell complex point clouds with different small-scale characteristics apart, while disregarding global features? Can we find a suitable transformation of such data that allows us to discriminate between differences in this sense with statistical guarantees? In this paper, we consider the analysis and classification of complex point clouds as they are obtained, e.g., via single molecule localization microscopy. We focus on the task of identifying differences between noisy point clouds based on small-scale characteristics, while disregarding large-scale information such as overall size. We propose an approach based on a transformation of the data via the so-called Distance-to-Measure (DTM) function, a transformation which is based on the average of nearest neighbor distances. For each data set, we estimate the probability density of average local distances of all data points and use the estimated densities for classification. While the applicability is immediate and the practical performance of the proposed methodology is very good, the theoretical study of the density estimators is quite challenging, as they are based on non-i.i.d. observations that have been obtained via a complicated transformation. In fact, the transformed data are stochastically dependent in a non-local way that is not captured by commonly considered dependence measures. Nonetheless, we show that the asymptotic behaviour of the density estimator is driven by a kernel density estimator of certain i.i.d. random variables by using theoretical properties of U-statistics, which allows us to handle the dependencies via a Hoeffding decomposition. We show via a numerical study and in an application to simulated single molecule localization microscopy data of chromatin fibers that unsupervised classification tasks based on estimated DTM-densities achieve excellent separation results.
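The pipeline described above (DTM transform, density estimate, comparison of densities) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume the common exponent-2 empirical DTM with mass parameter `m`, i.e. the root mean squared distance from each point to its `k = ceil(m * n)` nearest sample points (the point itself included), a Gaussian kernel with a fixed, hand-picked bandwidth, and the L2 distance between estimated densities as a dissimilarity that an unsupervised classifier could consume. The function names `empirical_dtm`, `gaussian_kde` and `l2_distance` are hypothetical.

```python
import numpy as np


def empirical_dtm(points: np.ndarray, m: float) -> np.ndarray:
    """Empirical DTM at every sample point (assumed exponent-2 convention).

    Root mean squared distance from each point to its k = ceil(m * n)
    nearest sample points; the point itself counts, at distance 0.
    """
    n = points.shape[0]
    k = int(np.ceil(m * n))
    # Pairwise squared Euclidean distances, shape (n, n); O(n^2) memory.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # The k smallest entries of each row belong to the k nearest neighbors.
    nearest = np.sort(sq_dist, axis=1)[:, :k]
    return np.sqrt(nearest.mean(axis=1))


def gaussian_kde(values: np.ndarray, grid: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    u = (grid[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def l2_distance(f: np.ndarray, g: np.ndarray, grid: np.ndarray) -> float:
    """Riemann-sum approximation of the L2 distance on a uniform grid."""
    step = grid[1] - grid[0]
    return float(np.sqrt(np.sum((f - g) ** 2) * step))


# Toy data (purely illustrative): two uniform clouds and one clustered cloud.
rng = np.random.default_rng(0)
uniform_a = rng.uniform(size=(300, 2))
uniform_b = rng.uniform(size=(300, 2))
centers = rng.uniform(size=(10, 2))
clustered = np.repeat(centers, 30, axis=0) + rng.normal(scale=0.01, size=(300, 2))

grid = np.linspace(0.0, 0.3, 301)
m, bandwidth = 0.05, 0.01  # hypothetical tuning parameters
dens = {
    name: gaussian_kde(empirical_dtm(pts, m), grid, bandwidth)
    for name, pts in [("a", uniform_a), ("b", uniform_b), ("c", clustered)]
}
d_same = l2_distance(dens["a"], dens["b"], grid)  # same small-scale structure
d_diff = l2_distance(dens["a"], dens["c"], grid)  # different small-scale structure
print(f"uniform vs uniform: {d_same:.2f}, uniform vs clustered: {d_diff:.2f}")
```

Because the DTM depends only on pairwise distances, the transformed values are unchanged by translating or rotating a cloud, which is one way the approach sets aside global placement; a matrix of such pairwise distances could then be fed to any standard clustering routine. The brute-force distance matrix is only suitable for small clouds; a k-d tree would be the usual choice for large localization data sets.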