This work considers clustering schemes for uncertain and structured data that rely on the notion of Wasserstein barycenters, accompanied by appropriate clustering indices based on the intrinsic geometry of the Wasserstein space in which the clustering task is performed. Such clustering approaches are valuable in fields where the observational or experimental error is significant (e.g. astronomy, biology, remote sensing) or where the data are more complex in nature and traditional learning algorithms are inapplicable or ineffective (e.g. network data, interval data, high-frequency records, matrix data). From this perspective, each observation is identified with an appropriate probability measure, and the proposed clustering schemes rely on discrimination criteria that exploit the geometric structure of the space of probability measures through core techniques from optimal transport theory. The advantages and capabilities of the proposed approach, and the performance of the geodesic criterion, are illustrated through a simulation study and two real-world applications: (a) clustering eurozone countries according to their observed government bond yield curves and (b) classifying the areas of a satellite image into land-use categories, a standard task in remote sensing.
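To make the idea of clustering probability measures around Wasserstein barycenters concrete, the following is a minimal sketch for the special case of one-dimensional measures, where the 2-Wasserstein distance equals the L2 distance between quantile functions and the barycenter of a group is the average of its quantile functions. It is an illustration under stated assumptions, not the scheme proposed in this work: the function names, the quantile grid size, and the farthest-first initialisation are all hypothetical choices, and the clustering indices and geodesic criterion mentioned above are not reproduced.

```python
import numpy as np

def quantile_embedding(samples, n_levels=100):
    """Represent each 1-D sample set by its empirical quantile function on a fixed grid."""
    levels = (np.arange(n_levels) + 0.5) / n_levels
    return np.stack([np.quantile(np.asarray(s, dtype=float), levels) for s in samples])

def w2_squared(q_a, q_b):
    """Approximate squared 2-Wasserstein distance between 1-D measures from quantile vectors."""
    return np.mean((q_a - q_b) ** 2, axis=-1)

def wasserstein_kmeans(samples, k, n_iter=50, n_levels=100):
    """Lloyd-style clustering in 1-D Wasserstein space (illustrative sketch only)."""
    q = quantile_embedding(samples, n_levels)

    # Assumed initialisation: farthest-first traversal starting from observation 0.
    centers = [q[0]]
    for _ in range(1, k):
        d = np.min([w2_squared(q, c) for c in centers], axis=0)
        centers.append(q[d.argmax()])
    centers = np.stack(centers)

    labels = np.full(len(q), -1)
    for _ in range(n_iter):
        # Assign each measure to the nearest barycenter in W2 distance.
        dists = w2_squared(q[:, None, :], centers[None, :, :])
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # In 1-D, the W2 barycenter of a cluster is the mean of its quantile functions.
        for j in range(k):
            members = q[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Here each element of `samples` is the set of repeated or noisy measurements attached to one observation. In higher dimensions the barycenter no longer has this closed form and would have to be computed numerically.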