Gaussian mixture models are a powerful tool, used mostly for clustering but, with proper preparation, also for feature extraction, pattern recognition, image segmentation, and machine learning in general. In schema matching, mixture models fitted to different pieces of data can retain crucial information about the structure of the dataset. The Wasserstein distance is a useful way to measure or compare the results of mixture models; however, it is not easy to calculate for mixture distributions. In this paper we derive one possible approximation of the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, we show application examples on real-world data.
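To make the idea of reducing a mixture-to-mixture Wasserstein computation to a linear problem concrete, the following is a minimal sketch of one well-known construction of this kind, which is not necessarily the approximation derived in this paper. It assumes the squared 2-Wasserstein distance, uses its closed form between individual Gaussian components as a ground cost, and solves a small transport linear program over the component weights. The function names `gaussian_w2_squared` and `mixture_transport_cost` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_squared(m1, S1, m2, S2):
    """Closed-form squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    S1 = np.atleast_2d(S1).astype(float)
    S2 = np.atleast_2d(S2).astype(float)
    root_S1 = np.real(sqrtm(S1))
    # (S1^{1/2} S2 S1^{1/2})^{1/2}; np.real drops tiny imaginary round-off.
    cross = np.real(sqrtm(root_S1 @ S2 @ root_S1))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


def mixture_transport_cost(weights_a, means_a, covs_a, weights_b, means_b, covs_b):
    """Transport cost between two Gaussian mixtures, restricted to plans
    that move mass between components (an assumption of this sketch)."""
    a = np.asarray(weights_a, float)
    b = np.asarray(weights_b, float)
    a, b = a / a.sum(), b / b.sum()  # normalise so both sides carry mass 1
    K, L = len(a), len(b)

    # Ground cost: pairwise squared W2 between Gaussian components.
    cost = np.array([
        [gaussian_w2_squared(means_a[i], covs_a[i], means_b[j], covs_b[j])
         for j in range(L)]
        for i in range(K)
    ])

    # Marginal constraints on the flattened (row-major) plan w[i, j]:
    # each row sums to a[i], each column sums to b[j].
    A_eq = np.zeros((K + L, K * L))
    for i in range(K):
        A_eq[i, i * L:(i + 1) * L] = 1.0
    for j in range(L):
        A_eq[K + j, j::L] = 1.0
    b_eq = np.concatenate([a, b])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun), res.x.reshape(K, L)
```

The returned value is a squared quantity, so its square root plays the role of a distance. For two single-component mixtures the linear program is trivial and the result is simply the closed-form Gaussian value; for example, N(0, 1) against N(3, 4) gives approximately 10.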