This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric model with a probabilistic model in a principled way. For simplicity, the geometric model in the earlier paper was restricted to the three-dimensional case. The present paper removes this restriction, and considers the full $n$-dimensional case. Although the mathematical model is the same, the strategies for computing solutions in the $n$-dimensional case are different, and one of the main purposes of this paper is to develop and analyze these strategies. Another main purpose is to devise techniques for estimating the parameters of the model from sample data, again in $n$ dimensions. We evaluate the solution strategies and the estimation techniques by applying them to two familiar real-world examples: the classical MNIST dataset and the CIFAR-10 dataset.